Credit: Eva Belmonte, David Cabo, Miguel Ángel Gavilanes, Ángela Bernardo, Carmen Torrecillas, Raúl Díaz, María Álvarez del Vayo
Datos Civio, or the Civio Data website, is a repository of more than 50 datasets used in Civio’s investigations over the last five years. It includes, among others, the first and only archive of all pardons granted in Spain, a list of every fire recorded in Spain since 2001, and several datasets of public contracts that do not comply with Spanish law. These datasets are published under Creative Commons licenses and are open to everyone, which allows: 1) readers to check the facts themselves; 2) media professionals to re-use data that was not publicly available before.
Civio Data was released only recently, but Civio’s commitment to open data has a long track record. In our efforts for transparency, both in public administration and in our own work, we have always made available the datasets used in our investigations, encouraging colleagues to re-use the data we liberated. The website is the result of years of work on these datasets and of their use by readers, subscribers and colleagues. Sometimes the data we published has become the basis for scientific papers. For instance, the paper ‘Using Firefighter Mobility Traces to Understand Ad-Hoc Networks in Wildfires’ uses the database of Civio’s project Spain in Flames to analyse wildfire patterns in the area. The data from our repository of pardons has also been used in the paper ‘Are pardons in Spain proportional to sanctions?’ by the University of Las Palmas de Gran Canaria.
Moreover, all our data is published under Creative Commons licenses, which allows other media outlets to use it and find different approaches. Because some of the databases can be broken down locally, the same information has given rise to different journalistic pieces. For example, the repository of forest fires has been used by local journalists to explain in depth endemic environmental problems in Cantabria. More recently, our database of public contracts was used by local journalists to call on local and regional authorities to take responsibility, as happened in Galicia, a northwestern region of Spain. Data about contraceptive use worldwide was used in countries like Guatemala.
Since the launch of the Civio Data website in November 2019, there have been more than 350 downloads of our datasets.
Each database available on the Civio Data website has a story of its own. As a data-driven organization, we are constantly using new technologies to clean and analyse data. The pardon repository, for example, was scraped from the Spanish Official Gazette (BOE). Scraping the BOE has also been key to obtaining the data about public workers who have lost their jobs since 1996 following a conviction.
However, scraping is not the only way we obtain the data that is later published on the website. Sometimes we have to collect the data manually from different sources. This was the case with the information on the gender composition of the boards of directors of public and semi-public Spanish companies. To obtain the data on doctors who were paid more than €50K by one big pharma company, we had to convert the original PDFs from the pharma companies into structured data. Most of the time, we rely on the Freedom of Information Act to request and access data from public administrations and later make it available to citizens. Following this procedure, we have been able to access, process and release data on hygiene and sanitary inspections in food premises, shops, schools or kindergartens in Madrid. This data had never been published before and had a remarkable local impact. It raised general awareness of the importance of access to information.
Sometimes, getting these databases ready to be analyzed takes days or weeks of work from several people at Civio, who clean and assess the data. These tasks range from finding duplicated names or numbers, to creating small databases from larger ones for colleagues with no programming skills, to converting the data into usable formats. Our multidisciplinary team puts a lot of effort into making the databases useful to the public.
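As an illustration only, the kind of cleaning tasks described above could be sketched in Python roughly as follows. This is not Civio's actual code: the sample rows, the column names (`name`, `region`, `year`) and the helper functions are all hypothetical, and the sketch uses only the standard library. It finds names that are duplicated once accents, case and spacing are ignored, extracts a smaller dataset from a larger one, and converts the result to JSON.

```python
import csv
import io
import json
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def find_duplicates(rows, field):
    """Group rows by the normalized value of `field`; keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_name(row[field])].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def subset(rows, field, value):
    """Return only the rows whose `field` equals `value` (a smaller dataset)."""
    return [row for row in rows if row[field] == value]


def to_json(rows):
    """Serialize rows as readable JSON, keeping accented characters as-is."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical sample data, invented for this example.
SAMPLE_CSV = """name,region,year
María Pérez,Galicia,2015
Maria  Perez,Galicia,2015
Juan López,Cantabria,2017
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
duplicates = find_duplicates(rows, "name")   # candidate duplicates to review by hand
galicia = subset(rows, "region", "Galicia")  # smaller dataset for one region
galicia_json = to_json(galicia)              # same rows in a different format
```

In a sketch like this, the duplicate groups are only candidates: two different people can share a name, so a person would still need to review each group before merging or deleting records.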
What was the hardest part of this project?
Our motto is to liberate data from public administrations so that it is available to citizens. This means that most of the time we have to struggle with bureaucracy, working with the Freedom of Information Act to access the information. The Spanish administrative model, which divides the territory into 17 so-called autonomous communities or regions, does not make it easier. Sometimes we have to file 17 requests for information, one for each region, since each region in Spain has its own Freedom of Information Act. The same happens when we request information from several ministries about, for example, the names of public workers who did not follow the regular process to become civil servants.
It has been very challenging to make the information available with the human resources we have. We are a small non-profit organization of 10 employees. The team has to know the law in order to get the data released, but also needs to clean and structure it, understand it and make it easy for people to access.
What can others learn from this project?
Civio Data shows the importance of open data for public governance. None of the data we liberated was publicly available before. Now journalists working in smaller media outlets with no data analysis skills, as well as the general public, can access it at Civio Data. Access to information is key to a healthy democratic system; that is why we will keep feeding the project with the new datasets we work on.
In short: sharing is caring. By making all the data we use in our investigations open for consultation or analysis, we are not only strengthening democracy, but also reinforcing journalism. Sharing our datasets adds value to our journalistic work because our readers can see exactly where our conclusions come from. Moreover, that data can be used by other colleagues.