2020
Elecciones y Contratos (Elections and contracts)
Category: Best data-driven reporting (small and large newsrooms)
Country/area: Colombia
Organisation: Datasketch, Transparencia por Colombia
Organisation size: Small
Publication date: 12 Nov 2019

Credit: Juan Pablo Marín Díaz, Mariana Villamizar Rodríguez, Camila Achuri, Juliana Galvis, David Daza, Sebastián Botero, Sandra Martínez, Camilo Peña, Ángela Rodríguez, Sergio Rocha
Project description:
Elecciones y Contratos (Elections and Contracts) is a long-form investigative data journalism report, accompanied by a news app, that examines the relationship between political campaign financing and public contracts in Colombia. Datasketch and Transparencia por Colombia developed it, using large volumes of open data, to uncover possible corruption schemes in Colombian politics and the involvement of private citizens and companies in them. The special report combines several stories with a web app for exploring the public data, ranging from dead people listed as political donors to a big data app designed to help other newsrooms in Colombia.
Impact reached:
Political campaign financing has many intricacies, and Colombia is no exception. Several laws and regulations govern political financing, but unfortunately they are not always enforced, because of a lack of access to proper tools for handling big data. This is particularly relevant because campaign financing data makes the most sense when analyzed alongside other datasets. In our report we used hundreds of thousands of campaign financing records and 8 million public contracts to explore their relationship through a series of stories and an interactive web app for exploring these bulky datasets. The stories describe how we discovered irregular “anonymous” donors (we found dead people reported as financiers, and recipients of social assistance financing candidates with millions), but, most importantly, we created a web app that other journalists and researchers can use to explore the data themselves.
The special report was released amid a series of debates about the recent elections and how different government entities use data to track corruption. One of the project's most important outcomes is that it brought the issue of access to quality data into the debate among high-level officials in Colombia about incorporating new technologies to track corruption. It also offered a new way to build communities of data-savvy journalists who use these tools to investigate corruption in a more intuitive way. For instance, the web app lets users explore connections such as that of a company sanctioned for corruption for around 100 million dollars, whose CEO donated to the campaign of Colombia's elected president, who later appointed the CEO's wife to the equivalent of secretary of state.
Techniques/technologies used:
The investigation started as a pilot project, supported by multiple civil society organizations and developed jointly with Colombia's ministry of technology. After one year of cleaning data, organizing information and crossing multiple databases, including a custom-built repository documenting the most important corruption incidents in Colombia's history, the team at Datasketch and Transparencia por Colombia managed to present the complexity of the issue in a digestible format. We used data from multiple sources, including Excel spreadsheets, data collected by hand and database dumps of open contracting data, totaling more than 9 million records. These were compiled and organized with a set of R scripts written to answer a series of questions proposed by legal experts and journalists in several design thinking exercises. We took special care with the data cleaning algorithms and custom scrapers. Visualizations were built with custom R code that renders JavaScript visualizations through several libraries (D3, vis.js, highcharter, datatable), and the interactive web app, with multiple JavaScript components, was built in R with the Shiny web framework for data-driven applications. The final report was built with the static site generator Hugo, and the code is open source and freely hosted on GitHub. The design was made in Adobe Illustrator and then implemented in the website and web app.
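Crossing campaign financing records with public contracts, and with registries such as death records, essentially comes down to normalizing identifiers and joining on them. The team did this in R; purely as an illustration, here is a minimal Python sketch assuming both datasets carry a national ID or tax number column. The column names (`donor_id`, `contractor_id`, `amount`, `value`), the digits-only normalization rule, the sample rows and the `DECEASED_IDS` registry are all hypothetical, not the project's actual schemas or data.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical sample extracts; column names and values are illustrative only.
DONATIONS_CSV = """donor_id,candidate,amount
1.020.304,Candidate A,5000000
79 555 111,Candidate B,1200000
900.123.456,Candidate A,30000000
"""

CONTRACTS_CSV = """contractor_id,entity,value
1020304,Municipality X,250000000
900123456,Ministry Y,8000000000
"""

# Hypothetical set of IDs taken from a death registry.
DECEASED_IDS = {"79555111"}


def load(text):
    """Parse a CSV string into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize_id(raw):
    """Keep only digits, so '1.020.304' and '1020304' become the same key."""
    return re.sub(r"\D", "", raw or "")


def index_by_id(rows, column):
    """Group rows by their normalized ID for constant-time lookups."""
    index = defaultdict(list)
    for row in rows:
        key = normalize_id(row[column])
        if key:
            index[key].append(row)
    return index


def cross(donations, contracts):
    """Return one record per (donation, contract) pair sharing the same ID."""
    contracts_by_id = index_by_id(contracts, "contractor_id")
    matches = []
    for d in donations:
        key = normalize_id(d["donor_id"])
        for c in contracts_by_id.get(key, []):
            matches.append({
                "id": key,
                "candidate": d["candidate"],
                "donation": float(d["amount"]),
                "entity": c["entity"],
                "contract_value": float(c["value"]),
            })
    return matches


def flag_deceased(donations, deceased_ids):
    """List normalized donor IDs that also appear in the death registry."""
    return [normalize_id(d["donor_id"]) for d in donations
            if normalize_id(d["donor_id"]) in deceased_ids]


if __name__ == "__main__":
    donations = load(DONATIONS_CSV)
    contracts = load(CONTRACTS_CSV)
    for m in cross(donations, contracts):
        print(m)
    print("Deceased donors:", flag_deceased(donations, DECEASED_IDS))
```

Indexing the contracts side once keeps the join roughly linear in the number of rows, which matters at the scale of millions of contracts. Real-world matching would likely also need to deal with name variants, missing IDs and tax numbers that carry a check digit.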
What was the hardest part of this project?
The most challenging aspects of this project were, first, understanding the campaign financing legislation and its implications for the recent elections (presidential and congressional), since different rules apply to each, and second, accessing and cleaning the data, which involved very high volumes and a lot of revision by the team. The available data only yielded truly important insights when it was crossed with multiple databases that came in very different formats (plain text, Excel spreadsheets, database dumps, web services). Additionally, the design thinking exercises we ran with multiple stakeholders were fundamental to arriving at an intuitive yet useful design, since we covered individual stories but also built a web app so that other organizations could reuse the data we collected and prepared.
What can others learn from this project?
Other journalists can appreciate the storytelling in this project, but also that a project as complex as this often needs to be implemented in partnership with multiple organizations. A very complex topic such as campaign financing can be boring and difficult to understand, especially for non-experts, so bridging the gap between the stories and a news app can be very powerful. Finally, we are big advocates of using R in data journalism: it helps cover all of a small newsroom's data needs, from data access and scraping to data cleaning and even building the full interactive sites and web apps that host the project.
Project links:
especiales.datasketch.co/elecciones-y-contratos/
www.monitorciudadano.co/elecciones-contratos/campanas
www.monitorciudadano.co/elecciones-contratos