Innovation (small newsrooms) – year 2020
Winner: Funes: an algorithm to fight corruption
Credit: Gianfranco Rossi, Nelly Luna Amancio, Gianfranco Huamán, Ernesto Cabral, Óscar Castilla
Jury’s comment: As more and more potentially newsworthy documents become routinely available online as digital data, classifying this deluge and prioritising reporters’ attention is becoming one of data journalism’s major challenges. The “Funes” tool from Peru’s OjoPúblico shows that even relatively small organisations can develop algorithms to help tackle this problem for specific types of documents. Funes adapts a contracting risk model developed in Europe to the Peruvian context. Using data scraped from five public databases, the algorithm analysed hundreds of thousands of Peruvian public procurement documents. Using a linear model, it combines 20 risk indicators, such as recently founded contractors or uncontested bids, to flag potentially corrupt contracts. It produced a large volume of cases for OjoPúblico and regional media partners to investigate, as well as an interactive interface for readers, providing an excellent pioneering example of the sort of automated story-discovery tools that several judges said they expect to become an increasingly important area of investigative computational journalism.
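The two red flags named in the jury's comment (uncontested bids and recently founded contractors) can be turned into per-contract indicators along these lines. This is a minimal sketch under stated assumptions: the `Contract` fields, the helper names and the 30-day cut-off for a "recently founded" contractor are all hypothetical, since the entry does not describe Funes's data schema or thresholds.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; the real Funes schema is not described in the entry.
@dataclass
class Contract:
    contract_id: str
    n_bidders: int          # number of bids received in the procurement process
    tender_date: date       # date the procurement process was called
    company_founded: date   # registration date of the winning contractor

# Assumed cut-off for a "recently founded" contractor (illustrative only).
RECENT_COMPANY_DAYS = 30

def single_bidder(c: Contract) -> int:
    """1 if the contract was uncontested (exactly one bid), else 0."""
    return 1 if c.n_bidders == 1 else 0

def recently_founded(c: Contract, max_days: int = RECENT_COMPANY_DAYS) -> int:
    """1 if the winner was registered fewer than max_days before the tender, else 0."""
    age_days = (c.tender_date - c.company_founded).days
    return 1 if 0 <= age_days < max_days else 0

def red_flags(c: Contract) -> dict[str, int]:
    """Collect the binary indicators for one contract."""
    return {
        "single_bidder": single_bidder(c),
        "recently_founded": recently_founded(c),
    }

if __name__ == "__main__":
    example = Contract("C-001", n_bidders=1,
                       tender_date=date(2017, 5, 10),
                       company_founded=date(2017, 5, 2))
    print(red_flags(example))  # {'single_bidder': 1, 'recently_founded': 1}
```

Binary 0/1 indicators like these are what a linear model can then weight and sum; the real tool reportedly uses 20 of them.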
Organisation size: Small
Publication date: 25 Nov 2019
Project description: Funes is an algorithm that identifies corruption risk situations in public procurement in Peru. The research project began to take shape in February 2018, and development started in September of the same year. For 15 months a multidisciplinary team of programmers, statisticians and journalists discussed, analyzed, built databases, verified the information and developed and modeled an algorithm we call Funes, named after Funes the Memorious, the protagonist of the short story by the Argentine writer Jorge Luis Borges. The algorithm assigns a risk score to each contracting process, entity and company. With that information, journalists can prioritize their investigations.
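The entry does not say how contract-level scores are rolled up into entity and company scores. One simple possibility, sketched here purely as an assumption, is to average the risk scores of each entity's or company's contracts; the sample rows and the `aggregate` helper are made up for illustration and are not Funes data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical contract-level output: (contract_id, entity, company, risk score in [0, 1]).
scored_contracts = [
    ("C-001", "Municipality A", "Company X", 0.82),
    ("C-002", "Municipality A", "Company Y", 0.35),
    ("C-003", "Ministry B", "Company X", 0.64),
]

def aggregate(rows, key_index):
    """Average the contract risk scores (column 3), grouped by the column at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: round(mean(scores), 3) for key, scores in groups.items()}

entity_risk = aggregate(scored_contracts, 1)   # grouped by contracting entity
company_risk = aggregate(scored_contracts, 2)  # grouped by winning company

if __name__ == "__main__":
    print(entity_risk)
    print(company_risk)
```

A plain mean treats a small and a very large contract equally; weighting by contract value would be an equally plausible (and equally hypothetical) alternative.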
Impact: The project was developed in the context of the prosecutors' investigations into the Lava Jato case, which involves the payment of bribes by the Brazilian company Odebrecht to win contracts for the construction of public works. FUNES analyzes the contracts and, at its launch, identified a huge number of contracts with corruption risks. Several of these were investigated and turned into published reports. FUNES is the first tool developed in Peru, and one of the first of its kind in Latin America, that analyzes millions of records to assign a corruption risk score in public procurement. FUNES found that between 2015 and 2018 the Peruvian State awarded almost 20 billion dollars in risky contracts. These went to single bidders who faced no competition and to companies created a few days before the tender. The amount is 90 times the civil reparation that Odebrecht must pay for its acts of corruption. Other published reports uncovered acts of corruption involving companies that sell milk for social programs.
The tool has a reader-friendly interface with several visualizations through which readers can analyze the state of public contracting in Peru. The open-source tool has attracted the interest of Peru's oversight and control bodies, which have asked us to share the methodology and its possibilities so that they can implement it within their own teams. FUNES flags risks in thousands of contracts. Given the scale of the findings, OjoPúblico formed alliances with regional media to analyze and investigate some of the main cases. All of them found the same thing: irregular public contracts that the authorities have now begun to investigate. The investigations continue.
Techniques/technologies: Funes belongs to a family of algorithms called linear models, which it uses to combine the information from 20 risk indicators calculated from 4 databases. A linear model takes the form of a weighted sum of the indicators:
$$\text{weight}_1 \cdot \text{indicator}_1 + \text{weight}_2 \cdot \text{indicator}_2 + \dots + \text{weight}_n \cdot \text{indicator}_n = \text{corruption risk}$$
These weights are usually learned with a regression scheme, which tries to predict the response (in this case, corruption) from related variables (which is what we call the risk indicators). The weights learned for each indicator are thus the ones that best help predict the response across all the contracts analyzed.
However, Funes uses a variant of this scheme because corruption in public procurement, our response variable, is an unobservable phenomenon. We can be sure that the contracts uncovered by auditors were corrupt; but for those that were not, we do not know whether they are completely clean or simply have not been uncovered yet, because they may involve more sophisticated and complex corruption schemes, as happened, for example, with the Odebrecht and Lava Jato case.
Funes's method builds on a corruption-proxy scheme proposed by Mihály Fazekas, a researcher at the University of Cambridge, adapted to the Peruvian context. A proxy is a variable closely related to the unobservable variable. Funes uses two proxies: 1) whether a contract had a single bidder; 2) the share of an entity's budget concentrated in each contractor. Funes is therefore a combination of two linear models: a logistic regression for the single-bidder proxy and a beta regression for the concentration share. The result of this process is a corruption risk index for each contract: the higher the index, the greater the risk.
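The weighted-sum model and the regression used to learn its weights can be sketched in a few lines of Python. This is an illustrative sketch, not the team's R code: it fits only the single-bidder proxy with a logistic regression via plain batch gradient descent, omits the beta regression for budget concentration, and uses made-up toy data. The function names, learning rate and epoch count are hypothetical choices.

```python
import math


def risk_score(weights, indicators, bias=0.0):
    """Linear model: bias + weight_1*indicator_1 + ... + weight_n*indicator_n."""
    return bias + sum(w * x for w, x in zip(weights, indicators))


def sigmoid(z):
    """Numerically stable logistic function: maps a linear score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_single_bidder_model(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the mean log-loss.

    X: one list of numeric risk indicators per contract.
    y: 1 if the contract had a single bidder (the proxy), else 0.
    Returns (weights, bias).
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = sigmoid(risk_score(w, xi, b)) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


if __name__ == "__main__":
    # Toy, made-up data (not Funes data): two binary indicators per contract,
    # e.g. "winner founded shortly before the tender" and another hypothetical flag.
    X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 1], [0, 0]]
    y = [1, 1, 0, 0, 1, 0]
    weights, bias = fit_single_bidder_model(X, y)
    print("learned weights:", [round(w, 3) for w in weights], "bias:", round(bias, 3))
    for xi in X:
        print(xi, round(sigmoid(risk_score(weights, xi, bias)), 3))
```

The design follows the reasoning above: because corruption itself is unobserved, the weights are learned against a proxy (here, single bidding) rather than against confirmed corruption cases, and the fitted weights then say how strongly each indicator is associated with that proxy. In practice a statistics package (the team worked in R) would be used instead of hand-written gradient descent.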
The hardest part of this project: The main challenges concerned the construction of, access to and quality of the data, the team's need to learn new data analysis tools, and putting together a multidisciplinary team whose members were previously unfamiliar with journalistic research. Peru has no open data portal for public procurement. Over seven months, a script was developed to extract data from a platform that blocked bulk access with a CAPTCHA. The responsible entity blocked our IP address to stop the downloads, forcing the team to rewrite the code to make the extraction more efficient. To complete this information, 20 access-to-information requests were also filed. Another challenge was learning about corruption theory, statistics and Peru's public procurement laws. We were not specialists in public bidding, and there are 15 regulatory regimes. We organized meetings with experts to understand the process in detail, documented the procedures and analyzed each of the legal norms.
Another challenge was defining the concept of corruption we were going to monitor and the model we would use to develop the algorithm. We reviewed many papers and conducted interviews. In the end, we chose the statistical model promoted by researcher Mihály Fazekas. The project left the journalistic team with solid knowledge of algorithms, the R programming language, public procurement and predictive modeling.
What can others learn from this project: We learned that fighting corruption through journalism requires combining its traditional case-by-case methods and large-scale data analysis with algorithmic models that make it possible to anticipate corruption. To do so, journalistic teams need to go beyond spreadsheets and OpenRefine, learn relational analysis technologies and R, and at the same time learn to bring together and work with mathematicians, statisticians, programmers and political scientists.