2020
El Salvador Congress Observatory
Category: Open data
Country/area: El Salvador
Organisation: El Diario de Hoy
Organisation size: Big
Publication date: 3 Jan 2019

Credit: Lilian Martínez, Eduardo Sosa
Project description:
El Diario de Hoy created an observatory to track what the members of El Salvador's congress do. It covers data locked in images and PDF files: how legislators vote on every issue, which laws they propose and how often they fail to show up for work. Using a set of algorithms, the observatory turns congressional records into open data for the public.
Impact reached:
The observatory gives citizens and El Diario de Hoy data showing that the deputies approve all the laws requested by the president of the country, and it exposes the lack of parliamentary control over the work of the central government. It will also generate data for monitoring parliamentary work over the years: what each legislator has focused on and whether they have actually attended work. In addition, the data produced by the observatory makes it possible to connect the work of the commissions that analyze bills with what happens in the plenary vote that determines whether a bill becomes law, something that could not be done before the new open data was produced.
Techniques/technologies used:
The main techniques were those for processing and extracting text from images contained in scanned PDF documents. The Python programming language was used to download thousands of documents from the Legislative Assembly website and process them in bulk. A specific algorithm also had to be created to extract the names of the legislators who vote on each law. The difficulty was that the Legislature does not use the full names of legislators, and many of them have identical or similar names.
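The name-matching problem described above can be illustrated with a minimal sketch. It is not the observatory's actual algorithm, which the entry does not describe. It assumes a roster of full names is available, and the names, the `resolve` and `candidates` functions and the 0.85 similarity cutoff are all hypothetical. A shortened name is accepted only when it points to exactly one legislator; anything ambiguous is left for manual review.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical roster; the real one would come from the Assembly's records.
ROSTER = [
    "María Elena Pérez López",
    "Mario Ernesto Pérez Gómez",
    "José Antonio Ramírez Cruz",
]

def normalize(name):
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def candidates(short_name, roster, cutoff=0.85):
    """Return every full name in which each token of short_name has a close match."""
    tokens = normalize(short_name).split()
    found = []
    for full_name in roster:
        full_tokens = normalize(full_name).split()
        if all(
            any(SequenceMatcher(None, t, f).ratio() >= cutoff for f in full_tokens)
            for t in tokens
        ):
            found.append(full_name)
    return found

def resolve(short_name, roster):
    """Return the single matching full name, or None if there are zero or several."""
    found = candidates(short_name, roster)
    return found[0] if len(found) == 1 else None
```

With this roster, `resolve("Maria Perez", ROSTER)` picks out one legislator, while `resolve("Pérez", ROSTER)` returns `None` because two legislators share that surname. Returning `None` rather than guessing keeps similar names from being merged silently.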
What was the hardest part of this project?
The most difficult part was linking the content of the opinions on the bills to their respective votes, since the Assembly does not associate each vote with its file. To connect the two, the images had to be processed to extract data such as the opinion number, the commission that prepared it and its date, so that each opinion could be associated unequivocally with a vote. There were also repeated votes and duplicate opinions, so the data had to be cleaned.
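The linking step can be sketched as a join on a composite key built from the three extracted fields. This is an illustration under assumptions, not the project's code: the field names (`opinion_number`, `commission`, `date`) and the `link_votes_to_opinions` function are hypothetical, and dates are assumed to be already normalized to ISO strings.

```python
def link_key(record):
    """Composite key: opinion number, normalized commission name, date."""
    return (
        int(record["opinion_number"]),
        record["commission"].strip().lower(),
        record["date"],
    )

def link_votes_to_opinions(opinions, votes):
    """Pair each vote with its opinion, skipping duplicates on both sides.

    Returns (linked, unmatched): linked is a list of (opinion, vote) pairs,
    unmatched is a list of votes whose key matches no opinion.
    """
    # Keep only the first opinion seen for each key (drops duplicate opinions).
    opinion_by_key = {}
    for opinion in opinions:
        opinion_by_key.setdefault(link_key(opinion), opinion)

    linked, unmatched = [], []
    seen_votes = set()
    for vote in votes:
        key = link_key(vote)
        if key in seen_votes:  # repeated vote, already handled
            continue
        seen_votes.add(key)
        opinion = opinion_by_key.get(key)
        if opinion is None:
            unmatched.append(vote)
        else:
            linked.append((opinion, vote))
    return linked, unmatched
```

Keeping the unmatched votes in a separate list, rather than dropping them, makes it easier to spot extraction errors in the key fields and correct them by hand.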
What can others learn from this project?
One of the most valuable lessons is that parliaments need to properly identify the documents that are put to a vote, so that it is not so difficult to connect data on how each legislator votes on each subject or bill.