Innovation (large newsrooms) – year 2020

Co-winner: Zones of Silence

Organisation: El Universal

Country: Mexico

Credit: Esteban Román, Gilberto Leon, Elsa Hernandez, Miguel Garnica, Edson Arroyo, César Saavedra, Jenny Lee, Dale Markowitz, Alberto Cairo

Jury’s comment: How do you measure something that isn’t happening? What if the main cause for concern isn’t noise but silence? El Universal asked that question about the falling coverage of homicides in Mexico, working on the hypothesis that journalists have been intimidated and harassed into silence. By comparing murder statistics with news stories over time, they were able to show where, and by how much, the troubling silence was growing.

Organisation size: Big

Publication date: 13 Jun 2019

Project description: Violent organized crime is one of the biggest crises facing Mexico. Journalists try to avoid becoming targets, so they choose to stay quiet to save their lives. We set out to measure this silence and its impact on journalism. To do so, we used artificial intelligence to quantify and visualize news coverage and to analyze the gaps in coverage across the country. To measure the degree of silence in each region of the country, we created a formula that allows us to see the evolution of this phenomenon over time.
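
The entry does not publish its formula for silence. As a minimal sketch, assuming one hypothetical definition (the share of officially recorded homicides in a region and year that have no matching news report), the index and its evolution over time could be computed as below. The function names, the input layout and the sample rows are illustrative only, not El Universal’s code or data.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def silence_index(homicides: int, reported_events: int) -> Optional[float]:
    """Share of officially recorded homicides with no matching news report.

    Hypothetical definition (not the project's published formula):
    1 - min(reported_events, homicides) / homicides.
    Returns None when no homicides were recorded, since the ratio is undefined.
    """
    if homicides <= 0:
        return None
    covered = min(reported_events, homicides)  # extra stories cannot "over-cover"
    return 1 - covered / homicides


def silence_over_time(records: Iterable[Tuple[str, int, int, int]]) -> dict:
    """Group (region, year, homicides, reported_events) rows into per-region series.

    Returns {region: [(year, index), ...]} sorted by year, so the evolution of
    the index can be plotted or compared across regions.
    """
    series = defaultdict(list)
    for region, year, homicides, reported in records:
        series[region].append((year, silence_index(homicides, reported)))
    return {region: sorted(points, key=lambda p: p[0]) for region, points in series.items()}


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    rows = [("Region A", 2018, 50, 20), ("Region A", 2017, 40, 30), ("Region B", 2017, 0, 0)]
    print(silence_over_time(rows))
    # {'Region A': [(2017, 0.25), (2018, 0.6)], 'Region B': [(2017, None)]}
```

Clamping reported events at the homicide count keeps the index between 0 (every homicide covered) and 1 (none covered); whether the real formula treats over-coverage this way is an assumption.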

Impact: Something akin to a code of silence has emerged across the country. We suspected that there were entire regions where journalists were not reporting on the violence, threats, intimidation and murder that were well known to be part of daily life. This was confirmed by journalists who sought us out after the story was released to tell us they had been facing these problems. In collaboration with them, we are now preparing a second part of this story, focusing on the patterns that lead to attacks. Hopefully, this will lead to some kind of alert when certain conditions (of news coverage and crime) are present in regions of our country.

Techniques/technologies: Our first step was to establish a process to determine the absence of news. We examined articles on violence to understand how they compared with the government’s official registry of homicides. In theory, each murder ought to correspond to at least one local report about the event. If we saw a divergence, or if the government’s figures were suddenly very different from local news coverage, we could deduce that journalists were being silenced. Early on, sorting through news articles seemed impossible. We knew we needed a news archive covering as many Mexican publications as possible so we could track daily coverage across the country. Google News’ vast collection of local and national news stories across Mexico was a good fit. The effort required us to measure the difference between the number of homicides officially recorded and the number of news stories about those killings on Google News. This required machine learning algorithms able to identify the first reported story and then pinpoint where the event took place. With that information, we were able to connect the events reported by the media with the government’s homicide records across more than 2,400 municipalities in Mexico. Finally, to measure the degree of silence in each region of the country, we created a formula that allows us to see the evolution of this phenomenon over time. The resulting data shows a fascinating mix of falls and peaks in unreported deaths, which coincide with events such as the arrival of new governments or the deaths of drug dealers. Further investigation will allow us to explain these connections.
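
The step of connecting news reports with the official homicide counts per municipality can be sketched as below. This is a simplified stand-in, not the project’s method: where the team used machine learning to locate each story, the sketch uses a hypothetical exact-match gazetteer (`GAZETTEER`, with invented place names and municipality IDs). It also assumes the articles have already been filtered to homicide coverage and reduced to one story per event.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical gazetteer mapping lower-case place names to municipality IDs.
# A deliberately simple stand-in for ML geolocation; it ignores place names
# that are shared by municipalities in different states.
GAZETTEER: Dict[str, str] = {
    "villa norte": "MUN-001",
    "san lucas": "MUN-002",
}


def locate(text: str) -> Optional[str]:
    """Return the ID of the earliest gazetteer place name mentioned in `text`."""
    lowered = text.lower()
    best_pos, best_id = None, None
    for name, muni_id in GAZETTEER.items():
        match = re.search(r"\b" + re.escape(name) + r"\b", lowered)
        if match and (best_pos is None or match.start() < best_pos):
            best_pos, best_id = match.start(), muni_id
    return best_id


def reported_by_municipality(articles: Iterable[str]) -> Counter:
    """Count homicide reports per municipality, skipping articles with no match."""
    return Counter(m for m in map(locate, articles) if m is not None)


def unreported(official: Dict[str, int], reported: Counter) -> Dict[str, int]:
    """Homicides in the official registry with no matching news report."""
    return {m: max(h - reported[m], 0) for m, h in official.items()}


if __name__ == "__main__":
    articles = [
        "Asesinan a comerciante en San Lucas",
        "Hallan sin vida a un hombre en Villa Norte",
        "Otro homicidio en Villa Norte",
    ]
    official = {"MUN-001": 5, "MUN-002": 1, "MUN-003": 4}  # made-up counts
    print(unreported(official, reported_by_municipality(articles)))
    # {'MUN-001': 3, 'MUN-002': 0, 'MUN-003': 4}
```

A municipality with official homicides but no located stories (here `MUN-003`) shows up with its full count unreported, which is the kind of gap the project set out to measure.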

The hardest part of this project: The hardest part was creating the “formula for silence” to measure the degree of unreported homicides across the country. Many variables can explain why a region has fewer articles than homicides. So, to be sure the discrepancy was linked to violence and killings, we had to rule out or include segments of data along the way. This was extremely hard to do with machine learning, because the Spanish words usually used in this kind of coverage are also synonyms for other things. We had to manually validate many of the initial reports until we had a well-validated sample of results. This took us half a year. Then we felt lost because of the number of variables we had on our hands (the disparity between events reported and published stories; matching stories by different websites that report a single event; the uncertainty of internet penetration in all parts of the country and its evolution over the 14 years we analyzed…). Luckily, the interdisciplinary nature of our team (with economists, programmers, data experts, designers and journalists) helped us find an answer that we felt was truly accurate.
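
One of the variables listed above, matching stories by different websites that report a single event, can be illustrated with a simple time-window heuristic. The sketch below is an assumption for illustration, not the project’s rule: the `Story` type, the two-day window and the municipality IDs are hypothetical. It keeps only the earliest story per municipality within each window, which approximates the first reported story.

```python
from datetime import date, timedelta
from typing import Iterable, List, NamedTuple


class Story(NamedTuple):
    site: str          # publishing website
    municipality: str  # municipality ID assigned during geolocation
    published: date


def first_reports(stories: Iterable[Story], window_days: int = 2) -> List[Story]:
    """Collapse stories from different sites that likely cover the same event.

    Hypothetical heuristic: within one municipality, every story published no
    more than `window_days` after the first story of the current cluster joins
    that cluster. Only the earliest story of each cluster is kept.
    """
    window = timedelta(days=window_days)
    kept: List[Story] = []
    cluster_start = {}  # municipality -> publication date of its open cluster
    for story in sorted(stories, key=lambda s: (s.municipality, s.published)):
        start = cluster_start.get(story.municipality)
        if start is None or story.published - start > window:
            kept.append(story)
            cluster_start[story.municipality] = story.published
    return kept


if __name__ == "__main__":
    stories = [
        Story("site-b", "MUN-001", date(2019, 1, 2)),
        Story("site-a", "MUN-001", date(2019, 1, 1)),
        Story("site-c", "MUN-001", date(2019, 1, 10)),
        Story("site-a", "MUN-002", date(2019, 1, 1)),
    ]
    print([s.site for s in first_reports(stories)])
    # ['site-a', 'site-c', 'site-a']
```

The window trades off two errors: too short, and follow-up stories count as new events; too long, and separate killings in the same municipality merge into one. A manually validated sample is one way to tune it.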

What can others learn from this project: No matter how hard it is to measure a problem, there is always a way to do it, even if it’s not what you thought you would find in the beginning.

Project links: