2021 Shortlist


Country/area: Brazil

Organisation: Revista AzMina, InternetLab, Volt Data Lab

Organisation size: Big

Publication date: 5 Nov 2020

Credit: Bárbara Libório, Jamile Santana, Carolina Oms, Helena Bertho, Thais Folego, Mariana Valente, Fernanda Sousa, Alessandra Gomes, Blenda Santos, Catharina Pereira, Jade Becari, Renata Hirota, Sérgio Spagnuolo, Yasmin Curzi, Larissa Ribeiro, Carolina Herrera

Project description:

MonitorA is an observatory of political violence against female candidates on social networks, a project by Revista AzMina and InternetLab. Throughout the election campaign, from September to November 2020, we collected hundreds of thousands of comments directed at candidates from across the political spectrum on different social networks (Twitter, Instagram and YouTube). Using automated linguistic filters together with human analysis, we analyzed these publications to understand the dynamics of violence during the elections, and we showed that political violence against women on social networks is sexist and misogynistic.

Impact reached:

MonitorA’s main objective was to prove with data what we already suspected from experience: that women are targeted by sexist political violence in political environments. Through a partnership with InternetLab, an independent research center in the areas of law and technology, we were able to analyze how sexist hate speech takes place on social platforms, and by partnering with five local media outlets from five Brazilian states, we also managed to cover local contexts of political violence, including offline violence.

MonitorA’s first survey revealed that 123 monitored candidates in 7 states in the municipal elections received more than 40 insults a day on Twitter alone. In the second survey, in the second round of updates, we showed that other female political figures who supported women’s candidacies were also attacked. The insults focused mostly on the physical, intellectual and moral characteristics of these women rather than on their political performance, which, as we showed, was the focus of insults aimed at men.

The survey was publicized by the candidates themselves, who, prompted by the violence on the networks, reported this harassment. Manuela D’Ávila, a candidate for the country’s vice presidency in 2018, cited MonitorA’s monitoring data in a television debate. Other candidates, such as Joice Hasselmann, from PSL, a candidate for mayor of São Paulo, mentioned the surveys on social networks. Erika Hilton, a candidate for councilor in São Paulo, decided to sue more than 50 people who harassed her on the networks and talked to our team.

Our data was also reported and republished by more than 50 media outlets, including television channels such as CNN Brasil and TV Cultura and the radio station CBN, and it appeared in reports by UOL, Estadão and others. It was the first time that political violence became a subject of debate in the press.

Techniques/technologies used:

We wrote Python scripts that, for two months, captured publications citing nearly 200 candidacies from around the country on Twitter, YouTube and Instagram. For Twitter and YouTube, we used the APIs of the respective social networks; data collection for Instagram was performed using web scraping techniques. In all, 2.3 million publications on social networks were captured for analysis. The data was cleaned and organized to address inconsistencies such as name changes on social networks, and to standardize columns, data types and date formats. All messages were categorized as offensive or non-offensive, based on regular expressions defined by a linguist. She created dictionaries of offensive terms that covered all profiles of monitored candidacies: white and non-white women, cis and trans, LGBT and straight, from across the political spectrum, etc. The dashboard that allowed our content team to analyze and visualize these data was developed in R with shiny and golem, packages used to build dynamic applications. The application’s filters and functionalities were improved throughout the project. With these filters and features we were able to run queries and export CSV datasets for smaller and more specific analyses, done mostly in Google Sheets.
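The cleaning and dictionary-based categorization described above can be illustrated with a minimal sketch. It is not the project's actual code: the lexicon below is a hypothetical English placeholder (the real dictionaries were built by the team's linguist), and the field names (`platform`, `created_at`, `text`), the two date formats and the function names are assumptions made for the example.

```python
import re
import unicodedata
from datetime import datetime

# Hypothetical placeholder lexicon, grouped by category. The project's real
# dictionaries were written by a linguist and are not reproduced here.
OFFENSIVE_TERMS = {
    "generic": ["idiot", "stupid"],
    "appearance": ["ugly"],
}


def normalize(text):
    """Lowercase and strip accents so spelling variants match the same term."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compile_patterns(lexicon):
    """Build one whole-word regular expression per lexicon category."""
    patterns = {}
    for category, terms in lexicon.items():
        alternation = "|".join(re.escape(normalize(term)) for term in terms)
        patterns[category] = re.compile(rf"\b(?:{alternation})\b")
    return patterns


PATTERNS = compile_patterns(OFFENSIVE_TERMS)


def classify(text):
    """Return the sorted categories whose pattern matches the text."""
    clean = normalize(text)
    return sorted(cat for cat, pat in PATTERNS.items() if pat.search(clean))


def parse_date(raw):
    """Convert the date formats assumed in this sketch to ISO dates."""
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unknown format: leave for manual inspection


def process(record):
    """Standardize one captured message and flag it as offensive or not."""
    categories = classify(record["text"])
    return {
        "platform": record["platform"].strip().lower(),
        "date": parse_date(record["created_at"]),
        "text": record["text"],
        "offensive": bool(categories),
        "categories": categories,
    }
```

A match only marks a message as potentially offensive: as the entry notes, deciding whether an insult was actually aimed at the candidate still required human reading.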

What was the hardest part of this project?

The hardest part was dealing with this large amount of data for journalistic analysis. We captured 2.3 million publications on social media for analysis. Of these, at least 155,000 contained offensive terms, were potentially violent and could be analyzed. Even with the automated linguistic filters created by our linguist, it still took a great deal of human analysis of these tweets: checking whether the insults were actually directed at the candidates, how the terms were used in different contexts, etc. Each published story required the human checking of at least 1,000 tweets in a few days, as we followed the electoral campaign calendar, which lasts only two months. All this work brought together a very diverse team: developers, data journalists, linguists, anthropologists, specialists in digital law, etc. In this way, we were able to gather different points of view on the data and produce robust analyses of the dynamics of political violence on the networks. It must also be remembered that this is especially sensitive content for a mostly female team to work on, since the team had to focus on misogynistic strategies of violence and attack.
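To put the figures above in proportion, 155,000 flagged publications out of 2.3 million is roughly 6.7% of everything captured. The sketch below computes that share and shows one possible way to draw a reproducible batch of about 1,000 flagged messages for human checking. The entry does not say how the tweets to be checked were selected, so the random sampling, the function names and the seed are all assumptions.

```python
import random


def flagged_share(total, flagged):
    """Fraction of captured publications that matched the linguistic filters."""
    return flagged / total


def review_sample(flagged_messages, size=1000, seed=2020):
    """Pick a reproducible random batch of flagged messages for human review.

    Random sampling is an assumption of this sketch, not the documented
    method; the fixed seed only makes the same batch reproducible.
    """
    messages = list(flagged_messages)
    if len(messages) <= size:
        return messages  # small enough to review in full
    return random.Random(seed).sample(messages, size)


share = flagged_share(2_300_000, 155_000)
print(f"{share:.1%} of captured publications were flagged")
```

Fixing the seed means two reviewers running the script on the same input see the same batch, which makes it easier to compare their judgments.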

What can others learn from this project?

MonitorA is a project that combines technology, linguistics, journalism, law and advocacy to combat gender-based political violence. It is, therefore, a collaborative project: we brought together AzMina, InternetLab, an independent research center in the areas of law and technology, and five other local media outlets in five states in the country, which together produced content on political violence on the networks. With that, we were able to approach the subject from territorial, legal, technological and journalistic angles.

In the technology area, it is also a major text mining and sentiment analysis project: we used linguistic filters to determine whether publications were potentially violent. This can be very useful and inspiring for other journalists and media outlets who want to investigate hate speech on the internet: not just the terms used, but how this speech spreads, which strategies are used, which actors are involved, how the speech differs in attacks on different profiles of people, etc. It is also possible to learn from the workflows and processes our team used to deal with this large amount of data and turn it into analyses that are not only quantitative but also qualitative.

Project links: