Confere.ai is a tool that measures disinformation characteristics by identifying patterns of false or misleading content in texts and news links circulating on social networks and the internet. The project involved researching, analyzing, and formatting a database of about 22,000 news items, as well as developing web crawlers to collect content and computational intelligence to analyze it and look for patterns. One of the winning projects of the Google News Initiative Innovation Challenge in Latin America, Confere.ai also produces educational content that teaches the public to identify false or misleading content.
Confere.ai, the first initiative to automate fact-checking at media outlets in Northeast Brazil, was selected by the Google News Initiative Innovation Challenge in Latin America. The tool received about 4,000 items to check automatically during its first three months of operation. One of its main results is its assertiveness (the share of correct classifications) in an uncontrolled environment, which was around 92% for texts and 75% for links. In terms of reach, Confere.ai generated more than 70 editorial pieces, including articles on disinformation, fact checks, and videos with guidance for the public. The work also led to a partnership with the Universidade Católica de Pernambuco (Unicap). The project was mentioned in dozens of local media outlets and in journalism course completion works, promoting knowledge about the automation of fact-checking in the communication sciences. The articles produced by Confere.ai had 286,000 hits over six months, while the tool averaged 500 hits per day. Confere.ai also managed to identify waves of disinformation growing in the local media and to debunk some rumors. Finally, the tool became part of the National Network to Combat Disinformation (RNCD).
Confere.ai was created and developed in stages. In the first stage, a hybrid database was assembled from three sources: manual collection and storage of news links and texts previously checked by fact-checking agencies; the Fake.BR corpus, from USP; and web crawlers developed to extract content from four different sites. The material was stored in a database on Google Sheets, containing 21,956 news items (9,011 texts and 12,945 links), and cleaned with OpenRefine.
Part of this database was analyzed manually to identify criteria. Cross-tabulations were made using filters and a pivot table for data extraction. From that, 15 criteria were defined for text evaluation and 20 criteria for link analysis. In the final analysis, the behavior of the values obtained in each test performed on the database was observed to determine each criterion's relevance, and patterns were found that separated real news from misinformation.
Part of the database was used to develop and train the computational intelligence. The model used a supervised technique, Random Forest, which combines a set of decision trees to form the final answer. Its input data were a Bag of Words (BoW) representation, which describes a text by the count or frequency of the words it contains, plus linguistic characteristics of the text extracted through natural language processing techniques.
We built the Confere.ai web platform with Python and Angular. A dashboard built on top of it makes it easier to visualize the patterns in the data entered in the tool. We carried out hundreds of cross-tabulations to validate the previously defined criteria and measure assertiveness, which was 75% for links and 92% for texts.
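
To make the input representation described above more concrete, here is a minimal sketch in plain Python of how a text might be turned into a Bag of Words vector with a few linguistic features appended. It is an illustration under stated assumptions, not the project's actual code: the function names, the sample sentences, and the three linguistic features (share of uppercase letters, number of exclamation marks, average word length) are all hypothetical and are not taken from the 15 criteria defined by the team.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (accents are kept)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts):
    """Assign a fixed column index to every distinct token in the corpus."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: index for index, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:  # words unseen during training are ignored
            vector[vocabulary[tok]] = count
    return vector


def linguistic_features(text):
    """Hypothetical surface features, chosen only for illustration."""
    letters = [ch for ch in text if ch.isalpha()]
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    tokens = tokenize(text)
    avg_word_len = sum(len(tok) for tok in tokens) / len(tokens) if tokens else 0.0
    return [upper_ratio, float(text.count("!")), avg_word_len]


def featurize(text, vocabulary):
    """Concatenate word counts and linguistic features into one input row."""
    return bag_of_words(text, vocabulary) + linguistic_features(text)


# Invented sample sentences, not items from the Confere.ai database.
corpus = [
    "URGENTE!!! Compartilhe antes que apaguem",
    "Prefeitura anuncia novo calendário de vacinação",
]
vocabulary = build_vocabulary(corpus)
rows = [featurize(text, vocabulary) for text in corpus]
```

Rows built this way, paired with labels from previously checked items, could then be used to train a random forest, for example with scikit-learn's `RandomForestClassifier`; whether the project used that library is not stated in the source.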
What was the hardest part of this project?
The creation and development of Confere.ai faced several challenges. The first was the lack of similar reference projects in Portuguese that could indicate which technique is most assertive for finding disinformation patterns. Likewise, there were few academic studies on fact-checking automation to draw on when creating criteria for automatically identifying misinformation on the internet. Because of this limitation, the team had to look for studies in English and try to adapt their results to the Brazilian reality.
There are also few ready-made corpora of disinformation data in Portuguese; there was only one database, from the University of São Paulo. This, in turn, was out of date, with a series of links from 2016 and 2017 that had already been taken down, further limiting the analysis to identify patterns of disinformation. Likewise, there is a lack of free natural language processing APIs for Portuguese, which limited the identification of grammatical classes, the emotional intent of words, grammatical errors, and so on. This reduced the possibility of using many of the criteria for identifying misinformation that are applicable to English-language projects.
To overcome these difficulties, the team took two actions. First, it manually collected previously classified misinformation, based on the analysis of three years of checks carried out by Brazilian fact-checking agencies. Second, it developed web crawlers to extract disinformation texts from the Boatos.org website and legitimate news texts from the JC Online, Diario de Pernambuco, and G1 sites. These sites, however, follow different programming standards, which made it necessary to study how each site was built and to run a dozen tests with the crawlers. In the end, we still faced the challenge of convincing people to access a platform to check disinformation on their own.
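
The crawler difficulty mentioned above, where each site is built differently, is commonly handled by keeping a small per-site configuration that tells the extractor where the article body lives. The sketch below shows the idea with Python's standard `html.parser` on an HTML string. It is an assumption-laden illustration rather than the team's crawler: the site keys `site_a` and `site_b`, the container tags, and the sample HTML are hypothetical and do not describe the real structure of any of the sites named in this text.

```python
from html.parser import HTMLParser

# Hypothetical mapping: which tag wraps the article body on each site.
SITE_CONTAINERS = {"site_a": "article", "site_b": "main"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements found inside a container tag."""

    def __init__(self, container_tag):
        super().__init__()
        self.container_tag = container_tag
        self.depth = 0             # how many containers we are nested in
        self.in_paragraph = False  # True while inside a <p> of the container
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container_tag:
            self.depth += 1
        elif tag == "p" and self.depth > 0:
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self.container_tag and self.depth > 0:
            self.depth -= 1
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html, site):
    """Return the article paragraphs of a page, one per line."""
    parser = ParagraphExtractor(SITE_CONTAINERS[site])
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


# Invented sample page; a real crawler would first download the HTML.
sample_html = (
    "<html><body><nav><p>Menu</p></nav>"
    "<article><h1>Title</h1><p>First <b>bold</b> text.</p><p>Second.</p></article>"
    "</body></html>"
)
article_text = extract_text(sample_html, "site_a")
```

With this design, supporting another site means adding one entry to the mapping and testing it against a few saved pages, instead of writing a new extractor from scratch.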
What can others learn from this project?
The Confere.ai project makes an important contribution to the field of fact-checking automation in Brazil. It is one of the pioneers in offering a solution that removes the intermediaries between a piece of disinformation and the reader's doubt. The project contributes to thinking about how to use technology to combat disinformation in Brazil, where disinformation has proven to be a powerful engine for the destruction of public debate in the last two years.
The originality of the project lies precisely in proposing a solution that shortens the distance between the content to be checked, the audience that receives it, and the final result of the check. Although it does not state whether something is true or false, it aims to plant a "flea behind the ear" (a nagging doubt) and to foster critical thinking in the public.
The project also innovates by developing a series of technologies capable of searching for content on specific pages to form a database of about 22,000 items, something never before done in Northeast Brazil. This not only makes it possible to improve the tool but can also serve as an instrument for consolidating other fact-checking projects in the country's journalism.
The distinguishing features of Confere.ai include the creation of a list of criteria, based on the study of disinformation patterns in text content and news links, which can be made available to other interested journalists and communication researchers. The project is also innovative in applying artificial intelligence techniques at media outlets in Northeast Brazil and can serve as a reference for studies on AI in journalism.