Fakecrunch (Фейкогриз) is a browser add-on and Telegram bot that promote media literacy. It warns users about manipulative news and lets them report dis- and misinformation.
Under the hood, Fakecrunch uses AI to label manipulative content. Its algorithm is a language-model classifier trained on news labeled by trusted professional news editors.
The project has been mentioned widely in Ukrainian media; after two months, the total user count is about 700, most of them journalists. To date, we have processed about 35,000 requests to check news items.
The manipulative-news database is built using a ULMFiT-based AI classifier. Every week we crawl more than 80,000 news items from Ukrainian and some Russian websites using Scrapy. Each item is pre-processed: language detection, tokenization, conversion of tokens to a sequence of token ids, and filtering out news unrelated to politics, economics, and society (such as sports, celebrities, and lifestyle). Relevant items are checked with a ULMFiT classifier implemented in the fast.ai Python library, initially trained on a Wikipedia corpus and fine-tuned on a manually labeled news corpus. News classified as manipulative with high confidence is loaded into the Fakecrunch database (PostgreSQL). The classifier code is available on GitHub.
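The pre-processing steps above can be sketched in simplified, self-contained form. This is an illustration only: the vocabulary, topic filter, and tokenizer here are hypothetical stand-ins for what the production pipeline gets from Scrapy and fast.ai's own tokenizer, and language detection is omitted.

```python
# Simplified sketch of the weekly pre-processing steps (hypothetical names and
# data; the real pipeline uses Scrapy for crawling and fast.ai's tokenizer).
import re
from typing import Optional

# Hypothetical token-to-id vocabulary; the real one is built by fast.ai.
VOCAB = {"<unk>": 0, "уряд": 1, "бюджет": 2, "скандал": 3}

# Hypothetical section filter standing in for the step that keeps only
# politics / economics / society items.
IRRELEVANT = {"sport", "celebrity", "lifestyle"}

def tokenize(text: str) -> list:
    # Lowercase word tokenization; the production system uses a proper NLP tokenizer.
    return re.findall(r"\w+", text.lower())

def to_ids(tokens: list) -> list:
    # Map tokens to ids, falling back to <unk> for out-of-vocabulary words.
    return [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]

def is_relevant(section: str) -> bool:
    # Drop sports / celebrity / lifestyle items before classification.
    return section.lower() not in IRRELEVANT

def preprocess(item: dict) -> Optional[list]:
    # Returns a token-id sequence for relevant items, None for filtered-out ones.
    if not is_relevant(item["section"]):
        return None
    return to_ids(tokenize(item["text"]))

# Example: a politics item becomes a sequence of ids ready for the classifier.
print(preprocess({"section": "politics", "text": "Уряд бюджет!"}))  # [1, 2]
print(preprocess({"section": "Sport", "text": "матч"}))             # None
```

In the real system the resulting id sequences are fed to the ULMFiT classifier rather than printed.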
The Fakecrunch backend is built with Django. It links the database of automatically detected and human-labeled manipulative news with the Fakecrunch applications: the add-ons and the Telegram bot. We use social authentication to sign up users in order to avoid misuse and to distinguish trustworthy users. We have also developed an interface for moderating users' labels.
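To make the linkage concrete, here is a minimal sketch of the lookup an add-on or bot request might trigger. The schema and function names are hypothetical; the production backend is Django on top of PostgreSQL.

```python
# Hypothetical in-memory stand-in for the Fakecrunch label store
# (the real backend is Django + PostgreSQL).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Label:
    url: str
    manipulative: bool
    source: str  # "classifier" or "human"; shown to users so they know who labeled it

DB = [
    Label("https://example.com/news/1", True, "classifier"),
    Label("https://example.com/news/2", True, "human"),
]

def check(url: str) -> Optional[Label]:
    # Return the stored label for a URL, or None if the item is unknown.
    for label in DB:
        if label.url == url:
            return label
    return None

print(check("https://example.com/news/2"))  # the human-labeled item
print(check("https://example.com/other"))   # None: nothing to warn about
```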
What was the hardest part of this project?
To the best of our knowledge, this is the first such tool and complete pipeline for the Russian and Ukrainian languages. Texty was first asked for a tool to warn about manipulative news at the end of 2018, when we developed an AI classifier for detecting manipulative news. The accuracy of the algorithm at the time did not allow us to classify individual pieces; instead, we could only estimate the average manipulativeness of a website's content. Recently, after improving the classifier's accuracy, we decided to apply it to labeling individual news items. To limit false positives (a non-manipulative item marked as manipulative), we raised the classifier's threshold, at the price of more false negatives. In addition, all AI-classified news is labeled as "suspicious" rather than outright manipulative, and users are informed whether a piece of news was labeled by the classifier or by a human.
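The threshold trade-off described above can be illustrated with a few lines of code. The scores below are synthetic, not the model's real outputs: raising the threshold removes the false positive at the cost of two false negatives.

```python
# Illustration of the false-positive / false-negative trade-off behind raising
# the classifier threshold (synthetic scores, not real model outputs).
def confusion(scores, labels, threshold):
    # scores: predicted probability of "manipulative"; labels: ground truth.
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn

scores = [0.95, 0.80, 0.65, 0.55, 0.40]
labels = [True, True, False, True, False]

# A low threshold flags more items and produces a false positive...
print(confusion(scores, labels, 0.5))  # (1, 0)
# ...a high threshold avoids it at the price of more false negatives.
print(confusion(scores, labels, 0.9))  # (0, 2)
```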
Another challenge was the risk of misuse of Fakecrunch. First, only signed-up users can report manipulative news. Login is implemented as social authentication with a minimum of permissions, to protect user privacy and make signing in as easy as possible. We moderate all user labels and are developing automated moderation methods, such as user ranking and agreement with the classifier.
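One possible automated moderation signal of the kind mentioned above could look like the following sketch: a user's rate of agreement with the classifier. The function names and thresholds are hypothetical; our ranking is still in development.

```python
# Hypothetical sketch of a user-ranking signal: agreement with the classifier.
def agreement_rate(user_labels, classifier_labels):
    # Both lists hold booleans (manipulative or not) for the same news items.
    if not user_labels:
        return 0.0
    matches = sum(1 for u, c in zip(user_labels, classifier_labels) if u == c)
    return matches / len(user_labels)

def is_trusted(user_labels, classifier_labels, min_reports=10, min_agreement=0.8):
    # A user might be ranked as "trusted" only after enough reports
    # with a high agreement rate (thresholds are illustrative).
    return (len(user_labels) >= min_reports
            and agreement_rate(user_labels, classifier_labels) >= min_agreement)
```

A user with three reports, two of which match the classifier, would have an agreement rate of about 0.67 and would not yet be trusted under these illustrative thresholds.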
Technically, the hardest part was putting the whole NLP pipeline (crawling, pre-processing, classification) into a near-real-time application. We optimized the crawling and data-processing scripts so that known manipulations appear in Fakecrunch within one hour of publication. The variety of products (add-ons, bot, backend, and real-time data processing) was difficult to handle in itself and required discipline in the project's code and development process.
What can others learn from this project?
The project shows how a research data-journalism project was turned into a tangible interface. Texty started working on automated detection of manipulative news more than two years ago. Since then, we have constantly improved our algorithms and the delivery of our findings, trying to reach a wider audience and promote media literacy through our work.
In addition to journalists, the application has attracted a more diverse audience.