The Dutch Telegram landscape: How the marginal went mainstream
Entry type: Single project
Publishing organisation: De Groene Amsterdammer, Utrecht Data School
Organisation size: Big
Publication date: 2022-08-17
Authors: Joris Veerbeek, Eva Hofman, Or Goldenberg
* Joris Veerbeek (27) is a PhD candidate at the Utrecht Data School and a part-time employee of De Groene Amsterdammer. His research focuses on applied data science and human-centered artificial intelligence.
* Eva Hofman (27) is a full-time journalist for De Groene Amsterdammer, where she heads data research projects and holds the technology and internet culture portfolio.
* Or Goldenberg (24) is a full-time journalist for De Groene Amsterdammer who focuses on international security, including its online components.
In collaboration with Utrecht Data School (Utrecht University), De Groene Amsterdammer charted the Dutch Telegram landscape. We collected data from more than 4,000 active groups and analyzed over 30 million messages – to our knowledge, the most comprehensive picture of the Dutch Telegram sphere. Our project uncovered a predominance of conspiracy groups, arms and drugs trade, and (child) pornography. Furthermore, we found over 14,000 explicit death threats. Combining this data-driven approach with interviews with victims and legal experts, our project highlights the importance of effective moderation on (and regulation of) Telegram – in the Netherlands and elsewhere.
The article was featured on the cover of De Groene Amsterdammer and it was picked up by the largest radio station in the Netherlands. The news was also discussed by major news outlets in Austria including public broadcaster ORF, Heute, and Wiener Zeitung, as well as in Luxembourg’s L’essentiel. The methodology we developed for identifying and mapping Telegram groups in the Netherlands was recognized by journalists from other countries and could serve as a valuable tool for similar investigative journalism projects in other languages. Finally, the list of Dutch Telegram groups identified in the research is currently being used as a starting point for a multi-year research project on radicalization in the broader Dutch media landscape.
To uncover the dynamics of Telegram groups in the Netherlands, we used a method called “snowballing”. This involved starting with a seed list of groups, which we found by searching for links to Telegram groups on other social media platforms like Twitter (using the Twitter API) and Facebook (using CrowdTangle). We then followed the references and links within these groups to find other groups, repeating this process until we could no longer find any new public Dutch-language groups, resulting in a comprehensive list of over 4,000 active groups. To collect the data from these groups, we used the Telegram API and the Python package Telethon.
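The snowballing procedure can be sketched as a breadth-first crawl. The sketch below is illustrative only: `fetch_linked_groups` and `is_relevant` are hypothetical stand-ins for the actual Telethon calls and the Dutch-language check, both of which require API credentials in practice.

```python
from collections import deque

def snowball(seed_groups, fetch_linked_groups, is_relevant):
    """Breadth-first 'snowballing': start from a seed list of groups,
    follow the links they contain to new groups, and stop once no new
    relevant (e.g. Dutch-language) groups turn up."""
    seen = set(seed_groups)
    queue = deque(seed_groups)
    discovered = []
    while queue:
        group = queue.popleft()
        discovered.append(group)
        for linked in fetch_linked_groups(group):
            # Only follow links to groups we have not seen yet and
            # that pass the relevance check (here: hypothetical).
            if linked not in seen and is_relevant(linked):
                seen.add(linked)
                queue.append(linked)
    return discovered
```

In the real pipeline, `fetch_linked_groups` would query Telegram via Telethon and `is_relevant` would run the Dutch-language check; the loop terminates naturally when the frontier of new public groups is exhausted.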
We also combined network analysis with keyword analysis to chart the Telegram landscape. We started by creating a network visualization of the Telegram landscape using Gephi and NetworkX, which helped us identify groups that were closely connected and sharing information. Then, we dove deeper by using keyword analysis to understand what topics were being discussed in these groups.
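As a minimal, dependency-free illustration of this combination (the project itself used Gephi and NetworkX), the sketch below finds clusters of mutually linked groups with a depth-first search and counts topic keywords. All group names and messages here are invented:

```python
from collections import Counter, defaultdict

# Toy link data: (source group, linked group) pairs - invented names.
links = [("groupA", "groupB"), ("groupB", "groupA"),
         ("groupC", "groupB"), ("groupD", "groupE")]

# Build an undirected adjacency structure from the link pairs.
adj = defaultdict(set)
for u, v in links:
    adj[u].add(v)
    adj[v].add(u)

def component(start, seen):
    """Collect one connected component with an iterative DFS."""
    stack, comp = [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        comp.add(node)
        stack.extend(adj[node] - seen)
    return comp

# Connected components = clusters of groups that reference each other,
# analogous to what a Gephi/NetworkX visualization reveals at a glance.
seen, clusters = set(), []
for node in adj:
    if node not in seen:
        clusters.append(component(node, seen))

# Keyword analysis: count topic terms across groups' messages.
messages = {"groupA": ["vaccin complot", "5g straling"],
            "groupD": ["wapens te koop"]}
keyword_counts = Counter(w for msgs in messages.values()
                         for m in msgs for w in m.split())
```

With real data, the clusters correspond to thematic communities (conspiracy, trade, etc.), and the per-cluster keyword counts indicate what each community discusses.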
Finally, we also used machine learning techniques to identify and quantify the number of death threats present in the messages. By fine-tuning a pre-existing classifier (the hate speech model of IMSyPP, based on the Dutch-language BERT model) on a custom dataset of 6,000 manually labelled Telegram messages, we were able to scan through all 30 million messages.
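The shape of that scanning step can be sketched as follows. The trivial keyword rule below is only a placeholder for the actual fine-tuned IMSyPP/BERT classifier; the batching mirrors how messages would be fed to a model in bulk.

```python
# Placeholder for the fine-tuned BERT classifier: a crude keyword rule,
# used here only to illustrate the batched scan over millions of messages.
THREAT_TERMS = {"dood", "ophangen"}  # illustrative Dutch terms

def classify(message: str) -> bool:
    """Stand-in for the model's predict step; flags threat terms."""
    return any(term in message.lower() for term in THREAT_TERMS)

def scan(messages, batch_size=1000):
    """Stream messages through the classifier in batches, the way one
    would feed a GPU-backed model to cover tens of millions of texts."""
    flagged = []
    for i in range(0, len(messages), batch_size):
        batch = messages[i:i + batch_size]
        flagged.extend(m for m in batch if classify(m))
    return flagged
```

In the real pipeline, `classify` would be the fine-tuned transformer applied to each batch, and the flagged messages would then be verified manually.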
Context about the project:
This project took several months to complete, from April to August 2022. Just over a month before it started, Russia had invaded Ukraine, and Telegram was being praised as a safe place for dissidents. We wanted to know what kinds of conversations Telegram facilitates in the Netherlands, as the free marketplace of speech does not only serve dissidents. Freedom of speech, as championed by tech innovators like Elon Musk, often clashes with the law at a basic level. What if it leads to hate speech, child pornography, and death threats?
The project is part of “Data & Debate”, an investigative journalism collaboration between De Groene Amsterdammer and Utrecht Data School that focuses on the way (online) public debate takes shape. Its goal is to understand and investigate the dynamics of how information is shared, debated, and moderated in the digital age. This includes studying the role of social media platforms, algorithms, and disinformation in shaping public discourse. The collaboration is a well-established partnership that combines journalistic expertise and storytelling methods with scientific research methods.
Charting the entire public landscape of Telegram in the Netherlands was a complex task, as groups on the platform are interconnected and often link to one another. Telegram is a global platform, and many linked groups had nothing to do with the Netherlands. This meant we had to automatically check every linked group for Dutch-language content. Combined with the Telegram API’s inconsistent and opaque rate limits, it took us over four months to complete the landscape.
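The article does not specify which language detector was used; a simple stop-word ratio heuristic like the one below is one way such an automatic Dutch-language check could look.

```python
# Hedged sketch: a stop-word heuristic for flagging Dutch-language groups.
# The threshold and word list are illustrative assumptions, not the
# project's actual detector.
DUTCH_STOPWORDS = {"de", "het", "een", "en", "van",
                   "ik", "je", "niet", "dat", "is"}

def looks_dutch(messages, threshold=0.05):
    """Return True if the share of common Dutch stop words among all
    words in a group's messages exceeds the threshold."""
    words = [w for m in messages for w in m.lower().split()]
    if not words:
        return False
    ratio = sum(w in DUTCH_STOPWORDS for w in words) / len(words)
    return ratio >= threshold
```

A dedicated language-identification library would be more robust in production, but even a heuristic like this filters out most unrelated foreign-language groups cheaply, which matters when every API call counts against a rate limit.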
For a long time, we only looked at the public groups. However, these linked to ‘hidden’ groups that we could not map automatically. What we encountered in these groups upended our overview: the hidden groups, dealing in sex, drugs, and weapons, turned out to be even bigger than the public ones. We kept the public landscape as the basis for our analysis, as we could analyse and discuss it with far more certainty, but we also covered the hidden groups in the article.
Only part of the data we retrieved made it into the published article. For instance, we labelled all Dutch court rulings on Telegram, but ultimately chose not to delve further into them given the size of the project.
The help of AI was crucial, as it allowed us to handle the volume of messages. But that is also where journalistic skill comes in: diving into groups ourselves, and talking to moderators ourselves.
Conducting investigations in which you encounter images of beheadings, pornography of underage girls, and hate speech requires creating some distance for yourself. When writing, however, you have to let go of some of that distance; otherwise you cannot properly convey why what is happening is wrong. For example, we only contacted the victims of the expose groups (young women) about a week before the publication date. Here, too, journalistic craftsmanship is important: dealing responsibly with these women and their data, and giving them the space for a safe conversation.
What can other journalists learn from this project?
The main takeaway for other journalists is that it can be beneficial to combine academic research techniques with journalistic endeavours to create a story. In our academic-journalistic collaboration, rigorous data scraping and analysis were translated into a tangible story, making the data-driven results comprehensible to the wider public.
Another distinctive takeaway from our collaboration is methodological transparency, which we published as a separate article on the website of De Groene Amsterdammer [https://web.archive.org/web/20221202031842/https://www.groene.nl/artikel/verantwoording-bij-het-onderzoek-naar-het-nederlandstalige-telegramlandschap]. As data journalism can seem complicated, and sometimes overwhelming, to many readers, it is important to be transparent and explain the methods and research in comprehensible steps.
A third takeaway is the importance of remaining critical of (tech) platforms that are praised for security, freedom, and safety, especially because what they enable can differ markedly between countries and their respective political contexts.
Lastly, journalists often focus on individual extremist groups, but these are not isolated from the broader radicalised landscape. By mapping the entire landscape, we clarified the root sources of this flawed information.