A data investigation uncovering how Twitter’s trending algorithm boosted anti-mask movements and pandemic conspiracy theories, combining innovative data analysis with rigorous tracking of anti-mask activity online in the UK and Europe.
In July 2020, the hashtag #NoMasks trended in the UK, suggesting surging anti-mask sentiment and distrust in coronavirus prevention measures. But our investigation of more than 30,000 tweets showed that while there was an increasingly loud anti-mask movement in the UK, the platform’s algorithm gave anti-mask misinformation a boost by dubiously presenting it as a widespread trend.
Prior to this piece, there were few robust examples of this and none in relation to anti-mask or anti-lockdown activism. Our story was revelatory, hitting a nerve and making an imprint on people’s understanding of social networks and how they present information. High-profile technology researchers and influencers shared and praised the findings, including specialist researcher and Syracuse University professor Whitney Phillips, who featured the story in her 2020 election course syllabus. While reporting this piece, we spoke with Twitter representatives, flagging our findings and suggesting the algorithm was fueling misleading narratives. A few days after our story was published, Twitter changed its rules to include descriptions on trending topics.
The story had almost 10,000 views on First Draft’s website and was shortlisted for the British Journalism Award for Technology Journalism in December 2020.
The story uncovered a little-known problem with technology platforms: algorithms often present topics and ideas as “trending” without any context or verification, which could give extra exposure to divisive topics, increase disinformation and make the public less likely to comply with scientifically informed government measures – causing more damage to democracy and society than conspiracy theories alone.
Through the use of data, the investigation served the public interest by uncovering the British public’s real sentiment around masks. This illustrated how Twitter’s algorithms created the false perception that anti-mask positions are more common than pro-mask views, promoting the idea that the majority of people in the UK opposed mask-wearing. In turn, the inclusion of the hashtag in Twitter’s “trending” sidebar exposed more users to anti-mask messages, potentially encouraging new audiences to flout the laws. The story visually presented the problem with social networks — that they don’t care what people share, just that they share — allowing a wide public audience to take knowledge away.
During our long-term monitoring of global disinformation narratives, #NoMasks trended on Twitter on July 14, yet we noticed surprisingly few posts actually expressing anti-mask sentiment. We used Python’s Twint library to scrape more than 30,000 Twitter posts containing the #NoMasks hashtag in July 2020, and 1,660 posts with the French hashtag #StopMasques between July 20 and August 4. With pandas, we analyzed shares over time, the most shared tweets and the top accounts posting them.
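The pandas step can be sketched roughly as follows. This is a minimal illustration on toy data, not our actual dataset: the column names (`date`, `username`, `nretweets`) are assumptions loosely modeled on Twint’s pandas output.

```python
# Sketch of the aggregation step, using a toy stand-in for the
# ~30,000 scraped tweets. Column names are assumptions loosely
# based on Twint's pandas output, not the real dataset.
import pandas as pd

tweets = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-07-13", "2020-07-14", "2020-07-14", "2020-07-14", "2020-07-15",
    ]),
    "username": ["a", "b", "b", "c", "a"],
    "tweet": ["t1", "t2", "t3", "t4", "t5"],
    "nretweets": [2, 50, 7, 120, 1],
})

# Volume over time: how many #NoMasks tweets were posted each day.
per_day = tweets.groupby(tweets["date"].dt.date).size()

# Most shared tweets: sort by retweet count.
most_shared = tweets.sort_values("nretweets", ascending=False)

# Top accounts: who posted the hashtag most often.
top_accounts = tweets["username"].value_counts()

print(per_day)
print(most_shared[["username", "tweet", "nretweets"]].head(3))
print(top_accounts.head())
```

The same three groupings (volume per day, retweet ranking, account frequency) drove the charts in the published piece.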
We used regular expressions to parse out hashtags and mentions from the tweets and account descriptions, and identified the hashtags most commonly associated with #NoMasks.
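A simplified version of that extraction step looks like this; the pattern and sample tweets are illustrative assumptions (the production regex also had to handle mentions and account bios).

```python
# Sketch of the hashtag-extraction step with a simplified pattern.
# The sample tweets below are invented for illustration.
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")

sample_tweets = [
    "#NoMasks are pointless #COVID19",
    "Wearing a mask protects others #WearAMask",
    "#NoMasks #NoNewNormal",
]

# Lowercase so "#NoMasks" and "#nomasks" count as one hashtag.
counts = Counter(
    tag.lower() for text in sample_tweets for tag in HASHTAG_RE.findall(text)
)
print(counts.most_common(3))
```

Ranking the resulting counter surfaces the hashtags most often co-posted with #NoMasks.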
The next step of the analysis focused on tweets posted on July 14, the day the British government announced that masks would be compulsory in shops, to track when, how and through which accounts #NoMasks went viral. After filtering out false-positive tweets containing US- or Australian-focused messages, we manually analyzed the 222 remaining posts that gained at least some traction (more than 10 retweets) on July 14. We then tagged each as “pro-mask” or “anti-mask” and analyzed both groups by their top associated hashtags and by total retweets and likes per hour. The data visualizations were built with Chart.js and Adobe Illustrator.
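The filtering and per-hour comparison can be sketched as below, again on toy data. The `stance` column stands in for our manual pro-mask/anti-mask tagging; all values and column names are assumptions for illustration.

```python
# Sketch of the July 14 filtering and per-hour engagement analysis.
# Toy data; "stance" stands in for the manual pro-/anti-mask tags.
import pandas as pd

tweets = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-07-14 08:05", "2020-07-14 08:40",
        "2020-07-14 09:10", "2020-07-14 09:55",
    ]),
    "tweet": ["t1", "t2", "t3", "t4"],
    "nretweets": [15, 4, 80, 30],
    "nlikes": [20, 2, 150, 40],
    "stance": ["anti-mask", "anti-mask", "pro-mask", "anti-mask"],
})

# Keep only posts that gained at least some traction (>10 retweets).
viral = tweets[tweets["nretweets"] > 10]

# Total retweets and likes per hour, split by stance.
per_hour = (
    viral.groupby([viral["date"].dt.hour, "stance"])[["nretweets", "nlikes"]]
    .sum()
)
print(per_hour)
```

Charting the resulting hourly totals for each stance is what made the gap between pro- and anti-mask engagement visible.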
What was the hardest part of this project?
As a relatively new field, disinformation reporting is less developed than other beats, and the long-term impact of monitoring social media and consuming harmful posts is not well understood. Vicarious trauma is common among journalists focused on social media, and sifting through reams of misleading posts, thousands of them, can be overwhelming. Alongside the investigation, we were also producing daily output, including newsletters and news reports, as well as long-term case studies and features, so we had to manage our time carefully.
Finding ways to explore and analyze online disinformation data is also challenging. The work of a data journalist in this field is largely experimental and exploratory. With limited access to the platforms’ APIs, we have few tools at our disposal and constantly have to devise innovative ways to collect data and research suspicious content comprehensively. Developing an accurate methodology was therefore the most delicate and time-consuming part, followed by hours of work cleaning badly formatted data and making sure the sample was relevant and comprehensive.
What can others learn from this project?
As part of our work tracking social media trends and narratives day in, day out, we often rely on platform algorithms to inform our reporting and help shine a light on disinformation. However, we regularly grapple with the platforms’ lack of transparency and usability, as well as bugs and flaws. Many journalists also rely on features such as Twitter’s trending sidebar for news stories, so exposing these shortfalls, an uncomfortable truth about how journalists do their work, can be difficult.
Having developed a new and original methodology for digging into Twitter data and trending hashtags, we hope other investigative journalists will be inspired to apply it to their own fields of research.