The Storybench 2020 Election Coverage Tracker

Category: Best data-driven reporting (small and large newsrooms)

Country/area: United States

Organisation: Northeastern University School of Journalism

Organisation size: Small

Publication date: 13/02/2019

Credit: Aleszu Bajak, John Wihbey, Dan Kennedy, Meg Heckman, Alexander Frandsen, Alexa Gagosz

Project description:

The Storybench 2020 Election Coverage Tracker from Northeastern University’s School of Journalism is an ongoing project that tracks media coverage of, and public discussion surrounding, the 2020 U.S. presidential election. Drawing on data sources such as the Facebook Ad Library, Media Cloud and Twitter, and using mixed-methods analysis, the Tracker reveals trends and gaps both in political journalism and in the rhetoric used by and about the candidates on social media. Our most popular articles have documented more negative media sentiment toward female 2020 candidates and examined how the media sets the 2020 news agenda.

Impact reached:

The Storybench 2020 Election Coverage Tracker has drawn wide attention from media critics, journalism educators and political journalists. The most popular piece has been our sentiment analysis of 2020 political candidates by gender, “Women on the 2020 campaign trail are being treated more negatively by the media,” which was picked up by CNN’s Reliable Sources and broadcast nationally. The New York Times, The Washington Post, MSNBC and several other news outlets subsequently covered the analysis. Our most recent analysis, “How news media are setting the 2020 election agenda: Chasing daily controversies, often burying policy,” was shared by prominent media critics such as Jay Rosen and Brian Stelter, and provided empirical data for a long-standing debate about political journalism.


In addition to giving the public and researchers data and open-source methods (and code) for important questions about political journalism, the Storybench 2020 Election Coverage Tracker is a teaching tool: it has allowed several undergraduate and graduate journalism students at Northeastern University to learn these data and media analysis techniques and publish articles of their own.


Techniques/technologies used:

The Storybench 2020 Election Coverage Tracker started with natural language processing techniques for text analysis, including text mining with TF-IDF (term frequency-inverse document frequency) and dictionary-based sentiment analysis, using R’s “tidytext” package in RStudio. As the project matured, it began using R and Python wrappers for various APIs (including the Facebook Ad Library and Twitter) and built a public-facing R Shiny app, “Illuminating the road to 2020 through media coverage, candidate tweets and Google searches,” to display findings and let users explore the datasets and their own questions.
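The two text-mining steps named above can be illustrated with a small, standard-library sketch. The project itself did this in R with tidytext; the Python below is only an analogue, not the project's code. The corpus, the group labels and the five-word lexicon are hypothetical stand-ins (a real analysis would use a published sentiment dictionary), and TF-IDF is computed with one common definition: term count divided by document length, times the natural log of the number of documents over the number of documents containing the term.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Score every term in every document of {doc_id: text}.

    tf  = term count in the document / total tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    Terms that appear in every document get idf 0, so TF-IDF highlights
    words that distinguish one document (e.g. one candidate's coverage).
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(counts)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    scores = {}
    for doc_id, c in counts.items():
        total = sum(c.values())
        scores[doc_id] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return scores


def net_sentiment(text, lexicon):
    """Dictionary-based sentiment: sum the lexicon score of each matched token."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def mean_sentiment_by_group(articles, lexicon):
    """Average net sentiment per group for an iterable of (group, text) pairs."""
    totals, counts = {}, {}
    for group, text in articles:
        totals[group] = totals.get(group, 0) + net_sentiment(text, lexicon)
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in counts}


# Hypothetical toy corpus and lexicon, for illustration only.
docs = {
    "candidate_a": "Candidate A unveils climate plan; critics attack the plan",
    "candidate_b": "Candidate B faces scandal, looks weak in polls",
}
lexicon = {"attack": -1, "scandal": -1, "weak": -1, "strong": 1, "win": 1}

scores = tf_idf(docs)
top_a = max(scores["candidate_a"], key=scores["candidate_a"].get)
print(top_a)  # plan
print(mean_sentiment_by_group(docs.items(), lexicon))
# {'candidate_a': -1.0, 'candidate_b': -2.0}
```

Averaging per group, rather than summing, keeps a group with more coverage from looking more negative simply because it has more articles.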

The project has also used more advanced techniques, such as structural topic modeling, as well as Python scrapers built with the newspaper3k library to collect full news articles and the D3 JavaScript library to visualize their distribution. To map the geographic distribution of Facebook political ads, the project used GIS packages in R.
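The scraping step itself depends on newspaper3k and live URLs, so it is not reproduced here. What can be sketched is one possible hand-off from scraper to chart: assuming each scraped article has been reduced to a record with hypothetical `outlet` and `date` fields (an ISO 8601 string), the function below counts articles per outlet per day and serializes the result as a flat JSON array of rows, a shape D3 charts commonly load. The field names and the per-day grouping are assumptions, not the project's schema.

```python
import json
from collections import Counter


def articles_to_d3_json(articles):
    """Count articles per (outlet, day) and return them as a JSON array.

    `articles` is an iterable of dicts with hypothetical "outlet" and
    "date" keys; "date" is an ISO 8601 string, trimmed here to YYYY-MM-DD
    so that articles published at different times on one day are grouped.
    """
    counts = Counter((a["outlet"], a["date"][:10]) for a in articles)
    rows = [
        {"outlet": outlet, "date": day, "count": n}
        for (outlet, day), n in sorted(counts.items())
    ]
    return json.dumps(rows, indent=2)


# Hypothetical scraped records, for illustration only.
sample = [
    {"outlet": "Outlet A", "date": "2019-06-01T10:00:00"},
    {"outlet": "Outlet A", "date": "2019-06-01T18:30:00"},
    {"outlet": "Outlet B", "date": "2019-06-02"},
]
print(articles_to_d3_json(sample))
```

Keeping the output as one row per outlet and day leaves aggregation decisions (stacking, smoothing, faceting) to the D3 side.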

Like any media criticism project, it also relied on more analog, qualitative techniques. We measured news attention by eye, logging, for example, Fox News’ obsession with Alexandria Ocasio-Cortez, and tallied by hand the gender of political reporters covering the 2020 election.

Finally, for techniques we were not equipped to deploy ourselves, we collaborated with outside companies such as MarvelousAI, which examined the volume and nature of attacks on, and support for, 2020 candidates on Twitter and provided analysis and visualization.

What was the hardest part of this project?

Matching the right dataset to the right research question and the right analytical tool was the hardest part of each project. Doing that again and again for almost 20 posts was a steep challenge, with many dead ends and much wasted time and code. It’s also really hard to break through on a topic where (almost) everyone has an opinion! That’s why we hoped that bringing data, open-source methods and code to bear on this project would help our work reach further and add an iota to the discussion around the 2020 election. Each challenge, of course, was a learning opportunity for our faculty and students, and we’re fortunate to have raised a few eyebrows outside Northeastern along the way.

What can others learn from this project?

The Storybench 2020 Election Coverage Tracker has been a fantastic opportunity to do data journalism from within an institution of higher education. Without the deadlines and strictures of a more traditional newsroom, faculty and students at the School of Journalism have had the time and resources to pursue research questions in the public interest, and a platform, Storybench.org, to disseminate our results (and, in many cases, code) widely. We hope that others in the data journalism community will see the value in partnering with data journalism students and faculty, who may have the time, skills and resources to help with data collection, analysis and visualization on projects as ambitious as understanding media coverage of U.S. politics.

Project links: