2020 Shortlist

The Language of Congress

Category: Best news application

Country/area: United States

Organisation: The Pudding

Organisation size: Small

Publication date: 13/09/2019

Credit: Charlie Smart

Project description:

We fed thousands of Congressional tweets to a machine learning algorithm powered by Salesforce’s Einstein AI to recognize political issues. The tweets are categorized into 15 topic areas, including environment, guns, jobs, and social issues, and then visualized nationally, by member of Congress, and by issue. The project updates every day of the 116th Congress, from January 3, 2019 through January 3, 2021.

Impact reached:

Twitter is designed so that you come across one tweet at a time (often breaking news, reactive rants, and unfiltered spur-of-the-moment thoughts) and are never exposed to larger patterns or trends. This application lets people dig in and see which issues Congress as a whole prioritizes and which issues their own representatives favor. It uses big data to put the power in the hands of average people.

Techniques/technologies used:

We sought an out-of-the-box machine learning model to make predictions, one that could run in real time and update each day.

This analysis builds on recent advances in deep learning and language models. For this project, we used the Einstein Intent API to train a model to predict which issue a tweet from a member of Congress pertains to. The model was trained on approximately 3,000 tweets that our team manually classified into issues to serve as training data. Once trained, it outputs a probability that a tweet falls within a given issue.
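As a rough illustration of how per-issue probabilities can be turned into a single issue for a tweet, here is a minimal Python sketch. It is not the project's code: the input shape (a list of label/probability pairs), the function name `top_issue`, and the 0.5 cutoff are all assumptions made for the example.

```python
from typing import Optional


def top_issue(probabilities: list, min_probability: float = 0.5) -> Optional[str]:
    """Return the most likely issue label, or None if nothing is confident enough.

    `probabilities` is assumed to be a list of dicts such as
    {"label": "environment", "probability": 0.7}; this shape and the
    default cutoff are illustrative assumptions, not the project's settings.
    """
    if not probabilities:
        return None
    # Pick the issue with the highest predicted probability.
    best = max(probabilities, key=lambda p: p["probability"])
    # Leave the tweet unlabeled when even the best guess is weak.
    if best["probability"] < min_probability:
        return None
    return best["label"]
```

A cutoff like this is one way to keep tweets that fit no issue well from being forced into a category; whether the project applied one is not stated.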

Tweets were obtained via the Twitter API for all current members of Congress with active Twitter accounts.
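Once each tweet carries an issue label, the per-member and Congress-wide views come down to counting. The sketch below shows one possible way to do that in Python; the field names `member` and `issue` and the function `count_issues` are hypothetical and do not reflect the project's actual data model.

```python
from collections import Counter, defaultdict


def count_issues(labeled_tweets):
    """Tally issue labels per member and across all members.

    Each tweet is assumed to be a dict with hypothetical keys
    "member" (account name) and "issue" (label, or None if unlabeled).
    """
    by_member = defaultdict(Counter)
    overall = Counter()
    for tweet in labeled_tweets:
        issue = tweet["issue"]
        if issue is None:
            # Skip tweets that were not assigned to any issue.
            continue
        by_member[tweet["member"]][issue] += 1
        overall[issue] += 1
    return by_member, overall
```

Calling `overall.most_common()` on the second return value gives the issues ranked by how often they were tweeted about.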

The front end is built with JavaScript and D3.js.

What was the hardest part of this project?

The hardest part of a continually updating news application is building out the base infrastructure to handle as many future unknowns as you can. We are working with a massive and ever-growing amount of text data, so it’s important to make sure the framework is robust and flexible. Luckily, we were working with two strong and structured APIs: the Einstein Intent API and the Twitter API.

What can others learn from this project?

The project provides real-time insight into which issues Congress pushes into political and public discourse. After the Global Climate March in September, for example, we were able to see how Congress responded and map their tweets to news events.

Project links: