Analysis of more than 350,000 words written over 15 years of Boris Johnson’s Telegraph columns helped us identify the topics and people he discusses the most.
We also used a computing technique called “natural language processing” — a form of artificial intelligence that can interpret the meaning of sentences — to reveal the sentiment behind his words.
This was a complicated analysis of a relatively light-hearted subject, so we set out to display the findings in a clear yet fun way. This led to an interactive extravaganza, topped off with a never-ending Boris quiz.
The overall reaction to the project was extremely positive. I received good feedback in the comments section, from others in the newsroom and on Twitter, where I shared the process behind the piece.
The article was picked up by several outlets and featured in Politico’s London Playbook newsletter. I was also contacted by Factiva – a business information and research tool owned by Dow Jones – about how we could use their data to collaborate on future projects.
When we first discussed doing a project like this in the spring of 2019, we planned to focus on the then prime minister Theresa May. However, the only source we had was her parliamentary speeches, which were scripted, formulaic and very similar to one another.
So when Boris Johnson became prime minister, I had the idea of using his Telegraph articles as the basis for the analysis. This worked out perfectly: never before has a prime minister provided such a personal insight into their mind. The overall project benefited from having this volume of personal musings available to analyse.
The process can be broken down into three parts: collecting the data, analysing it and building the front-end interactives.
The data was collected using a script written in R. The Telegraph has a page for each of its writers that has links to all of the stories they have written.
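The collection script itself was written in R, but the core step is simply pulling every article link off the author page. Here is a minimal Python sketch of that idea; the markup, class name and URL pattern are invented for illustration and the real Telegraph page will differ:

```python
from html.parser import HTMLParser

class ArticleLinkParser(HTMLParser):
    """Collects hrefs that look like article links on an author page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Hypothetical URL pattern; the real site's paths differ.
            if "/opinion/" in href or "/news/" in href:
                self.links.append(href)

# A stand-in snippet of an author page, not real Telegraph markup.
sample = """
<ul class="author-articles">
  <li><a href="/opinion/2019/07/01/example-column/">Example column</a></li>
  <li><a href="/about/">About</a></li>
</ul>
"""
parser = ArticleLinkParser()
parser.feed(sample)
print(parser.links)  # ['/opinion/2019/07/01/example-column/']
```

In the real pipeline, each collected link would then be fetched and the article body extracted for analysis.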
I used IBM Watson’s Natural Language Understanding API (https://www.ibm.com/uk-en/cloud/watson-natural-language-understanding) to see which topics he discusses in each article. I limited it to the top ten for each and then grouped this data by year to see how the trends have changed over time.
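The "top ten per article, grouped by year" step can be sketched as follows. The response shape assumed here (each concept carrying a `text` and a `relevance` score) follows the Watson NLU documentation, but treat the field names and the sample numbers as illustrative:

```python
from collections import Counter, defaultdict

def top_topics(nlu_result, n=10):
    """Keep the n most relevant topics from one article's NLU response."""
    ranked = sorted(nlu_result["concepts"],
                    key=lambda c: c["relevance"], reverse=True)
    return [c["text"] for c in ranked[:n]]

def topics_by_year(articles, n=10):
    """Count how often each topic makes an article's top n, per year."""
    counts = defaultdict(Counter)
    for art in articles:
        counts[art["year"]].update(top_topics(art["nlu"], n))
    return counts

# Toy data in the assumed response shape; not real findings.
articles = [
    {"year": 2016, "nlu": {"concepts": [
        {"text": "Brexit", "relevance": 0.95},
        {"text": "European Union", "relevance": 0.90},
    ]}},
    {"year": 2016, "nlu": {"concepts": [
        {"text": "Brexit", "relevance": 0.88},
    ]}},
]
print(topics_by_year(articles)[2016]["Brexit"])  # 2
```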
Finally, I used the API to calculate the sentiment and emotion behind how he uses certain terms and phrases. Our findings show that he discusses the Queen with the most joy, Labour with the most sadness, the police with the most fear, Liberal Democrats with the most disgust and democracy with the most anger.
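Once the API has returned per-term emotion scores, finding "the term he discusses with the most joy" is a simple maximum over those scores. A sketch of that comparison step, with made-up numbers rather than the published results:

```python
def strongest_term(emotion_scores, emotion):
    """Return the term with the highest score for a given emotion.

    emotion_scores maps term -> dict of emotion scores in [0, 1],
    in the style of Watson NLU's emotion output.
    """
    return max(emotion_scores, key=lambda term: emotion_scores[term][emotion])

# Illustrative scores only, not the article's actual findings.
scores = {
    "the Queen": {"joy": 0.81, "sadness": 0.05},
    "Labour":    {"joy": 0.12, "sadness": 0.64},
}
print(strongest_term(scores, "joy"))      # the Queen
print(strongest_term(scores, "sadness"))  # Labour
```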
For the Boris bot game, I trained a text-generating neural network (textgenrnn) on all 350 articles. I then split the output into sentences for the purpose of the game. The “real” sentences were chosen at random.
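The game logic reduces to labelling real and generated sentences, pooling them and shuffling. A minimal sketch of that mixing step (the function name and structure are hypothetical, not the production code):

```python
import random

def build_quiz(real_sentences, bot_sentences, rounds=5, seed=None):
    """Mix labelled real and bot sentences into a shuffled quiz.

    Each round is a (sentence, label) pair; the player guesses the label.
    """
    rng = random.Random(seed)  # seed makes the shuffle reproducible
    pool = ([(s, "real") for s in real_sentences] +
            [(s, "bot") for s in bot_sentences])
    rng.shuffle(pool)
    return pool[:rounds]

# Placeholder sentences, not actual column or textgenrnn output.
quiz = build_quiz(["Real sentence one.", "Real sentence two."],
                  ["Generated sentence one."],
                  rounds=3, seed=42)
for sentence, label in quiz:
    print(label, "-", sentence)
```

In production the quiz ran client-side, so a structure like this would be serialised and shipped to the front end.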
The interactives were built as React.js components that sit within pages on our site and in the app. For the first two interactives, I used D3.js to create the graphical elements. The static graphic on the page was created in Illustrator.
Why should this project be selected?
This project should be selected for the array of data visualisations I produced to explain a complicated analysis in a simple way.
I showcased a variety of skills: creating a static graphic in Illustrator, building the interactive elements with D3.js and developing the Boris vs Boris Bot game in React.js. The result was a fun, informative and interactive article that presented our complicated findings in an engaging way.
The analytics from the piece were strong as well: readers spent an average of three and a half minutes on the page, considerably higher than average and most likely down to the interactive elements. The story also drew in almost 1,000 new registrations and 87 subscriptions.
What was the hardest part of this project?
The hardest part of the project was working on all of the different technical elements, from the scraping to the analysis to building the finished interactives. It was a daunting process to begin with, but one I managed by splitting the work into more manageable chunks.
What can others learn from this project?
Good data visualisation can help bring a story to life and explain findings in an interesting and simple way.
Computing techniques such as machine learning and natural language understanding can be difficult to explain to editors in the newsroom.
It’s important to focus on the conclusions you can draw from the data and to use those as a basis for a story when pitching it.
I think the way we presented the data in fun and interactive visuals definitely helped the story.