Our machine-learning General Election predictor – unmatched by any other publisher – was born of collaboration between our data science and graphics teams. To coincide with the release of the official 10pm exit poll, we generated our own accurate seat-by-seat forecast of the results. Our forecasting model was closer to the final outcome than the exit poll, and correctly predicted that Labour’s vote share would hold up slightly better in certain areas. This was because of the wealth of data it took in: the last five national elections, demographic data and the latest polling figures.
Our forecaster’s success lies not in the complexity of the methods used, but in the fact that they were communicated in an accessible fashion – we feel the pursuit of this clarity is just as important as the pursuit of technological development, and it helped drive the impact of this project. On social media, animated GIFs were used to increase the impact of what some would see as quite a dry, numbers-heavy topic (https://twitter.com/Telegraph/status/1205276976396873728).
Our first forecast had the Conservatives on 363 and Labour on 201 (which you can see from 10.23pm on the night here: https://twitter.com/Ashley_J_Kirk/status/1205251919452786688). This was just two seats short for both major parties. That very first forecast correctly called some of the biggest results of the night: that Jo Swinson would lose her seat, that Luciana Berger and Chuka Umunna would fail to take Remain-leaning London seats, and that Dominic Raab would cling on despite a Lib Dem surge. The BBC’s exit poll originally put the Conservatives on 368 and Labour on 191, so we were closer to the final result (BBC +3 for Con, we were -2; BBC -12 for Lab, we were -2).
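The error comparison above can be checked directly. A quick sketch, where the final seat totals (Con 365, Lab 203) are inferred from the article's own error figures rather than stated in it:

```python
# Final seat totals inferred from the error figures quoted above
# (the forecast was two seats short for both major parties).
final = {"Con": 365, "Lab": 203}
forecast = {"Con": 363, "Lab": 201}   # Telegraph model, 10.23pm
exit_poll = {"Con": 368, "Lab": 191}  # BBC exit poll at 10pm

for party in final:
    model_err = forecast[party] - final[party]
    poll_err = exit_poll[party] - final[party]
    print(f"{party}: model {model_err:+d}, exit poll {poll_err:+d}")
# Con: model -2, exit poll +3
# Lab: model -2, exit poll -12
```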
The impact of this was that Telegraph readers were able to see who was likely to represent their own area even before the results had come in. Our metrics showed that this piece of data journalism was something they were interested in, with the piece being one of our most subscribed-to and most viewed on results night.
Our general election predictor model was the most complex piece we’ve produced. The data used to power the model was a mixture of demographic data – obtained from various government departments and the 2011 Census – and historic polling figures compiled from various polling companies. We used this information to power a model – built in R – which predicted the vote for each party in each constituency.
The model applied machine learning to the last five national elections and their corresponding demographic data to find out which demographic variables were most important in determining how a constituency voted – with factors such as car ownership proving surprising determinants that we would not otherwise have spotted. This output was then combined with the latest regional polling and the exit poll to produce our own prediction of the final result.
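As a rough illustration of this two-stage approach – fit vote share against demographics from past elections, then blend that with current polling – here is a minimal Python sketch. The actual model was built in R on far richer data; the single feature (car ownership), the toy numbers and the blending weight are all hypothetical.

```python
# Toy training data: one row per constituency from a past election.
# feature = share of households owning a car, target = Con vote share.
past = [
    (0.55, 0.38), (0.70, 0.52), (0.80, 0.61), (0.45, 0.30), (0.65, 0.47),
]

# Fit y = a + b*x by ordinary least squares (closed form, one feature).
n = len(past)
mean_x = sum(x for x, _ in past) / n
mean_y = sum(y for _, y in past) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in past) / \
    sum((x - mean_x) ** 2 for x, _ in past)
a = mean_y - b * mean_x

def demographic_prediction(car_ownership):
    """Vote share implied by a constituency's demographics alone."""
    return a + b * car_ownership

def blended_forecast(car_ownership, regional_poll_share, poll_weight=0.6):
    """Combine the demographic model with the latest regional polling.
    poll_weight is an assumed tuning parameter, not from the article."""
    return (poll_weight * regional_poll_share
            + (1 - poll_weight) * demographic_prediction(car_ownership))
```

In practice many demographic features would be fitted at once (which is what reveals unexpected determinants like car ownership), and the polling blend would be updated as the exit poll and early declarations arrived.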
We had great success with the model, with our forecast correctly calling over nine in 10 seats and getting closer to the final result than our competitors.
What was the hardest part of this project?
The hardest part of forecasting elections is knowing whether or not your model works – and then persuading stakeholders within the newsroom to take a risk on new technology. The only real test comes once people have cast their votes, which is a highly unpredictable thing. At that point, it’s too late to change your model without introducing flaws into your methodology. So we had to persuade editors, many of whom did not fully understand the technologies involved, to take a risk.
The key to persuading editors and stakeholders at The Telegraph to take the risk on machine learning was not showing them the exciting technologies involved, but showing them results. The fact that we produce exclusive news stories and visual projects on a weekly basis helped here. We have built a solid reputation within the newsroom, and so we are trusted when we pitch an out-of-the-box idea.
This meant that, when we said we had a model that could correctly predict the outcome of the last four general elections – and would likely do the same in 2019, with engaging, subscriber-driving visualisations – they jumped at the idea. We were able to take our first steps in machine learning, an area in which we hope to do a lot more in the coming years.
What can others learn from this project?
From this project we have learned that, even in a small data journalism team where resources are tight, you can still be rewarded for taking risks and stepping outside your comfort zone. The Telegraph’s Data Journalism desk is usually focused on either delivering exclusive news stories for the publication or producing visually compelling pieces of journalism, so machine learning was a time-intensive risk for us. The collaboration was weeks in the making, bringing designers and data scientists together. The differing workflows of these teams, who work in different parts of the company, meant that flexibility was required on all sides. But the risk and the time invested were worth it, with our highly accurate forecaster being one of The Telegraph’s best-performing pieces of online content on the 2019 General Election.