Brexit ternary plots
Category: Innovation (small and large newsrooms)
Country/area: United Kingdom
Organisation: The Economist
Organisation size: Big
Publication date: 22/02/2019
Credit: James Fransham, Martín González, Matt McLean
In late February we produced a piece for the “Graphic detail” print section of the newspaper on Britons’ attitudes towards Brexit: deal, no deal, or stay in the EU. Along with each anonymous person’s response, YouGov provided us with detailed demographic variables for 90,000 people: the respondents’ age, sex, education status, household income, region, previous voting history and so on.
While ternary plots aren’t a new invention (they have been a common feature of scientific papers for years), they are rarely used by media outlets. But since we published our story in February 2019 we have seen ternary plots become more commonplace. We have also, for the first time, published a static ternary plot in the print edition of The Economist.
The data was crunched in RStudio, making use of the data.table and tidyverse libraries. Our voter model was created using multinomial regression analysis in the nnet package.
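For readers unfamiliar with multinomial regression, the following is a minimal Python sketch of how such a model turns one voter's characteristics into three probabilities. It is not our R code: the function name, the two features and every coefficient here are made up for illustration. It assumes the usual baseline-category form, in which one outcome's score is fixed at zero and the others are linear in the features.

```python
import math

OUTCOMES = ("deal", "no deal", "remain")

def outcome_probabilities(features, coefficients):
    """Return a probability for each outcome given one voter's features.

    `coefficients` maps each non-baseline outcome to (intercept, weights).
    The first outcome is the baseline, whose score is fixed at 0.
    """
    scores = {OUTCOMES[0]: 0.0}
    for outcome in OUTCOMES[1:]:
        intercept, weights = coefficients[outcome]
        scores[outcome] = intercept + sum(w * x for w, x in zip(weights, features))
    # Softmax: exponentiate each score and normalise so the three sum to 1.
    # Subtracting the largest score first keeps the exponentials stable.
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Made-up coefficients for two made-up features (age in decades, has a degree).
example_coefficients = {
    "no deal": (-1.0, [0.3, -0.8]),
    "remain": (0.5, [-0.2, 1.1]),
}
probs = outcome_probabilities([4.5, 1], example_coefficients)
```

Because the three probabilities always sum to one, each modelled voter can be placed as a single point on a ternary plot.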
What was the hardest part of this project?
Plotting every single survey response (90,000 points in total) was overwhelming. We needed another solution. In order to whittle down the data, we first created a set of 675,000 hypothetical voters, one for each combination of sex, education, income and so on. We took a weighted random sample of 2,500 of our observations. Because our random sample picked the most prevalent profiles, we were able to represent some 25% of the British electorate with just 2,500 observations of individuals. Plotting this number of points gave us the right balance between exploration and clarity.
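The idea of enumerating profiles and then sampling them by weight can be sketched in a few lines of Python. This is an illustration rather than our actual pipeline: the category values, the placeholder weights and the function names are all hypothetical, and the sketch samples with replacement, which is an assumption.

```python
import itertools
import random

# Hypothetical demographic categories; the real survey had many more.
CATEGORIES = {
    "sex": ["female", "male"],
    "education": ["no degree", "degree"],
    "age_band": ["18-34", "35-54", "55+"],
}

def build_profiles(categories):
    """Create one hypothetical voter per combination of category values."""
    names = list(categories)
    return [dict(zip(names, combo))
            for combo in itertools.product(*categories.values())]

def weighted_sample(profiles, weights, k, seed=0):
    """Draw k profiles, favouring those with larger weights (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(profiles, weights=weights, k=k)

profiles = build_profiles(CATEGORIES)
# Placeholder weights standing in for each profile's share of the electorate.
weights = [1 + i for i in range(len(profiles))]
sample = weighted_sample(profiles, weights, k=5)
```

Weighting the draw by prevalence is what lets a small sample concentrate on the most common profiles.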
What can others learn from this project?
Line charts, bar or column charts and x-y scatters each take two variables (GDP and time, for example) and plot them in a two-dimensional space. But a three-variable “ternary” plot can be used effectively to demonstrate the relationship between three inter-linked variables.
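For anyone wanting to build one, the geometry is simple when the three variables are shares of a whole. The Python sketch below maps three values onto a point inside an equilateral triangle. The function name and the choice of which corner holds which variable are assumptions for illustration, not a description of our published chart.

```python
import math

def ternary_to_xy(a, b, c):
    """Map three non-negative values to a point inside an equilateral triangle.

    Corners: all-a -> (0, 0), all-b -> (1, 0), all-c -> (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one value must be positive")
    # Normalise so the three shares sum to 1.
    a, b, c = a / total, b / total, c / total
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y

# Example: a voter with hypothetical probabilities for three options.
point = ternary_to_xy(0.2, 0.3, 0.5)
```

A point near a corner leans strongly towards that option, while a point in the centre is evenly split between all three.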