I am a data-driven journalist and US correspondent for The Economist. 2022 marked my fourth full year in this role. Because I sit on two teams, my job marries “traditional” reporting with data analysis, computational social science, and applied statistics to tell interesting stories about our complex (and increasingly data-rich) world. Most of those stories were about US politics last year. In 2022 I also published my first book, about how political polling works and why it’s crucial for a healthy democracy. Before this, I was an undergraduate at the University of Texas at Austin, where I studied government, history and computer science. (If you’re counting, that makes me 26 years old; in the past I believe this would have qualified me for the “Young Journalist” award, though I gather there are no categories this year!)
A “data journalist” in practice, I am a politics researcher and empirical storyteller at heart. I draw my inspiration from messy datasets that hide fascinating stories on how the structures of politics and society influence individual behaviors and outcomes. That includes studying survey data to show the effects of the two-party system on political polarization, electoral results to quantify how Donald Trump has extremized the Republican Party, historical data to show how gerrymandering works (in America and elsewhere), and polling and other data to forecast uncertainty in election outcomes. I am the chief architect of The Economist’s statistical models of polling data worldwide. These are areas of coverage with a lot of demand from both readers and decision-makers. Our various elections projects this year combined for nearly 2 million views, making the content area one of our most-read.
In my role I have been infusing more statistics-driven ideas into our industry’s coverage of politics. Over the last decade, the growing popularity of data-driven journalism has been a major disruption to political journalism in the US and abroad. In my view that disruption has been mostly, but not entirely, positive. Iterations in communicating uncertainty have been particularly impactful on the news media and inspirational to my own career. But it is easy to see an excessive focus on the horse race and minute changes in the latest readings from political polls. This distracts readers from the true trends of an election and the broader importance of democracy.
My work on The Economist’s election forecasting models thus steers readers away from a focus on point predictions and daily movements in probabilities and instead uses methods from machine learning and Bayesian statistics to capture and communicate the full range of uncertainty in an election—using polling data and other sources. Unlike other well-known political data journalists, I focus especially on deciphering whether some polls are more trustworthy than others and whether movements in polls are real or phantom. I look at the underlying data-generating process for political surveys and elucidate for readers, via parameters in our model and in standalone articles, what more shallow analyses of data don’t tell us.
Of course, as a data journalist, it is hard to resist pursuing stories outside of my beat. Last year I wrote for Graphic Detail, our online and print data journalism section, about covid-19 stimulus buoying retail stock trading, the link between religion and attitudes about abortion, and inflation in food and gasoline prices. In the past, I have covered subjects from climate to crypto to the statistics of climbing Mt Everest. In all my projects I work with my stellar colleagues on visualization, editing and programming. My portfolio below details how I use statistical rigor to extract stories from data in new, interesting, and entertaining ways.
Description of portfolio:
My major project in 2022 was The Economist’s statistical model of the 2022 midterm elections for the United States House and Senate. I worked on methods for this project with Dan Rosenheck, our Data Editor, and was responsible for our pipelines gathering, processing, and modeling polling data in every race. We aided the paper’s visual and interactive journalists in producing the online interactive. Unlike other outlets’ presentations of forecasting models, ours highlights confidence intervals like a weather forecast. For those familiar with the contest, we did not predict a “red wave” as some others did, providing a useful signal in the noise of most political coverage last year. (See links 1 and 2.)
As mentioned, The Economist uses its forecasting models to guide other coverage of the election and unearth interesting stories from the data used to train and test its models. This means I get to work with editors outside of our data team on coverage and teach non-data correspondents how to use the model and write about its outputs. This lets us spread the impact of empirical journalism throughout the company. We used the model to write about where voters were sending their campaign contributions, a data source distinct from polling, which is already abundantly covered (link 3).
We also produced forecasting models for the French and Brazilian elections (links 4 and 5). The two-round electoral systems in those countries allowed us to analyze the ideological “lanes” of each country’s party system. This diversity of viewpoints is often obscured by the final-round vote totals for the top two candidates (link 6).
On political process, I also wrangled data on America’s new congressional maps and built a congressional model to analyze how biased they were toward a certain party, and to compare that with their historical slant (link 7). For The Economist’s newsletter on US politics, which is sent to nearly 200,000 readers each Friday, I popularized two metrics of fairness for the maps (bias and responsiveness) that had not previously been covered by political journalists (link 8). This is another example of using my data journalism skills in sections beyond the data team’s designated space. Through our “Checks and Balance” newsletter, from 2020 through 2022 I was the only journalist at our organization to write a weekly article for our subscribers.
For this year’s midterm elections, we also launched a weekly data-driven feature for the paper’s print US section. One of the most popular articles was an investigation into a pollster that was producing results wildly out of sync with the industry’s consensus. We obtained the company’s data, discovered severe problems with its methodology and published our own re-analysis of the data. We used the investigation as an opportunity to teach our readers about how polls work (see link 8). Later, in the same column, we discovered that a slew of polls from right-wing pollsters was biasing the most popular polling aggregation sites in America.
Along similar lines, I used a large database of historical election results to write a story about how Hungary’s far-right ruling party gerrymandered the country’s legislative maps to give itself an advantage in parliament (link 9). We also analyzed the changes the party made to Hungary’s constitution that allowed it to rig the system.
My work required both deploying current skills in my toolset — a mix of traditional social science techniques in the R programming language and programs for Bayesian statistics — and learning how to use C to code complex machine-learning methods for big data and large parameter spaces. I learned additional reporting skills from working with my non-data colleagues.