‘Deterring Democracy’ revealed the secret data strategies of Donald Trump’s Presidential campaign, and his blueprint for dividing America and taking the White House.
After years of source work, we obtained Trump’s 2016 campaign database – one of the largest leaks ever, with 5,000 SQL tables totalling 5TB of data.
In a complex project we rebuilt it, showing how thousands of data points on more than 200 million US citizens were used to segment and manipulate people.
It revealed strategies to deter millions of Black Americans from voting altogether, and splinter the country on racial lines using messages of fear.
The project was a global news event, which trended across Twitter and was reported widely in the UK, Europe, India, Australia and Asia. In the US, it was covered by major outlets including The Washington Post, The Boston Globe, CNN, ABC, PBS, NPR, Salon, Axios, Forbes, The Atlantic, Vanity Fair, Variety, Bloomberg, the Daily Beast and the New York Times among others.
Importantly, the story was also covered in detail by regional US outlets in battleground states including Wisconsin, Michigan, Ohio, Virginia, and Georgia. In Florida, we collaborated with the Miami Herald to produce a series of impactful features on battleground counties, also titled ‘Deterring Democracy’.
The broadcasts galvanised multiple Black rights organisations, including the NAACP and Color of Change, who had long accused the Trump campaign of running a racist and divisive political operation, and charged Facebook with enabling it.
Channel 4 News obtained exclusive interviews with key Black politicians and figures in the US, including the Democratic chief whip Jim Clyburn, campaigner Al Sharpton and Professor Cornel West.
The day after the “deterrence” report was broadcast, former President Barack Obama released a video telling Black voters: “Right now, from the White House on down, folks are working to keep people from voting, especially communities of colour. That’s because there’s a lot at stake in this election. Not just our pandemic response or racial justice, but our democracy itself.”
The technical infrastructure was a MySQL database provisioned to ingest terabytes of data relatively quickly and to handle multiple large-scale joins and queries. We built a bespoke process to speed the bulk ingestion of zipped dump files. Security measures included encrypting the data at every point, and strict limitations and protections on access.
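The bulk-ingestion step described above could be sketched as follows. This is a minimal illustration only: the archive layout, database name and `mysql` client flags are assumptions, not details from the original project.

```python
# Stream each zipped SQL dump straight into the MySQL client, without
# decompressing the whole archive to disk first.
import shutil
import subprocess
import zipfile

def sql_members(archive: zipfile.ZipFile) -> list[str]:
    # Only .sql members are ingested; anything else in the archive is skipped.
    return [m for m in archive.namelist() if m.endswith(".sql")]

def ingest_zipped_dump(zip_path: str, database: str) -> None:
    with zipfile.ZipFile(zip_path) as archive:
        for member in sql_members(archive):
            proc = subprocess.Popen(["mysql", "--batch", database],
                                    stdin=subprocess.PIPE)
            with archive.open(member) as sql_stream:
                # Copy in chunks so multi-gigabyte dumps never sit in memory.
                shutil.copyfileobj(sql_stream, proc.stdin)
            proc.stdin.close()
            if proc.wait() != 0:
                raise RuntimeError(f"ingestion failed for {member}")
```

Streaming from the archive directly into the client’s stdin is one plausible way to achieve the speed gain the bespoke process was built for.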
In addition to our data security measures, we had a further obligation to ensure the highest possible level of protection for our source/s. Early on, we identified technical information in the SQL files that we felt might pose a risk, and decided to remove it. We designed a Python program to perform this redaction and produce a short, clear log, so that any member of the team could check it had succeeded for each file.
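A redaction pass of this shape might look like the sketch below. The patterns are purely illustrative – the real criteria for “risky” technical metadata were, understandably, never published – but the structure is the same: strip matching lines, write a cleaned copy, and log a one-line result per file.

```python
import re
from pathlib import Path

# Hypothetical examples of risky dump metadata, not the project's real list.
RISKY_PATTERNS = [
    re.compile(r"^--\s*Host:"),            # e.g. dump host metadata
    re.compile(r"^--\s*Server version:"),  # e.g. server fingerprint
]

def redact_sql_file(src: Path, dst: Path, log: list[str]) -> None:
    removed = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if any(p.search(line) for p in RISKY_PATTERNS):
                removed += 1
                continue
            fout.write(line)
    # A short, clear log line that any team member can check per file.
    log.append(f"{src.name}: OK, removed {removed} line(s)")
```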
For analysis, two members of our Investigations team learned SQL, allowing a full range of tasks including multi-table joins and complex querying. Joins across multiple tables were especially important in examining the data techniques used by the campaign.
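A toy illustration of that kind of multi-table join, using SQLite in place of the production MySQL database. The table and column names here are invented for the example, not the campaign’s real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voters (voter_id INTEGER, state TEXT);
    CREATE TABLE models (voter_id INTEGER, segment TEXT);
    INSERT INTO voters VALUES (1, 'WI'), (2, 'WI'), (3, 'MI');
    INSERT INTO models VALUES (1, 'Deterrence'), (2, 'GOTV'), (3, 'Deterrence');
""")
# Join the modelling table onto the voter table and count voters per segment.
rows = conn.execute("""
    SELECT m.segment, COUNT(*) AS n
    FROM voters v
    JOIN models m ON v.voter_id = m.voter_id
    GROUP BY m.segment
    ORDER BY m.segment
""").fetchall()
```

The same join-and-aggregate pattern scales from this toy case to thousands of tables.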
We found some statistical analysis was better performed using Python’s pandas library on exported CSVs. One member of the team learned pandas to perform large-scale counts and analysis. Combining SQL table joins with automated pandas processes enabled us to quickly produce numerical analysis for geographies across the United States or, for example, for all precincts in battleground states.
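The counting step might look like this in pandas. The DataFrame stands in for a real CSV export; the column names are assumptions for the example.

```python
import pandas as pd

# Toy stand-in for a CSV exported from the database.
export = pd.DataFrame({
    "precinct": ["A", "A", "B", "B", "B"],
    "segment":  ["Deterrence", "GOTV", "Deterrence", "Deterrence", "GOTV"],
})
# Large-scale counts: rows per (precinct, segment), pivoted for readability.
counts = export.groupby(["precinct", "segment"]).size().unstack(fill_value=0)
```

Run over every precinct export, this yields one table of segment counts per geography.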
A final step in our analysis was to evaluate the result of the campaign’s strategy in key areas. To do this we obtained state voter records and voting history, and joined these to the campaign data. This allowed us to evaluate, for example, how many citizens who were marked for a ‘deterrence’ operation actually turned out to vote in the 2016 election – in some cases very small numbers.
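That evaluation step – joining state voting history onto the campaign data and measuring turnout among voters marked for ‘deterrence’ – can be sketched as below. All IDs and values are invented for illustration.

```python
import pandas as pd

campaign = pd.DataFrame({
    "voter_id":   [1, 2, 3, 4],
    "deterrence": [True, True, True, False],
})
history = pd.DataFrame({
    "voter_id": [2, 4],   # voters who appear in the 2016 voting history
})
history["voted_2016"] = True

# Left-join so every campaign record survives; absence from the history
# file is treated as not having voted.
merged = campaign.merge(history, on="voter_id", how="left")
merged["voted_2016"] = merged["voted_2016"].fillna(False).astype(bool)

marked = merged[merged["deterrence"]]
turnout = marked["voted_2016"].mean()  # share of 'deterrence' targets who voted
```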
What was the hardest part of this project?
The scale of the data involved in this project was beyond anything we’d dealt with before. It required a huge amount of research and upskilling to understand how best to handle it. The quantity of files, and large file sizes, necessitated planning ahead to prepare files for ingestion or analysis, and produced large volumes of outputs that needed to be examined. It meant a mountain of work, all during the heat and hard deadline of the 2020 US election.
A considerable further challenge came in the vital need to understand the people at the end of the Trump campaign’s ‘data points’, to marry our analysis with the stories of the people involved.
Our first film examined the campaign’s ‘deterrence’ operation, which saw predominantly Black urban areas targeted most heavily. We needed to approach residents of these communities with copies of their personal data. This was data they didn’t know existed, and they didn’t know how we had obtained it. We also needed to ask them about the possibility that they had been manipulated, and whether their voting decision in 2016 had been fully their own. In neighbourhoods of Milwaukee, people told us of their suspicion and distrust of the media. Many also felt under fire for low turnouts in 2016, and that they were being ‘blamed’ for Trump’s victory. As a result, our teams faced some understandable hostility.
In all, despite the pressures of the election and Covid travel restrictions, we put three teams on the ground across the US. All spent weeks gaining the trust and cooperation of our contributors. This process required huge sensitivity and perseverance from experienced and talented journalists, and finally allowed us to join up our data analysis with the stories of the people involved.
What can others learn from this project?
Serious concerns have emerged around a growing industry using big data, opaque algorithms and social media manipulation as a key tool in fighting political campaigns around the world.
We began to examine this subject in a previous investigation into the UK-based political data firm Cambridge Analytica in 2017 and 2018, which resulted in a series of films, including a lengthy undercover operation, that won a Bafta, Emmy and Peabody.
In the field of political journalism, which is dominated by access to key figures and off-the-record sources, this project demonstrated an essential point: to hold power to account, reporters must be empowered to scrutinise big data.
Big data increasingly governs the lives of individuals, and algorithms determine how decisions are made. When used alongside social media its power is enormous, and can govern the direction of nations with huge consequences for many millions of people.
We believe obtaining the Trump campaign database marks the first time the techniques of a modern data campaign have been fully exposed in this way. It uncovered material that answered questions as to how the campaign sought to manipulate the electorate – and exposed a blueprint for a divided America that still has serious ramifications today.
But it also poses more questions. It’s vital for journalists to be able to interrogate the consequences of this for people’s lives, and we hope our investigation contributes to the knowledge journalists, campaigners and the public have in this area, to protect the public interest.