2022 Shortlist

Sacrifice Zones: Mapping Cancer-Causing Industrial Air Pollution

Country/area: United States

Organisation: ProPublica

Organisation size: Big

Publication date: 02/11/2021

Credit: Lylla Younes, Al Shaw, Ava Kofman, Lisa Song, Max Blau, Maya Miller, Kiah Collier, Alyssa Johnson, Ken Ward Jr., Jeff Kao, Lucas Waldron


Lylla Younes and Al Shaw are interactive data reporters and news applications developers.

Ava Kofman, Lisa Song, Max Blau, Maya Miller, Kiah Collier, Alyssa Johnson and Ken Ward Jr. are reporters.

Jeff Kao is a computational journalist.

Lucas Waldron is a visual investigations producer.

Project description:

In a groundbreaking interactive-first investigation that the EPA’s own staffers praised as “a wake-up call,” ProPublica revealed more than 1,000 hot spots of cancer-causing industrial air pollution that the agency allowed to take root across America. These are “sacrifice zones.” Residents pay the price so that consumers can enjoy products made there. We captured the ways EPA has failed to protect the public, not just through weak policies, but through deliberate choices recounted to us on the record by insiders. This project was conceived by journalists on our interactive data team and grew to include expertise from across our newsroom.

Impact reached:

The project, which the EPA’s own staffers praised as “a wake-up call” and “a huge bucket of cold water in the face,” led to the kind of impact environmental advocates said they had been working for decades to achieve. Two days after the first parts were published, the EPA announced that its administrator, Michael S. Regan, would visit the communities we featured; on his tour, he said the agency had “looked very carefully” at ProPublica’s reporting and was “incorporating much of it” into plans for reform, which include increasing air monitoring and enforcement and reexamining the way the agency assesses cancer risk. New cumulative risk assessment guidelines are expected to be released in early 2022, along with an updated “more robust” analysis of air pollution. In response to our reporting, officials launched air monitoring efforts in Laredo, Texas, and Pascagoula, Mississippi.

The investigation, which we distributed to impacted communities through an unprecedented engagement effort, also led to a groundswell of activism among residents, many of whom said they had been unaware of the dangers they’d faced. Residents lobbied for air monitoring, packed town halls, circulated petitions, started neighborhood health surveys, and called for the CDC to conduct blood testing on schoolchildren. 

More than 60 local TV stations aired segments about our analysis; at least 16 local newspapers did the same. Their stories extended our impact: an article in a Michigan newspaper led state officials to investigate a polluter that had never been permitted; a Missouri television station’s report prompted such community outrage over the extreme cancer risks we revealed that the EPA called a meeting in Verona, Missouri. “As soon as I saw that report, I knew I needed to come down here tonight,” Greg Winters told us at the meeting. “It pissed me off.”

Techniques/technologies used:

We used five years of data from the EPA’s Risk-Screening Environmental Indicators (RSEI) database, along with the EPA’s Toxics Release Inventory, to generate estimated additional cancer risks from industrial emissions down to 810 x 810 meter grid cells for the whole country. We averaged values over the years 2014-2018 to get a better idea of long-term exposure. We used US Census data to determine racial disparities in cancer risk from industrial emissions. We obtained all of this data through online government servers.
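For illustration only, here is a minimal Python sketch of the multiyear averaging step described above. It is not the production pipeline, which ran at far larger scale. The record layout (cell ID, year, risk), the function name and the sample values are all hypothetical, and the sketch assumes that a grid cell with no record for a given year contributes zero risk for that year.

```python
from collections import defaultdict

# Assumed averaging window: 2014 through 2018 inclusive.
YEARS = range(2014, 2019)


def average_risk_by_cell(records, years=YEARS):
    """Average estimated cancer risk per grid cell over a fixed set of years.

    records: iterable of (cell_id, year, risk) tuples (hypothetical layout).
    Years missing for a cell are treated as zero risk (an assumption).
    """
    years = set(years)
    totals = defaultdict(float)
    for cell_id, year, risk in records:
        if year in years:  # ignore anything outside the window
            totals[cell_id] += risk
    n_years = len(years)
    return {cell: total / n_years for cell, total in totals.items()}


# Made-up sample values, keyed by (column, row) grid coordinates.
sample = [
    ((10, 20), 2014, 2e-5),
    ((10, 20), 2015, 4e-5),
    ((10, 20), 2018, 3e-5),
    ((11, 20), 2016, 5e-6),
    ((11, 20), 2013, 9e-4),  # outside the window, so it is skipped
]
averaged = average_risk_by_cell(sample)
```

Dividing by the full number of years, rather than by the number of years with data, keeps a single bad year from looking like a sustained exposure. Whether that matches the original analysis is not stated in this entry.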


At around 7 billion rows, our data was too large for the analysis tools we’d normally use, so we turned to Google BigQuery. Using BigQuery, we were able to compute cancer risks at an incredibly high resolution: each grid cell in our analysis represents a quarter of a square mile of the country. We used Ruby and Python to write a clustering algorithm that generated “hot spots” around areas where the estimated lifetime cancer risk from industrial emissions exceeded 1 in 100,000. We used R to do our race analysis. We used Ruby, Python (with rasterio) and Photoshop to generate static maps of high-risk zones. To make our interactive map, we took the result of our analysis and compressed it into MBTiles format with tippecanoe. We then designed the interactive map with Mapbox Studio and wrote a JavaScript web application using Vue.js and the Mapbox API to geocode user input, query the data and surface it for readers. The web application also included d3.js charts for individual results.
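This entry does not spell out the clustering algorithm, so the Python sketch below shows one plausible approach rather than the method actually used: grouping edge-adjacent grid cells whose risk exceeds the 1 in 100,000 threshold into connected components. The function name, the (column, row) cell keys and the sample values are hypothetical.

```python
from collections import deque

# Threshold taken from the text: lifetime cancer risk above 1 in 100,000.
THRESHOLD = 1 / 100_000


def find_hot_spots(risk_by_cell, threshold=THRESHOLD):
    """Group above-threshold cells that share an edge into clusters.

    risk_by_cell: dict mapping (col, row) -> estimated risk (hypothetical layout).
    Returns a list of clusters, each a sorted list of (col, row) cells.
    """
    hot = {cell for cell, risk in risk_by_cell.items() if risk > threshold}
    seen = set()
    clusters = []
    for start in sorted(hot):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:  # breadth-first flood fill over neighboring hot cells
            x, y = queue.popleft()
            cluster.append((x, y))
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in hot and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters


# Made-up sample grid.
grid = {
    (0, 0): 3e-5, (1, 0): 2e-5, (1, 1): 5e-5,  # three connected hot cells
    (2, 1): 4e-6,                              # below the threshold
    (5, 5): 1.2e-4,                            # isolated hot cell
}
hot_spots = find_hot_spots(grid)
```

In this sample, the three adjacent cells form one hot spot and the isolated cell forms a second. A real implementation would also have to decide how to treat diagonal neighbors and small gaps between high-risk cells.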

What was the hardest part of this project?

Five years of EPA-modeled industrial chemical concentrations added up to about 7 billion rows of data. Turning the disaggregated concentration data into cancer risks required learning how big data systems worked, and then learning how to distill the outputs into something that could be served in a web application and queried on the fly. It took the better part of a year to develop that pipeline. But once we had initial findings, we ran into another issue: the quality of the government’s data. Because the EPA doesn’t directly monitor the air, it accepts self-reported emissions estimates that companies often derive using flawed formulas. The EPA does little to check the accuracy of these numbers and failed to catch major errors that our reporters began to spot. To publish an analysis we could trust, the entire reporting team undertook a vast, weekslong data quality scrub that the agency had never bothered to do. The scrub led more than two dozen facilities to correct their data with the EPA, and it led agency officials to admit that the EPA needs to do a better job of ensuring data integrity. We then wrote software to reintegrate those updated submissions into our overall analysis.

From the very start, we recognized that all of this technical work would amount to little, however, if it neglected to serve the people in these hot spots. We launched the most ambitious and far-ranging community engagement endeavor ProPublica has ever undertaken to make sure our work reached those most impacted by the risks we’d uncovered. We reported on the ground in 10 states and mailed postcards to 8,800 homes. In the end we heard from more than 1,000 impacted residents across 34 states, many of whom had been unaware of the dangers posed by nearby facilities.

What can others learn from this project?

Cross-newsroom collaborations yield incredible results. The custom analysis undertaken by our newsroom’s data journalists for over a year laid a foundation upon which we could tell uniquely authoritative stories. Seven reporters then joined in the effort to illuminate how and why these hot spots came to be. Together, the team distilled our findings, powered by billions of rows of data, countless records and scores of interviews, into lucid language with a clear presentation. Our visuals team, for instance, developed an interactive graphic to teach readers about cumulative risk, a concept they needed to see to understand.

We learned that giving readers such an intimate and personalizable look at a problem makes for effective storytelling. Readers had strong emotional reactions to being able to plug in their address and, for the very first time, get a precise view of the estimated industrial cancer risk where they live, especially since no one else had ever compressed and processed this data or made it accessible in this interactive form. Our map quantified a problem that in many places was previously anecdotal, allowing residents in the most marginalized communities to point to hard data when discussing the risks in their neighborhood.

Visualizing and publishing a government agency’s data in such a granular way can also drive change and conversations among policymakers. EPA employees told us that our presentation has led the agency to improve its own data analysis efforts.

Finally, the careful way in which we approached our analysis helped it be taken seriously by experts who might otherwise dismiss work by non-academics. Prior to publication, we invited air toxics scientists to give us feedback during map demonstrations and went over our detailed methodology with them, word for word. This was hard and unglamorous work, but it demonstrated to key stakeholders that our approach was sound.

Project links: