“Dangers in Our Air: Mapping Chicago’s Air Pollution Hotspots”

Entry type: Single project

Country/area: United States

Publishing organisation: MuckRock, the Chicago Sun-Times, WBEZ and the Cicero Independiente

Organisation size: Big

Publication date: 2022-05-16

Language: English

Authors: Smarth Gupta of Columbia University’s Brown Institute for Media Innovation; Dillon Bergin of MuckRock; María Inés Zamudio, Charmaine Runes and Maggie Sivit of WBEZ; and Brett Chase of the Chicago Sun-Times provided reporting. Derek Kravitz of MuckRock, Dave Newbart of the Sun-Times and Matt Kiefer of WBEZ edited.


MuckRock is a nonprofit, collaborative news platform that brings together journalists, researchers and the public to request, analyze and share primary source data and documents in the public interest. In addition to its services, training and support programs, MuckRock’s news team works on original editorial projects, including both collaborative and independent reporting efforts on issues of public importance.

Project description:

“Dangers in Our Air: Mapping Chicago’s Air Pollution Hotspots” is a multi-newsroom collaboration analyzing air pollution data from a new Microsoft network of sensors installed atop bus shelters across Chicago and our own PurpleAir sensors installed on the homes of neighborhood volunteers. The result has been the first neighborhood-level map of air pollution across Chicago, along with a collection of audio and print stories examining the reasons behind spikes in particulate matter 2.5 and why pollution coalesces in largely Hispanic and Black enclaves of the city.

Impact reached:

In response to the findings on Fourth of July air pollution, the Chicago Department of Public Health said that the data highlights not just the illegal use of fireworks, but how pollution from fireworks “can affect the health of vulnerable populations” on the city’s South and West Sides. After the second story, the department said that it is only in the beginning stages of ingesting, analyzing and using air sensor data, but will be using data like this in a forthcoming impact study.

In 2023, we’ll continue work on this project by homing in on one of the root causes of air pollution in Chicago’s hot spots: freight hubs and last-mile distribution warehouses. We’ll continue to use data from air quality sensors and combine it with public records such as 311 complaints, inspection violations and environmental impact statements.

Microsoft has also indicated that it may expand its air quality network to cities beyond Chicago. If the network does expand, we plan to provide training and editorial support, as well as code and analysis, to reporters in those cities so that they can make use of the data more quickly and investigate the issues that matter to residents on the ground.

Techniques/technologies used:

Because of the massive amount of data, and the details in each reading, MuckRock, Columbia University’s Brown Institute for Media Innovation, WBEZ and the Chicago Sun-Times initially sought not to answer all questions at once, but one critical question first:

In each month since the sensors were installed, which areas of Chicago have been PM2.5 pollution hotspots?

To do this, we stored the dataset in our own database, then averaged sensor readings by hour, day and month. At each aggregation step, we applied a “data completion criterion” to match the data quality standards laid out by the EPA for Hotspot Identification and Characterization. Any aggregated reading from a sensor that had a sampling rate of less than 75% in the defined aggregation period was excluded.
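The aggregation step can be illustrated with a minimal sketch. It is not the project’s actual code: the function names are hypothetical, and it assumes a single sensor reporting every five minutes (12 readings expected per hour, 24 hourly averages expected per day). Only the 75% threshold comes from the description above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

MIN_COMPLETENESS = 0.75  # threshold stated in the text


def aggregate(readings, period_of, expected_count):
    """Average (timestamp, value) pairs per period, dropping sparse periods.

    period_of maps a timestamp to the period it belongs to (hypothetical helper).
    expected_count is how many readings a fully sampled period would contain.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[period_of(timestamp)].append(value)
    return {
        period: mean(values)
        for period, values in buckets.items()
        if len(values) / expected_count >= MIN_COMPLETENESS
    }


def hourly_means(five_minute_readings):
    # Assumption: one reading every five minutes, so 12 per complete hour.
    to_hour = lambda ts: ts.replace(minute=0, second=0, microsecond=0)
    return aggregate(five_minute_readings, to_hour, 12)


def daily_means(hourly):
    # A day needs at least 75% of its 24 hourly averages to be kept.
    return aggregate(hourly.items(), lambda ts: ts.date(), 24)


# Example: a full hour, a sparse hour (8 of 12 readings) and a borderline hour (9 of 12).
start = datetime(2021, 7, 4)
sample = (
    [(start + timedelta(minutes=5 * i), 10.0) for i in range(12)]
    + [(start + timedelta(hours=1, minutes=5 * i), 50.0) for i in range(8)]
    + [(start + timedelta(hours=2, minutes=5 * i), 20.0) for i in range(9)]
)
hourly = hourly_means(sample)  # the sparse hour is excluded
daily = daily_means(hourly)    # only 2 of 24 hours are present, so the day is excluded
```

Monthly averages would follow the same pattern, with the expected count set to the number of days in the month.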

At this point, the collected readings still only showed the amount of PM2.5 recorded at each individual sensor, which is largely meaningless without being calibrated for context.

To estimate how the reading of an individual sensor might explain pollution in specific neighborhoods, census tracts or community areas, we used a mathematical formula called Inverse Distance Weighting, or IDW. This popular interpolation technique takes the readings from all sensors at a given time and weights each one by the inverse of the sensor’s distance from a location, so that nearer sensors count for more, to estimate the level of PM2.5 pollution in the areas between sensors. We used this technique to estimate air quality levels at more than 50,000 uniformly distributed geocoordinates across the Chicago region, then aggregated those estimates to the census tract level. The tract-level estimates were then normalized against the average estimated air quality in Chicago.
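As a rough illustration of the interpolation described above, here is a minimal IDW sketch. Several details are assumptions rather than the project’s documented choices: the distance exponent of 2, the use of flat (already projected) x/y coordinates, the hypothetical `tract_of` lookup, and normalization as a simple ratio to the citywide mean.

```python
import math
from collections import defaultdict
from statistics import mean


def idw_estimate(point, sensors, power=2.0):
    """Estimate PM2.5 at `point` from (x, y, value) sensor tuples.

    power=2.0 is an assumed exponent; the text does not say which was used.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in sensors:
        distance = math.hypot(point[0] - sx, point[1] - sy)
        if distance == 0.0:
            return value  # the point sits exactly on a sensor
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def tract_scores(grid_points, sensors, tract_of):
    """Average grid estimates per tract, then divide by the citywide mean.

    tract_of is a hypothetical function mapping a grid point to a tract ID.
    A score above 1.0 means the tract is estimated to be above the city average.
    """
    estimates = [(tract_of(p), idw_estimate(p, sensors)) for p in grid_points]
    city_mean = mean(value for _, value in estimates)
    by_tract = defaultdict(list)
    for tract, value in estimates:
        by_tract[tract].append(value)
    return {tract: mean(values) / city_mean for tract, values in by_tract.items()}


# Example with two made-up sensors and two grid points.
sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
grid = [(0.5, 0.0), (1.5, 0.0)]
scores = tract_scores(grid, sensors, lambda p: "west" if p[0] < 1 else "east")
```

In a real analysis, the grid would hold the tens of thousands of points mentioned above, and the tract lookup would come from a point-in-polygon test against census tract boundaries.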

Context about the project:

Chicago’s air quality is among the worst in the U.S., and the city has several local hotspots for particulate matter 2.5 — the tiny particles that come from diesel trucks and industry and enter people’s lungs and blood, causing significant health problems.

Between April and July 2021, the tech company Microsoft installed 115 new air quality sensors it built across Chicago. Microsoft’s “mesh network” represented one of this country’s first large-scale air quality projects — a novel way to explore neighborhood-by-neighborhood differences in pollution.

For the past 18 months, journalists and data scientists at MuckRock and Columbia University’s Brown Institute for Media Innovation, along with partner newsrooms throughout Chicago, have used Microsoft’s Application Programming Interface, or API, to analyze this data for a series of stories on the city’s comparatively poor air quality. The API acts as a pipeline to the 115 sensors around Chicago and the information those sensors have recorded for more than a year — every five minutes, every hour of the day.

We then went a step further, installing our own air quality sensors in Chicago neighborhoods that lacked coverage in the Microsoft network — and looked for trends and spikes in pollution.

The sensors, both our PurpleAir devices and the Microsoft models, can be used to fill in gaps where there aren’t higher quality regulatory sensors, to identify spikes and hotspots, and to empower people with concerns about the air pollution in their neighborhoods, experts told us. Microsoft said it is also working to make its data easier to navigate for the general public.

But the Microsoft API was, and remains, inaccessible to most users. And the data work to accurately calibrate, plot and visualize the data is particularly daunting: If three months’ worth of data were placed into one spreadsheet, it would contain more than 8 million rows.

What can other journalists learn from this project?

With our interpolated data in hand, we were able to plot the estimated air quality readings, by month, across the entire city. We quickly found that predominantly Hispanic and Black portions of Little Village, Austin, Englewood, Auburn Gresham, Irving Park and Avondale, which are near heavy traffic and industrial areas, recorded the highest PM2.5 readings. But these estimates only tell part of the story. We then needed to connect with Chicagoans who live in the most impacted neighborhoods.

With WBEZ and the Sun-Times, we asked the public for help: to let us install our own air quality sensors so we could compare our readings with the Microsoft network and other sensors, such as EPA regulatory stations and other PurpleAir sensors. That callout resulted in more than 200 volunteers from across Chicago writing to us, asking for a sensor to be installed outside their home. We had to choose just two families: one on Chicago’s Far Southeast Side, near several intermodal freight hubs, and the other on the Northwest Side, just across from one of the city’s most congested highways.

Both families had long struggled with the health effects of poor air quality near their homes, including asthma attacks and scarring on lung tissue. We interviewed them repeatedly over the next several months, relying on their experiences with poor air quality around the Fourth of July and summertime pollution spikes to make sense of the data we were seeing.

This resulted in two follow-up stories: one looking at the spike in air pollution on the Fourth of July and the other looking at an unexplained jump in pollution in the city’s East Side neighborhood during late July, which likely led to the sensor becoming clogged and being decommissioned entirely.

Project links: