We analyzed genomic sequencing data and used phylogenetic tree visualization software to show that even after the CDC ended its outbreak investigation into Salmonella Infantis, the dangerous and drug-resistant strain was still running rampant through the chicken industry and sickening tens of thousands of people. The unchecked spread of this strain is emblematic of America’s baffling, broken food safety system, which is ill-equipped to protect consumers or rebuff industry influence. We also built an interactive database using the USDA’s microbiological sampling data to allow consumers to look up the salmonella rates of the plants that produced their chicken or turkey.
A week before the first story was published, and after about a month of interview requests about our findings, the USDA announced that it was rethinking its approach to salmonella. In November, the department asked a key advisory committee for suggestions on how to refocus its testing program on public health risks. In particular, the USDA said it wanted recommendations on how to target the riskiest types of salmonella, how to account for the amount of salmonella present and how to better control the bacteria on farms, all vulnerabilities highlighted by ProPublica’s reporting. In early December, the USDA asked poultry companies for project proposals to test new strategies for reducing contamination. ProPublica’s reporting also spurred one of the country’s leading food safety lawyers, Bill Marler, to threaten to sue the USDA if it did not respond to his long-pending petition to ban the sale of raw meat and poultry tainted with certain types of salmonella, including Infantis.
Our Chicken Checker app engaged thousands of consumers across the country who, for the first time, were empowered to make more informed shopping decisions. Through the app, we received nearly 900 submissions from people who collected information from packages of poultry, which allowed us to see which supermarkets had received poultry from the most problematic plants. Several readers wrote in to say that they had sought to avoid industrial poultry processors and were surprised to learn through Chicken Checker that the organic, free-range chicken they paid a premium for had actually been processed by a big chicken company. One Maryland reader called our investigation an “eye-opening and upsetting piece.” “Congress is supposed to be protecting us, the consumer, and yet they are constantly letting us down by siding with the very industry they are supposed to be protecting us from.”
Data reporter Irena Hwang used a combination of command-line tools, DB Browser for SQLite and various open-source Python libraries. Hwang used command-line tools to pull data from the public API of the NCBI Pathogen Detection Browser (https://www.ncbi.nlm.nih.gov/pathogens/) and DB Browser for SQLite to convert the raw TSV files into queryable SQL databases. She then used JupyterLab to write Python scripts, built on packages including pandas and sqlite3, that combined and analyzed data from public APIs and from state and federal information requests. Hwang also used the Interactive Tree of Life (https://itol.embl.de/), software developed by researchers in Germany, to visualize phylogenetic data.
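The TSV-to-SQL step can be sketched in Python. This is a minimal illustration under stated assumptions, not Hwang's actual scripts: it loads the table with pandas instead of DB Browser for SQLite, and the sample rows and column names (isolate_id, serovar, isolation_source, collection_year) are hypothetical stand-ins for the Pathogen Detection metadata, not NCBI's exact schema.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical excerpt of a tab-separated isolate metadata file.
# Column names are illustrative assumptions, not NCBI's exact schema.
RAW_TSV = (
    "isolate_id\tserovar\tisolation_source\tcollection_year\n"
    "iso1\tInfantis\tchicken breast\t2022\n"
    "iso2\tInfantis\thuman stool\t2023\n"
    "iso3\tEnteritidis\tchicken carcass\t2023\n"
    "iso4\tInfantis\tground chicken\t2023\n"
)


def load_tsv_into_sqlite(tsv_text: str, conn: sqlite3.Connection, table: str) -> None:
    """Parse TSV text with pandas and write it to a SQLite table."""
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    df.to_sql(table, conn, if_exists="replace", index=False)


def count_by_serovar_and_year(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    """Count isolates per serovar and collection year with plain SQL.

    The table name is interpolated directly, so it must come from trusted code.
    """
    query = f"""
        SELECT serovar, collection_year, COUNT(*) AS n_isolates
        FROM {table}
        GROUP BY serovar, collection_year
        ORDER BY serovar, collection_year
    """
    return pd.read_sql_query(query, conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_tsv_into_sqlite(RAW_TSV, conn, "isolates")
    print(count_by_serovar_and_year(conn, "isolates"))
```

Once the data sits in SQLite, questions such as "how many Infantis isolates were collected each year" become one SQL query, and the result comes back as a DataFrame for further analysis in the notebook.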
News applications developers Andrea Suozzo and Ash Ngu combined 15 datasets, including the USDA’s list of registered poultry processing plants and the corresponding salmonella sampling data, into a PostgreSQL database. They adapted the federal regulatory methodologies used to evaluate salmonella prevalence in plants so that they focused on the types of salmonella most likely to cause human illness, then used Ruby on Rails and D3 to build a searchable front end that surfaces and visualizes that data. The Chicken Checker app showed consumers how to find the plant code, which may appear in several places on raw poultry packaging, and presented information about the plant’s salmonella record in an easily understood format.
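A rough sketch of that adaptation, in Python with pandas: instead of counting every salmonella-positive sample, it counts only positives for a chosen set of serotypes and reports a per-plant rate. The serotype list, the column names and the plant-code normalization rule are hypothetical assumptions for illustration; this is not the methodology ProPublica or the USDA used, and the app itself was built on PostgreSQL and Rails.

```python
import pandas as pd

# Hypothetical, illustrative set of serotypes treated as high risk.
# Not ProPublica's or the USDA's actual list.
HIGH_RISK_SEROTYPES = {"Infantis", "Enteritidis", "Typhimurium"}


def normalize_plant_code(code: str) -> str:
    """Hypothetical normalizer: uppercase and drop spaces and hyphens,
    so 'p-1234' and 'P 1234' both become 'P1234'."""
    return code.upper().replace("-", "").replace(" ", "")


def high_risk_prevalence(samples: pd.DataFrame) -> pd.DataFrame:
    """Share of each plant's samples positive for a high-risk serotype.

    Expects columns 'plant_code' and 'serotype'; 'serotype' is None/NaN
    for samples in which no salmonella was detected.
    """
    df = samples.copy()
    df["plant_code"] = df["plant_code"].map(normalize_plant_code)
    # isin() returns False for None/NaN, so negative samples count as not high risk.
    df["high_risk"] = df["serotype"].isin(HIGH_RISK_SEROTYPES)
    summary = (
        df.groupby("plant_code")
        .agg(n_samples=("high_risk", "count"), n_high_risk=("high_risk", "sum"))
        .reset_index()
    )
    summary["high_risk_rate"] = summary["n_high_risk"] / summary["n_samples"]
    return summary.sort_values("high_risk_rate", ascending=False, ignore_index=True)


if __name__ == "__main__":
    samples = pd.DataFrame(
        {
            "plant_code": ["P-1234", "p1234", "P 1234", "P-1234", "P-5678", "P-5678"],
            # "Kentucky" is a detected serotype that is not on the illustrative list.
            "serotype": ["Infantis", None, "Kentucky", "Enteritidis", None, None],
        }
    )
    print(high_risk_prevalence(samples))
```

In this toy data, plant P1234 has four samples, two of them positive for listed serotypes, so its rate is 0.5, while the Kentucky-positive sample does not count against it. Normalizing the plant code first means a lookup works however a shopper types the code from the package.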
What was the hardest part of this project?
The project depended on analysis of data that was new to ProPublica, particularly genomic sequencing data. Given the highly specialized nature of this data, much of our reporting focused on gaining a thorough understanding of the origin and scope of the data, identifying which analyses were most informative and useful, finding and familiarizing ourselves with the right software for analysis, interpreting our results and verifying those results with federal agencies and nearly a dozen outside experts.
What can others learn from this project?
ProPublica is excited to bring underutilized databases like the NCBI Pathogen Detection Browser and the USDA Food Safety and Inspection Service’s laboratory data to the attention of other journalists. We believe that these databases can and should be used for additional reporting on food safety and public health, and for accountability stories about the federal agencies that gather and review this data in order to regulate industries. We also believe that this story can help expand the definition and scope of data journalism into a field that can draw on even the most esoteric datasets from academic science. Investigative stories are often described as data-driven, and we believe that our story and the Chicken Checker news application broaden that description to include “science-based” and “public-service oriented.”