The Associated Press used public records requests across 50 states to compile an exclusive dataset on the status and conditions of dams in the U.S. It found at least 1,688 high-hazard dams in poor or unsatisfactory condition, located in places where failure would likely kill at least one person. The AP shared its data — and thousands of inspection and emergency documents — with member news organizations so local outlets could do local stories using the AP’s research.
The results were immediate. Newspapers across the country editorialized about the need for lawmakers to prioritize the problem. A series of local newspapers in Pennsylvania wrote: “You ignore infrastructure issues at your own risk. That’s why every Pennsylvanian, including the 450,000 folks in York County, should be more than a little concerned about a recent Associated Press report on the condition of the state’s nearly 3,400 dams.” In Iowa, The Daily Nonpareil wrote, “Given the growing frequency and intensity of storms and the potential loss of life associated with dam failures caused by those storms, Congress should begin working to develop national standards for dam inspections.” The Columbus Dispatch editorial noted the AP story and its own localization of the project in writing, “Ohio’s dam guardians have no time to waste.” Some states, including Arizona, began studying the condition of their dams and how to make them safer as a result of the AP’s investigation.

Weeks after the story moved, U.S. Sen. Kirsten Gillibrand called for congressional action to provide more oversight of the nation’s dams and more money to fix them, saying, “We should not wait for a catastrophic dam failure or major flooding event to spur us to action.”
To compile and analyze the original dataset used in the dams project, we used a variety of data acquisition, analysis and visualization techniques. We filed FOIA requests for the states’ submissions of dam databases to the National Inventory of Dams. In many cases we received CSV files that could be loaded directly into a database, but some states’ submissions came as PDF files that had to be parsed, with a first pass using Tabula and then double-checking by hand. A custom Ruby script iterated through the state data files, standardized headers using a metadata spreadsheet we created by hand in Excel, and exported state-specific CSVs in an easy-to-combine format. A custom Python script pulled down data from the National Inventory of Dams in 2016, for comparison with what we received from the states, before the structured data was made available in a downloadable format in 2018.
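The header-standardization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the AP’s actual Ruby script: the column names in `HEADER_MAP` are hypothetical stand-ins for the metadata spreadsheet the team built by hand in Excel.

```python
import csv

# Hypothetical metadata mapping: each state's raw column name -> standard header.
# The real project encoded this mapping in a hand-built Excel spreadsheet.
HEADER_MAP = {
    "Dam Name": "dam_name",
    "DAM_NAME": "dam_name",
    "Hazard": "hazard_potential",
    "HAZARD_CLASS": "hazard_potential",
    "Cond": "condition",
    "CONDITION_ASSESSMENT": "condition",
}

def standardize(in_path, out_path):
    """Read one state's raw CSV and rewrite it with standardized headers."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        # Rename known columns to the standard name; pass unknown columns through.
        fieldnames = [HEADER_MAP.get(h, h) for h in reader.fieldnames]
        writer = csv.writer(fout)
        writer.writerow(fieldnames)
        for row in reader:
            writer.writerow([row[h] for h in reader.fieldnames])
```

Running this over each state file produces per-state CSVs that share one header vocabulary and can be concatenated into a single national dataset.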
R was our primary tool for analyzing data. We compiled the state-provided CSVs into one file, standardized values in individual columns to account for the different ways states encoded them, and encapsulated answers about the data received from states into structured data. We used R for analysis including basics such as sorting, counting and grouping dams, as well as graphing the data to visually explore outliers and surface reporting avenues.

Mapping was an important component of the project. We used libraries in R, as well as the program QGIS, to map dams we wanted to focus on by coordinates, so we could look at our data geographically. We also compared dam locations to those of past disasters, such as earthquakes and floods. Eventually, our partners at ESRI used ArcGIS Online to create an interactive map for the public of the dam coordinates we focused on.
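The sort/count/group analysis described above was done in R, but the same kind of query can be sketched in standard-library Python. The records below are made-up examples, not the project’s data; the field names (`state`, `hazard`, `condition`) are hypothetical stand-ins for the standardized columns.

```python
from collections import Counter

# Made-up sample records standing in for the combined state dataset;
# the real data was assembled from the state CSV files.
dams = [
    {"state": "PA", "hazard": "High", "condition": "Poor"},
    {"state": "PA", "hazard": "High", "condition": "Satisfactory"},
    {"state": "IA", "hazard": "High", "condition": "Unsatisfactory"},
    {"state": "OH", "hazard": "Low", "condition": "Poor"},
]

# Count high-hazard dams in poor or unsatisfactory condition, grouped by
# state, mirroring the basic counting and grouping done in R.
at_risk = Counter(
    d["state"]
    for d in dams
    if d["hazard"] == "High" and d["condition"] in ("Poor", "Unsatisfactory")
)
print(at_risk)  # Counter({'PA': 1, 'IA': 1})
```

In the actual project the equivalent grouping across tens of thousands of records is what surfaced the 1,688 high-hazard dams in poor or unsatisfactory condition.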
What was the hardest part of this project?
The fundamental challenge of the project was also essential to its importance: because the National Inventory of Dams redacts key fields needed to assess the state of the nation’s dams, AP reporters had to file records requests in every state for the information the federal government would not release. Initial responses from state officials necessitated additional rounds of FOIA requests, a process that took more than a year. And of course when the data was made available, the formats varied wildly. Analyzing the data, comparing it to what was available at the federal level, and then vetting and organizing the state reports, some released only in hard copy, took months longer. As the project progressed, reporters and data journalists also had to monitor the states for updated information to keep the data fresh. Once the data was usable for AP analysis, it had to be packaged in a way that would be easily accessible to thousands of AP customers across the country for localizations.
What can others learn from this project?
This project exemplifies the power of collaboration to overcome data roadblocks. The redaction of key information that people need to understand their own environmental safety could have been seen as a dealbreaker for any comprehensive coverage of this issue. Instead, AP data journalists and reporters found a way around the roadblock, and by sharing that data with other news organizations pre-publication, we were able to create a groundswell of coverage and concern that led to real impact. Local news organizations were essential to telling the ground-truth story for their communities, and the collaboration allowed them to compare the dams in their coverage area with the state of dams across the country — providing essential context that is missing from the occasional dangerous-dam story.