2022 Shortlist

Waves of Abandonment

Country/area: United States

Organisation: Grist, The Texas Observer

Organisation size: Small

Publication date: 05/04/2021

Credit: Naveena Sadasivam, Christopher Collins, Clayton Aldern


Naveena Sadasivam is a senior staff writer at Grist. She previously covered environmental issues for The Texas Observer, Inside Climate News, and ProPublica. At ProPublica, she was part of a team that reported on the water woes of the West, a project that was a 2016 Pulitzer Prize finalist for national reporting. She is also a Livingston Award finalist and has been recognized by the Society of Professional Journalists and the Society of Environmental Journalists for her work. Sadasivam has a degree in chemical engineering and a master’s in environmental and science reporting from New York University.

Christopher Collins is an associate editor at The Texas Observer. The Wichita Falls native graduated from Midwestern State University in 2012 with a degree in Mass Communication. He previously worked as a reporter at the Abilene Reporter-News and the Wichita Falls Times Record News and ran a freelance reporting business.

Clayton Aldern is a senior data reporter at Grist. A Reynolds Journalism Institute fellow, he has published reporting and data visualization in a variety of outlets, including The Atlantic, The Economist, and Logic, and his work has also appeared on the floor of the U.S. Senate. He holds a master's degree in neuroscience and a master's in public policy from the University of Oxford, where he studied as a Rhodes scholar. Based in Seattle, he is originally from Minnesota.

Project description:

Volatile oil prices are setting the stage for fossil fuel companies to abandon oil and gas wells en masse, particularly in the Permian Basin straddling Texas and New Mexico. Those two states are already responsible for cleaning up about 7,000 such abandoned wells, a task that will cost at least $335 million. New statistical models developed by Grist and the Texas Observer predict the states could soon find themselves saddled with an additional 13,000 wells in need of cleanup — with a true cost that is closer to $1 billion. This project was supported by the Pulitzer Center.

Impact reached:

The orphan well problem has received considerable attention at the local, state, and national levels since the publication of our series. Regulators were receptive to our findings, and advocates and industry veterans alike sought to apply our methods in other regions. (Our open-source code makes this kind of effort possible.) Our reporting was cited in public testimony at state and federal hearings on well bonding. The infrastructure bill recently passed by Congress included almost $5 billion for remediating old oil and gas infrastructure. While we wouldn't claim to have single-handedly spurred that investment, we're confident we contributed to the broader policy conversation. Furthermore, in January 2022 (nine months after our reporting), the U.S. Interior Department sharply revised upward its estimate of the number of orphan wells nationwide, an adjustment that aligns with our predictions.

With respect to the media ecosystem, our reporting was read by hundreds of thousands of people across a variety of platforms, and our feature was republished by The Guardian, among other outlets. Our Waves of Abandonment series received the Online News Association award in Investigative Data Journalism (Small/Medium Newsroom) earlier this year and contributed to Grist’s win in the General Excellence (Small Newsroom) category.

Techniques/technologies used:

Grist partnered with The Texas Observer to harness the expertise of its reporter Christopher Collins, who covers his native West Texas. Grist staff writer Naveena Sadasivam conducted archival research, interviewed policy experts, and filed dozens of public records requests to obtain the raw material for our data analysis. Grist data reporter Clayton Aldern then mined and combined these datasets and used machine learning to build a statistical model of well abandonment. The model, which generates projections for every well in the Permian Basin, allowed us to predict which of the region's tens of thousands of oil wells are on the verge of being orphaned. A separate survival analysis helped us understand producers' sensitivity to oil prices.
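The entry doesn't say which survival model the team used. As a minimal sketch of the general technique, assuming each well is tracked from the month it goes inactive until it is orphaned (an event) or the data ends (censored), the Kaplan-Meier estimator below computes the share of wells not yet orphaned over time. The function name and inputs are illustrative, not taken from Grist's code.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve.

    durations: months each well was observed after going inactive (hypothetical unit)
    events: 1 if the well was orphaned at that time, 0 if censored (still inactive)
    Returns [(t, S(t))] at each distinct time with at least one event.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # wells leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored wells both leave the risk set
    return curve
```

To probe price sensitivity, one could compute separate curves for wells that went inactive during low-price and high-price periods and compare them; a Cox proportional-hazards model with oil price as a covariate is a common next step.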

Specifically, we built a series of cross-validated LASSO models to identify wells that the states had not yet classified as orphaned but that were statistically indistinguishable from orphan wells. Because our classes were imbalanced (i.e., the dataset contained many more merely 'inactive' wells than orphan wells), we also penalized the models more heavily for misclassifying orphan wells during training. This prevented the models from simply learning to always guess 'inactive', a strategy that would have produced high but misleading accuracy scores. To forecast the public costs of these potential future abandonments, we used American Petroleum Institute (API) identification numbers to look up plugging-cost projections in state databases. We used R/RStudio, Python, Tableau, Adobe Illustrator, and HTML/CSS/JavaScript. More information on our methodology is available in our methods write-up and on GitHub.
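The class-weighting idea can be sketched in a few dozen lines. This is an illustrative stand-in, not Grist's model: a LASSO (L1-penalized) logistic regression fit by proximal gradient descent, where a hypothetical `pos_weight` multiplies the loss on orphan wells. It also shows why plain accuracy misleads: a model that always guesses "inactive" scores well on accuracy but only 0.5 on balanced accuracy. The penalty strength `lam` is taken as given here; in practice it would be chosen by cross-validation, which this sketch skips.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal step for the L1 (LASSO) penalty: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_weighted_lasso_logistic(X, y, lam, pos_weight, lr=0.1, n_iter=2000):
    """Fit an L1-penalized logistic regression by proximal gradient descent.

    X: rows of standardized features; y: 1 = orphaned, 0 = merely inactive.
    pos_weight: loss multiplier for the rare orphan class, so a missed
    orphan well costs more during training than a missed inactive well.
    Returns (intercept, coefficients); the intercept is not penalized.
    """
    p = len(X[0])
    b0, w = 0.0, [0.0] * p
    sample_w = [pos_weight if yi == 1 else 1.0 for yi in y]
    total = sum(sample_w)
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi, si in zip(X, y, sample_w):
            # Weighted gradient of the log-loss for one row.
            resid = si * (sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi)
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        b0 -= lr * g0 / total
        # Gradient step on the coefficients, then the L1 proximal shrinkage.
        w = [soft_threshold(wj - lr * gj / total, lr * lam) for wj, gj in zip(w, g)]
    return b0, w


def predict(b0, w, X, threshold=0.5):
    """Label a row orphaned (1) when its predicted probability reaches the threshold."""
    return [int(sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold) for xi in X]


def accuracy(y, yhat):
    return sum(a == b for a, b in zip(y, yhat)) / len(y)


def balanced_accuracy(y, yhat):
    """Mean per-class recall; a model that always guesses 'inactive' scores 0.5."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        recalls.append(sum(yhat[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Setting `pos_weight` to the ratio of inactive to orphan wells gives both classes equal total weight. In R, for example, `glmnet::cv.glmnet` accepts per-observation `weights` and selects the penalty by cross-validation.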

To help ensure the piece had the most impact, we added eye-catching imagery and drone videography and crafted interactive data visualizations (including, in our methods story, a web-app version of the model) that allow readers to see for themselves how the variables fit together to tell the larger story.

What was the hardest part of this project?

One challenge came from the disparate nature of the datasets in question. Some of our production data went back to the 1960s and had been saved in a variety of obscure formats, including some written for early IBM computers. To sidestep some particularly nasty data-ingestion tasks, we loaded 62 million rows of well- and lease-level data into memory before cleaning them. Not all datasets came to us easily, either. For example, we didn't receive a complete dataset of New Mexico's enforcement actions until we presented the state's regulators with a BeautifulSoup-scraped version of their own public-facing database. The agency had also rebuffed a public records request until we challenged its decision by filing a complaint with the New Mexico attorney general's office. Only after the attorney general reviewed the matter and compelled the agency to comply did we receive the enforcement records.
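The entry doesn't describe the legacy formats in detail. As a hedged sketch, assuming fixed-width EBCDIC records with IBM packed-decimal (COMP-3) numeric fields, which are typical of mainframe-era files, Python's built-in `cp037` codec and a little nibble arithmetic are enough to turn raw bytes into rows. The three-field layout below is hypothetical; a real one would come from the agency's record-layout documentation.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two decimal digits, except the last, whose low nibble is
    the sign: 0xB or 0xD means negative; 0xA, 0xC, 0xE, 0xF mean positive.
    """
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"not a packed-decimal field: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


# Hypothetical record layout: (field name, byte offset, byte length, kind).
LAYOUT = [
    ("well_id", 0, 8, "text"),
    ("year_month", 8, 6, "text"),
    ("oil_bbl", 14, 4, "packed"),
]
RECORD_LENGTH = 18


def parse_records(blob: bytes):
    """Yield one dict per fixed-width record; a trailing partial record is ignored."""
    for start in range(0, len(blob) - RECORD_LENGTH + 1, RECORD_LENGTH):
        record = blob[start:start + RECORD_LENGTH]
        row = {}
        for name, offset, length, kind in LAYOUT:
            field = record[offset:offset + length]
            if kind == "text":
                row[name] = field.decode("cp037").strip()  # EBCDIC (US/Canada) to str
            else:
                row[name] = unpack_comp3(field)
        yield row
```

Because `parse_records` is a generator, the same code could also stream a large file record by record rather than holding every row in memory.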

While other modeling efforts (at think tanks, for example) have sought to understand the orphan well problem at the national level, we believe they rest on unrealistic assumptions about how often the country's inactive wells will be abandoned. Not all inactive wells will be orphaned. We believe our model is the first to operate at the level of the individual oil well and to produce realistic, rather than sensationalist, predictions.
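To make the contrast concrete, the sketch below totals plugging costs for a set of inactive wells in three ways: assuming every one is orphaned, counting only wells a classifier flags as likely orphans, and weighting each well's cost by its predicted probability. It also shows one way to join model output to a cost table by API number, assuming the 10-digit form (state, county, and unique-well codes) is the common key across databases. All IDs, probabilities, and costs are hypothetical, and the entry does not say which totaling approach Grist used.

```python
def normalize_api(api: str) -> str:
    """Reduce an API well number to its first 10 digits (state, county, unique
    well code), dropping dashes and any sidetrack/event suffix."""
    digits = "".join(ch for ch in api if ch.isdigit())
    if len(digits) < 10:
        raise ValueError(f"API number too short: {api!r}")
    return digits[:10]


def plugging_cost_totals(predictions, cost_table, threshold=0.5):
    """predictions: {API number: predicted probability the well is orphaned}
    cost_table: {API number: projected plugging cost in dollars}
    Returns a dict of totals plus any API numbers missing from the cost table."""
    costs = {normalize_api(api): cost for api, cost in cost_table.items()}
    totals = {"all_inactive": 0.0, "flagged": 0.0, "expected": 0.0}
    unmatched = []
    for api, prob in predictions.items():
        cost = costs.get(normalize_api(api))
        if cost is None:
            unmatched.append(api)
            continue
        totals["all_inactive"] += cost  # assumes every inactive well is orphaned
        if prob >= threshold:
            totals["flagged"] += cost  # only wells the model flags as likely orphans
        totals["expected"] += prob * cost  # probability-weighted expected cost
    return totals, unmatched
```

With two matched wells costing $20,000 and $10,000 and orphan probabilities of 0.9 and 0.1, the all-inactive assumption yields $30,000, while the probability-weighted total is $19,000.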

From inception to completion, our reporting process took approximately 11 months.

What can others learn from this project?

We believe this project is a good example of journalists leveraging statistical models to make predictions about the future (as opposed to exclusively describing the present or explaining the past). Certainly, talented data journalists at outlets like The Economist and Bloomberg engage in this kind of projection all the time. Our project offers a reminder that statistical models of public data can have lives outside the economic realm. Furthermore, we believe the models presented here are a good example of combining various subfields of statistical inquiry (in our case, statistical learning/machine learning, null-hypothesis falsification testing, and survival analysis) to paint a fuller, more reliable portrait of real-world phenomena.

Fundamentally, we think our project presents a case study in balancing advanced statistical models with archival research, public-records requests, interviews, and other tools of traditional reporting. Combined with expressive visuals, interactive graphics, and drone videography, the package communicates important findings without deifying numbers over hard qualitative reporting, despite being a piece of "data journalism."

Project links: