Volatile oil prices are setting the stage for fossil fuel companies to abandon oil and gas wells en masse, particularly in the Permian Basin straddling Texas and New Mexico. Those two states are already responsible for cleaning up about 7,000 such abandoned wells, a task that will cost at least $335 million. New statistical models developed by Grist and the Texas Observer predict the states could soon find themselves saddled with an additional 13,000 wells in need of cleanup — with a true cost that is closer to $1 billion. This project was supported by the Pulitzer Center.
The orphan well problem has received considerable attention at the local, state, and national levels since the publication of our series. Regulators were receptive to our findings, and advocates and industry veterans alike sought to translate our methods to other geographic contexts. (Our open-source code makes this type of effort possible.) We saw our reporting cited in various public testimonies in state and federal hearings on well bonding. The recent infrastructure bill passed by Congress included almost $5 billion for the remediation of old oil and gas infrastructure — and while we wouldn’t claim to have unilaterally spurred that particular investment, we’re confident we contributed to the broader policy conversation. Furthermore, in January 2022 (nine months after our reporting), the U.S. Interior Department drastically revised upward its estimate of the number of orphan wells nationwide. The adjustment aligns with our predictions.
With respect to the media ecosystem, our reporting was read by hundreds of thousands of people across a variety of platforms, and our feature was republished by The Guardian, among other outlets. Our Waves of Abandonment series received the Online News Association award in Investigative Data Journalism (Small/Medium Newsroom) earlier this year and contributed to Grist’s win in the General Excellence (Small Newsroom) category.
Grist partnered with The Texas Observer to harness their reporter Christopher Collins’ expertise covering his native West Texas. Grist staff writer Naveena Sadasivam conducted archival research, interviewed policy experts, and filed dozens of public records requests to get the raw material for our data analysis. Grist data reporter Clayton Aldern then mined and combined these datasets and leveraged machine learning to create a statistical model of well abandonment. The model — which generates projections for every well in the Permian Basin — allowed us to predict which of the region’s tens of thousands of oil wells are on the verge of being orphaned. A separate survival analysis helped us understand producers’ sensitivity to oil prices.
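Our full model lives in the open-source repository, but the survival-analysis component can be illustrated in miniature. The sketch below is not our production code: it implements a standard Kaplan-Meier estimator over invented well lifetimes, where "censored" wells are those still producing when observation ends.

```python
# Minimal Kaplan-Meier survival estimator -- a simplified illustration of
# the kind of survival analysis used to study well lifetimes.
# All data below are invented for demonstration purposes.

def kaplan_meier(durations, observed):
    """Return [(time, survival_prob)] for right-censored lifetime data.

    durations: years each well was tracked
    observed:  True if the well went inactive (an "event"),
               False if it was still producing (censored)
    """
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        removed = 0
        # Group all wells sharing this event time.
        while i < len(pairs) and pairs[i][0] == t:
            events += pairs[i][1]  # True counts as 1
            removed += 1
            i += 1
        if events:
            # Multiply in the conditional survival probability at time t.
            survival *= (n_at_risk - events) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored wells leave the risk set silently
    return curve

# Hypothetical lifetimes (years) for ten wells; False = still producing.
durations = [2, 3, 3, 5, 6, 6, 8, 10, 12, 15]
observed = [True, True, False, True, True, True, False, True, False, True]
print(kaplan_meier(durations, observed))
```

In practice one would reach for a dedicated library and add covariates such as oil price, which is where producers' price sensitivity enters the analysis.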
To maximize the piece’s impact, we added eye-catching imagery and drone videography and crafted interactive data visualizations (including, in our methods story, a web-app version of the model) that allow readers to see for themselves how the variables fit together to tell the larger story.
What was the hardest part of this project?
One challenge came from the disparate nature of the datasets in question. Some of our production data went back to the 1960s and had been saved in a variety of obscure formats, including some written for early IBM computers. To sidestep some particularly nasty data-ingestion tasks, we loaded 62 million rows of well- and lease-level data into memory before cleaning. And not all datasets came to us easily: We didn’t receive a complete dataset of New Mexico’s enforcement actions, for example, until we presented the responsible agency with a BeautifulSoup-scraped version of its own public-facing database. A public records request sent to the agency was likewise rebuffed until we challenged the decision by submitting a complaint to the New Mexico attorney general’s office. Only after the attorney general reviewed the complaint and compelled the agency to comply did we receive the enforcement records.
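Much of that legacy data arrived as fixed-width records, the layout convention of old mainframe exports. As a simplified sketch (the column layout and field names here are hypothetical; the real state files each have their own documented record layouts), parsing such records reduces to slicing each line at fixed offsets:

```python
# Sketch of parsing fixed-width records of the sort produced by legacy
# mainframe systems. The layout below is invented for illustration.

# (field name, start offset, end offset) for each column in a record
LAYOUT = [
    ("api_number", 0, 10),
    ("lease_name", 10, 30),
    ("prod_bbl", 30, 38),
]

def parse_record(line):
    """Slice one fixed-width line into a cleaned dict."""
    row = {name: line[start:end].strip() for name, start, end in LAYOUT}
    row["prod_bbl"] = int(row["prod_bbl"] or 0)  # blank field -> zero barrels
    return row

sample = ("4238901234"            # API well number (10 chars)
          "NORTH PERMIAN UNIT A"  # lease name (20 chars)
          "  123456")             # monthly production, barrels (8 chars)
print(parse_record(sample))
```

Multiplied across millions of rows and dozens of vintages, this is the sort of ingestion work that motivated loading everything into memory before cleaning.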
While other modeling efforts (at think tanks, for example) have sought to understand the orphan well problem at the national level, we believe they suffer from unrealistic assumptions about the abandonment rates of inactive wells in the country. Not all inactive wells will be orphaned. We believe our model is the first to operate at the level of the individual oil well (and produce realistic as opposed to sensationalist predictions).
From inception to completion, our reporting process took approximately 11 months.
What can others learn from this project?
We believe this project is a good example of journalists leveraging statistical models to make predictions about the future (as opposed to exclusively describing the present or explaining the past). Certainly, talented data journalists at outlets like The Economist and Bloomberg engage in this kind of projection all the time. Our project offers a reminder that statistical models of public data can have lives outside the economic realm. Furthermore, we believe the models presented here represent a nice example of combining various subfields of statistical inquiry (in our case, statistical learning/machine learning, null-hypothesis falsification testing, and survival analysis) in order to paint a fuller, more reliable portrait of real-world phenomena.
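The null-hypothesis-testing strand of that toolkit can be conveyed with a small, self-contained example. The sketch below is not drawn from our analysis: it runs a generic permutation test on invented data, asking whether two groups of wells differ in mean years-to-inactivity more than chance shuffling would produce.

```python
import random

# Sketch of a two-sample permutation test, a simple form of
# null-hypothesis testing. Groups and values are invented.

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        # Under the null, group labels are exchangeable: shuffle and re-split.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[: len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return hits / n_perm

ages_a = [4, 6, 7, 9, 11, 12]    # hypothetical years to inactivity, group A
ages_b = [10, 13, 14, 16, 18, 21]  # group B
print(permutation_test(ages_a, ages_b))  # small p-value: groups likely differ
```

Tests like this complement predictive models: the model forecasts which wells will be orphaned, while hypothesis tests check that the patterns driving those forecasts are unlikely to be noise.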
Fundamentally, we think our project presents a case study in balancing advanced statistical models with archival research, public-records requests, interviews, and other tools of traditional reporting. When further combined with expressive visuals, interactive graphics, and drone videography, the package communicates important findings without elevating numbers above hard, qualitative reporting — despite being a piece of “data journalism.”