For this story on PFAS, or “forever chemicals,” John Dunbar and Christina Brady used a novel way of analyzing lobbying records by focusing on search terms rather than often-misleading overall lobbying figures. That data analysis formed the core of the article and helped pinpoint a key insight: that water utilities were outpacing other groups, such as chemical companies and the air travel industry, in lobbying Congress. That lobbying, including from publicly owned utilities, largely focused on weakening certain provisions to clean up the toxic chemicals from the nation’s water systems.
While the piece did not result in any new laws being passed or changed, it did change how reformers viewed water utilities in the effort to clean up PFAS. That was made possible by the enormous effort that went into the data analysis. We showed that water utilities, not the mega chemical corporations and their trade associations, were the most active in lobbying on the PFAS cleanup bills and that they were more often than not on the side of industry. This came as a surprise and resulted in a paradigm shift among those who work on these issues.
We used R to download the lobbying disclosure database from the Senate website, process the data and save it. We stored most of the data in a PostgreSQL database and saved the lobbying issues section to an Elasticsearch index.
Once the data was processed and saved, we used R to search the Elasticsearch index for terms related to PFAS. Then we matched those issues back to the registered lobbyist and the client. We analyzed the data in R and manually categorized the industry of each company hiring lobbyists.
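The search-and-match step can be illustrated with a minimal sketch. It is written in Python rather than the R used for the project, and it substitutes a plain regular expression for Elasticsearch. The records, field names and search terms below are all hypothetical; the actual terms used in the analysis are not listed here.

```python
import re

# Hypothetical, simplified data: one row per issue description on a filing.
issues = [
    {"filing_id": "A1", "text": "H.R. 535, PFAS Action Act; drinking water standards"},
    {"filing_id": "A2", "text": "Airport funding and appropriations"},
    {"filing_id": "A3", "text": "Perfluoroalkyl substances in firefighting foam"},
]

# Hypothetical lookup from filing ID to the registrant and client on that filing.
filings = {
    "A1": {"registrant": "Example Law Firm", "client": "Example Water Utility"},
    "A2": {"registrant": "Example Lobby Shop", "client": "Example Airline Group"},
    "A3": {"registrant": "Example Chemical Co", "client": "Example Chemical Co"},
}

# Illustrative search terms only (an assumption, not the project's real list).
PFAS_PATTERN = re.compile(
    r"\bpfas\b|\bpfoa\b|\bpfos\b|perfluoro|polyfluoro", re.IGNORECASE
)


def find_pfas_filings(issues, filings):
    """Return registrant and client for every filing whose issue text matches."""
    matched_ids = {
        row["filing_id"] for row in issues if PFAS_PATTERN.search(row["text"])
    }
    return [{"filing_id": fid, **filings[fid]} for fid in sorted(matched_ids)]


matches = find_pfas_filings(issues, filings)
for match in matches:
    print(match["filing_id"], match["registrant"], match["client"])
```

A full-text engine such as Elasticsearch adds stemming, relevance ranking and speed on large collections, but the matching logic is the same: find the issue text first, then join back to who filed it and on whose behalf.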
What was the hardest part of this project?
The lobbying disclosure database is full of amended forms, inconsistencies and ambiguity, and it carries a real risk of double counting. It is also difficult to parse and process.
Forms are often amended and it is important to look at only the latest amendment. However, there is no ID number matching amended forms to their original forms, so we had to get creative and use the registered lobbyist, client, report type and report period to link forms and find the latest form.
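One way to picture that linking step is the sketch below, written in Python rather than the R we used. It groups filings on the four fields named above and keeps one per group. The field names and sample rows are hypothetical, and treating the most recently received filing as the latest amendment is an assumption of this sketch.

```python
from datetime import date

# Hypothetical filings. The first two share the same registrant, client,
# report type and report period, so the second is treated as an amendment.
filings = [
    {"filing_id": 1, "registrant": "Firm A", "client": "Utility X",
     "report_type": "Q1", "report_period": 2019,
     "received": date(2019, 4, 20), "amount": 50000},
    {"filing_id": 2, "registrant": "Firm A", "client": "Utility X",
     "report_type": "Q1", "report_period": 2019,
     "received": date(2019, 6, 1), "amount": 60000},
    {"filing_id": 3, "registrant": "Firm A", "client": "Utility X",
     "report_type": "Q2", "report_period": 2019,
     "received": date(2019, 7, 20), "amount": 40000},
]


def latest_filings(filings):
    """Keep only the most recently received filing for each composite key."""
    latest = {}
    for filing in filings:
        key = (
            filing["registrant"],
            filing["client"],
            filing["report_type"],
            filing["report_period"],
        )
        # Assumption: a later received date means a later amendment.
        if key not in latest or filing["received"] > latest[key]["received"]:
            latest[key] = filing
    return list(latest.values())


deduped = latest_filings(filings)
total = sum(f["amount"] for f in deduped)
print(sorted(f["filing_id"] for f in deduped), total)
```

Summing amounts before this step would count the amended first-quarter report twice, which is exactly the kind of inflation the composite key is meant to prevent.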
It is also important to know that both outside law firms and the company hiring them may submit overlapping forms as both entities must report lobbying. The company reports may also include lobbying by in-house lobbyists. Thus, it is necessary to scrutinize the reports to ensure you are not double counting.
Next, there are limitations to using the issues field. At one extreme, some filers are detailed and clear when completing this section. At the other, some leave it blank. Furthermore, it is impossible to link any one individual lobbyist to any one issue because they are often all listed together. Understanding these limitations is essential to avoid misusing the data.
Finally, the XML files in the bulk download cannot simply be loaded into a single database table. A lot of planning and trial and error was required to figure out how many tables we needed and the best ways to link them.
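To show why one table is not enough, here is a minimal Python sketch that splits a nested filing into a parent table and a child table linked by the filing ID. The XML layout, element names and column names are simplified assumptions for illustration, not the actual bulk-download schema, and an in-memory SQLite database stands in for PostgreSQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML: each filing can carry zero or more issues.
SAMPLE_XML = """
<Filings>
  <Filing ID="A1" Year="2019" Type="FIRST QUARTER REPORT">
    <Registrant Name="Example Law Firm"/>
    <Client Name="Example Water Utility"/>
    <Issues>
      <Issue Code="ENV" SpecificIssue="PFAS Action Act"/>
      <Issue Code="BUD" SpecificIssue="Appropriations"/>
    </Issues>
  </Filing>
  <Filing ID="A2" Year="2019" Type="SECOND QUARTER REPORT">
    <Registrant Name="Example Corp"/>
    <Client Name="Example Corp"/>
  </Filing>
</Filings>
""".strip()


def load_filings(xml_text, conn):
    """Write one row per filing and one row per issue, linked by filing_id."""
    conn.execute(
        "CREATE TABLE filings (filing_id TEXT PRIMARY KEY, year INTEGER, "
        "type TEXT, registrant TEXT, client TEXT)"
    )
    conn.execute(
        "CREATE TABLE issues (filing_id TEXT REFERENCES filings(filing_id), "
        "code TEXT, specific_issue TEXT)"
    )
    root = ET.fromstring(xml_text)
    for filing in root.iter("Filing"):
        filing_id = filing.get("ID")
        conn.execute(
            "INSERT INTO filings VALUES (?, ?, ?, ?, ?)",
            (
                filing_id,
                int(filing.get("Year")),
                filing.get("Type"),
                filing.find("Registrant").get("Name"),
                filing.find("Client").get("Name"),
            ),
        )
        # The repeating child elements are what force a second table.
        for issue in filing.iter("Issue"):
            conn.execute(
                "INSERT INTO issues VALUES (?, ?, ?)",
                (filing_id, issue.get("Code"), issue.get("SpecificIssue")),
            )


conn = sqlite3.connect(":memory:")
load_filings(SAMPLE_XML, conn)
rows = conn.execute(
    "SELECT f.client, i.specific_issue FROM filings f "
    "JOIN issues i USING (filing_id) ORDER BY i.specific_issue"
).fetchall()
print(rows)
```

Every repeating element in the source (issues, individual lobbyists and so on) generally needs its own child table in this pattern, which is where much of the planning effort goes.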
What can others learn from this project?
Getting to know your data and exploring it thoroughly before jumping into an analysis is important. In this case, that exploration was all we needed. Looking at how many lobbyists were hired each year was one of the first things we did to understand the data set. That showed us a huge spike in 2019, which led us to the story. The natural next question was “Who is driving that increase?” We discovered, much to our surprise, that it wasn’t the manufacturers but water utilities.
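That first exploratory count is simple to reproduce. The Python sketch below (the project itself used R) counts distinct lobbyists per year. The rows are made-up placeholders chosen only to show the shape of the calculation, not real figures from the analysis.

```python
from collections import defaultdict

# Hypothetical (year, lobbyist name) pairs, one per lobbyist listed on a filing.
records = [
    (2017, "Lobbyist A"),
    (2018, "Lobbyist A"),
    (2018, "Lobbyist B"),
    (2019, "Lobbyist A"),
    (2019, "Lobbyist A"),  # same person on a second filing, counted once
    (2019, "Lobbyist B"),
    (2019, "Lobbyist C"),
    (2019, "Lobbyist D"),
]


def lobbyists_per_year(records):
    """Count distinct lobbyist names for each year."""
    names_by_year = defaultdict(set)
    for year, name in records:
        names_by_year[year].add(name)
    return {year: len(names) for year, names in sorted(names_by_year.items())}


counts = lobbyists_per_year(records)
print(counts)
```

Using a set per year keeps a lobbyist who appears on several filings from inflating the total, and breaking the same count down by client industry is the natural follow-up.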