Our investigation into the health of the nation’s waterways revealed that pollutants found in rivers were at their highest levels since testing began: one in ten tests revealed a serious breach, and at least one serious breach was recorded at half of all testing sites.
We visited some of the most idyllic rivers flagged by the data at the height of summer to see if residents using them for bathing were aware of the dangers.
Our in-depth reporting humanised the data while also exploring the context behind the story: that sampling levels were down significantly and budget cuts had prevented effective monitoring.
The project prompted a strong response from our readers and generated widespread discussion. Some shared their experiences of wild swimming; others expressed shock at our findings and at the revelation of how many fewer tests were now being taken.
A group campaigning for better regulation of the river at Ilkley in the north of England praised our investigation and has since applied for the river to receive bathing water designation from the Environment Agency. This would mean the EA would be forced to test the water quality with greater scrutiny.
We used open data published by the Environment Agency to filter millions of samples taken across the country, finding all tests for substances the agency identifies as “hazardous pollutants”. We had to collate 20 years’ worth of data across multiple large files, and used Apache Spark to query a database too large for R to handle on its own.
To determine whether a test constituted a breach, we compared the recorded result against a CSV file of “hazardous pollutants” published in the government’s guidance to all businesses with Environment Agency permits.
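The breach check itself is a simple threshold comparison. A minimal sketch in Python (our actual pipeline used Spark and R; the file contents, column names and limits below are illustrative, not the EA’s real data):

```python
import csv
import io

# Illustrative stand-ins for the published files; real column names differ.
POLLUTANT_LIMITS_CSV = """determinand,limit_ug_l
Cadmium,0.25
Mercury,0.07
Cypermethrin,0.0001
"""

SAMPLES_CSV = """site,determinand,result_ug_l,year
River Wharfe at Ilkley,Cadmium,0.31,2019
River Wharfe at Ilkley,Mercury,0.05,2019
River Aire,Cypermethrin,0.002,2020
"""

def load_limits(text):
    """Map each hazardous pollutant to its threshold concentration."""
    return {
        row["determinand"]: float(row["limit_ug_l"])
        for row in csv.DictReader(io.StringIO(text))
    }

def flag_breaches(samples_text, limits):
    """Keep only tests for hazardous pollutants, marking each as a breach or not."""
    flagged = []
    for row in csv.DictReader(io.StringIO(samples_text)):
        det = row["determinand"]
        if det not in limits:  # not a hazardous pollutant: ignore
            continue
        row["breach"] = float(row["result_ug_l"]) > limits[det]
        flagged.append(row)
    return flagged

limits = load_limits(POLLUTANT_LIMITS_CSV)
results = flag_breaches(SAMPLES_CSV, limits)
print([(r["site"], r["determinand"], r["breach"]) for r in results])
```

The same join-against-a-lookup-table logic scales to millions of rows once expressed as a Spark query.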
We grouped the breaches by the river where they were recorded to determine how many breaches are recorded annually at each site. We also grouped the data by year to calculate the proportion of all tests taken that year that revealed a breach.
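Both aggregations are standard group-by counts. A small Python sketch of the logic (the records below are invented placeholders for the real flagged tests, which ran to millions of rows in Spark):

```python
from collections import Counter

# Illustrative flagged tests: (river, year, breach) tuples.
tests = [
    ("River Wharfe", 2000, True), ("River Wharfe", 2000, False),
    ("River Wharfe", 2019, True), ("River Aire", 2019, True),
    ("River Aire", 2019, False), ("River Aire", 2019, False),
]

# Breaches recorded per river per year.
breaches_by_river = Counter(
    (river, year) for river, year, breach in tests if breach
)

# Proportion of each year's tests that revealed a breach.
totals = Counter(year for _, year, _ in tests)
breaches = Counter(year for _, year, breach in tests if breach)
breach_rate = {year: breaches[year] / totals[year] for year in totals}

print(breaches_by_river[("River Wharfe", 2000)])  # 1
print(breach_rate[2019])  # 0.5
```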
We used additional datasets published by the EA to determine that 86 per cent of the country’s waterways fell short of the EU’s ecological standard at the last assessment, again performing our analysis in R. To map this, we scraped files for individual rivers from the EA website, combined them into a single shapefile, and joined it with our summarised data.
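The final join is an ordinary attribute merge: each river feature carries a name (or ID) that matches a key in the summarised table. A bare-bones Python sketch of that step, with geometries reduced to placeholders (in practice this was a shapefile join in R; the names and counts are invented for illustration):

```python
# Placeholder features standing in for the combined shapefile's records.
river_features = [
    {"name": "River Wharfe", "geometry": None},
    {"name": "River Aire", "geometry": None},
]

# Summarised analysis output keyed by river name (illustrative counts).
summary = {"River Wharfe": {"breaches": 12}, "River Aire": {"breaches": 7}}

# Left join: attach the summary columns to each feature for mapping.
joined = [
    {**feat, **summary.get(feat["name"], {"breaches": 0})}
    for feat in river_features
]
print(joined[0]["breaches"])  # 12
```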
What was the hardest part of this project?
Our headline figures did not reveal the full picture. Once we had completed our analysis and discovered that the proportion of tests revealing breaches was at its highest since testing began, we wanted to know why. The Environment Agency told us that the rise in breaches reflected sampling that had become increasingly targeted and risk-based.
Through our reporting we discovered that the testing regime itself was as interesting as the results. Digging into the number of tests taken revealed that sampling was down dramatically over 20 years, which gave us a new angle for the story.
Ensuring the report was appropriately nuanced, by considering the scale of sampling, the responsibilities of water companies and the Environment Agency, and the true dangers of the substances being tested for, helped make this in-depth piece a strong example of data-informed reporting.
What can others learn from this project?
We put our findings to people swimming in the River Wharfe on a warm summer day. They, alongside fishermen, volunteers, conservationists and other stakeholders in the health of the river, provided an important perspective as to why water quality testing was so important.
We used the data to identify rivers of interest – those with significant breaches or where the number of tests was falling disproportionately – and then focussed our reporting there. Using the figures to inform our journalism from the very beginning made the process easier and helped us find good case studies.