In Mexico, 133 billion liters of water are used by the ultra-processed food industry, which, in addition to causing chronic diseases, does serious damage to the environment. We built a database and a map to locate the industry's water wells, and we found that some companies exceed the extraction limits they are allowed.
The report was republished by many media outlets. We were interviewed on TV shows and invited to several universities to talk about it. For the first time, the business chambers spoke publicly about the water consumption of these industries. The public was shocked to learn how much water it takes to make a bottle of Coca-Cola, and the subject was widely discussed on social networks. Activists used the information to create campaigns promoting the sustainable use of water.
Tools for scraping, parsing and processing data: Selenium (automated Chrome instances), MongoDB, Python with requests, bs4 and pandas

Tools used on the website: Angular v11, NgRx, RxJs, mapbox.js, Kubernetes (K3s)
First, we had to crawl the whole database contained in https://www.wri.org/aqueduct, as the online explorer at the time wasn't usable for research work. The website had two components: a registry of all the concessions matching a query, and a details page for each concession ID. The first was an ASP.NET website with a secured REST API that we couldn't use outside the browser, so we wrote a crawler that drove automated Chrome instances via Selenium to interact with the query component and collect the ID of each concession. Because the website crashed and hung quite often, we used MongoDB as the crawler's database engine and rewrote the script so it could run on multiple threads and recover from the constant crashes while avoiding duplicated or missing data. Luckily, the details page worked via path query parameters, so once we had collected all the IDs we could use a much simpler Python script built on requests, bs4 and pandas to collect and process the details of each concession.
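The details-page step can be sketched roughly like this. The URL pattern, HTML structure and field labels below are invented for illustration; the real pages had their own markup:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical detail-page URL pattern; the real site used its own paths.
DETAIL_URL = "https://example.org/concessions/{concession_id}"

def parse_detail_page(html: str) -> dict:
    """Extract the fields of one concession from its details page.

    Assumes (for illustration) that each field is rendered as a
    <dt>label</dt><dd>value</dd> pair.
    """
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for dt in soup.find_all("dt"):
        dd = dt.find_next_sibling("dd")
        if dd is not None:
            record[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return record

def scrape_details(concession_ids):
    """Download and parse every collected concession ID into a DataFrame."""
    rows = []
    with requests.Session() as session:
        for cid in concession_ids:
            resp = session.get(DETAIL_URL.format(concession_id=cid), timeout=30)
            resp.raise_for_status()
            row = parse_detail_page(resp.text)
            row["id"] = cid
            rows.append(row)
    return pd.DataFrame(rows)
```

Keeping the parsing separate from the downloading made it easy to re-run the parser over cached pages whenever a field was missing.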
With MongoDB, we were able to easily query and extract subsets of data which were used for research by our journalist team.
The website was built on Angular v11. We exported the final dataset as JSON and used NgRx as the state manager for the data explorer. Using RxJS, we built a reactive dashboard that filters the data and shows each concession's details, the associated brand, its geographical position on a mapbox.js map, and the water stress level of the region taken from WRI's Aqueduct platform.
Currently the website runs containerized on a K3s Kubernetes cluster for high availability.
What was the hardest part of this project?
Companies request water concessions under different registered business names ("razones sociales"), so once the database was extracted we had to review hundreds of thousands of permits, one by one, to verify each holder's line of business, since not all of them are industrial permits. We resorted to public trade records to identify them, and in some cases we phoned the companies directly to ask which brand they belong to.
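A first pass before the manual review was to group the raw permits by registered business name and rank them by volume, so the biggest holders could be checked (and phoned) first. A pandas sketch with invented column names and toy data:

```python
import pandas as pd

# Toy sample of permits; the real rows came from the scraped database.
permits = pd.DataFrame({
    "razon_social": ["Embotelladora X SA", "Embotelladora X SA", "Granja Y SPR"],
    "annual_volume_m3": [500000, 250000, 12000],
    "use": ["industrial", "industrial", "agricultural"],
})

# Keep only industrial permits, then total the conceded volume per
# business name, largest holders first.
industrial = permits[permits["use"] == "industrial"]
by_holder = (industrial.groupby("razon_social")["annual_volume_m3"]
             .sum()
             .sort_values(ascending=False))
print(by_holder)
```

Ranking by total volume narrows hundreds of thousands of permits down to a short list worth verifying against trade records.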
The database provided by the National Water Commission is not entirely transparent, so we hope that ours will help citizens locate these companies.
What can others learn from this project?
That official information, even when it is public, has a hidden layer that must be explored, and that stories of impact can be told from it.