Since the beginning of the coronavirus pandemic, data on cases at the local level has been hard to obtain. At elDiario.es, we created an exclusive database of COVID-19 cases in each municipality of Spain and presented the data as an interactive map that is updated once a week. To build this project, we collect data from each of Spain's 17 regions (covering more than 5,000 municipalities), and each region publishes its data in a different format. This publication provides a public service that neither the government nor other media outlets offer.
During the first months of the coronavirus pandemic, only six Spanish regions were actively publishing data on the coronavirus cases registered in their municipalities. The publication of our map, our persistence in calling each regional government every day and our readers' requests for more data pushed the governments to start publishing these disaggregated local data.
Our map works as an early warning system for citizens and authorities, who can see whether the situation in their city is dangerous before the whole region is affected and before the overall indicators published by the government reflect the danger.
Moreover, the data is published in an aggregated way, which allows comparisons between cities. The weekly ranking of coronavirus incidence in the main Spanish cities is one of our most shared charts week after week (even by other news media, as seen in link 3). City council members have also shared the publication to raise awareness of the situation in their cities, or have been surprised by our data because their local governments do not deliver the information as quickly as we do.
Every region publishes its data in a different format, and some even change it over time, so we had to design different ways to gather and convert it. Some regions make the data accessible as xlsx, csv or pdf files, but others publish it only in visualizations (charts or dashboards), so we have to get creative to locate the underlying data source and extract it from the code behind the visualization. As we repeat the same process each week, we designed a reproducible way to collect, clean and check the data, and we centralize the results in a shared spreadsheet workbook.
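As a rough illustration of the conversion step, the sketch below maps CSV exports with different column names and delimiters onto one common schema. The region keys, column names and delimiters are hypothetical placeholders, not the real regional sources, and the sketch only covers CSV input (xlsx, pdf and dashboard sources would each need their own extractor). It also assumes that municipality codes are the 5-digit INE codes and that "." appears in case counts only as a thousands separator.

```python
import csv
import io

# Hypothetical per-region formats: column names and delimiters are
# illustrative placeholders, not the real regional sources.
REGION_FORMATS = {
    "region_a": {"delimiter": ",", "code": "cod_ine", "name": "municipio", "cases": "casos"},
    "region_b": {"delimiter": ";", "code": "CODMUN", "name": "NOMBRE", "cases": "CASOS_ACUM"},
}


def normalize(region: str, raw_csv: str) -> list[dict]:
    """Map one region's CSV export onto a common schema:
    region, 5-digit municipality code, municipality name, cumulative cases."""
    fmt = REGION_FORMATS[region]
    reader = csv.DictReader(io.StringIO(raw_csv), delimiter=fmt["delimiter"])
    rows = []
    for record in reader:
        rows.append({
            "region": region,
            # Spreadsheets often drop leading zeros, so pad codes back to 5 digits.
            "code": record[fmt["code"]].strip().zfill(5),
            "name": record[fmt["name"]].strip(),
            # Assumes "." is used only as a thousands separator (e.g. "1.500").
            "cases": int(record[fmt["cases"]].strip().replace(".", "")),
        })
    return rows
```

Keeping the per-region differences in a single configuration dictionary means that when a region changes its format, only its entry needs updating, which fits the weekly, reproducible workflow described above.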
We use the data collected each week to obtain incidence rates per inhabitant (so that municipalities of different sizes can be compared) and to calculate the change over time, using the database built in previous weeks. We have also used this time series to publish other articles about the evolution of the coronavirus at the local level, as can be seen in link 2.
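A minimal sketch of these calculations, assuming the database stores one cumulative case count per municipality for each weekly update, with updates exactly 7 days apart. Under that assumption, cases in the last 14 days are the difference between the latest snapshot and the one two weeks earlier, and dividing by population gives a rate that is comparable across municipalities. The `Municipality` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Municipality:
    code: str
    name: str
    population: int
    # Cumulative confirmed cases, one value per weekly update, oldest first.
    weekly_cumulative: list[int]


def rate_14d(m: Municipality, weeks_ago: int = 0) -> float:
    """Cases in the 14 days up to a weekly snapshot, per 100,000 inhabitants.

    Assumes snapshots are exactly 7 days apart, so 14 days = 2 snapshots back.
    """
    end = len(m.weekly_cumulative) - 1 - weeks_ago
    start = end - 2
    if start < 0:
        raise ValueError("not enough weekly snapshots for this window")
    new_cases = m.weekly_cumulative[end] - m.weekly_cumulative[start]
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return new_cases * 100_000 / m.population


def weekly_change(m: Municipality) -> float:
    """Change in the 14-day rate compared with the previous weekly update."""
    return rate_14d(m, weeks_ago=0) - rate_14d(m, weeks_ago=1)
```

For example, a hypothetical municipality of 50,000 inhabitants with weekly cumulative counts `[100, 150, 200, 300]` has a current 14-day rate of 300.0 per 100,000 and a rate of 200.0 the week before, so `weekly_change` returns 100.0.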
The main result of our data collection process is a map built with Mapbox and D3.js, which shows, in separate tabs, the absolute number of cases, the cases in the last 14 days per 100,000 inhabitants and the evolution of the pandemic in every municipality. We also publish a chart ranking the coronavirus incidence in the 100 most populated cities and a table that makes it easy to search for a specific municipality.
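The ranking chart involves two steps: selecting the 100 most populated cities, then ordering them by their 14-day incidence. The sketch below shows that selection under the assumption that population and the 14-day rate are already computed for each city; the `City` type and function name are hypothetical.

```python
from typing import NamedTuple


class City(NamedTuple):
    name: str
    population: int
    rate_14d: float  # cases in the last 14 days per 100,000 inhabitants


def incidence_ranking(cities: list[City], top_n: int = 100) -> list[City]:
    """Keep the top_n most populated cities, then order them by 14-day incidence."""
    largest = sorted(cities, key=lambda c: c.population, reverse=True)[:top_n]
    return sorted(largest, key=lambda c: c.rate_14d, reverse=True)
```

Filtering by population before ranking by rate means that a small municipality with a handful of cases cannot top a list intended to compare the main cities.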
What was the hardest part of this project?
The main value of this project is that this collection of data cannot be found anywhere else. The government does not make it publicly available, and no other media outlet publishes it. And although each individual figure that we publish for each municipality can be found in the regional sources, nowhere other than our map are the data available in an aggregated and comparable form.
Behind the map and the charts lies the arduous work of finding the data sources and collecting, cleaning, aggregating, analysing and visualising the data, which the elDiario.es team has carried out every week since April 2020. Meanwhile, we kept asking the regions that did not publish the data to do so, and also the central government, which has this data but refuses to publish it.
What can others learn from this project?
Persistence is perhaps what best defines this project: during 2020 we updated this map nearly 40 times. Thanks to this, every week more and more people (including other journalists) were waiting for the new data and relied on our map.
To be able to repeat the process every week, organization and a reproducible workflow were essential. The workflow included notes on format changes and on the particularities of each region, as well as verification steps.
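The specific verification steps are not described here, so the following is only a hypothetical example of the kind of automated checks a weekly workflow could include: flagging municipality codes that do not appear in a reference list, cumulative counts that fall compared with the previous week, and municipalities that disappear from a region's data. Because cumulative counts can legitimately drop (for example, after cases are reassigned), the function returns warnings for manual review rather than rejecting the update. All names are illustrative.

```python
def verify(current: dict[str, int], previous: dict[str, int],
           known_codes: set[str]) -> list[str]:
    """Return human-readable warnings for a weekly update.

    current/previous map municipality code -> cumulative cases.
    Warnings are meant for manual review, not automatic rejection.
    """
    warnings = []
    for code in sorted(current):
        if code not in known_codes:
            warnings.append(f"{code}: unknown municipality code")
        if code in previous and current[code] < previous[code]:
            warnings.append(
                f"{code}: cumulative cases fell from {previous[code]} to {current[code]}"
            )
    # Municipalities present last week but absent this week.
    for code in sorted(set(previous) - set(current)):
        warnings.append(f"{code}: missing from this week's data")
    return warnings
```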