At the beginning of the Covid-19 pandemic there was no reliable public source for Covid-19 cases in Finland. We started collecting that data by hand into our own database. Right from the beginning we also published the data as an open-source API for everyone to use.
The Helsingin Sanomat Covid-19 Tracking Project covers roughly three main areas: creating and maintaining our own open-source Covid database, creating automatically updating graphs and alerts based on the data, and publishing a daily Covid data story.
Our Covid-19 tracking project has had a huge impact on two fronts: informing the public and creating a new kind of open-source culture in journalism.
Our API has been used by several of our competitors, by many start-ups and in the EU’s official statistics. In April 2020 alone the APIs were called 86 million times, and in May 50 million times.
Later, when the official government API sources improved, we automated the data sourcing and continued to offer that data through our API. These APIs are still used by many because they are easy to use.
The government still doesn’t offer all crucial information, such as hospitalization numbers, in a machine-readable form. We have continued to update that data manually in our database, and even the EU’s European Centre for Disease Prevention and Control still uses the HS API in its official statistics.
In May 2020, together with the tech company Futurice, we created a service called Oiretutka (https://github.com/futurice/symptomradar) where people could report their symptoms. Through it we were able to detect a Covid-19 surge in Northern Finland before it showed up in official statistics.
The money and time spent on maintaining our open-source database have been significant, but we feel that publishing open-source data is important for journalism. It is a new kind of journalism, but one based on traditional journalistic principles: we want to spread current, accurate information about important issues to the public.
We also use the data ourselves. We have created multiple automatically updating charts and tools. In addition, our news robot writes a daily breaking news story about the newest Covid-19 cases. Our daily data story, which has been read millions of times during the pandemic, presents all our corona data graphs.
This combination of data gathering, news automation and open data is, we believe, a unique and important part of the future of journalism.
React is the main library used to build the automatically updating visualisations. The visualisation layer was built to support multiple visualisation dimensions, such as different geographical scopes (Finland vs. global), measurements (confirmed cases, deaths, hospitalisations, vaccinations) and scales (incidence vs. absolute numbers). In addition, journalists could pick and choose the visualisations that best supported their story.
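As a rough illustration of how those dimensions might be modelled, the TypeScript sketch below treats scope, measurement and scale as a small configuration object and converts absolute counts to incidence. All type and function names are hypothetical, and the per-100,000 convention is an assumption rather than a description of the HS implementation.

```typescript
// Hypothetical model of the visualisation dimensions; not the HS Datadesk code.
type Scope = "finland" | "global";
type Measurement = "confirmedCases" | "deaths" | "hospitalisations" | "vaccinations";
type Scale = "absolute" | "incidence";

interface ChartConfig {
  scope: Scope;
  measurement: Measurement;
  scale: Scale;
}

interface RegionValue {
  region: string;
  value: number; // absolute count for the chosen measurement
  population: number;
}

// Assumed convention: incidence is expressed per 100,000 inhabitants.
function toIncidence(value: number, population: number): number {
  if (population <= 0) {
    throw new Error("population must be positive");
  }
  return (value * 100000) / population;
}

// Return the values a chart would plot for the selected scale.
function applyScale(rows: RegionValue[], scale: Scale): { region: string; value: number }[] {
  return rows.map((row) => ({
    region: row.region,
    value: scale === "incidence" ? toIncidence(row.value, row.population) : row.value,
  }));
}

// Example selection a journalist might make for a story.
const exampleConfig: ChartConfig = {
  scope: "finland",
  measurement: "confirmedCases",
  scale: "incidence",
};
```

Keeping the scale as data rather than as a separate chart type means one component could serve both the absolute and the incidence view of the same series.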
The automated gathering of the data was implemented as a single Node.js service, which collated the data from the multiple sources available (such as the Johns Hopkins open corona data, the Finnish Health Authority’s APIs and the HS data collected by hand). In addition to scraping the sources periodically, a more efficient detection of changes in the Finnish situation was implemented with AWS Lambda to give HS an edge in reporting the real-time situation in Finland.
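The collation and change detection described here could look roughly like the following TypeScript sketch. It is not the HS service: the record shape, the source labels and the priority rule (a preferred source wins when two sources report the same region and date) are all assumptions made for illustration, and fetching is left out so that the logic stays self-contained.

```typescript
// Hypothetical record shape shared by all sources after fetching.
interface SourceRecord {
  source: string; // assumed labels, e.g. "manual", "thl", "jhu"
  region: string;
  date: string; // ISO date, YYYY-MM-DD
  confirmed: number;
}

// A snapshot maps "region|date" to the record chosen for that key.
type Snapshot = Record<string, SourceRecord>;

// Merge records from several sources. When two sources report the same
// region and date, the one listed earlier in `priority` wins.
function collate(records: SourceRecord[], priority: string[]): Snapshot {
  const rank = (source: string): number => {
    const i = priority.indexOf(source);
    return i === -1 ? priority.length : i;
  };
  const merged: Snapshot = {};
  for (const rec of records) {
    const key = rec.region + "|" + rec.date;
    const existing = merged[key];
    if (existing === undefined || rank(rec.source) < rank(existing.source)) {
      merged[key] = rec;
    }
  }
  return merged;
}

// Compare two snapshots and list the keys that are new or whose count changed.
function detectChanges(previous: Snapshot, current: Snapshot): string[] {
  return Object.keys(current).filter((key) => {
    const before = previous[key];
    const after = current[key];
    if (after === undefined) return false;
    return before === undefined || before.confirmed !== after.confirmed;
  });
}
```

A periodic job could call `collate` on each run and pass the previous and new snapshots to `detectChanges`; any non-empty result would be the trigger for an alert or a refreshed story.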
The open data APIs were built using AWS Lambda for ease of scaling and maintenance. The lambdas normalise the data collected by hand and by machine so that consumers get standardised access to it.
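To make the normalisation step concrete, here is a minimal TypeScript sketch of how hand-entered and machine-collected rows might be brought into one shape before being served. The input formats (Finnish-style dates and string counts for manual rows, ISO timestamps for automated rows), the field names and the sample values are assumptions for illustration. The response object only mirrors the general status/headers/body shape an HTTP-facing Lambda typically returns, and is not the actual HS API contract.

```typescript
// Hypothetical input shapes; the real HS formats are not described in the text.
interface ManualRow { date: string; region: string; hospitalised: string; } // date like "14.4.2020"
interface AutomatedRow { timestamp: string; area: string; value: number; } // ISO 8601 timestamp

interface NormalisedRow {
  date: string; // YYYY-MM-DD
  region: string;
  hospitalised: number;
  origin: "manual" | "automated";
}

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Convert a hand-entered row (d.m.yyyy date, count as text) to the common shape.
function normaliseManual(row: ManualRow): NormalisedRow {
  const m = /^(\d{1,2})\.(\d{1,2})\.(\d{4})$/.exec(row.date.trim());
  const count = parseInt(row.hospitalised, 10);
  if (m === null || isNaN(count)) {
    throw new Error("unparseable manual row: " + JSON.stringify(row));
  }
  const date = m[3] + "-" + pad2(Number(m[2])) + "-" + pad2(Number(m[1]));
  return { date, region: row.region.trim(), hospitalised: count, origin: "manual" };
}

// Convert a machine-collected row to the common shape.
function normaliseAutomated(row: AutomatedRow): NormalisedRow {
  return {
    date: row.timestamp.slice(0, 10),
    region: row.area,
    hospitalised: row.value,
    origin: "automated",
  };
}

// Combine both kinds of rows, sorted by date and then by region.
function normaliseAll(manual: ManualRow[], automated: AutomatedRow[]): NormalisedRow[] {
  const rows = manual.map(normaliseManual).concat(automated.map(normaliseAutomated));
  return rows.sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : a.region.localeCompare(b.region)
  );
}

// Build the kind of response object a Lambda handler could return.
function buildResponse(manual: ManualRow[], automated: AutomatedRow[]) {
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(normaliseAll(manual, automated)),
  };
}
```

Tagging each row with its `origin` is a design choice of this sketch: it lets consumers see which figures were entered by hand without having to know the upstream formats.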
Throughout the Covid-19 pandemic HS Datadesk has worked in an extremely agile manner, combining software development expertise with journalistic excellence to provide tools that address constantly changing news situations. One practical example is that all the web visualisations have been made so that they are easily exportable to print as well, enabling the infographic designers to make print versions of the visualisations quickly and easily. Another example is the editor that allows journalists to create visualisations for exactly their story (https://bit.ly/2ITUM9J).
What was the hardest part of this project?
Covid-19 has of course been new terrain for all of us, so we had to design everything on the go. One of the biggest challenges has been the inconsistency and schedule changes in some of our data sources, e.g. the data structure changing without prior notice.
What can others learn from this project?
The power of open-source data and the benefits of sometimes collaborating with competitors. Combining software development with journalism is a potent combination, both for covering long-running situations and for reacting quickly.