With the covid-19 pandemic impacting our daily lives, the media started to report on national numbers of cases. But, as a reader, I always wanted to check how the number of cases was evolving in the municipalities where people I cared about lived. The Portuguese government was publishing daily numbers by municipality and there were quite a few dashboards already. But could those numbers be turned into local personalized stories?
Using that data and contextual information about the municipalities, I built this news application, which turns them into a personalized news story about any one of the 308 municipalities in Portugal.
When I get excited about a new dataset that I’m going to explore, the mantra that I try to keep in mind is: it’s not data journalism if it’s not relevant to people.
National numbers of covid-19 are important news – and of course, people want to have a general view of how the pandemic is evolving. But people also want to know how things are going where they live. There were already dashboards and a PDF file from the Portuguese authorities with that information. But numbers alone don’t make journalism. That’s why this news application was useful – because people got personalized information without losing context.
We also decided to create custom links for all 308 municipalities, so that people could bookmark the article about the place where they live and share it on social media. Our analytics tool showed us that this was a clever move, since we saw an increasing number of people accessing the page through those custom links.
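As an illustration of how one custom link per municipality could be derived from its name, here is a minimal sketch. The function names, the base URL and the hash-based URL scheme are assumptions made for this example, not the scheme the application actually used.

```javascript
// Hypothetical helper: turns a municipality name into a URL-safe slug.
function slugify(name) {
  return name
    .normalize("NFD")                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // any run of other characters becomes a single hyphen
    .replace(/^-+|-+$/g, "");         // trim hyphens at both ends
}

// Hypothetical URL scheme: the slug travels in the hash, so a single page
// can serve the story for every municipality.
function municipalityLink(baseUrl, name) {
  return `${baseUrl}#/${slugify(name)}`;
}
```

With this scheme, a reader who shares the link for Vila Nova de Gaia would send others straight to that municipality’s version of the story rather than to a generic landing page.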
But, in my opinion, the true indicator that a news application is useful for our readers is when we get emails from them asking questions. In this case, some people truly believed that the text was not automatically generated and sent us emails asking if we had information about a specific place inside their municipality. Others asked if we could update the data: I was manually running the script that exported the .json file and uploading it to our FTP server, so on some days the data was updated a bit later than usual. When the Portuguese authorities stopped publishing local data daily, a lot of readers sent me emails asking why we had stopped updating it.
Since the Portuguese authorities were publishing the data in PDF files, I used the tabulizer R package to turn it into a CSV. Because those PDF files also had some inconsistencies – for example, names of municipalities were sometimes abbreviated and other times not – the parsed output needed to be manually checked every day.
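The kind of check described above can be sketched as an alias table plus a list of names that still need a human look. This sketch is written in JavaScript purely for illustration, although the original cleaning was done in R. The function name and the two abbreviated spellings are invented examples, not entries taken from the official files.

```javascript
// Hypothetical alias table: abbreviated spellings mapped to the canonical name.
// These abbreviations are made up for the example.
const ALIASES = new Map([
  ["v. n. de gaia", "Vila Nova de Gaia"],
  ["v. f. de xira", "Vila Franca de Xira"],
]);

// Resolves each raw name to a canonical one; anything unknown is set aside
// for manual review instead of being guessed.
function normalizeNames(rawNames, canonicalNames) {
  const canonicalByKey = new Map(canonicalNames.map((n) => [n.toLowerCase(), n]));
  const resolved = [];
  const needsReview = [];
  for (const raw of rawNames) {
    // Collapse whitespace and ignore case before looking the name up.
    const key = raw.trim().replace(/\s+/g, " ").toLowerCase();
    const name = canonicalByKey.get(key) ?? ALIASES.get(key);
    if (name) resolved.push(name);
    else needsReview.push(raw);
  }
  return { resolved, needsReview };
}
```

Keeping the unresolved names in a separate list, rather than matching them fuzzily, means a new abbreviation shows up as an explicit item to check by hand.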
Because I was the one developing the news application, I used R to transform the data into a .json file that fed everything in it. All the calculations, data joins and other data transformations were done using R and the tidyverse family of packages.
For the news application, we used vue.js and its reactive features to do everything in the browser. This allowed me to write a ton of “if-else” rules to build the customized text. Doing the data transformation in R myself helped, because I knew what values the .json file contained and how I could use them to make the text more personalized.
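To give a sense of what such rules look like, here is a minimal sketch of a single sentence driven by two values. The record shape, the field names and the English wording are all assumptions for this example. The real application generated Portuguese text from its own .json structure, which is not shown here.

```javascript
// Hypothetical record for one municipality; field names are assumptions.
const example = { name: "Porto", newCases7d: 120, newCasesPrev7d: 80 };

// Picks the right noun form so the sentence never reads "1 new cases".
function casesLabel(n) {
  return n === 1 ? "1 new case" : `${n} new cases`;
}

// Builds one sentence about the weekly trend, with a branch per situation.
function trendSentence(m) {
  if (m.newCases7d === 0) {
    return `${m.name} recorded no new cases in the last seven days.`;
  }
  const start = `${m.name} recorded ${casesLabel(m.newCases7d)} in the last seven days`;
  if (m.newCasesPrev7d === 0) {
    // A percentage change against zero is undefined, so this case gets its own wording.
    return `${start}, after a week without any.`;
  }
  const change = Math.round(((m.newCases7d - m.newCasesPrev7d) / m.newCasesPrev7d) * 100);
  if (change > 0) return `${start}, ${change}% more than in the week before.`;
  if (change < 0) return `${start}, ${Math.abs(change)}% fewer than in the week before.`;
  return `${start}, about the same as in the week before.`;
}
```

In a vue.js component, a function like this would typically sit behind a computed property, so the sentence is rebuilt whenever the reader picks a different municipality.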
For the data visualizations in the project, I used the chart.js library.
What was the hardest part of this project?
When you build a news application that aims to deliver a customized news piece for the reader, the biggest challenge is to generate a text that doesn’t feel robot-made. A ‘fill in the gaps’ approach is not enough, not only because the text is only updated in the gaps, but also because data is messy, and things can get complicated. This was the hardest part – to write a compelling text that didn’t feel like it was generated programmatically.
To tackle this issue, I used a complicated set of ‘if-else’ rules to make every text a bit different according to the data. To give you an example, in the first paragraphs the selected municipality gets compared with the three that saw the biggest increase in covid-19 cases over the last 7 days. But when you select one of those three municipalities, you need to highlight the fact that it is indeed among the three municipalities with the biggest increases, alongside the other two. Because we often see text as a non-dynamic thing, it’s hard to think about all the possible outcomes that data can lead you to.
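The top-three case described above can be sketched as follows. The field name increase7d, the function names and the English sentences are assumptions for this example, and the sketch assumes the list holds at least four municipalities.

```javascript
// Joins names as "A and B" or "A, B and C".
function listNames(names) {
  if (names.length <= 1) return names.join("");
  return `${names.slice(0, -1).join(", ")} and ${names[names.length - 1]}`;
}

// Hypothetical input: [{ name, increase7d }, ...], where increase7d is the
// rise in cases over the last seven days.
function comparisonSentence(selectedName, municipalities) {
  const selected = municipalities.find((m) => m.name === selectedName);
  if (!selected) throw new Error(`Unknown municipality: ${selectedName}`);

  // Copy before sorting so the caller's array is left untouched.
  const topThree = [...municipalities]
    .sort((a, b) => b.increase7d - a.increase7d)
    .slice(0, 3);
  const others = topThree.filter((m) => m.name !== selectedName).map((m) => m.name);

  if (others.length < topThree.length) {
    // The selected municipality is itself in the top three: say so, and name the other two.
    return `${selected.name} is among the three municipalities with the biggest increase in the last seven days, alongside ${listNames(others)}.`;
  }
  // Otherwise compare it with all three.
  return `${selected.name} recorded ${selected.increase7d} more cases in the last seven days; the biggest increases were in ${listNames(others)}.`;
}
```

Without the first branch, the text for a top-three municipality would end up comparing the place with itself, which is exactly the kind of outcome that is easy to miss when thinking of text as static.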
On top of that, you also need to take stylistic and grammatical rules into consideration. For example, in Portuguese, almost every noun has a gender – including municipalities’ names. So I had to create a database just to know which word I should use to refer to a certain municipality.
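A minimal sketch of such a lookup is shown below. The table structure and function names are assumptions, and the table holds only three entries for illustration. The examples rely on standard usage: “o Porto” is masculine, “a Guarda” is feminine, and “Lisboa” takes no article.

```javascript
// Illustrative subset of a gender table: "m", "f", or null when the name takes no article.
const GENDER = new Map([
  ["Porto", "m"],
  ["Guarda", "f"],
  ["Lisboa", null],
]);

// Contractions of the prepositions "em" (in) and "de" (of/from) with the definite article.
const IN = { m: "no", f: "na" };
const OF = { m: "do", f: "da" };

// "in <municipality>": no Porto, na Guarda, em Lisboa.
function inMunicipality(name) {
  const prefix = IN[GENDER.get(name)] ?? "em"; // unknown or article-less names fall back to "em"
  return `${prefix} ${name}`;
}

// "of/from <municipality>": do Porto, da Guarda, de Lisboa.
function ofMunicipality(name) {
  const prefix = OF[GENDER.get(name)] ?? "de";
  return `${prefix} ${name}`;
}
```

A fuller table would also need plural forms such as “nas” and “nos” for names that take a plural article, which this sketch leaves out.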
But probably the hardest part was the end. The Portuguese authorities decided to stop publishing data daily, which made all my code useless. To adapt the news application, I would have needed to rewrite all my R and vue.js code from scratch, so we had to stop updating it.
What can others learn from this project?
I believe this work is a perfect example of how meaningful personalized news can be for people. As I said before, readers let us know how important this piece was to them through social media and emails.
Covid-19 brought us a pandemic of dashboards – places with numbers and some data visualizations about the situation. I have nothing against dashboards and they can be very useful, but we need to take into consideration that people like to read the news, not to stare at dashboards. In an ideal world, a newsroom would have the resources to write 308 personalized news pieces, one for every single municipality – or there would at least be a strong local press to do that. But that didn’t happen here, and code was the solution. So we needed to at least make sure people didn’t feel they were reading a text that a robot had just written.