This was a data-driven investigation into one of the defining health issues of the age, one that, as we showed, is resulting in the needless deaths of 7,000 people every day. Working with a large dataset of millions of entries allowed us to take a wider, data-led perspective and see the big picture and the complexity of the problem. The piece put into context many of those media stories (e.g. measles outbreaks), demonstrated the scale of the ongoing need for vaccination, and examined to what extent misleading information and the anti-vax movement had and
The finished piece was rapturously received by organisations that aim to protect people from disease – Save The Children and UNICEF among them.
Save The Children said: “thank you for the superb, superb data piece on vaccination. In particular the focus on the toll from pneumonia, an issue we’ve been campaigning on for the last few years – often, it feels, in a vacuum! So a huge boost to team morale to see vaccination and pneumonia in the spotlight, and delivered so well.”
UNICEF, which tweeted out the piece to its 7.9 million followers, said: “Thank you so much for these fantastic pieces, they are excellent!”
The piece was accompanied by a second story in which we collected case studies that humanised the analysis.
Up to ten million unique users visit the Sky News website and app every week, and this investigation received far higher engagement than average.
The story was also broadcast on the Sky News channel, breaking with the traditional process in which a story appears first on TV and then online.
It also had an impact in the data community through our use of the data visualisation tool Flourish and its scroll-down template. Sky News was the first media organisation to use this Flourish feature and, afterwards, several data teams from different countries became interested in replicating it.
The data for this story came from the Global Burden of Disease, an international research program coordinated by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington, and the WHO-UNICEF Joint Report.
The raw dataset we started with contained almost 8 million observations, so we used the programming language R to explore it.
With RStudio we were able to filter the specific data needed for our analysis, and we applied statistical models to find patterns and make calculations. We used a wide range of methods, from simple statistics such as aggregations, percentages, rates and percentage changes to correlations and logarithmic interpolations.
Some of the R libraries used were tidyverse, reshape2, readxl, ggplot2, plotly, and gghighlight. The last three helped to produce visualisations (some of them interactive) in the R Notebook where the analysis was done. This notebook was later shared with the experts and the reporter to explain the findings through text and graphics.
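The kinds of calculation mentioned above (aggregations, rates, percentage changes, correlations and logarithmic interpolation) can be sketched in a few lines. The original analysis was done in R; the following is only an illustrative Python sketch, not the code used for the story. The records are made-up numbers, the field and function names are hypothetical, and "logarithmic interpolation" is assumed here to mean interpolating linearly in log space, which the write-up does not specify.

```python
import math
from collections import defaultdict

# Made-up illustrative records; NOT figures from the Global Burden of Disease.
records = [
    {"country": "A", "year": 2000, "deaths": 400, "population": 2_000_000, "coverage": 60.0},
    {"country": "A", "year": 2010, "deaths": 100, "population": 2_500_000, "coverage": 90.0},
    {"country": "B", "year": 2000, "deaths": 900, "population": 3_000_000, "coverage": 40.0},
    {"country": "B", "year": 2010, "deaths": 600, "population": 4_000_000, "coverage": 55.0},
]


def total_deaths_by_year(rows):
    """Aggregation: sum deaths across countries for each year."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["year"]] += row["deaths"]
    return dict(totals)


def rate_per_100k(deaths, population):
    """Rate: deaths per 100,000 people, so countries of different size compare fairly."""
    return deaths * 100_000 / population


def pct_change(old, new):
    """Percentage change from an earlier value to a later one."""
    return (new - old) / old * 100


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x between two known points, assuming y changes
    exponentially (i.e. linearly in log space). Requires y0, y1 > 0."""
    t = (x - x0) / (x1 - x0)
    return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))


totals = total_deaths_by_year(records)
rates = [rate_per_100k(r["deaths"], r["population"]) for r in records]
coverage = [r["coverage"] for r in records]
```

With these invented numbers, `pearson(coverage, rates)` is negative, meaning higher coverage goes with a lower death rate. In a real analysis that would be a pattern to check with experts, not a conclusion in itself.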
Although we used R for visualisation during the analysis, we used the data visualisation tool Flourish for the published story.
Flourish had a then-hidden feature for creating scroll-down visualisations that no one else in the industry had used before. By installing its SDK, we could transform a Flourish story, which moves horizontally, into a vertical interactive experience.
The performance of the chart rendering was optimised by adopting a ‘lazy loading’ technique, in which the loading of each visualisation is deferred until shortly before it is to be displayed to the user.
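The idea behind lazy loading can be modelled in a few lines. The published story ran in the browser on top of Flourish, and its actual implementation is not described here. The Python sketch below only models the decision logic, and every name in it (`LazyChart`, `maybe_load`, `on_scroll`, the 200-pixel margin) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LazyChart:
    top: int                       # vertical position of the chart on the page, in pixels
    build: Callable[[], str]       # expensive rendering step, deferred until needed
    rendered: Optional[str] = None # stays None until the chart has been built

    def maybe_load(self, scroll_y: int, viewport_height: int, margin: int = 200) -> bool:
        """Build the chart once it is within `margin` pixels below the viewport.

        Returns True only on the call that actually triggers the build.
        """
        viewport_bottom = scroll_y + viewport_height
        if self.rendered is None and self.top <= viewport_bottom + margin:
            self.rendered = self.build()
            return True
        return False


def on_scroll(charts: List[LazyChart], scroll_y: int, viewport_height: int) -> int:
    """Check every chart after a scroll event; return how many were newly built."""
    return sum(chart.maybe_load(scroll_y, viewport_height) for chart in charts)


# Record which charts have been built, to show that work is deferred.
build_calls: List[str] = []


def make_builder(name: str) -> Callable[[], str]:
    def build() -> str:
        build_calls.append(name)
        return f"<chart {name}>"
    return build


charts = [
    LazyChart(top=0, build=make_builder("intro")),
    LazyChart(top=3000, build=make_builder("map")),
]
```

Calling `on_scroll(charts, 0, 800)` builds only the first chart. The second is built once the reader has scrolled far enough that it sits within the margin below the viewport, and it is never built twice. In a browser, the same decision is usually delegated to the platform (for example via an intersection observer) instead of being recomputed by hand on every scroll event.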
What was the hardest part of this project?
This project involved months of work, and getting the specific data was essential. There would have been no story without it, as we were not aiming for another piece based on single cases. But finding this data was one of the hardest tasks.
Not many organisations compile this type of information, and institutions such as the WHO warned us against using their data because of methodological issues.
There were also concerns about the list of vaccine-preventable diseases. Although the WHO has an established list, some experts differ, and this was considered in our analysis. But neither the Global Burden of Disease nor any other database we found had information about all the diseases.
By reaching out to leading institutions in the field, we were able to find a robust database, and here the second hardest part began: understanding the data. The technical terms and some issues with the data (e.g. population sizes for some diseases in some age groups were not big enough in all countries) made it necessary to work closely with specialists. Fortunately, we kept up a fluid and constant relationship with experts.
This project also required a good command of statistics, not only to carry out the analysis but also to facilitate communication with researchers and implement their recommendations. Proper knowledge of the tools and techniques used by experts and statisticians, such as R, simplifies the review process. The IHME offered to review our analysis and we shared it in the R Notebook.
As for the visualisation process, we innovated in the storytelling, creating a scroll-down visualisation of a kind never before used at Sky News. This made an impact internally and built bridges between departments that have been very useful in later projects. Although we used an external tool, we used it in a way nobody had before, and our example was followed by other data teams in different countries.
What can others learn from this project?
Even if the source of the data is trustworthy, it is advisable to contact them if your analysis differs even slightly from what they describe in their methodology. The first dataset I found was on the World Health Organization website, but only after speaking to them did I realise it was not suitable for our analysis.
Approaching data sources can save the reporter time, as some of them can be interesting voices to interview and include in the story.
Collaborating with experts makes your analysis more rigorous, but researchers and journalists understand time differently. It is also advisable to agree with them on a way to describe, in academically correct terms, how the analysis was carried out (e.g. in the methodology) while keeping the story clear and engaging for the audience.
Stay up to date with tools and techniques, and explore them in depth to understand a tool's full potential and make the most of it.
Identify colleagues with different skills that are useful in a data project, try to engage them in your project, and build bridges with their departments that can be reused in the future. The collaboration between the designer, the developer and the data journalist was key to producing the main visualisation.