ScotRail apology tracker

Category: Best data-driven reporting (small and large newsrooms)

Country/area: United Kingdom

Organisation: The Courier

Organisation size: Small

Publication date: 18/12/2019

Credit: Lesley-Anne Kelly

Project description:

The story used data scraping, analysis, and visualisation to track customer satisfaction levels with the main Scottish rail network – ScotRail.

Impact reached:

The project highlighted variations in customer satisfaction that are not available in any other format, and correlated these with the many service disruptions we had reported on throughout the year.

Due to the nature of the Abellio/ScotRail contract, the company is not generally subject to FOI requests (with small exceptions, such as communications with the government agencies it deals with). This was therefore a more creative way to collect data to quantify the nature of the issues with the business.


Techniques/technologies used:

This project starts out in Workbench:


I used Workbench to begin scraping tweets from the @ScotRail Twitter account. As the Twitter API limits Workbench to pulling only the last 3,000 or so tweets, I left the scraper running with tweets set to auto-accumulate.
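The project itself used Workbench's built-in auto-accumulate step, but the underlying idea – repeatedly fetching recent tweets and appending only the ones not already stored – can be sketched in Python. This is an illustrative sketch only: the CSV store path and the `id`/`date`/`text` record shape are assumptions, not part of the original workflow.

```python
import csv
import os

def accumulate_tweets(new_tweets, store_path="scotrail_tweets.csv"):
    """Append newly fetched tweets to a CSV store, skipping IDs already saved.

    new_tweets: list of dicts with 'id', 'date', and 'text' keys
    (a hypothetical record shape, standing in for the API response).
    Returns the number of genuinely new tweets written.
    """
    # Load the IDs we have already accumulated, if the store exists.
    seen = set()
    write_header = not os.path.exists(store_path)
    if not write_header:
        with open(store_path, newline="", encoding="utf-8") as f:
            seen = {row["id"] for row in csv.DictReader(f)}

    # Keep only tweets whose ID we have not stored before.
    fresh = [t for t in new_tweets if t["id"] not in seen]

    with open(store_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "date", "text"])
        if write_header:
            writer.writeheader()
        writer.writerows(fresh)
    return len(fresh)
```

Run on a schedule, each pass adds only the tweets that arrived since the last fetch, so the store grows past the API's lookback window – which is the point of leaving the scraper running for months.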

Continuing in Workbench, I used a series of regex extractions, Excel formulae, and data transformations to count how many times each day the ScotRail account used the word “sorry”, “apologies”, or “apologise”, and to calculate the percentage of each day's tweets that were apologies.
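This step was done with Workbench's regex and formula tools, but the same daily tally can be sketched in Python for readers who want to reproduce it elsewhere. The `(date, text)` input shape is an assumption for illustration; the search terms are the three from the article.

```python
import re
from collections import defaultdict

# The three apology words named in the article.
APOLOGY_RE = re.compile(r"\b(sorry|apologies|apologise)\b", re.IGNORECASE)

def apology_rate_by_day(tweets):
    """tweets: iterable of (date_string, text) pairs (hypothetical shape).

    Returns {date: (apology_count, total_tweets, percent_apologies)}.
    A tweet counts as an apology if it contains any of the three words.
    """
    totals = defaultdict(int)
    apologies = defaultdict(int)
    for date, text in tweets:
        totals[date] += 1
        if APOLOGY_RE.search(text):
            apologies[date] += 1
    return {d: (apologies[d], totals[d], 100 * apologies[d] / totals[d])
            for d in totals}
```

The per-day percentages this produces are exactly the series that can then be handed to a charting tool.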

This scraper has been used a few times, and for visualisation I would use either Flourish or Datawrapper. This time I used Datawrapper.

What was the hardest part of this project?

The project was not intended to be used on the day it was.

I had had the scraper running for nearly a year and was intending to do a full analysis once I had a full year of data. However, on 18 December the Scottish Government announced that it was stripping Abellio (ScotRail's parent company) of the national rail contract due to consistent performance issues – which seemed like the perfect time to show our data.

I then had to take the data out of Workbench, pull out key dates, investigate the reasons behind them, and produce a visualisation within a few hours. The final graphic therefore isn't as polished as I would have liked, but such are the limitations of working in a small newsroom restricted to open-source/free resources.

What can others learn from this project?

Having scrapers and tools like this set up and running can prove hugely handy for adding context to breaking news at the right moment.

Project links: