2020 Shortlist

BBC Shared Data Unit

Category: Open data

Country/area: United Kingdom

Organisation: BBC Shared Data Unit

Organisation size: Big

Publication date: 1 Oct 2019

Credit: Peter Sherlock, Alex Homer, Eileen Murphy, Matthew Barraclough, Anna Khoo, Paul Lynch

Project description:

A commitment to transparency and the open data movement is at the heart of our team and our output. The Shared Data Unit’s remit is twofold – to find and clean unexamined data sources and put them into the public domain for journalists to use, and to train up the next generation of regional data journalists to use open data for their own stories. We demonstrate our commitment to transparency by sharing source data, methods and code for every project we undertake. Our innovative industry partnership has had a tangible positive impact on the regional news marketplace.

Impact reached:

Our world-class training programme has had an impact on the scale and quality of regional data journalism across the industry. We strengthen local news output by giving journalists the skills to interrogate data and tell stories of importance to their communities. Journalists benefit from dozens of training sessions over a three-month period, including sessions on Excel, OpenRefine, Flourish, Datawrapper, Freedom of Information laws, how to use APIs and the programming language R. In March 2019, JPI Media launched its first dedicated data journalism team. Five of its 11 members were our former secondees. Newsquest also launched its own unit in June 2019, staffed by three SDU alumni.

Secondly, our original journalism sourced from open data has generated hundreds of stories across the regional news marketplace. In 2019, the data we published generated around 300 stories for local news partners, bringing the total to around 850 stories since the project began. Our journalists focus on stories that matter to local audiences – the state of their roads, the cost of garden waste recycling and the ability to get a GP appointment outside working hours. Our reporting was picked up for 15 reports in national newspapers and 11 times on flagship BBC TV and radio programmes including the 1pm national TV news, Today on BBC Radio 4, BBC Radio 5 Live, BBC Reporting Scotland and Wales Today. It informed 68 local radio reports.

Finally, our journalism regularly provokes public debate. For example, we reported that one in two people who appealed in court against a decision to deny them disability benefits were successful. On the day of publication, our research was raised at First Minister’s Questions in Scotland, and the Secretary of State at the Department for Work and Pensions (DWP) faced questions about the department’s performance on the campaign trail ahead of the General Election.

Techniques/technologies used:

Our commitment is to use technical skills to bring untapped but open data sources into the public domain and make them accessible to journalists. Our data, code and methodologies are published on our GitHub repository, posted as inline links to Google in our stories, and shared with more than 900 media titles across the UK.

For example, we wanted to explore whether the electric car charging point infrastructure was being developed fast enough to meet the anticipated rise in demand. We interrogated an open-source API to report the locations of UK electric vehicle charging points. We used R and the Haversine formula to perform 49 million calculations, one for the distance between every pair of the 7,000 UK charging points, using their latitude and longitude coordinates and then storing the shortest distance for each. Six months later the government published charging point location data for the first time while encouraging local councils to improve charging infrastructure.
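The nearest-neighbour calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the team's own R code, which is published in their repository. The function names, the Earth-radius constant and the three sample coordinates are assumptions made for the example, not values from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbour_km(points):
    """For each (lat, lon) point, return the distance to its closest other point."""
    nearest = []
    for i, (lat_i, lon_i) in enumerate(points):
        best = math.inf
        for j, (lat_j, lon_j) in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbour
            best = min(best, haversine_km(lat_i, lon_i, lat_j, lon_j))
        nearest.append(best)
    return nearest


# Hypothetical coordinates for illustration only
# (roughly London, Birmingham and Manchester city centres).
sample_points = [(51.5074, -0.1278), (52.4862, -1.8904), (53.4808, -2.2426)]
nearest = nearest_neighbour_km(sample_points)
```

The double loop visits every ordered pair of points, so 7,000 points give 7,000 × 7,000 = 49 million iterations, which matches the figure quoted above. A brute-force pass like this is simple to audit; a spatial index would be faster but harder for a reader to check by eye.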

In another project, we used R to merge five years of police statistics for an analysis revealing that community resolution orders – informal punishments which do not appear on criminal records – were still being used by police against suspects of violent crimes, despite guidance restricting the orders’ use to low-level offences.

Another project saw us produce the largest UK-wide report on the re-sale of homes bought under the Right to Buy policy, which allowed council tenants to purchase their former council homes at a discount. When the Northern Ireland Housing Executive sent its FOIA response as PDFs including 83 pages of scans of a paper ledger, we manually entered the data into a public-facing spreadsheet to enrich the data commons.

What was the hardest part of this project?

We have set out on an ambitious project to ensure the vast amount of open data published in the UK is reported on by local newsrooms. We want the data we find and analyse to be used as widely as possible, and that involves breaking down journalists’ aversion to using data for their stories. To break down barriers, we use a number of techniques, which include answering queries through a Slack channel and publishing ‘how to’ guides to accompany the data we distribute. In addition, we have held hack days and conferences where journalists are invited to learn more about handling and processing data for news stories. The journalists who have completed our secondments also regularly hold data training sessions when they return to their substantive posts, ensuring the skills they have picked up cascade down to their colleagues.

Another challenge is posed by organisations that have not previously been exposed to this level of scrutiny. During the Right to Buy investigation we had around 150,000 rows of data – each row representing a property title in Great Britain formerly sold under the policy. Both HM Land Registry and Registers of Scotland underline that they cannot guarantee their datasets are error-free. We found some 60,080 rows did not have comparable sales prices. When we highlighted this, Land Registry re-ran its script and found 2,582 more price entries, so we updated our calculations. Similarly, we encountered 734 date anomalies in the Scottish data. It’s important to treat data as a source just like any interviewee: ask it questions, and do not treat its first answers as objective truth.
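A basic audit of this kind can be sketched as below. It is written in Python for illustration and is not the team's own code. The field names and sample rows are hypothetical, and "date anomaly" is assumed here to mean a re-sale dated before the original sale, which the entry itself does not define.

```python
from datetime import date


def audit_rows(rows):
    """Count rows lacking a comparable price pair and rows with out-of-order dates."""
    missing_price = 0
    date_anomalies = 0
    for row in rows:
        # A row is only comparable if both prices are present.
        if row["original_price"] is None or row["resale_price"] is None:
            missing_price += 1
        sold, resold = row["original_date"], row["resale_date"]
        # Assumed definition of an anomaly: re-sale recorded before the first sale.
        if sold is not None and resold is not None and resold < sold:
            date_anomalies += 1
    return {
        "rows": len(rows),
        "missing_price": missing_price,
        "date_anomalies": date_anomalies,
    }


# Hypothetical rows for illustration only.
sample_rows = [
    {"original_price": 30000, "resale_price": 95000,
     "original_date": date(1995, 3, 1), "resale_date": date(2010, 6, 15)},
    {"original_price": None, "resale_price": 120000,
     "original_date": date(1998, 7, 9), "resale_date": date(2015, 1, 20)},
    {"original_price": 42000, "resale_price": 80000,
     "original_date": date(2005, 5, 5), "resale_date": date(2001, 2, 2)},
]
report = audit_rows(sample_rows)
```

Counts like these give a concrete question to put to the data owner, in the same way the missing prices were raised with Land Registry.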

What can others learn from this project?

Our project was established to tap into public-interest open data that was hiding in plain sight, and to train journalists in data journalism skills. We have demonstrated how adopting open data principles enhances trust in our reporting, builds a personal relationship with audiences and helps to engage them in the process of journalism.

We have demonstrated that the openness of data is not enough in itself; it is necessary to engage with journalists, help them find what they’re looking for and help them understand its implications for their audiences. By training key individuals, the level of data journalism can be raised across a media landscape. We give the journalists who undertake a secondment with us the skills and confidence to deliver their new-found knowledge to colleagues when they return to their newsrooms. Our two-day Train the Trainer course equips participants with the skills and techniques to prepare, plan and structure a training session. Secondees from Newsquest and JPI, for example, have carried out training days for colleagues upon completing their secondments.

Project links:

www.bbc.com/lnp/sdu

github.com/BBC-Data-Unit/shared-data-unit

www.bbc.co.uk/news/uk-47696839

www.bbc.co.uk/news/uk-47443183

www.bbc.co.uk/news/uk-49891159

www.bbc.co.uk/news/uk-49085346

www.bbc.co.uk/news/uk-47697778