2020 Shortlist

Editorial dashboard to monitor ads spending on social media

Category: Innovation (small and large newsrooms)

Country/area: United Kingdom

Organisation: Sky News

Organisation size: Big

Publication date: 11 Apr 2019

Credit: Carmen Aguilar García, Przemyslaw Pluta, Peter Diapre

Project description:

The dashboard was part of a wider Sky News project: Under the Radar, which monitored the impact political actors were having on social media during the 2019 general election campaign. 

It was an interactive tool which combined and visualised aggregated data in a single place. It tracked the periodic publication of ads on Facebook, Google and Snapchat by the main UK parties and political operators.

It measured daily spending on these social platforms, the performance of ads published and demographic information about the audience reached. This allowed us to understand how these political actors were using social media to target voters.

Impact reached:

We wanted to reveal how parties were fighting the election on social media, but the relevant information was not easily accessible for non-data journalists. 

The dashboard collected, analysed and presented that data in a single tool, making the information easily available for non-data reporters who could use it whenever and wherever they needed it. 

Using this tool, Sky News published several stories about political ads spending on social media, parties’ strategies on these platforms, and characteristics of the audience reached. 

It gave us the first hint that political ads were disappearing from Facebook in the days before the election, which we later confirmed with researchers to be a “catastrophic” loss of transparency and accountability. 

While creating the tool, we explored the different methods each social media platform used to publish their data. This gave us a better understanding of their transparency policies and their problems, and let us identify obstacles in their mechanisms for accessing the information. 

Academics valued this knowledge, and we were later interviewed by researchers about our experience working with social media ad libraries and their data. 

Because we built the tool before the general election was called, we could manage resources better during the actual campaign. The dashboard automated the process of gathering, analysing and visualising the data. That freed the data journalist from repeating these tasks during the campaign, when other data work needed to be done. 

Techniques/technologies used:

Data was gathered using the Facebook Ads Library Report, Facebook Ads Library API, Snap Political Ads Library and Google Political Ads Transparency Report.

The Facebook report offers manually downloadable files at daily, weekly, monthly and three-month levels of detail. The files update at random times and are not archived, so each had to be downloaded on the day it appeared. Because Facebook blocks basic web scraping and provides no downloadable URL endpoints, we developed a cloud-based back-end service with Node.js which periodically monitored the library and mimicked a user’s action of clicking the download button.

Additional data was collected through the Facebook Ad Library API, access to which is granted on request after an app review by Facebook.
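A request to that API can be sketched as a URL builder like the one below. The `ads_archive` endpoint and the `ad_type`, `ad_reached_countries`, `search_terms`-style parameters follow Facebook's documented Ad Library API, but the API version and the exact field list here are illustrative assumptions, not the project's actual query.

```javascript
// Hedged sketch: build an Ad Library API request URL.
// Version number and field list are assumptions for illustration.
function adsArchiveUrl({ accessToken, country, fields }) {
  const params = new URLSearchParams({
    access_token: accessToken,
    ad_type: "POLITICAL_AND_ISSUE_ADS",
    ad_reached_countries: JSON.stringify([country]), // API expects a JSON array
    fields: fields.join(","),
  });
  return `https://graph.facebook.com/v5.0/ads_archive?${params}`;
}
```

The response is paginated, so a collector would follow the `paging.next` cursor in each reply until it is exhausted.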

Snapchat publishes yearly files which are updated daily at random times. As with Facebook, we automated the process, developing a cloud-based service to monitor and download the files daily.

Google data was easily accessible through Google Cloud Public Datasets, which are updated weekly.
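A weekly-spend query against that public dataset might look like the sketch below. The dataset, table and column names (`google_political_ads.advertiser_weekly_spend`, `advertiser_name`, `week_start_date`, `spend_usd`) are taken from the public dataset's published schema as we understand it; treat them as assumptions to verify, not as the project's actual query.

```javascript
// Hedged sketch: SQL for weekly political-ad spend from Google's
// public BigQuery dataset. Table/column names are assumptions.
function weeklySpendSql(sinceDate) {
  return `
    SELECT advertiser_name, week_start_date, SUM(spend_usd) AS spend_usd
    FROM \`bigquery-public-data.google_political_ads.advertiser_weekly_spend\`
    WHERE week_start_date >= '${sinceDate}'
    GROUP BY advertiser_name, week_start_date
    ORDER BY week_start_date, spend_usd DESC`;
}
```

The string would be passed to a BigQuery client (for example `@google-cloud/bigquery` in Node.js, or bigrquery in R as described below); because the dataset is global, the date filter and later name-matching are what narrow it to UK actors.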

Data collected from each social media platform was normalised and stored in separate Google Cloud BigQuery datasets. Cloud-based back-end services were developed with Node.js, using the Google Cloud SDK to interact with the BigQuery datasets.
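The normalisation step can be illustrated as a per-platform mapping onto one shared schema, so every platform's rows can be queried uniformly downstream. The raw field names below are invented for the sketch; each platform's real exports use their own column names.

```javascript
// Illustrative normalisation: map each platform's raw row onto a
// shared {platform, advertiser, date, spend} schema before storage.
// Raw field names here are hypothetical.
const MAPPERS = {
  facebook: (r) => ({
    platform: "facebook",
    advertiser: r.page_name,
    date: r.date,
    spend: Number(r.amount_spent_gbp),
  }),
  snapchat: (r) => ({
    platform: "snapchat",
    advertiser: r.paying_advertiser,
    date: r.start_date.slice(0, 10),
    spend: Number(r.spend),
  }),
};

function normalise(platform, rows) {
  const map = MAPPERS[platform];
  if (!map) throw new Error(`No mapper for platform: ${platform}`);
  return rows.map(map);
}
```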

Using the bigrquery package for R, we accessed the datasets on Google Cloud. Further cleaning was needed in RStudio, especially to normalise names; we used the tidyverse packages to analyse the data, and the plotly and datatable libraries to create interactive visualisations and searchable tables. This allowed the journalists both to identify trends easily and to search for specific information.

Using the Shiny library for R, we created an interactive application deployed on shinyapps.io. We granted access to the app to the journalists and editors involved in the Under the Radar project, who could easily reach the updated dashboard from a browser.

What was the hardest part of this project?

The compilation and standardisation of the data was complex, as the mechanisms differ from platform to platform and there are no harmonised criteria for the structure of the information.

Facebook posed the biggest challenges. Its daily reports are only available by manually downloading a CSV file which disappears after 24 hours. To track information over months, we needed to automate the process.

These files only provide a subset of the information we were looking for, so we had to complement them with data collected through the Facebook API to understand more details about the ads, such as status, impressions, distribution and audience reached.
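Combining the two sources amounts to a join: the report's per-page spend enriched with per-ad detail from the API, keyed on the page. The sketch below illustrates that step with hypothetical field names (the API actually reports impressions as ranges, which is why a lower/upper midpoint is taken here).

```javascript
// Sketch (hypothetical field names): enrich the report's per-page
// spend rows with ad counts and midpoint impressions from API ads,
// joined on the page name.
function enrichSpend(reportRows, apiAds) {
  const byPage = new Map();
  for (const ad of apiAds) {
    const agg = byPage.get(ad.page_name) || { ads: 0, impressions: 0 };
    agg.ads += 1;
    // Impressions come as a range; use the midpoint as an estimate.
    agg.impressions += (Number(ad.impressions_lower) + Number(ad.impressions_upper)) / 2;
    byPage.set(ad.page_name, agg);
  }
  return reportRows.map((r) => ({
    ...r,
    ...(byPage.get(r.page_name) || { ads: 0, impressions: 0 }),
  }));
}
```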

Access to the API is not public and is granted upon request and app review by Facebook. Requests to the API are heavily rate-limited relative to the number of requests required to collect all relevant data, which at times proved problematic.

Although data from the Facebook report and the Facebook API covered the same topic and actors, the two sets of information were not comparable because of the way Facebook discloses the data.

The process for Snapchat and Google was easier, but both offer bulk global data, and neither had a specific tag for the 2019 UK general election which would have allowed us to easily filter on the fields we required.

Snapchat changed the format of its data without prior notice, forcing us to adjust the code, but also warning us to monitor the data closely even after building the dashboard, as platforms could make changes that would affect our results.

Names and variables also differ across platforms, which required cleaning to standardise the data. Because of the criteria and timing with which each platform discloses the information, we had to create separate tabs in the dashboard, one per platform.

What can others learn from this project?

Close integration between the data journalism team and the development team is essential, so it is advisable for each team to understand the other’s job and the resources and skills they have. 

Agreeing on the formats and structure of the data at the gathering stage simplified the process later, during the analysis phase and the creation of the app. 

Google Cloud and bigrquery made the data analysis faster and spared the data journalist from loading big datasets into RStudio, which could have slowed down the process. 

Although we always kept in mind the potential stories we could find using this tool, investing in the data-gathering side of the project made it easier to publish several stories using information from the dashboard during the campaign. 

It is advisable to include a “Get the data” button in the dashboard if your organisation uses specific visualisation tools, and to train journalists in how to use the dashboard. 

Project links: