Non-accessible Underground

Category: Best data-driven reporting (small and large newsrooms)

Country/area: Spain

Organisation: El Confidencial

Organisation size: Big

Publication date: 29/07/2019

Credit: Cristina Suárez, Michael McLoughlin, Jesús Escudero, Antonio Hernández, Pablo López Learte, Laura Marín, Pablo Narváez, Luis Rodríguez, Carmen Castellón

Project description:

Over a month and a half, the El Confidencial data team scraped the Madrid subway's real-time incidents website every 30 minutes to find out how often escalators and elevators were out of service. The data was compiled into a database with 1,549 records per station. With this data, we produced two articles: the first, published on July 29th, shows the stations with the most unavailable escalators and elevators; the second, published on September 6th, focuses on the difficulties parents face moving baby strollers through the subway.

Impact reached:

It was the first time anyone had measured how often the escalators and elevators of the Madrid subway were broken. Afterwards, we published a second report on families with newborns who struggle to use non-accessible stations. The two reports had a huge impact in terms of pageviews, with more than 60,000 visits. In addition, the Madrid subway company reacted with some internal criticism of our reports.

Techniques/technologies used:

To scrape the Madrid subway's real-time incidents website, we used JavaScript code that collected the incidents of 300 stations every 30 minutes. After a month and a half of collection, we had built a database with every incident or 'no problem' notice recorded at 30-minute intervals: 1,549 records per station in total. Because of the database's size, we first processed it with R to extract the incidents, and then analyzed the data in Microsoft Excel. The infographics were made with Datawrapper and JavaScript, and the maps with QGIS.
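The polling approach described above can be sketched as follows. This is a minimal illustration, not the team's actual code: the endpoint URL, the JSON shape, and the field names (`station`, `status`) are hypothetical assumptions, and it assumes a Node.js runtime (v18+ for the global `fetch`).

```javascript
// Classify each station entry as an incident or a 'no problem' notice.
// (Field names are hypothetical; the real incidents page may differ.)
function classifyEntries(entries) {
  return entries.map((e) => ({
    station: e.station,
    timestamp: e.timestamp,
    // Anything other than an OK status counts as an incident.
    broken: e.status !== 'OK',
  }));
}

// Fetch the real-time incidents page once and return one record per station.
async function scrapeOnce(url) {
  const res = await fetch(url); // hypothetical endpoint
  const entries = await res.json(); // assume a JSON list of station statuses
  return classifyEntries(entries);
}

// Poll every 30 minutes, rather than continuously, to avoid
// overloading the server while still accumulating a large sample.
function startPolling(url, store) {
  const THIRTY_MINUTES = 30 * 60 * 1000;
  return setInterval(async () => {
    const records = await scrapeOnce(url);
    records.forEach((r) => store.push(r)); // append to the growing database
  }, THIRTY_MINUTES);
}
```

Run over a month and a half, a scheduler like this yields the kind of per-station observation series the articles were built on.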

What was the hardest part of this project?

The hardest part of the project was finding an affordable way to scrape the Madrid subway's real-time incidents website. We could not scrape it continuously, because that would have brought down the server, so we found a way to scrape it every 30 minutes. With this approach, we gathered a sample of 1,549 records for every station.

What can others learn from this project?

The best lesson of this project is the original and imaginative way we found to scrape the Madrid subway's real-time incidents website. Ideally, we would have had the data in real time, but that was not feasible, so we instead built a large sample of observations for every station to determine how often elevators and escalators were broken.

Project links: