For a month and a half, El Confidencial’s Data Team scraped the Madrid subway’s real-time incidents web page every 30 minutes to find out how often escalators and elevators were out of service. The collected data was stored in a database with 1,549 records for each station. With this data, we produced two articles: the first, published on July 29th, shows the stations with the most unavailable escalators and elevators; the second, published on September 6th, focuses on the difficulties parents face when moving baby strollers through the subway.
It was the first time that anyone had been able to measure how often the Madrid subway’s escalators and elevators were broken. After that, we produced a second report about families with newborns, who have great difficulty using non-accessible stations. The two reports had a large impact in terms of pageviews, with more than 60,000 visits. In addition, the Madrid subway company reacted with some internal criticism of our reports.
What was the hardest part of this project?
The hardest part of the project was finding an affordable way to scrape the Madrid subway’s real-time incidents web page. We could not scrape it continuously because we would have brought down the server, so we found a way to scrape it every 30 minutes. With this approach, we obtained a sample of 1,549 records for every station.
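The fixed-interval approach described above can be sketched roughly as follows. This is only an illustration, not the team's actual code: the table layout, the `fetch_statuses` callable (which stands in for whatever code parses the incidents page) and the record shape `(station, device, is_broken)` are all assumptions.

```python
import sqlite3
import time
from datetime import datetime, timezone

POLL_INTERVAL_SECONDS = 30 * 60  # one snapshot every 30 minutes, as in the project


def init_db(conn):
    # Hypothetical schema: one row per device per snapshot.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "taken_at TEXT, station TEXT, device TEXT, is_broken INTEGER)"
    )


def store_snapshot(conn, taken_at, statuses):
    # statuses: iterable of (station, device, is_broken) tuples (assumed shape).
    conn.executemany(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        [(taken_at, station, device, int(broken))
         for station, device, broken in statuses],
    )
    conn.commit()


def poll(conn, fetch_statuses, rounds, sleep=time.sleep):
    # fetch_statuses is a placeholder for the real page parser.
    # Waiting between requests keeps the load on the server low.
    for i in range(rounds):
        taken_at = datetime.now(timezone.utc).isoformat()
        store_snapshot(conn, taken_at, fetch_statuses())
        if i < rounds - 1:
            sleep(POLL_INTERVAL_SECONDS)
```

Passing `sleep` as a parameter makes it possible to try the loop without actually waiting half an hour between rounds.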
What can others learn from this project?
The best lesson of this project is the original and imaginative way we found to scrape the Madrid subway’s real-time incidents web page. Ideally we would have had the data in real time, but that was not possible, so we collected a large sample of observations for every station to find out how often elevators and escalators were broken.
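One way to turn such a sample into a per-station figure is to count the share of snapshots in which a device was reported broken. The sketch below assumes each observation is a `(station, is_broken)` pair; the function name and the data shape are hypothetical and do not come from the original analysis.

```python
from collections import defaultdict


def broken_share(observations):
    """Return, per station, the fraction of snapshots reporting a breakdown.

    observations: iterable of (station, is_broken) pairs, one per snapshot
    (an assumed shape, chosen for illustration).
    """
    total = defaultdict(int)
    broken = defaultdict(int)
    for station, is_broken in observations:
        total[station] += 1
        if is_broken:
            broken[station] += 1
    return {station: broken[station] / total[station] for station in total}
```

Because the page was sampled rather than monitored continuously, this share is an estimate: breakdowns that start and end between two snapshots are not counted.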