How India mobilised a million polling stations

Category: Best data-driven reporting (small and large newsrooms)

Country/area: Singapore

Organisation: Reuters

Organisation size: Big

Publication date: 22/05/2019

Credit: Manas Sharma, Simon Scarr, Marco Hernandez

Project description:

This piece used data scraped programmatically from 400,000 electoral roll PDFs, which gave us the number and sex of voters at every polling station. We then matched each station to its exact coordinates, which let us examine other aspects such as accessibility and altitude.
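A minimal sketch of that aggregation and join is below. It assumes each scraped roll yields one row per voter with hypothetical fields `state`, `station_id` and `sex`, and that a separate, equally hypothetical coordinates table is keyed on the same `(state, station_id)` pair. All records and values are made up for illustration. The entry does not describe the team's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical scraped rows: one per voter listed on an electoral roll (made-up data).
voters = [
    {"state": "HP", "station_id": "001", "sex": "F"},
    {"state": "HP", "station_id": "001", "sex": "M"},
    {"state": "HP", "station_id": "002", "sex": "M"},
    {"state": "GJ", "station_id": "117", "sex": "M"},
    {"state": "GJ", "station_id": "117", "sex": "F"},
    {"state": "GJ", "station_id": "117", "sex": "F"},
]

# Hypothetical geocoded stations keyed on (state, station_id) (made-up values).
coordinates = {
    ("HP", "001"): {"lat": 32.1, "lon": 78.0, "altitude_m": 4650},
    ("HP", "002"): {"lat": 31.9, "lon": 77.3, "altitude_m": 2100},
    ("GJ", "117"): {"lat": 21.2, "lon": 70.8, "altitude_m": 120},
}


def count_by_station(rows):
    """Tally voters by sex for each (state, station_id)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["state"], row["station_id"])][row["sex"]] += 1
    return counts


def merge(counts, coords):
    """Join per-station tallies with coordinates; unmatched stations are skipped."""
    merged = []
    for key, tally in counts.items():
        location = coords.get(key)
        if location is None:
            continue  # no coordinates yet: would need manual geocoding
        merged.append({
            "state": key[0],
            "station_id": key[1],
            "female": tally["F"],
            "male": tally["M"],
            "total": sum(tally.values()),  # counts any other recorded sex too
            **location,
        })
    return merged


stations = merge(count_by_station(voters), coordinates)
# Smallest stations first: these are the candidates for "extreme" locations.
for s in sorted(stations, key=lambda s: s["total"]):
    print(s["state"], s["station_id"], s["total"], s["altitude_m"])
```

Keeping the join key as a `(state, station_id)` tuple reflects that station numbers are only unique within a state; sorting by `total` is one simple way to surface stations that serve only a handful of voters.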

We were able to explain some of the most extreme locations and show the lengths to which the Election Commission goes to provide a polling station, sometimes for only one or two voters.

Impact reached:

The piece was a hit with Reuters clients and was also shared widely, both in India and around the world.

Techniques/technologies used:

JavaScript and Python were used to scrape and merge the data, and R was then used to analyse the resulting master dataset.

All of the stations were mapped in QGIS, and the maps were then polished in Adobe’s Creative Suite and built into a web page.
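QGIS can open GeoJSON files directly as a vector layer, so one plausible way to hand the merged stations to it is a FeatureCollection of points. This is an assumption: the entry does not say which file format the team used. The records and the output file name `polling_stations.geojson` are hypothetical. One detail that often trips people up is that GeoJSON orders coordinates as longitude first, then latitude.

```python
import json

# Hypothetical merged station records (made-up values).
stations = [
    {"station_id": "001", "lat": 32.1, "lon": 78.0, "total": 2},
    {"station_id": "117", "lat": 21.2, "lon": 70.8, "total": 3},
]


def to_geojson(records):
    """Build a GeoJSON FeatureCollection with one Point feature per station."""
    features = []
    for r in records:
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) expects [longitude, latitude], not [lat, lon].
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            # Everything except the coordinates becomes an attribute in QGIS.
            "properties": {k: v for k, v in r.items() if k not in ("lat", "lon")},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(stations)
# Hypothetical file name; drag the file into QGIS to add it as a point layer.
with open("polling_stations.geojson", "w", encoding="utf-8") as f:
    json.dump(collection, f)
```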

What was the hardest part of this project?

This was a massive data scraping challenge. Every state stored this data in a completely different way, so each had to be tackled separately. The electoral roll PDFs were only available as scanned or photocopied images, which made scraping almost impossible. A dedicated tool had to be built for the task, and it took a long time to run through all 400,000 of them.
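Because each state laid out its rolls differently, one common way to organise a scraper like this is a registry of per-state parsers that all return the same record shape. The sketch below assumes an OCR step has already turned each scanned page into plain text. Tesseract is one open-source OCR engine, but the entry does not name the tool that was built. The state codes `AA` and `BB`, the page layouts and the regular expressions are all hypothetical, not the formats the team actually faced.

```python
import re
from typing import Callable, Dict, List

# Each parser turns the OCR text of one roll page into {"station_id", "sex"} records.
Parser = Callable[[str], List[dict]]
PARSERS: Dict[str, Parser] = {}


def register(state: str):
    """Decorator that files a parser under its (hypothetical) state code."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[state] = fn
        return fn
    return wrap


@register("AA")  # hypothetical layout: "Part No. 12" header, "Gender: Male" per voter
def parse_aa(text: str) -> List[dict]:
    station = re.search(r"Part No\.\s*(\d+)", text).group(1)
    return [{"station_id": station, "sex": g[0]}
            for g in re.findall(r"Gender:\s*(Male|Female)", text)]


@register("BB")  # hypothetical layout: "PS-7" header, "Sex M" / "Sex F" per voter
def parse_bb(text: str) -> List[dict]:
    station = re.search(r"PS-(\d+)", text).group(1)
    return [{"station_id": station, "sex": s}
            for s in re.findall(r"Sex\s+([MF])\b", text)]


def parse_page(state: str, text: str) -> List[dict]:
    """Dispatch a page to its state's parser; states without one fail loudly."""
    if state not in PARSERS:
        raise KeyError(f"no parser written for state {state!r}")
    return PARSERS[state](text)


page_aa = "Part No. 12\n1. Name: A Gender: Female\n2. Name: B Gender: Male"
page_bb = "PS-7\nVoter 1 Sex F\nVoter 2 Sex F\nVoter 3 Sex M"
print(parse_page("AA", page_aa))
print(parse_page("BB", page_bb))
```

The design choice here is that only the small per-state functions differ, while everything downstream (counting, merging, mapping) sees one uniform record format. A page whose header the OCR misreads will raise an error in `re.search(...).group`, which is arguably preferable to silently losing a station.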

What can others learn from this project?

Preparing data well in advance can be a powerful weapon ahead of a big event. Data scraping for this project started months before India’s election day.

Project links: