While the European Union holds the second-largest elections in the world, with more than 400 million eligible voters, there had so far been no detailed maps or analyses of the results, because no pan-EU data was available. We changed that: in our article “The New Colours of Europe,” we were the first to visualise the votes across the whole Union at the most detailed level possible, showing the results in almost 80,000 administrative units, mostly municipalities. To do so, we collected and harmonised the results of all 28 EU countries right after the election. We later also opened the raw data to researchers and the general public.
We received requests for the data from Germany’s Federal Institute on Building, Urban Affairs and Spatial Development, which is working on a study measuring the influence of socio-demographic factors on election results. We also received a request from the European Commission’s Joint Research Centre, which used the data in a Science for Policy report called “Voting, attitudes towards immigration and Euroscepticism – A territorial perspective”, due to be published on 4 February 2020 (under embargo until then). The report uses our detailed data to analyse whether high shares of migrants in a given territorial unit are associated with a higher vote for anti-immigration parties in elections for the European Parliament. The result is a dataset that is valuable not only for our users but also for research.
Where downloading entire datasets was impossible, we wrote custom scripts in Python and Node.js that scraped and parsed the election data from the websites of national election authorities. Where the data could be downloaded, the files provided by the national election authorities often needed some Python code to transform them into our standard format. The pandas library was indispensable for this work, since it made manipulating tabular data easy. The shapefile that serves as the backbone of the visualisation was prepared in QGIS (e.g. updating administrative divisions and manually creating constituencies in Slovenia).
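The reshaping step described above can be sketched as follows. This is a minimal illustration, not our production code. It assumes a hypothetical semicolon-separated national export with one row per municipality and one column per party, and converts it into long records with one row per unit and party. It uses Python's standard `csv` module to stay dependency-free; in pandas, the same reshaping would typically be done with `DataFrame.melt`. The column names, the `ID_COLUMNS` mapping and all vote counts are made up.

```python
import csv
import io
from typing import Dict, Iterator

# Hypothetical national export: one row per municipality, one column per party.
# Codes, names and vote counts are made up for illustration.
SAMPLE_CSV = """Gemeindeschluessel;Gemeinde;CDU;SPD;GRUENE
01001000;Gemeinde A;100;200;300
01002000;Gemeinde B;50;;25
"""

# Assumed mapping from this country's identifier columns to our standard names.
ID_COLUMNS = {"Gemeindeschluessel": "unit_id", "Gemeinde": "unit_name"}


def to_long_format(text: str, country: str, delimiter: str = ";") -> Iterator[Dict]:
    """Turn a wide table (one column per party) into one record per unit and party."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        # Keep codes as strings so leading zeros survive.
        ids = {std: row[src].strip() for src, std in ID_COLUMNS.items()}
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            yield {
                "country": country,
                **ids,
                "party": column,
                # Blank cells are treated as zero votes.
                "votes": int(value.strip() or 0),
            }


records = list(to_long_format(SAMPLE_CSV, "DE"))
print(len(records))  # 6
```

Once every country's results are in this long shape, they can be concatenated into one table regardless of how many parties each country has.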
The main challenge in developing the interactive maps was handling the large amount of data. To ensure that around 80,000 districts load and display quickly on every device, we combined raster tiles (created with Mapnik) and vector tiles (created with Tippecanoe). Both layers were put together using Mapbox GL.
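Raster and vector tile layers of this kind are usually addressed with the same z/x/y grid. At zoom level z, the Web Mercator world map is split into 2^z by 2^z tiles, and the browser only requests the tiles covering the current view. This is the basic idea that keeps a map of 80,000 districts loadable. The sketch below uses the standard tile-index formula; it is not specific to our map, and the function name `lonlat_to_tile` is our own.

```python
from __future__ import annotations

import math

# Latitude limit of the Web Mercator projection (the map is square at this value).
MAX_LAT = 85.0511287798


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))  # the projection is undefined at the poles
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east edge (lon = 180) or at the latitude limit stay in the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


print(lonlat_to_tile(0.0, 0.0, 1))  # (1, 1)
```

The text does not say how the work was split between the raster and vector layers. A common pattern is to use pre-rendered raster tiles for the bulk of the geometry and to keep vector tiles for interactive features such as hover and click.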
What was the hardest part of this project?
The hardest part was getting the data. Depending on the country, election data may be published on the website of the election authority, on that of the interior ministry or sometimes on an open data portal. This often meant that datasets were only available in the member state’s national language (there are 24 official languages in the European Union). If you managed to download a file, it was usually an Excel or CSV file. In the case of the Netherlands, for example, the results came as more complex XML files.
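Parsing an XML results file with Python's standard library might look like the sketch below. The file structure is hypothetical and heavily simplified. Real files from election authorities are usually more deeply nested and often use XML namespaces, so the element and attribute names (`unit`, `party`, `votes`) are assumptions that would need to be replaced with those of the actual schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified results file (names, codes and numbers made up).
SAMPLE_XML = """<results>
  <unit id="GM0001" name="Gemeente A">
    <party name="Party X" votes="120"/>
    <party name="Party Y" votes="80"/>
  </unit>
  <unit id="GM0002" name="Gemeente B">
    <party name="Party X" votes="40"/>
  </unit>
</results>"""


def parse_results(xml_text: str) -> list[dict]:
    """Flatten nested unit/party elements into one record per unit and party."""
    root = ET.fromstring(xml_text)
    records = []
    for unit in root.iter("unit"):
        for party in unit.findall("party"):
            records.append({
                "unit_id": unit.get("id"),
                "unit_name": unit.get("name"),
                "party": party.get("name"),
                "votes": int(party.get("votes", "0")),
            })
    return records


for record in parse_results(SAMPLE_XML):
    print(record)
```

The output has the same long shape as the tabular sources, so XML and CSV inputs can be merged into one table.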
Another challenge was finding the best way to format the data in order to harmonise elections held under 28 different sets of rules, which resulted in 28 different datasets. Neither national nor EU authorities provide a harmonised EU data format.
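One way to make such heterogeneous results comparable is to agree on a single record type per administrative unit. The dataclass below is a hypothetical sketch of such a schema, not the format of our published dataset. Its `votes_per_ballot` field reflects that some systems give each voter several votes (in Luxembourg, for instance, each voter has six). The sanity check therefore allows party votes up to ballots times votes per ballot, and shares are computed from total party votes rather than from ballots.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UnitResult:
    """Results for one administrative unit in a hypothetical harmonised schema."""

    country: str                 # e.g. "DE" or "LU"
    unit_id: str                 # national code, kept as a string to preserve leading zeros
    unit_name: str
    valid_ballots: int
    party_votes: dict[str, int]  # national party -> votes
    votes_per_ballot: int = 1    # e.g. 6 in Luxembourg

    def check(self) -> None:
        """Raise if the party totals exceed what the reported ballots allow."""
        total = sum(self.party_votes.values())
        limit = self.valid_ballots * self.votes_per_ballot
        if total > limit:
            raise ValueError(f"{self.unit_id}: {total} party votes, at most {limit} possible")

    def shares(self) -> dict[str, float]:
        """Share of each party in all party votes, comparable across multi-vote systems."""
        total = sum(self.party_votes.values())
        if total == 0:
            return {party: 0.0 for party in self.party_votes}
        return {party: votes / total for party, votes in self.party_votes.items()}


# Made-up example: 10 ballots with up to 6 votes each.
lu = UnitResult("LU", "L-001", "Commune A", valid_ballots=10,
                party_votes={"Party A": 36, "Party B": 24}, votes_per_ballot=6)
lu.check()          # 60 party votes from 10 ballots x 6 votes: passes
print(lu.shares())  # {'Party A': 0.6, 'Party B': 0.4}
```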
The data also differs depending on the electoral system. Most countries use list systems, each with its own twist (e.g. in Luxembourg each voter has six votes), but some states or regions use the Single Transferable Vote, in which voters rank candidates. Votes cast abroad are sometimes counted as a separate category and sometimes allocated to the voter’s home municipality if cast by mail. In other cases, embassy votes are added to the votes of the national capital, skewing the data there. Often, the data is not geo-referenced with the municipality codes that would let us attach it to the map’s shapefile. We also ran into problems with spelling variations (such as accents) and with language variations. For instance, a municipality in a bilingual area might appear under its national name in one dataset and under its regional name in another.
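Matching names without codes usually starts with normalising them. The sketch below strips accents with `unicodedata`, case-folds, collapses hyphens and whitespace, and resolves bilingual names through an alias table. The aliases shown (Bozen/Bolzano, Helsingfors/Helsinki) are real bilingual names used only as examples, and `build_lookup` is a hypothetical helper. It drops names that occur more than once, because municipalities with identical names are one reason why matching on names alone is risky.

```python
from __future__ import annotations

import unicodedata
from collections import Counter

# Hypothetical alias table: alternative name (normalised) -> name used in the shapefile.
ALIASES = {
    "bozen": "bolzano",         # German / Italian
    "helsingfors": "helsinki",  # Swedish / Finnish
}


def normalise_name(name: str) -> str:
    """Reduce a municipality name to a comparison key."""
    # NFKD splits accented letters into base letter + combining mark; drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, treat hyphens as spaces and collapse repeated whitespace.
    key = " ".join(stripped.casefold().replace("-", " ").split())
    return ALIASES.get(key, key)


def build_lookup(units: list[tuple[str, str]]) -> dict[str, str]:
    """Map normalised names to unit codes, skipping names that are not unique."""
    keys = [normalise_name(name) for _, name in units]
    counts = Counter(keys)
    return {key: code for key, (code, _) in zip(keys, units) if counts[key] == 1}


# Made-up shapefile units: (code, name).
shape_units = [("001", "Bolzano"), ("002", "Neustadt"), ("003", "Neustadt")]
lookup = build_lookup(shape_units)
print(lookup.get(normalise_name("Bozen")))     # 001
print(lookup.get(normalise_name("Neustadt")))  # None: ambiguous, needs a code
```

Unmatched or ambiguous names still have to be resolved by hand, but normalisation shrinks that list considerably.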
What can others learn from this project?
The main purpose of this project was to show the landscape of the European vote in the second-largest elections in the world (after India). We wanted to see how cities, rural areas, regions and nations vote, and where the differences and similarities lie.
We also hoped to represent, for the first time, all EU voters as members of a single political sphere, with the vote coloured by European Parliament group rather than by national party. Together with the interactive map, this helps foster a conversation between voters. They now share a common vocabulary when talking about their choices and can compare the vote in their municipalities, even if they live in very different parts of the EU.
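Colouring by European Parliament group means summing national party results per group before choosing a colour. The sketch below assumes a hypothetical mapping table. The German entries reflect group membership at the start of the 2019-2024 term and serve only as an illustration. Parties missing from the table fall into an assumed catch-all `Other` category, and `leading_group` returns the strongest group, which could then drive the fill colour of a municipality.

```python
from __future__ import annotations

from collections import defaultdict

# Illustrative mapping of some German parties to European Parliament groups
# at the start of the 2019-2024 term; a full table would cover all 28 countries.
PARTY_TO_GROUP = {
    "CDU": "EPP",
    "CSU": "EPP",
    "SPD": "S&D",
    "GRÜNE": "Greens/EFA",
    "FDP": "Renew Europe",
    "AfD": "ID",
    "DIE LINKE": "GUE/NGL",
}


def group_votes(party_votes: dict[str, int], mapping: dict[str, str]) -> dict[str, int]:
    """Sum national party votes per European Parliament group."""
    totals: dict[str, int] = defaultdict(int)
    for party, votes in party_votes.items():
        # Parties without a group entry go into an assumed catch-all category.
        totals[mapping.get(party, "Other")] += votes
    return dict(totals)


def leading_group(party_votes: dict[str, int], mapping: dict[str, str]) -> str:
    """Return the strongest group; on a tie, the first one encountered wins."""
    totals = group_votes(party_votes, mapping)
    return max(totals, key=totals.get)


# Made-up results for one municipality ("Party Z" is hypothetical).
votes = {"CDU": 30, "CSU": 10, "SPD": 25, "GRÜNE": 35, "Party Z": 5}
print(group_votes(votes, PARTY_TO_GROUP))
print(leading_group(votes, PARTY_TO_GROUP))  # EPP
```

Here the Greens are the strongest single party, but the EPP leads once CDU and CSU are combined. That is exactly the difference between colouring by national party and colouring by group.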
Until this project, there had been no harmonised EU dataset from either national or EU authorities, since the results of European elections are published by national authorities. By making the data available (http://interactive.zeit.de/2019/data-eu-elections-2019/data-eu-elections-2019.zip), we hoped to encourage others, both in the media and in the research community, to use it and so enable more in-depth conversations about the European vote. We see this as a first step towards seeing ourselves represented as a European community.
Original in German: www.zeit.de/politik/ausland/2019-07/europawahl-gemeinden-eu-mitgliedsstaaten-ergebnisse-analyse