The probability of contracting Covid depends strongly on various factors, as numerous studies and data analyses have shown. ZEIT ONLINE has, for the first time, compiled detailed district-level data from ten major German cities. The data show that the coronavirus hits socially disadvantaged people hardest.
Because detailed Covid data are scarce in Germany, our analysis was the first to show that Covid rates tend to be higher where many people share an apartment. This is plausible, since infected people are very likely to infect others living in the same household. In wealthy neighborhoods, case numbers are comparatively low: well-paid jobs are often desk jobs that can easily be done from home, where the risk of infection is low. Unemployment also plays a role: neighborhoods where a particularly large share of people do not have a job show an increased incidence of Covid infections.
1) Reproject with GDAL
2) Clean and clip with QGIS
3) Combine data with R or PostGIS
4) Simplify with Mapshaper
5) Find colors with Color Palette Helper and ColorBrewer
6) Convert to tiles or images with Tippecanoe and Mapnik
7) Visualize interactively with Mapbox GL and MapTiler
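Step 3, combining case counts with sociodemographic data, essentially comes down to a join on the district and a conversion of raw counts into an incidence per 100,000 residents. The project used R or PostGIS for this; what follows is only a minimal Python sketch of the same idea. The district names, the numbers and the functions `join_districts` and `incidence_per_100k` are hypothetical and not taken from the ZEIT ONLINE dataset.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    cases: int           # cumulative Covid cases reported by the city
    population: int      # residents, from the sociodemographic data
    unemployment: float  # unemployment rate in percent


def join_districts(cases, socio):
    """Inner-join case counts and sociodemographic records on the district name.

    cases: dict mapping district name -> case count
    socio: dict mapping district name -> (population, unemployment rate)
    Districts missing from either source are dropped, as in an SQL inner join.
    """
    joined = []
    for name in sorted(cases.keys() & socio.keys()):
        population, unemployment = socio[name]
        joined.append(District(name, cases[name], population, unemployment))
    return joined


def incidence_per_100k(district):
    """Cases per 100,000 residents, so districts of different size are comparable."""
    return district.cases / district.population * 100_000


# Hypothetical example values, not ZEIT ONLINE data
cases = {"Nord": 1200, "Mitte": 900, "Ost": 450, "Hafen": 300}
socio = {"Nord": (40_000, 9.5), "Mitte": (45_000, 6.1), "Ost": (30_000, 3.2)}

for d in join_districts(cases, socio):
    print(f"{d.name}: {incidence_per_100k(d):.1f} per 100,000")
```

Here "Hafen" disappears from the result because it has no sociodemographic record. In real data it is worth listing such unmatched districts explicitly, since differently spelled district names silently drop rows from an inner join.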
What was the hardest part of this project?
Because of Germany's federal structure, district-level data are not collected centrally; they have to be requested from each city individually. ZEIT ONLINE asked the 15 largest German cities for Covid case figures at the district level, and ten cities provided complete data. In addition to these figures, we collected sociodemographic data, and the data were made publicly available for download (https://interactive.zeit.de/2021/corona-stadtteile/rohdaten-corona-stadtteile-zeit-online.csv). This enabled other journalists and researchers to carry out their own analyses.
What can others learn from this project?
We tested the correlations presented in the article for statistical significance and used multiple linear regression models to check the influence of further factors. However, it is not easy to draw the right conclusions from statistical correlations. To take one example from the article: neighborhoods with high unemployment have more Covid infections. The data say nothing about whether high unemployment causes more infections, or whether unemployed people are themselves more likely to be infected. It's worth looking more closely at these two conceivable fallacies.
First, correlation does not equal causation. In principle, it may be pure coincidence that high unemployment and high incidence occur in the same neighborhoods. However, since the pattern occurs in all but a few of the cities studied, this is unlikely.
Second, such analyses do not look at individuals but at groups of people, in this case the residents of a neighborhood. No conclusions about individuals can be drawn from them; doing so anyway is known as the ecological fallacy.
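The regression check mentioned above can be illustrated with a minimal sketch. It assumes two hypothetical predictors per district (unemployment rate in percent and average persons per household) and made-up incidence values; it is not the model or the data ZEIT ONLINE used. The function `fit_ols` (a hypothetical name) fits ordinary least squares with an intercept by solving the normal equations and returns a t-statistic for each coefficient. To turn a t-statistic into a p-value, it would be compared with a t distribution with n − k degrees of freedom (n districts, k coefficients including the intercept).

```python
import math


def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Ordinary least squares with intercept (requires more districts than coefficients).

    rows: one tuple of predictor values per district; y: one outcome per district.
    Returns (coefficients, t_statistics), intercept first.
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X b = X'y
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = []
    for p in range(k):
        unit = [1.0 if j == p else 0.0 for j in range(k)]
        var_factor = solve(xtx, unit)[p]  # p-th diagonal entry of (X'X)^-1
        se = math.sqrt(sigma2 * var_factor)  # standard error of coefficient p
        t_stats.append(beta[p] / se if se > 0 else math.inf)
    return beta, t_stats


# Hypothetical districts: (unemployment rate in %, persons per household)
rows = [(3.0, 1.7), (4.5, 1.9), (6.0, 1.8), (7.5, 2.1), (9.0, 2.0), (11.0, 2.3)]
incidence = [1500, 1900, 2100, 2600, 2700, 3300]  # cases per 100,000, made up

beta, t = fit_ols(rows, incidence)
for name, b, ti in zip(["intercept", "unemployment", "household size"], beta, t):
    print(f"{name:>14}: coefficient {b:9.1f}, t = {ti:.2f}")
```

Solving the normal equations directly is adequate for a few predictors and a few dozen districts; R's `lm`, for comparison, uses a QR decomposition, which is numerically more robust when predictors are highly correlated.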
Statistically, not only is incidence related to various social characteristics – the characteristics also correlate with each other. A high unemployment rate pushes income statistics down, and those with little money cannot afford a large apartment. People with a migration background are more likely than average to be affected by unemployment and poverty. In short, a lot is connected to a lot. This is also what makes it so difficult to distinguish what is cause, what is effect and what is merely coincidence.
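How strongly two social indicators overlap can be quantified before both go into a regression. The following minimal sketch assumes made-up values for unemployment rate and average household income in six hypothetical districts. It computes Pearson's r and, for a model with exactly these two predictors, the variance inflation factor VIF = 1 / (1 − r²), which states how much the variance of each coefficient is inflated by the correlation between the predictors. The helper names `pearson` and `vif_two_predictors` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor of each predictor in a model with only x1 and x2."""
    r = pearson(x1, x2)
    return 1 / (1 - r * r)


# Hypothetical district values, not ZEIT ONLINE data
unemployment = [3.0, 4.5, 6.0, 7.5, 9.0, 11.0]  # percent
income = [3400, 3100, 2900, 2500, 2400, 2100]   # euros per household and month

r = pearson(unemployment, income)
print(f"r = {r:.2f}, VIF = {vif_two_predictors(unemployment, income):.1f}")
```

A VIF close to 1 means the predictors barely overlap; a common rule of thumb treats values above 5 or 10 as a sign that a regression cannot reliably separate their effects, which is exactly the difficulty of telling cause, effect and coincidence apart.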
For this reason, we compared the results of our data analysis with the current state of research; existing studies come to similar conclusions.