This story measures the level of racial segregation in Philadelphia over time, and compares it to other cities. We found that Philly remains one of the most segregated cities and metros in the nation, and that the rate of desegregation has been slowing.
We used data from the six most recent censuses to derive the “dissimilarity index” (a common segregation measure) for Philadelphia and peer cities over time. With static and dynamic visualizations, we illustrated the racial distribution of Philadelphia’s residents, how Philly’s score compares, and the intuition behind how the dissimilarity index works.
Our article galvanized a conversation around this ever-present and seemingly eternal issue. We saw widespread attention from policymakers, elected officials (including two members of the U.S. Congress), academics, and citizens around our area. Our internal analytics showed that ordinary readers were also highly engaged with this project, and that it generated a disproportionate number of letters and new subscribers.
The first task was to work with experts to find the right quantitative measures of segregation: while we primarily discussed the dissimilarity index, we calculated and mentioned two others. Calculating those values required us to collect census data, which we analyzed using Python and pandas, given the hundreds of thousands of individual data points (census tracts by year) involved. We also undertook exploratory geographic visualization using QGIS.
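As a rough illustration of the kind of calculation described above, here is a minimal pandas sketch of the standard two-group dissimilarity index: half the sum, across tracts, of the absolute difference between each tract's share of one group's citywide population and its share of the other's. This is not the code we used for the story. The column names, the function name, and the tract counts are hypothetical, invented only to show the shape of the computation.

```python
import pandas as pd


def dissimilarity_index(tracts: pd.DataFrame, group_a: str, group_b: str) -> float:
    """Two-group dissimilarity index: 0 = evenly mixed, 1 = fully separated."""
    total_a = tracts[group_a].sum()
    total_b = tracts[group_b].sum()
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a nonzero total population")
    # Each tract's share of the group's total population in the area.
    share_a = tracts[group_a] / total_a
    share_b = tracts[group_b] / total_b
    return 0.5 * (share_a - share_b).abs().sum()


# Hypothetical toy data: three tracts observed in two census years.
tracts = pd.DataFrame(
    {
        "year": [2010, 2010, 2010, 2020, 2020, 2020],
        "tract": ["A", "B", "C", "A", "B", "C"],
        "black": [900, 100, 0, 700, 200, 100],
        "white": [100, 400, 500, 200, 400, 400],
    }
)

# One score per census year, computed from that year's tract rows.
by_year = {
    year: dissimilarity_index(group, "black", "white")
    for year, group in tracts.groupby("year")
}

for year, score in by_year.items():
    print(year, round(score, 3))
```

In this toy data the score falls between the two years, because the later year's tracts are more evenly mixed. The same function could be applied per city or per metro area by grouping on a different column.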
Having done the analysis and determined the arc of the story, we collaborated with on-staff artists to create maps of racial distributions across Philadelphia. Our interactive designer then made the maps interactive, and explained the logic behind the dissimilarity score, using React and D3.js. Finally, we created static charts and tables in Datawrapper to illustrate distributions and trends over time.
What was the hardest part of this project?
There were two types of challenges with this story: technical and editorial.
The chief technical challenge was obtaining and munging the census data. While the Census Bureau makes obtaining the most recent census tract data fairly simple, earlier years’ data had to come from a third-party source (NHGIS) that had already pre-processed it. The raw census data required significant pre-processing, and we did extensive exploratory analysis before we knew what the story was.
The second challenge involved the interplay between editorial and technical decisions. How should we categorize racial/ethnic groups? Which indices of segregation should we focus on, and what should we make of different scores across indices? Which ethnic groups should be compared against each other? What was the right level of geographic resolution (city? county? metro area?), and for which types of questions? These questions had to be answered iteratively as we performed exploratory analysis, because they informed a programmatic approach to structuring and analyzing the data.
And of course, having done our analysis, we had to decide which insights were most worth sharing. Not only did they have to drive the point home (namely, that Philadelphia is highly segregated across multiple measures and over time), but we also had to contextualize the significance of our findings. We relied on experts to make the case that segregation matters not intrinsically, but because it is correlated with various other social ills, and therefore represents a kind of structural racism.
What can others learn from this project?
Above all, we hope that this story provides an example of how journalists might tell a timely story about ever-present problems, even when the story is not grounded in breaking news. Many worthy stories, after all, are about entrenched rather than emergent problems. Segregation and its concomitant ills are not new, but presenting them in a new light helped spark conversations about them nonetheless.
Relatedly, we also think we succeeded in making abstruse social science research more accessible. While an index of segregation may make sense at a high level, we’re proud that we also tried to explain the intuition behind how it was calculated, and why that is a valid way of thinking about the geography of race. That involved developing relationships with academics and translating their research into plain language and visuals.
Finally, we hope other journalists can learn from our use of census data at scale.