Entry type: Single project
Country/area: United States
Publishing organisation: The Markup, The Associated Press.
The Markup has a distribution partnership with the Associated Press, which means many organizations republished our story the day it came out.
We also worked with Outlier Media specifically to help tell the story in Detroit. Outlier’s piece was published on the same day as ours.
Organisation size: Small
Publication date: 2022-10-19
Authors: Leon Yin, Aaron Sankin, Evelyn Larrubia, Joel Eastwood, Gabriel Hongsdusit, Paroma Soni, Jeremy Singer-Vine, Maria Puertas, Jill Jaroff
Leon Yin (primary reporter, investigative data reporter)
Aaron Sankin (primary reporter, investigative reporter)
Evelyn Larrubia (editor)
Joel Eastwood (visual journalist)
Gabriel Hongsdusit (visual journalist)
Paroma Soni (freelancer)
Jeremy Singer-Vine (data coach)
Maria Puertas (audience and community)
Jill Jaroff (copy editing and production)
Prize committee’s comments:
It’s not a secret that internet speeds vary tremendously across the U.S., preventing households from effectively taking part in remote learning or working, not least during the worst of the pandemic. What the Markup’s Still Loading project did was quantify those disparities, and show how they disproportionately affect poorer neighborhoods and communities of color. It showed the injustice didn’t stop there: slow internet connections cost the same as fast ones, meaning communities are disadvantaged twice. In laying out the scale and inequity of the issue, the Markup did outstanding work in the public interest.
An investigation by The Markup found that AT&T, Verizon, EarthLink, and CenturyLink disproportionately offered lower-income and least-White neighborhoods slow internet service for the same price as speedy connections they offered in other parts of town. The project includes our main investigative story, methodology, and story recipe. We also updated an earlier story called “Why Is My Internet So Slow?” to include more specific information to help consumers.
Following publication, the reaction to this story was overwhelming. The Markup reporters Leon Yin and Aaron Sankin were interviewed on national public radio shows like Marketplace Tech, On The Media, Oregon Public Broadcasting, and All Things Considered. They were invited to give presentations to the National Digital Inclusion Alliance and have briefed local, federal, and state-level lawmakers and regulators about our findings.
Leon and Aaron also created a guide showing other journalists how to localize the story for their own communities, which many publications did. Dozens more republished the story, mostly through The Markup’s partnership with the Associated Press. Aaron also updated a story he wrote earlier in the year to add explanations on how individuals can check if they’re able to switch to another internet provider and get better speeds, and how people might be eligible for a monthly subsidy to help with their internet bill.
The investigation has also proved a crucial resource for government officials working to bring more Americans online. Detroit’s director of digital inclusion told The Markup that the data would help the city decide which neighborhoods to prioritize for the construction of its in-progress open-access municipal fiber network.
The investigation also came as the Federal Communications Commission was in the midst of a rulemaking process aimed at “preventing digital discrimination.” This process gives the agency broad leeway to take action to rectify digital inequality. Our reporting has the potential to directly influence this rulemaking process, especially since the story and data were formally submitted for consideration by the California Community Foundation.
Our data collection method is modeled on a 2020 Princeton study that used an often overlooked information source: the “broadband availability tools” from internet service providers’ own websites. We used those lookup tools—by finding and using the underlying APIs—to collect granular information on internet plans offered by ISPs to individual addresses. This allowed us to join information about price and offered speed with the socioeconomic data of the surrounding area and test for patterns among consumers receiving the worst value for the same monthly fee.
Between April 15 and Oct. 1, 2022, we gathered internet offers from AT&T’s, Verizon’s, EarthLink’s, and CenturyLink’s websites for 1.1 million U.S. residential addresses in 45 cities; the addresses themselves came from open sources. We focused on the largest city served by at least one of the four ISPs in each state.
We used Census block group data and historical redlining maps, where available, to determine the socioeconomic characteristics of areas that were disproportionately offered the worst deals. To do this, we grouped addresses by their area’s median income, racial/ethnic demographics, and redlining grades to compare the proportion of slow speeds offered to each of these groups for each city and ISP.
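A minimal sketch of the group comparison described above, not the published analysis code. The field names (`speed_mbps`, `income_group`) and the 25 Mbps cutoff for "slow" are assumptions made for illustration; the function simply reports, for each group, the share of offers that fall below the cutoff.

```python
from collections import defaultdict

# Assumed cutoff for a "slow" offer; illustrative only.
SLOW_MBPS = 25


def slow_share_by_group(offers, group_key):
    """Return {group: share of offers below SLOW_MBPS} for one city and ISP.

    `offers` is a list of dicts; `group_key` names the field that holds the
    group label (income bracket, demographic bracket, or redlining grade).
    """
    totals = defaultdict(int)
    slow = defaultdict(int)
    for offer in offers:
        group = offer[group_key]
        totals[group] += 1
        if offer["speed_mbps"] < SLOW_MBPS:
            slow[group] += 1
    return {group: slow[group] / totals[group] for group in totals}


# Hypothetical offers for a single city and ISP.
example_offers = [
    {"speed_mbps": 10, "income_group": "lower"},
    {"speed_mbps": 18, "income_group": "lower"},
    {"speed_mbps": 5, "income_group": "lower"},
    {"speed_mbps": 300, "income_group": "lower"},
    {"speed_mbps": 300, "income_group": "upper"},
    {"speed_mbps": 1000, "income_group": "upper"},
    {"speed_mbps": 12, "income_group": "upper"},
    {"speed_mbps": 500, "income_group": "upper"},
]
shares = slow_share_by_group(example_offers, "income_group")
```

Running the same function with a different `group_key` would give the comparison by racial/ethnic demographics or by redlining grade.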
In addition, we conducted a logistic regression to determine whether business factors such as competition and customer adoption (explanations that ISPs have said play a role in where they choose to upgrade equipment) affected the disparities in who got the worst offers. Even after adjusting for these factors, lower-income, least-White, and historically redlined areas still got the worst offers in the vast majority of cities we examined.
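To illustrate what "adjusting for these factors" means, here is a small logistic regression fitted by plain gradient descent. It is a teaching sketch under stated assumptions, not the model from the methodology: the two 0/1 features (lower-income area, competitor present) and the learning-rate settings are hypothetical, and a real analysis would normally use a statistics library that also reports standard errors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression with batch gradient descent.

    rows   -- list of feature lists, e.g. [lower_income, has_competitor]
    labels -- list of 0/1 outcomes (1 = address was offered slow service)
    Returns [intercept, coef_1, coef_2, ...].
    """
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
            error = sigmoid(z) - label
            grads[0] += error
            for j, x in enumerate(features):
                grads[j + 1] += error * x
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grads)]
    return weights


def odds_ratio(coefficient):
    """Exponentiated coefficient: the odds multiplier for a one-unit change."""
    return math.exp(coefficient)
```

A coefficient on the lower-income feature that stays positive (odds ratio above 1) with the competition feature in the model is the pattern the paragraph above describes: the disparity persists after the business factor is accounted for.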
Context about the project:
We filed 46 public records requests asking cities, states, and counties across the country to provide any franchise agreements they had with the internet service providers that were the subject of our investigation. The initial goal was to determine if any of these providers were violating the terms of their franchise agreements by offering inequitable service across a municipality.
Through this process we learned that, unlike cable service, internet service is not regulated at the local level. None of the cities provided franchise agreements that covered internet service, even though there were agreements with some of the companies covering cable service.
There were also challenges with gathering the data.
**Address data difficulties**: For each address we gathered from OpenAddresses and NYC Open Data, we identified both its Census block group and whether it was within the boundaries of the city by plugging its coordinates into the Census.gov geocoder API. We chose this approach because zip codes or city names were often absent from the open-source data we used.
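A rough sketch of the coordinate lookup step. The endpoint path and the `benchmark` and `vintage` values below reflect our understanding of the public Census geocoder and should be treated as assumptions to verify against its documentation; the helper names are hypothetical. The second function relies on the standard GEOID layout, in which a 15-digit block GEOID begins with the 12-digit block group GEOID (state, county, tract, block group).

```python
from urllib.parse import urlencode

# Assumed endpoint for coordinate lookups; check the geocoder documentation.
GEOCODER_URL = (
    "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"
)


def geocoder_query_url(lon, lat,
                       benchmark="Public_AR_Current",
                       vintage="Current_Current"):
    """Build a lookup URL for one longitude/latitude pair."""
    params = {
        "x": lon,  # longitude
        "y": lat,  # latitude
        "benchmark": benchmark,
        "vintage": vintage,
        "format": "json",
    }
    return f"{GEOCODER_URL}?{urlencode(params)}"


def block_group_geoid(block_geoid):
    """Derive the 12-digit block group GEOID from a 15-digit block GEOID."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:12]
```

The response from such a lookup would then be checked for the containing place to decide whether the address falls inside the city boundary.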
**Address selection difficulties**: For each city and ISP, we went through each Census block group and took a random 10 percent sample of the addresses we had collected (a technique called stratified sampling). If a lookup tool told us that an address was not served or was invalid, we continued to search new addresses until we reached a 10 percent sample of each block group. This is how we collected a representative sample of offers (speed and price) for each city and ISP in our investigation.
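The sampling loop can be sketched roughly as follows. This is an illustration rather than the project's code: `is_served` stands in for a call to an ISP lookup tool, and rounding the 10 percent target up to a whole address is an assumption.

```python
import math
import random


def sample_block_group(addresses, is_served, percent=10, rng=None):
    """Draw served addresses at random until the target share is reached.

    Addresses the lookup reports as unserved or invalid are skipped and the
    next random address is tried. If the block group runs out of addresses,
    the sample comes back smaller than the target.
    """
    rng = rng or random.Random()
    target = math.ceil(len(addresses) * percent / 100)
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    sample = []
    for address in shuffled:
        if len(sample) >= target:
            break
        if is_served(address):
            sample.append(address)
    return sample


def stratified_sample(addresses_by_block_group, is_served, percent=10, seed=0):
    """Sample each block group (stratum) separately for one city and ISP."""
    rng = random.Random(seed)
    return {
        block_group: sample_block_group(addresses, is_served, percent, rng)
        for block_group, addresses in addresses_by_block_group.items()
    }
```

Sampling within each block group, rather than across the whole city, keeps sparsely populated block groups from being crowded out by dense ones.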
**Difficulties with historical redlining maps**: We relied on the University of Richmond’s Mapping Inequality project, an online repository of residential “mortgage security” maps and other associated documents from the Home Owners’ Loan Corporation (HOLC), to categorize historically redlined neighborhoods. We merged HOLC grades to addresses by checking if an address’s coordinates fell within the boundaries of graded areas to which we had access. The digitized maps don’t include every map created by the HOLC, since some were never digitized by the researchers at Mapping Inequality. Further, some cities in our data were never mapped by the HOLC. As a result, we have maps for only 22 cities.
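Checking whether an address falls inside a graded area is a point-in-polygon test. The sketch below uses the classic ray-casting approach on simple polygons and treats longitude and latitude as flat coordinates, which is a simplification; production work would more likely use a GIS library. The function names and the `(grade, polygon)` data shape are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray heading east crosses.

    `polygon` is a list of (lon, lat) vertices. An odd number of crossings
    means the point is inside.
    """
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def holc_grade(lon, lat, graded_areas):
    """Return the grade of the first area containing the point, else None.

    `graded_areas` is a list of (grade, polygon) pairs. None covers both
    ungraded parts of a mapped city and cities with no digitized map.
    """
    for grade, polygon in graded_areas:
        if point_in_polygon(lon, lat, polygon):
            return grade
    return None
```

Addresses that come back with no grade are simply left out of the redlining comparison rather than assigned to a category.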
**Data gathering difficulties**: ISPs make querying an address slow and multistep; they also block IP addresses that make too many sequential requests. We reverse-engineered their websites by finding and using up to five APIs that make up the process of querying an address for internet plans. To prevent overloading the ISP servers or getting our queries rejected, requests were delayed and routed through a network of residential IPs provided to The Markup by The Bright Initiative.
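The delay-and-rotate pattern looks roughly like this. It is a generic sketch, not the collection code: `fetch` is a hypothetical stand-in for the multistep ISP lookup, the proxy list and delay are placeholders, and no real endpoints are named.

```python
import itertools
import time


def throttled_lookup(addresses, fetch, proxies, delay_seconds=1.0,
                     sleep=time.sleep):
    """Look up addresses one at a time, pausing between requests and
    rotating through proxies so no single IP sends a burst of queries.

    fetch(address, proxy) performs the lookup and returns the offer.
    `sleep` is injectable so the function can be exercised without waiting.
    """
    proxy_cycle = itertools.cycle(proxies)
    results = {}
    for index, address in enumerate(addresses):
        if index > 0:
            sleep(delay_seconds)  # spacing keeps load on the server low
        proxy = next(proxy_cycle)
        results[address] = fetch(address, proxy)
    return results
```

In practice `fetch` would also need retry and error handling for the cases where a lookup step fails partway through.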
What can other journalists learn from this project?
All of the data collected for the story was published to The Markup’s GitHub page, and The Markup also published the methodology for the investigation alongside the story. We hope that our transparency at every level of this project helps cement showing your work as an industry standard, especially because it helps newsrooms without the same resources or access.
We also wanted to make sure we got our findings into the hands of local reporters and to carve out time to help them publish their own stories. First, we loaded the address-level data into interactive maps and produced granular and summarized output files organized by city and provider. Then, alongside Leon and Aaron’s investigation, we published a how-to guide (a “story recipe”) so that local journalists could use our findings, combine them with their own reporting, and get more specific, localized reports to their readers. We’ve been in awe of the impact we’ve seen from helping local reporters parse the data we wrangled, including comments from their readers saying that they had long suspected internet inequities and now finally have something to affirm their suspicions.