Country/area: United States

Organisation: The Markup

Organisation size: Small

Publication date: 25/08/2021

Credit: Emmanuel Martinez, Lauren Kirchner, Malena Carollo, Evelyn Larrubia, Ben Tanen


Emmanuel Martinez is a data reporter who uses data, statistics, and programming to tell stories about the disparities marginalized communities face. 

Lauren Kirchner is an investigative reporter who writes about the intersection between government and technology and how the use of data in decisions affects us all, particularly the most vulnerable.

Malena Carollo is an investigative reporter who writes stories that expose broken systems and wrongdoing.

Project description:

Discriminatory lending practices have been well documented for decades. The lending industry’s response has been that apparent racial disparities would disappear if researchers and journalists had enough relevant data, even as the industry resisted the release of that data.

The Markup’s investigation “Denied” debunks the lenders’ argument. Even after accounting for financial characteristics that the industry said were key and had not been publicly available until now, including debt-to-income and combined loan-to-value ratios, we found that prospective borrowers of color are still denied mortgages at higher rates than similarly qualified White borrowers.

Impact reached:

The Department of Justice, the Consumer Financial Protection Bureau, and the Office of the Comptroller of the Currency announced a new initiative in October to combat redlining practices. As part of the announcement, CFPB director Rohit Chopra cited The Markup’s findings and criticized the mortgage lenders’ response to them.

The new multi-agency effort will increase government analyses of lending patterns with an eye toward fair lending, task state attorneys general with enforcing fair lending practices more stringently, and strengthen reporting pipelines between the Department of Justice and “financial regulatory agencies” to identify fair lending violations. The Department of Justice described the Combating Redlining Initiative as its “most aggressive” attempt to curb the practice.

Weeks after we began inquiring about Fannie Mae’s and Freddie Mac’s mortgage-approval algorithms and their potential disparate effects on people of color, Fannie Mae announced that its software would begin incorporating on-time rent payments in mid-September. Freddie Mac followed suit two months after the story’s publication. The two companies buy about half of all mortgages in the country, de facto setting the rules for the industry.

Various legislators and regulators also took notice. Minnesota’s attorney general, Keith Ellison, said it’s become clear that lenders’ criteria and policies “result in serious disparities in lending patterns,” and if lenders don’t review their algorithms, they “should not be surprised if they are investigated for violating state and federal laws, like the Fair Housing Act.” 

Techniques/technologies used:

To test whether accounting for debt-to-income and combined loan-to-value ratios would eliminate racial and ethnic lending disparities, we used a statistical technique called binary logistic regression. This type of regression allows us to assess and quantify the relationship between multiple independent variables and a single binary outcome: in this case, whether a lender approved or denied a mortgage application.

By using a binary logistic regression, we were able to control for multiple independent variables, ensuring we compared the application outcomes of similarly qualified applicants.
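The idea can be sketched on synthetic data. This is not The Markup’s published code or model: the variable names (`group`, `dti`, `cltv`), the simulated coefficients, and the hand-written Newton-Raphson fitting routine are all assumptions chosen for illustration. The sketch shows how a group indicator’s coefficient is estimated while the two financial ratios are held constant.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Fit a binary logistic regression by Newton-Raphson.

    X: (n, k) array of predictors, without an intercept column.
    y: (n,) array of 0/1 outcomes (here, 1 = application denied).
    Returns the coefficient vector with the intercept first.
    """
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))           # predicted probabilities
        grad = Xd.T @ (y - p)                          # gradient of log-likelihood
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


# Synthetic applications (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)   # hypothetical 0/1 indicator for an applicant group
dti = rng.normal(35, 8, n)      # hypothetical debt-to-income ratio, in percent
cltv = rng.normal(80, 10, n)    # hypothetical combined loan-to-value ratio, in percent

# Simulate denials in which the group indicator matters even after
# the two financial ratios are taken into account.
true_logit = -2.0 + 0.5 * group + 0.06 * (dti - 35) + 0.04 * (cltv - 80)
denied = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Center the ratios so the intercept refers to a typical applicant.
X = np.column_stack([group, dti - 35, cltv - 80])
beta = fit_logit(X, denied)

# exp(coefficient) is the odds ratio of denial for group 1 versus group 0,
# holding the other predictors fixed.
odds_ratio_group = np.exp(beta[1])
print("coefficients:", np.round(beta, 3))
print("odds ratio for group:", round(float(odds_ratio_group), 2))
```

An odds ratio above 1 for `group` means that, among simulated applicants with the same ratios, members of that group have higher odds of denial. In practice a statistics library would replace the hand-written fitting routine.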

We started with nearly 18 million mortgage applications from the Home Mortgage Disclosure Act dataset, known as HMDA data. We filtered for conventional, first-lien home purchase loans on one-to-four-unit properties where the applicant plans to live in the home. We excluded loans purchased by other financial institutions as well as government-insured loans, such as those insured by the Federal Housing Administration.
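The filters above can be expressed as a simple predicate over records. The sketch below is illustrative only. The field names and numeric codes are assumed to follow the public HMDA data dictionary and should be checked against it, and the sample rows are invented. The exact filters we used are in the published methodology and code.

```python
def keep_application(row):
    """Return True if a record passes the filters described above.

    Field names and codes are assumptions based on the public HMDA
    data dictionary; verify them before using this on real data.
    """
    return (
        row["loan_type"] == 1                             # conventional, not government-insured
        and row["lien_status"] == 1                       # first lien
        and row["loan_purpose"] == 1                      # home purchase
        and row["occupancy_type"] == 1                    # principal residence
        and row["total_units"] in {"1", "2", "3", "4"}    # one-to-four-unit property
        and row["action_taken"] != 6                      # not a loan purchased by another institution
    )


def make_row(**overrides):
    """Build a hypothetical record that passes every filter, then apply overrides."""
    row = {
        "loan_type": 1,
        "lien_status": 1,
        "loan_purpose": 1,
        "occupancy_type": 1,
        "total_units": "1",
        "action_taken": 3,
    }
    row.update(overrides)
    return row


# Invented sample rows, one per exclusion reason plus two that should be kept.
rows = [
    make_row(),                               # kept
    make_row(loan_type=2),                    # dropped: government-insured
    make_row(action_taken=6),                 # dropped: purchased loan
    make_row(total_units="5-24"),             # dropped: more than four units
    make_row(occupancy_type=3),               # dropped: not owner-occupied
    make_row(total_units="2", action_taken=1),  # kept
]

kept = [r for r in rows if keep_application(r)]
print(len(kept), "of", len(rows), "rows kept")
```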

We also included various Census datasets in our analysis: the racial and ethnic demographics of all neighborhoods, the median property value for all counties, and the population size for all metropolitan areas in the country.

Using these datasets, we built a regression model that consisted of 17 different variables. We judged the strength of our model with McFadden’s pseudo R-squared, using a cutoff of 0.1 for valid results. We considered a variable statistically significant if its p-value was below 0.05. Lastly, we ran a series of collinearity tests to ensure that we were not using two independent variables that were correlated with each other.
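The three checks above can be sketched in a few lines. This is a hedged illustration, not the code we published. The function names are hypothetical, the p-value uses a standard two-sided Wald test, and the variance inflation factor (VIF) is one common collinearity measure that is assumed here, since this entry does not name the specific tests we ran.

```python
import math

import numpy as np


def mcfadden_r2(y, p_hat):
    """McFadden's pseudo R-squared: 1 - (model log-likelihood / null log-likelihood)."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1 - 1e-12)
    ll_model = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    p_null = y.mean()  # the intercept-only model predicts the overall rate
    ll_null = np.sum(y * np.log(p_null) + (1 - y) * np.log(1 - p_null))
    return 1.0 - ll_model / ll_null


def wald_p_value(coef, std_err):
    """Two-sided p-value for a coefficient, from its z-statistic."""
    z = abs(coef / std_err)
    return math.erfc(z / math.sqrt(2))


def vif(X):
    """Variance inflation factor for each column of X (n rows, k columns)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()  # R-squared of column j on the others
        factors.append(1.0 / (1.0 - r2))
    return factors


# Toy inputs, invented for illustration.
y = [0, 0, 1, 1]
r2_uninformative = mcfadden_r2(y, [0.5, 0.5, 0.5, 0.5])
r2_informative = mcfadden_r2(y, [0.1, 0.1, 0.9, 0.9])
p_value = wald_p_value(1.96, 1.0)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)  # nearly a copy of a
vif_independent = vif(np.column_stack([a, b]))
vif_collinear = vif(np.column_stack([a, b, c]))

print("pseudo R-squared passes the 0.1 cutoff:", r2_informative > 0.1)
print("coefficient significant at 0.05:", p_value < 0.05)
print("VIFs with a near-duplicate column:", np.round(vif_collinear, 1))
```

A VIF near 1 suggests a variable is not explained by the others. Large values flag variables that overlap and should not be included together.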

We detailed our approach—how we filtered the data and the variables we used—in a nearly 10,000-word methodology. We used Python and Jupyter Notebook to run our analyses and made our code publicly available on GitHub.

What was the hardest part of this project?

Statistics is both an art and a science, and there’s no straightforward regression equation to assess lending disparities. 

We ran more than 150 regression equations before publication, filtering the data in different ways and including different variables in different forms to see if the disparities persisted regardless of the equation we used—and they did. Simultaneously, we were in conversation with various experts in fair lending laws, Home Mortgage Disclosure Act data, and statistics for feedback on our analysis, approach, and findings. All this work culminated in our extensive methodology, describing in detail every step and decision along the way, which was in turn reviewed by a panel of experts and sent to various mortgage industry groups for their feedback before publication. Our analysis produced national and regional results. 

Finding human examples for the story was an enormous undertaking. The data does not include names or addresses, so investigative reporter Lauren Kirchner spent months emailing and calling advocates, realtors, and experts to talk to people of color who may have been wrongfully denied mortgages. One of the biggest problems she faced: When you’re denied by a computer, you don’t think you’ve been discriminated against. 

In addition, we partnered with The Associated Press to give newsrooms across the country access to our story, data, and analysis before publication so they too could report on the issues in their coverage areas. We hosted a webinar in advance of publication, detailing the relevance of the story, translating the complex analysis into an easily digestible format, and explaining how to use the localized version of the findings. As a result, more than 160 newsrooms co-published our story.

What can others learn from this project?

Because we detail all our decisions and steps in our methodology section and make our code publicly available, other journalists can reproduce our analysis or build on it. Our investigation relies on the latest and most expansive version of the Home Mortgage Disclosure Act dataset to date, more than 18 million records and nearly 100 columns, some of which had never been publicly available before.

Reporters and researchers can apply our methodology and code to other types of mortgages (government-insured loans, for instance), to their local metropolitan areas, or to lenders they may be interested in.

They can also take away general ideas about how to audit algorithms to determine whether they produce race-neutral decisions. The inclusion of certain variables that may not seem tied to race can affect people differently, depending on their race or ethnicity. For example, income and assets are used to gauge risk, but the average White family has eight times the wealth of the average Black family, so Black families are going to be disproportionately harmed by an algorithm that makes decisions based on that factor.
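A toy sketch makes the point concrete. Everything in it is hypothetical: the two groups, the wealth figures, and the cutoff are invented, and the rule is far simpler than any real underwriting algorithm. The decision rule never reads the group label, yet the outcomes differ by group because the input it relies on is distributed differently.

```python
# Hypothetical applicants; wealth is in thousands of dollars and is invented.
applicants = (
    [{"group": "A", "wealth": w} for w in (20, 60, 120, 200, 300, 90, 45, 150)]
    + [{"group": "B", "wealth": w} for w in (5, 15, 25, 40, 70, 30, 110, 10)]
)

WEALTH_CUTOFF = 50  # hypothetical threshold used by the toy rule


def toy_rule_denies(applicant):
    """A 'group-blind' rule: it looks only at wealth, never at the group label."""
    return applicant["wealth"] < WEALTH_CUTOFF


def denial_rate(group):
    """Share of applicants in the given group that the toy rule denies."""
    members = [a for a in applicants if a["group"] == group]
    return sum(toy_rule_denies(a) for a in members) / len(members)


rate_a = denial_rate("A")
rate_b = denial_rate("B")
print(f"group A denial rate: {rate_a:.2f}")
print(f"group B denial rate: {rate_b:.2f}")
```

An audit along these lines compares outcomes across groups rather than checking only whether race appears among the inputs.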

Project links: