Last year, the Howard Center for Investigative Journalism at the University of Maryland collaborated with other universities to investigate the impact of homelessness, and the threat of homelessness posed by the pandemic, for a project called “Nowhere to Go.”
The initial stories this spring documented the criminalization of homelessness in some of the country’s least affordable cities. As the pandemic caused millions of Americans to lose their jobs, the consortium pivoted over the summer to determine whether the federal moratorium on evictions was working. This fall, it investigated evictions by public housing authorities, often the last stop before the street.
The series focused attention on the plight of some of the nation’s most vulnerable people, with distribution by the Associated Press and USA Today. By highlighting stories of individual tenants at risk of eviction, our work also led directly to some getting necessary assistance to remain housed.
The project also gave rise to two major journalistic collaborations, both of which have continued since this particular reporting project ended.
First, to tell these stories, we brought together data journalists and faculty at seven universities: the University of Maryland, University of Oregon, Boston University, Stanford University, the University of Florida, the University of Arkansas and Arizona State University. This multi-university collaboration was, we believe, unprecedented in scope. The group is now working on a new reporting project and recruiting others to join.
Second, the project kickstarted a collaborative effort to obtain hard-to-get court records. Data journalists working for the Howard Center at Maryland, the Big Local News project at Stanford University, USA Today and others wrote custom, open-source software to scrape eviction records from online court record management systems in more than a dozen cities. This effort was onerous but necessary, because the eviction records we needed to tell these stories were unavailable through other means. The data collection has continued and is expanding.
We have open-sourced the court scraping software package we developed for this project, and we continue to improve it. We are also adding court systems to the list of jurisdictions from which we regularly collect records, and we are starting to gather new case types beyond evictions. We are releasing all the data we’ve collected on Stanford’s Big Local News data-sharing platform for use by other journalists.
We used several tools and technologies to bring this project to life, including:
Web scraping with Python, R and a suite of open-source software libraries, including Selenium.
Data analysis with Python, R and SQL, primarily working in Jupyter notebooks and RStudio.
GitHub for version control and open-source tool sharing.
OpenRefine for data cleaning.
What was the hardest part of this project?
The hardest part of this project, by far, was acquiring the data. Obtaining bulk court data in the U.S. is needlessly complicated. Most state and county judicial systems in the U.S. are exempt from public records laws that would compel the release of bulk data. When court systems do make bulk data available, they often sell access at prices unaffordable for most news organizations.
In most states, access to court records is made available through a public web application that typically allows users to look up individual cases, linked to a database back end that contains the bulk data we were after. By writing custom web scraping software to programmatically access these sites, we were able to gather court data in bulk.
This was not an easy effort. No two web applications were alike. Though many court systems use software from a common technology vendor, each site had its own quirks, making it difficult to write a single tool to gather all the data we needed. Many sites also had defenses designed to prevent automated tools like ours from gathering records at scale, but we developed legal techniques to defeat them.
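As a rough illustration of the one-case-at-a-time lookup approach described above, here is a minimal Python sketch. It is not the project's actual software. The page layout (label and value cells in a table), the function and class names, and the case-number format are all assumptions made for the example, and real court sites would each need their own parsing logic.

```python
import time
from html.parser import HTMLParser


class CaseTableParser(HTMLParser):
    """Collect label/value pairs from rows shaped like
    <tr><th>label</th><td>value</td></tr> (an assumed layout)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # "th" or "td" while inside one of those cells
        self._label = None  # text of the most recent <th> cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "th":
            self._label = text
        elif self._tag == "td" and self._label is not None:
            self.fields[self._label] = text


def parse_case(html):
    """Turn one case detail page into a dict of field name -> value."""
    parser = CaseTableParser()
    parser.feed(html)
    return parser.fields


def scrape_cases(case_numbers, fetch, delay_seconds=1.0):
    """Look up each case individually and return the parsed records.

    `fetch` is any callable that takes a case number and returns the HTML
    of that case's page, such as a wrapper around an HTTP client or a
    browser driver. Pages with no recognizable fields are skipped.
    """
    records = []
    for number in case_numbers:
        record = parse_case(fetch(number))
        if record:
            record["case_number"] = number
            records.append(record)
        time.sleep(delay_seconds)  # pause between requests to limit server load
    return records


if __name__ == "__main__":
    # Hypothetical page content standing in for a real court website.
    fake_pages = {
        "21-001": "<table><tr><th>Case Type</th><td>Eviction</td></tr></table>",
        "21-002": "<p>No case found</p>",
    }
    print(scrape_cases(fake_pages, fake_pages.get, delay_seconds=0))
```

Passing the page-fetching step in as a function keeps the parsing logic separate from the site-specific access logic, which is one way to cope with sites that share a vendor but differ in their quirks.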
What can others learn from this project?
We learned several things while working on this project that we think could benefit other journalists.
Collaboration is a great way to take on journalism projects that are beyond the scope of one individual newsroom. But it takes work to organize and manage correctly, especially on projects that combine student journalists with professionals.
Through this project, we have done more than just produce a package of meaningful journalism. We have also provided other news organizations several opportunities to build on our reporting.
We have released the underlying data on evictions in more than a dozen cities for use by other journalists.
We have produced an open-source software package that other journalists can modify to obtain court records in their local jurisdictions, and we have opened the door to others who wish to join our collaboration and work alongside us.
We have also tested and implemented a stable, low-cost method for legally defeating CAPTCHA technology that could be of broad use to other journalists involved in scraping government websites.