Free to Shoot Again
Category: Open data
Country/area: United States
Organisation: The Trace and BuzzFeed News
Organisation size: Big
Publication date: 24/01/2019
Credit: Sarah Ryley, Jeremy Singer-Vine, and Sean Campbell
Across America, the odds that a shooter will face justice are abysmally low and dropping. Police make an arrest in fewer than half the murders committed with firearms. If the victim survives being shot, the chance of arrest drops to 1 out of 3. Thousands of nonfatal shooting cases every year are never investigated. The shooters are left free to strike again, fueling cycles of violence and eroding the public’s trust in law enforcement.
The Trace and BuzzFeed News’ series “Free to Shoot Again” interrogates this failure in policing with a data-driven investigation unprecedented in its methodology, scope, and findings.
Our sweeping national findings are illustrated through nine shootings in Baltimore linked by a common victim or suspect. Police had closed just two of the nine cases. One case was reopened after we found the shooting had been pinned on a dead man without any evidence. In the other closed case, the man who had been serving a life sentence for the murder was freed from prison, in part by citing information we uncovered.
Baltimore leaders called for more resources toward solving shootings and for an entity to review open cases. The police department has since added more detectives to investigate homicides.
Due to the complexity of the analysis, we felt a strong need to share our materials and methods with our readers. Between the documents, data, code, and detailed methodologies we made public, we believe this is among the most extensively documented data-driven investigations to date.
Our analysis of linked shootings was the first to take suspects into account. Andrew Papachristos of Northwestern University, whose network analyses of shootings inform violence intervention efforts across the country, said he plans to incorporate our methodology into future work. Daniel Webster of Johns Hopkins said he used the materials we posted for the analysis in a report to the mayor and police chief on strategies to reduce gun violence.
Detailed data on violent crimes can be difficult to obtain. For that reason, we made nearly all of the raw data we obtained public: 4.3 million murders, rapes, robberies, and assaults from 56 agencies. The data was downloaded more than 500 times within the first month of posting, and numerous researchers, journalists, and advocates have written to us about the data. Webster said he is using it for a multi-year research proposal. Crime analyst Jeff Asher called the release “a remarkable service.”
To examine long-term trends in clearance rates, we analyzed data from three major FBI crime-reporting programs: the Supplementary Homicide Report, the National Incident-Based Reporting System, and Return A. Each dataset is structured completely differently, so we standardized the FBI's raw files in order to compare results between the three datasets. We also used rigorous statistical controls to ensure the integrity of the analysis. We posted our standardized data, the code to produce the standardized data, and an extensive methodology on GitHub.
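The core of this kind of standardization is mapping each dataset's own layout onto one common schema so a single clearance-rate calculation can run on all of them. Below is a minimal sketch in pandas; the column names (`ori`, `yr`, `solved`, `agency_id`, `cleared_date`, and so on) are invented for illustration and are not the FBI's actual file layouts.

```python
import pandas as pd

# One common schema for all sources, so clearance rates are comparable.
COMMON_COLS = ["agency", "year", "offense", "cleared"]

def standardize_shr(raw: pd.DataFrame) -> pd.DataFrame:
    """Map a hypothetical SHR-style layout onto the common schema."""
    return pd.DataFrame({
        "agency": raw["ori"],
        "year": raw["yr"],
        "offense": "homicide",
        "cleared": raw["solved"].eq("Yes"),
    })[COMMON_COLS]

def standardize_nibrs(raw: pd.DataFrame) -> pd.DataFrame:
    """Map a hypothetical NIBRS-style layout onto the common schema."""
    return pd.DataFrame({
        "agency": raw["agency_id"],
        "year": raw["incident_date"].dt.year,
        "offense": raw["offense_code"],
        # An incident counts as cleared if a clearance date was recorded.
        "cleared": raw["cleared_date"].notna(),
    })[COMMON_COLS]

def clearance_rate(df: pd.DataFrame) -> pd.Series:
    """Share of incidents cleared, per agency and year."""
    return df.groupby(["agency", "year"])["cleared"].mean()
```

Once every source has been pushed through its own `standardize_*` function, the same `clearance_rate` call works on each, which is what makes cross-dataset comparison possible.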
To examine present-day trends in clearance rates in the nation's largest cities, we standardized data from 22 police departments. This remains the largest-scale analysis of clearances for nonfatal shootings that we've come across. We also compared the results with our analysis of FBI data and found the two to be consistent. We posted the standardized data, code, and methodology on GitHub.
To examine the linkages between fatal and nonfatal shootings, we created a network analysis of Baltimore Police data that included the names and birthdates of shooting victims and suspects. That analysis was the first of its kind to be published; others, by various academics, did not take suspects into account. We posted the data, methodology, and code on GitHub.
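The basic mechanics of such a network analysis can be sketched in a few lines of standard-library Python: treat each incident as a node, connect two incidents when they share a person (victim or suspect), and read off the connected components as clusters of linked shootings. The case names, people, and dates below are invented, and this sketch omits the record-linkage work (matching names and birthdates across records) that a real analysis requires.

```python
from collections import deque

# Each incident lists the people tied to it, victims and suspects alike.
# Keys are invented "name + birthdate" identifiers.
incidents = {
    "case_A": {"J. Doe 1990-01-01", "R. Roe 1985-05-05"},
    "case_B": {"R. Roe 1985-05-05"},  # case_A's suspect is case_B's victim
    "case_C": {"M. Moe 1992-02-02"},  # not linked to the others
}

# Adjacency list: two incidents are linked if they share a person.
cases = list(incidents)
links = {c: set() for c in cases}
for i, a in enumerate(cases):
    for b in cases[i + 1:]:
        if incidents[a] & incidents[b]:
            links[a].add(b)
            links[b].add(a)

def clusters(links):
    """Group incidents into connected components via breadth-first search."""
    seen, out = set(), []
    for start in links:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(links[node] - comp)
        seen |= comp
        out.append(sorted(comp))
    return out
```

Including suspects as well as victims in the person sets is what connects `case_A` and `case_B` here; an analysis restricted to victims alone would leave all three cases unlinked.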
To examine the over-policing/under-policing paradigm that exists in black and Latinx neighborhoods across America, we used the following datasets: crime incidents, homicide and shooting case characteristics, and sworn officer assignments from the Chicago Police Department; police district and census tract boundaries from the City of Chicago; Census statistics from the Census Bureau; and sunrise and sunset times from sunrise_sunset.org. We posted the data, methodology, and code on GitHub.
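An analysis like this ultimately comes down to joining incident data with Census population data on a shared geography so that raw counts become comparable rates per neighborhood. The sketch below shows that join in pandas with invented tract IDs and figures; it stands in for, and greatly simplifies, the multi-source integration described above.

```python
import pandas as pd

# Invented per-tract incident counts (in practice, aggregated from
# police incident-level data after a spatial join to tract boundaries).
incidents = pd.DataFrame({
    "tract": ["0101", "0102"],
    "shootings": [24, 3],
    "arrests": [6, 2],
})

# Invented Census population figures for the same tracts.
census = pd.DataFrame({
    "tract": ["0101", "0102"],
    "population": [4800, 5200],
})

# Join on tract, then convert counts to comparable rates.
df = incidents.merge(census, on="tract")
df["shootings_per_10k"] = df["shootings"] / df["population"] * 10_000
df["clearance_rate"] = df["arrests"] / df["shootings"]
```

Per-capita shooting rates and per-incident clearance rates, computed side by side for each neighborhood, are exactly the pairing that lets an over-policing/under-policing comparison be made.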
We did our analysis in Python and R.
What was the hardest part of this project?
We filed more than 200 public records requests for data, case files, staffing audits, and database documentation. Nearly every agency put up significant resistance, and many quoted exorbitant fees. For instance, the Baltimore Police Department initially demanded $1,800 for one murder case file. Negotiations and appeals stretched on for longer than a year.
Once we got data, we often found problems that were difficult to resolve, such as major discrepancies with the agency’s reported statistics. Some agencies produced corrected versions of the data. But we often had to “appeal” missing incidents as an incomplete fulfillment of our request. We also threw out numerous datasets because of integrity issues.
We wanted to illustrate how law enforcement's failure to arrest shooters fuels vicious cycles of unchecked gun violence. But we had to examine dozens of datasets to find one with all the information necessary to conduct a network analysis. This took months.
Once we decided on Baltimore, we encountered significant challenges in the field reporting. The areas where the shootings occurred have among the highest rates of poverty and gun violence in the nation. Residents often have insecure housing situations, so it took weeks of door-knocking and poring through court records to find key individuals. Many were deeply traumatized, and scared of retaliation, so we had to approach our interviews with sensitivity and deference to their concerns.
Initially, the Maryland Division of Corrections said we could not interview Devon Little, an inmate at the time. Gaining approval, partly by working through his mother, took months.
The Baltimore Police initially refused to work with us because we were national media outlets. It took months of persistence just to get a short interview. More on-the-record responses came only after we repeatedly demanded answers to our findings.
What can others learn from this project?
We’ve shared our methods and materials for this project online and in presentations, which has helped other people in their work in many ways:
The FBI analysis: Our extensive methodology details important nuances in three of the FBI’s most important crime datasets. The raw formats of the datasets require advanced coding skills to process and analyze. We’ve shared standardized versions of the data and relevant code so our work can be replicated.
The analysis of internal police data: Our methodology identifies differences across agencies that must be considered when combining them. This data has already been used by numerous researchers and journalists for their own work.
Network analysis: Our methodology and code can be used to analyze networks of crime victims and suspects, which is particularly important for shootings given their high rate of retaliatory violence.
Over-policing/under-policing analysis: Our methodology, data, and code show how data from multiple sources can be integrated into a single analysis. Some of the data was obtained through public records requests and is available for use.
The raw violent crime data has aided countless research and reporting efforts. Numerous journalists have told us it’s also helped them get more data out of the agencies that they cover.
In January 2020, we published an article in partnership with MuckRock that links to thousands of pages in documentation detailing the architecture of the most common police records- and case-management databases. This documentation can be key to winning public records battles for police data.
That article, along with numerous presentations that we’ve given at conferences and classrooms, gives detailed strategies on making public records requests for data. People regularly contact us seeking further advice on their own requests.
We spoke at NICAR 2019 on finding human stories in data, based on our reporting in Baltimore.