We partnered with The Associated Press and Northeastern University to revive a dormant database tracking every mass killing in the United States, dating back to 2006. USA TODAY Graphics created a visual overview, showing trends and summaries from this unique and collaborative project. The overview is evergreen, and updates are automated each time an incident is added to the database or an earlier incident is modified. The data pipelines and presentation are elegantly coded and built to be easily updated in perpetuity so that data on this important subject will be available for all to download, free of charge.
The USA TODAY/AP/Northeastern University mass killing database contains information on incidents, offenders, victims and weapons for all multiple homicides with four or more victims killed within a 24-hour time frame in the United States from 2006 to the present. The database is designed to exist in perpetuity, and we commit to updating it, so the findings will change. So far, we have learned the following:
– There has been a spike in these types of killings over the past few years, but the rate of occurrence has remained relatively flat since the mid-2000s.
– Public mass shootings are only part of the story.
– Victims of mass killings are more likely to have been killed by someone they know.
– Most mass killings are committed using handguns.
– Most mass killers are men.
– Mass killings aren’t confined to big cities.
The database and visualization system has made possible dozens of subscriber-converting stories covering this important subject, and the main presentation has drawn over a quarter million visitors since launch, who spend an average of two minutes on the page.
In addition to the primary presentation, we provide, free of charge, a publicly accessible data download portal and a set of modular data visualizations that are updated daily with any new information. The project has been covered in several publications, including Nieman Lab. It has also been named “Best community service project/reporting” and runner-up for “Best use of data/infographics” by Editor and Publisher, and it was on the Data Visualization Society’s Information Is Beautiful longlist. Deeper analysis of the database, including visual stories, additional data visualizations and a breaking news response kit, is planned for 2023.
The database is constructed and maintained using sound social science methodology, including a detailed codebook with explicit instructions on data collection protocols (this hyperlink is to a summary codebook; the full codebook is included at the end of this document). In addition, determinations pertaining to certain subjective variables (e.g., underlying circumstances and motivation) involve judgements of multiple coders, with any disagreements settled through consensus. Also, we avoid promoting data that depend on unreliable sources (e.g., reports by family and friends about mental health issues) that weaken other datasets.
The full database currently consists of four linked data tables with a total of 59 data fields (not counting indicators for the availability of offender/victim identity)—18 fields for each incident, 20 fields for each offender, 13 fields for each victim killed, and 8 fields for each weapon used. Most variables, with the notable exception of victim names, are available for public download. The remaining data are reserved for individuals affiliated with the Associated Press, USA TODAY/Gannett, and Northeastern University’s School of Criminology and Criminal Justice, and others by permission of all three organizations. Moving forward, additional variables may be added to the full database as well as the public subset.
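As a rough illustration of how four linked tables of this kind fit together, the sketch below models one record type per table and links them through a shared incident key. It is a minimal sketch, not the project's actual schema: the class names, the field names, the `incident_id` key and the helper `rows_for_incident` are all assumptions made for illustration, and each record is heavily abbreviated. Only the per-table field counts come from the description above.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated records. The real tables carry 18, 20, 13 and 8
# fields respectively, and their actual field names may differ.
@dataclass
class Incident:
    incident_id: int
    date: str
    state: str

@dataclass
class Offender:
    offender_id: int
    incident_id: int  # assumed link back to Incident
    sex: str

@dataclass
class Victim:
    victim_id: int
    incident_id: int  # assumed link back to Incident
    age: int

@dataclass
class Weapon:
    weapon_id: int
    incident_id: int  # assumed link back to Incident
    weapon_type: str

# Field counts per table, as stated in the text.
FIELD_COUNTS = {"incident": 18, "offender": 20, "victim": 13, "weapon": 8}
TOTAL_FIELDS = sum(FIELD_COUNTS.values())

def rows_for_incident(rows, incident_id):
    """Return the rows of one table that are linked to the given incident."""
    return [row for row in rows if row.incident_id == incident_id]
```

Keeping victims, offenders and weapons in their own tables means an incident with several victims or several weapons needs no repeated columns; each related row simply points back to its incident.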
The presentation is coded and designed in an elegant and resilient manner that should stand the test of time and survive inevitable platform and technology shifts. Updates run automatically, the visualizations are responsive, and the project will be easy for a future graphics journalist to maintain and improve, should they discover it long after we are no longer working at USA TODAY.
Context about the project:
This project is over a decade in the making, and work on the most recent iteration of the database and presentation took longer than a year to complete.
The effort started nearly a decade ago at USA TODAY when Paul Overberg, Meghan Hoyer, Mark Hannan, Jodi Upton, Barbie Hansen, and Erin Durkin contributed to an original 2012 data reporting effort. In 2016, primary data collection and verification efforts shifted from USA TODAY to Northeastern University. Northeastern researchers conducted extensive searches for incidents not reported in the FBI's Supplementary Homicide Reports (SHR) during the time period, utilizing internet search engines including Lexis-Nexis, Google News, and Newspapers.com. Northeastern University researchers also independently verified data collected by USA TODAY staff and filled in missing information, sometimes involving updated reports on older cases.
In December 2018, a Memo of Understanding (MOU) was signed by The Associated Press, USA TODAY and Northeastern University to formalize a joint initiative to maintain and expand the mass killing database previously housed at USA TODAY. The Associated Press hosts the database and maintains the data entry tool; USA TODAY has developed and maintains the public website containing visualizations and interpretations of the data; and Northeastern University manages data collection and updates. At USA TODAY, the torch was passed from Jodi Upton and Meghan Hoyer to John Kelly, and subsequently to Shawn Sullivan and Mitchell Thorson, who remained committed to a complete overhaul of the project. The overhauled project was published in August 2022.
Researchers at USA TODAY first identified potential incidents using the FBI’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of homicide was coded as “murder or non-negligent manslaughter.” Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests.
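The initial flagging rule described above can be sketched in a few lines. This is an illustrative sketch only: the record layout, the key names `victim_count` and `homicide_type`, and the function name are assumptions, not the actual SHR file format or the researchers' code. A flagged record is only a candidate; verification against other sources still follows.

```python
# Type label quoted from the description above; the actual SHR coding
# scheme may represent it differently.
QUALIFYING_TYPE = "murder or non-negligent manslaughter"

def is_potential_mass_killing(record):
    """Flag a record with four or more victims and the qualifying homicide type.

    `record` is a hypothetical dict with `victim_count` and `homicide_type` keys.
    """
    return (
        record["victim_count"] >= 4
        and record["homicide_type"] == QUALIFYING_TYPE
    )

def flag_candidates(records):
    """Return only the records that need follow-up verification."""
    return [r for r in records if is_potential_mass_killing(r)]
```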
Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information. When sources were contradictory, official law enforcement or court records were used, when available, followed by the most recent media or academic source. Case information was subsequently compared with other available mass murder or mass shooting databases to ensure validity. Incidents listed in the SHR that could not be independently verified were excluded from the database (many SHR records, contrary to the protocol, include injured victims along with those who were killed, giving the false impression of a mass killing).
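The precedence rule for contradictory sources can be expressed as a small function. This is a hedged sketch under stated assumptions: the report layout (`kind`, `published`, `value`), the kind labels and the function name `resolve_conflict` are hypothetical, and the choice to prefer the most recent record when several official records disagree is an assumption, since the text does not say how that case was handled.

```python
from datetime import date

# Hypothetical labels for official sources.
OFFICIAL_KINDS = {"law_enforcement", "court"}

def resolve_conflict(reports):
    """Pick one value from conflicting reports about the same data point.

    Official law enforcement or court records win when available;
    otherwise the most recent media or academic source is used.
    Each report is a dict with `kind`, `published` (a date) and `value`.
    """
    official = [r for r in reports if r["kind"] in OFFICIAL_KINDS]
    # Fall back to all reports only when no official record exists.
    pool = official or reports
    # Assumption: within the chosen pool, the most recent report wins.
    return max(pool, key=lambda r: r["published"])["value"]
```

For example, an older court record would override a newer news story, while two news stories with no official record would resolve to the later one.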
Finally, as an important step in checking for completeness, cases in our database are compared side-by-side against other available datasets (e.g., the Gun Violence Archive, Everytown for Gun Safety, The Violence Project, Mother Jones). We have found no qualifying cases in other databases that are missing from ours, but many in ours that are missing from others.
What can other journalists learn from this project?
In short, the value of taking your time to get something right, willingness to collaborate across organizations, and honoring commitments.
Given the surging interest in mass killings, including those incidents that involve a firearm, as well as the lack of an official, government database on such cases, several news organizations, academic collectives, and advocacy groups have launched their own database projects. In terms of the breadth of cases and range of variables, the USA TODAY/AP/Northeastern University Mass Killing Database is the most comprehensive, including incidents since 2006 with four or more victims killed, regardless of location, motivation, victim-offender relationship, or weapon. Unlike other databases on the topic, tallies of injuries are not conflated with data on victims who are killed. Similarly, data on assailants who are killed at the scene or take their own lives are not blended with those on victim fatalities.
There are several other features that make our database unique and of significant value to journalists, social scientists, and policy makers alike:
It is the only relational database linking incident, offender, victim, and weapon characteristics—spanning more than five dozen data fields.
It is the only database that includes the 20 percent of mass killings that are committed with means other than guns, recognizing that these victims are just as dead as those who are fatally shot and often suffer especially painful deaths.
It is the only database that follows a case through the criminal justice process and eventual sentencing.
It is the only database that features an interactive, visual website that is updated daily and lets others easily download incident, offender, victim, and weapon data files.