Best data-driven reporting (large newsrooms) – year 2020
Honorable Mention: Copy, Paste, Legislate
Organisation: USA TODAY, The Center for Public Integrity, The Arizona Republic
Country: United States
For USA Today/Arizona Republic: Natalie Allison, Chris Amico, Robert Barnes, Christian Baucom, Daniel Bice, Giacomo Bologna, Ben Botkin, David Boucher, Jon Campbell, Chris Davis, Amy DiPierro, Paul Egan, Tom Foster, Dustin Gardiner, Ronald J. Hansen, Tyler Hawkins, Greg Hilburn, Greg Holman, Joe Hong, Lisa Kaczke, John Kelly, Marisa Kwiatkowski, Keegan Kyle, Kaitlin Lange, Pamela Ren Larson, Aamer Madhani, Patrick Marley, Kelsey Mo, Dan Nowicki, Rob O’Dell, Geoff Pender, Nick Penzenstadler, Svetlana Peterlin, Agnel Philip, Justin Price, Nick Pugliese, Amy Pyle, Anne Ryman, Yvonne Wingett Sanchez, Jeff Schwaner, Chris Sikich, Michael Squires, Matt Wynn
For Center for Public Integrity: Jared Bennett, Kristian Hernandez, Sameea Kamal, Rui Kaneya, Mark Olalde, Pratheek Rebala, Peter Smith, Liz Essley Whyte
Jury’s comment: The Arizona Republic, USA Today Network and the Center for Public Integrity analyzed the language of proposed legislation in all 50 states, revealing 10,000 nearly identical bills. Their sophisticated methods revealed the extent of corporate lobbyists’ and interest groups’ influence on the day-to-day lives of ordinary people, all exerted behind closed doors in statehouses around the U.S.
Organisation size: Big
Publication date: 6 Feb 2019
Project description: Copy, Paste, Legislate marks the first time a news organization detailed how deeply legislation enacted into law at the state level is influenced by special interests, in a practice known as “model legislation.” The series explained how model legislation was used by auto dealers to sell recalled used cars; by anti-abortion advocates to push further restrictions; by far-right groups to advocate for what some called government-sanctioned Islamophobia; and by the Catholic Church to limit its exposure to past child abuse claims. (Published February 6, April 3, May 23, June 19, July 17 and October 2, 2019)
Impact: People in various states called for legislation requiring more transparency about the origin of bill language, and legislators found themselves compelled to defend their sponsorship of model bills. A public-facing model legislation tracker tool, launched in November 2019, allowed journalists and the public to:
– Identify recent model legislation introduced nationally
– Identify recent model legislation introduced in their state
– Perform a national search for model legislation mentioning specific keywords or topics
– Upload a document to instantly check whether any of its language matches state legislation introduced since 2010
– Look up a specific bill by number to see all other bills matching it
– Look up individual legislators and see all bills they sponsored that contain model language
As part of the project, local newsrooms were able to identify and interview major sponsors of model legislation and to pinpoint key issues that resonated in their states. Those stories explored the reach of model legislation and its surprising impact on policies across the nation. The combined national and local reporting revealed:
– More than 10,000 bills introduced in statehouses nationwide were almost entirely copied from bills written by special interests
– The largest block of special-interest bills, more than 4,000, was aimed at achieving conservative goals
– More than 2,100 of the bills were signed into law
– The model bills amount to the nation’s largest unreported special-interest campaign, touching nearly every area of public policy
– Models were drafted with deceptive titles to disguise their true intent, including “transparency” bills that made it harder to sue corporations
– Copycat bills have become so intertwined with the lawmaking process that the nation’s most prolific sponsor of model legislation claimed he had no idea he had authored 72 bills originally written by outside interests
Techniques/technologies: No news organization had attempted to put a number on how many of the bills debated in statehouses are substantially copied from those pushed by special interests. We obtained metadata on more than 1 million pieces of legislation from all 50 states for the years 2010 through 2018 from a third-party vendor, Legiscan, and scraped the associated bill text from the websites of state legislatures. In addition, we pieced together a database of 2,000 pieces of model legislation by getting data from sources, downloading data from advocacy organizations and searching for models ourselves, either by identifying known models and trying to trace their source, or by finding organizations that have pushed model bills and searching for each of the models they have advocated. We then compared the two data sets, which proved to be complicated. The team developed an algorithm that relied on natural language processing techniques to recognize similar words and phrases, comparing each model in our database to the bills that lawmakers had introduced. These comparisons were powered by the equivalent of more than 150 computers, called virtual machines, that ran nonstop for months. Even with that computing power, we couldn’t compare every model in its entirety against every bill. To cut computing time, we used keywords (guns, abortion, etc.): the system compared a model with a bill only if they had at least one keyword in common. The team then developed a matching process that led to an updatable, public-facing tool that reporters and members of the public can use to identify not only past model bills but future ones as they are introduced, while the bills are still newsworthy.
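The keyword-gated comparison described above can be sketched in a few lines. This is an illustrative toy, not the team’s actual algorithm: the `KEYWORDS` set, the function names and the use of `difflib.SequenceMatcher` are all assumptions standing in for the far more sophisticated natural language processing the project used at scale.

```python
from difflib import SequenceMatcher

# Illustrative topic keywords; the real system used many more
# (guns, abortion, etc., per the methodology described above).
KEYWORDS = {"gun", "abortion", "recalled", "asbestos"}

def shared_keywords(a: str, b: str) -> set:
    """Keywords that appear in both texts (simple substring test)."""
    return {k for k in KEYWORDS if k in a.lower() and k in b.lower()}

def similarity(model_text: str, bill_text: str):
    """Return a 0-1 similarity ratio, or None when the pair shares no
    keyword, so the expensive comparison can be skipped entirely."""
    if not shared_keywords(model_text, bill_text):
        return None
    return SequenceMatcher(None, model_text.lower(), bill_text.lower()).ratio()

model = "An act concerning the sale of recalled used cars by a dealer"
bill = "A bill concerning the sale of recalled used cars by any dealer"
unrelated = "An act to rename the state bird"

print(similarity(model, bill))       # high ratio: near-identical language
print(similarity(model, unrelated))  # None: no shared keyword, skipped
```

The keyword gate is what made the workload tractable: with a million bills and thousands of models, skipping every pair with no topic overlap avoids the vast majority of full-text comparisons.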
The hardest part of this project: It’s hard to overstate how resource-intensive this analysis was. This was our first foray into natural language processing. We had to compare one million bills, each several pages long and some up to 100 pages, against each other. Computationally, that scale brought a lot of complexity. We had to go deep into understanding how to deploy some of the software we used at scale and solve the problems we faced along the way. We spent tens of thousands of dollars on cloud services, and we had to re-run the analysis every time we made changes to our methodology, which we did often. The resulting analysis and reporting took more than six months to put together.
What can others learn from this project: The power of collaboration. CPI and USA TODAY/Arizona Republic built two analysis tools to identify model language, using two different approaches. USA TODAY’s tool found at least 10,000 bills introduced in legislatures nationwide over the last eight years that were almost entirely copied from model language. CPI’s tool identified common language across approximately 60,000 bills nationwide to flag previously unknown model legislation. Together, the tools made it possible both to analyze the success of identified model bills and to surface new model legislation. The computer comparisons, along with on-the-ground reporting in more than a dozen states, revealed that copycat legislation amounts to the nation’s largest unreported special-interest campaign. Model bills drive the agenda in states across the U.S. and influence almost every area of public policy.