This story was produced as part of a collaboration between USA TODAY, The Arizona Republic and the Center for Public Integrity. More than 30 reporters across the country were involved in the two-year investigation, which identified copycat bills in every state. The team used a unique data-analysis engine built on hundreds of cloud computers to compare millions of words of legislation provided by LegiScan.
The impact: Fewer model bills, and the special interests behind them, will go undetected.
The phenomenon of copycat legislation is so pervasive that it is difficult to imagine it being curtailed without years of reforms across many states. The impact of our reporting has been felt, however, in dozens of communities, where our stories revealed the motivations behind many model bills and empowered citizens with the knowledge of who is behind them.
We interviewed more than 40 state lawmakers across the country about bills they sponsored. Many said they didn’t know they had introduced a bill written by a corporation or didn’t understand how bill language was crafted to help the company.
One lawmaker in Pennsylvania sponsored more than 70 copycat bills — without recognizing, he said, the source of the bill in many instances. “I had no way of knowing…” Rep. Thomas Murt said.
The effort continues. As part of the project, we built two tools that give the public access to information on copycat bills in ways that didn’t previously exist.
Our public-facing tool, launched in November after months of development work, allows online readers to explore tens of thousands of possible model bills by state or keyword. We update it weekly with newly filed model legislation, making this tool a nearly real-time resource.
Our internal tool allows Gannett reporters to explore the copycat bills we’ve confirmed and report on the lawmakers in their local communities who sponsored them. So far, we have trained more than 100 reporters on the tool.
How do you find 10,000 needles in 50 haystacks?
That, in effect, is what journalists and developers with USA TODAY and The Arizona Republic set out to do two years ago: identify which of the roughly 100,000 bills introduced in the 50 states each year were copied from drafts pushed by special interests.
Here’s how we did it.
Using data provided by LegiScan, which tracks every proposed law introduced in the U.S., we pulled in digital copies of nearly 1 million pieces of legislation introduced between 2010 and Oct. 15, 2018. The data included a limited number of bills from 2008 and 2009.
We then asked a dozen reporters covering state legislatures for USA TODAY Network newsrooms across the nation to build a list of model bills by searching special-interest groups’ websites, scouring news coverage and interviewing lobbyists and lawmakers. We identified more than 2,100 models, a list that is far from complete because many groups don’t make their models public.
We then used a computer algorithm designed to recognize similar words and phrases and compared each model in our database to the bills that lawmakers had introduced.
These comparisons were powered by the equivalent of more than 150 computers, called virtual machines, that ran nonstop for months.
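To give a sense of the scale, here is a minimal sketch of how model-to-bill comparisons could be divided among many machines. It is an illustration, not the project's actual code: the function name `shard_pairs`, the round-robin assignment and the rounded counts are all assumptions made for this example, and the real system may have skipped or prioritized some pairs.

```python
from itertools import product


def shard_pairs(model_ids, bill_ids, n_workers, worker_id):
    """Yield the (model, bill) pairs assigned to one worker.

    Hypothetical scheme: pairs are numbered in order and dealt out
    round-robin, so every pair goes to exactly one worker.
    """
    for index, pair in enumerate(product(model_ids, bill_ids)):
        if index % n_workers == worker_id:
            yield pair


# Rounded figures taken from the article; the true counts differ slightly.
N_MODELS = 2_100       # "more than 2,100 models"
N_BILLS = 1_000_000    # "nearly 1 million pieces of legislation"
N_WORKERS = 150        # "more than 150" virtual machines

total_pairs = N_MODELS * N_BILLS
pairs_per_worker = total_pairs // N_WORKERS
print(f"{total_pairs:,} comparisons, about {pairs_per_worker:,} per machine")
```

Under these rounded assumptions, every model checked against every bill comes to roughly 2.1 billion comparisons, or about 14 million per machine.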
What was the hardest part of this project?
Our scoring system is based on three factors: the longest string of common text between a model and a bill; the number of common strings of five or more words; and the number of common strings of 10 or more words.
Based on those factors, bills received scores on a 100-point scale. The closer to 100, the more likely a bill was copied from model legislation.
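The three factors can be illustrated with a short sketch. This is one possible reading of them, not the newsrooms' code: the tokenizer, the function names `tokenize`, `common_runs` and `match_factors`, and the choice to count every maximal shared run of words are assumptions. The article does not say how the factors are weighted into the 100-point score, so the sketch stops at the raw factors.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (an assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def common_runs(a, b):
    """Return the lengths of all maximal runs of consecutive words shared by a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the shared run ending at a[i-1] and b[j-1]
    dp = [[0] * (m + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
    runs = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A run is maximal when the next pair of words does not extend it.
            if dp[i][j] and dp[i + 1][j + 1] == 0:
                runs.append(dp[i][j])
    return runs


def match_factors(model_text, bill_text):
    """Compute the three factors named in the article for one model/bill pair."""
    runs = common_runs(tokenize(model_text), tokenize(bill_text))
    return {
        "longest_run": max(runs, default=0),
        "runs_5_plus": sum(1 for r in runs if r >= 5),
        "runs_10_plus": sum(1 for r in runs if r >= 10),
    }


# Made-up sentences for illustration only.
example_model = ("the department shall adopt rules to implement this act "
                 "within ninety days of the effective date")
example_bill = ("The department shall adopt rules to implement this act "
                "within sixty days of the effective date.")
factors = match_factors(example_model, example_bill)
print(factors)
```

The table-based comparison here is quadratic in the length of the texts, which is fine for a demonstration but would need a faster approach at the scale described above.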
For its analysis, USA TODAY/Arizona Republic used only bills that scored 80 or higher. At that level, substantial amounts of text have been duplicated.
Another estimated 10,000 bills below the 80-point threshold were likely copied from model legislation but matched less of the model’s text. Out of caution, USA TODAY/Arizona Republic cited in its investigation only bills with substantial portions copied from a model. In addition, if legislators copied an idea but not the precise language, a bill would not be flagged.
Joe Walsh, a former data scientist at the University of Chicago, used what’s known as the Smith-Waterman algorithm to create the Legislative Influence Detector, which also finds similarities between model legislation and bills. His system has been used by reporters around the country to find model bills.
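For readers curious about the technique, below is a minimal sketch of Smith-Waterman local alignment applied to words rather than characters. It is a textbook version, not the Legislative Influence Detector's code; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the sample sentences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment
    of two word lists. "-" marks a gap in either aligned output."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + step,  # align the two words
                          H[i - 1][j] + gap,       # skip a word in a
                          H[i][j - 1] + gap)       # skip a word in b
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


# Made-up sentences for illustration only.
model_words = "this act may be cited as the clean water act".split()
bill_words = "section one this act shall be cited as the clean water law".split()
score, aligned_model, aligned_bill = smith_waterman(model_words, bill_words)
print(score)
print(" ".join(aligned_model))
print(" ".join(aligned_bill))
```

Unlike an exact-match search, local alignment tolerates a swapped word or a small insertion inside an otherwise copied passage, which is why it suits bills that have been lightly edited.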
Walsh reviewed USA TODAY/Arizona Republic’s investigation and findings and applauded its scoring system for showing when a bill has been substantially copied from model legislation.
“It’s really clear, the numbers are nice and round, and it’s easy to show and explain,” Walsh said. “I wish that we were able to do some of this stuff. I am glad someone is.”
What can others learn from this project?
Can I examine the results?
USA TODAY/Arizona Republic continues to search legislation and compare it with known model bills from around the country, furthering its investigation of outside influences on state lawmakers.
Initially, the system is being rolled out to USA TODAY Network journalists for use in reporting on state legislatures.
How were bills categorized?
Special-interest groups, both liberal and conservative, have for years crafted and lobbied for model bills. Generally, the organizations that craft the bills have a clear mission or ideological bent. The American Legislative Exchange Council, the best-known and one of the most prolific model-bill factories, supports conservative ideas and efforts. The State Innovation Exchange, once known as ALICE, is in effect ALEC’s liberal counterpart. We classified bills based on the mission or ideological orientation of the organizations that created each model. In some cases, groups with a conservative bent also push bills that benefit industry. We labeled each bill according to the most dominant characteristic.