How the parliament and government of Kyrgyzstan copy-paste Russian laws

Country/area: Kyrgyzstan

Organisation: azattyk.org

Organisation size: Big

Publication date: 26 Jun 2020

Credit: Edil Baiyzbekov

Project description:

In Kyrgyzstan, people often say that deputies plagiarize Russian laws. I analyzed the texts of Kyrgyz laws and found that this is true. I scraped all 1,277 law texts of the last convocation of parliament (from 2015).
I removed technical documents such as audit results, reports and village renamings. I found that 40% of the 805 new and complementary laws contained articles similar to Russian laws.
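
The filtering and the headline share can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the title keywords, the `title` and `has_similar_article` fields, and the sample records are all hypothetical.

```python
import re

# Hypothetical title keywords for "technical" documents; the real filter
# used in the project is not described in this entry.
TECHNICAL_PATTERN = re.compile(r"\b(audit|report|renaming)\b", re.IGNORECASE)


def is_technical(title):
    """Return True if the title looks like a technical document."""
    return TECHNICAL_PATTERN.search(title) is not None


def share_flagged(laws):
    """Share of non-technical laws that have at least one similar article."""
    kept = [law for law in laws if not is_technical(law["title"])]
    if not kept:
        return 0.0
    flagged = sum(1 for law in kept if law["has_similar_article"])
    return flagged / len(kept)


# Invented sample records, only to show the shape of the data.
sample = [
    {"title": "On the audit results of the Accounts Chamber", "has_similar_article": False},
    {"title": "On amendments to the Tax Code", "has_similar_article": True},
    {"title": "On renaming of a village", "has_similar_article": False},
    {"title": "On public procurement", "has_similar_article": False},
]

print(share_flagged(sample))
```

Filtering on title keywords is only one possible approach; a manual review of borderline titles would probably still be needed.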

There is also a chart that shows how many laws each deputy initiated and each deputy's average copy-paste percentage.
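
The numbers behind such a chart could be computed along these lines. This is a sketch under assumptions: each law is taken to carry a single similarity percentage, and a law with several initiators is counted once for each of them. The field names and sample values are hypothetical.

```python
from collections import defaultdict


def summarize_by_deputy(laws):
    """Return {deputy: (laws_initiated, average_similarity_percent)}."""
    totals = defaultdict(lambda: [0, 0.0])  # deputy -> [count, sum of percentages]
    for law in laws:
        for deputy in law["initiators"]:
            totals[deputy][0] += 1
            totals[deputy][1] += law["similarity_percent"]
    return {deputy: (count, total / count) for deputy, (count, total) in totals.items()}


# Invented sample data; names and percentages are placeholders.
laws = [
    {"initiators": ["Deputy A", "Deputy B"], "similarity_percent": 60.0},
    {"initiators": ["Deputy A"], "similarity_percent": 20.0},
]

for deputy, (count, average) in summarize_by_deputy(laws).items():
    print(deputy, count, average)
```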

The project is presented in an Instagram-story format with animated charts.

Impact reached:

People became more aware of how untenable the last convocation of Parliament was. Journalists raised more questions about why deputies copy-pasted Russian laws and how this affects Kyrgyzstan.

It was listed in a GIJN top 10 and republished by other media outlets.

Techniques/technologies used:

I used Python for scraping, analysis and cleaning. Libraries: pandas, BS4, asyncio, textract, re, selenium, ast etc.
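
As an illustration of the cleaning step, a law's text can be split into articles with `re` before each article is checked separately. This is a sketch, not the project's code. It assumes every article starts on its own line with a heading word followed by a number and a period; the heading word is a parameter because it depends on the language of the text (for example "Статья" in Russian-language texts).

```python
import re


def split_into_articles(law_text, heading="Article"):
    """Split a law into articles; text before the first heading is dropped."""
    pattern = re.compile(r"^" + re.escape(heading) + r"\s+\d+\.", re.MULTILINE)
    starts = [match.start() for match in pattern.finditer(law_text)]
    ends = starts[1:] + [len(law_text)]
    # Collapse line breaks and repeated spaces inside each article.
    return [" ".join(law_text[s:e].split()) for s, e in zip(starts, ends)]


def count_words(articles):
    """Total number of whitespace-separated words in all articles."""
    return sum(len(article.split()) for article in articles)


# Invented sample text, only to show the expected layout.
law_text = """LAW
Article 1. General provisions
This law regulates relations.
Article 2. Entry into force
The law enters into force on publication.
"""

articles = split_into_articles(law_text)
print(len(articles), count_words(articles))
```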

I used the text.ru API to detect plagiarism in the texts. It was excellent at finding plagiarism in consecutive sentences.

For the visual part, I used HTML, CSS, JavaScript and AMP stories with Flourish charts.
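
The text.ru API itself is not reproduced here. As a local stand-in that illustrates the general idea of matching runs of consecutive words, the sketch below measures what share of a candidate text's three-word sequences ("shingles") also occur in a source text. This is a swapped-in technique for illustration only; it is not how text.ru computes its scores, and the function names and sample sentences are hypothetical.

```python
import re


def shingles(text, size=3):
    """Set of consecutive word sequences of the given length."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(candidate, source, size=3):
    """Percentage of the candidate's shingles that also appear in the source."""
    candidate_shingles = shingles(candidate, size)
    if not candidate_shingles:
        return 0.0
    shared = candidate_shingles & shingles(source, size)
    return 100.0 * len(shared) / len(candidate_shingles)


# Invented sentences, only to show the calculation.
candidate = "The state guarantees the protection of personal data of citizens"
source = "The state guarantees the protection of personal data of every person"

print(similarity_percent(candidate, source))
```

Because the score is computed relative to the candidate text, it is not symmetric: swapping the two arguments can give a different percentage.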

What was the hardest part of this project?

The hardest part was cleaning the data and finding plagiarism in 911,712 words across 6,248 articles. I wrote a script that ran for two days straight to pass that amount of text through the API. Because laws are legal texts and are often similar to one another, it was difficult to choose a minimum percentage of similarity at which to call a text plagiarism. We decided to call an article of a law plagiarism if it had more than 40% similarity with a Russian article. Articles with less similarity were treated as minor copies and were not included in further analysis.

I made all the charts as a Flourish story and used JavaScript to animate them according to the number of the current AMP story component.
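
The threshold rule can be written down in a few lines. In this sketch the 40% cut-off comes from the text above, while the data layout (a mapping from a law identifier to per-article similarity percentages) and the rule that a law is flagged when any of its articles exceeds the cut-off are assumptions made for illustration.

```python
THRESHOLD_PERCENT = 40.0  # from the text: more than 40% similarity counts as plagiarism


def is_plagiarized(article_similarity):
    """An article counts only if it is strictly above the threshold."""
    return article_similarity > THRESHOLD_PERCENT


def flag_laws(article_scores):
    """Map each law to True if at least one of its articles is above the threshold."""
    return {
        law_id: any(is_plagiarized(score) for score in scores)
        for law_id, scores in article_scores.items()
    }


# Invented scores; identifiers and values are placeholders.
article_scores = {
    "law-1": [12.0, 55.5],
    "law-2": [40.0, 3.0],  # exactly 40% is not "more than 40%"
    "law-3": [],
}

print(flag_laws(article_scores))
```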

What can others learn from this project?

This project looks great on mobile and desktop. The visual part was made using a little JavaScript and AMP components, which are similar to plain HTML tags. Beautiful charts were made without any coding.

Also, it's not necessary to use machine learning, neural networks or NLP to find plagiarism. There are available APIs and tools, so journalists in the CIS can try to do similar research in their own countries.

My project inspired another media outlet in Kyrgyzstan to use a similar visual approach to tell a story.

Project links: