At the core of this project is a tool that analyzes and visualizes all speeches and oral contributions made in the Russian State Duma since its creation in 1994. The tool shows how often and in which years single terms (like ‘sanction’) or term combinations (like ‘alexei navalny’ or ‘foreign agent’) were used by deputies. This makes it possible to understand important trends in Russian politics and to trace the transformation of the Russian parliament. In the Russian and German versions of the project, the tool is accompanied by an analysis of the evolution of the Russian State Duma.
The Russian State Duma plays a minor role in international reporting. The aim of the project was, on the one hand, to develop a tool that makes the issues discussed in the State Duma tangible and, on the other hand, to give international readers an insight into the Russian parliament: how it has changed since its genesis in 1994, how it ticks today, and how it is positioned within the Putin system.
With this project, we have reached a broad readership in Russian- and German-speaking countries, first and foremost Russia and Germany. With over 110,000 users, we consider the project a success. The project itself became the subject of news coverage in Russia; for example, it was reported on by the online outlet Meduza.io and the independent TV channel “TV Rain”.
The project serves as a tool for many international journalists, researchers and politicians, from whom we have received very positive feedback.
The tool is based on raw data pulled from about 385,000 transcripts of speeches and oral contributions published on the website of the State Duma.
The first thing we did to prepare the data was to chop the filtered transcripts up into individual words (tokens). Then we removed from the token list all of the “stop words”, which have no particular relevance for the analysis.
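The two steps just described can be sketched in a few lines of Python. This is only a minimal illustration, not the project's actual code: the regular expression, the function names and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the list actually used in the project is not given in this text.
STOP_WORDS = {"и", "в", "на", "не", "что", "это"}

# A token is a run of letters, optionally joined by hyphens (e.g. "что-то").
# [^\W\d_] matches any Unicode letter, so Cyrillic text works as well.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [token for token in tokens if token not in stop_words]


sample = "Это не санкции, и в Думе обсуждают что-то новое."
tokens = tokenize(sample)
content_tokens = remove_stop_words(tokens)
print(content_tokens)
```

Note that in this sketch punctuation and digits are dropped during tokenization, which is a simplification chosen for the example.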
Individual terms can occur in a variety of forms, so the next step was to standardise all the variants, i.e. change them all to their dictionary form, or lemma. In computational linguistics, this step is called lemmatisation. For this we used MyStem, an algorithm developed by the Russian search engine provider Yandex.
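To make the idea concrete, here is a toy sketch of what lemmatisation does to a token list. It does not call MyStem: the lookup table `TOY_LEMMAS` and the function `lemmatise` are hypothetical stand-ins, and a real morphological analyser derives lemmas from grammar rules and a dictionary rather than from a hand-written table.

```python
# Toy lookup table standing in for a real morphological analyser such as
# MyStem. Each inflected form maps to its dictionary form (lemma).
TOY_LEMMAS = {
    "санкции": "санкция",
    "санкций": "санкция",
    "санкциями": "санкция",
    "законы": "закон",
    "законов": "закон",
}


def lemmatise(tokens, lemmas=TOY_LEMMAS):
    """Replace each token by its lemma; unknown tokens are kept unchanged."""
    return [lemmas.get(token, token) for token in tokens]


lemmatised = lemmatise(["санкции", "санкциями", "законов", "дума"])
print(lemmatised)
```

After this step, all inflected forms of a word are counted together under one entry.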
We also searched the data for words that occur in two- or three-word strings (known as n-grams) with particular frequency, because we were interested in combinations of words like “artificial intelligence” or “Great Patriotic War”, as well as in individual terms. The last step was to count the number of times that the words and word combinations appear in the data associated with each individual year. To ensure that differences in the volume of material published in different years would not distort the results, we set up the tool to chart relative rather than absolute frequency; i.e. it shows the frequency with which a word or a combination of words appears per 100,000 words in a year.
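The counting and normalisation can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the function names are hypothetical, every n-gram is counted (the text above suggests only particularly frequent combinations were kept), and the denominator is simply the number of tokens passed in for that year.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-word strings formed by consecutive tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(tokens_by_year, max_n=3, per=100_000):
    """Map each year to {term: occurrences per `per` tokens in that year}."""
    result = {}
    for year, tokens in tokens_by_year.items():
        total = len(tokens)
        if total == 0:
            # No material for this year, so no frequencies can be computed.
            result[year] = {}
            continue
        counts = Counter()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
        result[year] = {term: count * per / total for term, count in counts.items()}
    return result


# Made-up miniature corpus: lemmatised tokens grouped by year.
tokens_by_year = {
    1994: ["foreign", "agent", "law", "foreign", "agent"],
    2020: ["sanction", "law", "sanction", "sanction"],
}
frequencies = relative_frequencies(tokens_by_year)
print(frequencies[1994]["foreign agent"])
print(frequencies[2020]["sanction"])
```

Because each value is divided by the year's own token count, a year with many transcripts and a year with few can be compared on the same scale.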
The data is stored in an Elasticsearch database and retrieved via a React frontend.
What was the hardest part of this project?
There were various difficulties in this project, e.g. getting the data and cleaning it up so that the results are correct. However, the most difficult part was evaluating the data. We had to realize that the discourse in the State Duma does not correspond to the social or media discourse (increasingly so since the 2000s), so some results looked strange at first. Many important topics (e.g. climate change) are not taken up, and euphemisms are used for many terms (e.g. ‘anti-terrorist operation’ instead of ‘Chechen war’). We see a similar situation with Alexei Navalny: although the deputies (unlike Putin) use his name regularly, he is much more often referred to as “a figure in a case”, “blogger” or “this person”. There are many other examples.
The difficulties and the attempts to solve them have themselves become the main substantive results: parallel data analysis in the tool, analysis of the dataset's metadata and selective “manual” comparisons with the transcripts led us to new insights into how laws are passed and, more generally, how the Duma ticks. The most important results show the evolution of the parliament in Russia from the 1990s through the early Putin era to today, and the subordinate and functional role of the parliament in the Putin system: the Duma passes more and more laws that come from the president and the government, the number of laws passed keeps growing, laws are passed faster and faster, and if draft laws are discussed at all, the discussion does not take place in plenary. These findings were published in the accompanying text on the evolution of the Duma and in separate articles in Novaya Gazeta, which we link below.
What can others learn from this project?
First of all, journalists can use this project as a tool for their own analysis, which they are already doing diligently.
We believe the project can be used as a model for analyses of the big data collected in different state institutions in Russia as well as in other post-Soviet and Eastern European countries, to trace the long-term trajectories, turning points, and current state of the institutions. In relation to Russia, it applies especially to regional parliaments and local issues, which are even less transparent than the State Duma.
And one can learn (again): dealing with big data is worthwhile, and it is worthwhile to offer users not only the results of research but also the tool itself, which makes the analyses transparent and creates the desire to try it out oneself. This creates the “joy of complexity”, the guiding principle of dekoder.org.