Over the course of two weeks in February 2021, we scraped more than a million news articles, published on more than 6,500 Turkish news websites, in which the names of 973 administrative districts of Turkey appear. With a choropleth map, we revealed which districts have the highest or lowest news coverage per capita, and which districts of Turkey are at risk of becoming “news deserts.” We also analyzed the originality of news content for each district, showing where original reporting is more or less common. We hope that this research has humbly contributed to a subject area with hardly any data about digital journalism in Turkey.
Until our study, there was no data about the geographical scope and originality of digital journalism in Turkey. For the first time, we presented the public with data-based insights on the issue, showing that news deserts are spreading in certain parts of Turkey, particularly the Inner Aegean, Central Anatolian, and Eastern regions where President Erdogan’s ruling Justice and Development Party (AKP) dominates. Original reporting, meanwhile, is surprisingly higher in other areas, including the Kurdish-majority Southeast region and most strongholds of the main opposition Republican People’s Party (CHP) across the country.
Perhaps the most striking finding was that 85% of the news articles published in Turkish digital media are not original reporting but are merely copy-pasted from initial sources (particularly the news agencies). Meanwhile, many provinces that look like “oases” in terms of the gross number of news articles are actually “deserts” of original journalism.
Our results were reported by dozens of news outlets in national and local media. We were interviewed by seven newspapers, five national TV networks, and one national radio channel. The Global Investigative Journalism Network (GIJN) shared our story through its social media accounts and its newsletter. According to our estimates, the project reached more than 13 million people in Turkey through conventional and digital media outlets.
We were also hosted at panels to present our findings at physical and online events organized by respected institutions like Columbia University’s Global Centers in Istanbul, the European Endowment for Democracy, the International Press Institute, and the Journalists’ Association in Ankara. Kizilkaya will also talk about this project at the Data Visualization Society’s Outlier Conference in February 2022.
We manually constructed a dataset of more than 6,500 news websites, including almost all national, regional and local outlets in Turkey. We wrote a Python script and used jQuery to scrape the articles that these news websites published over two weeks.
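To illustrate the kind of step a scraping script performs once a page has been downloaded, here is a minimal sketch in Python. It is not our original script: the class and function names, the choice to read only paragraph tags, and the sample page are all assumptions made for illustration, and the downloading step itself is left out.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> tags of an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []       # text fragments of the paragraph being read
        self.paragraphs = []    # finished paragraphs, whitespace-normalised

    def _flush(self):
        # Join the fragments and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_article_text(html):
    """Return the paragraph text of a page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return "\n".join(parser.paragraphs)


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<html><body>
<h1>Yeni park</h1>
<p>Çankaya'da yeni bir park   açıldı.</p>
<p>Açılışa <b>yüzlerce</b> kişi katıldı.</p>
</body></html>
"""

article_text = extract_article_text(SAMPLE_HTML)
```

A real pipeline would also need per-site rules, since article bodies are not marked up the same way across thousands of websites.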
We then built a knowledge graph to analyze this Turkish-language news content, semantically pinpointing the administrative district names in each article, while also assigning them “originality scores” by using a custom algorithm.
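The two ideas in this step, finding district names in an article and scoring its originality, can be sketched in a simplified form. The sketch below does not reproduce our knowledge graph or our custom algorithm. It swaps in two plain techniques as assumptions: whole-word matching with Turkish-aware lowercasing, and word-shingle overlap against earlier texts such as agency copy. All function names and sample sentences are hypothetical.

```python
import re


def tr_lower(text):
    """Lowercase with Turkish casing rules (İ -> i, I -> ı)."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokens(text):
    """Split into lowercase word tokens; apostrophes separate suffixes."""
    return re.findall(r"\w+", tr_lower(text))


def mentioned_districts(text, districts):
    """Return the districts whose names appear as whole words in the text."""
    padded = " " + " ".join(tokens(text)) + " "
    return [d for d in districts if " " + " ".join(tokens(d)) + " " in padded]


def shingles(text, size=3):
    """Set of overlapping word sequences of the given length."""
    toks = tokens(text)
    return {tuple(toks[i:i + size]) for i in range(len(toks) - size + 1)}


def originality_score(article, earlier_articles, size=3):
    """1.0 means no shingle is shared with any earlier text, 0.0 means a full copy.

    Returns None when the article is too short to compare.
    """
    own = shingles(article, size)
    if not own:
        return None
    best_overlap = max(
        (len(own & shingles(earlier, size)) / len(own) for earlier in earlier_articles),
        default=0.0,
    )
    return 1.0 - best_overlap


# Invented sample texts, for illustration only.
agency_text = "Çankaya'da yeni bir park açıldı ve yüzlerce kişi katıldı"
copied_text = "Çankaya'da yeni bir park açıldı ve yüzlerce kişi katıldı"
local_text = "Kadıköy İlçesinde esnaf kira artışlarından şikayetçi"

found = mentioned_districts(agency_text, ["Çankaya", "Kadıköy"])
copied_score = originality_score(copied_text, [agency_text])
local_score = originality_score(local_text, [agency_text])
```

Splitting on apostrophes matters for Turkish, where suffixes are attached to proper nouns (as in "Çankaya'da"). Suffixes attached without an apostrophe would need morphological handling that this sketch does not attempt.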
We computed all results in a custom-designed macro-powered spreadsheet, completing the analysis and data visualization by using Tableau and Datawrapper.
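The per capita measure behind the map can be expressed in a few lines. This is a minimal sketch rather than our spreadsheet: the function names, the scaling to articles per 1,000 residents, and the district labels and figures are all hypothetical.

```python
def coverage_per_1000(article_counts, populations):
    """Articles per 1,000 residents for every district with a known population."""
    return {
        district: 1000 * article_counts.get(district, 0) / population
        for district, population in populations.items()
    }


def lowest_coverage(rates, n=1):
    """The n districts with the lowest coverage rate, lowest first."""
    return sorted(rates, key=rates.get)[:n]


# Hypothetical figures, not taken from our dataset.
article_counts = {"District A": 500, "District B": 20, "District C": 300}
populations = {"District A": 100_000, "District B": 40_000, "District C": 30_000}

rates = coverage_per_1000(article_counts, populations)
desert_candidates = lowest_coverage(rates, n=1)
```

Normalising by population is what lets a small district with few articles be compared fairly with a large one that has many.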
As we adhere to the principles of open data, we published the dataset as part of our feature article about this study on our non-profit news website Journo.com.tr in Turkish and in English (see Project Link 3 for the dataset).
Several Turkish and international communication academics contacted us to thank us for releasing the dataset and explaining the methodology, which allows them to replicate or expand this study on Turkish journalism.
What was the hardest part of this project?
Turkey is a highly challenging country for media outlets. Journalists are jailed as in China, physically or virtually assaulted as in Russia, and forced to self-censor as in many Asian and African countries. Independent media outlets are shut down or heavily fined as in Iran, the cronies of the government capture newspapers and TV outlets as in Hungary, and the public broadcaster is under heavy government influence as in Romania and Bulgaria. Independent journalists are labeled as “traitors” and pro-government media spread fake news as in Serbia, while defamation laws are weaponized by the ruling elite as in Poland.
How these challenges affect original reporting in Turkey was a big question mark because there was no extensive study on the subject until we embarked on this route. The hardest part of our project was coming up with a viable method to answer at least a part of this question. After we figured out how we could technically scrape and then semantically analyze so many news articles, the rest was all about succinctly and aesthetically presenting the significant findings from this large body of data. We narrated our data-based story on these findings and designed social media posts tailored to the specific audiences and algorithms of the various platforms.
What can others learn from this project?
We believe that this project has humbly contributed to a subject area with hardly any data about digital journalism in Turkey. Our findings should be particularly helpful for those who try to understand the current state of local journalism and original reporting in Turkey amid so many challenges. Journalists can benefit from this study especially by focusing on the geographical discrepancies in original news reporting. We showed that in some provinces, two neighboring districts can differ hugely in this regard. Although they share similar resources and features, one district may have a significantly higher rate of original journalism than its neighbor. As an on-the-ground follow-up to this data journalism study, the Journo website sent its reporters to three geographical regions to investigate the reasons behind these discrepancies (see Project Link 4 for these three news articles).
By using this work, Turkish journalists can now analyze the supply of and demand for news reporting at the district level in fine detail, and use these actionable insights to make decisions about their newsroom’s operations on the ground. Our main map shows where newsrooms can invest (for instance, by hiring more local reporters there) because there is an opportunity to satisfy unmet public demand for original reporting per capita in certain districts. They can also quickly see where competition in local journalism is relatively more intense.
Local journalism is the heart of any democracy. We hope that these findings will also serve the reporters and media executives in their endeavor to inform the public, including the electorate in rural districts, in line with universal standards and rules of high-quality, original journalism.