Beware, lazy journalists! I’m chasing your clichés

Country/area: France

Organisation: Dans mon labo

Organisation size: Small

Publication date: 1 Jun 2020

Credit: Yann Guégan

Project description:

Journalists are supposed to avoid overused expressions in their writing, or at least that’s what they teach in J-schools. Yet the stories we read are often filled with clichés. 

To name and shame the news outlets that keep perpetuating these bad habits, I prepared a set of lively infographics, updated daily. They showcase the most frequent journalistic clichés and the news outlets that produce them in bulk.

But I also interviewed experts to understand why I was so obsessed with this topic.

Impact reached:

I am proud to say that my work put a smile on the faces of many, many journalists and news junkies during the grim pandemic year we went through.

To date, the page has received 18,000 visitors, not bad for a personal blog. Soon after publication, my Twitter mentions filled up with users proclaiming their love or hatred of a particular cliché. Some of them kept trading their favourite ones joyfully FOR DAYS, which is a form of online harassment if you ask me.

I have received 60+ suggestions of new expressions to add to my scraping engine, and eight months in, I am still getting some. I guess denouncing a cliché has a cathartic effect for some readers.

It was mentioned by Nicolas Demorand, France’s most listened-to radio host, which prompted a flurry of text messages from friends and colleagues and made my mom very proud. I was also interviewed by another famous radio host, Pascale Clark, on the station Europe 1. I suspect that radio journalists like to make fun of their print counterparts.

On a more serious note, many journalism teachers, as well as various trainers from other fields, have thanked me for this tool, which they use with their students.

The project also inspired a Canadian journalist, Philippe Couture. We worked together on a special Quebecois edition of the rankings, tweaking the list of clichés to detect and the news outlets to watch.

Techniques/technologies used:

The very first step was to come up with an initial list of 10+ clichés to detect. It later grew to 100+ expressions, used for the web scraping and/or for the custom detector provided at the bottom of the page.
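The entry doesn't say how the custom detector matches expressions (on the page it most likely runs in the browser). As a rough sketch, assuming simple case-insensitive, whole-phrase matching, a detector could look like this in Python. The function names and the sample expressions are illustrative, not taken from the project:

```python
import re


def normalise(text):
    """Lowercase and unify typographic apostrophes (’ becomes ')."""
    return text.lower().replace("\u2019", "'")


def find_cliches(text, cliches):
    """Count whole-phrase, case-insensitive occurrences of each cliché."""
    haystack = normalise(text)
    found = {}
    for cliche in cliches:
        # (?<!\w) and (?!\w) stop a phrase from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(normalise(cliche)) + r"(?!\w)"
        hits = len(re.findall(pattern, haystack))
        if hits:
            found[cliche] = hits
    return found


# Example expressions only (hypothetical); the real list has 100+ entries.
CLICHES = ["levée de boucliers", "branle-bas de combat", "pas de quoi fouetter un chat"]

print(find_cliches("Levée de boucliers ! Une nouvelle levée de boucliers.", CLICHES))
# -> {'levée de boucliers': 2}
```

Normalising apostrophes matters in French copy, where the same expression may be typed with a straight or a curly apostrophe.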

I then wrote two Python scripts. The first one searches daily for each expression in the French edition of Google News, using its little-known RSS feeds. The second one searches the same source for the number of stories published by each news outlet identified by the first script.
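Google News exposes its search results as RSS feeds. A minimal sketch of the first script's core, assuming the commonly used `https://news.google.com/rss/search` URL pattern with French locale parameters (`hl`, `gl`, `ceid`), and assuming each feed item carries a `<source>` element naming the outlet. The helper names are hypothetical, and the network call is defined but not executed here:

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed URL pattern for Google News search feeds (French edition).
GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"


def build_search_url(expression):
    """Build a Google News RSS search URL for an exact expression."""
    params = {
        "q": f'"{expression}"',  # quotes ask for the exact phrase
        "hl": "fr",
        "gl": "FR",
        "ceid": "FR:fr",
    }
    return f"{GOOGLE_NEWS_RSS}?{urlencode(params)}"


def fetch_feed(url):
    """Download a feed as text (needs network access; not called here)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def count_outlets(rss_xml):
    """Count feed items per news outlet, using each item's <source> element."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        source = item.find("source")
        if source is not None and source.text:
            counts[source.text.strip()] += 1
    return counts
```

A daily run would loop over the expression list, call `count_outlets(fetch_feed(build_search_url(expr)))` for each one, and store the counts with the date.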

The number of stories (averaged over several weeks) serves as an estimate of each outlet’s size, which is used to weight the number of clichés found. Fairness matters here: I don’t want to infuriate the French media world. The result is a global cliché score for each news outlet.
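The exact weighting formula isn't given. One plausible normalisation, shown here purely as an assumption, divides each outlet's cliché count by its average number of stories and scales the ratio, so that a prolific outlet isn't penalised for its volume alone. `cliche_scores` and its `scale` parameter are hypothetical:

```python
from statistics import mean


def cliche_scores(cliche_counts, story_counts, scale=1000):
    """Weight each outlet's cliché count by its average story volume.

    cliche_counts: {outlet: number of clichés detected}
    story_counts:  {outlet: [story counts sampled over several weeks]}
    scale:         multiplier applied to the ratio (assumed, for readability)
    """
    scores = {}
    for outlet, cliches in cliche_counts.items():
        history = story_counts.get(outlet)
        if not history:
            continue  # no size estimate for this outlet: leave it out
        average = mean(history)
        if average > 0:
            scores[outlet] = scale * cliches / average
    # Highest score first, ready for a ranking chart.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

With this scheme, two outlets with the same raw count end up far apart if one publishes five times as many stories as the other.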

I then wrote a third Python script, which works as an API to access the scraped data. It can be queried for a list of clichés or a list of news outlets over a custom timespan, and returns the result as JSON.
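As an illustration only: the original API's framework, routes and parameters aren't described. This sketch uses the standard library's WSGI interface with hypothetical `/cliches` and `/outlets` routes and `start`/`end` date parameters, over made-up sample rows:

```python
import json
from collections import Counter
from datetime import date
from urllib.parse import parse_qs

# Hypothetical sample rows standing in for the scraped data (not real results).
DETECTIONS = [
    {"day": "2020-05-01", "cliche": "levée de boucliers", "outlet": "Outlet A"},
    {"day": "2020-05-02", "cliche": "levée de boucliers", "outlet": "Outlet B"},
    {"day": "2020-05-02", "cliche": "branle-bas de combat", "outlet": "Outlet A"},
    {"day": "2020-06-10", "cliche": "levée de boucliers", "outlet": "Outlet C"},
]

# Hypothetical routes: URL path -> field to aggregate on.
ROUTES = {"/cliches": "cliche", "/outlets": "outlet"}


def ranking(field, start, end):
    """Count detections per cliché or per outlet between two dates (inclusive)."""
    counts = Counter(
        row[field]
        for row in DETECTIONS
        if start <= date.fromisoformat(row["day"]) <= end
    )
    return [{"name": name, "count": n} for name, n in counts.most_common()]


def json_response(start_response, status, payload):
    """Serialise payload as UTF-8 JSON and send the WSGI headers."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def app(environ, start_response):
    """GET /cliches or /outlets with ?start=YYYY-MM-DD&end=YYYY-MM-DD."""
    field = ROUTES.get(environ.get("PATH_INFO", ""))
    if field is None:
        return json_response(start_response, "404 Not Found", {"error": "unknown route"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        start = date.fromisoformat(params.get("start", ["2020-01-01"])[0])
        end = date.fromisoformat(params.get("end", [date.today().isoformat()])[0])
    except ValueError:
        return json_response(start_response, "400 Bad Request", {"error": "dates must be YYYY-MM-DD"})
    return json_response(start_response, "200 OK", ranking(field, start, end))
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000; a request to `/outlets?start=2020-05-01&end=2020-05-31` then returns the sample outlets ranked by number of detections.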

On the front-end side, I wrote a Vue.js application that generates the infographics dynamically by querying this API.

(The whole process is probably vastly oversized for such a lighthearted topic, but the French lockdown during the first phase of the Covid pandemic was long, and I needed a project to clear my mind.)

What was the hardest part of this project?

I set myself a challenge: to create the definitive online resource on clichés, not just a one-shot story that would die in a few days.

So it had to be updated continuously; I could not just rely on the results of a few one-off searches. That led me to write Python scripts reliable enough to run every day without crashes or errors.
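How the scripts were hardened isn't described. One common technique for a daily scraper, swapped in here as an assumption, is to retry failed steps with exponential backoff and re-raise once the attempts run out, so the scheduler still sees the failure. `run_with_retries` is a hypothetical helper:

```python
import logging
import time


def run_with_retries(task, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call task(); on failure, wait and retry with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last exception once all attempts are used up.
    `sleep` is injectable so the delays can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:  # broad on purpose: network, parsing, etc.
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logging.warning("attempt %d failed (%s); retrying in %.0fs", attempt, exc, delay)
            sleep(delay)
```

Wrapping each feed download this way means a transient network error costs a few seconds instead of a missing day of data.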

One consequence is that I had to take a step back from my code at the end of the project, to decide whether the infographics were enough or whether I needed to beef up the story itself. That prompted me to show the almost-finished project to two linguists and interview them, to get a more scientific take on the subject.

They explained to me why journalists can be so obsessed with clichés, and why readers probably don’t care about them that much.

Believe it or not, despite the long hours with my friends Python and JavaScript, the more traditional journalistic work is the part of the project I am most proud of.

What can others learn from this project?

Here are some take-aways:

  • doing data-driven journalism does not mean you have to work on serious stuff all year. Sometimes it’s good to work on a Covid-19 dashboard that saves lives. Sometimes it’s good to be The Pudding.
  • getting usable data for a one-shot story is far easier than building a comprehensive process to work with this data over the long term. No shocker here.
  • if you are too deep in technical stuff, it’s easy to forget about basic journalistic work: reading documentation, interviewing people who know the data better than you, questioning the angle you initially chose…
  • designing custom infographics means obsessing over details until the last minute. The first time you think your work is done, there is probably still 10% of the total time left to go.
  • bar chart races are cool; they are not a fad, so don’t let any dataviz pundit tell you otherwise.

Project links: