She is pretty. He is strong. He is a teacher. She is a kindergarten teacher
Organisation: Republik Magazin
Organisation size: Small
Publication date: 19/04/2021
Credit: Marie-José Kolly, Simon Schmid
Marie-José Kolly is a journalist at the online magazine “Republik” where she focuses on science and data journalism. After studying German language and literature and mathematics in Bern, she earned a doctorate in linguistics in Zurich and Paris. She teaches at the University for Education and spends most of her free time with books or in the snow. She lives in Zurich with her partner.
Simon Schmid is a journalist. At the online magazine “Republik” he is Co-Head of the «Business, Science and Digital» desk. He studied sociology in Basel and economics in St. Gallen and completed further training in data journalism in New York. Schmid teaches data journalism at the MAZ in Lucerne and Storytelling with data at the FHNW in Brugg. He spends his free time hiking, biking and skiing. He lives with his family in Zurich.
Google Translate used to translate texts in a way that cemented gender stereotypes. Then Google vowed to do better. We tested how much the algorithm has improved by translating Finnish sentences (Finnish grammar does not mark gender) into German.
Finnish «hän on kaunis» (‘he/she is beautiful’) becomes, for example, «sie ist schön» (‘she is beautiful’) in German.
Profession names as well as adjectives were translated in a heavily stereotyped way.
We explain how this comes to be (algorithms are trained in certain ways) and what could be done to amend it.
The project was widely read on Republik Magazin’s webpage (www.republik.ch) and particularly widely shared on social media.
We hand-selected and hand-checked the original sentences, used Python to translate them automatically via Google Translate’s API, and then calculated the share of translations that came out as «he» and as «she».
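The counting step can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the function names (classify_pronoun, pronoun_shares) are hypothetical, the translator is passed in as a plain function so that no particular API client is assumed, and the stand-in translations below only mirror the headline examples rather than real Google Translate output. It also assumes every German translation starts with a singular pronoun, so «sie» is read as «she».

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def classify_pronoun(german_sentence: str) -> str:
    """Label a German translation by its first word: 'er' is he, 'sie' is she."""
    words = german_sentence.strip().lower().split()
    first = words[0] if words else ""
    if first == "er":
        return "he"
    if first == "sie":
        # Assumption: the sentences are singular, so 'sie' means 'she', not 'they'.
        return "she"
    return "other"


def pronoun_shares(
    sentences: Iterable[str], translate: Callable[[str], str]
) -> Dict[str, float]:
    """Translate each sentence and return the percentage per pronoun label."""
    counts = Counter(classify_pronoun(translate(s)) for s in sentences)
    total = sum(counts.values())
    if total == 0:
        return {"he": 0.0, "she": 0.0, "other": 0.0}
    return {label: 100.0 * counts[label] / total for label in ("he", "she", "other")}


# Stand-in for the translation service (illustrative values, not measured results).
FAKE_TRANSLATIONS = {
    "hän on kaunis": "sie ist schön",
    "hän on vahva": "er ist stark",
    "hän on opettaja": "er ist Lehrer",
    "hän on lastentarhanopettaja": "sie ist Kindergärtnerin",
}

if __name__ == "__main__":
    shares = pronoun_shares(FAKE_TRANSLATIONS.keys(), FAKE_TRANSLATIONS.__getitem__)
    print(shares)
```

In a real run, the stand-in dictionary lookup would be replaced by a function that sends each Finnish sentence to the translation service and returns the German text.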
What was the hardest part of this project?
Designing the experiment and creating a well-balanced (but small, of course, since we hand-checked or hand-coded all the sentences) language corpus.
What can others learn from this project?
How broad the field of «data journalism» can be: sometimes the interesting stories do not lie in huge datasets and do not have to be presented in fancy visualisations. Sometimes some knowledge, a good idea and a fitting experiment design produce an insightful story.