Google Translate used to translate texts in a way that cemented gender stereotypes, then vowed to do better. We tested how much the algorithm had improved by taking Finnish sentences (Finnish grammar does not mark gender) and translating them into German.
Finnish «hän on kaunis» (‘he/she is beautiful’) becomes, for example, «sie ist schön» (‘she is beautiful’) in German.
Profession names as well as adjectives were translated in a heavily stereotyped way.
We explain how this comes about (it follows from how the algorithms are trained) and what could be done to remedy it.
The project was widely read on Republik Magazin’s webpage (www.republik.ch) and shared particularly widely on social media.
We hand-selected and hand-checked the original sentences, used Python to translate them automatically via Google Translate’s API, and then calculated the percentage of translations that came out as «he» versus «she».
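The counting step can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names, the `translate` parameter (a stand-in for whatever call reaches the translation API) and the rule of classifying a sentence by its first word are all assumptions made for illustration. The stub translations below are placeholders, not results from the experiment, apart from the «hän on kaunis» example quoted above.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Assumption: each German translation starts with its subject pronoun,
# e.g. "er ist ..." or "sie ist ...". Anything else is counted as "other".
PRONOUN_TO_GENDER = {"er": "he", "sie": "she"}


def classify_translation(german: str) -> str:
    """Return "he", "she" or "other" based on the first word of the sentence."""
    words = german.strip().lower().split()
    if not words:
        return "other"
    return PRONOUN_TO_GENDER.get(words[0], "other")


def gender_shares(
    sentences: Iterable[str],
    translate: Callable[[str], str],
) -> Dict[str, float]:
    """Translate each sentence and return the percentage per category."""
    counts = Counter(classify_translation(translate(s)) for s in sentences)
    total = sum(counts.values())
    if total == 0:
        return {"he": 0.0, "she": 0.0, "other": 0.0}
    return {
        label: 100.0 * counts[label] / total
        for label in ("he", "she", "other")
    }


# Hypothetical stub standing in for the real API call. The second entry
# is an invented placeholder, not an observed translation.
STUB_TRANSLATIONS = {
    "hän on kaunis": "sie ist schön",
    "hän on vahva": "er ist stark",
}

if __name__ == "__main__":
    shares = gender_shares(STUB_TRANSLATIONS, STUB_TRANSLATIONS.__getitem__)
    print(shares)
```

Passing the translator in as a function keeps the counting logic testable without network access; a real run would swap the stub for a function that calls the translation service.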
What was the hardest part of this project?
Designing the experiment and creating a well-balanced language corpus (a small one, of course, since we hand-checked or hand-coded every sentence).
What can others learn from this project?
How broad the field of «data journalism» can be: Sometimes the interesting stories do not lie in huge datasets and do not have to be presented in fancy visualisations. Sometimes some knowledge, a good idea and a fitting experiment design produce an insightful story.