2022 Shortlist

OBJECTIVE OR BIASED – On the questionable use of Artificial Intelligence for job applications

Country/area: Germany

Organisation: Bayerischer Rundfunk

Organisation size: Big

Publication date: 16/02/2021

Credit: Elisa Harlan, Oliver Schnuck, Steffen Kühne, Sebastian Bayerl, Benedikt Nabben, Uli Köppen, Lisa Wreschniok


Elisa Harlan works as a data journalist and reporter at the German public broadcaster Bayerischer Rundfunk (ARD). She graduated from the German School of Journalism and studied data journalism at Columbia University in the US. She was a fellow at the investigative newsroom Correctiv and was named one of Mediummagazin's "Top 30 under 30" journalists in 2019. Her work has won the Grimme Online Award and was nominated for the Reporter:innenpreis 2021.

Oliver Schnuck is a computer and social scientist by training and works as a data journalist at BR Data / BR Recherche (Bayerischer Rundfunk). He is interested in the numbers behind the words and the graphics next to them. His previous work won the Philip Meyer Award.

Steffen Kühne is Tech Lead of the AI + Automation Lab and of BR Recherche / BR Data at Bayerischer Rundfunk. He is a data journalist and interactive developer specialized in data analysis, visualization and storytelling.

Sebastian Bayerl is a full stack developer at the AI + Automation Lab and BR Recherche / BR Data at Bayerischer Rundfunk. He creates user-friendly web applications and immersive interactive experiences.

Uli Köppen is Head of the AI + Automation Lab and Co-Lead of the investigative data team BR Data at the German public broadcaster Bayerischer Rundfunk. In this role she works with interdisciplinary teams of journalists, coders and product developers specializing in investigative data stories, interactive storytelling and experimentation with new research methods such as bots and machine learning. As a Nieman Fellow 2019 she spent an academic year at Harvard and MIT and has won several awards together with her colleagues.

Project description:

Software programs promise to identify the personality traits of job candidates based on short videos. With the help of Artificial Intelligence, they are advertised as making the candidate selection process more objective and faster. In the US, more and more companies and their HR departments are working with this kind of software, and startups in the UK and EU are beginning to enter the market with similar products for recruiters. We scrutinized one of those products to find out whether it really makes the recruitment process more objective and fairer.

Impact reached:

As an interdisciplinary team working at the crossroads of journalism and computer science, we think it is important not only to write about the phenomenon but also to include data analysis that shows whether such AI systems can deliver on their product promises. Algorithmic accountability reporting, and AI as an investigative topic, is still an underreported field in Europe and Germany. We also wanted to contribute to the general debate about whether and how we, as a society, want to use such algorithms in the recruiting process.

The investigation triggered a discussion in the media and policymaking landscape about the use of AI for recruiting purposes: The Markup's Julia Angwin addressed the issue in a detailed Twitter thread, and the weekly magazine Der SPIEGEL as well as Business Punk (a partner magazine of the political magazine stern) quoted the investigation in a longer piece about recruiting software. The MIT Technology Review also cited our work. A tweet by one of our colleagues was widely echoed, especially in the US. In the aftermath of our investigation, scientists from universities all over Europe reached out to learn more about our method and results. Unions also took an interest in the issue and refer to our findings. Policymakers working with and around the EU AI Act, which classifies such AI systems as "high risk", regularly quote our investigation.

Techniques/technologies used:

Based on insights from scientific research in the field of face and personality recognition from images or video material, we developed an experimental setup and several hypotheses to test the software. Together with test persons, several hundred video clips were produced. The goal: to find out whether a range of factors would affect the software's artificial intelligence and hence the personality assessment of the candidates. The experiment was performed in two different ways: on the one hand, a professional actress wearing different outfits answered the various job interview questions, always using the same text and way of speaking; on the other hand, video producers technically modified a considerable number of recorded videos of a diverse group of test subjects. For both scenarios, this ensured that only a single factor was purposefully changed in each experiment.
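The single-factor design described above can be sketched in code. The following is a minimal illustration, not the team's actual tooling: the factor names and levels are invented placeholders standing in for the real variables (outfit, lighting, background and so on) that were changed in the videos.

```python
# Hypothetical sketch of a single-factor experiment design: start from one
# baseline condition and generate variants that deviate in exactly one factor.
# Factor names and levels are illustrative assumptions, not the real setup.

BASELINE = {"outfit": "plain", "background": "neutral", "brightness": "normal"}

VARIATIONS = {
    "outfit": ["plain", "glasses", "headscarf"],
    "background": ["neutral", "bookshelf", "bare_wall"],
    "brightness": ["normal", "darker", "brighter"],
}

def experiment_conditions():
    """Yield the baseline plus every condition that changes exactly one factor."""
    yield dict(BASELINE)  # the control condition
    for factor, levels in VARIATIONS.items():
        for level in levels:
            if level == BASELINE[factor]:
                continue  # skip the baseline level itself
            condition = dict(BASELINE)
            condition[factor] = level
            yield condition

for condition in experiment_conditions():
    print(condition)
```

Because each variant differs from the control in only one variable, any systematic difference in the software's output can be attributed to that variable alone.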

What was the hardest part of this project?

Since we entered terra incognita – there was almost no reporting and little research on the use of AI-driven personality prediction based on video snippets in a human resources context – it was challenging to verify the results. We did so by consulting and discussing our results with experts from relevant fields such as business psychology and computer vision / machine learning.

We gained access to an application that promises to evaluate the facial expressions, gestures and voice of job candidates on the basis of short videos. As far as we know, no other data journalism team had conducted such an experiment on software that claims to use Artificial Intelligence in job interviews before. Getting access to the application and testing it under "real circumstances" – without being detected – was a challenging and time-consuming process. We have good reason to assume that the investigation, and especially the results of our experiments, can contribute to an urgently needed discussion about whether and how we, as a society, want to use such AI-driven software for recruiting at this early stage.

What can others learn from this project?

As an interdisciplinary team that works at the crossroads of journalism and computer science, we find it important not only to write about the phenomenon but also to include data analysis that shows whether such AI systems are able to deliver on their product promises. Algorithmic accountability reporting, and AI as an investigative topic, is still an underreported field, and each investigation requires an individual approach. Thus, every story adds to a better understanding of the field and of algorithmic accountability in general.

Our method draws on the idea of investigating an algorithm by holding the input constant – except for a single factor under consideration – and then evaluating the differences in the output. This gives you an idea of the importance and influence of an input variable without knowing the inner details of the algorithm. These (pairwise) comparisons can and should be repeated for various factors in different contexts. That also helps assess whether the impact of certain factors occurs by design or hints at technical flaws. This approach can be applied to other AI-related investigations and to black-box algorithms in general.
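The pairwise-comparison step can be sketched as follows. This is a hedged illustration only: the trait names, scores and the flagging threshold are invented for demonstration and are not results or parameters from the actual investigation.

```python
# Minimal sketch of pairwise output comparison for a black-box scorer:
# run the system on a baseline video and on a variant that changes one
# factor, then measure how each output trait score shifts.
# All numbers below are invented illustration data.

def score_shift(baseline_scores, variant_scores):
    """Per-trait difference between a single-factor variant and its baseline."""
    return {trait: variant_scores[trait] - baseline_scores[trait]
            for trait in baseline_scores}

baseline = {"openness": 0.62, "conscientiousness": 0.71, "extraversion": 0.55}
with_headscarf = {"openness": 0.58, "conscientiousness": 0.66, "extraversion": 0.54}

shifts = score_shift(baseline, with_headscarf)

# Flag traits whose score moved more than an (assumed) tolerance of 0.03,
# since the only input change was a single, personality-irrelevant factor.
flagged = {trait: round(diff, 2) for trait, diff in shifts.items()
           if abs(diff) > 0.03}
print(flagged)
```

Repeating such comparisons across many factors and test subjects shows which input variables move the output, without any access to the algorithm's internals.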

Project links: