Software programs promise to identify the personality traits of job candidates based on short videos. With the help of Artificial Intelligence, they are advertised as making candidate selection faster and more objective. In the US, more and more companies and their HR departments are working with this kind of software, and startups in the UK and EU are beginning to enter the market with similar products for recruiters. We scrutinized one of those products and wanted to find out whether it makes the recruitment process more objective and fairer.
As an interdisciplinary team that works at the crossroads of journalism and computer science, we think it’s important not only to write about the phenomenon but also to include data analysis that shows whether such AI systems are able to deliver on the product promises. Algorithmic accountability reporting and AI as an investigative topic is still quite an underreported field in Europe and Germany. Furthermore, we wanted to contribute to the general debate about whether and how we, as a society, want to use those algorithms in the recruiting process.
The investigation triggered a discussion in the media and policy-making landscape about the use of AI for recruiting purposes: The Markup’s Julia Angwin addressed the issue in a detailed Twitter thread, and the weekly magazine Der SPIEGEL as well as Business Punk – a partner magazine of the political magazine stern – quoted the investigation in a longer piece about recruiting software. The MIT Technology Review also cited our work. A tweet by one of our colleagues was widely echoed, especially in the US. In the aftermath of our investigation, scientists from universities all over Europe reached out to learn more about our method and results. Unions also took an interest in the issue and refer to our findings. Policy makers who are working with and around the EU AI Act, where such AI systems are assessed as “high risk”, regularly quote our investigation.
Based on insights from scientific research in the field of face and personality recognition from images or video material, we developed an experimental setup and several hypotheses to test the software. Together with test persons, we produced several hundred video clips. The goal: to find out whether a range of factors would affect the software’s artificial intelligence and hence the personality assessment of the candidates. The experiment was performed in two different ways: on the one hand, a professional actress wearing different outfits answered the various job interview questions, always using the same text and way of speaking. On the other hand, video producers technically modified a considerable number of recorded videos of a diverse group of test subjects. That way, it was possible to ensure in both scenarios that only a single factor was deliberately changed in each experiment.
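The paired design described above can be sketched in code. The following is a minimal illustration, not the actual pipeline we used: `score_video` is a hypothetical stand-in for the vendor’s black-box scoring call, and the simulated score shift is invented purely so the structure of a baseline/modified comparison is visible.

```python
from dataclasses import dataclass

# Big Five traits, as reported by personality-assessment software.
BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

@dataclass
class TestVideo:
    subject: str   # e.g. the actress or a test person
    factor: str    # the single property varied, e.g. "glasses"
    variant: str   # "baseline" or "modified"

def score_video(video: TestVideo) -> dict[str, float]:
    """Hypothetical stand-in for the black-box model (scores 0-100).

    In the real experiment, this call would go to the recruiting
    software. The shift below is simulated for illustration only.
    """
    scores = {trait: 50.0 for trait in BIG_FIVE}
    if video.variant == "modified":
        scores["openness"] += 7.0  # invented effect for the demo
    return scores

def run_pair(subject: str, factor: str) -> dict[str, float]:
    """Score a baseline and a modified clip that differ in one factor,
    and return the per-trait score differences."""
    baseline = score_video(TestVideo(subject, factor, "baseline"))
    modified = score_video(TestVideo(subject, factor, "modified"))
    return {t: modified[t] - baseline[t] for t in BIG_FIVE}

diffs = run_pair("actress", "glasses")
```

Because everything except the one factor is held constant, any nonzero entry in `diffs` can be attributed to that factor alone.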
What was the hardest part of this project?
Since we entered terra incognita – there was almost no reporting and only little research on the use of AI-driven personality prediction based on video snippets in the context of Human Resources – it was challenging to verify the results. We did so by consulting experts from relevant fields such as business psychology and computer vision / machine learning and discussing our results with them.
We gained access to an application that promises to evaluate the facial expressions, gestures and voice of job candidates on the basis of short videos. No other data journalism team – as far as we know – has conducted such an experiment on software that claims to use Artificial Intelligence on job interviews before. Getting access to the application and testing it under “real circumstances” – without being detected – was a challenging and time-consuming process. We have good reason to assume that the investigation, and especially the results of our experiments, can contribute to an urgently needed discussion about whether and how we, as a society, want to use such AI-driven recruiting software at this early stage.
What can others learn from this project?
As an interdisciplinary team that works at the crossroads of journalism and computer science, we find it important not only to write about the phenomenon but also to include data analysis that shows whether such AI systems are able to deliver on the product promises. Algorithmic accountability reporting and AI as an investigative topic is still an underreported field. And each investigation requires an individual approach. Thus, every story adds to a better understanding of the field and algorithmic accountability in general.
Our method draws on the idea of investigating an algorithm by holding the input constant – except for a single factor under consideration – and then evaluating the differences in the output. This gives you an idea of the importance and influence of an input variable without knowing the inner details of the algorithm. These (pairwise) comparisons can and should be repeated for various factors in different contexts. That also helps assess whether the impact of certain factors is intentional or hints at technical flaws. This approach can be applied to other AI-related investigations and black-box algorithms in general.
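As an illustration of evaluating such pairwise output differences, the sketch below aggregates per-factor score shifts across several subjects and flags factors whose average absolute shift exceeds a threshold. All factor names, numbers, and the threshold are invented for demonstration; they are not our measured results.

```python
import statistics

# Illustrative only: hypothetical per-subject score differences
# (modified clip minus baseline clip) for one personality trait,
# grouped by the single factor that was changed.
observed_diffs = {
    "glasses":    [5.0, 6.0, 5.5, 7.5],
    "headscarf":  [9.0, 8.5, 10.0, 9.5],
    "background": [0.5, -0.25, 0.25, 0.0],
}

def flag_factors(diffs: dict[str, list[float]],
                 threshold: float = 2.0) -> dict[str, float]:
    """Flag factors whose mean absolute score shift exceeds the threshold."""
    flagged = {}
    for factor, values in diffs.items():
        mean_abs = statistics.mean(abs(v) for v in values)
        if mean_abs > threshold:
            flagged[factor] = round(mean_abs, 2)
    return flagged

print(flag_factors(observed_diffs))
# → {'glasses': 6.0, 'headscarf': 9.25}
```

Factors with large, consistent shifts warrant a closer look: they suggest the model reacts to something unrelated to personality, which is exactly the kind of signal this audit approach is designed to surface.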