2020 Shortlist
Can you make AI fairer than a judge? Play our courtroom algorithm game
Category: Best visualization (small and large newsrooms)
Country/area: United States
Organisation: MIT Technology Review
Organisation size: Small
Publication date: 17/10/2019

Credit: Karen Hao and Jonathan Stray
Project description:
This interactive narrative shows why decision-making algorithms can’t be completely fair when operating on data from an unfair world. It visualizes real data used in an algorithmic system to predict whether an individual will be re-arrested. It then challenges readers to adjust the algorithm to make those predictions fairer. But after every adjustment, a new notion of fairness is revealed, requiring further adjustment. This buildup leads to the punchline: these notions, in practice, can never all be satisfied at once. While the story uses criminal justice data, its lesson holds true across industries, including hiring, healthcare, and lending.
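That punchline has a simple numerical core: once two groups have different underlying re-arrest rates, matching one fairness metric pushes another apart. The sketch below (a minimal illustration in Python with made-up numbers, not the project’s actual code or data) shows that even when a risk threshold produces the same false positive rate for two hypothetical groups, the precision of the “high risk” label diverges between them.

```python
# Minimal sketch: conflicting fairness metrics under different base rates.
# All groups, rates, and scores here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def simulate_group(n, base_rate):
    """Simulate ground-truth re-arrest outcomes and noisy risk scores."""
    rearrested = rng.random(n) < base_rate
    scores = rearrested * 0.3 + rng.random(n) * 0.7  # higher scores for re-arrested, on average
    return rearrested, scores

def metrics(rearrested, scores, threshold):
    """False positive rate and precision of the 'high risk' label."""
    flagged = scores >= threshold
    fpr = np.mean(flagged[~rearrested])       # share of non-re-arrested who were flagged
    precision = np.mean(rearrested[flagged])  # share of flagged who were re-arrested
    return fpr, precision

# Two hypothetical groups with different underlying re-arrest rates.
group_a = simulate_group(10_000, base_rate=0.5)
group_b = simulate_group(10_000, base_rate=0.3)

for t in (0.5, 0.6, 0.7):
    fpr_a, prec_a = metrics(*group_a, t)
    fpr_b, prec_b = metrics(*group_b, t)
    print(f"threshold={t:.1f}  FPR A={fpr_a:.2f} B={fpr_b:.2f}  "
          f"precision A={prec_a:.2f} B={prec_b:.2f}")
```

At every threshold the false positive rates match across the two groups, yet the precision does not; equalizing one metric necessarily unbalances the other whenever the base rates differ, which is the trade-off the interactive walks readers through.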
Impact reached:
Our exploration of algorithmic bias, while grounded in an example from the criminal justice system, is widely applicable across industries and fields that are increasingly adopting automated decision-making systems. We received feedback from readers across many fields—criminal justice, law, AI, social sciences, and healthcare—saying that our piece has become an important reference and/or teaching tool for understanding the impact and consequences of this phenomenon. Overall our piece has received over 80,000 views and been publicly commended by leading experts and institutions like AI Now, Azeem Azhar, Janelle Shane, and Mary Gray. It has also been featured on radio and podcasts and become assigned reading in university courses, including at Stanford University.
While it’s difficult to track impact beyond these metrics, its wide reach shows how relevant and salient this topic is. As we’re seeing now, governments around the world have begun to actively draft legislation to address such bias. The work of researchers and journalists like us plays a critical role in helping policymakers understand how to regulate this issue. In the meantime, it’s also important to have a robust public conversation that unveils and elevates the weaknesses of algorithmic systems to the people affected by them so they can push back.
Techniques/technologies used:
Data processing: The data we used was already publicly available from ProPublica as a .csv file. We downloaded it, filtered it down to the relevant dimensions in Python, and then reduced it by randomly sampling 500 rows.
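A rough sketch of that preprocessing step, assuming the ProPublica COMPAS export and pandas (the file and column names below are from ProPublica’s public release; the exact columns kept in the published piece may differ):

```python
# Sketch of the filtering and sampling described above (assumed details, not the project's exact script).
import pandas as pd

# ProPublica's public COMPAS dataset.
df = pd.read_csv("compas-scores-two-years.csv")

# Keep only the dimensions relevant to the visualization.
columns = ["age", "race", "priors_count", "decile_score", "two_year_recid"]
df = df[columns]

# Reduce to a random sample of 500 rows for the interactive.
sample = df.sample(n=500, random_state=42)
sample.to_csv("risk_sample.csv", index=False)
```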
Prototyping: We began by sketching different visualization ideas on a whiteboard, then used Observable notebooks for rapid prototyping in JavaScript and D3. With each prototype, we conducted in-person user tests, where we would ask readers to read the story, then talk out loud about their thought process during the interactive sections. Based on their explicit and implicit feedback (stumbles, confusions, and otherwise), we developed new whiteboard sketches and improved our prototype. This cycle happened three times.
Production: Once we finished prototyping, we transitioned to our own local development environments, and used GitHub to collaborate on code and GitHub Pages to deploy. We first got things working on desktop, then quickly optimized for mobile. We then conducted user tests again in the real article-reading environment and refined the interaction details until the experience was completely seamless. Finally, we ran more comprehensive quality-assurance tests across different browsers and devices.
What was the hardest part of this project?
The core of this project was translating obscure yet consequential statistical concepts into an approachable and engaging visualization—something that people would easily understand and that would compel them to care. All of this was hard, but the toughest part was making our data and information-dense visual as clean and simple as possible. We wanted to eliminate any cognitive burden, so people wouldn’t even realize the underlying complexity.
This required us to first deeply understand the material in order to distill it. We began by interviewing many experts in statistics, algorithm design, law, and criminal justice, to ensure that we understood the full pipeline from the technical statistical concepts to their impacts on the ground.
It also required many iterations of both visual and interaction design, as also detailed in the answer about tools and techniques. We conducted many rounds of user testing with people who were unfamiliar with the topic. We also had to overcome the design challenge of making the graphics and interactions work seamlessly on both desktop and mobile. The result is a unique interactive narrative that has quickly become a standard reference on the interplay between different conceptions of fairness, and why no algorithm can satisfy all of them.
What can others learn from this project?
This project offers lessons on both content and process. From a content perspective, decision-making algorithms have completely permeated our lives. As citizens, activists, and policymakers struggle to understand and communicate what this means, our interactive narrative offers an authoritative, comprehensive, and clear answer.
From a process perspective, this project was highly collaborative and iterative. Jonathan and Karen worked on everything together: we conducted our interviews together, pair-programmed together, and wrote the narrative in a tight feedback loop of communication. We also approached the story like a product, using prototyping, user testing, and quality-assurance techniques to refine every element of the story. Both of these practices undeniably strengthened the story and can offer a model for future collaborations.
Project links:
www.technologyreview.com/s/613508/ai-fairer-than-judge-criminal-risk-assessment-algorithm/