Aleszu Bajak

Entry type: Portfolio

Country/area: United States

Publishing organisation: USA TODAY

Organisation size: Big

Cover letter:

To the judges:

Aleszu Bajak exemplifies the best qualities of a data journalist. He gravitates to cutting-edge social science techniques, contacts experts in the field for ideas and brings them to bear on important questions in the news. Those qualities shone brightly in 2022, starting with Broken Adoptions, USA TODAY’s groundbreaking examination of how and why parents walk away from children they agreed to make their own.

We learned early on that no one had quantified how many U.S. adoptions fail. Building on the work of previous USA TODAY reporters, Aleszu scoured the obscure Adoption and Foster Care Analysis and Reporting System, or AFCARS. The multi-million-record database shows the status of each child in foster care each year, including whether a child was previously adopted. Although unique identifiers are meant to link records across years, Aleszu found states inconsistent about flagging a kid’s past adoption and preserving IDs. Any responsible estimate required painstakingly weeding out nonsensical cases, such as those that went from “Adopted” to “Never adopted” in one year.

Tallying up 66,000 failed adoptions was only the start, however. Aleszu wanted to know why they failed. The breakthrough came when he unearthed a study finding 16 states with reliable IDs in AFCARS over time. Suddenly, we could track trips from foster care to adoption and back with confidence. Aleszu decided to see what happened to a cohort adopted from 2008 to 2010. Using Cox proportional hazards regression, Aleszu found statistically significant, independent risk factors predicting failure. The results might help child welfare workers know which kids need the most support at the time of adoption placement. Yet no one had done this analysis to tell them.

The reporting raised one last question. Why did states so frequently delete child identifiers from AFCARS, seemingly breaking from guidance laid down for this federally funded database? It was one of those irritating data flaws that, in this case, had newsworthy consequences. Aleszu tracked down the bureaucrat who designed AFCARS three decades ago, and she had a lot to say. Through this and other historical research, Aleszu showed how a deliberately obfuscated dataset undermines our ability to see whether billions in adoption subsidies are working.

In the second half of the year, Aleszu turned his lens on politics. He used CrowdTangle and the Twitter API to collect social media posts from 1,500 congressional campaign accounts in 2022. Then he examined which phrases rose and fell in frequency by party to reveal trends. Later, at a summer residency with Vienna’s Complexity Science Hub, Aleszu learned of a technique called hierarchical clustering, which can identify shared linguistic patterns among speakers. Aleszu quickly pointed the algorithm at our campaign database, surfacing first-time candidates with the most in common with well-known incumbents Marjorie Taylor Greene and Alexandria Ocasio-Cortez. He then added historical perspective by tackling a decade of congressional tweets. With graphics specialist Ramon Padilla, Aleszu showed how shared Republican-Democratic Twitter clusters declined as partisans sorted themselves into like-minded camps.

I am pleased to recommend Aleszu Bajak’s body of work for your consideration.


Steve Suo
Data Editor

Description of portfolio:

[How many adoptions fail and why? Here’s what the numbers tell us.](https://www.usatoday.com/in-depth/graphics/2022/05/18/adopted-children-end-up-in-foster-care-us/9634018002/)
This is a detailed visual presentation of all of Aleszu’s findings. Using Google BigQuery and R, he wrangled the massive Adoption and Foster Care Analysis and Reporting System database, which contains millions of records over a decade. In theory, the database should make it easy to identify kids in foster care who landed there after a failed adoption. But USA TODAY data reporters who had worked with the data previously found its quality questionable. Building on their work, Aleszu created tests, in an abundance of caution, to eliminate potential false positives. He also located a researcher who had examined data quality state by state, giving us a list of 16 states whose data made longitudinal tracking of kids possible. He followed what happened to a cohort of kids adopted from foster care from 2008 to 2010, using Cox proportional hazards regression to estimate the impact of key risk factors such as race and mental health diagnoses.
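For readers unfamiliar with Cox proportional hazards regression, the sketch below shows the core idea with a single covariate. At each failure time, it compares the child who re-entered foster care with everyone still at risk at that moment, then finds the coefficient that best explains who fails first. This is a minimal Python illustration under stated assumptions, not the R code used for the story. The function name, the single 0/1 risk-factor flag and the toy cohort are hypothetical, and ties use the Breslow approximation. A real analysis would fit several covariates at once with an established package such as R's `survival` library. A hazard ratio above 1 means adoptions in the flagged group tend to fail sooner.

```python
import math
from typing import Sequence, Tuple


def cox_ph_single(
    times: Sequence[float],
    events: Sequence[int],
    x: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-9,
) -> Tuple[float, float]:
    """Fit a one-covariate Cox proportional hazards model by Newton-Raphson.

    times  : follow-up time for each child (e.g., years since adoption)
    events : 1 if the adoption failed (child re-entered care), 0 if censored
    x      : the covariate, here a hypothetical 0/1 risk-factor flag
    Ties are handled with the Breslow approximation.
    Returns (beta, hazard_ratio).
    """
    beta = 0.0
    n = len(times)
    for _ in range(max_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue  # censored children only contribute to risk sets
            # Risk set: everyone still under observation at this failure time.
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0  # risk-weighted average covariate in the risk set
            score += x[i] - mean
            info += s2 / s0 - mean * mean
        if info == 0.0:
            break  # no events, or no covariate variation in any risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


if __name__ == "__main__":
    # Hypothetical toy cohort: years from adoption to re-entering foster care
    # (event = 1) or to the end of follow-up (event = 0, censored).
    years = [1, 2, 3, 4, 5, 6, 7, 8, 8, 8]
    failed = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    flag = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
    beta, hr = cox_ph_single(years, failed, flag)
    print(f"beta={beta:.3f}, hazard ratio={hr:.2f}")
```

Because the partial likelihood depends only on who is at risk at each failure, censored children (such as those still in their adoptive homes at the end of the data) still inform the estimate without being counted as failures.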

[Broken Adoptions](https://www.usatoday.com/in-depth/news/investigations/2022/05/19/failed-adoptions-america-foster-care-troubles/9258846002/)
The full investigation, incorporating data findings and human narrative. The project as a whole, which was published as premium content, was one of the biggest sellers of USA TODAY digital subscriptions in 2022.

[Broken Adoptions: How USA TODAY did its analysis](https://www.usatoday.com/story/news/2022/05/19/broken-adoptions-data-analysis-how-usa-today-uncovered-failures/9800886002/). Nerdbox: a methodology sidebar explaining the data analysis.

[Broken adoptions, buried records](https://www.usatoday.com/in-depth/news/investigations/2022/05/19/bad-data-accountability-adoption-subsidies/9722162002/). Aleszu’s reporting showed how the failure of states to preserve unique IDs, a technical flaw many people would find ho-hum, means no one can account for whether or not $3 billion in annual adoption subsidies are working. He traced the history of the federal government’s foster care database, tracked down its designer, dug through audits of state data quality and examined failed attempts to penalize states that did badly. Penelope Maza, a statistician who promoted the federal data effort to states in the 1990s, told Aleszu: “If they thought there was a penalty, they’d improve their data.”

[‘Hope’ is out, ‘fight’ is in: Does tweeting divide Congress, or simply echo its divisions?](https://www.usatoday.com/in-depth/news/investigations/2022/09/09/congress-twitter-language-used-democrats-republicans/10146954002/) This story applied a technique known as hierarchical clustering to 2.8 million congressional tweets since 2011, illustrating how bipartisan rhetorical clusters disappeared over the course of a decade, giving way to single-party groupings with their own linguistic similarities.
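Hierarchical clustering is easiest to see at toy scale. The sketch below is a minimal, pure-Python version of one common variant: average-linkage agglomerative clustering on cosine distances between word-count vectors. Every account starts in its own cluster, and the two closest clusters are merged until the requested number remain. The account names, party labels, whitespace tokenizer and choice of linkage are assumptions for illustration, since the entry does not say which variant or distance was used. At the scale of 2.8 million tweets, one would rely on an optimized library such as SciPy's `scipy.cluster.hierarchy.linkage` rather than this cubic-time loop. The hypothetical helper `mixed_party_clusters` shows how one could then count the bipartisan clusters the story tracked.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerative_clusters(vectors: Dict[str, Counter], n_clusters: int) -> List[List[str]]:
    """Average-linkage agglomerative (bottom-up hierarchical) clustering.

    Starts with every account in its own cluster and repeatedly merges the two
    clusters whose members are, on average, closest in cosine distance,
    stopping when n_clusters remain.
    """
    names = list(vectors)
    dist = {(a, b): cosine_distance(vectors[a], vectors[b]) for a, b in combinations(names, 2)}

    def d(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = len(clusters[i]) * len(clusters[j])
            avg = sum(d(a, b) for a in clusters[i] for b in clusters[j]) / pairs
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def mixed_party_clusters(clusters: List[List[str]], party: Dict[str, str]) -> List[List[str]]:
    """Clusters whose members come from more than one party."""
    return [c for c in clusters if len({party[m] for m in c}) > 1]


if __name__ == "__main__":
    # Hypothetical accounts and posts, for illustration only.
    posts = {
        "acct_1": "southern border free speech god bless",
        "acct_2": "secure the southern border god bless america",
        "acct_3": "health care voting rights for children",
        "acct_4": "protect voting rights and health care",
    }
    party = {"acct_1": "R", "acct_2": "R", "acct_3": "D", "acct_4": "D"}
    vectors = {name: Counter(text.lower().split()) for name, text in posts.items()}
    clusters = agglomerative_clusters(vectors, n_clusters=2)
    print(clusters)
    print("mixed-party clusters:", mixed_party_clusters(clusters, party))
```

Running the same clustering on each year's tweets and counting mixed-party clusters would give one way to chart the decline the story describes. The number of clusters, or the height at which the tree is cut, is a judgment call that affects that count.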

[Another Marjorie Taylor Greene or AOC? We found Congress’ next potential lightning rods.](https://www.usatoday.com/in-depth/news/investigations/2022/08/01/new-marjorie-taylor-greene-aoc-congress-next-lightning-rods/10014336002/) Using hierarchical clustering, Aleszu looked for commonalities between congressional incumbents and political newcomers based on social media posts from 1,500 campaigns. Marjorie Taylor Greene’s matches used terms such as “southern border,” “free speech,” and “God bless.” For AOC, it was candidates who used more generic terms like “children,” “health care,” and “voting rights.”

[‘Celebrate’ vs. ‘dangerous’: How campaigns are talking about Supreme Court abortion decision](https://www.usatoday.com/in-depth/news/investigations/2022/06/30/2022-election-rhetoric-supreme-court-roe-abortion/7762332001/) This story looked at words gaining steam in Democratic and Republican campaigns following the SCOTUS decision overturning Roe v. Wade. Perhaps not surprisingly, Democrats were talking about “rights” and “women,” while Republicans spoke of “god” and “unborn babies.” But Aleszu’s analysis quantified the differences in a way no one else had. He also quantified how much more frequently Democrats discussed the topic than Republicans.
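One simple way to surface words "gaining steam" is to compare each term's share of all words in a party's posts before and after the decision. The sketch below does that with add-one smoothing, so terms absent before the ruling do not divide by zero. It is an illustrative assumption, not necessarily the method USA TODAY used. The function names and sample posts are hypothetical, and because the text is split on whitespace, multi-word phrases such as "unborn babies" would need n-gram tokenization. `share_mentioning` illustrates the second comparison: the fraction of a party's posts that mention the topic at all.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def term_counts(posts: Iterable[str]) -> Tuple[Counter, int]:
    """Lowercased whitespace-token counts and the total token count."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts, sum(counts.values())


def rising_terms(
    before: List[str], after: List[str], top: int = 5, alpha: float = 1.0
) -> List[Tuple[str, float]]:
    """Rank terms by how much their share of all words grew after an event.

    Each term's smoothed share after the event is divided by its smoothed
    share before it; ratios above 1 mean the term gained ground.
    """
    b_counts, b_total = term_counts(before)
    a_counts, a_total = term_counts(after)
    vocab = set(b_counts) | set(a_counts)
    v = len(vocab)
    scores = {}
    for term in vocab:
        p_before = (b_counts[term] + alpha) / (b_total + alpha * v)
        p_after = (a_counts[term] + alpha) / (a_total + alpha * v)
        scores[term] = p_after / p_before
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def share_mentioning(posts: List[str], keywords: Iterable[str]) -> float:
    """Fraction of posts containing at least one of the keywords."""
    kw = {k.lower() for k in keywords}
    if not posts:
        return 0.0
    hits = sum(1 for p in posts if kw & set(p.lower().split()))
    return hits / len(posts)


if __name__ == "__main__":
    # Hypothetical posts from one party's campaigns before and after the ruling.
    before = ["inflation is hurting families", "jobs and the economy"]
    after = ["protect abortion rights", "women deserve rights"]
    print(rising_terms(before, after, top=3))
    print(share_mentioning(after, ["abortion", "roe"]))
```

Running `rising_terms` separately for each party's posts, and comparing `share_mentioning` across parties, would yield the two kinds of comparison the story describes.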

Project links: