Pixel Hunt (The Markup)

Entry type: Single project

Country/area: United States

Publishing organisation: The Markup, STAT, The Verge, Chalkbeat

Organisation size: Small

Publication date: 2022-04-28

Language: English

Authors: Simon Fondrie-Teitler, Surya Mattu, Colin Lecher, Todd Feathers, Angie Waller, Katie Palmer, Micha Gorelick


Simon Fondrie-Teitler, infrastructure engineer
Surya Mattu, data journalist and engineer
Colin Lecher, reporter
Todd Feathers, enterprise reporter
Angie Waller, research manager, special projects
Katie Palmer, health tech correspondent at STAT
Micha Gorelick, data scientist

Project description:

With the help of web surfing records shared by volunteers, The Markup revealed that websites across multiple industries and government agencies were transmitting sensitive information to Facebook through the Meta Pixel, a Facebook tracking tool that helps site operators better target their advertising. This included health systems, hospitals, and telehealth providers sharing information on users’ health conditions, medications, alcohol use, and reports of self-harm; tax preparation sites sharing information on income, refunds, and dependents; and the U.S. Department of Education sharing the personal contact details of student aid applicants.

Impact reached:

As of Oct. 20, 2022, at least 35 of the 40 hospitals and health systems identified in our coverage had removed the Meta Pixel entirely, or from portions of their websites. At least four health care systems sent federally mandated “breach” notifications to 6.3 million patients and the U.S. Department of Health and Human Services (HHS), disclosing possibly inappropriate transmission of data to Facebook. Numerous telehealth companies said they had removed the trackers or were reviewing their use of the tools.

Tax preparation companies TaxAct, TaxSlayer, H&R Block, and Intuit, the maker of TurboTax, all either removed their Meta Pixels or changed the pixels’ settings to capture less data after they were contacted for comment.

The U.S. Department of Education, after initially denying the tracking happened, reversed course before we published and changed the settings of the pixel to reduce the information gathered on applicants.

Three big tech companies—Google, Snap, and Pinterest—said they took action to investigate or stop the data sharing detailed in our stories. Facebook said it contacted the Department of Education.

Hospital patients filed at least five proposed federal class action lawsuits against Meta, alleging that the company broke various state and federal laws. Customers of H&R Block also filed a proposed federal class action lawsuit against Meta, alleging that the company violated contractual promises.

HHS cited our story in new guidance on the lawful use of online tracking technologies.

Our coverage prompted questions in hearings in the U.S. Senate and House of Representatives and was cited in letters from eight senators and three members of Congress to subjects of our stories. One of the letters, signed by six U.S. senators, said the sharing of tax information was “an appalling breach of users’ trust” and “potentially illegal.”

Techniques/technologies used:

In 2019, Surya Mattu, a data journalist and engineer at The Markup, began studying how and where Facebook’s pixel was used. By late 2020, he’d discovered that close to a third of the world’s top 100,000 websites had the pixel installed, and he launched an online tool called Blacklight so our readers could discover for themselves whether a particular webpage was using the tracker.

To find out what sorts of sensitive information the pixels were funneling to Facebook, Surya and Markup co-founder Julia Angwin brought a proposal to Mozilla, the maker of the Firefox web browser. They asked to work with Rally, a Mozilla project that allows participants to opt in to sending data to researchers studying online activity. The Markup provided Rally with specifications for the type of data sharing we were interested in, and Rally added the collection functionality to its tool as a project called “Facebook Pixel Hunt” that Firefox users could join (more than 5,000 ultimately did).

The project captured data in places that are challenging to observe, such as behind login pages. Although Rally had worked with university researchers, The Markup was the first (and is so far the only) journalism organization to tap its data.

To better understand the pixel ecosystem, Markup engineer Simon Fondrie-Teitler created dummy pixels, Facebook business accounts, and websites. Reporters Todd Feathers and Colin Lecher, working with Simon, helped to test each of the websites we wrote about, including by going through the appointment booking process on more than 100 hospital websites and completing onboarding forms on 50 telehealth sites. A “Show Your Work” article detailed the methodology The Markup used along with what we learned about the inner workings of the Meta Pixel. For many of our stories, we released the data we collected on GitHub.

Context about the project:

The data collected in partnership with Mozilla Rally was reasonably straightforward to use, but two main pieces of work needed to happen before we could begin looking through it in earnest. First, the data collected by Rally was stored in Google Cloud Platform’s BigQuery database but was not in a format that was easy to parse. We had to write scripts to parse and reformat the existing data into tables with a structure that was easier to query, so we could more quickly find the information we were looking for.
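As a rough illustration of this kind of reformatting step, the sketch below flattens nested JSON records into flat rows that could be loaded into a table. It is not the script we used: the record layout and field names (`participant`, `request`, `url`, `method`) are hypothetical stand-ins, and the real Rally data was structured differently.

```python
import json
from typing import Any, Dict, Iterable, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    else:
        # Scalars and lists are kept as-is under the accumulated key.
        flat[prefix] = obj
    return flat


def records_to_rows(raw_records: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON strings and turn each one into a flat row."""
    return [flatten(json.loads(raw)) for raw in raw_records]


# Hypothetical raw records, invented for this example.
raw = [
    '{"participant": "p1", "request": {"url": "https://example.com/a", "method": "GET"}}',
    '{"participant": "p2", "request": {"url": "https://example.com/b", "method": "POST"}}',
]
rows = records_to_rows(raw)
```

Each row ends up with predictable column names such as `request.url`, which makes filtering and grouping simpler than querying the nested original.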

We also needed to figure out what we were actually looking at. Meta documents how to use its pixel as an end user, but doesn’t document what the parameters in the network requests made by the pixel represent. So we read the existing documentation, then created a sandbox website with an instance of the pixel, fed our own data into the JavaScript API Meta provides, and tweaked its settings to see how the data being sent changed. From this information we could infer the meanings of the unknown parameters.
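The comparison at the heart of this approach can be sketched as a diff of query-string parameters between two captured requests: one taken before a setting was changed and one after. This is a simplified illustration, not our actual tooling. The URLs and parameter names (`site`, `event`, `extra`, `value`) are invented placeholders and are not the pixel's real parameters.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def request_params(url: str) -> Dict[str, str]:
    """Return the query-string parameters of a URL as a dict."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def diff_params(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each parameter that differs to its (before, after) values."""
    a, b = request_params(before), request_params(after)
    changes = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            # None means the parameter was absent from that request.
            changes[key] = (a.get(key), b.get(key))
    return changes


# Hypothetical captured requests from a sandbox page.
baseline = "https://tracker.example/collect?site=123&event=PageView&extra=0"
tweaked = "https://tracker.example/collect?site=123&event=Purchase&extra=0&value=9.99"
changes = diff_params(baseline, tweaked)
```

Changing one input at a time and recording which parameters move is what lets the meaning of an undocumented parameter be inferred with some confidence.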

We also faced challenges deciding what tracking evidence we were going to collect and how we were going to categorize it. For example, on the telehealth story we examined not only the Meta Pixels but also trackers from Google, TikTok, and others. Each company’s tracker behaved slightly differently, and one company’s tracker often behaved differently on different websites, so we had to do a lot of data collection, cleaning, refining, and recollecting before we finalized 1) what we were able to measure conclusively and 2) how we were going to categorize it (e.g., what constitutes personal information, what constitutes a “checkout” or “add-to-cart” event).

What can other journalists learn from this project?

We hope Pixel Hunt inspires other newsrooms to think creatively about how to get the answers they want. In our case, we knew there were millions of Meta Pixels in the wild and that close to a third of the largest 100,000 websites had the pixels installed. But before we launched Pixel Hunt, we didn’t have a clear path to answering our most pressing question: What sorts of sensitive information were all these pixels funneling to Facebook?

We suspected that answering this would require getting inside walled-off parts of the web like patient portals, tax preparation apps, and government benefit sites. This led us to our partnership with Mozilla Rally. From our experience with Rally and on Pixel Hunt, we learned lessons that could prove useful to other journalists, including:

People are often willing to share personal data with you if you ask them the right way. Our project page for Rally users (https://rally.mozilla.org/past-studies/facebook-pixel-hunt/) provided a nine-item list of exactly what data we were interested in collecting along with three ways we protected the data once we had access to it.
Partnerships magnify journalistic power. Our work with Rally helped us obtain sensitive data, a reporting partnership with STAT expanded our range of stories, and co-publishing with STAT, The Verge, and Chalkbeat helped reach larger audiences across multiple industries.
It pays to show your work. We published a dedicated methodology article and also detailed some of our methods within each article. We also kept records of the web browsing sessions we conducted to verify what we saw in the Rally data. All this bolstered the credibility of our coverage and helped us confidently answer questions from readers and the subjects of our stories.

Project links: