For sale: your health
Entry type: Single project
Publishing organisation: Sveriges Radio Ekot/Swedish Radio News
Organisation size: Big
Publication date: 2022-04-09
Authors: Sascha Granberg, Sven Carlsson
Sascha Granberg works as an investigative data journalist at Sveriges Radio Ekot.
Sven Carlsson works as an investigative tech reporter at Sveriges Radio Ekot.
By investigating the Meta Pixel, we revealed that the largest pharmacy in Sweden, state-owned Apoteket AB, for several years shared personally identifiable data and information about over-the-counter pharmacy purchases of approximately one million online customers with Meta.
We also exposed leaks by several other actors in the health sector.
When we asked Meta what happened with the data, they responded, “[O]ur systems are designed to remove potentially sensitive data”. By creating our own virtual, fake online pharmacy and a robot customer, we showed that, contrary to this claim, Meta did in fact store such information.
As a direct consequence of the first part of our project, all actors mentioned in our stories reported their data breaches to the Swedish Data Protection Authority, removed or deactivated the Meta Pixel and initiated internal investigations.
The Swedish Data Protection Authority opened formal investigations into three pharmacy chains and the online health care provider Kry/Livi, which had shared, for example, e-mail addresses of patients and doctors with Meta. Kry told us that their data breach had affected an estimated 90,000 users across several European countries. The authority’s investigations could result in fines under the GDPR.
We were also able to use the software we developed in the project to expose leaks from membership application forms of two political parties in Sweden.
In the second part of our project, we showed that Meta – despite filters, their own data policies and claims to notify advertisers who send them sensitive data – in fact stored information about medications, diagnoses and illnesses sent from our fake pharmacy via the Meta Pixel, contradicting their statements made to us.
Our investigation was the first to test the robustness of Meta’s filtering systems for health data, raising questions such as which languages the filter had been trained on and whether it worked in Swedish at all.
“For sale: your health” was shortlisted for the 2022 edition of the Swedish Grand Prize for Journalism as Innovator of the Year.
We developed several tools. In 2021, we created a “first detection” tool in Python and Selenium to identify websites that share data with third parties. Essentially, it is a scanner that:
1. goes to a given website,
2. stays there for a while, and then
3. downloads, among other things, the network traffic between the browser and third parties.
In 2022, we added a function that automatically interacts with cookie consent boxes in multiple languages. If our scanner finds certain third parties in the network data, it can, for example, determine whether there is a high likelihood that the website shares personally identifiable data with Meta.
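The detection step can be sketched in plain Python. This is a minimal sketch, not our production scanner: the Selenium capture itself is omitted, and `captured_urls` stands in for the request URLs harvested from the browser’s network log. The one pattern shown, `facebook.com/tr`, is the endpoint the Meta Pixel reports events to.

```python
from urllib.parse import urlparse

# Host/path patterns that indicate traffic to known third-party trackers.
# facebook.com/tr is the endpoint the Meta Pixel sends event data to.
TRACKER_PATTERNS = {
    "meta_pixel": ("facebook.com", "/tr"),
}

def detect_trackers(captured_urls):
    """Given request URLs captured from the browser's network traffic,
    return a mapping of tracker name -> matching request URLs."""
    hits = {}
    for url in captured_urls:
        parsed = urlparse(url)
        for name, (host, path_prefix) in TRACKER_PATTERNS.items():
            if (parsed.hostname
                    and parsed.hostname.endswith(host)
                    and parsed.path.startswith(path_prefix)):
                hits.setdefault(name, []).append(url)
    return hits

# Example: one Pixel call and one first-party call (hypothetical URLs)
urls = [
    "https://www.facebook.com/tr/?id=123&ev=PageView",
    "https://example-pharmacy.se/api/cart",
]
print(detect_trackers(urls))
# → {'meta_pixel': ['https://www.facebook.com/tr/?id=123&ev=PageView']}
```

Matching on both hostname and path keeps ordinary links to facebook.com from being flagged as tracking calls.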
We also have a tool in Python that identifies personal data a website has shared with a third party. We built the first version of this tool in 2021, and in 2022, we added support for metadata (for example, about pharmacy products).
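The core of such a tool can be illustrated with a short sketch, under the assumption that personal data travels in the tracker request’s query string (the exact parameter names vary from site to site, so this version simply scans every parameter value for email-like strings; the example URL is invented):

```python
import re
from urllib.parse import urlparse, parse_qsl, unquote

# A deliberately simple email pattern; real tooling would also look for
# names, phone numbers, product identifiers and other metadata.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def find_personal_data(tracker_url):
    """Scan every query parameter a tracker request carries for
    email-like strings; return (parameter, email) pairs."""
    findings = []
    for key, value in parse_qsl(urlparse(tracker_url).query):
        # unquote() again to catch doubly URL-encoded values
        for email in EMAIL_RE.findall(unquote(value)):
            findings.append((key, email))
    return findings

# Hypothetical Pixel call where the page URL parameter leaks an address
pixel_call = ("https://www.facebook.com/tr/?id=123&ev=Lead"
              "&dl=https%3A%2F%2Fexample.se%2Faccount%3Femail%3Danna%40example.se")
print(find_personal_data(pixel_call))
# → [('dl', 'anna@example.se')]
```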
We built our fake pharmacy using Vue.js, and our robot customer using Python and Selenium. We manually compiled the sensitive data used in our pharmacy and by our robot. To analyze what Meta was doing with the data, we mapped API calls made within the Facebook Ads Management tool and found that one of them contained names of medications, types of diagnoses and more – stored on Meta’s servers. We regularly made requests to this endpoint, and could see that Meta repeatedly stored the same kind of sensitive information they had told us their systems remove.
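The comparison logic behind that polling can be sketched as follows. The actual endpoint inside the Ads Management tool is internal and undocumented, so the network call is left out; a canned response string stands in for one polling round, and the seeded terms are invented examples of the kind of sensitive data our robot customer submitted:

```python
def seeded_markers_found(response_text, seeded_terms):
    """Return which of the sensitive terms we planted via the fake
    pharmacy appear in an API response, ignoring case."""
    text = response_text.lower()
    return sorted(t for t in seeded_terms if t.lower() in text)

# Invented markers: medication names and diagnoses our robot submitted
seeded = ["Sertraline", "Depression", "Ibuprofen"]

# Stand-in for one response from the (internal) Ads Management endpoint
response = '{"events": [{"custom_data": "product=sertraline 50mg"}]}'
print(seeded_markers_found(response, seeded))
# → ['Sertraline']
```

Repeating this check against fresh responses over time is what lets you show that sensitive data is being *stored*, not merely transmitted once.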
After confronting Meta with our results, we noticed that employees at the company began visiting our pharmacy. When the company got back to us, they did not answer our questions (for example: can the filter detect sensitive health information in Swedish?) but emphasized the responsibility of the advertiser.
Context about the project:
In 2019, the Wall Street Journal reported that health apps were transferring sensitive health information to Facebook. Two years later, New York State investigators issued a report saying that Facebook had taken measures to block the collection of sensitive data. A central component was a filtering mechanism designed to prevent Facebook from storing sensitive health data. As far as we know, we are the first journalists to investigate how robust this filtering mechanism is.
Our reporting on the Meta Pixel began in 2021. In June of that year, we exposed leaks of user data (i.e. emails) from the websites of two Swedish banks. We believe this was the first time that journalists had reported on the contents of specific parameters (i.e. user emails) being transferred through the Pixel over the web (whereas the WSJ had looked at app traffic).
In early 2022, we began to investigate actors that handle sensitive health information, resulting in the revelations submitted here for your consideration. The first story, about a mental health charity passing on personal data submitted by users of its online forum, was published in early April. Among other things, Meta collected the users’ emails when they registered or logged in, at times including the “anonymous” nickname that the charity had auto-generated for each new forum user.
We then continued to publish stories in the investigative series “For sale: your health”.
A few weeks after our first story in the series, other outlets, including The Markup, began to report on US examples of sensitive data collected by Meta via the Meta Pixel. Those stories raised awareness of Meta’s data gathering in the US, resulting in political scrutiny of the company’s practices and class-action lawsuits.
What can other journalists learn from this project?
Our investigations into personal data breaches via online tracking usually follow a pattern that has become almost too predictable:
**Step 1.** We tell the company/actor what they are doing.
**Step 2.** The company/actor denies this and tells us that they cannot reproduce our results.
**Step 3.** We provide a step-by-step manual on how to verify our claims.
**Step 4.** The company/actor stops the transfers and reports itself to the data protection authority.
**Step 5.** The company claims it was completely unaware of the transfers.
All of this shows that even though the GDPR became law five years ago, companies and actors entrusted with sensitive data lack the tools and knowledge to survey their own handling of such data on their public platforms online.
In addition, we believe that these types of breaches are generally under-reported by the media, yet of great public interest judging by our audience’s engagement with what we have published so far. In other words, finding and exposing data breaches and digging deep into what happens with the information once shared with a third party has proven an area ripe for exploration by investigative journalists.
For two years, we have asked Meta questions relating to what happens with data after they have received it. Their responses have seldom been clear-cut and have often led to more questions that they either do not answer or answer with more ambiguity. This could often be where the investigation ends, unless you have sources from within the company.
Our work shows that there is an alternative: sometimes you can use the tech company’s own products to investigate their claims when they refuse to answer your questions.