Investigating Swedish bank websites, we exposed personal data breaches and security vulnerabilities. Avanza, with over 1 million customers, had designed its website in a way that made it possible for anyone to identify its customers. For over a year, Avanza also sent customers’ personal data to Facebook. Länsförsäkringar, with over 350,000 bank customers, tracked users without their consent and sent customers’ personal data to Facebook and others. As we reported the story, Collector, a niche bank, discovered a breach of customer data to Facebook – but chose not to inform the public or report it to the authorities.
The banks that were leaking customer data changed their websites almost immediately when we started asking questions. Avanza completely overhauled parts of its website following our first report, in which we used the bank’s own web applications to identify several senior government officials as customers of the bank. The Swedish Financial Supervisory Authority (FI) initiated an investigation, which it later closed. Following the first story, we reported that Avanza had, for over a year, sent customers’ social security numbers, telephone numbers and information from loan applications to Facebook. Avanza initially denied this, but confirmed it after we gave them a step-by-step guide on how to recreate our findings. FI opened a new investigation into Avanza after our reports on the breach of personal data to Facebook. Avanza and Länsförsäkringar, which according to our research leaked personal data from customer registration forms, also reported themselves to the Swedish Authority for Privacy Protection (IMY), which initiated formal investigations that are still ongoing. Both banks risk fines for GDPR breaches as a result of our investigation. In an interview, the Swedish Minister for Financial Markets said that our investigation shows the need for new legislation. Many Swedish media outlets picked up our story. Our work was one of three projects shortlisted for The Swedish Grand Journalism Prize, Stora Journalistpriset, 2021 in the category Innovator of the Year.
We continue to use the tools we developed for this project to uncover other personal data breaches. In September 2021 we revealed that the Swedish Equality Ombudsman (DO) had leaked, to an external company, personal data of people who used its website to report discrimination. The leaked data included sensitive information relating to health and more. DO estimates that information about hundreds of individuals was leaked, and the agency is now being investigated by IMY.
We started by setting up our own website and activating various trackers on it. We added different HTML elements and interacted with the website while changing the trackers’ settings. We used what we learned to build Python scripts that extract, from a browser HAR log, posts of personal data being sent to a third party. We also built a Python script that served as a “first detection” tool, filtering out websites that used trackers with specific settings we had previously identified as particularly sensitive. After that, we collected data from banking websites, including data captured while we were logged in as customers. This collection of browser network logs was done manually. In total, we collected around 200 HAR logs from about 10 websites, and used our Python scripts to extract the relevant information from them.
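To illustrate the extraction step, here is a minimal sketch of how POST requests to third parties could be pulled out of a HAR log. It is not the script we used in the project: the function names, the `bank.example` and `tracker.example` hosts and the sample log are all made up for illustration, and it assumes only the standard HAR 1.2 layout (`log.entries[].request`).

```python
import json
from urllib.parse import urlsplit


def load_har(path):
    """Read a HAR file saved from the browser's network panel."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def third_party_posts(har, first_party_domain):
    """Return POST requests in a parsed HAR log that go to other domains."""
    found = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        host = urlsplit(url).hostname or ""
        # Treat the site itself and its subdomains as first party.
        is_first_party = (
            host == first_party_domain
            or host.endswith("." + first_party_domain)
        )
        if request.get("method") != "POST" or is_first_party:
            continue
        body = (request.get("postData") or {}).get("text") or ""
        found.append({"host": host, "url": url, "body": body})
    return found


# Hypothetical, hand-made HAR fragment (not real captured data).
SAMPLE_HAR = {
    "log": {
        "entries": [
            {"request": {"method": "POST",
                         "url": "https://www.bank.example/api/apply",
                         "postData": {"text": '{"phone": "0701234567"}'}}},
            {"request": {"method": "GET",
                         "url": "https://cdn.tracker.example/pixel.js"}},
            {"request": {"method": "POST",
                         "url": "https://collect.tracker.example/events",
                         "postData": {"text": "event=form_submit&phone=0701234567"}}},
        ]
    }
}

hits = third_party_posts(SAMPLE_HAR, "bank.example")
for hit in hits:
    print(hit["host"], hit["body"])
```

With a real capture, `load_har("session.har")` (a hypothetical file name) would replace `SAMPLE_HAR`. Matching on the domain suffix is a simplification; a site may serve first-party content from several unrelated domains, so the list of hits still needs manual review.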
Because we collected the data manually, we got a first look at all data that was being sent to and from the websites. On Avanza’s website, we noticed an open endpoint that allowed anyone to check whether a user was registered. Since we were investigating how banks manage customer data, we started to look for ways to identify actual customers – and we found one. In essence, Avanza had two functions that – when used together – could be used not only for “user enumeration” but also to connect each user to a specific social security number. Using another Python script, we managed to connect several senior government officials to accounts at the bank. One of the people we identified was the former chief legal officer at the Swedish Financial Supervisory Authority, whom we also interviewed for our story.
What was the hardest part of this project?
We encountered many challenges in this project. The list below shows some of them.
- Could our results be impacted by A/B testing? To be certain that developers running experiments weren’t impacting our results, we always collected logs across different browsers and versions of browsers.
- What is being collected by a tracker? We built our own website and interacted with it to fully understand what various trackers collect.
- How and where do we draw the line between personal data and non-personal data? Every time a website forces a user’s browser to communicate with a third party, that third party receives personal data in the form of the browser’s user agent and IP address. In our reporting, we only highlighted websites that were sharing personal data generated by the user when, for example, submitting forms or logging in.
What can others learn from this project?
We’ve seen a lot of journalism investigating tracking online. Some of it has focused on the mere existence of trackers on certain sensitive websites. This makes sense because that data collection can be fully automated. By adding manual elements to a data journalism project, in this case parts of the data collection, we can sometimes learn new things.
If you’re interested in uncovering the kind of breaches we did, start interacting with websites and save your browser’s HAR log while you do it. Submitting web forms is a good place to start.
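As a starting point, the sketch below searches a HAR log for values you typed into a form yourself and reports which third-party hosts received them. It is an illustration under stated assumptions, not our original tooling: the function name, hosts and test values are invented, and it only finds values sent in plain or URL-encoded form. Trackers that hash or otherwise encode values before sending them would need additional checks.

```python
from urllib.parse import unquote_plus, urlsplit


def find_leaked_values(har, test_values, first_party_domain):
    """List (host, label) pairs where a known test value reached a third party."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        host = urlsplit(url).hostname or ""
        if host == first_party_domain or host.endswith("." + first_party_domain):
            continue  # the site receiving its own form data is expected
        body = (request.get("postData") or {}).get("text") or ""
        # Decode %XX escapes and '+' so URL-encoded values can be matched.
        haystack = unquote_plus(url + "\n" + body)
        for label, value in test_values.items():
            if value in haystack:
                hits.append((host, label))
    return hits


# Made-up values entered into a form during a test session.
TEST_VALUES = {
    "phone": "0701234567",
    "email": "test.person@example.org",
}

# Hypothetical, hand-made HAR fragment (not real captured data).
SAMPLE_HAR = {
    "log": {
        "entries": [
            {"request": {"method": "POST",
                         "url": "https://www.bank.example/api/register",
                         "postData": {"text": "phone=0701234567"}}},
            {"request": {"method": "GET",
                         "url": "https://tracker.example/collect"
                                "?ph=0701234567&em=test.person%40example.org"}},
        ]
    }
}

leaks = find_leaked_values(SAMPLE_HAR, TEST_VALUES, "bank.example")
for host, label in leaks:
    print(f"{label} was sent to {host}")
```

Using distinctive test values that are unlikely to occur by chance keeps false positives down, and checking both the URL and the request body matters because trackers may send data through either.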
When you build scripts, strive to make them reusable to help you answer similar questions in the future.