Followed on the internet: hundreds of websites violate your privacy

Category: Best data-driven reporting (small and large newsrooms)

Country/area: Netherlands

Organisation: NOS

Organisation size: Big

Publication date: 3 Jan 2019

Credit: Joost Schellevis, Winny de Jong

Project description:

Both European and Dutch law forbid placing so-called tracking cookies on someone’s phone or computer without explicit permission. That is why many websites display a cookie message: if you agree, you give the website permission to track your online behavior through tracking cookies. Research by the NOS shows that more than 1,300 websites that are popular in the Netherlands place tracking cookies on the first visit, without permission.

Impact reached:

Following our publication, the Dutch Data Protection Authority (Dutch DPA) started an investigation into the permission requirement for tracking cookies and whether website owners comply with it. Some of the websites also stopped placing tracking cookies without permission, and so stopped violating the online privacy of their visitors.

Techniques/technologies used:

For this investigation we wrote several computer scripts. First, we created a list of all the sites we wanted to investigate, around 10,000 in total. Second, Joost Schellevis wrote a PHP script to visit each website 10 times, using the sandbox function of the Google Chrome web browser to get a clear overview of the cookies placed by every site. The sandbox function let us gather the cookies placed by every website in a separate folder, and it guaranteed that we really simulated a first-time visit. For every visit we stored all cookies placed by the visited site in its own folder. This information became the basis for our first dataset.
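The following is a minimal sketch of how one folder per site and per visit could be set up. It is not the script used in this investigation, which was written in PHP. It assumes that isolation is achieved with Chrome's `--user-data-dir` flag, which starts the browser from its own profile directory; the text only speaks of a "sandbox function", so that choice is an assumption. The function names, the `profiles` base folder, the `google-chrome` executable name and the `https://` prefix are hypothetical as well.

```python
from pathlib import Path

VISITS_PER_SITE = 10  # from the text: every site was visited 10 times


def profile_dir(base: Path, site: str, visit: int) -> Path:
    """Return a dedicated, initially empty profile folder for one visit to one site."""
    return base / site / f"visit_{visit:02d}"


def chrome_command(chrome: str, site: str, profile: Path) -> list[str]:
    """Build a Chrome command line that starts from its own profile folder.

    Assumption: --user-data-dir is used for isolation; the source does not
    name the exact Chrome option.
    """
    return [
        chrome,
        "--headless",
        "--no-first-run",
        f"--user-data-dir={profile}",
        f"https://{site}/",
    ]


def plan_visits(sites: list[str], base: Path, chrome: str = "google-chrome"):
    """Yield one (site, visit, command) tuple per simulated first visit."""
    for site in sites:
        for visit in range(1, VISITS_PER_SITE + 1):
            profile = profile_dir(base, site, visit)
            yield site, visit, chrome_command(chrome, site, profile)
```

Each command could be handed to `subprocess.run`. A real run would also have to wait for the page to load, close the browser, and read the cookies from the profile folder, all of which is left out here.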

Next, we created a list of all unique cookies that websites had placed on our machines during the 100,000 simulated first visits. For every cookie we checked whether it was a tracking cookie. Part of this work was done low-tech: several colleagues helped check every cookie we came across.
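As an illustration of this step, the sketch below reduces a set of cookie observations to a list of unique cookies and writes a review sheet with an empty column for the manual check. It assumes that a cookie is identified by its domain and name; the source does not say how uniqueness was defined. The sample rows and column names are made up.

```python
import csv
import io

# Hypothetical sample: (site, visit, cookie_domain, cookie_name) per observed cookie.
observations = [
    ("example.nl", 1, "tracker.example", "uid"),
    ("example.nl", 2, "tracker.example", "uid"),
    ("example.nl", 1, "example.nl", "session"),
    ("example.org", 1, "tracker.example", "uid"),
]


def unique_cookies(rows):
    """Return the sorted, de-duplicated (cookie_domain, cookie_name) pairs."""
    return sorted({(domain, name) for _site, _visit, domain, name in rows})


def review_sheet(cookies) -> str:
    """Render a CSV with an empty is_tracking column to be filled in by hand."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cookie_domain", "cookie_name", "is_tracking"])
    for domain, name in cookies:
        writer.writerow([domain, name, ""])
    return buffer.getvalue()


print(review_sheet(unique_cookies(observations)))
```

Keeping the label in a separate sheet means each cookie only has to be judged once, however many sites and visits it appeared on.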

Using a Python script, we combined this second dataset with the data from all site visits. With the Python pandas library we then selected all sites that placed tracking cookies on three or more first visits.
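A combination and selection of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the original script: the column names and sample data are invented, and cookies are matched on name alone to keep the example short. Only the threshold of three visits comes from the text.

```python
import pandas as pd

MIN_VISITS = 3  # from the text: tracking cookies on three or more first visits

# Hypothetical first dataset: one row per cookie observed during one visit.
observations = pd.DataFrame(
    {
        "site": ["a.example"] * 3 + ["b.example"] * 2 + ["c.example"] * 3,
        "visit": [1, 2, 3, 1, 2, 1, 2, 3],
        "cookie_name": ["uid"] * 5 + ["session"] * 3,
    }
)

# Hypothetical second dataset: the manually reviewed label per unique cookie.
labels = pd.DataFrame(
    {"cookie_name": ["uid", "session"], "is_tracking": [True, False]}
)

# Attach the label to every observation, then keep tracking cookies only.
merged = observations.merge(labels, on="cookie_name", how="inner")
tracking = merged[merged["is_tracking"]]

# Count the distinct visits per site on which a tracking cookie was placed.
visits_with_tracking = tracking.groupby("site")["visit"].nunique()
flagged = sorted(visits_with_tracking[visits_with_tracking >= MIN_VISITS].index)
print(flagged)
```

Counting distinct visits rather than rows keeps a site that sets several tracking cookies in a single visit from reaching the threshold on that visit alone.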

We found over 1,300 websites that placed tracking cookies on the first visit without the user's permission. Among them were Dutch political parties, insurance companies, popular websites for children and young people, and many sites of media companies.

What was the hardest part of this project?

The hardest part of this project was making sure our findings for every website visit would be, and stay, isolated from other visits to the same website and from visits to other websites altogether. Since we decided to visit 10,000 websites 10 times each, this required some thought.

We ended up using the Google Chrome sandbox, creating a folder for every website and every visit. This way we automatically isolated the cookies a website placed on each visit, making sure our final result would be trustworthy.


What can others learn from this project?

With the use of technology comes responsibility. This investigation shows that many organisations, among them companies that handle healthcare data and political parties, aren’t as ready as one might expect when it comes to protecting the online privacy of their website visitors.

Journalists create the maps people navigate society by; since society is digitalizing at great speed, journalists need to map the digital world too. This publication shows that sometimes no exclusive technologies are needed to do so: our reporting relied heavily on the Google Chrome sandbox, a tool available to all.

Project links: