Goodbye Big Five

Category: Best data-driven reporting (small and large newsrooms)

Country/area: United States

Organisation: Gizmodo

Organisation size: Small

Publication date: 22/01/2019

Credit: Dhruv Mehrotra and Kashmir Hill

Project description:

Reporters Kashmir Hill and Dhruv Mehrotra spent six weeks blocking Amazon, Facebook, Google, Microsoft, and Apple from getting their money, data, and attention, using a custom-built VPN. Here’s what happened.

Impact reached:

The series received over one million reads and provided a striking look at how these companies control internet infrastructure, online commerce, and information flows. The piece furthered the existing discourse about technology monopolies and just how integrated they are in our lives.

Techniques/technologies used:

In our initial analysis, we wanted a basic understanding of how much data flowed to each tech giant during a specific behavior, so Mehrotra built network monitoring software that let us run experiments and gather the data ourselves. For instance, when Hill wanted to go on a run, she would direct the software to begin monitoring her network traffic and then assign the data capture a label. This is how we linked network activity to specific behavior. The software then used WHOIS lookups to categorize where each packet was headed.
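The labelling and categorization step can be illustrated with a minimal sketch. This is not the project's code: the keyword table, the function names, and the sample capture are all hypothetical, and the WHOIS lookup is passed in as a plain function so the example runs without network access.

```python
from collections import defaultdict

# Hypothetical keyword table; real WHOIS organization strings vary widely.
OWNER_KEYWORDS = {
    "amazon": "Amazon",
    "facebook": "Facebook",
    "google": "Google",
    "microsoft": "Microsoft",
    "apple": "Apple",
}


def categorize(org_name):
    """Map a WHOIS organization string to a tech giant, or 'Other'."""
    lowered = org_name.lower()
    for keyword, company in OWNER_KEYWORDS.items():
        if keyword in lowered:
            return company
    return "Other"


def summarize(packets, whois_org):
    """Total bytes per (label, company) pair.

    packets:   iterable of (label, dest_ip, size_in_bytes) tuples.
    whois_org: callable returning the WHOIS organization for an IP
               (or None); injected so the sketch needs no network.
    """
    totals = defaultdict(int)
    for label, dest_ip, size in packets:
        company = categorize(whois_org(dest_ip) or "")
        totals[(label, company)] += size
    return dict(totals)


# Made-up capture: documentation-range IPs and stubbed WHOIS answers.
fake_whois = {
    "192.0.2.10": "Amazon Technologies Inc.",
    "198.51.100.7": "Example ISP",
}
capture = [
    ("run", "192.0.2.10", 1200),
    ("run", "192.0.2.10", 800),
    ("run", "198.51.100.7", 300),
]
print(summarize(capture, fake_whois.get))
```

Substring matching on the organization name is deliberately crude; a real categorizer would need to handle subsidiaries and registrations under other names.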

The next step was to actually block outgoing traffic to each tech giant. To do that we first needed to identify the various IP networks that each company operates. Internet infrastructure relies on a certain level of transparency in order for data to be routed appropriately through the multitudes of networks that comprise it. As such, we were able to utilize the public Autonomous System Numbers of the various tech giants to identify their IP networks.
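Once a company's IP networks are known, checking whether an address belongs to one of them is a prefix-membership test. The sketch below assumes the prefixes have already been collected from each company's Autonomous System Numbers. The prefixes shown are placeholders from the reserved documentation ranges, not real company networks, and the names `COMPANY_PREFIXES` and `owner_of` are hypothetical.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), standing in for
# the networks announced under each company's ASNs.
COMPANY_PREFIXES = {
    "Amazon": ["192.0.2.0/24"],
    "Google": ["198.51.100.0/24", "2001:db8::/32"],
}

# Parse every prefix once so lookups only do membership checks.
NETWORKS = [
    (ipaddress.ip_network(prefix), company)
    for company, prefixes in COMPANY_PREFIXES.items()
    for prefix in prefixes
]


def owner_of(ip):
    """Return the company whose network contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, company in NETWORKS:
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if addr in network:
            return company
    return None


print(owner_of("192.0.2.55"))    # inside the placeholder "Amazon" prefix
print(owner_of("2001:db8::1"))   # inside the placeholder "Google" prefix
print(owner_of("203.0.113.9"))   # not in any listed prefix
```

A linear scan is fine for a handful of prefixes; with the thousands of prefixes a large network operator announces, a prefix tree would be the more usual choice.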

Armed with a means to categorize IP addresses, we crafted firewall rules on our VPN to drop packets associated with the five tech giants. A firewall rule specifies criteria for how your computer should handle internet packets. For example, if our VPN spotted data traveling to a Facebook-owned address on port 5222, our packet filter would recognize it as traffic to WhatsApp, a Facebook-owned company, and drop it.
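To make the idea of a drop rule concrete, here is a sketch that turns a list of prefixes into firewall commands. It assumes an iptables-style packet filter on the VPN server, which the write-up does not specify. The `drop_rules` helper and the placeholder prefixes are hypothetical, and the script only prints the commands rather than applying them.

```python
import ipaddress


def drop_rules(prefixes, chain="FORWARD", port=None):
    """Build iptables/ip6tables commands that drop traffic to each prefix.

    chain: FORWARD is assumed here because a VPN server forwards its
           clients' packets; that choice is an assumption of this sketch.
    port:  optional TCP destination port to narrow the rule.
    """
    rules = []
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)  # raises ValueError on bad CIDR
        command = "ip6tables" if network.version == 6 else "iptables"
        parts = [command, "-A", chain, "-d", str(network)]
        if port is not None:
            parts += ["-p", "tcp", "--dport", str(port)]
        parts += ["-j", "DROP"]
        rules.append(" ".join(parts))
    return rules


# Placeholder prefixes from the reserved documentation ranges.
for rule in drop_rules(["192.0.2.0/24", "2001:db8::/32"]):
    print(rule)

# A narrower rule for a single port, as in the port 5222 example.
for rule in drop_rules(["198.51.100.0/24"], port=5222):
    print(rule)
```

Generating the commands as strings keeps the sketch safe to run anywhere; applying them would require root privileges on the machine doing the filtering.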

What was the hardest part of this project?

When designing our system, we did not consider how the prevalence of content delivery networks, or CDNs, would affect our blockade. Many websites and apps are not actually sent to your browser directly from their hosting provider. Instead, there is often a middleman, a CDN, that acts as a buffer between your browser and the company’s servers.

CDNs exist for speed and security. A CDN will store versions of a company’s content in multiple geographical locations in order to deliver it to the end user faster. If you think of the internet as a bunch of wires, instead of as a kind of omnipotent cloud-like thing, the benefit is quite obvious: the closer you are to your content, physically, the faster you will get it.

For our purposes, what this meant was that when we were blocking the companies that do web hosting as a service—such as Amazon, Google, and Microsoft—websites they host would evade our firewall if they used a CDN from a third party because it didn’t look like the website was being sent from a tech giant. That’s how Airbnb and Gizmodo itself, which are both hosted by AWS, broke through our Amazon blockade.

What can others learn from this project?

All of our code is open source, and we wrote a tutorial so others can also experience a tech-giant-free life.

Project links: