How do journalists best show the potential impact of a new technology on society? They try it out themselves. That’s what we thought when we crawled hundreds of thousands of images from Instagram, searched them for the faces of 2500 Swiss politicians, and found hundreds of matches. We were surprised how easily an open-source version of a popular facial recognition library can be used to build a small surveillance machine that ingests thousands, if not millions, of publicly available images and digs through them in search of friends, foes, and politically exposed persons (PEPs).
Our experiment came right after the New York Times’ 2020 revelations about Clearview. While that investigation was ground-breaking and had a lot of impact around the world, it remained vague about how simple and powerful the technology behind it is. It also failed to emphasize how vulnerable large social networks still are to crawling and, essentially, data theft. That’s why we decided to try bulk-collecting images and running publicly available facial recognition technology on them ourselves. Only then do readers realize how dangerous facial recognition can be: even journalists can build their own surveillance machine.

In our experiment, we identified several hundred candidates for and members of parliament in around 200’000 Instagram images from popular Swiss events, some of them dating back as far as 2015. In one case, we found a female candidate for the Swiss parliament taking part in a street rave as a teenager. In other cases, we identified several candidates and MPs taking part in political rallies. While these examples may not be spectacular in themselves, they clearly show what becomes possible when facial recognition is combined with images that were uploaded to the Internet at some point: you can search for somebody using nothing but their facial features and uncover images that might damage their reputation when taken out of context or, even worse, be used for repression.

After our research, we confronted several of the people we found in the images, and all of them were more or less shocked to see their “past” uncovered with such a technology. We also confronted Facebook, asking why Instagram was still so easy to crawl, and were told that they were looking into the problem and that the safety
We built our facial recognition system in Python, which we also used to glue together its different parts:
- An Instagram crawler that downloaded all pictures tagged with a particular hashtag of a Swiss event (e.g. #streetparade2018), using the Instaloader library (https://instaloader.github.io/), which was of great help.
- A facial recognition engine that used the open-source Dlib library and a Python wrapper (https://github.com/ageitgey/face_recognition) to extract faces, embed them, and match them against “search” faces that we extracted from portrait images of our 2500 politicians.
- A MongoDB instance to store all the images, intermediate results, and statistics.
- A React frontend that we used internally to review potential matches and identify the actual matches with which we could confront the politicians.
- Another React frontend that showcased some of these matches to our readers (see some examples of matches in the “Testen Sie die Suchmaschine!” (“Try the search engine!”) section at the top of the project link).
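The matching step at the core of this setup can be illustrated without any image processing. The face_recognition wrapper turns each detected face into a 128-dimensional encoding (via `face_recognition.face_encodings`) and treats two faces as the same person when the Euclidean distance between their encodings is at most a tolerance, 0.6 by default in its `compare_faces` function. The sketch below reproduces only that comparison in plain Python so it runs without Dlib; the function names, the dictionaries, and the toy 3-dimensional vectors are hypothetical stand-ins, not the team’s actual code.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Mirrors the default tolerance of face_recognition.compare_faces (assumption:
# the project may have used a different, tuned value).
DEFAULT_TOLERANCE = 0.6


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two face encodings, the metric face_recognition uses."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_matches(
    search_faces: Dict[str, Sequence[float]],
    crawled_faces: Dict[str, List[Sequence[float]]],
    tolerance: float = DEFAULT_TOLERANCE,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: return (person, image_id, distance) for every
    crawled face whose distance to a reference face is <= tolerance.

    search_faces maps a person to one reference encoding (from a portrait);
    crawled_faces maps an image id to all encodings found in that image.
    """
    matches: List[Tuple[str, str, float]] = []
    for image_id, encodings in crawled_faces.items():
        for encoding in encodings:
            for person, reference in search_faces.items():
                distance = euclidean_distance(reference, encoding)
                if distance <= tolerance:
                    matches.append((person, image_id, distance))
    # Closest candidates first, which is convenient for manual review.
    matches.sort(key=lambda m: m[2])
    return matches


if __name__ == "__main__":
    # Toy 3-dimensional "encodings"; real ones have 128 dimensions.
    search = {"politician_a": [0.1, 0.2, 0.3], "politician_b": [0.9, 0.9, 0.9]}
    crawled = {
        "img_001.jpg": [[0.1, 0.25, 0.3], [0.9, 0.1, 0.0]],
        "img_002.jpg": [[0.9, 0.8, 0.95]],
    }
    for person, image_id, distance in find_matches(search, crawled):
        print(f"{person} in {image_id} (distance {distance:.3f})")
```

Comparing every crawled face against all 2500 reference faces this way is a brute-force scan; at a few hundred thousand images that is still tractable, which is part of why such a system is within reach of a small team.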
What was the hardest part of this project?
The hardest part of the work was getting started: understanding the technology and finding good open-source libraries and packages that could be used out of the box. Fine-tuning the facial recognition algorithm to our needs and iterating on the system we used to manually verify potential matches also took some time. Nevertheless, the whole project took only 3–4 weeks, which showed us that anyone with some time and dedication could build a similar system.
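The source does not say how the matching threshold was tuned. One common approach, sketched here under the assumption that reviewers have hand-labelled a sample of candidate matches, is to sweep the tolerance and count, for each value, how many confirmed matches it keeps, how many false alarms reviewers would have to reject, and how many genuine matches it misses. The function name and the sample data are hypothetical.

```python
from typing import Iterable, List, Tuple


def sweep_tolerance(
    labelled: Iterable[Tuple[float, bool]],
    tolerances: Iterable[float],
) -> List[Tuple[float, int, int, int]]:
    """Hypothetical helper: for each tolerance return
    (tolerance, true_matches_kept, false_alarms, true_matches_missed).

    labelled holds (distance, is_same_person) pairs confirmed by a reviewer.
    """
    pairs = list(labelled)
    rows = []
    for tol in tolerances:
        kept = sum(1 for d, same in pairs if d <= tol and same)
        false_alarms = sum(1 for d, same in pairs if d <= tol and not same)
        missed = sum(1 for d, same in pairs if d > tol and same)
        rows.append((tol, kept, false_alarms, missed))
    return rows


if __name__ == "__main__":
    # Made-up review sample: (distance, reviewer confirmed same person?)
    sample = [(0.35, True), (0.42, True), (0.48, False),
              (0.55, True), (0.58, False), (0.63, False)]
    for tol, kept, false_alarms, missed in sweep_tolerance(sample, [0.4, 0.5, 0.6]):
        print(f"tolerance {tol}: kept {kept}, false alarms {false_alarms}, missed {missed}")
```

A lower tolerance shrinks the review queue but misses more genuine matches. When every candidate is checked by hand anyway, as in this project, accepting a looser threshold and more review work is one defensible trade-off.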
This work should be selected because it required us to try out facial recognition, a technology that is often written about but rarely truly understood. To do so, we had to understand the AI behind it in depth, dig through pages of documentation, and write our own code in several iterations to achieve what otherwise happens behind the closed doors of big companies. We also made the effort to really explain to our readership how facial recognition works, leaving the often superficial explanations behind and explaining the concepts at a high level, so that readers would understand the implications of the technology, such as racial bias. Lastly, we took the risk of violating Instagram / Facebook terms of service as well as Swiss data protection law. However, public interest was on our side and we were not charged with anything. In fact, the Swiss data protection officer, while condemning the bulk collection of image data, praised our work for showcasing the dangers of the technology to a large audience.
What can others learn from this project?
We think it’s rewarding to try out novel and potentially impactful technologies like facial recognition, deep fakes, and the like on your own. Nowadays, with the widespread availability of open-source code and good documentation, even journalists and newsrooms with limited engineering capabilities can do this. We think the impact on readers is much higher when you can show them that you were able to do it yourself, and not having to rely on an external expert certainly helps your credibility.