Star Wars – Uncovering the Networks of Fake-Reviewers on Google

Country/area: Switzerland

Organisation: SRF

Organisation size: Big

Publication date: 12/10/2021

Credit: Lukas Frischknecht, Julian Schmidli, Pascal Albisser, Stefanie Hasler, Flurin Maissen


Lukas Frischknecht, data reporter, has a background in data science and robotics.

Julian Schmidli, investigative data reporter, has ten years of experience in data journalism.

Pascal Albisser, visual journalist, builds and designs interactive news apps.

Stefanie Hasler, video reporter, films, produces and hosts the web format SRF forward.

Flurin Maissen is a TV reporter for SRF Kassensturz.

Project description:

This investigation series uncovered several networks of fake profiles deployed to leave paid fake reviews on Google Places. The investigation found thousands of profiles linked to SEO firms – and hundreds of Swiss businesses that bought fake five-star reviews to unlawfully manipulate their reputation and page rank. SRF applied different scraping and network-analysis techniques to uncover those networks, and used OSINT and classic reporting to investigate the businesses and find victims of this fraudulent behaviour. The results show that authorities are not aware of the scale of the problem and that Google's algorithms are not doing enough to stop it.

Impact reached:

The investigation led to regulatory action by Swiss authorities concerning doctors and clinics. Several businesses reacted and deleted their fake reviews. One SEO company shut down.

Techniques/technologies used:

To extract the data, we distinguished between scraping the reviews of a place and scraping the reviews of a user. To scrape a user's reviews, we built a custom scraper in Python using Selenium, ChromeDriver and a tool called BrowserMob Proxy, which allowed us to cache the responses and extract the data efficiently. To scrape all the reviews of a place, we used Apify.com because of its clean proxy army integration, which allowed us to overcome Google's rate limiting and blocking wall. We were able to build a comprehensive user-location graph of over one million reviews.

Initially, we scraped all the reviews of a manually compiled list of places, and then all reviews of the users behind these reviews. This process was repeated several times to expand the user-place graph to multiple levels of depth.

To find suspicious users, we used the method published in this paper: https://t.co/hm7XBVTX5l. The analysis involved folding the bipartite user-place graph to obtain user-user and place-place similarity graphs. The folding was done via adjacency matrix multiplication, which was the most computationally intensive part. We then continued our analysis in Gephi, leveraging the rich collection of graph algorithms available in this great tool. To extract suspicious users, we ran community detection algorithms on the user-user similarity graph. The resulting communities were then analysed manually to distinguish fake cliques from legitimate cliques. Checking the remaining users in these communities by hand, we were able to pick up further signals of fake users, such as stock images and spelling mistakes.

The scrollytelling was built with Mapbox, React.js and JavaScript.
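The folding step described above can be sketched as follows. This is a minimal illustration of bipartite graph folding via matrix multiplication, not SRF's actual code: the toy users, places and review edges are invented, and a real graph of a million reviews would use sparse matrices (e.g. scipy.sparse) rather than dense NumPy arrays.

```python
import numpy as np

# Hypothetical toy data: which user reviewed which place.
# Rows = users u0..u3, columns = places p0..p2.
B = np.array([
    [1, 1, 0],  # u0 reviewed p0, p1
    [1, 1, 0],  # u1 reviewed p0, p1
    [0, 0, 1],  # u2 reviewed p2
    [0, 1, 1],  # u3 reviewed p1, p2
])

# Folding the bipartite user-place graph:
# user_user[i, j] = number of places users i and j both reviewed.
user_user = B @ B.T
np.fill_diagonal(user_user, 0)  # a user's overlap with itself is not informative

# The place-place similarity graph is the analogous product.
place_place = B.T @ B
np.fill_diagonal(place_place, 0)
```

The resulting user-user matrix is the weighted adjacency matrix of the similarity graph: unusually dense cliques of users who all reviewed the same set of places are exactly what community detection (in Gephi or elsewhere) then surfaces as candidates for fake networks.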

What was the hardest part of this project?

The hardest part of this project was developing a functioning snowballing approach to gather the right data (more signal, less noise) and to filter it so that we could differentiate networks of fake profiles from networks of real people.
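The snowballing idea – expand from seed places to their reviewers, then to the other places those reviewers rated, and repeat – can be sketched as below. The scrapers are stubbed out with hard-coded dictionaries here; all names and the depth parameter are illustrative assumptions, and the real versions would call the Selenium/Apify scrapers described above.

```python
# Hypothetical stand-ins for the scrapers: in the real pipeline these
# lookups would be live scrapes of Google Maps reviews.
PLACE_REVIEWERS = {
    "placeA": ["u1", "u2"],
    "placeB": ["u2", "u3"],
    "placeC": ["u4"],
}
USER_PLACES = {
    "u1": ["placeA"],
    "u2": ["placeA", "placeB"],
    "u3": ["placeB", "placeC"],
    "u4": ["placeC"],
}

def snowball(seed_places, depth=2):
    """Expand a bipartite user-place review graph level by level."""
    places = set(seed_places)
    users, edges = set(), set()
    frontier = set(seed_places)
    for _ in range(depth):
        # Step 1: from the newest places, collect their reviewers.
        new_users = set()
        for p in frontier:
            for u in PLACE_REVIEWERS.get(p, []):
                edges.add((u, p))
                if u not in users:
                    new_users.add(u)
        users |= new_users
        # Step 2: from the newly found users, collect all places they reviewed.
        new_places = set()
        for u in new_users:
            for p in USER_PLACES.get(u, []):
                edges.add((u, p))
                if p not in places:
                    new_places.add(p)
        places |= new_places
        frontier = new_places  # next level starts from the new places only
        if not frontier:
            break
    return places, users, edges
```

Each additional level of depth widens the graph but also pulls in more noise, which is why the filtering step afterwards (folding plus community detection) mattered as much as the expansion itself.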

What can others learn from this project?

That it is possible to combine big-tech accountability and algorithmic accountability with data journalism techniques to cover different relevant perspectives on a subject – and to tell it through interactive web apps, video and audio.

Project links: