Who’s really behind Canada’s most active Airbnb host accounts?
Category: Best data-driven reporting (small and large newsrooms)
Organisation size: Big
Publication date: 30/04/2019
Credit: Naël Shiab & Valérie Ouellet (Data Reporters), Zach Dubinsky (Investigative Reporter), Philippe Tardif & Francis Lamontagne (Web Designers), Vincent Maisonneuve, Romain Schué, Lina Forero (Additional Reporting).
With questions swirling worldwide about the impact of short-term rental apps on the real estate market, CBC/Radio-Canada wanted to know: who’s really behind Canada’s most active Airbnb accounts, and are they legit? We scraped over 32,000 listings to find out.
Our investigation revealed many of Canada’s most prolific “mega-hosts” are fronts for multimillion-dollar companies and Airbnb is thriving in zones prohibited to short-term rentals. Our stories also questioned the company’s own failsafe mechanisms by showing how easily one apparently fraudulent host was able to build a complex web of accounts to hide a trail of bad reviews.
After CBC’s initial two stories aired, Quebec’s Tourism Minister Caroline Proulx acknowledged that the law in Canada’s most Airbnb-dense province lacked clarity and would be changed. Our story had also found that Revenue Québec, which had been responsible for monitoring Airbnb listings since June 2018, had not issued a single fine, only warnings. The Revenue Ministry also assured CBC/Radio-Canada that fines would be issued from this point on.
After CBC aired its third story, about the prolific but likely swindling host account “AJ” and his eight other aliases, Airbnb shut down all nine accounts permanently, citing violations of its terms around truthfulness and potential fraudulent activity. Dozens of guests who had already booked were to be offered complete or partial reimbursements. It is believed to be the most significant action Airbnb has ever taken against a Canadian host.
We began by coding a complex Node script to scrape more than 32,000 entire apartment, condo and house listings posted in 16 Canadian cities on one day in April 2019. Then we used a Python script to clean, structure and analyze the tens of thousands of ads we’d scraped. That allowed us to group properties with the host who managed them, leading us to Canada’s most prolific “mega-hosts”.
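The grouping step can be illustrated with a minimal Python sketch. It is not the newsroom's actual script: the field names `listing_id`, `host_id` and `city`, the function `rank_hosts` and the sample rows are all hypothetical, chosen only to show how scraped ads can be deduplicated and tallied per host.

```python
from collections import defaultdict

def rank_hosts(listings):
    """Group listing records by host and rank hosts by number of unique listings.

    `listings` is an iterable of dicts with hypothetical keys
    "listing_id" and "host_id" (assumed names, not Airbnb's schema).
    """
    by_host = defaultdict(set)
    for row in listings:
        # A set drops duplicate ads that a scraper may collect twice.
        by_host[row["host_id"]].add(row["listing_id"])
    # Sort by listing count (descending), then host id for a stable order.
    ranked = sorted(by_host.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(host, len(ids)) for host, ids in ranked]

# Made-up sample rows, for illustration only.
sample = [
    {"listing_id": 1, "host_id": "h1", "city": "Montreal"},
    {"listing_id": 2, "host_id": "h1", "city": "Toronto"},
    {"listing_id": 3, "host_id": "h2", "city": "Montreal"},
    {"listing_id": 2, "host_id": "h1", "city": "Toronto"},  # duplicate ad
]
top_hosts = rank_hosts(sample)
```

The hosts at the top of such a ranking are only leads; identifying who is behind each account still requires reporting.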
We produced a tip sheet of individuals who were each operating up to 270 listings in various Canadian cities — and then used traditional investigative reporting techniques to find out who they really were. Using tools like reverse image search engines, we found that while many had personable-sounding profiles, they were using fake headshots and were in fact owners or employees of multimillion-dollar companies.
Our second story used census data and GIS technology to calculate the concentration of Airbnb listings in each Canadian city and neighbourhood. We asked each city for an official shapefile and a breakdown of its total number of private dwellings by neighbourhood – and used QGIS software to calculate what percentage of these properties were listed on Airbnb.
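The percentage itself is simple arithmetic once listings have been counted per neighbourhood. The sketch below assumes two hypothetical dictionaries keyed by neighbourhood name; the function name `listing_share` and all figures are invented for illustration, and the team's own calculation was done in QGIS.

```python
def listing_share(listing_counts, dwelling_counts):
    """Return the percentage of private dwellings listed, per neighbourhood.

    Both arguments are dicts keyed by neighbourhood name (assumed structure).
    Neighbourhoods with no recorded dwellings are skipped to avoid
    dividing by zero.
    """
    shares = {}
    for area, dwellings in dwelling_counts.items():
        if dwellings <= 0:
            continue
        listed = listing_counts.get(area, 0)
        shares[area] = round(100 * listed / dwellings, 2)
    return shares

# Made-up numbers, for illustration only.
listings_per_area = {"A": 50, "B": 5}
dwellings_per_area = {"A": 1000, "B": 2000, "C": 0}
shares = listing_share(listings_per_area, dwellings_per_area)
```

Using census dwelling counts as the denominator is what makes neighbourhoods of different sizes comparable.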
For our third story, we zeroed in on one of Canada’s most prolific hosts: “AJ” in Montreal, and his seemingly fraudulent behaviour. We scraped hundreds of reviews connected to his 90-plus property listings and were able to track down former guests who told us they’d had horrible experiences. After analyzing patterns in reviews and listings, we found “AJ” wasn’t operating alone. We discovered he ran eight other accounts on Airbnb under other names, all using fake profile pictures, that rented out the same properties and even gave each other glowing reviews.
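One way to surface this kind of pattern is to link accounts that advertise the same property or review one another, then look at the resulting clusters. The sketch below is an assumption about how that could be done, not a description of our actual analysis: the tuple layouts, the idea of a "property key" (for example a normalized address) and the account names are all hypothetical.

```python
from collections import defaultdict

def cluster_accounts(listings, reviews):
    """Cluster host accounts linked by shared properties or mutual reviews.

    listings: iterable of (host_id, property_key) pairs (assumed layout).
    reviews:  iterable of (reviewer_id, host_id) pairs (assumed layout).
    Returns clusters of two or more accounts, as sorted lists.
    """
    listings = list(listings)
    parent = {}

    def find(a):
        # Union-find lookup with path halving.
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    hosts = {host for host, _ in listings}
    by_property = defaultdict(set)
    for host, prop in listings:
        by_property[prop].add(host)

    # Link accounts that advertise the same property.
    for accounts in by_property.values():
        first, *rest = sorted(accounts)
        for other in rest:
            union(first, other)

    # Link host accounts that reviewed another host account.
    for reviewer, host in reviews:
        if reviewer in hosts and host in hosts and reviewer != host:
            union(reviewer, host)

    clusters = defaultdict(set)
    for host in hosts:
        clusters[find(host)].add(host)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)

# Made-up accounts and properties, for illustration only.
sample_listings = [("host_a", "p1"), ("host_b", "p1"),
                   ("host_c", "p2"), ("host_d", "p3")]
sample_reviews = [("host_c", "host_a"), ("guest_1", "host_d")]
linked = cluster_accounts(sample_listings, sample_reviews)
```

A cluster is a lead rather than proof: hosts can review each other for legitimate reasons, so each link still has to be confirmed through interviews and documents.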
To share our findings, we created a map with D3.js and animated it using the scrollytelling technique.
What was the hardest part of this project?
Telling this story was a challenge on both the data science and the investigative fronts. There was no way to directly download the Airbnb data. It took a significant amount of time for our team to understand how Airbnb’s data was structured and to develop robust code that would connect to the website’s API and collect listings advertised in 16 Canadian cities – meaning 16 slightly different scripts, with lots of requests and time. For our spatial analysis, the main challenge was to match the Airbnb listings we collected with shapefiles and customized census data from a half dozen cities.
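Matching a listing to a neighbourhood boundary comes down to a point-in-polygon test, which QGIS handled for us. For readers who want to see the underlying idea, here is a minimal ray-casting sketch in plain Python. It assumes simple polygons given as lists of (x, y) vertices in the same coordinate system as the listing; the function names and the square used as a sample boundary are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    `polygon` is a list of (x, y) vertices in order (assumed layout).
    Points exactly on an edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_area(point, areas):
    """Return the name of the first area containing the point, else None."""
    for name, polygon in areas.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None

# A made-up square "neighbourhood", for illustration only.
areas = {"square": [(0, 0), (2, 0), (2, 2), (0, 2)]}
inside_area = assign_area((1, 1), areas)
outside_area = assign_area((3, 1), areas)
```

Real boundary files add complications this sketch ignores, such as polygons with holes and differing map projections, which is part of why a GIS tool is the practical choice.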
Uncovering the true human behind each “mega-host” account was equally challenging. We quickly discovered that, despite Airbnb’s ID verification, hosts could still use an alias and a stock photo. We were able to track down some hosts by comparing landmarks visible in their listing photos against Google Maps, then scouring property records; one of the hosts was unmasked after we noticed a well-known church steeple in one of his photos. We pieced together “AJ”’s web of aliases by assiduously tracking down two dozen of his guests based solely on their profiles, a major challenge on Airbnb since profiles often only have a first name and a hometown. To find these people, we took to social media to look for comparable headshots, used databases of professional organizations like teachers or doctors, and often just guessed, luckily, what their email addresses might be. Contacting them proved invaluable because they sent us their communications with one or another of the AJ aliases and all the information from the listings they had booked, all of which enabled us to say definitively that the properties and the operator behind them were one and the same.
What can others learn from this project?
From Barcelona to San Francisco, cities around the world have been struggling to better understand the impact of short-term rentals on their neighborhoods and economy. In many places, they’ve had to vote on regulations and navigate lawsuits while dealing with short-term rental companies like Airbnb that consistently refuse to release their raw data and prefer sharing carefully chosen aggregated figures rather than being completely transparent and held accountable for their actions and their hosts’ actions. Many cities are well-intentioned, but lack the technology and resources to even know how many rentals are, in fact, illegal or leased for more than the maximum number of days prescribed.
We designed this project and methodology so it could be duplicated by any newsroom in any city or country, big or small, and hope news outlets around the world will take advantage of that. The spatial analysis relied on census data and QGIS, which is free software.
Another great lesson is how powerful traditional investigative techniques are when driven by data findings. Without the valuable background information our investigative partner Zach Dubinsky dug up on these mega-hosts, our stories would have only been another Airbnb explainer with general numbers and a few interactive maps. He spent hours looking for patterns in reviews we’d scraped, speaking with countless disgruntled tourists and digging through property records to find the real-life humans and victims behind the data, which enabled us to bring our stories to life in broadcast media. The stories generated an outpouring of tips from Canadians who felt they’d also been scammed by Airbnb hosts, including AJ.