At the start of the school year, amid the COVID-19 pandemic, the Portuguese government ruled that every coffee shop, bakery, and restaurant within 300 meters of a school would be limited to four people per table. But how many businesses were affected by this rule? No one seemed to know, not even the government.
By scraping a geolocated database of Portuguese schools and using the Google Places API, we managed to find out that at least 21,000 businesses were affected.
This story was very successful because it answered the questions that the government was unable to answer when it announced the law: how many businesses were affected, and how a business owner (and their customers) could know whether that business fell under the rule.
The first challenge was to find a geolocated database of all schools and universities in the country. I had a list of all schools, had added the university campuses to it, and was ready to geolocate everything. But then I found that the Portuguese government already had that database, although you had to go through a complex system of dropdowns to get the location of any single school. Fortunately, I was able to find the JSON file feeding that website, which held the coordinates of all the schools. Using R, I converted that JSON file to a data frame.
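The conversion from a JSON feed to a flat table can be sketched roughly as follows. This is an illustration in Python, not the R code used for the story, and the payload shape and field names (`name`, `location`, `lat`, `lng`) are assumptions, since the structure of the government file is not described here.

```python
import json

# Hypothetical payload: the real file's structure and field names are assumptions.
raw = """
[
  {"name": "Escola A", "location": {"lat": "38.7369", "lng": "-9.1427"}},
  {"name": "Escola B", "location": {"lat": "41.1496", "lng": "-8.6109"}},
  {"name": "Escola C", "location": null}
]
"""


def schools_to_rows(payload):
    """Flatten nested school records into one row per school with numeric coordinates."""
    rows = []
    for record in json.loads(payload):
        location = record.get("location")
        if not location:
            # Schools without coordinates would have to be geolocated separately.
            continue
        rows.append(
            {
                "name": record["name"],
                "lat": float(location["lat"]),
                "lng": float(location["lng"]),
            }
        )
    return rows


rows = schools_to_rows(raw)
for row in rows:
    print(row)
```

Each row then carries a name and a pair of numeric coordinates, which is all the later steps need.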
Because this website did not include higher-education institutions, I had to use another database for those and then geolocate all of them by hand.
With all the data loaded, I used the Google Places API to request all coffee shops and restaurants within 300 meters of any of those coordinates. Because the API returned some results that were more than 300 meters from the point, I then calculated the Euclidean distance from each result to the point and filtered out the cases that were too far away.
Then, while a reporter contacted some of the businesses affected by the rules, I built an interactive map using Mapbox that allowed readers to explore the businesses affected by the law.
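The distance filter can be sketched as below. Note one deliberate substitution: the story used a Euclidean distance, while this sketch uses the haversine (great-circle) formula, which returns meters directly from latitude and longitude. The function names, the dictionary keys, and the sample coordinates are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(school, places, radius_m=300):
    """Keep only the places that lie within radius_m meters of the school."""
    return [
        place
        for place in places
        if haversine_m(school["lat"], school["lng"], place["lat"], place["lng"]) <= radius_m
    ]


# Hypothetical sample data: one school and two places due north of it.
school = {"lat": 38.7369, "lng": -9.1427}
places = [
    {"name": "Cafe near", "lat": 38.7379, "lng": -9.1427},  # about 111 m away
    {"name": "Cafe far", "lat": 38.7469, "lng": -9.1427},   # about 1.1 km away
]

nearby = within_radius(school, places)
print([place["name"] for place in nearby])
```

At a 300-meter scale, a Euclidean distance on suitably projected coordinates gives nearly the same answer; the haversine form is used here only because it avoids a projection step.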
What was the hardest part of this project?
Time and the bill. This was a breaking-news kind of story and we had to respond to it quickly. Finding the coordinates of all the schools took me half a day, but I had not accounted for the time that requesting the business data would take.
I also had a little problem with Google Cloud. I was using the free credit they give you when you create an account, and I was expecting the API to return only the results within 300 m of each point. After doing some quick math based on the results from the first 200 schools, I thought I was unlikely to exceed that credit. But I forgot that the big cities, where there are more schools and more businesses, had not yet been collected. Some businesses were also within range of two schools, which meant that I would have to delete those duplicates later, but they counted toward the API usage anyway. It was 4 am when I went to sleep, expecting the data to be fully collected by morning. It was indeed, but with a 300-euro bill.
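Removing businesses that were returned for more than one school can be sketched like this. It assumes each result carries a stable identifier under a `place_id` key and that results are grouped per school in a dictionary; both the data layout and the function name are hypothetical.

```python
def dedupe_places(results_by_school):
    """Merge per-school result lists, keeping the first copy of each place ID."""
    unique = {}
    for places in results_by_school.values():
        for place in places:
            # setdefault keeps the first occurrence and ignores later duplicates.
            unique.setdefault(place["place_id"], place)
    return list(unique.values())


# Hypothetical results: "Cafe B" falls within range of both schools.
results_by_school = {
    "school_1": [
        {"place_id": "a", "name": "Cafe A"},
        {"place_id": "b", "name": "Cafe B"},
    ],
    "school_2": [
        {"place_id": "b", "name": "Cafe B"},
        {"place_id": "c", "name": "Cafe C"},
    ],
}

unique_places = dedupe_places(results_by_school)
raw_count = sum(len(places) for places in results_by_school.values())
print(raw_count, "results returned,", len(unique_places), "unique businesses")
```

The gap between the raw count and the unique count is the overlap that was paid for but does not add to the final total.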
What can others learn from this project?
This kind of project shows pretty well how data journalism can work on breaking news and how a data journalist’s brain works. Most people thought it would be impossible to measure how many businesses were affected, but I thought of using the schools’ coordinates and the Google Places API to answer the question. Of course, it is an estimate: I was not able to request bars and bakeries because of my bill problem. We also need to take into account that a percentage of businesses are not on Google Maps.