After a financial crisis at the beginning of the last decade, the Portuguese economy started to thrive. The nice weather and low-cost flights made tourism bloom, but they also brought challenges as the country's oldest and largest cities gentrified. With rents and meal prices rising, Portuguese people started to ask: can I really live in this city?
Drawing on multiple data sources and letting readers provide their own personal context, we created an insightful news application that let people find out whether their budget met the minimum threshold needed to live in a given city.
When I moved out of my parents’ house, in 2012, I rented an apartment for 450€. This year, I saw the same place being rented out for 850€. In seven years, the rent for this flat almost doubled, even though the average national income in Portugal increased by only 135€. On average, rent prices in Lisbon have increased by almost 40% in eight years. I realized that my experience wasn’t unique: almost everyone was complaining about rising prices in major cities, especially housing prices.
This tool allowed readers to put themselves in context. Using input data from the user, we combined algorithmically generated text with storytelling techniques so that people could compare their own situation with the average prices in a city, based on data about housing, salaries, water, gas, electricity, groceries, and schools.
Because much of this data had never been aggregated in this way, the work sparked a national debate about the cost of living in the country.
One of the main goals of this project was to create a 100% personalized reading experience. Every interactive feature I had read that used input data felt like a “fill in the blanks” exercise. For this project, I wanted readers to be tricked into believing that the article had been written with their personal situation in mind, and to come away with actionable context about their own lives.
We ended up using Vue.js reactivity to run all the calculations and to mark up the text so that the right blocks of text were displayed. This was all backed by a very complex system of logic, so that no text in the article seemed like it was written by an algorithm.
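As a rough illustration of that idea (not the project's actual code), the sketch below shows how reader input can drive both the numbers and the choice of text block. It is plain JavaScript; in a Vue.js component the same function could back a computed property so that the text updates as the reader types. The function name, the field names, the 10% threshold, and the figures are all hypothetical.

```javascript
// Hypothetical sketch: choose which block of text to show from reader input.
// All names, thresholds, and figures are illustrative assumptions.
function rentBlock(reader, city) {
  // Share of the reader's monthly income that goes to rent, in percent.
  const share = Math.round((reader.rent / reader.income) * 100);
  // Difference between the reader's rent and the city average, in euros.
  const diff = reader.rent - city.averageRent;

  let comparison;
  if (Math.abs(diff) <= city.averageRent * 0.1) {
    // Within 10% of the average: treat it as roughly equal.
    comparison = "about the same as";
  } else if (diff > 0) {
    comparison = `${diff}€ more than`;
  } else {
    comparison = `${-diff}€ less than`;
  }

  return `You spend ${share}% of your income on rent, ${comparison} the average tenant in ${city.name}.`;
}

// Made-up figures, for illustration only.
console.log(rentBlock({ rent: 850, income: 1700 }, { name: "Lisbon", averageRent: 700 }));
```

Even in this toy version, one sentence already has three variants. Multiply that by every sentence in an article and the need for a carefully organized logic system becomes clear.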
On the data side of the story, I used R for all the web scraping from the multiple websites that stored the necessary data. R was also very useful for creating the API the project needed (using the plumber package) and for generating the multiple JSON files that Vue.js used.
What was the hardest part of this project?
This was the most challenging and complex data-driven project I have been involved in during my career. For example, I had never considered how complex the Portuguese language is. Because it is so rooted in my speaking habits, I had never fully grasped that Portuguese has grammatical gender. So, as I wanted the text to sound 100% natural, I had to develop an API to guess the gender from the reader’s name when possible.
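To make the gender problem concrete, here is a minimal sketch of the idea, not the API the project used. It assumes a tiny hand-made lookup of first names (a real service would need a far larger list) and falls back to wording that avoids gendered forms when the name is unknown. The function names and the example sentences are hypothetical.

```javascript
// Hypothetical sketch: pick gender-agreeing Portuguese wording from a first name.
// The lookup below is a toy; a real list would be far larger.
const KNOWN_NAMES = new Map([
  ["maria", "f"],
  ["ana", "f"],
  ["joão", "m"],
  ["pedro", "m"],
]);

// Returns "f", "m", or null when the first name is not in the lookup.
function guessGender(fullName) {
  const first = fullName.trim().split(/\s+/)[0].normalize("NFC").toLowerCase();
  return KNOWN_NAMES.get(first) ?? null;
}

// "Are you worried about the rent?" needs a different adjective ending per gender.
function worriedSentence(fullName) {
  const gender = guessGender(fullName);
  if (gender === "f") return "Está preocupada com a renda?";
  if (gender === "m") return "Está preocupado com a renda?";
  // Unknown name: rephrase ("Is the rent a concern?") to avoid a gendered adjective.
  return "A renda é uma preocupação?";
}
```

The fallback branch matters as much as the lookup: when the name gives no answer, the sentence has to be rewritten rather than guessed.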
Another challenge was making my logical side work together with my creative/journalistic side. The nature of the project required me to think about all possible outcomes when comparing data and personalizing text, while sounding as natural as possible. I had started to write a draft on Google Docs, using a system of numbered blocks of text and colored “if” and “else” tags, but that became very confusing and the outcome wasn’t very natural. So, I decided to write the text while marking it up with the logic behind the multiple options. This made it possible to check immediately whether I was missing something, but it felt very weird to me because I had to use my “writer brain” and my “coder brain” at the same time.
What can others learn from this project?
I believe that making the effort to go beyond a “fill in the blanks” approach in automatically generated text, and trying to generate text that doesn’t sound written by a machine, is a challenge journocoders should focus more on, especially when English is not the language being used.
In my case, writing in an “if-else” logic system was always triggering the “lazy” side of my brain, because I always wanted to write paragraphs like “The house price increased by X in Y” or “The house price decreased by X in Y”. And even though there is nothing wrong with that kind of sentence, we know actual news doesn’t contain only that type of sentence. News stories provide additional context, explaining why the house price increased in the specific region. This means that, in your code, you probably have to write an almost custom paragraph for that specific case. This effort is what makes the automatically generated text sound like it wasn’t written by a robot, even if sometimes it is a phrase that only makes sense if four conditions are true.
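One way to picture this, as a sketch under assumptions rather than the project's code: keep the generic template as a fallback, and put hand-written paragraphs in front of it that only fire when several conditions hold at once. The field names, thresholds, and wording below are invented for illustration.

```javascript
// Hypothetical sketch: hand-written paragraphs take priority over a generic template.
// Field names, thresholds, and wording are illustrative assumptions.
const customParagraphs = [
  {
    // Fires only when all four conditions are true.
    when: (d) => d.changePct >= 30 && d.touristArea && d.isCapital && d.readerIsTenant,
    text: (d) =>
      `Prices in ${d.region} rose by ${d.changePct}%, a jump that followed the tourism boom in the capital, and as a tenant you are likely to feel it.`,
  },
];

function housePriceParagraph(d) {
  // The first matching custom paragraph wins.
  const match = customParagraphs.find((p) => p.when(d));
  if (match) return match.text(d);
  // Otherwise fall back to the "lazy" template.
  const verb = d.changePct >= 0 ? "increased" : "decreased";
  return `The house price ${verb} by ${Math.abs(d.changePct)}% in ${d.region}.`;
}
```

Each entry added to the list is one more paragraph a journalist actually wrote for a specific situation, which is where the natural tone comes from; the template only covers what is left.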