Using the data from the national exams and the percentage of poor students by municipality, I calculated the Euclidean distance from each municipality to a hypothetical place where all students were poor and yet scored 20 out of 20 on their national exams. That led us to four municipalities that contradicted the trend of a poorer environment meaning worse marks at school. We then talked with schools and local authorities to try to understand what was being done differently in those places.
Every time the national exam results are published, it is the schools from richer areas that get highlighted by the media. Using this statistical approach – a bit more complex than simply sorting columns in Excel – I was able to give a voice to schools that are rarely recognised for their good work in very precarious conditions.
That also led to good reviews of my piece, especially because every time this data is made public, the main criticism levelled at the media is that they always ignore socio-economic factors.
The national exams database is a huge .mdb file that the Ministry of Education provides to journalists under embargo. I used R to read that database and to build an Excel file, highlighting possible stories, for reporters who are less data-savvy and were working on other pieces about this issue.
Since the database gives us the results for every student in the country, I started by narrowing the results down to the eight exams taken by the most students, then grouped them by municipality and calculated the mean mark for each municipality. Then, using the data on the percentage of students who receive some kind of government aid because of their parents’ income, I created a scatter plot and calculated, for every point, the distance to the point where 100% of the students were poor and had an average of 20 out of 20 on the national exams. That led us to the four municipalities that were the outliers. (I did the same for the previous year’s data to check whether those municipalities were outliers this year only by chance.)
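The steps above can be sketched roughly as follows. This is an illustration in Python, not the R code used for the project. The record layout, the function names and the sample figures are all invented for the example. The sketch also assumes both axes are rescaled to a 0 to 1 range before the distance is measured, so that the percentage does not outweigh the 0 to 20 mark; the original analysis may have handled scaling differently.

```python
from collections import defaultdict
from math import hypot
from statistics import mean


def municipality_means(results, top_n_exams=8):
    """Mean mark per municipality, using only the most-taken exams.

    `results` is an iterable of (municipality, exam, mark) tuples,
    one per student exam. This layout is a hypothetical one.
    """
    results = list(results)
    # Count how many entries each exam has and keep the largest ones.
    counts = defaultdict(int)
    for _, exam, _ in results:
        counts[exam] += 1
    kept = set(sorted(counts, key=counts.get, reverse=True)[:top_n_exams])

    # Collect the marks of the kept exams by municipality.
    marks = defaultdict(list)
    for municipality, exam, mark in results:
        if exam in kept:
            marks[municipality].append(mark)
    return {m: mean(v) for m, v in marks.items()}


def rank_by_distance(mean_marks, pct_aided, max_mark=20.0):
    """Rank municipalities by distance to the point (100% aided, top mark).

    Assumption: both axes are rescaled to 0..1 before measuring.
    """
    ranked = []
    for municipality, avg in mean_marks.items():
        if municipality not in pct_aided:
            continue  # no aid data for this municipality
        x = pct_aided[municipality] / 100.0
        y = avg / max_mark
        ranked.append((hypot(1.0 - x, 1.0 - y), municipality))
    ranked.sort()  # smallest distance first
    return [(m, round(d, 3)) for d, m in ranked]


# Invented sample data, for illustration only.
sample_results = [
    ("A", "Math", 15.0), ("A", "Portuguese", 13.0),
    ("B", "Math", 10.0), ("B", "Portuguese", 12.0),
    ("C", "Math", 16.0), ("C", "Portuguese", 14.0),
    ("C", "Latin", 20.0),
]
sample_pct_aided = {"A": 60.0, "B": 70.0, "C": 10.0}

sample_means = municipality_means(sample_results, top_n_exams=2)
sample_ranking = rank_by_distance(sample_means, sample_pct_aided)
```

The municipalities at the top of the ranking are the ones closest to the "all poor, all top marks" corner of the scatter plot. Running the same ranking on the previous year's data and comparing the top entries would mirror the check for chance outliers.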
Then, while I talked with the school directors and the mayors of those municipalities, I used scrollama.js to build a scrollytelling piece that explains visually what led us to those schools. As it presents the data for each place, the piece gives a socio-demographic profile of it and describes what is being done differently there.
What was the hardest part of this project?
Even though education is a priority for our newspaper, the hardest part of this project was finding time to do it. With Covid-19, almost all the newsroom’s resources seemed to be channelled into covering the pandemic. As the only data journalist in the newsroom, the same happened to me. So finding time to write about something that was not directly associated with the pandemic was hard.
What can others learn from this project?
I would say that the biggest lesson here is that sometimes you just need to look at a database a bit differently to find a new story. This database is published every year, so naturally, everyone ends up getting similar stories. Adding a bit of statistical sophistication can lead you to new approaches.
Another lesson from this project is that I broke the usual rule of leaving the explanation of how we got to the story to the middle or end of the piece. Because explaining how we got to those four municipalities was essential to understanding why they were important, the infographic that changes as the reader scrolls started precisely there – to show why those places were worth a look.