To get into a public university in Portugal, you need better marks than the other students applying for the same degree. If a degree at a given university opens 80 places, you need to be among the 80 applicants with the best school results to get in. But 2020 was not a normal year: national exam marks went up, which pushed the marks of the last students admitted to some degrees quite high. Using the results from the last four years and the standard deviation, I calculated which degrees were most affected.
This story helped measure something everyone was commenting on without any factual data. People were using last year's results as a reference, but that could be misleading, since we didn't know how much the last admission mark usually varies from year to year. The standard deviation made it possible to measure that.
I used R to load the Excel files and clean the data, so I could find the college courses that could be compared across the four years. Then I calculated the standard deviation and the difference between this year's results and the mean plus the standard deviation.
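The comparison described above can be sketched in a few lines. This is a minimal Python illustration, not the author's R code. It assumes that, for each course, the mean and standard deviation are taken over the earlier years' last-admitted marks and that the 2020 mark is then compared with mean + one standard deviation. The function names and all the marks are hypothetical, not real data.

```python
from statistics import mean, stdev


def excess_over_band(history, current):
    """Return how far `current` sits above mean(history) + stdev(history).

    Positive values mean the cutoff rose more than one (sample) standard
    deviation above its usual level; negative values mean it stayed within it.
    """
    # statistics.stdev uses the n - 1 denominator, like R's sd().
    threshold = mean(history) + stdev(history)
    return current - threshold


def most_affected(past, current):
    """Sort courses by how far the current cutoff exceeds mean + 1 SD."""
    # Keep only courses present in both datasets (and with at least two past
    # values, which stdev needs), i.e. courses comparable across the years.
    common = [c for c in past if c in current and len(past[c]) >= 2]
    scores = {c: excess_over_band(past[c], current[c]) for c in common}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical last-admitted marks per course (NOT real data).
past_cutoffs = {
    "Course A": [150.0, 152.0, 154.0],
    "Course B": [170.0, 171.0, 172.0],
    "Course C": [140.0, 145.0],
}
cutoffs_2020 = {"Course A": 160.0, "Course B": 172.5, "Course D": 130.0}

for course, excess in most_affected(past_cutoffs, cutoffs_2020):
    print(f"{course}: {excess:+.2f}")
```

With these made-up numbers, Course A prints `+6.00` (mean 152, standard deviation 2, so the band ends at 154) and Course B prints `+0.50`. Courses C and D are dropped because they don't appear in both datasets. Measuring the excess over mean + one standard deviation, rather than against last year's mark alone, is what separates an unusual jump from a course's normal year-to-year variation.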
What was the hardest part of this project?
Explaining to the reader what I did. Most people who don't know much about statistics don't know what the standard deviation means, so it was a challenge to explain clearly how I got to the results presented in the article.
What can others learn from this project?
Something probably all data journalists already know: public debate becomes a little more informed when you add data to the equation.