Football is becoming ever more reliant on data to drive transfer decisions, line-ups and tactics. That is why De Tijd chose to create its own model showing the strength of each player and team competing in the 2020 UEFA European Football Championship, presented as the “Star index” (“sterindex” in Dutch). Using this model and feeding it new data as matches were played, we were able to rank players and calculate the ‘Man of the Match’ and the ‘Most Valuable Player’ of the tournament. These findings were not only presented in our dashboard but also translated into individual articles.
As a newspaper focused on business and financial news, De Tijd will always look at sports reporting in a different way. With this project, we managed to bring the emerging world of data science, as it is applied professionally by football teams, to our readers in a very accessible way.
Although the newspaper doesn’t have a dedicated sports section, we were able to stand alongside broader newspapers with specialised sports teams. We did so by focusing on a data-driven approach while steering clear of tendentious reporting that fluctuates with the rhythm of single wins or losses. The central dashboard was well read and resulted in more than 20 derivative articles in the newspaper.
The “Star index” is based on more than 20 criteria – all possible player actions and skills – from more than 20,000 national and club matches since the 2018 World Cup.
The data used for the model ranges from goals scored, shots on target and assists to tackles and ball possession, and also includes power, stamina, reaction speed and vision. This information was provided by the specialised data provider iSports API.
Statistician Maarten De Schryver (Ghent University) turned this data into a model that provided us with individual ratings for each player in the national selections. This Star Index was then corrected for competitions – the five major competitions were given more weight – and minutes played.
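To illustrate what such a correction could look like, here is a minimal sketch, not the actual model. It assumes a per-match rating already exists and combines those ratings into one player score, weighting each match by minutes played and by a competition factor. The names `Appearance` and `weighted_rating`, the rating scale and the weight values are all hypothetical; the project only states that the five major competitions counted more.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical weights: the write-up only says major competitions "were given more weight".
MAJOR_COMPETITION_WEIGHT = 1.5
DEFAULT_COMPETITION_WEIGHT = 1.0


@dataclass
class Appearance:
    rating: float            # per-match rating from the statistical model (assumed scale)
    minutes: int             # minutes played in that match
    major_competition: bool  # True if the match was in one of the five major competitions


def weighted_rating(appearances: Iterable[Appearance]) -> Optional[float]:
    """Average the match ratings, weighted by minutes played and competition."""
    weighted_sum = 0.0
    total_weight = 0.0
    for appearance in appearances:
        competition_weight = (
            MAJOR_COMPETITION_WEIGHT
            if appearance.major_competition
            else DEFAULT_COMPETITION_WEIGHT
        )
        weight = appearance.minutes * competition_weight
        weighted_sum += appearance.rating * weight
        total_weight += weight
    if total_weight == 0:
        return None  # no minutes played, so no rating can be computed
    return weighted_sum / total_weight
```

With these assumed weights, a full match in a major competition pulls the final score towards its rating more strongly than a full match elsewhere, and a short substitute appearance barely moves it.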
All data used was either stored in a compact database or calculated on the fly to provide individual, dynamically generated URLs for each player and match. The frontend application was built using the Vue framework with router and data store plugins.
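As a rough illustration of how per-player and per-match URLs can be derived from data rather than written by hand, the sketch below builds URL paths from an identifier and a name. The path patterns `/player/...` and `/match/...` and the helper names are assumptions for this example, not the routes the dashboard actually used.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a display name into a lowercase, ASCII-only, hyphen-separated slug."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def player_path(player_id: int, name: str) -> str:
    """Hypothetical route pattern for a player page."""
    return f"/player/{player_id}/{slugify(name)}"


def match_path(match_id: int, home_team: str, away_team: str) -> str:
    """Hypothetical route pattern for a match page."""
    return f"/match/{match_id}/{slugify(home_team)}-{slugify(away_team)}"
```

Keeping the numeric identifier in the path means the page can still be resolved if a name is spelled differently between data deliveries; the slug is there only for readability.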
What was the hardest part of this project?
This project managed to combine several disciplines. The help of Maarten De Schryver was absolutely essential to make sure the data we showed to our readers was both relevant and clear. And although results and statistics from the iSports API were available relatively soon after each match, we quickly stumbled upon enough edge cases to realise it wouldn’t be possible to have this entire project run automatically. This meant downloading new data by hand and processing it in R with some manual checks before updating files on the server.
Similarly, the sheer volume of parameters involved in this project, as well as the lack of any real-world data already available, forced us to start coding the application before we had fully settled on the presentation of the data or the exact composition of the statistical model.
Once launched, we were more or less at the mercy of the match schedule to keep our readers interested. Although new data was available nearly every single day, some matches were obviously more interesting than others – from both a sports and a data point of view.
What can others learn from this project?
– With a small team of four people we were able to claim (data-driven) sports journalism as a valid subject in our newspaper. In doing so we pleasantly surprised our readers and other major news outlets, who even congratulated us on our innovative approach.
– The results of our efforts weren’t limited to an online dashboard but also produced more than 20 individual articles based on the findings of our data analysis. This means our time investment at the start resulted in more efficient reporting later on.