RAPTOR, which stands for Robust Algorithm (using) Player Tracking (and) On/Off Ratings, is FiveThirtyEight’s new NBA statistic. We’re pretty excited about it. In addition to being a statistic that we build in house to fuel our data-driven coverage of the NBA, RAPTOR fulfills two long-standing goals of ours:
- First, we wanted to create a publicly available statistic that takes advantage of modern NBA data, specifically player tracking and play-by-play data that isn’t available in traditional box scores.
- Second, and relatedly, we wanted a statistic that better reflects how modern NBA teams actually evaluate players.
RAPTOR allows us to cover the NBA in new and more nuanced ways than ever. It allows us to compare players in the context of the modern NBA while accounting for remarkable subtleties in how different players play, such as how much they control the ball and how they affect their team on the court. It powers our reporting, fuels our forecasts, and allows us to make more nuanced comparisons between current players and those of the past. This new metric is the analytical engine behind FiveThirtyEight’s rigorous NBA coverage, empowering:
- Interactive 2019-20 NBA Predictions that forecast which teams will win specific games, make the playoffs, and bring home the Larry O’Brien trophy. These forecasts update after every game and depth chart revision to give the most accurate picture of today’s NBA landscape.
- A fancy visual interactive player ratings leaderboard that answers the question: “Who are the best offensive, defensive, and overall players in the NBA this season?”
- Interactive player projections that identify similar players throughout NBA history and use them to develop a probabilistic forecast of what a current NBA player’s future might look like.
- NBA Stat Battles that pit sportswriter against sportswriter to answer some of the biggest NBA barroom debates about which player is better.
- Reporting on players, teams, and trends throughout the NBA season, like this piece on the star-level performance of Luka Dončić.
- Pretty much the rest of the NBA reporting FiveThirtyEight produces.
RAPTOR’s technical infrastructure is complicated and intensive, pulling data from multiple sources and building interactive front-end visuals. The project’s back-end scaffolding uses a host of different tools and technologies, including:
- Amazon EC2 PostgreSQL database
- API calls and cron jobs built in Ruby on Rails
- Two Stata statistical models
- Google Sheets that bring in injuries and suspensions
- Python/Monte Carlo simulation forecast model
These tools and techniques are set up in a daily process that scrapes data from ~1,500 URLs across multiple sources, organizes those data into structured inputs, runs multiple statistical models, stores the results of those models in a SQL database, generates data outputs, and deploys builds of the front-end interactives that pick up those outputs. Our front-end interactives use Node builds (based on gulp) that use ArchieML and d3, among a host of other libraries and techniques, to build visual representations of player rankings, team ratings, and season forecasts that update every day – or whenever a game is played or a new team depth chart revision is released.
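The final stage of that pipeline – the Python/Monte Carlo forecast model – can be sketched in miniature. The team names, ratings, home-court edge, and points-to-probability conversion below are illustrative assumptions, not the actual RAPTOR model:

```python
import random

# Hypothetical team ratings in points above league average; the real model
# builds these from the player-level RAPTOR ratings described above.
RATINGS = {"BOS": 5.2, "MIL": 4.8, "DEN": 3.9, "CHA": -6.1}
HOME_EDGE = 2.5         # assumed home-court advantage, in points
SPREAD_TO_PROB = 0.032  # rough points-to-win-probability conversion (assumed)

def win_prob(home, away):
    """Convert a projected point spread into a home-win probability."""
    spread = RATINGS[home] - RATINGS[away] + HOME_EDGE
    return min(max(0.5 + spread * SPREAD_TO_PROB, 0.01), 0.99)

def simulate_season(schedule, n_sims=10_000, seed=538):
    """Monte Carlo: play out the remaining schedule many times and tally
    how often each team finishes with the most wins (ties broken arbitrarily
    in this sketch)."""
    rng = random.Random(seed)
    titles = {team: 0 for team in RATINGS}
    for _ in range(n_sims):
        wins = {team: 0 for team in RATINGS}
        for home, away in schedule:
            if rng.random() < win_prob(home, away):
                wins[home] += 1
            else:
                wins[away] += 1
        titles[max(wins, key=wins.get)] += 1
    return {team: count / n_sims for team, count in titles.items()}

schedule = [("BOS", "MIL"), ("DEN", "CHA"), ("MIL", "DEN"), ("CHA", "BOS")]
print(simulate_season(schedule))
```

Re-running this after every final score or depth chart revision – with updated ratings and the shrinking remaining schedule – is the basic rhythm the daily process follows.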
What was the hardest part of this project?
A project of this scale and complexity comes with a host of challenges, both in data analysis and front-end visualization. The data that power the player ratings in this project come from an incredibly detailed and nuanced statistical process. RAPTOR collects, parses, and analyzes thousands of pieces of publicly available data to evaluate NBA players in a statistically rigorous way – an inherently complex task when player value is often hard to define concretely and even harder to measure. This process parses an enormous volume of data frequently, repeating every day, every time an NBA game goes final, and whenever a team releases an updated depth chart. It also manages the complexities of modeling the real world of the NBA, incorporating injuries and suspensions to reflect a team’s current rotation in real games. Nate and Jay managed these complexities to build reliable, consistent, and surprisingly efficient scripts and processes that parse an impressive amount of data, crunch those numbers with sophisticated techniques, and spit out cleanly formatted data.
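The lineup-level bookkeeping behind on/off-style ratings can be illustrated with a toy example: attribute each scoring stint to the players on the floor. The event format and numbers here are hypothetical, not the actual NBA play-by-play feed:

```python
# Toy sketch of on/off bookkeeping: credit each stint's net scoring margin
# to every player who was on the floor for it. Player names and margins
# are illustrative, not real data.

def on_off_plus_minus(events):
    """events: list of dicts with 'lineup' (frozenset of on-court players
    for one team) and 'margin' (that team's net points during the stint).
    Returns each player's raw on-court plus-minus."""
    totals = {}
    for ev in events:
        for player in ev["lineup"]:
            totals[player] = totals.get(player, 0) + ev["margin"]
    return totals

events = [
    {"lineup": frozenset({"Doncic", "Porzingis", "Brunson"}), "margin": +8},
    {"lineup": frozenset({"Porzingis", "Brunson", "Hardaway"}), "margin": -3},
]
print(on_off_plus_minus(events))
# Doncic +8, Porzingis +5, Brunson +5, Hardaway -3
```

Raw plus-minus like this is noisy and confounded by teammates and opponents; the statistical models in the pipeline exist precisely to adjust for those factors.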
After parsing and processing the data, the design and development aspects of RAPTOR provided other challenges that our team navigated well. There were so many different analyses, forecasts, and ratings we needed to share with our users, along with explaining exactly how RAPTOR works. Our main player rating dashboard had to visually show our assessment of the top players, while still allowing users to explore the whole league and see stats for any player. The main RAPTOR dashboard manages these challenges well, providing an overall leaderboard that shows the top players four different ways (WAR, overall, offensive, and defensive ratings) while also giving a fully interactive scatterplot and heatmap-based table.
What can others learn from this project?
RAPTOR provides some great opportunities to learn about how we took complex statistical modeling and visualized the results in an intuitive, understandable way. RAPTOR is a really ambitious project that covers web scraping, data aggregation, back-end web builds, statistical modeling, and interactive visualization. A closer look at this project offers valuable takeaways for both back-end data crunchers and front-end designers. The visual design of the project was ambitious in its approach to condensing complex information into concise ratings, providing both high-level takeaways and room for expansive exploration. Exploring the visual design and development decisions our team made would reveal the nuanced conversations we had around “smaller” issues – like how we scaled the upper and lower bounds of our beeswarm charts, and when we kept them consistent with each other (and when we didn’t) – as well as higher-level decisions about what we wanted the takeaways of this dashboard to be and how we approached showing those takeaways visually.
A deep dive into the end-to-end data pipeline and analysis process would also provide great learning opportunities, both in our approach and our technical execution. RAPTOR’s data process touches a number of different coding languages, infrastructures, and pieces of software that come together in an automated process feeding data into our front-end visualizations. It includes inputs from web scraping, manual tweaks to incorporate injuries, and a connection with Slack that allows our team to stay proactively involved while still being hands-off. The impressive volume of data flowing into and out of RAPTOR on a daily basis would make for an interesting and informative case study for data journalists.
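As one concrete example, a Slack status hook for a daily pipeline like this one could look roughly like the sketch below, using Slack’s incoming-webhook JSON format. The function names and message wording are our own illustration, not RAPTOR’s actual code:

```python
import json
import urllib.request

def build_status_payload(step, ok, detail=""):
    """Format a pipeline status update as a Slack incoming-webhook payload.
    The emoji and wording here are illustrative choices."""
    icon = ":white_check_mark:" if ok else ":rotating_light:"
    status = "ok" if ok else "FAILED"
    return {"text": f"{icon} pipeline - {step}: {status} {detail}".strip()}

def notify(webhook_url, payload):
    """POST the payload to a Slack incoming webhook; the URL would come
    from configuration and is not shown here."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example payloads a daily run might emit after each stage:
print(build_status_payload("scrape", True, "1,500 URLs fetched"))
print(build_status_payload("stata-model-2", False, "non-zero exit code"))
```

Posting a short message after each stage is what lets a team stay hands-off day to day while still catching failures quickly.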