Dec. 15, 2019

Predicting All-NBA Teams with XGBoost

Each year, the All-NBA teams are voted on by sports journalists as an indicator of who has performed best over the course of an NBA season. The top players are honored with a spot on the first, second, or third All-NBA team, which makes these yearly accolades a good snapshot of the top 15 players in the league for a given year. I am interested in trying to predict which players will be on an All-NBA team at the end of the season.


I took a dataset from Kaggle - NBA Players stats since 1950 - and enriched it with the All-NBA selections since their inception in 1955. However, instead of recording the 1st, 2nd, or 3rd team honors, I simply set a binary flag indicating that the player was on an All-NBA team. This let me train a binary classifier that decides yes or no whether a player will be an All-NBA player.
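The labeling step might look roughly like the following sketch, which joins season stats to the honors list and collapses the 1st/2nd/3rd team distinction into one flag. The column names, player rows, and join keys here are illustrative assumptions, not the actual schema of the Kaggle dataset.

```python
import pandas as pd

# Hypothetical sketch: join per-season stats with All-NBA honors,
# then collapse 1st/2nd/3rd team into a single binary flag.
stats = pd.DataFrame({
    "Player": ["LeBron James", "Mike Muscala"],
    "Year": [2017, 2017],
    "PTS": [2251, 530],
    "AST": [646, 77],
})
honors = pd.DataFrame({
    "Player": ["LeBron James"],
    "Year": [2017],
    "All_NBA_Team": ["1st"],  # the 1st/2nd/3rd distinction is discarded
})

merged = stats.merge(honors, on=["Player", "Year"], how="left")
merged["all_nba"] = merged["All_NBA_Team"].notna().astype(int)  # 1 if honored
merged = merged.drop(columns="All_NBA_Team")
```

A left join keeps every player-season, so anyone missing from the honors table naturally gets a 0.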

Using the XGBoost Python library, with this Kaggle notebook as a guide, I ran a number of models with vanilla XGBoost tuning parameters and found some interesting predictions.

Input Features

Model Outputs

Feature Importance

Assists are weighted highly. This makes sense, as good players will have the ball in their hands a lot and make others better with their passing ability. Surprisingly, defensive rebounds are weighted second, while offensive and total rebounds are weighted lowest.


I chose a mix of players: some of the best players of all time (MJ, Duncan, LeBron, KD, Wade), current all-stars (Giannis, Westbrook), players with long careers who made fewer than a handful of All-NBA teams (Randolph, Deron Williams, Vince Carter), and players who never made an All-NBA team (Battier, Muscala, Miller).


For all testing outcomes check this file.

This was a fun way to learn XGBoost. I would like to test some more players, tune the algorithm, and check some of the lower probabilities to see what insights are there.

All code can be found on my github at