Each year, sports journalists vote on the All-NBA teams, which recognize the players who performed best over the course of an NBA season. The top players are honored with a spot on the first, second, or third team, so the selections are a good snapshot of the top 15 players in the league for a given year. I am interested in predicting which players will make an All-NBA team at the end of the season.
Data
I took a Kaggle dataset, NBA Players stats since 1950, and enriched it with All-NBA selections going back to 1955. Instead of recording 1st, 2nd, or 3rd team honors, I set a single binary flag indicating whether the player made any All-NBA team that season. This turns the task into binary classification (logistic rather than linear regression): the model decides yes or no on whether a player will be All-NBA.
Using the XGBoost Python library and this Kaggle notebook as a guide, I trained several models with default XGBoost parameters and found some interesting predictions.
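As a rough illustration of the flagging step, the sketch below collapses team tiers into a single 0/1 label. The column names `Player` and `Year`, the helper `add_all_nba_flag`, and the two sample rows are assumptions made for this example and are not taken from the actual notebook.

```python
# Hypothetical selections keyed by (player, year); the value is the team tier.
# Vince Carter's 2000 third-team selection is the one mentioned in this post.
all_nba_selections = {("Vince Carter", 2000): 3}

# Hypothetical season rows; real rows would also carry the box-score columns.
season_rows = [
    {"Player": "Vince Carter", "Year": 2000},
    {"Player": "Shane Battier", "Year": 2007},
]


def add_all_nba_flag(rows, selections):
    """Return copies of rows with a binary 'all_nba' flag (tier is ignored)."""
    return [
        {**row, "all_nba": int((row["Player"], row["Year"]) in selections)}
        for row in rows
    ]


flagged = add_all_nba_flag(season_rows, all_nba_selections)
for row in flagged:
    print(row["Player"], row["Year"], row["all_nba"])
```

Dropping the tier loses some information, but it keeps the target to a single yes/no column.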
Input Features
- Data starting from 1975. I chose 1975 because that is when some NBA statistics started being tracked. I ended up removing those statistics from the model, but I have kept the 1975 cutoff for now.
- '2P', '3P', 'eFG%', 'AST', 'BLK', 'FT', 'MP', 'ORB', 'DRB', 'TRB', 'STL', 'PTS', 'G', and 'all_nba' (the target label, not a predictor)
Model Outputs
- precision score: 0.840909
- recall score: 0.637931
- accuracy score: 0.984840
- f1 score: 0.725490
- RMSE: 0.123100
- area under the precision-recall curve: 0.847533
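The scores above can be cross-checked against each other. The confusion-matrix counts below are back-calculated from the reported precision, recall, and accuracy; they are an inference from the numbers, not values read from the notebook. The RMSE line additionally assumes the error was computed on hard 0/1 predictions.

```python
import math

# Back-calculated counts (assumption): these reproduce the reported scores.
tp, fp, fn, tn = 37, 7, 21, 1782
total = tp + fp + fn + tn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

# For 0/1 predictions every error has squared size 1, so RMSE reduces to
# the square root of the misclassification rate.
rmse = math.sqrt((fp + fn) / total)

print(f"precision {precision:.6f}")
print(f"recall    {recall:.6f}")
print(f"f1        {f1:.6f}")
print(f"accuracy  {accuracy:.6f}")
print(f"rmse      {rmse:.4f}")
```

Under these counts the model would miss 21 of 58 actual All-NBA seasons in the test set. The high accuracy mostly reflects how rare the positive class is, which is why precision and recall are the more informative numbers here.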
Feature Importance
Assists are weighted most heavily. This makes sense, as good players have the ball in their hands a lot and make others better with their passing. Surprisingly, defensive rebounds rank second, while offensive and total rebounds rank lowest.
- ('AST', 346)
- ('DRB', 254)
- ('eFG%', 251)
- ('3P', 247)
- ('PTS', 236)
- ('G', 236)
- ('STL', 231)
- ('FT', 212)
- ('BLK', 209)
- ('2P', 188)
- ('MP', 183)
- ('ORB', 181)
- ('TRB', 122)
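The integer scores above look like split counts (XGBoost's "weight" importance type), though that is an assumption about how they were produced. Read that way, they can be converted into shares of all splits. Because TRB is the sum of ORB and DRB, the three rebounding columns overlap, so it may be fairer to look at them as a group:

```python
# Scores copied from the list above.
importance = {
    "AST": 346, "DRB": 254, "eFG%": 251, "3P": 247, "PTS": 236,
    "G": 236, "STL": 231, "FT": 212, "BLK": 209, "2P": 188,
    "MP": 183, "ORB": 181, "TRB": 122,
}

total = sum(importance.values())
shares = {name: count / total for name, count in importance.items()}

# Combined share of the three overlapping rebounding features.
rebound_share = sum(shares[name] for name in ("ORB", "DRB", "TRB"))

print(f"AST share:        {shares['AST']:.1%}")
print(f"Rebounding share: {rebound_share:.1%}")
```

Taken together, the rebounding columns account for more splits than assists do, so the low individual ranks of ORB and TRB may partly reflect the model spreading one signal across redundant features.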
Testing
I chose a varied set of players: all-time greats (MJ, Duncan, LeBron, KD, Wade), current All-Stars (Giannis, Westbrook), players with long careers who made fewer than a handful of All-NBA teams (Randolph, Deron Williams, Vince Carter), and players who never made an All-NBA team (Battier, Muscala, Miller).
Outcomes
For all testing outcomes, check this file.
- Jordan's probabilities make sense except for his first year (11% probability?).
- Vince Carter made 3rd team All-NBA in 2000, and the algorithm predicts a 49.9% probability of him making All-NBA. This makes some sense to me: if you are making the 3rd team, your chances will not be as high as others'.
- Did Vince Carter '07 deserve All-NBA? The model gives him a 66% chance of making the top 15.
- KD '17 had a 54% chance of making All-NBA. The probability is most likely lower because of games played.
- Probabilities for current NBA All-Stars make sense, with most above 80%.
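To turn the probabilities listed above into yes/no calls, a cutoff is needed. The sketch below assumes a 0.5 threshold, which is a common default but is not confirmed as the one used here; the dictionary labels are just names chosen for this example.

```python
# Probabilities quoted in the outcomes above.
predicted = {
    "Jordan (first year)": 0.11,
    "Vince Carter 2000": 0.499,
    "Vince Carter 2007": 0.66,
    "Kevin Durant 2017": 0.54,
}

THRESHOLD = 0.5  # assumed cutoff, not taken from the notebook

# A player-season is called All-NBA when its probability reaches the cutoff.
calls = {name: prob >= THRESHOLD for name, prob in predicted.items()}

for name, is_all_nba in calls.items():
    print(f"{name}: {'All-NBA' if is_all_nba else 'not All-NBA'}")
```

With this cutoff, Carter's 2000 season lands just under the line and would count as a miss, since he did make the 3rd team that year. Lowering the threshold would recover cases like this at the cost of more false positives.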
This was a fun way to learn XGBoost. I would like to test more players, tune the model, and dig into some of the lower probabilities to see what insights they hold.
All code can be found on my GitHub at https://github.com/pjames5/nba_ml.