https://deepzissou.github.io/Draft_Prediction_Project/
The focus of this project was to use Python to develop machine learning models that use historical NCAA player stats to predict whether a player is likely to play basketball professionally.
Player stats were gathered via web scraping and amounted to approximately 109,000 unique records.
Several machine learning models were trained and tested on the gathered data, with a Random Forest Classifier showing the best results.
Python
BeautifulSoup
Scikit-learn
PostgreSQL
Data was scraped from Sports Reference (https://www.sports-reference.com/). The scraper was configured to pull eleven key player stat parameters.
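As a rough illustration of the scraping step, the sketch below parses a stats table with BeautifulSoup. It is not the project's scraper: the inline HTML, the table id `players_per_game`, the `data-stat` attribute names, and the helper `parse_stat_rows` are all assumptions made for the example, and the values are made up. It parses a local string rather than fetching a live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a per-player stats table (values are made up).
SAMPLE_HTML = """
<table id="players_per_game">
  <thead><tr><th>Season</th><th>G</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><th data-stat="season">2018-19</th><td data-stat="g">33</td><td data-stat="pts_per_g">22.6</td></tr>
    <tr><th data-stat="season">2019-20</th><td data-stat="g">31</td><td data-stat="pts_per_g">18.1</td></tr>
  </tbody>
</table>
"""


def parse_stat_rows(html, table_id):
    """Return one dict per body row, keyed by each cell's data-stat attribute."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = tr.find_all(["th", "td"])
        # Keep only cells that carry the (assumed) data-stat attribute.
        rows.append(
            {c["data-stat"]: c.get_text(strip=True) for c in cells if c.has_attr("data-stat")}
        )
    return rows


rows = parse_stat_rows(SAMPLE_HTML, "players_per_game")
```

Each row comes back as a dictionary of strings, so numeric conversion would still be needed before loading the records into a database or dataframe.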
Data was prepared using PostgreSQL and pandas. An “is_pro” column was added to label each player with a known outcome: 1 = played professionally, 0 = no professional statistics. The height column was converted to centimeters, and rows containing NaN values were dropped. The dataframe was then randomly split into two dataframes, “train” and “test”.
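The preparation steps could look roughly like the pandas sketch below. The column names (`height_in`, `pro_games`, `pts_per_g`), the assumption that height starts in inches, the rule used to derive “is_pro”, and the 75/25 split ratio are all illustrative guesses, not details taken from the project. The records are made up.

```python
import pandas as pd

# Hypothetical raw records; column names and values are made up for illustration.
raw = pd.DataFrame({
    "name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "height_in": [78.0, 75.0, None, 81.0, 72.0],
    "pts_per_g": [14.2, 9.8, 11.0, 17.5, 6.3],
    "pro_games": [120, 0, 0, 45, 0],
})


def prepare(df):
    """Add the is_pro label, convert height to centimeters, and drop NaN rows."""
    out = df.copy()
    # Assumed rule: any recorded professional games means the player turned pro.
    out["is_pro"] = (out["pro_games"] > 0).astype(int)
    # Assumed source unit: inches (1 in = 2.54 cm).
    out["height_cm"] = out["height_in"] * 2.54
    out = out.drop(columns=["height_in", "pro_games"])
    return out.dropna().reset_index(drop=True)


prepared = prepare(raw)

# Random split into "train" and "test"; the 75/25 ratio is an assumption.
train = prepared.sample(frac=0.75, random_state=0)
test = prepared.drop(train.index)
```

Fixing `random_state` keeps the split reproducible between runs, which makes model comparisons easier to repeat.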
The “train” dataframe was used to train the model. After assessing different models, the Random Forest Classifier (RFC) was chosen. RFC allowed multiple feature inputs to be used without reducing accuracy.
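One common way to assess candidate models is cross-validated accuracy on the training data, sketched below with scikit-learn. The candidate list, the synthetic data from `make_classification`, and the five-fold setup are assumptions for the example; the source does not say which other models were tried or how they were compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the "train" dataframe: 11 numeric features, binary label.
X, y = make_classification(n_samples=500, n_features=11, n_informative=5, random_state=0)

# Hypothetical candidates; the project's actual shortlist is not documented here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
```

Which model wins depends entirely on the data, so the result on this synthetic set says nothing about the project's own comparison.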
The “test” dataframe was used to evaluate the model's predictions. The known outcomes and names were removed, and the remaining player stats were used as the input. The model then predicted the “is_pro” status for each player.
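A minimal sketch of the train-then-predict flow follows, assuming scikit-learn's `RandomForestClassifier`. The helper `fake_players`, the two feature columns, the labeling rule, and the hyperparameters are made up for illustration and do not reflect the project's actual data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def fake_players(n, prefix):
    """Build a made-up player table; the columns and label rule are hypothetical."""
    pts = rng.uniform(0, 25, n)
    return pd.DataFrame({
        "name": [f"{prefix}_{i}" for i in range(n)],
        "height_cm": rng.normal(195, 8, n),
        "pts_per_g": pts,
        "is_pro": (pts > 18).astype(int),
    })


train = fake_players(200, "train")
test = fake_players(50, "test")

# Fit on the stat columns only; names and labels are not features.
feature_cols = ["height_cm", "pts_per_g"]
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[feature_cols], train["is_pro"])

# Remove the names and known outcomes, then predict is_pro from the remaining stats.
X_test = test.drop(columns=["name", "is_pro"])
predicted = model.predict(X_test)
```

The known “is_pro” values stay in `test`, so they remain available for comparison against `predicted` afterwards.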
The outputs of the model were combined with the known results into a dataframe and saved to the file randomForest_model_results.csv.
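Combining predictions with known outcomes and writing them to CSV might look like the sketch below. Only the file name randomForest_model_results.csv comes from the project; the column names, the `correct` flag, the sample values, and the temporary output directory are assumptions for the example.

```python
import os
import tempfile

import pandas as pd

# Made-up names, known outcomes, and model predictions.
names = ["Player A", "Player B", "Player C"]
known = [1, 0, 0]
predicted = [1, 0, 1]

# One row per player, with the known and predicted labels side by side.
results = pd.DataFrame({
    "name": names,
    "is_pro": known,
    "predicted_is_pro": predicted,
})
# Hypothetical convenience column marking where the prediction matched.
results["correct"] = results["is_pro"] == results["predicted_is_pro"]

# Written to a temp directory here so the example does not touch the working tree.
out_path = os.path.join(tempfile.gettempdir(), "randomForest_model_results.csv")
results.to_csv(out_path, index=False)
```

Passing `index=False` keeps the pandas row index out of the file, so the CSV contains only the listed columns.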
Marisa Kiger: marisa_krg@yahoo.com
Galen Kellner: kellnergp@gmail.com
Thomas Martin: thomas.martin321@gmail.com
Dwayne Jordan: https://github.com/deepzissou
