▪ Built supervised machine learning models in Python to predict customer churn probability for telecommunications service vendors.
▪ Performed data cleaning, feature processing, standardization, and encoding to prepare data for model training; dropped 19% of the data as uninformative or redundant during preprocessing.
▪ Trained models including Logistic Regression, Random Forest, and K-Nearest Neighbors, and applied regularization to avoid overfitting and select important features.
▪ Evaluated model performance via 5-fold cross-validation and selected Random Forest, which had the highest accuracy (0.958) and an AUC of 0.96.
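
The model comparison described in the bullets above could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the dataset is a synthetic stand-in generated with `make_classification`, and the hyperparameters (`C`, `n_estimators`, `n_neighbors`) and the `results` and `best` names are assumptions chosen for the example, so the scores it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for a churn dataset (assumption: the real data is not shown).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)

# Hypothetical hyperparameters. Scaling sits inside each pipeline so it is
# re-fit on the training folds only, which avoids leaking test-fold statistics.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)  # L2-regularized by default
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# 5-fold stratified cross-validation keeps the churn rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "auc": scores["test_roc_auc"].mean(),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, AUC={results[name]['auc']:.3f}")

# Pick the model with the highest mean cross-validated accuracy.
best = max(results, key=lambda n: results[n]["accuracy"])
print("Selected:", best)
```

Reporting AUC alongside accuracy matters for churn data because the classes are usually imbalanced, and accuracy alone can look high for a model that rarely predicts churn.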