This project applies machine learning to classify tumors as benign or malignant using the Breast Cancer Wisconsin (Diagnostic) Dataset. The notebook walks through data cleaning, exploratory data analysis (EDA), feature selection, model training, and evaluation.
- Understand and clean the cancer dataset.
- Explore key features and visualize data patterns.
- Train multiple classification models.
- Evaluate model performance using accuracy, confusion matrix, and classification reports.
- Select the best-performing model for cancer prediction.
- Source: UCI Machine Learning Repository – Breast Cancer Wisconsin (Diagnostic)
- Features: 30 numeric features describing characteristics of cell nuclei present in digitized images.
- Target: Diagnosis (M for malignant, B for benign).
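As a quick way to inspect the data described above, the following minimal sketch loads the copy of the dataset bundled with scikit-learn. This is an assumption made for illustration: the notebook may read the data from a CSV file instead. scikit-learn encodes the target as 0 = malignant and 1 = benign, so the sketch maps it back to the M/B labels used here.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: use scikit-learn's bundled copy of the dataset
# rather than whatever file the notebook actually reads.
data = load_breast_cancer(as_frame=True)
X = data.data  # DataFrame of 30 numeric features

# scikit-learn encodes 0 = malignant, 1 = benign;
# map back to the M/B labels of the original dataset.
diagnosis = data.target.map({0: "M", 1: "B"})

print(X.shape)                    # (rows, features)
print(diagnosis.value_counts())   # class balance
```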
- Checked for null values and duplicates.
- Removed irrelevant columns (e.g., ID).
- Converted categorical target labels to numerical form (M → 1, B → 0).
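The cleaning steps above can be sketched with pandas roughly as follows. The tiny DataFrame and its column names (`id`, `diagnosis`, `radius_mean`) are hypothetical stand-ins for the raw data, and dropping duplicate rows is an assumption about how duplicates were handled.

```python
import pandas as pd

# Hypothetical miniature of the raw data; column names are assumptions.
df = pd.DataFrame({
    "id": [101, 102, 103, 103],
    "diagnosis": ["M", "B", "B", "B"],
    "radius_mean": [17.99, 13.54, 12.45, 12.45],
})

# Check for null values and duplicate rows.
n_missing = int(df.isnull().sum().sum())
n_duplicates = int(df.duplicated().sum())
print(f"missing values: {n_missing}, duplicate rows: {n_duplicates}")

# Drop duplicates (assumed handling) and the non-informative ID column.
df = df.drop_duplicates().drop(columns=["id"])

# Encode the target: M -> 1, B -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})
print(df)
```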
Trained and compared the following classification algorithms:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree
- Random Forest
- Gradient Boosting
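A comparison of the algorithms listed above might look like the sketch below. The 80/20 stratified split, `random_state=42`, default hyperparameters, and the feature scaling applied to the distance- and margin-based models are all assumptions for illustration, not settings taken from the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # scikit-learn uses 0 = malignant; flip so M -> 1, B -> 0

# Assumed split: 80/20, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features for models that are sensitive to feature magnitude.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and record its test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

A single train/test split gives a noisy ranking on a dataset this small; cross-validation would give a steadier comparison.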
- Accuracy Score
- Confusion Matrix
- Precision, Recall, F1-Score
- ROC-AUC Score
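The metrics above are all available in `sklearn.metrics`. The sketch below computes them for one model; the choice of Random Forest, the split, and the seed are assumptions for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # flip scikit-learn's encoding so malignant = 1, benign = 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the malignant class

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
report = classification_report(y_test, y_pred, target_names=["benign", "malignant"])
auc = roc_auc_score(y_test, y_prob)  # uses scores, not hard predictions

print(f"Accuracy: {acc:.3f}")
print(f"ROC-AUC:  {auc:.3f}")
print(cm)
print(report)
```

In a diagnostic setting, recall on the malignant class is usually the number to watch, since a false negative is a missed cancer.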
The Random Forest classifier achieved the highest accuracy of the models compared and performed consistently across the other metrics, so it was selected as the final model.
Install the required libraries using:
pip install numpy pandas matplotlib seaborn scikit-learn
- Clone the repository:
git clone https://github.com/your-username/Cancer_Tumor_Classification.git
cd Cancer_Tumor_Classification
- Open the notebook in Jupyter:
jupyter notebook Cancer_Tumor_Classification_Cleaned.ipynb
- Early breast cancer detection
- Medical diagnosis tools
- Academic teaching for binary classification
- UCI Machine Learning Repository for the dataset
- scikit-learn for the robust ML toolkit