This project implements an end-to-end MLOps pipeline for predicting whether customers will purchase the Wellness Tourism Package from the "Visit with Us" travel company.
- Hugging Face Spaces: View Live
git clone https://github.com/ananttripathi/Tourism_Project.git
cd Tourism_Project
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r tourism_project/requirements.txt
cd tourism_project/deployment && streamlit run app.py
Requires a trained model (see Deploying via GitHub Actions or Running Locally). Or try the Live Demo first.
tourism_project/
├── .github/
│   └── workflows/
│       └── pipeline.yml        # GitHub Actions CI/CD workflow
├── data/
│   └── tourism.csv             # Original dataset
├── deployment/
│   ├── app.py                  # Streamlit web application
│   ├── Dockerfile              # Docker configuration
│   └── requirements.txt        # Deployment dependencies
├── hosting/
│   └── hosting.py              # Script to push to Hugging Face Spaces
├── model_building/
│   ├── data_register.py        # Dataset registration to Hugging Face
│   ├── prep.py                 # Data preprocessing script
│   └── train.py                # Model training with MLflow tracking
└── requirements.txt            # Workflow dependencies
- Automated dataset upload to Hugging Face Hub
- Comprehensive data cleaning and preprocessing
- Handling of missing values and data quality issues
- Label encoding of categorical variables
- Stratified train-test split (80-20)
- Algorithm: XGBoost Classifier
- Hyperparameter Tuning: GridSearchCV with 3-fold cross-validation
- Experiment Tracking: MLflow integration
- Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC
- Web Application: Interactive Streamlit app
- Containerization: Docker support
- Hosting: Hugging Face Spaces
- Automated workflow with GitHub Actions
- Four main jobs:
  - Dataset Registration
  - Data Preparation
  - Model Training
  - Deployment to Hugging Face
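The label encoding and stratified 80-20 split listed above can be illustrated with the standard library alone. This is a minimal sketch of the idea, not the code in `prep.py`: the real script presumably relies on scikit-learn (for example `train_test_split` with its `stratify` argument), and the helper names `label_encode` and `stratified_split`, the seed, and the toy values below are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))
    mapping = {category: code for code, category in enumerate(classes)}
    return [mapping[value] for value in values], mapping


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions kept in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical): a categorical column and an imbalanced 0/1 target.
encoded, mapping = label_encode(["Male", "Female", "Male"])
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
```

Because the split is done per class, the 20% share of purchasers in the toy target is the same in the training and test parts, which is the point of stratifying on an imbalanced target.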
Before running this project, you need to set up the following:
# Create a new GitHub repository
# Repository name: tourism-package-prediction (or your choice)
# Initialize with README
- Create a Hugging Face account at https://huggingface.co
- Generate an access token:
  - Go to Settings → Access Tokens
  - Create a new token with Write permissions
  - Copy and save the token securely
- Go to your GitHub repository
- Navigate to: Settings → Secrets and Variables → Actions
- Add a new repository secret:
  - Name: `HF_TOKEN`
  - Secret: Paste your Hugging Face token
Create the following Space on Hugging Face:
- Space name: `wellness-tourism-prediction`
- SDK: Docker (Streamlit template)
IMPORTANT: You must replace all placeholder values in the code:
Replace <---repo id----> with your Hugging Face username in the following files:
- `tourism_project/model_building/data_register.py`
  - Line: `repo_id = "<---repo id---->/tourism-dataset"`
- `tourism_project/model_building/prep.py`
  - Line: `DATASET_PATH = "hf://datasets/<---repo id---->/tourism-dataset/tourism.csv"`
  - Line: `repo_id="<---repo id---->/tourism-dataset"`
- `tourism_project/model_building/train.py`
  - Lines with paths to Hugging Face datasets
  - Line: `repo_id = "<---repo id---->/tourism-prediction-model"`
- `tourism_project/deployment/app.py`
  - Line: `repo_id="<---repo id---->/tourism-prediction-model"`
- `tourism_project/hosting/hosting.py`
  - Line: `repo_id="<---repo id---->/wellness-tourism-prediction"`
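Editing five files by hand is easy to get wrong, so the replacement can also be scripted. The following is a sketch only: the helper `replace_placeholders` is hypothetical and not part of the repository, and it assumes the placeholder string appears in the files exactly as written above.

```python
from pathlib import Path

PLACEHOLDER = "<---repo id---->"

# Files named in the list above, relative to the repository root.
FILES = [
    "tourism_project/model_building/data_register.py",
    "tourism_project/model_building/prep.py",
    "tourism_project/model_building/train.py",
    "tourism_project/deployment/app.py",
    "tourism_project/hosting/hosting.py",
]


def replace_placeholders(root, username, files=FILES):
    """Replace PLACEHOLDER with username in each file; return {path: count}."""
    counts = {}
    for relative in files:
        path = Path(root) / relative
        if not path.is_file():
            continue  # skip files that are missing from this checkout
        text = path.read_text(encoding="utf-8")
        counts[relative] = text.count(PLACEHOLDER)
        if counts[relative]:
            path.write_text(text.replace(PLACEHOLDER, username), encoding="utf-8")
    return counts
```

Calling `replace_placeholders(".", "your-hf-username")` from the repository root returns how many occurrences were replaced per file; a count of zero for a listed file suggests it was already edited or uses a different placeholder spelling.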
# Clone the repository
git clone https://github.com/ananttripathi/Tourism_Project.git
cd Tourism_Project
# Install dependencies
pip install -r tourism_project/requirements.txt
# 1. Register dataset (requires HF_TOKEN environment variable)
export HF_TOKEN="your_hugging_face_token" # On Windows: set HF_TOKEN=your_token
python tourism_project/model_building/data_register.py
# 2. Prepare data
python tourism_project/model_building/prep.py
# 3. Train model (start MLflow server first)
mlflow ui --host 0.0.0.0 --port 5000 &
python tourism_project/model_building/train.py
# 4. Deploy to Hugging Face
python tourism_project/hosting/hosting.py
cd tourism_project/deployment
streamlit run app.py
- Prepare your repository:
  - Ensure all placeholder values are replaced
  - Verify `HF_TOKEN` is added to GitHub Secrets
- Push to GitHub:
  git add .
  git commit -m "Initial commit: Tourism Package Prediction Pipeline"
  git push origin main
- Monitor the workflow:
  - Go to your GitHub repository
  - Click on the "Actions" tab
  - Watch the pipeline execute automatically
- Access your deployed app:
  - Visit: https://huggingface.co/spaces/<your-username>/wellness-tourism-prediction
The model is trained using XGBoost Classifier with the following characteristics:
- Training Features: 17 features including customer demographics and interaction data
- Target Variable: Binary (Purchase = 1, No Purchase = 0)
- Evaluation Metrics:
  - Accuracy
  - Precision
  - Recall
  - F1-Score
  - ROC-AUC Score
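To make the evaluation metrics above concrete, here is a small pure-Python sketch of how each one is defined for a binary target. It is illustrative only: the training script presumably uses the equivalent functions in `sklearn.metrics`, and the function names and toy labels and scores below are assumptions, not project data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: predicted probabilities thresholded at 0.5.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores, which is why the two functions take different inputs.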
The web application provides:
- User-friendly interface with two-column layout
- Customer Demographics Section: Age, occupation, income, etc.
- Interaction Data Section: Pitch details, follow-ups, preferences
- Real-time Predictions with confidence scores
- Actionable Recommendations based on prediction results
All experiments are tracked using MLflow:
- Hyperparameter combinations logged
- Model metrics recorded
- Best model artifacts saved
- Experiment comparison capabilities
Features:
- Customer demographics (Age, Gender, Occupation, etc.)
- Travel preferences (PropertyStar, NumberOfTrips, etc.)
- Interaction data (PitchDuration, Followups, SatisfactionScore)
- Financial data (MonthlyIncome, OwnCar)
Target: ProdTaken (0 = No Purchase, 1 = Purchase)
- Python 3.9
- XGBoost - Machine Learning
- scikit-learn - Preprocessing & Evaluation
- MLflow - Experiment Tracking
- Streamlit - Web Application
- Docker - Containerization
- GitHub Actions - CI/CD
- Hugging Face Hub - Model & Data Storage
- Complete folder structure created
- Data registration script (`data_register.py`)
- Data preparation script (`prep.py`)
- Model training script with MLflow (`train.py`)
- Streamlit application (`app.py`)
- Dockerfile for deployment
- Deployment requirements.txt
- Hosting script (`hosting.py`)
- GitHub Actions workflow (`pipeline.yml`)
- Workflow requirements.txt
- Jupyter notebook with all code cells filled
- Replace all `<---repo id---->` placeholders with your HF username
- GitHub repository created
- HF_TOKEN added to GitHub Secrets
- Hugging Face Space created
- Pipeline executed successfully
- Screenshots of:
  - GitHub repository structure
  - GitHub Actions workflow execution
  - Deployed Streamlit app on Hugging Face
- Screenshot showing folder structure
- Screenshot showing successful workflow execution
- Link to deployed application
- Screenshot of the Streamlit app in action
- HF_TOKEN not found:
  - Ensure the token is added to GitHub Secrets
  - Verify the secret name is exactly `HF_TOKEN`
- Import errors:
  - Check all dependencies are in requirements.txt
  - Verify correct versions are specified
- Model not loading in Streamlit:
  - Ensure model is uploaded to correct HF repository
  - Check repo_id matches across files
- GitHub Actions failing:
  - Check workflow logs for specific errors
  - Verify all file paths are correct
  - Ensure requirements.txt includes all dependencies
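For the "HF_TOKEN not found" case, a fail-fast check at the top of a script can turn a confusing upload error into a clear message. The sketch below is an assumption about how such a guard might look; the function `require_hf_token` is hypothetical and does not exist in the repository.

```python
import os


def require_hf_token(env=None):
    """Return the HF_TOKEN value, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export it locally, or add it as a "
            "repository secret named exactly HF_TOKEN for GitHub Actions."
        )
    return token
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment, for example `require_hf_token({"HF_TOKEN": "example"})`.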
This MLOps pipeline was developed as part of the Advanced Machine Learning and MLOps course assignment.
Co-author: ananttripathiak
This project is licensed under the MIT License.
Suggested GitHub topics: mlops machine-learning xgboost streamlit huggingface github-actions docker wellness-tourism
Note: Remember to replace ALL placeholder values (<---repo id---->) with your actual Hugging Face username before submitting!