Spotify Profiler

A machine learning pipeline for analyzing Spotify playlist data and generating music recommendations using collaborative filtering and embedding-based approaches.

Overview

This project processes the Million Playlist Dataset (MPD) to build music recommendation systems using various machine learning techniques including:

Collaborative filtering with co-occurrence matrices
Item2Vec embeddings for track representations
Hyperparameter tuning and experimentation
Evaluation metrics for recommendation quality

Project Structure

```
spotifyprofiler/
├── data/MPD/                   # Million Playlist Dataset
├── pipeline/                   # Core processing scripts
│   ├── mpd_processor.py        # MPD data processing
│   ├── build_co_occurrence.py  # Co-occurrence matrix builder
│   ├── build_track_vocab.py    # Track vocabulary builder
│   ├── item2vec_trainer.py     # Item2Vec model trainer
│   └── reccobeats_client.py    # Recommendation client
├── tuning/                     # Hyperparameter tuning results
│   ├── checkpoints/            # Model checkpoints
│   ├── embeddings/             # Trained embeddings
│   └── experiment_results/     # Experiment results
├── requirements.txt            # Python dependencies
└── run_*.py                    # Execution scripts
```

Setup

Clone the repository

```
git clone <your-repo-url>
cd spotifyprofiler
```

Set up virtual environment

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

```
cp env_template.txt .env
# Edit .env with your configuration
```

Usage

Data Processing

Process MPD data
```
python pipeline/mpd_processor.py
```
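The internals of `mpd_processor.py` are not shown in this README. As a rough orientation, the sketch below reads MPD slice files into one list of track URIs per playlist. It assumes the public MPD layout (JSON files named `mpd.slice.*.json`, each with a `playlists` array whose tracks carry a `track_uri` field); the function name `iter_playlists` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_playlists(mpd_dir: str) -> Iterator[list[str]]:
    """Yield one list of track URIs per playlist from MPD slice files.

    Assumes the public MPD layout: files named like mpd.slice.0-999.json,
    each holding a top-level "playlists" list whose entries carry a
    "tracks" list with a "track_uri" field per track.
    """
    for path in sorted(Path(mpd_dir).glob("mpd.slice.*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        for playlist in data["playlists"]:
            # Keep only the track URIs, in playlist order.
            yield [t["track_uri"] for t in playlist["tracks"]]
```

Streaming one playlist at a time keeps memory bounded to a single slice file rather than the whole dataset.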
Build co-occurrence matrix
```
python pipeline/build_co_occurrence.py
```
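A co-occurrence matrix records how often two tracks appear in the same playlist. A minimal sketch, assuming playlists are already lists of track IDs; `build_co_occurrence` is a hypothetical name and may not match what `build_co_occurrence.py` actually does:

```python
from collections import Counter
from itertools import combinations


def build_co_occurrence(playlists: list[list[str]]) -> Counter:
    """Count how many playlists contain each unordered pair of tracks.

    Pairs are stored with the two track IDs in sorted order, so (a, b) and
    (b, a) share one count. Duplicate tracks within a playlist are collapsed
    first so a repeated track does not inflate its counts.
    """
    counts: Counter = Counter()
    for tracks in playlists:
        unique = sorted(set(tracks))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts
```

The number of pairs grows quadratically with playlist length, so for the full dataset a sparse matrix (for example `scipy.sparse`) indexed by the track vocabulary is likely a better store than a Python dict.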
Build track vocabulary
```
python pipeline/build_track_vocab.py
```
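A track vocabulary maps each track ID to a dense integer index so that matrices and embedding tables can be addressed by position. A sketch under the assumption that rare tracks are dropped with a frequency cutoff; `build_vocab` and `min_count` are illustrative, not taken from `build_track_vocab.py`:

```python
from collections import Counter


def build_vocab(playlists: list[list[str]], min_count: int = 1) -> dict[str, int]:
    """Map each track seen at least `min_count` times to a dense integer ID.

    IDs are assigned by descending frequency, ties broken by track ID, so the
    mapping is deterministic across runs.
    """
    freq = Counter(t for tracks in playlists for t in tracks)
    kept = [t for t, c in freq.items() if c >= min_count]
    kept.sort(key=lambda t: (-freq[t], t))
    return {track: idx for idx, track in enumerate(kept)}
```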

Training Models

Train Item2Vec embeddings
```
python pipeline/item2vec_trainer.py
```
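Item2Vec (Barkan and Koenigstein, 2016) adapts skip-gram with negative sampling to sets of items: every pair of tracks that share a playlist is a positive example regardless of position, and negatives are drawn from the item frequency distribution raised to 0.75, as in word2vec. The sketch below covers only this data-preparation side and leaves the embedding updates to the trainer; `positive_pairs` and `negative_sampler` are hypothetical names, not taken from `item2vec_trainer.py`:

```python
import random
from collections import Counter
from itertools import permutations
from typing import Callable


def positive_pairs(playlists: list[list[int]]) -> list[tuple[int, int]]:
    """Item2Vec-style positives: every ordered pair of distinct items that
    share a playlist, i.e. the context window spans the whole playlist."""
    pairs: list[tuple[int, int]] = []
    for items in playlists:
        pairs.extend(permutations(sorted(set(items)), 2))
    return pairs


def negative_sampler(
    playlists: list[list[int]], power: float = 0.75, seed: int = 0
) -> Callable[[int], list[int]]:
    """Return a function drawing k negative item IDs from the unigram
    distribution raised to `power` (0.75 is the usual word2vec choice)."""
    freq = Counter(i for items in playlists for i in items)
    items = list(freq)
    weights = [freq[i] ** power for i in items]
    rng = random.Random(seed)  # seeded for reproducible experiments

    def sample(k: int) -> list[int]:
        return rng.choices(items, weights=weights, k=k)

    return sample
```

Each positive `(target, context)` pair would typically be paired with a handful of sampled negatives per training step.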
Run hyperparameter tuning
```
python run_second_round.py
```
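Hyperparameter tuning usually sweeps a set of candidate values per parameter. A minimal grid-expansion sketch; the parameter names and values in `space` are placeholders, since the search space used by `run_second_round.py` is not documented here:

```python
from itertools import product
from typing import Any, Iterator


def grid(search_space: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    """Expand {param: [values...]} into one config dict per combination."""
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space; the real parameter names used by
# run_second_round.py are not documented here.
space = {"embedding_dim": [64, 128], "window": [5, 10], "negative": [5]}
configs = list(grid(space))
```

Each resulting dict can then be trained and scored. When the grid grows large, random search over the same space is a common cheaper alternative.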

Making Recommendations

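```python
from pipeline.reccobeats_client import RecCoBeatsClient

# playlist_tracks: the seed tracks of the playlist to extend. The expected
# format (for example, Spotify track IDs or URIs) is not documented here.
playlist_tracks = [...]

client = RecCoBeatsClient()
recommendations = client.get_recommendations(playlist_tracks)
```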
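How `RecCoBeatsClient` ranks tracks is not documented here. For comparison, a simple co-occurrence recommender scores each candidate track by how often it shares playlists with the seed tracks. This sketch assumes counts are stored as a dict keyed by `(track_a, track_b)` pairs; `recommend_from_co_occurrence` is hypothetical:

```python
from collections import Counter


def recommend_from_co_occurrence(
    seeds: list[str], co_counts: dict[tuple[str, str], int], k: int = 10
) -> list[str]:
    """Score every non-seed track by its summed co-occurrence with the seeds
    and return the k highest-scoring tracks (ties broken by track ID)."""
    seed_set = set(seeds)
    scores: Counter = Counter()
    for (a, b), n in co_counts.items():
        # Credit only the non-seed side of a pair that touches a seed.
        if a in seed_set and b not in seed_set:
            scores[b] += n
        elif b in seed_set and a not in seed_set:
            scores[a] += n
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Raw counts favor globally popular tracks; normalizing by track frequency is a common refinement.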

Configuration

The project uses JSON configuration files for different experiments:

best_second_round_config_working.json - Best performing configuration
second_round_checkpoint_working.json - Training checkpoint
Various experiment configs in tuning/experiment_results/
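The schema of these JSON files is not documented here. A minimal loader sketch that overlays a config file on a set of defaults; `load_config` and any keys passed to it are assumptions for illustration:

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_config(path: str, defaults: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Read a JSON experiment config and overlay it on top of `defaults`.

    Keys present in the file win; keys only in `defaults` are kept.
    This is a shallow merge: nested dicts are replaced, not merged.
    """
    merged = dict(defaults or {})
    with Path(path).open(encoding="utf-8") as f:
        merged.update(json.load(f))
    return merged
```

For example, `cfg = load_config("best_second_round_config_working.json")` would return the file's contents as a plain dict.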

Results

Hyperparameter tuning results are stored in tuning/experiment_results/ and report metrics including:

Precision@K
Recall@K
NDCG@K
MRR (Mean Reciprocal Rank)
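For reference, one common binary-relevance formulation of these metrics for a single playlist is sketched below, where `ranked` is the recommendation list and `relevant` is the set of held-out tracks. The tuning scripts may use different conventions (for example graded relevance or a different Precision@K denominator), so treat this as an illustration rather than the project's evaluation code:

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for t in ranked[:k] if t in relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top k divided by the ideal DCG."""
    dcg = sum(1 / math.log2(i + 2) for i, t in enumerate(ranked[:k]) if t in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item (0 if none). MRR is the mean of
    this value over all evaluation playlists."""
    for i, t in enumerate(ranked, start=1):
        if t in relevant:
            return 1 / i
    return 0.0
```

Averaging each function over all held-out playlists gives the dataset-level score.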

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

[Add your license here]

Acknowledgments

Million Playlist Dataset (MPD) for providing the training data
Item2Vec: Neural Item Embedding for Collaborative Filtering (Barkan and Koenigstein, 2016) for the embedding approach
Various open-source libraries used in this project
