How It Works

This Music RAG Data Scraper collects guitar tabs, music theory content, and educational materials for AI training. It processes both local files and web content, then stores everything in a searchable ChromaDB vector database.

Phase 1: Local File Processing

- Parses TuxGuitar-compatible Guitar Pro files (.gp3/.gp4/.gp5) for guitar tablature
- Processes MIDI files to extract musical notes and timing
- Reads ASCII tab files and chord charts
- Converts everything into structured JSON chunks
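As a rough illustration of the ASCII tab step, the sketch below groups consecutive tab lines into one JSON-ready chunk per staff block. The function name `chunk_ascii_tab`, the regular expression, and the chunk fields (`id`, `type`, `source`, `strings`, `content`) are assumptions made for this example, not the scraper's actual schema.

```python
import json
import re

# Assumed shape of a tab line: a string name, a bar, then frets and dashes,
# e.g. "e|--0--2--3--|". Real-world tabs vary, so this pattern is only a guess.
TAB_LINE = re.compile(r"^\s*([A-Ga-g][#b]?)\s*\|(.+)$")


def chunk_ascii_tab(text, source):
    """Group consecutive tab lines into chunks, one per staff block."""
    chunks, block = [], []
    # The trailing "" guarantees the last block is flushed.
    for line in text.splitlines() + [""]:
        if TAB_LINE.match(line):
            block.append(line.strip())
        elif block:
            chunks.append({
                "id": f"{source}#{len(chunks)}",  # hypothetical id scheme
                "type": "ascii_tab",
                "source": source,
                "strings": [TAB_LINE.match(row).group(1) for row in block],
                "content": "\n".join(block),
            })
            block = []
    return chunks


if __name__ == "__main__":
    sample = "Intro\ne|--0--2--|\nB|--1--3--|\n\nVerse\ne|--3--5--|\n"
    print(json.dumps(chunk_ascii_tab(sample, "song.txt"), indent=2))
```

Non-tab lines such as section labels are used only as block separators here; a fuller version might attach them to the following chunk as a title.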

Phase 2: Web Scraping

- Scrapes educational music websites (Wikipedia, music theory sites, etc.)
- Attempts to get Ultimate Guitar tabs
- Processes GitHub awesome-guitar resources
- Extracts and chunks text content for AI training
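The extract-and-chunk step could look roughly like the following, using only the Python standard library. The helper names `html_to_text` and `chunk_words`, and the default chunk size and overlap, are assumptions for illustration; the scraper itself may use a different parser and chunking strategy.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, size=200, overlap=20):
    """Split text into chunks of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the tail is already covered by this chunk
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of some duplicated text in the index.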

Data Storage

- Saves all content as JSON files
- Stores the chunks in a ChromaDB vector database for semantic search
- Creates searchable chunks with metadata (source, difficulty, topic, etc.)
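A minimal sketch of the storage step, assuming the `chromadb` Python package is installed and that each chunk is a dict with `id` and `content` keys plus flat metadata fields. The collection name `music_chunks` and both function names are hypothetical; the database path follows the output layout described in this README.

```python
def to_chroma_records(chunks):
    """Split chunk dicts into the parallel lists that ChromaDB expects."""
    ids, documents, metadatas = [], [], []
    for chunk in chunks:
        ids.append(chunk["id"])
        documents.append(chunk["content"])
        # ChromaDB metadata values must be scalars, so drop lists and dicts.
        meta = {
            key: value
            for key, value in chunk.items()
            if key not in ("id", "content")
            and isinstance(value, (str, int, float, bool))
        }
        meta.setdefault("source", "unknown")  # keep the metadata non-empty
        metadatas.append(meta)
    return ids, documents, metadatas


def store_and_search(chunks, query, path="music_rag_data/chroma_db", n_results=3):
    """Upsert chunks into a persistent collection, then run one query."""
    import chromadb  # third-party; imported here so the helper above has no dependency

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection("music_chunks")  # assumed name
    ids, documents, metadatas = to_chroma_records(chunks)
    collection.upsert(ids=ids, documents=documents, metadatas=metadatas)
    return collection.query(query_texts=[query], n_results=n_results)
```

With no embedding function configured, ChromaDB falls back to its default embedding model, which may need to be downloaded the first time it runs.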

Output Files

The scraper creates several JSON files in the music_rag_data/ directory:

- phase1_local_file_chunks.json - Local file processing results
- educational_websites.json - Web scraped educational content
- ultimate_guitar_tabs.json - Guitar tabs (if successful)
- awesome_guitar_resources.json - GitHub resources
- complete_music_dataset.json - Combined dataset
- chroma_db/ - ChromaDB vector database
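One plausible way the combined dataset could be assembled from the per-source files is sketched below. It assumes each file holds a JSON list of chunk objects, which this README does not confirm, and the function name `build_combined_dataset` is hypothetical.

```python
import json
from pathlib import Path

# Per-source files named in this README; any that are missing are skipped.
PART_FILES = [
    "phase1_local_file_chunks.json",
    "educational_websites.json",
    "ultimate_guitar_tabs.json",
    "awesome_guitar_resources.json",
]


def build_combined_dataset(data_dir="music_rag_data"):
    """Concatenate the per-source files into complete_music_dataset.json."""
    data_dir = Path(data_dir)
    combined = []
    for name in PART_FILES:
        path = data_dir / name
        if not path.exists():
            continue  # e.g. the Ultimate Guitar scrape did not succeed
        items = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: each file is a JSON list; wrap a lone object just in case.
        combined.extend(items if isinstance(items, list) else [items])
    out_path = data_dir / "complete_music_dataset.json"
    out_path.write_text(json.dumps(combined, indent=2), encoding="utf-8")
    return combined
```

Skipping missing files lets the merge still produce a usable dataset when an optional source, such as the Ultimate Guitar tabs, yields nothing.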
