vlm3 - BMW E30 M3 Vision Language Model Project

Build a Vision-Language Model that understands BMW E30 M3 service documentation. This project provides the complete stack: scraping community knowledge, processing service manuals into training data, and fine-tuning VLMs.

Project Status

Component	Status	Description
Data Pipeline	✅ Complete	12,410 Q&A pairs from service manuals
Scraper	✅ Implemented	Web scraper for community knowledge
Training	⚙️ Config Only	Qwen2-VL-7B LoRA fine-tuning ready
Evaluation	📋 Planned	DeepEval framework with Claude-as-judge

Quick Start

Environment Setup

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your_key  # Required for pipeline Stages 3-4

Run Data Pipeline

make all      # Full pipeline: inventory → prepare → classify → generate → filter → emit
make status   # Check progress

Output: training_data/vlm_train.jsonl (11,154 examples) + training_data/vlm_val.jsonl (1,256 examples)

Run Scraper

# See scraper/README.md for full usage
python scraper/01_discover_forums.py    # Discover site structure
python scraper/02_scrape_threads.py     # Scrape thread listings
python scraper/03_scrape_posts.py       # Download post content
python scraper/04_download_images.py    # Download images

Project Structure

vlm3/
├── pipeline/                 # Data processing pipeline
│   ├── scripts/              # Stages 01-09
│   ├── tests/                # pytest suite
│   └── config.yaml           # Pipeline configuration
│
├── scraper/                  # Web scraper for community knowledge
│   ├── 01_discover_forums.py # Discover site structure
│   ├── 02_scrape_threads.py  # Scrape thread listings
│   ├── 03_scrape_posts.py    # Download post content
│   ├── 04_download_images.py # Download images
│   ├── core.py               # HTTP client, checkpointing
│   ├── parser.py             # HTML parsing
│   └── tests/                # Scraper tests
│
├── training/                 # VLM fine-tuning (⚙️)
│   └── configs/
│       └── lora_qwen2vl.yaml # LoRA training config
│
├── eval/                     # Model evaluation (📋)
│   └── benchmarks/
│       └── manual_probes.json
│
├── data_src/                 # Source materials (read-only)
├── work/                     # Pipeline intermediates
├── training_data/            # Final outputs
├── forum_archive/            # Scraped web data
│
├── Makefile                  # Pipeline orchestration
└── specs/                    # Project specifications

Data Pipeline

Converts scanned service manual pages into VLM training data using Claude's vision capabilities—no OCR needed.

Pipeline Flow

data_src/ (JPG/PDF/HTML)
    ↓
Stage 1: Inventory    → work/inventory.csv
Stage 2: Prepare      → work/inventory_prepared.csv (PDF→JPG)
Stage 3: Classify     → work/classified/pages.csv [Claude API]
Stage 4: Generate Q&A → work/qa_raw/*.json [Claude API]
Stage 5: Filter       → work/qa_filtered/*.json → work/qa_unique/*.json
Stage 6: Emit         → training_data/vlm_train.jsonl + vlm_val.jsonl

Source Materials

Source	Format	Content
Service Manual (00-97)	JPG scans	Procedures, specs, diagrams
Electrical Manual	JPG scans	Wiring, pinouts, flowcharts
Bosch Motronic ML 3-1	JPG scans	ECU signals, parameters
Getrag 265/5 Rebuild	PDF	Transmission procedures
Tech specs (HTML)	HTML	Vehicle specifications

Output Format

{
  "image": "images/21-03.jpg",
  "conversations": [
    {"role": "user", "content": "What should I inspect the clutch for?"},
    {"role": "assistant", "content": "Inspect for cracks, wear, and burnt spots..."}
  ],
  "metadata": {
    "page_id": "21-03",
    "section_name": "Clutch",
    "content_type": "procedure",
    "question_type": "inspection"
  }
}

Make Targets

Target	Description
`make all`	Complete pipeline
`make status`	Show progress
`make quick`	Skip Stages 1-2
`make regen-qa`	Regenerate from Stage 4
`make refilter`	Rerun from Stage 5
`make clean`	Clean intermediates

Scraper

Collects E30 M3 community knowledge from vBulletin forums for additional training data.

Features

Rate limiting: Polite scraping with randomized 1.5-2.5s delays
Checkpoint/resume: Stop and restart without losing progress
Structured storage: Raw HTML + parsed JSON
Image downloading: Downloads embedded images with references

Usage

# Discover forum structure
python scraper/01_discover_forums.py

# Scrape specific forum
python scraper/02_scrape_threads.py --forum-id 42
python scraper/03_scrape_posts.py --forum-id 42
python scraper/04_download_images.py --forum-id 42

# Or scrape everything
python scraper/02_scrape_threads.py --all
python scraper/03_scrape_posts.py --all
python scraper/04_download_images.py --all

See scraper/README.md for detailed usage and configuration.

Training Infrastructure

Fine-tune Qwen2-VL-7B-Instruct using LoRA on Modal GPU cloud.

Configuration (`training/configs/lora_qwen2vl.yaml`)

Parameter	Value
Base Model	Qwen2-VL-7B-Instruct
Method	LoRA (rank 64, alpha 128)
Quantization	4-bit (nf4, bfloat16)
Training	3 epochs, batch 16 (4×4 accumulation)
Learning Rate	2e-4, cosine decay
GPU	A100-80GB (~$8-16 estimated cost)

Planned Scripts

prepare_dataset.py - Convert JSONL → HuggingFace Dataset
modal_train.py - LoRA training on Modal
modal_serve.py - Inference endpoint

Evaluation (Planned)

DeepEval-based framework using Claude-as-judge.

Planned Metrics

Metric	Purpose	Threshold
AnswerRelevancy	Does answer address question?	>0.7
Faithfulness	Is answer grounded in image?	>0.7
NumericExactMatch	Torque specs, measurements	>0.85
KeywordPresence	Required technical terms	>0.80

Approach

Baseline evaluation on unmodified Qwen2-VL-7B
Post-training evaluation
Manual probe benchmarks (20-30 critical questions)

Requirements

pip install -r requirements.txt

Key dependencies:

anthropic - Claude API for classification/Q&A
pillow, opencv-python - Image processing
pdf2image - PDF conversion
sentence-transformers - Semantic deduplication
requests, beautifulsoup4 - Web scraping
datasets, huggingface_hub - Dataset management

Testing

pytest pipeline/tests/           # Pipeline tests
pytest scraper/tests/            # Scraper tests
pytest -v                        # Verbose
pytest -k "classify"             # Pattern match

License

Research/educational purposes. Check original BMW service manual licensing.

Acknowledgments

BMW E30 M3 service manuals
Anthropic Claude for vision-based Q&A generation
E30 M3 enthusiast community

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vlm3 - BMW E30 M3 Vision Language Model Project

Project Status

Quick Start

Environment Setup

Run Data Pipeline

Run Scraper

Project Structure

Data Pipeline

Pipeline Flow

Source Materials

Output Format

Make Targets

Scraper

Features

Usage

Training Infrastructure

Configuration (`training/configs/lora_qwen2vl.yaml`)

Planned Scripts

Evaluation (Planned)

Planned Metrics

Approach

Requirements

Testing

License

Acknowledgments

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.github/workflows		.github/workflows
data_src		data_src
eval		eval
pipeline		pipeline
scraper		scraper
specs		specs
training		training
training_data		training_data
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Makefile		Makefile
README.md		README.md
requirements.txt		requirements.txt

drumwell/vlm3

Folders and files

Latest commit

History

Repository files navigation

vlm3 - BMW E30 M3 Vision Language Model Project

Project Status

Quick Start

Environment Setup

Run Data Pipeline

Run Scraper

Project Structure

Data Pipeline

Pipeline Flow

Source Materials

Output Format

Make Targets

Scraper

Features

Usage

Training Infrastructure

Configuration (training/configs/lora_qwen2vl.yaml)

Planned Scripts

Evaluation (Planned)

Planned Metrics

Approach

Requirements

Testing

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Configuration (`training/configs/lora_qwen2vl.yaml`)

Packages