Transform physical notebook images into structured markdown notes using AI vision models.
- Multiple model support: Choose between local TrOCR, Claude API, or local Ollama vision models
- Image optimization: Automatic resizing, compression, and optional grayscale conversion to reduce token usage
- Template-based output: Customizable markdown templates for consistent note formatting
- Test-driven development: Comprehensive pytest suite with 73% test coverage
Installation:

```bash
git clone <repository-url>
cd notebook-parser
uv sync
```

Claude API setup:

- Get your API key from the Anthropic Console
- Set your API key:

```bash
export ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
```

- Parse a notebook image:

```bash
uv run notebook-parser parse -i notebook.jpg -o note.md --model claude
```

Ollama setup:

- Install and start Ollama:
```bash
brew install ollama  # macOS
ollama serve
ollama pull llama3.2-vision
```

- Parse a notebook image:

```bash
uv run notebook-parser parse -i notebook.jpg -o note.md --model ollama
```

TrOCR (default model, no setup required):

```bash
uv run notebook-parser parse -i notebook.jpg -o note.md --model local
```

Usage:

```
notebook-parser parse [OPTIONS]
```

Required Options:
- `-i, --input PATH`: Input image file
- `-o, --output PATH`: Output markdown file
Model Options:
- `--model [local|claude|ollama]`: Model to use (default: local)
  - `local`: TrOCR (basic OCR, lower quality)
  - `claude`: Claude 3.5 Sonnet vision API (best quality, requires API key)
  - `ollama`: Local Ollama vision model (good quality, fully private)
Image Optimization Options:
- `--optimize/--no-optimize`: Optimize image for LLM vision (default: True)
- `--grayscale`: Convert to grayscale to save tokens (~3x reduction)
Claude-specific Options:
- `--api-key TEXT`: Anthropic API key (or set the `ANTHROPIC_API_KEY` env var)
Ollama-specific Options:
- `--ollama-model TEXT`: Ollama model name (default: llama3.2-vision)
- `--ollama-url TEXT`: Ollama API endpoint (default: http://localhost:11434)
Template Options:
- `-t, --template PATH`: Custom template file (default: templates/note-template.md)
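For example, the model, Ollama, and template options can be combined in a single invocation; the remote host and template path below are placeholders for illustration, not defaults shipped with the project:

```bash
# Use an Ollama server on another machine (example address) with a custom template (example path)
uv run notebook-parser parse -i notebook.jpg -o note.md \
  --model ollama \
  --ollama-model llama3.2-vision \
  --ollama-url http://192.168.1.50:11434 \
  --template my-template.md
```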
Extract text quickly without template formatting:

```bash
notebook-parser read notebook.jpg
```

Examples:

```bash
# Parse with Claude
export ANTHROPIC_API_KEY=sk-ant-api03-xxx
uv run notebook-parser parse -i page1.jpg -o page1.md --model claude

# Parse with Claude, converting the image to grayscale first
uv run notebook-parser parse -i page1.jpg -o page1.md --model claude --grayscale

# Parse with a specific Ollama model
uv run notebook-parser parse -i page1.jpg -o page1.md --model ollama --ollama-model llava

# Use a custom template
uv run notebook-parser parse -i page1.jpg -o page1.md --template my-template.md

# Skip image optimization
uv run notebook-parser parse -i page1.jpg -o page1.md --model claude --no-optimize
```

The default template creates notes with this structure:
```markdown
**Title**: <extracted-title>
**Source**: <image-filename>
**Date**: <current-date>
**Tags**: #notes #handwritten
**Status**: Raw Note

## Key Idea
<extracted-content>

## Why It Matters
*To be filled*

## How I Might Use It
*To be filled*
```

Image optimization: when using Claude or Ollama models with `--optimize` enabled (the default):
- Claude: Images resized to max 1568px, JPEG quality 85
- Ollama: Images resized to max 1024px, JPEG quality 75
- Grayscale: Optional flag reduces token usage by ~3x
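notebook-parser applies these adjustments automatically. If you want to pre-shrink scans yourself before handing them to the tool (or to any other vision model), roughly the same transform can be done with ImageMagick; this is an optional manual sketch of the Claude preset, not something notebook-parser requires:

```bash
# Optional manual pre-optimization with ImageMagick 7 (not required by notebook-parser):
# cap the longest side at 1568px and re-encode as JPEG at quality 85
magick notebook.jpg -resize "1568x1568>" -quality 85 notebook-optimized.jpg

# same, but also convert to grayscale to cut token usage further
magick notebook.jpg -colorspace Gray -resize "1568x1568>" -quality 85 notebook-optimized-gray.jpg
```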
Run the tests:

```bash
uv run pytest -v
uv run pytest --cov=src
```

Model comparison:

| Model | Quality | Speed | Cost | Privacy |
|---|---|---|---|---|
| TrOCR (local) | Low | Fast | Free | Full |
| Ollama (local) | Good | Medium | Free | Full |
| Claude API | Best | Fast | $3/1K images* | API calls |
*Estimated cost based on average image size and token usage
Troubleshooting:

Claude API errors:

- Ensure `ANTHROPIC_API_KEY` is set correctly
- Check that your API key has the proper permissions at console.anthropic.com
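A quick way to confirm the key is actually exported in the shell that runs the command (this prints whether it is set, not the value itself):

```bash
# Check that ANTHROPIC_API_KEY is exported without echoing the secret
if [ -n "${ANTHROPIC_API_KEY}" ]; then echo "ANTHROPIC_API_KEY is set"; else echo "ANTHROPIC_API_KEY is NOT set"; fi
```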
Ollama connection errors:

- Verify Ollama is running: `ollama list`
- Check the Ollama URL: `curl http://localhost:11434/api/tags`
- Pull the model: `ollama pull llama3.2-vision`
Poor OCR quality:

- Use `--model claude` or `--model ollama` for better handwriting recognition
- TrOCR works best with typed text, not handwritten notes