A local offline OCR tool powered by PaddleOCR. Extracts text from images and returns structured JSON.
Supports three modes of use: CLI / MCP Server (for AI Agents) / TUI Interactive Wizard (Textual terminal UI).
- Fully offline — Models run locally; no internet required, data never leaves your machine
- CPU inference — No GPU or CUDA needed; works on Apple Silicon / Intel / Linux
- Three interfaces:
  - `sauron ocr`: Command-line OCR with JSON or plain text output
  - `sauron serve`: MCP stdio server for AI Agents such as Claude Desktop
  - `sauron` (no arguments): Launch the TUI wizard to pick directories, preview the scan, confirm, and batch process
- Structured output — Each text region includes coordinates, confidence score, and text, sorted in reading order
- Portable design — Models and configuration live alongside the app directory; copy the entire directory to migrate
- Python >=3.10, <3.13
- uv package manager
# Clone the repository
git clone https://github.com/truewatch/Sauron.git
cd Sauron
# Install dependencies
uv sync
# (Optional) Install TUI wizard support
uv sync --extra tui
# (Optional) Install development dependencies
uv sync --extra dev --extra tui
uv run python download_models.py
Models are approximately 16 MB total and are downloaded to the models/ directory inside the app directory. See the Model Management section for details.
You can run without downloading first: PaddleOCR will automatically download models from its servers to `~/.paddleocr/` on first use, but this requires an internet connection.
# Single image -> JSON (output to stdout)
sauron ocr screenshot.png
# Multiple images -> JSON array
sauron ocr img1.png img2.jpg img3.bmp
# Text only
sauron ocr --text-only screenshot.png
# Batch mode: scan the ./in/ directory, write results to ./out/ (preserving directory structure)
sauron ocr
# Launch the TUI wizard (requires textual)
sauron
| Condition | Behavior |
|---|---|
| TTY + textual installed | Launch TUI wizard |
| TTY + textual not installed | Print help + installation instructions |
| Non-TTY (pipe, etc.) | Print help |
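The dispatch rules in the table above can be sketched in a few lines of Python. This is an illustrative sketch, not Sauron's actual code: the function names and return labels are hypothetical, and checking `sys.stdout` (rather than another stream) for a TTY is an assumption.

```python
import importlib.util
import sys


def choose_startup_action(is_tty: bool, has_textual: bool) -> str:
    """Return the action for a bare `sauron` call, following the table above."""
    if not is_tty:
        return "print_help"
    if has_textual:
        return "launch_wizard"
    return "print_help_with_install_hint"


def detect_environment() -> tuple:
    """Probe the real environment: is stdout a terminal, is textual importable?"""
    is_tty = sys.stdout.isatty()  # assumption: stdout is the stream that matters
    has_textual = importlib.util.find_spec("textual") is not None
    return is_tty, has_textual


if __name__ == "__main__":
    print(choose_startup_action(*detect_environment()))
```

Keeping the decision in a pure function separate from the environment probe makes the three table rows easy to test without a real terminal.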
Perform OCR on one or more images. Supports two modes:
Mode 1 (specify images, output to stdout):
sauron ocr [OPTIONS] IMAGES...
Mode 2 (batch directory, no arguments, write to files):
sauron ocr [OPTIONS]
When no IMAGES are provided, the command automatically scans the in/ subdirectory under the current directory, runs OCR on all images, and writes results to the out/ subdirectory, fully preserving the directory structure from in/.
project-directory/
├── in/                  <- Place images to process here
│   ├── page1.png
│   └── chapter2/
│       ├── fig1.jpg
│       └── fig2.png
└── out/                 <- Generated automatically, mirrors the structure
    ├── page1.json
    └── chapter2/
        ├── fig1.json
        └── fig2.json
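The mirroring shown in the tree above amounts to a relative-path mapping. The following is a minimal sketch using `pathlib`; the function name and signature are hypothetical and are not taken from Sauron's source.

```python
from pathlib import Path


def mirrored_output_path(
    image: Path, in_root: Path, out_root: Path, text_only: bool = False
) -> Path:
    """Map in/<sub>/<name>.<ext> to out/<sub>/<name>.json (or .txt)."""
    relative = image.relative_to(in_root)  # e.g. chapter2/fig1.jpg
    suffix = ".txt" if text_only else ".json"
    return (out_root / relative).with_suffix(suffix)


target = mirrored_output_path(Path("in/chapter2/fig1.jpg"), Path("in"), Path("out"))
print(target.as_posix())  # out/chapter2/fig1.json
```

Before writing a result, a caller would typically create the parent directory with `target.parent.mkdir(parents=True, exist_ok=True)`.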
| Option | Description |
|---|---|
| `--text-only` | Output text only. In specified-image mode, output goes to stdout; in batch mode, writes .txt files |
| `--lang LANG` | OCR language, defaults to ch (Chinese). Supports en, japan, etc. |
| `--model-dir PATH` | Specify model directory, overrides all other sources |
| `--json-pretty` / `--no-json-pretty` | Whether to pretty-print JSON output |
| `--tui` / `--no-tui` | Enable/disable progress panel (enabled by default on TTY) |
Specified-image mode output:
- Single image: JSON object (stdout)
- Multiple images: JSON array (stdout)
- `--text-only`: Plain text (stdout), errors go to stderr
Batch directory mode output:
- Each image produces a corresponding `.json` (or `.txt` with `--text-only`) file
- Progress and summary are output to stderr: `Done: N succeeded, M failed in 12.34s, output at ./out`
- The `in X.XXs` wall-clock figure includes model lazy-load on the first call
Exit codes:
- `0`: All succeeded
- `1`: At least one image failed OCR
- `2`: `--tui` specified but textual not installed, or no arguments and `./in/` does not exist
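A script that wraps the CLI can branch on these exit codes. The sketch below assumes `sauron` is on `PATH`; the helper names are hypothetical, and whether stdout still carries JSON when the exit code is 1 is not stated here, so the parser tolerates non-JSON output.

```python
import json
import subprocess

EXIT_MEANINGS = {
    0: "all images succeeded",
    1: "at least one image failed OCR",
    2: "usage problem (textual missing for --tui, or ./in/ not found)",
}


def describe_exit_code(code: int) -> str:
    """Translate a `sauron ocr` exit code into the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_ocr(image_path: str) -> tuple:
    """Run `sauron ocr` on one image; return (exit code, parsed JSON or None)."""
    proc = subprocess.run(
        ["sauron", "ocr", image_path],
        capture_output=True,
        text=True,
        check=False,  # inspect the exit code ourselves instead of raising
    )
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None  # stdout was empty or not JSON
    return proc.returncode, payload
```

Calling `run_ocr("screenshot.png")` and passing the first element to `describe_exit_code` gives a readable status line for logs.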
Output example:
{
"success": true,
"file": "/path/to/screenshot.png",
"results": [
{
"text": "Hello World",
"confidence": 0.96,
"box": [[10, 20], [200, 20], [200, 50], [10, 50]]
}
],
"full_text": "Hello World"
}
Start the MCP stdio server for use by AI Agents.
sauron serve [--lang LANG] [--model-dir PATH]
No output is produced after startup (stdio mode). Use `sauron info` to verify the environment.
Print environment diagnostic information.
sauron: 0.1.0.dev0
python: 3.12.8
platform: macOS-15.4-arm64-arm-64bit
app_dir: /Users/you/Sauron
settings: app (/Users/you/Sauron/settings.json)
model_dir: app (/Users/you/Sauron/models)
det: ok
cls: ok
rec: ok
paddleocr: 2.9.1
paddle: 3.0.0
textual: installed
Sauron exposes four tools to AI Agents via MCP (Model Context Protocol). All tool responses include an elapsed_ms field (end-to-end wall-clock, includes lazy model load on the first call).
Implementation note: blocking PaddleOCR work is offloaded to a worker thread so the asyncio event loop stays responsive. This avoids Claude Desktop's `-32001 Request timed out` during the slow first call. Individual tool calls are still bounded by the client-side timeout (~60 s on Claude Desktop), so for very large directories prefer `ocr_extract_text_directory` with `output_dir` (summary-only response) or split work across several `ocr_extract_text_batch` calls.
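The worker-thread pattern described in the note can be illustrated with `asyncio.to_thread` from the standard library. This is a sketch of the general technique, not Sauron's server code: `blocking_ocr` is a hypothetical stand-in that only sleeps, and the heartbeat task exists solely to show that the event loop keeps running.

```python
import asyncio
import time


def blocking_ocr(image_path: str) -> dict:
    """Stand-in for a slow, blocking OCR call (hypothetical, not Sauron's code)."""
    time.sleep(0.2)
    return {"success": True, "file": image_path, "full_text": ""}


async def handle_tool_call(image_path: str) -> dict:
    """Run the blocking call in a worker thread and attach elapsed_ms."""
    started = time.perf_counter()
    result = await asyncio.to_thread(blocking_ocr, image_path)
    result["elapsed_ms"] = int((time.perf_counter() - started) * 1000)
    return result


async def main() -> tuple:
    """Count event-loop ticks that happen while the OCR call is in flight."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await handle_tool_call("/abs/path/to/image.png")
    beat.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
print(result["elapsed_ms"], ticks)
```

If `blocking_ocr` were called directly inside the coroutine, the heartbeat would record no ticks until the call returned, which is the situation that leads to client-side timeouts.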
Extract text from a single image file.
{
"image_path": "/absolute/path/to/image.png",
"text_only": false
}
Response: full OCRResult dict with `elapsed_ms` at the top level (omitted when `text_only=true`, which returns raw text).
Extract text from multiple image files in one call.
{
"image_paths": ["/abs/1.png", "/abs/2.jpg"],
"text_only": false
}
Response: `{"elapsed_ms": <int>, "results": [...]}`, where `results` is aligned with `image_paths`.
Scan a directory for images and OCR them all.
{
"directory": "/abs/path/to/images",
"recursive": true,
"output_dir": "/abs/path/to/out",
"output_format": "json",
"text_only": false
}
Two response shapes:
- Inline (no `output_dir`): `{mode:"inline", directory, total, succeeded, failed, elapsed_ms, files:[...]}`.
- Written (`output_dir` set): `{mode:"written", directory, output_dir, output_format, total, succeeded, failed, elapsed_ms}`. Per-file content is written to disk; read it from the mirrored output files.
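A client can branch on the `mode` field to handle both shapes. The sketch below uses only the field names listed above; the function name is hypothetical, and the contents of each `files` entry are not specified here, so the code only counts them.

```python
def summarize_directory_response(response: dict) -> str:
    """Build a one-line summary from either response shape."""
    seconds = response["elapsed_ms"] / 1000
    line = (
        f"{response['succeeded']}/{response['total']} succeeded, "
        f"{response['failed']} failed in {seconds:.2f}s"
    )
    if response["mode"] == "written":
        # Per-file content lives on disk; point the reader at it.
        return f"{line}, results under {response['output_dir']}"
    # Inline mode carries per-file results in `files`.
    return f"{line}, {len(response['files'])} results inline"


written = {
    "mode": "written", "directory": "/abs/in", "output_dir": "/abs/out",
    "output_format": "json", "total": 3, "succeeded": 2, "failed": 1,
    "elapsed_ms": 1500,
}
print(summarize_directory_response(written))
# 2/3 succeeded, 1 failed in 1.50s, results under /abs/out
```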
Extract text from a base64-encoded image (supports data:image/...;base64, prefix).
{
"image_base64": "iVBORw0KGgoAAAANS...",
"text_only": false
}
Response: full OCRResult dict plus `elapsed_ms` (omitted when `text_only=true`).
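Accepting both bare base64 and a `data:image/...;base64,` prefix comes down to dropping everything up to the first comma before decoding. The following is a sketch of that idea with a hypothetical helper name; it does not reproduce Sauron's own parsing.

```python
import base64


def decode_image_payload(image_base64: str) -> bytes:
    """Decode base64 image data, dropping an optional data-URL prefix."""
    if image_base64.startswith("data:"):
        # A data URL looks like "data:image/png;base64,<payload>".
        _, _, image_base64 = image_base64.partition(",")
    return base64.b64decode(image_base64, validate=True)


png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
payload = "data:image/png;base64," + base64.b64encode(png_signature).decode("ascii")
print(decode_image_payload(payload) == png_signature)  # True
```

On the client side, the same `base64.b64encode(...).decode("ascii")` call produces the string to place in `image_base64`.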
Add the following to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"sauron": {
"command": "uv",
"args": [
"--directory",
"/absolute/path/to/Sauron",
"run",
"sauron",
"serve"
]
}
}
}
Use `uv --directory` so that uv can automatically locate `.venv` and `pyproject.toml`. Do not hard-code the venv path.
A template file is provided in the repository: claude_desktop_config.example.json.
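If the configuration file already lists other MCP servers, the `sauron` entry has to be merged in rather than pasted over them. A small sketch of that merge follows; the helper name and the `other` server are hypothetical, and reading or writing the actual file is left to the caller.

```python
import json


def add_sauron_server(config: dict, repo_dir: str) -> dict:
    """Return a copy of the config with a `sauron` entry under mcpServers."""
    servers = dict(config.get("mcpServers", {}))
    servers["sauron"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "sauron", "serve"],
    }
    return {**config, "mcpServers": servers}


existing = {"mcpServers": {"other": {"command": "other-tool", "args": []}}}
updated = add_sauron_server(existing, "/absolute/path/to/Sauron")
print(json.dumps(updated, indent=2))
```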
After installing textual, run sauron with no arguments to enter a 5-screen interactive wizard:
| Screen | Content | Actions |
|---|---|---|
| 1. Select image directory | File tree + path input | Browse or type a path, Enter to confirm |
| 2. Select output directory | File tree + path input + format selector (JSON/TXT/Both) | Choose directory and output format |
| 3. Pre-scan report | File count, total size, extension breakdown, estimated time, preview of first 20 files | Continue or go back to modify |
| 4. Confirm execution | Summary of all settings | Execute / Go back / Cancel |
| 5. Execution progress | Progress bar + current file + success/failure counts | Wait for completion, Enter to exit |
Keyboard shortcuts:
| Key | Action |
|---|---|
| `Enter` | Confirm / Next step |
| `Esc` | Go back to previous screen |
| `Ctrl-C` | Cancel immediately (during execution, results completed so far are preserved) |
Output rules:
- Results are written to the user-selected output directory with filenames `<original-name>.json` / `<original-name>.txt`
- Existing files with the same name are overwritten
- A single file failure does not interrupt the entire process
- The completion screen shows `Succeeded / Failed / Elapsed / Output` counters
- After exiting, a one-line summary is printed to stderr: `Done: N succeeded, M failed in 12.34s, output at <path>`
Installing TUI support:
uv sync --extra tui
All fields are optional; missing fields use default values.
{
"model_dir": null,
"lang": "ch",
"output_format": "json",
"last_input_dir": null,
"last_output_dir": null,
"json_pretty": false
}
| Field | Type | Default | Description |
|---|---|---|---|
| `model_dir` | string \| null | null | Model directory. Absolute path or relative to the settings.json location |
| `lang` | string | `"ch"` | OCR language |
| `output_format` | string | `"json"` | Default output format for the TUI wizard: json / txt / both |
| `last_input_dir` | string \| null | null | TUI memory: last used image directory |
| `last_output_dir` | string \| null | null | TUI memory: last used output directory |
| `json_pretty` | bool | false | Whether CLI pretty-prints JSON by default |
Template file: settings.example.json
In order of priority (first match is loaded; no cross-file merging):
- File specified by the `$SAURON_CONFIG` environment variable
- `app_dir()/settings.json` (inside the app directory, portable-first)
- `~/.sauron/settings.json` (user directory fallback)
If the configuration file cannot be read (missing, invalid JSON, wrong field types), defaults are used silently without errors.
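The lookup order and the silent fallback can be sketched as follows. This is an illustration under stated assumptions, not the code in `paths.py`: the function names are hypothetical, `app_dir` is passed in rather than computed, a location with no file is skipped in favor of the next one, and per-field type validation is omitted.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULTS = {
    "model_dir": None,
    "lang": "ch",
    "output_format": "json",
    "last_input_dir": None,
    "last_output_dir": None,
    "json_pretty": False,
}


def candidate_paths(app_dir: Path, env: Mapping[str, str]) -> list:
    """List settings locations in priority order."""
    paths = []
    if env.get("SAURON_CONFIG"):
        paths.append(Path(env["SAURON_CONFIG"]))
    paths.append(app_dir / "settings.json")
    paths.append(Path.home() / ".sauron" / "settings.json")
    return paths


def load_settings(app_dir: Path, env: Optional[Mapping[str, str]] = None) -> dict:
    """Load the first settings file found; use defaults if it cannot be parsed."""
    env = os.environ if env is None else env
    for path in candidate_paths(app_dir, env):
        if not path.is_file():
            continue  # assumption: a missing file means "try the next location"
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return dict(DEFAULTS)  # unreadable or invalid JSON: defaults, no error
        if not isinstance(data, dict):
            return dict(DEFAULTS)
        # No cross-file merging: only this one file is overlaid on the defaults.
        known = {key: value for key, value in data.items() if key in DEFAULTS}
        return {**DEFAULTS, **known}
    return dict(DEFAULTS)
```

For example, `load_settings(Path("/Users/you/Sauron"))` would return the defaults with any fields from the first file found overlaid on top.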
| Variable | Description |
|---|---|
| `SAURON_CONFIG` | Absolute path to settings.json |
| `SAURON_MODEL_DIR` | Absolute path to the model directory (second highest priority after `--model-dir`) |
CLI flags such as --model-dir and --lang always take precedence over settings.json and environment variables.
# Download to the default models/ directory inside the app directory
uv run python download_models.py
# Specify a custom directory
uv run python download_models.py --output-dir /path/to/models
# Force download to the user directory
uv run python download_models.py --to-user-dir
| Model | Purpose | Size |
|---|---|---|
| det | Text region detection | ~4.7MB |
| cls | Orientation classification | ~1.4MB |
| rec | Character recognition | ~10MB |
Existing model directories are automatically skipped.
At runtime, Sauron searches for the model directory in the following order (first match is used):
- `--model-dir` CLI argument
- `$SAURON_MODEL_DIR` environment variable
- `model_dir` field in `settings.json`
- `app_dir()/models/` (inside the app directory)
- `~/.sauron/models/` (user directory)
- None of the above match: PaddleOCR downloads automatically to `~/.paddleocr/`
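The search order above can be expressed as a first-match loop. In this sketch the function name and the source labels are hypothetical (only `app` appears in the `sauron info` output), a "match" is assumed to mean "is set and is an existing directory", and resolving a relative `model_dir` against the settings.json location is left out.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_dir(
    cli_model_dir: Optional[str],
    env: Mapping[str, str],
    settings_model_dir: Optional[str],
    app_dir: Path,
    home: Path,
) -> tuple:
    """Return (path, source) for the first candidate that is set and exists."""
    candidates = [
        ("cli", cli_model_dir),
        ("env", env.get("SAURON_MODEL_DIR")),
        ("settings", settings_model_dir),
        ("app", app_dir / "models"),
        ("user", home / ".sauron" / "models"),
    ]
    for source, value in candidates:
        if value and Path(value).is_dir():
            return Path(value), source
    # Nothing matched: PaddleOCR falls back to its own download under ~/.paddleocr/.
    return None, "paddleocr-default"
```

Returning the source label alongside the path makes it straightforward to print a diagnostic line such as `model_dir: app (/Users/you/Sauron/models)`.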
Build a standalone executable using PyInstaller:
# Make sure models are downloaded first
uv run python download_models.py
# Build (default: onedir mode)
bash build.sh
# Single-file mode
bash build.sh --onefile
# Specify model directory
bash build.sh --model-dir /path/to/models
Build output is in the dist/sauron/ directory.
# Install all dependencies (including dev + tui)
make dev
# Or
uv sync --extra dev --extra tui
make test
# Or
uv run pytest -v
Current tests: 153 passed, 2 skipped (E2E tests require real models).
| Target | Description |
|---|---|
| `make dev` | Install all development dependencies |
| `make test` | Run tests |
| `make lint` | Run ruff linter and formatter check (read-only, same as CI) |
| `make format` | Auto-fix with ruff format + ruff check --fix |
| `make install-hooks` | Install a git pre-commit hook mirroring the CI lint job |
| `make models` | Download models |
| `make build` | PyInstaller build |
| `make clean` | Clean build artifacts |
| `make release` | Release final version (see Releasing) |
| `make release-alpha` | Release alpha version |
| `make release-beta` | Release beta version |
| `make release-rc` | Release candidate |
CI enforces ruff format --check and ruff check. To catch failures locally before pushing, install the provided git hook once per clone:
make install-hooks # writes .git/hooks/pre-commit
On every git commit the hook runs the same checks as CI on staged src/ and tests/ Python files. When it fires, run:
make format # ruff format + ruff check --fix
git add -u && git commit # re-stage and retry
`make format` is also safe to run at any time; it is the fastest way to clear CI lint failures.
uv run textual run --dev src/sauron/tui/app.py
Sauron follows PEP 440 versioning:
| Stage | Format | Example | PyPI | GitHub Release |
|---|---|---|---|---|
| Development | `X.Y.Z.devN` | `0.1.0.dev0` | - | - |
| Alpha | `X.Y.ZaN` | `0.2.0a1` | - | Pre-release |
| Beta | `X.Y.ZbN` | `0.2.0b1` | - | Pre-release |
| Release Candidate | `X.Y.ZrcN` | `0.2.0rc1` | - | Pre-release |
| Final | `X.Y.Z` | `0.2.0` | Published | Release |
The version is defined in src/sauron/__init__.py (single source of truth). pyproject.toml reads it dynamically via hatchling — you never need to edit both files.
The patch version auto-increments. You only need to specify VERSION=X.Y.Z when bumping major or minor versions.
# Common workflow: alpha → beta → rc → final
make release-alpha # 0.1.0 → 0.1.1a1
make release-alpha # 0.1.1a1 → 0.1.1a2 (keep iterating)
make release-beta # 0.1.1a2 → 0.1.1b1 (promote)
make release-rc # 0.1.1b1 → 0.1.1rc1 (promote)
make release # 0.1.1rc1 → 0.1.1 (finalize)
# Next cycle auto-bumps patch
make release-alpha # 0.1.1 → 0.1.2a1
# Manual major/minor bump (only when needed)
make release-alpha VERSION=0.2.0 # → 0.2.0a1
make release VERSION=1.0.0 # → 1.0.0
# Preview without making changes
make release-alpha ARGS="--dry-run"
# Create commit and tag but don't push
make release ARGS="--no-push"
| Current | Command | Next | Logic |
|---|---|---|---|
| `0.1.0` | `make release` | `0.1.1` | patch++ |
| `0.1.0` | `make release-alpha` | `0.1.1a1` | patch++ then a1 |
| `0.1.1a1` | `make release-alpha` | `0.1.1a2` | alpha num++ |
| `0.1.1a2` | `make release-beta` | `0.1.1b1` | promote to beta |
| `0.1.1b1` | `make release-beta` | `0.1.1b2` | beta num++ |
| `0.1.1b2` | `make release-rc` | `0.1.1rc1` | promote to rc |
| `0.1.1rc1` | `make release-rc` | `0.1.1rc2` | rc num++ |
| `0.1.1rc2` | `make release` | `0.1.1` | strip pre-release (finalize) |
| `0.1.0.dev0` | `make release-alpha` | `0.1.0a1` | dev to alpha (same base) |
| `0.1.0.dev0` | `make release` | `0.1.0` | dev to final (same base) |
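The bump rules in the table can be captured in a short function. This is a sketch that reproduces the table rows, not the logic of `scripts/release.sh`: the function name and the `target` labels are hypothetical, the `VERSION=X.Y.Z` override is not modeled, and rejecting a move to an earlier stage (for example rc back to alpha) is an assumption, since the table does not cover that case.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+)|\.dev(\d+))?$")
_ORDER = {"a": 0, "b": 1, "rc": 2}


def next_version(current: str, target: str) -> str:
    """Compute the next version; `target` is "a", "b", "rc", or "final"."""
    if target != "final" and target not in _ORDER:
        raise ValueError(f"unknown target: {target}")
    match = _VERSION.match(current)
    if match is None:
        raise ValueError(f"unsupported version: {current}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    stage, number, dev = match.group(4), match.group(5), match.group(6)

    if stage is None and dev is None:
        # Starting from a final release: bump the patch number first.
        base = f"{major}.{minor}.{patch + 1}"
        return base if target == "final" else f"{base}{target}1"

    base = f"{major}.{minor}.{patch}"
    if target == "final":
        return base  # strip the pre-release or dev suffix
    if dev is not None:
        return f"{base}{target}1"  # dev to pre-release, same base
    if target == stage:
        return f"{base}{stage}{int(number) + 1}"  # iterate within the stage
    if _ORDER[target] > _ORDER[stage]:
        return f"{base}{target}1"  # promote to a later stage
    raise ValueError(f"cannot go back from {stage} to {target}")


print(next_version("0.1.0", "a"))       # 0.1.1a1
print(next_version("0.1.1a2", "b"))     # 0.1.1b1
print(next_version("0.1.0.dev0", "final"))  # 0.1.0
```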
make release-alpha
  │
  ├─ 1. Safety checks (clean tree, on master, tag available)
  ├─ 2. Run tests (uv run pytest)
  ├─ 3. Update version in src/sauron/__init__.py
  ├─ 4. Verify wheel builds (uv build)
  ├─ 5. Git commit: "release: v0.1.1a1"
  ├─ 6. Git tag: v0.1.1a1
  └─ 7. Prompt: "Push commit and tag to origin? [Y/n]"
       │
       └─ Y ─→ git push --follow-tags
                 │
                 └─ GitHub receives tag v0.1.1a1
                      │
                      ├─ release.yml: test → build wheel → create GitHub Release
                      │    (alpha/beta/rc marked as Pre-release)
                      │
                      └─ build.yml: build PyInstaller binaries
                           ├─ sauron-macos-arm64.zip
                           ├─ sauron-macos-x86_64.zip
                           └─ sauron-linux-x86_64.tar.gz
                           (attached to the GitHub Release)
Final releases (X.Y.Z without pre-release suffix) are additionally published to PyPI.
The release script can also be invoked directly:
./scripts/release.sh [--alpha|--beta|--rc] [--version X.Y.Z] [--dry-run] [--no-push]
| Option | Description |
|---|---|
| `--alpha` | Target alpha pre-release |
| `--beta` | Target beta pre-release |
| `--rc` | Target release candidate |
| `--version X.Y.Z` | Manually set major.minor.patch |
| `--dry-run` | Preview actions without making changes |
| `--no-push` | Create commit and tag locally, do not push |
| `--allow-branch B` | Allow releasing from a branch other than master |
| Workflow | Trigger | What it does |
|---|---|---|
| CI (`ci.yml`) | Push to master, PRs | Lint (ruff) + test matrix (2 OS x 3 Python versions) + build check |
| Release (`release.yml`) | Push tag `v*` | Test → build wheel → create GitHub Release → publish to PyPI (final only) |
| Build Binaries (`build.yml`) | Release published | Build PyInstaller binaries for macOS (arm64 + x86_64) and Linux (x86_64) |
The project maintains a CHANGELOG.md following the Keep a Changelog format. Update it before each release to document notable changes.
To enable automatic PyPI publishing for final releases, configure Trusted Publishers on pypi.org:
- Go to https://pypi.org/manage/project/sauron-ocr/settings/publishing/
- Add a new publisher: GitHub, repository `truewatch/Sauron`, workflow `release.yml`
Sauron/
├── CLAUDE.md                            # Contract-based specification document
├── README.md                            # This file
├── CHANGELOG.md                         # Release changelog
├── pyproject.toml                       # Project configuration and dependencies
├── uv.lock                              # Locked dependency versions
├── Makefile                             # Development shortcut commands
├── download_models.py                   # Model download script
├── build.sh                             # PyInstaller build script
├── scripts/
│   ├── release.sh                       # Release automation script
│   └── install-hooks.sh                 # Installs .git/hooks/pre-commit mirroring CI lint
├── .github/workflows/
│   ├── ci.yml                           # CI: lint + test + build check
│   ├── release.yml                      # Release: wheel + GitHub Release + PyPI
│   └── build.yml                        # Build: cross-platform PyInstaller binaries
├── claude_desktop_config.example.json   # Claude Desktop configuration template
├── settings.example.json                # settings.json template
├── models/                              # Model files (git-ignored)
│   ├── det/                             # Text detection model
│   ├── cls/                             # Orientation classification model
│   └── rec/                             # Character recognition model
├── src/sauron/
│   ├── __init__.py                      # Version number (single source of truth)
│   ├── paths.py                         # Path resolution and config loading
│   ├── ocr_engine.py                    # OCR engine (PaddleOCR wrapper)
│   ├── progress.py                      # Progress event definitions
│   ├── scanner.py                       # Directory scanning
│   ├── cli.py                           # CLI entry point (click)
│   ├── mcp_server.py                    # MCP stdio server
│   └── tui/                             # TUI wizard (requires textual)
│       ├── __init__.py                  # run_wizard() entry point
│       ├── app.py                       # WizardApp screen orchestration
│       ├── state.py                     # State machine (pure dataclass)
│       ├── styles.tcss                  # Styles
│       └── screens/                     # 5 wizard screens
└── tests/                               # Test suite
    ├── test_paths.py                    # Paths and configuration
    ├── test_engine.py                   # OCR engine
    ├── test_scanner.py                  # Directory scanning
    ├── test_wizard_flow.py              # Wizard state machine
    ├── test_progress.py                 # Progress events
    ├── test_cli.py                      # CLI commands
    ├── test_mcp.py                      # MCP server
    └── test_e2e.py                      # End-to-end tests
Contributions are welcome! Please read CONTRIBUTING.md before submitting a pull request.
This project follows the Contributor Covenant code of conduct.
To report a vulnerability, see SECURITY.md.