Sauron

Chinese version (README_zh.md) | English

A local offline OCR tool powered by PaddleOCR. Extracts text from images and returns structured JSON.

Supports three modes of use: CLI / MCP Server (for AI Agents) / TUI Interactive Wizard (Textual terminal UI).

Features

- Fully offline: models run locally; no internet required, and data never leaves your machine
- CPU inference: no GPU or CUDA needed; works on Apple Silicon / Intel / Linux
- Three interfaces:
  - sauron ocr: command-line OCR with JSON or plain-text output
  - sauron serve: MCP stdio server for AI Agents such as Claude Desktop
  - sauron: launches the TUI wizard when run with no arguments (pick directories, preview the scan, confirm, batch process)
- Structured output: each text region includes coordinates, a confidence score, and text, sorted in reading order
- Portable design: models and configuration live inside the app directory; copy the entire directory to migrate

Quick Start

Prerequisites

Python >=3.10, <3.13
uv package manager

Installation

# Clone the repository
git clone https://github.com/truewatch/Sauron.git
cd Sauron

# Install dependencies
uv sync

# (Optional) Install TUI wizard support
uv sync --extra tui

# (Optional) Install development dependencies
uv sync --extra dev --extra tui

Download Models

uv run python download_models.py

Models are approximately 16 MB total and are downloaded to the models/ directory inside the app directory. See the Model Management section for details.

You can run without downloading first — PaddleOCR will automatically download models from its servers to ~/.paddleocr/ on first use, but this requires an internet connection.

Run OCR

# Single image -> JSON (output to stdout)
sauron ocr screenshot.png

# Multiple images -> JSON array
sauron ocr img1.png img2.jpg img3.bmp

# Text only
sauron ocr --text-only screenshot.png

# Batch mode: scan the ./in/ directory, write results to ./out/ (preserving directory structure)
sauron ocr

# Launch the TUI wizard (requires textual)
sauron

CLI Usage

`sauron` (no subcommand)

Condition	Behavior
TTY + textual installed	Launch TUI wizard
TTY + textual not installed	Print help + installation instructions
Non-TTY (pipe, etc.)	Print help
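
The decision table above can be sketched in a few lines of Python. This is only an illustration, not Sauron's actual code: the function names and the returned action labels are hypothetical, and checking stdout (rather than stdin or stderr) for a TTY is an assumption.

```python
import importlib.util
import sys


def choose_startup_action(is_tty: bool, textual_installed: bool) -> str:
    """Return the action for a bare `sauron` call, following the table above."""
    if not is_tty:
        # Pipes and redirects never get the interactive wizard.
        return "print_help"
    if textual_installed:
        return "launch_tui"
    return "print_help_with_install_hint"


def detect_environment() -> tuple:
    """Probe the current process (which stream Sauron inspects is an assumption)."""
    is_tty = sys.stdout.isatty()
    textual_installed = importlib.util.find_spec("textual") is not None
    return is_tty, textual_installed
```

Calling `choose_startup_action(*detect_environment())` would then give the action for the current terminal.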

`sauron ocr IMAGES...`

Perform OCR on one or more images. Supports two modes:

Mode 1 — Specify images (output to stdout):

sauron ocr [OPTIONS] IMAGES...

Mode 2 — Batch directory (no arguments, write to files):

sauron ocr [OPTIONS]

When no IMAGES are provided, automatically scans the in/ subdirectory under the current directory, runs OCR on all images, and writes results to the out/ subdirectory, fully preserving the directory structure from in/.

project-directory/
├── in/                    <- Place images to process here
│   ├── page1.png
│   └── chapter2/
│       ├── fig1.jpg
│       └── fig2.png
└── out/                   <- Generated automatically, mirrors the structure
    ├── page1.json
    └── chapter2/
        ├── fig1.json
        └── fig2.json
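
As a rough sketch of the mirroring rule shown in the tree above, the output path can be derived from the image path relative to in/. The helper name `mirrored_output_path` is hypothetical, and replacing the image extension with .json or .txt is inferred from the example tree rather than taken from Sauron's source.

```python
from pathlib import Path


def mirrored_output_path(image: Path, in_dir: Path, out_dir: Path,
                         text_only: bool = False) -> Path:
    """Map in/<sub>/<name>.<ext> to out/<sub>/<name>.json (or .txt)."""
    relative = image.relative_to(in_dir)  # raises ValueError if outside in_dir
    suffix = ".txt" if text_only else ".json"
    return out_dir / relative.with_suffix(suffix)
```

For example, `mirrored_output_path(Path("in/chapter2/fig1.jpg"), Path("in"), Path("out"))` yields `out/chapter2/fig1.json`, matching the tree.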

Option	Description
`--text-only`	Output text only. In specified-image mode, output goes to stdout; in batch mode, writes `.txt` files
`--lang LANG`	OCR language, defaults to `ch` (Chinese). Supports `en`, `japan`, etc.
`--model-dir PATH`	Specify model directory, overrides all other sources
`--json-pretty / --no-json-pretty`	Whether to pretty-print JSON output
`--tui / --no-tui`	Enable/disable progress panel (enabled by default on TTY)

Specified-image mode output:

- Single image: JSON object (stdout)
- Multiple images: JSON array (stdout)
- --text-only: plain text (stdout); errors go to stderr

Batch directory mode output:

- Each image produces a corresponding .json file (or .txt with --text-only)
- Progress and a summary are written to stderr: Done: N succeeded, M failed in 12.34s, output at ./out
- The elapsed time in the summary (in X.XXs) is wall-clock time and includes lazy model loading on the first call
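
A script that wraps batch mode may want the numbers from that summary line. The following is a minimal sketch that parses the format quoted above; the helper name `parse_summary` is hypothetical, and the regular expression assumes the summary is printed exactly in that form.

```python
import re
from typing import Optional

# Pattern for: Done: N succeeded, M failed in 12.34s, output at ./out
_SUMMARY = re.compile(
    r"Done: (?P<succeeded>\d+) succeeded, (?P<failed>\d+) failed "
    r"in (?P<seconds>\d+(?:\.\d+)?)s, output at (?P<output>.+)"
)


def parse_summary(stderr_text: str) -> Optional[dict]:
    """Return the fields of the last summary line in stderr, or None if absent."""
    for line in reversed(stderr_text.splitlines()):
        match = _SUMMARY.search(line)
        if match:
            return {
                "succeeded": int(match["succeeded"]),
                "failed": int(match["failed"]),
                "seconds": float(match["seconds"]),
                "output": match["output"].strip(),
            }
    return None
```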

Exit codes:

0 — All succeeded
1 — At least one image failed OCR
2 — --tui specified but textual not installed, or no arguments and ./in/ does not exist
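
When calling the CLI from another program, the exit codes above can be mapped to messages. This is a sketch under the assumption that `sauron` is on PATH; `describe_exit_code` and `run_batch` are hypothetical helper names, and only `subprocess.run` from the standard library is used.

```python
import subprocess

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "all images succeeded",
    1: "at least one image failed OCR",
    2: "usage problem: --tui without textual, or no arguments and ./in/ missing",
}


def describe_exit_code(code: int) -> str:
    """Translate a sauron exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_batch(project_dir: str) -> str:
    """Run batch mode in project_dir (expects an in/ subdirectory there)."""
    completed = subprocess.run(
        ["sauron", "ocr"], cwd=project_dir, capture_output=True, text=True
    )
    return describe_exit_code(completed.returncode)
```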

Output example:

{
  "success": true,
  "file": "/path/to/screenshot.png",
  "results": [
    {
      "text": "Hello World",
      "confidence": 0.96,
      "box": [[10, 20], [200, 20], [200, 50], [10, 50]]
    }
  ],
  "full_text": "Hello World"
}
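
To illustrate how the output above might be consumed, here is a small sketch that keeps only high-confidence text. It handles both the single-object and the array form. The function name and the 0.9 threshold are arbitrary choices, and it assumes that a failed image carries "success": false.

```python
import json


def confident_text(ocr_json: str, min_confidence: float = 0.9) -> list:
    """Collect region texts whose confidence is at least min_confidence."""
    payload = json.loads(ocr_json)
    # Single image -> object, multiple images -> array (see the output rules above).
    documents = payload if isinstance(payload, list) else [payload]
    lines = []
    for doc in documents:
        if not doc.get("success"):
            continue  # assumption: failed images are flagged this way
        for region in doc.get("results", []):
            if region["confidence"] >= min_confidence:
                lines.append(region["text"])
    return lines
```

With the example output above as input, the function returns a list containing only "Hello World".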

`sauron serve`

Start the MCP stdio server for use by AI Agents.

sauron serve [--lang LANG] [--model-dir PATH]

No output is produced after startup (stdio mode). Use sauron info to verify the environment.

`sauron info`

Print environment diagnostic information.

sauron:       0.1.0.dev0
python:       3.12.8
platform:     macOS-15.4-arm64-arm-64bit
app_dir:      /Users/you/Sauron
settings:     app (/Users/you/Sauron/settings.json)
model_dir:    app (/Users/you/Sauron/models)
  det: ok
  cls: ok
  rec: ok
paddleocr:    2.9.1
paddle:       3.0.0
textual:      installed

MCP Server

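Sauron exposes four tools to AI Agents via MCP (Model Context Protocol). Structured tool responses include an elapsed_ms field (end-to-end wall-clock time in milliseconds, including lazy model loading on the first call); the single-image tools omit it when text_only=true, because they then return raw text.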

Implementation note: blocking PaddleOCR work is offloaded to a worker thread so the asyncio event loop stays responsive — this avoids Claude Desktop's -32001 Request timed out during the slow first call. Individual tool calls are still bounded by the client-side timeout (~60 s on Claude Desktop), so for very large directories prefer ocr_extract_text_directory with output_dir (summary-only response) or split work across several ocr_extract_text_batch calls.
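
The offloading pattern described in the note can be sketched with `asyncio.to_thread` from the standard library. This is not Sauron's actual server code: `blocking_ocr` is a hypothetical stand-in for the synchronous PaddleOCR call, and `handle_tool_call` is a made-up handler name.

```python
import asyncio
import time


def blocking_ocr(image_path: str) -> dict:
    """Hypothetical stand-in for a slow, synchronous OCR call."""
    time.sleep(0.05)  # simulates model load plus inference
    return {"file": image_path, "full_text": "stub"}


async def handle_tool_call(image_path: str) -> dict:
    """Run the blocking work in a worker thread so the event loop stays free."""
    started = time.perf_counter()
    result = await asyncio.to_thread(blocking_ocr, image_path)
    # End-to-end wall-clock time, as in the elapsed_ms field described above.
    result["elapsed_ms"] = int((time.perf_counter() - started) * 1000)
    return result
```

While the worker thread is busy, the event loop can still answer other protocol messages, which is the property the note relies on.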

Tool List

`ocr_extract_text`

Extract text from a single image file.

{
  "image_path": "/absolute/path/to/image.png",
  "text_only": false
}

Response: full OCRResult dict with elapsed_ms at the top level (omitted when text_only=true, which returns raw text).

`ocr_extract_text_batch`

Extract text from multiple image files in one call.

{
  "image_paths": ["/abs/1.png", "/abs/2.jpg"],
  "text_only": false
}

Response: {"elapsed_ms": <int>, "results": [...]} — results is aligned with image_paths.
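
Because results is aligned with image_paths, a client can pair the two by position. A minimal sketch follows; the helper name is hypothetical and the shape of each individual result entry is not assumed here.

```python
def pair_batch_results(image_paths: list, response: dict) -> dict:
    """Map each requested path to the result at the same index."""
    results = response["results"]
    if len(results) != len(image_paths):
        raise ValueError("results is expected to be aligned with image_paths")
    return dict(zip(image_paths, results))
```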

`ocr_extract_text_directory`

Scan a directory for images and OCR them all.

{
  "directory": "/abs/path/to/images",
  "recursive": true,
  "output_dir": "/abs/path/to/out",
  "output_format": "json",
  "text_only": false
}

Two response shapes:

Inline (no output_dir): {mode:"inline", directory, total, succeeded, failed, elapsed_ms, files:[...]}.
Written (output_dir set): {mode:"written", directory, output_dir, output_format, total, succeeded, failed, elapsed_ms} — per-file content is written to disk; read from the mirrored output files.
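
A client has to branch on the mode field to handle the two shapes. The sketch below uses only the keys listed above; the function name and the wording of the summary string are illustrative.

```python
def summarize_directory_response(response: dict) -> str:
    """Build a one-line summary for either response shape."""
    base = (
        f"{response['succeeded']}/{response['total']} succeeded "
        f"in {response['elapsed_ms']} ms"
    )
    if response["mode"] == "written":
        # Per-file content lives on disk under output_dir.
        return f"{base}; written to {response['output_dir']} as {response['output_format']}"
    if response["mode"] == "inline":
        # Per-file content is carried in the files list.
        return f"{base}; {len(response['files'])} results inline"
    raise ValueError(f"unknown mode: {response['mode']!r}")
```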

`ocr_extract_text_base64`

Extract text from a base64-encoded image (supports data:image/...;base64, prefix).

{
  "image_base64": "iVBORw0KGgoAAAANS...",
  "text_only": false
}

Response: full OCRResult dict plus elapsed_ms (omitted when text_only=true).
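
Building the arguments for this tool from raw image bytes might look like the sketch below. The helper name is hypothetical; the optional prefix follows the data:image/...;base64, form mentioned above.

```python
import base64
from typing import Optional


def build_base64_arguments(image_bytes: bytes, mime_type: Optional[str] = None,
                           text_only: bool = False) -> dict:
    """Encode image bytes for ocr_extract_text_base64, optionally as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if mime_type is not None:
        encoded = f"data:{mime_type};base64,{encoded}"
    return {"image_base64": encoded, "text_only": text_only}
```

For example, `build_base64_arguments(Path("shot.png").read_bytes(), "image/png")` would produce a data-URI payload for a hypothetical shot.png (with `Path` imported from pathlib).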

Claude Desktop Configuration

Add the following to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "sauron": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/Sauron",
        "run",
        "sauron",
        "serve"
      ]
    }
  }
}

Use uv --directory so that uv can automatically locate .venv and pyproject.toml. Do not hard-code the venv path.

A template file is provided in the repository: claude_desktop_config.example.json.
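
If the configuration file already contains other MCP servers, the sauron entry can be merged in rather than pasted by hand. The following sketch works on JSON strings so that it has no side effects; both function names are hypothetical, and reading and writing the actual file is left to the caller.

```python
import json


def sauron_server_entry(repo_dir: str) -> dict:
    """Build the server entry shown above for a given checkout path."""
    return {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "sauron", "serve"],
    }


def merge_into_config(existing_json: str, repo_dir: str) -> str:
    """Add or replace the sauron entry while keeping other servers intact."""
    config = json.loads(existing_json) if existing_json.strip() else {}
    config.setdefault("mcpServers", {})["sauron"] = sauron_server_entry(repo_dir)
    return json.dumps(config, indent=2)
```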

TUI Wizard

After installing textual, run sauron with no arguments to enter a 5-screen interactive wizard:

Screen	Content	Actions
1. Select image directory	File tree + path input	Browse or type a path, Enter to confirm
2. Select output directory	File tree + path input + format selector (JSON/TXT/Both)	Choose directory and output format
3. Pre-scan report	File count, total size, extension breakdown, estimated time, preview of first 20 files	Continue or go back to modify
4. Confirm execution	Summary of all settings	Execute / Go back / Cancel
5. Execution progress	Progress bar + current file + success/failure counts	Wait for completion, Enter to exit

Keyboard shortcuts:

Key	Action
`Enter`	Confirm / Next step
`Esc`	Go back to previous screen
`Ctrl-C`	Cancel immediately (during execution, results completed so far are preserved)

Output rules:

- Results are written to the user-selected output directory with filenames <original-name>.json / <original-name>.txt
- Existing files with the same name are overwritten
- A single file failure does not interrupt the entire process
- The completion screen shows Succeeded / Failed / Elapsed / Output counters
- After exiting, a one-line summary is printed to stderr: Done: N succeeded, M failed in 12.34s, output at <path>

Installing TUI support:

uv sync --extra tui

Configuration

settings.json

All fields are optional; missing fields use default values.

{
  "model_dir": null,
  "lang": "ch",
  "output_format": "json",
  "last_input_dir": null,
  "last_output_dir": null,
  "json_pretty": false
}

Field	Type	Default	Description
`model_dir`	string | null	null	Model directory. Absolute path or relative to the settings.json location
`lang`	string	`"ch"`	OCR language
`output_format`	string	`"json"`	Default output format for the TUI wizard: `json` / `txt` / `both`
`last_input_dir`	string | null	null	TUI memory: last used image directory
`last_output_dir`	string | null	null	TUI memory: last used output directory
`json_pretty`	bool	false	Whether CLI pretty-prints JSON by default

Template file: settings.example.json

Configuration File Lookup Order

In order of priority (first match is loaded; no cross-file merging):

1. File specified by the $SAURON_CONFIG environment variable
2. app_dir()/settings.json (inside the app directory, portable-first)
3. ~/.sauron/settings.json (user directory fallback)

If the configuration file cannot be read (missing, invalid JSON, wrong field types), defaults are used silently without errors.
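
The lookup order and the silent fallback can be sketched as follows. This is an illustration, not the code in paths.py: the function names are hypothetical, a candidate is assumed to match when the file exists, and per-field type checking is left out for brevity.

```python
import json
from pathlib import Path
from typing import Optional

# Defaults copied from the settings.json table above.
DEFAULTS = {
    "model_dir": None, "lang": "ch", "output_format": "json",
    "last_input_dir": None, "last_output_dir": None, "json_pretty": False,
}


def find_settings_file(app_dir: Path, home: Path, env: dict) -> Optional[Path]:
    """Return the first existing candidate; files are never merged."""
    candidates = []
    if env.get("SAURON_CONFIG"):
        candidates.append(Path(env["SAURON_CONFIG"]))
    candidates.append(app_dir / "settings.json")
    candidates.append(home / ".sauron" / "settings.json")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def load_settings(path: Optional[Path]) -> dict:
    """Load settings, falling back to defaults on any read or parse problem."""
    settings = dict(DEFAULTS)
    if path is None:
        return settings
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file or invalid JSON
        return settings
    if isinstance(loaded, dict):
        settings.update({k: v for k, v in loaded.items() if k in DEFAULTS})
    return settings
```

A caller would pass `dict(os.environ)` and `Path.home()` in real use; taking them as parameters keeps the sketch easy to test.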

Environment Variables

Variable	Description
`SAURON_CONFIG`	Absolute path to settings.json
`SAURON_MODEL_DIR`	Absolute path to the model directory (second highest priority after `--model-dir`)

CLI Flag Priority

CLI flags such as --model-dir and --lang always take precedence over settings.json and environment variables.

Model Management

Downloading Models

# Download to the default models/ directory inside the app directory
uv run python download_models.py

# Specify a custom directory
uv run python download_models.py --output-dir /path/to/models

# Force download to the user directory
uv run python download_models.py --to-user-dir

Model	Purpose	Size
det	Text region detection	~4.7MB
cls	Orientation classification	~1.4MB
rec	Character recognition	~10MB

Existing model directories are automatically skipped.

Model Lookup Priority

At runtime, Sauron searches for the model directory in the following order (first match is used):

1. --model-dir CLI argument
2. $SAURON_MODEL_DIR environment variable
3. model_dir field in settings.json
4. app_dir()/models/ (inside the app directory)
5. ~/.sauron/models/ (user directory)
6. If none of the above match, PaddleOCR downloads models automatically to ~/.paddleocr/
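
The priority chain above can be expressed as a single function. This is a sketch with a hypothetical name and signature. It assumes that the three explicit sources match as soon as they are set, that the two conventional directories match only when they exist, and that a relative model_dir is resolved against the settings.json location as described in the Configuration section.

```python
from pathlib import Path
from typing import Optional


def resolve_model_dir(cli_model_dir: Optional[str], env: dict,
                      settings_model_dir: Optional[str],
                      settings_file: Optional[Path],
                      app_dir: Path, home: Path) -> Optional[Path]:
    """Return the model directory, or None to let PaddleOCR auto-download."""
    if cli_model_dir:                      # 1. --model-dir
        return Path(cli_model_dir)
    if env.get("SAURON_MODEL_DIR"):        # 2. environment variable
        return Path(env["SAURON_MODEL_DIR"])
    if settings_model_dir:                 # 3. settings.json field
        candidate = Path(settings_model_dir)
        if not candidate.is_absolute() and settings_file is not None:
            candidate = settings_file.parent / candidate
        return candidate
    # 4 and 5: conventional locations, used only if present on disk.
    for candidate in (app_dir / "models", home / ".sauron" / "models"):
        if candidate.is_dir():
            return candidate
    return None                            # 6. fall back to ~/.paddleocr/
```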

Packaging & Distribution

Build a standalone executable using PyInstaller:

# Make sure models are downloaded first
uv run python download_models.py

# Build (default: onedir mode)
bash build.sh

# Single-file mode
bash build.sh --onefile

# Specify model directory
bash build.sh --model-dir /path/to/models

Build output is in the dist/sauron/ directory.

Development Guide

Setting Up the Environment

# Install all dependencies (including dev + tui)
make dev
# Or
uv sync --extra dev --extra tui

Running Tests

make test
# Or
uv run pytest -v

Current tests: 153 passed, 2 skipped (E2E tests require real models).

Makefile Targets

Target	Description
`make dev`	Install all development dependencies
`make test`	Run tests
`make lint`	Run ruff linter and formatter check (read-only, same as CI)
`make format`	Auto-fix with `ruff format` + `ruff check --fix`
`make install-hooks`	Install a git pre-commit hook mirroring the CI lint job
`make models`	Download models
`make build`	PyInstaller build
`make clean`	Clean build artifacts
`make release`	Release final version (see Releasing)
`make release-alpha`	Release alpha version
`make release-beta`	Release beta version
`make release-rc`	Release candidate

Lint & Pre-commit Hook

CI enforces ruff format --check and ruff check. To catch failures locally before pushing, install the provided git hook once per clone:

make install-hooks          # writes .git/hooks/pre-commit

On every git commit the hook runs the same checks as CI on staged src/ and tests/ Python files. When it fires, run:

make format                 # ruff format + ruff check --fix
git add -u && git commit    # re-stage and retry

make format is also safe to run any time — it's the fastest way to clear CI lint failures.

TUI Debugging

uv run textual run --dev src/sauron/tui/app.py

Releasing

Version Scheme

Sauron follows PEP 440 versioning:

Stage	Format	Example	PyPI	GitHub Release
Development	`X.Y.Z.devN`	`0.1.0.dev0`	—	—
Alpha	`X.Y.ZaN`	`0.2.0a1`	—	Pre-release
Beta	`X.Y.ZbN`	`0.2.0b1`	—	Pre-release
Release Candidate	`X.Y.ZrcN`	`0.2.0rc1`	—	Pre-release
Final	`X.Y.Z`	`0.2.0`	Published	Release

The version is defined in src/sauron/__init__.py (single source of truth). pyproject.toml reads it dynamically via hatchling — you never need to edit both files.

Release Commands

The patch version auto-increments. You only need to specify VERSION=X.Y.Z when bumping major or minor versions.

# Common workflow: alpha → beta → rc → final
make release-alpha                  # 0.1.0 → 0.1.1a1
make release-alpha                  # 0.1.1a1 → 0.1.1a2 (keep iterating)
make release-beta                   # 0.1.1a2 → 0.1.1b1 (promote)
make release-rc                     # 0.1.1b1 → 0.1.1rc1 (promote)
make release                        # 0.1.1rc1 → 0.1.1 (finalize)

# Next cycle auto-bumps patch
make release-alpha                  # 0.1.1 → 0.1.2a1

# Manual major/minor bump (only when needed)
make release-alpha VERSION=0.2.0    # → 0.2.0a1
make release VERSION=1.0.0          # → 1.0.0

# Preview without making changes
make release-alpha ARGS="--dry-run"

# Create commit and tag but don't push
make release ARGS="--no-push"

Auto-Increment Rules

Current	Command	Next	Logic
`0.1.0`	`make release`	`0.1.1`	patch++
`0.1.0`	`make release-alpha`	`0.1.1a1`	patch++ then a1
`0.1.1a1`	`make release-alpha`	`0.1.1a2`	alpha num++
`0.1.1a2`	`make release-beta`	`0.1.1b1`	promote to beta
`0.1.1b1`	`make release-beta`	`0.1.1b2`	beta num++
`0.1.1b2`	`make release-rc`	`0.1.1rc1`	promote to rc
`0.1.1rc1`	`make release-rc`	`0.1.1rc2`	rc num++
`0.1.1rc2`	`make release`	`0.1.1`	strip pre-release (finalize)
`0.1.0.dev0`	`make release-alpha`	`0.1.0a1`	dev to alpha (same base)
`0.1.0.dev0`	`make release`	`0.1.0`	dev to final (same base)
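
The rules in the table can be captured in a short function. This is a sketch of the logic only, not the code in scripts/release.sh: the function name is hypothetical, and the cases the table does not list (skipping a stage, such as alpha straight to rc, or moving back to an earlier stage) are handled by assumption, with the latter rejected.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+)|\.dev(\d+))?$")
_STAGE_ORDER = {"a": 0, "b": 1, "rc": 2}


def next_version(current: str, target: str) -> str:
    """Compute the next version; target is 'a', 'b', 'rc', or 'final'."""
    if target != "final" and target not in _STAGE_ORDER:
        raise ValueError(f"unknown target: {target}")
    match = _VERSION.match(current)
    if match is None:
        raise ValueError(f"unsupported version: {current}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    stage, stage_num, dev_num = match.group(4), match.group(5), match.group(6)

    if stage is None and dev_num is None:
        patch += 1  # current is a final release: start the next patch cycle
        base = f"{major}.{minor}.{patch}"
        return base if target == "final" else f"{base}{target}1"

    base = f"{major}.{minor}.{patch}"
    if target == "final":
        return base  # strip the pre-release or dev suffix
    if stage is None:
        return f"{base}{target}1"  # dev build: same base, first pre-release
    if target == stage:
        return f"{base}{stage}{int(stage_num) + 1}"  # iterate within a stage
    if _STAGE_ORDER[target] > _STAGE_ORDER[stage]:
        return f"{base}{target}1"  # promote to a later stage
    raise ValueError(f"cannot move from {stage} back to {target}")
```

Each row of the table corresponds to one call, for example `next_version("0.1.1a2", "b")` for the alpha-to-beta promotion.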

What Happens When You Release

make release-alpha
│
├─ 1. Safety checks (clean tree, on master, tag available)
├─ 2. Run tests (uv run pytest)
├─ 3. Update version in src/sauron/__init__.py
├─ 4. Verify wheel builds (uv build)
├─ 5. Git commit: "release: v0.1.1a1"
├─ 6. Git tag: v0.1.1a1
└─ 7. Prompt: "Push commit and tag to origin? [Y/n]"
     │
     └─ Y ─→ git push --follow-tags
              │
              └─ GitHub receives tag v0.1.1a1
                 │
                 ├─ release.yml: test → build wheel → create GitHub Release
                 │                      (alpha/beta/rc marked as Pre-release)
                 │
                 └─ build.yml: build PyInstaller binaries
                               ├─ sauron-macos-arm64.zip
                               ├─ sauron-macos-x86_64.zip
                               └─ sauron-linux-x86_64.tar.gz
                               (attached to the GitHub Release)

Final releases (X.Y.Z without pre-release suffix) are additionally published to PyPI.

Release Script Options

The release script can also be invoked directly:

./scripts/release.sh [--alpha|--beta|--rc] [--version X.Y.Z] [--dry-run] [--no-push] [--allow-branch B]

Option	Description
`--alpha`	Target alpha pre-release
`--beta`	Target beta pre-release
`--rc`	Target release candidate
`--version X.Y.Z`	Manually set major.minor.patch
`--dry-run`	Preview actions without making changes
`--no-push`	Create commit and tag locally, do not push
`--allow-branch B`	Allow releasing from a branch other than master

CI/CD Workflows

Workflow	Trigger	What it does
CI (`ci.yml`)	Push to master, PRs	Lint (ruff) + test matrix (2 OS x 3 Python versions) + build check
Release (`release.yml`)	Push tag `v*`	Test → build wheel → create GitHub Release → publish to PyPI (final only)
Build Binaries (`build.yml`)	Release published	Build PyInstaller binaries for macOS (arm64 + x86_64) and Linux (x86_64)

CHANGELOG

The project maintains a CHANGELOG.md following the Keep a Changelog format. Update it before each release to document notable changes.

PyPI Setup (one-time)

To enable automatic PyPI publishing for final releases, configure Trusted Publishers on pypi.org:

Go to https://pypi.org/manage/project/sauron-ocr/settings/publishing/
Add a new publisher: GitHub, repository truewatch/Sauron, workflow release.yml

Project Structure

Sauron/
├── CLAUDE.md                          # Contract-based specification document
├── README.md                          # This file
├── CHANGELOG.md                       # Release changelog
├── pyproject.toml                     # Project configuration and dependencies
├── uv.lock                            # Locked dependency versions
├── Makefile                           # Development shortcut commands
├── download_models.py                 # Model download script
├── build.sh                           # PyInstaller build script
├── scripts/
│   ├── release.sh                     # Release automation script
│   └── install-hooks.sh               # Installs .git/hooks/pre-commit mirroring CI lint
├── .github/workflows/
│   ├── ci.yml                         # CI: lint + test + build check
│   ├── release.yml                    # Release: wheel + GitHub Release + PyPI
│   └── build.yml                      # Build: cross-platform PyInstaller binaries
├── claude_desktop_config.example.json # Claude Desktop configuration template
├── settings.example.json              # settings.json template
├── models/                            # Model files (git-ignored)
│   ├── det/                           # Text detection model
│   ├── cls/                           # Orientation classification model
│   └── rec/                           # Character recognition model
├── src/sauron/
│   ├── __init__.py                    # Version number (single source of truth)
│   ├── paths.py                       # Path resolution and config loading
│   ├── ocr_engine.py                  # OCR engine (PaddleOCR wrapper)
│   ├── progress.py                    # Progress event definitions
│   ├── scanner.py                     # Directory scanning
│   ├── cli.py                         # CLI entry point (click)
│   ├── mcp_server.py                  # MCP stdio server
│   └── tui/                           # TUI wizard (requires textual)
│       ├── __init__.py                # run_wizard() entry point
│       ├── app.py                     # WizardApp screen orchestration
│       ├── state.py                   # State machine (pure dataclass)
│       ├── styles.tcss                # Styles
│       └── screens/                   # 5 wizard screens
└── tests/                             # Test suite
    ├── test_paths.py                  # Paths and configuration
    ├── test_engine.py                 # OCR engine
    ├── test_scanner.py                # Directory scanning
    ├── test_wizard_flow.py            # Wizard state machine
    ├── test_progress.py               # Progress events
    ├── test_cli.py                    # CLI commands
    ├── test_mcp.py                    # MCP server
    └── test_e2e.py                    # End-to-end tests

Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a pull request.

This project follows the Contributor Covenant code of conduct.

Security

To report a vulnerability, see SECURITY.md.

License

MIT
