foprc/sauron

Sauron

A local offline OCR tool powered by PaddleOCR. Extracts text from images and returns structured JSON.

Supports three modes of use: CLI / MCP Server (for AI Agents) / TUI Interactive Wizard (Textual terminal UI).

Features

  • Fully offline — Models run locally; no internet required, data never leaves your machine
  • CPU inference — No GPU or CUDA needed; works on Apple Silicon / Intel / Linux
  • Three interfaces:
    • sauron ocr — Command-line OCR with JSON or plain text output
    • sauron serve — MCP stdio server for AI Agents such as Claude Desktop
    • sauron — Launch the TUI wizard with no arguments: pick directories, preview scan, confirm, batch process
  • Structured output — Each text region includes coordinates, confidence score, and text, sorted in reading order
  • Portable design — Models and configuration live alongside the app directory; copy the entire directory to migrate

Quick Start

Prerequisites

  • Python >=3.10, <3.13
  • uv package manager

Installation

# Clone the repository
git clone https://github.com/truewatch/Sauron.git
cd Sauron

# Install dependencies
uv sync

# (Optional) Install TUI wizard support
uv sync --extra tui

# (Optional) Install development dependencies
uv sync --extra dev --extra tui

Download Models

uv run python download_models.py

Models are approximately 16 MB total and are downloaded to the models/ directory inside the app directory. See the Model Management section for details.

You can run without downloading first — PaddleOCR will automatically download models from its servers to ~/.paddleocr/ on first use, but this requires an internet connection.

Run OCR

# Single image -> JSON (output to stdout)
sauron ocr screenshot.png

# Multiple images -> JSON array
sauron ocr img1.png img2.jpg img3.bmp

# Text only
sauron ocr --text-only screenshot.png

# Batch mode: scan the ./in/ directory, write results to ./out/ (preserving directory structure)
sauron ocr

# Launch the TUI wizard (requires textual)
sauron

CLI Usage

sauron (no subcommand)

Condition Behavior
TTY + textual installed Launch TUI wizard
TTY + textual not installed Print help + installation instructions
Non-TTY (pipe, etc.) Print help

sauron ocr IMAGES...

Perform OCR on one or more images. Supports two modes:

Mode 1 — Specify images (output to stdout):

sauron ocr [OPTIONS] IMAGES...

Mode 2 — Batch directory (no arguments, write to files):

sauron ocr [OPTIONS]

When no IMAGES are provided, sauron ocr scans the in/ subdirectory of the current directory, runs OCR on every image it finds, and writes the results to the out/ subdirectory, mirroring the directory structure of in/ exactly.

project-directory/
├── in/                    <- Place images to process here
│   ├── page1.png
│   └── chapter2/
│       ├── fig1.jpg
│       └── fig2.png
└── out/                   <- Generated automatically, mirrors the structure
    ├── page1.json
    └── chapter2/
        ├── fig1.json
        └── fig2.json
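The mirroring rule shown in the tree can be sketched as a small path mapping. This is an illustration of the documented behavior, not Sauron's own code; mirror_output_path is a hypothetical helper, and it assumes the image extension is replaced by the result extension, as page1.png -> page1.json suggests.

```python
from pathlib import Path


def mirror_output_path(in_root: Path, image: Path, out_root: Path, suffix: str = ".json") -> Path:
    """Map an image under in_root to its result file under out_root (hypothetical helper).

    The path relative to in_root is kept, and the image extension is swapped
    for the result extension (".json", or ".txt" with --text-only).
    """
    relative = image.relative_to(in_root)
    return out_root / relative.with_suffix(suffix)


print(mirror_output_path(Path("in"), Path("in/chapter2/fig1.jpg"), Path("out")))
# out/chapter2/fig1.json (on POSIX)
```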

Option                             Description
--text-only                        Output text only. In specified-image mode, output goes to stdout; in batch mode, .txt files are written
--lang LANG                        OCR language, defaults to ch (Chinese). Supports en, japan, etc.
--model-dir PATH                   Model directory; overrides all other sources
--json-pretty / --no-json-pretty   Whether to pretty-print JSON output
--tui / --no-tui                   Enable/disable the progress panel (enabled by default on a TTY)

Specified-image mode output:

  • Single image: JSON object (stdout)
  • Multiple images: JSON array (stdout)
  • --text-only: Plain text (stdout), errors go to stderr

Batch directory mode output:

  • Each image produces a corresponding .json (or .txt with --text-only) file
  • Progress and summary are output to stderr: Done: N succeeded, M failed in 12.34s, output at ./out
  • The wall-clock time in the summary (in X.XXs) includes lazy-loading the model on the first call

Exit codes:

  • 0: all images succeeded
  • 1: at least one image failed OCR
  • 2: --tui was specified but textual is not installed, or no arguments were given and ./in/ does not exist
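For scripting around batch mode, a caller can combine the exit code with the summary line printed to stderr. The sketch below is a hypothetical wrapper: the regex assumes the summary format exactly as documented above, it passes the documented --no-tui flag to keep stderr plain, and it assumes sauron is on PATH (otherwise prefix the command with uv run).

```python
import re
import subprocess
from typing import Optional

EXIT_MEANINGS = {
    0: "all images succeeded",
    1: "at least one image failed OCR",
    2: "--tui without textual, or batch mode without ./in/",
}

# Matches: "Done: N succeeded, M failed in 12.34s, output at ./out"
_SUMMARY_RE = re.compile(
    r"Done: (?P<ok>\d+) succeeded, (?P<failed>\d+) failed "
    r"in (?P<secs>[\d.]+)s, output at (?P<out>.+)"
)


def parse_summary(stderr: str) -> Optional[dict]:
    """Return the last summary line found in stderr as a dict, or None."""
    for line in reversed(stderr.splitlines()):
        m = _SUMMARY_RE.search(line)
        if m:
            return {
                "succeeded": int(m["ok"]),
                "failed": int(m["failed"]),
                "seconds": float(m["secs"]),
                "output": m["out"],
            }
    return None


def run_batch() -> int:
    """Run batch mode in the current directory and report the outcome (hypothetical)."""
    proc = subprocess.run(["sauron", "ocr", "--no-tui"], capture_output=True, text=True)
    print(EXIT_MEANINGS.get(proc.returncode, f"unexpected exit code {proc.returncode}"))
    print(parse_summary(proc.stderr))
    return proc.returncode
```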

Output example:

{
  "success": true,
  "file": "/path/to/screenshot.png",
  "results": [
    {
      "text": "Hello World",
      "confidence": 0.96,
      "box": [[10, 20], [200, 20], [200, 50], [10, 50]]
    }
  ],
  "full_text": "Hello World"
}
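Because a single image yields a JSON object and several images yield a JSON array, consumers may want to normalize both shapes to a list. A minimal sketch, assuming only the fields shown in the example above; the 0.8 confidence threshold is an arbitrary illustration, and the contents of a failed result beyond "success" are not documented here, so failures are simply skipped.

```python
import json
from typing import Any, Dict, List


def as_result_list(stdout: str) -> List[Dict[str, Any]]:
    """Single image -> JSON object, several images -> JSON array; always return a list."""
    payload = json.loads(stdout)
    return payload if isinstance(payload, list) else [payload]


def low_confidence_regions(result: Dict[str, Any], threshold: float = 0.8) -> List[str]:
    """Return the text of regions whose confidence is below threshold."""
    if not result.get("success"):
        return []
    return [r["text"] for r in result.get("results", []) if r["confidence"] < threshold]


sample = """{"success": true, "file": "/path/to/screenshot.png",
  "results": [{"text": "Hello World", "confidence": 0.96,
               "box": [[10, 20], [200, 20], [200, 50], [10, 50]]}],
  "full_text": "Hello World"}"""

for res in as_result_list(sample):
    print(res["file"], res["full_text"], low_confidence_regions(res))
```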

sauron serve

Start the MCP stdio server for use by AI Agents.

sauron serve [--lang LANG] [--model-dir PATH]

The server prints nothing after startup, because in stdio mode stdout carries the MCP protocol messages. Use sauron info to verify the environment.

sauron info

Print environment diagnostic information.

sauron:       0.1.0.dev0
python:       3.12.8
platform:     macOS-15.4-arm64-arm-64bit
app_dir:      /Users/you/Sauron
settings:     app (/Users/you/Sauron/settings.json)
model_dir:    app (/Users/you/Sauron/models)
  det: ok
  cls: ok
  rec: ok
paddleocr:    2.9.1
paddle:       3.0.0
textual:      installed

MCP Server

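Sauron exposes four tools to AI Agents via MCP (Model Context Protocol). Every JSON tool response includes an elapsed_ms field (end-to-end wall-clock time, including the lazy model load on the first call); ocr_extract_text and ocr_extract_text_base64 omit it when text_only=true, because they then return raw text.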

Implementation note: blocking PaddleOCR work is offloaded to a worker thread so the asyncio event loop stays responsive — this avoids Claude Desktop's -32001 Request timed out during the slow first call. Individual tool calls are still bounded by the client-side timeout (~60 s on Claude Desktop), so for very large directories prefer ocr_extract_text_directory with output_dir (summary-only response) or split work across several ocr_extract_text_batch calls.
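The worker-thread pattern described in the note can be illustrated with asyncio.to_thread from the standard library. This is a generic sketch, not Sauron's mcp_server.py: blocking_ocr and handle_tool_call are hypothetical names, and time.sleep stands in for the slow PaddleOCR call. The heartbeat task keeps ticking while the blocking work runs in a thread; if blocking_ocr were called directly on the event loop, the heartbeat would not get to run until the call returned.

```python
import asyncio
import time


def blocking_ocr(path: str) -> dict:
    """Stand-in for a slow, blocking PaddleOCR call (hypothetical)."""
    time.sleep(0.3)
    return {"success": True, "file": path}


async def handle_tool_call(path: str) -> dict:
    start = time.perf_counter()
    # Offload the blocking call to a worker thread so the event loop stays responsive.
    result = await asyncio.to_thread(blocking_ocr, path)
    result["elapsed_ms"] = int((time.perf_counter() - start) * 1000)
    return result


async def main() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1  # only advances while the loop is free

    hb = asyncio.create_task(heartbeat())
    result = await handle_tool_call("/abs/1.png")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, ticks


result, ticks = asyncio.run(main())
print(result["elapsed_ms"], ticks)
```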

Tool List

ocr_extract_text

Extract text from a single image file.

{
  "image_path": "/absolute/path/to/image.png",
  "text_only": false
}

Response: full OCRResult dict with elapsed_ms at the top level (omitted when text_only=true, which returns raw text).
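On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 tools/call request, and the stdio transport sends each message as one line of JSON. The sketch below only builds that line for ocr_extract_text; a real client must first complete the MCP initialize handshake, which is omitted here, and tools_call_line is a hypothetical helper.

```python
import json


def tools_call_line(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines, so the message stays on one line.
    return json.dumps(message) + "\n"


line = tools_call_line(
    1, "ocr_extract_text", {"image_path": "/absolute/path/to/image.png", "text_only": False}
)
print(line, end="")
```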

ocr_extract_text_batch

Extract text from multiple image files in one call.

{
  "image_paths": ["/abs/1.png", "/abs/2.jpg"],
  "text_only": false
}

Response: {"elapsed_ms": <int>, "results": [...]}, where results is aligned with image_paths (one entry per input path, in the same order).
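Since results is aligned with image_paths, a caller can pair them by index. A minimal sketch, assuming each entry is an OCRResult dict with a "success" flag (the text_only=false case); split_batch is a hypothetical helper and the response below is a hand-written illustration.

```python
from typing import Any, Dict, List, Tuple


def split_batch(
    image_paths: List[str], response: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Pair each input path with its result (same index) and split by success."""
    results = response["results"]
    if len(results) != len(image_paths):
        raise ValueError("expected exactly one result per image path")
    succeeded: List[str] = []
    failed: List[str] = []
    for path, result in zip(image_paths, results):
        (succeeded if result.get("success") else failed).append(path)
    return succeeded, failed


response = {"elapsed_ms": 0, "results": [{"success": True}, {"success": False}]}
print(split_batch(["/abs/1.png", "/abs/2.jpg"], response))
```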

ocr_extract_text_directory

Scan a directory for images and OCR them all.

{
  "directory": "/abs/path/to/images",
  "recursive": true,
  "output_dir": "/abs/path/to/out",
  "output_format": "json",
  "text_only": false
}

Two response shapes:

  • Inline (no output_dir): {mode:"inline", directory, total, succeeded, failed, elapsed_ms, files:[...]}.
  • Written (output_dir set): {mode:"written", directory, output_dir, output_format, total, succeeded, failed, elapsed_ms} — per-file content is written to disk; read from the mirrored output files.
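A client can branch on the mode field to handle both response shapes. The sketch below uses only the keys listed above; describe_directory_response is a hypothetical helper.

```python
from typing import Any, Dict


def describe_directory_response(resp: Dict[str, Any]) -> str:
    """Summarize either response shape of ocr_extract_text_directory."""
    head = (
        f"{resp['succeeded']}/{resp['total']} succeeded "
        f"({resp['failed']} failed) in {resp['elapsed_ms']} ms"
    )
    if resp["mode"] == "inline":
        # Per-file results are embedded in the response itself.
        return f"{head}; {len(resp['files'])} file entries inline"
    if resp["mode"] == "written":
        # Only the summary comes back; per-file content lives under output_dir.
        return f"{head}; {resp['output_format']} files under {resp['output_dir']}"
    raise ValueError(f"unexpected mode: {resp['mode']!r}")
```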

ocr_extract_text_base64

Extract text from a base64-encoded image (supports data:image/...;base64, prefix).

{
  "image_base64": "iVBORw0KGgoAAAANS...",
  "text_only": false
}

Response: full OCRResult dict plus elapsed_ms (omitted when text_only=true).
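The image_base64 argument can be produced with Python's standard base64 module. A minimal sketch: base64_arguments is a hypothetical helper, and the image/png MIME type used for the optional data: prefix is an assumption that should match the actual file type.

```python
import base64
from pathlib import Path


def base64_arguments(image: Path, with_data_uri: bool = False, mime: str = "image/png") -> dict:
    """Build ocr_extract_text_base64 arguments from a local image file."""
    encoded = base64.b64encode(image.read_bytes()).decode("ascii")
    if with_data_uri:
        # The tool also accepts a data:image/...;base64, prefix.
        encoded = f"data:{mime};base64,{encoded}"
    return {"image_base64": encoded, "text_only": False}
```

A PNG file always begins with the same 8-byte signature, which is why encoded PNGs start with iVBORw0KGgo, as in the example above.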

Claude Desktop Configuration

Add the following to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "sauron": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/Sauron",
        "run",
        "sauron",
        "serve"
      ]
    }
  }
}

Use uv --directory so that uv can automatically locate .venv and pyproject.toml. Do not hard-code the venv path.

A template file is provided in the repository: claude_desktop_config.example.json.
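If the configuration file already lists other MCP servers, the sauron entry can be merged in rather than pasted over the whole file. The sketch below is a hypothetical helper, not a script shipped with Sauron; it reproduces the entry shown above and leaves other keys untouched. update_config_file is defined but not called, so nothing is written unless you invoke it.

```python
import copy
import json
from pathlib import Path


def add_sauron_server(config: dict, sauron_dir: str) -> dict:
    """Return a copy of a Claude Desktop config with the sauron entry added or replaced."""
    if not Path(sauron_dir).is_absolute():
        raise ValueError("sauron_dir must be an absolute path")
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["sauron"] = {
        "command": "uv",
        "args": ["--directory", sauron_dir, "run", "sauron", "serve"],
    }
    return merged


def update_config_file(path: Path, sauron_dir: str) -> None:
    """Read the existing config (if any), merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_sauron_server(config, sauron_dir), indent=2), encoding="utf-8")
```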


TUI Wizard

After installing textual, run sauron with no arguments to enter a 5-screen interactive wizard:

  1. Select image directory
     Content: file tree + path input
     Actions: browse or type a path, Enter to confirm
  2. Select output directory
     Content: file tree + path input + format selector (JSON/TXT/Both)
     Actions: choose the directory and output format
  3. Pre-scan report
     Content: file count, total size, extension breakdown, estimated time, preview of the first 20 files
     Actions: continue, or go back to modify
  4. Confirm execution
     Content: summary of all settings
     Actions: Execute / Go back / Cancel
  5. Execution progress
     Content: progress bar + current file + success/failure counts
     Actions: wait for completion, Enter to exit
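The pre-scan report on screen 3 boils down to a directory walk with a few aggregates. A minimal sketch using pathlib and collections.Counter; the IMAGE_EXTS set is an assumption (Sauron's actual list lives in scanner.py and may differ), prescan is a hypothetical name, and the estimated-time figure is omitted because it depends on per-image timing not described here.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

# Assumed set of image extensions; the real scanner may accept a different list.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def prescan(directory: Path, preview: int = 20) -> Dict[str, object]:
    """Count images recursively and summarize them like the pre-scan screen."""
    files = sorted(
        p for p in directory.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    return {
        "count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "by_extension": Counter(p.suffix.lower() for p in files),
        "preview": [str(p.relative_to(directory)) for p in files[:preview]],
    }
```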

Keyboard shortcuts:

Key      Action
Enter    Confirm / next step
Esc      Go back to the previous screen
Ctrl-C   Cancel immediately (during execution, results completed so far are preserved)

Output rules:

  • Results are written to the user-selected output directory with filenames <original-name>.json / <original-name>.txt
  • Existing files with the same name are overwritten
  • A single file failure does not interrupt the entire process
  • The completion screen shows Succeeded / Failed / Elapsed / Output counters
  • After exiting, a one-line summary is printed to stderr: Done: N succeeded, M failed in 12.34s, output at <path>

Installing TUI support:

uv sync --extra tui

Configuration

settings.json

All fields are optional; missing fields use default values.

{
  "model_dir": null,
  "lang": "ch",
  "output_format": "json",
  "last_input_dir": null,
  "last_output_dir": null,
  "json_pretty": false
}

Field             Type            Default   Description
model_dir         string | null   null      Model directory: an absolute path, or a path relative to the settings.json location
lang              string          "ch"      OCR language
output_format     string          "json"    Default output format for the TUI wizard: json / txt / both
last_input_dir    string | null   null      TUI memory: last used image directory
last_output_dir   string | null   null      TUI memory: last used output directory
json_pretty       bool            false     Whether the CLI pretty-prints JSON by default

Template file: settings.example.json

Configuration File Lookup Order

In order of priority (first match is loaded; no cross-file merging):

  1. File specified by the $SAURON_CONFIG environment variable
  2. app_dir()/settings.json (inside the app directory, portable-first)
  3. ~/.sauron/settings.json (user directory fallback)

If the configuration file cannot be read (missing, invalid JSON, wrong field types), defaults are used silently without errors.
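The lookup and fallback rules can be sketched as follows. This is an illustration, not paths.py. Three points are assumptions: a candidate counts as a match only if the file exists, so a missing $SAURON_CONFIG falls through to the next candidate; any read, parse, or type error discards the whole file in favor of defaults; and unknown keys are ignored. find_settings and load_settings are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

DEFAULTS: Dict[str, Any] = {
    "model_dir": None, "lang": "ch", "output_format": "json",
    "last_input_dir": None, "last_output_dir": None, "json_pretty": False,
}
_TYPES = {
    "model_dir": (str, type(None)), "lang": (str,), "output_format": (str,),
    "last_input_dir": (str, type(None)), "last_output_dir": (str, type(None)),
    "json_pretty": (bool,),
}


def find_settings(app_dir: Path, home: Path) -> Optional[Path]:
    """First existing file wins: $SAURON_CONFIG, app_dir/settings.json, ~/.sauron/settings.json."""
    candidates = []
    if os.environ.get("SAURON_CONFIG"):
        candidates.append(Path(os.environ["SAURON_CONFIG"]))
    candidates += [app_dir / "settings.json", home / ".sauron" / "settings.json"]
    return next((p for p in candidates if p.is_file()), None)


def load_settings(path: Optional[Path]) -> Dict[str, Any]:
    """Overlay the file on DEFAULTS; any problem silently yields plain defaults."""
    if path is None:
        return dict(DEFAULTS)
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file or invalid JSON
        return dict(DEFAULTS)
    if not isinstance(data, dict):
        return dict(DEFAULTS)
    for key, value in data.items():
        if key in _TYPES and not isinstance(value, _TYPES[key]):
            return dict(DEFAULTS)  # wrong field type
    return {**DEFAULTS, **{k: v for k, v in data.items() if k in DEFAULTS}}
```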

Environment Variables

Variable           Description
SAURON_CONFIG      Absolute path to settings.json
SAURON_MODEL_DIR   Absolute path to the model directory (second-highest priority, after --model-dir)

CLI Flag Priority

CLI flags such as --model-dir and --lang always take precedence over settings.json and environment variables.


Model Management

Downloading Models

# Download to the default models/ directory inside the app directory
uv run python download_models.py

# Specify a custom directory
uv run python download_models.py --output-dir /path/to/models

# Force download to the user directory
uv run python download_models.py --to-user-dir

Model   Purpose                      Size
det     Text region detection        ~4.7 MB
cls     Orientation classification   ~1.4 MB
rec     Character recognition        ~10 MB

Existing model directories are automatically skipped.

Model Lookup Priority

At runtime, Sauron searches for the model directory in the following order (first match is used):

  1. --model-dir CLI argument
  2. $SAURON_MODEL_DIR environment variable
  3. model_dir field in settings.json
  4. app_dir()/models/ (inside the app directory)
  5. ~/.sauron/models/ (user directory)
  6. None of the above match — PaddleOCR downloads automatically to ~/.paddleocr/
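The priority list above maps onto a short chain of checks. In this sketch, which is not Sauron's implementation, two behaviors are assumptions: explicit sources (CLI flag, environment variable, settings.json) are used as given without checking that they exist, while the two conventional locations count only if the directory exists. A relative model_dir from settings.json is resolved against the settings file's folder, as the settings table describes. resolve_model_dir is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_dir(
    cli_model_dir: Optional[str],
    settings_model_dir: Optional[str],
    settings_file: Optional[Path],
    app_dir: Path,
    home: Path,
) -> Optional[Path]:
    """Return the model directory, or None to let PaddleOCR download to ~/.paddleocr/."""
    # 1. --model-dir CLI argument
    if cli_model_dir:
        return Path(cli_model_dir)
    # 2. $SAURON_MODEL_DIR environment variable
    env_dir = os.environ.get("SAURON_MODEL_DIR")
    if env_dir:
        return Path(env_dir)
    # 3. model_dir in settings.json; relative paths are relative to the settings file
    if settings_model_dir:
        p = Path(settings_model_dir)
        if not p.is_absolute() and settings_file is not None:
            p = settings_file.parent / p
        return p
    # 4-5. app_dir()/models/, then ~/.sauron/models/, only if they exist
    for candidate in (app_dir / "models", home / ".sauron" / "models"):
        if candidate.is_dir():
            return candidate
    # 6. nothing found
    return None
```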

Packaging & Distribution

Build a standalone executable using PyInstaller:

# Make sure models are downloaded first
uv run python download_models.py

# Build (default: onedir mode)
bash build.sh

# Single-file mode
bash build.sh --onefile

# Specify model directory
bash build.sh --model-dir /path/to/models

Build output is in the dist/sauron/ directory.


Development Guide

Setting Up the Environment

# Install all dependencies (including dev + tui)
make dev
# Or
uv sync --extra dev --extra tui

Running Tests

make test
# Or
uv run pytest -v

Current tests: 153 passed, 2 skipped (E2E tests require real models).

Makefile Targets

Target               Description
make dev             Install all development dependencies
make test            Run tests
make lint            Run ruff linter and formatter check (read-only, same as CI)
make format          Auto-fix with ruff format + ruff check --fix
make install-hooks   Install a git pre-commit hook mirroring the CI lint job
make models          Download models
make build           PyInstaller build
make clean           Clean build artifacts
make release         Release final version (see Releasing)
make release-alpha   Release alpha version
make release-beta    Release beta version
make release-rc      Release candidate

Lint & Pre-commit Hook

CI enforces ruff format --check and ruff check. To catch failures locally before pushing, install the provided git hook once per clone:

make install-hooks          # writes .git/hooks/pre-commit

On every git commit, the hook runs the same checks as CI on the staged Python files under src/ and tests/. If it rejects the commit, run:

make format                 # ruff format + ruff check --fix
git add -u && git commit    # re-stage and retry

make format is also safe to run any time — it's the fastest way to clear CI lint failures.

TUI Debugging

uv run textual run --dev src/sauron/tui/app.py

Releasing

Version Scheme

Sauron follows PEP 440 versioning:

Stage               Format       Example      PyPI        GitHub Release
Development         X.Y.Z.devN   0.1.0.dev0   -           -
Alpha               X.Y.ZaN      0.2.0a1      -           Pre-release
Beta                X.Y.ZbN      0.2.0b1      -           Pre-release
Release Candidate   X.Y.ZrcN     0.2.0rc1     -           Pre-release
Final               X.Y.Z        0.2.0        Published   Release

The version is defined in src/sauron/__init__.py (single source of truth). pyproject.toml reads it dynamically via hatchling — you never need to edit both files.

Release Commands

The patch version auto-increments. You only need to specify VERSION=X.Y.Z when bumping major or minor versions.

# Common workflow: alpha → beta → rc → final
make release-alpha                  # 0.1.0 → 0.1.1a1
make release-alpha                  # 0.1.1a1 → 0.1.1a2 (keep iterating)
make release-beta                   # 0.1.1a2 → 0.1.1b1 (promote)
make release-rc                     # 0.1.1b1 → 0.1.1rc1 (promote)
make release                        # 0.1.1rc1 → 0.1.1 (finalize)

# Next cycle auto-bumps patch
make release-alpha                  # 0.1.1 → 0.1.2a1

# Manual major/minor bump (only when needed)
make release-alpha VERSION=0.2.0    # → 0.2.0a1
make release VERSION=1.0.0          # → 1.0.0

# Preview without making changes
make release-alpha ARGS="--dry-run"

# Create commit and tag but don't push
make release ARGS="--no-push"

Auto-Increment Rules

Current      Command              Next       Logic
0.1.0        make release         0.1.1      patch++
0.1.0        make release-alpha   0.1.1a1    patch++, then a1
0.1.1a1      make release-alpha   0.1.1a2    alpha number++
0.1.1a2      make release-beta    0.1.1b1    promote to beta
0.1.1b1      make release-beta    0.1.1b2    beta number++
0.1.1b2      make release-rc      0.1.1rc1   promote to rc
0.1.1rc1     make release-rc      0.1.1rc2   rc number++
0.1.1rc2     make release         0.1.1      strip pre-release suffix (finalize)
0.1.0.dev0   make release-alpha   0.1.0a1    dev to alpha (same base)
0.1.0.dev0   make release         0.1.0      dev to final (same base)
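The table above can be expressed as a small function, which is handy for checking expectations before running a release. This is a sketch of the documented rules, not scripts/release.sh. Cases the table does not cover are assumptions: a final or dev version promoted directly to beta or rc follows the same pattern as alpha, and moving backwards (for example rc to alpha) is rejected. next_version is a hypothetical name, and base models the VERSION=X.Y.Z override.

```python
import re
from typing import Optional

STAGES = ["a", "b", "rc"]  # pre-release stages in promotion order
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+)|\.dev(\d+))?$")


def next_version(current: str, target: str, base: Optional[str] = None) -> str:
    """Next version for target in {"a", "b", "rc", "final"}."""
    if target not in STAGES + ["final"]:
        raise ValueError(f"unknown target: {target}")
    if base is not None:
        # Manual major/minor bump (VERSION=X.Y.Z) starts the new base at step 1.
        return base if target == "final" else f"{base}{target}1"
    m = _VERSION_RE.match(current)
    if m is None:
        raise ValueError(f"unsupported version: {current}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    stage, num, dev = m.group(4), m.group(5), m.group(6)
    same_base = f"{major}.{minor}.{patch}"

    if stage is None and dev is None:
        # Final version: the next cycle bumps the patch number.
        new_base = f"{major}.{minor}.{patch + 1}"
        return new_base if target == "final" else f"{new_base}{target}1"
    if dev is not None:
        # .devN moves to the requested stage on the same base.
        return same_base if target == "final" else f"{same_base}{target}1"
    if target == "final":
        return same_base  # strip the pre-release suffix
    cur, tgt = STAGES.index(stage), STAGES.index(target)
    if tgt == cur:
        return f"{same_base}{stage}{int(num) + 1}"
    if tgt > cur:
        return f"{same_base}{target}1"
    raise ValueError(f"cannot go back from {stage} to {target}")
```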

What Happens When You Release

make release-alpha
│
├─ 1. Safety checks (clean tree, on master, tag available)
├─ 2. Run tests (uv run pytest)
├─ 3. Update version in src/sauron/__init__.py
├─ 4. Verify wheel builds (uv build)
├─ 5. Git commit: "release: v0.1.1a1"
├─ 6. Git tag: v0.1.1a1
└─ 7. Prompt: "Push commit and tag to origin? [Y/n]"
     │
     └─ Y ─→ git push --follow-tags
              │
              └─ GitHub receives tag v0.1.1a1
                 │
                 ├─ release.yml: test → build wheel → create GitHub Release
                 │                      (alpha/beta/rc marked as Pre-release)
                 │
                 └─ build.yml: build PyInstaller binaries
                               ├─ sauron-macos-arm64.zip
                               ├─ sauron-macos-x86_64.zip
                               └─ sauron-linux-x86_64.tar.gz
                               (attached to the GitHub Release)

Final releases (X.Y.Z without pre-release suffix) are additionally published to PyPI.

Release Script Options

The release script can also be invoked directly:

./scripts/release.sh [--alpha|--beta|--rc] [--version X.Y.Z] [--dry-run] [--no-push]

Option             Description
--alpha            Target an alpha pre-release
--beta             Target a beta pre-release
--rc               Target a release candidate
--version X.Y.Z    Manually set major.minor.patch
--dry-run          Preview actions without making changes
--no-push          Create the commit and tag locally, do not push
--allow-branch B   Allow releasing from a branch other than master

CI/CD Workflows

  • CI (ci.yml)
    Trigger: push to master, pull requests
    What it does: lint (ruff) + test matrix (2 OS x 3 Python versions) + build check
  • Release (release.yml)
    Trigger: push of a v* tag
    What it does: test → build wheel → create GitHub Release → publish to PyPI (final releases only)
  • Build Binaries (build.yml)
    Trigger: release published
    What it does: build PyInstaller binaries for macOS (arm64 + x86_64) and Linux (x86_64)

CHANGELOG

The project maintains a CHANGELOG.md following the Keep a Changelog format. Update it before each release to document notable changes.

PyPI Setup (one-time)

To enable automatic PyPI publishing for final releases, configure Trusted Publishers on pypi.org:

  1. Go to https://pypi.org/manage/project/sauron-ocr/settings/publishing/
  2. Add a new publisher: GitHub, repository truewatch/Sauron, workflow release.yml

Project Structure

Sauron/
├── CLAUDE.md                          # Contract-based specification document
├── README.md                          # This file
├── CHANGELOG.md                       # Release changelog
├── pyproject.toml                     # Project configuration and dependencies
├── uv.lock                            # Locked dependency versions
├── Makefile                           # Development shortcut commands
├── download_models.py                 # Model download script
├── build.sh                           # PyInstaller build script
├── scripts/
│   ├── release.sh                     # Release automation script
│   └── install-hooks.sh               # Installs .git/hooks/pre-commit mirroring CI lint
├── .github/workflows/
│   ├── ci.yml                         # CI: lint + test + build check
│   ├── release.yml                    # Release: wheel + GitHub Release + PyPI
│   └── build.yml                      # Build: cross-platform PyInstaller binaries
├── claude_desktop_config.example.json # Claude Desktop configuration template
├── settings.example.json              # settings.json template
├── models/                            # Model files (git-ignored)
│   ├── det/                           # Text detection model
│   ├── cls/                           # Orientation classification model
│   └── rec/                           # Character recognition model
├── src/sauron/
│   ├── __init__.py                    # Version number (single source of truth)
│   ├── paths.py                       # Path resolution and config loading
│   ├── ocr_engine.py                  # OCR engine (PaddleOCR wrapper)
│   ├── progress.py                    # Progress event definitions
│   ├── scanner.py                     # Directory scanning
│   ├── cli.py                         # CLI entry point (click)
│   ├── mcp_server.py                  # MCP stdio server
│   └── tui/                           # TUI wizard (requires textual)
│       ├── __init__.py                # run_wizard() entry point
│       ├── app.py                     # WizardApp screen orchestration
│       ├── state.py                   # State machine (pure dataclass)
│       ├── styles.tcss                # Styles
│       └── screens/                   # 5 wizard screens
└── tests/                             # Test suite
    ├── test_paths.py                  # Paths and configuration
    ├── test_engine.py                 # OCR engine
    ├── test_scanner.py                # Directory scanning
    ├── test_wizard_flow.py            # Wizard state machine
    ├── test_progress.py               # Progress events
    ├── test_cli.py                    # CLI commands
    ├── test_mcp.py                    # MCP server
    └── test_e2e.py                    # End-to-end tests

Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a pull request.

This project follows the Contributor Covenant code of conduct.

Security

To report a vulnerability, see SECURITY.md.

License

MIT
