🔍 DeepSeek-OCR-WebUI

Intelligent OCR System · Batch Processing · Multi-Mode Support · Bounding Box Visualization

Features • Quick Start • Version History • Documentation • Contributing

🎉 Major Update: Apple Silicon Support!

🍎 Now fully supports Mac M1/M2/M3/M4 with native MPS acceleration!

DeepSeek-OCR-WebUI v3.3 brings native Apple Silicon support, enabling Mac users to run high-performance OCR locally with:

✅ Native MPS Backend - Metal Performance Shaders acceleration
✅ Easy Setup - One-command conda environment installation
✅ Private Deployment - Run completely offline on your Mac
✅ Fast Inference - ~3s per image on M3 Pro

👉 Jump to Mac Deployment Guide

📖 Introduction

DeepSeek-OCR-WebUI is a web application for OCR and image understanding built on the DeepSeek-OCR model. It wraps the model in a browser-based interface with seven recognition modes, batch processing, PDF input, and bounding box visualization.

🖼️ UI Preview

Modern user interface with multilingual support, batch processing, and bounding box visualization

✨ Core Highlights

🎯 7 Recognition Modes - Document, OCR, Chart, Find, Freeform, etc.
🖼️ Bounding Box Visualization - Find mode automatically annotates positions
📦 Batch Processing - Recognizes multiple images sequentially in one run
📄 PDF Support - Upload PDF files, automatically convert to images
🎨 Modern UI - Cool gradient backgrounds and animation effects
🌐 Multilingual Support - Simplified Chinese, Traditional Chinese, English, Japanese
🍎 Apple Silicon Support - Native MPS acceleration for Mac M1/M2/M3/M4
🐳 Docker Deployment - One-click startup, ready to use
⚡ GPU Acceleration - High-performance inference on NVIDIA GPUs
🌏 ModelScope Fallback - Auto-switch to ModelScope when HuggingFace is unavailable

🚀 Features

7 Recognition Modes

Mode	Icon	Description	Use Cases
Doc to Markdown	📄	Preserve format and layout	Contracts, papers, reports
General OCR	📝	Extract all visible text	Image text extraction
Plain Text	📋	Pure text without format	Simple text recognition
Chart Parser	📊	Recognize charts and formulas	Data charts, math formulas
Image Description	🖼️	Generate detailed descriptions	Image understanding, accessibility
Find & Locate ⭐	🔍	Find and annotate positions	Invoice field locating
Custom Prompt ⭐	✨	Customize recognition needs	Flexible recognition tasks
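
As a rough illustration of how the seven modes could map to model prompts, here is a minimal sketch. The prompt strings follow the style published with the upstream DeepSeek-OCR model; the mode keys and the `build_prompt` helper are hypothetical, and the prompts this WebUI actually sends may differ.

```python
# Hypothetical mode keys mapped to prompts in the upstream DeepSeek-OCR style.
# Assumption: the WebUI's real prompts may differ from these.
MODE_PROMPTS = {
    "doc_to_markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "general_ocr": "<image>\n<|grounding|>OCR this image.",
    "plain_text": "<image>\nFree OCR.",
    "chart_parser": "<image>\nParse the figure.",
    "image_description": "<image>\nDescribe this image in detail.",
}


def build_prompt(mode: str, query: str = "") -> str:
    """Return the prompt for a mode; 'find' and 'custom' need user input."""
    query = query.strip()
    if mode in ("find", "custom"):
        if not query:
            raise ValueError(f"mode '{mode}' requires a non-empty query")
        if mode == "find":
            # Find & Locate: ask the model to ground the search term.
            return f"<image>\nLocate <|ref|>{query}<|/ref|> in the image."
        # Custom Prompt: pass the user's text through unchanged.
        return f"<image>\n{query}"
    if mode not in MODE_PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    return MODE_PROMPTS[mode]
```

For example, `build_prompt("find", "Total")` yields a grounding prompt for the term "Total", while `build_prompt("plain_text")` needs no query.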

📄 PDF Support (New in v3.2)

DeepSeek-OCR-WebUI now supports PDF uploads. Each page of an uploaded PDF is automatically converted to a separate image, and the resulting images go through the same processing as regular uploads (OCR recognition, batch processing, etc.).

PDF upload and automatic conversion to images - Each page becomes a separate image for processing

Key Features:

Multi-page PDF Conversion: Automatically converts each page to a separate image
Real-time Progress: Shows conversion progress page by page
Drag & Drop: Support drag & drop PDF upload
Find Mode: PDF support in Find mode (uses first page automatically)
Format Validation: Automatic file type detection and error prompts
Seamless Integration: Converted images follow the same processing pipeline as regular images
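
The page-by-page conversion described above can be sketched with PyMuPDF, which the project lists as a dependency. This is a minimal sketch, not the project's implementation: the function names, the output file naming, and the progress message are assumptions, and the 144 DPI default mirrors the value mentioned in the v3.2 notes.

```python
from pathlib import Path


def zoom_for_dpi(dpi: int) -> float:
    """PDF pages are defined at 72 points per inch, so 144 DPI is a 2x zoom."""
    return dpi / 72.0


def pdf_to_images(pdf_path: str, out_dir: str, dpi: int = 144) -> list:
    """Render every page of a PDF to a PNG file and return the file paths."""
    import fitz  # PyMuPDF; imported lazily so zoom_for_dpi works without it

    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)

    paths = []
    with fitz.open(pdf_path) as doc:
        total = doc.page_count
        for index, page in enumerate(doc, start=1):
            pixmap = page.get_pixmap(matrix=matrix)
            target = target_dir / f"page_{index:03d}.png"  # hypothetical naming
            pixmap.save(str(target))
            paths.append(target)
            print(f"converted page {index}/{total}")  # page-by-page progress
    return paths
```

Each returned path can then be fed into the same recognition flow as a directly uploaded image.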

🌏 ModelScope Auto-Fallback (New in v3.2)

Auto-Switch: Automatically switches to ModelScope when HuggingFace is unavailable
Smart Detection: Intelligently detects network errors and timeouts
China-Friendly: Seamless experience for users in mainland China
5-minute Timeout: Configurable timeout for model loading
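
The fallback behavior can be pictured as trying each model source in order and moving on when a network error or timeout occurs. The sketch below is an assumption about the general pattern, not the project's code: `load_with_fallback` and the loader callables are hypothetical, and the real service may classify errors differently.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# ConnectionError and TimeoutError are both subclasses of OSError,
# so catching OSError covers typical network failures and timeouts.
NETWORK_ERRORS = (OSError,)


def load_with_fallback(loaders: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, loader) pair in order; return the first that succeeds."""
    last_error = None
    for name, loader in loaders:
        try:
            return name, loader()
        except NETWORK_ERRORS as exc:
            print(f"{name} unavailable ({exc!r}), trying next source")
            last_error = exc
    raise RuntimeError("all model sources failed") from last_error
```

In practice the first loader would download from HuggingFace and the second from ModelScope; only network-style errors trigger the switch, so genuine bugs in a loader still surface.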

🎨 Find Mode Features

Left-Right Split Layout:

┌──────────────────────┬─────────────────────────────┐
│   Left: Control Panel │    Right: Result Display    │
├──────────────────────┼─────────────────────────────┤
│ 📤 Image Upload      │ 🖼️ Result Image (with boxes) │
│ 🎯 Search Input      │ 📊 Statistics               │
│ 🚀 Action Buttons    │ 📝 Recognition Text         │
│                      │ 📦 Match List                │
└──────────────────────┴─────────────────────────────┘

Bounding Box Visualization:

🟢 Colorful neon border auto-annotation
🎨 6 colors in rotation
📍 Precise coordinate positioning
🔄 Responsive auto-redraw
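
To make the annotation steps concrete, here is a minimal sketch of turning model output into pixel-space boxes with rotating colors. It assumes the upstream DeepSeek-OCR grounding format (`<|ref|>…<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`) with coordinates normalized to a 0-999 grid; the color palette and the `parse_boxes` helper are hypothetical.

```python
import re

# Placeholder palette of six colors; the UI's actual colors are not documented here.
NEON_COLORS = ["#39ff14", "#00e5ff", "#ff00ff", "#ffea00", "#ff6d00", "#7c4dff"]

DET_PATTERN = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)
COORD_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_boxes(text: str, width: int, height: int, grid: int = 999) -> list:
    """Extract labeled boxes and scale them from the model grid to image pixels."""
    boxes = []
    for label, coords in DET_PATTERN.findall(text):
        for x1, y1, x2, y2 in COORD_PATTERN.findall(coords):
            index = len(boxes)
            boxes.append({
                "label": label,
                "box": (
                    int(int(x1) / grid * width),
                    int(int(y1) / grid * height),
                    int(int(x2) / grid * width),
                    int(int(y2) / grid * height),
                ),
                # Rotate through the palette so adjacent matches differ in color.
                "color": NEON_COLORS[index % len(NEON_COLORS)],
            })
    return boxes
```

Because the boxes are stored relative to the original image size, a responsive redraw only needs to rescale them by the ratio of displayed size to original size.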

Feature Demo:

Find & Locate mode in action: Upload on left, auto-annotated results on right

🌐 Multilingual Support

Supported Languages

🇨🇳 Simplified Chinese (zh-CN)
🇹🇼 Traditional Chinese (zh-TW)
🇺🇸 English (en-US) - Default
🇯🇵 Japanese (ja-JP)

How to Switch Language

Web UI:

Click the language selector in the top-right corner
Select your desired language
Interface switches immediately, settings auto-save

📦 Quick Start

Prerequisites

For Docker (Recommended):

Docker & Docker Compose
NVIDIA GPU + Drivers (for GPU acceleration)
8GB+ RAM
20GB+ Disk Space

For Mac (Apple Silicon):

macOS with Apple Silicon (M1/M2/M3/M4)
Python 3.11+
16GB+ RAM (recommended)
20GB+ Disk Space

For Linux (Native):

Python 3.11+
NVIDIA GPU + CUDA (optional, for acceleration)
8GB+ RAM
20GB+ Disk Space

🐳 Option 1: Docker Deployment (Linux/Windows)

Best for: Linux servers with NVIDIA GPU, production environments

# 1. Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# 2. Start service
docker compose up -d

# 3. Wait for model loading (about 1-2 minutes)
docker logs -f deepseek-ocr-webui

# 4. Access Web UI
# The service listens on all network interfaces (0.0.0.0:8001)
# Choose the appropriate access method:
#
# - Local access: http://localhost:8001
# - LAN access: http://<server-ip>:8001
# - Domain access: http://<your-domain>:8001 (if configured)
#
# Example: If your server IP is 192.168.1.100, use:
# http://192.168.1.100:8001

Access Methods:

Local Machine: http://localhost:8001
Remote Server (No Domain): http://<server-ip>:8001
- Find your IP: hostname -I or ip addr show
- Example: If IP is 192.168.1.100, access http://192.168.1.100:8001
With Domain: http://<your-domain>:8001 or https://<your-domain>
- Configure your reverse proxy (nginx/caddy) to forward to localhost:8001

🍎 Option 2: Mac Native Deployment (Apple Silicon)

Best for: Mac M1/M2/M3/M4 users, local development

⚠️ Important: Always use a conda virtual environment to avoid dependency conflicts.

Step 1: Install Dependencies

# Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# Create and activate conda environment (REQUIRED)
conda create -n deepseek-ocr-mlx python=3.11
conda activate deepseek-ocr-mlx

# Install PyTorch with MPS support
pip install torch torchvision

# Install required packages
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Alternatively, install all of the packages above in one step
pip install -r requirements-mac.txt

# Verify installation (optional)
./verify_mac_env.sh

Step 2: Start Service

# IMPORTANT: Always activate the conda environment first
conda activate deepseek-ocr-mlx

# Start service (auto-detects MPS backend)
./start.sh

# Or manually
python web_service_unified.py

Step 3: Access Web UI

Access Methods:

Local Machine: http://localhost:8001
Remote Server: http://<server-ip>:8001
- Find IP: ifconfig | grep "inet " or ipconfig getifaddr en0
- Example: If IP is 192.168.1.100, access http://192.168.1.100:8001
With Domain: Configure reverse proxy to point to localhost:8001

Note: The first run downloads the ~7GB model, so startup takes longer than usual. Please be patient.

🐧 Option 3: Linux Native Deployment

Best for: Linux servers, custom configurations

With NVIDIA GPU:

# Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install dependencies
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Start service (auto-detects CUDA backend)
./start.sh

Without GPU (CPU only):

# Install PyTorch CPU version
pip install torch torchvision

# Install dependencies
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Start service (auto-detects CPU backend)
./start.sh

✅ Verify Installation

# Check container status (Docker)
docker compose ps

# Check health status
curl http://localhost:8001/health

# Expected response:
# {
#   "status": "healthy",
#   "backend": "mps",  # or "cuda" or "cpu"
#   "platform": "Darwin",  # or "Linux"
#   "model_loaded": true
# }
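
Since model loading can take a while, a script may want to wait for the health endpoint before sending requests. The sketch below polls `/health` using only the standard library; the `status` and `model_loaded` fields come from the expected response above, while the function names, timeout, and polling interval are assumptions.

```python
import json
import time
import urllib.request


def is_ready(payload: dict) -> bool:
    """True when the service reports healthy and the model is loaded."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def wait_until_ready(url: str = "http://localhost:8001/health",
                     timeout: float = 300.0, interval: float = 5.0) -> dict:
    """Poll the health endpoint until the service is ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                payload = json.load(response)
            if is_ready(payload):
                return payload
        except (OSError, ValueError):
            # OSError covers connection failures; ValueError covers invalid JSON.
            pass
        time.sleep(interval)
    raise TimeoutError(f"service at {url} not ready after {timeout} seconds")
```

Calling `wait_until_ready()` returns the health payload, including the detected `backend`, once the model has finished loading.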

🔧 Platform Detection

The service automatically detects your platform and uses the optimal backend:

Platform	Backend	Acceleration	Auto-Detected
Mac M1/M2/M3/M4	MPS	Metal GPU	✅ Yes
Linux + NVIDIA GPU	CUDA	CUDA GPU	✅ Yes
Linux (CPU only)	CPU	None	✅ Yes
Docker	CUDA	CUDA GPU	✅ Yes
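
The detection summarized in the table can be sketched as a simple priority check. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the CUDA-before-MPS order is a guess, and the optional `FORCE_BACKEND` environment variable is treated as an override.

```python
import os
from typing import Optional

VALID_BACKENDS = ("mps", "cuda", "cpu")


def choose_backend(cuda_available: bool, mps_available: bool,
                   forced: Optional[str] = None) -> str:
    """Pick a backend: an explicit override wins, then CUDA, then MPS, then CPU."""
    if forced:
        forced = forced.strip().lower()
        if forced not in VALID_BACKENDS:
            raise ValueError(f"unsupported backend: {forced}")
        return forced
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_backend() -> str:
    """Query PyTorch for available accelerators and apply FORCE_BACKEND if set."""
    import torch  # imported lazily so choose_backend can be used without PyTorch

    return choose_backend(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
        os.environ.get("FORCE_BACKEND"),
    )
```

Keeping the decision logic separate from the PyTorch queries makes it easy to test every platform combination on a single machine.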

Force specific backend (optional):

FORCE_BACKEND=mps ./start.sh   # Force MPS (Mac only)
FORCE_BACKEND=cuda ./start.sh  # Force CUDA (Linux+GPU)
FORCE_BACKEND=cpu ./start.sh   # Force CPU (any platform)

📊 Version History

v3.3 (2025-11-05) - Apple Silicon Support & Multi-Platform

🍎 Apple Silicon Support:

✅ Native MPS (Metal Performance Shaders) backend for Mac M1/M2/M3/M4
✅ Automatic platform detection and backend selection
✅ Optimized float32 precision for MPS compatibility
✅ ~7GB model with automatic download and caching

🌍 Multi-Platform Architecture:

✅ Unified backend interface (MPS/CUDA/CPU)
✅ Smart platform detection (Mac/Linux/Docker)
✅ Independent backend implementations (no conflicts)
✅ Universal startup script (./start.sh)

🔧 Technical Improvements:

✅ Model revision: 1e3401a3d4603e9e71ea0ec850bfead602191ec4 (MPS support)
✅ Transformers 4.46.3 compatibility
✅ Fixed LlamaFlashAttention2 import issues
✅ Unified model inference interface across platforms

📚 Documentation:

✅ Multi-platform deployment guide
✅ Platform compatibility documentation
✅ Verification tools (verify_platform.sh)

v3.2 (2025-11-04) - PDF Support & ModelScope Fallback

📄 New Features:

✅ PDF upload support (auto-convert to images)
✅ Multi-page PDF conversion with real-time progress
✅ Drag & drop PDF upload
✅ ModelScope auto-fallback (when HuggingFace unavailable)
✅ Smart network error detection and retry

🐛 Bug Fixes:

✅ Fixed PDF conversion progress logging
✅ Fixed button text duplication in i18n
✅ Fixed system initialization log messages

🔧 Technical Improvements:

✅ PyMuPDF integration for high-quality PDF conversion (144 DPI)
✅ Async PDF processing for real-time progress
✅ Enhanced error handling and logging

v3.1 (2025-10-22) - Multilingual & Bug Fixes

🌐 New Features:

✅ Added multilingual support (Simplified Chinese, Traditional Chinese, English, Japanese)
✅ Language selector UI component
✅ Persistent storage of the selected language
✅ Multilingual documentation (README)

🐛 Bug Fixes:

✅ Fixed mode switching issues
✅ Fixed bounding boxes exceeding image boundaries
✅ Optimized image container layout
✅ Added rendering delay for alignment

🎨 UI Optimization:

✅ Centered image display
✅ Responsive bounding box redraw
✅ Language switcher integration

v3.0 (2025-10-22) - Find Mode & Split Layout

✨ Major Updates:

✅ New Find mode (find & locate)
✅ Dedicated left-right split layout
✅ Canvas bounding box visualization
✅ Colorful neon annotation effects

🔧 Technical Improvements:

✅ Switched the inference engine to transformers (replacing vLLM)
✅ Precise coordinate conversion algorithm
✅ Responsive design optimization

📖 Documentation

User Documentation

Technical Documentation

🎯 Usage Examples

Find Mode Example

Scenario: Find "Total" amount in invoice

Steps:
1. Select "🔍 Find & Locate" mode
2. Upload invoice image
3. Enter search term: Total
4. Click "🚀 Start Search"

Results:
✓ "Total" marked with green border on image
✓ Shows 1-2 matches found
✓ Provides precise coordinate information

Batch Processing Example

Scenario: Batch recognize 20 contracts

Steps:
1. Select "📄 Doc to Markdown" mode
2. Drag and drop 20 images to upload
3. Adjust order (optional)
4. Click "🚀 Start Recognition"

Results:
✓ Process each image sequentially
✓ Real-time progress display
✓ Auto-merge all results
✓ One-click copy or download
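
The batch flow above (sequential processing, progress updates, merged output) can be sketched as follows. The `recognize` callable stands in for whatever performs recognition on one image; it, the function name, and the merged-output layout are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional


def recognize_batch(image_paths: Iterable[str],
                    recognize: Callable[[str], str],
                    on_progress: Optional[Callable[[int, int], None]] = None) -> str:
    """Recognize images one at a time and merge the results into one document."""
    paths = list(image_paths)
    sections = []
    for index, path in enumerate(paths, start=1):
        text = recognize(path)  # sequential: one image at a time
        sections.append(f"## {path}\n\n{text}")
        if on_progress is not None:
            on_progress(index, len(paths))  # report (done, total)
    return "\n\n---\n\n".join(sections)
```

Processing sequentially keeps memory use predictable on a single GPU, at the cost of total time growing linearly with the number of images.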

🔧 Configuration

Environment Variables

# docker-compose.yml
API_HOST=0.0.0.0              # Listen address
MODEL_NAME=deepseek-ai/DeepSeek-OCR  # Model name
CUDA_VISIBLE_DEVICES=0        # GPU device
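
On the service side, variables like these are typically read once at startup. A minimal sketch, assuming the defaults match the values listed above; the `ServiceConfig` class and `load_config` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    api_host: str
    model_name: str
    cuda_visible_devices: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Read settings from the environment, falling back to the documented values."""
    env = os.environ if env is None else env
    return ServiceConfig(
        api_host=env.get("API_HOST", "0.0.0.0"),
        model_name=env.get("MODEL_NAME", "deepseek-ai/DeepSeek-OCR"),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES", "0"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test without touching the real environment.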

Performance Tuning

# Memory configuration
shm_size: "8g"                # Shared memory

# GPU configuration
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]

🤝 Contributing

Contributions welcome! Please check the Contributing Guide.

How to Contribute

Fork this repository
Create feature branch (git checkout -b feature/AmazingFeature)
Commit changes (git commit -m 'Add some AmazingFeature')
Push to branch (git push origin feature/AmazingFeature)
Open Pull Request

📞 Support

Having Issues?

Check Troubleshooting
Check Known Issues
Submit an Issue

Feature Suggestions?

Check Roadmap
Submit a Feature Request

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

DeepSeek-AI - DeepSeek-OCR model
deepseek_ocr_app - Reference project
All contributors and users

⭐ If this project helps you, please give it a Star! ⭐

Made with ❤️ by neosun100
