DocuScan AI: Automated Document Processing & Data Extraction

Overview

DocuScan AI is an AI-powered tool for extracting structured data from business documents. It pairs PaddleOCR text recognition with a web interface, so users can turn scanned invoices, receipts, and forms into clean, structured data (JSON/Excel) in seconds.

Key Features

AI-Powered OCR: High-accuracy text extraction using state-of-the-art models (PaddleOCR).
Field Extraction: Automatically detects Vendor names, Dates, Total Amounts, and Line Items.
Premium UI: Modern, responsive glassmorphism design.
Data Export: Copy extracted data to the clipboard or export for use in accounting software.
Multi-Format Support: Works seamlessly with JPEG, PNG, and PDF files.
Fast & Lightweight: Built with FastAPI and Vanilla JS for maximum efficiency.
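
To make the Field Extraction and Data Export features more concrete, here is a minimal sketch of how vendor, date, and total might be pulled from OCR text lines and serialized to JSON. The `extract_fields` function, the regular expressions, and the sample receipt are all assumptions made for illustration; they are not the project's actual extraction logic, and line-item parsing is left out.

```python
import json
import re

# Illustrative patterns only: ISO or slash-separated dates, and amounts with two decimals.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
AMOUNT_RE = re.compile(r"(\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2})")


def extract_fields(lines):
    """Pull vendor, date, and total out of OCR text lines using simple heuristics."""
    fields = {"vendor": None, "date": None, "total": None}

    # Heuristic: the first non-empty line of a receipt is often the vendor name.
    for line in lines:
        if line.strip():
            fields["vendor"] = line.strip()
            break

    for line in lines:
        # Keep the first date-like token found anywhere in the document.
        if fields["date"] is None:
            match = DATE_RE.search(line)
            if match:
                fields["date"] = match.group(1)

        # Keep the amount from the last line that mentions "total" but not "subtotal".
        lowered = line.lower()
        if "total" in lowered and "subtotal" not in lowered:
            amounts = AMOUNT_RE.findall(line)
            if amounts:
                fields["total"] = float(amounts[-1].replace(",", ""))

    return fields


# Hypothetical OCR output for a small receipt.
sample_lines = [
    "ACME SUPPLIES",
    "Date: 2024-03-15",
    "Paper A4   12.50",
    "Toner      30.00",
    "Subtotal   42.50",
    "TOTAL      $45.90",
]
extracted = extract_fields(sample_lines)
print(json.dumps(extracted, indent=2))
```

Heuristics like these are brittle on real documents; they are shown only to illustrate the kind of post-processing that sits between raw OCR text and exported data.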

Use Cases

Accounting Automation: Reduce manual data entry for bookkeepers.
Inventory Tracking: Digitize supplier receipts.
Expense Management: Extract receipt data quickly for reimbursement workflows.
Digital Archives: Turn paper documents into searchable records.

Technical Specifications

Backend: FastAPI (Python 3.8+)
OCR Engine: PaddleOCR (high-accuracy Chinese, English, and multi-language support)
Frontend: Standard HTML5, CSS3, ES6 JavaScript
Deployment Ready: Modular architecture designed for easy Dockerization or Cloud deployment.
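
As a rough illustration of how the backend might turn raw OCR output into plain text lines, the sketch below flattens a result in the nested-list layout that PaddleOCR 2.x returns from `PaddleOCR.ocr()` (one list per page, each entry a `[box, (text, confidence)]` pair). That layout is an assumption here and may differ in other PaddleOCR releases; `flatten_ocr_result`, the confidence threshold, and the mock data are hypothetical and not taken from this project's code.

```python
def flatten_ocr_result(result, min_confidence=0.5):
    """Return recognized text lines in reading order, dropping low-confidence ones."""
    texts = []
    for page in result or []:
        page_lines = []
        # A page with no detected text can come back as None.
        for box, (text, confidence) in page or []:
            if confidence >= min_confidence:
                # Use the top edge of the bounding box as the vertical position.
                top_y = min(point[1] for point in box)
                page_lines.append((top_y, text))
        # Sort each page top to bottom so lines read in document order.
        page_lines.sort(key=lambda item: item[0])
        texts.extend(text for _, text in page_lines)
    return texts


# Mock data in the assumed layout; no OCR model is loaded here.
mock_result = [[
    [[[10, 50], [200, 50], [200, 70], [10, 70]], ("Date: 2024-03-15", 0.97)],
    [[[10, 10], [200, 10], [200, 30], [10, 30]], ("ACME SUPPLIES", 0.99)],
    [[[10, 90], [200, 90], [200, 110], [10, 110]], ("smudge", 0.21)],
]]
print(flatten_ocr_result(mock_result))
```

Working from mock data keeps the example independent of the OCR model, which also makes this kind of post-processing easy to unit test.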

Installation & Setup

Clone the Project:
```
git clone [Your-Repo-Link]
cd "Document AI (OCR + Data Extraction)"
```

Set Up a Virtual Environment:
```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install Dependencies:
```
pip install -r requirements.txt
```
Launch the Backend:
```
cd backend
python main.py
```
Open the Frontend: Simply open frontend/index.html in your browser or serve it via a local web server.
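
If you prefer serving the frontend over opening the file directly, one option is Python's standard-library HTTP server. The sketch below assumes it is run from the project root and that the UI lives in `frontend/`; the `make_frontend_server` helper and port 8080 are choices made for this example, not part of the project.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_frontend_server(directory="frontend", port=8080):
    """Create a static file server bound to localhost for the given directory."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# To serve the UI, run the following from the project root and open the printed URL:
#   server = make_frontend_server()
#   print(f"Serving on http://127.0.0.1:{server.server_address[1]}/index.html")
#   server.serve_forever()
```

This is roughly what `python -m http.server 8080 --directory frontend` does from the command line.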

Author

Au Amores
