DocuScan AI is a professional, AI-powered solution for extracting structured data from business documents. Built on PaddleOCR with a responsive web interface, it turns scanned invoices, receipts, and forms into clean, structured digital data (JSON/Excel) in seconds.
- AI-Powered OCR: High-accuracy text extraction using state-of-the-art models (PaddleOCR).
- Field Extraction: Automatically detects vendor names, dates, total amounts, and line items.
- Premium UI: A modern, responsive interface with a glassmorphism design built to impress clients.
- Data Export: Copy extracted data to the clipboard or export for use in accounting software.
- Multi-Format Support: Works seamlessly with JPEG, PNG, and PDF files.
- Fast & Lightweight: Built with FastAPI and Vanilla JS for maximum efficiency.
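The README does not describe how fields are located in the OCR output. As a rough illustration only, the sketch below assumes the OCR engine has already returned the recognized text as a list of lines in reading order, and applies simple regex heuristics to pick out a vendor, date, and total. The patterns, the day-first date format, and the function name `extract_fields` are assumptions for this example, not the project's actual logic.

```python
import re
from datetime import datetime

# Hypothetical patterns; real invoices vary widely, so treat these as starting points.
# The second pattern assumes day-first dates (DD/MM/YYYY).
DATE_PATTERNS = [
    (re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"), "%Y-%m-%d"),
    (re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"), "%d/%m/%Y"),
]
# Matches "Total", "Total Due", "Grand Total", or "Amount Due" followed by an amount.
# The leading \b keeps "Subtotal" from matching.
TOTAL_PATTERN = re.compile(
    r"\b(?:grand\s+total|total(?:\s+due)?|amount\s+due)\b\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)


def extract_fields(lines):
    """Pull vendor, date, and total from OCR text lines given in reading order."""
    fields = {"vendor": None, "date": None, "total": None}
    # Heuristic: the first non-empty line is usually the vendor name.
    for line in lines:
        if line.strip():
            fields["vendor"] = line.strip()
            break
    for line in lines:
        if fields["date"] is None:
            for pattern, fmt in DATE_PATTERNS:
                match = pattern.search(line)
                if match:
                    try:
                        parsed = datetime.strptime(match.group(1), fmt)
                        fields["date"] = parsed.date().isoformat()
                    except ValueError:
                        pass  # e.g. 31/02/2024 fits the regex but is not a real date
                    break
        match = TOTAL_PATTERN.search(line)
        if match:
            # A later total line overrides earlier ones.
            fields["total"] = float(match.group(1).replace(",", ""))
    return fields


lines = [
    "ACME Supplies Ltd.",
    "Invoice date: 2024-03-15",
    "Subtotal: 100.00",
    "Tax: 20.00",
    "TOTAL: $120.00",
]
print(extract_fields(lines))
```

Heuristics like these are brittle on unusual layouts; they are shown only to make the "vendor, date, total" idea concrete.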
- Accounting Automation: Reduce manual data entry for bookkeepers.
- Inventory Tracking: Digitize supplier receipts.
- Expense Management: Extract receipt data quickly for reimbursement workflows.
- Digital Archives: Turn physical paper into searchable databases.
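For accounting and expense workflows, extracted records usually need to land in a spreadsheet or bookkeeping import. Below is a minimal sketch using only the standard library, assuming each record is a dict with hypothetical `vendor`, `date`, and `total` keys. CSV opens directly in Excel; writing a native .xlsx file would need a third-party library.

```python
import csv
import io
import json

# Assumed column order for the export; extra keys (e.g. line items) are ignored.
FIELDNAMES = ["vendor", "date", "total"]


def records_to_csv(records):
    """Serialize extracted records to CSV text that Excel or accounting tools can import."""
    buffer = io.StringIO()
    # Missing keys are written as empty cells; unexpected keys are skipped.
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


def records_to_json(records):
    """Serialize records as pretty-printed JSON, keeping non-ASCII vendor names readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)


records = [
    {"vendor": "ACME Supplies Ltd.", "date": "2024-03-15", "total": 120.0},
    {"vendor": "Smith, Jones & Co", "date": "2024-03-16", "total": 8.5},
]
print(records_to_csv(records))
```

The `csv` module handles quoting, so a vendor name containing a comma stays in one cell.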
- Backend: FastAPI (Python 3.8+)
- OCR Engine: PaddleOCR (high-accuracy recognition for Chinese, English, and many other languages)
- Frontend: Standard HTML5, CSS3, ES6 JavaScript
- Deployment Ready: Modular architecture designed for easy Dockerization or cloud deployment.
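Since the backend accepts JPEG, PNG, and PDF uploads, one reasonable safeguard is to check a file's leading "magic" bytes rather than trusting its extension before handing it to the OCR engine. Whether the project does this is not stated; `detect_document_type` is a hypothetical helper sketched here for illustration.

```python
# Leading "magic" byte signatures for the formats listed under Multi-Format Support.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
}


def detect_document_type(data: bytes):
    """Return the MIME type for JPEG, PNG, or PDF content, or None if unsupported."""
    for signature, mime in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None
```

In a FastAPI route, this could be applied to the bytes returned by `await file.read()` on an `UploadFile`, rejecting the request when the result is `None`.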
- Clone the Project:
  git clone [Your-Repo-Link]
  cd "Document AI (OCR + Data Extraction)"
- Set Up a Virtual Environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
- Launch the Backend:
  cd backend
  python main.py
- Open the Frontend: Open frontend/index.html in your browser, or serve the frontend folder via a local web server (e.g. python -m http.server).
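If the backend fails to start, a quick pre-flight check can confirm the Python version (3.8+, per the tech stack) and that the main dependencies are importable inside the active virtual environment. The module list below is an assumption; adjust it to match `requirements.txt`.

```python
import importlib.util
import sys

# Assumed top-level import names; check requirements.txt for the actual list.
REQUIRED_MODULES = ["fastapi", "uvicorn", "paddleocr"]


def missing_modules(names):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED_MODULES):
    """Collect human-readable problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    missing = missing_modules(names)
    if missing:
        problems.append(
            "missing packages: " + ", ".join(missing)
            + " (run: pip install -r requirements.txt)"
        )
    return problems


if __name__ == "__main__":
    for issue in check_environment() or ["environment looks ready"]:
        print(issue)
```

Run it from the project root with the virtual environment activated, before `python main.py`.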
Au Amores
