
secureml-au/Document-AI--OCR---Data-Extraction-


DocuScan AI: Automated Document Processing & Data Extraction


Overview

DocuScan AI is an AI-powered tool for extracting structured data from business documents. It pairs PaddleOCR text recognition with a browser-based interface, turning scanned invoices, receipts, and forms into structured digital data (JSON/Excel) that can be reviewed and exported.

Key Features

  • AI-Powered OCR: High-accuracy text extraction using PaddleOCR models.
  • Field Extraction: Automatically detects vendor names, dates, total amounts, and line items.
  • Premium UI: Modern, responsive glassmorphism design.
  • Data Export: Copy extracted data to the clipboard or export it for use in accounting software.
  • Multi-Format Support: Accepts JPEG, PNG, and PDF files.
  • Fast & Lightweight: Built with FastAPI and vanilla JavaScript, with no frontend framework.
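This README does not describe how Field Extraction works internally, so the following is only a minimal, rule-based sketch of the idea, assuming the OCR step has already produced plain text. The function name `extract_fields`, the regular expressions, and the conventions it relies on (vendor on the first line, day-first dates, an amount on a line containing "Total") are hypothetical and would need tuning against real documents; line-item parsing is left out.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical patterns: day-first dates (DD/MM/YYYY or DD-MM-YYYY) and a "Total" line.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")
TOTAL_PATTERN = re.compile(r"\btotal\b[^\d\n]*(\d[\d,]*\.\d{2})", re.IGNORECASE)


def extract_fields(ocr_text: str) -> dict:
    """Heuristically pull vendor, date, and total from plain OCR text."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]

    # Assumption: the vendor name is the first non-empty line.
    vendor: Optional[str] = lines[0] if lines else None

    invoice_date: Optional[str] = None
    match = DATE_PATTERN.search(ocr_text)
    if match:
        day, month, year = (int(g) for g in match.groups())
        try:
            invoice_date = date(year, month, day).isoformat()
        except ValueError:
            pass  # e.g. 31/02/2024 is not a valid calendar date

    total: Optional[Decimal] = None
    amounts = TOTAL_PATTERN.findall(ocr_text)
    if amounts:
        # \btotal\b does not match inside "Subtotal"; keep the last "Total" amount found.
        total = Decimal(amounts[-1].replace(",", ""))

    return {"vendor": vendor, "date": invoice_date, "total": total}
```

`Decimal` is used for the amount so that currency values are not subject to binary floating-point rounding.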

Use Cases

  • Accounting Automation: Reduce manual data entry for bookkeepers.
  • Inventory Tracking: Digitizing supplier receipts.
  • Expense Management: Fast extraction for reimbursement workflows.
  • Digital Archives: Turning physical paper into searchable databases.
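For accounting automation, extracted data usually has to land in a format that bookkeeping tools can import. The following is a minimal standard-library sketch assuming a hypothetical record layout (`vendor`, `date`, and a `line_items` list); the project's actual export schema may differ. It writes the record as JSON and flattens line items into CSV, a format that Excel and most accounting packages can open.

```python
import csv
import io
import json

CSV_COLUMNS = ["vendor", "date", "description", "quantity", "amount"]


def to_json(record: dict) -> str:
    """Serialize one extracted document as pretty-printed JSON."""
    return json.dumps(record, indent=2)


def to_csv(record: dict) -> str:
    """Flatten one document into CSV: a header row plus one row per line item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(CSV_COLUMNS)
    for item in record.get("line_items", []):
        # Repeat the document-level fields on every row so each row stands alone.
        writer.writerow([
            record.get("vendor", ""),
            record.get("date", ""),
            item.get("description", ""),
            item.get("quantity", ""),
            item.get("amount", ""),
        ])
    return buffer.getvalue()


# Hypothetical record in the shape this sketch assumes.
record = {
    "vendor": "ACME Hardware",
    "date": "2024-03-05",
    "line_items": [
        {"description": "Hammer", "quantity": 1, "amount": "12.00"},
        {"description": "Nails, 100 pack", "quantity": 2, "amount": "6.00"},
    ],
}
```

Using the `csv` module rather than joining strings by hand means values containing commas, such as `Nails, 100 pack`, are quoted correctly.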

Technical Specifications

  • Backend: FastAPI (Python 3.8+)
  • OCR Engine: PaddleOCR (supports Chinese, English, and many other languages)
  • Frontend: Standard HTML5, CSS3, ES6 JavaScript
  • Deployment Ready: Modular architecture designed for straightforward Docker packaging or cloud deployment.
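PaddleOCR reports one result per detected text box rather than per line of the document, so some post-processing is usually needed before field extraction. The sketch below assumes the `[box, (text, confidence)]` layout returned by the `ocr()` method in PaddleOCR 2.x; newer releases return a different result structure, so check which version requirements.txt installs. The function name `to_text_lines` and the 0.5 confidence threshold are illustrative choices, and no PaddleOCR import is needed because the function only post-processes the result list.

```python
from typing import List, Sequence, Tuple

# One PaddleOCR 2.x-style detection: four corner points plus (text, confidence).
OcrBox = Tuple[Sequence[Sequence[float]], Tuple[str, float]]


def to_text_lines(page: List[OcrBox], min_confidence: float = 0.5) -> List[str]:
    """Group detected text boxes into reading-order lines (top-to-bottom, left-to-right)."""
    boxes = []
    for corners, (text, confidence) in page:
        if confidence < min_confidence:
            continue  # skip low-confidence detections
        xs = [point[0] for point in corners]
        ys = [point[1] for point in corners]
        center_y = (min(ys) + max(ys)) / 2
        height = max(ys) - min(ys)
        boxes.append((center_y, height, min(xs), text))

    boxes.sort(key=lambda b: b[0])  # top to bottom by vertical center

    rows: List[list] = []
    for box in boxes:
        # Join the current row if this box's center is within half the height
        # of the row's first box; otherwise start a new row.
        if rows and abs(box[0] - rows[-1][0][0]) <= rows[-1][0][1] / 2:
            rows[-1].append(box)
        else:
            rows.append([box])

    # Within each row, order boxes left to right and join their text.
    return [" ".join(b[3] for b in sorted(row, key=lambda b: b[2])) for row in rows]
```

Grouping by vertical center keeps a label such as "TOTAL" on the same line as its amount even when the two boxes are slightly misaligned.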

Installation & Setup

  1. Clone the Project:

    git clone [Your-Repo-Link]
    cd "Document AI (OCR + Data Extraction)"
  2. Setup Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Launch the Backend:

    cd backend
    python main.py
  5. Open the Frontend: With the backend running, open frontend/index.html (relative to the project root) in your browser, or serve the frontend directory with a local web server.
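Documents can also be sent to the running backend from a script instead of the browser. This README does not give the upload route, port, or form field name, so `http://localhost:8000/extract` and the `file` field below are hypothetical placeholders; check backend/main.py for the real values. The sketch only builds a standard multipart/form-data POST using the standard library, without sending it.

```python
import mimetypes
import urllib.request
import uuid

# Hypothetical endpoint: confirm the route and port in backend/main.py.
API_URL = "http://localhost:8000/extract"


def build_upload_request(filename: str, data: bytes, url: str = API_URL,
                         field: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one document."""
    boundary = uuid.uuid4().hex  # random, so very unlikely to occur in the payload
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the request to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_upload_request("invoice.png", open("invoice.png", "rb").read()))`, and read the response body.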


Author

Au Amores

About

AI-powered OCR and data extraction system for converting unstructured documents into structured data using computer vision and machine learning.
