Telegram bot for automated invoice processing using OCR. Extract, edit, and store invoice data from PDFs and images.

AmaLS367/InvoiceFlowBot

CI Python 3.11+ License: Apache 2.0 Code style: ruff Telegram

OCR SQLite Aiogram


🇷🇺 Russian documentation: README.ru.md, docs/ru/index.md


A Telegram bot for automated invoice processing using OCR technology. The bot extracts structured data from PDF invoices and photos, allowing users to review, edit, and save invoice information to a database.

✨ Features

graph LR
    A[📱 Upload] -->|PDF/Image| B[🔍 OCR Extract]
    B -->|Parse Data| C[✏️ Edit Draft]
    C -->|Confirm| D[💾 SQLite]
    D -->|Query| E[📊 Reports]

    style A fill:#4A90E2,stroke:#2c3e50,stroke-width:2px,color:#fff
    style B fill:#FF6B6B,stroke:#2c3e50,stroke-width:2px,color:#fff
    style C fill:#FFD93D,stroke:#2c3e50,stroke-width:2px,color:#333
    style D fill:#50C878,stroke:#2c3e50,stroke-width:2px,color:#fff
    style E fill:#B19CD9,stroke:#2c3e50,stroke-width:2px,color:#fff
| Feature | Description |
|---|---|
| 🤖 OCR Processing | Automatic extraction via Mindee API with provider abstraction |
| 📎 Multiple Formats | PDF, JPEG, PNG, HEIC, HEIF, WebP |
| ✏️ Interactive Editing | Edit headers and line items via Telegram |
| 💾 Data Storage | SQLite with Alembic migrations |
| 📅 Period Queries | Filter by date range and supplier |
| 💬 Comment System | Add notes to invoices |
| 📊 CSV Export | Export line items for analysis |
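
As a rough illustration of the CSV export feature, the sketch below serializes line items with Python's standard `csv` module. The `LineItem` class, the `items_to_csv` function, and the column names are assumptions made for this example; the bot's real entities live in `domain/invoices.py` and may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LineItem:
    """Hypothetical line item, not the project's actual entity."""
    name: str
    qty: float
    price: float


def items_to_csv(items: list[LineItem]) -> str:
    """Serialize line items to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are an assumption for this sketch.
    writer.writerow(["name", "qty", "price", "total"])
    for item in items:
        total = round(item.qty * item.price, 2)
        writer.writerow([item.name, item.qty, item.price, total])
    return buffer.getvalue()


csv_text = items_to_csv([LineItem("Widget", 5, 10.50), LineItem("Bolt", 100, 0.1)])
print(csv_text)
```

Writing to an in-memory buffer keeps the function easy to test; the resulting text could then be sent to the user as a document.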

📋 Requirements

  • 🐍 Python 3.11+
  • 🤖 Telegram Bot Token
  • 🔑 Mindee API Key

🚀 Quick Start with Docker

Tip

The fastest way to get started! Docker handles all dependencies automatically.

# 1. Clone and set up the environment (PowerShell)
git clone https://github.com/AmaLS367/InvoiceFlowBot.git
cd InvoiceFlowBot
Copy-Item .env.example .env

# 2. Edit .env with your tokens
notepad .env

# 3. Start the bot
docker-compose up --build -d

# 4. Check logs
docker-compose logs -f

# 5. Stop when done
docker-compose down

💻 Installation

Note

Requires Python 3.11+ and Git installed on your system.

📦 Step-by-step installation guide

1. Clone the repository

git clone https://github.com/AmaLS367/InvoiceFlowBot.git
cd InvoiceFlowBot

2. Create a virtual environment

python -m venv .venv
.\.venv\Scripts\Activate.ps1

3. Install dependencies

pip install -e .

4. Create a .env file in the project root

BOT_TOKEN=your_telegram_bot_token
MINDEE_API_KEY=your_mindee_api_key
MINDEE_MODEL_ID=your_mindee_model_id

# Optional logging configuration
LOG_LEVEL=INFO
LOG_ROTATE_MB=10
LOG_BACKUPS=5
LOG_CONSOLE=0
LOG_DIR=logs

⚙️ Configuration

The bot is configured via environment variables managed by pydantic settings in config.py.

For local development you can create a .env file in the project root:

BOT_TOKEN=123456:ABCDEF_your_bot_token
MINDEE_API_KEY=your-mindee-api-key
MINDEE_MODEL_ID=mindee/invoices/v4
DB_FILENAME=data.sqlite

On startup the application reads these values into the Settings model.

5. Run the bot

python bot.py

[!TIP] If you encounter any issues, check the logs/ directory for detailed application logs.

🧪 Tests

Run unit tests with pytest. On Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]
pytest

You can also run a specific test file:

python -m pytest tests/test_invoice_service.py

📖 Usage

Tip

New to the bot? Start with /start to see the interactive menu!

🎯 Basic Commands

| Command | Description |
|---|---|
| /start | Start the bot and see the main menu |
| /help | Show the help message |
| /show | Display the current draft invoice |
| /save | Save the current draft to the database |

✏️ Editing Commands

| Command | Description | Example |
|---|---|---|
| /edit | Edit header fields | /edit supplier=ACME client=Corp date=2024-01-15 |
| /edititem | Edit a specific line item | /edititem 0 name=Widget qty=5 price=10.50 |
| /comment | Add a comment | /comment Approved by manager |
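
The `key=value` syntax of the editing commands can be parsed along these lines. This is only a sketch: `parse_edit_args` and `parse_edititem_args` are hypothetical names, and support for quoted values with spaces (via `shlex`) is an assumption rather than documented bot behavior.

```python
import shlex


def _parse_pairs(tokens: list[str]) -> dict[str, str]:
    """Turn ['a=1', 'b=2'] into {'a': '1', 'b': '2'}."""
    fields: dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {token!r}")
        fields[key] = value
    return fields


def parse_edit_args(text: str) -> dict[str, str]:
    """Parse '/edit supplier=ACME client=Corp' into a field dictionary."""
    tokens = shlex.split(text)
    return _parse_pairs(tokens[1:])  # skip the command itself


def parse_edititem_args(text: str) -> tuple[int, dict[str, str]]:
    """Parse '/edititem 0 name=Widget qty=5' into (item index, fields)."""
    tokens = shlex.split(text)
    if len(tokens) < 2:
        raise ValueError("Usage: /edititem <index> key=value ...")
    return int(tokens[1]), _parse_pairs(tokens[2:])


print(parse_edit_args("/edit supplier=ACME client=Corp date=2024-01-15"))
print(parse_edititem_args("/edititem 0 name=Widget qty=5 price=10.50"))
```

Values are kept as strings here; converting `qty` and `price` to numbers would be a separate validation step.
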
🔍 Query Commands
/invoices YYYY-MM-DD YYYY-MM-DD [supplier=text]

Example:

/invoices 2024-01-01 2024-01-31 supplier=ACME
🔘 Interactive Buttons

The bot provides inline keyboard buttons for:

  • 📤 Upload invoice
  • ✏️ Edit invoice fields
  • 💬 Add comments
  • 💾 Save invoice
  • 📅 Query invoices by period
  • ❓ View help

📁 Project Structure

InvoiceFlowBot/
├── bot.py                 # Main bot entry point
├── config.py              # Configuration management
├── domain/
│   └── invoices.py        # Domain entities (Invoice, InvoiceHeader, InvoiceItem, etc.)
├── services/
│   └── invoice_service.py # Service layer (OCR orchestration, domain conversion)
├── handlers/
│   ├── commands.py        # Text command handlers (/show, /edit, /invoices, etc.)
│   ├── callbacks.py       # Callback query handlers (inline button actions)
│   ├── file.py            # File upload handlers
│   ├── state.py           # Global state management
│   └── utils.py           # Utility functions and keyboards
├── ocr/
│   ├── extract.py         # Invoice extraction entry point
│   ├── mindee_client.py   # Mindee API integration
│   ├── providers/         # OCR provider abstraction layer
│   │   ├── base.py        # OcrProvider interface
│   │   └── mindee_provider.py  # Mindee provider implementation
│   └── engine/
│       ├── router.py      # OCR routing logic (uses providers)
│       ├── types.py       # Data type definitions
│       └── util.py        # OCR utilities and logging
└── storage/
    └── db.py              # Database operations

⚙️ Configuration

The bot uses environment variables for configuration. See .env.example for available options.

📋 Environment Variables Reference

🔑 Required Variables

| Variable | Description | Example |
|---|---|---|
| BOT_TOKEN | Telegram bot token from @BotFather | 123456:ABCDEF... |
| MINDEE_API_KEY | API key from Mindee platform | your-api-key |
| MINDEE_MODEL_ID | Mindee model ID for invoice processing | mindee/invoices/v4 |

[!WARNING] The bot will not start without these required variables!

🔧 Optional Variables

| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Logging level | INFO |
| LOG_ROTATE_MB | Max log file size in MB | 10 |
| LOG_BACKUPS | Number of backup log files | 5 |
| LOG_CONSOLE | Enable console logging | 0 |
| LOG_DIR | Custom log directory | logs |

🗄️ Database

The bot stores invoices in a SQLite database. The database schema is managed by Alembic.

🔨 Database Setup & Structure

Initial Setup

python -m alembic upgrade head

[!NOTE] The application automatically runs migrations on startup via storage.db.init_db().

Database Tables

| Table | Description |
|---|---|
| invoices | Header information (supplier, client, dates, totals) |
| invoice_items | Line items for each invoice |
| comments | User comments associated with invoices |
| invoice_drafts | Temporary drafts for editing |

Backup & Restore

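# Backup (creates the backup folder if it does not exist yet)
New-Item -ItemType Directory -Force .\backup | Out-Null
Copy-Item .\data.sqlite .\backup\data-$(Get-Date -Format yyyyMMddHHmmss).sqlite

# Restore (replace the file name with the backup you want to restore)
Copy-Item .\backup\data-20240115093000.sqlite .\data.sqlite

[!WARNING] Always back up data.sqlite before major updates!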
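
As a cross-platform alternative to copying the file by hand, a backup can also be taken with the `backup` method of Python's standard `sqlite3` connections, which copies a consistent snapshot even while the database is open elsewhere. The helper below is a sketch; `backup_database` and the `data-<timestamp>.sqlite` naming are assumptions modeled on the commands above, not part of the project.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path: str, backup_dir: str) -> Path:
    """Copy the SQLite database into backup_dir with a timestamped name."""
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    target = target_dir / f"data-{stamp}.sqlite"

    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)  # copies all pages of the source database
    finally:
        dest.close()
        source.close()
    return target
```

For example, `backup_database("data.sqlite", "backup")` would write a file such as `backup/data-20240115093000.sqlite`.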

📝 Logging

Logs are written to the logs/ directory by default:

  • ocr_engine.log - General application logs
  • errors.log - Error and warning logs
  • router.log - OCR routing logs
  • extract.log - Invoice extraction logs

📄 License

Copyright 2025 Ama

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

👨‍💻 Development

🔍 Code Quality

The project uses the following tools for code quality:

  • ruff - Fast Python linter
  • mypy - Static type checking

🛠️ Local Development Setup

# Install the project together with its development dependencies
pip install -e .[dev]

# Run linter
python -m ruff check .

# Run type checker
python -m mypy domain services ocr storage

# Run tests
python -m pytest

The CI pipeline automatically runs ruff, mypy, and pytest on every push and pull request.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

💬 Support

For issues and questions, please open an issue on the repository.


Made with ❤️ by Ama
