A Telegram bot for automated invoice processing using OCR technology. The bot extracts structured data from PDF invoices and photos, allowing users to review, edit, and save invoice information to a database.
graph LR
A[📱 Upload] -->|PDF/Image| B[🔍 OCR Extract]
B -->|Parse Data| C[✏️ Edit Draft]
C -->|Confirm| D[💾 SQLite]
D -->|Query| E[📊 Reports]
style A fill:#4A90E2,stroke:#2c3e50,stroke-width:2px,color:#fff
style B fill:#FF6B6B,stroke:#2c3e50,stroke-width:2px,color:#fff
style C fill:#FFD93D,stroke:#2c3e50,stroke-width:2px,color:#333
style D fill:#50C878,stroke:#2c3e50,stroke-width:2px,color:#fff
style E fill:#B19CD9,stroke:#2c3e50,stroke-width:2px,color:#fff
- 🐍 Python 3.11+
- 🤖 Telegram Bot Token
- 🔑 Mindee API Key
Tip
The fastest way to get started! Docker handles all dependencies automatically.
# 1. Clone and setup environment
git clone https://github.com/AmaLS367/InvoiceFlowBot.git
cd InvoiceFlowBot
Copy-Item .env.example .env
# 2. Edit .env with your tokens
notepad .env
# 3. Start the bot
docker-compose up --build -d
# 4. Check logs
docker-compose logs -f
# 5. Stop when done
docker-compose down

Note
Requires Python 3.11+ and Git installed on your system.
📦 Step-by-step installation guide
git clone https://github.com/AmaLS367/InvoiceFlowBot.git
cd InvoiceFlowBot

python -m venv .venv
.\.venv\Scripts\Activate.ps1

pip install -e .

BOT_TOKEN=your_telegram_bot_token
MINDEE_API_KEY=your_mindee_api_key
MINDEE_MODEL_ID=your_mindee_model_id
# Optional logging configuration
LOG_LEVEL=INFO
LOG_ROTATE_MB=10
LOG_BACKUPS=5
LOG_CONSOLE=0
LOG_DIR=logs

The bot is configured via environment variables managed by pydantic settings in config.py.
For local development you can create a .env file in the project root:
BOT_TOKEN=123456:ABCDEF_your_bot_token
MINDEE_API_KEY=your-mindee-api-key
MINDEE_MODEL_ID=mindee/invoices/v4
DB_FILENAME=data.sqlite

On startup the application reads these values into the Settings model.
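As a rough illustration of how these values could be read at startup, here is a stdlib-only sketch. It is not the project's config.py, which uses pydantic settings. The `Settings` dataclass, the `load_settings` helper, and the `data.sqlite` default are assumptions that simply mirror the variable names in the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the pydantic Settings model in config.py."""

    bot_token: str
    mindee_api_key: str
    mindee_model_id: str
    db_filename: str = "data.sqlite"  # assumed default, taken from the example .env


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default); fail fast on missing keys."""
    env = os.environ if env is None else env
    required = ("BOT_TOKEN", "MINDEE_API_KEY", "MINDEE_MODEL_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        bot_token=env["BOT_TOKEN"],
        mindee_api_key=env["MINDEE_API_KEY"],
        mindee_model_id=env["MINDEE_MODEL_ID"],
        db_filename=env.get("DB_FILENAME", "data.sqlite"),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to exercise in tests without touching the real environment.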
python bot.py

[!TIP] Check the logs/ directory for detailed application logs if you encounter any issues.
Run unit tests with pytest. On Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]
pytest

You can also run a specific test file:
python -m pytest tests/test_invoice_service.py

Tip
New to the bot? Start with /start to see the interactive menu!
🎯 Basic Commands
| Command | Description |
|---|---|
| /start | Start the bot and see main menu |
| /help | Show help message |
| /show | Display current draft invoice |
| /save | Save current draft to database |
✏️ Editing Commands
| Command | Description | Example |
|---|---|---|
| /edit | Edit header fields | /edit supplier=ACME client=Corp date=2024-01-15 |
| /edititem | Edit specific line item | /edititem 0 name=Widget qty=5 price=10.50 |
| /comment | Add a comment | /comment Approved by manager |
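The key=value syntax used by /edit and /edititem can be parsed along the following lines. This is a minimal sketch, not the bot's actual handler code: `parse_key_values` and `parse_edititem` are hypothetical helper names, and support for quoted values with spaces is an assumption.

```python
import shlex
from typing import Dict, Tuple


def parse_key_values(args: str) -> Dict[str, str]:
    """Parse 'key=value' tokens; shlex allows values with spaces to be quoted."""
    fields: Dict[str, str] = {}
    for token in shlex.split(args):
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {token!r}")
        fields[key.lower()] = value
    return fields


def parse_edititem(args: str) -> Tuple[int, Dict[str, str]]:
    """Split /edititem arguments into a line-item index and its field updates."""
    index_text, _, rest = args.strip().partition(" ")
    return int(index_text), parse_key_values(rest)


print(parse_key_values("supplier=ACME client=Corp date=2024-01-15"))
print(parse_edititem("0 name=Widget qty=5 price=10.50"))
```

Values are kept as strings here; converting qty and price to numbers would belong in a later validation step.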
🔍 Query Commands
/invoices YYYY-MM-DD YYYY-MM-DD [supplier=text]
Example:
/invoices 2024-01-01 2024-01-31 supplier=ACME
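To show how such a query might map onto SQLite, the sketch below parses the arguments and builds a parameterized SELECT. The column names (`id`, `supplier`, `doc_date`, `total`), the substring match on supplier, and both helper functions are assumptions for illustration; the real schema is defined by the project's Alembic migrations.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple


def parse_invoices_args(args: str) -> Tuple[date, date, Optional[str]]:
    """Parse 'YYYY-MM-DD YYYY-MM-DD [supplier=text]' into typed values."""
    parts = args.split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError("Expected two dates: YYYY-MM-DD YYYY-MM-DD")
    start, end = date.fromisoformat(parts[0]), date.fromisoformat(parts[1])
    if start > end:
        raise ValueError("Start date must not be after end date")
    supplier = None
    if len(parts) == 3:
        key, _, value = parts[2].partition("=")
        if key != "supplier" or not value:
            raise ValueError("Optional filter must look like supplier=text")
        supplier = value
    return start, end, supplier


def query_invoices(
    conn: sqlite3.Connection, start: date, end: date, supplier: Optional[str] = None
) -> List[tuple]:
    """Select invoices in the period; table columns here are hypothetical."""
    sql = "SELECT id, supplier, doc_date, total FROM invoices WHERE doc_date BETWEEN ? AND ?"
    params = [start.isoformat(), end.isoformat()]
    if supplier:
        sql += " AND supplier LIKE ?"  # assumed substring match
        params.append(f"%{supplier}%")
    return conn.execute(sql + " ORDER BY doc_date", params).fetchall()
```

Storing dates as ISO 8601 text makes the BETWEEN comparison work lexicographically, which is why the sketch passes `isoformat()` strings as parameters.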
🔘 Interactive Buttons
The bot provides inline keyboard buttons for:
- 📤 Upload invoice
- ✏️ Edit invoice fields
- 💬 Add comments
- 💾 Save invoice
- 📅 Query invoices by period
- ❓ View help
InvoiceFlowBot/
├── bot.py # Main bot entry point
├── config.py # Configuration management
├── domain/
│ └── invoices.py # Domain entities (Invoice, InvoiceHeader, InvoiceItem, etc.)
├── services/
│ └── invoice_service.py # Service layer (OCR orchestration, domain conversion)
├── handlers/
│ ├── commands.py # Text command handlers (/show, /edit, /invoices, etc.)
│ ├── callbacks.py # Callback query handlers (inline button actions)
│ ├── file.py # File upload handlers
│ ├── state.py # Global state management
│ └── utils.py # Utility functions and keyboards
├── ocr/
│ ├── extract.py # Invoice extraction entry point
│ ├── mindee_client.py # Mindee API integration
│ ├── providers/ # OCR provider abstraction layer
│ │ ├── base.py # OcrProvider interface
│ │ └── mindee_provider.py # Mindee provider implementation
│ └── engine/
│ ├── router.py # OCR routing logic (uses providers)
│ ├── types.py # Data type definitions
│ └── util.py # OCR utilities and logging
└── storage/
└── db.py # Database operations
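The provider abstraction under ocr/providers/ suggests a layout where the router depends on an interface rather than on Mindee directly. The sketch below shows one way such a layer could look. Only the `OcrProvider` name comes from the tree above; the `extract` method, `ExtractionResult`, `FakeProvider`, and `extract_invoice` are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionResult:
    """Hypothetical container for whatever a provider returns."""

    header: Dict[str, str] = field(default_factory=dict)
    items: List[Dict[str, str]] = field(default_factory=list)


class OcrProvider(ABC):
    """Assumed shape of the interface in ocr/providers/base.py."""

    @abstractmethod
    def extract(self, file_path: str) -> ExtractionResult:
        ...


class FakeProvider(OcrProvider):
    """Test double that returns canned data instead of calling an OCR API."""

    def extract(self, file_path: str) -> ExtractionResult:
        return ExtractionResult(
            header={"supplier": "ACME"},
            items=[{"name": "Widget", "qty": "5"}],
        )


def extract_invoice(file_path: str, provider: OcrProvider) -> ExtractionResult:
    """Router-style entry point that depends only on the interface."""
    return provider.extract(file_path)
```

With this shape, unit tests can pass a `FakeProvider` and never need a Mindee API key or network access.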
The bot uses environment variables for configuration. See .env.example for available options.
📋 Environment Variables Reference
| Variable | Description | Example |
|---|---|---|
| BOT_TOKEN | Telegram bot token from @BotFather | 123456:ABCDEF... |
| MINDEE_API_KEY | API key from Mindee platform | your-api-key |
| MINDEE_MODEL_ID | Mindee model ID for invoice processing | mindee/invoices/v4 |
[!WARNING] The bot will not start without these required variables!
| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Logging level | INFO |
| LOG_ROTATE_MB | Max log file size in MB | 10 |
| LOG_BACKUPS | Number of backup log files | 5 |
| LOG_CONSOLE | Enable console logging | 0 |
| LOG_DIR | Custom log directory | logs |
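The logging variables map naturally onto Python's standard `RotatingFileHandler`. The following is a sketch under assumptions, not the project's logging setup: `build_logger` is a hypothetical helper, and treating `LOG_CONSOLE=1` as "enabled" is a guess based on the default of 0.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from typing import Mapping, Optional


def build_logger(name: str, env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Create a rotating file logger configured from the LOG_* variables."""
    env = os.environ if env is None else env
    log_dir = env.get("LOG_DIR", "logs")
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(env.get("LOG_LEVEL", "INFO").upper())
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    file_handler = RotatingFileHandler(
        os.path.join(log_dir, f"{name}.log"),
        maxBytes=int(env.get("LOG_ROTATE_MB", "10")) * 1024 * 1024,  # MB to bytes
        backupCount=int(env.get("LOG_BACKUPS", "5")),
        encoding="utf-8",
    )
    logger.addHandler(file_handler)

    if env.get("LOG_CONSOLE", "0") == "1":  # assumed meaning of the flag
        logger.addHandler(logging.StreamHandler())
    return logger
```

With the defaults from the table, each log file would rotate at 10 MB and keep five older copies alongside the active file.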
The bot uses a SQLite database to store invoices. The database schema is managed by Alembic.
🔨 Database Setup & Structure
python -m alembic upgrade head

[!NOTE] The application automatically runs migrations on startup via storage.db.init_db().
| Table | Description |
|---|---|
| invoices | Header information (supplier, client, dates, totals) |
| invoice_items | Line items for each invoice |
| comments | User comments associated with invoices |
| invoice_drafts | Temporary drafts for editing |
# Backup
Copy-Item .\data.sqlite .\backup\data-$(Get-Date -Format yyyyMMddHHmmss).sqlite
# Restore
Copy-Item .\backup\data-20240115.sqlite .\data.sqlite

[!WARNING] Always back up data.sqlite before major updates!
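As an alternative to copying the file, a backup can also be taken from Python with the standard library's `sqlite3.Connection.backup`, which produces a consistent snapshot even while another connection is writing. This is a sketch only: `backup_database` is a hypothetical helper that is not part of the repository, and the file naming simply mirrors the PowerShell backup command.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path: str = "data.sqlite", backup_dir: str = "backup") -> Path:
    """Copy the database with sqlite3's online backup API and return the new file path."""
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    target = target_dir / f"data-{stamp}.sqlite"

    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(str(target))
    try:
        source.backup(dest)  # copies all pages of the main database
    finally:
        dest.close()
        source.close()
    return target
```

Note that `sqlite3.connect` creates an empty database if `db_path` does not exist, so it is worth checking the path before calling the helper.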
Logs are written to the logs/ directory by default:
- ocr_engine.log - General application logs
- errors.log - Error and warning logs
- router.log - OCR routing logs
- extract.log - Invoice extraction logs
Copyright 2025 Ama
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The project uses the following tools for code quality:
- ruff - Fast Python linter
- mypy - Static type checking
# Install dependencies
pip install -e .
pip install -e .[dev]
# Run linter
python -m ruff check .
# Run type checker
python -m mypy domain services ocr storage
# Run tests
python -m pytest

The CI pipeline automatically runs ruff, mypy, and pytest on every push and pull request.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions, please open an issue on the repository.