AI-powered fax classification and routing system for healthcare and business workflows
FaxSort is an automated fax processing service that receives faxes via HumbleFax API, uses OCR and AI to classify document types, and intelligently routes them to the appropriate recipients via email. Originally built for internal use at a healthcare practice, it's now open sourced for anyone who needs intelligent fax handling.
- Automated Polling: Continuously monitors HumbleFax for incoming faxes
- OCR Processing: Extracts text from TIFF images using Tesseract
- AI Classification: Uses Anthropic's Claude to categorize documents with custom categories
- HIPAA Compliance: Optional PHI redaction using Microsoft Presidio
- Smart Routing: Automatically emails classified faxes to designated recipients
- Known Sender Fast-Track: Direct routing for pre-configured senders
- Highly Configurable: Extensive environment variable configuration
- Health Monitoring: Built-in health check endpoint
- Auto Cleanup: Automatic cleanup of temporary files
- Healthcare Practices: Route lab results, referrals, prescriptions, and medical records
- Legal Offices: Classify and distribute contracts, court documents, and correspondence
- Insurance Companies: Sort claims, authorizations, and policy documents
- Any Business: Any organization that receives high volumes of faxes requiring manual sorting
- Python 3.11+
- Tesseract OCR
- HumbleFax account
- Anthropic API key
- Office 365 email account (for routing)
- Clone the repository
    git clone https://github.com/yourusername/faxSort.git
    cd faxSort
- Install dependencies
    pip install -r requirements.txt
- Install Tesseract OCR
    # Ubuntu/Debian
    sudo apt-get install tesseract-ocr
    # macOS
    brew install tesseract
    # Windows
    # Download from: https://github.com/UB-Mannheim/tesseract/wiki
- Download spaCy model
    python -m spacy download en_core_web_md
- Configure environment variables
    cp .env.example .env
    # Edit .env with your configuration (see Configuration section)
- Run the service
    cd src
    python main.py
FaxSort is configured entirely through environment variables. Copy .env.example to .env and customize:
# API Keys
ANTHROPIC_API_KEY=your_anthropic_key_here
HUMBLE_FAX_ACCESS_KEY=your_humblefax_access_key
HUMBLE_FAX_SECRET_KEY=your_humblefax_secret_key
FAX_TO_NUMBER=your_fax_number
# Document Categories (comma-separated)
CLASSIFICATION_CATEGORIES=Medical Records Request,Pathology Report,Prior Authorization,Referral,Prescription Refill,Insurance Document,Uncategorized
# Email Configuration
SMTP_USERNAME=your.email@yourdomain.com
SMTP_PASSWORD=your_password_or_app_password
DEFAULT_FROM_EMAIL=your.email@yourdomain.com
# Email Routing (DocumentType:email@domain.com)
EMAIL_MAPPINGS=Medical Records Request:medrec@example.com,Pathology Report:path@example.com,Prior Authorization:auth@example.com
# HIPAA Compliance
HIPAA_MODE=false # Set to 'true' to enable PHI redaction
# Known Sender Mappings (bypass OCR/AI classification)
SENDER_MAPPINGS=LabCorp:Lab Results,Quest Diagnostics:Lab Results
# Polling Settings
POLLING_RATE=60 # Seconds between polls
# Custom Classification Rules
KEYWORD_RULES=If you see Humira or Dupixent, classify as Biologics
graph LR
A[HumbleFax API] --> B[Fax Processor]
B --> C{Known Sender?}
C -->|Yes| D[Direct Classification]
C -->|No| E[OCR Processing]
E --> F[PHI Redaction]
F --> G[AI Classification]
D --> H[Email Routing]
G --> H
H --> I[Cleanup]
- Polling: The service polls the HumbleFax API every 60 seconds by default (configurable via POLLING_RATE)
- Download: Downloads faxes as both TIFF (for OCR) and PDF (for email attachment)
- Classification:
  - Known senders are classified immediately based on sender mappings
  - Unknown senders go through OCR → PHI redaction (if enabled) → AI classification
- Routing: Classified faxes are emailed to appropriate recipients based on document type
- Cleanup: Temporary files are automatically cleaned up after successful processing
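The classification branch in the steps above can be sketched roughly as follows. This is an illustrative outline only, not FaxSort's actual code: the `Fax` dataclass, the `classify_fax` function, and the injected `ocr`, `redact`, and `classify` callables are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class Fax:
    """Hypothetical container for one downloaded fax."""
    sender: str
    tiff_path: str  # used for OCR
    pdf_path: str   # used as the email attachment


def classify_fax(
    fax: Fax,
    sender_mappings: Mapping[str, str],
    ocr: Callable[[str], str],
    classify: Callable[[str], str],
    redact: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a document type for one fax (assumed flow, see lead-in)."""
    # Known senders skip OCR and AI classification entirely.
    if fax.sender in sender_mappings:
        return sender_mappings[fax.sender]
    text = ocr(fax.tiff_path)
    # Redaction runs only when a redactor is supplied (HIPAA mode enabled).
    if redact is not None:
        text = redact(text)
    return classify(text)
```

Passing the OCR, redaction, and classification steps in as callables keeps the branching logic testable without Tesseract, Presidio, or an API key; a real deployment would plug in the actual implementations.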
The AI classification system is highly customizable:
- Keyword Rules: Define specific keywords that trigger certain classifications
- Sender Mappings: Bypass classification for known, trusted senders
- Fallback Handling: Unclassifiable documents default to "Uncategorized"
- Medical Records Request
- Pathology Report
- Prior Authorization
- Referral
- Prescription Refill
- Lab Results
- Insurance Document
- Biologics (medication-specific)
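As a rough illustration of how the comma-separated settings and the "Uncategorized" fallback might be handled, here is a minimal sketch. The helper names (`parse_list`, `parse_mappings`, `normalize_category`) are hypothetical and are not taken from the FaxSort source; the parsing rules are an assumption based on the `Key:Value,Key:Value` format shown in the example configuration.

```python
def parse_list(raw: str) -> list[str]:
    """Split a comma-separated value such as CLASSIFICATION_CATEGORIES."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_mappings(raw: str) -> dict[str, str]:
    """Parse 'Key:Value,Key:Value' strings such as SENDER_MAPPINGS."""
    mappings: dict[str, str] = {}
    for pair in raw.split(","):
        # Split on the first colon only; skip malformed or empty pairs.
        key, sep, value = pair.partition(":")
        if sep and key.strip() and value.strip():
            mappings[key.strip()] = value.strip()
    return mappings


def normalize_category(label: str, categories: list[str],
                       fallback: str = "Uncategorized") -> str:
    """Map any label outside the configured categories to the fallback."""
    return label if label in categories else fallback
```

For example, `parse_mappings("LabCorp:Lab Results,Quest Diagnostics:Lab Results")` yields a dictionary with two senders that both map to "Lab Results". The same helper would work for `EMAIL_MAPPINGS`, since email addresses contain no commas or leading colons.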
Enable HIPAA mode to automatically redact PHI before sending to external AI services:
HIPAA_MODE=true
When enabled, FaxSort uses Microsoft Presidio to identify and redact:
- Names, addresses, phone numbers
- Social Security Numbers
- Medical record numbers
- Dates and other identifying information
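For readers unfamiliar with Presidio, a redaction step along these lines might look like the sketch below. It is not FaxSort's actual implementation: `build_analyzer` and `redact_phi` are hypothetical names, and pairing Presidio with the `en_core_web_md` model is an assumption based on the installation steps. Which entity types get detected depends on the recognizers configured, so treat the output as illustrative.

```python
def build_analyzer():
    """Create a Presidio analyzer backed by the spaCy en_core_web_md model."""
    # Imported lazily so the module loads even without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration={
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": "en_core_web_md"}],
    })
    return AnalyzerEngine(nlp_engine=provider.create_engine(),
                          supported_languages=["en"])


def redact_phi(text, analyzer=None, anonymizer=None):
    """Return text with detected entities replaced by placeholders."""
    if analyzer is None:
        analyzer = build_analyzer()
    if anonymizer is None:
        from presidio_anonymizer import AnonymizerEngine
        anonymizer = AnonymizerEngine()
    # Detect entities, then let the anonymizer replace each detected span.
    results = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=results).text
```

Building the engines once and passing them into `redact_phi` avoids reloading the spaCy model for every fax.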
curl http://localhost:8000/health
Returns:
{
  "status": "healthy",
  "timestamp": "2025-01-15T10:30:00",
  "processor_status": "running"
}
Comprehensive logging shows:
- Fax polling activity
- Classification results
- Email delivery status
- Error handling and retries
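The health endpoint shown earlier can also be probed from a script, for example in a cron job or an external monitor. The sketch below uses only the standard library; the function names are hypothetical, and treating `"status": "healthy"` together with `"processor_status": "running"` as the success condition is an assumption based on the sample response.

```python
import json
from urllib.request import urlopen


def parse_health(body: str) -> bool:
    """Return True if the JSON body reports a healthy, running service."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return False
    return (payload.get("status") == "healthy"
            and payload.get("processor_status") == "running")


def check_health(url: str = "http://localhost:8000/health",
                 timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any network or parse error counts as unhealthy."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON raises ValueError.
        return False
```

A monitoring script could call `check_health()` and exit non-zero when it returns `False`.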
Coming soon! Docker support is planned for easier deployment.
FaxSort was built for internal use and is now open sourced to help others solve similar challenges. Contributions are welcome!
- Fork the repository
- Create a virtual environment: python -m venv venv
- Install dev dependencies: pip install -r requirements.txt
- Install pre-commit hooks: pre-commit install
- Make your changes and submit a PR
- Docker containerization
- Additional OCR engines
- More classification providers (OpenAI, local models)
- Web dashboard for monitoring
- Additional email providers
- Performance optimizations
- Credentials: Store all API keys and passwords securely
- PHI Handling: Enable HIPAA mode when processing medical documents
- Network Security: Run behind a firewall, consider VPN for production
- File Storage: Temporary files are cleaned up automatically
- Logging: No PHI is logged when HIPAA mode is enabled
MIT License - see LICENSE file for details.
This project is provided as-is. For bugs and feature requests, please open a GitHub issue.
Made with ❤️ for healthcare workers and anyone drowning in fax paperwork