The Sri Lankan License Extractor is an advanced machine learning pipeline designed to automatically extract structured information from images of Sri Lankan driving licenses. Traditional manual data entry methods for vehicle license information are time-consuming, error-prone, and inefficient for large-scale processing. This project provides a comprehensive solution that leverages computer vision and OCR techniques to detect and extract vehicle classes and their validity periods from the rear page of Sri Lankan driving licenses, enabling faster, more accurate data processing for administrative and analytical purposes.
Project Overview

This project addresses the challenge of automatically extracting structured data from Sri Lankan driving license images. The system focuses specifically on detecting and extracting:

- Allowed vehicle classes (A1, B, C, D, etc.)
- License validity periods (start date and expiry date) for each vehicle class
The solution is built entirely with open-source tools and libraries, without relying on external APIs like Google Vision or cloud-based OCR services, making it more accessible and cost-effective.
Figure: Sri Lankan license extraction pipeline showcasing the preprocessing, detection, and extraction stages.
1. Image Preprocessing:
Objective: Transform raw license images into a format optimized for text extraction through orientation correction, enhancement, and table detection.
Outcome: Clean, properly oriented images with identified regions of interest ready for OCR processing.
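As a rough illustration of the enhancement step, the sketch below binarizes a grayscale image with Otsu's method, a common way to separate dark text from a lighter background before OCR. The source does not say which enhancement operations `enhancement.py` actually applies, so both the choice of Otsu thresholding and the function names `otsu_threshold` and `binarize` are assumptions. The code is dependency-free for clarity; with OpenCV the same result is normally obtained with `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, each in the range 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the threshold that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray: Image) -> Image:
    """Map pixels above the Otsu threshold to 255 and all others to 0."""
    threshold = otsu_threshold(gray)
    return [[255 if value > threshold else 0 for value in row] for row in gray]
```

Orientation correction and table detection are separate steps and are not covered by this sketch.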
2. Text Extraction with OCR:
Objective: Apply optical character recognition techniques to extract raw text from preprocessed license images.
Outcome: Raw text data containing vehicle classes and validity dates extracted from the license.
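One way to steer Tesseract toward the table of classes and dates is to pass it a page segmentation mode and a character whitelist. The helper below only builds the configuration string; `build_tesseract_config` is a hypothetical name, the default whitelist is an assumption about which characters appear in the table, and the source does not state which Tesseract options the project uses. The flags themselves (`--oem`, `--psm`, `-c tessedit_char_whitelist=`) are standard Tesseract options.

```python
def build_tesseract_config(
    psm: int = 6,
    oem: int = 3,
    whitelist: str = "ABCDEGJ0123456789./-",
) -> str:
    """Build a Tesseract configuration string for table-like text.

    psm 6 treats the image as a single uniform block of text;
    oem 3 lets Tesseract pick the default available engine.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")

    parts = [f"--oem {oem}", f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to characters expected in classes and dates.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)
```

If the `pytesseract` wrapper is used (an assumption; the source only lists the Tesseract engine as a prerequisite), the string can be passed as `pytesseract.image_to_string(image, config=build_tesseract_config())`.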
3. Data Extraction and Validation:
Objective: Parse the extracted text to identify vehicle classes and their associated dates, applying validation rules to ensure accuracy.
Figure: Visual representation of the extraction results, showing detected vehicle classes and their validity periods.
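The parsing and validation stage could look roughly like the sketch below. It assumes each OCR line starts with a vehicle class followed by a start date and an expiry date in `DD.MM.YYYY` form (with `.`, `/` or `-` as separators), and it rejects rows whose dates are not real calendar dates or whose expiry does not follow the start. The class list, the date format, and the names `parse_date` and `extract_class_validity` are assumptions for illustration, not the project's actual rules.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional, Tuple

# Assumed set of licence classes; longer codes come first in the pattern.
CLASS_RE = re.compile(r"(A1|B1|C1|CE|D1|DE|G1|A|B|C|D|G|J)\b")
DATE_RE = re.compile(r"\d{2}[./-]\d{2}[./-]\d{4}")


def parse_date(raw: str) -> Optional[date]:
    """Parse DD.MM.YYYY (any of . / - as separator); None if invalid."""
    normalized = re.sub(r"[/-]", ".", raw)
    try:
        return datetime.strptime(normalized, "%d.%m.%Y").date()
    except ValueError:
        return None


def extract_class_validity(text: str) -> Dict[str, Tuple[date, date]]:
    """Map each vehicle class to its (start, expiry) dates."""
    results: Dict[str, Tuple[date, date]] = {}
    for line in text.splitlines():
        class_match = CLASS_RE.match(line.strip())
        found = DATE_RE.findall(line)
        if not class_match or len(found) < 2:
            continue  # not a class row, or a date is missing
        start, expiry = parse_date(found[0]), parse_date(found[1])
        # Validation: both dates must exist and be in chronological order.
        if start and expiry and start < expiry:
            results[class_match.group(1)] = (start, expiry)
    return results
```

Rows that fail validation are dropped here; a real pipeline might instead flag them for manual review.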
4. Result Presentation:
Objective: Format the extracted data into a clean, structured tabular format for easy interpretation and further processing.
Outcome: A structured table where each row corresponds to a vehicle category with its start and expiry dates.
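A minimal sketch of the presentation step, assuming the extraction stage yields a mapping from vehicle class to a `(start, expiry)` pair of `date` objects. The function name `to_markdown_table` and the column headings are hypothetical; the source does not specify the exact output format beyond one row per vehicle category.

```python
from datetime import date
from typing import Dict, Tuple


def to_markdown_table(results: Dict[str, Tuple[date, date]]) -> str:
    """Render one row per vehicle class with ISO-formatted dates."""
    lines = [
        "| Vehicle Class | Start Date | Expiry Date |",
        "|---|---|---|",
    ]
    for vehicle_class, (start, expiry) in results.items():
        lines.append(
            f"| {vehicle_class} | {start.isoformat()} | {expiry.isoformat()} |"
        )
    return "\n".join(lines)
```

Rows appear in the insertion order of the mapping, so sorting the input beforehand controls the row order.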
```
sri-lankan-license-extractor/
├── data/
│   ├── raw/                      # Raw input license images
│   ├── processed/                # Preprocessed images
│   └── output/                   # Extraction results
├── docs/
│   ├── examples/                 # Example images and results
│   └── README.md                 # Documentation
├── src/
│   ├── preprocessing/
│   │   ├── __init__.py
│   │   ├── orientation.py        # Image orientation detection/correction
│   │   ├── enhancement.py        # Image enhancement for OCR
│   │   └── table_detection.py    # License table detection
│   ├── extraction/
│   │   ├── __init__.py
│   │   ├── ocr.py                # OCR functionality
│   │   ├── vehicle_classes.py    # Vehicle class extraction
│   │   └── dates.py              # Date extraction and validation
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── visualization.py      # Visualization utilities
│   │   └── io.py                 # I/O utilities
│   ├── __init__.py
│   └── pipeline.py               # Main processing pipeline
├── tests/
│   ├── test_preprocessing.py
│   ├── test_extraction.py
│   └── test_pipeline.py
├── notebooks/
│   ├── development.ipynb         # Development notebook
│   └── demo.ipynb                # Demo notebook
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
```

Installation and Setup

Prerequisites

- Python 3.8 or higher
- Tesseract OCR engine
- OpenCV
Setup Instructions

```bash
# Clone the repository
git clone https://github.com/yourusername/sri-lankan-license-extractor.git
cd sri-lankan-license-extractor

# Install Tesseract OCR
# For Ubuntu/Debian:
sudo apt-get install tesseract-ocr
# For macOS:
brew install tesseract
# For Windows, download from: https://github.com/UB-Mannheim/tesseract/wiki

# Create and activate virtual environment (optional)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

# Install package in development mode
pip install -e .
```

Usage Examples

Command Line Interface

```bash
# Process a single image
python -m src.pipeline --image path/to/license_image.jpg

# Process all images in a directory
python -m src.pipeline --directory path/to/images/ --output path/to/output/
```

Python API

```python
from src.pipeline import preprocess_license_image
from src.extraction.ocr import extract_text
from src.extraction.vehicle_classes import extract_vehicle_classes
from src.extraction.dates import extract_dates

# Process a single image
processed_img = preprocess_license_image('path/to/license_image.jpg')
text = extract_text(processed_img)
vehicle_classes = extract_vehicle_classes(text)
dates = extract_dates(text)

# Display results
print(f"Extracted vehicle classes: {vehicle_classes}")
print(f"Extracted dates: {dates}")
```

Performance Metrics
The system has been tested on a diverse set of Sri Lankan driving license images with varying quality and orientations:
| Metric | Score | Description |
|---|---|---|
| Vehicle Class Extraction Accuracy | ~92% | Percentage of correctly identified vehicle classes |
| Date Extraction Accuracy | ~88% | Percentage of correctly extracted and validated dates |
| Processing Time | 2.3 seconds/image | Average time to process a single license image |
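
The source does not state exactly how the accuracy scores were computed. One plausible definition, sketched below under that assumption, counts a field as correct only when the extracted value exactly matches a hand-labelled value, and reports the percentage over all labelled fields. The function name `field_accuracy` and the data layout (one dictionary per license, keyed by vehicle class) are hypothetical.

```python
from typing import Dict, List


def field_accuracy(
    predicted: List[Dict[str, str]],
    expected: List[Dict[str, str]],
) -> float:
    """Percentage of labelled fields whose extracted value matches exactly."""
    total = 0
    correct = 0
    for pred, truth in zip(predicted, expected):
        for key, value in truth.items():
            total += 1
            if pred.get(key) == value:
                correct += 1
    if total == 0:
        raise ValueError("no labelled fields to evaluate")
    return 100.0 * correct / total
```

Under this definition a class that the extractor misses entirely counts as an error, while spurious extra classes are ignored; a stricter metric would penalize those as well.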

| 🎓 Role | 👲 Name | 🔗 GitHub |
|---|---|---|
| Project Lead | Randika Prabashwara | |
| Contributor | Randika Prabashwara | |