SQL Copilot
AI-powered SQL query generator using fine-tuned Llama 3.1, LangChain RAG, and a GPT-based retry mechanism.
Author: Neha Kotwal
- Natural Language to SQL: Convert plain English questions into executable SQL queries
- Fine-tuned Llama 3.1: Uses LoRA adapters for efficient fine-tuning on SQL generation tasks
- RAG-based Context: Retrieves relevant schema information using vector embeddings
- GPT Retry Mechanism: Achieves ~95% execution success rate through intelligent query fixing
- Multi-database Support: Works with PostgreSQL, MySQL, and SQLite
- Interactive Mode: Real-time query generation and testing
┌─────────────────┐
│ User Question │
└────────┬────────┘
│
v
┌─────────────────────────┐
│ RAG Context Retriever │ ← Retrieves relevant schema from vector DB
└────────┬────────────────┘
│
v
┌─────────────────────────┐
│ Llama 3.1 Generator │ ← Fine-tuned with LoRA adapters
│ (via LangChain) │
└────────┬────────────────┘
│
v
┌─────────────────────────┐
│ GPT Retry Mechanism │ ← Validates and fixes queries
└────────┬────────────────┘
│
v
┌─────────────────┐
│ SQL Query │
└─────────────────┘
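The four stages in the diagram above can be sketched end to end as plain functions. This is a minimal illustration, not the actual SqlCopilot API: `retrieve_context`, `generate_sql`, and `validate_and_retry` are hypothetical stand-ins for the RAG retriever, the fine-tuned Llama 3.1 call, and the GPT retry loop.

```python
# Minimal sketch of the four-stage pipeline; all names are illustrative.

def retrieve_context(question: str, schema_index: dict) -> str:
    """RAG step: pick schema snippets whose words overlap the question."""
    words = set(question.lower().split())
    hits = [desc for table, desc in schema_index.items()
            if words & set(desc.lower().split())]
    return "\n".join(hits)

def generate_sql(question: str, context: str) -> str:
    """Stands in for the fine-tuned Llama 3.1 call via LangChain."""
    return f"-- using context: {context!r}\nSELECT * FROM users;"

def validate_and_retry(sql: str, max_retries: int = 3) -> str:
    """Stands in for the GPT-based validate-and-fix loop."""
    return sql  # a real implementation would execute and repair here

schema_index = {"users": "Table users with columns id, name, email"}
ctx = retrieve_context("list all users", schema_index)
query = validate_and_retry(generate_sql("list all users", ctx))
```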
# OpenAI API Key (required for retry mechanism)
OPENAI_API_KEY=your_key_here
# Database connection
DB_TYPE=postgresql
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database
DB_USER=your_username
DB_PASSWORD=your_password
# Model paths
LLAMA_MODEL_PATH=meta-llama/Meta-Llama-3.1-8B-Instruct
LORA_ADAPTER_PATH=./checkpoints/lora_adapter
# Generation parameters
MAX_RETRIES=3
TEMPERATURE=0.1
MAX_TOKENS=512
- Interactive Mode
Start an interactive session:
python main.py --interactive
In interactive mode:
- Enter natural language questions
- Type `schema` to view the database schema
- Type `tables` to list all tables
- Type `quit` to exit
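The command handling above can be sketched as a small dispatch loop. The command names match the README; the handler functions are hypothetical stand-ins for the real schema and table lookups.

```python
# Toy sketch of the interactive-mode command dispatch.
from typing import Optional

def show_schema() -> str:
    return "Table: users\nColumns: id, name, email"

def list_tables() -> str:
    return "users"

def handle(command: str) -> Optional[str]:
    """Return a response string, or None to signal quit."""
    cmd = command.strip().lower()
    if cmd == "quit":
        return None          # caller exits the loop
    if cmd == "schema":
        return show_schema()
    if cmd == "tables":
        return list_tables()
    # anything else is treated as a natural language question
    return f"-- would generate SQL for: {command}"
```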
- View Schema
Display the database schema:
python main.py --show-schema
Used the Spider dataset with 10,181 questions and 5,693 unique SQL queries across 200 databases. Download and prepare the Spider dataset:
python scripts/download_spider.py --max-examples 2000
This will download and convert the Spider dataset to the training format, creating training_data/spider_training_data.json with 2000 examples.
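The conversion that download script performs can be sketched as below. Spider examples carry `question`, `query`, and `db_id` fields; the schema string and the `convert_example` helper here are illustrative, not the script's actual code.

```python
# Hedged sketch of Spider -> training-format conversion.
import json

def convert_example(ex: dict, schemas: dict) -> dict:
    """Map one Spider record to the question/sql/schema training format."""
    return {
        "question": ex["question"],
        "sql": ex["query"],
        "schema": schemas.get(ex["db_id"], ""),
    }

spider = [{"question": "How many singers do we have?",
           "query": "SELECT count(*) FROM singer",
           "db_id": "concert_singer"}]
schemas = {"concert_singer": "Table: singer\nColumns: singer_id, name, age"}
records = [convert_example(ex, schemas) for ex in spider]
print(json.dumps(records, indent=2))
```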
Custom Training Data
[
{
"question": "Get all users",
"sql": "SELECT * FROM users;",
"schema": "Table: users\nColumns: id, name, email, created_at"
},
{
"question": "Find top 5 customers by spending",
"sql": "SELECT user_id, SUM(total) as spending FROM orders GROUP BY user_id ORDER BY spending DESC LIMIT 5;",
"schema": "Table: orders\nColumns: id, user_id, total"
}
]
Fine-tune Llama 3.1 with the Spider dataset:
python train.py \
--data training_data/spider_training_data.json \
--epochs 3 \
--batch-size 4 \
--output ./checkpoints
Check data quality before training:
python train.py --data training_data/your_data.json --validate-only
SqlCopilot/
├── src/
│ ├── database/ # Database connection and schema extraction
│ │ ├── db_connector.py
│ │ └── schema_extractor.py
│ ├── rag/ # RAG pipeline for context retrieval
│ │ ├── schema_vectorstore.py
│ │ └── context_retriever.py
│ ├── generator/ # Query generation and retry
│ │ ├── query_generator.py
│ │ └── retry_mechanism.py
│ ├── finetuning/ # Model fine-tuning
│ │ ├── data_processor.py
│ │ └── trainer.py
│ └── utils/ # Utilities and configuration
│ ├── config.py
│ └── logger.py
├── main.py # Main CLI application
├── train.py # Training script
├── requirements.txt # Dependencies
└── .env.example # Example configuration
Based on evaluation with 2000+ query pairs:
- Execution Success Rate: ~95% with the retry mechanism
- Syntactic Validity: ~92% on the first attempt
- Context Retrieval: RAG significantly improves accuracy for complex schemas
- Schema Indexing (RAG)
- Extracts the database schema: tables, columns, and relationships
- Converts to embeddings using sentence-transformers
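The real pipeline embeds schema chunks with sentence-transformers and stores them in a vector DB; as a toy stand-in, the sketch below uses bag-of-words vectors and cosine similarity to show the retrieval idea.

```python
# Toy schema retrieval: bag-of-words vectors stand in for sentence
# embeddings, cosine similarity stands in for the vector DB lookup.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Table users columns id name email created_at",
    "Table orders columns id user_id total",
]
q = embed("find top customers by total spending")
best = max(chunks, key=lambda c: cosine(q, embed(c)))
```

With this question, the `orders` chunk wins because it shares the word "total"; real embeddings would also match semantically related words like "spending".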
- Query Generation
- Retrieves relevant schema context based on user question
- Passes context + question to fine-tuned Llama 3.1
- Uses LangChain for prompt management
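Putting the retrieved context and the question together might look like the template below. The exact instruction format SqlCopilot uses with LangChain is an assumption; this only shows the general shape of an instruction-tuned prompt.

```python
# Hypothetical prompt template combining retrieved schema and question.
PROMPT = """### Instruction:
Write a SQL query that answers the question using only the given schema.

### Schema:
{schema}

### Question:
{question}

### SQL:
"""

prompt = PROMPT.format(
    schema="Table: orders\nColumns: id, user_id, total",
    question="Find top 5 customers by spending",
)
```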
- Validation & Retry
- Attempts to execute generated query
- If it fails, GPT-4 analyzes the error and fixes the query
- Retries up to 3 times
- Achieves high success rate through iterative refinement
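The validate-and-retry loop above can be shown runnably with SQLite (one of the supported databases). Here `fix_query` is a purely illustrative stand-in for the GPT-4 repair call: it just corrects a misspelled table name so the loop has something to fix.

```python
# Runnable sketch of the validate-and-retry loop using SQLite.
import sqlite3

def fix_query(sql: str, error: str) -> str:
    # Stand-in for the GPT-4 call that would receive the SQL and the error.
    return sql.replace("usr", "users")

def execute_with_retry(conn, sql: str, max_retries: int = 3):
    for attempt in range(max_retries + 1):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as e:
            if attempt == max_retries:
                raise                      # out of retries: surface the error
            sql = fix_query(sql, str(e))   # repair and try again

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
rows = execute_with_retry(conn, "SELECT name FROM usr")  # wrong table name
```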
- Fine-tuning
- Uses LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Instruction tuning format optimized for Llama 3.1
- 4-bit quantization reduces memory requirements
- Can be trained on consumer GPUs
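The efficiency claim behind LoRA is easy to see in miniature: instead of updating a full weight matrix W (d_out × d_in), LoRA trains two small matrices A (r × d_in) and B (d_out × r) so the effective weight is W + BA. The dimensions below are tiny toy values, not Llama's.

```python
# LoRA parameter counting with toy dimensions (rank r << d).
d_in, d_out, r = 8, 8, 2

full_params = d_in * d_out            # parameters a full update would train
lora_params = r * d_in + d_out * r    # parameters LoRA trains instead

print(full_params, lora_params)  # 64 vs 32 here; the gap grows with d
```

At Llama-3.1 scale (d in the thousands, r typically 8-64), the trained fraction drops to well under 1%, which is what makes consumer-GPU fine-tuning feasible alongside 4-bit quantization of the frozen base weights.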