A production-ready Python project that validates user profile data using prompt engineering with Large Language Models (LLMs). Built for the DeepStack AI/ML Internship Assignment.
This project implements a strict input validator that:
- Takes user profile JSON data as input
- Uses Claude (or a compatible LLM) to validate based on high-level constraints
- Returns structured JSON output with validation results
- Includes a comprehensive automated test suite using Promptfoo
- LLM-Only Validation: All validation logic delegated to the LLM via prompt engineering
- High-Level Constraints: Rules are stated in terms of recognized standards (e.g., "E.164 format") instead of detailed regex patterns
- Strict Schema Compliance: Output always matches the required JSON schema
- No Hardcoded Rules: The validator takes its rules from the prompt, not from hardcoded checks
- Name: Required and non-empty
- Email: Must be a valid email address (if provided)
- Age: Must be a positive number (if provided)
- Country: Must be ISO-2 country code like US, IN, GB (if provided)
- Phone: Required and must follow E.164 format (+[country code][number])
- Age: Below 18 years old
- Name: Shorter than 3 characters
- Email: Uses disposable/temporary email domain
- Phone-Country: Mismatch between country code and phone number
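A minimal sketch of how the error and warning rules above might be turned into a prompt. The names `ERROR_RULES`, `WARNING_RULES`, and `build_validation_prompt` are hypothetical, and the wording in the project's actual prompts.py may differ.

```python
import json

# Hypothetical rule text; the real prompts.py may phrase these differently.
ERROR_RULES = [
    "name is required and must be non-empty",
    "email, if provided, must be a valid email address",
    "age, if provided, must be a positive number",
    "country, if provided, must be an ISO-2 country code (e.g. US, IN, GB)",
    "phone is required and must be in E.164 format (e.g. +919876543210)",
]
WARNING_RULES = [
    "age is below 18",
    "name is shorter than 3 characters",
    "email uses a disposable or temporary email domain",
    "the phone country code does not match the country field",
]


def build_validation_prompt(profile: dict) -> str:
    """Render the high-level rules and the profile into one prompt string."""
    errors = "\n".join(f"- {rule}" for rule in ERROR_RULES)
    warnings = "\n".join(f"- {rule}" for rule in WARNING_RULES)
    return (
        "You are a strict validator for user profile data.\n\n"
        f"Report an error when any of these rules is violated:\n{errors}\n\n"
        f"Report a warning (not an error) when:\n{warnings}\n\n"
        "Respond with JSON only, in the form "
        '{"is_valid": bool, "errors": [str], "warnings": [str]}.\n\n'
        f"Profile:\n{json.dumps(profile, indent=2)}"
    )


prompt = build_validation_prompt({"name": "Aarav Patel", "phone": "+919876543210"})
print(prompt)
```

The rules name standards (E.164, ISO-2) rather than spelling out patterns, which leaves the detailed interpretation to the LLM.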
# Clone the repository
git clone <your-repo-url>
cd llm-input-validator
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy the example file
cp .env.example .env
# Edit .env and add your GROQ API key
GROQ_API_KEY=your_actual_api_key_here
GROQ_MODEL="MODELNAME"
# Validate a user profile
python validate_user.py examples/input1.json
# Example with valid input
python validate_user.py examples/input2.json
# Your own file
python validate_user.py path/to/your/input.json
# Install promptfoo (if not already installed)
npm install -g promptfoo
#npm
npm install
# Run the test suite
promptfoo eval -c promptfoo.yaml
# View the HTML report
promptfoo view
llm-input-validator/
├── validate_user.py # Main validation script
├── prompts.py # Prompt engineering templates
├── config.py # Configuration and constants
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── .env # Your actual environment variables (create from .env.example)
├── promptfoo.yaml # Promptfoo test configuration
├── README.md # This file
│
├── examples/ # Example input files
│   ├── input1.json # Invalid input example
│   └── input2.json # Valid input example
│
└── tests/ # Unit tests
    └── test_validator.py # Test cases
python validate_user.py examples/input2.json
Input (examples/input2.json):
{
"name": "Aarav Patel",
"email": "aarav.patel@gmail.com",
"age": 24,
"country": "IN",
"phone": "+919876543210"
}
Output:
{
"is_valid": true,
"errors": [],
"warnings": []
}
python validate_user.py examples/input1.json
Input (examples/input1.json):
{
"name": "",
"email": "user@gmail",
"age": 16,
"country": "India",
"phone": "99999"
}
Output:
{
"is_valid": false,
"errors": [
"name is required",
"email is not a valid email address",
"country must be a valid ISO-2 country code",
"phone number is not in E.164 format"
],
"warnings": [
"age is below recommended minimum"
]
}
# Run evaluation with detailed output
promptfoo eval -c promptfoo.yaml --verbose
# Open HTML report
promptfoo view
The promptfoo.yaml file includes 14 test cases:
| Test Case | Description | Expected |
|---|---|---|
| 1 | Valid user profile | is_valid: true |
| 2 | Multiple errors | is_valid: false with errors |
| 3 | Missing name | Error about name |
| 4 | Missing phone | Error about phone |
| 5 | Invalid email | Error about email format |
| 6 | Invalid phone format | Error about E.164 |
| 7 | Invalid country code | Error about country |
| 8 | Negative age | Error about age |
| 9 | Age below 18 | Warning triggered |
| 10 | Short name | Warning triggered |
| 11 | Disposable email | Warning triggered |
| 12 | Country-phone mismatch | Warning triggered |
| 13 | Null fields ignored | Properly handled |
| 14 | Minimal valid input | Success with minimal fields |
- UserValidator class: Main validation engine
- LLM Integration: Calls the LLM API with structured prompts
- JSON Parsing: Extracts JSON from markdown code blocks
- Schema Validation: Ensures output matches expected structure
- Error Handling: Graceful handling of malformed responses
- High-Level Constraints: Uses standard terminology (E.164, ISO-2)
- No Rule Enumeration: Avoids listing every validation detail
- Context-Aware: Includes examples and explanations
- Flexible: Can be adapted for different validation needs
- API Configuration: Centralized LLM settings
- Constants: Validation schemas and rules
- ISO-2 Country Codes: Sample list for reference
- Disposable Email Domains: Sample list for warnings
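The JSON parsing and schema validation steps listed for validate_user.py could look roughly like the sketch below. The function names `extract_json` and `check_schema` are hypothetical, and the project's own implementation may differ. This is an illustration of the technique only.

```python
import json
import re

# Build the fence marker programmatically so this example can itself sit
# inside a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)
REQUIRED_KEYS = {"is_valid", "errors", "warnings"}


def extract_json(text: str) -> dict:
    """Parse JSON from an LLM reply, unwrapping a fenced block if present."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


def check_schema(result: dict) -> dict:
    """Raise ValueError unless result has exactly the three expected fields."""
    if not isinstance(result, dict) or set(result) != REQUIRED_KEYS:
        raise ValueError(f"expected keys {sorted(REQUIRED_KEYS)}")
    if not isinstance(result["is_valid"], bool):
        raise ValueError("is_valid must be a boolean")
    for key in ("errors", "warnings"):
        items = result[key]
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{key} must be a list of strings")
    return result


# A reply that wraps its JSON in a markdown code block.
reply = (
    "Here is the result:\n"
    + FENCE + "json\n"
    + '{"is_valid": false, "errors": ["name is required"], "warnings": []}\n'
    + FENCE
)
print(check_schema(extract_json(reply)))
```

Checking the parsed result against a fixed key set keeps the output schema strict even when the LLM adds or omits fields.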
Input JSON
↓
prompts.py: Generate validation prompt with high-level rules
↓
Groq client: Call the LLM API
↓
LLM: Understands rules and validates data
↓
validate_user.py: Parse JSON response
↓
Validate schema and format
↓
Output structured JSON result
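The flow above can be sketched end to end with the LLM call passed in as a plain function, so the example runs without an API key. `validate_profile` and `fake_llm` are hypothetical names. The real project calls the Groq API at this step, and the fallback result for malformed replies is an assumption.

```python
import json
from typing import Callable


def validate_profile(profile: dict, call_llm: Callable[[str], str]) -> dict:
    """Build a prompt, call the LLM, and return a schema-shaped result."""
    prompt = (
        "Validate this user profile. Phone must be in E.164 format and "
        "country must be an ISO-2 code. Reply with JSON containing "
        "is_valid, errors, and warnings.\n" + json.dumps(profile)
    )
    raw = call_llm(prompt)
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        # Assumed fallback: report the malformed reply in the same schema.
        return {
            "is_valid": False,
            "errors": ["LLM response was not valid JSON"],
            "warnings": [],
        }
    # Keep only the three schema fields, with safe defaults.
    return {
        "is_valid": bool(result.get("is_valid", False)),
        "errors": list(result.get("errors", [])),
        "warnings": list(result.get("warnings", [])),
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for the real API call; always returns a passing result."""
    return '{"is_valid": true, "errors": [], "warnings": []}'


profile = {"name": "Aarav Patel", "country": "IN", "phone": "+919876543210"}
print(validate_profile(profile, fake_llm))
```

Injecting the LLM call as a parameter also makes the parsing and schema steps testable without network access.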
- Standard Terminology: Use recognized standards (E.164, ISO-2, etc.)
  - Good: "Phone must be in E.164 format"
  - Bad: "Phone must be +[digits only], no dashes"
- Context Over Rules: Provide context rather than exhaustive lists
  - Good: "Country code from phone should match the country field"
  - Bad: "If phone is +91 and country is US, flag warning"
- Example-Driven: Use examples to clarify expectations
  - Good: "E.164 format example: +919876543210"
  - Bad: "Exactly 10 digits after +"
- High-Level Constraints: Let the LLM infer details
  - Good: "E.164 international phone format"
  - Bad: "Must start with +, contain country code 1-3 digits, etc"
- No Other External APIs: Only the Groq API is called for validation
- No Data Storage: All validation is stateless
- Environment Variables: API keys stored in .env, not in code
- Input Validation: Schema checked before processing
- Output Sanitization: JSON parsing with error handling
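As an illustration of keeping the key out of the code, the sketch below reads the settings from environment variables and fails early when the key is missing. `load_groq_settings` is a hypothetical helper. It assumes the values from .env have already been loaded into the process environment, for example by a dotenv loader or the shell.

```python
import os
from typing import Mapping, Optional

# Placeholder value taken from the .env example in this README.
PLACEHOLDER_KEY = "your_actual_api_key_here"


def load_groq_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the API key and model name, raising if the key is unusable."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key or api_key == PLACEHOLDER_KEY:
        raise RuntimeError(
            "GROQ_API_KEY is not set; copy .env.example to .env and add your key"
        )
    return {"api_key": api_key, "model": env.get("GROQ_MODEL", "")}


# Usage with an explicit mapping, so the example does not depend on the shell.
print(load_groq_settings({"GROQ_API_KEY": "example-key", "GROQ_MODEL": "MODELNAME"}))
```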
Based on assignment requirements:
- ✅ Output Quality: Strict JSON schema compliance
- ✅ Schema Discipline: Validated at each step
- ✅ Prompt Clarity: High-level, standard-based rules
- ✅ Eval Quality: 14 comprehensive test cases
- ✅ Ease of Use: Single command to run everything
- ✅ Communication: Clear code and documentation
- The LLM response doesn't contain valid JSON
- Try: Rerun the validator (LLM might have had a parsing issue)
- Check: Ensure API key is valid
- Environment variable not set
- Solution: Create a .env file from .env.example and add your key
- Invalid or expired API key
- Solution: Update GROQ_API_KEY in .env
- Installation issue
- Solution: npm install -g promptfoo
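For the case where the LLM response contains no valid JSON, the suggested rerun can be automated. The sketch below is one possible approach, not part of the project as described. `call_with_retry` is a hypothetical helper, and the callable it receives stands in for the real API call.

```python
import json
from typing import Callable


def call_with_retry(call_llm: Callable[[], str], attempts: int = 3) -> dict:
    """Call the LLM until its reply parses as JSON, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return json.loads(call_llm())
        except json.JSONDecodeError as exc:
            last_error = exc  # Remember the failure and try again.
    raise RuntimeError(f"no valid JSON after {attempts} attempts") from last_error


# Stand-in for the API: the first reply is malformed, the second is valid.
replies = iter(["not json", '{"is_valid": true, "errors": [], "warnings": []}'])
print(call_with_retry(lambda: next(replies)))
```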
- Support for additional LLM providers (OpenAI, Cohere)
- Batch validation for multiple users
- Custom validation rule sets
- Performance benchmarking
- Extended ISO-2 country code list
- Extended disposable email domain list
- Caching for common validations
- Web API wrapper (Flask/FastAPI)
MIT License - Feel free to use and modify
For questions or issues:
- Check the README
- Review the test cases in promptfoo.yaml
- Check LLM response with verbose mode
Created for: DeepStack AI/ML
Last Updated: January 2026