An AI-powered tool for analyzing PDF documents with reference-based question answering, using Anthropic Claude Sonnet 4. The application builds a reusable prompt context from example documents and reference Q&A pairs (called "training" throughout this README), then uses that context to answer a fixed set of questions about new documents.
- AI Training System: Train your AI analyzer with custom document samples and reference Q&A
- PDF Analysis: Direct PDF processing with base64 encoding to Claude (no text extraction required)
- Question-Answer Extraction: Structured Q&A generation based on trained templates
- Interactive Correction: Real-time answer editing with AI feedback integration
- Test Validation: Validate AI performance with known test documents
- Context Caching: Anthropic prompt caching for improved performance and cost efficiency
- Progress Tracking: Real-time analysis progress and status monitoring
- Session Management: Persistent training context across sessions
- Vue.js 3 frontend with modern Composition API
- FastAPI backend with async processing
- Anthropic Claude Sonnet 4 integration with prompt caching
- Tailwind CSS for responsive, modern UI
- Shared dependency injection for consistent training context
- Comprehensive error handling and logging
- File upload validation and security
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Vue.js 3     │    │     FastAPI     │    │    Anthropic    │
│    Frontend     │◄──►│     Backend     │◄──►│   Claude API    │
│                 │    │                 │    │                 │
│ • Training UI   │    │ • PDF Encoder   │    │ • Sonnet 4      │
│ • Analysis UI   │    │ • Doc Analyzer  │    │ • Caching       │
│ • Validation    │    │ • Context Mgmt  │    │ • Training      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Frontend:
- Vue.js 3 with Composition API
- Tailwind CSS for styling
- Pinia for state management
- Axios for HTTP requests
- Vue Router for navigation
Backend:
- FastAPI with async support
- Anthropic Python SDK
- Pydantic for data validation
- aiofiles for async file operations
- python-multipart for file uploads
- Node.js 18+ and npm
- Python 3.9+
- Anthropic API Key (Claude access required)
- Git for version control
git clone <repository-url>
cd ai_powered_document_analyzer
cd backend
pip install -r requirements.txt
Create a .env file in the backend directory:
cp .env.example .env
Configure your .env file:
# Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# The ID below selects Claude 3.5 Sonnet; for Claude Sonnet 4, use claude-sonnet-4-20250514
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
MAX_TOKENS=8192
TEMPERATURE=0.0
# Application Settings
APP_NAME=AI Document Analyzer
DEBUG=true
LOG_LEVEL=INFO
# CORS Settings
ALLOWED_ORIGINS=["http://localhost:5173", "http://127.0.0.1:5173"]
# File Storage
UPLOAD_DIRECTORY=uploads
CACHE_DIRECTORY=cache
LOG_DIRECTORY=logs
MAX_FILE_SIZE=10485760 # 10MB
# Context Logging
ENABLE_CONTEXT_LOGGING=true
cd backend
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
The backend will be available at http://localhost:8000
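The .env keys above could be read into a typed settings object along these lines. This is a minimal standard-library sketch, not the project's actual configuration code: the `Settings` class and `load_settings` function are hypothetical names, and it assumes the values have already been exported into the process environment (for example by a dotenv loader).

```python
import json
import os
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical container mirroring a subset of the .env keys."""
    anthropic_api_key: str
    anthropic_model: str
    max_tokens: int
    temperature: float
    allowed_origins: list
    max_file_size: int


def _strip_inline_comment(value: str) -> str:
    # A value such as "10485760 # 10MB" may carry a trailing comment.
    return value.split(" #", 1)[0].strip()


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],  # required, no default
        anthropic_model=env.get("ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"),
        max_tokens=int(env.get("MAX_TOKENS", "8192")),
        temperature=float(env.get("TEMPERATURE", "0.0")),
        # ALLOWED_ORIGINS is written as a JSON array in the .env file.
        allowed_origins=json.loads(env.get("ALLOWED_ORIGINS", "[]")),
        max_file_size=int(_strip_inline_comment(env.get("MAX_FILE_SIZE", "10485760"))),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.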
cd frontend
npm install
npm run dev
The frontend will be available at http://localhost:5173
- Backend Health Check: Visit http://localhost:8000/health
- API Documentation: Visit http://localhost:8000/docs
- Frontend Application: Visit http://localhost:5173
The AI training system combines three documents into a single context that is reused for every document analysis:
graph TD
A[Sample Document] --> D[Training Context]
B[Reference Q&A] --> D
C[Question Template] --> D
D --> E[Trained AI Model]
E --> F[Document Analysis]
F --> G[Structured Q&A Output]
Sample Document:
- Purpose: Provides the AI with an example of the document type to analyze
- Content: A representative PDF document similar to those you plan to analyze
- AI Learning: Document structure, formatting, content patterns, terminology
Reference Q&A:
- Purpose: Teaches the AI the desired answering style and format
- Content: PDF containing example questions and ideal answers
- AI Learning: Answer formatting, tone, depth, structure, citation style
Question Template:
- Purpose: Defines the specific questions to ask for every document analysis
- Content: PDF with numbered questions to be answered
- AI Learning: Question types, analysis focus areas, expected output structure
# The system constructs a comprehensive training prompt:
training_context = f"""
You are an expert document analyzer. Use these training materials:
1. SAMPLE DOCUMENT: {sample_document}
- Study the document structure and content type
- Understand the formatting and layout
- Learn domain-specific terminology
2. REFERENCE Q&A: {reference_document}
- Learn the desired answering style
- Follow the format and structure shown
- Match the depth and detail level
- Use similar citation methods
3. QUESTION TEMPLATE: {template_document}
- These are the exact questions to answer
- Follow the numbering system
- Address each question comprehensively
INSTRUCTIONS:
- Maintain consistent formatting across all answers
- Provide accurate, detailed responses
- Include relevant citations when available
- Follow the established answering pattern
"""
analysis_prompt = f"""
Based on your training with the sample document and reference Q&A style,
analyze this new document: {target_document}
Answer each question from the template with the same style and format
you learned from the reference answers.
Ensure each answer is:
1. Accurate and well-researched
2. Properly formatted
3. Appropriately detailed
4. Consistent with the training style
Questions to answer: {template_questions}
"""
The application uses Anthropic's prompt caching feature for efficiency:
# Training documents are cached to reduce API costs and improve performance
cached_messages = [
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": sample_document_base64
},
"cache_control": {"type": "ephemeral"} # Cache enabled
}
]
},
# Additional cached training documents...
]
- AI analyzes the sample document to understand:
  - Document layout and formatting
  - Section organization
  - Content hierarchy
  - Domain-specific language
- AI studies reference Q&A to learn:
  - Answer formatting preferences
  - Appropriate response length
  - Citation and reference style
  - Professional tone and language
- AI processes question template to understand:
  - Required analysis scope
  - Question types and categories
  - Expected output structure
  - Numbering and organization
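The cached training messages shown earlier could be assembled with a small helper like the one below. This is a sketch under stated assumptions rather than the application's actual encoder: `pdf_document_block` and `build_training_messages` are hypothetical names, and the text labels placed before each PDF are an illustrative choice. The block layout (`type`, `source`, `cache_control`) follows the structure in the caching example.

```python
import base64


def pdf_document_block(pdf_bytes: bytes, cache: bool = True) -> dict:
    """Wrap raw PDF bytes in a base64 "document" content block."""
    block = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
        },
    }
    if cache:
        # Mark the block as a cache breakpoint, as in the example above.
        block["cache_control"] = {"type": "ephemeral"}
    return block


def build_training_messages(sample: bytes, reference: bytes, template: bytes) -> list:
    """Return a single user message holding the three training PDFs."""
    labels = ["SAMPLE DOCUMENT", "REFERENCE Q&A", "QUESTION TEMPLATE"]
    content = []
    for label, pdf in zip(labels, (sample, reference, template)):
        # A short text label tells the model which role each PDF plays.
        content.append({"type": "text", "text": f"{label}:"})
        content.append(pdf_document_block(pdf))
    return [{"role": "user", "content": content}]
```

The returned list has the shape expected by the `messages` argument of the Anthropic SDK's `client.messages.create(...)`; the document to analyze and the analysis prompt would be appended as a later message so the cached prefix stays unchanged between requests.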
# Test document analysis with known expected answers
test_result = {
"ai_answers": generated_answers,
"expected_answers": provided_expectations,
"accuracy_scores": calculated_similarities,
"feedback_areas": identified_improvements
}
- Real-time Corrections: Users can edit answers and provide feedback
- Style Refinement: AI learns from corrections to improve future responses
- Contextual Learning: Feedback is integrated into the training context
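The README does not say how `accuracy_scores` in the test result are calculated. One simple possibility, shown purely as an illustration, is a character-level similarity ratio from the standard library's `difflib`. The function names and the 0.6 threshold below are assumptions, and a real implementation might use a semantic comparison instead.

```python
from difflib import SequenceMatcher


def score_answers(ai_answers: dict, expected_answers: dict) -> dict:
    """Map each question number to a similarity ratio between 0 and 1."""
    scores = {}
    for number, expected in expected_answers.items():
        # A question the AI did not answer is compared against "".
        generated = ai_answers.get(number, "")
        scores[number] = SequenceMatcher(
            None, generated.strip().lower(), expected.strip().lower()
        ).ratio()
    return scores


def feedback_areas(scores: dict, threshold: float = 0.6) -> list:
    """Return the question numbers whose score falls below the threshold."""
    return sorted(n for n, s in scores.items() if s < threshold)


if __name__ == "__main__":
    ai = {1: "The contract ends in 2025.", 2: "unknown"}
    expected = {1: "the contract ends in 2025.", 2: "Payment is due within 30 days."}
    scores = score_answers(ai, expected)
    print(scores, feedback_areas(scores))
```

String similarity penalizes correct answers that are phrased differently, so scores from a sketch like this are best read as a prompt for manual review rather than a definitive accuracy measure.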
- Navigate to the Training section
- Upload three required documents:
  - Sample Document: Representative PDF for analysis
  - Reference Q&A: Example questions and answers
  - Question Template: Questions to ask for each analysis
- Click "Process Training Documents"
- Wait for AI training completion (caching and context building)
- Verify training status shows "Ready"
- Navigate to the Validation section
- Upload a test PDF document
- Optionally provide expected answers in JSON format
- Compare AI-generated answers with expected results
- Review accuracy scores for each question
- Identify areas for improvement
- Edit answers that need improvement
- Provide specific feedback for better responses
- Submit feedback to refine AI training
- Review overall accuracy
- Approve the validation to proceed to analysis
- Training context is updated with feedback
- Navigate to the Analysis section
- Upload PDF document for analysis
- Start the analysis process
- Review structured Q&A output
  - Each question from the template is answered
  - Answers follow the trained style and format
- Edit any answers that need refinement
- Use the correction prompt feature for AI-assisted improvements
- Regenerate specific answers with additional context
- Export analysis results in JSON format
- Include source context and metadata
- Download for external use or reporting
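The correction step in the workflow above corresponds to the `PUT /api/analysis/correct-answer` endpoint listed in the API reference. A client call might be sketched as follows, using only the standard library. `build_correction_request` and `send` are hypothetical helper names, and since the response format of this endpoint is not documented here, the sketch only assumes that it returns JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def build_correction_request(session_id, question_number, correction_prompt,
                             current_answer, base_url=BASE_URL):
    """Build (but do not send) the PUT request for an answer correction."""
    payload = {
        "session_id": session_id,
        "question_number": question_number,
        "correction_prompt": correction_prompt,
        "current_answer": current_answer,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/analysis/correct-answer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request, timeout=30):
    """Send a prepared request and parse the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect and test without a running backend; with the server up, `send(build_correction_request(...))` performs the call.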
POST /api/training/upload-document
Content-Type: multipart/form-data
Form Data:
- file: PDF file
- document_type: "sample" | "reference" | "template"
POST /api/training/process-individual
Response:
{
"status": "completed",
"message": "Training completed successfully",
"progress": 100,
"completed": true
}
GET /api/training/status
Response:
{
"status": "completed",
"message": "Training ready",
"progress": 100,
"completed": true
}
POST /api/analysis/upload-document
Content-Type: multipart/form-data
Form Data:
- file: PDF file
- document_type: "analysis"
POST /api/analysis/analyze-document
Response:
{
"session_id": "uuid",
"status": "completed",
"progress": 100
}
GET /api/analysis/results/{session_id}
Response:
{
"session_id": "uuid",
"document_filename": "document.pdf",
"analysis_status": "completed",
"questions_answers": [
{
"question_number": 1,
"question": "What is the main purpose?",
"answer": "The main purpose is...",
"confidence": 0.95
}
]
}
PUT /api/analysis/correct-answer
Body:
{
"session_id": "uuid",
"question_number": 1,
"correction_prompt": "Please be more specific about...",
"current_answer": "Current answer text"
}
POST /api/test/analyze-document
Content-Type: multipart/form-data
Form Data:
- file: PDF file
- expected_answers: JSON string (optional)
POST /api/test/feedback
Body:
{
"test_session_id": "uuid",
"question_number": 1,
"feedback_text": "The answer should include...",
"suggested_answer": "Suggested improvement"
}
- Cause: Training documents not properly cached
- Solution: Re-upload training documents and process again
- Technical: Check shared dependency injection in dependencies.py
- Check: Backend server is running on port 8000
- Verify: ANTHROPIC_API_KEY is correctly set in .env
- Test: Visit /health endpoint for status
- Check: File size under 10MB
- Verify: PDF format only
- Ensure: Proper file permissions in upload directory
- Verify: All three training documents are uploaded
- Check: PDF files are valid and readable
- Review: API rate limits and quotas
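The upload checks listed above (file size under 10 MB, PDF format only) could be enforced before a file is stored. The sketch below is illustrative and is not the backend's actual validator: `validate_upload` is a hypothetical function, and checking that the content starts with the `%PDF-` header is a simplification, since it confirms the file type but not that the PDF is well-formed.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, matching MAX_FILE_SIZE in .env
PDF_MAGIC = b"%PDF-"  # header bytes at the start of a PDF file


def validate_upload(filename: str, data: bytes, max_size: int = MAX_FILE_SIZE) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not filename.lower().endswith(".pdf"):
        problems.append("file name does not end in .pdf")
    if not data.startswith(PDF_MAGIC):
        problems.append("content does not start with the %PDF- header")
    if len(data) > max_size:
        problems.append(f"file is {len(data)} bytes; limit is {max_size}")
    return problems


if __name__ == "__main__":
    print(validate_upload("report.pdf", b"%PDF-1.7\n"))
    print(validate_upload("notes.txt", b"hello"))
```

Returning every problem at once, instead of raising on the first, lets the UI show the user a complete list of what to fix in a single round trip.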
# Check backend logs
cd backend
tail -f logs/app.log
# Check browser console for JavaScript errors
# Open Developer Tools → Console
# Test API endpoints directly
curl -X GET http://localhost:8000/health
curl -X GET http://localhost:8000/api/training/status
ai_powered_document_analyzer/
├── backend/
│   ├── app/
│   │   ├── config/          # Application configuration
│   │   ├── models/          # Pydantic data models
│   │   ├── routes/          # FastAPI route handlers
│   │   ├── services/        # Business logic services
│   │   ├── utils/           # Utility functions
│   │   └── main.py          # FastAPI application
│   ├── cache/               # Document and context cache
│   ├── logs/                # Application logs
│   └── uploads/             # Temporary file storage
├── frontend/
│   ├── src/
│   │   ├── components/      # Vue.js components
│   │   ├── views/           # Page-level components
│   │   ├── stores/          # Pinia state management
│   │   ├── services/        # API service layer
│   │   └── utils/           # Frontend utilities
│   └── public/              # Static assets
└── Document_analyzer_test_data/  # Sample test documents
- Models: Define data structures in app/models/
- Services: Implement business logic in app/services/
- Routes: Create API endpoints in app/routes/
- Tests: Add tests for new functionality
- Components: Create reusable Vue components
- Views: Implement page-level components
- Stores: Manage application state with Pinia
- Services: Handle API communication
- Type hints for all functions
- Async/await for I/O operations
- Pydantic models for data validation
- Comprehensive error handling
- Vue 3 Composition API
- TypeScript-style prop definitions
- Reactive state management
- Component composition patterns
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
- Follow existing code style and conventions
- Include comprehensive docstrings
- Add unit tests for new features
- Update documentation as needed
This project is distributed under a proprietary license.
For questions, issues, or contributions:
- Create an issue in the repository
- Review the troubleshooting section
- Check the API documentation at /docs
- Multi-language Support: Support for documents in different languages
- Batch Processing: Analyze multiple documents simultaneously
- Advanced Analytics: Confidence scoring and uncertainty indicators
- Template Management: Save and manage multiple question templates
- Export Formats: Support for PDF, Word, and Excel exports
- User Authentication: Multi-user support with role-based access
- Cloud Storage: Integration with cloud storage providers
- API Rate Limiting: Advanced rate limiting and quota management
Built with ❤️ using Vue.js, FastAPI, and Anthropic Claude Sonnet 4