AI-Powered Document Analyzer

An AI-powered tool for analyzing PDF documents with reference-based question answering, built on Anthropic Claude Sonnet 4. Its "training" workflow does not fine-tune a model. Instead, it builds a reusable prompt context from an example document, reference Q&A pairs, and a question template, then applies that context to each new document.

🚀 Features

Core Capabilities

🧠 AI Training System: Train your AI analyzer with custom document samples and reference Q&A
📄 PDF Analysis: Direct PDF processing with base64 encoding to Claude (no text extraction required)
🎯 Question-Answer Extraction: Structured Q&A generation based on trained templates
✏️ Interactive Correction: Real-time answer editing with AI feedback integration
🔍 Test Validation: Validate AI performance with known test documents
💾 Context Caching: Anthropic prompt caching reduces latency and input-token cost when training documents are reused
📊 Progress Tracking: Real-time analysis progress and status monitoring
🔄 Session Management: Persistent training context across sessions

Technical Features

Vue.js 3 frontend with modern Composition API
FastAPI backend with async processing
Anthropic Claude Sonnet 4 integration with prompt caching
Tailwind CSS for responsive, modern UI
Shared dependency injection for consistent training context
Comprehensive error handling and logging
File upload validation and security

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Vue.js 3      │    │   FastAPI        │    │  Anthropic      │
│   Frontend      │◄──►│   Backend        │◄──►│  Claude API     │
│                 │    │                  │    │                 │
│ • Training UI   │    │ • PDF Encoder    │    │ • Sonnet 4      │
│ • Analysis UI   │    │ • Doc Analyzer   │    │ • Caching       │
│ • Validation    │    │ • Context Mgmt   │    │ • Training      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Technology Stack

Frontend:

Vue.js 3 with Composition API
Tailwind CSS for styling
Pinia for state management
Axios for HTTP requests
Vue Router for navigation

Backend:

FastAPI with async support
Anthropic Python SDK
Pydantic for data validation
aiofiles for async file operations
python-multipart for file uploads

📋 Prerequisites

Node.js 18+ and npm
Python 3.9+
Anthropic API Key (Claude access required)
Git for version control

🛠️ Installation & Setup

1. Clone the Repository

git clone <repository-url>
cd ai_powered_document_analyzer

2. Backend Setup

Install Python Dependencies

cd backend
pip install -r requirements.txt

Environment Configuration

Create a .env file in the backend directory:

cp .env.example .env

Configure your .env file:

# Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-20250514
MAX_TOKENS=8192
TEMPERATURE=0.0

# Application Settings
APP_NAME=AI Document Analyzer
DEBUG=true
LOG_LEVEL=INFO

# CORS Settings
ALLOWED_ORIGINS=["http://localhost:5173", "http://127.0.0.1:5173"]

# File Storage
UPLOAD_DIRECTORY=uploads
CACHE_DIRECTORY=cache
LOG_DIRECTORY=logs
MAX_FILE_SIZE=10485760  # 10MB

# Context Logging
ENABLE_CONTEXT_LOGGING=true
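The backend reads these values from the environment. The repository's own config module (under app/config/) is not reproduced here; the following is a minimal standard-library sketch, using a hypothetical `Settings` dataclass and `load_settings` helper, of how the variables map to typed values. Note that ALLOWED_ORIGINS is a JSON array and has to be parsed as one.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings container mirroring a subset of the .env keys above
    anthropic_api_key: str
    anthropic_model: str
    max_tokens: int = 8192
    temperature: float = 0.0
    debug: bool = False
    allowed_origins: list[str] = field(default_factory=list)
    max_file_size: int = 10 * 1024 * 1024  # 10 MB, same as MAX_FILE_SIZE


def _to_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; raises KeyError if a required key is missing."""
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        anthropic_model=env["ANTHROPIC_MODEL"],
        max_tokens=int(env.get("MAX_TOKENS", "8192")),
        temperature=float(env.get("TEMPERATURE", "0.0")),
        debug=_to_bool(env.get("DEBUG", "false")),
        # ALLOWED_ORIGINS is written as a JSON array in .env
        allowed_origins=json.loads(env.get("ALLOWED_ORIGINS", "[]")),
        max_file_size=int(env.get("MAX_FILE_SIZE", str(10 * 1024 * 1024))),
    )
```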

Start the Backend Server

cd backend  # skip if you are still in backend/ from the install step
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The backend will be available at http://localhost:8000

3. Frontend Setup

Install Node.js Dependencies

cd frontend  # from the repository root, in a second terminal
npm install

Start the Frontend Development Server

npm run dev

The frontend will be available at http://localhost:5173

4. Verify Installation

Backend Health Check: Visit http://localhost:8000/health
API Documentation: Visit http://localhost:8000/docs
Frontend Application: Visit http://localhost:5173

🧠 AI Prompt Workflow & Training Process

Overview

The training system combines three documents into a single prompt context that Claude uses for every subsequent analysis:

graph TD
    A[Sample Document] --> D[Training Context]
    B[Reference Q&A] --> D
    C[Question Template] --> D
    D --> E[Trained AI Model]
    E --> F[Document Analysis]
    F --> G[Structured Q&A Output]

1. Training Document Types

📄 Sample Document

Purpose: Provides the AI with an example of the document type to analyze
Content: A representative PDF document similar to those you plan to analyze
AI Learning: Document structure, formatting, content patterns, terminology

📋 Reference Q&A Document

Purpose: Teaches the AI the desired answering style and format
Content: PDF containing example questions and ideal answers
AI Learning: Answer formatting, tone, depth, structure, citation style

❓ Question Template Document

Purpose: Defines the specific questions to ask for every document analysis
Content: PDF with numbered questions to be answered
AI Learning: Question types, analysis focus areas, expected output structure

2. AI Training Prompt Engineering

Training Context Construction

# The system constructs a comprehensive training prompt:
training_context = f"""
You are an expert document analyzer. Use these training materials:

1. SAMPLE DOCUMENT: {sample_document}
   - Study the document structure and content type
   - Understand the formatting and layout
   - Learn domain-specific terminology

2. REFERENCE Q&A: {reference_document}
   - Learn the desired answering style
   - Follow the format and structure shown
   - Match the depth and detail level
   - Use similar citation methods

3. QUESTION TEMPLATE: {template_document}
   - These are the exact questions to answer
   - Follow the numbering system
   - Address each question comprehensively

INSTRUCTIONS:
- Maintain consistent formatting across all answers
- Provide accurate, detailed responses
- Include relevant citations when available
- Follow the established answering pattern
"""

Analysis Prompt Structure

analysis_prompt = f"""
Based on your training with the sample document and reference Q&A style, 
analyze this new document: {target_document}

Answer each question from the template with the same style and format 
you learned from the reference answers.

Ensure each answer is:
1. Accurate and well-researched
2. Properly formatted
3. Appropriately detailed
4. Consistent with the training style

Questions to answer: {template_questions}
"""

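3. Anthropic Prompt Caching Strategy

The application uses Anthropic prompt caching: training documents carry a cache_control marker, so later requests that repeat the same prompt prefix reuse the cached copy instead of reprocessing it. An ephemeral cache entry lives for five minutes, refreshed each time it is read, so it speeds up repeated requests rather than acting as long-term storage.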

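import base64
from pathlib import Path

# Encode the PDF as base64 text (the path is a hypothetical example)
sample_document_base64 = base64.standard_b64encode(
    Path("uploads/sample.pdf").read_bytes()
).decode("utf-8")

# Training documents are cached to reduce API costs and improve performance
cached_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": sample_document_base64,
                },
                "cache_control": {"type": "ephemeral"},  # Cache breakpoint
            }
        ],
    },
    # Additional training documents...
]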
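A cache breakpoint caches the whole prompt prefix up to and including the marked block, and the API accepts at most four breakpoints per request, so marking only the last training document is enough to cache all three. The sketch below, with hypothetical helpers `pdf_block` and `build_analysis_messages`, orders a request that way: stable training documents first (cached prefix), then the per-request target document and analysis prompt (uncached). It only builds the message list; sending it would go through `client.messages.create(model=..., max_tokens=..., messages=...)` in the Anthropic Python SDK.

```python
import base64


def pdf_block(pdf_bytes: bytes, cache: bool = False) -> dict:
    """Wrap raw PDF bytes in a base64 document content block."""
    block = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
        },
    }
    if cache:
        # Breakpoint: everything up to and including this block is cached
        block["cache_control"] = {"type": "ephemeral"}
    return block


def build_analysis_messages(sample: bytes, reference: bytes, template: bytes,
                            target: bytes, analysis_prompt: str) -> list[dict]:
    """Put the stable training material first so it forms a cacheable prefix."""
    training = [
        {"type": "text", "text": "SAMPLE DOCUMENT:"},
        pdf_block(sample),
        {"type": "text", "text": "REFERENCE Q&A:"},
        pdf_block(reference),
        {"type": "text", "text": "QUESTION TEMPLATE:"},
        pdf_block(template, cache=True),  # one breakpoint after the last training document
    ]
    per_request = [
        {"type": "text", "text": "DOCUMENT TO ANALYZE:"},
        pdf_block(target),
        {"type": "text", "text": analysis_prompt},
    ]
    return [{"role": "user", "content": training + per_request}]
```

Keeping the target document after the breakpoint means each new analysis changes only the uncached tail, so the training prefix can be served from cache.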

4. Context Learning Process

Phase 1: Document Structure Learning

AI analyzes the sample document to understand:
- Document layout and formatting
- Section organization
- Content hierarchy
- Domain-specific language

Phase 2: Style Training

AI studies reference Q&A to learn:
- Answer formatting preferences
- Appropriate response length
- Citation and reference style
- Professional tone and language

Phase 3: Question Framework

AI processes question template to understand:
- Required analysis scope
- Question types and categories
- Expected output structure
- Numbering and organization

5. Validation & Feedback Loop

Test Validation Process

# Test document analysis with known expected answers
test_result = {
    "ai_answers": generated_answers,
    "expected_answers": provided_expectations,
    "accuracy_scores": calculated_similarities,
    "feedback_areas": identified_improvements
}
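The README does not say how `accuracy_scores` are calculated. As one possible approach, the sketch below scores each question with the standard library's `difflib.SequenceMatcher` ratio over normalized text (0.0 to 1.0). The function name `score_answers` and the 0.7 review threshold are assumptions for illustration; a semantic-similarity or model-graded comparison may suit real documents better.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    # Case-fold and collapse whitespace so formatting differences are ignored
    return " ".join(text.lower().split())


def score_answers(ai_answers: dict[int, str], expected: dict[int, str],
                  threshold: float = 0.7) -> dict:
    """Compare AI answers with expected answers, keyed by question number."""
    scores = {}
    for number, expected_text in expected.items():
        ai_text = ai_answers.get(number, "")  # a missing answer scores 0.0
        ratio = SequenceMatcher(None, _normalize(ai_text), _normalize(expected_text)).ratio()
        scores[number] = round(ratio, 3)
    return {
        "accuracy_scores": scores,
        # Questions below the threshold are flagged for feedback
        "feedback_areas": sorted(n for n, s in scores.items() if s < threshold),
    }
```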

Feedback Integration

Real-time Corrections: Users can edit answers and provide feedback
Style Refinement: Corrections are carried into later prompts so future responses follow the revised style
Contextual Learning: Feedback is integrated into the training context
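One way feedback could be folded into the context, sketched here with a hypothetical `FeedbackLog` class, is to keep accepted corrections as a single text block placed after the cached training documents. The cached prefix then stays unchanged and only the short feedback section varies between requests. The field names follow the feedback endpoint (`feedback_text`, `suggested_answer`); the rendering format is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeedbackLog:
    """Hypothetical store of user corrections, keyed by question number."""
    entries: dict[int, list[str]] = field(default_factory=dict)

    def add(self, question_number: int, feedback_text: str, suggested_answer: str = "") -> None:
        note = feedback_text.strip()
        if suggested_answer:
            note += f" Suggested answer: {suggested_answer.strip()}"
        self.entries.setdefault(question_number, []).append(note)

    def as_text_block(self) -> Optional[dict]:
        """Render feedback as a content block to append after the cached training documents."""
        if not self.entries:
            return None
        lines = ["REVIEWER FEEDBACK (apply to future answers):"]
        for number in sorted(self.entries):
            for note in self.entries[number]:
                lines.append(f"- Question {number}: {note}")
        return {"type": "text", "text": "\n".join(lines)}
```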

📖 Usage Guide

1. Training Workflow

Step 1: Upload Training Documents

Navigate to the Training section
Upload three required documents:
- Sample Document: Representative PDF for analysis
- Reference Q&A: Example questions and answers
- Question Template: Questions to ask for each analysis

Step 2: Process Training

Click "Process Training Documents"
Wait for AI training completion (caching and context building)
Verify training status shows "Ready"

2. Validation Workflow (Recommended)

Step 1: Upload Test Document

Navigate to the Validation section
Upload a test PDF document
Optionally provide expected answers in JSON format

Step 2: Review Results

Compare AI-generated answers with expected results
Review accuracy scores for each question
Identify areas for improvement

Step 3: Provide Feedback

Edit answers that need improvement
Provide specific feedback for better responses
Submit feedback to refine AI training

Step 4: Approve Validation

Review overall accuracy
Approve the validation to proceed to analysis
Training context is updated with feedback

3. Document Analysis

Step 1: Upload Analysis Document

Navigate to the Analysis section
Upload PDF document for analysis
Start the analysis process

Step 2: Review Generated Answers

Review structured Q&A output
Each question from the template is answered
Answers follow the trained style and format

Step 3: Interactive Correction

Edit any answers that need refinement
Use the correction prompt feature for AI-assisted improvements
Regenerate specific answers with additional context

Step 4: Export Results

Export analysis results in JSON format
Include source context and metadata
Download for external use or reporting

🔧 API Documentation

Training Endpoints

Upload Training Document

POST /api/training/upload-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- document_type: "sample" | "reference" | "template"
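The upload endpoints take standard multipart form data. As a standard-library-only illustration (no third-party HTTP client assumed), the sketch below builds the request for /api/training/upload-document; the helper name `build_upload_request` and the local base URL are assumptions. The same shape applies to /api/analysis/upload-document with `document_type` set to "analysis".

```python
import urllib.request
import uuid


def build_upload_request(base_url: str, endpoint: str, filename: str,
                         pdf_bytes: bytes, document_type: str) -> urllib.request.Request:
    """Encode a PDF plus its document_type field as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="document_type"',
        b"",
        document_type.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=b"\r\n".join(parts),  # multipart lines are CRLF-separated
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the backend running, send the request with `urllib.request.urlopen(req)` and read the JSON response body.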

Process Training Documents

POST /api/training/process-individual

Response:
{
  "status": "completed",
  "message": "Training completed successfully",
  "progress": 100,
  "completed": true
}

Get Training Status

GET /api/training/status

Response:
{
  "status": "completed",
  "message": "Training ready",
  "progress": 100,
  "completed": true
}
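A client can poll GET /api/training/status until `completed` is true. The sketch below takes the status fetcher as a parameter so the loop can be exercised without a server. The name `wait_for_training`, the timeout and interval defaults, and treating "failed" as a terminal status are assumptions; the README documents only the "completed" value.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_training_status(base_url: str = "http://localhost:8000") -> dict:
    # Real fetcher: reads the JSON body of the status endpoint
    with urllib.request.urlopen(base_url + "/api/training/status") as resp:
        return json.load(resp)


def wait_for_training(fetch_status: Callable[[], dict], timeout: float = 300.0,
                      interval: float = 2.0) -> dict:
    """Poll until the payload reports completion, failure, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("completed"):
            return status
        if status.get("status") == "failed":  # assumed failure value
            raise RuntimeError(status.get("message", "Training failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Training not ready after {timeout} s "
                               f"(last progress: {status.get('progress')})")
        time.sleep(interval)
```

Against a running backend: `wait_for_training(fetch_training_status)`.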

Analysis Endpoints

Upload Document for Analysis

POST /api/analysis/upload-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- document_type: "analysis"

Start Analysis

POST /api/analysis/analyze-document

Response:
{
  "session_id": "uuid",
  "status": "completed",
  "progress": 100
}

Get Analysis Results

GET /api/analysis/results/{session_id}

Response:
{
  "session_id": "uuid",
  "document_filename": "document.pdf",
  "analysis_status": "completed",
  "questions_answers": [
    {
      "question_number": 1,
      "question": "What is the main purpose?",
      "answer": "The main purpose is...",
      "confidence": 0.95
    }
  ]
}
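Each entry in `questions_answers` carries a `confidence` value, which a client might use to decide which answers to review first. The sketch below parses the response into a hypothetical `AnswerItem` dataclass and returns answers below an illustrative 0.8 threshold, lowest first; the threshold is an assumption, not a documented default.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerItem:
    question_number: int
    question: str
    answer: str
    confidence: float


def answers_needing_review(payload: str, threshold: float = 0.8) -> list[AnswerItem]:
    """Parse a results response and return low-confidence answers, lowest first."""
    data = json.loads(payload)
    items = [
        AnswerItem(
            question_number=int(qa["question_number"]),
            question=qa["question"],
            answer=qa["answer"],
            # Treat a missing confidence as 0.0 so the answer is reviewed
            confidence=float(qa.get("confidence", 0.0)),
        )
        for qa in data.get("questions_answers", [])
    ]
    return sorted((i for i in items if i.confidence < threshold), key=lambda i: i.confidence)
```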

Correct Answer

PUT /api/analysis/correct-answer

Body:
{
  "session_id": "uuid",
  "question_number": 1,
  "correction_prompt": "Please be more specific about...",
  "current_answer": "Current answer text"
}

Validation Endpoints

Analyze Test Document

POST /api/test/analyze-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- expected_answers: JSON string (optional)

Provide Feedback

POST /api/test/feedback

Body:
{
  "test_session_id": "uuid",
  "question_number": 1,
  "feedback_text": "The answer should include...",
  "suggested_answer": "Suggested improvement"
}
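The correction and feedback endpoints take JSON bodies. A standard-library sketch of building both requests follows; `build_json_request` is a hypothetical helper, and the field values mirror the examples above.

```python
import json
import urllib.request


def build_json_request(base_url: str, endpoint: str, method: str, body: dict) -> urllib.request.Request:
    """Serialize body as JSON and set the matching Content-Type header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


correction = build_json_request(
    "http://localhost:8000", "/api/analysis/correct-answer", "PUT",
    {"session_id": "uuid", "question_number": 1,
     "correction_prompt": "Please be more specific about...",
     "current_answer": "Current answer text"},
)
feedback = build_json_request(
    "http://localhost:8000", "/api/test/feedback", "POST",
    {"test_session_id": "uuid", "question_number": 1,
     "feedback_text": "The answer should include...",
     "suggested_answer": "Suggested improvement"},
)
# Send either one with urllib.request.urlopen(...) while the backend is running
```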

🚨 Troubleshooting

Common Issues

"Training context not available"

Cause: Training documents not properly cached
Solution: Re-upload training documents and process again
Technical: Check shared dependency injection in dependencies.py
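The troubleshooting note points at shared dependency injection: every route needs to receive the same training-context object. The contents of dependencies.py are not shown in this README; a common pattern, sketched below with the hypothetical names `TrainingContextStore` and `get_training_context_store`, is to expose one process-wide instance through a `functools.lru_cache`-wrapped provider, which FastAPI routes can then declare with `Depends(get_training_context_store)`. The instance is per process, so running uvicorn with several workers gives each worker its own store.

```python
from functools import lru_cache
from threading import Lock
from typing import Optional


class TrainingContextStore:
    """Hypothetical holder for the processed training context."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._context: Optional[dict] = None

    def set(self, context: dict) -> None:
        with self._lock:
            self._context = context

    def get(self) -> dict:
        with self._lock:
            if self._context is None:
                raise LookupError("Training context not available")
            return self._context


@lru_cache(maxsize=None)
def get_training_context_store() -> TrainingContextStore:
    # lru_cache on a zero-argument function returns the same instance on every call
    return TrainingContextStore()
```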

API Connection Issues

Check: Backend server is running on port 8000
Verify: ANTHROPIC_API_KEY is correctly set in .env
Test: Visit /health endpoint for status

File Upload Failures

Check: File size under 10MB
Verify: PDF format only
Ensure: Proper file permissions in upload directory

Training Process Fails

Verify: All three training documents are uploaded
Check: PDF files are valid and readable
Review: API rate limits and quotas

Error Handling

Backend Errors

# Check backend logs
cd backend
tail -f logs/app.log

Frontend Errors

# Check browser console for JavaScript errors
# Open Developer Tools → Console

API Debugging

# Test API endpoints directly
curl -X GET http://localhost:8000/health
curl -X GET http://localhost:8000/api/training/status

🏗️ Development

Project Structure

ai_powered_document_analyzer/
├── backend/
│   ├── app/
│   │   ├── config/           # Application configuration
│   │   ├── models/           # Pydantic data models
│   │   ├── routes/           # FastAPI route handlers
│   │   ├── services/         # Business logic services
│   │   ├── utils/            # Utility functions
│   │   └── main.py          # FastAPI application
│   ├── cache/               # Document and context cache
│   ├── logs/                # Application logs
│   └── uploads/             # Temporary file storage
├── frontend/
│   ├── src/
│   │   ├── components/      # Vue.js components
│   │   ├── views/           # Page-level components
│   │   ├── stores/          # Pinia state management
│   │   ├── services/        # API service layer
│   │   └── utils/           # Frontend utilities
│   └── public/              # Static assets
└── Document_analyzer_test_data/  # Sample test documents

Adding New Features

Backend Development

Models: Define data structures in app/models/
Services: Implement business logic in app/services/
Routes: Create API endpoints in app/routes/
Tests: Add tests for new functionality

Frontend Development

Components: Create reusable Vue components
Views: Implement page-level components
Stores: Manage application state with Pinia
Services: Handle API communication

Code Quality

Backend Standards

Type hints for all functions
Async/await for I/O operations
Pydantic models for data validation
Comprehensive error handling

Frontend Standards

Vue 3 Composition API
TypeScript-style prop definitions
Reactive state management
Component composition patterns

📝 Contributing

Development Setup

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

Coding Standards

Follow existing code style and conventions
Include comprehensive docstrings
Add unit tests for new features
Update documentation as needed

📄 License

This project is distributed under a proprietary license.

🤝 Support

For questions, issues, or contributions:

Create an issue in the repository
Review the troubleshooting section
Check the API documentation at /docs

🔮 Roadmap

Multi-language Support: Support for documents in different languages
Batch Processing: Analyze multiple documents simultaneously
Advanced Analytics: Confidence scoring and uncertainty indicators
Template Management: Save and manage multiple question templates
Export Formats: Support for PDF, Word, and Excel exports
User Authentication: Multi-user support with role-based access
Cloud Storage: Integration with cloud storage providers
API Rate Limiting: Advanced rate limiting and quota management

Built with ❤️ using Vue.js, FastAPI, and Anthropic Claude Sonnet 4
