rogerg1967/willSuit

AI-Powered Document Analyzer

A comprehensive AI-powered tool for analyzing PDF documents with reference-based question answering using Anthropic Claude Sonnet 4. This application provides intelligent document analysis through a sophisticated training workflow that learns from example documents and reference Q&A pairs.

🚀 Features

Core Capabilities

  • 🧠 AI Training System: Train your AI analyzer with custom document samples and reference Q&A
  • 📄 PDF Analysis: Direct PDF processing with base64 encoding to Claude (no text extraction required)
  • 🎯 Question-Answer Extraction: Structured Q&A generation based on trained templates
  • ✏️ Interactive Correction: Real-time answer editing with AI feedback integration
  • 🔍 Test Validation: Validate AI performance with known test documents
  • 💾 Context Caching: Anthropic message caching for improved performance and cost efficiency
  • 📊 Progress Tracking: Real-time analysis progress and status monitoring
  • 🔄 Session Management: Persistent training context across sessions

Technical Features

  • Vue.js 3 frontend with modern Composition API
  • FastAPI backend with async processing
  • Anthropic Claude Sonnet 4 integration with prompt caching
  • Tailwind CSS for responsive, modern UI
  • Shared dependency injection for consistent training context
  • Comprehensive error handling and logging
  • File upload validation and security

๐Ÿ—๏ธ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Vue.js 3      │    │   FastAPI        │    │  Anthropic      │
│   Frontend      │◄──►│   Backend        │◄──►│  Claude API     │
│                 │    │                  │    │                 │
│ • Training UI   │    │ • PDF Encoder    │    │ • Sonnet 4      │
│ • Analysis UI   │    │ • Doc Analyzer   │    │ • Caching       │
│ • Validation    │    │ • Context Mgmt   │    │ • Training      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Technology Stack

Frontend:

  • Vue.js 3 with Composition API
  • Tailwind CSS for styling
  • Pinia for state management
  • Axios for HTTP requests
  • Vue Router for navigation

Backend:

  • FastAPI with async support
  • Anthropic Python SDK
  • Pydantic for data validation
  • aiofiles for async file operations
  • python-multipart for file uploads

📋 Prerequisites

  • Node.js 18+ and npm
  • Python 3.9+
  • Anthropic API Key (Claude access required)
  • Git for version control

๐Ÿ› ๏ธ Installation & Setup

1. Clone the Repository

git clone <repository-url>
cd ai_powered_document_analyzer

2. Backend Setup

Install Python Dependencies

cd backend
pip install -r requirements.txt

Environment Configuration

Create a .env file in the backend directory:

cp .env.example .env

Configure your .env file:

# Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-sonnet-4-20250514
MAX_TOKENS=8192
TEMPERATURE=0.0

# Application Settings
APP_NAME=AI Document Analyzer
DEBUG=true
LOG_LEVEL=INFO

# CORS Settings
ALLOWED_ORIGINS=["http://localhost:5173", "http://127.0.0.1:5173"]

# File Storage
UPLOAD_DIRECTORY=uploads
CACHE_DIRECTORY=cache
LOG_DIRECTORY=logs
MAX_FILE_SIZE=10485760  # 10MB

# Context Logging
ENABLE_CONTEXT_LOGGING=true
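The settings above might be loaded along these lines. This is a stdlib-only sketch; the actual backend presumably wraps this in a Pydantic settings class, and the `load_settings` name is an assumption:

```python
import json
import os

def load_settings() -> dict:
    """Read the analyzer's configuration from environment variables.

    Defaults mirror the .env example above. ALLOWED_ORIGINS is stored as a
    JSON list in the .env file, so it is parsed with json.loads here.
    """
    return {
        "api_key": os.environ.get("ANTHROPIC_API_KEY", ""),
        "model": os.environ.get("ANTHROPIC_MODEL", ""),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "8192")),
        "temperature": float(os.environ.get("TEMPERATURE", "0.0")),
        "max_file_size": int(os.environ.get("MAX_FILE_SIZE", "10485760")),
        "allowed_origins": json.loads(
            os.environ.get("ALLOWED_ORIGINS", '["http://localhost:5173"]')
        ),
    }
```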

Start the Backend Server

cd backend
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The backend will be available at http://localhost:8000

3. Frontend Setup

Install Node.js Dependencies

cd frontend
npm install

Start the Frontend Development Server

npm run dev

The frontend will be available at http://localhost:5173

4. Verify Installation

  1. Backend Health Check: Visit http://localhost:8000/health
  2. API Documentation: Visit http://localhost:8000/docs
  3. Frontend Application: Visit http://localhost:5173

🧠 AI Prompt Workflow & Training Process

Overview

The AI training system uses a sophisticated three-document approach to create a comprehensive context for document analysis:

graph TD
    A[Sample Document] --> D[Training Context]
    B[Reference Q&A] --> D
    C[Question Template] --> D
    D --> E[Trained AI Model]
    E --> F[Document Analysis]
    F --> G[Structured Q&A Output]

1. Training Document Types

📄 Sample Document

  • Purpose: Provides the AI with an example of the document type to analyze
  • Content: A representative PDF document similar to those you plan to analyze
  • AI Learning: Document structure, formatting, content patterns, terminology

📋 Reference Q&A Document

  • Purpose: Teaches the AI the desired answering style and format
  • Content: PDF containing example questions and ideal answers
  • AI Learning: Answer formatting, tone, depth, structure, citation style

โ“ Question Template Document

  • Purpose: Defines the specific questions to ask for every document analysis
  • Content: PDF with numbered questions to be answered
  • AI Learning: Question types, analysis focus areas, expected output structure

2. AI Training Prompt Engineering

Training Context Construction

# The system constructs a comprehensive training prompt:
training_context = f"""
You are an expert document analyzer. Use these training materials:

1. SAMPLE DOCUMENT: {sample_document}
   - Study the document structure and content type
   - Understand the formatting and layout
   - Learn domain-specific terminology

2. REFERENCE Q&A: {reference_document}
   - Learn the desired answering style
   - Follow the format and structure shown
   - Match the depth and detail level
   - Use similar citation methods

3. QUESTION TEMPLATE: {template_document}
   - These are the exact questions to answer
   - Follow the numbering system
   - Address each question comprehensively

INSTRUCTIONS:
- Maintain consistent formatting across all answers
- Provide accurate, detailed responses
- Include relevant citations when available
- Follow the established answering pattern
"""

Analysis Prompt Structure

analysis_prompt = f"""
Based on your training with the sample document and reference Q&A style, 
analyze this new document: {target_document}

Answer each question from the template with the same style and format 
you learned from the reference answers.

Ensure each answer is:
1. Accurate and well-researched
2. Properly formatted
3. Appropriately detailed
4. Consistent with the training style

Questions to answer: {template_questions}
"""

3. Anthropic Message Caching Strategy

The application uses Anthropic's message caching feature for efficiency:

# Training documents are cached to reduce API costs and improve performance
cached_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf", 
                    "data": sample_document_base64
                },
                "cache_control": {"type": "ephemeral"}  # Cache enabled
            }
        ]
    },
    # Additional cached training documents...
]
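When the cached training documents are sent, they are combined with the per-document analysis prompt into a single API call. A hedged sketch of how that request might be assembled (the helper name and the surrounding `client.messages.create(...)` call are assumptions, not code from this repo):

```python
def build_analysis_request(model: str, cached_messages: list, analysis_prompt: str) -> dict:
    """Assemble keyword arguments for an Anthropic client.messages.create(...) call.

    The cached training documents go first so their prefix can be reused
    across requests; the per-document question prompt comes last.
    """
    messages = list(cached_messages)
    messages.append({"role": "user", "content": analysis_prompt})
    return {
        "model": model,
        "max_tokens": 8192,
        "temperature": 0.0,
        "messages": messages,
    }
```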

4. Context Learning Process

Phase 1: Document Structure Learning

  • AI analyzes the sample document to understand:
    • Document layout and formatting
    • Section organization
    • Content hierarchy
    • Domain-specific language

Phase 2: Style Training

  • AI studies reference Q&A to learn:
    • Answer formatting preferences
    • Appropriate response length
    • Citation and reference style
    • Professional tone and language

Phase 3: Question Framework

  • AI processes question template to understand:
    • Required analysis scope
    • Question types and categories
    • Expected output structure
    • Numbering and organization

5. Validation & Feedback Loop

Test Validation Process

# Test document analysis with known expected answers
test_result = {
    "ai_answers": generated_answers,
    "expected_answers": provided_expectations,
    "accuracy_scores": calculated_similarities,
    "feedback_areas": identified_improvements
}
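One plausible way to compute the accuracy scores above is plain string similarity. The snippet below uses difflib purely as an illustration; the project's actual scoring method is not specified in this documentation:

```python
from difflib import SequenceMatcher

def score_answers(ai_answers, expected_answers):
    """Return a 0.0-1.0 similarity score for each (AI, expected) answer pair.

    Case is ignored so trivial capitalization differences don't lower scores.
    """
    return [
        SequenceMatcher(None, ai.lower(), expected.lower()).ratio()
        for ai, expected in zip(ai_answers, expected_answers)
    ]
```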

Feedback Integration

  • Real-time Corrections: Users can edit answers and provide feedback
  • Style Refinement: AI learns from corrections to improve future responses
  • Contextual Learning: Feedback is integrated into the training context

📖 Usage Guide

1. Training Workflow

Step 1: Upload Training Documents

  1. Navigate to the Training section
  2. Upload three required documents:
    • Sample Document: Representative PDF for analysis
    • Reference Q&A: Example questions and answers
    • Question Template: Questions to ask for each analysis

Step 2: Process Training

  1. Click "Process Training Documents"
  2. Wait for AI training completion (caching and context building)
  3. Verify training status shows "Ready"

2. Validation Workflow (Recommended)

Step 1: Upload Test Document

  1. Navigate to the Validation section
  2. Upload a test PDF document
  3. Optionally provide expected answers in JSON format

Step 2: Review Results

  1. Compare AI-generated answers with expected results
  2. Review accuracy scores for each question
  3. Identify areas for improvement

Step 3: Provide Feedback

  1. Edit answers that need improvement
  2. Provide specific feedback for better responses
  3. Submit feedback to refine AI training

Step 4: Approve Validation

  1. Review overall accuracy
  2. Approve the validation to proceed to analysis
  3. Training context is updated with feedback

3. Document Analysis

Step 1: Upload Analysis Document

  1. Navigate to the Analysis section
  2. Upload PDF document for analysis
  3. Start the analysis process

Step 2: Review Generated Answers

  1. Review structured Q&A output
  2. Each question from the template is answered
  3. Answers follow the trained style and format

Step 3: Interactive Correction

  1. Edit any answers that need refinement
  2. Use the correction prompt feature for AI-assisted improvements
  3. Regenerate specific answers with additional context

Step 4: Export Results

  1. Export analysis results in JSON format
  2. Include source context and metadata
  3. Download for external use or reporting

🔧 API Documentation

Training Endpoints

Upload Training Document

POST /api/training/upload-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- document_type: "sample" | "reference" | "template"
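Uploading from a script rather than the UI means building a multipart/form-data request. The stdlib-only sketch below makes the wire format explicit; in practice you would likely use the `requests` library, and the helper name here is illustrative:

```python
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, pdf_bytes):
    """Build a multipart/form-data body by hand (stdlib only).

    Returns (body_bytes, content_type_header) for an HTTP POST.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n"
    )
    body = "".join(parts).encode() + pdf_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# Usage (requires the backend to be running):
# body, ctype = build_multipart({"document_type": "sample"}, "file", "sample.pdf", pdf_bytes)
# req = urllib.request.Request(
#     "http://localhost:8000/api/training/upload-document",
#     data=body, headers={"Content-Type": ctype}, method="POST")
# urllib.request.urlopen(req)
```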

Process Training Documents

POST /api/training/process-individual

Response:
{
  "status": "completed",
  "message": "Training completed successfully",
  "progress": 100,
  "completed": true
}

Get Training Status

GET /api/training/status

Response:
{
  "status": "completed",
  "message": "Training ready",
  "progress": 100,
  "completed": true
}

Analysis Endpoints

Upload Document for Analysis

POST /api/analysis/upload-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- document_type: "analysis"

Start Analysis

POST /api/analysis/analyze-document

Response:
{
  "session_id": "uuid",
  "status": "completed",
  "progress": 100
}

Get Analysis Results

GET /api/analysis/results/{session_id}

Response:
{
  "session_id": "uuid",
  "document_filename": "document.pdf",
  "analysis_status": "completed",
  "questions_answers": [
    {
      "question_number": 1,
      "question": "What is the main purpose?",
      "answer": "The main purpose is...",
      "confidence": 0.95
    }
  ]
}

Correct Answer

PUT /api/analysis/correct-answer

Body:
{
  "session_id": "uuid",
  "question_number": 1,
  "correction_prompt": "Please be more specific about...",
  "current_answer": "Current answer text"
}

Validation Endpoints

Analyze Test Document

POST /api/test/analyze-document
Content-Type: multipart/form-data

Form Data:
- file: PDF file
- expected_answers: JSON string (optional)
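The exact schema the backend accepts for expected_answers is not documented above, so the shape below is a labeled assumption: a JSON array keyed by question number, serialized for the form field.

```python
import json

# Hypothetical expected_answers payload -- the field names here are an
# assumption, not the backend's documented schema.
expected = [
    {"question_number": 1, "answer": "The main purpose is..."},
    {"question_number": 2, "answer": "The key parties are..."},
]
expected_answers_field = json.dumps(expected)  # value for the optional form field
```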

Provide Feedback

POST /api/test/feedback

Body:
{
  "test_session_id": "uuid",
  "question_number": 1,
  "feedback_text": "The answer should include...",
  "suggested_answer": "Suggested improvement"
}

🚨 Troubleshooting

Common Issues

"Training context not available"

  • Cause: Training documents not properly cached
  • Solution: Re-upload training documents and process again
  • Technical: Check shared dependency injection in dependencies.py

API Connection Issues

  • Check: Backend server is running on port 8000
  • Verify: ANTHROPIC_API_KEY is correctly set in .env
  • Test: Visit /health endpoint for status

File Upload Failures

  • Check: File size under 10MB
  • Verify: PDF format only
  • Ensure: Proper file permissions in upload directory
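The upload checks in the list above can be sketched as follows. The size limit comes from the .env example; the function itself and the PDF magic-byte check are illustrative, not the repo's actual validation code:

```python
from typing import Optional

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB, matching MAX_FILE_SIZE in .env

def validate_upload(filename: str, data: bytes) -> Optional[str]:
    """Return an error message for a bad upload, or None if it looks OK."""
    if not filename.lower().endswith(".pdf"):
        return "Only PDF files are accepted"
    if not data.startswith(b"%PDF"):
        return "File does not look like a valid PDF"
    if len(data) > MAX_FILE_SIZE:
        return "File exceeds the 10MB size limit"
    return None
```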

Training Process Fails

  • Verify: All three training documents are uploaded
  • Check: PDF files are valid and readable
  • Review: API rate limits and quotas

Error Handling

Backend Errors

# Check backend logs
cd backend
tail -f logs/app.log

Frontend Errors

# Check browser console for JavaScript errors
# Open Developer Tools → Console

API Debugging

# Test API endpoints directly
curl -X GET http://localhost:8000/health
curl -X GET http://localhost:8000/api/training/status

๐Ÿ—๏ธ Development

Project Structure

ai_powered_document_analyzer/
├── backend/
│   ├── app/
│   │   ├── config/           # Application configuration
│   │   ├── models/           # Pydantic data models
│   │   ├── routes/           # FastAPI route handlers
│   │   ├── services/         # Business logic services
│   │   ├── utils/            # Utility functions
│   │   └── main.py           # FastAPI application
│   ├── cache/                # Document and context cache
│   ├── logs/                 # Application logs
│   └── uploads/              # Temporary file storage
├── frontend/
│   ├── src/
│   │   ├── components/       # Vue.js components
│   │   ├── views/            # Page-level components
│   │   ├── stores/           # Pinia state management
│   │   ├── services/         # API service layer
│   │   └── utils/            # Frontend utilities
│   └── public/               # Static assets
└── Document_analyzer_test_data/  # Sample test documents

Adding New Features

Backend Development

  1. Models: Define data structures in app/models/
  2. Services: Implement business logic in app/services/
  3. Routes: Create API endpoints in app/routes/
  4. Tests: Add tests for new functionality

Frontend Development

  1. Components: Create reusable Vue components
  2. Views: Implement page-level components
  3. Stores: Manage application state with Pinia
  4. Services: Handle API communication

Code Quality

Backend Standards

  • Type hints for all functions
  • Async/await for I/O operations
  • Pydantic models for data validation
  • Comprehensive error handling

Frontend Standards

  • Vue 3 Composition API
  • TypeScript-style prop definitions
  • Reactive state management
  • Component composition patterns

๐Ÿ“ Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

Coding Standards

  • Follow existing code style and conventions
  • Include comprehensive docstrings
  • Add unit tests for new features
  • Update documentation as needed

📄 License

This project is released under a proprietary license. All rights reserved.

๐Ÿค Support

For questions, issues, or contributions:

  • Create an issue in the repository
  • Review the troubleshooting section
  • Check the API documentation at /docs

🔮 Roadmap

  • Multi-language Support: Support for documents in different languages
  • Batch Processing: Analyze multiple documents simultaneously
  • Advanced Analytics: Confidence scoring and uncertainty indicators
  • Template Management: Save and manage multiple question templates
  • Export Formats: Support for PDF, Word, and Excel exports
  • User Authentication: Multi-user support with role-based access
  • Cloud Storage: Integration with cloud storage providers
  • API Rate Limiting: Advanced rate limiting and quota management

Built with ❤️ using Vue.js, FastAPI, and Anthropic Claude Sonnet 4
