Physics RAG Stack

A production-ready Retrieval-Augmented Generation (RAG) system that applies physics-inspired principles to improve retrieval quality and computational efficiency.

📖 Documentation: For detailed theoretical concepts and an architecture deep dive, see docs/architecture.md

Key Features

  • Boundary Filtering — Fast document filtering using compact signatures before expensive vector operations
  • Hierarchical Retrieval — Multi-scale index traversal with adaptive depth control
  • Evidence Correlation — Graph-aware context assembly with relationship-based expansion
  • Budget-Aware Execution — Dynamic cost-quality tradeoffs based on query complexity
  • Multi-View Verification — Cross-validation across multiple evidence sources before generation
  • Adversarial Training — Hard negative mining for robust constraint handling
  • Adaptive Index Health — Continuous monitoring and rebalancing for long-term stability

Result: Reduced hallucinations, improved multi-hop reasoning, and predictable latency under budget constraints.


Motivation

Traditional RAG pipelines follow a naive approach:

Query → Embed → ANN Top-K → Stuff Context → Generate

Problems:

  • High false positive rates (semantically similar but contextually wrong)
  • No multi-hop reasoning support
  • Unpredictable costs (always use expensive operations)
  • Brittleness to constraint violations (wrong product/version/time)
  • Hallucinations when evidence is weak

Our Approach:

We treat information retrieval as a controlled resource allocation problem with graduated escalation (a minimal sketch follows the list):

  1. Fast boundary filtering reduces the search space by 10-100x
  2. Hierarchical traversal expands candidates adaptively
  3. Graph-based correlation assembles evidence packages, not isolated chunks
  4. Budget-aware controller escalates to expensive operations only when justified
  5. Multi-view verification requires consensus before answering
  6. Continuous health monitoring prevents index degradation over time
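
A minimal sketch of that loop, with hypothetical helper functions (`boundary_filter`, `walk_hierarchy`, `correlate`, `confidence`, `rerank`, `verify`, `abstain`, `synthesize`) standing in for the real components under src/physics_rag/retrieval/:

# Graduated escalation (sketch; helper functions are placeholders)
def answer(query, budget):
    buckets = boundary_filter(query)                 # 1. cheap pre-filter, 10-100x smaller space
    chunks = walk_hierarchy(query, buckets, budget)  # 2. coarse-to-fine traversal
    evidence = correlate(chunks, query)              # 3. evidence packages, not lone chunks
    # 4. escalate to expensive steps only when cheap signals are weak
    if confidence(evidence) < query.min_confidence and budget.remaining():
        evidence = rerank(query, evidence)
    # 5. high-risk queries must pass multi-view verification
    if query.is_high_risk() and not verify(evidence):
        return abstain(query)                        # refuse rather than hallucinate
    return synthesize(query, evidence)               # 6. answer with citations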

Physics-Inspired Principles

This system applies seven foundational concepts from theoretical physics and information theory to solve practical RAG challenges. You don't need to be a physics expert to use this system—these principles simply provide elegant solutions to information retrieval problems.

1. Holographic Principle

Origin: Gerard 't Hooft (1993), Leonard Susskind (1995)

Physics Idea: All information contained in a volume of space can be encoded on its boundary surface—like a hologram encoding 3D information in 2D.

Why it matters for RAG: Instead of searching through entire documents, we create compact "boundary signatures" that capture essential information. This reduces search space by 10-100x while preserving what matters.

Practical analogy: Like judging a book by its cover, table of contents, and back-cover summary rather than reading every page first.

2. Renormalization Group

Origin: Kenneth Wilson (1971, Nobel Prize 1982)

Physics Idea: Complex systems can be understood at different scales—zoom out to see patterns, zoom in for details. Information at each scale is "effective" for that level.

Why it matters for RAG: Build a multi-level index (domain → topic → document → chunk) and start broad, only drilling down where promising. This is logarithmic instead of linear search.

Practical analogy: Like searching for a restaurant: first pick a city, then a neighborhood, then a street, then a specific place—not checking every restaurant in the country.

3. Quantum Entanglement

Origin: Einstein, Podolsky, Rosen (1935); John Bell (1964)

Physics Idea: Particles can be correlated such that measuring one instantly reveals information about another, regardless of distance.

Why it matters for RAG: Information chunks are correlated through shared entities, topics, time periods, and relationships. Retrieve context packages, not isolated chunks.

Practical analogy: Like finding a news article and automatically getting related context: the original source, follow-up stories, and expert commentary—not just one isolated paragraph.

4. Fast Scrambling

Origin: Stephen Hawking (black hole thermodynamics), Patrick Hayden & John Preskill (2007)

Physics Idea: Information in quantum systems can mix (scramble) extremely quickly; black holes are conjectured to be nature's fastest scramblers, and subtle distinctions get washed out unless you track them deliberately.

Why it matters for RAG: Generate "hard negatives"—examples that look similar but violate constraints (wrong product, version, or date). Train models to distinguish what matters.

Practical analogy: Like training someone to spot counterfeit money by showing them high-quality fakes, not obviously different objects.

5. Landauer's Principle

Origin: Rolf Landauer (1961)

Physics Idea: Erasing information requires minimum energy proportional to temperature—computation has thermodynamic cost.

Why it matters for RAG: Every operation costs something (latency, tokens, API calls). Allocate computational "energy" wisely: use cheap operations first, escalate to expensive ones only when justified.

Practical analogy: Like triage in medicine—quick initial assessment for everyone, expensive tests only for those who need them.

6. Quantum Error Correction

Origin: Peter Shor (1995), Andrew Steane (1996)

Physics Idea: Encode quantum information redundantly across multiple qubits so errors can be detected and corrected without measuring the information directly.

Why it matters for RAG: Require consensus from multiple independent evidence sources (text, structured data, graphs) before generating answers. Contradictions indicate errors.

Practical analogy: Like checking a fact across multiple news sources—if they all agree, it's probably true; if they contradict, investigate further.

7. Bekenstein Bound

Origin: Jacob Bekenstein (1981)

Physics Idea: There's a maximum amount of information that can be contained in a region of space, related to its surface area and energy.

Why it matters for RAG: Index clusters have optimal information density. Too heterogeneous? Split them. Too redundant? Merge them. Keep the index "healthy."

Practical analogy: Like organizing a library—if a shelf becomes too mixed (physics + cooking + history), split it. If shelves are mostly duplicates, consolidate them.


The Key Insight: These physics concepts provide mathematical frameworks for problems that plague RAG systems—they're not mere analogies but guide actual implementation decisions around information encoding, search efficiency, correlation structures, and error correction.

📚 Deep Dive: For detailed mathematical formulations, historical context, and technical explanations of each physics concept, see the Theoretical Foundations section in docs/architecture.md.


Architecture

Query Pipeline

graph TD
    A[Query] --> B[Parse & Extract]
    B --> C[Boundary Filter]
    C --> D[Hierarchical Walk]
    D --> E[Correlation Expansion]
    E --> F{Budget Controller}
    F -->|Low Confidence| G[Rerank]
    F -->|High Risk| H[Verify]
    F -->|Sufficient| I[Answer Synthesis]
    G --> I
    H --> I
    I -->|Pass| J[Response]
    I -->|Fail| K[Abstain/Ask Back]

Components

| Stage | Input | Output | Cost | Purpose |
| --- | --- | --- | --- | --- |
| Parse | Raw query | Entities, constraints, intent | Cheap | Extract structured requirements |
| Boundary Filter | Query signature | Candidate buckets | Cheap | Eliminate 90%+ of irrelevant docs |
| Hierarchical Walk | Buckets | Ranked chunks | Medium | Adaptive coarse-to-fine expansion |
| Correlation | Chunks | Evidence packages | Medium | Assemble contextual support |
| Budget Control | Confidence metrics | Escalation decision | Cheap | Optimize quality/cost tradeoff |
| Reranking | Top candidates | Refined ranking | Expensive | Deep semantic matching |
| Verification | Evidence set | Consistency score | Medium | Multi-view validation |
| Synthesis | Verified evidence | Answer + citations | Expensive | Generate response |

Offline Processes

  • Hard Negative Mining — Generate constraint-violating examples for training
  • Index Rebalancing — Split/merge clusters based on entropy metrics
  • Signature Refresh — Update boundary representations on drift
  • Performance Monitoring — Track latency, hit rates, and quality metrics

Project Structure

physics-rag-stack/
├── README.md
├── LICENSE
├── pyproject.toml                 # Modern Python packaging
├── setup.py                       # Backwards compatibility
├── .env.example                   # Environment template
├── docker-compose.yml             # Local development stack
├── Dockerfile                     # Production image
│
├── configs/
│   ├── default.yaml               # Base configuration
│   ├── development.yaml           # Dev overrides
│   └── production.yaml            # Prod overrides
│
├── src/
│   └── physics_rag/
│       ├── __init__.py
│       │
│       ├── core/                  # Core business logic
│       │   ├── __init__.py
│       │   ├── models.py          # Data models (Chunk, Evidence, Query)
│       │   ├── types.py           # Type definitions
│       │   └── exceptions.py      # Custom exceptions
│       │
│       ├── ingestion/             # Document processing pipeline
│       │   ├── __init__.py
│       │   ├── chunker.py         # Text chunking strategies
│       │   ├── embedding_service.py
│       │   ├── entity_extractor.py
│       │   ├── boundary_builder.py    # Signature generation
│       │   ├── hierarchy_builder.py   # Multi-scale clustering
│       │   └── graph_builder.py       # Knowledge graph construction
│       │
│       ├── storage/               # Data persistence layer
│       │   ├── __init__.py
│       │   ├── base.py            # Abstract interfaces
│       │   ├── vector_store.py    # Vector DB adapter
│       │   ├── boundary_store.py  # Signature index
│       │   ├── hierarchy_store.py # Multi-scale index
│       │   ├── graph_store.py     # Graph database adapter
│       │   └── implementations/   # Concrete implementations
│       │       ├── __init__.py
│       │       ├── qdrant.py
│       │       ├── weaviate.py
│       │       ├── neo4j.py
│       │       └── postgres.py
│       │
│       ├── retrieval/             # Query execution engine
│       │   ├── __init__.py
│       │   ├── query_parser.py    # Intent & constraint extraction
│       │   ├── boundary_filter.py # Fast pre-filtering
│       │   ├── hierarchical_search.py  # Coarse-to-fine traversal
│       │   ├── correlation_engine.py   # Evidence assembly
│       │   ├── scoring.py         # Multi-signal fusion
│       │   ├── budget_controller.py    # Cost-quality optimization
│       │   └── evidence_verifier.py    # Multi-view validation
│       │
│       ├── ranking/               # Re-ranking & scoring
│       │   ├── __init__.py
│       │   ├── cross_encoder.py   # Deep reranker
│       │   ├── nli_verifier.py    # Entailment checking
│       │   └── ensemble.py        # Score fusion
│       │
│       ├── training/              # Model training & calibration
│       │   ├── __init__.py
│       │   ├── hard_negative_miner.py  # Adversarial examples
│       │   ├── reranker_trainer.py
│       │   └── calibration.py     # Confidence calibration
│       │
│       ├── monitoring/            # Observability & health
│       │   ├── __init__.py
│       │   ├── metrics.py         # Performance tracking
│       │   ├── entropy_monitor.py # Index health
│       │   ├── drift_detector.py  # Distribution shifts
│       │   └── rebalancer.py      # Adaptive maintenance
│       │
│       ├── api/                   # External interfaces
│       │   ├── __init__.py
│       │   ├── rest.py            # FastAPI application
│       │   ├── schemas.py         # Request/response models
│       │   ├── dependencies.py    # DI container
│       │   └── middleware.py      # Auth, logging, etc.
│       │
│       ├── cli/                   # Command-line tools
│       │   ├── __init__.py
│       │   ├── ingest.py          # Data ingestion
│       │   ├── build.py           # Index building
│       │   ├── query.py           # Interactive queries
│       │   └── maintain.py        # Maintenance tasks
│       │
│       └── utils/                 # Shared utilities
│           ├── __init__.py
│           ├── config.py          # Configuration management
│           ├── logging.py         # Structured logging
│           ├── profiling.py       # Performance profiling
│           └── validation.py      # Input validation
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py                # Pytest fixtures
│   ├── unit/                      # Unit tests
│   │   ├── test_chunker.py
│   │   ├── test_boundary.py
│   │   └── ...
│   ├── integration/               # Integration tests
│   │   ├── test_pipeline.py
│   │   └── ...
│   └── performance/               # Benchmark tests
│       ├── test_latency.py
│       └── test_throughput.py
│
├── examples/                      # Usage examples
│   ├── quickstart.py
│   ├── custom_adapter.py
│   ├── fine_tuning.py
│   └── demo_corpus/
│       └── documents/
│
├── notebooks/                     # Jupyter notebooks
│   ├── 01_ingestion_demo.ipynb
│   ├── 02_retrieval_analysis.ipynb
│   └── 03_performance_tuning.ipynb
│
├── scripts/                       # Automation scripts
│   ├── setup_dev.sh
│   ├── run_tests.sh
│   └── deploy.sh
│
└── docs/                          # Documentation
    ├── architecture.md
    ├── api_reference.md
    ├── configuration.md
    └── deployment.md

Quick Start

Installation

# Clone repository
git clone https://github.com/yourusername/physics-rag-stack.git
cd physics-rag-stack

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install package with dependencies
pip install -e ".[dev]"  # Includes dev dependencies

# Or for production
pip install -e .

Docker Setup (Recommended)

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
# Start all services (vector DB, graph DB, API)
docker-compose up -d

# Check services
docker-compose ps

Basic Usage

1. Ingest Documents

# Process and index a document corpus
physics-rag ingest \
  --input ./examples/demo_corpus \
  --output ./data/index \
  --config ./configs/default.yaml

# This will:
# - Chunk documents
# - Generate embeddings
# - Extract entities
# - Build boundary signatures
# - Create hierarchical index
# - Construct knowledge graph

2. Build Indices

# Build boundary filter (fast)
physics-rag build boundary \
  --index ./data/index \
  --signature-dim 256

# Build hierarchical index (4 levels: domain -> topic -> doc -> chunk)
physics-rag build hierarchy \
  --index ./data/index \
  --levels 4 \
  --clustering-method kmeans

# Build knowledge graph
physics-rag build graph \
  --index ./data/index \
  --min-confidence 0.7

3. Query the System

# Interactive query
physics-rag query \
  --index ./data/index \
  --budget-ms 500 \
  --min-confidence 0.85

# Programmatic query
physics-rag query \
  --index ./data/index \
  --question "How does feature X work in version 2.0?" \
  --constraints "product=widget,version>=2.0" \
  --budget-ms 800 \
  --output json

4. Start API Server

# Development server
physics-rag serve \
  --host 0.0.0.0 \
  --port 8000 \
  --reload

# Production server (with Gunicorn)
gunicorn physics_rag.api.rest:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

Python API

from physics_rag import PhysicsRAG
from physics_rag.core.models import Query, QueryConstraints

# Initialize system
rag = PhysicsRAG.from_config("./configs/default.yaml")

# Load index
rag.load_index("./data/index")

# Create query
query = Query(
    text="How does feature X work in version 2.0?",
    constraints=QueryConstraints(
        product="widget",
        version_min="2.0",
        time_range=None
    ),
    budget_ms=500,
    min_confidence=0.85
)

# Execute query
result = rag.query(query)

# Access results
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Citations: {[c.source for c in result.citations]}")
print(f"Latency: {result.latency_ms}ms")

# Evidence map (for debugging)
for evidence in result.evidence_packages:
    print(f"- {evidence.anchor.text[:100]}...")
    print(f"  Sources: {evidence.sources}")
    print(f"  Score: {evidence.score:.3f}")

Core Concepts

1. Boundary Signatures

Problem: Vector similarity search is expensive and returns many false positives.

Solution: Pre-filter using compact document signatures before dense vector operations.

Each document/cluster maintains a lightweight signature:

from dataclasses import dataclass
from typing import Dict, Set

import numpy as np

@dataclass
class BoundarySignature:
    """Compact representation for fast filtering"""
    node_id: str
    topic_vector: np.ndarray          # 256-dim compressed embedding
    entity_set: Set[str]              # Top entities (with types)
    keyword_sketch: Dict[str, float]  # BM25-style sparse features
    constraints: Constraints          # Time, product, version, language
    quality_score: float              # Freshness, reliability, completeness

    def matches(self, query: Query) -> bool:
        """Fast boolean filter"""
        return (
            self.satisfies_constraints(query.constraints)
            and self.has_entity_overlap(query.entities)
            and self.quality_score >= query.min_quality
        )

Benefits:

  • 10-100x reduction in candidate space
  • Sub-millisecond filtering
  • Constraint violations caught early
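
In practice the signature pass runs before any dense operation; a minimal sketch, assuming `signature_index`, `query_vec`, and `vector_store` handles (the `node_id` filter key is illustrative):

# Boolean signature pass first (sub-millisecond), ANN only on survivors
surviving_ids = [sig.node_id for sig in signature_index if sig.matches(query)]
chunks = vector_store.search(
    query_vector=query_vec,
    filters={"node_id": surviving_ids},  # restrict dense search to kept buckets
    top_k=50,
)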

2. Hierarchical Search

Problem: Flat vector search doesn't scale; we need adaptive depth control.

Solution: Multi-level index with coarse-to-fine traversal and early stopping.

# Index structure
Level 3: Domains      (5-10 nodes)    e.g., "Authentication", "Billing"
Level 2: Topics       (50-100 nodes)  e.g., "OAuth 2.0", "JWT Tokens"  
Level 1: Documents    (1K-10K nodes)  e.g., "auth_guide_v2.pdf"
Level 0: Chunks       (10K-1M nodes)  e.g., paragraph-level segments

# Walkdown algorithm (sketch)
def hierarchical_search(query: Query, budget: Budget) -> List[Chunk]:
    candidates = get_top_k_at_level(3, query, k=3)
    best_score = max(c.score for c in candidates)

    for level in [2, 1, 0]:
        # Expand only promising nodes
        expanded = []
        for node in candidates:
            if should_expand(node, budget):
                expanded.extend(node.children)

        # Score and prune
        candidates = score_and_filter(expanded, query)

        # Early stop if improvement plateaus (early_stop_threshold in config)
        new_best = max(c.score for c in candidates)
        if (new_best - best_score) / max(best_score, 1e-9) < EARLY_STOP_THRESHOLD:
            break
        best_score = new_best

    return candidates

Benefits:

  • Logarithmic complexity instead of linear
  • Budget-aware depth control
  • Natural hierarchy (domain → topic → doc → chunk)

3. Evidence Correlation

Problem: Single chunks lack context; LLMs need supporting evidence.

Solution: Assemble evidence packages using graph relationships and co-occurrence.

from dataclasses import dataclass
from typing import List

@dataclass
class EvidencePackage:
    """Correlated evidence bundle"""
    anchor: Chunk                    # Primary match
    supporting: List[Chunk]          # Related chunks
    graph_context: List[GraphEdge]   # Entity relationships
    temporal_context: List[Event]    # Time-aligned events
    cross_references: List[Citation] # Inter-document links
    
    @property
    def confidence(self) -> float:
        """Multi-signal confidence score"""
        return weighted_sum([
            self.anchor.score,
            self.support_diversity(),
            self.graph_connectivity(),
            self.temporal_alignment(),
            self.cross_validation()
        ])

Correlation Signals:

  • Graph edges: Entity co-mentions, semantic relations
  • Co-occurrence: PMI, co-citation, co-editing (see the sketch after this list)
  • Temporal: Same time period, event chains
  • Structural: Document hierarchy, section proximity
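
The co-occurrence signal can be made concrete with pointwise mutual information (PMI); a minimal sketch, assuming each chunk exposes an `entities` collection (this helper is illustrative, not part of the package):

import math
from collections import Counter
from itertools import combinations

def pmi_scores(chunks):
    """PMI between entity pairs from chunk-level co-mentions."""
    n = len(chunks)
    single, pair = Counter(), Counter()
    for chunk in chunks:
        ents = sorted(set(chunk.entities))
        single.update(ents)
        pair.update(combinations(ents, 2))  # canonical (a, b) ordering

    # PMI > 0: the pair co-occurs more often than chance would predict
    return {
        (a, b): math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), n_ab in pair.items()
    }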

4. Hard Negative Mining

Problem: Models confuse semantically similar but contextually wrong content.

Solution: Generate adversarial examples with high similarity but constraint violations.

def mine_hard_negatives(query: Query, positives: List[Chunk]) -> List[Chunk]:
    """Generate constraint-violating negatives"""
    negatives = []
    
    for pos in positives:
        # Find high-similarity candidates
        candidates = vector_search(pos.embedding, k=100)
        
        # Filter for constraint violations
        for cand in candidates:
            if (cosine_sim(pos, cand) > 0.8 and 
                violates_any_constraint(cand, query.constraints)):
                negatives.append(cand)
                
    return negatives

# Training objective: penalize high-scoring constraint violations
loss = max(0, score(negative) - score(positive) + margin)

Use Cases:

  • Reranker fine-tuning
  • Online uncertainty detection
  • Calibration dataset generation

5. Budget Controller

Problem: Always using expensive operations wastes resources; always skipping them hurts quality.

Solution: Dynamic escalation based on confidence and query risk.

class BudgetController:
    """Adaptive quality-cost tradeoffs"""
    
    def decide_next_step(
        self,
        state: RetrievalState,
        budget: Budget
    ) -> Action:
        # Check budget exhaustion
        if budget.is_exhausted():
            return Action.SYNTHESIZE
            
        # Check confidence
        if state.confidence > self.high_confidence_threshold:
            return Action.SYNTHESIZE
            
        # Check query risk
        if state.query.is_high_risk():  # Legal, financial, technical specs
            if not state.has_verified:
                return Action.VERIFY  # Multi-view validation
                
        # Check score gap
        if state.top1_top5_gap < 0.1:
            if not state.has_reranked:
                return Action.RERANK  # Deep semantic scoring
                
        # Default: proceed with current evidence
        return Action.SYNTHESIZE

Decision Factors:

  • Confidence scores (top-1, score gaps)
  • Query risk class (legal, medical, financial)
  • Evidence diversity (number of sources)
  • Remaining budget (latency, tokens, API calls)

6. Multi-View Verification

Problem: Single-source evidence enables hallucination.

Solution: Require consensus across independent views before answering.

def verify_evidence(packages: List[EvidencePackage]) -> VerificationResult:
    """Cross-validate evidence across views"""
    
    # Extract claims from different views
    text_claims = extract_from_text(packages)
    graph_facts = extract_from_graph(packages)
    structured_data = extract_from_tables(packages)
    
    # Check consistency
    consistency_score = check_consistency([
        text_claims,
        graph_facts,
        structured_data
    ])
    
    # Check contradictions
    contradictions = find_contradictions(text_claims)
    
    # Consensus requirement: views must be consistent, with no contradictions
    if consistency_score >= 0.8 and len(contradictions) == 0:
        return VerificationResult(
            passed=True,
            confidence=consistency_score,
            evidence=[text_claims, graph_facts, structured_data]
        )
    else:
        return VerificationResult(
            passed=False,
            reason="Insufficient consensus",
            conflicts=contradictions
        )

View Types:

  • Text passages (dense retrieval)
  • Knowledge graph facts (entity-relation triples)
  • Structured data (tables, schemas)
  • Cross-document citations

7. Index Health Monitoring

Problem: Indices degrade over time (drift, data skew, hot spots).

Solution: Continuous monitoring with adaptive rebalancing.

@dataclass
class ClusterHealth:
    """Health metrics for index regions"""
    intra_variance: float      # Embedding spread within cluster
    topic_entropy: float       # Keyword distribution entropy
    hit_distribution: float    # Query hit uniformity
    redundancy_rate: float     # Near-duplicate percentage
    freshness_score: float     # Document recency (higher = fresher)

def monitor_and_rebalance(index: Index):
    """Periodic health check and remediation"""
    for cluster in index.clusters:
        health = compute_health(cluster)

        # Split if too heterogeneous
        if health.intra_variance > SPLIT_VARIANCE_THRESHOLD:
            split_cluster(cluster)

        # Merge if too redundant
        elif health.redundancy_rate > MERGE_REDUNDANCY_THRESHOLD:
            merge_with_similar(cluster)

        # Refresh signatures if the cluster has gone stale
        if health.freshness_score < MIN_FRESHNESS:
            recompute_signatures(cluster)

Monitoring Metrics:

  • Retrieval latency (p50, p95, p99)
  • False positive rate
  • Answer confidence distribution
  • Query hit entropy per cluster
  • Embedding drift (distribution shift)

Configuration

All system parameters are configurable via YAML:

# configs/default.yaml

system:
  index_path: ./data/index
  log_level: INFO
  
ingestion:
  chunk_size: 512
  chunk_overlap: 50
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  entity_extraction: spacy  # or: "openai", "local-ner"

boundary:
  signature_dim: 256
  top_entities: 20
  top_keywords: 50
  
hierarchy:
  levels: 4
  clustering_method: kmeans
  min_cluster_size: 10
  max_cluster_size: 500
  
retrieval:
  top_k_per_level: [3, 10, 50, 100]  # For each level
  early_stop_threshold: 0.05  # Stop if Δscore < 5%
  
scoring:
  weights:
    dense_sim: 0.4
    sparse_sim: 0.2
    graph_proximity: 0.2
    temporal_alignment: 0.1
    constraint_penalty: -0.3
    
budget:
  default_latency_ms: 500
  max_latency_ms: 2000
  rerank_cost_ms: 200
  verify_cost_ms: 100
  
verification:
  min_views: 2
  consistency_threshold: 0.8
  enable_nli: true
  
monitoring:
  enable_telemetry: true
  rebalance_interval_hours: 24
  health_check_interval_minutes: 60
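
The scoring.weights block above defines a linear fusion of the retrieval signals; a minimal sketch of how such weights might be applied (the Signals container is an assumption, not the package's actual API):

from dataclasses import dataclass

@dataclass
class Signals:
    """Per-candidate signal values, each normalized to [0, 1]."""
    dense_sim: float
    sparse_sim: float
    graph_proximity: float
    temporal_alignment: float
    constraint_violation: float   # 1.0 if any constraint is violated

def fuse(s: Signals, w: dict) -> float:
    """Weighted sum matching scoring.weights in configs/default.yaml;
    constraint_penalty is negative, so violations pull the score down."""
    return (
        w["dense_sim"] * s.dense_sim
        + w["sparse_sim"] * s.sparse_sim
        + w["graph_proximity"] * s.graph_proximity
        + w["temporal_alignment"] * s.temporal_alignment
        + w["constraint_penalty"] * s.constraint_violation
    )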

REST API

FastAPI-based HTTP interface with automatic OpenAPI documentation.

Endpoints

POST /v1/query

Execute a query against the RAG system.

curl -X POST http://localhost:8000/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How does OAuth 2.0 work in version 3.0?",
    "constraints": {
      "product": "auth-service",
      "version": ">=3.0",
      "language": "en"
    },
    "budget_ms": 500,
    "min_confidence": 0.8,
    "return_evidence": true
  }'

Response:

{
  "answer": "OAuth 2.0 in version 3.0 implements...",
  "confidence": 0.92,
  "citations": [
    {
      "chunk_id": "doc123_chunk45",
      "source": "auth-service-v3-guide.pdf",
      "text": "OAuth 2.0 implementation uses...",
      "page": 12
    }
  ],
  "evidence_packages": [...],
  "metadata": {
    "latency_ms": 347,
    "steps_executed": ["boundary", "hierarchy", "correlation", "synthesis"],
    "candidates_filtered": 1523,
    "chunks_ranked": 47
  }
}

POST /v1/ingest

Ingest and index new documents.

curl -X POST http://localhost:8000/v1/ingest \
  -F "file=@document.pdf" \
  -F "metadata={\"product\": \"widget\", \"version\": \"2.0\"}"

GET /v1/health

Health check endpoint for monitoring.

curl http://localhost:8000/v1/health

Response:

{
  "status": "healthy",
  "index_loaded": true,
  "cluster_count": 156,
  "document_count": 2341,
  "avg_query_latency_ms": 234,
  "uptime_seconds": 86400
}

GET /v1/metrics

Prometheus-compatible metrics endpoint.
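
curl http://localhost:8000/v1/metrics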


Extending the System

Custom Vector Store

from typing import Dict, List

from physics_rag.storage.base import VectorStoreAdapter
from physics_rag.core.models import Chunk, QueryVector

class MyVectorStore(VectorStoreAdapter):
    """Custom vector database adapter"""
    
    def __init__(self, connection_url: str):
        self.client = MyVectorDB(connection_url)
        
    def insert(self, chunks: List[Chunk]) -> None:
        vectors = [c.embedding for c in chunks]
        metadata = [c.metadata for c in chunks]
        self.client.upsert(vectors, metadata)
        
    def search(
        self,
        query_vector: QueryVector,
        filters: Dict,
        top_k: int
    ) -> List[Chunk]:
        results = self.client.query(
            vector=query_vector.embedding,
            filters=filters,
            limit=top_k
        )
        return [self._to_chunk(r) for r in results]

Custom Reranker

from typing import List

from physics_rag.core.models import Chunk
from physics_rag.ranking.base import Reranker

class MyReranker(Reranker):
    """Custom reranking model"""
    
    def __init__(self, model_path: str):
        self.model = load_model(model_path)
        
    def score(self, query: str, chunks: List[Chunk]) -> List[float]:
        pairs = [(query, c.text) for c in chunks]
        scores = self.model.predict(pairs)
        return scores

Configuration Override

# Use custom components
from physics_rag import PhysicsRAG
from my_extensions import MyVectorStore, MyReranker

rag = PhysicsRAG.from_config("config.yaml")

# Override defaults
rag.set_vector_store(MyVectorStore("postgresql://..."))
rag.set_reranker(MyReranker("./models/my-reranker"))

# Use normally
result = rag.query(query)

Testing

Run Test Suite

# All tests
pytest

# Unit tests only
pytest tests/unit/

# Integration tests (requires Docker)
pytest tests/integration/

# With coverage
pytest --cov=physics_rag --cov-report=html

# Performance benchmarks
pytest tests/performance/ --benchmark-only

Writing Tests

# tests/unit/test_boundary_filter.py
from physics_rag.core.models import Query
from physics_rag.retrieval.boundary_filter import BoundaryFilter

def test_boundary_filter_excludes_violating_constraints():
    boundary_filter = BoundaryFilter()  # avoid shadowing the builtin `filter`

    query = Query(
        text="test query",
        constraints={"product": "widget-a"},
    )

    signatures = [
        BoundarySignature(constraints={"product": "widget-a"}),  # Match
        BoundarySignature(constraints={"product": "widget-b"}),  # Violation
    ]

    result = boundary_filter.apply(query, signatures)

    assert len(result) == 1
    assert result[0].constraints["product"] == "widget-a"

Deployment

Docker Production Image

# Build
docker build -t physics-rag:latest .

# Run
docker run -d \
  -p 8000:8000 \
  -v /data/index:/app/data/index \
  -e CONFIG_PATH=/app/configs/production.yaml \
  physics-rag:latest

Kubernetes

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: physics-rag
spec:
  replicas: 3
  selector:
    matchLabels:
      app: physics-rag
  template:
    metadata:
      labels:
        app: physics-rag
    spec:
      containers:
      - name: api
        image: physics-rag:latest
        ports:
        - containerPort: 8000
        env:
        - name: CONFIG_PATH
          value: /config/production.yaml
        volumeMounts:
        - name: index-data
          mountPath: /app/data/index
          readOnly: true
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /v1/health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
      volumes:
      - name: index-data
        persistentVolumeClaim:
          claimName: physics-rag-index  # hypothetical PVC holding the built index

Performance Tuning

For High Throughput:

retrieval:
  top_k_per_level: [2, 5, 20, 50]  # Reduce candidates
  early_stop_threshold: 0.1         # Earlier stopping
budget:
  default_latency_ms: 200           # Tighter budget
verification:
  enable_nli: false                 # Skip expensive verification

For High Quality:

retrieval:
  top_k_per_level: [5, 20, 100, 200]
  early_stop_threshold: 0.01
budget:
  default_latency_ms: 1000
verification:
  enable_nli: true
  min_views: 3

Monitoring & Observability

Structured Logging

import structlog

logger = structlog.get_logger()

# Query execution automatically logs:
logger.info(
    "query_executed",
    query_id="abc123",
    latency_ms=234,
    confidence=0.92,
    steps=["boundary", "hierarchy", "synthesis"],
    candidates_filtered=1523
)

Metrics

Prometheus metrics are exposed at /v1/metrics:

# Query latency histogram
physics_rag_query_latency_seconds_bucket

# Query confidence distribution
physics_rag_query_confidence

# Index health
physics_rag_cluster_entropy
physics_rag_cluster_variance

# System resources
physics_rag_memory_usage_bytes
physics_rag_active_queries
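
A sketch of how these metrics could be registered with the prometheus_client library (the actual wiring lives in monitoring/metrics.py and may differ):

from prometheus_client import Gauge, Histogram

# Metric names mirror the list above
QUERY_LATENCY = Histogram(
    "physics_rag_query_latency_seconds",
    "End-to-end query latency",
)
ACTIVE_QUERIES = Gauge(
    "physics_rag_active_queries",
    "Queries currently in flight",
)

def timed_query(rag, query):
    ACTIVE_QUERIES.inc()
    try:
        with QUERY_LATENCY.time():  # observes elapsed seconds on exit
            return rag.query(query)
    finally:
        ACTIVE_QUERIES.dec()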

Grafana Dashboard

Import dashboards/grafana.json for pre-built visualizations:

  • Query latency (p50, p95, p99)
  • Throughput (queries/sec)
  • Confidence distribution
  • Index health metrics
  • Error rates

Performance Characteristics

Typical performance on commodity hardware (16GB RAM, 8 CPUs):

| Corpus Size | Index Size | Query Latency (p95) | Throughput |
| --- | --- | --- | --- |
| 1K docs | ~500MB | 150ms | ~60 qps |
| 10K docs | ~5GB | 250ms | ~40 qps |
| 100K docs | ~50GB | 400ms | ~25 qps |
| 1M docs | ~500GB | 800ms | ~12 qps |

Notes:

  • GPU acceleration yields roughly 2-3x higher throughput
  • With a distributed vector DB, latency plateaus at ~300ms even for 10M+ docs
  • The budget controller lets you trade latency for quality per query

Roadmap

  • Core retrieval pipeline
  • Boundary filtering
  • Hierarchical search
  • Evidence correlation
  • Budget controller
  • Multi-view verification
  • GPU-accelerated embedding
  • Distributed index sharding
  • Streaming response generation
  • Multi-modal support (images, tables)
  • Active learning for reranker
  • Automatic hyperparameter tuning

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Install dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run formatters
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

# Linting
ruff check src/ tests/

Code Style

  • Python 3.10+
  • Type hints required
  • Black formatting (line length 100)
  • Google-style docstrings
  • Test coverage >80%

License

MIT License - see LICENSE file for details.


Citation

If you use this work in your research, please cite:

@software{physics_rag_2026,
  title = {Physics-Inspired RAG Stack},
  author = {Your Name},
  year = {2026},
  url = {https://github.com/yourusername/physics-rag-stack}
}

References

Theoretical Foundations:

  • Holographic Principle: Information encoding on boundaries
  • Renormalization Group: Multi-scale effective theories
  • Quantum Entanglement: Correlation structures
  • Landauer's Principle: Thermodynamic cost of computation
  • Bekenstein Bound: Information capacity limits
  • Fast Scrambling: Information mixing and thermalization

Practical Implementations:

  • Dense Retrieval: Sentence-BERT, DPR, ColBERT
  • Sparse Retrieval: BM25, SPLADE
  • Reranking: Cross-encoders, MonoT5
  • Verification: NLI models, consistency checking
  • Knowledge Graphs: Neo4j, Entity linking
