A production-ready Retrieval-Augmented Generation (RAG) system that applies physics-inspired principles to improve retrieval quality and computational efficiency.
📖 Documentation: For detailed theoretical concepts and architecture deep dive, see Architecture.md
- Boundary Filtering — Fast document filtering using compact signatures before expensive vector operations
- Hierarchical Retrieval — Multi-scale index traversal with adaptive depth control
- Evidence Correlation — Graph-aware context assembly with relationship-based expansion
- Budget-Aware Execution — Dynamic cost-quality tradeoffs based on query complexity
- Multi-View Verification — Cross-validation across multiple evidence sources before generation
- Adversarial Training — Hard negative mining for robust constraint handling
- Adaptive Index Health — Continuous monitoring and rebalancing for long-term stability
Result: Reduced hallucinations, improved multi-hop reasoning, and predictable latency under budget constraints.
Traditional RAG pipelines follow a naive approach:
Query → Embed → ANN Top-K → Stuff Context → Generate
Problems:
- High false positive rates (semantically similar but contextually wrong)
- No multi-hop reasoning support
- Unpredictable costs (always use expensive operations)
- Brittleness to constraint violations (wrong product/version/time)
- Hallucinations when evidence is weak
Our Approach:
We treat information retrieval as a controlled resource allocation problem with graduated escalation:
- Fast boundary filtering reduces the search space by 10-100x
- Hierarchical traversal expands candidates adaptively
- Graph-based correlation assembles evidence packages, not isolated chunks
- Budget-aware controller escalates to expensive operations only when justified
- Multi-view verification requires consensus before answering
- Continuous health monitoring prevents index degradation over time
This system applies seven foundational concepts from theoretical physics and information theory to solve practical RAG challenges. You don't need to be a physics expert to use this system—these principles simply provide elegant solutions to information retrieval problems.
Holographic Principle
Origin: Gerard 't Hooft (1993), Leonard Susskind (1995)
Physics Idea: All information contained in a volume of space can be encoded on its boundary surface—like a hologram encoding 3D information in 2D.
Why it matters for RAG: Instead of searching through entire documents, we create compact "boundary signatures" that capture essential information. This reduces search space by 10-100x while preserving what matters.
Practical analogy: Like judging a book by its cover, table of contents, and back-cover summary rather than reading every page first.
Renormalization Group
Origin: Kenneth Wilson (1971, Nobel Prize 1982)
Physics Idea: Complex systems can be understood at different scales—zoom out to see patterns, zoom in for details. Information at each scale is "effective" for that level.
Why it matters for RAG: Build a multi-level index (domain → topic → document → chunk) and start broad, only drilling down where promising. This is logarithmic instead of linear search.
Practical analogy: Like searching for a restaurant: first pick a city, then a neighborhood, then a street, then a specific place—not checking every restaurant in the country.
Quantum Entanglement
Origin: Einstein, Podolsky, Rosen (1935); John Bell (1964)
Physics Idea: Particles can be correlated such that measuring one instantly reveals information about another, regardless of distance.
Why it matters for RAG: Information chunks are correlated through shared entities, topics, time periods, and relationships. Retrieve context packages, not isolated chunks.
Practical analogy: Like finding a news article and automatically getting related context: the original source, follow-up stories, and expert commentary—not just one isolated paragraph.
Fast Scrambling
Origin: Stephen Hawking (black hole thermodynamics); Patrick Hayden & John Preskill (2007)
Physics Idea: Information in quantum systems can mix (scramble) extremely quickly; after scrambling, states that differ only in crucial details look nearly identical from the outside.
Why it matters for RAG: Generate "hard negatives"—examples that look similar but violate constraints (wrong product, version, or date). Train models to distinguish what matters.
Practical analogy: Like training someone to spot counterfeit money by showing them high-quality fakes, not obviously different objects.
Landauer's Principle
Origin: Rolf Landauer (1961)
Physics Idea: Erasing information requires minimum energy proportional to temperature—computation has thermodynamic cost.
Why it matters for RAG: Every operation costs something (latency, tokens, API calls). Allocate computational "energy" wisely: use cheap operations first, escalate to expensive ones only when justified.
Practical analogy: Like triage in medicine—quick initial assessment for everyone, expensive tests only for those who need them.
Quantum Error Correction
Origin: Peter Shor (1995), Andrew Steane (1996)
Physics Idea: Encode quantum information redundantly across multiple qubits so errors can be detected and corrected without measuring the information directly.
Why it matters for RAG: Require consensus from multiple independent evidence sources (text, structured data, graphs) before generating answers. Contradictions indicate errors.
Practical analogy: Like checking a fact across multiple news sources—if they all agree, it's probably true; if they contradict, investigate further.
Bekenstein Bound
Origin: Jacob Bekenstein (1981)
Physics Idea: There's a maximum amount of information that can be contained in a region of space, related to its surface area and energy.
Why it matters for RAG: Index clusters have optimal information density. Too heterogeneous? Split them. Too redundant? Merge them. Keep the index "healthy."
Practical analogy: Like organizing a library—if a shelf becomes too mixed (physics + cooking + history), split it. If shelves are mostly duplicates, consolidate them.
The Key Insight: These physics concepts provide mathematical frameworks for problems that plague RAG systems—they're not mere analogies but guide actual implementation decisions around information encoding, search efficiency, correlation structures, and error correction.
📚 Deep Dive: For detailed mathematical formulations, historical context, and technical explanations of each physics concept, see the Theoretical Foundations section in Architecture.md.
graph TD
A[Query] --> B[Parse & Extract]
B --> C[Boundary Filter]
C --> D[Hierarchical Walk]
D --> E[Correlation Expansion]
E --> F{Budget Controller}
F -->|Low Confidence| G[Rerank]
F -->|High Risk| H[Verify]
F -->|Sufficient| I[Answer Synthesis]
G --> I
H --> I
I -->|Pass| J[Response]
I -->|Fail| K[Abstain/Ask Back]
| Stage | Input | Output | Cost | Purpose |
|---|---|---|---|---|
| Parse | Raw query | Entities, constraints, intent | Cheap | Extract structured requirements |
| Boundary Filter | Query signature | Candidate buckets | Cheap | Eliminate 90%+ of irrelevant docs |
| Hierarchical Walk | Buckets | Ranked chunks | Medium | Adaptive coarse-to-fine expansion |
| Correlation | Chunks | Evidence packages | Medium | Assemble contextual support |
| Budget Control | Confidence metrics | Escalation decision | Cheap | Optimize quality/cost tradeoff |
| Reranking | Top candidates | Refined ranking | Expensive | Deep semantic matching |
| Verification | Evidence set | Consistency score | Medium | Multi-view validation |
| Synthesis | Verified evidence | Answer + citations | Expensive | Generate response |
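Read top to bottom, these stages compose into a single retrieval pass. The sketch below traces that dataflow with stub functions so it runs as written; every name and return value is an illustrative stand-in, not the actual physics_rag API.

```python
# Stub pipeline tracing the stage dataflow from the table above; all
# functions are hypothetical stand-ins, not the real physics_rag modules.
def parse(q: str) -> dict:
    return {"text": q, "entities": ["oauth"]}               # Parse (cheap)

def boundary_filter(parsed: dict) -> list[str]:
    return ["bucket-auth"]                                   # Boundary Filter (cheap)

def hierarchical_walk(parsed: dict, buckets: list[str]) -> list[str]:
    return ["chunk-12", "chunk-47"]                          # Hierarchical Walk (medium)

def correlate(chunks: list[str]) -> list[dict]:
    return [{"anchor": c, "support": []} for c in chunks]    # Correlation (medium)

def synthesize(parsed: dict, evidence: list[dict]) -> str:
    return f"answer backed by {len(evidence)} evidence packages"  # Synthesis (expensive)

parsed = parse("How does OAuth 2.0 work?")
evidence = correlate(hierarchical_walk(parsed, boundary_filter(parsed)))
print(synthesize(parsed, evidence))  # answer backed by 2 evidence packages
```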
- Hard Negative Mining — Generate constraint-violating examples for training
- Index Rebalancing — Split/merge clusters based on entropy metrics
- Signature Refresh — Update boundary representations on drift
- Performance Monitoring — Track latency, hit rates, and quality metrics
physics-rag-stack/
├── README.md
├── LICENSE
├── pyproject.toml # Modern Python packaging
├── setup.py # Backwards compatibility
├── .env.example # Environment template
├── docker-compose.yml # Local development stack
├── Dockerfile # Production image
│
├── configs/
│ ├── default.yaml # Base configuration
│ ├── development.yaml # Dev overrides
│ └── production.yaml # Prod overrides
│
├── src/
│ └── physics_rag/
│ ├── __init__.py
│ │
│ ├── core/ # Core business logic
│ │ ├── __init__.py
│ │ ├── models.py # Data models (Chunk, Evidence, Query)
│ │ ├── types.py # Type definitions
│ │ └── exceptions.py # Custom exceptions
│ │
│ ├── ingestion/ # Document processing pipeline
│ │ ├── __init__.py
│ │ ├── chunker.py # Text chunking strategies
│ │ ├── embedding_service.py
│ │ ├── entity_extractor.py
│ │ ├── boundary_builder.py # Signature generation
│ │ ├── hierarchy_builder.py # Multi-scale clustering
│ │ └── graph_builder.py # Knowledge graph construction
│ │
│ ├── storage/ # Data persistence layer
│ │ ├── __init__.py
│ │ ├── base.py # Abstract interfaces
│ │ ├── vector_store.py # Vector DB adapter
│ │ ├── boundary_store.py # Signature index
│ │ ├── hierarchy_store.py # Multi-scale index
│ │ ├── graph_store.py # Graph database adapter
│ │ └── implementations/ # Concrete implementations
│ │ ├── __init__.py
│ │ ├── qdrant.py
│ │ ├── weaviate.py
│ │ ├── neo4j.py
│ │ └── postgres.py
│ │
│ ├── retrieval/ # Query execution engine
│ │ ├── __init__.py
│ │ ├── query_parser.py # Intent & constraint extraction
│ │ ├── boundary_filter.py # Fast pre-filtering
│ │ ├── hierarchical_search.py # Coarse-to-fine traversal
│ │ ├── correlation_engine.py # Evidence assembly
│ │ ├── scoring.py # Multi-signal fusion
│ │ ├── budget_controller.py # Cost-quality optimization
│ │ └── evidence_verifier.py # Multi-view validation
│ │
│ ├── ranking/ # Re-ranking & scoring
│ │ ├── __init__.py
│ │ ├── cross_encoder.py # Deep reranker
│ │ ├── nli_verifier.py # Entailment checking
│ │ └── ensemble.py # Score fusion
│ │
│ ├── training/ # Model training & calibration
│ │ ├── __init__.py
│ │ ├── hard_negative_miner.py # Adversarial examples
│ │ ├── reranker_trainer.py
│ │ └── calibration.py # Confidence calibration
│ │
│ ├── monitoring/ # Observability & health
│ │ ├── __init__.py
│ │ ├── metrics.py # Performance tracking
│ │ ├── entropy_monitor.py # Index health
│ │ ├── drift_detector.py # Distribution shifts
│ │ └── rebalancer.py # Adaptive maintenance
│ │
│ ├── api/ # External interfaces
│ │ ├── __init__.py
│ │ ├── rest.py # FastAPI application
│ │ ├── schemas.py # Request/response models
│ │ ├── dependencies.py # DI container
│ │ └── middleware.py # Auth, logging, etc.
│ │
│ ├── cli/ # Command-line tools
│ │ ├── __init__.py
│ │ ├── ingest.py # Data ingestion
│ │ ├── build.py # Index building
│ │ ├── query.py # Interactive queries
│ │ └── maintain.py # Maintenance tasks
│ │
│ └── utils/ # Shared utilities
│ ├── __init__.py
│ ├── config.py # Configuration management
│ ├── logging.py # Structured logging
│ ├── profiling.py # Performance profiling
│ └── validation.py # Input validation
│
├── tests/
│ ├── __init__.py
│ ├── conftest.py # Pytest fixtures
│ ├── unit/ # Unit tests
│ │ ├── test_chunker.py
│ │ ├── test_boundary.py
│ │ └── ...
│ ├── integration/ # Integration tests
│ │ ├── test_pipeline.py
│ │ └── ...
│ └── performance/ # Benchmark tests
│ ├── test_latency.py
│ └── test_throughput.py
│
├── examples/ # Usage examples
│ ├── quickstart.py
│ ├── custom_adapter.py
│ ├── fine_tuning.py
│ └── demo_corpus/
│ └── documents/
│
├── notebooks/ # Jupyter notebooks
│ ├── 01_ingestion_demo.ipynb
│ ├── 02_retrieval_analysis.ipynb
│ └── 03_performance_tuning.ipynb
│
├── scripts/ # Automation scripts
│ ├── setup_dev.sh
│ ├── run_tests.sh
│ └── deploy.sh
│
└── docs/ # Documentation
├── architecture.md
├── api_reference.md
├── configuration.md
└── deployment.md
# Clone repository
git clone https://github.com/yourusername/physics-rag-stack.git
cd physics-rag-stack
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install package with dependencies
pip install -e ".[dev]" # Includes dev dependencies
# Or for production
pip install -e .
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
# Start all services (vector DB, graph DB, API)
docker-compose up -d
# Check services
docker-compose ps
# Process and index a document corpus
physics-rag ingest \
--input ./examples/demo_corpus \
--output ./data/index \
--config ./configs/default.yaml
# This will:
# - Chunk documents
# - Generate embeddings
# - Extract entities
# - Build boundary signatures
# - Create hierarchical index
# - Construct knowledge graph
# Build boundary filter (fast)
physics-rag build boundary \
--index ./data/index \
--signature-dim 256
# Build hierarchical index (4 levels: domain -> topic -> doc -> chunk)
physics-rag build hierarchy \
--index ./data/index \
--levels 4 \
--clustering-method kmeans
# Build knowledge graph
physics-rag build graph \
--index ./data/index \
--min-confidence 0.7
# Interactive query
physics-rag query \
--index ./data/index \
--budget-ms 500 \
--min-confidence 0.85
# Programmatic query
physics-rag query \
--index ./data/index \
--question "How does feature X work in version 2.0?" \
--constraints "product=widget,version>=2.0" \
--budget-ms 800 \
--output json
# Development server
physics-rag serve \
--host 0.0.0.0 \
--port 8000 \
--reload
# Production server (with Gunicorn)
gunicorn physics_rag.api.rest:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000
from physics_rag import PhysicsRAG
from physics_rag.core.models import Query, QueryConstraints
# Initialize system
rag = PhysicsRAG.from_config("./configs/default.yaml")
# Load index
rag.load_index("./data/index")
# Create query
query = Query(
text="How does feature X work in version 2.0?",
constraints=QueryConstraints(
product="widget",
version_min="2.0",
time_range=None
),
budget_ms=500,
min_confidence=0.85
)
# Execute query
result = rag.query(query)
# Access results
print(f"Answer: {result.answer}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Citations: {[c.source for c in result.citations]}")
print(f"Latency: {result.latency_ms}ms")
# Evidence map (for debugging)
for evidence in result.evidence_packages:
print(f"- {evidence.anchor.text[:100]}...")
print(f" Sources: {evidence.sources}")
print(f" Score: {evidence.score:.3f}")Problem: Vector similarity search is expensive and returns many false positives.
Solution: Pre-filter using compact document signatures before dense vector operations.
Each document/cluster maintains a lightweight signature:
@dataclass
class BoundarySignature:
"""Compact representation for fast filtering"""
node_id: str
topic_vector: np.ndarray # 256-dim compressed embedding
entity_set: Set[str] # Top entities (with types)
keyword_sketch: Dict[str, float] # BM25-style sparse features
constraints: Constraints # Time, product, version, language
quality_score: float # Freshness, reliability, completeness
def matches(self, query: Query) -> bool:
"""Fast boolean filter"""
return (
self.satisfies_constraints(query.constraints) and
self.has_entity_overlap(query.entities) and
self.quality_score >= query.min_quality
)
Benefits:
- 10-100x reduction in candidate space
- Sub-millisecond filtering
- Constraint violations caught early
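To make the filtering step concrete, here is a toy, self-contained sketch; MiniSignature and prefilter are illustrative stand-ins for the richer BoundarySignature.matches above, not the shipped API.

```python
from dataclasses import dataclass

# Hypothetical, trimmed-down signature for illustration only.
@dataclass
class MiniSignature:
    node_id: str
    entities: set[str]
    product: str

def prefilter(signatures: list[MiniSignature], query_entities: set[str], product: str) -> list[MiniSignature]:
    # Keep only signatures that satisfy the hard constraint and share an entity
    return [
        s for s in signatures
        if s.product == product and s.entities & query_entities
    ]

docs = [
    MiniSignature("d1", {"oauth", "jwt"}, "auth-service"),
    MiniSignature("d2", {"invoice"}, "billing"),        # wrong product: dropped
    MiniSignature("d3", {"saml"}, "auth-service"),      # no entity overlap: dropped
]
print([s.node_id for s in prefilter(docs, {"oauth"}, "auth-service")])  # ['d1']
```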
Problem: Flat vector search doesn't scale; we need adaptive depth control.
Solution: Multi-level index with coarse-to-fine traversal and early stopping.
# Index structure
Level 3: Domains (5-10 nodes) e.g., "Authentication", "Billing"
Level 2: Topics (50-100 nodes) e.g., "OAuth 2.0", "JWT Tokens"
Level 1: Documents (1K-10K nodes) e.g., "auth_guide_v2.pdf"
Level 0: Chunks (10K-1M nodes) e.g., paragraph-level segments
# Walkdown algorithm
def hierarchical_search(query: Query, budget: Budget) -> List[Chunk]:
candidates = get_top_k_at_level(3, query, k=3)
best = max(c.score for c in candidates)
for level in [2, 1, 0]:
# Expand only promising nodes
expanded = []
for node in candidates:
if should_expand(node, budget):
expanded.extend(node.children)
# Score and prune
candidates = score_and_filter(expanded, query)
# Early stop if the best score stops improving between levels
new_best = max(c.score for c in candidates)
if new_best - best < threshold * best:
break
best = new_best
return candidates
Benefits:
- Logarithmic complexity instead of linear
- Budget-aware depth control
- Natural hierarchy (domain → topic → doc → chunk)
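The walkdown above leaves should_expand abstract. Below is a minimal, runnable sketch of one plausible budget-aware expansion policy; the Node and Budget types, the base score, and the per-level penalty are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Node:
    level: int     # 3 = domain ... 0 = chunk
    score: float   # relevance score at this level

@dataclass
class Budget:
    remaining_ms: float

def should_expand(node: Node, budget: Budget, base: float = 0.3) -> bool:
    """Expand only promising nodes while budget remains (hypothetical policy)."""
    if budget.remaining_ms <= 0:
        return False
    # Demand a higher score before fanning out near the leaves, where the
    # candidate count explodes.
    required = base + 0.1 * (3 - node.level)
    return node.score >= required

print(should_expand(Node(level=2, score=0.5), Budget(remaining_ms=120)))  # True
print(should_expand(Node(level=0, score=0.5), Budget(remaining_ms=120)))  # False
```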
Problem: Single chunks lack context; LLMs need supporting evidence.
Solution: Assemble evidence packages using graph relationships and co-occurrence.
@dataclass
class EvidencePackage:
"""Correlated evidence bundle"""
anchor: Chunk # Primary match
supporting: List[Chunk] # Related chunks
graph_context: List[GraphEdge] # Entity relationships
temporal_context: List[Event] # Time-aligned events
cross_references: List[Citation] # Inter-document links
@property
def confidence(self) -> float:
"""Multi-signal confidence score"""
return weighted_sum([
self.anchor.score,
self.support_diversity(),
self.graph_connectivity(),
self.temporal_alignment(),
self.cross_validation()
])
Correlation Signals:
- Graph edges: Entity co-mentions, semantic relations
- Co-occurrence: PMI, co-citation, co-editing
- Temporal: Same time period, event chains
- Structural: Document hierarchy, section proximity
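As a concrete illustration of correlation-based expansion, this self-contained sketch assembles a package from entity overlap alone; the real correlation_engine also weighs the graph, co-occurrence, temporal, and structural signals listed above.

```python
from dataclasses import dataclass, field

# Trimmed-down stand-ins for Chunk and EvidencePackage, for illustration.
@dataclass
class MiniChunk:
    chunk_id: str
    entities: set[str]

@dataclass
class MiniPackage:
    anchor: MiniChunk
    supporting: list[MiniChunk] = field(default_factory=list)

def assemble(anchor: MiniChunk, corpus: list[MiniChunk], min_shared: int = 1) -> MiniPackage:
    # Pull in chunks that share at least `min_shared` entities with the anchor
    supporting = [
        c for c in corpus
        if c.chunk_id != anchor.chunk_id
        and len(c.entities & anchor.entities) >= min_shared
    ]
    return MiniPackage(anchor=anchor, supporting=supporting)

corpus = [
    MiniChunk("a", {"oauth", "token"}),
    MiniChunk("b", {"token", "refresh"}),  # shares "token": supporting
    MiniChunk("c", {"billing"}),           # unrelated: excluded
]
print([c.chunk_id for c in assemble(corpus[0], corpus).supporting])  # ['b']
```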
Problem: Models confuse semantically similar but contextually wrong content.
Solution: Generate adversarial examples with high similarity but constraint violations.
def mine_hard_negatives(query: Query, positives: List[Chunk]) -> List[Chunk]:
"""Generate constraint-violating negatives"""
negatives = []
for pos in positives:
# Find high-similarity candidates
candidates = vector_search(pos.embedding, k=100)
# Filter for constraint violations
for cand in candidates:
if (cosine_sim(pos, cand) > 0.8 and
violates_any_constraint(cand, query.constraints)):
negatives.append(cand)
return negatives
# Training objective: penalize high-scoring constraint violations
loss = max(0, score(negative) - score(positive) + margin)
Use Cases:
- Reranker fine-tuning
- Online uncertainty detection
- Calibration dataset generation
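Written out for a batch, the margin objective above looks like this; the scores would come from the reranker being trained, and the margin value is illustrative.

```python
def margin_ranking_loss(pos_scores: list[float], neg_scores: list[float], margin: float = 0.2) -> float:
    """Mean hinge loss: penalize hard negatives scoring within `margin` of a positive."""
    losses = [max(0.0, n - p + margin) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)

# The first pair is a hard negative scoring nearly as high as its positive
# (produces loss); the second is well separated (produces none).
print(margin_ranking_loss([0.9, 0.8], [0.85, 0.3]))  # 0.075
```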
Problem: Always using expensive operations wastes resources; always skipping them hurts quality.
Solution: Dynamic escalation based on confidence and query risk.
class BudgetController:
"""Adaptive quality-cost tradeoffs"""
def decide_next_step(
self,
state: RetrievalState,
budget: Budget
) -> Action:
# Check budget exhaustion
if budget.is_exhausted():
return Action.SYNTHESIZE
# Check confidence
if state.confidence > self.high_confidence_threshold:
return Action.SYNTHESIZE
# Check query risk
if state.query.is_high_risk(): # Legal, financial, technical specs
if not state.has_verified:
return Action.VERIFY # Multi-view validation
# Check score gap
if state.top1_top5_gap < 0.1:
if not state.has_reranked:
return Action.RERANK # Deep semantic scoring
# Default: proceed with current evidence
return Action.SYNTHESIZE
Decision Factors:
- Confidence scores (top-1, score gaps)
- Query risk class (legal, medical, financial)
- Evidence diversity (number of sources)
- Remaining budget (latency, tokens, API calls)
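A condensed, runnable sketch of how such a controller can drive the retrieval loop; MiniState, the thresholds, and the simulated confidence gains are assumptions for illustration, not the actual BudgetController behavior.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    RERANK = auto()
    VERIFY = auto()
    SYNTHESIZE = auto()

@dataclass
class MiniState:
    confidence: float
    has_reranked: bool = False
    has_verified: bool = False

def decide(state: MiniState, remaining_ms: float) -> Action:
    """Condensed decide_next_step: escalate only while budget and doubt remain."""
    if remaining_ms <= 0 or state.confidence > 0.9:
        return Action.SYNTHESIZE
    if not state.has_reranked:
        return Action.RERANK
    if not state.has_verified:
        return Action.VERIFY
    return Action.SYNTHESIZE

# Each escalation spends budget and, in this toy run, raises confidence.
state, remaining = MiniState(confidence=0.6), 500.0
while (action := decide(state, remaining)) is not Action.SYNTHESIZE:
    if action is Action.RERANK:
        state.has_reranked, state.confidence, remaining = True, 0.80, remaining - 200
    else:
        state.has_verified, state.confidence, remaining = True, 0.92, remaining - 100
print(state.confidence, remaining)  # 0.92 200.0
```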
Problem: Single-source evidence enables hallucination.
Solution: Require consensus across independent views before answering.
def verify_evidence(packages: List[EvidencePackage]) -> VerificationResult:
"""Cross-validate evidence across views"""
# Extract claims from different views
text_claims = extract_from_text(packages)
graph_facts = extract_from_graph(packages)
structured_data = extract_from_tables(packages)
# Check consistency
consistency_score = check_consistency([
text_claims,
graph_facts,
structured_data
])
# Check contradictions
contradictions = find_contradictions(text_claims)
# Quorum requirement: at least 2 of 3 must agree
if consistency_score >= 0.8 and len(contradictions) == 0:
return VerificationResult(
passed=True,
confidence=consistency_score,
evidence=[text_claims, graph_facts, structured_data]
)
else:
return VerificationResult(
passed=False,
reason="Insufficient consensus",
conflicts=contradictions
)
View Types:
- Text passages (dense retrieval)
- Knowledge graph facts (entity-relation triples)
- Structured data (tables, schemas)
- Cross-document citations
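As a toy version of the quorum idea, the sketch below scores consistency as set agreement over claims extracted from each view; the real check_consistency would rely on NLI models rather than exact string matches.

```python
def consistency_score(views: list[set[str]]) -> float:
    """Fraction of distinct claims asserted by at least two views (toy quorum)."""
    all_claims = set().union(*views)
    if not all_claims:
        return 0.0
    agreed = sum(1 for claim in all_claims if sum(claim in v for v in views) >= 2)
    return agreed / len(all_claims)

text_view = {"oauth2 supported", "tokens expire in 1h"}
graph_view = {"oauth2 supported"}
table_view = {"oauth2 supported", "tokens expire in 1h"}
print(consistency_score([text_view, graph_view, table_view]))  # 1.0
print(consistency_score([text_view, {"oauth1 only"}, set()]))  # 0.0 -> abstain
```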
Problem: Indices degrade over time (drift, data skew, hot spots).
Solution: Continuous monitoring with adaptive rebalancing.
@dataclass
class ClusterHealth:
"""Health metrics for index regions"""
intra_variance: float # Embedding spread within cluster
topic_entropy: float # Keyword distribution entropy
hit_distribution: float # Query hit uniformity
redundancy_rate: float # Near-duplicate percentage
freshness_score: float # Recency score (higher = fresher)
def monitor_and_rebalance(index: Index):
"""Periodic health check and remediation"""
for cluster in index.clusters:
health = compute_health(cluster)
# Split if too heterogeneous
if health.intra_variance > split_threshold:
split_cluster(cluster)
# Merge if too redundant
if health.redundancy_rate > merge_threshold:
merge_with_similar(cluster)
# Refresh if stale
if health.freshness_score < refresh_threshold:
recompute_signatures(cluster)
Monitoring Metrics:
- Retrieval latency (p50, p95, p99)
- False positive rate
- Answer confidence distribution
- Query hit entropy per cluster
- Embedding drift (distribution shift)
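One way to compute the topic_entropy health metric is Shannon entropy over a cluster's keyword distribution: a focused cluster scores low, while a heterogeneous one scores high and becomes a split candidate. A small sketch:

```python
import math
from collections import Counter

def topic_entropy(keywords: list[str]) -> float:
    """Shannon entropy (bits) of a cluster's keyword distribution."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

focused = ["oauth"] * 8 + ["jwt"] * 2
mixed = ["oauth", "jwt", "billing", "gdpr", "invoice"] * 2
print(f"{topic_entropy(focused):.2f}")  # 0.72 -> healthy
print(f"{topic_entropy(mixed):.2f}")    # 2.32 -> split candidate
```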
All system parameters are configurable via YAML:
# configs/default.yaml
system:
index_path: ./data/index
log_level: INFO
ingestion:
chunk_size: 512
chunk_overlap: 50
embedding_model: sentence-transformers/all-MiniLM-L6-v2
entity_extraction: spacy # or: "openai", "local-ner"
boundary:
signature_dim: 256
top_entities: 20
top_keywords: 50
hierarchy:
levels: 4
clustering_method: kmeans
min_cluster_size: 10
max_cluster_size: 500
retrieval:
top_k_per_level: [3, 10, 50, 100] # Per level, coarse (domain) to fine (chunk)
early_stop_threshold: 0.05 # Stop if Δscore < 5%
scoring:
weights:
dense_sim: 0.4
sparse_sim: 0.2
graph_proximity: 0.2
temporal_alignment: 0.1
constraint_penalty: -0.3
budget:
default_latency_ms: 500
max_latency_ms: 2000
rerank_cost_ms: 200
verify_cost_ms: 100
verification:
min_views: 2
consistency_threshold: 0.8
enable_nli: true
monitoring:
enable_telemetry: true
rebalance_interval_hours: 24
health_check_interval_minutes: 60
FastAPI-based HTTP interface with automatic OpenAPI documentation.
POST /v1/query
Execute a query against the RAG system.
curl -X POST http://localhost:8000/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "How does OAuth 2.0 work in version 3.0?",
"constraints": {
"product": "auth-service",
"version": ">=3.0",
"language": "en"
},
"budget_ms": 500,
"min_confidence": 0.8,
"return_evidence": true
}'
Response:
{
"answer": "OAuth 2.0 in version 3.0 implements...",
"confidence": 0.92,
"citations": [
{
"chunk_id": "doc123_chunk45",
"source": "auth-service-v3-guide.pdf",
"text": "OAuth 2.0 implementation uses...",
"page": 12
}
],
"evidence_packages": [...],
"metadata": {
"latency_ms": 347,
"steps_executed": ["boundary", "hierarchy", "correlation", "synthesis"],
"candidates_filtered": 1523,
"chunks_ranked": 47
}
}
POST /v1/ingest
Ingest and index new documents.
curl -X POST http://localhost:8000/v1/ingest \
-F "file=@document.pdf" \
-F "metadata={\"product\": \"widget\", \"version\": \"2.0\"}"Health check endpoint for monitoring.
curl http://localhost:8000/v1/health
Response:
{
"status": "healthy",
"index_loaded": true,
"cluster_count": 156,
"document_count": 2341,
"avg_query_latency_ms": 234,
"uptime_seconds": 86400
}
GET /v1/metrics
Prometheus-compatible metrics endpoint.
from physics_rag.storage.base import VectorStoreAdapter
from physics_rag.core.models import Chunk, QueryVector
class MyVectorStore(VectorStoreAdapter):
"""Custom vector database adapter"""
def __init__(self, connection_url: str):
self.client = MyVectorDB(connection_url)
def insert(self, chunks: List[Chunk]) -> None:
vectors = [c.embedding for c in chunks]
metadata = [c.metadata for c in chunks]
self.client.upsert(vectors, metadata)
def search(
self,
query_vector: QueryVector,
filters: Dict,
top_k: int
) -> List[Chunk]:
results = self.client.query(
vector=query_vector.embedding,
filters=filters,
limit=top_k
)
return [self._to_chunk(r) for r in results]
from physics_rag.ranking.base import Reranker
class MyReranker(Reranker):
"""Custom reranking model"""
def __init__(self, model_path: str):
self.model = load_model(model_path)
def score(self, query: str, chunks: List[Chunk]) -> List[float]:
pairs = [(query, c.text) for c in chunks]
scores = self.model.predict(pairs)
return scores
# Use custom components
from physics_rag import PhysicsRAG
from my_extensions import MyVectorStore, MyReranker
rag = PhysicsRAG.from_config("config.yaml")
# Override defaults
rag.set_vector_store(MyVectorStore("postgresql://..."))
rag.set_reranker(MyReranker("./models/my-reranker"))
# Use normally
result = rag.query(query)
# All tests
pytest
# Unit tests only
pytest tests/unit/
# Integration tests (requires Docker)
pytest tests/integration/
# With coverage
pytest --cov=physics_rag --cov-report=html
# Performance benchmarks
pytest tests/performance/ --benchmark-only
# tests/unit/test_boundary_filter.py
import pytest
from physics_rag.retrieval.boundary_filter import BoundaryFilter
from physics_rag.core.models import Query
from physics_rag.ingestion.boundary_builder import BoundarySignature
def test_boundary_filter_excludes_violating_constraints():
boundary_filter = BoundaryFilter()
query = Query(
text="test query",
constraints={"product": "widget-a"}
)
signatures = [
BoundarySignature(constraints={"product": "widget-a"}), # Match
BoundarySignature(constraints={"product": "widget-b"}), # Violation
]
result = boundary_filter.apply(query, signatures)
assert len(result) == 1
assert result[0].constraints["product"] == "widget-a"
# Build
docker build -t physics-rag:latest .
# Run
docker run -d \
-p 8000:8000 \
-v /data/index:/app/data/index \
-e CONFIG_PATH=/app/configs/production.yaml \
physics-rag:latest
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: physics-rag
spec:
replicas: 3
template:
spec:
containers:
- name: api
image: physics-rag:latest
ports:
- containerPort: 8000
env:
- name: CONFIG_PATH
value: /config/production.yaml
volumeMounts:
- name: index-data
mountPath: /app/data/index
readOnly: true
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
livenessProbe:
httpGet:
path: /v1/health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
For High Throughput:
retrieval:
top_k_per_level: [2, 5, 20, 50] # Reduce candidates
early_stop_threshold: 0.1 # Earlier stopping
budget:
default_latency_ms: 200 # Tighter budget
verification:
enable_nli: false # Skip expensive verification
For High Quality:
retrieval:
top_k_per_level: [5, 20, 100, 200]
early_stop_threshold: 0.01
budget:
default_latency_ms: 1000
verification:
enable_nli: true
min_views: 3
import structlog
logger = structlog.get_logger()
# Query execution automatically logs:
logger.info(
"query_executed",
query_id="abc123",
latency_ms=234,
confidence=0.92,
steps=["boundary", "hierarchy", "synthesis"],
candidates_filtered=1523
)
Prometheus metrics are exposed at /v1/metrics:
# Query latency histogram
physics_rag_query_latency_seconds_bucket
# Query confidence distribution
physics_rag_query_confidence
# Index health
physics_rag_cluster_entropy
physics_rag_cluster_variance
# System resources
physics_rag_memory_usage_bytes
physics_rag_active_queries
Import dashboards/grafana.json for pre-built visualizations:
- Query latency (p50, p95, p99)
- Throughput (queries/sec)
- Confidence distribution
- Index health metrics
- Error rates
Typical performance on commodity hardware (16GB RAM, 8 CPUs):
| Corpus Size | Index Size | Query Latency (p95) | Throughput |
|---|---|---|---|
| 1K docs | ~500MB | 150ms | ~60 qps |
| 10K docs | ~5GB | 250ms | ~40 qps |
| 100K docs | ~50GB | 400ms | ~25 qps |
| 1M docs | ~500GB | 800ms | ~12 qps |
Notes:
- With GPU acceleration, expect roughly 2-3x higher throughput
- With distributed vector DB, latency plateaus at ~300ms even for 10M+ docs
- Budget controller allows trading latency for quality
Implemented:
- Core retrieval pipeline
- Boundary filtering
- Hierarchical search
- Evidence correlation
- Budget controller
- Multi-view verification
Planned:
- GPU-accelerated embedding
- Distributed index sharding
- Streaming response generation
- Multi-modal support (images, tables)
- Active learning for reranker
- Automatic hyperparameter tuning
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Install dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run formatters
black src/ tests/
isort src/ tests/
# Type checking
mypy src/
# Linting
ruff check src/ tests/
- Python 3.10+
- Type hints required
- Black formatting (line length 100)
- Google-style docstrings
- Test coverage >80%
MIT License - see LICENSE file for details.
If you use this work in your research, please cite:
@software{physics_rag_2026,
title = {Physics-Inspired RAG Stack},
author = {Your Name},
year = {2026},
url = {https://github.com/yourusername/physics-rag-stack}
}
Theoretical Foundations:
- Holographic Principle: Information encoding on boundaries
- Renormalization Group: Multi-scale effective theories
- Quantum Entanglement: Correlation structures
- Landauer's Principle: Thermodynamic cost of computation
- Bekenstein Bound: Information capacity limits
- Fast Scrambling: Information mixing and thermalization
- Quantum Error Correction: Redundant encoding and consensus-based recovery
Practical Implementations:
- Dense Retrieval: Sentence-BERT, DPR, ColBERT
- Sparse Retrieval: BM25, SPLADE
- Reranking: Cross-encoders, MonoT5
- Verification: NLI models, consistency checking
- Knowledge Graphs: Neo4j, Entity linking
- Architecture Deep Dive: Architecture.md
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@example.com