HTTP API and Python SDK reference.

## Python SDK

The Engine class is the single entry point for all QuantumRAG functionality.
### Engine

```python
from quantumrag import Engine

Engine(
    config: str | Path | QuantumRAGConfig | None = None,
    *,
    document_store: Any | None = None,   # Inject custom store
    vector_store: Any | None = None,     # Inject custom store
    bm25_store: Any | None = None,       # Inject custom store
    embedding_model: str | None = None,  # Quick override
    generation_model: str | None = None, # Quick override
    data_dir: str | None = None,         # Quick override
)
```

### ingest(path, *, chunking_strategy=None, metadata=None, recursive=True, enable_hype=True) → IngestResult
Ingest documents from a file or directory.
```python
result = engine.ingest("./docs", recursive=True)
print(result.documents)       # Number of documents processed
print(result.chunks)          # Number of chunks created
print(result.elapsed_seconds) # Time taken
print(result.errors)          # List of error messages
```

Parameters:

- `path` — File or directory path
- `chunking_strategy` — Override: `"auto"`, `"structural"`, `"semantic"`, `"fixed"`
- `metadata` — Custom metadata dict attached to all documents
- `recursive` — Recurse into subdirectories (default: `True`)
- `enable_hype` — Generate HyPE embeddings (default: `True`)
### query(query, *, filters=None, top_k=None, rerank=None, conversation_history=None) → QueryResult
Query the indexed documents.
```python
result = engine.query("How does adaptive query routing work?")
print(result.answer)     # Answer with inline citations
print(result.confidence) # Confidence enum
print(result.sources)    # List[Source] with excerpts
print(result.trace)      # List[TraceStep] pipeline trace
print(result.metadata)   # tokens_used, cost, latency_ms, etc.
```

Parameters:

- `query` — Natural language question
- `filters` — Metadata filters (dict)
- `top_k` — Override retrieval count
- `rerank` — Override reranking (bool)
- `conversation_history` — List of `ConversationTurn` for multi-turn conversations
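Multi-turn calls pass prior turns through `conversation_history`. The exact fields of `ConversationTurn` are not documented above, so the sketch below assumes simple role/content pairs and adds an illustrative helper that bounds history length; treat both as hypothetical:

```python
# Hypothetical shape: the SDK expects ConversationTurn objects; the
# role/content fields here are an assumption for illustration.
def trim_history(history, max_turns=6):
    """Keep only the most recent turns to bound prompt size."""
    return history[-max_turns:]

history = [
    {"role": "user", "content": "What reranking providers are supported?"},
    {"role": "assistant", "content": "FlashRank (default) and Cohere [1]."},
]
# result = engine.query(
#     "How do I enable the Cohere provider?",
#     conversation_history=trim_history(history),
# )
```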
### query_stream(query, *, filters=None, top_k=None) → AsyncIterator[str]
Stream answer tokens.
```python
async for token in engine.query_stream("What reranking providers are supported?"):
    print(token, end="", flush=True)
```

### evaluate(**kwargs) → EvalResult
Run the evaluation pipeline.
```python
result = engine.evaluate()
print(result.summary)
for metric in result.metrics:
    print(f"{metric.name}: {metric.score:.2f}")
for suggestion in result.suggestions:
    print(f"- {suggestion}")
```

### status() → dict
Get engine status.
```python
status = engine.status()
# {'documents': 15, 'chunks': 234, 'config': {...}, 'data_dir': '...'}
```

### QueryResult
```python
@dataclass
class QueryResult:
    answer: str               # Generated answer with citations
    sources: list[Source]     # Source references
    confidence: Confidence    # STRONGLY_SUPPORTED | PARTIALLY_SUPPORTED | INSUFFICIENT_EVIDENCE
    trace: list[TraceStep]    # Pipeline execution trace
    metadata: dict[str, Any]  # tokens_used, cost, latency_ms, path, etc.
```

### Source
```python
@dataclass
class Source:
    chunk_id: str
    document_title: str
    page: int | None
    section: str | None
    excerpt: str  # Relevant text excerpt
    relevance_score: float
```

### Confidence
```python
class Confidence(Enum):
    STRONGLY_SUPPORTED = "STRONGLY_SUPPORTED"
    PARTIALLY_SUPPORTED = "PARTIALLY_SUPPORTED"
    INSUFFICIENT_EVIDENCE = "INSUFFICIENT_EVIDENCE"
```

### TraceStep
```python
@dataclass
class TraceStep:
    step: str          # "rewrite", "classify", "retrieve", "generate", etc.
    result: str        # Summary of step output
    latency_ms: float
    details: dict      # Step-specific details
```

### IngestResult
```python
@dataclass
class IngestResult:
    documents: int
    chunks: int
    elapsed_seconds: float
    errors: list[str]
```

## HTTP API

Start the server:

```shell
quantumrag serve --host 0.0.0.0 --port 8000
```

Interactive docs are available at http://localhost:8000/docs (Swagger UI).
If `QUANTUMRAG_API_KEY` is set, all `/v1/*` endpoints require the header:

```
Authorization: Bearer <api-key>
```
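Programmatic clients can build this header from the same environment variable. A minimal sketch; the variable name comes from above, but the helper itself is illustrative:

```python
import os

def auth_headers() -> dict[str, str]:
    """Return the Authorization header if QUANTUMRAG_API_KEY is set, else nothing."""
    key = os.environ.get("QUANTUMRAG_API_KEY")
    return {"Authorization": f"Bearer {key}"} if key else {}
```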
Health check (no auth required).
```json
{
  "status": "ok",
  "version": "0.1.0",
  "uptime_seconds": 123.4
}
```

Ingest documents from a filesystem path.
Request:
```json
{
  "path": "./docs",
  "chunking_strategy": "auto",
  "metadata": {"project": "alpha"},
  "recursive": true,
  "enable_hype": true
}
```

Response:

```json
{
  "documents": 15,
  "chunks": 234,
  "elapsed_seconds": 45.2,
  "errors": []
}
```

Upload and ingest files (multipart form data).
```shell
curl -X POST http://localhost:8000/v1/ingest/upload \
  -H "Authorization: Bearer $API_KEY" \
  -F "files=@report.pdf" \
  -F "files=@data.xlsx"
```

Ingest raw text directly.
Request:
```json
{
  "text": "QuantumRAG supports four chunking strategies: auto, structural, semantic, and fixed...",
  "title": "Chunking Guide",
  "metadata": {"type": "documentation"}
}
```

Synchronous query.
Request:
```json
{
  "query": "What reranking providers are supported?",
  "filters": null,
  "top_k": 7,
  "rerank": true,
  "conversation_history": []
}
```

Response:

```json
{
  "answer": "QuantumRAG supports FlashRank (default, CPU-based) and Cohere reranking providers [1].",
  "sources": [
    {
      "chunk_id": "abc123",
      "document_title": "Configuration Guide",
      "page": null,
      "section": "Reranking",
      "excerpt": "FlashRank provides CPU-based reranking at no cost...",
      "relevance_score": 0.92
    }
  ],
  "confidence": "STRONGLY_SUPPORTED",
  "trace": [...],
  "metadata": {
    "tokens_used": 1250,
    "estimated_cost": 0.003,
    "latency_ms": 1450,
    "path": "MEDIUM"
  }
}
```

SSE streaming query.
Request:
```json
{
  "query": "Summarize the key findings",
  "top_k": 7
}
```

Response: Server-Sent Events stream:

```
data: The
data: key
data: findings
data: include
data: ...
data: [DONE]
```
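A client consuming this stream strips the `data: ` prefix and stops at the `[DONE]` sentinel. A minimal accumulator sketch, with the transport layer omitted; how tokens carry whitespace is server-defined, so payloads are concatenated as sent:

```python
def accumulate_sse(lines):
    """Join SSE data payloads into the full answer, stopping at [DONE]."""
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        tokens.append(payload)
    return "".join(tokens)  # the server controls token spacing
```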
List indexed documents.
Query Parameters:

- `limit` (default: 50)
- `offset` (default: 0)
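Client code can walk the whole collection with these two parameters. A generic paging sketch, where `fetch` stands in for whatever HTTP call the client makes (it is not part of the API):

```python
def iter_documents(fetch, limit=50):
    """Yield every document by walking limit/offset pages until exhausted."""
    offset = 0
    while True:
        page = fetch(limit=limit, offset=offset)
        docs = page["documents"]
        yield from docs
        offset += len(docs)
        if not docs or offset >= page["total"]:
            break
```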
Response:
```json
{
  "documents": [
    {
      "id": "doc-uuid",
      "title": "Q3 Report",
      "source_type": "FILE",
      "chunks": 15,
      "created_at": "2025-01-15T10:30:00Z"
    }
  ],
  "total": 15
}
```

Delete a document and its chunks.
Engine status.
```json
{
  "documents": 15,
  "chunks": 234,
  "config": {
    "language": "ko",
    "domain": "general"
  }
}
```

Run evaluation metrics.
Request:
```json
{
  "benchmark_file": null,
  "sample_count": 20
}
```

Response:

```json
{
  "metrics": [
    {"name": "retrieval_recall", "score": 0.92},
    {"name": "faithfulness", "score": 0.95},
    {"name": "answer_relevancy", "score": 0.88}
  ],
  "summary": "Overall quality: Good",
  "suggestions": ["Consider increasing top_k for complex queries"]
}
```

Submit user feedback on a query result.
Request:
```json
{
  "query": "What reranking providers are supported?",
  "answer": "QuantumRAG supports FlashRank and Cohere reranking [1].",
  "rating": 5,
  "comment": "Accurate and well-cited"
}
```

## CLI

```shell
quantumrag [OPTIONS] COMMAND [ARGS]
```

| Option | Description |
|---|---|
| `--verbose, -v` | Enable debug logging |
| `--json-log` | Output logs in JSON format |
| `--version` | Show version and exit |
### init — Create default config

```shell
quantumrag init [--config quantumrag.yaml]
```

### ingest — Ingest documents

```shell
quantumrag ingest <PATH> [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--config, -c` | Config file path | `quantumrag.yaml` |
| `--strategy, -s` | Chunking strategy | `auto` |
| `--metadata, -m` | Key=value pairs | (none) |
| `--watch, -w` | Watch for file changes | `false` |
| `--recursive / --no-recursive` | Recurse into directories | `true` |
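How the CLI turns `--metadata` pairs into a dict is not specified above; a typical key=value parser might look like this (illustrative only, not the actual implementation):

```python
def parse_metadata(pairs):
    """Turn ["project=alpha", "team=ml"] into {"project": "alpha", "team": "ml"}."""
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        metadata[key] = value
    return metadata
```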
### query — Ask a question

```shell
quantumrag query "What reranking providers are supported?" [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--config, -c` | Config file path | `quantumrag.yaml` |
| `--top-k` | Number of chunks to retrieve | from config |
### serve — Start HTTP API server

```shell
quantumrag serve [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--config, -c` | Config file path | `quantumrag.yaml` |
| `--host` | Bind address | `127.0.0.1` |
| `--port` | Bind port | `8000` |
### status — Show engine status

```shell
quantumrag status [--config quantumrag.yaml]
```

### evaluate — Run evaluation

```shell
quantumrag evaluate [--benchmark benchmark.json]
```