ChronoQuant

Time-aware embedding compression middleware for Qdrant.

Store vectors at full precision when fresh. Automatically compress them as they age — saving 70%+ Qdrant storage with ≥95% cosine recall preserved.

app  →  ChronoQuant proxy  →  Qdrant
              ↓
     tier-routes + compresses
     embeddings by age + importance

How it works

Every vector is routed to one of five tiers on write. A background compressor (BGCompressor) demotes vectors as they age, applying progressively lossier quantization while keeping cosine similarity to the original vector within each tier's recall bound.

Tier	Encoding	Age threshold	Recall (mean cosine vs. original)	Storage vs float32
anchor	float16	never demoted	100%	−50%
hot	float16	< 7 days	100%	−50%
warm	int8 (PolarQuant)	7–30 days	≥99%	−75%
cool	4-bit (PolarQuant)	30–90 days	≥97%	−87.5%
cold	3-bit (PolarQuant)	> 90 days	≥95%	−90.6%

Anchor tier: any vector with importance ≥ 0.7 is pinned here forever — never compressed, never demoted.
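The routing rule in the table can be written as a small pure function. This is a sketch, not the API of `tier_router.py`: `route_tier` is a hypothetical helper, and the boundary handling (a vector exactly 7 days old counts as warm, exactly 90 days old as cold) is an assumption, because the table does not say which side of each threshold is inclusive.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

ANCHOR_IMPORTANCE = 0.7  # importance >= 0.7 pins a vector to the anchor tier

# Exclusive upper age bound for each tier, youngest first (thresholds from the table above).
AGE_LIMITS = (
    (timedelta(days=7), "hot"),
    (timedelta(days=30), "warm"),
    (timedelta(days=90), "cool"),
)


def route_tier(created_at: datetime, importance: float, now: datetime | None = None) -> str:
    """Map a vector's age and importance to a tier name (hypothetical helper)."""
    if importance >= ANCHOR_IMPORTANCE:
        return "anchor"  # never demoted, regardless of age
    now = now or datetime.now(timezone.utc)
    age = now - created_at
    for limit, tier in AGE_LIMITS:
        if age < limit:
            return tier
    return "cold"  # 90 days or older
```

Importance is checked first, so a high-importance vector never reaches the age comparison at all.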

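PolarQuant: Randomized Hadamard Transform (RHT) + scalar quantization + bit-packing. The random rotation spreads each vector's energy evenly across coordinates before quantization, so no single outlier coordinate dictates the quantization range and each bit carries more precision.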
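A minimal NumPy sketch of the same three stages: pad to a power of two, rotate with random signs followed by a normalized fast Walsh-Hadamard transform, quantize each coordinate uniformly, and pack the codes into bytes. It is not ChronoQuant's `polar_quant.py`; `PolarQuantSketch`, the `STEP` table, and storing the vector norm alongside the codes are assumptions. Because the rotation is orthonormal, each coordinate of a rotated unit vector has variance of about 1/n, which is what lets one fixed step size serve every vector.

```python
import numpy as np

# Hypothetical uniform step sizes, in units of the expected per-coordinate
# standard deviation (1/sqrt(n)) after rotation. Not ChronoQuant's values.
STEP = {8: 8 / 256, 4: 0.335, 3: 0.586}


def fwht(x: np.ndarray) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        pairs = x.reshape(-1, 2, h)  # pair element i with i + h inside each 2h-block
        a, b = pairs[:, 0, :], pairs[:, 1, :]
        x = np.stack((a + b, a - b), axis=1).reshape(n)
        h *= 2
    return x


class PolarQuantSketch:
    def __init__(self, dim: int, seed: int = 42):
        self.dim = dim
        self.n = 1 << (dim - 1).bit_length()  # next power of two >= dim
        rng = np.random.default_rng(seed)  # same seed => same rotation after a restart
        self.signs = rng.choice([-1.0, 1.0], size=self.n)  # random diagonal D

    def _rotate(self, x: np.ndarray) -> np.ndarray:
        padded = np.zeros(self.n)
        padded[: self.dim] = x
        return fwht(self.signs * padded) / np.sqrt(self.n)  # y = H D x / sqrt(n)

    def _unrotate(self, y: np.ndarray) -> np.ndarray:
        return (self.signs * fwht(y) / np.sqrt(self.n))[: self.dim]  # x = D H y / sqrt(n)

    def encode(self, x: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
        norm = float(np.linalg.norm(x))
        if norm == 0.0:
            raise ValueError("cannot encode a zero vector")
        y = self._rotate(np.asarray(x, dtype=np.float64) / norm)
        levels = 1 << bits
        step = STEP[bits] / np.sqrt(self.n)
        codes = np.clip(np.floor(y / step) + levels // 2, 0, levels - 1).astype(np.uint8)
        # Bit-pack: spell each code as `bits` binary digits, then pack 8 digits per byte.
        shifts = np.arange(bits - 1, -1, -1)
        digits = ((codes[:, None] >> shifts) & 1).astype(np.uint8)
        return np.packbits(digits.ravel()), norm

    def decode(self, packed: np.ndarray, norm: float, bits: int) -> np.ndarray:
        levels = 1 << bits
        step = STEP[bits] / np.sqrt(self.n)
        digits = np.unpackbits(packed)[: self.n * bits].reshape(self.n, bits)
        codes = (digits.astype(np.int64) << np.arange(bits - 1, -1, -1)).sum(axis=1)
        y = (codes - levels // 2 + 0.5) * step  # midpoint of each quantization cell
        return norm * self._unrotate(y)
```

With `dim=1536` the vector is padded to 2048 coordinates, so a 3-bit code occupies 768 bytes versus 6,144 bytes at float32.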

Storage savings

At 1 million vectors with a realistic age distribution (5% anchor, 15% hot, 25% warm, 30% cool, 25% cold):

Dim	Raw Qdrant (float32)	ChronoQuant	Saved
384	1.4 GiB	0.4 GiB	70%
768	2.9 GiB	0.9 GiB	70%
1536	5.7 GiB	1.7 GiB	70%
3072	11.4 GiB	3.4 GiB	70%

Savings grow as your corpus ages. A collection where 80%+ of vectors are > 30 days old approaches 85–90% reduction.
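The savings can be approximated from the tier mix with a little arithmetic. The sketch below is an idealized estimate, not `benchmarks/storage_savings.py`: it counts only the quantized codes, and leaving the float16 tiers unpadded is an assumption. For 1536-dim vectors it gives 77.7% saved without padding and 73.5% with PolarQuant's power-of-two padding (1536 to 2048 coordinates) applied to the compressed tiers. The benchmark's ~70% is lower, presumably because of per-vector overhead (such as stored norms or scales) that this sketch leaves out.

```python
# Bytes per coordinate for each tier (float16 = 2, int8 = 1, 4-bit = 0.5, 3-bit = 3/8).
BYTES_PER_COORD = {"anchor": 2.0, "hot": 2.0, "warm": 1.0, "cool": 0.5, "cold": 3 / 8}
# Age distribution used for the savings table above.
MIX = {"anchor": 0.05, "hot": 0.15, "warm": 0.25, "cool": 0.30, "cold": 0.25}
PQ_TIERS = {"warm", "cool", "cold"}  # tiers encoded by PolarQuant


def estimated_bytes(n_vectors: int, dim: int, mix: dict, pad: bool = True) -> float:
    """Idealized vector storage: codes only, no norms, payloads, or index overhead."""
    padded = 1 << (dim - 1).bit_length()  # PolarQuant pads to the next power of two
    total = 0.0
    for tier, share in mix.items():
        coords = padded if (pad and tier in PQ_TIERS) else dim
        total += n_vectors * share * coords * BYTES_PER_COORD[tier]
    return total


raw = 1_000_000 * 1536 * 4  # float32 baseline
for pad in (False, True):
    saved = 1 - estimated_bytes(1_000_000, 1536, MIX, pad=pad) / raw
    print(f"pad={pad}: {saved:.1%} saved")
```

This prints `pad=False: 77.7% saved` and `pad=True: 73.5% saved`. Shifting `MIX` toward `cool` and `cold` shows the same trend as the sentence above: savings grow as the corpus ages.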

Run the benchmark yourself:

poetry run python benchmarks/storage_savings.py

Quick start

Requirements: Docker, Python 3.11+, Poetry

git clone https://github.com/your-org/chronoquant
cd chronoquant
poetry install

# Start Qdrant + Redis
docker compose up -d

# Run the demo (real sentence-transformer embeddings)
poetry run python demo.py

Tier aging demo

Watch BGCompressor demote vectors across tiers in real time:

poetry run python tier_aging_demo.py

Output:

Tier counts BEFORE compression:
  anchor  ██           2
  hot     ██████       6
  warm    ██           2
  cool    ██           2
  cold                 0

Running BGCompressor...

Tier counts AFTER compression:
  anchor  ██           2
  hot     ██████       6  (-4)
  warm    █████        5  (+5)
  cool    ████         4  (+4)
  cold    ██           2  (+2)

Score drift (before vs after compression):
  [anchor] +0.0000  Transformers use self-attention...
  [hot   ] +0.0000  Flash Attention rewrites the softmax...
  [warm  ] +0.0031  Sparse autoencoders decompose...
  [cold  ] +0.0052  Gradient checkpointing trades compute...

Takeaway: semantic similarity is preserved despite lossy compression; the largest score drift in this run is +0.0052.

SDK usage

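import asyncio

import numpy as np
from chronoquant.sdk import ChronoQuantClient


def normalize(v: np.ndarray) -> np.ndarray:
    # ChronoQuant expects float32, L2-normalized embeddings
    v = np.asarray(v, dtype=np.float32)
    return v / np.linalg.norm(v)


async def main() -> None:
    client = ChronoQuantClient(
        qdrant_url="http://localhost:6333",
        redis_url="redis://localhost:6379",
        embedding_dim=1536,
    )

    # Placeholder vectors; replace with real model embeddings
    rng = np.random.default_rng(0)
    my_embedding = normalize(rng.standard_normal(1536))
    query_vec = normalize(rng.standard_normal(1536))

    # Store a vector
    vec_id = client.store(
        text="Flash Attention rewrites the softmax kernel for IO-bound efficiency.",
        embedding=my_embedding,   # np.ndarray, float32, normalized
        importance=0.75,          # ≥0.7 → anchor tier (never compressed)
    )

    # Search across all tiers (search is awaited, so it must run inside an event loop)
    results = await client.search(query_vec, top_k=10)
    for r in results:
        print(r["score"], r["payload"]["raw_text_ref"])

    # Run compression manually (normally a background job)
    await client.run_compressor()


asyncio.run(main())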

FastAPI proxy

Drop ChronoQuant in as an HTTP proxy between your app and Qdrant:

poetry run uvicorn chronoquant.proxy.server:app --port 8765

Endpoints:

Method	Path	Body
POST	`/store`	`{text, embedding, importance}`
POST	`/search`	`{embedding, top_k, include_cold}`
POST	`/compress`	`{}` — triggers BGCompressor manually
GET	`/health`	liveness check
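A minimal sketch of calling the proxy from Python with only the standard library. The request bodies follow the endpoint table; the helper names (`build_request`, `post`, `store_body`, `search_body`) are hypothetical, and the JSON response shape is an assumption (the agentmemory plugin section suggests `/search` returns a list of `{vec_id, score, payload}` objects).

```python
import json
import urllib.request

PROXY_URL = "http://localhost:8765"  # port from the uvicorn command above


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the proxy endpoints."""
    return urllib.request.Request(
        PROXY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, body: dict, timeout: float = 5.0):
    """Send the request and decode the JSON response (response shape is assumed)."""
    with urllib.request.urlopen(build_request(path, body), timeout=timeout) as resp:
        return json.loads(resp.read())


def store_body(text: str, embedding, importance: float) -> dict:
    # json cannot serialize NumPy arrays, so convert to a list of Python floats
    return {"text": text, "embedding": [float(v) for v in embedding], "importance": importance}


def search_body(embedding, top_k: int = 10, include_cold: bool = False) -> dict:
    return {"embedding": [float(v) for v in embedding], "top_k": top_k, "include_cold": include_cold}
```

With the proxy running: `post("/store", store_body("...", vec, 0.6))`, `post("/search", search_body(query_vec, top_k=5))`, and `post("/compress", {})`.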

Architecture

┌─────────────────────────────────────────────────────┐
│                  ChronoQuantClient                  │
│                                                     │
│  store() → TierRouter → QdrantAdapter → Qdrant      │
│          → WriteAheadLog (SQLite)                   │
│                                                     │
│  search() → fan-out all tiers → ScoreCalibrator     │
│           → AccessTracker (Redis sorted set)        │
│                                                     │
│  BGCompressor (background)                          │
│    Redlock (Redis): prevents concurrent runs        │
│    For each tier, for each aged vector:             │
│      PolarQuant.encode() → shadow-write →           │
│      verify recall ≥ 0.95 → delete source           │
└─────────────────────────────────────────────────────┘

BGCompressor safety: shadow-write pattern — encode, upsert to target tier, verify the point exists and recall ≥ 0.95, then delete from source. A failed recall check aborts the migration for that vector; it stays in its current tier.
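The shadow-write sequence can be sketched with plain dicts standing in for Qdrant collections. This illustrates the pattern only and is not BGCompressor's code: `demote`, the dict-of-dicts `store`, and the `encode`/`decode` callables are hypothetical, and removing the shadow copy on a failed check is an assumption (the text above only says the vector stays in its current tier).

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def demote(store, vec_id, source, target, encode, decode, min_recall=0.95) -> bool:
    """Move one vector from `source` to `target` via shadow-write; True on success.

    store: dict of tier name -> {vec_id: stored value} (stand-in for Qdrant collections)
    encode/decode: the target tier's codec (hypothetical callables)
    """
    original = store[source][vec_id]
    store[target][vec_id] = encode(original)  # 1. encode and upsert to the target tier
    shadow = store[target].get(vec_id)  # 2. verify the point exists...
    if shadow is None or cosine(original, decode(shadow)) < min_recall:  # ...and recall holds
        store[target].pop(vec_id, None)  # abort: drop the shadow copy (assumption)
        return False  # the source copy was never touched
    del store[source][vec_id]  # 3. delete from the source only after verification
    return True
```

The ordering is the point of the pattern: at every step at least one readable copy exists, so a crash mid-migration can leave a duplicate but never lose the vector.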

WriteAheadLog: SQLite-backed. Entries are written with INSERT OR IGNORE, so re-inserting an entry that is already logged is a no-op (idempotent). The log survives process crashes, and pending entries can be replayed on startup.
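A minimal sketch of an idempotent SQLite log using Python's built-in `sqlite3` module. The table layout, the `status` column, and the class name `WriteAheadLogSketch` are assumptions for illustration, not the contents of `write_ahead_log.py`.

```python
import json
import sqlite3


class WriteAheadLogSketch:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS wal ("
            " vec_id TEXT PRIMARY KEY,"
            " payload TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )

    def append(self, vec_id: str, payload: dict) -> None:
        # INSERT OR IGNORE: logging the same vec_id twice is a no-op, so retries are safe.
        with self.db:  # the connection context manager commits on success
            self.db.execute(
                "INSERT OR IGNORE INTO wal (vec_id, payload) VALUES (?, ?)",
                (vec_id, json.dumps(payload)),
            )

    def mark_done(self, vec_id: str) -> None:
        with self.db:
            self.db.execute("UPDATE wal SET status = 'done' WHERE vec_id = ?", (vec_id,))

    def replay(self, apply) -> int:
        """Re-apply every pending entry in insertion order; return how many were replayed."""
        rows = self.db.execute(
            "SELECT vec_id, payload FROM wal WHERE status = 'pending' ORDER BY rowid"
        ).fetchall()
        for vec_id, payload in rows:
            apply(vec_id, json.loads(payload))
            self.mark_done(vec_id)
        return len(rows)
```

A crash between `apply` and `mark_done` means an entry can be replayed twice, so `apply` should itself be idempotent (an upsert keyed by `vec_id` is).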

ScoreCalibrator: Applies per-tier cosine score offsets learned from ≥500 samples. Normalizes compressed-tier scores to be directly comparable with float16 scores.
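One way to read "per-tier offsets learned from ≥500 samples": for the same query and vector, record both the float16 score and the compressed-tier score, and once a tier has at least 500 such pairs, add their mean difference to that tier's scores. The sampling scheme and the mean-offset model are assumptions, and `ScoreCalibratorSketch` is a hypothetical name, not the class in `score_calibrator.py`.

```python
import statistics
from collections import defaultdict

MIN_SAMPLES = 500  # minimum sample count stated above


class ScoreCalibratorSketch:
    def __init__(self, min_samples: int = MIN_SAMPLES):
        self.min_samples = min_samples
        # tier -> list of (float16 score - compressed-tier score) for the same query/vector
        self.diffs: dict[str, list[float]] = defaultdict(list)

    def observe(self, tier: str, reference_score: float, compressed_score: float) -> None:
        self.diffs[tier].append(reference_score - compressed_score)

    def offset(self, tier: str) -> float:
        samples = self.diffs.get(tier, [])
        if len(samples) < self.min_samples:
            return 0.0  # too few samples: leave scores uncorrected
        return statistics.fmean(samples)

    def calibrate(self, tier: str, score: float) -> float:
        return score + self.offset(tier)
```

Returning a zero offset below the sample threshold keeps an undertrained tier from being shifted by noise.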

agentmemory plugin

ChronoQuant ships a ready-made plugin that intercepts agentmemory save/search calls and routes them through the compression proxy. Embeddings are automatically tiered by age — your agent's memory gets smaller over time without losing semantic quality.

Install

# In your agentmemory project
npm install /path/to/chronoquant/plugins/agentmemory
# or copy plugins/agentmemory/ into your project directly

Wire it up

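// Set the proxy URL before loading the plugin, in case it reads the variable at load time
// (default: http://localhost:8765)
process.env.CHRONOQUANT_PROXY_URL = 'http://localhost:8765';

const { memory_save_hook, memory_search_hook } = require('chronoquant-agentmemory-plugin');

// CommonJS modules cannot use top-level await, so wrap the calls in an async function.
// float32Array and queryEmbedding stand in for your own embeddings.
async function main() {
  // Replace agentmemory's default store call:
  await memory_save_hook({
    text: "Flash Attention rewrites the softmax kernel for IO-bound efficiency.",
    embedding: float32Array,   // your embedding as a plain JS array or TypedArray
    importance: 0.6,           // 0.0–1.0; ≥0.7 pins to anchor tier permanently
  });

  // Replace agentmemory's default search call:
  const results = await memory_search_hook({
    embedding: queryEmbedding,
    top_k: 10,
    include_cold: false,  // set true to search 90d+ tier (slower, more thorough)
  });
  // results: [{ vec_id, score, payload: { text, tier, importance, ... } }, ...]
  // or null if the proxy is unreachable
  if (results === null) {
    // fall back to a direct Qdrant search here
  }
}

main().catch(console.error);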

How it behaves

Writes are fire-and-forget with a 2 s timeout. A failed ChronoQuant write logs a warning and does not crash your agent.
Searches fall back gracefully: if the proxy is unreachable, memory_search_hook returns null and your code can fall back to a direct Qdrant call.
No agentmemory internals are modified — the plugin is pure pass-through hooks.

Start the proxy alongside your agent

# terminal 1 — infrastructure
docker compose up -d

# terminal 2 — ChronoQuant proxy
poetry run uvicorn chronoquant.proxy.server:app --port 8765

# terminal 3 — your agent (now writes compress automatically)
node your-agent.js

Using ChronoQuant as a proxy for other vector databases

ChronoQuant's proxy layer is database-agnostic. The HTTP proxy speaks a simple REST protocol; the backend adapter is swappable. To add a new vector DB:

1. Implement the adapter interface

# chronoquant/adapters/my_db_adapter.py
from __future__ import annotations
import numpy as np

class MyDBAdapter:
    """Implement these seven methods and ChronoQuant works out of the box."""

    def upsert(self, collection: str, vec_id: str, vector: np.ndarray, payload: dict) -> None:
        ...

    def search(self, collection: str, query: np.ndarray, limit: int) -> list:
        # Return objects with .id, .score, .payload, .vector attributes
        ...

    def get(self, collection: str, vec_id: str):
        # Return single point or None
        ...

    def delete(self, collection: str, vec_ids: list[str]) -> None:
        ...

    def count(self, collection: str) -> int:
        ...

    def scroll(self, collection: str, limit: int = 1000) -> list:
        # Return all points (used by BGCompressor to scan for aged vectors)
        ...

    def set_payload(self, collection: str, vec_id: str, patch: dict) -> None:
        # Merge-update payload fields without overwriting the whole document
        ...
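For tests or local experiments, the interface can be satisfied with a dict-backed adapter. `InMemoryAdapter` and `Point` are hypothetical names; `search` uses brute-force cosine similarity, and having `scroll` return at most `limit` points is an assumption about how `limit` is meant to behave.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np


@dataclass
class Point:
    id: str
    vector: np.ndarray
    payload: dict = field(default_factory=dict)
    score: float | None = None  # filled in only for search results


class InMemoryAdapter:
    """Dict-backed adapter for tests: collection name -> {vec_id: Point}."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, Point]] = {}

    def _col(self, collection: str) -> dict[str, Point]:
        return self._data.setdefault(collection, {})

    def upsert(self, collection: str, vec_id: str, vector: np.ndarray, payload: dict) -> None:
        self._col(collection)[vec_id] = Point(vec_id, np.asarray(vector, dtype=np.float32), dict(payload))

    def search(self, collection: str, query: np.ndarray, limit: int) -> list:
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        hits = [
            Point(p.id, p.vector, p.payload, float(p.vector @ q / np.linalg.norm(p.vector)))
            for p in self._col(collection).values()
        ]
        hits.sort(key=lambda h: h.score, reverse=True)  # higher cosine = more similar
        return hits[:limit]

    def get(self, collection: str, vec_id: str):
        return self._col(collection).get(vec_id)

    def delete(self, collection: str, vec_ids: list[str]) -> None:
        col = self._col(collection)
        for vid in vec_ids:
            col.pop(vid, None)

    def count(self, collection: str) -> int:
        return len(self._col(collection))

    def scroll(self, collection: str, limit: int = 1000) -> list:
        return list(self._col(collection).values())[:limit]

    def set_payload(self, collection: str, vec_id: str, patch: dict) -> None:
        self._col(collection)[vec_id].payload.update(patch)  # merge, do not replace
```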

2. Pass your adapter to the SDK

from chronoquant.sdk import ChronoQuantClient
from my_project.adapters.pinecone_adapter import PineconeAdapter

# Bypass built-in Qdrant wiring
client = ChronoQuantClient.__new__(ChronoQuantClient)
client._qdrant = PineconeAdapter(api_key="...", index="my-index")
# ... init remaining fields (pq, router, wal, etc.) as needed

3. Or pass it directly to BGCompressor (in place of QdrantAdapter)

For lighter-weight integration (compression only, no full SDK):

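import asyncio

from chronoquant.engine.bg_compressor import BGCompressor
from chronoquant.engine.polar_quant import PolarQuant
from my_project.adapters.weaviate_adapter import WeaviateAdapter


async def main() -> None:
    compressor = BGCompressor(
        qdrant=WeaviateAdapter(url="http://localhost:8080"),  # any object implementing the adapter interface
        pq=PolarQuant(seed=42),  # keep the seed identical across runs (see pq_seed)
        redis=None,  # optional; None skips the distributed lock
    )
    # run_compression_job is awaited, so it needs a running event loop
    await compressor.run_compression_job()


asyncio.run(main())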

Adapter examples (community / bring-your-own)

Database	Adapter status	Notes
Qdrant	✅ built-in	`chronoquant.adapters.qdrant_adapter`
Pinecone	PRs welcome	Needs upsert + query + fetch + delete
Weaviate	PRs welcome	GraphQL query → adapt `.score` field name
pgvector	PRs welcome	SQL adapter; `scroll` = `SELECT * FROM <table> WHERE collection = ?`
Chroma	PRs welcome	`collection.query()` returns distances, invert to similarity
Milvus	PRs welcome	Flush before scroll

The only hard constraint: the search() method must return objects whose .score is cosine similarity (range -1 to 1, higher = more similar). If your DB returns cosine distance, convert with score = 1 - distance before returning; other distance metrics need their own conversion.
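For unit-length embeddings, the common distance metrics map back to cosine similarity in closed form: cosine distance is `1 - cos`, and squared Euclidean distance is `2 - 2·cos`. The helper below is a hypothetical sketch; check which metric your backend actually returns (for example, pgvector's `<=>` operator returns cosine distance and `<->` returns Euclidean distance).

```python
def distance_to_cosine(distance: float, metric: str) -> float:
    """Convert a backend distance to cosine similarity, assuming unit-length vectors."""
    if metric == "cosine":  # cosine distance = 1 - cos
        return 1.0 - distance
    if metric == "l2":  # ||a - b|| = sqrt(2 - 2 cos)
        return 1.0 - distance ** 2 / 2.0
    if metric == "l2_squared":  # ||a - b||^2 = 2 - 2 cos
        return 1.0 - distance / 2.0
    raise ValueError(f"unsupported metric: {metric}")
```

The Euclidean formulas are only valid for normalized vectors, which ChronoQuant already expects on write.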

Schema migration

Existing Qdrant collections on schema v1 can be patched to v2 in-place:

from migrations.v1_to_v2 import migrate
from chronoquant.adapters.qdrant_adapter import QdrantAdapter

# qdrant_adapter: a QdrantAdapter instance connected to the collection to migrate
counts = migrate(qdrant_adapter)
# {"anchor": 0, "hot": 42, "warm": 17, ...}

Adds compressed_at field and bumps schema_version to 2. Safe to run multiple times (idempotent).
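Idempotence of this kind usually comes from checking the version before patching: a point already at version 2 is skipped, so a second run changes nothing. The sketch below shows that pattern on plain payload dicts; `migrate_points`, the `compressed_at` default of `None`, and the reliance on a `tier` payload key are assumptions, not the contents of `migrations/v1_to_v2.py`.

```python
TIERS = ("anchor", "hot", "warm", "cool", "cold")
SCHEMA_VERSION = 2


def migrate_points(payloads) -> dict[str, int]:
    """Patch v1 payload dicts in place; return how many were patched per tier."""
    counts = dict.fromkeys(TIERS, 0)
    for payload in payloads:
        if payload.get("schema_version", 1) >= SCHEMA_VERSION:
            continue  # already on v2: skipping is what makes re-runs safe
        payload.setdefault("compressed_at", None)  # hypothetical default: not yet compressed
        payload["schema_version"] = SCHEMA_VERSION
        tier = payload["tier"]
        counts[tier] = counts.get(tier, 0) + 1
    return counts
```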

Tests

poetry run pytest -v

32 tests across all components. Recall validation suite:

poetry run pytest benchmarks/recall_validation.py -v

Validates PolarQuant meets recall thresholds on 1000 random 1536-dim unit vectors:

test_int8_recall  PASSED  (mean cosine ≥ 0.99)
test_4bit_recall  PASSED  (mean cosine ≥ 0.97)
test_3bit_recall  PASSED  (mean cosine ≥ 0.95)

Project layout

chronoquant/
├── chronoquant/
│   ├── engine/
│   │   ├── polar_quant.py       # RHT + scalar quantization + bit-packing
│   │   ├── tier_router.py       # age + importance → tier name
│   │   ├── anchor_detector.py   # importance threshold check
│   │   ├── bg_compressor.py     # Redlock + shadow-write demotion loop
│   │   └── score_calibrator.py  # per-tier cosine offset correction
│   ├── adapters/
│   │   └── qdrant_adapter.py    # Qdrant CRUD + UUID hashing
│   ├── tracking/
│   │   └── access_tracker.py    # Redis sorted set access log
│   ├── wal/
│   │   └── write_ahead_log.py   # SQLite WAL with idempotent replay
│   ├── proxy/
│   │   └── server.py            # FastAPI proxy (port 8765)
│   └── sdk.py                   # ChronoQuantClient
├── migrations/
│   └── v1_to_v2.py
├── plugins/
│   └── agentmemory/             # agentmemory JS plugin
├── benchmarks/
│   ├── storage_savings.py       # storage reduction + throughput table
│   └── recall_validation.py     # pytest recall thresholds
├── tests/                       # 32 unit + integration tests
├── demo.py                      # sentence-transformers end-to-end demo
├── tier_aging_demo.py           # BGCompressor tier aging visualization
└── docker-compose.yml           # Qdrant + Redis

Configuration

Parameter	Default	Description
`embedding_dim`	1536	Vector dimension (must match Qdrant collection)
`qdrant_url`	`http://localhost:6333`	Qdrant endpoint
`redis_url`	None	Redis endpoint (optional; disables Redlock + access tracking if absent)
`wal_db_path`	`chronoquant_wal.db`	SQLite WAL path (`:memory:` for testing)
`pq_seed`	42	PolarQuant RNG seed (must be consistent across restarts)

BGCompressor age thresholds (days):

Demotion	Threshold
hot → warm	7 days
warm → cool	30 days
cool → cold	90 days

Limitations

Throughput: PolarQuant encode/decode is ~160–270 vectors/sec single-threaded Python at 1536-dim. This is a background compression job — not on the search hot path.
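Dim constraint: PolarQuant pads to the next power of 2. A 1000-dim vector is stored as 1024 coefficients (~2.4% overhead), while common embedding sizes such as 384, 768, 1536, and 3072 are padded to 512, 1024, 2048, and 4096 coefficients (~33% overhead).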
Redis optional but recommended: Without Redis, BGCompressor skips distributed locking (safe for single-process deployments). Access tracking and score calibration are degraded.
No automatic re-promotion: Vectors do not move back up tiers if accessed frequently. Access counts are tracked but re-promotion is not yet implemented.

License

MIT
