Long-term memory for LLM agents using Sparse Distributed Representations — no vector DB, no GPU, no external server.
Bio-inspired memory system based on Numenta/HTM theory. Encodes text as 4096-bit SDRs and retrieves by Hamming distance over plain SQLite in WAL mode. A built-in salience filter automatically rejects ~85% of noise.
| Feature | sdr-memory | Mem0 | Zep | ChromaDB |
|---|---|---|---|---|
| External server required | No | Yes | Yes | Yes |
| GPU required | No | No | No | Optional |
| Storage backend | SQLite | Various | Postgres | DuckDB |
| Memory approach | SDR (bio-inspired) | Embeddings | Knowledge Graph | Embeddings |
| Built-in salience filter | Yes | No | No | No |
| `pip install` and go | Yes | Yes | No | Yes |
| Retrieval (top-1 accuracy) | 92% | — | — | 81%* |
| Search latency (1,600 memories) | 2.4ms | — | — | 0.8ms* |
| Memory footprint per entry | 512 bytes | ~3KB+ | ~5KB+ | ~1.5KB+ |
*Measured against sentence-transformers/all-MiniLM-L6-v2 with 32-dim dense embeddings on the same dataset. See benchmarks/. Mem0 and Zep benchmarks pending — contributions welcome.
Trade-off: SDR retrieval is pure brute-force over bit arrays — fast up to ~100K memories, but doesn't scale to millions like HNSW/FAISS. For LLM agent memory (typically 1K-50K entries), this is more than enough.
```shell
pip install sdr-memory
```

```python
from sdr_memory import SDRMemory

store = SDRMemory("my_memory.db")

store.store("Database connection pool exhausted after traffic spike")
store.store("Payment gateway returns 503 during maintenance windows")
store.store("DNS timeout was the root cause of the outage")

results = store.query("database timeout", limit=3)
for r in results:
    print(f"[{r['score']:.3f}] {r['text']}")
# [0.987] DNS timeout was the root cause of the outage
# [0.984] Database connection pool exhausted after traffic spike
# [0.982] Payment gateway returns 503 during maintenance windows
```

That's it. No server, no config, no model download.
Run as a persistent daemon for inter-process communication:

```shell
# Start the daemon
sdr-memory serve

# Query from another process
sdr-memory call --json '{"action":"store","text":"nginx OOM killed at 3am","metadata":{"severity":"critical"}}'
sdr-memory call --json '{"action":"query","text":"out of memory","limit":3}'
sdr-memory call --json '{"action":"stats"}'
```

Or from Python:
```python
from sdr_memory.server import client_call
from pathlib import Path

result = client_call(Path("/tmp/sdr-memory.sock"), {
    "action": "query",
    "text": "memory leak",
    "limit": 5,
})
print(result)
```

```
Input text
        │
        ▼
┌─────────────────┐
│ Salience Filter │──── reject (~85% noise)
│  (regex-based)  │
└────────┬────────┘
         │ accept
         ▼
┌─────────────────┐
│ Trigram Hashing │ "database timeout" → {" d","dat","ata","tab",...}
│                 │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  SDR Encoding   │ Hash each trigram → bit position (mod 4096)
│ 4096-bit vector │ Cap at 80 active bits → sparse binary vector
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   SQLite WAL    │ Pack bits → 512 bytes BLOB → INSERT
│   (storage)     │ Journal mode: WAL, synchronous: NORMAL
└────────┬────────┘
         │
    ┌────┴────┐
    │  Query  │
    └────┬────┘
         │
         ▼
┌──────────────────┐
│ Hamming Distance │ XOR query bits with each stored SDR
│   (retrieval)    │ Score = 1 - (hamming_dist / 4096)
└──────────────────┘
```
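The retrieval step in the diagram can be sketched in a few lines of NumPy: each 4096-bit SDR packs into 512 bytes, so scoring is a vectorized XOR plus popcount over all stored rows. This is an illustrative sketch, not the library's internal code; the function name is made up here.

```python
import numpy as np

SDR_BITS = 4096  # width of each SDR (packs into 512 bytes)

def hamming_scores(query_bits: np.ndarray, stored: np.ndarray) -> np.ndarray:
    """Score every stored SDR against the query in one vectorized pass.

    query_bits: (512,) uint8 array -- one packed 4096-bit SDR
    stored:     (N, 512) uint8 array -- N packed SDRs
    Returns an (N,) float array of scores in [0, 1]; higher = more similar.
    """
    xor = np.bitwise_xor(stored, query_bits)       # differing bits, per byte
    dist = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row
    return 1.0 - dist / SDR_BITS                   # score = 1 - dist/4096

# Sanity check: identical SDRs score 1.0, bitwise complements score 0.0
q = np.zeros(512, dtype=np.uint8)
db = np.stack([q, ~q])
print(hamming_scores(q, db))  # → [1. 0.]
```

The whole scan is a single pass over a contiguous array, which is why brute force stays in the low milliseconds at the tens-of-thousands-of-memories scale discussed above.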
Not everything is worth remembering. The salience filter automatically rejects:
- Procedural narration: "I'll read the file now", "Starting search..."
- Path-only strings: "/usr/local/bin/python"
- Short noise: Anything under 8 characters
And accepts:
- High-signal keywords: error, timeout, resolved, fixed, critical, etc.
- Substantive statements: Anything 60+ characters that passes the filters
This is critical for LLM agent memory — without it, you'd store 10x more garbage than useful facts.
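As a rough sketch of how a regex-based filter like this can work (the actual rules live in the library; the patterns and thresholds below are illustrative, chosen to match the examples above):

```python
import re

# High-signal keywords that make a statement worth keeping
SIGNAL = re.compile(r"\b(error|timeout|resolved|fixed|critical|failed|root cause)\b", re.I)
# Procedural narration LLMs tend to emit while working
NARRATION = re.compile(r"^(I'll|Let me|Starting|Now I)\b", re.I)
# A bare filesystem path and nothing else
PATH_ONLY = re.compile(r"^\S*/\S+$")

def is_salient(text: str) -> bool:
    text = text.strip()
    if len(text) < 8:                               # short noise
        return False
    if NARRATION.match(text) or PATH_ONLY.match(text):
        return False
    if SIGNAL.search(text):                         # keyword hit: keep
        return True
    return len(text) >= 60                          # long substantive statement

print(is_salient("I'll read the file now"))                        # False
print(is_salient("/usr/local/bin/python"))                         # False
print(is_salient("DNS timeout was the root cause of the outage"))  # True
```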
Sparse Distributed Representations come from Numenta's HTM theory and Kanerva's Sparse Distributed Memory (1988). Key properties:
- Noise tolerance: Partial matches work naturally — a query doesn't need to exactly match to retrieve relevant memories
- No training required: Unlike embeddings, SDR encoding is deterministic and instant
- Compositionality: Similar texts produce overlapping bit patterns automatically via trigram hashing
- Tiny footprint: 512 bytes per memory vs ~1.5-6KB for embedding vectors
The trade-off vs embeddings: SDR captures lexical similarity (shared words/substrings), not semantic similarity ("dog" ≠ "puppy"). For factual/technical memory (logs, incidents, configs), lexical overlap is usually sufficient and often more precise.
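A sketch of trigram-based SDR encoding makes the compositionality property concrete: shared substrings produce shared trigrams, which hash to shared bit positions. The hash function and helper names here are illustrative; only the 4096-bit width and 80-bit sparsity cap come from the description above.

```python
import hashlib

SDR_BITS = 4096
MAX_ACTIVE = 80

def encode(text: str) -> frozenset[int]:
    """Map text to a set of active bit positions via trigram hashing."""
    text = f" {text.lower()} "                     # pad so word edges form trigrams
    trigrams = {text[i:i + 3] for i in range(len(text) - 2)}
    bits = sorted(
        int(hashlib.md5(t.encode()).hexdigest(), 16) % SDR_BITS for t in trigrams
    )
    return frozenset(bits[:MAX_ACTIVE])            # cap sparsity at 80 active bits

def overlap(a: str, b: str) -> float:
    """Jaccard overlap of two encodings -- a proxy for retrieval similarity."""
    sa, sb = encode(a), encode(b)
    return len(sa & sb) / max(len(sa | sb), 1)

# Lexically similar texts overlap heavily; unrelated texts barely at all
print(overlap("database timeout", "database connection timeout"))
print(overlap("database timeout", "payment gateway returns 503"))
```

Note the lexical bias: "database timeout" and "database connection timeout" share most of their trigrams, while a paraphrase with no shared words would score near zero — exactly the trade-off described above.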
Measured on 1,600 stored memories with 100 queries (benchmark details):
| Method | Top-1 Accuracy | Top-5 Accuracy | MRR | Search Time/Query |
|---|---|---|---|---|
| SDR 4096-bit Hamming | 92% | 98% | 0.949 | 2.4ms |
| Dense 32-dim cosine (MiniLM-L6) | 81% | 90% | 0.852 | 0.8ms |
SDR wins on accuracy by 11 points top-1, at the cost of ~1.6ms extra search time per query (still sub-5ms). Dense embeddings are faster because they use optimized BLAS, but SDR needs no model download or GPU.
```shell
# Run benchmarks yourself (requires [research] extras)
pip install sdr-memory[research]
python benchmarks/vector_duel.py --data your_data.jsonl
```

| Parameter | Default | Environment Variable | Description |
|---|---|---|---|
| DB path | `~/.sdr-memory/memory.db` | `SDR_MEMORY_DB` | SQLite database location |
| Socket path | `/tmp/sdr-memory.sock` | `SDR_MEMORY_SOCK` | Unix socket for daemon mode |
| SDR bits | 4096 | — | Width of the SDR vector |
| Max active bits | 80 | — | Sparsity cap per encoding |
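For example, a project-local setup using the environment variables above (the paths here are arbitrary choices):

```shell
# Keep this project's memories and socket out of the global defaults
export SDR_MEMORY_DB="$PWD/.sdr/memory.db"
export SDR_MEMORY_SOCK="$PWD/.sdr/daemon.sock"

sdr-memory serve &
sdr-memory call --json '{"action":"stats"}'
```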
sdr-memory is designed to give LLM agents persistent memory across sessions. Example integration with Claude Code hooks:

```python
# On session start — recall relevant context
from sdr_memory.server import client_call
from pathlib import Path

result = client_call(Path("/tmp/sdr-memory.sock"), {
    "action": "query",
    "text": "What was I working on?",
    "limit": 10,
})
context = "\n".join(r["text"] for r in result.get("results", []))
# Inject `context` into the LLM system prompt

# On session end — store what was learned
client_call(Path("/tmp/sdr-memory.sock"), {
    "action": "store",
    "text": "Resolved OOM by increasing connection pool from 10 to 50",
    "metadata": {"session": "abc123", "source": "manual"},
})
```

The salience filter ensures only meaningful facts get stored — not the "Let me read the file..." filler that LLMs produce.
This project started as a research question: Can we build memory for LLM agents that works more like human memory — associative, lossy, and reconstructive — rather than brute-force database search?
The examples/research/ directory contains the experiments that led to this implementation:
- Experiment 01: Baseline SDR retrieval accuracy
- Experiment 02: Semantic embedding comparison (sentence-transformers)
- Experiment 02 Hybrid: SDR + semantic fusion
- Experiment 05: Memory reconstruction from compressed representations
- Vector Duel: Head-to-head SDR vs dense embeddings (the benchmark)
See docs/research_context.md for the full research narrative and docs/architecture.md for technical details.
```shell
# Core only (numpy + sqlite, ~5MB)
pip install sdr-memory

# With research extras (torch, transformers, etc.)
pip install sdr-memory[research]

# Development
git clone https://github.com/angelsu/sdr-memory.git
cd sdr-memory
pip install -e ".[dev]"
pytest
```

Contributions welcome! Some areas that need work:
- Semantic hybrid mode: Combine SDR with lightweight embeddings for best of both worlds
- Forgetting curve: Time-decay scoring so old memories fade naturally
- Batch operations: Bulk store/query for large imports
- Index optimization: Replace brute-force scan with approximate nearest neighbor for >100K memories
- More benchmarks: Compare against Mem0, Zep, MemoryBank on standard datasets
```shell
# Development setup
git clone https://github.com/angelsu/sdr-memory.git
cd sdr-memory
pip install -e ".[dev]"
pytest --cov
```

MIT — see LICENSE.
Built with curiosity about how memory actually works, not just how databases store things.