Built on: V4 Draft + Hackathon Teams 1–5 + Infrastructure Research (Turso/Convex/Retrieval Pipeline)
Status: Pre-implementation design document
Date: 2026-02-22
Key change from V4: Turso/libSQL replaces better-sqlite3, Convex for auth/team/UI, OpenAI embedding fallback, Graphiti replaced by TS Knowledge Graph, complete retrieval pipeline from day one
- Design Philosophy and Competitive Positioning
- Infrastructure Architecture
- Memory Schema
- Memory Observer
- Scratchpad to Validated Promotion Pipeline
- Knowledge Graph
- Complete Retrieval Pipeline
- Embedding Strategy
- Agent Loop Integration
- Build Pipeline Integration
- Worker Thread Architecture and Concurrency
- Cross-Session Pattern Synthesis
- UX and Developer Trust
- Cloud Sync, Multi-Device, and Web App
- Team and Organization Memories
- Privacy and Compliance
- Database Schema
- Memory Pruning and Lifecycle
- A/B Testing and Metrics
- Implementation Checklist
- Open Questions
Auto Claude positions itself as "more control than Lovable, more automatic than Cursor or Claude Code." Memory is the primary mechanism that delivers on this promise. Every session without memory forces agents to rediscover the codebase from scratch: re-reading the same files, retrying the same failed approaches, hitting the same gotchas. With a well-designed memory system, agents navigate the codebase like senior developers who built it.
The accumulated value compounds over time:
Sessions 1-5: Cold. Agent explores from scratch every session.
High discovery cost. No patterns established.
Sessions 5-15: Co-access graph built. Prefetch patterns emerging.
Gotchas accumulating. ~30% reduction in redundant reads.
Sessions 15-30: Calibration active. QA failures no longer recur.
Workflow recipes firing at planning time.
Impact analysis preventing ripple bugs.
~60% reduction in discovery cost.
Sessions 30+: The system knows this codebase. Agents navigate it
like senior developers who built it. Context token
savings measurable in the thousands per session.
| Tier | When | Mechanism | Purpose |
|---|---|---|---|
| Passive | Session start | System prompt + initial message injection | Global memories, module memories, workflow recipes, work state |
| Reactive | Mid-session, agent-requested | `search_memory` tool in agent toolset | On-demand retrieval when agent explicitly needs context |
| Active | Mid-session, system-initiated | `prepareStep` callback in `streamText()` | Proactive injection per step based on what agent just did |
The most valuable memories are never explicitly requested. They emerge from watching what the agent does — which files it reads together, which errors it retries, which edits it immediately reverts, which approaches it abandons. Explicit record_memory calls are supplementary, not primary.
| Capability | Cursor | Windsurf | Copilot | Augment | Devin | Auto Claude V5 |
|---|---|---|---|---|---|---|
| Behavioral observation | No | Partial | No | No | No | Yes (17 signals) |
| Co-access graph | No | No | No | No | No | Yes |
| BM25 + semantic + graph hybrid | No | No | No | Partial | No | Yes |
| Graph neighborhood boost | No | No | No | No | No | Yes (+7pp, unique) |
| Cross-encoder reranking | No | No | No | No | No | Yes (local) |
| AST-based chunking | Partial | No | No | No | No | Yes (tree-sitter) |
| Contextual embeddings | No | No | No | No | No | Yes |
| Active prepareStep injection | No | No | No | No | No | Yes |
| Scratchpad-to-promotion gate | No | No | No | No | No | Yes |
| Knowledge graph (3 layers) | No | No | No | No | No | Yes |
| Same code path local + cloud | N/A | N/A | N/A | N/A | N/A | Yes (libSQL) |
Where Auto Claude uniquely wins:
- Graph neighborhood boost — 3-path hybrid retrieval that boosts results co-located in the knowledge graph. No competitor does this because none have a closure-table knowledge graph.
- Behavioral observation — watching what agents do, not what they say.
- Active prepareStep injection — the third tier that fires between every agent step.
The single most important infrastructure decision is using Turso/libSQL (@libsql/client) as the memory database. This gives us identical query code for both local Electron and cloud web app deployments.
// Free tier — Electron desktop, no login
const db = createClient({ url: 'file:memory.db' });
// Logged-in user — Electron with cloud sync
const db = createClient({
url: 'file:memory.db', // Local replica (fast reads)
syncUrl: 'libsql://project-user.turso.io',
authToken: convexAuthToken,
syncInterval: 60, // Sync every 60 seconds
});
// Web app (SaaS, Next.js) — no local file, pure cloud
const db = createClient({
url: 'libsql://project-user.turso.io',
authToken: convexAuthToken,
});

The queries are identical: FTS5, vector search, closure tables, and co-access edges use the same SQL in all three modes.
| Concern | Technology | Notes |
|---|---|---|
| Memory storage | libSQL (`@libsql/client`) | Turso Cloud in cloud mode, in-process for local |
| Vector search | sqlite-vec extension | `vector_distance_cos()`, `vector_top_k()`; both work in libSQL |
| BM25 search | FTS5 virtual table | Same in local and cloud; FTS5 not Tantivy (Tantivy is cloud-only) |
| Knowledge graph | SQLite closure tables | Recursive CTEs work in libSQL |
| Auth, billing, team UI | Convex + Better Auth | Real-time subscriptions, multi-tenancy, per-query scoping |
| Embeddings (local) | Qwen3-embedding 4b/8b via Ollama | 1024-dim primary |
| Embeddings (cloud/fallback) | OpenAI `text-embedding-3-small` | Request 1024-dim to match Qwen3 |
| Reranking (local) | Qwen3-Reranker-0.6B via Ollama | Free, ~85-380ms latency |
| Reranking (cloud) | Cohere Rerank API | ~$1/1K queries, ~200ms latency |
| AST parsing | tree-sitter WASM (`web-tree-sitter`) | No native rebuild on Electron updates |
| Agent execution | Vercel AI SDK v6 `streamText()` | Worker threads in Electron |
MODE 1: Free / Offline (Electron, no login)
└── libSQL in-process → memory.db
├── All features work offline
├── No cloud sync
└── Ollama for embeddings (or OpenAI fallback)
MODE 2: Cloud User (Electron, logged in)
└── libSQL embedded replica → memory.db + syncUrl → Turso Cloud
├── Same queries, same tables
├── Reads from local replica (fast, offline-tolerant)
├── Syncs to Turso Cloud every 60s
└── Convex for auth, team memory display, real-time UI
MODE 3: Web App (Next.js SaaS)
└── libSQL → Turso Cloud directly (no local file)
├── Same queries as Electron
├── OpenAI embeddings (no Ollama in cloud)
├── Convex for auth, billing, real-time features
└── Cohere Rerank API for cross-encoder reranking
Convex handles the application layer concerns, NOT memory storage:
| Convex handles | libSQL/Turso handles |
|---|---|
| Authentication (Better Auth) | All memory records |
| Session management | Vector embeddings |
| Team membership + roles | Knowledge graph nodes/edges |
| Billing and subscription state | FTS5 BM25 index |
| Real-time UI subscriptions | Co-access graph |
| Project metadata | Observer scratchpad data |
This clean split means Convex never touches the hot path of memory search. libSQL handles all data-intensive operations.
Every user or project gets an isolated Turso database. This is Turso's database-per-tenant model:
user-alice-project-myapp.turso.io → Alice's memory for "myapp"
user-alice-project-backend.turso.io → Alice's memory for "backend"
user-bob-project-myapp.turso.io → Bob's memory for "myapp"
No row-level security complexity. No cross-tenant leak risk. Each database is fully isolated.
| Users | Turso (Scaler $25/month base) | Convex (Pro $25/month) | OpenAI Embeddings | Total |
|---|---|---|---|---|
| 10 | $25 | $25 | <$1 | ~$51/mo |
| 100 | ~$165 | $25 | ~$3 | ~$193/mo |
| 500 | ~$1,200 | $25+ | ~$15 | ~$1,240/mo |
At 500+ users, negotiate Turso Enterprise pricing. Writes dominate the bill; embedded replica reads are free.
// apps/desktop/src/main/ai/memory/types.ts
interface Memory {
id: string; // UUID
type: MemoryType;
content: string;
confidence: number; // 0.0 - 1.0
tags: string[];
relatedFiles: string[];
relatedModules: string[];
createdAt: string; // ISO 8601
lastAccessedAt: string;
accessCount: number;
workUnitRef?: WorkUnitRef;
scope: MemoryScope;
// Provenance
source: MemorySource;
sessionId: string;
commitSha?: string;
provenanceSessionIds: string[];
// Knowledge graph link
targetNodeId?: string;
impactedNodeIds?: string[];
// Relations
relations?: MemoryRelation[];
// Decay
decayHalfLifeDays?: number;
// Trust
needsReview?: boolean;
userVerified?: boolean;
citationText?: string; // Max 40 chars, for inline chips
pinned?: boolean; // Pinned memories never decay
methodology?: string; // Which plugin created this (for cross-plugin retrieval)
// Chunking metadata (V5 new — for AST-chunked code memories)
chunkType?: 'function' | 'class' | 'module' | 'prose';
chunkStartLine?: number;
chunkEndLine?: number;
contextPrefix?: string; // Prepended at embed time for contextual embeddings
}
type MemoryType =
// Core
| 'gotcha' // Trap or non-obvious constraint
| 'decision' // Architectural decision with rationale
| 'preference' // User or project coding preference
| 'pattern' // Reusable implementation pattern
| 'requirement' // Functional or non-functional requirement
| 'error_pattern' // Recurring error and its fix
| 'module_insight' // Understanding about a module's purpose
// Active loop
| 'prefetch_pattern' // Files always/frequently read together
| 'work_state' // Partial work snapshot for cross-session continuity
| 'causal_dependency' // File A must be touched when file B changes
| 'task_calibration' // Actual vs planned step ratio per module
// V3+
| 'e2e_observation' // UI behavioral fact from MCP tool use
| 'dead_end' // Strategic approach tried and abandoned
| 'work_unit_outcome' // Per work-unit result
| 'workflow_recipe' // Step-by-step procedural map
| 'context_cost'; // Token consumption profile per module
type MemorySource =
| 'agent_explicit' // Agent called record_memory
| 'observer_inferred' // MemoryObserver derived from behavioral signals
| 'qa_auto' // Auto-extracted from QA report failures
| 'mcp_auto' // Auto-extracted from Electron MCP tool results
| 'commit_auto' // Auto-tagged at git commit time
| 'user_taught'; // User typed /remember or used Teach panel
type MemoryScope = 'global' | 'module' | 'work_unit' | 'session';
interface WorkUnitRef {
methodology: string; // 'native' | 'bmad' | 'tdd'
hierarchy: string[]; // e.g. ['spec_042', 'subtask_3']
label: string;
}
type UniversalPhase =
| 'define' // Planning, spec creation, writing failing tests
| 'implement' // Coding, development
| 'validate' // QA, acceptance criteria
| 'refine' // Refactoring, cleanup, fixing QA issues
| 'explore' // Research, insights, discovery
| 'reflect'; // Session wrap-up, learning capture
interface MemoryRelation {
targetMemoryId?: string;
targetFilePath?: string;
relationType: 'required_with' | 'conflicts_with' | 'validates' | 'supersedes' | 'derived_from';
confidence: number;
autoExtracted: boolean;
}

interface WorkflowRecipe extends Memory {
type: 'workflow_recipe';
taskPattern: string; // "adding a new IPC handler"
steps: Array<{
order: number;
description: string;
canonicalFile?: string;
canonicalLine?: number;
}>;
lastValidatedAt: string;
successCount: number;
scope: 'global';
}
interface DeadEndMemory extends Memory {
type: 'dead_end';
approachTried: string;
whyItFailed: string;
alternativeUsed: string;
taskContext: string;
decayHalfLifeDays: 90;
}
interface PrefetchPattern extends Memory {
type: 'prefetch_pattern';
alwaysReadFiles: string[]; // >80% session coverage
frequentlyReadFiles: string[]; // >50% session coverage
moduleTrigger: string;
sessionCount: number;
scope: 'module';
}
interface TaskCalibration extends Memory {
type: 'task_calibration';
module: string;
methodology: string;
averageActualSteps: number;
averagePlannedSteps: number;
ratio: number;
sampleCount: number;
}

All methodology phases map into six `UniversalPhase` values. The retrieval engine operates exclusively on `UniversalPhase`.
interface MemoryMethodologyPlugin {
id: string;
displayName: string;
mapPhase(methodologyPhase: string): UniversalPhase;
resolveWorkUnitRef(context: ExecutionContext): WorkUnitRef;
getRelayTransitions(): RelayTransition[];
formatRelayContext(memories: Memory[], toStage: string): string;
extractWorkState(sessionOutput: string): Promise<Record<string, unknown>>;
formatWorkStateContext(state: Record<string, unknown>): string;
customMemoryTypes?: MemoryTypeDefinition[];
onWorkUnitComplete?(ctx: ExecutionContext, result: WorkUnitResult, svc: MemoryService): Promise<void>;
}
const nativePlugin: MemoryMethodologyPlugin = {
id: 'native',
displayName: 'Auto Claude (Subtasks)',
mapPhase: (p) => ({
planning: 'define', spec: 'define',
coding: 'implement',
qa_review: 'validate', qa_fix: 'refine',
debugging: 'refine',
insights: 'explore',
}[p] ?? 'explore'),
resolveWorkUnitRef: (ctx) => ({
methodology: 'native',
hierarchy: [ctx.specNumber, ctx.subtaskId].filter(Boolean),
label: ctx.subtaskId
? `Spec ${ctx.specNumber} / Subtask ${ctx.subtaskId}`
: `Spec ${ctx.specNumber}`,
}),
getRelayTransitions: () => [
{ from: 'planner', to: 'coder' },
{ from: 'coder', to: 'qa_reviewer' },
{ from: 'qa_reviewer', to: 'qa_fixer', filter: { types: ['error_pattern', 'requirement'] } },
],
};

The Observer is the passive behavioral layer. It runs on the main thread, tapping every postMessage event from worker threads. It never writes to the database during execution.
Signal value formula: signal_value = (diagnostic_value × 0.5) + (cross_session_relevance × 0.3) + ((1.0 - false_positive_rate) × 0.2)
Signals with signal_value < 0.4 are discarded before promotion filtering.
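The formula and the 0.4 cutoff can be sketched as a pair of small functions. This is a minimal illustration, assuming all three inputs are normalized to the range 0 to 1; the `SignalInputs` shape and the function names are hypothetical and not part of the design.

```typescript
// Hypothetical input shape: each field is assumed to lie in [0, 1].
interface SignalInputs {
  diagnosticValue: number;
  crossSessionRelevance: number;
  falsePositiveRate: number;
}

// Cutoff stated in the text: signals scoring below this are discarded.
const DISCARD_THRESHOLD = 0.4;

// Weighted sum from the signal value formula.
function signalValue(s: SignalInputs): number {
  return (
    s.diagnosticValue * 0.5 +
    s.crossSessionRelevance * 0.3 +
    (1.0 - s.falsePositiveRate) * 0.2
  );
}

// True when the signal survives the pre-promotion discard step.
function shouldKeepSignal(s: SignalInputs): boolean {
  return signalValue(s) >= DISCARD_THRESHOLD;
}
```

A signal with low diagnostic value and a high false-positive rate falls well under the cutoff, while a strongly diagnostic signal passes easily.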
| # | Signal Class | Score | Promotes To | Min Sessions |
|---|---|---|---|---|
| 2 | Co-Access Graph | 0.91 | causal_dependency, prefetch_pattern | 3 |
| 9 | Self-Correction | 0.88 | gotcha, module_insight | 1 |
| 3 | Error-Retry | 0.85 | error_pattern, gotcha | 2 |
| 16 | Parallel Conflict | 0.82 | gotcha | 1 |
| 5 | Read-Abandon | 0.79 | gotcha | 3 |
| 6 | Repeated Grep | 0.76 | module_insight, gotcha | 2 |
| 13 | Test Order | 0.74 | task_calibration | 3 |
| 7 | Tool Sequence | 0.73 | workflow_recipe | 3 |
| 1 | File Access | 0.72 | prefetch_pattern | 3 |
| 15 | Step Overrun | 0.71 | task_calibration | 3 |
| 4 | Backtrack | 0.68 | gotcha | 2 |
| 14 | Config Touch | 0.66 | causal_dependency | 2 |
| 11 | Glob-Ignore | 0.64 | gotcha | 2 |
| 17 | Context Token Spike | 0.63 | context_cost | 3 |
| 10 | External Reference | 0.61 | module_insight | 3 |
| 12 | Import Chase | 0.52 | causal_dependency | 4 |
| 8 | Time Anomaly | 0.48 | (with correlation) | 3 |
const SELF_CORRECTION_PATTERNS = [
/I was wrong about (.+?)\. (.+?) is actually/i,
/Let me reconsider[.:]? (.+)/i,
/Actually,? (.+?) (not|instead of|rather than) (.+)/i,
/I initially thought (.+?) but (.+)/i,
/Correction: (.+)/i,
/Wait[,.]? (.+)/i,
];

Inspired by the Windsurf SpAIware exploit. Any signal derived from agent output produced after a WebFetch or WebSearch call is flagged as potentially tainted:
function applyTrustGate(
candidate: MemoryCandidate,
externalToolCallStep: number | undefined,
): MemoryCandidate {
if (externalToolCallStep !== undefined && candidate.originatingStep > externalToolCallStep) {
return {
...candidate,
needsReview: true,
confidence: candidate.confidence * 0.7,
trustFlags: { contaminated: true, contaminationSource: 'web_fetch' },
};
}
return candidate;
}

| Resource | Hard Limit | Enforcement |
|---|---|---|
| CPU per event (ingest) | 2ms | process.hrtime.bigint() measurement; logged if exceeded, never throw |
| CPU for finalize (non-LLM) | 100ms | Budget tracked; abort if exceeded |
| Scratchpad resident memory | 50MB | Pre-allocated buffers; evict low-value signals on overflow |
| LLM synthesis calls per session | 1 max | Counter enforced in finalize() |
| Memories promoted per session | 20 (build), 5 (insights), 3 (others) | Hard cap |
| DB writes per session | 1 batched transaction after finalize | No writes during execution |
// Dead-end detection patterns (from agent text stream)
const DEAD_END_LANGUAGE_PATTERNS = [
/this approach (won't|will not|cannot) work/i,
/I need to abandon this/i,
/let me try a different approach/i,
/unavailable in (test|ci|production)/i,
/not available in this environment/i,
];
// In-session early promotion triggers
// The array is typed explicitly so every `condition` callback gets a typed parameter
const EARLY_TRIGGERS: Array<{
  condition: (a: ScratchpadAnalytics) => boolean;
  signalType: SignalType;
  priority: number;
}> = [
  { condition: (a) => a.selfCorrectionCount >= 1, signalType: 'self_correction', priority: 0.9 },
  { condition: (a) => [...a.grepPatternCounts.values()].some(c => c >= 3), signalType: 'repeated_grep', priority: 0.8 },
  { condition: (a) => a.configFilesTouched.size > 0 && a.fileEditSet.size >= 2, signalType: 'config_touch', priority: 0.7 },
  { condition: (a) => a.errorFingerprints.size >= 2, signalType: 'error_retry', priority: 0.75 },
];

export class MemoryObserver {
private readonly scratchpad: Scratchpad;
private externalToolCallStep: number | undefined = undefined;
observe(message: MemoryIpcRequest): void {
const start = process.hrtime.bigint();
switch (message.type) {
case 'memory:tool-call': this.onToolCall(message); break;
case 'memory:tool-result': this.onToolResult(message); break;
case 'memory:reasoning': this.onReasoning(message); break;
case 'memory:step-complete': this.onStepComplete(message.stepNumber); break;
}
const elapsed = Number(process.hrtime.bigint() - start) / 1_000_000;
if (elapsed > 2) {
logger.warn(`[MemoryObserver] observe() budget exceeded: ${elapsed.toFixed(2)}ms`);
}
}
async finalize(outcome: SessionOutcome): Promise<MemoryCandidate[]> {
const candidates = [
...this.finalizeCoAccess(),
...this.finalizeErrorRetry(),
...this.finalizeAcuteCandidates(),
...this.finalizeRepeatedGrep(),
...this.finalizeSequences(),
];
const gated = candidates.map(c => applyTrustGate(c, this.externalToolCallStep));
const gateLimit = SESSION_TYPE_PROMOTION_LIMITS[this.scratchpad.sessionType];
const filtered = gated.sort((a, b) => b.priority - a.priority).slice(0, gateLimit);
if (outcome === 'success' && filtered.some(c => c.signalType === 'co_access')) {
const synthesized = await this.synthesizeWithLLM(filtered);
filtered.push(...synthesized);
}
return filtered;
}
}

interface Scratchpad {
sessionId: string;
sessionType: SessionType;
startedAt: number;
signals: Map<SignalType, ObserverSignal[]>;
analytics: ScratchpadAnalytics;
acuteCandidates: AcuteCandidate[];
}
interface ScratchpadAnalytics {
fileAccessCounts: Map<string, number>;
fileFirstAccess: Map<string, number>;
fileLastAccess: Map<string, number>;
fileEditSet: Set<string>;
grepPatternCounts: Map<string, number>;
errorFingerprints: Map<string, number>;
currentStep: number;
recentToolSequence: CircularBuffer<string>; // last 8 tool calls
intraSessionCoAccess: Map<string, Set<string>>;
configFilesTouched: Set<string>;
selfCorrectionCount: number;
totalInputTokens: number;
}

| Session Type | Gate Trigger | Max Memories | Primary Signals |
|---|---|---|---|
| Build (full pipeline) | QA passes | 20 | All 17 signals |
| Insights | Session end | 5 | co_access, self_correction, repeated_grep |
| Roadmap | Session end | 3 | decision, requirement |
| Terminal (agent terminal) | Session end | 3 | error_retry, sequence |
| Changelog | Skip | 0 | None |
| Spec Creation | Spec accepted | 3 | file_access, module_insight |
| PR Review | Review completed | 8 | error_retry, self_correction |
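The `finalize()` method refers to `SESSION_TYPE_PROMOTION_LIMITS` without defining it. One way to encode the "Max Memories" column of the table is sketched below. The session type keys are hypothetical names for the table rows, and `capCandidates` is an illustrative helper rather than part of the design.

```typescript
// Hypothetical keys, one per row of the session type table.
type SessionType =
  | 'build'
  | 'insights'
  | 'roadmap'
  | 'terminal'
  | 'changelog'
  | 'spec_creation'
  | 'pr_review';

// Values copied from the "Max Memories" column.
const SESSION_TYPE_PROMOTION_LIMITS: Record<SessionType, number> = {
  build: 20,
  insights: 5,
  roadmap: 3,
  terminal: 3,
  changelog: 0, // gate is skipped, nothing is promoted
  spec_creation: 3,
  pr_review: 8,
};

// Keep only the highest-priority candidates allowed for this session type.
function capCandidates<T extends { priority: number }>(
  candidates: T[],
  sessionType: SessionType,
): T[] {
  const limit = SESSION_TYPE_PROMOTION_LIMITS[sessionType];
  // Copy before sorting so the caller's array is not mutated.
  return [...candidates].sort((a, b) => b.priority - a.priority).slice(0, limit);
}
```

Sorting a copy keeps the helper side-effect free, which matters if the scratchpad still needs the original ordering for a checkpoint.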
- Validation filter: discard signals from failed approaches (unless becoming `dead_end`)
- Frequency filter: require minimum sessions per signal class
- Novelty filter: cosine similarity > 0.88 to existing memory = discard
- Trust gate: contamination check for post-external-tool signals
- Scoring: final confidence from signal priority + session count + source trust multiplier
- LLM synthesis: single `generateText()` call that turns raw signal data into 1-3 sentences of memory content
- Embedding generation: batch embed all promoted memories
- DB write: single transaction for all promoted memories
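The novelty filter in the list above can be sketched as a plain cosine comparison against existing memory embeddings. This is a minimal in-memory version under the assumption that embeddings are available as number arrays; a real implementation would more likely query the vector index. The function names are illustrative.

```typescript
// Threshold from the novelty filter step: anything more similar is a duplicate.
const NOVELTY_THRESHOLD = 0.88;

// Cosine similarity of two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// A candidate is novel when no existing memory exceeds the similarity threshold.
function isNovel(candidate: number[], existing: number[][]): boolean {
  return existing.every((e) => cosineSimilarity(candidate, e) <= NOVELTY_THRESHOLD);
}
```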
At each subtask boundary, checkpoint the scratchpad to disk to survive Electron crashes during long pipelines:
await scratchpadStore.checkpoint(workUnitRef, sessionId);
// On restart: restore from checkpoint and continue

For builds with more than 5 subtasks, promote scratchpad notes after each validated subtask rather than waiting for the full pipeline.
Fully TypeScript. Graphiti Python MCP sidecar is removed. All structural and semantic code intelligence lives here.
LAYER 3: KNOWLEDGE (agent-discovered + LLM-analyzed)
+----------------------------------------------------------+
| [Pattern: Repository] [Decision: JWT over sessions] |
| | applies_pattern | documents |
+----------------------------------------------------------+
LAYER 2: SEMANTIC (LLM-derived module relationships)
+----------------------------------------------------------+
| [Module: auth] --is_entrypoint_for--> [routes/auth.ts]|
| [Fn: login()] --flows_to--> [Fn: validateCreds()] |
+----------------------------------------------------------+
LAYER 1: STRUCTURAL (AST-extracted via tree-sitter WASM)
+----------------------------------------------------------+
| [File: routes/auth.ts] |
| | imports |
| v |
| [File: middleware/auth.ts] --calls--> [Fn: verifyJwt()] |
+----------------------------------------------------------+
Layer 1: computed from code — fast, accurate, automatically maintained via file watchers. Layer 2: LLM analysis of Layer 1 subgraphs — async, scheduled. Layer 3: accumulates from agent sessions and user input — continuous, incremental.
import Parser from 'web-tree-sitter';
import { app } from 'electron';
import { join } from 'path';
const GRAMMAR_PATHS: Record<string, string> = {
typescript: 'tree-sitter-typescript.wasm',
tsx: 'tree-sitter-tsx.wasm',
python: 'tree-sitter-python.wasm',
rust: 'tree-sitter-rust.wasm',
go: 'tree-sitter-go.wasm',
javascript: 'tree-sitter-javascript.wasm',
};
export class TreeSitterLoader {
private getWasmDir(): string {
return app.isPackaged
? join(process.resourcesPath, 'grammars')
: join(__dirname, '..', '..', '..', '..', 'node_modules', 'tree-sitter-wasms');
}
async initialize(): Promise<void> {
await Parser.init({ locateFile: (f) => join(this.getWasmDir(), f) });
}
async loadGrammar(lang: string): Promise<Parser.Language | null> {
const wasmFile = GRAMMAR_PATHS[lang];
if (!wasmFile) return null;
return Parser.Language.load(join(this.getWasmDir(), wasmFile));
}
}

Grammar load time: ~50ms per grammar. Incremental re-parse: <5ms on edit. No native rebuild on Electron updates.
Instead of chunking code by fixed line counts, split at function/class boundaries using tree-sitter. This prevents function bodies from being split across chunks.
interface ASTChunk {
content: string;
filePath: string;
language: string;
chunkType: 'function' | 'class' | 'module' | 'prose';
startLine: number;
endLine: number;
name?: string; // Function name, class name, etc.
contextPrefix: string; // Prepended at embed time
}
export async function chunkFileByAST(
filePath: string,
content: string,
lang: string,
parser: Parser,
): Promise<ASTChunk[]> {
const tree = parser.parse(content);
const chunks: ASTChunk[] = [];
// Walk tree looking for function/class declarations
// Split at these boundaries; never split a function body across chunks
// For files with no AST structure (JSON, .md), fall back to 100-line chunks
const query = CHUNK_QUERIES[lang];
if (!query) return fallbackChunks(content, filePath);
const matches = query.matches(tree.rootNode);
for (const match of matches) {
const node = match.captures[0].node;
chunks.push({
content: node.text,
filePath,
language: lang,
chunkType: nodeTypeToChunkType(node.type),
startLine: node.startPosition.row + 1,
endLine: node.endPosition.row + 1,
name: extractName(node),
contextPrefix: buildContextPrefix(filePath, node),
});
}
return chunks;
}

The `contextPrefix` is critical: it is prepended at embed time for contextual embeddings (see Section 8).
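`chunkFileByAST` calls `fallbackChunks(content, filePath)` for files without a usable grammar, but that helper is not shown. A minimal sketch of the 100-line fallback follows. The `FallbackChunk` shape is a trimmed-down, hypothetical stand-in for `ASTChunk`, and the optional `linesPerChunk` parameter is an addition for illustration.

```typescript
// Trimmed-down chunk shape for this sketch; the document's ASTChunk has more fields.
interface FallbackChunk {
  content: string;
  filePath: string;
  chunkType: 'prose';
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

// Split a file into fixed-size line windows when no AST structure is available.
function fallbackChunks(
  content: string,
  filePath: string,
  linesPerChunk: number = 100,
): FallbackChunk[] {
  if (linesPerChunk < 1) throw new Error('linesPerChunk must be >= 1');
  const lines = content.split('\n');
  const chunks: FallbackChunk[] = [];
  for (let start = 0; start < lines.length; start += linesPerChunk) {
    const slice = lines.slice(start, start + linesPerChunk);
    chunks.push({
      content: slice.join('\n'),
      filePath,
      chunkType: 'prose',
      startLine: start + 1,
      endLine: start + slice.length,
    });
  }
  return chunks;
}
```

Line numbers are kept 1-based to match the `startPosition.row + 1` convention used in `chunkFileByAST`.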
The pre-computed closure turns "what breaks if I change X?" into a single indexed lookup, with no recursive traversal at query time:
// Agent tool call: analyzeImpact({ target: "auth/tokens.ts:verifyJwt", maxDepth: 3 })
// SQL:
// SELECT descendant_id, depth, path, total_weight
// FROM graph_closure
// WHERE ancestor_id = ? AND depth <= 3
// ORDER BY depth, total_weight DESC

When a source file changes, immediately mark all edges from it as stale (stale_at = NOW()). Re-index asynchronously. Agents always query WHERE stale_at IS NULL.
// IncrementalIndexer: chokidar file watcher with 500ms debounce
// On change: markFileEdgesStale(filePath) → rebuildEdges(filePath) → updateClosure()

Migrate from SQLite closure tables to Kuzu graph database when:
- 50,000+ graph nodes, OR
- 500MB SQLite size, OR
- P99 graph query latency > 100ms
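These migration triggers are simple enough to check in a periodic health job. The sketch below assumes a hypothetical `GraphStats` snapshot and treats "500MB" as 500 × 1024 × 1024 bytes; how the stats are collected is left open.

```typescript
// Hypothetical snapshot of graph health metrics.
interface GraphStats {
  nodeCount: number;
  dbSizeBytes: number;
  p99QueryLatencyMs: number;
}

// Thresholds from the list above (500MB interpreted as mebibytes, an assumption).
const MAX_NODES = 50_000;
const MAX_DB_BYTES = 500 * 1024 * 1024;
const MAX_P99_MS = 100;

// Returns the list of triggered conditions; an empty list means stay on SQLite.
function kuzuMigrationReasons(stats: GraphStats): string[] {
  const reasons: string[] = [];
  if (stats.nodeCount >= MAX_NODES) reasons.push('node_count');
  if (stats.dbSizeBytes >= MAX_DB_BYTES) reasons.push('db_size');
  if (stats.p99QueryLatencyMs > MAX_P99_MS) reasons.push('p99_latency');
  return reasons;
}
```

Returning the reasons rather than a bare boolean makes it easier to log why a migration was recommended.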
V5 builds the complete pipeline from day one. No phased introduction of retrieval tiers.
Stage 1: CANDIDATE GENERATION (parallel, ~10-50ms)
├── Path A: Dense vector search via sqlite-vec
│ └── 256-dim MRL query → top 30 (cosine similarity, fast)
├── Path B: FTS5 BM25 keyword search
│ └── Exact technical terms → top 20
└── Path C: Knowledge graph traversal
└── Files in recently accessed module → 1-hop neighbors → top 15
De-duplicate across paths.
Total: ~50-70 candidates.
Stage 2a: RRF FUSION + PHASE FILTERING (~2ms)
└── Weighted Reciprocal Rank Fusion (identifier queries: FTS5 0.5 / graph 0.3 / dense 0.2)
(semantic queries: dense 0.5 / FTS5 0.25 / graph 0.25)
(structural queries: graph 0.6 / FTS5 0.25 / dense 0.15)
Stage 2b: GRAPH NEIGHBORHOOD BOOST (~5ms) ← FREE LUNCH, UNIQUE ADVANTAGE
└── For each top-10 result, query closure table for 1-hop neighbors
Boost candidates in positions 11-50 that neighbor top results:
boosted_score = rrf_score + 0.3 × (neighbor_count / 10)
Stage 3: CROSS-ENCODER RERANKING (~85-380ms, local Electron only)
├── Qwen3-Reranker-0.6B via Ollama
├── Top 20 candidates → final top 8
└── In cloud/web mode, use Cohere Rerank API (~$1/1K queries)
Stage 4: CONTEXT PACKING (~1ms)
├── Deduplicate overlapping chunks
├── Cluster by file locality
├── Pack into token budget per phase
└── Append citation chip format to each memory
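Stage 4's "pack into token budget" step can be illustrated with a greedy packer. This is a sketch under two assumptions that are not in the design: tokens are estimated at roughly four characters each, and only a single total budget is enforced (the per-type allocations are ignored here). All names are hypothetical.

```typescript
// Hypothetical candidate shape for the packing sketch.
interface PackCandidate {
  id: string;
  content: string;
  score: number;
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily take the highest-scoring candidates that still fit in the budget.
function packIntoBudget(candidates: PackCandidate[], totalBudget: number): PackCandidate[] {
  const packed: PackCandidate[] = [];
  let used = 0;
  const byScore = [...candidates].sort((a, b) => b.score - a.score);
  for (const c of byScore) {
    const cost = estimateTokens(c.content);
    // Skip items that do not fit; a smaller, lower-ranked item may still fit later.
    if (used + cost > totalBudget) continue;
    packed.push(c);
    used += cost;
  }
  return packed;
}
```

Skipping rather than stopping at the first oversized item lets short, lower-ranked memories fill the remaining budget.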
function detectQueryType(query: string, recentToolCalls: string[]): 'identifier' | 'semantic' | 'structural' {
// Identifier: query contains camelCase, snake_case, or known file paths
if (/[a-z][A-Z]|_[a-z]/.test(query) || query.includes('/')) return 'identifier';
// Structural: recent tool calls include analyzeImpact or graph queries
if (recentToolCalls.some(t => t === 'analyzeImpact' || t === 'getDependencies')) return 'structural';
return 'semantic';
}

Note: FTS5 is used in ALL modes (local and cloud). Turso's Tantivy is cloud-only and inconsistent. FTS5 is simpler and identical everywhere.
-- BM25 search
SELECT m.id, bm25(memories_fts) AS bm25_score
FROM memories_fts
JOIN memories m ON memories_fts.memory_id = m.id
WHERE memories_fts MATCH ?
AND m.project_id = ?
AND m.deprecated = 0
ORDER BY bm25_score -- lower is better in SQLite FTS5
LIMIT 100;

function weightedRRF(
paths: Array<{ results: Array<{ memoryId: string }>; weight: number }>,
k: number = 60,
): Map<string, number> {
const scores = new Map<string, number>();
for (const { results, weight } of paths) {
results.forEach((r, rank) => {
const contribution = weight / (k + rank + 1);
scores.set(r.memoryId, (scores.get(r.memoryId) ?? 0) + contribution);
});
}
return scores;
}

IMPORTANT: libSQL FULL OUTER JOIN workaround. libSQL doesn't support FULL OUTER JOIN. Use a UNION pattern for RRF merging:
-- Merge dense and BM25 results without FULL OUTER JOIN
SELECT id FROM (
SELECT memory_id AS id FROM dense_results
UNION
SELECT memory_id AS id FROM bm25_results
)

RRF scoring is done application-side after fetching both result sets.
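As a sketch of that application-side step, the example below fuses ranked ID lists from the three retrieval paths using the per-query-type weights listed in Stage 2a. It re-implements the weighted RRF sum inline so the block is self-contained; `fuseResults` and the `PathName` type are illustrative names, not part of the design.

```typescript
type QueryType = 'identifier' | 'semantic' | 'structural';
type PathName = 'dense' | 'fts5' | 'graph';

// Weights from the Stage 2a description, keyed by detected query type.
const RRF_WEIGHTS: Record<QueryType, Record<PathName, number>> = {
  identifier: { fts5: 0.5, graph: 0.3, dense: 0.2 },
  semantic: { dense: 0.5, fts5: 0.25, graph: 0.25 },
  structural: { graph: 0.6, fts5: 0.25, dense: 0.15 },
};

// Fuse per-path rankings (best first) into one list sorted by weighted RRF score.
function fuseResults(
  queryType: QueryType,
  rankedIds: Record<PathName, string[]>,
  k: number = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  const paths: PathName[] = ['dense', 'fts5', 'graph'];
  for (const path of paths) {
    const weight = RRF_WEIGHTS[queryType][path];
    rankedIds[path].forEach((id, rank) => {
      // rank is 0-based, so the top result contributes weight / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

A memory that appears in several paths accumulates contributions from each, which is what lets a mid-ranked result in two paths overtake a top result found by only one.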
This is Auto Claude's primary competitive differentiator in retrieval. No competitor does this.
async function applyGraphNeighborhoodBoost(
  rankedCandidates: RankedMemory[],
  topK: number = 10,
): Promise<RankedMemory[]> {
  // Step 1: Get the file paths of the top-K results
  const topFiles = rankedCandidates.slice(0, topK).flatMap(m => m.relatedFiles);
  // An empty IN () list is a SQL syntax error, so skip the boost when there is nothing to expand
  if (topFiles.length === 0) return rankedCandidates;
  // Step 2: Query closure table for 1-hop neighbors of those files.
  // graph_nodes is joined a second time so neighbors come back as file paths,
  // which is what candidate.relatedFiles holds (descendant_id alone is a node ID).
  const neighbors = await db.execute({
    sql: `
      SELECT DISTINCT dn.file_path
      FROM graph_closure gc
      JOIN graph_nodes an ON gc.ancestor_id = an.id
      JOIN graph_nodes dn ON gc.descendant_id = dn.id
      WHERE an.file_path IN (${topFiles.map(() => '?').join(',')})
        AND gc.depth = 1
        AND dn.file_path IS NOT NULL
    `,
    args: topFiles,
  });
  const neighborFiles = new Set(neighbors.rows.map(r => r.file_path as string));
  // Step 3: Boost candidates below the top-K that share files with neighbors
  return rankedCandidates.map((candidate, rank) => {
    if (rank < topK) return candidate;
    const neighborCount = candidate.relatedFiles.filter(f =>
      neighborFiles.has(f)
    ).length;
    if (neighborCount === 0) return candidate;
    return {
      ...candidate,
      score: candidate.score + 0.3 * (neighborCount / Math.max(topFiles.length, 1)),
      boostReason: 'graph_neighborhood',
    };
  }).sort((a, b) => b.score - a.score);
}

Expected improvement: +7 percentage points on retrieval quality with ~5ms additional latency.
const PHASE_WEIGHTS: Record<UniversalPhase, Partial<Record<MemoryType, number>>> = {
define: {
workflow_recipe: 1.4, dead_end: 1.2, requirement: 1.2,
decision: 1.1, task_calibration: 1.1,
gotcha: 0.8, error_pattern: 0.8,
},
implement: {
gotcha: 1.4, error_pattern: 1.3, causal_dependency: 1.2,
pattern: 1.1, dead_end: 1.2, prefetch_pattern: 1.1,
},
validate: {
error_pattern: 1.4, e2e_observation: 1.4, requirement: 1.2,
work_unit_outcome: 1.1,
},
refine: {
error_pattern: 1.3, gotcha: 1.2, dead_end: 1.2, pattern: 1.0,
},
explore: {
module_insight: 1.4, decision: 1.2, pattern: 1.1, causal_dependency: 1.0,
},
reflect: {
work_unit_outcome: 1.4, task_calibration: 1.3, dead_end: 1.1,
},
};
const SOURCE_TRUST_MULTIPLIERS: Record<MemorySource, number> = {
user_taught: 1.4,
agent_explicit: 1.2,
qa_auto: 1.1,
mcp_auto: 1.0,
commit_auto: 1.0,
observer_inferred: 0.85,
};
function computeFinalScore(memory: Memory, queryEmbedding: number[], phase: UniversalPhase): number {
const cosine = cosineSimilarity(memory.embedding, queryEmbedding);
const recency = Math.exp(-daysSince(memory.lastAccessedAt) * volatilityDecayRate(memory.relatedFiles));
const frequency = Math.log1p(memory.accessCount) / Math.log1p(100);
const base = 0.6 * cosine + 0.25 * recency + 0.15 * frequency;
const phaseWeight = PHASE_WEIGHTS[phase][memory.type] ?? 1.0;
const trustWeight = SOURCE_TRUST_MULTIPLIERS[memory.source];
return base * phaseWeight * trustWeight * memory.confidence;
}

const DEFAULT_PACKING_CONFIG: Record<UniversalPhase, ContextPackingConfig> = {
define: { totalBudget: 2500, allocation: { workflow_recipe: 0.30, requirement: 0.20, decision: 0.20, dead_end: 0.15, task_calibration: 0.10, other: 0.05 } },
implement: { totalBudget: 3000, allocation: { gotcha: 0.30, error_pattern: 0.25, causal_dependency: 0.15, pattern: 0.15, dead_end: 0.10, other: 0.05 } },
validate: { totalBudget: 2500, allocation: { error_pattern: 0.30, requirement: 0.25, e2e_observation: 0.25, work_unit_outcome: 0.15, other: 0.05 } },
refine: { totalBudget: 2000, allocation: { error_pattern: 0.35, gotcha: 0.25, dead_end: 0.20, pattern: 0.15, other: 0.05 } },
explore: { totalBudget: 2000, allocation: { module_insight: 0.40, decision: 0.25, pattern: 0.20, causal_dependency: 0.15 } },
reflect: { totalBudget: 1500, allocation: { work_unit_outcome: 0.40, task_calibration: 0.35, dead_end: 0.15, other: 0.10 } },
};

When fewer than 3 results score above 0.5 after all pipeline stages, generate a hypothetical ideal memory and use that for a secondary dense search:
// Applied only for search_memory tool calls (T3), never for proactive injection
if (topResults.filter(r => r.score > 0.5).length < 3) {
  const hypoMemory = await generateText({
    model: fastModel,
    prompt: `Write a 2-sentence memory that would perfectly answer: "${query}"`,
    maxOutputTokens: 100, // AI SDK v5+ name for the former maxTokens option
  });
  // embed() calls a model, so it is assumed to be async; await it before searching
  return denseSearch(await embed(hypoMemory.text), filters);
}1. `memory.staleAt` explicitly set (manual deprecation or file deletion)
2. `memory.lastAccessedAt` older than `memory.decayHalfLifeDays` — confidence penalty applied
3. `relatedFiles` changed in git log since `memory.commitSha` — confidence reduced proportionally
4. File modification time newer than `memory.createdAt` by more than 30 days — trigger review flag
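The four checks above could be combined into a single verdict roughly as follows. This is a minimal sketch rather than the project's implementation: the `StalenessInput` shape, the 0.5 idle penalty, and the `1 - 0.5 × changedRatio` reduction are assumptions chosen only to make the ordering of the checks concrete.

```typescript
// Hypothetical input shape: only the fields the four checks need.
interface StalenessInput {
  staleAt?: string;                 // ISO timestamp, set on manual deprecation or file deletion
  lastAccessedAt: string;           // ISO timestamp
  createdAt: string;                // ISO timestamp
  decayHalfLifeDays?: number;
  relatedFilesChangedRatio: number; // assumed: share of relatedFiles changed since commitSha (0..1)
  newestFileMtime?: string;         // assumed: latest mtime among relatedFiles
}

interface StalenessVerdict {
  stale: boolean;
  confidenceMultiplier: number;
  needsReview: boolean;
}

const DAY_MS = 86_400_000;

function assessStaleness(m: StalenessInput, now: Date): StalenessVerdict {
  // Check 1: an explicit staleAt in the past wins outright.
  if (m.staleAt !== undefined && Date.parse(m.staleAt) <= now.getTime()) {
    return { stale: true, confidenceMultiplier: 0, needsReview: false };
  }

  let multiplier = 1;

  // Check 2: idle longer than the half-life. The 0.5 penalty is an assumed value.
  const idleDays = (now.getTime() - Date.parse(m.lastAccessedAt)) / DAY_MS;
  if (m.decayHalfLifeDays !== undefined && idleDays > m.decayHalfLifeDays) {
    multiplier *= 0.5;
  }

  // Check 3: reduce confidence in proportion to how many related files changed.
  const changed = Math.min(Math.max(m.relatedFilesChangedRatio, 0), 1);
  multiplier *= 1 - 0.5 * changed;

  // Check 4: a related file was modified more than 30 days after the memory was created.
  const needsReview =
    m.newestFileMtime !== undefined &&
    (Date.parse(m.newestFileMtime) - Date.parse(m.createdAt)) / DAY_MS > 30;

  return { stale: false, confidenceMultiplier: multiplier, needsReview };
}
```

Returning a multiplier instead of mutating the stored confidence keeps the check side-effect free, so it can run at retrieval time without writing to the database.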
- OpenAI replaces Voyage as API fallback: `text-embedding-3-small` at 1024-dim
- Contextual embeddings built in from day one: prepend file/module context before every embed
- 1024-dim everywhere: OpenAI requests 1024-dim to match Qwen3 storage format
| Priority | Model | When Available | Dims | Notes |
|---|---|---|---|---|
| 1 | `qwen3-embedding:8b` via Ollama | >32GB RAM available | 1024 (MRL) | SOTA local, auto-selected by RAM check |
| 2 | `qwen3-embedding:4b` via Ollama | Ollama running (recommended) | 1024 (MRL) | Default recommendation |
| 3 | `qwen3-embedding:0.6b` via Ollama | Low-memory machines | 1024 | For Stage 1 candidate generation |
| 4 | OpenAI `text-embedding-3-small` | API key configured | 1024 | Request `dimensions: 1024` explicitly |
| 5 | ONNX bundled `bge-small-en-v1.5` | Always | 384 | Zero-config fallback, ~100MB |
Dimension consistency note: OpenAI text-embedding-3-small natively produces 1536-dim but supports truncation. Always request dimensions: 1024 to match Qwen3 storage. Track model_id per embedding to prevent cross-model similarity comparisons.
// OpenAI embedding with dimension matching
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
dimensions: 1024, // Match Qwen3's MRL dimension
});
Before embedding any memory, prepend its file/module context. This is Anthropic's contextual embedding technique adapted for code.
function buildContextualText(chunk: ASTChunk): string {
const prefix = [
`File: ${chunk.filePath}`,
chunk.chunkType !== 'module' ? `${chunk.chunkType}: ${chunk.name ?? 'unknown'}` : null,
`Lines: ${chunk.startLine}-${chunk.endLine}`,
].filter(Boolean).join(' | ');
return `${prefix}\n\n${chunk.content}`;
}
// For memories (not just code chunks):
function buildMemoryContextualText(memory: Memory): string {
const parts = [
memory.relatedFiles.length > 0 ? `Files: ${memory.relatedFiles.join(', ')}` : null,
memory.relatedModules.length > 0 ? `Module: ${memory.relatedModules[0]}` : null,
`Type: ${memory.type}`,
].filter(Boolean).join(' | ');
return parts ? `${parts}\n\n${memory.content}` : memory.content;
}
async function embedMemory(memory: Memory, embeddingService: EmbeddingService): Promise<number[]> {
const contextualText = buildMemoryContextualText(memory);
return embeddingService.embed(contextualText);
}
Both Qwen3-embedding models support MRL. Use tiered dimensions:
- Stage 1 candidate generation: 256-dim — 14x faster, ~90% accuracy retained
- Stage 3 precision reranking: 1024-dim — full quality
- Storage: 1024-dim stored permanently per memory record
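One way to serve both tiers from the single stored 1024-dim vector is to truncate to the leading dimensions and re-normalize, which is how MRL-trained embeddings are usually shortened. The sketch below assumes truncation happens client-side on stored vectors; `truncateMrl` is a hypothetical helper name, not an existing function in the codebase.

```typescript
// Keep the first `dims` components of an MRL embedding and rescale to unit length,
// so cosine similarity on the shortened vectors stays meaningful.
function truncateMrl(embedding: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > embedding.length) {
    throw new RangeError(`dims must be an integer in 1..${embedding.length}, got ${dims}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

With this approach Stage 1 could call `truncateMrl(stored, 256)` for candidate generation, and Stage 3 would use the stored vector as-is, avoiding a second embedding call per memory.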
class EmbeddingCache {
async get(text: string, modelId: string, dims: number): Promise<number[] | null> {
const key = sha256(`${text}:${modelId}:${dims}`);
const row = await db.execute(
'SELECT embedding FROM embedding_cache WHERE key = ? AND expires_at > ?',
[key, Date.now()]
);
return row.rows[0] ? deserializeEmbedding(row.rows[0].embedding as ArrayBuffer) : null;
}
async set(text: string, modelId: string, dims: number, embedding: number[]): Promise<void> {
const key = sha256(`${text}:${modelId}:${dims}`);
await db.execute(
'INSERT OR REPLACE INTO embedding_cache (key, embedding, model_id, dims, expires_at) VALUES (?,?,?,?,?)',
[key, serializeEmbedding(embedding), modelId, dims, Date.now() + 7 * 86400 * 1000]
);
}
}
INJECTION POINT 1: System prompt (before streamText())
Content: global memories, module memories, workflow recipes
Latency budget: up to 500ms
INJECTION POINT 2: Initial user message (before streamText())
Content: prefetched file contents, work state (if resuming)
Latency budget: up to 2s
INJECTION POINT 3: Tool result augmentation (during streamText())
Content: gotchas, dead_ends for file just read
Latency budget: < 100ms per augmentation
Mechanism: tool execute() appends to result string
INJECTION POINT 4: prepareStep callback (between each step)
Content: step-specific memory based on current agent state
Latency budget: < 50ms
Mechanism: prepareStep returns updated messages array
const result = streamText({
model: config.model,
system: config.systemPrompt,
messages: config.initialMessages,
tools: tools ?? {},
stopWhen: stepCountIs(adjustedMaxSteps),
abortSignal: config.abortSignal,
prepareStep: async ({ stepNumber, messages }) => {
// Skip first 5 steps — agent processing initial context
if (stepNumber < 5 || !memoryContext) {
workerObserverProxy.onStepComplete(stepNumber);
return {};
}
const injection = await workerObserverProxy.requestStepInjection(
stepNumber,
stepMemoryState.getRecentContext(5),
);
workerObserverProxy.onStepComplete(stepNumber);
if (!injection) return {};
return {
messages: [
...messages,
{ role: 'system' as const, content: injection.content },
],
};
},
onStepFinish: (stepResult) => {
progressTracker.processStepResult(stepResult);
},
});
export class StepInjectionDecider {
async decide(stepNumber: number, recentContext: RecentToolCallContext): Promise<StepInjection | null> {
// Trigger 1: Agent read a file with unseen gotchas
const recentReads = recentContext.toolCalls
.filter(t => t.toolName === 'Read' || t.toolName === 'Edit')
.map(t => t.args.file_path as string).filter(Boolean);
if (recentReads.length > 0) {
const freshGotchas = await this.memoryService.search({
types: ['gotcha', 'error_pattern', 'dead_end'],
relatedFiles: recentReads,
limit: 4,
minConfidence: 0.65,
filter: (m) => !recentContext.injectedMemoryIds.has(m.id),
});
if (freshGotchas.length > 0) {
return { content: this.formatGotchas(freshGotchas), type: 'gotcha_injection' };
}
}
// Trigger 2: New scratchpad entry from agent's record_memory call
const newEntries = this.scratchpad.getNewSince(stepNumber - 1);
if (newEntries.length > 0) {
return { content: this.formatScratchpadEntries(newEntries), type: 'scratchpad_reflection' };
}
// Trigger 3: Agent is searching for something already in memory
const recentSearches = recentContext.toolCalls
.filter(t => t.toolName === 'Grep' || t.toolName === 'Glob').slice(-3);
for (const search of recentSearches) {
const pattern = (search.args.pattern ?? search.args.glob ?? '') as string;
const known = await this.memoryService.searchByPattern(pattern);
if (known && !recentContext.injectedMemoryIds.has(known.id)) {
return { content: `MEMORY CONTEXT: ${known.content}`, type: 'search_short_circuit' };
}
}
return null;
}
}
export function buildMemoryAwareStopCondition(
baseMaxSteps: number,
calibrationFactor: number | undefined,
): StopCondition {
const factor = Math.min(calibrationFactor ?? 1.0, 2.0); // Cap at 2x
const adjusted = Math.min(Math.ceil(baseMaxSteps * factor), MAX_ABSOLUTE_STEPS);
return stepCountIs(adjusted);
}
async function buildPlannerMemoryContext(
taskDescription: string,
relevantModules: string[],
memoryService: MemoryService,
): Promise<string> {
const [calibrations, deadEnds, causalDeps, outcomes, recipes] = await Promise.all([
memoryService.search({ types: ['task_calibration'], relatedModules: relevantModules, limit: 5 }),
memoryService.search({ types: ['dead_end'], relatedModules: relevantModules, limit: 8 }),
memoryService.search({ types: ['causal_dependency'], relatedModules: relevantModules, limit: 10 }),
memoryService.search({ types: ['work_unit_outcome'], relatedModules: relevantModules, limit: 5, sort: 'recency' }),
memoryService.searchWorkflowRecipe(taskDescription, { limit: 2 }),
]);
return formatPlannerSections({ calibrations, deadEnds, causalDeps, outcomes, recipes });
}
Planning transformations:
- Calibration → multiply subtask count estimates by empirical ratio
- Dead ends → write constraints directly into the plan
- Causal deps → expand scope to include coupled files pre-emptively
Budget: max 32K tokens (~25% of context), max 12 files. Files accessed in >80% of past sessions load first; >50% load second.
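The budget rule above can be read as a two-tier greedy selection. The following is a sketch under stated assumptions: `FileAccessStat` and its `estimatedTokens` field are hypothetical, and skipping a file that does not fit (rather than stopping) is one possible policy, not something the design specifies.

```typescript
// Hypothetical per-file statistics derived from the observer's session history.
interface FileAccessStat {
  path: string;
  sessionsAccessed: number;
  totalSessions: number;
  estimatedTokens: number;
}

function buildPrefetchPlan(
  stats: FileAccessStat[],
  maxTokens = 32_000,
  maxFiles = 12,
): string[] {
  const rate = (s: FileAccessStat): number =>
    s.totalSessions > 0 ? s.sessionsAccessed / s.totalSessions : 0;
  // Tier 0: accessed in >80% of sessions. Tier 1: >50%. Everything else is excluded.
  const tier = (s: FileAccessStat): number => {
    const r = rate(s);
    return r > 0.8 ? 0 : r > 0.5 ? 1 : 2;
  };

  const candidates = stats
    .filter((s) => tier(s) < 2)
    .sort((a, b) => tier(a) - tier(b) || rate(b) - rate(a));

  const plan: string[] = [];
  let usedTokens = 0;
  for (const c of candidates) {
    if (plan.length >= maxFiles) break;
    // Assumed policy: skip a file that would exceed the budget and try smaller ones.
    if (usedTokens + c.estimatedTokens > maxTokens) continue;
    plan.push(c.path);
    usedTokens += c.estimatedTokens;
  }
  return plan;
}
```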
QA sessions start with e2e_observation, error_pattern, and requirement memories injected before the first MCP call.
async function processMcpToolResult(
toolName: string,
result: string,
sessionId: string,
workUnitRef: WorkUnitRef,
): Promise<void> {
const MCP_OBS_TOOLS = ['take_screenshot', 'click_by_text', 'fill_input', 'get_page_structure', 'eval'];
if (!MCP_OBS_TOOLS.includes(toolName)) return;
const classification = await generateText({
model: fastModel,
prompt: `Classify this MCP observation. Is this: A=precondition, B=timing, C=ui_behavior, D=test_sequence, E=mcp_gotcha, F=not_worth_remembering
Tool=${toolName}, Result=${result.slice(0, 400)}
Reply: letter + one sentence`,
maxTokens: 100,
});
const match = classification.text.match(/^([ABCDE])[:\s]*(.+)/s);
if (!match) return;
await memoryService.store({
type: 'e2e_observation',
observationType: { A: 'precondition', B: 'timing', C: 'ui_behavior', D: 'test_sequence', E: 'mcp_gotcha' }[match[1]],
content: match[2].trim(),
confidence: 0.75,
source: 'mcp_auto',
needsReview: true,
scope: 'global',
sessionId, workUnitRef,
});
}
MAIN THREAD (Electron)
├── WorkerBridge (per task)
│ ├── MemoryObserver (observes all worker messages)
│ ├── MemoryService (reads/writes via libSQL — WAL mode)
│ ├── ScratchpadStore (in-memory, checkpointed to disk)
│ └── Worker (worker_threads.Worker)
│ │ postMessage() IPC
│ WORKER THREAD
│ ├── runAgentSession() → streamText()
│ ├── Tool executors (Read, Write, Edit, Bash, Grep, Glob)
│ └── Memory tools (IPC to main thread):
│ ├── search_memory → MemoryService
│ ├── record_memory → ScratchpadStore
│ └── get_session_context → local scratchpad state
For parallel subagents:
MAIN THREAD
├── WorkerBridge-A (subtask 1) → ScratchpadStore-A (isolated)
├── WorkerBridge-B (subtask 2) → ScratchpadStore-B (isolated)
└── WorkerBridge-C (subtask 3) → ScratchpadStore-C (isolated)
After completion: ParallelScratchpadMerger.merge([A, B, C]) → observer.finalize()
Note on libSQL in worker threads: @libsql/client uses HTTP for cloud mode and is inherently async-safe. For local (`file:`) mode, the client wraps a native libSQL binding rather than pure JS, so confirm that it loads correctly inside worker_threads before relying on it there. All writes are proxied through main thread MemoryService to avoid WAL conflicts.
export type MemoryIpcRequest =
| { type: 'memory:search'; requestId: string; query: string; filters: MemorySearchFilters }
| { type: 'memory:record'; requestId: string; entry: MemoryRecordEntry }
| { type: 'memory:tool-call'; toolName: string; args: Record<string, unknown>; stepIndex: number }
| { type: 'memory:tool-result'; toolName: string; result: string; isError: boolean; stepIndex: number }
| { type: 'memory:reasoning'; text: string; stepIndex: number }
| { type: 'memory:step-complete'; stepNumber: number }
| { type: 'memory:session-complete'; outcome: SessionOutcome; stepsExecuted: number };
All IPC uses async request-response with UUID correlation. 3-second timeout: on timeout, agent proceeds without memory context (graceful degradation).
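The correlation-plus-timeout behaviour described above might look like the following on the worker side. This is an illustrative sketch: `MemoryIpcClient`, `IpcEnvelope`, and the injected `post` function are hypothetical names, and a simple counter stands in for the UUID so the example has no dependencies.

```typescript
type IpcEnvelope = { requestId: string; payload: unknown };
type PendingRequest = {
  resolve: (value: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
};

class MemoryIpcClient {
  private readonly pending = new Map<string, PendingRequest>();
  private readonly post: (message: IpcEnvelope) => void;
  private readonly timeoutMs: number;
  private nextId = 0;

  constructor(post: (message: IpcEnvelope) => void, timeoutMs = 3000) {
    this.post = post;
    this.timeoutMs = timeoutMs;
  }

  // Resolves with the response, or with null when the timeout elapses,
  // so the caller can continue without memory context.
  request(payload: unknown): Promise<unknown> {
    const requestId = `req-${++this.nextId}`; // stand-in for a UUID
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        resolve(null);
      }, this.timeoutMs);
      this.pending.set(requestId, { resolve, timer });
      this.post({ requestId, payload });
    });
  }

  // Called when the main thread posts a reply carrying the same requestId.
  handleResponse(requestId: string, result: unknown): void {
    const entry = this.pending.get(requestId);
    if (!entry) return; // late reply after timeout: ignore
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(result);
  }
}
```

Resolving with `null` instead of rejecting means a slow main thread never surfaces as an exception inside a tool call; the agent simply proceeds as if no memories matched.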
export class ParallelScratchpadMerger {
merge(scratchpads: ScratchpadStore[]): MergedScratchpad {
const allEntries = scratchpads.flatMap((s, idx) =>
s.getAll().map(e => ({ ...e, sourceAgentIndex: idx }))
);
const deduplicated = this.deduplicateByContent(allEntries);
// Quorum boost: entries observed by 2+ agents get confidence boost
return {
entries: deduplicated.map(entry => ({
...entry,
quorumCount: allEntries.filter(e =>
e.sourceAgentIndex !== entry.sourceAgentIndex &&
this.contentSimilarity(e.content, entry.content) > 0.85
).length + 1,
effectiveFrequencyThreshold: entry.confirmedBy >= 1 ? 1 : DEFAULT_FREQUENCY_THRESHOLD,
})),
};
}
}
Mode 1: Incremental (after every session, no LLM). Update rolling file statistics, co-access edge weights, error fingerprint registry. O(n) over new session's signals.
Mode 2: Threshold-triggered (sessions 5, 10, 20, 50, 100; one LLM call per trigger per module). Synthesize cross-session patterns. Output: 0-5 novel memories per call.
Mode 3: Scheduled (weekly; one LLM call per cross-module cluster). Find module pairs with high co-access not yet captured as causal_dependency.
const SYNTHESIS_THRESHOLDS = [5, 10, 20, 50, 100];
async function triggerModuleSynthesis(module: string, sessionCount: number): Promise<void> {
const stats = buildModuleStatsSummary(module);
const synthesis = await generateText({
model: fastModel,
prompt: `You are analyzing ${sessionCount} agent sessions on the "${module}" module.
File access patterns:
${stats.topFiles.map(f => `- ${f.path}: ${f.sessions} sessions`).join('\n')}
Co-accessed pairs:
${stats.strongCoAccess.map(e => `- ${e.fileA} + ${e.fileB}: ${e.sessions} sessions`).join('\n')}
Recurring errors:
${stats.errors.map(e => `- "${e.errorType}": ${e.sessions} sessions, resolved: ${e.resolvedHow}`).join('\n')}
Identify (max 5 memories, omit obvious things):
1. Files to prefetch (prefetch_pattern)
2. Non-obvious file coupling (causal_dependency or gotcha)
3. Recurring errors (error_pattern)
4. Non-obvious module purpose (module_insight)
Format: JSON [{ "type": "...", "content": "...", "relatedFiles": [...], "confidence": 0.0-1.0 }]`,
maxTokens: 400,
});
const memories = parseSynthesisOutput(synthesis.text);
for (const memory of memories) {
if (await isNovel(memory)) {
await memoryService.store({ ...memory, source: 'observer_inferred', needsReview: true });
}
}
}
Memory (Cmd+Shift+M)
├── Health Dashboard (default)
│ ├── Stats: total | active (used 30d) | needs-review | tokens-saved-this-session
│ ├── Health score 0-100
│ ├── Module coverage progress bars
│ └── Needs Attention: stale memories, pending reviews
├── Module Map (collapsible per-module cards)
├── Memory Browser (search + filters, full provenance)
├── Ask Memory (chat with citations)
└── [Cloud only] Team Memory
Memory citation format in agent output: [^ Memory: JWT 24h expiry decision]
The renderer detects [Memory #ID: brief text] and replaces with MemoryCitationChip — amber-tinted pill with a flag button. Dead-end citations use red tint. More than 5 citations collapse to "Used N memories [view all]".
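A renderer-side detection pass for the `[Memory #ID: brief text]` form could be sketched as below. The ID character set (`[\w-]+`) and the helper names are assumptions; only the bracket format and the more-than-5 collapse rule come from the description above.

```typescript
interface MemoryCitation {
  id: string;
  text: string;
}

// Scan agent output for citations of the form [Memory #ID: brief text].
function extractCitations(output: string): MemoryCitation[] {
  const pattern = /\[Memory #([\w-]+): ([^\]]+)\]/g; // assumed ID charset: word chars and hyphens
  const found: MemoryCitation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    found.push({ id: match[1] ?? '', text: (match[2] ?? '').trim() });
  }
  return found;
}

// More than 5 citations collapse into a single summary label.
function summarizeCitations(citations: MemoryCitation[]): string {
  if (citations.length > 5) {
    return `Used ${citations.length} memories [view all]`;
  }
  return citations.map((c) => `[${c.text}]`).join(' ');
}
```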
Session Complete: Auth Bug Fix
Memory saved ~6,200 tokens of discovery this session
What the agent remembered:
- JWT decision → used when planning approach [ok]
- Redis gotcha → avoided concurrent validation bug [ok]
What the agent learned (4 new memories):
1/4 GOTCHA middleware/auth.ts [ok] [edit] [x]
Token refresh fails silently when Redis is unreachable
2/4 ERROR PATTERN tests/auth/ [ok] [edit] [x]
Auth tests require REDIS_URL env var — hang without it
...
[Save all confirmed] [Review later]
Level 1 — Cautious (Sessions 1-3): inject confidence > 0.80 only; all new memories require confirmation; advance: 3 sessions + 50% confirmed.
Level 2 — Standard (Sessions 4-15): inject confidence > 0.65; "Confirm all" is default; advance: 10+ sessions, <5% correction rate.
Level 3 — Confident (Sessions 16+): inject confidence > 0.55; session summary condensed to needsReview only.
Level 4 — Autonomous (Opt-in only): inject confidence > 0.45; session summary suppressed by default.
Trust regression: if user flags 3+ memories wrong in one session, offer (not force) moving to more conservative level.
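The per-level injection thresholds and the regression offer can be captured in a small lookup. This is a sketch only; `TrustLevel`, `shouldInject`, and `suggestRegression` are illustrative names, and the advancement criteria are left out.

```typescript
type TrustLevel = 1 | 2 | 3 | 4;

// Minimum confidence (exclusive) a memory needs before it is injected at each level.
const INJECTION_THRESHOLD: Record<TrustLevel, number> = {
  1: 0.8,  // Cautious
  2: 0.65, // Standard
  3: 0.55, // Confident
  4: 0.45, // Autonomous (opt-in)
};

function shouldInject(confidence: number, level: TrustLevel): boolean {
  return confidence > INJECTION_THRESHOLD[level];
}

// Returns the more conservative level to OFFER the user, or null when no offer applies.
// The caller decides whether to apply it; regression is never forced.
function suggestRegression(level: TrustLevel, flaggedWrongThisSession: number): TrustLevel | null {
  if (flaggedWrongThisSession < 3 || level === 1) return null;
  return (level - 1) as TrustLevel;
}
```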
| Method | Location | Action |
|---|---|---|
| `/remember [text]` | Agent terminal | Creates user_taught memory immediately |
| `Cmd+Shift+M` | Global | Opens Teach panel |
| Right-click file | File tree | Opens Teach panel pre-filled with file path |
| Import CLAUDE.md / .cursorrules | Settings | Parse rules into typed memories |
The Electron app is open source and free. Cloud features are gated behind Convex Better Auth login:
Electron App (all users)
├── Free tier: libSQL in-process → memory.db (offline, full features)
└── Logged-in tier: libSQL embedded replica + Turso Cloud sync
├── Same SQL queries, same tables
├── Reads from local replica (fast, offline-tolerant)
├── Syncs to Turso Cloud every 60s
└── Convex for: auth state, team features, billing UI, real-time memory panel
Web App (Next.js SaaS, same repo/OSS)
├── Self-hosted: users run their own stack (no cloud features)
└── Cloud hosted (auto-claude.app): Turso Cloud + Convex
├── Pure cloud libSQL (no local file)
├── OpenAI embeddings (no Ollama)
└── Cohere Rerank API
Electron write → libSQL local (immediate)
→ Turso embedded replica sync (within 60s)
Other device read → Turso Cloud fetch → embedded replica
Conflict (same memory edited on two devices before sync):
├── Non-conflicting fields (access_count, tags): auto-merge
└── Content field: present both versions, require user decision
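The conflict rule above could be expressed as a field-level merge. In this sketch the merge strategies for the non-conflicting fields are assumptions (taking the larger `accessCount`, unioning `tags`); the design only states that they auto-merge and that diverging content requires a user decision.

```typescript
interface SyncedMemory {
  id: string;
  content: string;
  accessCount: number;
  tags: string[];
}

type MergeResult =
  | { status: 'merged'; memory: SyncedMemory }
  | { status: 'conflict'; local: SyncedMemory; remote: SyncedMemory };

function mergeMemory(local: SyncedMemory, remote: SyncedMemory): MergeResult {
  // Non-conflicting fields merge automatically (strategies assumed for illustration).
  const accessCount = Math.max(local.accessCount, remote.accessCount);
  const tags = Array.from(new Set(local.tags.concat(remote.tags))).sort();

  if (local.content === remote.content) {
    return { status: 'merged', memory: { ...local, accessCount, tags } };
  }
  // Content diverged: keep both versions and let the user choose.
  return {
    status: 'conflict',
    local: { ...local, accessCount, tags },
    remote: { ...remote, accessCount, tags },
  };
}
```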
| Feature | Electron (local) | Web App (cloud) |
|---|---|---|
| Database | libSQL in-process file | libSQL → Turso Cloud |
| Embeddings | Qwen3 via Ollama | OpenAI text-embedding-3-small |
| Reranking | Qwen3-Reranker-0.6B via Ollama | Cohere Rerank API |
| Graph indexing | tree-sitter WASM | tree-sitter WASM (in Node.js worker) |
| Auth | Convex Better Auth | Convex Better Auth |
| Agent execution | Worker threads | Next.js API routes + queue |
The same retrieval SQL queries work in both modes. Only the client connection differs.
// Create a dedicated Turso database per user+project
async function getOrCreateProjectDb(
userId: string,
projectId: string,
convexToken: string,
): Promise<Client> {
const dbName = `user-${userId}-proj-${projectId}`;
const tursoClient = createTursoClient(tursoApiToken);
const existing = await tursoClient.databases.get(dbName);
if (!existing) {
await tursoClient.databases.create({ name: dbName, group: 'memory' });
}
const dbToken = await tursoClient.databases.createToken(dbName);
return createClient({
url: `libsql://${dbName}.turso.io`,
authToken: dbToken.jwt,
});
}
| Scope | Visible To | Use Cases |
|---|---|---|
| Personal | Only you | Workflow preferences, personal aliases |
| Project | All project members | Gotchas, error patterns, decisions |
| Team | All team members | Organization conventions, architecture |
| Organization | All org members | Security policies, compliance requirements |
When a new developer joins, surface the 5 most important team memories immediately. Sort by confidence × pinned_weight × access_count. New developer sees months of accumulated tribal knowledge in 60 seconds.
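The onboarding sort can be sketched directly from the formula. The value of `PINNED_WEIGHT` is an assumption (the document does not give one), and `TeamMemory` is a reduced, hypothetical shape.

```typescript
interface TeamMemory {
  id: string;
  confidence: number;
  pinned: boolean;
  accessCount: number;
}

const PINNED_WEIGHT = 2; // assumed multiplier for pinned memories

// Rank by confidence × pinned_weight × access_count and keep the top entries.
function topOnboardingMemories(memories: TeamMemory[], limit = 5): TeamMemory[] {
  const score = (m: TeamMemory): number =>
    m.confidence * (m.pinned ? PINNED_WEIGHT : 1) * m.accessCount;
  return memories
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => score(b) - score(a))
    .slice(0, limit);
}
```

Because the score is a pure product, a memory with zero accesses always ranks last even if pinned; whether that is desirable for freshly pinned memories is worth deciding before implementation.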
- Team member clicks "Dispute"
- Threaded comment opens on the memory
- Steward notified
- Memory gets "disputed" badge; agents still use it but with `confidence × 0.8`
- Resolution: steward updates or team admin escalates
- Personal-scope memories
- Any memory flagged by the secret scanner
- Embedding vectors when "vectors-only" mode selected
The secret scanner runs before any cloud upload and before storing user_taught memories:
const SECRET_PATTERNS = [
/sk-[a-zA-Z0-9]{48}/,
/sk-ant-[a-zA-Z0-9-]{95}/,
/ghp_[a-zA-Z0-9]{36}/,
/-----BEGIN (RSA|EC) PRIVATE KEY-----/,
/password\s*[:=]\s*["']?\S+/i,
];
- Export all memories as JSON (machine-readable)
- Export as Markdown (human-readable, importable)
- Export as CLAUDE.md format (portable)
- Delete all memories (hard delete for explicit account deletion)
- Request data archive (SQLite + embeddings)
The V5 schema uses @libsql/client compatible SQL. No better-sqlite3. All queries are async.
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA foreign_keys = ON;
-- ============================================================
-- CORE MEMORY TABLES
-- ============================================================
CREATE TABLE IF NOT EXISTS memories (
id TEXT PRIMARY KEY,
type TEXT NOT NULL,
content TEXT NOT NULL,
confidence REAL NOT NULL DEFAULT 0.8,
tags TEXT NOT NULL DEFAULT '[]', -- JSON array
related_files TEXT NOT NULL DEFAULT '[]', -- JSON array
related_modules TEXT NOT NULL DEFAULT '[]', -- JSON array
created_at TEXT NOT NULL,
last_accessed_at TEXT NOT NULL,
access_count INTEGER NOT NULL DEFAULT 0,
session_id TEXT,
commit_sha TEXT,
scope TEXT NOT NULL DEFAULT 'global',
work_unit_ref TEXT, -- JSON WorkUnitRef
methodology TEXT,
source TEXT NOT NULL DEFAULT 'agent_explicit',
target_node_id TEXT,
impacted_node_ids TEXT DEFAULT '[]',
relations TEXT NOT NULL DEFAULT '[]',
decay_half_life_days REAL,
provenance_session_ids TEXT DEFAULT '[]',
needs_review INTEGER NOT NULL DEFAULT 0,
user_verified INTEGER NOT NULL DEFAULT 0,
citation_text TEXT,
pinned INTEGER NOT NULL DEFAULT 0,
deprecated INTEGER NOT NULL DEFAULT 0,
deprecated_at TEXT,
stale_at TEXT,
project_id TEXT NOT NULL,
trust_level_scope TEXT DEFAULT 'personal',
-- V5 new: AST chunking metadata
chunk_type TEXT,
chunk_start_line INTEGER,
chunk_end_line INTEGER,
context_prefix TEXT,
embedding_model_id TEXT -- track which model produced this embedding
);
CREATE TABLE IF NOT EXISTS memory_embeddings (
memory_id TEXT PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
embedding BLOB NOT NULL, -- float32 vector, 1024-dim
model_id TEXT NOT NULL,
dims INTEGER NOT NULL DEFAULT 1024,
created_at TEXT NOT NULL
);
-- FTS5 for BM25 keyword search (same syntax in Turso local and cloud)
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
memory_id UNINDEXED,
content,
tags,
related_files,
tokenize='porter unicode61'
);
-- Embedding cache
CREATE TABLE IF NOT EXISTS embedding_cache (
key TEXT PRIMARY KEY, -- sha256(contextualText:modelId:dims)
embedding BLOB NOT NULL,
model_id TEXT NOT NULL,
dims INTEGER NOT NULL,
expires_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_embedding_cache_expires ON embedding_cache(expires_at);
-- ============================================================
-- OBSERVER TABLES
-- ============================================================
CREATE TABLE IF NOT EXISTS observer_file_nodes (
file_path TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
access_count INTEGER NOT NULL DEFAULT 0,
last_accessed_at TEXT NOT NULL,
session_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS observer_co_access_edges (
file_a TEXT NOT NULL,
file_b TEXT NOT NULL,
project_id TEXT NOT NULL,
weight REAL NOT NULL DEFAULT 0.0,
raw_count INTEGER NOT NULL DEFAULT 0,
session_count INTEGER NOT NULL DEFAULT 0,
avg_time_delta_ms REAL,
directional INTEGER NOT NULL DEFAULT 0,
task_type_breakdown TEXT DEFAULT '{}',
last_observed_at TEXT NOT NULL,
promoted_at TEXT,
PRIMARY KEY (file_a, file_b, project_id)
);
CREATE TABLE IF NOT EXISTS observer_error_patterns (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
tool_name TEXT NOT NULL,
error_fingerprint TEXT NOT NULL,
error_message TEXT NOT NULL,
occurrence_count INTEGER NOT NULL DEFAULT 1,
last_seen_at TEXT NOT NULL,
resolved_how TEXT,
sessions TEXT DEFAULT '[]'
);
CREATE TABLE IF NOT EXISTS observer_module_session_counts (
module TEXT NOT NULL,
project_id TEXT NOT NULL,
count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (module, project_id)
);
CREATE TABLE IF NOT EXISTS observer_synthesis_log (
module TEXT NOT NULL,
project_id TEXT NOT NULL,
trigger_count INTEGER NOT NULL,
synthesized_at INTEGER NOT NULL,
memories_generated INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (module, project_id, trigger_count)
);
-- ============================================================
-- KNOWLEDGE GRAPH TABLES
-- ============================================================
CREATE TABLE IF NOT EXISTS graph_nodes (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
type TEXT NOT NULL,
label TEXT NOT NULL,
file_path TEXT,
language TEXT,
start_line INTEGER,
end_line INTEGER,
layer INTEGER NOT NULL DEFAULT 1,
source TEXT NOT NULL, -- 'ast' | 'scip' | 'llm' | 'agent'
confidence TEXT DEFAULT 'inferred',
metadata TEXT DEFAULT '{}',
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
stale_at INTEGER,
associated_memory_ids TEXT DEFAULT '[]'
);
CREATE INDEX IF NOT EXISTS idx_gn_project_type ON graph_nodes(project_id, type);
CREATE INDEX IF NOT EXISTS idx_gn_project_label ON graph_nodes(project_id, label);
CREATE INDEX IF NOT EXISTS idx_gn_file_path ON graph_nodes(project_id, file_path) WHERE file_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_gn_stale ON graph_nodes(stale_at) WHERE stale_at IS NOT NULL;
CREATE TABLE IF NOT EXISTS graph_edges (
id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
from_id TEXT NOT NULL REFERENCES graph_nodes(id) ON DELETE CASCADE,
to_id TEXT NOT NULL REFERENCES graph_nodes(id) ON DELETE CASCADE,
type TEXT NOT NULL,
layer INTEGER NOT NULL DEFAULT 1,
weight REAL DEFAULT 1.0,
source TEXT NOT NULL,
confidence REAL DEFAULT 1.0,
metadata TEXT DEFAULT '{}',
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
stale_at INTEGER
);
CREATE INDEX IF NOT EXISTS idx_ge_from_type ON graph_edges(from_id, type) WHERE stale_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_ge_to_type ON graph_edges(to_id, type) WHERE stale_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_ge_stale ON graph_edges(stale_at) WHERE stale_at IS NOT NULL;
-- Pre-computed closure for O(1) impact analysis
CREATE TABLE IF NOT EXISTS graph_closure (
ancestor_id TEXT NOT NULL,
descendant_id TEXT NOT NULL,
depth INTEGER NOT NULL,
path TEXT NOT NULL, -- JSON array of node IDs
edge_types TEXT NOT NULL, -- JSON array of edge types along path
total_weight REAL NOT NULL,
PRIMARY KEY (ancestor_id, descendant_id),
FOREIGN KEY (ancestor_id) REFERENCES graph_nodes(id) ON DELETE CASCADE,
FOREIGN KEY (descendant_id) REFERENCES graph_nodes(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_gc_ancestor ON graph_closure(ancestor_id, depth);
CREATE INDEX IF NOT EXISTS idx_gc_descendant ON graph_closure(descendant_id, depth);
CREATE TABLE IF NOT EXISTS graph_index_state (
project_id TEXT PRIMARY KEY,
last_indexed_at INTEGER NOT NULL,
last_commit_sha TEXT,
node_count INTEGER DEFAULT 0,
edge_count INTEGER DEFAULT 0,
stale_edge_count INTEGER DEFAULT 0,
index_version INTEGER DEFAULT 1
);
CREATE TABLE IF NOT EXISTS scip_symbols (
symbol_id TEXT PRIMARY KEY,
node_id TEXT NOT NULL REFERENCES graph_nodes(id) ON DELETE CASCADE,
project_id TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_scip_node ON scip_symbols(node_id);
-- ============================================================
-- PERFORMANCE INDEXES
-- ============================================================
CREATE INDEX IF NOT EXISTS idx_memories_project_type ON memories(project_id, type);
CREATE INDEX IF NOT EXISTS idx_memories_project_scope ON memories(project_id, scope);
CREATE INDEX IF NOT EXISTS idx_memories_source ON memories(source);
CREATE INDEX IF NOT EXISTS idx_memories_needs_review ON memories(needs_review) WHERE needs_review = 1;
CREATE INDEX IF NOT EXISTS idx_memories_confidence ON memories(confidence DESC);
CREATE INDEX IF NOT EXISTS idx_memories_last_accessed ON memories(last_accessed_at DESC);
CREATE INDEX IF NOT EXISTS idx_memories_type_conf ON memories(project_id, type, confidence DESC);
CREATE INDEX IF NOT EXISTS idx_memories_not_deprecated ON memories(project_id, deprecated) WHERE deprecated = 0;
CREATE INDEX IF NOT EXISTS idx_co_access_weight ON observer_co_access_edges(weight DESC);
const DEFAULT_HALF_LIVES: Partial<Record<MemoryType, number>> = {
work_state: 7,
e2e_observation: 30,
error_pattern: 60,
gotcha: 60,
module_insight: 90,
dead_end: 90,
causal_dependency: 120,
decision: Infinity, // Decisions never decay
workflow_recipe: 120,
task_calibration: 180,
};
function currentConfidence(memory: Memory): number {
if (!memory.decayHalfLifeDays || memory.pinned) return memory.confidence;
const daysSince = (Date.now() - Date.parse(memory.lastAccessedAt)) / 86400000;
const decayFactor = Math.pow(0.5, daysSince / memory.decayHalfLifeDays);
return memory.confidence * decayFactor;
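}
Runs daily when the machine is idle. Electron's powerMonitor has no dedicated idle event, so idleness is detected by polling `powerMonitor.getSystemIdleState()`: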
async function runPruningJob(db: Client, projectId: string): Promise<void> {
const now = new Date().toISOString();
// Soft-delete expired memories
await db.execute(`
UPDATE memories SET deprecated = 1, deprecated_at = ?
WHERE project_id = ? AND deprecated = 0
AND decay_half_life_days IS NOT NULL
AND pinned = 0
AND (julianday(?) - julianday(last_accessed_at)) > decay_half_life_days * 3
`, [now, projectId, now]);
// Hard-delete after 30-day grace (except user-verified)
await db.execute(`
DELETE FROM memories
WHERE project_id = ? AND deprecated = 1
AND user_verified = 0
AND (julianday(?) - julianday(deprecated_at)) > 30
`, [projectId, now]);
// Evict expired embedding cache
await db.execute('DELETE FROM embedding_cache WHERE expires_at < ?', [Date.now()]);
}
Every time a memory is injected, increment access_count. After 5 accesses with no correction, auto-increment confidence by 0.05 (capped at 0.95). After 10 accesses, remove needsReview flag.
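The reinforcement rule above might be implemented as a pure update function. Two details here are assumptions, since the text leaves them open: the 0.05 bump is applied at every multiple of five accesses (not just the fifth), and a confidence already above 0.95 is left untouched rather than lowered to the cap. `TrackedMemory` and `correctionCount` are hypothetical names.

```typescript
interface TrackedMemory {
  confidence: number;
  accessCount: number;
  needsReview: boolean;
  correctionCount: number; // assumed counter of user corrections on this memory
}

function recordInjection(m: TrackedMemory): TrackedMemory {
  const accessCount = m.accessCount + 1;

  let confidence = m.confidence;
  if (m.correctionCount === 0 && accessCount % 5 === 0) {
    // Bump by 0.05 up to the 0.95 cap, but never lower an already-higher value.
    confidence = Math.max(confidence, Math.min(0.95, confidence + 0.05));
  }

  // After 10 accesses the review flag is cleared.
  const needsReview = accessCount >= 10 ? false : m.needsReview;

  return { ...m, accessCount, confidence, needsReview };
}
```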
5% of new sessions assigned to control group (no memory injection). Control sessions still generate observer signals — they just receive no injections.
enum MemoryABGroup {
CONTROL = 'control', // No injection (5%)
PASSIVE_ONLY = 'passive', // T1 + T2 only (10%)
FULL = 'full', // All 4 tiers (85%)
}
function assignABGroup(sessionId: string, projectId: string): MemoryABGroup {
const hash = murmurhash(`${sessionId}:${projectId}`) % 100;
if (hash < 5) return MemoryABGroup.CONTROL;
if (hash < 15) return MemoryABGroup.PASSIVE_ONLY;
return MemoryABGroup.FULL;
}
| Metric | Definition | Target |
|---|---|---|
| Tool calls per task | Total tool calls in session | >20% reduction vs control |
| File re-reads | Read calls on files previously read in prior session | >50% reduction vs control |
| QA first-pass rate | QA passes without fix cycle | >15% improvement vs control |
| Dead-end re-entry rate | Agent tries a previously-failed approach | <5% |
| User correction rate | Memories flagged / memories used | <5% |
| Graph boost rate | Fraction of retrievals where neighborhood boost changed top-8 | Track for value validation |
After 30+ sessions, run background weight optimization: which memory types most strongly correlate with QA first-pass success per phase? Human review required before applying new weights.
V5 is built complete, not phased. The retrieval pipeline, AST chunking, contextual embeddings, and graph neighborhood boost are all implemented from the start. Implementation order follows dependency order.
cd apps/desktop
npm install @libsql/client
# Remove better-sqlite3 if present for memory module (keep for other uses if needed)
Create `apps/desktop/src/main/ai/memory/db.ts`:
import { createClient, type Client } from '@libsql/client';
import { app } from 'electron';
import { join } from 'path';
import { MEMORY_SCHEMA_SQL } from './schema';
let _client: Client | null = null;
export async function getMemoryClient(
tursoSyncUrl?: string,
authToken?: string,
): Promise<Client> {
if (_client) return _client;
const localPath = join(app.getPath('userData'), 'memory.db');
_client = createClient({
url: `file:${localPath}`,
...(tursoSyncUrl && authToken ? { syncUrl: tursoSyncUrl, authToken, syncInterval: 60 } : {}),
});
// Initialize schema (idempotent)
await _client.executeMultiple(MEMORY_SCHEMA_SQL);
// Load sqlite-vec extension for local mode only
// Cloud Turso has built-in vector support (DiskANN) — no extension needed
if (!tursoSyncUrl) {
const vecExtPath = app.isPackaged
? join(process.resourcesPath, 'extensions', 'vec0')
: join(__dirname, '..', '..', 'node_modules', 'sqlite-vec', 'vec0');
await _client.execute(`SELECT load_extension('${vecExtPath}')`);
}
return _client;
}
export async function closeMemoryClient(): Promise<void> {
if (_client) {
await _client.close();
_client = null;
}
}
sqlite-vec with libSQL: Use @libsql/client with the vec0 extension. For cloud Turso databases, vector functions are built in. For local, bundle the vec0 extension binary.
Implement MemoryService with:
- `store(entry)` → inserts memory, generates contextual embedding, updates FTS5 trigger
- `search(query, filters)` → full 4-stage pipeline (candidates → RRF → neighborhood boost → pack)
- `searchByPattern(pattern)` → BM25-only for quick pattern matching in `StepInjectionDecider`
- `insertUserTaught(content, projectId, tags)` → immediate insert for `/remember` command
Implement with provider auto-detection:
export class EmbeddingService {
  private provider: 'ollama-8b' | 'ollama-4b' | 'ollama-0.6b' | 'openai' | 'onnx' = 'onnx';

  async initialize(): Promise<void> {
    // Check Ollama availability and RAM
    const ollamaAvailable = await checkOllama();
    if (ollamaAvailable) {
      const ram = await getAvailableRAM();
      this.provider = ram > 32 ? 'ollama-8b' : 'ollama-4b';
    } else if (process.env.OPENAI_API_KEY) {
      this.provider = 'openai';
    }
    // else: onnx bundled fallback
  }

  async embed(text: string, dims: 256 | 1024 = 1024): Promise<number[]> {
    const cached = await this.cache.get(text, this.provider, dims);
    if (cached) return cached;
    const embedding = await this.callProvider(text, dims);
    await this.cache.set(text, this.provider, dims, embedding);
    return embedding;
  }

  private async callProvider(text: string, dims: number): Promise<number[]> {
    switch (this.provider) {
      case 'openai': {
        const res = await openai.embeddings.create({
          model: 'text-embedding-3-small',
          input: text,
          dimensions: dims, // 256 for candidate search, 1024 for storage
        });
        return res.data[0].embedding;
      }
      // ... ollama and onnx implementations
      default:
        // Fail loudly instead of silently returning undefined
        throw new Error(`Embedding provider not implemented: ${this.provider}`);
    }
  }
}
- TreeSitterLoader with TypeScript + JavaScript + Python + Rust
- TreeSitterExtractor: import edges, function definitions, call edges, class hierarchy
- ASTChunker: split files at function/class boundaries
- GraphDatabase: node/edge CRUD with closure table maintenance
- IncrementalIndexer: chokidar file watcher, 500ms debounce, Glean staleness model
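The closure table maintenance that GraphDatabase is expected to perform can be illustrated with a small in-memory sketch. This is not the planned libSQL implementation: `ClosureTable`, `addEdge`, `depth`, and `neighbors` are hypothetical names, and a `Map` stands in for the table rows. It shows only the insert rule (each new edge links every ancestor of the source to every descendant of the target) and does not handle edge deletion.

```typescript
// Hypothetical in-memory stand-in for a closure table.
// rows.get(a).get(d) is the shortest known path length from node a to node d.
class ClosureTable {
  private rows = new Map<string, Map<string, number>>();

  // Every node reaches itself at depth 0.
  private ensureNode(id: string): Map<string, number> {
    let reach = this.rows.get(id);
    if (!reach) {
      reach = new Map<string, number>([[id, 0]]);
      this.rows.set(id, reach);
    }
    return reach;
  }

  // Adding from -> to connects every ancestor of `from` to every descendant of `to`.
  addEdge(from: string, to: string): void {
    this.ensureNode(from);
    // Snapshot first so the loop below never reads a map it is writing to.
    const descendants = Array.from(this.ensureNode(to).entries());
    this.rows.forEach((reach) => {
      const up = reach.get(from);
      if (up === undefined) return; // this node cannot reach `from`
      descendants.forEach(([desc, down]) => {
        const depth = up + 1 + down;
        const known = reach.get(desc);
        if (known === undefined || depth < known) reach.set(desc, depth);
      });
    });
  }

  depth(from: string, to: string): number | undefined {
    return this.rows.get(from)?.get(to);
  }

  // Nodes reachable from `id` in 1..maxDepth hops, sorted for stable output.
  neighbors(id: string, maxDepth: number): string[] {
    const reach = this.rows.get(id);
    if (!reach) return [];
    return Array.from(reach.entries())
      .filter(([, d]) => d > 0 && d <= maxDepth)
      .map(([node]) => node)
      .sort();
  }
}
```

With import edges loaded this way, a lookup such as `neighbors('a.ts', 2)` returns everything within two hops without a recursive traversal at query time, which is the property a closure table trades extra write work for.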
- FTS5 BM25 path
- Dense vector path (256-dim candidates, 1024-dim precision)
- Graph traversal path (co-access edges + closure table neighbors)
- Weighted RRF fusion (with UNION workaround — no FULL OUTER JOIN)
- Graph neighborhood boost (the unique advantage)
- Phase-aware scoring and context packing
- Reranking via Qwen3-Reranker-0.6B (Ollama, local only)
- HyDE fallback
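As an illustration of the weighted RRF fusion step listed above, here is a minimal sketch in plain TypeScript. The function name `weightedRrf`, the `RankedList` shape, the per-path weights, and the constant `k = 60` (a commonly used default for reciprocal rank fusion) are all assumptions for the example, not values taken from this design. The `Map` accumulation plays roughly the role that a UNION-based merge would play in SQL: an item missing from one path simply contributes nothing from that path.

```typescript
// One retrieval path's output: ids ordered best-first, plus an assumed path weight.
interface RankedList {
  weight: number;
  ids: string[];
}

interface FusedResult {
  id: string;
  score: number;
}

// Weighted reciprocal rank fusion: score(id) = sum over paths of weight / (k + rank).
function weightedRrf(lists: RankedList[], k: number = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + list.weight / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest score first; break ties by id so the order is deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example inputs: three paths with hypothetical weights.
const fused = weightedRrf([
  { weight: 1.0, ids: ['m1', 'm2', 'm3'] }, // BM25 path
  { weight: 1.0, ids: ['m2', 'm4'] },       // dense vector path
  { weight: 0.5, ids: ['m4', 'm2'] },       // graph traversal path
]);
console.log(fused.map((r) => r.id).join(', '));
```

Because only ranks enter the formula, the three paths do not need comparable raw scores, which is why RRF suits fusing BM25, vector distance, and graph signals.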
- MemoryObserver on main thread tapping WorkerBridge events
- Scratchpad with O(1) analytics data structures
- Top-5 signals: self_correction, co_access, error_retry, parallel_conflict, read_abandon
- Trust defense layer (SpAIware protection)
- Session-type-aware promotion gates
- observer.finalize() with LLM synthesis call
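One way the scratchpad could track the co_access signal with constant work per event is sketched below. Everything here is an assumption made for illustration: the class name `CoAccessScratchpad`, the sliding window of recent reads, and the default window size of 4 are not specified by this design. Each read is paired only with the files still in the window, so the cost per event is bounded by the window size rather than by session length.

```typescript
// Hypothetical sketch: co-access counting over a small sliding window of recent reads.
class CoAccessScratchpad {
  private recent: string[] = []; // most recent distinct files, oldest first
  private pairCounts = new Map<string, number>();

  constructor(private readonly windowSize: number = 4) {}

  // Order-independent key so (a, b) and (b, a) share one counter.
  private static pairKey(a: string, b: string): string {
    return JSON.stringify(a < b ? [a, b] : [b, a]);
  }

  recordRead(file: string): void {
    // Pair the new read with every other file still in the window.
    for (const other of this.recent) {
      if (other === file) continue;
      const key = CoAccessScratchpad.pairKey(file, other);
      this.pairCounts.set(key, (this.pairCounts.get(key) ?? 0) + 1);
    }
    // Move the file to the newest position and trim the window.
    this.recent = this.recent.filter((f) => f !== file);
    this.recent.push(file);
    if (this.recent.length > this.windowSize) this.recent.shift();
  }

  count(a: string, b: string): number {
    return this.pairCounts.get(CoAccessScratchpad.pairKey(a, b)) ?? 0;
  }
}
```

Pairs whose count crosses some threshold at session end would be candidates for promotion to co-access edges; the threshold itself is left open here.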
- StepInjectionDecider (3 triggers)
- prepareStep callback in runAgentSession()
- Planner memory context builder
- Prefetch plan builder (T2 pre-loading)
- E2E observation pipeline for MCP tool results
- Memory-aware stopWhen (calibration-adjusted max steps)
- Health Dashboard + Module Map + Memory Browser
- Session-end summary panel
- MemoryCitationChip in agent terminal
- Correction modal
- Teach panel with all entry points
- Trust progression system (4 levels, per-project)
- First-run experience
- i18n keys in en.json and fr.json
- Turso Cloud integration (per-tenant database provisioning)
- Convex integration (auth token → Turso sync URL)
- Login-gated feature detection in Electron
- Team memory scoping (project/team/org)
- Dispute resolution UI
- Secret scanner
- GDPR export/delete controls
- Incremental synthesis (Mode 1, every session)
- Threshold-triggered synthesis (Mode 2, LLM calls)
- Weekly scheduled synthesis (Mode 3)
- A/B group assignment and metric tracking
- Phase weight optimization framework
- sqlite-vec with @libsql/client: The sqlite-vec extension works with better-sqlite3. With @libsql/client, the extension loading mechanism differs. Turso Cloud has built-in vector support (vector_distance_cos()). Local libSQL may need the libsql-vector package or a bundled vec0 binary. Verify before Step 1.
- Embedding model cross-compatibility: Memories embedded with Qwen3-4b have the same 1024-dim format as memories embedded with OpenAI text-embedding-3-small. However, embeddings from different models are NOT directly comparable (different embedding spaces). When a user switches from Ollama to the OpenAI fallback or vice versa, existing memories need re-embedding. A background re-embedding job is needed; track embedding_model_id per memory.
- Web app agent execution: In Next.js, agents cannot run in worker_threads the same way as in Electron. Server-side agent execution needs a job queue (BullMQ, Inngest, or Trigger.dev). The memory system architecture is the same, but the IPC mechanism differs. Define the web app execution model before Step 9.
- Scratchpad granularity for large pipelines: For a 40-subtask build, promote after each validated subtask, not just at pipeline end. The open point is the exact promotion gate per subtask: does it require subtask-level QA, or is a successful subtask return sufficient? Recommendation: a successful subtask return is a sufficient gate; pipeline-level QA is the gate for high-confidence observer-inferred memories.
- Tree-sitter vs. ts-morph for TypeScript: tree-sitter extracts syntactic call sites but cannot resolve which function is being called across module boundaries. ts-morph has full TypeScript compiler resolution but is much slower. Use tree-sitter for Phases 1-5 (speed), and add SCIP integration for precision in later phases. Mark edges with source: 'ast' vs source: 'scip'.
- Reranking in cloud/web mode: Qwen3-Reranker-0.6B is not available without Ollama. In cloud/web mode, Cohere Rerank API (~$1/1K queries) is used from the start as the cross-encoder reranking tier. Monitor Cohere costs and evaluate alternatives (e.g., self-hosted reranker on VPS) if costs become significant at scale.
- Graph neighborhood boost in cloud mode: The boost queries the graph_closure table, which lives in libSQL/Turso. This works in all modes (local and cloud) with the same SQL. Confirm whether a cold-start state can occur where graph_closure is empty but memories exist; if it can, fall back gracefully to 2-path retrieval.
- Turso rate limits: The Scaler plan allows 500 databases. With database-per-tenant, this caps the system at 500 active project databases before an upgrade to Enterprise is required. Plan the upgrade path before hitting this ceiling.
- Cold-start graph indexing UX: First project open triggers tree-sitter cold-start (30 seconds to 20 minutes). Agents should start with source: "ast" edges unavailable and progressively get better impact analysis. Prepend [Knowledge Graph: indexing in progress — impact analysis may be incomplete] to the first 3 agent sessions after project open.
- Personal memory vs. team memory conflict: If a team decision says "use PostgreSQL" and a developer's personal memory says "this client project uses SQLite," personal memories override project memories in retrieval scoring when the personal memory has higher confidence and is more recent. Never silently suppress team memories; surface both with attribution.
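The personal versus team conflict rule stated in the last item can be sketched as a small ordering function. The type `ScopedMemory`, its fields, and the names `rankConflict` and `withAttribution` are hypothetical; the only behavior taken from the text is that the personal memory ranks first when it is both higher-confidence and more recent, and that both memories are always returned with attribution.

```typescript
type Scope = 'personal' | 'team';

// Hypothetical shape; field names are assumptions for this example.
interface ScopedMemory {
  id: string;
  scope: Scope;
  author: string;
  content: string;
  confidence: number; // 0..1
  updatedAt: number;  // epoch milliseconds
}

// Returns BOTH memories, ordered. Nothing is suppressed.
function rankConflict(personal: ScopedMemory, team: ScopedMemory): ScopedMemory[] {
  const personalWins =
    personal.confidence > team.confidence && personal.updatedAt > team.updatedAt;
  return personalWins ? [personal, team] : [team, personal];
}

// Prefix each memory with its scope and author so the agent can see where it came from.
function withAttribution(memory: ScopedMemory): string {
  return `[${memory.scope}: ${memory.author}] ${memory.content}`;
}

const teamDecision: ScopedMemory = {
  id: 't1', scope: 'team', author: 'platform-team',
  content: 'use PostgreSQL', confidence: 0.8, updatedAt: 1_000,
};
const personalNote: ScopedMemory = {
  id: 'p1', scope: 'personal', author: 'dev',
  content: 'this client project uses SQLite', confidence: 0.9, updatedAt: 2_000,
};
console.log(rankConflict(personalNote, teamDecision).map(withAttribution).join('\n'));
```

Requiring both conditions keeps a stale but confident personal note, or a fresh but low-confidence one, from outranking a team decision.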
Document version: V5.0 — 2026-02-22 Built on: V4 Draft + Hackathon Teams 1-5 + Infrastructure Research Key V4→V5 changes: Turso/libSQL replaces better-sqlite3, Convex for auth/team/UI only, OpenAI text-embedding-3-small replaces Voyage, Graphiti Python sidecar removed (replaced by TS Knowledge Graph), AST chunking + contextual embeddings + graph neighborhood boost built in from day one, complete retrieval pipeline from day one (no phases), FTS5 everywhere (not Tantivy), Cohere Rerank API for cloud reranking