
Commit 06eeef0

Merge pull request #8 from Ashish-dwi99/beta-v1
feat: CLS Distillation Memory v1.4 — bio-inspired consolidation + pro…
2 parents c83902b + 2ca3dd9 commit 06eeef0

49 files changed

Lines changed: 3031 additions & 343 deletions


README.md

Lines changed: 51 additions & 3 deletions
@@ -55,6 +55,9 @@ But Engram isn't just a handoff bus. It solves four fundamental problems with ho
 | **Nobody forgets** | Store everything forever | **Ebbinghaus decay curve, ~45% less storage** |
 | **Agents write with no oversight** | Store directly | **Staging + verification + trust scoring** |
 | **No episodic memory** | Vector search only | **CAST scenes (time/place/topic)** |
+| **No consolidation** | Store everything as-is | **CLS Distillation — replay-driven fact extraction** |
+| **Single decay rate** | One exponential curve | **Multi-trace Benna-Fusi model (fast/mid/slow)** |
+| **No intent routing** | Same search for all queries | **Episodic vs semantic query classification** |
 | Multi-modal encoding | Single embedding | **5 retrieval paths (EchoMem)** |
 | Cross-agent memory sharing | Per-agent silos | **Scoped retrieval with all-but-mask privacy** |
 | Concurrent multi-agent access | Single-process locks | **sqlite-vec WAL mode — multiple agents, one DB** |
@@ -90,6 +93,9 @@ pip install "engram-memory[sqlite_vec]"
 # OpenAI provider add-on
 pip install "engram-memory[openai]"
 
+# NVIDIA provider add-on (Llama 3.1, nv-embed-v1, etc.)
+pip install "engram-memory[nvidia]"
+
 # Ollama provider add-on
 pip install "engram-memory[ollama]"
 ```
@@ -144,7 +150,7 @@ Engram has five opinions about how memory should work:
 
 1. **Switching agents shouldn't mean starting over.** When an agent pauses — rate limit, crash, tool switch — it saves a session digest. The next agent loads it and continues. Zero re-explanation.
 2. **Agents need shared real-time state.** Active Memory lets agents broadcast what they're doing right now — no polling, no coordination protocol. Agent A posts "editing auth.py"; Agent B sees it instantly.
-3. **Memory has a lifecycle.** New memories start in short-term (SML), get promoted to long-term (LML) through repeated access, and fade away through Ebbinghaus decay if unused.
+3. **Memory has a lifecycle.** New memories start in short-term (SML), get promoted to long-term (LML) through repeated access, and fade away through Ebbinghaus decay if unused. Sleep cycles distill episodic conversations into durable semantic facts (CLS consolidation), cascade strength traces from fast to slow, and prune redundant or contradictory memories.
 4. **Agents are untrusted writers.** Every write is a proposal that lands in staging. Trusted agents can auto-merge; untrusted ones wait for approval.
 5. **Scoping is mandatory.** Every memory is scoped by user. Agents see only what they're allowed to — everything else gets the "all but mask" treatment (structure visible, details redacted).
 
@@ -209,7 +215,7 @@ Engram has five opinions about how memory should work:
 
 ### The Memory Stack
 
-Engram combines seven systems, each handling a different aspect of how memory should work:
+Engram combines multiple systems, each handling a different aspect of how memory should work:
 
 #### Active Memory — Real-Time Signal Bus
 
@@ -289,6 +295,48 @@ Scene: "Engram v2 architecture session"
 Memories: [mem_1, mem_2] ← semantic facts extracted
 ```
 
+#### CLS Distillation Memory — Bio-Inspired Consolidation (v1.4)
+
+Inspired by Complementary Learning Systems (CLS) theory — how the hippocampus and neocortex work together in the brain. Engram v1.4 adds five mechanisms that make memory smarter over time:
+
+**1. Episodic/Semantic Memory Types**
+Conversations are stored as `episodic` memories. During sleep cycles, a replay-driven distiller extracts durable facts into `semantic` memories — just like how your brain consolidates experiences into knowledge overnight.
+
+**2. Replay-Driven Distillation**
+The `ReplayDistiller` samples recent episodic memories, groups them by scene/time, and uses the LLM to extract reusable semantic facts. Every distilled fact links back to its source episodes (provenance tracking).
+
+**3. Multi-Mechanism Forgetting**
+Beyond simple exponential decay, Engram now has three advanced forgetting mechanisms:
+- **Interference Pruning** — contradictory memories are detected and the weaker one is demoted
+- **Redundancy Collapse** — near-duplicate memories are auto-fused
+- **Homeostatic Normalization** — memory budgets per namespace prevent unbounded growth
+
+**4. Multi-Timescale Strength Traces (Benna-Fusi Model)**
+Each memory has three strength traces instead of one scalar:
+```
+s_fast (decay: 0.20/day)  — recent access, volatile
+s_mid  (decay: 0.05/day)  — medium-term consolidation
+s_slow (decay: 0.005/day) — durable long-term knowledge
+```
+New memories start in `s_fast`. Sleep cycles cascade strength: `fast → mid → slow`. Important facts become nearly permanent.
+
+**5. Intent-Aware Retrieval Routing**
+Queries are classified as episodic ("when did we discuss..."), semantic ("what is the deployment process?"), or mixed. Matching memory types get a retrieval boost — the right type of answer for the right type of question.
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                    Sleep Cycle (v1.4)                    │
+│                                                          │
+│  1. Standard FadeMem decay (SML/LML)                     │
+│  2. Multi-trace decay (fast/mid/slow independently)      │
+│  3. Interference pruning (contradict → demote weaker)    │
+│  4. Redundancy collapse (near-dupes → fuse)              │
+│  5. Homeostatic normalization (budget enforcement)       │
+│  6. Replay distillation (episodic → semantic facts)      │
+│  7. Trace cascade (fast → mid → slow consolidation)      │
+└──────────────────────────────────────────────────────────┘
+```
+
 #### Handoff Bus — Cross-Agent Continuity
 
 Engram now defaults to a zero-intervention continuity model: MCP adapters automatically request resume context before tool execution and auto-write checkpoints on lifecycle events (`tool_complete`, `agent_pause`, `agent_end`). The legacy tools (`save_session_digest`, `get_last_session`, `list_sessions`) remain available for compatibility.
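The multi-timescale traces and the `fast → mid → slow` cascade described in the README hunk above can be sketched in plain Python. This is a minimal illustration using the decay rates quoted there; the function names and the 10% cascade fraction are assumptions for the sketch, not Engram's actual implementation:

```python
import math

# Per-day decay rates quoted in the README (Benna-Fusi-style traces).
DECAY = {"fast": 0.20, "mid": 0.05, "slow": 0.005}

def decay_traces(traces, days=1.0):
    """Each trace decays exponentially at its own rate."""
    return {k: v * math.exp(-DECAY[k] * days) for k, v in traces.items()}

def cascade(traces, rate=0.1):
    """Sleep-cycle consolidation: shift a fraction of strength fast -> mid -> slow."""
    fast, mid, slow = traces["fast"], traces["mid"], traces["slow"]
    return {
        "fast": fast * (1 - rate),
        "mid": mid * (1 - rate) + fast * rate,
        "slow": slow + mid * rate,
    }

traces = {"fast": 1.0, "mid": 0.0, "slow": 0.0}  # freshly written memory
for _ in range(30):                              # thirty nightly sleep cycles
    traces = cascade(decay_traces(traces))
```

After a month of cycles the volatile `fast` trace has nearly vanished while the `slow` trace has accumulated durable strength, which is the "important facts become nearly permanent" behavior the README claims.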
@@ -785,7 +833,7 @@ Engram is based on:
 | Multi-hop Reasoning | +12% accuracy |
 | Retrieval Precision | +8% on LTI-Bench |
 
-Biological inspirations: Ebbinghaus Forgetting Curve → exponential decay, Spaced Repetition → access boosts strength, Sleep Consolidation → SML → LML promotion, Working Memory → Active Memory signal bus, Conscious/Subconscious Split → Active vs Passive memory, Production Effect → echo encoding, Elaborative Encoding → deeper processing = stronger memory.
+Biological inspirations: Ebbinghaus Forgetting Curve → exponential decay, Spaced Repetition → access boosts strength, Sleep Consolidation → SML → LML promotion + CLS replay distillation, Benna-Fusi Model → multi-timescale strength traces (fast/mid/slow), Complementary Learning Systems → episodic-to-semantic consolidation, Working Memory → Active Memory signal bus, Conscious/Subconscious Split → Active vs Passive memory, Production Effect → echo encoding, Elaborative Encoding → deeper processing = stronger memory.
 
 ---
 
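The intent-aware retrieval routing described in the v1.4 README notes (episodic vs semantic vs mixed queries) could be approximated with a keyword heuristic. A real implementation would likely use the LLM; the cue lists and the names `classify_intent` / `boost` here are illustrative assumptions, not Engram's API:

```python
# Surface cues for each query intent (assumed examples, mirroring the README's
# "when did we discuss..." vs "what is the deployment process?" framing).
EPISODIC_CUES = ("when did", "last time", "yesterday", "in our last session")
SEMANTIC_CUES = ("what is", "how do", "how does", "explain", "define")

def classify_intent(query: str) -> str:
    """Label a query as episodic, semantic, or mixed."""
    q = query.lower()
    episodic = any(cue in q for cue in EPISODIC_CUES)
    semantic = any(cue in q for cue in SEMANTIC_CUES)
    if episodic and not semantic:
        return "episodic"
    if semantic and not episodic:
        return "semantic"
    return "mixed"

def boost(results, intent, factor=1.25):
    """Re-rank results, boosting memories whose type matches the intent."""
    return sorted(
        results,
        key=lambda r: r["score"] * (factor if r["type"] == intent else 1.0),
        reverse=True,
    )

ranked = boost(
    [{"type": "semantic", "score": 0.50}, {"type": "episodic", "score": 0.55}],
    classify_intent("What is the deployment process?"),
)
```

With the semantic boost applied, the matching-type memory outranks the slightly higher raw score, which is the "right type of answer for the right type of question" effect.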
engram/api/app.py

Lines changed: 25 additions & 7 deletions
@@ -85,22 +85,32 @@ class DecayResponse(BaseModel):
     redoc_url="/redoc",
 )
 
+_cors_origins_raw = os.environ.get("ENGRAM_CORS_ORIGINS", "")
+_cors_origins = (
+    [o.strip() for o in _cors_origins_raw.split(",") if o.strip()]
+    if _cors_origins_raw
+    else ["http://localhost:3000", "http://127.0.0.1:3000"]
+)
+
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
+    allow_origins=_cors_origins,
     allow_credentials=True,
     allow_methods=["*"],
     allow_headers=["*"],
 )
 add_metrics_routes(app)
 
 _memory: Optional[Memory] = None
+_memory_lock = threading.Lock()
 
 
 def get_memory() -> Memory:
     global _memory
     if _memory is None:
-        _memory = Memory()
+        with _memory_lock:
+            if _memory is None:
+                _memory = Memory()
     return _memory
 
 
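The origin-parsing logic this hunk introduces can be restated as a standalone function: a comma-separated `ENGRAM_CORS_ORIGINS` value overrides the localhost defaults, with whitespace and empty entries stripped. The helper name `parse_cors_origins` is assumed for illustration; the parsing rules mirror the diff above:

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list; fall back to localhost defaults."""
    if raw:
        # Non-empty env var: take each comma-separated entry, trimmed,
        # skipping blanks (e.g. from trailing commas).
        return [o.strip() for o in raw.split(",") if o.strip()]
    # Unset/empty env var: permit the local dev frontends only.
    return ["http://localhost:3000", "http://127.0.0.1:3000"]
```

This replaces the earlier `allow_origins=["*"]`, which combined with `allow_credentials=True` is a known CORS foot-gun; an explicit allow-list is the safer default.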
@@ -403,7 +413,7 @@ async def search_memories(request: SearchRequestV2, http_request: Request):
         raise require_session_error(exc)
     except Exception as exc:
         logger.exception("Error searching memories")
-        raise HTTPException(status_code=500, detail=str(exc))
+        raise HTTPException(status_code=500, detail="Internal server error")
 
 
 @app.get("/v1/scenes")
@@ -494,7 +504,7 @@ async def add_memory(request: AddMemoryRequestV2, http_request: Request):
         raise require_session_error(exc)
     except Exception as exc:
         logger.exception("Error creating proposal/direct memory")
-        raise HTTPException(status_code=500, detail=str(exc))
+        raise HTTPException(status_code=500, detail="Internal server error")
 
 
 @app.get("/v1/staging/commits")
@@ -779,15 +789,19 @@ async def get_memory_by_id(memory_id: str):
 
 @app.put("/v1/memories/{memory_id}", response_model=Dict[str, Any])
 @app.put("/v1/memories/{memory_id}/", response_model=Dict[str, Any])
-async def update_memory(memory_id: str, request: Dict[str, Any]):
+async def update_memory(memory_id: str, request: Dict[str, Any], http_request: Request):
+    token = get_token_from_request(http_request)
+    require_token_for_untrusted_request(http_request, token)
     memory = get_memory()
     result = memory.update(memory_id, request)
     return result
 
 
 @app.delete("/v1/memories/{memory_id}")
 @app.delete("/v1/memories/{memory_id}/")
-async def delete_memory(memory_id: str):
+async def delete_memory(memory_id: str, http_request: Request):
+    token = get_token_from_request(http_request)
+    require_token_for_untrusted_request(http_request, token)
     memory = get_memory()
     memory.delete(memory_id)
     return {"status": "deleted", "id": memory_id}
@@ -796,14 +810,18 @@ async def delete_memory(memory_id: str):
 @app.delete("/v1/memories", response_model=Dict[str, Any])
 @app.delete("/v1/memories/", response_model=Dict[str, Any])
 async def delete_memories(
+    http_request: Request,
     user_id: Optional[str] = Query(default=None),
     agent_id: Optional[str] = Query(default=None),
     run_id: Optional[str] = Query(default=None),
     app_id: Optional[str] = Query(default=None),
+    dry_run: bool = Query(default=False, description="Preview what would be deleted without actually deleting"),
 ):
+    token = get_token_from_request(http_request)
+    require_token_for_untrusted_request(http_request, token)
     memory = get_memory()
     try:
-        return memory.delete_all(user_id=user_id, agent_id=agent_id, run_id=run_id, app_id=app_id)
+        return memory.delete_all(user_id=user_id, agent_id=agent_id, run_id=run_id, app_id=app_id, dry_run=dry_run)
     except FadeMemValidationError as exc:
         raise HTTPException(status_code=400, detail=exc.message)
 
engram/api/schemas.py

Lines changed: 4 additions & 4 deletions
@@ -93,31 +93,31 @@ class HandoffSessionDigestRequest(BaseModel):
 
 
 class SearchRequestV2(BaseModel):
-    query: str
+    query: str = Field(min_length=1, max_length=10000)
     user_id: str = Field(default="default")
     agent_id: Optional[str] = Field(default=None)
     limit: int = Field(default=10, ge=1, le=100)
     categories: Optional[List[str]] = Field(default=None)
 
 
 class AddMemoryRequestV2(BaseModel):
-    content: Optional[str] = Field(default=None)
+    content: Optional[str] = Field(default=None, max_length=100000)
     messages: Optional[Union[str, List[Dict[str, Any]]]] = Field(default=None)
     user_id: str = Field(default="default")
     agent_id: Optional[str] = Field(default=None)
     metadata: Optional[Dict[str, Any]] = Field(default=None)
    categories: Optional[List[str]] = Field(default=None)
     scope: Optional[str] = Field(default="work")
     namespace: Optional[str] = Field(default="default")
-    mode: str = Field(default="staging", description="staging|direct")
+    mode: Literal["staging", "direct"] = Field(default="staging", description="staging|direct")
     infer: bool = Field(default=False)
     source_app: Optional[str] = Field(default=None)
     source_type: str = Field(default="rest")
     source_event_id: Optional[str] = Field(default=None)
 
 
 class SceneSearchRequest(BaseModel):
-    query: str
+    query: str = Field(min_length=1, max_length=10000)
     user_id: str = Field(default="default")
     agent_id: Optional[str] = Field(default=None)
     limit: int = Field(default=10, ge=1, le=100)
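The `mode` change above tightens a free-form string into `Literal["staging", "direct"]`, so pydantic rejects any other value at request-parsing time. Restated without pydantic (the `validate_mode` helper is an illustrative assumption, not part of Engram), the enforced behavior is:

```python
# What the Literal["staging", "direct"] annotation enforces: only these two
# values are accepted, and "staging" remains the default.
ALLOWED_MODES = {"staging", "direct"}

def validate_mode(mode: str = "staging") -> str:
    """Accept only the allowed modes; reject anything else."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    return mode
```

Previously a typo like `mode="dierct"` would silently fall through whatever branch handled unknown modes; with the `Literal` type the API returns a 422 validation error instead.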

engram/configs/active.py

Lines changed: 22 additions & 7 deletions
@@ -3,7 +3,7 @@
 from enum import Enum
 from typing import Dict
 
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator
 
 
 class TTLTier(str, Enum):
@@ -25,6 +25,14 @@ class SignalScope(str, Enum):
     NAMESPACE = "namespace"  # Only agents in same namespace
 
 
+class ConsolidationConfig(BaseModel):
+    """Configuration for active → passive memory consolidation."""
+    promote_critical: bool = True
+    promote_high_read: bool = True
+    promote_read_threshold: int = 3
+    directive_to_passive: bool = True
+
+
 class ActiveMemoryConfig(BaseModel):
     """Configuration for the Active Memory signal bus."""
     enabled: bool = True
@@ -40,11 +48,18 @@
     consolidation_enabled: bool = True
     consolidation_min_age_seconds: int = 600
     consolidation_min_reads: int = 3
+    consolidation: ConsolidationConfig = Field(default_factory=ConsolidationConfig)
 
+    @field_validator("default_ttl_tier")
+    @classmethod
+    def _valid_ttl_tier(cls, v: str) -> str:
+        allowed = {t.value for t in TTLTier}
+        v = str(v).strip().lower()
+        if v not in allowed:
+            return TTLTier.NOTABLE.value
+        return v
 
-class ConsolidationConfig(BaseModel):
-    """Configuration for active → passive memory consolidation."""
-    promote_critical: bool = True
-    promote_high_read: bool = True
-    promote_read_threshold: int = 3
-    directive_to_passive: bool = True
+    @field_validator("max_signals_per_response")
+    @classmethod
+    def _clamp_max_signals(cls, v: int) -> int:
+        return min(100, max(1, int(v)))
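The two validators added above normalize instead of reject: an unknown TTL tier falls back to the `notable` default, and `max_signals_per_response` is clamped into [1, 100]. The clamp logic, restated as a plain function for illustration (the name `clamp_max_signals` is assumed):

```python
def clamp_max_signals(v: int) -> int:
    """Coerce max_signals_per_response into the [1, 100] range
    rather than raising a validation error."""
    return min(100, max(1, int(v)))
```

Clamping (rather than raising) is a deliberate design choice here: a bad config value degrades gracefully to the nearest sane bound instead of crashing agent startup.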
