feat(instrument): Memory persistence (episodic/procedural/semantic) for 7 lighter adapters (cross-poll #1) #130

Merged
m-peko merged 1 commit into feat/instrument-multitenancy-org-id-propagation from feat/instrument-memory-persistence
May 12, 2026
@mmercuri

Contributor

Cross-pollination item #1 — memory persistence (episodic / procedural / semantic)

Implements cross-pollination item #1 from A:/tmp/adapter-cross-pollination-audit.md section 2 #1. The four mature framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry ad-hoc memory plumbing — episodic recent turns, procedural learned patterns, semantic long-lived facts — that lets agents recall context across runs. The lighter adapters (agno, ms_agent_framework, openai_agents, llama_index, google_adk, bedrock_agents, browser_use) all behave as goldfish agents — every run starts from a blank slate. This PR ports the pattern into a shared, replay-safe primitive that the lighter adapters plug into uniformly.

What is new

Shared memory primitive

src/layerlens/instrument/adapters/_base/memory.py — new

  • MemorySnapshot — frozen dataclass with turn_index, episodic (recent turns), procedural (detected patterns), semantic (key/value facts), content_hash (SHA-256 of canonical-JSON encoding), org_id (tenant binding). to_dict / from_dict round-trip preserves identity.
  • MemoryRecorder — thread-safe accumulator. record_turn(...) is the per-turn entry point; set_semantic(key, value) for long-lived facts; snapshot() returns the immutable view; restore(snap) rebuilds state from a previous snapshot. All buckets bounded (defaults 200/16/64); episodic FIFO eviction, semantic LRU, procedural keep-top-by-count.
  • Procedural pattern detection: O(window) per turn, scans the recent episodic window for recurring (prev_tools, current_tools) pairs.
  • Multi-tenant: recorder requires non-empty org_id at construction; restore() rejects cross-tenant snapshots and tampered snapshots (content-hash mismatch).
  • Replay-safe: snapshot() -> restore() -> snapshot() round-trip produces byte-identical content_hash.
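
The tamper-evident, round-trippable snapshot described above can be illustrated with a minimal sketch. It is not the code in memory.py: `SnapshotSketch`, `canonical_hash`, and `create` are hypothetical names, and the choice of sorted keys with compact separators as the "canonical" JSON form is an assumption.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any


def canonical_hash(body: dict[str, Any]) -> str:
    # Sorted keys and fixed separators keep the bytes independent of dict insertion order.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical stand-in for MemorySnapshot; field names follow the PR text."""

    org_id: str
    turn_index: int
    episodic: tuple
    procedural: tuple
    semantic: tuple  # ordered (key, value) pairs
    content_hash: str

    @classmethod
    def create(cls, org_id, turn_index, episodic=(), procedural=(), semantic=()):
        semantic = tuple(tuple(pair) for pair in semantic)
        body = {
            "org_id": org_id,  # hashing org_id makes the digest tenant-specific
            "turn_index": turn_index,
            "episodic": list(episodic),
            "procedural": list(procedural),
            "semantic": [list(pair) for pair in semantic],
        }
        return cls(org_id, turn_index, tuple(episodic), tuple(procedural),
                   semantic, canonical_hash(body))

    def to_dict(self) -> dict[str, Any]:
        return {
            "org_id": self.org_id,
            "turn_index": self.turn_index,
            "episodic": list(self.episodic),
            "procedural": list(self.procedural),
            "semantic": [list(pair) for pair in self.semantic],
            "content_hash": self.content_hash,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SnapshotSketch":
        # A missing required field raises KeyError; a recomputed-hash mismatch raises ValueError.
        snap = cls.create(data["org_id"], data["turn_index"], data["episodic"],
                          data["procedural"], data["semantic"])
        if snap.content_hash != data["content_hash"]:
            raise ValueError("content_hash does not match snapshot contents")
        return snap
```

Because the digest is recomputed from the fields rather than trusted from the payload, any edit to a serialised snapshot surfaces as a mismatch when it is loaded.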

BaseAdapter integration

src/layerlens/instrument/adapters/_base/adapter.py

  • Constructor builds self._memory_recorder = MemoryRecorder(org_id=self._org_id).
  • New record_memory_turn(...) helper — best-effort wrapper that swallows recorder failures so memory persistence never breaks the host framework call stack (CLAUDE.md "tracing never breaks user code").
  • memory_recorder property, memory_snapshot() and memory_snapshot_dict() convenience accessors.
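
A best-effort wrapper of the kind described for record_memory_turn might look like the sketch below. The keyword-argument signature, the `AdapterSketch` and `FlakyRecorder` classes, and the debug-level logging are all assumptions for illustration; the PR text only states that recorder failures are swallowed.

```python
import logging

logger = logging.getLogger("memory_sketch")  # hypothetical logger name


class FlakyRecorder:
    """Test double whose record_turn always fails."""

    def record_turn(self, **turn):
        raise RuntimeError("recorder unavailable")


class AdapterSketch:
    def __init__(self, recorder):
        self._memory_recorder = recorder

    def record_memory_turn(self, **turn):
        # Never let a memory failure propagate into the host framework's call stack.
        try:
            return self._memory_recorder.record_turn(**turn)
        except Exception:
            logger.debug("memory recording failed; continuing", exc_info=True)
            return None


result = AdapterSketch(FlakyRecorder()).record_memory_turn(input="hi", tools=[])
```

Catching `Exception` rather than `BaseException` keeps `KeyboardInterrupt` and `SystemExit` flowing to the caller, which is usually what a tracing layer wants.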

Per-adapter wiring (6 adapters)

| Adapter | Hook | Episodic input | Tool list |
| --- | --- | --- | --- |
| agno | Agent.run/arun finally-block | args[0]/kwargs["input"] | _collect_tool_names from result.messages |
| ms_agent_framework | Chat.invoke/invoke_stream finally-block | kwargs["input"]/["message"] | _collect_tool_names_from_messages from streamed items |
| openai_agents | _on_agent_span_end (TraceProcessor) + on_run_end | cached at span_start per span_id | rolled up from _on_function_span_end per parent_id |
| llama_index | _on_agent_step_end | cached at step_start per thread id | rolled up from _on_tool_call per thread id |
| google_adk | after_agent_callback + on_agent_end | cached at before_agent_callback per thread id | rolled up from after_tool_callback per thread id |
| bedrock_agents | _after_invoke_agent (boto3 hook) | cached at _before_invoke_agent per thread id | rolled up from _process_trace action-group / KB names |

Each adapter's serialize_for_replay() now embeds the snapshot under ReplayableTrace.metadata["memory_snapshot"] so replay engines can reconstruct memory state via MemorySnapshot.from_dict(...) -> recorder.restore(snapshot) before re-execution.
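
The capture-then-restore flow can be sketched end to end as follows. `RecorderSketch` is a hypothetical, episodic-only stand-in for MemoryRecorder that works on plain dicts; the exception types and the exact shape of the serialised trace are assumptions, and only the `metadata["memory_snapshot"]` key comes from the PR.

```python
import copy
import hashlib
import json


def content_hash(body):
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class RecorderSketch:
    def __init__(self, org_id):
        if not isinstance(org_id, str) or not org_id.strip():
            raise ValueError("org_id must be a non-empty string")
        self.org_id = org_id
        self.turn_index = 0
        self.episodic = []

    def record_turn(self, text, tools=()):
        self.turn_index += 1
        self.episodic.append({"turn": self.turn_index, "input": text, "tools": list(tools)})
        return self.turn_index

    def snapshot_dict(self):
        body = {"org_id": self.org_id, "turn_index": self.turn_index,
                "episodic": copy.deepcopy(self.episodic)}
        return {**body, "content_hash": content_hash(body)}

    def restore(self, snap):
        body = {k: v for k, v in snap.items() if k != "content_hash"}
        if body.get("org_id") != self.org_id:
            raise PermissionError("snapshot belongs to a different tenant")
        if content_hash(body) != snap.get("content_hash"):
            raise ValueError("snapshot content hash mismatch")
        self.turn_index = body["turn_index"]
        self.episodic = copy.deepcopy(body["episodic"])


# Capture side: embed the snapshot in the serialised trace metadata.
recorder = RecorderSketch("org-a")
recorder.record_turn("find flights", tools=["search"])
serialized = json.dumps({"metadata": {"memory_snapshot": recorder.snapshot_dict()}})

# Replay side: rebuild memory state before re-executing.
replayed = RecorderSketch("org-a")
replayed.restore(json.loads(serialized)["metadata"]["memory_snapshot"])
```

Checking the tenant before the hash means a snapshot from another organisation is rejected on identity alone, regardless of whether its contents are intact.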

Tests (57 new)

tests/instrument/adapters/_base/test_memory.py — 27 tests

  • Recorder construction (empty/whitespace/non-string org_id rejected; zero buffer sizes rejected; initial state empty).
  • Snapshot determinism (identical content -> identical hash; different org_id -> different hash; mutating recorder doesn't affect prior snapshot; to_dict/from_dict round-trip preserves hash; from_dict rejects missing required fields).
  • Replay round-trip (snapshot -> restore -> snapshot byte-identical hash; deterministic next-state under matching inputs; cross-tenant restore raises; tampered-content-hash restore raises).
  • Bounded eviction (episodic FIFO at cap; semantic LRU at cap; semantic overwrite refreshes LRU; semantic key validation; procedural cap).
  • Procedural detection (repeated tool sequences accumulate count; no-tool turns produce no patterns).
  • Per-turn truncation (multi-megabyte values capped with deterministic suffix).
  • Thread safety (8 threads x 50 turns produces unbroken 1..400 sequence).
  • Clear preserves binding; defaults positive; extra metadata sorted for hash determinism; record_turn returns post-increment counter.
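
The three eviction policies exercised by these tests can be sketched with standard-library containers. `BoundedBuckets` is a hypothetical class, not the recorder itself; caps are passed explicitly because the mapping of the 200/16/64 defaults to buckets is not spelled out here.

```python
from collections import Counter, OrderedDict, deque


class BoundedBuckets:
    def __init__(self, episodic_cap, semantic_cap, procedural_cap):
        if min(episodic_cap, semantic_cap, procedural_cap) <= 0:
            raise ValueError("buffer sizes must be positive")
        self.episodic = deque(maxlen=episodic_cap)  # FIFO: oldest turn drops at cap
        self.semantic = OrderedDict()               # LRU: least recently written drops
        self._semantic_cap = semantic_cap
        self.procedural = Counter()                 # keep-top-by-count
        self._procedural_cap = procedural_cap

    def add_turn(self, turn):
        self.episodic.append(turn)

    def set_semantic(self, key, value):
        if key in self.semantic:
            self.semantic.move_to_end(key)  # an overwrite refreshes recency
        self.semantic[key] = value
        while len(self.semantic) > self._semantic_cap:
            self.semantic.popitem(last=False)

    def bump_pattern(self, pattern):
        self.procedural[pattern] += 1
        if len(self.procedural) > self._procedural_cap:
            top = self.procedural.most_common(self._procedural_cap)
            self.procedural = Counter(dict(top))


buckets = BoundedBuckets(episodic_cap=3, semantic_cap=2, procedural_cap=2)
for turn in range(1, 6):
    buckets.add_turn(turn)
for key in ("x", "y", "x", "z"):
    buckets.set_semantic(key, key.upper())
for pattern in ("a", "a", "b", "c"):
    buckets.bump_pattern(pattern)
```

One consequence of this naive keep-top policy is that a brand-new pattern arriving at cap ties with other count-1 entries and loses to the earlier ones, so a real implementation may want an admission rule on top.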

tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py — 30 tests (5 x 6 adapters parametrized)

  • Each adapter exposes a recorder bound to its org_id.
  • record_memory_turn advances the episodic buffer.
  • serialize_for_replay() embeds metadata["memory_snapshot"].
  • Replay engine can restore the recorder from the serialised trace (content-hash match end-to-end).
  • Cross-tenant snapshot is rejected at the per-adapter recorder boundary.

Existing test suites unchanged

  • tests/instrument/adapters/_base/ — 44 passed (no regressions).
  • 6 target adapter test modules — 72 passed (no regressions).

Documentation

docs/adapters/memory-contract.md — explains the three buckets, the contract (tenant binding, bounded buffers, tamper-evident snapshots, replay-safe round-trip, best-effort recording, thread safety), per-adapter wiring matrix, and audit hooks. Includes the replay-engine integration recipe and the honest scope disclosure for browser_use.

Honest scope disclosure (browser_use)

The cross-pollination audit section 2 #1 enumerates seven target adapters. Six are wired here. The seventh — browser_use — does NOT exist on this PR base branch (feat/instrument-multitenancy-org-id-propagation); it lives on the parallel feat/instrument-frameworks-browser-use-full history. It will be wired when that adapter is ported to this base or when the histories merge. This follows the same honest-disclosure pattern as PR #120 (state filters, which omitted ms_agent_framework for the same reason — adapter not on its base).

The future browser_use wiring (per audit section 2 #1) will be:

  • Episodic — page navigation events (URL, action, selector) per turn.
  • Procedural — recurring (prev_action, current_action) patterns (e.g. click[search] -> type[query] -> click[submit]).
  • Semantic — long-lived page-content cache keyed by URL or DOM hash, so a re-visit can short-circuit page reload during replay.

Acceptance

uv run pytest tests/instrument/adapters/_base/test_memory.py -x
# 27 passed in 0.45s

uv run pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x
# 30 passed in 0.76s

uv run pytest tests/instrument/adapters/_base/
# 44 passed (no regressions)

uv run pytest tests/instrument/adapters/frameworks/{6 target adapters}_adapter.py
# 72 passed (no regressions)

uv run mypy --strict src/layerlens/instrument/adapters/_base/memory.py
# Success: no issues found in 1 source file

uv run mypy src/layerlens/instrument/adapters/_base/adapter.py src/layerlens/instrument/adapters/frameworks/{6 adapters}/lifecycle.py
# Success: no issues found in 7 source files

uv run ruff check src/layerlens/instrument/adapters/_base/memory.py tests/instrument/adapters/_base/test_memory.py tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py
# All checks passed!

Test plan

  • pytest tests/instrument/adapters/_base/test_memory.py -x (27 tests)
  • pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x (30 tests, 5 x 6 adapters)
  • pytest tests/instrument/adapters/_base/ (no regressions on resilience tests)
  • pytest tests/instrument/adapters/frameworks/{6 target adapters}_adapter.py (no regressions)
  • mypy --strict src/layerlens/instrument/adapters/_base/memory.py
  • mypy src/.../{base + 6 adapter lifecycle.py files}
  • ruff check
  • CI must pass on Linux / 3.10-3.13 matrix (developed on Windows / 3.9)

…or 6 lighter adapters (cross-poll #1)

Implements cross-pollination item #1 from
A:/tmp/adapter-cross-pollination-audit.md section 2 #1. The four mature
framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry
ad-hoc memory plumbing — episodic recent turns, procedural learned
patterns, semantic long-lived facts — that lets agents recall context
across runs. The lighter adapters (agno, ms_agent_framework,
openai_agents, llama_index, google_adk, bedrock_agents, browser_use)
all behave as goldfish agents — every run starts from a blank slate.
This PR ports the pattern into a shared, replay-safe primitive that
the lighter adapters plug into uniformly.

## What is new

### Shared memory primitive
src/layerlens/instrument/adapters/_base/memory.py — new

- MemorySnapshot — frozen dataclass with turn_index, episodic
  (recent turns), procedural (detected patterns), semantic
  (key/value facts), content_hash (SHA-256 of canonical-JSON
  encoding), org_id (tenant binding). to_dict / from_dict
  round-trip preserves identity.
- MemoryRecorder — thread-safe accumulator. record_turn(...) is the
  per-turn entry point; set_semantic(key, value) for long-lived
  facts; snapshot() returns the immutable view; restore(snap)
  rebuilds state from a previous snapshot. All buckets bounded
  (defaults 200/16/64); episodic FIFO eviction, semantic LRU,
  procedural keep-top-by-count.
- Procedural pattern detection: O(window) per turn, scans the recent
  episodic window for recurring (prev_tools, current_tools) pairs.
- Multi-tenant: recorder requires non-empty org_id at construction;
  restore() rejects cross-tenant snapshots and tampered snapshots
  (content-hash mismatch).
- Replay-safe: snapshot -> restore -> snapshot round-trip
  produces byte-identical content_hash.
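
As an illustration of the O(window) procedural scan, the sketch below counts (prev_tools, current_tools) pairs over adjacent turns in a bounded window. `detect_patterns` is a hypothetical function, and skipping any pair in which either turn used no tools is an assumption about how no-tool turns are handled.

```python
from collections import Counter, deque


def detect_patterns(window):
    """Count (prev_tools, current_tools) pairs across adjacent turns in the window."""
    counts = Counter()
    turns = list(window)
    # One pass over adjacent pairs: cost is linear in the window size.
    for prev_tools, current_tools in zip(turns, turns[1:]):
        if prev_tools and current_tools:
            counts[(tuple(prev_tools), tuple(current_tools))] += 1
    return counts


# The window holds only the tool lists of the most recent turns.
window = deque(maxlen=4)
for tools in (["search"], ["fetch"], ["search"], ["fetch"]):
    window.append(tools)

patterns = detect_patterns(window)
```

Here the pair search -> fetch recurs twice and fetch -> search once, so a keep-top-by-count bucket would retain the former first.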

### BaseAdapter integration
src/layerlens/instrument/adapters/_base/adapter.py

- Constructor builds self._memory_recorder = MemoryRecorder(org_id=self._org_id).
- New record_memory_turn(...) helper — best-effort wrapper that swallows
  recorder failures so memory persistence never breaks the host
  framework call stack (CLAUDE.md "tracing never breaks user code").
- memory_recorder property, memory_snapshot() and
  memory_snapshot_dict() convenience accessors.

### Per-adapter wiring (6 adapters)

- agno: Agent.run/arun finally-block; episodic input from args/kwargs;
  tool list from _collect_tool_names(result.messages).
- ms_agent_framework: Chat.invoke/invoke_stream finally-block;
  episodic input from kwargs; tool list from streamed message items.
- openai_agents: _on_agent_span_end (TraceProcessor) + on_run_end
  (Runner wrap); episodic input cached at span_start per span_id;
  tool list rolled up from _on_function_span_end per parent_id.
- llama_index: _on_agent_step_end; episodic input cached at
  step_start per thread id; tool list rolled up from _on_tool_call.
- google_adk: after_agent_callback + on_agent_end; episodic input
  cached at before_agent_callback per thread id; tool list rolled
  up from after_tool_callback per thread id.
- bedrock_agents: _after_invoke_agent (boto3 hook); episodic input
  cached at _before_invoke_agent per thread id; tool list rolled up
  from _process_trace action-group / KB step names.

Each adapter's serialize_for_replay() now embeds the snapshot under
ReplayableTrace.metadata["memory_snapshot"] so replay engines can
reconstruct memory state via MemorySnapshot.from_dict(...) ->
recorder.restore(snapshot) before re-execution.
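
The cache-then-roll-up wiring used by the span-based adapters can be sketched as below, loosely following the openai_agents row: the input is cached when the agent span starts, tool names accumulate under their parent span, and one turn is recorded when the agent span ends. `SpanRollupSketch` and its method signatures are hypothetical and do not reproduce the real TraceProcessor interface.

```python
import threading
from collections import defaultdict


class SpanRollupSketch:
    def __init__(self, record_turn):
        self._record_turn = record_turn  # e.g. the adapter's best-effort helper
        self._lock = threading.Lock()
        self._inputs = {}
        self._tools = defaultdict(list)

    def on_agent_span_start(self, span_id, agent_input):
        with self._lock:
            self._inputs[span_id] = agent_input

    def on_function_span_end(self, parent_id, tool_name):
        with self._lock:
            self._tools[parent_id].append(tool_name)

    def on_agent_span_end(self, span_id):
        # Pop both caches so finished spans do not leak memory.
        with self._lock:
            agent_input = self._inputs.pop(span_id, None)
            tools = self._tools.pop(span_id, [])
        self._record_turn(input=agent_input, tools=tools)


recorded = []
rollup = SpanRollupSketch(lambda **turn: recorded.append(turn))
rollup.on_agent_span_start("a1", "find flights")
rollup.on_function_span_end("a1", "search")
rollup.on_function_span_end("other", "noise")
rollup.on_function_span_end("a1", "book")
rollup.on_agent_span_end("a1")
```

The recorder callback is invoked outside the lock so a slow or failing recorder cannot block other spans from updating the caches.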

## Tests (57 new)

### tests/instrument/adapters/_base/test_memory.py — 27 tests

Recorder construction (empty/non-string org_id rejected; zero buffer
sizes rejected; initial state empty). Snapshot determinism (identical
content -> identical hash; different org_id -> different hash;
mutating recorder doesn't affect prior snapshot; to_dict/from_dict
round-trip preserves hash; from_dict rejects missing required fields).
Replay round-trip (snapshot -> restore -> snapshot byte-identical
hash; deterministic next-state under matching inputs; cross-tenant
restore raises; tampered-content-hash restore raises). Bounded
eviction (episodic FIFO at cap; semantic LRU at cap; semantic
overwrite refreshes LRU; procedural cap). Procedural detection
(repeated tool sequences accumulate count; no-tool turns produce
no patterns). Per-turn truncation (multi-megabyte values capped with
deterministic suffix). Thread safety (8 threads x 50 turns produces
unbroken 1..400 sequence). Clear preserves binding; defaults positive;
extra metadata sorted for hash determinism.
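
The thread-safety check (8 threads x 50 turns yielding an unbroken 1..400 sequence) can be reproduced in miniature. `CounterRecorder` and `run_threads` are hypothetical; the sketch only models the post-increment counter that record_turn returns, guarded by a lock.

```python
import threading


class CounterRecorder:
    def __init__(self):
        self._lock = threading.Lock()
        self._turn_index = 0

    def record_turn(self):
        # The increment and the read happen under one lock, so no index repeats or is skipped.
        with self._lock:
            self._turn_index += 1
            return self._turn_index


def run_threads(recorder, n_threads=8, turns=50):
    seen = []
    seen_lock = threading.Lock()

    def worker():
        for _ in range(turns):
            index = recorder.record_turn()
            with seen_lock:
                seen.append(index)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(seen)


indices = run_threads(CounterRecorder())
```

Sorting the collected indices and comparing against `range(1, 401)` detects both duplicates and gaps in a single assertion.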

### tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py — 30 tests (5 x 6 adapters parametrized)

- Each adapter exposes a recorder bound to its org_id.
- record_memory_turn advances the episodic buffer.
- serialize_for_replay() embeds metadata["memory_snapshot"].
- Replay engine can restore the recorder from the serialised trace
  (content-hash match end-to-end).
- Cross-tenant snapshot is rejected at the per-adapter recorder
  boundary.

## Documentation

docs/adapters/memory-contract.md — explains the three buckets,
the contract (tenant binding, bounded buffers, tamper-evident
snapshots, replay-safe round-trip, best-effort recording, thread
safety), per-adapter wiring matrix, and audit hooks. Includes the
replay-engine integration recipe and the honest scope disclosure
for browser_use.

## Honest scope disclosure

The cross-pollination audit section 2 #1 enumerates seven target
adapters. Six are wired here. The seventh — browser_use — does
NOT exist on this PR base branch
(feat/instrument-multitenancy-org-id-propagation); it lives on the
parallel feat/instrument-frameworks-browser-use-full history. It
will be wired when that adapter is ported to this base or when the
histories merge. This follows the same honest-disclosure pattern as
PR #120 (state filters, which omitted ms_agent_framework for the
same reason).

The future browser_use wiring (per audit section 2 #1) will be:
- Episodic: page navigation events (URL, action, selector)
- Procedural: recurring (prev_action, current_action) patterns
- Semantic: long-lived page-content cache keyed by URL/DOM hash

## Acceptance

uv run pytest tests/instrument/adapters/_base/test_memory.py -x
  -> 27 passed

uv run pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x
  -> 30 passed

uv run pytest tests/instrument/adapters/_base/
  -> 44 passed (no regressions)

uv run pytest tests/instrument/adapters/frameworks/{agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py
  -> 72 passed (no regressions)

uv run mypy --strict src/layerlens/instrument/adapters/_base/memory.py
  -> Success: no issues found in 1 source file

uv run mypy src/layerlens/instrument/adapters/_base/adapter.py src/layerlens/instrument/adapters/frameworks/{6 adapters}/lifecycle.py
  -> Success: no issues found in 7 source files

uv run ruff check src/layerlens/instrument/adapters/_base/memory.py tests/instrument/adapters/_base/test_memory.py tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py
  -> All checks passed!
@mmercuri mmercuri requested a review from m-peko April 27, 2026 04:53
@m-peko m-peko marked this pull request as ready for review May 12, 2026 17:39
@m-peko m-peko merged commit 9d54477 into feat/instrument-multitenancy-org-id-propagation May 12, 2026