feat(instrument): Memory persistence (episodic/procedural/semantic) for 7 lighter adapters (cross-poll #1) by mmercuri · Pull Request #130 · LayerLens/stratix-python

mmercuri · 2026-04-27T04:53:43Z

Cross-pollination item #1 — memory persistence (episodic / procedural / semantic)

Implements cross-pollination item #1 from A:/tmp/adapter-cross-pollination-audit.md section 2 #1. The four mature framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry ad-hoc memory plumbing (episodic recent turns, procedural learned patterns, semantic long-lived facts) that lets agents recall context across runs. The seven lighter adapters (agno, ms_agent_framework, openai_agents, llama_index, google_adk, bedrock_agents, browser_use) behave as goldfish agents: every run starts from a blank slate. This PR ports the pattern into a shared, replay-safe primitive that the lighter adapters plug into uniformly. Six are wired here; browser_use is deferred for the reason given in the scope disclosure below.

What is new

Shared memory primitive

src/layerlens/instrument/adapters/_base/memory.py — new

MemorySnapshot — frozen dataclass with turn_index, episodic (recent turns), procedural (detected patterns), semantic (key/value facts), content_hash (SHA-256 of canonical-JSON encoding), org_id (tenant binding). to_dict / from_dict round-trip preserves identity.
MemoryRecorder — thread-safe accumulator. record_turn(...) is the per-turn entry point; set_semantic(key, value) for long-lived facts; snapshot() returns the immutable view; restore(snap) rebuilds state from a previous snapshot. All buckets bounded (defaults 200/16/64); episodic FIFO eviction, semantic LRU, procedural keep-top-by-count.
Procedural pattern detection: O(window) per turn, scans the recent episodic window for recurring (prev_tools, current_tools) pairs.
Multi-tenant: recorder requires non-empty org_id at construction; restore() rejects cross-tenant snapshots and tampered snapshots (content-hash mismatch).
Replay-safe: snapshot() -> restore() -> snapshot() round-trip produces byte-identical content_hash.
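The bucket policies above can be sketched in plain Python. This is not the PR's `MemoryRecorder`. The class name `BoundedMemorySketch` and the `window` parameter are assumptions made for illustration. So is the rule of pairing each tool-using turn with the most recent earlier tool-using turn inside the window. The sizes are passed explicitly because the description does not say which default (200/16/64) belongs to which bucket.

```python
from collections import Counter, OrderedDict, deque
from itertools import islice
from typing import Any, Deque, Dict, List, Tuple

# (prev_tools, current_tools): the pair shape named in the PR description.
Pattern = Tuple[Tuple[str, ...], Tuple[str, ...]]


class BoundedMemorySketch:
    """Hypothetical stand-in for MemoryRecorder's three bounded buckets."""

    def __init__(self, *, max_episodic: int, max_procedural: int,
                 max_semantic: int, window: int = 8) -> None:
        if min(max_episodic, max_procedural, max_semantic, window) <= 0:
            raise ValueError("buffer sizes must be positive")
        self.episodic: Deque[Dict[str, Any]] = deque(maxlen=max_episodic)  # FIFO: oldest turn drops first
        self.semantic: "OrderedDict[str, Any]" = OrderedDict()  # order doubles as recency order
        self.procedural: "Counter[Pattern]" = Counter()
        self._max_procedural = max_procedural
        self._max_semantic = max_semantic
        self._window = window
        self.turn_index = 0

    def record_turn(self, user_input: str, tools: List[str]) -> int:
        current = tuple(tools)
        if current:
            # O(window): walk back at most `window` turns to the latest tool-using turn.
            for prior in islice(reversed(self.episodic), self._window):
                if prior["tools"]:
                    self.procedural[(prior["tools"], current)] += 1
                    break
            if len(self.procedural) > self._max_procedural:
                # Keep the highest-count patterns and drop the rest.
                self.procedural = Counter(dict(self.procedural.most_common(self._max_procedural)))
        self.turn_index += 1
        self.episodic.append({"turn": self.turn_index, "input": user_input, "tools": current})
        return self.turn_index  # post-increment counter

    def set_semantic(self, key: str, value: Any) -> None:
        if not isinstance(key, str) or not key.strip():
            raise ValueError("semantic key must be a non-empty string")
        self.semantic[key] = value
        self.semantic.move_to_end(key)  # overwriting refreshes recency
        while len(self.semantic) > self._max_semantic:
            self.semantic.popitem(last=False)  # evict the least recently set key
```

A `deque` with `maxlen` gives FIFO eviction for free. `OrderedDict.move_to_end` keeps the semantic bucket in least-recently-set order. This LRU tracks writes only. The description does not say whether the real recorder also refreshes recency on reads.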

BaseAdapter integration

src/layerlens/instrument/adapters/_base/adapter.py

Constructor builds self._memory_recorder = MemoryRecorder(org_id=self._org_id).
New record_memory_turn(...) helper: a best-effort wrapper that swallows recorder failures so memory persistence never breaks the host framework call stack (per the CLAUDE.md rule "tracing never breaks user code").
memory_recorder property, memory_snapshot() and memory_snapshot_dict() convenience accessors.
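Here is a minimal sketch of the best-effort contract. It assumes a recorder object with a `record_turn(**kwargs)` method. `BestEffortAdapterSketch` and `instrumented_run` are hypothetical names. The real helper lives on `BaseAdapter`, and the description elides its signature. Placing the call in a finally-block mirrors how the agno and ms_agent_framework hooks are described.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("memory_sketch")


class BestEffortAdapterSketch:
    """Hypothetical adapter showing the best-effort contract of record_memory_turn."""

    def __init__(self, org_id: str, recorder: Any) -> None:
        self._org_id = org_id
        self._memory_recorder = recorder

    def record_memory_turn(self, **turn: Any) -> Optional[int]:
        try:
            return self._memory_recorder.record_turn(**turn)
        except Exception:  # deliberately broad: tracing never breaks user code
            logger.debug("memory recording failed (org_id=%s)", self._org_id, exc_info=True)
            return None


def instrumented_run(adapter: BestEffortAdapterSketch,
                     run: Callable[[str], Any], user_input: str) -> Any:
    """Wrap a host-framework call and record the turn in a finally-block."""
    result = None
    try:
        result = run(user_input)
        return result
    finally:
        # Runs on success and on failure; a recorder error is swallowed above,
        # so it can neither replace the host's result nor mask the host's exception.
        adapter.record_memory_turn(input=user_input, output=result)
```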

Per-adapter wiring (6 adapters)

| Adapter | Hook | Episodic input | Tool list |
|---|---|---|---|
| `agno` | `Agent.run` / `arun` finally-block | `args[0]` / `kwargs["input"]` | `_collect_tool_names` from `result.messages` |
| `ms_agent_framework` | `Chat.invoke` / `invoke_stream` finally-block | `kwargs["input"]` / `kwargs["message"]` | `_collect_tool_names_from_messages` from streamed items |
| `openai_agents` | `_on_agent_span_end` (TraceProcessor) + `on_run_end` (Runner wrap) | cached at span_start per `span_id` | rolled up from `_on_function_span_end` per `parent_id` |
| `llama_index` | `_on_agent_step_end` | cached at step_start per thread id | rolled up from `_on_tool_call` per thread id |
| `google_adk` | `after_agent_callback` + `on_agent_end` | cached at `before_agent_callback` per thread id | rolled up from `after_tool_callback` per thread id |
| `bedrock_agents` | `_after_invoke_agent` (boto3 hook) | cached at `_before_invoke_agent` per thread id | rolled up from `_process_trace` action-group / KB step names |
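Several adapters in the table cannot see the input, the output, and the tool calls in a single callback. They cache the input at the start event and roll tool names up to the owning span or thread. The sketch below shows that shape with invented method names (`agent_span_start`, `function_span_end`, `agent_span_end`). These are not the OpenAI Agents SDK tracing API or any other framework's callbacks. The plain string key stands in for whatever `span_id`, `parent_id`, or thread id each adapter actually uses.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class SpanRollupSketch:
    """Hypothetical: cache the episodic input when an agent span starts, roll
    tool names up to their parent span, and emit one memory turn per agent span."""

    def __init__(self, record_turn: Callable[..., Any]) -> None:
        self._record_turn = record_turn
        self._lock = threading.Lock()
        self._inputs: Dict[str, str] = {}
        self._tools: DefaultDict[str, List[str]] = defaultdict(list)

    def agent_span_start(self, span_id: str, user_input: str) -> None:
        with self._lock:
            self._inputs[span_id] = user_input

    def function_span_end(self, parent_id: str, tool_name: str) -> None:
        with self._lock:
            self._tools[parent_id].append(tool_name)

    def agent_span_end(self, span_id: str, output: Any) -> None:
        with self._lock:
            # pop() so finished spans do not leak entries in long-running processes
            user_input = self._inputs.pop(span_id, "")
            tools = self._tools.pop(span_id, [])
        # Record outside the lock so a slow recorder never blocks other spans.
        self._record_turn(input=user_input, output=output, tools=tools)
```

Interleaved spans stay separate because every lookup is keyed by the owning id rather than by "the current span".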

Each adapter's serialize_for_replay() now embeds the snapshot under ReplayableTrace.metadata["memory_snapshot"] so replay engines can reconstruct memory state via MemorySnapshot.from_dict(...) -> recorder.restore(snapshot) before re-execution.
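The replay hand-off relies on a content hash that both sides can recompute. The sketch below makes two assumptions: the hash covers every snapshot field except `content_hash` itself, and "canonical JSON" means sorted keys with compact separators. `SnapshotSketch`, `build`, and `verify_for_restore` are hypothetical names, not the PR's API. Only JSON-native types (lists, dicts, strings, numbers) are used, so a JSON round-trip through trace metadata cannot change the hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List


def canonical_hash(org_id: str, turn_index: int, episodic: List[Any],
                   procedural: List[Any], semantic: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    payload = {"org_id": org_id, "turn_index": turn_index, "episodic": episodic,
               "procedural": procedural, "semantic": semantic}
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SnapshotSketch:
    org_id: str
    turn_index: int
    episodic: List[Any]
    procedural: List[Any]
    semantic: Dict[str, Any]
    content_hash: str

    @classmethod
    def build(cls, org_id: str, turn_index: int, episodic: List[Any],
              procedural: List[Any], semantic: Dict[str, Any]) -> "SnapshotSketch":
        digest = canonical_hash(org_id, turn_index, episodic, procedural, semantic)
        return cls(org_id, turn_index, episodic, procedural, semantic, digest)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)  # builds new containers, so callers cannot mutate the snapshot

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SnapshotSketch":
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in data]
        if missing:
            raise ValueError(f"snapshot missing fields: {missing}")
        return cls(**{n: data[n] for n in names})


def verify_for_restore(snap: SnapshotSketch, org_id: str) -> SnapshotSketch:
    """The checks a restore() would run before rebuilding recorder state."""
    if snap.org_id != org_id:
        raise ValueError("cross-tenant snapshot rejected")
    recomputed = canonical_hash(snap.org_id, snap.turn_index, snap.episodic,
                                snap.procedural, snap.semantic)
    if recomputed != snap.content_hash:
        raise ValueError("content_hash mismatch: snapshot was modified")
    return snap
```

`frozen=True` only blocks field reassignment. The lists and dicts inside a snapshot remain mutable, which is why the sketch re-verifies the hash at restore time instead of trusting the object.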

Tests (57 new)

`tests/instrument/adapters/_base/test_memory.py` — 27 tests

Recorder construction (empty/whitespace/non-string org_id rejected; zero buffer sizes rejected; initial state empty).
Snapshot determinism (identical content -> identical hash; different org_id -> different hash; mutating recorder doesn't affect prior snapshot; to_dict/from_dict round-trip preserves hash; from_dict rejects missing required fields).
Replay round-trip (snapshot -> restore -> snapshot byte-identical hash; deterministic next-state under matching inputs; cross-tenant restore raises; tampered-content-hash restore raises).
Bounded eviction (episodic FIFO at cap; semantic LRU at cap; semantic overwrite refreshes LRU; semantic key validation; procedural cap).
Procedural detection (repeated tool sequences accumulate count; no-tool turns produce no patterns).
Per-turn truncation (multi-megabyte values capped with deterministic suffix).
Thread safety (8 threads x 50 turns produces unbroken 1..400 sequence).
Clear preserves binding; defaults positive; extra metadata sorted for hash determinism; record_turn returns post-increment counter.
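The thread-safety and truncation cases can be reproduced with a tiny stand-in. Both `CountingRecorderSketch` and the 64-character cap are assumptions; the description gives neither the real cap nor the suffix format. The point is that the counter increment and the append must happen under one lock. Without the lock, two threads can observe the same turn index, or append indices out of order.

```python
import threading
from typing import List

MAX_CHARS = 64  # hypothetical per-turn cap; the real limit is not stated in this PR


def truncate(value: str, limit: int = MAX_CHARS) -> str:
    """Cap a value and append a suffix that depends only on the input."""
    if len(value) <= limit:
        return value
    return f"{value[:limit]}...[truncated {len(value) - limit} chars]"


class CountingRecorderSketch:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._turn_index = 0
        self.order: List[int] = []
        self.texts: List[str] = []

    def record_turn(self, text: str) -> int:
        capped = truncate(text)  # pure function, safe outside the lock
        with self._lock:  # increment and append must be atomic together
            self._turn_index += 1
            self.order.append(self._turn_index)
            self.texts.append(capped)
            return self._turn_index


def hammer(recorder: CountingRecorderSketch, threads: int = 8, turns: int = 50) -> List[int]:
    def worker() -> None:
        for _ in range(turns):
            recorder.record_turn("x" * 100)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return recorder.order
```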

`tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py` — 30 tests (5 x 6 adapters parametrized)

Each adapter exposes a recorder bound to its org_id.
record_memory_turn advances the episodic buffer.
serialize_for_replay() embeds metadata["memory_snapshot"].
Replay engine can restore the recorder from the serialised trace (content-hash match end-to-end).
Cross-tenant snapshot is rejected at the per-adapter recorder boundary.

Existing test suites unchanged

tests/instrument/adapters/_base/ — 44 passed (no regressions).
6 target adapter test modules — 72 passed (no regressions).

Documentation

docs/adapters/memory-contract.md — explains the three buckets, the contract (tenant binding, bounded buffers, tamper-evident snapshots, replay-safe round-trip, best-effort recording, thread safety), per-adapter wiring matrix, and audit hooks. Includes the replay-engine integration recipe and the honest scope disclosure for browser_use.

Honest scope disclosure (browser_use)

The cross-pollination audit (section 2 #1) enumerates seven target adapters. Six are wired here. The seventh, browser_use, does NOT exist on this PR's base branch (feat/instrument-multitenancy-org-id-propagation); it lives on the parallel feat/instrument-frameworks-browser-use-full history. It will be wired when that adapter is ported to this base or when the histories merge. This follows the same honest-disclosure pattern as PR #120 (state filters), which omitted ms_agent_framework for the same reason: the adapter was not on its base.

The future browser_use wiring (per audit section 2 #1) will be:

Episodic — page navigation events (URL, action, selector) per turn.
Procedural — recurring (prev_action, current_action) patterns (e.g. click[search] -> type[query] -> click[submit]).
Semantic — long-lived page-content cache keyed by URL or DOM hash, so a re-visit can short-circuit page reload during replay.

Acceptance

uv run pytest tests/instrument/adapters/_base/test_memory.py -x
# 27 passed in 0.45s

uv run pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x
# 30 passed in 0.76s

uv run pytest tests/instrument/adapters/_base/
# 44 passed (no regressions)

uv run pytest tests/instrument/adapters/frameworks/{agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py
# 72 passed (no regressions)

uv run mypy --strict src/layerlens/instrument/adapters/_base/memory.py
# Success: no issues found in 1 source file

uv run mypy src/layerlens/instrument/adapters/_base/adapter.py src/layerlens/instrument/adapters/frameworks/{6 adapters}/lifecycle.py
# Success: no issues found in 7 source files

uv run ruff check src/layerlens/instrument/adapters/_base/memory.py tests/instrument/adapters/_base/test_memory.py tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py
# All checks passed!

Test plan

pytest tests/instrument/adapters/_base/test_memory.py -x (27 tests)
pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x (30 tests, 5 x 6 adapters)
pytest tests/instrument/adapters/_base/ (no regressions on resilience tests)
pytest tests/instrument/adapters/frameworks/{agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py (no regressions)
mypy --strict src/layerlens/instrument/adapters/_base/memory.py
mypy src/.../{base + 6 adapter lifecycle.py files}
ruff check
CI must pass on Linux / 3.10-3.13 matrix (developed on Windows / 3.9)


mmercuri requested a review from m-peko April 27, 2026 04:53

m-peko marked this pull request as ready for review May 12, 2026 17:39

m-peko merged commit 9d54477 into feat/instrument-multitenancy-org-id-propagation May 12, 2026

m-peko merged 1 commit into feat/instrument-multitenancy-org-id-propagation from feat/instrument-memory-persistence
