feat(instrument): Memory persistence (episodic/procedural/semantic) for 7 lighter adapters (cross-poll #1) #130
Merged
m-peko merged 1 commit on May 12, 2026
…or 6 lighter adapters (cross-poll #1)

Implements cross-pollination item #1 from A:/tmp/adapter-cross-pollination-audit.md, section 2 #1. The four mature framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry ad-hoc memory plumbing (episodic recent turns, procedural learned patterns, semantic long-lived facts) that lets agents recall context across runs. The lighter adapters (agno, ms_agent_framework, openai_agents, llama_index, google_adk, bedrock_agents, browser_use) all behave as goldfish agents: every run starts from a blank slate. This PR ports the pattern into a shared, replay-safe primitive that the lighter adapters plug into uniformly.

## What is new

### Shared memory primitive

src/layerlens/instrument/adapters/_base/memory.py (new)

- MemorySnapshot: frozen dataclass with turn_index, episodic (recent turns), procedural (detected patterns), semantic (key/value facts), content_hash (SHA-256 of the canonical-JSON encoding) and org_id (tenant binding). The to_dict / from_dict round-trip preserves identity.
- MemoryRecorder: thread-safe accumulator. record_turn(...) is the per-turn entry point; set_semantic(key, value) stores long-lived facts; snapshot() returns the immutable view; restore(snap) rebuilds state from a previous snapshot. All buckets are bounded (defaults 200/16/64): episodic uses FIFO eviction, semantic uses LRU, procedural keeps the top entries by count.
- Procedural pattern detection: O(window) per turn; scans the recent episodic window for recurring (prev_tools, current_tools) pairs.
- Multi-tenant: the recorder requires a non-empty org_id at construction; restore() rejects cross-tenant snapshots and tampered snapshots (content-hash mismatch).
- Replay-safe: a snapshot -> restore -> snapshot round-trip produces a byte-identical content_hash.

### BaseAdapter integration

src/layerlens/instrument/adapters/_base/adapter.py

- The constructor builds self._memory_recorder = MemoryRecorder(org_id=self._org_id).
- New record_memory_turn(...) helper: a best-effort wrapper that swallows recorder failures so memory persistence never breaks the host framework call stack (per CLAUDE.md: "tracing never breaks user code").
- memory_recorder property, plus memory_snapshot() and memory_snapshot_dict() convenience accessors.

### Per-adapter wiring (6 adapters)

- agno: Agent.run/arun finally-block; episodic input from args/kwargs; tool list from _collect_tool_names(result.messages).
- ms_agent_framework: Chat.invoke/invoke_stream finally-block; episodic input from kwargs; tool list from streamed message items.
- openai_agents: _on_agent_span_end (TraceProcessor) + on_run_end (Runner wrap); episodic input cached at span_start per span_id; tool list rolled up from _on_function_span_end per parent_id.
- llama_index: _on_agent_step_end; episodic input cached at step_start per thread id; tool list rolled up from _on_tool_call.
- google_adk: after_agent_callback + on_agent_end; episodic input cached at before_agent_callback per thread id; tool list rolled up from after_tool_callback per thread id.
- bedrock_agents: _after_invoke_agent (boto3 hook); episodic input cached at _before_invoke_agent per thread id; tool list rolled up from _process_trace action-group / KB step names.

Each adapter's serialize_for_replay() now embeds the snapshot under ReplayableTrace.metadata["memory_snapshot"], so replay engines can reconstruct memory state via MemorySnapshot.from_dict(...) -> recorder.restore(snapshot) before re-execution.

## Tests (57 new)

### tests/instrument/adapters/_base/test_memory.py (27 tests)

- Recorder construction (empty/non-string org_id rejected; zero buffer sizes rejected; initial state empty).
- Snapshot determinism (identical content -> identical hash; different org_id -> different hash; mutating the recorder doesn't affect a prior snapshot; to_dict/from_dict round-trip preserves the hash; from_dict rejects missing required fields).
- Replay round-trip (snapshot -> restore -> snapshot byte-identical hash; deterministic next state under matching inputs; cross-tenant restore raises; tampered-content-hash restore raises).
- Bounded eviction (episodic FIFO at cap; semantic LRU at cap; semantic overwrite refreshes LRU; procedural cap).
- Procedural detection (repeated tool sequences accumulate count; no-tool turns produce no patterns).
- Per-turn truncation (multi-megabyte values capped with a deterministic suffix).
- Thread safety (8 threads x 50 turns produce an unbroken 1..400 sequence).
- Clear preserves the tenant binding; defaults are positive; extra metadata is sorted for hash determinism.

### tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py (30 tests: 5 x 6 adapters, parametrized)

- Each adapter exposes a recorder bound to its org_id.
- record_memory_turn advances the episodic buffer.
- serialize_for_replay() embeds metadata["memory_snapshot"].
- The replay engine can restore the recorder from the serialised trace (content-hash match end-to-end).
- A cross-tenant snapshot is rejected at the per-adapter recorder boundary.

## Documentation

docs/adapters/memory-contract.md explains the three buckets, the contract (tenant binding, bounded buffers, tamper-evident snapshots, replay-safe round-trip, best-effort recording, thread safety), the per-adapter wiring matrix and the audit hooks. It also includes the replay-engine integration recipe and the honest scope disclosure for browser_use.

## Honest scope disclosure

The cross-pollination audit (section 2 #1) enumerates seven target adapters. Six are wired here. The seventh, browser_use, does NOT exist on this PR's base branch (feat/instrument-multitenancy-org-id-propagation); it lives on the parallel feat/instrument-frameworks-browser-use-full history. It will be wired when that adapter is ported to this base or when the histories merge. This follows the same honest-disclosure pattern as PR #120 (state filters), which omitted ms_agent_framework for the same reason.
The future browser_use wiring (per audit section 2 #1) will be:

- Episodic: page navigation events (URL, action, selector)
- Procedural: recurring (prev_action, current_action) patterns
- Semantic: long-lived page-content cache keyed by URL/DOM hash

## Acceptance

- uv run pytest tests/instrument/adapters/_base/test_memory.py -x -> 27 passed
- uv run pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x -> 30 passed
- uv run pytest tests/instrument/adapters/_base/ -> 44 passed (no regressions)
- uv run pytest tests/instrument/adapters/frameworks/{agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py -> 72 passed (no regressions)
- uv run mypy --strict src/layerlens/instrument/adapters/_base/memory.py -> Success: no issues found in 1 source file
- uv run mypy src/layerlens/instrument/adapters/_base/adapter.py src/layerlens/instrument/adapters/frameworks/{6 adapters}/lifecycle.py -> Success: no issues found in 7 source files
- uv run ruff check src/layerlens/instrument/adapters/_base/memory.py tests/instrument/adapters/_base/test_memory.py tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -> All checks passed!
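The procedural detection described in this commit message (an O(window) scan for recurring (prev_tools, current_tools) pairs, bounded by keeping the top patterns by count) could look roughly like the following. This is a hypothetical sketch, not the code in memory.py: detect_pattern, update_patterns, and the rule that a turn without tools never forms a pattern are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence, Tuple

ToolSeq = Tuple[str, ...]           # tool names used on one turn
Pattern = Tuple[ToolSeq, ToolSeq]   # (prev_tools, current_tools)


def detect_pattern(window: Sequence[ToolSeq]) -> Optional[Tuple[Pattern, int]]:
    """Return the newest (prev_tools, current_tools) pair and how often it
    occurs in the window, or None if it has not recurred. One pass: O(window)."""
    if len(window) < 2:
        return None
    prev_tools, current_tools = window[-2], window[-1]
    if not prev_tools or not current_tools:
        return None  # assumption: a turn with no tools never forms a pattern
    newest: Pattern = (prev_tools, current_tools)
    count = sum(
        1 for i in range(1, len(window)) if (window[i - 1], window[i]) == newest
    )
    return (newest, count) if count >= 2 else None


def update_patterns(
    patterns: Counter[Pattern], window: Sequence[ToolSeq], cap: int
) -> Counter[Pattern]:
    """Record the newest recurring pair, then keep only the `cap` most
    frequent patterns (keep-top-by-count eviction)."""
    hit = detect_pattern(window)
    if hit is not None:
        pattern, count = hit
        patterns[pattern] = count
    return Counter(dict(patterns.most_common(cap)))
```

Here `window` stands for the tool lists of the most recent episodic turns, oldest first. Rescanning only the bounded window keeps the per-turn cost proportional to the episodic window rather than to the full history.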
## Cross-pollination item #1: memory persistence (episodic / procedural / semantic)
Implements cross-pollination item #1 from `A:/tmp/adapter-cross-pollination-audit.md`, section 2 #1. The four mature framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry ad-hoc memory plumbing (episodic recent turns, procedural learned patterns, semantic long-lived facts) that lets agents recall context across runs. The lighter adapters (`agno`, `ms_agent_framework`, `openai_agents`, `llama_index`, `google_adk`, `bedrock_agents`, `browser_use`) all behave as goldfish agents: every run starts from a blank slate. This PR ports the pattern into a shared, replay-safe primitive that the lighter adapters plug into uniformly.

## What is new
### Shared memory primitive
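A minimal sketch of the snapshot contract in this section follows. It uses simplified, hypothetical `Snapshot` and `Recorder` classes rather than the real `MemorySnapshot`/`MemoryRecorder`. The assumptions are: episodic turns and semantic values are plain strings, the procedural bucket is omitted, and `PermissionError`/`ValueError` are arbitrary choices of rejection error. The sketch shows canonical-JSON SHA-256 hashing, FIFO and least-recently-set eviction, and restore-time tenant and tamper checks.

```python
from __future__ import annotations

import hashlib
import json
import threading
from collections import OrderedDict, deque
from dataclasses import dataclass
from typing import Deque, Tuple

Pair = Tuple[str, str]


def _canonical_hash(org_id: str, turn_index: int,
                    episodic: Tuple[str, ...], semantic: Tuple[Pair, ...]) -> str:
    # Canonical JSON (sorted keys, fixed separators): equal content always
    # encodes to identical bytes, hence an identical SHA-256 digest.
    payload = json.dumps(
        {"org_id": org_id, "turn_index": turn_index,
         "episodic": episodic, "semantic": semantic},
        sort_keys=True, separators=(",", ":"), ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Snapshot:
    org_id: str
    turn_index: int
    episodic: Tuple[str, ...]
    semantic: Tuple[Pair, ...]  # (key, value) pairs, least recently set first
    content_hash: str


class Recorder:
    def __init__(self, org_id: str, max_episodic: int = 200,
                 max_semantic: int = 64) -> None:
        if not isinstance(org_id, str) or not org_id:
            raise ValueError("org_id must be a non-empty string")
        if max_episodic <= 0 or max_semantic <= 0:
            raise ValueError("buffer sizes must be positive")
        self._org_id = org_id
        self._lock = threading.Lock()
        self._turn = 0
        self._episodic: Deque[str] = deque(maxlen=max_episodic)  # FIFO eviction
        self._semantic: OrderedDict[str, str] = OrderedDict()
        self._max_semantic = max_semantic

    def record_turn(self, text: str) -> int:
        with self._lock:
            self._turn += 1
            self._episodic.append(text)  # deque drops the oldest item at maxlen
            return self._turn            # post-increment turn index

    def set_semantic(self, key: str, value: str) -> None:
        with self._lock:
            self._semantic[key] = value
            self._semantic.move_to_end(key)  # an overwrite refreshes recency
            while len(self._semantic) > self._max_semantic:
                self._semantic.popitem(last=False)  # evict least recently set

    def snapshot(self) -> Snapshot:
        with self._lock:
            episodic = tuple(self._episodic)
            semantic = tuple(self._semantic.items())
            digest = _canonical_hash(self._org_id, self._turn, episodic, semantic)
            return Snapshot(self._org_id, self._turn, episodic, semantic, digest)

    def restore(self, snap: Snapshot) -> None:
        if snap.org_id != self._org_id:
            raise PermissionError("cross-tenant snapshot rejected")
        expected = _canonical_hash(snap.org_id, snap.turn_index,
                                   snap.episodic, snap.semantic)
        if expected != snap.content_hash:
            raise ValueError("content_hash mismatch: snapshot was tampered with")
        with self._lock:
            self._turn = snap.turn_index
            self._episodic.clear()
            self._episodic.extend(snap.episodic)
            self._semantic = OrderedDict(snap.semantic)
```

Restoring a snapshot into a fresh recorder for the same `org_id` and snapshotting again yields an equal `Snapshot`, including `content_hash`, which is the replay round-trip property. Because the hash is unkeyed, it flags edits that leave `content_hash` stale; it cannot detect a party that also recomputes the hash.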
`src/layerlens/instrument/adapters/_base/memory.py` (new)

- `MemorySnapshot`: frozen dataclass with `turn_index`, `episodic` (recent turns), `procedural` (detected patterns), `semantic` (key/value facts), `content_hash` (SHA-256 of the canonical-JSON encoding) and `org_id` (tenant binding). The `to_dict` / `from_dict` round-trip preserves identity.
- `MemoryRecorder`: thread-safe accumulator. `record_turn(...)` is the per-turn entry point; `set_semantic(key, value)` stores long-lived facts; `snapshot()` returns the immutable view; `restore(snap)` rebuilds state from a previous snapshot. All buckets are bounded (defaults 200/16/64): episodic uses FIFO eviction, semantic uses LRU, procedural keeps the top entries by count.
- Procedural pattern detection: O(window) per turn; scans the recent episodic window for recurring `(prev_tools, current_tools)` pairs.
- Multi-tenant: the recorder requires a non-empty `org_id` at construction; `restore()` rejects cross-tenant snapshots and tampered snapshots (content-hash mismatch).
- Replay-safe: a `snapshot() -> restore() -> snapshot()` round-trip produces a byte-identical `content_hash`.

### BaseAdapter integration
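The best-effort wrapper in this section amounts to a `try`/`except` around the recorder call. Below is a hypothetical sketch: `SketchAdapter` and the `TurnRecorder` protocol stand in for `BaseAdapter` and `MemoryRecorder`, and logging the failure at debug level is an assumed policy, not necessarily what `adapter.py` does.

```python
from __future__ import annotations

import logging
from typing import Any, Protocol

logger = logging.getLogger(__name__)


class TurnRecorder(Protocol):
    # Hypothetical minimal recorder interface used only by this sketch.
    def record_turn(self, **turn: Any) -> int: ...


class SketchAdapter:
    """Hypothetical stand-in for BaseAdapter; only the memory hook is shown."""

    def __init__(self, recorder: TurnRecorder) -> None:
        self._memory_recorder = recorder

    def record_memory_turn(self, **turn: Any) -> int | None:
        # Best-effort: any recorder failure is logged and swallowed, so
        # memory persistence can never break the host framework's call.
        try:
            return self._memory_recorder.record_turn(**turn)
        except Exception:
            logger.debug("memory recording failed; continuing", exc_info=True)
            return None
```

Catching `Exception` (not `BaseException`) lets `KeyboardInterrupt` and `SystemExit` still propagate, which keeps the wrapper from masking a user's attempt to stop the process.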
`src/layerlens/instrument/adapters/_base/adapter.py`

- The constructor builds `self._memory_recorder = MemoryRecorder(org_id=self._org_id)`.
- New `record_memory_turn(...)` helper: a best-effort wrapper that swallows recorder failures so memory persistence never breaks the host framework call stack (per CLAUDE.md: "tracing never breaks user code").
- `memory_recorder` property, plus `memory_snapshot()` and `memory_snapshot_dict()` convenience accessors.

### Per-adapter wiring (6 adapters)
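Several hooks in the following table split one turn across callbacks. A start hook caches the episodic input per thread id (TID), tool callbacks roll up tool names, and an end hook (or a finally-block around the run call) records the turn. The helper below is a hypothetical illustration of that shape; `PerThreadTurnBuffer`, `run_with_memory` and the `emit` callback are illustrative names, not adapter APIs.

```python
from __future__ import annotations

import threading
from typing import Any, Callable, Dict, List


class PerThreadTurnBuffer:
    """Hypothetical helper: cache input at start, roll up tools, emit at end.
    State is keyed by threading.get_ident(), i.e. the current thread id."""

    def __init__(self, emit: Callable[[str, List[str]], Any]) -> None:
        self._emit = emit
        self._lock = threading.Lock()
        self._inputs: Dict[int, str] = {}
        self._tools: Dict[int, List[str]] = {}

    def on_start(self, user_input: str) -> None:
        tid = threading.get_ident()
        with self._lock:
            self._inputs[tid] = user_input
            self._tools[tid] = []

    def on_tool(self, name: str) -> None:
        with self._lock:
            self._tools.setdefault(threading.get_ident(), []).append(name)

    def on_end(self) -> None:
        tid = threading.get_ident()
        with self._lock:
            user_input = self._inputs.pop(tid, "")
            tools = self._tools.pop(tid, [])
        try:
            self._emit(user_input, tools)
        except Exception:
            pass  # best-effort: recording must not break the host call


def run_with_memory(run: Callable[[str], Any], buf: PerThreadTurnBuffer,
                    user_input: str) -> Any:
    buf.on_start(user_input)
    try:
        return run(user_input)
    finally:
        buf.on_end()  # the turn is recorded even if run() raises
```

Keying by `threading.get_ident()` matches per-thread caching; a framework that interleaves several runs on one thread (for example under asyncio) would need a per-run key instead, such as a span id.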
| Adapter | Recording hook | Episodic input | Tool list |
|---|---|---|---|
| `agno` | `Agent.run` / `arun` finally-block | `args[0]` / `kwargs["input"]` | `_collect_tool_names` from `result.messages` |
| `ms_agent_framework` | `Chat.invoke` / `invoke_stream` finally-block | `kwargs["input"]` / `["message"]` | `_collect_tool_names_from_messages` from streamed items |
| `openai_agents` | `_on_agent_span_end` (TraceProcessor) + `on_run_end` | cached at span start per `span_id` | `_on_function_span_end` per `parent_id` |
| `llama_index` | `_on_agent_step_end` | cached at step start per thread id | `_on_tool_call` per thread id |
| `google_adk` | `after_agent_callback` + `on_agent_end` | cached at `before_agent_callback` per TID | `after_tool_callback` per TID |
| `bedrock_agents` | `_after_invoke_agent` (boto3 hook) | cached at `_before_invoke_agent` per TID | `_process_trace` action-group / KB names |

Each adapter's `serialize_for_replay()` now embeds the snapshot under `ReplayableTrace.metadata["memory_snapshot"]`, so replay engines can reconstruct memory state via `MemorySnapshot.from_dict(...) -> recorder.restore(snapshot)` before re-execution.

## Tests (57 new)
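The thread-safety test described in this PR requires 8 threads x 50 turns to yield an unbroken 1..400 sequence, meaning concurrent `record_turn` calls never lose or duplicate a turn index. The sketch below shows that test shape using a hypothetical `CountingRecorder` in place of `MemoryRecorder`. It keeps only the lock and the post-increment counter.

```python
from __future__ import annotations

import threading
from typing import List


class CountingRecorder:
    """Hypothetical minimal recorder: record_turn returns the post-increment index."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._turn = 0

    def record_turn(self) -> int:
        with self._lock:  # read-modify-write must be atomic across threads
            self._turn += 1
            return self._turn


def hammer(recorder: CountingRecorder, threads: int = 8, turns: int = 50) -> List[int]:
    seen: List[int] = []
    seen_lock = threading.Lock()

    def worker() -> None:
        for _ in range(turns):
            n = recorder.record_turn()
            with seen_lock:
                seen.append(n)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(seen)


def test_thread_safety() -> None:
    # Every index from 1 to 400 appears exactly once: no lost or duplicate turns.
    assert hammer(CountingRecorder()) == list(range(1, 401))
```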
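### `tests/instrument/adapters/_base/test_memory.py` (27 tests)

- Recorder construction (empty/non-string `org_id` rejected; zero buffer sizes rejected; initial state empty).
- Snapshot determinism (identical content -> identical hash; different `org_id` -> different hash; mutating the recorder doesn't affect a prior snapshot; `to_dict`/`from_dict` round-trip preserves the hash; `from_dict` rejects missing required fields).
- Replay round-trip (`snapshot -> restore -> snapshot` byte-identical hash; deterministic next state under matching inputs; cross-tenant restore raises; tampered-content-hash restore raises).
- Bounded eviction (episodic FIFO at cap; semantic LRU at cap; semantic overwrite refreshes LRU; procedural cap).
- Procedural detection (repeated tool sequences accumulate count; no-tool turns produce no patterns).
- Per-turn truncation (multi-megabyte values capped with a deterministic suffix).
- Thread safety (8 threads x 50 turns produce an unbroken 1..400 sequence).
- Clear preserves the tenant binding; defaults are positive; extra metadata is sorted for hash determinism; `record_turn` returns the post-increment counter.

### `tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py` (30 tests: 5 x 6 adapters, parametrized)

- Each adapter exposes a recorder bound to its `org_id`.
- `record_memory_turn` advances the episodic buffer.
- `serialize_for_replay()` embeds `metadata["memory_snapshot"]`.
- The replay engine can restore the recorder from the serialised trace (content-hash match end-to-end).
- A cross-tenant snapshot is rejected at the per-adapter recorder boundary.

### Existing test suites unchanged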
`tests/instrument/adapters/_base/`: 44 passed (no regressions).

## Documentation
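The memory-contract doc includes a replay-engine integration recipe. Below is a rough, self-contained sketch of that flow. It assumes a plain JSON trace in place of `ReplayableTrace` and dict-based snapshots in place of `MemorySnapshot`. `snapshot_dict`, `serialize_trace` and `restore_for_replay` are hypothetical names. The adapter side embeds the snapshot under `metadata["memory_snapshot"]`, and the replay side validates fields, tenant and hash before handing the state back.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Dict, List

REQUIRED = {"org_id", "turn_index", "episodic", "content_hash"}


def canonical_hash(body: Dict[str, Any]) -> str:
    # Sorted keys and fixed separators give a stable byte encoding.
    data = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def snapshot_dict(org_id: str, turn_index: int, episodic: List[str]) -> Dict[str, Any]:
    body = {"org_id": org_id, "turn_index": turn_index, "episodic": episodic}
    return {**body, "content_hash": canonical_hash(body)}


def serialize_trace(events: List[Dict[str, Any]], snap: Dict[str, Any]) -> str:
    # Adapter side: embed the snapshot in the trace metadata.
    return json.dumps({"events": events, "metadata": {"memory_snapshot": snap}})


def restore_for_replay(trace_json: str, expected_org_id: str) -> Dict[str, Any]:
    # Replay-engine side: extract and validate the snapshot before re-execution.
    snap = json.loads(trace_json)["metadata"]["memory_snapshot"]
    missing = REQUIRED - set(snap)
    if missing:
        raise ValueError(f"snapshot missing fields: {sorted(missing)}")
    if snap["org_id"] != expected_org_id:
        raise PermissionError("cross-tenant snapshot rejected")
    body = {k: v for k, v in snap.items() if k != "content_hash"}
    if canonical_hash(body) != snap["content_hash"]:
        raise ValueError("content_hash mismatch")
    return body  # state the replay engine would load into a fresh recorder
```

Re-hashing the returned body reproduces the stored `content_hash`, which is how a replay engine can confirm end-to-end that the restored state matches what the adapter recorded.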
`docs/adapters/memory-contract.md` explains the three buckets, the contract (tenant binding, bounded buffers, tamper-evident snapshots, replay-safe round-trip, best-effort recording, thread safety), the per-adapter wiring matrix and the audit hooks. It also includes the replay-engine integration recipe and the honest scope disclosure for `browser_use`.

## Honest scope disclosure (`browser_use`)
The cross-pollination audit (section 2 #1) enumerates seven target adapters. Six are wired here. The seventh, `browser_use`, does NOT exist on this PR's base branch (`feat/instrument-multitenancy-org-id-propagation`); it lives on the parallel `feat/instrument-frameworks-browser-use-full` history. It will be wired when that adapter is ported to this base or when the histories merge. This follows the same honest-disclosure pattern as PR #120 (state filters), which omitted `ms_agent_framework` for the same reason: the adapter was not on its base.

The future `browser_use` wiring (per audit section 2 #1) will be:

- Episodic: page navigation events (URL, action, selector)
- Procedural: recurring `(prev_action, current_action)` patterns (e.g. `click[search]->type[query]->click[submit]`)
- Semantic: long-lived page-content cache keyed by URL/DOM hash

## Acceptance
### Test plan

- `pytest tests/instrument/adapters/_base/test_memory.py -x` (27 tests)
- `pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x` (30 tests, 5 x 6 adapters)
- `pytest tests/instrument/adapters/_base/` (no regressions on resilience tests)
- `pytest tests/instrument/adapters/frameworks/{6 target adapters}_adapter.py` (no regressions)
- `mypy --strict src/layerlens/instrument/adapters/_base/memory.py`
- `mypy src/.../{base + 6 adapter lifecycle.py files}`
- `ruff check`