feat(instrument): State include/exclude key filters for 6 multi-agent adapters (cross-poll #6) by mmercuri · Pull Request #120 · LayerLens/stratix-python

mmercuri · 2026-04-27T03:49:34Z

Cross-pollination item #6 — state include/exclude key filters

Implements cross-pollination item #6 from A:/tmp/adapter-cross-pollination-audit.md §2.12. LangGraph's mature LangGraphStateAdapter supports include/exclude key filters at the state-snapshot level, so customers can scrub sensitive state (API keys, tokens, PII) WITHOUT modifying their agent code or doing post-hoc redaction. This PR brings the same contract to the lighter multi-agent framework adapters that exist on this branch's base.

What's new

Shared filter module

src/layerlens/instrument/adapters/_base/state_filters.py — new

StateFilter — frozen dataclass with include_keys, exclude_keys, mask_keys and recursive. Three operations are applied in order: exclude (drop the key), mask (keep the key but replace its value with [REDACTED]), include (allowlist). Matching is a case-insensitive substring match after alphanumeric-only normalisation, so X-Api-Key, USER_API_KEY and customer.email_address are all caught without enumerating every variant.
DEFAULT_PII_EXCLUDE_KEYS — conservative denylist covering 49 common credential / PII / financial / contact field names. Customers who do nothing still get baseline protection out of the box per CLAUDE.md ("never silently leak customer data").
default_state_filter() — factory installed by every adapter unless the caller passes a custom state_filter.
filter_state(state, filter) — pure function returning (filtered_state, filtered_keys) so adapters can surface the clipped key names as _filtered_keys event metadata for audit.
filter_payload_fields(payload, filter, fields) — surgical helper that filters only the named dict-shaped fields of a mixed-shape payload, so scalar metadata such as model and latency_ms is preserved.
StateFilter.permissive() — opt-out factory for tests / explicit disablement. The active filter snapshot is surfaced under adapter_info().metadata['state_filter'] so operators can detect accidental disablement.
StateFilter.with_extra_excludes() — the default denylist plus the caller's additions.
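The exclude, mask and include contract described above can be sketched in a few dozen lines. This is a minimal illustration, not the contents of state_filters.py: StateFilterSketch and SKETCH_DEFAULT_EXCLUDES are hypothetical names, the five-entry denylist is a tiny stand-in for the real 49-entry DEFAULT_PII_EXCLUDE_KEYS, and applying the allowlist to top-level keys only is an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Optional, Tuple

REDACTED = "[REDACTED]"
# Hypothetical, deliberately tiny stand-in for DEFAULT_PII_EXCLUDE_KEYS.
SKETCH_DEFAULT_EXCLUDES: FrozenSet[str] = frozenset(
    {"api_key", "password", "secret", "email", "credit_card"}
)


def _normalise(key: str) -> str:
    """Lowercase and keep only [a-z0-9], e.g. 'X-Api-Key' -> 'xapikey'."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def _matches(key: str, patterns: FrozenSet[str]) -> bool:
    """Case-insensitive substring match after normalising key and patterns."""
    nk = _normalise(key)
    return any(p and p in nk for p in map(_normalise, patterns))


@dataclass(frozen=True)
class StateFilterSketch:
    include_keys: Optional[FrozenSet[str]] = None  # None means "no allowlist"
    exclude_keys: FrozenSet[str] = frozenset()
    mask_keys: FrozenSet[str] = frozenset()
    recursive: bool = True

    @classmethod
    def default(cls) -> StateFilterSketch:
        return cls(exclude_keys=SKETCH_DEFAULT_EXCLUDES)

    @classmethod
    def permissive(cls) -> StateFilterSketch:
        return cls()  # filters nothing

    @classmethod
    def with_extra_excludes(cls, *keys: str) -> StateFilterSketch:
        return cls(exclude_keys=SKETCH_DEFAULT_EXCLUDES | frozenset(keys))


def filter_state(state: Any, flt: StateFilterSketch) -> Tuple[Any, List[str]]:
    """Return (filtered_state, filtered_keys) without mutating the input."""
    filtered: List[str] = []

    def walk(value: Any, path: str, top: bool) -> Any:
        if isinstance(value, list):
            return [walk(v, f"{path}[{i}]", top) for i, v in enumerate(value)]
        if not isinstance(value, dict):
            return value  # primitives pass through untouched
        out = {}
        for key, child in value.items():
            name = str(key)
            where = f"{path}.{name}" if path else name
            if _matches(name, flt.exclude_keys):  # exclude first, so it wins over include
                filtered.append(where)
                continue
            # Checking the allowlist before masking has the same net effect as
            # exclude -> mask -> include: a key dropped by the allowlist is gone either way.
            if top and flt.include_keys is not None and not _matches(name, flt.include_keys):
                filtered.append(where)
                continue
            if _matches(name, flt.mask_keys):  # keep the key, hide the value, never recurse into it
                out[key] = REDACTED
                filtered.append(where)
                continue
            out[key] = walk(child, where, False) if flt.recursive else child
        return out

    return walk(state, "", True), filtered
```

With the default sketch filter, `{"X-Api-Key": "sk-1", "query": "hi", "customer": {"email_address": "a@b.c"}}` keeps only `query` and an empty `customer` dict, and reports `["X-Api-Key", "customer.email_address"]` as the clipped keys. Returning the clipped key paths alongside the state is what lets an adapter attach them as `_filtered_keys` audit metadata.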

FrameworkAdapter integration

src/layerlens/instrument/adapters/frameworks/_base_framework.py

Constructor accepts optional state_filter (defaults to default_state_filter()).
self._state_filter reachable on every subclass.
New _filter_payload(payload, *fields) helper used by adapters immediately before each _emit(...) call for any payload that may contain user-controlled state.
New serialize_state_filter_for_replay() — the replay engine uses this to reconstruct an equivalent filter on the replay side, so captured payload shapes match between the original run and the replay.
adapter_info().metadata['state_filter'] surfaces the active config.
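The FrameworkAdapter wiring (constructor default, filtering only named fields right before `_emit`, `_filtered_keys` accumulation, replay serialisation and `adapter_info()` metadata) can be sketched as below. Everything here is hypothetical: FilterConfig is an exclude-only stand-in for StateFilter, SketchFrameworkAdapter, on_tool_call and from_replay are illustrative names, and returning a JSON string from serialize_state_filter_for_replay() is an assumption about the real method.

```python
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Tuple


def _norm(key: str) -> str:
    return re.sub(r"[^a-z0-9]", "", key.lower())


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical, exclude-only, recursive stand-in for StateFilter."""

    exclude_keys: Tuple[str, ...] = ("api_key", "password", "email")

    def drops(self, key: str) -> bool:
        nk = _norm(key)
        return any(p and p in nk for p in map(_norm, self.exclude_keys))


def _scrub(value: Any, flt: FilterConfig, path: str, hits: List[str]) -> Any:
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            where = f"{path}.{k}"
            if flt.drops(str(k)):
                hits.append(where)  # record the clipped key for audit
            else:
                out[k] = _scrub(v, flt, where, hits)
        return out
    if isinstance(value, list):
        return [_scrub(v, flt, f"{path}[{i}]", hits) for i, v in enumerate(value)]
    return value


def filter_payload_fields(payload: Dict[str, Any], flt: FilterConfig, fields: Tuple[str, ...]) -> Dict[str, Any]:
    """Filter only the named dict-shaped fields; scalar metadata is left alone."""
    out = dict(payload)
    hits: List[str] = list(out.get("_filtered_keys", []))  # accumulate across passes
    for name in fields:
        if isinstance(out.get(name), dict):  # missing or scalar fields are skipped
            out[name] = _scrub(out[name], flt, name, hits)
    if hits:
        out["_filtered_keys"] = hits
    return out


class SketchFrameworkAdapter:
    """Hypothetical adapter showing constructor and emit-boundary wiring."""

    def __init__(self, state_filter: Optional[FilterConfig] = None) -> None:
        self._state_filter = state_filter if state_filter is not None else FilterConfig()
        self.emitted: List[Dict[str, Any]] = []

    def _filter_payload(self, payload: Dict[str, Any], *fields: str) -> Dict[str, Any]:
        return filter_payload_fields(payload, self._state_filter, fields)

    def _emit(self, event_type: str, payload: Dict[str, Any]) -> None:
        self.emitted.append({"event_type": event_type, **payload})

    def on_tool_call(self, tool: str, args: Dict[str, Any]) -> None:
        # Filter immediately before _emit, and only the user-controlled field.
        self._emit("tool.call", self._filter_payload({"tool": tool, "args": args}, "args"))

    def serialize_state_filter_for_replay(self) -> str:
        return json.dumps(asdict(self._state_filter), sort_keys=True)

    @classmethod
    def from_replay(cls, blob: str) -> SketchFrameworkAdapter:
        return cls(FilterConfig(exclude_keys=tuple(json.loads(blob)["exclude_keys"])))

    def adapter_info(self) -> Dict[str, Any]:
        return {"metadata": {"state_filter": asdict(self._state_filter)}}
```

Filtering at the emit boundary, rather than inside each framework hook, keeps scalar fields such as `tool` untouched and gives a single place where the active filter and the `_filtered_keys` audit trail are produced.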

Per-adapter wiring (6 multi-agent adapters)

Adapter	Constructor `state_filter`	Filter applied at emit
`agno`	YES	`agent.input/output`, `tool.call/result`
`openai_agents`	YES	`agent.input` (tools/handoffs/output_type), generation messages, function args + parameters_schema + mcp_data, tool result
`llamaindex`	YES	LLM messages + output_message, tool args, retrieval input/output, query input/output, agent_step input/output
`google_adk`	YES	run user_content + agent_tree, agent user_content, tool args, tool result
`strands`	YES	invocation messages, after-invocation output, tool input, tool result
`pydantic_ai`	YES	agent input + deps_summary, agent output, tool.call args, tool.result output, streaming output_message

The audit §2.12 enumerates 7 targets including ms_agent_framework — that adapter doesn't exist on this branch's base (feat/instrument-callback-resilience); it lives only on the parallel feat/instrument-multitenancy-org-id-propagation history. It will be wired when the ms_agent_framework adapter is ported to this base or the histories merge.

Tests (53 new)

`tests/instrument/adapters/_base/test_state_filters.py`

TestStateFilterConstruction — defaults are PII-aware, lowercasing, permissive factory, with_extra_excludes factory.
TestStateFilterMetadata — default snapshot shape, allowlist surfaces in metadata.
TestFilterStateExclude — default PII keys removed, vendor variants caught (X-Api-Key, USER_API_KEY, stripe_customer_email), permissive opt-out.
TestFilterStateMask — keeps key visible, masking runs before recurse so nested PII can't leak through a masked field.
TestFilterStateInclude — allowlist semantics, exclude wins over include when both match.
TestFilterStateRecursive — nested dicts, lists of dicts, non-recursive flag.
TestFilterStatePassthrough — primitives + empty dict pass through.
TestFilterPayloadFields — surgical filter (scalars untouched), missing fields skipped, scalar field is no-op, accumulating _filtered_keys across multiple passes.
TestFrameworkAdapterStateFilterDefaults — default installed, custom override, end-to-end PII drop, replay snapshot, adapter_info.
TestPerAdapterStateFilterWiring — parametrized across all 6 adapters: constructor accepts state_filter, default is PII-aware, state_filter surfaces in adapter_info.
TestEndToEndAgnoFilter — filter actually runs at the emit boundary (not just sits idle on the adapter).

Existing test suites unchanged

tests/instrument/adapters/_base/ — 110 passed, 7 skipped.
tests/instrument/adapters/frameworks/ — 114 passed (langchain, langgraph, langfuse, agentforce: the adapters with deps installed in the CI venv), 12 skipped (optional deps), 1 pre-existing Windows clock-resolution flake on test_haystack (documented in PR #117).

Documentation

docs/adapters/state-filters.md — explains default behaviour, three filter operations, configuration recipes, recursion, auditability via _filtered_keys, replay reproducibility, and the per-adapter wiring matrix.

Acceptance

pytest tests/instrument/adapters/_base/test_state_filters.py -x
# 53 passed in 0.10s

pytest tests/instrument/adapters/_base/
# 110 passed, 7 skipped in 0.26s

pytest tests/instrument/adapters/frameworks/
# 114 passed, 12 skipped, 1 pre-existing flake (test_haystack.test_input_and_output)

mypy --strict src/layerlens/instrument/adapters/_base/state_filters.py
# Success: no issues found in 1 source file

mypy src/layerlens/instrument/adapters/frameworks/{_base_framework,agno,openai_agents,llamaindex,google_adk,strands,pydantic_ai}.py
# Success: no issues found in 7 source files

ruff check ...
# All checks passed!

Test plan

pytest tests/instrument/adapters/_base/test_state_filters.py -x
pytest tests/instrument/adapters/_base/ (no regressions on resilience tests)
pytest tests/instrument/adapters/frameworks/ (no regressions on adapter tests)
mypy --strict src/layerlens/instrument/adapters/_base/state_filters.py
mypy src/.../{adapters wired in this PR}.py
ruff check
CI must pass on the Linux / Python 3.10-3.13 matrix (developed on Windows / Python 3.9)

…0 lighter adapters

Introduces a shared @resilient_callback decorator + ResilienceTracker under `src/layerlens/instrument/adapters/_base/`, then applies it to every callback method on the 10 lighter framework adapters (agno, llamaindex, google_adk, strands, pydantic_ai, smolagents, bedrock_agents, openai_agents, haystack, langfuse), so an exception in our observability code can never crash the customer's framework execution.

What the decorator does on failure:

1. Catches Exception (NOT BaseException: KeyboardInterrupt / SystemExit still propagate, so users can Ctrl-C their agent).
2. Logs the exception via the wrapped function's module logger with adapter_name + callback_name + truncated traceback.
3. Increments the adapter's per-instance ResilienceTracker counter.
4. Returns the framework's expected default value: None for void handlers, or the value of `passthrough_arg` for mutating hooks (Pydantic-AI's `after_model_request` returns the response object; `before_tool_execute` returns the args tuple).

Health surfacing:

- FrameworkAdapter now owns a `_resilience: ResilienceTracker` attribute set in `__init__`, so every framework adapter inherits the contract.
- `adapter_info().metadata` merges the live resilience snapshot (`resilience_status`, `resilience_failures_total`, `resilience_failure_threshold`, per-callback breakdown, last error).
- After DEFAULT_FAILURE_THRESHOLD (5) failures the adapter reports `resilience_status: "degraded"` so monitoring can alert.
- `disconnect()` resets the tracker so reconnects start clean.
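The failure path listed above (catch Exception but not BaseException, log, count, return None or a passthrough argument, flip to "degraded" at the threshold) can be sketched as follows. This is an illustration under assumptions, not the PR's resilience.py: treating `passthrough_arg` as the name of a parameter of the wrapped method, the "healthy" status label, the `resilience_failures_by_callback` / `resilience_last_error` key names and the DemoAdapter class are all hypothetical.

```python
from __future__ import annotations

import functools
import inspect
import logging
import threading
import traceback
from typing import Any, Callable, Dict, Optional, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])
DEFAULT_FAILURE_THRESHOLD = 5


class ResilienceTracker:
    """Per-adapter failure counter (sketch; some snapshot key names are assumed)."""

    def __init__(self, threshold: int = DEFAULT_FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self._lock = threading.Lock()
        self._by_callback: Dict[str, int] = {}
        self._last_error: Optional[str] = None

    def record(self, callback: str, exc: Exception) -> None:
        with self._lock:
            self._by_callback[callback] = self._by_callback.get(callback, 0) + 1
            self._last_error = f"{type(exc).__name__}: {exc}"

    def reset(self) -> None:  # e.g. called from disconnect()
        with self._lock:
            self._by_callback.clear()
            self._last_error = None

    def snapshot(self) -> Dict[str, Any]:
        with self._lock:
            total = sum(self._by_callback.values())
            return {
                "resilience_status": "degraded" if total >= self.threshold else "healthy",
                "resilience_failures_total": total,
                "resilience_failure_threshold": self.threshold,
                "resilience_failures_by_callback": dict(self._by_callback),  # assumed key
                "resilience_last_error": self._last_error,  # assumed key
            }


def resilient_callback(passthrough_arg: Optional[str] = None) -> Callable[[F], F]:
    """Swallow Exception (not BaseException) raised by an adapter callback."""

    def decorate(fn: F) -> F:
        sig = inspect.signature(fn)
        log = logging.getLogger(fn.__module__)  # the wrapped function's module logger

        @functools.wraps(fn)  # preserves __name__, __doc__ and sets __wrapped__
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            try:
                return fn(self, *args, **kwargs)
            except Exception as exc:  # KeyboardInterrupt / SystemExit are not caught
                log.warning(
                    "adapter=%s callback=%s failed: %s\n%s",
                    type(self).__name__, fn.__name__, exc, traceback.format_exc(limit=3),
                )
                self._resilience.record(fn.__name__, exc)
                if passthrough_arg is None:
                    return None  # void handler
                try:  # mutating hook: hand the framework back its own argument
                    return sig.bind_partial(self, *args, **kwargs).arguments.get(passthrough_arg)
                except TypeError:
                    return None

        return cast(F, wrapper)

    return decorate


class DemoAdapter:
    """Hypothetical adapter used only to exercise the decorator."""

    def __init__(self) -> None:
        self._resilience = ResilienceTracker()

    @resilient_callback()
    def on_run_start(self, run_id: str) -> None:
        raise RuntimeError("bug in our telemetry code")

    @resilient_callback(passthrough_arg="response")
    def after_model_request(self, ctx: Any, response: Any) -> Any:
        raise ValueError("bug while inspecting the response")
```

Catching `Exception` rather than `BaseException` is the key design choice: `KeyboardInterrupt` and `SystemExit` derive from `BaseException` only, so a Ctrl-C still stops the customer's agent even when it arrives inside an instrumented callback.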
Per-adapter callback audit + fixes:

| Adapter | Callbacks wrapped | Notes |
|---|---|---|
| agno | 2 | `_on_run_start`, `_on_run_end` |
| llamaindex | 16 | 3 span lifecycle + dispatcher + 12 events |
| google_adk | 11 | All adapter `_on_*` + simplified plugin shims |
| strands | 7 | All hook handlers (replaces manual try/except) |
| pydantic_ai | 9 (incl 3 split) | Error hooks split: telemetry resilient, re-raise unconditional |
| smolagents | 6 | Run/step handlers (replaces manual try/except) |
| bedrock_agents | 2 | `_before_invoke` + `_after_invoke` (with try/finally for `_end_run`) |
| openai_agents | 3 | `on_trace_start`/`_end` + `on_span_end` (replaces manual try/except) |
| haystack | 1 | `_on_span_end` (replaces manual try/except) |
| langfuse | 5 | `_import_single_trace`, `_import_observation`, `_import_score`, `_export_single_trace`, plus inner emit fallbacks |
| TOTAL | 62 | |

Pydantic-AI error-callback split: `_on_run_error`, `_on_model_request_error` and `_on_tool_execute_error` MUST always re-raise the framework's original error (per Pydantic-AI's contract). The telemetry side is moved into an `_emit_*_error_telemetry` helper wrapped with @resilient_callback; the public hook calls it and then unconditionally does `raise error`. So adapter-side telemetry bugs can never swallow a real framework error.

Tests:

- `tests/instrument/adapters/_base/test_resilience.py`: 34 tests covering tracker mechanics, decorator behaviour, passthrough args, KeyboardInterrupt propagation, FrameworkAdapter integration, package re-exports, and decorator metadata preservation.
- `tests/instrument/adapters/_base/test_per_adapter_resilience.py`: per-adapter smoke tests (one per lighter adapter) that simulate a callback exception by sabotaging an inner helper, plus a parametrized health-degradation test across all 10 adapters.
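The Pydantic-AI error-callback split can be sketched as a telemetry helper that may fail quietly plus a public hook that always re-raises. ErrorHookSketch is hypothetical, and the `(self, error)` hook signature is an assumption; only the `_on_run_error` / `_emit_*_error_telemetry` naming comes from the commit message. A plain try/except stands in for @resilient_callback so the sketch stays self-contained.

```python
from __future__ import annotations

from typing import List


class ErrorHookSketch:
    """Hypothetical adapter showing the telemetry / re-raise split for error hooks."""

    def __init__(self, telemetry_is_buggy: bool = False) -> None:
        self.telemetry_is_buggy = telemetry_is_buggy
        self.events: List[str] = []
        self.swallowed = 0

    def _emit_run_error_telemetry(self, error: BaseException) -> None:
        # In the PR this helper carries @resilient_callback; the try/except below
        # plays the same role: a telemetry bug is counted, never propagated.
        try:
            if self.telemetry_is_buggy:
                raise RuntimeError("bug in adapter telemetry")
            self.events.append(f"run.error:{type(error).__name__}")
        except Exception:
            self.swallowed += 1

    def _on_run_error(self, error: BaseException) -> None:
        self._emit_run_error_telemetry(error)  # may fail internally, never raises
        raise error  # unconditional: the framework's original error always propagates
```

Because the `raise error` sits outside the protected helper, no combination of telemetry failures can turn a real model or tool failure into a silent success.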
Refactor: `_base.py` (the AdapterInfo + BaseAdapter module) becomes a `_base/` package whose `__init__.py` re-exports from `_core.py` (moved via `git mv`) and the new `resilience.py`. All existing `from .._base import AdapterInfo, BaseAdapter` imports continue working unchanged.

Acceptance:

- pytest tests/instrument/adapters/_base/test_resilience.py -x: 34 passed
- pytest tests/instrument/adapters/frameworks/ -x: 146 passed (12 skipped for missing optional deps; 2 deselected pre-existing Windows clock-resolution flakes in test_haystack)
- mypy --strict src/layerlens/instrument/adapters/_base/resilience.py: Success
- mypy src: Success, 169 source files
- ruff check: All checks passed
- Full test suite: 1090 passed


…or 6 lighter adapters (cross-poll #1) (#130)

Implements cross-pollination item #1 from A:/tmp/adapter-cross-pollination-audit.md section 2 #1. The four mature framework adapters (LangChain, AutoGen, CrewAI, Semantic Kernel) carry ad-hoc memory plumbing (episodic recent turns, procedural learned patterns, semantic long-lived facts) that lets agents recall context across runs. The lighter adapters (agno, ms_agent_framework, openai_agents, llama_index, google_adk, bedrock_agents, browser_use) all behave as goldfish agents: every run starts from a blank slate. This PR ports the pattern into a shared, replay-safe primitive that the lighter adapters plug into uniformly.

## What is new

### Shared memory primitive

src/layerlens/instrument/adapters/_base/memory.py (new)

- MemorySnapshot: frozen dataclass with turn_index, episodic (recent turns), procedural (detected patterns), semantic (key/value facts), content_hash (SHA-256 of the canonical-JSON encoding) and org_id (tenant binding). A to_dict / from_dict round-trip preserves identity.
- MemoryRecorder: thread-safe accumulator. record_turn(...) is the per-turn entry point; set_semantic(key, value) stores long-lived facts; snapshot() returns the immutable view; restore(snap) rebuilds state from a previous snapshot. All buckets are bounded (defaults 200/16/64): episodic uses FIFO eviction, semantic LRU, procedural keep-top-by-count.
- Procedural pattern detection: O(window) per turn; scans the recent episodic window for recurring (prev_tools, current_tools) pairs.
- Multi-tenant: the recorder requires a non-empty org_id at construction; restore() rejects cross-tenant snapshots and tampered snapshots (content-hash mismatch).
- Replay-safe: a snapshot -> restore -> snapshot round-trip produces a byte-identical content_hash.

### BaseAdapter integration

src/layerlens/instrument/adapters/_base/adapter.py

- The constructor builds self._memory_recorder = MemoryRecorder(org_id=self._org_id).
- New record_memory_turn(...) helper: a best-effort wrapper that swallows recorder failures so memory persistence never breaks the host framework call stack (CLAUDE.md "tracing never breaks user code").
- memory_recorder property, plus memory_snapshot() and memory_snapshot_dict() convenience accessors.

### Per-adapter wiring (6 adapters)

- agno: Agent.run/arun finally-block; episodic input from args/kwargs; tool list from _collect_tool_names(result.messages).
- ms_agent_framework: Chat.invoke/invoke_stream finally-block; episodic input from kwargs; tool list from streamed message items.
- openai_agents: _on_agent_span_end (TraceProcessor) + on_run_end (Runner wrap); episodic input cached at span_start per span_id; tool list rolled up from _on_function_span_end per parent_id.
- llama_index: _on_agent_step_end; episodic input cached at step_start per thread id; tool list rolled up from _on_tool_call.
- google_adk: after_agent_callback + on_agent_end; episodic input cached at before_agent_callback per thread id; tool list rolled up from after_tool_callback per thread id.
- bedrock_agents: _after_invoke_agent (boto3 hook); episodic input cached at _before_invoke_agent per thread id; tool list rolled up from _process_trace action-group / KB step names.

Each adapter's serialize_for_replay() now embeds the snapshot under ReplayableTrace.metadata["memory_snapshot"], so replay engines can reconstruct memory state via MemorySnapshot.from_dict(...) -> recorder.restore(snapshot) before re-execution.

## Tests (57 new)

### tests/instrument/adapters/_base/test_memory.py (27 tests)

- Recorder construction: empty/non-string org_id rejected; zero buffer sizes rejected; initial state empty.
- Snapshot determinism: identical content -> identical hash; different org_id -> different hash; mutating the recorder doesn't affect a prior snapshot; to_dict/from_dict round-trip preserves the hash; from_dict rejects missing required fields.
- Replay round-trip: snapshot -> restore -> snapshot yields a byte-identical hash; deterministic next state under matching inputs; cross-tenant restore raises; tampered-content-hash restore raises.
- Bounded eviction: episodic FIFO at cap; semantic LRU at cap; semantic overwrite refreshes LRU; procedural cap.
- Procedural detection: repeated tool sequences accumulate count; no-tool turns produce no patterns.
- Per-turn truncation: multi-megabyte values capped with a deterministic suffix.
- Thread safety: 8 threads x 50 turns produce an unbroken 1..400 sequence.
- Clear preserves binding; defaults positive; extra metadata sorted for hash determinism.

### tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py (30 tests: 5 x 6 adapters, parametrized)

- Each adapter exposes a recorder bound to its org_id.
- record_memory_turn advances the episodic buffer.
- serialize_for_replay() embeds metadata["memory_snapshot"].
- The replay engine can restore the recorder from the serialised trace (content-hash match end-to-end).
- A cross-tenant snapshot is rejected at the per-adapter recorder boundary.

## Documentation

docs/adapters/memory-contract.md explains the three buckets, the contract (tenant binding, bounded buffers, tamper-evident snapshots, replay-safe round-trip, best-effort recording, thread safety), the per-adapter wiring matrix and audit hooks. It includes the replay-engine integration recipe and the honest scope disclosure for browser_use.

## Honest scope disclosure

The cross-pollination audit section 2 #1 enumerates seven target adapters. Six are wired here. The seventh, browser_use, does NOT exist on this PR's base branch (feat/instrument-multitenancy-org-id-propagation); it lives on the parallel feat/instrument-frameworks-browser-use-full history. It will be wired when that adapter is ported to this base or when the histories merge. This follows the same honest-disclosure pattern as PR #120 (state filters, which omitted ms_agent_framework for the same reason).

The future browser_use wiring (per audit section 2 #1) will be:

- Episodic: page navigation events (URL, action, selector)
- Procedural: recurring (prev_action, current_action) patterns
- Semantic: long-lived page-content cache keyed by URL/DOM hash

## Acceptance

- uv run pytest tests/instrument/adapters/_base/test_memory.py -x -> 27 passed
- uv run pytest tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -x -> 30 passed
- uv run pytest tests/instrument/adapters/_base/ -> 44 passed (no regressions)
- uv run pytest tests/instrument/adapters/frameworks/{agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py -> 72 passed (no regressions)
- uv run mypy --strict src/layerlens/instrument/adapters/_base/memory.py -> Success: no issues found in 1 source file
- uv run mypy src/layerlens/instrument/adapters/_base/adapter.py src/layerlens/instrument/adapters/frameworks/{6 adapters}/lifecycle.py -> Success: no issues found in 7 source files
- uv run ruff check src/layerlens/instrument/adapters/_base/memory.py tests/instrument/adapters/_base/test_memory.py tests/instrument/adapters/frameworks/test_memory_persistence_wiring.py -> All checks passed!
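The snapshot contract in this commit (SHA-256 over canonical JSON, tenant binding, tamper rejection on restore, FIFO / LRU / keep-top-by-count buckets) can be sketched as below. MemoryRecorderSketch is a hypothetical, simplified stand-in for MemoryRecorder: snapshots are plain dicts rather than a frozen MemorySnapshot dataclass, procedural detection only compares each turn with the one immediately before it instead of scanning a window, and semantic values are assumed to be JSON-serialisable.

```python
from __future__ import annotations

import hashlib
import json
import threading
from collections import OrderedDict, deque
from typing import Any, Deque, Dict, List


def _canonical(obj: Any) -> str:
    # Canonical JSON: sorted keys and no whitespace, so equal content gives equal bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


class MemoryRecorderSketch:
    """Hypothetical, simplified recorder; snapshots are plain dicts here."""

    def __init__(self, org_id: str, max_episodic: int = 200, max_procedural: int = 16, max_semantic: int = 64) -> None:
        if not isinstance(org_id, str) or not org_id:
            raise ValueError("org_id must be a non-empty string")
        if min(max_episodic, max_procedural, max_semantic) <= 0:
            raise ValueError("buffer sizes must be positive")
        self.org_id = org_id
        self._lock = threading.Lock()
        self._turn = 0
        self._episodic: Deque[Dict[str, Any]] = deque(maxlen=max_episodic)  # FIFO eviction
        self._procedural: Dict[str, int] = {}
        self._max_procedural = max_procedural
        self._semantic: OrderedDict[str, Any] = OrderedDict()  # LRU order, oldest first
        self._max_semantic = max_semantic

    def record_turn(self, user_input: str, tools: List[str]) -> None:
        with self._lock:
            self._turn += 1
            prev = self._episodic[-1]["tools"] if self._episodic else []
            if prev and tools:  # no-tool turns produce no patterns
                pattern = f"{','.join(prev)} -> {','.join(tools)}"
                self._procedural[pattern] = self._procedural.get(pattern, 0) + 1
                if len(self._procedural) > self._max_procedural:  # keep top patterns by count
                    del self._procedural[min(self._procedural, key=self._procedural.__getitem__)]
            self._episodic.append({"turn": self._turn, "input": user_input, "tools": list(tools)})

    def set_semantic(self, key: str, value: Any) -> None:
        with self._lock:
            self._semantic[key] = value
            self._semantic.move_to_end(key)  # an overwrite refreshes the LRU position
            while len(self._semantic) > self._max_semantic:
                self._semantic.popitem(last=False)  # evict the least recently set fact

    def _body(self) -> Dict[str, Any]:
        return {
            "org_id": self.org_id,
            "turn_index": self._turn,
            "episodic": list(self._episodic),
            "procedural": [[k, v] for k, v in self._procedural.items()],
            "semantic": [[k, v] for k, v in self._semantic.items()],
        }

    def snapshot(self) -> Dict[str, Any]:
        with self._lock:
            canonical = _canonical(self._body())
        snap: Dict[str, Any] = json.loads(canonical)  # deep copy: later turns cannot leak in
        snap["content_hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return snap

    def restore(self, snap: Dict[str, Any]) -> None:
        body = {k: v for k, v in snap.items() if k != "content_hash"}
        if body.get("org_id") != self.org_id:
            raise ValueError("cross-tenant snapshot rejected")
        if hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest() != snap.get("content_hash"):
            raise ValueError("content_hash mismatch: snapshot was tampered with")
        with self._lock:
            self._turn = body["turn_index"]
            self._episodic = deque(body["episodic"], maxlen=self._episodic.maxlen)
            self._procedural = {k: v for k, v in body["procedural"]}
            self._semantic = OrderedDict((k, v) for k, v in body["semantic"])
```

Hashing the canonical encoding, and storing ordered buckets as lists of pairs, is what makes the snapshot, restore, snapshot round-trip reproduce the same content_hash; any edit to a restored snapshot, including its org_id binding, changes the hash and is rejected.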

mmercuri added 2 commits April 26, 2026 17:11

mmercuri requested a review from m-peko April 27, 2026 03:49

mmercuri mentioned this pull request Apr 27, 2026 in feat(instrument): Memory persistence (episodic/procedural/semantic) for 7 lighter adapters (cross-poll #1) #130 (Merged)

m-peko closed this May 21, 2026

mmercuri wants to merge 2 commits into development from feat/instrument-state-filters
