feat(instrument): Field-specific truncation policy across 11 lighter adapters (cross-poll #3) by mmercuri · Pull Request #116 · LayerLens/stratix-python

mmercuri · 2026-04-27T00:11:38Z

Summary

Cross-pollination audit §2.4 (A:/tmp/adapter-cross-pollination-audit.md) — port the field-specific truncation policy that LangChain / LangGraph / AutoGen / Semantic Kernel / Agentforce each carry ad-hoc to the 11 lighter adapters (agno, bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands, browser_use).

Why now: without this, lighter adapters emit unbounded user payloads — full prompts, tool I/O, state snapshots, browser screenshots. That blows past Kafka's 1 MB record limit, triggers S3 multi-part uploads in the ingestion pipeline, inflates TimescaleDB indexing cost, and embeds high-cardinality user data in every span attribute. CRITICAL for browser_use: a single navigation step can produce multi-megabyte base64 PNG screenshots.

What changed

Shared helper — `src/layerlens/instrument/adapters/_base/truncation.py`

FieldTruncationPolicy (frozen dataclass, zero-overhead) + DEFAULT_POLICY.
truncate_field(value, field_name, policy) — recursive, UTF-8-safe, list-cap, recursion-cap.
truncate_payload(payload, policy) returns a (truncated_payload, truncated_fields) tuple, where truncated_fields is the audit list.
Defaults from the spec:
- prompt/completion/message: 4096 chars
- tool_input/tool_output: 2048 chars
- state_snapshot: 8192 chars
- error_message: 1024 chars
- traceback: 8 frames (truncated by frame, not chars)
- screenshot/image_data: DROP → deterministic <dropped:...:sha256:...> reference
- html/dom (browser_use): 16384 chars

`BaseAdapter` wiring

New _truncation_policy attribute (defaults None = legacy passthrough).
emit_dict_event calls _apply_truncation(payload) which rewrites the payload and attaches _truncated_fields audit list when any field exceeded its cap.
Failure-safe: policy errors fall back to raw payload + DEBUG log.
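
The emit-path behaviour described above can be sketched roughly as follows. This is not the actual `BaseAdapter` code: `SketchAdapter` is a hypothetical stand-in, and to stay self-contained it models the policy as a callable returning `(payload, truncated_fields)`, whereas the PR stores a `FieldTruncationPolicy` and calls `truncate_payload`.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Assumption: the policy is modelled as a plain callable for this sketch.
Truncator = Callable[[Dict[str, Any]], Tuple[Dict[str, Any], List[str]]]


class SketchAdapter:
    """Hypothetical stand-in for BaseAdapter, showing only the truncation gate."""

    def __init__(self, truncation_policy: Optional[Truncator] = None) -> None:
        self._truncation_policy = truncation_policy  # None = legacy passthrough
        self.emitted: List[Tuple[str, Dict[str, Any]]] = []

    def _apply_truncation(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        if self._truncation_policy is None:
            return payload
        try:
            truncated, fields = self._truncation_policy(payload)
        except Exception:
            # Failure-safe: never drop the event because the policy misbehaved.
            logger.debug("truncation policy failed; emitting raw payload", exc_info=True)
            return payload
        if fields:
            # Attach the audit list only when something was actually capped.
            truncated = {**truncated, "_truncated_fields": list(fields)}
        return truncated

    def emit_dict_event(self, event_type: str, payload: Dict[str, Any]) -> None:
        self.emitted.append((event_type, self._apply_truncation(payload)))
```

Falling back to the raw payload trades a possibly oversized event for never losing telemetry, which matches the failure-safe behaviour stated above.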

Per-adapter wiring (all 11 targets)

Each lighter adapter constructor now sets self._truncation_policy = DEFAULT_POLICY:

agno, bedrock_agents, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands.
embedding (both EmbeddingAdapter + VectorStoreAdapter).
browser_use — placeholder lifecycle.py created (M7-precursor) so the truncation policy is in place ahead of M7 instrumentation. manifest_consistency xfails for browser_use are strict=True so they flip red the moment M7 lands the canonical event hooks.

Tests

tests/instrument/adapters/_base/test_truncation.py — 31 tests covering policy correctness, UTF-8/CJK/emoji multibyte safety, drop/hash references, traceback frame truncation, nested dicts/lists, recursion guard, primitives, immutability, determinism.
tests/instrument/test_base_layer.py — 9 new tests for BaseAdapter._apply_truncation integration (gating, drop, custom policy, no mutation, nested dicts, broken-policy fallback, browser_use HTML pattern).
4 new truncation tests in each of 9 existing lighter-adapter test files (36 total).
New test_embedding_adapter.py (5 tests) and test_browser_use_adapter.py (7 tests).

Test plan

uv run pytest tests/instrument/adapters/_base/test_truncation.py -x → 31 passed
uv run pytest tests/instrument/adapters/frameworks/ -x (excl. pre-existing test_bulk_ported_smoke.py import) → 168 passed
uv run pytest tests/instrument/test_base_layer.py → 51 passed (was 42, now includes 9 new TestBaseAdapterTruncation cases)
uv run mypy --strict src/layerlens/instrument/adapters/_base/truncation.py → Success
uv run mypy --strict src/layerlens/instrument/adapters/ (full package) → Success: 39 source files
uv run ruff check (all modified + new files) → All checks passed

CLAUDE.md compliance

No TODOs.
All 11 adapters wired (10 existing + browser_use placeholder pre-wired).
UTF-8-safe truncation verified for emoji + CJK round-trip.
Truncation metadata exposed via _truncated_fields audit list — no silent truncation.
No co-author trailers.
Draft PR.

Pre-existing failures (out of scope)

Two test failures exist on the base branch unrelated to this PR:

test_lazy_imports.py::test_adapter_packages_importable_without_framework — protocols/ package lives on a separate branch not yet merged.
test_bulk_ported_smoke.py collection error — imports adapters (crewai, autogen, langchain, langgraph, agentforce, langfuse) that live on different feature branches.

Both are confirmed pre-existing on feat/instrument-manifest-tier-fix (the base of this PR).

Bootstraps the LayerLens instrument layer with the abstract base classes, adapter registry, capture configuration, event sinks, vendored event schemas, and pydantic v1/v2 compatibility shim that every concrete adapter (frameworks, protocols, providers) will depend on.

Scope
-----
- src/layerlens/instrument/__init__.py: lean re-export surface
- src/layerlens/instrument/_vendored/: frozen ateam event schemas (no runtime ateam dependency)
- src/layerlens/instrument/adapters/_base/: BaseAdapter, AdapterRegistry, AdapterStatus, AdapterHealth, AdapterCapability, ReplayableTrace, CaptureConfig, EventSink, TraceStoreSink, IngestionPipelineSink, PydanticCompat
- src/layerlens/_compat/pydantic.py: model_dump/model_validate shim spanning pydantic v1 + v2
- scripts/{port_adapter,port_protocol,emit_adapter_manifest,regen_dep_baselines}.py: codegen helpers used to port the rest of M1
- tests/instrument/{test_base_layer,test_lazy_imports,test_default_install,test_resolved_dep_tree}.py + _baselines/
- .github/workflows/dep-tree-guard.yaml: CI gate that locks the default install footprint
- docs/adapters/: CONTRIBUTING, STATUS, pydantic-compatibility, testing, PERSONA_REVIEW

Blast radius
------------
- Pure additions. No public surface changes outside the new layerlens.instrument namespace.
- Default `pip install layerlens` install set is unchanged (verified by test_default_install.py against the new baseline).
- Lazy adapter discovery: importing layerlens.instrument MUST NOT pull in any optional adapter dep (verified by test_lazy_imports.py).

Test plan
---------
- uv run pytest tests/instrument/test_base_layer.py tests/instrument/test_lazy_imports.py -x -> 45 passed
- The dep-tree-guard workflow exercises test_default_install.py and test_resolved_dep_tree.py against the new baselines on every PR.

LAY-3400 umbrella: this PR is the prerequisite for the M1.B/M1.C/M1.D adapter ports, M7 protocol certification, and M8 Cohere/Mistral.
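
For readers unfamiliar with the v1/v2 split: pydantic v2 models expose `model_dump()` and `model_validate()`, while v1 models expose `dict()` and `parse_obj()`. A compatibility shim of the kind the commit describes could be sketched as below. The contents of `src/layerlens/_compat/pydantic.py` are not shown here, so the function shapes are assumptions; the sketch dispatches by duck typing and does not import pydantic.

```python
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def model_dump(model: Any, **kwargs: Any) -> Dict[str, Any]:
    """Serialize a model to a dict on either pydantic major version."""
    if hasattr(model, "model_dump"):  # pydantic v2
        return model.model_dump(**kwargs)
    return model.dict(**kwargs)  # pydantic v1


def model_validate(cls: Type[T], data: Any) -> T:
    """Build a validated model instance on either pydantic major version."""
    if hasattr(cls, "model_validate"):  # pydantic v2
        return cls.model_validate(data)
    return cls.parse_obj(data)  # pydantic v1
```

Checking for the v2 method first matters because v2 still provides `dict()` and `parse_obj()` as deprecated aliases that emit warnings.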

Ports the twelve agent-tier framework adapters from the ateam reference implementation onto the new layerlens.instrument base layer: Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno, Strands, SmolAgents, MS Agent Framework, Google ADK, Bedrock Agents, Embedding (vector store hooks), Benchmark Import.

Pairs with feat/instrument-frameworks-orchestration (M1.C part 1) which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and Agentforce. Together they complete M1.C.

Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,ms_agent_framework,google_adk,bedrock_agents,embedding,benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py + the test_bulk_ported_smoke.py harness (which exercises every ported adapter against canned trace fixtures so partial framework SDKs on a given runner don't drop coverage to zero)
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration guide
- pyproject.toml: twelve new optional extras (semantic-kernel, llama-index, openai-agents, pydantic-ai, agno, strands, smolagents, ms-agent-framework, google-adk, bedrock-agents, embedding, benchmark-import) with python_version markers; pyright/ruff exclusions for the dynamic monkey-patching framework code

Blast radius
------------
- Default `pip install layerlens` install set is unchanged. Each framework's heavy deps are gated behind their own extra.
- No changes to existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework module (lazy registry lookup).

Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x -> 184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12 agent-tier adapters plus the orchestration-tier ones from part 1 via the same harness)

Stacks on
---------
- feat/instrument-base-foundation (M1.A): required for the BaseAdapter surface this PR consumes.

Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1): both branches stack on the base foundation independently and don't conflict; they can land in either order.

LAY-3400 umbrella (M1.C part 2).

…ier + lint guards for capability/event consistency

Fixes the audit finding that scripts/emit_adapter_manifest.py only ever wrote ``mature`` or ``smoke_only`` despite ``lifecycle_preview`` being documented in the manifest schema for months. As a result, every adapter that shipped full lifecycle hooks but lacked one graduation artifact (doc, sample, alias) was rendered as ``smoke_only`` in the catalog UI -- hiding real coverage from customers.

What this change does
=====================

scripts/emit_adapter_manifest.py
--------------------------------
* Adds ``_LIFECYCLE_PREVIEW`` set (12 adapters) and a ``_maturity_for`` helper that tier-resolves ``mature`` -> ``lifecycle_preview`` -> ``smoke_only`` in that order.
* Asserts ``_MATURE`` and ``_LIFECYCLE_PREVIEW`` are disjoint at module import time (catches future copy/paste mistakes).
* Adds ``by_maturity`` summary block to the manifest output.
* Moves ``smolagents`` out of ``_MATURE`` (it lacked doc + sample) and into ``_LIFECYCLE_PREVIEW`` until the sibling artifact PR lands.
* Updates the docstring with audit context so the reason this exists is in the source.

tests/instrument/adapters/test_manifest_consistency.py (new)
------------------------------------------------------------
Three lint-style guards run statically (no optional runtimes needed):

1. Capability/hook consistency. If lifecycle.py defines ``on_handoff``, ``TRACE_HANDOFFS`` MUST be in ``get_adapter_info().capabilities``; same for ``on_tool_use``/``TRACE_TOOLS`` and ``on_llm_call``/``TRACE_MODELS``. Catches the pydantic_ai bug (xfail until sibling capabilities-cleanup PR lands).
2. Canonical event-type parity. Every framework lifecycle.py MUST emit the five canonical event types (``agent.input``, ``agent.output``, ``tool.call``, ``model.invoke``, ``environment.config``) at least once. Catches missing-emit regressions.
3. Maturity-vs-artifacts. Every adapter in ``_MATURE`` MUST have a per-adapter test file (>= 12 funcs), a sample, a doc, and a STRATIX->LayerLens deprecation alias. Provider entries are xfailed today since their per-provider artifact PRs haven't landed on this branch.

xfail markers are ``strict=True`` so when sibling PRs land and fix the underlying gap, the marker MUST be removed in the same PR or the suite fails.

scripts/__init__.py (new)
-------------------------
Empty package marker so ``mypy .`` does not double-detect ``scripts/emit_adapter_manifest.py`` as both a top-level module and a package member when imported by the test suite.

scripts/README.md (new)
-----------------------
Documents the three maturity tiers, the promotion workflow, and the audit context.

tests/instrument/adapters/MANIFEST_CONSISTENCY.md (new)
-------------------------------------------------------
Documents the three lint guards, the xfail policy, and how to fix violations.

Verification
============
* ``python scripts/emit_adapter_manifest.py --stdout`` now reports 12 ``lifecycle_preview`` adapters (was 0).
* ``mypy --strict scripts/emit_adapter_manifest.py`` passes.
* ``pytest tests/instrument/adapters/test_manifest_consistency.py -v`` reports 23 passed, 10 xfailed (one cap/hook + nine pending-provider artifacts).
* ``ruff check`` clean on all new files.

…adapters

Cross-pollination audit §2.4: lighter framework adapters (agno, bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands, browser_use) emit event payloads without consistent size limits. Untruncated, oversized prompts/tool I/O/state values blow past Kafka's 1 MB record limit, trigger S3 multi-part uploads in the ingestion pipeline, inflate TimescaleDB indexing cost, and embed unbounded user payloads in every span attribute. Browser_use is CRITICAL: a single navigation step can capture multi-megabyte base64 PNG screenshots and hundred-KB DOM HTML payloads.

Mature adapters (LangChain, LangGraph, AutoGen, Semantic Kernel, Agentforce) each carry their own ad-hoc truncation. This change standardises the policy across all 11 lighter adapters.

Shared helper
* New ``layerlens.instrument.adapters._base.truncation`` module with ``FieldTruncationPolicy`` (frozen dataclass, zero-overhead), ``DEFAULT_POLICY``, ``truncate_field``, ``truncate_payload``.
* Defaults: prompt/completion/message 4 KiB, tool_input/tool_output 2 KiB, state_snapshot 8 KiB, error_message 1 KiB, traceback 8 frames, screenshot/image_data DROP→hash reference, html/dom 16 KiB.
* UTF-8-safe (CJK + emoji round-trip), recursive over nested dicts/lists, list cap 100 items, recursion depth limit 8.
* Auditable: returns ``(truncated_payload, truncated_fields)`` so the emit path can attach a ``_truncated_fields`` audit list. Silent truncation is forbidden by CLAUDE.md.

BaseAdapter wiring
* ``BaseAdapter._truncation_policy`` (default None = legacy passthrough).
* ``emit_dict_event`` calls ``_apply_truncation(payload)`` which rewrites the payload via ``truncate_payload`` and appends the audit list as ``_truncated_fields`` when any field exceeded its cap.
* Failure-safe: if the policy raises (defensive), the raw payload is emitted and the failure is logged at DEBUG.

Per-adapter wiring (all 11 targets)
Each lighter adapter constructor now sets ``self._truncation_policy = DEFAULT_POLICY``:
* agno, bedrock_agents, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands: lifecycle.py ``__init__`` updated.
* embedding: both EmbeddingAdapter and VectorStoreAdapter wired.
* browser_use: placeholder lifecycle.py created (M7-precursor) so the truncation policy is in place ahead of the M7 instrumentation PR. Pre-wires ``screenshot``→hash and ``html``→16 KiB cap.

Tests
* ``tests/instrument/adapters/_base/test_truncation.py``: 31 tests covering policy correctness, UTF-8/CJK/emoji multibyte safety, drop/hash references, traceback frame truncation, nested dicts/lists, recursion guard, primitive pass-through, immutability of policy and input payload, deterministic re-runs.
* ``tests/instrument/test_base_layer.py``: 9 new tests for ``BaseAdapter._apply_truncation`` integration.
* Per-adapter test extensions on agno, bedrock_agents, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands: 4 truncation tests each (36 total).
* New ``test_embedding_adapter.py`` (5 tests) and ``test_browser_use_adapter.py`` (7 tests).
* ``test_manifest_consistency.py`` xfails the canonical-events and capability-hook checks for browser_use until M7 lands (strict=True).

Acceptance
* uv run pytest tests/instrument/adapters/_base/test_truncation.py -x → 31 passed
* uv run pytest tests/instrument/adapters/frameworks/ -x → 168 passed
* uv run mypy --strict src/layerlens/instrument/adapters/_base/truncation.py → success
* uv run ruff check (modified files) → all checks passed

…laceholder from #116) (#126)

Replaces the M7 placeholder shipped in PR #116 (truncation policy) with the full BrowserUseAdapter: every lifecycle hook wired, every event emitted, and every cross-cutting CLAUDE.md contract enforced from day one.

What changed
------------

Full lifecycle adapter (src/layerlens/instrument/adapters/frameworks/browser_use/lifecycle.py):
* connect / disconnect / health_check / get_adapter_info / serialize_for_replay (all five abstract BaseAdapter methods).
* on_session_start, on_session_end, on_navigation, on_action, on_screenshot, on_dom_extraction, on_llm_call (every spec'd hook).
* Capability declaration: TRACE_TOOLS + TRACE_MODELS + TRACE_STATE + STREAMING + REPLAY (no longer the placeholder's TRACE_TOOLS-only set).
* Canonical events: browser.session.start, browser.navigate, browser.action, browser.screenshot, browser.dom.extract, tool.call, model.invoke, agent.input/output/state.change, cost.record, environment.config, plus agent.error / tool.error / model.error per the PR #115 error-aware emission contract.
* Per-callback resilience wrapper per PR #117: observability errors NEVER crash the customer's agent, surfaced via resilience_snapshot().
* Multi-tenant org_id propagation per PR #118: bound at construction (kwarg or resolved from stratix.org_id), stamped defensively on every emit, caller-supplied values overwritten to prevent cross-tenant leaks.
* Truncation policy from day one (DEFAULT_POLICY): screenshot bytes DROPPED to deterministic SHA-256 references, DOM/HTML capped at 16 KiB, prompts/completions/tool I/O at 4/2 KiB.
* Browser-event layer mapping (_BROWSER_EVENT_LAYERS) so unknown browser.* event types respect CaptureConfig gating without falling through the unknown-event-drops-by-default path.
* requires_pydantic = PydanticCompat.V2_ONLY (browser_use is a v2 lib).

Public surface (src/layerlens/instrument/adapters/frameworks/browser_use/__init__.py):
* ADAPTER_CLASS = BrowserUseAdapter (registry).
* instrument_agent(agent, stratix=, capture_config=, org_id=) one-liner returning the connected, wrapping adapter.
* STRATIXBrowserUseAdapter top-level binding (legacy alias): fires DeprecationWarning on construction. Exposed as a static binding so the manifest consistency lint's AST walk finds it.

Pyproject:
* Adds 'browser-use' optional extra: browser-use>=0.1.0,<2 with the python_version >= '3.11' marker (browser_use's own constraint).

Tests (tests/instrument/adapters/frameworks/test_browser_use_adapter.py):
* Replaces the 7-test scaffold from #116 with 40 tests covering: wiring + alias + lifecycle round-trip + truncation (screenshot drop, hash determinism, HTML cap, short-payload no-audit) + multi-tenancy (kwarg, client attribute, defensive overwrite) + resilience (poison stratix, exploding agent attribute access) + error-aware emission (agent.error / tool.error / model.error) + per-hook coverage + sync + async wrapping + replay round-trip + 10-case provider detection table.

Sample (samples/instrument/browser_use/{main.py,__init__.py,README.md}):
* Runs OFFLINE: no browser-use install, no Playwright, no API key, no network. Three-step duck-typed agent + happy/--fail paths exercise the full event surface and demonstrate screenshot drop + org_id stamping + agent.error emission before re-raise.

Doc (docs/adapters/frameworks-browser_use.md):
* Install + quickstart + capabilities matrix + 14-event reference table + truncation policy table + multi-tenancy + resilience + error-aware emission + capture config + browser_use specifics + BYOK + replay sections.

Manifest (scripts/emit_adapter_manifest.py):
* Promotes browser_use from _LIFECYCLE_PREVIEW to _MATURE: every required artifact (test file with >= 12 funcs, sample, doc, STRATIX→LayerLens deprecation alias) ships in this PR.

Verification
------------
* uv run pytest tests/instrument/adapters/frameworks/test_browser_use_adapter.py → 40 passed
* mypy --strict src/layerlens/instrument/adapters/frameworks/browser_use → Success: no issues found in 2 source files
* ruff check on src + test + script → All checks passed!
* Sample runs cleanly offline (happy + --fail)
* pip install -e .[browser-use] resolves cleanly (browser-use only pulled on Python 3.11+ per the env marker)
* tests/instrument/adapters/test_manifest_consistency.py::test_mature_adapters_have_required_artifacts[browser_use] passes
* Full instrument suite (excl. pre-existing crewai/protocols references not on this branch): 312 passed, 1 skipped, 12 xfailed
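
Two of the contracts named in this commit, the defensive org_id overwrite and the legacy alias that warns on construction, are easy to illustrate in isolation. The sketch below is an assumption about their general shape, not the adapter's code: both class names carry a `Sketch` suffix to mark them as hypothetical, and `_stamp` is an invented helper name.

```python
import warnings
from typing import Any, Dict


class BrowserUseAdapterSketch:
    """Hypothetical stand-in showing only org_id binding and stamping."""

    def __init__(self, org_id: str) -> None:
        self._org_id = org_id  # bound once, at construction

    def _stamp(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # The adapter's org_id is written last, so a caller-supplied value
        # can never redirect an event to another tenant.
        return {**payload, "org_id": self._org_id}


class STRATIXBrowserUseAdapterSketch(BrowserUseAdapterSketch):
    """Legacy alias: behaves identically but warns when constructed."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        warnings.warn(
            "STRATIXBrowserUseAdapterSketch is deprecated; use BrowserUseAdapterSketch",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)
```

Defining the alias as a real class statement at module level, rather than generating it dynamically, is what lets a static AST walk such as the manifest consistency lint find it.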

mmercuri and others added 4 commits April 25, 2026 19:13

mmercuri requested a review from m-peko April 27, 2026 00:11

mmercuri mentioned this pull request Apr 27, 2026

feat(instrument): browser_use adapter full implementation (replaces placeholder from #116) #126

Merged

7 tasks

m-peko closed this May 21, 2026

mmercuri wants to merge 5 commits into main from feat/instrument-truncation-policy

2 participants
