feat(instrument): Field-specific truncation policy across 11 lighter adapters (cross-poll #3) #116
Closed
mmercuri wants to merge 5 commits into
Bootstraps the LayerLens instrument layer with the abstract base classes,
adapter registry, capture configuration, event sinks, vendored event
schemas, and pydantic v1/v2 compatibility shim that every concrete
adapter (frameworks, protocols, providers) will depend on.
Scope
-----
- src/layerlens/instrument/__init__.py: lean re-export surface
- src/layerlens/instrument/_vendored/: frozen ateam event schemas (no
runtime ateam dependency)
- src/layerlens/instrument/adapters/_base/: BaseAdapter, AdapterRegistry,
AdapterStatus, AdapterHealth, AdapterCapability, ReplayableTrace,
CaptureConfig, EventSink, TraceStoreSink, IngestionPipelineSink,
PydanticCompat
- src/layerlens/_compat/pydantic.py: model_dump/model_validate shim
spanning pydantic v1 + v2
- scripts/{port_adapter,port_protocol,emit_adapter_manifest,
regen_dep_baselines}.py: codegen helpers used to port the rest of M1
- tests/instrument/{test_base_layer,test_lazy_imports,
test_default_install,test_resolved_dep_tree}.py + _baselines/
- .github/workflows/dep-tree-guard.yaml: CI gate that locks the default
install footprint
- docs/adapters/: CONTRIBUTING, STATUS, pydantic-compatibility, testing,
PERSONA_REVIEW
Blast radius
------------
- Pure additions. No public surface changes outside the new
layerlens.instrument namespace.
- Default `pip install layerlens` install set is unchanged (verified by
test_default_install.py against the new baseline).
- Lazy adapter discovery: importing layerlens.instrument MUST NOT pull
in any optional adapter dep (verified by test_lazy_imports.py).
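The lazy-discovery guarantee above can be illustrated with a minimal sketch. This is not the actual AdapterRegistry; the class name, method names, and the stdlib module used as a stand-in for an optional adapter package are all assumptions made for illustration.

```python
import importlib
import sys
from types import ModuleType
from typing import Dict


class LazyAdapterRegistry:
    """Hypothetical registry: stores dotted module paths, imports only on lookup."""

    def __init__(self) -> None:
        self._module_paths: Dict[str, str] = {}

    def register(self, name: str, module_path: str) -> None:
        # Only the string is stored; nothing is imported at registration time.
        self._module_paths[name] = module_path

    def load(self, name: str) -> ModuleType:
        # The import, and therefore any heavy optional dependency, happens here.
        return importlib.import_module(self._module_paths[name])


def is_loaded(module_path: str) -> bool:
    """True once the module has actually been imported in this process."""
    return module_path in sys.modules
```

A test in the spirit of test_lazy_imports.py would register every adapter, assert that `is_loaded` is still false for each optional module, and only then call `load`.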
Test plan
---------
- uv run pytest tests/instrument/test_base_layer.py
tests/instrument/test_lazy_imports.py -x -> 45 passed
- The dep-tree-guard workflow exercises test_default_install.py and
test_resolved_dep_tree.py against the new baselines on every PR.
LAY-3400 umbrella: this PR is the prerequisite for the M1.B/M1.C/M1.D
adapter ports, M7 protocol certification, and M8 Cohere/Mistral.
Ports the twelve agent-tier framework adapters from the ateam
reference implementation onto the new layerlens.instrument base layer:
Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno,
Strands, SmolAgents, MS Agent Framework, Google ADK,
Bedrock Agents, Embedding (vector store hooks), and Benchmark Import.
Pairs with feat/instrument-frameworks-orchestration (M1.C part 1)
which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and
Agentforce. Together they complete M1.C.
Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,
llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,
ms_agent_framework,google_adk,bedrock_agents,embedding,
benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py + the
test_bulk_ported_smoke.py harness (which exercises every ported
adapter against canned trace fixtures so partial framework SDKs
on a given runner don't drop coverage to zero)
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration
guide
- pyproject.toml: twelve new optional extras
(semantic-kernel, llama-index, openai-agents, pydantic-ai, agno,
strands, smolagents, ms-agent-framework, google-adk,
bedrock-agents, embedding, benchmark-import) with python_version
markers; pyright/ruff exclusions for the dynamic monkey-patching
framework code
Blast radius
------------
- Default `pip install layerlens` install set is unchanged. Each
framework's heavy deps are gated behind their own extra.
- No changes to existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework
module (lazy registry lookup).
Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x ->
184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12
agent-tier adapters plus the orchestration-tier ones from part 1
via the same harness)
Stacks on
---------
- feat/instrument-base-foundation (M1.A) — required for the
BaseAdapter surface this PR consumes.
Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1) — both
branches stack on the base foundation independently and don't
conflict; they can land in either order.
LAY-3400 umbrella (M1.C part 2).
…ier + lint guards for capability/event consistency

Fixes the audit finding that scripts/emit_adapter_manifest.py only ever
wrote ``mature`` or ``smoke_only`` despite ``lifecycle_preview`` being
documented in the manifest schema for months. As a result, every adapter
that shipped full lifecycle hooks but lacked one graduation artifact
(doc, sample, alias) was rendered as ``smoke_only`` in the catalog UI --
hiding real coverage from customers.

What this change does
=====================

scripts/emit_adapter_manifest.py
--------------------------------
* Adds ``_LIFECYCLE_PREVIEW`` set (12 adapters) and a ``_maturity_for``
  helper that tier-resolves ``mature`` -> ``lifecycle_preview`` ->
  ``smoke_only`` in that order.
* Asserts ``_MATURE`` and ``_LIFECYCLE_PREVIEW`` are disjoint at module
  import time (catches future copy/paste mistakes).
* Adds ``by_maturity`` summary block to the manifest output.
* Moves ``smolagents`` out of ``_MATURE`` (it lacked doc + sample) and
  into ``_LIFECYCLE_PREVIEW`` until the sibling artifact PR lands.
* Updates the docstring with audit context so the reason this exists is
  in the source.

tests/instrument/adapters/test_manifest_consistency.py (new)
------------------------------------------------------------
Three lint-style guards run statically (no optional runtimes needed):

1. Capability/hook consistency. If lifecycle.py defines ``on_handoff``,
   ``TRACE_HANDOFFS`` MUST be in ``get_adapter_info().capabilities``;
   same for ``on_tool_use``/``TRACE_TOOLS`` and
   ``on_llm_call``/``TRACE_MODELS``. Catches the pydantic_ai bug (xfail
   until sibling capabilities-cleanup PR lands).
2. Canonical event-type parity. Every framework lifecycle.py MUST emit
   the five canonical event types (``agent.input``, ``agent.output``,
   ``tool.call``, ``model.invoke``, ``environment.config``) at least
   once. Catches missing-emit regressions.
3. Maturity-vs-artifacts. Every adapter in ``_MATURE`` MUST have a
   per-adapter test file (>= 12 funcs), a sample, a doc, and a
   STRATIX->LayerLens deprecation alias. Provider entries are xfailed
   today since their per-provider artifact PRs haven't landed on this
   branch.

xfail markers are ``strict=True`` so when sibling PRs land and fix the
underlying gap, the marker MUST be removed in the same PR or the suite
fails.

scripts/__init__.py (new)
-------------------------
Empty package marker so ``mypy .`` does not double-detect
``scripts/emit_adapter_manifest.py`` as both a top-level module and a
package member when imported by the test suite.

scripts/README.md (new)
-----------------------
Documents the three maturity tiers, the promotion workflow, and the
audit context.

tests/instrument/adapters/MANIFEST_CONSISTENCY.md (new)
-------------------------------------------------------
Documents the three lint guards, the xfail policy, and how to fix
violations.

Verification
============
* ``python scripts/emit_adapter_manifest.py --stdout`` now reports 12
  ``lifecycle_preview`` adapters (was 0).
* ``mypy --strict scripts/emit_adapter_manifest.py`` passes.
* ``pytest tests/instrument/adapters/test_manifest_consistency.py -v``
  reports 23 passed, 10 xfailed (one cap/hook + nine pending-provider
  artifacts).
* ``ruff check`` clean on all new files.
…adapters

Cross-pollination audit §2.4: lighter framework adapters (agno,
bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework,
openai_agents, pydantic_ai, smolagents, strands, browser_use) emit event
payloads without consistent size limits. Untruncated, oversized
prompts/tool I/O/state values blow past Kafka's 1 MB record limit,
trigger S3 multi-part uploads in the ingestion pipeline, inflate
TimescaleDB indexing cost, and embed unbounded user payloads in every
span attribute. Browser_use is CRITICAL: a single navigation step can
capture multi-megabyte base64 PNG screenshots and hundred-KB DOM HTML
payloads.

Mature adapters (LangChain, LangGraph, AutoGen, Semantic Kernel,
Agentforce) each carry their own ad-hoc truncation. This change
standardises the policy across all 11 lighter adapters.

Shared helper
* New ``layerlens.instrument.adapters._base.truncation`` module with
  ``FieldTruncationPolicy`` (frozen dataclass, zero-overhead),
  ``DEFAULT_POLICY``, ``truncate_field``, ``truncate_payload``.
* Defaults: prompt/completion/message 4 KiB, tool_input/tool_output
  2 KiB, state_snapshot 8 KiB, error_message 1 KiB, traceback 8 frames,
  screenshot/image_data DROP→hash reference, html/dom 16 KiB.
* UTF-8-safe (CJK + emoji round-trip), recursive over nested
  dicts/lists, list cap 100 items, recursion depth limit 8.
* Auditable: returns ``(truncated_payload, truncated_fields)`` so the
  emit path can attach a ``_truncated_fields`` audit list; silent
  truncation is forbidden by CLAUDE.md.

BaseAdapter wiring
* ``BaseAdapter._truncation_policy`` (default None = legacy
  passthrough).
* ``emit_dict_event`` calls ``_apply_truncation(payload)`` which
  rewrites the payload via ``truncate_payload`` and appends the audit
  list as ``_truncated_fields`` when any field exceeded its cap.
* Failure-safe: if the policy raises (defensive), the raw payload is
  emitted and the failure is logged at DEBUG.

Per-adapter wiring (all 11 targets)
Each lighter adapter constructor now sets
``self._truncation_policy = DEFAULT_POLICY``:
* agno, bedrock_agents, google_adk, llama_index, ms_agent_framework,
  openai_agents, pydantic_ai, smolagents, strands: lifecycle.py
  ``__init__`` updated.
* embedding: both EmbeddingAdapter and VectorStoreAdapter wired.
* browser_use: placeholder lifecycle.py created (M7-precursor) so the
  truncation policy is in place ahead of the M7 instrumentation PR.
  Pre-wires ``screenshot``→hash and ``html``→16 KiB cap.

Tests
* ``tests/instrument/adapters/_base/test_truncation.py``: 31 tests
  covering policy correctness, UTF-8/CJK/emoji multibyte safety,
  drop/hash references, traceback frame truncation, nested dicts/lists,
  recursion guard, primitive pass-through, immutability of policy and
  input payload, deterministic re-runs.
* ``tests/instrument/test_base_layer.py``: 9 new tests for
  ``BaseAdapter._apply_truncation`` integration.
* Per-adapter test extensions on agno, bedrock_agents, google_adk,
  llama_index, ms_agent_framework, openai_agents, pydantic_ai,
  smolagents, strands: 4 truncation tests each (36 total).
* New ``test_embedding_adapter.py`` (5 tests) and
  ``test_browser_use_adapter.py`` (7 tests).
* ``test_manifest_consistency.py`` xfails the canonical-events and
  capability-hook checks for browser_use until M7 lands (strict=True).

Acceptance
* uv run pytest tests/instrument/adapters/_base/test_truncation.py -x
  → 31 passed
* uv run pytest tests/instrument/adapters/frameworks/ -x → 168 passed
* uv run mypy --strict
  src/layerlens/instrument/adapters/_base/truncation.py → success
* uv run ruff check (modified files) → all checks passed
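As a rough illustration of the shared helper described in this commit, the sketch below applies per-field caps recursively and returns an audit list. It is not the shipped ``truncation.py``: it assumes caps are measured in UTF-8 bytes, omits traceback frame truncation, leaves values beyond the depth limit untouched, and uses a guessed format for the dropped-value reference. ``DROP``, ``_walk``, and the other helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Tuple

DROP = -1  # hypothetical sentinel: replace the value with a hash reference


def _default_caps() -> Dict[str, int]:
    # Caps from the defaults listed in the commit message, read as bytes.
    return {
        "prompt": 4096, "completion": 4096, "message": 4096,
        "tool_input": 2048, "tool_output": 2048,
        "state_snapshot": 8192, "error_message": 1024,
        "screenshot": DROP, "image_data": DROP,
        "html": 16384, "dom": 16384,
    }


@dataclass(frozen=True)
class FieldTruncationPolicy:
    caps: Mapping[str, int] = field(default_factory=_default_caps)
    max_list_items: int = 100
    max_depth: int = 8


DEFAULT_POLICY = FieldTruncationPolicy()


def _truncate_utf8(text: str, max_bytes: int) -> str:
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" discards a multi-byte character cut in half at the cap.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


def _drop_reference(value: Any) -> str:
    raw = value if isinstance(value, bytes) else str(value).encode("utf-8")
    # Guessed layout; deterministic because it depends only on the content.
    return f"<dropped:{len(raw)}B:sha256:{hashlib.sha256(raw).hexdigest()}>"


def _note(audit: List[str], path: str) -> None:
    if path not in audit:
        audit.append(path)


def _walk(value: Any, name: str, path: str, policy: FieldTruncationPolicy,
          audit: List[str], depth: int) -> Any:
    if depth > policy.max_depth:
        return value  # assumption: stop descending past the depth limit
    if isinstance(value, dict):
        return {
            k: _walk(v, str(k), f"{path}.{k}" if path else str(k),
                     policy, audit, depth + 1)
            for k, v in value.items()
        }
    if isinstance(value, list):
        if len(value) > policy.max_list_items:
            _note(audit, path)
        kept = value[: policy.max_list_items]
        return [_walk(v, name, path, policy, audit, depth + 1) for v in kept]
    cap = policy.caps.get(name)
    if cap is None:
        return value  # primitives and unlisted fields pass through
    if cap == DROP:
        _note(audit, path)
        return _drop_reference(value)
    if isinstance(value, str):
        shortened = _truncate_utf8(value, cap)
        if len(shortened) != len(value):
            _note(audit, path)
        return shortened
    return value


def truncate_payload(payload: Dict[str, Any],
                     policy: FieldTruncationPolicy = DEFAULT_POLICY
                     ) -> Tuple[Dict[str, Any], List[str]]:
    """Return a truncated copy plus the dotted paths that were shortened."""
    audit: List[str] = []
    return _walk(payload, "", "", policy, audit, 0), audit
```

The walk builds new dicts and lists instead of editing in place, which keeps the caller's payload unmodified, and slicing encoded bytes followed by a lenient decode keeps CJK and emoji text valid at the cut point.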
…laceholder from #116) (#126)

Replaces the M7 placeholder shipped in PR #116 (truncation policy) with
the full BrowserUseAdapter: every lifecycle hook wired, every event
emitted, and every cross-cutting CLAUDE.md contract enforced from day
one.

What changed
------------
Full lifecycle adapter
(src/layerlens/instrument/adapters/frameworks/browser_use/lifecycle.py):
* connect / disconnect / health_check / get_adapter_info /
  serialize_for_replay (all five abstract BaseAdapter methods).
* on_session_start, on_session_end, on_navigation, on_action,
  on_screenshot, on_dom_extraction, on_llm_call (every spec'd hook).
* Capability declaration: TRACE_TOOLS + TRACE_MODELS + TRACE_STATE +
  STREAMING + REPLAY (no longer the placeholder's TRACE_TOOLS-only set).
* Canonical events: browser.session.start, browser.navigate,
  browser.action, browser.screenshot, browser.dom.extract, tool.call,
  model.invoke, agent.input/output/state.change, cost.record,
  environment.config, plus agent.error / tool.error / model.error per
  the PR #115 error-aware emission contract.
* Per-callback resilience wrapper per PR #117: observability errors
  NEVER crash the customer's agent, surfaced via resilience_snapshot().
* Multi-tenant org_id propagation per PR #118: bound at construction
  (kwarg or resolved from stratix.org_id), stamped defensively on every
  emit, caller-supplied values overwritten to prevent cross-tenant
  leaks.
* Truncation policy from day one (DEFAULT_POLICY): screenshot bytes
  DROPPED to deterministic SHA-256 references, DOM/HTML capped at
  16 KiB, prompts/completions/tool I/O at 4/2 KiB.
* Browser-event layer mapping (_BROWSER_EVENT_LAYERS) so unknown
  browser.* event types respect CaptureConfig gating without falling
  through the unknown-event-drops-by-default path.
* requires_pydantic = PydanticCompat.V2_ONLY (browser_use is a v2 lib).

Public surface
(src/layerlens/instrument/adapters/frameworks/browser_use/__init__.py):
* ADAPTER_CLASS = BrowserUseAdapter (registry).
* instrument_agent(agent, stratix=, capture_config=, org_id=) one-liner
  returning the connected, wrapping adapter.
* STRATIXBrowserUseAdapter top-level binding (legacy alias), fires
  DeprecationWarning on construction. Exposed as a static binding so the
  manifest consistency lint's AST walk finds it.

Pyproject:
* Adds 'browser-use' optional extra: browser-use>=0.1.0,<2 with the
  python_version >= '3.11' marker (browser_use's own constraint).

Tests
(tests/instrument/adapters/frameworks/test_browser_use_adapter.py):
* Replaces the 7-test scaffold from #116 with 40 tests covering:
  wiring + alias + lifecycle round-trip + truncation (screenshot drop,
  hash determinism, HTML cap, short-payload no-audit) + multi-tenancy
  (kwarg, client attribute, defensive overwrite) + resilience (poison
  stratix, exploding agent attribute access) + error-aware emission
  (agent.error / tool.error / model.error) + per-hook coverage + sync +
  async wrapping + replay round-trip + 10-case provider detection table.

Sample (samples/instrument/browser_use/{main.py,__init__.py,README.md}):
* Runs OFFLINE: no browser-use install, no Playwright, no API key, no
  network. Three-step duck-typed agent + happy/--fail paths exercise the
  full event surface and demonstrate screenshot drop + org_id stamping +
  agent.error emission before re-raise.

Doc (docs/adapters/frameworks-browser_use.md):
* Install + quickstart + capabilities matrix + 14-event reference
  table + truncation policy table + multi-tenancy + resilience +
  error-aware emission + capture config + browser_use specifics + BYOK +
  replay sections.

Manifest (scripts/emit_adapter_manifest.py):
* Promotes browser_use from _LIFECYCLE_PREVIEW to _MATURE: every
  required artifact (test file with >= 12 funcs, sample, doc,
  STRATIX→LayerLens deprecation alias) ships in this PR.

Verification
------------
* uv run pytest
  tests/instrument/adapters/frameworks/test_browser_use_adapter.py
  → 40 passed
* mypy --strict src/layerlens/instrument/adapters/frameworks/browser_use
  → Success: no issues found in 2 source files
* ruff check on src + test + script → All checks passed!
* Sample runs cleanly offline (happy + --fail)
* pip install -e .[browser-use] resolves cleanly (browser-use only
  pulled on Python 3.11+ per the env marker)
* tests/instrument/adapters/test_manifest_consistency.py::
  test_mature_adapters_have_required_artifacts[browser_use] passes
* Full instrument suite (excl. pre-existing crewai/protocols references
  not on this branch): 312 passed, 1 skipped, 12 xfailed
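Two of the contracts named in this commit, the per-callback resilience wrapper and the defensive org_id overwrite, can be sketched as follows. This is an assumption-laden illustration rather than the BrowserUseAdapter code: the class, the decorator, the in-memory event list, and the snapshot shape are all hypothetical; only the hook names, event type strings, and ``resilience_snapshot`` come from the commit message.

```python
import functools
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("layerlens.sketch")


def resilient(hook: Callable[..., Any]) -> Callable[..., Any]:
    """Swallow and count hook failures so the instrumented agent keeps running."""

    @functools.wraps(hook)
    def wrapper(self: "SketchBrowserAdapter", *args: Any, **kwargs: Any) -> Any:
        try:
            return hook(self, *args, **kwargs)
        except Exception:
            name = hook.__name__
            self._failures[name] = self._failures.get(name, 0) + 1
            logger.debug("hook %s failed", name, exc_info=True)
            return None

    return wrapper


class SketchBrowserAdapter:
    def __init__(self, org_id: str) -> None:
        self._org_id = org_id  # bound once at construction
        self._failures: Dict[str, int] = {}
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def _emit(self, event_type: str, payload: Dict[str, Any]) -> None:
        stamped = dict(payload)
        # Overwrite any caller-supplied org_id to avoid cross-tenant leaks.
        stamped["org_id"] = self._org_id
        self.events.append((event_type, stamped))

    def resilience_snapshot(self) -> Dict[str, int]:
        # Assumed shape: failure count per hook name.
        return dict(self._failures)

    @resilient
    def on_navigation(self, url: str, **extra: Any) -> None:
        self._emit("browser.navigate", {"url": url, **extra})

    @resilient
    def on_action(self, action: Dict[str, Any]) -> None:
        self._emit("browser.action", {"action": action["name"]})
```

Stamping inside the single emit path, rather than in each hook, means a new hook cannot forget the tenant binding.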
Summary
-------
Cross-pollination audit §2.4
(A:/tmp/adapter-cross-pollination-audit.md): port the field-specific
truncation policy that LangChain / LangGraph / AutoGen / Semantic
Kernel / Agentforce each carry ad-hoc to the 11 lighter adapters (agno,
bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework,
openai_agents, pydantic_ai, smolagents, strands, browser_use).

Why now: without this, lighter adapters emit unbounded user payloads:
full prompts, tool I/O, state snapshots, browser screenshots. That blows
past Kafka's 1 MB record limit, triggers S3 multi-part uploads in the
ingestion pipeline, inflates TimescaleDB indexing cost, and embeds
high-cardinality user data in every span attribute. CRITICAL for
browser_use: a single navigation step can produce multi-megabyte base64
PNG screenshots.

What changed
------------
Shared helper: `src/layerlens/instrument/adapters/_base/truncation.py`
- `FieldTruncationPolicy` (frozen dataclass, zero-overhead) +
  `DEFAULT_POLICY`.
- `truncate_field(value, field_name, policy)`: recursive, UTF-8-safe,
  list-cap, recursion-cap.
- `truncate_payload(payload, policy)` returns
  `(truncated_payload, truncated_fields)` audit list.
- Defaults:
  - `prompt` / `completion` / `message`: 4096 chars
  - `tool_input` / `tool_output`: 2048 chars
  - `state_snapshot`: 8192 chars
  - `error_message`: 1024 chars
  - `traceback`: 8 frames (truncated by frame, not chars)
  - `screenshot` / `image_data`: DROP → deterministic
    `<dropped:...:sha256:...>` reference
  - `html` / `dom` (browser_use): 16384 chars

`BaseAdapter` wiring
- `_truncation_policy` attribute (defaults `None` = legacy passthrough).
- `emit_dict_event` calls `_apply_truncation(payload)` which rewrites
  the payload and attaches `_truncated_fields` audit list when any field
  exceeded its cap.

Per-adapter wiring (all 11 targets)
Each lighter adapter constructor now sets
`self._truncation_policy = DEFAULT_POLICY`:
- browser_use: placeholder `lifecycle.py` created (M7-precursor) so the
  truncation policy is in place ahead of M7 instrumentation.
- `manifest_consistency` xfails for browser_use are `strict=True` so
  they flip red the moment M7 lands the canonical event hooks.

Tests
-----
- `tests/instrument/adapters/_base/test_truncation.py`: 31 tests
  covering policy correctness, UTF-8/CJK/emoji multibyte safety,
  drop/hash references, traceback frame truncation, nested dicts/lists,
  recursion guard, primitives, immutability, determinism.
- `tests/instrument/test_base_layer.py`: 9 new tests for
  `BaseAdapter._apply_truncation` integration (gating, drop, custom
  policy, no mutation, nested dicts, broken-policy fallback, browser_use
  HTML pattern).
- `test_embedding_adapter.py` (5 tests) and
  `test_browser_use_adapter.py` (7 tests).

Test plan
---------
- `uv run pytest tests/instrument/adapters/_base/test_truncation.py -x`
  → 31 passed
- `uv run pytest tests/instrument/adapters/frameworks/ -x` (excl.
  pre-existing `test_bulk_ported_smoke.py` import) → 168 passed
- `uv run pytest tests/instrument/test_base_layer.py` → 51 passed (was
  42, now includes 9 new TestBaseAdapterTruncation cases)
- `uv run mypy --strict src/layerlens/instrument/adapters/_base/truncation.py`
  → Success
- `uv run mypy --strict src/layerlens/instrument/adapters/` (full
  package) → Success: 39 source files
- `uv run ruff check` (all modified + new files) → All checks passed

CLAUDE.md compliance
--------------------
- `_truncated_fields` audit list: no silent truncation.

Pre-existing failures (out of scope)
------------------------------------
Two test failures exist on the base branch unrelated to this PR:
- `test_lazy_imports.py::test_adapter_packages_importable_without_framework`:
  `protocols/` package lives on a separate branch not yet merged.
- `test_bulk_ported_smoke.py` collection error: imports adapters
  (crewai, autogen, langchain, langgraph, agentforce, langfuse) that
  live on different feature branches.

Both are confirmed pre-existing on `feat/instrument-manifest-tier-fix`
(the base of this PR).
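
For reviewers who want the `BaseAdapter` wiring from "What changed" in one place, here is a minimal sketch of the emit path with the audit list and the broken-policy fallback. It is a simplification under stated assumptions: the policy is modelled as a plain callable returning `(payload, truncated_fields)` instead of a `FieldTruncationPolicy` passed to `truncate_payload`, and `SketchAdapter`, its `sink` list, and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger("layerlens.sketch")

# Simplified stand-in for (policy + truncate_payload): payload in,
# (truncated payload, list of truncated field names) out.
Policy = Callable[[Dict[str, Any]], Tuple[Dict[str, Any], List[str]]]


class SketchAdapter:
    def __init__(self, truncation_policy: Optional[Policy] = None) -> None:
        self._truncation_policy = truncation_policy  # None = legacy passthrough
        self.sink: List[Tuple[str, Dict[str, Any]]] = []

    def _apply_truncation(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        if self._truncation_policy is None:
            return payload
        try:
            truncated, fields = self._truncation_policy(payload)
        except Exception:
            # Failure-safe: emit the raw payload and log at DEBUG.
            logger.debug("truncation policy failed", exc_info=True)
            return payload
        if fields:
            # Attach the audit list so truncation is never silent.
            truncated = {**truncated, "_truncated_fields": list(fields)}
        return truncated

    def emit_dict_event(self, event_type: str, payload: Dict[str, Any]) -> None:
        self.sink.append((event_type, self._apply_truncation(payload)))
```

Payloads that stay under every cap are emitted without a `_truncated_fields` key, so the key's presence alone signals that something was shortened.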