
feat(instrument): Field-specific truncation policy across 11 lighter adapters (cross-poll #3) #116

Closed
mmercuri wants to merge 5 commits into main from feat/instrument-truncation-policy


@mmercuri

Contributor

Summary

Cross-pollination audit §2.4 (A:/tmp/adapter-cross-pollination-audit.md) — port the field-specific truncation policy that LangChain / LangGraph / AutoGen / Semantic Kernel / Agentforce each carry ad-hoc to the 11 lighter adapters (agno, bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands, browser_use).

Why now: without this, lighter adapters emit unbounded user payloads — full prompts, tool I/O, state snapshots, browser screenshots. That blows past Kafka's 1 MB record limit, triggers S3 multi-part uploads in the ingestion pipeline, inflates TimescaleDB indexing cost, and embeds high-cardinality user data in every span attribute. CRITICAL for browser_use: a single navigation step can produce multi-megabyte base64 PNG screenshots.

What changed

Shared helper — src/layerlens/instrument/adapters/_base/truncation.py

  • FieldTruncationPolicy (frozen dataclass, zero-overhead) + DEFAULT_POLICY.
  • truncate_field(value, field_name, policy) — recursive, UTF-8-safe, list-cap, recursion-cap.
  • truncate_payload(payload, policy) returns (truncated_payload, truncated_fields) audit list.
  • Defaults from the spec:
    • prompt/completion/message: 4096 chars
    • tool_input/tool_output: 2048 chars
    • state_snapshot: 8192 chars
    • error_message: 1024 chars
    • traceback: 8 frames (truncated by frame, not chars)
    • screenshot/image_data: DROP → deterministic <dropped:...:sha256:...> reference
    • html/dom (browser_use): 16384 chars
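
A minimal sketch of how the defaults above could be expressed and applied to a single flat field. It is not the PR's `truncation.py`: `SketchPolicy`, `drop_reference`, and `truncate_value` are hypothetical names, the layout between the colons of the `<dropped:...:sha256:...>` reference is elided in the description (placing the field name there is a guess), and keeping the last frames of a traceback rather than the first is also an assumption.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class SketchPolicy:
    """Hypothetical stand-in for FieldTruncationPolicy; caps are in characters."""

    char_caps: Mapping[str, int] = field(default_factory=lambda: {
        "prompt": 4096, "completion": 4096, "message": 4096,
        "tool_input": 2048, "tool_output": 2048,
        "state_snapshot": 8192, "error_message": 1024,
        "html": 16384, "dom": 16384,
    })
    drop_fields: frozenset = frozenset({"screenshot", "image_data"})
    max_traceback_frames: int = 8


def drop_reference(field_name: str, value: Any) -> str:
    """Replace a dropped value with a deterministic hash reference (assumed layout)."""
    raw = value if isinstance(value, bytes) else str(value).encode("utf-8")
    return f"<dropped:{field_name}:sha256:{hashlib.sha256(raw).hexdigest()}>"


def truncate_value(value: Any, field_name: str, policy: SketchPolicy) -> Tuple[Any, bool]:
    """Return (new_value, was_truncated) for one flat field."""
    if field_name in policy.drop_fields:
        return drop_reference(field_name, value), True
    if field_name == "traceback" and isinstance(value, list):
        # Truncate by frame count, not by characters.
        if len(value) > policy.max_traceback_frames:
            return value[-policy.max_traceback_frames:], True
        return value, False
    cap = policy.char_caps.get(field_name)
    if cap is not None and isinstance(value, str) and len(value) > cap:
        # Slicing a Python str works on code points, so the result always
        # re-encodes to valid UTF-8 (a multi-byte character is never split).
        return value[:cap], True
    return value, False
```

Because the cap is counted in code points rather than bytes, a 4096-character CJK prompt can still occupy more than 4 KiB once encoded; a byte-based budget would need a different cut.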

BaseAdapter wiring

  • New _truncation_policy attribute (defaults None = legacy passthrough).
  • emit_dict_event calls _apply_truncation(payload) which rewrites the payload and attaches _truncated_fields audit list when any field exceeded its cap.
  • Failure-safe: policy errors fall back to raw payload + DEBUG log.
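
The wiring described above can be sketched as follows. This is an illustration under stated assumptions, not the actual `BaseAdapter`: `SketchAdapter`, its `truncate` callable, and `cap_prompt` are hypothetical, and the real method signatures may differ. It shows the three behaviours the bullets name: passthrough when no policy is set, an appended `_truncated_fields` list when something was cut, and a fallback to the raw payload if the policy raises.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

Payload = Dict[str, Any]
TruncateFn = Callable[[Payload], Tuple[Payload, List[str]]]


class SketchAdapter:
    """Hypothetical adapter showing only the truncation step of the emit path."""

    def __init__(self, truncate: Optional[TruncateFn] = None) -> None:
        self._truncate = truncate  # None means legacy passthrough
        self.emitted: List[Dict[str, Any]] = []

    def _apply_truncation(self, payload: Payload) -> Payload:
        if self._truncate is None:
            return payload
        try:
            truncated, fields = self._truncate(payload)
        except Exception:
            # Failure-safe: never lose the event because the policy broke.
            logger.debug("truncation failed; emitting raw payload", exc_info=True)
            return payload
        if fields:
            # Attach the audit list only when something was actually cut.
            truncated = {**truncated, "_truncated_fields": list(fields)}
        return truncated

    def emit_dict_event(self, event_type: str, payload: Payload) -> None:
        self.emitted.append(
            {"type": event_type, "payload": self._apply_truncation(payload)}
        )


def cap_prompt(payload: Payload) -> Tuple[Payload, List[str]]:
    """Toy policy for the demo: cap 'prompt' at 8 characters on a copy."""
    out, hit = dict(payload), []
    if isinstance(out.get("prompt"), str) and len(out["prompt"]) > 8:
        out["prompt"] = out["prompt"][:8]
        hit.append("prompt")
    return out, hit
```

Working on a copy keeps the caller's dict untouched, which matches the "no mutation" case listed in the test summary.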

Per-adapter wiring (all 11 targets)

Each lighter adapter constructor now sets self._truncation_policy = DEFAULT_POLICY:

  • agno, bedrock_agents, google_adk, llama_index, ms_agent_framework, openai_agents, pydantic_ai, smolagents, strands.
  • embedding (both EmbeddingAdapter + VectorStoreAdapter).
  • browser_use — placeholder lifecycle.py created (M7-precursor) so the truncation policy is in place ahead of M7 instrumentation. manifest_consistency xfails for browser_use are strict=True so they flip red the moment M7 lands the canonical event hooks.

Tests

  • tests/instrument/adapters/_base/test_truncation.py: 31 tests covering policy correctness, UTF-8/CJK/emoji multibyte safety, drop/hash references, traceback frame truncation, nested dicts/lists, recursion guard, primitives, immutability, determinism.
  • tests/instrument/test_base_layer.py: 9 new tests for BaseAdapter._apply_truncation integration (gating, drop, custom policy, no mutation, nested dicts, broken-policy fallback, browser_use HTML pattern).
  • 4 new truncation tests in each of 9 existing lighter-adapter test files (36 total).
  • New test_embedding_adapter.py (5 tests) and test_browser_use_adapter.py (7 tests).

Test plan

  • uv run pytest tests/instrument/adapters/_base/test_truncation.py -x → 31 passed
  • uv run pytest tests/instrument/adapters/frameworks/ -x (excl. pre-existing test_bulk_ported_smoke.py import) → 168 passed
  • uv run pytest tests/instrument/test_base_layer.py → 51 passed (was 42, now includes 9 new TestBaseAdapterTruncation cases)
  • uv run mypy --strict src/layerlens/instrument/adapters/_base/truncation.py → Success
  • uv run mypy --strict src/layerlens/instrument/adapters/ (full package) → Success: 39 source files
  • uv run ruff check (all modified + new files) → All checks passed

CLAUDE.md compliance

  • No TODOs.
  • All 11 adapters wired (10 existing + browser_use placeholder pre-wired).
  • UTF-8-safe truncation verified for emoji + CJK round-trip.
  • Truncation metadata exposed via _truncated_fields audit list — no silent truncation.
  • No co-author trailers.
  • Draft PR.

Pre-existing failures (out of scope)

Two test failures exist on the base branch unrelated to this PR:

  • test_lazy_imports.py::test_adapter_packages_importable_without_framework: the protocols/ package lives on a separate branch not yet merged.
  • test_bulk_ported_smoke.py collection error — imports adapters (crewai, autogen, langchain, langgraph, agentforce, langfuse) that live on different feature branches.

Both are confirmed pre-existing on feat/instrument-manifest-tier-fix (the base of this PR).

mmercuri and others added 4 commits April 25, 2026 19:13
Bootstraps the LayerLens instrument layer with the abstract base classes,
adapter registry, capture configuration, event sinks, vendored event
schemas, and pydantic v1/v2 compatibility shim that every concrete
adapter (frameworks, protocols, providers) will depend on.

Scope
-----
- src/layerlens/instrument/__init__.py: lean re-export surface
- src/layerlens/instrument/_vendored/: frozen ateam event schemas (no
  runtime ateam dependency)
- src/layerlens/instrument/adapters/_base/: BaseAdapter, AdapterRegistry,
  AdapterStatus, AdapterHealth, AdapterCapability, ReplayableTrace,
  CaptureConfig, EventSink, TraceStoreSink, IngestionPipelineSink,
  PydanticCompat
- src/layerlens/_compat/pydantic.py: model_dump/model_validate shim
  spanning pydantic v1 + v2
- scripts/{port_adapter,port_protocol,emit_adapter_manifest,
  regen_dep_baselines}.py: codegen helpers used to port the rest of M1
- tests/instrument/{test_base_layer,test_lazy_imports,
  test_default_install,test_resolved_dep_tree}.py + _baselines/
- .github/workflows/dep-tree-guard.yaml: CI gate that locks the default
  install footprint
- docs/adapters/: CONTRIBUTING, STATUS, pydantic-compatibility, testing,
  PERSONA_REVIEW
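
A minimal sketch of what a model_dump/model_validate shim spanning pydantic v1 and v2 might look like. It relies only on the public method names of the two major versions (`dict`/`parse_obj` in v1, `model_dump`/`model_validate` in v2); the function names mirror the description, but the real `_compat/pydantic.py` is not shown here and may differ. `V1Like` and `V2Like` are duck-typed stand-ins so the example runs without pydantic installed.

```python
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def model_dump(obj: Any) -> Dict[str, Any]:
    """Serialize a model, preferring the v2 method name when present."""
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    return obj.dict()  # pydantic v1 spelling


def model_validate(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build a model from a dict, preferring the v2 method name when present."""
    if hasattr(cls, "model_validate"):
        return cls.model_validate(data)  # type: ignore[attr-defined]
    return cls.parse_obj(data)  # type: ignore[attr-defined]  # pydantic v1 spelling


class V1Like:
    """Stand-in exposing only the v1-style method names."""

    def __init__(self, **data: Any) -> None:
        self._data = {**data}

    def dict(self) -> Dict[str, Any]:
        return {**self._data}

    @classmethod
    def parse_obj(cls, data: Dict[str, Any]) -> "V1Like":
        return cls(**data)


class V2Like:
    """Stand-in exposing only the v2-style method names."""

    def __init__(self, **data: Any) -> None:
        self._data = {**data}

    def model_dump(self) -> Dict[str, Any]:
        return {**self._data}

    @classmethod
    def model_validate(cls, data: Dict[str, Any]) -> "V2Like":
        return cls(**data)
```

Checking for the v2 name first matters because v2 models still expose the deprecated v1 names, so the reverse order would trigger deprecation warnings on v2.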

Blast radius
------------
- Pure additions. No public surface changes outside the new
  layerlens.instrument namespace.
- Default `pip install layerlens` install set is unchanged (verified by
  test_default_install.py against the new baseline).
- Lazy adapter discovery: importing layerlens.instrument MUST NOT pull
  in any optional adapter dep (verified by test_lazy_imports.py).
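
The lazy-discovery guarantee above can be illustrated with a small registry that stores only dotted paths and imports on first lookup. This is a sketch, not the actual `AdapterRegistry`: `LazyRegistry` and its methods are hypothetical, and the stdlib `json` module stands in for an optional framework package.

```python
import importlib
from typing import Any, Dict, List, Tuple


class LazyRegistry:
    """Hypothetical registry: registration is free, import happens on get()."""

    def __init__(self) -> None:
        self._targets: Dict[str, Tuple[str, str]] = {}
        self._loaded: Dict[str, Any] = {}

    def register(self, name: str, module_path: str, attr: str) -> None:
        # Only strings are stored, so no optional dependency is imported here.
        self._targets[name] = (module_path, attr)

    def loaded(self) -> List[str]:
        """Names whose modules have actually been imported so far."""
        return sorted(self._loaded)

    def get(self, name: str) -> Any:
        if name not in self._loaded:
            module_path, attr = self._targets[name]
            module = importlib.import_module(module_path)  # may raise ImportError
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]


registry = LazyRegistry()
# 'json' is a stdlib stand-in for a heavy optional framework module.
registry.register("demo", "json", "JSONDecoder")
```

A missing optional dependency then surfaces as an `ImportError` at lookup time for that one adapter, rather than at `import layerlens.instrument`.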

Test plan
---------
- uv run pytest tests/instrument/test_base_layer.py
  tests/instrument/test_lazy_imports.py -x  -> 45 passed
- The dep-tree-guard workflow exercises test_default_install.py and
  test_resolved_dep_tree.py against the new baselines on every PR.

LAY-3400 umbrella: this PR is the prerequisite for the M1.B/M1.C/M1.D
adapter ports, M7 protocol certification, and M8 Cohere/Mistral.

Ports the twelve agent-tier framework adapters from the ateam
reference implementation onto the new layerlens.instrument base layer:

  Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno,
  Strands, SmolAgents, MS Agent Framework, Google ADK,
  Bedrock Agents, Embedding (vector store hooks), Benchmark Import

Pairs with feat/instrument-frameworks-orchestration (M1.C part 1)
which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and
Agentforce. Together they complete M1.C.

Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,
  llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,
  ms_agent_framework,google_adk,bedrock_agents,embedding,
  benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py + the
  test_bulk_ported_smoke.py harness (which exercises every ported
  adapter against canned trace fixtures so partial framework SDKs
  on a given runner don't drop coverage to zero)
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration
  guide
- pyproject.toml: twelve new optional extras
  (semantic-kernel, llama-index, openai-agents, pydantic-ai, agno,
  strands, smolagents, ms-agent-framework, google-adk,
  bedrock-agents, embedding, benchmark-import) with python_version
  markers; pyright/ruff exclusions for the dynamic monkey-patching
  framework code

Blast radius
------------
- Default `pip install layerlens` install set is unchanged. Each
  framework's heavy deps are gated behind their own extra.
- No changes to existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework
  module (lazy registry lookup).

Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x  ->
  184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12
  agent-tier adapters plus the orchestration-tier ones from part 1
  via the same harness)

Stacks on
---------
- feat/instrument-base-foundation (M1.A) — required for the
  BaseAdapter surface this PR consumes.

Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1) — both
  branches stack on the base foundation independently and don't
  conflict; they can land in either order.

LAY-3400 umbrella (M1.C part 2).

…ier + lint guards for capability/event consistency

Fixes the audit finding that scripts/emit_adapter_manifest.py only ever
wrote ``mature`` or ``smoke_only`` despite ``lifecycle_preview`` being
documented in the manifest schema for months. As a result, every
adapter that shipped full lifecycle hooks but lacked one graduation
artifact (doc, sample, alias) was rendered as ``smoke_only`` in the
catalog UI -- hiding real coverage from customers.

What this change does
=====================

scripts/emit_adapter_manifest.py
--------------------------------
* Adds ``_LIFECYCLE_PREVIEW`` set (12 adapters) and a ``_maturity_for``
  helper that tier-resolves ``mature`` -> ``lifecycle_preview`` ->
  ``smoke_only`` in that order.
* Asserts ``_MATURE`` and ``_LIFECYCLE_PREVIEW`` are disjoint at
  module import time (catches future copy/paste mistakes).
* Adds ``by_maturity`` summary block to the manifest output.
* Moves ``smolagents`` out of ``_MATURE`` (it lacked doc + sample) and
  into ``_LIFECYCLE_PREVIEW`` until the sibling artifact PR lands.
* Updates the docstring with audit context so the reason this exists
  is in the source.

tests/instrument/adapters/test_manifest_consistency.py (new)
------------------------------------------------------------
Three lint-style guards run statically (no optional runtimes needed):

1. Capability/hook consistency. If lifecycle.py defines ``on_handoff``,
   ``TRACE_HANDOFFS`` MUST be in ``get_adapter_info().capabilities``;
   same for ``on_tool_use``/``TRACE_TOOLS`` and ``on_llm_call``/
   ``TRACE_MODELS``. Catches the pydantic_ai bug (xfail until sibling
   capabilities-cleanup PR lands).
2. Canonical event-type parity. Every framework lifecycle.py MUST emit
   the five canonical event types (``agent.input``, ``agent.output``,
   ``tool.call``, ``model.invoke``, ``environment.config``) at least
   once. Catches missing-emit regressions.
3. Maturity-vs-artifacts. Every adapter in ``_MATURE`` MUST have a
   per-adapter test file (>= 12 funcs), a sample, a doc, and a
   STRATIX->LayerLens deprecation alias. Provider entries are xfailed
   today since their per-provider artifact PRs haven't landed on this
   branch.
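
Guard 1 can be approximated with a static AST walk, which is why no optional runtime is needed. The sketch below is an assumption about the approach, not the test file's code: ``missing_capabilities`` is a hypothetical helper, and it takes the declared capabilities as a plain set of names instead of calling ``get_adapter_info()``. The hook-to-capability pairs are the three named above.

```python
import ast
from typing import List, Set

_HOOK_TO_CAPABILITY = {
    "on_handoff": "TRACE_HANDOFFS",
    "on_tool_use": "TRACE_TOOLS",
    "on_llm_call": "TRACE_MODELS",
}


def missing_capabilities(source: str, declared: Set[str]) -> List[str]:
    """Return capabilities implied by defined hooks but absent from `declared`."""
    tree = ast.parse(source)
    # Collect every function or method name defined anywhere in the module.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return sorted(
        capability
        for hook, capability in _HOOK_TO_CAPABILITY.items()
        if hook in defined and capability not in declared
    )


LIFECYCLE_SOURCE = '''
class DemoAdapter:
    def on_tool_use(self, call): ...
    async def on_llm_call(self, request): ...
'''
```

With ``LIFECYCLE_SOURCE`` and a declared set containing only ``TRACE_TOOLS``, the helper reports ``TRACE_MODELS`` as missing, which is the kind of mismatch the guard is meant to catch.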

xfail markers are ``strict=True`` so when sibling PRs land and fix
the underlying gap, the marker MUST be removed in the same PR or the
suite fails.

scripts/__init__.py (new)
-------------------------
Empty package marker so ``mypy .`` does not double-detect
``scripts/emit_adapter_manifest.py`` as both a top-level module and a
package member when imported by the test suite.

scripts/README.md (new)
-----------------------
Documents the three maturity tiers, the promotion workflow, and the
audit context.

tests/instrument/adapters/MANIFEST_CONSISTENCY.md (new)
-------------------------------------------------------
Documents the three lint guards, the xfail policy, and how to fix
violations.

Verification
============

* ``python scripts/emit_adapter_manifest.py --stdout`` now reports 12
  ``lifecycle_preview`` adapters (was 0).
* ``mypy --strict scripts/emit_adapter_manifest.py`` passes.
* ``pytest tests/instrument/adapters/test_manifest_consistency.py -v``
  reports 23 passed, 10 xfailed (one cap/hook + nine pending-provider
  artifacts).
* ``ruff check`` clean on all new files.

…adapters

Cross-pollination audit §2.4: lighter framework adapters (agno,
bedrock_agents, embedding, google_adk, llama_index, ms_agent_framework,
openai_agents, pydantic_ai, smolagents, strands, browser_use) emit
event payloads without consistent size limits. Untruncated, oversized
prompts/tool I/O/state values blow past Kafka's 1 MB record limit,
trigger S3 multi-part uploads in the ingestion pipeline, inflate
TimescaleDB indexing cost, and embed unbounded user payloads in
every span attribute. Browser_use is CRITICAL: a single navigation
step can capture multi-megabyte base64 PNG screenshots and
hundred-KB DOM HTML payloads.

Mature adapters (LangChain, LangGraph, AutoGen, Semantic Kernel,
Agentforce) each carry their own ad-hoc truncation. This change
standardises the policy across all 11 lighter adapters.

Shared helper

* New ``layerlens.instrument.adapters._base.truncation`` module with
  ``FieldTruncationPolicy`` (frozen dataclass, zero-overhead),
  ``DEFAULT_POLICY``, ``truncate_field``, ``truncate_payload``.
* Defaults: prompt/completion/message 4 KiB, tool_input/tool_output
  2 KiB, state_snapshot 8 KiB, error_message 1 KiB, traceback 8 frames,
  screenshot/image_data DROP→hash reference, html/dom 16 KiB.
* UTF-8-safe (CJK + emoji round-trip), recursive over nested dicts/
  lists, list cap 100 items, recursion depth limit 8.
* Auditable: returns ``(truncated_payload, truncated_fields)`` so the
  emit path can attach a ``_truncated_fields`` audit list — silent
  truncation is forbidden by CLAUDE.md.
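
A sketch of the recursive walk with the list cap (100 items), depth limit (8), and audit list described above. It is illustrative only: ``truncate_payload_sketch`` and ``_walk`` are hypothetical names, the cap table is abbreviated to two fields, the dotted/indexed path format in the audit list is a guess, and the ``<max-depth>`` placeholder is an assumption about what replaces values nested too deeply.

```python
from typing import Any, Dict, List, Tuple

MAX_LIST_ITEMS = 100
MAX_DEPTH = 8
CHAR_CAPS = {"prompt": 4096, "tool_output": 2048}  # abbreviated for the sketch


def _walk(value: Any, key: str, path: str, depth: int, hits: List[str]) -> Any:
    if depth > MAX_DEPTH:
        hits.append(path)
        return "<max-depth>"  # assumed placeholder for over-deep values
    if isinstance(value, dict):
        return {
            k: _walk(v, str(k), f"{path}.{k}" if path else str(k), depth + 1, hits)
            for k, v in value.items()
        }
    if isinstance(value, list):
        if len(value) > MAX_LIST_ITEMS:
            hits.append(path)
            value = value[:MAX_LIST_ITEMS]
        # List items inherit the parent key, so a list of prompts is capped too.
        return [_walk(v, key, f"{path}[{i}]", depth + 1, hits)
                for i, v in enumerate(value)]
    cap = CHAR_CAPS.get(key)
    if isinstance(value, str) and cap is not None and len(value) > cap:
        hits.append(path)
        return value[:cap]
    return value  # primitives and short strings pass through unchanged


def truncate_payload_sketch(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (truncated_copy, audit_list) without mutating the input."""
    hits: List[str] = []
    return _walk(payload, "", "", 0, hits), hits
```

New dicts and lists are built at every level, so the caller's payload is never modified, and the same input always yields the same output and audit list.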

BaseAdapter wiring

* ``BaseAdapter._truncation_policy`` (default None = legacy passthrough).
* ``emit_dict_event`` calls ``_apply_truncation(payload)`` which
  rewrites the payload via ``truncate_payload`` and appends the audit
  list as ``_truncated_fields`` when any field exceeded its cap.
* Failure-safe: if the policy raises (defensive), the raw payload is
  emitted and the failure is logged at DEBUG.

Per-adapter wiring (all 11 targets)

Each lighter adapter constructor now sets
``self._truncation_policy = DEFAULT_POLICY``:

* agno, bedrock_agents, google_adk, llama_index, ms_agent_framework,
  openai_agents, pydantic_ai, smolagents, strands — lifecycle.py
  ``__init__`` updated.
* embedding — both EmbeddingAdapter and VectorStoreAdapter wired.
* browser_use — placeholder lifecycle.py created (M7-precursor) so
  the truncation policy is in place ahead of the M7 instrumentation
  PR. Pre-wires ``screenshot``→hash and ``html``→16 KiB cap.

Tests

* ``tests/instrument/adapters/_base/test_truncation.py`` — 31 tests
  covering policy correctness, UTF-8/CJK/emoji multibyte safety,
  drop/hash references, traceback frame truncation, nested dicts/
  lists, recursion guard, primitive pass-through, immutability of
  policy and input payload, deterministic re-runs.
* ``tests/instrument/test_base_layer.py`` — 9 new tests for
  ``BaseAdapter._apply_truncation`` integration.
* Per-adapter test extensions on agno, bedrock_agents, google_adk,
  llama_index, ms_agent_framework, openai_agents, pydantic_ai,
  smolagents, strands — 4 truncation tests each (36 total).
* New ``test_embedding_adapter.py`` (5 tests) and
  ``test_browser_use_adapter.py`` (7 tests).
* ``test_manifest_consistency.py`` xfails the canonical-events and
  capability-hook checks for browser_use until M7 lands (strict=True).

Acceptance

* uv run pytest tests/instrument/adapters/_base/test_truncation.py
  -x → 31 passed
* uv run pytest tests/instrument/adapters/frameworks/ -x → 168 passed
* uv run mypy --strict
  src/layerlens/instrument/adapters/_base/truncation.py → success
* uv run ruff check (modified files) → all checks passed

…laceholder from #116) (#126)

Replaces the M7 placeholder shipped in PR #116 (truncation policy) with
the full BrowserUseAdapter — every lifecycle hook wired, every event
emitted, and every cross-cutting CLAUDE.md contract enforced from day one.

What changed
------------

Full lifecycle adapter (src/layerlens/instrument/adapters/frameworks/
browser_use/lifecycle.py):

* connect / disconnect / health_check / get_adapter_info /
  serialize_for_replay (all five abstract BaseAdapter methods).
* on_session_start, on_session_end, on_navigation, on_action,
  on_screenshot, on_dom_extraction, on_llm_call (every spec'd hook).
* Capability declaration: TRACE_TOOLS + TRACE_MODELS + TRACE_STATE +
  STREAMING + REPLAY (no longer the placeholder's TRACE_TOOLS-only set).
* Canonical events: browser.session.start, browser.navigate,
  browser.action, browser.screenshot, browser.dom.extract, tool.call,
  model.invoke, agent.input/output/state.change, cost.record,
  environment.config — plus agent.error / tool.error / model.error
  per the PR #115 error-aware emission contract.
* Per-callback resilience wrapper per PR #117 — observability errors
  NEVER crash the customer's agent, surfaced via resilience_snapshot().
* Multi-tenant org_id propagation per PR #118 — bound at construction
  (kwarg or resolved from stratix.org_id), stamped defensively on
  every emit, caller-supplied values overwritten to prevent
  cross-tenant leaks.
* Truncation policy from day one (DEFAULT_POLICY) — screenshot bytes
  DROPPED to deterministic SHA-256 references, DOM/HTML capped at
  16 KiB, prompts/completions at 4 KiB, and tool I/O at 2 KiB.
* Browser-event layer mapping (_BROWSER_EVENT_LAYERS) so unknown
  browser.* event types respect CaptureConfig gating without falling
  through the unknown-event-drops-by-default path.
* requires_pydantic = PydanticCompat.V2_ONLY (browser_use is a v2 lib).
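
The per-callback resilience wrapper in the list above can be sketched as a decorator that swallows and counts observability errors. This is an assumption about the pattern, not the PR #117 code: ``ResilienceSketch`` and ``guard`` are hypothetical, and the real ``resilience_snapshot()`` may return a different shape than the per-hook failure counts used here.

```python
import functools
import logging
from collections import Counter
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


class ResilienceSketch:
    """Hypothetical holder for guarded callbacks and their failure counts."""

    def __init__(self) -> None:
        self._failures: Counter = Counter()

    def guard(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap a hook so its exceptions are recorded instead of propagated."""

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Observability must not crash the instrumented agent.
                self._failures[fn.__name__] += 1
                logger.debug("hook %s failed", fn.__name__, exc_info=True)
                return None

        return wrapper

    def resilience_snapshot(self) -> Dict[str, int]:
        """Failure count per hook name (assumed snapshot shape)."""
        return dict(self._failures)
```

Only ``Exception`` subclasses are caught, so ``KeyboardInterrupt`` and ``SystemExit`` still propagate to the host process.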

Public surface (src/layerlens/instrument/adapters/frameworks/
browser_use/__init__.py):

* ADAPTER_CLASS = BrowserUseAdapter (registry).
* instrument_agent(agent, stratix=, capture_config=, org_id=)
  one-liner returning the connected, wrapping adapter.
* STRATIXBrowserUseAdapter top-level binding (legacy alias) — fires
  DeprecationWarning on construction. Exposed as a static binding so
  the manifest consistency lint's AST walk finds it.
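
One way a legacy alias can fire DeprecationWarning on construction while remaining a static, AST-visible binding is a thin subclass. The sketch below uses hypothetical class names with a ``Sketch`` suffix and an invented warning message; whether the real alias is a subclass or some other construct is not stated in this message.

```python
import warnings
from typing import Any, Optional


class BrowserUseAdapterSketch:
    """Stand-in for the real adapter class."""

    def __init__(self, org_id: Optional[str] = None) -> None:
        self.org_id = org_id


class STRATIXBrowserUseAdapterSketch(BrowserUseAdapterSketch):
    """Legacy alias: behaves like the adapter but warns when constructed."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        warnings.warn(
            "STRATIXBrowserUseAdapterSketch is deprecated; "
            "use BrowserUseAdapterSketch instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this __init__
        )
        super().__init__(*args, **kwargs)
```

Defining the alias as a module-level class statement, rather than generating it dynamically, is what lets a static AST walk discover it.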

Pyproject:

* Adds 'browser-use' optional extra: browser-use>=0.1.0,<2 with the
  python_version >= '3.11' marker (browser_use's own constraint).

Tests (tests/instrument/adapters/frameworks/test_browser_use_adapter.py):

* Replaces the 7-test scaffold from #116 with 40 tests covering:
  wiring + alias + lifecycle round-trip + truncation (screenshot
  drop, hash determinism, HTML cap, short-payload no-audit) +
  multi-tenancy (kwarg, client attribute, defensive overwrite) +
  resilience (poison stratix, exploding agent attribute access) +
  error-aware emission (agent.error / tool.error / model.error) +
  per-hook coverage + sync + async wrapping + replay round-trip +
  10-case provider detection table.

Sample (samples/instrument/browser_use/{main.py,__init__.py,README.md}):

* Runs OFFLINE — no browser-use install, no Playwright, no API key,
  no network. Three-step duck-typed agent + happy/--fail paths
  exercise the full event surface and demonstrate screenshot drop +
  org_id stamping + agent.error emission before re-raise.
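
Two of the behaviours the sample demonstrates, org_id stamping and agent.error emission before re-raise, can be sketched offline as follows. ``TenantEmitter`` and ``run_step`` are hypothetical, and the ``error_type`` / ``error_message`` payload keys are assumptions; this is not the sample's ``main.py``.

```python
from typing import Any, Callable, Dict, List, Tuple


class TenantEmitter:
    """Hypothetical emitter bound to one org_id at construction."""

    def __init__(self, org_id: str) -> None:
        self._org_id = org_id
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def emit(self, event_type: str, payload: Dict[str, Any]) -> None:
        stamped = dict(payload)
        # Defensive overwrite: a caller-supplied org_id never wins.
        stamped["org_id"] = self._org_id
        self.events.append((event_type, stamped))


def run_step(emitter: TenantEmitter, step: Callable[[], Any]) -> Any:
    """Run one agent step; on failure emit agent.error, then re-raise."""
    try:
        return step()
    except Exception as exc:
        emitter.emit(
            "agent.error",
            {"error_type": type(exc).__name__, "error_message": str(exc)},
        )
        raise  # the customer's own error handling still sees the exception
```

Re-raising after the emit keeps the instrumentation transparent: the agent fails exactly as it would without the adapter, but the failure is recorded under the correct tenant.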

Doc (docs/adapters/frameworks-browser_use.md):

* Install + quickstart + capabilities matrix + 14-event reference
  table + truncation policy table + multi-tenancy + resilience +
  error-aware emission + capture config + browser_use specifics +
  BYOK + replay sections.

Manifest (scripts/emit_adapter_manifest.py):

* Promotes browser_use from _LIFECYCLE_PREVIEW to _MATURE — every
  required artifact (test file with >= 12 funcs, sample, doc,
  STRATIX→LayerLens deprecation alias) ships in this PR.

Verification
------------

* uv run pytest tests/instrument/adapters/frameworks/test_browser_use_adapter.py
  → 40 passed
* mypy --strict src/layerlens/instrument/adapters/frameworks/browser_use
  → Success: no issues found in 2 source files
* ruff check on src + test + script
  → All checks passed!
* Sample runs cleanly offline (happy + --fail)
* pip install -e .[browser-use] resolves cleanly (browser-use only
  pulled on Python 3.11+ per the env marker)
* tests/instrument/adapters/test_manifest_consistency.py::
  test_mature_adapters_have_required_artifacts[browser_use] passes
* Full instrument suite (excl. pre-existing crewai/protocols
  references not on this branch): 312 passed, 1 skipped, 12 xfailed
@m-peko m-peko closed this May 21, 2026
