feat(instrument): Per-callback try/except resilience wrapper across 10 lighter adapters (cross-poll #5) #117
Closed
mmercuri wants to merge 1 commit into
…0 lighter adapters

Introduces a shared @resilient_callback decorator + ResilienceTracker under `src/layerlens/instrument/adapters/_base/`, then applies it to every callback method on the 10 lighter framework adapters (agno, llamaindex, google_adk, strands, pydantic_ai, smolagents, bedrock_agents, openai_agents, haystack, langfuse) so an exception in our observability code can never crash the customer's framework execution.

What the decorator does on failure:
1. Catches Exception (NOT BaseException: KeyboardInterrupt / SystemExit still propagate so users can Ctrl-C their agent).
2. Logs the exception via the wrapped function's module logger with adapter_name + callback_name + truncated traceback.
3. Increments the adapter's per-instance ResilienceTracker counter.
4. Returns the framework's expected default value: None for void handlers, or the value of `passthrough_arg` for mutating hooks (Pydantic-AI's `after_model_request` returns the response object; `before_tool_execute` returns the args tuple).

Health surfacing:
- FrameworkAdapter now owns a `_resilience: ResilienceTracker` attribute set in `__init__` so every framework adapter inherits the contract.
- `adapter_info().metadata` merges the live resilience snapshot (`resilience_status`, `resilience_failures_total`, `resilience_failure_threshold`, per-callback breakdown, last error).
- After DEFAULT_FAILURE_THRESHOLD (5) failures the adapter reports `resilience_status: "degraded"` so monitoring can alert.
- `disconnect()` resets the tracker so reconnects start clean.
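The four failure steps and the degraded-health rule above can be sketched as follows. This is a minimal illustration, not the code in `resilience.py`: `record_failure`, the `"healthy"` status string, the `>=` threshold comparison, and `DemoAdapter` are assumptions made for the example.

```python
import functools
import inspect
import logging
import threading
from typing import Any, Callable, Optional

DEFAULT_FAILURE_THRESHOLD = 5  # value stated in the commit message


class ResilienceTracker:
    """Thread-safe failure counter with a per-callback breakdown (sketch)."""

    def __init__(self, threshold: int = DEFAULT_FAILURE_THRESHOLD) -> None:
        self._lock = threading.Lock()
        self._threshold = threshold
        self._failures: dict[str, int] = {}

    def record_failure(self, callback_name: str) -> None:  # hypothetical method name
        with self._lock:
            self._failures[callback_name] = self._failures.get(callback_name, 0) + 1

    def health_status(self) -> str:
        with self._lock:
            total = sum(self._failures.values())
        # Assumption: "degraded" once the total reaches the threshold.
        return "degraded" if total >= self._threshold else "healthy"


def resilient_callback(
    *, default: Any = None, passthrough_arg: Optional[str] = None
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Contain Exception raised by an adapter callback and return a safe value."""

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        logger = logging.getLogger(func.__module__)
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            try:
                return func(self, *args, **kwargs)
            except Exception:  # BaseException (Ctrl-C, SystemExit) still propagates
                logger.exception(
                    "callback %s failed in %s", func.__name__, type(self).__name__
                )
                tracker = getattr(self, "_resilience", None)
                if tracker is not None:
                    tracker.record_failure(func.__name__)
                if passthrough_arg is None:
                    return default
                try:
                    bound = signature.bind_partial(self, *args, **kwargs)
                except TypeError:
                    return default
                # Hand the named argument back so the framework's data flow continues.
                return bound.arguments.get(passthrough_arg, default)

        return wrapper

    return decorate


class DemoAdapter:  # hypothetical adapter used only for illustration
    def __init__(self) -> None:
        self._resilience = ResilienceTracker()

    @resilient_callback()
    def on_run_start(self, payload: dict) -> None:
        raise RuntimeError("simulated telemetry bug")

    @resilient_callback(passthrough_arg="response")
    def after_model_request(self, response: Any) -> Any:
        raise ValueError("simulated telemetry bug")
```

In this sketch a void handler returns `None` on failure, while a mutating hook returns its own `response` argument unchanged, whether it was passed positionally or by keyword.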
Per-adapter callback audit + fixes:

| Adapter        | Callbacks wrapped | Notes |
|----------------|-------------------|-------|
| agno           | 2                 | _on_run_start, _on_run_end |
| llamaindex     | 16                | 3 span lifecycle + dispatcher + 12 events |
| google_adk     | 11                | All adapter _on_* + simplified plugin shims |
| strands        | 7                 | All hook handlers (replaces manual try/except) |
| pydantic_ai    | 9 (incl 3 split)  | Error hooks split: telemetry resilient, re-raise unconditional |
| smolagents     | 6                 | Run/step handlers (replaces manual try/except) |
| bedrock_agents | 2                 | _before_invoke + _after_invoke (with try/finally for _end_run) |
| openai_agents  | 3                 | on_trace_start/_end + on_span_end (replaces manual try/except) |
| haystack       | 1                 | _on_span_end (replaces manual try/except) |
| langfuse       | 5                 | _import_single_trace, _import_observation, _import_score, _export_single_trace, plus inner emit fallbacks |
| TOTAL          | 62                | |

Pydantic-AI error-callback split: `_on_run_error`, `_on_model_request_error`, `_on_tool_execute_error` MUST always re-raise the framework's original error (per Pydantic-AI's contract). The telemetry side is moved into a `_emit_*_error_telemetry` helper wrapped with @resilient_callback; the public hook calls it, then unconditionally executes `raise error`. Adapter-side telemetry bugs can therefore never swallow a real framework error.

Tests:
- `tests/instrument/adapters/_base/test_resilience.py`: 34 tests covering tracker mechanics, decorator behaviour, passthrough args, KeyboardInterrupt propagation, FrameworkAdapter integration, package re-exports, and decorator metadata preservation.
- `tests/instrument/adapters/_base/test_per_adapter_resilience.py`: per-adapter smoke tests (one per lighter adapter) that simulate a callback exception by sabotaging an inner helper, plus a parametrized health-degradation test across all 10 adapters.
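The Pydantic-AI error-callback split described above might look roughly like the sketch below. `SketchAdapter`, `_emit`, and the `telemetry_failures` counter are hypothetical stand-ins, and the helper uses an inline try/except where the PR uses `@resilient_callback`, purely to keep the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)


class SketchAdapter:  # hypothetical stand-in for the Pydantic-AI adapter
    def __init__(self) -> None:
        self.telemetry_failures = 0

    def _emit(self, event_type: str, payload: dict) -> None:
        # Simulates a bug in the observability code path.
        raise RuntimeError("simulated telemetry bug")

    def _emit_run_error_telemetry(self, error: BaseException) -> None:
        """Telemetry side: any failure here is contained and counted."""
        try:
            self._emit("agent.error", {"error": repr(error)})
        except Exception:
            self.telemetry_failures += 1
            logger.exception("telemetry for run error failed")

    def _on_run_error(self, error: BaseException) -> None:
        """Public hook: emit telemetry, then always re-raise the original error."""
        self._emit_run_error_telemetry(error)
        raise error
```

Because the `raise error` sits outside the contained helper, the framework sees its own exception even when the telemetry path itself fails.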
Refactor: `_base.py` (the AdapterInfo + BaseAdapter module) becomes a `_base/` package with `__init__.py` re-exporting from `_core.py` (moved via `git mv`) and the new `resilience.py`. All existing `from .._base import AdapterInfo, BaseAdapter` imports continue working unchanged.

Acceptance:
- pytest tests/instrument/adapters/_base/test_resilience.py -x: 34 passed
- pytest tests/instrument/adapters/frameworks/ -x: 146 passed (12 skipped for missing optional deps; 2 deselected pre-existing Windows clock-resolution flakes in test_haystack)
- mypy --strict src/layerlens/instrument/adapters/_base/resilience.py: Success
- mypy src: Success: 169 source files
- ruff check: All checks passed
- Full test suite: 1090 passed
m-peko pushed a commit that referenced this pull request on May 12, 2026
…laceholder from #116) (#126)

Replaces the M7 placeholder shipped in PR #116 (truncation policy) with the full BrowserUseAdapter: every lifecycle hook wired, every event emitted, and every cross-cutting CLAUDE.md contract enforced from day one.

What changed
------------

Full lifecycle adapter (src/layerlens/instrument/adapters/frameworks/browser_use/lifecycle.py):
* connect / disconnect / health_check / get_adapter_info / serialize_for_replay (all five abstract BaseAdapter methods).
* on_session_start, on_session_end, on_navigation, on_action, on_screenshot, on_dom_extraction, on_llm_call (every spec'd hook).
* Capability declaration: TRACE_TOOLS + TRACE_MODELS + TRACE_STATE + STREAMING + REPLAY (no longer the placeholder's TRACE_TOOLS-only set).
* Canonical events: browser.session.start, browser.navigate, browser.action, browser.screenshot, browser.dom.extract, tool.call, model.invoke, agent.input/output/state.change, cost.record, environment.config, plus agent.error / tool.error / model.error per the PR #115 error-aware emission contract.
* Per-callback resilience wrapper per PR #117: observability errors NEVER crash the customer's agent, surfaced via resilience_snapshot().
* Multi-tenant org_id propagation per PR #118: bound at construction (kwarg or resolved from stratix.org_id), stamped defensively on every emit, caller-supplied values overwritten to prevent cross-tenant leaks.
* Truncation policy from day one (DEFAULT_POLICY): screenshot bytes DROPPED to deterministic SHA-256 references, DOM/HTML capped at 16 KiB, prompts/completions/tool I/O at 4/2 KiB.
* Browser-event layer mapping (_BROWSER_EVENT_LAYERS) so unknown browser.* event types respect CaptureConfig gating without falling through the unknown-event-drops-by-default path.
* requires_pydantic = PydanticCompat.V2_ONLY (browser_use is a v2 lib).

Public surface (src/layerlens/instrument/adapters/frameworks/browser_use/__init__.py):
* ADAPTER_CLASS = BrowserUseAdapter (registry).
* instrument_agent(agent, stratix=, capture_config=, org_id=) one-liner returning the connected, wrapping adapter.
* STRATIXBrowserUseAdapter top-level binding (legacy alias): fires DeprecationWarning on construction. Exposed as a static binding so the manifest consistency lint's AST walk finds it.

Pyproject:
* Adds 'browser-use' optional extra: browser-use>=0.1.0,<2 with the python_version >= '3.11' marker (browser_use's own constraint).

Tests (tests/instrument/adapters/frameworks/test_browser_use_adapter.py):
* Replaces the 7-test scaffold from #116 with 40 tests covering: wiring + alias + lifecycle round-trip + truncation (screenshot drop, hash determinism, HTML cap, short-payload no-audit) + multi-tenancy (kwarg, client attribute, defensive overwrite) + resilience (poison stratix, exploding agent attribute access) + error-aware emission (agent.error / tool.error / model.error) + per-hook coverage + sync + async wrapping + replay round-trip + 10-case provider detection table.

Sample (samples/instrument/browser_use/{main.py,__init__.py,README.md}):
* Runs OFFLINE: no browser-use install, no Playwright, no API key, no network. Three-step duck-typed agent + happy/--fail paths exercise the full event surface and demonstrate screenshot drop + org_id stamping + agent.error emission before re-raise.

Doc (docs/adapters/frameworks-browser_use.md):
* Install + quickstart + capabilities matrix + 14-event reference table + truncation policy table + multi-tenancy + resilience + error-aware emission + capture config + browser_use specifics + BYOK + replay sections.

Manifest (scripts/emit_adapter_manifest.py):
* Promotes browser_use from _LIFECYCLE_PREVIEW to _MATURE: every required artifact (test file with >= 12 funcs, sample, doc, STRATIX→LayerLens deprecation alias) ships in this PR.

Verification
------------
* uv run pytest tests/instrument/adapters/frameworks/test_browser_use_adapter.py → 40 passed
* mypy --strict src/layerlens/instrument/adapters/frameworks/browser_use → Success: no issues found in 2 source files
* ruff check on src + test + script → All checks passed!
* Sample runs cleanly offline (happy + --fail)
* pip install -e .[browser-use] resolves cleanly (browser-use only pulled on Python 3.11+ per the env marker)
* tests/instrument/adapters/test_manifest_consistency.py::test_mature_adapters_have_required_artifacts[browser_use] passes
* Full instrument suite (excl. pre-existing crewai/protocols references not on this branch): 312 passed, 1 skipped, 12 xfailed
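The truncation policy mentioned in the commit above (screenshot bytes replaced by a deterministic SHA-256 reference, DOM/HTML capped at 16 KiB) can be illustrated with a small sketch. The function names and the returned field names are hypothetical; only the hashing scheme and the 16 KiB cap come from the commit message.

```python
import hashlib

HTML_CAP_BYTES = 16 * 1024  # 16 KiB cap stated in the commit message


def screenshot_reference(image_bytes: bytes) -> dict:
    """Drop the raw bytes and keep only a deterministic content reference."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    # Field names are assumptions for illustration.
    return {"screenshot_ref": f"sha256:{digest}", "size_bytes": len(image_bytes)}


def cap_html(html: str, cap: int = HTML_CAP_BYTES) -> tuple[str, bool]:
    """Return (possibly truncated HTML, whether truncation happened)."""
    encoded = html.encode("utf-8")
    if len(encoded) <= cap:
        return html, False
    # errors="ignore" discards a multi-byte character cut in half at the cap.
    return encoded[:cap].decode("utf-8", errors="ignore"), True
```

Hashing the same bytes always yields the same reference, so two events that captured an identical screenshot can still be correlated without storing the image.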
Cross-pollination item #5 — per-callback resilience wrapper
Implements the resilience pattern described in `A:/tmp/adapter-cross-pollination-audit.md` §2 item #5 (§2.11). Mature adapters (CrewAI, AutoGen, OpenAI Agents) wrap every observability callback in a try/except boundary so an exception in the adapter never escapes back into the framework's own execution path. This PR brings the SAME contract to all 10 lighter framework adapters: agno, llamaindex, google_adk, strands, pydantic_ai, smolagents, bedrock_agents, openai_agents, haystack, langfuse.

What's new

Shared resilience module

`src/layerlens/instrument/adapters/_base/resilience.py` (new):
- `@resilient_callback(...)` decorator: catches `Exception` (NOT `BaseException`), logs it via the adapter's module logger with adapter_name + callback_name + truncated traceback, increments the adapter's failure counter, and returns the framework's expected default value.
- `default=` parameter: explicit default return on failure.
- `passthrough_arg=` parameter: returns the value of the named argument on failure. Required for mutating hooks (Pydantic-AI `after_model_request` returns the response object; `before_tool_execute` returns the args tuple). Returning `None` would corrupt the framework's data flow.
- `ResilienceTracker` class: thread-safe per-adapter failure counter + per-callback breakdown. `health_status()` returns `DEGRADED` after `DEFAULT_FAILURE_THRESHOLD` (5) failures.
- `get_default_for(callback_name)` helper: canonical defaults table for known callbacks (Strands hooks, Google ADK plugins, OpenAI Agents TracingProcessor, boto3 event handlers; all return `None`).

FrameworkAdapter integration

`src/layerlens/instrument/adapters/frameworks/_base_framework.py`:
- Each `FrameworkAdapter` subclass now owns `self._resilience: ResilienceTracker`, set in `__init__`.
- `adapter_info().metadata` merges the live resilience snapshot (`resilience_status`, `resilience_failures_total`, `resilience_failure_threshold`, per-callback breakdown, last error).
- `disconnect()` resets the tracker so reconnects start clean.

`_base/` package refactor

`_base.py` (BaseAdapter contract) becomes a `_base/` package: `__init__.py` re-exports from `_core.py` (moved via `git mv`, full history preserved) and the new `resilience.py`. All existing `from .._base import AdapterInfo, BaseAdapter` imports continue working unchanged.

Per-adapter audit + fixes

(Table flattened in extraction; the surviving cells are listed below.)
- agno: `_on_run_start`, `_on_run_end` would have crashed user code
- google_adk: `_on_*` + simplified plugin shims (single source of truth)
- bedrock_agents: `try/finally` so `_end_run()` always runs
- haystack: `_on_span_end`

Pydantic-AI error-callback split

`_on_run_error`, `_on_model_request_error`, `_on_tool_execute_error` MUST always re-raise the framework's original error (per Pydantic-AI's contract). The telemetry side moves into a `_emit_*_error_telemetry` helper wrapped with `@resilient_callback`; the public hook calls it, then unconditionally executes `raise error`. Adapter-side telemetry bugs can never swallow a real framework error.

Bedrock-Agents finally semantics

`_after_invoke` keeps an outer `try/finally` so `_end_run()` always runs (otherwise collector + ContextVars leak across boto3 calls). The inner body is delegated to a `_after_invoke_body` helper wrapped with `@resilient_callback`.

Tests (50+ new tests)

`tests/instrument/adapters/_base/test_resilience.py` (34 tests):
- `TestGetDefaultFor`: known/unknown callback name lookups
- `TestResilienceTracker`: counter, threshold validation, health degradation, metadata snapshot, reset, thread-safety under 8 threads × 100 increments
- `TestResilientCallbackDecorator`: success path, default-on-exception, counter increment, log context, exception does NOT propagate, positional + keyword passthrough_arg, health degradation, KeyboardInterrupt propagates, works without tracker, logger uses module of decorated function
- `TestFrameworkAdapterIntegration`: tracker on every framework adapter, metadata surface, degraded status, disconnect resets, callback failure does not break framework
- `TestPackageExports`: public surface via `_base.__init__`
- `TestDecoratorMetadata`: `functools.wraps` preserves `__name__` / `__doc__`
- `TestPerAdapterCallbackException`: parametrized per-adapter callback exception tests + threshold-degradation test

`tests/instrument/adapters/_base/test_per_adapter_resilience.py` (30 tests):

One test per adapter that simulates a callback exception by sabotaging an inner helper, asserts:
- `adapter_info().metadata['resilience_status']` is `"degraded"`.

Plus a parametrized health-degradation test across all 10 adapters. Includes a ContextVar isolation fixture so tests that intentionally leave `_begin_run` state (Pydantic-AI, Bedrock-Agents) don't poison subsequent test files.

Acceptance

Test plan
- `pytest tests/instrument/adapters/_base/test_resilience.py -x` (resilience mechanics)
- `pytest tests/instrument/adapters/_base/test_per_adapter_resilience.py` (per-adapter)
- `pytest tests/instrument/adapters/frameworks/` (no regressions on adapter tests)
- `pytest tests/instrument/test_trace_context.py` (no ContextVar leak from _base tests)
- `mypy --strict` on resilience.py
- `mypy src` (full project)
- `ruff check`
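For reviewers who want a concrete picture of the Bedrock-Agents `finally` semantics described earlier, here is a minimal sketch. `BedrockSketchAdapter`, `_record_response`, and the `fail_with` switch are hypothetical, and the inner body uses an inline try/except in place of `@resilient_callback` so the example stands alone.

```python
from typing import Optional


class BedrockSketchAdapter:  # hypothetical stand-in, not the real adapter
    def __init__(self, fail_with: Optional[BaseException] = None) -> None:
        self.fail_with = fail_with  # exception to raise from the telemetry path
        self.run_active = False
        self.failures = 0

    def _before_invoke(self) -> None:
        self.run_active = True  # stands in for collector + ContextVar setup

    def _end_run(self) -> None:
        self.run_active = False  # stands in for collector + ContextVar cleanup

    def _record_response(self, response: dict) -> None:
        if self.fail_with is not None:
            raise self.fail_with

    def _after_invoke_body(self, response: dict) -> None:
        """Telemetry body: ordinary exceptions are contained and counted."""
        try:
            self._record_response(response)
        except Exception:
            self.failures += 1

    def _after_invoke(self, response: dict) -> None:
        """Outer hook: cleanup runs even if something escapes the body."""
        try:
            self._after_invoke_body(response)
        finally:
            self._end_run()
```

In this sketch an ordinary exception is absorbed by the body, while a `KeyboardInterrupt` still propagates to the caller; in both cases `_end_run()` has already cleared the run state.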