feat(instrument): Per-callback try/except resilience wrapper across 10 lighter adapters (cross-poll #5) #117

Closed
mmercuri wants to merge 1 commit into development from feat/instrument-callback-resilience

@mmercuri

Contributor

Cross-pollination item #5 — per-callback resilience wrapper

Implements the resilience pattern described in A:/tmp/adapter-cross-pollination-audit.md §2 item #5 (§2.11). Mature adapters (CrewAI, AutoGen, OpenAI Agents) wrap every observability callback in a try/except boundary so an exception in the adapter never escapes back into the framework's own execution path. This PR brings the SAME contract to all 10 lighter framework adapters — agno, llamaindex, google_adk, strands, pydantic_ai, smolagents, bedrock_agents, openai_agents, haystack, langfuse.

What's new

Shared resilience module

src/layerlens/instrument/adapters/_base/resilience.py — new

  • @resilient_callback(...) decorator: catches Exception (NOT BaseException), logs it via the adapter's module logger with adapter_name + callback_name + truncated traceback, increments the adapter's failure counter, and returns the framework's expected default value.
    • default= parameter: explicit default return on failure.
    • passthrough_arg= parameter: returns the value of the named argument on failure — required for mutating hooks (Pydantic-AI after_model_request returns the response object; before_tool_execute returns the args tuple). Returning None would corrupt the framework's data flow.
  • ResilienceTracker class: thread-safe per-adapter failure counter + per-callback breakdown. health_status() returns DEGRADED once DEFAULT_FAILURE_THRESHOLD (5) failures have been recorded.
  • get_default_for(callback_name) helper: canonical defaults table for known callbacks (Strands hooks, Google ADK plugins, OpenAI Agents TracingProcessor, boto3 event handlers — all return None).
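A minimal sketch of how a decorator with this contract could look. It is not the code in resilience.py: the `adapter_name` and `on_failure` parameters are hypothetical stand-ins for the log context and the tracker hook, and only `default=` and `passthrough_arg=` are named in this PR.

```python
import functools
import inspect
import logging
from typing import Any, Callable, List, Optional


def resilient_callback(
    *,
    adapter_name: str = "adapter",          # hypothetical: used only as log context
    default: Any = None,                    # explicit return value on failure
    passthrough_arg: Optional[str] = None,  # return this argument's value on failure
    on_failure: Optional[Callable[[str, Exception], None]] = None,  # hypothetical hook
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        # Log through the logger of the module that defines the callback.
        logger = logging.getLogger(func.__module__)
        signature = inspect.signature(func)

        @functools.wraps(func)  # keeps __name__ and __doc__ of the callback
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Exception only: KeyboardInterrupt and SystemExit derive from
                # BaseException, so they still propagate to the caller.
                logger.warning("%s.%s failed: %r", adapter_name, func.__name__, exc)
                if on_failure is not None:
                    on_failure(func.__name__, exc)
                if passthrough_arg is None:
                    return default
                try:
                    bound = signature.bind_partial(*args, **kwargs)
                except TypeError:
                    return default
                # Works for positional and keyword call styles alike.
                return bound.arguments.get(passthrough_arg, default)

        return wrapper

    return decorate


failures: List[str] = []


@resilient_callback(
    adapter_name="pydantic_ai",
    passthrough_arg="response",
    on_failure=lambda name, exc: failures.append(name),
)
def after_model_request(ctx: dict, response: str) -> str:
    raise RuntimeError("simulated telemetry bug")


# The framework still receives its response object even though the hook failed.
result = after_model_request({}, response="model-response")
```

Resolving the passthrough value with `inspect.signature(...).bind_partial` is one way to support both positional and keyword calls; the real module may resolve it differently.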

FrameworkAdapter integration

src/layerlens/instrument/adapters/frameworks/_base_framework.py

  • Every FrameworkAdapter subclass now owns self._resilience: ResilienceTracker set in __init__.
  • adapter_info().metadata merges the live resilience snapshot (resilience_status, resilience_failures_total, resilience_failure_threshold, per-callback breakdown, last error).
  • disconnect() resets the tracker so reconnects start clean.
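The tracker behaviour described above can be sketched as follows. This is an illustration under assumptions, not the ResilienceTracker in this PR: the class name, the `record_failure`/`snapshot`/`reset` method names, the "healthy" status string, the two last metadata keys, and the choice of `>=` for the threshold comparison are all hypothetical. The first three metadata keys and the threshold value of 5 come from the description.

```python
import threading
from collections import Counter
from typing import Any, Dict, Optional

DEFAULT_FAILURE_THRESHOLD = 5  # value stated in this PR


class TrackerSketch:
    """Thread-safe failure counter with a per-callback breakdown."""

    def __init__(self, threshold: int = DEFAULT_FAILURE_THRESHOLD) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self._threshold = threshold
        self._lock = threading.Lock()
        self._per_callback: Counter = Counter()
        self._last_error: Optional[str] = None

    def record_failure(self, callback_name: str, exc: Exception) -> None:
        with self._lock:  # guards the read-modify-write on the counter
            self._per_callback[callback_name] += 1
            self._last_error = repr(exc)

    def snapshot(self) -> Dict[str, Any]:
        with self._lock:
            total = sum(self._per_callback.values())
            degraded = total >= self._threshold  # assumption: inclusive comparison
            return {
                "resilience_status": "degraded" if degraded else "healthy",
                "resilience_failures_total": total,
                "resilience_failure_threshold": self._threshold,
                "resilience_failures_by_callback": dict(self._per_callback),  # hypothetical key
                "resilience_last_error": self._last_error,  # hypothetical key
            }

    def reset(self) -> None:
        # What disconnect() would call so a reconnect starts clean.
        with self._lock:
            self._per_callback.clear()
            self._last_error = None


tracker = TrackerSketch()


def hammer() -> None:
    for _ in range(100):
        tracker.record_failure("_on_run_end", RuntimeError("boom"))


# Same shape as the thread-safety test: 8 threads x 100 increments.
workers = [threading.Thread(target=hammer) for _ in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

snapshot = tracker.snapshot()
```

An adapter would merge a dictionary like `snapshot` into `adapter_info().metadata`, which is what lets monitoring alert on the degraded status.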

_base/ package refactor

_base.py (BaseAdapter contract) becomes _base/ package: __init__.py re-exports from _core.py (moved via git mv, full history preserved) and the new resilience.py. All existing from .._base import AdapterInfo, BaseAdapter imports continue working unchanged.

Per-adapter audit + fixes

| Adapter        | Before                          | Callbacks wrapped | Notes |
|----------------|---------------------------------|-------------------|-------|
| agno           | NO try/except                   | 2                 | _on_run_start, _on_run_end would have crashed user code |
| llamaindex     | PARTIAL (dispatcher only)       | 16                | 3 span lifecycle + dispatcher + 12 per-event handlers |
| google_adk     | YES (in plugin shims)           | 11                | All adapter _on_* + simplified plugin shims (single source of truth) |
| strands        | YES (manual try/except in each) | 7                 | Replaced manual try/except with decorator (consistency + tracking) |
| pydantic_ai    | NO try/except                   | 9 (incl 3 split)  | Error hooks split: telemetry resilient, framework error always re-raised |
| smolagents     | PARTIAL (step callbacks only)   | 6                 | Run lifecycle handlers were unwrapped; replaced manual try/except in step handlers |
| bedrock_agents | YES (manual)                    | 2                 | Replaced manual try/except; preserved try/finally so _end_run() always runs |
| openai_agents  | YES (manual)                    | 3                 | Replaced manual try/except in TracingProcessor methods |
| haystack       | YES (manual)                    | 1                 | Replaced manual try/except in _on_span_end |
| langfuse       | PARTIAL (per-record loops)      | 5                 | Imports + exports refactored to push per-record resilience into helpers |
| TOTAL          |                                 | 62                |       |

Pydantic-AI error-callback split

_on_run_error, _on_model_request_error, and _on_tool_execute_error MUST always re-raise the framework's original error (per Pydantic-AI's contract). The telemetry side moves into a _emit_*_error_telemetry helper wrapped with @resilient_callback; the public hook calls that helper and then unconditionally executes raise error, so adapter-side telemetry bugs can never swallow a real framework error.
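The split can be illustrated with a small self-contained sketch. The class, the `_emit` method, and the hook signature are hypothetical, and an inline try/except stands in for @resilient_callback so the block runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


class ErrorHookSketch:
    """Hypothetical adapter: telemetry is best-effort, the re-raise is not."""

    def __init__(self) -> None:
        self.telemetry_failures = 0

    def _emit(self, event_type: str, payload: dict) -> None:
        # Simulates a bug on the observability side.
        raise RuntimeError("simulated telemetry bug")

    def _emit_run_error_telemetry(self, error: BaseException) -> None:
        # Stand-in for a helper decorated with @resilient_callback.
        try:
            self._emit("agent.error", {"error": repr(error)})
        except Exception:
            self.telemetry_failures += 1
            logger.warning("run-error telemetry failed", exc_info=True)

    def _on_run_error(self, error: BaseException) -> None:
        self._emit_run_error_telemetry(error)  # can never raise Exception
        raise error  # the framework's original error always comes back out


adapter = ErrorHookSketch()
framework_error = ValueError("model returned invalid JSON")
reraised = None
try:
    adapter._on_run_error(framework_error)
except ValueError as caught:
    reraised = caught
```

After this runs, `reraised` is the very same object as `framework_error`, and the telemetry failure shows up only in the counter.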

Bedrock-Agents finally semantics

_after_invoke keeps an outer try/finally so _end_run() always runs (otherwise collector + ContextVars leak across boto3 calls). The inner body is delegated to a _after_invoke_body helper wrapped with @resilient_callback.
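A sketch of why the outer try/finally still matters once the body is resilient: the resilient body swallows Exception, but a BaseException such as KeyboardInterrupt still escapes, and cleanup has to run in both cases. Everything except the method names `_after_invoke`, `_after_invoke_body`, `_begin_run`, and `_end_run` is hypothetical, including the single ContextVar used to model run state.

```python
import contextvars
from typing import Any, Callable, Dict

# Hypothetical stand-in for the adapter's per-run ContextVar state.
_current_run: contextvars.ContextVar = contextvars.ContextVar("current_run", default=None)


class BedrockAdapterSketch:
    def __init__(self, record: Callable[[Dict[str, Any]], None]) -> None:
        self._record = record  # hypothetical telemetry step that may fail
        self.failures = 0

    def _begin_run(self, run_id: str) -> None:
        _current_run.set(run_id)

    def _end_run(self) -> None:
        _current_run.set(None)

    def _after_invoke_body(self, parsed: Dict[str, Any]) -> None:
        # Stand-in for the @resilient_callback-wrapped body.
        try:
            self._record(parsed)
        except Exception:
            self.failures += 1

    def _after_invoke(self, parsed: Dict[str, Any]) -> None:
        try:
            self._after_invoke_body(parsed)
        finally:
            self._end_run()  # runs on success, on Exception, and on Ctrl-C


def bad_shape(parsed: Dict[str, Any]) -> None:
    raise KeyError("completion")


def interrupt(parsed: Dict[str, Any]) -> None:
    raise KeyboardInterrupt


first = BedrockAdapterSketch(record=bad_shape)
first._begin_run("run-1")
first._after_invoke({})  # ordinary failure is contained
state_after_exception = _current_run.get()

second = BedrockAdapterSketch(record=interrupt)
second._begin_run("run-2")
try:
    second._after_invoke({})
    interrupted = False
except KeyboardInterrupt:
    interrupted = True  # the interrupt propagates to the user
state_after_interrupt = _current_run.get()
```

In both cases the run state is cleared, so nothing leaks into the next boto3 call.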

Tests (50+ new tests)

tests/instrument/adapters/_base/test_resilience.py — 34 tests

  • TestGetDefaultFor — known/unknown callback name lookups
  • TestResilienceTracker — counter, threshold validation, health degradation, metadata snapshot, reset, thread-safety under 8 threads × 100 increments
  • TestResilientCallbackDecorator — success path, default-on-exception, counter increment, log context, exception does NOT propagate, positional + keyword passthrough_arg, health degradation, KeyboardInterrupt propagates, works without tracker, logger uses module of decorated function
  • TestFrameworkAdapterIntegration — tracker on every framework adapter, metadata surface, degraded status, disconnect resets, callback failure does not break framework
  • TestPackageExports — public surface via _base.__init__
  • TestDecoratorMetadata — functools.wraps preserves __name__/__doc__
  • TestPerAdapterCallbackException — parametrized per-adapter callback exception tests + threshold-degradation test

tests/instrument/adapters/_base/test_per_adapter_resilience.py — 30 tests

One test per adapter simulates a callback exception by sabotaging an inner helper, then asserts:

  1. The exception does NOT propagate.
  2. The resilience tracker recorded the failure.
  3. After enough failures, adapter_info().metadata['resilience_status'] is "degraded".

Plus a parametrized health-degradation test across all 10 adapters. The file also includes a ContextVar isolation fixture so that tests which intentionally leave _begin_run state behind (Pydantic-AI, Bedrock-Agents) don't poison subsequent test files.
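The three assertions can be sketched against a toy adapter. `TinyAdapter`, `_build_payload`, and `status()` are hypothetical and exist only to make the pattern runnable; the real tests target the actual adapters and read adapter_info().metadata. `unittest.mock.patch.object` does the sabotage.

```python
from unittest import mock

FAILURE_THRESHOLD = 5  # value stated in this PR


class TinyAdapter:
    """Hypothetical adapter with one callback and one inner helper."""

    def __init__(self) -> None:
        self.failures = 0

    def _build_payload(self, run_id: str) -> dict:
        return {"run_id": run_id}

    def _on_run_start(self, run_id: str) -> None:
        # Inline stand-in for @resilient_callback.
        try:
            self._build_payload(run_id)
        except Exception:
            self.failures += 1
        return None

    def status(self) -> str:
        return "degraded" if self.failures >= FAILURE_THRESHOLD else "healthy"


def check_callback_failure_is_contained() -> TinyAdapter:
    adapter = TinyAdapter()
    # Sabotage the inner helper so every callback invocation fails.
    with mock.patch.object(adapter, "_build_payload", side_effect=RuntimeError("boom")):
        for _ in range(FAILURE_THRESHOLD):
            # 1. The exception does not propagate.
            assert adapter._on_run_start("run-1") is None
    # 2. The failures were recorded.
    assert adapter.failures == FAILURE_THRESHOLD
    # 3. Enough failures flip the status to degraded.
    assert adapter.status() == "degraded"
    return adapter


checked = check_callback_failure_is_contained()
```

Patching the helper rather than the callback keeps the resilience boundary itself under test.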

Acceptance

pytest tests/instrument/adapters/_base/test_resilience.py -x
# 34 passed in 0.10s

pytest tests/instrument/adapters/frameworks/ -x
# 146 passed, 12 skipped (missing optional framework deps),
# 2 deselected (pre-existing Windows clock-resolution flakes in test_haystack
#  — confirmed against origin/development baseline)

mypy --strict src/layerlens/instrument/adapters/_base/resilience.py
# Success: no issues found in 1 source file

mypy src
# Success: no issues found in 169 source files

ruff check src/ tests/
# All checks passed!

pytest tests/  # full suite, excluding cli + providers, which need extra deps
# 1090 passed, 19 skipped, 2 deselected (pre-existing flakes)

Test plan

  • pytest tests/instrument/adapters/_base/test_resilience.py -x (resilience mechanics)
  • pytest tests/instrument/adapters/_base/test_per_adapter_resilience.py (per-adapter)
  • pytest tests/instrument/adapters/frameworks/ (no regressions on adapter tests)
  • pytest tests/instrument/test_trace_context.py (no ContextVar leak from _base tests)
  • mypy --strict on resilience.py
  • mypy src (full project)
  • ruff check
  • CI must pass on Linux / 3.10-3.13 matrix (this PR was developed on Windows / 3.11)

…0 lighter adapters

Introduces a shared @resilient_callback decorator + ResilienceTracker
under `src/layerlens/instrument/adapters/_base/`, then applies it to every
callback method on the 10 lighter framework adapters (agno, llamaindex,
google_adk, strands, pydantic_ai, smolagents, bedrock_agents, openai_agents,
haystack, langfuse) so an exception in our observability code can never
crash the customer's framework execution.

What the decorator does on failure:
1. Catches Exception (NOT BaseException — KeyboardInterrupt / SystemExit
   still propagate so users can Ctrl-C their agent).
2. Logs the exception via the wrapped function's module logger with
   adapter_name + callback_name + truncated traceback.
3. Increments the adapter's per-instance ResilienceTracker counter.
4. Returns the framework's expected default value — None for void
   handlers, or the value of `passthrough_arg` for mutating hooks
   (Pydantic-AI's `after_model_request` returns the response object;
   `before_tool_execute` returns the args tuple).

Health surfacing:
- FrameworkAdapter now owns a `_resilience: ResilienceTracker` attribute
  set in `__init__` so every framework adapter inherits the contract.
- `adapter_info().metadata` merges the live resilience snapshot
  (`resilience_status`, `resilience_failures_total`,
  `resilience_failure_threshold`, per-callback breakdown, last error).
- After DEFAULT_FAILURE_THRESHOLD (5) failures the adapter reports
  `resilience_status: "degraded"` so monitoring can alert.
- `disconnect()` resets the tracker so reconnects start clean.

Per-adapter callback audit + fixes:

| Adapter         | Callbacks wrapped | Notes                                  |
|-----------------|-------------------|----------------------------------------|
| agno            | 2                 | _on_run_start, _on_run_end             |
| llamaindex      | 16                | 3 span lifecycle + dispatcher + 12 events|
| google_adk      | 11                | All adapter _on_* + simplified plugin shims|
| strands         | 7                 | All hook handlers (replaces manual try/except)|
| pydantic_ai     | 9 (incl 3 split)  | Error hooks split: telemetry resilient, re-raise unconditional|
| smolagents      | 6                 | Run/step handlers (replaces manual try/except)|
| bedrock_agents  | 2                 | _before_invoke + _after_invoke (with try/finally for _end_run)|
| openai_agents   | 3                 | on_trace_start/_end + on_span_end (replaces manual try/except)|
| haystack        | 1                 | _on_span_end (replaces manual try/except)|
| langfuse        | 5                 | _import_single_trace, _import_observation, _import_score, _export_single_trace, plus inner emit fallbacks|
| TOTAL           | 62                |                                        |

Pydantic-AI error-callback split: `_on_run_error`,
`_on_model_request_error`, `_on_tool_execute_error` MUST always re-raise
the framework's original error (per Pydantic-AI's contract). The
telemetry side is moved into a `_emit_*_error_telemetry` helper wrapped
with @resilient_callback; the public hook calls it then unconditionally
`raise error`. So adapter-side telemetry bugs can never swallow a real
framework error.

Tests:
- `tests/instrument/adapters/_base/test_resilience.py` — 34 tests
  covering tracker mechanics, decorator behaviour, passthrough args,
  KeyboardInterrupt propagation, FrameworkAdapter integration, package
  re-exports, and decorator metadata preservation.
- `tests/instrument/adapters/_base/test_per_adapter_resilience.py` —
  per-adapter smoke tests (one per lighter adapter) that simulate a
  callback exception by sabotaging an inner helper, plus a parametrized
  health-degradation test across all 10 adapters.

Refactor: `_base.py` (the AdapterInfo + BaseAdapter module) becomes
`_base/` package with `__init__.py` re-exporting from `_core.py` (moved
via `git mv`) and the new `resilience.py`. All existing
`from .._base import AdapterInfo, BaseAdapter` imports continue working
unchanged.

Acceptance:
- pytest tests/instrument/adapters/_base/test_resilience.py -x — 34 passed
- pytest tests/instrument/adapters/frameworks/ -x — 146 passed (12 skipped for missing optional deps; 2 deselected pre-existing Windows clock-resolution flakes in test_haystack)
- mypy --strict src/layerlens/instrument/adapters/_base/resilience.py — Success
- mypy src — Success: 169 source files
- ruff check — All checks passed
- Full test suite: 1090 passed
@mmercuri mmercuri requested a review from m-peko April 27, 2026 00:16
m-peko pushed a commit that referenced this pull request May 12, 2026
…laceholder from #116) (#126)

Replaces the M7 placeholder shipped in PR #116 (truncation policy) with
the full BrowserUseAdapter — every lifecycle hook wired, every event
emitted, and every cross-cutting CLAUDE.md contract enforced from day one.

What changed
------------

Full lifecycle adapter (src/layerlens/instrument/adapters/frameworks/
browser_use/lifecycle.py):

* connect / disconnect / health_check / get_adapter_info /
  serialize_for_replay (all five abstract BaseAdapter methods).
* on_session_start, on_session_end, on_navigation, on_action,
  on_screenshot, on_dom_extraction, on_llm_call (every spec'd hook).
* Capability declaration: TRACE_TOOLS + TRACE_MODELS + TRACE_STATE +
  STREAMING + REPLAY (no longer the placeholder's TRACE_TOOLS-only set).
* Canonical events: browser.session.start, browser.navigate,
  browser.action, browser.screenshot, browser.dom.extract, tool.call,
  model.invoke, agent.input/output/state.change, cost.record,
  environment.config — plus agent.error / tool.error / model.error
  per the PR #115 error-aware emission contract.
* Per-callback resilience wrapper per PR #117 — observability errors
  NEVER crash the customer's agent, surfaced via resilience_snapshot().
* Multi-tenant org_id propagation per PR #118 — bound at construction
  (kwarg or resolved from stratix.org_id), stamped defensively on
  every emit, caller-supplied values overwritten to prevent
  cross-tenant leaks.
* Truncation policy from day one (DEFAULT_POLICY) — screenshot bytes
  DROPPED to deterministic SHA-256 references, DOM/HTML capped at
  16 KiB, prompts/completions/tool I/O at 4/2 KiB.
* Browser-event layer mapping (_BROWSER_EVENT_LAYERS) so unknown
  browser.* event types respect CaptureConfig gating without falling
  through the unknown-event-drops-by-default path.
* requires_pydantic = PydanticCompat.V2_ONLY (browser_use is a v2 lib).
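A rough sketch of the two truncation rules named above (screenshot bytes replaced by a deterministic SHA-256 reference, HTML capped at 16 KiB). The function names and the "sha256:" reference format are hypothetical; the commit message does not show how DEFAULT_POLICY implements either rule.

```python
import hashlib

DOM_CAP_BYTES = 16 * 1024  # 16 KiB cap stated in the commit message


def screenshot_reference(image_bytes: bytes) -> str:
    """Replace raw screenshot bytes with a deterministic content reference."""
    # Hypothetical format: same bytes always yield the same reference.
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


def cap_html(html: str, cap: int = DOM_CAP_BYTES) -> str:
    """Cap HTML at `cap` UTF-8 bytes without splitting a multi-byte character."""
    encoded = html.encode("utf-8")
    if len(encoded) <= cap:
        return html  # short payloads pass through untouched
    # errors="ignore" drops a trailing partial character left by the byte cut.
    return encoded[:cap].decode("utf-8", errors="ignore")


ref_a = screenshot_reference(b"\x89PNG fake image bytes")
ref_b = screenshot_reference(b"\x89PNG fake image bytes")
capped = cap_html("<div>" + "a" * 20_000 + "</div>")
```

Here `ref_a` equals `ref_b`, and `capped` is exactly 16 KiB because the sample payload is pure ASCII.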

Public surface (src/layerlens/instrument/adapters/frameworks/
browser_use/__init__.py):

* ADAPTER_CLASS = BrowserUseAdapter (registry).
* instrument_agent(agent, stratix=, capture_config=, org_id=)
  one-liner returning the connected, wrapping adapter.
* STRATIXBrowserUseAdapter top-level binding (legacy alias) — fires
  DeprecationWarning on construction. Exposed as a static binding so
  the manifest consistency lint's AST walk finds it.

Pyproject:

* Adds 'browser-use' optional extra: browser-use>=0.1.0,<2 with the
  python_version >= '3.11' marker (browser_use's own constraint).

Tests (tests/instrument/adapters/frameworks/test_browser_use_adapter.py):

* Replaces the 7-test scaffold from #116 with 40 tests covering:
  wiring + alias + lifecycle round-trip + truncation (screenshot
  drop, hash determinism, HTML cap, short-payload no-audit) +
  multi-tenancy (kwarg, client attribute, defensive overwrite) +
  resilience (poison stratix, exploding agent attribute access) +
  error-aware emission (agent.error / tool.error / model.error) +
  per-hook coverage + sync + async wrapping + replay round-trip +
  10-case provider detection table.

Sample (samples/instrument/browser_use/{main.py,__init__.py,README.md}):

* Runs OFFLINE — no browser-use install, no Playwright, no API key,
  no network. Three-step duck-typed agent + happy/--fail paths
  exercise the full event surface and demonstrate screenshot drop +
  org_id stamping + agent.error emission before re-raise.

Doc (docs/adapters/frameworks-browser_use.md):

* Install + quickstart + capabilities matrix + 14-event reference
  table + truncation policy table + multi-tenancy + resilience +
  error-aware emission + capture config + browser_use specifics +
  BYOK + replay sections.

Manifest (scripts/emit_adapter_manifest.py):

* Promotes browser_use from _LIFECYCLE_PREVIEW to _MATURE — every
  required artifact (test file with >= 12 funcs, sample, doc,
  STRATIX→LayerLens deprecation alias) ships in this PR.

Verification
------------

* uv run pytest tests/instrument/adapters/frameworks/test_browser_use_adapter.py
  → 40 passed
* mypy --strict src/layerlens/instrument/adapters/frameworks/browser_use
  → Success: no issues found in 2 source files
* ruff check on src + test + script
  → All checks passed!
* Sample runs cleanly offline (happy + --fail)
* pip install -e .[browser-use] resolves cleanly (browser-use only
  pulled on Python 3.11+ per the env marker)
* tests/instrument/adapters/test_manifest_consistency.py::
  test_mature_adapters_have_required_artifacts[browser_use] passes
* Full instrument suite (excl. pre-existing crewai/protocols
  references not on this branch): 312 passed, 1 skipped, 12 xfailed
@m-peko m-peko closed this May 21, 2026