feat(instrument): Standardize handoff seq + context hash + preview across 5 lighter adapters (cross-poll #7)#112

Closed
mmercuri wants to merge 1 commit into feat/instrument-frameworks-agents from feat/instrument-handoff-standardization

@mmercuri

Summary

Cross-pollination item #7 from the adapter audit
(A:/tmp/adapter-cross-pollination-audit.md §2.5). Mature adapters
(LangChain, LangGraph, CrewAI, AutoGen) emit agent.handoff events
with a consistent metadata contract — monotonic handoff_seq,
SHA-256 context_hash of canonical-JSON state, and bounded
context_preview text — but the 5 target lighter adapters either
omitted these fields or implemented them inconsistently per-adapter.

This PR introduces a shared helper at
src/layerlens/instrument/adapters/_base/handoff.py and rewires all
5 target adapters to use it.

What changed

Shared helper (new file)

src/layerlens/instrument/adapters/_base/handoff.py

  • compute_context_hash(state) — deterministic SHA-256 of canonical-
    JSON-encoded state. default=str fallback so non-JSON-native values
    (sets, datetimes, custom objects) never raise. None and {} both
    hash to the digest of "{}" so the field is never absent.
  • make_preview(content, max_chars=256) — length-bounded preview with
    U+2026 ellipsis on truncation. Returns "<unrepresentable>" when
    __str__ raises rather than propagating.
  • HandoffSequencer — thread-safe monotonic counter, 1-indexed,
    scoped per adapter instance. Lock-protected next() so concurrent
    agent runs (asyncio gathers, threadpool workers, callbacks firing
    from multiple OS threads) all draw unique IDs from one source.
  • HandoffMetadata dataclass + build_handoff_payload() — assembles
    the full payload in one call. Standard fields cannot be overridden
    via the extra= merge (passing extra={"handoff_seq": 999} does not
    shadow the real seq).
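For reviewers who want a concrete picture of the two pure helpers, here is a minimal sketch of how they could behave. It is illustrative only and not the code in handoff.py: the compact JSON separators, the choice to count the ellipsis toward the cap, and the handling of a non-positive max_chars are assumptions. The "sha256:" prefix follows the integration-test expectation listed under Tests.

```python
import hashlib
import json
from typing import Any

ELLIPSIS = "\u2026"  # single-character horizontal ellipsis


def compute_context_hash(state: Any) -> str:
    """Return 'sha256:<hex digest>' of the canonical-JSON encoding of state."""
    # None is treated as an empty mapping, so None and {} share one digest.
    if state is None:
        state = {}
    canonical = json.dumps(
        state,
        sort_keys=True,         # canonical key ordering
        separators=(",", ":"),  # assumption: compact separators
        default=str,            # coerce sets, datetimes, custom objects via str()
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_preview(content: Any, max_chars: int = 256) -> str:
    """Return a preview of content that is at most max_chars characters long."""
    if max_chars <= 0:  # assumption: a non-positive cap yields an empty preview
        return ""
    try:
        text = content if isinstance(content, str) else str(content)
    except Exception:
        # A faulty __str__ must not propagate into the emit path.
        return "<unrepresentable>"
    if len(text) <= max_chars:
        return text
    # Assumption: the ellipsis counts toward the cap, so the result is
    # exactly max_chars characters long after truncation.
    return text[: max_chars - 1] + ELLIPSIS
```

Note that str() of a set of strings can vary between interpreter runs, so this sketch does not promise a stable digest across processes for such values; it only avoids raising on them.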

Re-exported from layerlens.instrument.adapters._base.

Per-adapter wiring (5 adapters)

| Adapter | Emit sites | Notes |
| --- | --- | --- |
| agno | 2 | Team-delegation in _extract_run_details + on_handoff hook. |
| openai_agents | 2 | HandoffSpanData span path + on_handoff hook. Single shared sequencer keeps seqs monotonic across detection paths. |
| llama_index | 1 | on_handoff hook (workflow handoff). |
| google_adk | 1 | transfer_to_agent callback path. |
| ms_agent_framework | 2 | Group-chat-turn detection in _process_message + on_handoff hook. Single shared sequencer. |

Removed the previously-duplicated bespoke hashlib.sha256(...) calls
and ad-hoc [:500] truncations from each adapter — now flows through
the shared helper.
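The wiring pattern, with two detection paths drawing from one sequencer, can be sketched roughly as follows. This is a toy stand-in, not any of the five adapters: ToyAdapter, its method names other than on_handoff, the "note" and "from_agent" style extras, and the keyword signature of build_handoff_payload are all hypothetical. The five standard field names are the ones the integration tests check.

```python
import threading
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


class HandoffSequencer:
    """Thread-safe, 1-indexed monotonic counter, one per adapter instance."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def next(self) -> int:
        # The lock makes increment-and-read a single atomic step.
        with self._lock:
            self._value += 1
            return self._value

    def reset(self) -> None:
        with self._lock:
            self._value = 0


def build_handoff_payload(
    *,
    seq: int,
    framework: str,
    context_hash: str,
    context_preview: str,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the payload (hypothetical signature)."""
    standard = {
        "handoff_seq": seq,
        "context_hash": context_hash,
        "context_preview": context_preview,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "framework": framework,
    }
    # Standard fields are merged last, so a colliding key in extra loses.
    return {**(extra or {}), **standard}


class ToyAdapter:
    """Hypothetical adapter: two emit sites, one shared sequencer."""

    FRAMEWORK = "toy"

    def __init__(self) -> None:
        self._sequencer = HandoffSequencer()
        self.events: List[Dict[str, Any]] = []

    def _emit_handoff(
        self,
        context_hash: str,
        preview: str,
        extra: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.events.append(
            build_handoff_payload(
                seq=self._sequencer.next(),
                framework=self.FRAMEWORK,
                context_hash=context_hash,
                context_preview=preview,
                extra=extra,
            )
        )

    def on_handoff(self, from_agent: str, to_agent: str, context_hash: str) -> None:
        # Emit site 1: explicit hook.
        self._emit_handoff(
            context_hash,
            f"{from_agent} -> {to_agent}",
            {"from_agent": from_agent, "to_agent": to_agent},
        )

    def process_message(self, speaker: str, context_hash: str, preview: str) -> None:
        # Emit site 2: detection inside message processing.
        self._emit_handoff(context_hash, preview, {"speaker": speaker})
```

Because both emit sites call the same _emit_handoff, sequence numbers interleave in emission order regardless of which path detected the handoff.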

Tests

  • tests/instrument/adapters/_base/test_handoff.py — 33 unit
    tests covering helper correctness:

    • SHA-256 hash format / canonical key ordering / None handling /
      non-JSON-native value coercion / determinism.
    • Preview truncation / ellipsis / default 256-char cap / coercion /
      faulty-__str__ resilience.
    • Sequencer 1-indexed monotonicity / 20×200 concurrency stress /
      reset / instance independence.
    • Payload assembly / explicit-vs-fallback preview /
      extras-cannot-clobber-standards / parametric coverage.
    • Re-export wiring from _base/__init__.py.
  • 6 new per-adapter integration tests — each adapter now has a
    test_*_emits_standardized_metadata test verifying:

    • handoff_seq advances monotonically across multiple emit calls.
    • context_hash starts with "sha256:" and differs for different
      contexts.
    • context_preview is present and length-bounded.
    • timestamp is present and ISO 8601.
    • framework field reflects the originating adapter.
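The five checks above could be expressed as one reusable function along these lines. This is a sketch of the idea rather than the tests in this PR: contract_violations and its return shape are hypothetical, and it assumes emitted events are plain dicts keyed by the standard field names.

```python
from datetime import datetime
from typing import Any, Dict, List


def contract_violations(
    events: List[Dict[str, Any]],
    framework: str,
    max_preview_chars: int = 256,
) -> List[str]:
    """Return human-readable contract violations (an empty list means OK)."""
    problems: List[str] = []

    # handoff_seq must be an int and strictly increase across emit calls.
    seqs = [e.get("handoff_seq") for e in events]
    if any(not isinstance(s, int) for s in seqs) or any(
        b <= a for a, b in zip(seqs, seqs[1:])
    ):
        problems.append(f"handoff_seq not strictly increasing: {seqs!r}")

    for i, event in enumerate(events):
        digest = event.get("context_hash")
        if not (isinstance(digest, str) and digest.startswith("sha256:")):
            problems.append(f"event {i}: context_hash lacks 'sha256:' prefix")

        preview = event.get("context_preview")
        if not isinstance(preview, str) or len(preview) > max_preview_chars:
            problems.append(f"event {i}: context_preview missing or too long")

        try:
            # fromisoformat accepts the output of datetime.isoformat();
            # a trailing 'Z' is only parsed on Python 3.11 and later.
            datetime.fromisoformat(event.get("timestamp"))
        except (TypeError, ValueError):
            problems.append(f"event {i}: timestamp is not ISO 8601")

        if event.get("framework") != framework:
            problems.append(f"event {i}: framework is not {framework!r}")

    return problems
```

The "differs for different contexts" check is left to each test, since only the test knows which contexts it fed in.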

Documentation

docs/adapters/handoff-standardization.md — contract spec, helper
API reference, adapter authoring guide, and inventory of which
adapters currently follow the contract.

Test results

tests/instrument/adapters/_base/test_handoff.py        33 passed
tests/instrument/adapters/frameworks/test_agno_*       14 passed
tests/instrument/adapters/frameworks/test_ms_agent_*   13 passed
tests/instrument/adapters/frameworks/test_openai_*     13 passed
tests/instrument/adapters/frameworks/test_llama_*      13 passed
tests/instrument/adapters/frameworks/test_google_*     13 passed
                                                  ─────────────
                                                       99 passed

uv run mypy --strict src/layerlens/instrument/adapters/_base/handoff.py — clean.
uv run ruff check src/layerlens/instrument/adapters/_base tests/instrument/adapters/_base — clean.

Test plan

  • Helper unit tests pass (uv run pytest tests/instrument/adapters/_base/test_handoff.py -x)
  • All 5 adapter test suites pass (uv run pytest tests/instrument/adapters/frameworks/test_{agno,ms_agent_framework,openai_agents,llama_index,google_adk}_adapter.py -x)
  • mypy strict clean on the new helper
  • ruff clean on _base + tests/_base
  • Concurrency stress test (20 threads × 200 iterations on the sequencer) yields a contiguous, unique set of seqs
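For context on the last item, a stress test of this shape might look like the sketch below. It uses a stand-in counter so the snippet runs on its own; StandInSequencer and run_stress are hypothetical names, and the barrier is an assumption added so all threads start drawing at roughly the same moment.

```python
import threading
from typing import List


class StandInSequencer:
    """Stand-in for the lock-protected, 1-indexed counter under test."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def next(self) -> int:
        with self._lock:
            self._value += 1
            return self._value


def run_stress(threads: int = 20, iterations: int = 200) -> List[List[int]]:
    """Draw iterations seqs from each of threads workers; return per-worker lists."""
    seq = StandInSequencer()
    barrier = threading.Barrier(threads)  # release every worker together
    results: List[List[int]] = [[] for _ in range(threads)]

    def worker(slot: int) -> None:
        barrier.wait()
        for _ in range(iterations):
            # Each worker appends only to its own list, so no extra lock is needed.
            results[slot].append(seq.next())

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    per_worker = run_stress()
    drawn = sorted(s for chunk in per_worker for s in chunk)
    # Contiguous and unique: exactly 1..4000 with no gaps or repeats.
    assert drawn == list(range(1, 20 * 200 + 1))
```

Comparing the sorted draw against range(1, N + 1) covers both uniqueness and contiguity in one assertion.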

Out of scope

  • Mature adapters (LangChain, LangGraph, CrewAI, AutoGen, Agentforce,
    Semantic Kernel) keep their bespoke implementations of the same
    contract for now — migration to the shared helper is deferred to a
    follow-up so their independent test suites stay isolated. The
    contract surface they emit is already compatible.
  • Cross-pollination items #1 (memory persistence) and #2 (error-aware
    emission) are tracked separately.

…ross 5 lighter adapters

Cross-pollination item #7 from the adapter audit. Mature adapters
(LangChain, LangGraph, CrewAI, AutoGen) emit agent.handoff events
with consistent metadata: monotonic handoff_seq, SHA-256 context_hash
of canonical-JSON state, and bounded context_preview text. The 5
target lighter adapters either omitted these fields or implemented
them inconsistently per-adapter.

Adds a shared helper at src/layerlens/instrument/adapters/_base/handoff.py:

- compute_context_hash() - SHA-256 of canonical-JSON-encoded state,
  default=str fallback so non-JSON-native values never raise.
- make_preview() - length-bounded preview, U+2026 ellipsis on
  truncation, str() coercion, returns "<unrepresentable>" on
  __str__ failure.
- HandoffSequencer - thread-safe monotonic counter scoped per
  adapter instance; concurrent agent runs draw from one lock-
  protected source.
- HandoffMetadata dataclass + build_handoff_payload() - assembles
  the full standardised payload in one call. Standard fields cannot
  be overridden via the extra= merge.

Wires all 5 target adapters to the new helper:

- agno: 2 emit sites (team-delegation in _extract_run_details +
  on_handoff hook).
- openai_agents: 2 emit sites (HandoffSpanData span path +
  on_handoff hook); seqs stay monotonic across detection paths.
- llama_index: 1 emit site (on_handoff hook).
- google_adk: 1 emit site (transfer_to_agent on_handoff hook).
- ms_agent_framework: 2 emit sites (group-chat-turn detection in
  _process_message + on_handoff hook); single shared sequencer.

Tests:

- tests/instrument/adapters/_base/test_handoff.py (33 tests):
  helper correctness, canonical-key-order hashing, None handling,
  non-JSON-native value coercion, ellipsis truncation, default
  256-char cap, faulty __str__ handling, 1-indexed monotonic seqs,
  20-thread x 200-iteration concurrency stress, sequencer reset,
  instance independence, payload assembly, preview fallback,
  extras-cannot-clobber-standard-fields, re-export wiring.
- 6 new per-adapter integration tests verify standardised metadata
  appears at each emit site (12 -> 13/14 tests per adapter).

Documentation: docs/adapters/handoff-standardization.md describes
the contract, lists all standardised adapters, and gives the
authoring guide for new adapters.

mypy --strict on the helper file: clean.
ruff on _base + tests: clean.
99 tests pass (33 helper + 66 across the 5 adapter test files).
@mmercuri mmercuri requested a review from m-peko April 26, 2026 23:21
@m-peko m-peko closed this May 21, 2026