Skip to content

feat(instrument): typed events Bundle #4 — bedrock_agents + openai_agents#152

Closed
mmercuri wants to merge 2 commits into
feat/instrument-typed-events-foundationfrom
feat/instrument-typed-events-bundle-4-bedrock-openai-agents
Closed

feat(instrument): typed events Bundle #4 — bedrock_agents + openai_agents#152
mmercuri wants to merge 2 commits into
feat/instrument-typed-events-foundationfrom
feat/instrument-typed-events-bundle-4-bedrock-openai-agents

Conversation

@mmercuri

Copy link
Copy Markdown
Contributor

Summary

Bundle 4 of the typed-event migration follow-ups to PR #129 (typed Pydantic event foundation + agno reference). Walks every emit_dict_event() call site in the lifecycle.py modules of two framework adapters and replaces each with the typed emit_event() path, using canonical Pydantic payloads from layerlens.instrument._compat.events.

Adapter lifecycle.py sites (verified by grep) After
bedrock_agents 13 0
openai_agents 15 0
Total 28 0

Honest counts confirmed by grep -rn 'emit_dict_event' src/.../<adapter>/lifecycle.py — these match the migration doc estimate (~13, ~15).

Both adapters set ALLOW_UNREGISTERED_EVENTS: bool = False — they target the canonical 13-event taxonomy exclusively.

Per-adapter migration

bedrock_agents/lifecycle.py (13 → 0)

Old emission New typed emission
agent.input (_before_invoke_agent, on_invoke_start) AgentInputEvent (role=HUMAN; framework / agent_id / session_id / enable_trace / timestamp_ns / raw_input on metadata)
agent.output (_after_invoke_agent, on_invoke_end) AgentOutputEvent (framework / session_id / duration_ns / raw_output / run_status on metadata)
tool.call (ACTION_GROUP, KNOWLEDGE_BASE trace steps) ToolCallEvent (integration=SERVICE for AWS-managed; framework / tool_type folded onto canonical input)
tool.call (on_tool_use) ToolCallEvent (integration=LIBRARY for the generic manual hook)
model.invoke + cost.record (MODEL_INVOCATION trace step) ModelInvokeEvent (provider='aws_bedrock', version='unavailable', framework on parameters) + paired CostRecordEvent
model.invoke (on_llm_call) ModelInvokeEvent (provider defaults to 'aws_bedrock' for the manual hook)
agent.handoff (AGENT_COLLABORATOR trace step, on_handoff) AgentHandoffEvent with deterministic sha256:<hex64> handoff_context_hash (hashes from/to/reason tuple for trace steps; hashes context string for manual handoffs, with empty-string fallback)
environment.config (_emit_agent_config, idempotent per agent) EnvironmentConfigEvent (env_type=CLOUD; agent_id, agent_alias_id, enable_trace on attributes)

openai_agents/lifecycle.py (15 → 0)

Old emission New typed emission
agent.state.change event_subtype=trace_start (_on_trace_start) AgentInputEvent (role=AGENT; event_subtype=trace_start, trace_id, timestamp_ns on metadata)
agent.state.change event_subtype=trace_end (_on_trace_end) AgentOutputEvent (event_subtype=trace_end, trace_id, duration_ns on metadata)
agent.input (_on_agent_span_start, on_run_start) AgentInputEvent (role=AGENT for spans, role=HUMAN for Runner; framework / agent_name / span_id / raw_input on metadata)
agent.output (_on_agent_span_end, on_run_end) AgentOutputEvent (framework / agent_name / span_id / raw_output / run_status on metadata)
model.invoke + cost.record (_on_generation_span_end) ModelInvokeEvent (provider derived from model identifier, defaults to 'openai'; version='unavailable'; framework on parameters) + paired CostRecordEvent
model.invoke (on_llm_call) ModelInvokeEvent (provider falls back to identifier-derived guess)
tool.call (_on_function_span_end, on_tool_use) ToolCallEvent (integration=LIBRARY — function spans wrap in-process Python callables)
agent.handoff (_on_handoff_span_end, on_handoff) AgentHandoffEvent with deterministic sha256:<hex64> hash (hashes from/to/reason for spans; hashes context for manual)
policy.violation (_on_guardrail_span_end) PolicyViolationEvent (violation_type=POLICY_CONSTRAINT; framework / guardrail_name / triggered / output on details)
environment.config (_emit_agent_config, idempotent per agent) EnvironmentConfigEvent (env_type=CLOUD; instructions / model / tools / handoffs on attributes)

Cross-cutting decisions

  • Adapter-specific provenance (framework, agent_id / agent_name, span_id, session_id, trace_id, timestamp_ns, duration_ns, run_status, event_subtype, tool_type) moves into the canonical metadata / attributes / parameters / input slots — no ad-hoc top-level fields ship on the canonical schema. Mirrors the agno reference and PR feat(instrument): Typed Pydantic events — autogen + crewai + smolagents (3 adapters / ~23 sites) #138 pattern.
  • No-hash agent.state.change emissions (openai_agents trace_start/trace_end) are remapped — the canonical AgentStateChangeEvent requires real before_hash / after_hash. Trace boundaries map to AgentInputEvent / AgentOutputEvent with the original event_subtype marker preserved on MessageContent.metadata.
  • Handoff context hashes are always emitted in sha256:<hex64> format — empty contexts hash the empty string. Trace-step handoffs (which lack a context payload) hash a deterministic reason::from::to tuple so replays are stable.
  • Bedrock action groups + knowledge bases map to integration=SERVICE (AWS-managed cloud services). The OpenAI Agents function tool surface (in-process Python) maps to integration=LIBRARY. Each adapter's environment.config uses env_type=CLOUD (Bedrock runs in AWS, OpenAI Agents in OpenAI's managed cloud).

Test updates

  • Both _RecordingStratix doubles now record legacy dict AND typed Pydantic emissions (mirrors PR feat(instrument): Typed Pydantic events — autogen + crewai + smolagents (3 adapters / ~23 sites) #138 / agno reference).
  • Pre-existing assertions updated from ad-hoc dict shape to canonical payload shape (e.g. payload['tool']['name'] instead of payload['tool_name']; payload['model']['name'] instead of payload['model']).
  • Each adapter gains 2 new regression tests:
    • test_<adapter>_emits_typed_payloads_only — asserts every emit site is a typed Pydantic instance of the expected canonical model class.
    • test_<adapter>_emit_does_not_warn_after_migrationfilterwarnings('error', DeprecationWarning) catches any residual emit_dict_event call.

Acceptance

  • grep -rn 'emit_dict_event' src/layerlens/instrument/adapters/frameworks/{bedrock_agents,openai_agents}/0 occurrences
  • uv run mypy --strict src/.../frameworks/{bedrock_agents,openai_agents}clean (4 source files)
  • uv run pytest tests/.../test_{bedrock_agents,openai_agents}_adapter.py28/28 pass (14 + 14)
  • uv run pytest tests/instrument/adapters/frameworks/ (excluding 4 pre-existing collection errors from untracked semantic_kernel / langfuse / bulk_ported_smoke / per_adapter_org_id modules + 4 already-migrated suites that depend on untracked submodules) → 116/116 pass (no regression in 11 not-yet-migrated adapters; dual-path contract preserved)

Multi-tenancy

Per CLAUDE.md, every typed event constructor receives org_id indirectly via BaseAdapter.emit_event()BaseAdapter is constructed with org_id=... and the recording stratix doubles in tests carry org_id = 'test-org' so the fail-fast org check passes. Multi-tenancy contract preserved (no event emission can occur without an org_id resolved on the adapter).

Test plan

  • All 2 adapter test suites pass with new typed-event assertions
  • Regression tests confirm zero emit_dict_event calls on lifecycle paths
  • Regression tests confirm zero DeprecationWarning fires from lifecycle paths
  • No regression in 11 not-yet-migrated adapters — they still emit via emit_dict_event with DeprecationWarning (dual-path contract intact)
  • mypy --strict passes on both lifecycle modules + their init.py
  • Schema validation REJECTS invalid payloads (canonical sha256: hash format, required fields, etc.) — verified via the canonical models' built-in validators

References

mmercuri added 2 commits May 10, 2026 10:42
Replace all 13 emit_dict_event() call sites in bedrock_agents/lifecycle.py
with typed Pydantic payloads from layerlens.instrument._compat.events.

Per-emission mapping:
- agent.input/output (boto3 invoke pre/post hooks + on_invoke_start/end)
  -> AgentInputEvent / AgentOutputEvent (role=HUMAN/AGENT, framework
  provenance + raw_input/raw_output on MessageContent.metadata)
- ACTION_GROUP and KNOWLEDGE_BASE trace steps + on_tool_use
  -> ToolCallEvent (integration=SERVICE for AWS-managed action groups
  and knowledge bases; integration=LIBRARY for the generic on_tool_use
  hook). Bedrock-specific provenance (framework, tool_type) folded
  onto the canonical input dict.
- MODEL_INVOCATION trace step + on_llm_call
  -> ModelInvokeEvent (provider='aws_bedrock', version='unavailable',
  framework on parameters; canonical prompt_tokens/completion_tokens
  slots; paired CostRecordEvent when usage is present).
- AGENT_COLLABORATOR trace step + on_handoff
  -> AgentHandoffEvent with deterministic sha256:<hex64>
  handoff_context_hash (hashes the supervisor/collaborator/reason
  tuple for trace steps; hashes the context string for manual
  handoffs, including the empty-string fallback).
- environment.config (per-agent, idempotent via _seen_agents)
  -> EnvironmentConfigEvent (env_type=CLOUD; agent_id, agent_alias_id,
  enable_trace on attributes).

Set ALLOW_UNREGISTERED_EVENTS = False -- bedrock_agents targets the
canonical 13-event taxonomy exclusively.

Test suite (14 tests, 12 pre-existing + 2 new regression):
- All pre-existing assertions updated for canonical payload shape
  (e.g. payload['tool']['name'] instead of payload['tool_name']).
- _RecordingStratix doubles record both legacy dict and typed
  Pydantic emissions (matches PR #138 pattern).
- New: test_bedrock_agents_emits_typed_payloads_only -- asserts
  every emit site is an instance of the expected typed model
  (AgentInputEvent, AgentOutputEvent, AgentHandoffEvent,
  EnvironmentConfigEvent, ModelInvokeEvent, ToolCallEvent,
  CostRecordEvent).
- New: test_bedrock_agents_emit_does_not_warn_after_migration --
  filterwarnings('error', DeprecationWarning) catches any residual
  emit_dict_event call.

Acceptance:
- grep emit_dict_event src/.../bedrock_agents/ -> 0 occurrences
- mypy --strict src/.../bedrock_agents -> clean
- pytest tests/.../test_bedrock_agents_adapter.py -> 14/14 pass
Replace all 15 emit_dict_event() call sites in
openai_agents/lifecycle.py with typed Pydantic payloads from
layerlens.instrument._compat.events.

Per-emission mapping:
- AgentSpanData start/end + on_run_start/end
  -> AgentInputEvent / AgentOutputEvent (role=AGENT for span
  boundaries, role=HUMAN for Runner inbound; framework provenance +
  raw_input/raw_output + span_id on MessageContent.metadata).
- GenerationSpanData + on_llm_call
  -> ModelInvokeEvent (provider derived from model identifier --
  defaults to 'openai' since the SDK is OpenAI-centric;
  version='unavailable'; framework on parameters; canonical
  prompt_tokens/completion_tokens slots; paired CostRecordEvent when
  usage is present).
- FunctionSpanData + on_tool_use
  -> ToolCallEvent (integration=LIBRARY -- function spans wrap
  in-process Python callables; framework on canonical input dict).
- HandoffSpanData + on_handoff
  -> AgentHandoffEvent with deterministic sha256:<hex64>
  handoff_context_hash (hashes from/to/reason tuple for spans;
  hashes the context string for manual handoffs, including the
  empty-string fallback).
- GuardrailSpanData
  -> PolicyViolationEvent (violation_type=POLICY_CONSTRAINT;
  framework, guardrail_name, triggered, output on details dict;
  root_cause + remediation set canonically).
- environment.config (per-agent, idempotent via _seen_agents)
  -> EnvironmentConfigEvent (env_type=CLOUD; instructions, model,
  handoff_description, tools, handoffs on attributes).
- trace_start / trace_end markers (previously ad-hoc
  agent.state.change with only event_subtype)
  -> AgentInputEvent (trace_start, role=AGENT) and
  AgentOutputEvent (trace_end). The canonical AgentStateChangeEvent
  requires before_hash/after_hash which the trace boundary cannot
  produce; the original event_subtype marker is preserved on
  MessageContent.metadata so downstream consumers can still filter.

Set ALLOW_UNREGISTERED_EVENTS = False -- openai_agents targets the
canonical 13-event taxonomy exclusively.

Test suite (14 tests, 12 pre-existing + 2 new regression):
- All pre-existing assertions updated for canonical payload shape
  (e.g. payload['model']['name'] instead of payload['model']).
- _RecordingStratix doubles record both legacy dict and typed
  Pydantic emissions (matches PR #138 pattern).
- New: test_openai_agents_emits_typed_payloads_only -- asserts
  every emit site is an instance of the expected typed model
  (AgentInputEvent, AgentOutputEvent, AgentHandoffEvent,
  EnvironmentConfigEvent, ModelInvokeEvent, ToolCallEvent,
  CostRecordEvent, PolicyViolationEvent).
- New: test_openai_agents_emit_does_not_warn_after_migration --
  filterwarnings('error', DeprecationWarning) catches any residual
  emit_dict_event call.

Acceptance:
- grep emit_dict_event src/.../openai_agents/ -> 0 occurrences
- mypy --strict src/.../openai_agents -> clean
- pytest tests/.../test_openai_agents_adapter.py -> 14/14 pass
- 116 framework adapter tests pass overall (no regression in 11
  not-yet-migrated adapters; dual-path contract preserved)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants