feat(instrument): typed events Bundle #4 — bedrock_agents + openai_agents#152
Closed
mmercuri wants to merge 2 commits into
Replace all 13 emit_dict_event() call sites in bedrock_agents/lifecycle.py with typed Pydantic payloads from layerlens.instrument._compat.events.

Per-emission mapping:
- agent.input/output (boto3 invoke pre/post hooks + on_invoke_start/end) -> AgentInputEvent / AgentOutputEvent (role=HUMAN/AGENT, framework provenance + raw_input/raw_output on MessageContent.metadata)
- ACTION_GROUP and KNOWLEDGE_BASE trace steps + on_tool_use -> ToolCallEvent (integration=SERVICE for AWS-managed action groups and knowledge bases; integration=LIBRARY for the generic on_tool_use hook). Bedrock-specific provenance (framework, tool_type) folded onto the canonical input dict.
- MODEL_INVOCATION trace step + on_llm_call -> ModelInvokeEvent (provider='aws_bedrock', version='unavailable', framework on parameters; canonical prompt_tokens/completion_tokens slots; paired CostRecordEvent when usage is present).
- AGENT_COLLABORATOR trace step + on_handoff -> AgentHandoffEvent with deterministic sha256:<hex64> handoff_context_hash (hashes the supervisor/collaborator/reason tuple for trace steps; hashes the context string for manual handoffs, including the empty-string fallback).
- environment.config (per-agent, idempotent via _seen_agents) -> EnvironmentConfigEvent (env_type=CLOUD; agent_id, agent_alias_id, enable_trace on attributes).

Set ALLOW_UNREGISTERED_EVENTS = False -- bedrock_agents targets the canonical 13-event taxonomy exclusively.

Test suite (14 tests, 12 pre-existing + 2 new regression):
- All pre-existing assertions updated for canonical payload shape (e.g. payload['tool']['name'] instead of payload['tool_name']).
- _RecordingStratix doubles record both legacy dict and typed Pydantic emissions (matches PR #138 pattern).
- New: test_bedrock_agents_emits_typed_payloads_only -- asserts every emit site is an instance of the expected typed model (AgentInputEvent, AgentOutputEvent, AgentHandoffEvent, EnvironmentConfigEvent, ModelInvokeEvent, ToolCallEvent, CostRecordEvent).
- New: test_bedrock_agents_emit_does_not_warn_after_migration -- filterwarnings('error', DeprecationWarning) catches any residual emit_dict_event call.

Acceptance:
- grep emit_dict_event src/.../bedrock_agents/ -> 0 occurrences
- mypy --strict src/.../bedrock_agents -> clean
- pytest tests/.../test_bedrock_agents_adapter.py -> 14/14 pass
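
The deterministic handoff hash described in the mapping above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the helper names (`sha256_tag`, `hash_manual_handoff`, `hash_trace_step_handoff`), the `::` separator, and the field order are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def sha256_tag(text: str) -> str:
    """Return the SHA-256 digest of text in the sha256:<hex64> format."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_manual_handoff(context: Optional[str]) -> str:
    # Manual on_handoff hook: hash the context string. A missing context
    # falls back to the empty string, so the hash is always well-formed.
    return sha256_tag(context or "")


def hash_trace_step_handoff(supervisor: str, collaborator: str, reason: str) -> str:
    # Trace steps carry no context payload, so hash a fixed-order tuple.
    # The "::" separator and the ordering are assumptions for this sketch.
    return sha256_tag("::".join((reason, supervisor, collaborator)))


if __name__ == "__main__":
    print(hash_manual_handoff(None))
    print(hash_trace_step_handoff("supervisor", "billing-agent", "escalation"))
```

Because both helpers depend only on their string inputs, replaying the same trace yields the same hash, which is the property the commit message calls "deterministic".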
Replace all 15 emit_dict_event() call sites in openai_agents/lifecycle.py with typed Pydantic payloads from layerlens.instrument._compat.events.

Per-emission mapping:
- AgentSpanData start/end + on_run_start/end -> AgentInputEvent / AgentOutputEvent (role=AGENT for span boundaries, role=HUMAN for Runner inbound; framework provenance + raw_input/raw_output + span_id on MessageContent.metadata).
- GenerationSpanData + on_llm_call -> ModelInvokeEvent (provider derived from model identifier -- defaults to 'openai' since the SDK is OpenAI-centric; version='unavailable'; framework on parameters; canonical prompt_tokens/completion_tokens slots; paired CostRecordEvent when usage is present).
- FunctionSpanData + on_tool_use -> ToolCallEvent (integration=LIBRARY -- function spans wrap in-process Python callables; framework on canonical input dict).
- HandoffSpanData + on_handoff -> AgentHandoffEvent with deterministic sha256:<hex64> handoff_context_hash (hashes from/to/reason tuple for spans; hashes the context string for manual handoffs, including the empty-string fallback).
- GuardrailSpanData -> PolicyViolationEvent (violation_type=POLICY_CONSTRAINT; framework, guardrail_name, triggered, output on details dict; root_cause + remediation set canonically).
- environment.config (per-agent, idempotent via _seen_agents) -> EnvironmentConfigEvent (env_type=CLOUD; instructions, model, handoff_description, tools, handoffs on attributes).
- trace_start / trace_end markers (previously ad-hoc agent.state.change with only event_subtype) -> AgentInputEvent (trace_start, role=AGENT) and AgentOutputEvent (trace_end). The canonical AgentStateChangeEvent requires before_hash/after_hash which the trace boundary cannot produce; the original event_subtype marker is preserved on MessageContent.metadata so downstream consumers can still filter.

Set ALLOW_UNREGISTERED_EVENTS = False -- openai_agents targets the canonical 13-event taxonomy exclusively.

Test suite (14 tests, 12 pre-existing + 2 new regression):
- All pre-existing assertions updated for canonical payload shape (e.g. payload['model']['name'] instead of payload['model']).
- _RecordingStratix doubles record both legacy dict and typed Pydantic emissions (matches PR #138 pattern).
- New: test_openai_agents_emits_typed_payloads_only -- asserts every emit site is an instance of the expected typed model (AgentInputEvent, AgentOutputEvent, AgentHandoffEvent, EnvironmentConfigEvent, ModelInvokeEvent, ToolCallEvent, CostRecordEvent, PolicyViolationEvent).
- New: test_openai_agents_emit_does_not_warn_after_migration -- filterwarnings('error', DeprecationWarning) catches any residual emit_dict_event call.

Acceptance:
- grep emit_dict_event src/.../openai_agents/ -> 0 occurrences
- mypy --strict src/.../openai_agents -> clean
- pytest tests/.../test_openai_agents_adapter.py -> 14/14 pass
- 116 framework adapter tests pass overall (no regression in 11 not-yet-migrated adapters; dual-path contract preserved)
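
The "does not warn after migration" regression check relies on promoting `DeprecationWarning` to an error so that any residual legacy call fails the test. The sketch below shows that mechanism with the standard `warnings` module only. The two emit functions are hypothetical stand-ins, not the real `BaseAdapter` methods, and `runs_without_deprecation` is a helper invented for the example.

```python
import warnings
from typing import Callable


def emit_dict_event(event_type: str, payload: dict) -> None:
    # Hypothetical stand-in for the legacy path: warns on every call.
    warnings.warn(
        "emit_dict_event is deprecated; use emit_event",
        DeprecationWarning,
        stacklevel=2,
    )


def emit_event(event: object) -> None:
    # Hypothetical stand-in for the typed path: emits no warning.
    return None


def runs_without_deprecation(fn: Callable[[], None]) -> bool:
    """Run fn with DeprecationWarning raised as an error; True if it stayed silent."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn()
        except DeprecationWarning:
            return False
    return True


if __name__ == "__main__":
    print(runs_without_deprecation(lambda: emit_event({"tool": {"name": "search"}})))
    print(runs_without_deprecation(lambda: emit_dict_event("tool.call", {})))
```

In a pytest suite the same effect is usually achieved with a `filterwarnings` marker or configuration entry rather than a hand-rolled helper; the helper here only keeps the sketch self-contained.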
Summary
Bundle 4 of the typed-event migration follow-ups to PR #129 (typed Pydantic event foundation + agno reference). Walks every `emit_dict_event()` call site in the `lifecycle.py` modules of two framework adapters and replaces each with the typed `emit_event()` path, using canonical Pydantic payloads from `layerlens.instrument._compat.events`.

| Adapter | `emit_dict_event()` call sites migrated |
| --- | --- |
| `bedrock_agents` | 13 |
| `openai_agents` | 15 |

Counts confirmed by `grep -rn 'emit_dict_event' src/.../<adapter>/lifecycle.py`; they match the migration doc estimates (~13, ~15).

Both adapters set `ALLOW_UNREGISTERED_EVENTS: bool = False`, so they target the canonical 13-event taxonomy exclusively.

Per-adapter migration
`bedrock_agents/lifecycle.py` (13 → 0)

| Legacy emission (call site) | Typed payload |
| --- | --- |
| `agent.input` (`_before_invoke_agent`, `on_invoke_start`) | `AgentInputEvent` (role=HUMAN; framework / agent_id / session_id / enable_trace / timestamp_ns / raw_input on metadata) |
| `agent.output` (`_after_invoke_agent`, `on_invoke_end`) | `AgentOutputEvent` (framework / session_id / duration_ns / raw_output / run_status on metadata) |
| `tool.call` (ACTION_GROUP, KNOWLEDGE_BASE trace steps) | `ToolCallEvent` (`integration=SERVICE` for AWS-managed; framework / tool_type folded onto canonical input) |
| `tool.call` (`on_tool_use`) | `ToolCallEvent` (`integration=LIBRARY` for the generic manual hook) |
| `model.invoke` + `cost.record` (MODEL_INVOCATION trace step) | `ModelInvokeEvent` (`provider='aws_bedrock'`, `version='unavailable'`, framework on parameters) + paired `CostRecordEvent` |
| `model.invoke` (`on_llm_call`) | `ModelInvokeEvent` (provider defaults to 'aws_bedrock' for the manual hook) |
| `agent.handoff` (AGENT_COLLABORATOR trace step, `on_handoff`) | `AgentHandoffEvent` with deterministic `sha256:<hex64>` `handoff_context_hash` (hashes from/to/reason tuple for trace steps; hashes context string for manual handoffs, with empty-string fallback) |
| `environment.config` (`_emit_agent_config`, idempotent per agent) | `EnvironmentConfigEvent` (`env_type=CLOUD`; agent_id, agent_alias_id, enable_trace on attributes) |

`openai_agents/lifecycle.py` (15 → 0)

| Legacy emission (call site) | Typed payload |
| --- | --- |
| `agent.state.change` event_subtype=trace_start (`_on_trace_start`) | `AgentInputEvent` (role=AGENT; event_subtype=trace_start, trace_id, timestamp_ns on metadata) |
| `agent.state.change` event_subtype=trace_end (`_on_trace_end`) | `AgentOutputEvent` (event_subtype=trace_end, trace_id, duration_ns on metadata) |
| `agent.input` (`_on_agent_span_start`, `on_run_start`) | `AgentInputEvent` (role=AGENT for spans, role=HUMAN for Runner; framework / agent_name / span_id / raw_input on metadata) |
| `agent.output` (`_on_agent_span_end`, `on_run_end`) | `AgentOutputEvent` (framework / agent_name / span_id / raw_output / run_status on metadata) |
| `model.invoke` + `cost.record` (`_on_generation_span_end`) | `ModelInvokeEvent` (provider derived from model identifier, defaults to 'openai'; `version='unavailable'`; framework on parameters) + paired `CostRecordEvent` |
| `model.invoke` (`on_llm_call`) | `ModelInvokeEvent` (provider falls back to identifier-derived guess) |
| `tool.call` (`_on_function_span_end`, `on_tool_use`) | `ToolCallEvent` (`integration=LIBRARY`; function spans wrap in-process Python callables) |
| `agent.handoff` (`_on_handoff_span_end`, `on_handoff`) | `AgentHandoffEvent` with deterministic `sha256:<hex64>` hash (hashes from/to/reason for spans; hashes context for manual) |
| `policy.violation` (`_on_guardrail_span_end`) | `PolicyViolationEvent` (`violation_type=POLICY_CONSTRAINT`; framework / guardrail_name / triggered / output on details) |
| `environment.config` (`_emit_agent_config`, idempotent per agent) | `EnvironmentConfigEvent` (`env_type=CLOUD`; instructions / model / tools / handoffs on attributes) |

Cross-cutting decisions
- Framework provenance (`framework`, `agent_id` / `agent_name`, `span_id`, `session_id`, `trace_id`, `timestamp_ns`, `duration_ns`, `run_status`, `event_subtype`, `tool_type`) moves into the canonical `metadata` / `attributes` / `parameters` / `input` slots; no ad-hoc top-level fields ship on the canonical schema. Mirrors the agno reference and the PR #138 pattern.
- `agent.state.change` emissions (openai_agents trace_start/trace_end) are remapped because the canonical `AgentStateChangeEvent` requires real `before_hash` / `after_hash`. Trace boundaries map to `AgentInputEvent` / `AgentOutputEvent` with the original `event_subtype` marker preserved on `MessageContent.metadata`.
- Handoff context hashes use the `sha256:<hex64>` format; empty contexts hash the empty string. Trace-step handoffs (which lack a context payload) hash a deterministic `reason::from::to` tuple so replays are stable.
- Bedrock action groups and knowledge bases map to `integration=SERVICE` (AWS-managed cloud services). The OpenAI Agents function tool surface (in-process Python) maps to `integration=LIBRARY`. Each adapter's `environment.config` uses `env_type=CLOUD` (Bedrock runs in AWS, OpenAI Agents in OpenAI's managed cloud).

Test updates
- `_RecordingStratix` doubles now record legacy dict AND typed Pydantic emissions (mirrors PR #138 / agno reference).
- Pre-existing assertions updated for the canonical payload shape (`payload['tool']['name']` instead of `payload['tool_name']`; `payload['model']['name']` instead of `payload['model']`).
- New: `test_<adapter>_emits_typed_payloads_only` asserts every emit site is a typed Pydantic instance of the expected canonical model class.
- New: `test_<adapter>_emit_does_not_warn_after_migration` uses `filterwarnings('error', DeprecationWarning)` to catch any residual `emit_dict_event` call.

Acceptance
- `grep -rn 'emit_dict_event' src/layerlens/instrument/adapters/frameworks/{bedrock_agents,openai_agents}/` → 0 occurrences
- `uv run mypy --strict src/.../frameworks/{bedrock_agents,openai_agents}` → clean (4 source files)
- `uv run pytest tests/.../test_{bedrock_agents,openai_agents}_adapter.py` → 28/28 pass (14 + 14)
- `uv run pytest tests/instrument/adapters/frameworks/` (excluding 4 pre-existing collection errors from untracked `semantic_kernel` / `langfuse` / `bulk_ported_smoke` / `per_adapter_org_id` modules + 4 already-migrated suites that depend on untracked submodules) → 116/116 pass (no regression in 11 not-yet-migrated adapters; dual-path contract preserved)

Multi-tenancy
Per CLAUDE.md, every typed event constructor receives `org_id` indirectly via `BaseAdapter.emit_event()`: `BaseAdapter` is constructed with `org_id=...`, and the recording stratix doubles in tests carry `org_id = 'test-org'` so the fail-fast org check passes. The multi-tenancy contract is preserved: no event emission can occur without an `org_id` resolved on the adapter.

Test plan
- No `emit_dict_event` calls on lifecycle paths
- No `DeprecationWarning` fires from lifecycle paths
- Not-yet-migrated adapters still emit via `emit_dict_event` with `DeprecationWarning` (dual-path contract intact)
- `mypy --strict` passes on both lifecycle modules + their `__init__.py`
- Schema constraints (`sha256:` hash format, required fields, etc.) verified via the canonical models' built-in validators

References
- Typed-events foundation (`feat/instrument-typed-events-foundation`): vendored canonical events, dual-path emit, agno reference
- `docs/adapters/typed-events.md`
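
For reviewers unfamiliar with the recording-double pattern used in the updated tests, here is a rough sketch of how a double can record both emission paths and how a typed-only assertion and the nested payload shape (`payload['tool']['name']`) fit together. Everything in it is illustrative: the dataclasses stand in for the real Pydantic models, and `RecordingDouble`, `ToolInfo`, and `only_typed` are hypothetical names, not the project's `_RecordingStratix` implementation.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List, Tuple


@dataclass
class ToolInfo:
    # Hypothetical nested descriptor; gives payload['tool']['name'].
    name: str


@dataclass
class ToolCallEvent:
    # Dataclass stand-in for the canonical typed model (assumption).
    tool: ToolInfo
    input: dict = field(default_factory=dict)


class RecordingDouble:
    """Test double that records legacy dict and typed emissions separately."""

    org_id = "test-org"  # lets a fail-fast org check pass in tests

    def __init__(self) -> None:
        self.legacy: List[Tuple[str, dict]] = []
        self.typed: List[Any] = []

    def emit_dict_event(self, event_type: str, payload: dict) -> None:
        self.legacy.append((event_type, payload))

    def emit_event(self, event: Any) -> None:
        self.typed.append(event)


def only_typed(double: RecordingDouble, allowed: tuple) -> bool:
    # True when nothing went through the legacy path and every typed
    # emission is an instance of one of the allowed model classes.
    return not double.legacy and all(isinstance(e, allowed) for e in double.typed)


if __name__ == "__main__":
    double = RecordingDouble()
    double.emit_event(
        ToolCallEvent(tool=ToolInfo(name="search"), input={"framework": "bedrock_agents"})
    )
    payload = asdict(double.typed[0])
    print(payload["tool"]["name"], only_typed(double, (ToolCallEvent,)))
```

Keeping the two recordings in separate lists is what lets one double serve both migrated adapters (typed list only) and not-yet-migrated ones (legacy list) during the dual-path period.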