feat(instrument): typed events Bundle #5 — pydantic_ai + semantic_kernel + strands #153
Closed
mmercuri wants to merge 3 commits into
Migrate all 10 emission sites in pydantic_ai lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_usage cost -> CostRecordEvent
- _extract_run_usage model response -> ModelInvokeEvent
- _extract_run_usage tool-return -> ToolCallEvent
- on_run_start -> AgentInputEvent
- on_run_end (output) -> AgentOutputEvent
- on_run_end (state.change) -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use -> ToolCallEvent
- on_llm_call -> ModelInvokeEvent
- on_handoff -> AgentHandoffEvent (sha256:<hex64>)
- _emit_agent_config -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Behavioural change (matching PR #151 ms_agent_framework precedent): the previous adapter emitted an ad-hoc agent.state.change payload alongside agent.output to carry a run_complete / run_failed marker. That payload did not satisfy the canonical AgentStateChangeEvent before_hash / after_hash contract (the run boundary has no real state mutation to hash). The post-migration mapping carries the same signal as run_status on the AgentOutputEvent metadata, preserving the cross-cutting completion marker without violating the canonical schema. Test coverage is updated to assert that agent.state.change is no longer emitted and that run_status is on AgentOutputEvent.metadata.

PydanticAI tools are in-process Python callables (@agent.tool), so the L5a integration is IntegrationType.LIBRARY. PydanticAI agents run in a 'simulated' environment (the host application is responsible for emitting the real cloud / on_prem environment record).

Helpers added: _stringify, _coerce_to_dict, _sha256_of.

Regression tests added:
- test_pydantic_ai_emits_typed_payloads_only
- test_pydantic_ai_emit_does_not_warn_after_migration

mypy --strict: clean (2 source files)
pytest: 15/15 passing
Migrate all 10 emission sites in semantic_kernel lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- on_function_start environment.config -> EnvironmentConfigEvent
- on_function_end tool.call -> ToolCallEvent (LIBRARY)
- on_prompt_render agent.code -> ToolLogicEvent (L5b, see below)
- on_model_invoke model.invoke -> ModelInvokeEvent
- on_model_invoke cost.record -> CostRecordEvent
- on_planner_step agent.code -> ToolLogicEvent (L5b)
- on_memory_operation tool.call -> ToolCallEvent (LIBRARY)
- on_kernel_invoke_start agent.input -> AgentInputEvent
- on_kernel_invoke_end agent.output -> AgentOutputEvent
- _discover_plugins environment.config -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- SK plugins/skills are in-process Python callables registered through KernelFunction. The L5a integration is therefore IntegrationType.LIBRARY for both function invocations and memory operations. Memory backends (qdrant, redis, etc.) are recorded as backend_type provenance; the *call* is still in-process even when the storage is remote.
- The legacy ad-hoc 'agent.code' event type (used for prompt rendering and planner step boundaries) is NOT in the canonical 13-event taxonomy. Following the PR #138 smolagents precedent (collapsing non-canonical types onto the closest semantic match), those boundaries are re-mapped onto ToolLogicEvent (L5b: tool business logic). The semantic fit is good: prompt rendering applies templating rules to a template, and planner steps emit reasoning rules (thought/action/observation). The L5b description slot carries the operation summary; per-event provenance (framework, function name, planner status, etc.) is encoded as JSON-encoded key=value rule entries on the ToolLogicInfo.rules list.
Pre-existing repo cleanup: the foundation branch shipped only lifecycle.py for semantic_kernel; __init__.py, filters.py and metadata.py were missing (the test suite imports ADAPTER_CLASS from the package, which errored at collection time without __init__.py). The missing files were re-added verbatim from main. This is not a behaviour change; it only restores the package layer required to run the test module the foundation branch already shipped.

Helpers added: _stringify, _coerce_to_dict, _kv_rule.

Regression tests added:
- test_semantic_kernel_emits_typed_payloads_only
- test_semantic_kernel_emit_does_not_warn_after_migration

mypy --strict: clean (4 source files)
pytest: 14/14 passing
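The `*_emit_does_not_warn_after_migration` tests rely on escalating `DeprecationWarning` to an error. The following self-contained sketch shows that pattern. It assumes the legacy `emit_dict_event` path warns on every call. `RecordingSink` and `FakeAgentInputEvent` are hypothetical stand-ins for the suite's `RecordingStratix` and a canonical event model, not the real classes.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeAgentInputEvent:
    """Hypothetical stand-in for a canonical typed event model."""
    message: str


@dataclass
class RecordingSink:
    """Hypothetical stand-in for the test suite's RecordingStratix."""
    typed_payloads: list[Any] = field(default_factory=list)

    def emit_event(self, event: Any) -> None:
        self.typed_payloads.append(event)

    def emit_dict_event(self, event_type: str, payload: dict[str, Any]) -> None:
        # Assumed legacy behaviour: warn on every call, then record the dict.
        warnings.warn(f"emit_dict_event({event_type!r}) is deprecated",
                      DeprecationWarning, stacklevel=2)
        self.typed_payloads.append(payload)


def drive_strict(emit_all: Callable[[RecordingSink], None]) -> RecordingSink:
    """Run every emission path with DeprecationWarning escalated to an error."""
    sink = RecordingSink()
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        emit_all(sink)
    return sink


def legacy_path_is_caught() -> bool:
    """True if a residual emit_dict_event call fails the strict run."""
    try:
        drive_strict(lambda s: s.emit_dict_event("agent.input", {"message": "hi"}))
    except DeprecationWarning:
        return True
    return False


if __name__ == "__main__":
    sink = drive_strict(lambda s: s.emit_event(FakeAgentInputEvent("hi")))
    print(len(sink.typed_payloads))  # 1
    print(legacy_path_is_caught())   # True
```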
Migrate all 10 emission sites in strands lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_details model.invoke -> ModelInvokeEvent
- _extract_run_details cost.record -> CostRecordEvent
- _extract_run_details tool.call -> ToolCallEvent (LIBRARY)
- _extract_run_details state.change -> COLLAPSED into AgentOutputEvent metadata
- on_run_start -> AgentInputEvent
- on_run_end (output) -> AgentOutputEvent
- on_run_end (state.change) -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use -> ToolCallEvent (LIBRARY)
- on_llm_call -> ModelInvokeEvent
- _emit_agent_config -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- AWS Strands tools execute as in-process Python callables in the host runtime. Even when the underlying capability is an AWS service, the *call* boundary instrumented here is the in-process invocation, so the integration is IntegrationType.LIBRARY. This deliberately differs from the bedrock_agents adapter's IntegrationType.SERVICE mapping: bedrock_agents tool execution is performed by the Bedrock service via Lambda action groups, not in-process.
- Strands defaults to AWS Bedrock when no provider is explicitly supplied, matching the framework's primary deployment target.

Behavioural change (matching PR #151 ms_agent_framework precedent): the previous adapter emitted two ad-hoc agent.state.change payloads: (a) at the run boundary (run_complete / run_failed), and (b) at the conversation boundary (conversation_update + turn_count). Neither satisfied the canonical AgentStateChangeEvent before_hash / after_hash contract:
- the run boundary has no real state mutation to hash;
- the conversation boundary does not surface a hashable state snapshot.
Post-migration, both cross-cutting completion signals are preserved without violating the canonical schema:
- run_complete / run_failed is carried as run_status on AgentOutputEvent.metadata;
- conversation_update + turn_count is carried as conversation_state + turn_count on a paired AgentOutputEvent with an empty message.

Test coverage is updated to assert that agent.state.change is no longer emitted and that the markers live on agent.output metadata.

Helpers added: _stringify, _coerce_to_dict.

Regression tests added:
- test_strands_emits_typed_payloads_only
- test_strands_emit_does_not_warn_after_migration
- test_conversation_state_carried_on_agent_output_metadata

mypy --strict: clean (2 source files)
pytest: 15/15 passing
Broader regression: 132/132 passing across healthy adapters
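The collapse can be illustrated with plain dictionaries standing in for the typed models. This is a sketch under stated assumptions: the function names are hypothetical, the `conversation_state` value is assumed to be the legacy `conversation_update` marker, and the real adapter builds `AgentOutputEvent` instances rather than dicts.

```python
from typing import Any, Optional


def run_end_output(output: str, error: Optional[BaseException]) -> dict[str, Any]:
    """agent.output fields at the run boundary; run_status replaces state.change."""
    status = "run_failed" if error is not None else "run_complete"
    return {"message": output, "metadata": {"run_status": status}}


def conversation_output(turn_count: int) -> dict[str, Any]:
    """Paired agent.output with an empty message carrying conversation markers."""
    return {
        "message": "",
        "metadata": {"conversation_state": "conversation_update", "turn_count": turn_count},
    }


def strands_run_end_events(
    output: str, error: Optional[BaseException], turn_count: Optional[int]
) -> list[tuple[str, dict[str, Any]]]:
    """Events emitted where the legacy adapter produced agent.state.change."""
    events = [("agent.output", run_end_output(output, error))]
    if turn_count is not None:
        events.append(("agent.output", conversation_output(turn_count)))
    return events


if __name__ == "__main__":
    for event_type, fields in strands_run_end_events("done", None, 3):
        print(event_type, fields["metadata"])
    # agent.output {'run_status': 'run_complete'}
    # agent.output {'conversation_state': 'conversation_update', 'turn_count': 3}
```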
Summary
Bundle #5 of 6 in the typed-events migration. Migrates all raw `emit_dict_event(...)` sites in 3 framework adapters to canonical Pydantic models from PR #129's `_compat/events.py`. Follows the pattern proven in PR #138 (Bundle #2), PR #151 (Bundle #3) and PR #152 (Bundle #4).

Per-adapter delivery (actual emission counts vs. estimate)
| Adapter | Estimate | Actual |
|---|---|---|
| `pydantic_ai` | 10 | 10 |
| `semantic_kernel` | 10 | 10 |
| `strands` | 10 | 10 |

Counts are grep-verified per CLAUDE.md "no fake claims", using `grep -E '\.emit\(|stratix\.emit\(|emit_dict_event\(' frameworks/<adapter>/` against the foundation branch. Estimate matched exactly.

Notable design decisions
Common pattern (mirroring PR #138/#151/#152)
- `_stringify` helper coerces arbitrary inputs/outputs to `str` for the canonical `MessageContent.message` slot; the original payload is preserved on `metadata.raw_input` / `raw_output`.
- `_coerce_to_dict` helper wraps non-dict tool I/O in `{"value": ...}` so the canonical `ToolCallEvent.input` / `output` dict slots are satisfied.
- `_sha256_of` (where applicable) emits canonical `sha256:<hex64>` hashes for `AgentHandoffEvent.handoff_context_hash`.
- `ALLOW_UNREGISTERED_EVENTS = False`: strict canonical taxonomy.
- Emissions go through `BaseAdapter.emit_event` for `org_id` stamping (PR #129 multi-tenancy guarantee).

Adapter-specific mappings
pydantic_ai (commit `0d784b1`, "feat(instrument): migrate pydantic_ai to typed events (Bundle #5)")

- Tools are in-process Python callables (`@agent.tool`) → `IntegrationType.LIBRARY`.
- The ad-hoc `agent.state.change` `run_complete` / `run_failed` marker is collapsed into `AgentOutputEvent.metadata.run_status` (PR #151 ms_agent_framework precedent). The marker did not satisfy the canonical `AgentStateChangeEvent.before_hash` / `after_hash` contract: the run boundary has no real state mutation to hash.

semantic_kernel (commit `63352e0`, "feat(instrument): migrate semantic_kernel to typed events (Bundle #5)")

- Plugins/skills are in-process `KernelFunction` Python callables → `IntegrationType.LIBRARY` (matches the user-feedback BYOK model: SK plugins are libraries, not external services). Memory backends (`qdrant`, `redis`, etc.) are recorded as `backend_type` provenance; the call is in-process even when the storage is remote.
- The legacy `agent.code` event type (used for `on_prompt_render` and `on_planner_step`) is NOT in the canonical 13-event taxonomy. It is re-mapped onto `ToolLogicEvent` (L5b: tool business logic), following the PR #138 smolagents precedent of collapsing non-canonical types onto the closest semantic match. Per-event provenance is encoded as JSON-encoded `key=value` rule entries on `ToolLogicInfo.rules`.
- Pre-existing repo cleanup: the foundation branch shipped only `lifecycle.py` for semantic_kernel; `__init__.py`, `filters.py` and `metadata.py` were missing (the test suite imports `ADAPTER_CLASS` from the package, which errored at collection without `__init__.py`). They were restored verbatim from `main`. This is not a behaviour change, just a restoration of the package layer the foundation's test module already required.

strands (commit `c52d216`, "feat(instrument): migrate strands to typed events (Bundle #5)")

- Tools execute as in-process Python callables → `IntegrationType.LIBRARY`. This deliberately differs from `bedrock_agents` (PR #152), which uses `IntegrationType.SERVICE`: bedrock_agents tool execution is performed by the Bedrock service via Lambda action groups, whereas Strands tool execution is in-process even though the framework targets AWS.
- Default provider is `bedrock` (Strands' primary deployment target).
- Two ad-hoc `agent.state.change` payloads are collapsed: (a) the run boundary (`run_complete` / `run_failed`) → `AgentOutputEvent.metadata.run_status`; (b) the conversation boundary (`conversation_update` + `turn_count`) → paired `AgentOutputEvent.metadata.conversation_state`. Neither satisfied the canonical hash contract.

Regression tests added (2 per adapter)
- `test_<adapter>_emits_typed_payloads_only`: drives every emission path and asserts that every captured payload is a Pydantic model instance from the canonical set; the `RecordingStratix` `typed_payloads` list grows for every emission.
- `test_<adapter>_emit_does_not_warn_after_migration`: `warnings.simplefilter("error", DeprecationWarning)` catches any residual `emit_dict_event` call.

Plus an extra strands regression: `test_conversation_state_carried_on_agent_output_metadata` for the conversation-state collapse.

Verification
Verified zero residual `emit_dict_event` calls in any of the 3 migrated adapters via grep.

Test plan
- `mypy --strict` clean per adapter
- Zero residual `emit_dict_event` calls in scope (grep)
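The grep checks above can also be reproduced without a shell. The following is a minimal sketch that reuses the PR's `grep -E` alternation; `count_matches` and `demo` are hypothetical helpers, not part of the repository. Unlike `grep`, which reports matching lines, this counts individual matches, so a line with two calls counts twice.

```python
import re
import tempfile
from pathlib import Path

# Same alternation as the PR's `grep -E` pattern for counting emission sites.
EMIT_SITE = re.compile(r"\.emit\(|stratix\.emit\(|emit_dict_event\(")
# Residual legacy calls only; the test plan expects zero of these.
RESIDUAL = re.compile(r"emit_dict_event\(")


def count_matches(root: Path, pattern: "re.Pattern[str]") -> dict[str, int]:
    """Return {relative path: match count} for every .py file with matches."""
    counts: dict[str, int] = {}
    for path in sorted(root.rglob("*.py")):
        hits = len(pattern.findall(path.read_text(encoding="utf-8")))
        if hits:
            counts[path.relative_to(root).as_posix()] = hits
    return counts


def demo() -> tuple[dict[str, int], dict[str, int]]:
    """Scan a throwaway tree holding one migrated and one legacy file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "migrated.py").write_text("self.emit_event(evt)\n", encoding="utf-8")
        (root / "legacy.py").write_text(
            "stratix.emit_dict_event('agent.input', {})\nself._stratix.emit(evt)\n",
            encoding="utf-8",
        )
        return count_matches(root, EMIT_SITE), count_matches(root, RESIDUAL)


if __name__ == "__main__":
    sites, residual = demo()
    print(sites)     # {'legacy.py': 2}
    print(residual)  # {'legacy.py': 1}
```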