feat(instrument): typed events Bundle #5 — pydantic_ai + semantic_kernel + strands by mmercuri · Pull Request #153 · LayerLens/stratix-python

mmercuri · 2026-05-10T19:33:37Z

Summary

Bundle #5 of 6 in the typed-events migration. Migrates all raw emit_dict_event(...) sites in 3 framework adapters to canonical Pydantic models from PR #129's _compat/events.py. Follows the pattern proven in PR #138 (Bundle #2), PR #151 (Bundle #3), PR #152 (Bundle #4).

Per-adapter delivery (actual emission counts vs. estimate)

Adapter	Estimated	Actual	Files modified	mypy --strict	Tests
`pydantic_ai`	~10	10	2	clean	15/15
`semantic_kernel`	~10	10	5 (incl. 3 restored)	clean	14/14
`strands`	~10	10	2	clean	15/15
Total	~30	30	9	clean	44/44

Counts are grep-verified per CLAUDE.md's "no fake claims" rule, using grep -rE '\.emit\(|stratix\.emit\(|emit_dict_event\(' frameworks/<adapter>/ against the foundation branch. Each estimate matched exactly.
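The same count can be reproduced without a shell. The following is a minimal sketch under two assumptions: the adapter sources are plain `.py` files under the adapter directory, and the grep pattern above fully defines an emission site. `count_in_text` and `count_emission_sites` are hypothetical helpers and are not part of the repository.

```python
import re
from pathlib import Path

# Same alternation as the grep command above. `\.emit\(` already matches
# `stratix.emit(`, so that branch is redundant; it is kept for parity.
EMIT_PATTERN = re.compile(r"\.emit\(|stratix\.emit\(|emit_dict_event\(")


def count_in_text(source: str) -> int:
    """Count matching *lines*, as `grep -rE ... | wc -l` would."""
    return sum(1 for line in source.splitlines() if EMIT_PATTERN.search(line))


def count_emission_sites(adapter_dir: Path) -> int:
    """Sum matching lines over every .py file below adapter_dir."""
    return sum(
        count_in_text(path.read_text(encoding="utf-8"))
        for path in sorted(adapter_dir.rglob("*.py"))
    )
```

Like grep, this counts matching lines, so two calls on the same line count once. Post-migration `self.emit_event(...)` calls do not match the pattern, because `.emit_event(` has no `(` directly after `emit`.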

Notable design decisions

Common pattern (mirroring PR #138/#151/#152)

_stringify helper coerces arbitrary inputs/outputs to str for the canonical MessageContent.message slot; original payload preserved on metadata.raw_input / raw_output.
_coerce_to_dict helper wraps non-dict tool I/O in {"value": ...} so the canonical ToolCallEvent.input / output dict slots are satisfied.
_sha256_of (pydantic_ai only, for on_handoff) emits canonical sha256:<hex64> hashes for AgentHandoffEvent.handoff_context_hash.
All three adapters set ALLOW_UNREGISTERED_EVENTS = False — strict canonical taxonomy.
All three rely on BaseAdapter.emit_event for org_id stamping (PR #129 multi-tenancy guarantee).

Adapter-specific mappings

pydantic_ai (feat(instrument): migrate pydantic_ai to typed events (Bundle #5) — 0d784b1)

PydanticAI tools are in-process Python callables (@agent.tool) → IntegrationType.LIBRARY.
Ad-hoc agent.state.change run_complete/run_failed marker collapsed into AgentOutputEvent.metadata.run_status (PR #151 ms_agent_framework precedent). The marker did not satisfy the canonical AgentStateChangeEvent.before_hash/after_hash contract, because the run boundary has no real state mutation to hash.
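The run-status collapse can be illustrated with a stand-in type. `AgentOutput` below is a hypothetical dataclass standing in for the canonical `AgentOutputEvent`. The real model lives in `layerlens.instrument._compat.events`, and its fields and `create(...)` signature are not reproduced here. The sketch only shows the completion marker moving off a separate state-change payload and onto the output event's metadata. The `raw_output` key follows the common-pattern bullets above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentOutput:
    """Hypothetical stand-in for the canonical AgentOutputEvent."""
    message: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_run_end_output(output: Any, error: Optional[BaseException]) -> AgentOutput:
    """Build the single event emitted at the end of a run.

    Before the migration, a second ad-hoc `agent.state.change` payload carried
    run_complete / run_failed; here that marker rides on metadata.run_status.
    """
    run_status = "run_failed" if error is not None else "run_complete"
    return AgentOutput(
        message="" if output is None else str(output),
        metadata={"run_status": run_status, "raw_output": output},
    )
```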

semantic_kernel (feat(instrument): migrate semantic_kernel to typed events (Bundle #5) — 63352e0)

SK plugins/skills are in-process KernelFunction Python callables → IntegrationType.LIBRARY (matches the user-feedback BYOK model: SK plugins are libraries, not external services). Memory backends (qdrant, redis, etc.) are recorded as backend_type provenance — the call is in-process even when the storage is remote.
The ad-hoc agent.code event type (used for on_prompt_render and on_planner_step) is NOT in the canonical 13-event taxonomy. It is re-mapped onto ToolLogicEvent (L5b: tool business logic), following the PR #138 smolagents precedent of collapsing non-canonical types onto the closest semantic match. Per-event provenance is stored as JSON-encoded key=value entries on ToolLogicInfo.rules.
Pre-existing repo cleanup: the foundation branch shipped only lifecycle.py for semantic_kernel; __init__.py, filters.py, metadata.py were missing (the test suite imports ADAPTER_CLASS from the package, which without __init__.py errored at collection). Restored verbatim from main — not a behaviour change, just restoring the package layer the foundation's test module already required.

strands (feat(instrument): migrate strands to typed events (Bundle #5) — c52d216)

AWS Strands tools execute as in-process Python callables in the host runtime → IntegrationType.LIBRARY. This deliberately differs from bedrock_agents (PR #152), which uses IntegrationType.SERVICE because bedrock_agents tool execution is performed by the Bedrock service via Lambda action groups. Strands tool execution is in-process even though the framework targets AWS.
Default provider falls back to bedrock (Strands' primary deployment target).
Two ad-hoc agent.state.change payloads collapsed: (a) run boundary (run_complete/run_failed) → AgentOutputEvent.metadata.run_status; (b) conversation boundary (conversation_update + turn_count) → conversation_state + turn_count on the metadata of a paired AgentOutputEvent with an empty message. Neither payload satisfied the canonical before_hash/after_hash contract.
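The following sketch shows the two strands output events that replace the legacy state-change payloads. `AgentOutput` is a hypothetical stand-in for `AgentOutputEvent`, not the real model. The value stored under `conversation_state` is assumed to be the legacy `conversation_update` marker string, since the description does not specify it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentOutput:
    """Hypothetical stand-in for the canonical AgentOutputEvent."""
    message: str
    metadata: dict[str, Any] = field(default_factory=dict)


def post_migration_outputs(output: str, failed: bool, turn_count: int) -> list[AgentOutput]:
    """Return the events that replace the two legacy agent.state.change payloads.

    1. The real output, with run_status replacing the run-boundary marker.
    2. A paired output with an empty message that carries the conversation
       boundary (conversation_state + turn_count).
    """
    status = "run_failed" if failed else "run_complete"
    return [
        AgentOutput(message=output, metadata={"run_status": status}),
        AgentOutput(
            message="",
            metadata={"conversation_state": "conversation_update", "turn_count": turn_count},
        ),
    ]
```

The empty message marks the second event as a signal carrier rather than real agent output, so downstream consumers can filter it out.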

Regression tests added (2 per adapter)

test_<adapter>_emits_typed_payloads_only: drives every emission path and asserts that every captured payload is a Pydantic model instance from the canonical set; the RecordingStratix typed_payloads list grows with every emission.
test_<adapter>_emit_does_not_warn_after_migration: runs under warnings.simplefilter("error", DeprecationWarning), so any residual emit_dict_event call raises and fails the test.

Plus an extra strands regression: test_conversation_state_carried_on_agent_output_metadata for the conversation-state collapse.
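The `_does_not_warn_after_migration` tests rely on escalating `DeprecationWarning` to an exception. Here is a self-contained sketch of that pattern. It assumes, as the test name implies, that the legacy `emit_dict_event` path issues a `DeprecationWarning`. `legacy_emit_dict_event` and `typed_emit` are hypothetical placeholders for the adapter's two emission paths, not repository functions.

```python
import warnings
from typing import Any, Callable


def legacy_emit_dict_event(event_type: str, payload: dict[str, Any]) -> None:
    """Placeholder for the deprecated dict-based emission path."""
    warnings.warn("emit_dict_event is deprecated; emit a typed model", DeprecationWarning, stacklevel=2)


def typed_emit(event: object) -> None:
    """Placeholder for the typed emit_event path (issues no warning)."""


def emits_without_deprecation(fn: Callable[..., None], *args: Any) -> bool:
    """Return True if calling fn(*args) raises no DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate DeprecationWarning to an exception inside this block only;
        # catch_warnings restores the previous filters on exit.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn(*args)
        except DeprecationWarning:
            return False
    return True
```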

Verification

uv run mypy --strict src/layerlens/instrument/adapters/frameworks/pydantic_ai     # clean (2 files)
uv run mypy --strict src/layerlens/instrument/adapters/frameworks/semantic_kernel # clean (4 files)
uv run mypy --strict src/layerlens/instrument/adapters/frameworks/strands         # clean (2 files)

python -m pytest tests/instrument/adapters/frameworks/test_pydantic_ai_adapter.py     # 15/15
python -m pytest tests/instrument/adapters/frameworks/test_semantic_kernel_adapter.py # 14/14
python -m pytest tests/instrument/adapters/frameworks/test_strands_adapter.py         # 15/15

# Broader regression across healthy adapters (autogen/crewai/langfuse have
# pre-existing missing-module import errors unrelated to Bundle #5):
python -m pytest tests/instrument/adapters/frameworks/test_{pydantic_ai,semantic_kernel,strands,smolagents,agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py
# 132/132 passing

Verified zero residual emit_dict_event calls in any of the 3 migrated adapters via grep.

Test plan

mypy --strict clean per adapter
Per-adapter test suite passing
Broader regression across healthy adapters
Zero residual emit_dict_event in scope (grep)
Two regression tests per adapter assert typed payloads + no DeprecationWarning
PR pattern matches PR #138 / PR #151 / PR #152

Migrate all 10 emission sites in pydantic_ai lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_usage cost -> CostRecordEvent
- _extract_run_usage model response -> ModelInvokeEvent
- _extract_run_usage tool-return -> ToolCallEvent
- on_run_start -> AgentInputEvent
- on_run_end (output) -> AgentOutputEvent
- on_run_end (state.change) -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use -> ToolCallEvent
- on_llm_call -> ModelInvokeEvent
- on_handoff -> AgentHandoffEvent (sha256:<hex64>)
- _emit_agent_config -> EnvironmentConfigEvent
Total: 10 emission sites (matches estimate).

Behavioural change (matching PR #151 ms_agent_framework precedent): the previous adapter emitted an ad-hoc agent.state.change payload alongside agent.output to carry a run_complete / run_failed marker. That payload did not satisfy the canonical AgentStateChangeEvent before_hash / after_hash contract (the run boundary has no real state mutation to hash). The post-migration mapping carries the same signal as run_status on the AgentOutputEvent metadata, preserving the cross-cutting completion marker without violating the canonical schema. Test coverage updated to assert agent.state.change is no longer emitted and run_status is on AgentOutputEvent.metadata.

PydanticAI tools are in-process Python callables (@agent.tool), so the L5a integration is IntegrationType.LIBRARY. PydanticAI agents run in a 'simulated' environment (the host application is responsible for emitting the real cloud / on_prem environment record).

Helpers added: _stringify, _coerce_to_dict, _sha256_of.

Regression tests added:
- test_pydantic_ai_emits_typed_payloads_only
- test_pydantic_ai_emit_does_not_warn_after_migration

mypy --strict: clean (2 source files)
pytest: 15/15 passing

Migrate all 10 emission sites in semantic_kernel lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- on_function_start environment.config -> EnvironmentConfigEvent
- on_function_end tool.call -> ToolCallEvent (LIBRARY)
- on_prompt_render agent.code -> ToolLogicEvent (L5b, see below)
- on_model_invoke model.invoke -> ModelInvokeEvent
- on_model_invoke cost.record -> CostRecordEvent
- on_planner_step agent.code -> ToolLogicEvent (L5b)
- on_memory_operation tool.call -> ToolCallEvent (LIBRARY)
- on_kernel_invoke_start agent.input -> AgentInputEvent
- on_kernel_invoke_end agent.output -> AgentOutputEvent
- _discover_plugins environment.config -> EnvironmentConfigEvent
Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- SK plugins/skills are in-process Python callables registered through KernelFunction. The L5a integration is therefore IntegrationType.LIBRARY for both function invocations and memory operations. Memory backends (qdrant, redis, etc.) are recorded as backend_type provenance; the *call* is still in-process even when the storage is remote.
- The legacy ad-hoc 'agent.code' event type (used for prompt rendering and planner step boundaries) is NOT in the canonical 13-event taxonomy. Following the PR #138 smolagents precedent (collapsing non-canonical types onto the closest semantic match), those boundaries are re-mapped onto ToolLogicEvent (L5b: tool business logic). The semantic fit is good: prompt rendering applies templating rules to a template, and planner steps emit reasoning rules (thought/action/observation). The L5b description slot carries the operation summary; per-event provenance (framework, function name, planner status, etc.) is encoded as JSON-encoded key=value rule entries on the ToolLogicInfo.rules list.

Pre-existing repo cleanup: the foundation branch shipped only lifecycle.py for semantic_kernel; __init__.py, filters.py, and metadata.py were missing (the test suite imports ADAPTER_CLASS from the package, which without __init__.py errored at collection time). The missing files were re-added verbatim from main. This is not a behaviour change; it only restores the package layer required to run the test module the foundation branch already shipped.

Helpers added: _stringify, _coerce_to_dict, _kv_rule.

Regression tests added:
- test_semantic_kernel_emits_typed_payloads_only
- test_semantic_kernel_emit_does_not_warn_after_migration

mypy --strict: clean (4 source files)
pytest: 14/14 passing

Migrate all 10 emission sites in strands lifecycle.py from emit_dict_event(...) to emit_event(TypedModel.create(...)) using the canonical Pydantic models from layerlens.instrument._compat.events (PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_details model.invoke -> ModelInvokeEvent
- _extract_run_details cost.record -> CostRecordEvent
- _extract_run_details tool.call -> ToolCallEvent (LIBRARY)
- _extract_run_details state.change -> COLLAPSED into AgentOutputEvent metadata
- on_run_start -> AgentInputEvent
- on_run_end (output) -> AgentOutputEvent
- on_run_end (state.change) -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use -> ToolCallEvent (LIBRARY)
- on_llm_call -> ModelInvokeEvent
- _emit_agent_config -> EnvironmentConfigEvent
Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- AWS Strands tools execute as in-process Python callables in the host runtime. Even when the underlying capability is an AWS service, the *call* boundary instrumented here is the in-process invocation, so the integration is IntegrationType.LIBRARY. This deliberately differs from the bedrock_agents adapter's IntegrationType.SERVICE mapping: bedrock_agents tool execution is performed by the Bedrock service via Lambda action groups, not in-process.
- Strands defaults to AWS Bedrock when no provider is explicitly supplied, matching the framework's primary deployment target.

Behavioural change (matching PR #151 ms_agent_framework precedent): the previous adapter emitted two ad-hoc agent.state.change payloads: (a) at the run boundary (run_complete / run_failed), and (b) at the conversation boundary (conversation_update + turn_count). Neither satisfied the canonical AgentStateChangeEvent before_hash / after_hash contract:
- the run boundary has no real state mutation to hash;
- the conversation boundary does not surface a hashable state snapshot.

Post-migration:
- run_complete / run_failed is carried as run_status on AgentOutputEvent.metadata;
- conversation_update + turn_count is carried as conversation_state + turn_count on a paired AgentOutputEvent with an empty message.
This preserves both cross-cutting completion signals without violating the canonical schema. Test coverage updated to assert agent.state.change is no longer emitted and the markers live on agent.output metadata.

Helpers added: _stringify, _coerce_to_dict.

Regression tests added:
- test_strands_emits_typed_payloads_only
- test_strands_emit_does_not_warn_after_migration
- test_conversation_state_carried_on_agent_output_metadata

mypy --strict: clean (2 source files)
pytest: 15/15 passing
broader regression: 132/132 passing across healthy adapters

mmercuri added 3 commits May 10, 2026 11:31

mmercuri requested a review from m-peko May 10, 2026 19:33

m-peko closed this May 21, 2026

mmercuri wants to merge 3 commits into feat/instrument-typed-events-foundation from feat/instrument-typed-events-bundle-5-pydantic-ai-semantic-kernel-strands