
feat(instrument): typed events Bundle #5 — pydantic_ai + semantic_kernel + strands#153

Closed
mmercuri wants to merge 3 commits into feat/instrument-typed-events-foundation from feat/instrument-typed-events-bundle-5-pydantic-ai-semantic-kernel-strands


Summary

Bundle #5 of 6 in the typed-events migration. Migrates all raw emit_dict_event(...) sites in 3 framework adapters to the canonical Pydantic models from PR #129's _compat/events.py, following the pattern established in PR #138 (Bundle #2), PR #151 (Bundle #3), and PR #152 (Bundle #4).

Per-adapter delivery (actual emission counts vs. estimate)

| Adapter | Estimated | Actual | Files modified | mypy --strict | Tests |
|---|---|---|---|---|---|
| pydantic_ai | ~10 | 10 | 2 | clean | 15/15 |
| semantic_kernel | ~10 | 10 | 5 (incl. 3 restored) | clean | 14/14 |
| strands | ~10 | 10 | 2 | clean | 15/15 |
| Total | ~30 | 30 | 9 | clean | 44/44 |

Counts are grep-verified per the CLAUDE.md "no fake claims" rule: grep -rE '\.emit\(|stratix\.emit\(|emit_dict_event\(' frameworks/<adapter>/ run against the foundation branch. The estimate matched exactly.
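For reviewers who want to re-check the count without a shell, the grep can be mirrored in Python. This is a minimal sketch, not tooling from the PR; count_emit_sites and count_emit_sites_in_dir are hypothetical names, and like grep without -c deduplication it counts matching lines rather than individual calls.

```python
import re
from pathlib import Path

# Same alternation as the grep -E pattern quoted above.
EMIT_PATTERN = re.compile(r"\.emit\(|stratix\.emit\(|emit_dict_event\(")


def count_emit_sites(source: str) -> int:
    """Count source lines containing at least one emission call (grep-style)."""
    return sum(1 for line in source.splitlines() if EMIT_PATTERN.search(line))


def count_emit_sites_in_dir(adapter_dir: Path) -> int:
    """Sum matching lines over every .py file under one adapter directory."""
    return sum(
        count_emit_sites(path.read_text(encoding="utf-8"))
        for path in sorted(adapter_dir.rglob("*.py"))
    )


sample = '''
self.emit_dict_event("agent.input", payload)
self._stratix.emit(event)
log.info("no emission here")
'''
print(count_emit_sites(sample))  # 2
```

Note that any line matching stratix\.emit\( also matches \.emit\(, so the middle alternative does not change the count; migrated emit_event( calls match none of the alternatives.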

Notable design decisions

Common pattern (mirroring PR #138/#151/#152)

  • _stringify helper coerces arbitrary inputs/outputs to str for the canonical MessageContent.message slot; original payload preserved on metadata.raw_input / raw_output.
  • _coerce_to_dict helper wraps non-dict tool I/O in {"value": ...} so the canonical ToolCallEvent.input / output dict slots are satisfied.
  • _sha256_of (where applicable) emits canonical sha256:<hex64> hashes for AgentHandoffEvent.handoff_context_hash.
  • All three adapters set ALLOW_UNREGISTERED_EVENTS = False — strict canonical taxonomy.
  • All three rely on BaseAdapter.emit_event for org_id stamping (PR #129 multi-tenancy guarantee).

Adapter-specific mappings

pydantic_ai (commit 0d784b1)

  • PydanticAI tools are in-process Python callables (@agent.tool) → IntegrationType.LIBRARY.
  • Ad-hoc agent.state.change run_complete/run_failed marker collapsed into AgentOutputEvent.metadata.run_status (PR #151 ms_agent_framework precedent). The marker did not satisfy the canonical AgentStateChangeEvent.before_hash/after_hash contract, because the run boundary has no real state mutation to hash.
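The run-status collapse can be sketched as follows. AgentOutputStub is a hypothetical dataclass standing in for the canonical AgentOutputEvent (whose real constructor is not shown here); only the run_status key and its two values come from this description.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentOutputStub:
    """Hypothetical stand-in for the canonical AgentOutputEvent payload."""
    message: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_run_end_output(output: str, error: Optional[BaseException]) -> AgentOutputStub:
    # Before: agent.output plus a separate agent.state.change carrying the marker.
    # After: one output event; the marker rides on its metadata.
    status = "run_failed" if error is not None else "run_complete"
    return AgentOutputStub(message=output, metadata={"run_status": status})


print(build_run_end_output("42", None).metadata)  # {'run_status': 'run_complete'}
```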

semantic_kernel (commit 63352e0)

  • SK plugins/skills are in-process KernelFunction Python callables → IntegrationType.LIBRARY (matches the user-feedback BYOK model: SK plugins are libraries, not external services). Memory backends (qdrant, redis, etc.) are recorded as backend_type provenance — the call is in-process even when the storage is remote.
  • The ad-hoc agent.code event type (used for on_prompt_render and on_planner_step) is NOT in the canonical 13-event taxonomy. It is re-mapped onto ToolLogicEvent (L5b, tool business logic), following the PR #138 smolagents precedent of collapsing non-canonical types onto the closest semantic match. Per-event provenance is encoded as JSON-encoded key=value rule entries on ToolLogicInfo.rules.
  • Pre-existing repo cleanup: the foundation branch shipped only lifecycle.py for semantic_kernel; __init__.py, filters.py, metadata.py were missing (the test suite imports ADAPTER_CLASS from the package, which without __init__.py errored at collection). Restored verbatim from main — not a behaviour change, just restoring the package layer the foundation's test module already required.

strands (commit c52d216)

  • AWS Strands tools execute as in-process Python callables in the host runtime → IntegrationType.LIBRARY. This deliberately differs from bedrock_agents (PR #152), which uses IntegrationType.SERVICE because bedrock_agents tool execution is performed by the Bedrock service via Lambda action groups; Strands tool execution is in-process even though the framework targets AWS.
  • Default provider falls back to bedrock (Strands' primary deployment target).
  • TWO ad-hoc agent.state.change payloads collapsed: (a) run boundary (run_complete/run_failed) → AgentOutputEvent.metadata.run_status; (b) conversation boundary (conversation_update + turn_count) → paired AgentOutputEvent.metadata.conversation_state. Neither satisfied the canonical hash contract.
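A minimal sketch of both Strands collapses, again with a hypothetical AgentOutputStub in place of the real AgentOutputEvent. The metadata keys (run_status, conversation_state, turn_count) and the empty message on the paired event come from the strands commit message; using "conversation_update" as the conversation_state value is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentOutputStub:
    """Hypothetical stand-in for the canonical AgentOutputEvent payload."""
    message: str
    metadata: dict[str, Any] = field(default_factory=dict)


def collapsed_strands_outputs(output: str, failed: bool, turn_count: int) -> list[AgentOutputStub]:
    # (a) Run boundary: run_complete/run_failed rides on the real output event.
    primary = AgentOutputStub(
        message=output,
        metadata={"run_status": "run_failed" if failed else "run_complete"},
    )
    # (b) Conversation boundary: a paired output event with an empty message
    # replaces the second agent.state.change payload.
    paired = AgentOutputStub(
        message="",
        metadata={"conversation_state": "conversation_update", "turn_count": turn_count},
    )
    return [primary, paired]
```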

Regression tests added (2 per adapter)

  • test_<adapter>_emits_typed_payloads_only: drives every emission path and asserts every captured payload is a Pydantic model instance from the canonical set; the RecordingStratix typed_payloads list grows for every emission.
  • test_<adapter>_emit_does_not_warn_after_migration: warnings.simplefilter("error", DeprecationWarning) turns any residual emit_dict_event call into a test failure.

Plus an extra strands regression: test_conversation_state_carried_on_agent_output_metadata for the conversation-state collapse.
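The shape of the two per-adapter regression tests can be sketched with stand-ins. RecordingSink, StubAdapter, and ToolCallStub are hypothetical substitutes for RecordingStratix, a real adapter, and a canonical model; the sketch assumes, as the second test implies, that emit_dict_event raises a DeprecationWarning.

```python
import warnings
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCallStub:
    """Hypothetical stand-in for one canonical typed event model."""
    name: str


# Hypothetical "canonical set"; the real tests check the _compat.events models.
CANONICAL_TYPES = (ToolCallStub,)


class RecordingSink:
    """Hypothetical recorder, loosely modelled on RecordingStratix."""
    def __init__(self) -> None:
        self.payloads: list[Any] = []


class StubAdapter:
    def __init__(self, sink: RecordingSink) -> None:
        self.sink = sink

    def emit_dict_event(self, event_type: str, payload: dict[str, Any]) -> None:
        # Legacy path: deprecated, so any residual call emits a DeprecationWarning.
        warnings.warn("emit_dict_event is deprecated", DeprecationWarning, stacklevel=2)
        self.sink.payloads.append(payload)

    def emit_event(self, event: Any) -> None:
        # Migrated path: records the typed model instance as-is.
        self.sink.payloads.append(event)

    def on_tool_use(self, name: str) -> None:
        self.emit_event(ToolCallStub(name=name))


def test_emits_typed_payloads_only() -> None:
    sink = RecordingSink()
    StubAdapter(sink).on_tool_use("search")
    assert sink.payloads, "every emission path should record a payload"
    assert all(isinstance(p, CANONICAL_TYPES) for p in sink.payloads)


def test_emit_does_not_warn_after_migration() -> None:
    adapter = StubAdapter(RecordingSink())
    with warnings.catch_warnings():
        # Promote DeprecationWarning to an exception so a legacy call fails the test.
        warnings.simplefilter("error", DeprecationWarning)
        adapter.on_tool_use("search")
```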

Verification

```bash
uv run mypy --strict src/layerlens/instrument/adapters/frameworks/pydantic_ai     # clean (2 files)
uv run mypy --strict src/layerlens/instrument/adapters/frameworks/semantic_kernel # clean (4 files)
uv run mypy --strict src/layerlens/instrument/adapters/frameworks/strands         # clean (2 files)

python -m pytest tests/instrument/adapters/frameworks/test_pydantic_ai_adapter.py     # 15/15
python -m pytest tests/instrument/adapters/frameworks/test_semantic_kernel_adapter.py # 14/14
python -m pytest tests/instrument/adapters/frameworks/test_strands_adapter.py         # 15/15

# Broader regression across healthy adapters (autogen/crewai/langfuse have
# pre-existing missing-module import errors unrelated to Bundle #5):
python -m pytest tests/instrument/adapters/frameworks/test_{pydantic_ai,semantic_kernel,strands,smolagents,agno,bedrock_agents,google_adk,llama_index,ms_agent_framework,openai_agents}_adapter.py
# 132/132 passing
```

Verified zero residual emit_dict_event calls in any of the 3 migrated adapters via grep.

Test plan

mmercuri added 3 commits May 10, 2026 11:31
0d784b1 feat(instrument): migrate pydantic_ai to typed events (Bundle #5)

Migrate all 10 emission sites in pydantic_ai lifecycle.py from
emit_dict_event(...) to emit_event(TypedModel.create(...)) using the
canonical Pydantic models from layerlens.instrument._compat.events
(PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_usage cost            -> CostRecordEvent
- _extract_run_usage model response  -> ModelInvokeEvent
- _extract_run_usage tool-return     -> ToolCallEvent
- on_run_start                       -> AgentInputEvent
- on_run_end (output)                -> AgentOutputEvent
- on_run_end (state.change)          -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use                        -> ToolCallEvent
- on_llm_call                        -> ModelInvokeEvent
- on_handoff                         -> AgentHandoffEvent (sha256:<hex64>)
- _emit_agent_config                 -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Behavioural change (matching PR #151 ms_agent_framework precedent):
the previous adapter emitted an ad-hoc agent.state.change payload
alongside agent.output to carry a run_complete / run_failed marker.
That payload did not satisfy the canonical AgentStateChangeEvent
before_hash / after_hash contract (the run boundary has no real state
mutation to hash). The post-migration mapping carries the same signal
as run_status on the AgentOutputEvent metadata, preserving the
cross-cutting completion marker without violating the canonical
schema. Test coverage updated to assert agent.state.change is no
longer emitted and run_status is on AgentOutputEvent.metadata.

PydanticAI tools are in-process Python callables (@agent.tool), so
the L5a integration is IntegrationType.LIBRARY. PydanticAI agents
run in a 'simulated' environment (the host application is responsible
for emitting the real cloud / on_prem environment record).

Helpers added: _stringify, _coerce_to_dict, _sha256_of.

Regression tests added:
- test_pydantic_ai_emits_typed_payloads_only
- test_pydantic_ai_emit_does_not_warn_after_migration

mypy --strict: clean (2 source files)
pytest: 15/15 passing

63352e0 feat(instrument): migrate semantic_kernel to typed events (Bundle #5)

Migrate all 10 emission sites in semantic_kernel lifecycle.py from
emit_dict_event(...) to emit_event(TypedModel.create(...)) using the
canonical Pydantic models from layerlens.instrument._compat.events
(PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- on_function_start environment.config -> EnvironmentConfigEvent
- on_function_end tool.call            -> ToolCallEvent (LIBRARY)
- on_prompt_render agent.code          -> ToolLogicEvent (L5b — see below)
- on_model_invoke model.invoke         -> ModelInvokeEvent
- on_model_invoke cost.record          -> CostRecordEvent
- on_planner_step agent.code           -> ToolLogicEvent (L5b)
- on_memory_operation tool.call        -> ToolCallEvent (LIBRARY)
- on_kernel_invoke_start agent.input   -> AgentInputEvent
- on_kernel_invoke_end agent.output    -> AgentOutputEvent
- _discover_plugins environment.config -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- SK plugins/skills are in-process Python callables registered
  through KernelFunction. The L5a integration is therefore
  IntegrationType.LIBRARY for both function invocations and memory
  operations. Memory backends (qdrant, redis, etc.) are recorded as
  backend_type provenance — the *call* is still in-process even
  when the storage is remote.
- The legacy ad-hoc 'agent.code' event type (used for prompt
  rendering and planner step boundaries) is NOT in the canonical
  13-event taxonomy. Following the PR #138 smolagents precedent
  (collapsing non-canonical types onto the closest semantic match),
  those boundaries are re-mapped onto ToolLogicEvent (L5b — tool
  business logic). The semantic fit is good: prompt rendering
  applies templating rules to a template; planner steps emit
  reasoning rules (thought/action/observation). The L5b
  description slot carries the operation summary; per-event
  provenance (framework, function name, planner status, etc.) is
  encoded as JSON-encoded key=value rule entries on the
  ToolLogicInfo.rules list.

Pre-existing repo cleanup: the foundation branch shipped only
lifecycle.py for semantic_kernel — __init__.py, filters.py, and
metadata.py were missing (the test suite imports ADAPTER_CLASS from
the package, which without __init__.py errored at collection time).
The missing files were re-added verbatim from main; this is not a
behaviour change, only restoring the package layer required to run
the test module the foundation branch already shipped.

Helpers added: _stringify, _coerce_to_dict, _kv_rule.

Regression tests added:
- test_semantic_kernel_emits_typed_payloads_only
- test_semantic_kernel_emit_does_not_warn_after_migration

mypy --strict: clean (4 source files)
pytest: 14/14 passing
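The semantic_kernel site list amounts to a fixed mapping from legacy dotted event types to canonical models. The lookup below is a hypothetical review aid, not code from the PR; it assumes that ALLOW_UNREGISTERED_EVENTS = False means anything outside the canonical set is rejected rather than passed through.

```python
# Legacy dotted event type -> canonical model name, per the semantic_kernel site list.
LEGACY_TO_CANONICAL: dict[str, str] = {
    "environment.config": "EnvironmentConfigEvent",
    "tool.call": "ToolCallEvent",
    "agent.code": "ToolLogicEvent",  # non-canonical type collapsed onto L5b
    "model.invoke": "ModelInvokeEvent",
    "cost.record": "CostRecordEvent",
    "agent.input": "AgentInputEvent",
    "agent.output": "AgentOutputEvent",
}


def canonical_model_for(legacy_type: str) -> str:
    try:
        return LEGACY_TO_CANONICAL[legacy_type]
    except KeyError:
        # Strict taxonomy (assumed semantics): unknown types are an error.
        raise ValueError(f"unregistered event type: {legacy_type!r}") from None
```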

c52d216 feat(instrument): migrate strands to typed events (Bundle #5)

Migrate all 10 emission sites in strands lifecycle.py from
emit_dict_event(...) to emit_event(TypedModel.create(...)) using the
canonical Pydantic models from layerlens.instrument._compat.events
(PR #129 foundation).

Sites migrated (grep-counted, not estimated):
- _extract_run_details model.invoke    -> ModelInvokeEvent
- _extract_run_details cost.record     -> CostRecordEvent
- _extract_run_details tool.call       -> ToolCallEvent (LIBRARY)
- _extract_run_details state.change    -> COLLAPSED into AgentOutputEvent metadata
- on_run_start                         -> AgentInputEvent
- on_run_end (output)                  -> AgentOutputEvent
- on_run_end (state.change)            -> COLLAPSED into AgentOutputEvent metadata
- on_tool_use                          -> ToolCallEvent (LIBRARY)
- on_llm_call                          -> ModelInvokeEvent
- _emit_agent_config                   -> EnvironmentConfigEvent

Total: 10 emission sites (matches estimate).

Vendor-concept mappings:
- AWS Strands tools execute as in-process Python callables in the
  host runtime — even when the underlying capability is an AWS
  service, the *call* boundary instrumented here is the in-process
  invocation. Integration is therefore IntegrationType.LIBRARY.
  This deliberately differs from the bedrock_agents adapter's
  IntegrationType.SERVICE mapping — bedrock_agents tool execution
  is performed by the Bedrock service via Lambda action groups,
  not in-process.
- Strands defaults to AWS Bedrock when no provider is explicitly
  supplied, matching the framework's primary deployment target.

Behavioural change (matching PR #151 ms_agent_framework precedent):
the previous adapter emitted two ad-hoc agent.state.change payloads:
  (a) at the run boundary (run_complete / run_failed), and
  (b) at the conversation boundary (conversation_update + turn_count).
Neither satisfied the canonical AgentStateChangeEvent before_hash /
after_hash contract:
  - the run boundary has no real state mutation to hash,
  - the conversation boundary does not surface a hashable state
    snapshot.
Post-migration:
  - run_complete / run_failed is carried as run_status on
    AgentOutputEvent.metadata,
  - conversation_update + turn_count is carried as
    conversation_state + turn_count on a paired AgentOutputEvent
    with empty message,
preserving both cross-cutting completion signals without violating
the canonical schema. Test coverage updated to assert
agent.state.change is no longer emitted and the markers live on
agent.output metadata.

Helpers added: _stringify, _coerce_to_dict.

Regression tests added:
- test_strands_emits_typed_payloads_only
- test_strands_emit_does_not_warn_after_migration
- test_conversation_state_carried_on_agent_output_metadata

mypy --strict: clean (2 source files)
pytest: 15/15 passing
broader regression: 132/132 passing across healthy adapters
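The IntegrationType choice across the four adapters discussed in these commits reduces to one question: where does the instrumented tool call execute? The sketch below mirrors only the two member names used here; the enum values, integration_for, and ADAPTER_TOOL_EXECUTION are hypothetical.

```python
from enum import Enum


class IntegrationType(str, Enum):
    """Hypothetical mirror of the two members named above (values assumed)."""
    LIBRARY = "library"
    SERVICE = "service"


def integration_for(executes_in_process: bool) -> IntegrationType:
    # The instrumented boundary is where the call runs, not where data lives:
    # an in-process callable that talks to AWS or a remote store is still LIBRARY.
    return IntegrationType.LIBRARY if executes_in_process else IntegrationType.SERVICE


ADAPTER_TOOL_EXECUTION = {
    "pydantic_ai": True,      # @agent.tool callables
    "semantic_kernel": True,  # KernelFunction plugins, including memory operations
    "strands": True,          # in-process callables in the host runtime
    "bedrock_agents": False,  # executed by Bedrock via Lambda action groups
}
```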
mmercuri requested a review from m-peko May 10, 2026 19:33
m-peko closed this May 21, 2026