
feat(instrument): Typed Pydantic events — autogen + crewai + smolagents (3 adapters / ~23 sites)#138

Closed
mmercuri wants to merge 1 commit into feat/instrument-typed-events-foundation from feat/instrument-typed-events-small-adapters


Conversation

@mmercuri

Contributor

Summary

Bundle 1 of 5 follow-up migrations to PR #129 (typed Pydantic event foundation + agno reference). This PR walks every emit_dict_event() site in the lifecycle.py modules of 3 small framework adapters (~23 sites total) and replaces each call with the typed emit_event() path, using canonical Pydantic payloads from layerlens.instrument._compat.events.

| Adapter | lifecycle.py sites | After |
| --- | --- | --- |
| autogen | 8 | 0 |
| crewai | 8 | 0 |
| smolagents | 7 | 0 |
| Total | 23 | 0 |

All 3 adapters set ALLOW_UNREGISTERED_EVENTS: bool = False, so they target the canonical 13-event taxonomy exclusively (the per-adapter extra="allow" decision is documented in docs/adapters/typed-events.md).

Per-adapter migration

autogen/lifecycle.py (8 → 0)

| Old emission | New typed emission |
| --- | --- |
| agent.handoff (on_send) | AgentHandoffEvent (sha256: hash from message preview) |
| agent.state.change (on_receive; no real hashes) | AgentInputEvent (role=AGENT, event_subtype=message_received on metadata) |
| model.invoke (on_generate_reply) | ModelInvokeEvent (provider auto-detected from model identifier) |
| tool.call + tool.environment (on_execute_code) | ToolCallEvent (name=code_execution, integration=SCRIPT) + ToolEnvironmentEvent |
| agent.input + agent.output (conversation start/end) | AgentInputEvent + AgentOutputEvent |
| environment.config | EnvironmentConfigEvent (env_type=simulated) |

crewai/lifecycle.py (8 → 0)

| Old emission | New typed emission |
| --- | --- |
| agent.input + agent.output (crew start/end) | AgentInputEvent + AgentOutputEvent |
| agent.code (task_start; NOT in canonical taxonomy) | AgentInputEvent (role=AGENT, event_subtype=task_start) |
| agent.state.change (task_end; no real hashes) | AgentOutputEvent (run_status=task_complete on metadata) |
| cost.record (token usage) | CostRecordEvent (canonical prompt_tokens / completion_tokens / tokens slots) |
| tool.call (on_tool_use) | ToolCallEvent |
| model.invoke (on_llm_call) | ModelInvokeEvent |
| environment.config | EnvironmentConfigEvent |

smolagents/lifecycle.py (7 → 0)

| Old emission | New typed emission |
| --- | --- |
| agent.input + agent.output (run start/end) | AgentInputEvent + AgentOutputEvent |
| tool.call (on_tool_use) | ToolCallEvent |
| model.invoke (on_llm_call) | ModelInvokeEvent |
| agent.handoff (on_handoff; was context_hash=None for empty context) | AgentHandoffEvent (canonical sha256:<hex64> always; hash of the empty string on empty context) |
| environment.config | EnvironmentConfigEvent |
| agent.code (CodeAgent execution; NOT in canonical taxonomy) | ToolCallEvent (name=code_execution, integration=SCRIPT) |
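
As a rough illustration of the agent.code remap in the tables above, the sketch below turns an ad-hoc dict into a tool-call-shaped payload. It uses stdlib dataclasses as stand-ins: ToolCallSketch, Integration, and remap_agent_code are hypothetical names, and the real ToolCallEvent is a Pydantic model whose exact fields may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Integration(str, Enum):
    # Hypothetical stand-in for the canonical integration enum.
    SCRIPT = "script"


@dataclass(frozen=True)
class ToolCallSketch:
    # Stand-in for the canonical ToolCallEvent payload (assumed shape).
    name: str
    integration: Integration
    input: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def remap_agent_code(legacy: dict[str, Any]) -> ToolCallSketch:
    """Turn an ad-hoc agent.code dict into a tool-call-shaped payload."""
    return ToolCallSketch(
        name="code_execution",
        integration=Integration.SCRIPT,
        input={"code": legacy.get("code", "")},
        # Adapter provenance moves into metadata, not new top-level fields.
        metadata={k: v for k, v in legacy.items() if k != "code"},
    )


event = remap_agent_code(
    {"code": "print(1)", "framework": "smolagents", "agent_name": "coder"}
)
```

The point of the shape is that the event type stays inside the canonical taxonomy while framework-specific keys survive in metadata.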

Cross-cutting decisions

  • Adapter-specific provenance (framework, agent_name, message_seq, event_subtype, task_description, task_order, etc.) moves into the canonical metadata / attributes / parameters / input slots — no ad-hoc top-level fields ship on the canonical schema. Mirrors the agno reference pattern.
  • Ad-hoc agent.code emissions (crewai task_start, smolagents code execution, autogen code execution) are remapped because agent.code is NOT in the canonical 13-event taxonomy. Code execution maps to ToolCallEvent(name="code_execution", integration=SCRIPT); task delegation maps to AgentInputEvent(role=AGENT).
  • No-hash agent.state.change emissions (autogen on_receive, crewai task_end) are remapped — the canonical AgentStateChangeEvent requires real before_hash / after_hash. The receive boundary maps to AgentInputEvent(role=AGENT); task completion maps to AgentOutputEvent with run_status on metadata.
  • Handoff context hashes are always emitted in sha256:<hex64> format — empty contexts hash the empty string. The previous adapters emitted None or bare hex, which the canonical validator rejects.
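
The handoff hash rule in the last bullet can be sketched with the standard library. The helper names context_hash and is_canonical_hash are hypothetical, and the regex is an assumption about what the canonical validator accepts, based only on the sha256:<hex64> format stated above.

```python
import hashlib
import re
from typing import Optional

# Assumed canonical format: "sha256:" followed by 64 lowercase hex digits.
_HASH_RE = re.compile(r"sha256:[0-9a-f]{64}")


def context_hash(context: Optional[str]) -> str:
    """Hash a handoff context; a missing or empty context hashes the empty string."""
    data = (context or "").encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


def is_canonical_hash(value: object) -> bool:
    """Return True only for strings in the sha256:<hex64> format."""
    return isinstance(value, str) and _HASH_RE.fullmatch(value) is not None
```

Under this sketch, the two shapes the old adapters emitted (None and bare hex) both fail is_canonical_hash, while context_hash(None) and context_hash("") produce the same valid value.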

extra="allow" policy decision

All 3 adapters set ALLOW_UNREGISTERED_EVENTS: bool = False. None of them ship custom event types outside the canonical taxonomy. Where the previous code emitted ad-hoc event types (agent.code), they are remapped to canonical alternatives rather than allowed through as unregistered. This matches the agno reference and keeps the schema-validation rejection guarantee intact.
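
One way to picture the rejection guarantee is the minimal gate below. It is a sketch, not the layerlens implementation: AdapterSketch, UnregisteredEventError, and the emit signature are hypothetical, and CANONICAL_EVENT_TYPES lists only the event types named in this PR rather than the full 13-event taxonomy.

```python
from typing import Any

# Subset of the canonical taxonomy: only the types this PR mentions.
CANONICAL_EVENT_TYPES = frozenset({
    "agent.input", "agent.output", "agent.handoff", "agent.state.change",
    "model.invoke", "tool.call", "tool.environment",
    "environment.config", "cost.record",
})


class UnregisteredEventError(ValueError):
    """Hypothetical error for event types outside the taxonomy."""


class AdapterSketch:
    # Mirrors the per-adapter flag described above.
    ALLOW_UNREGISTERED_EVENTS: bool = False

    def __init__(self) -> None:
        self.emitted: list[tuple[str, dict[str, Any]]] = []

    def emit(self, event_type: str, payload: dict[str, Any]) -> None:
        known = event_type in CANONICAL_EVENT_TYPES
        if not known and not self.ALLOW_UNREGISTERED_EVENTS:
            # With the flag off, ad-hoc types such as agent.code are rejected.
            raise UnregisteredEventError(event_type)
        self.emitted.append((event_type, payload))
```

With the flag left at False, an adapter that still emitted agent.code would fail loudly instead of shipping an unvalidated event, which is why the remap was needed.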

Test updates

  • All 3 _RecordingStratix doubles now record both legacy dict and typed Pydantic emissions (mirrors the agno reference). Existing assertions move from the ad-hoc dict shape to the canonical payload shape (e.g. payload["tool"]["name"] instead of payload["tool_name"]).
  • Each adapter gains 2 new regression tests:
    • test_<adapter>_emits_typed_payloads_only: every emit site is a typed emit_event() call.
    • test_<adapter>_emit_does_not_warn_after_migration: no DeprecationWarning fires from lifecycle.py paths (filterwarnings("error", DeprecationWarning)).
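
The two regression tests might look roughly like the sketch below. RecordingEmitter is a hypothetical double loosely modelled on the _RecordingStratix description, migrated_on_tool_use stands in for a migrated lifecycle hook, and the deprecation message is invented for illustration.

```python
import warnings
from typing import Any


class RecordingEmitter:
    """Hypothetical test double that records both emission paths."""

    def __init__(self) -> None:
        self.typed: list[Any] = []
        self.legacy: list[dict[str, Any]] = []

    def emit_event(self, payload: Any) -> None:
        self.typed.append(payload)

    def emit_dict_event(self, event_type: str, payload: dict[str, Any]) -> None:
        # Assumed behaviour: the legacy path warns on every call.
        warnings.warn("emit_dict_event is deprecated", DeprecationWarning, stacklevel=2)
        self.legacy.append({"type": event_type, **payload})


def migrated_on_tool_use(emitter: RecordingEmitter, tool_name: str) -> None:
    # Stand-in for a migrated lifecycle hook: typed path only.
    emitter.emit_event({"tool": {"name": tool_name}})


def check_no_warning_after_migration() -> RecordingEmitter:
    emitter = RecordingEmitter()
    with warnings.catch_warnings():
        # Any DeprecationWarning raised inside this block becomes an exception.
        warnings.simplefilter("error", DeprecationWarning)
        migrated_on_tool_use(emitter, "search")
    return emitter
```

An empty legacy list covers the typed-payloads-only check, and the error filter covers the no-warning check.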

Crewai delegation note

on_delegation routes through crewai/delegation.py which is untracked on this branch (see docs/adapters/typed-events-followups.md — "submodules untracked on this branch"). The delegation path still emits via emit_dict_event and will be migrated in a future follow-up PR. The 2 affected crewai tests (test_on_delegation_emits_handoff, test_capture_config_gates_l5a_tool_calls) suppress the expected DeprecationWarning with an inline pointer to the follow-up.
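
A narrowly scoped suppression of the kind described here could be sketched as follows. legacy_delegation_emit is a hypothetical stand-in for the not-yet-migrated delegation path; the actual tests may use a pytest marker instead.

```python
import warnings


def legacy_delegation_emit() -> str:
    # Stand-in for the delegation path that still uses emit_dict_event.
    warnings.warn("emit_dict_event is deprecated", DeprecationWarning, stacklevel=2)
    return "agent.handoff"


def run_delegation_quietly() -> str:
    with warnings.catch_warnings():
        # Expected until the delegation follow-up lands;
        # see docs/adapters/typed-events-followups.md.
        warnings.simplefilter("ignore", DeprecationWarning)
        return legacy_delegation_emit()
```

Scoping the filter to the single call keeps the suppression from hiding DeprecationWarnings raised elsewhere in the same test.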

Acceptance

  • grep -E "self\.emit_dict_event\(" $(git ls-files src/layerlens/instrument/adapters/frameworks/{autogen,crewai,smolagents}/) → 0 occurrences
  • uv run pytest tests/instrument/adapters/frameworks/test_{autogen,crewai,smolagents}_adapter.py -x → 43/43 pass (15 + 14 + 14)
  • uv run pytest tests/instrument/adapters/frameworks/ (excluding 4 pre-existing collection errors from untracked semantic_kernel / langfuse / bulk_ported_smoke / per_adapter_org_id modules) → 143/143 pass with DeprecationWarnings only from the 13 not-yet-migrated adapters (dual-path contract preserved)
  • uv run mypy --strict src/.../frameworks/{autogen,crewai,smolagents}/lifecycle.py → all 3 pass
  • uv run ruff check on changed files → all pass

Test plan

  • All 3 adapter test suites pass with new typed-event assertions
  • Regression tests confirm zero emit_dict_event calls on lifecycle paths
  • Regression tests confirm zero DeprecationWarning fires from lifecycle paths
  • No regression in 13 not-yet-migrated adapters — they still emit via emit_dict_event with DeprecationWarning (dual-path contract intact)
  • mypy --strict passes on all 3 lifecycle modules
  • ruff check passes on all changed files
  • Schema validation REJECTS invalid payloads (canonical sha256: hash format, required fields, etc.) — verified via the canonical models' built-in validators

References

…ents (3 adapters / 23 sites)

Bundle 1 follow-up to PR #129 (typed-event foundation + agno reference).

Walks every emit_dict_event() site in the lifecycle.py modules of 3
small framework adapters and replaces each call with the typed
emit_event() path:

* autogen/lifecycle.py — 8 sites
  - on_send             → AgentHandoffEvent (sha256: hash from message preview)
  - on_receive          → AgentInputEvent (role=AGENT, event_subtype on metadata)
  - on_generate_reply   → ModelInvokeEvent (provider auto-detected)
  - on_execute_code     → ToolCallEvent + ToolEnvironmentEvent (script integration)
  - on_conversation_start/end → AgentInputEvent + AgentOutputEvent
  - _emit_agent_config  → EnvironmentConfigEvent (env_type=simulated)

* crewai/lifecycle.py — 8 sites
  - on_crew_start/end   → AgentInputEvent + AgentOutputEvent
  - on_task_start       → AgentInputEvent (role=AGENT, event_subtype=task_start)
                          (was ad-hoc agent.code, NOT in canonical taxonomy)
  - on_task_end         → AgentOutputEvent (run_status=task_complete) +
                          canonical CostRecordEvent
  - on_tool_use         → ToolCallEvent
  - on_llm_call         → ModelInvokeEvent
  - _emit_agent_config  → EnvironmentConfigEvent

* smolagents/lifecycle.py — 7 sites
  - on_run_start/end    → AgentInputEvent + AgentOutputEvent
  - on_tool_use         → ToolCallEvent
  - on_llm_call         → ModelInvokeEvent
  - on_handoff          → AgentHandoffEvent (canonical sha256: hash format)
  - _emit_agent_config  → EnvironmentConfigEvent
  - _emit_code_execution → ToolCallEvent (name=code_execution, integration=SCRIPT)
                           (was ad-hoc agent.code)

Per-adapter ALLOW_UNREGISTERED_EVENTS = False on all 3 adapters — they
target the canonical 13-event taxonomy exclusively (per-adapter
extra='allow' decision documented per docs/adapters/typed-events.md).

Adapter-specific provenance (framework, agent_name, message_seq,
event_subtype, etc.) moves into the canonical metadata / attributes /
parameters / input slots; no ad-hoc top-level fields ship on the
canonical schema.

The previous 'agent.code' and 'agent.state.change' (no-hash) emissions
were rejected by the canonical schema. agent.code is remapped to
ToolCallEvent (code execution) or AgentInputEvent (role=AGENT, task
start); agent.state.change is remapped to AgentInputEvent (role=AGENT,
receive boundary) or AgentOutputEvent (run_status on metadata, task
completion). See the typed-events guide for the worked-example pattern.

Test updates:

* All 3 _RecordingStratix doubles now record both legacy dict and
  typed Pydantic emissions (mirrors the agno reference).
* Existing assertions migrate from ad-hoc dict shape to the canonical
  payload shape (e.g. payload['tool']['name'] instead of
  payload['tool_name']).
* New regression tests pin the post-migration contract:
  - test_<adapter>_emits_typed_payloads_only — every emit site is a
    typed emit_event() call (no legacy dict path).
  - test_<adapter>_emit_does_not_warn_after_migration — no
    DeprecationWarning fires from lifecycle.py paths.

Crewai delegation flow note: on_delegation routes through
crewai/delegation.py which is untracked on this branch (see
docs/adapters/typed-events-followups.md). The delegation path
DeprecationWarning is suppressed in the affected tests with a
pointer to the future follow-up PR.

Acceptance:
* grep emit_dict_event src/.../{autogen,crewai,smolagents}/ tracked
  files → 0 occurrences
* uv run pytest test_{autogen,crewai,smolagents}_adapter.py → 43/43 pass
* uv run pytest tests/instrument/adapters/frameworks/ (excluding 4
  pre-existing collection errors from untracked semantic_kernel /
  langfuse / smoke / per-adapter modules) → 143/143 pass with
  DeprecationWarnings only from the 13 not-yet-migrated adapters
  (dual-path contract preserved)
* uv run mypy --strict src/.../frameworks/{autogen,crewai,smolagents}/lifecycle.py → all 3 pass
* uv run ruff check on changed files → all pass