feat(instrument): typed events Bundle #6 (final) — _base + langchain + langgraph + agentforce + protocols#154
Closed
mmercuri wants to merge 4 commits into
Conversation
…— final)
Migrates the four shared LLMProviderAdapter._emit_* helpers from
emit_dict_event to typed emit_event against the canonical Pydantic
models in layerlens.instrument._compat.events. Every concrete LLM
provider (openai, anthropic, azure_openai, aws_bedrock,
google_vertex, cohere, mistral, ollama, litellm) inherits these
helpers — no per-provider change is required because the public
Python signatures (_emit_model_invoke, _emit_cost_record,
_emit_tool_calls, _emit_provider_error) are unchanged.
Migration mapping:
* _emit_model_invoke → ModelInvokeEvent.create. Adapter-specific
provenance from the metadata kwarg (response_id, finish_reason,
response_model, request_type, cache_creation_input_tokens,
cache_read_input_tokens, system_fingerprint, error,
timestamp_ns) folds onto ModelInfo.parameters since the canonical
schema does not declare these as top-level fields. cached_tokens
/ reasoning_tokens (not in CostInfo) ride on parameters too.
* _emit_cost_record → CostRecordEvent.create. The legacy
pricing_unavailable=True boolean is replaced with the canonical
api_cost_usd='unavailable' string-union sentinel — matches the
schema's NORMATIVE rule 'Costs must mark unavailable (never omit
silently)'.
* _emit_tool_calls → ToolCallEvent.create. Provider tool-use is
in-process Python so integration=LIBRARY. Legacy tool_call_id /
parent_model / provider provenance rides on namespaced
_tool_call_id / _parent_model / _provider keys inside
ToolCallEvent.input so it does not collide with caller tool args.
* _emit_provider_error → PolicyViolationEvent.create. Default
violation_type maps to ViolationType.SAFETY. Provider error
string lands on root_cause; remediation provides generic retry
guidance; provider / model / metadata flows onto details.
Adds tests/instrument/adapters/providers/test_base_provider_typed_events.py
with 14 regression tests covering each helper, the canonical dict
shape, the no-DeprecationWarning contract, and the cumulative emit
cycle. The test module gates on pytest.importorskip because
provider.py imports providers/_base/{tokens,pricing}.py which are
untracked on PR #129's foundation branch — same submodule-untracked
pattern as PR #138's affected adapters.
Verification:
grep emit_dict_event src/.../providers/_base/provider.py → 0 (4 → 0)
uv run mypy --strict on provider.py → clean (2 pre-existing
untracked-submodule errors only)
uv run pytest tests/instrument/adapters/_base/ +
test_agno_adapter.py → 56/56 pass (no regression)
Migrates the LangChain LayerLensCallbackHandler from the legacy
_emit_event() dispatcher (which routed every callback through
emit_dict_event) to a _emit_typed() dispatcher that calls
BaseAdapter.emit_event with canonical Pydantic payloads from
layerlens.instrument._compat.events.
The original 1 emit_dict_event call site in callbacks.py is the
_emit_event wrapper itself, but it is invoked from 9 callback hooks
(on_llm_end x2 success/error, on_tool_end x2 success/error,
on_agent_action, on_agent_finish, on_chain_start, on_chain_end,
on_chain_error). Every site is migrated:
* on_llm_end / on_llm_error → ModelInvokeEvent.create with provider
/ model / version='unavailable' / parameters carrying
framework='langchain' / run_id / prompts / output / token_usage /
duration_ns / invocation_params / node_name / error. Tokens lift
onto canonical prompt_tokens / completion_tokens / total_tokens.
* on_tool_end / on_tool_error / on_agent_action → ToolCallEvent.create
with integration=LIBRARY (LangChain tools are in-process Python
callables). run_id / framework / node_name / source ride on
namespaced _* keys inside the canonical input dict. on_tool_end
preserves output via {value: <str>} on the canonical output slot.
* on_agent_finish → AgentOutputEvent.create. Output/log carried on
metadata; canonical message=str(output) for schema validation.
* on_chain_start (LangGraph node executions) → AgentInputEvent.create
with role=AGENT (graph runtime origin, not human user). Provenance
(langgraph_node, langgraph_step, langgraph_triggers) on metadata.
* on_chain_end / on_chain_error → AgentOutputEvent.create with
run_status='run_complete' / 'run_failed' marker on metadata.
Adds tests/instrument/adapters/frameworks/test_langchain_typed_events.py
with 5 regression tests covering each typed-emit site and the
no-DeprecationWarning contract.
Verification:
grep emit_dict_event langchain/callbacks.py → 0 (1 → 0)
uv run mypy --strict langchain/ → clean (no errors)
uv run pytest test_langchain_typed_events.py → 5/5 pass
Migrates the LangGraph LayerLensLangGraphAdapter lifecycle hooks
(on_graph_start, on_graph_end, on_node_end) from emit_dict_event to
typed emit_event against canonical Pydantic models from
layerlens.instrument._compat.events.
5 emit_dict_event call sites migrated:
* on_graph_start environment.config → EnvironmentConfigEvent.create
with env_type=SIMULATED (LangGraph runs as in-process Python state
machine, not a cloud service). framework / graph_id / config on
EnvironmentInfo.attributes.
* on_graph_start agent.input → AgentInputEvent.create with role=AGENT
(graph executions originate from the runtime, not a human user).
graph_id / execution_id / framework / raw_input on metadata.
* on_graph_end agent.output → AgentOutputEvent.create with
run_status='run_complete' / 'run_failed' marker on metadata.
* on_graph_end agent.state.change → AgentStateChangeEvent.create with
state_type=INTERNAL. LangGraph supplies real before/after state
hashes via LangGraphStateAdapter — _canonicalize_state_hash() lifts
raw hex64 / pre-prefixed / arbitrary digests onto the canonical
sha256:<hex64> shape required by the schema validator.
* on_node_end agent.state.change → AgentStateChangeEvent.create
identical to above for per-node mutations.
Adds tests/instrument/adapters/frameworks/test_langgraph_typed_events.py
with 5 regression tests (gated on pytest.importorskip because
langgraph/{state,handoff}.py are untracked on PR #129's foundation
branch — same pattern as PR #138's untracked adapters).
Verification:
grep emit_dict_event langgraph/lifecycle.py → 0 (5 → 0)
uv run mypy --strict langgraph/ → 2 pre-existing untracked-submodule
errors only; migration code is clean
uv run pytest test_langgraph_typed_events.py → 5 skipped (importorskip
on untracked langgraph/state.py); will pass on environments where
state.py and handoff.py are present
Migrates the AgentForce AgentForceAdapter.import_sessions per-event re-emission loop from emit_dict_event to BaseAdapter.emit_event. The single emit_dict_event call site is the per-event loop that forwards normalised AgentForce trace records through the adapter pipeline. Because AgentForce is an importer-style adapter — events come from AgentForceNormalizer (Salesforce-native shape) rather than runtime instrumentation — the migration sets ALLOW_UNREGISTERED_EVENTS = True This opts the adapter out of canonical 13-event taxonomy validation: the typed-event validator wraps each dict in an open-ended Pydantic model (so org_id stamping + circuit-breaker + capture-config gating all still apply) without requiring the AgentForce taxonomy to be re-shaped onto canonical schema slots. Same policy decision PR #129 made for langfuse — both are importer-style adapters whose taxonomy is upstream-defined. The dict shape on the wire is unchanged for AgentForce consumers. The legacy identity / timestamp splice (which folds those values onto the payload root with underscore prefixes for downstream consumers) is preserved exactly; an event_type setdefault was added on the dict because the typed-event validator inspects payload ['event_type'] when invoked with a single dict argument. Adds tests/instrument/adapters/frameworks/test_agentforce_typed_events.py with 3 regression tests (gated on pytest.importorskip because agentforce/{auth,importer,normalizer}.py are untracked on PR #129's foundation branch — same pattern as PR #138's untracked adapters). Verification: grep emit_dict_event agentforce/adapter.py → 0 (1 → 0) uv run mypy --strict agentforce/ → 3 pre-existing untracked-submodule errors only; migration code is clean uv run pytest test_agentforce_typed_events.py → 3 skipped (importorskip on untracked auth/importer/normalizer); will pass on environments where those submodules are present
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bundle 6 of 6 — the final bundle in the typed-events migration
series. Walks every
emit_dict_event(...)site in the LLM providershared base class plus the three remaining mature framework adapters
(langchain, langgraph, agentforce) and replaces each call with a
typed
emit_event(TypedModel.create(...))against the canonicalPydantic models from
layerlens.instrument._compat.events(PR #129 foundation).
Honest site counts (grep, not estimated)
Per CLAUDE.md item 11 — counts verified with
grep -E 'emit_dict_event\(' src/.../<target>/:providers/_baselangchainlanggraphagentforceprotocolsprotocolshas zero emit sites on this foundation branch andis out of scope for this bundle as the migration doc predicted.
Estimates were honest in this bundle (matches PR #151 / #152
pattern).
Per-target status
providers/_base/provider.py(commit 1c8b586)_emit_model_invoke,_emit_cost_record,_emit_tool_calls,_emit_provider_error) → 0(openai, anthropic, azure_openai, aws_bedrock, google_vertex,
cohere, mistral, ollama, litellm) inherit these helpers — their
public Python signatures are unchanged so no per-provider source
edit is required.
untracked-submodule errors only —
providers/_base/tokens.pyandproviders/_base/pricing.pyare untracked on this foundationbranch, same pattern as PR feat(instrument): Typed Pydantic events — autogen + crewai + smolagents (3 adapters / ~23 sites) #138 for some adapters).
tests/instrument/adapters/providers/test_base_provider_typed_events.py— covers each helper, canonical dict shape, no-DeprecationWarning
contract, and a cumulative emit cycle. Gated on
pytest.importorskipbecauseprovider.pyitself imports theuntracked
tokens/pricingsubmodules at module load.frameworks/langchain/callbacks.py(commit f3ea009)_emit_event()dispatcher invoked from9 callback hooks) → 0
_emit_typed()dispatcher that callsBaseAdapter.emit_eventwith canonical Pydantic payloads. Eachof the 9 callback hooks now constructs the right canonical model:
ModelInvokeEvent(LLM start/end/error),ToolCallEvent(toolstart/end/error + agent action),
AgentInputEvent(chain startfor LangGraph nodes),
AgentOutputEvent(chain end/error + agentfinish).
tests/instrument/adapters/frameworks/test_langchain_typed_events.py— passes typed-emission contract for each hook + the
no-DeprecationWarning regression.
frameworks/langgraph/lifecycle.py(commit 132b51e)graph_end agent.output + agent.state.change; node_end
agent.state.change) → 0
EnvironmentConfigEvent(env_type=SIMULATED),AgentInputEvent(role=AGENT — graph runtime origin),AgentOutputEvent(run_status marker),AgentStateChangeEvent(state_type=INTERNAL with canonical sha256: hashes via
_canonicalize_state_hashhelper).(
langgraph/state.py,langgraph/handoff.py); migration code isclean.
tests/instrument/adapters/frameworks/test_langgraph_typed_events.py— gated on
pytest.importorskipbecauselifecycle.pyimportsthe untracked
state.py. Will run cleanly on environments wherethe submodule is present.
frameworks/agentforce/adapter.py(commit be5cef9)import_sessions) → 0ALLOW_UNREGISTERED_EVENTS = True. AgentForce is animporter-style adapter — events come from
AgentForceNormalizerin Salesforce-native shape rather than runtime instrumentation,
so the typed-event validator wraps them in open-ended Pydantic
models. Same policy decision PR feat(instrument): Typed Pydantic event foundation + agno reference (1/17 adapters) #129 made for langfuse.
(
auth.py,importer.py,normalizer.py); migration code isclean.
tests/instrument/adapters/frameworks/test_agentforce_typed_events.py— gated on
pytest.importorskip(untracked submodules).protocols/(out of scope per honest grep)Zero
emit_dict_eventcalls on this foundation branch — the onlyfile is
protocols/base.pywhich does not emit events directly.No migration needed; out of scope for this bundle as the
migration doc predicted.
Cross-cutting design decisions
Adapter-specific provenance (run_id, node_name, framework,
graph_id, execution_id, response_id, finish_reason,
cache_creation_input_tokens, etc.) folds onto canonical
metadata/parameters/attributesslots — no ad-hoctop-level fields ship on the canonical schema. Mirrors agno
reference and PR feat(instrument): Typed Pydantic events — autogen + crewai + smolagents (3 adapters / ~23 sites) #138 / feat(instrument): typed events Bundle #3 — google_adk + llama_index + ms_agent_framework #151 / feat(instrument): typed events Bundle #4 — bedrock_agents + openai_agents #152 pattern.
Provider-shared
_emit_provider_errormaps toPolicyViolationEvent(violation_type=ViolationType.SAFETY).Provider error string lands on
root_cause; remediation providesgeneric retry guidance; provider / model / metadata flows onto
details._emit_cost_recordunavailable pricing uses canonicalapi_cost_usd="unavailable"string-union sentinel rather thana side-channel
pricing_unavailable=Trueflag — matches theschema's NORMATIVE rule "Costs must mark unavailable (never omit
silently)".
_emit_tool_callsprovenance namespaced under_*keys(
_tool_call_id,_parent_model,_provider) inside thecanonical
inputslot so they do not collide with caller toolarguments.
LangGraph state-change hashes use a
_canonicalize_state_hashhelper that lifts raw hex64 / pre-prefixed / arbitrary digests
onto the canonical
sha256:<hex64>shape required by theschema validator. The host's
LangGraphStateAdaptermay returnany of those formats; canonicalization is deterministic and idempotent.
AgentForce
ALLOW_UNREGISTERED_EVENTS=Trueis the rightcall for an importer-style adapter (vs. forcing a full
Salesforce → canonical taxonomy re-shape in scope of this
bundle). Documented in the migration follow-ups for a future
semantic-mapping PR.
Cross-provider regression status
The 9 provider test suites
(
tests/instrument/adapters/providers/test_<provider>_adapter.py)have pre-existing collection errors on PR #129's foundation
branch because they all import from
providers/_base/tokens.py(untracked). This is the same"submodules untracked on this branch" pattern documented in
PR #138.
Verified via
git stash: the 9 collection errors are present BEFOREmy migration changes too — they are not regressions from this
bundle. The cross-provider regression contract is preserved by:
(
_emit_model_invoke, etc.) — public signatures unchanged.test_base_provider_typed_events.pycovering all 4helper paths via a
_StubProvider(LLMProviderAdapter)subclass(gated on
pytest.importorskiplike the provider tests).tokens.pyandpricing.pyare merged in (i.e. on master).Combined verification
Multi-tenancy compliance
Per PR #129 foundation:
BaseAdapter.emit_eventstampsorg_idontoevery typed payload before delegating to
self._stratix.emit(...).No emit site in this bundle bypasses the typed path, so every
emission is tenant-scoped by construction.
Test plan
grep emit_dict_event = 0mypy --strictclean on migration codeproviders/_baseregression test asserts each helper producesa canonical Pydantic instance (14 tests)
langchainregression test asserts each callback hookproduces a canonical Pydantic instance (5 tests, all passing)
langgraphregression test asserts each lifecycle hookproduces a canonical Pydantic instance (5 tests, importorskip
on untracked langgraph/state.py)
agentforceregression test asserts the import_sessions loopproduces typed open-ended payloads + ALLOW_UNREGISTERED_EVENTS
is set (3 tests, importorskip on untracked sub-submodules)
fires after migration (
filterwarnings("error", ...))provider test collection errors pre-exist; this PR does not
regress them.
tests/instrument/adapters/_base/(40 tests) + agno (16) +9 prior-bundle adapter suites all still pass — 157 total
collected & passing post-migration.
_canonicalize_state_hashre-hashingstrategy for LangGraph (re-hash arbitrary digests when not
already hex64) is the right call vs. raising on
malformed hashes.
agentforceALLOW_UNREGISTERED_EVENTSdecision (vs. re-shaping the AgentForce taxonomy onto
canonical models in this bundle).
Bundle progression — series complete
Total cumulative emission sites migrated across Bundles #2 – #6:
23 + 35 + 28 + (Bundle #5 sites) + 11 = 97 + Bundle #5 contribution
The grand-total accounting will be reconciled when Bundle #5 (the
parallel sibling PR for pydantic_ai + semantic_kernel + strands) is
on the foundation branch's diff. Per Bundle #6 alone: 11 sites
migrated, 11 estimated — 0 delta — honest counts.
References
feat/instrument-typed-events-foundation)docs/adapters/typed-events.md