feat(instrument): typed events Bundle #6 (final) — _base + langchain + langgraph + agentforce + protocols#154

Closed
mmercuri wants to merge 4 commits into
feat/instrument-typed-events-foundation from
feat/instrument-typed-events-bundle-6-base-and-mature-adapters

@mmercuri

Summary

Bundle 6 of 6 — the final bundle in the typed-events migration
series. Walks every emit_dict_event(...) site in the LLM provider
shared base class plus the three remaining mature framework adapters
(langchain, langgraph, agentforce) and replaces each call with a
typed emit_event(TypedModel.create(...)) against the canonical
Pydantic models from layerlens.instrument._compat.events
(PR #129 foundation).
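
As a rough illustration of the call-site change described above, the sketch below contrasts the legacy and typed styles. `ToyModelInvokeEvent` and `ToyAdapter` are hypothetical dataclass stand-ins, not the real models from layerlens.instrument._compat.events, and the real `create` signature may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ToyModelInvokeEvent:
    """Hypothetical stand-in for a canonical typed event model."""

    provider: str
    model: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    event_type: str = "model.invoke"

    @classmethod
    def create(cls, *, provider: str, model: str, **parameters: Any) -> "ToyModelInvokeEvent":
        # Validation happens at construction time, before anything is emitted.
        if not provider or not model:
            raise ValueError("provider and model are required")
        return cls(provider=provider, model=model, parameters=dict(parameters))


class ToyAdapter:
    """Hypothetical adapter exposing both the legacy and the typed emit path."""

    def __init__(self) -> None:
        self.emitted: List[Dict[str, Any]] = []

    def emit_dict_event(self, event_type: str, payload: Dict[str, Any]) -> None:
        # Legacy path: any dict shape is accepted, misspelled keys included.
        self.emitted.append({"event_type": event_type, **payload})

    def emit_event(self, event: ToyModelInvokeEvent) -> None:
        # Typed path: only an already-validated model instance is serialized.
        self.emitted.append(asdict(event))


adapter = ToyAdapter()
# Before: adapter.emit_dict_event("model.invoke", {"provider": "p", "model": "m"})
adapter.emit_event(
    ToyModelInvokeEvent.create(provider="example-provider", model="example-model", finish_reason="stop")
)
```

In this toy version, adapter-specific extras such as `finish_reason` land in `parameters` rather than as new top-level fields.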

Honest site counts (grep, not estimated)

Per CLAUDE.md item 11 — counts verified with
grep -E 'emit_dict_event\(' src/.../<target>/:

| Target | Migration doc estimate | Actual grep count | Delta |
|---|---|---|---|
| providers/_base | ~4 | 4 | 0 |
| langchain | ~1 | 1 | 0 |
| langgraph | ~5 | 5 | 0 |
| agentforce | ~1 | 1 | 0 |
| protocols | ~0 | 0 | 0 |
| Bundle #6 total | ~11 | 11 | 0 |

protocols has zero emit sites on this foundation branch and
is out of scope for this bundle as the migration doc predicted.
Estimates were honest in this bundle (matches PR #151 / #152
pattern).
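
For reviewers who prefer a script over the shell one-liner, a minimal Python sketch of the same count follows. It assumes one directory per target and, like `grep | wc -l`, counts matching lines rather than individual occurrences. The function name and the demo file layout are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Iterable

CALL_SITE = re.compile(r"emit_dict_event\(")


def count_call_sites(root: Path, targets: Iterable[str]) -> Dict[str, int]:
    """Count lines containing `emit_dict_event(` in *.py files under each target."""
    counts: Dict[str, int] = {}
    for target in targets:
        target_dir = root / target
        total = 0
        # A missing directory simply counts as zero sites.
        files = sorted(target_dir.rglob("*.py")) if target_dir.is_dir() else []
        for path in files:
            text = path.read_text(encoding="utf-8")
            total += sum(1 for line in text.splitlines() if CALL_SITE.search(line))
        counts[target] = total
    return counts


# Demo against a throwaway tree (hypothetical file contents).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "langgraph").mkdir()
    (root / "langgraph" / "lifecycle.py").write_text(
        "self.emit_dict_event('a', {})\nself.emit_dict_event('b', {})\n", encoding="utf-8"
    )
    (root / "protocols").mkdir()
    (root / "protocols" / "base.py").write_text("pass\n", encoding="utf-8")
    demo_counts = count_call_sites(root, ["langgraph", "protocols", "missing"])
```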

Per-target status

providers/_base/provider.py (commit 1c8b586)

  • 4 emission sites (_emit_model_invoke, _emit_cost_record,
    _emit_tool_calls, _emit_provider_error) → 0
  • Cross-provider blast radius: all 9 LLM provider adapters
    (openai, anthropic, azure_openai, aws_bedrock, google_vertex,
    cohere, mistral, ollama, litellm) inherit these helpers — their
    public Python signatures are unchanged so no per-provider source
    edit is required.
  • mypy --strict: clean on the migrated code (2 pre-existing
    untracked-submodule errors only — providers/_base/tokens.py and
    providers/_base/pricing.py are untracked on this foundation
    branch, same pattern as PR #138 for some adapters).
  • Regression tests: 14 tests in
    tests/instrument/adapters/providers/test_base_provider_typed_events.py
    — covers each helper, canonical dict shape, no-DeprecationWarning
    contract, and a cumulative emit cycle. Gated on
    pytest.importorskip because provider.py itself imports the
    untracked tokens / pricing submodules at module load.

frameworks/langchain/callbacks.py (commit f3ea009)

  • 1 emission site (the _emit_event() dispatcher invoked from
    9 callback hooks) → 0
  • Replaced with _emit_typed() dispatcher that calls
    BaseAdapter.emit_event with canonical Pydantic payloads. Each
    of the 9 callback hooks now constructs the right canonical model:
    ModelInvokeEvent (LLM start/end/error), ToolCallEvent (tool
    start/end/error + agent action), AgentInputEvent (chain start
    for LangGraph nodes), AgentOutputEvent (chain end/error + agent
    finish).
  • mypy --strict: clean (no errors).
  • Regression tests: 5 tests in
    tests/instrument/adapters/frameworks/test_langchain_typed_events.py
    — passes typed-emission contract for each hook + the
    no-DeprecationWarning regression.

frameworks/langgraph/lifecycle.py (commit 132b51e)

  • 5 emission sites (graph_start environment.config + agent.input;
    graph_end agent.output + agent.state.change; node_end
    agent.state.change) → 0
  • All 5 typed: EnvironmentConfigEvent (env_type=SIMULATED),
    AgentInputEvent (role=AGENT — graph runtime origin),
    AgentOutputEvent (run_status marker), AgentStateChangeEvent
    (state_type=INTERNAL with canonical sha256: hashes via
    _canonicalize_state_hash helper).
  • mypy --strict: 2 pre-existing untracked-submodule errors only
    (langgraph/state.py, langgraph/handoff.py); migration code is
    clean.
  • Regression tests: 5 tests in
    tests/instrument/adapters/frameworks/test_langgraph_typed_events.py
    — gated on pytest.importorskip because lifecycle.py imports
    the untracked state.py. Will run cleanly on environments where
    the submodule is present.

frameworks/agentforce/adapter.py (commit be5cef9)

  • 1 emission site (the per-event re-emission loop in
    import_sessions) → 0
  • Sets ALLOW_UNREGISTERED_EVENTS = True. AgentForce is an
    importer-style adapter — events come from AgentForceNormalizer
    in Salesforce-native shape rather than runtime instrumentation,
    so the typed-event validator wraps them in open-ended Pydantic
    models. Same policy decision PR #129 made for langfuse.
  • The dict shape on the wire is unchanged for AgentForce consumers.
  • mypy --strict: 3 pre-existing untracked-submodule errors only
    (auth.py, importer.py, normalizer.py); migration code is
    clean.
  • Regression tests: 3 tests in
    tests/instrument/adapters/frameworks/test_agentforce_typed_events.py
    — gated on pytest.importorskip (untracked submodules).

protocols/ (out of scope per honest grep)

Zero emit_dict_event calls on this foundation branch — the only
file is protocols/base.py which does not emit events directly.
No migration needed; out of scope for this bundle as the
migration doc predicted.

Cross-cutting design decisions

  • Adapter-specific provenance (run_id, node_name, framework,
    graph_id, execution_id, response_id, finish_reason,
    cache_creation_input_tokens, etc.) folds onto canonical
    metadata / parameters / attributes slots — no ad-hoc
    top-level fields ship on the canonical schema. Mirrors agno
    reference and PR #138 / #151 / #152 pattern.

  • Provider-shared _emit_provider_error maps to
    PolicyViolationEvent(violation_type=ViolationType.SAFETY).
    Provider error string lands on root_cause; remediation provides
    generic retry guidance; provider / model / metadata flows onto
    details.

  • _emit_cost_record unavailable pricing uses canonical
    api_cost_usd="unavailable" string-union sentinel rather than
    a side-channel pricing_unavailable=True flag — matches the
    schema's NORMATIVE rule "Costs must mark unavailable (never omit
    silently)".

  • _emit_tool_calls provenance namespaced under _* keys
    (_tool_call_id, _parent_model, _provider) inside the
    canonical input slot so they do not collide with caller tool
    arguments.

  • LangGraph state-change hashes use a _canonicalize_state_hash
    helper that lifts raw hex64 / pre-prefixed / arbitrary digests
    onto the canonical sha256:<hex64> shape required by the
    schema validator. The host's LangGraphStateAdapter may return
    any of those formats; canonicalization is deterministic and idempotent.

  • AgentForce ALLOW_UNREGISTERED_EVENTS=True is the right
    call for an importer-style adapter (vs. forcing a full
    Salesforce → canonical taxonomy re-shape in scope of this
    bundle). Documented in the migration follow-ups for a future
    semantic-mapping PR.
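
To make the state-hash decision above easier to review, here is a minimal sketch of one way such a canonicalizer could behave. It is not the PR's `_canonicalize_state_hash`. The lowercasing step and the choice to re-hash the raw string for non-hex64 input are assumptions made for illustration.

```python
import hashlib
import re

_HEX64 = re.compile(r"[0-9a-f]{64}")
_PREFIX = "sha256:"


def canonicalize_state_hash(raw: str) -> str:
    """Return `raw` in `sha256:<hex64>` form (illustrative, not the real helper)."""
    candidate = raw.strip().lower()
    if candidate.startswith(_PREFIX):
        candidate = candidate[len(_PREFIX):]
    if _HEX64.fullmatch(candidate):
        # Raw hex64 and pre-prefixed digests pass through unchanged in value.
        return _PREFIX + candidate
    # Anything else is re-hashed so the result always has the canonical shape.
    return _PREFIX + hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Because the output always matches the prefixed hex64 branch, calling the function on its own output returns the same string, which is the idempotence property the bullet refers to. Raising on malformed input instead would be a one-line change in the final branch.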

Cross-provider regression status

The 9 provider test suites
(tests/instrument/adapters/providers/test_<provider>_adapter.py)
have pre-existing collection errors on PR #129's foundation
branch because they all import from
providers/_base/tokens.py (untracked). This is the same
"submodules untracked on this branch" pattern documented in
PR #138.

Verified via git stash: the 9 collection errors are present BEFORE
my migration changes too — they are not regressions from this
bundle. The cross-provider regression contract is preserved by:

  1. The migration changes only the bodies of the shared helpers
    (_emit_model_invoke, etc.); their public signatures are unchanged.
  2. The new test_base_provider_typed_events.py covering all 4
    helper paths via a _StubProvider(LLMProviderAdapter) subclass
    (gated on pytest.importorskip like the provider tests).
  3. The 9 provider test suites will run cleanly downstream when
    tokens.py and pricing.py are merged in (i.e. on master).

Combined verification

$ for path in providers/_base frameworks/langchain frameworks/langgraph frameworks/agentforce protocols; do
    grep -rn 'emit_dict_event(' src/layerlens/instrument/adapters/$path/ | wc -l
  done
0  (providers/_base — was 4)
0  (frameworks/langchain — was 1)
0  (frameworks/langgraph — was 5)
0  (frameworks/agentforce — was 1)
0  (protocols — was 0)

$ uv run mypy --strict src/layerlens/instrument/adapters/providers/_base/provider.py \
    src/layerlens/instrument/adapters/frameworks/langchain/callbacks.py \
    src/layerlens/instrument/adapters/frameworks/langgraph/lifecycle.py \
    src/layerlens/instrument/adapters/frameworks/agentforce/adapter.py
Found 7 errors in 3 files (checked 4 source files)
# All 7 are pre-existing untracked-submodule import errors (verified
# via git stash). Migration code is mypy-strict clean.

$ uv run python -m pytest tests/instrument/adapters/_base/ \
    tests/instrument/adapters/frameworks/test_agno_adapter.py \
    tests/instrument/adapters/frameworks/test_*adapter.py \
    tests/instrument/adapters/frameworks/test_langchain_typed_events.py \
    tests/instrument/adapters/frameworks/test_langgraph_typed_events.py \
    tests/instrument/adapters/frameworks/test_agentforce_typed_events.py \
    tests/instrument/adapters/providers/test_base_provider_typed_events.py
157 passed, 3 skipped (3 importorskips on untracked submodules)

Multi-tenancy compliance

Per PR #129 foundation: BaseAdapter.emit_event stamps org_id onto
every typed payload before delegating to self._stratix.emit(...).
No emit site in this bundle bypasses the typed path, so every
emission is tenant-scoped by construction.
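
The tenant-scoping argument can be sketched as follows. `ToyEvent`, `ToySink`, and `ToyBaseAdapter` are hypothetical stand-ins. Whether the real BaseAdapter.emit_event overwrites a caller-supplied org_id, as this toy does, is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ToyEvent:
    """Hypothetical typed payload with an optional tenant field."""

    event_type: str
    org_id: Optional[str] = None


class ToySink:
    """Stands in for the downstream emitter; it only records what it receives."""

    def __init__(self) -> None:
        self.received: List[ToyEvent] = []

    def emit(self, event: ToyEvent) -> None:
        self.received.append(event)


class ToyBaseAdapter:
    def __init__(self, org_id: str, sink: ToySink) -> None:
        if not org_id:
            raise ValueError("org_id is required")
        self._org_id = org_id
        self._sink = sink

    def emit_event(self, event: ToyEvent) -> None:
        # Stamp the adapter's tenant on every payload before delegating,
        # so no emit site can forget it or supply a different one.
        self._sink.emit(replace(event, org_id=self._org_id))


sink = ToySink()
ToyBaseAdapter("org-123", sink).emit_event(ToyEvent(event_type="agent.output", org_id="spoofed"))
```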

Test plan

  • All 4 migration targets: grep emit_dict_event = 0
  • All 4 migration targets: mypy --strict clean on migration code
  • providers/_base regression test asserts each helper produces
    a canonical Pydantic instance (14 tests)
  • langchain regression test asserts each callback hook
    produces a canonical Pydantic instance (5 tests, all passing)
  • langgraph regression test asserts each lifecycle hook
    produces a canonical Pydantic instance (5 tests, importorskip
    on untracked langgraph/state.py)
  • agentforce regression test asserts the import_sessions loop
    produces typed open-ended payloads + ALLOW_UNREGISTERED_EVENTS
    is set (3 tests, importorskip on untracked submodules)
  • All 4 targets: regression test asserts no DeprecationWarning
    fires after migration (filterwarnings("error", ...))
  • Cross-provider blast radius confirmed via stash diff: the 9
    provider test collection errors pre-exist; this PR does not
    regress them.
  • tests/instrument/adapters/_base/ (40 tests) + agno (16) +
    9 prior-bundle adapter suites all still pass — 157 total
    collected & passing post-migration.
  • Reviewer: confirm _canonicalize_state_hash re-hashing
    strategy for LangGraph (re-hash arbitrary digests when not
    already hex64) is the right call vs. raising on
    malformed hashes.
  • Reviewer: confirm agentforce ALLOW_UNREGISTERED_EVENTS
    decision (vs. re-shaping the AgentForce taxonomy onto
    canonical models in this bundle).
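
The no-DeprecationWarning contract in the test plan can be illustrated with the standard `warnings` module alone. In this sketch `legacy_emit` and `typed_emit` are hypothetical stand-ins, and it assumes the legacy path is the one that warns.

```python
import warnings
from typing import Any, Callable, Dict


def legacy_emit(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed behavior of the legacy path: it warns on every call.
    warnings.warn("emit_dict_event is deprecated", DeprecationWarning, stacklevel=2)
    return payload


def typed_emit(payload: Dict[str, Any]) -> Dict[str, Any]:
    # The migrated path should emit without any deprecation warning.
    return payload


def raises_deprecation(fn: Callable[[Dict[str, Any]], Any], payload: Dict[str, Any]) -> bool:
    """Return True if calling `fn` triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate the warning to an exception, scoped to this block only.
        warnings.filterwarnings("error", category=DeprecationWarning)
        try:
            fn(payload)
        except DeprecationWarning:
            return True
    return False


legacy_warns = raises_deprecation(legacy_emit, {"event_type": "agent.input"})
typed_warns = raises_deprecation(typed_emit, {"event_type": "agent.input"})
```

Escalating the warning to an error makes a regression fail loudly instead of scrolling past in the test output.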

Bundle progression — series complete

| Bundle | PR | Adapters / targets | Sites |
|---|---|---|---|
| #1 | #129 | foundation + agno reference | (foundation) |
| #2 | #138 | autogen + crewai + smolagents | 23 |
| #3 | #151 | google_adk + llama_index + ms_agent_framework | 35 |
| #4 | #152 | bedrock_agents + openai_agents | 28 |
| #5 | (TBD) | pydantic_ai + semantic_kernel + strands | (TBD) |
| #6 | this | providers/_base + langchain + langgraph + agentforce + protocols (0) | 11 |

Total cumulative emission sites migrated across Bundles #2 through #6:
23 + 35 + 28 + (Bundle #5 sites) + 11 = 97 + Bundle #5 contribution

The grand-total accounting will be reconciled when Bundle #5 (the
parallel sibling PR for pydantic_ai + semantic_kernel + strands) is
on the foundation branch's diff. Per Bundle #6 alone: 11 sites
migrated, 11 estimated — 0 delta — honest counts.

mmercuri added 4 commits May 10, 2026 12:24
…— final)

Migrates the four shared LLMProviderAdapter._emit_* helpers from
emit_dict_event to typed emit_event against the canonical Pydantic
models in layerlens.instrument._compat.events. Every concrete LLM
provider (openai, anthropic, azure_openai, aws_bedrock,
google_vertex, cohere, mistral, ollama, litellm) inherits these
helpers — no per-provider change is required because the public
Python signatures (_emit_model_invoke, _emit_cost_record,
_emit_tool_calls, _emit_provider_error) are unchanged.

Migration mapping:

* _emit_model_invoke → ModelInvokeEvent.create. Adapter-specific
  provenance from the metadata kwarg (response_id, finish_reason,
  response_model, request_type, cache_creation_input_tokens,
  cache_read_input_tokens, system_fingerprint, error,
  timestamp_ns) folds onto ModelInfo.parameters since the canonical
  schema does not declare these as top-level fields. cached_tokens
  / reasoning_tokens (not in CostInfo) ride on parameters too.

* _emit_cost_record → CostRecordEvent.create. The legacy
  pricing_unavailable=True boolean is replaced with the canonical
  api_cost_usd='unavailable' string-union sentinel — matches the
  schema's NORMATIVE rule 'Costs must mark unavailable (never omit
  silently)'.

* _emit_tool_calls → ToolCallEvent.create. Provider tool-use is
  in-process Python so integration=LIBRARY. Legacy tool_call_id /
  parent_model / provider provenance rides on namespaced
  _tool_call_id / _parent_model / _provider keys inside
  ToolCallEvent.input so it does not collide with caller tool args.

* _emit_provider_error → PolicyViolationEvent.create. Default
  violation_type maps to ViolationType.SAFETY. Provider error
  string lands on root_cause; remediation provides generic retry
  guidance; provider / model / metadata flows onto details.
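
Two of the mappings above, the cost sentinel and the namespaced tool provenance, are simple enough to sketch in isolation. The function names and the per-1k-token pricing input are hypothetical. Only the `"unavailable"` sentinel and the `_tool_call_id` / `_parent_model` / `_provider` keys come from this commit message.

```python
from typing import Any, Dict, Literal, Optional, Union

# A cost is either a USD amount or the explicit sentinel, never silently absent.
ApiCost = Union[float, Literal["unavailable"]]


def resolve_api_cost(price_per_1k_tokens: Optional[float], total_tokens: int) -> ApiCost:
    """Return a USD cost, or the sentinel when no price is known."""
    if price_per_1k_tokens is None:
        return "unavailable"
    return round(price_per_1k_tokens * total_tokens / 1000, 6)


def build_tool_input(
    arguments: Dict[str, Any], *, tool_call_id: str, parent_model: str, provider: str
) -> Dict[str, Any]:
    """Merge caller arguments with underscore-prefixed provenance keys."""
    merged = dict(arguments)  # copy so the caller's dict is not mutated
    merged.update(
        {"_tool_call_id": tool_call_id, "_parent_model": parent_model, "_provider": provider}
    )
    return merged


tool_input = build_tool_input(
    {"provider": "caller-value", "query": "weather"},
    tool_call_id="call-1",
    parent_model="example-model",
    provider="example-provider",
)
```

The caller's own `provider` argument survives next to `_provider`, which is the collision the prefix is meant to avoid.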

Adds tests/instrument/adapters/providers/test_base_provider_typed_events.py
with 14 regression tests covering each helper, the canonical dict
shape, the no-DeprecationWarning contract, and the cumulative emit
cycle. The test module gates on pytest.importorskip because
provider.py imports providers/_base/{tokens,pricing}.py which are
untracked on PR #129's foundation branch — same submodule-untracked
pattern as PR #138's affected adapters.

Verification:

  grep emit_dict_event src/.../providers/_base/provider.py → 0 (4 → 0)
  uv run mypy --strict on provider.py → clean (2 pre-existing
    untracked-submodule errors only)
  uv run pytest tests/instrument/adapters/_base/ +
    test_agno_adapter.py → 56/56 pass (no regression)

Migrates the LangChain LayerLensCallbackHandler from the legacy
_emit_event() dispatcher (which routed every callback through
emit_dict_event) to a _emit_typed() dispatcher that calls
BaseAdapter.emit_event with canonical Pydantic payloads from
layerlens.instrument._compat.events.

The original 1 emit_dict_event call site in callbacks.py is the
_emit_event wrapper itself, but it is invoked from 9 callback hooks
(on_llm_end, on_llm_error, on_tool_end, on_tool_error,
on_agent_action, on_agent_finish, on_chain_start, on_chain_end,
on_chain_error). Every site is migrated:

* on_llm_end / on_llm_error → ModelInvokeEvent.create with provider
  / model / version='unavailable' / parameters carrying
  framework='langchain' / run_id / prompts / output / token_usage /
  duration_ns / invocation_params / node_name / error. Tokens lift
  onto canonical prompt_tokens / completion_tokens / total_tokens.

* on_tool_end / on_tool_error / on_agent_action → ToolCallEvent.create
  with integration=LIBRARY (LangChain tools are in-process Python
  callables). run_id / framework / node_name / source ride on
  namespaced _* keys inside the canonical input dict. on_tool_end
  preserves output via {value: <str>} on the canonical output slot.

* on_agent_finish → AgentOutputEvent.create. Output/log carried on
  metadata; canonical message=str(output) for schema validation.

* on_chain_start (LangGraph node executions) → AgentInputEvent.create
  with role=AGENT (graph runtime origin, not human user). Provenance
  (langgraph_node, langgraph_step, langgraph_triggers) on metadata.

* on_chain_end / on_chain_error → AgentOutputEvent.create with
  run_status='run_complete' / 'run_failed' marker on metadata.
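
The hook-to-model mapping above can be summarized as a lookup table feeding a single dispatcher. This is a simplified sketch, not the PR's `_emit_typed`: `ToyCallbackHandler` is hypothetical, its hook signatures are trimmed down from LangChain's, and it records model names as strings instead of building Pydantic instances.

```python
from typing import Any, Dict, List, Tuple

# Hook name -> canonical model name, as listed in the mapping above.
HOOK_TO_MODEL: Dict[str, str] = {
    "on_llm_end": "ModelInvokeEvent",
    "on_llm_error": "ModelInvokeEvent",
    "on_tool_end": "ToolCallEvent",
    "on_tool_error": "ToolCallEvent",
    "on_agent_action": "ToolCallEvent",
    "on_agent_finish": "AgentOutputEvent",
    "on_chain_start": "AgentInputEvent",
    "on_chain_end": "AgentOutputEvent",
    "on_chain_error": "AgentOutputEvent",
}


class ToyCallbackHandler:
    """Hypothetical handler: every hook funnels through one typed dispatcher."""

    def __init__(self) -> None:
        self.emitted: List[Tuple[str, Dict[str, Any]]] = []

    def _emit_typed(self, hook: str, **fields: Any) -> None:
        # The model is chosen in one place; an unmapped hook raises KeyError.
        self.emitted.append((HOOK_TO_MODEL[hook], fields))

    def on_chain_end(self, outputs: Any, *, run_id: str) -> None:
        self._emit_typed(
            "on_chain_end",
            message=str(outputs),
            metadata={"framework": "langchain", "run_id": run_id, "run_status": "run_complete"},
        )

    def on_chain_error(self, error: BaseException, *, run_id: str) -> None:
        self._emit_typed(
            "on_chain_error",
            message=str(error),
            metadata={"framework": "langchain", "run_id": run_id, "run_status": "run_failed"},
        )


handler = ToyCallbackHandler()
handler.on_chain_end({"answer": 42}, run_id="run-1")
handler.on_chain_error(RuntimeError("boom"), run_id="run-2")
```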

Adds tests/instrument/adapters/frameworks/test_langchain_typed_events.py
with 5 regression tests covering each typed-emit site and the
no-DeprecationWarning contract.

Verification:

  grep emit_dict_event langchain/callbacks.py → 0 (1 → 0)
  uv run mypy --strict langchain/ → clean (no errors)
  uv run pytest test_langchain_typed_events.py → 5/5 pass

Migrates the LangGraph LayerLensLangGraphAdapter lifecycle hooks
(on_graph_start, on_graph_end, on_node_end) from emit_dict_event to
typed emit_event against canonical Pydantic models from
layerlens.instrument._compat.events.

5 emit_dict_event call sites migrated:

* on_graph_start environment.config → EnvironmentConfigEvent.create
  with env_type=SIMULATED (LangGraph runs as in-process Python state
  machine, not a cloud service). framework / graph_id / config on
  EnvironmentInfo.attributes.

* on_graph_start agent.input → AgentInputEvent.create with role=AGENT
  (graph executions originate from the runtime, not a human user).
  graph_id / execution_id / framework / raw_input on metadata.

* on_graph_end agent.output → AgentOutputEvent.create with
  run_status='run_complete' / 'run_failed' marker on metadata.

* on_graph_end agent.state.change → AgentStateChangeEvent.create with
  state_type=INTERNAL. LangGraph supplies real before/after state
  hashes via LangGraphStateAdapter — _canonicalize_state_hash() lifts
  raw hex64 / pre-prefixed / arbitrary digests onto the canonical
  sha256:<hex64> shape required by the schema validator.

* on_node_end agent.state.change → AgentStateChangeEvent.create
  identical to above for per-node mutations.

Adds tests/instrument/adapters/frameworks/test_langgraph_typed_events.py
with 5 regression tests (gated on pytest.importorskip because
langgraph/{state,handoff}.py are untracked on PR #129's foundation
branch — same pattern as PR #138's untracked adapters).

Verification:

  grep emit_dict_event langgraph/lifecycle.py → 0 (5 → 0)
  uv run mypy --strict langgraph/ → 2 pre-existing untracked-submodule
    errors only; migration code is clean
  uv run pytest test_langgraph_typed_events.py → 5 skipped (importorskip
    on untracked langgraph/state.py); will pass on environments where
    state.py and handoff.py are present

Migrates the AgentForce AgentForceAdapter.import_sessions per-event
re-emission loop from emit_dict_event to BaseAdapter.emit_event.

The single emit_dict_event call site is the per-event loop that
forwards normalised AgentForce trace records through the adapter
pipeline. Because AgentForce is an importer-style adapter — events
come from AgentForceNormalizer (Salesforce-native shape) rather
than runtime instrumentation — the migration sets

  ALLOW_UNREGISTERED_EVENTS = True

This opts the adapter out of canonical 13-event taxonomy validation:
the typed-event validator wraps each dict in an open-ended Pydantic
model (so org_id stamping + circuit-breaker + capture-config gating
all still apply) without requiring the AgentForce taxonomy to be
re-shaped onto canonical schema slots. Same policy decision PR #129
made for langfuse — both are importer-style adapters whose taxonomy
is upstream-defined.

The dict shape on the wire is unchanged for AgentForce consumers.

The legacy identity / timestamp splice (which folds those values
onto the payload root with underscore prefixes for downstream
consumers) is preserved exactly; an event_type setdefault was added
on the dict because the typed-event validator inspects payload
['event_type'] when invoked with a single dict argument.
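
A minimal sketch of the importer path described above follows. `REGISTERED`, `prepare_imported_event`, `validate`, and the `agentforce.imported` default are all hypothetical; only the `setdefault` on `event_type` and the opt-out flag mirror this commit message.

```python
from typing import Any, Dict, List

# Hypothetical subset of the canonical taxonomy, for illustration only.
REGISTERED = {"agent.input", "agent.output"}


def prepare_imported_event(record: Dict[str, Any], default_event_type: str) -> Dict[str, Any]:
    """Copy a normalized record and make sure it carries an event_type."""
    payload = dict(record)  # copy so the normalizer's record is not mutated
    payload.setdefault("event_type", default_event_type)
    return payload


def validate(payload: Dict[str, Any], *, allow_unregistered: bool) -> Dict[str, Any]:
    """Reject unknown event types unless the adapter has opted out."""
    event_type = payload["event_type"]
    if event_type not in REGISTERED and not allow_unregistered:
        raise ValueError(f"unregistered event type: {event_type}")
    return payload


records: List[Dict[str, Any]] = [
    {"event_type": "agent.input", "text": "hi"},
    {"sf_kind": "BotResponse"},  # upstream-shaped record with no event_type
]
prepared = [
    validate(prepare_imported_event(r, "agentforce.imported"), allow_unregistered=True)
    for r in records
]
```

With the flag off, the second record would be rejected, which is the stricter behavior the non-importer adapters keep.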

Adds tests/instrument/adapters/frameworks/test_agentforce_typed_events.py
with 3 regression tests (gated on pytest.importorskip because
agentforce/{auth,importer,normalizer}.py are untracked on PR #129's
foundation branch — same pattern as PR #138's untracked adapters).

Verification:

  grep emit_dict_event agentforce/adapter.py → 0 (1 → 0)
  uv run mypy --strict agentforce/ → 3 pre-existing untracked-submodule
    errors only; migration code is clean
  uv run pytest test_agentforce_typed_events.py → 3 skipped (importorskip
    on untracked auth/importer/normalizer); will pass on environments
    where those submodules are present