fix(instrument): Brand leak in agentforce trust layer YAML + missing STREAMING/REPLAY capability declarations#119

Closed
mmercuri wants to merge 7 commits into main from feat/instrument-brand-leak-and-capability-declarations

@mmercuri

Contributor

Two related fixes from the depth audit (A:/tmp/adapter-depth-audit.md).

1. Brand leak in agentforce trust layer YAML

Audit finding: agentforce/trust_layer.py:144 was emitting

# Stratix Policy - Imported from Einstein Trust Layer
# Generated by: stratix.sdk.python.adapters.agentforce.trust_layer

into customer-visible YAML files. These files end up in customers' source trees, version control, and audit packages, so the legacy STRATIX brand and the obsolete stratix.sdk.python.* import path were leaking into customer-owned artifacts.

Fix: header now emits the LayerLens brand and the actual current import path:

# LayerLens Policy - Imported from Einstein Trust Layer
# Generated by: layerlens.instrument.adapters.frameworks.agentforce.trust_layer

A regression test test_trust_layer_yaml_has_no_stratix_brand_leak covers both the canonical method and the deprecated to_stratix_policy alias, asserting:

  • Positive: # LayerLens Policy and the new generator path are present.
  • Negative: STRATIX, Stratix, stratix.sdk, stratix.sdk.python, ateam are all absent.
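
The positive and negative assertions above can be sketched roughly as follows. This is a minimal illustration, not the test from the PR: `render_policy_header` and `find_brand_leaks` are hypothetical helpers, and only the header strings and the forbidden tokens are taken from the description.

```python
# Hypothetical stand-in for the trust-layer YAML emitter; the real code lives in
# layerlens.instrument.adapters.frameworks.agentforce.trust_layer.
GENERATOR_PATH = "layerlens.instrument.adapters.frameworks.agentforce.trust_layer"
FORBIDDEN_TOKENS = ("STRATIX", "Stratix", "stratix.sdk", "stratix.sdk.python", "ateam")


def render_policy_header() -> str:
    """Return the comment header written at the top of the policy YAML."""
    return (
        "# LayerLens Policy - Imported from Einstein Trust Layer\n"
        f"# Generated by: {GENERATOR_PATH}\n"
    )


def find_brand_leaks(text: str) -> list[str]:
    """Return every forbidden token that appears in the emitted text."""
    return [token for token in FORBIDDEN_TOKENS if token in text]


def test_header_has_no_stratix_brand_leak() -> None:
    header = render_policy_header()
    # Positive: the new brand and generator path are present.
    assert "# LayerLens Policy" in header
    assert GENERATOR_PATH in header
    # Negative: none of the legacy tokens survive.
    assert find_brand_leaks(header) == []
```

Returning the list of leaked tokens, rather than a bare boolean, means a failing assertion names the offending string directly.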

Sweep (audit task #4)

grep -rn "Stratix\|stratix" src/layerlens/instrument/adapters/frameworks/ | grep -v "deprecated\|backward\|alias" surfaced additional leaks. All non-deprecation contexts fixed:

  • embedding/embedding_adapter.py — module docstring, author="STRATIX Team" (surfaced via AdapterInfo)
  • embedding/vector_store_adapter.py — module docstring + author
  • embedding/__init__.py, benchmark_import/__init__.py, benchmark_import/adapter.py, semantic_kernel/__init__.py, semantic_kernel/lifecycle.py, semantic_kernel/filters.py — module docstrings
  • agentforce/auth.py — error messages now reference layerlens agentforce connect (current CLI binary; both layerlens and stratix are registered scripts)
  • agentforce/events.py — Platform Events thread name (stratix-sf-events → layerlens-sf-events)
  • agentforce/mapper.py — internal docstrings
  • google_adk, llama_index, openai_agents, semantic_kernel — method docstrings
  • llama_index/lifecycle.py — closure-only StratixEventHandler renamed to LayerLensEventHandler (the name surfaces in LlamaIndex dispatcher logs / UI)

Intentionally left alone (would require deprecation aliases — out of scope for a fix PR):

  • class StratixMemoryStore (semantic_kernel) — public class with backward-compat note added to the docstring
  • class Stratix re-exported from layerlens — actual API client class name
  • to_stratix_policy method — already exists as a deprecated alias
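
For context, a deprecated alias of the kind mentioned in the last bullet usually looks something like the sketch below. The class name `TrustLayerImporter` and the warning text are assumptions made for illustration; only the two method names come from this PR.

```python
import warnings


class TrustLayerImporter:
    """Hypothetical class name; only the two method names come from this PR."""

    def to_layerlens_policy(self) -> str:
        # Canonical method: emits the policy with the current branding.
        return "# LayerLens Policy - Imported from Einstein Trust Layer\n"

    def to_stratix_policy(self) -> str:
        # Deprecated alias: warn, then delegate so both entry points
        # produce identical output.
        warnings.warn(
            "to_stratix_policy is deprecated; use to_layerlens_policy",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.to_layerlens_policy()
```

Because the alias delegates instead of duplicating the body, a branding fix in the canonical method automatically covers the deprecated entry point.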

2. Missing STREAMING / REPLAY capability declarations

Audit finding: 6 adapters wrap streaming methods but do not declare AdapterCapability.STREAMING. Only langfuse declares REPLAY despite every adapter implementing serialize_for_replay().

STREAMING (added to all 6)

| Adapter | Streaming entry-point wrapped |
| --- | --- |
| agno | Agent.arun (async stream) |
| ms_agent_framework | ChatCompletionAgent.invoke_stream |
| openai_agents | TraceProcessor receives GenerationSpanData per chunk |
| google_adk | BeforeModelCallback / AfterModelCallback fire per chunk |
| llama_index | Instrumentation Module emits per-chunk events |
| bedrock_agents | invoke_agent returns an EventStream completion |

REPLAY (added to every adapter that implements serialize_for_replay())

agno, bedrock_agents, google_adk, llama_index, pydantic_ai, strands, openai_agents, ms_agent_framework, embedding (both EmbeddingAdapter and VectorStoreAdapter), semantic_kernel, smolagents, agentforce.

Without these declarations, the atlas-app catalog UI surfaces incorrect feature support — telling customers they cannot replay traces from an adapter that supports it, or cannot stream from one that wraps every streaming entry-point.
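
A capability declaration of this kind might look roughly like the following. The shapes of `AdapterCapability` and `AdapterInfo` here are assumptions for illustration (the real definitions live in the `_base` package and may differ), and `AgnoAdapterSketch` is a hypothetical class.

```python
from dataclasses import dataclass, field
from enum import Enum


class AdapterCapability(Enum):
    # Only STREAMING and REPLAY are named in this PR; the real enum may have more members.
    STREAMING = "streaming"
    REPLAY = "replay"


@dataclass(frozen=True)
class AdapterInfo:
    # Assumed shape: a name plus the set of declared capabilities.
    name: str
    capabilities: frozenset = field(default_factory=frozenset)


class AgnoAdapterSketch:
    """Illustrative adapter that wraps a streaming entry point and supports replay."""

    def serialize_for_replay(self) -> dict:
        return {"events": [], "capture_config": {}}

    def get_adapter_info(self) -> AdapterInfo:
        # Declare exactly what the adapter implements, so a catalog reading
        # this list reports the right feature support.
        return AdapterInfo(
            name="agno",
            capabilities=frozenset(
                {AdapterCapability.STREAMING, AdapterCapability.REPLAY}
            ),
        )
```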

3. Tests

  • Per-adapter test extension: every adapter test file gained a declares_streaming_and_replay_capabilities (6) or declares_replay_capability (7) test that asserts the capability appears in get_adapter_info().capabilities.
  • New lint guard tests/instrument/adapters/frameworks/test_capability_consistency.py enforces both rules consistently across every adapter discovered in the branch — the in-tree counterpart of the upstream manifest_consistency lint guard from the manifest emitter PR:
    • test_replay_capability_matches_serialize_for_replay — REPLAY declared iff serialize_for_replay implemented (with non-stub body).
    • test_streaming_capability_declared_for_streaming_adapters — every adapter on the canonical streaming list declares STREAMING.
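
The "REPLAY iff implemented" rule can be sketched as below. This is not the guard from the PR: the class names are hypothetical, and detecting an override of the base-class stub is one assumed way to decide that a body is "non-stub".

```python
from enum import Enum


class AdapterCapability(Enum):
    STREAMING = "streaming"
    REPLAY = "replay"


class BaseAdapter:
    capabilities: frozenset = frozenset()

    def serialize_for_replay(self):
        # Stub on the base class: subclasses must override to support replay.
        raise NotImplementedError


class ReplayableAdapter(BaseAdapter):
    capabilities = frozenset({AdapterCapability.REPLAY})

    def serialize_for_replay(self):
        return {"events": []}


class MisdeclaredAdapter(BaseAdapter):
    # Declares REPLAY but inherits the stub, so the guard should flag it.
    capabilities = frozenset({AdapterCapability.REPLAY})


def implements_replay(adapter_cls: type) -> bool:
    """True when the class overrides the base-class stub."""
    return adapter_cls.serialize_for_replay is not BaseAdapter.serialize_for_replay


def replay_declaration_is_consistent(adapter_cls: type) -> bool:
    """REPLAY must be declared if and only if replay is implemented."""
    declared = AdapterCapability.REPLAY in adapter_cls.capabilities
    return declared == implements_replay(adapter_cls)
```

Checking both directions catches the two failure modes separately: an adapter that supports replay but hides it, and one that advertises replay it cannot deliver.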

4. Drive-by fixes

  • tests/instrument/adapters/frameworks/test_bulk_ported_smoke.py was hard-coded to import every adapter package, failing collection on any branch missing one of them. Replaced the per-adapter import block with a try/except importlib loop so the smoke suite tests whatever adapters are present.
  • test_agentforce.test_package_does_not_eagerly_import_requests was deleting agentforce.* from sys.modules without restoring them, breaking class identity (is) checks in subsequent tests within the same session. Saved/restored the original module objects so the cleanup is hermetic.
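
The try/except importlib loop from the first drive-by can be sketched as follows. The package list is a placeholder built from stdlib names so the example runs anywhere; the real smoke suite would iterate over adapter package paths instead.

```python
import importlib

# Hypothetical candidate list; stdlib names are used so the sketch runs anywhere.
CANDIDATE_PACKAGES = ["json", "csv", "package_that_is_not_installed_xyz"]


def discover_present(packages: list[str]) -> dict[str, object]:
    """Import each candidate, skipping the ones that are missing."""
    present = {}
    for name in packages:
        try:
            present[name] = importlib.import_module(name)
        except ImportError:
            # Adapter is not on this branch or its SDK is not installed; skip it.
            continue
    return present
```

Catching `ImportError` also covers `ModuleNotFoundError`, which is its subclass, so a missing package and a missing transitive dependency are both skipped.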

5. Sample regenerated trust-layer YAML

# LayerLens Policy - Imported from Einstein Trust Layer
# Generated by: layerlens.instrument.adapters.frameworks.agentforce.trust_layer
# Source: Salesforce Einstein Trust Layer

policy:
  name: customer_policy
  version: "1.0.0"
  description: "Policy imported from Salesforce Einstein Trust Layer"
  source: salesforce_agentforce

settings:
  data_masking: true
  zero_data_retention: true
  audit_trail: true

rules:
  - name: toxicity_detection
    description: "Imported from Einstein Trust Layer: toxicity"
    type: toxicity
    enabled: true
    threshold: 0.7
    action: block
    source: einstein_trust_layer
  - name: pii_detection
    description: "Imported from Einstein Trust Layer: pii"
    type: pii
    enabled: true
    threshold: 0.8
    action: block
    source: einstein_trust_layer
  ...

Verification

$ uv run python -m pytest tests/instrument/adapters/frameworks/
======================= 231 passed, 1 skipped in 2.14s ========================

$ uv run python -m pytest tests/instrument/adapters/frameworks/test_agentforce.py -k brand_leak
====================== 1 passed, 37 deselected in 0.66s =======================

$ uv run python -m mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce
Success: no issues found in 11 source files

$ uv run python -m ruff check src/layerlens/instrument/adapters/frameworks/ tests/instrument/adapters/frameworks/
All checks passed!

Branch base note

This branch merges feat/instrument-frameworks-agentforce (brand leak source) with feat/instrument-frameworks-agents (the 13 agent-tier framework adapters that need capability declarations) so all enumerated audit items can land in a single PR. Conflicts resolved in pyproject.toml (combined the agentforce + agent-tier extras blocks) and frameworks/__init__.py (combined the docstring adapter list).

Test plan

  • uv run python -m pytest tests/instrument/adapters/frameworks/ (231 passed, 1 skipped)
  • uv run python -m pytest tests/instrument/adapters/frameworks/test_agentforce.py -k brand_leak (1 passed)
  • uv run python -m mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce (clean)
  • uv run python -m ruff check src/layerlens/instrument/adapters/frameworks/ tests/instrument/adapters/frameworks/ (clean)
  • Manual sample of regenerated trust-layer YAML — LayerLens branding, no STRATIX leaks.

mmercuri and others added 7 commits April 25, 2026 19:13
Bootstraps the LayerLens instrument layer with the abstract base classes,
adapter registry, capture configuration, event sinks, vendored event
schemas, and pydantic v1/v2 compatibility shim that every concrete
adapter (frameworks, protocols, providers) will depend on.

Scope
-----
- src/layerlens/instrument/__init__.py: lean re-export surface
- src/layerlens/instrument/_vendored/: frozen ateam event schemas (no
  runtime ateam dependency)
- src/layerlens/instrument/adapters/_base/: BaseAdapter, AdapterRegistry,
  AdapterStatus, AdapterHealth, AdapterCapability, ReplayableTrace,
  CaptureConfig, EventSink, TraceStoreSink, IngestionPipelineSink,
  PydanticCompat
- src/layerlens/_compat/pydantic.py: model_dump/model_validate shim
  spanning pydantic v1 + v2
- scripts/{port_adapter,port_protocol,emit_adapter_manifest,
  regen_dep_baselines}.py: codegen helpers used to port the rest of M1
- tests/instrument/{test_base_layer,test_lazy_imports,
  test_default_install,test_resolved_dep_tree}.py + _baselines/
- .github/workflows/dep-tree-guard.yaml: CI gate that locks the default
  install footprint
- docs/adapters/: CONTRIBUTING, STATUS, pydantic-compatibility, testing,
  PERSONA_REVIEW
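
A model_dump/model_validate shim spanning pydantic v1 and v2 can be written by duck-typing on the method names, roughly as below. This is a sketch and not the contents of src/layerlens/_compat/pydantic.py; it assumes only the public pydantic method names (`model_dump` / `model_validate` in v2, `dict` / `parse_obj` in v1).

```python
from typing import Any


def model_dump(obj: Any) -> dict:
    """Serialize a model on either pydantic major version."""
    if hasattr(obj, "model_dump"):  # pydantic v2
        return obj.model_dump()
    return obj.dict()  # pydantic v1


def model_validate(model_cls: type, data: dict) -> Any:
    """Build a model from a dict on either pydantic major version."""
    if hasattr(model_cls, "model_validate"):  # pydantic v2
        return model_cls.model_validate(data)
    return model_cls.parse_obj(data)  # pydantic v1
```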

Blast radius
------------
- Pure additions. No public surface changes outside the new
  layerlens.instrument namespace.
- Default `pip install layerlens` install set is unchanged (verified by
  test_default_install.py against the new baseline).
- Lazy adapter discovery: importing layerlens.instrument MUST NOT pull
  in any optional adapter dep (verified by test_lazy_imports.py).
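
A lazy-import check of this kind can be sketched by importing the package in a clean interpreter and inspecting sys.modules. The helper name `modules_loaded_by` is hypothetical, and this is not the code in test_lazy_imports.py.

```python
import subprocess
import sys


def modules_loaded_by(import_target: str, watched: list[str]) -> list[str]:
    """Import import_target in a fresh interpreter; report which watched modules loaded."""
    code = (
        "import sys, importlib\n"
        f"importlib.import_module({import_target!r})\n"
        f"print(','.join(m for m in {watched!r} if m in sys.modules))\n"
    )
    # A subprocess gives a clean sys.modules, unaffected by earlier tests.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    loaded = result.stdout.strip()
    return loaded.split(",") if loaded else []
```

A guard built on this would assert that something like `modules_loaded_by("layerlens.instrument", ["requests", "grpc"])` returns an empty list; the watched names there are illustrative.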

Test plan
---------
- uv run pytest tests/instrument/test_base_layer.py
  tests/instrument/test_lazy_imports.py -x  -> 45 passed
- The dep-tree-guard workflow exercises test_default_install.py and
  test_resolved_dep_tree.py against the new baselines on every PR.

LAY-3400 umbrella: this PR is the prerequisite for the M1.B/M1.C/M1.D
adapter ports, M7 protocol certification, and M8 Cohere/Mistral.

Ports the twelve agent-tier framework adapters from the ateam
reference implementation onto the new layerlens.instrument base layer:

  Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno,
  Strands, SmolAgents, MS Agent Framework, Google ADK,
  Bedrock Agents, Embedding (vector store hooks), Benchmark Import

Pairs with feat/instrument-frameworks-orchestration (M1.C part 1)
which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and
Agentforce. Together they complete M1.C.

Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,
  llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,
  ms_agent_framework,google_adk,bedrock_agents,embedding,
  benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py + the
  test_bulk_ported_smoke.py harness (which exercises every ported
  adapter against canned trace fixtures so partial framework SDKs
  on a given runner don't drop coverage to zero)
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration
  guide
- pyproject.toml: twelve new optional extras
  (semantic-kernel, llama-index, openai-agents, pydantic-ai, agno,
  strands, smolagents, ms-agent-framework, google-adk,
  bedrock-agents, embedding, benchmark-import) with python_version
  markers; pyright/ruff exclusions for the dynamic monkey-patching
  framework code

Blast radius
------------
- Default `pip install layerlens` install set is unchanged. Each
  framework's heavy deps are gated behind their own extra.
- No changes to existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework
  module (lazy registry lookup).

Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x  ->
  184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12
  agent-tier adapters plus the orchestration-tier ones from part 1
  via the same harness)

Stacks on
---------
- feat/instrument-base-foundation (M1.A) — required for the
  BaseAdapter surface this PR consumes.

Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1) — both
  branches stack on the base foundation independently and don't
  conflict; they can land in either order.

LAY-3400 umbrella (M1.C part 2).

Auto-fixed by 'ruff check --fix'. No behavior change.

Ports the Salesforce Agentforce framework adapter from ateam
(stratix.sdk.python.adapters.agentforce, ~2,954 LOC across 11 files —
the largest of the M2 framework batch) onto the layerlens.instrument
base layer landed in M1.A.

Scope
-----
- src/layerlens/instrument/adapters/frameworks/agentforce/ — full port
  of all 11 modules (adapter, auth, client, events, importer, llm_eval,
  mapper, models, normalizer, trust_layer, __init__)
- src/layerlens/instrument/adapters/frameworks/__init__.py — package
  marker that does NOT eagerly import any framework SDK
- tests/instrument/adapters/frameworks/test_agentforce.py — 36 unit
  tests (lifecycle, importer with paginated SOQL fixtures, normalizer
  for every DMO record type, Agent API client/mapper round-trip,
  Trust Layer YAML emission + deprecation alias, Platform Events
  handler, Einstein evaluator offline behavior, lazy-import guard).
  All mocks are SDK-shape only — no real Salesforce / network call.
- samples/instrument/agentforce/ — runnable end-to-end sample with
  4 mocked flows (SOQL backfill, live Agent API capture, Trust Layer
  policy export, evaluator offline) plus optional live JWT auth check.
- docs/adapters/frameworks-agentforce.md — integration guide including
  Connected App + JWT Bearer OAuth setup, event taxonomy, capture
  config, BYOK, Trust Layer round-trip, and replay semantics.
- pyproject.toml — new "agentforce" optional extra
  (requests + PyJWT[crypto]).

Salesforce specifics preserved from the source port
---------------------------------------------------
- OAuth 2.0 JWT Bearer flow with private-key resolution from env vars,
  filesystem paths, or inline PEM strings.
- SOQL injection guards: every parent ID interpolated into WHERE … IN
  clauses is validated against the 15/18-char Salesforce ID regex;
  date and timestamp params validated against ISO 8601 regexes.
- Token re-authentication on expiry, with X-RateLimit / Sforce-Limit-Info
  warnings at 80% consumption.
- Trust Layer policy export renamed to_layerlens_policy with a
  deprecation alias keeping to_stratix_policy callable for one
  migration window.
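
The SOQL injection guard described above can be sketched as follows. The helper name `build_in_clause` and the exact regex are assumptions; the source states only that IDs are checked against a 15/18-char Salesforce ID pattern before interpolation.

```python
import re

# Salesforce record IDs are 15 or 18 alphanumeric characters.
_SF_ID_RE = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")


def build_in_clause(field: str, parent_ids: list[str]) -> str:
    """Build a WHERE ... IN clause, rejecting anything that is not a Salesforce ID."""
    for parent_id in parent_ids:
        # fullmatch ensures no quote or operator can ride along with a valid prefix.
        if not _SF_ID_RE.fullmatch(parent_id):
            raise ValueError(f"invalid Salesforce ID: {parent_id!r}")
    quoted = ", ".join(f"'{parent_id}'" for parent_id in parent_ids)
    return f"WHERE {field} IN ({quoted})"
```

Validating before interpolation means a value such as `x' OR '1'='1` is rejected outright instead of being escaped.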

Verification
------------
- mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce
  → Success: no issues found in 11 source files
- ruff check src/layerlens/instrument/adapters/frameworks/agentforce
  tests/instrument/adapters/frameworks/test_agentforce.py
  → All checks passed
- pytest tests/instrument/adapters/frameworks/test_agentforce.py
  → 36 passed
- pytest tests/instrument/test_default_install.py
  → 3 passed (extra does not change default install set)
- python samples/instrument/agentforce/main.py
  → exits 0, prints all 4 flow summaries

Refs LAY-INSTRUMENT (M2 fan-out)

Two corrections after running mypy --strict against a fresh resolved
environment:

- auth.py: drop the now-unused [arg-type] ignore on _check_rate_limit;
  response.headers from requests already typechecks against the
  dict[str, Any] parameter.
- events.py: include [import-not-found] alongside [import-untyped] in
  the optional grpc import; mypy resolves grpc to import-not-found when
  no stubs are installed (the default install path).
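
An optional import carrying both ignore codes might look like the sketch below. It assumes the module is imported as `grpc` and that a missing install should degrade to `None`; the actual handling in events.py is not shown in this commit message.

```python
try:
    # Both codes are listed because mypy reports import-not-found when the
    # package is absent and import-untyped when it is present without stubs.
    import grpc  # type: ignore[import-untyped,import-not-found]
except ImportError:
    grpc = None  # optional dependency not installed


def grpc_available() -> bool:
    """Report whether the optional grpc dependency could be imported."""
    return grpc is not None
```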

mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce
  -> Success: no issues found in 11 source files

…STREAMING/REPLAY capability declarations

Two related fixes from the depth audit (A:/tmp/adapter-depth-audit.md).

1. Brand leak in agentforce trust layer
=========================================

The Trust Layer importer writes a customer-visible YAML policy file
into the customer's source tree. The header carried legacy STRATIX
branding:

  # Stratix Policy - Imported from Einstein Trust Layer
  # Generated by: stratix.sdk.python.adapters.agentforce.trust_layer

Both lines leaked into the customer's VCS, audit packages, and shared
docs. The header landed in the original port; this fix replaces it with
the current LayerLens brand and the actual import path:

  # LayerLens Policy - Imported from Einstein Trust Layer
  # Generated by: layerlens.instrument.adapters.frameworks.agentforce.trust_layer

A regression test (test_trust_layer_yaml_has_no_stratix_brand_leak)
asserts both the positive ("LayerLens Policy" present) and negative
("STRATIX" / "stratix.sdk" absent) cases across the canonical method
AND the deprecated to_stratix_policy alias, so no future regression
slips through either entry point.

The audit-wide sweep (#4 in the brief) caught additional STRATIX brand
strings in non-deprecation contexts. All fixed in this PR:

* embedding/embedding_adapter.py — module docstring + author field
  (was author="STRATIX Team" surfaced via AdapterInfo)
* embedding/vector_store_adapter.py — module docstring + author field
* embedding/__init__.py, benchmark_import/__init__.py,
  benchmark_import/adapter.py, semantic_kernel/__init__.py,
  semantic_kernel/lifecycle.py, semantic_kernel/filters.py — module
  docstrings
* agentforce/auth.py — error messages now say
  `layerlens agentforce connect` (current CLI binary)
* agentforce/events.py — Platform Events thread name (was
  "stratix-sf-events", now "layerlens-sf-events")
* agentforce/mapper.py — internal docstrings
* google_adk, llama_index, openai_agents, semantic_kernel — method
  docstrings that referenced "Stratix events / callbacks"

Public class names that would require a deprecation alias to rename
(StratixMemoryStore, the Stratix client class re-exported from
layerlens, the to_stratix_policy alias method) are intentionally left
in place — class-name renames belong in their own breaking-change PR.
The StratixMemoryStore docstring now carries an explicit note that the
prefix is retained for backward compatibility.

The closure-only StratixEventHandler in llama_index/lifecycle.py was
renamed to LayerLensEventHandler. It is never reachable from outside
the adapter, so the rename needs no alias, but its name does surface in
LlamaIndex dispatcher logs / UI, so it counts as customer-visible.

2. Missing STREAMING / REPLAY capability declarations
======================================================

Per audit, six adapters wrap a streaming entry-point but do not
declare AdapterCapability.STREAMING in get_adapter_info():

* agno          — wraps Agent.arun (async)
* ms_agent_framework — wraps ChatCompletionAgent.invoke_stream
* openai_agents — TraceProcessor receives GenerationSpanData per chunk
* google_adk    — BeforeModelCallback / AfterModelCallback fire per chunk
* llama_index   — Instrumentation Module emits per-chunk events
* bedrock_agents — invoke_agent returns an EventStream completion

All six now declare STREAMING.

Per audit, every adapter implements serialize_for_replay() but only
langfuse declared AdapterCapability.REPLAY. REPLAY is now declared by
every adapter that has its own (non-stub) serialize_for_replay
implementation — which is every BaseAdapter subclass in this branch:
agno, bedrock_agents, google_adk, llama_index, pydantic_ai, strands,
openai_agents, ms_agent_framework, embedding (both EmbeddingAdapter
and VectorStoreAdapter), semantic_kernel, smolagents, agentforce.

Without these declarations, the atlas-app catalog UI surfaces
incorrect feature support — it tells customers they cannot replay
traces from an adapter that supports it, or cannot stream from one
that wraps every streaming entry-point.

Tests
=====

Per-adapter test extension: every adapter test file gained a
"declares_streaming_and_replay_capabilities" or
"declares_replay_capability" test that asserts the capability appears
in get_adapter_info().capabilities.

A new lint guard test
(tests/instrument/adapters/frameworks/test_capability_consistency.py)
enforces both rules consistently across every adapter discovered in
the branch. It is the in-tree counterpart of the upstream
manifest_consistency lint (which lands in the manifest emitter PR):

* test_replay_capability_matches_serialize_for_replay — REPLAY is
  declared iff serialize_for_replay is implemented.
* test_streaming_capability_declared_for_streaming_adapters — every
  adapter on the canonical streaming list declares STREAMING.

Drive-bys
=========

* tests/instrument/adapters/frameworks/test_bulk_ported_smoke.py was
  hard-coded to import every adapter package, which fails collection
  on any branch missing one of them. Replaced the per-adapter import
  block with a try/except importlib loop so the smoke suite tests
  whatever adapters are present, not a fixed superset.
* test_agentforce.test_package_does_not_eagerly_import_requests was
  deleting agentforce.* from sys.modules and never restoring them,
  which broke class identity (`is`) checks in subsequent tests in the
  same session. Saved/restored the original module objects so the
  cleanup is hermetic.
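
The save/restore pattern from the second drive-by can be sketched as below. `reimport_fresh` is a hypothetical helper, not the code in test_agentforce.py; it shows one way to evict a package from sys.modules for a fresh import and then put the original module objects back.

```python
import importlib
import sys


def reimport_fresh(prefix: str, module_name: str):
    """Import module_name from scratch, then restore the original module objects."""
    # Save every cached module under the prefix before evicting it.
    saved = {
        name: mod
        for name, mod in list(sys.modules.items())
        if name == prefix or name.startswith(prefix + ".")
    }
    for name in saved:
        del sys.modules[name]
    try:
        return importlib.import_module(module_name)
    finally:
        # Drop whatever the fresh import cached, then put the originals back
        # so classes keep their identity for later tests.
        for name in list(sys.modules):
            if name == prefix or name.startswith(prefix + "."):
                del sys.modules[name]
        sys.modules.update(saved)
```

Restoring in a `finally` block keeps the cleanup hermetic even when the fresh import raises.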

Verification
============

* uv run python -m pytest tests/instrument/adapters/frameworks/
    -> 231 passed, 1 skipped
* uv run python -m pytest tests/instrument/adapters/frameworks/
      test_agentforce.py -k brand_leak
    -> 1 passed
* uv run python -m mypy --strict
      src/layerlens/instrument/adapters/frameworks/agentforce
    -> Success: no issues found in 11 source files
* uv run python -m ruff check
      src/layerlens/instrument/adapters/frameworks/
      tests/instrument/adapters/frameworks/
    -> All checks passed

Refs adapter-depth-audit.md (brand leak + capability declarations)

@mmercuri mmercuri requested a review from m-peko April 27, 2026 00:18
mmercuri added a commit that referenced this pull request May 10, 2026
…#119 deferred)

LangGraph wraps llm.stream / llm.astream (see frameworks/langgraph/llm.py)
and accumulates per-chunk events in TracedLLM. The atlas-app catalog UI
reads AdapterCapability.STREAMING off info() to surface streaming
support, but the langgraph adapter only declared REPLAY here.

PR #119 (brand leak + capability declarations) wired STREAMING for the
six adapters that lived on its branch; langgraph was deferred because
it lives on its own source-port branch (PR #100). This closes that
deferral per CLAUDE.md item 5/11.

Capabilities declared honestly per CLAUDE.md 'no fake claims': REPLAY
matches serialize_for_replay (already present); STREAMING matches the
wrapped llm.stream / llm.astream entry-points.

Tests: added test_declares_replay_and_streaming_capabilities regression
asserting both capabilities surface via adapter.info().capabilities.

Verification:
* uv run python -m pytest tests/instrument/adapters/frameworks/test_langgraph.py -x
    -> 30 passed

mmercuri added a commit that referenced this pull request May 10, 2026
…deferred)

CrewAIAdapter implements serialize_for_replay (returns a populated
ReplayableTrace with events + state snapshots + capture config), but
get_adapter_info().capabilities did not declare AdapterCapability.REPLAY.
The atlas-app catalog UI reads that list to surface replay support, so
customers were told they could not replay traces from CrewAI even though
the adapter supports it.

PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; CrewAI was deferred because it lives
on its own source-port branch (PR #104). This closes that deferral per
CLAUDE.md item 5/11.

STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake claims',
a capability is only declared if the adapter actually implements it.
The CrewAI integration is purely callback-driven (kickoff / task
callbacks via callbacks.py) and does not wrap a streaming entry-point —
no per-chunk events flow through the adapter.

Tests: added test_declares_replay_capability regression asserting
REPLAY appears and STREAMING does NOT, so a future refactor that adds
streaming must update both the implementation and the test together.

Verification:
* uv run python -m pytest tests/instrument/adapters/frameworks/test_crewai.py -x
    -> 13 passed

mmercuri added a commit that referenced this pull request May 10, 2026
…deferred)

AutoGenAdapter implements serialize_for_replay (returns a populated
ReplayableTrace with events + state snapshots + capture config), but
get_adapter_info().capabilities did not declare AdapterCapability.REPLAY.
The atlas-app catalog UI reads that list to surface replay support, so
customers were told they could not replay traces from AutoGen even
though the adapter supports it.

PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; AutoGen was deferred because it
lives on its own source-port branch (PR #101). This closes that
deferral per CLAUDE.md item 5/11.

STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake claims',
a capability is only declared if the adapter actually implements it.
The AutoGen integration wraps send / receive / generate_reply /
execute_code as discrete method calls rather than streaming
entry-points — no per-chunk events flow through the adapter.

Tests: added test_declares_replay_capability regression asserting
REPLAY appears and STREAMING does NOT.

Verification:
* uv run --with pytest python -m pytest tests/instrument/adapters/frameworks/test_autogen.py -x
    -> 14 passed

mmercuri added a commit that referenced this pull request May 10, 2026
 deferred)

LayerLensCallbackHandler implements serialize_for_replay (returns
a non-stub ReplayableTrace populated with self._trace_events plus the
capture config) AND a working execute_replay coroutine, but
get_adapter_info().capabilities did not declare
AdapterCapability.REPLAY. The atlas-app catalog UI reads that list
to surface replay support, so customers were told they could not
replay traces from LangChain even though the adapter supports it.

PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; LangChain was deferred because it
lives on the orchestration source-port branch (PR #96). This closes
that deferral per CLAUDE.md item 5/11.

STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake
claims', a capability is only declared if the adapter actually
implements it. The LangChain adapter registers
on_chat_model_start / on_llm_start / on_llm_end / on_tool_* /
on_agent_* / on_chain_* — but NOT on_llm_new_token, the LangChain
streaming callback. No per-chunk events flow through this adapter.
Adding STREAMING here would mislead the catalog UI into telling
customers they can stream from an adapter that does not see chunks.

Tests: added tests/instrument/adapters/frameworks/test_langchain_capabilities.py
with three regressions:
* test_declares_replay_capability — REPLAY surfaces via info().capabilities
* test_does_not_declare_streaming_capability — STREAMING stays absent
  until on_llm_new_token is wired and tested explicitly
* test_get_adapter_info_matches_info_wrapper — info() and
  get_adapter_info() agree on the capability list

Verification:
* uv run --with pytest python -m pytest tests/instrument/adapters/frameworks/test_langchain_capabilities.py -x
    -> 3 passed
* uv run --with pytest python -m pytest tests/instrument/adapters/frameworks/ -x
    -> 40 passed (no regressions in autogen / crewai / langfuse suites)

@m-peko m-peko closed this May 21, 2026