fix(instrument): Brand leak in agentforce trust layer YAML + missing STREAMING/REPLAY capability declarations#119
Closed
mmercuri wants to merge 7 commits into
Bootstraps the LayerLens instrument layer with the abstract base classes,
adapter registry, capture configuration, event sinks, vendored event
schemas, and pydantic v1/v2 compatibility shim that every concrete
adapter (frameworks, protocols, providers) will depend on.
Scope
-----
- src/layerlens/instrument/__init__.py: lean re-export surface
- src/layerlens/instrument/_vendored/: frozen ateam event schemas (no
runtime ateam dependency)
- src/layerlens/instrument/adapters/_base/: BaseAdapter, AdapterRegistry,
AdapterStatus, AdapterHealth, AdapterCapability, ReplayableTrace,
CaptureConfig, EventSink, TraceStoreSink, IngestionPipelineSink,
PydanticCompat
- src/layerlens/_compat/pydantic.py: model_dump/model_validate shim
spanning pydantic v1 + v2
- scripts/{port_adapter,port_protocol,emit_adapter_manifest,
regen_dep_baselines}.py: codegen helpers used to port the rest of M1
- tests/instrument/{test_base_layer,test_lazy_imports,
test_default_install,test_resolved_dep_tree}.py + _baselines/
- .github/workflows/dep-tree-guard.yaml: CI gate that locks the default
install footprint
- docs/adapters/: CONTRIBUTING, STATUS, pydantic-compatibility, testing,
PERSONA_REVIEW
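A model_dump/model_validate shim of the kind described for src/layerlens/_compat/pydantic.py can be sketched as below. This is a minimal illustration, not the contents of that module: the function names mirror the pydantic v2 method names, the v1 fallbacks use the v1 `.dict()` and `.parse_obj()` methods, and the two stand-in classes are hypothetical so the block runs without pydantic installed.

```python
from typing import Any, Dict


def model_dump(model: Any) -> Dict[str, Any]:
    """Serialize a model with whichever pydantic API it exposes."""
    if hasattr(model, "model_dump"):  # pydantic v2 spelling
        return model.model_dump()
    return model.dict()  # pydantic v1 spelling


def model_validate(cls: Any, data: Dict[str, Any]) -> Any:
    """Build a model from a dict with whichever pydantic API the class exposes."""
    if hasattr(cls, "model_validate"):  # pydantic v2 spelling
        return cls.model_validate(data)
    return cls.parse_obj(data)  # pydantic v1 spelling


class V1Style:
    """Hypothetical stand-in exposing only the v1 method names."""

    def __init__(self, name: str) -> None:
        self.name = name

    def dict(self) -> Dict[str, Any]:
        return {"name": self.name}

    @classmethod
    def parse_obj(cls, data: Dict[str, Any]) -> "V1Style":
        return cls(**data)


class V2Style:
    """Hypothetical stand-in exposing only the v2 method names."""

    def __init__(self, name: str) -> None:
        self.name = name

    def model_dump(self) -> Dict[str, Any]:
        return {"name": self.name}

    @classmethod
    def model_validate(cls, data: Dict[str, Any]) -> "V2Style":
        return cls(**data)
```

Callers go through the two free functions only, so adapter code never needs to know which pydantic major version is installed.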
Blast radius
------------
- Pure additions. No public surface changes outside the new
layerlens.instrument namespace.
- Default `pip install layerlens` install set is unchanged (verified by
test_default_install.py against the new baseline).
- Lazy adapter discovery: importing layerlens.instrument MUST NOT pull
in any optional adapter dep (verified by test_lazy_imports.py).
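The lazy-import guarantee above can be checked by importing the package in a fresh interpreter and inspecting sys.modules. The sketch below shows that technique under stated assumptions: `modules_loaded_by` is a hypothetical helper, not the code in test_lazy_imports.py, and the usage example substitutes stdlib modules for layerlens.instrument and its optional deps.

```python
import ast
import subprocess
import sys
from typing import List


def modules_loaded_by(target: str, watched: List[str]) -> List[str]:
    """Import `target` in a fresh interpreter; return the watched modules it loaded."""
    probe = (
        "import sys\n"
        f"__import__({target!r})\n"
        f"print([name for name in {watched!r} if name in sys.modules])\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    )
    # The probe prints the result last, so ignore any output the import itself produced.
    return ast.literal_eval(result.stdout.strip().splitlines()[-1])


# Stand-in usage: importing json must not drag in the unrelated colorsys module.
leaked = modules_loaded_by("json", ["colorsys"])
```

A subprocess is used because the test session's own sys.modules is already polluted by whatever earlier tests imported.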
Test plan
---------
- uv run pytest tests/instrument/test_base_layer.py
tests/instrument/test_lazy_imports.py -x -> 45 passed
- The dep-tree-guard workflow exercises test_default_install.py and
test_resolved_dep_tree.py against the new baselines on every PR.
LAY-3400 umbrella: this PR is the prerequisite for the M1.B/M1.C/M1.D
adapter ports, M7 protocol certification, and M8 Cohere/Mistral.
Ports the twelve agent-tier framework adapters from the ateam
reference implementation onto the new layerlens.instrument base layer:
Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno,
Strands, SmolAgents, MS Agent Framework, Google ADK,
Bedrock Agents, Embedding (vector store hooks), Benchmark Import
Pairs with feat/instrument-frameworks-orchestration (M1.C part 1)
which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and
Agentforce. Together they complete M1.C.
Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,
llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,
ms_agent_framework,google_adk,bedrock_agents,embedding,
benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py + the
test_bulk_ported_smoke.py harness (which exercises every ported
adapter against canned trace fixtures so partial framework SDKs
on a given runner don't drop coverage to zero)
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration
guide
- pyproject.toml: twelve new optional extras
(semantic-kernel, llama-index, openai-agents, pydantic-ai, agno,
strands, smolagents, ms-agent-framework, google-adk,
bedrock-agents, embedding, benchmark-import) with python_version
markers; pyright/ruff exclusions for the dynamic monkey-patching
framework code
Blast radius
------------
- Default `pip install layerlens` install set is unchanged. Each
framework's heavy deps are gated behind their own extra.
- No changes to existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework
module (lazy registry lookup).
Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x ->
184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12
agent-tier adapters plus the orchestration-tier ones from part 1
via the same harness)
Stacks on
---------
- feat/instrument-base-foundation (M1.A) — required for the
BaseAdapter surface this PR consumes.
Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1) — both
branches stack on the base foundation independently and don't
conflict; they can land in either order.
LAY-3400 umbrella (M1.C part 2).
Auto-fixed by 'ruff check --fix'. No behavior change.
Ports the Salesforce Agentforce framework adapter from ateam
(stratix.sdk.python.adapters.agentforce, ~2,954 LOC across 11 files,
the largest of the M2 framework batch) onto the layerlens.instrument
base layer landed in M1.A.
Scope
-----
- src/layerlens/instrument/adapters/frameworks/agentforce/: full port
of all 11 modules (adapter, auth, client, events, importer, llm_eval,
mapper, models, normalizer, trust_layer, __init__)
- src/layerlens/instrument/adapters/frameworks/__init__.py: package
marker that does NOT eagerly import any framework SDK
- tests/instrument/adapters/frameworks/test_agentforce.py: 36 unit
tests (lifecycle, importer with paginated SOQL fixtures, normalizer
for every DMO record type, Agent API client/mapper round-trip, Trust
Layer YAML emission + deprecation alias, Platform Events handler,
Einstein evaluator offline behavior, lazy-import guard). All mocks are
SDK-shape only: no real Salesforce / network call.
- samples/instrument/agentforce/: runnable end-to-end sample with 4
mocked flows (SOQL backfill, live Agent API capture, Trust Layer
policy export, evaluator offline) plus optional live JWT auth check.
- docs/adapters/frameworks-agentforce.md: integration guide including
Connected App + JWT Bearer OAuth setup, event taxonomy, capture
config, BYOK, Trust Layer round-trip, and replay semantics.
- pyproject.toml: new "agentforce" optional extra (requests +
PyJWT[crypto]).
Salesforce specifics preserved from the source port
---------------------------------------------------
- OAuth 2.0 JWT Bearer flow with private-key resolution from env vars,
filesystem paths, or inline PEM strings.
- SOQL injection guards: every parent ID interpolated into WHERE … IN
clauses is validated against the 15/18-char Salesforce ID regex; date
and timestamp params validated against ISO 8601 regexes.
- Token re-authentication on expiry, with X-RateLimit /
Sforce-Limit-Info warnings at 80% consumption.
- Trust Layer policy export renamed to_layerlens_policy with a
deprecation alias keeping to_stratix_policy callable for one migration
window.
Verification
------------
- mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce
-> Success: no issues found in 11 source files
- ruff check src/layerlens/instrument/adapters/frameworks/agentforce
tests/instrument/adapters/frameworks/test_agentforce.py
-> All checks passed
- pytest tests/instrument/adapters/frameworks/test_agentforce.py
-> 36 passed
- pytest tests/instrument/test_default_install.py
-> 3 passed (extra does not change default install set)
- python samples/instrument/agentforce/main.py
-> exits 0, prints all 4 flow summaries
Refs LAY-INSTRUMENT (M2 fan-out)
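The SOQL injection guard described above (validating each parent ID before it is interpolated into an IN clause) can be sketched as follows. This is an illustrative sketch only: the helper name, the field name in the usage line, and the exact regex are assumptions, with the pattern chosen to accept 15- or 18-character alphanumeric Salesforce IDs.

```python
import re
from typing import Iterable, List

# Assumed pattern: 15 alphanumeric chars, optionally followed by a 3-char suffix.
_SF_ID_RE = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")


def soql_in_clause(field: str, ids: Iterable[str]) -> str:
    """Return `field IN ('id1', 'id2', ...)`, rejecting anything that is not an ID."""
    quoted: List[str] = []
    for raw in ids:
        # fullmatch rejects quotes, whitespace, and trailing newlines outright.
        if not _SF_ID_RE.fullmatch(raw):
            raise ValueError(f"not a valid Salesforce ID: {raw!r}")
        quoted.append(f"'{raw}'")
    if not quoted:
        raise ValueError("at least one ID is required")
    return f"{field} IN ({', '.join(quoted)})"


# Hypothetical usage with a 15-char and an 18-char ID.
clause = soql_in_clause("ParentId", ["001" + "A" * 12, "001" + "B" * 15])
```

Because only validated IDs ever reach the f-string, a value such as `x' OR '1'='1` raises before any query text is built.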
Two corrections after running mypy --strict against a fresh resolved
environment:
- auth.py: drop the now-unused [arg-type] ignore on _check_rate_limit;
response.headers from requests already typechecks against the
dict[str, Any] parameter.
- events.py: include [import-not-found] alongside [import-untyped] in
the optional grpc import; mypy resolves grpc to import-not-found when
no stubs are installed (the default install path).
mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce
-> Success: no issues found in 11 source files
…STREAMING/REPLAY capability declarations
Two related fixes from the depth audit (A:/tmp/adapter-depth-audit.md).
1. Brand leak in agentforce trust layer
=========================================
The Trust Layer importer writes a customer-visible YAML policy file
into the customer's source tree. The header carried legacy STRATIX
branding:
# Stratix Policy - Imported from Einstein Trust Layer
# Generated by: stratix.sdk.python.adapters.agentforce.trust_layer
Both lines leak into the customer's VCS, audit packages, and shared
docs. They landed in the original port; this fix replaces them with the
current LayerLens brand and the actual import path:
# LayerLens Policy - Imported from Einstein Trust Layer
# Generated by: layerlens.instrument.adapters.frameworks.agentforce.trust_layer
A regression test (test_trust_layer_yaml_has_no_stratix_brand_leak)
asserts both the positive ("LayerLens Policy" present) and negative
("STRATIX" / "stratix.sdk" absent) cases across the canonical method
AND the deprecated to_stratix_policy alias, so no future regression
slips through either entry point.
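A regression check of this shape, covering both the canonical method and the deprecated alias, might look like the sketch below. `TrustLayerExporter` and the `rules: []` body are hypothetical stand-ins; only the two header lines and the method names to_layerlens_policy / to_stratix_policy come from the description above.

```python
import warnings

HEADER = (
    "# LayerLens Policy - Imported from Einstein Trust Layer\n"
    "# Generated by: layerlens.instrument.adapters.frameworks.agentforce.trust_layer\n"
)


class TrustLayerExporter:
    """Hypothetical stand-in for the Trust Layer policy exporter."""

    def to_layerlens_policy(self) -> str:
        return HEADER + "rules: []\n"  # placeholder policy body

    def to_stratix_policy(self) -> str:
        # Deprecated alias: warn, then delegate so both paths emit identical YAML.
        warnings.warn(
            "to_stratix_policy is deprecated; use to_layerlens_policy",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.to_layerlens_policy()


def assert_no_brand_leak(yaml_text: str) -> None:
    """Positive and negative brand assertions on one emitted document."""
    assert "LayerLens Policy" in yaml_text
    for banned in ("STRATIX", "stratix.sdk"):
        assert banned not in yaml_text


def check_both_entry_points() -> None:
    exporter = TrustLayerExporter()
    assert_no_brand_leak(exporter.to_layerlens_policy())
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = exporter.to_stratix_policy()
    assert_no_brand_leak(legacy)
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Delegating from the alias to the canonical method means the two entry points cannot drift apart, which is what makes a single shared assertion helper sufficient.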
The audit-wide sweep (#4 in the brief) caught additional STRATIX brand
strings in non-deprecation contexts. All fixed in this PR:
* embedding/embedding_adapter.py — module docstring + author field
(was author="STRATIX Team" surfaced via AdapterInfo)
* embedding/vector_store_adapter.py — module docstring + author field
* embedding/__init__.py, benchmark_import/__init__.py,
benchmark_import/adapter.py, semantic_kernel/__init__.py,
semantic_kernel/lifecycle.py, semantic_kernel/filters.py — module
docstrings
* agentforce/auth.py — error messages now say
`layerlens agentforce connect` (current CLI binary)
* agentforce/events.py — Platform Events thread name (was
"stratix-sf-events", now "layerlens-sf-events")
* agentforce/mapper.py — internal docstrings
* google_adk, llama_index, openai_agents, semantic_kernel — method
docstrings that referenced "Stratix events / callbacks"
Public class names that would require a deprecation alias to rename
(StratixMemoryStore, the Stratix client class re-exported from
layerlens, the to_stratix_policy alias method) are intentionally left
in place — class-name renames belong in their own breaking-change PR.
The StratixMemoryStore docstring now carries an explicit note that the
prefix is retained for backward compatibility.
The closure-only StratixEventHandler in llama_index/lifecycle.py was
renamed to LayerLensEventHandler. The rename needs no alias because the
class is never reachable from outside the adapter, and it is warranted
because the name surfaces in LlamaIndex dispatcher logs / UI, so it
counts as customer-visible.
2. Missing STREAMING / REPLAY capability declarations
======================================================
Per audit, six adapters wrap a streaming entry-point but do not
declare AdapterCapability.STREAMING in get_adapter_info():
* agno — wraps Agent.arun (async)
* ms_agent_framework — wraps ChatCompletionAgent.invoke_stream
* openai_agents — TraceProcessor receives GenerationSpanData per chunk
* google_adk — BeforeModelCallback / AfterModelCallback fire per chunk
* llama_index — Instrumentation Module emits per-chunk events
* bedrock_agents — invoke_agent returns an EventStream completion
All six now declare STREAMING.
Per audit, every adapter implements serialize_for_replay() but only
langfuse declared AdapterCapability.REPLAY. REPLAY is now declared by
every adapter that has its own (non-stub) serialize_for_replay
implementation — which is every BaseAdapter subclass in this branch:
agno, bedrock_agents, google_adk, llama_index, pydantic_ai, strands,
openai_agents, ms_agent_framework, embedding (both EmbeddingAdapter
and VectorStoreAdapter), semantic_kernel, smolagents, agentforce.
Without these declarations, the atlas-app catalog UI surfaces
incorrect feature support — it tells customers they cannot replay
traces from an adapter that supports it, or cannot stream from one
that wraps every streaming entry-point.
Tests
=====
Per-adapter test extension: every adapter test file gained a
"declares_streaming_and_replay_capabilities" or
"declares_replay_capability" test that asserts the capability appears
in get_adapter_info().capabilities.
A new lint guard test
(tests/instrument/adapters/frameworks/test_capability_consistency.py)
enforces both rules consistently across every adapter discovered in
the branch. It is the in-tree counterpart of the upstream
manifest_consistency lint (which lands in the manifest emitter PR):
* test_replay_capability_matches_serialize_for_replay — REPLAY is
declared iff serialize_for_replay is implemented.
* test_streaming_capability_declared_for_streaming_adapters — every
adapter on the canonical streaming list declares STREAMING.
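The "REPLAY is declared iff serialize_for_replay is implemented" rule can be sketched as a small consistency check. Everything here is a simplified stand-in: the real adapters expose capabilities through get_adapter_info(), whereas this sketch uses a class attribute, and it treats "overrides the base method" as an assumed proxy for "has a non-stub implementation".

```python
from enum import Enum
from typing import Any, Dict, FrozenSet, Iterable, List


class AdapterCapability(Enum):  # stand-in for the real enum
    STREAMING = "streaming"
    REPLAY = "replay"


class BaseAdapter:  # stand-in for the real base class
    capabilities: FrozenSet[AdapterCapability] = frozenset()

    def serialize_for_replay(self) -> Dict[str, Any]:
        raise NotImplementedError


class ReplayAdapter(BaseAdapter):
    """Implements replay and declares it."""

    capabilities = frozenset({AdapterCapability.REPLAY})

    def serialize_for_replay(self) -> Dict[str, Any]:
        return {"events": []}


class MisdeclaredAdapter(BaseAdapter):
    """Implements replay but forgets to declare it."""

    def serialize_for_replay(self) -> Dict[str, Any]:
        return {"events": []}


def overrides_replay(cls: type) -> bool:
    """True when the class supplies its own serialize_for_replay."""
    return cls.serialize_for_replay is not BaseAdapter.serialize_for_replay


def replay_mismatches(adapters: Iterable[type]) -> List[str]:
    """Names of adapters whose REPLAY declaration disagrees with their code."""
    return [
        cls.__name__
        for cls in adapters
        if (AdapterCapability.REPLAY in cls.capabilities) != overrides_replay(cls)
    ]
```

The check is symmetric, so it flags both an undeclared implementation and a declared capability with no implementation behind it.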
Drive-bys
=========
* tests/instrument/adapters/frameworks/test_bulk_ported_smoke.py was
hard-coded to import every adapter package, which fails collection
on any branch missing one of them. Replaced the per-adapter import
block with a try/except importlib loop so the smoke suite tests
whatever adapters are present, not a fixed superset.
* test_agentforce.test_package_does_not_eagerly_import_requests was
deleting agentforce.* from sys.modules and never restoring them,
which broke class identity (`is`) checks in subsequent tests in the
same session. Saved/restored the original module objects so the
cleanup is hermetic.
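Both drive-by fixes follow common patterns that can be sketched briefly. The helper names below are hypothetical and the code is not taken from the test files: `import_available` shows a try/except importlib loop that keeps whatever imports successfully, and `modules_evicted` shows evicting modules from sys.modules for the duration of a block and then restoring the original objects.

```python
import contextlib
import importlib
import sys
from types import ModuleType
from typing import Dict, Iterable, Iterator


def import_available(names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each module that exists; silently skip the ones that do not."""
    found: Dict[str, ModuleType] = {}
    for name in names:
        try:
            found[name] = importlib.import_module(name)
        except ImportError:
            continue
    return found


def _matches(name: str, prefix: str) -> bool:
    return name == prefix or name.startswith(prefix + ".")


@contextlib.contextmanager
def modules_evicted(prefix: str) -> Iterator[None]:
    """Temporarily remove `prefix` and its submodules from sys.modules."""
    saved = {n: m for n, m in sys.modules.items() if _matches(n, prefix)}
    for name in saved:
        del sys.modules[name]
    try:
        yield
    finally:
        # Drop any fresh copies imported inside the block, then put the
        # original module objects back so class identity checks keep working.
        for name in [n for n in sys.modules if _matches(n, prefix)]:
            del sys.modules[name]
        sys.modules.update(saved)
```

Restoring the saved objects, rather than letting later imports create new ones, is what keeps `is` comparisons against previously imported classes valid for the rest of the session.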
Verification
============
* uv run python -m pytest tests/instrument/adapters/frameworks/
-> 231 passed, 1 skipped
* uv run python -m pytest tests/instrument/adapters/frameworks/
test_agentforce.py -k brand_leak
-> 1 passed
* uv run python -m mypy --strict
src/layerlens/instrument/adapters/frameworks/agentforce
-> Success: no issues found in 11 source files
* uv run python -m ruff check
src/layerlens/instrument/adapters/frameworks/
tests/instrument/adapters/frameworks/
-> All checks passed
Refs adapter-depth-audit.md (brand leak + capability declarations)
mmercuri
added a commit
that referenced
this pull request
May 10, 2026
…#119 deferred)
LangGraph wraps llm.stream / llm.astream (see
frameworks/langgraph/llm.py) and accumulates per-chunk events in
TracedLLM. The atlas-app catalog UI reads AdapterCapability.STREAMING
off info() to surface streaming support, but the langgraph adapter only
declared REPLAY here.
PR #119 (brand leak + capability declarations) wired STREAMING for the
six adapters that lived on its branch; langgraph was deferred because
it lives on its own source-port branch (PR #100). This closes that
deferral per CLAUDE.md item 5/11.
Capabilities declared honestly per CLAUDE.md 'no fake claims': REPLAY
matches serialize_for_replay (already present); STREAMING matches the
wrapped llm.stream / llm.astream entry-points.
Tests: added test_declares_replay_and_streaming_capabilities regression
asserting both capabilities surface via adapter.info().capabilities.
Verification:
* uv run python -m pytest
tests/instrument/adapters/frameworks/test_langgraph.py -x
-> 30 passed
mmercuri
added a commit
that referenced
this pull request
May 10, 2026
…deferred)
CrewAIAdapter implements serialize_for_replay (returns a populated
ReplayableTrace with events + state snapshots + capture config), but
get_adapter_info().capabilities did not declare
AdapterCapability.REPLAY. The atlas-app catalog UI reads that list to
surface replay support, so customers were told they could not replay
traces from CrewAI even though the adapter supports it.
PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; CrewAI was deferred because it lives
on its own source-port branch (PR #104). This closes that deferral per
CLAUDE.md item 5/11.
STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake
claims', a capability is only declared if the adapter actually
implements it. The CrewAI integration is purely callback-driven
(kickoff / task callbacks via callbacks.py) and does not wrap a
streaming entry-point, so no per-chunk events flow through the adapter.
Tests: added test_declares_replay_capability regression asserting
REPLAY appears and STREAMING does NOT, so a future refactor that adds
streaming must update both the implementation and the test together.
Verification:
* uv run python -m pytest
tests/instrument/adapters/frameworks/test_crewai.py -x
-> 13 passed
mmercuri
added a commit
that referenced
this pull request
May 10, 2026
…deferred)
AutoGenAdapter implements serialize_for_replay (returns a populated
ReplayableTrace with events + state snapshots + capture config), but
get_adapter_info().capabilities did not declare
AdapterCapability.REPLAY. The atlas-app catalog UI reads that list to
surface replay support, so customers were told they could not replay
traces from AutoGen even though the adapter supports it.
PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; AutoGen was deferred because it
lives on its own source-port branch (PR #101). This closes that
deferral per CLAUDE.md item 5/11.
STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake
claims', a capability is only declared if the adapter actually
implements it. The AutoGen integration wraps send / receive /
generate_reply / execute_code as discrete method calls rather than
streaming entry-points, so no per-chunk events flow through the
adapter.
Tests: added test_declares_replay_capability regression asserting
REPLAY appears and STREAMING does NOT.
Verification:
* uv run --with pytest python -m pytest
tests/instrument/adapters/frameworks/test_autogen.py -x
-> 14 passed
mmercuri
added a commit
that referenced
this pull request
May 10, 2026
deferred)
LayerLensCallbackHandler implements serialize_for_replay (returns a
non-stub ReplayableTrace populated with self._trace_events plus the
capture config) AND a working execute_replay coroutine, but
get_adapter_info().capabilities did not declare
AdapterCapability.REPLAY. The atlas-app catalog UI reads that list to
surface replay support, so customers were told they could not replay
traces from LangChain even though the adapter supports it.
PR #119 (brand leak + capability declarations) wired REPLAY for the
adapters that lived on its branch; LangChain was deferred because it
lives on the orchestration source-port branch (PR #96). This closes
that deferral per CLAUDE.md item 5/11.
STREAMING is intentionally NOT declared. Per CLAUDE.md 'no fake
claims', a capability is only declared if the adapter actually
implements it. The LangChain adapter registers on_chat_model_start /
on_llm_start / on_llm_end / on_tool_* / on_agent_* / on_chain_*, but
NOT on_llm_new_token, the LangChain streaming callback. No per-chunk
events flow through this adapter. Adding STREAMING here would mislead
the catalog UI into telling customers they can stream from an adapter
that does not see chunks.
Tests: added
tests/instrument/adapters/frameworks/test_langchain_capabilities.py
with three regressions:
* test_declares_replay_capability: REPLAY surfaces via
info().capabilities
* test_does_not_declare_streaming_capability: STREAMING stays absent
until on_llm_new_token is wired and tested explicitly
* test_get_adapter_info_matches_info_wrapper: info() and
get_adapter_info() agree on the capability list
Verification:
* uv run --with pytest python -m pytest
tests/instrument/adapters/frameworks/test_langchain_capabilities.py -x
-> 3 passed
* uv run --with pytest python -m pytest
tests/instrument/adapters/frameworks/ -x
-> 40 passed (no regressions in autogen / crewai / langfuse suites)
Two related fixes from the depth audit (A:/tmp/adapter-depth-audit.md).
1. Brand leak in agentforce trust layer YAML
Audit finding: agentforce/trust_layer.py:144 was emitting a legacy STRATIX-branded header into customer-visible YAML files. These files end up in customers' source trees, version control, and audit packages, so the legacy STRATIX brand and the obsolete stratix.sdk.python.* import path were leaking into customer-owned artifacts.
Fix: the header now emits the LayerLens brand and the actual current import path.
A regression test, test_trust_layer_yaml_has_no_stratix_brand_leak, covers both the canonical method and the deprecated to_stratix_policy alias, asserting:
* "# LayerLens Policy" and the new generator path are present.
* STRATIX, Stratix, stratix.sdk, stratix.sdk.python, and ateam are all absent.
Sweep (audit task #4)
`grep -rn "Stratix\|stratix" src/layerlens/instrument/adapters/frameworks/ | grep -v "deprecated\|backward\|alias"` surfaced additional leaks. All non-deprecation contexts fixed:
* embedding/embedding_adapter.py: module docstring, author="STRATIX Team" (surfaced via AdapterInfo)
* embedding/vector_store_adapter.py: module docstring + author
* embedding/__init__.py, benchmark_import/__init__.py, benchmark_import/adapter.py, semantic_kernel/__init__.py, semantic_kernel/lifecycle.py, semantic_kernel/filters.py: module docstrings
* agentforce/auth.py: error messages now reference `layerlens agentforce connect` (current CLI binary; both layerlens and stratix are registered scripts)
* agentforce/events.py: Platform Events thread name (stratix-sf-events → layerlens-sf-events)
* agentforce/mapper.py: internal docstrings
* google_adk, llama_index, openai_agents, semantic_kernel: method docstrings
* llama_index/lifecycle.py: closure-only StratixEventHandler renamed to LayerLensEventHandler (the name surfaces in LlamaIndex dispatcher logs / UI)
Intentionally left alone (would require deprecation aliases, which is out of scope for a fix PR):
* class StratixMemoryStore (semantic_kernel): public class with backward-compat note added to the docstring
* class Stratix re-exported from layerlens: actual API client class name
* to_stratix_policy method: already exists as a deprecated alias
2. Missing STREAMING / REPLAY capability declarations
Audit finding: 6 adapters wrap streaming methods but do not declare AdapterCapability.STREAMING. Only langfuse declares REPLAY despite every adapter implementing serialize_for_replay().
STREAMING (added to all 6)
* agno: Agent.arun (async stream)
* ms_agent_framework: ChatCompletionAgent.invoke_stream
* openai_agents: TraceProcessor receives GenerationSpanData per chunk
* google_adk: BeforeModelCallback / AfterModelCallback fire per chunk
* llama_index: Instrumentation Module emits per-chunk events
* bedrock_agents: invoke_agent returns an EventStream completion
REPLAY (added to every adapter that implements serialize_for_replay())
agno, bedrock_agents, google_adk, llama_index, pydantic_ai, strands, openai_agents, ms_agent_framework, embedding (both EmbeddingAdapter and VectorStoreAdapter), semantic_kernel, smolagents, agentforce.
Without these declarations, the atlas-app catalog UI surfaces incorrect feature support: it tells customers they cannot replay traces from an adapter that supports it, or cannot stream from one that wraps every streaming entry-point.
3. Tests
* Every adapter test file gained a declares_streaming_and_replay_capabilities (6) or declares_replay_capability (7) test that asserts the capability appears in get_adapter_info().capabilities.
* tests/instrument/adapters/frameworks/test_capability_consistency.py enforces both rules consistently across every adapter discovered in the branch. It is the in-tree counterpart of the upstream manifest_consistency lint guard from the manifest emitter PR:
  * test_replay_capability_matches_serialize_for_replay: REPLAY declared iff serialize_for_replay implemented (with non-stub body).
  * test_streaming_capability_declared_for_streaming_adapters: every adapter on the canonical streaming list declares STREAMING.
4. Drive-by fixes
* tests/instrument/adapters/frameworks/test_bulk_ported_smoke.py was hard-coded to import every adapter package, failing collection on any branch missing one of them. Replaced the per-adapter import block with a try/except importlib loop so the smoke suite tests whatever adapters are present.
* test_agentforce.test_package_does_not_eagerly_import_requests was deleting agentforce.* from sys.modules without restoring them, breaking class identity (`is`) checks in subsequent tests within the same session. Saved/restored the original module objects so the cleanup is hermetic.
5. Sample regenerated trust-layer YAML
Verification
Branch base note
This branch merges feat/instrument-frameworks-agentforce (brand leak source) with feat/instrument-frameworks-agents (the 13 agent-tier framework adapters that need capability declarations) so all enumerated audit items can land in a single PR. Conflicts resolved in pyproject.toml (combined the agentforce + agent-tier extras blocks) and frameworks/__init__.py (combined the docstring adapter list).
Test plan
* uv run python -m pytest tests/instrument/adapters/frameworks/ (231 passed, 1 skipped)
* uv run python -m pytest tests/instrument/adapters/frameworks/test_agentforce.py -k brand_leak (1 passed)
* uv run python -m mypy --strict src/layerlens/instrument/adapters/frameworks/agentforce (clean)
* uv run python -m ruff check src/layerlens/instrument/adapters/frameworks/ tests/instrument/adapters/frameworks/ (clean)