LayerLens · mmercuri · Apr 23, 2026 · Apr 24, 2026
diff --git a/docs/adapters/multi-tenancy.md b/docs/adapters/multi-tenancy.md
@@ -0,0 +1,148 @@
+# Multi-tenancy contract for adapters
+
+LayerLens is a multi-tenant SaaS platform. Every event emitted by an
+adapter MUST be tagged with the originating tenant's `org_id`. Cache
+keys, queue topics, ingest streams, RLS policies, and downstream
+attestation chains all read this field to scope data to a single
+tenant.
+
+This document defines the binding contract that every framework /
+protocol / provider adapter must satisfy. It is enforced at runtime by
+`BaseAdapter.__init__` (fail-fast) and at CI time by the test suite at
+`tests/instrument/adapters/_base/test_org_id_propagation.py` plus the
+parametrized `tests/instrument/adapters/frameworks/test_per_adapter_org_id.py`.
+
+## The contract
+
+1. **Every adapter is bound to exactly one tenant at construction.**
+   The tenant binding (`org_id`) is stored as `self._org_id` and
+   exposed as the read-only property `adapter.org_id`. The bound value
+   is a non-empty string — there is no null sentinel, no empty
+   fallback, no `"default"` placeholder.
+
+2. **Construction without a resolvable `org_id` raises.** Resolution
+   order at `__init__`:
+
+   1. Explicit `org_id=...` keyword to the adapter constructor.
+   2. `stratix.org_id` attribute on the attached client (if not blank).
+   3. `stratix.organization_id` attribute on the attached client (if not
+      blank). The public `layerlens.Stratix` client uses this name.
+
+   If none of the three resolves to a non-empty string,
+   `BaseAdapter.__init__` raises `ValueError`. This check is fail-fast:
+   callers cannot opt out of it, suppress it, or work around it. There
+   is no silent fallback. A blank `org_id` is rejected with the same
+   error as an absent one.
+
+3. **Every emission is stamped.** Both `emit_event` (typed payload)
+   and `emit_dict_event` (dict payload) call `BaseAdapter._stamp_org_id`
+   before forwarding to the client. The bound `self._org_id` is
+   written to the payload's `org_id` field unconditionally — any
+   caller-supplied value (including a wrong tenant's id) is
+   overwritten. The adapter binding is the source of truth.
+
+4. **Every trace record carries `org_id`.** The replay event records
+   stored in `self._trace_events` include `org_id` at the envelope
+   level *and* inside the payload dict, so replay round-trips and
+   downstream re-ingest preserve the binding.
+
+5. **Every sink dispatch carries `org_id`.** The abstract method
+   `EventSink.send` requires the keyword: `send(event_type, payload,
+   timestamp_ns, *, org_id: str)`. Sinks that omit it are flagged at the
+   type-check layer (mypy `--strict`). The `IngestionPipelineSink` uses
+   the per-event `org_id` as the `tenant_id` for downstream ingest.
+
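As a reading aid for items 2 and 3 above, here is a minimal, self-contained sketch of the resolution order and the unconditional stamp. It is not the code in `adapter.py`: `SketchAdapter`, `resolve_org_id`, `stamp`, the `SimpleNamespace` client, and the error message are hypothetical stand-ins. Two details are assumptions rather than documented behaviour: whitespace-only strings are treated as blank, and the stamp returns a copy instead of mutating the payload.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def resolve_org_id(explicit: str | None, client: Any | None) -> str:
    """Hypothetical stand-in for the documented three-step resolution order."""
    candidates = (
        explicit,                                  # 1. explicit org_id= keyword
        getattr(client, "org_id", None),           # 2. client.org_id
        getattr(client, "organization_id", None),  # 3. client.organization_id
    )
    for value in candidates:
        # Assumption: whitespace-only strings count as blank.
        if isinstance(value, str) and value.strip():
            return value
    # Invented message; the contract only specifies the exception type.
    raise ValueError("no resolvable org_id: pass org_id=... or attach a client")


class SketchAdapter:
    """Illustrative only; not the real BaseAdapter."""

    def __init__(self, stratix: Any | None = None, *, org_id: str | None = None) -> None:
        # Fail fast: an unresolvable tenant aborts construction.
        self._org_id = resolve_org_id(org_id, stratix)

    @property
    def org_id(self) -> str:
        return self._org_id

    def stamp(self, payload: dict[str, Any]) -> dict[str, Any]:
        # The adapter binding overwrites any caller-supplied org_id.
        return {**payload, "org_id": self._org_id}


adapter = SketchAdapter(SimpleNamespace(organization_id="tenant-acme"))
stamped = adapter.stamp({"org_id": "some-other-tenant", "model": "m"})
```

In this sketch the payload arrives tagged with another tenant's id and leaves tagged with the adapter's own, which is the behaviour item 3 describes.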
+## Wiring a new adapter
+
+Subclasses of `BaseAdapter` (and `BaseProtocolAdapter` /
+`LLMProviderAdapter`) get the contract for free **as long as their
+`__init__` forwards `org_id` to `super().__init__`**. The canonical
+shape:
+
+```python
+class MyAdapter(BaseAdapter):
+    FRAMEWORK = "my_framework"
+    VERSION = "0.1.0"
+
+    def __init__(
+        self,
+        stratix: Any | None = None,
+        capture_config: CaptureConfig | None = None,
+        # framework-specific args here ...
+        *,
+        org_id: str | None = None,
+    ) -> None:
+        super().__init__(
+            stratix=stratix,
+            capture_config=capture_config,
+            org_id=org_id,
+        )
+        # adapter-specific state ...
+```
+
+Note the `*` separator, which makes `org_id` keyword-only. The rest of
+`__init__` is unchanged from the pre-multi-tenancy era.
+
+Adapter helper functions (the `instrument_*` convenience exports in
+each adapter's `__init__.py`) should also accept and forward `org_id`:
+
+```python
+def instrument_agent(
+    agent: Any,
+    stratix: Any = None,
+    capture_config: dict[str, Any] | None = None,
+    org_id: str | None = None,
+) -> MyAdapter:
+    adapter = MyAdapter(
+        stratix=stratix,
+        capture_config=capture_config,
+        org_id=org_id,
+    )
+    adapter.connect()
+    adapter.instrument_agent(agent)
+    return adapter
+```
+
+## Test obligations
+
+Every new framework adapter must meet these obligations:
+
+1. Its class must be added to `_all_adapter_classes()` in
+   `tests/instrument/adapters/frameworks/test_per_adapter_org_id.py`.
+   The two parametrized tests there assert (a) the adapter accepts
+   `org_id` and exposes the bound value via the property, and (b) the
+   adapter raises without an `org_id`.
+2. If the adapter ships its own dedicated test file, every test that
+   constructs the adapter must pass `org_id` (typically via the
+   shared `_RecordingStratix` test stand-in, which carries
+   `org_id = "test-org"` as a class attribute).
+3. Cross-tenant isolation is covered centrally in
+   `tests/instrument/adapters/_base/test_org_id_propagation.py`. New
+   adapters do not need to re-prove it if they route emissions through
+   the standard `BaseAdapter` path; they MUST add a per-adapter
+   cross-tenant test if they bypass the base path.
+
+## What changed (April 2026)
+
+Prior to this change, all adapter emissions in the stratix-python SDK
+shipped without `org_id` propagation. The 2026-04-25 audit
+(`A:/tmp/adapter-depth-audit.md`, cross-cutting finding #3) flagged
+this as a CLAUDE.md violation. The fix:
+
+- `BaseAdapter.__init__` now requires a resolvable `org_id` and
+  stores it on the instance.
+- `emit_event` and `emit_dict_event` stamp `org_id` into every
+  payload before forwarding to the client.
+- `EventSink.send` now requires the `org_id` keyword.
+- Every shipped adapter (17 framework + protocol + provider) was
+  updated to thread `org_id` through to `super().__init__`.
+
+## References
+
+- CLAUDE.md, "Multi-Tenancy" section — the platform-wide mandate.
+- `A:/tmp/adapter-depth-audit.md` — the audit that surfaced the gap.
+- `src/layerlens/instrument/adapters/_base/adapter.py` — `_resolve_org_id`,
+  `BaseAdapter.__init__`, `_stamp_org_id`, `emit_event`,
+  `emit_dict_event`, `_post_emit_success`.
+- `src/layerlens/instrument/adapters/_base/sinks.py` — `EventSink`
+  ABC, `TraceStoreSink.send`, `IngestionPipelineSink.send`.
diff --git a/docs/adapters/replay-execution.md b/docs/adapters/replay-execution.md
@@ -0,0 +1,218 @@
+# Adapter Replay Re-execution
+
+This document covers the **factory-based replay** path on
+LayerLens framework adapters. It implements the lift scoped by item
+§2.6 of the cross-pollination audit, bringing the LangChain
+`execute_replay` pattern to the eight lighter framework adapters.
+
+It is a companion to:
+
+* [`docs/adapters/multi-tenancy.md`](multi-tenancy.md) — the tenant
+  binding contract that `ReplayResult` propagates.
+* `A:/tmp/adapter-cross-pollination-audit.md` §2.6 — the audit row
+  that scopes this lift.
+
+## When to use which replay path
+
+`BaseAdapter` exposes **two** replay entry points:
+
+| Method                           | Caller                | Inputs                                                | Returns                       |
+| -------------------------------- | --------------------- | ----------------------------------------------------- | ----------------------------- |
+| `execute_replay()`               | LayerLens replay engine | `(inputs, original_trace, request, replay_trace_id)` | A `SerializedTrace`           |
+| `execute_replay_via_factory()`   | Adapter SDK / CI tests | `(trace: ReplayableTrace, agent_factory: Callable)`  | A `ReplayResult` (this doc)   |
+
+The **engine** path stays untouched — that is the integration with
+the platform replay service that owns trace storage and result routing.
+
+The **factory** path is what this document covers. It is the
+self-contained option you reach for when:
+
+* You want to re-run a captured trace through a fresh agent
+  *inside the same Python process* (CI, integration tests, debugging).
+* You want a divergence report rather than a single pass/fail.
+* You want a uniform `ReplayResult` shape across every framework so
+  dashboards and `assert` lines do not need adapter-specific branches.
+
+## The pieces
+
+```text
+┌─────────────────────────┐  builds   ┌────────────────────┐
+│ ReplayExecutor          │──────────>│ ReplayResult       │
+│  (shared)               │           │  - trace_id        │
+│                         │           │  - source_trace_id │
+│  + execute_replay()     │           │  - org_id          │
+│                         │           │  - framework       │
+│                         │           │  - outputs         │
+│                         │           │  - captured_events │
+│                         │           │  - divergences[]   │
+│                         │           │  - duration_ns     │
+│                         │           │  - execution_error │
+└─────────────┬───────────┘           └────────────────────┘
+              │ uses
+              v
+┌─────────────────────────┐
+│ StubInjector            │  (optional, adapter-specific)
+│  + build_patches()      │
+└─────────────────────────┘
+```
+
+* `ReplayExecutor` lives at
+  `layerlens.instrument.adapters._base.replay`. It is intentionally
+  *narrow* — it does not know how to invoke a framework agent.
+* Adapters provide `_invoke_for_replay(agent, inputs, trace)` to
+  invoke their framework's `run` / `arun` / `__call__` entry point.
+* Adapters expose `execute_replay_via_factory(trace, agent_factory)`
+  as the public surface, typically a two-line delegate to the
+  base helper `_replay_via_executor`.
+
+## Eight wired adapters (cross-pollination audit §2.6)
+
+| Adapter                    | `instrument_*`        | Invocation             |
+| -------------------------- | --------------------- | ---------------------- |
+| `agno`                     | `instrument_agent`    | `arun` / `run`         |
+| `openai_agents`            | `instrument_runner`   | `Runner.run(agent, x)` |
+| `llama_index`              | `instrument_workflow` | `arun` / `run`         |
+| `google_adk`               | `instrument_agent`    | `run_async` / `run`    |
+| `strands`                  | `instrument_agent`    | `__call__` / `invoke`  |
+| `pydantic_ai`              | `instrument_agent`    | `run` / `run_sync`     |
+| `smolagents`               | `instrument_agent`    | `run(task)`            |
+| `ms_agent_framework`       | `instrument_chat`     | `invoke(input=)`       |
+
+LangChain already had its own `execute_replay()` (the original audit
+reference) and is not in this lift. Bedrock Agents, browser_use,
+embedding, and langfuse are excluded by audit rationale (see §2.6 row
+notes — "MAYBE — requires Bedrock-side state, harder",
+"N/A — importer / single-agent / no agent concept").
+
+## Honest divergence detection
+
+Per CLAUDE.md ("Honest divergence detection — if replay can't
+reproduce exactly, surface it"), `ReplayResult.divergences` is the
+*authoritative* report of every event mismatch. The executor never
+silently "passes" a divergent replay.
+
+Five divergence kinds are surfaced:
+
+| Kind                  | When it fires                                                  |
+| --------------------- | -------------------------------------------------------------- |
+| `MISSING_EVENT`       | Original had an event at index N; replay's sequence is shorter |
+| `EXTRA_EVENT`         | Replay emitted an event the original did not contain           |
+| `EVENT_TYPE_MISMATCH` | Same position, different `event_type`                          |
+| `PAYLOAD_MISMATCH`    | Same `event_type`, different meaningful payload field          |
+| `EXECUTION_ERROR`     | Framework raised before producing a comparable trace           |
+
+`PAYLOAD_MISMATCH` is deliberately conservative — it compares only
+fields whose mismatch genuinely means the agent did something different
+(`model`, `provider`, `tool_name`, `agent_name`, `from_agent`,
+`to_agent`). Run-specific fields like `timestamp_ns`, `duration_ns`,
+and `run_id` are *expected* to differ between runs and are not
+flagged. Flagging them would make every replay "divergent" and
+hide real regressions.
+
+`is_exact` reports zero divergences. `succeeded` reports no execution
+error. The two flags answer different questions:
+
+| `succeeded` | `is_exact` | Outcome                                |
+| ----------- | ---------- | -------------------------------------- |
+| `True`      | `True`     | Perfect reproduction                   |
+| `True`      | `False`    | Replay ran but diverged                |
+| `False`     | (any)      | Framework crashed during replay        |
+
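The divergence rules in this section can be illustrated with a small positional comparison. This is a sketch under stated assumptions, not the `ReplayExecutor` implementation: the `Kind` enum values, the `Divergence` dataclass, the `compare` helper, the event dict shape (`event_type` plus `payload`), and the event type names are all hypothetical. Only the list of meaningful fields comes from the text above. `EXECUTION_ERROR` is left out because it concerns a failed run rather than an event-by-event comparison.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any

# Fields the section names as meaningful for PAYLOAD_MISMATCH.
MEANINGFUL_FIELDS = (
    "model", "provider", "tool_name", "agent_name", "from_agent", "to_agent",
)


class Kind(Enum):
    # Hypothetical enum values; only the kind names come from the doc.
    MISSING_EVENT = "missing_event"
    EXTRA_EVENT = "extra_event"
    EVENT_TYPE_MISMATCH = "event_type_mismatch"
    PAYLOAD_MISMATCH = "payload_mismatch"


@dataclass
class Divergence:
    kind: Kind
    index: int
    detail: str


def compare(original: list[dict[str, Any]], replayed: list[dict[str, Any]]) -> list[Divergence]:
    """Compare two event sequences position by position (an assumed strategy)."""
    found: list[Divergence] = []
    for i in range(max(len(original), len(replayed))):
        if i >= len(replayed):
            found.append(Divergence(Kind.MISSING_EVENT, i, original[i]["event_type"]))
        elif i >= len(original):
            found.append(Divergence(Kind.EXTRA_EVENT, i, replayed[i]["event_type"]))
        elif original[i]["event_type"] != replayed[i]["event_type"]:
            detail = original[i]["event_type"] + " -> " + replayed[i]["event_type"]
            found.append(Divergence(Kind.EVENT_TYPE_MISMATCH, i, detail))
        else:
            before = original[i].get("payload", {})
            after = replayed[i].get("payload", {})
            # Run-specific fields such as timestamp_ns are never inspected.
            changed = [f for f in MEANINGFUL_FIELDS if before.get(f) != after.get(f)]
            if changed:
                found.append(Divergence(Kind.PAYLOAD_MISMATCH, i, ", ".join(changed)))
    return found


original = [
    {"event_type": "model.invoke", "payload": {"model": "model-a", "timestamp_ns": 1}},
    {"event_type": "tool.call", "payload": {"tool_name": "search"}},
]
replayed = [
    {"event_type": "model.invoke", "payload": {"model": "model-a", "timestamp_ns": 99}},
    {"event_type": "tool.call", "payload": {"tool_name": "browse"}},
    {"event_type": "agent.output", "payload": {}},
]
divergences = compare(original, replayed)
is_exact = not divergences
```

Here the differing `timestamp_ns` at index 0 is ignored, the changed `tool_name` at index 1 is reported as a payload mismatch, and the trailing event at index 2 is reported as extra.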
+## Multi-tenancy
+
+`ReplayResult.org_id` carries the bound tenant from the originating
+adapter, set by `BaseAdapter._resolve_org_id` at construction
+(see `multi-tenancy.md`). Per-event records inside
+`ReplayResult.captured_events` also carry `org_id` because they are
+emitted through `_post_emit_success`, which always stamps the field.
+
+A replay started on `adapter_a` (tenant A) cannot leak events into
+`adapter_b`'s (tenant B) trace stream — the executor binds to the
+adapter at construction and never crosses adapters mid-replay.
+
+## Usage
+
+### Minimal example (agno)
+
+```python
+from layerlens.instrument.adapters.frameworks.agno import AgnoAdapter
+
+adapter = AgnoAdapter(org_id="tenant-acme")
+adapter.connect()
+
+# 1. Capture an original run.
+original_agent = build_my_agno_agent()
+adapter.instrument_agent(original_agent)
+original_agent.run("Plan a trip to Tokyo")
+
+trace = adapter.serialize_for_replay()
+
+# 2. Replay via a factory-built fresh agent; `await` needs an async context.
+def factory():
+    return build_my_agno_agent()  # fresh instance every replay
+
+result = await adapter.execute_replay_via_factory(trace, factory)
+
+assert result.org_id == "tenant-acme"
+assert result.framework == "agno"
+if not result.is_exact:
+    for div in result.divergences:
+        print(f"[{div.kind.value}] index={div.index} {div.detail}")
+```
+
+### Async factory
+
+The factory may return either an agent instance or an awaitable
+resolving to one — the executor inspects the return value and awaits
+when needed:
+
+```python
+async def async_factory():
+    config = await load_config_from_db()
+    return AgnoAgent.from_config(config)
+
+result = await adapter.execute_replay_via_factory(trace, async_factory)
+```
+
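One plausible way for an executor to accept both factory styles is to call the factory and await the result only when it is awaitable. The sketch below shows that pattern with the standard library's `inspect.isawaitable`. The `resolve_agent` helper and the string "agents" are hypothetical; the real executor may make this decision differently.

```python
from __future__ import annotations

import asyncio
import inspect
from typing import Any, Callable


async def resolve_agent(agent_factory: Callable[[], Any]) -> Any:
    """Call the factory, awaiting its result only if it is awaitable."""
    candidate = agent_factory()
    if inspect.isawaitable(candidate):
        candidate = await candidate
    return candidate


def sync_factory() -> str:
    return "sync-agent"


async def async_factory() -> str:
    await asyncio.sleep(0)  # stands in for real async setup work
    return "async-agent"


async def main() -> tuple[str, str]:
    # Both factory styles go through the same helper.
    return await resolve_agent(sync_factory), await resolve_agent(async_factory)


resolved = asyncio.run(main())
```

A caveat of this pattern: a factory that returns an agent object which is itself awaitable would be awaited too, so such agents would need to be wrapped.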
+### Adapter-specific stub injection
+
+For LLM-deterministic replay, supply a `StubInjector` that returns
+patches the executor will apply for the duration of the replay run.
+The base case (no stubs) works for fixture-based tests where the
+agent itself is deterministic.
+
+```python
+from layerlens.instrument.adapters._base.replay import (
+    ReplayExecutor,
+    StubInjector,
+)
+
+class MyOpenAIStubs(StubInjector):
+    def build_patches(self, adapter, trace):
+        # Replace Completions.create with your own deterministic fake.
+        return [
+            ("openai.resources.chat.completions.Completions.create",
+             my_fake_create),
+        ]
+
+executor = ReplayExecutor(adapter, stub_injector=MyOpenAIStubs())
+result = await executor.execute_replay(trace, factory)
+```
+
+Stub teardown is guaranteed even on framework error — patches are
+unwound in a `finally` block so a failed replay leaves no global
+monkey-patches behind.
+
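The teardown guarantee can be pictured with `unittest.mock.patch` and a `finally` block. This is a sketch, not the executor's code: `run_with_patches`, `fake_dumps`, and `failing_body` are hypothetical, and it assumes patches arrive as `(target, replacement)` pairs like those returned by `build_patches` in the example above. `json.dumps` is patched only because it is a harmless standard-library target.

```python
from __future__ import annotations

import json
from typing import Any, Callable
from unittest import mock


def run_with_patches(patches: list[tuple[str, Any]], body: Callable[[], Any]) -> Any:
    """Apply (target, replacement) pairs, run body, and always unwind."""
    started = []
    try:
        for target, replacement in patches:
            patcher = mock.patch(target, replacement)
            patcher.start()
            started.append(patcher)
        return body()
    finally:
        # Unwind in reverse order, even if body() or a later start() raised.
        for patcher in reversed(started):
            patcher.stop()


def fake_dumps(obj: Any, **kwargs: Any) -> str:
    return "stubbed"


def failing_body() -> None:
    assert json.dumps({"a": 1}) == "stubbed"  # the stub is active here
    raise RuntimeError("framework error during replay")


real_dumps = json.dumps
error = None
try:
    run_with_patches([("json.dumps", fake_dumps)], failing_body)
except RuntimeError as exc:
    error = str(exc)
```

After the failing run, `json.dumps` is the original function again. The sketch re-raises the error for brevity and does not model the WARNING-level logging of teardown failures listed in the failure-mode table.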
+## Failure modes (what is *not* swallowed)
+
+| Failure                        | Handling                                                     |
+| ------------------------------ | ------------------------------------------------------------ |
+| Agent raises mid-execution     | Captured into `result.execution_error`; replay marked failed |
+| Agent factory itself raises    | Captured into `result.execution_error`                       |
+| Stub teardown raises           | Logged at WARNING; original execution outcome preserved      |
+| `org_id` cannot be resolved    | `BaseAdapter.__init__` raises `ValueError` (fail-fast)       |
+| Adapter never overrode the method | `NotImplementedError` from `BaseAdapter.execute_replay_via_factory` |
+
+The first two are intentional — collecting them as data lets a
+replay-batch consumer aggregate partial results across many traces
+without exception-handling boilerplate.
diff --git a/docs/samples-guide.md b/docs/samples-guide.md
@@ -92,11 +92,13 @@ See the [MCP README](../samples/mcp/README.md) for setup instructions.
 
 Located in [`samples/copilotkit/`](../samples/copilotkit/). Full-stack integration with CopilotKit using LangGraph CoAgents and generative UI card components.
 
-- [`agents/evaluator_agent.py`](../samples/copilotkit/agents/evaluator_agent.py) -- LangGraph CoAgent for evaluation workflows
+- [`agents/evaluator_agent.py`](../samples/copilotkit/agents/evaluator_agent.py) -- LangGraph CoAgent for evaluation workflows (human-in-the-loop judge confirmation via `interrupt()`)
 - [`agents/investigator_agent.py`](../samples/copilotkit/agents/investigator_agent.py) -- LangGraph CoAgent for trace investigation
 - [`components/*.tsx`](../samples/copilotkit/components/) -- React card components for rendering results
 - [`hooks/*.ts`](../samples/copilotkit/hooks/) -- CopilotKit hooks for wiring LayerLens actions
 
+> **Checkpointer note:** Any LangGraph CoAgent that calls `interrupt()` (such as `evaluator_agent.py`) **must** be compiled with a checkpointer. Without one, the AG-UI stream ends without emitting `RUN_FINISHED` and CopilotKit blocks all subsequent messages. The sample ships with `InMemorySaver` for a zero-setup local run and documents Postgres / SQLite / Redis / LangGraph Platform alternatives for production in its [README](../samples/copilotkit/README.md#human-in-the-loop-checkpointers).
+
 See the [CopilotKit README](../samples/copilotkit/README.md) for the full list.
 
 ### Claude Code Skills (6 skills)