LayerLens · mmercuri · Apr 26, 2026
diff --git a/docs/adapters/providers-anthropic.md b/docs/adapters/providers-anthropic.md
@@ -0,0 +1,70 @@
+# Anthropic provider adapter
+
+`layerlens.instrument.adapters.providers.anthropic_adapter.AnthropicAdapter`
+instruments the Anthropic Python SDK to emit telemetry on every
+`messages.create` and `messages.stream` call.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-anthropic]'
+```
+
+Pulls `anthropic>=0.30,<1`.
+
+## Quick start
+
+```python
+from anthropic import Anthropic
+from layerlens.instrument.adapters.providers.anthropic_adapter import AnthropicAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="anthropic")
+adapter = AnthropicAdapter()
+adapter.add_sink(sink)
+adapter.connect()
+
+client = Anthropic()
+adapter.connect_client(client)
+
+response = client.messages.create(
+    model="claude-haiku-4-5-20251001",
+    max_tokens=20,
+    messages=[{"role": "user", "content": "Hello"}],
+)
+
+adapter.disconnect()
+sink.close()
+```
+
+## Events emitted
+
+| Event | Layer | When |
+|---|---|---|
+| `model.invoke` | L3 | Every `messages.create` (success or failure) and once per stream completion |
+| `cost.record` | cross-cutting | Every successful call with token usage |
+| `tool.call` | L5a | One per `tool_use` block in the response |
+| `policy.violation` | cross-cutting | When the SDK raises (rate limit, invalid input, etc.) |
+
+The `model.invoke` payload includes Anthropic-specific fields:
+- `cache_creation_input_tokens` / `cache_read_input_tokens` (when prompt caching is used)
+- `parameters.has_system: true` when a system prompt is supplied
+- `parameters.tools_count` when tools are passed
+- `reasoning_tokens` (Claude extended thinking)
+
+## Streaming
+
+The adapter wraps both `messages.create(stream=True)` and the
+`messages.stream()` context manager. A single consolidated `model.invoke`
+fires on stream completion, accumulating content from `text_delta` events
+and tool input from `input_json_delta` events.
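+
+The accumulation described above can be sketched roughly as follows. This is
+an illustration only, not the adapter's actual code: `consolidate_stream` is
+a hypothetical helper, and the events are plain dicts shaped like Anthropic's
+streaming events rather than SDK objects.
+
+```python
+import json
+
+
+def consolidate_stream(events):
+    """Fold Anthropic-style stream events into one summary dict (sketch)."""
+    text_parts = []
+    tool_names = {}      # content block index -> tool name
+    tool_fragments = {}  # content block index -> partial JSON strings
+    for event in events:
+        kind = event.get("type")
+        if kind == "content_block_start":
+            block = event["content_block"]
+            if block.get("type") == "tool_use":
+                tool_names[event["index"]] = block["name"]
+                tool_fragments[event["index"]] = []
+        elif kind == "content_block_delta":
+            delta = event["delta"]
+            if delta["type"] == "text_delta":
+                text_parts.append(delta["text"])
+            elif delta["type"] == "input_json_delta":
+                tool_fragments.setdefault(event["index"], []).append(delta["partial_json"])
+    tool_calls = []
+    for index in sorted(tool_fragments):
+        raw = "".join(tool_fragments[index])
+        # Tool input arrives as JSON fragments; parse only once it is complete.
+        tool_calls.append({"name": tool_names.get(index), "input": json.loads(raw) if raw else {}})
+    return {"content": "".join(text_parts), "tool_calls": tool_calls}
+
+
+events = [
+    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
+    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
+    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}},
+    {"type": "content_block_start", "index": 1,
+     "content_block": {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {}}},
+    {"type": "content_block_delta", "index": 1,
+     "delta": {"type": "input_json_delta", "partial_json": '{"city": '}},
+    {"type": "content_block_delta", "index": 1,
+     "delta": {"type": "input_json_delta", "partial_json": '"Paris"}'}},
+]
+summary = consolidate_stream(events)
+```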
+
+## Cost calculation
+
+Pricing comes from the canonical table. Cache-read input tokens on Claude
+models are billed at a 90% discount automatically.
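+
+As a worked illustration of the discount arithmetic, the sketch below assumes
+that cache-read tokens are counted separately from regular input tokens and
+are billed at 10% of the input rate. `input_cost_usd` is a hypothetical
+helper, and the rate of $1.00 per million tokens is a placeholder, not a
+value from the pricing table.
+
+```python
+def input_cost_usd(input_tokens, cache_read_tokens, rate_per_million, cached_discount=0.90):
+    """Input-side cost when some tokens are served from the prompt cache (sketch)."""
+    cached_rate = rate_per_million * (1.0 - cached_discount)
+    total = input_tokens * rate_per_million + cache_read_tokens * cached_rate
+    return total / 1_000_000
+
+
+# 1,000 uncached tokens plus 9,000 cache-read tokens at a placeholder rate.
+example = input_cost_usd(1_000, 9_000, rate_per_million=1.00)
+# Roughly 0.0019 USD, versus 0.01 USD if all 10,000 tokens were uncached.
+```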
+
+## BYOK
+
+Same pattern as the OpenAI adapter — pass `api_key` to the `Anthropic()` client.
+The platform-side BYOK store ships in atlas-app M1.B.
diff --git a/docs/adapters/providers-azure-openai.md b/docs/adapters/providers-azure-openai.md
@@ -0,0 +1,48 @@
+# Azure OpenAI provider adapter
+
+`layerlens.instrument.adapters.providers.azure_openai_adapter.AzureOpenAIAdapter`
+uses the same `openai` SDK as the OpenAI adapter but captures Azure-specific
+metadata (deployment, endpoint, region, api-version) and uses the Azure
+pricing table.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-azure-openai]'
+```
+
+## Quick start
+
+```python
+from openai import AzureOpenAI
+from layerlens.instrument.adapters.providers.azure_openai_adapter import AzureOpenAIAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="azure_openai")
+adapter = AzureOpenAIAdapter()
+adapter.add_sink(sink)
+adapter.connect()
+
+client = AzureOpenAI(
+    api_key="...",
+    api_version="2024-08-01-preview",
+    azure_endpoint="https://my-resource.openai.azure.com/",
+)
+adapter.connect_client(client)
+
+client.chat.completions.create(model="my-deployment", messages=[...])
+```
+
+## Azure-specific behavior
+
+- **Endpoint sanitization**: query strings are stripped from the captured
+  `azure_endpoint` to prevent token leakage if the URL ever contains an
+  `api-key` query param.
+- **Pricing**: cost calculations use `AZURE_PRICING` (different rates than
+  OpenAI's public API).
+- **api-version**: read from `client._api_version` or the `api-version` key of
+  `client._custom_query` and surfaced in every `model.invoke`.
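+
+The endpoint sanitization step could look something like the sketch below,
+which uses only the standard library. `sanitize_endpoint` is a hypothetical
+name; the adapter's real helper may differ.
+
+```python
+from urllib.parse import urlsplit
+
+
+def sanitize_endpoint(url: str) -> str:
+    """Drop the query string so secrets such as ?api-key=... are never captured."""
+    return urlsplit(url)._replace(query="").geturl()
+
+
+clean = sanitize_endpoint("https://my-resource.openai.azure.com/?api-key=secret")
+```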
+
+## Events emitted
+
+Same set as OpenAI: `model.invoke`, `cost.record`, `tool.call`, `policy.violation`.
diff --git a/docs/adapters/providers-bedrock.md b/docs/adapters/providers-bedrock.md
@@ -0,0 +1,64 @@
+# AWS Bedrock provider adapter
+
+`layerlens.instrument.adapters.providers.bedrock_adapter.AWSBedrockAdapter`
+wraps the `bedrock-runtime` boto3 client. Bedrock fronts several model
+providers: Anthropic, Meta, Cohere, Amazon Titan, AI21, and Mistral models all
+flow through the same client interface but with different request and
+response body shapes. The adapter detects the provider family from
+`modelId` and parses tokens, content, and stop reasons accordingly.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-bedrock]'
+```
+
+Pulls `boto3>=1.34`.
+
+## Quick start
+
+```python
+import boto3
+from layerlens.instrument.adapters.providers.bedrock_adapter import AWSBedrockAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="aws_bedrock")
+adapter = AWSBedrockAdapter()
+adapter.add_sink(sink)
+adapter.connect()
+
+client = boto3.client("bedrock-runtime", region_name="us-east-1")
+adapter.connect_client(client)
+
+# Either invoke_model or converse — both wrapped.
+client.converse(
+    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
+    messages=[{"role": "user", "content": [{"text": "Hi"}]}],
+)
+```
+
+## Wrapped methods
+
+- `invoke_model` — body is JSON, parsed per provider family. Response body is
+  wrapped in a `_RereadableBody` so the caller's downstream `body.read()`
+  still works.
+- `converse` — unified Anthropic-style envelope. Token extraction is uniform.
+- `invoke_model_with_response_stream` — emits `model.invoke` immediately with
+  `streaming=true`; content extraction during stream consumption is deferred
+  to a future PR.
+- `converse_stream` — same behavior as `invoke_model_with_response_stream`.
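+
+The idea behind the re-readable body for `invoke_model` can be sketched as
+below. This is an assumption about the general approach, not the adapter's
+`_RereadableBody` itself: `RereadableBody` and `peek_json_body` are
+hypothetical names, and an `io.BytesIO` stands in for boto3's streaming body.
+
+```python
+import io
+import json
+
+
+class RereadableBody:
+    """Replays bytes that were already drained from the original stream."""
+
+    def __init__(self, raw: bytes):
+        self._buffer = io.BytesIO(raw)
+
+    def read(self, amt=None):
+        return self._buffer.read(amt)
+
+
+def peek_json_body(response):
+    """Parse the body for telemetry, then restore it for the caller."""
+    raw = response["body"].read()  # drains the one-shot stream
+    response["body"] = RereadableBody(raw)
+    return json.loads(raw)
+
+
+response = {"body": io.BytesIO(b'{"usage": {"input_tokens": 3}}')}
+parsed = peek_json_body(response)
+caller_view = json.loads(response["body"].read())
+```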
+
+## Provider-family token extraction
+
+| `modelId` prefix | Family | Token fields |
+|---|---|---|
+| `anthropic.` | anthropic | `usage.input_tokens` / `usage.output_tokens` |
+| `meta.` | meta | `prompt_token_count` / `generation_token_count` |
+| `cohere.` | cohere | `meta.billed_units.input_tokens` / `output_tokens` |
+| `amazon.` | amazon | (no usage in body; tokens come from the `Converse` API) |
+| `ai21.` | ai21 | (handled via the `Converse` API) |
+| `mistral.` | mistral | `prompt_tokens` / `completion_tokens` |
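+
+A minimal sketch of how the table above might translate into code, assuming
+the family is simply the text before the first `.` in `modelId`. All names
+here (`detect_family`, `extract_tokens`, the `"unknown"` fallback) are
+hypothetical, and response bodies are plain dicts.
+
+```python
+KNOWN_FAMILIES = ("anthropic", "meta", "cohere", "amazon", "ai21", "mistral")
+
+# Family -> (path to input tokens, path to output tokens) in the response body.
+TOKEN_PATHS = {
+    "anthropic": (("usage", "input_tokens"), ("usage", "output_tokens")),
+    "meta": (("prompt_token_count",), ("generation_token_count",)),
+    "cohere": (("meta", "billed_units", "input_tokens"),
+               ("meta", "billed_units", "output_tokens")),
+    "mistral": (("prompt_tokens",), ("completion_tokens",)),
+}
+
+
+def detect_family(model_id):
+    prefix = model_id.split(".", 1)[0]
+    return prefix if prefix in KNOWN_FAMILIES else "unknown"
+
+
+def _dig(body, path):
+    """Walk nested dicts, returning None if any key is missing."""
+    node = body
+    for key in path:
+        if not isinstance(node, dict) or key not in node:
+            return None
+        node = node[key]
+    return node
+
+
+def extract_tokens(model_id, body):
+    family = detect_family(model_id)
+    paths = TOKEN_PATHS.get(family)
+    if paths is None:
+        # amazon / ai21: no usage in the body, per the table above.
+        return family, None, None
+    return family, _dig(body, paths[0]), _dig(body, paths[1])
+
+
+result = extract_tokens(
+    "anthropic.claude-3-5-sonnet-20241022-v2:0",
+    {"usage": {"input_tokens": 12, "output_tokens": 5}},
+)
+```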
+
+## Cost calculation
+
+Uses the `BEDROCK_PRICING` table (separate from OpenAI/Azure tables).
diff --git a/docs/adapters/providers-cohere.md b/docs/adapters/providers-cohere.md
@@ -0,0 +1,78 @@
+# Cohere provider adapter
+
+`layerlens.instrument.adapters.providers.cohere_adapter.CohereAdapter`
+instruments the Cohere Python SDK (v5+) for chat (v1 + v2) and embeddings.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-cohere]'
+```
+
+Pulls `cohere>=5.0,<6`.
+
+## Quick start
+
+```python
+import cohere
+from layerlens.instrument.adapters.providers.cohere_adapter import CohereAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="cohere")
+adapter = CohereAdapter()
+adapter.add_sink(sink)
+adapter.connect()
+
+client = cohere.Client()
+adapter.connect_client(client)
+
+# v1 chat (single message + optional preamble)
+response = client.chat(model="command-r-plus", message="Hello", preamble="Be concise.")
+
+# v2 chat (OpenAI-style messages list)
+response = client.v2.chat(
+    model="command-r-plus",
+    messages=[{"role": "user", "content": "Hello"}],
+)
+```
+
+## What's wrapped
+
+- `client.chat` (v1) — `message` is normalized to a `user` role; optional `preamble` becomes a `system` message at index 0.
+- `client.v2.chat` — already OpenAI-style; messages pass through.
+- `client.embed` — `meta.billed_units.input_tokens` populates the cost record.
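+
+The v1 normalization can be pictured with the short sketch below.
+`normalize_v1_chat` is a hypothetical name used for illustration, not the
+adapter's internal function.
+
+```python
+def normalize_v1_chat(message, preamble=None):
+    """Turn v1-style chat arguments into an OpenAI-style messages list."""
+    messages = []
+    if preamble:
+        # The preamble, when present, becomes the system message at index 0.
+        messages.append({"role": "system", "content": preamble})
+    messages.append({"role": "user", "content": message})
+    return messages
+
+
+normalized = normalize_v1_chat("Hello", preamble="Be concise.")
+```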
+
+## Events emitted
+
+| Event | Layer | When |
+|---|---|---|
+| `model.invoke` | L3 | Every chat or embed call (success or failure). |
+| `cost.record` | cross-cutting | Every successful call with billed units. |
+| `tool.call` | L5a | One per tool call in the response (v1: `tool_calls[].name/parameters`; v2: `message.tool_calls[].function.{name,arguments}`). |
+| `policy.violation` | cross-cutting | When the SDK raises (rate limit, invalid input, etc.). |
+
+## Cost calculation
+
+Pricing is sourced from the canonical `PRICING` table:
+
+| Model | Input (USD per 1K tokens) | Output (USD per 1K tokens) |
+|---|---|---|
+| command-r-plus | $0.003 | $0.015 |
+| command-r | $0.0005 | $0.0015 |
+| command-r-plus-08-2024 | $0.0025 | $0.01 |
+| command-r-08-2024 | $0.00015 | $0.0006 |
+| command | $0.001 | $0.002 |
+| command-light | $0.0003 | $0.0006 |
+
+Cohere-via-Bedrock models use `BEDROCK_PRICING` instead.
+
+## Streaming
+
+The current adapter wraps only the non-streaming `chat` calls (v1 and v2).
+`client.chat_stream(...)` is not currently wrapped; open an issue if you
+need it.
+
+## BYOK
+
+Pass `api_key` to `cohere.Client(api_key=...)` as you would normally.
+The platform-side BYOK store ships in atlas-app M1.B.
diff --git a/docs/adapters/providers-google-vertex.md b/docs/adapters/providers-google-vertex.md
@@ -0,0 +1,52 @@
+# Google Vertex AI provider adapter
+
+`layerlens.instrument.adapters.providers.google_vertex_adapter.GoogleVertexAdapter`
+wraps `GenerativeModel.generate_content` from either the
+`google.generativeai` or `vertexai.generative_models` SDK.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-vertex]'
+```
+
+Pulls `google-cloud-aiplatform>=1.50,<2`.
+
+## Quick start
+
+```python
+from vertexai.generative_models import GenerativeModel
+from layerlens.instrument.adapters.providers.google_vertex_adapter import GoogleVertexAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="google_vertex")
+adapter = GoogleVertexAdapter()
+adapter.add_sink(sink)
+adapter.connect()
+
+model = GenerativeModel("gemini-1.5-pro")
+adapter.connect_client(model)
+
+response = model.generate_content("Why is the sky blue?")
+```
+
+## Vertex-specific behavior
+
+- **`models/` prefix stripping**: `model_name="models/gemini-1.5-pro"` is normalized to
+  `gemini-1.5-pro` for pricing-table lookup.
+- **Function calls**: extracted from `candidates[0].content.parts[].function_call`
+  and emitted as `tool.call` events with the `args` dict.
+- **`thoughts_token_count`**: when the model returns reasoning tokens, they
+  populate `model.invoke.reasoning_tokens`.
+- **`finish_reason`**: enum value name is captured (e.g., `"STOP"`, `"MAX_TOKENS"`).
+
+## Streaming
+
+`generate_content(stream=True)` is wrapped — the adapter accumulates
+chunk-level usage and emits one consolidated `model.invoke` on stream
+completion. Function calls in streamed responses follow the same accumulation
+pattern.
+
+## Cost calculation
+
+Gemini models get the 75% cached-token discount per the canonical pricing table.
diff --git a/docs/adapters/providers-litellm.md b/docs/adapters/providers-litellm.md
@@ -0,0 +1,67 @@
+# LiteLLM provider adapter
+
+`layerlens.instrument.adapters.providers.litellm_adapter.LiteLLMAdapter`
+hooks into LiteLLM's callback system rather than monkey-patching client
+methods. This avoids interfering with LiteLLM's own routing, fallback, and
+retry behavior.
+
+## Install
+
+```bash
+pip install 'layerlens[providers-litellm]'
+```
+
+Pulls `litellm>=1.40,<2`.
+
+## Quick start
+
+```python
+import litellm
+from layerlens.instrument.adapters.providers.litellm_adapter import LiteLLMAdapter
+from layerlens.instrument.transport.sink_http import HttpEventSink
+
+sink = HttpEventSink(adapter_name="litellm")
+adapter = LiteLLMAdapter()
+adapter.add_sink(sink)
+adapter.connect()  # registers the callback with litellm.callbacks
+
+# No connect_client needed — the callback is module-global.
+litellm.completion(
+    model="openai/gpt-4o-mini",
+    messages=[{"role": "user", "content": "Hi"}],
+)
+
+adapter.disconnect()  # removes the callback
+```
+
+## Provider auto-detection
+
+The adapter parses LiteLLM model strings and sets the `provider` field of
+each event to the underlying provider name:
+
+| Prefix | Provider |
+|---|---|
+| `openai/` | `openai` |
+| `anthropic/` | `anthropic` |
+| `azure/` | `azure_openai` |
+| `bedrock/` | `aws_bedrock` |
+| `vertex_ai/` | `google_vertex` |
+| `ollama/` | `ollama` |
+| `cohere/` | `cohere` |
+| `groq/` | `groq` |
+| (no prefix) | inferred from model name (`gpt-`, `claude-`, `gemini-`, ...) |
+
+Unrecognized models get `provider="unknown"`.
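+
+A rough sketch of the detection logic follows. The prefix map mirrors the
+table above; the name hints for unprefixed models, including the provider
+each one maps to, and the handling of unknown prefixes are assumptions made
+for illustration. `detect_provider` is a hypothetical name.
+
+```python
+PREFIX_TO_PROVIDER = {
+    "openai": "openai",
+    "anthropic": "anthropic",
+    "azure": "azure_openai",
+    "bedrock": "aws_bedrock",
+    "vertex_ai": "google_vertex",
+    "ollama": "ollama",
+    "cohere": "cohere",
+    "groq": "groq",
+}
+
+# Assumed hints for unprefixed model names.
+NAME_HINTS = (
+    ("gpt-", "openai"),
+    ("claude-", "anthropic"),
+    ("gemini-", "google_vertex"),
+)
+
+
+def detect_provider(model):
+    if "/" in model:
+        prefix = model.split("/", 1)[0]
+        return PREFIX_TO_PROVIDER.get(prefix, "unknown")
+    for hint, provider in NAME_HINTS:
+        if model.startswith(hint):
+            return provider
+    return "unknown"
+
+
+provider = detect_provider("openai/gpt-4o-mini")
+```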
+
+## Cost calculation
+
+Cost is sourced in this order:
+1. LiteLLM's own `litellm.completion_cost(...)` — if it returns a non-None value,
+   it's used and the event is tagged `cost_source: "litellm"`.
+2. The canonical LayerLens pricing table for the resolved provider.
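+
+The fallback order can be sketched as below, with both cost sources passed
+in as callables so the example runs without LiteLLM installed. `resolve_cost`
+and the `"pricing_table"` tag are hypothetical; only the `"litellm"` tag
+comes from the description above. Treating an exception from the first
+source as "no value" is also an assumption.
+
+```python
+def resolve_cost(litellm_cost_fn, table_cost_fn):
+    """Return (cost, cost_source), preferring LiteLLM's own figure."""
+    try:
+        cost = litellm_cost_fn()
+    except Exception:
+        # Assumed behavior: a failed lookup falls through to the table.
+        cost = None
+    if cost is not None:
+        return cost, "litellm"
+    return table_cost_fn(), "pricing_table"
+
+
+from_litellm = resolve_cost(lambda: 0.0021, lambda: 0.0030)
+from_table = resolve_cost(lambda: None, lambda: 0.0030)
+```
+
+In real use the first callable would wrap `litellm.completion_cost(...)`.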
+
+## Backward-compat alias
+
+`STRATIXLiteLLMCallback` is preserved as an alias for `LayerLensLiteLLMCallback`
+so users coming from the `ateam` framework codebase don't need to rewrite
+imports immediately. The alias will be removed in v2.0.