70 changes: 70 additions & 0 deletions docs/adapters/providers-anthropic.md
@@ -0,0 +1,70 @@
# Anthropic provider adapter

`layerlens.instrument.adapters.providers.anthropic_adapter.AnthropicAdapter`
instruments the Anthropic Python SDK to emit telemetry on every
`messages.create` and `messages.stream` call.

## Install

```bash
pip install 'layerlens[providers-anthropic]'
```

Pulls `anthropic>=0.30,<1`.

## Quick start

```python
from anthropic import Anthropic
from layerlens.instrument.adapters.providers.anthropic_adapter import AnthropicAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="anthropic")
adapter = AnthropicAdapter()
adapter.add_sink(sink)
adapter.connect()

client = Anthropic()
adapter.connect_client(client)

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=20,
    messages=[{"role": "user", "content": "Hello"}],
)

adapter.disconnect()
sink.close()
```

## Events emitted

| Event | Layer | When |
|---|---|---|
| `model.invoke` | L3 | Every `messages.create` (success or failure) and once per stream completion |
| `cost.record` | cross-cutting | Every successful call with token usage |
| `tool.call` | L5a | One per `tool_use` block in the response |
| `policy.violation` | cross-cutting | When the SDK raises (rate limit, invalid input, etc.) |

The `model.invoke` payload includes Anthropic-specific fields:
- `cache_creation_input_tokens` / `cache_read_input_tokens` (when prompt caching is used)
- `parameters.has_system: true` when a system prompt is supplied
- `parameters.tools_count` when tools are passed
- `reasoning_tokens` (Claude extended thinking)
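
As a rough illustration of how the cache and `parameters` fields above might be derived, the sketch below builds them from the request kwargs and a usage dict. The function name `build_anthropic_fields` and the flat payload layout are assumptions made for this example, not the adapter's actual code.

```python
from typing import Any


def build_anthropic_fields(request: dict[str, Any], usage: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: collect Anthropic-specific `model.invoke` fields."""
    payload: dict[str, Any] = {"parameters": {}}

    # Cache counters are only copied when the usage object reports them.
    for key in ("cache_creation_input_tokens", "cache_read_input_tokens"):
        if usage.get(key) is not None:
            payload[key] = usage[key]

    # Record that a system prompt exists without storing its text.
    if request.get("system"):
        payload["parameters"]["has_system"] = True

    # Record how many tools were offered to the model.
    tools = request.get("tools")
    if tools:
        payload["parameters"]["tools_count"] = len(tools)

    return payload
```

Only presence and counts are recorded here, which keeps prompt text and tool schemas out of the telemetry payload.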

## Streaming

The adapter wraps both `messages.create(stream=True)` and the
`messages.stream()` context manager. A single consolidated `model.invoke`
fires on stream completion, accumulating content from `text_delta` events
and tool input from `input_json_delta` events.
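
The accumulation described above can be sketched roughly as follows. This is a minimal illustration that assumes stream events arrive as plain dicts shaped like the Anthropic streaming API's `content_block_delta` events; the function name `accumulate_stream` and the returned dict layout are hypothetical and are not taken from the adapter.

```python
import json
from typing import Any, Iterable


def accumulate_stream(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical accumulator: fold stream events into one summary."""
    text_parts: list[str] = []
    tool_json: dict[int, list[str]] = {}  # content block index -> JSON fragments

    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # ignore message_start, message_stop, and similar events
        delta = event["delta"]
        if delta["type"] == "text_delta":
            text_parts.append(delta["text"])
        elif delta["type"] == "input_json_delta":
            tool_json.setdefault(event["index"], []).append(delta["partial_json"])

    # Fragments only form valid JSON once every piece of a block has arrived.
    tool_inputs = {
        index: json.loads("".join(parts) or "{}")
        for index, parts in tool_json.items()
    }
    return {"content": "".join(text_parts), "tool_inputs": tool_inputs}
```

Parsing the tool input only after the loop matters because each `partial_json` fragment is an arbitrary slice of the final JSON document.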

## Cost calculation

Pricing comes from the canonical pricing table. Claude models automatically
get the 90% cached-token discount.
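
To make the discount concrete, here is a small arithmetic sketch. It assumes that rates are quoted per 1,000 tokens, that `input_tokens` excludes cached tokens, and that the discount applies to tokens read from the cache; the function name and the example rate are hypothetical and do not come from the pricing table.

```python
def input_cost_usd(
    input_tokens: int,
    cached_tokens: int,
    rate_per_1k: float,
    cached_discount: float = 0.90,
) -> float:
    """Hypothetical helper: price input tokens with a cached-token discount."""
    # A 90% discount means cached tokens cost 10% of the normal input rate.
    cached_rate = rate_per_1k * (1.0 - cached_discount)
    return (input_tokens * rate_per_1k + cached_tokens * cached_rate) / 1000.0
```

With an illustrative rate of $0.001 per 1,000 tokens, 1,000 uncached plus 1,000 cached input tokens would cost about $0.0011 instead of $0.002.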

## BYOK

Bring-your-own-key (BYOK) follows the same pattern as the OpenAI adapter: pass
`api_key` to the `Anthropic()` client.
The platform-side BYOK store ships in atlas-app M1.B.
48 changes: 48 additions & 0 deletions docs/adapters/providers-azure-openai.md
@@ -0,0 +1,48 @@
# Azure OpenAI provider adapter

`layerlens.instrument.adapters.providers.azure_openai_adapter.AzureOpenAIAdapter`
uses the same `openai` SDK as the OpenAI adapter but captures Azure-specific
metadata (deployment, endpoint, region, api-version) and uses the Azure
pricing table.

## Install

```bash
pip install 'layerlens[providers-azure-openai]'
```

## Quick start

```python
from openai import AzureOpenAI
from layerlens.instrument.adapters.providers.azure_openai_adapter import AzureOpenAIAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="azure_openai")
adapter = AzureOpenAIAdapter()
adapter.add_sink(sink)
adapter.connect()

client = AzureOpenAI(
    api_key="...",
    api_version="2024-08-01-preview",
    azure_endpoint="https://my-resource.openai.azure.com/",
)
adapter.connect_client(client)

client.chat.completions.create(model="my-deployment", messages=[...])
```

## Azure-specific behavior

- **Endpoint sanitization**: query strings are stripped from the captured
`azure_endpoint` to prevent token leakage if the URL ever contains an
`api-key` query param.
- **Pricing**: cost calculations use `AZURE_PRICING` (different rates than
OpenAI's public API).
- **api-version**: read from `client._api_version` or the `api-version` key of
`client._custom_query` and surfaced in every `model.invoke`.
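
The endpoint sanitization described above could look roughly like the following. This is a sketch built on the standard library only; the function name `sanitize_endpoint` is hypothetical, and dropping the fragment as well as the query string is an assumption of this example.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_endpoint(url: str) -> str:
    """Hypothetical helper: drop the query string and fragment from a URL."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out anything that could carry a key.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For example, `sanitize_endpoint("https://my-resource.openai.azure.com/?api-key=secret")` yields the bare resource URL, so a key embedded in the query string never reaches the event payload.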

## Events emitted

Same set as OpenAI: `model.invoke`, `cost.record`, `tool.call`, `policy.violation`.
64 changes: 64 additions & 0 deletions docs/adapters/providers-bedrock.md
@@ -0,0 +1,64 @@
# AWS Bedrock provider adapter

`layerlens.instrument.adapters.providers.bedrock_adapter.AWSBedrockAdapter`
wraps the `bedrock-runtime` boto3 client. Bedrock is a multi-provider
front: Anthropic, Meta, Cohere, Amazon Titan, AI21, and Mistral models all
flow through the same client interface but with different request and
response body shapes. The adapter detects the provider family from
`modelId` and parses tokens, content, and stop reasons accordingly.

## Install

```bash
pip install 'layerlens[providers-bedrock]'
```

Pulls `boto3>=1.34`.

## Quick start

```python
import boto3
from layerlens.instrument.adapters.providers.bedrock_adapter import AWSBedrockAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="aws_bedrock")
adapter = AWSBedrockAdapter()
adapter.add_sink(sink)
adapter.connect()

client = boto3.client("bedrock-runtime", region_name="us-east-1")
adapter.connect_client(client)

# Either invoke_model or converse — both wrapped.
client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Hi"}]}],
)
```

## Wrapped methods

- `invoke_model`: the body is JSON and is parsed per provider family. The
response body is wrapped in a `_RereadableBody` so the caller's downstream
`body.read()` still works.
- `converse`: uses a unified Anthropic-style envelope, so token extraction is
the same for every provider family.
- `invoke_model_with_response_stream`: emits `model.invoke` immediately with
`streaming=true`; content extraction during stream consumption is deferred
to a future PR.
- `converse_stream`: same behavior as `invoke_model_with_response_stream`.
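
The reason `invoke_model` needs a re-readable wrapper is that the response body is a one-shot stream: once the adapter reads it to extract tokens, the caller would otherwise get nothing. The class below is a hypothetical stand-in that shows the idea; it is not the adapter's `_RereadableBody`, and the `peek_all` method name is an assumption of this sketch.

```python
import io
from typing import Optional


class RereadableBody:
    """Hypothetical stand-in: buffers a one-shot stream so it can be read again."""

    def __init__(self, raw) -> None:
        # Drain the original stream once and keep the bytes in memory.
        self._buffer = io.BytesIO(raw.read())

    def peek_all(self) -> bytes:
        """Return the full payload without moving the read position."""
        return self._buffer.getvalue()

    def read(self, amt: Optional[int] = None) -> bytes:
        """Behave like the original body for the caller's later read()."""
        return self._buffer.read() if amt is None else self._buffer.read(amt)
```

An adapter could call `peek_all()` for parsing and then hand the wrapper back in place of the original body, at the cost of holding the whole response in memory.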

## Provider-family token extraction

| `modelId` prefix | Family | Token fields |
|---|---|---|
| `anthropic.` | anthropic | `usage.input_tokens` / `usage.output_tokens` |
| `meta.` | meta | `prompt_token_count` / `generation_token_count` |
| `cohere.` | cohere | `meta.billed_units.input_tokens` / `output_tokens` |
| `amazon.` | amazon | (no usage in body; tokens come from `Converse` API) |
| `ai21.` | ai21 | (handled via `Converse` API) |
| `mistral.` | mistral | `prompt_tokens` / `completion_tokens` |
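
The table above maps naturally onto a prefix check followed by a per-family lookup. The sketch below is illustrative only: the function names are hypothetical, the `"unknown"` fallback is an assumption, and it assumes the Meta and Mistral fields sit at the top level of the parsed response body.

```python
from typing import Any, Optional

FAMILY_PREFIXES = ("anthropic.", "meta.", "cohere.", "amazon.", "ai21.", "mistral.")


def detect_family(model_id: str) -> str:
    """Hypothetical helper: map a Bedrock modelId to its provider family."""
    for prefix in FAMILY_PREFIXES:
        if model_id.startswith(prefix):
            return prefix.rstrip(".")
    return "unknown"  # assumption: the source does not name a fallback value


def extract_tokens(model_id: str, body: dict[str, Any]) -> tuple[Optional[int], Optional[int]]:
    """Return (input_tokens, output_tokens), or None where the body has no usage."""
    family = detect_family(model_id)
    if family == "anthropic":
        usage = body.get("usage", {})
        return usage.get("input_tokens"), usage.get("output_tokens")
    if family == "meta":
        return body.get("prompt_token_count"), body.get("generation_token_count")
    if family == "cohere":
        units = body.get("meta", {}).get("billed_units", {})
        return units.get("input_tokens"), units.get("output_tokens")
    if family == "mistral":
        return body.get("prompt_tokens"), body.get("completion_tokens")
    # amazon and ai21: no usage in the invoke_model body.
    return None, None
```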

## Cost calculation

Uses the `BEDROCK_PRICING` table (separate from OpenAI/Azure tables).
78 changes: 78 additions & 0 deletions docs/adapters/providers-cohere.md
@@ -0,0 +1,78 @@
# Cohere provider adapter

`layerlens.instrument.adapters.providers.cohere_adapter.CohereAdapter`
instruments the Cohere Python SDK (v5+) for chat (v1 + v2) and embeddings.

## Install

```bash
pip install 'layerlens[providers-cohere]'
```

Pulls `cohere>=5.0,<6`.

## Quick start

```python
import cohere
from layerlens.instrument.adapters.providers.cohere_adapter import CohereAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="cohere")
adapter = CohereAdapter()
adapter.add_sink(sink)
adapter.connect()

client = cohere.Client()
adapter.connect_client(client)

# v1 chat (single message + optional preamble)
response = client.chat(model="command-r-plus", message="Hello", preamble="Be concise.")

# v2 chat (OpenAI-style messages list)
response = client.v2.chat(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello"}],
)
```

## What's wrapped

- `client.chat` (v1) — `message` is normalized to a `user` role; optional `preamble` becomes a `system` message at index 0.
- `client.v2.chat` — already OpenAI-style; messages pass through.
- `client.embed` — `meta.billed_units.input_tokens` populates the cost record.
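
The v1 normalization described above can be illustrated with a few lines of plain Python. The function name `normalize_v1_chat` is hypothetical; the sketch only shows the message ordering stated in the list, not the adapter's real code.

```python
from typing import Optional


def normalize_v1_chat(message: str, preamble: Optional[str] = None) -> list[dict[str, str]]:
    """Hypothetical helper: turn v1 chat arguments into an OpenAI-style list."""
    messages: list[dict[str, str]] = []
    if preamble:
        # The preamble plays the role of a system prompt, so it goes first.
        messages.append({"role": "system", "content": preamble})
    messages.append({"role": "user", "content": message})
    return messages
```

Normalizing v1 calls this way means v1 and v2 events share one message shape downstream.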

## Events emitted

| Event | Layer | When |
|---|---|---|
| `model.invoke` | L3 | Every chat or embed call (success or failure). |
| `cost.record` | cross-cutting | Every successful call with billed units. |
| `tool.call` | L5a | One per tool call in the response (v1: `tool_calls[].name/parameters`; v2: `message.tool_calls[].function.{name,arguments}`). |
| `policy.violation` | cross-cutting | When the SDK raises (rate limit, invalid input, etc.). |

## Cost calculation

Pricing is sourced from the canonical `PRICING` table:

| Model | Input (USD per 1K tokens) | Output (USD per 1K tokens) |
|---|---|---|
| command-r-plus | $0.003 | $0.015 |
| command-r | $0.0005 | $0.0015 |
| command-r-plus-08-2024 | $0.0025 | $0.01 |
| command-r-08-2024 | $0.00015 | $0.0006 |
| command | $0.001 | $0.002 |
| command-light | $0.0003 | $0.0006 |

Cohere-via-Bedrock models use `BEDROCK_PRICING` instead.
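
As a worked example of a table lookup, the sketch below prices a single chat call, assuming the rates are quoted in USD per 1,000 tokens. `PRICING_SAMPLE` copies two rows from the table above for illustration; it is not the real `PRICING` object, and the function name is hypothetical.

```python
from typing import Optional

# Two rows copied from the table above: (input rate, output rate).
PRICING_SAMPLE = {
    "command-r-plus": (0.003, 0.015),
    "command-r": (0.0005, 0.0015),
}


def chat_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Hypothetical helper: cost of one call, or None for an unlisted model."""
    rates = PRICING_SAMPLE.get(model)
    if rates is None:
        return None
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1000.0
```

Under that assumption, a `command-r-plus` call with 2,000 billed input tokens and 1,000 billed output tokens comes to about $0.021.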

## Streaming

The adapter currently wraps only the non-streaming calls listed under
"What's wrapped". `client.chat_stream(...)` is not currently wrapped; open an
issue if you need it.

## BYOK

Pass `api_key` to `cohere.Client(api_key=...)` as you would normally.
The platform-side BYOK store ships in atlas-app M1.B.
52 changes: 52 additions & 0 deletions docs/adapters/providers-google-vertex.md
@@ -0,0 +1,52 @@
# Google Vertex AI provider adapter

`layerlens.instrument.adapters.providers.google_vertex_adapter.GoogleVertexAdapter`
wraps `GenerativeModel.generate_content` from either the
`google.generativeai` or `vertexai.generative_models` SDK.

## Install

```bash
pip install 'layerlens[providers-vertex]'
```

Pulls `google-cloud-aiplatform>=1.50,<2`.

## Quick start

```python
from vertexai.generative_models import GenerativeModel
from layerlens.instrument.adapters.providers.google_vertex_adapter import GoogleVertexAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="google_vertex")
adapter = GoogleVertexAdapter()
adapter.add_sink(sink)
adapter.connect()

model = GenerativeModel("gemini-1.5-pro")
adapter.connect_client(model)

response = model.generate_content("Why is the sky blue?")
```

## Vertex-specific behavior

- **`models/` prefix stripping**: `model_name="models/gemini-1.5-pro"` is normalized to
`gemini-1.5-pro` for pricing-table lookup.
- **Function calls**: extracted from `candidates[0].content.parts[].function_call`
and emitted as `tool.call` events with the `args` dict.
- **`thoughts_token_count`**: when the model returns reasoning tokens, they
populate `model.invoke.reasoning_tokens`.
- **`finish_reason`**: enum value name is captured (e.g., `"STOP"`, `"MAX_TOKENS"`).
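
Two of the behaviors above, prefix stripping and function-call extraction, can be sketched as follows. The example works on a dict-shaped response for simplicity, whereas the real SDK returns objects; both function names are hypothetical.

```python
from typing import Any


def normalize_model_name(name: str) -> str:
    """Hypothetical helper: strip a leading `models/` for pricing lookup."""
    return name.removeprefix("models/")


def extract_function_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect function calls from the parts of the first candidate only."""
    candidates = response.get("candidates") or []
    if not candidates:
        return []
    parts = candidates[0].get("content", {}).get("parts", [])
    return [
        {"name": part["function_call"]["name"],
         "args": dict(part["function_call"].get("args", {}))}
        for part in parts
        if "function_call" in part  # text parts carry no function call
    ]
```

Each returned entry corresponds to one `tool.call` event in the description above.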

## Streaming

`generate_content(stream=True)` is wrapped — the adapter accumulates
chunk-level usage and emits one consolidated `model.invoke` on stream
completion. Function calls in streamed responses follow the same accumulation
pattern.

## Cost calculation

Gemini models get the 75% cached-token discount per the canonical pricing table.
67 changes: 67 additions & 0 deletions docs/adapters/providers-litellm.md
@@ -0,0 +1,67 @@
# LiteLLM provider adapter

`layerlens.instrument.adapters.providers.litellm_adapter.LiteLLMAdapter`
hooks into LiteLLM's callback system rather than monkey-patching client
methods. This avoids interfering with LiteLLM's own routing, fallback, and
retry behavior.

## Install

```bash
pip install 'layerlens[providers-litellm]'
```

Pulls `litellm>=1.40,<2`.

## Quick start

```python
import litellm
from layerlens.instrument.adapters.providers.litellm_adapter import LiteLLMAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="litellm")
adapter = LiteLLMAdapter()
adapter.add_sink(sink)
adapter.connect() # registers the callback with litellm.callbacks

# No connect_client needed — the callback is module-global.
litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi"}],
)

adapter.disconnect() # removes the callback
```

## Provider auto-detection

The adapter parses LiteLLM model strings and sets the `provider` field of
each event to the underlying provider name:

| Prefix | Provider |
|---|---|
| `openai/` | `openai` |
| `anthropic/` | `anthropic` |
| `azure/` | `azure_openai` |
| `bedrock/` | `aws_bedrock` |
| `vertex_ai/` | `google_vertex` |
| `ollama/` | `ollama` |
| `cohere/` | `cohere` |
| `groq/` | `groq` |
| (no prefix) | inferred from model name (`gpt-`, `claude-`, `gemini-`, ...) |

Unrecognized models get `provider="unknown"`.
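
The detection rules above amount to a prefix lookup with a name-based fallback. The sketch below is one possible reading: the dict and function names are hypothetical, and the mapping of bare `gemini-` names to `google_vertex` is an assumption, since the table does not say which provider those names resolve to.

```python
PREFIX_TO_PROVIDER = {
    "openai": "openai",
    "anthropic": "anthropic",
    "azure": "azure_openai",
    "bedrock": "aws_bedrock",
    "vertex_ai": "google_vertex",
    "ollama": "ollama",
    "cohere": "cohere",
    "groq": "groq",
}

# Fallback hints for model strings without a prefix (partly assumed).
NAME_HINTS = (("gpt-", "openai"), ("claude-", "anthropic"), ("gemini-", "google_vertex"))


def detect_provider(model: str) -> str:
    """Hypothetical helper: resolve a LiteLLM model string to a provider name."""
    prefix, sep, _ = model.partition("/")
    if sep and prefix in PREFIX_TO_PROVIDER:
        return PREFIX_TO_PROVIDER[prefix]
    for hint, provider in NAME_HINTS:
        if model.startswith(hint):
            return provider
    return "unknown"
```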

## Cost calculation

Cost is sourced in this order:
1. LiteLLM's own `litellm.completion_cost(...)` — if it returns a non-None value,
it's used and the event is tagged `cost_source: "litellm"`.
2. The canonical LayerLens pricing table for the resolved provider.
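
The two-step order above can be sketched as a small fallback function. The cost lookups are passed in as callables so the example runs without LiteLLM installed; the function name, the `"pricing_table"` tag, and treating an exception from the first lookup as "no value" are all assumptions of this sketch.

```python
from typing import Callable, Optional


def resolve_cost(
    litellm_cost: Callable[[], Optional[float]],
    table_cost: Callable[[], Optional[float]],
) -> tuple[Optional[float], str]:
    """Hypothetical helper: prefer LiteLLM's cost, else the pricing table."""
    try:
        cost = litellm_cost()
    except Exception:
        # Assumption: a failed lookup is handled like a None result.
        cost = None
    if cost is not None:
        return cost, "litellm"
    return table_cost(), "pricing_table"  # hypothetical tag for the fallback
```

In practice the first callable might wrap `litellm.completion_cost(...)` for the completed response, with the second consulting the LayerLens table for the resolved provider.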

## Backward-compat alias

`STRATIXLiteLLMCallback` is preserved as an alias for `LayerLensLiteLLMCallback`
so users coming from the `ateam` framework codebase don't need to rewrite
imports immediately. The alias will be removed in v2.0.