
Commit 12c8b93

Authored by mmercuri and a co-author
instrument: agent framework adapters (M1.C part 2) (#97)
Ports the twelve agent-tier framework adapters from the ateam reference implementation onto the new layerlens.instrument base layer: Semantic Kernel, LlamaIndex, OpenAI Agents, Pydantic-AI, Agno, Strands, SmolAgents, MS Agent Framework, Google ADK, Bedrock Agents, Embedding (vector store hooks), and Benchmark Import.

Pairs with feat/instrument-frameworks-orchestration (M1.C part 1), which lands LangChain, LangGraph, CrewAI, AutoGen, Langfuse, and Agentforce. Together they complete M1.C.

Scope
-----
- src/layerlens/instrument/adapters/frameworks/{semantic_kernel,llama_index,openai_agents,pydantic_ai,agno,strands,smolagents,ms_agent_framework,google_adk,bedrock_agents,embedding,benchmark_import}/: per-framework packages
- tests/instrument/adapters/frameworks/test_*_adapter.py plus the test_bulk_ported_smoke.py harness, which exercises every ported adapter against canned trace fixtures so that a runner with only some framework SDKs installed doesn't drop coverage to zero
- samples/instrument/<framework>/: runnable per-framework samples
- docs/adapters/frameworks-<framework>.md: per-framework integration guide
- pyproject.toml: twelve new optional extras (semantic-kernel, llama-index, openai-agents, pydantic-ai, agno, strands, smolagents, ms-agent-framework, google-adk, bedrock-agents, embedding, benchmark-import) with python_version markers; pyright/ruff exclusions for the framework code that monkey-patches dynamically

Blast radius
------------
- The default `pip install layerlens` install set is unchanged. Each framework's heavy dependencies are gated behind its own extra.
- No changes to the existing public API surface.
- Importing layerlens.instrument still does NOT pull in any framework module (lazy registry lookup).

Test plan
---------
- uv run pytest tests/instrument/adapters/frameworks/ -x -> 184 passed, 1 skipped (test_bulk_ported_smoke.py covers all 12 agent-tier adapters plus the orchestration-tier ones from part 1 via the same harness)

Stacks on
---------
- feat/instrument-base-foundation (M1.A): required for the BaseAdapter surface this PR consumes.

Sibling of
----------
- feat/instrument-frameworks-orchestration (M1.C part 1): both branches stack on the base foundation independently and don't conflict, so they can land in either order.

LAY-3400 umbrella (M1.C part 2).

Co-authored-by: mmercuri <marc@giantdigital.io>
1 parent 4a128d8 commit 12c8b93

75 files changed

Lines changed: 10804 additions & 1 deletion


docs/adapters/frameworks-agno.md

Lines changed: 101 additions & 0 deletions

# Agno framework adapter

`layerlens.instrument.adapters.frameworks.agno.AgnoAdapter` instruments
[Agno](https://github.com/agno-agi/agno) agents (single agents and
multi-agent teams) by wrapping `Agent.run()` and `Agent.arun()`.

## Install

```bash
pip install 'layerlens[agno]'
```

Pulls `agno>=0.1,<1.0`. Requires Python 3.10+.

## Quick start

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat

from layerlens.instrument.adapters.frameworks.agno import AgnoAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="agno")
adapter = AgnoAdapter()
adapter.add_sink(sink)
adapter.connect()

try:
    agent = Agent(model=OpenAIChat(id="gpt-4o-mini"), instructions="Be concise.")
    adapter.instrument_agent(agent)

    response = agent.run("What is 2 + 2?")
finally:
    # Restore the original Agent methods, then close the sink.
    adapter.disconnect()
    sink.close()
```

The module-level `instrument_agent(agent)` helper, exported from the same
module, is the one-liner equivalent.

## What's wrapped

`adapter.instrument_agent(agent)` patches the following on each Agent:

- `run`: sync entry point. Emits `agent.input` and `agent.output`, plus any
  inner `model.invoke` / `tool.call` events.
- `arun`: async entry point, with the same semantics.
- `_run_tool`: emits `tool.call` per tool invocation (when the installed
  Agno version has this method).
- Model adapter hooks: emit `model.invoke` per LLM call.

`disconnect()` restores all originals.
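
The wrap-and-restore pattern described above can be sketched in plain Python. The sketch assumes the adapter shadows each agent's bound method with an instance attribute and removes that attribute on `disconnect()`. `FakeAgent`, `MethodPatcher`, and the emitted event names wired through `emit` are hypothetical stand-ins, not Agno or layerlens APIs, and only the sync `run` path is shown.

```python
import functools
from typing import Any, Callable, List, Tuple


class FakeAgent:
    """Hypothetical stand-in for an Agno Agent with a sync `run` method."""

    def run(self, message: str) -> str:
        return f"echo: {message}"


class MethodPatcher:
    """Wraps `run` on agent instances and restores the originals later."""

    def __init__(self, emit: Callable[[str, dict], None]) -> None:
        self._emit = emit
        # (agent, attribute name) pairs patched on the instance.
        self._patched: List[Tuple[Any, str]] = []

    def instrument_agent(self, agent: Any) -> None:
        original = agent.run  # bound method resolved through the class

        @functools.wraps(original)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            self._emit("agent.input", {"args": args, "kwargs": kwargs})
            result = original(*args, **kwargs)
            self._emit("agent.output", {"result": result})
            return result

        # An instance attribute shadows the class method.
        agent.run = wrapped
        self._patched.append((agent, "run"))

    def disconnect(self) -> None:
        # Removing the instance attribute re-exposes the class method.
        for agent, name in self._patched:
            agent.__dict__.pop(name, None)
        self._patched.clear()


events: List[str] = []
patcher = MethodPatcher(lambda kind, payload: events.append(kind))
agent = FakeAgent()
patcher.instrument_agent(agent)
agent.run("What is 2 + 2?")
patcher.disconnect()
agent.run("again")  # original method again, so nothing is recorded
print(events)  # ['agent.input', 'agent.output']
```

Patching the instance rather than the class leaves other, uninstrumented agents untouched.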

## Events emitted

| Event | Layer | When |
|---|---|---|
| `environment.config` | L4a | First `run` per agent. |
| `agent.input` | L1 | Beginning of every `run` / `arun`. |
| `agent.output` | L1 | End of every `run` / `arun`. |
| `agent.action` | L4a | Per intermediate reasoning step. |
| `agent.handoff` | L4a | When a team agent delegates to a sub-agent. |
| `agent.state.change` | cross-cutting | Memory mutations. |
| `tool.call` | L5a | Per tool invocation. |
| `model.invoke` | L3 | Per LLM call. |

## Agno specifics

- **Teams**: Agno supports multi-agent teams via `Team(agents=[...])`.
  Instrument each team member individually with
  `adapter.instrument_agent(team_member)`, or call the module-level
  `instrument_agent(team)` helper, which recurses into the members.
- **Reasoning agents**: when `reasoning=True` is set on an Agent, each
  intermediate reasoning step emits an `agent.action` event with a
  `step_index` field.
- **Storage backends**: Agno session storage (Postgres, SQLite, Redis,
  etc.) emits `agent.state.change` on every save.
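
The recursive team helper can be sketched as an iterative depth-first walk with a guard against visiting the same object twice. The `members` attribute name, the `instrument_tree` function, and the `Node` class are assumptions for illustration only. The attribute that actually holds team members may differ between Agno versions.

```python
from typing import Any, Callable, List, Optional, Set


def instrument_tree(root: Any, instrument_one: Callable[[Any], None]) -> int:
    """Instrument `root` and, recursively, every member it contains.

    Hypothetically assumes teams expose their members as a `members` list
    and plain agents have no such attribute. Returns how many objects were
    passed to `instrument_one`.
    """
    seen: Set[int] = set()
    stack: List[Any] = [root]
    count = 0
    while stack:
        node = stack.pop()
        if id(node) in seen:  # skip members shared between teams, or cycles
            continue
        seen.add(id(node))
        instrument_one(node)
        count += 1
        stack.extend(getattr(node, "members", None) or [])
    return count


class Node:
    """Hypothetical agent/team stand-in for the example below."""

    def __init__(self, name: str, members: Optional[List["Node"]] = None) -> None:
        self.name = name
        if members is not None:
            self.members = members


researcher = Node("researcher")
writer = Node("writer")
team = Node("team", [researcher, writer, researcher])  # duplicate on purpose
visited: List[str] = []
count = instrument_tree(team, lambda n: visited.append(n.name))
print(count)  # 3
```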

## Capture config

```python
from layerlens.instrument.adapters._base import CaptureConfig
from layerlens.instrument.adapters.frameworks.agno import AgnoAdapter

# Recommended.
adapter = AgnoAdapter(capture_config=CaptureConfig.standard())

# Heavy: also capture reasoning steps as agent.code (the chain of thought).
adapter = AgnoAdapter(
    capture_config=CaptureConfig(
        l1_agent_io=True,
        l2_agent_code=True,
        l3_model_metadata=True,
        l5a_tool_calls=True,
    ),
)
```

## BYOK

Agno model adapters (`OpenAIChat`, `AnthropicClaude`, etc.) read their own
credentials; the Agno adapter does not manage them. For platform-managed
BYOK, see `docs/adapters/byok.md` (atlas-app M1.B).

Lines changed: 113 additions & 0 deletions

# AWS Bedrock Agents framework adapter

`layerlens.instrument.adapters.frameworks.bedrock_agents.BedrockAgentsAdapter`
instruments AWS Bedrock Agent runtime calls by registering boto3 event hooks
and parsing the `trace` blocks in the `InvokeAgent` response stream.

## Install

```bash
pip install 'layerlens[bedrock-agents]'
```

Pulls `boto3>=1.34`. AWS credentials and region must be configured the
standard way (environment variables, an IAM role, or a named profile).

## Quick start

```python
import boto3

from layerlens.instrument.adapters.frameworks.bedrock_agents import (
    BedrockAgentsAdapter,
)
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="bedrock_agents")
adapter = BedrockAgentsAdapter()
adapter.add_sink(sink)
adapter.connect()

try:
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    adapter.instrument_client(client)

    response = client.invoke_agent(
        agentId="ABCDEFGHIJ",
        agentAliasId="TSTALIASID",
        sessionId="my-session",
        inputText="What is 2+2?",
        enableTrace=True,  # Bedrock only streams `trace` events when tracing is on.
    )

    # Consume the response stream; trace events are captured automatically.
    answer = ""
    for event in response["completion"]:
        if "chunk" in event:
            answer += event["chunk"]["bytes"].decode("utf-8")
    print(answer)
finally:
    adapter.disconnect()
    sink.close()
```

The module-level `instrument_client(client)` function, exported from the
same module, is the convenience helper.

## What's wrapped

`adapter.instrument_client(client)` registers two boto3 event hooks on the
provided `bedrock-agent-runtime` client:

- `provide-client-params.bedrock-agent-runtime.InvokeAgent`: fires before
  the request goes out. Captures `agentId`, `sessionId`, and `inputText`,
  emits `agent.input`, and emits `environment.config` the first time a
  given agent is seen.
- `after-call.bedrock-agent-runtime.InvokeAgent`: fires after the response
  comes back. Walks the `trace` blocks in the streamed events and emits
  `model.invoke` / `tool.call` / `agent.action` per trace step.

`disconnect()` unregisters both hooks.

## Events emitted

| Event | Layer | When |
|---|---|---|
| `environment.config` | L4a | First `InvokeAgent` per `agentId`. |
| `agent.input` | L1 | Beginning of every `InvokeAgent`. |
| `agent.output` | L1 | End of every `InvokeAgent` (after stream consumption). |
| `agent.action` | L4a | Per `orchestrationTrace.modelInvocationInput` block. |
| `agent.handoff` | L4a | Per cross-agent collaboration step. |
| `tool.call` | L5a | Per `actionGroupInvocationInput` / `knowledgeBaseLookupInput` block. |
| `model.invoke` | L3 | Per `modelInvocationOutput` block (with token usage). |

## Bedrock Agents specifics

- **Action groups**: each `actionGroup` invocation maps to a `tool.call`
  with `tool_name = "{actionGroupName}::{apiPath}"` and the typed
  parameters in the payload.
- **Knowledge bases**: every KB lookup emits a `tool.call` with
  `tool_name = "knowledge_base::{knowledgeBaseId}"`, the rendered query,
  and the retrieved citations.
- **Multi-agent collaboration**: when a supervisor agent delegates to a
  collaborator, an `agent.handoff` event is emitted with both agent IDs.
- **Session attributes**: passed through into `agent.input` payloads as
  `session_attributes`.
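
The mapping from trace blocks to events can be sketched as a pure function over a single stream event. This simplified sketch is not the adapter's code. It assumes each traced stream event has the shape `{"trace": {"agentId": ..., "trace": {"orchestrationTrace": {...}}}}`, and it handles only the orchestration-trace keys named in the table above. Handoffs and citations are omitted. `map_trace_event` and the `(event_type, payload)` tuples are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Event = Tuple[str, Dict[str, Any]]


def map_trace_event(stream_event: Dict[str, Any]) -> List[Event]:
    """Translate one InvokeAgent stream event into (event_type, payload) pairs."""
    part = stream_event.get("trace")
    if not part:  # text chunks and other stream events carry no trace
        return []
    orch = part.get("trace", {}).get("orchestrationTrace", {})
    agent_id = part.get("agentId")
    events: List[Event] = []

    if "modelInvocationInput" in orch:
        events.append(("agent.action", {"agent_id": agent_id}))

    if "modelInvocationOutput" in orch:
        usage = orch["modelInvocationOutput"].get("metadata", {}).get("usage", {})
        events.append(("model.invoke", {"agent_id": agent_id, "usage": usage}))

    invocation = orch.get("invocationInput", {})
    if "actionGroupInvocationInput" in invocation:
        ag = invocation["actionGroupInvocationInput"]
        events.append(("tool.call", {
            "tool_name": f"{ag.get('actionGroupName')}::{ag.get('apiPath')}",
            "parameters": ag.get("parameters", []),
        }))
    if "knowledgeBaseLookupInput" in invocation:
        kb = invocation["knowledgeBaseLookupInput"]
        events.append(("tool.call", {
            "tool_name": f"knowledge_base::{kb.get('knowledgeBaseId')}",
            "query": kb.get("text"),
        }))
    return events


sample = {
    "trace": {
        "agentId": "ABCDEFGHIJ",
        "trace": {
            "orchestrationTrace": {
                "invocationInput": {
                    "actionGroupInvocationInput": {
                        "actionGroupName": "billing",
                        "apiPath": "/invoices",
                        "parameters": [{"name": "month", "value": "2024-05"}],
                    }
                }
            }
        },
    }
}
print(map_trace_event(sample)[0][1]["tool_name"])  # billing::/invoices
```

A pure function like this can be tested against canned trace fixtures without any AWS access.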

## Capture config

```python
from layerlens.instrument.adapters._base import CaptureConfig
from layerlens.instrument.adapters.frameworks.bedrock_agents import BedrockAgentsAdapter

# Recommended.
adapter = BedrockAgentsAdapter(capture_config=CaptureConfig.standard())

# Compliance: drop user input/output content but keep tool/model metadata.
adapter = BedrockAgentsAdapter(
    capture_config=CaptureConfig(
        l1_agent_io=True,
        l3_model_metadata=True,
        l5a_tool_calls=True,
        capture_content=False,
    ),
)
```

## BYOK

Bedrock Agents bills directly to your AWS account via your IAM identity,
so there's no separate API key to manage. The model used by the agent is
configured server-side in the agent definition.

Lines changed: 108 additions & 0 deletions

# Benchmark import framework adapter

`layerlens.instrument.adapters.frameworks.benchmark_import.BenchmarkImportAdapter`
imports external benchmark datasets into Stratix evaluation spaces. Unlike
the other framework adapters, it is a **data importer**, not a runtime
instrumentation adapter: it reads benchmarks from disk or from
HuggingFace and produces normalized rows.

## Install

```bash
pip install 'layerlens[benchmark-import]'
```

The `benchmark-import` extra has no required dependencies. To use the
HuggingFace import path, additionally install `datasets`:

```bash
pip install datasets
```

## Quick start (CSV)

```python
from layerlens.instrument.adapters.frameworks.benchmark_import import (
    BenchmarkImportAdapter,
)

adapter = BenchmarkImportAdapter()

result = adapter.import_csv(
    path="my_benchmark.csv",
    schema_mapping={"question": "prompt", "answer": "expected_output"},
    max_records=1000,
    tags=["custom", "qa"],
)

print(f"Imported {result.records_imported} records into {result.benchmark_id}")
```

## Quick start (HuggingFace)

```python
# Reuses `adapter` from the CSV example above.
result = adapter.import_huggingface(
    dataset_name="squad",
    split="validation",
    max_records=200,
    tags=["public", "qa"],
)
```

## Quick start (HELM)

```python
# Reuses `adapter` from the CSV example above.
result = adapter.import_helm(
    path="/path/to/helm_results.json",
    tags=["helm", "leaderboard"],
)
```

## Public API

| Method | Description |
|---|---|
| `import_huggingface(dataset_name, split=, subset=, schema_mapping=, max_records=, tags=)` | Stream a HuggingFace dataset into Stratix. |
| `import_helm(path, tags=)` | Import HELM JSON results. |
| `import_csv(path, schema_mapping=, delimiter=, max_records=, tags=)` | Import a CSV benchmark. |
| `import_json(path, schema_mapping=, records_key=, max_records=, tags=)` | Import a JSON benchmark. |
| `import_parquet(path, schema_mapping=, max_records=, tags=)` | Import a Parquet benchmark (requires `pyarrow`). |

All methods return an `ImportResult` with the fields `success`,
`benchmark_id`, `records_imported`, `records_skipped`, `duration_ms`,
`errors`, and `metadata` (a `BenchmarkMetadata` Pydantic model).

## Schema mapping

A `schema_mapping` dict renames source columns to the canonical Stratix
evaluation schema. Each key is a source column name and each value is the
canonical field it maps to (for example, `{"question": "prompt"}`):

| Stratix field | Common source columns |
|---|---|
| `prompt` | `question`, `input`, `query` |
| `expected_output` | `answer`, `target`, `reference`, `ground_truth` |
| `difficulty` | `difficulty`, `level` |
| `category` | `category`, `subject`, `topic` |

When no mapping is provided, the adapter applies a small set of automatic
heuristics (a case-insensitive name match against the canonical fields).
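
Applying a mapping, or the fallback heuristic, to CSV rows can be sketched with the standard library. This illustrative sketch is not the adapter's implementation. `normalize_rows` and `CANONICAL_FIELDS` are hypothetical. The fallback keeps only columns whose names match a canonical field case-insensitively, and unmapped columns are simply dropped here.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

CANONICAL_FIELDS = ("prompt", "expected_output", "difficulty", "category")


def normalize_rows(
    rows: Iterable[Dict[str, str]],
    schema_mapping: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Rename source columns to canonical Stratix fields.

    With an explicit mapping, keys are source column names and values are
    canonical field names. Without one, each column name is matched
    case-insensitively against the canonical fields.
    """
    out: List[Dict[str, str]] = []
    for row in rows:
        record: Dict[str, str] = {}
        for column, value in row.items():
            if schema_mapping is not None:
                target = schema_mapping.get(column)
            else:
                lowered = column.strip().lower()
                target = lowered if lowered in CANONICAL_FIELDS else None
            if target is not None:  # columns with no target are dropped
                record[target] = value
        out.append(record)
    return out


data = "question,answer,level\nWhat is 2+2?,4,easy\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(normalize_rows(rows, {"question": "prompt", "answer": "expected_output"}))
# [{'prompt': 'What is 2+2?', 'expected_output': '4'}]
```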

## Persistence

If you pass a `store=` argument to `BenchmarkImportAdapter(...)` (any object
that exposes `save_benchmark(metadata, records)`), the adapter writes
imported benchmarks through it. Otherwise, records are returned to the
caller and held in `adapter._benchmarks`, keyed by `benchmark_id`.
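
Because the store is duck-typed, any object with a `save_benchmark(metadata, records)` method should work. The sketch below shows a minimal in-memory store, for example for tests. The `BenchmarkStore` protocol and `InMemoryBenchmarkStore` class are hypothetical. The types of `metadata` and `records` are kept loose because their exact shapes aren't specified here.

```python
from typing import Any, Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class BenchmarkStore(Protocol):
    """Hypothetical structural type for the `store=` argument."""

    def save_benchmark(self, metadata: Any, records: List[Dict[str, Any]]) -> None:
        ...


class InMemoryBenchmarkStore:
    """Keeps every saved benchmark in a list."""

    def __init__(self) -> None:
        self.saved: List[Tuple[Any, List[Dict[str, Any]]]] = []

    def save_benchmark(self, metadata: Any, records: List[Dict[str, Any]]) -> None:
        # Copy the list so later mutation by the caller doesn't leak in.
        self.saved.append((metadata, list(records)))


store = InMemoryBenchmarkStore()
# The runtime protocol check only verifies that the method exists.
print(isinstance(store, BenchmarkStore))  # True
store.save_benchmark({"name": "demo"}, [{"prompt": "What is 2+2?", "expected_output": "4"}])
print(len(store.saved))  # 1
# Wiring it into the adapter would then look like:
# adapter = BenchmarkImportAdapter(store=store)
```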

## Events emitted

This adapter does not emit telemetry events; it produces benchmark rows.
Once a benchmark is stored in atlas-app, the platform's evaluation runner
can iterate over it and produce `model.invoke` / `evaluation.score` events
through the standard provider adapters.

## BYOK

Not applicable. The adapter reads files locally or downloads from
HuggingFace using the standard `datasets` library, so no model API keys
are involved.
