47 changes: 47 additions & 0 deletions README.md
@@ -248,15 +248,62 @@ Catch the most specific exception first. The hierarchy:

Note: Only `StratixError`, `APIError`, `BadRequestError`, `AuthenticationError`, and `NotFoundError` are exported from the top-level package. For other exception types, import from `layerlens._exceptions`.
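
As a rough sketch of "catch the most specific exception first", the snippet below uses stand-in classes defined locally rather than the real `layerlens` ones. The inheritance chain shown (`NotFoundError` under `APIError` under `StratixError`) is an assumption made for illustration, and `fetch` and `describe` are hypothetical helpers, not part of the SDK.

```python
# Stand-in classes so the example runs without the SDK installed.
# The inheritance chain is an assumption, not taken from layerlens.
class StratixError(Exception):
    pass


class APIError(StratixError):
    pass


class NotFoundError(APIError):
    pass


def fetch(resource_id: str) -> str:
    """Hypothetical helper that fails for unknown IDs."""
    if resource_id != "known":
        raise NotFoundError(f"{resource_id} not found")
    return "ok"


def describe(resource_id: str) -> str:
    """Handle errors from narrowest to broadest."""
    try:
        return fetch(resource_id)
    except NotFoundError:
        # Most specific handler first; a broader clause placed above
        # this one would swallow the error before it got here.
        return "missing"
    except APIError:
        return "api-error"
    except StratixError:
        return "sdk-error"
```

With the real package, the same ordering would apply to whichever classes are imported from `layerlens` or `layerlens._exceptions`.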

## CLI

The LayerLens CLI lets you manage traces, judges, evaluations, integrations, and more from the terminal.

### Install

```bash
pip install 'layerlens[cli]' --extra-index-url https://sdk.layerlens.ai/package
```

### Configure

```bash
export LAYERLENS_STRATIX_API_KEY="your-api-key"
```

### Usage

```bash
stratix --help # Show all commands
stratix trace list # List traces
stratix evaluate run \
  --model openai/gpt-4o \
  --benchmark arc-agi-2 --wait # Run an evaluation and wait for results
stratix judge create \
  --name "Quality" \
  --goal "Rate response quality" \
  --model-id <MODEL_ID> # Create a judge
stratix ci report -o summary.md # Generate CI report
```

Shell completions are available for bash, zsh, fish, and PowerShell:

```bash
stratix completion bash # Print setup instructions
```

Full CLI docs: [docs/cli/](docs/cli/)

| Guide | Description |
| --- | --- |
| [Getting Started](docs/cli/getting-started.md) | Installation, configuration, first commands |
| [Command Reference](docs/cli/commands.md) | All commands and options |
| [Examples](docs/cli/examples.md) | 15 common workflows as copy-paste shell sessions |

## Requirements

- Python 3.8+
- Dependencies: `httpx`, `pydantic`, `requests`
- CLI extra: `click>=8.0.0`

## Documentation

Full API reference and examples are available in the [docs/](docs/) directory:

- [CLI Guide](docs/cli/) (getting started, command reference, workflow examples)
- [API Reference](docs/api-reference/) (client config, all resource methods, error handling)
- [Code Examples](docs/examples/) (evaluations, judges, traces)
- [Troubleshooting](docs/troubleshooting/) (auth issues, error codes)
6 changes: 6 additions & 0 deletions docs/SUMMARY.md
@@ -15,11 +15,17 @@
* [Results](api-reference/results.md)
* [Models & Benchmarks](api-reference/models-benchmarks.md)
* [Judges](api-reference/judges.md)
* [Scorers](api-reference/scorers.md)
* [Traces](api-reference/traces.md)
* [Trace Evaluations](api-reference/trace-evaluations.md)
* [Judge Optimizations](api-reference/judge-optimizations.md)
* [Error Handling](api-reference/errors.md)

## CLI
* [Getting Started](cli/getting-started.md)
* [Command Reference](cli/commands.md)
* [Workflow Examples](cli/examples.md)

## Code Examples
* [Overview](examples/README.md)
* [Creating Evaluations](examples/creating-evaluations.md)
30 changes: 19 additions & 11 deletions docs/api-reference/evaluations.md
@@ -177,22 +177,23 @@ async def get_evaluation():
asyncio.run(get_evaluation())
```

### `get_many(page=None, page_size=None, sort_by=None, order=None, model_ids=None, benchmark_ids=None, status=None, timeout=None)`
### `get_many(page=None, page_size=None, sort_by=None, order=None, model_ids=None, benchmark_ids=None, status=None, unique=False, timeout=None)`

Retrieves multiple evaluations with optional pagination, sorting, and filtering.

#### Parameters

| Parameter | Type | Required | Description |
| --------------- | -------------------------------- | -------- | ------------------------------------------------------- |
| `page` | `int \| None` | No | Page number for pagination (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of evaluations per page (default: 100, max: 500) |
| `sort_by` | `str \| None` | No | Sort by field: `submittedAt`, `accuracy`, or `averageDuration` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `model_ids` | `List[str] \| None` | No | Filter by model IDs |
| `benchmark_ids` | `List[str] \| None` | No | Filter by benchmark/dataset IDs |
| `status` | `EvaluationStatus \| None` | No | Filter by evaluation status |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
| Parameter | Type | Required | Description |
| --------------- | -------------------------------- | -------- | ----------------------------------------------------------------------------------- |
| `page` | `int \| None` | No | Page number for pagination (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of evaluations per page (default: 100, max: 500) |
| `sort_by` | `str \| None` | No | Sort by field: `submitted_at`, `accuracy`, or `average_duration` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `model_ids` | `List[str] \| None` | No | Filter by model IDs |
| `benchmark_ids` | `List[str] \| None` | No | Filter by benchmark/dataset IDs |
| `status` | `EvaluationStatus \| None` | No | Filter by evaluation status |
| `unique` | `bool` | No | If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

#### Returns

@@ -222,6 +223,13 @@ response = client.evaluations.get_many(
if response:
    for evaluation in response.evaluations:
        print(f"{evaluation.id}: accuracy={evaluation.accuracy:.2f}%")

# Get only the latest evaluation per model+benchmark pair
response = client.evaluations.get_many(
    unique=True,
    sort_by="accuracy",
    order="desc",
)
```
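
To make the described effect of `unique=True` concrete, here is a minimal local sketch of deduplicating by model+benchmark pair and keeping the latest record. It only mimics the documented behavior on plain dictionaries. The server-side logic is not shown in the docs, and the field names `model_id`, `benchmark_id`, and `submitted_at`, as well as the helper `latest_per_pair`, are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def latest_per_pair(evaluations: List[dict]) -> List[dict]:
    """Keep only the most recently submitted record per (model, benchmark)."""
    latest: Dict[Tuple[str, str], dict] = {}
    for ev in evaluations:
        key = (ev["model_id"], ev["benchmark_id"])
        current = latest.get(key)
        # Replace the stored record only when this one is newer.
        if current is None or ev["submitted_at"] > current["submitted_at"]:
            latest[key] = ev
    return list(latest.values())


# Hypothetical records; submitted_at is a plain integer for simplicity.
rows = [
    {"id": "a", "model_id": "m1", "benchmark_id": "b1", "submitted_at": 1},
    {"id": "b", "model_id": "m1", "benchmark_id": "b1", "submitted_at": 2},
    {"id": "c", "model_id": "m2", "benchmark_id": "b1", "submitted_at": 1},
]
deduped = latest_per_pair(rows)
```

Here the two `m1`/`b1` records collapse to the newer one, while the `m2`/`b1` record is kept as-is.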

### `get_results(page=None, page_size=None, timeout=None)`
39 changes: 24 additions & 15 deletions docs/api-reference/models-benchmarks.md
@@ -35,20 +35,24 @@ benchmarks = client.benchmarks.get()

## Models

### `get(type=None, name=None, companies=None, regions=None, licenses=None, timeout=None)`
### `get(type=None, name=None, key=None, categories=None, companies=None, regions=None, licenses=None, timeout=None)`

Retrieves a list of available models with optional filtering parameters. Both the `Stratix` and `AsyncStratix` clients have this method.

#### Parameters

| Parameter | Type | Required | Description |
| ----------- | ------------------------------------- | -------- | ---------------------------------------------------------------------- |
| `type` | `Literal["custom", "public"] \| None` | No | Filter by model type. If `None`, returns both custom and public models |
| `name` | `str \| None` | No | Filter models by name (partial match search) |
| `companies` | `List[str] \| None` | No | Filter by model companies/providers |
| `regions` | `List[str] \| None` | No | Filter by supported regions |
| `licenses` | `List[str] \| None` | No | Filter by license types |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
| Parameter | Type | Required | Description |
| ------------ | ------------------------------------- | -------- | ---------------------------------------------------------------------------------------------- |
| `type` | `Literal["custom", "public"] \| None` | No | Filter by model type. If `None`, returns both custom and public models |
| `name` | `str \| None` | No | Filter models by name (partial match search) |
| `key` | `str \| None` | No | Filter models by key (partial match search) |
| `categories` | `List[str] \| None` | No | Filter by categories: `Transformer`, `MoE`, `Open-Source`, `Closed-Source` |
| `companies` | `List[str] \| None` | No | Filter by model companies/providers |
| `regions` | `List[str] \| None` | No | Filter by supported regions |
| `licenses` | `List[str] \| None` | No | Filter by license types |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

> **Note:** When filtering by `categories`, `companies`, `regions`, or `licenses`, only public models are returned since custom models do not have these fields.
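
The note above can be illustrated with a small local sketch: records that lack a `categories` field drop out as soon as a category filter is applied. This is not the SDK's implementation. The helper `filter_by_categories`, the record layout, and the any-of matching rule are all assumptions for illustration.

```python
from typing import List, Optional


def filter_by_categories(models: List[dict], categories: Optional[List[str]]) -> List[dict]:
    """Return models that have at least one of the requested categories."""
    if not categories:
        # No filter requested: custom and public models both pass through.
        return list(models)
    wanted = set(categories)
    # Custom models carry no "categories" key here, so they never match.
    return [m for m in models if wanted & set(m.get("categories", []))]


# Hypothetical records: one public model with categories, one custom without.
models = [
    {"name": "public-moe", "type": "public", "categories": ["MoE", "Open-Source"]},
    {"name": "my-model", "type": "custom"},
]
only_moe = filter_by_categories(models, ["MoE"])
unfiltered = filter_by_categories(models, None)
```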

#### Returns

@@ -185,17 +189,22 @@ if result:

## Benchmarks

### `get(type=None, name=None, timeout=None)`
### `get(type=None, name=None, key=None, categories=None, languages=None, timeout=None)`

Retrieves a list of available benchmarks with optional filtering parameters. Both the `Stratix` and `AsyncStratix` clients have this method.

#### Parameters

| Parameter | Type | Required | Description |
| --------- | ------------------------------------- | -------- | ------------------------------------------------------------------------------ |
| `type` | `Literal["custom", "public"] \| None` | No | Filter by benchmark type. If `None`, returns both custom and public benchmarks |
| `name` | `str \| None` | No | Filter benchmarks by name (partial match search) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
| Parameter | Type | Required | Description |
| ------------ | ------------------------------------- | -------- | ------------------------------------------------------------------------------ |
| `type` | `Literal["custom", "public"] \| None` | No | Filter by benchmark type. If `None`, returns both custom and public benchmarks |
| `name` | `str \| None` | No | Filter benchmarks by name (partial match search) |
| `key` | `str \| None` | No | Filter benchmarks by key (partial match search) |
| `categories` | `List[str] \| None` | No | Filter by categories (e.g., `reasoning`, `knowledge`, `coding`) |
| `languages` | `List[str] \| None` | No | Filter by language (e.g., `english`, `french`) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

> **Note:** When filtering by `categories` or `languages`, only public benchmarks are returned since custom benchmarks do not have these fields.

#### Returns

21 changes: 11 additions & 10 deletions docs/api-reference/public-client.md
@@ -286,16 +286,17 @@ Retrieves evaluations with optional pagination, sorting, and filtering.

#### Parameters

| Parameter | Type | Required | Description |
| ----------------- | -------------------------------- | -------- | ------------------------------------------------------------------ |
| `page` | `int \| None` | No | Page number for pagination (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of evaluations per page (default: 100, max: 500) |
| `sort_by` | `str \| None` | No | Sort by field: `submittedAt`, `accuracy`, or `averageDuration` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `model_ids` | `List[str] \| None` | No | Filter by model IDs |
| `benchmark_ids` | `List[str] \| None` | No | Filter by benchmark/dataset IDs |
| `status` | `EvaluationStatus \| None` | No | Filter by evaluation status |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
| Parameter | Type | Required | Description |
| ----------------- | -------------------------------- | -------- | ---------------------------------------------------------------------------------- |
| `page` | `int \| None` | No | Page number for pagination (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of evaluations per page (default: 100, max: 500) |
| `sort_by` | `str \| None` | No | Sort by field: `submitted_at`, `accuracy`, or `average_duration` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `model_ids` | `List[str] \| None` | No | Filter by model IDs |
| `benchmark_ids` | `List[str] \| None` | No | Filter by benchmark/dataset IDs |
| `status` | `EvaluationStatus \| None` | No | Filter by evaluation status |
| `unique` | `bool` | No | If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
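
One way to walk through all pages with the 1-based `page` and bounded `page_size` parameters is sketched below. It runs against a fake in-memory fetcher rather than the real client. `iter_pages` and `fake_fetch` are hypothetical names, and stopping on the first short page is an assumed convention, since the response's paging metadata is not shown here.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[str]], page_size: int = 100) -> Iterator[str]:
    """Yield items page by page until a page comes back short."""
    if not 1 <= page_size <= 500:
        raise ValueError("page_size must be between 1 and 500")
    page = 1  # pages are 1-based
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            break  # a short (or empty) page means nothing is left
        page += 1


def fake_fetch(page: int, page_size: int) -> List[str]:
    """Stand-in for a paginated API call over 250 hypothetical IDs."""
    data = [f"eval-{i}" for i in range(250)]
    start = (page - 1) * page_size
    return data[start:start + page_size]


all_ids = list(iter_pages(fake_fetch, page_size=100))
```

With the real client, `fetch_page` would wrap a call such as `get_many(page=..., page_size=...)` and return the evaluations from its response.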

#### Returns
