LayerLens · m-peko · Mar 21, 2026 · Mar 19, 2026 · Mar 19, 2026 · Mar 20, 2026
diff --git a/README.md b/README.md
@@ -248,15 +248,62 @@ Catch the most specific exception first. The hierarchy:
 
 Note: Only `StratixError`, `APIError`, `BadRequestError`, `AuthenticationError`, and `NotFoundError` are exported from the top-level package. For other exception types, import from `layerlens._exceptions`.
 
+## CLI
+
+The LayerLens CLI lets you manage traces, judges, evaluations, integrations, and more from the terminal.
+
+### Install
+
+```bash
+pip install "layerlens[cli]" --extra-index-url https://sdk.layerlens.ai/package
+```
+
+### Configure
+
+```bash
+export LAYERLENS_STRATIX_API_KEY="your-api-key"
+```
+
+### Usage
+
+```bash
+stratix --help                   # Show all commands
+stratix trace list               # List traces
+stratix evaluate run \
+  --model openai/gpt-4o \
+  --benchmark arc-agi-2 --wait     # Run an evaluation and wait for results
+stratix judge create \
+  --name "Quality" \
+  --goal "Rate response quality" \
+  --model-id <MODEL_ID>            # Create a judge
+stratix ci report -o summary.md  # Generate CI report
+```
+
+Shell completions are available for Bash, Zsh, fish, and PowerShell:
+
+```bash
+stratix completion bash          # Print setup instructions
+```
+
+Full CLI docs: [docs/cli/](docs/cli/)
+
+| Guide | Description |
+| --- | --- |
+| [Getting Started](docs/cli/getting-started.md) | Installation, configuration, first commands |
+| [Command Reference](docs/cli/commands.md) | All commands and options |
+| [Examples](docs/cli/examples.md) | 15 common workflows as copy-paste shell sessions |
+
 ## Requirements
 
 - Python 3.8+
 - Dependencies: `httpx`, `pydantic`, `requests`
+- CLI extra: `click>=8.0.0`
 
 ## Documentation
 
 Full API reference and examples are available in the [docs/](docs/) directory:
 
+- [CLI Guide](docs/cli/) (getting started, command reference, workflow examples)
 - [API Reference](docs/api-reference/) (client config, all resource methods, error handling)
 - [Code Examples](docs/examples/) (evaluations, judges, traces)
 - [Troubleshooting](docs/troubleshooting/) (auth issues, error codes)

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -15,11 +15,17 @@
   * [Results](api-reference/results.md)
   * [Models & Benchmarks](api-reference/models-benchmarks.md)
   * [Judges](api-reference/judges.md)
+  * [Scorers](api-reference/scorers.md)
   * [Traces](api-reference/traces.md)
   * [Trace Evaluations](api-reference/trace-evaluations.md)
   * [Judge Optimizations](api-reference/judge-optimizations.md)
   * [Error Handling](api-reference/errors.md)
 
+## CLI
+* [Getting Started](cli/getting-started.md)
+* [Command Reference](cli/commands.md)
+* [Workflow Examples](cli/examples.md)
+
 ## Code Examples
 * [Overview](examples/README.md)
   * [Creating Evaluations](examples/creating-evaluations.md)

diff --git a/docs/api-reference/evaluations.md b/docs/api-reference/evaluations.md
@@ -177,22 +177,23 @@ async def get_evaluation():
 asyncio.run(get_evaluation())
 ```
 
-### `get_many(page=None, page_size=None, sort_by=None, order=None, model_ids=None, benchmark_ids=None, status=None, timeout=None)`
+### `get_many(page=None, page_size=None, sort_by=None, order=None, model_ids=None, benchmark_ids=None, status=None, unique=False, timeout=None)`
 
 Retrieves multiple evaluations with optional pagination, sorting, and filtering.
 
 #### Parameters
 
-| Parameter       | Type                             | Required | Description                                             |
-| --------------- | -------------------------------- | -------- | ------------------------------------------------------- |
-| `page`          | `int \| None`                    | No       | Page number for pagination (1-based, defaults to 1)     |
-| `page_size`     | `int \| None`                    | No       | Number of evaluations per page (default: 100, max: 500) |
-| `sort_by`       | `str \| None`                    | No       | Sort by field: `submittedAt`, `accuracy`, or `averageDuration` |
-| `order`         | `str \| None`                    | No       | Sort order: `asc` or `desc`                             |
-| `model_ids`     | `List[str] \| None`              | No       | Filter by model IDs                                     |
-| `benchmark_ids` | `List[str] \| None`              | No       | Filter by benchmark/dataset IDs                         |
-| `status`        | `EvaluationStatus \| None`       | No       | Filter by evaluation status                             |
-| `timeout`       | `float \| httpx.Timeout \| None` | No       | Override request timeout                                |
+| Parameter       | Type                             | Required | Description                                                                         |
+| --------------- | -------------------------------- | -------- | ----------------------------------------------------------------------------------- |
+| `page`          | `int \| None`                    | No       | Page number for pagination (1-based, defaults to 1)                                 |
+| `page_size`     | `int \| None`                    | No       | Number of evaluations per page (default: 100, max: 500)                             |
+| `sort_by`       | `str \| None`                    | No       | Sort by field: `submitted_at`, `accuracy`, or `average_duration`                    |
+| `order`         | `str \| None`                    | No       | Sort order: `asc` or `desc`                                                         |
+| `model_ids`     | `List[str] \| None`              | No       | Filter by model IDs                                                                 |
+| `benchmark_ids` | `List[str] \| None`              | No       | Filter by benchmark/dataset IDs                                                     |
+| `status`        | `EvaluationStatus \| None`       | No       | Filter by evaluation status                                                         |
+| `unique`        | `bool`                           | No       | If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation  |
+| `timeout`       | `float \| httpx.Timeout \| None` | No       | Override request timeout                                                            |
 
 #### Returns
 
@@ -222,6 +223,13 @@ response = client.evaluations.get_many(
 if response:
     for evaluation in response.evaluations:
         print(f"{evaluation.id}: accuracy={evaluation.accuracy:.2f}%")
+
+# Get only the latest evaluation per model+benchmark pair
+response = client.evaluations.get_many(
+    unique=True,
+    sort_by="accuracy",
+    order="desc",
+)
 ```
 
 ### `get_results(page=None, page_size=None, timeout=None)`

diff --git a/docs/api-reference/models-benchmarks.md b/docs/api-reference/models-benchmarks.md
@@ -35,20 +35,24 @@ benchmarks = client.benchmarks.get()
 
 ## Models
 
-### `get(type=None, name=None, companies=None, regions=None, licenses=None, timeout=None)`
+### `get(type=None, name=None, key=None, categories=None, companies=None, regions=None, licenses=None, timeout=None)`
 
 Retrieves a list of available models with optional filtering parameters. Both the `Stratix` and `AsyncStratix` clients have this method.
 
 #### Parameters
 
-| Parameter   | Type                                  | Required | Description                                                            |
-| ----------- | ------------------------------------- | -------- | ---------------------------------------------------------------------- |
-| `type`      | `Literal["custom", "public"] \| None` | No       | Filter by model type. If `None`, returns both custom and public models |
-| `name`      | `str \| None`                         | No       | Filter models by name (partial match search)                           |
-| `companies` | `List[str] \| None`                   | No       | Filter by model companies/providers                                    |
-| `regions`   | `List[str] \| None`                   | No       | Filter by supported regions                                            |
-| `licenses`  | `List[str] \| None`                   | No       | Filter by license types                                                |
-| `timeout`   | `float \| httpx.Timeout \| None`      | No       | Override request timeout                                               |
+| Parameter    | Type                                  | Required | Description                                                                                    |
+| ------------ | ------------------------------------- | -------- | ---------------------------------------------------------------------------------------------- |
+| `type`       | `Literal["custom", "public"] \| None` | No       | Filter by model type. If `None`, returns both custom and public models                         |
+| `name`       | `str \| None`                         | No       | Filter models by name (partial match search)                                                   |
+| `key`        | `str \| None`                         | No       | Filter models by key (partial match search)                                                    |
+| `categories` | `List[str] \| None`                   | No       | Filter by categories: `Transformer`, `MoE`, `Open-Source`, `Closed-Source`                     |
+| `companies`  | `List[str] \| None`                   | No       | Filter by model companies/providers                                                            |
+| `regions`    | `List[str] \| None`                   | No       | Filter by supported regions                                                                    |
+| `licenses`   | `List[str] \| None`                   | No       | Filter by license types                                                                        |
+| `timeout`    | `float \| httpx.Timeout \| None`      | No       | Override request timeout                                                                       |
+
+> **Note:** When filtering by `categories`, `companies`, `regions`, or `licenses`, only public models are returned, because custom models do not have these fields.
 
 #### Returns
 
@@ -185,17 +189,22 @@ if result:
 
 ## Benchmarks
 
-### `get(type=None, name=None, timeout=None)`
+### `get(type=None, name=None, key=None, categories=None, languages=None, timeout=None)`
 
 Retrieves a list of available benchmarks with optional filtering parameters. Both the `Stratix` and `AsyncStratix` clients have this method.
 
 #### Parameters
 
-| Parameter | Type                                  | Required | Description                                                                    |
-| --------- | ------------------------------------- | -------- | ------------------------------------------------------------------------------ |
-| `type`    | `Literal["custom", "public"] \| None` | No       | Filter by benchmark type. If `None`, returns both custom and public benchmarks |
-| `name`    | `str \| None`                         | No       | Filter benchmarks by name (partial match search)                               |
-| `timeout` | `float \| httpx.Timeout \| None`      | No       | Override request timeout                                                       |
+| Parameter    | Type                                  | Required | Description                                                                    |
+| ------------ | ------------------------------------- | -------- | ------------------------------------------------------------------------------ |
+| `type`       | `Literal["custom", "public"] \| None` | No       | Filter by benchmark type. If `None`, returns both custom and public benchmarks |
+| `name`       | `str \| None`                         | No       | Filter benchmarks by name (partial match search)                               |
+| `key`        | `str \| None`                         | No       | Filter benchmarks by key (partial match search)                                |
+| `categories` | `List[str] \| None`                   | No       | Filter by categories (e.g., `reasoning`, `knowledge`, `coding`)                |
+| `languages`  | `List[str] \| None`                   | No       | Filter by language (e.g., `english`, `french`)                                 |
+| `timeout`    | `float \| httpx.Timeout \| None`      | No       | Override request timeout                                                       |
+
+> **Note:** When filtering by `categories` or `languages`, only public benchmarks are returned, because custom benchmarks do not have these fields.
 
 #### Returns
 

diff --git a/docs/api-reference/public-client.md b/docs/api-reference/public-client.md
@@ -286,16 +286,17 @@ Retrieves evaluations with optional pagination, sorting, and filtering.
 
 #### Parameters
 
-| Parameter         | Type                             | Required | Description                                                        |
-| ----------------- | -------------------------------- | -------- | ------------------------------------------------------------------ |
-| `page`            | `int \| None`                    | No       | Page number for pagination (1-based, defaults to 1)                |
-| `page_size`       | `int \| None`                    | No       | Number of evaluations per page (default: 100, max: 500)            |
-| `sort_by`         | `str \| None`                    | No       | Sort by field: `submittedAt`, `accuracy`, or `averageDuration`     |
-| `order`           | `str \| None`                    | No       | Sort order: `asc` or `desc`                                       |
-| `model_ids`       | `List[str] \| None`              | No       | Filter by model IDs                                                |
-| `benchmark_ids`   | `List[str] \| None`              | No       | Filter by benchmark/dataset IDs                                    |
-| `status`          | `EvaluationStatus \| None`       | No       | Filter by evaluation status                                        |
-| `timeout`         | `float \| httpx.Timeout \| None` | No       | Override request timeout                                           |
+| Parameter         | Type                             | Required | Description                                                                        |
+| ----------------- | -------------------------------- | -------- | ---------------------------------------------------------------------------------- |
+| `page`            | `int \| None`                    | No       | Page number for pagination (1-based, defaults to 1)                                |
+| `page_size`       | `int \| None`                    | No       | Number of evaluations per page (default: 100, max: 500)                            |
+| `sort_by`         | `str \| None`                    | No       | Sort by field: `submitted_at`, `accuracy`, or `average_duration`                   |
+| `order`           | `str \| None`                    | No       | Sort order: `asc` or `desc`                                                        |
+| `model_ids`       | `List[str] \| None`              | No       | Filter by model IDs                                                                |
+| `benchmark_ids`   | `List[str] \| None`              | No       | Filter by benchmark/dataset IDs                                                    |
+| `status`          | `EvaluationStatus \| None`       | No       | Filter by evaluation status                                                        |
+| `unique`          | `bool`                           | No       | If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation |
+| `timeout`         | `float \| httpx.Timeout \| None` | No       | Override request timeout                                                           |
 
 #### Returns