diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000..f9df3e6 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,13 @@ +# Code of Conduct + +This project adopts the [Contributor Covenant, version 2.1](https://www.contributor-covenant.org/version/2/1/code_of_conduct/) as its Code of Conduct. + +By participating in this project, in any of its issues, pull requests, discussions, Discord channels, or other community spaces, you agree to abide by its terms. + +## Reporting + +Report instances of abusive, harassing, or otherwise unacceptable behavior to the project maintainers at **support@layerlens.ai** with the subject line "Code of Conduct report." All reports are reviewed and investigated promptly and fairly. The privacy and safety of the reporter is a priority. + +For the full text, see [contributor-covenant.org/version/2/1](https://www.contributor-covenant.org/version/2/1/code_of_conduct/). + +For translations, see the [official translations index](https://www.contributor-covenant.org/translations/). diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..442cc33 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,80 @@ +# Contributing to stratix-python + +Thanks for your interest in contributing. The fastest path to a merged PR is to open an issue first so we can align on direction before code. + +## Before you start + +- Browse [open issues](https://github.com/LayerLens/stratix-python/issues), especially anything tagged `good first issue`. +- For non-trivial changes, [open an issue](https://github.com/LayerLens/stratix-python/issues/new) describing the problem and your proposed approach. We'll respond within a few business days. +- For questions and design discussion, join us in [Discord](https://discord.gg/layerlens). + +## Repo layout + +- `src/layerlens/` is the SDK source (clients, resources, CLI). +- `tests/` is the test suite (unit, integration, sample E2E). 
+- `samples/` holds runnable code samples organized by topic: `core`, `cicd`, `cli`, `mcp`, `integrations`, `industry`, `modalities`, `claude-code`, `cowork`, `copilotkit`, `openclaw`, `data`. +- `docs/` is the source for the [GitBook docs site](https://layerlens.gitbook.io/stratix-python-sdk). +- `scripts/` holds developer scripts (`bootstrap`, `test`, `lint`, `format`, `test_coverage`). +- `pyproject.toml` is the Python project config and tool settings. +- `requirements.lock` and `requirements-dev.lock` are the pinned dependencies. +- `.husky/` holds Git hooks that run on commit (lint-staged formats and lints staged Python files). + +## Local setup + +The project uses [Rye](https://rye.astral.sh/) to manage Python and dependencies. The bootstrap script sets everything up: + +```bash +git clone https://github.com/LayerLens/stratix-python.git +cd stratix-python +./scripts/bootstrap +source .venv/bin/activate +``` + +If you would rather use plain pip, ensure the Python version in `.python-version` is active, then: + +```bash +python -m venv .venv && source .venv/bin/activate +pip install -r requirements-dev.lock +pip install -e . +``` + +## Dev loop + +```bash +./scripts/test # run the test suite +./scripts/lint # run the linter +./scripts/format # format and auto-fix +``` + +A pre-commit hook runs `./scripts/format` and `./scripts/lint` against staged Python files automatically. + +## Required CI checks + +Every PR runs these workflows. They must pass before review: + +- [`run-tests.yaml`](https://github.com/LayerLens/stratix-python/actions/workflows/run-tests.yaml) is the full test suite. +- [`check-format.yaml`](https://github.com/LayerLens/stratix-python/actions/workflows/check-format.yaml) checks formatting. +- [`check-lint.yaml`](https://github.com/LayerLens/stratix-python/actions/workflows/check-lint.yaml) runs the linter. + +Run them locally before pushing. + +## Pull request guidelines + +- One logical change per PR. Smaller PRs merge faster. 
+- Reference the issue your PR addresses in the description. +- Include a runnable sample under `samples/` when adding a new SDK capability. +- Update `docs/` when changing public API surface. +- Add or update tests under `tests/` when changing behavior. +- Make sure all CI checks are green before requesting review. + +## Code of conduct + +This project follows the [Code of Conduct](./CODE_OF_CONDUCT.md). By participating, you agree to abide by it. + +## Reporting security issues + +Do not file a public issue for security vulnerabilities. See [SECURITY.md](./SECURITY.md) for the private disclosure process. + +## License + +By contributing, you agree your contribution is licensed under the [Apache License 2.0](./LICENSE). diff --git a/README.md b/README.md index 536e80b..befcbec 100644 --- a/README.md +++ b/README.md @@ -1,313 +1,744 @@ -# LayerLens Stratix Python SDK +
-The official Python library for the [LayerLens Stratix](https://app.layerlens.ai) evaluation API. +# Stratix Python SDK -[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) -[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/) +### Evaluate AI models before you ship them. -## Installation +The official Python SDK for [Stratix by LayerLens](https://stratix.layerlens.ai). Run reproducible benchmarks across 200+ models, evaluate agent traces, calibrate custom judges, and catch silent regressions, all from Python or your CI pipeline. -```bash -pip install layerlens --extra-index-url https://sdk.layerlens.ai/package -``` +**213 public models · 59 benchmarks · 26 model providers · 180,000+ benchmark prompts** + +Live counts from the Stratix public registry. Pulled at SDK build time, refreshed on every release. + +[![PyPI](https://img.shields.io/pypi/v/layerlens.svg?color=1454FF&style=flat-square)](https://pypi.org/project/layerlens/) +[![Downloads](https://img.shields.io/pypi/dm/layerlens.svg?color=1454FF&style=flat-square)](https://pypi.org/project/layerlens/) +[![Python 3.8+](https://img.shields.io/pypi/pyversions/layerlens.svg?style=flat-square)](https://www.python.org/downloads/) +[![Tests](https://github.com/LayerLens/stratix-python/actions/workflows/run-tests.yaml/badge.svg)](https://github.com/LayerLens/stratix-python/actions/workflows/run-tests.yaml) +[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg?style=flat-square)](https://opensource.org/licenses/Apache-2.0) +[![GitHub stars](https://img.shields.io/github/stars/LayerLens/stratix-python?style=social)](https://github.com/LayerLens/stratix-python) + +[**Browse 213 models →**](https://stratix.layerlens.ai) · +[**Docs**](https://layerlens.gitbook.io/stratix-python-sdk) · +[**Discord**](https://discord.gg/layerlens) · +[**Blog**](https://layerlens.ai/blog) · 
+[**Issues**](https://github.com/LayerLens/stratix-python/issues) + +Stratix evaluation dashboard: 213 models scored on 59 benchmarks, every result reproducible + +[**Run your first eval**](#quick-start) · [**Browse 213 models**](https://stratix.layerlens.ai) · [**Star if useful ⭐**](https://github.com/LayerLens/stratix-python) + +
+ +--- + +
+ Stratix SDK demo: 213 models, reproducible benchmarks, agent trace evaluation in Python +

Vendor-neutral evals in 5 lines of Python.

+
+ +--- + +## Why Stratix + +Hand-rolled eval pipelines drift. Vendor leaderboards are not reproducible. Production agents fail silently and nobody knows which release introduced the regression. -## Authentication + + + + + + +
-Set your API key as an environment variable: +### Vendor-neutral + +Stratix is not owned by a model provider. The same benchmark runs across 213 public models from 26 providers in one workspace. No labs grading their own homework. No leaderboards optimized for marketing. + + + +### Reproducible by default + +Every score is backed by a verifiable, persisted trace you can re-run, inspect, and cite. Same prompt, same prompt template, same scoring logic, same model version. Every time. + + + +### Production-ready + +Wire evals into CI. Calibrate judges to a quality goal in plain English. Score full agent traces, not just last-token outputs. Ship reliable agents faster. + +
+ +--- + +## Quick Start + +Three steps. Under two minutes if you already have an API key. ```bash -export LAYERLENS_STRATIX_API_KEY="your-api-key" +pip install layerlens ``` -Or pass it directly when creating a client: - ```python from layerlens import Stratix +# Auth via env (LAYERLENS_STRATIX_API_KEY) or kwarg client = Stratix(api_key="your-api-key") + +# Pick a model + benchmark from the public registry +model = client.models.get_by_key("openai/gpt-5.5-20260423") +benchmark = client.benchmarks.get_by_key("aime2026") + +# Run the evaluation +evaluation = client.evaluations.create(model=model, benchmark=benchmark) +result = client.evaluations.wait_for_completion(evaluation) + +print(f"accuracy: {result.accuracy}") +print(f"view: https://stratix.layerlens.ai/evaluations/{result.id}") ``` -## Quick Start +**If that worked end-to-end in under two minutes, [star the repo](https://github.com/LayerLens/stratix-python). Helps more teams find Stratix.** + +[Get an API key →](https://stratix.layerlens.ai) · [Full Quick Start docs →](https://layerlens.gitbook.io/stratix-python-sdk/getting-started) + +--- + +## Install + + + + + + + + + + + + +
Standard (pip)Modern (uv)Authenticate
+ +```bash +pip install layerlens +``` + + + +```bash +uv pip install layerlens +``` + + + +```bash +export LAYERLENS_STRATIX_API_KEY=... +``` + +Or pass `api_key=...` to the client. + +
+ +Requires Python 3.8+. Free tier available at [stratix.layerlens.ai](https://stratix.layerlens.ai). Browse all 213 models and 59 benchmarks before you sign up. + +--- + +## Capabilities + +Six capabilities, one SDK, one feedback loop. -### Run an evaluation + + + + + + + + + + + +
+ +### Model evaluation + +Run any of 213 public models across 59 benchmarks: AIME, GPQA, ARC-AGI-2, HumanEval, Terminal-Bench, MMLU Pro, BIRD-CRITIC, and more. Reasoning, coding, math, agentic, multilingual. + +[Docs →](https://layerlens.gitbook.io/stratix-python-sdk) + + + +### Agent trace evaluation + +Upload OpenAI-format trace files and score multi-step agent behavior. Tool use, planning quality, recovery from failures. Not just the final token. + +[Docs →](https://layerlens.gitbook.io/stratix-python-sdk) + + + +### Judge calibration + +Define a quality goal in plain English. Stratix calibrates an LLM-as-judge to that goal, validates against your gold examples, and reuses the judge across runs. + +[Docs →](https://layerlens.gitbook.io/stratix-python-sdk) + +
+ +### Custom benchmarks + +Bring your own dataset. Smart benchmark generation for adversarial cases, edge inputs, and domain-specific evals. Reuses public scoring infrastructure. + +[Docs →](https://layerlens.gitbook.io/stratix-python-sdk) + + + +### CI integration + +Fail the build on quality regressions, not just on red unit tests. Use `stratix ci report` in GitHub Actions, GitLab CI, CircleCI, or any Python-capable runner. + +[Sample →](./samples/cicd) + + + +### Reproducible runs + +Every evaluation persists model version, prompt template, judge config, and full traces. Re-run any evaluation by ID. Cite the result with confidence. + +[Docs →](https://layerlens.gitbook.io/stratix-python-sdk) + +
+ +--- + +## Hand-rolled vs. Stratix + +The same task: score GPT-5.5 against AIME 2026 and store the results. + + + + + + + + + + +
Hand-rolled (typical)Stratix
```python -import os -from layerlens import Stratix +import openai, json, asyncio +from datasets import load_dataset -client = Stratix(api_key=os.environ.get("LAYERLENS_STRATIX_API_KEY")) +ds = load_dataset("aime-2026")["test"] +client = openai.AsyncOpenAI() # async client, needed for the awaits below -# Get a model and benchmark by key -model = client.models.get_by_key("openai/gpt-4o") -benchmark = client.benchmarks.get_by_key("arc-agi-2") +results = [] +async def score_one(item): + resp = await client.chat.completions.create( + model="gpt-5.5-20260423", + messages=[{"role":"user","content":item["q"]}], + ) + answer = parse_answer(resp.choices[0].message.content) # your own extraction logic + return {"q": item["q"], "ans": answer, "expected": item["a"], + "correct": answer == item["a"]} + +# Implement: rate limiting, retries, cost tracking, +# trace storage, judge logic, schema versioning, +# benchmark drift detection, regression alerting. +# Repeat per benchmark. Per model. Per release. +``` + + + +```python +from layerlens import Stratix + +client = Stratix() # reads LAYERLENS_STRATIX_API_KEY -# Create an evaluation (pass the full model and benchmark objects) evaluation = client.evaluations.create( - model=model, - benchmark=benchmark, + model=client.models.get_by_key("openai/gpt-5.5-20260423"), + benchmark=client.benchmarks.get_by_key("aime2026"), ) - -# Wait for results (pass the evaluation object, not just the ID) result = client.evaluations.wait_for_completion(evaluation) -print(f"Accuracy: {result.accuracy}") + +print(result.accuracy) +print(f"https://stratix.layerlens.ai/evaluations/{result.id}") ``` 
-```python -import os -import asyncio -from layerlens import AsyncStratix +--- -async def main(): - client = AsyncStratix(api_key=os.environ.get("LAYERLENS_STRATIX_API_KEY")) +## How Stratix compares - model = await client.models.get_by_key("openai/gpt-4o") - benchmark = await client.benchmarks.get_by_key("arc-agi-2") + + + + + + + + + + + + + + + + + + + + + +
StratixBraintrustLangSmithPhoenixOpenAI Evals
Public-model leaderboard213nonenonenonelimited
Independent grading⚠️ vendor
Reproducible scores
traces persisted
Agent trace evaluation⚠️
Judge calibration in SDK⚠️⚠️⚠️
Custom benchmarks
Smart benchmark generationvia templatesvia templatesmanualmanual
59 prebuilt benchmarks out of the boxvia templatesvia templatesvia Arizesmall core set
- evaluation = await client.evaluations.create( - model=model, - benchmark=benchmark, - ) +Comparison based on publicly documented features as of April 2026. Corrections are welcome via issue or PR. - result = await client.evaluations.wait_for_completion(evaluation) - print(f"Accuracy: {result.accuracy}") +--- -asyncio.run(main()) -``` +## Built for every kind of evaluation -### Public endpoints +Teams use Stratix to: -Public models, benchmarks, and evaluations are accessible through `client.public`. Note: the public client still requires an API key. +- **Pick the right model.** Compare 213 candidate models against your benchmark of choice before committing to a vendor. +- **Lock in CI.** Wire the SDK into your test suite. Fail builds on quality drops, not just code regressions. +- **Audit production agents.** Score full agent traces against custom judges that match your quality bar. +- **Generate adversarial datasets.** Use smart benchmark generation to surface edge cases your manual tests missed. +- **Prove model claims.** Cite a reproducible Stratix score in security reviews, customer pitches, and compliance audits. +- **Replace hand-rolled eval pipelines.** Stop maintaining bespoke scripts that drift with every release. -```python -import os -from layerlens import Stratix +--- + +## Cite, share, embed -client = Stratix(api_key=os.environ.get("LAYERLENS_STRATIX_API_KEY")) +Every evaluation has a stable URL. Paste it in a paper, a blog post, a security review, or a tweet. Anyone with the link can inspect the prompts, the judge, the traces, and the score.
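Because these links are plain URLs, a script can build them from IDs it already holds, for example to post a link from CI. This is a minimal sketch. It assumes the `/evaluations/{id}` path used in the Quick Start and the `benchmark`, `referenceModel`, and `comparisonModel` query parameters of the comparison link in this section. `evaluation_url` and `comparison_url` are hypothetical helpers, not part of the SDK.

```python
from urllib.parse import quote, urlencode

# Assumption: the link formats shown in this README. These helpers are
# hypothetical and are not exported by the layerlens package.
BASE_URL = "https://stratix.layerlens.ai"


def evaluation_url(evaluation_id: str) -> str:
    """Shareable link to a single evaluation."""
    return f"{BASE_URL}/evaluations/{quote(evaluation_id, safe='')}"


def comparison_url(benchmark_id: str, reference_model_id: str, comparison_model_id: str) -> str:
    """Link comparing two models on the same benchmark."""
    query = urlencode(
        {
            "benchmark": benchmark_id,
            "referenceModel": reference_model_id,
            "comparisonModel": comparison_model_id,
        }
    )
    return f"{BASE_URL}/comparison?{query}"
```

For example, after `wait_for_completion`, `evaluation_url(result.id)` returns the same link that the Quick Start prints.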
-# Browse public models -models = client.public.models.get() -for model in models.models: - print(f"{model.key}: {model.name}") +``` +https://stratix.layerlens.ai/evaluations/ ``` -Or instantiate the public client directly: +Compare two models on the same benchmark and share the link: -```python -import os -from layerlens import PublicClient +``` +https://stratix.layerlens.ai/comparison?benchmark=682bddc1e014f9fa440f8a91&referenceModel=6994bcd3e014f9f182758de1&comparisonModel=69ab1647e014f9a88f33907a +``` + +Tweet template after a run: + +> Just ran `` on ``. Score: ``. Reproducible trace: ``. Built on @LayerLens_AI Stratix. + +--- -public = PublicClient(api_key=os.environ.get("LAYERLENS_STRATIX_API_KEY")) -models = public.models.get() +## CI in 30 seconds + +Use the SDK in any GitHub Actions workflow. Fail the build on quality drops, not just on failing unit tests. + +```yaml +- name: Run Stratix evals + run: | + pip install layerlens + stratix evaluate run --model openai/gpt-5.5-20260423 --benchmark aime2026 --wait + stratix ci report >> $GITHUB_STEP_SUMMARY + env: + LAYERLENS_STRATIX_API_KEY: ${{ secrets.LAYERLENS_STRATIX_API_KEY }} ``` -## Resources +The CI report renders directly in the GitHub Actions job summary. No custom action required.
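You can turn a score into a pass or fail signal under your own rules with a short script. The script compares the score with a threshold and returns a process exit code, and a non-zero exit fails the workflow step. This is a minimal sketch. `quality_gate` and its thresholds are hypothetical, and the accuracy value is assumed to come from `result.accuracy` after `client.evaluations.wait_for_completion(...)`. It is not a feature of the `stratix` CLI.

```python
from typing import Optional

# Hypothetical thresholds; tune them for your benchmark.
MIN_ACCURACY = 0.80  # absolute floor
MAX_REGRESSION = 0.02  # largest tolerated drop versus a stored baseline


def quality_gate(
    accuracy: float,
    baseline: Optional[float] = None,
    min_accuracy: float = MIN_ACCURACY,
    max_regression: float = MAX_REGRESSION,
) -> int:
    """Return 0 if the score passes, 1 if the build should fail."""
    if accuracy < min_accuracy:
        print(f"FAIL: accuracy {accuracy:.3f} is below the floor of {min_accuracy:.3f}")
        return 1
    if baseline is not None and baseline - accuracy > max_regression:
        print(f"FAIL: accuracy fell from {baseline:.3f} to {accuracy:.3f}")
        return 1
    print(f"PASS: accuracy {accuracy:.3f}")
    return 0
```

In a CI job, end the evaluation script with `sys.exit(quality_gate(result.accuracy, baseline=previous_accuracy))`. Here `previous_accuracy` is a hypothetical baseline that you store between runs.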
-The SDK provides access to these resource types: +--- -| Resource | Description | -| ---------------------------- | ----------------------------------------------------------------------------- | -| `client.models` | Manage models (get, get_by_key, add, remove, create_custom) | -| `client.benchmarks` | Manage benchmarks (get, get_by_key, add, remove, create_custom, create_smart) | -| `client.evaluations` | Create evaluations and wait for results | -| `client.judges` | CRUD operations for evaluation judges | -| `client.traces` | Upload trace files and manage traces | -| `client.trace_evaluations` | Run trace-level evaluations with judges | -| `client.judge_optimizations` | Optimize judge configurations | -| `client.results` | Retrieve evaluation results | -| `client.public` | Public models, benchmarks, evaluations, and comparisons | +## CLI -Every resource is available in both sync (`Stratix`) and async (`AsyncStratix`) clients. +The `layerlens` package ships with a `stratix` (and `layerlens`) CLI for one-line evaluations from your terminal. 
-## Examples +```bash +# Set API key once +export LAYERLENS_STRATIX_API_KEY=your-api-key -### Working with judges +# Run an evaluation and wait for results +stratix evaluate run --model openai/gpt-5.5-20260423 --benchmark aime2026 --wait -```python -# Create a judge (name and evaluation_goal are required) -judge = client.judges.create( - name="Response Quality Judge", - evaluation_goal="Rate whether the response is accurate, complete, and well-structured", -) +# List evaluations, filter and sort +stratix evaluate list --status success --sort-by accuracy --order desc +stratix evaluate get -# List judges (returns a JudgesResponse with .judges list) -response = client.judges.get_many() -for j in response.judges: - print(f"{j.name} (id: {j.id})") +# Generate a CI summary report +stratix ci report --output summary.md -# Update a judge -client.judges.update(judge.id, name="Updated Judge Name") +# Manage traces, judges, scorers, integrations +stratix trace --help +stratix judge --help +stratix scorer --help +stratix integration --help -# Delete a judge -client.judges.delete(judge.id) +# Shell completion (bash/zsh/fish) +stratix completion bash ``` -### Uploading and evaluating traces +[Full CLI reference →](https://layerlens.gitbook.io/stratix-python-sdk/cli) -Trace upload works with JSON or JSONL files (up to 50 MB). The SDK handles presigned S3 uploads automatically. +--- -```python -# Upload a trace file (pass a file path, not raw data) -result = client.traces.upload("./my_traces.json") -print(f"Uploaded trace IDs: {result.trace_ids}") - -# List traces -traces = client.traces.get_many() -for t in traces.traces: - print(f"Trace {t.id}") - -# Create a trace evaluation -trace_eval = client.trace_evaluations.create( - trace_id=t.id, - judge_id=judge.id, -) +## Architecture -# Get results -results = client.trace_evaluations.get_results(trace_eval.id) +Stratix sits between your code and any model provider. Every score is backed by a stored trace. 
+ +``` + your code / agent / CI pipeline + │ + ▼ + ┌──────────────┐ + │ layerlens │ ◄── Python SDK + CLI + │ SDK │ + └──────┬───────┘ + │ HTTPS + ▼ + ┌────────────────────────┐ + │ Stratix platform │ + │ ┌──────────────────┐ │ + │ │ model gateway │ │ ─► OpenAI · Anthropic · Google · xAI · Moonshot · 22 more + │ ├──────────────────┤ │ + │ │ benchmark engine │ │ ─► 59 benchmarks · 180k+ prompts + │ ├──────────────────┤ │ + │ │ judge calibrator │ │ ─► LLM-as-judge + heuristic + ML + │ ├──────────────────┤ │ + │ │ trace store │ │ ─► reproducible per-run artifacts + │ └──────────────────┘ │ + └────────────────────────┘ ``` -### Custom models +--- + +## Examples + +| File | What it shows | +| -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | +| [`samples/core/quickstart.py`](./samples/core/quickstart.py) | First evaluation in 10 lines | +| [`samples/core/trace_evaluation.py`](./samples/core/trace_evaluation.py) | Score a multi-step agent trace | +| [`samples/core/judge_optimization.py`](./samples/core/judge_optimization.py) | Calibrate an LLM-as-judge to a quality goal | +| [`samples/core/custom_benchmark.py`](./samples/core/custom_benchmark.py) | Bring your own dataset | +| [`samples/cicd/github_actions_gate.yml`](./samples/cicd/github_actions_gate.yml) | Fail CI on quality regressions | +| [`samples/`](./samples) | Full samples tree: cicd, claude-code, cli, copilotkit, integrations, mcp, modalities, more | + +**Build something with Stratix in 30 minutes.** Pick a target model, run it against a benchmark you care about, and post the URL in [Discord](https://discord.gg/layerlens) or tag [@LayerLens_AI](https://x.com/LayerLens_AI). + +--- -Custom models require an OpenAI-compatible API endpoint. +## Handling errors + +Connection failures (network, timeout) raise a subclass of `APIConnectionError`. 
API errors (4xx/5xx) raise a subclass of `APIStatusError` with `.status_code` and `.response`. Everything inherits from `StratixError`. ```python -response = client.models.create_custom( - name="My Fine-tuned Model", - key="my-org/custom-model-v1", - description="Fine-tuned GPT for medical Q&A", - api_url="https://my-api.example.com/v1", - max_tokens=4096, - api_key=os.environ.get("MY_PROVIDER_API_KEY"), # optional +from layerlens import ( + Stratix, + APIConnectionError, + APIStatusError, + RateLimitError, ) -print(f"Created model: {response.model_id}") + +client = Stratix() + +try: + client.evaluations.create(model=..., benchmark=...) +except APIConnectionError as e: + print(f"could not reach Stratix: {e.__cause__}") +except RateLimitError: + print("429: back off and retry") +except APIStatusError as e: + print(f"{e.status_code}: {e.response}") ``` -## Client aliases +| Status | Error | +| ------ | --------------------------------------- | +| 400 | `BadRequestError` | +| 401 | `AuthenticationError` | +| 403 | `PermissionDeniedError` | +| 404 | `NotFoundError` | +| 409 | `ConflictError` | +| 422 | `UnprocessableEntityError` | +| 429 | `RateLimitError` | +| 5xx | `InternalServerError` | +| n/a | `APIConnectionError`, `APITimeoutError` | + +--- -For backward compatibility, multiple import names are available: +## Configuration + + + + + + + + + + +
Context manager (sync)Context manager (async)
```python -from layerlens import Stratix # Primary -from layerlens import AsyncStratix # Async primary -from layerlens import Client # Alias for Stratix -from layerlens import AsyncClient # Alias for AsyncStratix -from layerlens import Atlas # Legacy alias -from layerlens import AsyncAtlas # Legacy alias -from layerlens import PublicClient # Public endpoints -from layerlens import AsyncPublicClient +from layerlens import Stratix + +with Stratix() as client: + evaluation = client.evaluations.create(...) +# HTTP connection released ``` -## Configuration + -| Environment Variable | Description | Default | -| ---------------------------- | ------------------------- | --------------------------------- | -| `LAYERLENS_STRATIX_API_KEY` | Your API key | (required) | -| `LAYERLENS_STRATIX_BASE_URL` | Override the API base URL | `https://api.layerlens.ai/api/v1` | +```python +import asyncio +from layerlens import AsyncStratix -Legacy env vars (`LAYERLENS_ATLAS_API_KEY`, `LAYERLENS_ATLAS_BASE_URL`) are also supported. +async def main(): + async with AsyncStratix() as client: + evaluation = await client.evaluations.create(...) -## Error handling +asyncio.run(main()) +``` -The SDK raises typed exceptions for API errors: +
```python -import os -from layerlens import Stratix, StratixError, APIError, BadRequestError, NotFoundError +import httpx +from layerlens import Stratix -client = Stratix(api_key=os.environ.get("LAYERLENS_STRATIX_API_KEY")) +# Configure the default for all requests +client = Stratix( + api_key="...", + base_url="https://stratix.layerlens.ai", + timeout=httpx.Timeout(60.0, read=30.0, connect=5.0), # default: 600s read +) -try: - result = client.models.get_by_id("nonexistent-id") -except NotFoundError as e: - print(f"Not found (HTTP {e.status_code}): {e.message}") -except BadRequestError as e: - print(f"Bad request: {e.message}") -except APIError as e: - print(f"API error: {e.message}") -except StratixError as e: - print(f"Client error: {e}") +# Override per-request +client.with_options(timeout=5.0).evaluations.create(...) ``` -Catch the most specific exception first. The hierarchy: - -- `StratixError` (base for all SDK errors) - - `APIError` (base for all API-related errors) - - `APIConnectionError` (network issues) - - `APITimeoutError` (request timed out) - - `APIResponseValidationError` (response didn't match expected schema) - - `APIStatusError` (HTTP 4xx/5xx) - - `BadRequestError` (400) - - `AuthenticationError` (401) - - `PermissionDeniedError` (403) - - `NotFoundError` (404) - - `ConflictError` (409) - - `UnprocessableEntityError` (422) - - `RateLimitError` (429) - - `InternalServerError` (500+) - -Note: Only `StratixError`, `APIError`, `BadRequestError`, `AuthenticationError`, and `NotFoundError` are exported from the top-level package. For other exception types, import from `layerlens._exceptions`. +The `LAYERLENS_STRATIX_API_KEY` and `LAYERLENS_STRATIX_BASE_URL` environment variables are read automatically when no kwarg is passed. -## CLI +--- -The LayerLens CLI lets you manage traces, judges, evaluations, integrations, and more from the terminal. +## Reference -### Install +
Client classes and aliases -```bash -pip install layerlens[cli] --extra-index-url https://sdk.layerlens.ai/package +`Stratix` is the canonical synchronous client. `AsyncStratix` is the async counterpart. The legacy `Client` and `AsyncClient` aliases are kept for backward compatibility. + +```python +from layerlens import Stratix, AsyncStratix +from layerlens import Client, AsyncClient # aliases (deprecated, kept for compat) +from layerlens import PublicClient # read-only, unauthenticated public API +from layerlens import Atlas, AsyncAtlas # Atlas product client (separate platform) ``` -### Configure +
-```bash -export LAYERLENS_STRATIX_API_KEY="your-api-key" -``` +
Async client -### Usage +Every method on `Stratix` has an `AsyncStratix` counterpart with the same signature and `await`-able returns. -```bash -stratix --help # Show all commands -stratix trace list # List traces -stratix evaluate run \ - --model openai/gpt-4o \ - --benchmark arc-agi-2 --wait # Run an evaluation and wait for results -stratix judge create \ - --name "Quality" \ - --goal "Rate response quality" \ - --model-id # Create a judge -stratix ci report -o summary.md # Generate CI report +```python +import asyncio +from layerlens import AsyncStratix + +async def main(): + async with AsyncStratix() as client: + evaluation = await client.evaluations.create( + model=await client.models.get_by_key("openai/gpt-5.5-20260423"), + benchmark=await client.benchmarks.get_by_key("aime2026"), + ) + result = await client.evaluations.wait_for_completion(evaluation) + print(result.accuracy) + +asyncio.run(main()) ``` -Shell completions are available for bash, zsh, fish, and powershell: +
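Every call on `AsyncStratix` is awaitable, so you can run a sweep over several models concurrently. This minimal sketch caps the number of in-flight evaluations with `asyncio.Semaphore`. `run_all` is a hypothetical helper. The `evaluate` coroutine you pass in is assumed to be your own code, for example one that fetches a model by key and then awaits `client.evaluations.create(...)` and `client.evaluations.wait_for_completion(...)` as shown above.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


async def run_all(
    keys: Iterable[str],
    evaluate: Callable[[str], Awaitable[T]],
    max_concurrency: int = 4,
) -> Dict[str, T]:
    """Await evaluate(key) for every key, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(key: str) -> T:
        async with semaphore:
            return await evaluate(key)

    key_list = list(keys)
    # gather() returns results in input order, so they line up with key_list.
    results = await asyncio.gather(*(bounded(key) for key in key_list))
    return dict(zip(key_list, results))
```

A small `max_concurrency` is a simple guard against `RateLimitError` during large sweeps. The right value depends on the rate limits that apply to your key.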
+ +
Error hierarchy -```bash -stratix completion bash # Print setup instructions ``` +StratixError +├── AtlasError +└── APIError + ├── APIConnectionError + │ └── APITimeoutError + ├── APIResponseValidationError + └── APIStatusError + ├── BadRequestError (400) + ├── AuthenticationError (401) + ├── PermissionDeniedError (403) + ├── NotFoundError (404) + ├── ConflictError (409) + ├── UnprocessableEntityError (422) + ├── RateLimitError (429) + └── InternalServerError (5xx) +``` + +```python +from layerlens import ( + StratixError, APIError, + APIConnectionError, APITimeoutError, + APIStatusError, + BadRequestError, AuthenticationError, PermissionDeniedError, + NotFoundError, ConflictError, UnprocessableEntityError, + RateLimitError, InternalServerError, +) +``` + +
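`RateLimitError` (429) and `InternalServerError` (5xx) often signal transient conditions, so they are the usual candidates for a retry. This is a minimal sketch of exponential backoff with full jitter. `call_with_backoff` is a hypothetical helper, not an SDK function. Pass the exception classes to retry in `retry_on`, for example `(RateLimitError, InternalServerError)` imported from `layerlens`. If the client already retries some requests on its own (an assumption worth checking), keep `max_attempts` low so the attempts do not multiply.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff.

    Before retry n (n = 1, 2, ...) the wait is drawn uniformly from
    [0, min(max_delay, base_delay * 2 ** (n - 1))], i.e. "full jitter".
    Once max_attempts calls have failed, the last exception is re-raised;
    exceptions not listed in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")  # the loop always returns or re-raises
```

Usage: `call_with_backoff(lambda: client.evaluations.create(model=model, benchmark=benchmark), retry_on=(RateLimitError, InternalServerError))`.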
+ +
Environment variables + +| Variable | Purpose | +| ---------------------------- | ----------------------------------------------------------- | +| `LAYERLENS_STRATIX_API_KEY` | API key (required if not passed to client) | +| `LAYERLENS_STRATIX_BASE_URL` | Override base URL (default: `https://stratix.layerlens.ai`) | + +
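The table implies a simple precedence. An explicit argument to the client wins; otherwise the environment variable is used; and only the base URL falls back to a default. Tooling around the SDK can mirror that rule, for example a wrapper script that logs which endpoint it targets. This is a minimal sketch. `resolve_setting` is a hypothetical helper and is not part of the SDK.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://stratix.layerlens.ai"  # default listed in the table above


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit value, else the environment variable, else the default.

    Raises ValueError when none of the three is available (e.g. a missing API key).
    """
    if explicit:
        return explicit
    value = environ.get(env_var)
    if value:
        return value
    if default is not None:
        return default
    raise ValueError(f"{env_var} is not set and no value was passed explicitly")
```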
+ +
Resources on the Stratix client + +| Resource | What it does | +| ---------------------------- | ---------------------------------------------------------------- | +| `client.models` | Add, remove, list, fetch models in your project | +| `client.benchmarks` | Add, remove, list, fetch benchmarks (including custom and smart) | +| `client.evaluations` | Run model-against-benchmark evaluations | +| `client.trace_evaluations` | Score uploaded agent traces against judges | +| `client.judges` | Create, update, delete custom LLM-as-judge configs | +| `client.judge_optimizations` | Calibrate a judge to a quality goal, then apply | +| `client.scorers` | Heuristic and ML scorer registry | +| `client.traces` | Upload, list, fetch agent trace artifacts | +| `client.evaluation_spaces` | Group related evaluations into a project space | +| `client.integrations` | Manage CI / webhook / SSO integrations | +| `client.results` | Fetch raw evaluation results (for ETL) | +| `client.public` | Public read-only access (no auth required) | + +
+ +--- + +## Get help + +| | | +| -------------------------------------------------------------------------- | ------------------------------------------------------- | +| 💬 [**Discord**](https://discord.gg/layerlens) | Real-time help from the team and community | +| 🐛 [**GitHub Issues**](https://github.com/LayerLens/stratix-python/issues) | Bug reports, feature requests, design questions | +| 📖 [**Docs**](https://layerlens.gitbook.io/stratix-python-sdk) | Full SDK reference + cookbooks | +| 🌐 [**Web app**](https://stratix.layerlens.ai) | Browse 213 models, 59 benchmarks, run evals from the UI | +| 📺 [**YouTube**](https://www.youtube.com/@LayerLens-Official) | Walkthroughs and demos | +| 𝕏 [**@LayerLens_AI**](https://x.com/LayerLens_AI) | Release announcements, model launches, Stratix scores | +| 🔐 **security@layerlens.ai** | Private vulnerability disclosure | + +--- + +## Roadmap + +[**Releases**](https://github.com/LayerLens/stratix-python/releases) · [**Changelog**](https://layerlens.gitbook.io/stratix-python-sdk) · [**Open issues**](https://github.com/LayerLens/stratix-python/issues) + + + + + + + + + + + + + + +
Recently shippedIn progressComing upExploring
+ +- [x] 213 public models +- [x] Agent trace evaluation +- [x] Judge calibration +- [x] Smart benchmark generation +- [x] Async client +- [x] Reproducible runs + + + +- [ ] Deliberation panels +- [ ] Custom-model adapters (open weights) +- [ ] Cost-aware eval routing -Full CLI docs: [docs/cli/](docs/cli/) + -| Guide | Description | -| --- | --- | -| [Getting Started](docs/cli/getting-started.md) | Installation, configuration, first commands | -| [Command Reference](docs/cli/commands.md) | All commands and options | -| [Examples](docs/cli/examples.md) | 15 common workflows as copy-paste shell sessions | +- [ ] Per-domain leaderboards +- [ ] Streaming eval results +- [ ] TypeScript SDK -## Requirements + -- Python 3.8+ -- Dependencies: `httpx`, `pydantic`, `requests` -- CLI extra: `click>=8.0.0` +- [ ] Cross-model A/B harness +- [ ] Latency-quality Pareto plots +- [ ] OpenTelemetry trace ingest -## Documentation +
-Full API reference and examples are available in the [docs/](docs/) directory: +--- -- [CLI Guide](docs/cli/) (getting started, command reference, workflow examples) -- [API Reference](docs/api-reference/) (client config, all resource methods, error handling) -- [Code Examples](docs/examples/) (evaluations, judges, traces) -- [Troubleshooting](docs/troubleshooting/) (auth issues, error codes) +## Contributing + +Bug fixes, new examples, framework integrations, and doc improvements are all welcome. + +1. Browse [`good first issue`](https://github.com/LayerLens/stratix-python/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). +2. Open a [GitHub Issue](https://github.com/LayerLens/stratix-python/issues) before large changes so we can align on direction. +3. Say hi in [Discord](https://discord.gg/layerlens) or open a [GitHub Issue](https://github.com/LayerLens/stratix-python/issues). + + + Contributors + + +--- + +## Security and privacy + +Report vulnerabilities privately via security@layerlens.ai or the [Security Advisory](https://github.com/LayerLens/stratix-python/security/advisories) flow. Coordinated disclosure is preferred. + +The SDK does not collect telemetry. Network requests originate from your environment and target `https://stratix.layerlens.ai` only. API keys are sent via HTTPS in the `Authorization` header and are never logged client-side. + +--- + +## Star history + + + + + Star history of LayerLens/stratix-python + + + +--- + +## Versioning + +This package follows [SemVer](https://semver.org/spec/v2.0.0.html). Public APIs (everything in `from layerlens import ...`) are stable across minor versions. Internal modules (anything starting with `_`) may change without notice.
+ +Determine the installed version: + +```python +from importlib.metadata import version +print(version("layerlens")) +``` + +Breaking changes, deprecations, and migration notes ship in [Releases](https://github.com/LayerLens/stratix-python/releases) and the [Changelog](https://layerlens.gitbook.io/stratix-python-sdk). + +--- ## License -Apache 2.0. See [LICENSE](LICENSE) for details. +Apache 2.0. See [LICENSE](./LICENSE). + +--- + +
+ +**Built by the LayerLens team and [contributors worldwide](https://github.com/LayerLens/stratix-python/graphs/contributors).** + +If Stratix helps a team ship more reliable AI, a star helps more teams find it. + +[🌐 layerlens.ai](https://layerlens.ai) · [📖 Docs](https://layerlens.gitbook.io/stratix-python-sdk) · [☁️ Web app](https://stratix.layerlens.ai) · [💬 Discord](https://discord.gg/layerlens) + +
diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 0000000..b80283f --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,46 @@ +# Security Policy + +We take the security of the Stratix Python SDK seriously. Thanks for helping us keep it safe. + +## Reporting a vulnerability + +**Do not file a public GitHub issue for security vulnerabilities.** + +Email **support@layerlens.ai** with the subject line "Security report: stratix-python" and include: + +- A description of the vulnerability and where it lives in the codebase. +- Steps to reproduce, including any proof-of-concept code if you have it. +- The version of `layerlens` you tested against (`pip show layerlens`). +- Your assessment of the impact (data exposure, RCE, auth bypass, denial of service, etc.). +- Whether you would like credit in the disclosure, and if so, how you would like to be credited. + +We will acknowledge receipt within 3 business days, give you an initial assessment within 7 business days, and keep you updated as we work on a fix. + +## Scope + +In scope: + +- The `layerlens` Python package published to PyPI. +- Source code in this repository (`src/`, `tests/`, `samples/`, `scripts/`). +- The `stratix` CLI binary distributed with the SDK. + +Out of scope (please report to the relevant team instead): + +- Vulnerabilities in the hosted Stratix platform itself ([stratix.layerlens.ai](https://stratix.layerlens.ai)). Email **support@layerlens.ai** with subject "Security report: Stratix platform." +- Third-party dependencies. Please file with the upstream project. +- Issues that require physical access to a user's machine. + +## Supported versions + +We provide security fixes for the latest minor release of `layerlens`. Older versions may receive fixes at our discretion. + +| Version | Supported | +| ------- | ------------------ | +| 1.6.x | Yes | +| < 1.6 | No, please upgrade | + +## Disclosure + +We follow coordinated disclosure. 
Once a fix is released, we will publish an advisory on the [GitHub Security Advisories](https://github.com/LayerLens/stratix-python/security/advisories) page and credit the reporter unless they prefer to remain anonymous. + +Thanks for keeping the community safe. diff --git a/assets/before_after_hero.png b/assets/before_after_hero.png new file mode 100644 index 0000000..de7ee93 Binary files /dev/null and b/assets/before_after_hero.png differ diff --git a/assets/hero-demo.gif b/assets/hero-demo.gif new file mode 100644 index 0000000..1157c97 Binary files /dev/null and b/assets/hero-demo.gif differ