Models & Benchmarks

This page is a reference for accessing the models and benchmarks available in your organization using the layerlens Python SDK.

Overview

Running an evaluation on the Stratix platform requires selecting a model and a benchmark. The availability of models and benchmarks depends on your organization's configuration.

Before running the examples below, ensure that the model and benchmark you plan to use are available in your organization.

Finding Available Models and Benchmarks

1. Using the Stratix Dashboard

The most reliable way to find available models and benchmarks:

Log in to your Stratix Dashboard.
Navigate to the evaluation creation page.
View dropdown lists of available models and benchmarks.

2. Using the Python SDK

from layerlens import Stratix

# Construct sync client (API key from env or inline)
client = Stratix()

# --- Models
models = client.models.get()

# --- Benchmarks
benchmarks = client.benchmarks.get()
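
The reference sections below note that both Stratix and AsyncStratix expose these methods, so the same listing can in principle run concurrently. The following is a minimal sketch, assuming AsyncStratix is importable from layerlens and that its models.get() and benchmarks.get() are awaitable under the same names as the sync client. fetch_catalog is a hypothetical helper, not part of the SDK.

```python
import asyncio


async def fetch_catalog(client):
    """Fetch models and benchmarks concurrently from an async client.

    `client` is assumed to behave like AsyncStratix: `client.models.get()`
    and `client.benchmarks.get()` are awaited and may return None on error.
    """
    models, benchmarks = await asyncio.gather(
        client.models.get(),
        client.benchmarks.get(),
    )
    # Normalize None (the documented error case) to an empty list.
    return models or [], benchmarks or []


# Usage (assumption: AsyncStratix is exported by the layerlens package):
#   from layerlens import AsyncStratix
#   models, benchmarks = asyncio.run(fetch_catalog(AsyncStratix()))
```

Collapsing None into an empty list keeps iteration simple, at the cost of hiding errors; check for None explicitly if you need to tell the two cases apart.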

Models

`get(type=None, name=None, key=None, categories=None, companies=None, regions=None, licenses=None, timeout=None)`

Retrieves a list of available models with optional filtering parameters. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`type`	`Literal["custom", "public"] \| None`	No	Filter by model type. If `None`, returns both custom and public models
`name`	`str \| None`	No	Filter models by name (partial match search)
`key`	`str \| None`	No	Filter models by key (partial match search)
`categories`	`List[str] \| None`	No	Filter by categories: `Transformer`, `MoE`, `Open-Source`, `Closed-Source`
`companies`	`List[str] \| None`	No	Filter by model companies/providers
`regions`	`List[str] \| None`	No	Filter by supported regions
`licenses`	`List[str] \| None`	No	Filter by license types
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Note: When filtering by categories, companies, regions, or licenses, only public models are returned since custom models do not have these fields.

Returns

Returns an Optional[List[Model]] - a list of Model objects that match the filter criteria. Returns an empty list [] if no models match the criteria, or None if there's an error.

Model Object Properties

Each Model object in the returned list contains:

Property	Type	Description
`id`	`str`	Unique model identifier for use in evaluations
`name`	`str`	Human-readable model name
`key`	`str`	Unique model key/identifier that is similar to the name
`description`	`str`	Text description of the model
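
Because get() returns an empty list when nothing matches but None when the request fails, callers may want to keep the two cases apart. The sketch below builds a key-to-id lookup from the documented Model properties. model_ids_by_key is a hypothetical helper, and the filter values in the usage comment are placeholders.

```python
def model_ids_by_key(client, **filters):
    """Return a {key: id} mapping for the models matching `filters`.

    Raises RuntimeError when models.get() returns None (documented as the
    error case), so a failure is not mistaken for "no models matched".
    """
    models = client.models.get(**filters)
    if models is None:
        raise RuntimeError("models.get() failed; check API key and connectivity")
    return {model.key: model.id for model in models}


# Usage with a real client (filter values are placeholders):
#   from layerlens import Stratix
#   ids = model_ids_by_key(Stratix(), type="public")
```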

`get_by_id(id, timeout=None)`

Retrieves a specific model by its unique identifier. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`id`	`str`	Yes	Unique model identifier
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns an Optional[Model] - a single Model object if found, or None if the model doesn't exist or there's an error.

`get_by_key(key, timeout=None)`

Retrieves a specific model by its unique key. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`key`	`str`	Yes	Unique model key identifier
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns an Optional[Model] - a single Model object if found, or None if the model doesn't exist or there's an error.

`add(*model_ids, timeout=None)`

Adds models (public or custom) to the project by their IDs.

Parameters

Parameter	Type	Required	Description
`*model_ids`	`str`	Yes	One or more model IDs to add
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool - True if the operation succeeded, False otherwise.

Example

from layerlens import Stratix

client = Stratix()
success = client.models.add("model-id-1", "model-id-2")

`remove(*model_ids, timeout=None)`

Removes models (public or custom) from the project's model list. The underlying records are not deleted — use delete_custom to fully tear down a custom model.

Parameters

Parameter	Type	Required	Description
`*model_ids`	`str`	Yes	One or more model IDs to remove
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool - True if the operation succeeded, False otherwise.

Example

from layerlens import Stratix

client = Stratix()
success = client.models.remove("model-id-1", "model-id-2")

`create_custom(name, key, description, api_url, max_tokens, api_key=None, extra_payload=None, timeout=None)`

Creates a custom model backed by an OpenAI-compatible API endpoint. This allows you to evaluate any model accessible via a chat completions endpoint.

Parameters

Parameter	Type	Required	Description
`name`	`str`	Yes	Model name (max 256 characters)
`key`	`str`	Yes	Unique model key, lowercase alphanumeric with dots/hyphens/slashes (max 256 chars)
`description`	`str`	Yes	Model description (max 500 characters)
`api_url`	`str`	Yes	Full URL of the OpenAI-compatible chat completions endpoint
`max_tokens`	`int`	Yes	Maximum number of tokens the model supports
`api_key`	`str \| None`	No	API key for the model provider
`extra_payload`	`Dict[str, Any] \| None`	No	JSON object merged into every outgoing chat-completions request body (see below)
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout
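
The limits in the table above can be checked locally before calling create_custom. This is a minimal sketch: check_custom_model_fields is a hypothetical helper, and the regular expression is only one reading of "lowercase alphanumeric with dots/hyphens/slashes", so the server-side rule may differ in detail.

```python
import re

# Assumption: one interpretation of the documented key format.
_KEY_PATTERN = re.compile(r"[a-z0-9./-]+")


def check_custom_model_fields(name, key, description):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if len(name) > 256:
        problems.append("name exceeds 256 characters")
    if len(key) > 256 or not _KEY_PATTERN.fullmatch(key):
        problems.append("key must be lowercase alphanumeric with . - / (max 256)")
    if len(description) > 500:
        problems.append("description exceeds 500 characters")
    return problems


print(check_custom_model_fields(
    "My Custom Model", "my-org/custom-model-v1", "Served via vLLM"
))  # prints []
```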

`extra_payload` semantics

When set, the keys and values in extra_payload are deep-merged into every outgoing request body. On conflict, your values override the platform's hardcoded defaults. Use it to override temperature (the platform sends 0 for reproducible evaluations) or to add provider-specific fields such as top_p, presence_penalty, or max_completion_tokens (required by some OpenAI reasoning models that reject max_tokens).

The keys messages, model, and stream are reserved and will be rejected.
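
To make the merge semantics concrete, here is a local illustration of a deep merge in which the override side wins, plus a check for the reserved keys. It is a sketch of the behavior described above, not the platform's implementation. Both function names are hypothetical, and checking reserved keys only at the top level is an assumption.

```python
RESERVED_KEYS = {"messages", "model", "stream"}


def check_extra_payload(extra_payload):
    """Raise ValueError if the payload uses a reserved top-level key."""
    reserved = RESERVED_KEYS.intersection(extra_payload)
    if reserved:
        raise ValueError(f"reserved keys not allowed: {sorted(reserved)}")


def deep_merge(defaults, overrides):
    """Return a new dict where `overrides` win; nested dicts merge recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# temperature=0 is the default stated on this page; the rest is illustrative.
extra = {"temperature": 1, "top_p": 0.9}
check_extra_payload(extra)
print(deep_merge({"temperature": 0}, extra))
# prints {'temperature': 1, 'top_p': 0.9}
```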

Returns

Returns an Optional[CreateModelResponse] containing:

organization_id: Organization identifier
project_id: Project identifier
model_id: The newly created model's identifier

Returns None if the request fails.

Example

from layerlens import Stratix

client = Stratix()

result = client.models.create_custom(
    name="My Custom Model",
    key="my-org/custom-model-v1",
    description="Custom fine-tuned model served via vLLM",
    api_url="https://my-model-endpoint.example.com/v1/chat/completions",
    api_key="my-provider-api-key",
    max_tokens=4096,
    # Optional — provider-specific overrides merged into every request body.
    extra_payload={"top_p": 0.9},
)

if result:
    print(f"Created model: {result.model_id}")

`update_custom(model_id, *, api_url=None, api_key=None, max_tokens=None, extra_payload=None, timeout=None)`

Updates a custom model's mutable fields. At least one of api_url, api_key, max_tokens, or extra_payload must be provided. Primary use case: repointing api_url for ephemeral vLLM endpoints behind cloudflared tunnels whose URL changes between sessions.

Parameters

Parameter	Type	Required	Description
`model_id`	`str`	Yes	ID of the custom model to update
`api_url`	`str \| None`	No	New full URL of the OpenAI-compatible chat completions endpoint
`api_key`	`str \| None`	No	New API key for the model provider
`max_tokens`	`int \| None`	No	New maximum tokens value
`extra_payload`	`Dict[str, Any] \| None`	No	New JSON object merged into every outgoing request. Pass `{}` to clear it.
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool — True on success, False otherwise.

Example

from layerlens import Stratix

client = Stratix()

# Repoint the api_url without re-creating the model
client.models.update_custom(
    "model-id-from-create-custom",
    api_url="https://my-new-endpoint.example.com/v1/chat/completions",
)

# Override request parameters for a model that doesn't accept temperature=0
client.models.update_custom(
    "model-id-from-create-custom",
    extra_payload={"temperature": 1},
)
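
Since update_custom returns False rather than raising on failure, a script that repoints an endpoint at the start of each session may want to stop early when the update does not go through. The wrapper below is a minimal sketch; repoint_model is a hypothetical name, and it relies only on the documented bool return value.

```python
def repoint_model(client, model_id, new_api_url):
    """Point an existing custom model at a new endpoint URL.

    Raises RuntimeError when update_custom reports failure (returns False).
    """
    ok = client.models.update_custom(model_id, api_url=new_api_url)
    if not ok:
        raise RuntimeError(f"could not update api_url for model {model_id}")


# Usage with a real client:
#   from layerlens import Stratix
#   repoint_model(Stratix(), "model-id-from-create-custom",
#                 "https://my-new-endpoint.example.com/v1/chat/completions")
```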

`delete_custom(model_id, *, timeout=None)`

Disables a custom model and removes it from Project.Models. The backend tears down the model's S3 yaml artifacts and AWS secret, and marks the record as disabled (preserving any evaluation references). Public models cannot be deleted via the SDK.

Parameters

Parameter	Type	Required	Description
`model_id`	`str`	Yes	ID of the custom model to delete
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool — True on success, False otherwise.

Example

from layerlens import Stratix

client = Stratix()
client.models.delete_custom("model-id-from-create-custom")

Benchmarks

`get(type=None, name=None, key=None, categories=None, languages=None, timeout=None)`

Retrieves a list of available benchmarks with optional filtering parameters. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`type`	`Literal["custom", "public"] \| None`	No	Filter by benchmark type. If `None`, returns both custom and public benchmarks
`name`	`str \| None`	No	Filter benchmarks by name (partial match search)
`key`	`str \| None`	No	Filter benchmarks by key (partial match search)
`categories`	`List[str] \| None`	No	Filter by categories (e.g., `reasoning`, `knowledge`, `coding`)
`languages`	`List[str] \| None`	No	Filter by language (e.g., `english`, `french`)
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Note: When filtering by categories or languages, only public benchmarks are returned since custom benchmarks do not have these fields.

Returns

Returns Optional[List[Benchmark]] - a list of Benchmark objects that match the filter criteria. Returns an empty list [] if no benchmarks match the criteria, or None if there's an error.

Benchmark Object Properties

Each Benchmark object in the returned list contains:

Property	Type	Description
`id`	`str`	Unique benchmark identifier for use in evaluations
`key`	`str`	Unique benchmark key/identifier that is similar to the name
`name`	`str`	Human-readable benchmark name
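
As an illustration of the filters above, the following sketch lists the names of public benchmarks for a given set of categories and languages. public_benchmark_names is a hypothetical helper, and the category value in the usage comment is taken from the examples in the parameter table.

```python
def public_benchmark_names(client, categories=None, languages=None):
    """Return sorted names of public benchmarks matching the given filters."""
    benchmarks = client.benchmarks.get(
        type="public",
        categories=categories,
        languages=languages,
    )
    # None signals an error per the Returns section; treated as "nothing found" here.
    return sorted(b.name for b in (benchmarks or []))


# Usage with a real client:
#   from layerlens import Stratix
#   names = public_benchmark_names(Stratix(), categories=["coding"])
```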

`get_by_id(id, timeout=None)`

Retrieves a specific benchmark by its unique identifier. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`id`	`str`	Yes	Unique benchmark identifier
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns an Optional[Benchmark] - a single Benchmark object if found, or None if the benchmark doesn't exist or there's an error.

`get_by_key(key, timeout=None)`

Retrieves a specific benchmark by its unique key. Both the Stratix and AsyncStratix clients have this method.

Parameters

Parameter	Type	Required	Description
`key`	`str`	Yes	Unique benchmark key identifier
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns an Optional[Benchmark] - a single Benchmark object if found, or None if the benchmark doesn't exist or there's an error.
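
The Overview asks you to confirm that both the model and the benchmark exist before running an evaluation. A pre-flight check along these lines can do that with the two get_by_key methods. resolve_pair is a hypothetical helper, and the keys in the usage comment are placeholders.

```python
def resolve_pair(client, model_key, benchmark_key):
    """Look up a model and a benchmark by key, failing fast if either is missing."""
    model = client.models.get_by_key(model_key)
    if model is None:
        raise LookupError(f"model {model_key!r} not found (or the request failed)")
    benchmark = client.benchmarks.get_by_key(benchmark_key)
    if benchmark is None:
        raise LookupError(f"benchmark {benchmark_key!r} not found (or the request failed)")
    return model, benchmark


# Usage with a real client (keys are placeholders):
#   from layerlens import Stratix
#   model, benchmark = resolve_pair(Stratix(), "my-org/custom-model-v1", "my-benchmark-key")
```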

`add(*benchmark_ids, timeout=None)`

Adds benchmarks to the project by their IDs.

Parameters

Parameter	Type	Required	Description
`*benchmark_ids`	`str`	Yes	One or more benchmark IDs to add
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool - True if the operation succeeded, False otherwise.

Example

from layerlens import Stratix

client = Stratix()
success = client.benchmarks.add("benchmark-id-1", "benchmark-id-2")

`remove(*benchmark_ids, timeout=None)`

Removes benchmarks from the project by their IDs.

Parameters

Parameter	Type	Required	Description
`*benchmark_ids`	`str`	Yes	One or more benchmark IDs to remove
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns bool - True if the operation succeeded, False otherwise.

Example

from layerlens import Stratix

client = Stratix()
success = client.benchmarks.remove("benchmark-id-1", "benchmark-id-2")

`create_custom(name, description, file_path, additional_metrics=None, custom_scorer_ids=None, input_type=None, timeout=None)`

Creates a custom benchmark by uploading a JSONL file. The file should contain one JSON object per line with input and truth fields.

Parameters

Parameter	Type	Required	Description
`name`	`str`	Yes	Benchmark name (max 64 characters)
`description`	`str`	Yes	Benchmark description (max 280 characters)
`file_path`	`str`	Yes	Path to a JSONL file with benchmark prompts
`additional_metrics`	`List[str] \| None`	No	Additional metrics: `readability`, `toxicity`, `hallucination`
`custom_scorer_ids`	`List[str] \| None`	No	List of custom scorer IDs to use
`input_type`	`str \| None`	No	Input type: `messages` or `json_payload`
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout
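
The documented limits and allowed values can be checked locally before uploading. This is a minimal sketch; check_custom_benchmark_args is a hypothetical helper, and it assumes the metric and input type values listed in the table are the complete sets.

```python
ALLOWED_METRICS = {"readability", "toxicity", "hallucination"}
ALLOWED_INPUT_TYPES = {"messages", "json_payload"}


def check_custom_benchmark_args(name, description, additional_metrics=None, input_type=None):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if len(description) > 280:
        problems.append("description exceeds 280 characters")
    unknown = sorted(set(additional_metrics or []) - ALLOWED_METRICS)
    if unknown:
        problems.append(f"unknown metrics: {unknown}")
    if input_type is not None and input_type not in ALLOWED_INPUT_TYPES:
        problems.append(f"unknown input_type: {input_type!r}")
    return problems


print(check_custom_benchmark_args(
    "QA Benchmark", "Tests model factual accuracy", ["hallucination"]
))  # prints []
```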

JSONL File Format

Each line should be a JSON object:

{"input": "What is 2+2?", "truth": "4"}
{"input": "Capital of France?", "truth": "Paris"}

Optional fields: subset (for grouping prompts into categories).
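
A quick local check of the file format can catch malformed rows before upload. The sketch below validates an iterable of lines against the two required fields. validate_jsonl_lines is a hypothetical helper, and skipping blank lines is a choice made for this local check, not documented platform behavior.

```python
import json


def validate_jsonl_lines(lines):
    """Return (valid_row_count, errors) for an iterable of JSONL lines."""
    errors = []
    rows = 0
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped by this local check
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {line_no}: expected a JSON object")
            continue
        missing = [field for field in ("input", "truth") if field not in record]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        rows += 1
    return rows, errors


# Usage with a file on disk:
#   with open("benchmark_data.jsonl", encoding="utf-8") as handle:
#       rows, errors = validate_jsonl_lines(handle)
```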

Returns

Returns an Optional[CreateBenchmarkResponse] containing:

organization_id: Organization identifier
project_id: Project identifier
benchmark_id: The newly created benchmark's identifier

Returns None if the request fails.

Example

from layerlens import Stratix

client = Stratix()

result = client.benchmarks.create_custom(
    name="QA Benchmark",
    description="Tests model factual accuracy",
    file_path="benchmark_data.jsonl",
    additional_metrics=["hallucination"],
)

if result:
    print(f"Created benchmark: {result.benchmark_id}")

`create_smart(name, description, system_prompt, file_paths, metrics=None, timeout=None)`

Creates a smart benchmark from uploaded files. The platform uses AI to automatically generate benchmark prompts from the provided documents. The benchmark is generated asynchronously.

Parameters

Parameter	Type	Required	Description
`name`	`str`	Yes	Benchmark name (max 256 characters)
`description`	`str`	Yes	Benchmark description (max 500 characters)
`system_prompt`	`str`	Yes	System prompt guiding benchmark generation (max 4000 characters)
`file_paths`	`List[str]`	Yes	List of file paths to upload (1-20 files, max 50 MB each)
`metrics`	`List[str] \| None`	No	Additional metrics: `readability`, `toxicity`, `hallucination`
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Supported File Types

.txt, .pdf, .html, .docx, .csv, .json, .jsonl, .parquet
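
The file count, size, and type limits above can be verified locally before calling create_smart. This is a minimal sketch; check_smart_benchmark_files is a hypothetical helper, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".txt", ".pdf", ".html", ".docx", ".csv", ".json", ".jsonl", ".parquet"}
MAX_FILES = 20
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB


def check_smart_benchmark_files(file_paths):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if not 1 <= len(file_paths) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} files, got {len(file_paths)}")
    for raw in file_paths:
        path = Path(raw)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            problems.append(f"{raw}: unsupported file type")
        elif not path.is_file():
            problems.append(f"{raw}: file not found")
        elif path.stat().st_size > MAX_BYTES:
            problems.append(f"{raw}: larger than 50 MB")
    return problems


# Usage:
#   problems = check_smart_benchmark_files(["product_docs.pdf", "faq.txt"])
```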

Returns

Returns an Optional[CreateBenchmarkResponse] containing:

organization_id: Organization identifier
project_id: Project identifier
benchmark_id: The newly created benchmark's identifier

Returns None if the request fails.

Example

from layerlens import Stratix

client = Stratix()

result = client.benchmarks.create_smart(
    name="Product Knowledge Benchmark",
    description="Evaluates model knowledge of product docs",
    system_prompt="Generate QA pairs testing understanding of product features.",
    file_paths=["product_docs.pdf", "faq.txt"],
    metrics=["hallucination"],
)

if result:
    print(f"Smart benchmark created: {result.benchmark_id}")
    print("Check the dashboard for generation progress.")
