The judges resource on the Stratix client allows you to create and manage LLM-based judges for evaluating traces. Judges define evaluation criteria that can be run against traces to assess quality, correctness, and other properties.
A judge encapsulates an evaluation goal and is backed by a specific LLM model. Once created, judges can be run against traces via the trace evaluations resource to produce scored results.
from layerlens import Stratix
client = Stratix()
# Fetch a model to use for the judge
models = client.models.get(type="public", name="gpt-4o")
model = models[0]
# Create a judge
judge = client.judges.create(
name="Code Quality Judge",
evaluation_goal="Evaluate the quality of code output including correctness and style",
model_id=model.id,
)
print(f"Created judge: {judge.name} (v{judge.version})")
# List all judges
response = client.judges.get_many()
for j in response.judges:
print(f" {j.name}: {j.run_count} runs")import asyncio
from layerlens import AsyncStratix
async def main():
client = AsyncStratix()
judge = await client.judges.create(
name="Code Quality Judge",
evaluation_goal="Evaluate the quality of code output including correctness and style",
model_id="model-abc123",
)
print(f"Created judge: {judge.name} (v{judge.version})")
if __name__ == "__main__":
asyncio.run(main())Both the Stratix (synchronous) and AsyncStratix (asynchronous) clients support the following methods.
Creates a new judge with the specified evaluation criteria.
| Parameter | Type | Required | Description |
|---|---|---|---|
name |
str |
Yes | Display name for the judge |
evaluation_goal |
str |
Yes | Description of what the judge should evaluate |
model_id |
str | None |
No | ID of the LLM model to use. If omitted, the server uses a default model |
timeout |
float | httpx.Timeout | None |
No | Override request timeout |
Returns a Judge object if successful, None if the judge could not be created.
judge = client.judges.create(
name="Accuracy Judge",
evaluation_goal="Evaluate whether the response is factually accurate",
model_id="model-abc123",
)Retrieves a judge by its unique identifier.
| Parameter | Type | Required | Description |
|---|---|---|---|
id |
str |
Yes | The unique judge ID |
timeout |
float | httpx.Timeout | None |
No | Override request timeout |
Returns a Judge object if found, None otherwise.
Retrieves multiple judges with pagination.
| Parameter | Type | Required | Description |
|---|---|---|---|
page |
int | None |
No | Page number (1-based, defaults to 1) |
page_size |
int | None |
No | Number of judges per page (default: 100, max: 500) |
timeout |
float | httpx.Timeout | None |
No | Override request timeout |
Returns a JudgesResponse object containing:
judges: List ofJudgeobjectscount: Number of judges in this pagetotal_count: Total number of judges
Returns None if the request fails.
# Get first page
response = client.judges.get_many()
print(f"Total judges: {response.total_count}")
# Get specific page
response = client.judges.get_many(page=2, page_size=50)Updates an existing judge. Only provided fields are modified; omitted fields remain unchanged. Updating a judge creates a new version.
| Parameter | Type | Required | Description |
|---|---|---|---|
id |
str |
Yes | The unique judge ID |
name |
str | None |
No | Updated display name |
evaluation_goal |
str | None |
No | Updated evaluation criteria |
model_id |
str | None |
No | Updated model ID |
timeout |
float | httpx.Timeout | None |
No | Override request timeout |
Returns an UpdateJudgeResponse if successful, None otherwise.
updated = client.judges.update(
"judge-123",
evaluation_goal="Evaluate code for correctness, readability, and security",
)Deletes a judge by its unique identifier.
| Parameter | Type | Required | Description |
|---|---|---|---|
id |
str |
Yes | The unique judge ID |
timeout |
float | httpx.Timeout | None |
No | Override request timeout |
Returns a DeleteJudgeResponse if successful, None otherwise.
| Property | Type | Description |
|---|---|---|
id |
str |
Unique judge identifier |
organization_id |
str |
Organization the judge belongs to |
project_id |
str |
Project the judge belongs to |
name |
str |
Display name |
evaluation_goal |
str |
Description of evaluation criteria |
model_id |
str | None |
ID of the backing LLM model |
model_name |
str | None |
Name of the backing LLM model |
model_company |
str | None |
Company that provides the model |
version |
int |
Current version number |
run_count |
int |
Number of times this judge has been run |
created_at |
str |
ISO 8601 creation timestamp |
updated_at |
str | None |
ISO 8601 last update timestamp |
versions |
List[JudgeVersion] |
Version history |
| Property | Type | Description |
|---|---|---|
version |
int |
Version number |
name |
str |
Name at this version |
evaluation_goal |
str |
Evaluation goal at this version |
model_id |
str | None |
Model ID at this version |
model_name |
str | None |
Model name at this version |
model_company |
str | None |
Model company at this version |
updated_at |
str | None |
When this version was created |
updated_by |
str | None |
Who created this version |
- Learn about Traces to upload data for evaluation
- Learn about Trace Evaluations to run judges against traces