The judge_optimizations resource on the Stratix client allows you to optimize a judge's evaluation criteria using automated prompt engineering. This can improve the accuracy and consistency of your judge's evaluations.
A judge optimization run takes an existing judge and refines its evaluation goal through automated testing and prompt engineering. You can estimate costs before running, monitor progress, and apply successful optimizations to update the judge.
```python
import time

from layerlens import Stratix

client = Stratix()

# Estimate cost before running
estimate = client.judge_optimizations.estimate(
    judge_id="judge-123",
    budget="medium",
)
print(f"Estimated cost: ${estimate.estimated_cost:.4f}")

# Start an optimization run
run = client.judge_optimizations.create(
    judge_id="judge-123",
    budget="medium",
)

# Poll for completion (up to 60 attempts, 5 seconds apart)
for _ in range(60):
    optimization = client.judge_optimizations.get(run.id)
    if optimization.status.value in ("success", "failure"):
        break
    time.sleep(5)

# Apply successful results
if optimization.status.value == "success":
    result = client.judge_optimizations.apply(run.id)
    print(f"New version: v{result.new_version}")
```

The same workflow with the asynchronous client:

```python
import asyncio

from layerlens import AsyncStratix


async def main():
    client = AsyncStratix()

    run = await client.judge_optimizations.create(
        judge_id="judge-123",
        budget="medium",
    )

    # Poll for completion without blocking the event loop
    for _ in range(60):
        optimization = await client.judge_optimizations.get(run.id)
        if optimization.status.value in ("success", "failure"):
            break
        await asyncio.sleep(5)

    if optimization.status.value == "success":
        result = await client.judge_optimizations.apply(run.id)
        print(f"New version: v{result.new_version}")


if __name__ == "__main__":
    asyncio.run(main())
```

Both the `Stratix` (synchronous) and `AsyncStratix` (asynchronous) clients support the following methods.
`estimate` returns the estimated cost of running an optimization on a judge without starting a run.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `judge_id` | `str` | Yes | ID of the judge to optimize |
| `budget` | `str` | No | Optimization budget: `"light"`, `"medium"`, or `"heavy"` (default: `"medium"`) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
Returns an EstimateJudgeOptimizationCostResponse object if successful, None otherwise.
```python
estimate = client.judge_optimizations.estimate(
    judge_id="judge-123",
    budget="heavy",
)
if estimate:
    print(f"Estimated cost: ${estimate.estimated_cost:.4f}")
    print(f"Annotations: {estimate.annotation_count}")
    print(f"Budget: {estimate.budget}")
```

`create` starts a new optimization run for a judge. The optimization runs asynchronously on the server.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `judge_id` | `str` | Yes | ID of the judge to optimize |
| `budget` | `str` | No | Optimization budget: `"light"`, `"medium"`, or `"heavy"` (default: `"medium"`) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
Returns a CreateJudgeOptimizationRunResponse object if successful, None otherwise.
```python
run = client.judge_optimizations.create(
    judge_id="judge-123",
    budget="medium",
)
# create() returns None on failure, so check before reading fields
if run:
    print(f"Optimization {run.id}: {run.status}")
```

`get` retrieves the current state of an optimization run by its unique identifier.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | `str` | Yes | The unique optimization run ID |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
Returns a JudgeOptimizationRun object if found, None otherwise.
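Because `get` returns a snapshot of the run, waiting for completion means polling. The following is a minimal sketch of a polling helper with an overall deadline. The name `wait_for_optimization`, its parameters, and the injectable `sleep` argument are hypothetical and not part of the client; the sketch only assumes that `get` returns `None` or an object whose `status.value` is one of the statuses documented on this page.

```python
import time

# Terminal statuses as documented for JudgeOptimizationRun.status
TERMINAL_STATUSES = {"success", "failure"}


def wait_for_optimization(client, run_id, poll_interval=5.0, timeout=300.0,
                          sleep=time.sleep):
    """Poll a run until it reaches a terminal status or the deadline passes.

    Hypothetical helper: `client` is any object exposing
    `judge_optimizations.get(run_id)`.
    """
    deadline = time.monotonic() + timeout
    while True:
        optimization = client.judge_optimizations.get(run_id)
        # get() may return None (for example on a failed request); keep polling.
        if optimization is not None and optimization.status.value in TERMINAL_STATUSES:
            return optimization
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Optimization run {run_id} did not finish within {timeout}s")
        sleep(poll_interval)
```

With a real client this could be called as `wait_for_optimization(client, run.id, timeout=600)`. Raising on timeout, rather than returning the last snapshot, keeps a stalled run from being mistaken for a finished one.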
```python
optimization = client.judge_optimizations.get("opt-run-123")
if optimization:
    print(f"Status: {optimization.status}")
    if optimization.status.value == "success":
        print(f"Baseline accuracy: {optimization.baseline_accuracy}")
        print(f"Optimized accuracy: {optimization.optimized_accuracy}")
```

`get_many` retrieves multiple optimization runs with optional filtering and pagination.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `judge_id` | `str \| None` | No | Filter by judge |
| `page` | `int \| None` | No | Page number (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of runs per page (default: 20, max: 500) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
Returns a JudgeOptimizationRunsResponse object containing:
- `optimization_runs`: List of `JudgeOptimizationRun` objects
- `count`: Number of runs in this page
- `total`: Total number of matching runs
Returns None if the request fails.
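Since results are paginated, collecting every run means requesting successive pages until `total` is reached. A minimal sketch of such a loop follows. The generator name `iter_optimization_runs` and its default `page_size` of 100 are assumptions made for illustration; only the `judge_id`, `page`, and `page_size` parameters and the `optimization_runs` and `total` fields come from this page.

```python
def iter_optimization_runs(client, judge_id=None, page_size=100):
    """Yield every matching run, fetching one page at a time.

    Hypothetical helper: `client` is any object exposing
    `judge_optimizations.get_many(judge_id=..., page=..., page_size=...)`.
    """
    page = 1  # pages are 1-based
    seen = 0
    while True:
        response = client.judge_optimizations.get_many(
            judge_id=judge_id,
            page=page,
            page_size=page_size,
        )
        # Stop on a failed request (None) or an empty page.
        if response is None or not response.optimization_runs:
            return
        for run in response.optimization_runs:
            yield run
        seen += len(response.optimization_runs)
        # Stop once every matching run has been yielded.
        if seen >= response.total:
            return
        page += 1
```

A caller could then write `for run in iter_optimization_runs(client, judge_id="judge-123"): ...` without tracking page numbers. The empty-page check guards against looping forever if `total` changes while paging.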
```python
# List all optimization runs
response = client.judge_optimizations.get_many()
if response:
    print(f"Total runs: {response.total}")

# Filter by judge
response = client.judge_optimizations.get_many(judge_id="judge-123")
if response:
    for run in response.optimization_runs:
        print(f"  {run.id}: {run.status} (budget: {run.budget})")
```

`apply` applies the results of a successful optimization run to the judge, updating its evaluation goal and creating a new judge version.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | `str` | Yes | The unique optimization run ID |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |
Returns an ApplyJudgeOptimizationResultResponse object if successful, None otherwise.
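Applying a run creates a new judge version, so it can be useful to gate the call on the measured improvement. The sketch below applies a run only when it succeeded and its accuracy gain exceeds a threshold. The function name `apply_if_improved` and the `min_gain` parameter are hypothetical; the sketch relies on the `status`, `baseline_accuracy`, and `optimized_accuracy` fields of `JudgeOptimizationRun` described on this page.

```python
def apply_if_improved(client, run_id, min_gain=0.0):
    """Apply a run only if it succeeded and improved accuracy by more than min_gain.

    Hypothetical helper: returns the apply() response, or None when the run
    is skipped or cannot be evaluated.
    """
    optimization = client.judge_optimizations.get(run_id)
    if optimization is None or optimization.status.value != "success":
        return None

    baseline = optimization.baseline_accuracy
    optimized = optimization.optimized_accuracy
    # Both accuracies are optional fields; without them the gain is unknown.
    if baseline is None or optimized is None:
        return None

    if optimized - baseline <= min_gain:
        return None

    return client.judge_optimizations.apply(run_id)
```

For example, `apply_if_improved(client, "opt-run-123", min_gain=0.02)` would skip a run that improved accuracy by only one percentage point. The threshold is a policy choice for the caller, not something the service enforces.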
```python
result = client.judge_optimizations.apply("opt-run-123")
if result:
    print(f"Applied to judge {result.judge_id}")
    print(f"New version: v{result.new_version}")
    print(result.message)
```

`CreateJudgeOptimizationRunResponse` properties:

| Property | Type | Description |
|---|---|---|
| `id` | `str` | Unique optimization run identifier |
| `judge_id` | `str` | ID of the judge being optimized |
| `budget` | `str` | Optimization budget level |
| `status` | `str` | Initial status of the run |

`JudgeOptimizationRun` properties:

| Property | Type | Description |
|---|---|---|
| `id` | `str` | Unique optimization run identifier |
| `judge_id` | `str` | ID of the judge being optimized |
| `status` | `OptimizationRunStatus` | Current status (`pending`, `in_progress`, `success`, `failure`) |
| `status_description` | `str \| None` | Human-readable status description |
| `budget` | `OptimizationBudget` | Budget level (`light`, `medium`, `heavy`) |
| `annotation_count` | `int` | Number of annotations used |
| `baseline_accuracy` | `float \| None` | Accuracy before optimization |
| `optimized_accuracy` | `float \| None` | Accuracy after optimization |
| `original_goal` | `str \| None` | Original evaluation goal |
| `optimized_goal` | `str \| None` | Optimized evaluation goal |
| `estimated_cost` | `float` | Estimated cost in dollars |
| `actual_cost` | `float` | Actual cost incurred |
| `created_at` | `str` | ISO 8601 creation timestamp |
| `started_at` | `str \| None` | When optimization started |
| `finished_at` | `str \| None` | When optimization finished |
| `applied_at` | `str \| None` | When results were applied to the judge |
| `applied_version` | `int \| None` | Judge version created by applying results |

`EstimateJudgeOptimizationCostResponse` properties:

| Property | Type | Description |
|---|---|---|
| `estimated_cost` | `float` | Estimated cost in dollars |
| `annotation_count` | `int` | Number of annotations to process |
| `budget` | `str` | Budget level used for the estimate |

`ApplyJudgeOptimizationResultResponse` properties:

| Property | Type | Description |
|---|---|---|
| `judge_id` | `str` | ID of the updated judge |
| `new_version` | `int` | New version number of the judge |
| `message` | `str` | Confirmation message |

| Budget | Description |
|---|---|
| `"light"` | Faster, lower cost. Good for quick iterations. |
| `"medium"` | Balanced cost and thoroughness. Recommended default. |
| `"heavy"` | Most thorough optimization. Higher cost but potentially better results. |
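
One way to choose among these levels is to request an estimate for each and take the most thorough one that fits a spending cap. The sketch below illustrates that idea. The function name `pick_budget` and the `max_cost` parameter are hypothetical, and the ordering from most to least thorough is taken from the descriptions in the table above.

```python
# Budget levels ordered from most to least thorough, per the table above.
BUDGETS_BY_THOROUGHNESS = ["heavy", "medium", "light"]


def pick_budget(client, judge_id, max_cost):
    """Return (budget, estimated_cost) for the most thorough budget within max_cost.

    Hypothetical helper: returns None if no budget fits or every estimate fails.
    """
    for budget in BUDGETS_BY_THOROUGHNESS:
        estimate = client.judge_optimizations.estimate(
            judge_id=judge_id,
            budget=budget,
        )
        # estimate() returns None on failure; skip that level.
        if estimate is not None and estimate.estimated_cost <= max_cost:
            return budget, estimate.estimated_cost
    return None
```

The chosen level can then be passed as the `budget` argument to `create`. This makes up to three estimate requests per call, which is a reasonable trade for not overspending on a large annotation set.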
- Learn about Judges to create judges for optimization
- Learn about Trace Evaluations to evaluate traces with optimized judges