The `PublicClient` (synchronous) and `AsyncPublicClient` (asynchronous) classes provide access to the public LayerLens API endpoints for browsing public models and benchmarks, reading benchmark content, fetching evaluations, and comparing evaluation results.

```python
from layerlens import PublicClient

# Loads API key from the "LAYERLENS_STRATIX_API_KEY" environment variable
client = PublicClient()

# Browse public models
models = client.models.get(companies=["OpenAI"])

# Browse public benchmarks
benchmarks = client.benchmarks.get(languages=["English"])
```

```python
import asyncio

from layerlens import AsyncPublicClient


async def main():
    client = AsyncPublicClient()
    models = await client.models.get(companies=["OpenAI"])
    benchmarks = await client.benchmarks.get(languages=["English"])


asyncio.run(main())
```

If you already have an authenticated `Stratix` or `AsyncStratix` client, you can access public endpoints through the `.public` property:

```python
from layerlens import Stratix

client = Stratix()  # requires API key

# Access public endpoints through the authenticated client
public_models = client.public.models.get(query="claude")
```

The client constructor accepts the following parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `api_key` | `str \| None` | Yes\* | `None` | Your LayerLens Stratix API key |
| `base_url` | `str \| httpx.URL \| None` | No | Stratix API URL | Custom API base URL |
| `timeout` | `float \| httpx.Timeout \| None` | No | 10 minutes | Request timeout configuration |

\*Required unless set via the `LAYERLENS_STRATIX_API_KEY` environment variable.

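
Because the key is only required when the environment variable is unset, a script can check for it up front and fail with a clear message before constructing a client. The following is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of the SDK, and the rule that an explicit argument wins over the environment variable is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LAYERLENS_STRATIX_API_KEY"


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the key to use, or raise if none is configured.

    Hypothetical helper: assumes an explicit argument takes precedence
    over the environment variable.
    """
    env = os.environ if env is None else env
    key = explicit or env.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass api_key explicitly")
    return key


# Demo with an in-memory mapping instead of the real environment.
print(resolve_api_key(env={ENV_VAR: "key-from-env"}))
print(resolve_api_key("explicit-key", env={ENV_VAR: "key-from-env"}))
```

The resolved value could then be passed to the constructor through the documented `api_key` parameter.
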
`client.models.get()` retrieves a list of public models with optional filtering, sorting, and pagination.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str \| None` | No | Full-text search on model name |
| `name` | `str \| None` | No | Filter by model name |
| `key` | `str \| None` | No | Filter by model key |
| `ids` | `List[str] \| None` | No | Filter by specific model IDs |
| `categories` | `List[str] \| None` | No | Filter by categories (e.g. `transformer`, `moe`, `open-source`, `closed-source`, `usa`, `china`, `size-sm`, `size-md`, `size-lg`, `size-xl`) |
| `companies` | `List[str] \| None` | No | Filter by company names |
| `regions` | `List[str] \| None` | No | Filter by regions |
| `licenses` | `List[str] \| None` | No | Filter by license types |
| `sizes` | `List[str] \| None` | No | Filter by size (Small, Medium, Large, Extra Large) |
| `sort_by` | `str \| None` | No | Sort column: `name`, `createdAt`, `releasedAt`, `architectureType`, `contextLength`, `license`, `region` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `page` | `int \| None` | No | Page number (1-based) |
| `page_size` | `int \| None` | No | Results per page |
| `include_deprecated` | `bool \| None` | No | Include deprecated models (default: false) |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns a `PublicModelsListResponse` containing:

- `models`: List of `PublicModelDetail` objects
- `categories`: List of available category strings
- `count`: Number of results in current page
- `total_count`: Total number of matching results

Returns `None` if the request fails.

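
Since a failed request is signalled by a `None` return value rather than an exception, callers that want resilience have to retry themselves. The sketch below shows one way to do that; `call_with_retries` is a hypothetical helper, not an SDK function, and the attempt count and backoff delays are arbitrary choices.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], Optional[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = fn()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return None


# Demo with a stand-in callable that fails twice before succeeding.
outcomes = iter([None, None, "response"])
print(call_with_retries(lambda: next(outcomes), sleep=lambda _: None))
```

With a real client, the callable would wrap the request, for example `call_with_retries(lambda: client.models.get(companies=["OpenAI"]))`.
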
Each `PublicModelDetail` object has the following properties:

| Property | Type | Description |
|---|---|---|
| `id` | `str` | Unique model identifier |
| `key` | `str` | Unique model key |
| `name` | `str` | Human-readable model name |
| `description` | `str \| None` | Text description |
| `company` | `str \| None` | Model provider company |
| `released_at` | `int \| None` | Release timestamp |
| `parameters` | `float \| None` | Number of parameters |
| `modality` | `str \| None` | Model modality |
| `context_length` | `int \| None` | Maximum context length |
| `architecture_type` | `str \| None` | Architecture type |
| `license` | `str \| None` | License type |
| `open_weights` | `bool \| None` | Whether weights are open |
| `region` | `str \| None` | Region |
| `key_takeaways` | `List[str] \| None` | Key takeaways |
| `deprecated` | `bool \| None` | Whether the model is deprecated |
| `cost_per_input_token` | `str \| None` | Cost per input token |
| `cost_per_output_token` | `str \| None` | Cost per output token |

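
The two cost fields are strings and may be `None`, so they need converting before any arithmetic. The sketch below estimates the cost of a request under the assumption that both strings parse as plain decimal numbers; the currency and unit are not stated here, and `ModelCosts` and `estimate_cost` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass
class ModelCosts:
    """Stand-in carrying only the two cost fields of PublicModelDetail."""

    cost_per_input_token: Optional[str]
    cost_per_output_token: Optional[str]


def estimate_cost(model, input_tokens: int, output_tokens: int) -> Optional[Decimal]:
    """Return the estimated cost, or None if a price is missing or unparsable."""
    try:
        # Decimal avoids the rounding noise of floats for very small prices.
        price_in = Decimal(model.cost_per_input_token)
        price_out = Decimal(model.cost_per_output_token)
    except (TypeError, InvalidOperation):
        return None
    return price_in * input_tokens + price_out * output_tokens


print(estimate_cost(ModelCosts("0.000001", "0.000002"), 1000, 500))
print(estimate_cost(ModelCosts(None, "0.000002"), 1000, 500))
```
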
```python
from layerlens import PublicClient

client = PublicClient()

# Get newest OpenAI models
response = client.models.get(
    companies=["OpenAI"],
    sort_by="releasedAt",
    order="desc",
    page_size=5,
)

# get() returns None if the request fails
if response:
    for model in response.models:
        print(f"{model.name} - {model.context_length} context length")
```

`client.benchmarks.get()` retrieves a list of public benchmarks with optional filtering, sorting, and pagination.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str \| None` | No | Full-text search |
| `name` | `str \| None` | No | Filter by name |
| `key` | `str \| None` | No | Filter by key |
| `ids` | `List[str] \| None` | No | Filter by specific IDs |
| `categories` | `List[str] \| None` | No | Filter by categories |
| `languages` | `List[str] \| None` | No | Filter by languages |
| `sort_by` | `str \| None` | No | Sort column (currently: `name`) |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `page` | `int \| None` | No | Page number (1-based) |
| `page_size` | `int \| None` | No | Results per page |
| `include_deprecated` | `bool \| None` | No | Include deprecated benchmarks |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns a `PublicBenchmarksListResponse` containing:

- `datasets`: List of `PublicBenchmarkDetail` objects
- `categories`: List of available category strings
- `count`: Number of results in current page
- `total_count`: Total number of matching results

Returns `None` if the request fails.

Each `PublicBenchmarkDetail` object has the following properties:

| Property | Type | Description |
|---|---|---|
| `id` | `str` | Unique benchmark identifier |
| `key` | `str` | Unique benchmark key |
| `name` | `str` | Human-readable name |
| `description` | `str \| None` | Text description |
| `prompt_count` | `int \| None` | Number of prompts in the benchmark |
| `language` | `str \| None` | Language of the benchmark |
| `categories` | `List[str] \| None` | Categories |
| `characteristics` | `List[str] \| None` | Characteristics |
| `deprecated` | `bool \| None` | Whether the benchmark is deprecated |
| `is_public` | `bool \| None` | Whether the benchmark is public |

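
Most of these properties are optional, so code that aggregates over a list of benchmarks has to tolerate `None`. As a small illustration, the sketch below totals `prompt_count` per `language`; `BenchmarkStub` is a hypothetical stand-in with a subset of the fields above, and the `"unknown"` bucket is an arbitrary choice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BenchmarkStub:
    """Stand-in with a subset of PublicBenchmarkDetail fields."""

    name: str
    language: Optional[str]
    prompt_count: Optional[int]


def prompts_per_language(benchmarks: Iterable) -> Dict[str, int]:
    """Sum prompt_count per language, treating missing values as unknown/0."""
    totals: Dict[str, int] = defaultdict(int)
    for benchmark in benchmarks:
        language = benchmark.language or "unknown"
        totals[language] += benchmark.prompt_count or 0
    return dict(totals)


sample = [
    BenchmarkStub("A", "English", 100),
    BenchmarkStub("B", "English", None),
    BenchmarkStub("C", None, 50),
]
print(prompts_per_language(sample))
```

The same function should work on the `datasets` list of a real response, since it only reads the `language` and `prompt_count` attributes.
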
`client.benchmarks.get_prompts()` fetches prompts (content) from a public benchmark with optional search and pagination.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `benchmark_id` | `str` | Yes | The benchmark ID to fetch prompts from |
| `page` | `int \| None` | No | Page number (1-based) |
| `page_size` | `int \| None` | No | Results per page |
| `search_field` | `str \| None` | No | Search field: `id`, `input`, or `truth` |
| `search_value` | `str \| None` | No | Search value |
| `sort_by` | `str \| None` | No | Sort field: `id`, `input`, or `truth` |
| `sort_order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns a `BenchmarkPromptsResponse` containing:

- `status`: Response status string
- `data.prompts`: List of `BenchmarkPrompt` objects
- `data.count`: Total number of prompts

Returns `None` if the request fails.

Each `BenchmarkPrompt` object has the following properties:

| Property | Type | Description |
|---|---|---|
| `id` | `str` | Unique prompt identifier |
| `input` | `str \| List \| Dict` | The prompt input |
| `truth` | `str` | The expected/ground truth answer |

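
Because `input` can be a string, a list, or a dict, display code benefits from normalizing it first. The helper below is a minimal sketch of one way to do that; `prompt_preview` is a hypothetical name, and serializing structured inputs as JSON is a presentation choice, not something the SDK prescribes.

```python
import json
from typing import Any


def prompt_preview(value: Any, width: int = 80) -> str:
    """Render a prompt input as one line of at most `width` characters."""
    if isinstance(value, str):
        text = value
    else:
        # Lists and dicts are serialized as JSON so nested content stays readable.
        text = json.dumps(value, ensure_ascii=False, default=str)
    text = " ".join(text.split())  # collapse newlines and repeated whitespace
    if len(text) <= width:
        return text
    return text[: width - 3] + "..."


print(prompt_preview("What is 2 + 2?"))
print(prompt_preview({"question": "What is 2 + 2?", "choices": ["3", "4"]}, width=40))
```
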
`client.benchmarks.get_all_prompts()` fetches all prompts from a benchmark by automatically handling pagination.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `benchmark_id` | `str` | Yes | The benchmark ID to fetch prompts from |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns a `List[BenchmarkPrompt]` containing all prompts in the benchmark.

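
To make the idea of automatic pagination concrete, the sketch below builds the same kind of loop on top of `get_prompts()` using only its documented `page` and `page_size` parameters and the `data.prompts` and `data.count` response fields. It is an illustration, not the library's actual implementation; `fetch_all_prompts` and `FakeBenchmarksAPI` are hypothetical names, and the fake exists only so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any, List


def fetch_all_prompts(benchmarks_api: Any, benchmark_id: str, page_size: int = 100) -> List[Any]:
    """Collect prompts page by page via get_prompts()."""
    collected: List[Any] = []
    page = 1
    while True:
        response = benchmarks_api.get_prompts(benchmark_id, page=page, page_size=page_size)
        if response is None or not response.data.prompts:
            break  # request failed or no more pages
        collected.extend(response.data.prompts)
        if len(collected) >= response.data.count:
            break  # data.count is the total number of prompts
        page += 1
    return collected


class FakeBenchmarksAPI:
    """In-memory stand-in for client.benchmarks, for demonstration only."""

    def __init__(self, prompts: List[Any]) -> None:
        self._prompts = prompts

    def get_prompts(self, benchmark_id: str, page: int = 1, page_size: int = 100):
        start = (page - 1) * page_size
        chunk = self._prompts[start:start + page_size]
        data = SimpleNamespace(prompts=chunk, count=len(self._prompts))
        # The status value is arbitrary here.
        return SimpleNamespace(status="ok", data=data)


fake = FakeBenchmarksAPI([f"prompt-{i}" for i in range(25)])
print(len(fetch_all_prompts(fake, "demo-benchmark", page_size=10)))
```

A manual loop like this is mainly useful when prompts should be processed as each page arrives; otherwise `get_all_prompts()` already covers the common case.
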
```python
from layerlens import PublicClient

client = PublicClient()

# List benchmarks
benchmarks = client.benchmarks.get(query="mmlu")
if benchmarks and benchmarks.datasets:
    benchmark = benchmarks.datasets[0]

    # Get first page of prompts
    prompts = client.benchmarks.get_prompts(benchmark.id, page=1, page_size=10)
    if prompts:
        print(f"Total prompts: {prompts.data.count}")
        for prompt in prompts.data.prompts:
            print(f" Input: {str(prompt.input)[:80]}...")
            print(f" Truth: {prompt.truth[:50]}")

    # Or fetch all prompts at once
    all_prompts = client.benchmarks.get_all_prompts(benchmark.id)
    print(f"All prompts: {len(all_prompts)}")
```

`client.evaluations.get_by_id()` retrieves a single evaluation by its unique identifier, including the full evaluation summary.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | `str` | Yes | The unique evaluation identifier |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns an `Evaluation` object if found, `None` otherwise. See Evaluations for the full `Evaluation` object properties.

`client.evaluations.get_many()` retrieves evaluations with optional pagination, sorting, and filtering.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `page` | `int \| None` | No | Page number for pagination (1-based, defaults to 1) |
| `page_size` | `int \| None` | No | Number of evaluations per page (default: 100, max: 500) |
| `sort_by` | `str \| None` | No | Sort by field: `submitted_at`, `accuracy`, or `average_duration` |
| `order` | `str \| None` | No | Sort order: `asc` or `desc` |
| `model_ids` | `List[str] \| None` | No | Filter by model IDs |
| `benchmark_ids` | `List[str] \| None` | No | Filter by benchmark/dataset IDs |
| `status` | `EvaluationStatus \| None` | No | Filter by evaluation status |
| `unique` | `bool` | No | If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns an `EvaluationsResponse` object containing:

- `evaluations`: List of `Evaluation` objects
- `pagination`: Pagination metadata with `page`, `page_size`, `total_pages`, and `total_count`

Returns `None` if the request fails.

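
The effect of `unique=True` can be pictured as a client-side deduplication step. The sketch below keeps the newest evaluation for each model+benchmark pair. It only illustrates the semantics described in the parameter table: the real deduplication presumably happens on the server, and the `model_id`, `benchmark_id`, and `submitted_at` attribute names on `EvalStub` are assumptions, not confirmed `Evaluation` properties.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class EvalStub:
    """Stand-in for Evaluation; all three field names are assumptions."""

    model_id: str
    benchmark_id: str
    submitted_at: int


def latest_per_pair(evaluations: Iterable[EvalStub]) -> List[EvalStub]:
    """Keep only the most recent evaluation for each (model, benchmark) pair."""
    latest: Dict[Tuple[str, str], EvalStub] = {}
    for ev in evaluations:
        key = (ev.model_id, ev.benchmark_id)
        if key not in latest or ev.submitted_at > latest[key].submitted_at:
            latest[key] = ev
    return list(latest.values())


evals = [
    EvalStub("m1", "b1", 100),
    EvalStub("m1", "b1", 200),  # newer run of the same pair replaces the first
    EvalStub("m2", "b1", 150),
]
for ev in latest_per_pair(evals):
    print(ev.model_id, ev.benchmark_id, ev.submitted_at)
```
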
```python
from layerlens import PublicClient
from layerlens.models import EvaluationStatus

client = PublicClient()

# Get a specific evaluation by ID (with full summary)
evaluation = client.evaluations.get_by_id("eval_abc123")
if evaluation:
    print(f"{evaluation.model_name} on {evaluation.benchmark_name}: {evaluation.accuracy:.2f}%")
    if evaluation.summary:
        print(f"Goal: {evaluation.summary.goal}")
        for takeaway in evaluation.summary.analysis_summary.key_takeaways:
            print(f" - {takeaway}")

# List successful evaluations sorted by accuracy
response = client.evaluations.get_many(
    status=EvaluationStatus.SUCCESS,
    sort_by="accuracy",
    order="desc",
    page_size=10,
)
if response:
    print(f"Top evaluations ({response.pagination.total_count} total):")
    for e in response.evaluations:
        print(f" {e.model_name}: {e.accuracy:.2f}%")
```

`client.comparisons.compare()` compares the results of two evaluations side by side.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `evaluation_id_1` | `str` | Yes | First evaluation ID |
| `evaluation_id_2` | `str` | Yes | Second evaluation ID |
| `page` | `int \| None` | No | Page number (1-based) |
| `page_size` | `int \| None` | No | Results per page |
| `outcome_filter` | `str \| None` | No | Filter by outcome (see below) |
| `search` | `str \| None` | No | Search within results |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

`outcome_filter` accepts the following values:

| Value | Description |
|---|---|
| `"all"` | All results (default) |
| `"both_succeed"` | Both models answered correctly |
| `"both_fail"` | Both models answered incorrectly |
| `"reference_fails"` | First model fails, second succeeds |
| `"comparison_fails"` | Second model fails, first succeeds |

Returns a `ComparisonResponse` containing:

- `results`: List of `ComparisonResult` objects
- `total_count`: Total number of comparable results
- `correct_count_1`: Number of correct answers for evaluation 1
- `total_results_1`: Total results for evaluation 1
- `correct_count_2`: Number of correct answers for evaluation 2
- `total_results_2`: Total results for evaluation 2

Returns `None` if the request fails.

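
The four count fields are enough to derive each evaluation's accuracy and the gap between them. The sketch below shows the arithmetic with a guard against empty result sets; `accuracy` and `accuracy_gap` are hypothetical helper names, and a `SimpleNamespace` stands in for a real `ComparisonResponse`.

```python
from types import SimpleNamespace
from typing import Optional


def accuracy(correct: int, total: int) -> Optional[float]:
    """Percentage of correct results, or None when there are no results."""
    return 100.0 * correct / total if total else None


def accuracy_gap(comparison) -> Optional[float]:
    """Evaluation 2 accuracy minus evaluation 1 accuracy, in percentage points."""
    a1 = accuracy(comparison.correct_count_1, comparison.total_results_1)
    a2 = accuracy(comparison.correct_count_2, comparison.total_results_2)
    if a1 is None or a2 is None:
        return None
    return a2 - a1


demo = SimpleNamespace(
    correct_count_1=40, total_results_1=50,
    correct_count_2=45, total_results_2=50,
)
print(accuracy_gap(demo))
```

A positive gap means evaluation 2 answered a larger share of its prompts correctly than evaluation 1.
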
Each `ComparisonResult` object has the following properties:

| Property | Type | Description |
|---|---|---|
| `result_id_1` | `int \| None` | Result ID from evaluation 1 |
| `result_id_2` | `int \| None` | Result ID from evaluation 2 |
| `prompt` | `str` | The prompt text |
| `truth` | `str` | The ground truth answer |
| `result1` | `str \| None` | Model 1's response |
| `score1` | `float \| None` | Model 1's score |
| `result2` | `str \| None` | Model 2's response |
| `score2` | `float \| None` | Model 2's score |

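
The `outcome_filter` categories can be related to the `score1` and `score2` fields with a small classifier. This is only a sketch of the naming: it assumes that a score at or above 1.0 means a correct answer and that a missing score counts as a failure, neither of which is stated in this reference, and `classify_outcome` is a hypothetical helper rather than an SDK function.

```python
from typing import Optional


def classify_outcome(
    score1: Optional[float],
    score2: Optional[float],
    threshold: float = 1.0,
) -> str:
    """Map a pair of scores to one of the outcome_filter values.

    Assumption: a score at or above `threshold` counts as correct and a
    missing score counts as a failure.
    """
    ok1 = score1 is not None and score1 >= threshold
    ok2 = score2 is not None and score2 >= threshold
    if ok1 and ok2:
        return "both_succeed"
    if not ok1 and not ok2:
        return "both_fail"
    # Evaluation 1 is the reference; evaluation 2 is the comparison.
    return "reference_fails" if not ok1 else "comparison_fails"


for pair in [(1.0, 1.0), (0.0, None), (0.0, 1.0), (1.0, 0.0)]:
    print(pair, classify_outcome(*pair))
```
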
```python
from layerlens import PublicClient

client = PublicClient()

comparison = client.comparisons.compare(
    evaluation_id_1="eval-abc",
    evaluation_id_2="eval-def",
    outcome_filter="reference_fails",
    page=1,
    page_size=20,
)
if comparison:
    print(f"Eval 1: {comparison.correct_count_1}/{comparison.total_results_1}")
    print(f"Eval 2: {comparison.correct_count_2}/{comparison.total_results_2}")
    for result in comparison.results:
        print(f" Prompt: {result.prompt[:80]}...")
        print(f" Model 1 score: {result.score1}, Model 2 score: {result.score2}")
```

`client.comparisons.compare_models()` compares two models on a benchmark by automatically finding their most recent successful evaluations. It is a convenience method that wraps `compare()`.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `benchmark_id` | `str` | Yes | Benchmark ID to compare on |
| `model_id_1` | `str` | Yes | First model ID |
| `model_id_2` | `str` | Yes | Second model ID |
| `page` | `int \| None` | No | Page number (1-based) |
| `page_size` | `int \| None` | No | Results per page |
| `outcome_filter` | `str \| None` | No | Filter by outcome (same options as `compare()`) |
| `search` | `str \| None` | No | Search within results |
| `timeout` | `float \| httpx.Timeout \| None` | No | Override request timeout |

Returns a `ComparisonResponse` (same as `compare()`), or `None` if the comparison request fails.

Raises `ValueError` if no successful evaluation is found for either model on the given benchmark.

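
The lookup-then-compare behavior described above could be composed from `evaluations.get_many()` and `comparisons.compare()` roughly as follows. This is a sketch under stated assumptions, not the library's implementation: it assumes `Evaluation` objects expose an `.id` attribute, it treats a failed lookup request the same as a missing evaluation, and `compare_latest`, `latest_successful_evaluation`, and the two fake classes are hypothetical names that exist only so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any


def latest_successful_evaluation(client: Any, model_id: str, benchmark_id: str,
                                 success_status: Any) -> Any:
    """Return the newest successful evaluation, or raise ValueError."""
    response = client.evaluations.get_many(
        model_ids=[model_id],
        benchmark_ids=[benchmark_id],
        status=success_status,
        sort_by="submitted_at",
        order="desc",
        page_size=1,
    )
    # Simplification: a failed request (None) is treated like "not found".
    if response is None or not response.evaluations:
        raise ValueError(
            f"No successful evaluation for model {model_id} on benchmark {benchmark_id}"
        )
    return response.evaluations[0]


def compare_latest(client: Any, benchmark_id: str, model_id_1: str, model_id_2: str,
                   success_status: Any, **compare_kwargs: Any) -> Any:
    """Look up both evaluations, then delegate to comparisons.compare()."""
    first = latest_successful_evaluation(client, model_id_1, benchmark_id, success_status)
    second = latest_successful_evaluation(client, model_id_2, benchmark_id, success_status)
    # Assumption: Evaluation objects expose their identifier as `.id`.
    return client.comparisons.compare(
        evaluation_id_1=first.id, evaluation_id_2=second.id, **compare_kwargs
    )


class FakeEvaluations:
    """In-memory stand-in for client.evaluations (demonstration only)."""

    def __init__(self, latest_by_model):
        self._latest_by_model = latest_by_model

    def get_many(self, model_ids, benchmark_ids, **_ignored):
        found = [self._latest_by_model[m] for m in model_ids if m in self._latest_by_model]
        return SimpleNamespace(evaluations=found)


class FakeComparisons:
    """In-memory stand-in for client.comparisons (demonstration only)."""

    def compare(self, evaluation_id_1, evaluation_id_2, **_ignored):
        return SimpleNamespace(compared=(evaluation_id_1, evaluation_id_2))


fake_client = SimpleNamespace(
    evaluations=FakeEvaluations({
        "model-a": SimpleNamespace(id="eval-1"),
        "model-b": SimpleNamespace(id="eval-2"),
    }),
    comparisons=FakeComparisons(),
)
print(compare_latest(fake_client, "bench-1", "model-a", "model-b", "success").compared)
```

With a real client, `success_status` would be `EvaluationStatus.SUCCESS`, and optional arguments such as `outcome_filter` or `page_size` could be forwarded through `compare_kwargs`.
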
```python
from layerlens import PublicClient

client = PublicClient()

# Compare two models on AIME 2025 - no need to look up evaluation IDs
comparison = client.comparisons.compare_models(
    benchmark_id="682bddc1e014f9fa440f8a91",
    model_id_1="699f9761e014f9c3072b0513",
    model_id_2="699f9761e014f9c3072b0512",
    page=1,
    page_size=10,
)
if comparison:
    print(f"Model 1: {comparison.correct_count_1}/{comparison.total_results_1} correct")
    print(f"Model 2: {comparison.correct_count_2}/{comparison.total_results_2} correct")
```