Public Client

The PublicClient (synchronous) and AsyncPublicClient (asynchronous) classes provide access to the public LayerLens API endpoints. Use them to browse public models and benchmarks, read benchmark prompts, fetch evaluations, and compare evaluation results.

Basic Usage

Synchronous Client

from layerlens import PublicClient

# Loads API key from the "LAYERLENS_STRATIX_API_KEY" environment variable
client = PublicClient()

# Browse public models
models = client.models.get(companies=["OpenAI"])

# Browse public benchmarks
benchmarks = client.benchmarks.get(languages=["English"])

Asynchronous Client

import asyncio
from layerlens import AsyncPublicClient

async def main():
    client = AsyncPublicClient()

    models = await client.models.get(companies=["OpenAI"])
    benchmarks = await client.benchmarks.get(languages=["English"])

asyncio.run(main())
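
The two calls above are awaited one after the other, so they run sequentially. When the requests are independent, they could be issued concurrently with `asyncio.gather`. The sketch below is illustrative only: `fetch_catalog` is a hypothetical helper, not part of the SDK, and it assumes nothing beyond `client.models.get` and `client.benchmarks.get` being awaitable as shown above.

```python
import asyncio
from typing import Any, Tuple


async def fetch_catalog(client: Any) -> Tuple[Any, Any]:
    """Hypothetical helper: request models and benchmarks concurrently.

    `client` is expected to behave like AsyncPublicClient, i.e. expose
    awaitable `models.get(...)` and `benchmarks.get(...)` methods.
    """
    models, benchmarks = await asyncio.gather(
        client.models.get(companies=["OpenAI"]),
        client.benchmarks.get(languages=["English"]),
    )
    # Either value may be None if the corresponding request failed.
    return models, benchmarks
```

With a real client this would be called as `asyncio.run(fetch_catalog(AsyncPublicClient()))`.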

Accessing from an Authenticated Client

If you already have an authenticated Stratix or AsyncStratix client, you can access public endpoints through the .public property:

from layerlens import Stratix

client = Stratix()  # requires API key

# Access public endpoints through the authenticated client
public_models = client.public.models.get(query="claude")

Constructor Parameters

`PublicClient(api_key, base_url, timeout)` and `AsyncPublicClient(api_key, base_url, timeout)`

Parameter	Type	Required	Default	Description
`api_key`	`str \| None`	Yes*	`None`	Your LayerLens Stratix API key
`base_url`	`str \| httpx.URL \| None`	No	Stratix API URL	Custom API base URL
`timeout`	`float \| httpx.Timeout \| None`	No	10 minutes	Request timeout configuration

*Required unless set via the LAYERLENS_STRATIX_API_KEY environment variable
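
If you want a clear error in your own code before constructing a client, a pre-flight check along these lines may help. This is a sketch, not the SDK's internal logic: `require_api_key` is a hypothetical helper, and both the precedence (explicit argument first) and the `ValueError` are assumptions made for illustration.

```python
import os
from typing import Optional

ENV_VAR = "LAYERLENS_STRATIX_API_KEY"


def require_api_key(api_key: Optional[str] = None) -> str:
    """Return an API key from the argument or the environment, or raise.

    Assumption: an explicitly passed key takes precedence over the
    environment variable. The SDK's own behavior may differ.
    """
    key = api_key or os.environ.get(ENV_VAR)
    if not key:
        raise ValueError(f"Pass api_key or set the {ENV_VAR} environment variable")
    return key
```

The result could then be passed explicitly, for example `PublicClient(api_key=require_api_key())`.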

Public Models

`models.get(...)`

Retrieves a list of public models with optional filtering, sorting, and pagination.

Parameters

Parameter	Type	Required	Description
`query`	`str \| None`	No	Full-text search on model name
`name`	`str \| None`	No	Filter by model name
`key`	`str \| None`	No	Filter by model key
`ids`	`List[str] \| None`	No	Filter by specific model IDs
`categories`	`List[str] \| None`	No	Filter by categories (e.g. `transformer`, `moe`, `open-source`, `closed-source`, `usa`, `china`, `size-sm`, `size-md`, `size-lg`, `size-xl`)
`companies`	`List[str] \| None`	No	Filter by company names
`regions`	`List[str] \| None`	No	Filter by regions
`licenses`	`List[str] \| None`	No	Filter by license types
`sizes`	`List[str] \| None`	No	Filter by size (Small, Medium, Large, Extra Large)
`sort_by`	`str \| None`	No	Sort column: `name`, `createdAt`, `releasedAt`, `architectureType`, `contextLength`, `license`, `region`
`order`	`str \| None`	No	Sort order: `asc` or `desc`
`page`	`int \| None`	No	Page number (1-based)
`page_size`	`int \| None`	No	Results per page
`include_deprecated`	`bool \| None`	No	Include deprecated models (default: false)
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns a PublicModelsListResponse containing:

models: List of PublicModelDetail objects
categories: List of available category strings
count: Number of results in current page
total_count: Total number of matching results

Returns None if the request fails.
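
Because `total_count` covers every match while `count` covers only the current page, the number of pages follows from `total_count` and the `page_size` you requested. A minimal sketch of that arithmetic (`total_pages` is an illustrative name, not an SDK function):

```python
import math


def total_pages(total_count: int, page_size: int) -> int:
    """Number of pages needed to cover `total_count` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_count / page_size)
```

For example, 25 matching models at `page_size=10` span three pages, the last of which holds five results.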

PublicModelDetail Properties

Property	Type	Description
`id`	`str`	Unique model identifier
`key`	`str`	Unique model key
`name`	`str`	Human-readable model name
`description`	`str \| None`	Text description
`company`	`str \| None`	Model provider company
`released_at`	`int \| None`	Release timestamp
`parameters`	`float \| None`	Number of parameters
`modality`	`str \| None`	Model modality
`context_length`	`int \| None`	Maximum context length
`architecture_type`	`str \| None`	Architecture type
`license`	`str \| None`	License type
`open_weights`	`bool \| None`	Whether weights are open
`region`	`str \| None`	Region
`key_takeaways`	`List[str] \| None`	Key takeaways
`deprecated`	`bool \| None`	Whether the model is deprecated
`cost_per_input_token`	`str \| None`	Cost per input token
`cost_per_output_token`	`str \| None`	Cost per output token

Example

from layerlens import PublicClient

client = PublicClient()

# Get newest OpenAI models
response = client.models.get(
    companies=["OpenAI"],
    sort_by="releasedAt",
    order="desc",
    page_size=5,
)

# get() returns None if the request fails, so check before iterating
if response:
    for model in response.models:
        print(f"{model.name} - {model.context_length} context length")

Public Benchmarks

`benchmarks.get(...)`

Retrieves a list of public benchmarks with optional filtering, sorting, and pagination.

Parameters

Parameter	Type	Required	Description
`query`	`str \| None`	No	Full-text search
`name`	`str \| None`	No	Filter by name
`key`	`str \| None`	No	Filter by key
`ids`	`List[str] \| None`	No	Filter by specific IDs
`categories`	`List[str] \| None`	No	Filter by categories
`languages`	`List[str] \| None`	No	Filter by languages
`sort_by`	`str \| None`	No	Sort column (currently: `name`)
`order`	`str \| None`	No	Sort order: `asc` or `desc`
`page`	`int \| None`	No	Page number (1-based)
`page_size`	`int \| None`	No	Results per page
`include_deprecated`	`bool \| None`	No	Include deprecated benchmarks
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns a PublicBenchmarksListResponse containing:

datasets: List of PublicBenchmarkDetail objects
categories: List of available category strings
count: Number of results in current page
total_count: Total number of matching results

Returns None if the request fails.

PublicBenchmarkDetail Properties

Property	Type	Description
`id`	`str`	Unique benchmark identifier
`key`	`str`	Unique benchmark key
`name`	`str`	Human-readable name
`description`	`str \| None`	Text description
`prompt_count`	`int \| None`	Number of prompts in the benchmark
`language`	`str \| None`	Language of the benchmark
`categories`	`List[str] \| None`	Categories
`characteristics`	`List[str] \| None`	Characteristics
`deprecated`	`bool \| None`	Whether the benchmark is deprecated
`is_public`	`bool \| None`	Whether the benchmark is public

`benchmarks.get_prompts(benchmark_id, ...)`

Fetches prompts/content from a public benchmark with optional search and pagination.

Parameters

Parameter	Type	Required	Description
`benchmark_id`	`str`	Yes	The benchmark ID to fetch prompts from
`page`	`int \| None`	No	Page number (1-based)
`page_size`	`int \| None`	No	Results per page
`search_field`	`str \| None`	No	Search field: `id`, `input`, or `truth`
`search_value`	`str \| None`	No	Search value
`sort_by`	`str \| None`	No	Sort field: `id`, `input`, or `truth`
`sort_order`	`str \| None`	No	Sort order: `asc` or `desc`
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns a BenchmarkPromptsResponse containing:

status: Response status string
data.prompts: List of BenchmarkPrompt objects
data.count: Total number of prompts

Returns None if the request fails.

BenchmarkPrompt Properties

Property	Type	Description
`id`	`str`	Unique prompt identifier
`input`	`str \| List \| Dict`	The prompt input
`truth`	`str`	The expected/ground truth answer

`benchmarks.get_all_prompts(benchmark_id, timeout=None)`

Fetches all prompts from a benchmark by automatically handling pagination.

Parameters

Parameter	Type	Required	Description
`benchmark_id`	`str`	Yes	The benchmark ID to fetch prompts from
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns a List[BenchmarkPrompt] containing all prompts in the benchmark.
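
To make "automatically handling pagination" concrete, here is one way such a loop could be written on top of `get_prompts`. It is a sketch under assumptions, not the SDK's actual implementation: `collect_all_prompts` and its default `page_size` of 100 are hypothetical, and the loop relies only on the documented `data.prompts` and `data.count` fields.

```python
from typing import Any, List


def collect_all_prompts(client: Any, benchmark_id: str, page_size: int = 100) -> List[Any]:
    """Hypothetical pagination loop built on benchmarks.get_prompts."""
    prompts: List[Any] = []
    page = 1
    while True:
        response = client.benchmarks.get_prompts(
            benchmark_id, page=page, page_size=page_size
        )
        if response is None or not response.data.prompts:
            break  # request failed or there are no more results
        prompts.extend(response.data.prompts)
        if len(prompts) >= response.data.count:
            break  # data.count is the total number of prompts
        page += 1
    return prompts
```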

Example

from layerlens import PublicClient

client = PublicClient()

# List benchmarks
benchmarks = client.benchmarks.get(query="mmlu")

if benchmarks and benchmarks.datasets:
    benchmark = benchmarks.datasets[0]

    # Get first page of prompts
    prompts = client.benchmarks.get_prompts(benchmark.id, page=1, page_size=10)

    if prompts:
        print(f"Total prompts: {prompts.data.count}")
        for prompt in prompts.data.prompts:
            print(f"  Input: {str(prompt.input)[:80]}...")
            print(f"  Truth: {prompt.truth[:50]}")

    # Or fetch all prompts at once
    all_prompts = client.benchmarks.get_all_prompts(benchmark.id)
    print(f"All prompts: {len(all_prompts)}")

Evaluations

`evaluations.get_by_id(id, ...)`

Retrieves a single evaluation by its unique identifier, including the full evaluation summary.

Parameters

Parameter	Type	Required	Description
`id`	`str`	Yes	The unique evaluation identifier
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns an Evaluation object if found, None otherwise. See Evaluations for the full Evaluation object properties.

`evaluations.get_many(...)`

Retrieves evaluations with optional pagination, sorting, and filtering.

Parameters

Parameter	Type	Required	Description
`page`	`int \| None`	No	Page number for pagination (1-based, defaults to 1)
`page_size`	`int \| None`	No	Number of evaluations per page (default: 100, max: 500)
`sort_by`	`str \| None`	No	Sort by field: `submitted_at`, `accuracy`, or `average_duration`
`order`	`str \| None`	No	Sort order: `asc` or `desc`
`model_ids`	`List[str] \| None`	No	Filter by model IDs
`benchmark_ids`	`List[str] \| None`	No	Filter by benchmark/dataset IDs
`status`	`EvaluationStatus \| None`	No	Filter by evaluation status
`unique`	`bool`	No	If `True`, deduplicate by model+benchmark pair, keeping only the latest evaluation
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout
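
The effect of `unique=True` can be pictured as a client-side deduplication over plain dictionaries. This is only an illustration of the idea: the field names `model_id`, `benchmark_id`, and `submitted_at` are assumptions, and how the API itself decides which evaluation is "latest" is not specified here.

```python
from typing import Dict, Iterable, List, Tuple


def latest_per_pair(evaluations: Iterable[dict]) -> List[dict]:
    """Keep only the most recent evaluation for each (model, benchmark) pair."""
    latest: Dict[Tuple[str, str], dict] = {}
    for ev in evaluations:
        pair = (ev["model_id"], ev["benchmark_id"])
        current = latest.get(pair)
        # Assumption: a larger submitted_at value means a more recent evaluation.
        if current is None or ev["submitted_at"] > current["submitted_at"]:
            latest[pair] = ev
    return list(latest.values())
```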

Returns

Returns an EvaluationsResponse object containing:

evaluations: List of Evaluation objects
pagination: Pagination metadata with page, page_size, total_pages, and total_count

Returns None if the request fails.

Example

from layerlens import PublicClient
from layerlens.models import EvaluationStatus

client = PublicClient()

# Get a specific evaluation by ID (with full summary)
evaluation = client.evaluations.get_by_id("eval_abc123")
if evaluation:
    print(f"{evaluation.model_name} on {evaluation.benchmark_name}: {evaluation.accuracy:.2f}%")
    if evaluation.summary:
        print(f"Goal: {evaluation.summary.goal}")
        for takeaway in evaluation.summary.analysis_summary.key_takeaways:
            print(f"  - {takeaway}")

# List successful evaluations sorted by accuracy
response = client.evaluations.get_many(
    status=EvaluationStatus.SUCCESS,
    sort_by="accuracy",
    order="desc",
    page_size=10,
)
if response:
    print(f"Top evaluations ({response.pagination.total_count} total):")
    for e in response.evaluations:
        print(f"  {e.model_name}: {e.accuracy:.2f}%")

Comparisons

`comparisons.compare(...)`

Compares the results of two evaluations side by side.

Parameters

Parameter	Type	Required	Description
`evaluation_id_1`	`str`	Yes	First evaluation ID
`evaluation_id_2`	`str`	Yes	Second evaluation ID
`page`	`int \| None`	No	Page number (1-based)
`page_size`	`int \| None`	No	Results per page
`outcome_filter`	`str \| None`	No	Filter by outcome (see below)
`search`	`str \| None`	No	Search within results
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Outcome Filter Options

Value	Description
`"all"`	All results (default)
`"both_succeed"`	Both models answered correctly
`"both_fail"`	Both models answered incorrectly
`"reference_fails"`	First model fails, second succeeds
`"comparison_fails"`	Second model fails, first succeeds
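
The four non-default values partition the results by which side answered correctly, with the first evaluation acting as the "reference" and the second as the "comparison". The mapping can be sketched as a small function. `classify_outcome` is a hypothetical helper, and how the API decides that an answer is correct is not covered here, so the function takes correctness flags as input.

```python
def classify_outcome(first_correct: bool, second_correct: bool) -> str:
    """Map a pair of correctness flags to the matching outcome_filter value."""
    if first_correct and second_correct:
        return "both_succeed"
    if not first_correct and not second_correct:
        return "both_fail"
    # Exactly one side failed: the first evaluation is the "reference",
    # the second is the "comparison".
    return "reference_fails" if not first_correct else "comparison_fails"
```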

Returns

Returns a ComparisonResponse containing:

results: List of ComparisonResult objects
total_count: Total number of comparable results
correct_count_1: Number of correct answers for evaluation 1
total_results_1: Total results for evaluation 1
correct_count_2: Number of correct answers for evaluation 2
total_results_2: Total results for evaluation 2

Returns None if the request fails.
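
The count fields make it straightforward to derive accuracy figures for each side. A minimal sketch, assuming only the four documented count attributes (`accuracy_pct` and `accuracy_gap` are illustrative names, not SDK functions):

```python
from typing import Any


def accuracy_pct(correct_count: int, total_results: int) -> float:
    """Share of correct answers as a percentage; 0.0 when there are no results."""
    if total_results <= 0:
        return 0.0
    return 100.0 * correct_count / total_results


def accuracy_gap(comparison: Any) -> float:
    """Accuracy of evaluation 1 minus accuracy of evaluation 2, in percentage points."""
    first = accuracy_pct(comparison.correct_count_1, comparison.total_results_1)
    second = accuracy_pct(comparison.correct_count_2, comparison.total_results_2)
    return first - second
```

A positive gap means evaluation 1 answered a larger share of its prompts correctly.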

ComparisonResult Properties

Property	Type	Description
`result_id_1`	`int \| None`	Result ID from evaluation 1
`result_id_2`	`int \| None`	Result ID from evaluation 2
`prompt`	`str`	The prompt text
`truth`	`str`	The ground truth answer
`result1`	`str \| None`	Model 1's response
`score1`	`float \| None`	Model 1's score
`result2`	`str \| None`	Model 2's response
`score2`	`float \| None`	Model 2's score

Example

from layerlens import PublicClient

client = PublicClient()

comparison = client.comparisons.compare(
    evaluation_id_1="eval-abc",
    evaluation_id_2="eval-def",
    outcome_filter="reference_fails",
    page=1,
    page_size=20,
)

if comparison:
    print(f"Eval 1: {comparison.correct_count_1}/{comparison.total_results_1}")
    print(f"Eval 2: {comparison.correct_count_2}/{comparison.total_results_2}")

    for result in comparison.results:
        print(f"  Prompt: {result.prompt[:80]}...")
        print(f"  Model 1 score: {result.score1}, Model 2 score: {result.score2}")

`comparisons.compare_models(...)`

Compares two models on a benchmark by automatically finding their most recent successful evaluations. This is a convenience method that wraps compare().

Parameters

Parameter	Type	Required	Description
`benchmark_id`	`str`	Yes	Benchmark ID to compare on
`model_id_1`	`str`	Yes	First model ID
`model_id_2`	`str`	Yes	Second model ID
`page`	`int \| None`	No	Page number (1-based)
`page_size`	`int \| None`	No	Results per page
`outcome_filter`	`str \| None`	No	Filter by outcome (same options as `compare`)
`search`	`str \| None`	No	Search within results
`timeout`	`float \| httpx.Timeout \| None`	No	Override request timeout

Returns

Returns a ComparisonResponse (same as compare()), or None if the comparison request fails.

Raises ValueError if no successful evaluation is found for either model on the given benchmark.
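
Conceptually, the wrapper looks up one evaluation per model and hands both IDs to `compare()`. The sketch below shows how that could be done with the documented `evaluations.get_many` and `comparisons.compare` parameters. It is not the SDK's implementation: both function names are hypothetical, the `id` attribute on evaluation objects is an assumption, and the success status is passed in by the caller (for example `EvaluationStatus.SUCCESS`).

```python
from typing import Any


def latest_successful_evaluation_id(
    client: Any, model_id: str, benchmark_id: str, success_status: Any
) -> str:
    """Find the newest successful evaluation for one model on one benchmark."""
    response = client.evaluations.get_many(
        model_ids=[model_id],
        benchmark_ids=[benchmark_id],
        status=success_status,
        sort_by="submitted_at",
        order="desc",
        page_size=1,
    )
    if not response or not response.evaluations:
        raise ValueError(
            f"No successful evaluation for model {model_id} on benchmark {benchmark_id}"
        )
    return response.evaluations[0].id  # assumption: evaluations expose `id`


def compare_models_sketch(
    client: Any,
    benchmark_id: str,
    model_id_1: str,
    model_id_2: str,
    success_status: Any,
    **compare_kwargs: Any,
) -> Any:
    """Resolve both evaluation IDs, then delegate to comparisons.compare."""
    eval_1 = latest_successful_evaluation_id(client, model_id_1, benchmark_id, success_status)
    eval_2 = latest_successful_evaluation_id(client, model_id_2, benchmark_id, success_status)
    return client.comparisons.compare(
        evaluation_id_1=eval_1, evaluation_id_2=eval_2, **compare_kwargs
    )
```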

Example

from layerlens import PublicClient

client = PublicClient()

# Compare two models on AIME 2025 - no need to look up evaluation IDs
comparison = client.comparisons.compare_models(
    benchmark_id="682bddc1e014f9fa440f8a91",
    model_id_1="699f9761e014f9c3072b0513",
    model_id_2="699f9761e014f9c3072b0512",
    page=1,
    page_size=10,
)

if comparison:
    print(f"Model 1: {comparison.correct_count_1}/{comparison.total_results_1} correct")
    print(f"Model 2: {comparison.correct_count_2}/{comparison.total_results_2} correct")
