Usage Guide

Command Structure

sysmobench --task <task> --method <method> --model <model> --metric <metric> [options]

Required Parameters

--task - System: spin, mutex, rwmutex, etcd, redisraft, curp, dqueue, locksvc, raftkvs
--method - Generation method: direct_call, agent_based, trace_based
--model - Model name (configured in config/models.yaml)

Common Parameters

--metric - Evaluation metric
--spec-file <path> - Use existing TLA+ spec (skip generation)
--config-file <path> - Use existing TLC config (skip generation)

Batch Evaluation

--tasks - Multiple tasks (space-separated)
--metrics - Multiple metrics
--output <dir> - Output directory (default: results/)

List Options

sysmobench --list-tasks
sysmobench --list-methods
sysmobench --list-models
sysmobench --list-metrics

Metrics

SysMoBench evaluates models across four dimensions:

1. Syntax Correctness

Metric	Description	Parameters
`compilation_check`	Full-model compilation with SANY	None
`action_decomposition`	Per-action validation with recovery	None

2. Runtime Correctness

Metric	Description	Parameters
`runtime_check`	Model checking without invariants	`--tlc-timeout <seconds>`
`coverage`	Action coverage via TLC statistics	`--tlc-timeout <seconds>`
`runtime_coverage`	Simulation-based coverage	`--tlc-timeout <seconds>`

3. Conformance to Implementation

Metric	Description	Applies To	Parameters
`trace_validation`	Trace generation and validation	`spin`, `mutex`, `rwmutex`, `etcd`, `redisraft`, `curp`	`--with-exist-traces <N>` `--with-exist-specTrace` `--create-mapping`
`pgo_trace_validation`	Trace validation for PGo systems	`dqueue`, `locksvc`, `raftkvs`	`--with-exist-traces <N>`

4. Invariant Correctness

Metric	Description	Parameters
`invariant_verification`	Verify system-specific safety and liveness properties	`--tlc-timeout <seconds>`

Composite

Metric	Description	Parameters
`composite`	Sequential evaluation across all dimensions	None

Model Configuration

File Location

config/models.yaml

Install Modes

Recommended: pip install -e .
If you need the old native SDK adapters too: pip install -e ".[legacy-providers]"

Configuration Format

models:
  <model_name>:
    provider: "litellm" | "openai" | "anthropic" | "genai" | "deepseek" | "yunwu" | "legacy_openai" | "legacy_anthropic" | "legacy_genai"
    model_name: "<litellm-model-name>"
    api_key_env: "<ENV_VAR_NAME>"
    temperature: <float>
    max_tokens: <int>
    timeout: <int>        # seconds, optional
    top_p: <float>        # optional
    url: "<endpoint>"     # optional, for OpenAI-compatible endpoints
    litellm_params:       # optional, provider-specific passthrough params
      <key>: <value>

Examples

models:
  # Anthropic Claude via LiteLLM
  claude:
    provider: "litellm"
    model_name: "anthropic/claude-sonnet-4-20250514"
    api_key_env: "ANTHROPIC_API_KEY"
    temperature: 0.1
    max_tokens: 64000
    top_p: 0.9

Compatibility notes:

Existing provider: "openai" | "anthropic" | "genai" | "deepseek" | "yunwu" entries still work and now route through the LiteLLM adapter by default.
Use legacy_openai, legacy_anthropic, or legacy_genai if you need to force the old native SDK adapters.
The checked-in config/models.yaml intentionally keeps the old provider names so the repo still runs even before litellm is installed.

Usage

Add model configuration to config/models.yaml
Set environment variable:
```
export YOUR_API_KEY="key"
```

Use model name in command:

sysmobench --task spin --method direct_call --model custom --metric compilation_check

Output Structure

Results in output/<metric>/<task>/<method>_<model>/:

output/coverage/spin/direct_call_gemini/
├── generated_spec.tla
├── generated_config.cfg
├── evaluation_results.json
└── tlc_output.log

Task Configuration

Task configs in tla_eval/tasks/<task>/task.yaml:

name: "spin"
description: "Asterinas OS spinlock"
system_type: "concurrent"
language: "rust"

repository:
  url: "https://github.com/asterinas/asterinas.git"
  branch: "main"

source_files:
  - path: "ostd/src/sync/spin.rs"

default_source_file: "ostd/src/sync/spin.rs"
specModule: "spin"
traces_folder: "data/sys_traces/spin"
...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usage Guide

Command Structure

Required Parameters

Common Parameters

Batch Evaluation

List Options

Metrics

1. Syntax Correctness

2. Runtime Correctness

3. Conformance to Implementation

4. Invariant Correctness

Composite

Model Configuration

File Location

Install Modes

Configuration Format

Examples

Usage

Output Structure

Task Configuration

FilesExpand file tree

Usage.md

Latest commit

History

Usage.md

File metadata and controls

Usage Guide

Command Structure

Required Parameters

Common Parameters

Batch Evaluation

List Options

Metrics

1. Syntax Correctness

2. Runtime Correctness

3. Conformance to Implementation

4. Invariant Correctness

Composite

Model Configuration

File Location

Install Modes

Configuration Format

Examples

Usage

Output Structure

Task Configuration