sysmobench --task <task> --method <method> --model <model> --metric <metric> [options]--task- System:spin,mutex,rwmutex,etcd,redisraft,curp,dqueue,locksvc,raftkvs--method- Generation method:direct_call,agent_based,trace_based--model- Model name (configured inconfig/models.yaml)
--metric- Evaluation metric--spec-file <path>- Use existing TLA+ spec (skip generation)--config-file <path>- Use existing TLC config (skip generation)
--tasks- Multiple tasks (space-separated)--metrics- Multiple metrics--output <dir>- Output directory (default:results/)
sysmobench --list-tasks
sysmobench --list-methods
sysmobench --list-models
sysmobench --list-metricsSysMoBench evaluates models across four dimensions:
| Metric | Description | Parameters |
|---|---|---|
compilation_check |
Full-model compilation with SANY | None |
action_decomposition |
Per-action validation with recovery | None |
| Metric | Description | Parameters |
|---|---|---|
runtime_check |
Model checking without invariants | --tlc-timeout <seconds> |
coverage |
Action coverage via TLC statistics | --tlc-timeout <seconds> |
runtime_coverage |
Simulation-based coverage | --tlc-timeout <seconds> |
| Metric | Description | Applies To | Parameters |
|---|---|---|---|
trace_validation |
Trace generation and validation | spin, mutex, rwmutex, etcd, redisraft, curp |
--with-exist-traces <N>--with-exist-specTrace--create-mapping |
pgo_trace_validation |
Trace validation for PGo systems | dqueue, locksvc, raftkvs |
--with-exist-traces <N> |
| Metric | Description | Parameters |
|---|---|---|
invariant_verification |
Verify system-specific safety and liveness properties | --tlc-timeout <seconds> |
| Metric | Description | Parameters |
|---|---|---|
composite |
Sequential evaluation across all dimensions | None |
config/models.yaml
- Recommended:
pip install -e . - If you need the old native SDK adapters too:
pip install -e ".[legacy-providers]"
models:
<model_name>:
provider: "litellm" | "openai" | "anthropic" | "genai" | "deepseek" | "yunwu" | "legacy_openai" | "legacy_anthropic" | "legacy_genai"
model_name: "<litellm-model-name>"
api_key_env: "<ENV_VAR_NAME>"
temperature: <float>
max_tokens: <int>
timeout: <int> # seconds, optional
top_p: <float> # optional
url: "<endpoint>" # optional, for OpenAI-compatible endpoints
litellm_params: # optional, provider-specific passthrough params
<key>: <value>models:
# Anthropic Claude via LiteLLM
claude:
provider: "litellm"
model_name: "anthropic/claude-sonnet-4-20250514"
api_key_env: "ANTHROPIC_API_KEY"
temperature: 0.1
max_tokens: 64000
top_p: 0.9Compatibility notes:
- Existing
provider: "openai" | "anthropic" | "genai" | "deepseek" | "yunwu"entries still work and now route through the LiteLLM adapter by default. - Use
legacy_openai,legacy_anthropic, orlegacy_genaiif you need to force the old native SDK adapters. - The checked-in
config/models.yamlintentionally keeps the old provider names so the repo still runs even beforelitellmis installed.
- Add model configuration to
config/models.yaml - Set environment variable:
export YOUR_API_KEY="key"
- Use model name in command:
sysmobench --task spin --method direct_call --model custom --metric compilation_check
Results in output/<metric>/<task>/<method>_<model>/:
output/coverage/spin/direct_call_gemini/
├── generated_spec.tla
├── generated_config.cfg
├── evaluation_results.json
└── tlc_output.log
Task configs in tla_eval/tasks/<task>/task.yaml:
name: "spin"
description: "Asterinas OS spinlock"
system_type: "concurrent"
language: "rust"
repository:
url: "https://github.com/asterinas/asterinas.git"
branch: "main"
source_files:
- path: "ostd/src/sync/spin.rs"
default_source_file: "ostd/src/sync/spin.rs"
specModule: "spin"
traces_folder: "data/sys_traces/spin"
...