CLI Command Reference

Complete reference for all madengine CLI commands with detailed options and examples.

Overview
Global Options
Commands
- discover
- build
- run
- report
- database
Exit Codes
Configuration File Format
Environment Variables
Best Practices

Overview

madengine provides a CLI for AI model automation and distributed execution. All commands share one invocation pattern, print formatted terminal output, and report errors through the exit codes listed under Exit Codes.

madengine [OPTIONS] COMMAND [ARGS]...

Global Options

These options are available for the main madengine command:

Option	Description
`--version`	Show version and exit
`--help`	Show help message and exit

Commands

`discover` - Discover Available Models

Discover all models available in the MAD package based on specified tags.

Usage:

madengine discover [OPTIONS]

Options:

Option	Short	Type	Default	Description
`--tags`	`-t`	TEXT	`[]`	Model tags to discover (can specify multiple)
`--verbose`	`-v`	FLAG	`False`	Enable verbose logging

Examples:

# Discover all models
madengine discover

# Discover specific models by tag
madengine discover --tags dummy pyt_huggingface_bert

# Multiple tags with comma separation
madengine discover --tags dummy,multi,vllm

# With verbose output
madengine discover --tags model --verbose

# Directory-specific models
madengine discover --tags dummy2:dummy_2

# Dynamic models with parameters
madengine discover --tags dummy3:dummy_3:batch_size=512

Discovery Methods:

Root models - From models.json in MAD package root
Directory-specific - From scripts/{dir}/models.json
Dynamic models - Generated by scripts/{dir}/get_models_json.py
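
The tag examples above suggest that the number of colon-separated fields selects the discovery method. The sketch below classifies a tag on that basis. `describe_tag` is a hypothetical helper, and the mapping from field count to discovery method is inferred from the examples rather than confirmed by madengine, so treat it as an assumption.

```shell
# describe_tag: hypothetical helper, not part of madengine.
# Assumption: one field = root model tag, two fields = directory-specific,
# three fields = dynamic model with parameters (inferred from the examples).
describe_tag() {
  case "$1" in
    *:*:*) echo "dynamic" ;;
    *:*)   echo "directory-specific" ;;
    *)     echo "root" ;;
  esac
}
```

For instance, `describe_tag dummy2:dummy_2` would classify the tag as directory-specific under these assumptions.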

`build` - Build Docker Images

Build Docker images for models, optionally pushing them to a registry.

Usage:

madengine build [OPTIONS]

Options:

Option	Short	Type	Default	Description
`--tags`	`-t`	TEXT	`[]`	Model tags to build (can specify multiple)
`--target-archs`	`-a`	TEXT	`[]`	Target GPU architectures (e.g., gfx908,gfx90a,gfx942)
`--registry`	`-r`	TEXT	`None`	Docker registry to push images to
`--batch-manifest`		TEXT	`None`	Input batch.json file for batch build mode
`--additional-context`	`-c`	TEXT	`"{}"`	Additional context as JSON string
`--additional-context-file`	`-f`	TEXT	`None`	File containing additional context JSON
`--clean-docker-cache`		FLAG	`False`	Rebuild images without using cache
`--manifest-output`	`-m`	TEXT	`build_manifest.json`	Output file for build manifest
`--summary-output`	`-s`	TEXT	`None`	Output file for build summary JSON
`--live-output`	`-l`	FLAG	`False`	Print output in real-time
`--verbose`	`-v`	FLAG	`False`	Enable verbose logging

Examples:

# Basic build
madengine build --tags dummy \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Build with registry
madengine build --tags model \
  --registry docker.io/myorg \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Build multiple models
madengine build --tags model1 model2 model3 \
  --registry localhost:5000

# Build for multiple GPU architectures
madengine build --tags model \
  --target-archs gfx908 gfx90a gfx942 \
  --registry gcr.io/myproject

# Clean rebuild without cache
madengine build --tags model --clean-docker-cache

# Batch build mode (selective builds)
madengine build --batch-manifest batch.json \
  --registry docker.io/myorg \
  --additional-context-file config.json

# Custom manifest output
madengine build --tags model \
  --manifest-output my_manifest.json \
  --summary-output build_summary.json

# Real-time output with verbose logging
madengine build --tags model --live-output --verbose

Default Values:

The build command applies the following defaults if not specified:

gpu_vendor: AMD
guest_os: UBUNTU

Example with defaults:

# Equivalent to providing {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}
madengine build --tags dummy

You will see a message indicating which defaults were applied:

ℹ️  Using default values for build configuration:
   • gpu_vendor: AMD (default)
   • guest_os: UBUNTU (default)

💡 To customize, use --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'

Supported Values:

gpu_vendor: "AMD" or "NVIDIA"
guest_os: "UBUNTU" or "CENTOS"
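
Because only these values are supported, a wrapper script can check its inputs before invoking the build. The following is a minimal sketch: `validate_build_context` is a hypothetical helper (not a madengine command) that applies the documented defaults and prints a JSON string suitable for `--additional-context`.

```shell
# validate_build_context: hypothetical pre-flight helper, not part of madengine.
# Arguments: $1 = gpu_vendor (default AMD), $2 = guest_os (default UBUNTU).
validate_build_context() {
  vendor=${1:-AMD}
  os=${2:-UBUNTU}
  case "$vendor" in
    AMD|NVIDIA) ;;
    *) echo "unsupported gpu_vendor: $vendor" >&2; return 1 ;;
  esac
  case "$os" in
    UBUNTU|CENTOS) ;;
    *) echo "unsupported guest_os: $os" >&2; return 1 ;;
  esac
  # Emit the context as a single-line JSON object.
  printf '{"gpu_vendor": "%s", "guest_os": "%s"}\n' "$vendor" "$os"
}
```

One possible use is `madengine build --tags dummy --additional-context "$(validate_build_context NVIDIA CENTOS)"`.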

Batch Build Mode:

When using --batch-manifest, provide a JSON file with selective build configuration:

[
  {
    "model_name": "model1",
    "build_new": true,
    "registry": "docker.io/myorg",
    "registry_image": "custom-namespace/model1"
  },
  {
    "model_name": "model2",
    "build_new": false
  }
]

See Batch Build Guide for details.

`run` - Execute Models

Run models locally or deploy to Kubernetes/SLURM clusters.

Usage:

madengine run [OPTIONS]

Options:

Option	Short	Type	Default	Description
`--tags`	`-t`	TEXT	`[]`	Model tags to run (can specify multiple)
`--manifest-file`	`-m`	TEXT	`""`	Build manifest file path (for pre-built images)
`--rocm-path`		TEXT	`None`	ROCm installation root (default: `ROCM_PATH` env or `/opt/rocm`). Use when ROCm is not in `/opt/rocm` (e.g. Rock, pip).
`--registry`	`-r`	TEXT	`None`	Docker registry URL
`--timeout`		INT	`-1`	Timeout in seconds (-1=default 7200s, 0=no timeout)
`--additional-context`	`-c`	TEXT	`"{}"`	Additional context as JSON string
`--additional-context-file`	`-f`	TEXT	`None`	File containing additional context JSON
`--keep-alive`		FLAG	`False`	Keep Docker containers alive after run
`--keep-model-dir`		FLAG	`False`	Keep model directory after run
`--clean-docker-cache`		FLAG	`False`	Rebuild images without using cache (full workflow)
`--skip-model-run`		FLAG	`False`	After a build in this invocation, skip executing models (manifest/images still produced). Ignored when using `--manifest-file` with an existing manifest (run-only), or when no build ran in this invocation. See Usage — Skip model run.
`--manifest-output`		TEXT	`build_manifest.json`	Output file for build manifest (full workflow)
`--summary-output`	`-s`	TEXT	`None`	Output file for summary JSON
`--live-output`	`-l`	FLAG	`False`	Print output in real-time
`--output`	`-o`	TEXT	`perf_entry.csv`	Performance output file
`--ignore-deprecated`		FLAG	`False`	Force run deprecated models
`--data-config`		TEXT	`data.json`	Custom data configuration file
`--tools-config`		TEXT	`tools.json`	Custom tools JSON configuration
`--sys-env-details`		FLAG	`True`	Generate system config env details
`--force-mirror-local`		TEXT	`None`	Path to force local data mirroring
`--disable-skip-gpu-arch`		FLAG	`False`	Disable skipping models based on GPU architecture
`--verbose`	`-v`	FLAG	`False`	Enable verbose logging
`--cleanup-perf`		FLAG	`False`	Remove intermediate perf_entry files after run (keeps perf.csv and perf_super files)
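
The `--timeout` row packs three cases into one value. As a reading aid, the sketch below restates them as a small function. `effective_timeout` is a hypothetical name, and the function only mirrors the table's description; it is not madengine's own logic.

```shell
# effective_timeout: hypothetical helper restating the --timeout table row.
# -1 (or no argument) -> default of 7200 seconds, 0 -> no timeout,
# any other value is passed through unchanged.
effective_timeout() {
  case "${1:--1}" in
    -1) echo 7200 ;;
    0)  echo none ;;
    *)  echo "$1" ;;
  esac
}
```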

Examples:

# Local execution
madengine run --tags dummy \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Custom ROCm path (when ROCm is not in /opt/rocm, e.g. Rock or pip install)
madengine run --tags dummy --rocm-path /path/to/rocm \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Run with pre-built images (manifest-based)
madengine run --manifest-file build_manifest.json

# Build in this invocation but skip executing containers (CI: images + manifest only)
madengine run --tags model \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}' \
  --skip-model-run

# Multi-GPU with torchrun
madengine run --tags model \
  --additional-context '{
    "gpu_vendor": "AMD",
    "guest_os": "UBUNTU",
    "docker_gpus": "0,1,2,3",
    "distributed": {
      "launcher": "torchrun",
      "nproc_per_node": 4
    }
  }'

# Kubernetes deployment (minimal config)
madengine run --tags model \
  --additional-context '{"k8s": {"gpu_count": 2}}'

# Kubernetes multi-node with vLLM
madengine run --tags model \
  --additional-context '{
    "k8s": {"gpu_count": 8},
    "distributed": {
      "launcher": "vllm",
      "nnodes": 2,
      "nproc_per_node": 4
    }
  }'

# SLURM deployment
madengine run --tags model \
  --additional-context '{
    "slurm": {
      "partition": "gpu",
      "nodes": 4,
      "gpus_per_node": 8
    },
    "distributed": {
      "launcher": "torchtitan",
      "nnodes": 4,
      "nproc_per_node": 8
    }
  }'

# With profiling tools
madengine run --tags model \
  --additional-context '{
    "gpu_vendor": "AMD",
    "guest_os": "UBUNTU",
    "tools": [
      {"name": "rocprof"},
      {"name": "gpu_info_power_profiler"}
    ]
  }'

# Explicit timeout of 7200 seconds (2 hours, the same as the default)
madengine run --tags model --timeout 7200

# No timeout (run indefinitely)
madengine run --tags model --timeout 0

# Keep container alive for debugging
madengine run --tags model --keep-alive --verbose

# Real-time output
madengine run --tags model --live-output

# Custom performance output file
madengine run --tags model --output my_perf_results.csv

# Clean up intermediate perf files after run
madengine run --tags model --cleanup-perf

# Using configuration file
madengine run --tags model \
  --additional-context-file k8s-config.json

Execution Modes:

Full Workflow - Build + Run (when no manifest exists)
Execution Only - Run only (when manifest-file provided and exists)
Manifest-based - Use pre-built images from manifest
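
A wrapper script may want to know in advance which mode a given invocation is likely to use. The sketch below applies the rule as stated above: execution only when a manifest path is given and the file exists, full workflow otherwise. `execution_mode` is a hypothetical helper, not a madengine command.

```shell
# execution_mode: hypothetical helper, not part of madengine.
# $1 = value that would be passed to --manifest-file (may be empty).
execution_mode() {
  manifest=${1:-}
  if [ -n "$manifest" ] && [ -f "$manifest" ]; then
    echo "execution-only"
  else
    echo "full-workflow"
  fi
}
```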

Deployment Targets:

Local - Docker containers on local machine
Kubernetes - Detected when k8s key present in context
SLURM - Detected when slurm key present in context
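
Since the target is chosen by which key appears in the context, a script can make a rough guess before running. The sketch below does a plain textual check on the JSON string. `deployment_target` is a hypothetical helper; it does not parse JSON, and the precedence when both keys are present is an assumption, so it is only an approximation of whatever madengine does internally.

```shell
# deployment_target: hypothetical helper, not part of madengine.
# Crude textual check: looks for a quoted "k8s" or "slurm" anywhere in the
# context string. Assumption: k8s wins if both appear.
deployment_target() {
  case "$1" in
    *'"k8s"'*)   echo "kubernetes" ;;
    *'"slurm"'*) echo "slurm" ;;
    *)           echo "local" ;;
  esac
}
```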

Performance Output:

Results are saved to a CSV file (default: perf_entry.csv; change it with --output), with metrics including:

Execution time
GPU utilization
Memory usage
Model-specific performance metrics

`report` - Generate Reports

Generate HTML reports from CSV performance files.

Subcommands

`report to-html` - Convert CSV to HTML

Convert a single CSV file to HTML table format.

Usage:

madengine report to-html [OPTIONS]

Options:

Option	Short	Type	Required	Description
`--csv-file`		TEXT	Yes	Path to the CSV file to convert
`--verbose`	`-v`	FLAG	No	Enable verbose logging

Examples:

# Convert CSV to HTML
madengine report to-html --csv-file perf_entry.csv

# With custom CSV file
madengine report to-html --csv-file results/perf_mi300.csv

# Verbose output
madengine report to-html --csv-file perf.csv --verbose

Output: Creates {filename}.html in the same directory as the CSV file.
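
If a later pipeline step needs the generated file, its path can be derived from the CSV path. The sketch below assumes that `{filename}` means the CSV file name with its `.csv` extension removed; that reading is an assumption, and `html_output_path` is a hypothetical helper.

```shell
# html_output_path: hypothetical helper, not part of madengine.
# Assumption: {filename}.html replaces the trailing .csv of the input path.
html_output_path() {
  csv=$1
  echo "${csv%.csv}.html"
}
```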

`report to-email` - Generate Email Report

Convert all CSV files in a directory to a consolidated email-ready HTML report.

Usage:

madengine report to-email [OPTIONS]

Options:

Option	Short	Type	Default	Description
`--directory`	`--dir`	TEXT	`"."`	Path to directory containing CSV files
`--output`	`-o`	TEXT	`run_results.html`	Output HTML filename
`--verbose`	`-v`	FLAG	`False`	Enable verbose logging

Examples:

# Generate email report from current directory
madengine report to-email

# Specify directory
madengine report to-email --directory ./results

# Custom output filename
madengine report to-email --dir ./results --output summary.html

# Verbose output
madengine report to-email --directory ./results --verbose

Output: Creates a consolidated HTML report (default: run_results.html) suitable for email distribution.

`database` - Upload to MongoDB

Upload CSV performance data to a MongoDB database.

Usage:

madengine database [OPTIONS]

Options:

Option	Short	Type	Default	Required	Description
`--csv-file`		TEXT	`perf_entry.csv`	No	Path to the CSV file to upload
`--database-name`	`--db`	TEXT	`None`	Yes	Name of the MongoDB database
`--collection-name`	`--collection`	TEXT	`None`	Yes	Name of the MongoDB collection
`--verbose`	`-v`	FLAG	`False`	No	Enable verbose logging

Examples:

# Upload to MongoDB
madengine database \
  --csv-file perf_entry.csv \
  --database-name mydb \
  --collection-name results

# Option aliases (--db, --collection)
madengine database \
  --csv-file perf.csv \
  --db test \
  --collection perf_data

# With verbose output
madengine database \
  --csv-file perf.csv \
  --db mydb \
  --collection results \
  --verbose

Environment Variables:

MongoDB connection details are read from environment variables:

Variable	Description	Example
`MONGO_HOST`	MongoDB host address	`localhost` or `mongodb.example.com`
`MONGO_PORT`	MongoDB port	`27017`
`MONGO_USER`	MongoDB username	`admin`
`MONGO_PASSWORD`	MongoDB password	`secretpassword`

Example Setup:

export MONGO_HOST=mongodb.example.com
export MONGO_PORT=27017
export MONGO_USER=myuser
export MONGO_PASSWORD=mypassword

madengine database \
  --csv-file perf_entry.csv \
  --db performance_db \
  --collection model_runs
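
Before uploading, a script can check which connection variables are unset. The sketch below lists them. `missing_mongo_vars` is a hypothetical helper; note that, per the Environment Variables table, `MONGO_HOST` and `MONGO_PORT` have defaults, so only a missing user or password is likely to matter.

```shell
# missing_mongo_vars: hypothetical helper, not part of madengine.
# Prints the names of the MongoDB variables that are unset or empty,
# separated by spaces (prints an empty line if all are set).
missing_mongo_vars() {
  missing=""
  for name in MONGO_HOST MONGO_PORT MONGO_USER MONGO_PASSWORD; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
}
```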

Exit Codes

madengine uses standard exit codes so scripts and CI (e.g. Jenkins) can detect success or failure:

Code	Constant	Description
`0`	`SUCCESS`	Command completed successfully
`1`	`FAILURE`	General failure
`2`	`BUILD_FAILURE`	One or more image builds failed (e.g. Docker build error)
`3`	`RUN_FAILURE`	One or more model executions failed
`4`	`INVALID_ARGS`	Invalid command-line arguments or configuration

Failure recording: Pre-run failures (e.g. image pull, setup) and run failures are recorded in the performance table (perf.csv) with status FAILURE, so all attempted models appear in the CSV. The file is created automatically if missing.

Example usage in scripts / CI:

#!/bin/bash

madengine build --tags model
rc=$?  # capture immediately; any later command overwrites $?
if [ "$rc" -eq 0 ]; then
  echo "Build successful"
  madengine run --manifest-file build_manifest.json
else
  echo "Build failed with exit code $rc"
  exit "$rc"
fi
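
For CI logs it can help to print the constant name rather than the bare number. The sketch below maps the codes from the exit code table to their names. `exit_code_name` is a hypothetical helper, and the `UNKNOWN` fallback is an addition for codes the table does not list.

```shell
# exit_code_name: hypothetical helper, not part of madengine.
# Maps a madengine exit code to the constant name from the exit code table.
exit_code_name() {
  case "$1" in
    0) echo "SUCCESS" ;;
    1) echo "FAILURE" ;;
    2) echo "BUILD_FAILURE" ;;
    3) echo "RUN_FAILURE" ;;
    4) echo "INVALID_ARGS" ;;
    *) echo "UNKNOWN" ;;  # not in the documented table
  esac
}
```

A script could store the status in a variable right after the madengine call and then log `exit_code_name "$rc"` alongside it.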

Configuration File Format

For complex configurations, use JSON files with --additional-context-file:

Example: config.json

{
  "gpu_vendor": "AMD",
  "guest_os": "UBUNTU",
  "docker_gpus": "0,1,2,3",
  "timeout_multiplier": 2.0,
  "docker_env_vars": {
    "PYTORCH_TUNABLEOP_ENABLED": "1",
    "HSA_ENABLE_SDMA": "0",
    "NCCL_DEBUG": "INFO"
  },
  "distributed": {
    "launcher": "torchrun",
    "nnodes": 1,
    "nproc_per_node": 4
  }
}

Example: k8s-config.json

{
  "gpu_vendor": "AMD",
  "k8s": {
    "namespace": "ml-team",
    "gpu_count": 8,
    "cpu_request": "32",
    "memory_request": "256Gi",
    "node_selector": {
      "gpu-type": "mi300x"
    }
  },
  "distributed": {
    "launcher": "vllm",
    "nnodes": 2,
    "nproc_per_node": 4
  }
}

Example: slurm-config.json

{
  "gpu_vendor": "AMD",
  "slurm": {
    "partition": "gpu",
    "nodes": 4,
    "gpus_per_node": 8,
    "time": "24:00:00",
    "account": "ml_research",
    "qos": "high"
  },
  "distributed": {
    "launcher": "torchtitan",
    "nnodes": 4,
    "nproc_per_node": 8
  }
}

To run on specific nodes, add "nodelist": "node01,node02" to the slurm section. When set, the job runs only on those nodes and node health preflight is skipped. See examples/slurm-configs/basic/03-multi-node-basic-nodelist.json.

Run phase: log error pattern scan (optional)

These keys apply to local Docker runs when madengine post-processes the run log. Use them when substring matches cause false FAILURE status (for example benign RuntimeError: lines). Full details: Configuration — Run phase: log error pattern scan.

Key	Description
`log_error_pattern_scan`	Default `true`. Set `false` to skip grep-based log failure detection.
`log_error_benign_patterns`	Array of extra strings to exclude from matching (merged with built-in benign list).
`log_error_patterns`	Non-empty array replaces the default substring list (advanced).
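
To see why a benign list matters, the following sketch imitates a substring scan with one error pattern and one benign pattern. It is an illustration of the general technique using `grep -F`, not madengine's implementation; `scan_log` and its arguments are hypothetical.

```shell
# scan_log: hypothetical illustration of a substring-based log scan.
# $1 = log file, $2 = error substring, $3 = benign substring (must be non-empty).
# Prints FAILURE if any line contains the error substring and is not excused
# by the benign substring; prints SUCCESS otherwise.
scan_log() {
  log=$1
  pattern=$2
  benign=$3
  if grep -F -e "$pattern" "$log" | grep -v -F -e "$benign" >/dev/null; then
    echo "FAILURE"
  else
    echo "SUCCESS"
  fi
}
```

With a log whose only `RuntimeError:` line also contains the benign substring, the scan reports SUCCESS, which is the effect `log_error_benign_patterns` is meant to achieve.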

Environment Variables

madengine recognizes these environment variables:

Variable	Description	Default
`MODEL_DIR`	Path to MAD package directory	Auto-detected
`ROCM_PATH`	ROCm installation root (used when `--rocm-path` not set)	`/opt/rocm`
`MAD_VERBOSE_CONFIG`	Enable verbose configuration logging	`false`
`MAD_DOCKERHUB_USER`	Docker Hub username	None
`MAD_DOCKERHUB_PASSWORD`	Docker Hub password/token	None
`MAD_DOCKERHUB_REPO`	Docker Hub repository	None
`MAD_CONTAINER_IMAGE`	Pre-built container image to use	None
`MONGO_HOST`	MongoDB host for database command	`localhost`
`MONGO_PORT`	MongoDB port for database command	`27017`
`MONGO_USER`	MongoDB username	None
`MONGO_PASSWORD`	MongoDB password	None
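
The ROCm path has three possible sources: the `--rocm-path` option, the `ROCM_PATH` variable, and the `/opt/rocm` default. The sketch below restates that precedence as documented. `resolve_rocm_path` is a hypothetical helper and only mirrors the documented order; it is not madengine code.

```shell
# resolve_rocm_path: hypothetical helper, not part of madengine.
# $1 = value that would be passed to --rocm-path (may be empty).
# Precedence: command-line value, then ROCM_PATH, then /opt/rocm.
resolve_rocm_path() {
  cli_value=${1:-}
  if [ -n "$cli_value" ]; then
    echo "$cli_value"
  else
    echo "${ROCM_PATH:-/opt/rocm}"
  fi
}
```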

Best Practices

Use configuration files for complex setups instead of long command lines
Separate build and run phases for distributed deployments
Test locally first before deploying to clusters
Use registries for distributed execution across multiple nodes
Enable verbose logging (--verbose) when debugging issues
Use real-time output (--live-output) for long-running operations
Version your configuration files alongside your model code
Use batch build mode for CI/CD pipelines to optimize build times
