Complete reference for all madengine CLI commands with detailed options and examples.
madengine provides a modern CLI for AI model automation and distributed execution. All commands follow a consistent pattern with rich terminal output and comprehensive error handling.
```
madengine [OPTIONS] COMMAND [ARGS]...
```

These options are available for the main `madengine` command:
| Option | Description |
|---|---|
| `--version` | Show version and exit |
| `--help` | Show help message and exit |
Discover all models available in the MAD package based on specified tags.
Usage:

```
madengine discover [OPTIONS]
```

Options:
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--tags` | `-t` | TEXT | `[]` | Model tags to discover (can specify multiple) |
| `--verbose` | `-v` | FLAG | `False` | Enable verbose logging |
Examples:

```bash
# Discover all models
madengine discover

# Discover specific models by tag
madengine discover --tags dummy pyt_huggingface_bert

# Multiple tags with comma separation
madengine discover --tags dummy,multi,vllm

# With verbose output
madengine discover --tags model --verbose

# Directory-specific models
madengine discover --tags dummy2:dummy_2

# Dynamic models with parameters
madengine discover --tags dummy3:dummy_3:batch_size=512
```

Discovery Methods:
- Root models - from `models.json` in the MAD package root
- Directory-specific - from `scripts/{dir}/models.json`
- Dynamic models - generated by `scripts/{dir}/get_models_json.py`
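A dynamic model generator can be illustrated with a minimal sketch. The script name `scripts/{dir}/get_models_json.py` comes from this guide; everything else here (the field names `name`, `tags`, and `args`, and the assumption that the script emits JSON on stdout) is an illustrative guess, not the documented contract.

```python
# Hypothetical sketch of a scripts/{dir}/get_models_json.py generator.
# Assumption: madengine executes this script and reads a JSON list of
# model entries from stdout. Field names below are illustrative only.
import json


def get_models(batch_size: int = 512) -> list:
    """Generate model entries parameterized by batch_size."""
    return [
        {
            "name": f"dummy_3_bs{batch_size}",
            "tags": ["dummy3", "dynamic"],
            "args": {"batch_size": batch_size},
        }
    ]


if __name__ == "__main__":
    print(json.dumps(get_models(), indent=2))
```

With a sketch like this, `madengine discover --tags dummy3:dummy_3:batch_size=512` would pass `batch_size=512` through to the generator.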
Build Docker images for models, optionally pushing them to a registry.
Usage:

```
madengine build [OPTIONS]
```

Options:
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--tags` | `-t` | TEXT | `[]` | Model tags to build (can specify multiple) |
| `--target-archs` | `-a` | TEXT | `[]` | Target GPU architectures (e.g., gfx908,gfx90a,gfx942) |
| `--registry` | `-r` | TEXT | `None` | Docker registry to push images to |
| `--batch-manifest` | | TEXT | `None` | Input batch.json file for batch build mode |
| `--additional-context` | `-c` | TEXT | `"{}"` | Additional context as JSON string |
| `--additional-context-file` | `-f` | TEXT | `None` | File containing additional context JSON |
| `--clean-docker-cache` | | FLAG | `False` | Rebuild images without using cache |
| `--manifest-output` | `-m` | TEXT | `build_manifest.json` | Output file for build manifest |
| `--summary-output` | `-s` | TEXT | `None` | Output file for build summary JSON |
| `--live-output` | `-l` | FLAG | `False` | Print output in real-time |
| `--verbose` | `-v` | FLAG | `False` | Enable verbose logging |
Examples:

```bash
# Basic build
madengine build --tags dummy \
    --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Build with registry
madengine build --tags model \
    --registry docker.io/myorg \
    --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Build multiple models
madengine build --tags model1 model2 model3 \
    --registry localhost:5000

# Build for multiple GPU architectures
madengine build --tags model \
    --target-archs gfx908 gfx90a gfx942 \
    --registry gcr.io/myproject

# Clean rebuild without cache
madengine build --tags model --clean-docker-cache

# Batch build mode (selective builds)
madengine build --batch-manifest batch.json \
    --registry docker.io/myorg \
    --additional-context-file config.json

# Custom manifest output
madengine build --tags model \
    --manifest-output my_manifest.json \
    --summary-output build_summary.json

# Real-time output with verbose logging
madengine build --tags model --live-output --verbose
```

Default Values:
The build command applies the following defaults if not specified:

- `gpu_vendor`: `AMD`
- `guest_os`: `UBUNTU`
Example with defaults:

```bash
# Equivalent to providing {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}
madengine build --tags dummy
```

You will see a message indicating which defaults were applied:

```
ℹ️ Using default values for build configuration:
   • gpu_vendor: AMD (default)
   • guest_os: UBUNTU (default)
💡 To customize, use --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'
```
Supported Values:

- `gpu_vendor`: `"AMD"` or `"NVIDIA"`
- `guest_os`: `"UBUNTU"` or `"CENTOS"`
Batch Build Mode:

When using `--batch-manifest`, provide a JSON file with selective build configuration:

```json
[
    {
        "model_name": "model1",
        "build_new": true,
        "registry": "docker.io/myorg",
        "registry_image": "custom-namespace/model1"
    },
    {
        "model_name": "model2",
        "build_new": false
    }
]
```

See Batch Build Guide for details.
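Before handing a batch manifest to `madengine build`, it can be worth a quick structural check. The sketch below validates the two fields shown in the example above (`model_name`, `build_new`); it is a convenience written for this guide, not part of madengine itself.

```python
# Minimal pre-flight check for a batch.json manifest (illustrative sketch;
# madengine does not ship this helper). Field names follow the example above.
import json

REQUIRED = {"model_name", "build_new"}


def validate_batch_manifest(entries: list) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        elif not isinstance(entry["build_new"], bool):
            problems.append(f"entry {i}: build_new must be a boolean")
    return problems


batch = json.loads("""
[
  {"model_name": "model1", "build_new": true,
   "registry": "docker.io/myorg", "registry_image": "custom-namespace/model1"},
  {"model_name": "model2", "build_new": false}
]
""")
assert validate_batch_manifest(batch) == []
```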
Run models locally or deploy to Kubernetes/SLURM clusters.
Usage:

```
madengine run [OPTIONS]
```

Options:
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--tags` | `-t` | TEXT | `[]` | Model tags to run (can specify multiple) |
| `--manifest-file` | `-m` | TEXT | `""` | Build manifest file path (for pre-built images) |
| `--rocm-path` | | TEXT | `None` | ROCm installation root (default: `ROCM_PATH` env or `/opt/rocm`). Use when ROCm is not in `/opt/rocm` (e.g. Rock, pip). |
| `--registry` | `-r` | TEXT | `None` | Docker registry URL |
| `--timeout` | | INT | `-1` | Timeout in seconds (`-1` = default 7200s, `0` = no timeout) |
| `--additional-context` | `-c` | TEXT | `"{}"` | Additional context as JSON string |
| `--additional-context-file` | `-f` | TEXT | `None` | File containing additional context JSON |
| `--keep-alive` | | FLAG | `False` | Keep Docker containers alive after run |
| `--keep-model-dir` | | FLAG | `False` | Keep model directory after run |
| `--clean-docker-cache` | | FLAG | `False` | Rebuild images without using cache (full workflow) |
| `--skip-model-run` | | FLAG | `False` | After a build in this invocation, skip executing models (manifest/images still produced). Ignored when using `--manifest-file` with an existing manifest (run-only), or when no build ran in this invocation. See Usage — Skip model run. |
| `--manifest-output` | | TEXT | `build_manifest.json` | Output file for build manifest (full workflow) |
| `--summary-output` | `-s` | TEXT | `None` | Output file for summary JSON |
| `--live-output` | `-l` | FLAG | `False` | Print output in real-time |
| `--output` | `-o` | TEXT | `perf_entry.csv` | Performance output file |
| `--ignore-deprecated` | | FLAG | `False` | Force run deprecated models |
| `--data-config` | | TEXT | `data.json` | Custom data configuration file |
| `--tools-config` | | TEXT | `tools.json` | Custom tools JSON configuration |
| `--sys-env-details` | | FLAG | `True` | Generate system config env details |
| `--force-mirror-local` | | TEXT | `None` | Path to force local data mirroring |
| `--disable-skip-gpu-arch` | | FLAG | `False` | Disable skipping models based on GPU architecture |
| `--verbose` | `-v` | FLAG | `False` | Enable verbose logging |
| `--cleanup-perf` | | FLAG | `False` | Remove intermediate perf_entry files after run (keeps perf.csv and perf_super files) |
Examples:

```bash
# Local execution
madengine run --tags dummy \
    --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Custom ROCm path (when ROCm is not in /opt/rocm, e.g. Rock or pip install)
madengine run --tags dummy --rocm-path /path/to/rocm \
    --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

# Run with pre-built images (manifest-based)
madengine run --manifest-file build_manifest.json

# Build in this invocation but skip executing containers (CI: images + manifest only)
madengine run --tags model \
    --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}' \
    --skip-model-run

# Multi-GPU with torchrun
madengine run --tags model \
    --additional-context '{
        "gpu_vendor": "AMD",
        "guest_os": "UBUNTU",
        "docker_gpus": "0,1,2,3",
        "distributed": {
            "launcher": "torchrun",
            "nproc_per_node": 4
        }
    }'

# Kubernetes deployment (minimal config)
madengine run --tags model \
    --additional-context '{"k8s": {"gpu_count": 2}}'

# Kubernetes multi-node with vLLM
madengine run --tags model \
    --additional-context '{
        "k8s": {"gpu_count": 8},
        "distributed": {
            "launcher": "vllm",
            "nnodes": 2,
            "nproc_per_node": 4
        }
    }'

# SLURM deployment
madengine run --tags model \
    --additional-context '{
        "slurm": {
            "partition": "gpu",
            "nodes": 4,
            "gpus_per_node": 8
        },
        "distributed": {
            "launcher": "torchtitan",
            "nnodes": 4,
            "nproc_per_node": 8
        }
    }'

# With profiling tools
madengine run --tags model \
    --additional-context '{
        "gpu_vendor": "AMD",
        "guest_os": "UBUNTU",
        "tools": [
            {"name": "rocprof"},
            {"name": "gpu_info_power_profiler"}
        ]
    }'

# Custom timeout (2 hours)
madengine run --tags model --timeout 7200

# No timeout (run indefinitely)
madengine run --tags model --timeout 0

# Keep container alive for debugging
madengine run --tags model --keep-alive --verbose

# Real-time output
madengine run --tags model --live-output

# Custom performance output file
madengine run --tags model --output my_perf_results.csv

# Clean up intermediate perf files after run
madengine run --tags model --cleanup-perf

# Using configuration file
madengine run --tags model \
    --additional-context-file k8s-config.json
```

Execution Modes:
- Full Workflow - Build + Run (when no manifest exists)
- Execution Only - Run only (when `--manifest-file` is provided and the manifest exists)
- Manifest-based - Use pre-built images from the manifest
Deployment Targets:

- Local - Docker containers on the local machine
- Kubernetes - Detected when the `k8s` key is present in the context
- SLURM - Detected when the `slurm` key is present in the context
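The detection rule above can be sketched as a small function. This is a paraphrase of the documented behavior, not madengine's actual code, and the order of the checks (when both keys are present) is an assumption.

```python
def deployment_target(context: dict) -> str:
    """Pick the deployment target from an additional-context dict,
    following the key-based detection described above (sketch only;
    check order when both keys are present is an assumption)."""
    if "k8s" in context:
        return "kubernetes"
    if "slurm" in context:
        return "slurm"
    return "local"
```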
Performance Output:

Results are saved to a CSV file (default: `perf_entry.csv`) with metrics including:

- Execution time
- GPU utilization
- Memory usage
- Model-specific performance metrics
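For quick ad-hoc inspection, the CSV can be read with the standard library. The `status` column is mentioned later in this reference (failures are recorded with status `FAILURE`); the `model` and `execution_time_s` column names in this sketch are illustrative assumptions, so adjust them to your actual file's header.

```python
# Sketch: list failed models from a perf CSV. Column names other than
# "status" are illustrative assumptions, not madengine's documented schema.
import csv
import io

sample = """model,status,execution_time_s
dummy,SUCCESS,42.1
model2,FAILURE,0.0
"""


def failed_models(csv_text: str) -> list:
    """Return model names whose status column is FAILURE."""
    return [
        row["model"]
        for row in csv.DictReader(io.StringIO(sample if csv_text is None else csv_text))
        if row["status"] == "FAILURE"
    ]


print(failed_models(sample))  # → ['model2']
```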
Generate HTML reports from CSV performance files.
Convert a single CSV file to HTML table format.
Usage:

```
madengine report to-html [OPTIONS]
```

Options:
| Option | Short | Type | Required | Description |
|---|---|---|---|---|
| `--csv-file` | | TEXT | Yes | Path to the CSV file to convert |
| `--verbose` | `-v` | FLAG | No | Enable verbose logging |
Examples:

```bash
# Convert CSV to HTML
madengine report to-html --csv-file perf_entry.csv

# With custom CSV file
madengine report to-html --csv-file results/perf_mi300.csv

# Verbose output
madengine report to-html --csv-file perf.csv --verbose
```

Output: Creates an `.html` file in the same directory as the CSV file.
Convert all CSV files in a directory to a consolidated email-ready HTML report.
Usage:

```
madengine report to-email [OPTIONS]
```

Options:
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--directory` | `--dir` | TEXT | `"."` | Path to directory containing CSV files |
| `--output` | `-o` | TEXT | `run_results.html` | Output HTML filename |
| `--verbose` | `-v` | FLAG | `False` | Enable verbose logging |
Examples:

```bash
# Generate email report from current directory
madengine report to-email

# Specify directory
madengine report to-email --directory ./results

# Custom output filename
madengine report to-email --dir ./results --output summary.html

# Verbose output
madengine report to-email --directory ./results --verbose
```

Output: Creates a consolidated HTML report suitable for email distribution.
Upload CSV performance data to MongoDB database.
Usage:

```
madengine database [OPTIONS]
```

Options:
| Option | Short | Type | Default | Required | Description |
|---|---|---|---|---|---|
| `--csv-file` | | TEXT | `perf_entry.csv` | No | Path to the CSV file to upload |
| `--database-name` | `--db` | TEXT | `None` | Yes | Name of the MongoDB database |
| `--collection-name` | `--collection` | TEXT | `None` | Yes | Name of the MongoDB collection |
| `--verbose` | `-v` | FLAG | `False` | No | Enable verbose logging |
Examples:

```bash
# Upload to MongoDB
madengine database \
    --csv-file perf_entry.csv \
    --database-name mydb \
    --collection-name results

# Short option names
madengine database \
    --csv-file perf.csv \
    --db test \
    --collection perf_data

# With verbose output
madengine database \
    --csv-file perf.csv \
    --db mydb \
    --collection results \
    --verbose
```

Environment Variables:
MongoDB connection details are read from environment variables:

| Variable | Description | Example |
|---|---|---|
| `MONGO_HOST` | MongoDB host address | `localhost` or `mongodb.example.com` |
| `MONGO_PORT` | MongoDB port | `27017` |
| `MONGO_USER` | MongoDB username | `admin` |
| `MONGO_PASSWORD` | MongoDB password | `secretpassword` |
Example Setup:

```bash
export MONGO_HOST=mongodb.example.com
export MONGO_PORT=27017
export MONGO_USER=myuser
export MONGO_PASSWORD=mypassword

madengine database \
    --csv-file perf_entry.csv \
    --db performance_db \
    --collection model_runs
```

madengine uses standard exit codes so scripts and CI (e.g. Jenkins) can detect success or failure:
| Code | Constant | Description |
|---|---|---|
| `0` | `SUCCESS` | Command completed successfully |
| `1` | `FAILURE` | General failure |
| `2` | `BUILD_FAILURE` | One or more image builds failed (e.g. Docker build error) |
| `3` | `RUN_FAILURE` | One or more model executions failed |
| `4` | `INVALID_ARGS` | Invalid command-line arguments or configuration |
Failure recording: Pre-run failures (e.g. image pull, setup) and run failures are recorded in the performance table (`perf.csv`) with status `FAILURE`, so all attempted models appear in the CSV. The file is created automatically if missing.
Example usage in scripts / CI (note that `$?` must be captured before `echo` overwrites it):

```bash
#!/bin/bash
madengine build --tags model
status=$?
if [ $status -eq 0 ]; then
    echo "Build successful"
    madengine run --manifest-file build_manifest.json
else
    echo "Build failed with exit code $status"
    exit $status
fi
```

For complex configurations, use JSON files with `--additional-context-file`:
Example: `config.json`

```json
{
    "gpu_vendor": "AMD",
    "guest_os": "UBUNTU",
    "docker_gpus": "0,1,2,3",
    "timeout_multiplier": 2.0,
    "docker_env_vars": {
        "PYTORCH_TUNABLEOP_ENABLED": "1",
        "HSA_ENABLE_SDMA": "0",
        "NCCL_DEBUG": "INFO"
    },
    "distributed": {
        "launcher": "torchrun",
        "nnodes": 1,
        "nproc_per_node": 4
    }
}
```

Example: `k8s-config.json`
```json
{
    "gpu_vendor": "AMD",
    "k8s": {
        "namespace": "ml-team",
        "gpu_count": 8,
        "cpu_request": "32",
        "memory_request": "256Gi",
        "node_selector": {
            "gpu-type": "mi300x"
        }
    },
    "distributed": {
        "launcher": "vllm",
        "nnodes": 2,
        "nproc_per_node": 4
    }
}
```

Example: `slurm-config.json`
```json
{
    "gpu_vendor": "AMD",
    "slurm": {
        "partition": "gpu",
        "nodes": 4,
        "gpus_per_node": 8,
        "time": "24:00:00",
        "account": "ml_research",
        "qos": "high"
    },
    "distributed": {
        "launcher": "torchtitan",
        "nnodes": 4,
        "nproc_per_node": 8
    }
}
```

To run on specific nodes, add `"nodelist": "node01,node02"` to the `slurm` section. When set, the job runs only on those nodes and the node health preflight is skipped. See `examples/slurm-configs/basic/03-multi-node-basic-nodelist.json`.
These keys apply to local Docker runs when madengine post-processes the run log. Use them when substring matches cause false `FAILURE` status (for example benign `RuntimeError:` lines). Full details: Configuration — Run phase: log error pattern scan.

| Key | Description |
|---|---|
| `log_error_pattern_scan` | Default `true`. Set `false` to skip grep-based log failure detection. |
| `log_error_benign_patterns` | Array of extra strings to exclude from matching (merged with the built-in benign list). |
| `log_error_patterns` | Non-empty array that replaces the default substring list (advanced). |
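These keys go in the additional context alongside the usual fields. A sketch combining them (the specific benign pattern string here is an illustrative assumption, not a built-in value):

```json
{
    "gpu_vendor": "AMD",
    "guest_os": "UBUNTU",
    "log_error_pattern_scan": true,
    "log_error_benign_patterns": ["RuntimeError: benign warmup retry"]
}
```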
madengine recognizes these environment variables:

| Variable | Description | Default |
|---|---|---|
| `MODEL_DIR` | Path to MAD package directory | Auto-detected |
| `ROCM_PATH` | ROCm installation root (used when `--rocm-path` not set) | `/opt/rocm` |
| `MAD_VERBOSE_CONFIG` | Enable verbose configuration logging | `false` |
| `MAD_DOCKERHUB_USER` | Docker Hub username | None |
| `MAD_DOCKERHUB_PASSWORD` | Docker Hub password/token | None |
| `MAD_DOCKERHUB_REPO` | Docker Hub repository | None |
| `MAD_CONTAINER_IMAGE` | Pre-built container image to use | None |
| `MONGO_HOST` | MongoDB host for `database` command | `localhost` |
| `MONGO_PORT` | MongoDB port for `database` command | `27017` |
| `MONGO_USER` | MongoDB username | None |
| `MONGO_PASSWORD` | MongoDB password | None |
- Use configuration files for complex setups instead of long command lines
- Separate build and run phases for distributed deployments
- Test locally first before deploying to clusters
- Use registries for distributed execution across multiple nodes
- Enable verbose logging (`--verbose`) when debugging issues
- Use real-time output (`--live-output`) for long-running operations
- Version your configuration files alongside your model code
- Use batch build mode for CI/CD pipelines to optimize build times
- Usage Guide - Comprehensive usage examples and workflows
- Configuration Guide - Advanced configuration options
- Deployment Guide - Kubernetes and SLURM deployment details
- Batch Build Guide - Selective builds with batch manifests
- Launchers Guide - Distributed training frameworks
- Profiling Guide - Performance analysis tools
Version: 2.0.0
Last Updated: December 2025