
Installation Guide

Complete installation instructions for madengine.

Prerequisites

  • Python 3.8+ with pip
  • Docker with GPU support (ROCm for AMD, CUDA for NVIDIA)
  • Git for repository management
  • MAD package - Required for model discovery and execution
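The prerequisites above can be verified before installing with a short shell snippet (a sketch; the 3.8 floor comes from the list above, and the tool names are the standard CLIs):

```shell
# Quick prerequisite check (sketch): Python >= 3.8 plus the required CLI tools.
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'; then
  echo "python3: OK ($(python3 --version 2>&1))"
else
  echo "python3: need 3.8 or newer"
fi

# Report whether each required tool is on PATH
for tool in pip docker git; do
  command -v "$tool" >/dev/null 2>&1 && echo "$tool: OK" || echo "$tool: MISSING"
done
```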

Quick Install

From GitHub

# Basic installation
pip install git+https://github.com/ROCm/madengine.git

# With Kubernetes support
pip install "madengine[kubernetes] @ git+https://github.com/ROCm/madengine.git"

# With all optional dependencies
pip install "madengine[all] @ git+https://github.com/ROCm/madengine.git"

Development Installation

# Clone repository
git clone https://github.com/ROCm/madengine.git
cd madengine

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Setup pre-commit hooks (optional, for contributors)
pre-commit install

Optional Dependencies

| Extra | Install Command | Use Case |
| --- | --- | --- |
| kubernetes | pip install madengine[kubernetes] | Kubernetes deployment support |
| dev | pip install madengine[dev] | Development tools (pytest, black, mypy, etc.) |
| all | pip install madengine[all] | All optional dependencies |

Note: SLURM deployment requires no additional Python dependencies (uses CLI commands).

MAD Package Setup

madengine requires the MAD package for model definitions and execution scripts.

# Clone MAD package
git clone https://github.com/ROCm/MAD.git
cd MAD

# Install madengine within MAD directory
pip install git+https://github.com/ROCm/madengine.git

# Verify installation
madengine --version
madengine discover  # Test model discovery

Docker GPU Setup

AMD ROCm

# Test ROCm GPU access
docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/pytorch:latest rocm-smi

# Verify with madengine
madengine run --tags dummy \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

Non-default ROCm location: If ROCm is not installed under /opt/rocm (e.g. TheRock or a pip install), set ROCM_PATH or pass madengine run --rocm-path /path/to/rocm so that GPU detection and the container environment use the correct paths.
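For example (a sketch; the version-suffixed path below is illustrative, so substitute your actual install location):

```shell
# Point madengine at a ROCm tree outside /opt/rocm (path is illustrative)
export ROCM_PATH=/opt/rocm-6.2.0

# Equivalent per-invocation form using the --rocm-path flag mentioned above
madengine run --tags dummy \
  --rocm-path "$ROCM_PATH" \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
```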

NVIDIA CUDA

# Test CUDA GPU access
docker run --rm --gpus all nvidia/cuda:latest nvidia-smi

# Verify with madengine
madengine run --tags dummy \
  --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "UBUNTU"}'

Verify Installation

# Check installation
madengine --version

# Test basic functionality (requires MAD package)
cd /path/to/MAD
madengine discover --tags dummy
madengine run --tags dummy \
  --additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'

Troubleshooting

Import Errors

If you get import errors, ensure your virtual environment is activated and madengine is installed:

pip list | grep madengine
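A slightly more direct check is to attempt the import itself in the active environment (a sketch; check_import is a hypothetical helper, not part of madengine):

```shell
# Report whether a package is importable in the currently active environment.
check_import() {
  python3 -c "import $1" 2>/dev/null \
    && echo "$1: OK" \
    || echo "$1: MISSING (wrong virtual environment, or not installed?)"
}

check_import madengine
```

If the package shows up in pip list but the import fails, you are almost certainly running a different Python interpreter than the one pip installed into.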

Docker Permission Issues

If you encounter Docker permission errors:

# Add user to docker group (Linux)
sudo usermod -aG docker $USER
newgrp docker
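After re-logging in (or running newgrp), you can confirm the change took effect (a sketch):

```shell
# Confirm docker group membership and daemon reachability (sketch)
if id -nG | grep -qw docker; then
  echo "user is in the docker group"
else
  echo "user is NOT in the docker group (log out and back in?)"
fi

docker info >/dev/null 2>&1 && echo "docker daemon reachable" \
  || echo "docker daemon not reachable"
```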

ROCm GPU Not Detected

# Check ROCm installation
rocm-smi

# Verify devices are accessible
ls -la /dev/kfd /dev/dri
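If the device nodes exist, also confirm your user can actually open them; this is typically what the video group membership from the docker run example above provides. A sketch:

```shell
# Check read/write access to the ROCm device nodes for the current user (sketch)
for dev in /dev/kfd /dev/dri/renderD*; do
  if [ -e "$dev" ]; then
    [ -r "$dev" ] && [ -w "$dev" ] \
      && echo "$dev: accessible" \
      || echo "$dev: exists but no read/write access (check group membership)"
  else
    echo "$dev: not present"
  fi
done
```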

If ROCm is installed in a non-default path (e.g. TheRock or a pip install), set export ROCM_PATH=/path/to/rocm or use madengine run --rocm-path /path/to/rocm.

MAD Package Not Found

Ensure you're running madengine commands from within a MAD package directory:

cd /path/to/MAD
export MODEL_DIR=$(pwd)
madengine discover
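In automation, a small guard can make this requirement explicit before any madengine command runs (a sketch; it only checks that MODEL_DIR is set and points at a directory, not that the directory is a valid MAD checkout):

```shell
# Fail fast in scripts if MODEL_DIR is unset or not a directory (sketch)
if [ -z "${MODEL_DIR:-}" ] || [ ! -d "${MODEL_DIR:-}" ]; then
  echo "MODEL_DIR is not set to a MAD checkout;" \
       "run: cd /path/to/MAD && export MODEL_DIR=\$(pwd)" >&2
else
  echo "MODEL_DIR=$MODEL_DIR"
fi
```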

Next Steps