
feat(autotuner): autonomous kernel and inference configuration tuning for AMD GPUs#522

Open
ChuanLi1101 wants to merge 11 commits into main from feature/rocm-autotuner

Conversation

@ChuanLi1101
Collaborator

Summary

Add atom.autotuner -- an autonomous kernel and inference configuration tuning framework for AMD GPUs (MI355X/MI325X/MI300X).

  • Framework-agnostic: pluggable adapters for ATOM, vLLM, and SGLang via InferenceAdapter ABC
  • Performance modeling: RBF interpolation for measured data, roofline-anchored SOL for extrapolation, 4 accuracy modes (SILICON/HYBRID/EMPIRICAL/SOL)
  • Search strategies: grid search, Bayesian optimization, and agent-guided mutation search (autoresearch-style) with Pareto frontier analysis
  • E2E estimation: composes kernel-level latencies into TTFT/TPOT/throughput, accounting for launch overhead, pipeline bubbles, KV cache transfer (disaggregated serving)
  • Crash recovery: periodic checkpointing with session resume support
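The adapter seam can be pictured with a minimal sketch. Only the class name `InferenceAdapter` comes from the PR; the method names, the `BenchResult` fields, and the stubbed execution are illustrative assumptions, not the actual interface:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BenchResult:
    """Illustrative result container (field names are assumptions)."""
    ttft_ms: float
    tpot_ms: float
    throughput_tok_s: float


class InferenceAdapter(ABC):
    """Shared adapter skeleton: subclasses supply only the launch command
    and output parsing; lifecycle and health-check logic live in the base."""

    @abstractmethod
    def launch_command(self, config: dict) -> list[str]:
        """Return the server launch argv for this backend."""

    @abstractmethod
    def parse_output(self, raw: str) -> BenchResult:
        """Extract metrics from the backend's benchmark output."""

    def run(self, config: dict) -> BenchResult:
        # Common lifecycle: start server, wait healthy, benchmark, tear down.
        raw = self._execute(self.launch_command(config))
        return self.parse_output(raw)

    def _execute(self, argv: list[str]) -> str:
        # Placeholder: a real adapter would spawn the server and client here.
        return "ttft_ms=227 tpot_ms=12.5 throughput=12458"


class AtomAdapter(InferenceAdapter):
    """Hypothetical ATOM backend binding, for illustration only."""

    def launch_command(self, config: dict) -> list[str]:
        return ["python", "-m", "atom.server", "--max-num-batched-tokens",
                str(config.get("max_num_batched_tokens", 2048))]

    def parse_output(self, raw: str) -> BenchResult:
        kv = dict(pair.split("=") for pair in raw.split())
        return BenchResult(float(kv["ttft_ms"]), float(kv["tpot_ms"]),
                           float(kv["throughput"]))
```

Pushing lifecycle and parsing helpers into the base class is what makes the ~140-line deduplication across the three adapters possible.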

Architecture

cli.py -> AgentLoop -> E2EEstimator -> PerformanceModel -> PerfStorage (SQLite)
              |              |
         ConfigSpace    Collectors (GEMM, Attention, MoE, RCCL)
              |
         ParetoAnalyzer
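As a rough illustration of what the E2EEstimator stage does (composing kernel-level latencies into TTFT/TPOT with launch-overhead and pipeline-bubble terms), here is a hedged sketch; the numbers, field names, and composition formula are assumptions, not the PR's actual model:

```python
from dataclasses import dataclass


@dataclass
class KernelLatencies:
    """Per-layer kernel times in microseconds (illustrative granularity)."""
    attention_us: float
    gemm_us: float
    moe_us: float
    comm_us: float  # RCCL collective time


def estimate_step_ms(k: KernelLatencies, num_layers: int,
                     launch_overhead_us: float = 3.0,
                     pipeline_bubble_frac: float = 0.05) -> float:
    """Compose kernel latencies into one forward-pass latency (ms)."""
    per_layer = k.attention_us + k.gemm_us + k.moe_us + k.comm_us
    kernels_per_layer = 4  # one launch per modeled kernel class
    raw_us = num_layers * (per_layer + kernels_per_layer * launch_overhead_us)
    return raw_us * (1.0 + pipeline_bubble_frac) / 1000.0


def estimate_e2e(prefill: KernelLatencies, decode: KernelLatencies,
                 num_layers: int, output_tokens: int) -> dict:
    """TTFT from one prefill pass, TPOT from one decode pass."""
    ttft_ms = estimate_step_ms(prefill, num_layers)
    tpot_ms = estimate_step_ms(decode, num_layers)
    total_s = (ttft_ms + (output_tokens - 1) * tpot_ms) / 1000.0
    return {"ttft_ms": ttft_ms, "tpot_ms": tpot_ms,
            "tok_per_s": output_tokens / total_s}
```

A disaggregated-serving estimator would add a KV-cache transfer term between the prefill and decode stages, which this sketch omits.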

Code cleanup in this PR

  • Extracted common adapter logic (output parsing, health check, server lifecycle) into InferenceAdapter base class, eliminating ~140 lines of duplicated code across 3 adapters
  • Removed 6 temporary benchmark/testing scripts from prior MI355X experiments
  • Cleaned unused imports across source and test files
  • Added missing tests/autotuner/__init__.py

Test plan

  • All 49 unit tests pass (python -m pytest tests/autotuner/ -v) -- no GPU required
  • Run python -m atom.autotuner.cli run --model meta-llama/Llama-3.1-70B --system mi355x --total-gpus 8 on MI355X
  • Verify real GPU benchmarks via --adapter atom --eval-mode real_bench

…ults

Targeted Pareto optimization for GPT-OSS-120B MXFP4 on single MI355X:
- Throughput +3.6% at c256 (12023 -> 12458 tok/s)
- TTFT -78% at c256 (1042ms -> 227ms) with max_num_batched_tokens=8192
- 8K/1K TTFT -42% at c256 with combined config

Key findings:
- max_num_batched_tokens=8192 is the single best optimization for high concurrency
- gpu_memory_utilization=0.95 provides +3.3% throughput at c256
- ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD=512 gives +1.3% at medium concurrency

Infrastructure:
- orchestrator.py: Master experiment driver with targeted search strategy
- experiment_tracker.py: Pareto frontier tracking with auto status file generation
- notifier.py: Multi-channel push notifications (ntfy/Slack/Discord/Telegram)
- status.py: CLI tool for remote experiment monitoring
- run_bench.py: Enhanced benchmark runner with integrated tracking
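The Pareto frontier tracking that experiment_tracker.py performs can be sketched as a dominance sweep over (throughput, TTFT) points; the function below is illustrative, not the PR's implementation:

```python
def pareto_frontier(points):
    """Return the points not dominated on the (throughput, TTFT) plane.

    Each point is (tokens_per_s, ttft_ms); higher throughput and lower
    TTFT are both better. O(n log n): sort, then a single sweep.
    """
    # Sort by throughput descending, then TTFT ascending.
    ordered = sorted(points, key=lambda p: (-p[0], p[1]))
    frontier, best_ttft = [], float("inf")
    for tput, ttft in ordered:
        # Keep a point only if its TTFT beats every faster config seen so far.
        if ttft < best_ttft:
            frontier.append((tput, ttft))
            best_ttft = ttft
    return frontier
```

On the headline numbers above, a config at (12458 tok/s, 227 ms) dominates anything slower with a worse TTFT, which is why a single combined config can retire several earlier candidates at once.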

Made-with: Cursor
…ning for AMD GPUs

Framework-agnostic autotuner inspired by NVIDIA AIConfigurator (offline perf
modeling + config search) and Karpathy's autoresearch (agent-driven experiment
loop). Targets MI355X/MI325X/MI300X on ROCm.

Key components:
- Collector: LLM-workload-informed micro-benchmarks for GEMM, attention, MoE, RCCL
- Database: RBF interpolation + roofline SOL modeling with 4 accuracy modes
- Search: grid / Bayesian / agent-guided strategies with Pareto frontier analysis
- Agent: propose -> benchmark -> evaluate -> keep/discard autonomous loop
- Adapters: pluggable backends for ATOM, vLLM, and SGLang
- CLI: python -m atom.autotuner.cli run --model <hf_id> --system mi355x
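A minimal sketch of the Database component's hybrid idea: RBF-interpolate where measured points exist, and fall back to roofline-anchored SOL outside the measured range. The Gaussian kernel, the peak FLOPs/bandwidth figures, and all names here are assumptions:

```python
import numpy as np


def roofline_sol_us(flops: float, bytes_moved: float,
                    peak_tflops: float = 2500.0,  # placeholder peak, assumed
                    peak_bw_tbs: float = 8.0) -> float:
    """Speed-of-light time: the larger of compute-bound and bandwidth-bound."""
    compute_us = flops / (peak_tflops * 1e12) * 1e6
    memory_us = bytes_moved / (peak_bw_tbs * 1e12) * 1e6
    return max(compute_us, memory_us)


class HybridPerfModel:
    """Interpolate measured latencies with a Gaussian RBF; outside the
    measured range, anchor to roofline SOL (a sketch of a HYBRID mode)."""

    def __init__(self, sizes, latencies_us, eps: float = 1e-3):
        self.x = np.asarray(sizes, dtype=float)
        self.eps = eps
        # Solve for RBF weights so the model is exact at measured points.
        phi = self._kernel(self.x[:, None], self.x[None, :])
        self.w = np.linalg.solve(phi, np.asarray(latencies_us, dtype=float))

    def _kernel(self, a, b):
        return np.exp(-(self.eps * (a - b)) ** 2)  # Gaussian RBF

    def predict_us(self, size, flops, bytes_moved):
        if self.x.min() <= size <= self.x.max():
            return float(self._kernel(size, self.x) @ self.w)
        return roofline_sol_us(flops, bytes_moved)
```

The appeal of the split is that interpolation stays faithful to silicon where data exists, while extrapolation degrades gracefully to a physically motivated bound instead of an unconstrained fit.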

Includes 49 unit tests (no GPU required) covering all components.
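The propose -> benchmark -> evaluate -> keep/discard loop can be sketched generically; `mutate` and `evaluate` here stand in for the real proposal and benchmarking steps, and the greedy keep/discard rule is an assumption about the loop's shape:

```python
import random


def agent_loop(base_config: dict, mutate, evaluate,
               iterations: int = 20, seed: int = 0):
    """Greedy autonomous experiment loop (sketch).

    `mutate(cfg, rng)` proposes a new config; `evaluate(cfg)` returns a
    scalar score (higher is better), standing in for a benchmark run.
    """
    rng = random.Random(seed)
    best_cfg, best_score = dict(base_config), evaluate(base_config)
    for _ in range(iterations):
        candidate = mutate(dict(best_cfg), rng)   # propose
        score = evaluate(candidate)               # benchmark + evaluate
        if score > best_score:                    # keep
            best_cfg, best_score = candidate, score
        # else: discard and mutate from the incumbent again
    return best_cfg, best_score
```

A production loop would also checkpoint `best_cfg` and the experiment history periodically so an interrupted session can resume, as the crash-recovery feature above describes.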

Made-with: Cursor
Comment on lines +29 to +35
```python
from atom.autotuner.types import (
    BenchmarkResult,
    ExperimentStatus,
    GPUInfo,
    InferenceConfig,
    TunerState,
)
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
atom.autotuner.types.ExperimentStatus imported but unused

Suggested change
```diff
 from atom.autotuner.types import (
     BenchmarkResult,
-    ExperimentStatus,
     GPUInfo,
     InferenceConfig,
     TunerState,
 )
```

```python
from atom.autotuner.database.estimator import E2EEstimator, ModelArch
from atom.autotuner.database.perf_model import PerformanceModel
from atom.autotuner.search.pareto import ParetoAnalyzer
from atom.autotuner.search.space import ConfigSpace, SearchBounds
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
atom.autotuner.search.space.SearchBounds imported but unused

Suggested change
```diff
-from atom.autotuner.search.space import ConfigSpace, SearchBounds
+from atom.autotuner.search.space import ConfigSpace
```

```python
Returns the experiment tracker with all results.
"""
self._setup_signal_handlers()
start_time = time.time()
```
Contributor

⚠️ [ruff] <F841> reported by reviewdog 🐶
Local variable start_time is assigned to but never used

Suggested change
```diff
-start_time = time.time()
+time.time()
```

```python
strategy = self._build_strategy()
evaluate_fn = self._build_evaluate_fn()

last_checkpoint = time.time()
```
Contributor

⚠️ [ruff] <F841> reported by reviewdog 🐶
Local variable last_checkpoint is assigned to but never used

Suggested change
```diff
-last_checkpoint = time.time()
+time.time()
```


```python
def _cmd_run(args: argparse.Namespace) -> int:
    """Run the autonomous tuning loop."""
    from atom.autotuner.types import DatabaseMode, GPUInfo
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
atom.autotuner.types.GPUInfo imported but unused

Suggested change
```diff
-from atom.autotuner.types import DatabaseMode, GPUInfo
+from atom.autotuner.types import DatabaseMode
```

```python
grid[y][x] = "."

lines = []
lines.append(f" tokens/s/gpu vs tokens/s/user (* = Pareto frontier)")
```
Contributor

⚠️ [ruff] <F541> reported by reviewdog 🐶
f-string without any placeholders

Suggested change
```diff
-lines.append(f" tokens/s/gpu vs tokens/s/user (* = Pareto frontier)")
+lines.append(" tokens/s/gpu vs tokens/s/user (* = Pareto frontier)")
```

Comment on lines +18 to +19
```python
import math
from dataclasses import dataclass
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
math imported but unused

Suggested change
```diff
-import math
 from dataclasses import dataclass
```

Comment on lines +14 to +15
```python
import time
from abc import ABC, abstractmethod
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
time imported but unused

Suggested change
```diff
-import time
 from abc import ABC, abstractmethod
```

```python
import random
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
typing.Optional imported but unused

Suggested change
```diff
-from typing import Callable, Optional
+from typing import Callable
```

Comment on lines +15 to +16
```python
import json
import logging
```
Contributor

⚠️ [ruff] <F401> reported by reviewdog 🐶
json imported but unused

Suggested change
```diff
-import json
 import logging
```
