feat(autotuner): autonomous kernel and inference configuration tuning for AMD GPUs#522
Open
ChuanLi1101 wants to merge 11 commits into main from chuali/gpt-oss-120b-mi355x-perf-experiment
Conversation
…ults

Targeted Pareto optimization for GPT-OSS-120B MXFP4 on a single MI355X:
- Throughput: +3.6% at c256 (12023 -> 12458 tok/s)
- TTFT: -78% at c256 (1042 ms -> 227 ms) with max_num_batched_tokens=8192
- 8K/1K TTFT: -42% at c256 with the combined config

Key findings:
- max_num_batched_tokens=8192 is the single best optimization for high concurrency
- gpu_memory_utilization=0.95 provides +3.3% throughput at c256
- ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD=512 gives +1.3% at medium concurrency

Infrastructure:
- orchestrator.py: master experiment driver with a targeted search strategy
- experiment_tracker.py: Pareto frontier tracking with automatic status-file generation
- notifier.py: multi-channel push notifications (ntfy/Slack/Discord/Telegram)
- status.py: CLI tool for remote experiment monitoring
- run_bench.py: enhanced benchmark runner with integrated tracking

Made-with: Cursor
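The Pareto frontier tracking that experiment_tracker.py provides (maximize throughput, minimize TTFT) can be sketched as below. The names `Result`, `dominates`, and `pareto_frontier` are illustrative, not the PR's actual API; the numbers are the c256 figures quoted in the commit message.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    name: str
    throughput: float  # tok/s, higher is better
    ttft_ms: float     # time to first token, lower is better

def dominates(a: Result, b: Result) -> bool:
    """a dominates b if a is at least as good on both axes and strictly
    better on at least one."""
    return (a.throughput >= b.throughput and a.ttft_ms <= b.ttft_ms
            and (a.throughput > b.throughput or a.ttft_ms < b.ttft_ms))

def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no other result dominates."""
    return [r for r in results
            if not any(dominates(o, r) for o in results if o is not r)]

# c256 numbers from the commit message: baseline vs tuned.
runs = [
    Result("baseline", 12023, 1042),
    Result("mnbt_8192", 12458, 227),  # max_num_batched_tokens=8192
]
print([r.name for r in pareto_frontier(runs)])  # -> ['mnbt_8192']
```

Because the tuned run is better on both axes, it dominates the baseline and is the entire frontier here; two configs that trade throughput against TTFT would both survive.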
…placeholders Made-with: Cursor
…i355x-perf-experiment
…github.com/ROCm/ATOM into chuali/gpt-oss-120b-mi355x-perf-experiment
…oard changes Made-with: Cursor
…github.com/ROCm/ATOM into chuali/gpt-oss-120b-mi355x-perf-experiment
…ning for AMD GPUs

Framework-agnostic autotuner inspired by NVIDIA AIConfigurator (offline perf modeling + config search) and Karpathy's autoresearch (agent-driven experiment loop). Targets MI355X/MI325X/MI300X on ROCm.

Key components:
- Collector: LLM-workload-informed micro-benchmarks for GEMM, attention, MoE, and RCCL
- Database: RBF interpolation + roofline SOL modeling with 4 accuracy modes
- Search: grid / Bayesian / agent-guided strategies with Pareto frontier analysis
- Agent: autonomous propose -> benchmark -> evaluate -> keep/discard loop
- Adapters: pluggable backends for ATOM, vLLM, and SGLang
- CLI: python -m atom.autotuner.cli run --model <hf_id> --system mi355x

Includes 49 unit tests (no GPU required) covering all components.

Made-with: Cursor
Comment on lines +29 to +35
```python
from atom.autotuner.types import (
    BenchmarkResult,
    ExperimentStatus,
    GPUInfo,
    InferenceConfig,
    TunerState,
)
```
Contributor
atom.autotuner.types.ExperimentStatus imported but unused
Suggested change
```diff
 from atom.autotuner.types import (
     BenchmarkResult,
-    ExperimentStatus,
     GPUInfo,
     InferenceConfig,
     TunerState,
 )
```
```python
from atom.autotuner.database.estimator import E2EEstimator, ModelArch
from atom.autotuner.database.perf_model import PerformanceModel
from atom.autotuner.search.pareto import ParetoAnalyzer
from atom.autotuner.search.space import ConfigSpace, SearchBounds
```
Contributor
```python
    Returns the experiment tracker with all results.
    """
    self._setup_signal_handlers()
    start_time = time.time()
```
Contributor
```python
    strategy = self._build_strategy()
    evaluate_fn = self._build_evaluate_fn()

    last_checkpoint = time.time()
```
Contributor
```python
def _cmd_run(args: argparse.Namespace) -> int:
    """Run the autonomous tuning loop."""
    from atom.autotuner.types import DatabaseMode, GPUInfo
```
Contributor
```python
            grid[y][x] = "."

    lines = []
    lines.append(f" tokens/s/gpu vs tokens/s/user (* = Pareto frontier)")
```
Contributor
Comment on lines +18 to +19
```python
import math
from dataclasses import dataclass
```
Contributor
Comment on lines +14 to +15
```python
import time
from abc import ABC, abstractmethod
```
Contributor
```python
import random
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional
```
Contributor
Comment on lines +15 to +16
```python
import json
import logging
```
Contributor
Summary

Add atom.autotuner -- an autonomous kernel and inference configuration tuning framework for AMD GPUs (MI355X/MI325X/MI300X), with an InferenceAdapter ABC for pluggable backends.

Architecture

Code cleanup in this PR
- Consolidated shared logic into the InferenceAdapter base class, eliminating ~140 lines of duplicated code across 3 adapters
- Added tests/autotuner/__init__.py

Test plan
- Unit tests (python -m pytest tests/autotuner/ -v) -- no GPU required
- python -m atom.autotuner.cli run --model meta-llama/Llama-3.1-70B --system mi355x --total-gpus 8 on MI355X
- Real-benchmark evaluation via --adapter atom --eval-mode real_bench
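For context on the Database component's roofline SOL modeling: a speed-of-light estimate bounds a kernel's time by the slower of its compute and memory-bandwidth limits. A minimal sketch with placeholder hardware peaks (not real MI355X specs, and not the PR's actual estimator code):

```python
def roofline_time_s(flops: float, bytes_moved: float,
                    peak_tflops: float, peak_bw_gbs: float) -> float:
    """Speed-of-light (SOL) kernel time: the kernel can go no faster than
    either the compute limit or the memory-bandwidth limit allows."""
    compute_s = flops / (peak_tflops * 1e12)
    memory_s = bytes_moved / (peak_bw_gbs * 1e9)
    return max(compute_s, memory_s)  # bound by the slower resource

# Toy GEMM C = A @ B with M = N = K = 4096 in fp16, using made-up peaks.
M = N = K = 4096
flops = 2 * M * N * K                        # 2 flops per multiply-add
bytes_moved = 2 * (M * K + K * N + M * N)    # fp16 = 2 bytes/element
t = roofline_time_s(flops, bytes_moved, peak_tflops=1000, peak_bw_gbs=5000)
print(f"SOL estimate: {t * 1e6:.1f} us")
```

At this size the GEMM is compute-bound, so the SOL time is just flops over peak throughput; a small-batch decode GEMM would instead hit the bandwidth term, which is what makes this kind of model useful for config search.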