Add the benchmark flow for ATOM vLLM plugin#514

Draft
wuhuikx wants to merge 4 commits intomainfrom
hattie/oot_benchmark
Conversation

@wuhuikx wuhuikx commented Apr 8, 2026

  1. This PR adds a reproducible benchmark flow for the ATOM vLLM Plugin. The default environment is rocm/atom-dev:vllm-latest, with the benchmarking recipe aligned with InferenceMax and ATOM itself.

  2. It also provides an option to compare the ATOM vLLM plugin against upstream vLLM under a fixed A/B setup. It standardizes benchmark execution and reports key serving metrics (throughput and latency) for direct runtime comparison.

Below is an example comparing the performance of the ATOM vLLM Plugin against upstream vLLM (vllm/vllm-openai-rocm:nightly) for Kimi-K2-Thinking-MXFP4 (ISL/OSL = 1K/1K, concurrency = 8, TP = 4).

Key Results (1K/1K, C8, TP4)

| Metric | ATOM plugin | Upstream nightly | ATOM vs. upstream |
| --- | --- | --- | --- |
| Output TPUT (tok/s) | 578.85 | 459.33 | +26.02% |
| Total TPUT (tok/s) | 1152.17 | 914.28 | +26.02% |
| Mean TTFT (ms) | 135.47 | 225.03 | -39.80% (lower) |
| Mean TPOT (ms) | 12.97 | 16.38 | -20.82% (lower) |
| Mean E2EL (ms) | 13898.61 | 17570.20 | -20.90% (lower) |

Summary (ATOM relative to upstream): throughput improves by about 26% on both output and total TPUT, and every latency metric drops (lower is better): TTFT -39.80%, TPOT -20.82%, E2EL -20.90%.
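The percentage deltas in the table can be recomputed directly from the raw serving metrics. The sketch below is illustrative only (the dictionaries and helper names are not part of the PR's benchmark flow); note that throughput and latency use opposite conventions, since higher throughput but lower latency is better.

```python
# Recompute the A/B deltas reported above from the raw serving metrics.
# Metric names (TTFT, TPOT, E2EL) follow vLLM's serving-benchmark output;
# the dict/function names here are hypothetical, for illustration only.

atom = {"output_tput": 578.85, "total_tput": 1152.17,
        "ttft_ms": 135.47, "tpot_ms": 12.97, "e2el_ms": 13898.61}
upstream = {"output_tput": 459.33, "total_tput": 914.28,
            "ttft_ms": 225.03, "tpot_ms": 16.38, "e2el_ms": 17570.20}

def pct_gain(new: float, old: float) -> float:
    """Throughput delta: higher is better, positive means ATOM is faster."""
    return (new - old) / old * 100

def pct_drop(new: float, old: float) -> float:
    """Latency delta: lower is better, positive means ATOM is lower."""
    return (old - new) / old * 100

print(f"Output TPUT: +{pct_gain(atom['output_tput'], upstream['output_tput']):.2f}%")
print(f"Total TPUT:  +{pct_gain(atom['total_tput'], upstream['total_tput']):.2f}%")
print(f"Mean TTFT:   -{pct_drop(atom['ttft_ms'], upstream['ttft_ms']):.2f}%")
print(f"Mean TPOT:   -{pct_drop(atom['tpot_ms'], upstream['tpot_ms']):.2f}%")
print(f"Mean E2EL:   -{pct_drop(atom['e2el_ms'], upstream['e2el_ms']):.2f}%")
```

Running this reproduces the +26.02% / -39.80% / -20.82% / -20.90% figures quoted above.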

@wuhuikx wuhuikx changed the title Add the benchmark flow for OOT Add the benchmark flow for ATOM vLLM plugin Apr 8, 2026
@wuhuikx wuhuikx marked this pull request as draft April 8, 2026 05:38