[DNM][AMD] agentx-v0.4 rebased from commit chore/agentx-v0.4 commit 7f61#1709
seungrokj wants to merge 8 commits into
…r mi355x models Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```shell
  $ASYNC_SCHEDULING_ARGS
  "${PREFIX_CACHE_ARGS[@]}"
  "${OFFLOAD_ARGS[@]}"
)
```
vLLM uses wrong model
High Severity
The vLLM command serves "$MODEL" and omits --served-model-name, while the script downloads weights into MODEL_PATH and build_replay_cmd sends --model $MODEL to aiperf. That breaks the usual MODEL_PATH + served-name pairing used by sibling agentic scripts and can fail when MODEL is a Hub id but weights live under MODEL_PATH.
Reviewed by Cursor Bugbot for commit 01cc2af.
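A minimal sketch of the MODEL_PATH + served-name pairing described above, assuming `MODEL` holds the Hub ID that `build_replay_cmd` passes to aiperf as `--model` and `MODEL_PATH` is the local download directory. `build_vllm_serve_args` and the example values are hypothetical; the script only prints the resulting command, so it runs without vLLM installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: load weights from the local directory, but expose them
# under the Hub ID that aiperf sends as --model.
build_vllm_serve_args() {
  local model="$1"       # Hub ID (the name clients request)
  local model_path="$2"  # local directory holding the downloaded weights
  VLLM_ARGS=(
    serve "$model_path"
    --served-model-name "$model"
  )
}

# Illustrative values only; the launcher supplies the real ones.
MODEL="${MODEL:-org/example-model}"
MODEL_PATH="${MODEL_PATH:-/data/models/example-model}"

build_vllm_serve_args "$MODEL" "$MODEL_PATH"
# Print rather than exec so the sketch runs without vLLM installed.
printf 'vllm'
printf ' %q' "${VLLM_ARGS[@]}"
printf '\n'
```

Because the server advertises `$MODEL` through `--served-model-name`, aiperf's `--model $MODEL` requests should resolve to the locally loaded weights.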
```shell
  --mem-fraction-static 0.8 \
  --context-length $MAX_MODEL_LEN \
  "${CACHE_ARGS[@]}" \
  "${WARMUP_ARGS[@]}" \
```
SGLang ignores MODEL_PATH
Medium Severity
SGLang is started with --model-path $MODEL and no --served-model-name, even though the script may have downloaded weights into MODEL_PATH. Matrix jobs that set a local MODEL_PATH can still point the server at the Hub ID, and the OpenAI-compatible model name may not match the MODEL value used by aiperf.
Reviewed by Cursor Bugbot for commit 01cc2af.
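One way to stop SGLang from loading the Hub ID when local weights exist is to resolve the load path first. The sketch below assumes that a prepared `MODEL_PATH` contains a Hugging Face-style `config.json` and that `MODEL` is the name aiperf requests. `resolve_model_source` is a hypothetical helper; `--model-path` and `--served-model-name` are existing `sglang.launch_server` options.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: use MODEL_PATH when it looks like a prepared model
# directory (assumed marker: config.json), otherwise fall back to the Hub ID.
resolve_model_source() {
  local model="$1"
  local model_path="${2:-}"
  if [[ -n "$model_path" && -f "$model_path/config.json" ]]; then
    printf '%s\n' "$model_path"
  else
    printf '%s\n' "$model"
  fi
}

# Illustrative values only; the launcher supplies the real ones.
MODEL="${MODEL:-org/example-model}"
MODEL_PATH="${MODEL_PATH:-}"

MODEL_SOURCE="$(resolve_model_source "$MODEL" "$MODEL_PATH")"
SGLANG_ARGS=(
  -m sglang.launch_server
  --model-path "$MODEL_SOURCE"
  --served-model-name "$MODEL"
)
# Print rather than exec so the sketch runs without SGLang installed.
printf 'python3'
printf ' %q' "${SGLANG_ARGS[@]}"
printf '\n'
```

Falling back to `MODEL` keeps jobs that never set `MODEL_PATH` working as before. In both cases, `--served-model-name` keeps the OpenAI model name aligned with aiperf.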
```shell
cd LMCache
pip install -r requirements/build.txt
CXX=hipcc BUILD_WITH_HIP=1 pip install -e . --no-build-isolation
cd ..
```
LMCache clone not idempotent
Medium Severity
The lmcache path runs git clone https://github.com/LMCache/LMCache.git unconditionally. With set -e, a second run in the same working directory exits when LMCache already exists, so lmcache agentic jobs fail on retry or reuse of the job cwd.
Reviewed by Cursor Bugbot for commit 01cc2af.
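A minimal idempotent variant of the clone step, assuming it is acceptable to reuse an existing checkout on retry. `ensure_clone` is a hypothetical helper. The block only defines it, so nothing is fetched when it is sourced.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: clone only if DIR is not already a git checkout, so a
# retried job in the same working directory does not fail under `set -e`.
ensure_clone() {
  local url="$1"
  local dir="$2"
  if [ -d "$dir/.git" ]; then
    echo "reusing existing checkout in $dir"
    return 0
  fi
  git clone "$url" "$dir"
}
```

In the launcher, `ensure_clone https://github.com/LMCache/LMCache.git LMCache` would replace the bare `git clone`, and the existing build steps would follow unchanged. Reusing the directory means a retry builds whatever revision is already on disk. If a fresh checkout matters more than retry speed, delete `LMCache` before cloning instead.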
Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onfig Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```shell
python3 -m sglang.launch_server \
  --attention-backend aiter \
  --model-path $MODEL \
```
Server ignores MODEL_PATH
Medium Severity
Weights are downloaded into MODEL_PATH when the workflow sets that directory, but SGLang is started with --model-path $MODEL (the Hub ID) instead of MODEL_PATH. The server may therefore load weights from a different cache path than the one prepared for the job.
Reviewed by Cursor Bugbot for commit 32f5007.
```shell
OFFLOAD_ARGS=(
  --kv-transfer-config
  "{\"kv_connector\":\"LMCacheMPConnector\",\"kv_connector_module_path\":\"lmcache.integration.vllm.lmcache_mp_connector\",\"kv_role\":\"kv_both\",\"kv_connector_extra_config\":{\"lmcache.mp.host\":\"$LMCACHE_CONNECT_HOST\",\"lmcache.mp.port\":$LMCACHE_PORT}}"
)
```
LMCache missing hybrid disable
High Severity
The lmcache branch omits --disable-hybrid-kv-cache-manager on vllm serve, while the new minimaxm2.5-fp8-mi355x-vllm-agentic-lmcache config exercises that path. The sibling FP4 script documents that LMCache is incompatible without disabling the hybrid KV manager.
Reviewed by Cursor Bugbot for commit 32f5007.
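The sketch below shows how the lmcache branch could add the missing flag. It also builds the connector JSON with `printf` instead of hand-escaped quotes. It assumes `OFFLOAD_ARGS` is spliced into `vllm serve` exactly as in the snippet above. `build_offload_args` is hypothetical, while `--kv-transfer-config` and `--disable-hybrid-kv-cache-manager` are existing vLLM options. The connector fields are copied from the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fill OFFLOAD_ARGS for the given offloading mode.
# Connector fields are copied from the launcher's kv-transfer-config.
build_offload_args() {
  local offloading="$1"
  local host="$2"   # e.g. tcp://<host>, as in LMCACHE_CONNECT_HOST
  local port="$3"   # integer, as in LMCACHE_PORT
  local kv_cfg
  OFFLOAD_ARGS=()
  if [ "$offloading" = "lmcache" ]; then
    # printf keeps the JSON readable instead of escaping every quote by hand.
    kv_cfg="$(printf '{"kv_connector":"LMCacheMPConnector","kv_connector_module_path":"lmcache.integration.vllm.lmcache_mp_connector","kv_role":"kv_both","kv_connector_extra_config":{"lmcache.mp.host":"%s","lmcache.mp.port":%d}}' "$host" "$port")"
    # The review notes LMCache is incompatible with the hybrid KV cache manager.
    OFFLOAD_ARGS=(
      --kv-transfer-config "$kv_cfg"
      --disable-hybrid-kv-cache-manager
    )
  fi
}
```

`printf '%s'` does not JSON-escape its argument. This is therefore only safe for host strings without quotes or backslashes, a condition that plain `tcp://<host>` values meet.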
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```python
module = _orig_import(name, globals, locals, fromlist, level)
if name == "lmcache.v1.lazy_memory_allocator" or (
    name.startswith("lmcache") and "lmcache.v1.lazy_memory_allocator" in sys.modules
):
```
Kimi LMCache ROCm fixes removed
High Severity
The Kimi MI355X agentic script replaces the prior ROCm LMCache install (ROCm CuPy, nixl cleanup, demand-pinned allocator, MLA block fallback, chunked connector, scheduler KV-transfer patch) with a bare git clone and HIP build. New kimik2.5-fp4-mi355x-vllm-agentic-lmcache sweeps depend on this path for Kimi MLA KV on AMD.
Reviewed by Cursor Bugbot for commit 351e729.
```shell
# ---- Resolve traces and install deps ----------------------------------------
# https://huggingface.co/datasets/semianalysisai/cc-traces-weka-with-subagents-060826
export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_060826
```
DSv4 atom uncapped traces
Medium Severity
This new DSv4 ATOM agentic script sets WEKA_LOADER_OVERRIDE to the uncapped 060826 trace set, while peer MI355X agentic scripts in the same PR use 060226_256k to avoid ~1M-token traces that are rejected and skew sweeps.
Reviewed by Cursor Bugbot for commit 351e729.
Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>
…nalysisAI/InferenceX into amd/agentx-v0.4_rebase0611
```shell
  $ASYNC_SCHEDULING_ARGS
  "${PREFIX_CACHE_ARGS[@]}"
  "${OFFLOAD_ARGS[@]}"
)
```
MiniMax FP8 launcher regressed
High Severity
The MI355X MiniMax FP8 agentic launcher was replaced with a Kimi-style vLLM recipe. Existing minimaxm2.5-fp8-mi355x-vllm-agentic jobs (TP4/EP4, offloading=cpu) lose the prior --max-model-len, ROCM_AITER_UNIFIED_ATTN backend, MODEL_PATH-based serve, and SimpleCPU offload wiring they depended on.
Reviewed by Cursor Bugbot for commit faba18f.
```python
        device,
    )
    return torch.as_strided(
        base,
```
Kimi context length dropped
Medium Severity
The launcher no longer normalizes MAX_MODEL_LEN to 262144 or passes --max-model-len to vLLM. Agentic sweeps typically leave MAX_MODEL_LEN at 0, so the replay harness and Kimi’s enforced context window can disagree and traces may be filtered or rejected differently than the server allows.
Reviewed by Cursor Bugbot for commit faba18f.
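A minimal sketch of the normalization the comment says was dropped. It assumes the replay harness reads `MAX_MODEL_LEN` from the environment and that 0 (or unset) means "use the model's full window". The 262144 value comes from the comment; the helper and variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# From the review comment: Kimi's full context window.
KIMI_DEFAULT_MAX_MODEL_LEN=262144

# Hypothetical helper: map an unset/empty/0 MAX_MODEL_LEN to the model default.
normalize_max_model_len() {
  local requested="${1:-0}"
  if [ "$requested" = "0" ]; then
    echo "$KIMI_DEFAULT_MAX_MODEL_LEN"
  else
    echo "$requested"
  fi
}

MAX_MODEL_LEN="$(normalize_max_model_len "${MAX_MODEL_LEN:-0}")"
export MAX_MODEL_LEN  # assumed: the replay harness reads this variable
VLLM_LEN_ARGS=(--max-model-len "$MAX_MODEL_LEN")
printf '%s\n' "${VLLM_LEN_ARGS[*]}"
```

Passing the same value to both `--max-model-len` and the harness keeps trace filtering consistent with the server's enforced limit.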
Signed-off-by: ajith-sirra-amd <ajith.sirra@amd.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 10 total unresolved issues (including 9 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8ca4bc1.
```diff
 # ZMQ-style host string.
 LMCACHE_CONNECT_HOST="${LMCACHE_CONNECT_HOST:-tcp://$LMCACHE_HOST}"
-LMCACHE_L1_SIZE_GB="${LMCACHE_L1_SIZE_GB:-$TOTAL_CPU_DRAM_GB}"
+LMCACHE_L1_SIZE_GB="${LMCACHE_L1_SIZE_GB:-$((TOTAL_CPU_DRAM_GB / (8 / TP)))}"
```
LMCache pool wrongly partitioned
Medium Severity
LMCACHE_L1_SIZE_GB for the external LMCache MP server is derived with TOTAL_CPU_DRAM_GB / (8 / TP), the same formula used for per-rank vLLM CPU offload. The MP server owns one node pool; at TP=4 this shrinks L1 from ~3 TB to ~1.5 TB versus the prior full TOTAL_CPU_DRAM_GB default.
Reviewed by Cursor Bugbot for commit 8ca4bc1.
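The sizing difference can be checked with the two defaults from the diff. The sketch uses an illustrative `TOTAL_CPU_DRAM_GB` of 3072, roughly the ~3 TB budget mentioned in the PR but not a measured value. The helper names are hypothetical, and the arithmetic is bash integer arithmetic, as in the launcher.

```shell
#!/usr/bin/env bash
set -eu

# New default (per-rank share): TOTAL / (8 / TP), using integer division.
per_rank_l1_gb() {
  local total_gb="$1"
  local tp="$2"
  echo $(( total_gb / (8 / tp) ))
}

# Prior default (node-wide pool): the MP server gets the whole budget.
node_l1_gb() {
  echo "$1"
}

TOTAL_CPU_DRAM_GB=3072  # illustrative ~3 TB budget, not a measured value
for tp in 2 4 8; do
  echo "TP=$tp per-rank=$(per_rank_l1_gb "$TOTAL_CPU_DRAM_GB" "$tp")GB node-pool=$(node_l1_gb "$TOTAL_CPU_DRAM_GB")GB"
done
```

This prints 768, 1536, and 3072 GB for TP=2, 4, and 8, against a constant 3072 GB node pool. Because `8 / TP` is integer division, TP=3 also yields 1536, and any TP above 8 divides by zero.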


Summary
- `qwen3.5-fp4-mi355x-sglang-agentic-hicache` config: SGLang agentic-coding sweep with and without hicache offloading (TP2, EP1)
- `minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache` config: vLLM agentic-coding sweep with lmcache
- `minimaxm2.5_fp4_mi355x.sh`, `qwen3.5_fp4_mi355x.sh`
- `glm5.1_fp4_mi355x.sh`, `kimik2.5_fp4_mi355x.sh`, `minimaxm2.5_fp8_mi355x.sh`, `qwen3.5_fp8_mi355x.sh`
- `launch_mi355x-amds.sh`

Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Large benchmark-only change, but it alters KV offload/LMCache startup paths and host DRAM sizing on cluster jobs; misconfiguration could cause OOM or failed sweeps rather than app regressions.
Overview
Extends agentx-v0.4 MI355X agentic-coding coverage by wiring CPU-tier KV offload into both the CI matrix (`amd-master.yaml`) and single-node launchers.

Matrix: Adds targeted sweeps comparing `offloading: none` vs `hicache` (SGLang) or `lmcache` (vLLM/ATOM) for Qwen3.5 FP4/FP8, GLM-5.1 FP4, Kimi K2.5 FP4, MiniMax M2.5 FP4/FP8, and DeepSeek-V4 (SGLang + ATOM). Updates the existing Qwen3.5 FP8 HiCache entry (newer image, TP4 grid). Pins several MiniMax FP8 agentic jobs to vLLM v0.22.0 (from v0.22.1).

Launchers: New scripts for DSv4 SGLang/ATOM agentic runs and several model-specific agentic recipes. Existing scripts gain HiCache ratio/size tuning, LMCache MP (HIP build from source), larger host DRAM budgets (~3 TB), 060226_256k trace corpus overrides, and concurrency-tuned vLLM flags. Kimi’s launcher drops the prior in-repo ROCm LMCache monkey-patches in favor of upstream LMCache + connector config.

Slurm: `launch_mi355x-amds.sh` also excludes node `mia1-p01-g37`.

Reviewed by Cursor Bugbot for commit 8ca4bc1. Bugbot is set up for automated code reviews on this repo.