[plugin][dashboard] use nightly date tagged docker #503
Pull request overview
This PR improves reproducibility and presentation of ATOM vLLM OOT benchmark runs by resolving the floating vllm-latest Docker tag to a stable, date-tagged (or digest-pinned) image reference and by surfacing the resolved image digest in the benchmark dashboard while suppressing commit metadata in OOT detail views.
Changes:
- Update the OOT benchmark workflow to resolve `rocm/atom-dev:vllm-latest` to a same-digest nightly tag (or digest-pin) and record the pulled image digest into the result payload.
- Add a new `.github/scripts/resolve_oot_image.py` helper to perform the Docker Hub registry resolution from a floating tag to a stable reference.
- Extend the OOT dashboard pipeline/UI to ingest and display the Docker image digest, and hide commit/message/author for ATOM-vLLM detail views.
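The resolution described in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in `resolve_oot_image.py`: the function name `resolve_stable_reference`, the `vllm-nightly-YYYYMMDD` tag pattern, and the in-memory tag-to-digest mapping (standing in for registry responses) are all assumptions made for the example.

```python
import re

# Hypothetical nightly tag pattern; the real naming scheme may differ.
NIGHTLY_RE = re.compile(r"^vllm-nightly-(\d{8})$")


def resolve_stable_reference(repository, floating_tag, tag_digests):
    """Map a floating tag to a same-digest nightly tag, else pin by digest.

    tag_digests is a dict of tag -> manifest digest, standing in for
    whatever the registry reports.
    """
    target = tag_digests[floating_tag]
    nightly = sorted(
        (tag for tag in tag_digests if NIGHTLY_RE.match(tag)),
        reverse=True,  # YYYYMMDD suffixes sort lexicographically, newest first
    )
    for tag in nightly:
        if tag_digests[tag] == target:
            return f"{repository}:{tag}"
    # No nightly tag shares the digest: fall back to a digest-pinned reference.
    return f"{repository}@{target}"
```

Either return value identifies one immutable image, which is the property the benchmark needs; the nightly tag is simply the more readable of the two.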
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `.github/workflows/atom-vllm-oot-benchmark.yaml` | Resolves prebuilt OOT image to a stable nightly/digest reference and records the pulled digest into the benchmark payload. |
| `.github/scripts/resolve_oot_image.py` | New resolver script that maps a floating tag to a same-digest nightly tag (or digest-pinned fallback). |
| `.github/scripts/oot_benchmark_to_dashboard.py` | Adds `oot_image_digest` into the `extra` metadata string for dashboard ingestion. |
| `.github/dashboard/index.html` | Parses digest from `extra` and updates detail/popover rendering to show digest and hide commit metadata for ATOM-vLLM. |
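The round trip between the ingestion script and the dashboard might look something like the sketch below. The actual layout of the `extra` string is not shown in this conversation, so the `key=value;key=value` format and the helper names `build_extra` and `parse_extra` are assumptions for illustration only; just the `oot_image_digest` key comes from the PR.

```python
def build_extra(fields):
    """Join non-empty key=value pairs into one metadata string."""
    return ";".join(f"{key}={value}" for key, value in fields.items() if value)


def parse_extra(extra):
    """Split the metadata string back into a dict, skipping malformed items."""
    pairs = (item.split("=", 1) for item in extra.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


extra = build_extra({"model": "demo", "oot_image_digest": "sha256:abc123"})
digest = parse_extra(extra).get("oot_image_digest")
```

Using `.get` on the parsed dict lets the reader treat a missing digest as "not recorded" rather than an error, which matters for older results that predate the field.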
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Low: Reverse lookup in `.github/scripts/resolve_oot_image.py` performs a linear, per-tag digest fetch for nightly tags. As the number of tags grows, request volume increases and may hit Docker Hub rate limits, which can force a fallback to the floating `vllm-latest` tag and weaken reproducibility.

`resolve_oot_image.py`: `candidates = nightly_candidates(list_tags(repository, token), preferred_version)`

The Docker images are searched from newest to oldest by digest, so the search overhead is very small.
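The trade-off in this exchange can be made concrete with a small sketch. It assumes the candidates are already ordered newest first and that each digest fetch costs one registry request; `find_matching_tag`, `fetch_digest`, and the `max_lookups` cap are hypothetical names, not part of the PR.

```python
def find_matching_tag(candidates, target_digest, fetch_digest, max_lookups=30):
    """Return (tag, lookups) for the first candidate whose digest matches.

    candidates must already be ordered newest first. fetch_digest is a
    callable tag -> digest that costs one registry request per call.
    """
    lookups = 0
    for tag in candidates[:max_lookups]:
        lookups += 1
        if fetch_digest(tag) == target_digest:
            return tag, lookups  # stop at the first match
    return None, lookups
```

When the floating tag points at a recent nightly build, the loop exits after one or two requests; the cap bounds the worst case when no nightly tag matches.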
and neglect commit/message/author info Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
512283e to 3d69252
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Can we remove the OOT or oot naming?

Sure. I will remove the OOT terminology from the code changed in this PR. The remaining removals will be handled in PR #541.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)

.github/workflows/atom-vllm-benchmark.yaml:1
- The workflow name was changed to "ATOM vLLM Benchmark", but later steps still query the old workflow name ("ATOM vLLM OOT Benchmark") when downloading baseline runs via `gh run list --workflow=...`. That lookup will return no runs, so regression/baseline comparison will silently stop working. Update the `gh run list` filter to match the new workflow name (or switch to referencing the workflow file path / workflow ID to avoid breakage on future renames).

.github/workflows/atom-vllm-benchmark.yaml:186
- This fallback uses the floating tag `rocm/atom-dev:vllm-latest` when resolution fails, which can undermine the PR goal of avoiding non-date-tagged (and non-stable) images for main-branch benchmarks. Consider failing the run when digest resolution cannot be performed, or add a more robust fallback that still produces a digest-pinned reference (e.g., retry/backoff, optional Docker Hub auth to avoid rate limits, or a second resolver using `docker manifest inspect`).
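One way to read the second suggestion (retry with backoff, then fail instead of falling back to the floating tag) is sketched below. It is only an illustration of the idea: `resolve_with_retry`, `ResolutionError`, and the injected `resolve` callable are hypothetical names, and treating `OSError` as the retryable failure is an assumption about how a network error would surface.

```python
import time


class ResolutionError(RuntimeError):
    """Raised when no stable image reference could be produced."""


def resolve_with_retry(resolve, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call resolve() with exponential backoff; raise instead of falling back."""
    last_error = None
    for attempt in range(attempts):
        try:
            return resolve()
        except OSError as exc:  # assumed shape of a network or rate-limit failure
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ResolutionError(
        f"image resolution failed after {attempts} attempts"
    ) from last_error
```

Raising here makes the workflow step fail loudly, so a benchmark result is never recorded against an image that could not be pinned. The `sleep` parameter exists only so the delay can be stubbed out in tests.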
When running the vllm-atom benchmark, use only a date-tagged Docker image instead of the floating latest image.
CC: @wuhuikx