

@sundar-pds
Collaborator

No description provided.

@sundar-pds changed the title from "RoCE Workload Dockerfile, Build and Setup scripts for docker image generation and Readme for RCCL runs." to "Dockerfile for RoCE Workload, Build and Setup scripts for image generation and Readme for RCCL runs." on Dec 12, 2025
@spraveenio requested a review from Copilot and removed the review requests for sajmera-pensando and yuva29 on December 12, 2025, 03:11
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a Docker-based infrastructure for running RoCE (RDMA over Converged Ethernet) workloads with RCCL (ROCm Communication Collectives Library). The implementation includes a Dockerfile that bundles ROCm, RCCL, OpenMPI, UCX, and AMD AINIC drivers, along with supporting scripts for building and executing distributed GPU workloads.

Key changes:

  • Dockerfile for building a comprehensive RoCE workload environment with AMD GPU support
  • Build script (docker-build.sh) for automated image generation with configurable parameters
  • Runtime script (run_rccl.sh) for executing RCCL performance tests across multiple nodes
  • Utility script (show_gid) for displaying InfiniBand GID information
  • Documentation covering build instructions and usage examples

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

  • docker/roce-workload/Dockerfile: Multi-stage build setup that installs UCX, OpenMPI, RCCL, the AMD Network Plugin, and AINIC drivers, with SSH configuration for distributed runs
  • docker/roce-workload/docker-build.sh: Build automation script that parameterizes image tags, driver versions, and component versions
  • docker/roce-workload/run_rccl.sh: MPI launcher script configured with RCCL-specific environment variables for multi-node collective operations
  • docker/roce-workload/show_gid: Bash utility that displays InfiniBand device GIDs with color-coded output
  • docker/roce-workload/README.md: Documentation covering build instructions and the deployment workflow


@sundar-pds
Collaborator Author

Any further comments on this PR will be addressed in a subsequent PR. I will merge this PR now.

@sundar-pds merged commit 7429ed8 into ROCm:main on Dec 12, 2025
4 of 6 checks passed

Labels: None yet

Projects: None yet


1 participant