Add CI testcases and benchmark for allreduce#387

Open
yanboshao wants to merge 6 commits into main from allreduce

yanboshao (Contributor) commented Apr 13, 2026

Motivation

  1. Remove the mem_ops module.
  2. Add CI test cases for allreduce.

Technical Details

Test Plan

Test the performance of allreduce on MI325.
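Allreduce benchmarks often report both algorithm bandwidth (algbw) and bus bandwidth (busbw). A minimal sketch of the nccl-tests convention for allreduce, busbw = algbw * 2(n-1)/n, assuming a per-iteration time in seconds; the function name is hypothetical and the numbers are placeholders, not MI325 results from this PR:

```python
# Bus-bandwidth convention as used by nccl-tests for allreduce.
# Function name, sizes, timing and rank count are illustrative placeholders.

def allreduce_bandwidth_gbps(size_bytes, time_s, n_ranks):
    """Return (algbw, busbw) in GB/s for one allreduce of size_bytes taking time_s."""
    algbw = size_bytes / time_s
    # In a ring allreduce each rank moves 2*(n-1)/n of the buffer, so busbw
    # scales algbw by that factor to stay comparable across rank counts.
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw / 1e9, busbw / 1e9


alg, bus = allreduce_bandwidth_gbps(size_bytes=64 * 2**20, time_s=500e-6, n_ranks=8)
print(f"algbw={alg:.1f} GB/s  busbw={bus:.1f} GB/s")
```

Reporting busbw alongside latency makes results from runs with different rank counts easier to compare.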

Test Result

Submission Checklist

yanboshao changed the title from "Allreduce" to "Add CI testcases and benchmark for allreduce" on Apr 13, 2026
…ntain permissions'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
coderfeli (Collaborator) commented Apr 13, 2026

[rank=7] Error: [rank=7] cudagraph max_err=7.845e+00 >= atol=0.15
File "/tmp/flydsl-main/tests/kernels/test_allreduce.py", line 458, in _dist_worker
assert max_err < atol, f"[rank={rank}] cudagraph max_err={max_err:.3e} >= atol={atol}"
^^^^^^^^^^^^^^

Are these expected?
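For context, the failing assertion compares each rank's allreduce output against a reference with an absolute tolerance (atol=0.15 in the log). A minimal sketch of that kind of check, assuming the reference is the elementwise sum of every rank's input; the helpers `reference_allreduce`, `max_abs_error` and `check_rank` are hypothetical and are not taken from test_allreduce.py:

```python
# Hypothetical helpers illustrating an allreduce correctness check with an
# absolute tolerance; names and structure are NOT taken from test_allreduce.py.

def reference_allreduce(rank_inputs):
    """Elementwise sum across ranks: what every rank should hold after a sum-allreduce."""
    return [sum(vals) for vals in zip(*rank_inputs)]


def max_abs_error(actual, expected):
    """Largest absolute elementwise difference between two equal-length lists."""
    return max(abs(a - e) for a, e in zip(actual, expected))


def check_rank(rank, actual, expected, atol=0.15):
    """Raise AssertionError (same message shape as the log) if max_err >= atol."""
    max_err = max_abs_error(actual, expected)
    assert max_err < atol, f"[rank={rank}] cudagraph max_err={max_err:.3e} >= atol={atol}"
    return max_err


# Three ranks, three elements each.
inputs = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [2.0, 1.0, 0.0]]
expected = reference_allreduce(inputs)

# Small numerical noise stays within the tolerance.
print(check_rank(0, [3.5, 3.5, 3.52], expected))

# An output missing one rank's contribution in one element fails the check.
try:
    check_rank(1, [3.5, 3.5, 3.0], expected)
except AssertionError as e:
    print(e)
```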

coderfeli (Collaborator) commented

The benchmark prints too many logs, which makes it difficult to tell whether any regressions exist.
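One way to make regressions visible would be to print a single line per configuration and compare it against a stored baseline. A minimal sketch, assuming results are keyed by message size in bytes with latency in microseconds and a 5% regression threshold; the function name `summarize`, the data layout and the threshold are illustrative assumptions, not part of this PR:

```python
# Hypothetical compact benchmark summary: one line per message size plus a list
# of sizes whose latency grew by more than `threshold` relative to a baseline.

def summarize(results, baseline, threshold=0.05):
    """results/baseline map size_bytes -> latency_us; returns (lines, regressed_sizes)."""
    lines, regressions = [], []
    for size in sorted(results):
        cur = results[size]
        base = baseline.get(size)
        if base is None:
            lines.append(f"{size:>10} B  {cur:8.2f} us  (no baseline)")
            continue
        delta = (cur - base) / base  # positive means slower than baseline
        regressed = delta > threshold
        if regressed:
            regressions.append(size)
        flag = "REGRESSION" if regressed else "ok"
        lines.append(f"{size:>10} B  {cur:8.2f} us  {delta:+7.1%}  {flag}")
    return lines, regressions


lines, regs = summarize({1024: 10.0, 1048576: 60.0}, {1024: 10.2, 1048576: 50.0})
print("\n".join(lines))
print("regressed sizes:", regs)
```

A CI job could then fail only when the regression list is non-empty, instead of relying on someone reading the full log.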
