@mmarcinkiewicz
Contributor

No description provided.

@mmarcinkiewicz mmarcinkiewicz requested a review from a team as a code owner August 29, 2025 14:56
@github-actions

github-actions bot commented Aug 29, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@ShriyaRishab
Contributor

@suachong - can you please review this?

@ShriyaRishab
Contributor

@suachong - is this the error you're still getting? -

0000-steps/0 [default6]:[rank6]:   File "/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/compiler.py", line 297, in make_cubin
0000-steps/0 [default6]:[rank6]:     raise RuntimeError(f'Internal Triton PTX codegen error: \n{log}')
0000-steps/0 [default6]:[rank6]: RuntimeError: Internal Triton PTX codegen error:
0000-steps/0 [default6]:[rank6]: ptxas fatal   : Value 'sm_100' is not defined for option 'gpu-name'
0000-steps/0 [default6]:
0000-steps/0 [default6]:
0000-steps/0 [default6]:
0000-steps/0 [default6]:[rank6]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

@mmarcinkiewicz - can you please help with this one?

NVIDIA-SMI 580.65.06
Driver Version: 580.65.06
CUDA Version: 13.0
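
For anyone hitting the same `ptxas fatal` line: it generally indicates that the `ptxas` binary Triton invoked is older than the target architecture, regardless of what driver version `nvidia-smi` reports. Below is a minimal diagnostic sketch, not the fix applied in this PR (the thread does not say what changed). It assumes that `sm_100` support first shipped with the CUDA 12.8 toolchain and that `ptxas --version` prints a `release X.Y` line; the helper names `parse_ptxas_release`, `supports_sm100`, and `check_ptxas` are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: sm_100 (Blackwell) requires a ptxas from CUDA 12.8 or newer.
MIN_RELEASE_FOR_SM100 = (12, 8)


def parse_ptxas_release(version_output):
    """Extract (major, minor) from `ptxas --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_sm100(release):
    """True when the parsed release meets the assumed minimum for sm_100."""
    return release is not None and release >= MIN_RELEASE_FOR_SM100


def check_ptxas(ptxas_path=None):
    """Run a ptxas binary and report whether it looks new enough for sm_100.

    Returns None when no ptxas binary can be found on PATH.
    """
    path = ptxas_path or shutil.which("ptxas")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return supports_sm100(parse_ptxas_release(result.stdout))
```

Note that Triton may ship its own `ptxas` inside the Python package rather than using the one on `PATH`, so it is worth passing that binary's location to `check_ptxas` explicitly when comparing the two.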

@suachong
Contributor

suachong commented Sep 9, 2025

@mmarcinkiewicz is the Dockerfile for B200 working? I'm still running into the same error that Shriya posted.

@mmarcinkiewicz
Contributor Author

I've rebuilt the image and I can repro now, which is weird because it used to work.

@mmarcinkiewicz
Contributor Author

@suachong please try now

Contributor

@suachong suachong left a comment

Tested with training run to convergence.

@hiwotadese
Contributor

@suachong is this working now and can we merge this?

@suachong
Contributor

suachong commented Sep 12, 2025 via email

@ShriyaRishab ShriyaRishab merged commit 30b8763 into mlcommons:master Sep 22, 2025
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 22, 2025

4 participants