Disable TE cross entropy loss fusion by mchrzanowski · Pull Request #5115 · NVIDIA/Megatron-LM

mchrzanowski · 2026-06-02T19:46:50Z

[X] I, the PR author, have personally reviewed every line of this PR.

What does this PR do ?

Disables the Transformer Engine implementation of cross entropy loss fusion with an assertion due to observed training stability issues, while keeping native cross entropy fusion available.

Issue tracking

Linked issue: N/A, small stability bug fix.

Contribution process

Pre-checks

I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code (Typing guidelines)
I have added relevant documentation
I have run the autoformatter.sh on my PR

Validation

Added a unit test covering rejection of cross_entropy_loss_fusion=True with cross_entropy_fusion_impl='te'.
Updated existing functional test configs and MoE README guidance to use native.
Ran python -m py_compile on changed Python files.
Ran git diff --check.
Could not run pytest in the local environment because torch is not installed.
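The `py_compile` step listed above only byte-compiles each file. It does not execute imports, which is presumably why it could run in an environment without torch. A minimal sketch of that behavior using the standard-library `py_compile` module follows; the helper name `compiles_cleanly` and the sample sources are made up for illustration and are not part of the PR.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(source: str) -> bool:
    """Byte-compile `source` from a temp file; True if it has no syntax errors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(
                path, cfile=os.path.join(tmp, "candidate.pyc"), doraise=True
            )
        except py_compile.PyCompileError:
            return False
        return True


# Imports are not executed at compile time, so a missing package goes unnoticed.
imports_missing_package = "import a_package_that_is_not_installed\n"
# A genuine syntax error is detected.
broken_syntax = "def f(:\n    pass\n"

print(compiles_cleanly(imports_missing_package))
print(compiles_cleanly(broken_syntax))
```

The check is therefore a syntax gate only. It says nothing about whether the new guard or the new unit test behaves as intended, which still needs a pytest run in an environment with torch installed.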

copy-pr-bot · 2026-06-02T19:46:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-06-02T19:47:05Z

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

Add the oncall reviewer (optional reviewer)
Add required review teams based on your changes

See the contribution guide for more details.

greptile-apps · 2026-06-02T19:51:44Z

Greptile Summary

This PR disables the Transformer Engine implementation of cross-entropy loss fusion by adding a guard that raises an error when cross_entropy_loss_fusion=True is combined with cross_entropy_fusion_impl='te', while leaving the native fusion path fully available.

Adds the guard in both ModelParallelConfig.__post_init__ and validate_args, so the restriction applies whether configuration is supplied via the Python API or the CLI.
Updates six functional test YAML configs, one unit test fixture, and the MoE README to switch from te to native.
Adds a new tests/unit_tests/test_model_parallel_config.py covering both the rejected and the allowed paths.
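The shape of the guard described above can be sketched with a small dataclass. `ToyParallelConfig` and `is_accepted` are hypothetical stand-ins, not Megatron-LM code, and the field defaults are assumptions. The sketch raises `ValueError`, as the review further down suggests; the guard quoted from the PR uses `assert`.

```python
from dataclasses import dataclass


@dataclass
class ToyParallelConfig:
    """Hypothetical stand-in for ModelParallelConfig with only the two relevant fields."""

    cross_entropy_loss_fusion: bool = False  # assumed default
    cross_entropy_fusion_impl: str = "native"  # assumed default

    def __post_init__(self) -> None:
        # Reject only the combination "fusion enabled" + "TE implementation".
        if self.cross_entropy_loss_fusion and self.cross_entropy_fusion_impl == "te":
            raise ValueError(
                "Transformer Engine cross entropy loss fusion is disabled due to "
                "stability issues. Use cross_entropy_fusion_impl='native', or "
                "disable cross_entropy_loss_fusion."
            )


def is_accepted(fusion: bool, impl: str) -> bool:
    """Return True if the toy config can be constructed with these settings."""
    try:
        ToyParallelConfig(cross_entropy_loss_fusion=fusion, cross_entropy_fusion_impl=impl)
    except ValueError:
        return False
    return True


print(is_accepted(True, "native"))  # native fusion stays available
print(is_accepted(False, "te"))     # impl='te' is harmless while fusion is off
print(is_accepted(True, "te"))      # the rejected combination
```

Because the check lives in `__post_init__`, it runs on every construction of the config object, which is what lets the restriction cover the Python API path independently of CLI argument validation.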

Confidence Score: 4/5

The change correctly blocks the unstable TE fusion path; the only concern is that model_parallel_config.py uses assert while every other guard in that same file uses raise ValueError, leaving the check silently skippable under python -O.

In model_parallel_config.py the new guard uses assert while all six surrounding validation checks raise ValueError. Because Python's -O flag strips assert statements, the intended stability block would be bypassed silently in any optimised deployment that passes cross_entropy_fusion_impl='te' through the direct Python API rather than the CLI path. The fix is a one-line change and the rest of the PR is clean.

megatron/core/model_parallel_config.py and tests/unit_tests/test_model_parallel_config.py (the test expects AssertionError and will need updating alongside the fix)
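The `python -O` concern can be reproduced without a second interpreter, since the built-in `compile()` accepts an `optimize` argument that mirrors the `-O` levels. The following is a minimal sketch; the two guard snippets are simplified stand-ins for the real check, and `guard_fires` is a hypothetical helper.

```python
GUARD_WITH_ASSERT = """
assert not (fusion and impl == 'te'), 'TE fusion is disabled'
"""

GUARD_WITH_RAISE = """
if fusion and impl == 'te':
    raise ValueError('TE fusion is disabled')
"""


def guard_fires(source: str, optimize: int) -> bool:
    """Run `source` compiled at the given optimization level; True if it raises."""
    # optimize=0 keeps assert statements; optimize=1 strips them, like `python -O`.
    code = compile(source, "<guard>", "exec", optimize=optimize)
    try:
        exec(code, {"fusion": True, "impl": "te"})
    except (AssertionError, ValueError):
        return True
    return False


for name, source in (("assert", GUARD_WITH_ASSERT), ("raise", GUARD_WITH_RAISE)):
    for level in (0, 1):
        print(f"{name:6s} optimize={level} fires={guard_fires(source, level)}")
```

Only the `assert` variant at `optimize=1` lets the disallowed combination through, which is the scenario the review describes for the direct Python API path.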

Important Files Changed

| Filename | Overview |
| --- | --- |
| megatron/core/model_parallel_config.py | Adds a guard against TE cross-entropy fusion using `assert` instead of `raise ValueError`, inconsistent with all other validation in the file and bypassable with `python -O`. |
| megatron/training/arguments.py | Adds TE cross-entropy fusion guard in `validate_args` using `assert`, consistent with the 188 other `assert`-based validations in this file. |
| tests/unit_tests/test_model_parallel_config.py | New unit test file; tests both the disabled TE path (expects `AssertionError`) and the allowed native path; will need updating if the guard is changed to `raise ValueError`. |
| tests/unit_tests/models/mimo/test_mimo_1f1b_schedule.py | Updates `cross_entropy_fusion_impl` from `'te'` to `'native'` in test fixture to comply with the new guard. |
| megatron/core/transformer/moe/README.md | Updates recommended flag from `te` to `native` in performance flags documentation. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User sets cross_entropy_loss_fusion=True] --> B{cross_entropy_fusion_impl?}
    B -- "'te'" --> C["ModelParallelConfig.__post_init__\nassert / raise ValueError"]
    C --> D[AssertionError / ValueError\n'TE fusion disabled due to stability issues']
    B -- "'native'" --> E[Validation passes]
    E --> F[Native cross-entropy fusion used]

    G[CLI: --cross-entropy-loss-fusion] --> H{--cross-entropy-fusion-impl?}
    H -- "te" --> I["validate_args assert"]
    I --> J[AssertionError\n'Use --cross-entropy-fusion-impl native']
    H -- "native" --> K[Validation passes]
    K --> L[Native cross-entropy fusion used]

Reviews (1): Last reviewed commit: "Disable TE cross entropy loss fusion"

greptile-apps · 2026-06-02T19:51:48Z

+        assert not (
+            self.cross_entropy_loss_fusion and self.cross_entropy_fusion_impl == 'te'
+        ), (
+            "Transformer Engine cross entropy loss fusion is disabled due to stability issues. "
+            "Use cross_entropy_fusion_impl='native', or disable cross_entropy_loss_fusion."
+        )


Every other validation in __post_init__ uses raise ValueError(...), but this guard uses assert. Running Python with the -O (optimize) flag silently disables all assert statements, meaning the TE fusion combination would be permitted and the stability regression would reappear without any error. Use raise ValueError here to match the surrounding code and make the guard unconditional.

Suggested change

-        assert not (
-            self.cross_entropy_loss_fusion and self.cross_entropy_fusion_impl == 'te'
-        ), (
-            "Transformer Engine cross entropy loss fusion is disabled due to stability issues. "
-            "Use cross_entropy_fusion_impl='native', or disable cross_entropy_loss_fusion."
-        )
+        if self.cross_entropy_loss_fusion and self.cross_entropy_fusion_impl == 'te':
+            raise ValueError(
+                "Transformer Engine cross entropy loss fusion is disabled due to stability issues. "
+                "Use cross_entropy_fusion_impl='native', or disable cross_entropy_loss_fusion."
+            )

greptile-apps · 2026-06-02T19:51:49Z

+def test_te_cross_entropy_loss_fusion_is_disabled():
+    with pytest.raises(AssertionError, match="Transformer Engine cross entropy loss fusion"):
+        ModelParallelConfig(cross_entropy_loss_fusion=True, cross_entropy_fusion_impl='te')


If the guard in model_parallel_config.py is changed from assert to raise ValueError (as suggested), this test will need to match ValueError instead of AssertionError.

Suggested change

-def test_te_cross_entropy_loss_fusion_is_disabled():
-    with pytest.raises(AssertionError, match="Transformer Engine cross entropy loss fusion"):
-        ModelParallelConfig(cross_entropy_loss_fusion=True, cross_entropy_fusion_impl='te')
+def test_te_cross_entropy_loss_fusion_is_disabled():
+    with pytest.raises(ValueError, match="Transformer Engine cross entropy loss fusion"):
+        ModelParallelConfig(cross_entropy_loss_fusion=True, cross_entropy_fusion_impl='te')

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

ko3n1g · 2026-06-03T17:02:16Z

/ok to test eab65fb

svcnvidia-nemo-ci · 2026-06-03T18:59:13Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26906455119

svcnvidia-nemo-ci · 2026-06-03T21:29:44Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26914187574

mchrzanowski requested review from a team as code owners June 2, 2026 19:46

svcnvidia-nemo-ci marked this pull request as draft June 2, 2026 19:47

mchrzanowski marked this pull request as ready for review June 2, 2026 19:48

svcnvidia-nemo-ci added the complexity: low label Jun 2, 2026

greptile-apps Bot reviewed Jun 2, 2026

yashaswikarnati approved these changes Jun 2, 2026

ko3n1g added the core_r0.18.0 label (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.) Jun 2, 2026

Disable TE cross entropy loss fusion

89daa0f

mchrzanowski force-pushed the disable-te-cross-entropy-fusion branch from 2fac253 to 89daa0f Compare June 3, 2026 02:12

kvareddy approved these changes Jun 3, 2026

yaox12 approved these changes Jun 3, 2026

svcnvidia-nemo-ci added the Final Review label (PR is in the "final review" stage) Jun 3, 2026

yaoyu-33 approved these changes Jun 3, 2026

svcnvidia-nemo-ci added the Approved label (All necessary approvals have been made) and removed the Final Review label (PR is in the "final review" stage) Jun 3, 2026

Merge branch 'main' into disable-te-cross-entropy-fusion

eab65fb

mchrzanowski enabled auto-merge June 3, 2026 16:23

yaoyu-33 mentioned this pull request Jun 3, 2026

fix(recipes): default cross entropy fusion to native NVIDIA-NeMo/Megatron-Bridge#4138

Closed

copy-pr-bot Bot temporarily deployed to public June 3, 2026 17:02 Inactive

copy-pr-bot Bot temporarily deployed to test June 3, 2026 17:03 Inactive

copy-pr-bot Bot temporarily deployed to public June 3, 2026 17:06 Inactive

copy-pr-bot Bot temporarily deployed to public June 3, 2026 17:15 Inactive

mchrzanowski added this pull request to the merge queue Jun 3, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 3, 2026

jaredcasper approved these changes Jun 3, 2026

ko3n1g added this pull request to the merge queue Jun 3, 2026

Merged via the queue into NVIDIA:main with commit 168cb15 Jun 3, 2026
86 of 87 checks passed

ko3n1g merged 2 commits into NVIDIA:main from mchrzanowski:disable-te-cross-entropy-fusion

8 participants