Skip to content

fix(recipes): default cross entropy fusion to native#4138

Closed
yaoyu-33 wants to merge 1 commit into
mainfrom
theo/mcore5115-cross-entropy-defaults
Closed

fix(recipes): default cross entropy fusion to native#4138
yaoyu-33 wants to merge 1 commit into
mainfrom
theo/mcore5115-cross-entropy-defaults

Conversation

@yaoyu-33
Copy link
Copy Markdown
Contributor

@yaoyu-33 yaoyu-33 commented Jun 3, 2026

Summary

  • Switch shipped recipe defaults from cross_entropy_fusion_impl="te" to "native" while keeping cross_entropy_loss_fusion=True.
  • Update provider and bridge defaults that can bypass recipes, including DeepSeek, Kimi, Kimi-VL, and Sarvam providers.
  • Update the common performance override and diffusion AR-to-DLM recipe default to use native CE fusion.
  • Add a focused unit guard that scans shipped recipe/provider/performance source defaults for cross_entropy_fusion_impl="te".

Context

NVIDIA/Megatron-LM#5115 disables the Transformer Engine implementation of cross-entropy loss fusion because of training-stability issues. Native cross-entropy fusion remains available, so Bridge defaults should avoid producing cross_entropy_loss_fusion=True with cross_entropy_fusion_impl="te".

Validation

  • uv run --active --no-sync pre-commit run --all-files
  • uv run --active --no-sync ruff check <changed files>
  • uv run --active --no-sync ruff format --check <changed files>
  • git diff --check
  • uv run --active --no-sync python -m pytest --confcutdir=tests/unit_tests/recipes tests/unit_tests/recipes/test_cross_entropy_defaults.py -v

Targeted recipe pytest suites were attempted locally, but full local collection is blocked by environment issues: project sync cannot install the pinned nvidia-resiliency-ext==0.6.0 wheel for this platform, and the active environment does not provide Transformer Engine for recipe module imports.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 3, 2026

Light Code Review -- Clean, well-scoped mechanical change. All cross_entropy_fusion_impl defaults switched from te to native. Verified no remaining te defaults. Guard test is solid (AST parsing). Completeness looks good. No bugs found. Suggested test cases: The change to scripts/performance/utils/overrides.py affects all perf workloads. No individual perf config files were modified. All perf baselines will need re-evaluation.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 3, 2026

Light Code Review

Clean, well-scoped mechanical change. All cross_entropy_fusion_impl defaults switched from te to native across recipes, bridges, providers, perf overrides, and tests. Verified no remaining te defaults in src/ or scripts/.

Observations

  • Guard test is solid. test_cross_entropy_defaults.py uses AST parsing to scan shipped defaults for Assign, AnnAssign, and keyword-arg patterns. Good regression prevention.
  • Completeness looks good. All bridge files (DeepSeek v2/v3/v4, Kimi, Kimi-VL), both Sarvam providers, all recipe families (Llama, Qwen, DeepSeek, Kimi, Moonlight, Nemotron, StepFun), diffusion, and the common perf override are covered.
  • Existing test assertions updated consistently. The Llama test simplification (removing the pretrain-vs-SFT branch) is correct since all configs now use native.

No bugs, typos, or missing coverage found.

Suggested test cases

The change to scripts/performance/utils/overrides.py (_set_common_perf_overrides) affects all perf workloads that have the cross_entropy_fusion_impl attribute, since this is the shared override path. No individual perf config files under scripts/performance/configs/ were modified and none directly set cross_entropy_fusion_impl, so there are no config-specific test cases to enumerate. All perf baselines will need to be re-evaluated after this lands since native CE fusion may have different throughput characteristics than TE CE fusion.

@yaoyu-33 yaoyu-33 added the r0.5.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge. label Jun 3, 2026
@yaoyu-33 yaoyu-33 closed this Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

r0.5.0 Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant