LM Workload TF32 #900
Conversation
Dev -> main

Commits:
- Revert "…w pytorch" (this reverts commit 6f7d638)
- Fix slow ImageNet workloads on PyTorch
- add mixed precision training for lm workload
- Revert "add mixed precision training for lm workload"
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
This branch has the mixed-precision code changes for all of the workloads. We have an existing PR open from the lm_workload_base branch that only has the LM workload changes. Note that I did not do a final test with mixed precision for just the LM workload after increasing the number of evals to get better timing estimates. At this point we may just opt to run the LM workload with TF32 for the new release, since we don't have any more bandwidth to test changes.
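For reference, here is a minimal sketch of what opting into TF32 on A100s typically looks like in PyTorch; these are standard PyTorch flags, not code taken from this branch:

```python
import torch

# On Ampere GPUs (e.g. A100), TF32 runs float32 matmuls and convolutions
# on tensor cores with a reduced-precision mantissa; these flags are no-ops
# on hardware without TF32 support.
torch.backends.cuda.matmul.allow_tf32 = True  # matmuls (attention, MLPs, ...)
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions

# Equivalent higher-level switch for matmuls (PyTorch >= 1.12):
torch.set_float32_matmul_precision("high")
```

The appeal of TF32 over full mixed precision is that the model and optimizer code stay untouched; the trade-off is a smaller speedup than float16/bfloat16 autocast.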
This is the PR for the LM workload with mixed-precision training on 4xA100.
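For context, a minimal sketch of the usual PyTorch mixed-precision pattern (float16 autocast plus gradient scaling); the model, optimizer, and tensor shapes below are hypothetical placeholders, not the workload's actual code:

```python
import torch

# Hypothetical placeholders; the real workload defines its own model,
# optimizer, and data pipeline.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

def train_step(batch, targets):
    optimizer.zero_grad(set_to_none=True)
    # Under autocast, matmul-heavy ops run in float16 while numerically
    # sensitive ops stay in float32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(batch), targets)
    # Scale the loss to avoid float16 gradient underflow; the scaler
    # unscales gradients before the optimizer step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()

loss = train_step(torch.randn(32, 1024, device="cuda"),
                  torch.randn(32, 1024, device="cuda"))
```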