[Performance] Batched calibration #2054
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
HDCharles left a comment:
looks good aside from the missing docstring
Purpose
- `batch_size` controls the batch size of the calibration data
- `offload_sequential_activations` controls whether calibration data is offloaded to the CPU between layers
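A hedged usage sketch of the two new options (assuming, purely for illustration, that they are passed to `oneshot` alongside the existing dataset arguments; exact names and placement follow this PR's description and may differ in the final API):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Sketch only: batch_size and offload_sequential_activations are the arguments
# introduced by this PR; all other arguments are existing oneshot options.
oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
    batch_size=4,                          # calibrate 4 samples per forward pass
    offload_sequential_activations=False,  # keep activations on GPU between layers
)
```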
Prerequisites

Changes
Batched Calibration
- Added a `batch_size` argument
- Changed the `data_collator` default from the default data collator to a "truncation" collator: the `data_collator_with_truncation` function truncates all samples to the shortest-length sample in the batch
- Added a `LengthAwareSampler`, which samples from the dataset such that samples of similar length are batched together
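A minimal sketch of the two ideas above (not the PR's actual implementation): a collator that truncates every sample in a batch to the batch's shortest sample, and a sampler that orders the dataset by length so similarly sized samples land in the same batch.

```python
from typing import Dict, Iterator, List

import torch
from torch.utils.data import Sampler


def data_collator_with_truncation(batch: List[Dict[str, List[int]]]) -> Dict[str, torch.Tensor]:
    """Collate a batch by truncating every sample to the shortest sample in the batch.

    Truncating (rather than padding) keeps padding tokens out of calibration entirely,
    at the cost of dropping the tails of longer samples.
    """
    min_len = min(len(sample["input_ids"]) for sample in batch)
    return {
        key: torch.tensor([sample[key][:min_len] for sample in batch])
        for key in batch[0]
    }


class LengthAwareSampler(Sampler):
    """Yield dataset indices ordered by sample length so that each batch contains
    samples of similar length and truncation removes as little as possible."""

    def __init__(self, lengths: List[int]):
        self.lengths = lengths

    def __iter__(self) -> Iterator[int]:
        yield from sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])

    def __len__(self) -> int:
        return len(self.lengths)
```

These plug into a standard `DataLoader`, e.g. `DataLoader(dataset, batch_size=4, sampler=LengthAwareSampler(lengths), collate_fn=data_collator_with_truncation)`.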
Disable Offloading
- Added an `offload_sequential_activations` argument, which defaults to True (no behavior change)
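For intuition, a toy sketch of the trade-off this flag controls (not the PR's `IntermediatesCache` implementation): with offloading enabled, each layer's outputs are parked on CPU and copied back to GPU before the next layer, saving GPU memory at the cost of extra host-device transfers.

```python
from typing import Callable, List

import torch


def run_layers_sequentially(
    layers: List[Callable[[torch.Tensor], torch.Tensor]],
    activations: List[torch.Tensor],
    offload: bool = True,
) -> List[torch.Tensor]:
    """Illustration only: run calibration activations through layers one at a time,
    optionally parking them on CPU between layers to free GPU memory."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for layer in layers:
        outputs = []
        for act in activations:
            act = act.to(device)      # bring the activation back onto the GPU
            out = layer(act)
            if offload:
                out = out.to("cpu")   # free GPU memory between layers
            outputs.append(out)
        activations = outputs
    return activations
```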
Misc
- `TextGenerationDataset`
- Removed `_mask_padding` from `IntermediatesCache`, as I do not believe that this method is effective in masking padding tokens from hessian calculations

Testing
Evaluation Regression
Deleting significant portions of the dataset (longer sequences first) has a detrimental effect on recovery.
Modifiers
Calibration Regression Testing
I ran calibration for the following models (but did not evaluate recovery):
The following model examples can calibrate without issue:
The following models had a bug where processor and model dtypes were mismatched, which is now fixed by this PR:
The following models have an accelerate device offloading bug:
The following model examples have an MoE replacement bug:
Future Work
While these options are a great place to start, the next step for improving runtime is to allow multi-GPU compression, likely via `torch.distributed` tensor parallelism.