Conversation

@kylesayrs (Collaborator) commented on Nov 20, 2025

Purpose

  • Reduce calibration runtime by providing users with options to increase performance
    • batch_size controls how many calibration samples are processed per forward pass
    • offload_sequential_activations controls whether intermediate activations are offloaded to the CPU between layers

Prerequisites

Changes

Batched Calibration

  • Add batch_size argument
  • Change the default data_collator to a "truncation" collator
  • The data_collator_with_truncation function truncates all samples in a batch to the length of the shortest sample (see the sketch after this list)
    • Statistics on how many tokens are dropped by this method are in the tables below
    • The data collator can instead be set to "padding" to pad to the longest sample in the batch
  • To reduce excess truncation/padding, default to LengthAwareSampler, which batches samples of similar length together (a sampler sketch follows the table below)
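
A minimal sketch of a truncation collator, assuming each sample is a dict of token-ID lists keyed the same way (the real data_collator_with_truncation may handle more cases):

```python
import torch

def truncation_collator(batch: list[dict]) -> dict[str, torch.Tensor]:
    # Truncate every sample to the shortest sequence in the batch, so the
    # batch can be stacked densely without introducing any padding tokens.
    min_len = min(len(sample["input_ids"]) for sample in batch)
    return {
        key: torch.tensor([sample[key][:min_len] for sample in batch])
        for key in batch[0]
    }
```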
| Batch Size | Time | Speedup (%) | Deleted (%) |
|---|---|---|---|
| Original (1) | 11m17s | N/A | 0.0 |
| 1 | 11m17s | 0.0 | 0.0 |
| 2 | 10m48s | 4.2 | 0.2 |
| 4 | 10m39s | 5.6 | 0.5 |
| 8 | 10m39s | 5.6 | 1.1 |
| 16 | 10m58s | 2.8 | 2.6 |
| 64 | 11m04s | 11.2 | 12.0 |
| 128 | 9m29s | 16.0 | 23.9 |
| 512 | 7m39s | 37.3 | 75.3 |
  • The speedup is relatively meager up until you start deleting significant portions of the dataset via truncation
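
For intuition, a minimal sketch of a length-aware sampler over a map-style dataset of tokenized samples; the class name and details here are illustrative and may differ from the PR's actual LengthAwareSampler:

```python
from torch.utils.data import Sampler

class LengthSortedSampler(Sampler):
    """Yield indices sorted by sequence length so that consecutive batches
    contain similar-length samples, minimizing truncation/padding waste."""

    def __init__(self, dataset, length_key: str = "input_ids"):
        self._order = sorted(
            range(len(dataset)), key=lambda i: len(dataset[i][length_key])
        )

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)
```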

Disable Offloading

  • Add offload_sequential_activations argument, defaulting to True (no behavior change)
    • Disabling offloading increases throughput but also increases memory usage (illustrated below)
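
Conceptually, the tradeoff looks like the following (an illustrative sketch, not the PR's actual IntermediatesCache logic):

```python
import torch

def maybe_offload(activation: torch.Tensor, offload: bool) -> torch.Tensor:
    # Between layers, cached activations either move to CPU (low GPU memory,
    # extra PCIe transfers) or stay on the GPU (faster, higher memory usage).
    return activation.to("cpu", non_blocking=True) if offload else activation
```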
| Batch Size | Time | Speedup (%) | Deleted (%) |
|---|---|---|---|
| Original (1) | 11m17s | N/A | 0.0 |
| 1 | 10m14s | 9.3 | 0.0 |
| 2 | 9m46s | 13.4 | 0.2 |
| 4 | 9m36s | 14.9 | 0.5 |
| 8 | 9m48s | 13.1 | 1.1 |
| 16 | 9m26s | 16.3 | 2.6 |
| 32 | 9m27s | 16.2 | 5.8 |
| 128 | 8m34s | 24.0 | 23.9 |
| 512 | 6m40s | 40.9 | 75.3 |
  • The memory requirement for 512 samples on Llama 8B is ~70GB, roughly the same as for batch size 128
  • With offloading disabled and batch size 32, calibration runtime is less than 1s per layer, down from ~11s (example invocation below)
    • Since calibration itself then accounts for a negligible share of per-layer runtime, the theoretical maximum speedup from reducing calibration time alone is ~15% for this model + dataset
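
Putting the two new options together, a hypothetical invocation might look like the following; batch_size and offload_sequential_activations are the arguments this PR adds, while the remaining arguments follow existing llm-compressor usage and are illustrative:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=512,
    num_calibration_samples=512,
    batch_size=32,                         # new in this PR
    offload_sequential_activations=False,  # new in this PR: trades memory for speed
)
```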

Misc

  • Fix examples
    • Fixed examples with dtype mismatches between the model and processor (Mixtral, Pixtral, Whisper)
    • For multimodal models that use multimodal datasets, removed their data collators, as batch unwrapping is now done by the TextGenerationDataset
  • Remove _mask_padding from IntermediatesCache, as I do not believe this method is effective at masking padding tokens from Hessian calculations
  • Fix AWQ
    • AWQ was hard-coded to handle only batches of size 1 (see the sketch below)
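
For illustration, one common way to lift a batch-size-1 assumption is to flatten leading dimensions before accumulating per-channel activation statistics; the helper below is a hypothetical sketch, not the PR's actual fix:

```python
import torch

def flatten_for_stats(x: torch.Tensor) -> torch.Tensor:
    # Collapse all leading dims (batch, seq, ...) into one, so downstream
    # per-channel statistics work for any batch size, not just batch size 1.
    # e.g. a (32, 2048, 4096) activation becomes (65536, 4096).
    return x.reshape(-1, x.shape[-1])
```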

Testing

Evaluation Regression

| Batch Size | Eval Score | Difference (pp) | Deleted (%) |
|---|---|---|---|
| Original (1) | 0.6573 | 0.000 | 0.0 |
| 1 | 0.6513 | -0.6 | 0.0 |
| 2 | 0.6513 | -0.6 | 0.2 |
| 4 | 0.6657 | +0.8 | 0.5 |
| 8 | 0.6513 | -0.6 | 1.1 |
| 16 | 0.6672 | +1.0 | 2.6 |
| 64 | 0.6338 | -2.4 | 12.0 |
| 128 | 0.6603 | +0.3 | 23.9 |
| 512 | 0.6391 | -1.8 | 75.3 |

Deleting significant portions of the dataset (longer sequences are deleted first) has a detrimental effect on recovery

Modifiers

  • GPTQ
    • Ran full regression tests, as shown above
  • AWQ
    • Ran AWQ with batch size 32 and checked output sanity
  • Quantization Modifier
    • Ran NVFP4 with batch size 10 and checked output sanity

Calibration Regression Testing

I ran calibration for the following models (but did not evaluate recovery)

The following model examples can calibrate without issue:

  • Llama3
  • Gemma3
  • Internvl3
  • Mllama
  • Llama4

The following models had a bug where processor and model dtypes were mismatched, which this PR fixes:

  • Mistral3
  • Pixtral
  • Whisper

The following models have an accelerate device offloading bug:

  • Idefics3
  • Phi3 Vision

The following model examples have an MoE replacement bug:

  • qwen3-vl-30b-a3b-Instruct

Future Work

While these options are a great place to start, the next step to improve runtime is to allow multi-GPU compression, likely via torch.distributed tensor parallelism.

@github-actions commented
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs kylesayrs force-pushed the kylesayrs/batched-calibration branch from 32de48f to 35a0507 Compare December 2, 2025 01:09
@kylesayrs kylesayrs changed the base branch from main to kylesayrs/modifiers-expose-targets December 2, 2025 01:10
@kylesayrs kylesayrs force-pushed the kylesayrs/modifiers-expose-targets branch from 34814c7 to 6559de0 Compare December 2, 2025 19:11
@kylesayrs kylesayrs force-pushed the kylesayrs/batched-calibration branch from dc957a8 to 74f8882 Compare December 4, 2025 22:25
@kylesayrs kylesayrs force-pushed the kylesayrs/batched-calibration branch from 74f8882 to 33c6fe9 Compare December 4, 2025 23:29
@kylesayrs kylesayrs force-pushed the kylesayrs/batched-calibration branch from 33c6fe9 to 4cd1f89 Compare December 4, 2025 23:30
@kylesayrs kylesayrs force-pushed the kylesayrs/batched-calibration branch from c0e37ab to 413d7b6 Compare December 5, 2025 02:48
@kylesayrs kylesayrs changed the base branch from kylesayrs/modifiers-expose-targets to main December 5, 2025 02:54
@kylesayrs kylesayrs added the ready When a PR is ready for review label Dec 5, 2025
@kylesayrs kylesayrs marked this pull request as ready for review December 5, 2025 05:55
@HDCharles (Collaborator) previously approved these changes Dec 5, 2025:

looks good aside from the missing docstring
