[Performance] Batched calibration #2054
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
HDCharles left a comment:
looks good aside from the missing docstring
Purpose
- `batch_size` controls the batch size of the calibration data
- `offload_sequential_activations` controls whether calibration data is offloaded to the CPU between layers
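A hedged usage sketch of the two new options (assuming, purely for illustration, that they are passed to `oneshot` alongside the existing dataset arguments; exact names and placement follow this PR's description and may differ in the final API):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Sketch only: batch_size and offload_sequential_activations are the arguments
# introduced by this PR; all other arguments are existing oneshot options.
oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
    batch_size=4,                          # calibrate 4 samples per forward pass
    offload_sequential_activations=False,  # keep activations on GPU between layers
)
```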
Prerequisites

Changes
Batched Calibration
- Added a `batch_size` argument
- Changed the `data_collator` default from the default data collator to a "truncation" collator: the `data_collator_with_truncation` function truncates all samples to the shortest-length sample in the batch
- Added a `LengthAwareSampler`, which samples from the dataset such that samples of similar length are batched together
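A minimal sketch of the two ideas above (not the PR's actual implementation): a collator that truncates every sample in a batch to the batch's shortest sample, and a sampler that orders the dataset by length so similarly sized samples land in the same batch.

```python
from typing import Dict, Iterator, List

import torch
from torch.utils.data import Sampler


def data_collator_with_truncation(batch: List[Dict[str, List[int]]]) -> Dict[str, torch.Tensor]:
    """Collate a batch by truncating every sample to the shortest sample in the batch.

    Truncating (rather than padding) keeps padding tokens out of calibration entirely,
    at the cost of dropping the tails of longer samples.
    """
    min_len = min(len(sample["input_ids"]) for sample in batch)
    return {
        key: torch.tensor([sample[key][:min_len] for sample in batch])
        for key in batch[0]
    }


class LengthAwareSampler(Sampler):
    """Yield dataset indices ordered by sample length so that each batch contains
    samples of similar length and truncation removes as little as possible."""

    def __init__(self, lengths: List[int]):
        self.lengths = lengths

    def __iter__(self) -> Iterator[int]:
        yield from sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])

    def __len__(self) -> int:
        return len(self.lengths)
```

These plug into a standard `DataLoader`, e.g. `DataLoader(dataset, batch_size=4, sampler=LengthAwareSampler(lengths), collate_fn=data_collator_with_truncation)`.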
Disable Offloading
- Added an `offload_sequential_activations` argument, which defaults to True (no behavior change)
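For intuition, a toy sketch of the trade-off this flag controls (not the PR's `IntermediatesCache` implementation): with offloading enabled, each layer's outputs are parked on CPU and copied back to GPU before the next layer, saving GPU memory at the cost of extra host-device transfers.

```python
from typing import Callable, List

import torch


def run_layers_sequentially(
    layers: List[Callable[[torch.Tensor], torch.Tensor]],
    activations: List[torch.Tensor],
    offload: bool = True,
) -> List[torch.Tensor]:
    """Illustration only: run calibration activations through layers one at a time,
    optionally parking them on CPU between layers to free GPU memory."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for layer in layers:
        outputs = []
        for act in activations:
            act = act.to(device)      # bring the activation back onto the GPU
            out = layer(act)
            if offload:
                out = out.to("cpu")   # free GPU memory between layers
            outputs.append(out)
        activations = outputs
    return activations
```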
Misc
- `TextGenerationDataset`
- Removed `_mask_padding` from `IntermediatesCache`, as I do not believe that this method is effective in masking padding tokens from hessian calculations

Testing
Evaluation Regression
Deleting significant portions of the dataset (longer sequences first) has a detrimental effect on recovery.
Modifiers
Calibration Regression Testing
I ran calibration for the following models (but did not evaluate recovery):
The following model examples can calibrate without issue:
The following models had a bug where processor and model dtypes were mismatched, which is now fixed by this PR:
The following models have an accelerate device offloading bug:
The following model examples have an MoE replacement bug:
Future Work
While these options are a great place to start, the next step for improving runtime is to allow multi-GPU compression, likely via `torch.distributed` tensor parallelism.