@shanjiaz shanjiaz commented Nov 10, 2025

Updated the decompress_weight function to unpack the zero_point and cast the scale dtype during decompression, then replace the tensor on the module with the updated one.
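Conceptually, the fix amounts to something like the following (a minimal pure-Python sketch; the nibble ordering, function names, and list-based tensors are assumptions for illustration, the real implementation operates on torch tensors inside compressed-tensors):

```python
def unpack_int4(packed_bytes, count):
    """Unpack 4-bit values stored two-per-byte (low nibble first, assumed layout)."""
    values = []
    for byte in packed_bytes:
        values.append(byte & 0x0F)         # low nibble
        values.append((byte >> 4) & 0x0F)  # high nibble
    return values[:count]  # drop padding from the last byte


def decompress_weight(q_weight, scale, packed_zp, target_dtype=float):
    """Dequantize w = (q - zp) * scale, casting scale to the target dtype."""
    zero_point = unpack_int4(packed_zp, len(q_weight))
    scale = [target_dtype(s) for s in scale]  # cast scale dtype
    return [(q - z) * s for q, z, s in zip(q_weight, zero_point, scale)]
```
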
Example script used:

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.utils import dispatch_for_generation

#MODEL_ID = "nm-testing/TinyLlama-1.1B-Chat-v1.0-w4a16-asym-awq-e2e"
MODEL_ID = "nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

print("========== SAMPLE GENERATION ==============")
dispatch_for_generation(model)
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")

The example script now generates a coherent result:

(llm-compressor) [shanjiaz@nma-a100-solo-4-preserve llm-compressor]$ python zp_decompression.py 
`torch_dtype` is deprecated! Use `dtype` instead!
Compressing model: 154it [00:00, 747.12it/s]


========== SAMPLE GENERATION ==============
<s> Hello my name is John and I am a software engineer. I have been working in the tech industry for the past 10 years. I have worked on various projects and have gained a lot of experience. I am passionate about technology and have a keen interest in the latest technologies. I have a bachelor's degree in computer science and have completed several certifications in various technologies. I am currently working as a software engineer at a leading technology company. In my free time, I enjoy
==========================================

@shanjiaz shanjiaz added the bug Something isn't working label Nov 10, 2025
@dsikka dsikka left a comment

Just an FYI: #509
This will also impact mxfp4.
I've turned off mxfp4 decompression in the meantime; it's lower priority anyway.

@dsikka dsikka left a comment


Would it be cleaner to add optional compress_scale / decompress_scale and compress_zp / decompress_zp functions?

This would impact:

  • PackedCompressor (packed zp)
  • NVFP4PackedCompressor (fp8 scales)
  • MXFP4PackedCompressor (uint8 scales)
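The optional hooks suggested above could be sketched like this (a hypothetical shape, not the actual compressed-tensors API; class and method names simply follow the comment, and the packed-nibble layout is assumed):

```python
class BaseCompressor:
    """Base compressor with optional qparam hooks; defaults are pass-through."""

    def decompress_scale(self, scale):
        return scale

    def decompress_zp(self, zero_point):
        return zero_point

    def decompress_weight(self, q_weight, scale, zero_point):
        scale = self.decompress_scale(scale)
        zero_point = self.decompress_zp(zero_point)
        return [(q - z) * s for q, z, s in zip(q_weight, zero_point, scale)]


class PackedCompressor(BaseCompressor):
    """Packed int4 zero points: two values per byte (layout assumed)."""

    def decompress_zp(self, packed):
        values = []
        for byte in packed:
            values.append(byte & 0x0F)         # low nibble
            values.append((byte >> 4) & 0x0F)  # high nibble
        return values
```

Under this shape, an NVFP4- or MXFP4-style compressor would instead override decompress_scale to cast its fp8/uint8 scales, leaving decompress_zp at the default.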

@shanjiaz

Would it be cleaner to add optional: compress_scale / decompress_scale and compress_zp / decompress_zp functions?

This would impact:

  • PackedCompressor (packed zp)
  • NVFP4PackedCompressor (fp8 scales)
  • MXFP4PackedCompressor (uint8 scales)

Sure! I can do that.

@shanjiaz shanjiaz changed the title [WIP] fix qparams decompression fix qparams decompression Dec 3, 2025
@shanjiaz shanjiaz changed the title fix qparams decompression [WIP] fix qparams decompression Dec 9, 2025
@shanjiaz shanjiaz changed the title [WIP] fix qparams decompression fix qparams decompression Dec 9, 2025
kylesayrs previously approved these changes Dec 10, 2025
@kylesayrs kylesayrs left a comment


Woop

@dsikka dsikka left a comment


Some comments, needs rebase

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
@shanjiaz shanjiaz requested review from dsikka and kylesayrs December 10, 2025 22:27
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
kylesayrs previously approved these changes Dec 11, 2025
@kylesayrs

Please make sure that compressed_data is only updated when you want it to be, and not accidentally updated as part of some other calculation
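The concern here is the difference between these two patterns (an illustrative sketch, not the PR's actual code; function names are made up):

```python
def dequantize_in_place(q_weight, scale):
    # Mutates the caller's list: the stored compressed data changes
    # as a side effect of the calculation.
    for i in range(len(q_weight)):
        q_weight[i] = q_weight[i] * scale
    return q_weight


def dequantize_copy(q_weight, scale):
    # Builds a new list, leaving the stored compressed data intact.
    return [q * scale for q in q_weight]
```
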

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
@shanjiaz shanjiaz requested review from dsikka and kylesayrs December 12, 2025 00:50
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
dsikka previously approved these changes Dec 12, 2025
@shanjiaz shanjiaz enabled auto-merge (squash) December 12, 2025 19:24
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
@kylesayrs kylesayrs left a comment


Nice job

@shanjiaz shanjiaz requested a review from dsikka December 12, 2025 20:23
@shanjiaz shanjiaz merged commit f9e7426 into main Dec 12, 2025
3 checks passed
@shanjiaz shanjiaz deleted the fix-qparams-decompression branch December 12, 2025 20:34