Conversation
Signed-off-by: Xin He <xin3.he@intel.com>
Pull request overview
Updates AutoRound’s compressed_tensors integration to handle the upstream removal of functional CompressedLinear wrappers by detecting/processing quantized torch.nn.Linear modules that carry quantization_scheme / quantization_status metadata.
Changes:
- Updated weight-type handlers to detect quantized modules via `quantization_scheme` (new compressed_tensors behavior) in addition to the legacy `CompressedLinear`/compressor logic.
- Added `decompress_module(...)` handling for new-style quantized `Linear` modules in FP8 and NVFP4 conversions.
- Re-enabled/adjusted CPU tests to validate FP8/MXFP4 models without relying on `CompressedLinear` type checks (NVFP4 is still skipped due to an upstream issue).
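The version-agnostic detection described above can be sketched roughly as follows. This is a minimal illustration, not AutoRound's actual helper; the function name and the `CompressedLinear` import path are assumptions based on the PR description:

```python
import torch


def is_compressed_quantized_linear(module: torch.nn.Module) -> bool:
    """Detect quantized Linear modules across compressed_tensors versions (sketch)."""
    # New-style: compressed_tensors attaches quantization metadata to a
    # plain torch.nn.Linear instead of wrapping it in CompressedLinear.
    if isinstance(module, torch.nn.Linear) and hasattr(module, "quantization_scheme"):
        return True
    # Legacy path: CompressedLinear may be absent (or a non-functional stub)
    # in newer compressed_tensors releases, so guard the import.
    try:
        from compressed_tensors.linear.compressed_linear import CompressedLinear
    except ImportError:
        return False
    return isinstance(module, CompressedLinear)
```

With this shape, the weight-type handlers can keep a single call site that covers both the legacy wrapper and the new metadata-carrying `Linear` modules.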
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `auto_round/utils/weight_handler.py` | Extends detection/conversion to support new compressed_tensors quantized Linear modules. |
| `test/test_cpu/advanced/test_low_precision_input_model.py` | Updates assertions to match new model module types/attributes and unskips FP8/MXFP4 coverage. |
Code review

Found 1 issue: auto-round/auto_round/utils/weight_handler.py, lines 496 to 500 in e99111f. Also, is the new

Otherwise the approach looks good -- approve once the above is addressed.
yiliu30 left a comment
Approach looks good. One bug to fix (`.data_type` -> `.type` in FP8Handler) and a question about backward compat with older compressed_tensors versions -- see comment.
Signed-off-by: Xin He <xin3.he@intel.com>
Remove the version limit here to test your code: auto-round/test/test_cpu/requirements.txt, line 13 in 88824a1

/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).

/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).

/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).
Description
compressed_tensors PR vllm-project/compressed-tensors#610 (commit 927f6d5, Mar 17 2026) removed `CompressedLinear` as a functional class; it is now a stub whose `from_linear()` raises `ValueError`.

Models loaded with the new compressed_tensors therefore produce regular `torch.nn.Linear` modules carrying `quantization_scheme`/`quantization_status` attributes instead of `CompressedLinear` wrappers.
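To make the new-style path concrete, here is a rough sketch of decompressing such modules in place before conversion. The function name is hypothetical, and `compressor.decompress_module(module)` is an assumed interface inferred from the PR text, not a verified compressed_tensors signature:

```python
import torch


def decompress_new_style_linears(model: torch.nn.Module, compressor) -> int:
    """Decompress metadata-carrying torch.nn.Linear modules in place (sketch).

    `compressor.decompress_module(module)` is assumed to return the dense
    weight tensor for a compressed module; the real API may differ.
    """
    count = 0
    for module in model.modules():
        # New-style modules are plain Linears tagged with quantization_status.
        if isinstance(module, torch.nn.Linear) and hasattr(module, "quantization_status"):
            dense = compressor.decompress_module(module)
            module.weight = torch.nn.Parameter(dense)
            count += 1
    return count
```

Untagged Linears are left untouched, so the same loop is safe to run on mixed models where only some layers were quantized.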
Type of Change
Related Issues
Fixes or relates to #1578
Checklist Before Submitting