support compressed-tensors refactor#1595

Merged
XuehaoSun merged 10 commits into main from xinhe/3-20a
Mar 24, 2026

Conversation

@xin3he
Contributor

@xin3he xin3he commented Mar 23, 2026

Description

compressed_tensors PR vllm-project/compressed-tensors#610 (commit 927f6d5, Mar 17 2026) removed CompressedLinear as a functional class. The class is now a stub that raises ValueError from from_linear().

Models loaded with the new compressed_tensors produce regular torch.nn.Linear modules with quantization_scheme/quantization_status attributes instead of CompressedLinear wrappers.
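
The new behavior can be sketched with a minimal, duck-typed detection helper (hypothetical code, not AutoRound's actual implementation; in practice the tagged modules are torch.nn.Linear instances):

```python
# Minimal sketch: with the new compressed_tensors, a quantized layer is an
# ordinary Linear module carrying `quantization_scheme` / `quantization_status`
# attributes instead of being a CompressedLinear subclass.
def is_new_style_quantized(module) -> bool:
    """Duck-typed check: any module tagged with a quantization_scheme."""
    return getattr(module, "quantization_scheme", None) is not None

def quantized_layers(model):
    """Collect quantized submodules of a loaded model, keyed by name."""
    return {
        name: mod
        for name, mod in model.named_modules()
        if is_new_style_quantized(mod)
    }
```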

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #1578

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Xin He <xin3.he@intel.com>
Copilot AI review requested due to automatic review settings March 23, 2026 03:34
@xin3he xin3he requested review from Copilot and yiliu30 and removed request for Copilot March 23, 2026 03:34
Signed-off-by: Xin He <xin3.he@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Updates AutoRound’s compressed_tensors integration to handle the upstream removal of functional CompressedLinear wrappers by detecting/processing quantized torch.nn.Linear modules that carry quantization_scheme / quantization_status metadata.

Changes:

  • Updated weight-type handlers to detect quantized modules via quantization_scheme (new compressed_tensors behavior) in addition to legacy CompressedLinear/compressor logic.
  • Added decompress_module(...) handling for new-style quantized Linear modules in FP8 and NVFP4 conversions.
  • Re-enabled/adjusted CPU tests to validate FP8/MXFP4 models without relying on CompressedLinear type checks (NVFP4 still skipped due to upstream issue).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Files changed:

  • auto_round/utils/weight_handler.py: Extends detection/conversion to support new compressed_tensors quantized Linear modules.
  • test/test_cpu/advanced/test_low_precision_input_model.py: Updates assertions to match new model module types/attributes and unskips FP8/MXFP4 coverage.

@chensuyue chensuyue added this to the 0.12.0 milestone Mar 23, 2026
Signed-off-by: Xin He <xin3.he@intel.com>
@yiliu30
Contributor

yiliu30 commented Mar 24, 2026

Code review

Found 1 issue:

  1. FP8Handler.detect_layer uses .data_type (lines 497, 499) to access quantization scheme fields, but compressed_tensors QuantizationArgs only has a .type field. The other three handlers (MXFP4, MXFP8, NVFP4) in this same PR correctly use .type. This will raise AttributeError at runtime when detecting FP8-BLOCK models via the new quantization_scheme path.

    q_scheme.weights.num_bits == 8
    and "float" in q_scheme.weights.data_type
    and q_scheme.input_activations.num_bits == 8
    and "float" in q_scheme.input_activations.data_type
):
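
The corrected check per this comment would read `.type` instead of `.data_type`. A sketch with a dummy stand-in for the scheme object (SimpleNamespace substitutes for the real compressed_tensors QuantizationArgs class):

```python
from types import SimpleNamespace

# Sketch of the fixed FP8 detection: QuantizationArgs exposes `.type`,
# not `.data_type`. str() hedges against the field being an enum.
def is_fp8_scheme(q_scheme) -> bool:
    return (
        q_scheme.weights.num_bits == 8
        and "float" in str(q_scheme.weights.type)
        and q_scheme.input_activations.num_bits == 8
        and "float" in str(q_scheme.input_activations.type)
    )

fp8 = SimpleNamespace(num_bits=8, type="float")
int4 = SimpleNamespace(num_bits=4, type="int")
print(is_fp8_scheme(SimpleNamespace(weights=fp8, input_activations=fp8)))   # True
print(is_fp8_scheme(SimpleNamespace(weights=int4, input_activations=fp8)))  # False
```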

Also, is the new quantization_scheme-based detection compatible with the older compressed_tensors versions that still use CompressedLinear? Want to make sure we don't break backward compatibility.

Otherwise the approach looks good -- approve once the above is addressed.
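
On the backward-compatibility question, one possible shape for a version-tolerant check (a hypothetical helper, assuming the legacy import path compressed_tensors.linear.compressed_linear; not code from this PR):

```python
def is_compressed_tensors_quantized(module) -> bool:
    # Legacy path: older compressed_tensors wraps layers in CompressedLinear.
    try:
        from compressed_tensors.linear.compressed_linear import CompressedLinear
        if isinstance(module, CompressedLinear):
            return True
    except ImportError:
        pass
    # New path: a plain Linear tagged with a quantization_scheme attribute.
    return getattr(module, "quantization_scheme", None) is not None
```

The guarded import keeps the helper working whether or not CompressedLinear is importable, so both old and new compressed_tensors releases take the same code path.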

Contributor

@yiliu30 yiliu30 left a comment


Approach looks good. One bug to fix (.data_type -> .type in FP8Handler) and a question about backward compat with older compressed_tensors versions -- see comment.

xin3he added 3 commits March 24, 2026 10:04
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
@chensuyue
Contributor

Remove the version limit here to test your code:

compressed-tensors==0.14.1a20260313 # temporary pin for llmcompressor

@xin3he
Contributor Author

xin3he commented Mar 24, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor Author

xin3he commented Mar 24, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xin3he
Contributor Author

xin3he commented Mar 24, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@XuehaoSun XuehaoSun merged commit 19b5237 into main Mar 24, 2026
38 of 40 checks passed
@XuehaoSun XuehaoSun deleted the xinhe/3-20a branch March 24, 2026 12:13