Hi team,
I quantized a Cohere model with AWQ using LLM Compressor. The quantization run completed successfully and produced an asymmetric (zero-point) AWQ model, which I then uploaded to Hugging Face.
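For context, "asymmetric" here means each weight group stores an integer zero point alongside the scale (rather than scale only, as in symmetric schemes) — and it is these zero points that end up packed in the checkpoint. This is a minimal pure-Python sketch of the scheme, not LLM Compressor or compressed_tensors code:

```python
# Minimal sketch (not LLM Compressor code) of asymmetric ("zero-point")
# 4-bit quantization: each group keeps a scale AND an integer zero point.

def quantize_asym(weights, num_bits=4):
    """Quantize floats to unsigned ints in [0, 2**num_bits - 1] with a zero point."""
    qmax = 2 ** num_bits - 1               # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # avoid zero scale for constant groups
    zero_point = round(-w_min / scale)     # shifts the float range onto [0, qmax]
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    """Recover approximate floats: w ~ (q - z) * s."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, -0.1, 0.0, 0.2, 0.7]
q, s, z = quantize_asym(weights)
recovered = dequantize_asym(q, s, z)
```

Decompressing such a checkpoint has to unpack both the weights and the zero points, which is where the load fails for me.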
When I try to load the model back from Hugging Face, I am prompted to install compressed_tensors. After installing it, loading fails with the following error:
File "/usr/local/lib/python3.12/dist-packages/compressed_tensors/compressors/quantized_compressors/pack_quantized.py", line 175, in decompress_weight
raise ValueError(
ValueError: Decompression of packed zero points is currently not supported