Hi team,
I quantized a Cohere model with AWQ using LLM Compressor. The quantization run completed successfully and produced an asymmetric (zero-point) AWQ model, which I then uploaded to Hugging Face.
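For context, "asymmetric" here means each weight group stores an integer zero point alongside the scale (rather than scale only, as in symmetric schemes) — and it is these zero points that end up packed in the checkpoint. This is a minimal pure-Python sketch of the scheme, not LLM Compressor or compressed_tensors code:

```python
# Minimal sketch (not LLM Compressor code) of asymmetric ("zero-point")
# 4-bit quantization: each group keeps a scale AND an integer zero point.

def quantize_asym(weights, num_bits=4):
    """Quantize floats to unsigned ints in [0, 2**num_bits - 1] with a zero point."""
    qmax = 2 ** num_bits - 1               # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0  # avoid zero scale for constant groups
    zero_point = round(-w_min / scale)     # shifts the float range onto [0, qmax]
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    """Recover approximate floats: w ~ (q - z) * s."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, -0.1, 0.0, 0.2, 0.7]
q, s, z = quantize_asym(weights)
recovered = dequantize_asym(q, s, z)
```

Decompressing such a checkpoint has to unpack both the weights and the zero points, which is where the load fails for me.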
When I try to load the model back from Hugging Face, I am prompted to install compressed_tensors. After installing it, loading fails with the following error:
File "/usr/local/lib/python3.12/dist-packages/compressed_tensors/compressors/quantized_compressors/pack_quantized.py", line 175, in decompress_weight
raise ValueError(
ValueError: Decompression of packed zero points is currently not supported