fix gguf bug for qwen3.5 moe and some small models #1575
Conversation
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Pull request overview
This PR targets GGUF export robustness and inference correctness (notably for Qwen3.5-series models) by aligning tensor handling with llama.cpp expectations and preventing failures in edge cases during GGUF conversion/packing.
Changes:
- Guard GGUF tensor preparation against an empty tensor_map.mapping (e.g., block_count=0 models) to avoid crashes.
- Expand the “always-F32” tensor-type allowlist during GGUF export to include additional llama.cpp tensor identifiers.
- Adjust the RTN layer quantization flow for immediate GGUF packing so the GGUF-specific path is taken regardless of disable_opt_rtn.
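The first two changes above can be sketched as follows. This is an illustrative outline only, not the actual code in auto_round/export/export_to_gguf/convert.py: the function names, the mapping shape, and the ALWAYS_F32_SUFFIXES list are hypothetical stand-ins for the real (larger) allowlist and conversion logic.

```python
# Hypothetical sketch of the empty-mapping guard and the "always-F32"
# allowlist described in the PR overview; names are illustrative.

# Suffixes of llama.cpp tensor names that stay in F32 regardless of the
# requested quantization (the real allowlist in the PR is larger).
ALWAYS_F32_SUFFIXES = ("_norm.weight", ".bias", "rope_freqs.weight")

def select_dtype(tensor_name: str, requested: str) -> str:
    """Force F32 for sensitive tensors, otherwise honor the requested type."""
    if tensor_name.endswith(ALWAYS_F32_SUFFIXES):
        return "F32"
    return requested

def prepare_tensors(mapping: dict, weights: dict, requested: str = "Q2_K"):
    # Guard: a model with block_count=0 can yield an empty name mapping;
    # return early instead of crashing on the first lookup.
    if not mapping:
        return []
    out = []
    for name in weights:
        gguf_name = mapping.get(name, name)
        out.append((gguf_name, select_dtype(gguf_name, requested)))
    return out
```

For example, `prepare_tensors({}, {"x": None})` now returns an empty list instead of raising, and a tensor named `blk.0.attn_norm.weight` is kept in F32 even when Q2_K is requested.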
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| auto_round/export/export_to_gguf/convert.py | Prevents failures when tensor_map.mapping is empty; updates GGUF tensor-type handling for F32 selection. |
| auto_round/compressors/base.py | Changes RTN quantization branching for immediate GGUF packing, affecting how disable_opt_rtn influences behavior. |
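The base.py change boils down to which RTN path is selected. A minimal sketch of the new branching, with a hypothetical helper name (the real code is inside the compressor's quantization flow):

```python
# Illustrative branching only; choose_rtn_path is a hypothetical helper.
# After the fix, immediate GGUF packing always takes the GGUF-specific
# RTN path, independent of the disable_opt_rtn flag.

def choose_rtn_path(immediate_gguf_packing: bool, disable_opt_rtn: bool) -> str:
    if immediate_gguf_packing:
        # GGUF-specific path, taken regardless of disable_opt_rtn.
        return "rtn_gguf"
    # Otherwise disable_opt_rtn still selects plain vs. optimized RTN.
    return "rtn_plain" if disable_opt_rtn else "rtn_optimized"
```

With this shape, `choose_rtn_path(True, False)` and `choose_rtn_path(True, True)` both resolve to the GGUF path, which is the behavioral change the review table refers to.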
Great! I've been waiting for this PR for a long time.
python3 -m auto_round Qwen/Qwen3-1.7B --enable_alg_ext --iters 1 --nsamples 1 --format gguf:q2_k_s

File "/home/wenhuach/auto-round-main/auto_round/main.py", line 827, in
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).