
fix gguf bug for qwen3.5 moe and some small models#1575

Merged
XuehaoSun merged 8 commits into main from hengguo/gguf_fix_319
Mar 24, 2026
Conversation

@n1ck-guo
Contributor

Description

Please briefly describe your main changes and the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Copilot AI review requested due to automatic review settings March 19, 2026 11:30
Contributor

Copilot AI left a comment


Pull request overview

This PR targets GGUF export robustness and inference correctness (notably for Qwen3.5-series models) by aligning tensor handling with llama.cpp expectations and preventing failures in edge cases during GGUF conversion/packing.

Changes:

  • Guard GGUF tensor preparation against empty tensor_map.mapping (e.g., block_count=0 models) to avoid crashes.
  • Expand the “always-F32” tensor-type allowlist during GGUF export to include additional llama.cpp tensor identifiers.
  • Adjust RTN layer quantization flow for immediate GGUF packing so the GGUF-specific path is taken regardless of disable_opt_rtn.
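
The empty-mapping guard described in the first bullet can be sketched as follows. This is a minimal illustration only; the function name `resolve_tensor_name` and the `(tensor_enum, gguf_name)` entry layout are assumptions, not the actual convert.py code:

```python
# Hypothetical sketch of the empty-mapping guard: models exported with
# block_count == 0 produce an empty tensor_map.mapping, so name lookups
# must tolerate that instead of crashing during GGUF tensor preparation.

def resolve_tensor_name(tensor_map, name):
    """Return the mapped GGUF tensor name, or None when the mapping is
    empty (block_count == 0) or the tensor is simply not mapped."""
    mapping = getattr(tensor_map, "mapping", None)
    if not mapping:  # None or {} -- skip gracefully instead of raising
        return None
    entry = mapping.get(name)
    # entries are assumed to be (tensor_enum, gguf_name) pairs
    return entry[1] if entry is not None else None
```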

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
auto_round/export/export_to_gguf/convert.py | Prevents failures when tensor_map.mapping is empty; updates GGUF tensor-type handling for F32 selection.
auto_round/compressors/base.py | Changes RTN quantization branching for immediate GGUF packing, affecting how disable_opt_rtn influences behavior.
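
The base.py change amounts to a branching adjustment; a hypothetical control-flow sketch (the function and path names here are illustrative, not the real API):

```python
# Hypothetical sketch of the RTN branching change: when quantizing a layer
# that will be packed immediately into GGUF, always take the GGUF-specific
# RTN path, regardless of the disable_opt_rtn flag.

def choose_rtn_path(immediate_gguf_packing: bool, disable_opt_rtn: bool) -> str:
    if immediate_gguf_packing:
        return "gguf_rtn"  # GGUF packing needs its own scale/min handling
    return "plain_rtn" if disable_opt_rtn else "opt_rtn"
```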

@wenhuach21
Contributor

Great! Been waiting for this PR for a long time.

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@chensuyue chensuyue requested a review from wenhuach21 March 19, 2026 15:48
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 20, 2026
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@wenhuach21
Copy link
Copy Markdown
Contributor

python3 -m auto_round Qwen/Qwen3-1.7B --enable_alg_ext --iters 1 --nsamples 1 --format gguf:q2_k_s

File "/home/wenhuach/auto-round-main/auto_round/main.py", line 827, in
run()
~~~^^
File "/home/wenhuach/auto-round-main/auto_round/main.py", line 811, in run
start()
~~~~~^^
File "/home/wenhuach/auto-round-main/auto_round/main.py", line 533, in start
tune(args)
~~~~^^^^^^
File "/home/wenhuach/auto-round-main/auto_round/main.py", line 750, in tune
model, folders = autoround.quantize_and_save(export_dir, format=args.format) # pylint: disable=E1101
~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wenhuach/auto-round-main/auto_round/compressors/base.py", line 1004, in quantize_and_save
model, folders = self.save_quantized(
~~~~~~~~~~~~~~~~~~~^
output_dir, format=self.formats, inplace=inplace, return_folders=True, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/wenhuach/auto-round-main/auto_round/compressors/base.py", line 3375, in save_quantized
compressed_model = format.save_quantized(
save_folder,
...<6 lines>...
**kwargs,
)
File "/home/wenhuach/auto-round-main/auto_round/formats.py", line 829, in save_quantized
return save_quantized_as_gguf(
output_dir=output_dir,
...<6 lines>...
**kwargs,
)
File "/home/wenhuach/miniforge3/envs/autoround/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/export.py", line 266, in save_quantized_as_gguf
gguf_model.write()
~~~~~~~~~~~~~~~~^^
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/convert_hf_to_gguf.py", line 750, in write
self.prepare_tensors()
~~~~~~~~~~~~~~~~~~~~^^
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/convert.py", line 653, in prepare_tensors
data, data_qtype = _quant_data(
~~~~~~~~~~~^
cls, data_torch, data_qtype, name, modify_name, new_name, bid, device=device
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/convert.py", line 301, in _quant_data
data = ggml_quant(data_torch, data_qtype.name.lower(), device=device, **kwargs)
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/packing.py", line 105, in ggml_quant
new_data = ggml_quant_core(quant_func, blocks, scale, zp, wmin, d_scale, d_wmin, imatrix, original)
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/packing.py", line 35, in ggml_quant_core
new_data = quant_func(
blocks,
...<6 lines>...
original=original,
)
File "/home/wenhuach/auto-round-main/auto_round/export/export_to_gguf/packing.py", line 615, in q2_k_quant_block
mins = wmin.reshape((-1, QK_K // 16))
^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'reshape'
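
The crash above occurs because `q2_k_quant_block` receives `wmin=None` and immediately calls `.reshape` on it. A minimal defensive sketch of the reshaping step; the fallback derivation of the minimums is an assumption for illustration, not the merged fix:

```python
import numpy as np

QK_K = 256  # llama.cpp super-block size

def reshape_mins(blocks, wmin=None):
    """Sketch: q2_k packing needs per-16-element-group minimums shaped
    (n_superblocks, QK_K // 16); derive them from the raw blocks when the
    caller passed wmin=None instead of hitting the AttributeError above."""
    if wmin is None:
        # assumed fallback: minimum over each group of 16 weights
        wmin = blocks.reshape(-1, QK_K // 16, 16).min(axis=-1)
    return wmin.reshape((-1, QK_K // 16))
```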

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@wenhuach21 wenhuach21 changed the title from "fix gguf format fail infer for qwen3.5 series" to "fix gguf bug for qwen3.5 moe and some small models" Mar 23, 2026
@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: n1ck-guo <heng.guo@intel.com>
@XuehaoSun XuehaoSun merged commit a2aced8 into main Mar 24, 2026
29 of 30 checks passed
@XuehaoSun XuehaoSun deleted the hengguo/gguf_fix_319 branch March 24, 2026 07:08


5 participants