fix gguf bug for qwen3.5 moe and some small models #1575
Conversation
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Pull request overview
This PR targets GGUF export robustness and inference correctness (notably for Qwen3.5-series models) by aligning tensor handling with llama.cpp expectations and preventing failures in edge cases during GGUF conversion/packing.
Changes:
- Guard GGUF tensor preparation against an empty tensor_map.mapping (e.g., block_count=0 models) to avoid crashes.
- Expand the “always-F32” tensor-type allowlist during GGUF export to include additional llama.cpp tensor identifiers.
- Adjust the RTN layer quantization flow for immediate GGUF packing so the GGUF-specific path is taken regardless of disable_opt_rtn.
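The first two changes above can be sketched as follows. This is an illustrative outline only, not the actual code in auto_round/export/export_to_gguf/convert.py: the function names, the mapping shape, and the ALWAYS_F32_SUFFIXES list are hypothetical stand-ins for the real (larger) allowlist and conversion logic.

```python
# Hypothetical sketch of the empty-mapping guard and the "always-F32"
# allowlist described in the PR overview; names are illustrative.

# Suffixes of llama.cpp tensor names that stay in F32 regardless of the
# requested quantization (the real allowlist in the PR is larger).
ALWAYS_F32_SUFFIXES = ("_norm.weight", ".bias", "rope_freqs.weight")

def select_dtype(tensor_name: str, requested: str) -> str:
    """Force F32 for sensitive tensors, otherwise honor the requested type."""
    if tensor_name.endswith(ALWAYS_F32_SUFFIXES):
        return "F32"
    return requested

def prepare_tensors(mapping: dict, weights: dict, requested: str = "Q2_K"):
    # Guard: a model with block_count=0 can yield an empty name mapping;
    # return early instead of crashing on the first lookup.
    if not mapping:
        return []
    out = []
    for name in weights:
        gguf_name = mapping.get(name, name)
        out.append((gguf_name, select_dtype(gguf_name, requested)))
    return out
```

For example, `prepare_tensors({}, {"x": None})` now returns an empty list instead of raising, and a tensor named `blk.0.attn_norm.weight` is kept in F32 even when Q2_K is requested.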
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| auto_round/export/export_to_gguf/convert.py | Prevents failures when tensor_map.mapping is empty; updates GGUF tensor-type handling for F32 selection. |
| auto_round/compressors/base.py | Changes RTN quantization branching for immediate GGUF packing, affecting how disable_opt_rtn influences behavior. |
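The base.py change boils down to which RTN path is selected. A minimal sketch of the new branching, with a hypothetical helper name (the real code is inside the compressor's quantization flow):

```python
# Illustrative branching only; choose_rtn_path is a hypothetical helper.
# After the fix, immediate GGUF packing always takes the GGUF-specific
# RTN path, independent of the disable_opt_rtn flag.

def choose_rtn_path(immediate_gguf_packing: bool, disable_opt_rtn: bool) -> str:
    if immediate_gguf_packing:
        # GGUF-specific path, taken regardless of disable_opt_rtn.
        return "rtn_gguf"
    # Otherwise disable_opt_rtn still selects plain vs. optimized RTN.
    return "rtn_plain" if disable_opt_rtn else "rtn_optimized"
```

With this shape, `choose_rtn_path(True, False)` and `choose_rtn_path(True, True)` both resolve to the GGUF path, which is the behavioral change the review table refers to.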
Great! I've been waiting for this PR for a long time.
python3 -m auto_round Qwen/Qwen3-1.7B --enable_alg_ext --iters 1 --nsamples 1 --format gguf:q2_k_s

File "/home/wenhuach/auto-round-main/auto_round/main.py", line 827, in
/azp run Unit-Test-CUDA-AutoRound

Azure Pipelines successfully started running 1 pipeline(s).