[mllm] support longcat_next#1637

Draft
xin3he wants to merge 3 commits into main from xinhe/3-30

Conversation

@xin3he
Contributor

@xin3he xin3he commented Mar 30, 2026

Description

Running AutoRound on LongCat-Next fails with:

ValueError: Cannot use apply_chat_template because this processor does not have a chat template.

To reproduce: auto-round /storage/xinhe/meituan-longcat/LongCat-Next/
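
The error above suggests the multimodal processor ships without a chat template. A minimal sketch of one possible fallback, assuming a transformers-style `processor`/`tokenizer` that expose a `chat_template` attribute (the function name `build_prompt` and the plain-concatenation fallback are illustrative, not AutoRound's actual fix):

```python
# Hedged sketch: prefer the processor's chat template, fall back to the
# tokenizer's, and as a last resort join role/content pairs manually.
# `processor` and `tokenizer` are hypothetical stand-ins for the objects
# the quantizer receives; `chat_template` follows the transformers convention.
def build_prompt(messages, processor=None, tokenizer=None):
    template_owner = None
    if processor is not None and getattr(processor, "chat_template", None):
        template_owner = processor
    elif tokenizer is not None and getattr(tokenizer, "chat_template", None):
        template_owner = tokenizer
    if template_owner is not None:
        # apply_chat_template renders the conversation with the model's template.
        return template_owner.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Last resort: plain role/content concatenation.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
```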

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Xin He <xin3.he@intel.com>
Copilot AI review requested due to automatic review settings March 30, 2026 06:28
@xin3he xin3he requested review from lvliang-intel and n1ck-guo and removed request for Copilot March 30, 2026 06:31
@XuehaoSun
Contributor

2026-03-30 15:56:33 INFO __main__.py L599: start to quantize meituan-longcat/LongCat-Next
2026-03-30 15:56:34 INFO autoround.py L178: using MLLM mode for multimodal model.
/data3/hf_new_model_cache/modules/transformers_modules/meituan_hyphen_longcat/LongCat_hyphen_Next/522f2020e5ed353429cc403b72491ba1899ef0e6/modular_longcat_next_audio.py:220: Fut
  @autocast(enabled=True, dtype=torch.float32)
2026-03-30 15:56:41 WARNING modeling_utils.py L2446: You are attempting to use Flash Attention 2 without specifying a torch dtype. This might lead to unexpected behaviour
/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
self.visual_offset_vals=tensor([150581, 166965, 183349, 199733, 216117, 232501, 248885, 265269])
self.audio_offset_vals=tensor([131125, 139317, 143413, 145461, 146485, 147509, 148533, 149557])
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:01<00:00, 11.14it/s]
2026-03-30 15:57:01 WARNING compressor.py L286: longcat_next does not support for NeelNanda/pile-10k, will use liuhaotian/llava_conv_58k with default config as an alternative.
2026-03-30 15:57:01 WARNING compressor.py L296: reset batch_size(8) to 1 and gradient_accumulate_steps(1) to 8, because batch_size=8 cannot be used for liuhaotian/llava_conv_58k
2026-03-30 15:57:01 INFO base.py L517: using torch.bfloat16 for quantization tuning
2026-03-30 15:57:01 INFO base.py L834: 'enable_torch_compile' is set to `False` by default. Enabling it can reduce tuning cost by 20%, but it might throw an exception.
2026-03-30 15:57:01 WARNING formats.py L166: some layers are skipped quantization (shape not divisible by 32): audio_head.heads.[0-7], lm_head, model.audio_tokenizer.audio_flow_
2026-03-30 15:57:01 INFO base.py L1660: Using predefined ignore_layers: classifier
2026-03-30 15:57:02 INFO base.py L1818: start to cache block inputs
2026-03-30 15:57:07 WARNING base.py L2328: Some layers are offloaded to cpu, which may severely impact calibration speed. Please consider using more cards.
Some parameters are on the meta device because they were offloaded to the cpu.
2026-03-30 15:57:28 WARNING dataset.py L251: seqlen(2048) is greater than the maximum length supported by the liuhaotian/llava_conv_58k, reset to 512
2026-03-30 15:57:28 INFO dataset.py L99: use dataset llava_conv_58k, downloading...
cache block inputs: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [14:44<00:00,  6.91s/it]
2026-03-30 16:12:42 INFO base.py L1835: caching done
Quantizing model.layers.0:   0%|                                                                                                                         | 0/100 [00:10<?, ?it/s]
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
2026-03-30 16:54:23 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.000444 -> iter 194: 0.000079,'peak_ram': 86.58GB, 'peak_vram': 66.75GB
Quantizing model.layers.1:   1%|█                                                                                                           | 1/100 [42:12<69:39:08, 2532.81s/it]
quantized 784/785 layers in the block, loss iter 0: 0.001716 -> iter 195: 0.000445,'peak_ram': 94.89GB, 'peak_vram': 66.75GB
Quantizing model.layers.2:   2%|██                                                                                                        | 2/100 [1:23:27<68:01:31, 2498.89s/it]
quantized 784/785 layers in the block, loss iter 0: 0.002576 -> iter 199: 0.001224,'peak_ram': 103.3GB, 'peak_vram': 66.75GB
Quantizing model.layers.3:   3%|███▏                                                                                                      | 3/100 [2:04:37<66:58:09, 2485.46s/it]
quantized 784/785 layers in the block, loss iter 0: 0.003595 -> iter 197: 0.001099,'peak_ram': 104.32GB, 'peak_vram': 66.75GB
Quantizing model.layers.4:   4%|████▏                                                                                                     | 4/100 [2:43:32<64:41:42, 2426.07s/it]
quantized 784/785 layers in the block, loss iter 0: 0.003605 -> iter 192: 0.001413,'peak_ram': 116.5GB, 'peak_vram': 66.75GB
Quantizing model.layers.5:   5%|█████▎                                                                                                    | 5/100 [3:21:56<62:51:48, 2382.19s/it]
quantized 784/785 layers in the block, loss iter 0: 0.004384 -> iter 192: 0.002084,'peak_ram': 116.6GB, 'peak_vram': 66.75GB
Quantizing model.layers.6:   6%|██████▎                                                                                                   | 6/100 [4:00:49<61:45:39, 2365.31s/it]
quantized 784/785 layers in the block, loss iter 0: 0.006060 -> iter 196: 0.002672,'peak_ram': 121.61GB, 'peak_vram': 66.75GB
Quantizing model.layers.7:   7%|███████▍                                                                                                  | 7/100 [4:39:00<60:28:36, 2341.03s/it]2026-03-30 21:30:55 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.009842 -> iter 169: 0.003777,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.8:   8%|████████▍                                                                                                 | 8/100 [5:18:48<60:12:30, 2355.99s/it]2026-03-30 22:10:08 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.009777 -> iter 199: 0.004623,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.9:   9%|█████████▌                                                                                                | 9/100 [5:57:55<59:29:10, 2353.30s/it]2026-03-30 22:48:58 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.018928 -> iter 191: 0.008281,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.10:  10%|██████████▍                                                                                             | 10/100 [6:36:50<58:41:27, 2347.64s/it]2026-03-30 23:28:31 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.022149 -> iter 180: 0.011693,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.11:  11%|███████████▍                                                                                            | 11/100 [7:16:19<58:12:02, 2354.18s/it]2026-03-31 00:09:23 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.041877 -> iter 196: 0.017732,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.12:  12%|████████████▍                                                                                           | 12/100 [7:57:11<58:16:20, 2383.87s/it]2026-03-31 00:52:34 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.072172 -> iter 197: 0.030324,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.13:  13%|█████████████▌                                                                                          | 13/100 [8:40:29<59:10:31, 2448.64s/it]2026-03-31 01:34:45 INFO base.py L3187: Unquantized layers: ['mlp.router.classifier']
quantized 784/785 layers in the block, loss iter 0: 0.134645 -> iter 190: 0.045848,'peak_ram': 121.7GB, 'peak_vram': 66.75GB
Quantizing model.layers.13:  14%|██████████████▌                                                                                         | 14/100 [9:22:36<59:03:33, 2472.25s/it]Traceback (most recent call last):
  File "/home/uttest/miniforge3/envs/autoround_test/bin/auto-round", line 10, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/auto_round/__main__.py", line 822, in run
    start()
  File "/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/auto_round/__main__.py", line 541, in start
    tune(args)
  File "/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/auto_round/__main__.py", line 761, in tune
    model, folders = autoround.quantize_and_save(export_dir, format=args.format)  # pylint: disable=E1101
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/auto_round/compressors/base.py", line 1018, in quantize_and_save
    model, _ = self.quantize()
               ^^^^^^^^^^^^^^^
  File "/home/uttest/miniforge3/envs/autoround_test/lib/python3.12/site-packages/auto_round/compressors/base.py", line 1850, in quantize
    inputs = all_inputs[block_names[0]]
             ~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'model.audio_tokenizer.audio_model.layers.0'
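
The `KeyError` occurs because calibration never cached inputs for the audio-tokenizer blocks (the image/text-only dataset never routes data through them), yet the quantizer still iterates over them. A minimal sketch of the kind of guard the traceback points at, assuming `all_inputs` maps block names to cached activations (names and the helper `select_calibrated_blocks` are illustrative, not AutoRound's API):

```python
# Hedged sketch: quantize only blocks whose inputs were actually cached
# during calibration; blocks untouched by the calibration data (e.g. audio
# blocks on an image/text-only dataset) are skipped instead of raising
# KeyError when their cache entry is looked up.
def select_calibrated_blocks(block_names, all_inputs):
    skipped = [name for name in block_names if name not in all_inputs]
    if skipped:
        print(f"skipping blocks with no cached calibration inputs: {skipped}")
    return [name for name in block_names if name in all_inputs]
```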

@xin3he
Contributor Author

xin3he commented Mar 31, 2026

Thanks for checking, @XuehaoSun.
The audio part should be skipped, since the dataset only contains image and text. I will fix it and let you know.

@xin3he xin3he marked this pull request as draft April 1, 2026 11:09
Signed-off-by: Xin He <xin3.he@intel.com>
@xin3he
Contributor Author

xin3he commented Apr 2, 2026

This is more complex than originally expected. Since it's an omni model, more time is needed to enable it.
