v3.10.0
New Features
- Megatron-SWIFT
a. Mcore-Bridge release. Supports direct loading and saving of model weights in safetensors format; bidirectional conversion of LoRA incremental weights (a toy merge sketch follows this list); and multi-node conversion. Documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Mcore-Bridge.html. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
b. Upgraded megatron-core version to 0.14.0.
c. Added vit_lr and aligner_lr parameter support for multimodal model training.
d. Added checkpoint-saving optimization parameters: async_save, save_retain_interval, etc.
e. Support for batched mrope, accelerating training of Qwen3-VL, Qwen2.5-VL, and other models.
- RL
a. Optimized weight-synchronization speed for GRPO LoRA training. Details: https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html#memory-optimization-solutions-in-colocate-mode
b. Optimized GRPO training memory usage to reduce peak GPU memory consumption.
c. New RLVR algorithm support: RLOO (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html) and REINFORCE++ Baseline (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html); a toy sketch of both advantage estimators follows this list.
d. GKD supports vLLM-accelerated policy model rollout, plus a new teacher_deepspeed parameter to separately control the teacher model's sharding strategy. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GKD.html (a sketch of GKD's generalized-JSD objective follows this list).
e. GSPO supports using liger_kernel to reduce memory usage.
- Training
a. Added Ray support for PT/SFT/sampling/data distillation; documentation: https://swift.readthedocs.io/en/latest/Instruction/Ray.html
b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
c. Support for YAML-based training parameter configuration, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
d. Added FSDP2 training launch example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
e. Added best practice for custom multimodal model registration: https://swift.readthedocs.io/en/latest/BestPractices/MLLM-Registration.html
f. The InfoNCE loss in embedding training is now aligned with the description in the Qwen3-Embedding paper (a minimal sketch follows this list). Documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
g. Added a multi-label classification training example (loss sketch after this list), scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
h. agent_template supports seed-oss. Thanks to @hpsun1109 for the contribution.
- Full Pipeline
a. swift export supports GPTQ-v2 quantization, scripts: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh. Thanks to @zzc0430 for the contribution.
b. The swift deploy vLLM inference backend supports DP deployment via the --vllm_data_parallel_size parameter. Thanks to @YushunXiang for the contribution.
c. Added health/ping endpoints to swift deploy (a quick liveness-check sketch follows this list).
d. vLLM deployment added parameters vllm_mm_processor_cache_gb / vllm_engine_kwargs.
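The sketches below illustrate a few of the items above; each is a hedged approximation, not ms-swift's actual implementation. First, the arithmetic behind converting LoRA incremental weights: merging a LoRA delta back into a base weight is the standard W' = W + (alpha / r) * B @ A. The tensor-parallel sharding and safetensors layout handling that Mcore-Bridge adds on top is not shown.

```python
import torch

def merge_lora(base_weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, alpha: float, r: int) -> torch.Tensor:
    # Standard LoRA merge: W' = W + (alpha / r) * B @ A.
    # Mcore-Bridge additionally handles sharded checkpoints and
    # weight layout, which this toy function ignores.
    return base_weight + (alpha / r) * (lora_B @ lora_A)

W = torch.randn(1024, 1024)          # base linear weight (out, in)
r, alpha = 8, 32
A = torch.randn(r, 1024) * 0.01      # lora_A: (r, in_features)
B = torch.zeros(1024, r)             # lora_B: (out_features, r), zero-init
W_merged = merge_lora(W, A, B, alpha, r)
```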
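For the new RLVR algorithms, a minimal sketch of the two advantage estimators as commonly defined: RLOO's leave-one-out baseline, and a REINFORCE++-Baseline-style estimator that subtracts the per-prompt group mean and normalizes by the batch-level standard deviation. See the linked docs for the exact variants ms-swift implements.

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, k) with k sampled completions per prompt.
    # Each sample's baseline is the mean reward of the other k-1 samples
    # from the same prompt, keeping the gradient estimator unbiased.
    k = rewards.size(-1)
    loo_baseline = (rewards.sum(dim=-1, keepdim=True) - rewards) / (k - 1)
    return rewards - loo_baseline

def reinforce_pp_baseline_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Subtract the per-prompt group mean, then normalize by the
    # std computed over the whole batch (not per group).
    centered = rewards - rewards.mean(dim=-1, keepdim=True)
    return centered / (centered.std() + 1e-8)

r = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0, 0.0]])
print(rloo_advantages(r))
print(reinforce_pp_baseline_advantages(r))
```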
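GKD's distillation objective is typically the generalized Jensen-Shannon divergence between teacher and student token distributions. A minimal sketch follows (per-token masking, temperature scaling, and the exact reduction are omitted; beta is assumed strictly between 0 and 1):

```python
import math
import torch
import torch.nn.functional as F

def generalized_jsd(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    beta: float = 0.5) -> torch.Tensor:
    # Log-probabilities of student (S) and teacher (T).
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    # Log of the mixture M = beta * P_T + (1 - beta) * P_S.
    log_m = torch.logsumexp(
        torch.stack([t + math.log(beta), s + math.log(1.0 - beta)]), dim=0)
    # F.kl_div(input, target, log_target=True) computes KL(target || input).
    kl_t = F.kl_div(log_m, t, reduction="batchmean", log_target=True)  # KL(T || M)
    kl_s = F.kl_div(log_m, s, reduction="batchmean", log_target=True)  # KL(S || M)
    return beta * kl_t + (1.0 - beta) * kl_s
```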
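For the embedding change, a minimal in-batch InfoNCE with a temperature; the Qwen3-Embedding-style refinements (extra negative pools, false-negative masking) are described in the linked doc and not reproduced in this sketch.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    # Row i's positive is document i; every other document in the
    # batch serves as an in-batch negative.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                    # (B, B) similarity logits
    labels = torch.arange(q.size(0), device=q.device) # diagonal = positives
    return F.cross_entropy(logits, labels)
```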
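The multi-label classification setup boils down to treating each label as an independent binary decision: sigmoid per label with BCE, instead of a single softmax across labels. A toy sketch:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 6)                     # (batch, num_labels) from a seq-cls head
targets = torch.randint(0, 2, (4, 6)).float()  # multi-hot label vectors
loss = nn.BCEWithLogitsLoss()(logits, targets)
preds = (logits.sigmoid() > 0.5).int()         # threshold each label independently
```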
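To probe the new swift deploy endpoints, a minimal liveness check; the /health and /ping paths come from this release, while the base URL is an assumed local default.

```python
import requests

base = "http://127.0.0.1:8000"  # assumed local deploy address
for path in ("/health", "/ping"):
    resp = requests.get(f"{base}{path}", timeout=5)
    print(path, resp.status_code)
```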
New Models
- Text-only models:
a. Qwen/Qwen3Guard-Gen-0.6B series
b. MiniMax/MiniMax-M2
- Multimodal models:
a. Qwen/Qwen3-VL-2B-Instruct series
b. deepseek-ai/DeepSeek-OCR, training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
c. PaddlePaddle/PaddleOCR-VL
d. ZhipuAI/Glyph
e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking series
f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct series
What's Changed
- [bugfix] fix image_list qwen2.5/3-omni by @Jintao-Huang in #6122
- [model] Support Qwen3-VL dense by @Jintao-Huang in #6120
- feat: support gptq_v2 quantization method by @zzc0430 in #6102
- [bugfix] fix gptq_v2 by @Jintao-Huang in #6126
- [bugfix] patch timeout & fix print_rich_table by @Jintao-Huang in #6137
- Add the support for vLLM data parallel configuration in SwiftDeploy by @YushunXiang in #6114
- [docs] update vllm deploy DP docs by @Jintao-Huang in #6139
- [model] Support Qwen/Qwen3-VL-4B-Instruct series by @Jintao-Huang in #6143
- Update loss_scale method call to pass through inputs.extra_kwargs by @CJack812 in #6160
- [bugfix] fix qwen3_vl videos by @Jintao-Huang in #6162
- Fix bug of sp/cp by @tastelikefeet in #6163
- [deploy] update vllm_enable_prefix_caching by @Jintao-Huang in #6165
- [bugfix] qwen3-vl support mixed data by @Jintao-Huang in #6161
- [template] add_retry by @Jintao-Huang in #6138
- [bugfix] Fix multimodal lazy_tokenize false by @Jintao-Huang in #6172
- [template] update qwen3_vl grounding dataset format by @Jintao-Huang in #6178
- [docs] update docs by @Jintao-Huang in #6180
- [bugfix] add tools fields in inputs2requests by @hjh0119 in #6054
- [grpo] Optimize vLLM weight synchronization & update builtin accuracy reward by @hjh0119 in #5773
- [model] support Qwen/Qwen3Guard-Gen-0.6B series by @Jintao-Huang in #6189
- [template] Support qwen3 omni mixed data by @Jintao-Huang in #6196
- [docs] update qwen3_vl best practice by @Jintao-Huang in #6206
- [vllm] support vllm_mm_processor_cache_gb by @hjh0119 in #6210
- [megatron] fix qwen3_vl new_special_tokens by @Jintao-Huang in #6213
- [megatron] add mcore save_args by @Jintao-Huang in #6216
- [bugfix] fix dtype warning by @Jintao-Huang in #6219
- [bugfix] fix infer pt dp by @Jintao-Huang in #6222
- support training for multimodal reranker by @0russwest0 in #6192
- [bugfix] fix reward_trainer logger by @Jintao-Huang in #6240
- [model] Support deepseek-ocr by @Jintao-Huang in #6238
- [docs] update deepseek_ocr docs by @Jintao-Huang in #6242
- [bugfix] fix qwen3_vl vllm by @Jintao-Huang in #6246
- update requirements torch28 by @Jintao-Huang in #6181
- [deploy] support vllm_engine_kwargs by @Jintao-Huang in #6249
- [bugfix] fix megatron seq_cls by @Jintao-Huang in #6253
- [docs] update grounding docs by @Jintao-Huang in #6254
- [model] Support Qwen3-VL 2B/32B by @Jintao-Huang in #6259
- Support sp + qwen3-vl by @tastelikefeet in #6263
- refactor GKD to support vLLM rollout by @hjh0119 in #6250
- [bugfix] fix grpo mixed data training by @hjh0119 in #6269
- [megatron] use batched mrope by @Jintao-Huang in #6281
- [docs] update register mllm docs by @Jintao-Huang in #6282
- [bugfix] fix grpo padding-free get logps by @hjh0119 in #6275
- [model] support paddle-ocr by @hjh0119 in #6285
- [model] support MiniMax/MiniMax-M2 by @Jintao-Huang in #6303
- [vllm] Filter out None values in vLLM initialization by @hjh0119 in #6305
- [dapo] fix truncation_strategy="delete" in dynamic sampling by @hjh0119 in #6309
- update wechat by @tastelikefeet in #6314
- [model] support LLaVA-OneVision-1.5 by @slin000111 in #6284
- [model] support glyph by @Jintao-Huang in #6324
- [algo] support RLOO algorithm by @hjh0119 in #6325
- [bugfix] remove response before filter encoded failed data by @hjh0119 in #6315
- [script] provide on-policy distillation script & update GKD doc by @hjh0119 in #6334
- Support ray by @tastelikefeet in #6323
- fix ray doc by @tastelikefeet in #6336
- feat: Add SeedAgentTemplate by @hpsun1109 in #6270
- [liger-kernel] support more model & gspo by @hjh0119 in #6338
- [bugfix] Reset prefix cache when syncing only LoRA adapters by @hjh0119 in #6343
- fix argv out of range by @tastelikefeet in #6344
- fix listwise generative reranker loss by @0russwest0 in #6347
- [megatron] fix megatron qwen3_vl overlap_grad_reduce by @Jintao-Huang in #6352
- [bugfix] fix megatron qwen3_vl gradient_checkpointing by @Jintao-Huang in #6356
- fix eval by @tastelikefeet in #6354
- fix eval by @Jintao-Huang in #6357
- [doc] add offload_teacher_model args by @hjh0119 in #6370
- [grpo] fix gym training by @hjh0119 in #6374
- Bug fix: Enable LoRA on MoE experts with 'all-linear' by @B-201 in #6340
- [colocate] Optimizing GPU Memory Usage to Reduce Peak Memory Consumption by @hjh0119 in #6375
- fix qwen3-ulysses by @tastelikefeet in #6382
- fix bugs & update docs by @Jintao-Huang in #6397
- [template] support deepseek_ocr batch train by @Jintao-Huang in #6399
- [bugfix] Fix padding free acc by @Jintao-Huang in #6400
- [bugfix] fix teacher_model_type by @Jintao-Huang in #6401
- [bugfix] fix padding_side by @Jintao-Huang in #6403
- [algo] support reinforce++ baseline by @hjh0119 in #6385
- Support TRL 0.24 compatibility by @hjh0119 in #6406
- [bugfix] Fix qwen3 coder template by @Jintao-Huang in #6409
- fix check latest model for trainers by @hjh0119 in #6412
- support mcore_bridge by @Jintao-Huang in #6182
- [bugfix] fix megatron is_master by @Jintao-Huang in #6426
- [bugfix] fix qwen3_vl mcore-bridge by @Jintao-Huang in #6428
- [mcore_bridge] update docs by @Jintao-Huang in #6431
- [mcore_bridge] fix qwen3_moe merge_lora by @Jintao-Huang in #6435
- align InfoNCE with Qwen3-Embedding by @0russwest0 in #6420
- update swift image by @Jintao-Huang in #6437
- [bugfix] fix mcore_bridge dpo by @Jintao-Huang in #6440
- [doc] fix gkd script link by @hjh0119 in #6458
- [docs] update readthedocs by @Jintao-Huang in #6457
- [docs] update docs by @Jintao-Huang in #6460
- [docs] update docs by @Jintao-Huang in #6461
- [bugfix] fix qwen3_next_quant by @Jintao-Huang in #6462
- [mcore] update mcore_version by @Jintao-Huang in #6463
- [megatron] support vit_lr aligner_lr by @Jintao-Huang in #6469
- Add FSDP2 example by @slin000111 in #6411
- [mcore_bridge] fix multimodal lora export by @Jintao-Huang in #6483
- [deploy]add health/ping endpoints by @hjh0119 in #6488
- [mcore_bridge] fix mcore_bridge bugs by @Jintao-Huang in #6499
- [examples] Add multilabel examples by @Jintao-Huang in #6500
- [bugfix] fix mcore_bridge deepseek-v3 by @Jintao-Huang in #6508
- [bugfix] compat deepseek-v3 mcore 0.13.0 by @Jintao-Huang in #6510
- Fix ppu by @tastelikefeet in #6489
- Fix qwen3 vl sp by @tastelikefeet in #6514
- [megatron] Support mcore bridge seq-cls by @Jintao-Huang in #6511
- fix docs by @tastelikefeet in #6517
- resolve template bug in seed oss pretraining by @hpsun1109 in #6520
- [megatron] default use batched_rope by @Jintao-Huang in #6524
- update random & fix bug by @Jintao-Huang in #6531
- fix bugs by @tastelikefeet in #6533
- update bleu by @Jintao-Huang in #6538
- [GKD] Log completions & profiling by @hjh0119 in #6540
- [bugfix] fix eval register by @Jintao-Huang in #6543
- [bugfix] fix GRPO PT Rollout with SP by @hjh0119 in #6546
- [bugfix] fix GSPO padding_free by @hjh0119 in #6548
- [model] support ernie_vl by @Jintao-Huang in #6545
New Contributors
- @YushunXiang made their first contribution in #6114
- @B-201 made their first contribution in #6340
Full Changelog: v3.9.0...v3.10.0