v3.10.0
New Features
- Megatron-SWIFT
a. Mcore-Bridge release. Supports direct loading and saving of model weights in safetensors format; bidirectional conversion of LoRA incremental weights (a toy merge sketch follows this list); and multi-node conversion. Documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Mcore-Bridge.html. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
b. Upgraded megatron-core version to 0.14.0.
c. Added vit_lr and aligner_lr parameter support for multimodal model training.
d. Added checkpoint-saving optimization parameters: async_save, save_retain_interval, etc.
e. Support for batched mrope, accelerating training of Qwen3-VL, Qwen2.5-VL, and other models.
- RL
a. Optimized weight-synchronization speed for GRPO LoRA training. Details: https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html#memory-optimization-solutions-in-colocate-mode
b. Optimized GRPO training memory usage to reduce peak GPU memory consumption.
c. New RLVR algorithm support: RLOO (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html) and REINFORCE++ Baseline (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html); a toy sketch of both advantage estimators follows this list.
d. GKD supports vLLM-accelerated policy model rollout, plus a new teacher_deepspeed parameter to separately control the teacher model's sharding strategy. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GKD.html (a sketch of GKD's generalized-JSD objective follows this list).
e. GSPO supports using liger_kernel to reduce memory usage.
- Training
a. Added Ray support for PT/SFT/sampling/data distillation; documentation: https://swift.readthedocs.io/en/latest/Instruction/Ray.html
b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
c. Support for YAML-based training parameter configuration, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
d. Added FSDP2 training launch example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
e. Added best practice for custom multimodal model registration: https://swift.readthedocs.io/en/latest/BestPractices/MLLM-Registration.html
f. The InfoNCE loss in embedding training is now aligned with the description in the Qwen3-Embedding paper (a minimal sketch follows this list). Documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
g. Added a multi-label classification training example (loss sketch after this list), scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
h. agent_template supports seed-oss. Thanks to @hpsun1109 for the contribution.
- Full Pipeline
a. swift export supports GPTQ-v2 quantization, scripts: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh. Thanks to @zzc0430 for the contribution.
b. The swift deploy vLLM inference backend supports DP deployment via the --vllm_data_parallel_size parameter. Thanks to @YushunXiang for the contribution.
c. Added health/ping endpoints to swift deploy (a quick liveness-check sketch follows this list).
d. vLLM deployment added parameters vllm_mm_processor_cache_gb / vllm_engine_kwargs.
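The sketches below illustrate a few of the items above; each is a hedged approximation, not ms-swift's actual implementation. First, the arithmetic behind converting LoRA incremental weights: merging a LoRA delta back into a base weight is the standard W' = W + (alpha / r) * B @ A. The tensor-parallel sharding and safetensors layout handling that Mcore-Bridge adds on top is not shown.

```python
import torch

def merge_lora(base_weight: torch.Tensor, lora_A: torch.Tensor,
               lora_B: torch.Tensor, alpha: float, r: int) -> torch.Tensor:
    # Standard LoRA merge: W' = W + (alpha / r) * B @ A.
    # Mcore-Bridge additionally handles sharded checkpoints and
    # weight layout, which this toy function ignores.
    return base_weight + (alpha / r) * (lora_B @ lora_A)

W = torch.randn(1024, 1024)          # base linear weight (out, in)
r, alpha = 8, 32
A = torch.randn(r, 1024) * 0.01      # lora_A: (r, in_features)
B = torch.zeros(1024, r)             # lora_B: (out_features, r), zero-init
W_merged = merge_lora(W, A, B, alpha, r)
```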
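For the new RLVR algorithms, a minimal sketch of the two advantage estimators as commonly defined: RLOO's leave-one-out baseline, and a REINFORCE++-Baseline-style estimator that subtracts the per-prompt group mean and normalizes by the batch-level standard deviation. See the linked docs for the exact variants ms-swift implements.

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, k) with k sampled completions per prompt.
    # Each sample's baseline is the mean reward of the other k-1 samples
    # from the same prompt, keeping the gradient estimator unbiased.
    k = rewards.size(-1)
    loo_baseline = (rewards.sum(dim=-1, keepdim=True) - rewards) / (k - 1)
    return rewards - loo_baseline

def reinforce_pp_baseline_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # Subtract the per-prompt group mean, then normalize by the
    # std computed over the whole batch (not per group).
    centered = rewards - rewards.mean(dim=-1, keepdim=True)
    return centered / (centered.std() + 1e-8)

r = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 1.0, 0.0]])
print(rloo_advantages(r))
print(reinforce_pp_baseline_advantages(r))
```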
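GKD's distillation objective is typically the generalized Jensen-Shannon divergence between teacher and student token distributions. A minimal sketch follows (per-token masking, temperature scaling, and the exact reduction are omitted; beta is assumed strictly between 0 and 1):

```python
import math
import torch
import torch.nn.functional as F

def generalized_jsd(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    beta: float = 0.5) -> torch.Tensor:
    # Log-probabilities of student (S) and teacher (T).
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    # Log of the mixture M = beta * P_T + (1 - beta) * P_S.
    log_m = torch.logsumexp(
        torch.stack([t + math.log(beta), s + math.log(1.0 - beta)]), dim=0)
    # F.kl_div(input, target, log_target=True) computes KL(target || input).
    kl_t = F.kl_div(log_m, t, reduction="batchmean", log_target=True)  # KL(T || M)
    kl_s = F.kl_div(log_m, s, reduction="batchmean", log_target=True)  # KL(S || M)
    return beta * kl_t + (1.0 - beta) * kl_s
```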
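For the embedding change, a minimal in-batch InfoNCE with a temperature; the Qwen3-Embedding-style refinements (extra negative pools, false-negative masking) are described in the linked doc and not reproduced in this sketch.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    # Row i's positive is document i; every other document in the
    # batch serves as an in-batch negative.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                    # (B, B) similarity logits
    labels = torch.arange(q.size(0), device=q.device) # diagonal = positives
    return F.cross_entropy(logits, labels)
```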
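The multi-label classification setup boils down to treating each label as an independent binary decision: sigmoid per label with BCE, instead of a single softmax across labels. A toy sketch:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 6)                     # (batch, num_labels) from a seq-cls head
targets = torch.randint(0, 2, (4, 6)).float()  # multi-hot label vectors
loss = nn.BCEWithLogitsLoss()(logits, targets)
preds = (logits.sigmoid() > 0.5).int()         # threshold each label independently
```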
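To probe the new swift deploy endpoints, a minimal liveness check; the /health and /ping paths come from this release, while the base URL is an assumed local default.

```python
import requests

base = "http://127.0.0.1:8000"  # assumed local deploy address
for path in ("/health", "/ping"):
    resp = requests.get(f"{base}{path}", timeout=5)
    print(path, resp.status_code)
```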
New Models
- Text-only models:
a. Qwen/Qwen3Guard-Gen-0.6B series
b. MiniMax/MiniMax-M2
- Multimodal models:
a. Qwen/Qwen3-VL-2B-Instruct series
b. deepseek-ai/DeepSeek-OCR, training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
c. PaddlePaddle/PaddleOCR-VL
d. ZhipuAI/Glyph
e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking series
f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct series
What's Changed
- [bugfix] fix image_list qwen2.5/3-omni by @Jintao-Huang in #6122
- [model] Support Qwen3-VL dense by @Jintao-Huang in #6120
- feat: support gptq_v2 quantization method by @zzc0430 in #6102
- [bugfix] fix gptq_v2 by @Jintao-Huang in #6126
- [bugfix] patch timeout & fix print_rich_table by @Jintao-Huang in #6137
- Add the support for vLLM data parallel configuration in SwiftDeploy by @YushunXiang in #6114
- [docs] update vllm deploy DP docs by @Jintao-Huang in #6139
- [model] Support Qwen/Qwen3-VL-4B-Instruct series by @Jintao-Huang in #6143
- Update loss_scale method call to pass through inputs.extra_kwargs by @CJack812 in #6160
- [bugfix] fix qwen3_vl videos by @Jintao-Huang in #6162
- Fix bug of sp/cp by @tastelikefeet in #6163
- [deploy] update vllm_enable_prefix_caching by @Jintao-Huang in #6165
- [bugfix] qwen3-vl support mixed data by @Jintao-Huang in #6161
- [template] add_retry by @Jintao-Huang in #6138
- [bugfix] Fix multimodal lazy_tokenize false by @Jintao-Huang in #6172
- [template] update qwen3_vl grounding dataset format by @Jintao-Huang in #6178
- [docs] update docs by @Jintao-Huang in #6180
- [bugfix] add tools fields in inputs2requests by @hjh0119 in #6054
- [grpo] Optimize vLLM weight synchronization & update builtin accuracy reward by @hjh0119 in #5773
- [model] support Qwen/Qwen3Guard-Gen-0.6B series by @Jintao-Huang in #6189
- [template] Support qwen3 omni mixed data by @Jintao-Huang in #6196
- [docs] update qwen3_vl best practice by @Jintao-Huang in #6206
- [vllm] support vllm_mm_processor_cache_gb by @hjh0119 in #6210
- [megatron] fix qwen3_vl new_special_tokens by @Jintao-Huang in #6213
- [megatron] add mcore save_args by @Jintao-Huang in #6216
- [bugfix] fix dtype warning by @Jintao-Huang in #6219
- [bugfix] fix infer pt dp by @Jintao-Huang in #6222
- support training for multimodal reranker by @0russwest0 in #6192
- [bugfix] fix reward_trainer logger by @Jintao-Huang in #6240
- [model] Support deepseek-ocr by @Jintao-Huang in #6238
- [docs] update deepseek_ocr docs by @Jintao-Huang in #6242
- [bugfix] fix qwen3_vl vllm by @Jintao-Huang in #6246
- update requirements torch28 by @Jintao-Huang in #6181
- [deploy] support vllm_engine_kwargs by @Jintao-Huang in #6249
- [bugfix] fix megatron seq_cls by @Jintao-Huang in #6253
- [docs] update grounding docs by @Jintao-Huang in #6254
- [model] Support Qwen3-VL 2B/32B by @Jintao-Huang in #6259
- Support sp + qwen3-vl by @tastelikefeet in #6263
- refactor GKD to support vLLM rollout by @hjh0119 in #6250
- [bugfix] fix grpo mixed data training by @hjh0119 in #6269
- [megatron] use batched mrope by @Jintao-Huang in #6281
- [docs] update register mllm docs by @Jintao-Huang in #6282
- [bugfix] fix grpo padding-free get logps by @hjh0119 in #6275
- [model] support paddle-ocr by @hjh0119 in #6285
- [model] support MiniMax/MiniMax-M2 by @Jintao-Huang in #6303
- [vllm] Filter out None values in vLLM initialization by @hjh0119 in #6305
- [dapo] fix truncation_strategy="delete" in dynamic sampling by @hjh0119 in #6309
- update wechat by @tastelikefeet in #6314
- [model] support LLaVA-OneVision-1.5 by @slin000111 in #6284
- [model] support glyph by @Jintao-Huang in #6324
- [algo] support RLOO algorithm by @hjh0119 in #6325
- [bugfix] remove response before filter encoded failed data by @hjh0119 in #6315
- [script] provide on-policy distillation script & update GKD doc by @hjh0119 in #6334
- Support ray by @tastelikefeet in #6323
- fix ray doc by @tastelikefeet in #6336
- feat: Add SeedAgentTemplate by @hpsun1109 in #6270
- [liger-kernel] support more model & gspo by @hjh0119 in #6338
- [bugfix] Reset prefix cache when syncing only LoRA adapters by @hjh0119 in #6343
- fix argv out of range by @tastelikefeet in #6344
- fix listwise generative reranker loss by @0russwest0 in #6347
- [megatron] fix megatron qwen3_vl overlap_grad_reduce by @Jintao-Huang in #6352
- [bugfix] fix megatron qwen3_vl gradient_checkpointing by @Jintao-Huang in #6356
- fix eval by @tastelikefeet in #6354
- fix eval by @Jintao-Huang in #6357
- [doc] add offload_teacher_model args by @hjh0119 in #6370
- [grpo] fix gym training by @hjh0119 in #6374
- Bug fix: Enable LoRA on MoE experts with 'all-linear' by @B-201 in #6340
- [colocate] Optimizing GPU Memory Usage to Reduce Peak Memory Consumption by @hjh0119 in #6375
- fix qwen3-ulysses by @tastelikefeet in #6382
- fix bugs & update docs by @Jintao-Huang in #6397
- [template] support deepseek_ocr batch train by @Jintao-Huang in #6399
- [bugfix] Fix padding free acc by @Jintao-Huang in #6400
- [bugfix] fix teacher_model_type by @Jintao-Huang in #6401
- [bugfix] fix padding_side by @Jintao-Huang in #6403
- [algo] support reinforce++ baseline by @hjh0119 in #6385
- Support TRL 0.24 compatibility by @hjh0119 in #6406
- [bugfix] Fix qwen3 coder template by @Jintao-Huang in #6409
- fix check latest model for trainers by @hjh0119 in #6412
- support mcore_bridge by @Jintao-Huang in #6182
- [bugfix] fix megatron is_master by @Jintao-Huang in #6426
- [bugfix] fix qwen3_vl mcore-bridge by @Jintao-Huang in #6428
- [mcore_bridge] update docs by @Jintao-Huang in #6431
- [mcore_bridge] fix qwen3_moe merge_lora by @Jintao-Huang in #6435
- align InfoNCE with Qwen3-Embedding by @0russwest0 in #6420
- update swift image by @Jintao-Huang in #6437
- [bugfix] fix mcore_bridge dpo by @Jintao-Huang in #6440
- [doc] fix gkd script link by @hjh0119 in #6458
- [docs] update readthedocs by @Jintao-Huang in #6457
- [docs] update docs by @Jintao-Huang in #6460
- [docs] update docs by @Jintao-Huang in #6461
- [bugfix] fix qwen3_next_quant by @Jintao-Huang in #6462
- [mcore] update mcore_version by @Jintao-Huang in #6463
- [megatron] support vit_lr aligner_lr by @Jintao-Huang in #6469
- Add FSDP2 example by @slin000111 in #6411
- [mcore_bridge] fix multimodal lora export by @Jintao-Huang in #6483
- [deploy]add health/ping endpoints by @hjh0119 in #6488
- [mcore_bridge] fix mcore_bridge bugs by @Jintao-Huang in #6499
- [examples] Add multilabel examples by @Jintao-Huang in #6500
- [bugfix] fix mcore_bridge deepseek-v3 by @Jintao-Huang in #6508
- [bugfix] compat deepseek-v3 mcore 0.13.0 by @Jintao-Huang in #6510
- Fix ppu by @tastelikefeet in #6489
- Fix qwen3 vl sp by @tastelikefeet in #6514
- [megatron] Support mcore bridge seq-cls by @Jintao-Huang in #6511
- fix docs by @tastelikefeet in #6517
- resolve template bug in seed oss pretraining by @hpsun1109 in #6520
- [megatron] default use batched_rope by @Jintao-Huang in #6524
- update random & fix bug by @Jintao-Huang in #6531
- fix bugs by @tastelikefeet in #6533
- update bleu by @Jintao-Huang in #6538
- [GKD] Log completions & profiling by @hjh0119 in #6540
- [bugfix] fix eval register by @Jintao-Huang in #6543
- [bugfix] fix GRPO PT Rollout with SP by @hjh0119 in #6546
- [bugfix] fix GSPO padding_free by @hjh0119 in #6548
- [model] support ernie_vl by @Jintao-Huang in #6545
New Contributors
- @YushunXiang made their first contribution in #6114
- @B-201 made their first contribution in #6340
Full Changelog: v3.9.0...v3.10.0