v3.10.0

@Jintao-Huang Jintao-Huang released this 11 Nov 12:14
· 191 commits to main since this release

New Features

  1. Megatron-SWIFT
    a. Mcore-Bridge released: supports direct loading and saving of model weights in safetensors format, bidirectional conversion of LoRA incremental weights, and multi-node conversion. Documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Mcore-Bridge.html. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
    b. Upgraded megatron-core version to 0.14.0.
    c. Added vit_lr and aligner_lr parameter support for multimodal model training.
    d. Added storage optimization parameters: async_save, save_retain_interval, etc.
    e. Support for batched mrope, accelerating training of Qwen3-VL, Qwen2.5-VL, and other models.
  2. RL
    a. Optimized weight-synchronization speed for GRPO LoRA training. Details: https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html#memory-optimization-solutions-in-colocate-mode
    b. Reduced peak GPU memory usage during GRPO training.
    c. New RLVR algorithms supported: RLOO (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html) and REINFORCE++ Baseline (documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html)
    d. GKD supports using vLLM to accelerate policy-model rollout, and adds the teacher_deepspeed parameter to separately control the teacher model's sharding strategy. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GKD.html
    e. GSPO supports using liger_kernel to reduce memory usage.
  3. Training
    a. Ray support added for PT/SFT/sampling/data distillation. Documentation: https://swift.readthedocs.io/en/latest/Instruction/Ray.html
    b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
    c. Support for YAML-based training parameter configuration, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
    d. Added FSDP2 training launch example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
    e. Added best practice for custom multimodal model registration: https://swift.readthedocs.io/en/latest/BestPractices/MLLM-Registration.html
    f. The InfoNCE loss used in embedding training is now aligned with the description in the Qwen3-Embedding paper. Documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
    g. Added multi-label classification training example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
    h. agent_template supports seed-oss. Thanks to @hpsun1109 for the contribution.
  4. Full Pipeline
    a. swift export supports GPTQ-v2 quantization, scripts: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh. Thanks to @zzc0430 for the contribution.
    b. The swift deploy vLLM inference backend supports data-parallel (DP) deployment via the --vllm_data_parallel_size parameter. Thanks to @YushunXiang for the contribution.
    c. Added health/ping endpoints to swift deploy.
    d. Added the vllm_mm_processor_cache_gb and vllm_engine_kwargs parameters for vLLM deployment.
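The YAML configuration support in item 3c lets a training run be described in a file rather than a long flag list. A hypothetical sketch, with keys mirroring the usual swift sft CLI arguments; the model and dataset ids and all values here are illustrative, so consult examples/yaml in the repository for the actual supported format:

```yaml
# Illustrative only -- keys mirror swift sft CLI flags; see
# examples/yaml in the ms-swift repository for the real format.
model: Qwen/Qwen2.5-7B-Instruct
train_type: lora
dataset:
  - AI-ModelScope/alpaca-gpt4-data-en
num_train_epochs: 1
learning_rate: 1e-4
output_dir: output
```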
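RLOO (item 2c) replaces a group-mean baseline with a leave-one-out baseline: the baseline for each rollout is the mean reward of the other rollouts sampled for the same prompt. A minimal sketch of that advantage computation, not ms-swift's actual implementation:

```python
import numpy as np

def rloo_advantages(rewards):
    """REINFORCE Leave-One-Out: the baseline for rollout i is the mean
    reward of the other k-1 rollouts sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    k = r.size
    baseline = (r.sum() - r) / (k - 1)  # leave-one-out mean per rollout
    return r - baseline
```

Because each baseline excludes only that rollout's own reward, the advantages within a group always sum to zero.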
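The InfoNCE loss mentioned in item 3f is, in its standard form, a cross-entropy over similarity scores with the positive passage as the target class. A self-contained sketch of that standard formulation (not ms-swift's exact code; the temperature value is illustrative):

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.05):
    """Standard InfoNCE: softmax over cosine similarities between the
    query and [positive] + negatives; return -log prob of the positive."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(query, positive)] + [cos(query, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0
```

The loss shrinks as the query-positive similarity grows relative to the query-negative similarities.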
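Multi-label classification (item 3g) differs from ordinary sequence classification in that each label gets an independent sigmoid, so one sample can carry several labels at once; the objective is per-label binary cross-entropy rather than a softmax cross-entropy. A minimal sketch of that loss (not the trainer's actual code):

```python
import numpy as np

def multi_label_bce(logits, targets):
    """Per-label binary cross-entropy: sigmoid each logit independently,
    so a sample may be positive for several labels simultaneously."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    t = np.asarray(targets, dtype=float)
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(t * np.log(probs + eps)
                          + (1 - t) * np.log(1 - probs + eps)))
```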

New Models

  1. Text-only models:
    a. Qwen/Qwen3Guard-Gen-0.6B series
    b. MiniMax/MiniMax-M2
  2. Multimodal models:
    a. Qwen/Qwen3-VL-2B-Instruct series
    b. deepseek-ai/DeepSeek-OCR, training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
    c. PaddlePaddle/PaddleOCR-VL
    d. ZhipuAI/Glyph
    e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking series
    f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct series

Full Changelog: v3.9.0...v3.10.0