
Releases: THUDM/slime

v0.2.1

12 Dec 13:02
0934a0e

Thanks to the incredible support and contributions from our community — v0.2.1 is here!

Major Updates

  • VLM + FSDP: true on-policy training on Qwen3-VL (dense).
  • Prefill-decode (PD) disaggregation support during rollout.
  • DP-attention support in Rollout Routing Replay (R3); see the routing-replay sketch after this list.
  • Upgraded to SGLang v0.5.6.
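
For readers new to R3, the sketch below illustrates the general routing-replay idea for MoE models: record the expert indices chosen during rollout and force the same routing during the training forward pass, so both sides activate identical experts. This is a minimal illustration, not slime's actual implementation; the function name, signature, and tensor shapes are assumptions.

```python
from typing import Optional

import torch


def topk_routing(router_logits: torch.Tensor, k: int,
                 replay_indices: Optional[torch.Tensor] = None):
    """Select top-k experts per token, optionally replaying recorded indices.

    router_logits:  [num_tokens, num_experts] raw router scores.
    replay_indices: [num_tokens, k] expert ids recorded during rollout, or None.
    """
    if replay_indices is None:
        # Normal path: the training model's own router picks the experts.
        topk_vals, topk_idx = router_logits.topk(k, dim=-1)
    else:
        # Replay path: keep the rollout engine's expert choices, but take the
        # gating scores from the training router so gradients still flow.
        topk_idx = replay_indices
        topk_vals = router_logits.gather(-1, topk_idx)
    gates = torch.softmax(topk_vals, dim=-1)
    return topk_idx, gates
```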

What's Changed

New Contributors

v0.2.0.post1

01 Dec 04:17
763f18d

Fix critical bug mentioned in #958.

What's Changed

  • extract mla update weight logic out by @zhuzilin in #960
  • support do all evals together by @zhuzilin in #959
  • Add --rollout-sample-filter-path by @zhuzilin in #961
  • [FSDP] Optimize FSDP2 Model Loading with Rank-0 Broadcast by @Hecate0821 in #915
  • Add sample.remove_sample by @zhuzilin in #977
  • add --eval-max-prompt-len by @zhuzilin in #978
  • Add args check for max_context_len by @zhuzilin in #979
  • Remove hard coded balance_abs_threshold by @zhuzilin in #981
  • Tiny fix fp8_cast_bf16 not copying chat template by @fzyzcjy in #964
  • Super tiny install dnsutils in dockerfile by @fzyzcjy in #965
  • Super tiny sanity check checkpoint dir by @fzyzcjy in #966
  • Fix convert_hf_to_torch_dist OOM by @fzyzcjy in #967
  • Tiny support using environment variables in addition to arguments for all scripts by @fzyzcjy in #968
  • Super tiny increase default timeout sec by @fzyzcjy in #969
  • Fix random port in use error even though already have free port detection by @fzyzcjy in #970
  • Super tiny enable draft-weights-cpu-backup to avoid MTP acc len issue by @fzyzcjy in #971
  • Add generation function for benchmarking purpose by @fzyzcjy in #972
  • Support zero host or device memory waste for weight update by @fzyzcjy in #973
  • Add fp8 kv cache and tis in qwen3 30b a3b script by @fzyzcjy in #974
  • Add GB200, MTP, benchmark, fp8 rollout mode to glm script by @fzyzcjy in #975
  • [FSDP] Add private func indicator for better usage by @PopSoda2002 in #982
  • [Bugfix] Rename save model by @PopSoda2002 in #983
  • Fix: resolve variable shadowing bug in setup_model_and_optimizer by @fangzhensheng in #963

New Contributors

Full Changelog: v0.2.0...v0.2.0.post1

v0.2.0

28 Nov 02:51
91acef0

We are thrilled to announce the release of slime v0.2.0! Thanks to the incredible support and contributions from our community, slime has gained significant features and substantial performance enhancements in this version.

Major Updates

  • FSDP Backend: Introduced a Fully Sharded Data Parallel (FSDP)-based training backend for improved scalability.
  • PPO Support: Added native support for Proximal Policy Optimization (PPO).
  • MTP Training: Enabled Multi-Token Prediction (MTP) training during reinforcement learning.
  • FP8 Full Stack: Support for both FP8 training and FP8 inference.
  • Train-Inference Mismatch: Alleviates or even eliminates the train-inference mismatch.
    • Importance Sampling: Custom interface for train-infer importance sampling (e.g., MIS); see the sketch after this list.
    • Routing Replay: Added Rollout Routing Replay (R3) and Routing Replay (R2).
    • True On-Policy Training: Enabled strictly on-policy training with dense models on the FSDP backend.
  • Performance Improvements
    • Memory Optimization: CUDA Graph offloading and asystem-amem integration.
    • Faster Weight Updates: Significantly accelerated FP8 weight updates.
  • Python-based Router: A new slime router implemented in pure Python for accessibility.
  • Fault Tolerance: Added robustness with fault tolerance for the rollout engines.
  • Custom Configs: Support for passing customized configurations via --config.
  • [Experimental] Checkpoint Loading: Added support for Megatron-Bridge-based checkpoint loading.
  • New Examples
    • Fully Async Training
    • Multi-Agent Scenarios
    • On-Policy Distillation
    • Retool
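
To make the Train-Inference Mismatch item above concrete, here is a minimal, hedged sketch of train-infer importance sampling: per-token probability ratios between the training backend and the rollout engine, truncated at a cap (in the spirit of TIS). The function name, tensor shapes, and clipping constant are illustrative assumptions, not slime's actual interface.

```python
import torch


def truncated_is_weights(train_logprobs: torch.Tensor,
                         rollout_logprobs: torch.Tensor,
                         clip_c: float = 2.0) -> torch.Tensor:
    """Per-token importance weights pi_train / pi_rollout, truncated at clip_c.

    Both inputs are [batch, seq] log-probabilities of the sampled tokens,
    one from the training backend and one from the inference engine.
    """
    ratio = torch.exp(train_logprobs - rollout_logprobs)
    return torch.clamp(ratio, max=clip_c)


# The weights multiply the per-token policy-gradient loss, so gradients are
# taken w.r.t. the training policy even though the samples were produced by
# the (slightly different) rollout engine.
```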

What's Changed

v0.1.0

31 Aug 16:35
261ecee

Performance Optimizations

  • SGLang: FP8 + DeepEP + speculative decoding
  • Megatron: support for all parallelism strategies (TP, PP, VPP, EP, CP, etc.) + DeepEP + CPU Adam.
  • New Megatron offload strategy with better memory usage.
  • Faster weight updates.

New Algorithm Supports

  • GSPO (see the ratio sketch after this list)
  • TIS
  • REINFORCE++ & REINFORCE++ baseline
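
For context on GSPO, the formula below sketches its central idea as we understand it (see the original GSPO paper for the authoritative definition): replace PPO's token-level importance ratio with a length-normalized, sequence-level ratio, which is then clipped against a group-normalized advantage.

```latex
% Sketch of GSPO's length-normalized, sequence-level importance ratio.
% x: prompt; y_i: the i-th sampled response, of length |y_i|;
% \pi_\theta: current policy; \pi_{\theta_{\mathrm{old}}}: the policy that generated y_i.
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)} \right)^{1/|y_i|}
```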

Correctness

  • CI for E2E GLM4 9B and Qwen3 30B-A3B training
  • CI for building the Conda environment