diff --git a/README.md b/README.md
index de9c43af49..fecaea4ac5 100644
--- a/README.md
+++ b/README.md
@@ -32,13 +32,13 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 ## 🚀 News
 
+* [2026-02] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.1) Trinity-RFT v0.5.1 released: Enhanced VLM support, logging improvements, bug fixes.
 * [2026-02] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.0) Trinity-RFT v0.5.0 released: colocate mode for single-GPU scenarios, trainer driven weight synchronization, automatic parallelism setting suggestion, and more.
 * [2026-01] 🎉 Three papers accepted by ICLR 2026: [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord), [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots), and [Group-relative REINFORCE variants](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k). Try out these new algorithms in Trinity-RFT!
 * [2026-01] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, Tinker backend supports OpenAI API, bug fixes.
 * [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
 * [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, add more benchmarks, enhance online RL and more.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
-* [2025-11] [[Release Notes](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data  ([paper](https://arxiv.org/pdf/2510.25441)).
 * [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
 * [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([implementation](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
@@ -46,6 +46,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 <details><summary> More... </summary>
 <ul>
+  <li> [2025-11] Trinity-RFT v0.3.3 released: bug fixes.</li>
   <li> [2025-11] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.</li>
   <li> [2025-10] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.</li>
   <li> [2025-09] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.</li>
@@ -67,7 +68,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 | *Full-lifecycle data pipelines*   | • [Rollout task mixing and selection](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)<br>• [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>• [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))<br>• [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>• [Advanced data processing & human-in-the-loop](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) |
 | *Algorithm development*           | • [RL algorithm development with Trinity-RFT](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>• [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))<br>• [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))<br>• Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
 | *Benchmarks*                      | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>• [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>• [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>• [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *Going deeper into Trinity-RFT*   | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
+| *Going deeper into Trinity-RFT*   | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>• [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
 
 > [!NOTE]
 > For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).
diff --git a/README_zh.md b/README_zh.md
index 5642481dd2..5b46002dcb 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -41,6 +41,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能：
 
 ## 🚀 新闻
 
+* [2026-02] [[发布说明]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.1) Trinity-RFT v0.5.1 发布：增强 VLM 支持，改进日志系统，修复若干 Bug。
 * [2026-02] [[发布说明]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.5.0) Trinity-RFT v0.5.0 发布：单 GPU 场景下的 colocate 模式，trainer 驱动的权重同步，自动并行设置建议等新功能。
 * [2026-01] 🎉 三篇论文被 ICLR 2026 接收：[CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord)、[BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) 和 [Group-relative REINFORCE 系列变种](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)。在 Trinity-RFT 中尝试这些新算法吧！
 * [2026-01] [[发布说明]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 发布：升级 verl 至 v0.7.0，Tinker 后端支持 OpenAI API，修复若干 Bug。
@@ -80,7 +81,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能：
 | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)<br>+ [在线任务选择](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))<br>+ [研究项目：learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441)) <br>+ [经验回放机制](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [高级数据处理能力 &  Human-in-the-loop](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html)  |
 | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [论文](https://arxiv.org/pdf/2508.11408))<br>+ [研究项目: R3L (基于反思-重试的强化学习)](https://github.com/shiweijiezero/R3L) (📝 [论文](https://arxiv.org/abs/2601.03715))<br>+ [研究项目: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203)) <br>+ 不可验证的领域: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
 | *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math 测试 & 对比 veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake 测试 & 对比 rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld 测试 & 对比 rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *深入认识 Trinity-RFT* | + [完整配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU 资源与训练配置对应指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [理解 explorer-trainer 同步逻辑](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [如何与 verl 对齐配置](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html)   |
+| *深入了解 Trinity-RFT* | + [完整配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU 资源与训练配置对应指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [训练多模态模型](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>+ [理解 explorer-trainer 同步逻辑](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [如何与 verl 对齐配置](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html)   |
 
 
 > [!NOTE]
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index 4caf698727..b2dfb3b371 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -32,7 +32,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)<br>+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md)  |
 | *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))<br>+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
 | *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *Going deeper into Trinity-RFT* | + [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html)    |
+| *Going deeper into Trinity-RFT*   | + [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>+ [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
 
 
 
diff --git a/docs/sphinx_doc/source/tutorial/develop_workflow.md b/docs/sphinx_doc/source/tutorial/develop_workflow.md
index bc0e8fa69d..fd0b52ef89 100644
--- a/docs/sphinx_doc/source/tutorial/develop_workflow.md
+++ b/docs/sphinx_doc/source/tutorial/develop_workflow.md
@@ -562,4 +562,4 @@ Each runner will log its output to a separate log file. The log file naming conv
     └── ...
 ```
 
-If you found errors or blocking issues during training, you can check the corresponding log files for detailed information to help diagnose and fix problems.
+Trinity-RFT also provide a convenient command `log` to view these logs in real-time. You can use `trinity log --log-dir /path/to/log/dir -k explorer_runner` command to filter and view the logs of all runners at once or use `trinity log --log-dir /path/to/log/dir -k explorer_runner_0` to view the logs of a specific runner. If you encounter errors or blocking issues during training, you can check the corresponding log files for detailed information to help diagnose and resolve the problems.
diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md
index 1252742d4e..58ab96ad0e 100644
--- a/docs/sphinx_doc/source_zh/main.md
+++ b/docs/sphinx_doc/source_zh/main.md
@@ -31,8 +31,7 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能：
 | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](/tutorial/develop_selector.md)<br>+ [在线任务选择](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))<br>+ [研究项目：learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441)) <br>+ [经验回放机制](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [高级数据处理能力 &  Human-in-the-loop](/tutorial/example_data_functionalities.md)  |
 | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](/tutorial/example_mix_algo.md) (📝 [论文](https://arxiv.org/pdf/2508.11408))<br>+ [研究项目: R3L (基于反思-重试的强化学习)](https://github.com/shiweijiezero/R3L) (📝 [论文](https://arxiv.org/abs/2601.03715))<br>+ [研究项目: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203)) <br>+ 不可验证的领域: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
 | *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math 测试 & 对比 veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake 测试 & 对比 rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld 测试 & 对比 rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *深入认识 Trinity-RFT* | + [完整配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU 资源与训练配置对应指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [理解 explorer-trainer 同步逻辑](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [如何与 verl 对齐配置](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html)   |
-
+| *深入了解 Trinity-RFT* | + [完整配置指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU 资源与训练配置对应指南](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [训练多模态模型](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>+ [理解 explorer-trainer 同步逻辑](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [如何与 verl 对齐配置](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html)   |
 
 ## 🌟 核心特性
 
diff --git a/docs/sphinx_doc/source_zh/tutorial/develop_workflow.md b/docs/sphinx_doc/source_zh/tutorial/develop_workflow.md
index af1f41900d..60513f7b64 100644
--- a/docs/sphinx_doc/source_zh/tutorial/develop_workflow.md
+++ b/docs/sphinx_doc/source_zh/tutorial/develop_workflow.md
@@ -556,4 +556,4 @@ class ExampleWorkflow(Workflow):
     └── ...
 ```
 
-如果训练过程中出现报错或阻塞等异常情况，可通过查看对应日志文件快速定位问题，从而高效调试和优化工作流实现。
+Trinity-RFT 还提供了一个方便的 `log` 命令来实时查看这些日志。你可以使用 `trinity log --log-dir /path/to/log/dir -k explorer_runner` 命令来过滤并查看所有 workflow runner 的日志，或者使用 `trinity log --log-dir /path/to/log/dir -k explorer_runner_0` 来查看特定 workflow runner 的日志。
diff --git a/pyproject.toml b/pyproject.toml
index 812f14e6fd..3b422d5d4d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name =  "trinity-rft"
-version = "0.5.0"
+version = "0.5.1"
 authors = [
     {name="Trinity-RFT Team", email="trinity-rft@outlook.com"},
 ]
@@ -42,7 +42,7 @@ dependencies = [
     "sortedcontainers",
     "word2number",
     "matplotlib",
-    "transformers>=4.51.0",
+    "transformers>=4.51.0,<5.0.0",
     "datasets>=4.0.0",
     "typer>=0.20.1",
 ]
@@ -53,8 +53,8 @@ trinity = "trinity.cli.launcher:main"
 [project.optional-dependencies]
 vllm = [
     "vllm>=0.10.2,<=0.15.1,!=0.11.0,!=0.12.0",
-    # v0.11.0 has bug when prefix-caching is enabled so we exclude it
-    # v0.12.0 has a huge performance regression so we exclude it
+    # v0.11 has bug when prefix-caching is enabled so we exclude it
+    # v0.12 has a huge performance regression so we exclude it
     # v0.10.2 is the most stable version, but we allow up to 0.15.1 for new features
 ]
 data = [
diff --git a/tests/cli/launcher_test.py b/tests/cli/launcher_test.py
index f0d7425831..3845c39c5c 100644
--- a/tests/cli/launcher_test.py
+++ b/tests/cli/launcher_test.py
@@ -396,6 +396,62 @@ def test_debug_mode(self, mock_load):
             process.join(timeout=10)
             process.terminate()
 
+    @mock.patch("trinity.manager.log_manager.LogManager")
+    @mock.patch("trinity.cli.launcher.load_config")
+    def test_log_mode(self, mock_load_config, mock_log_manager):
+        result = runner.invoke(launcher.app, ["log"])
+        self.assertNotEqual(result.exit_code, 0)
+        self.assertIn("Either --config or --log-dir must be provided", result.output)
+
+        mock_cfg = mock.Mock()
+        mock_cfg.get_checkpoint_job_dir.return_value = "/tmp/job"
+        mock_load_config.return_value = mock_cfg
+        with mock.patch("os.path.exists", return_value=True):
+            result = runner.invoke(
+                launcher.app,
+                [
+                    "log",
+                    "--config",
+                    "dummy.yaml",
+                    "-k",
+                    "trainer",
+                    "-l",
+                    "DEBUG",
+                    "-n",
+                    str(5),
+                    "-p",
+                    "ERROR",
+                ],
+            )
+            self.assertEqual(result.exit_code, 0)
+            mock_log_manager.assert_called_once_with(
+                log_dir="/tmp/job/log",
+                keyword="trainer",
+                min_level="DEBUG",
+                color_output=True,
+                last_n_lines=5,
+                search_pattern="ERROR",
+            )
+            mock_log_manager.return_value.monitor.assert_called_once()
+
+        with mock.patch("os.path.exists", return_value=True):
+            result = runner.invoke(launcher.app, ["log", "--log-dir", "/tmp/job/log"])
+            self.assertEqual(result.exit_code, 0)
+            mock_log_manager.assert_called_with(
+                log_dir="/tmp/job/log",
+                keyword=None,
+                min_level="INFO",
+                color_output=True,
+                last_n_lines=0,
+                search_pattern=None,
+            )
+
+        with mock.patch("os.path.exists", return_value=False):
+            result = runner.invoke(launcher.app, ["log", "--config", "dummy.yaml"])
+            print("result.exc_info:", result.exc_info)
+            self.assertNotEqual(result.exit_code, 0)
+            self.assertEqual(result.exc_info[0], FileNotFoundError)
+
 
 def debug_inference_model_process():
     config = get_template_config()
diff --git a/tests/manager/log_manager_test.py b/tests/manager/log_manager_test.py
new file mode 100644
index 0000000000..f5f0886e21
--- /dev/null
+++ b/tests/manager/log_manager_test.py
@@ -0,0 +1,76 @@
+import os
+import tempfile
+import unittest
+from unittest import mock
+
+from trinity.manager.log_manager import LogManager
+
+
+class TestLogManager(unittest.TestCase):
+    def setUp(self):
+        self.temp_dir = tempfile.TemporaryDirectory()
+        self.log_path = os.path.join(self.temp_dir.name, "test.log")
+        with open(self.log_path, "w") as f:
+            f.write("INFO Start\nWARN Something\nERROR FindMe\nDEBUG End\n")
+
+    def tearDown(self):
+        self.temp_dir.cleanup()
+
+    @mock.patch("builtins.print")
+    def test_init_and_tracking(self, mock_print):
+        manager = LogManager(
+            log_dir=self.temp_dir.name,
+            min_level="INFO",
+            last_n_lines=2,
+            color_output=False,
+        )
+        manager.scan_new_files()
+        self.assertEqual(mock_print.call_count, 1 + 2)  # 1 for tracking, 2 for last_n_lines
+        self.assertIn(self.log_path, manager.trackers)
+        tracker = manager.trackers[self.log_path]
+        with open(self.log_path, "a") as f:
+            f.write("INFO line 4\nERROR line 5\nDEBUG line 6\n")
+        lines = tracker.read_new_lines()
+        filtered = [line for line in lines if tracker.should_display(line)]
+        self.assertEqual(filtered, ["INFO line 4", "ERROR line 5"])
+
+    @mock.patch("builtins.print")
+    def test_file_rotation(self, mock_print):
+        manager = LogManager(
+            log_dir=self.temp_dir.name,
+            min_level="DEBUG",
+            color_output=False,
+            last_n_lines=1,
+        )
+        manager.scan_new_files()
+        tracker = manager.trackers[self.log_path]
+        with open(self.log_path, "w") as f:
+            f.write("INFO AfterRotation\n")
+        tracker.read_new_lines()
+        self.assertEqual(
+            mock_print.call_count, 4
+        )  # 1 for last_n_lines, 1 for tracking, 1 for file rotation, 1 for last_n_lines
+        import re
+
+        def strip_ansi(s):
+            return re.sub(r"\x1b\[[0-9;]*m", "", s)
+
+        self.assertIn("Detected file rotation", strip_ansi(mock_print.call_args_list[2][0][0]))
+        self.assertIn("INFO AfterRotation", strip_ansi(mock_print.call_args_list[3][0][0]))
+
+    @mock.patch("builtins.print")
+    def test_keyword_filter_and_search_pattern(self, mock_print):
+        log2 = os.path.join(self.temp_dir.name, "other.log")
+        with open(log2, "w") as f:
+            f.write("INFO Other\n")
+        manager = LogManager(
+            log_dir=self.temp_dir.name,
+            keyword="test",
+            min_level="INFO",
+            color_output=False,
+            search_pattern="FindMe",
+        )
+        manager.scan_new_files()
+        self.assertIn(self.log_path, manager.trackers)
+        self.assertNotIn(log2, manager.trackers)
+        self.assertTrue(any("FindMe" in call[0][0] for call in mock_print.call_args_list))
diff --git a/tests/trainer/trainer_test.py b/tests/trainer/trainer_test.py
index 5a0560a0a5..46caeda744 100644
--- a/tests/trainer/trainer_test.py
+++ b/tests/trainer/trainer_test.py
@@ -1,6 +1,7 @@
 """Tests for trainer."""
 
 import asyncio
+import gc
 import json
 import math
 import multiprocessing
@@ -302,6 +303,9 @@ def tearDown(self):
         shutil.rmtree(self.config.checkpoint_job_dir, ignore_errors=True)
 
 
+@unittest.skip(
+    "This test is used for testing the warmup stage of SFT, which is not stable yet. Will enable it after we have a more stable implementation."
+)
 class TestTrainerSFTWarmupGSM8K(BaseTrainerCase):
     @mock.patch("trinity.cli.launcher.load_config")
     def test_trainer(self, mock_load):
@@ -352,7 +356,10 @@ def test_trainer(self, mock_load):
 
         with self.assertRaises(Exception):
             run(config="dummy.yaml")
+
         ray.shutdown(_exiting_interpreter=True)
+        self._cleanup_ray_data_state()
+        gc.collect()
 
         stage_configs = [cfg.check_and_update() for cfg in deepcopy(self.config)]
 
diff --git a/trinity/__init__.py b/trinity/__init__.py
index f9c4d8f061..1ce298d2b5 100644
--- a/trinity/__init__.py
+++ b/trinity/__init__.py
@@ -1,4 +1,4 @@
 # -*- coding: utf-8 -*-
 """Trinity-RFT (Reinforcement Fine-Tuning)"""
 
-__version__ = "0.5.0"
+__version__ = "0.5.1"
diff --git a/trinity/cli/launcher.py b/trinity/cli/launcher.py
index ba1b00ccaa..02795e13c2 100644
--- a/trinity/cli/launcher.py
+++ b/trinity/cli/launcher.py
@@ -190,7 +190,7 @@ def run_stage(config: Config) -> None:
 def run(
     config: Annotated[
         str,
-        typer.Option("--config", help="Path to the config file."),
+        typer.Option("--config", "-c", help="Path to the config file."),
     ],
     dlc: Annotated[
         bool,
@@ -261,7 +261,7 @@ def run(
 def studio(
     port: Annotated[
         int,
-        typer.Option("--port", help="The port for Trinity-Studio."),
+        typer.Option("--port", "-p", help="The port for Trinity-Studio."),
     ] = 8501,
 ) -> None:
     """Run studio to manage configurations."""
@@ -274,12 +274,13 @@ def studio(
 def debug(
     config: Annotated[
         str,
-        typer.Option("--config", help="Path to the config file."),
+        typer.Option("--config", "-c", help="Path to the config file."),
     ],
     module: Annotated[
         str,
         typer.Option(
             "--module",
+            "-m",
             help="The module to debug: 'inference_model', 'workflow', or 'viewer'.",
         ),
     ],
@@ -289,7 +290,7 @@ def debug(
     ] = None,
     output_dir: Annotated[
         str,
-        typer.Option("--output-dir", help="The output directory for debug files."),
+        typer.Option("--output-dir", "-o", help="The output directory for debug files."),
     ] = "debug_output",
     disable_overwrite: Annotated[
         bool,
@@ -301,7 +302,7 @@ def debug(
     ] = False,
     port: Annotated[
         int,
-        typer.Option("--port", help="The port for Experience Viewer."),
+        typer.Option("--port", "-p", help="The port for Experience Viewer."),
     ] = 8502,
 ) -> None:
     """Debug a workflow implementation."""
@@ -351,11 +352,11 @@ def debug(
 def convert(
     checkpoint_dir: Annotated[
         str,
-        typer.Option("--checkpoint-dir", help="The path to the checkpoint directory."),
+        typer.Option("--checkpoint-dir", "-c", help="The path to the checkpoint directory."),
     ],
     base_model_dir: Annotated[
         Optional[str],
-        typer.Option("--base-model-dir", help="The path to the base model."),
+        typer.Option("--base-model-dir", "-b", help="The path to the base model."),
     ] = None,
 ) -> None:
     """Convert checkpoints to huggingface format."""
@@ -367,6 +368,79 @@ def convert(
     converter.convert(dir_path)
 
 
+@app.command()
+def log(
+    log_dir: Annotated[
+        str,
+        typer.Option(
+            "--log-dir",
+            "-d",
+            help="Path to the log directory. If provided, it will be used directly and ignore --config.",
+        ),
+    ] = "",
+    config: Annotated[
+        str,
+        typer.Option(
+            "--config",
+            "-c",
+            help="Path to the config file. If provided, it will automatically locate the log directory based on the config.",
+        ),
+    ] = "",
+    keyword: Annotated[
+        Optional[str],
+        typer.Option(
+            "--keyword",
+            "-k",
+            help="Only track log files containing the keyword in their filenames.",
+        ),
+    ] = None,
+    level: Annotated[
+        str,
+        typer.Option("--level", "-l", help="The minimum log level to display in real-time."),
+    ] = "INFO",
+    last_n_lines: Annotated[
+        int,
+        typer.Option("--last-n-lines", "-n", help="Number of last lines to display when starting."),
+    ] = 0,
+    search_pattern: Annotated[
+        Optional[str],
+        typer.Option(
+            "--search-pattern",
+            "-p",
+            help="The pattern to search in log files. Only search for history logs and display all lines containing the pattern.",
+        ),
+    ] = None,
+    no_color: Annotated[
+        bool,
+        typer.Option("--no-color", help="Disable colored output."),
+    ] = False,
+) -> None:
+    """Monitor log files in real-time."""
+    from trinity.manager.log_manager import LogManager
+
+    if not config and not log_dir:
+        raise typer.BadParameter("Either --config or --log-dir must be provided.")
+    if not log_dir:
+        cfg = load_config(config)
+        checkpoint_job_dir = cfg.get_checkpoint_job_dir()
+        # we do not use check_and_update here because user may use this command
+        # in another environment
+        log_dir = os.path.join(checkpoint_job_dir, "log")
+
+    if not os.path.exists(log_dir):
+        raise FileNotFoundError(f"Log directory not found: {log_dir}")
+
+    log_manager = LogManager(
+        log_dir=log_dir,
+        keyword=keyword,
+        min_level=level,
+        color_output=not no_color,
+        last_n_lines=last_n_lines,
+        search_pattern=search_pattern,
+    )
+    log_manager.monitor()
+
+
 def main() -> None:
     """The main entrypoint."""
     app()
diff --git a/trinity/common/config.py b/trinity/common/config.py
index 7f4d6edb65..097cda5e57 100644
--- a/trinity/common/config.py
+++ b/trinity/common/config.py
@@ -943,6 +943,10 @@ def get_envs(self) -> Dict[str, str]:
             envs["TINKER_BASE_URL"] = self.model.tinker.base_url
         return envs
 
+    def get_checkpoint_job_dir(self) -> str:
+        """Get the checkpoint job dir."""
+        return os.path.join(self.checkpoint_root_dir, self.project, self.group, self.name)
+
 
 def load_config(config_path: str) -> Config:
     """Load the configuration from the given path."""
diff --git a/trinity/common/config_validator.py b/trinity/common/config_validator.py
index 69520fe56d..dbe12af63d 100644
--- a/trinity/common/config_validator.py
+++ b/trinity/common/config_validator.py
@@ -97,9 +97,7 @@ def validate(self, config: Config) -> None:
         if not os.path.isabs(config.checkpoint_root_dir):
             config.checkpoint_root_dir = os.path.join(os.getcwd(), config.checkpoint_root_dir)
         # create a job dir at checkpoint_root_dir/project/name
-        config.checkpoint_job_dir = os.path.join(
-            config.checkpoint_root_dir, config.project, config.group, config.name
-        )
+        config.checkpoint_job_dir = config.get_checkpoint_job_dir()
         # rename the experiment when necessary
         if not config.continue_from_checkpoint and (
             os.path.exists(config.checkpoint_job_dir) and os.listdir(config.checkpoint_job_dir)
diff --git a/trinity/common/verl_config.py b/trinity/common/verl_config.py
index 218f67c4ba..1aaa4a3c4a 100644
--- a/trinity/common/verl_config.py
+++ b/trinity/common/verl_config.py
@@ -461,11 +461,11 @@ def _adjust_token_len_if_needed(
         """
         Helper to adjust token length per GPU if current setting is too small.
 
-        Ensures: token_len * seq_parallel >= config.model.max_model_len * 2
+        Ensures: token_len * seq_parallel >= config.model.max_model_len
         """
         current_token_len = getattr(obj, token_len_attr)
         seq_parallel = getattr(obj, sp_attr)
-        required_min = config.model.max_model_len * 2  # type: ignore
+        required_min = config.model.max_model_len  # type: ignore
 
         if current_token_len * seq_parallel < required_min:
             new_token_len = math.ceil(required_min / seq_parallel)
diff --git a/trinity/manager/log_manager.py b/trinity/manager/log_manager.py
new file mode 100644
index 0000000000..5ca1313a82
--- /dev/null
+++ b/trinity/manager/log_manager.py
@@ -0,0 +1,282 @@
+"""logger manager"""
+
+import os
+import re
+import threading
+import time
+from pathlib import Path
+from typing import Dict, List, Set
+
+
+class Colors:
+    RESET = "\033[0m"
+    RED = "\033[91m"
+    GREEN = "\033[92m"
+    YELLOW = "\033[93m"
+    BLUE = "\033[94m"
+    MAGENTA = "\033[95m"
+    CYAN = "\033[96m"
+    WHITE = "\033[97m"
+    GRAY = "\033[90m"
+
+
+LOG_LEVELS = {
+    "DEBUG": 0,
+    "INFO": 1,
+    "WARNING": 2,
+    "WARN": 2,
+    "ERROR": 3,
+    "CRITICAL": 4,
+    "FATAL": 4,
+}
+
+
+class LogFileTracker:
+    """Single log file tracker"""
+
+    def __init__(
+        self,
+        filepath: str,
+        min_level: int = 0,
+        color_output: bool = True,
+        last_n_lines: int = 0,
+        search_pattern: str | None = None,
+    ):
+        self.filepath = filepath
+        self.min_level = min_level
+        self.color_output = color_output
+        self.last_n_lines = last_n_lines
+        self.search_pattern = search_pattern
+        self.file = None
+        self.file_size = 0
+        self.inode = None
+
+    def open_file(self):
+        """Open file and optionally read last N lines"""
+        try:
+            self.file = open(self.filepath, "r", encoding="utf-8", errors="ignore")
+            if self.search_pattern:
+                print(
+                    f"{Colors.CYAN}[INFO] Searching for pattern '{self.search_pattern}' in {self.filepath}{Colors.RESET}"
+                )
+                self.file.seek(0)
+                lines = self.file.readlines()
+                match_indices = [i for i, line in enumerate(lines) if self.search_pattern in line]
+                for idx in match_indices:
+                    start = max(0, idx - 5)
+                    end = min(len(lines), idx + 6)
+                    print(f"{Colors.MAGENTA}[{self.filepath}:{idx + 1}]{Colors.RESET}")
+                    for i in range(start, end):
+                        prefix = f"{Colors.MAGENTA}>> {Colors.RESET}" if i == idx else "   "
+                        print(prefix + self.format_output(lines[i].rstrip("\n")))
+                print(
+                    f"{Colors.CYAN}[INFO] Finished searching in {self.filepath}, now monitoring for new lines...{Colors.RESET}"
+                )
+            stat = os.stat(self.filepath)
+            self.inode = stat.st_ino
+            if self.last_n_lines > 0:
+                # Read last N lines
+                self.file.seek(0, 2)
+                file_size = self.file.tell()
+                block_size = 4096
+                blocks = []
+                lines_found = 0
+                pos = file_size
+                while pos > 0 and lines_found < self.last_n_lines:
+                    read_size = min(block_size, pos)
+                    pos -= read_size
+                    self.file.seek(pos)
+                    block = self.file.read(read_size)
+                    blocks.insert(0, block)
+                    lines_found = sum(b.count("\n") for b in blocks)
+                all_data = "".join(blocks)
+                last_lines = all_data.splitlines()[-self.last_n_lines :]
+                for line in last_lines:
+                    print(self.format_output(line))
+                self.file.seek(0, 2)
+            else:
+                self.file.seek(0, 2)
+            self.file_size = self.file.tell()
+            return True
+        except Exception as e:
+            print(f"{Colors.RED}[ERROR] Failed to open {self.filepath}: {e}{Colors.RESET}")
+            return False
+
+    def check_rotation(self):
+        """Check if file has been rotated"""
+        try:
+            stat = os.stat(self.filepath)
+            # detect rotation: inode changed or file size decreased
+            if stat.st_ino != self.inode or stat.st_size < self.file_size:
+                print(f"{Colors.CYAN}[INFO] Detected file rotation: {self.filepath}{Colors.RESET}")
+                if self.file:
+                    self.file.close()
+                return True
+            return False
+        except FileNotFoundError:
+            return True
+        except Exception as e:
+            print(
+                f"{Colors.RED}[ERROR] Error checking file rotation for {self.filepath}: {e}{Colors.RESET}"
+            )
+            return False
+
+    def read_new_lines(self) -> List[str]:
+        """Read newly added lines"""
+        lines = []
+        try:
+            if self.check_rotation():
+                if self.open_file():
+                    return []
+
+            while True:
+                line = self.file.readline()
+                if not line:
+                    break
+                lines.append(line.rstrip("\n"))
+
+            self.file_size = self.file.tell()
+        except Exception as e:
+            print(f"{Colors.RED}[ERROR] Error reading file {self.filepath}: {e}{Colors.RESET}")
+
+        return lines
+
+    def parse_log_level(self, line: str) -> int:
+        """Parse log level"""
+        match = re.match(r"^(DEBUG|INFO|WARNING|WARN|ERROR|CRITICAL|FATAL)\b", line)
+        if match:
+            level = match.group(1).upper()
+            return LOG_LEVELS.get(level, -1)
+        return -1
+
+    def should_display(self, line: str) -> bool:
+        """Determine if this log line should be displayed"""
+        priority = self.parse_log_level(line)
+
+        if priority >= self.min_level:
+            return True
+        return False
+
+    def format_output(self, line: str) -> str:
+        """Format output"""
+        filename = Path(self.filepath).name
+
+        if self.color_output:
+            return f"{Colors.BLUE}[{filename}]{Colors.RESET} {line}"
+        else:
+            return f"[{filename}] {line}"
+
+    def close(self):
+        """Close file"""
+        if self.file:
+            self.file.close()
+
+
+class LogManager:
+    """A manager to track multiple log files in real-time."""
+
+    def __init__(
+        self,
+        log_dir: str,
+        keyword: str | None = None,
+        min_level: str = "DEBUG",
+        scan_interval: float = 0.5,
+        last_n_lines: int = 0,
+        search_pattern: str | None = None,
+        color_output: bool = True,
+    ):
+        self.log_dir = Path(log_dir)
+        self.keyword = keyword
+        self.min_level_name = min_level.upper()
+        self.min_level_priority = LOG_LEVELS.get(self.min_level_name, 0)
+        self.scan_interval = scan_interval
+        self.color_output = color_output
+        self.search_pattern = search_pattern
+        self.trackers: Dict[str, LogFileTracker] = {}
+        self.running = False
+        self.last_n_lines = last_n_lines
+        self.lock = threading.Lock()
+
+    def find_log_files(self) -> Set[str]:
+        """Find matching log files"""
+        log_files = set()
+        try:
+            for file in self.log_dir.iterdir():
+                if file.is_file() and (self.keyword is None or self.keyword in file.name):
+                    log_files.add(str(file.resolve()))
+        except Exception as e:
+            print(f"{Colors.RED}[ERROR] Failed to scan directory {self.log_dir}: {e}{Colors.RESET}")
+
+        return log_files
+
+    def scan_new_files(self):
+        """Scan for newly added log files"""
+        current_files = self.find_log_files()
+
+        with self.lock:
+            for filepath in current_files:
+                if filepath not in self.trackers:
+                    tracker = LogFileTracker(
+                        filepath,
+                        self.min_level_priority,
+                        self.color_output,
+                        self.last_n_lines,
+                        self.search_pattern,
+                    )
+                    if tracker.open_file():
+                        self.trackers[filepath] = tracker
+                        print(f"{Colors.GREEN}[INFO] Started tracking: {filepath}{Colors.RESET}")
+
+            removed_files = set(self.trackers.keys()) - current_files
+            for filepath in removed_files:
+                self.trackers[filepath].close()
+                del self.trackers[filepath]
+                print(f"{Colors.YELLOW}[INFO] Stopped tracking: {filepath}{Colors.RESET}")
+
+    def monitor(self):
+        """Main monitoring loop"""
+        self.running = True
+        last_scan = 0
+
+        print(f"{Colors.CYAN}{'=' * 60}{Colors.RESET}")
+        print(f"{Colors.CYAN}Log monitoring started{Colors.RESET}")
+        print(f"{Colors.CYAN}Monitoring directory: {self.log_dir}{Colors.RESET}")
+        print(f"{Colors.CYAN}Keyword: {self.keyword}{Colors.RESET}")
+        print(f"{Colors.CYAN}Minimum level: {self.min_level_name}{Colors.RESET}")
+        print(f"{Colors.CYAN}{'=' * 60}{Colors.RESET}\n")
+
+        # Initial scan
+        self.scan_new_files()
+
+        try:
+            while self.running:
+                current_time = time.time()
+
+                # Periodically scan for new files (every 5 seconds)
+                if current_time - last_scan > 5:
+                    self.scan_new_files()
+                    last_scan = current_time
+
+                # Read new content from all files
+                with self.lock:
+                    for tracker in list(self.trackers.values()):
+                        lines = tracker.read_new_lines()
+                        for line in lines:
+                            if tracker.should_display(line):
+                                output = tracker.format_output(line)
+                                print(output)
+
+                time.sleep(self.scan_interval)
+
+        except KeyboardInterrupt:
+            print(f"\n{Colors.YELLOW}[INFO] Received stop signal, exiting...{Colors.RESET}")
+        finally:
+            self.stop()
+
+    def stop(self):
+        """Stop monitoring"""
+        self.running = False
+        with self.lock:
+            for tracker in self.trackers.values():
+                tracker.close()
+        print(f"{Colors.GREEN}[INFO] Monitoring stopped{Colors.RESET}")