[Feat] Add support for Dr.GRPO algorithm. Provide a better format reward function for countdown task. #1
Open
Bonjir wants to merge 3 commits into