feat: eval results #23

Draft
Ki-Seki wants to merge 48 commits into main from feat/eval-results

Ki-Seki commented Jan 7, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 7, 2026 15:38

This comment was marked as outdated.

Ki-Seki commented Jan 11, 2026

Note: Eval Bench Sizes

#(Sculpt-AI/GIM-SFT/*/train) = 3157829
#(Idavidrein/gpqa/gpqa_diamond/train) = 198
#(openlifescienceai/medmcqa/validation) = 4183
#(TIGER-Lab/MMLU-Pro/test) = 12102
#(allenai/qasc/validation) = 920

Ki-Seki marked this pull request as draft February 6, 2026 03:56
Ki-Seki and others added 5 commits February 8, 2026 19:36
* chore: backup results

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: remove gim prompt about gim models

* chore: update api model eval scripts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: fix api model eval scripts

* chore: fix api model eval scripts

* chore: ignore eval.log.* files (#76)

* feat: add upper bound to auto reason_budget (#78)

* chore: backup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shichao Song <60967965+Ki-Seki@users.noreply.github.com>

2 participants