feat: eval results by Ki-Seki · Pull Request #23 · SculptAI/GIMBench

Ki-Seki · 2026-01-07T15:38:27Z

No description provided.

Ki-Seki · 2026-01-11T09:14:45Z

Note: Eval Bench Sizes

#(Sculpt-AI/GIM-SFT/*/train) = 3157829
#(Idavidrein/gpqa/gpqa_diamond/train) = 198
#(openlifescienceai/medmcqa/validation) = 4183
#(TIGER-Lab/MMLU-Pro/test) = 12102
#(allenai/qasc/validation) = 920
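
The note above can be turned into a small lookup table. The following is a minimal sketch, not part of the PR: it assumes each `#(...)` entry reads as `<owner>/<dataset>[/<config>]/<split>` on the Hugging Face Hub, and the names `parse_sizes` and `split_path` are hypothetical helpers introduced only for illustration.

```python
import re

# The five lines from the note, copied verbatim.
NOTE = """\
#(Sculpt-AI/GIM-SFT/*/train) = 3157829
#(Idavidrein/gpqa/gpqa_diamond/train) = 198
#(openlifescienceai/medmcqa/validation) = 4183
#(TIGER-Lab/MMLU-Pro/test) = 12102
#(allenai/qasc/validation) = 920
"""

# Matches "#(<path>) = <count>".
LINE_RE = re.compile(r"#\((?P<path>[^)]+)\)\s*=\s*(?P<count>\d+)")


def parse_sizes(note):
    """Return {path: example_count} for every '#(path) = n' entry."""
    return {m.group("path"): int(m.group("count")) for m in LINE_RE.finditer(note)}


def split_path(path):
    """Split a path into (repo, config, split).

    Assumption: the first two components are the repo id, the last one is
    the split, and an optional component in between is the config name.
    """
    parts = path.split("/")
    repo = "/".join(parts[:2])
    config = parts[2] if len(parts) == 4 else None
    return repo, config, parts[-1]


sizes = parse_sizes(NOTE)
for path, count in sorted(sizes.items(), key=lambda kv: kv[1]):
    repo, config, split = split_path(path)
    print(f"{repo:32} config={config!s:14} split={split:11} n={count}")
```

If that reading of the paths is right, each `(repo, config, split)` tuple lines up with the `path`, `name` and `split` arguments of `datasets.load_dataset`, which is one way the counts could be re-checked. The `*` config is presumably shorthand for all configs rather than a literal config name.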

* chore: backup results
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* chore: remove gim prompt about gim models
* chore: update api model eval scripts
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* chore: fix api model eval scripts
* chore: fix api model eval scripts
* chore: ignore eval.log.* files (#76)
* feat: add upper bound to auto reason_budget (#78)
* chore: backup
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shichao Song <60967965+Ki-Seki@users.noreply.github.com>

feat: add some old eval results

9c54e0d

Copilot AI review requested due to automatic review settings January 7, 2026 15:38

Ki-Seki added the do not merge label Jan 7, 2026

pre-commit-ci bot and others added 4 commits January 7, 2026 15:41

[pre-commit.ci] auto fixes from pre-commit.com hooks

f5b2e57

for more information, see https://pre-commit.ci

Merge branch 'main' into feat/eval-results

a5a769a

feat: add Match @ Qwen3-1.7B

0f879cb

[pre-commit.ci] auto fixes from pre-commit.com hooks

bc7f050

for more information, see https://pre-commit.ci

Ki-Seki and others added 9 commits January 11, 2026 17:15

Merge branch 'main' into feat/eval-results

7d8f4cc

feat: add evaluation script

ec310b1

[pre-commit.ci] auto fixes from pre-commit.com hooks

11cd71e

for more information, see https://pre-commit.ci

chore: backup

7c8f021

chore: backup

b27f382

feat: add evaluation and README scripts for KDD experiments

76dedf3

Merge branch 'main' into feat/eval-results

c05a661

fix: update model names in evaluation scripts for consistency

29c4c1f

chore: backup

c9eef17

Ki-Seki force-pushed the feat/eval-results branch from 3dd512e to c9eef17 January 21, 2026 14:27

pre-commit-ci bot and others added 11 commits January 21, 2026 14:30

[pre-commit.ci] auto fixes from pre-commit.com hooks

2950741

for more information, see https://pre-commit.ci

Merge branch 'main' into feat/eval-results

4167eb9

feat: update eval script to include model downloads and API configuration

d894217

chore: backup only

dc3dc95

Merge branch 'main' into feat/eval-results

266e570

feat: add evaluation scripts and README for KDD language modeling experiments

7ac1037

chore: backup results

43008fe

Merge branch 'main' into feat/eval-results

9c6a617

refactor: update evaluation script to use new module structure

abc0c28

feat: add new models to evaluation script

030ecdf

chore: backup results

6187512

Ki-Seki added 18 commits January 26, 2026 13:29

chore: remove some useless evals

25b7081

chore: backup

3b047d1

Merge branch 'main' into feat/eval-results

18976ec

chore: backup

f8dcd18

Merge branch 'main' into feat/eval-results

9c89999

chore: backup

f1bcb31

Merge branch 'main' into feat/eval-results

c6a90bf

feat: add evaluation script and update .gitignore for privacy

dfc676f

backup

b56b002

chore: backup

abc9b0e

Merge branch 'main' into feat/eval-results

37c71d7

Merge branch 'main' into feat/eval-results

e44b746

feat: add eval

20f32f9

feat: add API key and base URL to eval script

2970920

Merge branch 'main' into feat/eval-results

fd18d08

chore: backup

fa7f28e

Merge branch 'main' into feat/eval-results

ff9192d

chore: backup

d7ec2ea

Ki-Seki marked this pull request as draft February 6, 2026 03:56

Ki-Seki and others added 5 commits February 8, 2026 19:36

chore: backup

b5cc868

Merge branch 'main' into feat/eval-results

1a03fca

Delete results/260128-kdd-expt-3 directory

59ba658

Merge branch 'main' into feat/eval-results

1378edc

feat: eval results #23
Ki-Seki wants to merge 48 commits into main from feat/eval-results

2 participants
