
evals: avoid calling torch.compile when not needed#618

Draft
dsocolobsky wants to merge 4 commits into main from dy/evals-torchtitan-no-compile

Conversation


@dsocolobsky dsocolobsky commented Mar 4, 2026

This improves performance: we can avoid calling torch.compile in evals, since it precompiles length-specific kernels that are never reused because every input has a different sequence length.

WIP: the performance improvements might not be real; still running tests.

Closes #588
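For context, the kind of change described above could look like the sketch below. This is not the actual PR diff; `maybe_compile` and `use_compile` are hypothetical names, and `dynamic=True` is a real `torch.compile` option for shape-polymorphic kernels that could be an alternative to skipping compilation entirely.

```python
# Hedged sketch (not the actual diff): skip torch.compile on the eval path.
# `maybe_compile` and `use_compile` are hypothetical names for illustration.

def maybe_compile(model, use_compile: bool, dynamic: bool = False):
    """Wrap `model` with torch.compile only when requested.

    Eval inputs vary in sequence length, so shape-specialized compilation
    recompiles per length; returning the eager model avoids that cost.
    `dynamic=True` instead asks torch.compile for shape-polymorphic kernels.
    """
    if not use_compile:
        return model  # eager mode: no per-length recompiles
    import torch  # lazy import: the eager path does not need torch here
    return torch.compile(model, dynamic=dynamic)
```

In an eval loop one would then call `maybe_compile(model, use_compile=False)` and keep compilation for training only.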

dsocolobsky and others added 3 commits March 4, 2026 17:06
This improves performance: we can avoid calling torch.compile in evals, since it precompiles length-specific kernels that are never reused because every input has a different sequence length.
@pefontana
Contributor

@dsocolobsky
Okay, so I think the benchmarks you shared are not really accurate.
Yes, the average time is slightly better on dy/evals-torchtitan-no-compile vs main, but I think that's due to the cache not being loaded yet (model or dataset tasks). Take a look at the range:

main: Range (min … max): 106.472 s … 159.700 s
dy/evals-torchtitan-no-compile: Range (min … max): 107.198 s … 111.477 s

So in the best case (when the model and tasks are cached), the evaluations seem to take the same time.
I ran benchmarks and the eval times don't seem to improve.

### main no python full eval
- time ./target/release/examples/evaluate --model NousResearch/Meta-Llama-3.1-8B --data-parallelism 8 --tasks arc_easy

ARC-Easy: {"acc_norm": 0.8055555555555556, "acc_uncond": 0.7394781144781145, "acc": 0.8127104377104377}

real	2m12.354s
user	14m4.919s
sys	0m52.347s


### main python full eval
- time ./target/release/examples/evaluate --model NousResearch/Meta-Llama-3.1-8B --python --python-arch Torchtitan --data-parallelism 8 --tasks arc_easy

I didn't finish it because it was taking so long, but I estimate ~30 min.

### main python limit 100
- time ./target/release/examples/evaluate --model NousResearch/Meta-Llama-3.1-8B --python --python-arch Torchtitan --data-parallelism 8 --tasks arc_easy --limit 100
real	2m10.221s
user	4m52.856s
sys	0m13.055s

### dy/evals-torchtitan-no-compile python limit 100
- time ./target/release/examples/evaluate --model NousResearch/Meta-Llama-3.1-8B --python --python-arch Torchtitan --data-parallelism 8 --tasks arc_easy --limit 100
real	2m13.175s

I would expect results similar to the no-python run.
Also, this is not the best use case for hyperfine; a plain time measurement is fine.
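If hyperfine is used anyway despite the long runtimes, a warmup run would keep the cold model/dataset cache out of the measured range. A hedged sketch, reusing the command from this thread (`--warmup` and `--runs` are real hyperfine flags):

```shell
# Discard one cold-cache run before measuring, then take 3 timed runs.
hyperfine --warmup 1 --runs 3 \
  './target/release/examples/evaluate \
     --model NousResearch/Meta-Llama-3.1-8B \
     --python --python-arch Torchtitan \
     --data-parallelism 8 \
     --tasks arc_easy --limit 100'
```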


@pefontana pefontana left a comment


check comment

@dsocolobsky
Contributor Author

Okay, yeah. I've been trying with larger models and more documents processed, and sometimes I seem to get a speedup but sometimes I don't, so I don't think that's enough to claim a performance improvement. Moreover, I added some logs, and apparently I do still hit compilation code sometimes.

Will leave the PR as a draft for a while to continue testing a few things.
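A toy model of why variable-length inputs keep hitting the compiler: shape-specialized compilation behaves like a cache keyed by sequence length, so nearly every padding-free eval batch is a miss. (To confirm real recompiles, PyTorch's `TORCH_LOGS=recompiles` environment variable logs them, which may be what the added logs showed.) The sketch below is pure Python, not PyTorch itself:

```python
# Toy model (not PyTorch internals) of shape-specialized compilation:
# a cache keyed by sequence length, where each unseen length "recompiles".

def count_recompiles(seq_lens):
    compiled = set()   # lengths we already have a specialized kernel for
    recompiles = 0
    for n in seq_lens:
        if n not in compiled:
            compiled.add(n)  # compile a kernel specialized to length n
            recompiles += 1
    return recompiles

# Mostly-unique eval lengths: nearly every batch triggers a recompile.
assert count_recompiles([128, 131, 256, 131, 301]) == 4
```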

Development

Successfully merging this pull request may close these issues.

Torchtitan evals are slow because torch.compile recompiles for every unique input sequence length.
