Avoid retesting non-viable chunk sizes in tuner #212

Open
GMNGeoffrey wants to merge 2 commits into aqlaboratory:main from GMNGeoffrey:chunk-tuner-fix-bin-search

Conversation

@GMNGeoffrey
Contributor

GMNGeoffrey commented May 5, 2026

Summary
The chunk size tuner performs a binary search over chunk sizes, but the existing algorithm tracked only the lower bound of the search, so it unnecessarily re-tested chunk sizes that had already been proven non-viable.

Changes

  • This commit adds more conventional hi/lo tracking to the chunk size search as well as tests for this failure mode.
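For readers unfamiliar with the pattern, here is a minimal sketch of what hi/lo tracking looks like in a search for the largest viable chunk size. It is not the tuner's code: the function name, the `is_viable` callback, and the assumptions that the smallest candidate is viable and that viability is monotone are all illustrative.

```python
from typing import Callable, Sequence


def largest_viable_chunk_size(
    candidates: Sequence[int],
    is_viable: Callable[[int], bool],
) -> int:
    """Return the largest viable candidate using hi/lo binary search.

    Assumptions (illustrative, not taken from the tuner):
      * candidates is sorted ascending and non-empty,
      * candidates[0] is viable and is never probed,
      * viability is monotone: if a size fails, every larger size fails.
    """
    lo = 0                # largest index known (or assumed) viable
    hi = len(candidates)  # smallest index known non-viable; len() means "none yet"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_viable(candidates[mid]):
            lo = mid  # everything at or below mid is viable
        else:
            hi = mid  # everything at or above mid is non-viable
    return candidates[lo]


# Usage: a stand-in probe that "fits in memory" up to size 48 and records calls.
probed = []


def fits(size: int) -> bool:
    probed.append(size)
    return size <= 48


best = largest_viable_chunk_size([2**k for k in range(10)], fits)
print(best, probed)  # 32 [32, 128, 64]
```

Because every probe lands strictly between `lo` and `hi`, no size is tested twice and nothing at or above a known failure is tested again.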

Related Issues
Fixes #211

Testing
Added unit tests for this case and confirmed they failed with the old implementation.
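As a rough illustration of how such a test can detect the failure mode, the sketch below records every probed size and flags any probe at or above a size that had already failed. The `lower_bound_only_search` function is a hypothetical stand-in written for this example (it remembers only the best viable index); the repository's previous implementation and the tests in this PR may differ.

```python
from typing import Callable, List, Sequence, Tuple


def lower_bound_only_search(
    candidates: Sequence[int], is_viable: Callable[[int], bool]
) -> int:
    """Hypothetical search that remembers only the best viable index."""
    best = 0
    i = len(candidates) - 1
    while i > best:
        if is_viable(candidates[i]):
            best = i
            i = (i + len(candidates) - 1) // 2  # jumps back toward the top
        else:
            i = (best + i) // 2
    return candidates[best]


def make_probe(limit: int) -> Tuple[Callable[[int], bool], List[int]]:
    """Return a fake viability check (size <= limit) plus its call log."""
    probed: List[int] = []

    def is_viable(size: int) -> bool:
        probed.append(size)
        return size <= limit

    return is_viable, probed


def wasted_probes(probed: Sequence[int], limit: int) -> List[int]:
    """Probes made at or above a size that had already been seen to fail."""
    smallest_failure = None
    wasted = []
    for size in probed:
        if smallest_failure is not None and size >= smallest_failure:
            wasted.append(size)
        if size > limit and (smallest_failure is None or size < smallest_failure):
            smallest_failure = size
    return wasted


is_viable, probed = make_probe(limit=48)
result = lower_bound_only_search([2**k for k in range(10)], is_viable)
print(result, probed, wasted_probes(probed, 48))
# 32 [512, 16, 64, 32, 128, 64] [128, 64]
```

A test for the fixed search would assert that `wasted_probes` returns an empty list; with the lower-bound-only stand-in it reports 128 (probed after 64 had failed) and the repeated 64.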

Other Notes
Based on #207 (to avoid issues with the +4 on chunk sizes). Only the last commit is part of this PR.

- Avoid weird addition of 4 to power-of-two chunk sizes. This was added
  in aqlaboratory@a9a12890d without
  explanation. We can hypothesize that it was related to adding 4 to an
  input dimension in trace_utils.py (trying to get a test case to fit in
  one chunk?), but that file was long ago deleted. This just looks like
  a bug and makes us hit unhappy paths all over the place. Fixes
  aqlaboratory#203
- Enable chunking for AuxiliaryHeadsAllAtom pairformer embedding when
  using optimized kernels. Without chunking, this is the first call to
  cause OOMs because it has `diffusion_samples*sequence_length`
  batches. Chunking gets turned off in prediction_heads.py when the
  batch size is > 1 and optimized kernels are in use: cross-sample
  chunking requires expanding out the pair bias, but the optimized
  kernels all require it to have size 1 in the second dimension with
  implicit broadcasting. So we turn on `apply_per_sample` when
  optimized kernels are in use. This splits the batch dimension when
  it is > 1, which avoids this problematic path, and then we can do
  normal chunking for the rest if it's still too large. We could do
  something more elaborate (see suggestions in linked issue), but this
  is an improvement for now. Fixes aqlaboratory#206
The chunk size tuner performs a binary search over chunk sizes, but the
existing algorithm tracked only the lower bound of the search, so it
unnecessarily re-tested chunk sizes that had already been proven
non-viable. This commit adds more conventional hi/lo tracking to the
chunk size search as well as tests for this failure mode.
@GMNGeoffrey
Contributor Author

@christinaflo PTAL :-)

@christinaflo
Collaborator

Hi @GMNGeoffrey, sorry I was out the last week, I'll take a look at these chunking PRs this week!

Development

Successfully merging this pull request may close these issues.

[BUG] Chunk size tuner re-tests non-viable chunk sizes
