-
Notifications
You must be signed in to change notification settings - Fork 31.4k
Fix BLT training_ci overfit test #42685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Quick update:
The remaining CI failures are in
These come from the assisted decoding tests using the new |
|
run-slow: blt |
|
💔 This comment contains |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: blt |
I found it weird that the generation is not working with
As for lowering the |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
💔 This comment contains |
|
edit (more info): The so |
What does this PR do?
This PR fixes the BLT entry in the new training CI by making the tiny BLT model both:
In the current setup, the tiny BLT config used in
BltModelTest::test_training_overfitshows:but the test failed because:
use_cache=Falsedid.This PR makes two changes:
BLT config
use_cacheargument toBltConfig.__init__with defaultFalse, and forward it intosuper().__init__.use_cache=False(matching the recommended generation settings in the BLT model card), while still respecting any explicituse_cachevalue in existing configs.model.generate(...)uses the non-cache path by default for BLT, which fixes the generation mismatch in the training overfit test.BLT tests
BltModelTest(intests/models/blt/test_modeling_blt.py), override the training thresholds used byTrainingTesterMixin:training_loss_reduction_threshold = 0.9,training_grad_norm_reduction_threshold = 0.8for BLT only.test_training_overfit, so the sharedtest_training_overfitfromTrainingTesterMixinnow runs with BLT thresholds.0.8is a stable but still strict threshold, while the loss overfits very strongly (~95% reduction).Verification (local):
Command:
pytest tests/models/blt/test_modeling_blt.py::BltModelTest::test_training_overfit -s -vvResults:
Fixes #42629