Description
Training a FlexBERT model with the yamls/modernbert/modernbert-large-context-extension.yaml configuration consistently crashes at the same point due to a StopIteration exception raised inside the dataloader. The issue occurs after converting a custom Polish corpus to the MosaicML Dataset (MDS) format following the official ModernBERT instructions.
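For context, the conversion was done along these lines; this is only a minimal sketch using MosaicML's streaming library, where the "text" column name, input file, and output path are assumptions rather than the exact arguments of the repo's conversion utilities.

```python
# Minimal sketch, not the repo's converter: writing a plain-text Polish corpus
# to MDS shards with MosaicML's streaming library. The column schema, input
# file, and output directory below are assumptions.
from streaming import MDSWriter

columns = {"text": "str"}  # assumed schema
with MDSWriter(out="data/polish_mds/train", columns=columns, compression="zstd") as writer:
    with open("corpus_pl.txt", encoding="utf-8") as f:  # hypothetical input file
        for line in f:
            line = line.strip()
            if line:
                writer.write({"text": line})
```

Training was then launched with Composer (roughly composer main.py yamls/modernbert/modernbert-large-context-extension.yaml; the exact entry point may differ).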
To Reproduce
Steps to reproduce the behavior:
Clone the ModernBERT repository.
Prepare a custom Polish text corpus and convert it to MDS format using the provided dataset conversion utilities (as described in the repo’s documentation).
Launch training with Composer using the yamls/modernbert/modernbert-large-context-extension.yaml configuration.
The training process starts normally, but always crashes at the same iteration with the traceback shown below.
File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 701, in next
data = self._next_data()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 1438, in _next_data
raise StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/proot/src/sequence_packer.py", line 514, in _background_fill
item = next(self.iterator)
File "/proot/src/sequence_packer.py", line 251, in _generate_batches
retval = self._create_batch()
File "/proot/src/sequence_packer.py", line 450, in _create_batch
items_added = self._fill_buffer(items_to_fetch)
Expected behavior
Training should iterate through all samples in the converted MDS dataset without an unhandled StopIteration escaping the background fill thread. The dataloader should handle dataset exhaustion gracefully and signal epoch completion instead of crashing.
Environment
ModernBERT version: latest main branch (as of Oct 2025)
PyTorch: 2.6.0a0+ecf3bae40a
CUDA: 12.4
Datasets: 4.1.0
Python: 3.12
OS: Linux (Docker environment)
Additional context
The error occurs deterministically (always at the same iteration). It appears to be related to dataset exhaustion or missing synchronization between the dataloader and the _background_fill thread in sequence_packer.py: the underlying iterator is likely exhausted without StopIteration being handled when a custom MDS dataset is used. A rough sketch of such handling is shown below.
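For illustration only (this is not the repo's actual code, and BackgroundFiller is a hypothetical stand-in for the buffer-filling logic in sequence_packer.py), the background thread could treat StopIteration as a normal end-of-data signal instead of letting it escape the worker thread:

```python
# Illustrative sketch only -- not the actual sequence_packer.py implementation.
import queue
import threading

class BackgroundFiller:
    """Hypothetical stand-in for the buffer-filling thread in sequence_packer.py."""

    def __init__(self, iterator, maxsize=64):
        self.iterator = iterator
        self.buffer = queue.Queue(maxsize=maxsize)
        self.exhausted = threading.Event()

    def _background_fill(self):
        while not self.exhausted.is_set():
            try:
                item = next(self.iterator)
            except StopIteration:
                # Dataset is exhausted: mark end of epoch instead of crashing the thread.
                self.exhausted.set()
                self.buffer.put(None)  # sentinel so the consumer can detect end of epoch
                break
            self.buffer.put(item)
```

With a pattern like this, the consuming side can check the exhausted flag (or the sentinel) and finish the epoch cleanly rather than surfacing the exception.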