Description
Training a FlexBERT model with the yamls/modernbert/modernbert-large-context-extension.yaml configuration consistently crashes at the same point due to a StopIteration exception raised inside the dataloader. The issue occurs after converting a custom Polish corpus to the MosaicML Dataset (MDS) format following the official ModernBERT instructions.
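For context, the conversion was done along these lines; this is only a minimal sketch using MosaicML's streaming library, where the "text" column name, input file, and output path are assumptions rather than the exact arguments of the repo's conversion utilities.

```python
# Minimal sketch, not the repo's converter: writing a plain-text Polish corpus
# to MDS shards with MosaicML's streaming library. The column schema, input
# file, and output directory below are assumptions.
from streaming import MDSWriter

columns = {"text": "str"}  # assumed schema
with MDSWriter(out="data/polish_mds/train", columns=columns, compression="zstd") as writer:
    with open("corpus_pl.txt", encoding="utf-8") as f:  # hypothetical input file
        for line in f:
            line = line.strip()
            if line:
                writer.write({"text": line})
```

Training was then launched with Composer (roughly composer main.py yamls/modernbert/modernbert-large-context-extension.yaml; the exact entry point may differ).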
To Reproduce
Steps to reproduce the behavior:
Clone the ModernBERT repository.
Prepare a custom Polish text corpus and convert it to MDS format using the provided dataset conversion utilities (as described in the repo’s documentation).
Launch training with Composer using the yamls/modernbert/modernbert-large-context-extension.yaml configuration.
The training process starts normally, but always crashes at the same iteration with the traceback shown below.
File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 701, in next
data = self._next_data()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 1438, in _next_data
raise StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/proot/src/sequence_packer.py", line 514, in _background_fill
item = next(self.iterator)
File "/proot/src/sequence_packer.py", line 251, in _generate_batches
retval = self._create_batch()
File "/proot/src/sequence_packer.py", line 450, in _create_batch
items_added = self._fill_buffer(items_to_fetch)
Expected behavior
Training should iterate through all samples in the converted MDS dataset without an unhandled StopIteration escaping the background fill thread. The dataloader should handle dataset exhaustion gracefully and signal epoch completion instead of crashing.
Environment
ModernBERT version: latest main branch (as of Oct 2025)
PyTorch: 2.6.0a0+ecf3bae40a
CUDA: 12.4
Datasets: 4.1.0
Python: 3.12
OS: Linux (Docker environment)
Additional context
The error occurs deterministically (always at the same iteration). It appears to be related to dataset exhaustion or missing synchronization between the dataloader and the _background_fill thread in sequence_packer.py: the underlying iterator is likely exhausted without StopIteration being handled when a custom MDS dataset is used. A rough sketch of such handling is shown below.
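For illustration only (this is not the repo's actual code, and BackgroundFiller is a hypothetical stand-in for the buffer-filling logic in sequence_packer.py), the background thread could treat StopIteration as a normal end-of-data signal instead of letting it escape the worker thread:

```python
# Illustrative sketch only -- not the actual sequence_packer.py implementation.
import queue
import threading

class BackgroundFiller:
    """Hypothetical stand-in for the buffer-filling thread in sequence_packer.py."""

    def __init__(self, iterator, maxsize=64):
        self.iterator = iterator
        self.buffer = queue.Queue(maxsize=maxsize)
        self.exhausted = threading.Event()

    def _background_fill(self):
        while not self.exhausted.is_set():
            try:
                item = next(self.iterator)
            except StopIteration:
                # Dataset is exhausted: mark end of epoch instead of crashing the thread.
                self.exhausted.set()
                self.buffer.put(None)  # sentinel so the consumer can detect end of epoch
                break
            self.buffer.put(item)
```

With a pattern like this, the consuming side can check the exhausted flag (or the sentinel) and finish the epoch cleanly rather than surfacing the exception.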