StopIteration error during training with modernbert-large-context-extension.yaml configuration #249

@mmichall

Description

While training a FlexBERT model using the yamls/modernbert/modernbert-large-context-extension.yaml configuration, the training process consistently crashes at the same point due to a StopIteration exception raised inside the dataloader. The issue occurs after converting a custom Polish corpus to the MosaicML Dataset (MDS) format following the official ModernBERT instructions.

To Reproduce

Steps to reproduce the behavior:

Clone the ModernBERT repository.

Prepare a custom Polish text corpus and convert it to MDS format using the provided dataset conversion utilities (as described in the repo’s documentation).
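The conversion step was along these lines (a minimal sketch using mosaicml-streaming's MDSWriter; the paths and the single-text-column layout are illustrative, not the exact script from the repo):

```python
# Minimal sketch of the MDS conversion (illustrative paths and columns).
from streaming import MDSWriter

# Assumption: one Polish document per line in a plain-text file.
columns = {"text": "str"}

with MDSWriter(out="data/polish-mds/train", columns=columns, compression="zstd") as writer:
    with open("polish_corpus.txt", encoding="utf-8") as f:
        for line in f:
            text = line.strip()
            if text:
                writer.write({"text": text})
```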

Launch training with Composer using the yamls/modernbert/modernbert-large-context-extension.yaml configuration (launch command sketched below).
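The launch looked roughly like this (assuming the repo's standard main.py Composer entry point; the exact entry script and flags are an assumption):

```bash
composer main.py yamls/modernbert/modernbert-large-context-extension.yaml
```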

The training process starts normally, but always crashes at the same iteration with the traceback shown below.

File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 701, in next
data = self._next_data()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/data/dataloader.py", line 1438, in _next_data
raise StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1010, in run
self._target(*self._args, **self._kwargs)
File "/proot/src/sequence_packer.py", line 514, in _background_fill
item = next(self.iterator)
File "/proot/src/sequence_packer.py", line 251, in _generate_batches
retval = self._create_batch()
File "/proot/src/sequence_packer.py", line 450, in _create_batch
items_added = self._fill_buffer(items_to_fetch)

Expected behavior

Training should iterate through all samples in the converted MDS dataset without triggering a StopIteration in the background dataloader thread. The dataloader should handle dataset exhaustion gracefully and signal epoch completion rather than crash.
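For illustration only, here is a minimal sketch of a background-fill loop that treats exhaustion as an epoch-end signal instead of letting StopIteration kill the thread. This is not the actual sequence_packer.py code; the class, sentinel, and queue wiring are assumptions based on the _background_fill / self.iterator names in the traceback:

```python
import queue
import threading

_EPOCH_END = object()  # sentinel: underlying iterator exhausted, epoch complete

class PrefetchingIterator:
    """Hypothetical stand-in for the packer's background-fill thread."""

    def __init__(self, iterable, maxsize=64):
        self.iterator = iter(iterable)
        self.queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._background_fill, daemon=True)
        self._thread.start()

    def _background_fill(self):
        while True:
            try:
                item = next(self.iterator)
            except StopIteration:
                # Dataset exhausted: enqueue the sentinel instead of letting
                # the exception escape and crash the background thread.
                self.queue.put(_EPOCH_END)
                return
            self.queue.put(item)

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()
        if item is _EPOCH_END:
            # Re-raise in the consumer thread, where the training loop
            # can treat it as a normal end of epoch.
            raise StopIteration
        return item
```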

Environment

ModernBERT version: latest main branch (as of Oct 2025)
PyTorch: 2.6.0a0+ecf3bae40a
CUDA: 12.4
Datasets: 4.1.0
Python: 3.12
OS: Linux (Docker environment)

Additional context

The error occurs deterministically (always at the same iteration), which points to dataset exhaustion or improper synchronization between the dataloader and the _background_fill thread in sequence_packer.py. Most likely the iterator is being exhausted without StopIteration being handled when using custom MDS datasets; a quick way to test this is sketched below.
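If exhaustion is the cause, the crash iteration should line up with the dataset length. A minimal check using mosaicml-streaming's StreamingDataset (the local path is a placeholder for the converted shards):

```python
from streaming import StreamingDataset

# Placeholder path to the converted MDS shards.
ds = StreamingDataset(local="data/polish-mds/train", shuffle=False)
print(f"samples in MDS dataset: {len(ds)}")
# If (crash iteration x global batch size) is close to this number,
# the crash coincides with dataset exhaustion.
```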
