
[MiniCPM-o 4.5] LayerDrop appears disabled during training because 'encoder_layerdrop' is set to '0.0' #1118


@standardwish

Hi, thank you for releasing MiniCPM-o-4_5 and making it available as open source.

I was looking into the MiniCPM-o-4_5 code and noticed that LayerDrop seems to be effectively disabled in the released checkpoint config.

In modeling_minicpmo.py, MiniCPMWhisperEncoder inherits from transformers.models.whisper.modeling_whisper.WhisperEncoder:

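```python
from transformers import WhisperConfig
from transformers.models.whisper.modeling_whisper import WhisperEncoder

class MiniCPMWhisperEncoder(WhisperEncoder):
    def __init__(self, config: WhisperConfig):
        super().__init__(config)
```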

In Hugging Face Transformers, WhisperEncoder.__init__ sets:

```python
self.layerdrop = config.encoder_layerdrop
```

Then during the forward pass, LayerDrop is only applied in training mode:

```python
# inside the loop over encoder layers
to_drop = False
if self.training:
    dropout_probability = torch.rand([])
    if dropout_probability < self.layerdrop:  # skip the layer
        to_drop = True
```

However, in the released config.json, I see:

```json
"encoder_layerdrop": 0.0,
"decoder_layerdrop": 0.0
```

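So even when the model is in training mode, self.layerdrop is 0.0. Since torch.rand([]) samples uniformly from [0, 1), the check dropout_probability < 0.0 can never be true, which means no encoder layers will actually be skipped.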
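To make this concrete, here is a minimal stdlib-only sketch that mirrors the per-layer check quoted above. It is not the Transformers code: layers_to_run is a hypothetical helper, Python's random.random() stands in for torch.rand([]) (both draw uniformly from [0, 1)), and the layer count of 12 is arbitrary, not MiniCPM-o's actual encoder depth.

```python
import random


def layers_to_run(num_layers: int, layerdrop: float, training: bool,
                  rng: random.Random) -> list[int]:
    """Hypothetical helper: return indices of layers that execute in one pass.

    Mirrors the quoted WhisperEncoder logic: a layer is skipped only in
    training mode, when a uniform draw in [0, 1) falls below `layerdrop`.
    """
    kept = []
    for idx in range(num_layers):
        to_drop = False
        if training:
            dropout_probability = rng.random()  # uniform in [0.0, 1.0)
            if dropout_probability < layerdrop:
                to_drop = True
        if not to_drop:
            kept.append(idx)
    return kept


rng = random.Random(0)
# layerdrop = 0.0: `p < 0.0` is never true, so all layers run even in training.
print(len(layers_to_run(12, 0.0, training=True, rng=rng)))
# Eval mode: layerdrop is ignored regardless of its value.
print(len(layers_to_run(12, 0.5, training=False, rng=rng)))
```

Both calls report all 12 layers. Only a positive layerdrop in training mode can ever remove a layer.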

My questions are:

  1. Is this intentional for MiniCPM-o-4_5?
  2. If users fine-tune MiniCPM-o-4_5, should encoder_layerdrop remain 0.0, or is there a recommended non-zero value?
  3. Was LayerDrop used during the original training, or is this code path inherited from Whisper but not used for MiniCPM-o-4_5?
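Related to question 2: because WhisperEncoder.__init__ copies config.encoder_layerdrop into self.layerdrop, any non-zero value would have to be in the config before the encoder is constructed (or be assigned to the module's layerdrop attribute afterwards). Below is a hedged sketch for patching a loaded config dict. set_layerdrop is a hypothetical helper, the "audio_config" nesting in the usage example is an assumption about where the key lives, and 0.1 is an arbitrary illustrative value, not a recommendation.

```python
import json
from typing import Any


def set_layerdrop(cfg: Any, value: float, key: str = "encoder_layerdrop") -> int:
    """Hypothetical helper: set every `key` in a nested config; return the count."""
    count = 0
    if isinstance(cfg, dict):
        for k, v in cfg.items():
            if k == key:
                cfg[k] = value  # overwriting an existing key is safe during iteration
                count += 1
            else:
                count += set_layerdrop(v, value, key)
    elif isinstance(cfg, list):
        for item in cfg:
            count += set_layerdrop(item, value, key)
    return count


# Assumed layout, for illustration only.
raw = '{"audio_config": {"encoder_layerdrop": 0.0, "decoder_layerdrop": 0.0}}'
cfg = json.loads(raw)
print(set_layerdrop(cfg, 0.1), cfg["audio_config"]["encoder_layerdrop"])
```

The recursive walk avoids hard-coding where the key sits in the checkpoint's config. decoder_layerdrop is left untouched unless passed as key.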

I just wanted to confirm whether the current behavior is expected or whether the released config should use a different LayerDrop value.

Thanks!
