Hi, thank you for releasing MiniCPM-o-4_5 and making it available as open source.
I was looking into the MiniCPM-o-4_5 code and noticed that LayerDrop seems to be effectively disabled in the released checkpoint config.
In modeling_minicpmo.py, MiniCPMWhisperEncoder inherits from transformers.models.whisper.modeling_whisper.WhisperEncoder:
    class MiniCPMWhisperEncoder(WhisperEncoder):
        def __init__(self, config: WhisperConfig):
            super().__init__(config)
In Hugging Face Transformers, WhisperEncoder.__init__ sets:
    self.layerdrop = config.encoder_layerdrop
Then during the forward pass, LayerDrop is only applied in training mode:
    if self.training:
        dropout_probability = torch.rand([])
        if dropout_probability < self.layerdrop:
            to_drop = True
However, in the released config.json, I see:
    "encoder_layerdrop": 0.0,
    "decoder_layerdrop": 0.0
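So even when the model is in training mode, self.layerdrop is 0.0. Since torch.rand([]) samples from [0, 1), the check dropout_probability < self.layerdrop can never be true, which means no encoder layers will actually be skipped.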
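To make the point concrete, here is a minimal sketch that mimics the gate quoted above using only the Python standard library. It is not the Transformers implementation: random.random() stands in for torch.rand([]) (both sample from [0, 1)), and the helper name count_skipped_layers and the layer count of 24 are hypothetical values chosen for illustration only.

```python
import random


def count_skipped_layers(num_layers: int, layerdrop: float,
                         training: bool, rng: random.Random) -> int:
    """Count how many layers the LayerDrop gate would skip in one forward pass."""
    skipped = 0
    for _ in range(num_layers):
        to_drop = False
        if training:
            # Stand-in for torch.rand([]): a uniform sample in [0.0, 1.0).
            dropout_probability = rng.random()
            if dropout_probability < layerdrop:
                to_drop = True
        if to_drop:
            skipped += 1
    return skipped


rng = random.Random(0)  # fixed seed so the run is repeatable
num_layers = 24         # hypothetical encoder depth, for illustration only
passes = 1000

# Training mode with the released value: the gate can never fire.
skipped_at_zero = sum(
    count_skipped_layers(num_layers, 0.0, True, rng) for _ in range(passes)
)
# Training mode with a non-zero value: some layers get skipped.
skipped_at_tenth = sum(
    count_skipped_layers(num_layers, 0.1, True, rng) for _ in range(passes)
)
# Eval mode: the gate is bypassed regardless of the configured value.
skipped_in_eval = sum(
    count_skipped_layers(num_layers, 0.1, False, rng) for _ in range(passes)
)

print(skipped_at_zero, skipped_at_tenth, skipped_in_eval)
```

With layerdrop at 0.0 the skipped count stays at zero in training mode, the same as in eval mode, so the inherited code path only has an effect if encoder_layerdrop is set to a non-zero value.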
My questions are:
- Is this intentional for MiniCPM-o-4_5?
- If users fine-tune MiniCPM-o-4_5, should encoder_layerdrop remain 0.0, or is there a recommended non-zero value?
- Was LayerDrop used during the original training, or is this code path inherited from Whisper but not used for MiniCPM-o-4_5?
I just wanted to confirm whether the current behavior is expected or whether the released config should use a different LayerDrop value.
Thanks!