Error with loading model checkpoint #12399
Hi everyone. I was recently running a Lightning model and saved a checkpoint to store the intermediate results. When I try to load the checkpoint, I get an error that the positional arguments used to initialize the LightningModule are not present. This wouldn't be a big deal, but one of those positional arguments is the encoder (used for BarlowTwins training). My concern is that if I load the checkpoint with an encoder initialized with fresh starting weights, it will overwrite the weight parameters stored in the checkpoint. See the error log and a block of code below. Any suggestions on how I can appropriately load this stored model to resume training? The original model was loaded with:
hey @dmandair !
did you call `self.save_hyperparameters()` inside your `LM.__init__`? Otherwise the hyperparameters won't be saved inside the checkpoint, and you might need to provide them again using `LMModel.load_from_checkpoint(..., encoder=encoder, encoder_out_dim=encoder_out_dim, ...)`.

Also note that if you are passing an `nn.Module` into your LM and calling `self.save_hyperparameters()`, it will save that module inside your hparams as well, which is not a good thing: `nn.Module`s are already saved inside the checkpoint's `state_dict`, so this might create issues for you. Ideally, you should ignore them using `self.save_hyperparameters(ignore=['encoder'])`. Check out this PR: #12068
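A minimal sketch of this pattern, assuming hypothetical names (`LMModel`, `encoder_out_dim`, the projector head, and the checkpoint path) based on the call above rather than the actual module:

```python
import torch.nn as nn
import pytorch_lightning as pl


class LMModel(pl.LightningModule):
    def __init__(self, encoder: nn.Module, encoder_out_dim: int, lr: float = 1e-3):
        super().__init__()
        # Save scalar hyperparameters (encoder_out_dim, lr) but skip the encoder:
        # its weights already end up in the checkpoint's state_dict because it is
        # registered as a submodule below.
        self.save_hyperparameters(ignore=["encoder"])
        self.encoder = encoder
        self.projector = nn.Linear(encoder_out_dim, 128)

    # training_step / configure_optimizers omitted for brevity


# Re-create the encoder architecture (fresh weights are fine) and pass it back in,
# since it was excluded from the saved hyperparameters. load_from_checkpoint first
# builds the module with these arguments and then loads the checkpoint's state_dict,
# so the trained encoder weights replace the freshly initialized ones.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 512))  # placeholder encoder
model = LMModel.load_from_checkpoint("path/to/last.ckpt", encoder=encoder)
```

With the encoder excluded from hparams, values like `encoder_out_dim` are restored from the checkpoint automatically, so only the encoder object itself needs to be provided again when loading.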