
RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 14.76 GiB total capacity; 13.54 GiB already allocated; 73.75 MiB free; 13.63 GiB reserved in total by PyTorch) #19


Description

@danielbellhv

I am new to PyTorch. I am using Google Colaboratory with the GPU accelerator.

Attempted Solutions (same error):

  • torch.cuda.empty_cache(), suggested here (see the sketch after this list).
  • torch.cuda.memory_summary(device=None, abbreviated=False), suggested here.
  • batch_size=1 for the dm instance.

Attempted Solutions (different error):

  • torch.cuda.clear_memory_allocated(), suggested here.
    AttributeError: module 'torch.cuda' has no attribute 'clear_memory_allocated'
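
For reference, a minimal sketch of the memory-inspection calls that do exist in torch.cuda (clear_memory_allocated is not one of them, which is why it raises AttributeError). The device index 0 is only an assumption for a single-GPU Colab runtime:

import torch

device = torch.device('cuda:0')  # assumes the single Colab GPU

# Release cached blocks held by the allocator (does not free live tensors)
torch.cuda.empty_cache()

# These correspond to the "already allocated" / "reserved" figures in the error message
allocated = torch.cuda.memory_allocated(device)
reserved = torch.cuda.memory_reserved(device)
print(f"allocated: {allocated / 2**20:.1f} MiB, reserved: {reserved / 2**20:.1f} MiB")

# Full per-allocator report
print(torch.cuda.memory_summary(device=device, abbreviated=False))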

Code:

trainer = pl.Trainer.from_argparse_args(args, callbacks=[checkpoint_callback], logger=loggers)
#torch.cuda.empty_cache()
#torch.cuda.memory_summary(device=None, abbreviated=False)
#torch.cuda.clear_memory_allocated()
trainer.fit(model, dm) # ERROR
model_file = os.path.join(args.modeldir, 'last.ckpt')
trainer.save_checkpoint(model_file, weights_only=True)
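
If the goal is to make training fit in roughly 11 GiB, the usual levers in this setup are a smaller batch size, mixed precision, and gradient accumulation. A hedged sketch below writes the Trainer arguments out explicitly instead of from_argparse_args; precision and accumulate_grad_batches are standard pl.Trainer options, while MyDataModule and its batch_size keyword are only placeholders for however dm is actually constructed:

import pytorch_lightning as pl

# assumption: the DataModule accepts a batch_size keyword; adjust to its real signature
dm = MyDataModule(batch_size=8)

trainer = pl.Trainer(
    gpus=1,
    precision=16,               # fp16 roughly halves activation memory
    accumulate_grad_batches=4,  # keeps the effective batch size while lowering per-step memory
    callbacks=[checkpoint_callback],
    logger=loggers,
)
trainer.fit(model, dm)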

Error:

RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 11.17 GiB total capacity; 10.58 GiB already allocated; 63.81 MiB free; 10.66 GiB reserved in total by PyTorch)

Output and Traceback:

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMulticlassSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForMulticlassSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMulticlassSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMulticlassSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifiers.1.bias', 'classifiers.0.weight', 'classifiers.1.weight', 'classifiers.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
  warnings.warn(*args, **kwargs)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name            | Type                                    | Params
----------------------------------------------------------------------------
0 | model           | BertForMulticlassSequenceClassification | 108 M 
1 | valid_acc       | Accuracy                                | 0     
2 | valid_f1        | F1                                      | 0     
3 | valid_acc_multi | ModuleList                              | 0     
----------------------------------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
433.413   Total estimated model params size (MB)
###score: val_score### 0.0625
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric Accuracy was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric F1 was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric CompositionalMetric was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
Epoch 0: 0% 0/3715 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-89b1728bd5a6> in <module>()
      7       --gpus 1
      8     """.split()
----> 9 run_training(args)

37 frames
<ipython-input-5-7f8e9eed480d> in run_training(input)
     68 
     69     trainer = pl.Trainer.from_argparse_args(args, callbacks=[checkpoint_callback], logger=loggers)
---> 70     trainer.fit(model, dm)
     71     model_file = os.path.join(args.modeldir, 'last.ckpt')
     72     trainer.save_checkpoint(model_file, weights_only=True)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    497 
    498         # dispath `start_training` or `start_testing` or `start_predicting`
--> 499         self.dispatch()
    500 
    501         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    544 
    545         else:
--> 546             self.accelerator.start_training(self)
    547 
    548     def train_or_test_or_predict(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     71 
     72     def start_training(self, trainer):
---> 73         self.training_type_plugin.start_training(trainer)
     74 
     75     def start_testing(self, trainer):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    112     def start_training(self, trainer: 'Trainer') -> None:
    113         # double dispatch to initiate the training loop
--> 114         self._results = trainer.run_train()
    115 
    116     def start_testing(self, trainer: 'Trainer') -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    635                 with self.profiler.profile("run_training_epoch"):
    636                     # run train epoch
--> 637                     self.train_loop.run_training_epoch()
    638 
    639                 if self.max_steps and self.max_steps <= self.global_step:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    490             # ------------------------------------
    491             with self.trainer.profiler.profile("run_training_batch"):
--> 492                 batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
    493 
    494             # when returning -1 from train_step, we end epoch early

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_batch(self, batch, batch_idx, dataloader_idx)
    652 
    653                         # optimizer step
--> 654                         self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    655 
    656                     else:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in optimizer_step(self, optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    431             on_tpu=self.trainer._device_type == DeviceType.TPU and _TPU_AVAILABLE,
    432             using_native_amp=using_native_amp,
--> 433             using_lbfgs=is_lbfgs,
    434         )
    435 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py in optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu, using_native_amp, using_lbfgs)
   1388             # wraps into LightingOptimizer only for running step
   1389             optimizer = LightningOptimizer._to_lightning_optimizer(optimizer, self.trainer, optimizer_idx)
-> 1390         optimizer.step(closure=optimizer_closure)
   1391 
   1392     def optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer, optimizer_idx: int):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py in step(self, closure, *args, **kwargs)
    212             profiler_name = f"optimizer_step_and_closure_{self._optimizer_idx}"
    213 
--> 214         self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
    215         self._total_optimizer_step_calls += 1
    216 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py in __optimizer_step(self, closure, profiler_name, **kwargs)
    132 
    133         with trainer.profiler.profile(profiler_name):
--> 134             trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
    135 
    136     def step(self, *args, closure: Optional[Callable] = None, **kwargs):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in optimizer_step(self, optimizer, opt_idx, lambda_closure, **kwargs)
    275         )
    276         if make_optimizer_step:
--> 277             self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
    278         self.precision_plugin.post_optimizer_step(optimizer, opt_idx)
    279         self.training_type_plugin.post_optimizer_step(optimizer, opt_idx, **kwargs)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in run_optimizer_step(self, optimizer, optimizer_idx, lambda_closure, **kwargs)
    280 
    281     def run_optimizer_step(self, optimizer: Optimizer, optimizer_idx: int, lambda_closure: Callable, **kwargs):
--> 282         self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
    283 
    284     def optimizer_zero_grad(self, current_epoch: int, batch_idx: int, optimizer: Optimizer, opt_idx: int) -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in optimizer_step(self, optimizer, lambda_closure, **kwargs)
    161 
    162     def optimizer_step(self, optimizer: torch.optim.Optimizer, lambda_closure: Callable, **kwargs):
--> 163         optimizer.step(closure=lambda_closure, **kwargs)

/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
     86                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
     87                 with torch.autograd.profiler.record_function(profile_name):
---> 88                     return func(*args, **kwargs)
     89             return wrapper
     90 

/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     26         def decorate_context(*args, **kwargs):
     27             with self.__class__():
---> 28                 return func(*args, **kwargs)
     29         return cast(F, decorate_context)
     30 

/usr/local/lib/python3.7/dist-packages/torch/optim/adam.py in step(self, closure)
     64         if closure is not None:
     65             with torch.enable_grad():
---> 66                 loss = closure()
     67 
     68         for group in self.param_groups:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in train_step_and_backward_closure()
    647                         def train_step_and_backward_closure():
    648                             result = self.training_step_and_backward(
--> 649                                 split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
    650                             )
    651                             return None if result is None else result.loss

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in training_step_and_backward(self, split_batch, batch_idx, opt_idx, optimizer, hiddens)
    740         with self.trainer.profiler.profile("training_step_and_backward"):
    741             # lightning module hook
--> 742             result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
    743             self._curr_step_result = result
    744 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in training_step(self, split_batch, batch_idx, opt_idx, hiddens)
    291             model_ref._results = Result()
    292             with self.trainer.profiler.profile("training_step"):
--> 293                 training_step_output = self.trainer.accelerator.training_step(args)
    294                 self.trainer.accelerator.post_training_step()
    295 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in training_step(self, args)
    154 
    155         with self.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
--> 156             return self.training_type_plugin.training_step(*args)
    157 
    158     def post_training_step(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in training_step(self, *args, **kwargs)
    123 
    124     def training_step(self, *args, **kwargs):
--> 125         return self.lightning_module.training_step(*args, **kwargs)
    126 
    127     def post_training_step(self):

<ipython-input-4-a6cb4f83dcb2> in training_step(self, batch, batch_idx)
    104     def training_step(self, batch, batch_idx):
    105         x, y_true = batch
--> 106         loss, _ = self(x, labels=y_true)
    107         self.log('train_loss', loss)
    108         return loss

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-a6cb4f83dcb2> in forward(self, *input, **kwargs)
    100 
    101     def forward(self, *input, **kwargs):
--> 102         return self.model(*input, **kwargs)
    103 
    104     def training_step(self, batch, batch_idx):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-a6cb4f83dcb2> in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
     50             output_attentions=output_attentions,
     51             output_hidden_states=output_hidden_states,
---> 52             return_dict=return_dict,
     53         )
     54 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    999             output_attentions=output_attentions,
   1000             output_hidden_states=output_hidden_states,
-> 1001             return_dict=return_dict,
   1002         )
   1003         sequence_output = encoder_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    587                     encoder_attention_mask,
    588                     past_key_value,
--> 589                     output_attentions,
    590                 )
    591 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    473             head_mask,
    474             output_attentions=output_attentions,
--> 475             past_key_value=self_attn_past_key_value,
    476         )
    477         attention_output = self_attention_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    406             encoder_attention_mask,
    407             past_key_value,
--> 408             output_attentions,
    409         )
    410         attention_output = self.output(self_outputs[0], hidden_states)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    303 
    304         # Take the dot product between "query" and "key" to get the raw attention scores.
--> 305         attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    306 
    307         if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":

RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 11.17 GiB total capacity; 10.58 GiB already allocated; 63.81 MiB free; 10.66 GiB reserved in total by PyTorch)
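
The traceback bottoms out in the self-attention score matmul in modeling_bert.py, which is where this kind of OOM usually surfaces: the attention-score tensor scales with batch_size × num_heads × seq_len², so its memory grows quadratically with sequence length. A rough back-of-the-envelope, assuming bert-base-cased (12 heads) and the maximum sequence length of 512 (both assumptions about this particular run):

# size of one attention-score tensor per sample, per encoder layer, in fp32
num_heads, seq_len, bytes_per_float = 12, 512, 4
per_sample = num_heads * seq_len * seq_len * bytes_per_float
print(per_sample / 2**20)                # 12.0 MiB

# even a moderate batch allocates hundreds of MiB at this single matmul,
# in the same ballpark as the 240 MiB allocation that fails above
batch_size = 16  # illustrative only
print(batch_size * per_sample / 2**20)   # 192.0 MiB

Shortening the tokenizer's max_length (when the data allows it) therefore cuts this memory quadratically and often helps more than lowering the batch size alone.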
