
RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 14.76 GiB total capacity; 13.54 GiB already allocated; 73.75 MiB free; 13.63 GiB reserved in total by PyTorch) #19


Description

@danielbellhv

I am new to PyTorch. I am using Google Colaboratory with the GPU accelerator.

Attempted Solutions (same error):

  • torch.cuda.empty_cache(), suggested here (see the sketch after this list).
  • torch.cuda.memory_summary(device=None, abbreviated=False), suggested here.
  • batch_size=1 for the dm instance.

Attempted Solutions (different error):

  • torch.cuda.clear_memory_allocated(), suggested here.
    AttributeError: module 'torch.cuda' has no attribute 'clear_memory_allocated'
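
For reference, a minimal sketch of the memory-inspection calls that do exist in torch.cuda (clear_memory_allocated is not one of them, which is why it raises AttributeError). The device index 0 is only an assumption for a single-GPU Colab runtime:

import torch

device = torch.device('cuda:0')  # assumes the single Colab GPU

# Release cached blocks held by the allocator (does not free live tensors)
torch.cuda.empty_cache()

# These correspond to the "already allocated" / "reserved" figures in the error message
allocated = torch.cuda.memory_allocated(device)
reserved = torch.cuda.memory_reserved(device)
print(f"allocated: {allocated / 2**20:.1f} MiB, reserved: {reserved / 2**20:.1f} MiB")

# Full per-allocator report
print(torch.cuda.memory_summary(device=device, abbreviated=False))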

Code:

trainer = pl.Trainer.from_argparse_args(args, callbacks=[checkpoint_callback], logger=loggers)
#torch.cuda.empty_cache()
#torch.cuda.memory_summary(device=None, abbreviated=False)
#torch.cuda.clear_memory_allocated()
trainer.fit(model, dm) # ERROR
model_file = os.path.join(args.modeldir, 'last.ckpt')
trainer.save_checkpoint(model_file, weights_only=True)
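
If the goal is to make training fit in roughly 11 GiB, the usual levers in this setup are a smaller batch size, mixed precision, and gradient accumulation. A hedged sketch below writes the Trainer arguments out explicitly instead of from_argparse_args; precision and accumulate_grad_batches are standard pl.Trainer options, while MyDataModule and its batch_size keyword are only placeholders for however dm is actually constructed:

import pytorch_lightning as pl

# assumption: the DataModule accepts a batch_size keyword; adjust to its real signature
dm = MyDataModule(batch_size=8)

trainer = pl.Trainer(
    gpus=1,
    precision=16,               # fp16 roughly halves activation memory
    accumulate_grad_batches=4,  # keeps the effective batch size while lowering per-step memory
    callbacks=[checkpoint_callback],
    logger=loggers,
)
trainer.fit(model, dm)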

Error:

RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 11.17 GiB total capacity; 10.58 GiB already allocated; 63.81 MiB free; 10.66 GiB reserved in total by PyTorch)

Output and Traceback:

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMulticlassSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForMulticlassSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMulticlassSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMulticlassSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifiers.1.bias', 'classifiers.0.weight', 'classifiers.1.weight', 'classifiers.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
  warnings.warn(*args, **kwargs)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name            | Type                                    | Params
----------------------------------------------------------------------------
0 | model           | BertForMulticlassSequenceClassification | 108 M 
1 | valid_acc       | Accuracy                                | 0     
2 | valid_f1        | F1                                      | 0     
3 | valid_acc_multi | ModuleList                              | 0     
----------------------------------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
433.413   Total estimated model params size (MB)
###score: val_score### 0.0625
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric Accuracy was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric F1 was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torchmetrics/utilities/prints.py:36: UserWarning: The ``compute`` method of metric CompositionalMetric was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
Epoch 0: 0% 0/3715 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-89b1728bd5a6> in <module>()
      7       --gpus 1
      8     """.split()
----> 9 run_training(args)

37 frames
<ipython-input-5-7f8e9eed480d> in run_training(input)
     68 
     69     trainer = pl.Trainer.from_argparse_args(args, callbacks=[checkpoint_callback], logger=loggers)
---> 70     trainer.fit(model, dm)
     71     model_file = os.path.join(args.modeldir, 'last.ckpt')
     72     trainer.save_checkpoint(model_file, weights_only=True)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    497 
    498         # dispath `start_training` or `start_testing` or `start_predicting`
--> 499         self.dispatch()
    500 
    501         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    544 
    545         else:
--> 546             self.accelerator.start_training(self)
    547 
    548     def train_or_test_or_predict(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     71 
     72     def start_training(self, trainer):
---> 73         self.training_type_plugin.start_training(trainer)
     74 
     75     def start_testing(self, trainer):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    112     def start_training(self, trainer: 'Trainer') -> None:
    113         # double dispatch to initiate the training loop
--> 114         self._results = trainer.run_train()
    115 
    116     def start_testing(self, trainer: 'Trainer') -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    635                 with self.profiler.profile("run_training_epoch"):
    636                     # run train epoch
--> 637                     self.train_loop.run_training_epoch()
    638 
    639                 if self.max_steps and self.max_steps <= self.global_step:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
    490             # ------------------------------------
    491             with self.trainer.profiler.profile("run_training_batch"):
--> 492                 batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
    493 
    494             # when returning -1 from train_step, we end epoch early

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_batch(self, batch, batch_idx, dataloader_idx)
    652 
    653                         # optimizer step
--> 654                         self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    655 
    656                     else:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in optimizer_step(self, optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
    431             on_tpu=self.trainer._device_type == DeviceType.TPU and _TPU_AVAILABLE,
    432             using_native_amp=using_native_amp,
--> 433             using_lbfgs=is_lbfgs,
    434         )
    435 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py in optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu, using_native_amp, using_lbfgs)
   1388             # wraps into LightingOptimizer only for running step
   1389             optimizer = LightningOptimizer._to_lightning_optimizer(optimizer, self.trainer, optimizer_idx)
-> 1390         optimizer.step(closure=optimizer_closure)
   1391 
   1392     def optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer, optimizer_idx: int):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py in step(self, closure, *args, **kwargs)
    212             profiler_name = f"optimizer_step_and_closure_{self._optimizer_idx}"
    213 
--> 214         self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
    215         self._total_optimizer_step_calls += 1
    216 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py in __optimizer_step(self, closure, profiler_name, **kwargs)
    132 
    133         with trainer.profiler.profile(profiler_name):
--> 134             trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
    135 
    136     def step(self, *args, closure: Optional[Callable] = None, **kwargs):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in optimizer_step(self, optimizer, opt_idx, lambda_closure, **kwargs)
    275         )
    276         if make_optimizer_step:
--> 277             self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
    278         self.precision_plugin.post_optimizer_step(optimizer, opt_idx)
    279         self.training_type_plugin.post_optimizer_step(optimizer, opt_idx, **kwargs)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in run_optimizer_step(self, optimizer, optimizer_idx, lambda_closure, **kwargs)
    280 
    281     def run_optimizer_step(self, optimizer: Optimizer, optimizer_idx: int, lambda_closure: Callable, **kwargs):
--> 282         self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
    283 
    284     def optimizer_zero_grad(self, current_epoch: int, batch_idx: int, optimizer: Optimizer, opt_idx: int) -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in optimizer_step(self, optimizer, lambda_closure, **kwargs)
    161 
    162     def optimizer_step(self, optimizer: torch.optim.Optimizer, lambda_closure: Callable, **kwargs):
--> 163         optimizer.step(closure=lambda_closure, **kwargs)

/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
     86                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
     87                 with torch.autograd.profiler.record_function(profile_name):
---> 88                     return func(*args, **kwargs)
     89             return wrapper
     90 

/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     26         def decorate_context(*args, **kwargs):
     27             with self.__class__():
---> 28                 return func(*args, **kwargs)
     29         return cast(F, decorate_context)
     30 

/usr/local/lib/python3.7/dist-packages/torch/optim/adam.py in step(self, closure)
     64         if closure is not None:
     65             with torch.enable_grad():
---> 66                 loss = closure()
     67 
     68         for group in self.param_groups:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in train_step_and_backward_closure()
    647                         def train_step_and_backward_closure():
    648                             result = self.training_step_and_backward(
--> 649                                 split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
    650                             )
    651                             return None if result is None else result.loss

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in training_step_and_backward(self, split_batch, batch_idx, opt_idx, optimizer, hiddens)
    740         with self.trainer.profiler.profile("training_step_and_backward"):
    741             # lightning module hook
--> 742             result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
    743             self._curr_step_result = result
    744 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/training_loop.py in training_step(self, split_batch, batch_idx, opt_idx, hiddens)
    291             model_ref._results = Result()
    292             with self.trainer.profiler.profile("training_step"):
--> 293                 training_step_output = self.trainer.accelerator.training_step(args)
    294                 self.trainer.accelerator.post_training_step()
    295 

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in training_step(self, args)
    154 
    155         with self.precision_plugin.train_step_context(), self.training_type_plugin.train_step_context():
--> 156             return self.training_type_plugin.training_step(*args)
    157 
    158     def post_training_step(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in training_step(self, *args, **kwargs)
    123 
    124     def training_step(self, *args, **kwargs):
--> 125         return self.lightning_module.training_step(*args, **kwargs)
    126 
    127     def post_training_step(self):

<ipython-input-4-a6cb4f83dcb2> in training_step(self, batch, batch_idx)
    104     def training_step(self, batch, batch_idx):
    105         x, y_true = batch
--> 106         loss, _ = self(x, labels=y_true)
    107         self.log('train_loss', loss)
    108         return loss

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-a6cb4f83dcb2> in forward(self, *input, **kwargs)
    100 
    101     def forward(self, *input, **kwargs):
--> 102         return self.model(*input, **kwargs)
    103 
    104     def training_step(self, batch, batch_idx):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-4-a6cb4f83dcb2> in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
     50             output_attentions=output_attentions,
     51             output_hidden_states=output_hidden_states,
---> 52             return_dict=return_dict,
     53         )
     54 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    999             output_attentions=output_attentions,
   1000             output_hidden_states=output_hidden_states,
-> 1001             return_dict=return_dict,
   1002         )
   1003         sequence_output = encoder_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    587                     encoder_attention_mask,
    588                     past_key_value,
--> 589                     output_attentions,
    590                 )
    591 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    473             head_mask,
    474             output_attentions=output_attentions,
--> 475             past_key_value=self_attn_past_key_value,
    476         )
    477         attention_output = self_attention_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    406             encoder_attention_mask,
    407             past_key_value,
--> 408             output_attentions,
    409         )
    410         attention_output = self.output(self_outputs[0], hidden_states)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    303 
    304         # Take the dot product between "query" and "key" to get the raw attention scores.
--> 305         attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    306 
    307         if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":

RuntimeError: CUDA out of memory. Tried to allocate 240.00 MiB (GPU 0; 11.17 GiB total capacity; 10.58 GiB already allocated; 63.81 MiB free; 10.66 GiB reserved in total by PyTorch)
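
The traceback bottoms out in the self-attention score matmul in modeling_bert.py, which is where this kind of OOM usually surfaces: the attention-score tensor scales with batch_size × num_heads × seq_len², so its memory grows quadratically with sequence length. A rough back-of-the-envelope, assuming bert-base-cased (12 heads) and the maximum sequence length of 512 (both assumptions about this particular run):

# size of one attention-score tensor per sample, per encoder layer, in fp32
num_heads, seq_len, bytes_per_float = 12, 512, 4
per_sample = num_heads * seq_len * seq_len * bytes_per_float
print(per_sample / 2**20)                # 12.0 MiB

# even a moderate batch allocates hundreds of MiB at this single matmul,
# in the same ballpark as the 240 MiB allocation that fails above
batch_size = 16  # illustrative only
print(batch_size * per_sample / 2**20)   # 192.0 MiB

Shortening the tokenizer's max_length (when the data allows it) therefore cuts this memory quadratically and often helps more than lowering the batch size alone.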
