
force same gpu #398

Merged
mike-ferguson merged 1 commit into main from same_gpu_load on Mar 19, 2026

Conversation

@mike-ferguson
Member

Fix multi-GPU device mismatch in HuggingfaceSubject

Problem

When running with device_map='auto' on multi-GPU machines, inputs were always sent to cuda:0, while the embedding layer could be placed on another GPU (e.g. cuda:3), causing:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!

Small models (e.g. DistilGPT-2) were most affected, because for them device_map='auto' is more likely to place the embedding layer on a non-zero device.

Solution

  1. Use the embedding layer's device for inputs instead of assuming cuda:0: self.device = self.basemodel.get_input_embeddings().weight.device
  2. In estimate_reading_times, move actual_tokens to predicted_logits.device before F.cross_entropy, since logits can reside on a different GPU with device_map='auto'.
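The two fixes above can be sketched as follows. This is a minimal illustration, not the actual HuggingfaceSubject implementation: the class and method names follow the PR text, but the constructor signature and the body of estimate_reading_times are assumptions.

```python
import torch
import torch.nn.functional as F


class HuggingfaceSubject:
    def __init__(self, basemodel):
        self.basemodel = basemodel
        # Fix 1: take the device of the embedding layer instead of
        # assuming cuda:0 -- with device_map='auto' the embeddings may
        # have been sharded onto any GPU (e.g. cuda:3).
        self.device = basemodel.get_input_embeddings().weight.device

    def estimate_reading_times(self, predicted_logits, actual_tokens):
        # Fix 2: with device_map='auto' the logits can live on a
        # different GPU than the target tokens, so move the targets to
        # the logits' device before computing the loss.
        actual_tokens = actual_tokens.to(predicted_logits.device)
        return F.cross_entropy(predicted_logits, actual_tokens,
                               reduction='none')
```

Inputs built with `inputs.to(self.device)` then land on whichever GPU actually holds the embedding layer, so the forward pass no longer mixes devices.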

Impact

  • Fixes multi-GPU runs for all model sizes when device_map='auto' is used.
  • No behavior change for single-GPU use.
  • Batch jobs typically use one GPU per job, so they were unaffected.

@KartikP KartikP added the OOM label Mar 18, 2026
@mike-ferguson mike-ferguson merged commit 151976f into main Mar 19, 2026
17 of 18 checks passed
@mschrimpf mschrimpf deleted the same_gpu_load branch March 25, 2026 16:16