
add multi-gpu support#394

Merged
mike-ferguson merged 1 commit into main from add_multi_gpu_support on Mar 16, 2026

Conversation

@mike-ferguson
Member

Add multi-GPU and memory-efficient model loading to HuggingfaceSubject

Changes

  • Multi-GPU support: When multiple CUDA GPUs are detected, HuggingfaceSubject now loads models with device_map='auto', automatically distributing layers across all available GPUs. Single-GPU and MPS (Apple Silicon) behavior is unchanged.
  • Memory-efficient loading: Added low_cpu_mem_usage=True to from_pretrained, reducing peak CPU RAM during checkpoint loading from ~3x to ~1x model size.
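The gating logic described above can be sketched as a small helper. This is a hypothetical illustration (the helper name and exact structure are not from the PR diff): it builds the `from_pretrained` keyword arguments from the visible CUDA device count, adding `device_map='auto'` only on multi-GPU hosts and `low_cpu_mem_usage=True` everywhere.

```python
def loading_kwargs(cuda_device_count: int) -> dict:
    """Build from_pretrained kwargs for HuggingfaceSubject-style loading.

    Hypothetical sketch of the PR's behavior:
    - low_cpu_mem_usage=True always, so checkpoint loading peaks near
      ~1x model size in CPU RAM instead of ~3x.
    - device_map='auto' only when more than one CUDA GPU is visible,
      letting accelerate shard layers across all available GPUs.
      Single-GPU, CPU, and MPS paths are left unchanged.
    """
    kwargs = {"low_cpu_mem_usage": True}
    if cuda_device_count > 1:
        # Distribute layers across every visible GPU automatically.
        kwargs["device_map"] = "auto"
    return kwargs
```

In context this would be used roughly as `AutoModelForCausalLM.from_pretrained(model_id, **loading_kwargs(torch.cuda.device_count()))`; keeping the decision in one place means individual model plugins need no changes.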

Why

Models over ~6B parameters in float32 exceed a single 24 GB GPU. Previously, all weights were loaded onto GPU 0 regardless of how many GPUs were available, causing OOM kills on multi-GPU instances (e.g. g5.12xlarge). This fix allows 7-13B models to run in fp32 on multi-GPU instances without any changes to individual model plugins.
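The size claim above is simple arithmetic: float32 weights cost 4 bytes per parameter, so a 7B model needs about 26 GiB for weights alone, already past a 24 GB card before activations or KV cache. A quick back-of-the-envelope check:

```python
def fp32_weight_gib(n_params: float) -> float:
    """Weight memory for a float32 model, in GiB (4 bytes per parameter)."""
    return n_params * 4 / 2**30

# A 7B-parameter model in fp32: ~26 GiB of weights,
# which does not fit on a single 24 GB GPU.
mistral_7b = fp32_weight_gib(7e9)
```

This is why loading everything onto GPU 0 fails for the 7-13B tier, while sharding across the four 24 GB GPUs of a g5.12xlarge leaves ample headroom.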

Impact

  • No changes required to existing model plugins
  • No behavior change on single-GPU or CPU/MPS setups
  • Fixes OOM for medium-tier models (Mistral-7B, OPT-6.7B, Falcon-7B, Pythia-12B, etc.) on multi-GPU Batch instances

@mike-ferguson mike-ferguson merged commit 1ee35a3 into main Mar 16, 2026
10 of 11 checks passed
@KartikP KartikP added the OOM label Mar 16, 2026
