Feature/multi gpu by theodufort · Pull Request #84 · Andyyyy64/whichllm

theodufort · 2026-06-03T22:53:56Z

Allow simulating multiple GPUs with comma-separated names (RTX 5080,RTX 5060 Ti) or count shorthand (2x RTX 4090, 4x H100). VRAM fit uses conservative pooling — per-GPU framework overhead (~300MB each) and a utilization factor (95% homogeneous, 90% heterogeneous) are applied rather than naively summing all VRAM as one device. Speed estimation uses a conservative flat 30% overhead factor without claiming precision about the interconnect topology (PCIe vs NVLink); multi-GPU throughput is always marked low-confidence.

What

Add multi-GPU support to implement #65

Parse --gpu values: comma-separated heterogeneous GPUs and Nx count shorthand
Conservative VRAM fit that accounts for per-GPU overhead and heterogeneous split inefficiency
Low-confidence speed estimates that avoid faking PCIe/NVLink precision
Warnings surfaced for multi-GPU splits and heterogeneous configurations

Why

Users with multiple GPUs (e.g. 2x RTX 3090, mixed RTX 4090 + 3090) need to see which models fit across their combined VRAM. The fit simulation layer comes first; speed modeling is deliberately conservative until interconnect and tensor-split assumptions can be properly validated.

Testing

Tests pass (pytest)
New tests added (if applicable)
[] Tested on real hardware (if hardware-related)

Notes

NVLink detection from the initial commit was removed — it incorrectly assumed any NVIDIA GPU with compute capability >= 7.0 has NVLink (consumer GPUs like RTX 4090 do not)
vram_available_bytes still reports the raw physical total; the conservative budget is only used internally for fit decisions
Per-GPU --vram override is not yet supported for multi-GPU; the error message now says so clearly
Speed estimates for multi-GPU are always low-confidence with an explicit note about PCIe/NVLink dependence — this is intentional, not a gap

Andyyyy64 · 2026-06-09T03:02:47Z

Thanks @theodufort, this is a lot of careful work, and the conservative approach to VRAM pooling and speed looks right to me. It conflicts with main now, and since it touches the ranker, compatibility, and performance paths, I'd like it rebased on the latest main before I review it properly. Could you update it? I want to give the fit logic a careful read once it's clean.

Allow simulating multiple GPUs with comma-separated names ("RTX 5080,RTX 5060 Ti") or count shorthand ("2x RTX 4090", "4x H100"). VRAM is pooled across all GPUs for fit determination. Speed estimation uses a tensor-parallel model where the slowest GPU is the bottleneck, with inter-GPU communication overhead (PCIe/NVLink) factored in. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The initial multi-GPU commit naively pooled all GPU VRAM as one device and assumed any NVIDIA GPU with compute capability >= 7.0 has NVLink (wrong for consumer GPUs like RTX 4090). This commit: - Applies per-GPU framework overhead (~300MB) and a utilization factor (95% homogeneous, 90% heterogeneous) to VRAM fit checks - Replaces the NVLink/PCIe sync model with a flat 30% overhead factor, avoiding false precision about interconnect topology - Adds warnings for multi-GPU and heterogeneous configurations - Fixes broken --vram error message grammar Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

theodufort · 2026-06-14T01:38:32Z

Thanks @theodufort, this is a lot of careful work, and the conservative approach to VRAM pooling and speed looks right to me. It conflicts with main now, and since it touches the ranker, compatibility, and performance paths, I'd like it rebased on the latest main before I review it properly. Could you update it? I want to give the fit logic a careful read once it's clean.

Hey Andy, sorry for the delay, I just rebased it!

Andyyyy64 · 2026-06-14T05:07:28Z

Thanks for putting this together and for coming back to rebase it.

I checked the branch again after the recent main changes. It still conflicts with current main, and the conflict is in compatibility.py around the multi-GPU fit path and the newer offload_ratio logic.

I am going to take this back into a maintainer-owned implementation instead of asking you to keep chasing main. Multi-GPU is now coming up in #65, #52, and #110, and it needs to line up with the RAM reserve work, the partial-offload ranking fix, and the display split.

Your PR is still useful. The conservative VRAM-fit direction and the test cases are good reference material, and I will credit this work if I reuse pieces of it.

Andyyyy64 · 2026-06-14T05:12:31Z

Closing this in favor of #112.

This is not a rejection of the direction. The PR helped clarify the shape of the feature, especially the conservative VRAM-fit approach. The issue is that main has changed under it: RAM reserve behavior, partial-offload ranking, and output structure all moved, and the PR still conflicts in the compatibility path.

I will carry the useful parts forward in the new implementation and credit this PR where it informs the final code.

theodufort · 2026-06-15T20:52:20Z

Closing this in favor of #112.

This is not a rejection of the direction. The PR helped clarify the shape of the feature, especially the conservative VRAM-fit approach. The issue is that main has changed under it: RAM reserve behavior, partial-offload ranking, and output structure all moved, and the PR still conflicts in the compatibility path.

I will carry the useful parts forward in the new implementation and credit this PR where it informs the final code.

I don't know how I feel about you closing the merge request and recommitting yourself the changes, this kind of behaviour will not get you many contributors...

Andyyyy64 · 2026-06-18T06:32:39Z

Closing this in favor of #112.
This is not a rejection of the direction. The PR helped clarify the shape of the feature, especially the conservative VRAM-fit approach. The issue is that main has changed under it: RAM reserve behavior, partial-offload ranking, and output structure all moved, and the PR still conflicts in the compatibility path.
I will carry the useful parts forward in the new implementation and credit this PR where it informs the final code.

I don't know how I feel about you closing the merge request and recommitting yourself the changes, this kind of behaviour will not get you many contributors...

I'm sorry.

Looking back, I handled this poorly. You opened the PR, came back to rebase it, then I closed it and shipped a maintainer version soon after. I can see why that felt like your work was taken out of your hands.

The reason I took it back was that main had moved across the same code paths: RAM reserve behavior, partial-offload ranking, compatibility logic, and output structure. I thought landing a maintainer-owned version would avoid making you chase more conflicts. That was the reasoning, but I should have made that handoff clear before closing the PR.

Your PR did help shape the final implementation, especially the conservative VRAM fit approach and the tests. I should have credited that more visibly when I opened the maintainer PR.

Sorry again. Next time, I will either help get the contributor PR over the line, or say earlier that I need to take over the implementation and make the credit clear.

theodufort mentioned this pull request Jun 3, 2026

Feature Request: Multi-GPU simulation via --gpu flag (heterogeneous setups) #65

Open

theodufort and others added 2 commits June 13, 2026 21:32

theodufort force-pushed the feature/multi-gpu branch from 3df8c07 to a5da49d Compare June 14, 2026 01:32

Andyyyy64 mentioned this pull request Jun 14, 2026

Implement maintainer-owned multi-GPU fit simulation #112

Closed

Andyyyy64 closed this Jun 14, 2026

Andyyyy64 mentioned this pull request Jun 14, 2026

feat: add conservative multi-GPU simulation #113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/multi gpu#84

Feature/multi gpu#84
theodufort wants to merge 2 commits into
Andyyyy64:mainfrom
theodufort:feature/multi-gpu

theodufort commented Jun 3, 2026

Uh oh!

Andyyyy64 commented Jun 9, 2026

Uh oh!

theodufort commented Jun 14, 2026

Uh oh!

Andyyyy64 commented Jun 14, 2026

Uh oh!

Andyyyy64 commented Jun 14, 2026

Uh oh!

theodufort commented Jun 15, 2026

Uh oh!

Andyyyy64 commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

theodufort commented Jun 3, 2026

What

Why

Testing

Notes

Uh oh!

Andyyyy64 commented Jun 9, 2026

Uh oh!

theodufort commented Jun 14, 2026

Uh oh!

Andyyyy64 commented Jun 14, 2026

Uh oh!

Andyyyy64 commented Jun 14, 2026

Uh oh!

theodufort commented Jun 15, 2026

Uh oh!

Andyyyy64 commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants