feat: auto-detect Apple Silicon (MPS) and keep Triton CUDA-only #22
Conversation
The CLI --device flag defaulted to "cuda", which crashed with a raw
PyTorch traceback ("Found no NVIDIA driver on your system ...") on
machines without a GPU. Users had to discover --device cpu themselves.
Add an "auto" mode that picks the best available backend (cuda if
detected, otherwise cpu) and make it the default. Users who explicitly
pass --device cuda still get the original loud failure on non-CUDA
machines, which is the correct behavior when they ask for cuda by name.
- opf/_common/device.py (new): resolve_device("auto"|...) helper.
- opf/_cli/common.py: flip --device default to "auto", expand help text.
- opf/_core/runtime.py, opf/_train/runner.py: call resolve_device()
where device names turn into torch.device objects.
Stderr on auto-fallback:
info: no CUDA device detected; falling back to CPU
(pass --device cuda to override).
Fixes openai#12
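A minimal sketch of the helper's shape (`resolve_device` and the stderr text are from the diff above; the body here is illustrative):

```python
import sys

import torch


def resolve_device(name: str) -> torch.device:
    """Resolve a --device value; "auto" prefers CUDA, else CPU."""
    if name != "auto":
        # Explicit names pass through unchanged, so --device cuda
        # still fails loudly on machines without an NVIDIA driver.
        return torch.device(name)
    if torch.cuda.is_available():
        return torch.device("cuda")
    print(
        "info: no CUDA device detected; falling back to CPU "
        "(pass --device cuda to override).",
        file=sys.stderr,
    )
    return torch.device("cpu")
```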
Extends the --device auto resolution from openai#17 to include Apple Silicon (MPS) so Mac users get GPU acceleration by default instead of falling back to CPU. Four coordinated changes make this safe:

1. opf/_common/device.py — "auto" now picks cuda > mps > cpu. Each fallback emits an info line on stderr so the user always knows which backend was selected.
2. opf/_model/model.py — the Triton-backed MoE kernels are CUDA-only (Triton does not target Metal). Previously the default enabled Triton on any non-CPU device, so trying mps crashed once the MoE layer was hit. Narrow the auto-enable to device.type == "cuda"; mps and cpu both fall back to the torch-ops path unless the user explicitly sets OPF_MOE_TRITON=1.
3. opf/_train/runner.py — mirror the same CUDA-only gate when setting OPF_MOE_TRITON=1 on behalf of the user (previously set it for any non-CPU device, which would silently enable Triton on mps).
4. opf/_cli/common.py — expand --device help text to list the full backend order (cuda > mps > cpu).

Verified on macOS (Apple Silicon, Python 3.14, torch 2.11):

- resolve_device("auto") → mps (with stderr info line)
- resolve_device("mps") → mps
- resolve_device("cpu") → cpu
- resolve_device("cuda") → returns the cuda device (still fails loudly at tensor alloc when the user explicitly asks for it — unchanged)

Low-level MPS op sanity check passed for embedding, attention-like matmul/softmax, log_softmax, topk, argsort, bincount — all the ops the inference path relies on.

Fixes openai#21
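And the MPS extension in sketch form (the `cuda > mps > cpu` order and the per-fallback stderr lines are from the diff; the exact message wording is illustrative):

```python
import sys

import torch


def resolve_device(name: str) -> torch.device:
    """Resolve a --device value; "auto" now prefers cuda > mps > cpu."""
    if name != "auto":
        return torch.device(name)  # explicit names pass through unchanged
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        # First fallback: Apple Silicon GPU via the Metal backend.
        print("info: no CUDA device detected; using MPS.", file=sys.stderr)
        return torch.device("mps")
    # Last resort: nothing GPU-like found.
    print(
        "info: no CUDA or MPS device detected; falling back to CPU.",
        file=sys.stderr,
    )
    return torch.device("cpu")
```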
FTR, I currently do the following:
Thanks for the data point — that's exactly the workflow this PR automates. After merge, `--device auto` does it by default.
I couldn't see your messages, can you write them again?
Fixes #21
Builds on #17 (which added `--device auto` with `cuda > cpu` fallback). Extends the auto chain to include Apple Silicon (MPS) so Mac users get GPU acceleration by default instead of falling back to CPU.

## Problem
After #17, an Apple Silicon Mac with no CUDA device falls back to CPU. That's correct but leaves performance on the table — MPS is available on M1/M2/M3 Macs. Naively adding MPS to the `auto` chain is unsafe, though: the Triton-backed MoE kernels are CUDA-only (Triton does not target Metal), and the current code auto-enables Triton on any non-CPU device. Picking `mps` as the default would silently crash once the MoE layer is hit.

## Fix
Four coordinated changes:
- `opf/_common/device.py` — `auto` now picks `cuda > mps > cpu`. Each fallback emits an info line on stderr so the user always knows which backend was selected.
- `opf/_model/model.py` — narrow the Triton auto-enable to `device.type == "cuda"`. MPS and CPU both use the torch-ops path unless the user explicitly sets `OPF_MOE_TRITON=1` (see the sketch after this list).
- `opf/_train/runner.py` — mirror the same CUDA-only gate when the training runner sets `OPF_MOE_TRITON=1` on behalf of the user (previously set for any non-CPU device, which would silently enable Triton on mps).
- `opf/_cli/common.py` — expand `--device` help text to list the full backend order.
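A sketch of that gate (`OPF_MOE_TRITON` and the `device.type == "cuda"` check are from the diff; the helper name `moe_use_triton` is hypothetical):

```python
import os

import torch


def moe_use_triton(device: torch.device) -> bool:
    """Should the MoE layer take the Triton kernel path on this device?

    Triton targets CUDA only (there is no Metal backend), so the
    auto-enable is narrowed to device.type == "cuda"; mps and cpu use
    the torch-ops fallback unless the user opts in explicitly.
    """
    override = os.environ.get("OPF_MOE_TRITON")
    if override is not None:
        return override == "1"  # an explicit user setting wins either way
    return device.type == "cuda"
```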
## Verified on this machine

macOS 26.4 on Apple Silicon (M-series), Python 3.14, torch 2.11.0:
Resolver cases:
resolve_device("auto")on Apple Siliconmps+ stderr info line ✅resolve_device("mps")mps✅resolve_device("cpu")cpu✅resolve_device("cuda")on non-CUDA machinecudaobject, fails at tensor alloc — same as today when user asks for cuda explicitly ✅Low-level MPS op smoke test (ops the inference path depends on), all pass on MPS:
- `torch.nn.Embedding` forward (the op that crashed in #21 / #12)
- `matmul` + `softmax`
- `log_softmax` (inference logprob path)
- `topk` (MoE expert routing)
- `argsort` (MoE packing)
- `bincount` (MoE expert counts — historically flaky on MPS, works here)
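A stand-alone rendition of that smoke test (plain PyTorch, no opf imports; assumes a torch build with MPS support, and the shapes are arbitrary):

```python
import torch

assert torch.backends.mps.is_available(), "needs an MPS-enabled torch build"
dev = torch.device("mps")

# Embedding forward -- the op that crashed in #21/#12.
emb = torch.nn.Embedding(1000, 64).to(dev)
x = emb(torch.randint(0, 1000, (4, 16), device=dev))

# Attention-like matmul + softmax.
scores = torch.softmax(x @ x.transpose(-1, -2) / 8.0, dim=-1)

# Inference logprob path.
logprobs = torch.log_softmax(x, dim=-1)

# MoE expert routing (topk), packing (argsort), counts (bincount).
_, idx = torch.topk(logprobs, k=4, dim=-1)
order = torch.argsort(idx.flatten())
counts = torch.bincount(idx.flatten(), minlength=64)

for t in (scores, idx, order, counts):
    assert t.device.type == "mps"
print("all MPS smoke-test ops passed")
```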
End-to-end inference on MPS was not run here (no checkpoint locally); a maintainer on Apple Silicon with a local checkpoint can verify the full path.

## Backwards compatibility
- `--device cuda` / `--device cpu` / `--device mps`: unchanged (all pass through `resolve_device` as-is).
- `auto` still picks `cuda` first — no behavior change.

## Depends on
#17 — this PR builds directly on the `resolve_device` helper introduced there. If #17 is merged first, this rebases to a no-conflict diff; if the maintainer prefers to squash both into one, I'm happy to close this and post a combined patch.