
feat(rocm): Add AMD GPU support via rocm #133

Open
steamwings wants to merge 3 commits into davidamacey:master from steamwings:master

Conversation

@steamwings

Pull Request

Description

Add ROCm support for AMD GPUs.

I don't really expect this to be merged, but I'm putting the PR up for visibility anyway.

Limitations

  • Processing via ROCm will only handle one file at a time. I ran into issues with forked Celery worker processes, so I had to limit processing to a single process.
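
The single-process limitation above corresponds to Celery's `solo` pool. As a minimal sketch (the app name `opentranscribe` is illustrative, and the PR may set this via the equivalent `--pool=solo` CLI flag rather than in code):

```python
from celery import Celery

app = Celery("opentranscribe")

# The default prefork pool forks workers after the HIP/ROCm runtime has
# initialized, which can corrupt GPU state in the children. The solo pool
# runs tasks in the main worker process instead, at the cost of handling
# only one task at a time.
app.conf.worker_pool = "solo"
app.conf.worker_concurrency = 1
```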

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

Changes made

Commit 1: feat(rocm): Add AMD ROCm GPU support infrastructure

  • Dockerfile.rocm — Multi-stage build installing PyTorch ROCm 6.4, CTranslate2 ROCm wheel, and MIOpen JIT compilation header stubs. Navigates the ROCm library maze (PyTorch bundles ROCm 6.4, CTranslate2 needs ROCm 7.0 system libs, both coexist via different SONAMEs).
  • requirements-rocm.txt — Python deps mirroring requirements.txt but with ROCm-specific PyTorch wheels and
    CTranslate2 installed separately via Dockerfile.
  • docker-compose.rocm-build.yml — Build overlay that points all backend services at Dockerfile.rocm.
  • docker-compose.gpu-rocm.yml — Runtime overlay with /dev/kfd + /dev/dri passthrough, render group GID mapping, HSA_OVERRIDE_GFX_VERSION, and shared memory sizing.
  • opentr.sh — Auto-detects ROCm GPUs (/dev/kfd) and injects the correct compose overlays across all code paths
    (start, reset, rebuild-backend, build).
  • .env.example — Documents HSA_OVERRIDE_GFX_VERSION and RENDER_GROUP_GID configuration.
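
The auto-detection in `opentr.sh` hinges on the presence of `/dev/kfd`, the ROCm kernel fusion driver node. Sketched in Python for clarity (the shell script itself implements this logic; the helper name `compose_overlays` is illustrative):

```python
from pathlib import Path


def compose_overlays(dev_root: str = "/dev") -> list:
    """Return extra docker-compose overlay files when an AMD GPU is present.

    /dev/kfd is the ROCm kernel fusion driver device node; its existence is
    the signal that a ROCm-capable GPU (and driver) is available.
    """
    overlays = []
    if (Path(dev_root) / "kfd").exists():
        overlays += [
            "docker-compose.rocm-build.yml",  # build backend from Dockerfile.rocm
            "docker-compose.gpu-rocm.yml",    # /dev/kfd + /dev/dri passthrough
        ]
    return overlays
```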

Commit 2: feat(rocm): Add ROCm awareness to Python backend

  • hardware_detection.py — Adds is_rocm property to detect HIP backend; skips NVIDIA-specific env vars (TORCH_CUDA_ARCH_LIST) on ROCm; reports gpu_backend: rocm and hip_version in hardware summary; skips NVIDIA Docker driver config for ROCm containers.
  • utility.py — Uses rocm-smi CLI for GPU stats (temperature, VRAM, utilization) on ROCm, with fallback to PyTorch
    CUDA API for memory stats when rocm-smi is unavailable.
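
The two backend changes above can be sketched as follows. The function names and the exact `rocm-smi` JSON key names are illustrative assumptions, not the PR's actual code (key names vary between ROCm releases, so real code should probe for them):

```python
import json


def is_rocm(torch_module) -> bool:
    """True when the installed PyTorch was built against ROCm/HIP.

    ROCm builds of PyTorch expose torch.version.hip as a version string;
    CUDA builds leave it as None (or absent entirely).
    """
    version = getattr(torch_module, "version", None)
    return getattr(version, "hip", None) is not None


def parse_rocm_smi_vram(smi_json: str) -> dict:
    """Extract per-GPU VRAM stats from `rocm-smi --showmeminfo vram --json`.

    Assumes one common output layout: a mapping of card name to a dict of
    labeled byte counts.
    """
    stats = {}
    for card, fields in json.loads(smi_json).items():
        stats[card] = {
            "used_bytes": int(fields["VRAM Total Used Memory (B)"]),
            "total_bytes": int(fields["VRAM Total Memory (B)"]),
        }
    return stats
```

Because `is_rocm` only inspects attributes, it degrades gracefully on CPU-only or CUDA builds of PyTorch.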

Commit 3: fix(deps): Pin huggingface-hub<1.0.0 for pyannote.audio compatibility

  • Pins huggingface-hub<1.0.0 because pyannote.audio v3 uses the deprecated use_auth_token parameter removed in 1.0.0.
  • Kept as a separate commit so it can be easily cherry-picked since this issue also affects the CUDA build (requirements.txt on master).
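
The pin sidesteps the removal. A hypothetical alternative (not what this PR does) would be a compatibility shim that picks the keyword based on the installed huggingface-hub version, since `use_auth_token` was replaced by `token` and finally removed in 1.0.0:

```python
def auth_kwarg(hub_version: str, token: str) -> dict:
    """Pick the auth keyword accepted by a given huggingface-hub version.

    Versions before 1.0.0 accept the deprecated use_auth_token=...;
    from 1.0.0 onward only token=... exists.
    """
    major = int(hub_version.split(".")[0])
    return {"token": token} if major >= 1 else {"use_auth_token": token}
```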

Testing

  • I have tested these changes locally
  • I have added/updated tests for my changes
  • All existing tests pass
  • I have tested with different audio/video formats (if applicable)

Frontend changes (if applicable)

  • Changes work in light and dark mode
  • Changes are responsive on mobile devices
  • No console errors or warnings

Backend changes (if applicable)

  • API endpoints are properly documented
  • Database migrations are included (if needed)
  • Error handling is implemented
  • Logging is appropriate

Documentation

  • I have updated the README if needed
  • I have updated relevant documentation
  • Code is properly commented

Screenshots (if applicable)

Add screenshots to help explain your changes

Additional notes

Any additional information that reviewers should know

Zander and others added 3 commits February 15, 2026 15:20
Add Docker build infrastructure and shell script support for running
OpenTranscribe on AMD GPUs via ROCm/HIP.

New files:
- Dockerfile.rocm: Multi-stage build with PyTorch ROCm 6.4, CTranslate2
  ROCm wheel, and MIOpen JIT compilation headers
- requirements-rocm.txt: Python dependencies with ROCm-specific PyTorch
- docker-compose.rocm-build.yml: Build overlay for ROCm backend image
- docker-compose.gpu-rocm.yml: Runtime overlay with GPU device passthrough,
  render group mapping, and ROCm environment variables

Modified files:
- opentr.sh: Auto-detect ROCm GPUs and inject compose overlays
- .env.example: Document HSA_OVERRIDE_GFX_VERSION and RENDER_GROUP_GID

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update hardware detection and GPU monitoring to handle AMD ROCm/HIP
alongside NVIDIA CUDA:

- hardware_detection.py: Add is_rocm property to detect HIP backend,
  skip NVIDIA-specific env vars (TORCH_CUDA_ARCH_LIST) on ROCm,
  report gpu_backend and hip_version in hardware summary, skip
  NVIDIA driver config in Docker runtime helper
- utility.py: Use rocm-smi for GPU stats on ROCm (temperature, VRAM,
  utilization), with fallback to PyTorch CUDA API for memory stats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pyannote.audio v3 uses the deprecated use_auth_token parameter that was
removed in huggingface-hub 1.0.0. Pin to <1.0.0 to prevent runtime
errors during speaker diarization model loading.

This affects the CUDA build as well (requirements.txt on master) but
is kept separate here for easy cherry-pick reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@davidamacey (Owner)

@steamwings thank you for the PR; ROCm is on the long-term plan.

While I am not a regular ROCm user, I am looking forward to learning more.

Thank you for doing the initial test to get this started! This will expand the user base!

I will explore this more after the version 0.4.0 release.

@davidamacey (Owner)

ROCm PR Review & Implementation Plan

@steamwings — thank you so much for putting this together! This is a really solid foundation for AMD GPU support. The work you've done navigating the ROCm library compatibility maze (PyTorch ROCm 6.4 bundled libs + CTranslate2 ROCm 7.0 system libs, MIOpen JIT headers, HSA runtime symlinks) is impressive and clearly shows real hands-on experience with ROCm. The --pool=solo fix for Celery fork safety is exactly right.

We've done a thorough review of the code and researched ROCm best practices for each component in our ML stack (WhisperX, PyAnnote, CTranslate2, Sentence Transformers). Based on your work and our research, we've put together a detailed phased implementation plan that we'll use to build on your foundation after the v0.4.0 release.

Full Implementation Plan

📋 OpenTranscribe ROCm Implementation Plan

This covers:

  • Phase 1: Alignment fixes (requirements sync with main requirements.txt, WhisperX version/install pattern, stability env vars for RDNA3)
  • Phase 2: Architecture improvements (unified requirements management, potential single Dockerfile with build args, enhanced hardware detection)
  • Phase 3: Testing infrastructure (CI build verification, smoke test script, GPU test matrix)
  • Phase 4: Documentation (docs/ROCM_SETUP.md, troubleshooting guide, known limitations)
  • Phase 5: Future enhancements (multi-GPU ROCm support, Docker Hub ROCm images, RDNA4 support)

Key Takeaways from Research

The good news is that all of our ML stack components work on ROCm:

| Component | ROCm Status |
| --- | --- |
| PyTorch 2.8.0 | Official wheels for ROCm 6.4 |
| CTranslate2 4.7.x | Official ROCm support since v4.7.0 (Feb 2026) |
| PyAnnote Audio 3.1+ | Pure PyTorch — works via HIP translation |
| Sentence Transformers | Works with explicit device='cuda' |

We noted a few things we'll want to address when we pick this up:

  • Syncing requirements-rocm.txt with our current requirements.txt (some packages drifted and the WhisperX --no-deps install pattern needs to match our v4 pyannote compatibility layer)
  • Adding RDNA3 stability environment variables (PYTORCH_HIP_ALLOC_CONF, MIOPEN_FIND_MODE)
  • Refactoring the repeated overlay injection blocks in opentr.sh into a helper function
  • The huggingface-hub<1.0.0 pin you identified is a great catch — we'll look at cherry-picking that to master as you suggested

We'll work through the plan after v0.4.0 and will keep this PR as the reference point. Really appreciate you getting the ball rolling on this — it's going to open up OpenTranscribe to a much wider audience of GPU users!
