
feat(rocm): Add AMD GPU support via rocm #133

Open
steamwings wants to merge 3 commits into davidamacey:master from steamwings:master

Conversation

@steamwings

Pull Request

Description

Add ROCm support for AMD GPUs.

I don't really expect this to be merged, but I'm putting the PR up for visibility anyway.

Limitations

  • Processing via ROCm will only handle one file at a time. I ran into issues with forked Celery worker processes, so I had to limit processing to a single process.
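
The single-process limitation above corresponds to Celery's `solo` pool. As a minimal sketch (the app name `opentranscribe` is illustrative, and the PR may set this via the equivalent `--pool=solo` CLI flag rather than in code):

```python
from celery import Celery

app = Celery("opentranscribe")

# The default prefork pool forks workers after the HIP/ROCm runtime has
# initialized, which can corrupt GPU state in the children. The solo pool
# runs tasks in the main worker process instead, at the cost of handling
# only one task at a time.
app.conf.worker_pool = "solo"
app.conf.worker_concurrency = 1
```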

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

Changes made

Commit 1: feat(rocm): Add AMD ROCm GPU support infrastructure

  • Dockerfile.rocm — Multi-stage build installing PyTorch ROCm 6.4, CTranslate2 ROCm wheel, and MIOpen JIT compilation header stubs. Navigates the ROCm library maze (PyTorch bundles ROCm 6.4, CTranslate2 needs ROCm 7.0 system libs, both coexist via different SONAMEs).
  • requirements-rocm.txt — Python deps mirroring requirements.txt but with ROCm-specific PyTorch wheels and
    CTranslate2 installed separately via Dockerfile.
  • docker-compose.rocm-build.yml — Build overlay that points all backend services at Dockerfile.rocm.
  • docker-compose.gpu-rocm.yml — Runtime overlay with /dev/kfd + /dev/dri passthrough, render group GID mapping, HSA_OVERRIDE_GFX_VERSION, and shared memory sizing.
  • opentr.sh — Auto-detects ROCm GPUs (/dev/kfd) and injects the correct compose overlays across all code paths
    (start, reset, rebuild-backend, build).
  • .env.example — Documents HSA_OVERRIDE_GFX_VERSION and RENDER_GROUP_GID configuration.
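
The auto-detection in `opentr.sh` hinges on the presence of `/dev/kfd`, the ROCm kernel fusion driver node. Sketched in Python for clarity (the shell script itself implements this logic; the helper name `compose_overlays` is illustrative):

```python
from pathlib import Path


def compose_overlays(dev_root: str = "/dev") -> list:
    """Return extra docker-compose overlay files when an AMD GPU is present.

    /dev/kfd is the ROCm kernel fusion driver device node; its existence is
    the signal that a ROCm-capable GPU (and driver) is available.
    """
    overlays = []
    if (Path(dev_root) / "kfd").exists():
        overlays += [
            "docker-compose.rocm-build.yml",  # build backend from Dockerfile.rocm
            "docker-compose.gpu-rocm.yml",    # /dev/kfd + /dev/dri passthrough
        ]
    return overlays
```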

Commit 2: feat(rocm): Add ROCm awareness to Python backend

  • hardware_detection.py — Adds is_rocm property to detect HIP backend; skips NVIDIA-specific env vars (TORCH_CUDA_ARCH_LIST) on ROCm; reports gpu_backend: rocm and hip_version in hardware summary; skips NVIDIA Docker driver config for ROCm containers.
  • utility.py — Uses rocm-smi CLI for GPU stats (temperature, VRAM, utilization) on ROCm, with fallback to PyTorch
    CUDA API for memory stats when rocm-smi is unavailable.
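
The two backend changes above can be sketched as follows. The function names and the exact `rocm-smi` JSON key names are illustrative assumptions, not the PR's actual code (key names vary between ROCm releases, so real code should probe for them):

```python
import json


def is_rocm(torch_module) -> bool:
    """True when the installed PyTorch was built against ROCm/HIP.

    ROCm builds of PyTorch expose torch.version.hip as a version string;
    CUDA builds leave it as None (or absent entirely).
    """
    version = getattr(torch_module, "version", None)
    return getattr(version, "hip", None) is not None


def parse_rocm_smi_vram(smi_json: str) -> dict:
    """Extract per-GPU VRAM stats from `rocm-smi --showmeminfo vram --json`.

    Assumes one common output layout: a mapping of card name to a dict of
    labeled byte counts.
    """
    stats = {}
    for card, fields in json.loads(smi_json).items():
        stats[card] = {
            "used_bytes": int(fields["VRAM Total Used Memory (B)"]),
            "total_bytes": int(fields["VRAM Total Memory (B)"]),
        }
    return stats
```

Because `is_rocm` only inspects attributes, it degrades gracefully on CPU-only or CUDA builds of PyTorch.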

Commit 3: fix(deps): Pin huggingface-hub<1.0.0 for pyannote.audio compatibility

  • Pins huggingface-hub<1.0.0 because pyannote.audio v3 uses the deprecated use_auth_token parameter removed in 1.0.0.
  • Kept as a separate commit so it can be easily cherry-picked since this issue also affects the CUDA build (requirements.txt on master).
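
The pin sidesteps the removal. A hypothetical alternative (not what this PR does) would be a compatibility shim that picks the keyword based on the installed huggingface-hub version, since `use_auth_token` was replaced by `token` and finally removed in 1.0.0:

```python
def auth_kwarg(hub_version: str, token: str) -> dict:
    """Pick the auth keyword accepted by a given huggingface-hub version.

    Versions before 1.0.0 accept the deprecated use_auth_token=...;
    from 1.0.0 onward only token=... exists.
    """
    major = int(hub_version.split(".")[0])
    return {"token": token} if major >= 1 else {"use_auth_token": token}
```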

Testing

  • I have tested these changes locally
  • I have added/updated tests for my changes
  • All existing tests pass
  • I have tested with different audio/video formats (if applicable)

Frontend changes (if applicable)

  • Changes work in light and dark mode
  • Changes are responsive on mobile devices
  • No console errors or warnings

Backend changes (if applicable)

  • API endpoints are properly documented
  • Database migrations are included (if needed)
  • Error handling is implemented
  • Logging is appropriate

Documentation

  • I have updated the README if needed
  • I have updated relevant documentation
  • Code is properly commented

Screenshots (if applicable)

Add screenshots to help explain your changes

Additional notes

Any additional information that reviewers should know

Zander and others added 3 commits February 15, 2026 15:20
Add Docker build infrastructure and shell script support for running
OpenTranscribe on AMD GPUs via ROCm/HIP.

New files:
- Dockerfile.rocm: Multi-stage build with PyTorch ROCm 6.4, CTranslate2
  ROCm wheel, and MIOpen JIT compilation headers
- requirements-rocm.txt: Python dependencies with ROCm-specific PyTorch
- docker-compose.rocm-build.yml: Build overlay for ROCm backend image
- docker-compose.gpu-rocm.yml: Runtime overlay with GPU device passthrough,
  render group mapping, and ROCm environment variables

Modified files:
- opentr.sh: Auto-detect ROCm GPUs and inject compose overlays
- .env.example: Document HSA_OVERRIDE_GFX_VERSION and RENDER_GROUP_GID

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update hardware detection and GPU monitoring to handle AMD ROCm/HIP
alongside NVIDIA CUDA:

- hardware_detection.py: Add is_rocm property to detect HIP backend,
  skip NVIDIA-specific env vars (TORCH_CUDA_ARCH_LIST) on ROCm,
  report gpu_backend and hip_version in hardware summary, skip
  NVIDIA driver config in Docker runtime helper
- utility.py: Use rocm-smi for GPU stats on ROCm (temperature, VRAM,
  utilization), with fallback to PyTorch CUDA API for memory stats

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pyannote.audio v3 uses the deprecated use_auth_token parameter that was
removed in huggingface-hub 1.0.0. Pin to <1.0.0 to prevent runtime
errors during speaker diarization model loading.

This affects the CUDA build as well (requirements.txt on master) but
is kept separate here for easy cherry-pick reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@davidamacey (Owner)

@steamwings thank you for the PR; ROCm is on the long-term plan.

While I am not a regular ROCm user, I am looking forward to learning more.

Thank you for doing the initial test to get this started! This will expand the user base!

I will explore this more after the version 0.4.0 release.

@davidamacey (Owner)

ROCm PR Review & Implementation Plan

@steamwings — thank you so much for putting this together! This is a really solid foundation for AMD GPU support. The work you've done navigating the ROCm library compatibility maze (PyTorch ROCm 6.4 bundled libs + CTranslate2 ROCm 7.0 system libs, MIOpen JIT headers, HSA runtime symlinks) is impressive and clearly shows real hands-on experience with ROCm. The --pool=solo fix for Celery fork safety is exactly right.

We've done a thorough review of the code and researched ROCm best practices for each component in our ML stack (WhisperX, PyAnnote, CTranslate2, Sentence Transformers). Based on your work and our research, we've put together a detailed phased implementation plan that we'll use to build on your foundation after the v0.4.0 release.

Full Implementation Plan

📋 OpenTranscribe ROCm Implementation Plan

This covers:

  • Phase 1: Alignment fixes (requirements sync with main requirements.txt, WhisperX version/install pattern, stability env vars for RDNA3)
  • Phase 2: Architecture improvements (unified requirements management, potential single Dockerfile with build args, enhanced hardware detection)
  • Phase 3: Testing infrastructure (CI build verification, smoke test script, GPU test matrix)
  • Phase 4: Documentation (docs/ROCM_SETUP.md, troubleshooting guide, known limitations)
  • Phase 5: Future enhancements (multi-GPU ROCm support, Docker Hub ROCm images, RDNA4 support)

Key Takeaways from Research

The good news is that all of our ML stack components work on ROCm:

| Component | ROCm Status |
| --- | --- |
| PyTorch 2.8.0 | Official wheels for ROCm 6.4 |
| CTranslate2 4.7.x | Official ROCm support since v4.7.0 (Feb 2026) |
| PyAnnote Audio 3.1+ | Pure PyTorch — works via HIP translation |
| Sentence Transformers | Works with explicit device='cuda' |

We noted a few things we'll want to address when we pick this up:

  • Syncing requirements-rocm.txt with our current requirements.txt (some packages drifted and the WhisperX --no-deps install pattern needs to match our v4 pyannote compatibility layer)
  • Adding RDNA3 stability environment variables (PYTORCH_HIP_ALLOC_CONF, MIOPEN_FIND_MODE)
  • Refactoring the repeated overlay injection blocks in opentr.sh into a helper function
  • The huggingface-hub<1.0.0 pin you identified is a great catch — we'll look at cherry-picking that to master as you suggested

We'll work through the plan after v0.4.0 and will keep this PR as the reference point. Really appreciate you getting the ball rolling on this — it's going to open up OpenTranscribe to a much wider audience of GPU users!
