[FEAT]: Migrate AI model inference to GPU acceleration #563

@marcvergees

📝 Description

Migrate the AI model inference pipeline to GPU acceleration to reduce inference latency and raise throughput. This includes enabling CUDA/cuDNN support where applicable and switching to GPU-enabled builds of the inference libraries.

💡 Rationale

Current CPU-based inference is slow for larger inputs and high-throughput workloads. GPU acceleration will reduce latency and increase throughput for AI-driven features in FireForm, improving user experience and enabling more complex models to be used in production.

🛠️ Proposed Solution

A brief plan to implement GPU acceleration:

  • Add GPU-capable dependencies (e.g., torch with CUDA support, CUDA-enabled runtime images) and document compatibility in requirements.
  • Update inference code in src/ to select GPU when available and fall back to CPU.
  • Add a Dockerfile variant or runtime configuration for CUDA-enabled containers (nvidia/cuda base images or NVIDIA Container Toolkit support).
  • Integrate an optional TensorRT/ONNX Runtime backend for further optimization (if applicable).
  • Add CI/dev steps for smoke testing on GPU-enabled runners, or document how to test locally with Docker and the NVIDIA Container Toolkit.
  • Add performance benchmarks and example configs for common model sizes.
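
The "select GPU when available and fall back to CPU" step could look roughly like the sketch below. It assumes PyTorch is the inference backend (the issue names torch only as an example), and `select_device` is a hypothetical helper name, not existing FireForm code.

```python
import importlib.util


def select_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a usable GPU is present, otherwise "cpu".

    Hypothetical helper for illustration; not part of FireForm.
    """
    if not prefer_gpu:
        return "cpu"
    # Treat a missing PyTorch install the same as a missing GPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.cuda.is_available() is False on CPU-only builds and on
    # machines without a working NVIDIA driver.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running inference on: {select_device()}")
```

With PyTorch, the returned string can be passed to `model.to(device)` and `tensor.to(device)`, so the rest of the inference code does not need to branch on hardware.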

✅ Acceptance Criteria

  • GPU-enabled Docker image runs inference successfully.
  • Code auto-detects a GPU and falls back to CPU when none is available.
  • Documentation updated in docs/ with setup and testing steps for GPU.
  • Measurable performance improvement documented (e.g., latency and throughput benchmarks) for one or more representative models.
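
One possible way to produce the latency and throughput numbers for the last criterion is a small timing harness like the sketch below. It uses only the standard library; `benchmark` and its `infer` argument are hypothetical names standing in for whatever callable wraps the model.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(
    infer: Callable[[object], object],
    inputs: Sequence[object],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time one call per input and summarize latency and throughput."""
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up calls are not timed; the first calls often pay one-off
    # costs such as model loading or kernel compilation.
    for item in inputs[:warmup]:
        infer(item)

    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        infer(item)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start

    ordered = sorted(latencies)
    return {
        "mean_latency_s": statistics.mean(latencies),
        # Approximate 95th percentile (index rounded down).
        "p95_latency_s": ordered[int(0.95 * (len(ordered) - 1))],
        "throughput_per_s": len(inputs) / total if total > 0 else float("inf"),
    }
```

Running the same inputs through a CPU-backed and a GPU-backed `infer` gives directly comparable figures. Because CUDA kernels launch asynchronously, a GPU-backed `infer` should block until its result is ready (for example by calling `torch.cuda.synchronize()` before returning), otherwise the measured latency will be too low.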

📌 Additional Context

  • Consider a compatibility matrix for supported CUDA/cuDNN and PyTorch/TensorRT versions.
  • If the project supports multiple model backends (Mistral, Ollama, etc.), document which backends can use GPU acceleration and any extra setup required.
  • This change may require updating the deployment documentation and CI to support GPU testing, or else providing clear instructions for maintainers on how to validate changes.
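
To help fill in a compatibility matrix, maintainers could record the versions present on each machine they validate. The sketch below is one way to do that, again assuming PyTorch; `gpu_environment_report` is a hypothetical name, and every field other than the Python version stays `None` when PyTorch is not installed.

```python
import importlib.util
import platform
from typing import Dict, Optional


def gpu_environment_report() -> Dict[str, Optional[str]]:
    """Collect the version strings relevant to a CUDA/cuDNN/PyTorch matrix."""
    report: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cudnn": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch"] = str(torch.__version__)
    # CUDA version PyTorch was built against; None on CPU-only builds.
    report["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None when cuDNN is unavailable.
    cudnn = torch.backends.cudnn.version()
    report["cudnn"] = str(cudnn) if cudnn is not None else None
    return report


if __name__ == "__main__":
    for name, value in gpu_environment_report().items():
        print(f"{name}: {value}")
```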

Labels

enhancement (New feature or request)