[FEAT]: Migrate AI model inference to GPU acceleration #563

## 📝 Description
Migrate the AI model inference pipeline to GPU acceleration so that inference runs with lower latency and higher throughput. This includes enabling CUDA/cuDNN support where applicable and switching to GPU-enabled builds of the inference libraries.

## 💡 Rationale
Current CPU-based inference is slow for larger inputs and high-throughput workloads. GPU acceleration will reduce latency and increase throughput for AI-driven features in FireForm, improving user experience and enabling more complex models to be used in production.

## 🛠️ Proposed Solution
A brief plan to implement GPU acceleration:
- [ ] Add GPU-capable dependencies (e.g., PyTorch built with CUDA support, CUDA-enabled runtime images) and document version compatibility alongside the requirements.
- [ ] Update inference code in `src/` to select GPU when available and fall back to CPU.
- [ ] Add a Dockerfile variant or runtime configuration for CUDA-enabled containers (`nvidia/cuda` base images or NVIDIA Container Toolkit support).
- [ ] Integrate optional TensorRT/ONNX Runtime backend for further optimization (if applicable).
- [ ] Add CI/dev steps for smoke testing on GPU-enabled runners, or document how to test locally with the NVIDIA Container Toolkit.
- [ ] Add performance benchmarks and example configs for common model sizes.
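
The "select GPU when available and fall back to CPU" step could look roughly like the sketch below. It assumes the inference code is PyTorch-based, which this issue suggests but does not confirm; `select_device` and its `force_cpu` flag are hypothetical names, not existing FireForm code.

```python
# Hypothetical helper; FireForm's actual inference code in src/ may differ.
try:
    import torch
except ImportError:  # PyTorch is not installed at all
    torch = None


def select_device(force_cpu: bool = False) -> str:
    """Return "cuda" when PyTorch can see a usable GPU, otherwise "cpu"."""
    if force_cpu or torch is None:
        return "cpu"
    # torch.cuda.is_available() is False on CPU-only builds and on hosts
    # without a working NVIDIA driver, so the fallback covers both cases.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running inference on: {select_device()}")
```

The returned string can be passed to `model.to(device)` and `tensor.to(device)`. An explicit `force_cpu` override is one way to let maintainers reproduce CPU behaviour on a GPU machine.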

## ✅ Acceptance Criteria
- [ ] GPU-enabled Docker image runs inference successfully.
- [ ] Code auto-detects a GPU and falls back to CPU when none is available.
- [ ] Documentation updated in `docs/` with setup and testing steps for GPU.
- [ ] Measurable performance improvement documented (e.g., latency and throughput benchmarks) for one or more representative models.
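
One possible shape for the latency and throughput benchmark is sketched below. It uses only the standard library and times any zero-argument callable; `benchmark`, `run_once`, and `sync` are hypothetical names introduced for illustration. Because CUDA kernels launch asynchronously, GPU timings are only meaningful if the device is synchronized before the clock is read, which is what the optional `sync` hook is for (for PyTorch, `torch.cuda.synchronize` would be a natural choice).

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(run_once: Callable[[], object],
              batch_size: int = 1,
              warmup: int = 3,
              iters: int = 20,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time run_once and report latency (ms) and throughput (items/s)."""
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        latencies.append(time.perf_counter() - start)

    mean_s = statistics.fmean(latencies)
    return {
        "mean_ms": mean_s * 1000,
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
        "throughput_per_s": batch_size / mean_s if mean_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    print(benchmark(lambda: sum(range(100_000)), batch_size=8))
```

Running the same harness once on CPU and once on GPU, with identical inputs and batch size, gives directly comparable numbers for the documentation.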

## 📌 Additional Context
- Consider a compatibility matrix for supported CUDA/cuDNN and PyTorch/TensorRT versions.
- If the project supports multiple model backends (Mistral/Ollama/etc.), document which backends can use GPU acceleration and any extra setup required.
- This change may require updates to deployment documentation and CI to support GPU testing or clear instructions for maintainers on how to validate changes.
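
To help fill in a compatibility matrix, a small script could record which versions are actually present on a given machine. The sketch below is an assumption about how that might be done, not part of the project: `gpu_environment_report` is a hypothetical name, and both PyTorch and ONNX Runtime are treated as optional so the script also runs on hosts where neither is installed.

```python
def gpu_environment_report() -> dict:
    """Collect library and CUDA/cuDNN versions visible on this host."""
    report = {
        "torch": None,
        "cuda": None,
        "cudnn": None,
        "gpu_available": False,
        "onnxruntime": None,
        "onnxruntime_providers": [],
    }

    try:
        import torch
    except ImportError:
        pass  # PyTorch not installed; leave defaults
    else:
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None on CPU-only builds
        report["gpu_available"] = torch.cuda.is_available()
        try:
            report["cudnn"] = torch.backends.cudnn.version()
        except Exception:
            pass  # cuDNN could not be queried on this build

    try:
        import onnxruntime
    except ImportError:
        pass  # ONNX Runtime not installed; leave defaults
    else:
        report["onnxruntime"] = onnxruntime.__version__
        report["onnxruntime_providers"] = list(
            onnxruntime.get_available_providers()
        )

    return report


if __name__ == "__main__":
    for key, value in gpu_environment_report().items():
        print(f"{key}: {value}")
```

Attaching this output to benchmark results or bug reports makes it easier to tell which combinations have been validated.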
