
fix: respect NODE_LLAMA_CPP_GPU and GGML_CUDA env vars for CPU-only mode#266

Open
MichalSy wants to merge 1 commit into tobi:main from MichalSy:fix/respect-node-llama-cpp-gpu-env-var

Conversation

@MichalSy

Problem

On systems without a physical GPU, qmd still attempts to use CUDA if the CUDA toolkit or libraries are installed. This happens because:

  1. getLlamaGpuTypes() from node-llama-cpp detects CUDA libraries on the system
  2. qmd then calls getLlama({ gpu: "cuda", ... }) to use CUDA
  3. This triggers a CUDA build of llama.cpp, which can consume significant time and CPU
  4. On systems without a GPU, this build ultimately fails, or qmd runs very slowly in CPU fallback mode

Even when setting environment variables such as NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF, qmd would ignore them and still try to use CUDA if it was detected.

Impact

  • High CPU usage: Users without GPUs see 100-400% CPU usage from CUDA build attempts
  • Slow startup: qmd takes much longer to initialize while building CUDA binaries
  • Wasted disk space: CUDA binaries (~50MB) are built even though they can't be used
  • Confusing behavior: Setting NODE_LLAMA_CPP_GPU=false has no effect

Solution

This PR modifies the ensureLlama() function in src/llm.ts to:

  1. Check environment variables first before calling getLlamaGpuTypes()
  2. If NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF is set, skip GPU detection entirely
  3. Force CPU-only mode with getLlama({ gpu: false, ... })
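The check described above can be sketched roughly as follows. This is a minimal illustration, not the exact PR diff: `shouldForceCpu` is a hypothetical helper name, and the real change lives inside `ensureLlama()` in `src/llm.ts`.

```typescript
// Hypothetical helper illustrating the env-var check described above.
// The actual PR modifies ensureLlama() in src/llm.ts; names here are illustrative.
function shouldForceCpu(
  env: Record<string, string | undefined> = process.env
): boolean {
  return env.NODE_LLAMA_CPP_GPU === "false" || env.GGML_CUDA === "OFF";
}

// Inside ensureLlama(), the detection step would then be skipped entirely,
// along the lines of:
//
//   if (shouldForceCpu()) {
//     return getLlama({ gpu: false /* , ...other options */ });
//   }
//   const gpuTypes = await getLlamaGpuTypes(); // only reached when not forced to CPU
```

Because the env vars are read before `getLlamaGpuTypes()` is ever called, no CUDA probing or building happens at all in CPU-only mode.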

Environment Variables Supported

| Variable | Value | Effect |
| --- | --- | --- |
| `NODE_LLAMA_CPP_GPU` | `"false"` | Force CPU-only mode |
| `GGML_CUDA` | `"OFF"` | Force CPU-only mode |

Usage

Users without GPUs can now reliably force CPU-only mode:

```bash
# Option 1: Set in shell
export NODE_LLAMA_CPP_GPU=false
qmd search "query"

# Option 2: Set inline
NODE_LLAMA_CPP_GPU=false qmd search "query"

# Option 3: In wrapper script
#!/bin/bash
export NODE_LLAMA_CPP_GPU=false
export GGML_CUDA=OFF
qmd "$@"
```

Testing

Tested on a system without a GPU:

  • ✅ No CUDA builds triggered when NODE_LLAMA_CPP_GPU=false is set
  • ✅ qmd uses pre-built CPU binaries from @node-llama-cpp/linux-x64
  • ✅ CPU usage remains normal during embedding operations

Related

  • tobi#185

Notes

  • This is a non-breaking change: existing behavior is preserved for users who don't set the environment variables
  • Users with working GPU setups are unaffected
  • This aligns qmd's behavior with user expectations from node-llama-cpp environment variables

When running on systems without a GPU, qmd would still try to use CUDA
if the CUDA libraries were detected by getLlamaGpuTypes(). This caused
unnecessary CUDA builds and high CPU usage.

This fix checks for NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF environment
variables and forces CPU-only mode, skipping GPU detection entirely.

Related: tobi#185
