
fix: respect NODE_LLAMA_CPP_GPU and GGML_CUDA env vars for CPU-only mode#266

Open
MichalSy wants to merge 1 commit into tobi:main from MichalSy:fix/respect-node-llama-cpp-gpu-env-var

Conversation

@MichalSy

Problem

On systems without a physical GPU, qmd still attempts to use CUDA if the CUDA toolkit or libraries are installed. This happens because:

  1. getLlamaGpuTypes() from node-llama-cpp detects CUDA libraries on the system
  2. qmd then calls getLlama({ gpu: "cuda", ... }) to use CUDA
  3. This triggers a CUDA build of llama.cpp, which can consume significant time and CPU
  4. On systems without a GPU, this build ultimately fails, or qmd runs very slowly in CPU fallback mode

Even when setting environment variables such as NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF, qmd would ignore them and still try to use CUDA if it was detected.

Impact

  • High CPU usage: Users without GPUs see 100-400% CPU usage from CUDA build attempts
  • Slow startup: qmd takes much longer to initialize while building CUDA binaries
  • Wasted disk space: CUDA binaries (~50MB) are built even though they can't be used
  • Confusing behavior: Setting NODE_LLAMA_CPP_GPU=false has no effect

Solution

This PR modifies the ensureLlama() function in src/llm.ts to:

  1. Check environment variables first before calling getLlamaGpuTypes()
  2. If NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF is set, skip GPU detection entirely
  3. Force CPU-only mode with getLlama({ gpu: false, ... })
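The check described above can be sketched roughly as follows. This is a minimal illustration, not the exact PR diff: `shouldForceCpu` is a hypothetical helper name, and the real change lives inside `ensureLlama()` in `src/llm.ts`.

```typescript
// Hypothetical helper illustrating the env-var check described above.
// The actual PR modifies ensureLlama() in src/llm.ts; names here are illustrative.
function shouldForceCpu(
  env: Record<string, string | undefined> = process.env
): boolean {
  return env.NODE_LLAMA_CPP_GPU === "false" || env.GGML_CUDA === "OFF";
}

// Inside ensureLlama(), the detection step would then be skipped entirely,
// along the lines of:
//
//   if (shouldForceCpu()) {
//     return getLlama({ gpu: false /* , ...other options */ });
//   }
//   const gpuTypes = await getLlamaGpuTypes(); // only reached when not forced to CPU
```

Because the env vars are read before `getLlamaGpuTypes()` is ever called, no CUDA probing or building happens at all in CPU-only mode.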

Environment Variables Supported

| Variable | Value | Effect |
| --- | --- | --- |
| `NODE_LLAMA_CPP_GPU` | `"false"` | Force CPU-only mode |
| `GGML_CUDA` | `"OFF"` | Force CPU-only mode |

Usage

Users without GPUs can now reliably force CPU-only mode:

```bash
# Option 1: Set in shell
export NODE_LLAMA_CPP_GPU=false
qmd search "query"

# Option 2: Set inline
NODE_LLAMA_CPP_GPU=false qmd search "query"

# Option 3: In wrapper script
#!/bin/bash
export NODE_LLAMA_CPP_GPU=false
export GGML_CUDA=OFF
qmd "$@"
```

Testing

Tested on a system without a GPU:

  • ✅ No CUDA builds triggered when NODE_LLAMA_CPP_GPU=false is set
  • ✅ qmd uses pre-built CPU binaries from @node-llama-cpp/linux-x64
  • ✅ CPU usage remains normal during embedding operations

Related

  • tobi#185

Notes

  • This is a non-breaking change: existing behavior is preserved for users who don't set the environment variables
  • Users with working GPU setups are unaffected
  • This aligns qmd's behavior with user expectations from node-llama-cpp environment variables

When running on systems without a GPU, qmd would still try to use CUDA
if the CUDA libraries were detected by getLlamaGpuTypes(). This caused
unnecessary CUDA builds and high CPU usage.

This fix checks for NODE_LLAMA_CPP_GPU=false or GGML_CUDA=OFF environment
variables and forces CPU-only mode, skipping GPU detection entirely.

Related: tobi#185
