feat: support QMD_EMBED_MODEL and QMD_EMBED_CTX_SIZE env vars #261

Open

ehc-io wants to merge 1 commit into tobi:main from ehc-io:feat/env-embed-model-override
Conversation

@ehc-io ehc-io commented Feb 25, 2026

Summary

Allow overriding the default embedding model and context size via environment variables, enabling use of alternative GGUF embedding models without code changes.

New Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `QMD_EMBED_MODEL` | Override the embedding model URI (any `hf:` URI or local path) | `embeddinggemma-300M` |
| `QMD_EMBED_CTX_SIZE` | Override the embedding context size (tokens) | Model's GGUF default |

Usage

```sh
export QMD_EMBED_MODEL="hf:nomic-ai/nomic-embed-text-v2-moe-GGUF/nomic-embed-text-v2-moe.Q8_0.gguf"
export QMD_EMBED_CTX_SIZE=2048
qmd embed -f
qmd query "my search"
```

When neither env var is set, behavior is unchanged: qmd falls back to the default embeddinggemma-300M model and the model's own GGUF context size.

Motivation

Some GGUF embedding models (e.g., Nomic Embed v2 MoE) offer significant improvements in speed and multilingual support, but currently there is no way to swap the embedding model without editing the compiled JS.

This is a minimal, non-breaking change — 3 lines added to src/llm.ts.
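Since the PR only describes the change as "3 lines in src/llm.ts" without showing the diff, here is a rough sketch of what such an env-var fallback typically looks like. All names below (`resolveEmbedModel`, `resolveEmbedCtxSize`, `DEFAULT_EMBED_MODEL`) are illustrative, not the actual identifiers in src/llm.ts:

```typescript
// Illustrative sketch only -- hypothetical names, not the real src/llm.ts code.
const DEFAULT_EMBED_MODEL = "embeddinggemma-300M"; // default named in the PR

// In the real code, `env` would be process.env.
function resolveEmbedModel(env: Record<string, string | undefined>): string {
  // QMD_EMBED_MODEL wins when set; otherwise keep the stock default.
  return env.QMD_EMBED_MODEL ?? DEFAULT_EMBED_MODEL;
}

function resolveEmbedCtxSize(env: Record<string, string | undefined>): number | undefined {
  // Returning undefined lets the model's GGUF default context size apply.
  const raw = env.QMD_EMBED_CTX_SIZE;
  if (!raw) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}
```

Rejecting non-numeric or non-positive values (rather than passing them through) keeps a typo in `QMD_EMBED_CTX_SIZE` from silently breaking embedding.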

Testing

Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 (Metal):

  • 3x faster embedding throughput (26.3 KB/s vs 8.1 KB/s)
  • 2.4x faster query embedding (378ms vs 923ms)
  • Improved retrieval quality on security/vulnerability queries
  • Same 768-dim vectors — no storage schema changes

Why QMD_EMBED_CTX_SIZE?

Some models' GGUF metadata reports a default context size smaller than qmd's 900-token chunks. Without an override, embedding fails with the error "Input is longer than the context size". The env var lets users set an appropriate context size (e.g., 2048) without touching code.
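To make the failure mode concrete: with 900-token chunks, any model whose effective context window is below 900 tokens cannot embed a full chunk. A toy check (only the 900-token figure comes from the PR; the code is illustrative):

```typescript
const CHUNK_TOKENS = 900; // qmd's chunk size, per the PR description

// A chunk can be embedded only if the context window is at least as large.
function chunkFits(ctxSize: number, chunkTokens: number = CHUNK_TOKENS): boolean {
  return chunkTokens <= ctxSize;
}
```

For example, `chunkFits(512)` is false (the failing case described above), while `chunkFits(2048)` is true, which is why setting `QMD_EMBED_CTX_SIZE=2048` resolves the error.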

Changes

src/llm.ts: 3 lines added, 1 line modified (constructor + ensureEmbedContexts)

Allow overriding the default embedding model and context size via
environment variables, enabling use of alternative GGUF embedding
models (e.g., Nomic Embed v2 MoE) without code changes.

- QMD_EMBED_MODEL: override the embedding model URI
  (falls back to default embeddinggemma-300M if unset)
- QMD_EMBED_CTX_SIZE: override the embedding context size
  (required for models whose GGUF metadata reports a small default)

Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 Metal:
- 3x faster embedding throughput vs embeddinggemma
- 2.4x faster query embedding
- Improved retrieval quality on BEIR-style queries
- Same 768-dim vectors, no schema changes needed