feat: support QMD_EMBED_MODEL and QMD_EMBED_CTX_SIZE env vars #261

Open

ehc-io wants to merge 1 commit into tobi:main from ehc-io:feat/env-embed-model-override
Conversation

@ehc-io ehc-io commented Feb 25, 2026

Summary

Allow overriding the default embedding model and context size via environment variables, enabling use of alternative GGUF embedding models without code changes.

New Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `QMD_EMBED_MODEL` | Override the embedding model URI (any `hf:` URI or local path) | `embeddinggemma-300M` |
| `QMD_EMBED_CTX_SIZE` | Override the embedding context size (tokens) | Model's GGUF default |

Usage

```sh
export QMD_EMBED_MODEL="hf:nomic-ai/nomic-embed-text-v2-moe-GGUF/nomic-embed-text-v2-moe.Q8_0.gguf"
export QMD_EMBED_CTX_SIZE=2048
qmd embed -f
qmd query "my search"
```

When neither env var is set, behavior is unchanged: qmd falls back to the default embeddinggemma-300M model and the model's own GGUF context size.

Motivation

Some GGUF embedding models (e.g., Nomic Embed v2 MoE) offer significant improvements in speed and multilingual support, but currently there is no way to swap the embedding model without editing the compiled JS.

This is a minimal, non-breaking change — 3 lines added to src/llm.ts.
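Since the PR only describes the change as "3 lines in src/llm.ts" without showing the diff, here is a rough sketch of what such an env-var fallback typically looks like. All names below (`resolveEmbedModel`, `resolveEmbedCtxSize`, `DEFAULT_EMBED_MODEL`) are illustrative, not the actual identifiers in src/llm.ts:

```typescript
// Illustrative sketch only -- hypothetical names, not the real src/llm.ts code.
const DEFAULT_EMBED_MODEL = "embeddinggemma-300M"; // default named in the PR

// In the real code, `env` would be process.env.
function resolveEmbedModel(env: Record<string, string | undefined>): string {
  // QMD_EMBED_MODEL wins when set; otherwise keep the stock default.
  return env.QMD_EMBED_MODEL ?? DEFAULT_EMBED_MODEL;
}

function resolveEmbedCtxSize(env: Record<string, string | undefined>): number | undefined {
  // Returning undefined lets the model's GGUF default context size apply.
  const raw = env.QMD_EMBED_CTX_SIZE;
  if (!raw) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}
```

Rejecting non-numeric or non-positive values (rather than passing them through) keeps a typo in `QMD_EMBED_CTX_SIZE` from silently breaking embedding.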

Testing

Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 (Metal):

  • 3x faster embedding throughput (26.3 KB/s vs 8.1 KB/s)
  • 2.4x faster query embedding (378ms vs 923ms)
  • Improved retrieval quality on security/vulnerability queries
  • Same 768-dim vectors — no storage schema changes

Why QMD_EMBED_CTX_SIZE?

Some models' GGUF metadata reports a default context size smaller than qmd's 900-token chunks. Without an override, embedding fails with the error "Input is longer than the context size". The env var lets users set an appropriate context size (e.g., 2048) without touching code.
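To make the failure mode concrete: with 900-token chunks, any model whose effective context window is below 900 tokens cannot embed a full chunk. A toy check (only the 900-token figure comes from the PR; the code is illustrative):

```typescript
const CHUNK_TOKENS = 900; // qmd's chunk size, per the PR description

// A chunk can be embedded only if the context window is at least as large.
function chunkFits(ctxSize: number, chunkTokens: number = CHUNK_TOKENS): boolean {
  return chunkTokens <= ctxSize;
}
```

For example, `chunkFits(512)` is false (the failing case described above), while `chunkFits(2048)` is true, which is why setting `QMD_EMBED_CTX_SIZE=2048` resolves the error.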

Changes

src/llm.ts: 3 lines added, 1 line modified (constructor + ensureEmbedContexts)

Allow overriding the default embedding model and context size via
environment variables, enabling use of alternative GGUF embedding
models (e.g., Nomic Embed v2 MoE) without code changes.

- QMD_EMBED_MODEL: override the embedding model URI
  (falls back to default embeddinggemma-300M if unset)
- QMD_EMBED_CTX_SIZE: override the embedding context size
  (required for models whose GGUF metadata reports a small default)

Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 Metal:
- 3x faster embedding throughput vs embeddinggemma
- 2.4x faster query embedding
- Improved retrieval quality on BEIR-style queries
- Same 768-dim vectors, no schema changes needed