feat: support QMD_EMBED_MODEL and QMD_EMBED_CTX_SIZE env vars #261
Open
Allow overriding the default embedding model and context size via environment variables, enabling use of alternative GGUF embedding models (e.g., Nomic Embed v2 MoE) without code changes.

- QMD_EMBED_MODEL: override the embedding model URI (falls back to default embeddinggemma-300M if unset)
- QMD_EMBED_CTX_SIZE: override the embedding context size (required for models whose GGUF metadata reports a small default)

Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 Metal:
- 3x faster embedding throughput vs embeddinggemma
- 2.4x faster query embedding
- Improved retrieval quality on BEIR-style queries
- Same 768-dim vectors, no schema changes needed
Summary
Allow overriding the default embedding model and context size via environment variables, enabling use of alternative GGUF embedding models without code changes.
New Environment Variables
- QMD_EMBED_MODEL: embedding model (hf: URI or local path); default: embeddinggemma-300M
- QMD_EMBED_CTX_SIZE: embedding context size override
Usage
Without the env vars, behavior is unchanged (falls back to embeddinggemma-300M).
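The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the code from src/llm.ts: `resolveEmbedConfig`, `EmbedConfig`, and `DEFAULT_EMBED_MODEL` are hypothetical names, and the default string is a placeholder for the real default model URI. Only the two environment variable names come from this PR.

```typescript
// Hypothetical sketch: everything except the two env var names is illustrative.
const DEFAULT_EMBED_MODEL = "embeddinggemma-300M"; // placeholder for the real default URI

interface EmbedConfig {
  model: string;
  // undefined means "no override": use whatever the model's GGUF metadata reports
  ctxSize: number | undefined;
}

function resolveEmbedConfig(env: Record<string, string | undefined>): EmbedConfig {
  // An unset or blank QMD_EMBED_MODEL falls back to the default model.
  const model = env.QMD_EMBED_MODEL?.trim() || DEFAULT_EMBED_MODEL;

  // Accept only positive integers for QMD_EMBED_CTX_SIZE; ignore anything else.
  const raw = env.QMD_EMBED_CTX_SIZE?.trim();
  const parsed = raw ? Number(raw) : NaN;
  const ctxSize = Number.isInteger(parsed) && parsed > 0 ? parsed : undefined;

  return { model, ctxSize };
}
```

In a Node.js process this would be called as `resolveEmbedConfig(process.env)`. Treating an invalid context size as "no override" is a design choice made for this sketch; the actual change may handle bad values differently.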
Motivation
Some GGUF embedding models (e.g., Nomic Embed v2 MoE) offer significant improvements in speed and multilingual support. Currently there's no way to swap the embedding model without editing the compiled JS.
This is a minimal, non-breaking change: 3 lines added to src/llm.ts.
Testing
Tested with nomic-embed-text-v2-moe (Q8_0, 768d) on Apple M4 (Metal).
Why QMD_EMBED_CTX_SIZE?
Some models' GGUF metadata reports a default context size smaller than qmd's 900-token chunks. Without an override, embedding fails with "Input is longer than the context size". The env var lets users set an appropriate context size (e.g., 2048) without touching code.
Changes
src/llm.ts: 3 lines added, 1 line modified (constructor + ensureEmbedContexts)
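To make the context-size failure from "Why QMD_EMBED_CTX_SIZE?" concrete, here is a rough sketch of the condition involved. `fitsContext` is a hypothetical helper, not part of qmd or the embedding runtime, and the 512-token default is an assumed example value. The real check may also count special tokens added around the input.

```typescript
// Illustrative only: not the actual check performed by qmd or the embedding runtime.
const CHUNK_TOKENS = 900; // chunk size stated in the PR description

// Returns true when an input of `inputTokens` tokens fits in a context of `ctxSize` tokens.
function fitsContext(inputTokens: number, ctxSize: number): boolean {
  return inputTokens <= ctxSize;
}

// Assumed example: a model whose metadata reports a 512-token default context.
const fitsSmallDefault = fitsContext(CHUNK_TOKENS, 512); // false: the chunk is too long
// With the override suggested in the description (QMD_EMBED_CTX_SIZE=2048).
const fitsWithOverride = fitsContext(CHUNK_TOKENS, 2048); // true
```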