feat(llm): add API backend support with backend-aware session routing #204

marcinbogdanski wants to merge 20 commits into tobi:main
Conversation
Added a simple guard to protect existing embeddings in the database in case the user changes the embedding backend/provider.

Problem:

How the new guard works:

I investigated adding model/provider scoping to the relevant tables, but it would be too intrusive IMHO for this PR. I'm happy to look into this again (to provide proper multi-model support in a single index file) if such functionality is required. For now, if you want to switch models, you need to re-embed or use a separate index.
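For illustration, a minimal sketch of such a guard, assuming the index stores a fingerprint of the embedding configuration alongside the embeddings; all names here (`EmbedConfig`, `assertEmbeddingConfigUnchanged`) are hypothetical, not qmd's actual schema:

```ts
// Hypothetical sketch of the embedding guard; names and schema are
// illustrative, not qmd's actual implementation.
interface EmbedConfig {
  backend: "local" | "api";
  provider: string; // e.g. "llama-cpp", "openai", "cohere"
  model: string;
}

// `stored` would be written alongside the index the first time
// embeddings are created; `current` comes from the active config.
function assertEmbeddingConfigUnchanged(
  stored: EmbedConfig | null,
  current: EmbedConfig,
): void {
  if (stored === null) return; // fresh index, nothing to protect
  const same =
    stored.backend === current.backend &&
    stored.provider === current.provider &&
    stored.model === current.model;
  if (!same) {
    throw new Error(
      `Index was embedded with ${stored.provider}/${stored.model}; ` +
        `refusing to mix in ${current.provider}/${current.model}. ` +
        `Re-embed the index or use a separate index file.`,
    );
  }
}
```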
@marcinbogdanski would you be willing to support this FR too: #239?
@Tosko4 This PR supports the Cohere re-ranker, but in a different way than #239. This PR implements an alternative backend, ApiLLM, that slots in as a replacement for LlamaCpp (with local models), plus the minimal additional changes needed to make the backend swappable. Unfortunately, at present this is all-or-nothing: you either disable local models entirely and do embedding/query-expansion/re-ranking via API, or you stick with local models for all steps. I want to keep this PR minimally intrusive so I can easily rebase onto the latest qmd. More fine-grained backend control (switch only re-ranking to API, keep the others local) would require a significantly more intrusive refactor of qmd. I could have a look, but it would require discussing details and scope with @tobi to ensure it's aligned with his long-term goals for the repo.
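To illustrate the all-or-nothing point, a rough sketch of what the single backend switch looks like; the `LLMBackend` interface and factory names below are assumptions standing in for the PR's actual code (the commit list mentions a `getDefaultLLM` and strict `QMD_LLM_BACKEND` validation):

```ts
// Illustrative sketch of the all-or-nothing backend switch; the interface
// and factory names are assumptions, not the PR's actual code.
interface LLMBackend {
  embed(texts: string[]): Promise<number[][]>;
  chat(prompt: string): Promise<string>;
  rerank(query: string, docs: string[]): Promise<number[]>;
}

// Stubs standing in for the real ApiLLM / LlamaCpp implementations.
declare function makeApiLLM(): LLMBackend;      // remote /v1/* endpoints
declare function makeLlamaCppLLM(): LLMBackend; // local models, current default

// One switch selects the backend for ALL stages (embedding, query
// expansion, re-ranking); there is no per-stage override.
function getDefaultLLM(): LLMBackend {
  const backend = process.env.QMD_LLM_BACKEND;
  if (backend === undefined) return makeLlamaCppLLM(); // unset: behavior unchanged
  if (backend === "api") return makeApiLLM();
  throw new Error(`Unknown QMD_LLM_BACKEND: ${backend}`); // strict validation
}
```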
Commits (messages truncated):

- …nv vars to QMD_{EMBED|CHAT|RERANK}_*
- …bility; keep rerank Cohere-only
- …ract/live provider tests
- … handling, and make QMD_LLM_BACKEND validation strict
- …s use configured API models
- …tDefaultLLM for embeddings
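The `QMD_{EMBED|CHAT|RERANK}_*` naming in the commits above suggests per-stage env var scoping; a hedged sketch of how such variables could be read, where the helper and the specific keys are illustrative assumptions:

```ts
// Sketch of per-stage env scoping implied by QMD_{EMBED|CHAT|RERANK}_*;
// the helper and the specific keys below are assumptions.
type Stage = "EMBED" | "CHAT" | "RERANK";

function stageEnv(stage: Stage, key: string): string | undefined {
  return process.env[`QMD_${stage}_${key}`];
}

const embedModel = stageEnv("EMBED", "MODEL");   // e.g. QMD_EMBED_MODEL
const rerankModel = stageEnv("RERANK", "MODEL"); // e.g. QMD_RERANK_MODEL
```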
Hello, looks like the repo is exploding!
Introduce an API backend for LLM support so `qmd` can be run w/o local models. Draft PR for now; feedback welcome, happy to update to maintainer requirements.
Covered:

- `/v1/embeddings` endpoint (call sketch after the lists below)
- `/v1/chat/completions`
- `/v1/rerank`
- `QMD_LLM_BACKEND="api"` env var; current behavior unchanged if unset

Not covered:
- … via short hash and always use it instead of default
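As a usage illustration for the covered endpoints, a minimal sketch of an OpenAI-compatible `/v1/embeddings` call as the API backend might issue it; the `QMD_EMBED_*` variable names and default base URL are assumptions, not necessarily the PR's:

```ts
// Minimal OpenAI-compatible /v1/embeddings call. QMD_LLM_BACKEND is the
// PR's env var; the QMD_EMBED_* names here are illustrative assumptions.
async function embedViaApi(texts: string[]): Promise<number[][]> {
  const base = process.env.QMD_EMBED_BASE_URL ?? "https://api.openai.com";
  const res = await fetch(`${base}/v1/embeddings`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.QMD_EMBED_API_KEY}`,
    },
    body: JSON.stringify({ model: process.env.QMD_EMBED_MODEL, input: texts }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  const body = await res.json();
  // OpenAI-style response shape: { data: [{ embedding: number[] }, ...] }
  return body.data.map((d: { embedding: number[] }) => d.embedding);
}
```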
Related work / other PRs
Query Expansion - details
- `context` and `includeLexical` supported (see the sketch below)
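A rough sketch of how query expansion might route through `/v1/chat/completions` when the API backend is active; the prompt, env var names, and response parsing are illustrative assumptions, not the PR's actual code:

```ts
// Illustrative only: the PR's actual prompt, options, and parsing may differ.
async function expandQuery(query: string, context?: string): Promise<string[]> {
  const base = process.env.QMD_CHAT_BASE_URL ?? "https://api.openai.com";
  const res = await fetch(`${base}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.QMD_CHAT_API_KEY}`,
    },
    body: JSON.stringify({
      model: process.env.QMD_CHAT_MODEL,
      messages: [
        {
          role: "system",
          content:
            "Rewrite the search query into a few alternative phrasings, one per line.",
        },
        // The supported `context` param is prepended to the query here.
        { role: "user", content: context ? `${context}\n\n${query}` : query },
      ],
    }),
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  const body = await res.json();
  return body.choices[0].message.content.split("\n").filter(Boolean);
}
```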
Error handling

- … (`options` param) is ignored

Hope this is helpful.
PS. I've got some free time, happy to update to spec or do other work if you want to dump anything on me, now that I'm somewhat familiar with the repo.