feat(llm): add API backend support with backend-aware session routing #204

marcinbogdanski wants to merge 20 commits into tobi:main
Conversation
Added a simple guard to protect existing embeddings in the database in case the user changes the embedding backend/provider.

Problem:

How the new guard works:

I investigated adding model/provider scoping to the relevant tables, but it would be too intrusive IMHO for this PR. I'm happy to look into this again (to provide proper multi-model support in a single index file) if such functionality is required. For now, if you want to switch models, you need to re-embed or use a separate index.
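For illustration, a minimal sketch of such a guard, assuming the index stores a fingerprint of the embedding configuration alongside the embeddings; all names here (`EmbedConfig`, `assertEmbeddingConfigUnchanged`) are hypothetical, not qmd's actual schema:

```ts
// Hypothetical sketch of the embedding guard; names and schema are
// illustrative, not qmd's actual implementation.
interface EmbedConfig {
  backend: "local" | "api";
  provider: string; // e.g. "llama-cpp", "openai", "cohere"
  model: string;
}

// `stored` would be written alongside the index the first time
// embeddings are created; `current` comes from the active config.
function assertEmbeddingConfigUnchanged(
  stored: EmbedConfig | null,
  current: EmbedConfig,
): void {
  if (stored === null) return; // fresh index, nothing to protect
  const same =
    stored.backend === current.backend &&
    stored.provider === current.provider &&
    stored.model === current.model;
  if (!same) {
    throw new Error(
      `Index was embedded with ${stored.provider}/${stored.model}; ` +
        `refusing to mix in ${current.provider}/${current.model}. ` +
        `Re-embed the index or use a separate index file.`,
    );
  }
}
```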
@marcinbogdanski would you be willing to support this FR too: #239?
@Tosko4 This PR supports the Cohere re-ranker, but in a different way than #239. This PR implements an alternative backend, ApiLLM, that slots in as a replacement for LlamaCpp (with local models), plus the minimal additional changes needed to make the backend swappable. Unfortunately, at present this is all-or-nothing: you either disable local models entirely and do embedding/query-expansion/re-ranking via API, or you stick with local models for all steps. I want to keep this PR minimally intrusive so I can easily rebase onto the latest qmd. More fine-grained backend control (switch only re-ranking to API, keep the others local) would require a significantly more intrusive refactor of qmd. I could have a look, but it would require discussing details and scope with @tobi to ensure it's aligned with his long-term goals for the repo.
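To illustrate the all-or-nothing point, a rough sketch of what the single backend switch looks like; the `LLMBackend` interface and factory names below are assumptions standing in for the PR's actual code (the commit list mentions a `getDefaultLLM` and strict `QMD_LLM_BACKEND` validation):

```ts
// Illustrative sketch of the all-or-nothing backend switch; the interface
// and factory names are assumptions, not the PR's actual code.
interface LLMBackend {
  embed(texts: string[]): Promise<number[][]>;
  chat(prompt: string): Promise<string>;
  rerank(query: string, docs: string[]): Promise<number[]>;
}

// Stubs standing in for the real ApiLLM / LlamaCpp implementations.
declare function makeApiLLM(): LLMBackend;      // remote /v1/* endpoints
declare function makeLlamaCppLLM(): LLMBackend; // local models, current default

// One switch selects the backend for ALL stages (embedding, query
// expansion, re-ranking); there is no per-stage override.
function getDefaultLLM(): LLMBackend {
  const backend = process.env.QMD_LLM_BACKEND;
  if (backend === undefined) return makeLlamaCppLLM(); // unset: behavior unchanged
  if (backend === "api") return makeApiLLM();
  throw new Error(`Unknown QMD_LLM_BACKEND: ${backend}`); // strict validation
}
```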
Commits (messages truncated):

- …nv vars to QMD_{EMBED|CHAT|RERANK}_*
- …bility; keep rerank Cohere-only
- …ract/live provider tests
- … handling, and make QMD_LLM_BACKEND validation strict
- …s use configured API models
- …tDefaultLLM for embeddings
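The `QMD_{EMBED|CHAT|RERANK}_*` naming in the commits above suggests per-stage env var scoping; a hedged sketch of how such variables could be read, where the helper and the specific keys are illustrative assumptions:

```ts
// Sketch of per-stage env scoping implied by QMD_{EMBED|CHAT|RERANK}_*;
// the helper and the specific keys below are assumptions.
type Stage = "EMBED" | "CHAT" | "RERANK";

function stageEnv(stage: Stage, key: string): string | undefined {
  return process.env[`QMD_${stage}_${key}`];
}

const embedModel = stageEnv("EMBED", "MODEL");   // e.g. QMD_EMBED_MODEL
const rerankModel = stageEnv("RERANK", "MODEL"); // e.g. QMD_RERANK_MODEL
```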
Hello, looks like the repo is exploding!
Introduce an API backend for LLM support so `qmd` can be run w/o local models. Draft PR for now; feedback welcome, happy to update to maintainer requirements.
Covered:

- `/v1/embeddings` endpoint (call sketch after the lists below)
- `/v1/chat/completions`
- `/v1/rerank`
- `QMD_LLM_BACKEND="api"` env var; current behavior unchanged if unset

Not covered:
- … via short hash and always use it instead of default
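As a usage illustration for the covered endpoints, a minimal sketch of an OpenAI-compatible `/v1/embeddings` call as the API backend might issue it; the `QMD_EMBED_*` variable names and default base URL are assumptions, not necessarily the PR's:

```ts
// Minimal OpenAI-compatible /v1/embeddings call. QMD_LLM_BACKEND is the
// PR's env var; the QMD_EMBED_* names here are illustrative assumptions.
async function embedViaApi(texts: string[]): Promise<number[][]> {
  const base = process.env.QMD_EMBED_BASE_URL ?? "https://api.openai.com";
  const res = await fetch(`${base}/v1/embeddings`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.QMD_EMBED_API_KEY}`,
    },
    body: JSON.stringify({ model: process.env.QMD_EMBED_MODEL, input: texts }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  const body = await res.json();
  // OpenAI-style response shape: { data: [{ embedding: number[] }, ...] }
  return body.data.map((d: { embedding: number[] }) => d.embedding);
}
```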
Related work / other PRs
Query Expansion - details
- `context` and `includeLexical` supported (see the sketch below)
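A rough sketch of how query expansion might route through `/v1/chat/completions` when the API backend is active; the prompt, env var names, and response parsing are illustrative assumptions, not the PR's actual code:

```ts
// Illustrative only: the PR's actual prompt, options, and parsing may differ.
async function expandQuery(query: string, context?: string): Promise<string[]> {
  const base = process.env.QMD_CHAT_BASE_URL ?? "https://api.openai.com";
  const res = await fetch(`${base}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.QMD_CHAT_API_KEY}`,
    },
    body: JSON.stringify({
      model: process.env.QMD_CHAT_MODEL,
      messages: [
        {
          role: "system",
          content:
            "Rewrite the search query into a few alternative phrasings, one per line.",
        },
        // The supported `context` param is prepended to the query here.
        { role: "user", content: context ? `${context}\n\n${query}` : query },
      ],
    }),
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  const body = await res.json();
  return body.choices[0].message.content.split("\n").filter(Boolean);
}
```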
Error handling

- … (`options` param) is ignored

Hope this is helpful.
PS. I've got some free time, happy to update to spec or do other work if you want to dump anything on me, now that I'm somewhat familiar with the repo.