feat: allow configurable embedding model for local provider #55

Closed
eugenepro2 wants to merge 2 commits into tirth8205:main from eugenepro2:feat/configurable-embedding-model

@eugenepro2

Summary

  • Allow users to specify any sentence-transformers compatible model for local embeddings instead of the hardcoded all-MiniLM-L6-v2
  • The model can be set via the model parameter on the MCP tool or the CRG_EMBEDDING_MODEL env var; if neither is given, it falls back to the default
  • Changing the model automatically re-embeds all nodes (the provider name stored in the DB acts as the cache key)
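
The fallback order in the summary can be sketched as below. This is a minimal illustration, not the code from embeddings.py: resolve_model_name is a hypothetical helper, and the ordering (explicit parameter, then CRG_EMBEDDING_MODEL, then the default) is assumed from the order in which the options are listed.

```python
import os

# Default model named in the summary above.
DEFAULT_MODEL = "all-MiniLM-L6-v2"


def resolve_model_name(model=None, env=None):
    """Hypothetical helper: pick the embedding model name.

    Assumed priority: explicit argument, then the CRG_EMBEDDING_MODEL
    environment variable, then the built-in default.
    """
    env = os.environ if env is None else env
    # Empty strings are treated as "not set" and fall through to the next source.
    return model or env.get("CRG_EMBEDDING_MODEL") or DEFAULT_MODEL
```

Passing env explicitly keeps the sketch easy to test without touching the real process environment.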

Changes

  • embeddings.py: LocalEmbeddingProvider accepts model_name; get_provider() and EmbeddingStore forward new model kwarg; removed redundant default in Google provider passthrough
  • tools.py: embed_graph and semantic_search_nodes accept a model parameter, which ensures query vectors use the same model as indexing
  • main.py: Both MCP tool wrappers (embed_graph_tool, semantic_search_nodes_tool) expose and forward the model parameter; updated docstrings
  • README.md: Configuration section documents custom model setup via .mcp.json env
  • tests/test_embeddings.py: +10 tests covering model_name priority chain, env var fallback, get_provider/EmbeddingStore passthrough, re-embed on provider change
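
The "re-embed on provider change" behaviour can be pictured roughly as follows. This is a sketch under stated assumptions rather than the actual EmbeddingStore logic: stale_nodes is a hypothetical function, a plain dict stands in for the DB, and the "local:<model>" provider-name format is invented for illustration.

```python
def stale_nodes(stored, node_ids, provider_name):
    """Return the node ids that need (re-)embedding.

    `stored` maps node id -> {"provider": <name>, ...} and stands in for
    the DB rows. A node is stale when it has no stored embedding or when
    its embedding was produced under a different provider name.
    """
    return [
        node_id
        for node_id in node_ids
        if stored.get(node_id, {}).get("provider") != provider_name
    ]


# Assumed provider-name format; the real naming scheme may differ.
stored = {
    "a": {"provider": "local:all-MiniLM-L6-v2"},
    "b": {"provider": "local:all-MiniLM-L6-v2"},
}
unchanged = stale_nodes(stored, ["a", "b", "c"], "local:all-MiniLM-L6-v2")
switched = stale_nodes(stored, ["a", "b", "c"], "local:BAAI/bge-small-en-v1.5")
```

With the same provider name only the new node "c" is stale; after switching models every node is, which matches the summary's claim that a model change re-embeds all nodes.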

Configuration

Via .mcp.json (recommended — applies to all MCP tool calls):

{
  "mcpServers": {
    "code-review-graph": {
      "command": "uvx",
      "args": ["code-review-graph", "serve"],
      "env": {
        "CRG_EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5"
      }
    }
  }
}

Or directly via MCP tool parameter:

embed_graph(model="intfloat/multilingual-e5-small")

tirth8205 added a commit that referenced this pull request Mar 26, 2026
Adds CRG_EMBEDDING_MODEL env var and model parameter to embedding
functions, allowing users to specify any sentence-transformers compatible
model. Changing the model re-embeds all nodes automatically.

Co-Authored-By: eugenepro2 <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tirth8205 added a commit that referenced this pull request Mar 26, 2026
…nfigurable embeddings, MiniMax, Perl)

* feat: integrate PR #43 — R language support

Adds R language parsing with function extraction (both <- and = assignment),
S4/R5 class detection via setClass/setRefClass, library/require/source
imports, namespace-qualified calls (dplyr::filter), and testthat test detection.

Co-Authored-By: michael-denyer <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #54 — Vitest/Jest test detection

Adds describe/it/test block parsing for JS/TS test files, producing
synthetic Test nodes with description labels. Supports modifier suffixes
(describe.only, it.skip, test.each) and nested describe/it containment.

Co-Authored-By: JF10R <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #53 — tsconfig path alias resolution

Adds TsconfigResolver module that resolves TypeScript path aliases
(e.g., @/ -> src/) from tsconfig.json compilerOptions.paths. Also
resolves import targets to absolute file paths in IMPORTS_FROM edges.

Co-Authored-By: JF10R <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #58 — .mjs/.astro support

Adds .mjs extension mapping to JavaScript and .astro extension mapping
to TypeScript for import path resolution.

Co-Authored-By: zoneghost7 <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #55 — configurable embedding model

Adds CRG_EMBEDDING_MODEL env var and model parameter to embedding
functions, allowing users to specify any sentence-transformers compatible
model. Changing the model re-embeds all nodes automatically.

Co-Authored-By: eugenepro2 <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #45 — MiniMax embedding provider

Adds MiniMaxEmbeddingProvider using the embo-01 model (1536 dimensions)
with support for distinct task types (db/query), batching, and retry logic.

Co-Authored-By: octo-patch <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate PR #62 — Perl support

Adds Perl language parsing with package detection, subroutine extraction,
use/require imports, and function call tracking. Includes test fixture
and comprehensive test class.

Co-Authored-By: potatogim <noreply@github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: michael-denyer <noreply@github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tirth8205
Owner

Integrated into main via PR #68. Thank you for the contribution! 🎉

@tirth8205 tirth8205 closed this Mar 26, 2026