lgrep

Dual-engine code intelligence for OpenCode.

lgrep gives AI agents a better first move.

Instead of starting with glob, grep, and random file reads, agents can:

  • search by meaning when they do not know the symbol yet
  • search by symbol when they know the name
  • inspect file and repo structure before opening code
  • reuse one warm local server across multiple sessions and agents

That is the whole pitch: fewer bad searches, less wasted context, faster understanding for both humans and agents.

What lgrep is

lgrep combines two complementary engines in one MCP server:

  • Semantic engine - natural-language code search using Voyage Code 3 embeddings with local LanceDB storage
  • Symbol engine - exact symbol, outline, and text tools using tree-sitter parsing with a local JSON index

Use the semantic engine to answer questions like:

  • "where is auth enforced between route and service?"
  • "how does retry logic work for failed requests?"
  • "where are permissions checked?"

Use the symbol engine to answer questions like:

  • "find the authenticate function"
  • "show me the outline for src/auth.py"
  • "get the symbol source for UserService.login"

Your code stays local. Only short semantic queries and indexing payloads go to Voyage. Symbol lookup stays fully local and works without an API key.

Why it exists

AI coding agents usually fail early, not late.

They miss because they start with the wrong retrieval primitive:

  • grep cannot answer concept questions
  • full-file reads waste tokens on irrelevant code
  • repeated local exploration across parallel agents duplicates work

lgrep fixes that by giving agents a search stack that matches how they actually reason:

  1. find the implementation by intent
  2. narrow to the right file or symbol
  3. retrieve only the code that matters

For heavy OpenCode users, this is not a convenience plugin. It is search infrastructure.

Why lgrep feels different

  • Intent-first search - agents can ask by meaning before they know names
  • Exact structure tools - file outlines, repo outlines, symbol lookup, and text search are in the same server
  • Local-first storage - vectors and indexes live on disk, not in someone else's SaaS
  • Shared warm process - one HTTP MCP server can serve multiple concurrent OpenCode sessions
  • Commercially usable - lgrep is MIT licensed, so commercial use is allowed

Comparisons

lgrep vs grep and ripgrep

grep and rg are still the right tools for exact text and regex lookups, but they are not good at intent discovery.

If the code says jwt.verify() and your agent asks "where is authentication enforced?", text search often misses the right entry point. Semantic search closes that gap.

lgrep vs mgrep

mgrep is the closest semantic-search comparison point.

  • mgrep is semantic-only
  • lgrep combines semantic search with symbol and structure tools
  • mgrep is cloud-oriented
  • lgrep keeps vectors local and shares one warm server across agents

lgrep vs jCodeMunch MCP

jcodemunch-mcp is a strong symbol-first MCP server. It is built around tree-sitter indexing and precise byte-offset retrieval, which makes it good when an agent already knows roughly what symbol it wants and wants to minimize context spend.

lgrep is optimized for a different problem: the agent does not know the symbol yet and needs to find the right implementation by intent first, then drill into structure.

|                      | grep / rg                | mgrep                             | jCodeMunch MCP                                  | lgrep                                     |
|----------------------|--------------------------|-----------------------------------|-------------------------------------------------|-------------------------------------------|
| Primary strength     | Exact text / regex       | Semantic search                   | Symbol retrieval                                | Semantic discovery + symbol retrieval     |
| Semantic search      | No                       | Yes                               | No                                              | Yes                                       |
| Symbol tools         | No                       | No                                | Yes                                             | Yes                                       |
| File / repo outline  | No                       | No                                | Yes                                             | Yes                                       |
| Local vector storage | N/A                      | No                                | N/A                                             | Yes                                       |
| No-API-key mode      | Yes                      | No                                | Yes                                             | Yes (symbol engine)                       |
| Commercial usage     | Yes                      | Subscription service              | Paid commercial license required per its README | Yes (MIT)                                 |
| Best first question  | "find this exact string" | "find code matching this concept" | "find this exact symbol"                        | "find the code that implements this idea" |

If you want the most precise symbol retrieval with no semantic layer, jcodemunch-mcp is a credible option.

If you want OpenCode agents to start with intent-level discovery and still have symbol tooling in the same server, lgrep is the better fit.

How it works

Architecture

flowchart LR
    A[OpenCode Session 1] --> M[lgrep MCP Server]
    B[OpenCode Session 2] --> M
    C[OpenCode Session N] --> M

    M --> S[Semantic Engine\nVoyage Code 3 + LanceDB]
    M --> Y[Symbol Engine\ntree-sitter + JSON index]

    S --> V[(Local vector store)]
    Y --> J[(Local symbol store)]
    S -. query embeddings .-> Q[Voyage API]
Agent -> lgrep MCP server -> semantic engine + symbol engine

Semantic engine

  1. Discover files while respecting .gitignore
  2. Chunk code with AST-aware boundaries
  3. Embed chunks with Voyage Code 3
  4. Store vectors locally in LanceDB
  5. Search with hybrid retrieval and reranking
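The pipeline above can be sketched in miniature. This is an illustration only, not lgrep's implementation: a bag-of-words counter stands in for Voyage Code 3 embeddings, and a plain dict stands in for LanceDB.

```python
# Toy semantic-search pipeline: embed chunks, store vectors, rank by similarity.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for Voyage Code 3: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Index": chunk id -> vector (LanceDB stand-in). Chunk text is invented.
chunks = {
    "src/auth.py:authenticate": "verify jwt token and check user permissions",
    "src/billing.py:charge": "charge the customer card and record invoice",
}
index = {cid: embed(text) for cid, text in chunks.items()}

def search(query: str, limit: int = 10) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda cid: cosine(qv, index[cid]), reverse=True)
    return ranked[:limit]

print(search("where are permissions checked?"))
```

The real engine adds AST-aware chunking, hybrid retrieval, and reranking on top of this basic embed-store-rank loop.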

Symbol engine

  1. Parse source with tree-sitter
  2. Extract functions, classes, methods, and related structure
  3. Store a local symbol index
  4. Serve symbol search, outlines, and source retrieval without an API call
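As an illustration of these steps, here is the same idea using Python's stdlib ast module as a stand-in for tree-sitter (lgrep itself parses many languages with tree-sitter; this sketch only handles Python source):

```python
# Extract top-level classes, their methods, and functions as symbol IDs.
import ast

SOURCE = """
class AuthManager:
    def login(self, user):
        return user is not None

def authenticate(token):
    return token is not None
"""

def extract_symbols(src: str, file_path: str) -> list[str]:
    symbols = []
    for node in ast.parse(src).body:
        if isinstance(node, ast.FunctionDef):
            symbols.append(f"{file_path}:function:{node.name}")
        elif isinstance(node, ast.ClassDef):
            symbols.append(f"{file_path}:class:{node.name}")
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    symbols.append(f"{file_path}:method:{item.name}")
    return symbols

# The "local symbol index", in miniature.
print(extract_symbols(SOURCE, "src/auth.py"))
```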

Installation

Requirements

  • Python 3.11+
  • a Voyage API key if you want semantic search

Install from GitHub

pip install git+https://github.com/Sharper-Flow/lgrep.git

Install from source

git clone https://github.com/Sharper-Flow/lgrep.git
cd lgrep
pip install .

Fast setup for OpenCode

1. Get a Voyage API key

Create a key at dash.voyageai.com.

You only need this for the semantic engine. The symbol engine works without it.

2. Start lgrep as a shared HTTP MCP server

Recommended manual start:

VOYAGE_API_KEY=your-key \
LGREP_WARM_PATHS=/path/to/project-a:/path/to/project-b \
lgrep --transport streamable-http --host 127.0.0.1 --port 6285

Why HTTP instead of stdio?

With stdio, each OpenCode session spawns its own server process. With streamable-http, one warm server handles all sessions.

3. Wire it into OpenCode

Add this to ~/.config/opencode/opencode.json:

{
  "instructions": [
    "~/.config/opencode/instructions/lgrep-tools.md"
  ],
  "mcp": {
    "lgrep": {
      "type": "remote",
      "url": "http://localhost:6285/mcp",
      "enabled": true
    }
  }
}

Or let lgrep do the wiring for you:

lgrep install-opencode

That installer can:

  • add the MCP entry
  • copy the packaged lgrep-tools.md instruction into your OpenCode config
  • copy the packaged skills/lgrep/SKILL.md reference file into your OpenCode config
  • append the instruction file to the instructions array so agents prefer lgrep first

Important: the active agent must also expose lgrep_* tool definitions in its tool manifest. If an agent profile only allows read/glob/grep, the model cannot choose lgrep even when the MCP server is configured and the instruction policy is present.

The installed files land at:

  • ~/.config/opencode/instructions/lgrep-tools.md
  • ~/.config/opencode/skills/lgrep/SKILL.md

4. Optional: generate a .lgrepignore

lgrep init-ignore /path/to/project

First-use workflow

Typical OpenCode flow:

  1. Ask an intent question with lgrep_search_semantic
  2. Inspect structure with lgrep_get_file_outline or lgrep_get_repo_outline
  3. Retrieve exact symbols with lgrep_search_symbols and lgrep_get_symbol

Examples:

lgrep_search_semantic(query="authentication flow", path="/path/to/project")
lgrep_get_file_outline(path="/path/to/project/src/auth.py")
lgrep_index_symbols_folder(path="/path/to/project")
lgrep_search_symbols(query="authenticate", path="/path/to/project")
lgrep_get_symbol(symbol_id="src/auth.py:function:authenticate", path="/path/to/project")

High-value prompts:

  • "Where do we enforce auth between route and service?"
  • "Find the authenticate function"
  • "What are the main symbols in src/auth.py?"
  • "Show me the repo structure around billing"
  • "Find references to verifyToken"

Tool selection guide

| Task                               | Best tool                 | Why                     |
|------------------------------------|---------------------------|-------------------------|
| Intent or concept discovery        | lgrep_search_semantic     | Search by meaning       |
| Find a function or class by name   | lgrep_search_symbols      | Exact symbol lookup     |
| Inspect a single file's structure  | lgrep_get_file_outline    | Fast AST outline        |
| Inspect repo structure             | lgrep_get_repo_outline    | Symbol-level overview   |
| Find exact text or identifiers     | lgrep_search_text or grep | Literal match           |
| Retrieve exact source for a symbol | lgrep_get_symbol          | Targeted code retrieval |
| Read a known file directly         | Read                      | No search needed        |

MCP tools

Semantic tools

| Tool                                                       | Purpose                                |
|------------------------------------------------------------|----------------------------------------|
| lgrep_search_semantic(query, path, limit=10, hybrid=true)  | Search code by meaning                 |
| lgrep_index_semantic(path)                                 | Build or refresh a semantic index      |
| lgrep_status_semantic(path?)                               | Show semantic index and watcher status |
| lgrep_watch_start_semantic(path)                           | Start background semantic re-indexing  |
| lgrep_watch_stop_semantic(path?)                           | Stop the watcher                       |

Symbol tools

| Tool                                                        | Purpose                           |
|-------------------------------------------------------------|-----------------------------------|
| lgrep_index_symbols_folder(path, max_files=500, incremental=True) | Index symbols in a local folder   |
| lgrep_index_symbols_repo(repo, ref="HEAD")                  | Index symbols from a GitHub repo  |
| lgrep_list_repos()                                          | List indexed symbol repos         |
| lgrep_get_file_tree(path, max_files=500)                    | Show repo file tree               |
| lgrep_get_file_outline(path)                                | Show symbol outline for one file  |
| lgrep_get_repo_outline(path, max_files=500)                 | Show symbol outline for a repo    |
| lgrep_search_symbols(query, path, limit=20, kind?)          | Search symbols by name            |
| lgrep_search_text(query, path, max_results=50)              | Search literal text               |
| lgrep_get_symbol(symbol_id, path)                           | Retrieve one symbol               |
| lgrep_get_symbols(symbol_ids, path)                         | Retrieve multiple symbols         |
| lgrep_invalidate_cache(path)                                | Drop the symbol index for a repo  |

Symbol ID format

Symbol IDs use this deterministic format:

file_path:kind:name

Examples:

src/auth.py:function:authenticate
src/auth.py:class:AuthManager
src/auth.py:method:login
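Because the format is a simple colon-joined triple, splitting from the right recovers the three parts even if the file path itself contains a colon. A minimal helper (not part of lgrep's API):

```python
# Parse a deterministic symbol ID of the form file_path:kind:name.
def parse_symbol_id(symbol_id: str) -> tuple[str, str, str]:
    # rsplit keeps any colons in the file path intact (e.g. Windows drives).
    file_path, kind, name = symbol_id.rsplit(":", 2)
    return file_path, kind, name

print(parse_symbol_id("src/auth.py:function:authenticate"))
# -> ('src/auth.py', 'function', 'authenticate')
```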

Transport and security

lgrep supports both stdio and streamable-http, but shared HTTP is the intended deployment mode for OpenCode.

lgrep --transport streamable-http --host 127.0.0.1 --port 6285

Security notes:

  • default host is 127.0.0.1
  • there is no built-in auth layer on the HTTP transport
  • lgrep does not set CORS headers, and browser-based clients should not connect directly to the streamable HTTP endpoint
  • if you do put it behind a proxy, enforce your own authentication and origin controls there
  • exposing 0.0.0.0 is a non-default, explicit opt-in; do not do it without a reverse proxy or firewall

Configuration

Environment variables

| Variable         | Required            | Default        | Description                                 |
|------------------|---------------------|----------------|---------------------------------------------|
| VOYAGE_API_KEY   | For semantic search | none           | Voyage API key                              |
| LGREP_LOG_LEVEL  | No                  | INFO           | Log verbosity                               |
| LGREP_CACHE_DIR  | No                  | ~/.cache/lgrep | Cache directory                             |
| LGREP_WARM_PATHS | No                  | none           | Colon-separated projects to warm on startup |
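For example, LGREP_WARM_PATHS is a colon-separated list, so it splits into one path per project. A small sketch of how such a value breaks apart (the paths here are placeholders):

```python
# Split a colon-separated LGREP_WARM_PATHS value into project paths.
import os

os.environ["LGREP_WARM_PATHS"] = "/path/to/project-a:/path/to/project-b"
warm_paths = [p for p in os.environ.get("LGREP_WARM_PATHS", "").split(":") if p]
print(warm_paths)  # ['/path/to/project-a', '/path/to/project-b']
```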

Ignore behavior

  • .gitignore is respected automatically
  • .lgrepignore lets you exclude additional paths

Example .lgrepignore entries:

src/generated/
docs/site/
*.test.data
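These gitignore-style patterns can be approximated with Python's fnmatch to get a rough sense of what gets excluded; lgrep's actual matcher may differ (note the directory entries become dir/* in this sketch):

```python
# Approximate gitignore-style matching for the .lgrepignore entries above.
import fnmatch

IGNORE_PATTERNS = ["src/generated/*", "docs/site/*", "*.test.data"]

def is_ignored(path: str) -> bool:
    return any(fnmatch.fnmatch(path, pat) for pat in IGNORE_PATTERNS)

print(is_ignored("src/generated/api.py"))  # True
print(is_ignored("src/auth.py"))           # False
```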

Resource profile

Typical operating profile:

| Resource | Idle                                               | During indexing                                           |
|----------|----------------------------------------------------|-----------------------------------------------------------|
| RAM      | ~300MB                                             | ~500MB                                                    |
| CPU      | <1%                                                | usually low local CPU; Voyage does most semantic heavy lifting |
| Disk     | ~250MB per semantic index + ~5MB per symbol index  | grows with indexed projects                               |
| Network  | minimal                                            | semantic indexing and semantic queries call Voyage        |

Supported languages

  • Semantic engine - AST-aware chunking for 30+ languages, with text fallback when needed
  • Symbol engine - tree-sitter-language-pack support across 165+ languages

Troubleshooting

VOYAGE_API_KEY not set

Set the key in your MCP server environment. The symbol engine still works without it.

Slow first semantic index

The first run embeds the whole project. Later runs skip unchanged files using hashes.
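The hash-based skip can be sketched as follows. This is illustrative only; lgrep's actual cache layout and hash choice may differ.

```python
# Re-embed a file only when its content hash has changed since the last run.
import hashlib

cache: dict[str, str] = {}  # path -> hash recorded at last index run

def needs_reindex(path: str, content: bytes) -> bool:
    h = hashlib.sha256(content).hexdigest()
    if cache.get(path) == h:
        return False  # unchanged since last index run: skip embedding
    cache[path] = h
    return True

print(needs_reindex("src/auth.py", b"def authenticate(): ..."))  # True (first run)
print(needs_reindex("src/auth.py", b"def authenticate(): ..."))  # False (unchanged)
```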

Repository not indexed from symbol tools

Run lgrep_index_symbols_folder(path=...) first.

Stale semantic results

Run lgrep_index_semantic(...) or start lgrep_watch_start_semantic(...).

Native dependency build issues

lgrep depends on packages with native extensions. Prebuilt wheels usually work; otherwise install the required compiler toolchain.

Migration from v1.x

The semantic tools were renamed in v2.0.0:

| v1.x              | v2.x                       |
|-------------------|----------------------------|
| lgrep_search      | lgrep_search_semantic      |
| lgrep_index       | lgrep_index_semantic       |
| lgrep_status      | lgrep_status_semantic      |
| lgrep_watch_start | lgrep_watch_start_semantic |
| lgrep_watch_stop  | lgrep_watch_stop_semantic  |

Development

git clone https://github.com/Sharper-Flow/lgrep.git
cd lgrep
pip install -e ".[dev]"
pytest -v

License

MIT - see LICENSE.
