feat: Respect .gitignore when discovering files #6

Open
laplaque wants to merge 3 commits into MikeRecognex:main from laplaque:feat/respect-gitignore

laplaque commented Apr 7, 2026

Summary

  • In git repositories, filter discovered files through git check-ignore --stdin so that paths matched by .gitignore, .git/info/exclude, and the global gitignore are excluded from the index
  • Falls back gracefully to the existing exclude_patterns logic when the project is not a git repo or git is unavailable
  • Zero new dependencies — uses subprocess (already used by git_tracker.py)

Motivation

The hardcoded exclude_patterns list covers common directories like __pycache__ and node_modules, but misses project-specific ignores (e.g. .mypy_cache, build/, dist/, IDE directories). Most projects already define these in .gitignore. By respecting it, the indexer automatically excludes the right files without requiring users to patch the source.

Changes

| File | Change |
|---|---|
| src/mcp_codebase_index/project_indexer.py | Add _get_git_ignored_paths() method; call it from _discover_files() after glob matching |
| tests/test_project_indexer.py | Add TestGitIgnore class with 4 tests (git-ignored detection, discover exclusion, non-git fallback, nested patterns) |

How it works

After the existing glob + exclude_patterns filtering, _discover_files() passes all candidate paths to git check-ignore --stdin. Any paths git reports as ignored are removed from the result set. This runs once per full index build (not per query).
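
The filtering step above can be sketched roughly as follows. This is a minimal illustration, assuming candidate paths are POSIX-style strings relative to the repository root; `git_ignored_paths()` and `discover_files()` are hypothetical standalone functions, not the PR's actual `_get_git_ignored_paths()` / `_discover_files()` methods. It relies on documented `git check-ignore` behavior: exit status 0 means at least one path is ignored, 1 means none are, and 128 signals a fatal error such as running outside a repository.

```python
import subprocess
from pathlib import Path


def git_ignored_paths(root: Path, rel_paths: list[str]) -> set[str]:
    """Return the subset of rel_paths (relative to root) that git reports as ignored.

    Hypothetical helper. Returns an empty set when root is not inside a git
    work tree or git is unavailable, so callers keep their existing filtering.
    """
    if not rel_paths:
        return set()
    try:
        # -z makes stdin and stdout NUL-delimited, so unusual file names
        # (spaces, newlines, non-ASCII) round-trip without git's quoting.
        proc = subprocess.run(
            ["git", "check-ignore", "--stdin", "-z"],
            cwd=root,
            input="\0".join(rel_paths) + "\0",
            capture_output=True,
            text=True,
        )
    except OSError:
        # git binary missing (FileNotFoundError) or root does not exist.
        return set()
    # 0: some paths ignored, 1: none ignored; anything else (e.g. 128 when
    # root is not a repository) is treated as "no ignore information".
    if proc.returncode not in (0, 1):
        return set()
    return {p for p in proc.stdout.split("\0") if p}


def discover_files(root: Path, candidates: list[str]) -> list[str]:
    """Drop git-ignored paths from an already glob/exclude-filtered list."""
    ignored = git_ignored_paths(root, candidates)
    return [p for p in candidates if p not in ignored]
```

All candidates go through a single subprocess call, which keeps the cost to one `git` invocation per index build. Note that `git check-ignore` consults the index by default, so a file that is tracked despite matching an ignore pattern is not reported and stays in the result.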

Test plan

  • TestGitIgnore::test_git_ignored_paths_detected — verifies _get_git_ignored_paths returns git-ignored files
  • TestGitIgnore::test_discover_files_excludes_git_ignored — verifies end-to-end: indexed files exclude git-ignored paths
  • TestGitIgnore::test_non_git_project_no_filtering — verifies graceful no-op in non-git directories
  • TestGitIgnore::test_gitignore_with_nested_patterns — verifies dynamically added .gitignore patterns are respected
  • Full suite: 368 passed, 11 failed (all 11 failures are pre-existing on main)

In git repositories, filter discovered files through `git check-ignore
--stdin` so that paths matched by .gitignore, .git/info/exclude, and
the global gitignore are excluded from the index.

Falls back gracefully to the existing exclude_patterns logic when the
project is not a git repo or git is unavailable.

When PROJECT_ROOT spans multiple git repositories (or is not itself a git repo), group discovered files by their containing repo and run `git check-ignore` once per repo. This ensures each repo's .gitignore is respected even when the indexed directory is a parent of several independent repos.

Add _find_git_root() helper and a multi-repo test case with three independent repos under a shared parent directory.
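
The per-repo grouping described above might look like the sketch below. It assumes a repo root is the nearest ancestor directory containing a `.git` entry; `find_git_root()` and `group_by_repo()` are hypothetical illustrations, since the PR's `_find_git_root()` code is not shown here. Each resulting group could then be passed to one `git check-ignore --stdin` call with `cwd` set to that repo.

```python
from collections import defaultdict
from pathlib import Path
from typing import Optional


def find_git_root(path: Path) -> Optional[Path]:
    """Return the nearest ancestor of path (or path itself) that contains .git."""
    current = path.resolve()
    if not current.is_dir():
        current = current.parent
    for candidate in (current, *current.parents):
        # .git is a directory in a normal clone and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    return None


def group_by_repo(files: list[Path]) -> dict[Optional[Path], list[str]]:
    """Group files by containing repo root; the None key holds non-git files.

    Paths inside a repo are made relative to that repo (POSIX form) so each
    group can be checked with a single git invocation run from the repo root.
    """
    groups: dict[Optional[Path], list[str]] = defaultdict(list)
    for f in files:
        repo = find_git_root(f)
        if repo is None:
            groups[None].append(str(f))
        else:
            groups[repo].append(f.resolve().relative_to(repo).as_posix())
    return dict(groups)
```

Walking the filesystem for `.git` avoids spawning a process per file; asking git directly (for example `git rev-parse --show-toplevel`) would be an alternative but costs one subprocess per directory.
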

The incremental update path in _maybe_incremental_update() only checked the hardcoded exclude_patterns, not .gitignore. Collect candidate paths from the changeset, batch-check them via _get_git_ignored_paths(), and skip any that are git-ignored before reindexing.
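
The incremental filtering described above can be sketched as a pure function. The `Changeset` shape is a hypothetical stand-in for whatever the indexer actually tracks, and the two checks are injected as callables (one per-path exclude check, one batch git-ignore check) so the logic can be shown without git.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Changeset:
    """Hypothetical changeset shape; the real structure in the PR may differ."""
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)


def paths_to_reindex(
    changes: Changeset,
    excluded: Callable[[str], bool],
    git_ignored: Callable[[list[str]], set[str]],
) -> list[str]:
    """Return added/modified paths that pass both exclude_patterns and .gitignore."""
    # Cheap per-path check against the hardcoded patterns first.
    candidates = [p for p in changes.added + changes.modified if not excluded(p)]
    # Then one batched git query for all survivors (a single subprocess call).
    ignored = git_ignored(candidates) if candidates else set()
    return [p for p in candidates if p not in ignored]
```

In this sketch deleted paths are deliberately left out of the check, on the assumption that removing a file from the index should not depend on ignore rules.
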

laplaque changed the title from "Respect .gitignore when discovering files" to "feat: Respect .gitignore when discovering files" on Apr 12, 2026.