DeusData · halindrome · Mar 23, 2026
diff --git a/.vbw-planning/phases/21-language-support-tool/21-01-PLAN.md b/.vbw-planning/phases/21-language-support-tool/21-01-PLAN.md
@@ -0,0 +1,145 @@
+---
+plan: 21-01
+phase: 21
+title: "Validate tool-routing behavior and improve CLAUDE.md guidance"
+status: ready
+date: 2026-03-24
+commit_scope: docs
+---
+
+# Plan 21-01: Validate Tool-Routing Graceful Degradation + Improve CLAUDE.md
+
+## Goal
+
+Validate the Scout research finding that a `list_supported_languages` tool is NOT
+needed. CMM already handles misrouted calls gracefully (empty results, no errors).
+After hands-on validation, improve CLAUDE.md documentation to close the minor gaps
+identified in the research.
+
+## Background (from research)
+
+- All structural queries (`search_graph`, `get_code_snippet`, `trace_call_path`)
+  return `{"total":0, "results":[]}` for text-only or non-indexed file types — no
+  errors, no crashes.
+- JSON, YAML, and Markdown ARE indexed by CMM (Tree-Sitter parsers extract Module
+  and Variable nodes), so `search_code` works on them. Current CLAUDE.md guidance
+  says to use Read for these, missing the `search_code` opportunity.
+- The cost of a `list_supported_languages` tool (~10ms per session + complexity)
+  exceeds the cost of try-and-fallback (~1ms empty query).
+- User-configured extensions (Phase 5) create a minor knowledge gap, but graceful
+  fallback handles it.
+
+## File Ownership (no overlapping edits between tasks)
+
+| Task | Files Modified |
+|------|---------------|
+| T1 | None in source (validation only: reads + CMM tool calls; step 5 adds a validation note) |
+| T2 | `.claude/rules/global-claude-md.md` |
+
+T1 and T2 can run in parallel — T1 produces validation evidence, T2 edits docs.
+T2 does not depend on T1 because the research already provides sufficient evidence
+for the doc changes. If T1 discovers unexpected issues, a pivot task would be added.
+
+---
+
+## Tasks
+
+### T1 — Hands-on validation of graceful degradation
+
+**Wave:** 1 (no deps)
+**Files:** No source files modified (read-only validation; step 5 records a validation note)
+
+**Steps:**
+
+1. Call `search_graph` with a `file_pattern` targeting a non-indexed extension
+   (e.g., `*.txt` or `*.xyz`) on the current CMM project. Verify the response is
+   `{"total":0, "results":[]}` or equivalent empty result — not an error.
+
+2. Call `search_graph` with `label_filter="Function"` on a YAML or JSON file in
+   the project. Verify it returns empty results (these file types have no
+   Function nodes).
+
+3. Call `search_code` with a known string pattern scoped to `*.yaml` or `*.json`
+   files. Verify it returns matches (proving text search works on indexed
+   text-only formats).
+
+4. Call `get_code_snippet` with a nonexistent qualified name (e.g.,
+   `nonexistent_module::fake_function`). Verify an empty result or clear error, not a crash.
+
+5. Document results: write a brief validation summary as a comment in the
+   research file or as a standalone validation note.
+
+**Acceptance criteria:**
+- All four CMM calls return graceful responses (empty results or clear error
+  messages, never crashes or unhandled exceptions)
+- Validation results confirm the research finding: no tool needed
+- If ANY call produces an unexpected error or crash, escalate immediately —
+  this would trigger a pivot to tool implementation
+
+**Commit:** `docs(phase-21): add tool-routing validation results`
+
+---
+
+### T2 — Improve CLAUDE.md guidance for text-only file types and search_code
+
+**Wave:** 1 (no deps)
+**Files:** `.claude/rules/global-claude-md.md`
+
+**Changes:**
+
+1. Update the "When Read is Correct" section to clarify that text-only formats
+   (JSON, YAML, TOML, Markdown) are indexed and searchable via `search_code`,
+   while `Read` remains correct for full-file context:
+
+```markdown
+### When Read is Correct
+
+Use `Read` directly when:
+- Non-code files (JSON, YAML, TOML, config, HTML templates, Markdown, .env)
+  **Note**: After indexing, these formats are searchable via `search_code` for
+  pattern matching (e.g., finding all occurrences of a config key across YAML
+  files). Use `Read` when you need full-file context; use `search_code` when
+  searching for specific strings across many config/data files.
+- Full file context needed (imports, globals, module-level initialization flow)
+- Very small files (under 50 lines)
+- Files not yet indexed (new files before `index_repository`)
+- Editing 6+ functions in the same file (batch context is more efficient)
+- Jupyter notebooks, READMEs, documentation files
+```
+
+2. Extend the existing `search_code` bullet in the Tool Reference section so
+   that it also covers indexed non-code files:
+
+```markdown
+- **`search_code`**: Use for text search in source files — string literals, error
+  messages, TODOs, config values, import statements. Scoped to indexed project with
+  pagination. Case-insensitive by default. Also works on indexed non-code files
+  (JSON, YAML, TOML, Markdown) — prefer over `Read` when searching for patterns
+  across multiple config or data files.
+```
+
+**Acceptance criteria:**
+- "When Read is Correct" section includes the clarifying note about `search_code`
+  for indexed text-only formats
+- `search_code` tool reference mentions non-code file applicability
+- No other sections are modified
+- The guidance does NOT recommend building a `list_supported_languages` tool
+
+**Commit:** `docs(claude-md): clarify search_code works on indexed text-only formats`
+
+---
+
+## Wave Summary
+
+| Wave | Tasks | Parallelizable? |
+|------|-------|----------------|
+| 1 | T1, T2 | Yes — T1 is read-only, T2 edits docs only |
+
+## Definition of Done
+
+- [ ] Hands-on validation confirms all CMM tools degrade gracefully for
+      text-only and non-indexed file types (T1)
+- [ ] `.claude/rules/global-claude-md.md` updated with clarified guidance (T2)
+- [ ] No `list_supported_languages` tool implementation (research conclusion
+      validated)
+- [ ] Both commits follow conventional format
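The try-and-fallback routing that the plan above relies on can be sketched in a few lines of C. This is only an illustration under stated assumptions: `query_result_t`, `query_fn`, `route_query`, and the two stubs are hypothetical names invented here and are not part of CMM. The only detail taken from the plan is that a structural query on a non-indexed file type comes back with `total` equal to 0 rather than an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; mirrors the {"total":0, "results":[]} shape
 * described in the plan. Not a CMM type. */
typedef struct {
    int total; /* number of matches; 0 means the query found nothing */
} query_result_t;

/* Hypothetical callback standing in for a tool call. */
typedef query_result_t (*query_fn)(const char *pattern);

typedef enum { ROUTE_STRUCTURAL, ROUTE_FALLBACK } route_t;

/* Try the structural query first; fall back only when it comes back empty. */
static route_t route_query(query_fn structural, query_fn fallback,
                           const char *pattern, query_result_t *out) {
    assert(structural && fallback && pattern && out);
    *out = structural(pattern);
    if (out->total > 0) {
        return ROUTE_STRUCTURAL;
    }
    *out = fallback(pattern);
    return ROUTE_FALLBACK;
}

/* Stubs standing in for real tool calls (assumptions for the demo). */
static query_result_t stub_empty(const char *pattern) {
    (void)pattern;
    return (query_result_t){0};
}

static query_result_t stub_three(const char *pattern) {
    (void)pattern;
    return (query_result_t){3};
}
```

Because an empty result is an ordinary return value, the caller needs no up-front knowledge of which extensions are indexed, which is the argument the plan makes against a `list_supported_languages` tool.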
diff --git a/src/cli/cli.c b/src/cli/cli.c
@@ -405,6 +405,7 @@ static const char skill_reference_content[] =
     "- `delete_project` — remove a project\n"
     "- `manage_adr` — architecture decision records\n"
     "- `ingest_traces` — import runtime traces\n"
+    "- `touch_project` — reset poll timer for on-demand reindex\n"
     "\n"
     "## Edge Types\n"
     "CALLS, HTTP_CALLS, ASYNC_CALLS, IMPORTS, DEFINES, DEFINES_METHOD,\n"

diff --git a/src/main.c b/src/main.c
@@ -149,7 +149,7 @@ static void print_help(void) {
     printf("\nTools: index_repository, search_graph, query_graph, trace_path,\n");
     printf("  get_code_snippet, get_graph_schema, get_architecture, search_code,\n");
     printf("  list_projects, delete_project, index_status, detect_changes,\n");
-    printf("  manage_adr, ingest_traces\n");
+    printf("  manage_adr, ingest_traces, touch_project\n");
 }
 
 /* ── Main ───────────────────────────────────────────────────────── */

diff --git a/src/mcp/mcp.c b/src/mcp/mcp.c
@@ -1,5 +1,5 @@
 /*
- * mcp.c — MCP server: JSON-RPC 2.0 over stdio with 14 graph tools.
+ * mcp.c — MCP server: JSON-RPC 2.0 over stdio with 15 graph tools.
  *
  * Uses yyjson for fast JSON parsing/building.
  * Single-threaded event loop: read line → parse → dispatch → respond.
@@ -329,6 +329,13 @@ static const tool_def_t TOOLS[] = {
      "{\"type\":\"object\",\"properties\":{\"traces\":{\"type\":\"array\",\"items\":{\"type\":"
      "\"object\"}},\"project\":{\"type\":"
      "\"string\"}},\"required\":[\"traces\",\"project\"]}"},
+
+    {"touch_project",
+     "Reset the adaptive poll timer for a project so the next watcher cycle "
+     "runs check_changes() immediately. Useful from git hooks or editor "
+     "save hooks to trigger reindex without waiting for the poll interval.",
+     "{\"type\":\"object\",\"properties\":{\"project\":{\"type\":\"string\","
+     "\"description\":\"Project name to touch\"}},\"required\":[\"project\"]}"},
 };
 
 static const int TOOL_COUNT = sizeof(TOOLS) / sizeof(TOOLS[0]);
@@ -2842,6 +2849,39 @@ static char *handle_ingest_traces(cbm_mcp_server_t *srv, const char *args) {
     return result;
 }
 
+/* touch_project: reset adaptive backoff so next poll cycle is immediate. */
+static char *handle_touch_project(cbm_mcp_server_t *srv, const char *args) {
+    char *project = cbm_mcp_get_string_arg(args, "project");
+    if (!project) {
+        return cbm_mcp_text_result("project is required", true);
+    }
+    if (!srv->watcher) {
+        free(project);
+        return cbm_mcp_text_result("watcher not running", true);
+    }
+    bool found = cbm_watcher_touch(srv->watcher, project);
+    if (!found) {
+        char msg[256];
+        snprintf(msg, sizeof(msg), "project '%s' not found in watch list", project);
+        free(project);
+        return cbm_mcp_text_result(msg, true);
+    }
+
+    yyjson_mut_doc *doc = yyjson_mut_doc_new(NULL);
+    yyjson_mut_val *root = yyjson_mut_obj(doc);
+    yyjson_mut_doc_set_root(doc, root);
+    yyjson_mut_obj_add_str(doc, root, "project", project);
+    yyjson_mut_obj_add_str(doc, root, "status", "touched");
+
+    char *json = yy_doc_to_str(doc);
+    yyjson_mut_doc_free(doc);
+    free(project);
+
+    char *result = cbm_mcp_text_result(json, false);
+    free(json);
+    return result;
+}
+
 /* ── Tool dispatch ────────────────────────────────────────────── */
 
 char *cbm_mcp_handle_tool(cbm_mcp_server_t *srv, const char *tool_name, const char *args_json) {
@@ -2893,6 +2933,10 @@ char *cbm_mcp_handle_tool(cbm_mcp_server_t *srv, const char *tool_name, const ch
     if (strcmp(tool_name, "ingest_traces") == 0) {
         return handle_ingest_traces(srv, args_json);
     }
+    if (strcmp(tool_name, "touch_project") == 0) {
+        return handle_touch_project(srv, args_json);
+    }
+
     char msg[256];
     snprintf(msg, sizeof(msg), "unknown tool: %s", tool_name);
     return cbm_mcp_text_result(msg, true);
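For a sense of what a client might send to reach `handle_touch_project`, here is a minimal sketch that formats a request line. It assumes the server accepts the standard MCP `tools/call` method with `name` and `arguments` fields, one JSON object per line, as the file header ("JSON-RPC 2.0 over stdio", "read line → parse") suggests. `build_touch_request` is a hypothetical helper that does not exist in this repository, and how a git hook would deliver the line to a running server is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a JSON-RPC 2.0 "tools/call" request for
 * touch_project into out, terminated by a newline.
 * Returns the number of bytes written, or -1 if the project name would need
 * JSON escaping or the buffer is too small. */
static int build_touch_request(char *out, size_t out_size, int id, const char *project) {
    assert(out && project);
    /* This sketch does no escaping, so reject names that would require it. */
    if (strchr(project, '"') || strchr(project, '\\')) {
        return -1;
    }
    int n = snprintf(out, out_size,
                     "{\"jsonrpc\":\"2.0\",\"id\":%d,\"method\":\"tools/call\","
                     "\"params\":{\"name\":\"touch_project\","
                     "\"arguments\":{\"project\":\"%s\"}}}\n",
                     id, project);
    if (n < 0 || (size_t)n >= out_size) {
        return -1; /* encoding error or truncated output */
    }
    return n;
}
```

Per the handler above, a successful call would yield a text result containing `"status":"touched"`, while an unknown project or a missing watcher yields an error result.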

diff --git a/src/watcher/watcher.c b/src/watcher/watcher.c
@@ -111,7 +111,9 @@ static int git_head(const char *root_path, char *out, size_t out_size) {
     return -1;
 }
 
-/* Returns true if working tree has changes (modified, untracked, etc.) */
+/* Returns true if working tree has changes (modified, untracked, etc.).
+ * Also checks submodules via `git submodule foreach` to detect uncommitted
+ * changes inside submodules that `git status` alone would not report. */
 static bool git_is_dirty(const char *root_path) {
     char cmd[1024];
     snprintf(cmd, sizeof(cmd),
@@ -136,6 +138,34 @@ static bool git_is_dirty(const char *root_path) {
         }
     }
     cbm_pclose(fp);
+
+    if (dirty) {
+        return true;
+    }
+
+    /* Check submodules: uncommitted changes inside a submodule are invisible
+     * to the parent's `git status` unless --recurse-submodules is supported.
+     * Use `git submodule foreach` as a portable fallback. */
+    snprintf(cmd, sizeof(cmd),
+             "git --no-optional-locks -C '%s' submodule foreach --quiet --recursive "
+             "'git status --porcelain --untracked-files=normal 2>/dev/null' "
+             "2>/dev/null",
+             root_path);
+    // NOLINTNEXTLINE(bugprone-command-processor,cert-env33-c)
+    fp = cbm_popen(cmd, "r");
+    if (!fp) {
+        return false;
+    }
+    if (fgets(line, sizeof(line), fp)) {
+        size_t len = strlen(line);
+        while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
+            line[--len] = '\0';
+        }
+        if (len > 0) {
+            dirty = true;
+        }
+    }
+    cbm_pclose(fp);
     return dirty;
 }
 
@@ -248,15 +278,17 @@ void cbm_watcher_unwatch(cbm_watcher_t *w, const char *project_name) {
     }
 }
 
-void cbm_watcher_touch(cbm_watcher_t *w, const char *project_name) {
+bool cbm_watcher_touch(cbm_watcher_t *w, const char *project_name) {
     if (!w || !project_name) {
-        return;
+        return false;
     }
     project_state_t *s = cbm_ht_get(w->projects, project_name);
     if (s) {
         /* Reset backoff — poll immediately on next cycle */
         s->next_poll_ns = 0;
+        return true;
     }
+    return false;
 }
 
 int cbm_watcher_watch_count(const cbm_watcher_t *w) {
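The effect of zeroing `next_poll_ns` in `cbm_watcher_touch` can be illustrated with a small standalone model. Everything below is a simplified assumption rather than the watcher's real code: `demo_state_t`, `interval_ns`, `demo_is_due`, `demo_schedule`, and `demo_touch` are hypothetical, and the real poll loop may compute deadlines differently. Only the `next_poll_ns = 0` reset and the found/not-found return value come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the watcher's per-project state. */
typedef struct {
    uint64_t next_poll_ns; /* earliest time the project may be polled again */
    uint64_t interval_ns;  /* current backoff interval (assumed field) */
} demo_state_t;

/* A project is due once the clock has reached next_poll_ns. */
static bool demo_is_due(const demo_state_t *s, uint64_t now_ns) {
    assert(s);
    return now_ns >= s->next_poll_ns;
}

/* After a poll, schedule the next one a full interval ahead. */
static void demo_schedule(demo_state_t *s, uint64_t now_ns) {
    assert(s);
    s->next_poll_ns = now_ns + s->interval_ns;
}

/* Same idea as cbm_watcher_touch: zero the deadline so any clock value is
 * due, and report whether there was a state to touch. */
static bool demo_touch(demo_state_t *s) {
    if (!s) {
        return false;
    }
    s->next_poll_ns = 0;
    return true;
}
```

In this model a touched project becomes due on the very next check regardless of how long its backoff interval had grown, which matches the "poll immediately on next cycle" comment in the diff.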

diff --git a/src/watcher/watcher.h b/src/watcher/watcher.h
@@ -45,8 +45,9 @@ void cbm_watcher_watch(cbm_watcher_t *w, const char *project_name, const char *r
 /* Remove a project from the watch list. */
 void cbm_watcher_unwatch(cbm_watcher_t *w, const char *project_name);
 
-/* Refresh a project's timestamp (resets adaptive backoff). */
-void cbm_watcher_touch(cbm_watcher_t *w, const char *project_name);
+/* Reset a project's adaptive backoff so it is polled on the next cycle.
+ * Returns true if the project was found, false otherwise. */
+bool cbm_watcher_touch(cbm_watcher_t *w, const char *project_name);
 
 /* ── Polling ────────────────────────────────────────────────────── */
 

diff --git a/tests/test_mcp.c b/tests/test_mcp.c
@@ -129,7 +129,7 @@ TEST(mcp_initialize_response) {
 TEST(mcp_tools_list) {
     char *json = cbm_mcp_tools_list();
     ASSERT_NOT_NULL(json);
-    /* Should contain all 14 tools */
+    /* Should contain all 15 tools */
     ASSERT_NOT_NULL(strstr(json, "index_repository"));
     ASSERT_NOT_NULL(strstr(json, "search_graph"));
     ASSERT_NOT_NULL(strstr(json, "query_graph"));
@@ -144,6 +144,7 @@ TEST(mcp_tools_list) {
     ASSERT_NOT_NULL(strstr(json, "detect_changes"));
     ASSERT_NOT_NULL(strstr(json, "manage_adr"));
     ASSERT_NOT_NULL(strstr(json, "ingest_traces"));
+    ASSERT_NOT_NULL(strstr(json, "touch_project"));
     free(json);
     PASS();
 }
@@ -707,6 +708,38 @@ TEST(tool_ingest_traces_empty) {
     PASS();
 }
 
+/* ══════════════════════════════════════════════════════════════════
+ *  TOUCH PROJECT
+ * ══════════════════════════════════════════════════════════════════ */
+
+TEST(mcp_touch_project_no_watcher) {
+    /* When watcher is NULL (CLI mode), touch_project returns "watcher not running"
+     * error rather than crashing. */
+    cbm_mcp_server_t *srv = cbm_mcp_server_new(NULL);
+    /* srv->watcher is NULL by default */
+
+    char *result = cbm_mcp_handle_tool(srv, "touch_project", "{\"project\":\"x\"}");
+    ASSERT_NOT_NULL(result);
+    ASSERT_NOT_NULL(strstr(result, "watcher not running"));
+    free(result);
+
+    cbm_mcp_server_free(srv);
+    PASS();
+}
+
+TEST(mcp_touch_project_missing_arg) {
+    /* touch_project without project arg returns "project is required" error. */
+    cbm_mcp_server_t *srv = cbm_mcp_server_new(NULL);
+
+    char *result = cbm_mcp_handle_tool(srv, "touch_project", "{}");
+    ASSERT_NOT_NULL(result);
+    ASSERT_NOT_NULL(strstr(result, "project is required"));
+    free(result);
+
+    cbm_mcp_server_free(srv);
+    PASS();
+}
+
 /* ══════════════════════════════════════════════════════════════════
  *  IDLE STORE EVICTION
  * ══════════════════════════════════════════════════════════════════ */
@@ -1717,6 +1750,10 @@ SUITE(mcp) {
     RUN_TEST(tool_ingest_traces_basic);
     RUN_TEST(tool_ingest_traces_empty);
 
+    /* touch_project */
+    RUN_TEST(mcp_touch_project_no_watcher);
+    RUN_TEST(mcp_touch_project_missing_arg);
+
     /* Idle store eviction */
     RUN_TEST(store_idle_eviction);
     RUN_TEST(store_idle_no_eviction_within_timeout);