supercharge codebase-memory-mcp: streamline and consolidate api, autoindexing, PageRank, dependency indexing, speedup, cli config, autotune by ahundt · Pull Request #151 · DeusData/codebase-memory-mcp

ahundt · 2026-03-26T10:04:02Z

Thanks for building codebase-memory-mcp! I really like how it works and how it performs, and I saw some opportunities to make it even better, so I supercharged it!

What This PR Proposes

This PR brings together a number of development tracks I've been working on:
1. a streamlined & modernized MCP tool surface,
2. automatic indexing on first call,
3. PageRank-based result ranking,
4. dependency source indexing,
5. token reduction strategies,
6. a full config registry, and
7. autotuning of algorithm performance to get better results.

It also fixes several bugs where search parameters were accepted by the API but silently never applied to queries, and I fixed a number of memory leaks. I'd appreciate feedback on any of these areas!

Built on peer reviewed research

I built these changes upon several peer reviewed papers and other repos that show empirically strong performance:

RepoGraph ICLR 2025 https://github.com/ozyyshr/RepoGraph https://iclr.cc/media/iclr-2025/Slides/28957.pdf
PageRank: Page, Brin et al. (1998/1999) — the original Google algorithm, damping factor d=0.85
LinkRank: Kim et al. (2010), "LinkRank: Finding communities in directed networks", https://arxiv.org/abs/0902.3728, Physical Review E, Vol 81, Issue 1
https://github.com/pdavis68/RepoMapper - starting point for some of the integrations like PageRank
https://github.com/rtk-ai/rtk - to help reduce input token usage, saving $$$

Below are a few iterations of AI-generated descriptions of what changes. I have been running, using, and testing the code myself, and the test suite makes up a good portion of the changes:

A Cleaner Tool Surface (`src/mcp/mcp.c`)

Why This Matters

The classic API exposed 15 tools by default. When an AI client receives a large tool list, it has to reason about which tool applies before doing any actual work, and the likelihood of choosing a suboptimal tool grows with list size. Presenting three well-chosen tools lowers that overhead and makes the common workflows (find code, trace a call chain, read a symbol) straightforward to discover and use. The tradeoff is worth it: the three streamlined tools cover the large majority of real usage, and the full classic set is one config change away. This follows the practice known as "progressive disclosure".

The Streamlined API (default)

Three tools are exposed by default (tool_mode=streamlined):

search_code_graph searches the knowledge graph for functions, classes, routes, and other symbols, sorted by PageRank by default. It also accepts Cypher queries for multi-hop patterns.

Parameter	Type	Description
`project`	string	Project name, path, `self`, `dep`/`deps`, `dep.{name}`, or glob
`cypher`	string	Cypher query for complex patterns (overrides filter params when set)
`label`	string	Node label filter (Function, Class, Route, etc.)
`name_pattern`	string	Glob/substring match on node name
`qn_pattern`	string	Regex match on qualified name
`file_pattern`	string	Glob match on file path
`relationship`	string	Filter to nodes connected by this edge type (e.g. CALLS)
`exclude_entry_points`	boolean	Exclude nodes with no inbound edges
`include_dependencies`	boolean	Include indexed dependency sub-projects in results
`exclude`	string[]	Glob patterns for file paths to exclude
`sort_by`	enum	`relevance`, `name`, `degree`, `calls`, `linkrank`
`mode`	enum	`full` (default) or `summary` (counts only, no individual results)
`compact`	boolean	Omit redundant fields (name when equal to last QN segment, zero degrees)
`limit`	integer	Max results (default 50, config: `search_limit`)
`offset`	integer	Pagination offset
`min_degree` / `max_degree`	integer	Degree range filter
`max_output_bytes`	integer	Cypher mode only: cap response size (0 = unlimited)
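As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` JSON-RPC request in Python. The tool and parameter names come from the table above; the argument values (`"parse"`, `"tests/**"`) and the helper `make_tool_call` are hypothetical and exist only for this example.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical query: functions whose name matches "parse", skipping tests.
request = make_tool_call(
    1,
    "search_code_graph",
    {
        "project": "self",
        "label": "Function",
        "name_pattern": "parse",
        "exclude": ["tests/**"],
        "compact": True,
        "limit": 20,
    },
)

# The MCP stdio transport sends one JSON message per line.
line = json.dumps(request)
```

Setting `cypher` instead would override the filter parameters, as the table notes.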

trace_call_path traces who calls a function and what it calls, using BFS over the graph.

Parameter	Type	Description
`function_name`	string	Name to start tracing from (required)
`project`	string	Project scope
`direction`	enum	`inbound`, `outbound`, or `both`
`depth`	integer	BFS depth limit (default 3)
`max_results`	integer	Results per direction (default 25, config: `trace_max_results`)
`edge_types`	string[]	Restrict traversal to these edge types (e.g. `["CALLS", "HTTP_CALLS"]`)
`exclude`	string[]	Glob patterns for file paths to exclude from results
`compact`	boolean	Omit redundant name/file fields
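The traversal described here can be sketched as a depth-limited BFS. This is a minimal Python illustration, not the server's C implementation: the edge representation as `(src, edge_type, dst)` triples and the function name `trace_outbound` are assumptions made for the example. Only the outbound direction is shown; an inbound trace could reuse the same loop over reversed edges.

```python
from collections import deque


def trace_outbound(edges, start, depth=3, max_results=25, edge_types=None):
    """Depth-limited BFS over (src, edge_type, dst) triples.

    Returns (node, hops) pairs in discovery order. Nodes are deduplicated,
    so cycles in the graph cannot produce repeated results.
    """
    adjacency = {}
    for src, etype, dst in edges:
        if edge_types is None or etype in edge_types:
            adjacency.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([(start, 0)])
    results = []
    while queue and len(results) < max_results:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand beyond the depth limit
        for nxt in adjacency.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            results.append((nxt, hops + 1))
            queue.append((nxt, hops + 1))
            if len(results) >= max_results:
                break
    return results


# Hypothetical graph with a cycle a -> b -> c -> a.
edges = [
    ("a", "CALLS", "b"),
    ("b", "CALLS", "c"),
    ("c", "CALLS", "a"),
    ("a", "HTTP_CALLS", "d"),
    ("c", "CALLS", "e"),
]
callees = trace_outbound(edges, "a", depth=2)
calls_only = trace_outbound(edges, "a", edge_types={"CALLS"})
```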

get_code returns source code for a symbol by qualified name.

Parameter	Type	Description
`qualified_name`	string	Qualified name from search results (required)
`project`	string	Project scope
`mode`	enum	`full` (default), `signature` (declaration only), `head_tail` (head and tail of the body in a 60/40 split, middle omitted)
`max_lines`	integer	Line cap (default 200, config: `snippet_max_lines`)
`auto_resolve`	boolean	For module nodes: resolve to file source instead of metadata
`include_neighbors`	boolean	Include directly connected nodes in response
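To make the `head_tail` mode concrete, here is a small Python sketch of a 60/40 truncation under a `max_lines` budget. It is an illustration under assumptions: the function name, the omission marker text, and treating `max_lines=0` as "no cap" inside this helper are choices made for the example, and the server's C code may differ in detail.

```python
def head_tail(lines, max_lines=200, head_percent=60):
    """Keep the first ~60% and last ~40% of the line budget, drop the middle."""
    if max_lines <= 0 or len(lines) <= max_lines:
        return list(lines)  # nothing to cut
    head_count = max_lines * head_percent // 100
    tail_count = max_lines - head_count  # always >= 1 for head_percent < 100
    omitted = len(lines) - head_count - tail_count
    marker = f"... {omitted} lines omitted ..."  # hypothetical marker text
    return lines[:head_count] + [marker] + lines[-tail_count:]


body = [f"line {i}" for i in range(10)]
short = head_tail(body, max_lines=5)
```

The head tends to hold the signature and setup, and the tail tends to hold return and cleanup code, which is why the middle is the part that gets dropped.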

_hidden_tools is a discovery tool that describes the full classic API and explains how to unlock it. It takes no parameters and returns a description of all 15 classic tools and how to enable them individually (config set tool_<name> true) or globally (config set tool_mode classic). This lets AI clients operating in streamlined mode find out what else exists without being presented with the full list upfront.

Modern MCP: Resources

This PR adds three MCP resources, following the modern MCP spec (resources/list + resources/read). Resources are the right mechanism for ambient context that clients can read passively without invoking a tool call, which preserves tool slots for active operations.

codebase://architecture: highest-importance functions by PageRank score, hotspot files, graph statistics
codebase://status: current index state (ready / indexing / not_indexed / empty)
codebase://schema: node labels, edge types, and example Cypher queries

Backwards Compatibility

The classic API is fully preserved. Nothing is removed. Clients can restore the full 15-tool list with a single setting:

config set tool_mode classic
# or environment variable:
CBM_TOOL_MODE=classic

Individual classic tools can be enabled selectively (config set tool_search_graph true). A _hidden_tools pseudo-tool in the streamlined list explains this to AI clients, so they can discover what exists and unlock what they need without breaking existing integrations.

Every tool response now appends _result_bytes and _est_tokens so clients can gauge context cost.
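A minimal sketch of how such metadata could be computed, assuming the bytes-divided-by-four heuristic mentioned in the commit notes further down this page. The helper name and the use of integer division are assumptions for illustration.

```python
import json


def with_token_metadata(payload):
    """Return a copy of payload with _result_bytes and _est_tokens added.

    _result_bytes is the UTF-8 byte length of the serialized payload;
    _est_tokens uses the rough "4 bytes per token" heuristic.
    """
    text = json.dumps(payload)
    result_bytes = len(text.encode("utf-8"))
    annotated = dict(payload)
    annotated["_result_bytes"] = result_bytes
    annotated["_est_tokens"] = result_bytes // 4
    return annotated


response = with_token_metadata({"results": ["cbm_store_search", "cbm_store_open_path"]})
```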

Ranked Results: PageRank + LinkRank (`src/pagerank/pagerank.c`, `pagerank.h`, new)

Search results previously came back in arbitrary SQLite order, so core utilities and test helpers ranked equally. I've added a new src/pagerank/ module that runs power-iteration PageRank (Google, d=0.85) over the graph after indexing and stores scores in a pagerank column on the nodes table. Results from search_code_graph and codebase://architecture sort by this score by default.

Edge types carry configurable weights: call edges are weighted more heavily than type-reference edges, which matter more than test edges. Thirteen edge weight keys are exposed in the config registry (edge_weight_calls, edge_weight_usage, etc.). The defaults were tuned using scripts/autotune.py, which runs full reindex cycles against known codebases and scores results against a ground-truth function list.
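For readers who want to see the idea in code, the following Python sketch runs weighted power-iteration PageRank with damping 0.85. It is not the contents of `src/pagerank/pagerank.c`: the edge representation, the dangling-node handling, the convergence tolerance, and the weight values are all assumptions chosen for the example.

```python
def weighted_pagerank(edges, weights, damping=0.85, max_iter=100, tol=1e-9):
    """Power-iteration PageRank over (src, edge_type, dst) triples.

    Each edge contributes weights[edge_type] (default 1.0). Rank held by
    nodes without outgoing edges is spread uniformly over all nodes.
    """
    nodes = sorted({n for src, _, dst in edges for n in (src, dst)})
    if not nodes:
        return {}
    n = len(nodes)
    out = {node: {} for node in nodes}
    for src, etype, dst in edges:
        w = weights.get(etype, 1.0)
        if w <= 0:
            continue  # ignore edges that carry no weight
        out[src][dst] = out[src].get(dst, 0.0) + w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            total = sum(out[src].values())
            for dst, w in out[src].items():
                new[dst] += damping * rank[src] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical weights: call edges count far more than test edges.
ranks = weighted_pagerank(
    [("a", "CALLS", "core"), ("b", "CALLS", "core"), ("a", "TESTS", "helper")],
    {"CALLS": 1.0, "TESTS": 0.1},
)
```

In this toy graph `core` ends up ranked above `helper`, because it is reached through two heavily weighted call edges rather than one lightly weighted test edge.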

Dependency Source Indexing (`src/depindex/depindex.c`, `depindex.h`, new)

I've added a new index_dependencies tool that indexes library source code as searchable sub-projects named {project}.dep.{package}. It accepts a package manager name (uv, cargo, npm, and 10 others) and resolves installed source paths automatically. Results are tagged source: "dependency" so project code and library code stay distinguishable. The include_dependencies parameter on search_code_graph controls whether dependency nodes appear in results.

Token Reduction (`src/mcp/mcp.c`)

Several additions reduce how much context search and trace results consume:

Default result limit is now 50 (was unbounded). Configurable via search_limit.
compact=true omits fields that are redundant given other fields (e.g. name when it's the last segment of qualified_name, and zero-value degree counts).
mode=summary returns label and file counts rather than individual nodes, useful for getting a sense of a module's contents.
mode=signature on source retrieval returns the function signature only. mode=head_tail returns the first and last portions of the body. max_lines defaults to 200.
Cypher query results are capped at 10,000 rows and 32KB output by default (both configurable).
Pagination is now stable across pages (deterministic ordering) and responses include has_more and next_offset.

Search Parameters That Were Silently Ignored (bug fixes, `src/mcp/mcp.c`, `src/store/store.c`)

I found five parameters that were being extracted from incoming requests but never passed to the query layer. The server accepted them without error and returned results, giving no indication the parameters had no effect:

qn_pattern: regex filter on qualified_name
relationship: filter to nodes connected by a specific edge type
exclude_entry_points: filter out nodes with no inbound edges
include_dependencies: scope results to project-only or include dep sub-projects
edge_types in trace_call_path: restrict BFS traversal to specific edge types

All five are now wired through to the store layer. Results for queries using these parameters will differ from before.

Config Registry (`src/cli/cli.c`, `cli.h`)

I've replaced hardcoded #define constants with a typed config registry of 33 keys, settable via config set or CBM_* environment variables. Groups: Indexing, Search, Ranking, Watcher, Dependencies. Each key has a description and valid range hint.

Notable keys: search_limit, trace_max_results, key_functions_count, key_functions_exclude, pagerank_max_iter, all 13 edge_weight_* keys, auto_index, tool_mode, snippet_max_lines, query_max_output_bytes.
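The lookup order of a typed registry with environment overrides can be sketched in a few lines of Python. The defaults shown are the ones stated on this page; the `CBM_` plus upper-cased key naming is inferred from `CBM_TOOL_MODE`, and the precedence of environment over stored values is an assumption, not something the PR text confirms.

```python
import os

# key -> (type, default). A small subset of the registry, for illustration.
REGISTRY = {
    "search_limit": (int, 50),
    "trace_max_results": (int, 25),
    "snippet_max_lines": (int, 200),
    "auto_index": (bool, True),
    "tool_mode": (str, "streamlined"),
}


def _parse(kind, raw):
    """Convert a raw string to the registry type."""
    if kind is bool:
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return kind(raw)


def config_get(key, stored=None, environ=None):
    """Resolve a key: environment first, then stored config, then default."""
    environ = os.environ if environ is None else environ
    kind, default = REGISTRY[key]
    env_name = "CBM_" + key.upper()
    if env_name in environ:
        return _parse(kind, environ[env_name])
    if stored and key in stored:
        return _parse(kind, stored[key])
    return default


mode = config_get("tool_mode", environ={"CBM_TOOL_MODE": "classic"})
```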

Reliability Fixes

Heap leaks (src/mcp/mcp.c, src/depindex/, src/pipeline/, src/watcher/): Over 200 leaks identified under macOS leaks --atExit and Linux LSan, now fixed. I've added make test-leak and make test-analyze (Clang static analyzer) targets to make these checks easy to run.

Graph normalization pass (src/pipeline/pass_normalize.c, new): Runs after extraction, before the SQLite dump. Enforces that every Method node has a parent Class via DEFINES_METHOD/MEMBER_OF edges and every Field is attached to its containing type. Extraction gaps previously left orphaned nodes that confused traversal queries.

Watcher (src/watcher/watcher.c): Now tracks all projects accessed in a session, not just the startup project. Uses an adaptive poll interval (1s to 30s). Fixed a SIGBUS crash caused by a stack overflow in the background thread.
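One common way to get an adaptive interval within the 1s to 30s range is exponential backoff that resets on activity. The Python sketch below shows that policy; the doubling rule and the function name are assumptions, since the description only states the range.

```python
def next_poll_interval(current, changed, min_s=1.0, max_s=30.0):
    """Back off while the project is idle; snap back when files change."""
    if changed:
        return min_s
    return min(max_s, max(min_s, current * 2.0))


# Six idle polls in a row, starting from the minimum interval.
interval = 1.0
idle_sequence = []
for _ in range(6):
    interval = next_poll_interval(interval, changed=False)
    idle_sequence.append(interval)
```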

macOS 25+ code signing (Makefile.cbm): Added codesign --force --sign - after make cbm and make install. Without this, macOS kills the binary after a cp invalidates the ad-hoc signature.

New Test Files

File	What it covers
`tests/test_tool_consolidation.c`	Streamlined/classic mode switching, `_hidden_tools`, all 3 streamlined tool schemas
`tests/test_token_reduction.c`	All 8 token reduction strategies
`tests/test_depindex.c`	Package manager detection, source path resolution, `include_dependencies` filter
`tests/test_pagerank.c`	PageRank convergence, edge weight application, MEMBER_OF reverse edges
`tests/test_input_validation.c`	Boundary conditions, malformed inputs, SQL injection resistance

The full suite runs under ASan + UBSan. make test-leak runs the leak check separately.

Key Files Changed

File	Description
`src/mcp/mcp.c`	Streamlined mode, resources, token reduction, 5 param bug fixes, depindex handler
`src/pagerank/pagerank.c` + `.h`	New: PageRank + LinkRank implementation and public API
`src/depindex/depindex.c` + `.h`	New: dependency indexing pipeline and 13-package-manager detection
`src/cli/cli.c` + `.h`	33-key typed config registry with env var overrides
`src/store/store.c` + `.h`	New `cbm_search_params_t` fields, degree precompute columns, stable pagination
`src/pipeline/pass_normalize.c`	New: graph invariant enforcement pass
`scripts/autotune.py`	New: PageRank weight optimizer via persistent MCP stdio sessions
`Makefile.cbm`	macOS signing, leak test target, Clang analyzer target

Breaking Changes

Default tool list is 3, not 15. Restore with CBM_TOOL_MODE=classic.
Search results default to 50. Pass explicit limit or raise search_limit config to get more.
Cypher results capped at 10,000 rows / 32KB. Raise query_max_output_bytes or add LIMIT in Cypher if needed.

Additional useful list of repos:
https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation

Apply 8 token reduction techniques inspired by RTK (Rust Token Killer):
1. Default search limits: search_graph/search_code default limit 500K→50 (CBM_DEFAULT_SEARCH_LIMIT constant). Callers can override explicitly.
2. Smart truncation for get_code_snippet: 3 modes (full/signature/head_tail) with max_lines=200 default (CBM_DEFAULT_SNIPPET_MAX_LINES). head_tail preserves function signature + return/cleanup code. Signature mode returns only API surface without reading source files.
3. Compact mode for search_graph/trace_call_path: omits redundant name field when it's the last segment of qualified_name.
4. Summary mode for search_graph: returns aggregated counts by label and file (top 20) instead of individual results. 95% token reduction.
5. Trace edge case fixes: max_results param (default 25), BFS cycle deduplication by node ID, candidates array for ambiguous function names, callees_total/callers_total counts.
6. query_graph output truncation: max_output_bytes (default 32KB) caps worst-case output. Does NOT change max_rows (which is a scan-limit that would break aggregation queries).
7. Token metadata: _result_bytes and _est_tokens in all MCP tool responses for LLM token awareness.
8. Stable pagination: ORDER BY name, id for deterministic pagination.
All defaults use named constants (CBM_DEFAULT_*), no magic numbers. CYPHER_RESULT_CEILING reduced 100K→10K as safety net.
Tests: 22 new tests in test_token_reduction.c, all passing. All 2060+ existing tests pass with zero regressions.

Register index_dependencies MCP tool for indexing dependency/library source code into a separate dependency graph. Dependencies are stored in {project}_deps.db (separate from project.db) and are NOT included in queries unless include_dependencies=true is passed.
AI grounding safeguards (7-layer defense):
1. Storage: separate _deps.db not touched by index_repository
2. Query default: include_dependencies=false (deps excluded by default)
3. QN prefix: dep.{mgr}.{package}.{symbol} convention documented
4. Response field: "source":"project" / "source":"dependency" labels
5. Properties: "external":true on dependency nodes
6. Tool description: explicitly states "SEPARATE dependency graph"
7. Boundary markers: trace_call_path shows project→dep edges
Current state:
- Tool registered with full parameter validation (project, package_manager required)
- include_dependencies param added to search_graph with source field
- Handler returns structured "not_yet_implemented" status
- Full dep resolution pipeline (depindex module) designed but deferred
Tests: 12 new tests in test_depindex.c, all passing. All 2042 existing tests pass with zero regressions.
Next: implement src/depindex/ module for actual package resolution (uv/cargo/npm/bun), dependency file discovery, and pipeline integration per the plan in plans/serialized-pondering-puppy.md.

…ndexing (reference-api-indexing)
Resolve 5 conflicts across 3 files:
- Makefile.cbm: combine TEST_TOKEN_REDUCTION_SRCS + TEST_DEPINDEX_SRCS in ALL_TEST_SRCS
- src/mcp/mcp.c: merge compact/summary/search_mode params with include_dependencies param in handle_search_graph; add source:"project" field in full-mode results loop at line ~912
- tests/test_main.c: register both suite_token_reduction and suite_depindex
Combined capabilities on merged branch:
- 8 RTK-inspired token reduction strategies (CBM_DEFAULT_* constants, 3 snippet modes, compact/summary search, BFS dedup, query output cap, _result_bytes/_est_tokens metadata)
- index_dependencies MCP tool with AI grounding (7-layer defense: separate _deps.db, include_dependencies=false default, dep.{mgr}.{pkg} QN prefix, source field, external property)
Tests: 2064 passed, 0 failed (22 token_reduction + 12 depindex + 2030 existing)

Summary mode bug: by_label only counted 50 results (the default limit) instead of all symbols. Fix: override effective_limit to 10000 when mode=summary so aggregation covers representative sample.
Pagination: when has_more=true, add pagination_hint field: "Use offset:50 and limit:50 for next page (13818 total)". This guides LLMs to use offset/limit for progressive exploration.
Verified on RTK codebase (45,388 symbols):
- Summary mode: 1,317 bytes with accurate label counts
- Default search: pagination_hint present when has_more=true
- All 2064 tests pass
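The aggregation that summary mode performs can be sketched in Python as two counters over the matched nodes. The field names `label` and `file_path`, the output keys, and the sample data are assumptions for the example; the point is that the counts have to be taken over the full match set rather than over one page of 50.

```python
from collections import Counter


def summarize(nodes, top_files=20):
    """Aggregate matched nodes into counts by label and by file."""
    by_label = Counter(node["label"] for node in nodes)
    by_file = Counter(node["file_path"] for node in nodes)
    return {
        "total": len(nodes),
        "by_label": dict(by_label),
        "by_file": dict(by_file.most_common(top_files)),
    }


# Hypothetical match set.
matches = [
    {"label": "Function", "file_path": "src/mcp/mcp.c"},
    {"label": "Function", "file_path": "src/mcp/mcp.c"},
    {"label": "Class", "file_path": "src/store/store.c"},
]
summary = summarize(matches)
```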

…branch The TEST_DEPINDEX_SRCS and suite_depindex belong on the reference-api-indexing branch only. Remove from this branch to fix build error (test_depindex.c not present here).

All token reduction defaults are now configurable at runtime via the config system (cbm_config_get_int).
Config keys:
- search_limit: default result limit for search_graph/search_code
- snippet_max_lines: default max source lines for get_code_snippet
- trace_max_results: default max BFS nodes for trace_call_path
- query_max_output_bytes: default output cap for query_graph
Tool schema descriptions no longer contain hardcoded numbers; they reference config keys instead, so changing a default won't make the description misleading.
Tool descriptions now include comprehensive AI guidance:
- search_graph: how to paginate (offset+limit), mode=summary for overview
- query_graph: max_output_bytes=0 for unlimited, LIMIT in Cypher
- get_code_snippet: mode=signature for API lookup, mode=head_tail for preserving return/cleanup, max_lines=0 for full source
- trace_call_path: max_results for exhaustive traces, callees_total for truncation awareness
- All tools: config key names documented for runtime override
Tests: 2052 passed, 0 failed

…indexing

Previous behavior: Merging reduce-token-usage (which removed depindex refs from its branch) into the combined branch dropped TEST_DEPINDEX_SRCS and suite_depindex, reducing test count from 2064 to 2052.
What changed:
- Makefile.cbm: re-add TEST_DEPINDEX_SRCS = tests/test_depindex.c and include $(TEST_DEPINDEX_SRCS) in ALL_TEST_SRCS
- tests/test_main.c: re-add extern suite_depindex declaration and RUN_SUITE(depindex) call before integration suite
Why: The merged branch must run both test suites (token_reduction + depindex). The upstream reduce-token-usage branch correctly excludes depindex (it doesn't have that feature), but the combined branch needs both.
Testable: make -f Makefile.cbm test → 2064 passed, 0 failed

1. Remove misleading "Set limit=0 for no cap" from search_graph schema description: store.c maps limit=0 to 500K, not truly unlimited
2. Eliminate redundant is_summary_early variable: merge into single is_summary bool computed once before the search query
3. Add bounds-check comment for summary mode labels[64] array explaining the cap matches CBM's ~12 label types with margin
4. Replace %zu with %lu + (unsigned long) cast in query_graph truncation snprintf for portability (existing codebase avoids %zu)
5. Add include_dependencies parameter to search_graph tool schema so LLMs can discover the opt-in dependency inclusion feature
6. Remove hardcoded "default":50 from search_code JSON schema: actual default comes from config key search_limit at runtime
Tests: 2064 passed, 0 failed

The include_dependencies parameter belongs to the reference-api-indexing branch only. It was accidentally introduced via cherry-pick of the code review fix. The schema declared a parameter that the handler on this branch doesn't read — a maintainer would flag this as a schema/code mismatch. Removed the include_dependencies property from the search_graph tool schema JSON. The parameter remains in the combined branch where the handler code exists. Tests: 2052 passed, 0 failed

- Token metadata comment: explain _result_bytes (byte length of inner JSON text) and _est_tokens (bytes/4, same heuristic as RTK's estimate_tokens function in tracking.rs)
- Pagination hint: add comment explaining the pagination_hint field purpose (tells caller how to get next page)
- Head/tail mode: document the 60/40 split rationale (60% head captures signature/setup, 40% tail captures return/cleanup; middle implementation detail is what gets omitted)
Tests: 2064 passed, 0 failed

Defensive guards for out-of-memory conditions:
1. trace_call_path: calloc for seen_out/seen_in dedup arrays now gracefully degrades: if calloc returns NULL, dedup is skipped (may return duplicates) instead of NULL-dereference crash
2. build_snippet_response: head_tail combined buffer malloc is NULL-checked; on OOM, falls back to outputting head portion only instead of passing NULL to snprintf
All guards are idiomatic C (if-pointer-check, no gotos). Existing tests cover the functional behavior; OOM paths are defensive safety nets for production resilience.
Tests: 2052 passed, 0 failed

The include_dependencies parameter was parsed in the handler (line 776) but not declared in the TOOLS[] schema JSON. This meant LLMs could not discover the parameter from tool descriptions — it was silently accepted but undiscoverable. Added include_dependencies boolean property with description to the search_graph tool schema, matching the merged branch's schema. Tests: 2042 passed, 0 failed

OOM fixes (applied to both feature branches):
1. trace_call_path: calloc for seen_out/seen_in dedup arrays gracefully degrades on OOM, skipping dedup instead of NULL-dereference crash
2. build_snippet_response: head_tail combined buffer malloc falls back to head-only output on OOM instead of NULL snprintf
Documentation (notes/ folder):
- notes/token-reduction-changes.md: 8 RTK-inspired strategies, config system, real-world results, mermaid architecture diagram
- notes/reference-api-indexing-changes.md: 7-layer AI grounding defense, QN prefix format, deferred work, mermaid flow diagram
- notes/merged-branch-changes.md: branch lineage gitGraph, combined architecture diagram, snippet mode decision flow, token reduction pipeline per tool, test coverage, merge conflict resolution
Tests: 2064 passed, 0 failed

…l 4 SKILL.md files
codebase-memory-reference/SKILL.md:
- Update tool count from 14 to 15 (add index_dependencies)
- Remove read_file/list_directory (not in TOOLS[] array)
- Add "Token Reduction Parameters" section documenting mode, compact, max_lines, max_output_bytes, max_results, include_dependencies
- Add config key reference for runtime overrides
- Update Critical Pitfalls: search_graph defaults to 50, query_graph capped at 32KB
- Add decision matrix entries for summary, signature, head_tail, dependency search
codebase-memory-tracing/SKILL.md:
- Add mode=signature example to Step 5 for quick API inspection
- Document max_results default (25) and compact=true for token savings
codebase-memory-exploring/SKILL.md:
- Add mode=summary to Step 2 as alternative overview method
- Update default from 10 to 50 results per page
- Add compact=true and pagination_hint tips
codebase-memory-quality/SKILL.md:
- Add mode=summary and compact=true tips
- Update pagination guidance with pagination_hint
Tests: 2064 passed, 0 failed

codebase-memory-reference/SKILL.md:
- Update search_graph, trace_call_path, query_graph, get_code_snippet tool descriptions with new parameters
- Remove read_file/list_directory (not in TOOLS[] array)
- Add "Token Reduction Parameters" section with mode, compact, max_lines, max_output_bytes, max_results documentation
- Add config key reference for runtime overrides
- Update Critical Pitfalls for new defaults
- Add decision matrix entries for summary, signature, head_tail
codebase-memory-tracing/SKILL.md:
- Add mode=signature example, max_results default, compact=true tip
codebase-memory-exploring/SKILL.md:
- Add mode=summary to Step 2, update default to 50, add compact tip
codebase-memory-quality/SKILL.md:
- Add mode=summary, compact=true, pagination_hint tips
Tests: 2052 passed, 0 failed

codebase-memory-reference/SKILL.md:
- Update tool count from 14 to 15 (add index_dependencies)
- Remove read_file/list_directory (not in TOOLS[] array)
- Add include_dependencies note to search_graph description
- Add decision matrix entries for dependency search and indexing
Tests: 2042 passed, 0 failed

…grams
Comprehensive feature matrix documenting:
- Branch availability for all 13 existing + new features
- Composability matrix showing how features interact when combined
- Detailed interaction table with justifications for each combination
- Strengths and limitations of each feature with specific measurements
- AI grounding 7-layer defense failure mode analysis
- Architecture diagram showing composable pipeline stages
- 5 generalizable design patterns extracted from the implementation
Key composability findings:
- summary mode overrides limit (uses 10K for accurate aggregation)
- signature mode overrides max_lines (no file I/O needed)
- compact applies independently at serialization stage
- include_dependencies composes with all token reduction features
- _result_bytes/_est_tokens always reflects final output size
Tests: 2064 passed, 0 failed

…h_code fix
New module src/depindex/ with package resolution (uv/cargo/npm/bun), ecosystem detection, dep discovery from indexed graph, auto-index helper, and cross-boundary edge creation stub. Dependencies stored in same db with {project}.dep.{package} naming convention.
Pipeline changes:
- Add CBM_MODE_DEP index mode (keeps vendor/, .d.ts for dep source)
- Add cbm_pipeline_set_project_name() to override auto-derived name
- Add cbm_pipeline_set_flush_store() for upsert vs fresh dump
- Conditional dump/flush at pipeline.c:646
Store changes:
- Add project_pattern (LIKE) and project_exact fields to cbm_search_params_t
- Support LIKE queries for glob-style project filtering
- Add project-first ORDER BY for mixed project+dep results
- Stable pagination via ORDER BY name, id
MCP changes:
- Replace index_dependencies stub with full implementation (source_paths[] primary interface, package_manager optional shortcut)
- Fix detect_session() to use cbm_project_name_from_path (Bug DeusData#12)
- REQUIRE_STORE error now includes actionable hint field
- search_code: fix -m limit exhaustion (limit*50 min 500 vs limit*3)
- search_code: add case_sensitive param (default false = case-insensitive)
DRY improvements:
- CBM_MANIFEST_FILES shared list in depindex.h used by pass_configlink.c and dep discovery (adds pyproject.toml, setup.py, Pipfile)
- Remove package.json and composer.json from IGNORED_JSON_FILES (needed by pass_configlink and dep auto-discovery)
Tests: 25 depindex tests (2055 total, all passing)
- Package manager parse/str roundtrip, dep naming, is_dep detection
- Ecosystem detection (python/rust/none), manifest path matching
- npm resolution with fixture, pipeline set_project_name
- MCP tool validation, AI grounding, dep reindex replaces

Merge reduce-token-usage branch into token-reduction-and-reference-indexing. Conflict in codebase-memory-reference/SKILL.md resolved by taking the reduce-token-usage version (has complete token reduction documentation).

Merge reference-api-indexing into token-reduction-and-reference-indexing. Three conflicts resolved with superset approach:
1. search_graph schema (mcp.c:272): Keep reduce-token-usage's mode/compact/pagination_hint params AND reference-api-indexing's include_dependencies
2. search_code schema (mcp.c:349): Keep reduce-token-usage's configurable limit description AND reference-api-indexing's case_sensitive param
3. store.c ORDER BY (1866): Keep reference-api-indexing's project-first sort for mixed project+dep results (superset of stable name,id sort)
SKILL.md: took reduce-token-usage version (complete token reduction docs).
All 2077 tests pass (2042 base + 22 token-reduction + 13 dep-indexing).

…paths
expand_project_param() (mcp.c:764-840):
- "self" → session project exact match
- "dep"/"deps" → session.dep prefix match
- "dep.pandas" → session.dep.pandas prefix
- "myapp.pandas" → myapp.dep.pandas (auto-insert .dep.)
- Glob "*" → SQL LIKE with % substitution
- fill_project_params() helper sets cbm_search_params_t fields
search_graph result tagging (mcp.c:930-960):
- Every result tagged source:"project" or source:"dependency"
- Dep results get package name + read_only:true
- session_project added to response for AI project name awareness
- Uses cbm_is_dep_project() with session context for precision
handle_index_status (mcp.c:1046-1100):
- Reports dependencies[] array with package names and node counts
- Reports detected_ecosystem from project root marker files
- session_project in response
Dep auto-reindex in all 3 re-index paths:
- handle_index_repository (mcp.c:1472): cbm_dep_auto_index after dump
- watcher_index_fn (main.c:86-96): cbm_dep_auto_index after dump
- autoindex_thread (mcp.c:2496-2501): cbm_dep_auto_index after dump
All use DRY cbm_dep_auto_index() with CBM_DEFAULT_AUTO_DEP_LIMIT
cbm_mcp_server_set_session_project() added (mcp.h:128, mcp.c:526)
Fix: yyjson_mut_obj_add_strcpy for dep package names from search results (heap-use-after-free when cbm_store_search_free frees borrowed strings)
Fix: db_project selection when session_project is empty (integration test integ_mcp_delete_project was failing: resolve_store got NULL instead of project name after expand_project_param)
Tests: 29 depindex tests (2059 total, all passing)
- test_search_results_have_source_field: project results tagged
- test_search_dep_results_tagged_dependency: dep results have package+read_only
- test_search_response_has_session_project: session_project in response
- test_index_status_shows_deps: dependencies[] in index_status response
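The project-parameter shorthands listed in this commit can be illustrated with a Python sketch. This is not the C function in mcp.c: the `(kind, pattern)` return shape, the rule used to detect the "myapp.pandas" shorthand, and the exact LIKE patterns are assumptions made for the example.

```python
def expand_project_param(value, session_project):
    """Map a project shorthand to ("exact", name) or ("like", sql_pattern).

    Note: `_` in a project name would act as a LIKE wildcard here; a real
    implementation would need to escape it.
    """
    if value == "self":
        return ("exact", session_project)
    if value in ("dep", "deps"):
        return ("like", f"{session_project}.dep.%")
    if value.startswith("dep."):
        return ("like", f"{session_project}.{value}%")
    if "*" in value:
        return ("like", value.replace("*", "%"))
    prefix = session_project + "."
    if value.startswith(prefix) and ".dep." not in value:
        # Assumed rule: "<session>.<package>" means "<session>.dep.<package>".
        package = value[len(prefix):]
        return ("like", f"{session_project}.dep.{package}%")
    return ("exact", value)


resolved = {
    name: expand_project_param(name, "myapp")
    for name in ("self", "deps", "dep.pandas", "myapp.pandas", "my*", "other")
}
```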

…ep re-index
Merge latest reference-api-indexing (bd09623) with gap implementations:
Conflicts resolved (2 in mcp.c search_graph handler):
1. Param parsing: keep reduce-token-usage compact/summary/mode params, use fill_project_params() from reference-api-indexing for smart project
2. Result loop: keep reduce-token-usage compact mode + pagination_hint, add unconditional source/package/read_only tagging from ref-api-indexing
Superset features in merged handler:
- expand_project_param (self/dep/deps/glob/prefix resolution)
- compact mode (omit redundant name when suffix of QN)
- summary mode (aggregate counts by label/file)
- pagination_hint with offset/limit guidance
- session_project in all responses
- source:"project"/"dependency" on every result
- package + read_only:true on dep results
- dep auto-reindex in all 3 paths (handler, watcher, autoindex)
- index_status reports dependencies[] + detected_ecosystem
All 2081 tests pass (2042 base + 22 token-reduction + 17 dep-indexing).

Root cause: handle_list_projects opens every .db file in ~/.cache/codebase-memory-mcp/ via cbm_store_open_path (which runs CREATE TABLE IF NOT EXISTS, modifying foreign databases). With 62 stale .db files (1.3GB) including a corrupt 223MB "..db" (empty project name), the server hung during Claude Code health checks.
Fixes:
- Add validate_cbm_db(): read-only SQLite validation with magic byte check + 'nodes' table schema check + 1s busy_timeout. Never modifies foreign databases. Logs actionable warnings on skip.
- Guard detect_session() against empty/dot project names that produce the corrupt "..db" filename
- Skip "..db" and ".db" filenames in handle_list_projects
- Skip empty/dot project names after filename-to-name extraction
- Force unbuffered stdin/stdout via setvbuf for MCP stdio protocol
- Add #include <sqlite3.h> for read-only validation queries
Files: src/main.c (setvbuf), src/mcp/mcp.c (validate_cbm_db, detect_session guard, list_projects guards, sqlite3.h include)

Gap 3 (trace boundary tagging): trace_call_path now tags each caller and callee with source:"project"|"dependency" and read_only:true for dep nodes. Uses cbm_is_dep_project() for consistent tagging.

Gap 4 (snippet provenance): build_snippet_response adds source and read_only fields so get_code_snippet results indicate whether code is from the project or a dependency.

Cross-edges: cbm_dep_link_cross_edges implemented. It searches project Variable nodes, looks for matching Module nodes in dep projects (project.dep.%), and creates IMPORTS edges to link them. Enables trace_call_path to follow imports across the project/dep boundary.

Gap 1 (watcher dep re-index) was already done in prior commit.

Files: src/mcp/mcp.c (trace + snippet tagging), src/depindex/depindex.c (cross-edge implementation)

…lt true

exclude param (search_code_graph, trace_call_path, get_architecture):
- Accepts array of glob patterns to filter results by file_path
- Converted to SQL NOT LIKE via cbm_glob_to_like in store.c
- New cbm_search_params_t.exclude_paths field (NULL-terminated array)
- Helper: cbm_mcp_get_string_array_arg() parses JSON array → C string array
- 4 TDD tests: filters paths, empty array no-op, exclude-all, schema presence

Config-driven key_functions (get_architecture tool + codebase://architecture):
- build_key_functions_sql() shared helper: builds PageRank query with config + param exclude patterns applied via NOT LIKE clauses
- CBM_CONFIG_KEY_FUNCTIONS_EXCLUDE: comma-separated globs persisted in config (e.g. "scripts/**,tools/**,tests/**"), no hardcoded path assumptions
- Labels filtered to Function/Class/Method/Interface (code entities only)
- Both get_architecture tool and build_resource_architecture use same helper

auto_index default changed from false to true:
- maybe_auto_index() now indexes on first startup by default
- Ensures codebase://schema/architecture/status resources have data at first read
- Configurable: set auto_index=false to disable for large repos

Tests: 2197 → 2201 (4 new exclude param tests)
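As a rough illustration of the glob-to-LIKE conversion behind the exclude param, here is a Python sketch. It is not `cbm_glob_to_like` itself: mapping `?` to `_`, collapsing `**` into a single `%`, and the backslash escaping are assumptions about reasonable behaviour.

```python
def glob_to_like(glob: str) -> str:
    """Translate a simple glob into a SQL LIKE pattern (ESCAPE '\\')."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "*":
            # Treat "*" and "**" alike: any run of stars becomes one "%".
            while i + 1 < len(glob) and glob[i + 1] == "*":
                i += 1
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in "%_\\":
            # Escape characters that LIKE would otherwise treat as special.
            out.append("\\" + ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def exclude_clause(patterns):
    """Build a WHERE fragment plus bind values that drop matching file paths."""
    if not patterns:
        return "1=1", []  # empty array is a no-op
    clause = " AND ".join("file_path NOT LIKE ? ESCAPE '\\'" for _ in patterns)
    return clause, [glob_to_like(p) for p in patterns]
```

With the example config value from the commit, `exclude_clause(["scripts/**", "tools/**", "tests/**"])` produces three `NOT LIKE` conditions with the patterns `scripts/%`, `tools/%`, and `tests/%`.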

Config registry (CBM_CONFIG_REGISTRY in cli.c):
- 25 config keys across 5 categories: Indexing, Search, Tools, PageRank, Dependencies. Each entry has key, default, env var name, category, description.
- All defaults verified against code-level #define values.

cbm_config_get_effective(): priority chain env > DB > default.
- Checks registry for env var name, reads env first, falls back to DB.
- Used by config get CLI and auto_index in maybe_auto_index.

Env var overrides for key settings:
- CBM_AUTO_INDEX (bool), CBM_AUTO_INDEX_LIMIT (int)
- CBM_REINDEX_ON_STARTUP (bool)
- CBM_KEY_FUNCTIONS_EXCLUDE (comma-separated globs)
- CBM_TOOL_MODE (streamlined/classic)

config list output:
- Grouped by category with [Category] headers
- Shows (env) when env var is active, (set) when DB value differs from default
- All 25 keys visible (was: only 2)

config help:
- Shows storage location (~/.cache/codebase-memory-mcp/_config.db)
- Priority explanation (env > config set > default)
- Examples for config set and env var usage
- Keys grouped by category with [env: VAR_NAME] annotation

Fixed: auto_dep_limit default 5→20, dep_max_files default 5000→1000 to match code-level CBM_DEFAULT_AUTO_DEP_LIMIT and CBM_DEFAULT_DEP_MAX_FILES.
Fixed: hint message provides complete commands, not fragments.
Improved: dependency config descriptions explain what packages/files mean.
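The env > DB > default priority chain can be illustrated with a small Python sketch. The registry layout mirrors the five fields named in the commit; the two entries, their descriptions, and the returned source labels are illustrative assumptions rather than the contents of cli.c.

```python
import os
from typing import Mapping, NamedTuple, Optional, Tuple


class ConfigEntry(NamedTuple):
    key: str
    default: str
    env_var: Optional[str]  # None when no env override exists (assumed)
    category: str
    description: str


# Two illustrative entries; the real registry is described as having 25 keys.
REGISTRY = {
    e.key: e
    for e in [
        ConfigEntry("auto_index", "true", "CBM_AUTO_INDEX", "Indexing",
                    "Index the project on first startup"),
        ConfigEntry("auto_dep_limit", "20", None, "Dependencies",
                    "Maximum number of packages to auto-index"),
    ]
}


def get_effective(key: str, db: Mapping[str, str],
                  env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (value, source) following the chain env > DB > default."""
    entry = REGISTRY[key]
    if entry.env_var and entry.env_var in env:
        return env[entry.env_var], "env"
    if key in db:
        return db[key], "set"
    return entry.default, "default"
```

With `CBM_AUTO_INDEX=false` exported, `get_effective("auto_index", {"auto_index": "true"})` returns the env value, which is what lets `config list` mark the key with (env).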

Root cause: pass_configlink.c allocated ~4.6MB on the stack:
- config_entries[4096] × 520 bytes = 2.0MB
- code_entries[8192] × 264 bytes = 2.1MB
- deps[2048] × 264 bytes = 0.5MB
Background threads get 512KB stack (macOS default) → SIGBUS.

Fix: heap-allocate all three arrays with calloc, free on every return path.
Verified: autorun repo (311 files, 6766 nodes) completes in 409ms.

Also fix: main.c shutdown order. Join the autoindex thread BEFORE freeing watcher and watch_store. Previously watcher was freed while the autoindex thread still had a reference to srv->watcher, causing use-after-free.

Tested: CBM_AUTO_INDEX=true on ~/.claude/autorun gives clean completion, no SIGBUS, no hang. 2201 tests pass.

MEMBER_OF edges (Method→Class):
- Pipeline inserts MEMBER_OF reverse edge alongside each DEFINES_METHOD edge in both parallel (pass_parallel.c) and sequential (pass_definitions.c) paths. PageRank power iteration naturally propagates member importance to parent classes via the graph structure, with no post-hoc hacks.
- Config: edge_weight_member_of (default 0.5, 0=disabled)

Edge weight tuning:
- USAGE: 0.2→0.7 (type refs dominant in Python/JS)
- DEFINES: 0.5→0.1 (structural noise)
- DEFINES_METHOD: 0.8→0.5
- default_weight: 0.3→0.1
- New explicit: TESTS=0.05, WRITES=0.15, DECORATES=0.2

Result on autorun (no hacks, pure algorithm):
- EventContext #5, SessionStateManager #4, classes throughout top 10
- Test functions dampened, structural noise reduced
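To show why a MEMBER_OF reverse edge lifts a class's score, here is a small weighted PageRank power iteration in Python. It is a sketch under assumptions, not pagerank.c: the weights are the ones listed above, CALLS is taken as 1.0, damping is the 0.85 cited in the PR description, and dangling-node mass is spread uniformly.

```python
EDGE_WEIGHTS = {
    "CALLS": 1.0,  # assumed; the commit above does not list it
    "USAGE": 0.7, "DEFINES": 0.1, "DEFINES_METHOD": 0.5,
    "MEMBER_OF": 0.5, "TESTS": 0.05, "WRITES": 0.15, "DECORATES": 0.2,
}
DEFAULT_WEIGHT = 0.1


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Weighted PageRank over (src, dst, edge_type) triples."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: 0.0 for v in nodes}
    weighted = []
    for src, dst, kind in edges:
        w = EDGE_WEIGHTS.get(kind, DEFAULT_WEIGHT)
        if w > 0:  # a weight of 0 disables that edge type
            weighted.append((src, dst, w))
            out_weight[src] += w
    for _ in range(iterations):
        # Nodes without outgoing edges give their rank to everyone equally.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, dst, w in weighted:
            nxt[dst] += damping * rank[src] * w / out_weight[src]
        rank = nxt
    return rank


# Toy graph: function f calls two methods of class C.
NODES = ["f", "C", "m1", "m2"]
BASE = [("f", "m1", "CALLS"), ("f", "m2", "CALLS"),
        ("C", "m1", "DEFINES_METHOD"), ("C", "m2", "DEFINES_METHOD")]
REVERSE = [("m1", "C", "MEMBER_OF"), ("m2", "C", "MEMBER_OF")]

without_member_of = pagerank(NODES, BASE)
with_member_of = pagerank(NODES, BASE + REVERSE)
```

In this toy graph the class has no incoming edges until the MEMBER_OF edges are added, so `with_member_of["C"]` ends up well above `without_member_of["C"]`.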

…ere silently ignored

Previous behavior: search_graph accepted qn_pattern, relationship, exclude_entry_points, include_connected, and include_dependencies in its JSON schema but never extracted or applied them; all 5 were silently ignored. trace_call_path hardcoded edge_types=["CALLS"] regardless of user input, and its compact default (true) disagreed with the schema (false). include_dependencies schema default was false, opposite to the prefix-match behavior that already included dep sub-projects by default.

What changed:
- src/mcp/mcp.c: extract qn_pattern and relationship in handle_search_graph Phase 1 (after name_pattern, before file_pattern); extract exclude_entry_points, include_connected, include_dependencies as bools after max_degree; wire all 5 into cbm_search_params_t; add include_dependencies=false guard: sets project_exact=true when project is set without glob pattern, scoping results to exact project name (excludes .dep.* sub-projects); add free(qn_pattern) and free(relationship) to cleanup block
- src/mcp/mcp.c: replace hardcoded edge_types[]={"CALLS"} in handle_trace_call_path with user-supplied edge_types array extracted after all three early-return guards (lines 2062, 2069, 2086) to avoid memory leaks on those paths; use free_string_array() for cleanup; fix compact default from false to true (matches schema); fix include_dependencies schema default from false to true with updated description
- src/store/store.c: add qn_pattern REGEXP/iregexp dual-branch WHERE clause after name_pattern block (same pattern as name_pattern at lines 1835-1844); add relationship EXISTS filter using local rel_cond[128] (exceeds bind_buf[64]) with both edge directions (source OR target); merge exclude_entry_points "in_deg > 0" condition into the existing degree-filter subquery block to avoid double subquery nesting; fix has_degree_wrap to include exclude_entry_points so ORDER BY uses bare column names in the outer wrapped query
- tests/test_token_reduction.c: add setup_sp_server() fixture (4 nodes: main, process_request, fetch_data, dep_helper; 2 edges: CALLS main->process_request, HTTP_CALLS fetch_data->process_request); add 12 new parameterization accuracy tests in token_reduction suite covering qn_pattern filter, relationship filter, exclude_entry_points, include_dependencies=true/false, compact default, edge_types traversal

Why: parameters declared in the MCP schema but not implemented silently accept user input and return wrong results, so AI agents and users passing these params get misleading output. The include_dependencies schema default disagreed with actual behavior. The trace edge_types hardcoding prevented traversal of non-CALLS relationships (HTTP_CALLS, IMPORTS, etc.).

Testable: make -f Makefile.cbm test (2213 passed, 0 failed)
- search_graph '{"qn_pattern":".*handlers.*","project":"sp-test"}' returns only handlers
- search_graph '{"relationship":"HTTP_CALLS","project":"sp-test"}' returns nodes with HTTP edges
- search_graph '{"exclude_entry_points":true}' removes nodes with in_deg=0 (CALLS)
- search_graph '{"include_dependencies":false,"project":"myapp"}' excludes myapp.dep.* nodes
- trace_call_path '{"function_name":"f","edge_types":["HTTP_CALLS"]}' follows HTTP edges
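The effect of honouring a user-supplied edge_types list can be sketched in Python on the two edges from the test fixture above. The traversal code is an illustration only, not handle_trace_call_path; the function name, the default of `("CALLS",)`, and the default depth are assumptions.

```python
from collections import deque

# The two edges from the setup_sp_server() fixture described above.
EDGES = [
    ("main", "process_request", "CALLS"),
    ("fetch_data", "process_request", "HTTP_CALLS"),
]


def trace_callers(target, edges, edge_types=("CALLS",), depth=3):
    """Breadth-first walk over inbound edges whose type is in edge_types."""
    allowed = set(edge_types)
    depth = max(depth, 1)  # depth below 1 is clamped to 1
    seen = {target}
    found = []
    frontier = deque([(target, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue
        for src, dst, kind in edges:
            if dst == node and kind in allowed and src not in seen:
                seen.add(src)
                found.append(src)
                frontier.append((src, dist + 1))
    return found
```

With the edge type hardcoded to CALLS, `fetch_data` can never appear as a caller of `process_request`; passing `edge_types=["HTTP_CALLS"]` is what surfaces it.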

…token efficiency

- search_graph compact: enumerate all omitted fields explicitly (name, empty label/file_path, zero degrees) with concrete example and absent-field defaults, replacing ambiguous "Absent:" footnote that didn't connect omission to compact.
- search_graph include_dependencies: remove redundant "Default: true" restatement (already in schema) and duplicate "dep sub-projects" mention.
- trace_call_path compact: add missing omission condition (name == qualified_name last segment) and example, replacing unexplained "redundant" jargon.
- query_graph max_rows: tighten prose without losing the "default: unlimited" fact (absent from schema) or the scanned-vs-returned distinction.
- search_code case_sensitive: consolidate into single clause "Match case-sensitively (default: case-insensitive)."

Also includes (from prior commits on this branch):
- search_graph: omit empty label/file_path fields instead of emitting ""
- search_graph: omit zero in_degree/out_degree instead of emitting 0
- trace_call_path candidates: omit empty file_path instead of emitting ""

Replace hardcoded /Users/martinvogel path (and intermediate ~ which MCP clients don't expand) with sh -c "exec \$HOME/.local/bin/..." so the shell expands \$HOME at launch time on any machine.

…on to 17 managers, improve compact output

Memory leaks fixed (0 leaks confirmed via leaks --atExit):
- mcp.c resolve_store: cbm_project_free_fields was gated on proj.root_path[0], so empty string paths silently skipped free. Separated free from the watcher call; now always frees after successful cbm_store_get_project.
- mcp.c handle_index_status: cbm_store_search_free skipped when dep_out.count==0, although cbm_store_search allocates even for empty results. Restructured to free whenever search succeeds. Same fix for cbm_project_free_fields call in ecosystem detection path.
- pagerank.c: node_labels leaked on two early return paths (N==0 and id_map_init failure). Both paths now free node_ids and node_labels (with per-element free for strdup'd entries before the N==0 branch assigns any).
- pass_envscan.c: 8 static regexes compiled once by compile_patterns() were never freed. Added cbm_envscan_free_patterns() that calls cbm_regfree on each and resets patterns_compiled=0.
- pipeline.h/pipeline.c: public cbm_pipeline_global_cleanup() wraps cbm_envscan_free_patterns(). Called in main.c after ALL server threads joined (HTTP + stdio) to avoid racing with autoindex threads. Also called in run_cli() path and test_pipeline.c teardown.

Ecosystem detection expanded from 8 to 17 package managers:
- depindex.h: added CBM_PKG_MAKE, CBM_PKG_CMAKE, CBM_PKG_MESON, CBM_PKG_CONAN (C/C++ build systems). Expanded CBM_MANIFEST_FILES with build.gradle.kts, bun.lockb, global.json, Directory.Build.props, NuGet.Config, Makefile, GNUmakefile, Makefile.cbm, CMakeLists.txt, meson.build, conanfile.txt, conanfile.py, vcpkg.json.
- depindex.c: rewrote cbm_detect_ecosystem to cover all 17 managers using CHECK() macro for exact filename matches and dir_contains_suffix() for wildcard patterns (*.csproj, *.fsproj). Added has_vendored_deps_dir() helper. Added discover_vendored_deps() which scans vendor/ vendored/ third_party/ thirdparty/ deps/ external/ ext/ contrib/ lib/ _vendor/ submodules/ for C/C++ and CBM_PKG_CUSTOM build systems.

dep search hint in handle_search_graph:
- When a dep project search (project:"dep", expanded to prefix "<session>.dep") returns 0 results, emits a "hint" field with an ecosystem-aware actionable message. If cbm_detect_ecosystem succeeds, the hint names the detected build system and instructs to re-run index_repository. If no ecosystem detected, lists all 17 supported manifest file types.

Compact output improvements in mcp.c:
- handle_search_graph: skip emitting "name" when it equals the last segment of qualified_name (ends_with_segment check) or when empty.
- handle_trace_call_path: same fix for both outbound (callees) and inbound (callers) node arrays. Added callers_total emission to match callees_total (was documented in tool description but never emitted).
- build_snippet_response: skip empty name, label, and file_path fields. Compact param now wired through all six call sites in handle_get_code. Zero-value numeric fields skipped in compact mode.
- handle_get_architecture / build_resource_architecture: skip redundant name (when equals last qualified_name segment) and empty label/fp in key_functions arrays.

Test coverage:
- test_token_reduction.c: 504-line new file covering compact suppression of redundant name/label/empty fields, callers_total presence, get_code compact param propagation, architecture key_functions, and dep search hint emission.
- test_mcp.c, test_pipeline.c: minor additions for new behaviors.

Makefile.cbm:
- Added nosan build (CFLAGS_NOSAN, LDFLAGS_NOSAN, MONGOOSE_CFLAGS_NOSAN, per-object NOSAN variants for sqlite3/lsp/grammar/ts_runtime/mongoose).
- Added test-leak target: macOS uses leaks --atExit on test-runner-nosan; Linux uses ASAN_OPTIONS=detect_leaks=1 on regular test-runner.
- Added test-analyze target: Clang --analyze on production + test sources (skipped with message when IS_GCC=yes).
- Updated .PHONY with test-leak, test-analyze, test-runner-nosan.
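The compact-output rule (drop `name` when it repeats the last segment of `qualified_name`, drop empty strings and zero degrees) can be sketched in Python as below. It is an illustration of the rule, not the C emitters; the `.` segment separator and the example node are assumptions.

```python
def ends_with_segment(qualified_name: str, name: str) -> bool:
    """True when `name` is exactly the last dot-separated segment."""
    if not name:
        return False
    return qualified_name == name or qualified_name.endswith("." + name)


def compact_node(node: dict) -> dict:
    """Drop fields a client can reconstruct or that carry no information."""
    qn = node.get("qualified_name", "")
    out = {}
    for key, value in node.items():
        if key == "name" and (not value or ends_with_segment(qn, value)):
            continue  # redundant: recoverable from qualified_name
        if key in ("label", "file_path") and not value:
            continue  # empty string
        if key in ("in_degree", "out_degree") and value == 0:
            continue  # zero degree
        out[key] = value
    return out


example = compact_node({
    "name": "fetch_data",
    "qualified_name": "myapp.api.fetch_data",
    "label": "Function",
    "file_path": "",
    "in_degree": 0,
    "out_degree": 2,
})
```

Here `example` keeps only `qualified_name`, `label`, and `out_degree`; a client reading compact output treats an absent `name` as the last segment of `qualified_name` and an absent degree as 0.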

CLAUDE.md (new): project-level developer notes for Claude with concrete commands for make test, make test-leak, make test-analyze, and explanation of why macOS requires test-runner-nosan (ASan replaces malloc, blocking leaks --atExit from walking the heap). CONTRIBUTING.md: added "Run C Server Tests" section after the Go test section. Covers make -f Makefile.cbm test/test-leak/test-analyze, the macOS vs Linux difference in leak detection approach, and the expected clean-run output ("0 leaks for 0 total leaked bytes"). Makefile.cbm HOW TO USE block (committed previously) already documents the commands inline — these docs surface the same info for contributors who read CONTRIBUTING.md first.

…te, pass_normalize, 11 TDD tests

Phases 1-8 from comprehensive plan (notes/2026-03-26-0013-plan-*.md):

Phase 1: Input validation (F1,F4,F6,F7,F9,F10,F15):
- mcp.c: empty label→NULL, limit≤0→default, sort_by/mode enum validation, regex pre-validation via cbm_regcomp, depth clamp, direction validation

Phase 2: B7 Cypher param fix + CQ-2 project expansion:
- mcp.c:handle_query_graph reads "cypher" first with "query" fallback, uses resolve_project_store for "self"/"dep"/path shortcuts

Phase 3: DRY resolve_project_store in 5 handlers:
- handle_get_graph_schema, handle_index_status, handle_get_architecture, handle_get_code_snippet, handle_index_dependencies

Phase 4: DF-1 degree precompute (100× faster queries):
- store.c: node_degree table DDL, search SELECT uses LEFT JOIN with HC-6 fallback to edge COUNT, cbm_store_node_degree reads precomputed table, arch_hotspots uses nd.calls_in, arch_boundaries adds behavioral types
- pagerank.c: is_calls field, degree accumulation during edge iteration, node_degree batch INSERT after LinkRank, OOM-safe allocations

Phase 5: B2/B5 name-based caller fallback:
- pass_calls.c: 3-step resolution (exact QN → shared helper → Module)
- graph_buffer.c: cbm_gbuf_resolve_by_name_in_file DRY helper (HC-1)

Phase 6: B17/B13 class-method edge repair:
- NEW pass_normalize.c: enforces I2 (Method→Class) and I3 (Field→Class) invariants via QN prefix + name+label+file fallback. O(M+F) runtime.
- pipeline.c: normalize pass before dump. Makefile.cbm updated.

Phase 7: CBMLangSpec section_node_types field:
- lang_specs.h: added section_node_types (17th field)
- lang_specs.c: all 64 language specs updated with NULL initializer

Phase 8: IX-1..3 indexing pathway fixes:
- mcp.c: autoindex_failed + just_autoindexed flags in server struct, REQUIRE_STORE captures pipeline return code, build_resource_status shows "indexing" state + failure detail + action_required hints

Additional fixes:
- G1: summary mode adds results=[] + results_suppressed=true
- CQ-3: Cypher + filter params produces warning

Tests: 2238 pass (11 new in test_input_validation.c covering F1,F6,F9,F10,F15 edge cases, G1, CQ-3, IX-2). Updated test_store_nodes.c for total degree. Updated test_token_reduction.c for G1 results key.
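The Phase 1 validation rules listed in this commit can be sketched in Python as a pair of normalizers. This is not the mcp.c code: `DEFAULT_LIMIT`, the set of direction values, the default depth, and the use of exceptions for errors are all placeholders chosen for illustration.

```python
import re

DEFAULT_LIMIT = 50  # hypothetical default; the real value lives in the C code
VALID_DIRECTIONS = {"inbound", "outbound", "both"}  # assumed enum values


def normalize_search_args(args: dict) -> dict:
    """Apply the search-side rules: empty label, bad limit, regex check."""
    out = dict(args)
    if not out.get("label"):
        out["label"] = None  # empty label behaves like "no label filter"
    limit = out.get("limit")
    if not isinstance(limit, int) or limit <= 0:
        out["limit"] = DEFAULT_LIMIT
    pattern = out.get("name_pattern")
    if pattern:
        try:
            re.compile(pattern)  # reject bad regexes before querying
        except re.error as exc:
            raise ValueError(f"invalid name_pattern: {exc}") from exc
    return out


def normalize_trace_args(args: dict) -> dict:
    """Apply the trace-side rules: depth clamp and direction validation."""
    out = dict(args)
    out["depth"] = max(int(out.get("depth", 1)), 1)  # depth < 1 becomes 1
    direction = out.get("direction", "both")
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"invalid direction: {direction!r}")
    out["direction"] = direction
    return out
```

Normalizing before the query runs means a caller passing `limit: 0` or `depth: -2` gets a sensible result instead of an empty one, while a malformed regex or direction fails loudly.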

.clangd: mirrors Makefile.cbm CFLAGS_COMMON include paths so clangd resolves headers without compile_commands.json. .gitignore: add .worktrees/, session_project, project, conductor/, with — runtime/session artifacts from Claude Code subagents.

…/3 indexing status

Phase 3: DRY project resolution in 5 handlers:
- handle_get_graph_schema, handle_index_status, handle_get_architecture, handle_get_code_snippet: resolve_store → resolve_project_store
- handle_index_dependencies: expand raw_project before resolve_store
- Forward declaration added for resolve_project_store (needed by handle_get_graph_schema which precedes the definition)

Phase 8: Indexing pathway status state machine:
- IX-1: autoindex_failed flag in server struct. REQUIRE_STORE captures pipeline_run return code; on failure it sets the flag and logs an error. Error response includes "auto-indexing failed" with detail and fix hint.
- IX-2: build_resource_status checks autoindex_active → "indexing" state with timing hint. Not-indexed path shows failure detail or action_required. Empty store path shows hint about no recognized source files.
- IX-3: just_autoindexed flag set on successful auto-index in REQUIRE_STORE.

All 2238 tests pass. Installed to ~/.local/bin/.

- search_code_graph: add auto-index on first query, cypher filter ignore note, summary mode results_suppressed behavior.
- trace_call_path: add auto-index, depth<1 clamped to 1, invalid direction returns error.
- get_code: add Module metadata-only note with auto_resolve hint.
- codebase://status resource: add indexing state, project name field, action_required hint, auto-index failure detail.
- _hidden_tools: add auto-index note, list all 4 status states.

All 2238 tests pass. Installed to ~/.local/bin/.

…totune

Fixes codebase://architecture returning only 10 results all from graph-ui by wiring hardcoded limits through the config system and raising defaults to 25.

Key changes:
- mcp.c: add key_functions_count config (default 25); wire into build_key_functions_sql (was hardcoded LIMIT 10 at line 4317) and build_resource_architecture call site
- mcp.c: add arch_hotspot_limit config (default 25); wire into classic get_architecture tool handler
- store.c/store.h: raise CBM_ARCH_HOTSPOT_DEFAULT_LIMIT 10->25; add hotspot_limit param to cbm_store_get_architecture
- store.c/store.h: add sort_by=calls (ORDER BY calls_in+calls_out DESC) and sort_by=linkrank (ORDER BY linkrank_in DESC) dispatch cases; add degree_mode config (weighted|unweighted|calls_only) for min_degree/max_degree filter column selection
- watcher.c/watcher.h: add poll_base_ms/poll_max_ms to struct cbm_watcher; change cbm_watcher_run and cbm_watcher_poll_interval_ms signatures to accept base_ms/max_ms params (0=defaults); wire watcher_poll_base_ms and watcher_poll_max_ms config keys through main.c
- cli.h: extend cbm_config_entry_t with range and guidance fields (5->7)
- cli.c: replace entire CBM_CONFIG_REGISTRY with 7-field entries for all 32 config keys with broadest feasible ranges and actionable guidance strings; update config list/get/help display to print [range] + guidance per entry
- scripts/autotune.py: new standalone Python 3.9+ script that sends JSON-RPC directly to the binary via stdin/stdout, tries 7 experiments, scores against expected top-10 ground truth for 3 repos, resets config on exit
- tests: update all callers of cbm_store_get_architecture (pass 0 for hotspot_limit) and cbm_watcher_poll_interval_ms (pass 0,0 for defaults)

All 2238 tests pass.

…ults, CLI params

Previous behavior: autotune set config keys but never triggered PageRank recompute between experiments, so all experiments read stale stored scores, producing identical results. The binary also got SIGKILL'd on macOS 25+ due to invalidated ad-hoc signature after `cp` during install.

What changed:
- scripts/autotune.py: replace query_architecture() (async REQUIRE_STORE reindex) with index_and_query_architecture(), which opens one persistent stdio MCP session per repo per experiment sending 3 sequential messages: initialize → tools/call index_repository (synchronous, blocks until full pipeline+PageRank completes with current edge weights) → resources/read codebase://architecture
- scripts/autotune.py: add project_name_from_path() mirroring cbm_project_name_from_path() from src/pipeline/fqn.c, and delete_project_db() to remove stale DBs
- scripts/autotune.py: add _send_batch() env+cwd params; pass CBM_TOOL_MODE=classic so index_repository tool is available in MCP session
- scripts/autotune.py: add --top-matches (default 10) and --key-count (default 25) CLI params; show matched expected names + top-N per repo in output
- scripts/autotune.py: default timeout 60s → 1200s (indexing takes ~40s per repo)
- scripts/autotune.py: add exclude_ui_tests experiment; rename calls_boost_excl → calls_boost_excl_tests with tests/** added to exclude list
- scripts/autotune.py: save every run to scripts/autotune_results.json (appended, with timestamp/binary/repos/experiments/best fields)
- scripts/autotune.py: show progress bar (█/░) and ◀ BEST marker in final report
- .gitignore: add scripts/autotune_results.json (generated artifact, not tracked)

Why: edge weights and PageRank iterations are only applied at index time via cbm_pagerank_compute_with_config(); querying a DB indexed with old weights produces wrong rankings regardless of config changes. Full reindex per experiment is required. Also fixes macOS 25+ SIGKILL by rebuilding binary (Makefile.cbm re-signs with codesign --force --sign - after install).

First run result: calls_boost_excl_tests scores 6/30 (best), baseline 0/30.

Testable: python3 scripts/autotune.py
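For reviewers who have not opened scripts/autotune.py, the three-message session described above might look roughly like the Python sketch below. It is not the script's code: the function names, the `repo_path` argument name for index_repository, the client info, and the assumption that the server exits when stdin closes are all hypothetical.

```python
import json
import os
import subprocess


def build_session_messages(repo_path: str) -> list:
    """initialize → tools/call index_repository → resources/read."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                    "clientInfo": {"name": "autotune-sketch", "version": "0"}}},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
         # "repo_path" is a guessed argument name, not taken from the schema.
         "params": {"name": "index_repository",
                    "arguments": {"repo_path": repo_path}}},
        {"jsonrpc": "2.0", "id": 3, "method": "resources/read",
         "params": {"uri": "codebase://architecture"}},
    ]


def encode_batch(messages: list) -> str:
    """MCP stdio framing: one JSON object per line."""
    return "".join(json.dumps(m) + "\n" for m in messages)


def run_session(binary: str, repo_path: str, timeout: float = 1200.0) -> list:
    """Send the batch to the server over stdin and parse its stdout lines."""
    # CBM_TOOL_MODE=classic exposes index_repository in the session.
    env = {**os.environ, "CBM_TOOL_MODE": "classic"}
    proc = subprocess.run([binary], input=encode_batch(build_session_messages(repo_path)),
                          capture_output=True, text=True, env=env,
                          cwd=repo_path, timeout=timeout)
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

Because index_repository is synchronous, the resources/read reply in the same session reflects rankings computed with the edge weights of the current experiment.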

…l_tests)

Previous defaults: edge_weight_calls=1.0, edge_weight_usage=0.7, key_functions_exclude="" (no exclusions).

What changed:
- scripts/autotune.py DEFAULTS: edge_weight_calls 1.0 → 2.0 (call edges are the strongest signal for production importance)
- scripts/autotune.py DEFAULTS: edge_weight_usage 0.7 → 0.3 (type-reference edges add noise, dampening improves ranking signal)
- scripts/autotune.py DEFAULTS: key_functions_exclude "" → "graph-ui/**,tools/**,scripts/**,tests/**" (excluding non-production paths surfaces core library functions instead of test helpers)

Why: autotune run on 2026-03-26 scored calls_boost_excl_tests at 6/30 across 3 repos (codebase-memory-mcp, autorun, rtk), best of 8 experiments. Baseline scored 0/30. These defaults are now the baseline that experiments diverge from, so future autotune runs search the config space around the current best.

Testable: python3 scripts/autotune.py (baseline_25 now starts from these values)
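A score such as 6/30 is consistent with counting, per repo, how many of 10 expected names show up in the returned key functions and summing over 3 repos. The Python sketch below shows that reading; it is a guess at the scoring rule, not the logic in autotune.py, and the placeholder names are not real ground truth.

```python
def matched_names(returned: list, expected: list) -> list:
    """Expected names that appear anywhere in the returned key functions."""
    returned_set = set(returned)
    return [name for name in expected if name in returned_set]


def score_experiment(results: dict, ground_truth: dict) -> tuple:
    """Sum matches over repos; returns (matched, possible)."""
    matched = sum(len(matched_names(results.get(repo, []), expected))
                  for repo, expected in ground_truth.items())
    possible = sum(len(expected) for expected in ground_truth.values())
    return matched, possible


# Placeholder data: 3 repos with 10 expected names each gives 30 possible.
GROUND_TRUTH = {repo: [f"{repo}_fn{i}" for i in range(10)]
                for repo in ("repo_a", "repo_b", "repo_c")}
RESULTS = {
    "repo_a": ["repo_a_fn0", "repo_a_fn3", "helper"],
    "repo_b": ["repo_b_fn9"],
    "repo_c": [],
}
score = score_experiment(RESULTS, GROUND_TRUTH)
```

Under this rule the placeholder run scores 3 matches out of 30 possible, in the same form as the 6/30 and 0/30 figures reported above.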

DeusData · 2026-03-26T10:18:03Z

Hey, thanks for the commit! Looks cool, but evaluating and checking will take a bit of time as this is quite a lot. Will give you feedback as soon as I can :)

ahundt · 2026-03-26T10:33:17Z

@DeusData yeah it is quite a bit haha, i appreciate the patience! also i edited the summary a bit to be clearer

DeusData · 2026-03-26T10:39:49Z

@ahundt would it maybe be possible that you split this up into separate PRs? Then it's easier to review and we can discuss the individual things you have implemented. That would be awesome!

DeusData · 2026-03-26T10:56:58Z

I also see that some of the issues were already addressed. Maybe you can rebase here with latest main and check what might be already here

DeusData · 2026-03-26T11:00:27Z

Thanks @ahundt — ambitious PR! Note: we have a strict no-system() policy in production code (replaced with cbm_exec_no_shell for security). The test cleanup helpers use system(rm -rf ...) — please switch to the project's test_rmdir_r() helper. Will do a full review.

maplenk · 2026-03-26T11:09:11Z

Hi @ahundt @DeusData
I will try to combine some of these, plus some other open PRs.
Will keep a fork on my end and will share PRs, can choose to merge them if they feel relevant.

Also, @DeusData THANK YOU SO MUCH!!!
This has definitely reduced token usage for me.

DeusData · 2026-03-27T12:55:03Z

@ahundt — appreciate the energy and the thorough write-up. A few things:

On PageRank — same feedback as on #147: PageRank on a call graph would rank log.Error() and fmt.Sprintf() as the most important functions in any codebase, because they have the highest fan-in. That's not architectural importance — it's just utility popularity. We already have min_degree/max_degree for degree-based filtering, which gives you the same ranking signal without the conceptual mismatch or the computation cost. The academic papers you cite (Page/Brin 1998, Kim 2010) are about web and social network graphs where link semantics are fundamentally different from call-graph semantics.

On the API consolidation (3 tools instead of 15) — this is a major UX decision that changes how every user interacts with the tool. Reducing to 3 tools looks clean on paper, but it front-loads complexity into each tool (your search_code_graph has 17 parameters). The current API was designed so each tool does one thing with a small parameter surface. I'm not ready to change that default without much more user feedback.

On the bug fixes — the 5 silently-ignored parameters and the memory leaks sound like genuine issues worth addressing. These are the kind of contributions I'd love to merge. Could you split those into a focused PR? Same for the input validation tests.

On scope — as mentioned earlier, this needs to be split up. A PR that touches mcp.c (+2,475/-210), adds 4 new modules (pagerank/, depindex/, pass_normalize.c, cli.c config registry), introduces 3 breaking changes, and adds ~10K lines is not reviewable as a single unit. Each feature should be its own PR so we can discuss and merge independently.

I'd suggest: (1) bug fixes + leak fixes, (2) input validation tests, (3) config registry, (4) dependency indexing, (5) API consolidation — each as separate PRs rebased on latest main.

ahundt added 30 commits March 20, 2026 00:02

Makefile.cbm, test_main.c: remove depindex refs from token-reduction …

701d8a7

…branch The TEST_DEPINDEX_SRCS and suite_depindex belong on the reference-api-indexing branch only. Remove from this branch to fix build error (test_depindex.c not present here).

Merge branch 'reduce-token-usage' into token-reduction-and-reference-…

8cac6c1

…indexing

merge reduce-token-usage: resolve SKILL.md conflict

909438a

Merge reduce-token-usage branch into token-reduction-and-reference-indexing. Conflict in codebase-memory-reference/SKILL.md resolved by taking the reduce-token-usage version (has complete token reduction documentation).

ahundt added 16 commits March 23, 2026 07:00

mcp.json: use sh -c exec \$HOME/... for cross-machine portability

ffecf58

Replace hardcoded /Users/martinvogel path (and intermediate ~ which MCP clients don't expand) with sh -c "exec \$HOME/.local/bin/..." so the shell expands \$HOME at launch time on any machine.

DeusData added the enhancement New feature or request label Mar 26, 2026

This was referenced Mar 27, 2026

feat: auto-index on first tool call when project is not yet indexed #156

Open

fix: wire up silently-ignored search and trace parameters #155

Open

ahundt wants to merge 71 commits into DeusData:main from ahundt:api-consolidation

3 participants
