Source
Local source note: /Users/azalio/Downloads/Telegram Desktop/codegraph_the_open_source_knowledge_graph_that_makes_ai_coding_t.md, extracted from Medium article "CodeGraph: The Open-Source Knowledge Graph That Makes AI Coding Tools Dramatically Cheaper" (https://medium.com/kd-agentic/codegraph-the-open-source-knowledge-graph-that-makes-ai-coding-tools-dramatically-cheaper-190f8b89f8a7).
Source-specific idea used here: CodeGraph reduces agent exploration cost by prebuilding a local structural code graph from parsed source, then answering "what calls this?", "what imports this?", and "where is the handler/entrypoint?" style questions in one tool call instead of file-by-file agent search.
Relevant source takeaways
- CodeGraph indexes source into a local SQLite/FTS-backed graph of symbols and edges such as imports, calls, inheritance, implementations, and framework routes.
- The useful mechanism for MAP is not the exact product claim or star count; it is the architecture: separate codebase discovery from solving, return compact structural evidence, and avoid repeated broad searches inside the main Actor context.
- Tree-sitter is called out because it can parse incomplete code and supports multiple languages; that matters for agent workflows where the tree may be temporarily uncompilable.
- The article distinguishes semantic/vector search from deterministic structural relationships. MAP's research artifacts already prefer exact file/line evidence; a structural provider would make the upstream localization less probabilistic.
- Local-first/no-cloud is relevant to MAP's current local workflow posture and avoids requiring user code to leave the machine.
- Route recognition is useful, but it should be treated as one query family on top of a structural map rather than the first implementation slice.
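To make the "one tool call" idea concrete, here is a minimal sketch of a local edge table that answers "what calls this?" and "what imports this?" with a single indexed lookup. It uses only Python's stdlib sqlite3. The table layout, the function names build_graph and who, and the edge kinds are assumptions for illustration; this is not CodeGraph's actual schema.

```python
import sqlite3


def build_graph(edges):
    """Load (src, kind, dst, path, line) tuples into an in-memory edge table.

    The schema is hypothetical; a real provider would persist it on disk.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges ("
        "src TEXT NOT NULL, kind TEXT NOT NULL, dst TEXT NOT NULL, "
        "path TEXT NOT NULL, line INTEGER NOT NULL)"
    )
    # Relationship queries look up by target, so index on (dst, kind).
    conn.execute("CREATE INDEX edges_by_dst ON edges (dst, kind)")
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, ?)", edges)
    conn.commit()
    return conn


def who(conn, kind, target):
    """Return (src, path, line) rows for every edge of `kind` pointing at `target`."""
    cursor = conn.execute(
        "SELECT src, path, line FROM edges "
        "WHERE kind = ? AND dst = ? ORDER BY path, line",
        (kind, target),
    )
    return cursor.fetchall()
```

A query such as who(conn, "calls", "save_research") returns exact file/line rows in a stable order, which is the property that separates this from similarity-ranked search results.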
Repo evidence
- src/mapify_cli/repo_insight.py:1-5 says repo insight analyzes project structure for language detection, suggested checks, and key directories. src/mapify_cli/repo_insight.py:13-43, :46-82, :85-119, and :122-162 implement exactly that shallow artifact. It does not index symbols, imports, callers, routes, or line-level relationships.
- src/mapify_cli/dependency_graph.py:1-6 and :66-85 implement a graph, but it is a workflow subtask DAG for cascade invalidation. It is not a repository code graph.
- src/mapify_cli/templates_src/agents/research-agent.md.jinja:14-22 and :143-150 still describe research as Glob/Grep/Read. src/mapify_cli/templates_src/codex/agents/researcher.toml.jinja:70-83 likewise instructs provider-neutral file discovery + grep + narrow reads.
- docs/USAGE.md:1719-1731 documents the current RESEARCH path: persisted ResearchEvidence is mandatory, delegation is conditional, and cold-start/high-risk work uses research-agent/researcher plus save_research/validate_research.
- docs/ARCHITECTURE.md:55-58 says MCP is optional and provider runtimes can call configured MCP servers. docs/ARCHITECTURE.md:32-36 says MAP does not ship or maintain third-party MCP servers, so this should be an optional integration/detection surface, not vendoring CodeGraph.
- docs/ARCHITECTURE.md:20-30 confirms MAP already owns local generated provider scaffolding, branch artifacts, token accounting, and optional MCP wiring; a local structural map fits that surface if it is optional and artifact-backed.
Existing issue search
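Commands/searches used:
- gh issue list --state all --limit 100 --search "CodeGraph OR \"knowledge graph\" OR \"call graph\" OR tree-sitter OR symbols OR \"repo insight\" OR \"repository map\" OR \"token reduction\" OR \"research ROI\"" returned no direct matches.
- gh issue list --state all --limit 100 --search "repo insight" returned no matches.
- gh issue list --state all --limit 100 --search "tree-sitter" returned no matches.
- gh issue list --state all --limit 100 --search "symbol graph" returned no matches.
- gh issue list --state all --limit 100 --search "parallel wave context map" returned #303 (/map-efficient: parallel-wave execution shipped but dark — sequential by default at every gate) and #284 (GSD-style fresh-context-per-task: worktree isolation for autonomous execution), but those cover parallel execution/worktree isolation, not structural code localization.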
Close issues checked:
Why this is not already covered
MAP has strict research artifacts and some advisory ROI telemetry, but the discovery engine is still prompt-instructed Glob/Grep/Read. The repo has a dependency_graph, but it models MAP subtasks, not code symbols. The remaining gap is a local, optional structural code-map provider that can answer targeted relationship queries and emit normal ResearchEvidence, reducing cold-start search loops without changing Actor semantics.
Problem
Cold-start multi-file MAP tasks still pay the mechanical exploration cost that the article describes: researcher/decomposer/Actor use broad file discovery and text search to infer symbol relationships. That increases token/tool-call spend and makes localization quality depend on prompt discipline rather than an explicit code map.
Proposed slice
Add an optional mapify code-map / structural-discovery provider surface that can populate or query a local repository map and feed MAP's existing ResearchEvidence contract.
Concrete first slice:
- Add a provider-neutral abstraction such as mapify_cli.code_map with a minimal query model: symbols by name, imports/exports, callers/references where available, and file-level dependency edges.
- Prefer existing local tooling when present, e.g. detect a CodeGraph MCP/server/config and document it as an optional provider. Do not vendor or maintain CodeGraph as a required dependency.
- Provide a deterministic fallback for Python projects using stdlib ast plus import scanning, so the feature can be tested without network, MCP credentials, or external binaries.
- Add a runner command that emits compact JSON compatible with or directly convertible to ResearchEvidence: confidence, status, search_method, search_stats, and <=5 relevant_locations with path/lines/signature/relevance.
- Teach research-agent/researcher prompts to prefer the structural map for locate/impact/pattern queries when available, then fall back to Glob/Grep/Read with an explicit reason.
- Keep generated templates single-source: edit src/mapify_cli/templates_src/**.jinja, then render templates.
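As a rough illustration of the stdlib fallback in the list above, the sketch below indexes one Python file into definitions with line ranges plus its imported module names. The Symbol dataclass and index_python_source are hypothetical names, not existing MAP code. Note one limitation compared with tree-sitter: ast.parse raises SyntaxError on code that does not parse, so a real fallback would need to catch that and report the file as unindexed.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """One indexed definition. Hypothetical shape, not an existing MAP type."""
    name: str
    kind: str  # "function" or "class"
    path: str
    start_line: int
    end_line: int


def index_python_source(path, source):
    """Return (definitions sorted by line, sorted imported module names)."""
    tree = ast.parse(source, filename=path)  # raises SyntaxError on broken code
    symbols = []
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            end = node.end_lineno or node.lineno
            symbols.append(Symbol(node.name, "function", path, node.lineno, end))
        elif isinstance(node, ast.ClassDef):
            end = node.end_lineno or node.lineno
            symbols.append(Symbol(node.name, "class", path, node.lineno, end))
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports keep their leading dots, e.g. ".utils".
            imports.add("." * node.level + (node.module or ""))
    symbols.sort(key=lambda s: s.start_line)
    return symbols, sorted(imports)
```

Methods are reported as plain functions here; qualifying them with their class name, and resolving imports to files for dependency edges, are left out to keep the sketch short.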
Out of scope for this slice:
- Full multi-language tree-sitter implementation.
- Route recognition for all frameworks.
- Mandatory MCP installation.
- Cloud indexing or transmitting source code outside the local machine.
Acceptance criteria
- mapify code-map query or equivalent deterministic helper returns structural evidence for a fixture repo with at least Python imports/classes/functions and line ranges.
- ResearchEvidence emitted from the code-map path passes the existing validate_research/research eval expectations.
- Claude and Codex researcher templates mention the structural-map-first path only when a map provider is available and preserve the current Glob/Grep/Read fallback.
- Tests cover: no provider available, Python fallback success, stale/missing index fallback, unsafe path rejection, and conversion to <=5 relevant locations.
- Docs explain optional CodeGraph/MCP integration without making MAP responsible for installing/maintaining third-party MCP servers.
- make render-templates, make check-render, and relevant pytest suites are expected validation gates for implementation.
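The "conversion to <=5 relevant locations" criterion could look roughly like the sketch below. The top-level field names come from this issue; everything else is an assumption, including the input hit shape, the status values, the search_method label, the search_stats keys, and the value types. The real ResearchEvidence schema in the repo is authoritative and should replace these guesses.

```python
MAX_LOCATIONS = 5  # cap taken from the "<=5 relevant_locations" requirement


def to_research_evidence(hits, files_scanned):
    """Convert code-map hits into an evidence-shaped dict.

    Each hit is assumed to carry path, start_line, end_line, signature, and a
    numeric relevance score. That input shape is hypothetical.
    """
    ranked = sorted(hits, key=lambda h: h["relevance"], reverse=True)
    top = ranked[:MAX_LOCATIONS]
    return {
        "status": "found" if top else "not_found",  # assumed vocabulary
        "confidence": max((h["relevance"] for h in top), default=0.0),
        "search_method": "code_map",  # assumed label
        "search_stats": {
            "files_scanned": files_scanned,
            "candidates": len(hits),
        },
        "relevant_locations": [
            {
                "path": h["path"],
                "lines": [h["start_line"], h["end_line"]],
                "signature": h["signature"],
                "relevance": h["relevance"],
            }
            for h in top
        ],
    }
```

The output is a plain dict so it can go through the same validate_research path as evidence produced by Glob/Grep/Read, which keeps it from becoming a privileged side channel.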
Guardrails
- Do not make CodeGraph a mandatory runtime dependency for mapify init.
- Do not add cloud indexing, API keys, or source upload.
- Do not bypass existing ResearchEvidence validation; structural-map output must be normal evidence, not a privileged side channel.
- Do not treat semantic/vector similarity as a substitute for deterministic relationship evidence.
- Do not use shadow-mode rollout; gate behind explicit optional availability/config and validate directly.
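One possible shape for the explicit availability gate is sketched below: providers are tried only if they are listed as enabled, and when none can answer, the result names the text-search fallback together with the reason. The Provider interface, the locate function, the provider names, and the "glob_grep_read" label are all hypothetical.

```python
from typing import Callable, Optional

# A provider returns a list of hits, or None when it cannot answer
# (for example, its index is missing or stale). Hypothetical interface.
Provider = Callable[[str], Optional[list]]


def locate(query, providers, enabled):
    """Return the first enabled provider's answer, or a text-search fallback.

    `providers` maps names to installed providers; `enabled` is the ordered
    list of names the user opted into. Skip reasons are reported only when
    every enabled provider fails.
    """
    skipped = []
    for name in enabled:
        provider = providers.get(name)
        if provider is None:
            skipped.append(f"{name}: not installed")
            continue
        hits = provider(query)
        if hits is None:
            skipped.append(f"{name}: index missing or stale")
            continue
        return {"search_method": name, "hits": hits, "fallback_reason": None}
    reason = "; ".join(skipped) or "no structural provider enabled"
    return {"search_method": "glob_grep_read", "hits": [], "fallback_reason": reason}
```

With an empty enabled list the function never touches a provider, so the default behavior stays the current Glob/Grep/Read path.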