Codebase Cartographer is an autonomous, multi-agent system designed to map complex brownfield repositories, resolve multi-language data lineage, and provide interactive query capabilities for Forward Deployed Engineers (FDEs).
The Cartographer operates through a sequenced pipeline where each agent incrementally enriches a central Knowledge Graph.
graph TD
Repo[Target Repository] -- "Source Code / Git Metadata" --> Surveyor
subgraph "Cartographer Pipeline"
Surveyor[Surveyor Agent] -- "AST + Git Velocity" --> KG[(Central Knowledge Graph)]
Hydrologist[Hydrologist Agent] -- "Data Lineage Edges" --> KG
Semanticist[Semanticist Agent] -- "Purpose + Domain Labels" --> KG
Archivist[Archivist Agent] -- "Final Artifact Synthesis" --> KG
end
subgraph "Query Interface"
KG -- "Graph Context" --> Navigator[Navigator Agent]
User[FDE User] -- "Natural Language / Tools" --> Navigator
end
KG -- "Export" --> CB["CODEBASE.md"]
KG -- "Export" --> LG["lineage_graph.json"]
style KG fill:#f9f,stroke:#333,stroke-width:2px
style Repo fill:#dfd,stroke:#383
style Navigator fill:#bbf,stroke:#333
- Surveyor: Extracted Structural AST data (classes, functions, calls) and Git churn metrics to identify Architectural Hubs via PageRank.
- Hydrologist: Operates on the structure to define Read/Write (Product) edges, resolving cross-layer lineage (e.g., dbt
refto PythonSQLAlchemy). - Semanticist: Consumes the context of nodes/edges to generate implementation-aware purposes (ignoring stale docstrings) and performs K-Means Domain Clustering.
- Archivist: Synthesizes the finalized graph into
CODEBASE.md, calculating the Critical Path (longest dependency chain) using NetworkX DAG algorithms.
- NetworkX vs. Neo4j: We chose a NetworkX-based in-memory graph with JSON serialization rather than a heavy graph database. This ensures 100% portability—an FDE can run the tool locally on any terminal without external infra.
- Deferred LLM Synthesis: Semantic analysis is deferred until Phase 3 of the pipeline. This ensures the model has access to the entire structural and lineage context, significantly increasing purpose-statement accuracy while reducing token cost.
The Cartographer is designed for real-world client engagements:
- Cold Start: Run
analyze --llmto generate a Day-One Onboarding Brief. - Exploration: Use the
Navigatortool to query the blast radius of specific modules during bug fix planning. - Maintenance: The
Archivistupdates theCODEBASE.mdincrementally as the FDE pushes changes, preserving a living technical debt registry.
python src/cli.py analyze /path/to/repo --llm
python src/cli.py query "Explain the blast radius of models/staging"