Problem
When ingesting code folders via agent-notes memory ingest, the system concatenates all files as raw text and relies on LLM agents (wiki-compiler at Sonnet/Opus cost) to discover what entities exist (classes, functions, modules) and how they relate. This is:
- Expensive: ~$0.10-0.50 per file in LLM tokens for entity discovery
- Non-deterministic: same code may produce different entity lists on different runs
- Slow: LLM round-trips for purely structural information available in the AST
Solution
Integrate Graphify (47k stars, MIT, Python 3.10+), which uses tree-sitter to parse ASTs locally and extract code entities deterministically at zero API cost. Wire its extraction into the existing wiki_ingest_folder() pipeline.
Cost Impact
| Step | Current (LLM) | With Graphify |
|---|---|---|
| Code structure (functions, classes, imports) | ~$0.10-0.50/file | Free (tree-sitter, local) |
| Relationships between code entities | LLM inference | Free (AST traversal) |
| Community/module detection | Manual or LLM-inferred | Free (Leiden algorithm) |
| Domain narrative compilation | LLM agents | LLM agents (unchanged) |
For a typical 80% code / 20% docs project, ~80% of extraction work becomes free.
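As a rough illustration of the per-file range in the table, the sketch below estimates what LLM-based entity discovery would cost for one repository. The 200-file repository and the function name are hypothetical; only the $0.10-0.50 range comes from this issue.

```python
# Per-file LLM cost range for entity discovery, taken from the Cost Impact table.
LLM_COST_PER_FILE_USD = (0.10, 0.50)


def structural_extraction_cost(code_files: int) -> tuple[float, float]:
    """Return the (low, high) USD estimate for LLM-based entity discovery."""
    low, high = LLM_COST_PER_FILE_USD
    return round(code_files * low, 2), round(code_files * high, 2)


# Hypothetical repository with 200 code files.
low, high = structural_extraction_cost(200)
print(f"LLM entity discovery: ${low:.2f} to ${high:.2f}; tree-sitter: $0.00")
```

The narrative compilation step is not included here, since it stays on LLM agents in both columns.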
Graphify Python API (verified from source + context7)
# File detection
from pathlib import Path
from graphify.detect import detect, classify_file, FileType
result = detect(Path("./project"))
# -> {"files": {"code": [...], "document": [...], "paper": [...], "image": [...]},
# "total_files": int, "total_words": int, "skipped_sensitive": [...]}
# File collection
from graphify.extract import collect_files, extract
code_files = collect_files(Path("./src")) # -> [Path, ...]
# AST extraction (zero API cost for code)
result = extract(code_files, cache_root=Path("."))
# -> {"nodes": [{"id": str, "label": str, "file_type": str, "source_file": str, "source_location": str}],
# "edges": [{"source": str, "target": str, "relation": str, "confidence": str, "weight": float}],
# "input_tokens": 0, "output_tokens": 0}
# Graph construction
from graphify.build import build_from_json
G = build_from_json(result) # -> networkx.Graph
# Community detection
from graphify.cluster import cluster, score_all
communities = cluster(G) # -> {0: ["node_id_a", "node_id_b"], 1: [...]}
cohesion = score_all(G, communities) # -> {0: 0.85, 1: 0.72}
# Analysis
from graphify.analyze import god_nodes, surprising_connections
gods = god_nodes(G) # -> [{"label": str, "degree": int, ...}]
surprises = surprising_connections(G, communities)
Key facts:
- extract() accepts list[Path] of code files, returns dict with nodes/edges
- build_from_json() accepts extraction dict, returns networkx.Graph
- cluster() returns dict[int, list[str]] mapping community ID to node IDs
- Node IDs are deterministic: {filename_stem}_{entity_name} (lowercase, NFKC normalized)
- Edge confidence: "EXTRACTED" (from AST), "INFERRED", "AMBIGUOUS"
- Edge relations: "contains", "calls", "imports", "uses", "inherits", "method"
- Supports 15+ languages: Python, JS/TS, Java, Go, Rust, C/C++, C#, Kotlin, Scala, PHP, Ruby, Swift, Lua, Groovy, Fortran
- PyPI package: graphifyy (double-y), CLI command: graphify
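To make the node ID and edge confidence facts concrete, here is a minimal sketch that needs only the standard library. Both helpers (make_node_id, edges_with_confidence) are hypothetical and only approximate the documented format; Graphify's own implementation may differ in detail.

```python
import unicodedata
from pathlib import Path


def make_node_id(source_file: Path, entity_name: str) -> str:
    """Build an ID of the form {filename_stem}_{entity_name}.

    Hypothetical helper, not Graphify's actual function.
    """
    raw = f"{source_file.stem}_{entity_name}"
    # NFKC folds compatibility characters (e.g. full-width letters) into
    # their canonical form, so equivalent spellings map to the same ID.
    return unicodedata.normalize("NFKC", raw).lower()


def edges_with_confidence(extraction: dict, level: str = "EXTRACTED") -> list[dict]:
    """Keep only edges whose confidence matches the given level."""
    return [edge for edge in extraction["edges"] if edge["confidence"] == level]


print(make_node_id(Path("src/Parser.py"), "ParseTree"))  # parser_parsetree
```

Filtering on "EXTRACTED" would keep only edges read directly from the AST, which is probably the safest subset to feed into cross-referencing.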
Architecture
wiki_ingest_folder(folder_path)
│
├── [existing] Walk files, concatenate with --- FILE: markers
│
├── [NEW] if has_code and graphify_available():
│   ├── collect_files() → code file paths
│   ├── extract(code_files) → {nodes, edges}
│   ├── build_from_json() → NetworkX graph
│   ├── cluster() → communities
│   ├── graph_to_wiki_terms() → {entities, concepts, edges_by_entity}
│   └── save_graph_json() → raw/<slug>-graph.json
│
├── Merge graphify-discovered entities/concepts with caller-provided ones
│
└── [existing] wiki_ingest(merged_entities, merged_concepts)
    ├── Store raw content
    ├── Create source page
    ├── Fan out entity stub pages ← now pre-populated by Graphify
    ├── Fan out concept stub pages ← now pre-populated by Graphify
    └── Cross-reference (enhanced with edge data)
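One possible shape for the graph_to_wiki_terms() and merge steps is sketched below. It works on plain dicts in the extraction format described earlier, so it runs without Graphify installed. The mapping (nodes become entities, communities become concepts named after their first member) and the merge_terms helper are assumptions for illustration, not a settled design.

```python
from collections import defaultdict


def graph_to_wiki_terms(extraction: dict, communities: dict[int, list[str]]) -> dict:
    """Map an extraction dict and its communities onto wiki terms (hypothetical mapping)."""
    labels = {node["id"]: node["label"] for node in extraction["nodes"]}
    entities = sorted(set(labels.values()))

    # Name each community after its first member; a real implementation
    # might prefer the highest-degree node instead.
    concepts = [
        f"module:{labels[members[0]]}"
        for _, members in sorted(communities.items())
        if members and members[0] in labels
    ]

    # Group outgoing edges by the label of their source entity.
    edges_by_entity: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for edge in extraction["edges"]:
        source, target = edge["source"], edge["target"]
        if source in labels and target in labels:
            edges_by_entity[labels[source]].append((edge["relation"], labels[target]))

    return {
        "entities": entities,
        "concepts": concepts,
        "edges_by_entity": dict(edges_by_entity),
    }


def merge_terms(discovered: list[str], provided: list[str]) -> list[str]:
    """Put caller-provided terms first, then discovered ones, dropping duplicates."""
    return list(dict.fromkeys([*provided, *discovered]))


extraction = {
    "nodes": [
        {"id": "parser_parser", "label": "Parser"},
        {"id": "parser_tokenize", "label": "tokenize"},
    ],
    "edges": [
        {"source": "parser_parser", "target": "parser_tokenize",
         "relation": "calls", "confidence": "EXTRACTED", "weight": 1.0},
    ],
}
terms = graph_to_wiki_terms(extraction, {0: ["parser_parser", "parser_tokenize"]})
entities = merge_terms(terms["entities"], provided=["Parser", "Lexer"])
```

Putting caller-provided terms first keeps existing workflows stable: anything the caller already passed keeps its position, and Graphify only appends what was missing.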
Design Constraints
- Optional dependency: graphifyy in [project.optional-dependencies], graceful fallback via try/except ImportError
- No new CLI commands: auto-detect folder path in existing agent-notes memory ingest
- No manual configuration: Graphify extraction runs automatically when available
- Backward compatible: all existing tests and workflows unchanged
- Single integration point: new code_graph.py module encapsulates all Graphify interaction
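The optional-dependency and single-integration-point constraints could be met with a guard like the following inside code_graph.py. This is a sketch: extract_code_graph() and the choice of cache_root are assumptions, while the Graphify imports follow the API listing above.

```python
from pathlib import Path
from typing import Optional

try:
    # Present only when the optional extra (PyPI package: graphifyy) is installed.
    from graphify.extract import collect_files, extract
    _HAS_GRAPHIFY = True
except ImportError:
    _HAS_GRAPHIFY = False


def graphify_available() -> bool:
    """Report whether the optional Graphify dependency could be imported."""
    return _HAS_GRAPHIFY


def extract_code_graph(folder: Path) -> Optional[dict]:
    """Return Graphify's extraction dict, or None to signal the LLM-only path.

    Hypothetical wrapper; caching under the ingested folder is an assumption.
    """
    if not graphify_available():
        return None
    code_files = collect_files(folder)
    if not code_files:
        return None
    return extract(code_files, cache_root=folder)
```

Returning None instead of raising lets wiki_ingest_folder() fall through to its current behavior, which is what keeps existing tests and workflows unchanged.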
Sub-issues
Dependency Graph
#6 (optional dep)
└── #7 (code_graph.py)
    └── #8 (wire into wiki_ingest_folder)
        ├── #9 (CLI folder detection)
        ├── #10 (cross-ref enrichment)
        └── #11 (agent instructions)
            └── #12 (tests, depends on all above)
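A valid working order for the sub-issues can be derived mechanically from this graph. The sketch below uses the standard-library graphlib and assumes that #9, #10, and #11 each depend only on #8, and that #12 depends on every other sub-issue.

```python
from graphlib import TopologicalSorter

# Sub-issue number -> the sub-issues it depends on (an interpretation of the tree).
DEPENDS_ON = {
    6: set(),
    7: {6},
    8: {7},
    9: {8},
    10: {8},
    11: {8},
    12: {6, 7, 8, 9, 10, 11},  # tests depend on everything else
}

# static_order() yields each sub-issue only after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Under those assumptions, #9, #10, and #11 can be picked up in parallel once #8 lands.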
Files to Modify
| File | Action |
|---|---|
| pyproject.toml | Add graph optional dep, pytest marker |
| agent_notes/services/code_graph.py | NEW: Graphify extraction boundary |
| agent_notes/services/wiki_backend.py | Modify wiki_ingest_folder(), _cross_reference() |
| agent_notes/commands/memory.py | Modify do_ingest() for folder auto-detect |
| agent_notes/data/agents/wiki-compiler.md | Add graph.json usage instructions |
| agent_notes/data/skills/obsidian-memory/SKILL.md | Document folder auto-detection |
| tests/unit/services/test_code_graph.py | NEW: extraction module tests |
| tests/unit/services/test_wiki_backend.py | Add Graphify integration tests |
| tests/functional/memory/test_memory_command.py | Add folder detection test |
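The folder auto-detect change in do_ingest() could be as small as a path check. The helper below is hypothetical (its name and the "file"/"folder" return values are placeholders) and only illustrates the routing decision, not the actual command code.

```python
from pathlib import Path


def choose_ingest_mode(target: str) -> str:
    """Return "folder" for directories and "file" for regular files."""
    path = Path(target).expanduser()
    if path.is_dir():
        return "folder"  # would route to wiki_ingest_folder()
    if path.is_file():
        return "file"  # would keep the existing single-source behavior
    raise FileNotFoundError(f"No such file or directory: {target}")
```

A functional test for folder detection could call the helper on a temporary directory and on a temporary file and assert on the returned mode.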
Branch
feat/graphify-integration from develop