Enrich cross-referencing with Graphify edge data #10

@verkligheten

Parent Epic

Part of #5 — Integrate Graphify for zero-cost code entity extraction

Task

Enhance _cross_reference() in wiki_backend.py to accept optional edge hints from Graphify, creating deterministic wikilinks between entity pages that have structural relationships (calls, imports, uses, inherits).

File

agent_notes/services/wiki_backend.py — function _cross_reference() (lines 734-776)

Current Behavior

Cross-referencing is text-based: it scans page bodies for mentions of other page titles/aliases and adds [[wikilinks]] in ## Related sections. This works, but it misses structural relationships where code refers to an entity by a name that differs from its page title.

Example: UserService calls Gateway.process(), and the target page is titled "Payment Gateway". The body mentions "Gateway.process" but never "Payment Gateway", so text-based matching adds no link.
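To make the gap concrete, here is a minimal sketch of title/alias matching. The mentions() helper and the sample body are hypothetical stand-ins, not the actual logic in wiki_backend.py.

```python
import re


def mentions(body: str, names: list[str]) -> bool:
    """Return True if any title or alias occurs in the body as a whole phrase."""
    return any(
        re.search(rf"\b{re.escape(name)}\b", body, flags=re.IGNORECASE)
        for name in names
    )


body = "UserService validates the order, then calls Gateway.process() to charge the card."

# The page title never appears verbatim, so a text-based pass adds no link.
title_match = mentions(body, ["Payment Gateway"])

# The code-level name is present, but it is not the page title.
symbol_match = mentions(body, ["Gateway"])
```

An alias such as "Gateway" would close this particular gap, but only if someone registers it by hand; edge data avoids depending on that.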

New Behavior

When Graphify edge data is available, use it as an additional signal for creating wikilinks. Structural relationships (calls, imports, uses, inherits) get wikilinks regardless of whether the text mentions the target by name.

Implementation

Step 1: Add edge_hints parameter to _cross_reference()

def _cross_reference(
    wiki_dir: Path,
    touched_pages: list[Path],
    edge_hints: list[dict] | None = None,
) -> int:

Step 2: Add edge-based linking after existing text-based logic (after line 775)

    # Graphify edge-based cross-referencing.
    # `registry` and `links_inserted` are assumed to be in scope from the
    # text-based pass above.
    if edge_hints:
        for edge in edge_hints:
            src_label = edge.get("source_label", "")
            tgt_label = edge.get("target_label", "")
            if not src_label or not tgt_label:
                continue

            src_page = _find_page_by_title(wiki_dir, src_label, registry)
            tgt_page = _find_page_by_title(wiki_dir, tgt_label, registry)

            # Link in both directions; skip self-references and unresolved pages.
            if src_page and tgt_page and src_page != tgt_page:
                links_inserted += _ensure_related_section(src_page, [tgt_page])
                links_inserted += _ensure_related_section(tgt_page, [src_page])

    return links_inserted
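The loop above can be exercised in isolation with a small sketch. Here ensure_related() is a hypothetical stand-in for _ensure_related_section() (it assumes ## Related is the last section of a page), and the registry is a plain dict keyed by lowercased title; neither is taken from wiki_backend.py.

```python
import tempfile
from pathlib import Path


def ensure_related(page: Path, targets: list[Path]) -> int:
    """Hypothetical stand-in: append missing [[slug]] links under '## Related'."""
    text = page.read_text(encoding="utf-8")
    missing = [t for t in targets if f"[[{t.stem}]]" not in text]
    if not missing:
        return 0
    if "## Related" not in text:
        text = text.rstrip("\n") + "\n\n## Related\n"
    links = "\n".join(f"- [[{t.stem}]]" for t in missing)
    page.write_text(text.rstrip("\n") + "\n" + links + "\n", encoding="utf-8")
    return len(missing)


def link_edges(edge_hints: list[dict], registry: dict[str, Path]) -> int:
    """Add links in both directions for every edge whose endpoints resolve."""
    inserted = 0
    for edge in edge_hints:
        src = registry.get(edge.get("source_label", "").lower())
        tgt = registry.get(edge.get("target_label", "").lower())
        if src and tgt and src != tgt:
            inserted += ensure_related(src, [tgt])
            inserted += ensure_related(tgt, [src])
    return inserted


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "user-service.md"
    b = Path(tmp) / "payment-gateway.md"
    a.write_text("# UserService\n", encoding="utf-8")
    b.write_text("# Payment Gateway\n", encoding="utf-8")
    registry = {"userservice": a, "payment gateway": b}
    edges = [{
        "source_label": "UserService",
        "target_label": "Payment Gateway",
        "relation": "calls",
    }]

    first_pass = link_edges(edges, registry)   # one link per direction
    second_pass = link_edges(edges, registry)  # already linked, nothing added
    a_text = a.read_text(encoding="utf-8")
```

Running the same edges twice is the case that matters for re-ingestion: the second pass should report zero inserted links.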

Step 3: Add _find_page_by_title() helper

def _find_page_by_title(wiki_dir: Path, title: str, registry: dict[str, Path]) -> Path | None:
    """Find a wiki page matching a title or slug."""
    # Try exact match in registry (case-insensitive)
    lower = title.lower()
    if lower in registry:
        return registry[lower]
    # Try slug match
    slug = _slug(title)
    if slug in registry:
        return registry[slug]
    # Try direct file path
    for sub in WIKI_PAGE_TYPES:
        candidate = wiki_dir / sub / f"{slug}.md"
        if candidate.exists():
            return candidate
    return None
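A quick way to check the lookup order is a throwaway wiki directory. In this sketch, PAGE_TYPES and slugify() are hypothetical stand-ins for WIKI_PAGE_TYPES and _slug(), whose real definitions are not shown in this issue, and the registry is assumed to be keyed by lowercased title or slug.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

PAGE_TYPES = ("entities", "concepts")  # hypothetical stand-in for WIKI_PAGE_TYPES


def slugify(title: str) -> str:
    """Hypothetical stand-in for _slug(): lowercase, non-alphanumerics to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def find_page(wiki_dir: Path, title: str, registry: dict[str, Path]) -> Optional[Path]:
    """Same three-step lookup as the helper above, using the stand-ins."""
    for key in (title.lower(), slugify(title)):
        if key in registry:
            return registry[key]
    for sub in PAGE_TYPES:
        candidate = wiki_dir / sub / f"{slugify(title)}.md"
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    wiki_dir = Path(tmp)
    (wiki_dir / "entities").mkdir()
    on_disk = wiki_dir / "entities" / "payment-gateway.md"
    on_disk.write_text("# Payment Gateway\n", encoding="utf-8")
    registered = wiki_dir / "entities" / "user-service.md"
    registry = {"userservice": registered}

    by_title = find_page(wiki_dir, "UserService", registry)     # registry hit
    by_file = find_page(wiki_dir, "Payment Gateway", registry)  # filesystem fallback
    missing = find_page(wiki_dir, "OrderQueue", registry)       # no page anywhere
```

The filesystem fallback is what lets a stub page created earlier in the same ingest run resolve even if the registry was built before the stub existed.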

Step 4: Thread edge hints through wiki_ingest_folder → wiki_ingest → _cross_reference

In wiki_ingest_folder() (modified in #8), after Graphify extraction, build edge hints:

edge_hints = None
if graphify_entities and wiki_terms.get("edges_by_entity"):
    edge_hints = []
    for src_label, targets in wiki_terms["edges_by_entity"].items():
        for t in targets:
            if t["relation"] in ("calls", "imports", "uses", "inherits"):
                edge_hints.append({
                    "source_label": src_label,
                    "target_label": t["target"],
                    "relation": t["relation"],
                })
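The snippet above implies a shape for wiki_terms["edges_by_entity"]: a mapping from source label to a list of dicts carrying "target" and "relation". A minimal sketch of the same filter as a standalone function follows; build_edge_hints() and LINKABLE_RELATIONS are hypothetical names, and the sample data is made up for illustration.

```python
LINKABLE_RELATIONS = frozenset({"calls", "imports", "uses", "inherits"})


def build_edge_hints(edges_by_entity: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-entity edges into hints, keeping only linkable relations."""
    hints = []
    for src_label, targets in edges_by_entity.items():
        for t in targets:
            relation = t.get("relation")
            target = t.get("target")
            # .get() tolerates malformed entries instead of raising KeyError.
            if relation in LINKABLE_RELATIONS and target:
                hints.append({
                    "source_label": src_label,
                    "target_label": target,
                    "relation": relation,
                })
    return hints


sample = {
    "UserService": [
        {"target": "Gateway", "relation": "calls"},
        {"target": "charge", "relation": "method"},
    ],
    "payments": [{"target": "Gateway", "relation": "contains"}],
}
hints = build_edge_hints(sample)
```

Only the calls edge survives here; the method and contains edges are dropped by the filter.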

Pass edge_hints through to wiki_ingest() and from there to _cross_reference(). This requires adding an edge_hints parameter to wiki_ingest():

def wiki_ingest(
    wiki_root: Path,
    *,
    title: str,
    body: str,
    # ... existing params ...
    edge_hints: list[dict] | None = None,  # NEW
) -> dict[str, list[Path]]:

Then pass it to the _cross_reference() call at line 221:

cross_ref_count = _cross_reference(wiki_dir, touched, edge_hints=edge_hints)

Filtered Relation Types

Only create cross-references for semantically meaningful relations:

| Relation | Cross-reference? | Rationale |
|----------|------------------|-----------|
| calls | Yes | "A calls B" = strong dependency |
| imports | Yes | "A imports B" = direct dependency |
| uses | Yes | "A uses B" = references |
| inherits | Yes | "A inherits B" = IS-A relationship |
| contains | No | "module contains function" is structural, not a relationship worth linking |
| method | No | "class has method" is structural |

Potential Issues

  1. Duplicate links: _ensure_related_section() is already idempotent — it checks for existing [[slug]] before adding. Safe to call with overlapping edge-based and text-based matches.
  2. Pages not yet created: Edge hints reference labels, but the stub pages may not be created yet at cross-reference time. Solution: _cross_reference() runs after wiki_ingest() fans out all stub pages, so they should exist. If not found, _find_page_by_title() returns None and the edge is skipped.
  3. Large number of edges: A 200-file project could have 500+ edges. Restricting to calls/imports/uses/inherits and skipping edges whose source or target page cannot be resolved keeps the number of page updates manageable.
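Because Step 2 links both directions, several edges between the same two entities all produce the same pair of links. One possible way to cut repeated page lookups, sketched here with a hypothetical unique_pairs() helper that is not part of the proposed change, is to collapse edges to unordered label pairs before resolving pages.

```python
def unique_pairs(edge_hints: list[dict]) -> list[tuple[str, str]]:
    """Collapse edges to unordered label pairs, preserving first-seen order."""
    seen = set()
    pairs = []
    for edge in edge_hints:
        src = edge.get("source_label", "")
        tgt = edge.get("target_label", "")
        if not src or not tgt or src == tgt:
            continue
        key = frozenset((src, tgt))  # A->B and B->A map to the same key
        if key not in seen:
            seen.add(key)
            pairs.append((src, tgt))
    return pairs


edges = [
    {"source_label": "UserService", "target_label": "Gateway", "relation": "imports"},
    {"source_label": "UserService", "target_label": "Gateway", "relation": "calls"},
    {"source_label": "Gateway", "target_label": "UserService", "relation": "uses"},
    {"source_label": "Gateway", "target_label": "BaseGateway", "relation": "inherits"},
]
pairs = unique_pairs(edges)
```

Four edges reduce to two pairs in this sample. Since _ensure_related_section() is described as idempotent, this is purely an optimization and does not change which links end up on the pages.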

Dependencies
