Process Markdown into Graph Structure #64
Merged
feat: Add markdown structure processing to graph
Description
Summary
Adds markdown structure processing that converts markdown elements (headings, sections, lists, tables, code blocks, and blockquotes) into RDF graph entities, each with its relationships and position metadata.
Key Changes
Added 7 new Pydantic models for markdown structure:
KbHeading - Markdown headings (h1-h6) with level and hierarchy
KbSection - Content sections with heading relationships
KbList - Ordered/unordered lists with item counts
KbListItem - Individual list items with parent relationships
KbTable - Tables with row/column counts and headers
KbCodeBlock - Code blocks with language and line count
KbBlockquote - Blockquotes with nesting levels
All models include RDF property mappings, position tracking, and Schema.org types.
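To illustrate the shape such a model might take, here is a minimal sketch of a heading entity. It uses a standard-library dataclass rather than Pydantic to stay dependency-free, and every name in it (`KbHeadingSketch`, `kb_id`, `start_line`, `end_line`, `parent_heading_id`) is an assumption for illustration, not the PR's actual field set. RDF property mappings and Schema.org types are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KbHeadingSketch:
    """Hypothetical stand-in for KbHeading; field names are assumptions."""

    kb_id: str                                # deterministic entity identifier
    text: str                                 # heading text without the leading '#'
    level: int                                # 1-6, matching h1-h6
    start_line: int                           # 1-based position in the source file
    end_line: int
    parent_heading_id: Optional[str] = None   # link to the enclosing heading, if any

    def __post_init__(self) -> None:
        # Validate the two invariants the PR description implies.
        if not 1 <= self.level <= 6:
            raise ValueError(f"heading level must be 1-6, got {self.level}")
        if self.end_line < self.start_line:
            raise ValueError("end_line must not precede start_line")


intro = KbHeadingSketch(kb_id="heading/1", text="Intro", level=1,
                        start_line=1, end_line=1)
```

A Pydantic version would express the same constraints with field validators; the dataclass form only shows which data each entity carries.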
Converts markdown elements to KB entities
Maintains parent-child relationships (heading↔section, list↔items)
Tracks position information (start/end line numbers)
Uses deterministic ID generation based on position for reproducibility
Provides statistics on extracted structure
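The extraction steps above can be sketched roughly as follows. This is a simplified line scanner covering only headings and fenced code blocks, not the PR's processor; the function name `extract_structure` and the dictionary keys are hypothetical.

```python
import re
from collections import Counter

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
FENCE_RE = re.compile(r"^`{3}(\S*)\s*$")  # a line of three backticks plus optional language


def extract_structure(markdown: str) -> list:
    """Return headings and fenced code blocks with 1-based line positions."""
    elements = []
    open_block = None  # code block currently being read, if any
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        fence = FENCE_RE.match(line)
        if open_block is not None:
            # Inside a code block: only a bare fence closes it.
            if fence and fence.group(1) == "":
                open_block["end_line"] = line_no
                elements.append(open_block)
                open_block = None
            continue
        if fence:
            open_block = {"type": "code_block",
                          "language": fence.group(1) or None,
                          "start_line": line_no, "end_line": line_no}
            continue
        heading = HEADING_RE.match(line)
        if heading:
            elements.append({"type": "heading",
                             "level": len(heading.group(1)),
                             "text": heading.group(2),
                             "start_line": line_no, "end_line": line_no})
    return elements  # an unterminated code block is dropped in this sketch


sample = "\n".join([
    "# Title",
    "",
    "`" * 3 + "python",
    "# not a heading",
    "`" * 3,
    "## Details",
])
elements = extract_structure(sample)
stats = Counter(e["type"] for e in elements)  # per-type counts, e.g. for statistics
```

Tracking the open code block explicitly keeps `#` lines inside code from being misread as headings, which is the main pitfall of a line-based scan.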
Integrated into main processing pipeline
Automatically extracts structure from all documents
Processes alongside todos, wikilinks, and named entities
Added generate_markdown_element_id() method
Deterministic URIs based on element type and position
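One way to derive a position-based identifier is sketched below. The PR does not show the real signature of generate_markdown_element_id(), so the function name `sketch_markdown_element_id`, its parameters, and the `http://example.org/kb/` base URI are all assumptions.

```python
from urllib.parse import quote

BASE_URI = "http://example.org/kb/"  # placeholder namespace, not the project's


def sketch_markdown_element_id(document_id: str, element_type: str,
                               start_line: int, end_line: int) -> str:
    """Build a URI from document, element type, and line span only.

    No counters, timestamps, or random values are involved, so the same
    input always yields the same URI.
    """
    doc_part = quote(document_id, safe="/")  # percent-encode spaces and similar
    return f"{BASE_URI}{doc_part}/{element_type}/{start_line}-{end_line}"


first = sketch_markdown_element_id("notes/my doc.md", "heading", 3, 3)
second = sketch_markdown_element_id("notes/my doc.md", "heading", 3, 3)
other = sketch_markdown_element_id("notes/my doc.md", "heading", 7, 7)
```

Because the identifier depends on line positions, inserting text above an element changes its URI. That trade-off suits regenerated spec outputs, where the input files are fixed.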
Created 5 new test cases in specs/test_cases/:
markdown_structure_01_single_heading
markdown_structure_02_code_block
markdown_structure_03_list
markdown_structure_04_table
markdown_structure_05_blockquote
Regenerated all 60 existing spec test outputs to include new entities
Added scripts/regenerate_spec_outputs.py utility for batch updates
Impact
The supported markdown elements (headings, sections, lists, list items, tables, code blocks, and blockquotes) are now represented in the knowledge graph with:
✅ Proper RDF types and Schema.org mappings
✅ Position metadata (start/end line numbers)
✅ Parent-child relationships
✅ Queryable via SPARQL
✅ Deterministic, reproducible entity IDs
Test Plan

All 61 specification tests pass
RDF converter handles all new entity types
Deterministic ID generation ensures test reproducibility
Integration tests verify end-to-end processing
Spec tests use declarative approach per project standards
Testing Results
============================= test session starts ==============================
collected 61 items
tests/test_specifications.py::test_specifications PASSED x60
tests/test_specifications.py::test_test_cases_directory_exists PASSED
===================== 61 passed, 31 warnings in 1.51s =========================