
Migrate renderer to AST tree-walker (resolves cascade + indent limitations) #19

@thesmallstar

Background

The current renderer extends markdown_it.renderer.RendererHTML — a sequential token-stream model where each token's handler returns a string in isolation and the framework blindly concatenates them. This model is the root cause of two recurring classes of bugs:

  1. Structural-newline cascades. When several block elements close back-to-back (deeply nested lists, a blockquote containing a paragraph, etc.), each *_close handler independently emits \n. The handlers don't know about each other, so we get \n\n\n\n (multiple blank lines) where we want one. The current workaround emits a sentinel character (\x02) for each "structural newline", and a regex cap in render() collapses runs of 3+ sentinels to 2. It works, but it is a normalization band-aid over a structural problem. See docs/architecture.md for the full rationale.
  2. Continuation content in list items loses indent. Multi-paragraph items, code blocks inside list items, and hardbreak/softbreak continuation lines all flow back to column 0 because the relevant handlers (paragraph_open, code_block, fence, hardbreak, softbreak) have no idea they're inside a list_item. We currently track _list_depth only for list_item_open, where the depth happens to be available; every other handler would need the depth threaded through as extra renderer state.
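To make the first failure mode concrete, here is a minimal, self-contained sketch of the stream model plus the current workaround: every `*_close` handler emits the sentinel on its own, and a cap regex cleans up afterwards. Only the `\x02` sentinel (`NEW_LINE`) comes from this issue; the `(type, content)` token tuples, `render_stream`, and the exact regex are illustrative assumptions, not the project's code.

```python
import re

# Structural-newline sentinel (the issue's NEW_LINE, value \x02).
NEW_LINE = "\x02"
# Assumed cap: any run of 3+ sentinels collapses to 2 (one blank line).
CAP = re.compile(NEW_LINE + "{3,}")


def render_stream(tokens: list[tuple[str, str]]) -> tuple[str, str]:
    """Render (type, content) tokens the way a stream renderer does.

    Returns (raw, final): the concatenated output before the cap, and the
    output after capping and turning sentinels into real newlines.
    """
    out = []
    for tok_type, content in tokens:
        if tok_type == "text":
            # Input scrub: user text must never contain the sentinel.
            out.append(content.replace(NEW_LINE, ""))
        elif tok_type.endswith("_close"):
            # Each close handler sees only its own token, so every one of
            # them emits a structural newline without knowing about the others.
            out.append(NEW_LINE)
    raw = "".join(out)
    final = CAP.sub(NEW_LINE * 2, raw).replace(NEW_LINE, "\n")
    return raw, final


# A nested list closing back-to-back, followed by a paragraph.
tokens = [
    ("text", "- outer\n  - inner"),
    ("list_item_close", ""),
    ("bullet_list_close", ""),
    ("list_item_close", ""),
    ("bullet_list_close", ""),
    ("text", "Next paragraph"),
]
raw, final = render_stream(tokens)
print(raw.count(NEW_LINE))  # 4: four sentinels in a row, the cascade
print(repr(final))          # '- outer\n  - inner\n\nNext paragraph'
```

The cap produces the right answer here, but only through post-hoc normalization: at no point did the renderer know that the four closes were adjacent.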

Proposal

Migrate the renderer from the stream-based model to an AST tree-walker. markdown-it-py already ships with markdown_it.tree.SyntaxTreeNode for building a tree from a token list. A tree-walker has full structural context at every node:

  • Knows what its parent is (e.g., "I am a code_block inside a list_item at depth 2") → can emit the right leading indent unconditionally.
  • Knows what its next sibling is → can emit exactly the right separator (single newline vs. blank line) without cascade.
  • Can prepend / append content per-node rather than relying on close-handlers to emit trailing whitespace.
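The structural context described above comes from pairing each `*_open` token with its matching `*_close`. `SyntaxTreeNode` already does this; the stand-in below only illustrates the idea and runs without markdown-it-py. `Tok`, `Node`, and `build_tree` are hypothetical names, and `Tok` mirrors just the `type`, `nesting`, and `content` token fields. Like `SyntaxTreeNode`, it drops the `_open` suffix from container node types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tok:
    # Mirrors the markdown-it Token fields used here. nesting is 1 for
    # *_open, -1 for *_close, and 0 for self-contained tokens.
    type: str
    nesting: int
    content: str = ""


@dataclass(eq=False)
class Node:
    type: str
    content: str = ""
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    @property
    def next_sibling(self) -> Optional[Node]:
        # The same question SyntaxTreeNode.next_sibling answers.
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = next(i for i, n in enumerate(siblings) if n is self)
        return siblings[i + 1] if i + 1 < len(siblings) else None


def build_tree(tokens: list[Tok]) -> Node:
    """Pair *_open/*_close tokens into nested nodes."""
    root = Node("root")
    stack = [root]
    for tok in tokens:
        if tok.nesting == 1:
            # Opening token starts a container: "list_item_open" -> "list_item".
            node = Node(tok.type.removesuffix("_open"), parent=stack[-1])
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.nesting == -1:
            # Closing token: the current container is complete.
            stack.pop()
        else:
            stack[-1].children.append(Node(tok.type, tok.content, parent=stack[-1]))
    return root


tokens = [
    Tok("bullet_list_open", 1),
    Tok("list_item_open", 1),
    Tok("code_block", 0, "x = 1\n"),
    Tok("list_item_close", -1),
    Tok("bullet_list_close", -1),
    Tok("paragraph_open", 1),
    Tok("inline", 0, "after"),
    Tok("paragraph_close", -1),
]
tree = build_tree(tokens)
code = tree.children[0].children[0].children[0]
print(code.type, "inside", code.parent.type)  # code_block inside list_item
print(tree.children[0].next_sibling.type)     # paragraph
```

With the real library, the same questions are answered by the `parent`, `next_sibling`, and `children` properties of a `SyntaxTreeNode`.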

Implementation sketch:

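```python
from markdown_it.tree import SyntaxTreeNode

# Both methods live on the renderer class.
def render(self, tokens, options, env):
    # Build the tree once from the flat token stream, then walk it.
    tree = SyntaxTreeNode(tokens)
    # Normalize to exactly one trailing newline.
    return self._walk(tree).rstrip("\n") + "\n"

def _walk(self, node, depth=0, list_depth=0, in_blockquote=False) -> str:
    # dispatch on node.type, recurse into children with updated context
    ...
```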

Estimated at roughly 50–80 lines: a single recursive walker replacing the per-handler dispatch and the sentinel/cap machinery.
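A slightly fuller, self-contained sketch of the walker idea, covering paragraphs, bullet lists, list items, and code blocks. Assumptions to flag: `Node` is a stand-in exposing only the `type`/`content`/`children` attributes of `SyntaxTreeNode`; `walk`, `separator`, `indent_lines`, `BULLET`, and `FENCE` are hypothetical; the `• ` bullet and triple-backtick code wrapping are guesses at Slack-friendly output. One design variant differs from the `_walk` signature above: instead of threading an absolute `list_depth`, each `list_item` indents its own continuation lines, so nesting depth falls out of the recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BULLET = "• "             # assumed bullet glyph
CONT = " " * len(BULLET)  # continuation lines align with the item's text
FENCE = "`" * 3           # assumed Slack code-block delimiter


@dataclass(eq=False)
class Node:
    # Stand-in exposing only the SyntaxTreeNode attributes used here.
    type: str
    content: str = ""
    children: list[Node] = field(default_factory=list)


def indent_lines(text: str, pad: str) -> str:
    # Pad every non-empty line; blank lines stay empty (no trailing spaces).
    return "\n".join(pad + line if line else line for line in text.split("\n"))


def separator(parent: Node, nxt: Node) -> str:
    # Chosen once per pair of adjacent siblings, so it can never cascade.
    if parent.type == "bullet_list":
        return "\n"    # consecutive items of one list
    if parent.type == "list_item" and nxt.type == "bullet_list":
        return "\n"    # nested list directly under its item's text
    return "\n\n"      # any other pair of blocks: exactly one blank line


def walk(node: Node) -> str:
    if node.type == "inline":
        return node.content
    if node.type == "code_block":
        return FENCE + "\n" + node.content.rstrip("\n") + "\n" + FENCE
    parts = [walk(c) for c in node.children]
    body = parts[0] if parts else ""
    for child, text in zip(node.children[1:], parts[1:]):
        body += separator(node, child) + text
    if node.type == "list_item":
        # Bullet on the first line; every later line (second paragraph,
        # code block, nested list) is indented to the item's text column.
        first, _, rest = body.partition("\n")
        return BULLET + first + ("\n" + indent_lines(rest, CONT) if rest else "")
    return body  # root, bullet_list, paragraph


def render(tree: Node) -> str:
    # Same final normalization as render() in the sketch above.
    return walk(tree).rstrip("\n") + "\n"


def para(text: str) -> Node:
    return Node("paragraph", children=[Node("inline", text)])


tree = Node("root", children=[
    Node("bullet_list", children=[
        Node("list_item", children=[
            para("First paragraph"),
            para("Second paragraph"),
            Node("code_block", "x = 1\n"),
        ]),
        Node("list_item", children=[para("Next item")]),
    ]),
    para("After the list"),
])
print(render(tree))
```

This prints the second paragraph and the code block indented two spaces under the first bullet, a single newline before `• Next item`, and exactly one blank line before the trailing paragraph, with no sentinel or cap involved.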

What this resolves

  • The sentinel + cap regex in render() becomes unnecessary — the walker emits the right separator directly from structural context.
  • Multi-paragraph list items render with correct indentation.
  • Code blocks inside list items are properly indented.
  • Hardbreak/softbreak continuation lines stay at the list's indent column.
  • Blockquote multi-paragraph rendering can prepend `> ` to every line, not just the first.
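The blockquote and hardbreak/softbreak fixes share one mechanism: break nodes emit a bare newline, and the enclosing block applies its prefix to every line of its rendered children. A minimal sketch under the same assumptions as before (`Node` is a stand-in for `SyntaxTreeNode`; `walk` and `prefix_lines` are hypothetical names):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Stand-in exposing only the SyntaxTreeNode attributes used here.
    type: str
    content: str = ""
    children: list[Node] = field(default_factory=list)


def prefix_lines(text: str, prefix: str) -> str:
    # Prefix every line; blank lines get the prefix without trailing spaces.
    return "\n".join(
        prefix + line if line else prefix.rstrip() for line in text.split("\n")
    )


def walk(node: Node) -> str:
    if node.type == "text":
        return node.content
    if node.type in ("softbreak", "hardbreak"):
        # Emit a bare newline; the enclosing block decides the next line's
        # prefix, so continuation lines are never left at column 0.
        return "\n"
    if node.type == "inline":
        return "".join(walk(c) for c in node.children)
    body = "\n\n".join(walk(c) for c in node.children)
    if node.type == "blockquote":
        return prefix_lines(body, "> ")
    return body  # root, paragraph


tree = Node("root", children=[
    Node("blockquote", children=[
        Node("paragraph", children=[Node("inline", children=[
            Node("text", "first line"),
            Node("softbreak"),
            Node("text", "same paragraph"),
        ])]),
        Node("paragraph", children=[Node("inline", children=[
            Node("text", "second paragraph"),
        ])]),
    ]),
])
print(walk(tree))
# > first line
# > same paragraph
# >
# > second paragraph
```

Nested blockquotes compose naturally (`> > ...`), and the same helper with a prefix of spaces is what keeps break continuation lines at a list item's indent column.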

What stays the same

  • Public API (slackify_markdown(text)) is unchanged.
  • Format mappings (bold/italic/strike/links/headings/mentions) — those are 1:1 transformations and don't depend on structural context.
  • Test suite, which should pass unchanged after the rewrite (the cascade-cap regression tests can be dropped, since the new design makes cascades impossible by construction).
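The format mappings carry over because they are context-free: an inline node's output depends only on its own type and children. A minimal sketch using the standard Slack mrkdwn delimiters (`*bold*`, `_italic_`, `~strike~`, `<url|text>`); `Node`, `WRAP`, and `render_inline` are hypothetical, and headings and mentions are left out because their exact output isn't specified here. The node type names (`strong`, `em`, `s`, `link`, `code_inline`) follow markdown-it's token types with the `_open` suffix removed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    # Stand-in exposing only the SyntaxTreeNode attributes used here.
    type: str
    content: str = ""
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


# Slack mrkdwn delimiters per markdown-it inline node type.
WRAP = {"strong": "*", "em": "_", "s": "~"}


def render_inline(node: Node) -> str:
    # No parent, depth, or sibling arguments: the output depends only on
    # the node itself, which is why these mappings survive the rewrite.
    if node.type == "text":
        return node.content
    if node.type == "code_inline":
        return f"`{node.content}`"
    inner = "".join(render_inline(c) for c in node.children)
    if node.type in WRAP:
        mark = WRAP[node.type]
        return f"{mark}{inner}{mark}"
    if node.type == "link":
        return f"<{node.attrs['href']}|{inner}>"
    return inner  # the "inline" container itself, or anything unmapped


line = Node("inline", children=[
    Node("strong", children=[Node("text", "Bold")]),
    Node("text", ", "),
    Node("em", children=[Node("text", "italic")]),
    Node("text", ", "),
    Node("s", children=[Node("text", "gone")]),
    Node("text", " and "),
    Node("link", attrs={"href": "https://example.com"},
         children=[Node("text", "a link")]),
])
print(render_inline(line))
# *Bold*, _italic_, ~gone~ and <https://example.com|a link>
```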

Acceptance criteria

  • All 60 existing tests pass on the rewritten renderer
  • Multi-paragraph list items render with correct indent (add a regression test)
  • Code block inside a list item renders with correct indent (add a regression test)
  • Multi-line blockquotes prefix every line with `> ` (add a regression test)
  • NEW_LINE sentinel + cap regex + input scrub are removed
  • docs/architecture.md is updated to describe the new walker model
