Background
The current renderer extends markdown_it.renderer.RendererHTML — a sequential token-stream model where each token's handler returns a string in isolation and the framework blindly concatenates them. This model is the root cause of two recurring classes of bugs:
- Structural-newline cascades. When several block elements close back-to-back (deeply nested lists, blockquote containing a paragraph, etc.), each *_close handler independently emits \n. They don't know about each other, so we get \n\n\n\n (multiple blank lines) where we want one. Current workaround: emit a sentinel char (\x02) for "structural newline" and a regex cap in render() that collapses runs of 3+ sentinels to 2. Works, but is essentially a normalization band-aid over a structural problem. See docs/architecture.md for the full rationale.
- Continuation content in list items loses indent. Multi-paragraph items, code blocks inside list items, and hardbreak/softbreak continuation lines all flow back to column 0 because the relevant handlers (paragraph_open, code_block, fence, hardbreak) have no idea they're inside a list_item. We currently track _list_depth only for list_item_open, where the depth happens to be available; for everything else, the depth would need to be threaded as renderer state.
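To make the current workaround concrete, here is a minimal sketch of a sentinel-and-cap pass. The names SENTINEL, _CAP_RE and finalize are hypothetical, and the real render() may differ in detail. Only the \x02 sentinel and the "runs of 3+ collapse to 2" rule come from the description above.

```python
import re

# Hypothetical names; only the \x02 sentinel and the 3+ -> 2 cap
# are taken from the issue text.
SENTINEL = "\x02"
_CAP_RE = re.compile(SENTINEL + "{3,}")


def finalize(raw: str) -> str:
    """Collapse runs of 3+ structural newlines to 2, then emit real newlines."""
    capped = _CAP_RE.sub(SENTINEL * 2, raw)
    return capped.replace(SENTINEL, "\n")


# Several block elements closing back-to-back each emit their own sentinel,
# so the raw stream contains a run of four before the next paragraph.
raw = "inner item" + SENTINEL * 4 + "next paragraph"
result = finalize(raw)
```

The pass fixes the symptom after the fact: it cannot tell which of the four sentinels was actually wanted, only that more than two in a row is too many.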
Proposal
Migrate the renderer from the stream-based model to an AST tree-walker. markdown-it-py already ships with markdown_it.tree.SyntaxTreeNode for building a tree from a token list. A tree-walker has full structural context at every node:
- Knows what its parent is (e.g., "I am a code_block inside a list_item at depth 2") → can emit the right leading indent unconditionally.
- Knows what its next sibling is → can emit exactly the right separator (single newline vs. blank line) without cascade.
- Can prepend / append content per-node rather than relying on close-handlers to emit trailing whitespace.
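The two context lookups in the list above can be sketched without the library. The Node class below is a hypothetical stand-in that mirrors only the type, parent, children and next_sibling attributes that SyntaxTreeNode exposes, and the separator policy (blank line only between sibling paragraphs) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so sibling lookup is unambiguous
class Node:
    """Hypothetical stand-in for a syntax-tree node."""
    type: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so trees can be built by chaining."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def next_sibling(self) -> Optional["Node"]:
        if self.parent is None:
            return None
        siblings = self.parent.children
        idx = siblings.index(self)
        return siblings[idx + 1] if idx + 1 < len(siblings) else None


def list_depth(node: Node) -> int:
    """Count enclosing list_item ancestors by walking the parent chain."""
    depth = 0
    current = node.parent
    while current is not None:
        if current.type == "list_item":
            depth += 1
        current = current.parent
    return depth


def separator_after(node: Node) -> str:
    """Choose the separator from the next sibling (assumed policy)."""
    nxt = node.next_sibling
    if nxt is None:
        return ""      # last child: the parent decides what follows
    if node.type == "paragraph" and nxt.type == "paragraph":
        return "\n\n"  # blank line between sibling paragraphs
    return "\n"


# A list item with two paragraphs and a nested list holding a fence.
root = Node("root")
outer_item = root.add(Node("bullet_list")).add(Node("list_item"))
first = outer_item.add(Node("paragraph"))
second = outer_item.add(Node("paragraph"))
inner_item = outer_item.add(Node("bullet_list")).add(Node("list_item"))
code = inner_item.add(Node("fence"))
```

Here list_depth(code) is 2 and separator_after(code) is empty, so the fence can be indented correctly and no trailing newline is emitted for a later pass to clean up.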
Implementation sketch:
    from markdown_it.tree import SyntaxTreeNode

    # Both functions are methods on the renderer class.
    def render(self, tokens, options, env):
        tree = SyntaxTreeNode(tokens)
        return self._walk(tree).rstrip("\n") + "\n"

    def _walk(self, node, depth=0, list_depth=0, in_blockquote=False) -> str:
        # dispatch on node.type, recurse into children with updated context
        ...
Roughly 50–80 lines should be enough to replace the per-handler dispatch and the sentinel/cap logic with a single recursive walker.
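As a rough illustration of what the recursion could look like, the sketch below threads the indent down as an argument, much like the list_depth parameter in the signature above. Everything here is an assumption for demonstration: Block is a hypothetical node shape rather than SyntaxTreeNode, only four node types are handled, the bullet glyph is arbitrary, and code blocks are emitted as raw lines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical node shape: a type, optional text, and child blocks."""
    type: str
    content: str = ""
    children: List["Block"] = field(default_factory=list)


BULLET = "• "  # two columns wide, so continuation lines indent by two spaces


def walk(node: Block, indent: str = "") -> str:
    if node.type in ("root", "bullet_list"):
        return "\n".join(walk(child, indent) for child in node.children)
    if node.type == "list_item":
        inner = indent + " " * len(BULLET)
        body = "\n".join(walk(child, inner) for child in node.children)
        # The first line carries the bullet instead of the continuation indent.
        if body.startswith(inner):
            body = body[len(inner):]
        return indent + BULLET + body
    if node.type in ("paragraph", "code_block"):
        # Every line, including hardbreak continuations, gets the same indent.
        return "\n".join(indent + line for line in node.content.splitlines())
    raise ValueError(f"unhandled node type: {node.type}")


tree = Block("root", children=[
    Block("bullet_list", children=[
        Block("list_item", children=[
            Block("paragraph", "Install the package"),
            Block("code_block", "pip install example\npip check"),
        ]),
        Block("list_item", children=[
            Block("paragraph", "Run it"),
            Block("bullet_list", children=[
                Block("list_item", children=[
                    Block("paragraph", "nested line one\nnested line two"),
                ]),
            ]),
        ]),
    ]),
])

output = walk(tree)
# output:
# • Install the package
#   pip install example
#   pip check
# • Run it
#   • nested line one
#     nested line two
```

Because each call joins its own children, separators are decided once per parent and no handler emits trailing whitespace that a later pass has to cap.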
What this resolves
- The sentinel + cap regex in render() becomes unnecessary: the walker emits the right separator directly from structural context.
- Multi-paragraph list items render with correct indentation.
- Code blocks inside list items are properly indented.
- Hardbreak/softbreak continuation lines stay at the list's indent column.
- Blockquote multi-paragraph rendering can prepend `> ` to every line, not just the first.
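The blockquote case reduces to a per-line transform once the walker has the fully rendered child output in hand. The helper below is a hypothetical sketch; in particular, emitting a bare > on blank lines between paragraphs is an assumption about the desired output, not something the current renderer is known to do.

```python
def quote(rendered: str) -> str:
    """Prefix every line of already-rendered child output with '> '."""
    return "\n".join(
        ("> " + line) if line else ">"  # assumed: bare '>' keeps the quote open
        for line in rendered.split("\n")
    )


quoted = quote("first paragraph\n\nsecond paragraph")
```

A stream-based blockquote_open handler only sees the start of the block, which is why the prefix currently lands on the first line alone.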
What stays the same
- Public API (slackify_markdown(text)) is unchanged.
- Format mappings (bold/italic/strike/links/headings/mentions) — those are 1:1 transformations and don't depend on structural context.
- Test suite: should pass unchanged after the rewrite. The one exception is the cascade-cap regression tests, which we can drop because the new design makes those cascades impossible by construction.
Acceptance criteria
- NEW_LINE sentinel + cap regex + input scrub are removed
- docs/architecture.md is updated to describe the new walker model
References
- docs/architecture.md: current design, known limitations
- markdown_it.tree.SyntaxTreeNode source