Fix list/blockquote spacing cascades; add nested indent + bullet hierarchy#20

Merged
thesmallstar merged 1 commit into main from fix/list-close-spacing on May 25, 2026


thesmallstar commented on May 25, 2026

Summary

  • Deeply nested lists (3+ levels) and blockquotes were producing 2+ blank lines between blocks because each *_close handler emitted \n independently with no awareness of the cascade. Nested bullets also had no indent.
  • This PR fixes both: adds proper indent + bullet hierarchy for nested lists, and a \x02 STX sentinel + cap regex in render() to collapse multi-newline runs from cascading closes down to a single blank line — without touching content inside fenced code blocks.
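A minimal sketch of the depth-to-prefix mapping for nested bullets. It assumes depth 1 is a top-level item with no indent, that each deeper level adds 4 spaces, and that depths beyond 3 reuse `▪`. Only `_BULLETS_BY_DEPTH` and its glyphs come from this PR; `bullet_prefix` and `INDENT` are hypothetical names, and the real `list_item_open` handler may compute the indent differently.

```python
# Hypothetical sketch: indent + glyph for a bullet item at a given nesting depth.
# Assumption: depth is 1-based and depths past the tuple's end reuse the last glyph.
_BULLETS_BY_DEPTH = ("•", "◦", "▪")
INDENT = "    "  # 4 spaces per nest level


def bullet_prefix(depth: int) -> str:
    """Return the text that would start a bullet item at `depth` (1 = top level)."""
    if depth < 1:
        raise ValueError("depth is 1-based")
    glyph = _BULLETS_BY_DEPTH[min(depth, len(_BULLETS_BY_DEPTH)) - 1]
    return INDENT * (depth - 1) + glyph + " "


items = [(1, "Parent"), (2, "Child"), (3, "Grandchild"), (4, "Great-grandchild")]
print("\n".join(bullet_prefix(depth) + text for depth, text in items))
# • Parent
#     ◦ Child
#         ▪ Grandchild
#             ▪ Great-grandchild
```

Clamping the index keeps very deep lists (4+ levels, as in the new tests) from raising an `IndexError` while the indent keeps growing.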

What changed

src/slackify_markdown/slackify.py

  • _list_depth state drives a 4-space indent per nest level in list_item_open.
  • _BULLETS_BY_DEPTH = ("•", "◦", "▪") — matches Slack's native nested-list glyphs.
  • Close-handlers (heading_close, paragraph_close, bullet_list_close, ordered_list_close, blockquote_close) emit NEW_LINE = "\x02" instead of literal \n.
  • render() caps runs of 3+ sentinels to 2 with one regex, then replaces sentinels with real \n. Code-block / fence handlers emit real \n directly so their content is preserved verbatim (including multi-blank-line content).
  • slackify() scrubs \x02 from user input so a literal STX in a Markdown source can't collide with our cap machinery.
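The cap-then-materialize step in `render()` and the input scrub in `slackify()` can be sketched roughly as follows. This assumes close handlers have already emitted `NEW_LINE` for structural breaks and that the scrub simply deletes STX rather than replacing it; `scrub_input` and `finalize` are hypothetical helper names, not the module's actual functions.

```python
import re

NEW_LINE = "\x02"  # STX sentinel used for structural (close-handler) newlines
# Three or more consecutive sentinels means cascading closes stacked up.
_SENTINEL_RUN = re.compile(r"\x02{3,}")


def scrub_input(markdown: str) -> str:
    """Drop user-supplied STX so it cannot be mistaken for a sentinel (assumed: delete)."""
    return markdown.replace(NEW_LINE, "")


def finalize(rendered: str) -> str:
    """Cap sentinel runs at two (one blank line), then turn sentinels into real newlines."""
    capped = _SENTINEL_RUN.sub(NEW_LINE * 2, rendered)
    return capped.replace(NEW_LINE, "\n")


# Structural breaks use the sentinel; fenced-code content uses real "\n".
rendered = (
    "*Title*" + NEW_LINE * 2
    + "• item" + NEW_LINE * 4  # cascade from nested closes
    + "```\na\n\n\n\nb\n```" + NEW_LINE
)
print(repr(finalize(rendered)))
# The four-sentinel run collapses to one blank line; the code block keeps its three blank lines.
```

Because the regex only matches sentinels, it never inspects real newlines, which is what keeps fenced-block content verbatim.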

tests/test_convert.py

  • 10 new edge-case tests: deeply nested bullets (4+ levels), mixed ordered/unordered nesting, code blocks with special chars and blank-line preservation, loose lists, blockquote+inner-list, all 6 heading levels, multi-blank-line collapse with code-block preservation, links with nested formatting + URL ampersands, inline-code + Slack mentions, and the STX-sentinel input scrub.
  • 1 new full-document integration test (test_full_document_with_all_patterns) that exercises ~everything together.
  • Updated test_complex_markdown expectations for the cleaner spacing.
  • 60 tests pass.

docs/architecture.md (new)

  • Pipeline overview, format mappings, full rationale for the sentinel/cap mechanism.
  • Trail of alternatives considered for the sentinel char (regex on real \n, split on ```, PUA, Unicode noncharacters, NULL \x00, STX \x02) and why STX won, following the same convention as python-markdown.
  • Known limitations (multi-paragraph items in lists, code-blocks-in-lists, hardbreak continuations, multi-line blockquote prefix), with a note that an AST-walker migration would resolve all four. Linked to #19, "Migrate renderer to AST tree-walker (resolves cascade + indent limitations)".

Why STX over alternatives

| Choice | Verdict |
|---|---|
| `\n{3,}` regex on real newlines | Rejected: eats blank lines inside code blocks |
| Private Use Area | Rejected: the Unicode FAQ explicitly warns this collides with real PUA usage |
| Unicode noncharacter | Rejected: same valid-codepoint risk |
| `\x00` NULL | Rejected: Python source files cannot contain a literal NULL; shells, argv, and `os.exec*` reject NULL |
| `\x02` STX | Chosen: battle-tested by python-markdown; ASCII-safe in source, shells, JSON, and filesystems |
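To see why the first row fails and the last row does not, here is a small self-contained comparison. The rendered strings are hand-built stand-ins for renderer output (assumption: a cascade of four structural breaks followed by a fenced block containing three blank lines), not output captured from the library.

```python
import re

SENTINEL = "\x02"
code = "```\nx = 1\n\n\n\ny = 2\n```"  # the blank lines inside the fence are content

# Naive: structural and code newlines are indistinguishable, so both get capped.
naive_rendered = "Intro\n\n\n\n" + code + "\n"
naive = re.sub(r"\n{3,}", "\n\n", naive_rendered)

# Sentinel: only structural breaks use \x02, so the cap never touches code newlines.
sentinel_rendered = "Intro" + SENTINEL * 4 + code + SENTINEL
capped = re.sub(r"\x02{3,}", SENTINEL * 2, sentinel_rendered).replace(SENTINEL, "\n")

print(repr(naive))   # code block lost two of its blank lines
print(repr(capped))  # code block intact, cascade still collapsed
```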

Test plan

  • PYTHONPATH=src python3 -m pytest tests/ -v — all 60 pass
  • Manual Slack render via webhook of .venv/sample.txt (LLM-generated content with 2-level nested lists + links) — clean single-blank-line separation everywhere
  • Verified user-typed \x02 in input does not corrupt output
  • Verified \n\n\n\n inside a fenced code block is preserved
  • Reviewer: visually confirm rendered Slack output looks clean for a typical assistant message

Follow-ups

…archy

Deeply nested lists (3+ levels) and blockquotes were producing 2+ blank lines
between blocks because each close-handler emitted \n independently with no
awareness of the cascade. Nested bullets also had no indent, so sub-items
sat at column 0 alongside their parents.
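A toy model of the pre-fix behavior makes the cascade concrete. It assumes, as described above, that each close handler appends one newline independently. The token names mirror the handler names in this PR, but the token stream, `nested_list`, and `render_naive` are simplified, hypothetical stand-ins rather than the real parser output or renderer.

```python
# Hypothetical toy model of the pre-fix renderer: every close handler appends
# "\n" on its own, with no knowledge of what the previous handler emitted.
CLOSES_WITH_NEWLINE = {
    "heading_close", "paragraph_close", "bullet_list_close",
    "ordered_list_close", "blockquote_close",
}


def render_naive(tokens: list[tuple[str, str]]) -> str:
    out = []
    for kind, text in tokens:
        if kind == "inline":
            out.append(text)
        elif kind in CLOSES_WITH_NEWLINE:
            out.append("\n")  # emitted independently of any earlier "\n"
    return "".join(out)


def nested_list(texts: list[str]) -> list[tuple[str, str]]:
    """Simplified token stream for a chain of singly nested bullet items, outermost first."""
    opens, closes = [], []
    for text in texts:
        opens += [("bullet_list_open", ""), ("list_item_open", ""),
                  ("paragraph_open", ""), ("inline", text), ("paragraph_close", "")]
        # The innermost list closes first, so prepend this level's closers.
        closes = [("list_item_close", ""), ("bullet_list_close", "")] + closes
    return opens + closes


tokens = nested_list(["a", "b", "c"]) + [
    ("paragraph_open", ""), ("inline", "After"), ("paragraph_close", ""),
]
print(repr(render_naive(tokens)))  # 'a\nb\nc\n\n\n\nAfter\n'
```

Three nested levels leave four newlines (three blank lines) between `c` and `After`, and each extra level adds one more.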

This change:
- Adds a 4-space indent per nest level in list_item_open via _list_depth state.
- Uses •/◦/▪ for bullet depths 1/2/3+ to match Slack's native rendering.
- Replaces structural newlines from close-handlers (heading, paragraph,
  list, blockquote) with a sentinel char (\x02 STX), then in render():
  caps runs of 3+ sentinels to 2 with a single regex, and materializes
  sentinels to real \n. Code-block content emits real \n directly, so it's
  untouched by the cap — multi-blank-line content inside fenced blocks is
  preserved verbatim.
- Scrubs the STX sentinel from user input in slackify() so user-typed
  control chars can't collide with our cap machinery.
- Adds 10 complex edge-case tests + one full-document integration test
  covering deep nesting, mixed ordered/unordered, code-with-specials,
  blockquote+list, all heading levels, multi-blank-line collapse with
  code preservation, link+formatting, mentions+inline-code, and the
  sentinel scrub.
- Adds docs/architecture.md explaining the pipeline, the cap/sentinel
  rationale (with the trail of alternatives considered — NULL, PUA,
  noncharacters — and why STX won), known limitations, and the future
  AST-walker migration tracked in #19.

60 tests pass.
thesmallstar merged commit ea1da7e into main on May 25, 2026
8 checks passed
thesmallstar deleted the fix/list-close-spacing branch on May 25, 2026 at 14:52