feat(indent): opt-in extensions for non-YAML indentation languages (commentExcept, rawBlock, flowColonSeparator) #41
Open
theoephraim wants to merge 1 commit into
Indentation languages that nest tag lines (Pug-like) rather than key/value
scalars need three behaviors the indent mode currently hardcodes for YAML.
Each is an opt-in IndentConfig field, default off — a grammar declaring none
tokenizes byte-identically (all existing gates unchanged).
- commentExcept: an exception string after the comment introducer makes the
line fall through to tokenization ('//' lines vanish, '//!' doc-comment
lines lex as real structural tokens).
- rawBlock: verbatim capture introduced from the END of a line (tag:mode
filters / content modes) — the mirror image of blockScalar's leading | / >.
The introducer must be glued to the line content (no top-level whitespace)
or sit at the line lead.
- flowColonSeparator: false disables the YAML flow ':' key-separator
carve-out, for grammars with ':name'-shaped tokens (bound-attribute
shorthand) that legally follow quoted values / flow closes.
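A minimal sketch of the gate that `flowColonSeparator` controls, based on the last bullet. It assumes a simplified flow scanner: only double-quoted scalars without escapes and `]`/`}` closes are modeled. The function name `flowSeparatorColons` is hypothetical and not part of monogram's API. The function returns the positions of `:` characters that would be force-emitted as the YAML key separator.

```ts
// Hypothetical illustration of the flow ':' carve-out; not monogram's lexer.
// Simplifications (assumptions): double-quoted scalars without escapes only,
// and only ']' / '}' count as flow closes.
function flowSeparatorColons(flowText: string, flowColonSeparator = true): number[] {
  const hits: number[] = [];
  // Opted out: every ':' is left to the grammar's ordinary tokens.
  if (!flowColonSeparator) return hits;

  let inQuote = false;
  for (let i = 0; i < flowText.length; i++) {
    const ch = flowText[i];
    if (ch === '"') {
      inQuote = !inQuote;
      continue;
    }
    if (inQuote || ch !== ":") continue;
    // Outside quotes, a '"' right before the ':' can only be a closing quote.
    const prev = i > 0 ? flowText[i - 1] : "";
    if (prev === '"' || prev === "]" || prev === "}") hits.push(i);
  }
  return hits;
}

console.log(flowSeparatorColons('{"a":1}'));          // [ 4 ]
console.log(flowSeparatorColons('("v":key)'));        // [ 4 ]  YAML default
console.log(flowSeparatorColons('("v":key)', false)); // []     opted out
```

The made-up input `("v":key)` stands in for a quoted value followed by a glued `:name`-shaped token. With the default, the lexer claims that `:` as a separator. Passing `false` leaves it for the grammar.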
Specified as engine behavior over toy grammars in test/indent-extensions.ts
(21 checks, registered as a core gate).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
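The `rawBlock` trigger rule in the commit message can be sketched as a small predicate. The rule says the introducer must be glued to the line content with no top-level whitespace, or sit at the line lead. This is an illustration under stated assumptions, not monogram's implementation:

- the introducer is `:`;
- an optional mode name of ASCII letters may follow it at the end of the line;
- indentation has already been stripped;
- only `(…)` and `"…"` count as nesting.

The name `triggersRawBlock` is hypothetical.

```ts
// Hypothetical predicate for the rawBlock trigger; not monogram's code.
// Assumptions: ':' introducer, optional trailing mode name of letters,
// `content` has its indentation stripped, nesting = (...) and "...".
function triggersRawBlock(content: string): boolean {
  const at = content.lastIndexOf(":");
  if (at < 0) return false;

  // Only an optional mode name (e.g. "md") may follow the introducer.
  if (!/^[A-Za-z]*$/.test(content.slice(at + 1))) return false;

  // Introducer at the line lead, e.g. ":md".
  if (at === 0) return true;

  // Otherwise it must be glued to the content: no whitespace at depth 0.
  let depth = 0;
  let inQuote = false;
  for (const ch of content.slice(0, at)) {
    if (inQuote) {
      if (ch === '"') inQuote = false;
      continue;
    }
    if (ch === '"') inQuote = true;
    else if (ch === "(") depth++;
    else if (ch === ")") depth = Math.max(0, depth - 1);
    else if (depth === 0 && /\s/.test(ch)) return false;
  }
  return true;
}

for (const line of ["script:", "article:md", ":md", 'div(a="1" b):', "label Size:"]) {
  console.log(line, triggersRawBlock(line));
}
// The first four print true; "label Size:" prints false.
```

The whitespace scan is what keeps `label Size:` from opening a raw block. Spaces inside `div(a="1" b)` are nested, so that line still triggers.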
theoephraim added a commit to theoephraim/nmbl that referenced this pull request Jun 12, 2026
The NMBL grammar is now defined once (src/nmbl-grammar.ts) using monogram (github:johnsoncodehk/monogram, pinned + bundled into dist); the runtime lexer/parser execute it directly and all editor artifacts derive from it (scripts/gen-artifacts.ts → TextMate, language-config, tree-sitter, Monarch, CST types). Replaces the hand-written lexer.ts/parser.ts/tokens.ts; the battle-tested compiler codegen survives via a CST→AST adapter (cst-to-ast.ts). Engine extensions live in patches/monogram.patch (commentExcept, rawBlock, flowColonSeparator; proposed upstream in johnsoncodehk/monogram#41).

Language/compiler changes:
- host-native @-blocks: framework option ('html'|'vue'|'svelte'|'astro'|'jsx'), default 'html'; vue compiles @if/@each to <template v-if/v-for> wrappers; astro/jsx emit JSX expressions; unsupported blocks are hard errors
- unified @each: accepts 'item of items' AND 'items as item (key)' forms in every mode, parsed to structured {collection, bindings, key}; :key wrapper attribute unifies keying across hosts
- jsx target: attribute aliasing (class→className), self-closing voids, {/* */} comments, key injection on the iteration root
- comments: '//' silent (recoverable via recoverComments()), '//!' rendered; works at line level and inside attribute lists
- content blocks (script:, article:md), component-name token, escape scopes

178 tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
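The unified `@each` change normalizes two clause forms into one `{collection, bindings, key}` record. Below is a rough, regex-level sketch of that normalization. It is not NMBL's parser, which runs on the monogram grammar. The comma-separated binding list, the `parseEach` name and the `EachClause` type are all assumptions made for illustration.

```ts
// Hypothetical normalization of the two @each clause forms.
// Assumption: bindings are a comma-separated list, optionally parenthesized.
interface EachClause {
  collection: string;
  bindings: string[];
  key?: string;
}

function splitBindings(s: string): string[] {
  return s
    .replace(/^\(|\)$/g, "") // drop optional surrounding parens
    .split(",")
    .map((b) => b.trim())
    .filter(Boolean);
}

function parseEach(src: string): EachClause {
  const text = src.trim();

  // "items as item (key)": collection first, optional trailing (key).
  const asForm = /^(.+?)\s+as\s+(.+?)(?:\s*\(([^()]+)\))?$/.exec(text);
  if (asForm) {
    return {
      collection: asForm[1].trim(),
      bindings: splitBindings(asForm[2]),
      key: asForm[3]?.trim(),
    };
  }

  // "item of items": bindings first, no inline key.
  const ofForm = /^(.+?)\s+of\s+(.+)$/.exec(text);
  if (ofForm) {
    return { collection: ofForm[2].trim(), bindings: splitBindings(ofForm[1]) };
  }

  throw new Error(`unrecognized @each clause: ${src}`);
}

console.log(parseEach("items as item (key)"));
console.log(parseEach("item of items"));
```

Both forms end up as the same record, so later codegen only has to handle one shape.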
Owner
Thank you for your PR. I'll check it tomorrow. Have you also tested the behavior for src/emit-lexer.ts?
Contributor
Author
All three behaviors I added are gated on
Testing out monogram on another parser project I've had kicking around for a while... Ran into a few more small things. Overall it's working great!!
Three opt-in `IndentConfig` extensions that the indent mode needs in order to host indentation languages that aren't YAML, specifically ones that nest tag lines (Pug-shaped) rather than key/value scalars. All three default off; a grammar declaring none tokenizes byte-identically (all 32 gates pass unchanged, including every YAML gate and the generative scope≡role check).

Context: we're building NMBL (WIP... site a bit outdated), an indentation shorthand for HTML that compiles to Vue/Svelte/Astro/JSX templates, as a monogram grammar, on the "adding a language is one grammar file" promise. It very nearly is: the grammar, parser, TextMate/tree-sitter/Monarch outputs and language-config all work on the unmodified engine. These three behaviors were the only places where the indent mode had YAML baked in as its sole client.
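Here is a rough sketch of the opt-in shape described above, assuming each extension is a plain optional field. Only the three field names come from this PR. Their types, the `IndentExtensions` and `ResolvedExtensions` interfaces, and `resolveExtensions` are hypothetical. The point is the defaulting: leaving a field undeclared resolves to today's YAML behavior.

```ts
// Hypothetical shape; only the three field names come from the PR.
interface IndentExtensions {
  /** Exception string right after the comment introducer (e.g. "!"). */
  commentExcept?: string;
  /** Introducer that opens a verbatim block from the END of a line (e.g. ":"). */
  rawBlock?: string;
  /** Keep the YAML flow ':' key-separator carve-out. Defaults to true. */
  flowColonSeparator?: boolean;
}

interface ResolvedExtensions {
  commentExcept: string | null; // null: every comment-only line is skipped
  rawBlock: string | null; // null: no end-of-line raw blocks
  flowColonSeparator: boolean; // true: YAML behavior
}

function resolveExtensions(cfg: IndentExtensions = {}): ResolvedExtensions {
  return {
    commentExcept: cfg.commentExcept ?? null,
    rawBlock: cfg.rawBlock ?? null,
    flowColonSeparator: cfg.flowColonSeparator ?? true,
  };
}

// A YAML grammar declares nothing and keeps the pre-existing behavior.
console.log(resolveExtensions());
// A Pug-shaped grammar (illustrative values) opts into all three.
console.log(resolveExtensions({ commentExcept: "!", rawBlock: ":", flowColonSeparator: false }));
```

Note that `flowColonSeparator` is the only boolean whose "off" state is `true`. That is why the sketch resolves a missing value with `?? true` instead of treating it as falsy.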
1.
`commentExcept`: two-tier comments

NMBL has `//` comments that are stripped (dev notes) and `//!` comments that are rendered into the output (`<!-- … -->`). The strip tier wants exactly YAML's `comment` treatment: comment-only lines are invisible to the indent stack. But the introducer check is a plain `startsWith`, so `//!` lines (which must lex as real, structural tokens) get swallowed too, because they share the `//` prefix.

A line whose comment introducer is immediately followed by the exception string falls through to ordinary tokenization.
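The exception rule can be sketched as a line classifier. The sketch makes two assumptions. First, the check runs on indentation-stripped line content. Second, a missing or empty exception string keeps the plain `startsWith` behavior. `isSkippedCommentLine` is a hypothetical helper, not monogram's API.

```ts
// Hypothetical classifier for comment-only lines; not monogram's code.
// Returns true when the line should be dropped as a comment (invisible to
// the indent stack), false when it must go through ordinary tokenization.
function isSkippedCommentLine(content: string, introducer: string, except?: string): boolean {
  if (!content.startsWith(introducer)) return false;
  // Introducer immediately followed by the exception string: not a comment.
  if (except && content.startsWith(except, introducer.length)) return false;
  return true;
}

console.log(isSkippedCommentLine("// note", "//", "!"));          // true
console.log(isSkippedCommentLine("//! shipped note", "//", "!")); // false
console.log(isSkippedCommentLine("// ! spaced", "//", "!"));      // true
console.log(isSkippedCommentLine("//! shipped note", "//"));      // true (no exception set)
```

Checking the exception at exactly `introducer.length` is what makes `// !` (with a space) still count as a comment.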
`// note` lines vanish; `//! shipped note` lexes as a declared token and participates in Newline/Indent structure. `// !` (space between) is still a comment.

2.
`rawBlock`: verbatim capture introduced from the END of a line

`blockScalar` captures more-indented lines verbatim, but its trigger is a leading introducer char (the signature regex hardcodes `[|>]`). Pug-style languages introduce raw regions from the line's end, as NMBL's content modes do (`script:`, `article:md`).

Same capture semantics as `blockScalar` (bounded by indentation, blank lines included, one token from introducer through body). The introducer only counts when it is glued to the line's content, with no top-level whitespace before it (whitespace inside balanced parens/quotes is fine, so `div(a="1" b):` triggers), or when it sits at the line lead (`:md`). That guard matters in practice: `label Size:` is inline text ending in a colon, not a raw block; we hit exactly this in a real template.

3.
`flowColonSeparator: false`: opt out of the flow `:` separator carve-out

In flow context the lexer force-emits a `:` glued after a quoted scalar or flow close as the YAML `key: value` separator (the 5T43/C2DT cohort). That is correct for YAML, but NMBL has `:name`-shaped tokens (Vue-style bound-attribute shorthand) that legally follow values and closes inside its `(…)` attribute lists.

Default `true` preserves YAML behavior exactly (asserted in the tests both ways).

Field notes (no action needed)
Two other YAML-isms we avoided via grammar choices rather than config, mentioned as data points on what `string: true` / `blockPattern` opt into. Flagging our string tokens `string: true` pulled them into the same flow-`:` carve-out (we dropped the flag and lost auto-close delimiter derivation). And any token with `blockPattern` participates in plain-scalar continuation folding (rest-of-line capture), which is surprising outside YAML. If useful, happy to file these as separate issues.

🤖 Generated with Claude Code