Motivation
The incremental parser (#38) keeps a keystroke at ~0.05ms on multi-MB files, but only for inputs the grammar accepts. A typing session spends most of its time in transiently invalid states (const is already invalid), and the current session policy — reject on both sides, keep the previous tree — leaves the tree stale through every burst of typing until the text parses again.
Error tolerance is what turns the incremental parser into an editor engine (and what a parser-as-product IDE adopter needs). It is also the one capability tree-sitter/Lezer have that this parser deliberately lacks today.
Hard constraint: recovery is a MODE
Conformance is a product guarantee (bidirectional vs tsc, byte-exact reject messages, emit ≡ interp parity). Therefore:
- Strict mode (default) does not change at all. Every existing gate keeps running against strict mode and must stay byte-identical.
- Recovery mode is opt-in (e.g.
parse(src, entry, { recover: true }) / a recovering session) and only does extra work on inputs strict mode rejects: a valid input must produce the IDENTICAL tree in both modes.
Design sketch
- Recover at the spine/rep sync points (statement granularity first). When an element fails inside a repetition, skip tokens to a sync set, emit an error node spanning the skipped region, and continue the loop. The sync set must be grammar-derived (rep-element FIRST sets + enclosing closers) — no language names (the agnostic gate applies).
- Error nodes are ordinary rows (a reserved rule id): the green tree, visit/toObject, matchers, adoption and node surgery all generalize without new cases — an error row adopts and splices like any row.
- The equivalence invariant survives: incremental-with-recovery ≡ fresh-with-recovery, byte-identical, gated by seeded sessions whose syntax-breaking edits must now produce identical trees instead of falling into the both-reject bucket. (Tree-sitter does not make this guarantee — its edited trees can diverge from fresh ones in error scenarios.)
- Diagnostics instead of throws: recovery-mode parse never throws; it returns the root plus a diagnostics list (offset, skipped span, the strict-mode expectation info).
Open questions
- Error-node shape: span-only (skip-to-sync) vs keeping the longest partial parse of the broken element (tsc-style incomplete nodes give better trees for IDE features, at a much larger surface). Start span-only?
- Expression-internal recovery (Pratt/left-rec arms) — out of scope for the first round; statement-granularity recovery already bounds the damage to one element.
- Watermark semantics for error rows: rowExt must cover the skipped region plus every probe of the failed arms, or a later edit inside the skipped span could keep a stale error row alive (the memo/adoption soundness argument extends, but needs its own gate-witnessed proof).
- ASI interaction: a skip-to-sync that crosses a newline must not change how the following statement's ASI behaves (sync sets vs sameLine connectors).
Acceptance criteria
- Strict mode: all existing gates pass unchanged (30/30, conformance matrix, parity, reject-message equality).
- Recovery mode totality: never throws across the generative/fuzz corpus + the edit-session corpus.
- Valid inputs: recovery tree ≡ strict tree, byte-identical.
- Invalid inputs: error-node spans tile the source (CST text invariant holds); diagnostics non-empty exactly when strict rejects.
- Incremental: recovery sessions stay O(damage) steady-state (surgery/adoption around error rows), and incremental-with-recovery ≡ fresh-with-recovery over seeded sessions including syntax-breaking edits.
Context
Motivation
The incremental parser (#38) keeps a keystroke at ~0.05ms on multi-MB files, but only for inputs the grammar accepts. A typing session spends most of its time in transiently invalid states (
constis already invalid), and the current session policy — reject on both sides, keep the previous tree — leaves the tree stale through every burst of typing until the text parses again.Error tolerance is what turns the incremental parser into an editor engine (and what a parser-as-product IDE adopter needs). It is also the one capability tree-sitter/Lezer have that this parser deliberately lacks today.
Hard constraint: recovery is a MODE
Conformance is a product guarantee (bidirectional vs tsc, byte-exact reject messages, emit ≡ interp parity). Therefore:
parse(src, entry, { recover: true })/ a recovering session) and only does extra work on inputs strict mode rejects: a valid input must produce the IDENTICAL tree in both modes.Design sketch
Open questions
Acceptance criteria
Context