Incremental re-parse: O(damage) edits, a 9MB keystroke in ~0.05ms, proven ≡ fresh by johnsoncodehk · Pull Request #38 · johnsoncodehk/monogram

johnsoncodehk · 2026-06-10T18:46:53Z

parseEdited(source, entryRule?, edits?) re-parses after an edit reusing the previous parse, byte-identical to a from-scratch parse (gate-enforced). Per-edit work is O(damage) in steady state — a keystroke on a 9MB file re-parses in ~0.05ms median (~750× a full parse); batch parsing is unaffected (all mechanisms are sign/branch-dead outside incremental sessions).

Numbers (M2 Max)

shape	fresh parse	keystroke median	p90	speedup
9MB flat body (180k stmts)	569ms	0.04ms	0.07ms	~750×
8MB nested real code	253ms	0.13ms	0.16ms	~700×
81KB editor-typical	2.5ms	0.10ms	0.20ms	~25×

First edit of a session pays one-time costs (suffix flip, spare-buffer allocation, paren-stack scan); every keystroke after is sub-millisecond. Mixed random sessions (the verify gate, far jumps + rejects): 1.6-1.9×.

How

Layered so each mechanism only fires when the one above rejects:

Node surgery (the steady-state path): descend the old tree to the deepest pure list container (a seq of literals/refs around exactly one repetition of a memoized rule — shapes with alts/lookahead at the container's own level are excluded, since a losing arm's probes are owned by no kid row). Re-parse only the affected elements with the real rule functions, require exact rejoin at an old kid boundary, then splice the kid range and patch lengths up the path — all checks happen before any row is mutated. Kept prefix kids are bounded by the same lookahead-watermark rule adoption uses, made transitive by a lazy per-row containment bit.
Old-tree adoption + run-extension (the fallback): every rule entry consults the previous tree (stateless, backtracking-safe); repetition loops bulk-adopt runs of old siblings under a (pos, rule, generation) signature.
Windowed re-lex with depth-tolerant restart/resync; the live paren stack rolls forward from the previous edit's anchor instead of a backward scan.
Sign-relative spans kill the remaining O(n) add-loops: suffix token spans store EOF-relative negatives (updating one length variable is the shift), and a spliced parent's suffix kid rels store parent-end-relative negatives (the parent's length update is the shift). Values self-describe by sign; flip bands are cursor-locality sized.

Soundness

The equivalence gate (`test/incremental-verify.ts`) runs seeded random edit sessions (inserts, deletes, statement inserts, syntax-breaking edits) across real files, with explicit edit ranges throughout; every accepted re-parse must be byte-identical (toObject) to a fresh parse, rejects must reject on both sides.
Watermark discipline: memo/adoption/surgery all reuse only nodes whose recorded lookahead extent clears the damage (+2 read slack at use sites); reused subtree hits bump the enclosing watermark.
The seeded sessions caught three real holes during development (a trivia-boundary length leak — token-inside is not char-inside; an empty-range insertion stitched into a closed node; an Int32 overflow in a boundary remap). All three fixes are in, and the gate is green: 120 steps, 0 mismatches.
Full suite: 30/30 gates, emit≡interp parity 0 mismatches, lexer streams byte-identical, batch bench unchanged (11.4-11.6× aggregate).

API

The change protocol is LSP/VS Code's native shape — the engine owns the document text and builds it from the changes, so "the ranges don't match the text" is unrepresentable:

const p = createParser();
const cst = p.parse(text);
p.edit(cst, [{ start, end, text }]);     // each change in the coordinates of the
p.visit(cst, fns); p.tree.ruleNameOf(id);//   document after the preceding ones

The document lives as PIECES (flat fragments; a change splices O(1) string views, never a V8 cons-flatten — which costs ~1.2ms per keystroke on 9MB and was previously paid, unmeasured, by whoever materialized the text). The lexer re-lexes through a small materialized window with retry-on-truncation. The quoted keystroke numbers are therefore END-TO-END: text application included, no hidden caller cost.

There is no toObject: the engine hands out a tree to traverse (visit + accessors), never a materialized object tree — full materialization is a consumer choice (the equivalence gates build their comparison objects from visit in test code).

Contract (gated): a rejected edit throws and leaves the handle on the previous tree, readable (the reject path restores the token columns to the live tree's source); re-opening the document via parse() — even a rejecting one, since its arena reset precedes its outcome — invalidates prior handles; foreign handles throw. Documents are isolated per parser instance (gated: interleaved edit sessions stay byte-identical to fresh parses). Module-level parse/parseEdited keep working on a default document.

… reuse parseEdited(newSource) re-parses after an edit reusing everything the edit provably did not touch. No edit protocol: the damage window is DERIVED by diffing the old and new token columns (longest identical prefix; longest suffix identical modulo the char delta) — the caller just hands the new text. Reuse flows through the carried memo. Soundness rests on three pieces: - Every memo entry records its lookahead WATERMARK (memoExt): the farthest token the stored parse may have read — a PEG parse probes beyond its end (failed longer arms, not() lookaheads, SECOND-token dispatch). It comes for free from the global advance watermark at frame exit; the fixed read slack (stop token + SECOND probe, +2) is applied at invalidation time, so the stored value stays the true watermark. - A memo HIT bumps the watermark to the entry's own: the jump semantically reads everything the stored parse read, or an enclosing rule completing right after a reused subtree would record a watermark smaller than what its result depends on (including the child's over-probing failed arms), and a later edit in the gap would keep the stale entry alive. Guaranteed batch no-op by monotonicity — the 18,805-file byte-identical gate and the exact reject-message gate both stay green. - Prefix entries survive when watermark+slack stays inside the prefix; suffix entries shift by the token delta; the damage window drops. The old arena is re-based in place (suffix rows by charDelta, reused leaf entries by tokenDelta; damage-spanning rows are unreachable garbage), and new rows append after the old — a full parse() compacts. The ENTRY rule's repetition units (Stmt/Decl for TS — derived from the grammar shape, no language names) now memoize through parseRuleEntry like pratt/ left-rec rules, so whole untouched statements reuse, not just expression subtrees. Token columns double-buffer across edits (ping-pong, zero steady-state allocation; the in-place memo variant for token-count changes measured SLOWER than sparse rebuild — undefined writes materialize holes — and was reverted). Gate (test/incremental-verify.ts, #30 in the chain): scripted edit sessions over the bench files — inserts, deletes, statement insertions, syntax-breaking edits — every accepted re-parse must be byte-identical (toObject) to a fresh parse; rejects must reject on both sides. 120 steps, 0 mismatches. Measured: mixed-session 1.4-1.5x, single-keystroke ~3x, pure-reuse floor ~5.6x; the remaining cost is full-file relex + diff bookkeeping (windowed relex and the green {rel,len} re-base are the named follow-ups), the reused parse itself is ~1% of the profile.

…(M1) The lexer core is parameterized (lexCore): start anywhere with the previous token's (k, t) as the regex-context seed and empty template/paren stacks. Every token records its two stack depths (tkDp/tkPd columns); the restart anchor is the last token before the damage with both records zero and no live cross-token flag (a control-head ')' or postfix-ambiguous operator), walking back to the file head in the worst case — always sound. The window lexes into the spare buffer set (the old stream stays live), and RESYNC fires at the first token at/past the damage end that aligns with an old token (same k/t, spans shifted by the char delta) at EQUAL stack depths where every still-open bracket was opened BEFORE the damage — the byte-equal prefix guarantees those stack entries agree, while anything opened inside the damage may differ in control-head-ness and must not span the join. The depth-tolerant condition matters: an all-wrapping IIFE (typescript.js) keeps paren depth >= 1 everywhere, and a depth-0-only resync degraded 9MB edits to ~1.2x; with it they reach 2.6x. The splice is copyWithin + a suffix span shift; the damage window is derived from a char-level prefix/suffix compare of the two sources (no edit protocol needed). The true token prefix is recovered by comparing the window's leading tokens against the old stream before the splice, so the memo carry keeps everything the re-lex merely re-derived. Fallback-lexer grammars keep the full-relex path; tokenize() is unchanged for batch (the lexer-equality gate runs the full streams). Numbers: 81KB keystroke 3.5x -> 3.3x parse-side with lex now O(damage); mixed sessions ~1.5-1.65x; 9MB keystroke 2.6x. Remaining per-edit O(n) is the M3/M4 bookkeeping (memo prefix scans, arena re-base loops, suffix span shift) — the green {rel,len} re-base and old-tree cursor adoption kill those next. 30/30 gates; emit≡interp 18,802 byte-identical; reject messages and token streams exact.

… anchor The restart anchor no longer requires paren depth zero — inside an all-wrapping IIFE (typescript.js's 8.9MB bundle) no interior token has it, so the anchor fell back to the file head and the window re-lexed half the file. A '(' token now records its control-head-ness as tkFl bit 8, and reconstructParens rebuilds the live stack enclosing the anchor by a backward scan: the first '(' recording exactly depth d is the live opener of level d (closed openers at that depth are re-opened later, and the re-opener comes first backward). The anchor still requires template depth zero (interp brace counters are not reconstructable) and additionally must not be a control KEYWORD — a '(' lexed first in the window would mis-derive its head-ness from a missing predecessor. Two boundary bugs found by measurement on the way: lastDp/lastPd (the "depth before the damage" baseline that resync compares against) must initialize from the ANCHOR's depths, not zero — with the anchor adjacent to the edit there are no pre-damage pushes to set them, the baseline froze at zero, resync never fired inside the IIFE and the window ran to EOF (306ms edits, worse than the depth-0 restart it replaced); and tokenize() must keep returning tokN (the lexer-equality gate consumes it). incremental ≡ fresh 0/120 mismatches; lexer streams, reject messages, and the 18,805-file byte-identical gate all green; 30/30. 9MB keystroke edits land at ~120-160ms (machine-thermal band), now dominated by the named O(n) bookkeeping (memo prefix scans ~9M iterations/edit, arena re-base, suffix span shift, char-diff scans) — the green {rel,len} + cursor-adoption milestones' targets.

…vanish Nodes no longer store absolute positions. A row owns only its LENGTHS (rowLen chars, rowTokLen tokens); a node child's relative coordinates (kidRel chars, kidTokRel tokens, both against the parent's start) live in the PARENT's kids stream, parallel to the entries — NOT on the child row: a memo-reused subtree can be a child of several longest-match CANDIDATES, and a losing candidate completing after the winner would clobber child-side rel fields (928 corpus mismatches before the edge-ownership fix). Leaf entries are node-relative token indices. The red layer is a descent: visit/toObject thread (charBase, tokBase); leaf spans come from the token columns at tokBase + rel. Build stays absolute in TRANSIENT per-row coordinates (absChar/absTok), written at finishNode/finishWrap, read by the enclosing parent, never part of the green tree. A memo HIT refreshes the reused root's transients to the current stream in O(1) — which is the whole point: - the arena re-base loops (rowOff O(nodes), kids O(kids) per edit) are GONE; - the '>'-splice kids/scratch fixup is GONE (completed rows lie wholly before the splice point; the carried memo is cleared); - a reused subtree needs zero rewriting at any depth. Matchers thread one tokBase parameter (leaf spans come from the token columns, so they never need charBase); the totality gate's visit supplies it. The ts-ast lowering moves to the INTERPRETER oracle through a new object-tree TreeAccess adapter (test/obj-tree.ts, absolute coordinates, tokBase ignored) — the grammar↔tsc structure contract is engine-independent, and the lowering needed zero semantic changes. 18,802/18,805 emit ≡ interp byte-identical (toObject reproduces absolute objects exactly); reject messages exact; incremental ≡ fresh 0/120 with the mixed session at 1.69× (best yet); totality 32,167 nodes / 0 misses; 30/30. 9MB keystrokes ~148ms — now dominated by the memo prefix scans, the suffix span shift, and the char-diff scans (the cursor-adoption and chunked-column milestones' targets).

An incremental rule entry now asks the PREVIOUS tree first: adoptSeek walks the old root toward the mapped old position (cached containment path + binary search over each node's monotone child starts) and adopts a node when the rule matches, its lookahead gap stays clear of the damage (rowExt — the ext-minus-start LENGTH, position-independent like everything green), and the old parse MEMOIZED it (rowOK): a row built under a suppress (no-'in') or parseLimit-capped context is a context-dependent parse, and adoption must not widen the contract the memo carry never offered — skipping that bit produced real divergences (an incremental reject of text the fresh parse accepts). Adoption is STATELESS: nothing is consumed, so PEG backtracking needs no cursor rollback, a node refused under one longest-match candidate can be adopted by the next, and exploratory descent through same-start chains never commits to the cache. On adoption: pos jumps by rowTokLen, the watermark bumps by the gap, the transients refresh — all O(1). The memo becomes purely intra-parse: parseEdited's whole O(rules × n) carry/invalidate machinery (the prefix watermark scans, the sparse rebuilds) is deleted; fresh memo arrays per parse. incremental ≡ fresh 0/120 with the mixed session at 1.91× (best yet); 9MB keystrokes ~121ms; 18,802/18,805 emit ≡ interp byte-identical; reject messages exact; 30/30 gates.

The intra-parse memo arrays persist across parses; an entry is live iff its stamp (a new memoGen Int32Array per rule) equals the current generation, and bumping the generation counter IS the whole reset — parse(), parseEdited() and the '>'-splice all just increment it. Allocating fresh multi-million-slot arrays per edit was ~30% of a large-file edit in GC alone (and pushed V8 toward dictionary elements); now steady-state edits allocate nothing. 9MB keystroke edits: ~121ms -> ~50ms (5.4x vs a full parse); mixed sessions 2.27x. incremental ≡ fresh 0/120; 18,802 byte-identical; reject messages exact; 30/30 gates.

…scans An editor knows its edit ranges, so the damage envelope can come from the caller ([{start, oldEnd, newEnd}], merged over multiple edits) instead of the char-level prefix/suffix compare — which was the largest remaining O(file) scan (two charCodeAt sweeps over a 9MB source per keystroke). The compare stays as the no-protocol fallback. 9MB keystroke edits: ~50ms -> 7.8ms (34.6x vs a full parse), equivalence verified. What remains per edit is memcpy-grade: the suffix span shift and the token-column splice (the chunked-columns endgame), the window lex, and the adoption walk. incremental ≡ fresh 0/120 (the gate exercises the fallback path); 30/30 gates.

…ack cache A 9MB flat-body keystroke spent 79% of its 61ms re-entering parseRuleEntry/adoptSeek once per undamaged statement, and another 7ms re-deriving the live paren stack by backward scan (the IIFE worst case). - adoptSeek publishes the hit site (old parent row / kid index / base) when the adopted node is the parent's direct kid; parseRuleEntry arms a (pos, rid, generation)-signed run signal on such adoptions. - '*'/'+' loops whose element is a parseRuleEntry-routed rule (pratt / left-rec / spine) consume the signal via runExtend: following old siblings are adopted in one tight loop under exactly the single-adopt eligibility (same-rule row, rowOK, contiguous, damage-clear, non-zero width). A member's existence proves the loop's FIRST guard true at its position; the signature triple keeps an inner rule's adoption from feeding elements into an outer loop. Members skip memo stores - a backtracking re-probe just re-adopts. - reconstructParensCached rolls the previous anchor's stack FORWARD over the tokens between the anchors (tokens at/before the cached anchor are splice-stable); backward jumps fall back to the full scan. Invalidated by full lexes and the '>' splice. - The spine-rule set moved to Emitter.spineSet(), shared by emitRuleFns and the quantifier hook. 9MB IIFE keystroke: 61ms -> 10.4ms (parse 50.5 -> 6.9, parens 7.2 -> 0.2). Gates: 30/30, incremental 0/120, emit-parser-verify 0 mismatch, emit-lexer-verify streams equal, batch bench unchanged (11.4x aggregate).

After run-adoption, three O(n)-per-edit costs remained on a 9MB flat body: the damage-path list parent re-collected all 180k kids through scratch (and the arena grew by that much per edit), the suffix token spans took a char-delta add-loop, and the spliced parent's suffix kids took a rel add-loop. - Node SURGERY patches the damage path in place. Descend the old tree along single-affected-row kids; at the deepest PURE container (SURG_ELEM: a seq of literals/refs around exactly one '*'/'+' rep of a parseRuleEntry-routed rule - no alt/sep/opt/not at the container's own level, so every probe is owned by a kid row), re-parse only the affected elements with the real rule fn (adoption reuses their undamaged subtrees), require exact rejoin at an old kid start, then splice the kid range and patch lengths up the path. Every check runs before any row is mutated; any failure falls back to the full adoption re-parse. Prefix kids are kept under the adoption watermark rule, made transitive by rowKC (lazy kid-containment bit). Pure insertions at a kid boundary must touch the rep zone (a neighbour element), or the splice would stitch the element into a CLOSED node. Char lengths are re-DERIVED from the token columns, not patched by the char delta: a pure-trivia edit can sit token-inside but char-outside a node (the gap belongs to no node). - EOF-relative token spans: tkOff/tkEnd at/after the damage store value - (srcLen + 1); decode adds the current length back, so updating srcLenP1 IS the suffix shift. Values self-describe by sign; negFrom bounds the flip band (cursor-locality sized). The '>' splice writes its pair sign-consistently with the zone it lands in. - END-relative kid rels: a row kid's kidRel/kidTokRel may be stored relative to the parent's END (strictly negative, decoded with the parent's current lengths), so a surgical splice shifts the whole kid suffix by updating the parent's lengths. Stable across edits while the parent row is untouched; rowNF bounds the per-row band. Leaf kids stay start-relative (packed) - a pure container's trailing leaves get an O(1) backward walk. - incremental-verify now alternates the edits-protocol and char-diff envelopes, and its seeded sessions caught three real holes during development (trivia-boundary length leak, closed-node stitching, Int32 overflow in the relocated-range boundary remap). 9MB keystroke: 10.4ms -> median 0.04ms / p90 0.07ms (~750x vs fresh, steady state; the first edit of a session pays the one-time flip + buffer allocation). 8MB nested real-code shape: median 0.13ms. 81KB: median 0.10ms. Batch is sign-clean: emitted aggregate 11.4-11.6x (unchanged band), 30/30 gates, emit-parser-verify 0 mismatches, emit-lexer-verify byte-identical streams.

A deep edit under a giant flat list paid an O(suffix-kids) eager rel walk per keystroke on every ANCESTOR with a large suffix - the band so far existed only on the surgical container itself. Measured on the 9MB flat body as ancestor: 0.60ms median / 1.85ms p90 per keystroke. Ancestors whose rule is a pure container (SURG_ELEM: interior = element rows only, leaves only as a trailing run) now maintain the same end-relative band as D: rows entering the suffix flip once (old-length bias cancels), rows beyond the boundary ride the parent length update, trailing leaves get the O(1) backward re-pack. Mixed-content ancestors (interleaved leaves cannot sign-encode inside the packed kid entry) keep the eager walk - those are the grammar's non-list shapes with small kid counts. Nested edit on the 9MB flat body: 0.60ms -> 0.031ms median / 0.047ms p90. List-level keystroke unchanged (0.04ms median), 8MB nested real shape 0.13ms, batch aggregate in band (11.2x), 30/30 gates, incremental-verify 0/120, emit-parser-verify 0 mismatches.

…estart anchor API rework (the session model made the edit base IMPLICIT - parseEdited acted on whatever was parsed last, and two interleaved documents shared one module state): const p = createParser(); const cst = p.parse(text); const cst2 = p.edit(cst, next[, edits]); - Each parser instance owns a DOCUMENT: the 51 per-document fields (token columns, arena, kids, memo, session, paren cache, spare buffers) live in a doc object; the module-level variables stay the ACTIVE REGISTER SET and activate() lazily swaps on instance switch - the hot paths never indirect through an object (batch unchanged, 11.0-11.3x band; handle-API keystroke median 0.06ms). - Handles are generation-stamped: trees are edited IN PLACE (node surgery), so an edit invalidates earlier handles of that parser - using one throws instead of silently reading a mutated tree. A REJECTED edit leaves the previous handle valid and the next edit falls back to a full re-parse internally. - Module-level parse/parseEdited/visit/toObject keep working on a default document (gates/back-compat); the interpreter's createParser gains edit() (full re-parse - immutable object trees) for API parity. - NEW gate test/multi-doc.ts: two instances over two sources, edits interleaved with the default doc mixed in - every edited tree must equal a fresh parse (a missed swap field = cross-document corruption), plus the stale/foreign/reject handle contract. SOUNDNESS FIX the new smoke test exposed (predates this branch, M1-era): findRestart anchored at tokens ENDING exactly at the damage start, but maximal munch lets the edit EXTEND such a token ('b' + inserted 'x' lexes as 'bx', '=' + '=' as '==', deleting a gap glues neighbours) and the anchor itself is never re-lexed - the spliced stream then carried 'b','x' as two tokens where a batch lex has one, parsing a DIFFERENT (sometimes still valid) program. The fixed-seed gate sessions never hit the abutment in 120 steps. Anchor comparison is now STRICT (<), so the abutting token falls inside the window and the merge is re-derived; incremental-verify gains a deterministic boundary-glue session (ident glue, operator glue, gap deletion, '>>' split sites) so the class stays pinned. 31/31 gates (multi-doc included), incremental-verify 128 steps 0 mismatch, emit-parser-verify 0 mismatches, agnostic 9/9, batch in band.

…ontract Returning a new handle from edit() read like value semantics - as if the old cst survived and edit produced a clone. There is no clone: surgery mutates the tree in place. The handle is now the STABLE IDENTITY of the document's tree: p.edit(cst, next) updates cst.root and returns void; the same reference always reads the current tree. Making that honest exposed two reject holes the contract tests pinned: - A rejected edit had already spliced the token columns to the rejected text (the splice precedes the parse attempt), so the kept tree's leaf spans read corrupted data. The reject path now restores the columns by re-lexing the LIVE tree's source (treeSrc, which unlike lastSrc survives rejects) - O(n) on the reject path only; #39's recovery mode is what makes rejects rare. - The full-parse fallback inside edit (after a previous reject) went through parseCore, whose arena reset destroys the live tree BEFORE knowing whether the new text parses. edit now falls back in APPEND mode; parse() is the only compaction point - and since its reset happens before its outcome is known, parse() bumps the generation on entry: old handles die when a document is re-opened, success or not. Handle contract (gated, 5/5): in-place edit updates the same handle; a rejected edit throws and keeps the handle on the previous tree (readable); foreign handles throw; re-opening via parse() - including a REJECTING parse() - invalidates prior handles. The interpreter's edit() mirrors the in-place semantics by replacing the tree object's fields. 31/31 gates, incremental-verify 128 steps 0 mismatch, parity 0 mismatches, handle-API keystroke median 0.028ms.

…surface The arena design's premise (PR #36) is that parse() hands out a tree to TRAVERSE, not an object tree to materialize - toObject was the materialization back door left on both the module API and the handle API, and its only real consumer was the gate layer's byte-identical JSON comparison. That is a test concern: gates now build the comparison object through visit + tree accessors (test/emitted-obj.ts, mirroring the interpreter's object shape and key order exactly, so the emit ≡ interp and incremental ≡ fresh comparisons are unchanged). The unused emitted getText went with it; the interpreter keeps returning its native object trees (that IS its representation, not a conversion). 31/31 gates, emit-parser-verify 0 mismatches, multi-doc contract 5/5, handle-API keystroke median 0.028ms (9MB) / 0.089ms (8MB nested).

The char-diff envelope was the protocol's predecessor left in as a convenience default - but it silently spends O(file) prefix/suffix scans, defeating the O(damage) contract exactly for the callers who reached for the incremental API. Callers that track edits (editors) all have the ranges; a caller without them passes the whole-file range and gets an honest full re-parse instead of hidden scans. The ranges MUST cover every change: over-claiming shrinks via the true token-prefix compare; under-claiming is the caller's bug (the same garbage-in contract as tree-sitter's tree.edit, now documented at the envelope). edit() without ranges throws (gated, contract 6/6); the seeded sessions and glue cases all pass explicit ranges (the gate keeps a small test-side diff helper for its constructed pairs). 31/31 gates, parity 0 mismatches, keystroke median 0.028ms.

…O(damage) Why next had to go: it was the only carrier of the inserted content (the ranges carry positions, not text), which meant the API could express 'the text and the ranges disagree' - the garbage-in hazard. The change protocol [{ start, end, text }] is LSP/VS Code's native shape (each edit in the coordinates of the document after the preceding ones) and makes the inconsistency unrepresentable: the engine BUILDS the new text from the changes. Building it as a string exposed a cost that was always there, hidden in the caller: slicing the previous edit's cons string flattens it in V8 - measured 1.18ms per keystroke on 9MB, paid by whoever materializes the text. The engine now owns the document as PIECES (flat fragments; applying a change splits via O(1) SlicedString views and never flattens), with: - window-materialized relexing: lexCore reads a small flat slice with an absolute srcBase bias (biased once per token in tkPush - batch cost is two adds per token); running off the window end - including a matcher failing at the EDGE (a truncated string literal is not a lex error) - signals a retry with a larger window via LEX_RETRY. A cut token cannot fake a resync: suffix-zone equality makes its end mismatch the old token's. - doc reads route through docChar/docText (flat fast path, cursor- cached piece lookup otherwise); cold paths (errors, debug) flatten lazily; pieces consolidate past 256 fragments (amortized join). - the reject restore re-lexes the live tree's pieces but preserves the DOCUMENT pieces (the editor's buffer holds the rejected text; the gates' editor model now advances on reject and verifies an UNDO revert edit against a fresh parse every time). End-to-end keystroke (engine builds the text, nothing hidden): 9MB median 0.024ms / p90 0.047ms; 8MB nested 0.072ms / p90 0.094ms. 31/31 gates, parity 0 mismatches, lexer streams byte-identical, batch in band (11.5x).

johnsoncodehk added 9 commits June 11, 2026 02:46

johnsoncodehk changed the title ~~Incremental re-parse: parseEdited() with watermarked memo carry-over, proven ≡ fresh~~ Incremental re-parse: O(damage) edits, a 9MB keystroke in ~0.05ms, proven ≡ fresh Jun 10, 2026

johnsoncodehk mentioned this pull request Jun 10, 2026

Error-tolerant parsing mode: a live tree through transiently-invalid edit states #39

Open

johnsoncodehk added 5 commits June 11, 2026 06:59

johnsoncodehk merged commit 70a8019 into master Jun 10, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental re-parse: O(damage) edits, a 9MB keystroke in ~0.05ms, proven ≡ fresh#38

Incremental re-parse: O(damage) edits, a 9MB keystroke in ~0.05ms, proven ≡ fresh#38
johnsoncodehk merged 15 commits into
masterfrom
incremental

johnsoncodehk commented Jun 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

johnsoncodehk commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Numbers (M2 Max)

How

Soundness

API

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

johnsoncodehk commented Jun 10, 2026 •

edited

Loading