fix: two O(n²) editing slowdowns (UTF-16 boundary check + event compose)#1019
Merged
Conversation
Since 1.12.0 (#954), `validate_text_boundary` was added to the text insert/delete/splice/mark hot paths. For UTF-16 and byte positions it round-trips the position through `convert_pos`, whose to-UTF-16/bytes branch materialized the entire `[0, pos)` prefix string via `get_text_slice_by_event_index(0, pos)` and counted its length. That made every edit O(n) and a run of edits O(n^2). The JS binding always uses UTF-16 coordinates, so it was hit hardest. Replace the prefix materialization with `event_index_to_index`, which maps an event index onto the target coordinate by reading the rope's prefix caches from a single cursor query (O(log n)). Behavior is unchanged: invalid boundaries (mid-surrogate / mid-codepoint) are still rejected by the round-trip comparison in `validate_text_boundary`. Benchmarks (random single-char inserts, n=48000): - JS loro-crdt@1.13.3 -> local fix: 161284ms -> 482ms (~334x) - Rust insert_utf16/insert_utf8/delete_utf16: now linear (flat per-op) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
WASM Size Report
|
When a subscriber is attached, `diffs_to_event` composes all diff fragments that target the same container within one event batch. It did `last_container_diff.clone().compose(...)`, cloning the *growing* accumulated diff on every fragment — O(K^2) for K fragments. This is hit by random-position text inserts (whose event hints don't merge) and by many distinct map-key writes, etc. - Compose in place via `mem::replace` instead of cloning the accumulator. - `ResolvedMapDelta::compose` now consumes `self` instead of cloning `self.updated` (the map event path). - `InternalDiff::compose` no longer clones the accumulated RichtextRaw delta; `Diff::compose_ref` (undo path) no longer clones the Map/Tree accumulator. Benchmarks (random single-char inserts, subscriber attached, n=48000): - text: 51149ms -> 269ms (~190x), now linear - map distinct-key writes: 47248ms -> 112ms (~420x), now linear Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Converting a UTF-16 or UTF-8 (byte) offset to a unicode offset within a text chunk scanned the chunk char-by-char (`utf16_to_unicode_index` / `utf8_to_unicode_index` and their inverses). The b-tree locates the leaf in O(log n), but the within-leaf scan is O(offset-in-chunk). Since the state rope does not bound chunk size — a large insert, a loaded document, or a run of sequentially-typed text that merges into one chunk all produce big chunks — every UTF-16/UTF-8 position op (insert, delete, slice, splice, mark, convert_pos) was O(chunk length), so a sequence of them on a large chunk was O(n^2). A chunk whose `utf16_len == unicode_len` contains no astral-plane chars, so UTF-16 offsets equal unicode offsets; a chunk whose byte length equals `unicode_len` is pure ASCII, so byte offsets equal unicode offsets. In both cases the conversion is O(1). This covers essentially all real-world text (ASCII / Latin / CJK and the JS UTF-16 path). Chunks that actually contain astral chars (e.g. a huge all-emoji block) still scan, but that is rare and unchanged in correctness. Benchmarks (n=32000): - random slice_utf16(p,p+1) on a big ASCII chunk: O(n^2) -> O(n) (flat ~0.46us/op) - sequential utf16 append (typing): superlinear -> O(n) (flat ~3.2us/op) Validated by the loro/loro-internal test suites (default + wasm feature) and the `all` cargo-fuzz target over the shared corpus. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- event_index_to_index: drop the unreachable PosType::Event guard; both callers only pass Bytes/Utf16, and get_index_from_cursor already handles Event identically. - TextChunk::_slice: reuse the new is_all_ascii() helper instead of an inline duplicate of the same predicate. - diffs_to_event: use mem::take (DiffVariant already derives Default with None) instead of mem::replace(.., DiffVariant::None). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two independent O(n²) slowdowns in editing. Both surfaced from the same investigation; the first matches the user-visible "JS text insert is quadratic" report, the second matches the "slow when a subscriber is attached" framing.
Bug 1 — UTF-16 / byte boundary check materialized the whole prefix (regression since 1.12.0)
1.12.0(#954) addedvalidate_text_boundary()to the text edit hot paths. For UTF-16/byte positions it round-trips the position throughconvert_pos, whose to-UTF-16/bytes branch did:— O(pos) per edit → O(n²) for a run of edits. The JS binding always uses UTF-16 indices, so it was hit hardest. Fix: map the position via a single cursor query reading the rope's prefix caches (
event_index_to_index→get_index_from_cursor), O(log n). Unicode-indexed editing was never affected (it early-returns beforeconvert_pos).Bug 2 — event building cloned the growing accumulated diff (subscriber-gated)
When a subscriber is attached,
diffs_to_eventcomposes all diff fragments targeting the same container in one batch vialast_container_diff.clone().compose(...)— cloning the growing accumulator on every fragment → O(K²) for K fragments. Triggered by random-position text inserts (whose event hints don't merge) and by many distinct map-key writes. Fix: compose in place (mem::replace), makeResolvedMapDelta::composeconsumeself, and drop the needless accumulator clones inInternalDiff::compose/Diff::compose_ref. Affects text, map, and list events.Benchmarks (random single-char edits)
loro-crdt@1.13.3, utf16 insert, n=48kVerification
convert_pos_matches_prefix_reference_all_coords(every coord pair vs prefix reference + mid-char rejection),batched_same_container_edits_emit_correct_event(composed event content equals final doc),perf_text_insert_utf16_is_linear(#[ignore]perf guard, mirrors the existingperf_import_quadraticconvention).lorotext + contract tests,loro-internallib tests (335),wasm-feature richtext tests, and undo tests pass. Clippy clean.Related finding (not in this PR)
A wider audit found a third, confirmed O(n²): walking a tree backward through history (
doc.checkoutover many versions) is quadratic because the tree diff-calculator's retreat scans every node per step (crates/loro-internal/src/diff_calc/tree.rs:177), while the forward path uses a bounded lamport range. Measured x3.8–4.7 per doubling for tree, vs ~x2 for an equivalent text checkout walk (confirming it's the tree calculator, not general checkout). It is a distinct, correctness-sensitive diff-calc change and is left for a dedicated follow-up PR with fuzzing rather than bundled here.🤖 Generated with Claude Code