Skip to content

fix: two O(n²) editing slowdowns (UTF-16 boundary check + event compose)#1019

Merged
zxch3n merged 5 commits into
mainfrom
feat/loro-text-insert-perf
Jun 16, 2026
Merged

fix: two O(n²) editing slowdowns (UTF-16 boundary check + event compose)#1019
zxch3n merged 5 commits into
mainfrom
feat/loro-text-insert-perf

Conversation

@zxch3n

@zxch3n zxch3n commented Jun 16, 2026

Copy link
Copy Markdown
Member

Summary

Fixes two independent O(n²) slowdowns in editing. Both surfaced from the same investigation; the first matches the user-visible "JS text insert is quadratic" report, the second matches the "slow when a subscriber is attached" framing.

Bug 1 — UTF-16 / byte boundary check materialized the whole prefix (regression since 1.12.0)

1.12.0 (#954) added validate_text_boundary() to the text edit hot paths. For UTF-16/byte positions it round-trips the position through convert_pos, whose to-UTF-16/bytes branch did:

let prefix = state.get_text_slice_by_event_index(0, event_index)?; // whole [0,pos) string
count_utf16_len(prefix.as_bytes())

— O(pos) per edit → O(n²) for a run of edits. The JS binding always uses UTF-16 indices, so it was hit hardest. Fix: map the position via a single cursor query reading the rope's prefix caches (event_index_to_indexget_index_from_cursor), O(log n). Unicode-indexed editing was never affected (it early-returns before convert_pos).

Bug 2 — event building cloned the growing accumulated diff (subscriber-gated)

When a subscriber is attached, diffs_to_event composes all diff fragments targeting the same container in one batch via last_container_diff.clone().compose(...) — cloning the growing accumulator on every fragment → O(K²) for K fragments. Triggered by random-position text inserts (whose event hints don't merge) and by many distinct map-key writes. Fix: compose in place (mem::replace), make ResolvedMapDelta::compose consume self, and drop the needless accumulator clones in InternalDiff::compose / Diff::compose_ref. Affects text, map, and list events.

Benchmarks (random single-char edits)

scenario before after scaling before→after
JS loro-crdt@1.13.3, utf16 insert, n=48k 161284 ms 482 ms (local build) O(n²) → O(n)
Rust utf16 / utf8 / delete-utf16 insert quadratic flat per-op O(n²) → O(n)
Rust text insert + subscriber, n=48k 51149 ms 269 ms O(n²) → O(n)
Rust map distinct-key + subscriber, n=48k 47248 ms 112 ms O(n²) → O(n)

Verification

  • New tests: convert_pos_matches_prefix_reference_all_coords (every coord pair vs prefix reference + mid-char rejection), batched_same_container_edits_emit_correct_event (composed event content equals final doc), perf_text_insert_utf16_is_linear (#[ignore] perf guard, mirrors the existing perf_import_quadratic convention).
  • All loro text + contract tests, loro-internal lib tests (335), wasm-feature richtext tests, and undo tests pass. Clippy clean.

Related finding (not in this PR)

A wider audit found a third, confirmed O(n²): walking a tree backward through history (doc.checkout over many versions) is quadratic because the tree diff-calculator's retreat scans every node per step (crates/loro-internal/src/diff_calc/tree.rs:177), while the forward path uses a bounded lamport range. Measured x3.8–4.7 per doubling for tree, vs ~x2 for an equivalent text checkout walk (confirming it's the tree calculator, not general checkout). It is a distinct, correctness-sensitive diff-calc change and is left for a dedicated follow-up PR with fuzzing rather than bundled here.

🤖 Generated with Claude Code

Since 1.12.0 (#954), `validate_text_boundary` was added to the text
insert/delete/splice/mark hot paths. For UTF-16 and byte positions it
round-trips the position through `convert_pos`, whose to-UTF-16/bytes
branch materialized the entire `[0, pos)` prefix string via
`get_text_slice_by_event_index(0, pos)` and counted its length. That made
every edit O(n) and a run of edits O(n^2). The JS binding always uses
UTF-16 coordinates, so it was hit hardest.

Replace the prefix materialization with `event_index_to_index`, which
maps an event index onto the target coordinate by reading the rope's
prefix caches from a single cursor query (O(log n)). Behavior is
unchanged: invalid boundaries (mid-surrogate / mid-codepoint) are still
rejected by the round-trip comparison in `validate_text_boundary`.

Benchmarks (random single-char inserts, n=48000):
- JS loro-crdt@1.13.3 -> local fix: 161284ms -> 482ms (~334x)
- Rust insert_utf16/insert_utf8/delete_utf16: now linear (flat per-op)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

WASM Size Report

  • Original size: 3031.65 KB
  • Gzipped size: 1000.11 KB
  • Brotli size: 702.39 KB

When a subscriber is attached, `diffs_to_event` composes all diff
fragments that target the same container within one event batch. It did
`last_container_diff.clone().compose(...)`, cloning the *growing*
accumulated diff on every fragment — O(K^2) for K fragments. This is hit
by random-position text inserts (whose event hints don't merge) and by
many distinct map-key writes, etc.

- Compose in place via `mem::replace` instead of cloning the accumulator.
- `ResolvedMapDelta::compose` now consumes `self` instead of cloning
  `self.updated` (the map event path).
- `InternalDiff::compose` no longer clones the accumulated RichtextRaw
  delta; `Diff::compose_ref` (undo path) no longer clones the Map/Tree
  accumulator.

Benchmarks (random single-char inserts, subscriber attached, n=48000):
- text: 51149ms -> 269ms (~190x), now linear
- map distinct-key writes: 47248ms -> 112ms (~420x), now linear

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zxch3n zxch3n changed the title fix: O(n²) text editing with UTF-16/byte positions fix: two O(n²) editing slowdowns (UTF-16 boundary check + event compose) Jun 16, 2026
zxch3n and others added 3 commits June 16, 2026 08:22
Converting a UTF-16 or UTF-8 (byte) offset to a unicode offset within a
text chunk scanned the chunk char-by-char (`utf16_to_unicode_index` /
`utf8_to_unicode_index` and their inverses). The b-tree locates the leaf
in O(log n), but the within-leaf scan is O(offset-in-chunk). Since the
state rope does not bound chunk size — a large insert, a loaded document,
or a run of sequentially-typed text that merges into one chunk all
produce big chunks — every UTF-16/UTF-8 position op (insert, delete,
slice, splice, mark, convert_pos) was O(chunk length), so a sequence of
them on a large chunk was O(n^2).

A chunk whose `utf16_len == unicode_len` contains no astral-plane chars,
so UTF-16 offsets equal unicode offsets; a chunk whose byte length equals
`unicode_len` is pure ASCII, so byte offsets equal unicode offsets. In
both cases the conversion is O(1). This covers essentially all
real-world text (ASCII / Latin / CJK and the JS UTF-16 path). Chunks that
actually contain astral chars (e.g. a huge all-emoji block) still scan,
but that is rare and unchanged in correctness.

Benchmarks (n=32000):
- random slice_utf16(p,p+1) on a big ASCII chunk: O(n^2) -> O(n) (flat ~0.46us/op)
- sequential utf16 append (typing): superlinear -> O(n) (flat ~3.2us/op)

Validated by the loro/loro-internal test suites (default + wasm feature)
and the `all` cargo-fuzz target over the shared corpus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- event_index_to_index: drop the unreachable PosType::Event guard; both
  callers only pass Bytes/Utf16, and get_index_from_cursor already handles
  Event identically.
- TextChunk::_slice: reuse the new is_all_ascii() helper instead of an
  inline duplicate of the same predicate.
- diffs_to_event: use mem::take (DiffVariant already derives Default with
  None) instead of mem::replace(.., DiffVariant::None).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@zxch3n zxch3n merged commit 4d577ad into main Jun 16, 2026
1 check passed
@zxch3n zxch3n deleted the feat/loro-text-insert-perf branch June 16, 2026 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant