
fix: recover two per-op editing slowdowns regressed since 1.11 #1021

Merged: zxch3n merged 3 commits into main from feat/perf-regression-hunt on Jun 17, 2026

zxch3n commented Jun 16, 2026

Summary

Recovers two constant-factor per-operation editing regressions that crept in
between the 1.11.1 release and current HEAD. Both were introduced by the
lazy-snapshot work in #985 and were found by bisecting the public-API editing
benchmarks against the loro-internal-v1.11.1 tag.

These are distinct from #1019 (which fixed genuine O(n²) editing paths); these
are O(1)-per-op overheads that nonetheless added ~15–20% to common editing
workloads.

Fixes

1. Per-insert allocation in ensure_no_regular_container_value (handler.rs)

Every MapHandler/ListHandler/MovableListHandler insert (14 call sites)
heap-allocated a Vec to walk the value for nested containers — even for scalar
values, which is the overwhelmingly common case. Added a scalar fast-path that
returns before allocating.
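The shape of this fix can be illustrated with a minimal Rust sketch: a scalar fast-path placed in front of a nested-value walk. The `Value` enum, `ContainerValueError`, and `ensure_no_container_value` below are hypothetical simplifications, not Loro's actual `LoroValue` or `ensure_no_regular_container_value`; the sketch only shows that scalar inputs return before the work-stack `Vec` is allocated.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified value type (NOT Loro's real `LoroValue`).
#[derive(Debug, Clone)]
pub enum Value {
    Null,
    Bool(bool),
    I64(i64),
    Double(f64),
    String(String),
    List(Vec<Value>),
    Map(HashMap<String, Value>),
    /// A reference to a regular (attached) container, which inserts must reject.
    Container(u32),
}

#[derive(Debug, PartialEq)]
pub struct ContainerValueError;

/// Hypothetical check in the spirit of `ensure_no_regular_container_value`.
pub fn ensure_no_container_value(value: &Value) -> Result<(), ContainerValueError> {
    // Fast path: a scalar can never hold a container, so return before allocating.
    match value {
        Value::Null | Value::Bool(_) | Value::I64(_) | Value::Double(_) | Value::String(_) => {
            return Ok(())
        }
        Value::Container(_) => return Err(ContainerValueError),
        Value::List(_) | Value::Map(_) => {}
    }

    // Slow path: only nested values pay for the heap-allocated work stack.
    let mut stack: Vec<&Value> = vec![value];
    while let Some(v) = stack.pop() {
        match v {
            Value::Container(_) => return Err(ContainerValueError),
            Value::List(items) => stack.extend(items.iter()),
            Value::Map(entries) => stack.extend(entries.values()),
            _ => {}
        }
    }
    Ok(())
}
```

In this sketch only lists and maps reach the traversal, so a typical single-scalar insert performs no heap allocation in the check.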

2. Double DocState lock on the text bounds check (handler.rs, state.rs)

TextHandler::len/len_unicode/len_utf16 are called on every public text
insert/delete. Post-#985 they took two DocState locks — one to check
whether the container state was decoded, then another to query the length.
Consolidated into a single DocState::get_text_len taking one lock + one
container-store lookup. The per-pos_type store helpers already branch
decoded-vs-lazy internally, so the lazy-snapshot memory optimization is fully
preserved: a still-lazy container reads its cached length metadata without
materializing the richtext state.
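The lock consolidation can be modeled roughly as follows. Everything in this sketch (`Doc`, `ContainerSlot`, `TextLens`, `PosType`, and the lock counter) is a hypothetical model, not Loro's real `DocState`; it assumes a single `Mutex` guards the container store and that both decoded and still-lazy containers can answer length queries from cached metadata. `len_two_locks` mimics the pre-fix shape and `get_text_len` the post-fix one.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical position units, mirroring the unicode/utf16/bytes length queries.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PosType {
    Unicode,
    Utf16,
    Bytes,
}

/// Cached text lengths in every unit.
#[derive(Clone, Copy, Debug)]
pub struct TextLens {
    unicode: usize,
    utf16: usize,
    bytes: usize,
}

impl TextLens {
    pub fn of(s: &str) -> Self {
        TextLens { unicode: s.chars().count(), utf16: s.encode_utf16().count(), bytes: s.len() }
    }

    fn get(&self, pos: PosType) -> usize {
        match pos {
            PosType::Unicode => self.unicode,
            PosType::Utf16 => self.utf16,
            PosType::Bytes => self.bytes,
        }
    }
}

/// Decoded containers read their state's cache; lazy ones read snapshot metadata.
/// The branch lives inside the helper, so a lazy container is never decoded here.
pub enum ContainerSlot {
    Decoded(TextLens),
    Lazy(TextLens),
}

impl ContainerSlot {
    fn text_len(&self, pos: PosType) -> usize {
        match self {
            ContainerSlot::Decoded(cache) | ContainerSlot::Lazy(cache) => cache.get(pos),
        }
    }
}

pub struct Doc {
    state: Mutex<HashMap<u32, ContainerSlot>>,
    lock_count: AtomicUsize,
}

impl Doc {
    pub fn new(store: HashMap<u32, ContainerSlot>) -> Self {
        Doc { state: Mutex::new(store), lock_count: AtomicUsize::new(0) }
    }

    /// Counts acquisitions so the sketch can show how many locks each path takes.
    fn lock(&self) -> MutexGuard<'_, HashMap<u32, ContainerSlot>> {
        self.lock_count.fetch_add(1, Ordering::Relaxed);
        self.state.lock().unwrap()
    }

    pub fn locks_taken(&self) -> usize {
        self.lock_count.load(Ordering::Relaxed)
    }

    /// Pre-fix shape: lock #1 asks "is it decoded?" (guard drops at the end of
    /// the statement), then lock #2 reads the length.
    pub fn len_two_locks(&self, idx: u32, pos: PosType) -> Option<usize> {
        let _is_decoded = matches!(self.lock().get(&idx)?, ContainerSlot::Decoded(_));
        self.lock().get(&idx).map(|slot| slot.text_len(pos))
    }

    /// Post-fix shape: one lock and one store lookup.
    pub fn get_text_len(&self, idx: u32, pos: PosType) -> Option<usize> {
        let store = self.lock();
        store.get(&idx).map(|slot| slot.text_len(pos))
    }
}
```

Both paths return the same length; the only difference in this model is one lock acquisition per call, which matters because the bounds check runs on every text insert and delete.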

Benchmarks (clean, back-to-back, same machine)

| Benchmark | Before (HEAD) | After | vs 1.11.1 |
|---|---|---|---|
| map create 10^4 key | ~19.4 ms | ~10.7 ms (−45%) | was 16.9 ms → now faster than 1.11 |
| bench_text B4 apply (per-op text) | ~389 ms | ~352 ms (−9.5%) | recovers ~half of #985's regression |
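As a quick arithmetic check, the relative changes follow directly from the reported timings; the middle term compares the new map result against the 1.11.1 figure of 16.9 ms:

$$
\frac{10.7 - 19.4}{19.4} \approx -0.448 \approx -45\%, \qquad
\frac{10.7 - 16.9}{16.9} \approx -0.367, \qquad
\frac{352 - 389}{389} \approx -0.095 \approx -9.5\%
$$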

Validation

  • cargo test -p loro-internal --lib --features test_utils,counter — 287 passed
    (incl. the lazy-decode invariant tests text_lazy_event_queries_match_decoded_state
    and text_snapshot_string_queries_do_not_decode_state, which guard #985's
    no-force-decode behavior).
  • cargo test -p loro-internal --test mergeable_container — 22 passed.
  • cargo test -p loro --test contracts list_movable_boundary — 4 passed
    (incl. the container-rejection error path for fix 1).

Not addressed here

The remaining ~half of the text per-op regression comes from #985's lazy
ContainerWrapper indirection on the state-apply path
(with_state_mut → get_or_create_mut → ContainerWrapper::get_state_mut → decode_state). That is the core of #985's design and warrants a separate,
carefully-benched follow-up rather than being bundled here.
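For readers unfamiliar with the lazy path, here is a hypothetical Rust model of that call chain. The method names follow the chain above, but the bodies, the `TextState` type, and the plain-UTF-8 "encoding" used by `decode_state` are assumptions for illustration only. The model shows why every apply pays a lookup plus an "is it decoded yet?" branch, and why the first apply to a lazy container also pays for decoding.

```rust
use std::collections::HashMap;

/// Hypothetical decoded text state (a stand-in for a rich-text tree).
pub struct TextState {
    text: String,
}

impl TextState {
    pub fn insert(&mut self, byte_pos: usize, s: &str) {
        self.text.insert_str(byte_pos, s);
    }

    pub fn as_str(&self) -> &str {
        &self.text
    }
}

/// Assumption for illustration: the "encoded" form is just UTF-8 bytes.
fn decode_state(bytes: Vec<u8>) -> TextState {
    TextState { text: String::from_utf8(bytes).expect("valid UTF-8") }
}

/// Hypothetical lazy wrapper: keeps encoded bytes until first mutable access.
pub struct ContainerWrapper {
    encoded: Option<Vec<u8>>,
    state: Option<TextState>,
}

impl ContainerWrapper {
    pub fn lazy(encoded: Vec<u8>) -> Self {
        ContainerWrapper { encoded: Some(encoded), state: None }
    }

    pub fn is_decoded(&self) -> bool {
        self.state.is_some()
    }

    /// Every call pays the `is_none` branch; the first call also pays for decoding.
    pub fn get_state_mut(&mut self) -> &mut TextState {
        if self.state.is_none() {
            let bytes = self.encoded.take().unwrap_or_default();
            self.state = Some(decode_state(bytes));
        }
        self.state.as_mut().expect("state was just decoded")
    }
}

pub struct DocState {
    store: HashMap<u32, ContainerWrapper>,
}

impl DocState {
    pub fn new() -> Self {
        DocState { store: HashMap::new() }
    }

    /// One hash lookup per op; a missing container is created empty and lazy.
    fn get_or_create_mut(&mut self, idx: u32) -> &mut ContainerWrapper {
        self.store
            .entry(idx)
            .or_insert_with(|| ContainerWrapper::lazy(Vec::new()))
    }

    /// State-apply entry point: lookup, then (possibly) decode, then mutate.
    pub fn with_state_mut<R>(&mut self, idx: u32, f: impl FnOnce(&mut TextState) -> R) -> R {
        f(self.get_or_create_mut(idx).get_state_mut())
    }
}
```

In this model the per-op cost after the first decode is small but nonzero, which is consistent with the description of the remaining regression as a constant-factor indirection rather than an algorithmic one.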

🤖 Generated with Claude Code

zxch3n and others added 2 commits June 16, 2026 15:55
Both are constant-factor regressions on the per-op (auto-commit) editing
path introduced by the lazy-snapshot work in #985, found while bisecting
current HEAD against the 1.11.1 release.

1. map/list/movable-list insert: ensure_no_regular_container_value
   heap-allocated a Vec on every insert, even for scalar values (the
   common case). Add a scalar fast-path that skips the allocation and
   traversal. `map create 10^4 key`: ~19.4ms -> ~10.7ms.

2. text insert/delete: the per-op bounds-check len()/len_unicode()/
   len_utf16() took two DocState locks (decoded-check, then query).
   Consolidate into one DocState::get_text_len taking a single lock +
   container-store lookup, preserving the lazy-snapshot memory behavior
   (a still-lazy container reads cached length metadata without
   materializing the richtext state). `bench_text B4 apply`: ~389ms -> ~352ms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lockfile update to match the crates/loro-wasm Cargo.toml version bump that
landed via the main merge (chore: version packages).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 16, 2026

WASM Size Report

  • Original size: 3032.05 KB
  • Gzipped size: 1000.40 KB
  • Brotli size: 702.41 KB

Addresses a review finding on the per-op text length consolidation: the
PosType::Bytes branch of DocState::get_text_len built the full plain string
via get_value_by_idx().as_string().len(), so insert_utf8/delete_utf8 bounds
checks on an already-decoded text became O(text length) per op (the old code
read cached byte-length metadata in the decoded case).

Add a text_utf8_len store helper mirroring text_unicode_len/text_utf16_len:
- decoded state reads the O(1) cached byte length
  (RichtextState::len_utf8 = root_cache().bytes)
- a still-lazy container reads the byte length from the already-materialized
  text value string (O(1) str::len), preserving the no-force-decode behavior

Also route the public TextHandler::len_utf8 through the same single-lock
helper; it had the same has_decoded_state double-lock + string-construction
pattern.
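A minimal sketch of the difference between the regressed and fixed `PosType::Bytes` paths, assuming a decoded state that keeps an aggregate byte count (a stand-in for `root_cache().bytes`) and a still-lazy container that already holds its text value as a `String`. `DecodedRichtext`, `TextSlot`, `text_utf8_len_slow`, and this `text_utf8_len` are hypothetical illustrations, not the store helper added in the commit.

```rust
/// Hypothetical decoded rich-text state: text kept in chunks plus an aggregate
/// byte count (a stand-in for a root cache such as `root_cache().bytes`).
pub struct DecodedRichtext {
    chunks: Vec<String>,
    cached_bytes: usize,
}

impl DecodedRichtext {
    pub fn new(chunks: Vec<String>) -> Self {
        let cached_bytes: usize = chunks.iter().map(|c| c.len()).sum();
        DecodedRichtext { chunks, cached_bytes }
    }

    /// O(text length): rebuilds the full plain string.
    pub fn to_plain_string(&self) -> String {
        self.chunks.concat()
    }

    /// O(1): reads the cached aggregate.
    pub fn len_utf8(&self) -> usize {
        self.cached_bytes
    }
}

pub enum TextSlot {
    Decoded(DecodedRichtext),
    /// Still-lazy container whose text value string is already materialized.
    Lazy { value: String },
}

/// Regressed shape: builds the whole string just to measure its byte length.
pub fn text_utf8_len_slow(slot: &TextSlot) -> usize {
    match slot {
        TextSlot::Decoded(state) => state.to_plain_string().len(),
        TextSlot::Lazy { value } => value.len(),
    }
}

/// Fixed shape: decoded reads the O(1) cache; lazy uses O(1) `str::len`
/// and never forces a decode.
pub fn text_utf8_len(slot: &TextSlot) -> usize {
    match slot {
        TextSlot::Decoded(state) => state.len_utf8(),
        TextSlot::Lazy { value } => value.len(),
    }
}
```

Both functions return the same value; only the slow one walks the whole text, which is what made each `insert_utf8`/`delete_utf8` bounds check O(text length). The cache here is never updated because nothing mutates the sketch's state; a real implementation has to keep it in sync with edits.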

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zxch3n merged commit 52d8168 into main Jun 17, 2026
1 check passed