fix: recover two per-op editing slowdowns regressed since 1.11 #1021
Merged
Both are constant-factor regressions on the per-op (auto-commit) editing path introduced by the lazy-snapshot work in #985, found while bisecting current HEAD against the 1.11.1 release.

1. map/list/movable-list insert: ensure_no_regular_container_value heap-allocated a Vec on every insert, even for scalar values (the common case). Add a scalar fast-path that skips the allocation and traversal. `map create 10^4 key`: ~19.4ms -> ~10.7ms.

2. text insert/delete: the per-op bounds-check len()/len_unicode()/len_utf16() took two DocState locks (decoded-check, then query). Consolidate into one DocState::get_text_len taking a single lock + container-store lookup, preserving the lazy-snapshot memory behavior (a still-lazy container reads cached length metadata without materializing the richtext state). `bench_text B4 apply`: ~389ms -> ~352ms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lockfile update to match the crates/loro-wasm Cargo.toml version bump that landed via the main merge (chore: version packages).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses a review finding on the per-op text length consolidation: the PosType::Bytes branch of DocState::get_text_len built the full plain string via get_value_by_idx().as_string().len(), so insert_utf8/delete_utf8 bounds checks on an already-decoded text became O(text length) per op (the old code read cached byte-length metadata in the decoded case).

Add a text_utf8_len store helper mirroring text_unicode_len/text_utf16_len:

- decoded state reads the O(1) cached byte length (RichtextState::len_utf8 = root_cache().bytes)
- a still-lazy container reads the byte length from the already-materialized text value string (O(1) str::len), preserving the no-force-decode behavior

Also route the public TextHandler::len_utf8 through the same single-lock helper; it had the same has_decoded_state double-lock + string-construction pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
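The decoded-vs-lazy branching described in this commit can be illustrated with a minimal sketch. The `TextState` enum and its field names below are hypothetical stand-ins, not loro's actual store types; only the helper name `text_utf8_len` and the two O(1) sources of the byte length come from the commit message.

```rust
/// Hypothetical stand-in for a text container's state in the store.
/// Assumption: the real store distinguishes a decoded richtext state
/// from a still-lazy container that only holds its plain text value.
pub enum TextState {
    /// Decoded state: the byte length is cached alongside the tree.
    Decoded { cached_bytes: usize },
    /// Still-lazy state: only the materialized text value is available.
    Lazy { value: String },
}

/// Returns the UTF-8 byte length in O(1) for both states,
/// without building a new string and without forcing a decode.
pub fn text_utf8_len(state: &TextState) -> usize {
    match state {
        // Read the cached metadata instead of rebuilding the plain string.
        TextState::Decoded { cached_bytes } => *cached_bytes,
        // `String::len` is the byte length and is O(1).
        TextState::Lazy { value } => value.len(),
    }
}
```

The point of the sketch is that neither branch walks the text: the regression came from constructing the whole string just to call `len()` on it.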
Summary
Recovers two constant-factor per-operation editing regressions that crept in
between the 1.11.1 release and current HEAD. Both were introduced by the
lazy-snapshot work in #985 and were found by bisecting the public-API editing
benchmarks against the `loro-internal-v1.11.1` tag.

These are distinct from #1019 (which fixed genuine O(n²) editing paths); these
are O(1)-per-op overheads that nonetheless added ~15–20% to common editing
workloads.
Fixes
1. Per-insert allocation in `ensure_no_regular_container_value` (`handler.rs`)

Every `MapHandler`/`ListHandler`/`MovableListHandler` insert (14 call sites)
heap-allocated a `Vec` to walk the value for nested containers, even for scalar
values, which is the overwhelmingly common case. Added a scalar fast-path that
returns before allocating.
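A minimal sketch of the scalar fast-path idea follows. The `Value` enum and `ensure_no_container` function are hypothetical simplifications for illustration and do not reproduce loro's real value type or the actual signature of `ensure_no_regular_container_value`.

```rust
/// Hypothetical stand-in for the document value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    I64(i64),
    Str(String),
    List(Vec<Value>),
    /// Assumption: a reference to a nested container, identified by an id.
    Container(u32),
}

/// Rejects any value that contains a container reference.
pub fn ensure_no_container(value: &Value) -> Result<(), String> {
    match value {
        // Fast path: scalars cannot nest anything, so return before
        // allocating the traversal stack.
        Value::Null | Value::Bool(_) | Value::I64(_) | Value::Str(_) => return Ok(()),
        Value::Container(id) => return Err(format!("container {id} not allowed")),
        Value::List(_) => {}
    }

    // Slow path: only composite values pay for the heap-allocated stack.
    let mut stack: Vec<&Value> = vec![value];
    while let Some(v) = stack.pop() {
        match v {
            Value::Container(id) => return Err(format!("container {id} not allowed")),
            Value::List(items) => stack.extend(items.iter()),
            _ => {}
        }
    }
    Ok(())
}
```

Because most inserts carry scalars, the early return means the common case performs no allocation at all, while nested values still get the full walk.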
2. Double `DocState` lock on the text bounds check (`handler.rs` → `state.rs`)

`TextHandler::len`/`len_unicode`/`len_utf16` are called on every public text
insert/delete. Post-#985 they took two `DocState` locks: one to check
whether the container state was decoded, then another to query the length.
Consolidated into a single `DocState::get_text_len` taking one lock + one
container-store lookup. The per-`pos_type` store helpers already branch
decoded-vs-lazy internally, so the lazy-snapshot memory optimization is fully
preserved: a still-lazy container reads its cached length metadata without
materializing the richtext state.
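The difference between the two-lock and single-lock shapes can be sketched as below. Everything here is a simplified assumption for illustration: the `DocState`, `TextState`, and `PosType` definitions are hypothetical, a plain `std::sync::Mutex` and `HashMap` stand in for the real lock and container store, and only the name `get_text_len` is taken from the description above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Clone, Copy)]
pub enum PosType {
    Unicode,
    Utf16,
}

/// Hypothetical container state: both variants expose cached lengths.
pub enum TextState {
    Decoded { unicode_len: usize, utf16_len: usize },
    Lazy { unicode_len: usize, utf16_len: usize },
}

pub struct DocState {
    pub texts: HashMap<u32, TextState>,
}

impl DocState {
    pub fn is_decoded(&self, idx: u32) -> bool {
        matches!(self.texts.get(&idx), Some(TextState::Decoded { .. }))
    }

    /// One store lookup; branches decoded-vs-lazy internally and never
    /// converts a lazy state into a decoded one.
    pub fn get_text_len(&self, idx: u32, pos: PosType) -> Option<usize> {
        let (unicode_len, utf16_len) = match self.texts.get(&idx)? {
            TextState::Decoded { unicode_len, utf16_len }
            | TextState::Lazy { unicode_len, utf16_len } => (*unicode_len, *utf16_len),
        };
        Some(match pos {
            PosType::Unicode => unicode_len,
            PosType::Utf16 => utf16_len,
        })
    }
}

/// Shape of the slower path: lock once to check, then lock again to query.
pub fn text_len_two_locks(doc: &Mutex<DocState>, idx: u32, pos: PosType) -> Option<usize> {
    let _decoded = doc.lock().unwrap().is_decoded(idx); // first lock
    doc.lock().unwrap().get_text_len(idx, pos) // second lock
}

/// Consolidated path: a single lock acquisition per bounds check.
pub fn text_len(doc: &Mutex<DocState>, idx: u32, pos: PosType) -> Option<usize> {
    doc.lock().unwrap().get_text_len(idx, pos)
}
```

Both functions return the same value; the saving is purely the second lock acquisition and lookup, which matters because the check runs on every insert and delete.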
Benchmarks (clean, back-to-back, same machine)
- `map create 10^4 key`: ~19.4ms -> ~10.7ms
- `bench_text B4 apply` (per-op text): ~389ms -> ~352ms

Validation
- `cargo test -p loro-internal --lib --features test_utils,counter`: 287 passed
  (incl. the lazy-decode invariant tests
  `text_lazy_event_queries_match_decoded_state` and
  `text_snapshot_string_queries_do_not_decode_state`, which guard #985's
  no-force-decode behavior).
- `cargo test -p loro-internal --test mergeable_container`: 22 passed.
- `cargo test -p loro --test contracts list_movable_boundary`: 4 passed
  (incl. the container-rejection error path for #1).
Not addressed here
The remaining ~half of the text per-op regression comes from #985's lazy
`ContainerWrapper` indirection on the state-apply path
(`with_state_mut → get_or_create_mut → ContainerWrapper::get_state_mut → decode_state`).
That is the core of #985's design and warrants a separate,
carefully-benched follow-up rather than being bundled here.
🤖 Generated with Claude Code