From e8907f7c2ae754935cd08111b078149665e074eb Mon Sep 17 00:00:00 2001 From: Zixuan Chen Date: Tue, 16 Jun 2026 02:18:55 +0000 Subject: [PATCH 1/4] chore: expand agent guidelines --- AGENTS.md | 223 +++++++++++++++++++++---------------- crates/loro-wasm/AGENTS.md | 70 +++++++++++- 2 files changed, 199 insertions(+), 94 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index ac85d79ea..42a429d3b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,113 +1,150 @@ # Repository Guidelines -## Project Structure & Module Organization - -This is a Rust workspace with JS/WASM packaging around the core CRDT library. -Key crates live under `crates/`: `loro` is the public Rust API, `loro-internal` -contains core CRDT logic, `loro-wasm` exposes the WASM/TypeScript package, and -`delta`, `rle`, `kv-store`, and `fractional_index` hold shared primitives. -Integration and regression tests are mostly in `crates/loro/tests` and -`crates/loro-internal/tests`; WASM tests and package files are in -`crates/loro-wasm`. Examples live in `examples/` and `crates/examples`. - -## Build, Test, and Development Commands - -- `cargo build`: build the Rust workspace. -- `cargo check -p loro-internal`: quickly validate core internals. -- `cargo test -p loro-internal --doc`: run Rust doctests for internal APIs. -- `pnpm test`: run the main Rust test suite via nextest plus doctests. -- `pnpm check`: run clippy with all features and deny warnings. -- `pnpm release-wasm`: sync versions and build the release WASM package. -- `pnpm test-loom`: run loom concurrency tests for `crates/loro/tests/multi_thread_test.rs`. - -## Coding Style & Naming Conventions - -Use standard Rust formatting with `rustfmt`; keep imports and chained calls formatted -by the tool. Prefer explicit, small APIs and existing crate-local helpers over new -abstractions. Rust items use `snake_case` for functions/modules and `CamelCase` for -types. 
JS/TS bindings in `loro-wasm` should preserve the established exported API -names used by tests and docs. +## Project Snapshot + +This repository is a Rust workspace for the Loro CRDT library, with JS/WASM, +TypeScript, and MoonBit packaging around the Rust core. + +- `crates/loro`: public Rust API. Treat this as a stable downstream-facing crate. +- `crates/loro-internal`: core CRDT implementation, including oplog, state, + diff calculation, encoding, containers, DAG/version logic, and checkout/import + behavior. +- `crates/loro-wasm`: `loro-crdt` WASM/TypeScript package. Read its nested + `AGENTS.md` before changing WASM bindings, package exports, or JS wrappers. +- `crates/delta`, `crates/rle`, `crates/kv-store`, `crates/fractional_index`, + and `crates/loro-common`: shared primitives used by the core crates. +- `packages/fractional-index`: TypeScript package for the fractional index + algorithm. +- `examples/` and `crates/examples`: integration examples and bundler smoke + tests. +- `moon/`: MoonBit implementation of the Loro binary codec. Use the MoonBit + skill in `skills/moonbit` when working there. +- `skills/loro`: project skill for user-facing Loro guidance. Prefer loading + its focused reference files over copying broad CRDT background into answers. + +## Build, Test, And Development Commands + +Use narrow commands first, then broaden when touching shared behavior. + +- Install JS dependencies when needed: `pnpm install --frozen-lockfile`. +- Build Rust workspace: `cargo build`. +- Fast internal check: `cargo check -p loro-internal`. +- Rust format: `cargo fmt --all`. +- Rust lint: `pnpm check` (`cargo clippy --all-features -- -Dwarnings`). +- Main Rust tests: `pnpm test` (`cargo nextest run --features=test_utils,jsonpath --no-fail-fast && cargo test --doc`). +- Internal doctests: `cargo test -p loro-internal --doc`. +- Loom concurrency test: `pnpm test-loom`. +- WASM package build/test: `pnpm release-wasm`. 
+- WASM local dev build: `pnpm -C crates/loro-wasm build-dev`. +- Bundler smoke tests after WASM packaging or entrypoint changes: + `pnpm test-bundlers`, and for browser runtime coverage + `pnpm --dir examples/bundler-smoke-tests run test:browser`. +- Fractional-index TS package: `pnpm test-fractional-index`. +- Short fuzz corpus smoke: `pnpm run-fuzz-corpus`. +- MoonBit codec, when `moon` is available: run commands from `moon/`, usually + `moon check`, `moon test`, and `moon fmt`. + +Do not run broad fuzzing or long browser matrices without checking with the user +when time/cost is unclear. ## Testing Guidelines -Add regression tests near the behavior being fixed: Rust API tests in -`crates/loro/tests`, internal tests in `crates/loro-internal/tests` or module tests, -and WASM behavior in `crates/loro-wasm/tests`. For import/encoding bugs, prefer -fixture-based tests with small binary fixtures. Run the narrow package test first, -then `pnpm test` when the change affects shared behavior. For changes touching -internal diff calculation, checkout, import, or state-replay logic, also consider -the fuzz targets in `crates/fuzz`; ask whether to run the broader `fuzz all` -target before spending the extra time. +Add regression tests near the behavior being fixed. -## Commit & Pull Request Guidelines +- Public Rust API tests: `crates/loro/tests`. +- Internal behavior tests: `crates/loro-internal/tests` or local module tests. +- WASM behavior tests: `crates/loro-wasm/tests`. +- MoonBit codec tests: `moon/loro_codec/*_test.mbt` plus Rust/Moon e2e drivers + documented in `docs/moon-codec-fuzzing.md`. +- Import, encoding, and replay bugs should use small binary or JSON fixtures + when possible. +- Changes touching internal diff calculation, checkout, import, state replay, or + encoding may need fuzz coverage under `crates/fuzz`; ask before running the + broad `cargo +nightly fuzz run all` style targets. 
-History uses short imperative commits, often prefixed by scope such as `fix:`, -`test:`, `chore:`, or `refactor:`. Keep commits focused and include fixtures or -tests with fixes. PRs should describe what changed, why, validation commands, and -linked issues or production traces when relevant. Add a changeset when publishing -behavior or package output changes. +## Coding Style And Boundaries -## Agent-Specific Notes +Use standard Rust formatting with `rustfmt`. Keep imports and chained calls +formatted by the tool. Rust functions/modules use `snake_case`; Rust types use +`CamelCase`. JS/TS bindings in `loro-wasm` must preserve established exported API +names used by tests and docs. -### Principle: Avoid Breaking Changes Unless Absolutely Necessary +Prefer existing crate-local helpers and data structures over new abstractions. +Keep changes scoped to the relevant crate boundary. Do not refactor shared CRDT +machinery while fixing unrelated package, docs, or binding issues. -The `loro` crate is a public library with downstream users. When fixing panics or bugs, -prefer non-breaking solutions: +## Public API Compatibility -- Add `try_*` methods that return `Option` or `Result` instead of changing existing - method signatures. -- Replace `assert!` / `unwrap()` / `unreachable!()` with descriptive `expect()` messages - when the method must remain panicking for backward compatibility. -- Only introduce breaking signature changes (e.g., changing a return type from `T` to - `Option`) when there is no safe backward-compatible alternative and the breakage - is justified by a critical correctness or safety issue. +The `loro` crate and `loro-crdt` package are public libraries with downstream +users. Avoid breaking changes unless there is no safe alternative. -### Principle: Internal Invariant Preservation Over Graceful Degradation +- Prefer adding `try_*` methods returning `Option` or `Result` over changing an + existing method signature. 
+- If an existing public method must keep panicking for compatibility, prefer a + descriptive `expect()` message over an opaque `unwrap()`. +- Only change public return types or names when required by a critical + correctness or safety issue. +- Add a changeset for publishing behavior or package output changes. -When an internal invariant is violated (e.g., a state lookup that should always succeed -returns `None`, an event batch has an unexpected structure, or a diff cannot be composed), -the priority is: +## Internal Invariants -1. **Do not let the system continue in a corrupted or inconsistent state.** - Prefer `panic!` / `unwrap()` / `expect()` over silently skipping, returning a default, - or returning success when the internal state is known to be wrong. -2. **Preserve the correctness of public API contracts.** - A public method should not return a value that violates its documented contract - (e.g., returning an empty list when nodes actually exist). -3. **Avoid panics on valid user input.** - Malformed external input (decode errors, invalid JSON schema, out-of-bounds indices) - should return `Err`. But do not replace internal-safety panics with silent skips - just to avoid crashing. +Internal corruption should fail fast. Invalid external input should return an +error. -In short: internal corruption → fail-fast (panic); invalid user input → `Result::Err`; -returning wrong data is worse than panicking. +- Do not let the system continue after a violated internal invariant, such as a + missing state that should exist, an impossible event shape, or a diff that + cannot be composed. +- Do not silently skip data, return defaults, or report success when internal + state is known to be inconsistent. +- Malformed user input, invalid JSON schema, decode failures, and out-of-bounds + external requests should return `Err` where the API supports it. +- Returning wrong data is worse than panicking on corrupted internal state. 
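Reviewer note on the "Public API Compatibility" and "Internal Invariants" bullets above: the following is a minimal sketch of the non-breaking pattern they describe, assuming a toy type. `ToyList`, `get`, and `try_get` are hypothetical names invented for illustration and are not part of the `loro` API.

```rust
/// Toy container used only for this sketch; it is not a Loro type.
pub struct ToyList {
    items: Vec<String>,
}

impl ToyList {
    pub fn new(items: Vec<String>) -> Self {
        Self { items }
    }

    /// Existing public method: the signature stays the same and it still
    /// panics on a miss, but with a descriptive message instead of an
    /// opaque `unwrap()`.
    pub fn get(&self, index: usize) -> &str {
        self.try_get(index)
            .expect("ToyList::get: index out of bounds; use try_get for fallible access")
    }

    /// New, non-breaking alternative for callers that want to handle the miss.
    pub fn try_get(&self, index: usize) -> Option<&str> {
        self.items.get(index).map(String::as_str)
    }
}
```

Existing callers of `get` keep compiling unchanged, while new callers can opt into `try_get`; that is the trade-off the guideline appears to favor over changing a return type.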
-### Invariant: Flush Pending Events In `loro-wasm` +## WASM Event Flush Invariant -In `crates/loro-wasm/src/lib.rs`, subscription callbacks (`subscribe*`, -container `subscribe`, etc.) do not call user JS immediately. The binding -enqueues JS calls into a global pending queue and schedules a microtask check. -If the microtask runs before `callPendingEvents()` flushes the queue, it logs: +In `crates/loro-wasm/src/lib.rs`, subscription callbacks enqueue JS calls into a +global pending queue instead of calling user JS immediately. If the microtask +check runs before `callPendingEvents()` flushes that queue, it logs: -- `[LORO_INTERNAL_ERROR] Event not called` +```text +[LORO_INTERNAL_ERROR] Event not called +``` Any WASM-exposed API that can enqueue subscription events must flush pending -events before returning control to JS. To avoid adding overhead to every op, only -a small JS-side allowlist is wrapped; the wrapper calls `callPendingEvents()` in -a `finally` block. - -When adding or changing a `#[wasm_bindgen]` API in `crates/loro-wasm/src/lib.rs` -that can mutate document state, check whether it can trigger an implicit commit -or barrier (`commit`, `with_barrier`, `implicit_commit_then_stop`), emit events -(`emit_events`), or apply diffs (`revertTo`, `applyDiff`). If so, add its JS -name to the allowlist near the bottom of `crates/loro-wasm/index.ts`: -`decorateMethods(LoroDoc.prototype, [...])` or the relevant prototype allowlist. -Pure read/query APIs should not be decorated. - -Quick check with active subscriptions (`doc.subscribe(...)` or container -`subscribe(...)`): mutating APIs should not produce the error above. A useful -local check is: - -```sh -pnpm -C crates/loro-wasm build-release -``` +events before returning to JS. The JS-side allowlist lives near the bottom of +`crates/loro-wasm/index.ts` in `decorateMethods(...)`. 
When adding or changing a +`#[wasm_bindgen]` API that can mutate document state, trigger implicit commits +or barriers, emit events, or apply diffs, update the relevant allowlist. Pure +read/query APIs should not be decorated. See `crates/loro-wasm/AGENTS.md` before +editing this area. + +## Release And Generated Files + +- WASM release output and versions are synchronized through + `scripts/sync-loro-version.ts` and the `pnpm release-wasm` / changesets flow. +- Rust crate releases use `scripts/cargo-release.ts` and `cargo-release`; keep + version bumps focused. +- Do not hand-edit generated package output from the WASM build. Regenerate it + with the package scripts. +- Keep lockfiles and small fixtures when they are intentionally affected by the + change. Do not churn them for unrelated work. + +## Agent Workflow + +- Start with `git status --short --branch` and treat uncommitted changes as user + work unless you made them in the current turn. +- Read the nearest `AGENTS.md` before editing a subtree. +- Use `rg` / `rg --files` for search and repository mapping. +- Load `skills/loro` for user-facing Loro usage, CRDT modeling, sync, + persistence, editor integration, or performance guidance. Load + `skills/moonbit` for work under `moon/`. +- Make the smallest durable context or code change that solves the request. +- Validate with the narrowest meaningful command first and report any broader + checks not run. + +## Commit And PR Notes + +History uses short imperative commits, often prefixed by scope such as `fix:`, +`test:`, `chore:`, or `refactor:`. Keep commits focused and include fixtures or +tests with fixes. PRs should describe what changed, why, validation commands, and +linked issues or production traces when relevant. 
diff --git a/crates/loro-wasm/AGENTS.md b/crates/loro-wasm/AGENTS.md index 81e5a1bf4..681f87b76 100644 --- a/crates/loro-wasm/AGENTS.md +++ b/crates/loro-wasm/AGENTS.md @@ -1 +1,69 @@ -If you change WASM packaging or bundler-facing entrypoints, run `pnpm --dir examples/bundler-smoke-tests run test:browser` from the repo root to verify the package still builds and executes in real browsers. +# WASM Package Guidelines + +This subtree builds the `loro-crdt` JS/WASM package. It contains the Rust +`#[wasm_bindgen]` bindings in `src/lib.rs`, the JS/TS wrapper in `index.ts`, +Rollup/build scripts, package exports, and WASM-specific tests. + +## Commands + +Run commands from the repository root unless noted. + +- Build/test release package: `pnpm release-wasm`. +- Local dev package build: `pnpm -C crates/loro-wasm build-dev`. +- Package-local release build: `pnpm -C crates/loro-wasm build-release`. +- Package-local tests after an existing build: `pnpm -C crates/loro-wasm test`. +- Fast bundler smoke tests after entrypoint/export changes: + `pnpm test-bundlers`. +- Browser runtime smoke tests for packaging changes: + `pnpm --dir examples/bundler-smoke-tests run test:browser`. + +`pnpm release-wasm` runs version sync, installs this package's dependencies, and +builds the release artifacts. Use it for final validation when changing +`src/lib.rs`, `index.ts`, package exports, Rollup config, or build scripts. + +## Pending Events Invariant + +Subscription callbacks (`subscribe*`, container `subscribe`, and related APIs) +do not call user JS immediately. Rust queues JS calls into a global pending queue +and schedules a microtask check. If the microtask runs before +`callPendingEvents()` flushes the queue, the package logs: + +```text +[LORO_INTERNAL_ERROR] Event not called +``` + +Any WASM-exposed API that can enqueue subscription events must flush pending +events before returning control to JS. 
This is intentionally implemented as a +small JS-side allowlist in `index.ts` rather than wrapping every method. + +When adding or changing a `#[wasm_bindgen]` API in `src/lib.rs`, check whether it +can: + +- mutate document or container state, +- trigger an implicit commit or barrier (`commit`, `with_barrier`, + `implicit_commit_then_stop`), +- emit events, +- apply diffs (`revertTo`, `applyDiff`), or +- change ephemeral store state that has JS subscribers. + +If yes, add the JS method name to the relevant `decorateMethods(...)` allowlist +near the bottom of `index.ts` (`LoroDoc.prototype`, container prototypes, +`EphemeralStoreWasm.prototype`, or `UndoManager.prototype`). Pure read/query APIs +should not be decorated. + +A quick behavioral check is to run with an active `doc.subscribe(...)` or +container `subscribe(...)` and confirm the mutation does not produce the internal +error above. Keep or add a regression test when the issue is observable from JS. + +## Packaging Rules + +- Preserve the public `loro-crdt` API names and package export paths used by + tests and docs. +- Do not hand-edit generated package output. Regenerate with `build-dev`, + `build-release`, or `pnpm release-wasm`. +- Package entrypoint changes must consider `bundler`, `browser`, `nodejs`, + `web`, and `base64` outputs. +- Vite and Webpack can emit the `.wasm` asset from `new URL(...)`; plain esbuild + and Rollup need either the `base64` entry or an explicit asset copy. Keep the + bundler smoke tests aligned with these expectations. +- If package output or published behavior changes, add a changeset. 
From fbb4901c640d642dfb34939cac79e138be23a6d4 Mon Sep 17 00:00:00 2001 From: Zixuan Chen Date: Tue, 16 Jun 2026 02:37:17 +0000 Subject: [PATCH 2/4] chore: index internal agent context --- AGENTS.md | 5 + CLAUDE.md | 1 + crates/loro-internal/AGENTS.md | 58 +++++++++++ crates/loro-internal/CLAUDE.md | 1 + crates/loro-internal/src/encoding/AGENTS.md | 101 ++++++++++++++++++ crates/loro-internal/src/encoding/CLAUDE.md | 1 + crates/loro-internal/src/state/AGENTS.md | 109 ++++++++++++++++++++ crates/loro-internal/src/state/CLAUDE.md | 1 + crates/loro-wasm/CLAUDE.md | 1 + 9 files changed, 278 insertions(+) create mode 120000 CLAUDE.md create mode 100644 crates/loro-internal/AGENTS.md create mode 120000 crates/loro-internal/CLAUDE.md create mode 100644 crates/loro-internal/src/encoding/AGENTS.md create mode 120000 crates/loro-internal/src/encoding/CLAUDE.md create mode 100644 crates/loro-internal/src/state/AGENTS.md create mode 120000 crates/loro-internal/src/state/CLAUDE.md create mode 120000 crates/loro-wasm/CLAUDE.md diff --git a/AGENTS.md b/AGENTS.md index 42a429d3b..653819d56 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -11,6 +11,9 @@ TypeScript, and MoonBit packaging around the Rust core. behavior. - `crates/loro-wasm`: `loro-crdt` WASM/TypeScript package. Read its nested `AGENTS.md` before changing WASM bindings, package exports, or JS wrappers. +- `crates/loro-internal`: core CRDT implementation. Read its nested `AGENTS.md` + before changing encoding, import/export, state, diff, or mergeable container + behavior. - `crates/delta`, `crates/rle`, `crates/kv-store`, `crates/fractional_index`, and `crates/loro-common`: shared primitives used by the core crates. - `packages/fractional-index`: TypeScript package for the fractional index @@ -134,6 +137,8 @@ editing this area. - Start with `git status --short --branch` and treat uncommitted changes as user work unless you made them in the current turn. - Read the nearest `AGENTS.md` before editing a subtree. 
+- Keep `CLAUDE.md` as a symlink to the nearest `AGENTS.md` when adding agent + instructions, so Claude and Codex read the same durable context. - Use `rg` / `rg --files` for search and repository mapping. - Load `skills/loro` for user-facing Loro usage, CRDT modeling, sync, persistence, editor integration, or performance guidance. Load diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/crates/loro-internal/AGENTS.md b/crates/loro-internal/AGENTS.md new file mode 100644 index 000000000..eab917cd3 --- /dev/null +++ b/crates/loro-internal/AGENTS.md @@ -0,0 +1,58 @@ +# loro-internal Guidelines + +This crate contains Loro's unstable internal CRDT implementation. Public API +compatibility concerns still matter because `crates/loro` and `crates/loro-wasm` +wrap this crate directly, but the internal priority is preserving invariants +over graceful degradation. + +## Internal Map + +- `src/loro.rs`: document-level orchestration for commit, import/export, + checkout, barriers, state/oplog coordination, and event emission. +- `src/encoding.rs`: public/internal `ExportMode`, binary header parsing, + checksum verification, `EncodeMode` dispatch, import metadata, and the bridge + from decoded changes into `OpLog`. +- `src/encoding/`: concrete binary and JSON encoding implementations. Read + `src/encoding/AGENTS.md` before changing binary layout, JSON schema, import + metadata, shallow snapshot, or op/value encoding. +- `src/oplog/` and `src/dag/`: change storage, dependency ordering, pending + changes, version vectors/frontiers, shallow roots, and history traversal. +- `src/state.rs` and `src/state/`: materialized document state, container stores, + diff application, checkout/replay, deep value, dead-container tracking, and + mergeable container visibility. Read `src/state/AGENTS.md` before changing + mergeable containers. 
+- `src/handler.rs`: typed container handlers, local operation creation, and + `MapHandler::ensure_mergeable_*`. +- `src/diff_calc/`: diff calculation when moving between versions. +- `docs/diff_calc.md`: design notes for diff calculation. +- `docs/mergeable-container-id.md`: current mergeable container id encoding. +- `tests/mergeable_container/` and `tests/mergeable_cid_encoding.rs`: focused + mergeable container regression tests. +- `src/tests/import_atomicity.rs`: import rollback and malformed-input + regressions. + +## Commands + +Use narrow checks first: + +- `cargo check -p loro-internal` +- `cargo test -p loro-internal --doc` +- `cargo test -p loro-internal --test mergeable_container` +- `cargo test -p loro-internal --test mergeable_cid_encoding` +- `cargo test -p loro-internal import_atomicity` + +For broad shared behavior, run the root commands from `AGENTS.md`. For changes +to import, checkout, encoding, state replay, or diff calculation, consider fuzz +coverage under `crates/fuzz` and ask before running long fuzz targets. + +## Working Rules + +- Internal invariant violation should fail fast. Invalid external bytes or JSON + should return `Err`. +- Do not silently skip ops, containers, state entries, diffs, or pending changes. +- Snapshot/import paths must be atomic: if decode or state application fails, + rollback must leave the document usable. +- Preserve attached/detached document state when export paths temporarily + checkout another version. +- If a change affects `crates/loro` or `crates/loro-wasm` behavior, add or update + tests at the wrapper layer as well as the internal layer when practical. 
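Reviewer note on the "Working Rules" above: one way to picture the atomic-import rule is to stage decoded data and commit it only after the whole input has been validated. The sketch below assumes a toy document and a made-up wire format (little-endian `u32` entries, zero being invalid); `ToyDoc`, `ToyImportError`, and `import` are hypothetical, and the real rollback in `loro-internal` may work quite differently.

```rust
/// Toy document used only for this sketch; it is not a Loro type.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ToyDoc {
    values: Vec<u32>,
}

/// Errors for malformed external input (hypothetical, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum ToyImportError {
    TruncatedEntry,
    ZeroEntry,
}

impl ToyDoc {
    pub fn values(&self) -> &[u32] {
        &self.values
    }

    /// Imports little-endian u32 entries. Malformed input returns `Err`
    /// and leaves the document exactly as it was before the call.
    pub fn import(&mut self, bytes: &[u8]) -> Result<usize, ToyImportError> {
        if bytes.len() % 4 != 0 {
            return Err(ToyImportError::TruncatedEntry);
        }
        // Stage on a copy so a failure halfway through cannot leave
        // partially applied entries behind.
        let mut staged = self.values.clone();
        for chunk in bytes.chunks_exact(4) {
            let value = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            if value == 0 {
                return Err(ToyImportError::ZeroEntry);
            }
            staged.push(value);
        }
        let imported = staged.len() - self.values.len();
        // Commit only after every entry has been validated.
        self.values = staged;
        Ok(imported)
    }
}
```

Cloning the whole state is only reasonable for a toy; the point is the observable contract that a failed import neither partially applies nor reports success.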
diff --git a/crates/loro-internal/CLAUDE.md b/crates/loro-internal/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/crates/loro-internal/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/crates/loro-internal/src/encoding/AGENTS.md b/crates/loro-internal/src/encoding/AGENTS.md new file mode 100644 index 000000000..4e7eb7c3c --- /dev/null +++ b/crates/loro-internal/src/encoding/AGENTS.md @@ -0,0 +1,101 @@ +# Encoding Guidelines + +This module owns Loro's import/export formats. It is easy to confuse the +top-level blob modes, the current fast binary layouts, the legacy helper module, +and the JSON schema path; keep those boundaries explicit. + +## Entry Points + +- `../encoding.rs`: `ExportMode`, `EncodeMode`, 22-byte `loro` header, + checksum validation, top-level encode/decode dispatch, and + `decode_import_blob_meta`. +- `fast_snapshot.rs`: current `FastSnapshot` and `FastUpdates` body encoding. +- `shallow_snapshot.rs`: `ShallowSnapshot`, `StateOnly`, and `SnapshotAt` + variants built on `FastSnapshot`. +- `json_schema.rs`: JSON updates (`JsonSchema`, `schema_version = 1`), peer + compression, JSON validation, import/export, and redaction. +- `outdated_encode_reordered.rs`: legacy-named op/value columnar helpers and + `import_changes_to_oplog`. The top-level outdated blob modes are unsupported, + but this file still contains helpers used by current fast paths. +- `value.rs`, `value_register.rs`, and `arena.rs`: value/op encoding support, + value tables, peer/key registers, and arena-backed value decoding. +- `../../Encoding.md`: older high-level encoding notes. Treat it as background, + not the source of truth when code disagrees. +- `../../../../docs/encoding.md`: detailed current binary format reference. +- `../../../../docs/encoding-container-states.md`: container state snapshot + layouts used inside `FastSnapshot.state_bytes`. 
+ +## Supported Formats + +Top-level binary blobs all start with: + +- magic bytes `loro`, +- 16 checksum bytes, +- a big-endian `u16` encode mode, +- then mode-specific body bytes. + +Current supported binary modes: + +- `EncodeMode::FastSnapshot = 3`: used by `ExportMode::Snapshot`, + `ShallowSnapshot`, `StateOnly`, and `SnapshotAt`. +- `EncodeMode::FastUpdates = 4`: used by `ExportMode::Updates` and + `UpdatesInRange`. + +Legacy top-level modes: + +- `EncodeMode::OutdatedRle = 1` +- `EncodeMode::OutdatedSnapshot = 2` + +These parse as known modes for compatibility detection but currently return +`ImportUnsupportedEncodingMode` on import/metadata decode. Do not re-enable them +without a compatibility plan and fixtures. + +JSON update format: + +- `json_schema.rs` is not wrapped in the binary `loro` header. +- It carries `schema_version = 1`, `start_version`, optional peer compression, + and a list of JSON changes/ops. +- Malformed JSON schema should return `Err`, not partially import. + +## FastSnapshot + +`fast_snapshot.rs` encodes a snapshot body as three length-prefixed sections: + +1. `oplog_bytes`: KV-store encoded change history. +2. `state_bytes`: KV-store encoded materialized state, or `EMPTY_MARK` when + omitted and state must be recalculated. +3. `shallow_root_state_bytes`: KV-store encoded shallow root state, empty for + non-shallow snapshots. + +Importing a snapshot into an empty doc can initialize oplog and state directly. +Importing snapshot data into a non-empty doc goes through decoded oplog changes +instead. Failed snapshot import must roll back both oplog and state. + +## FastUpdates + +`FastUpdates` body is a sequence of LEB128 length-prefixed change blocks. +`decode_updates` must reject truncated blocks, length overflow, and corrupt block +payloads. Decoded changes are sorted by lamport before being applied. 
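Reviewer note on the "Supported Formats" and "FastUpdates" text above: the layout can be sketched as a header split followed by a walk over LEB128 length-prefixed blocks. This is an illustrative sketch under stated assumptions, not the code in `encoding.rs` or `fast_snapshot.rs`: `SketchError`, `split_header`, `read_leb128`, and `split_blocks` are hypothetical names, checksum verification is omitted, and the length prefix is assumed to be unsigned LEB128.

```rust
/// Hypothetical error type for this sketch; the real crate has its own errors.
#[derive(Debug, PartialEq, Eq)]
pub enum SketchError {
    TooShort,
    BadMagic,
    UnknownMode(u16),
    UnsupportedMode(u16),
    TruncatedBlock,
    LengthOverflow,
}

/// 4 magic bytes + 16 checksum bytes + 2 mode bytes = 22 bytes.
const HEADER_LEN: usize = 4 + 16 + 2;

/// Checks the magic bytes and the mode, then returns (mode, body).
/// Checksum verification is deliberately left out of this sketch.
pub fn split_header(blob: &[u8]) -> Result<(u16, &[u8]), SketchError> {
    if blob.len() < HEADER_LEN {
        return Err(SketchError::TooShort);
    }
    if !blob.starts_with(b"loro") {
        return Err(SketchError::BadMagic);
    }
    // The encode mode is a big-endian u16 stored right after the checksum.
    let mode = u16::from_be_bytes([blob[20], blob[21]]);
    match mode {
        3 | 4 => Ok((mode, &blob[HEADER_LEN..])), // FastSnapshot / FastUpdates
        1 | 2 => Err(SketchError::UnsupportedMode(mode)), // legacy modes
        other => Err(SketchError::UnknownMode(other)),
    }
}

/// Reads one unsigned LEB128 integer, returning (value, bytes consumed).
fn read_leb128(input: &[u8]) -> Result<(u64, usize), SketchError> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let shift = 7 * i as u32;
        let chunk = (byte & 0x7f) as u64;
        // Reject encodings whose bits would not fit in a u64.
        if shift >= 64 || (shift == 63 && chunk > 1) {
            return Err(SketchError::LengthOverflow);
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    // Ran out of bytes while the continuation bit was still set.
    Err(SketchError::TruncatedBlock)
}

/// Splits a body into its length-prefixed blocks, rejecting truncated input.
pub fn split_blocks(mut body: &[u8]) -> Result<Vec<&[u8]>, SketchError> {
    let mut blocks = Vec::new();
    while !body.is_empty() {
        let (len, used) = read_leb128(body)?;
        let rest = &body[used..];
        if len > rest.len() as u64 {
            return Err(SketchError::TruncatedBlock);
        }
        let (block, tail) = rest.split_at(len as usize);
        blocks.push(block);
        body = tail;
    }
    Ok(blocks)
}
```

Returning an error for every malformed case, rather than skipping a bad block, mirrors the rule that invalid external bytes should produce `Err` and never a partial import.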
+ +## Shallow/State-Only Snapshots + +`shallow_snapshot.rs` temporarily checks out versions to build shallow root +state and state deltas, then restores the original document state. + +- `ShallowSnapshot` retains history since a calculated shallow start frontier. +- `StateOnly` is a shallow snapshot with minimal history at a target version. +- `SnapshotAt` exports full history up to target frontiers plus state there. +- Unknown container types must block shallow/state snapshot export rather than + writing a blob that cannot be decoded correctly. +- Style start/end ops must not be split across shallow roots. + +## Validation + +For encoding changes, prefer focused fixtures and malformed-input tests. Useful +starting points: + +- `cargo test -p loro-internal import_atomicity` +- `cargo test -p loro-internal decode_updates_rejects_truncated_block` +- `cargo test -p loro-internal --test mergeable_container` when snapshots may + affect mergeable child retention. +- Root `pnpm test` when changing shared import/export semantics. diff --git a/crates/loro-internal/src/encoding/CLAUDE.md b/crates/loro-internal/src/encoding/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/crates/loro-internal/src/encoding/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/crates/loro-internal/src/state/AGENTS.md b/crates/loro-internal/src/state/AGENTS.md new file mode 100644 index 000000000..9caa33beb --- /dev/null +++ b/crates/loro-internal/src/state/AGENTS.md @@ -0,0 +1,109 @@ +# State Guidelines + +This module owns materialized document state, container stores, diff application, +checkout/replay behavior, deep/shallow values, and mergeable container +visibility. + +## State Map + +- `../state.rs`: `DocState`, checkout/path/deep-value traversal, state replay, + container lifecycle, and alive-container discovery. +- `container_store/`: persisted KV-backed container snapshots and + `ContainerWrapper` encoding. 
+- `map_state.rs`, `list_state.rs`, `richtext_state.rs`, `tree_state.rs`, + `movable_list_state.rs`, `counter_state.rs`: per-container state and snapshot + codecs. +- `mergeable.rs`: logical child edge resolution for mergeable containers. +- `dead_containers_cache.rs`: dead/alive tracking, including marker-driven + mergeable reactivation. +- `unknown_state.rs` and `../diff_calc/unknown.rs`: forward-compatibility support + for unknown container types. +- `../../docs/mergeable-container-id.md`: mergeable container id format. +- `../../tests/mergeable_container/`: behavior tests for mergeable visibility, + deletion, conflicts, pending updates, paths/events, and snapshots. + +## Mergeable Container Model + +Mergeable child containers are created by `MapHandler::ensure_mergeable_*` in +`../handler.rs` and exposed by Rust/WASM wrapper APIs. + +Core idea: + +- Two peers calling `ensure_mergeable_(key)` on the same parent map derive + the same deterministic child `ContainerID` via + `ContainerID::new_mergeable(parent, key, kind)` in `loro-common`. +- The child id is represented as a reserved `ContainerID::Root` name. The child + kind lives in `ContainerID::Root.container_type`; the root-name payload only + encodes parent map identity and map-key path. +- The parent map slot stores a compact binary marker from + `loro_common::mergeable_marker(parent, key, kind)`. This marker is the source + of truth for which mergeable child kind is currently visible. +- Parent map LWW semantics resolve concurrent different-kind markers. Losing + children are hidden but their state must be preserved and can resurface if a + later `ensure_mergeable_` rewrites the marker. + +Important boundaries: + +- User strings, arbitrary binary values, scalars, and regular child containers + are not mergeable markers and must block `ensure_mergeable_*` rather than be + overwritten. +- Same-kind `ensure_mergeable_*` over an existing marker is idempotent and should + not emit another op. 
+- Different-kind `ensure_mergeable_*` over an existing marker is a deliberate + kind change and writes a new marker. +- Deleting the map key clears the marker and hides the child, but the child state + is preserved by deterministic id. Re-ensuring the same kind resurfaces it. +- Visibility comes from the parent marker, not from whether the child already + has direct ops. This matters for nested mergeable maps and pending imports. + +## Mergeable Code Index + +- `crates/loro-common/src/lib.rs`: `MERGEABLE_NAMESPACE_PREFIX`, + `ContainerID::new_mergeable`, `parse_mergeable`, `mergeable_marker`, + `parse_mergeable_marker`, and marker-to-container translation. +- `../handler.rs`: `MapHandler::ensure_mergeable_container` validates the parent + slot, writes markers, and returns a handler for the deterministic cid. +- `mergeable.rs`: resolves logical child paths from deterministic cid plus the + parent map's current marker. +- `map_state.rs`: translates marker values to `LoroValue::Container` at read and + diff boundaries when the parent id is known. +- `../state.rs`: deep-value/path traversal must recognize marker-backed child + edges. +- `../txn.rs`: local event diffs translate marker writes into container values + for subscribers. +- `dead_containers_cache.rs`: import/reactivation behavior when marker values + change across peers. +- `../../tests/mergeable_cid_encoding.rs`: deterministic cid and parser tests. +- `../../tests/mergeable_container/discriminator.rs`: marker layout, + idempotency, kind-change, and non-mergeable occupant tests. +- `../../tests/mergeable_container/type_conflict.rs`: concurrent different-kind + conflict behavior. +- `../../tests/mergeable_container/snapshot.rs`: snapshot and shallow snapshot + retention, including losing-kind state. +- `../../tests/mergeable_container/pending.rs`: pending updates that arrive + before all mergeable context exists. +- `../../tests/mergeable_container/events_and_paths.rs`: event and path surface. 
+ +## Encoding/Retention Rules + +- Snapshot and shallow snapshot alive-container walks must retain mergeable + child state even when the child is not currently visible because another kind's + marker wins. +- Raw marker bytes are the wire/storage representation. Public read surfaces + should translate active markers to container values unless the API explicitly + exposes raw/shallow storage. +- Mergeable root names grow with nested mergeable map key paths. Avoid adding + APIs or tests that encourage deep mergeable-map chains without measuring the + serialized id cost. + +## Validation + +For mergeable/state changes, start with: + +- `cargo test -p loro-internal --test mergeable_cid_encoding` +- `cargo test -p loro-internal --test mergeable_container` +- `cargo test -p loro-internal import_atomicity` if import or rollback is + involved. + +Run root-level broader tests when changing shared replay, checkout, or snapshot +behavior. diff --git a/crates/loro-internal/src/state/CLAUDE.md b/crates/loro-internal/src/state/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/crates/loro-internal/src/state/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/crates/loro-wasm/CLAUDE.md b/crates/loro-wasm/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/crates/loro-wasm/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file From 553f1711e4496fe08cee3351043ffb0905761e02 Mon Sep 17 00:00:00 2001 From: Zixuan Chen Date: Tue, 16 Jun 2026 02:47:10 +0000 Subject: [PATCH 3/4] chore: add context articles for internal concepts --- AGENTS.md | 219 +++++++------------- context/CONTEXT-GAPS.md | 7 + context/internal-encoding.md | 135 ++++++++++++ context/mergeable-containers.md | 121 +++++++++++ crates/loro-internal/AGENTS.md | 11 +- crates/loro-internal/src/encoding/AGENTS.md | 123 +++-------- crates/loro-internal/src/state/AGENTS.md | 113 ++-------- 7 files changed, 396 insertions(+), 333 deletions(-) 
create mode 100644 context/CONTEXT-GAPS.md create mode 100644 context/internal-encoding.md create mode 100644 context/mergeable-containers.md diff --git a/AGENTS.md b/AGENTS.md index 653819d56..80645ade8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,154 +2,83 @@ ## Project Snapshot -This repository is a Rust workspace for the Loro CRDT library, with JS/WASM, -TypeScript, and MoonBit packaging around the Rust core. - -- `crates/loro`: public Rust API. Treat this as a stable downstream-facing crate. -- `crates/loro-internal`: core CRDT implementation, including oplog, state, - diff calculation, encoding, containers, DAG/version logic, and checkout/import - behavior. -- `crates/loro-wasm`: `loro-crdt` WASM/TypeScript package. Read its nested - `AGENTS.md` before changing WASM bindings, package exports, or JS wrappers. -- `crates/loro-internal`: core CRDT implementation. Read its nested `AGENTS.md` - before changing encoding, import/export, state, diff, or mergeable container - behavior. +Loro is a Rust CRDT workspace with JS/WASM packaging and a MoonBit codec. + +- `crates/loro`: public Rust API; avoid breaking downstream users. +- `crates/loro-internal`: core CRDT logic. Read its + [AGENTS.md](crates/loro-internal/AGENTS.md) before changing import/export, + encoding, state, diff, checkout, or replay behavior. +- `crates/loro-wasm`: `loro-crdt` WASM/TypeScript package. Read its + [AGENTS.md](crates/loro-wasm/AGENTS.md) before changing bindings, exports, + wrappers, or build scripts. - `crates/delta`, `crates/rle`, `crates/kv-store`, `crates/fractional_index`, - and `crates/loro-common`: shared primitives used by the core crates. -- `packages/fractional-index`: TypeScript package for the fractional index - algorithm. -- `examples/` and `crates/examples`: integration examples and bundler smoke - tests. -- `moon/`: MoonBit implementation of the Loro binary codec. Use the MoonBit - skill in `skills/moonbit` when working there. 
-- `skills/loro`: project skill for user-facing Loro guidance. Prefer loading - its focused reference files over copying broad CRDT background into answers. - -## Build, Test, And Development Commands - -Use narrow commands first, then broaden when touching shared behavior. - -- Install JS dependencies when needed: `pnpm install --frozen-lockfile`. -- Build Rust workspace: `cargo build`. -- Fast internal check: `cargo check -p loro-internal`. -- Rust format: `cargo fmt --all`. -- Rust lint: `pnpm check` (`cargo clippy --all-features -- -Dwarnings`). -- Main Rust tests: `pnpm test` (`cargo nextest run --features=test_utils,jsonpath --no-fail-fast && cargo test --doc`). -- Internal doctests: `cargo test -p loro-internal --doc`. -- Loom concurrency test: `pnpm test-loom`. -- WASM package build/test: `pnpm release-wasm`. -- WASM local dev build: `pnpm -C crates/loro-wasm build-dev`. -- Bundler smoke tests after WASM packaging or entrypoint changes: - `pnpm test-bundlers`, and for browser runtime coverage + `crates/loro-common`, and `packages/fractional-index`: shared primitives and + packages. +- `moon/`: MoonBit Loro binary codec; use [skills/moonbit/SKILL.md](skills/moonbit/SKILL.md). + +## Context Index + +- Encoding/import/export modes, current vs outdated formats, shallow snapshots: + [context/internal-encoding.md](context/internal-encoding.md). +- Mergeable container model, marker/cid rules, tests, and common pitfalls: + [context/mergeable-containers.md](context/mergeable-containers.md). +- User-facing Loro usage, sync, editor integration, and performance guidance: + [skills/loro/SKILL.md](skills/loro/SKILL.md). +- Context backlog: [context/CONTEXT-GAPS.md](context/CONTEXT-GAPS.md). + +## Commands + +- JS deps: `pnpm install --frozen-lockfile`. +- Rust build/check/format/lint: `cargo build`, `cargo check -p loro-internal`, + `cargo fmt --all`, `pnpm check`. +- Rust tests: `pnpm test`; internal doctests: `cargo test -p loro-internal --doc`. 
+- Loom: `pnpm test-loom`. +- WASM: `pnpm release-wasm`, or `pnpm -C crates/loro-wasm build-dev`. +- Bundlers after WASM packaging changes: `pnpm test-bundlers`; browser runtime: `pnpm --dir examples/bundler-smoke-tests run test:browser`. -- Fractional-index TS package: `pnpm test-fractional-index`. -- Short fuzz corpus smoke: `pnpm run-fuzz-corpus`. -- MoonBit codec, when `moon` is available: run commands from `moon/`, usually - `moon check`, `moon test`, and `moon fmt`. - -Do not run broad fuzzing or long browser matrices without checking with the user -when time/cost is unclear. - -## Testing Guidelines - -Add regression tests near the behavior being fixed. - -- Public Rust API tests: `crates/loro/tests`. -- Internal behavior tests: `crates/loro-internal/tests` or local module tests. -- WASM behavior tests: `crates/loro-wasm/tests`. -- MoonBit codec tests: `moon/loro_codec/*_test.mbt` plus Rust/Moon e2e drivers - documented in `docs/moon-codec-fuzzing.md`. -- Import, encoding, and replay bugs should use small binary or JSON fixtures - when possible. -- Changes touching internal diff calculation, checkout, import, state replay, or - encoding may need fuzz coverage under `crates/fuzz`; ask before running the - broad `cargo +nightly fuzz run all` style targets. - -## Coding Style And Boundaries - -Use standard Rust formatting with `rustfmt`. Keep imports and chained calls -formatted by the tool. Rust functions/modules use `snake_case`; Rust types use -`CamelCase`. JS/TS bindings in `loro-wasm` must preserve established exported API -names used by tests and docs. - -Prefer existing crate-local helpers and data structures over new abstractions. -Keep changes scoped to the relevant crate boundary. Do not refactor shared CRDT -machinery while fixing unrelated package, docs, or binding issues. - -## Public API Compatibility - -The `loro` crate and `loro-crdt` package are public libraries with downstream -users. Avoid breaking changes unless there is no safe alternative. 
- -- Prefer adding `try_*` methods returning `Option` or `Result` over changing an - existing method signature. -- If an existing public method must keep panicking for compatibility, prefer a - descriptive `expect()` message over an opaque `unwrap()`. -- Only change public return types or names when required by a critical - correctness or safety issue. +- Fractional-index TS: `pnpm test-fractional-index`. +- Fuzz smoke: `pnpm run-fuzz-corpus`. +- MoonBit codec, when `moon` is available: from `moon/`, run `moon check`, + `moon test`, `moon fmt`. + +Use narrow checks first. Ask before broad fuzzing or long browser matrices. + +## Working Rules + +- Start with `git status --short --branch`; treat uncommitted changes as user + work unless you made them. +- Before editing, read every `AGENTS.md` from root to target directory. Keep + `CLAUDE.md` as a symlink to the nearest `AGENTS.md`. +- Use `rg` / `rg --files` for search. +- Public API changes in `loro` or `loro-crdt` should be backward-compatible when + possible. Prefer new `try_*` APIs over breaking signatures. +- Internal corruption should fail fast; invalid external input should return + `Err`. Returning wrong state is worse than panicking on an impossible internal + invariant. +- Add regression tests near behavior: `crates/loro/tests`, + `crates/loro-internal/tests`, module tests, `crates/loro-wasm/tests`, or + `moon/loro_codec/*_test.mbt`. - Add a changeset for publishing behavior or package output changes. - -## Internal Invariants - -Internal corruption should fail fast. Invalid external input should return an -error. - -- Do not let the system continue after a violated internal invariant, such as a - missing state that should exist, an impossible event shape, or a diff that - cannot be composed. -- Do not silently skip data, return defaults, or report success when internal - state is known to be inconsistent. 
-- Malformed user input, invalid JSON schema, decode failures, and out-of-bounds - external requests should return `Err` where the API supports it. -- Returning wrong data is worse than panicking on corrupted internal state. - -## WASM Event Flush Invariant - -In `crates/loro-wasm/src/lib.rs`, subscription callbacks enqueue JS calls into a -global pending queue instead of calling user JS immediately. If the microtask -check runs before `callPendingEvents()` flushes that queue, it logs: - -```text -[LORO_INTERNAL_ERROR] Event not called -``` - -Any WASM-exposed API that can enqueue subscription events must flush pending -events before returning to JS. The JS-side allowlist lives near the bottom of -`crates/loro-wasm/index.ts` in `decorateMethods(...)`. When adding or changing a -`#[wasm_bindgen]` API that can mutate document state, trigger implicit commits -or barriers, emit events, or apply diffs, update the relevant allowlist. Pure -read/query APIs should not be decorated. See `crates/loro-wasm/AGENTS.md` before -editing this area. - -## Release And Generated Files - -- WASM release output and versions are synchronized through - `scripts/sync-loro-version.ts` and the `pnpm release-wasm` / changesets flow. -- Rust crate releases use `scripts/cargo-release.ts` and `cargo-release`; keep - version bumps focused. -- Do not hand-edit generated package output from the WASM build. Regenerate it - with the package scripts. -- Keep lockfiles and small fixtures when they are intentionally affected by the - change. Do not churn them for unrelated work. - -## Agent Workflow - -- Start with `git status --short --branch` and treat uncommitted changes as user - work unless you made them in the current turn. -- Read the nearest `AGENTS.md` before editing a subtree. -- Keep `CLAUDE.md` as a symlink to the nearest `AGENTS.md` when adding agent - instructions, so Claude and Codex read the same durable context. -- Use `rg` / `rg --files` for search and repository mapping. 
-- Load `skills/loro` for user-facing Loro usage, CRDT modeling, sync, - persistence, editor integration, or performance guidance. Load - `skills/moonbit` for work under `moon/`. -- Make the smallest durable context or code change that solves the request. -- Validate with the narrowest meaningful command first and report any broader - checks not run. +- Do not hand-edit generated WASM package output; regenerate it with package + scripts. + +## Self-Maintained Agent Context + +- Treat "why was that hard to find?" as a context bug. Add a nearby + `AGENTS.md` pointer or a `context/` article, or append a line to + [context/CONTEXT-GAPS.md](context/CONTEXT-GAPS.md). +- Keep root context short. If an `AGENTS.md` grows past about 4000 characters, move + detail into a linked `context/` article. +- Header context articles with `Verified against code YYYY-MM-DD`, anchor claims + to files/symbols, and link them from root plus the nearest per-directory + `AGENTS.md`. +- If code changes make an `AGENTS.md` or context article stale, update the docs + in the same change. +- When a commit needs non-obvious rationale, land that rationale in the nearest + context file and keep the commit message as a pointer. ## Commit And PR Notes -History uses short imperative commits, often prefixed by scope such as `fix:`, -`test:`, `chore:`, or `refactor:`. Keep commits focused and include fixtures or -tests with fixes. PRs should describe what changed, why, validation commands, and -linked issues or production traces when relevant. +History uses short imperative commits, often prefixed by `fix:`, `test:`, +`chore:`, or `refactor:`. PRs should include summary, rationale, validation, and +linked issues or traces when relevant. 
diff --git a/context/CONTEXT-GAPS.md b/context/CONTEXT-GAPS.md new file mode 100644 index 000000000..9b92d88f0 --- /dev/null +++ b/context/CONTEXT-GAPS.md @@ -0,0 +1,7 @@ +# Context Discoverability Gaps (backlog) + +Append a line when you discovered something important the hard way but could not +fix the docs in that change. + +Format: +`YYYY-MM-DD | | | why it was hard | suggested home` diff --git a/context/internal-encoding.md b/context/internal-encoding.md new file mode 100644 index 000000000..8dafa95f4 --- /dev/null +++ b/context/internal-encoding.md @@ -0,0 +1,135 @@ +# Internal Encoding Context + +Verified against code 2026-06-16. + +Loro has one binary blob envelope, two current binary body formats, two +recognized-but-unsupported legacy top-level modes, and a separate JSON updates +schema. The most common mistake is to treat `outdated_encode_reordered.rs` as an +obsolete file; only top-level blob modes 1 and 2 are obsolete. Several helpers in +that file are still used by current fast paths. + +## Two-Hop Answer + +If an agent asks "how does Loro encoding work?", start here: + +- [crates/loro-internal/src/encoding.rs](../crates/loro-internal/src/encoding.rs): + `ExportMode`, `EncodeMode`, `parse_header_and_body`, `encode_with`, + `decode_oplog_changes`, `decode_snapshot`, `decode_import_blob_meta`. +- [crates/loro-internal/src/loro.rs](../crates/loro-internal/src/loro.rs): + `LoroDoc::_import_with` chooses snapshot-vs-updates application behavior. +- [crates/loro-internal/src/encoding/fast_snapshot.rs](../crates/loro-internal/src/encoding/fast_snapshot.rs): + `Snapshot`, `encode_snapshot_inner`, `decode_snapshot_inner`, `encode_updates`, + `decode_updates`. +- [crates/loro-internal/src/encoding/shallow_snapshot.rs](../crates/loro-internal/src/encoding/shallow_snapshot.rs): + `export_shallow_snapshot_inner`, `export_state_only_snapshot`, + `encode_snapshot_at`. 
+- [crates/loro-internal/src/encoding/json_schema.rs](../crates/loro-internal/src/encoding/json_schema.rs): + `JsonSchema`, `export_json`, `decode_changes`, `redact`. +- [docs/encoding.md](../docs/encoding.md) and + [docs/encoding-container-states.md](../docs/encoding-container-states.md): + external binary format references. Verify against code before changing them. + +## Binary Envelope + +Every binary export starts with: + +- magic bytes `loro` from `encoding.rs:MAGIC_BYTES`, +- a 16-byte checksum field, +- a big-endian `u16` `EncodeMode`, +- mode-specific body bytes. + +For current `FastSnapshot` and `FastUpdates` blobs, `ParsedHeaderAndBody::check_checksum` +uses `xxhash32` over bytes starting at offset 20, which includes the mode bytes +and body. Legacy modes use the older MD5 check path only for detection. + +## Supported And Outdated Modes + +Current modes: + +- `EncodeMode::FastSnapshot = 3`: used by `ExportMode::Snapshot`, + `ShallowSnapshot`, `StateOnly`, and `SnapshotAt`. +- `EncodeMode::FastUpdates = 4`: used by `ExportMode::Updates` and + `UpdatesInRange`. + +Recognized but unsupported top-level modes: + +- `EncodeMode::OutdatedRle = 1` +- `EncodeMode::OutdatedSnapshot = 2` + +`encoding.rs:decode_oplog_changes`, `encoding.rs:decode_snapshot`, and +`LoroDoc::decode_import_blob_meta` return `ImportUnsupportedEncodingMode` for +these outdated top-level modes. Do not extend them without compatibility +fixtures and a migration plan. + +Important nuance: [outdated_encode_reordered.rs](../crates/loro-internal/src/encoding/outdated_encode_reordered.rs) +still contains current helpers including `import_changes_to_oplog`, `encode_op`, +`decode_op`, and `ValueRegister`. + +## FastSnapshot + +`fast_snapshot.rs:Snapshot` has three body sections: + +1. `oplog_bytes`: KV-store encoded change history. +2. `state_bytes`: KV-store encoded materialized state, or `EMPTY_MARK` when + omitted and state must be recalculated. +3. 
`shallow_root_state_bytes`: KV-store encoded shallow root state; empty for a + non-shallow snapshot. + +`decode_snapshot_inner` only initializes directly when importing into an empty +document. If a snapshot is imported into a non-empty document, +`LoroDoc::_import_with` routes through decoded oplog changes instead. Failed +direct snapshot import must reset both state and oplog. + +## FastUpdates + +`FastUpdates` is a sequence of LEB128 length-prefixed change blocks. +`fast_snapshot.rs:decode_updates` rejects invalid block lengths, length +overflow, and truncated block payloads, then sorts decoded changes by lamport. +`encoding.rs:apply_decoded_changes_to_oplog` imports changes, separates pending +changes, applies newly-unlocked pending changes, and rejects dependencies before +a shallow root. + +## Shallow, State-Only, And SnapshotAt + +All three use `FastSnapshot` mode: + +- `ShallowSnapshot` retains history since a calculated shallow start frontier. +- `StateOnly` is a shallow snapshot with minimal history at the target version. +- `SnapshotAt` exports full history up to target frontiers plus state at that + version. + +`shallow_snapshot.rs` temporarily checks out versions and must restore the +document's original state and attached/detached status. It must not split rich +text style start/end ops across the shallow root. Unknown container types block +shallow/state snapshot export through `LoroEncodeError::UnknownContainer`. + +## JSON Updates + +`json_schema.rs` is not wrapped in the binary `loro` envelope. Its +`JsonSchema` carries: + +- `schema_version = 1`, +- `start_version`, +- optional peer compression table, +- JSON changes and ops. + +Malformed JSON schema should return `Err` without partial import. Look at +[crates/loro-internal/src/tests/import_atomicity.rs](../crates/loro-internal/src/tests/import_atomicity.rs) +when changing JSON import validation or rollback behavior. 
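For contrast with the unwrapped JSON path, the binary envelope described earlier in this article can be sketched as a small header splitter. This is a minimal illustration under stated assumptions, not the code in `encoding.rs`: `Mode`, `HeaderError`, and `split_header` are hypothetical names, and checksum verification (`xxhash32` over the bytes from offset 20) is deliberately left out.

```rust
/// Top-level encode modes named in this article (hypothetical mirror of `EncodeMode`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    OutdatedRle,      // 1: recognized but unsupported
    OutdatedSnapshot, // 2: recognized but unsupported
    FastSnapshot,     // 3: current
    FastUpdates,      // 4: current
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
    UnknownMode(u16),
}

/// Splits a blob into (mode, 16-byte checksum field, body).
/// The checksum itself is not verified here.
pub fn split_header(blob: &[u8]) -> Result<(Mode, &[u8], &[u8]), HeaderError> {
    // magic (4) + checksum field (16) + big-endian u16 mode (2) = 22 bytes
    const HEADER_LEN: usize = 4 + 16 + 2;
    if blob.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !blob.starts_with(b"loro") {
        return Err(HeaderError::BadMagic);
    }
    let checksum_field = &blob[4..20];
    let mode = match u16::from_be_bytes([blob[20], blob[21]]) {
        1 => Mode::OutdatedRle,
        2 => Mode::OutdatedSnapshot,
        3 => Mode::FastSnapshot,
        4 => Mode::FastUpdates,
        other => return Err(HeaderError::UnknownMode(other)),
    };
    Ok((mode, checksum_field, &blob[HEADER_LEN..]))
}
```

A caller following the rules above would treat `OutdatedRle` and `OutdatedSnapshot` as an unsupported-mode error and only dispatch the two fast modes to their decoders.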
+ +## Validation Shortcuts + +- Binary malformed input or rollback: `cargo test -p loro-internal import_atomicity` +- Truncated fast updates: `cargo test -p loro-internal decode_updates_rejects_truncated_block` +- Snapshot retention that might involve mergeable containers: + `cargo test -p loro-internal --test mergeable_container` +- Shared behavior: root `pnpm test` + +## Common Misconceptions + +- "Outdated modes are still supported because `LoroDoc::_import_with` branches on + them." They are detected, then route to decode paths that return unsupported. +- "`outdated_encode_reordered.rs` is dead." It is legacy-named but still contains + active op/value helpers. +- "Snapshot import always initializes state directly." Only empty docs can reset + from snapshot; non-empty imports use oplog-change application. diff --git a/context/mergeable-containers.md b/context/mergeable-containers.md new file mode 100644 index 000000000..f98f6b69b --- /dev/null +++ b/context/mergeable-containers.md @@ -0,0 +1,121 @@ +# Mergeable Container Context + +Verified against code 2026-06-16. + +Mergeable containers let two peers independently create the same child container +under a map key and converge to one deterministic container id. The source of +truth for visibility is a binary marker in the parent map slot, not whether the +child already has direct operations. + +## Two-Hop Answer + +If an agent asks "how do mergeable containers work?", start here: + +- [crates/loro-common/src/lib.rs](../crates/loro-common/src/lib.rs): + `MERGEABLE_NAMESPACE_PREFIX`, `ContainerID::new_mergeable`, + `ContainerID::parse_mergeable`, `mergeable_marker`, + `parse_mergeable_marker`, `translate_mergeable_marker_value`. +- [crates/loro-internal/src/handler.rs](../crates/loro-internal/src/handler.rs): + `MapHandler::ensure_mergeable_container` and public + `ensure_mergeable_*` helpers. 
+- [crates/loro-internal/src/state/mergeable.rs](../crates/loro-internal/src/state/mergeable.rs): + logical child edge resolution from deterministic cid plus parent marker. +- [crates/loro-internal/src/state/map_state.rs](../crates/loro-internal/src/state/map_state.rs) + and [crates/loro-internal/src/txn.rs](../crates/loro-internal/src/txn.rs): + marker-to-container translation at read, diff, and event boundaries. +- [crates/loro-internal/docs/mergeable-container-id.md](../crates/loro-internal/docs/mergeable-container-id.md): + current mergeable cid encoding. +- [crates/loro-internal/tests/mergeable_container/](../crates/loro-internal/tests/mergeable_container/) + and [crates/loro-internal/tests/mergeable_cid_encoding.rs](../crates/loro-internal/tests/mergeable_cid_encoding.rs): + regression coverage. + +## Model + +`MapHandler::ensure_mergeable_*(key)` does two things: + +1. Derives a deterministic `ContainerID::Root` with + `ContainerID::new_mergeable(parent, key, kind)`. +2. Writes `mergeable_marker(parent, key, kind)` into the parent map slot. + +The deterministic cid uses the reserved `🤝:` namespace. Its payload encodes the +nearest non-mergeable map ancestor and escaped key path. The child kind is stored +in `ContainerID::Root.container_type`, not duplicated in the root-name payload. + +The marker is compact binary storage: + +- magic bytes from `MERGEABLE_MARKER_MAGIC`, +- one byte for container kind, +- a 24-bit digest bound to `(parent, key, kind)`. + +Copying marker bytes to another key or parent does not activate a mergeable child +there. + +## Visibility And Conflicts + +The parent map's current value decides visibility: + +- no marker: child is hidden, though state may still exist at its deterministic cid; +- same-kind marker: child is active and read surfaces translate it to + `LoroValue::Container`; +- different-kind marker: parent map LWW picks the visible kind.
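The three visibility cases above can be sketched as a pure function over the parent slot. This is an illustrative model only: `Kind`, `Slot`, `visible_kind`, and `is_visible` are hypothetical names, not types from `loro-common` or `loro-internal`, and `Slot` is assumed to be the value that regular map LWW has already selected.

```rust
/// Illustrative subset of container kinds a marker can name (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Map,
    List,
    Text,
}

/// Hypothetical view of the parent map slot after LWW picked a winner.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    /// Key absent or deleted: no marker.
    Empty,
    /// A valid mergeable marker for this (parent, key) naming `Kind`.
    Marker(Kind),
    /// A user value or a regular child container.
    Other,
}

/// Which mergeable kind, if any, is visible at this slot.
pub fn visible_kind(slot: Slot) -> Option<Kind> {
    match slot {
        Slot::Marker(kind) => Some(kind),
        Slot::Empty | Slot::Other => None,
    }
}

/// A child of `kind` is visible only when the current marker names that kind.
/// Whether the child already has direct ops is intentionally not an input.
pub fn is_visible(slot: Slot, kind: Kind) -> bool {
    visible_kind(slot) == Some(kind)
}
```

Note that the function never consults child state, which matches the rule that hidden children keep their state at the deterministic cid.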
+ +Concurrent same-kind creation writes identical markers and merges into the same +child. Concurrent different-kind creation writes different markers; regular map +LWW chooses one visible kind. Losing-kind state must remain addressable by +deterministic cid and can resurface if a later `ensure_mergeable_*` +rewrites the marker. + +## Boundaries + +- User strings, arbitrary binary values, scalars, and regular child containers + are not mergeable markers. `ensure_mergeable_*` must return `ArgErr` rather + than overwrite them. +- Repeating same-kind `ensure_mergeable_*` over the same marker is idempotent and + should not emit another op. +- Calling a different-kind `ensure_mergeable_*` over an existing mergeable marker + is a deliberate local kind change. +- Deleting the map key clears the marker and hides the child; re-ensuring writes + a new marker and resurfaces preserved state. +- Detached map handlers cannot ensure mergeable children, because the + deterministic child cid depends on the attached parent cid. + +## Snapshot And Retention Rules + +Snapshot and shallow snapshot alive-container walks must preserve mergeable child +state even when that child is hidden by a different winning marker. This is +covered by `tests/mergeable_container/snapshot.rs`, including shallow snapshot +tests for losing-kind state. + +Raw marker bytes are the wire/storage representation. Public read and diff +surfaces should translate an active marker to a container value. APIs that expose +raw/shallow storage may still show the binary marker for forward compatibility.
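The boundary rules in this article can be condensed into one decision function, which may help when reviewing changes to `MapHandler::ensure_mergeable_container`. This is a sketch under assumptions: `Kind`, `Slot`, `EnsureAction`, and `plan_ensure` are hypothetical names, the handler is assumed to be attached, and `Reject` stands in for the `ArgErr` path.

```rust
/// Illustrative subset of container kinds (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Map,
    List,
    Text,
}

/// Hypothetical view of the current parent map slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    /// Key absent or deleted.
    Empty,
    /// Existing mergeable marker for this (parent, key).
    Marker(Kind),
    /// User string, binary, scalar, or regular child container.
    Occupied,
}

/// What an ensure call should do, per the boundary rules.
#[derive(Debug, PartialEq, Eq)]
pub enum EnsureAction {
    /// Write a marker for the requested kind (emits one map op).
    WriteMarker,
    /// Same-kind marker already present: idempotent, no new op.
    NoOp,
    /// Non-mergeable occupant: refuse rather than overwrite.
    Reject,
}

pub fn plan_ensure(slot: Slot, requested: Kind) -> EnsureAction {
    match slot {
        // First creation, or re-ensuring after the key was deleted.
        Slot::Empty => EnsureAction::WriteMarker,
        Slot::Marker(current) if current == requested => EnsureAction::NoOp,
        // Deliberate local kind change over an existing marker.
        Slot::Marker(_) => EnsureAction::WriteMarker,
        Slot::Occupied => EnsureAction::Reject,
    }
}
```

In every `WriteMarker` case the child state at the deterministic cid is untouched, which is why re-ensuring a previously hidden kind resurfaces its preserved state.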
+ +## Tests By Question + +- Deterministic cid and malformed parser cases: + `cargo test -p loro-internal --test mergeable_cid_encoding` +- Marker layout, idempotency, kind changes, and non-mergeable occupant guards: + `cargo test -p loro-internal --test mergeable_container discriminator` +- Same-kind convergence and nested chains: + `cargo test -p loro-internal --test mergeable_container convergence` +- Delete/hide/reactivate behavior: + `cargo test -p loro-internal --test mergeable_container delete` +- Different-kind conflicts: + `cargo test -p loro-internal --test mergeable_container type_conflict` +- Snapshot and shallow snapshot retention: + `cargo test -p loro-internal --test mergeable_container snapshot` +- Pending import ordering: + `cargo test -p loro-internal --test mergeable_container pending` +- Events and paths: + `cargo test -p loro-internal --test mergeable_container events_and_paths` + +## Common Misconceptions + +- "A mergeable child is visible once it has ops." False; visibility is controlled + by the parent marker. +- "Deleting the key deletes the child state." False; it hides the child by + removing the marker. +- "Kind conflict discards the loser." False; the loser is hidden but should stay + recoverable by deterministic cid. +- "The marker is the child cid." False; the marker activates a kind at a + `(parent, key)`, while the cid is derived independently. diff --git a/crates/loro-internal/AGENTS.md b/crates/loro-internal/AGENTS.md index eab917cd3..a4c7af4f0 100644 --- a/crates/loro-internal/AGENTS.md +++ b/crates/loro-internal/AGENTS.md @@ -13,14 +13,17 @@ over graceful degradation. checksum verification, `EncodeMode` dispatch, import metadata, and the bridge from decoded changes into `OpLog`. - `src/encoding/`: concrete binary and JSON encoding implementations. Read - `src/encoding/AGENTS.md` before changing binary layout, JSON schema, import - metadata, shallow snapshot, or op/value encoding. 
+ `src/encoding/AGENTS.md` and + [../../context/internal-encoding.md](../../context/internal-encoding.md) + before changing binary layout, JSON schema, import metadata, shallow snapshot, + or op/value encoding. - `src/oplog/` and `src/dag/`: change storage, dependency ordering, pending changes, version vectors/frontiers, shallow roots, and history traversal. - `src/state.rs` and `src/state/`: materialized document state, container stores, diff application, checkout/replay, deep value, dead-container tracking, and - mergeable container visibility. Read `src/state/AGENTS.md` before changing - mergeable containers. + mergeable container visibility. Read `src/state/AGENTS.md` and + [../../context/mergeable-containers.md](../../context/mergeable-containers.md) + before changing mergeable containers. - `src/handler.rs`: typed container handlers, local operation creation, and `MapHandler::ensure_mergeable_*`. - `src/diff_calc/`: diff calculation when moving between versions. diff --git a/crates/loro-internal/src/encoding/AGENTS.md b/crates/loro-internal/src/encoding/AGENTS.md index 4e7eb7c3c..5a50ce70d 100644 --- a/crates/loro-internal/src/encoding/AGENTS.md +++ b/crates/loro-internal/src/encoding/AGENTS.md @@ -1,101 +1,38 @@ # Encoding Guidelines -This module owns Loro's import/export formats. It is easy to confuse the -top-level blob modes, the current fast binary layouts, the legacy helper module, -and the JSON schema path; keep those boundaries explicit. - -## Entry Points - -- `../encoding.rs`: `ExportMode`, `EncodeMode`, 22-byte `loro` header, - checksum validation, top-level encode/decode dispatch, and - `decode_import_blob_meta`. -- `fast_snapshot.rs`: current `FastSnapshot` and `FastUpdates` body encoding. -- `shallow_snapshot.rs`: `ShallowSnapshot`, `StateOnly`, and `SnapshotAt` - variants built on `FastSnapshot`. -- `json_schema.rs`: JSON updates (`JsonSchema`, `schema_version = 1`), peer - compression, JSON validation, import/export, and redaction. 
-- `outdated_encode_reordered.rs`: legacy-named op/value columnar helpers and - `import_changes_to_oplog`. The top-level outdated blob modes are unsupported, - but this file still contains helpers used by current fast paths. -- `value.rs`, `value_register.rs`, and `arena.rs`: value/op encoding support, - value tables, peer/key registers, and arena-backed value decoding. -- `../../Encoding.md`: older high-level encoding notes. Treat it as background, - not the source of truth when code disagrees. -- `../../../../docs/encoding.md`: detailed current binary format reference. -- `../../../../docs/encoding-container-states.md`: container state snapshot - layouts used inside `FastSnapshot.state_bytes`. - -## Supported Formats - -Top-level binary blobs all start with: - -- magic bytes `loro`, -- 16 checksum bytes, -- a big-endian `u16` encode mode, -- then mode-specific body bytes. - -Current supported binary modes: - -- `EncodeMode::FastSnapshot = 3`: used by `ExportMode::Snapshot`, - `ShallowSnapshot`, `StateOnly`, and `SnapshotAt`. -- `EncodeMode::FastUpdates = 4`: used by `ExportMode::Updates` and - `UpdatesInRange`. - -Legacy top-level modes: - -- `EncodeMode::OutdatedRle = 1` -- `EncodeMode::OutdatedSnapshot = 2` - -These parse as known modes for compatibility detection but currently return -`ImportUnsupportedEncodingMode` on import/metadata decode. Do not re-enable them -without a compatibility plan and fixtures. - -JSON update format: - -- `json_schema.rs` is not wrapped in the binary `loro` header. -- It carries `schema_version = 1`, `start_version`, optional peer compression, - and a list of JSON changes/ops. -- Malformed JSON schema should return `Err`, not partially import. - -## FastSnapshot - -`fast_snapshot.rs` encodes a snapshot body as three length-prefixed sections: - -1. `oplog_bytes`: KV-store encoded change history. -2. `state_bytes`: KV-store encoded materialized state, or `EMPTY_MARK` when - omitted and state must be recalculated. -3. 
`shallow_root_state_bytes`: KV-store encoded shallow root state, empty for - non-shallow snapshots. - -Importing a snapshot into an empty doc can initialize oplog and state directly. -Importing snapshot data into a non-empty doc goes through decoded oplog changes -instead. Failed snapshot import must roll back both oplog and state. - -## FastUpdates - -`FastUpdates` body is a sequence of LEB128 length-prefixed change blocks. -`decode_updates` must reject truncated blocks, length overflow, and corrupt block -payloads. Decoded changes are sorted by lamport before being applied. - -## Shallow/State-Only Snapshots - -`shallow_snapshot.rs` temporarily checks out versions to build shallow root -state and state deltas, then restores the original document state. - -- `ShallowSnapshot` retains history since a calculated shallow start frontier. -- `StateOnly` is a shallow snapshot with minimal history at a target version. -- `SnapshotAt` exports full history up to target frontiers plus state there. +This module owns Loro import/export formats. Read +[../../../../context/internal-encoding.md](../../../../context/internal-encoding.md) +for the verified map of supported modes, outdated modes, shallow snapshots, JSON +schema, and validation entry points. + +## Local Entry Points + +- `../encoding.rs`: `ExportMode`, `EncodeMode`, 22-byte `loro` header, checksum + validation, top-level dispatch, and `decode_import_blob_meta`. +- `fast_snapshot.rs`: current `FastSnapshot` and `FastUpdates` body layouts. +- `shallow_snapshot.rs`: `ShallowSnapshot`, `StateOnly`, and `SnapshotAt`. +- `json_schema.rs`: JSON updates, peer compression, validation, import/export, + and redaction. +- `outdated_encode_reordered.rs`: legacy-named op/value columnar helpers still + used by current fast paths; do not confuse this with unsupported top-level + outdated blob modes. +- `value.rs`, `value_register.rs`, `arena.rs`: op/value encoding support. 
+
+## Rules
+
+- Current binary modes are `FastSnapshot = 3` and `FastUpdates = 4`.
+- Top-level `OutdatedRle = 1` and `OutdatedSnapshot = 2` are compatibility
+  detections, not formats to extend.
+- Malformed bytes or JSON schema should return `Err`, not partially import.
+- Snapshot import/export must preserve rollback and attached/detached state
+  invariants.
 - Unknown container types must block shallow/state snapshot export rather than
-  writing a blob that cannot be decoded correctly.
-- Style start/end ops must not be split across shallow roots.
+  producing a blob that cannot be decoded correctly.
 
 ## Validation
 
-For encoding changes, prefer focused fixtures and malformed-input tests. Useful
-starting points:
-
 - `cargo test -p loro-internal import_atomicity`
 - `cargo test -p loro-internal decode_updates_rejects_truncated_block`
-- `cargo test -p loro-internal --test mergeable_container` when snapshots may
-  affect mergeable child retention.
-- Root `pnpm test` when changing shared import/export semantics.
+- `cargo test -p loro-internal --test mergeable_container` when snapshot changes
+  can affect mergeable child retention.
+- Root `pnpm test` for shared import/export semantic changes.
diff --git a/crates/loro-internal/src/state/AGENTS.md b/crates/loro-internal/src/state/AGENTS.md
index 9caa33beb..5c653e67a 100644
--- a/crates/loro-internal/src/state/AGENTS.md
+++ b/crates/loro-internal/src/state/AGENTS.md
@@ -2,108 +2,39 @@
 
 This module owns materialized document state, container stores, diff application,
 checkout/replay behavior, deep/shallow values, and mergeable container
-visibility.
+visibility. Read
+[../../../../context/mergeable-containers.md](../../../../context/mergeable-containers.md)
+before changing mergeable child behavior.
 
-## State Map
+## Local Entry Points
 
 - `../state.rs`: `DocState`, checkout/path/deep-value traversal, state replay,
-  container lifecycle, and alive-container discovery.
+  lifecycle, and alive-container discovery.
 - `container_store/`: persisted KV-backed container snapshots and
   `ContainerWrapper` encoding.
 - `map_state.rs`, `list_state.rs`, `richtext_state.rs`, `tree_state.rs`,
   `movable_list_state.rs`, `counter_state.rs`: per-container state and snapshot
   codecs.
 - `mergeable.rs`: logical child edge resolution for mergeable containers.
-- `dead_containers_cache.rs`: dead/alive tracking, including marker-driven
-  mergeable reactivation.
-- `unknown_state.rs` and `../diff_calc/unknown.rs`: forward-compatibility support
-  for unknown container types.
-- `../../docs/mergeable-container-id.md`: mergeable container id format.
-- `../../tests/mergeable_container/`: behavior tests for mergeable visibility,
-  deletion, conflicts, pending updates, paths/events, and snapshots.
-
-## Mergeable Container Model
-
-Mergeable child containers are created by `MapHandler::ensure_mergeable_*` in
-`../handler.rs` and exposed by Rust/WASM wrapper APIs.
-
-Core idea:
-
-- Two peers calling `ensure_mergeable_(key)` on the same parent map derive
-  the same deterministic child `ContainerID` via
-  `ContainerID::new_mergeable(parent, key, kind)` in `loro-common`.
-- The child id is represented as a reserved `ContainerID::Root` name. The child
-  kind lives in `ContainerID::Root.container_type`; the root-name payload only
-  encodes parent map identity and map-key path.
-- The parent map slot stores a compact binary marker from
-  `loro_common::mergeable_marker(parent, key, kind)`. This marker is the source
-  of truth for which mergeable child kind is currently visible.
-- Parent map LWW semantics resolve concurrent different-kind markers. Losing
-  children are hidden but their state must be preserved and can resurface if a
-  later `ensure_mergeable_` rewrites the marker.
-
-Important boundaries:
-
-- User strings, arbitrary binary values, scalars, and regular child containers
-  are not mergeable markers and must block `ensure_mergeable_*` rather than be
-  overwritten.
-- Same-kind `ensure_mergeable_*` over an existing marker is idempotent and should
-  not emit another op.
-- Different-kind `ensure_mergeable_*` over an existing marker is a deliberate
-  kind change and writes a new marker.
-- Deleting the map key clears the marker and hides the child, but the child state
-  is preserved by deterministic id. Re-ensuring the same kind resurfaces it.
-- Visibility comes from the parent marker, not from whether the child already
-  has direct ops. This matters for nested mergeable maps and pending imports.
-
-## Mergeable Code Index
-
-- `crates/loro-common/src/lib.rs`: `MERGEABLE_NAMESPACE_PREFIX`,
-  `ContainerID::new_mergeable`, `parse_mergeable`, `mergeable_marker`,
-  `parse_mergeable_marker`, and marker-to-container translation.
-- `../handler.rs`: `MapHandler::ensure_mergeable_container` validates the parent
-  slot, writes markers, and returns a handler for the deterministic cid.
-- `mergeable.rs`: resolves logical child paths from deterministic cid plus the
-  parent map's current marker.
-- `map_state.rs`: translates marker values to `LoroValue::Container` at read and
-  diff boundaries when the parent id is known.
-- `../state.rs`: deep-value/path traversal must recognize marker-backed child
-  edges.
-- `../txn.rs`: local event diffs translate marker writes into container values
-  for subscribers.
-- `dead_containers_cache.rs`: import/reactivation behavior when marker values
-  change across peers.
-- `../../tests/mergeable_cid_encoding.rs`: deterministic cid and parser tests.
-- `../../tests/mergeable_container/discriminator.rs`: marker layout,
-  idempotency, kind-change, and non-mergeable occupant tests.
-- `../../tests/mergeable_container/type_conflict.rs`: concurrent different-kind
-  conflict behavior.
-- `../../tests/mergeable_container/snapshot.rs`: snapshot and shallow snapshot
-  retention, including losing-kind state.
-- `../../tests/mergeable_container/pending.rs`: pending updates that arrive
-  before all mergeable context exists.
-- `../../tests/mergeable_container/events_and_paths.rs`: event and path surface.
-
-## Encoding/Retention Rules
-
-- Snapshot and shallow snapshot alive-container walks must retain mergeable
-  child state even when the child is not currently visible because another kind's
-  marker wins.
-- Raw marker bytes are the wire/storage representation. Public read surfaces
-  should translate active markers to container values unless the API explicitly
-  exposes raw/shallow storage.
-- Mergeable root names grow with nested mergeable map key paths. Avoid adding
-  APIs or tests that encourage deep mergeable-map chains without measuring the
-  serialized id cost.
+- `dead_containers_cache.rs`: dead/alive tracking and marker-driven mergeable
+  reactivation.
+- `unknown_state.rs` and `../diff_calc/unknown.rs`: forward compatibility for
+  unknown container types.
+
+## Mergeable Rules
+
+- `MapHandler::ensure_mergeable_*` writes a compact marker into the parent map
+  and returns a handler for a deterministic `ContainerID`.
+- The parent map marker, not "child has ops", decides whether a mergeable child
+  is visible.
+- Non-mergeable occupants must block `ensure_mergeable_*`; same-kind marker
+  writes are idempotent; different-kind marker writes are deliberate kind
+  changes.
+- Snapshot and shallow snapshot retention must preserve hidden losing-kind
+  mergeable state.
 
 ## Validation
 
-For mergeable/state changes, start with:
-
 - `cargo test -p loro-internal --test mergeable_cid_encoding`
 - `cargo test -p loro-internal --test mergeable_container`
-- `cargo test -p loro-internal import_atomicity` if import or rollback is
-  involved.
-
-Run root-level broader tests when changing shared replay, checkout, or snapshot
-behavior.
+- `cargo test -p loro-internal import_atomicity` if import or rollback is involved.
From e43157488a5b2a8c56269ed2d6c40889c1eb9916 Mon Sep 17 00:00:00 2001
From: Zixuan Chen
Date: Tue, 16 Jun 2026 02:55:58 +0000
Subject: [PATCH 4/4] chore: fix context hop gaps from probes

---
 context/internal-encoding.md           | 7 +++++++
 crates/loro-wasm/AGENTS.md             | 7 ++++---
 examples/bundler-smoke-tests/README.md | 2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/context/internal-encoding.md b/context/internal-encoding.md
index 8dafa95f4..65ecb0975 100644
--- a/context/internal-encoding.md
+++ b/context/internal-encoding.md
@@ -103,6 +103,10 @@ document's original state and attached/detached status. It must not split rich
 text style start/end ops across the shallow root. Unknown container types block
 shallow/state snapshot export through `LoroEncodeError::UnknownContainer`.
 
+Pre-shallow frontier safety lives in `loro.rs`: `checkout`, `diff`, and
+`revert_to` must return `SwitchToVersionBeforeShallowRoot` instead of traversing
+history before the shallow root.
+
 ## JSON Updates
 
 `json_schema.rs` is not wrapped in the binary `loro` envelope. Its
@@ -121,6 +125,9 @@ when changing JSON import validation or rollback behavior.
 - Binary malformed input or rollback: `cargo test -p loro-internal import_atomicity`
 - Truncated fast updates:
   `cargo test -p loro-internal decode_updates_rejects_truncated_block`
+- Pre-shallow checkout/diff/revert behavior:
+  `cargo test -p loro --test issue issue_928` and
+  `cargo test -p loro --test contracts shallow`
 - Snapshot retention that might involve mergeable containers:
   `cargo test -p loro-internal --test mergeable_container`
 - Shared behavior: root `pnpm test`
diff --git a/crates/loro-wasm/AGENTS.md b/crates/loro-wasm/AGENTS.md
index 681f87b76..287a8b51d 100644
--- a/crates/loro-wasm/AGENTS.md
+++ b/crates/loro-wasm/AGENTS.md
@@ -46,9 +46,10 @@ can:
 - apply diffs (`revertTo`, `applyDiff`), or
 - change ephemeral store state that has JS subscribers.
 
-If yes, add the JS method name to the relevant `decorateMethods(...)` allowlist
-near the bottom of `index.ts` (`LoroDoc.prototype`, container prototypes,
-`EphemeralStoreWasm.prototype`, or `UndoManager.prototype`). Pure read/query APIs
+If yes, add the JS method name to the relevant installed `decorateMethods(...)`
+allowlist near the bottom of `index.ts`. Today those wrappers cover
+`LoroDoc.prototype`, `EphemeralStoreWasm.prototype`, and `UndoManager.prototype`;
+add another prototype only when the wrapper is wired there. Pure read/query APIs
 should not be decorated.
 
 A quick behavioral check is to run with an active `doc.subscribe(...)` or
diff --git a/examples/bundler-smoke-tests/README.md b/examples/bundler-smoke-tests/README.md
index 28f1562bb..c476812cc 100644
--- a/examples/bundler-smoke-tests/README.md
+++ b/examples/bundler-smoke-tests/README.md
@@ -29,7 +29,7 @@ pnpm --dir examples/bundler-smoke-tests run test:next
 ```
 
 To also launch each production-built app in Chromium and verify `doc.toJSON()`
-returns `{ t: "hi" }` in a real browser:
+returns `{ map: { text: "mergeable-smoke" } }` in a real browser:
 
 ```sh
 pnpm --dir examples/bundler-smoke-tests run test:browser