refactor(content): replace content-scanner + content-manager with content-pipeline #1
Merged
mkotelnikov merged 1 commit into main on Apr 24, 2026
BREAKING: removes @statewalker/content-scanner and @statewalker/content-manager.
Adds @statewalker/content-pipeline — a single package that collapses the two
older packages' ~2450 LOC of scanner + orchestration infrastructure down to
~585 LOC of layered Trackers, typed entry types, and three interchangeable
Store<E> backends (JsonManifestStore, BlobStore with msgpack + raw Float32
codecs, and an optional day-2 SqlStore).
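To illustrate what a raw Float32 codec for embeddings could look like, here is a minimal sketch. The `Codec` interface and the `float32Codec` name are hypothetical and are not taken from the package; the sketch only shows the idea of storing a vector as its raw bytes instead of serializing it element by element. It assumes the reader and writer share the same byte order.

```typescript
// Hypothetical codec shape, for illustration only; the real
// content-pipeline interfaces may look different.
interface Codec<T> {
  encode(value: T): Uint8Array;
  decode(bytes: Uint8Array): T;
}

// Stores an embedding as its raw bytes (4 bytes per component, platform byte order).
const float32Codec: Codec<Float32Array> = {
  encode(vec) {
    // View only the bytes that belong to this vector (it may be a window
    // into a larger buffer), then copy them so the result is independent.
    return new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength).slice();
  },
  decode(bytes) {
    if (bytes.byteLength % 4 !== 0) {
      throw new Error("Float32 payload length must be a multiple of 4 bytes");
    }
    // Copy into a fresh buffer so the Float32Array starts at an aligned offset 0.
    const copy = bytes.slice();
    return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
  },
};

// Usage: round-trip a small vector.
const embedding = new Float32Array([0.5, -1.25, 3]);
const restored = float32Codec.decode(float32Codec.encode(embedding));
console.log(restored.length); // 3
```

The copy in `decode` trades one allocation for safety: a `Float32Array` view requires a 4-byte-aligned offset, which an arbitrary slice of a larger blob does not guarantee.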
Why: the old scanner/manager pair carried two parallel change-tracking APIs,
a dual truth source (per-entry JSON + side index), opaque async-generator
binary payloads on every entry, and a sync()-drains-each-stage model that
pinned the CPU on large collections. The new pipeline uses monotonic integer
stamps, per-listener cursors, batched pacing with pauseMs, and a runtime
cascade via onStampUpdate — so scanFiles() writes kick extract → split →
fts/embed/vec in turn, with tombstones propagating automatically and
per-URI errors isolated.
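The mechanism described above can be sketched roughly as follows. Everything here is an illustrative assumption rather than the package's actual API: `StampedStore`, `track`, and the signature given to `onStampUpdate` are hypothetical, the stores are in-memory, and the `pauseMs` pause between batches is left out to keep the sketch synchronous.

```typescript
// Hypothetical types and names; not the real content-pipeline API.
type Entry<T> = { uri: string; stamp: number; value?: T; removed?: boolean };

class StampedStore<T> {
  private entries = new Map<string, Entry<T>>();
  private lastStamp = 0; // monotonic integer stamp, bumped on every write
  private listeners: Array<(stamp: number) => void> = [];

  put(uri: string, value: T): void {
    this.write({ uri, value, stamp: ++this.lastStamp });
  }
  remove(uri: string): void {
    // A removal is recorded as a tombstone so downstream layers can see it.
    this.write({ uri, removed: true, stamp: ++this.lastStamp });
  }
  // Entries written after `cursor`, oldest first, at most `limit` of them.
  changesSince(cursor: number, limit: number): Entry<T>[] {
    return [...this.entries.values()]
      .filter((e) => e.stamp > cursor)
      .sort((a, b) => a.stamp - b.stamp)
      .slice(0, limit);
  }
  onStampUpdate(listener: (stamp: number) => void): void {
    this.listeners.push(listener);
  }
  private write(entry: Entry<T>): void {
    this.entries.set(entry.uri, entry);
    for (const listener of this.listeners) listener(entry.stamp);
  }
}

// Links two layers: each listener keeps its own cursor into the upstream store.
function track<A, B>(
  upstream: StampedStore<A>,
  downstream: StampedStore<B>,
  transform: (value: A) => B,
  batchSize = 100,
): void {
  let cursor = 0;
  upstream.onStampUpdate(() => {
    for (;;) {
      const batch = upstream.changesSince(cursor, batchSize);
      if (batch.length === 0) return;
      for (const entry of batch) {
        if (entry.removed) {
          downstream.remove(entry.uri); // tombstones propagate
        } else {
          try {
            downstream.put(entry.uri, transform(entry.value as A));
          } catch {
            // Per-URI error isolation: a failing entry is skipped, the batch continues.
          }
        }
        cursor = entry.stamp;
      }
    }
  });
}

// Usage: a two-stage cascade (files -> trimmed text -> length).
const files = new StampedStore<string>();
const texts = new StampedStore<string>();
const lengths = new StampedStore<number>();
track(files, texts, (s) => s.trim());
track(texts, lengths, (s) => s.length);
files.put("a.md", "  hello  ");
console.log(lengths.changesSince(0, 10)); // one entry for "a.md" with value 5
```

Because each listener advances its own cursor, a slow or newly attached layer can catch up from stamp 0 without the upstream store tracking who has seen what.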
Migration: replace createContentManager({registry: FilesScanRegistry}) with
createContentManager({statePrefix: "/..."}). Public sync/search/status/
clear/close surface is preserved. On-disk store layout is not
backward-compatible — first run rebuilds from scratch.
content-cli ported; smoke test (10/10) green. See the content-pipeline-redesign
change in the umbrella's openspec/ for the full proposal, design, specs,
and task history.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- BREAKING: removes `@statewalker/content-scanner` and `@statewalker/content-manager`, replaces them with a single new package `@statewalker/content-pipeline` (~585 LOC, replacing ~2450 LOC).
- Layered Trackers with monotonic integer stamps, per-listener cursors, batched pacing (`batchSize` + `pauseMs`), runtime cascade via `onStampUpdate`, tombstone propagation, per-URI error isolation.
- `Store<E>` backends: `JsonManifestStore` (small-meta layers), `BlobStore` with pluggable codecs including a raw Float32 fast-path for embeddings (payload-heavy layers), and an optional day-2 `SqlStore` over `@statewalker/db-api`.
- `content-cli` ported; public `createContentManager` surface (`sync`/`search`/`status`/`clear`/`close`) preserved except `registry: FilesScanRegistry` → `statePrefix: string`.

Paired PR in statewalker-apps ports `chat.core/fts-search-tools.ts` to the new package. Full design context, alternatives, and task history live under `openspec/changes/content-pipeline-redesign/` in the umbrella.

Test plan
- `pnpm --filter @statewalker/content-pipeline test`: 41/41 ✓ (tracker driver, JSON/Blob stores, codecs, end-to-end cascade, tombstone cascade, rebuild-from-scratch, embeddings Float32 round-trip, pacing)
- `pnpm --filter @statewalker/content-cli test`: 10/10 ✓ (real `tsx cli.ts` against fixture docs)
- `pnpm --filter @statewalker/content-pipeline typecheck`: zero new errors
- `pnpm biome check --write --unsafe`: clean
- `.changeset/content-pipeline-redesign.md` (minor for content-pipeline, patch for content-cli)

Migration
For external consumers of `@statewalker/content-manager`:
- Switch to `@statewalker/content-pipeline`.
- `createContentManager` options: drop `registry: FilesScanRegistry`, add `statePrefix: string`.

🤖 Generated with Claude Code