refactor(content): replace content-scanner + content-manager with content-pipeline #1

Merged
mkotelnikov merged 1 commit into main from feature/content-pipeline-redesign on Apr 24, 2026


@mkotelnikov
Contributor

Summary

BREAKING: removes @statewalker/content-scanner and @statewalker/content-manager, replaces them with a single new package @statewalker/content-pipeline (~585 LOC, replacing ~2450 LOC).

  • Layered Trackers with per-listener cursors, monotonic integer stamps, batched pacing (batchSize + pauseMs), runtime cascade via onStampUpdate, tombstone propagation, per-URI error isolation.
  • Three interchangeable Store<E> backends: JsonManifestStore (small-meta layers), BlobStore with pluggable codecs including a raw Float32 fast-path for embeddings (payload-heavy layers), and an optional day-2 SqlStore over @statewalker/db-api.
  • content-cli ported; public createContentManager surface (sync/search/status/clear/close) preserved, except that the registry: FilesScanRegistry option is replaced by statePrefix: string.
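
To make the tracker bullet above concrete, here is a minimal sketch of monotonic integer stamps, per-listener cursors, and batched pacing. Every name in it (StampedLog, nextBatch, commit, drain) is hypothetical and chosen for illustration; it is not the content-pipeline API, only one way the described mechanics could fit together.

```typescript
// Hypothetical sketch; none of these names come from @statewalker/content-pipeline.
type Entry<T> = { uri: string; stamp: number; value: T | null }; // null marks a tombstone

class StampedLog<T> {
  private stamp = 0;
  private entries = new Map<string, Entry<T>>();
  private cursors = new Map<string, number>();

  // Every write, including a delete, receives the next integer stamp.
  put(uri: string, value: T | null): number {
    this.stamp += 1;
    this.entries.set(uri, { uri, stamp: this.stamp, value });
    return this.stamp;
  }

  // Entries this listener has not seen yet, oldest first, capped at batchSize.
  nextBatch(listener: string, batchSize: number): Entry<T>[] {
    const cursor = this.cursors.get(listener) ?? 0;
    return [...this.entries.values()]
      .filter((e) => e.stamp > cursor)
      .sort((a, b) => a.stamp - b.stamp)
      .slice(0, batchSize);
  }

  // Advance one listener's cursor; other listeners keep their own position.
  commit(listener: string, stamp: number): void {
    this.cursors.set(listener, Math.max(this.cursors.get(listener) ?? 0, stamp));
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Process pending entries in batches and pause between batches so a large
// collection does not monopolize the CPU. Resolves to the number of batches.
async function drain<T>(
  log: StampedLog<T>,
  listener: string,
  handle: (entry: Entry<T>) => void,
  opts: { batchSize: number; pauseMs: number },
): Promise<number> {
  let batches = 0;
  for (;;) {
    const batch = log.nextBatch(listener, opts.batchSize);
    if (batch.length === 0) return batches;
    let last = 0;
    for (const entry of batch) {
      handle(entry);
      last = entry.stamp;
    }
    log.commit(listener, last);
    batches += 1;
    await sleep(opts.pauseMs);
  }
}
```

Because each listener owns its cursor, a slow consumer (for example an embedding stage) can lag behind a fast one without either blocking the other, and a rewrite of the same URI simply shows up again with a higher stamp.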

Paired PR in statewalker-apps ports chat.core/fts-search-tools.ts to the new package. Full design context, alternatives, and task history live under openspec/changes/content-pipeline-redesign/ in the umbrella.

Test plan

  • pnpm --filter @statewalker/content-pipeline test — 41/41 ✓ (tracker driver, JSON/Blob stores, codecs, end-to-end cascade, tombstone cascade, rebuild-from-scratch, embeddings Float32 round-trip, pacing)
  • pnpm --filter @statewalker/content-cli test — 10/10 ✓ (real tsx cli.ts against fixture docs)
  • pnpm --filter @statewalker/content-pipeline typecheck — zero new errors
  • pnpm biome check --write --unsafe — clean
  • Changeset: .changeset/content-pipeline-redesign.md (minor for content-pipeline, patch for content-cli)

Migration

For external consumers of @statewalker/content-manager:

  • Replace imports with @statewalker/content-pipeline.
  • createContentManager options: drop registry: FilesScanRegistry, add statePrefix: string.
  • First run rebuilds state from scratch; on-disk store layout is not backward-compatible.
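
As a rough illustration of the option change listed above, the helper below drops registry and adds statePrefix while passing other options through. The option shapes and the migrateOptions helper are assumptions inferred from this description, not types exported by either package, and the prefix value in the usage note is a placeholder.

```typescript
// Hypothetical option shapes; the real createContentManager options may carry
// additional, differently typed fields.
type OldOptions = { registry: unknown; [key: string]: unknown };
type NewOptions = { statePrefix: string; [key: string]: unknown };

// Drop `registry`, add `statePrefix`, and keep every other option unchanged.
function migrateOptions(old: OldOptions, statePrefix: string): NewOptions {
  const { registry: _dropped, ...rest } = old;
  return { ...rest, statePrefix };
}
```

A call such as migrateOptions(oldOptions, "/example-prefix") would produce the object to pass to the new createContentManager; since the on-disk layout changed, the first sync after switching still rebuilds state from scratch.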

🤖 Generated with Claude Code

…tent-pipeline

BREAKING: removes @statewalker/content-scanner and @statewalker/content-manager.
Adds @statewalker/content-pipeline — a single package that collapses the two
older packages' ~2450 LOC of scanner + orchestration infrastructure down to
~585 LOC of layered Trackers, typed entry types, and three interchangeable
Store<E> backends (JsonManifestStore, BlobStore with msgpack + raw Float32
codecs, and an optional day-2 SqlStore).
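
For readers unfamiliar with a raw Float32 fast-path, the sketch below shows the general idea: an embedding vector is stored as its underlying bytes instead of being serialized element by element. The Codec interface and float32Codec name are assumptions for illustration; the package's actual codec contract may differ.

```typescript
// Hypothetical codec shape, not the interface shipped in content-pipeline.
interface Codec<T> {
  encode(value: T): Uint8Array;
  decode(bytes: Uint8Array): T;
}

const float32Codec: Codec<Float32Array> = {
  // View the vector's bytes, then copy them so the result owns its buffer.
  // Bytes are in platform byte order (little-endian on common hardware).
  encode(vec) {
    return new Uint8Array(vec.buffer, vec.byteOffset, vec.byteLength).slice();
  },
  // Copy first so the Float32Array is 4-byte aligned even when `bytes`
  // is an unaligned view into a larger buffer.
  decode(bytes) {
    if (bytes.byteLength % 4 !== 0) {
      throw new Error("byte length must be a multiple of 4");
    }
    const copy = bytes.slice();
    return new Float32Array(copy.buffer, copy.byteOffset, copy.byteLength / 4);
  },
};
```

The cost is 4 bytes per dimension with no per-element parsing, which is why such a path suits payload-heavy layers like embeddings better than a general-purpose format.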

Why: the old scanner/manager pair carried two parallel change-tracking APIs,
a dual truth source (per-entry JSON + side index), opaque async-generator
binary payloads on every entry, and a sync()-drains-each-stage model that
pinned the CPU on large collections. The new pipeline uses monotonic integer
stamps, per-listener cursors, batched pacing with pauseMs, and a runtime
cascade via onStampUpdate — so scanFiles() writes kick extract → split →
fts/embed/vec in turn, with tombstones propagating automatically and
per-URI errors isolated.
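
The cascade described above can be sketched as follows. Only the name onStampUpdate appears in this PR; its signature here, and the Layer and connect helpers, are assumptions made for illustration, and the sketch is synchronous where the real pipeline is batched and paced.

```typescript
// Hypothetical sketch; Layer and connect are invented names, and the
// onStampUpdate signature is an assumption.
type Change<T> = { uri: string; stamp: number; value: T | null }; // null = tombstone

class Layer<T> {
  private stamp = 0;
  readonly entries = new Map<string, Change<T>>();
  readonly errors = new Map<string, string>();
  private listeners: Array<(change: Change<T>) => void> = [];

  onStampUpdate(listener: (change: Change<T>) => void): void {
    this.listeners.push(listener);
  }

  // Record the write under a fresh stamp, then notify downstream layers.
  write(uri: string, value: T | null): void {
    const change: Change<T> = { uri, stamp: ++this.stamp, value };
    this.entries.set(uri, change);
    for (const listener of this.listeners) listener(change);
  }
}

// Derive `to` from `from`. Tombstones pass straight through; a URI whose
// transform throws is recorded in `to.errors` without stopping other URIs.
function connect<A, B>(from: Layer<A>, to: Layer<B>, transform: (value: A) => B): void {
  from.onStampUpdate(({ uri, value }) => {
    if (value === null) return to.write(uri, null);
    try {
      const out = transform(value);
      to.errors.delete(uri);
      to.write(uri, out);
    } catch (err) {
      to.errors.set(uri, err instanceof Error ? err.message : String(err));
    }
  });
}

// Usage: files -> text -> chunks, standing in for scan -> extract -> split.
const files = new Layer<string>();
const text = new Layer<string>();
const chunks = new Layer<string[]>();
connect(files, text, (raw) => {
  if (raw === "") throw new Error("empty file");
  return raw.trim();
});
connect(text, chunks, (t) => t.split(/\s+/));

files.write("a.md", " hello world "); // cascades down to chunks
files.write("b.md", "");              // fails in text only; a.md is unaffected
files.write("a.md", null);            // tombstone reaches text and chunks
```

After these three writes, chunks holds a tombstone for a.md, text.errors holds the failure for b.md, and b.md never reaches chunks.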

Migration: replace createContentManager({registry: FilesScanRegistry}) with
createContentManager({statePrefix: "/..."}). Public sync/search/status/
clear/close surface is preserved. On-disk store layout is not
backward-compatible — first run rebuilds from scratch.

content-cli ported; smoke test (10/10) green. See the content-pipeline-redesign
change in the umbrella's openspec/ for the full proposal, design, specs,
and task history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mkotelnikov mkotelnikov marked this pull request as ready for review April 24, 2026 17:56
@mkotelnikov mkotelnikov merged commit ee6e951 into main Apr 24, 2026
1 check failed
@mkotelnikov mkotelnikov deleted the feature/content-pipeline-redesign branch April 24, 2026 17:57