Merged
15 changes: 15 additions & 0 deletions .changeset/content-pipeline-redesign.md
@@ -0,0 +1,15 @@
---
"@statewalker/content-pipeline": minor
"@statewalker/content-cli": patch
---

**BREAKING:** removed `@statewalker/content-scanner` and `@statewalker/content-manager`, replaced by a single new package `@statewalker/content-pipeline`.

The new package implements the same cascade (files → extract → split → embed → fts/vec index) as a set of layered Trackers, each with a persisted cursor, monotonic integer stamps, batched pacing, runtime cascade via `onStampUpdate`, and tombstone propagation. This reduces ~2450 LOC of scanner + manager infrastructure to ~585 LOC. Three interchangeable `Store<E>` backends (`JsonManifestStore`, `BlobStore` with pluggable codecs including a raw Float32 fast-path for embeddings, and an optional day-2 `SqlStore`) let each layer pick the right persistence for its payload profile.
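
As a rough illustration of the layered-tracker idea described above, here is a minimal in-memory sketch. It is not the package's implementation: `Entry`, `Tracker`, `upsert`, `remove`, `changesSince`, and `sync` are hypothetical names, and the signature given to `onStampUpdate` is an assumption (only the name appears in this changeset). The cursor is returned to the caller rather than persisted.

```typescript
// Hypothetical sketch: every name except `onStampUpdate` is invented for illustration,
// and the `onStampUpdate` signature is assumed.
type Entry<T> = { key: string; stamp: number; value?: T; deleted?: boolean };

class Tracker<T> {
  private entries = new Map<string, Entry<T>>();
  private stamp = 0; // monotonic integer stamp, incremented on every change
  private listeners: Array<(stamp: number) => void> = [];

  // Register a callback fired after each change with the new stamp.
  onStampUpdate(listener: (stamp: number) => void): void {
    this.listeners.push(listener);
  }

  upsert(key: string, value: T): void {
    this.write({ key, stamp: ++this.stamp, value });
  }

  // A deletion is stored as a tombstone so downstream layers can observe it.
  remove(key: string): void {
    this.write({ key, stamp: ++this.stamp, deleted: true });
  }

  // Entries changed after `cursor`, oldest first, at most `limit` per batch.
  changesSince(cursor: number, limit: number): Entry<T>[] {
    return [...this.entries.values()]
      .filter((e) => e.stamp > cursor)
      .sort((a, b) => a.stamp - b.stamp)
      .slice(0, limit);
  }

  private write(entry: Entry<T>): void {
    this.entries.set(entry.key, entry);
    for (const listener of this.listeners) listener(entry.stamp);
  }
}

// Pull upstream changes newer than `cursor` into the downstream layer in batches.
// Returns the advanced cursor; a real layer would persist it between runs.
function sync<A, B>(
  upstream: Tracker<A>,
  downstream: Tracker<B>,
  cursor: number,
  transform: (value: A) => B,
  batchSize = 2,
): number {
  for (;;) {
    const batch = upstream.changesSince(cursor, batchSize);
    if (batch.length === 0) return cursor;
    for (const entry of batch) {
      if (entry.deleted) downstream.remove(entry.key); // propagate the tombstone
      else downstream.upsert(entry.key, transform(entry.value as A));
      cursor = entry.stamp;
    }
  }
}
```

Chaining layers then amounts to calling `sync` from an upstream `onStampUpdate` listener, so a change or deletion at the files layer flows into each derived layer in stamp order.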

**Migration:**
- Replace `@statewalker/content-manager` / `@statewalker/content-scanner` imports with `@statewalker/content-pipeline`.
- `createContentManager` options change: drop `registry: FilesScanRegistry`, add `statePrefix: string` (directory under which the pipeline stores its state). Everything else (`indexer`, `files`, `extractors`, `chunkOptions`, `embed`, `root`, `filter`) is unchanged. The `sync` / `search` / `status` / `clear` / `close` public surface is preserved.
- First run after upgrading rebuilds state from scratch; the on-disk store layout is not compatible with the old one.

See `openspec/changes/content-pipeline-redesign/` in the umbrella for the full proposal, design notes, and spec deltas.
3 changes: 1 addition & 2 deletions README.md
@@ -10,8 +10,7 @@ Content pipeline: blocks, extractors, scanners, managers, plus the content-cli.
| --- | --- |
| [@statewalker/content-blocks](packages/content-blocks) | Block types shared across the content pipeline. |
| [@statewalker/content-extractors](packages/content-extractors) | PDF/DOCX/XLSX/Markdown/HTML extractors. |
-| [@statewalker/content-scanner](packages/content-scanner) | Scans a file tree and streams blocks into indexers. |
-| [@statewalker/content-manager](packages/content-manager) | High-level scan + index orchestration. |
+| [@statewalker/content-pipeline](packages/content-pipeline) | Layered trackers that cascade file-system changes through extract, split, embed, and index stages. |

## Apps

4 changes: 2 additions & 2 deletions apps/content-cli/package.json
@@ -32,14 +32,14 @@
},
"dependencies": {
"@statewalker/content-extractors": "workspace:*",
-"@statewalker/content-manager": "workspace:*",
-"@statewalker/content-scanner": "workspace:*",
+"@statewalker/content-pipeline": "workspace:*",
"@statewalker/indexer-api": "catalog:",
"@statewalker/indexer-mem-flexsearch": "catalog:",
"@statewalker/webrun-files": "catalog:",
"@statewalker/webrun-files-node": "catalog:"
},
"devDependencies": {
+"@types/node": "catalog:",
"rimraf": "catalog:",
"tsdown": "catalog:",
"tsx": "catalog:",
8 changes: 3 additions & 5 deletions apps/content-cli/src/cli.ts
@@ -2,8 +2,7 @@

import { resolve } from "node:path";
import { createDefaultRegistry } from "@statewalker/content-extractors/extractors";
-import { createContentManager } from "@statewalker/content-manager";
-import { FilesScanRegistry } from "@statewalker/content-scanner";
+import { createContentManager } from "@statewalker/content-pipeline";
import type { IndexerPersistence, PersistenceEntry } from "@statewalker/indexer-api";
import { createFlexSearchIndexer } from "@statewalker/indexer-mem-flexsearch";
import type { FilesApi } from "@statewalker/webrun-files";
@@ -112,18 +111,17 @@ async function main() {
const rootDir = resolve(rootPath);
const files = new NodeFilesApi({ rootDir });
const indexDir = `/${systemFolder}/indexer`;
-const scanDir = `/${systemFolder}/scan`;
+const statePrefix = `/${systemFolder}/content`;

const persistence = createFilePersistence(files, indexDir);
const indexer = createFlexSearchIndexer({ persistence });
-const registry = new FilesScanRegistry({ files, prefix: scanDir });
const extractors = createDefaultRegistry();

const manager = createContentManager({
-registry,
indexer,
files,
extractors,
+statePrefix,
root: "/",
filter: (path: string) => !path.startsWith(`/${systemFolder}/`),
});
3 changes: 2 additions & 1 deletion apps/content-cli/tsconfig.json
@@ -3,7 +3,8 @@
"target": "ES2022",
"module": "Preserve",
"moduleResolution": "Bundler",
-"lib": ["ESNext"],
+"lib": ["ESNext", "DOM"],
+"types": ["node"],
"strict": true,
"skipLibCheck": true,
"verbatimModuleSyntax": true,
2 changes: 1 addition & 1 deletion packages/content-blocks/README.md
@@ -23,4 +23,4 @@ For stable block ID generation use `@statewalker/shared-ids` directly.

## Related

-- `@statewalker/content-extractors`, `@statewalker/content-scanner`, `@statewalker/content-manager`.
+- `@statewalker/content-extractors`, `@statewalker/content-pipeline`.
26 changes: 0 additions & 26 deletions packages/content-manager/README.md

This file was deleted.

53 changes: 0 additions & 53 deletions packages/content-manager/package.json

This file was deleted.

221 changes: 0 additions & 221 deletions packages/content-manager/src/content-manager.ts

This file was deleted.
