feat!: mandatory CAS blob storage with streaming I/O (OG-014) #72
Merged
16 commits (all by flyingrobots):

- `5f0cebd` docs: promote OG-014 to active with streaming CAS blob storage design
- `5dba666` test: add red-phase spec tests for OG-014 streaming CAS blob storage
- `af80714` fix: align content tests with mandatory blobStorage and fix size/mime…
- `ea0df5f` feat!: mandatory CAS blob storage with streaming I/O (OG-014)
- `05bdb3a` feat!: expose streaming content API on public surface (OG-014 checkpo…
- `d7e93c8` docs: add OG-014 retrospective
- `916449d` docs: close OG-010 — design thinking pass shipped in v15.0.0
- `166f955` docs: add JSDoc comments to undocumented exports in index.d.ts
- `3e19a91` docs: raise JSR symbol documentation to 100% coverage (OG-015)
- `de7c138` docs: close OG-015 — JSR symbol documentation at 100%
- `ec783ce` fix: resolve pre-push gate failures for OG-014 slice
- `bc12f76` fix: resolve 13 code review findings from OG-014 self-review
- `e476303` refactor!: remove deprecated TraversalService and createWriter
- `017d5a1` release: v16.0.0
- `024b0bb` chore: gitignore EDITORS-REPORT.md
- `5677907` fix: restore LogicalTraversal type declarations for traverse property
`.gitignore`:

```diff
@@ -5,4 +5,5 @@ node_modules/
 coverage/
 CLAUDE.md
 TASKS-DO-NOT-CHECK-IN.md
+EDITORS-REPORT.md
 git-stunts-git-warp-*.tgz
```
OG-010 task file:

```diff
@@ -1,6 +1,6 @@
 # OG-010 — IBM Design Thinking Pass Over Public APIs And README
 
-Status: ACTIVE
+Status: DONE
 
 ## Problem
 
```
OG-014 task file:

```diff
@@ -1,102 +1,114 @@
-# OG-014 — Stream content attachments through git-cas
+# OG-014 — Mandatory CAS blob storage with streaming I/O
 
-Status: QUEUED
+Status: DONE
 
 Legend: Observer Geometry
 
+Design doc: `docs/design/streaming-cas-blob-storage.md`
+
 ## Problem
 
-`getContent()` and `getEdgeContent()` currently return full `Uint8Array`
-buffers. That means attachment reads materialize the entire payload in memory
-before user code can process it.
-
-This is fine for small text blobs, but it is the wrong default shape for large
-attachments:
-
-- the attachment may not fit comfortably in memory
-- the caller cannot decide between buffered read and stream processing
-- builder-facing docs risk teaching attachment reads as eager byte loads
-- the current blob-storage abstraction still forces `retrieve()` to return a
-  full buffer rather than a stream-capable interface
-
-`git-warp` already contains a `CasBlobAdapter` that stores attachments in
-`git-cas` with CDC chunking, but the public attachment path still terminates in
-buffered reads. That leaves the most scalable backend present but not fully
-expressed through the public API.
+Content blob attachments in `git-warp` have two structural problems:
+
+### 1. CAS blob storage is opt-in
+
+`attachContent()` and `attachEdgeContent()` accept an optional `blobStorage`
+injection. When callers do not provide it, blobs fall through to raw
+`persistence.writeBlob()` — a single unchunked Git object with no CDC
+deduplication, no encryption support, and no streaming restore path.
+
+This means the substrate's chunking, deduplication, and encryption capabilities
+are present but silently bypassed by default. There is no good reason for a
+content blob to skip CAS. Every blob should be chunked.
+
+### 2. Neither write nor read paths support streaming
+
+**Write path**: `attachContent(nodeId, content)` accepts `Uint8Array | string`.
+The caller must buffer the entire payload in memory before handing it to the
+patch builder. `CasBlobAdapter.store()` then wraps that buffer in
+`Readable.from([buf])` — a synthetic stream from an already-buffered payload.
+
+**Read path**: `getContent(nodeId)` returns `Promise<Uint8Array | null>`. The
+full blob is materialized into memory before the caller can process it.
+`CasBlobAdapter.retrieve()` calls `cas.restore()` which buffers internally.
+
+`git-cas` already supports streaming on both sides:
+- `cas.store({ source })` accepts any readable/iterable source
+- `cas.restoreStream()` returns `AsyncIterable<Buffer>`
+
+The streaming substrate is there. It is not expressed through the public API.
 
 ## Why this matters
 
-WARP graphs can legitimately carry attached documents, artifacts, and other
-payloads that are larger than normal graph properties.
-
-The API should make the memory tradeoff explicit:
-
-- buffered reads when you actually want all bytes in memory
-- streaming reads when you want to process incrementally
-
-That decision should belong to the caller, not be forced by the default
-attachment API shape.
+WARP graphs can carry attached documents, media, model weights, and other
+payloads that are legitimately large. The API should not force full in-memory
+buffering on either side of the I/O boundary.
+
+- Callers writing large content should be able to pipe a stream in
+- Callers reading large content should be able to consume it incrementally
+- Every blob should get CDC chunking and deduplication as a substrate guarantee
+- The decision between buffered and streaming I/O should belong to the caller
 
 ## Current state
 
-Today the attachment read path is eager:
-
-- `getContent()` -> `Promise<Uint8Array|null>`
-- `getEdgeContent()` -> `Promise<Uint8Array|null>`
-- `BlobStoragePort.retrieve()` -> `Promise<Uint8Array>`
-- default Git blob reads go through `readBlob()` and collect the full blob
-- `CasBlobAdapter` can already store attachment content in `git-cas`, but it
-  still restores into one full buffer via `retrieve()`
-- `git-cas` streaming restore is already used in `CasSeekCacheAdapter`, but not
-  yet exposed through attachment reads
+As of `v15.0.1`:
+
+- `BlobStoragePort`: `store(content, options) → Promise<string>`,
+  `retrieve(oid) → Promise<Uint8Array>` — both buffered
+- `CasBlobAdapter`: fully implemented CAS adapter with CDC chunking, optional
+  encryption, backward-compat fallback to raw Git blobs — but only buffered I/O
+- `CasBlobAdapter` is internal (not exported from `index.js`)
+- `PatchBuilderV2.attachContent()`: accepts `Uint8Array | string`, uses
+  `blobStorage.store()` if injected, else raw `persistence.writeBlob()`
+- `getContent()` / `getEdgeContent()`: returns `Promise<Uint8Array | null>`,
+  uses `blobStorage.retrieve()` if injected, else raw `persistence.readBlob()`
+- `WarpApp` and `WarpCore` do not expose content read methods at all
+- `git-cas` streaming (`restoreStream()`) is already used in
+  `CasSeekCacheAdapter` but not in blob reads
+- `InMemoryGraphAdapter` has `writeBlob()`/`readBlob()` for browser/test path
 
 ## Desired outcome
 
-Make `git-cas` the first-class streaming attachment path without breaking the
-simple buffered paths.
-
-Likely shape:
-
-- `getContentStream(nodeId)`
-- `getEdgeContentStream(from, to, label)`
-- `BlobStoragePort.retrieveStream(oid)`
-- `CasBlobAdapter.retrieveStream(oid)` backed by `git-cas restoreStream()`
-- a clear default/recommended way to wire `CasBlobAdapter` into `WarpApp.open()`
-  / `WarpCore.open()` for attachment storage
-
-Buffered helpers should remain available for convenience, but they should be
-clearly layered on top of the stream-capable substrate.
-
-Longer-term, if attachment storage standardizes on `git-cas`, the builder story
-gets cleaner too:
-
-- large attachments become chunked CAS assets
-- reads can stream incrementally
-- dedupe happens below the API surface
-- legacy raw Git blob attachments can remain readable for compatibility
+1. CAS blob storage is mandatory — no fallback to raw `writeBlob()` for content
+2. Write path accepts streaming input and pipes through without buffering
+3. Read path returns a stream the caller can consume incrementally
+4. Buffered convenience methods remain available, layered on top of streams
+5. Browser and in-memory paths still work via a conforming adapter
+6. Legacy raw Git blob attachments remain readable for backward compatibility
 
 ## Acceptance criteria
 
-1. `git-warp` exposes explicit streaming APIs for node and edge attachments.
-2. Callers can choose stream vs buffered read intentionally.
-3. `BlobStoragePort` grows a stream-capable retrieval contract.
-4. `CasBlobAdapter` supports streaming retrieval via `git-cas`.
-5. `git-cas` becomes the recommended path for large attachment storage.
-6. Legacy raw Git blob attachments remain readable for compatibility.
-7. Builder docs explain when to use buffered reads vs streams.
-8. Large attachment reads no longer require full in-memory buffering by
-   default in the stream path.
+1. Every content blob written through `attachContent()` / `attachEdgeContent()`
+   goes through `BlobStoragePort` — no raw `persistence.writeBlob()` fallback.
+2. `attachContent()` / `attachEdgeContent()` accept streaming input
+   (`AsyncIterable<Uint8Array>`, `ReadableStream`, `Uint8Array`, `string`).
+3. New `getContentStream()` / `getEdgeContentStream()` return
+   `AsyncIterable<Uint8Array>` for incremental consumption.
+4. Existing `getContent()` / `getEdgeContent()` remain as buffered convenience,
+   implemented on top of the stream primitive.
+5. `BlobStoragePort` grows `storeStream()` and `retrieveStream()` methods.
+6. `CasBlobAdapter` implements streaming via `git-cas` natively.
+7. An `InMemoryBlobStorageAdapter` implements the port contract for browser and
+   test paths.
+8. Legacy raw Git blob attachments remain readable through backward-compat
+   fallback in `CasBlobAdapter.retrieveStream()`.
+9. Content stream methods are exposed on `WarpApp` and `WarpCore`.
 
 ## Non-goals
 
-- no automatic conversion of all existing attachment reads to streams
-- no silent breaking change to `getContent()` / `getEdgeContent()`
-- no attempt to solve whole-state out-of-core replay here
+- No automatic migration of existing raw Git blobs to CAS format
+- No silent breaking change to existing `getContent()` / `getEdgeContent()`
+  return types
+- No attempt to solve whole-state out-of-core replay (that is OG-013)
+- No encryption-by-default (encryption remains an opt-in CAS capability)
 
 ## Notes
 
-This item is related to, but narrower than,
-`OG-013-out-of-core-materialization-and-streaming-reads.md`.
-`OG-013` is about whole-state and replay architecture.
-This item is specifically about attachment payload I/O and making
-`git-cas` the streaming/chunked attachment path.
+This item supersedes the original OG-014 scope, which covered only streaming
+reads. The expanded scope now includes mandatory CAS and streaming writes.
+
+Related items:
+- `OG-013`: out-of-core materialization and streaming reads (broader, separate)
+- `B160`: blob attachments via CAS (done, but opt-in — this item makes it
+  mandatory)
+- `B163`: streaming restore for seek cache (done, pattern to follow for blobs)
```
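Acceptance criterion 4 asks for buffered reads layered on the stream primitive. A minimal sketch of what that layering can look like, assuming only that the stream is an `AsyncIterable<Uint8Array>` (the helper names and the `app` shape here are illustrative, not git-warp's actual internals):

```javascript
// Collect an AsyncIterable<Uint8Array> into one contiguous Uint8Array.
async function collectStream(stream) {
  const chunks = [];
  let total = 0;
  for await (const chunk of stream) {
    chunks.push(chunk);
    total += chunk.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Buffered convenience, layered on the stream primitive (sketch only).
async function getContentBuffered(app, nodeId) {
  const stream = await app.getContentStream(nodeId);
  return stream === null ? null : collectStream(stream);
}
```

With this shape, the buffered getter cannot drift from the streaming path: both read the same chunks, and the only extra work in the buffered case is the final concatenation.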
OG-015 task file:

```diff
@@ -1,6 +1,6 @@
 # OG-015 — Raise JSR documentation quality score
 
-Status: QUEUED
+Status: DONE
 
 Legend: Observer Geometry
 
```
New file (migration guide, 108 added lines):

# Migrating to v16

This guide covers the breaking changes in v16.0.0 and how to update your code.

## Content attachments now require blob storage (OG-014)

**What changed:** `attachContent()` and `attachEdgeContent()` no longer fall
back to raw `persistence.writeBlob()`. They always route through
`BlobStoragePort`. Without blob storage, they throw `NO_BLOB_STORAGE`.

**Who is affected:** Only consumers who construct `PatchBuilderV2` directly
(bypassing `WarpRuntime.open()`) without passing `blobStorage`. If you use
`WarpApp.open()`, `WarpCore.open()`, or `WarpRuntime.open()`, blob storage
is auto-constructed — no code changes needed.

**How to migrate:**

```javascript
import InMemoryBlobStorageAdapter from '@git-stunts/git-warp/defaultBlobStorage';

// If you construct PatchBuilderV2 directly, add blobStorage:
const builder = new PatchBuilderV2({
  persistence,
  writerId: 'alice',
  blobStorage: new InMemoryBlobStorageAdapter(), // or CasBlobAdapter for Git
  // ...other options
});
```
## Streaming content I/O

**What changed:** `attachContent()` and `attachEdgeContent()` now accept
streaming input (`AsyncIterable<Uint8Array>`, `ReadableStream<Uint8Array>`)
in addition to `Uint8Array` and `string`. New `getContentStream()` and
`getEdgeContentStream()` methods return `AsyncIterable<Uint8Array>`.

**Who is affected:** No one — this is additive. Existing code continues to
work. New streaming APIs are opt-in.

**How to use:**

```javascript
// Streaming write — pipe a file directly
import { createReadStream } from 'node:fs';

const patch = await app.createPatch();
patch.addNode('doc:1');
await patch.attachContent('doc:1', createReadStream('large-file.bin'), {
  size: fileStat.size,
  mime: 'application/octet-stream',
});
await patch.commit();

// Streaming read — consume incrementally
const stream = await app.getContentStream('doc:1');
if (stream) {
  for await (const chunk of stream) {
    process.stdout.write(chunk);
  }
}

// Buffered read — unchanged
const buf = await app.getContent('doc:1');
```
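Under the hood, the four accepted input shapes can all be normalized to a single `AsyncIterable<Uint8Array>` before handing off to blob storage. A hedged sketch of such a normalizer (the function name and branch order are illustrative, not git-warp's actual implementation):

```javascript
// Normalize string | Uint8Array | AsyncIterable | ReadableStream
// into one AsyncIterable<Uint8Array>.
async function* toChunks(content) {
  if (typeof content === 'string') {
    yield new TextEncoder().encode(content);
  } else if (content instanceof Uint8Array) {
    yield content;
  } else if (content && typeof content[Symbol.asyncIterator] === 'function') {
    // Node Readable streams and async generators land here
    yield* content;
  } else if (content && typeof content.getReader === 'function') {
    // Web ReadableStream without async-iteration support
    const reader = content.getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) return;
        yield value;
      }
    } finally {
      reader.releaseLock();
    }
  } else {
    throw new TypeError('unsupported content input');
  }
}
```

The value of a single choke point like this is that chunking, size accounting, and the CAS handoff only have to be written once, regardless of which shape the caller provided.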
## Content blob tree entries use tree mode

**What changed:** Patch commit trees and checkpoint trees now reference
content blobs as `040000 tree` entries (CAS tree OIDs) instead of
`100644 blob` entries.

**Who is affected:** Consumers who parse raw Git commit trees and expect
content anchor entries to use blob mode. This does not affect any public
API — it is an internal storage format change.

**How to migrate:** If you parse `_content_<oid>` entries from commit trees,
update your parser to accept `040000 tree` mode.
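For illustration, a parser that accepts both the legacy blob mode and the new tree mode might look like the sketch below. The line format mirrors `git ls-tree` output; the function name and regex are hypothetical, not part of git-warp:

```javascript
// Accept both legacy blob entries (pre-v16) and v16 tree entries for
// _content_<oid> anchors. Returns null for lines that are not content anchors.
const ENTRY_RE =
  /^(100644|040000) (blob|tree) ([0-9a-f]{40})\t(_content_[0-9a-f]{40})$/;

function parseContentEntry(line) {
  const m = ENTRY_RE.exec(line);
  if (!m) return null;
  const [, mode, type, oid, name] = m;
  // v16 writes content anchors as tree entries; pre-v16 wrote blob entries.
  return { mode, type, oid, name, isCasTree: mode === '040000' };
}
```

Matching on mode rather than rejecting unknown modes keeps the parser forward-compatible with both formats during a mixed-history transition.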
## `TraversalService` removed

**What changed:** The `TraversalService` export was a deprecated alias for
`CommitDagTraversalService`. It has been removed.

**How to migrate:**

```javascript
// Before
import { TraversalService } from '@git-stunts/git-warp';

// After
import { CommitDagTraversalService } from '@git-stunts/git-warp';
```

## `createWriter()` removed

**What changed:** The `createWriter()` method on `WarpApp` was deprecated in
v15 and has been removed. Use `writer()` instead.

**How to migrate:**

```javascript
// Before
const w = await app.createWriter();
const w2 = await app.createWriter({ persist: 'config', alias: 'secondary' });

// After
const w = await app.writer(); // resolves from git config or generates
const w2 = await app.writer('secondary'); // explicit ID
```
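For context on the blob-storage requirement, a conforming in-memory adapter (in the spirit of the `InMemoryBlobStorageAdapter` the guide imports) can be quite small. The sketch below invents its internals and OID scheme; only the method shape, buffered `store`/`retrieve` layered on streaming variants, follows the migration guide:

```javascript
// Minimal in-memory blob store matching the buffered + streaming port shape.
// Names and internals are illustrative, not the real adapter.
class InMemoryBlobStore {
  constructor() {
    this.blobs = new Map(); // oid -> Uint8Array
    this.nextId = 0;
  }

  // Streaming write: consume any (a)sync iterable of Uint8Array chunks.
  async storeStream(chunks) {
    const parts = [];
    let total = 0;
    for await (const chunk of chunks) {
      parts.push(chunk);
      total += chunk.length;
    }
    const buf = new Uint8Array(total);
    let offset = 0;
    for (const part of parts) {
      buf.set(part, offset);
      offset += part.length;
    }
    const oid = `mem-${this.nextId++}`;
    this.blobs.set(oid, buf);
    return oid;
  }

  // Buffered write, layered on the streaming path.
  async store(content) {
    const buf =
      typeof content === 'string' ? new TextEncoder().encode(content) : content;
    return this.storeStream([buf]);
  }

  // Streaming read: yields the stored bytes (a real adapter may chunk).
  async *retrieveStream(oid) {
    const buf = this.blobs.get(oid);
    if (!buf) throw new Error(`unknown oid: ${oid}`);
    yield buf;
  }

  // Buffered read, layered on the streaming path.
  async retrieve(oid) {
    const parts = [];
    let total = 0;
    for await (const chunk of this.retrieveStream(oid)) {
      parts.push(chunk);
      total += chunk.length;
    }
    const out = new Uint8Array(total);
    let offset = 0;
    for (const part of parts) {
      out.set(part, offset);
      offset += part.length;
    }
    return out;
  }
}
```

Keeping the buffered methods as thin wrappers over the streaming ones mirrors the layering OG-014 asks of the real adapters, so a test double like this exercises the same code shape.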
Review comment:

Criterion 4 does not match the current implementation.
`src/domain/warp/query.methods.js:737-760` and
`src/domain/warp/query.methods.js:803-826` still call `retrieve()` directly
for the buffered getters, so they are not yet "implemented on top of the
stream primitive" as written. Please either relax this wording or route the
buffered helpers through the stream path before closing OG-014.