Merged
16 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -5,4 +5,5 @@ node_modules/
coverage/
CLAUDE.md
TASKS-DO-NOT-CHECK-IN.md
EDITORS-REPORT.md
git-stunts-git-warp-*.tgz
2 changes: 1 addition & 1 deletion BACKLOG/OG-010-public-api-design-thinking.md
@@ -1,6 +1,6 @@
# OG-010 — IBM Design Thinking Pass Over Public APIs And README

Status: ACTIVE
Status: DONE

## Problem

156 changes: 84 additions & 72 deletions BACKLOG/OG-014-streaming-content-attachments.md
@@ -1,102 +1,114 @@
# OG-014 — Stream content attachments through git-cas
# OG-014 — Mandatory CAS blob storage with streaming I/O

Status: QUEUED
Status: DONE

Legend: Observer Geometry

## Problem

`getContent()` and `getEdgeContent()` currently return full `Uint8Array`
buffers. That means attachment reads materialize the entire payload in memory
before user code can process it.

This is fine for small text blobs, but it is the wrong default shape for large
attachments:
Design doc: `docs/design/streaming-cas-blob-storage.md`

- the attachment may not fit comfortably in memory
- the caller cannot decide between buffered read and stream processing
- builder-facing docs risk teaching attachment reads as eager byte loads
- the current blob-storage abstraction still forces `retrieve()` to return a
full buffer rather than a stream-capable interface
## Problem

`git-warp` already contains a `CasBlobAdapter` that stores attachments in
`git-cas` with CDC chunking, but the public attachment path still terminates in
buffered reads. That leaves the most scalable backend present but not fully
expressed through the public API.
Content blob attachments in `git-warp` have two structural problems:

## Why this matters
### 1. CAS blob storage is opt-in

WARP graphs can legitimately carry attached documents, artifacts, and other
payloads that are larger than normal graph properties.
`attachContent()` and `attachEdgeContent()` accept an optional `blobStorage`
injection. When callers do not provide it, blobs fall through to raw
`persistence.writeBlob()` — a single unchunked Git object with no CDC
deduplication, no encryption support, and no streaming restore path.

The API should make the memory tradeoff explicit:
This means the substrate's chunking, deduplication, and encryption capabilities
are present but silently bypassed by default. There is no good reason for a
content blob to skip CAS. Every blob should be chunked.

- buffered reads when you actually want all bytes in memory
- streaming reads when you want to process incrementally
### 2. Neither write nor read paths support streaming

That decision should belong to the caller, not be forced by the default
attachment API shape.
**Write path**: `attachContent(nodeId, content)` accepts `Uint8Array | string`.
The caller must buffer the entire payload in memory before handing it to the
patch builder. `CasBlobAdapter.store()` then wraps that buffer in
`Readable.from([buf])` — a synthetic stream from an already-buffered payload.

## Current state
**Read path**: `getContent(nodeId)` returns `Promise<Uint8Array | null>`. The
full blob is materialized into memory before the caller can process it.
`CasBlobAdapter.retrieve()` calls `cas.restore()` which buffers internally.

Today the attachment read path is eager:
`git-cas` already supports streaming on both sides:
- `cas.store({ source })` accepts any readable/iterable source
- `cas.restoreStream()` returns `AsyncIterable<Buffer>`

- `getContent()` -> `Promise<Uint8Array|null>`
- `getEdgeContent()` -> `Promise<Uint8Array|null>`
- `BlobStoragePort.retrieve()` -> `Promise<Uint8Array>`
- default Git blob reads go through `readBlob()` and collect the full blob
- `CasBlobAdapter` can already store attachment content in `git-cas`, but it
still restores into one full buffer via `retrieve()`
- `git-cas` streaming restore is already used in `CasSeekCacheAdapter`, but not
yet exposed through attachment reads
The streaming substrate is there. It is not expressed through the public API.

## Desired outcome
## Why this matters

Make `git-cas` the first-class streaming attachment path without breaking the
simple buffered paths.
WARP graphs can carry attached documents, media, model weights, and other
payloads that are legitimately large. The API should not force full in-memory
buffering on either side of the I/O boundary.

Likely shape:
- Callers writing large content should be able to pipe a stream in
- Callers reading large content should be able to consume it incrementally
- Every blob should get CDC chunking and deduplication as a substrate guarantee
- The decision between buffered and streaming I/O should belong to the caller

- `getContentStream(nodeId)`
- `getEdgeContentStream(from, to, label)`
- `BlobStoragePort.retrieveStream(oid)`
- `CasBlobAdapter.retrieveStream(oid)` backed by `git-cas restoreStream()`
- a clear default/recommended way to wire `CasBlobAdapter` into `WarpApp.open()`
/ `WarpCore.open()` for attachment storage
## Current state

Buffered helpers should remain available for convenience, but they should be
clearly layered on top of the stream-capable substrate.
As of `v15.0.1`:

- `BlobStoragePort`: `store(content, options) → Promise<string>`,
`retrieve(oid) → Promise<Uint8Array>` — both buffered
- `CasBlobAdapter`: fully implemented CAS adapter with CDC chunking, optional
encryption, backward-compat fallback to raw Git blobs — but only buffered I/O
- `CasBlobAdapter` is internal (not exported from `index.js`)
- `PatchBuilderV2.attachContent()`: accepts `Uint8Array | string`, uses
`blobStorage.store()` if injected, else raw `persistence.writeBlob()`
- `getContent()` / `getEdgeContent()`: returns `Promise<Uint8Array | null>`,
uses `blobStorage.retrieve()` if injected, else raw `persistence.readBlob()`
- `WarpApp` and `WarpCore` do not expose content read methods at all
- `git-cas` streaming (`restoreStream()`) is already used in
`CasSeekCacheAdapter` but not in blob reads
- `InMemoryGraphAdapter` has `writeBlob()`/`readBlob()` for browser/test path

Longer-term, if attachment storage standardizes on `git-cas`, the builder story
gets cleaner too:
## Desired outcome

- large attachments become chunked CAS assets
- reads can stream incrementally
- dedupe happens below the API surface
- legacy raw Git blob attachments can remain readable for compatibility
1. CAS blob storage is mandatory — no fallback to raw `writeBlob()` for content
2. Write path accepts streaming input and pipes through without buffering
3. Read path returns a stream the caller can consume incrementally
4. Buffered convenience methods remain available, layered on top of streams
5. Browser and in-memory paths still work via a conforming adapter
6. Legacy raw Git blob attachments remain readable for backward compatibility

## Acceptance criteria

1. `git-warp` exposes explicit streaming APIs for node and edge attachments.
2. Callers can choose stream vs buffered read intentionally.
3. `BlobStoragePort` grows a stream-capable retrieval contract.
4. `CasBlobAdapter` supports streaming retrieval via `git-cas`.
5. `git-cas` becomes the recommended path for large attachment storage.
6. Legacy raw Git blob attachments remain readable for compatibility.
7. Builder docs explain when to use buffered reads vs streams.
8. Large attachment reads no longer require full in-memory buffering by
default in the stream path.
1. Every content blob written through `attachContent()` / `attachEdgeContent()`
goes through `BlobStoragePort` — no raw `persistence.writeBlob()` fallback.
2. `attachContent()` / `attachEdgeContent()` accept streaming input
(`AsyncIterable<Uint8Array>`, `ReadableStream`, `Uint8Array`, `string`).
3. New `getContentStream()` / `getEdgeContentStream()` return
`AsyncIterable<Uint8Array>` for incremental consumption.
4. Existing `getContent()` / `getEdgeContent()` remain as buffered convenience,
implemented on top of the stream primitive.
Comment on lines +87 to +88
⚠️ Potential issue | 🟡 Minor

Criterion 4 does not match the current implementation.

`src/domain/warp/query.methods.js:737-760` and `src/domain/warp/query.methods.js:803-826` still call `retrieve()` directly for the buffered getters, so they are not yet "implemented on top of the stream primitive" as written. Either relax this wording or route the buffered helpers through the stream path before closing OG-014.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@BACKLOG/OG-014-streaming-content-attachments.md` around lines 87 - 88, The
buffered helpers getContent() and getEdgeContent() currently call retrieve()
directly (see retrieve()), which contradicts the stated design that they should
be implemented on top of the stream primitive; update those functions so they
route through the stream-based path instead of calling retrieve() directly —
e.g., have getContent()/getEdgeContent() call the stream primitive (the existing
streaming retrieval function) and then collect the stream into a buffer before
returning, or alternatively relax the spec text to reflect current
implementation if you don't want to change code; ensure you update references to
retrieve(), getContent(), and getEdgeContent() so the buffered getters either
wrap the stream primitive or the doc is revised.
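
One way to route the buffered getters through the stream path, as the comment suggests, is to drain the stream into a single `Uint8Array`. The sketch below is illustrative only: `collectStream` and `getContentBuffered` are hypothetical names, and `getStream` stands in for `getContentStream()`, which is assumed to resolve to an `AsyncIterable<Uint8Array>` or `null`.

```javascript
// Hypothetical helper: drain an AsyncIterable<Uint8Array> into one Uint8Array.
async function collectStream(stream) {
  const chunks = [];
  let total = 0;
  for await (const chunk of stream) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }
  // Allocate once, then copy each chunk to its offset.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Hypothetical buffered getter layered on a stream getter.
// `getStream` is an assumption standing in for getContentStream().
async function getContentBuffered(getStream, nodeId) {
  const stream = await getStream(nodeId);
  return stream === null ? null : collectStream(stream);
}
```

With this shape the buffered and streaming getters share one retrieval path, so any backward-compat fallback only needs to live in the stream implementation.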

5. `BlobStoragePort` grows `storeStream()` and `retrieveStream()` methods.
6. `CasBlobAdapter` implements streaming via `git-cas` natively.
7. An `InMemoryBlobStorageAdapter` implements the port contract for browser and
test paths.
8. Legacy raw Git blob attachments remain readable through backward-compat
fallback in `CasBlobAdapter.retrieveStream()`.
9. Content stream methods are exposed on `WarpApp` and `WarpCore`.
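
Criterion 2 lists four accepted input shapes. A minimal sketch of how they could be normalized to a single `AsyncIterable<Uint8Array>` before reaching `storeStream()` follows. `toAsyncIterable` is a hypothetical name, not a function from the codebase, and encoding strings as UTF-8 is an assumption.

```javascript
// Hypothetical normalizer: turn any accepted input into AsyncIterable<Uint8Array>.
function toAsyncIterable(content) {
  if (typeof content === 'string') {
    const bytes = new TextEncoder().encode(content); // assumption: UTF-8
    return (async function* () { yield bytes; })();
  }
  if (content instanceof Uint8Array) {
    return (async function* () { yield content; })();
  }
  // Node Readable streams and async generators already iterate asynchronously.
  if (content && typeof content[Symbol.asyncIterator] === 'function') {
    return content;
  }
  // WHATWG ReadableStream in runtimes that lack async-iterator support.
  if (content && typeof content.getReader === 'function') {
    return (async function* () {
      const reader = content.getReader();
      try {
        for (;;) {
          const { done, value } = await reader.read();
          if (done) return;
          yield value;
        }
      } finally {
        reader.releaseLock();
      }
    })();
  }
  throw new TypeError('Unsupported content type');
}
```

Normalizing at the boundary keeps the port contract narrow: the storage adapter only ever sees one input type.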

## Non-goals

- no automatic conversion of all existing attachment reads to streams
- no silent breaking change to `getContent()` / `getEdgeContent()`
- no attempt to solve whole-state out-of-core replay here
- No automatic migration of existing raw Git blobs to CAS format
- No silent breaking change to existing `getContent()` / `getEdgeContent()`
return types
- No attempt to solve whole-state out-of-core replay (that is OG-013)
- No encryption-by-default (encryption remains an opt-in CAS capability)

## Notes

This item is related to, but narrower than,
`OG-013-out-of-core-materialization-and-streaming-reads.md`.
`OG-013` is about whole-state and replay architecture.
This item is specifically about attachment payload I/O and making
`git-cas` the streaming/chunked attachment path.
This item supersedes the original OG-014 scope, which covered only streaming
reads. The expanded scope now includes mandatory CAS and streaming writes.

Related items:
- `OG-013`: out-of-core materialization and streaming reads (broader, separate)
- `B160`: blob attachments via CAS (done, but opt-in — this item makes it
mandatory)
- `B163`: streaming restore for seek cache (done, pattern to follow for blobs)
2 changes: 1 addition & 1 deletion BACKLOG/OG-015-jsr-documentation-quality.md
@@ -1,6 +1,6 @@
# OG-015 — Raise JSR documentation quality score

Status: QUEUED
Status: DONE

Legend: Observer Geometry

8 changes: 4 additions & 4 deletions BACKLOG/README.md
@@ -1,6 +1,6 @@
# BACKLOG — Observer Geometry

Last updated: 2026-03-28
Last updated: 2026-03-29

This directory holds promotable pre-design items for the current Observer
Geometry tranche.
@@ -26,9 +26,9 @@ Workflow:
| DONE | OG-007 | Expand hash-stability coverage across snapshot flavors | [OG-007-hash-stability-coverage.md](OG-007-hash-stability-coverage.md) |
| DONE | OG-008 | Make retargeting compatibility a hard major-version cut | [OG-008-retargeting-compatibility.md](OG-008-retargeting-compatibility.md) |
| QUEUED | OG-009 | Align playback-head and TTD consumers after read nouns stabilize | [OG-009-playback-head-alignment.md](OG-009-playback-head-alignment.md) |
| ACTIVE | OG-010 | IBM Design Thinking pass over public APIs and README | [OG-010-public-api-design-thinking.md](OG-010-public-api-design-thinking.md) |
| DONE | OG-010 | IBM Design Thinking pass over public APIs and README | [OG-010-public-api-design-thinking.md](OG-010-public-api-design-thinking.md) |
| QUEUED | OG-011 | Publish a public API catalog and browser documentation playground | [OG-011-public-api-catalog-and-playground.md](OG-011-public-api-catalog-and-playground.md) |
| DONE | OG-012 | Audit and reconcile the documentation corpus before v15 | [OG-012-documentation-corpus-audit.md](OG-012-documentation-corpus-audit.md) |
| QUEUED | OG-013 | Design out-of-core materialization and streaming reads | [OG-013-out-of-core-materialization-and-streaming-reads.md](OG-013-out-of-core-materialization-and-streaming-reads.md) |
| QUEUED | OG-014 | Stream content attachments through `git-cas` | [OG-014-streaming-content-attachments.md](OG-014-streaming-content-attachments.md) |
| QUEUED | OG-015 | Raise JSR documentation quality score | [OG-015-jsr-documentation-quality.md](OG-015-jsr-documentation-quality.md) |
| DONE | OG-014 | Mandatory CAS blob storage with streaming I/O | [OG-014-streaming-content-attachments.md](OG-014-streaming-content-attachments.md) |
| DONE | OG-015 | Raise JSR documentation quality score | [OG-015-jsr-documentation-quality.md](OG-015-jsr-documentation-quality.md) |
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [16.0.0] — 2026-03-29

### Added

- **Streaming content attachment I/O (OG-014)** — `getContentStream()` and `getEdgeContentStream()` return `AsyncIterable<Uint8Array>` for incremental consumption of large content blobs. `attachContent()` and `attachEdgeContent()` now accept `AsyncIterable<Uint8Array>`, `ReadableStream<Uint8Array>`, `Uint8Array`, or `string` — streaming inputs are piped directly to blob storage without intermediate buffering.
- **`InMemoryBlobStorageAdapter`** — new domain-local adapter implementing `BlobStoragePort` with content-addressed `Map`-based storage for browser and test paths. Exported from the package surface.
- **`BlobStoragePort.storeStream()` / `retrieveStream()`** — streaming variants of the blob storage port contract. `storeStream()` accepts `AsyncIterable<Uint8Array>`, `retrieveStream()` returns `AsyncIterable<Uint8Array>`.
- **Content methods on `WarpApp` and `WarpCore`** — `getContent()`, `getContentStream()`, `getContentOid()`, `getContentMeta()` and their edge equivalents are now exposed on both public API surfaces (previously only on `WarpRuntime`).

### Changed

- **CAS blob storage is now mandatory for content attachments** — `attachContent()` and `attachEdgeContent()` always route through `BlobStoragePort`. The raw `persistence.writeBlob()` fallback has been removed. `WarpRuntime.open()` auto-constructs `CasBlobAdapter` for Git-backed persistence (when `plumbing` is available) or `InMemoryBlobStorageAdapter` otherwise.
- **Content blob tree entries use tree mode** — patch commit trees and checkpoint trees now reference content blobs as `040000 tree` entries (matching CAS tree OIDs) instead of `100644 blob` entries.

### Removed

- **`TraversalService`** — deprecated alias removed. Use `CommitDagTraversalService` directly.
- **`createWriter()`** — deprecated method removed from `WarpApp` and `WarpCore`. Use `writer()` or `writer(id)` instead.

## [15.0.0] — 2026-03-28

## [15.0.1] — 2026-03-28
108 changes: 108 additions & 0 deletions MIGRATING.md
@@ -0,0 +1,108 @@
# Migrating to v16

This guide covers the breaking changes in v16.0.0 and how to update your code.

## Content attachments now require blob storage (OG-014)

**What changed:** `attachContent()` and `attachEdgeContent()` no longer fall
back to raw `persistence.writeBlob()`. They always route through
`BlobStoragePort`. Without blob storage, they throw `NO_BLOB_STORAGE`.

**Who is affected:** Only consumers who construct `PatchBuilderV2` directly
(bypassing `WarpRuntime.open()`) without passing `blobStorage`. If you use
`WarpApp.open()`, `WarpCore.open()`, or `WarpRuntime.open()`, blob storage
is auto-constructed — no code changes needed.

**How to migrate:**

```javascript
import InMemoryBlobStorageAdapter from '@git-stunts/git-warp/defaultBlobStorage';

// If you construct PatchBuilderV2 directly, add blobStorage:
const builder = new PatchBuilderV2({
persistence,
writerId: 'alice',
blobStorage: new InMemoryBlobStorageAdapter(), // or CasBlobAdapter for Git
// ...other options
});
```

## Streaming content I/O

**What changed:** `attachContent()` and `attachEdgeContent()` now accept
streaming input (`AsyncIterable<Uint8Array>`, `ReadableStream<Uint8Array>`)
in addition to `Uint8Array` and `string`. New `getContentStream()` and
`getEdgeContentStream()` methods return `AsyncIterable<Uint8Array>`.

**Who is affected:** No one — this is additive. Existing code continues to
work. New streaming APIs are opt-in.

**How to use:**

```javascript
// Streaming write — pipe a file directly
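import { createReadStream } from 'node:fs';
import { stat } from 'node:fs/promises';
const fileStat = await stat('large-file.bin'); // supplies the size passed below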
const patch = await app.createPatch();
patch.addNode('doc:1');
await patch.attachContent('doc:1', createReadStream('large-file.bin'), {
size: fileStat.size,
mime: 'application/octet-stream',
});
await patch.commit();

// Streaming read — consume incrementally
const stream = await app.getContentStream('doc:1');
if (stream) {
for await (const chunk of stream) {
process.stdout.write(chunk);
}
}

// Buffered read — unchanged
const buf = await app.getContent('doc:1');
```

## Content blob tree entries use tree mode

**What changed:** Patch commit trees and checkpoint trees now reference
content blobs as `040000 tree` entries (CAS tree OIDs) instead of
`100644 blob` entries.

**Who is affected:** Consumers who parse raw Git commit trees and expect
content anchor entries to use blob mode. This does not affect any public
API — it is an internal storage format change.

**How to migrate:** If you parse `_content_<oid>` entries from commit trees,
update your parser to accept `040000 tree` mode.
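
As a rough illustration for consumers that read `git ls-tree` output, the sketch below accepts content anchors in either mode. The function names `parseTreeEntry` and `isContentAnchor` are hypothetical, and the parser assumes the default `git ls-tree` line format (`<mode> <type> <oid>` followed by a tab and the entry name) with unquoted names.

```javascript
// Parse one line of default `git ls-tree` output: "<mode> <type> <oid>\t<name>".
// Returns null for lines that do not match.
function parseTreeEntry(line) {
  const match = /^(\d{6}) (blob|tree|commit) ([0-9a-f]{40,64})\t(.+)$/.exec(line);
  if (!match) return null;
  const [, mode, type, oid, name] = match;
  return { mode, type, oid, name };
}

// Accept both the legacy blob-mode anchor and the v16 tree-mode anchor.
function isContentAnchor(entry) {
  if (!entry.name.startsWith('_content_')) return false;
  const legacy = entry.mode === '100644' && entry.type === 'blob';
  const cas = entry.mode === '040000' && entry.type === 'tree';
  return legacy || cas;
}
```

Accepting both modes lets one parser read trees written before and after the upgrade.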

## `TraversalService` removed

**What changed:** The `TraversalService` export was a deprecated alias for
`CommitDagTraversalService`. It has been removed.

**How to migrate:**

```javascript
// Before
import { TraversalService } from '@git-stunts/git-warp';

// After
import { CommitDagTraversalService } from '@git-stunts/git-warp';
```

## `createWriter()` removed

**What changed:** The `createWriter()` method on `WarpApp` and `WarpCore` was
deprecated in v15 and has been removed. Use `writer()` or `writer(id)` instead.

**How to migrate:**

```javascript
// Before
const w = await app.createWriter();
const w2 = await app.createWriter({ persist: 'config', alias: 'secondary' });

// After
const w = await app.writer(); // resolves from git config or generates
const w2 = await app.writer('secondary'); // explicit ID
```