Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`git cas agent rotate`** — added a machine-facing rotation flow so Relay can rotate recipient keys by slug or detached tree OID and expose the resulting tree and vault side effects explicitly.
- **`git cas agent vault rotate`** — added a machine-facing vault passphrase rotation flow so Relay can rotate encrypted vault state with explicit commit, KDF, and rotated/skipped-entry results.
- **`git cas agent vault init|remove`** — added machine-facing vault lifecycle commands so Relay can initialize encrypted or plaintext vaults and remove entries without scraping human CLI output.
- **Benchmark baselines doc** — added [docs/BENCHMARKS.md](./docs/BENCHMARKS.md) with the first published chunking baseline, including fixed-size versus CDC throughput, dedupe reuse results, and refresh instructions.
- **Threat model doc** — added [docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md) as the canonical statement of attacker models, trust boundaries, exposed metadata, and explicit non-goals.
- **Workflow model** — added [WORKFLOW.md](./WORKFLOW.md), explicit legends/backlog/invariants directories, and a cycle-first planning model for fresh work.
- **Review automation baseline** — added `.github/CODEOWNERS` with repo-wide ownership for `@git-stunts`.
2 changes: 1 addition & 1 deletion docs/BACKLOG/README.md
@@ -29,13 +29,13 @@ If the planning history is still useful, move it to

Current backlog items:

- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md)
- [TR-005 — CasService Decomposition Plan](./TR-005-casservice-decomposition-plan.md)
- [TR-006 — Docs Maintainer Checklist](./TR-006-docs-maintainer-checklist.md)
- [TR-007 — Security Doc Discoverability Audit](./TR-007-security-doc-discoverability-audit.md)
- [TR-008 — Empty-State Phrasing Consistency](./TR-008-empty-state-phrasing-consistency.md)
- [TR-009 — Pre-PR Doc Cross-Link Audit](./TR-009-pre-pr-doc-cross-link-audit.md)
- [TR-010 — Planning Index Consistency Review](./TR-010-planning-index-consistency-review.md)
- [TR-011 — Streaming Encrypted Restore](./TR-011-streaming-encrypted-restore.md)

Archived delivered backlog items:

46 changes: 46 additions & 0 deletions docs/BACKLOG/TR-011-streaming-encrypted-restore.md
@@ -0,0 +1,46 @@
# TR-011 — Streaming Encrypted Restore

## Legend

- [TR — Truth](../legends/TR-truth.md)

## Why This Exists

`git-cas` currently streams plaintext restores chunk-by-chunk, but encrypted or
compressed restores buffer the full payload in memory before yielding output.

That is safe and simple for the current whole-object AES-GCM format, but it
also means large encrypted restores are bounded by `maxRestoreBufferSize` and do
not yet benefit from a lower-memory temp-file streaming approach.

## Target Outcome

Produce a design-backed investigation of streaming encrypted/compressed restore,
including:

- current integrity and buffering constraints
- whether decrypt-to-temp-file plus atomic rename is the right model
- benchmark questions needed to compare memory and throughput tradeoffs
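The temp-file model under investigation can be sketched in a few lines of Node. This is a hypothetical illustration, not the current `git-cas` restore path: the function name, key handling, and file layout are invented; only the Node `crypto`, `fs`, and `stream` APIs are real.

```javascript
import { createDecipheriv, randomBytes } from 'node:crypto';
import { createReadStream, createWriteStream } from 'node:fs';
import { rename, unlink } from 'node:fs/promises';
import { pipeline } from 'node:stream/promises';

// Hypothetical sketch: ciphertext streams through AES-256-GCM into a
// sibling temp file, and the atomic rename happens only after
// decipher.final() (run inside pipeline) has verified the auth tag, so
// the destination path never observes unverified plaintext.
async function restoreEncryptedToFile(srcPath, destPath, key, iv, authTag) {
  const tmpPath = `${destPath}.tmp-${randomBytes(6).toString('hex')}`;
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag);
  try {
    await pipeline(
      createReadStream(srcPath),
      decipher,
      createWriteStream(tmpPath),
    );
    // Atomic within a single filesystem on POSIX: readers see either the
    // old destination or the fully verified new one, never a partial file.
    await rename(tmpPath, destPath);
  } catch (err) {
    await unlink(tmpPath).catch(() => {}); // best-effort temp cleanup
    throw err;
  }
}
```

Note the semantics this model trades away: the temp file does hold not-yet-verified plaintext until the tag check passes, and peak memory drops from the whole payload to stream-buffer size. Both are exactly the kinds of observations the investigation should weigh.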

## Human Value

Maintainers and operators should be able to understand whether large encrypted
restores can become more memory-efficient without weakening integrity
guarantees.

## Agent Value

Agents should be able to reason about encrypted restore constraints and propose
bounded follow-on work without hand-waving around the current buffering model.

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Notes

- distinguish plaintext streaming from encrypted/compressed restore behavior
- account for the current whole-object AES-GCM tag model
- evaluate temp-file restore semantics before considering direct-to-destination
writes
- tie any design work to benchmark and memory observations, not intuition alone
130 changes: 130 additions & 0 deletions docs/BENCHMARKS.md
@@ -0,0 +1,130 @@
# Benchmarks

This document records published baseline measurements for `git-cas`.

These numbers are meant to be:

- honest
- reproducible enough for maintainers to refresh
- useful for human and agent tradeoff discussions

They are not meant to be universal truths across every machine, runtime, or
repository shape.

## Current Scope

The first published baseline focuses on chunking tradeoffs:

- fixed-size chunking
- CDC (content-defined chunking)

This is the highest-value first comparison because it exposes the core tradeoff
that users ask about most often:

- fixed chunking is cheaper and faster
- CDC preserves dedupe much better when small edits shift later bytes

The repo also contains broader CAS benchmarks in
[`test/benchmark/cas.bench.js`](../test/benchmark/cas.bench.js), but those
results are not yet published here as a maintained baseline.

## Benchmark Configuration

Observed on **March 30, 2026** with:

- command:
`CI=1 npx vitest bench --run test/benchmark/chunking.bench.js`
- machine: Apple M1 Pro
- memory: 16 GiB
- OS: macOS 26.3 (`25D125`)
- runtime: Node `v25.8.1`
- package manager: npm `11.11.0`
- benchmark runner: Vitest `2.1.9`

The current harness uses:

- seeded pseudo-random input buffers for reproducibility
- buffer sizes: `1 MB`, `10 MB`, `100 MB`
- fixed chunking: `16 KiB`
- CDC:
`minChunkSize=4096`, `targetChunkSize=16384`, `maxChunkSize=65536`
- dedupe scenario:
a `1 MB` base file with deterministic inserted edits of `1`, `10`, `100`,
and `1000` bytes about one-third into the file
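To make those knobs concrete, here is a deliberately simplified content-defined chunker using the same parameter names, plus a seeded input generator. This is illustrative code, not the repo's chunker — the real implementation's hash and boundary rule will differ — but it shows how the min/target/max sizes interact:

```javascript
// Illustrative CDC: declare a boundary where the masked low bits of a
// cheap rolling-style hash hit zero, clamped by min/max chunk sizes.
// Because the masked bits depend only on recent bytes, boundaries
// resynchronize shortly after an inserted edit.
function cdcChunks(buf, opts = {}) {
  const {
    minChunkSize = 4096,
    targetChunkSize = 16384, // assumed to be a power of two
    maxChunkSize = 65536,
  } = opts;
  const mask = targetChunkSize - 1;
  const chunks = [];
  let start = 0;
  let hash = 0;
  for (let i = 0; i < buf.length; i++) {
    hash = ((hash << 1) + buf[i]) >>> 0; // old bytes shift out of the mask
    const len = i - start + 1;
    if ((len >= minChunkSize && (hash & mask) === 0) || len >= maxChunkSize) {
      chunks.push(buf.subarray(start, i + 1));
      start = i + 1;
      hash = 0;
    }
  }
  if (start < buf.length) chunks.push(buf.subarray(start));
  return chunks;
}

// Deterministic pseudo-random input, so reruns chunk identically.
function seededBuffer(size, seed = 1) {
  const buf = Buffer.alloc(size);
  let s = seed >>> 0;
  for (let i = 0; i < size; i++) {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0; // 32-bit LCG step
    buf[i] = s >>> 24; // take the high byte; low LCG bits are weak
  }
  return buf;
}
```

Average chunk size lands near `minChunkSize` plus the target spacing, with `maxChunkSize` as a hard cap — which is why CDC spends more CPU per byte than a fixed `16 KiB` slicer ever needs to.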

One implementation detail to keep in mind:
Vitest emitted multiple pass blocks during the one-shot run on this machine.
The throughput table below records the final reported block from that run. The
dedupe table is deterministic in this harness and was stable across the
observed output.

## Throughput Baseline

Observed chunking throughput:

| Strategy | Buffer | Mean time | Throughput |
| -------- | -------- | -----------: | ------------: |
| CDC | `1 MB` | `4.0060 ms` | `249.62 hz` |
| CDC | `10 MB` | `36.8944 ms` | `27.1044 hz` |
| CDC | `100 MB` | `342.75 ms` | `2.9176 hz` |
| Fixed | `1 MB` | `0.1401 ms` | `7,137.96 hz` |
| Fixed | `10 MB` | `1.1948 ms` | `836.96 hz` |
| Fixed | `100 MB` | `13.1405 ms` | `76.1006 hz` |

Observed speed advantage for fixed chunking on this machine:

- `1 MB`: about `28.6x` faster than CDC
- `10 MB`: about `30.9x` faster than CDC
- `100 MB`: about `26.1x` faster than CDC
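The multipliers are simply ratios of the mean times in the table above; recomputing them:

```javascript
// Speed advantage = CDC mean time / fixed mean time, per buffer size.
const meanMs = {
  '1 MB': { cdc: 4.006, fixed: 0.1401 },
  '10 MB': { cdc: 36.8944, fixed: 1.1948 },
  '100 MB': { cdc: 342.75, fixed: 13.1405 },
};
const speedup = Object.fromEntries(
  Object.entries(meanMs).map(([size, t]) => [size, +(t.cdc / t.fixed).toFixed(1)]),
);
// speedup → { '1 MB': 28.6, '10 MB': 30.9, '100 MB': 26.1 }
```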

## Dedupe Reuse Baseline

Observed chunk reuse after deterministic inserted edits:

| Inserted edit | Fixed chunks | Fixed reuse | CDC chunks | CDC reuse |
| ------------- | -----------: | ----------: | ---------: | --------: |
| `1 B` | `65` | `32.3%` | `62` | `98.4%` |
| `10 B` | `65` | `32.3%` | `62` | `98.4%` |
| `100 B` | `65` | `32.3%` | `62` | `98.4%` |
| `1000 B` | `65` | `32.3%` | `62` | `98.4%` |

What this means:

- fixed chunking keeps a simple, cheap chunk boundary model
- a small inserted edit shifts later fixed boundaries, so most later chunks stop
matching
- CDC pays much more CPU cost up front, but keeps chunk boundaries aligned well
enough that nearly all later chunks still dedupe in this scenario
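The edit-shift effect is easy to reproduce outside the harness. This sketch mirrors the dedupe scenario above (a `1 MB` buffer, fixed `16 KiB` chunks, a 1-byte insert about one-third in) but is illustrative code, not the committed benchmark:

```javascript
import { createHash } from 'node:crypto';

// Hash fixed-size chunks so reuse can be measured by hash identity.
function fixedChunkHashes(buf, chunkSize = 16 * 1024) {
  const hashes = [];
  for (let off = 0; off < buf.length; off += chunkSize) {
    hashes.push(
      createHash('sha256').update(buf.subarray(off, off + chunkSize)).digest('hex'),
    );
  }
  return hashes;
}

function reuseRatio(beforeHashes, afterHashes) {
  const seen = new Set(beforeHashes);
  return afterHashes.filter((h) => seen.has(h)).length / afterHashes.length;
}

// Deterministic pseudo-random content so every chunk is distinct.
const base = Buffer.alloc(1024 * 1024);
let s = 7;
for (let i = 0; i < base.length; i++) {
  s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
  base[i] = s >>> 24;
}

const insertAt = Math.floor(base.length / 3);
const edited = Buffer.concat([
  base.subarray(0, insertAt),
  Buffer.from([0x42]), // the 1-byte inserted edit
  base.subarray(insertAt),
]);

const ratio = reuseRatio(fixedChunkHashes(base), fixedChunkHashes(edited));
// Only the chunks wholly before the insert still match; every fixed
// boundary after it shifts by one byte, so those chunks stop deduping.
// → roughly 32% reuse (21 of 65 chunks), in line with the table above.
```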

## What Falls Out

For current `git-cas` guidance:

- fixed chunking is the right default when ingest cost and simplicity matter
more than edit-shift dedupe
- CDC is the better choice for large assets that change incrementally and where
preserved chunk reuse matters enough to justify more CPU time
- these measurements are chunker-centric, not full end-to-end store or restore
numbers

This baseline should be read as tradeoff guidance, not as a promise that one
strategy is categorically better.

## Limits Of This Baseline

- local-machine measurements are directional, not portable
- this run used Node `v25.8.1`, not the repo's minimum supported Node `22.x`
- the published baseline does not yet cover:
end-to-end store/restore cost, encryption overhead, codec overhead, or Bun and
Deno runtime comparisons

## Refreshing This Doc

To refresh the chunking baseline:

1. Run:
`CI=1 npx vitest bench --run test/benchmark/chunking.bench.js`
2. Record the environment details of the machine and runtime used.
3. Update the throughput and dedupe tables.
4. Keep the narrative honest if the benchmark harness, target chunk sizes, or
interpretation changes.
2 changes: 2 additions & 0 deletions docs/archive/BACKLOG/README.md
@@ -21,5 +21,7 @@ Landed archived backlog items:
- landed as [TR-001 — Truth: Architecture Reality Gap](../../design/TR-001-architecture-reality-gap.md)
- [TR-002 — Threat Model](./TR-002-threat-model.md)
- landed as [TR-002 — Truth: Threat Model](../../design/TR-002-threat-model.md)
- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md)
- landed as [TR-003 — Truth: Benchmark Baselines](../../design/TR-003-benchmark-baselines.md)
- [TR-004 — Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md)
- landed as [TR-004 — Truth: Design Doc Lifecycle](../../design/TR-004-design-doc-lifecycle.md)
@@ -2,7 +2,7 @@

## Legend

- [TR — Truth](../legends/TR-truth.md)
- [TR — Truth](../../legends/TR-truth.md)

## Why This Exists

@@ -11,7 +11,7 @@ yet publish stable benchmark guidance that helps users choose among them.

## Target Outcome

Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with baseline results and enough
Add [docs/BENCHMARKS.md](../../BENCHMARKS.md) with baseline results and enough
methodology detail that maintainers can refresh it intentionally.

## Human Value
@@ -26,7 +26,7 @@ tuning guidance, or follow-on optimization work.

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
- [I-001 — Determinism, Trust, And Explicit Surfaces](../../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Notes

1 change: 1 addition & 0 deletions docs/design/README.md
@@ -41,6 +41,7 @@ Landed cycle docs:
- [RL-005 — Relay: Agent Vault Lifecycle](./RL-005-agent-vault-lifecycle.md)
- [TR-001 — Truth: Architecture Reality Gap](./TR-001-architecture-reality-gap.md)
- [TR-002 — Truth: Threat Model](./TR-002-threat-model.md)
- [TR-003 — Truth: Benchmark Baselines](./TR-003-benchmark-baselines.md)
- [TR-004 — Truth: Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md)

Archived or retired cycle docs:
145 changes: 145 additions & 0 deletions docs/design/TR-003-benchmark-baselines.md
@@ -0,0 +1,145 @@
# TR-003 — Truth: Benchmark Baselines

## Status

Landed

## Linked Legend

- [TR — Truth](../legends/TR-truth.md)

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Context

`git-cas` already had a benchmark harness, but it did not yet publish stable
benchmark guidance that maintainers, operators, or agents could cite.

That left a recurring gap:

- the repo could measure chunking tradeoffs
- the repo could not point readers to a maintained baseline
- default and tuning guidance therefore risked slipping into guesswork

This cycle closes that gap by publishing the first benchmark baseline instead of
expanding the benchmark surface.

## Human Users, Jobs, And Hills

### Users

- maintainers
- operators evaluating storage tradeoffs
- adopters deciding between fixed chunking and CDC

### Jobs

- understand the current chunking cost/benefit tradeoff
- compare fixed chunking and CDC with real observed numbers
- rerun and refresh the benchmark doc intentionally later

### Hill

A maintainer or operator can read [docs/BENCHMARKS.md](../BENCHMARKS.md) and
come away with an honest current baseline for chunking throughput and edit-shift
dedupe behavior.

## Agent Users, Jobs, And Hills

### Users

- coding agents
- review agents
- documentation agents

### Jobs

- cite current benchmark tradeoffs without inventing missing numbers
- recommend chunking strategies from published repo truth
- plan performance follow-up work from explicit observed behavior

### Hill

An agent can reference [docs/BENCHMARKS.md](../BENCHMARKS.md) as the canonical
published chunking baseline instead of extrapolating from raw benchmark source
files alone.

## Human Playback

- Does the published doc explain both throughput cost and dedupe benefit?
- Does it say what machine and runtime produced the numbers?
- Does it avoid pretending local measurements are universal truth?

## Agent Playback

- Can an agent tell which benchmark results are published versus merely possible
to derive from the harness?
- Can it distinguish fixed-chunk speed from CDC edit-shift reuse benefits?
- Can it tell how to refresh the baseline later without inventing a new method?

## Explicit Non-Goals

- no code changes to the chunkers in this cycle
- no attempt to publish every existing benchmark in one pass
- no claim that these local measurements are portable across all environments

## Decisions

### Publish Chunking Guidance First

The first maintained benchmark baseline should cover the highest-value tradeoff:
fixed-size chunking versus CDC.

That is the benchmark question most likely to affect defaults, tuning, and
adoption guidance.

### Reuse The Existing Harness

This cycle should publish results from the committed benchmark harness in
[`test/benchmark/chunking.bench.js`](../../test/benchmark/chunking.bench.js),
not create a second ad hoc benchmark path.

### Keep The Baseline Local And Dated

The right claim is "these are observed local baseline numbers on a documented
machine and runtime," not "these are universal performance truths."

## Implementation Outline

1. Audit the current chunking benchmark harness and capture its actual input
sizes and chunker settings.
2. Run the harness and record the observed throughput and dedupe output.
3. Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with methodology, environment,
results, interpretation, and rerun instructions.
4. Add this cycle doc, archive the consumed backlog card, update the Truth
indexes, and record the change in [CHANGELOG.md](../../CHANGELOG.md).

## Tests To Write First

No new executable tests.

This is a documentation-truth cycle. Verification is:

- rerunning the committed benchmark harness
- direct cross-check against benchmark input sizes and chunker options in
`test/benchmark/chunking.bench.js`
- formatting validation for the touched Markdown files

## Risks And Unknowns

- local benchmark results can drift as the machine, Node version, or Vitest
behavior changes
- readers can overread a local baseline as a universal recommendation if the doc
stops being explicit about scope
- the repo still does not publish end-to-end store/restore or cross-runtime
benchmark baselines

## Retrospective

This was the right next Truth cycle after the architecture and threat-model
work.

The repo already knew how to measure chunking tradeoffs. The missing piece was a
published, refreshable statement of what those measurements currently say.