Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`git cas agent rotate`** — added a machine-facing rotation flow so Relay can rotate recipient keys by slug or detached tree OID and expose the resulting tree and vault side effects explicitly.
- **`git cas agent vault rotate`** — added a machine-facing vault passphrase rotation flow so Relay can rotate encrypted vault state with explicit commit, KDF, and rotated/skipped-entry results.
- **`git cas agent vault init|remove`** — added machine-facing vault lifecycle commands so Relay can initialize encrypted or plaintext vaults and remove entries without scraping human CLI output.
- **Benchmark baselines doc** — added [docs/BENCHMARKS.md](./docs/BENCHMARKS.md) with the first published chunking baseline, including fixed-size versus CDC throughput, dedupe reuse results, and refresh instructions.
- **Threat model doc** — added [docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md) as the canonical statement of attacker models, trust boundaries, exposed metadata, and explicit non-goals.
- **Workflow model** — added [WORKFLOW.md](./WORKFLOW.md), explicit legends/backlog/invariants directories, and a cycle-first planning model for fresh work.
- **Review automation baseline** — added `.github/CODEOWNERS` with repo-wide ownership for `@git-stunts`.
2 changes: 1 addition & 1 deletion docs/BACKLOG/README.md
@@ -29,13 +29,13 @@ If the planning history is still useful, move it to

Current backlog items:

- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md)
- [TR-005 — CasService Decomposition Plan](./TR-005-casservice-decomposition-plan.md)
- [TR-006 — Docs Maintainer Checklist](./TR-006-docs-maintainer-checklist.md)
- [TR-007 — Security Doc Discoverability Audit](./TR-007-security-doc-discoverability-audit.md)
- [TR-008 — Empty-State Phrasing Consistency](./TR-008-empty-state-phrasing-consistency.md)
- [TR-009 — Pre-PR Doc Cross-Link Audit](./TR-009-pre-pr-doc-cross-link-audit.md)
- [TR-010 — Planning Index Consistency Review](./TR-010-planning-index-consistency-review.md)
- [TR-011 — Streaming Encrypted Restore](./TR-011-streaming-encrypted-restore.md)

Archived delivered backlog items:

46 changes: 46 additions & 0 deletions docs/BACKLOG/TR-011-streaming-encrypted-restore.md
@@ -0,0 +1,46 @@
# TR-011 — Streaming Encrypted Restore

## Legend

- [TR — Truth](../legends/TR-truth.md)

## Why This Exists

`git-cas` currently streams plaintext restores chunk-by-chunk, but encrypted or
compressed restores buffer the full payload in memory before yielding output.

That is safe and simple for the current whole-object AES-GCM format, but it
also means large encrypted restores are bounded by `maxRestoreBufferSize` and do
not yet benefit from a lower-memory temp-file streaming approach.

## Target Outcome

Produce a design-backed investigation of streaming encrypted/compressed restore,
including:

- current integrity and buffering constraints
- whether decrypt-to-temp-file plus atomic rename is the right model
- benchmark questions needed to compare memory and throughput tradeoffs
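The temp-file model under investigation can be sketched in a few lines of Node. This is a hypothetical illustration, not the current `git-cas` restore path: the function name, key handling, and file layout are invented; only the Node `crypto`, `fs`, and `stream` APIs are real.

```javascript
import { createDecipheriv, randomBytes } from 'node:crypto';
import { createReadStream, createWriteStream } from 'node:fs';
import { rename, unlink } from 'node:fs/promises';
import { pipeline } from 'node:stream/promises';

// Hypothetical sketch: ciphertext streams through AES-256-GCM into a
// sibling temp file, and the atomic rename happens only after
// decipher.final() (run inside pipeline) has verified the auth tag, so
// the destination path never observes unverified plaintext.
async function restoreEncryptedToFile(srcPath, destPath, key, iv, authTag) {
  const tmpPath = `${destPath}.tmp-${randomBytes(6).toString('hex')}`;
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag);
  try {
    await pipeline(
      createReadStream(srcPath),
      decipher,
      createWriteStream(tmpPath),
    );
    // Atomic within a single filesystem on POSIX: readers see either the
    // old destination or the fully verified new one, never a partial file.
    await rename(tmpPath, destPath);
  } catch (err) {
    await unlink(tmpPath).catch(() => {}); // best-effort temp cleanup
    throw err;
  }
}
```

Note the semantics this model trades away: the temp file does hold not-yet-verified plaintext until the tag check passes, and peak memory drops from the whole payload to stream-buffer size. Both are exactly the kinds of observations the investigation should weigh.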

## Human Value

Maintainers and operators should be able to understand whether large encrypted
restores can become more memory-efficient without weakening integrity
guarantees.

## Agent Value

Agents should be able to reason about encrypted restore constraints and propose
bounded follow-on work without hand-waving around the current buffering model.

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Notes

- distinguish plaintext streaming from encrypted/compressed restore behavior
- account for the current whole-object AES-GCM tag model
- evaluate temp-file restore semantics before considering direct-to-destination
writes
- tie any design work to benchmark and memory observations, not intuition alone
130 changes: 130 additions & 0 deletions docs/BENCHMARKS.md
@@ -0,0 +1,130 @@
# Benchmarks

This document records published baseline measurements for `git-cas`.

These numbers are meant to be:

- honest
- reproducible enough for maintainers to refresh
- useful for human and agent tradeoff discussions

They are not meant to be universal truths across every machine, runtime, or
repository shape.

## Current Scope

The first published baseline focuses on chunking tradeoffs:

- fixed-size chunking
- CDC (content-defined chunking)

This is the highest-value first comparison because it exposes the core tradeoff
that users ask about most often:

- fixed chunking is cheaper and faster
- CDC preserves dedupe much better when small edits shift later bytes

The repo also contains broader CAS benchmarks in
[`test/benchmark/cas.bench.js`](../test/benchmark/cas.bench.js), but those
results are not yet published here as a maintained baseline.

## Benchmark Configuration

Observed on **March 30, 2026** with:

- command:
`CI=1 npx vitest bench --run test/benchmark/chunking.bench.js`
- machine: Apple M1 Pro
- memory: 16 GiB
- OS: macOS 26.3 (`25D125`)
- runtime: Node `v25.8.1`
- package manager: npm `11.11.0`
- benchmark runner: Vitest `2.1.9`

The current harness uses:

- seeded pseudo-random input buffers for reproducibility
- buffer sizes: `1 MB`, `10 MB`, `100 MB`
- fixed chunking: `16 KiB`
- CDC:
`minChunkSize=4096`, `targetChunkSize=16384`, `maxChunkSize=65536`
- dedupe scenario:
a `1 MB` base file with deterministic inserted edits of `1`, `10`, `100`,
and `1000` bytes about one-third into the file
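To make those knobs concrete, here is a deliberately simplified content-defined chunker using the same parameter names, plus a seeded input generator. This is illustrative code, not the repo's chunker — the real implementation's hash and boundary rule will differ — but it shows how the min/target/max sizes interact:

```javascript
// Illustrative CDC: declare a boundary where the masked low bits of a
// cheap rolling-style hash hit zero, clamped by min/max chunk sizes.
// Because the masked bits depend only on recent bytes, boundaries
// resynchronize shortly after an inserted edit.
function cdcChunks(buf, opts = {}) {
  const {
    minChunkSize = 4096,
    targetChunkSize = 16384, // assumed to be a power of two
    maxChunkSize = 65536,
  } = opts;
  const mask = targetChunkSize - 1;
  const chunks = [];
  let start = 0;
  let hash = 0;
  for (let i = 0; i < buf.length; i++) {
    hash = ((hash << 1) + buf[i]) >>> 0; // old bytes shift out of the mask
    const len = i - start + 1;
    if ((len >= minChunkSize && (hash & mask) === 0) || len >= maxChunkSize) {
      chunks.push(buf.subarray(start, i + 1));
      start = i + 1;
      hash = 0;
    }
  }
  if (start < buf.length) chunks.push(buf.subarray(start));
  return chunks;
}

// Deterministic pseudo-random input, so reruns chunk identically.
function seededBuffer(size, seed = 1) {
  const buf = Buffer.alloc(size);
  let s = seed >>> 0;
  for (let i = 0; i < size; i++) {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0; // 32-bit LCG step
    buf[i] = s >>> 24; // take the high byte; low LCG bits are weak
  }
  return buf;
}
```

Average chunk size lands near `minChunkSize` plus the target spacing, with `maxChunkSize` as a hard cap — which is why CDC spends more CPU per byte than a fixed `16 KiB` slicer ever needs to.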

One implementation detail to keep in mind:
Vitest emitted multiple pass blocks during the one-shot run on this machine.
The throughput table below records the final reported block from that run. The
dedupe table is deterministic in this harness and was stable across the
observed output.

## Throughput Baseline

Observed chunking throughput:

| Strategy | Buffer | Mean time | Throughput |
| -------- | -------- | -----------: | ------------: |
| CDC | `1 MB` | `4.0060 ms` | `249.62 hz` |
| CDC | `10 MB` | `36.8944 ms` | `27.1044 hz` |
| CDC | `100 MB` | `342.75 ms` | `2.9176 hz` |
| Fixed | `1 MB` | `0.1401 ms` | `7,137.96 hz` |
| Fixed | `10 MB` | `1.1948 ms` | `836.96 hz` |
| Fixed | `100 MB` | `13.1405 ms` | `76.1006 hz` |

Observed speed advantage for fixed chunking on this machine:

- `1 MB`: about `28.6x` faster than CDC
- `10 MB`: about `30.9x` faster than CDC
- `100 MB`: about `26.1x` faster than CDC
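The multipliers are simply ratios of the mean times in the table above; recomputing them:

```javascript
// Speed advantage = CDC mean time / fixed mean time, per buffer size.
const meanMs = {
  '1 MB': { cdc: 4.006, fixed: 0.1401 },
  '10 MB': { cdc: 36.8944, fixed: 1.1948 },
  '100 MB': { cdc: 342.75, fixed: 13.1405 },
};
const speedup = Object.fromEntries(
  Object.entries(meanMs).map(([size, t]) => [size, +(t.cdc / t.fixed).toFixed(1)]),
);
// speedup → { '1 MB': 28.6, '10 MB': 30.9, '100 MB': 26.1 }
```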

## Dedupe Reuse Baseline

Observed chunk reuse after deterministic inserted edits:

| Inserted edit | Fixed chunks | Fixed reuse | CDC chunks | CDC reuse |
| ------------- | -----------: | ----------: | ---------: | --------: |
| `1 B` | `65` | `32.3%` | `62` | `98.4%` |
| `10 B` | `65` | `32.3%` | `62` | `98.4%` |
| `100 B` | `65` | `32.3%` | `62` | `98.4%` |
| `1000 B` | `65` | `32.3%` | `62` | `98.4%` |

What this means:

- fixed chunking keeps a simple, cheap chunk boundary model
- a small inserted edit shifts later fixed boundaries, so most later chunks stop
matching
- CDC pays much more CPU cost up front, but keeps chunk boundaries aligned well
enough that nearly all later chunks still dedupe in this scenario
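The edit-shift effect is easy to reproduce outside the harness. This sketch mirrors the dedupe scenario above (a `1 MB` buffer, fixed `16 KiB` chunks, a 1-byte insert about one-third in) but is illustrative code, not the committed benchmark:

```javascript
import { createHash } from 'node:crypto';

// Hash fixed-size chunks so reuse can be measured by hash identity.
function fixedChunkHashes(buf, chunkSize = 16 * 1024) {
  const hashes = [];
  for (let off = 0; off < buf.length; off += chunkSize) {
    hashes.push(
      createHash('sha256').update(buf.subarray(off, off + chunkSize)).digest('hex'),
    );
  }
  return hashes;
}

function reuseRatio(beforeHashes, afterHashes) {
  const seen = new Set(beforeHashes);
  return afterHashes.filter((h) => seen.has(h)).length / afterHashes.length;
}

// Deterministic pseudo-random content so every chunk is distinct.
const base = Buffer.alloc(1024 * 1024);
let s = 7;
for (let i = 0; i < base.length; i++) {
  s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
  base[i] = s >>> 24;
}

const insertAt = Math.floor(base.length / 3);
const edited = Buffer.concat([
  base.subarray(0, insertAt),
  Buffer.from([0x42]), // the 1-byte inserted edit
  base.subarray(insertAt),
]);

const ratio = reuseRatio(fixedChunkHashes(base), fixedChunkHashes(edited));
// Only the chunks wholly before the insert still match; every fixed
// boundary after it shifts by one byte, so those chunks stop deduping.
// → roughly 32% reuse (21 of 65 chunks), in line with the table above.
```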

## What Falls Out

For current `git-cas` guidance:

- fixed chunking is the right default when ingest cost and simplicity matter
more than edit-shift dedupe
- CDC is the better choice for large assets that change incrementally and where
preserved chunk reuse matters enough to justify more CPU time
- these measurements are chunker-centric, not full end-to-end store or restore
numbers

This baseline should be read as tradeoff guidance, not as a promise that one
strategy is categorically better.

## Limits Of This Baseline

- local-machine measurements are directional, not portable
- this run used Node `v25.8.1`, not the repo's minimum supported Node `22.x`
- the published baseline does not yet cover:
end-to-end store/restore cost, encryption overhead, codec overhead, or Bun and
Deno runtime comparisons

## Refreshing This Doc

To refresh the chunking baseline:

1. Run:
`CI=1 npx vitest bench --run test/benchmark/chunking.bench.js`
2. Record the environment details of the machine and runtime used.
3. Update the throughput and dedupe tables.
4. Keep the narrative honest if the benchmark harness, target chunk sizes, or
interpretation changes.
2 changes: 2 additions & 0 deletions docs/archive/BACKLOG/README.md
@@ -21,5 +21,7 @@ Landed archived backlog items:
- landed as [TR-001 — Truth: Architecture Reality Gap](../../design/TR-001-architecture-reality-gap.md)
- [TR-002 — Threat Model](./TR-002-threat-model.md)
- landed as [TR-002 — Truth: Threat Model](../../design/TR-002-threat-model.md)
- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md)
- landed as [TR-003 — Truth: Benchmark Baselines](../../design/TR-003-benchmark-baselines.md)
- [TR-004 — Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md)
- landed as [TR-004 — Truth: Design Doc Lifecycle](../../design/TR-004-design-doc-lifecycle.md)
@@ -2,7 +2,7 @@

## Legend

- [TR — Truth](../legends/TR-truth.md)
- [TR — Truth](../../legends/TR-truth.md)

## Why This Exists

@@ -11,7 +11,7 @@ yet publish stable benchmark guidance that helps users choose among them.

## Target Outcome

Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with baseline results and enough
Add [docs/BENCHMARKS.md](../../BENCHMARKS.md) with baseline results and enough
methodology detail that maintainers can refresh it intentionally.

## Human Value
@@ -26,7 +26,7 @@ tuning guidance, or follow-on optimization work.

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
- [I-001 — Determinism, Trust, And Explicit Surfaces](../../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Notes

1 change: 1 addition & 0 deletions docs/design/README.md
@@ -41,6 +41,7 @@ Landed cycle docs:
- [RL-005 — Relay: Agent Vault Lifecycle](./RL-005-agent-vault-lifecycle.md)
- [TR-001 — Truth: Architecture Reality Gap](./TR-001-architecture-reality-gap.md)
- [TR-002 — Truth: Threat Model](./TR-002-threat-model.md)
- [TR-003 — Truth: Benchmark Baselines](./TR-003-benchmark-baselines.md)
- [TR-004 — Truth: Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md)

Archived or retired cycle docs:
145 changes: 145 additions & 0 deletions docs/design/TR-003-benchmark-baselines.md
@@ -0,0 +1,145 @@
# TR-003 — Truth: Benchmark Baselines

## Status

Landed

## Linked Legend

- [TR — Truth](../legends/TR-truth.md)

## Linked Invariants

- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)

## Context

`git-cas` already had a benchmark harness, but it did not yet publish stable
benchmark guidance that maintainers, operators, or agents could cite.

That left a recurring gap:

- the repo could measure chunking tradeoffs
- the repo could not point readers to a maintained baseline
- default and tuning guidance therefore risked slipping into guesswork

This cycle closes that gap by publishing the first benchmark baseline instead of
expanding the benchmark surface.

## Human Users, Jobs, And Hills

### Users

- maintainers
- operators evaluating storage tradeoffs
- adopters deciding between fixed chunking and CDC

### Jobs

- understand the current chunking cost/benefit tradeoff
- compare fixed chunking and CDC with real observed numbers
- rerun and refresh the benchmark doc intentionally later

### Hill

A maintainer or operator can read [docs/BENCHMARKS.md](../BENCHMARKS.md) and
come away with an honest current baseline for chunking throughput and edit-shift
dedupe behavior.

## Agent Users, Jobs, And Hills

### Users

- coding agents
- review agents
- documentation agents

### Jobs

- cite current benchmark tradeoffs without inventing missing numbers
- recommend chunking strategies from published repo truth
- plan performance follow-up work from explicit observed behavior

### Hill

An agent can reference [docs/BENCHMARKS.md](../BENCHMARKS.md) as the canonical
published chunking baseline instead of extrapolating from raw benchmark source
files alone.

## Human Playback

- Does the published doc explain both throughput cost and dedupe benefit?
- Does it say what machine and runtime produced the numbers?
- Does it avoid pretending local measurements are universal truth?

## Agent Playback

- Can an agent tell which benchmark results are published versus merely possible
to derive from the harness?
- Can it distinguish fixed-chunk speed from CDC edit-shift reuse benefits?
- Can it tell how to refresh the baseline later without inventing a new method?

## Explicit Non-Goals

- no code changes to the chunkers in this cycle
- no attempt to publish every existing benchmark in one pass
- no claim that these local measurements are portable across all environments

## Decisions

### Publish Chunking Guidance First

The first maintained benchmark baseline should cover the highest-value tradeoff:
fixed-size chunking versus CDC.

That is the benchmark question most likely to affect defaults, tuning, and
adoption guidance.

### Reuse The Existing Harness

This cycle should publish results from the committed benchmark harness in
[`test/benchmark/chunking.bench.js`](../../test/benchmark/chunking.bench.js),
not create a second ad hoc benchmark path.

### Keep The Baseline Local And Dated

The right claim is "these are observed local baseline numbers on a documented
machine and runtime," not "these are universal performance truths."

## Implementation Outline

1. Audit the current chunking benchmark harness and capture its actual input
sizes and chunker settings.
2. Run the harness and record the observed throughput and dedupe output.
3. Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with methodology, environment,
results, interpretation, and rerun instructions.
4. Add this cycle doc, archive the consumed backlog card, update the Truth
indexes, and record the change in [CHANGELOG.md](../../CHANGELOG.md).

## Tests To Write First

No new executable tests.

This is a documentation-truth cycle. Verification is:

- rerunning the committed benchmark harness
- direct cross-check against benchmark input sizes and chunker options in
`test/benchmark/chunking.bench.js`
- formatting validation for the touched Markdown files

## Risks And Unknowns

- local benchmark results can drift as the machine, Node version, or Vitest
behavior changes
- readers can overread a local baseline as a universal recommendation if the doc
stops being explicit about scope
- the repo still does not publish end-to-end store/restore or cross-runtime
benchmark baselines

## Retrospective

This was the right next Truth cycle after the architecture and threat-model
work.

The repo already knew how to measure chunking tradeoffs. The missing piece was a
published, refreshable statement of what those measurements currently say.