From 57ee58ec8a637bbc609437c19d6af3af52342f6b Mon Sep 17 00:00:00 2001 From: James Ross Date: Mon, 30 Mar 2026 06:00:32 -0700 Subject: [PATCH 1/2] docs: publish chunking benchmark baselines --- CHANGELOG.md | 1 + docs/BACKLOG/README.md | 1 - docs/BENCHMARKS.md | 130 ++++++++++++++++ docs/archive/BACKLOG/README.md | 2 + .../BACKLOG/TR-003-benchmark-baselines.md | 6 +- docs/design/README.md | 1 + docs/design/TR-003-benchmark-baselines.md | 145 ++++++++++++++++++ docs/legends/TR-truth.md | 2 +- 8 files changed, 283 insertions(+), 5 deletions(-) create mode 100644 docs/BENCHMARKS.md rename docs/{ => archive}/BACKLOG/TR-003-benchmark-baselines.md (78%) create mode 100644 docs/design/TR-003-benchmark-baselines.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 246a26f..0eaa8a9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **`git cas agent rotate`** — added a machine-facing rotation flow so Relay can rotate recipient keys by slug or detached tree OID and expose the resulting tree and vault side effects explicitly. - **`git cas agent vault rotate`** — added a machine-facing vault passphrase rotation flow so Relay can rotate encrypted vault state with explicit commit, KDF, and rotated/skipped-entry results. - **`git cas agent vault init|remove`** — added machine-facing vault lifecycle commands so Relay can initialize encrypted or plaintext vaults and remove entries without scraping human CLI output. +- **Benchmark baselines doc** — added [docs/BENCHMARKS.md](./docs/BENCHMARKS.md) with the first published chunking baseline, including fixed-size versus CDC throughput, dedupe reuse results, and refresh instructions. - **Threat model doc** — added [docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md) as the canonical statement of attacker models, trust boundaries, exposed metadata, and explicit non-goals. 
- **Workflow model** — added [WORKFLOW.md](./WORKFLOW.md), explicit legends/backlog/invariants directories, and a cycle-first planning model for fresh work. - **Review automation baseline** — added `.github/CODEOWNERS` with repo-wide ownership for `@git-stunts`. diff --git a/docs/BACKLOG/README.md b/docs/BACKLOG/README.md index 06a3e9f..8b348cf 100644 --- a/docs/BACKLOG/README.md +++ b/docs/BACKLOG/README.md @@ -29,7 +29,6 @@ If the planning history is still useful, move it to Current backlog items: -- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md) - [TR-005 — CasService Decomposition Plan](./TR-005-casservice-decomposition-plan.md) - [TR-006 — Docs Maintainer Checklist](./TR-006-docs-maintainer-checklist.md) - [TR-007 — Security Doc Discoverability Audit](./TR-007-security-doc-discoverability-audit.md) diff --git a/docs/BENCHMARKS.md b/docs/BENCHMARKS.md new file mode 100644 index 0000000..51349d3 --- /dev/null +++ b/docs/BENCHMARKS.md @@ -0,0 +1,130 @@ +# Benchmarks + +This document records published baseline measurements for `git-cas`. + +These numbers are meant to be: + +- honest +- reproducible enough for maintainers to refresh +- useful for human and agent tradeoff discussions + +They are not meant to be universal truths across every machine, runtime, or +repository shape. + +## Current Scope + +The first published baseline focuses on chunking tradeoffs: + +- fixed-size chunking +- CDC (content-defined chunking) + +This is the highest-value first comparison because it exposes the core tradeoff +that users ask about most often: + +- fixed chunking is cheaper and faster +- CDC preserves dedupe much better when small edits shift later bytes + +The repo also contains broader CAS benchmarks in +[`test/benchmark/cas.bench.js`](../test/benchmark/cas.bench.js), but those +results are not yet published here as a maintained baseline. 
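To make the fixed-versus-CDC tradeoff concrete, here is a toy sketch of both strategies. This is illustrative only, not the git-cas chunker: real CDC implementations (Gear, Buzhash) use a true sliding-window rolling hash, while the hash here only shows the shape of the idea — boundaries chosen by content, not by absolute offset.

```javascript
// Illustrative sketch only — NOT the git-cas chunker. A content-defined
// chunker cuts wherever a hash of recent bytes matches a mask, so boundaries
// are a property of the content rather than of absolute offsets.
const MIN = 4096;      // minChunkSize
const TARGET = 16384;  // targetChunkSize; power of two => simple mask test
const MAX = 65536;     // maxChunkSize
const MASK = TARGET - 1;

function cdcChunks(buf) {
  const chunks = [];
  let start = 0;
  let hash = 0;
  for (let i = 0; i < buf.length; i++) {
    // Toy hash; a real CDC hash lets old bytes fall out of a fixed window.
    hash = ((hash * 31) + buf[i]) >>> 0;
    const len = i - start + 1;
    if ((len >= MIN && (hash & MASK) === 0) || len >= MAX) {
      chunks.push(buf.subarray(start, i + 1));
      start = i + 1;
      hash = 0;
    }
  }
  if (start < buf.length) chunks.push(buf.subarray(start));
  return chunks;
}

// Fixed-size chunking: one subarray per 16 KiB window, no hashing at all,
// which is why it is so much cheaper per byte.
function fixedChunks(buf, size = 16384) {
  const chunks = [];
  for (let off = 0; off < buf.length; off += size) {
    chunks.push(buf.subarray(off, off + size));
  }
  return chunks;
}
```

The per-byte hash work in `cdcChunks` versus the hash-free loop in `fixedChunks` is the whole cost story; the boundary-by-content property is the whole dedupe story.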
+ +## Benchmark Configuration + +Observed on **March 30, 2026** with: + +- command: + `CI=1 npx vitest bench --run test/benchmark/chunking.bench.js` +- machine: Apple M1 Pro +- memory: 16 GiB +- OS: macOS 26.3 (`25D125`) +- runtime: Node `v25.8.1` +- package manager: npm `11.11.0` +- benchmark runner: Vitest `2.1.9` + +The current harness uses: + +- seeded pseudo-random input buffers for reproducibility +- buffer sizes: `1 MB`, `10 MB`, `100 MB` +- fixed chunking: `16 KiB` +- CDC: + `minChunkSize=4096`, `targetChunkSize=16384`, `maxChunkSize=65536` +- dedupe scenario: + a `1 MB` base file with deterministic inserted edits of `1`, `10`, `100`, + and `1000` bytes about one-third into the file + +One implementation detail to keep in mind: +Vitest emitted multiple pass blocks during the one-shot run on this machine. +The throughput table below records the final reported block from that run. The +dedupe table is deterministic in this harness and was stable across the +observed output. + +## Throughput Baseline + +Observed chunking throughput: + +| Strategy | Buffer | Mean time | Throughput | +| -------- | -------- | -----------: | ------------: | +| CDC | `1 MB` | `4.0060 ms` | `249.62 hz` | +| CDC | `10 MB` | `36.8944 ms` | `27.1044 hz` | +| CDC | `100 MB` | `342.75 ms` | `2.9176 hz` | +| Fixed | `1 MB` | `0.1401 ms` | `7,137.96 hz` | +| Fixed | `10 MB` | `1.1948 ms` | `836.96 hz` | +| Fixed | `100 MB` | `13.1405 ms` | `76.1006 hz` | + +Observed speed advantage for fixed chunking on this machine: + +- `1 MB`: about `28.6x` faster than CDC +- `10 MB`: about `30.9x` faster than CDC +- `100 MB`: about `26.1x` faster than CDC + +## Dedupe Reuse Baseline + +Observed chunk reuse after deterministic inserted edits: + +| Inserted edit | Fixed chunks | Fixed reuse | CDC chunks | CDC reuse | +| ------------- | -----------: | ----------: | ---------: | --------: | +| `1 B` | `65` | `32.3%` | `62` | `98.4%` | +| `10 B` | `65` | `32.3%` | `62` | `98.4%` | +| `100 B` | `65` | 
`32.3%` | `62` | `98.4%` | +| `1000 B` | `65` | `32.3%` | `62` | `98.4%` | + +What this means: + +- fixed chunking keeps a simple, cheap chunk boundary model +- a small inserted edit shifts later fixed boundaries, so most later chunks stop + matching +- CDC pays much more CPU cost up front, but keeps chunk boundaries aligned well + enough that nearly all later chunks still dedupe in this scenario + +## What Falls Out + +For current `git-cas` guidance: + +- fixed chunking is the right default when ingest cost and simplicity matter + more than edit-shift dedupe +- CDC is the better choice for large assets that change incrementally and where + preserved chunk reuse matters enough to justify more CPU time +- these measurements are chunker-centric, not full end-to-end store or restore + numbers + +This baseline should be read as tradeoff guidance, not as a promise that one +strategy is categorically better. + +## Limits Of This Baseline + +- local-machine measurements are directional, not portable +- this run used Node `v25.8.1`, not the repo's minimum supported Node `22.x` +- the published baseline does not yet cover: + end-to-end store/restore cost, encryption overhead, codec overhead, or Bun and + Deno runtime comparisons + +## Refreshing This Doc + +To refresh the chunking baseline: + +1. Run: + `CI=1 npx vitest bench --run test/benchmark/chunking.bench.js` +2. Record the environment details of the machine and runtime used. +3. Update the throughput and dedupe tables. +4. Keep the narrative honest if the benchmark harness, target chunk sizes, or + interpretation changes. 
diff --git a/docs/archive/BACKLOG/README.md b/docs/archive/BACKLOG/README.md index 8be268e..5de9b4b 100644 --- a/docs/archive/BACKLOG/README.md +++ b/docs/archive/BACKLOG/README.md @@ -21,5 +21,7 @@ Landed archived backlog items: - landed as [TR-001 — Truth: Architecture Reality Gap](../../design/TR-001-architecture-reality-gap.md) - [TR-002 — Threat Model](./TR-002-threat-model.md) - landed as [TR-002 — Truth: Threat Model](../../design/TR-002-threat-model.md) +- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md) + - landed as [TR-003 — Truth: Benchmark Baselines](../../design/TR-003-benchmark-baselines.md) - [TR-004 — Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md) - landed as [TR-004 — Truth: Design Doc Lifecycle](../../design/TR-004-design-doc-lifecycle.md) diff --git a/docs/BACKLOG/TR-003-benchmark-baselines.md b/docs/archive/BACKLOG/TR-003-benchmark-baselines.md similarity index 78% rename from docs/BACKLOG/TR-003-benchmark-baselines.md rename to docs/archive/BACKLOG/TR-003-benchmark-baselines.md index 0acb923..2ceb281 100644 --- a/docs/BACKLOG/TR-003-benchmark-baselines.md +++ b/docs/archive/BACKLOG/TR-003-benchmark-baselines.md @@ -2,7 +2,7 @@ ## Legend -- [TR — Truth](../legends/TR-truth.md) +- [TR — Truth](../../legends/TR-truth.md) ## Why This Exists @@ -11,7 +11,7 @@ yet publish stable benchmark guidance that helps users choose among them. ## Target Outcome -Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with baseline results and enough +Add [docs/BENCHMARKS.md](../../BENCHMARKS.md) with baseline results and enough methodology detail that maintainers can refresh it intentionally. ## Human Value @@ -26,7 +26,7 @@ tuning guidance, or follow-on optimization work. 
## Linked Invariants -- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md) +- [I-001 — Determinism, Trust, And Explicit Surfaces](../../invariants/I-001-determinism-trust-and-explicit-surfaces.md) ## Notes diff --git a/docs/design/README.md b/docs/design/README.md index 3ba1eea..7dbfea1 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -41,6 +41,7 @@ Landed cycle docs: - [RL-005 — Relay: Agent Vault Lifecycle](./RL-005-agent-vault-lifecycle.md) - [TR-001 — Truth: Architecture Reality Gap](./TR-001-architecture-reality-gap.md) - [TR-002 — Truth: Threat Model](./TR-002-threat-model.md) +- [TR-003 — Truth: Benchmark Baselines](./TR-003-benchmark-baselines.md) - [TR-004 — Truth: Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md) Archived or retired cycle docs: diff --git a/docs/design/TR-003-benchmark-baselines.md b/docs/design/TR-003-benchmark-baselines.md new file mode 100644 index 0000000..bbc0666 --- /dev/null +++ b/docs/design/TR-003-benchmark-baselines.md @@ -0,0 +1,145 @@ +# TR-003 — Truth: Benchmark Baselines + +## Status + +Landed + +## Linked Legend + +- [TR — Truth](../legends/TR-truth.md) + +## Linked Invariants + +- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md) + +## Context + +`git-cas` already had a benchmark harness, but it did not yet publish stable +benchmark guidance that maintainers, operators, or agents could cite. + +That left a recurring gap: + +- the repo could measure chunking tradeoffs +- the repo could not point readers to a maintained baseline +- default and tuning guidance therefore risked slipping into guesswork + +This cycle closes that gap by publishing the first benchmark baseline instead of +expanding the benchmark surface. 
+ +## Human Users, Jobs, And Hills + +### Users + +- maintainers +- operators evaluating storage tradeoffs +- adopters deciding between fixed chunking and CDC + +### Jobs + +- understand the current chunking cost/benefit tradeoff +- compare fixed chunking and CDC with real observed numbers +- rerun and refresh the benchmark doc intentionally later + +### Hill + +A maintainer or operator can read [docs/BENCHMARKS.md](../BENCHMARKS.md) and +come away with an honest current baseline for chunking throughput and edit-shift +dedupe behavior. + +## Agent Users, Jobs, And Hills + +### Users + +- coding agents +- review agents +- documentation agents + +### Jobs + +- cite current benchmark tradeoffs without inventing missing numbers +- recommend chunking strategies from published repo truth +- plan performance follow-up work from explicit observed behavior + +### Hill + +An agent can reference [docs/BENCHMARKS.md](../BENCHMARKS.md) as the canonical +published chunking baseline instead of extrapolating from raw benchmark source +files alone. + +## Human Playback + +- Does the published doc explain both throughput cost and dedupe benefit? +- Does it say what machine and runtime produced the numbers? +- Does it avoid pretending local measurements are universal truth? + +## Agent Playback + +- Can an agent tell which benchmark results are published versus merely possible + to derive from the harness? +- Can it distinguish fixed-chunk speed from CDC edit-shift reuse benefits? +- Can it tell how to refresh the baseline later without inventing a new method? + +## Explicit Non-Goals + +- no code changes to the chunkers in this cycle +- no attempt to publish every existing benchmark in one pass +- no claim that these local measurements are portable across all environments + +## Decisions + +### Publish Chunking Guidance First + +The first maintained benchmark baseline should cover the highest-value tradeoff: +fixed-size chunking versus CDC. 
+ +That is the benchmark question most likely to affect defaults, tuning, and +adoption guidance. + +### Reuse The Existing Harness + +This cycle should publish results from the committed benchmark harness in +[`test/benchmark/chunking.bench.js`](../../test/benchmark/chunking.bench.js), +not create a second ad hoc benchmark path. + +### Keep The Baseline Local And Dated + +The right claim is "these are observed local baseline numbers on a documented +machine and runtime," not "these are universal performance truths." + +## Implementation Outline + +1. Audit the current chunking benchmark harness and capture its actual input + sizes and chunker settings. +2. Run the harness and record the observed throughput and dedupe output. +3. Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with methodology, environment, + results, interpretation, and rerun instructions. +4. Add this cycle doc, archive the consumed backlog card, update the Truth + indexes, and record the change in [CHANGELOG.md](../../CHANGELOG.md). + +## Tests To Write First + +No new executable tests. + +This is a documentation-truth cycle. Verification is: + +- rerunning the committed benchmark harness +- direct cross-check against benchmark input sizes and chunker options in + `test/benchmark/chunking.bench.js` +- formatting validation for the touched Markdown files + +## Risks And Unknowns + +- local benchmark results can drift as the machine, Node version, or Vitest + behavior changes +- readers can overread a local baseline as a universal recommendation if the doc + stops being explicit about scope +- the repo still does not publish end-to-end store/restore or cross-runtime + benchmark baselines + +## Retrospective + +This was the right next Truth cycle after the architecture and threat-model +work. + +The repo already knew how to measure chunking tradeoffs. The missing piece was a +published, refreshable statement of what those measurements currently say. 
diff --git a/docs/legends/TR-truth.md b/docs/legends/TR-truth.md index 4ccc5b2..48b3df9 100644 --- a/docs/legends/TR-truth.md +++ b/docs/legends/TR-truth.md @@ -73,11 +73,11 @@ Current Truth design docs: - [TR-001 — Truth: Architecture Reality Gap](../design/TR-001-architecture-reality-gap.md) - [TR-002 — Truth: Threat Model](../design/TR-002-threat-model.md) +- [TR-003 — Truth: Benchmark Baselines](../design/TR-003-benchmark-baselines.md) - [TR-004 — Truth: Design Doc Lifecycle](../design/TR-004-design-doc-lifecycle.md) Current Truth backlog items: -- [TR-003 — Benchmark Baselines](../BACKLOG/TR-003-benchmark-baselines.md) - [TR-005 — CasService Decomposition Plan](../BACKLOG/TR-005-casservice-decomposition-plan.md) - [TR-006 — Docs Maintainer Checklist](../BACKLOG/TR-006-docs-maintainer-checklist.md) - [TR-007 — Security Doc Discoverability Audit](../BACKLOG/TR-007-security-doc-discoverability-audit.md) From 764ac2cdcfebeb2fb156d84e94748e2cb5a84df1 Mon Sep 17 00:00:00 2001 From: James Ross Date: Mon, 30 Mar 2026 10:09:45 -0700 Subject: [PATCH 2/2] docs: add encrypted restore streaming backlog item --- docs/BACKLOG/README.md | 1 + .../TR-011-streaming-encrypted-restore.md | 46 +++++++++++++++++++ docs/legends/TR-truth.md | 2 + 3 files changed, 49 insertions(+) create mode 100644 docs/BACKLOG/TR-011-streaming-encrypted-restore.md diff --git a/docs/BACKLOG/README.md b/docs/BACKLOG/README.md index 8b348cf..fa3ca41 100644 --- a/docs/BACKLOG/README.md +++ b/docs/BACKLOG/README.md @@ -35,6 +35,7 @@ Current backlog items: - [TR-008 — Empty-State Phrasing Consistency](./TR-008-empty-state-phrasing-consistency.md) - [TR-009 — Pre-PR Doc Cross-Link Audit](./TR-009-pre-pr-doc-cross-link-audit.md) - [TR-010 — Planning Index Consistency Review](./TR-010-planning-index-consistency-review.md) +- [TR-011 — Streaming Encrypted Restore](./TR-011-streaming-encrypted-restore.md) Archived delivered backlog items: diff --git a/docs/BACKLOG/TR-011-streaming-encrypted-restore.md 
b/docs/BACKLOG/TR-011-streaming-encrypted-restore.md new file mode 100644 index 0000000..dd1c631 --- /dev/null +++ b/docs/BACKLOG/TR-011-streaming-encrypted-restore.md @@ -0,0 +1,46 @@ +# TR-011 — Streaming Encrypted Restore + +## Legend + +- [TR — Truth](../legends/TR-truth.md) + +## Why This Exists + +`git-cas` currently streams plaintext restores chunk-by-chunk, but encrypted or +compressed restores buffer the full payload in memory before yielding output. + +That is safe and simple for the current whole-object AES-GCM format, but it +also means large encrypted restores are bounded by `maxRestoreBufferSize` and do +not yet benefit from a lower-memory temp-file streaming approach. + +## Target Outcome + +Produce a design-backed investigation of streaming encrypted/compressed restore, +including: + +- current integrity and buffering constraints +- whether decrypt-to-temp-file plus atomic rename is the right model +- benchmark questions needed to compare memory and throughput tradeoffs + +## Human Value + +Maintainers and operators should be able to understand whether large encrypted +restores can become more memory-efficient without weakening integrity +guarantees. + +## Agent Value + +Agents should be able to reason about encrypted restore constraints and propose +bounded follow-on work without hand-waving around the current buffering model. 
+ +## Linked Invariants + +- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md) + +## Notes + +- distinguish plaintext streaming from encrypted/compressed restore behavior +- account for the current whole-object AES-GCM tag model +- evaluate temp-file restore semantics before considering direct-to-destination + writes +- tie any design work to benchmark and memory observations, not intuition alone diff --git a/docs/legends/TR-truth.md b/docs/legends/TR-truth.md index 48b3df9..fcbd5e7 100644 --- a/docs/legends/TR-truth.md +++ b/docs/legends/TR-truth.md @@ -84,6 +84,7 @@ Current Truth backlog items: - [TR-008 — Empty-State Phrasing Consistency](../BACKLOG/TR-008-empty-state-phrasing-consistency.md) - [TR-009 — Pre-PR Doc Cross-Link Audit](../BACKLOG/TR-009-pre-pr-doc-cross-link-audit.md) - [TR-010 — Planning Index Consistency Review](../BACKLOG/TR-010-planning-index-consistency-review.md) +- [TR-011 — Streaming Encrypted Restore](../BACKLOG/TR-011-streaming-encrypted-restore.md) Truth work under this legend is currently focused on: @@ -94,6 +95,7 @@ Truth work under this legend is currently focused on: - evaluating service decomposition where the current boundary is under strain - improving documentation review hygiene and cross-link discoverability - keeping planning indexes and empty-state language consistent over time +- investigating lower-memory restore paths for encrypted and compressed assets ## Explicit Non-Goals