From 52ec76a8d1c0e80bb6561a0570d6039e9d38476b Mon Sep 17 00:00:00 2001
From: James Ross
Date: Sun, 29 Mar 2026 15:24:47 -0700
Subject: [PATCH 1/9] docs: add editor report backlog items

---
 docs/BACKLOG/README.md                        |  5 ++
 .../TR-001-architecture-reality-gap.md        | 39 +++++++++
 docs/BACKLOG/TR-002-threat-model.md           | 40 +++++++++
 docs/BACKLOG/TR-003-benchmark-baselines.md    | 37 +++++++++
 docs/BACKLOG/TR-004-design-doc-lifecycle.md   | 39 +++++++++
 .../TR-005-casservice-decomposition-plan.md   | 41 +++++++++
 docs/legends/README.md                        |  1 +
 docs/legends/TR-truth.md                      | 83 +++++++++++++++++++
 8 files changed, 285 insertions(+)
 create mode 100644 docs/BACKLOG/TR-001-architecture-reality-gap.md
 create mode 100644 docs/BACKLOG/TR-002-threat-model.md
 create mode 100644 docs/BACKLOG/TR-003-benchmark-baselines.md
 create mode 100644 docs/BACKLOG/TR-004-design-doc-lifecycle.md
 create mode 100644 docs/BACKLOG/TR-005-casservice-decomposition-plan.md
 create mode 100644 docs/legends/TR-truth.md

diff --git a/docs/BACKLOG/README.md b/docs/BACKLOG/README.md
index bd58536..8374635 100644
--- a/docs/BACKLOG/README.md
+++ b/docs/BACKLOG/README.md
@@ -30,3 +30,8 @@ Current backlog items:
 - [RL-003 — Agent Rotate](./RL-003-agent-rotate.md)
 - [RL-004 — Agent Vault Rotate](./RL-004-agent-vault-rotate.md)
 - [RL-005 — Agent Vault Lifecycle](./RL-005-agent-vault-lifecycle.md)
+- [TR-001 — Architecture Reality Gap](./TR-001-architecture-reality-gap.md)
+- [TR-002 — Threat Model](./TR-002-threat-model.md)
+- [TR-003 — Benchmark Baselines](./TR-003-benchmark-baselines.md)
+- [TR-004 — Design Doc Lifecycle](./TR-004-design-doc-lifecycle.md)
+- [TR-005 — CasService Decomposition Plan](./TR-005-casservice-decomposition-plan.md)
diff --git a/docs/BACKLOG/TR-001-architecture-reality-gap.md b/docs/BACKLOG/TR-001-architecture-reality-gap.md
new file mode 100644
index 0000000..251a1f2
--- /dev/null
+++ b/docs/BACKLOG/TR-001-architecture-reality-gap.md
@@ -0,0 +1,39 @@
+# TR-001 — Architecture Reality Gap
+
+## Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Why This Exists
+
+[ARCHITECTURE.md](../../ARCHITECTURE.md) appears to lag the shipped system.
+If it still describes flat manifests and treats Merkle structure as future work,
+it is no longer guidance. It is misinformation.
+
+## Target Outcome
+
+Either rewrite [ARCHITECTURE.md](../../ARCHITECTURE.md) so it matches the
+current code and shipped behavior, or retire it and fold the durable truth into
+the docs that are actually maintained.
+
+## Human Value
+
+Contributors and operators should be able to trust the architecture docs
+without cross-checking every claim against the code and release history.
+
+## Agent Value
+
+Agents should be able to use the architecture docs as current planning input
+instead of carrying stale assumptions into review, refactor, or documentation
+work.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Notes
+
+- verify every claim against the shipped facade and current internals
+- decide whether one repaired architecture document is better than several
+  overlapping partial maps
+- remove future-tense language for already-landed behavior
diff --git a/docs/BACKLOG/TR-002-threat-model.md b/docs/BACKLOG/TR-002-threat-model.md
new file mode 100644
index 0000000..faa9968
--- /dev/null
+++ b/docs/BACKLOG/TR-002-threat-model.md
@@ -0,0 +1,40 @@
+# TR-002 — Threat Model
+
+## Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Why This Exists
+
+`git-cas` has meaningful security behavior, but
+[SECURITY.md](../../SECURITY.md) is not the same thing as a threat model.
+Operators still need explicit answers about what is protected, what is exposed,
+and which compromises are out of scope.
+
+## Target Outcome
+
+Add [docs/THREAT_MODEL.md](../THREAT_MODEL.md) with explicit attacker models,
+trust boundaries, non-goals, and operator responsibilities.
+
+## Human Value
+
+Operators should be able to decide whether `git-cas` is appropriate for a given
+repository and threat environment without inferring guarantees from marketing or
+implementation details.
+
+## Agent Value
+
+Agents should be able to reason about security posture and cite the repo's
+actual guarantees and non-guarantees during implementation and review.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Notes
+
+- cover vault ref exposure without passphrase disclosure
+- cover recipient-based encryption and passphrase-based vault protection
+- document what Git object retention, working tree exposure, and host
+  compromise mean for the security model
+- separate design goals from operator duties and non-goals
diff --git a/docs/BACKLOG/TR-003-benchmark-baselines.md b/docs/BACKLOG/TR-003-benchmark-baselines.md
new file mode 100644
index 0000000..0acb923
--- /dev/null
+++ b/docs/BACKLOG/TR-003-benchmark-baselines.md
@@ -0,0 +1,37 @@
+# TR-003 — Benchmark Baselines
+
+## Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Why This Exists
+
+`git-cas` exposes multiple storage and chunking choices, but the repo does not
+yet publish stable benchmark guidance that helps users choose among them.
+
+## Target Outcome
+
+Add [docs/BENCHMARKS.md](../BENCHMARKS.md) with baseline results and enough
+methodology detail that maintainers can refresh it intentionally.
+
+## Human Value
+
+Operators and maintainers should be able to compare fixed-size chunking and CDC
+with real numbers instead of guesswork.
+
+## Agent Value
+
+Agents should be able to reference benchmark tradeoffs when suggesting defaults,
+tuning guidance, or follow-on optimization work.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Notes
+
+- include dataset shape, runtime, and machine assumptions
+- compare at least fixed-size versus CDC
+- capture both cost and benefit signals where practical:
+  bytes stored, chunk count, elapsed time, and restore behavior
+- keep the doc honest about benchmark scope and recency
diff --git a/docs/BACKLOG/TR-004-design-doc-lifecycle.md b/docs/BACKLOG/TR-004-design-doc-lifecycle.md
new file mode 100644
index 0000000..10d72e2
--- /dev/null
+++ b/docs/BACKLOG/TR-004-design-doc-lifecycle.md
@@ -0,0 +1,39 @@
+# TR-004 — Design Doc Lifecycle
+
+## Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Why This Exists
+
+The new legends and cycles workflow is real, but the repo still needs a clear
+rule for what happens to completed cycle docs and backlog items after work
+lands.
+
+Without that, design history can crowd out current truth.
+
+## Target Outcome
+
+Define and document how completed backlog items and cycle docs are kept,
+indexed, summarized, archived, or retired.
+
+## Human Value
+
+Maintainers should be able to distinguish active planning from historical
+context without losing useful decision records.
+
+## Agent Value
+
+Agents should be able to tell which planning artifacts are active truth,
+historical context, or implementation residue.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Notes
+
+- decide whether implemented cycle docs remain in `docs/design/` or move to an
+  archive location
+- keep indexes lightweight and current
+- prefer explicit status and lifecycle rules over ad hoc cleanup
diff --git a/docs/BACKLOG/TR-005-casservice-decomposition-plan.md b/docs/BACKLOG/TR-005-casservice-decomposition-plan.md
new file mode 100644
index 0000000..3645dae
--- /dev/null
+++ b/docs/BACKLOG/TR-005-casservice-decomposition-plan.md
@@ -0,0 +1,41 @@
+# TR-005 — CasService Decomposition Plan
+
+## Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Why This Exists
+
+[src/domain/services/CasService.js](../../src/domain/services/CasService.js)
+appears to hold multiple responsibilities under one roof:
+chunking orchestration, manifest generation, encryption flow, and vault-facing
+behavior.
+
+That may now be a real boundary problem, but it should be proven before the
+repo pays for a large refactor.
+
+## Target Outcome
+
+Produce a design-backed decomposition plan that identifies stable seams,
+candidate extractions, and the tests that would need to hold behavior in place.
+
+## Human Value
+
+Maintainers should be able to evolve the core service with less fear, clearer
+ownership boundaries, and less architectural guesswork.
+
+## Agent Value
+
+Agents should be able to make bounded changes in the core service without
+unintentionally coupling chunking, encryption, and vault behavior more tightly.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Notes
+
+- investigate before extracting
+- identify which responsibilities are already implicit subdomains
+- prefer seams that reduce coupling and improve testability
+- do not treat class count or architectural symmetry as success on their own
diff --git a/docs/legends/README.md b/docs/legends/README.md
index d52fcba..d38c3e6 100644
--- a/docs/legends/README.md
+++ b/docs/legends/README.md
@@ -16,3 +16,4 @@ Each legend should define:
 Current legend docs:

 - [RL — Relay](./RL-relay.md)
+- [TR — Truth](./TR-truth.md)
diff --git a/docs/legends/TR-truth.md b/docs/legends/TR-truth.md
new file mode 100644
index 0000000..b0ac4f6
--- /dev/null
+++ b/docs/legends/TR-truth.md
@@ -0,0 +1,83 @@
+# TR — Truth
+
+## Status
+
+Active
+
+## Theme
+
+Keep the repo honest about what `git-cas` is, how it works, what it protects,
+and what tradeoffs it makes.
+
+## Why This Legend Exists
+
+`git-cas` now has a strong front door and a substantial shipped surface, but
+parts of the repo still drift out of sync with reality:
+
+- architectural docs can lag shipped behavior
+- security docs can stop short of a real threat model
+- benchmark entrypoints can exist without stable published results
+- planning history can accumulate faster than current-state truth
+
+That kind of drift is costly for both humans and agents. It makes the repo
+harder to trust, harder to review, and harder to extend cleanly.
+
+## Human Users, Jobs, And Hills
+
+### Users
+
+- maintainers
+- contributors
+- operators evaluating storage and security tradeoffs
+
+### Jobs
+
+- understand the current architecture without reverse-engineering the code
+- understand what the cryptographic and operational guarantees do and do not
+  cover
+- understand performance tradeoffs before adopting a mode or default
+
+### Hill
+
+A maintainer or operator can read the docs and make correct architectural,
+security, and adoption decisions without discovering later that the repo told
+them something stale or incomplete.
+
+## Agent Users, Jobs, And Hills
+
+### Users
+
+- coding agents
+- review agents
+- documentation agents
+- CI and release workflows that depend on repo truth
+
+### Jobs
+
+- reason from current docs without inheriting stale assumptions
+- plan refactors and follow-on work from explicit architectural seams
+- cite threat and benchmark guidance without inventing missing context
+
+### Hill
+
+An agent can treat the repo docs and planning surfaces as reliable inputs for
+implementation, review, and follow-on planning.
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Current Cycle Surface
+
+Backlog items under this legend are currently focused on:
+
+- repairing stale architecture truth
+- publishing security and benchmark guidance that matches shipped behavior
+- defining planning-document lifecycle rules
+- evaluating service decomposition where the current boundary is under strain
+
+## Explicit Non-Goals
+
+- no documentation churn without a concrete truth gap to close
+- no architecture refactor for purity alone
+- no archival cleanup that destroys useful decision history

From 6a6c22c943f49ac5bebc705cb4916c1f747c4cfa Mon Sep 17 00:00:00 2001
From: James Ross
Date: Sun, 29 Mar 2026 16:42:34 -0700
Subject: [PATCH 2/9] docs: repair architecture reality gap

---
 ARCHITECTURE.md                               | 303 ++++++++++++++++--
 CHANGELOG.md                                  |   1 +
 docs/design/README.md                         |   1 +
 .../design/TR-001-architecture-reality-gap.md | 152 +++++++++
 docs/legends/TR-truth.md                      |   4 +
 5 files changed, 433 insertions(+), 28 deletions(-)
 create mode 100644 docs/design/TR-001-architecture-reality-gap.md

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index eb3ed85..af91b45 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -1,37 +1,284 @@
-# Architecture: @git-stunts/cas
+# Architecture: `git-cas`

-Content Addressable Store (CAS) for Git.
+This document is the high-level map of the shipped `git-cas` system.

-## 🧱 Core Concepts
+It is intentionally not a full API reference. For command and method details,
+see [docs/API.md](./docs/API.md). For crypto and security guidance, see
+[SECURITY.md](./SECURITY.md).

-### Domain Layer (`src/domain/`)
-- **Value Objects**: `Manifest` and `Chunk` represent the structured metadata of an asset.
-- **Services**: `CasService` implements streaming chunking, encryption (AES-256-GCM), and manifest generation.
+## System Model

-### Ports Layer (`src/ports/`)
-- **GitPersistencePort**: Defines how blobs and trees are saved to Git.
-- **CodecPort**: Defines how manifests are encoded (JSON, CBOR).
+`git-cas` uses Git as the storage substrate, not as a user-facing abstraction.

-### Infrastructure Layer (`src/infrastructure/`)
-- **Adapters**: `GitPersistenceAdapter` implementation using `@git-stunts/plumbing`.
-- **Codecs**: `JsonCodec` and `CborCodec`.
+At a high level, the system does four things:

-## 🚀 Scalability & Limits
+1. turns input bytes into chunk blobs stored in Git
+2. records how to rebuild those bytes in a manifest
+3. emits a Git tree that keeps the manifest and chunk blobs reachable
+4. optionally indexes trees by slug through a GC-safe vault ref

-- **Chunk Size**: Configurable, default 256KB. Minimum 1KB.
-- **Streaming**: Encryption and chunking are fully streamed. Memory usage is constant (O(1)) relative to file size.
-- **Manifest Limit**: Currently, all chunk metadata is stored in a single flat `manifest` blob. For extremely large files (>100GB), the manifest itself may become unwieldy (linear growth). Future iterations may require a Merkle Tree structure for the manifest itself.
+The same core supports:

-## 📂 Directory Structure
+- a library facade in [index.js](./index.js)
+- a human CLI and TUI under `bin/`
+- a machine-facing agent CLI under `bin/agent/`

-```text
-src/
-├── domain/
-│   ├── schemas/          # Zod and JSON schemas
-│   ├── services/         # CasService
-│   └── value-objects/    # Manifest, Chunk
-├── infrastructure/
-│   ├── adapters/         # GitPersistenceAdapter
-│   └── codecs/           # JsonCodec, CborCodec
-└── ports/                # GitPersistencePort, CodecPort
-```
\ No newline at end of file
+Those surfaces are different contracts over one shared core.
+
+## Layer Model
+
+### Facade
+
+The public entrypoint is [index.js](./index.js).
+
+`ContentAddressableStore` is a high-level facade that:
+
+- lazily initializes the underlying services
+- selects the appropriate crypto adapter for the current runtime
+- resolves chunking strategy configuration
+- wires persistence, ref, codec, crypto, chunking, and observability adapters
+- exposes convenience methods like `storeFile()` and `restoreFile()`
+
+The facade is orchestration glue. It is not the storage engine itself.
+
+### Domain
+
+The domain lives under `src/domain/`.
+
+Current key domain pieces:
+
+- `Manifest` and `Chunk`
+  - value objects that describe stored content and chunk metadata
+- `CasService`
+  - the main content orchestration service
+  - handles store, restore, tree creation, manifest reads, inspection, and
+    recipient/key operations
+- `KeyResolver`
+  - resolves key sources, passphrase-derived keys, and envelope recipient DEK
+    wrapping and unwrapping
+- `VaultService`
+  - manages the GC-safe vault ref and its commit-backed slug index
+- `rotateVaultPassphrase`
+  - coordinates vault-wide passphrase rotation across existing entries
+- `CasError`
+  - the canonical domain error type with stable codes and metadata
+
+`CasService` is still the central orchestration unit for content flows. That is
+current architecture truth, not a future-state claim.
+
+### Ports
+
+The ports live under `src/ports/`.
+
+They define the seams the domain depends on:
+
+- `GitPersistencePort`
+  - blob and tree read/write operations
+- `GitRefPort`
+  - ref resolution, commit creation, and compare-and-swap ref updates
+- `CodecPort`
+  - manifest encoding and decoding
+- `CryptoPort`
+  - hashing, encryption, decryption, random bytes, and KDF operations
+- `ChunkingPort`
+  - strategy interface for fixed-size and content-defined chunking
+- `ObservabilityPort`
+  - metrics, logs, and spans without binding the domain to Node event APIs
+
+### Infrastructure
+
+The infrastructure layer lives under `src/infrastructure/`.
+
+Current shipped adapters include:
+
+- `GitPersistenceAdapter`
+- `GitRefAdapter`
+- `NodeCryptoAdapter`
+- `BunCryptoAdapter`
+- `WebCryptoAdapter`
+- `JsonCodec`
+- `CborCodec`
+- `FixedChunker`
+- `CdcChunker`
+- `SilentObserver`
+- `EventEmitterObserver`
+- `StatsCollector`
+
+There are also small adapter helpers such as:
+
+- `createCryptoAdapter`
+  - runtime-adaptive crypto selection
+- `resolveChunker`
+  - chunker construction from config
+- `FileIOHelper`
+  - file-backed convenience helpers for the facade
+
+## Storage Model
+
+### Chunks
+
+Stored content is broken into chunks and written as Git blobs.
+
+The manifest records the authoritative ordered chunk list, including:
+
+- chunk index
+- chunk size
+- SHA-256 digest
+- backing blob OID
+
+The manifest, not the tree layout, is the source of truth for reconstruction
+order and repeated chunk occurrences.
+
+### Manifests
+
+Manifests are encoded through the configured codec:
+
+- JSON by default
+- CBOR when configured
+
+Small and medium assets use a single manifest blob.
+
+Large assets already use Merkle-style manifests. When chunk count exceeds
+`merkleThreshold`, `createTree()` writes:
+
+- a root manifest with `version: 2`
+- an empty top-level `chunks` array
+- `subManifests` references pointing at additional manifest blobs
+
+`readManifest()` resolves those sub-manifests transparently and reconstructs the
+flat logical chunk list for callers.
+
+Merkle manifests are shipped behavior, not future work.
+
+### Trees
+
+`createTree()` emits a Git tree that keeps the asset reachable.
+
+For non-Merkle assets the tree contains:
+
+- `manifest.`
+- one blob entry per unique chunk digest, in first-seen order
+
+For Merkle assets the tree contains:
+
+- `manifest.`
+- `sub-manifest-.` blobs
+- one blob entry per unique chunk digest, in first-seen order
+
+Chunk blobs are deduplicated at the tree-entry level by digest. The manifest
+still remains authoritative for repeated-chunk order and multiplicity.
+
+### Vault
+
+The vault is a GC-safe slug index rooted at `refs/cas/vault`.
+
+It is implemented as a commit chain. Each vault commit points to a tree
+containing:
+
+- one tree entry per stored slug, mapped to that asset's tree OID
+- `.vault.json` metadata for vault configuration
+
+`VaultService` owns:
+
+- slug validation
+- vault initialization
+- add, update, list, resolve, remove, and history-oriented state reads
+- compare-and-swap ref updates with retry on conflict
+- vault metadata validation
+
+Vault metadata can include passphrase-derived encryption configuration and
+related counters, but the vault still fundamentally acts as the durable
+slug-to-tree index for stored assets.
+
+## Core Flows
+
+### Store
+
+The store path looks like this:
+
+1. resolve key source or recipient envelope settings
+2. optionally gzip the input stream
+3. choose a chunking strategy
+4. optionally encrypt the processed stream
+5. write chunk blobs to Git
+6. build a manifest
+7. optionally emit a Git tree and add it to the vault
+
+Important current behavior:
+
+- encryption and recipient envelope setup are mutually exclusive
+- CDC is supported, but encryption removes CDC dedupe benefits because
+  ciphertext is pseudorandom
+- observability ports receive metrics and warnings throughout the flow
+
+### Restore
+
+The restore path:
+
+1. reads a manifest from a tree or receives one directly
+2. resolves decryption key material if needed
+3. reads and verifies chunk blobs by SHA-256 digest
+4. either streams plaintext chunks directly or buffers for decrypt/decompress
+5. returns bytes or writes them to disk through the facade helper
+
+For unencrypted and uncompressed assets, restore can operate as true chunk
+streaming. Encrypted or compressed restores currently use a buffered path with
+explicit size guards.
+
+### Vault Mutation
+
+Vault mutation is separate from the core chunk store.
+
+`VaultService` updates `refs/cas/vault` through compare-and-swap semantics,
+creating a new commit for each successful mutation and retrying on conflicts.
+
+That keeps slug resolution durable across `git gc` while leaving the content
+store itself in ordinary Git objects.
+
+## Runtime Model
+
+`git-cas` targets multiple JavaScript runtimes.
+
+The core architecture is designed so the domain does not care whether it is
+running on Node, Bun, or a Web Crypto-capable environment. Runtime differences
+are isolated in the infrastructure adapters and selected by the facade or CLI
+bootstrapping code.
+
+The repo enforces this with a real Node, Bun, and Deno test matrix.
+
+## Honest Pressure Points
+
+The main architectural pressure point today is `CasService`.
+
+It already benefits from some meaningful extractions:
+
+- `KeyResolver`
+- `VaultService`
+- `rotateVaultPassphrase`
+- chunker and crypto adapter factories
+- file I/O helpers
+
+But it still owns a broad content-orchestration surface:
+
+- store and restore
+- manifest and tree handling
+- lifecycle inspection helpers
+- recipient mutation and key rotation
+
+That is good candidate pressure for future decomposition work, but it is not yet
+a completed architectural split.
+
+## Reading This With Other Docs
+
+Use this document for the current system shape.
+
+Use these docs for adjacent truth:
+
+- [README.md](./README.md)
+  - positioning, feature overview, and release highlights
+- [docs/API.md](./docs/API.md)
+  - library and CLI reference
+- [SECURITY.md](./SECURITY.md)
+  - crypto and security guidance
+- [WORKFLOW.md](./WORKFLOW.md)
+  - current planning and delivery model
diff --git a/CHANGELOG.md b/CHANGELOG.md
index ee32eca..79263d4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -23,6 +23,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Changed

+- **Architecture map repaired** — [ARCHITECTURE.md](./ARCHITECTURE.md) now describes the shipped system instead of an older flat-manifest-only model, including Merkle manifests, the extracted `VaultService` and `KeyResolver`, current ports/adapters, and the real storage layout for trees and the vault.
 - **GitHub Actions runtime maintenance** — CI and release workflows now run on `actions/checkout@v6` and `actions/setup-node@v6`, clearing the Node 20 deprecation warnings from GitHub-hosted runners.
 - **Ubuntu-based Docker test stages** — the local/CI Node, Bun, and Deno test images now build on `ubuntu:24.04`, copying runtime binaries from the official upstream images instead of inheriting Debian-based runtime images directly, and the final test commands now run as an unprivileged `gitstunts` user.
 - **Test conventions expanded** — `test/CONVENTIONS.md` now documents Git tree filename ordering, Docker-only integration policy, pinned integration `fileParallelism: false`, and direct-argv subprocess helpers.
diff --git a/docs/design/README.md b/docs/design/README.md
index a199a51..b286a06 100644
--- a/docs/design/README.md
+++ b/docs/design/README.md
@@ -27,3 +27,4 @@ Current design docs:
 - [RL-003 — Relay: Agent Rotate](./RL-003-agent-rotate.md)
 - [RL-004 — Relay: Agent Vault Rotate](./RL-004-agent-vault-rotate.md)
 - [RL-005 — Relay: Agent Vault Lifecycle](./RL-005-agent-vault-lifecycle.md)
+- [TR-001 — Truth: Architecture Reality Gap](./TR-001-architecture-reality-gap.md)
diff --git a/docs/design/TR-001-architecture-reality-gap.md b/docs/design/TR-001-architecture-reality-gap.md
new file mode 100644
index 0000000..104ac79
--- /dev/null
+++ b/docs/design/TR-001-architecture-reality-gap.md
@@ -0,0 +1,152 @@
+# TR-001 — Truth: Architecture Reality Gap
+
+## Status
+
+Landed
+
+## Linked Legend
+
+- [TR — Truth](../legends/TR-truth.md)
+
+## Linked Invariants
+
+- [I-001 — Determinism, Trust, And Explicit Surfaces](../invariants/I-001-determinism-trust-and-explicit-surfaces.md)
+
+## Context
+
+[ARCHITECTURE.md](../../ARCHITECTURE.md) had drifted far enough from the shipped
+system that it was teaching the wrong model:
+
+- it still described manifests as flat-only
+- it did not reflect the extracted `VaultService`
+- it did not reflect the extracted `KeyResolver`
+- it did not reflect the chunking, observability, and runtime-adapter seams that
+  now exist in the code
+
+That kind of drift is worse than no architecture map at all.
+
+This cycle repairs the map without pretending the codebase is more decomposed
+than it really is.
+
+## Human Users, Jobs, And Hills
+
+### Users
+
+- maintainers
+- contributors
+- operators reading the repo before adoption or modification
+
+### Jobs
+
+- understand the current system shape quickly
+- distinguish stable seams from implementation pressure points
+- trust that the architecture docs describe shipped behavior
+
+### Hill
+
+A maintainer can read [ARCHITECTURE.md](../../ARCHITECTURE.md) and come away
+with a correct current model of storage, layering, and system responsibilities
+without having to reverse-engineer the code.
+
+## Agent Users, Jobs, And Hills
+
+### Users
+
+- coding agents
+- review agents
+- documentation agents
+
+### Jobs
+
+- reason from the docs without inheriting stale claims
+- identify the current architecture boundaries before proposing changes
+- separate current truth from future refactor intent
+
+### Hill
+
+An agent can use [ARCHITECTURE.md](../../ARCHITECTURE.md) as a reliable
+high-level map of the shipped system instead of a historical artifact.
+
+## Human Playback
+
+- Does the doc explain what gets stored in Git and how those objects stay
+  reachable?
+- Does it explain the roles of `CasService`, `VaultService`, and the facade
+  without pretending they are something they are not?
+- Does it point readers toward the more detailed docs for API and security
+  questions?
+
+## Agent Playback
+
+- Can an agent infer the current seams between domain, ports, infrastructure,
+  and CLI surfaces without reading the entire repo first?
+- Can it tell that Merkle manifests are already shipped behavior rather than a
+  future plan?
+- Can it identify the current central orchestration pressure in `CasService`
+  without mistaking that for a landed decomposition?
+
+## Explicit Non-Goals
+
+- no code refactor in this cycle
+- no attempt to turn the architecture doc into a full API reference
+- no invented decomposition that the codebase does not yet implement
+
+## Decisions
+
+### Keep A Single Architecture Map
+
+The repo still benefits from one durable architecture document.
+
+The right fix is to repair [ARCHITECTURE.md](../../ARCHITECTURE.md), not delete
+it and scatter the map across unrelated docs.
+
+### Describe The Shipped Storage Model Explicitly
+
+The repaired doc must describe the current storage truth:
+
+- chunk blobs are stored in Git
+- manifests are authoritative for ordered chunk reconstruction
+- large assets already use Merkle-style sub-manifests
+- Git trees keep manifests and chunks reachable
+- the vault is a ref-backed slug-to-tree index with metadata
+
+### Be Honest About Boundary Pressure
+
+The doc should state that `CasService` remains the central content orchestration
+unit even after extractions like `KeyResolver` and `VaultService`.
+
+That is current truth, not a flaw to paper over.
+
+## Implementation Outline
+
+1. Audit the old architecture doc against the current facade and domain code.
+2. Rewrite [ARCHITECTURE.md](../../ARCHITECTURE.md) as a current high-level map
+   of system surfaces, layers, flows, and storage structures.
+3. Add this cycle doc to the design index and surface it from the Truth legend.
+4. Record the truth-repair change in [CHANGELOG.md](../../CHANGELOG.md).
+
+## Tests To Write First
+
+No new executable tests.
+
+This is a documentation-truth cycle. Verification is:
+
+- direct cross-check against `index.js`
+- direct cross-check against `CasService.js`, `VaultService.js`, and
+  `KeyResolver.js`
+- formatting validation for the touched Markdown files
+
+## Risks And Unknowns
+
+- the doc can still become stale later if follow-on refactors do not update it
+- a high-level map can drift toward API reference if it becomes too detailed
+- `CasService` remains a pressure point, so the doc needs to stay honest without
+  overcommitting to a future split
+
+## Retrospective
+
+This cycle was worth doing first.
+
+The old doc was short, but its brevity hid real inaccuracies. Rewriting it as a
+current map repaired the biggest truth gap without forcing a premature
+architectural refactor.
diff --git a/docs/legends/TR-truth.md b/docs/legends/TR-truth.md
index b0ac4f6..0117f7a 100644
--- a/docs/legends/TR-truth.md
+++ b/docs/legends/TR-truth.md
@@ -69,6 +69,10 @@ implementation, review, and follow-on planning.

 ## Current Cycle Surface

+Current Truth design docs:
+
+- [TR-001 — Truth: Architecture Reality Gap](../design/TR-001-architecture-reality-gap.md)
+
 Backlog items under this legend are currently focused on:

 - repairing stale architecture truth

From 61866300525151a5ebe5d4cf664920db7db51bce Mon Sep 17 00:00:00 2001
From: James Ross
Date: Sun, 29 Mar 2026 18:52:37 -0700
Subject: [PATCH 3/9] docs: add threat model

---
 CHANGELOG.md                       |   1 +
 SECURITY.md                        | 131 ++++++++++-----
 docs/API.md                        | 209 +++++++++++++-----------
 docs/THREAT_MODEL.md               | 249 +++++++++++++++++++++++++++++
 docs/design/README.md              |   1 +
 docs/design/TR-002-threat-model.md | 166 +++++++++++++++++++
 docs/legends/TR-truth.md           |   1 +
 7 files changed, 625 insertions(+), 133 deletions(-)
 create mode 100644 docs/THREAT_MODEL.md
 create mode 100644 docs/design/TR-002-threat-model.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 79263d4..66efc38 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`git cas agent rotate`** — added a machine-facing rotation flow so Relay can rotate recipient keys by slug or detached tree OID and expose the resulting tree and vault side effects explicitly.
 - **`git cas agent vault rotate`** — added a machine-facing vault passphrase rotation flow so Relay can rotate encrypted vault state with explicit commit, KDF, and rotated/skipped-entry results.
 - **`git cas agent vault init|remove`** — added machine-facing vault lifecycle commands so Relay can initialize encrypted or plaintext vaults and remove entries without scraping human CLI output.
+- **Threat model doc** — added [docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md) as the canonical statement of attacker models, trust boundaries, exposed metadata, and explicit non-goals.
 - **Workflow model** — added [WORKFLOW.md](./WORKFLOW.md), explicit legends/backlog/invariants directories, and a cycle-first planning model for fresh work.
 - **Review automation baseline** — added `.github/CODEOWNERS` with repo-wide ownership for `@git-stunts`.
 - **Release runbook** — added `docs/RELEASE.md` and linked it from `CONTRIBUTING.md` as the canonical patch-release workflow.
diff --git a/SECURITY.md b/SECURITY.md
index 12fd81d..1160368 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -2,6 +2,9 @@

 This document describes the security architecture, cryptographic design, and limitations of git-cas's content-addressable storage system with optional encryption.

+For explicit attacker models, trust boundaries, protected assets, exposed
+metadata, and non-goals, see [docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md).
+
 ## Table of Contents

 1. [Operational Limits](#operational-limits)
@@ -31,21 +34,21 @@ git-cas tracks encryption operations via `encryptionCount` in vault metadata. Wh

 When using passphrase-based encryption, git-cas derives keys using PBKDF2 or scrypt.
-| Algorithm | Recommended Parameters | Notes | -|-----------|----------------------|-------| -| PBKDF2 | iterations ≥ 600,000 (SHA-256) | OWASP 2024 recommendation | -| scrypt | N=2^17, r=8, p=1 | ~128 MiB memory | +| Algorithm | Recommended Parameters | Notes | +| --------- | ------------------------------ | ------------------------- | +| PBKDF2 | iterations ≥ 600,000 (SHA-256) | OWASP 2024 recommendation | +| scrypt | N=2^17, r=8, p=1 | ~128 MiB memory | Higher iteration counts / cost parameters increase resistance to brute-force attacks but also increase the time to derive a key. Choose parameters based on your threat model and latency tolerance. ### Passphrase Entropy Recommendations -| Entropy (bits) | Example | Brute-Force Resistance | -|---------------|---------|----------------------| -| < 40 | `password123` | Trivially crackable | -| 40–60 | 4–5 random dictionary words | Weak against GPU attacks | -| 60–80 | 6+ random dictionary words or 12+ mixed characters | Moderate | -| > 80 | 8+ random dictionary words or 16+ mixed characters | Strong | +| Entropy (bits) | Example | Brute-Force Resistance | +| -------------- | -------------------------------------------------- | ------------------------ | +| < 40 | `password123` | Trivially crackable | +| 40–60 | 4–5 random dictionary words | Weak against GPU attacks | +| 60–80 | 6+ random dictionary words or 12+ mixed characters | Moderate | +| > 80 | 8+ random dictionary words or 16+ mixed characters | Strong | **Minimum recommendation**: 80+ bits of entropy for vault passphrases. Use a random passphrase generator (e.g., Diceware) rather than human-chosen passwords. @@ -53,6 +56,11 @@ Higher iteration counts / cost parameters increase resistance to brute-force att ## Threat Model +For the canonical threat model, see +[docs/THREAT_MODEL.md](./docs/THREAT_MODEL.md). + +This section is the short-form summary. 
+ ### What git-cas Protects Against git-cas provides defense against the following threat scenarios: @@ -180,6 +188,7 @@ _validateKey(key) { **Required length**: Exactly 32 bytes (256 bits) If validation fails: + - **INVALID_KEY_TYPE**: Key is not a Buffer or Uint8Array - **INVALID_KEY_LENGTH**: Key is not 32 bytes @@ -211,15 +220,18 @@ When storing content with encryption enabled: ### Step-by-Step: `store({ source, slug, filename, encryptionKey })` **Step 1: Key Validation** + ```javascript if (encryptionKey) { this._validateKey(encryptionKey); } ``` + - If `encryptionKey` is provided, validate it is a 32-byte Buffer/Uint8Array. - If validation fails, throw `CasError` with code `INVALID_KEY_TYPE` or `INVALID_KEY_LENGTH`. **Step 2: Initialize Manifest Data** + ```javascript const manifestData = { slug, @@ -230,9 +242,11 @@ const manifestData = { ``` **Step 3: Create Encryption Stream** + ```javascript const { encrypt, finalize } = this.crypto.createEncryptionStream(encryptionKey); ``` + - `createEncryptionStream()` generates a 12-byte random nonce. - Creates an `aes-256-gcm` cipher with the key and nonce. - Returns: @@ -240,18 +254,22 @@ const { encrypt, finalize } = this.crypto.createEncryptionStream(encryptionKey); - `finalize`: a function that returns encryption metadata after encryption completes **Step 4: Chunk and Store Encrypted Stream** + ```javascript await this._chunkAndStore(encrypt(source), manifestData); ``` + - The `encrypt(source)` async generator reads from the source, encrypts data incrementally, and yields encrypted buffers. - `_chunkAndStore()` buffers encrypted data to 256KB boundaries. - Each 256KB chunk is SHA-256 hashed and written as a Git blob. - Chunk metadata (index, size, digest, blob OID) is appended to `manifestData.chunks`. **Step 5: Finalize Encryption Metadata** + ```javascript manifestData.encryption = finalize(); ``` + - `finalize()` retrieves the GCM authentication tag. 
- Returns an object: ```javascript @@ -265,6 +283,7 @@ manifestData.encryption = finalize(); - This metadata is stored in the manifest's `encryption` field. **Step 6: Create Manifest** + ```javascript const manifest = new Manifest(manifestData); ``` @@ -295,6 +314,7 @@ When restoring content with encryption: ### Step-by-Step: `restore({ manifest, encryptionKey })` **Step 1: Key Validation** + ```javascript if (encryptionKey) { this._validateKey(encryptionKey); @@ -302,20 +322,21 @@ if (encryptionKey) { ``` **Step 2: Check if Key is Required** + ```javascript if (manifest.encryption?.encrypted && !encryptionKey) { - throw new CasError( - 'Encryption key required to restore encrypted content', - 'MISSING_KEY', - ); + throw new CasError('Encryption key required to restore encrypted content', 'MISSING_KEY'); } ``` + - If the manifest indicates content is encrypted but no key is provided, throw `MISSING_KEY`. **Step 3: Read and Verify Chunks** + ```javascript const chunks = await this._readAndVerifyChunks(manifest.chunks); ``` + - For each chunk in the manifest: 1. Read the Git blob by OID. 2. Compute SHA-256 digest of the blob. @@ -324,14 +345,17 @@ const chunks = await this._readAndVerifyChunks(manifest.chunks); 5. If match, append blob to `buffers` array. **Step 4: Concatenate Encrypted Chunks** + ```javascript let buffer = Buffer.concat(chunks); ``` + - All encrypted chunk buffers are concatenated into a single ciphertext buffer. **CRITICAL**: This operation loads the entire ciphertext into memory. For large files, this may cause memory exhaustion. See [Limitations](#limitations). **Step 5: Decrypt Buffer** + ```javascript if (manifest.encryption?.encrypted) { buffer = await this.decrypt({ @@ -341,6 +365,7 @@ if (manifest.encryption?.encrypted) { }); } ``` + - Extract nonce and tag from `manifest.encryption`. - Create `aes-256-gcm` decipher with key and nonce. - Set authentication tag via `setAuthTag()`. 
@@ -355,6 +380,7 @@ if (manifest.encryption?.encrypted) { - If `decipher.final()` throws (due to tag mismatch or corrupted ciphertext), catch and re-throw as `CasError` with code `INTEGRITY_ERROR`. **Step 6: Return Plaintext** + ```javascript return { buffer, bytesWritten: buffer.length }; ``` @@ -414,10 +440,12 @@ let buffer = Buffer.concat(chunks); ``` **Impact**: + - For large encrypted files (e.g., 1GB+), this can cause memory exhaustion. - Node.js has a maximum buffer size of ~2GB (depending on architecture). **Workaround**: + - Avoid encrypting extremely large files with git-cas. - If large encrypted files are required, implement application-level chunking (e.g., split a 10GB file into 10 separate 1GB files before storing). @@ -428,6 +456,7 @@ let buffer = Buffer.concat(chunks); **Issue**: AES-256-GCM decryption is currently performed on the entire ciphertext as a single operation. The authentication tag is verified only at the end of decryption. **Impact**: + - Cannot stream decrypted plaintext to the caller incrementally. - Cannot detect tampering until the entire ciphertext is processed. @@ -441,6 +470,7 @@ let buffer = Buffer.concat(chunks); - `rotateVaultPassphrase({ oldPassphrase, newPassphrase })` — rotates all envelope-encrypted vault entries atomically. **Limitations**: + - **Legacy (non-envelope) encrypted content** does not support rotation. You must restore with the old key and re-store with envelope encryption. - **Rotation does not invalidate old ciphertext**: The encrypted data blobs remain unchanged in the Git object database. An attacker who has both the old wrapped DEK (from a prior manifest commit) and the old KEK can still decrypt. To fully revoke access, the old manifest commits must be unreachable (e.g., via vault history squash + `git gc`). 
@@ -451,25 +481,30 @@ let buffer = Buffer.concat(chunks); **Issue**: While 96-bit nonces have negligible collision probability for practical use cases, the GCM security proof degrades after ~2^32 encryptions with the same key. **Impact**: + - If the same key is used to encrypt more than 2^32 files, nonce reuse becomes more likely. - Nonce reuse with AES-GCM is catastrophic: it allows attackers to recover the plaintext and authentication key. **Mitigation**: + - Rotate encryption keys after a reasonable number of operations (e.g., every 1 million encryptions, or every 90 days, whichever comes first). ### 5. Metadata Not Encrypted **Issue**: The following metadata is stored in plaintext in the manifest: + - `slug` (file identifier) - `filename` - `size` (total size of encrypted content) - `chunks` array (chunk indices, sizes, digests, blob OIDs) **Impact**: + - An attacker with access to the repository can infer file structure, sizes, and access patterns. - Chunk digests may leak information about plaintext content if chunks are small or predictable. **Mitigation**: + - If metadata privacy is required, implement application-level encryption of the entire manifest before storing it as a Git blob. ### 6. No Protection Against Replay or Rollback Attacks @@ -477,10 +512,12 @@ let buffer = Buffer.concat(chunks); **Issue**: git-cas does not include versioning or timestamps in the encryption metadata. **Impact**: + - An attacker can replace a newer manifest tree with an older one (rollback attack). - An attacker can duplicate encrypted content across different slugs (replay attack). **Mitigation**: + - Use Git commit signing to authenticate manifest trees. - Implement application-level versioning or monotonic counters. @@ -511,6 +548,7 @@ git gc --aggressive --prune=now ``` **Important**: + - `git gc` only removes objects that are not reachable from any ref (branch, tag, commit). 
- If a manifest tree is still referenced (e.g., in a commit or reflog), its chunks will NOT be pruned. @@ -519,6 +557,7 @@ git gc --aggressive --prune=now 1. **Deleted content may persist**: If you "delete" a file by removing its manifest reference, the encrypted chunks remain in `.git/objects/` until `git gc` prunes them. 2. **Reflog prevents immediate pruning**: Git's reflog keeps references to old commits for 90 days by default. To prune immediately: + ```bash git reflog expire --expire=now --all git gc --prune=now @@ -545,69 +584,77 @@ git-cas defines the following error codes for security-related operations: ### `INTEGRITY_ERROR` **Thrown when**: + - A chunk's SHA-256 digest does not match the stored digest in the manifest. - AES-256-GCM authentication tag verification fails during decryption. **Example**: + ```javascript -throw new CasError( - 'Chunk 2 integrity check failed', - 'INTEGRITY_ERROR', - { chunkIndex: 2, expected: 'abc123...', actual: 'def456...' }, -); +throw new CasError('Chunk 2 integrity check failed', 'INTEGRITY_ERROR', { + chunkIndex: 2, + expected: 'abc123...', + actual: 'def456...', +}); ``` **Possible causes**: + - Corruption of Git objects on disk. - Tampering with chunk blobs. - Wrong encryption key used for decryption (GCM tag mismatch). - Incomplete or interrupted writes. **Recommended action**: + - If this occurs during `restore()`, the file is corrupted and cannot be recovered without a backup. - If this occurs during `verifyIntegrity()`, investigate storage hardware or Git repository health. ### `INVALID_KEY_LENGTH` **Thrown when**: + - An encryption key is provided but is not exactly 32 bytes (256 bits). 
**Example**: + ```javascript -throw new CasError( - 'Encryption key must be 32 bytes, got 16', - 'INVALID_KEY_LENGTH', - { expected: 32, actual: 16 }, -); +throw new CasError('Encryption key must be 32 bytes, got 16', 'INVALID_KEY_LENGTH', { + expected: 32, + actual: 16, +}); ``` **Possible causes**: + - Incorrect key generation (e.g., using 128-bit AES key instead of 256-bit). - Key truncation during storage or transmission. - Encoding issues (e.g., base64 decoding resulting in wrong length). **Recommended action**: + - Verify key generation logic uses `crypto.randomBytes(32)` or equivalent. - Check key storage/retrieval does not corrupt or truncate the key. ### `INVALID_KEY_TYPE` **Thrown when**: + - An encryption key is provided but is not a `Buffer` or `Uint8Array`. **Example**: + ```javascript -throw new CasError( - 'Encryption key must be a Buffer or Uint8Array', - 'INVALID_KEY_TYPE', -); +throw new CasError('Encryption key must be a Buffer or Uint8Array', 'INVALID_KEY_TYPE'); ``` **Possible causes**: + - Passing a string instead of a Buffer (e.g., `"my-secret-key"` instead of `Buffer.from("my-secret-key")`). - Passing a base64-encoded string without decoding it first. **Recommended action**: + - Ensure keys are stored as `Buffer` or `Uint8Array`. - If keys are stored as hex/base64 strings, decode them before passing to git-cas: ```javascript @@ -617,44 +664,48 @@ throw new CasError( ### `MISSING_KEY` **Thrown when**: + - A manifest indicates content is encrypted (`manifest.encryption.encrypted === true`) but no `encryptionKey` is provided to `restore()`. **Example**: + ```javascript -throw new CasError( - 'Encryption key required to restore encrypted content', - 'MISSING_KEY', -); +throw new CasError('Encryption key required to restore encrypted content', 'MISSING_KEY'); ``` **Possible causes**: + - Application logic error: Forgot to pass key to `restore()`. - Key was lost or not available in the current environment. 
**Recommended action**: + - Verify the encryption key is available and passed to `restore()`. - If the key is lost, the content is permanently inaccessible. ### `RESTORE_TOO_LARGE` **Thrown when**: + - An encrypted or compressed restore would exceed the configured `maxRestoreBufferSize` limit. - The post-decompression size exceeds the limit (checked after gunzip). **Example**: + ```javascript -throw new CasError( - 'Restore buffer exceeds limit', - 'RESTORE_TOO_LARGE', - { size: 1073741824, limit: 536870912 }, -); +throw new CasError('Restore buffer exceeds limit', 'RESTORE_TOO_LARGE', { + size: 1073741824, + limit: 536870912, +}); ``` **Possible causes**: + - The asset is larger than the configured buffer limit (default 512 MiB). - A compressed asset inflates beyond the limit after decompression. **Recommended action**: + - Increase `maxRestoreBufferSize` in the `CasService` constructor or `.casrc`. - For very large assets, consider storing without encryption to enable streaming restore. @@ -663,23 +714,27 @@ throw new CasError( ### `ENCRYPTION_BUFFER_EXCEEDED` **Thrown when**: + - Web Crypto AES-GCM encryption is attempted on data exceeding the configured `maxEncryptionBufferSize`. - Web Crypto is a one-shot API — it cannot stream, so the entire plaintext must fit in memory. **Example**: + ```javascript throw new CasError( 'Streaming encryption buffered 1073741824 bytes (limit: 536870912)...', 'ENCRYPTION_BUFFER_EXCEEDED', - { accumulated: 1073741824, limit: 536870912 }, + { accumulated: 1073741824, limit: 536870912 } ); ``` **Possible causes**: + - Large chunks combined with `WebCryptoAdapter` (used in Bun/Deno). - `NodeCryptoAdapter` uses true streaming and is not affected by this limit. **Recommended action**: + - Increase `maxEncryptionBufferSize` in the `WebCryptoAdapter` constructor. - Switch to `NodeCryptoAdapter` if streaming encryption is needed. - Split the asset before storing, or store without encryption on the Web Crypto path for very large files. 
diff --git a/docs/API.md b/docs/API.md index a55f02e..10aa3f7 100644 --- a/docs/API.md +++ b/docs/API.md @@ -20,7 +20,7 @@ The main facade class providing high-level API for content-addressable storage. ### Constructor ```javascript -new ContentAddressableStore(options) +new ContentAddressableStore(options); ``` **Parameters:** @@ -47,7 +47,7 @@ const cas = new ContentAddressableStore({ plumbing }); #### createJson ```javascript -ContentAddressableStore.createJson({ plumbing, chunkSize, policy }) +ContentAddressableStore.createJson({ plumbing, chunkSize, policy }); ``` Creates a CAS instance with JSON codec. @@ -69,7 +69,7 @@ const cas = ContentAddressableStore.createJson({ plumbing }); #### createCbor ```javascript -ContentAddressableStore.createCbor({ plumbing, chunkSize, policy }) +ContentAddressableStore.createCbor({ plumbing, chunkSize, policy }); ``` Creates a CAS instance with CBOR codec. @@ -93,7 +93,7 @@ const cas = ContentAddressableStore.createCbor({ plumbing }); #### getService ```javascript -await cas.getService() +await cas.getService(); ``` Lazily initializes and returns the underlying CasService instance. @@ -109,7 +109,7 @@ const service = await cas.getService(); #### store ```javascript -await cas.store({ source, slug, filename, encryptionKey, passphrase, kdfOptions, compression }) +await cas.store({ source, slug, filename, encryptionKey, passphrase, kdfOptions, compression }); ``` Stores content from an async iterable source. @@ -143,14 +143,22 @@ const stream = createReadStream('/path/to/file.txt'); const manifest = await cas.store({ source: stream, slug: 'my-asset', - filename: 'file.txt' + filename: 'file.txt', }); ``` #### storeFile ```javascript -await cas.storeFile({ filePath, slug, filename, encryptionKey, passphrase, kdfOptions, compression }) +await cas.storeFile({ + filePath, + slug, + filename, + encryptionKey, + passphrase, + kdfOptions, + compression, +}); ``` Convenience method that opens a file and stores it. 
@@ -174,14 +182,14 @@ Convenience method that opens a file and stores it. ```javascript const manifest = await cas.storeFile({ filePath: '/path/to/file.txt', - slug: 'my-asset' + slug: 'my-asset', }); ``` #### restore ```javascript -await cas.restore({ manifest, encryptionKey, passphrase }) +await cas.restore({ manifest, encryptionKey, passphrase }); ``` Restores content from a manifest and returns the buffer. @@ -213,7 +221,7 @@ const { buffer, bytesWritten } = await cas.restore({ manifest }); #### restoreFile ```javascript -await cas.restoreFile({ manifest, encryptionKey, passphrase, outputPath }) +await cas.restoreFile({ manifest, encryptionKey, passphrase, outputPath }); ``` Restores content from a manifest and writes it to a file. @@ -234,14 +242,14 @@ Restores content from a manifest and writes it to a file. ```javascript await cas.restoreFile({ manifest, - outputPath: '/path/to/output.txt' + outputPath: '/path/to/output.txt', }); ``` #### createTree ```javascript -await cas.createTree({ manifest }) +await cas.createTree({ manifest }); ``` Creates a Git tree object from a manifest. @@ -261,7 +269,7 @@ const treeOid = await cas.createTree({ manifest }); #### verifyIntegrity ```javascript -await cas.verifyIntegrity(manifest) +await cas.verifyIntegrity(manifest); ``` Verifies the integrity of stored content by re-hashing all chunks. @@ -284,7 +292,7 @@ if (!isValid) { #### readManifest ```javascript -await cas.readManifest({ treeOid }) +await cas.readManifest({ treeOid }); ``` Reads a Git tree, locates the manifest entry, decodes it, and returns a validated Manifest value object. 
@@ -306,14 +314,14 @@ Reads a Git tree, locates the manifest entry, decodes it, and returns a validate ```javascript const treeOid = 'a1b2c3d4e5f6...'; const manifest = await cas.readManifest({ treeOid }); -console.log(manifest.slug); // "photos/vacation" -console.log(manifest.chunks); // array of Chunk objects +console.log(manifest.slug); // "photos/vacation" +console.log(manifest.chunks); // array of Chunk objects ``` #### deleteAsset ```javascript -await cas.deleteAsset({ treeOid }) +await cas.deleteAsset({ treeOid }); ``` Returns logical deletion metadata for an asset. Does not perform any destructive Git operations — the caller must remove refs, and physical deletion requires `git gc --prune`. @@ -340,7 +348,7 @@ console.log(`Asset "${slug}" has ${chunksOrphaned} chunks to clean up`); #### deriveKey ```javascript -await cas.deriveKey(options) +await cas.deriveKey(options); ``` Derives an encryption key from a passphrase using PBKDF2 or scrypt. @@ -382,7 +390,7 @@ const manifest = await cas.storeFile({ #### findOrphanedChunks ```javascript -await cas.findOrphanedChunks({ treeOids }) +await cas.findOrphanedChunks({ treeOids }); ``` Aggregates all chunk blob OIDs referenced across multiple assets and returns a report. Analysis only — does not delete or modify anything. @@ -405,7 +413,7 @@ Aggregates all chunk blob OIDs referenced across multiple assets and returns a r ```javascript const { referenced, total } = await cas.findOrphanedChunks({ - treeOids: [treeOid1, treeOid2, treeOid3] + treeOids: [treeOid1, treeOid2, treeOid3], }); console.log(`${referenced.size} unique blobs across ${total} total chunk references`); ``` @@ -413,7 +421,7 @@ console.log(`${referenced.size} unique blobs across ${total} total chunk referen #### encrypt ```javascript -await cas.encrypt({ buffer, key }) +await cas.encrypt({ buffer, key }); ``` Encrypts a buffer using AES-256-GCM. @@ -435,14 +443,14 @@ Encrypts a buffer using AES-256-GCM. 
```javascript const { buf, meta } = await cas.encrypt({ buffer: Buffer.from('secret data'), - key: crypto.randomBytes(32) + key: crypto.randomBytes(32), }); ``` #### decrypt ```javascript -await cas.decrypt({ buffer, key, meta }) +await cas.decrypt({ buffer, key, meta }); ``` Decrypts a buffer using AES-256-GCM. @@ -468,7 +476,7 @@ const decrypted = await cas.decrypt({ buffer: buf, key, meta }); #### rotateKey ```javascript -await cas.rotateKey({ manifest, oldKey, newKey, label }) +await cas.rotateKey({ manifest, oldKey, newKey, label }); ``` Rotates a recipient's encryption key without re-encrypting data blobs. Unwraps the DEK with `oldKey`, re-wraps with `newKey`, and increments `keyVersion` counters. @@ -493,7 +501,10 @@ Rotates a recipient's encryption key without re-encrypting data blobs. Unwraps t ```javascript const rotated = await cas.rotateKey({ - manifest, oldKey: aliceOldKey, newKey: aliceNewKey, label: 'alice', + manifest, + oldKey: aliceOldKey, + newKey: aliceNewKey, + label: 'alice', }); const treeOid = await cas.createTree({ manifest: rotated }); await cas.addToVault({ slug: 'my-asset', treeOid, force: true }); @@ -502,7 +513,7 @@ await cas.addToVault({ slug: 'my-asset', treeOid, force: true }); #### rotateVaultPassphrase ```javascript -await cas.rotateVaultPassphrase({ oldPassphrase, newPassphrase, kdfOptions }) +await cas.rotateVaultPassphrase({ oldPassphrase, newPassphrase, kdfOptions }); ``` Rotates the vault-level encryption passphrase. Re-wraps every envelope-encrypted entry's DEK with a new KEK derived from `newPassphrase`. Non-envelope entries are skipped. @@ -525,7 +536,8 @@ Rotates the vault-level encryption passphrase. 
Re-wraps every envelope-encrypted ```javascript const { commitOid, rotatedSlugs, skippedSlugs } = await cas.rotateVaultPassphrase({ - oldPassphrase: 'old-secret', newPassphrase: 'new-secret', + oldPassphrase: 'old-secret', + newPassphrase: 'new-secret', }); console.log(`Rotated: ${rotatedSlugs.join(', ')}`); console.log(`Skipped: ${skippedSlugs.join(', ')}`); @@ -536,7 +548,7 @@ console.log(`Skipped: ${skippedSlugs.join(', ')}`); #### chunkSize ```javascript -cas.chunkSize +cas.chunkSize; ``` Returns the configured chunk size in bytes. @@ -659,7 +671,7 @@ await cas.addToVault({ slug: 'demo/hello', treeOid }); #### listVault ```javascript -await cas.listVault() +await cas.listVault(); ``` Lists all vault entries sorted by slug. @@ -678,7 +690,7 @@ for (const { slug, treeOid } of entries) { #### removeFromVault ```javascript -await cas.removeFromVault({ slug }) +await cas.removeFromVault({ slug }); ``` Removes an entry from the vault. @@ -702,7 +714,7 @@ const { removedTreeOid } = await cas.removeFromVault({ slug: 'demo/hello' }); #### resolveVaultEntry ```javascript -await cas.resolveVaultEntry({ slug }) +await cas.resolveVaultEntry({ slug }); ``` Resolves a vault entry slug to its tree OID. @@ -727,7 +739,7 @@ const manifest = await cas.readManifest({ treeOid }); #### getVaultMetadata ```javascript -await cas.getVaultMetadata() +await cas.getVaultMetadata(); ``` Returns the vault metadata, or `null` if no vault exists. 
@@ -755,9 +767,11 @@ Slugs are validated with the following rules: - Each segment must not exceed 255 bytes - Total slug must not exceed 1024 bytes -### Vault-Level Encryption +### Vault-Configured Passphrase Encryption -When a vault is initialized with a passphrase, all store/restore operations through the vault derive the encryption key from the vault's KDF configuration: +When a vault is initialized with a passphrase, store and restore operations that +use that vault passphrase derive the asset encryption key from the vault's KDF +configuration: ```javascript // Initialize vault with encryption @@ -770,7 +784,12 @@ await cas.initVault({ passphrase: 'secret' }); // git-cas restore --slug demo/hello --out file.txt --vault-passphrase secret ``` -The vault stores the KDF parameters (algorithm, salt, iterations) in `.vault.json` — the passphrase is never stored. +The vault stores the KDF parameters (algorithm, salt, iterations) in +`.vault.json`; the passphrase is never stored. + +This does not make `refs/cas/vault` itself confidential. The vault remains a +readable slug-to-tree index for repository readers. See +[docs/THREAT_MODEL.md](./THREAT_MODEL.md) for the explicit boundary. ### CLI Vault Commands @@ -804,23 +823,23 @@ git cas rotate --slug demo/hello \ #### `git cas rotate` flags -| Flag | Description | -|------|-------------| -| `--slug ` | Resolve tree OID from vault slug (updates vault entry) | -| `--oid ` | Direct tree OID (outputs updated manifest) | -| `--old-key-file ` | Path to current 32-byte key file (required) | -| `--new-key-file ` | Path to new 32-byte key file (required) | -| `--label