docs: publish benchmark baselines #27

Merged
flyingrobots merged 2 commits into main from docs/tr-003-benchmark-baselines
Mar 30, 2026

Conversation

@flyingrobots
Member

@flyingrobots flyingrobots commented Mar 30, 2026

Summary

  • publish the first maintained benchmark baseline in docs/BENCHMARKS.md
  • land TR-003 as a Truth cycle and archive the consumed backlog card
  • add TR-011 for streaming encrypted restore follow-up work

Verification

  • CI=1 npx vitest bench --run test/benchmark/chunking.bench.js
  • npx prettier --check CHANGELOG.md docs/BENCHMARKS.md docs/BACKLOG/README.md docs/archive/BACKLOG/README.md docs/archive/BACKLOG/TR-003-benchmark-baselines.md docs/design/README.md docs/design/TR-003-benchmark-baselines.md docs/legends/TR-truth.md
  • git diff --check
  • pre-commit eslint
  • pre-push eslint
  • pre-push npm test

Summary by CodeRabbit

  • Documentation
    • Published benchmark baselines documentation with chunking performance results and guidance for development
    • Updated project documentation to reflect completed design investigation and added streaming encrypted restore work to backlog

@coderabbitai

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough


This PR closes the TR-003 Benchmark Baselines task by publishing chunking performance baselines to docs/BENCHMARKS.md and archiving the associated design documentation. It simultaneously introduces TR-011 Streaming Encrypted Restore as a new backlog item, and updates relevant indices and legends to reflect these changes.

Changes

  • Changelog & Index Updates (CHANGELOG.md, docs/design/README.md, docs/legends/TR-truth.md): Added entries documenting TR-003 completion and TR-011 initiation; promoted TR-003 from backlog to the design doc surface; added TR-011 to the active backlog listing.
  • Active Backlog Management (docs/BACKLOG/README.md, docs/BACKLOG/TR-011-streaming-encrypted-restore.md): Removed TR-003 from the live backlog; added the new TR-011 design investigation focused on lower-memory streaming encrypted restore paths and buffering constraints.
  • Published Benchmarks (docs/BENCHMARKS.md): New baseline document recording chunking tradeoff results (fixed-size vs. CDC) with throughput data and reuse percentages across variable input sizes, environment specifics, and refresh guidance.
  • Design & Archive Records (docs/design/TR-003-benchmark-baselines.md, docs/archive/BACKLOG/README.md, docs/archive/BACKLOG/TR-003-benchmark-baselines.md): Formalized the TR-003 truth cycle with scope, non-goals, decisions, implementation sequence, and verification steps; archived the backlog entry with a linked reference; corrected relative paths in the archived design doc.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A rabbit hops through docs with glee,
TR-003 benchmarks published—wild and free!
Archives tucked, new TR-011 begun,
Indices aligned, the routing's done. 📊✨

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'docs: publish benchmark baselines' directly and clearly summarizes the main change — publishing benchmark baselines documentation in docs/BENCHMARKS.md and related documentation updates.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate; docstring coverage check skipped.




@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
docs/design/TR-003-benchmark-baselines.md (2)

71-74: Optional: Consider varying sentence structure for Human Playback questions.

The static analysis tool flags three consecutive questions beginning with "Does." While the parallel structure works well for a checklist format and aids consistency, you could optionally vary the phrasing if you prefer:

  • "Does the published doc explain both throughput cost and dedupe benefit?"
  • "Is the machine and runtime environment documented?"
  • "Does it avoid pretending local measurements are universal truth?"

However, the current structure is clear and appropriate for validation questions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/TR-003-benchmark-baselines.md` around lines 71 - 74, The three
consecutive checklist items beginning with "Does" can be optionally rephrased
for variety: keep the first as-is ("Does the published doc explain both
throughput cost and dedupe benefit?"), change the second to a neutral phrasing
like "Is the machine and runtime environment documented?" (replace the line that
currently begins "Does it say what machine and runtime produced the numbers?"),
and reword the third to "Does it avoid presenting local measurements as
universal truth?" (replace the line that begins "Does it avoid pretending local
measurements are universal truth?"); update those three checklist lines in
TR-003-benchmark-baselines.md accordingly while preserving their validation
intent and checklist formatting.

77-80: Optional: Consider varying sentence structure for Agent Playback questions.

Similar to the Human Playback section, three consecutive questions begin with "Can." The parallel structure is appropriate for a checklist, but you could optionally vary the phrasing:

  • "Can an agent tell which benchmark results are published versus merely possible to derive from the harness?"
  • "Is the distinction between fixed-chunk speed and CDC edit-shift reuse benefits clear?"
  • "Can it tell how to refresh the baseline later without inventing a new method?"

The current structure is effective for validation purposes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/TR-003-benchmark-baselines.md` around lines 77 - 80, Update the
"Agent Playback" question block by rephrasing the second question to vary
sentence structure while preserving meaning: keep the first question as "Can an
agent tell which benchmark results are published versus merely possible to
derive from the harness?", change the second to "Is the distinction between
fixed-chunk speed and CDC edit-shift reuse benefits clear?", and keep the third
as "Can it tell how to refresh the baseline later without inventing a new
method?" so the three lines still form a checklist but avoid repetitive "Can"
openings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3104acb-c689-46a6-bfc1-ca843d3853ca

📥 Commits

Reviewing files that changed from the base of the PR and between 849762d and 764ac2c.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • docs/BACKLOG/README.md
  • docs/BACKLOG/TR-011-streaming-encrypted-restore.md
  • docs/BENCHMARKS.md
  • docs/archive/BACKLOG/README.md
  • docs/archive/BACKLOG/TR-003-benchmark-baselines.md
  • docs/design/README.md
  • docs/design/TR-003-benchmark-baselines.md
  • docs/legends/TR-truth.md

@flyingrobots flyingrobots merged commit 60b9dd0 into main Mar 30, 2026
6 checks passed
@flyingrobots flyingrobots deleted the docs/tr-003-benchmark-baselines branch March 30, 2026 17:23