d3v07 · Jun 16, 2026
diff --git a/README.md b/README.md
@@ -13,7 +13,8 @@ Do not publish specific throughput, latency, cache-hit, Kafka-lag, or availabili
 - Evidence guide: `docs/article-evidence.md`
 - Observability map: `docs/observability.md`
 - Benchmark report template: `docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md`
-- Clean full local benchmark: `docs/benchmarks/2026-06-16-clean-full-benchmark.md`
+- Clean-tree publishable local benchmark: `docs/benchmarks/2026-06-16-clean-publish-benchmark.md`
+- Full local benchmark evidence: `docs/benchmarks/2026-06-16-clean-full-benchmark.md` (despite its name, it records `Dirty tree | yes`; rerun from a clean commit before citing final article numbers)
 - Canonical local smoke report: `docs/benchmarks/2026-06-16-final-benchmark-smoke-pulseops-benchmark.md`
 - Heavier ingest-scale report: `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md`
 - Synthetic skew generator: `scripts/generate-skewed-events.ts`
@@ -170,9 +171,13 @@ pnpm test:e2e  # Playwright
 
 ### Load Testing
 ```bash
+# Publishable local evidence must start from a clean working tree; this should print nothing.
+git status --short
+
 pnpm --silent benchmark:generate -- --tenants 100 --events 100000 --days 30 --hot-tenant-ratio 0.6 --late-arrival-ratio 0.05 --duplicate-ratio 0.01 --output jsonl > docs/benchmarks/evidence/events.jsonl
 RUN_ID=local-smoke API_URL=http://localhost:3001 GRAPHQL_URL=http://localhost:3002/graphql API_KEY=demo_key_change_this pnpm benchmark
-pnpm benchmark:report -- --run-id local-smoke --output docs/benchmarks/local-smoke-pulseops-benchmark.md
+RUN_ID=local-smoke pnpm query-plans:capture
+pnpm benchmark:report -- --run-id local-smoke --output docs/benchmarks/local-smoke-pulseops-benchmark.md --force
 RUN_ID=local-smoke pnpm validate:evidence  # writes docs/benchmarks/latest-pulseops-benchmark.md
 pnpm db:verify:fresh
 API_URL=http://localhost:3001 API_KEY=demo_key_change_this pnpm benchmark:ingest
@@ -194,14 +199,14 @@ These are benchmark targets and measurement areas, not measured claims.
 
 | Metric | Status | Notes |
 |--------|--------|-------|
-| Ingest throughput | Measured locally | See `docs/benchmarks/2026-06-16-clean-full-benchmark.md` for the clean full local run, and `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md` for the heavier fixed-rate ingest runs. The 1000 RPS target was not sustained locally. |
-| Ingest p95 latency | Measured locally | See dated benchmark reports; request acceptance latency is not aggregate visibility latency |
-| Dashboard query p95 | Measured locally | See canonical smoke report; includes k6 dashboard smoke and cold/warm cache smoke |
-| Worker catch-up | Measured locally | 200-event local smoke run; see canonical smoke report |
+| Ingest throughput | Measured locally | Use `docs/benchmarks/2026-06-16-clean-publish-benchmark.md` for clean-tree article numbers (measured at 20.00 req/s, 400 requests). `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md` remains dirty-tree stress evidence. The 1000 RPS target was not sustained locally. |
+| Ingest p95 latency | Measured locally | See dated benchmark reports; request acceptance latency is not aggregate visibility latency. |
+| Dashboard query p95 | Measured locally | See dated reports; cache smoke is cold-vs-warm local evidence, not a production cache-hit-ratio benchmark. |
+| Worker catch-up | Measured locally | 200-event bounded local smoke run; cite the worker catch-up evidence file for the exact run ID. |
 | Kafka lag | Measured locally | Smoke run returned lag to 0; heavier ingest-scale snapshot captured 10,254,305 queued messages. Do not claim a lag limit or freshness guarantee. |
 | Tenant skew impact | Smoke measured locally | Canonical local smoke reconciled 249 persisted hot-test events with Kafka lag 0: hot 201, quiet 40, medium 8. Evidence: `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-final-benchmark-smoke.json`; full long-duration skew benchmark still needed |
 | Hot-tenant DB pressure | Measured locally when `benchmark:hot-db -- --require-complete` is run | Aggregate-key pressure, request/persistence/lag reconciliation, and after-run DB snapshot; not continuous lock sampling |
-| Backpressure behavior | TBD | Record rate limits, errors, queue lag, and recovery |
+| Backpressure behavior | k6 load-script evidence only | Correlate with Kafka lag, worker catch-up, and DB metrics before making stronger backpressure claims. |
 
 ## Deployment
 

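The README now requires a clean tree before a publishable run, checked by eye with `git status --short`. The same gate could be scripted as a pre-flight step. The TypeScript sketch below is hypothetical: `dirtyPaths` and `assertCleanTree` are illustrative names, not existing PulseOps scripts. It calls `git status --porcelain`, which git documents as the script-stable form of the short status format.

```typescript
import { execFileSync } from "node:child_process";

// Extract paths from `git status --porcelain` output. Each non-empty line is two
// status characters, one space, then the path (renames appear as "old -> new").
export function dirtyPaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .map((line) => line.trimEnd()) // also drops a trailing "\r" on Windows checkouts
    .filter((line) => line.length > 0)
    .map((line) => line.slice(3));
}

// Hypothetical pre-flight guard: throw if anything in the working tree is
// modified, staged, or untracked (ignored files are not reported by git here).
export function assertCleanTree(cwd: string): void {
  const output = execFileSync("git", ["status", "--porcelain"], { cwd, encoding: "utf8" });
  const paths = dirtyPaths(output);
  if (paths.length > 0) {
    throw new Error(`Dirty tree; refusing a publishable run:\n${paths.join("\n")}`);
  }
}
```

Call it before `pnpm benchmark`, not after: the run writes new files under `docs/benchmarks/evidence/`, so the tree is expected to be dirty once the run finishes.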
diff --git a/docs/article-evidence.md b/docs/article-evidence.md
diff --git a/docs/benchmarks/2026-06-16-clean-full-benchmark.md b/docs/benchmarks/2026-06-16-clean-full-benchmark.md
@@ -1,6 +1,8 @@
 # PulseOps Benchmark Report: 2026-06-16
 
-Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark`; not production-scale
+Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark`; dirty-tree evidence; not production-scale
+
+Publishability: not final publishable article evidence because this report records `Dirty tree | yes`. Use it for review and methodology, then rerun from a clean commit before citing final numbers publicly.
 
 ## Environment
 
@@ -21,6 +23,8 @@ Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark
 | k6 version | k6 v2.0.0+dirty (commit/8c3be52cc1-dirty, go1.26.3, linux/arm64) (Docker fallback image grafana/k6:2.0.0) |
 | Dataset | local Docker dataset at report generation time |
 
+The dashboard cache measurement in this run used the default demo org/project, while the hot-tenant and run-scoped raw-event query plans target a seeded benchmark org/project. Treat the cache row as same-run local cache-path evidence, not as the hot tenant's cache latency.
+
 ## Commands
 
 ```bash

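Both reports in this diff hinge on the same article-ready rule: `Dirty tree` is `no`, run metadata status is `completed`, every requested suite completed, and run-scoped query plans are listed. A minimal TypeScript sketch of that rule as a check, assuming a hypothetical `RunMetadata` shape. The real `run-metadata-<RUN_ID>.json` schema is not shown here, so its fields would need mapping onto these names.

```typescript
// ASSUMPTION: illustrative shape only; not the actual run-metadata JSON schema.
export interface RunMetadata {
  runId: string;
  dirtyTree: boolean;
  status: string;
  suitesRequested: string[];
  suitesCompleted: string[];
  runScopedQueryPlans: string[]; // plan file paths listed for this report
}

// Return every reason a report is not article-ready; an empty array means it passes.
export function publishBlockers(meta: RunMetadata): string[] {
  const blockers: string[] = [];
  if (meta.dirtyTree) blockers.push("Dirty tree is yes");
  if (meta.status !== "completed") blockers.push(`run status is ${meta.status}`);

  const completed = new Set(meta.suitesCompleted);
  const missing = meta.suitesRequested.filter((suite) => !completed.has(suite));
  if (missing.length > 0) blockers.push(`suites not completed: ${missing.join(", ")}`);

  if (meta.runScopedQueryPlans.length === 0) blockers.push("no run-scoped query plans");
  // Run-scoped plan files embed the run ID in their names, so flag any that do not.
  const foreign = meta.runScopedQueryPlans.filter((plan) => !plan.includes(meta.runId));
  if (foreign.length > 0) blockers.push(`plans from another run: ${foreign.join(", ")}`);

  return blockers;
}
```

A report that records `Dirty tree | yes`, like the clean-full report above, always returns at least the `Dirty tree is yes` blocker. The check cannot confirm that every cited number comes from the run ID; that stays a manual review step.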
diff --git a/docs/benchmarks/2026-06-16-clean-publish-benchmark.md b/docs/benchmarks/2026-06-16-clean-publish-benchmark.md
@@ -0,0 +1,186 @@
+# PulseOps Benchmark Report: 2026-06-16
+
+Status: evidence-backed local report for run ID `2026-06-16-clean-publish-benchmark`; not production-scale
+
+Publishability: candidate publishable local evidence: clean tree, run status `completed`, every requested suite completed, and run-scoped query plans captured.
+
+## Environment
+
+| Field | Value |
+| --- | --- |
+| Git commit | `63f9556cefad9548774c0eca17b01e558eda3d87` |
+| Dirty tree | no |
+| Dirty tree details | none |
+| Machine | Apple M4 Pro, 12 logical CPUs, 24.00 GiB host memory |
+| Docker resources | 12 CPUs, 7.65 GiB memory |
+| OS | Darwin 25.5.0 arm64 |
+| Node.js version | v25.3.0 |
+| PostgreSQL version | 16.13 |
+| Redis version | 7.4.8 |
+| Kafka version | 4.2.0 |
+| PostgreSQL row count at report capture | 18376 raw events |
+| Daily aggregate row count at report capture | 630 rows |
+| Event partitions at report capture | 7 child partitions |
+| k6 version | k6 v2.0.0+dirty (commit/8c3be52cc1-dirty, go1.26.3, linux/arm64) (Docker fallback image grafana/k6:2.0.0) |
+| Dataset | local Docker dataset at report generation time |
+
+Environment values come from pre-run metadata when available. PostgreSQL row counts are captured when this report is generated, after the benchmark and query-plan capture.
+
+## Run Provenance
+
+| Field | Value |
+| --- | --- |
+| Metadata file | `docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json` |
+| Run started | 2026-06-16T21:22:27.830Z |
+| Run completed | 2026-06-16T21:24:13.432Z |
+| Run status | completed |
+| Branch at run start | `feat/publish-safe-evidence` |
+| Suites requested | ingest, hot, hotDb, dashboard, cache, worker, backpressure |
+| Suites completed | ingest, hot, hotDb, dashboard, cache, worker, backpressure |
+
+### Dirty Tree Details
+
+```text
+none
+```
+
+### Recorded Environment Overrides
+
+| Name | Value |
+| --- | --- |
+| `API_URL` | `http://localhost:3001` |
+| `BATCH_SIZE` | `20` |
+| `BURST_HOLD` | `15s` |
+| `BURST_RAMP` | `5s` |
+| `BURST_RATE` | `20` |
+| `DURATION` | `20s` |
+| `EVENTS` | `200` |
+| `GRAPHQL_URL` | `http://localhost:3002/graphql` |
+| `HOLD_DURATION` | `15s` |
+| `MAX_VUS` | `100` |
+| `ORG_ID` | `00000000-0000-4000-8000-0000000f4241` |
+| `PEAK_RATE` | `20` |
+| `POLL_MS` | `500` |
+| `PREALLOCATED_VUS` | `30` |
+| `PROJECT_ID` | `00000000-0000-4000-8000-0000001e8481` |
+| `RAMP_DOWN_DURATION` | `5s` |
+| `RAMP_DURATION` | `5s` |
+| `RATE` | `20` |
+| `RECOVERY` | `10s` |
+| `RECOVERY_RATE` | `5` |
+| `SLEEP_SECONDS` | `0` |
+| `START_RATE` | `5` |
+| `TENANT_KEYS_FILE` | `tmp/clean-publish-benchmark-tenants.json` |
+| `TIMEOUT_MS` | `120000` |
+| `VUS` | `10` |
+| `WARM_ITERATIONS` | `10` |
+
+### Recorded Suite Commands
+
+| Suite | Command |
+| --- | --- |
+| ingest | `node scripts/run-k6.js tests/load/ingest-throughput.js` |
+| hot | `node scripts/run-k6.js tests/load/hot-tenant.js` |
+| hotDb | `pnpm exec tsx scripts/measure-hot-tenant-db.ts` |
+| dashboard | `node scripts/run-k6.js tests/load/dashboard-query.js` |
+| cache | `pnpm exec tsx scripts/measure-dashboard-cache.ts` |
+| worker | `pnpm exec tsx scripts/measure-worker-catchup.ts` |
+| backpressure | `node scripts/run-k6.js tests/load/backpressure.js` |
+
+## Commands
+
+```bash
+# Commands matching the run-specific evidence files and query plans in this report:
+RUN_ID=2026-06-16-clean-publish-benchmark pnpm benchmark
+RUN_ID=2026-06-16-clean-publish-benchmark pnpm query-plans:capture
+RUN_ID=2026-06-16-clean-publish-benchmark pnpm benchmark:report -- --run-id 2026-06-16-clean-publish-benchmark --force
+
+# `pnpm benchmark` is the full-suite command; all seven requested suites completed in this run.
+```
+
+Run-specific evidence files found for this report: ingest, hot, hotDb, dashboard, cache, worker, backpressure.
+A partial run would mark missing evidence as `not found` below; this run has no such rows.
+
+## Results
+
+| Test | Command | Throughput or volume | p50 latency | p95 latency | p99 latency | Error rate or count | Kafka lag | DB notes | Result |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Ingest throughput | `pnpm benchmark:ingest` | 20.00 req/s | 3.34 ms | 7.19 ms | 10.55 ms | 0.00% | not measured by this k6 row | 400 requests | Measured; docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json |
+| Hot tenant | `pnpm benchmark:hot-tenant` | 16.99 req/s | 3.55 ms | 6.69 ms | 9.32 ms | 0.00% | not measured by this k6 row | 425 requests | Measured; docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json |
+| Hot tenant DB evidence | `pnpm benchmark:hot-db` | 425 persisted hot-test events | n/a | n/a | n/a | 0 unmatched requests | 0 | hot raw count 0.11 ms; quiet raw count 0.06 ms; 0 waiting locks at snapshot; hot 322/425; max hot events/key 257 | Measured; docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json |
+| Dashboard query | `pnpm benchmark:dashboard` | 2119.62 req/s | 4.24 ms | 6.60 ms | 11.48 ms | 0.00% | not measured by this k6 row | 42399 requests | Measured; docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json |
+| Dashboard cache | `pnpm benchmark:cache` | n/a | 1.56 ms | 3.05 ms | not captured | 0 GraphQL errors | n/a | cold 30.97 ms, 10 warm iterations | Measured; docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json |
+| Worker catch-up | `pnpm benchmark:worker` | 111.90 persisted events/s | n/a | n/a | n/a | 0 lost in run | 0 | 200 accepted / 200 persisted | Measured; docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json |
+| Backpressure | `pnpm benchmark:backpressure` | 16.28 req/s | 3.08 ms | 4.78 ms | 6.83 ms | 0.00% | not measured by this k6 row | 487 requests | Measured; docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json |
+
+## Run-Scoped Query Plans
+
+| Query | Plan file | Observation |
+| --- | --- | --- |
+| clean-publish-benchmark-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-clean-publish-benchmark-aggregate-daily-dashboard.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-graphql-cache-path | `docs/query-plans/2026-06-16-clean-publish-benchmark-graphql-cache-path.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-materialized-dashboard | `docs/query-plans/2026-06-16-clean-publish-benchmark-materialized-dashboard.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-partition-pruning-24h | `docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-24h.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-partition-pruning-30d | `docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-30d.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-chosen-index.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+| clean-publish-benchmark-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-index-disabled.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation |
+
+## Reference Query Plans
+
+These saved EXPLAIN ANALYZE files are existing repository evidence; none of them is scoped to run ID `2026-06-16-clean-publish-benchmark`, so cite them separately from this run.
+
+| Query | Plan file | Observation |
+| --- | --- | --- |
+| aggregate-daily-dashboard | `docs/query-plans/2026-06-16-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-clean-full-benchmark-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-graphql-cache-path | `docs/query-plans/2026-06-16-clean-full-benchmark-graphql-cache-path.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-materialized-dashboard | `docs/query-plans/2026-06-16-clean-full-benchmark-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-partition-pruning-24h | `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-partition-pruning-30d | `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| clean-full-benchmark-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-final-benchmark-smoke-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-graphql-cache-path | `docs/query-plans/2026-06-16-final-benchmark-smoke-graphql-cache-path.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-materialized-dashboard | `docs/query-plans/2026-06-16-final-benchmark-smoke-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-partition-pruning-24h | `docs/query-plans/2026-06-16-final-benchmark-smoke-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-partition-pruning-30d | `docs/query-plans/2026-06-16-final-benchmark-smoke-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-final-benchmark-smoke-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| final-benchmark-smoke-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-final-benchmark-smoke-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| materialized-dashboard | `docs/query-plans/2026-06-16-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| partition-pruning-24h | `docs/query-plans/2026-06-16-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| partition-pruning-30d | `docs/query-plans/2026-06-16-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+| tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run |
+
+## Evidence Files
+
+| File | Description |
+| --- | --- |
+| `docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json` | Pre-run benchmark metadata JSON |
+| `docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json` | Raw k6 ingest summary JSON |
+| `docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json` | Raw k6 hot-tenant summary JSON |
+| `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json` | Hot-tenant PostgreSQL evidence JSON |
+| `docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json` | Raw k6 dashboard-query summary JSON |
+| `docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json` | Cold/warm GraphQL cache JSON measurement |
+| `docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json` | Worker catch-up JSON measurement |
+| `docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json` | Raw k6 backpressure summary JSON |
+
+## Claims Allowed From This Run
+
+- The numbers in the table are local measurements for run ID `2026-06-16-clean-publish-benchmark` only.
+- Kafka decoupling can be discussed for this run: it has both ingest acceptance evidence and worker catch-up evidence, and the worker's final lag was 0.
+- Cache claims are limited to the cold/warm GraphQL measurement in the dashboard cache evidence file.
+- Worker throughput claims are limited to the bounded 200-event catch-up workload in the worker evidence file.
+- Hot-tenant database claims are limited to the aggregate-key pressure, representative EXPLAIN timings, reconciliation status, and after-run PostgreSQL snapshot in the hot-tenant DB evidence file.
+- Query-plan claims for this run must cite the run-scoped plan files listed above; cite the reference query-plan files separately.
+- Treat this report as article-ready only if `Dirty tree` is `no`, run metadata status is `completed`, every requested suite is completed, run-scoped query plans are listed, and every cited number comes from this run ID.
+
+## Claims Not Supported By This Run
+
+- Do not claim production scale, production readiness, or a fixed capacity limit.
+- Do not extrapolate beyond the exact workload, machine, Docker resources, and dataset above.
+- Do not claim long-duration or million-event tenant-skew behavior unless that evidence file is present.
+- Do not claim realistic cache hit ratio from a cold/warm smoke measurement.
+- Do not claim Kafka lag limits beyond the captured lag evidence; this run's worker final lag was 0.
+- Do not claim final publishable benchmark evidence from this report if `Dirty tree` is `yes`, run metadata is missing/incomplete, or run-scoped query plans are missing.
+- The fallback k6 runner is pinned to `grafana/k6:2.0.0`; record a new exact version if you override it or use a local k6 binary.
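A quick consistency check on the clean-publish Results table: for each k6 row, requests divided by throughput gives the implied run length, which should match the recorded stage overrides. The TypeScript sketch below copies its numbers from that table. The mapping of overrides to suites (`DURATION` for ingest and dashboard, `RAMP_DURATION` + `HOLD_DURATION` + `RAMP_DOWN_DURATION` for hot tenant, `BURST_RAMP` + `BURST_HOLD` + `RECOVERY` for backpressure) is an assumption inferred from the variable names, not read from the k6 scripts.

```typescript
export interface K6Row {
  suite: string;
  requests: number;
  reqPerSec: number;
  expectedSeconds: number; // ASSUMPTION: sum of the stage overrides guessed to belong to this suite
}

// Values copied from the Results table of run 2026-06-16-clean-publish-benchmark.
export const rows: K6Row[] = [
  { suite: "ingest", requests: 400, reqPerSec: 20.0, expectedSeconds: 20 }, // DURATION
  { suite: "hot", requests: 425, reqPerSec: 16.99, expectedSeconds: 5 + 15 + 5 }, // RAMP + HOLD + RAMP_DOWN
  { suite: "dashboard", requests: 42399, reqPerSec: 2119.62, expectedSeconds: 20 }, // DURATION
  { suite: "backpressure", requests: 487, reqPerSec: 16.28, expectedSeconds: 5 + 15 + 10 }, // BURST_RAMP + BURST_HOLD + RECOVERY
];

// Implied wall-clock seconds for a row: total requests / average request rate.
export function impliedSeconds(row: K6Row): number {
  return row.requests / row.reqPerSec;
}

// Flag rows whose implied duration differs from the expected one by more than `tolerance` (relative).
export function checkRows(rs: K6Row[], tolerance = 0.02): { suite: string; implied: number; ok: boolean }[] {
  return rs.map((row) => {
    const implied = impliedSeconds(row);
    const relativeError = Math.abs(implied - row.expectedSeconds) / row.expectedSeconds;
    return { suite: row.suite, implied, ok: relativeError <= tolerance };
  });
}

for (const result of checkRows(rows)) {
  console.log(`${result.suite}: ${result.implied.toFixed(2)} s implied, ${result.ok ? "consistent" : "check"}`);
}
```

With these values every row lands within 1% of its assumed duration (for example, 487 / 16.28 ≈ 29.9 s against 30 s for backpressure). The worker row is not a k6 row: 200 events at 111.90 persisted events/s implies roughly 1.8 s of catch-up.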