diff --git a/README.md b/README.md
index 41841cf..9488e4b 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,8 @@ Do not publish specific throughput, latency, cache-hit, Kafka-lag, or availabili
 - Evidence guide: `docs/article-evidence.md`
 - Observability map: `docs/observability.md`
 - Benchmark report template: `docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md`
-- Clean full local benchmark: `docs/benchmarks/2026-06-16-clean-full-benchmark.md`
+- Clean publish local benchmark: `docs/benchmarks/2026-06-16-clean-publish-benchmark.md`
+- Full local benchmark evidence: `docs/benchmarks/2026-06-16-clean-full-benchmark.md` (records `Dirty tree | yes`; rerun after committing before citing final article numbers)
 - Canonical local smoke report: `docs/benchmarks/2026-06-16-final-benchmark-smoke-pulseops-benchmark.md`
 - Heavier ingest-scale report: `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md`
 - Synthetic skew generator: `scripts/generate-skewed-events.ts`
@@ -170,9 +171,13 @@ pnpm test:e2e # Playwright
 ### Load Testing

 ```bash
+# Publishable local evidence must start from a clean tree.
+git status --short
+
 pnpm --silent benchmark:generate -- --tenants 100 --events 100000 --days 30 --hot-tenant-ratio 0.6 --late-arrival-ratio 0.05 --duplicate-ratio 0.01 --output jsonl > docs/benchmarks/evidence/events.jsonl
 RUN_ID=local-smoke API_URL=http://localhost:3001 GRAPHQL_URL=http://localhost:3002/graphql API_KEY=demo_key_change_this pnpm benchmark
-pnpm benchmark:report -- --run-id local-smoke --output docs/benchmarks/local-smoke-pulseops-benchmark.md
+RUN_ID=local-smoke pnpm query-plans:capture
+pnpm benchmark:report -- --run-id local-smoke --output docs/benchmarks/local-smoke-pulseops-benchmark.md --force
 RUN_ID=local-smoke pnpm validate:evidence # writes docs/benchmarks/latest-pulseops-benchmark.md
 pnpm db:verify:fresh
 API_URL=http://localhost:3001 API_KEY=demo_key_change_this pnpm benchmark:ingest
@@ -194,14 +199,14 @@ These are benchmark targets and measurement areas, not measured claims.

 | Metric | Status | Notes |
 |--------|--------|-------|
-| Ingest throughput | Measured locally | See `docs/benchmarks/2026-06-16-clean-full-benchmark.md` for the clean full local run, and `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md` for the heavier fixed-rate ingest runs. The 1000 RPS target was not sustained locally. |
-| Ingest p95 latency | Measured locally | See dated benchmark reports; request acceptance latency is not aggregate visibility latency |
-| Dashboard query p95 | Measured locally | See canonical smoke report; includes k6 dashboard smoke and cold/warm cache smoke |
-| Worker catch-up | Measured locally | 200-event local smoke run; see canonical smoke report |
+| Ingest throughput | Measured locally | Use `docs/benchmarks/2026-06-16-clean-publish-benchmark.md` for clean-tree article numbers. `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md` remains dirty-tree stress evidence. The 1000 RPS target was not sustained locally. |
+| Ingest p95 latency | Measured locally | See dated benchmark reports; request acceptance latency is not aggregate visibility latency. |
+| Dashboard query p95 | Measured locally | See dated reports; cache smoke is cold-vs-warm local evidence, not a production cache-hit-ratio benchmark. |
+| Worker catch-up | Measured locally | 200-event bounded local smoke run; cite the worker catch-up evidence file for the exact run ID. |
 | Kafka lag | Measured locally | Smoke run returned lag to 0; heavier ingest-scale snapshot captured 10,254,305 queued messages. Do not claim a lag limit or freshness guarantee. |
 | Tenant skew impact | Smoke measured locally | Canonical local smoke reconciled 249 persisted hot-test events with Kafka lag 0: hot 201, quiet 40, medium 8. Evidence: `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-final-benchmark-smoke.json`; full long-duration skew benchmark still needed |
 | Hot-tenant DB pressure | Measured locally when `benchmark:hot-db -- --require-complete` is run | Aggregate-key pressure, request/persistence/lag reconciliation, and after-run DB snapshot; not continuous lock sampling |
-| Backpressure behavior | TBD | Record rate limits, errors, queue lag, and recovery |
+| Backpressure behavior | k6 load-script evidence only | Correlate with Kafka lag, worker catch-up, and DB metrics before making stronger backpressure claims. |

 ## Deployment

diff --git a/docs/article-evidence.md b/docs/article-evidence.md
index 081641a..270d065 100644
--- a/docs/article-evidence.md
+++ b/docs/article-evidence.md
@@ -1,249 +1,243 @@
 # PulseOps Article Evidence

-This document separates what the repository implements from what still needs a measured benchmark run. Use it as source material for articles, demos, and portfolio writeups without inventing performance numbers.
+This document is the strict evidence guide for using PulseOps in the article "Postgres Scales Until Coordination Becomes the Product."
+
+Do not use it as a hype sheet. A public claim is safe only when it points to a dated benchmark report, evidence JSON, query plan, test, migration, or source file in this repository.
+
+## Publishability Status
+
+The repository now has one checked-in benchmark report that qualifies as candidate publishable local evidence under the stricter clean-tree rule:
+
+- `docs/benchmarks/2026-06-16-clean-publish-benchmark.md` records run ID `2026-06-16-clean-publish-benchmark`, commit `63f9556cefad9548774c0eca17b01e558eda3d87`, `Dirty tree | no`, completed metadata, all benchmark suites, and run-scoped query plans.
+- `docs/benchmarks/2026-06-16-clean-full-benchmark.md` is an older full-suite local run, but it records `Dirty tree | yes`.
+- `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md` is useful stress evidence, but it also records a dirty tree and should be framed as internal/local stress evidence.
+- Future runs write `docs/benchmarks/evidence/run-metadata-.json` before benchmark suites start, so the report generator can distinguish pre-run git cleanliness from evidence files created by the run.
+
+For final article numbers, cite only `docs/benchmarks/2026-06-16-clean-publish-benchmark.md` or a newer report that also says `Dirty tree | no`.
+
+## A. Safe Public Claims
+
+These claims are supported by code and repository evidence:
+
+- PulseOps uses a Fastify ingest API, Kafka, a worker process, PostgreSQL, Redis, and a GraphQL dashboard API.
+- HTTP ingest acceptance is decoupled from PostgreSQL visibility by Kafka. A `202 Accepted` response means Kafka accepted the message, not that the event is already queryable in PostgreSQL.
+- The real authorization and tenant boundary is `org_id` plus `project_id`.
+- `tenant_id` exists in some schema objects as a generated compatibility column equal to `org_id`; it is not the auth boundary. `properties.tenant_id` is synthetic benchmark metadata only.
+- Raw events are range-partitioned by event `timestamp` with monthly local partitions.
+- Dashboard resolvers read aggregate tables when no property filter is needed and fall back to raw `events` for filtered paths or recent events.
+- Redis cache keys include `orgId`, `projectId`, cache version, query family, date range, and filters. Cache invalidation is namespace versioning; old keys remain until TTL expiry.
+- Worker idempotency is bounded by `event_dedup_keys` and aggregate updates in the `processEvent` transaction. The repo proves duplicate handling and one controlled Kafka replay window, not global exactly-once processing.
+- The benchmark suite can produce evidence for ingest acceptance, hot-tenant skew, dashboard query latency, cold-vs-warm cache behavior, worker catch-up, backpressure load, and query plans.
+- Materialized dashboard query plans are schema/query-plan evidence only. The current GraphQL resolvers do not read `mv_dashboard_metrics`.
+
+## B. Unsafe Public Claims
+
+Do not publish these claims from current evidence:
+
+- PulseOps is production-ready.
+- PulseOps proves production scale, Supabase scale, or a universal Postgres capacity limit.
+- PulseOps sustained 1000 RPS end-to-end locally. The heavier ingest-scale run explicitly shows that target was not sustained.
+- The local cache timing proves a realistic production cache-hit ratio.
+- Kafka lag has a stable SLO or upper bound.
+- Dashboard freshness is guaranteed under load.
+- The system provides exactly-once processing across every crash, rebalance, broker, network, and database failure mode.
+- Horizontal scalability has been demonstrated.
+- The materialized view is on the runtime GraphQL dashboard path.
+- Backpressure behavior has been fully characterized unless the k6 summary is correlated with Kafka lag, worker catch-up, and database metrics from the same run.
+
+## C. Evidence-Backed Benchmark Numbers

-## Architecture Evidence
+Use the clean publish run for article numbers. Keep older dirty-tree reports as review/history evidence only.

-PulseOps is an event analytics system with four main runtime paths:
+| Evidence | Current measured result | Publish caveat |
+| --- | ---: | --- |
+| Clean publish local benchmark | `docs/benchmarks/2026-06-16-clean-publish-benchmark.md` completed ingest, hot tenant, hot DB reconciliation, dashboard, cache, worker catch-up, backpressure, and query-plan capture | Candidate publishable local evidence; still not production-scale |
+| Ingest throughput in clean publish run | 400 requests, 20.00 req/s, p95 7.19 ms, 0.00% HTTP failures | Local request-acceptance number only |
+| Hot-tenant DB reconciliation in clean publish run | 425 persisted hot-test events, hot class 322/425, Kafka lag 0 | Local short run; not long-duration contention evidence |
+| Dashboard query in clean publish run | 42,399 GraphQL requests, 2119.62 req/s, p95 6.60 ms, 0.00% HTTP failures | Local k6 query mix; not a production dashboard SLO |
+| Cache smoke in clean publish run | Cold 30.97 ms, warm median 1.56 ms, warm p95 3.05 ms over 10 warm requests | Cold-vs-warm local timing, not cache-hit ratio |
+| Worker catch-up in clean publish run | 200 accepted events, 200 persisted rows, final Kafka lag 0, 111.90 persisted events/s until caught up | Bounded local worker catch-up smoke |
+| Clean full local benchmark | `docs/benchmarks/2026-06-16-clean-full-benchmark.md` completed ingest, hot tenant, hot DB reconciliation, dashboard, cache, worker catch-up, backpressure, and query-plan capture | Report says `Dirty tree | yes`; rerun after commit before final citation |
+| Ingest throughput in clean-full run | 400 requests, 20.00 req/s, p95 8.02 ms, 0.00% HTTP failures | Local request-acceptance number only |
+| Hot-tenant DB reconciliation in clean-full run | 424 persisted hot-test events, hot class 349/424, Kafka lag 0 | Local short run; not long-duration contention evidence |
+| Dashboard query in clean-full run | 42,324 GraphQL requests, 2115.89 req/s, p95 6.79 ms, 0.00% HTTP failures | Local k6 query mix; not a production dashboard SLO |
+| Cache smoke in clean-full run | Cold 115.26 ms, warm median 1.29 ms, warm p95 3.85 ms over 10 warm requests | Same run ID but default demo tenant/project, not the hot tenant; not a cache-hit-ratio claim |
+| Worker catch-up in clean-full run | 200 accepted events, 200 persisted rows, final Kafka lag 0, 107.48 persisted events/s until caught up | Bounded local worker catch-up smoke |
+| Backpressure k6 in clean-full run | 487 requests, p95 4.31 ms, 0.00% HTTP failures | k6 summary only; needs same-run lag/DB correlation for stronger claims |
+| Ingest-scale stress run | 100 RPS was clean at HTTP layer; 500 RPS showed dropped iterations and p95 716.94 ms; 1000 RPS target was not sustained | Dirty-tree stress evidence; use to avoid overclaiming |
+| Ingest-scale Kafka lag snapshot | 10,254,305 messages lag for `pulseops-aggregators` | Coordination/backpressure evidence, not a persistence guarantee |

-1. Event producers send JSON events to the Fastify ingest API at `POST /api/v1/events` or `POST /api/v1/events/batch`.
-2. The ingest API validates payload shape, applies API-key auth, attaches `org_id` and `project_id`, and publishes accepted events to Kafka topic `events-raw`.
-3. The worker consumes Kafka messages, writes raw events to PostgreSQL through an idempotency table, and updates daily aggregate rows only for new events.
-4. The GraphQL API authenticates `X-API-Key`, serves dashboard queries from `daily_aggregates` when possible, falls back to filtered raw event queries when filters require it, and caches query responses in Redis with tenant/project cache-version keys.
+## D. Files Supporting Each Claim

-Primary code references:
+| Claim | Supporting files |
+| --- | --- |
+| Ingest API publishes to Kafka and returns acceptance | `services/ingest-api/src/index.ts`, `services/ingest-api/src/events.ts`, `tests/load/ingest-throughput.js` |
+| API key maps request to tenant/project | `services/ingest-api/src/middleware/auth.ts`, `services/ingest-api/tests/integration/ingest.test.ts` |
+| GraphQL rejects cross-tenant reads | `services/graphql-api/src/resolvers.ts`, `tests/integration/tenant-isolation.test.ts`, `tests/integration/graphql-authz.test.ts` |
+| Worker inserts idempotently and updates aggregates | `services/worker/src/processing.ts`, `services/worker/src/aggregators/daily.ts`, `tests/integration/idempotency.test.ts` |
+| Worker restart/retry does not double-count in covered window | `tests/integration/worker-restart.test.ts`, `scripts/prove-worker-retry-offsets.ts`, `docs/benchmarks/evidence/worker-retry-offsets-2026-06-16-worker-retry-proof-3.json` |
+| Late events update event-time bucket | `tests/integration/late-events.test.ts`, `services/worker/src/aggregators/daily.ts` |
+| Cache keys include tenant/project/version and TTL | `services/graphql-api/src/resolvers.ts`, `services/worker/src/processing.ts`, `tests/integration/cache-correctness.test.ts`, `scripts/measure-dashboard-cache.ts` |
+| Events are partitioned by timestamp | `scripts/init-db.sql`, `migrations/006_performance_optimizations.sql`, `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-24h.md`, `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-30d.md` |
+| Index choices are query-plan-backed | `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-chosen-index.md`, `docs/query-plans/2026-06-16-clean-full-benchmark-aggregate-daily-dashboard.md` |
+| Hot-tenant skew was measured locally | `tests/load/hot-tenant.js`, `scripts/seed-benchmark-tenants.ts`, `scripts/measure-hot-tenant-db.ts`, `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-full-benchmark.json` |
+| Worker catch-up was measured locally | `scripts/measure-worker-catchup.ts`, `docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json` |
+| Ingest-scale stress showed backlog | `docs/benchmarks/2026-06-16-ingest-scale-pulseops-benchmark.md`, `docs/benchmarks/evidence/ingest-scale-snapshot-2026-06-16.json` |
+| Observability questions and current gaps | `docs/observability.md` |
+| Fresh migration path proof | `docs/migrations/safe-migration-example.md`, `docs/migrations/evidence/fresh-migration-2026-06-16-final-fresh-migration.txt` |

-- Ingest API: `services/ingest-api/src/index.ts`
-- Event schema: `services/ingest-api/src/schemas/event.ts`
-- Worker: `services/worker/src/index.ts`
-- Daily aggregation: `services/worker/src/aggregators/daily.ts`
-- GraphQL schema/resolvers: `services/graphql-api/src/schema.ts`, `services/graphql-api/src/resolvers.ts`
-- Database initialization: `scripts/init-db.sql`
-- Performance migration notes: `migrations/006_performance_optimizations.sql`
+## E. Tenant Terminology

-## Event Flow
+PulseOps uses the following tenant model:

-Accepted single events return `202 Accepted` after Kafka publish. Batch events return `202 Accepted` with the accepted count after validating and publishing all messages in the batch. The API does not synchronously wait for PostgreSQL writes, so ingestion acceptance and analytics availability are intentionally decoupled.
+| Term | Meaning in PulseOps |
+| --- | --- |
+| `org_id` | Organization boundary from API-key auth. This is the primary tenant boundary. |
+| `project_id` | Project/workspace boundary under an organization. Project-scoped API keys cannot query or ingest into another project. |
+| `tenant_id` generated column | Compatibility/generated column equal to `org_id` in several schema objects. It is not independently authenticated. |
+| `properties.tenant_id` | Synthetic benchmark label used by load tests and reports. It is not trusted for authorization. |

-Kafka messages are keyed by `org_id:project_id:user/session/event`, which preserves tenant context while spreading a hot tenant across more partition keys than org-only routing. Hot tenants can still concentrate aggregate updates on the same tenant/project metric rows, so Kafka partition distribution and aggregate write contention both need measurement.
+Use `tenant` in article prose only as the conceptual label. When discussing implementation, say PulseOps enforces `org_id` and `project_id`.

-## PostgreSQL Writes
+## F. PostgreSQL Evidence

-The worker first inserts `(org_id, project_id, event_id)` into `event_dedup_keys` inside the same transaction as raw-event and aggregate writes. If that insert conflicts, the worker treats the Kafka message as a duplicate and skips aggregate updates.
+Confirmed:

-For new events, the worker writes into `events`:
+- `events` is partitioned by `timestamp`.
+- Local partitions are monthly for the benchmark window.
+- Main raw-event lookup indexes include `idx_events_org_project_time`, `idx_events_org_time`, `idx_events_project_time`, and generated-column tenant indexes.
+- Aggregate lookup indexes include `idx_aggregates_lookup`, `idx_daily_aggregates_org_project_date`, and the uniqueness constraint on `(org_id, project_id, metric_name, date, dimensions)`.
+- Query plans are saved under `docs/query-plans/`.
+
+Important caveat:

-- `org_id`
-- `project_id`
-- `event_name`
-- `user_id`
-- `session_id`
-- `properties`
-- `timestamp`
+- `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-chosen-index.md` shows PostgreSQL used timestamp-oriented child-partition indexes for the raw `ORDER BY timestamp DESC LIMIT 100` plan, not always the parent `idx_events_org_project_time` name. Do not claim a specific index was used unless the saved `EXPLAIN ANALYZE` file shows it.
+- The 24-hour and 30-day partition-pruning plans show child partitions removed and should be cited for partition pruning, not for broad scaling claims.
+- The materialized-view plan is not the current GraphQL runtime path.

-It then increments daily aggregate rows in `daily_aggregates` for:
+## G. Kafka And Worker Evidence
+
+Confirmed:
+
+- HTTP acceptance happens after Kafka publish, before PostgreSQL write.
+- The worker consumes `events-raw`, calls `processEvent`, and then commits Kafka offsets.
+- `processEvent` writes the dedupe key, raw event, daily aggregates, and cache-version bump.
+- `scripts/measure-worker-catchup.ts` measures accepted events, persisted rows, polling samples, and Kafka lag after the bounded run.
+- `scripts/prove-worker-retry-offsets.ts` covers one real Kafka replay boundary by crashing after `processEvent` and before offset commit.
+
+Do not claim:

-- `dau`, after inserting a distinct `(org_id, project_id, date, user_id)` row into `daily_active_users`
-- `event_count` grouped by `event_name`
-- `total_events`
+- That every possible worker crash/rebalance/network/broker/database failure mode is exactly-once.
+- That Kafka lag will stay at 0 under arbitrary load.
+- That HTTP acceptance latency equals analytics visibility latency.

-The aggregate table has a uniqueness constraint on `(org_id, project_id, metric_name, date, dimensions)`, allowing `INSERT ... ON CONFLICT ... DO UPDATE` increments.
+## H. Cache Evidence
+
+Confirmed:
+
+- Cache keys include `orgId`, `projectId`, version, query family, date range, event name where relevant, and normalized filters.
+- Cache TTL is 300 seconds.
+- The worker invalidates by incrementing `cache_version:{org_id}:{project_id}`.
+- Old cache keys remain until TTL expiry; the integration test proves this by checking the old `v0` key after a `v1` namespace is created.
+- The cache benchmark measures one cold request after local cache clearing, then repeated warm requests.

-## Redis Cache
+Do not claim:

-The GraphQL resolvers cache dashboard query results in Redis with keys that include `orgId`, `projectId`, cache version, date range, query type, event name where relevant, and normalized filters. Current resolver TTL is 300 seconds. The worker increments `cache_version:{org_id}:{project_id}` after new aggregate writes so subsequent reads use a fresh cache namespace.
+- Realistic production cache-hit ratio.
+- Global cache invalidation.
+- Cache correctness for query families not covered by tests.

-Cacheable paths include:
+## I. Correctness Tests

-- `dailyActiveUsers`
-- `eventCounts`
-- `totalEvents`
-- `metrics`
-- `eventCountsOverTime`
+The following integration tests exist:

-`recentEvents` currently queries PostgreSQL directly.
+- `tests/integration/idempotency.test.ts`
+- `tests/integration/late-events.test.ts`
+- `tests/integration/worker-restart.test.ts`
+- `tests/integration/tenant-isolation.test.ts`
+- `tests/integration/graphql-authz.test.ts`
+- `tests/integration/cache-correctness.test.ts`

-Cache invalidation correctness is covered by `tests/integration/cache-correctness.test.ts`. The integration test populates a real Redis cache entry through the GraphQL resolver, processes a real worker event for the same tenant/project, verifies `cache_version:{org_id}:{project_id}` increments, and verifies the next GraphQL read uses the fresh namespace and sees the updated aggregate.
+The direct integration tests mostly exercise `processEvent` and resolver behavior. The stronger Kafka replay proof is the manual script `scripts/prove-worker-retry-offsets.ts` plus its saved evidence JSON.

-## Kafka Decoupling
+## J. Clean Benchmark Flow

-Kafka decouples request acceptance from database writes. This lets the ingest API absorb short bursts as long as Kafka accepts messages, while the worker controls database write pressure. Operationally, this means both ingest latency and end-to-end data freshness must be measured:
-
-- Ingest acceptance latency: HTTP request start to `202 Accepted`.
-- Queue lag: Kafka produced offset to committed consumer offset.
-- Analytics lag: event timestamp or ingestion time to visible aggregate/query result.
-
-Do not claim analytics are real-time unless a benchmark run measures acceptable lag under the stated load.
-
-## Observability Evidence
-
-The ingest API exposes Prometheus text metrics at `GET /metrics`. The endpoint currently includes HTTP request counts and duration, tenant/project-scoped ingest counters, batch ingest counters, Kafka produced/error counters, and PostgreSQL pool gauges. The live integration test `services/ingest-api/tests/integration/ingest.test.ts` verifies that the endpoint emits the core coordination series.
-
-The GraphQL API also exposes `GET /metrics` for resolver duration, resolver counts, API-key auth failures, Redis cache hit/miss counts by query family, and PostgreSQL pool gauges. A checked scrape after two authenticated dashboard queries is saved at `docs/benchmarks/evidence/graphql-metrics-2026-06-16.txt`.
-
-The worker exposes `GET /metrics` on port 3003 for processed events, processing errors, processing duration, cache invalidation count, and PostgreSQL pool gauges. A checked scrape after a 5-event worker catch-up smoke is saved at `docs/benchmarks/evidence/worker-metrics-2026-06-16.txt`. DB write duration remains in structured worker logs as `db_write_duration_ms`; do not claim it is exposed as a Prometheus DB-write histogram until that specific metric is wired and scraped.
-
-## Tenant Mapping
-
-The codebase uses `org_id` and `project_id` as the tenant isolation boundary. Some docs or articles may use the generic term `tenant_id`; in PulseOps that maps to:
-
-| Concept | PulseOps field | Notes |
-| --- | --- | --- |
-| Tenant | `org_id` | Organization-level boundary from API-key auth. |
-| Workspace/app | `project_id` | Project under an organization. |
-| Synthetic tenant label | `properties.tenant_id` | Useful for generated benchmark data, not an auth boundary. |
-
-Direct ingest requests cannot choose arbitrary `org_id`; the API attaches it from the authenticated API key. Project-scoped API keys also reject mismatched `project_id` values. Synthetic benchmark JSONL can include tenant metadata for analysis, but production ingest treats API-key auth as the source of truth.
-
-Tenant isolation evidence lives in `tests/integration/tenant-isolation.test.ts`, `tests/integration/graphql-authz.test.ts`, and the live ingest checks in `services/ingest-api/tests/integration/ingest.test.ts`. The ingest integration tests verify that the demo project-scoped API key rejects cross-project single and batch writes.
-
-## Partitions And Indexes
-
-The initial schema partitions `events` by `timestamp` with monthly partitions for the local 90-day benchmark window. Existing indexes support tenant/time and project/time query patterns:
-
-- `idx_events_org_time` on `(org_id, timestamp DESC)`
-- `idx_events_project_time` on `(project_id, timestamp DESC)`
-- `idx_events_org_project_time` on `(org_id, project_id, timestamp DESC)`
-- `idx_events_event_id` on `(org_id, project_id, event_id)`
-- `idx_aggregates_lookup` on `(org_id, project_id, date)`
-- `idx_aggregates_metric` on `(metric_name, date)`
-
-`migrations/006_performance_optimizations.sql` adds additional index and materialized-view ideas, but benchmark claims should cite the actual migration state used during the run.
-
-## Tenant-Skew Metrics To Capture
-
-Tenant skew is the main stress pattern for this system because Kafka keys, aggregate rows, and cache keys are tenant/project scoped. Benchmark reports should include:
-
-- Tenant distribution: hot, medium, and quiet tenant event counts.
-- Hottest tenant share of total events.
-- Kafka partition distribution by produced messages and consumer lag.
-- PostgreSQL write latency and lock/wait behavior during hot-tenant bursts.
-- Aggregate row conflict/update rate for hot `(org_id, project_id, metric_name, date, dimensions)` keys.
-- Dashboard query latency for hot tenant vs quiet tenant.
-- Cache hit ratio by query family.
-- Late-arrival count and duplicate count in the generated workload.
-
-## Benchmark Commands
-
-Generate synthetic JSONL without sending it:
+Before a final publishable benchmark, check:

 ```bash
-pnpm --silent benchmark:generate -- --tenants 100 --events 100000 --days 30 --hot-tenant-ratio 0.6 --late-arrival-ratio 0.05 --duplicate-ratio 0.01 --output jsonl > docs/benchmarks/evidence/events.jsonl
+git status --short
 ```

-Send synthetic events directly to a local ingest API:
+If anything is dirty because implementation or docs changed, stop and commit first. Do not call a dirty-tree run final article evidence.

-```bash
-API_URL=http://localhost:3001 API_KEY=demo_key_change_this pnpm benchmark:generate -- --tenants 100 --events 10000 --days 7 --hot-tenant-ratio 0.6 --late-arrival-ratio 0.05 --duplicate-ratio 0.01 --output direct
-```
-
-Run k6 ingest throughput:
-
-```bash
-API_URL=http://localhost:3001 API_KEY=demo_key_change_this pnpm benchmark:ingest
-```
-
-Run hot-tenant skew:
-
-```bash
-pnpm benchmark:seed-tenants -- --tenants 100 --hot-tenants 1 --medium-tenants 10 --manifest tmp/benchmark-tenants.json
-TENANT_KEYS_FILE=tmp/benchmark-tenants.json API_URL=http://localhost:3001 pnpm benchmark:hot-tenant
-RUN_ID= TENANT_KEYS_FILE=tmp/benchmark-tenants.json pnpm benchmark:hot-db -- --require-complete
-```
-
-Run dashboard query benchmark:
+Clean local flow:

 ```bash
-GRAPHQL_URL=http://localhost:3002/graphql ORG_ID=00000000-0000-0000-0000-000000000001 PROJECT_ID=00000000-0000-0000-0000-000000000002 pnpm benchmark:dashboard
+docker compose down -v
+docker compose up -d --build
+pnpm db:migrate
+pnpm db:seed
+pnpm health
+pnpm test
+pnpm test:integration
+pnpm typecheck
+pnpm lint
 ```

-Run cold/warm dashboard cache measurement:
+Seed benchmark tenants:

 ```bash
-RUN_ID=cache-smoke WARM_ITERATIONS=12 pnpm benchmark:cache -- --run-id cache-smoke --warm-iterations 12
+pnpm benchmark:seed-tenants -- --tenants 100 --hot-tenants 1 --medium-tenants 10 --manifest tmp/clean-benchmark-tenants.json
 ```

-Capture run-scoped PostgreSQL query plans and the GraphQL cache-path note:
+Run the conservative full benchmark:

 ```bash
-RUN_ID=2026-06-16-final-benchmark-smoke \
-ORG_ID=00000000-0000-4000-8000-0000000f4241 \
-PROJECT_ID=00000000-0000-4000-8000-0000001e8481 \
-CACHE_EVIDENCE=docs/benchmarks/evidence/dashboard-cache-2026-06-16-final-benchmark-smoke.json \
-pnpm query-plans:capture
+RUN_ID=YYYY-MM-DD-clean-full-benchmark \
+TENANT_KEYS_FILE=tmp/clean-benchmark-tenants.json \
+API_URL=http://localhost:3001 \
+GRAPHQL_URL=http://localhost:3002/graphql \
+RATE=20 \
+DURATION=20s \
+BATCH_SIZE=20 \
+START_RATE=5 \
+PEAK_RATE=20 \
+RAMP_DURATION=5s \
+HOLD_DURATION=15s \
+RAMP_DOWN_DURATION=5s \
+VUS=10 \
+SLEEP_SECONDS=0 \
+BURST_RATE=20 \
+BURST_RAMP=5s \
+BURST_HOLD=15s \
+RECOVERY_RATE=5 \
+RECOVERY=10s \
+PREALLOCATED_VUS=30 \
+MAX_VUS=100 \
+EVENTS=200 \
+TIMEOUT_MS=120000 \
+POLL_MS=500 \
+WARM_ITERATIONS=10 \
+pnpm benchmark
 ```

-Run worker catch-up measurement from HTTP acceptance through Kafka to persisted rows:
+Capture query plans, then generate the report:

 ```bash
-pnpm benchmark:worker -- --run-id worker-catchup-smoke --events 1000 --batch-size 100 --poll-ms 500 --timeout-ms 60000
+RUN_ID=YYYY-MM-DD-clean-full-benchmark ./scripts/capture-query-plans.sh
+RUN_ID=YYYY-MM-DD-clean-full-benchmark pnpm benchmark:report -- --run-id YYYY-MM-DD-clean-full-benchmark --output docs/benchmarks/YYYY-MM-DD-clean-full-benchmark.md --force
 ```

-Run the controlled worker retry/offset proof:
+The resulting report must say:

-```bash
-docker compose stop worker
-pnpm prove:worker-retry-offsets -- --timeout-ms 120000 --poll-ms 500
-docker compose start worker
-```
-
-Run backpressure benchmark:
-
-```bash
-API_URL=http://localhost:3001 API_KEY=demo_key_change_this pnpm benchmark:backpressure
+```text
+Dirty tree | no
 ```

-Run the full local benchmark suite and generate an evidence-backed report:
+## K. Article Paragraph

-```bash
-RUN_ID=local-smoke API_URL=http://localhost:3001 GRAPHQL_URL=http://localhost:3002/graphql API_KEY=demo_key_change_this pnpm benchmark
-pnpm benchmark:report -- --run-id local-smoke --output docs/benchmarks/local-smoke-pulseops-benchmark.md
-```
-
-Verify the fresh PostgreSQL migration path:
+Use this paragraph with the clean publish report. It includes measured numbers from one clean local run only.

-```bash
-pnpm db:verify:fresh
-```
+Safe paragraph:

-## Safe Public Claims
-
-These claims are supported by repository evidence:
-
-- PulseOps uses a Fastify ingest API, Kafka queue, worker aggregation process, PostgreSQL storage, Redis query cache, and GraphQL query layer.
-- Ingestion is decoupled from PostgreSQL writes by Kafka.
-- The data model scopes events and aggregates by `org_id` and `project_id`.
-- Raw events are partitioned by timestamp in the initial database schema.
-- Dashboard resolvers use Redis caching for several aggregate query paths.
-- The repository includes synthetic skew generation and k6 benchmark scripts that can produce evidence for throughput, hot-tenant behavior, dashboard query latency, and backpressure.
-- `pnpm db:verify:fresh` runs the migrator against a throwaway PostgreSQL 16 database and proves the fresh local schema path creates expected partitions, materialized dashboard evidence objects, and migration ledger rows without the known duplicate legacy index names. The checked evidence file is `docs/migrations/evidence/fresh-migration-2026-06-16-final-fresh-migration.txt`.
-- In the clean full local benchmark report `docs/benchmarks/2026-06-16-clean-full-benchmark.md`, all benchmark suites completed after a Docker volume reset using conservative local rates. The run produced ingest, hot-tenant, hot-tenant DB reconciliation, dashboard, cache, worker catch-up, backpressure, and run-scoped query-plan evidence, and final Kafka lag was 0.
-- In the canonical `2026-06-16-final-benchmark-smoke` report, the ingest smoke run accepted 226 HTTP batch requests at 14.96 requests/second with 0% HTTP request failure and p95 request latency of 7.17 ms. This is request acceptance evidence, not immediate-persistence evidence.
-- The heavier `2026-06-16-ingest-scale` report is the better source for ingest stress claims. In that local run, 100 RPS for 2 minutes was clean at the HTTP layer, 500 RPS showed stress through 1,119 dropped iterations and 716.94 ms p95 latency, and the 1000 RPS target was not sustained: actual request rate was 469.37 requests/second, dropped iterations were 158,898, HTTP failure rate was 0.739%, and p99 reached 15000.34 ms.
-- The same ingest-scale snapshot captured Kafka lag of 10,254,305 messages for consumer group `pulseops-aggregators`. That supports a coordination/backpressure claim, not a claim that all attempted events were persisted or aggregated.
-- In that same canonical smoke report, the hot-tenant DB evidence reconciled 249 successful hot-tenant k6 requests with 249 persisted events and Kafka lag 0. The hot tenant class produced 201 of those 249 persisted events.
-- The canonical dashboard cache smoke measured a 32.34 ms cold GraphQL dashboard request, then a 1.54 ms warm median and 2.31 ms warm p95 across 5 warm requests after three Redis keys were created for the tenant/project/date range.
-- The canonical worker catch-up smoke accepted 200 events through HTTP/Kafka, persisted 200 raw event rows, and returned Kafka lag to 0. The measured persisted rate until caught up was 92.72 events/second for that bounded local workload.
-- The 2026-06-16 controlled worker retry proof stopped the compose worker, ran a local worker with a one-shot crash hook after `processEvent` and before Kafka offset commit, and saved `docs/benchmarks/evidence/worker-retry-offsets-2026-06-16-worker-retry-proof-3.json`. In that proof, the worker exited with code 86, the probe event had exactly 1 raw event row, 1 dedupe key, and aggregate value 1 after the crash, Kafka lag was 1 after the crash, replay kept those database counts at 1, and Kafka lag returned to 0.
-- `pnpm benchmark:hot-db -- --require-complete` measures persisted hot-tenant distribution, aggregate-key pressure, burst windows, partition spread, representative hot/quiet query plans, an after-run PostgreSQL lock/activity snapshot, and reconciliation between k6 requests, Kafka lag, and persisted rows for a specific `run_id`.
-- In the canonical `2026-06-16-final-benchmark-smoke` hot-tenant DB evidence, `pnpm benchmark:hot-db -- --require-complete` reconciled 249 successful hot-tenant k6 requests with 249 persisted hot-test events and Kafka lag 0. The persisted events were 201 hot-class events, 40 quiet-class events, and 8 medium-class events. Evidence: `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-final-benchmark-smoke.json`.
-- The final 2026-06-16 smoke report now includes run-scoped query-plan evidence under `docs/query-plans/2026-06-16-final-benchmark-smoke-*.md`. Those files capture the run ID, Git commit, target hot-tenant org/project, row counts, indexes, event partitions, exact SQL, EXPLAIN ANALYZE output, and interpretation. The GraphQL cache-path file explicitly records that Redis cache hits do not have a PostgreSQL EXPLAIN plan and must be cited from the dashboard cache JSON evidence instead.
-
-## Unsafe Public Claims Until Measured
-
-Do not publish these as facts without a dated benchmark report and environment details:
-
-- Specific events-per-second throughput beyond the exact dated run and workload.
-- Specific p95 or p99 ingest latency beyond the exact dated run and workload. -- Specific p95 or p99 dashboard query latency. -- Kafka lag limits or dashboard freshness guarantees under load. The ingest-scale report captures a large lag snapshot, but it does not establish a stable upper bound or catch-up SLO. -- General exactly-once processing claims across every crash/rebalance/failure mode. The worker retry proof covers one controlled post-processing/pre-offset-commit replay window. -- PostgreSQL write capacity under hot-tenant skew. -- Production lock/wait behavior from `benchmark:hot-db`; its lock/activity data is an after-run snapshot unless a report explicitly says it sampled continuously during load. -- Redis cache hit ratio in realistic usage. The local cold/warm cache timing exists, but it is not a production cache-hit-ratio measurement. -- Horizontal scalability claims. -- Production availability or SLO claims. - -## Article-Ready Paragraph - -While building PulseOps, I saw a smaller version of this coordination problem. In the canonical local Docker smoke run on June 16, 2026, the ingest API accepted 226 batch requests at 14.96 requests per second with 0% HTTP request failure and 7.17 ms p95 request latency, while a separate worker catch-up proof accepted 200 events, persisted 200 raw event rows, and returned Kafka lag to 0. In the heavier ingest-scale run, the 1000 RPS target was not sustained locally and Kafka lag reached 10,254,305 messages, which made the coordination boundary explicit: HTTP acceptance was not the same thing as database visibility. The dashboard cache path showed the same issue in another form: one cold GraphQL dashboard request took 32.34 ms and populated three Redis keys; the warm median over 5 repeat requests was 1.54 ms. That is the point: a single incoming event was not just one database write. 
It entered through a Fastify ingest API, moved through Kafka, was consumed by a worker, written idempotently into partitioned PostgreSQL, aggregated into daily buckets, invalidated Redis-backed GraphQL cache keys, and appeared later in dashboard reads. The lesson was not that PulseOps was operating at massive scale. It was not. The lesson was that even at project scale, the hard part became coordination: ingestion, batching, tenant-aware metrics, cache freshness, dashboard latency, and keeping the database predictable under uneven load. +While building PulseOps, I saw a smaller project-scale version of the coordination problem. A single incoming event was not just one database write. It entered through a Fastify ingest API, was accepted after Kafka publish, consumed by a worker, written idempotently into partitioned PostgreSQL, aggregated into daily buckets, and later served through Redis-versioned GraphQL dashboard cache paths. In a clean local Docker benchmark on June 16, 2026, the ingest acceptance suite ran 400 batch requests at 20.00 requests per second with 7.19 ms p95 HTTP latency and 0.00% HTTP failures, while the worker catch-up suite accepted 200 events, persisted 200 raw rows, and ended with Kafka lag at 0. The point was not that PulseOps operated at massive scale. It did not. The useful lesson was that HTTP acceptance, queue lag, database visibility, tenant-aware metrics, cache freshness, and dashboard latency are different stages, and each one needs to be coordinated. Postgres did not become less important; it became too important to leave alone, because the surrounding platform decides how truth is routed, cached, migrated, observed, and recovered. 
diff --git a/docs/benchmarks/2026-06-16-clean-full-benchmark.md b/docs/benchmarks/2026-06-16-clean-full-benchmark.md index 7682219..1705214 100644 --- a/docs/benchmarks/2026-06-16-clean-full-benchmark.md +++ b/docs/benchmarks/2026-06-16-clean-full-benchmark.md @@ -1,6 +1,8 @@ # PulseOps Benchmark Report: 2026-06-16 -Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark`; not production-scale +Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark`; dirty-tree evidence; not production-scale + +Publishability: not final publishable article evidence because this report records `Dirty tree | yes`. Use it for review and methodology, then rerun from a clean commit before citing final numbers publicly. ## Environment @@ -21,6 +23,8 @@ Status: evidence-backed local report for run ID `2026-06-16-clean-full-benchmark | k6 version | k6 v2.0.0+dirty (commit/8c3be52cc1-dirty, go1.26.3, linux/arm64) (Docker fallback image grafana/k6:2.0.0) | | Dataset | local Docker dataset at report generation time | +The dashboard cache measurement in this run used the default demo org/project, while the hot-tenant and run-scoped raw-event query plans target a seeded benchmark org/project. Treat the cache row as same-run local cache-path evidence, not as the hot tenant's cache latency. 
+ ## Commands ```bash diff --git a/docs/benchmarks/2026-06-16-clean-publish-benchmark.md b/docs/benchmarks/2026-06-16-clean-publish-benchmark.md new file mode 100644 index 0000000..3d9a4d6 --- /dev/null +++ b/docs/benchmarks/2026-06-16-clean-publish-benchmark.md @@ -0,0 +1,186 @@ +# PulseOps Benchmark Report: 2026-06-16 + +Status: evidence-backed local report for run ID `2026-06-16-clean-publish-benchmark`; not production-scale + +Publishability: candidate publishable local evidence; still not production-scale + +## Environment + +| Field | Value | +| --- | --- | +| Git commit | `63f9556cefad9548774c0eca17b01e558eda3d87` | +| Dirty tree | no | +| Dirty tree details | none | +| Machine | Apple M4 Pro, 12 logical CPUs, 24.00 GiB host memory | +| Docker resources | 12 CPUs, 7.65 GiB | +| OS | Darwin 25.5.0 arm64 | +| Node.js version | v25.3.0 | +| PostgreSQL version | 16.13 | +| Redis version | v=7.4.8 | +| Kafka version | 4.2.0 | +| PostgreSQL row count at report capture | 18376 raw events | +| Daily aggregate row count at report capture | 630 rows | +| Event partitions at report capture | 7 child partitions | +| k6 version | k6 v2.0.0+dirty (commit/8c3be52cc1-dirty, go1.26.3, linux/arm64) (Docker fallback image grafana/k6:2.0.0) | +| Dataset | local Docker dataset at report generation time | + +Environment values come from pre-run metadata when available. PostgreSQL row counts are captured when this report is generated, after the benchmark and query-plan capture. 
+ +## Run Provenance + +| Field | Value | +| --- | --- | +| Metadata file | `docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json` | +| Run started | 2026-06-16T21:22:27.830Z | +| Run completed | 2026-06-16T21:24:13.432Z | +| Run status | completed | +| Branch at run start | `feat/publish-safe-evidence` | +| Suites requested | ingest, hot, hotDb, dashboard, cache, worker, backpressure | +| Suites completed | ingest, hot, hotDb, dashboard, cache, worker, backpressure | + +### Dirty Tree Details + +```text +none +``` + +### Recorded Environment Overrides + +| Name | Value | +| --- | --- | +| `API_URL` | `http://localhost:3001` | +| `BATCH_SIZE` | `20` | +| `BURST_HOLD` | `15s` | +| `BURST_RAMP` | `5s` | +| `BURST_RATE` | `20` | +| `DURATION` | `20s` | +| `EVENTS` | `200` | +| `GRAPHQL_URL` | `http://localhost:3002/graphql` | +| `HOLD_DURATION` | `15s` | +| `MAX_VUS` | `100` | +| `ORG_ID` | `00000000-0000-4000-8000-0000000f4241` | +| `PEAK_RATE` | `20` | +| `POLL_MS` | `500` | +| `PREALLOCATED_VUS` | `30` | +| `PROJECT_ID` | `00000000-0000-4000-8000-0000001e8481` | +| `RAMP_DOWN_DURATION` | `5s` | +| `RAMP_DURATION` | `5s` | +| `RATE` | `20` | +| `RECOVERY` | `10s` | +| `RECOVERY_RATE` | `5` | +| `SLEEP_SECONDS` | `0` | +| `START_RATE` | `5` | +| `TENANT_KEYS_FILE` | `tmp/clean-publish-benchmark-tenants.json` | +| `TIMEOUT_MS` | `120000` | +| `VUS` | `10` | +| `WARM_ITERATIONS` | `10` | + +### Recorded Suite Commands + +| Suite | Command | +| --- | --- | +| ingest | `node scripts/run-k6.js tests/load/ingest-throughput.js` | +| hot | `node scripts/run-k6.js tests/load/hot-tenant.js` | +| hotDb | `pnpm exec tsx scripts/measure-hot-tenant-db.ts` | +| dashboard | `node scripts/run-k6.js tests/load/dashboard-query.js` | +| cache | `pnpm exec tsx scripts/measure-dashboard-cache.ts` | +| worker | `pnpm exec tsx scripts/measure-worker-catchup.ts` | +| backpressure | `node scripts/run-k6.js tests/load/backpressure.js` | + +## Commands + +```bash +# 
Command matching the run-specific evidence files currently present in this report: +RUN_ID=2026-06-16-clean-publish-benchmark pnpm benchmark +RUN_ID=2026-06-16-clean-publish-benchmark pnpm benchmark:report -- --run-id 2026-06-16-clean-publish-benchmark --force + +# Full-suite command, if you want every row populated: +RUN_ID=2026-06-16-clean-publish-benchmark pnpm benchmark +``` + +Run-specific evidence files found for this report: ingest, hot, hotDb, dashboard, cache, worker, backpressure. +If only part of the suite was run, missing evidence stays marked as `not found` below. + +## Results + +| Test | Command | Throughput | p50 latency | p95 latency | p99 latency | Error rate | Kafka lag | DB notes | Result | +| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | +| Ingest throughput | `pnpm benchmark:ingest` | 20.00 req/s | 3.34 ms | 7.19 ms | 10.55 ms | 0.00% | not measured by this k6 row | 400 requests | Measured; docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json | +| Hot tenant | `pnpm benchmark:hot-tenant` | 16.99 req/s | 3.55 ms | 6.69 ms | 9.32 ms | 0.00% | not measured by this k6 row | 425 requests | Measured; docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json | +| Hot tenant DB evidence | `pnpm benchmark:hot-db` | 425 persisted hot-test events | n/a | n/a | n/a | 0 unmatched requests | 0 | hot raw count 0.11 ms; quiet raw count 0.06 ms; 0 waiting locks at snapshot; hot 322/425; max hot events/key 257 | Measured; docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json | +| Dashboard query | `pnpm benchmark:dashboard` | 2119.62 req/s | 4.24 ms | 6.60 ms | 11.48 ms | 0.00% | not measured by this k6 row | 42399 requests | Measured; docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json | +| Dashboard cache | `pnpm benchmark:cache` | n/a | 1.56 ms | 3.05 ms | not captured | 0 GraphQL errors | n/a | cold 30.97 ms, 10 warm iterations | Measured; 
docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json | +| Worker catch-up | `pnpm benchmark:worker` | 111.90 persisted events/s | n/a | n/a | n/a | 0 lost in run | 0 | 200 accepted / 200 persisted | Measured; docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json | +| Backpressure | `pnpm benchmark:backpressure` | 16.28 req/s | 3.08 ms | 4.78 ms | 6.83 ms | 0.00% | not measured by this k6 row | 487 requests | Measured; docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json | + +## Run-Scoped Query Plans + +| Query | Plan file | Observation | +| --- | --- | --- | +| clean-publish-benchmark-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-clean-publish-benchmark-aggregate-daily-dashboard.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-graphql-cache-path | `docs/query-plans/2026-06-16-clean-publish-benchmark-graphql-cache-path.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-materialized-dashboard | `docs/query-plans/2026-06-16-clean-publish-benchmark-materialized-dashboard.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-partition-pruning-24h | `docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-24h.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-partition-pruning-30d | `docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-30d.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-chosen-index.md` | Captured for run ID 
2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | +| clean-publish-benchmark-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-index-disabled.md` | Captured for run ID 2026-06-16-clean-publish-benchmark; read file for row counts and interpretation | + +## Reference Query Plans + +These saved EXPLAIN ANALYZE files are repository evidence, not generated by this benchmark report unless they explicitly mention run ID `2026-06-16-clean-publish-benchmark`. + +| Query | Plan file | Observation | +| --- | --- | --- | +| aggregate-daily-dashboard | `docs/query-plans/2026-06-16-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-clean-full-benchmark-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-graphql-cache-path | `docs/query-plans/2026-06-16-clean-full-benchmark-graphql-cache-path.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-materialized-dashboard | `docs/query-plans/2026-06-16-clean-full-benchmark-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-partition-pruning-24h | `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-partition-pruning-30d | `docs/query-plans/2026-06-16-clean-full-benchmark-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| clean-full-benchmark-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this 
benchmark run | +| clean-full-benchmark-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-clean-full-benchmark-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-aggregate-daily-dashboard | `docs/query-plans/2026-06-16-final-benchmark-smoke-aggregate-daily-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-graphql-cache-path | `docs/query-plans/2026-06-16-final-benchmark-smoke-graphql-cache-path.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-materialized-dashboard | `docs/query-plans/2026-06-16-final-benchmark-smoke-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-partition-pruning-24h | `docs/query-plans/2026-06-16-final-benchmark-smoke-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-partition-pruning-30d | `docs/query-plans/2026-06-16-final-benchmark-smoke-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-final-benchmark-smoke-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| final-benchmark-smoke-tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-final-benchmark-smoke-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| materialized-dashboard | `docs/query-plans/2026-06-16-materialized-dashboard.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| partition-pruning-24h | `docs/query-plans/2026-06-16-partition-pruning-24h.md` | Reference EXPLAIN ANALYZE 
evidence; cite separately from this benchmark run | +| partition-pruning-30d | `docs/query-plans/2026-06-16-partition-pruning-30d.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| tenant-dashboard-chosen-index | `docs/query-plans/2026-06-16-tenant-dashboard-chosen-index.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | +| tenant-dashboard-index-disabled | `docs/query-plans/2026-06-16-tenant-dashboard-index-disabled.md` | Reference EXPLAIN ANALYZE evidence; cite separately from this benchmark run | + +## Evidence Files + +| File | Description | +| --- | --- | +| `docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json` | Pre-run benchmark metadata JSON | +| `docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json` | Raw k6 ingest summary JSON | +| `docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json` | Raw k6 hot-tenant summary JSON | +| `docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json` | Hot-tenant PostgreSQL evidence JSON | +| `docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json` | Raw k6 dashboard-query summary JSON | +| `docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json` | Cold/warm GraphQL cache JSON measurement | +| `docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json` | Worker catch-up JSON measurement | +| `docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json` | Raw k6 backpressure summary JSON | + +## Claims Allowed From This Run + +- The numbers in the table are local measurements for run ID `2026-06-16-clean-publish-benchmark` only. +- Kafka decoupling can be discussed when ingest acceptance and worker catch-up or lag evidence are both present. +- Cache claims are limited to the cold/warm GraphQL measurement if the dashboard cache evidence file exists. 
+- Worker throughput claims are limited to the bounded worker catch-up workload if the worker evidence file exists. +- Hot-tenant database claims are limited to the aggregate-key pressure, representative EXPLAIN timings, reconciliation status, and after-run PostgreSQL snapshot in the hot-tenant DB evidence file if present. +- Query plan claims from this run require run-scoped files above. Otherwise cite the reference query-plan files separately. +- Treat this report as article-ready only if `Dirty tree` is `no`, run metadata status is `completed`, every requested suite is completed, run-scoped query plans are listed, and every cited number comes from this run ID. + +## Claims Not Supported By This Run + +- Do not claim production scale, production readiness, or a fixed capacity limit. +- Do not extrapolate beyond the exact workload, machine, Docker resources, and dataset above. +- Do not claim long-duration or million-event tenant-skew behavior unless that evidence file is present. +- Do not claim realistic cache hit ratio from a cold/warm smoke measurement. +- Do not claim Kafka lag limits beyond the captured lag evidence; this run's worker final lag was 0. +- Do not claim final publishable benchmark evidence from this report if `Dirty tree` is `yes`, run metadata is missing/incomplete, or run-scoped query plans are missing. +- The fallback k6 runner is pinned to `grafana/k6:2.0.0`; record a new exact version if you override it or use a local k6 binary. diff --git a/docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md b/docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md index 5e9468d..126809c 100644 --- a/docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md +++ b/docs/benchmarks/YYYY-MM-DD-pulseops-benchmark.md @@ -2,12 +2,15 @@ Status: TBD / not run +Publishability: not publishable until this report has a run ID, run metadata, `Dirty tree | no`, exact commands, raw evidence files, and query plans for every cited claim. 
+ ## Environment | Field | Value | | --- | --- | | Git commit | TBD | | Dirty tree | TBD | +| Dirty tree details | TBD | | Machine/container limits | TBD | | Node.js version | TBD | | PostgreSQL version | TBD | @@ -16,6 +19,37 @@ Status: TBD / not run | k6 version | TBD | | Dataset | TBD | +## Run Provenance + +| Field | Value | +| --- | --- | +| Metadata file | TBD | +| Run started | TBD | +| Run completed | TBD | +| Run status | TBD | +| Branch at run start | TBD | +| Suites requested | TBD | +| Suites completed | TBD | + +### Recorded Environment Overrides + +| Name | Value | +| --- | --- | +| TBD | TBD | + +### Validation Gates + +| Gate | Result | Evidence | +| --- | --- | --- | +| `git status --short` before benchmark | TBD | Must be empty for final article evidence | +| Docker reset | TBD | Command output or report note | +| Migrate and seed | TBD | Command output or report note | +| `pnpm health` | TBD | Command output or report note | +| `pnpm test` | TBD | Command output or report note | +| `pnpm test:integration` | TBD | Command output or report note | +| `pnpm typecheck` | TBD | Command output or report note | +| `pnpm lint` | TBD | Command output or report note | + ## Workload | Parameter | Value | @@ -62,6 +96,7 @@ Status: TBD / not run | File | Description | | --- | --- | | TBD | Raw k6 summary JSON | +| TBD | Pre-run benchmark metadata JSON | | TBD | Generator metadata | | TBD | PostgreSQL query plan | | TBD | Kafka lag snapshot | @@ -72,8 +107,10 @@ Status: TBD / not run ## Claims Allowed From This Run -- TBD +- Only claims tied to this run ID, this commit, this machine/container setup, and the exact evidence files listed above. +- Request-acceptance latency is not database visibility latency. +- Cache timing is cold-vs-warm local timing unless a separate cache-hit-ratio workload is measured. 
## Claims Not Supported By This Run -- TBD +- Production scale, production readiness, Supabase-scale comparisons, horizontal scalability, global exactly-once processing, fixed throughput capacity, stable Kafka lag SLOs, and realistic production cache-hit ratio. diff --git a/docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..264fdf3 --- /dev/null +++ b/docs/benchmarks/evidence/backpressure-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,234 @@ +{ + "root_group": { + "path": "", + "id": "d41d8cd98f00b204e9800998ecf8427e", + "groups": [], + "checks": [ + { + "path": "::accepted, limited, or saturated", + "id": "7e8df65fd154c2d3bd8d977f0ef1cf22", + "passes": 487, + "fails": 0, + "name": "accepted, limited, or saturated" + }, + { + "passes": 487, + "fails": 0, + "name": "not a validation failure", + "path": "::not a validation failure", + "id": "f384cc847cef144de5644bbb7040d223" + } + ], + "name": "" + }, + "options": { + "summaryTrendStats": [ + "avg", + "min", + "med", + "p(90)", + "p(95)", + "p(99)", + "max" + ], + "summaryTimeUnit": "", + "noColor": false + }, + "state": { + "isStdOutTTY": false, + "isStdErrTTY": false, + "testRunDurationMs": 29911.166458 + }, + "metrics": { + "iteration_duration": { + "type": "trend", + "contains": "time", + "values": { + "max": 77.373042, + "avg": 4.0733215482546195, + "min": 2.666959, + "med": 3.531084, + "p(90)": 4.817674800000001, + "p(95)": 5.624525199999999, + "p(99)": 7.756529759999998 + } + }, + "http_req_duration{expected_response:true}": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 3.974975, + "p(95)": 4.781341199999998, + "p(99)": 6.8265606199999995, + "max": 71.596958, + "avg": 3.513741954825458, + "min": 2.278833, + "med": 3.082166 + } + }, + "http_req_waiting": { + "type": "trend", + "contains": "time", + "values": { + "avg": 3.3652921601642714, + 
"min": 2.194708, + "med": 2.943542, + "p(90)": 3.8230002, + "p(95)": 4.5309501999999995, + "p(99)": 6.585091139999999, + "max": 67.428708 + } + }, + "data_sent": { + "type": "counter", + "contains": "data", + "values": { + "count": 3767812, + "rate": 125966.73571024396 + } + }, + "http_reqs": { + "values": { + "count": 487, + "rate": 16.281544910119933 + }, + "type": "counter", + "contains": "default" + }, + "http_req_connecting": { + "type": "trend", + "contains": "time", + "values": { + "p(99)": 1.1285969999999999, + "max": 1.345625, + "avg": 0.06521415605749488, + "min": 0, + "med": 0, + "p(90)": 0, + "p(95)": 0.9701042999999997 + } + }, + "http_req_tls_handshaking": { + "type": "trend", + "contains": "time", + "values": { + "max": 0, + "avg": 0, + "min": 0, + "med": 0, + "p(90)": 0, + "p(95)": 0, + "p(99)": 0 + } + }, + "http_req_failed": { + "values": { + "rate": 0, + "passes": 0, + "fails": 487 + }, + "type": "rate", + "contains": "default" + }, + "iterations": { + "contains": "default", + "values": { + "count": 487, + "rate": 16.281544910119933 + }, + "type": "counter" + }, + "http_req_blocked": { + "type": "trend", + "contains": "time", + "values": { + "max": 1.554083, + "avg": 0.07338157905544129, + "min": 0.002541, + "med": 0.003625, + "p(90)": 0.007225400000000001, + "p(95)": 1.0132752, + "p(99)": 1.2129588799999997 + } + }, + "http_req_duration": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 3.974975, + "p(95)": 4.781341199999998, + "p(99)": 6.8265606199999995, + "max": 71.596958, + "avg": 3.513741954825458, + "min": 2.278833, + "med": 3.082166 + } + }, + "vus_max": { + "values": { + "value": 30, + "min": 30, + "max": 30 + }, + "type": "gauge", + "contains": "default" + }, + "checks": { + "contains": "default", + "values": { + "fails": 0, + "rate": 1, + "passes": 974 + }, + "thresholds": { + "rate>0.80": { + "ok": true + } + }, + "type": "rate" + }, + "data_received": { + "type": "counter", + "contains": "data", + "values": { + 
"count": 162658, + "rate": 5438.035999980058 + } + }, + "http_req_receiving": { + "type": "trend", + "contains": "time", + "values": { + "med": 0.05175, + "p(90)": 0.0767922, + "p(95)": 0.09514979999999991, + "p(99)": 0.14174476, + "max": 0.203083, + "avg": 0.05640049075975355, + "min": 0.027875 + } + }, + "vus": { + "type": "gauge", + "contains": "default", + "values": { + "value": 0, + "min": 0, + "max": 0 + } + }, + "http_req_sending": { + "type": "trend", + "contains": "time", + "values": { + "min": 0.044875, + "med": 0.07725, + "p(90)": 0.1044586, + "p(95)": 0.13634549999999998, + "p(99)": 0.21404088, + "max": 4.094292, + "avg": 0.09204930390143744 + } + } + } +} \ No newline at end of file diff --git a/docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..7fa8c94 --- /dev/null +++ b/docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,117 @@ +{ + "run_id": "2026-06-16-clean-publish-benchmark", + "captured_at": "2026-06-16T21:23:38.391Z", + "graphql_url": "http://localhost:3002/graphql", + "org_id": "00000000-0000-4000-8000-0000000f4241", + "project_id": "00000000-0000-4000-8000-0000001e8481", + "date_range": { + "start": "2026-06-01", + "end": "2026-06-30" + }, + "cache_namespace": { + "version": "402", + "deleted_keys_before_cold_run": [ + "dau:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{\"segment\":\"pro\"}", + "total:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{\"segment\":\"pro\"}", + 
"events:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):all:{\"segment\":\"pro\"}", + "events-series:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):all:{}", + "dau:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{}", + "events-series:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):all:{\"segment\":\"pro\"}", + "events:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):all:{}", + "total:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{}" + ], + "keys_after_cold_run": [ + "dau:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{}", + "events:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):all:{}", + "total:00000000-0000-4000-8000-0000000f4241:00000000-0000-4000-8000-0000001e8481:v402:Mon Jun 01 2026 00:00:00 GMT+0000 (Coordinated Universal 
Time):Tue Jun 30 2026 00:00:00 GMT+0000 (Coordinated Universal Time):{}" + ] + }, + "cold": { + "label": "cold-cache-miss", + "duration_ms": 30.97, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + "warm": [ + { + "label": "warm-cache-hit-1", + "duration_ms": 2.329, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-2", + "duration_ms": 3.051, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-3", + "duration_ms": 1.734, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-4", + "duration_ms": 1.709, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-5", + "duration_ms": 1.239, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-6", + "duration_ms": 1.268, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-7", + "duration_ms": 1.861, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-8", + "duration_ms": 1.563, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-9", + "duration_ms": 1.35, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + }, + { + "label": "warm-cache-hit-10", + "duration_ms": 1.25, + "status": 200, + "graphql_errors": 0, + "total_events": 402 + } + ], + "summary": { + "warm_iterations": 10, + "warm_min_ms": 1.239, + "warm_median_ms": 1.563, + "warm_p95_ms": 3.051, + "warm_max_ms": 3.051, + "cold_to_warm_median_ratio": 19.81 + }, + "safe_claim_note": "This is a local cold-vs-warm GraphQL dashboard cache measurement. It is not a production cache-hit-ratio benchmark." 
+} diff --git a/docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..500be17 --- /dev/null +++ b/docs/benchmarks/evidence/dashboard-query-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,234 @@ +{ + "root_group": { + "id": "d41d8cd98f00b204e9800998ecf8427e", + "groups": [], + "checks": [ + { + "name": "graphql status ok", + "path": "::graphql status ok", + "id": "5ebf21a86bad85f2506224456f9905d1", + "passes": 42399, + "fails": 0 + }, + { + "passes": 42399, + "fails": 0, + "name": "graphql has no errors", + "path": "::graphql has no errors", + "id": "a51792f93b5c8e532ce92a5e740eaace" + } + ], + "name": "", + "path": "" + }, + "options": { + "summaryTrendStats": [ + "avg", + "min", + "med", + "p(90)", + "p(95)", + "p(99)", + "max" + ], + "summaryTimeUnit": "", + "noColor": false + }, + "state": { + "isStdOutTTY": false, + "isStdErrTTY": false, + "testRunDurationMs": 20003.143578 + }, + "metrics": { + "http_req_duration{expected_response:true}": { + "type": "trend", + "contains": "time", + "values": { + "max": 158.558959, + "avg": 4.586409179367396, + "min": 1.944583, + "med": 4.235958, + "p(90)": 5.718141200000001, + "p(95)": 6.600733, + "p(99)": 11.481469339999999 + } + }, + "iteration_duration": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 6.773816399999998, + "p(99)": 11.743092, + "max": 160.99325, + "avg": 4.713319219226793, + "min": 2.042375, + "med": 4.353292, + "p(90)": 5.8715082 + } + }, + "http_req_sending": { + "type": "trend", + "contains": "time", + "values": { + "med": 0.003459, + "p(90)": 0.007292, + "p(95)": 0.010708, + "p(99)": 0.040669499999999956, + "max": 3.889917, + "avg": 0.006631863345833489, + "min": 0.001291 + } + }, + "http_req_receiving": { + "type": "trend", + "contains": "time", + "values": { + "p(99)": 0.13179449999999995, + "max": 2.273208, + "avg": 
0.022936492110663165, + "min": 0.003875, + "med": 0.013958, + "p(90)": 0.037666, + "p(95)": 0.05525 + } + }, + "vus": { + "type": "gauge", + "contains": "default", + "values": { + "value": 10, + "min": 10, + "max": 10 + } + }, + "http_req_failed": { + "thresholds": { + "rate<0.05": { + "ok": true + } + }, + "type": "rate", + "contains": "default", + "values": { + "rate": 0, + "passes": 0, + "fails": 42399 + } + }, + "checks": { + "type": "rate", + "contains": "default", + "values": { + "rate": 1, + "passes": 84798, + "fails": 0 + } + }, + "http_req_blocked": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 0.001708, + "p(95)": 0.002125, + "p(99)": 0.005042819999999984, + "max": 3.500583, + "avg": 0.0018700171466307209, + "min": 0.000334, + "med": 0.000917 + } + }, + "data_sent": { + "type": "counter", + "contains": "data", + "values": { + "count": 42790323, + "rate": 2139179.9160538926 + } + }, + "vus_max": { + "type": "gauge", + "contains": "default", + "values": { + "max": 10, + "value": 10, + "min": 10 + } + }, + "iterations": { + "type": "counter", + "contains": "default", + "values": { + "count": 42399, + "rate": 2119.616840956517 + } + }, + "http_req_waiting": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 6.543891699999998, + "p(99)": 11.437194179999995, + "max": 158.511709, + "avg": 4.556840823910954, + "min": 1.163875, + "med": 4.210209, + "p(90)": 5.6807834 + } + }, + "http_reqs": { + "type": "counter", + "contains": "default", + "values": { + "count": 42399, + "rate": 2119.616840956517 + } + }, + "http_req_tls_handshaking": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 0, + "p(99)": 0, + "max": 0, + "avg": 0, + "min": 0, + "med": 0, + "p(90)": 0 + } + }, + "data_received": { + "type": "counter", + "contains": "data", + "values": { + "count": 56869962, + "rate": 2843051.232334658 + } + }, + "http_req_duration": { + "type": "trend", + "contains": "time", + "values": { + "max": 158.558959, + 
"avg": 4.586409179367396, + "min": 1.944583, + "med": 4.235958, + "p(90)": 5.718141200000001, + "p(95)": 6.600733, + "p(99)": 11.481469339999999 + } + }, + "http_req_connecting": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 0, + "p(95)": 0, + "p(99)": 0, + "max": 3.472583, + "avg": 0.0005208662468454444, + "min": 0, + "med": 0 + } + } + } +} \ No newline at end of file diff --git a/docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..2b5e442 --- /dev/null +++ b/docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,227 @@ +{ + "root_group": { + "id": "d41d8cd98f00b204e9800998ecf8427e", + "groups": [], + "checks": [ + { + "name": "accepted or rate limited", + "path": "::accepted or rate limited", + "id": "f582c15494cc8beb1c5d5e90e17b3cef", + "passes": 425, + "fails": 0 + } + ], + "name": "", + "path": "" + }, + "options": { + "summaryTrendStats": [ + "avg", + "min", + "med", + "p(90)", + "p(95)", + "p(99)", + "max" + ], + "summaryTimeUnit": "", + "noColor": false + }, + "state": { + "testRunDurationMs": 25010.466623, + "isStdOutTTY": false, + "isStdErrTTY": false + }, + "metrics": { + "checks": { + "type": "rate", + "contains": "default", + "values": { + "passes": 425, + "fails": 0, + "rate": 1 + } + }, + "http_req_duration": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 6.691916399999999, + "p(99)": 9.324239839999997, + "max": 16.761959, + "avg": 3.931087865882351, + "min": 1.884042, + "med": 3.55325, + "p(90)": 5.7649084 + } + }, + "iteration_duration": { + "contains": "time", + "values": { + "max": 17.234417, + "avg": 4.390649985882352, + "min": 2.0455, + "med": 3.922167, + "p(90)": 6.9094082000000006, + "p(95)": 7.657417, + "p(99)": 11.856343239999996 + }, + "type": "trend" + }, + "data_received": { + "type": "counter", + "contains": "data", + 
"values": { + "count": 158525, + "rate": 6338.346356729628 + } + }, + "http_req_tls_handshaking": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 0, + "p(95)": 0, + "p(99)": 0, + "max": 0, + "avg": 0, + "min": 0, + "med": 0 + } + }, + "http_req_sending": { + "type": "trend", + "contains": "time", + "values": { + "max": 0.65175, + "avg": 0.045077868235294055, + "min": 0.01525, + "med": 0.034208, + "p(90)": 0.0694086, + "p(95)": 0.09039179999999995, + "p(99)": 0.16366683999999995 + } + }, + "http_req_failed": { + "type": "rate", + "contains": "default", + "values": { + "rate": 0, + "passes": 0, + "fails": 425 + }, + "thresholds": { + "rate<0.10": { + "ok": true + } + } + }, + "vus": { + "type": "gauge", + "contains": "default", + "values": { + "value": 0, + "min": 0, + "max": 0 + } + }, + "http_reqs": { + "type": "counter", + "contains": "default", + "values": { + "count": 425, + "rate": 16.99288567487836 + } + }, + "iterations": { + "type": "counter", + "contains": "default", + "values": { + "count": 425, + "rate": 16.99288567487836 + } + }, + "vus_max": { + "type": "gauge", + "contains": "default", + "values": { + "value": 30, + "min": 30, + "max": 30 + } + }, + "http_req_connecting": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 0, + "p(95)": 1.1314331999999996, + "p(99)": 2.1414929199999992, + "max": 3.551041, + "avg": 0.11476666352941173, + "min": 0, + "med": 0 + } + }, + "http_req_blocked": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 1.2051493999999996, + "p(99)": 2.2855749199999993, + "max": 3.685584, + "avg": 0.13003057647058822, + "min": 0.002625, + "med": 0.005875, + "p(90)": 0.02036680000000001 + } + }, + "data_sent": { + "values": { + "count": 233204, + "rate": 9324.25626099843 + }, + "type": "counter", + "contains": "data" + }, + "http_req_duration{expected_response:true}": { + "type": "trend", + "contains": "time", + "values": { + "avg": 3.931087865882351, + "min": 1.884042, + "med": 
3.55325, + "p(90)": 5.7649084, + "p(95)": 6.691916399999999, + "p(99)": 9.324239839999997, + "max": 16.761959 + } + }, + "http_req_receiving": { + "type": "trend", + "contains": "time", + "values": { + "avg": 0.06900096705882353, + "min": 0.026166, + "med": 0.060709, + "p(90)": 0.10904980000000002, + "p(95)": 0.12737539999999997, + "p(99)": 0.16487491999999998, + "max": 0.239708 + } + }, + "http_req_waiting": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 6.4973334000000005, + "p(99)": 9.124723079999997, + "max": 16.557709, + "avg": 3.8170090305882365, + "min": 1.825292, + "med": 3.444333, + "p(90)": 5.5820663999999995 + } + } + } +} \ No newline at end of file diff --git a/docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..1502187 --- /dev/null +++ b/docs/benchmarks/evidence/hot-tenant-db-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,541 @@ +{ + "run_id": "2026-06-16-clean-publish-benchmark", + "captured_at": "2026-06-16T21:23:17.206Z", + "manifest": { + "path": "tmp/clean-publish-benchmark-tenants.json", + "counts": { + "total": 100, + "hot": 1, + "medium": 10, + "quiet": 89 + } + }, + "total_events": 425, + "tenant_distribution": [ + { + "tenant_class": "hot", + "events": 322, + "orgs": 1, + "projects": 1, + "synthetic_tenants": 1, + "first_event_timestamp": "2026-06-16T21:22:49.484Z", + "last_event_timestamp": "2026-06-16T21:23:14.106Z", + "first_received_at": "2026-06-16T21:22:49.491Z", + "last_received_at": "2026-06-16T21:23:14.114Z" + }, + { + "tenant_class": "quiet", + "events": 75, + "orgs": 53, + "projects": 53, + "synthetic_tenants": 53, + "first_event_timestamp": "2026-06-16T21:22:49.814Z", + "last_event_timestamp": "2026-06-16T21:23:13.488Z", + "first_received_at": "2026-06-16T21:22:49.820Z", + "last_received_at": "2026-06-16T21:23:13.491Z" + }, + { + "tenant_class": "medium", + 
"events": 28, + "orgs": 10, + "projects": 10, + "synthetic_tenants": 10, + "first_event_timestamp": "2026-06-16T21:22:51.109Z", + "last_event_timestamp": "2026-06-16T21:23:14.295Z", + "first_received_at": "2026-06-16T21:22:51.118Z", + "last_received_at": "2026-06-16T21:23:14.303Z" + } + ], + "top_tenants": [ + { + "tenant_class": "hot", + "tenant_id": "tenant_001", + "org_id": "00000000-0000-4000-8000-0000000f4241", + "project_id": "00000000-0000-4000-8000-0000001e8481", + "events": 322, + "users": 322, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:49.484Z", + "last_event_timestamp": "2026-06-16T21:23:14.106Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_006", + "org_id": "00000000-0000-4000-8000-0000000f4246", + "project_id": "00000000-0000-4000-8000-0000001e8486", + "events": 5, + "users": 5, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:59.419Z", + "last_event_timestamp": "2026-06-16T21:23:14.295Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_004", + "org_id": "00000000-0000-4000-8000-0000000f4244", + "project_id": "00000000-0000-4000-8000-0000001e8484", + "events": 4, + "users": 4, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:54.821Z", + "last_event_timestamp": "2026-06-16T21:23:11.050Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_005", + "org_id": "00000000-0000-4000-8000-0000000f4245", + "project_id": "00000000-0000-4000-8000-0000001e8485", + "events": 4, + "users": 4, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:51.109Z", + "last_event_timestamp": "2026-06-16T21:23:10.474Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_014", + "org_id": "00000000-0000-4000-8000-0000000f424e", + "project_id": "00000000-0000-4000-8000-0000001e848e", + "events": 3, + "users": 3, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:49.814Z", + "last_event_timestamp": "2026-06-16T21:23:00.221Z" + }, + { + "tenant_class": "medium", + 
"tenant_id": "tenant_007", + "org_id": "00000000-0000-4000-8000-0000000f4247", + "project_id": "00000000-0000-4000-8000-0000001e8487", + "events": 3, + "users": 3, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:23:04.470Z", + "last_event_timestamp": "2026-06-16T21:23:08.468Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_030", + "org_id": "00000000-0000-4000-8000-0000000f425e", + "project_id": "00000000-0000-4000-8000-0000001e849e", + "events": 3, + "users": 3, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:51.558Z", + "last_event_timestamp": "2026-06-16T21:23:05.470Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_067", + "org_id": "00000000-0000-4000-8000-0000000f4283", + "project_id": "00000000-0000-4000-8000-0000001e84c3", + "events": 3, + "users": 3, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:53.961Z", + "last_event_timestamp": "2026-06-16T21:23:02.869Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_022", + "org_id": "00000000-0000-4000-8000-0000000f4256", + "project_id": "00000000-0000-4000-8000-0000001e8496", + "events": 3, + "users": 3, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:50.808Z", + "last_event_timestamp": "2026-06-16T21:23:09.523Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_010", + "org_id": "00000000-0000-4000-8000-0000000f424a", + "project_id": "00000000-0000-4000-8000-0000001e848a", + "events": 3, + "users": 3, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:53.582Z", + "last_event_timestamp": "2026-06-16T21:23:10.177Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_036", + "org_id": "00000000-0000-4000-8000-0000000f4264", + "project_id": "00000000-0000-4000-8000-0000001e84a4", + "events": 3, + "users": 3, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:52.038Z", + "last_event_timestamp": "2026-06-16T21:23:01.321Z" + }, + { + "tenant_class": "quiet", + "tenant_id": 
"tenant_082", + "org_id": "00000000-0000-4000-8000-0000000f4292", + "project_id": "00000000-0000-4000-8000-0000001e84d2", + "events": 3, + "users": 3, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:54.722Z", + "last_event_timestamp": "2026-06-16T21:23:08.072Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_019", + "org_id": "00000000-0000-4000-8000-0000000f4253", + "project_id": "00000000-0000-4000-8000-0000001e8493", + "events": 2, + "users": 2, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:23:00.470Z", + "last_event_timestamp": "2026-06-16T21:23:04.920Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_053", + "org_id": "00000000-0000-4000-8000-0000000f4275", + "project_id": "00000000-0000-4000-8000-0000001e84b5", + "events": 2, + "users": 2, + "event_names": 2, + "first_event_timestamp": "2026-06-16T21:22:53.176Z", + "last_event_timestamp": "2026-06-16T21:22:57.719Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_003", + "org_id": "00000000-0000-4000-8000-0000000f4243", + "project_id": "00000000-0000-4000-8000-0000001e8483", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:23:04.270Z", + "last_event_timestamp": "2026-06-16T21:23:06.269Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_061", + "org_id": "00000000-0000-4000-8000-0000000f427d", + "project_id": "00000000-0000-4000-8000-0000001e84bd", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:53.638Z", + "last_event_timestamp": "2026-06-16T21:22:58.120Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_011", + "org_id": "00000000-0000-4000-8000-0000000f424b", + "project_id": "00000000-0000-4000-8000-0000001e848b", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:23:06.670Z", + "last_event_timestamp": "2026-06-16T21:23:10.855Z" + }, + { + "tenant_class": "quiet", + "tenant_id": "tenant_038", + 
"org_id": "00000000-0000-4000-8000-0000000f4266", + "project_id": "00000000-0000-4000-8000-0000001e84a6", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:52.186Z", + "last_event_timestamp": "2026-06-16T21:23:05.868Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_002", + "org_id": "00000000-0000-4000-8000-0000000f4242", + "project_id": "00000000-0000-4000-8000-0000001e8482", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:52.469Z", + "last_event_timestamp": "2026-06-16T21:23:10.916Z" + }, + { + "tenant_class": "medium", + "tenant_id": "tenant_008", + "org_id": "00000000-0000-4000-8000-0000000f4248", + "project_id": "00000000-0000-4000-8000-0000001e8488", + "events": 2, + "users": 2, + "event_names": 1, + "first_event_timestamp": "2026-06-16T21:22:54.015Z", + "last_event_timestamp": "2026-06-16T21:23:00.521Z" + } + ], + "aggregate_pressure": [ + { + "tenant_class": "hot", + "event_count_aggregate_keys": 2, + "events": 322, + "avg_events_per_event_count_key": 161, + "max_events_per_event_count_key": 257, + "p95_events_per_event_count_key": 247.39999999999998 + }, + { + "tenant_class": "quiet", + "event_count_aggregate_keys": 62, + "events": 75, + "avg_events_per_event_count_key": 1.21, + "max_events_per_event_count_key": 3, + "p95_events_per_event_count_key": 2 + }, + { + "tenant_class": "medium", + "event_count_aggregate_keys": 10, + "events": 28, + "avg_events_per_event_count_key": 2.8, + "max_events_per_event_count_key": 5, + "p95_events_per_event_count_key": 4.55 + } + ], + "active_user_pressure": [ + { + "tenant_class": "hot", + "active_user_keys": 322, + "events_with_user": 322, + "avg_events_per_user_day_key": 1, + "max_events_per_user_day_key": 1 + }, + { + "tenant_class": "quiet", + "active_user_keys": 75, + "events_with_user": 75, + "avg_events_per_user_day_key": 1, + "max_events_per_user_day_key": 1 + }, + { + "tenant_class": "medium", + 
"active_user_keys": 28, + "events_with_user": 28, + "avg_events_per_user_day_key": 1, + "max_events_per_user_day_key": 1 + } + ], + "burst_seconds": [ + { + "second": "2026-06-16T21:22:54.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:22:55.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:22:56.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:22:57.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:22:58.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:22:59.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:23:00.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:23:01.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:23:02.000Z", + "events": 20 + }, + { + "second": "2026-06-16T21:23:03.000Z", + "events": 20 + } + ], + "partitions": [ + { + "partition": "events_2026_06", + "events": 425 + } + ], + "reconciliation": { + "status": "complete", + "k6_summary_path": "docs/benchmarks/evidence/hot-tenant-2026-06-16-clean-publish-benchmark.json", + "k6_request_count": 425, + "k6_http_req_failed_rate": 0, + "persisted_events": 425, + "kafka_lag_after": { + "captured_at": "2026-06-16T21:23:17.200Z", + "partitions": [ + { + "topic": "events-raw", + "partition": 0, + "current_offset": 2815, + "log_end_offset": 2815, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 1, + "current_offset": 2821, + "log_end_offset": 2821, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 2, + "current_offset": 2800, + "log_end_offset": 2800, + "lag": 0 + } + ], + "total_lag": 0 + }, + "notes": [] + }, + "postgres_snapshot": { + "locks": { + "locks_total": 2, + "waiting_locks": 0, + "waiting_pids": 0 + }, + "activity": { + "connections": 2, + "active_connections": 1, + "waiting_connections": 1, + "lock_wait_connections": 0 + }, + "database": { + "xact_commit": "9127", + "xact_rollback": "2", + "deadlocks": "0", + "conflicts": "0", + "temp_files": "0", + "temp_bytes": "0", + "blk_read_time": 0, + 
"blk_write_time": 0 + }, + "table_stats": [ + { + "relname": "daily_active_users", + "n_tup_ins": "8441", + "n_tup_upd": "0", + "n_tup_del": "5", + "n_dead_tup": "0", + "seq_scan": "3", + "idx_scan": "8456" + }, + { + "relname": "daily_aggregates", + "n_tup_ins": "623", + "n_tup_upd": "24704", + "n_tup_del": "19", + "n_dead_tup": "0", + "seq_scan": "32", + "idx_scan": "25327" + }, + { + "relname": "event_dedup_keys", + "n_tup_ins": "8440", + "n_tup_upd": "0", + "n_tup_del": "4", + "n_dead_tup": "4", + "seq_scan": "2", + "idx_scan": "8454" + }, + { + "relname": "events", + "n_tup_ins": "0", + "n_tup_upd": "0", + "n_tup_del": "0", + "n_dead_tup": "0", + "seq_scan": "0", + "idx_scan": "0" + } + ], + "note": "PostgreSQL lock/activity/stat snapshots are captured after the run, not sampled continuously during load." + }, + "representative_query_plans": [ + { + "label": "hot-raw-event-count", + "planning_ms": 0.526, + "execution_ms": 0.11, + "plan_summary": { + "node_types": { + "Aggregate": 1, + "Index Only Scan": 1 + }, + "relations": [ + "events_2026_06" + ], + "actual_rows_sum": 323, + "plan_rows_sum": 2, + "shared_hit_blocks": 12, + "shared_read_blocks": 0 + } + }, + { + "label": "hot-aggregate-total-events", + "planning_ms": 0.575, + "execution_ms": 0.04, + "plan_summary": { + "node_types": { + "Aggregate": 1, + "Index Scan": 1 + }, + "relations": [ + "daily_aggregates" + ], + "actual_rows_sum": 2, + "plan_rows_sum": 2, + "shared_hit_blocks": 6, + "shared_read_blocks": 0 + } + }, + { + "label": "quiet-raw-event-count", + "planning_ms": 0.106, + "execution_ms": 0.061, + "plan_summary": { + "node_types": { + "Aggregate": 1, + "Index Scan": 1 + }, + "relations": [ + "events_2026_06" + ], + "actual_rows_sum": 4, + "plan_rows_sum": 2, + "shared_hit_blocks": 24, + "shared_read_blocks": 0 + } + }, + { + "label": "quiet-aggregate-total-events", + "planning_ms": 0.053, + "execution_ms": 0.033, + "plan_summary": { + "node_types": { + "Aggregate": 1, + "Index Scan": 1 + }, + 
"relations": [ + "daily_aggregates" + ], + "actual_rows_sum": 2, + "plan_rows_sum": 2, + "shared_hit_blocks": 6, + "shared_read_blocks": 0 + } + } + ], + "safe_claim_note": "Local hot-tenant PostgreSQL evidence for one run_id. Aggregate pressure is derived from raw events per aggregate key, not from a Postgres ON CONFLICT counter.", + "unsafe_claim_note": "Do not claim production lock behavior or long-duration contention from this after-run snapshot." +} diff --git a/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..a43fb57 --- /dev/null +++ b/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,227 @@ +{ + "state": { + "isStdOutTTY": false, + "isStdErrTTY": false, + "testRunDurationMs": 20001.260745 + }, + "metrics": { + "http_req_duration": { + "type": "trend", + "contains": "time", + "values": { + "p(90)": 5.8612910000000005, + "p(95)": 7.186158649999998, + "p(99)": 10.549927499999992, + "max": 85.845166, + "avg": 4.290495077500001, + "min": 2.300833, + "med": 3.3425209999999996 + } + }, + "data_sent": { + "type": "counter", + "contains": "data", + "values": { + "count": 2935391, + "rate": 146760.2986343649 + } + }, + "http_req_duration{expected_response:true}": { + "type": "trend", + "contains": "time", + "values": { + "min": 2.300833, + "med": 3.3425209999999996, + "p(90)": 5.8612910000000005, + "p(95)": 7.186158649999998, + "p(99)": 10.549927499999992, + "max": 85.845166, + "avg": 4.290495077500001 + } + }, + "http_req_blocked": { + "type": "trend", + "contains": "time", + "values": { + "avg": 0.09033376749999983, + "min": 0.002375, + "med": 0.0035, + "p(90)": 0.007441000000000002, + "p(95)": 0.9518850999999999, + "p(99)": 1.2299703299999996, + "max": 3.477125 + } + }, + "vus_max": { + "contains": "default", + "values": { + "min": 30, + "max": 30, + "value": 30 + }, + 
"type": "gauge" + }, + "http_req_connecting": { + "type": "trend", + "contains": "time", + "values": { + "avg": 0.0810830175, + "min": 0, + "med": 0, + "p(90)": 0, + "p(95)": 0.9057375, + "p(99)": 1.1797836699999997, + "max": 2.822834 + } + }, + "http_req_receiving": { + "type": "trend", + "contains": "time", + "values": { + "med": 0.050354, + "p(90)": 0.07851190000000001, + "p(95)": 0.09783365, + "p(99)": 0.15949883999999995, + "max": 0.793125, + "avg": 0.05816770000000004, + "min": 0.031 + } + }, + "http_req_failed": { + "thresholds": { + "rate<0.05": { + "ok": true + } + }, + "type": "rate", + "contains": "default", + "values": { + "fails": 400, + "rate": 0, + "passes": 0 + } + }, + "http_req_waiting": { + "type": "trend", + "contains": "time", + "values": { + "min": 2.2175, + "med": 3.204771, + "p(90)": 5.703458600000001, + "p(95)": 7.059898199999999, + "p(99)": 10.406952169999993, + "max": 85.466667, + "avg": 4.1600093675000025 + } + }, + "iteration_duration": { + "type": "trend", + "contains": "time", + "values": { + "avg": 4.779590817499996, + "min": 2.60625, + "med": 3.713521, + "p(90)": 6.7098994, + "p(95)": 8.47111075, + "p(99)": 12.217744169999996, + "max": 90.136791 + } + }, + "vus": { + "values": { + "min": 0, + "max": 0, + "value": 0 + }, + "type": "gauge", + "contains": "default" + }, + "checks": { + "type": "rate", + "contains": "default", + "values": { + "rate": 1, + "passes": 400, + "fails": 0 + } + }, + "http_req_sending": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 0.10465624999999998, + "p(99)": 0.25459518, + "max": 0.380208, + "avg": 0.07231800999999996, + "min": 0.039416, + "med": 0.06514600000000001, + "p(90)": 0.0879756 + } + }, + "data_received": { + "type": "counter", + "contains": "data", + "values": { + "count": 133600, + "rate": 6679.578937712608 + } + }, + "http_req_tls_handshaking": { + "type": "trend", + "contains": "time", + "values": { + "min": 0, + "med": 0, + "p(90)": 0, + "p(95)": 0, + "p(99)": 0, + 
"max": 0, + "avg": 0 + } + }, + "http_reqs": { + "type": "counter", + "contains": "default", + "values": { + "count": 400, + "rate": 19.998739334468887 + } + }, + "iterations": { + "contains": "default", + "values": { + "count": 400, + "rate": 19.998739334468887 + }, + "type": "counter" + } + }, + "root_group": { + "name": "", + "path": "", + "id": "d41d8cd98f00b204e9800998ecf8427e", + "groups": [], + "checks": [ + { + "name": "accepted or rate limited", + "path": "::accepted or rate limited", + "id": "f582c15494cc8beb1c5d5e90e17b3cef", + "passes": 400, + "fails": 0 + } + ] + }, + "options": { + "summaryTrendStats": [ + "avg", + "min", + "med", + "p(90)", + "p(95)", + "p(99)", + "max" + ], + "summaryTimeUnit": "", + "noColor": false + } +} \ No newline at end of file diff --git a/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-test-load.json b/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-test-load.json new file mode 100644 index 0000000..2dd78f6 --- /dev/null +++ b/docs/benchmarks/evidence/ingest-throughput-2026-06-16-clean-publish-test-load.json @@ -0,0 +1,227 @@ +{ + "state": { + "isStdOutTTY": false, + "isStdErrTTY": false, + "testRunDurationMs": 10004.142797 + }, + "metrics": { + "iterations": { + "type": "counter", + "contains": "default", + "values": { + "count": 201, + "rate": 20.09167642631761 + } + }, + "http_req_receiving": { + "type": "trend", + "contains": "time", + "values": { + "p(99)": 0.197417, + "max": 0.244709, + "avg": 0.05059987064676619, + "min": 0.029333, + "med": 0.042209, + "p(90)": 0.068375, + "p(95)": 0.09475 + } + }, + "data_received": { + "type": "counter", + "contains": "data", + "values": { + "count": 67134, + "rate": 6710.619926390081 + } + }, + "http_req_blocked": { + "contains": "time", + "values": { + "avg": 0.21769070149253722, + "min": 0.0025, + "med": 0.00325, + "p(90)": 0.959958, + "p(95)": 1.03675, + "p(99)": 3.035584, + "max": 5.512542 + }, + "type": "trend" + }, + 
"http_req_tls_handshaking": { + "type": "trend", + "contains": "time", + "values": { + "min": 0, + "med": 0, + "p(90)": 0, + "p(95)": 0, + "p(99)": 0, + "max": 0, + "avg": 0 + } + }, + "checks": { + "type": "rate", + "contains": "default", + "values": { + "fails": 0, + "rate": 1, + "passes": 201 + } + }, + "http_reqs": { + "contains": "default", + "values": { + "count": 201, + "rate": 20.09167642631761 + }, + "type": "counter" + }, + "http_req_duration": { + "type": "trend", + "contains": "time", + "values": { + "med": 2.696792, + "p(90)": 3.41825, + "p(95)": 4.411542, + "p(99)": 8.305042, + "max": 12.29675, + "avg": 2.9685424975124373, + "min": 2.21975 + } + }, + "data_sent": { + "values": { + "count": 1262870, + "rate": 126234.70352489411 + }, + "type": "counter", + "contains": "data" + }, + "http_req_connecting": { + "type": "trend", + "contains": "time", + "values": { + "avg": 0.201443184079602, + "min": 0, + "med": 0, + "p(90)": 0.907584, + "p(95)": 0.993958, + "p(99)": 2.907875, + "max": 5.423083 + } + }, + "iteration_duration": { + "type": "trend", + "contains": "time", + "values": { + "max": 39.466958, + "avg": 3.704062756218908, + "min": 2.551458, + "med": 3.107, + "p(90)": 4.490459, + "p(95)": 5.538625, + "p(99)": 12.52775 + } + }, + "http_req_failed": { + "type": "rate", + "contains": "default", + "values": { + "passes": 0, + "fails": 201, + "rate": 0 + }, + "thresholds": { + "rate<0.05": { + "ok": true + } + } + }, + "vus": { + "type": "gauge", + "contains": "default", + "values": { + "value": 0, + "min": 0, + "max": 0 + } + }, + "http_req_duration{expected_response:true}": { + "type": "trend", + "contains": "time", + "values": { + "p(95)": 4.411542, + "p(99)": 8.305042, + "max": 12.29675, + "avg": 2.9685424975124373, + "min": 2.21975, + "med": 2.696792, + "p(90)": 3.41825 + } + }, + "vus_max": { + "type": "gauge", + "contains": "default", + "values": { + "value": 30, + "min": 30, + "max": 30 + } + }, + "http_req_sending": { + "type": "trend", + 
"contains": "time", + "values": { + "p(95)": 0.122333, + "p(99)": 0.167583, + "max": 0.206667, + "avg": 0.06515839800995026, + "min": 0.037708, + "med": 0.057875, + "p(90)": 0.085416 + } + }, + "http_req_waiting": { + "type": "trend", + "contains": "time", + "values": { + "avg": 2.8527842288557212, + "min": 2.135916, + "med": 2.588792, + "p(90)": 3.284209, + "p(95)": 4.158625, + "p(99)": 8.063125, + "max": 12.010292 + } + } + }, + "root_group": { + "path": "", + "id": "d41d8cd98f00b204e9800998ecf8427e", + "groups": [], + "checks": [ + { + "name": "accepted or rate limited", + "path": "::accepted or rate limited", + "id": "f582c15494cc8beb1c5d5e90e17b3cef", + "passes": 201, + "fails": 0 + } + ], + "name": "" + }, + "options": { + "summaryTrendStats": [ + "avg", + "min", + "med", + "p(90)", + "p(95)", + "p(99)", + "max" + ], + "summaryTimeUnit": "", + "noColor": false + } +} \ No newline at end of file diff --git a/docs/benchmarks/evidence/latest-run-id.txt b/docs/benchmarks/evidence/latest-run-id.txt index 78b90fa..98fc8e6 100644 --- a/docs/benchmarks/evidence/latest-run-id.txt +++ b/docs/benchmarks/evidence/latest-run-id.txt @@ -1 +1 @@ -2026-06-16-clean-full-benchmark +2026-06-16-clean-publish-benchmark diff --git a/docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..febb8dd --- /dev/null +++ b/docs/benchmarks/evidence/run-metadata-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,96 @@ +{ + "run_id": "2026-06-16-clean-publish-benchmark", + "started_at": "2026-06-16T21:22:27.830Z", + "completed_at": "2026-06-16T21:24:13.432Z", + "status": "completed", + "suites_requested": [ + "ingest", + "hot", + "hotDb", + "dashboard", + "cache", + "worker", + "backpressure" + ], + "suites_completed": [ + "ingest", + "hot", + "hotDb", + "dashboard", + "cache", + "worker", + "backpressure" + ], + "git": { + "commit": 
"63f9556cefad9548774c0eca17b01e558eda3d87", + "branch": "feat/publish-safe-evidence", + "dirty_tree": false, + "dirty_status": "" + }, + "environment": { + "os": "Darwin 25.5.0 arm64", + "node": "v25.3.0", + "docker_resources": { + "cpus": "12", + "memory_bytes": "8217165824" + }, + "overrides": { + "API_URL": "http://localhost:3001", + "GRAPHQL_URL": "http://localhost:3002/graphql", + "RATE": "20", + "DURATION": "20s", + "BATCH_SIZE": "20", + "START_RATE": "5", + "PEAK_RATE": "20", + "RAMP_DURATION": "5s", + "HOLD_DURATION": "15s", + "RAMP_DOWN_DURATION": "5s", + "VUS": "10", + "SLEEP_SECONDS": "0", + "BURST_RATE": "20", + "BURST_RAMP": "5s", + "BURST_HOLD": "15s", + "RECOVERY_RATE": "5", + "RECOVERY": "10s", + "PREALLOCATED_VUS": "30", + "MAX_VUS": "100", + "EVENTS": "200", + "TIMEOUT_MS": "120000", + "POLL_MS": "500", + "WARM_ITERATIONS": "10", + "TENANT_KEYS_FILE": "tmp/clean-publish-benchmark-tenants.json", + "ORG_ID": "00000000-0000-4000-8000-0000000f4241", + "PROJECT_ID": "00000000-0000-4000-8000-0000001e8481" + } + }, + "commands": [ + { + "suite": "ingest", + "command": "node scripts/run-k6.js tests/load/ingest-throughput.js" + }, + { + "suite": "hot", + "command": "node scripts/run-k6.js tests/load/hot-tenant.js" + }, + { + "suite": "hotDb", + "command": "pnpm exec tsx scripts/measure-hot-tenant-db.ts" + }, + { + "suite": "dashboard", + "command": "node scripts/run-k6.js tests/load/dashboard-query.js" + }, + { + "suite": "cache", + "command": "pnpm exec tsx scripts/measure-dashboard-cache.ts" + }, + { + "suite": "worker", + "command": "pnpm exec tsx scripts/measure-worker-catchup.ts" + }, + { + "suite": "backpressure", + "command": "node scripts/run-k6.js tests/load/backpressure.js" + } + ] +} diff --git a/docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json b/docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json new file mode 100644 index 0000000..fefbc6b --- /dev/null +++ 
b/docs/benchmarks/evidence/worker-catchup-2026-06-16-clean-publish-benchmark.json @@ -0,0 +1,136 @@ +{ + "run_id": "2026-06-16-clean-publish-benchmark", + "captured_at": "2026-06-16T21:23:43.000Z", + "requested_events": 200, + "accepted_events": 200, + "persisted_events": 200, + "batch_size": 20, + "batch_statuses": { + "202": 10 + }, + "acceptance_duration_ms": 104.547, + "catchup_duration_ms": 1787.377, + "accepted_events_per_second": 1913.01, + "persisted_events_per_second_until_caught_up": 111.9, + "kafka_lag_before": { + "captured_at": "2026-06-16T21:23:40.060Z", + "partitions": [ + { + "topic": "events-raw", + "partition": 0, + "current_offset": 2815, + "log_end_offset": 2815, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 1, + "current_offset": 2821, + "log_end_offset": 2821, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 2, + "current_offset": 2800, + "log_end_offset": 2800, + "lag": 0 + } + ], + "total_lag": 0 + }, + "kafka_lag_after": { + "captured_at": "2026-06-16T21:23:42.998Z", + "partitions": [ + { + "topic": "events-raw", + "partition": 0, + "current_offset": 2866, + "log_end_offset": 2866, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 1, + "current_offset": 2885, + "log_end_offset": 2885, + "lag": 0 + }, + { + "topic": "events-raw", + "partition": 2, + "current_offset": 2885, + "log_end_offset": 2885, + "lag": 0 + } + ], + "total_lag": 0 + }, + "batches": [ + { + "status": 202, + "accepted": 20, + "duration_ms": 59.591 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 8.366 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 5.782 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 6.379 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 5.633 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 3.461 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 4.193 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 3.258 + }, + { + "status": 202, + 
"accepted": 20, + "duration_ms": 3.739 + }, + { + "status": 202, + "accepted": 20, + "duration_ms": 2.988 + } + ], + "samples": [ + { + "elapsed_ms": 1276.325, + "persisted_events": 14, + "kafka_lag": 0, + "active_db_connections": 1 + }, + { + "elapsed_ms": 1787.377, + "persisted_events": 200, + "kafka_lag": 0, + "active_db_connections": 1 + } + ], + "safe_claim_note": "Local worker catch-up smoke measurement from HTTP acceptance through Kafka to persisted PostgreSQL rows." +} diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-aggregate-daily-dashboard.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-aggregate-daily-dashboard.md new file mode 100644 index 0000000..a95a0e1 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-aggregate-daily-dashboard.md @@ -0,0 +1,120 @@ +# Daily Aggregate Dashboard Query + +Captured: 2026-06-16T21:25:50Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => CREATE INDEX 
idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree (project_id, 
"timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique ON 
public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT date, metric_name, SUM(metric_value) AS metric_value +FROM daily_aggregates +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND metric_name IN ('dau', 'event_count', 'total_events') + AND date >= CURRENT_DATE - INTERVAL '90 days' +GROUP BY date, metric_name +ORDER BY date ASC, metric_name ASC; +``` + +## EXPLAIN ANALYZE + +```text + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + GroupAggregate (cost=8.32..8.39 rows=1 width=46) (actual time=0.092..0.095 rows=6 loops=1) + Group Key: date, metric_name + Buffers: shared hit=25 + -> Incremental Sort (cost=8.32..8.36 rows=2 width=19) (actual time=0.081..0.081 rows=12 loops=1) + Sort Key: date, metric_name + Presorted Key: date + Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB + Buffers: shared hit=25 + -> Index Scan Backward using idx_daily_aggregates_org_project_date on daily_aggregates (cost=0.28..8.31 rows=1 width=19) (actual time=0.023..0.059 rows=12 loops=1) + Index Cond: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND (date >= (CURRENT_DATE - '90 days'::interval))) + Filter: ((metric_name)::text = ANY ('{dau,event_count,total_events}'::text[])) + Buffers: shared hit=19 + Planning: + Buffers: shared hit=339 + Planning Time: 0.905 ms + Execution Time: 0.144 ms +(16 rows) + +``` + +## Interpretation + +This plan is the dashboard aggregate read path. 
It should touch aggregate rows rather than raw event rows. diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-graphql-cache-path.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-graphql-cache-path.md new file mode 100644 index 0000000..212bfa5 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-graphql-cache-path.md @@ -0,0 +1,83 @@ +# Dashboard Cache Evidence Note + +Captured: 2026-06-16T21:25:51Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) 
+daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree (project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: 
idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## GraphQL Query Text + +```graphql +query DashboardCacheBenchmark($orgId: ID!, $projectId: ID!, $startDate: Date!, $endDate: Date!) { + metrics(orgId: $orgId, projectId: $projectId, startDate: $startDate, endDate: $endDate) { + totalEvents + dailyActiveUsers { date value } + topEvents { eventName count } + dateRange { start end } + } +} +``` + +## EXPLAIN ANALYZE + +No PostgreSQL EXPLAIN ANALYZE is recorded for cached vs uncached GraphQL timing. 
The warm path is served through Redis and resolver-level cache behavior, so a PostgreSQL plan would not represent the cached request. + +## Dashboard Cache Evidence + +Run-scoped cache evidence: `docs/benchmarks/evidence/dashboard-cache-2026-06-16-clean-publish-benchmark.json`. + +## Interpretation + +Use the dashboard cache JSON evidence for cold and warm GraphQL timings. Use the PostgreSQL plans in the other files from this run for database access paths; do not invent a cached GraphQL EXPLAIN plan. diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-materialized-dashboard.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-materialized-dashboard.md new file mode 100644 index 0000000..03ebb17 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-materialized-dashboard.md @@ -0,0 +1,109 @@ +# Materialized Dashboard Metrics Query Schema Evidence + +Captured: 2026-06-16T21:25:50Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => 
CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree 
(project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique 
ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT date, event_count, unique_users, unique_sessions, events_by_name +FROM mv_dashboard_metrics +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND date >= CURRENT_DATE - INTERVAL '30 days' +ORDER BY date DESC; +``` + +## EXPLAIN ANALYZE + +```text + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Index Scan using idx_mv_dashboard_org_project_date on mv_dashboard_metrics (cost=0.15..8.18 rows=1 width=84) (actual time=0.014..0.014 rows=0 loops=1) + Index Cond: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND (date >= (CURRENT_DATE - '30 days'::interval))) + Buffers: shared hit=2 + Planning: + Buffers: shared hit=114 + Planning Time: 0.365 ms + Execution Time: 0.027 ms +(7 rows) + +``` + +## Interpretation + +This plan captures the materialized dashboard view as schema/query-plan evidence only. The current GraphQL resolvers do not read mv_dashboard_metrics; runtime dashboard paths are daily_aggregates, raw events, and Redis-cached resolver results. 
diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-24h.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-24h.md new file mode 100644 index 0000000..e928e76 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-24h.md @@ -0,0 +1,115 @@ +# Partition Pruning Over Last 24 Hours + +Captured: 2026-06-16T21:25:49Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX 
idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree (project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE 
INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT COUNT(*) AS events_in_window +FROM events +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND timestamp >= NOW() - INTERVAL '24 hours' + AND 
timestamp < NOW(); +``` + +## EXPLAIN ANALYZE + +```text + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Aggregate (cost=103.35..103.36 rows=1 width=8) (actual time=0.818..0.818 rows=1 loops=1) + Buffers: shared hit=19 + -> Append (cost=0.00..99.36 rows=1599 width=0) (actual time=0.032..0.673 rows=5238 loops=1) + Buffers: shared hit=19 + Subplans Removed: 6 + -> Index Only Scan using events_2026_06_org_id_project_id_timestamp_idx on events_2026_06 events_1 (cost=0.29..91.36 rows=1593 width=0) (actual time=0.032..0.435 rows=5238 loops=1) + Index Cond: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '24:00:00'::interval)) AND ("timestamp" < now())) + Heap Fetches: 14 + Buffers: shared hit=19 + Planning: + Buffers: shared hit=2356 + Planning Time: 4.854 ms + Execution Time: 0.846 ms +(13 rows) + +``` + +## Interpretation + +This query demonstrates the partition set touched for a narrow dashboard window. Compare the child tables and subplans removed here with the 30-day plan from the same run. 
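To make the comparison between the 24-hour and 30-day plans repeatable, the pruning figures can be pulled out of the captured `EXPLAIN (ANALYZE, BUFFERS)` text. The following is a minimal sketch, not part of the repository: `summarizePruning` and `PruningSummary` are hypothetical names, and `samplePlan` is an abbreviated copy of the 24-hour plan recorded above. It assumes the plan text keeps PostgreSQL's default text format, where `Subplans Removed` reports the partitions pruned at executor startup and each scan node names its relation after `on`.

```typescript
// Hypothetical helper (not in the repo): summarize partition pruning
// from the text output of EXPLAIN (ANALYZE, BUFFERS).
interface PruningSummary {
  subplansRemoved: number; // partitions pruned at executor startup
  scannedPartitions: string[]; // relations named by the remaining scan nodes
}

function summarizePruning(planText: string): PruningSummary {
  const removed = /Subplans Removed: (\d+)/.exec(planText);

  // Scan nodes look like "Index Only Scan using <index> on <table> <alias>"
  // or "Seq Scan on <table> <alias>". Caveat: a "Bitmap Index Scan on <index>"
  // node would be reported by its index name, not a table name.
  const scanPattern = /Scan(?: Backward)?(?: using \S+)? on (\S+)/g;
  const scanned: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = scanPattern.exec(planText)) !== null) {
    if (scanned.indexOf(match[1]) === -1) {
      scanned.push(match[1]);
    }
  }

  return {
    subplansRemoved: removed ? Number(removed[1]) : 0,
    scannedPartitions: scanned.sort(),
  };
}

// Abbreviated from the 24-hour plan captured in this file.
const samplePlan = [
  "Aggregate (actual time=0.818..0.818 rows=1 loops=1)",
  "  -> Append (actual time=0.032..0.673 rows=5238 loops=1)",
  "        Subplans Removed: 6",
  "        -> Index Only Scan using events_2026_06_org_id_project_id_timestamp_idx on events_2026_06 events_1 (actual time=0.032..0.435 rows=5238 loops=1)",
].join("\n");

console.log(summarizePruning(samplePlan));
```

Running the same helper over the 30-day plan from this run gives two small summaries that can be compared directly, rather than reading the two plans side by side.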
diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-30d.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-30d.md new file mode 100644 index 0000000..1aad961 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-partition-pruning-30d.md @@ -0,0 +1,117 @@ +# Partition Pruning Over Last 30 Days + +Captured: 2026-06-16T21:25:50Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX 
idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree (project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE 
INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT COUNT(*) AS events_in_window +FROM events +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND timestamp >= NOW() - INTERVAL '30 days' + AND 
timestamp < NOW(); +``` + +## EXPLAIN ANALYZE + +```text + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Aggregate (cost=104.72..104.73 rows=1 width=8) (actual time=0.675..0.676 rows=1 loops=1) + Buffers: shared hit=22 + -> Append (cost=0.00..100.62 rows=1641 width=0) (actual time=0.018..0.536 rows=5482 loops=1) + Buffers: shared hit=22 + Subplans Removed: 5 + -> Seq Scan on events_2026_05 events_1 (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" < now()) AND ("timestamp" >= (now() - '30 days'::interval))) + -> Index Only Scan using events_2026_06_org_id_project_id_timestamp_idx on events_2026_06 events_2 (cost=0.29..92.41 rows=1635 width=0) (actual time=0.016..0.341 rows=5482 loops=1) + Index Cond: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '30 days'::interval)) AND ("timestamp" < now())) + Heap Fetches: 14 + Buffers: shared hit=22 + Planning: + Buffers: shared hit=2359 + Planning Time: 4.313 ms + Execution Time: 0.704 ms +(15 rows) + +``` + +## Interpretation + +This 30-day query is the broader scan/pruning comparison for the 24-hour plan from the same run. With monthly local partitions, it can touch more than one child partition depending on the date and available benchmark rows. 
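Reviewer note: the partition counts in the 24-hour and 30-day plans can be sanity-checked with a small calculation. The sketch below is not part of the repository; `partitionsForWindow` is a hypothetical helper. It assumes each `events_YYYY_MM` child covers exactly one UTC calendar month and uses the 30-day file's capture timestamp as a stand-in for `NOW()` in both windows.

```javascript
// Hypothetical helper (not in the repo): list the monthly partitions whose
// calendar month overlaps the half-open window [start, end).
// Assumption: each events_YYYY_MM partition covers one UTC calendar month.
function partitionsForWindow(start, end, partitions) {
  return partitions.filter((name) => {
    const match = /^events_(\d{4})_(\d{2})$/.exec(name);
    if (!match) return false;
    const year = Number(match[1]);
    const month = Number(match[2]);
    const monthStart = Date.UTC(year, month - 1, 1);
    const monthEnd = Date.UTC(year, month, 1); // month index 12 rolls into January
    return monthStart < end.getTime() && monthEnd > start.getTime();
  });
}

// Partition names as listed under "Event Partitions" in the plan files.
const partitions = [
  'events_2026_03',
  'events_2026_04',
  'events_2026_05',
  'events_2026_06',
  'events_2026_07',
  'events_2026_08',
  'events_2026_09',
];

// Capture timestamp of the 30-day plan file, used here in place of NOW().
const capturedAt = new Date('2026-06-16T21:25:50Z');
const DAY_MS = 24 * 60 * 60 * 1000;

const last24h = partitionsForWindow(
  new Date(capturedAt.getTime() - DAY_MS), capturedAt, partitions);
const last30d = partitionsForWindow(
  new Date(capturedAt.getTime() - 30 * DAY_MS), capturedAt, partitions);

console.log(last24h); // [ 'events_2026_06' ]
console.log(last30d); // [ 'events_2026_05', 'events_2026_06' ]
```

Under those assumptions the result lines up with the captured plans: one surviving child with `Subplans Removed: 6` for the 24-hour window, and two surviving children with `Subplans Removed: 5` for the 30-day window.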
diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-chosen-index.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-chosen-index.md new file mode 100644 index 0000000..e3661df --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-chosen-index.md @@ -0,0 +1,129 @@ +# Tenant Dashboard Raw Event Query With Chosen Indexes + +Captured: 2026-06-16T21:25:49Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: idx_aggregates_tenant_metric_date => CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => 
CREATE INDEX idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time ON ONLY public.events USING btree (project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time 
=> CREATE INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE UNIQUE INDEX idx_mv_dashboard_unique ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT id, event_id, event_name, user_id, session_id, timestamp, properties +FROM events +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND 
timestamp >= NOW() - INTERVAL '7 days' +ORDER BY timestamp DESC +LIMIT 100; +``` + +## EXPLAIN ANALYZE + +```text + QUERY PLAN +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Limit (cost=1.07..177.32 rows=100 width=304) (actual time=0.036..0.087 rows=100 loops=1) + Buffers: shared hit=29 + -> Append (cost=1.07..2829.84 rows=1605 width=304) (actual time=0.035..0.082 rows=100 loops=1) + Buffers: shared hit=29 + Subplans Removed: 3 + -> Index Scan using events_2026_09_timestamp_idx on events_2026_09 events_4 (cost=0.13..8.15 rows=1 width=1112) (actual time=0.004..0.004 rows=0 loops=1) + Index Cond: ("timestamp" >= (now() - '7 days'::interval)) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid)) + Buffers: shared hit=2 + -> Index Scan using events_2026_08_timestamp_idx on events_2026_08 events_3 (cost=0.13..8.15 rows=1 width=1112) (actual time=0.004..0.004 rows=0 loops=1) + Index Cond: ("timestamp" >= (now() - '7 days'::interval)) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid)) + Buffers: shared hit=2 + -> Index Scan using events_2026_07_timestamp_idx on events_2026_07 events_2 (cost=0.13..8.15 rows=1 width=1112) (actual time=0.003..0.003 rows=0 loops=1) + Index Cond: ("timestamp" >= (now() - '7 days'::interval)) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid)) + Buffers: shared hit=2 + -> Index Scan using events_2026_06_timestamp_idx on events_2026_06 events_1 (cost=0.29..2772.90 rows=1599 width=301) (actual time=0.024..0.067 rows=100 loops=1) + Index Cond: ("timestamp" >= (now() - '7 days'::interval)) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = 
'00000000-0000-4000-8000-0000001e8481'::uuid)) + Rows Removed by Filter: 95 + Buffers: shared hit=23 + Planning: + Buffers: shared hit=2441 + Planning Time: 4.726 ms + Execution Time: 0.139 ms +(26 rows) + +``` + +## Interpretation + +This is the normal tenant dashboard raw-event access path for this run. Use this plan to verify whether PostgreSQL chooses the tenant/project/timestamp index or another timestamp-oriented partition index, and how many partition children are touched. diff --git a/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-index-disabled.md b/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-index-disabled.md new file mode 100644 index 0000000..fb21240 --- /dev/null +++ b/docs/query-plans/2026-06-16-clean-publish-benchmark-tenant-dashboard-index-disabled.md @@ -0,0 +1,134 @@ +# Tenant Dashboard Raw Event Query With Index Scans Disabled + +Captured: 2026-06-16T21:25:48Z +Run ID: 2026-06-16-clean-publish-benchmark +Git commit: 63f9556cefad9548774c0eca17b01e558eda3d87 +Command: RUN_ID=2026-06-16-clean-publish-benchmark ./scripts/capture-query-plans.sh +Target org_id: 00000000-0000-4000-8000-0000000f4241 +Target project_id: 00000000-0000-4000-8000-0000001e8481 + +## PostgreSQL Version + +```text +16.13 +``` + +## Benchmark Run Counts + +```text +events_for_run_id=18365 +target_tenant_events_for_run_id=5482 +``` + +## Table Row Counts + +```text +events=18376 +daily_aggregates=630 +hourly_aggregates=0 +mv_dashboard_metrics=0 +``` + +## Relevant Indexes + +```text +daily_aggregates: daily_aggregates_pkey => CREATE UNIQUE INDEX daily_aggregates_pkey ON public.daily_aggregates USING btree (id) +daily_aggregates: idx_aggregates_lookup => CREATE INDEX idx_aggregates_lookup ON public.daily_aggregates USING btree (org_id, project_id, date) +daily_aggregates: idx_aggregates_metric => CREATE INDEX idx_aggregates_metric ON public.daily_aggregates USING btree (metric_name, date) +daily_aggregates: 
idx_aggregates_tenant_metric_date => CREATE INDEX idx_aggregates_tenant_metric_date ON public.daily_aggregates USING btree (tenant_id, metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_date => CREATE INDEX idx_daily_aggregates_date ON public.daily_aggregates USING btree (date DESC) +daily_aggregates: idx_daily_aggregates_metric_name => CREATE INDEX idx_daily_aggregates_metric_name ON public.daily_aggregates USING btree (metric_name, date DESC) +daily_aggregates: idx_daily_aggregates_org_project_date => CREATE INDEX idx_daily_aggregates_org_project_date ON public.daily_aggregates USING btree (org_id, project_id, date DESC) +daily_aggregates: unique_aggregate => CREATE UNIQUE INDEX unique_aggregate ON public.daily_aggregates USING btree (org_id, project_id, metric_name, date, dimensions) +events: events_org_project_event_id_timestamp_key => CREATE UNIQUE INDEX events_org_project_event_id_timestamp_key ON ONLY public.events USING btree (org_id, project_id, event_id, "timestamp") +events: events_pkey => CREATE UNIQUE INDEX events_pkey ON ONLY public.events USING btree (id, "timestamp") +events: idx_events_event_id => CREATE INDEX idx_events_event_id ON ONLY public.events USING btree (org_id, project_id, event_id) +events: idx_events_event_id_time => CREATE INDEX idx_events_event_id_time ON ONLY public.events USING btree (event_id, "timestamp") +events: idx_events_event_name_time => CREATE INDEX idx_events_event_name_time ON ONLY public.events USING btree (event_name, "timestamp" DESC) +events: idx_events_org_project => CREATE INDEX idx_events_org_project ON ONLY public.events USING btree (org_id, project_id) +events: idx_events_org_project_time => CREATE INDEX idx_events_org_project_time ON ONLY public.events USING btree (org_id, project_id, "timestamp" DESC) +events: idx_events_org_time => CREATE INDEX idx_events_org_time ON ONLY public.events USING btree (org_id, "timestamp" DESC) +events: idx_events_project_time => CREATE INDEX idx_events_project_time 
ON ONLY public.events USING btree (project_id, "timestamp" DESC) +events: idx_events_properties_gin => CREATE INDEX idx_events_properties_gin ON ONLY public.events USING gin (properties) +events: idx_events_session => CREATE INDEX idx_events_session ON ONLY public.events USING btree (session_id) WHERE (session_id IS NOT NULL) +events: idx_events_tenant_metric_time => CREATE INDEX idx_events_tenant_metric_time ON ONLY public.events USING btree (tenant_id, event_name, "timestamp" DESC) +events: idx_events_tenant_time => CREATE INDEX idx_events_tenant_time ON ONLY public.events USING btree (tenant_id, "timestamp" DESC) +events: idx_events_timestamp => CREATE INDEX idx_events_timestamp ON ONLY public.events USING btree ("timestamp" DESC) +events: idx_events_user_time => CREATE INDEX idx_events_user_time ON ONLY public.events USING btree (org_id, project_id, user_id, "timestamp" DESC) WHERE (user_id IS NOT NULL) +hourly_aggregates: hourly_aggregates_pkey => CREATE UNIQUE INDEX hourly_aggregates_pkey ON public.hourly_aggregates USING btree (id) +hourly_aggregates: idx_hourly_aggregates_lookup => CREATE INDEX idx_hourly_aggregates_lookup ON public.hourly_aggregates USING btree (org_id, project_id, hour DESC) +hourly_aggregates: idx_hourly_aggregates_metric => CREATE INDEX idx_hourly_aggregates_metric ON public.hourly_aggregates USING btree (metric_name, hour DESC) +hourly_aggregates: idx_hourly_aggregates_tenant_metric_hour => CREATE INDEX idx_hourly_aggregates_tenant_metric_hour ON public.hourly_aggregates USING btree (tenant_id, metric_name, hour DESC) +hourly_aggregates: unique_hourly_aggregate => CREATE UNIQUE INDEX unique_hourly_aggregate ON public.hourly_aggregates USING btree (org_id, project_id, metric_name, hour, dimensions) +mv_dashboard_metrics: idx_mv_dashboard_org_project_date => CREATE INDEX idx_mv_dashboard_org_project_date ON public.mv_dashboard_metrics USING btree (org_id, project_id, date DESC) +mv_dashboard_metrics: idx_mv_dashboard_unique => CREATE 
UNIQUE INDEX idx_mv_dashboard_unique ON public.mv_dashboard_metrics USING btree (org_id, project_id, date) +``` + +## Event Partitions + +```text +events_2026_03 +events_2026_04 +events_2026_05 +events_2026_06 +events_2026_07 +events_2026_08 +events_2026_09 +``` + +## Query + +```sql +BEGIN; +SET LOCAL enable_indexscan = off; +SET LOCAL enable_bitmapscan = off; +EXPLAIN (ANALYZE, BUFFERS) +SELECT id, event_id, event_name, user_id, session_id, timestamp, properties +FROM events +WHERE org_id = '00000000-0000-4000-8000-0000000f4241' + AND project_id = '00000000-0000-4000-8000-0000001e8481' + AND timestamp >= NOW() - INTERVAL '7 days' +ORDER BY timestamp DESC +LIMIT 100; +ROLLBACK; +``` + +## EXPLAIN ANALYZE + +```text +BEGIN +SET +SET + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Limit (cost=1423.83..1424.08 rows=100 width=304) (actual time=7.059..7.066 rows=100 loops=1) + Buffers: shared hit=944 + -> Sort (cost=1423.83..1427.84 rows=1605 width=304) (actual time=7.058..7.061 rows=100 loops=1) + Sort Key: events."timestamp" DESC + Sort Method: top-N heapsort Memory: 127kB + Buffers: shared hit=944 + -> Append (cost=0.00..1362.49 rows=1605 width=304) (actual time=0.010..6.119 rows=5238 loops=1) + Buffers: shared hit=941 + Subplans Removed: 3 + -> Seq Scan on events_2026_06 events_1 (cost=0.00..1354.46 rows=1599 width=301) (actual time=0.009..5.863 rows=5238 loops=1) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '7 days'::interval))) + Rows Removed by Filter: 13138 + Buffers: shared hit=941 + -> Seq Scan on events_2026_07 events_2 (cost=0.00..0.00 rows=1 width=1112) (actual time=0.007..0.007 rows=0 loops=1) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id 
= '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '7 days'::interval))) + -> Seq Scan on events_2026_08 events_3 (cost=0.00..0.00 rows=1 width=1112) (actual time=0.002..0.002 rows=0 loops=1) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '7 days'::interval))) + -> Seq Scan on events_2026_09 events_4 (cost=0.00..0.00 rows=1 width=1112) (actual time=0.002..0.002 rows=0 loops=1) + Filter: ((org_id = '00000000-0000-4000-8000-0000000f4241'::uuid) AND (project_id = '00000000-0000-4000-8000-0000001e8481'::uuid) AND ("timestamp" >= (now() - '7 days'::interval))) + Planning: + Buffers: shared hit=2441 + Planning Time: 4.895 ms + Execution Time: 7.144 ms +(23 rows) + +ROLLBACK +``` + +## Interpretation + +Index and bitmap scans are disabled in this session to show the cost shape when PostgreSQL cannot use the tenant/time access path. This is not a dropped-index benchmark; it is planner-controlled evidence for comparison. diff --git a/docs/query-plans/README.md b/docs/query-plans/README.md index f375678..c9b0da8 100644 --- a/docs/query-plans/README.md +++ b/docs/query-plans/README.md @@ -42,3 +42,5 @@ Each plan file should include: - Short observation about whether partition pruning and expected indexes were used. Redis cache hits bypass PostgreSQL, so cached GraphQL paths should be documented as a cache evidence note that links to the dashboard cache JSON measurement, not as a PostgreSQL EXPLAIN plan. + +`mv_dashboard_metrics` plans are schema evidence only unless a source file in `services/graphql-api/src/` actually queries that materialized view for the same commit. The current GraphQL resolvers use `daily_aggregates`, raw `events`, and Redis-cached resolver responses. 
diff --git a/scripts/benchmark-report.js b/scripts/benchmark-report.js index 57b0985..4753501 100644 --- a/scripts/benchmark-report.js +++ b/scripts/benchmark-report.js @@ -51,6 +51,10 @@ function evidencePath(prefix, runId) { return resolve(evidenceDir, `${prefix}-${runId}.json`); } +function metadataPath(runId) { + return evidencePath('run-metadata', runId); +} + function relative(path) { return path.replace(`${repoRoot}/`, ''); } @@ -244,6 +248,130 @@ function dockerResources() { return `${cpus}, ${memoryLabel}`; } +function currentDirtyStatus() { + return safeCommand('git status --short').split('\n').filter(Boolean).join('\n'); +} + +function dirtyTreeLabel(metadata) { + if (metadata?.git && typeof metadata.git.dirty_tree === 'boolean') { + return metadata.git.dirty_tree ? 'yes' : 'no'; + } + + return currentDirtyStatus().length ? 'yes' : 'no'; +} + +function dirtyTreeDetails(metadata) { + if (metadata?.git && typeof metadata.git.dirty_status === 'string') { + return metadata.git.dirty_status || 'none'; + } + + return currentDirtyStatus() || 'none'; +} + +function gitCommit(metadata) { + return metadata?.git?.commit || safeCommand('git rev-parse HEAD'); +} + +function metadataRows(metadata) { + if (!metadata) { + return [ + '| Metadata file | not found |', + '| Metadata note | This report fell back to current working-tree state. For publishable evidence, rerun `pnpm benchmark` so `run-metadata-.json` records pre-run provenance. 
|', + ].join('\n'); + } + + return [ + `| Metadata file | \`${relative(metadataPath(metadata.run_id))}\` |`, + `| Run started | ${metadata.started_at || 'not found'} |`, + `| Run completed | ${metadata.completed_at || 'not found'} |`, + `| Run status | ${metadata.status || 'not found'} |`, + `| Branch at run start | \`${metadata.git?.branch || 'not found'}\` |`, + `| Suites requested | ${(metadata.suites_requested || []).join(', ') || 'not found'} |`, + `| Suites completed | ${(metadata.suites_completed || []).join(', ') || 'not found'} |`, + ].join('\n'); +} + +function envOverrideRows(metadata) { + const overrides = metadata?.environment?.overrides; + if (!overrides || Object.keys(overrides).length === 0) { + return '| not recorded | not recorded |'; + } + + return Object.entries(overrides) + .sort(([left], [right]) => left.localeCompare(right)) + .map(([key, value]) => `| \`${key}\` | \`${value}\` |`) + .join('\n'); +} + +function commandMetadataRows(metadata) { + const commands = metadata?.commands || []; + if (commands.length === 0) { + return '| not recorded | not recorded |'; + } + + return commands + .map((item) => `| ${item.suite || 'unknown'} | \`${item.command || 'not recorded'}\` |`) + .join('\n'); +} + +function completedSuites(metadata) { + return Array.isArray(metadata?.suites_completed) ? metadata.suites_completed : []; +} + +function requestedSuites(metadata) { + return Array.isArray(metadata?.suites_requested) ? 
metadata.suites_requested : []; +} + +function missingSuites(metadata) { + const requested = requestedSuites(metadata); + const completed = new Set(completedSuites(metadata)); + return requested.filter((suite) => !completed.has(suite)); +} + +function publishability(metadata, commands, runScopedQueryPlans) { + const issues = []; + if (!metadata) { + issues.push('run metadata missing'); + } else { + if (metadata.git?.dirty_tree) { + issues.push('pre-run working tree was dirty'); + } + if (metadata.status !== 'completed') { + issues.push(`benchmark metadata status is ${metadata.status || 'missing'}`); + } + const missing = missingSuites(metadata); + if (missing.length > 0) { + issues.push(`metadata missing completed suite(s): ${missing.join(', ')}`); + } + } + if (commands.found.length !== 7) { + issues.push('not every benchmark suite produced evidence'); + } + if (runScopedQueryPlans.length === 0) { + issues.push('run-scoped query plans missing'); + } + + return issues.length === 0 + ? 'candidate publishable local evidence; still not production-scale' + : `not final publishable evidence: ${issues.join('; ')}`; +} + +function metadataDockerResources(metadata) { + const resources = metadata?.environment?.docker_resources; + if (!resources) return dockerResources(); + + const memory = Number(resources.memory_bytes); + const memoryLabel = Number.isFinite(memory) && memory > 0 + ? `${(memory / 1024 / 1024 / 1024).toFixed(2)} GiB` + : resources.memory_bytes; + + return `${resources.cpus || 'TBD'} CPUs, ${memoryLabel}`; +} + +function metadataServiceVersion(metadata, name, fallback) { + return metadata?.environment?.service_versions?.[name] || fallback; +} + function dbCount(sql) { const value = safeCommand(`docker exec pulseops-postgres psql -U pulseops -d pulseops_dev -tAc "${sql}"`); return value === 'TBD' ? 
'TBD' : value; @@ -299,6 +427,7 @@ if (!runId) { } const files = { + metadata: metadataPath(runId), ingest: evidencePath('ingest-throughput', runId), hot: evidencePath('hot-tenant', runId), hotDb: evidencePath('hot-tenant-db', runId), @@ -309,6 +438,7 @@ const files = { }; const summaries = { + metadata: readJsonIfExists(files.metadata), ingest: readJsonIfExists(files.ingest), hot: readJsonIfExists(files.hot), hotDb: readJsonIfExists(files.hotDb), @@ -342,25 +472,54 @@ const report = `# PulseOps Benchmark Report: ${date} Status: evidence-backed local report for run ID \`${runId}\`; not production-scale +Publishability: ${publishability(summaries.metadata, commands, runScopedQueryPlans)} + ## Environment | Field | Value | | --- | --- | -| Git commit | \`${safeCommand('git rev-parse HEAD')}\` | -| Dirty tree | ${safeCommand('git status --short').split('\n').filter(Boolean).length ? 'yes' : 'no'} | +| Git commit | \`${gitCommit(summaries.metadata)}\` | +| Dirty tree | ${dirtyTreeLabel(summaries.metadata)} | +| Dirty tree details | ${dirtyTreeDetails(summaries.metadata) === 'none' ? 
'none' : 'see Run Provenance'} | | Machine | ${os.cpus()[0]?.model || 'TBD'}, ${os.cpus().length} logical CPUs, ${(os.totalmem() / 1024 / 1024 / 1024).toFixed(2)} GiB host memory | -| Docker resources | ${dockerResources()} | -| OS | ${os.type()} ${os.release()} ${os.arch()} | -| Node.js version | ${process.version} | -| PostgreSQL version | ${versions.postgres} | -| Redis version | ${versions.redis} | -| Kafka version | ${versions.kafka} | -| PostgreSQL row count | ${rawEvents} raw events | -| Daily aggregate row count | ${dailyAggregates} rows | -| Event partitions | ${partitionCount} child partitions | -| k6 version | ${versions.k6} | +| Docker resources | ${metadataDockerResources(summaries.metadata)} | +| OS | ${summaries.metadata?.environment?.os || `${os.type()} ${os.release()} ${os.arch()}`} | +| Node.js version | ${summaries.metadata?.environment?.node || process.version} | +| PostgreSQL version | ${metadataServiceVersion(summaries.metadata, 'postgres', versions.postgres)} | +| Redis version | ${metadataServiceVersion(summaries.metadata, 'redis', versions.redis)} | +| Kafka version | ${metadataServiceVersion(summaries.metadata, 'kafka', versions.kafka)} | +| PostgreSQL row count at report capture | ${rawEvents} raw events | +| Daily aggregate row count at report capture | ${dailyAggregates} rows | +| Event partitions at report capture | ${partitionCount} child partitions | +| k6 version | ${metadataServiceVersion(summaries.metadata, 'k6', versions.k6)} | | Dataset | ${dataset} | +Environment values come from pre-run metadata when available. PostgreSQL row counts are captured when this report is generated, after the benchmark and query-plan capture. 
+
+## Run Provenance
+
+| Field | Value |
+| --- | --- |
+${metadataRows(summaries.metadata)}
+
+### Dirty Tree Details
+
+\`\`\`text
+${dirtyTreeDetails(summaries.metadata)}
+\`\`\`
+
+### Recorded Environment Overrides
+
+| Name | Value |
+| --- | --- |
+${envOverrideRows(summaries.metadata)}
+
+### Recorded Suite Commands
+
+| Suite | Command |
+| --- | --- |
+${commandMetadataRows(summaries.metadata)}
+
 ## Commands
 \`\`\`bash
@@ -395,6 +554,7 @@ ${queryPlanRows(referenceQueryPlans, 'Reference EXPLAIN ANALYZE evidence; cite s
 | File | Description |
 | --- | --- |
 ${evidenceRows([
+  { label: 'Pre-run benchmark metadata JSON', path: files.metadata },
   { label: 'Raw k6 ingest summary JSON', path: files.ingest },
   { label: 'Raw k6 hot-tenant summary JSON', path: files.hot },
   { label: 'Hot-tenant PostgreSQL evidence JSON', path: files.hotDb },
@@ -412,6 +572,7 @@ ${evidenceRows([
 - Worker throughput claims are limited to the bounded worker catch-up workload if the worker evidence file exists.
 - Hot-tenant database claims are limited to the aggregate-key pressure, representative EXPLAIN timings, reconciliation status, and after-run PostgreSQL snapshot in the hot-tenant DB evidence file if present.
 - Query plan claims from this run require run-scoped files above. Otherwise cite the reference query-plan files separately.
+- Treat this report as article-ready only if \`Dirty tree\` is \`no\`, run metadata status is \`completed\`, every requested suite is completed, run-scoped query plans are listed, and every cited number comes from this run ID.
 ## Claims Not Supported By This Run
@@ -420,6 +581,7 @@ ${evidenceRows([
 - Do not claim long-duration or million-event tenant-skew behavior unless that evidence file is present.
 - Do not claim realistic cache hit ratio from a cold/warm smoke measurement.
 - Do not claim Kafka lag limits beyond the captured lag evidence${Number.isFinite(workerLag) ? `; this run's worker final lag was ${workerLag}` : ''}.
+- Do not claim final publishable benchmark evidence from this report if \`Dirty tree\` is \`yes\`, run metadata is missing/incomplete, or run-scoped query plans are missing.
 - The fallback k6 runner is pinned to \`${process.env.K6_DOCKER_IMAGE || 'grafana/k6:2.0.0'}\`; record a new exact version if you override it or use a local k6 binary.
 `;
diff --git a/scripts/capture-query-plans.sh b/scripts/capture-query-plans.sh
index 1e4940e..89cef96 100755
--- a/scripts/capture-query-plans.sh
+++ b/scripts/capture-query-plans.sh
@@ -243,7 +243,7 @@ ORDER BY date ASC, metric_name ASC;" \
 capture_plan \
   "materialized-dashboard" \
-  "Materialized Dashboard Metrics Query" \
+  "Materialized Dashboard Metrics Query Schema Evidence" \
   "EXPLAIN (ANALYZE, BUFFERS)
 SELECT date, event_count, unique_users, unique_sessions, events_by_name
 FROM mv_dashboard_metrics
@@ -251,6 +251,6 @@ WHERE org_id = '$ORG_ID'
 AND project_id = '$PROJECT_ID'
 AND date >= CURRENT_DATE - INTERVAL '30 days'
 ORDER BY date DESC;" \
-  "This plan captures the materialized dashboard read path. It is not Redis cache evidence; Redis cache timing is captured through GraphQL/load-test summaries."
+  "This plan captures the materialized dashboard view as schema/query-plan evidence only. The current GraphQL resolvers do not read mv_dashboard_metrics; runtime dashboard paths are daily_aggregates, raw events, and Redis-cached resolver results."
 capture_cache_note
diff --git a/scripts/run-benchmark.js b/scripts/run-benchmark.js
index 91876cf..1418bf8 100644
--- a/scripts/run-benchmark.js
+++ b/scripts/run-benchmark.js
@@ -1,7 +1,8 @@
 #!/usr/bin/env node
-const { spawnSync } = require('node:child_process');
+const { spawnSync, execSync } = require('node:child_process');
 const { mkdirSync, writeFileSync } = require('node:fs');
+const os = require('node:os');
 const suites = {
   ingest: ['node', 'scripts/run-k6.js', 'tests/load/ingest-throughput.js'],
@@ -16,17 +17,131 @@ const suites = {
 const requested = process.argv.slice(2);
 const names = requested.length ? requested : Object.keys(suites);
 const runId = process.env.RUN_ID || new Date().toISOString().replace(/[:.]/g, '-');
+const evidenceDir = 'docs/benchmarks/evidence';
+const metadataPath = `${evidenceDir}/run-metadata-${runId}.json`;
-mkdirSync('docs/benchmarks/evidence', { recursive: true });
-writeFileSync('docs/benchmarks/evidence/latest-run-id.txt', `${runId}\n`, 'utf8');
+const unknownSuites = names.filter((name) => !suites[name]);
+if (unknownSuites.length > 0) {
+  console.error(`Unknown benchmark suite(s): ${unknownSuites.join(', ')}`);
+  console.error(`Available suites: ${Object.keys(suites).join(', ')}`);
+  process.exit(1);
+}
+
+function safeCommand(command) {
+  try {
+    return execSync(command, { encoding: 'utf8', stdio: ['ignore', 'pipe', 'ignore'] }).trim();
+  } catch (_error) {
+    return 'TBD';
+  }
+}
+
+function selectedEnvironment() {
+  const allowed = [
+    'API_URL',
+    'GRAPHQL_URL',
+    'RATE',
+    'DURATION',
+    'BATCH_SIZE',
+    'START_RATE',
+    'PEAK_RATE',
+    'RAMP_DURATION',
+    'HOLD_DURATION',
+    'RAMP_DOWN_DURATION',
+    'VUS',
+    'SLEEP_SECONDS',
+    'BURST_RATE',
+    'BURST_RAMP',
+    'BURST_HOLD',
+    'RECOVERY_RATE',
+    'RECOVERY',
+    'PREALLOCATED_VUS',
+    'MAX_VUS',
+    'EVENTS',
+    'TIMEOUT_MS',
+    'POLL_MS',
+    'WARM_ITERATIONS',
+    'TENANT_KEYS_FILE',
+    'ORG_ID',
+    'PROJECT_ID',
+    'START_DATE',
+    'END_DATE',
+    'FILTER_RATIO',
+    'HOT_TENANT_RATIO',
+    'HOT_TENANT_COUNT',
+  ];
+
+  return Object.fromEntries(
+    allowed
+      .filter((name) => process.env[name] !== undefined)
+      .map((name) => [name, process.env[name]])
+  );
+}
+
+function serviceVersions() {
+  const localK6 = safeCommand('k6 version');
+  const k6DockerImage = process.env.K6_DOCKER_IMAGE || 'grafana/k6:2.0.0';
+  const dockerK6 = localK6 === 'TBD'
+    ? safeCommand(`docker run --rm ${k6DockerImage} version 2>/dev/null | head -1`)
+    : localK6;
+
+  return {
+    postgres: safeCommand("docker exec pulseops-postgres psql -U pulseops -d pulseops_dev -tAc 'SHOW server_version;'"),
+    redis: safeCommand("docker exec pulseops-redis redis-server --version | awk '{print $3}'"),
+    kafka: safeCommand("docker exec pulseops-kafka /opt/kafka/bin/kafka-topics.sh --version 2>/dev/null | head -1"),
+    k6: localK6 === 'TBD' ? `${dockerK6} (Docker fallback image ${k6DockerImage})` : localK6,
+  };
+}
+
+function dbCounts() {
+  return {
+    raw_events: safeCommand('docker exec pulseops-postgres psql -U pulseops -d pulseops_dev -tAc "SELECT count(*) FROM events;"'),
+    daily_aggregates: safeCommand('docker exec pulseops-postgres psql -U pulseops -d pulseops_dev -tAc "SELECT count(*) FROM daily_aggregates;"'),
+    event_partitions: safeCommand('docker exec pulseops-postgres psql -U pulseops -d pulseops_dev -tAc "SELECT count(*) FROM pg_inherits WHERE inhparent = \'events\'::regclass;"'),
+  };
+}
+
+const startedAt = new Date().toISOString();
+const preRunStatus = safeCommand('git status --short');
+const metadata = {
+  run_id: runId,
+  started_at: startedAt,
+  completed_at: null,
+  status: 'running',
+  suites_requested: names,
+  suites_completed: [],
+  git: {
+    commit: safeCommand('git rev-parse HEAD'),
+    branch: safeCommand('git rev-parse --abbrev-ref HEAD'),
+    dirty_tree: preRunStatus.length > 0,
+    dirty_status: preRunStatus,
+  },
+  environment: {
+    os: `${os.type()} ${os.release()} ${os.arch()}`,
+    node: process.version,
+    docker_resources: {
+      cpus: safeCommand("docker info --format '{{.NCPU}}' 2>/dev/null"),
+      memory_bytes: safeCommand("docker info --format '{{.MemTotal}}' 2>/dev/null"),
+    },
+    service_versions: serviceVersions(),
+    db_counts_before: dbCounts(),
+    overrides: selectedEnvironment(),
+  },
+  commands: names.map((name) => ({
+    suite: name,
+    command: suites[name] ? suites[name].join(' ') : null,
+  })),
+};
+
+function writeMetadata() {
+  writeFileSync(metadataPath, `${JSON.stringify(metadata, null, 2)}\n`, 'utf8');
+}
+
+mkdirSync(evidenceDir, { recursive: true });
+writeFileSync(`${evidenceDir}/latest-run-id.txt`, `${runId}\n`, 'utf8');
+writeMetadata();
 for (const name of names) {
   const command = suites[name];
-  if (!command) {
-    console.error(`Unknown benchmark suite: ${name}`);
-    console.error(`Available suites: ${Object.keys(suites).join(', ')}`);
-    process.exit(1);
-  }
   console.log(`Running ${name} with RUN_ID=${runId}: ${command.join(' ')}`);
   const result = spawnSync(command[0], command.slice(1), {
@@ -38,6 +153,17 @@ for (const name of names) {
   });
   if (result.status !== 0) {
+    metadata.status = 'failed';
+    metadata.failed_suite = name;
+    metadata.completed_at = new Date().toISOString();
+    writeMetadata();
     process.exit(result.status || 1);
   }
+
+  metadata.suites_completed.push(name);
+  writeMetadata();
 }
+
+metadata.status = 'completed';
+metadata.completed_at = new Date().toISOString();
+writeMetadata();
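
Review note: the article-ready rule added to the report template (clean tree, `completed` status, every requested suite completed, run-scoped query plans listed) can be checked mechanically against the metadata JSON that `scripts/run-benchmark.js` now writes. The sketch below is one possible way to do that and is not the `publishability` helper from this change, whose body is not shown here. `isArticleReady` and its `queryPlanFiles` parameter are hypothetical names; the metadata fields (`status`, `suites_requested`, `suites_completed`, `git.dirty_tree`) are the ones the runner records.

```javascript
// Hypothetical helper (not part of this diff): applies the article-ready rule
// from the report template to a run-metadata object shaped like the one
// scripts/run-benchmark.js writes. `queryPlanFiles` is an assumed list of
// run-scoped query-plan file paths supplied by the caller.
function isArticleReady(metadata, queryPlanFiles = []) {
  if (!metadata) {
    return { ready: false, reasons: ['run metadata is missing'] };
  }

  const reasons = [];

  // Anything other than an explicit `false` is treated as not clean.
  if (metadata.git?.dirty_tree !== false) {
    reasons.push('dirty tree (or unknown git state)');
  }

  if (metadata.status !== 'completed') {
    reasons.push(`run status is ${metadata.status}`);
  }

  // Every requested suite must appear in the completed list.
  const completed = new Set(metadata.suites_completed || []);
  const missing = (metadata.suites_requested || []).filter((name) => !completed.has(name));
  if (missing.length > 0) {
    reasons.push(`suites not completed: ${missing.join(', ')}`);
  }

  if (queryPlanFiles.length === 0) {
    reasons.push('no run-scoped query plans');
  }

  return { ready: reasons.length === 0, reasons };
}
```

A caller could load `docs/benchmarks/evidence/run-metadata-<RUN_ID>.json` with `JSON.parse`, pass it in together with the query-plan files found for that run ID, and print `reasons` when `ready` is `false`. Returning the reasons rather than a bare boolean keeps the failure explanation next to the decision, which matches how the report lists unsupported claims.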