perf(db): cut CPU ~78% via evmlog/output indexes + hot query rewrites #406
Draft
matheus1lva wants to merge 1 commit into
Top 3 pg_stat_statements consumers (74% + 8.7% = 82.7% of total exec time) were full-table or full-hypertable scans:
- #1 (39.4%, 1.17M calls): projectDebtAllocator full scan of evmlog (no usable index on chain_id, signature).
- #2 (35.0%, 287k calls): fetchStrategyPerformance scans every Timescale chunk of output (no series_time predicate).
- #3 (8.7%, 12.7M calls): timeseries fanout DISTINCT block_time over whole output history per (vault, label).

Fix:
- Add evmlog(chain_id, signature, block_number DESC, log_index DESC) and partial expr index on (args->>'vault').
- Add output(chain_id, address, label, series_time DESC).
- fetchStrategyPerformance: CTE+JOIN -> DISTINCT ON with series_time >= now() - 30 days (chunk pruning).
- timeseries fanout: query already-bucketed series_time bounded by [start, end] instead of DISTINCT block_time over all history.

See docs/cpu-cost-analysis.md for full breakdown + plan.
Summary
Neon CPU usage roughly doubled since the start of the year and Neon recently changed CPU billing.
`pg_stat_statements` shows two queries account for 74% of total exec time, and seven for ~99%. All are reads against `evmlog` (5M rows / 5.9 GB) and the `output` hypertable, and all are caused by missing indexes or missing `series_time` predicates.

Full breakdown, per-query root cause, and ranked plan: `docs/cpu-cost-analysis.md`.

Top consumers fixed in this PR

| Query | Root cause | Fix |
|---|---|---|
| `projectDebtAllocator` | Full-scans `evmlog`: PK is `(chain_id, address, signature, …)`, so `(chain_id, signature)` without `address` can't use it | `evmlog(chain_id, signature, block_number DESC, log_index DESC)` + partial expr index on `(args->>'vault')` |
| `fetchStrategyPerformance` | No `series_time` predicate → every Timescale chunk scanned + `MAX(block_time) GROUP BY` | `DISTINCT ON (…) … ORDER BY series_time DESC` bounded by `series_time >= now() - interval '30 days'` |
| Timeseries fanout | `SELECT DISTINCT block_time` over all history per (vault × label) | `series_time` between `[start, end]`; backed by the new `output(chain_id, address, label, series_time DESC)` index |

Expected combined CPU reduction: ~78% of monthly CPU-seconds.
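To make the two query rewrites in the table concrete, here is a minimal sketch of what the bounded queries might look like when built from TypeScript. The function names, the `SqlQuery` shape, and the `component` and `value` columns are assumptions for illustration; the actual queries live in `hook.ts` and `timeseries.ts` and may differ.

```typescript
// Sketch only: function names and the component/value columns are assumed,
// not taken from the PR's code.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Latest row per (label, component) for one address. The series_time lower
// bound lets Timescale exclude chunks older than 30 days instead of
// scanning the whole hypertable.
function latestOutputsQuery(chainId: number, address: string, labels: string[]): SqlQuery {
  const text = `
    SELECT DISTINCT ON (label, component)
           label, component, value, series_time
      FROM output
     WHERE chain_id = $1
       AND address = $2
       AND label = ANY($3)
       AND series_time >= now() - interval '30 days'
     ORDER BY label, component, series_time DESC`;
  return { text, values: [chainId, address, labels] };
}

// Distinct, already-bucketed timestamps for one (vault, label) inside
// [start, end], rather than DISTINCT block_time over all history.
function bucketedTimesQuery(
  chainId: number,
  address: string,
  label: string,
  start: Date,
  end: Date
): SqlQuery {
  const text = `
    SELECT DISTINCT series_time
      FROM output
     WHERE chain_id = $1
       AND address = $2
       AND label = $3
       AND series_time BETWEEN $4 AND $5
     ORDER BY series_time DESC`;
  return { text, values: [chainId, address, label, start, end] };
}
```

Both predicates lead with `chain_id`, `address`, `label` and then constrain `series_time`, which matches the column order of the new `output(chain_id, address, label, series_time DESC)` index.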
Schema changes (migration `20260513233437-cpu-indexes`)

Indexes are idempotent (`IF NOT EXISTS`). Recommended deployment path: create the evmlog indexes `CONCURRENTLY` against prod first (5.9 GB table; non-concurrent creation will block writes for minutes), then run the migration, which becomes a no-op.

Code changes

- `packages/ingest/abis/yearn/3/vault/snapshot/hook.ts`: `fetchStrategyPerformance` rewritten (CTE+JOIN → `DISTINCT ON` + `series_time` window).
- `packages/ingest/fanout/timeseries.ts`: DISTINCT `block_time` → DISTINCT `series_time` with `[start, end]` bounds.

Out of scope (follow-ups)

Listed in `docs/cpu-cost-analysis.md` under the fix plan, in materiality order:

- `projectDebtAllocator` result in the vault snapshot row.
- `series_time` lower bound to the apr/apy `MAX(CASE WHEN …)` resolvers (#4 / #6 / #7).
- `timeseries`/`tvls` GQL queries (#5).
- `event_name COUNT(*)` probe in `packages/ingest/probe/index.ts` (#8, 47 s per call).

Test plan
Setup

- `.env` `POSTGRES_*` points at a dev branch (not prod).
- `make dev`: starts redis, postgres, ingest, web.

Migration

- `CREATE INDEX` statements run cleanly; subsequent runs are no-ops.
- `migrate up` to re-apply.

Hot query #1: `projectDebtAllocator`

- `evmlog_idx_chain_signature_args_vault` (or `evmlog_idx_chain_signature`), execution time < 20 ms (was seconds).
- `fanout abis` and confirm v3 vault snapshots complete without errors.

Hot query #2: `fetchStrategyPerformance`

- `oracle.apr`, `historical.weeklyNet`, etc.) populate as before.
- `EXPLAIN ANALYZE` the rewritten query against the dev DB and confirm it uses `idx_output_chain_address_label_series_time` with chunk exclusion (Timescale `Chunks excluded during runtime`).

Hot query #3: Timeseries fanout

- `fanout abis` (timeseries). Confirm jobs enqueue identically to before for a known vault: same set of `endOfDay` timestamps queued.
- `mq.add` calls to a baseline if reproducible.

Regression: replays

- `fanout replays` from terminal UI; confirm replays still complete and produce the same outputs.

Cleanup

- `make down`
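For the hot query #3 check, one way to compare the queued `endOfDay` timestamps against a baseline is a small set diff. This is a sketch under assumptions: `diffTimestamps` is a hypothetical helper, not part of the repo, and it assumes the timestamps have already been captured as epoch seconds before and after the change.

```typescript
// Sketch only: diffTimestamps is a hypothetical helper for the test plan.
interface TimestampDiff {
  missing: number[]; // present in the baseline, absent after the change
  extra: number[];   // present after the change, absent from the baseline
}

// Order-insensitive comparison of two timestamp lists. Quadratic in list
// length, which is fine for the handful of days queued for a single vault.
function diffTimestamps(baseline: number[], current: number[]): TimestampDiff {
  const asc = (a: number, b: number) => a - b;
  const missing = baseline.filter((t) => current.indexOf(t) === -1);
  const extra = current.filter((t) => baseline.indexOf(t) === -1);
  return { missing: missing.sort(asc), extra: extra.sort(asc) };
}

// Example with made-up epoch-second values one day apart.
const diff = diffTimestamps(
  [1704067200, 1704153600, 1704240000],
  [1704240000, 1704153600, 1704326400]
);
console.log(diff.missing.length === 0 && diff.extra.length === 0 ? "identical" : diff);
```

An empty `missing` and `extra` means the rewritten fanout enqueued the same set of days as before, regardless of ordering.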