
first phase of db improvements #400

Closed
matheus1lva wants to merge 5 commits into main from fix/first-phase-egress

@matheus1lva (Collaborator) commented May 7, 2026

Summary

Implements the first phase of the Neon egress reduction plan from the HackMD. The branch reduces rows/bytes returned by hot ingest queries without adding a cache layer.

Main changes:

  • Bound fanout timestamp discovery to the requested series window using series_time.
  • Replace the strategy-performance latest_times CTE/JOIN with a bounded DISTINCT ON (address, label, component) scan.
  • Fetch only thing.defaults in hot timeseries hooks instead of full thing rows.
  • Push things.get() equality / inequality filters into SQL and keep semver filtering in JS.
  • Bound current APY/APR MAX pivots to the last 7 days.
  • Narrow targeted evmlog.args reads to the fields each caller actually uses.
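To illustrate the things.get() change in the list above, here is a minimal sketch of splitting a filter into SQL-pushable equality/inequality clauses and a JS-side semver comparison. Everything in it is hypothetical: the filter shape (a plain value for equality, `{ not: value }` for inequality), the assumption that the filtered fields live in `thing.defaults`, and the helper names. The real things.get() signature may differ. Both the JSONB key and the value are bound as parameters, so no caller input is spliced into the SQL text.

```typescript
// Hypothetical filter shape: each key names a field inside thing.defaults.
// A plain value means equality; { not: value } means inequality.
type Scalar = string | number | boolean;
type FieldFilter = Scalar | { not: Scalar };

interface SqlFilter {
  where: string; // SQL fragment using $n placeholders
  params: string[]; // values bound to those placeholders, in order
}

// Push equality / inequality checks into SQL. ->> yields text, so the
// compared values are stringified and compared as text.
function toSqlFilter(filter: Record<string, FieldFilter>): SqlFilter {
  const clauses: string[] = [];
  const params: string[] = [];
  for (const [key, f] of Object.entries(filter)) {
    const negate = typeof f === "object";
    const value: Scalar = typeof f === "object" ? f.not : f;
    params.push(key, String(value));
    const k = `$${params.length - 1}`;
    const v = `$${params.length}`;
    // IS DISTINCT FROM treats a missing key (NULL) as "not equal", like JS !==.
    clauses.push(
      negate
        ? `defaults->>${k}::text IS DISTINCT FROM ${v}`
        : `defaults->>${k}::text = ${v}`,
    );
  }
  return { where: clauses.length > 0 ? clauses.join(" AND ") : "TRUE", params };
}

// Semver filtering stays in JS. This handles plain dotted numeric versions
// such as "3.0.2" only; pre-release tags would need a real semver library.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return Math.sign(d);
  }
  return 0;
}
```

Rows returned by the SQL side would then be narrowed in JS, e.g. `rows.filter((t) => compareVersions(t.apiVersion, "3.0.0") >= 0)`, where `apiVersion` is again a hypothetical field. Whether IS DISTINCT FROM matches the original in-JS inequality semantics for missing keys is an assumption worth checking against things.spec.ts.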

Expected savings

These are directional estimates from the HackMD diagnosis, not proof of production impact. The real pass/fail check is the post-deploy pg_stat_statements resample after about 3 days.

  • Fanout timestamp discovery was the biggest measured row offender: about 4M calls / 3.5d, 2B rows, and 16 GB estimated egress. Bounding it to the requested window should plausibly remove 99%+ of returned rows for that query shape.
  • Strategy performance fetch had the largest CPU cost: about 91K calls / 3.5d and roughly 35 days of database CPU over that window. The bounded DISTINCT ON scan should turn the full-history CTE/JOIN into a recent-window lookup.
  • thing and evmlog projection changes reduce JSONB bytes returned at hot call sites. These are expected to help most where callers only need a few fields from large defaults / args blobs.
  • Overall expectation: targeted queryids should drop by at least 80% in returned rows, with the fanout query expected at 99%+. Invoice-level savings still need the production measurement window because JSONB/TOAST wire bytes are not fully captured by local tests.
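A quick back-of-envelope pass over the quoted figures shows what they imply per call. It works out to about 500 rows per fanout call and about 8 bytes per returned row, so roughly 20M rows and 0.16 GB would remain at 99% removal. For strategy performance it works out to about 33 s of database CPU per call, equivalent to roughly 10 cores busy on average across the 3.5-day window. This is only arithmetic on the HackMD estimates, so it inherits their uncertainty. The sketch assumes GB means 10^9 bytes and that "days of CPU" is summed across all calls.

```typescript
// Figures quoted in the list above (HackMD diagnosis, ~3.5-day window).
// Assumptions: GB = 10^9 bytes; "days of CPU" is summed across all calls.
const windowDays = 3.5;
const fanout = { calls: 4_000_000, rows: 2_000_000_000, egressBytes: 16e9 };
const strategyPerf = { calls: 91_000, cpuDays: 35 };

// Average rows and bytes per fanout timestamp-discovery call.
const rowsPerCall = fanout.rows / fanout.calls;
const bytesPerRow = fanout.egressBytes / fanout.rows;

// What is left if the bounded query removes 99% of returned rows.
const remainingFraction = 0.01;
const rowsAfter = fanout.rows * remainingFraction;
const egressAfterGB = (fanout.egressBytes * remainingFraction) / 1e9;

// Strategy performance: average time per call, and how many cores that
// keeps busy on average over the sampling window.
const secondsPerStrategyCall = (strategyPerf.cpuDays * 86_400) / strategyPerf.calls;
const avgBusyCores = strategyPerf.cpuDays / windowDays;

console.log({ rowsPerCall, bytesPerRow, rowsAfter, egressAfterGB, secondsPerStrategyCall, avgBusyCores });
```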

How to review

Start with:

  • packages/ingest/fanout/timeseries.ts
  • packages/ingest/abis/yearn/3/vault/snapshot/hook.ts
  • packages/ingest/things.ts
  • packages/ingest/helpers/apy-apr.ts

Then skim the small hook updates under packages/ingest/abis/yearn/** and packages/ingest/abis/erc4626/** to confirm the query projection changes preserve result shapes.

Behavior should stay the same for normal/current data. The intentional tradeoff is that current strategy performance and APY/APR helpers now ignore stale output outside their bounded lookback windows instead of returning old historical values.
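The tradeoff above can be sketched in memory. A "current" lookup now returns nothing when the newest value is older than the lookback window, instead of falling back to stale history. This is roughly the in-memory analogue of adding a `block_time` lower bound to the query. The `OutputPoint` shape and both helpers are hypothetical. The env var name `CURRENT_PERFORMANCE_LOOKBACK_DAYS` comes from the review suggestion further down; the PR itself hardcodes 7 days.

```typescript
// Hypothetical shape of one stored output value.
interface OutputPoint {
  blockTime: Date;
  value: number;
}

const DAY_MS = 86_400_000;

// Lookback in days, read from a hypothetical env var with a 7-day default
// (the window this PR hardcodes for the APY/APR pivots).
function lookbackDays(
  env: Record<string, string | undefined>,
  name = "CURRENT_PERFORMANCE_LOOKBACK_DAYS",
  fallback = 7,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const days = Number(raw);
  if (!Number.isFinite(days) || days <= 0) {
    throw new Error(`${name} must be a positive number, got "${raw}"`);
  }
  return days;
}

// Newest point whose blockTime falls within the last `days` days before `now`.
// Older points are ignored even when nothing newer exists: this is the
// intentional "no stale fallback" tradeoff, so callers must handle undefined.
function latestWithinLookback(
  points: OutputPoint[],
  now: Date,
  days: number,
): OutputPoint | undefined {
  const cutoff = now.getTime() - days * DAY_MS;
  let best: OutputPoint | undefined;
  for (const p of points) {
    const t = p.blockTime.getTime();
    if (t < cutoff) continue;
    if (best === undefined || t > best.blockTime.getTime()) best = p;
  }
  return best;
}
```

A caller would use something like `latestWithinLookback(points, new Date(), lookbackDays(process.env))`. Failing fast on a malformed value avoids silently running with a zero or NaN window.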

Test plan

Automated checks run:

bun run --filter ingest lint
bunx tsc -p packages/ingest/tsconfig.json --noEmit
cd packages/ingest && bun run test things.spec.ts abis/yearn/2/vault/snapshot/hook.spec.ts abis/yearn/2/strategy/snapshot/hook.spec.ts
bun run --filter ingest test
bun run --filter lib test

What those checks cover:

  • Type safety and syntax of changed SQL call sites.
  • Existing behavior for things.get() filters and affected Yearn snapshot hooks.
  • No obvious shape regression in the tested ingest flows.

What tests do not prove:

  • They do not prove Neon egress reduction by themselves.
  • They do not prove pg_stat_statements.rows drops for production queryids.
  • They do not fully smoke GraphQL/API response shapes for frontend consumers.

GraphQL/API smoke: run representative frontend-style queries and confirm response shapes stay unchanged.

(
  GQL_URL="http://localhost:3001/api/gql"
  CHAIN_ID=1
  VAULT_ADDRESS="0x0000000000000000000000000000000000000000" # replace with a known production vault
  STRATEGY_ADDRESS="0x0000000000000000000000000000000000000000" # replace with one of that vault's strategies

  # Vault response shape
  curl -sS "$GQL_URL" \
    -H "content-type: application/json" \
    --data "$(jq -nc --argjson chainId "$CHAIN_ID" --arg address "$VAULT_ADDRESS" '{query:"query Vault($chainId:Int!,$address:String!){ vault(chainId:$chainId,address:$address){ chainId address name symbol apy { net weeklyNet monthlyNet } tvl { close } strategies { address name status performance { oracle { apr apy } historical { net weeklyNet monthlyNet inceptionNet } } } } }", variables:{chainId:$chainId,address:$address}}')" \
    | jq '{errors, vault: .data.vault | {chainId, address, name, symbol, apy, tvl, strategiesCount: (.strategies | length), firstStrategy: .strategies[0]}}'

  # Strategy response shape
  curl -sS "$GQL_URL" \
    -H "content-type: application/json" \
    --data "$(jq -nc --argjson chainId "$CHAIN_ID" --arg address "$STRATEGY_ADDRESS" '{query:"query Strategy($chainId:Int!,$address:String!){ strategy(chainId:$chainId,address:$address){ chainId address name apiVersion vault { address name } apr { net gross } reports { blockNumber transactionHash gain loss totalDebt } } }", variables:{chainId:$chainId,address:$address}}')" \
    | jq '{errors, strategy: .data.strategy | {chainId, address, name, apiVersion, vault, apr, reportsCount: (.reports | length), firstReport: .reports[0]}}'
)
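Values such as APY and TVL move between runs, so a smoke check is more robust if it compares shapes rather than values. Save the curl output from main and from this branch, then diff the two shapes. Below is a minimal sketch with hypothetical helper names. Arrays are summarized by their first element, in the same spirit as the `firstStrategy` / `firstReport` projection in the jq filters above.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Replace every leaf with its type name, keep object keys, and summarize
// arrays by their first element.
function shapeOf(value: Json): unknown {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    const [first] = value;
    return first === undefined ? [] : [shapeOf(first)];
  }
  if (typeof value === "object") {
    const out: Record<string, unknown> = {};
    const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    for (const [key, v] of entries) out[key] = shapeOf(v);
    return out;
  }
  return typeof value;
}

const isRecord = (x: unknown): x is Record<string, unknown> =>
  typeof x === "object" && x !== null && !Array.isArray(x);

// JSON-path-like list of places where two shapeOf() results differ.
function diffShapes(a: unknown, b: unknown, path = "$"): string[] {
  if (isRecord(a) && isRecord(b)) {
    const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)])).sort();
    const diffs: string[] = [];
    for (const k of keys) {
      if (k in a && k in b) diffs.push(...diffShapes(a[k], b[k], `${path}.${k}`));
      else diffs.push(`${path}.${k}`); // key present on only one side
    }
    return diffs;
  }
  if (Array.isArray(a) && Array.isArray(b)) {
    // An empty list on either side carries no element shape to compare.
    if (a.length === 0 || b.length === 0) return [];
    return diffShapes(a[0], b[0], `${path}[0]`);
  }
  return JSON.stringify(a) === JSON.stringify(b) ? [] : [path];
}
```

Nullable GraphQL fields can legitimately differ between runs (`null` on one side, an object on the other), so flagged paths need a human look rather than counting as an automatic failure.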

Production measurement after deploy:

psql "$DATABASE_URL" -c "SELECT pg_stat_statements_reset();"
# Wait about 3 days, then sample targeted queryids. This emits JSON so reviewers can inspect with jq.
psql "$DATABASE_URL" -Atc "SELECT jsonb_pretty(jsonb_agg(jsonb_build_object('queryid', queryid, 'calls', calls, 'rows', rows, 'mean_exec_time', mean_exec_time, 'query', left(query, 220)) ORDER BY rows DESC)) FROM pg_stat_statements WHERE query ILIKE '%FROM output%' OR query ILIKE '%FROM thing%' OR query ILIKE '%FROM evmlog%';" | jq .

Expected production pass criteria:

  • Targeted queryids show at least 80% row reduction versus the HackMD baseline.
  • Fanout timestamp query shows 99%+ row reduction.
  • Targeted queries drop out of the top-N egress offenders, or fall materially lower within it.
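The pass criteria above can be checked mechanically once the psql JSON output is in hand. One subtlety applies: rewriting a query's SQL changes its pg_stat_statements queryid, so baseline and post-deploy queryids for the same logical query will not match. This sketch therefore keys samples by a hypothetical `label` that the reviewer assigns after reading the query text. It also normalizes rows per day, because the baseline window (~3.5 days) and the resample (~3 days) differ. That normalization is an assumption; it does not correct for traffic changes. The 0.8 default and the 0.99 fanout threshold mirror the criteria listed above.

```typescript
// One sample per targeted query shape, labeled by the reviewer.
interface Sample {
  label: string;
  rows: number; // pg_stat_statements.rows accumulated over the window
  windowDays: number; // how long stats accumulated before sampling
}

interface Verdict {
  label: string;
  reduction: number; // 1 - after/before, in rows per day
  required: number;
  pass: boolean;
}

function evaluate(
  baseline: Sample[],
  after: Sample[],
  required: Record<string, number>,
  defaultRequired = 0.8,
): Verdict[] {
  const afterByLabel = new Map(after.map((s) => [s.label, s] as const));
  return baseline.map((b) => {
    const need = required[b.label] ?? defaultRequired;
    const a = afterByLabel.get(b.label);
    // A missing sample fails instead of counting as zero rows, so a
    // mislabeled query cannot pass by accident.
    if (a === undefined || b.rows <= 0) {
      return { label: b.label, reduction: Number.NaN, required: need, pass: false };
    }
    const reduction = 1 - a.rows / a.windowDays / (b.rows / b.windowDays);
    return { label: b.label, reduction, required: need, pass: reduction >= need };
  });
}
```

A call would look like `evaluate(baselineSamples, resampleSamples, { "fanout-timestamps": 0.99 })`, with every other label falling back to the 80% threshold.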

Risk / impact

No migrations or cache layer. Query changes are additive/narrowing, but production validation still needs the post-deploy pg_stat_statements resample above. Main behavior tradeoff: stale output outside the bounded lookback windows is no longer used for current strategy performance and APY/APR helpers.

@vercel Bot commented May 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| kong | Ready | Preview, Comment | May 13, 2026 11:10pm |

@yearn yearn deleted a comment from matheusilva-stord May 13, 2026

@murderteeth (Contributor) left a comment

Request: split this PR for safer rollout and clean attribution

This PR bundles six different change shapes that all touch hot ingest paths: fanout window, strategy-performance query rewrite, current APY/APR lookback bounds, things.get() SQL pushdown, JSONB projection narrowing, and test runner cleanup. Two of those are behavior changes (dropping stale-output fallback past the lookback windows), not pure perf.

Bundled, this is hard to operate:

  • If anything regresses post-deploy, revert has to unwind six unrelated shapes at once.
  • pg_stat_statements deltas can only be attributed cleanly per deploy. Bundled, we see aggregate movement but can't tie a queryid improvement (or regression) to a specific change. That's the whole point of the HackMD baseline.
  • The fanout query is the 99%+ row-reduction win. Landing it on its own starts clawing back egress immediately instead of waiting on consensus about the smaller changes.

Suggested split

PR 1 — test runner cleanup

  • packages/lib/run-tests.ts default spec discovery.
  • Remove describe.only at packages/lib/strider.spec.ts:4 (still there on this branch — the PR only removes it.only).
  • Confirm bun run --filter lib test runs the full suite, not just strider.
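A leftover describe.only is the kind of mistake a small pre-merge guard can catch. Below is a minimal sketch, assuming the caller has already read the spec files into a path-to-source map; file discovery is not shown. `findFocusedTests` is a hypothetical helper and not part of packages/lib/run-tests.ts. A focused-test lint rule, if the repo's linter provides one, would do the same job.

```typescript
// Matches focused test calls: describe.only(, it.only(, test.only(.
// Being a regex over raw text, it can also match inside comments or strings.
const FOCUSED_TEST = /\b(?:describe|it|test)\.only\s*\(/;

// Returns "path:line" for every line that focuses a test or suite.
function findFocusedTests(files: Record<string, string>): string[] {
  const hits: string[] = [];
  for (const [path, source] of Object.entries(files)) {
    source.split("\n").forEach((line, index) => {
      if (FOCUSED_TEST.test(line)) hits.push(`${path}:${index + 1}`);
    });
  }
  return hits;
}
```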

PR 2 — fanout timestamp discovery

  • Just the series_time bounded discovery in packages/ingest/fanout/timeseries.ts.
  • Maps to the top HackMD offender (SELECT DISTINCT block_time FROM output…).
  • Deploy and measure before adding more.

PR 3 — current performance / output lookups

  • Bounded current APY/APR pivots. Replace the hardcoded 7 days with a dedicated env var, e.g. CURRENT_PERFORMANCE_LOOKBACK_DAYS, defaulting to 7.
  • Reuse that same env var for the 14-day strategy performance lookback. If the 7d/14d gap was deliberate (e.g. strategy reports lag), call out why and keep them named separately instead.
  • Preserve the single-timestamp invariant in fetchStrategyPerformance (packages/ingest/abis/yearn/3/vault/snapshot/hook.ts). The old CTE used GROUP BY address, label, so every component for a given (address, label) came from the same block_time. The new DISTINCT ON (address, label, component) shape picks the latest block_time per component independently, so a strategy can return net from one output point and weeklyNet from another.
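The difference described in the last bullet can be reproduced in memory. This sketch uses a hypothetical `Row` type with numeric block times and assumes at most one row per (address, label, component, block_time). `latestPerComponent` mirrors DISTINCT ON (address, label, component) ordered by block_time DESC. `latestSnapshotPerGroup` mirrors the old CTE/JOIN: it takes the newest block_time per (address, label), then keeps every component recorded at that time. In SQL, one possible way to keep the invariant inside the new bounded window would be to pick the newest block_time per (address, label) first and then fetch the components at that time. That is a suggestion, not what either version of the PR does.

```typescript
// Hypothetical in-memory stand-in for rows of the output table.
interface Row {
  address: string;
  label: string;
  component: string;
  blockTime: number;
  value: number;
}

const groupKey = (r: Row) => `${r.address}|${r.label}`;

// Like DISTINCT ON (address, label, component) ... block_time DESC:
// each component independently keeps its own newest row.
function latestPerComponent(rows: Row[]): Row[] {
  const best = new Map<string, Row>();
  for (const r of rows) {
    const k = `${groupKey(r)}|${r.component}`;
    const cur = best.get(k);
    if (cur === undefined || r.blockTime > cur.blockTime) best.set(k, r);
  }
  return Array.from(best.values());
}

// Like the old GROUP BY address, label CTE plus JOIN: find the newest
// block_time per (address, label), then keep only rows at exactly that time.
function latestSnapshotPerGroup(rows: Row[]): Row[] {
  const latest = new Map<string, number>();
  for (const r of rows) {
    const k = groupKey(r);
    latest.set(k, Math.max(latest.get(k) ?? -Infinity, r.blockTime));
  }
  return rows.filter((r) => r.blockTime === latest.get(groupKey(r)));
}
```

Each choice has a cost. When the newest snapshot is partial, the invariant-preserving version returns fewer components rather than mixing timestamps, which is how the old CTE behaved.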

After 1–3 are measured, decide whether the remaining projection changes (thing.defaults, things.get() pushdown, evmlog.args keys) still carry their weight against the new baseline.

PR 1 is pretty lightweight; if convenient, combine it with PR 2.

@matheus1lva (Collaborator, Author)

Split per review feedback:

Closing this in favor of the split PRs.
