🎨 Improved admin Comments page load time on large comment datasets#28734
🎨 Improved admin Comments page load time on large comment datasets#28734jwmarshall wants to merge 1 commit into
Conversation
no ref - The admin "all comments" moderation page (getAdminAllComments) orders by created_at and emits the count.replies / count.direct_replies relations as per-row correlated subqueries. With no supporting indexes the optimizer full-scans the comments table once per returned row and filesorts the whole table for the ordering, so the page took ~90s to load on sites with very large numbers of comments (especially ones skewed toward top-level comments). - Added four additive secondary indexes on comments — created_at, status, (in_reply_to_id, status) and (parent_id, in_reply_to_id, status) — covering the ORDER BY list, the COUNT(DISTINCT) pagination count, and the reply-count subqueries, cutting the page's DB time from ~90s to tens of milliseconds. - Chose indexes over a query refactor to keep the change minimal and backward-compatible; they are purely additive, so the migration's down drops them without disturbing the existing foreign-key indexes.
|
It looks like this PR contains a migration 👀 General requirements
Schema changes
Data changes
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughFour new indexes are added to the 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I have also created a PR that contains all four new indexes combined with a query refactor under my fork: jwmarshall#1 I'm working out a test environment to show the difference in performance gains. |
Hello Ghost team 👋 - This is my first PR and I've tried to follow all repository guidelines. For context, I'm working on a migration from Wordpress that has almost 400k comments. Once imported I noticed performance issues for some pages and queries for comments. This PR attempts to add some tables indexes to improve performance.
Thanks in advance!
--
On sites with a large number of comments, the admin Comments moderation page ("all comments") becomes effectively unusable. It took ~90s to load on a real-world dataset of ~390k comments. The public per-post comment widget is unaffected; this is purely the admin
getAdminAllCommentspath.The page query orders by
created_atand emits thecount.replies/count.direct_repliesrelations as per-row correlated subqueries. With no supporting indexes, the optimizer full-scans thecommentstable once per returned row and filesorts the whole table for the ordering. The problem is worst when the data is skewed toward top-level comments (parent_id IS NULL), where the column is too low-cardinality for the optimizer to trust the existing single-column FK index, but it slows down any large comment table.What does it do?
Adds four additive secondary indexes to the
commentstable, declared inschema.jsand applied via a non-transactional migration:comments(created_at)ORDER BY created_at DESClist (removes the filesort)comments(status)COUNT(DISTINCT id)pagination countcomments(in_reply_to_id, status)count.direct_repliessubquery onin_reply_to_idcomments(parent_id, in_reply_to_id, status)count.repliesplus theparent_id + in_reply_to_id IS NULLhalf ofcount.direct_repliesThe 3-column index covers both the
parent_id-only andparent_id + in_reply_to_id IS NULLsubqueries, so a separate(parent_id, status)is not needed. On the test dataset this cut the page's DB time from ~90s to tens of milliseconds.The indexes are purely additive —
parent_idandin_reply_to_idkeep their own foreign-key indexes — so the migration'sdowndrops them without any FK index re-add dance.Why is this something Ghost users or developers need?
Comment moderation on any high-volume Ghost site is currently slow to the point of timing out. This is a low-risk, backward-compatible fix (no schema/data. changes beyond indexes, no API or behaviour changes) that makes the moderation page usable at scale.
Notes / trade-offs
comments(status)is low-cardinality: it still beats a clustered full scan for theCOUNT(DISTINCT)because the secondary index is far narrower than the row (which carries thehtmllongtext). It's the most droppable of the four.Testing
integrity.test.js).upcreates all four indexes, is idempotent, anddownreverses cleanly.A note on that last checkbox: this is a pure index/migration change, covered by the schema integrity test and manual migration verification, but there's no automated test proving the performance win (that needs a large seeded dataset and EXPLAIN ANALYZE, which isn't practical as a unit test). I left it unchecked to be honest.