🎨 Improved admin Comments page load time on large comment datasets #28734

Open
jwmarshall wants to merge 1 commit into TryGhost:main from jwmarshall:comment-indexes

jwmarshall commented Jun 18, 2026

Hello Ghost team 👋 - This is my first PR and I've tried to follow all repository guidelines. For context, I'm working on a migration from WordPress that has almost 400k comments. Once they were imported, I noticed performance issues on some comment pages and queries. This PR adds table indexes to improve performance.

Thanks in advance!

--

On sites with a large number of comments, the admin Comments moderation page ("all comments") becomes effectively unusable. It took ~90s to load on a real-world dataset of ~390k comments. The public per-post comment widget is unaffected; this is purely the admin getAdminAllComments path.

The page query orders by created_at and emits the count.replies / count.direct_replies relations as per-row correlated subqueries. With no supporting indexes, the optimizer full-scans the comments table once per returned row and filesorts the whole table for the ordering. The problem is worst when the data is skewed toward top-level comments (parent_id IS NULL), where the column is too low-cardinality for the optimizer to trust the existing single-column FK index, but it slows down any large comment table.

What does it do?

Adds four additive secondary indexes to the comments table, declared in schema.js and applied via a non-transactional migration:

| Index | Serves |
| --- | --- |
| comments(created_at) | the ORDER BY created_at DESC list (removes the filesort) |
| comments(status) | the COUNT(DISTINCT id) pagination count |
| comments(in_reply_to_id, status) | the count.direct_replies subquery on in_reply_to_id |
| comments(parent_id, in_reply_to_id, status) | count.replies plus the parent_id + in_reply_to_id IS NULL half of count.direct_replies |

The 3-column index covers both the parent_id-only and parent_id + in_reply_to_id IS NULL subqueries, so a separate (parent_id, status) is not needed. On the test dataset this cut the page's DB time from ~90s to tens of milliseconds.
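
The reasoning behind dropping a separate (parent_id, status) index is the leftmost-prefix rule for composite indexes. A minimal sketch of that rule follows; `usablePrefixLength` is a hypothetical helper, not part of Ghost, and the assumption that both subqueries also filter on status is mine rather than something taken from the diff.

```javascript
// Hypothetical helper (not part of Ghost): counts how many leading columns of
// a composite index can be used for an index seek, given the columns a query
// constrains with equality or IS NULL predicates.
function usablePrefixLength(indexColumns, constrainedColumns) {
    const constrained = new Set(constrainedColumns);
    let length = 0;
    for (const column of indexColumns) {
        if (!constrained.has(column)) {
            break; // the first unconstrained column ends the usable prefix
        }
        length += 1;
    }
    return length;
}

const replyIndex = ['parent_id', 'in_reply_to_id', 'status'];

// count.replies: parent_id = ? (assumed to also filter on status)
const repliesPrefix = usablePrefixLength(replyIndex, ['parent_id', 'status']);
// direct-replies half: parent_id = ? AND in_reply_to_id IS NULL (plus status)
const directPrefix = usablePrefixLength(replyIndex, ['parent_id', 'in_reply_to_id', 'status']);
// a query constraining status alone cannot seek with this index
const statusOnlyPrefix = usablePrefixLength(replyIndex, ['status']);

console.log(repliesPrefix, directPrefix, statusOnlyPrefix); // 1 3 0
```

In the count.replies case only parent_id is used for the seek, but status is still stored in the index entry, so the filter can be evaluated without reading the full row. The status-only case is why comments(status) is a separate index.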

The indexes are purely additive — parent_id and in_reply_to_id keep their own foreign-key indexes — so the migration's down drops them without any FK index re-add dance.
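
For reviewers who want to picture the change without opening the diff, here is a rough sketch of how the four column sets could be declared and what statements they would translate to. The column sets come from the table above. The `'@@INDEXES@@'` key and the `<table>_<columns>_index` naming (Knex's default) are assumptions, and `createIndexStatements` is a hypothetical helper, not code from this PR.

```javascript
// Sketch only: column sets are from the PR description; the '@@INDEXES@@' key
// and the naming rule are assumptions, not copied from the diff.
const commentsIndexes = {
    '@@INDEXES@@': [
        ['created_at'],
        ['status'],
        ['in_reply_to_id', 'status'],
        ['parent_id', 'in_reply_to_id', 'status']
    ]
};

// Assumed naming rule: <table>_<col1>_<col2>..._index, lower-cased.
function defaultIndexName(table, columns) {
    return `${table}_${columns.join('_')}_index`.toLowerCase();
}

// Hypothetical helper: renders one CREATE INDEX statement per column set.
function createIndexStatements(table, indexes) {
    return indexes.map((columns) => {
        const name = defaultIndexName(table, columns);
        return `CREATE INDEX ${name} ON ${table} (${columns.join(', ')})`;
    });
}

const statements = createIndexStatements('comments', commentsIndexes['@@INDEXES@@']);
statements.forEach(statement => console.log(statement));
```

Because every statement only adds an index, the matching down step is a plain drop of the same four names.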

Why is this something Ghost users or developers need?

Comment moderation on any high-volume Ghost site is currently slow to the point of timing out. This is a low-risk, backward-compatible fix (no schema/data changes beyond indexes, no API or behaviour changes) that makes the moderation page usable at scale.

Notes / trade-offs

  • Write amplification: four extra secondary indexes add modest per-row cost on comment insert/update/delete; negligible for read-heavy comment workloads.
  • comments(status) is low-cardinality: it still beats a clustered full scan for the COUNT(DISTINCT) because the secondary index is far narrower than the row (which carries the html longtext). It's the most droppable of the four.
  • Alternative considered: a query-level refactor that batches the reply counts into a single grouped pass (instead of per-row correlated subqueries) would remove the need for the reply-count indexes entirely. Indexes were chosen here as the minimal, backward-compatible change; happy to follow up with the query refactor if preferred.
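
To make the alternative concrete, here is an in-memory sketch of what "a single grouped pass" means, as opposed to re-scanning the comments once per returned row. It is not the proposed refactor (that would be a grouped SQL query). The `countReplies` function is hypothetical, the field names follow the columns mentioned above, and the status filter value is an assumption.

```javascript
// Hypothetical in-memory model of the grouped pass. One loop over all comments
// builds both counts for every parent, instead of one scan per listed comment.
function countReplies(comments, countedStatus = 'published') {
    const counts = new Map();
    const bump = (id, key) => {
        if (!counts.has(id)) {
            counts.set(id, {replies: 0, direct_replies: 0});
        }
        counts.get(id)[key] += 1;
    };

    for (const comment of comments) {
        // Top-level comments and non-counted statuses contribute to no count.
        if (comment.parent_id === null || comment.status !== countedStatus) {
            continue;
        }
        bump(comment.parent_id, 'replies');
        // A reply is "direct" to in_reply_to_id when set, otherwise to its parent.
        bump(comment.in_reply_to_id ?? comment.parent_id, 'direct_replies');
    }
    return counts;
}

const sample = [
    {id: 'a', parent_id: null, in_reply_to_id: null, status: 'published'},
    {id: 'b', parent_id: 'a', in_reply_to_id: null, status: 'published'},
    {id: 'c', parent_id: 'a', in_reply_to_id: 'b', status: 'published'},
    {id: 'd', parent_id: 'a', in_reply_to_id: null, status: 'hidden'}
];

const counts = countReplies(sample);
console.log(counts.get('a'), counts.get('b'));
```

Here comment a ends up with two replies and one direct reply, and comment b with one direct reply. The cost is linear in the number of comments regardless of how many rows the page returns.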

Testing

  • Updated the schema integrity hash test (integrity.test.js).
  • Verified the migration up creates all four indexes, is idempotent, and down reverses cleanly.
  • Existing comments-service unit tests pass; lint passes.
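
The idempotency check above can be pictured with a small in-memory model: run up twice, then down twice, and confirm the end state matches the start. Everything in this sketch is hypothetical (a Set stands in for the database, and the index names assume the default `<table>_<columns>_index` pattern). It shows the guard pattern only, not the migration code in this PR.

```javascript
// Hypothetical stand-in for the database: a Set of index names on `comments`.
// None of this is Ghost's migration code; it only models the guard pattern.
const NEW_INDEXES = [
    'comments_created_at_index',
    'comments_status_index',
    'comments_in_reply_to_id_status_index',
    'comments_parent_id_in_reply_to_id_status_index'
];

function up(existingIndexes) {
    for (const name of NEW_INDEXES) {
        if (existingIndexes.has(name)) {
            console.log(`Skipping ${name}: already exists`);
            continue;
        }
        existingIndexes.add(name);
        console.log(`Added ${name}`);
    }
}

function down(existingIndexes) {
    for (const name of NEW_INDEXES) {
        // Set.prototype.delete returns false when the entry was not present.
        if (!existingIndexes.delete(name)) {
            console.log(`Skipping ${name}: does not exist`);
            continue;
        }
        console.log(`Dropped ${name}`);
    }
}

// Placeholder for indexes that exist before the migration and must survive it.
const indexes = new Set(['preexisting_index']);
up(indexes);
up(indexes); // second run is a no-op
const afterUp = indexes.size;
down(indexes);
down(indexes); // second run is a no-op
const afterDown = indexes.size;
console.log(afterUp, afterDown); // 5 1
```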

  • I've read and followed the Contributor Guide
  • I've explained my change
  • I've written an automated test to prove my change works

A note on that last checkbox: this is a pure index/migration change, covered by the schema integrity test and manual migration verification, but there's no automated test proving the performance win (that needs a large seeded dataset and EXPLAIN ANALYZE, which isn't practical as a unit test). I left it unchecked to be honest.

no ref

- The admin "all comments" moderation page (getAdminAllComments) orders by
  created_at and emits the count.replies / count.direct_replies relations as
  per-row correlated subqueries. With no supporting indexes the optimizer
  full-scans the comments table once per returned row and filesorts the whole
  table for the ordering, so the page took ~90s to load on sites with very
  large numbers of comments (especially ones skewed toward top-level comments).
- Added four additive secondary indexes on comments — created_at, status,
  (in_reply_to_id, status) and (parent_id, in_reply_to_id, status) — covering
  the ORDER BY list, the COUNT(DISTINCT) pagination count, and the reply-count
  subqueries, cutting the page's DB time from ~90s to tens of milliseconds.
- Chose indexes over a query refactor to keep the change minimal and
  backward-compatible; they are purely additive, so the migration's down drops
  them without disturbing the existing foreign-key indexes.
@github-actions github-actions Bot added the migration [pull request] Includes migration for review label Jun 18, 2026
@github-actions

Contributor

It looks like this PR contains a migration 👀
Here's the checklist for reviewing migrations:

General requirements

  • ⚠️ Tested performance on staging database servers, as performance on local machines is not comparable to a production environment
  • Satisfies idempotency requirement (both up() and down())
  • Does not reference models
  • Filename is in the correct format (and correctly ordered)
  • Targets the next minor version
  • All code paths have appropriate log messages
  • Uses the correct utils
  • Contains a minimal changeset
  • Does not mix DDL/DML operations
  • Tested in MySQL and SQLite

Schema changes

  • Both schema change and related migration have been implemented
  • For index changes: has been performance tested for large tables
  • For new tables/columns: fields use the appropriate predefined field lengths
  • For new tables/columns: field names follow the appropriate conventions
  • Does not drop a non-alpha table outside of a major version

Data changes

  • Mass updates/inserts are batched appropriately
  • Does not loop over large tables/datasets
  • Defends against missing or invalid data
  • For settings updates: follows the appropriate guidelines

coderabbitai Bot commented Jun 18, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0eded1e9-91a2-4187-b113-d8d29741febb

📥 Commits

Reviewing files that changed from the base of the PR and between 2d9f208 and 60d1b02.

📒 Files selected for processing (3)
  • ghost/core/core/server/data/migrations/versions/6.46/2026-06-18-18-02-46-add-comments-moderation-indexes.js
  • ghost/core/core/server/data/schema/schema.js
  • ghost/core/test/unit/server/data/schema/integrity.test.js

Walkthrough

Four new indexes are added to the comments table to support admin moderation page queries: a single-column index on created_at, a single-column index on status, a compound index on (in_reply_to_id, status), and a compound index on (parent_id, in_reply_to_id, status). The schema definition in schema.js is updated with these entries, a new non-transactional migration file implements the up (create) and down (drop) handlers, and the schema integrity test hash is updated to reflect the new schema state.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title mentions improved admin Comments page load time, which directly aligns with the core objective of fixing performance issues on the admin Comments moderation page through database indexes. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the performance problem, the solution with four specific indexes, the rationale behind design choices, and testing performed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@jwmarshall

Author

I have also created a PR in my fork that combines all four new indexes with a query refactor: jwmarshall#1

I'm setting up a test environment to measure the performance difference between the two approaches.
