Adds subtree (descendant) matching to taxonomy `where` filters by MA2153 · Pull Request #1648 · emdash-cms/emdash

MA2153 · 2026-06-29T11:22:27Z

What does this PR do?

Adds a subtree operator to collection where taxonomy filters so selecting a parent term matches that term and all of its descendants, resolved in SQL:

```
where: { region: { subtree: "europe" } } // matches europe + every descendant region
```
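Here is a minimal sketch of how a `{ subtree }` operand might be typed and normalized before any SQL is built. The PR names a `WhereSubtree` type, but the definition below is an assumption: the `string | string[]` shape is inferred from the single-root and multi-root uses described in this thread. `TaxonomyWhereValue`, `isSubtreeFilter`, and `normalizeSubtreeRoots` are hypothetical helpers, not the PR's code.

```typescript
// Hypothetical shape; the real WhereSubtree re-exported by the PR may differ.
type WhereSubtree = { subtree: string | string[] };

// Exact slug, list of slugs, or the new subtree operator.
type TaxonomyWhereValue = string | string[] | WhereSubtree;

// Narrow a taxonomy where-value to the subtree operator.
function isSubtreeFilter(value: TaxonomyWhereValue): value is WhereSubtree {
  return typeof value === "object" && !Array.isArray(value) && "subtree" in value;
}

// Collapse single- or multi-root input into a deduplicated list of root slugs.
// An empty result means "match nothing": the caller should short-circuit rather
// than emit an empty IN () list, which not every SQL dialect accepts.
function normalizeSubtreeRoots(filter: WhereSubtree): string[] {
  const roots = Array.isArray(filter.subtree) ? filter.subtree : [filter.subtree];
  return [...new Set(roots.filter((slug) => slug.length > 0))];
}

const where: Record<string, TaxonomyWhereValue> = {
  region: { subtree: "europe" },
  tag: "featured", // exact-slug filter, unchanged behavior
};

for (const [taxonomy, value] of Object.entries(where)) {
  if (isSubtreeFilter(value)) {
    const roots = normalizeSubtreeRoots(value);
    console.log(taxonomy, roots.length === 0 ? "(no roots: match nothing)" : roots);
  }
}
```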

Today the taxonomy where filter matches by exact slug only, so "this term or anything filed under it" (selecting a parent category in a faceted browse UI) is inexpressible. The only workaround — enumerating the subtree client-side and passing every descendant slug — expands to one bound parameter per slug and overflows D1's 100-bind-parameter cap (D1_ERROR: too many SQL variables) on deep hierarchies. It also can't be chunked without breaking keyset pagination.

This resolves the subtree server-side from a single root slug via a recursive CTE over taxonomies.parent_id, so the bound-parameter count is independent of subtree size. After #1646 both parent_id and content_taxonomies.taxonomy_id live in translation_group space, so the walk is locale-correct and matches taxonomy_id directly.
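To make the walk concrete, the following is an in-memory TypeScript model of what a recursive CTE over `parent_id` computes: seed the set with the root terms' groups, then keep adding every term whose `parent_id` is already in the set. The row shape (`slug`, `translationGroup`, `parentId`) is an assumption modeled on the columns named above, not the real schema, and `resolveSubtreeGroups` is hypothetical. In SQL the subtree itself binds only the root slugs, which is why the parameter count stays flat however large the subtree grows.

```typescript
// Hypothetical row shape; the real taxonomies table has more columns.
// After #1646, parentId refers to a translation_group, not a per-locale id.
interface TermRow {
  slug: string;
  translationGroup: string;
  parentId: string | null;
}

// Returns every translation group in the subtrees rooted at rootSlugs, mirroring
// WITH RECURSIVE sub(grp) AS (<seed> UNION <children of sub>) SELECT grp FROM sub.
function resolveSubtreeGroups(terms: TermRow[], rootSlugs: string[]): Set<string> {
  // Seed: any locale row whose slug is a root contributes its group.
  const groups = new Set(
    terms.filter((t) => rootSlugs.includes(t.slug)).map((t) => t.translationGroup),
  );
  // Index child groups by parent group so each expansion step is a lookup.
  const childrenOf = new Map<string, string[]>();
  for (const t of terms) {
    if (t.parentId === null) continue;
    const list = childrenOf.get(t.parentId) ?? [];
    list.push(t.translationGroup);
    childrenOf.set(t.parentId, list);
  }
  // Breadth-first expansion; the Set skips groups already seen
  // (locale duplicates, or a malformed cycle).
  const queue = [...groups];
  while (queue.length > 0) {
    const grp = queue.shift()!;
    for (const child of childrenOf.get(grp) ?? []) {
      if (!groups.has(child)) {
        groups.add(child);
        queue.push(child);
      }
    }
  }
  return groups;
}

const terms: TermRow[] = [
  { slug: "europe", translationGroup: "g-eu", parentId: null },
  { slug: "europa", translationGroup: "g-eu", parentId: null }, // same term, another locale
  { slug: "france", translationGroup: "g-fr", parentId: "g-eu" },
  { slug: "paris", translationGroup: "g-paris", parentId: "g-fr" },
  { slug: "asia", translationGroup: "g-asia", parentId: null },
];
console.log([...resolveSubtreeGroups(terms, ["europe"])]); // [ 'g-eu', 'g-fr', 'g-paris' ]
```

Matching on groups rather than per-locale ids is what lets `content_taxonomies.taxonomy_id` be compared directly against the result.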

Also adds an opt-in rollup option to getTaxonomyTerms (and the admin terms endpoint via ?rollup=1) returning distinct-entry subtree counts, so a facet badge equals what selecting that facet returns. Default behavior (exact-slug filter, exact-term counts) is unchanged.
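The "distinct-entry" rule is what keeps a badge honest: an entry tagged at both a parent and its child must count once at the parent. Below is a minimal in-memory sketch, assuming taggings arrive as `(entryId, group)` pairs and a child-to-parent group map is available. `Tagging` and `rollupCounts` are hypothetical; the PR does this in SQL (`countEntriesForSubtrees`), and this only illustrates the counting rule.

```typescript
// Hypothetical input: one tagging per (entry, term group) pair.
interface Tagging {
  entryId: string;
  group: string;
}

// For every group, count DISTINCT entries tagged anywhere in its subtree.
// Each tagging is pushed up its ancestor chain into a per-group Set of entry
// ids, so an entry tagged at both parent and child counts once at the parent.
function rollupCounts(
  taggings: Tagging[],
  parentOf: Map<string, string | null>,
): Map<string, number> {
  const entriesByGroup = new Map<string, Set<string>>();
  for (const { entryId, group } of taggings) {
    const seen = new Set<string>(); // guards against a malformed cycle
    let g: string | null | undefined = group;
    while (g != null && !seen.has(g)) {
      seen.add(g);
      const ids = entriesByGroup.get(g) ?? new Set<string>();
      ids.add(entryId);
      entriesByGroup.set(g, ids);
      g = parentOf.get(g);
    }
  }
  return new Map([...entriesByGroup].map(([g, ids]) => [g, ids.size] as [string, number]));
}

const parentOf = new Map<string, string | null>([
  ["europe", null],
  ["france", "europe"],
  ["spain", "europe"],
]);
const taggings: Tagging[] = [
  { entryId: "post-1", group: "europe" },
  { entryId: "post-1", group: "france" }, // same entry at parent and child
  { entryId: "post-2", group: "spain" },
];
console.log(Object.fromEntries(rollupCounts(taggings, parentOf))); // { europe: 2, france: 1, spain: 1 }
```

A naive sum of exact-term counts would report 3 for `europe` here, which is more than selecting `{ subtree: "europe" }` would return.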

Related: Discussion #1647 (opened in Ideas; awaiting maintainer approval — opening the PR early to share the implementation, happy to adjust the operator surface, e.g. { subtree } vs { descendantsOf }, per the discussion).

Note for reviewers — query-count snapshot: CI may report +1 query on GET /posts/building-for-the-long-term (17→18). This is pre-existing and unrelated to this PR: the snapshot was last refreshed by #1619, and #1577 (offset pagination, which changed bucketFilter) merged afterward without refreshing it. This branch's diff does not touch the loader's bucketing/cache-key path (only a WhereSubtree type re-export in query.ts). I deliberately did not update the snapshot here to avoid misattributing the drift; it belongs to a separate fix.

Type of change

Feature (requires maintainer-approved Discussion)

Checklist

I have read CONTRIBUTING.md
pnpm typecheck passes
pnpm lint passes (0 diagnostics)
pnpm test passes (targeted: the new subtree filter + subtree count suites, 10 tests + the schema-coercion unit tests)
pnpm format has been run
I have added/updated tests for my changes
No admin UI strings added — no messages.po changes included (n/a for i18n)
I have added a changeset (emdash: minor)
New feature links to a Discussion: Native subtree (descendant) matching for hierarchical taxonomy `where` filters #1647 (awaiting approval)

AI-generated code disclosure

This PR includes AI-generated code — model/tool: Claude Opus 4.8 (Claude Code)

Screenshots / test output

Dialect-parity tests run under describeEachDialect (SQLite locally; Postgres in CI via PG_CONNECTION_STRING). Highlights:

loader-taxonomy-subtree-filter.test.ts — single-root and multi-root match, >999-descendant overflow guard (would exceed SQLite's bind limit if descendants were enumerated rather than walked in-SQL), mixed exact + subtree across taxonomies, empty-roots short-circuit, cross-locale (match by translation_group), keyset pagination.
taxonomy-subtree-counts.test.ts — distinct-entry rollup honesty (an entry tagged at both a parent and its child counts once), getTaxonomyTerms({ rollup }), and handleTermList({ rollup }).

Test Files  2 passed (2)
      Tests  10 passed (10)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


changeset-bot · 2026-06-29T11:22:38Z

🦋 Changeset detected

Latest commit: 399fe8e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 16 packages

Name	Type
emdash	Minor
@emdash-cms/cloudflare	Minor
@emdash-cms/sandbox-workerd	Patch
@emdash-cms/fixture-perf-site	Patch
@emdash-cms/perf-demo-site	Patch
@emdash-cms/cache-demo-site	Patch
@emdash-cms/do-demo-site	Patch
@emdash-cms/do-solo-demo-site	Patch
@emdash-cms/admin	Minor
@emdash-cms/auth	Minor
@emdash-cms/blocks	Minor
@emdash-cms/gutenberg-to-portable-text	Minor
@emdash-cms/x402	Minor
create-emdash	Minor
@emdash-cms/auth-atproto	Patch
@emdash-cms/plugin-embeds	Patch


github-actions · 2026-06-29T11:23:57Z

Scope check

This PR changes 567 lines across 12 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

pkg-pr-new · 2026-06-29T11:25:08Z


@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1648

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1648

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1648

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1648

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1648

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1648

emdash

npm i https://pkg.pr.new/emdash@1648

create-emdash

npm i https://pkg.pr.new/create-emdash@1648

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1648

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1648

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1648

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1648

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1648

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1648

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1648

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1648

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1648

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1648

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1648

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1648

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1648

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1648

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1648

commit: 399fe8e

emdashbot

Approach

This is the right change for the right problem. Exact-slug taxonomy filters cannot express "this term or anything under it", and the only workaround (enumerating every descendant slug) blows past D1's bind-parameter limit on deep trees. Adding a first-class { subtree } operator and resolving descendants in SQL with a recursive CTE fits EmDash's architecture cleanly: it reuses the existing translation_group-aware model from migration 045, keeps the parameter count constant, and doesn't disturb the default exact-slug behavior.

What I checked

SQL safety: The recursive CTEs use Kysely's sql tagged template for values and sql.ref() for identifiers; the dynamic table/collection names are validated via getTableName/getTaxonomyNames. No raw interpolation of user slugs.
Locale / i18n correctness: The subtree walk uses translation_group/parent_id, so matches are locale-agnostic in the same way content_taxonomies.taxonomy_id is. The loader's outer locale filter still scopes the returned entries.
Authorization: The admin terms route still checks taxonomies:read; the new ?rollup query param is just passed through.
Cache invalidation: getTaxonomyTerms({ rollup: true }) is still wrapped in the existing cachedQuery/requestCached layers with a cache key that includes r1, so term mutations via invalidateTermCache() bust it correctly.
Tests: Dialect-parametric tests cover single root, multiple roots, the >999-descendant overflow guard, mixing exact + subtree filters, empty-root short-circuit, cross-locale group matching, keyset pagination, and distinct-entry rollup counts. A changeset is present.

Headline conclusion

The code is clean and I don't see any blocking bugs. I have one small suggestion: the public where docstrings in loader.ts and query.ts list exact/array/range examples but don't mention the new subtree operator, so developers won't discover it from autocomplete/docs.

Process note: Per AGENTS.md, new features require a maintainer-approved Discussion. This PR links to Discussion #1647, which is currently in Ideas and awaiting approval. The implementation looks ready, but merge should wait for that approval.

Findings

[suggestion] packages/core/src/loader.ts:643

The public where docstring lists exact, byline, field, and range examples but omits the new subtree operator. Add a usage example so callers discover the feature through API docs/autocomplete.
```
	 * @example { published_at: { gte: '2024-01-01', lt: '2025-01-01' } } - date range
	 * @example { category: { subtree: 'news' } } - match a term and all descendants
```
[suggestion] packages/core/src/query.ts:124

The public where docstring lists exact/array/byline/field/range examples but does not mention the new subtree operator added by this PR. Add an example so the public API surface is documented.
```
	 * @example { published_at: { gte: '2024-01-01', lt: '2025-01-01' } } - Date range
	 * @example { category: { subtree: 'news' } } - Match a term and all its descendants
```

Postgres COUNT() returns bigint, which the pg driver hands back as a string, so under pg getTaxonomyTermCounts returned "1" instead of 1 and failed the rollup test's exact-count assertion. Coerce with Number(), matching countEntriesForSubtrees. Also document the new `subtree` where-operator in the loader/query docstrings (review suggestion).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
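A minimal sketch of that coercion, assuming a count column that may arrive as a number (SQLite/D1), a string (node-postgres parses int8 as a string by default to avoid precision loss), or a bigint from some other driver. `RawCount` and `toCount` are hypothetical names, not the PR's code.

```typescript
// Count values as different drivers may return them.
type RawCount = string | number | bigint;

// Hypothetical helper: normalize a raw COUNT() value to a JS number with
// Number(), as the fix does. Exact as long as counts stay below
// Number.MAX_SAFE_INTEGER (assumed for any realistic collection).
function toCount(raw: RawCount): number {
  const n = Number(raw);
  if (!Number.isFinite(n)) {
    throw new Error(`Unexpected count value: ${String(raw)}`);
  }
  return n;
}

const pgRow = { taxonomy_id: "g-eu", count: "1" }; // pg driver
const sqliteRow = { taxonomy_id: "g-eu", count: 1 }; // SQLite / D1
console.log(toCount(pgRow.count) === toCount(sqliteRow.count)); // true
```

Normalizing at the query boundary keeps dialect-parity tests from depending on driver-specific types.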

MA2153

Production note: the `subtree` filter's `EXISTS` plan is entry-driven and scans the whole collection at low selectivity

Caught this in a production D1 trace with the feature live behind a faceted-browse UI. Sharing here since this PR is the source of the SQL — it's not a blocker for the operator surface, but it's worth weighing before this lands.

What the trace showed

One faceted-browse request — a single collection filtered by { <tax>: { subtree: [<two sibling leaf terms>] } }, first page (LIMIT 24) — issued one D1 query that read 43,986 rows to return ≤24 cards (sql_duration_ms ≈ 68, served_by_primary). It dominated the request by ~4 orders of magnitude: every other span read 0–1 rows, and the rollup counts were served from the KV object cache and cost no D1 at all.

The query is exactly the subtreeCond block added in loader.ts:

```
... FROM <collection>
WHERE <status> AND locale = ?
  AND EXISTS (
    SELECT 1 FROM content_taxonomies ct
    WHERE ct.collection = ? AND ct.entry_id = <collection>.id
      AND ct.taxonomy_id IN (WITH RECURSIVE sub(grp) AS (...) SELECT grp FROM sub)
  )
ORDER BY created_at DESC, id DESC
LIMIT 24
```

Why it reads so much

The filter is an EXISTS correlated to <collection>.id, so the plan is entry-driven: walk the collection in created_at DESC order and probe content_taxonomies per candidate until 24 rows pass EXISTS. When the selected subtree is sparse relative to the sort order (here the chosen leaf terms tag ~0.1% of the collection, with none near the top of the recency order), the engine pages through tens of thousands of entries to fill a single page.

Indexes aren't the problem — content_taxonomies PK (collection, entry_id, taxonomy_id) makes each per-entry probe an index hit. The problem is the plan visits every candidate entry. The one index that could drive selection — idx_content_taxonomies_term (taxonomy_id) — sits on the side this entry-driven plan never reaches.

This shape isn't unique to subtree; the pre-existing exact-slug taxonomyCond is the same EXISTS-per-row pattern. But subtree is the operator built for faceted browse over deep hierarchies — exactly the case where collections are large and a parent/section selection is sparse-or-old against a recency sort — so it's where the plan degrades toward O(table) reads. (Worth confirming with EXPLAIN QUERY PLAN on a representative dataset; the row counts strongly indicate the above.)
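A toy model shows the read amplification. The sketch below uses hypothetical data and names and does not model the D1 planner. It counts rows each plan touches to fill a 24-row page when the only matches sit at the old end of the recency order, and checks that both plans return the same page.

```typescript
interface Entry {
  id: number;
  createdAt: number; // larger = newer
}

// Entry-driven (correlated EXISTS): walk entries newest-first and probe the
// pivot for each one until `limit` rows pass.
function entryDriven(newestFirst: Entry[], isMatch: (id: number) => boolean, limit: number) {
  const page: Entry[] = [];
  let reads = 0;
  for (const e of newestFirst) {
    reads += 2; // one entry row + one pivot probe
    if (isMatch(e.id)) {
      page.push(e);
      if (page.length === limit) break;
    }
  }
  return { page, reads };
}

// Pivot-driven: read only the matching taggings (via the term index), fetch
// those entries by primary key, then sort by (createdAt, id) DESC and page.
function pivotDriven(matchedIds: number[], byId: Map<number, Entry>, limit: number) {
  const rows = matchedIds.map((id) => byId.get(id)!);
  rows.sort((a, b) => b.createdAt - a.createdAt || b.id - a.id);
  return { page: rows.slice(0, limit), reads: matchedIds.length * 2 };
}

// Toy data: 40,000 entries; only the 40 oldest (ids 1..40) carry the selected terms.
const N = 40_000;
const newestFirst: Entry[] = Array.from({ length: N }, (_, i) => ({ id: N - i, createdAt: N - i }));
const matchedIds = Array.from({ length: 40 }, (_, i) => i + 1);
const matchSet = new Set(matchedIds);
const byId = new Map(newestFirst.map((e) => [e.id, e] as const));

const a = entryDriven(newestFirst, (id) => matchSet.has(id), 24);
const b = pivotDriven(matchedIds, byId, 24);
console.log(a.reads, b.reads); // 79968 80
```

The numbers only illustrate the shape. Entry-driven reads grow with how far down the sort order the 24th match sits; pivot-driven reads grow with how many entries match. If the toy data is flipped so the newest entries match, the entry-driven plan stops after 24 entries, which is the selectivity tradeoff the suggestion weighs.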

Suggestion — a pivot-driven plan for taxonomy/subtree filters

Resolve matching entry_ids from the pivot first, then fetch/sort entries:

```
WITH sub(grp) AS ( ...recursive subtree... ),
     matched AS (
       SELECT DISTINCT entry_id FROM content_taxonomies
       WHERE collection = ? AND taxonomy_id IN (SELECT grp FROM sub)
     )
SELECT ... FROM <collection>
JOIN matched ON matched.entry_id = <collection>.id
WHERE <status> AND locale = ?
ORDER BY created_at DESC, id DESC LIMIT 24
```

matched reads only the taggings under the subtree via idx_content_taxonomies_term (hundreds, not the whole table). I don't think it's a free swap, though — tradeoffs to weigh:

Selectivity cuts both ways. Pivot-first wins when the subtree is sparse (the faceted-browse common case); the current entry-driven plan can win when the subtree matches most rows and the recency index lets it stop early after 24. A cost-based choice — or at least a documented heuristic — may be warranted rather than always doing one or the other.
Multi-facet AND (several taxonomy keys in one where) becomes an intersection of matched sets rather than independent EXISTS clauses.
Keyset pagination must still order by the entry's (created_at, id); the JOIN form preserves that.
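One possible documented heuristic, sketched under a uniform-spread assumption (matches distributed evenly through the sort order). Everything here is hypothetical: `choosePlan`, its inputs, and the cost model. The matched-entry count could plausibly come from a cheap count over the pivot using the same subtree CTE. Uniformity is exactly what the production trace violated, so this model is optimistic about the entry-driven plan when matches cluster among old entries.

```typescript
type Plan = "entry-driven" | "pivot-driven";

interface PlanInputs {
  collectionSize: number; // candidate entries after status/locale filters
  matchedEntries: number; // distinct entries tagged in the selected subtree(s)
  pageSize: number; // the LIMIT, e.g. 24
}

// Hypothetical cost model, in rows read:
// - entry-driven: with uniformly spread matches, filling a page takes about
//   pageSize / selectivity entries, capped at the whole table (it reads
//   everything when fewer than pageSize entries match);
// - pivot-driven: roughly one row per matching entry.
function choosePlan({ collectionSize, matchedEntries, pageSize }: PlanInputs): Plan {
  if (matchedEntries === 0 || collectionSize === 0) return "pivot-driven"; // nothing to read
  const selectivity = matchedEntries / collectionSize;
  const entryDrivenCost = Math.min(collectionSize, pageSize / selectivity);
  const pivotDrivenCost = matchedEntries;
  return pivotDrivenCost <= entryDrivenCost ? "pivot-driven" : "entry-driven";
}

// Illustrative inputs, not measurements.
console.log(choosePlan({ collectionSize: 50_000, matchedEntries: 50, pageSize: 24 })); // pivot-driven
console.log(choosePlan({ collectionSize: 50_000, matchedEntries: 40_000, pageSize: 24 })); // entry-driven
```

Under this model the pivot plan wins whenever matchedEntries² ≤ pageSize × collectionSize, i.e. up to about 1,095 matching entries for 50,000 candidates and a page of 24.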

Smaller, orthogonal win regardless of plan: hoist the recursive sub resolution into a single top-level CTE shared by the filter (and reuse it for the rollup count path in taxonomy.ts), so the subtree group set is resolved once.

Not a blocker

The headline win here — resolving the subtree server-side from root slugs so the bound-parameter count is independent of subtree size — is the right call, and the recursive CTE is the correct tool for it. This is purely about the execution plan of the resulting EXISTS, which I'm flagging because in the motivating use case the production cost of the whole request landed almost entirely on this one query.

github-actions · 2026-06-29T13:44:23Z

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

perf(core): cache taxonomy defs per-isolate; move runtime/db singletons to globalThis #1399 (3 shared files)

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

MA2153 and others added 9 commits June 29, 2026 12:35

feat(loader): add subtree operator to taxonomy where filters

01a4b76

fix(loader): short-circuit empty subtree roots to avoid invalid SQL

f2f9ecf

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

test(loader): cover subtree filter overflow, multi-root, locale, pagination

15f4961

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(loader): strengthen subtree overflow guard past SQLite bind limit

f6a3c2c

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(taxonomy): add countEntriesForSubtrees distinct rollup

507b7a8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(taxonomy): add rollup option to getTaxonomyTerms

6e6b14f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(taxonomy): expose rollup counts via handleTermList and terms route

ce90d30

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chore: changeset for taxonomy subtree filter

88458b8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(taxonomy): coerce ?rollup=false/0 instead of rejecting with 400

da8bc61

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot added review/needs-review No maintainer or bot review yet area/core size/XL labels Jun 29, 2026

emdashbot Bot reviewed Jun 29, 2026


github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed review/needs-review No maintainer or bot review yet labels Jun 29, 2026

Merge branch 'main' into feat/taxonomy-subtree-where-filter

6690313

github-actions Bot added the cla: signed label Jun 29, 2026

MA2153 marked this pull request as draft June 29, 2026 13:16

MA2153 commented Jun 29, 2026


github-actions Bot added the overlap label Jun 29, 2026

github-actions Bot removed the review/needs-rereview Author pushed changes since the last review label Jun 29, 2026

Use pivot

399fe8e

MA2153 wants to merge 12 commits into emdash-cms:main from MA2153:feat/taxonomy-subtree-where-filter
