fix(core): drop locale code lowercasing that broke mixed-case locales#1552
fix(core): drop locale code lowercasing that broke mixed-case locales#1552ching-kuo wants to merge 1 commit into
Conversation
localeCode schema was calling .toLowerCase() on the validated value, rewriting e.g. zh-TW to zh-tw before the query ran. Rows are stored with the casing from defaultLocale and SQLite/D1 compare strings case-sensitively, so the filter matched zero rows — admin content list returned an empty envelope with no error. Removing the transform keeps read and write casing symmetric.
🦋 Changeset detectedLatest commit: a88fd5e The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Approach
The fix targets a real, correctly-diagnosed bug. localeCode ran .toLowerCase() on the validated value, so a ?locale=zh-TW filter became zh-tw before the query ran. Content rows are written with the casing of the configured defaultLocale (the seed/import/runtime paths set locale directly from getI18nConfig().defaultLocale and never go through this schema), and SQLite/D1 compare TEXT case-sensitively (no NOCASE collation on the locale column — migration 019_i18n.ts). So the filter matched zero rows and the admin content list came back empty with no error. Removing the transform to make read/write casing symmetric is the right minimal fix and fits EmDash's conventions. The added unit test reproduces the regression at the schema level (it fails under the old .transform(...)), and the changeset is present, scoped to emdash as patch, and written as user-facing release notes. So the change is the right change, solving the right problem.
What I checked
- Traced every
localeCodeconsumer (contentListQuery,contentCreateBody,searchQuery,searchSuggestQuery) — all four are locale filter/input fields; none rely on the lowercasing. - Confirmed the stored locale comes from
defaultLocaleviaseed/apply.ts,import/, andhandleContentCreate(body.locale ?? getI18nConfig()?.defaultLocale), none of which normalize casing — so the asymmetry the PR describes is real. - Confirmed the repository layer filters with a bare
WHERE locale = ?(database/repositories/content.ts:510,852) and the FTS search path withAND c.locale = ?(search/query.ts:240,373) — both case-sensitive, so the same bug applied to search and is fixed by the same change. - Confirmed the
localecolumn is plainTEXTwith noCOLLATE NOCASE, so the case-sensitive comparison premise holds on SQLite and Postgres. - Confirmed the test would fail under the old transform and passes now, and that no other test asserted the old lowercasing behavior.
- Confirmed the changeset package name (
emdash) and bump level (patch) are correct.
Headline
The fix resolves the reported bug for seed/import/runtime-created content. However, it can regress sites that created content through the admin UI under the previously-released lowercasing code on a mixed-case defaultLocale. The admin create flow always sends an explicit locale: defaultLocale through contentCreateBody → localeCode, so under the old released code those rows were stored lowercased (zh-TW → zh-tw). After this PR the now-case-preserving ?locale=zh-TW filter no longer matches them, so they vanish from the admin list — and UNIQUE(slug, locale) won't catch a re-created entry because zh-tw ≠ zh-TW. See the line-anchored finding for the recommended remediation (a canonicalizing migration or a case-insensitive query), which would fix the reported bug without orphaning the lowercased rows the predecessor code wrote.
Net: approach is correct and the code is clean, but I'd hold for a migration/query-level fix (or an explicit decision that the affected population is acceptable) before merging.
| .string() | ||
| .regex(/^[a-z]{2,3}(-[a-z0-9]{2,8})*$/i, "Invalid locale code") | ||
| .transform((v) => v.toLowerCase()); | ||
| export const localeCode = z.string().regex(/^[a-z]{2,3}(-[a-z0-9]{2,8})*$/i, "Invalid locale code"); |
There was a problem hiding this comment.
[needs fixing] Removing the .toLowerCase() transform correctly fixes the reported bug for content stored with defaultLocale casing (seed/import/runtime paths write locale directly from getI18nConfig().defaultLocale, bypassing this schema, so e.g. zh-TW rows existed while the old filter lowercased ?locale=zh-TW to zh-tw and matched nothing).
But the transform being removed has shipped in released versions (it's in the repo's first commit), and the admin create flow does not bypass the schema: router.tsx always sends locale: pickerLocale (pickerLocale = locale ?? manifest?.i18n?.defaultLocale), which is parsed by contentCreateBody → localeCode. Under the old code, localeCode.parse("zh-TW") returned "zh-tw", so handleContentCreate stored those rows with locale = 'zh-tw' (see api/handlers/content.ts:640, effectiveLocale = body.locale ?? defaultLocale).
After this PR, the list filter preserves casing (?locale=zh-TW), and the locale column is plain TEXT with no COLLATE NOCASE (migration 019_i18n.ts), so SQLite/D1 compare case-sensitively. Net effect on a mixed-case-defaultLocale site (e.g. zh-TW) that created content via the admin UI under the released code:
- Those existing rows are stored as
zh-tw. WHERE locale = 'zh-TW'no longer matches them → they vanish from the admin content list.
This trades one empty-list bug for another on a different (and likely larger) population of sites — any non-English site using canonical BCP 47 casing that created entries through the admin UI. A secondary effect: because uniqueness is UNIQUE(slug, locale) and zh-tw ≠ zh-TW, an editor re-creating an entry with the same slug at the now-correct zh-TW casing won't get a conflict, producing duplicate slugs across the two casings.
The robust fix is to canonicalize what's stored rather than just stop normalizing the input — e.g. a forward-only migration that rewrites every ec_*.locale to the configured casing (looping _emdash_collections like migration 013), or making the filter case-insensitive (WHERE locale = ? COLLATE NOCASE on SQLite / LOWER(locale) = LOWER(?) on Postgres) so the query matches regardless of how a row was historically stored. Either approach fixes the reported bug without orphaning the lowercased rows this PR's predecessor wrote. A changeset note calling out the migration step would also be appropriate if a migration is added.
|
The bot review looks valid, I didn't consider that posts created in the admin page go through the schema and get stored with a lowercase locale (e.g. en-US becomes en-us). A few possible solutions here:
I personally prefer the canonicalizing migration, since a migration likely runs only once, while the case-insensitive comparison causes a performance hit on every locale-filtered query (it can't use the idx_*_locale index). Happy to implement the migration if that direction sounds good. |
What does this PR do?
localeCode schema was calling .toLowerCase() on the validated value, rewriting e.g. zh-TW to zh-tw before the query ran. Rows are stored with the casing from defaultLocale and SQLite/D1 compare strings case-sensitively, so the filter matched zero rows — admin content list returned an empty envelope with no error.
Removing the transform keeps read and write casing symmetric.
Simple test is also added for 3 different cases.
Closes #1551
Type of change
Checklist
pnpm typecheckpassespnpm lintpassespnpm testpasses (or targeted tests for my change)pnpm formathas been runmessages.pochanges except in translation PRs — a workflow extracts catalogs on merge tomain.AI-generated code disclosure
Screenshots / test output