feat(cloudflare): Durable Objects SQL database backend with read replication #1492
…cation Adds a production durableObjects() adapter backed by EmDashDB, a Durable Object that holds the whole CMS in SQLite. Reads route to the nearest replica, writes proxy to the primary, and authenticated requests get read-your-writes via a bookmark cookie. One DO stub is reused per request (D1 pays per-query stub setup). Transactions match D1 (rejected, so withTransaction degrades to direct execution). Adds infra/do-demo, a perf fixture mirroring blog-demo but on the DO backend for head-to-head RTT comparison.
A Durable Object stub is a per-request I/O object. The singleton dialect is cached across requests on globalThis, so caching a stub in the driver bound it to a stale request and threw 'Cannot perform I/O on behalf of a different request' on the next request's migration check. The driver now resolves the stub per acquireConnection; the request-scoped factory's closure provides one-stub-per-request reuse, the singleton resolves fresh. Verified end-to-end locally: migrations, seed, and content reads all run against EmDashDB. Documents the local-dev replica_routing caveat.
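The stub-lifetime fix described above can be sketched as follows. This is a minimal illustration, not the PR's driver: `DbStub`, `DoSqlDriver`, `requestScopedResolver`, and `singletonResolver` are hypothetical names, and the real stub exposes more than a single `query` method.

```typescript
// Hypothetical minimal stub shape; a real Durable Object stub exposes RPC methods.
interface DbStub {
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

type StubResolver = () => DbStub;

class DoSqlDriver {
  // Hold a resolver, never a stub: a stub is bound to the request that created it.
  private readonly resolveStub: StubResolver;

  constructor(resolveStub: StubResolver) {
    this.resolveStub = resolveStub;
  }

  acquireConnection() {
    // Resolved at acquire time, so a driver cached across requests never reuses a stale stub.
    const stub = this.resolveStub();
    return {
      executeQuery: (sql: string, params: unknown[] = []) => stub.query(sql, params),
    };
  }
}

// Request-scoped path: the closure memoizes one stub for the lifetime of the request.
function requestScopedResolver(create: () => DbStub): StubResolver {
  let stub: DbStub | undefined;
  return () => (stub ??= create());
}

// Singleton path: no memoization, so every acquire gets a fresh stub.
function singletonResolver(create: () => DbStub): StubResolver {
  return () => create();
}
```

The design choice is that request scoping lives in the factory closure rather than in the driver, so the same driver class serves both the per-request and the cross-request path.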
Registers do-demo.emdashcms.com (emdash-demo-do) as a third head-to-head perf target alongside the D1 baseline and Astro-cache sites.
blog-demo's live D1 was running stale pre-seed content (marshland-birds), which isn't in the repo seed and doesn't exist on the freshly-seeded do-demo. Reseeded blog/cache from the repo seed and repointed the measured post to notes-on-simplicity so all three sites are comparable.
…arial review)
- Read-after-write within a request now waits on the freshest write bookmark (sink) rather than the stale cookie bookmark, so create()+findById() on a lagging replica sees the row instead of throwing.
- Singleton dialect (migrations, scheduled tasks) gets its own bookmark sink so hasColumn-after-ALTER and publish-then-read stay consistent on upgrades where replicas already exist.
- Detect writes via cursor.rowsWritten so write-CTEs/PRAGMA-writes on the primary still capture their bookmark and affected-row count.
- waitForBookmark failures degrade to a possibly-stale read instead of 500ing every read until a bad bookmark cookie clears.
- Remove unused batch() (advertised atomicity the dialect never provided).
- Cookie: skip oversized bookmarks, add bounded maxAge.
- Document the PRAGMA/foreign-key and singleton-sink-concurrency limitations.

Adds read-after-write and bookmark-feedback dialect tests.
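A rough sketch of the bookmark-sink idea and the cookie bounds, under stated assumptions: `BookmarkSink`, `bookmarkForRead`, and `bookmarkCookie` are hypothetical names, bookmarks are treated as opaque strings, and the size cap is an invented value (the PR does not state the real limit). Only the 24-hour maxAge comes from the diff.

```typescript
// Hypothetical sink: remembers the bookmark returned by the most recent write in this scope.
class BookmarkSink {
  latest: string | undefined;

  record(bookmark: string | undefined): void {
    if (bookmark) this.latest = bookmark;
  }
}

// A read waits on the in-request write bookmark when there is one,
// and only falls back to the (possibly stale) cookie bookmark otherwise.
function bookmarkForRead(sink: BookmarkSink, cookieBookmark?: string): string | undefined {
  return sink.latest ?? cookieBookmark;
}

const MAX_BOOKMARK_LENGTH = 1024; // assumed cap, not taken from the PR
const BOOKMARK_MAX_AGE_SECONDS = 60 * 60 * 24; // 24 hours, as in the diff

// Returns the cookie to set, or undefined when there is nothing safe to persist.
function bookmarkCookie(sink: BookmarkSink): { value: string; maxAge: number } | undefined {
  const value = sink.latest;
  if (!value || value.length > MAX_BOOKMARK_LENGTH) return undefined; // skip oversized bookmarks
  return { value, maxAge: BOOKMARK_MAX_AGE_SECONDS };
}
```

Giving the singleton dialect its own sink instance keeps migration and scheduled-task writes from leaking bookmarks into a user's request scope.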
…licas) Same DO SQL driver as do-demo but without the replica_routing flag, so reads hit the single primary. Isolates the DO/RPC-architecture cost from read-replica routing in the perf comparison. Not yet registered in perf-monitor (needs content seeded via the setup wizard first).
🦋 Changeset detected
Latest commit: 6bae578
The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | emdash-demo-cache | 6bae578 | Jun 16 2026, 03:26 PM |
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | docs | 6bae578 | Jun 16 2026, 03:25 PM |
Scope check
This PR changes 10,714 lines across 61 files. Large PRs are harder to review and more likely to be closed without review. If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs. See CONTRIBUTING.md for contribution guidelines.
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | emdash-playground | 6bae578 | Jun 16 2026, 03:26 PM |
The per-request DO dialect now buffers SELECTs issued in the same event-loop turn and ships them as a single batchQuery RPC, instead of one round trip per read. A page that issues ~17 reads collapses to one round trip -- a large win when reads cross a region to the primary, and still meaningful on a local replica. Always on for the per-request path (not a config flag -- this is a new driver, so there's no back-compat reason to gate it); the cross-request singleton never coalesces. Only plain SELECTs batch; writes and other statements take the direct path, serialized on an op-chain so bookmark ordering holds. Batch failures fall back to per-statement execution. Adds EmDashDB.batchQuery (read-only, one bookmark wait for the batch) and CoalescingDOSqlDialect, modeled on the D1 coalescer. 8 new tests.
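The coalescing behaviour described above might look roughly like this. It is a sketch under assumptions, not the PR's CoalescingDOSqlDialect: `ReadCoalescer`, `BatchFn`, and `SingleFn` are hypothetical names, and the flush is scheduled with `queueMicrotask`, which may differ from the scheduling primitive the real driver uses.

```typescript
type Row = Record<string, unknown>;
interface Query { sql: string; params: unknown[] }
type BatchFn = (queries: Query[]) => Promise<Row[][]>; // stand-in for the batchQuery RPC
type SingleFn = (query: Query) => Promise<Row[]>; // stand-in for the per-statement query RPC

interface Pending {
  query: Query;
  resolve: (rows: Row[]) => void;
  reject: (err: unknown) => void;
}

class ReadCoalescer {
  private pending: Pending[] = [];
  private scheduled = false;
  private readonly batch: BatchFn;
  private readonly single: SingleFn;

  constructor(batch: BatchFn, single: SingleFn) {
    this.batch = batch;
    this.single = single;
  }

  // Buffer a SELECT; the first one in a turn schedules a single flush.
  select(sql: string, params: unknown[] = []): Promise<Row[]> {
    return new Promise<Row[]>((resolve, reject) => {
      this.pending.push({ query: { sql, params }, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        queueMicrotask(() => void this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const items = this.pending;
    this.pending = [];
    this.scheduled = false;
    try {
      // One round trip for everything buffered so far.
      const results = await this.batch(items.map((item) => item.query));
      items.forEach((item, i) => item.resolve(results[i] ?? []));
    } catch {
      // Batch failed: fall back to per-statement execution.
      for (const item of items) {
        this.single(item.query).then(item.resolve, item.reject);
      }
    }
  }
}
```

Reads issued together, for example through `Promise.all`, land in one batch; a read issued after an intervening `await` starts a new batch.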
Overlapping PRs
This PR modifies files that are also changed by other open PRs:
This may cause merge conflicts or duplicated work. A maintainer will coordinate.
Adds RequestMetrics.rpcCount + recordRpc(), emitted as the rpc.count Server-Timing entry. The DO SQL dialects bump it once per physical RPC (query/batchQuery) via an injected onRpc callback, so coalescing is visible: a page issuing N coalesced reads shows db.count=N but rpc.count=1. db.count counts logical queries (Kysely log hook); rpc.count counts round trips.
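To illustrate the difference between the two counters, here is a small sketch. `RequestMetrics.rpcCount`, `recordRpc()`, and the `onRpc` callback are named in the commit; `dbCount`, `recordQuery()`, `serverTiming()`, `runCoalescedReads`, and the exact header formatting are assumptions made for the example.

```typescript
class RequestMetrics {
  dbCount = 0; // logical queries (hypothetical field name)
  rpcCount = 0; // physical round trips

  recordQuery(): void {
    this.dbCount += 1;
  }

  recordRpc(): void {
    this.rpcCount += 1;
  }

  // Assumed formatting: one Server-Timing entry per counter, value carried in desc.
  serverTiming(): string {
    return `db.count;desc="${this.dbCount}", rpc.count;desc="${this.rpcCount}"`;
  }
}

// Stand-in for a coalesced read path: N logical queries, one physical RPC.
function runCoalescedReads(sqls: string[], metrics: RequestMetrics, onRpc: () => void): void {
  sqls.forEach(() => metrics.recordQuery());
  onRpc(); // the dialect bumps the counter once per query/batchQuery call
}
```

Injecting `onRpc` as a callback keeps the dialect unaware of the metrics object, which is one plausible reason for the design described above.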
Adds region-placed probe Workers in aws:sa-east-1 (Sao Paulo) and aws:ap-southeast-2 (Sydney) via placement hints, plus their coordinator service bindings and region labels. Both are far from the primary, where DO read replicas should help most.
…enableReplicas)
The replication API moved from ctx.configureReadReplication({mode}) + ctx.primaryStub
to ctx.storage.enableReplicas() + ctx.storage.primary. We were on the old names,
so #isReplica was always false: a write landing on a replica wouldn't proxy to the
primary (it'd surface the readonly error instead of routing). Switch to the current
storage.* surface (the bookmarks API was already there).
oxfmt was reformatting the wrangler-generated worker-configuration.d.ts (24k-line churn the format bot kept committing). Add it to ignorePatterns alongside the other generated file (emdash-env.d.ts) and restore the generated version.
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit:
/review
    // the next request once a fresh bookmark is minted).
    if (isRead && opts?.bookmark && this.#isReplica) {
        try {
            await this.#replication.waitForBookmark?.(opts.bookmark);
BUG: waitForBookmark can stall reads under a stale cookie with no client-side bound
Category: Resource Management / Error Handling
Severity: MEDIUM
The try/catch around waitForBookmark swallows errors and serves stale, which is good. But there is no application-level timeout: if waitForBookmark takes its full underlying budget to determine a bookmark is unreachable (or worse, blocks at the platform's outer limits), every read for that client pays the full wait on every request for the cookie's 24-hour maxAge. There is no self-heal until either (a) the bad cookie is overwritten by a fresh write bookmark in this same response, or (b) the cookie expires. A bookmark from a different DO ID (e.g. the same browser hits the site after a redeploy with a renamed name) cannot be advanced to by this DO at all — so the user will never get a fresh write bookmark to overwrite it, and they're stuck waiting on every read for a full day.
Trigger: Issue a fresh login under DO name: "emdash", then redeploy with name: "emdash-v2" (or rename the binding so idFromName resolves differently). The browser still presents the old bookmark cookie. Every read on the new DO calls waitForBookmark(stale) and waits the full underlying timeout before throwing and serving stale.
Fix: Wrap waitForBookmark in Promise.race with a short timeout (e.g. 200–500 ms). On timeout, log and serve stale — same posture as the catch path. Same change applies at line 193 in batchQuery.
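A minimal sketch of the suggested bound, assuming the wait is available as a promise. `waitForBookmarkBounded` and its result labels are hypothetical; the caller would log and serve a possibly-stale read on anything other than `"caught-up"`.

```typescript
type WaitOutcome = "caught-up" | "timed-out" | "failed";

// Race the bookmark wait against a timer; never throws.
async function waitForBookmarkBounded(wait: Promise<unknown>, ms: number): Promise<WaitOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), ms);
  });
  try {
    return await Promise.race([wait.then(() => "caught-up" as const), timeout]);
  } catch {
    // Same posture as the existing catch path: degrade to a stale read.
    return "failed";
  } finally {
    // Always clear the timer so a fast wait does not leave it pending.
    clearTimeout(timer);
  }
}
```

Note that `Promise.race` does not cancel the underlying wait; it only stops the read from blocking on it.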
    return {
        db,
        commit() {
BUG: bookmark cookie is not updated when next() throws
Category: State & Concurrency
Severity: LOW
commit() is invoked from the middleware only on the success path (renderAndFinalize() returns), not in a finally. If a write succeeds against the primary and then render throws (template error, late authorization failure, plugin crash mid-render), the write is durable but the user's bookmark cookie never gets the post-write bookmark. Their immediately-following request can therefore land on a replica that hasn't caught up and read pre-write state — exactly the read-your-writes hole the bookmark is supposed to close.
Trigger: Authenticated POST /_emdash/api/content/posts succeeds (write reaches primary, sink.latest set), the handler then throws while serializing the response. The catch path in middleware returns the error page without calling commit(). The user retries, lands on a replica, sees pre-write state.
Fix: Move scoped.commit() into a finally so it runs whether render succeeded or threw — the bookmark is correct either way (write was committed to the primary independently of render outcome).
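The suggested change can be sketched like this. `ScopedDb` and `handleRequest` are hypothetical stand-ins for the middleware's scoped database handle and render path.

```typescript
// Hypothetical handle: commit() persists the latest write bookmark to the cookie.
interface ScopedDb {
  commit(): void;
}

async function handleRequest(scoped: ScopedDb, render: () => Promise<string>): Promise<string> {
  try {
    return await render();
  } finally {
    // Runs on success and on throw. The write is already durable on the primary,
    // so the bookmark is valid regardless of how rendering ended.
    scoped.commit();
  }
}
```

The error from `render` still propagates to the caller; only the bookmark bookkeeping is made unconditional.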
    // would otherwise drop their bookmark (breaking read-your-writes) and
    // report no affected rows. (On a replica a misclassified write throws
    // readonly above and is retried on the primary, so it never reaches here.)
    const wrote = !isRead || cursor.rowsWritten > 0;
BUG: PRAGMA writes drop their bookmark when rowsWritten === 0
Category: Logic Errors
Severity: LOW
The wrote = !isRead || cursor.rowsWritten > 0 check is meant to catch "PRAGMA writes the heuristic classifies as reads" (per the comment) — but cursor.rowsWritten only counts mutated rows. A PRAGMA user_version = N (or PRAGMA journal_mode = …) doesn't touch any row; on better-sqlite3 and SQLite's documented behavior, that PRAGMA's changes is 0. So isRead === true (matches PRAGMA prefix) AND rowsWritten === 0 → wrote === false → no bookmark returned. A subsequent read on a replica won't wait for that PRAGMA-write to replicate, and the read-your-writes guarantee is silently lost for any PRAGMA mutation.
In practice the codebase rarely uses write-PRAGMAs at runtime, but migrations do (PRAGMA user_version, FK toggles per the class doc), and the singleton dialect runs those — so a freshly-altered schema can be missed by a same-request follow-up read on a replica.
Trigger: Run a migration that executes PRAGMA user_version = 13 followed by a SELECT that depends on the migration having landed. On a replica, the SELECT does not wait for the PRAGMA's bookmark.
Fix: Drop the rowsWritten > 0 short-circuit for statements where isReadStatement(sql) was true AND the prefix is PRAGMA. Easier: return a bookmark whenever the statement was classified as PRAGMA (cheap; the cost is one extra getCurrentBookmark() per PRAGMA call).
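The "easier" variant of the fix could be sketched as a pure predicate. `shouldCaptureBookmark` is a hypothetical helper; the real code computes `wrote` inline and may separate bookmark capture from affected-row reporting.

```typescript
const PRAGMA_PREFIX = /^\s*pragma\b/i;

// Decide whether a statement's result should carry a replication bookmark.
function shouldCaptureBookmark(sql: string, isRead: boolean, rowsWritten: number): boolean {
  if (!isRead) return true; // classified as a write
  if (rowsWritten > 0) return true; // misclassified write, e.g. a write-CTE
  // PRAGMA writes such as `PRAGMA user_version = N` mutate no rows, so rowsWritten stays 0.
  return PRAGMA_PREFIX.test(sql);
}
```

This also captures a bookmark for read-only PRAGMAs, which is the extra per-PRAGMA cost the comment above accepts.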
        secure: opts.url.protocol === "https:",
        // Bound the lifetime so a stale bookmark can't linger indefinitely.
        maxAge: 60 * 60 * 24,
    });
BUG: read-your-writes lost for anonymous writes
Category: Logic Errors
Severity: LOW
commit() short-circuits on !opts.isAuthenticated. A request that's anonymous but mutates (public comment form, contact submission, anonymous like/vote — any plugin route that accepts unauth POSTs) leaves no bookmark cookie. The user refreshes immediately, lands on a replica, and sees pre-write state for their own submission.
This differs from the D1 adapter's behavior, which (per D1Config.session docs) doesn't carve out anonymous writes — the bookmark cookie tracks the session regardless of auth.
Trigger: Anonymous POST to a plugin-defined public endpoint that writes. Reload immediately on a lagging replica region. Result: own write invisible until replica catches up (no upper bound; bookmark would have prevented this).
Fix: Drop the !opts.isAuthenticated guard in commit(). The bookmark cookie is httpOnly + sameSite=lax and bound to 24h, so writing it for anonymous users carries no auth/privacy cost. (If maintainers want to gate it because anonymous-replica latency is preferred over consistency, document this divergence from D1 explicitly in the JSDoc on session.)
     */
    export function isReadStatement(sql: string): boolean {
        const trimmed = sql.trimStart().toUpperCase();
        return READ_PREFIXES.some((prefix) => trimmed.startsWith(prefix));
Possible issue: isReadStatement upper-cases the entire SQL string on every query
Category: Resource Management
Severity: LOW
sql.trimStart().toUpperCase() allocates an upper-case copy of the entire SQL text on every call. Large content queries with embedded portable-text JSON or long IN-lists can be many KB. This function runs on every Kysely query in both the dialect and inside the DO query() RPC handler — so for a coalesced batch of 17 SELECTs, that's 34 full-string upper-cases per page.
Fix: Compare only the leading keyword, e.g.

    const start = sql.trimStart();
    return READ_PREFIXES.some(p => start.length >= p.length && start.slice(0, p.length).toUpperCase() === p);

or use a regex like /^(?:select|pragma|explain|with)\b/i on the trimmed string. Same semantics for well-formed SQL (the \b additionally requires a word boundary after the keyword), and it doesn't allocate over the body of the query.
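As a runnable version of the regex variant, assuming READ_PREFIXES contains exactly the four keywords in the suggested pattern (the actual list is not shown in this hunk):

```typescript
// Leading whitespace is absorbed by \s*, so no trimmed or upper-cased copy is needed.
const READ_KEYWORD = /^\s*(?:select|pragma|explain|with)\b/i;

function isReadStatement(sql: string): boolean {
  return READ_KEYWORD.test(sql);
}
```

Two behaviours are worth keeping in mind: a write-CTE (`WITH … INSERT`) is still classified as a read, which is why the rowsWritten check exists, and an identifier-like prefix such as `selected` no longer matches because of the `\b`.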
Review posted successfully. Posted 5 line-anchored comments on PR #1492:
- waitForBookmark is now bounded by a 250ms timeout (Promise.race) in both query() and batchQuery(): a stale/cross-DO-id bookmark cookie no longer makes every read block the full underlying budget for the cookie's lifetime; on timeout we serve a possibly-stale read (self-heals on the next fresh bookmark).
- PRAGMA statements now capture a replication bookmark even when rowsWritten is 0 (e.g. PRAGMA user_version), so a follow-up read on a replica waits for the schema change.
- isReadStatement matches only the leading keyword via an anchored regex instead of upper-casing the entire SQL body on every query.
- Persist the session bookmark cookie in a finally so a write-then-failed-render still records it (read-your-writes; also fixes the D1 session path).

Disputed: the 'anonymous writes lose RYW' finding -- the DO commit() guard matches the D1 adapter exactly (d1.ts), so it's not a divergence; changing it is a cross-cutting decision for both adapters, out of scope here.

Addressed the review (commit 91c7ceb):
What does this PR do?
Adds a first-class Durable Objects SQL database backend for EmDash on Cloudflare, as an alternative to D1, with experimental read replication. The goal is to cut read round-trip latency by serving reads from a Durable Object replica near the visitor instead of a single-region D1 primary.
A new durableObjects() config adapter stores the whole CMS in one Durable Object's SQLite (EmDashDB). With session: "auto" plus the experimental + replica_routing compatibility flags:
- …namespace.get(id)).

Key implementation notes:
- …beginTransaction, so EmDash's existing withTransaction helper degrades to direct execution, the same atomicity profile the codebase already runs under on D1.
- …workers-types yet), so the class also works as a plain single-DO database when the flag is off.

Also adds two deployed perf fixtures for head-to-head measurement against the existing D1 baseline:
- infra/do-demo: DO with read replicas (do-demo.emdashcms.com), registered in perf-monitor.
- infra/do-solo-demo: DO single primary, no replica_routing. It isolates the DO/RPC-architecture cost from the replica-routing win. Deployed but not yet registered in perf-monitor (pending content seed).

The measured perf-monitor Single Post slug was repointed to notes-on-simplicity (present in the repo seed) after reseeding the blog/cache fixtures, so all sites are comparable.

This is a maintainer-directed prototype (no prior Discussion, per the maintainer's instruction to open the PR directly).
Type of change
Checklist
- pnpm typecheck passes
- pnpm lint passes
- pnpm test passes (or targeted tests for my change): @emdash-cms/cloudflare suite (174 tests) passes; new dialect tests added
- pnpm format has been run
- @emdash-cms/cloudflare minor

AI-generated code disclosure
Screenshots / test output
Validated end-to-end on a local EmDashDB: setup runs all migrations, the seed applies, and content reads (homepage, single post, content API with bylines/revisions) all serve correctly. An adversarial review pass (review → fix → re-review) closed four critical/major read-your-writes bugs before this PR; remaining items are documented limitations (connection-scoped PRAGMAs, best-effort singleton-sink under concurrent maintenance writes) and a follow-up (a vitest-pool-workers harness to unit-test the DO class directly, a gap shared with the existing preview DO class).

Note
Replica routing only activates on deployed workers: local pnpm dev (miniflare) rejects the replica_routing flag, so the do-demo fixture's wrangler.jsonc documents dropping it for local runs. The latency win shows up across perf-monitor's regional probes, not locally.

Try this PR
Open a fresh playground →
A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.
Tracks feat/do-sql-driver. Updated automatically when the playground redeploys.