Skip to content

chore(deploy): retain previous-deploy immutable chunks in R2#2365

Draft
vladjerca wants to merge 5 commits into
mainfrom
chore/r2-immutable-asset-retention
Draft

chore(deploy): retain previous-deploy immutable chunks in R2#2365
vladjerca wants to merge 5 commits into
mainfrom
chore/r2-immutable-asset-retention

Conversation

@vladjerca
Copy link
Copy Markdown
Contributor

@vladjerca vladjerca commented May 17, 2026

Summary

After every deploy, SvelteKit emits new content-hashed chunks under _app/immutable/. Existing clients (cached HTML, in-flight tabs, iOS WebViews) still reference the previous deploy's hashed filenames, and the CF ASSETS binding 404s them. Earlier client-side recovery only fires once a chunk has loaded, leaving cold-start clients stuck.

This PR keeps recent deploys' immutable assets alive in R2 and serves them as a fallback on cache miss:

  • New R2 bucket binding R2_IMMUTABLE (bucket name: trakt-web-immutable)
  • CI sync step uploads .svelte-kit/cloudflare/_app/immutable/ to R2 additively (HEAD-skip on hit; content-hashed keys make this safe), then writes a release manifest at releases/<ISO-ts>_<sha>.json listing the keys this deploy depends on
  • CI prune step (runs after sync) computes retention as the union of two windows:
    • Count: newest KEEP_RELEASES manifests (default 10)
    • Age: every manifest younger than MIN_AGE_HOURS (default 24)
      Then unions kept manifests' keys, deletes orphan _app/immutable/* objects, and deletes pruned manifests. Idempotent. Supports DRY_RUN=1.
  • New SvelteKit handle hook (src/lib/features/asset-fallback/) intercepts /_app/immutable/* paths that miss the ASSETS binding and serves them from R2 with cache-control: public, immutable, max-age=31536000
  • Sigv4 + S3 ops live in .scripts/_internal/r2.ts (no aws-sdk dep, plain Deno fetch())
  • Both CI steps use continue-on-error: true so transient R2 failures do not block deploys

Why count + age?

Pure count-based retention breaks under deploy bursts. 10 fast deploys in an hour shrinks the recovery pool to ~1 hour, and a user whose HTML was cached 6 hours ago will request a chunk that's already evicted. The age window guarantees ≥24h coverage regardless of deploy cadence.

Scenario Manifests retained
1 deploy / day last 10 (=10 days coverage)
15 deploys in 1 hour all 15 (=24h coverage, count window ignored)
Mixed: 12 today + 30 last month 12 recent + 0 old = 12 (age dominates)

Stray objects under releases/ (manual uploads, partial writes) whose key does not parse as an ISO timestamp are always pruned — we trust no provenance.

Flow on stale-chunk hit post-deploy

  1. Client requests /_app/immutable/chunks/<old-hash>.js
  2. CF ASSETS binding 404s (chunk not in new build)
  3. Worker invoked - handleAssetFallback runs first in the hook sequence
  4. R2 lookup _app/immutable/chunks/<old-hash>.js
  5. R2 hit -> serve with immutable cache-control
  6. Client recovers, page boots

Remaining steps for @seferturan to wrap up

  • Create R2 bucket: wrangler r2 bucket create trakt-web-immutable
  • Generate an R2 token scoped to object R/W on trakt-web-immutable only
  • Add GitHub Action secrets in trakt/trakt-web:
    • R2_ACCESS_KEY_ID
    • R2_SECRET_ACCESS_KEY
      (CLOUDFLARE_ACCOUNT_ID already present)
  • Verify first deploy: Sync Immutable Chunks to R2 shows non-zero uploads; Prune Old Releases from R2 reports nothing to do early on
  • After enough deploys to exceed both windows: confirm Prune step reports orphan deletes and that the bucket size stops growing
  • Sanity test: fetch a previous-deploy chunk URL post-redeploy and confirm 200 via R2 fallback
  • Tune knobs as needed: KEEP_RELEASES / MIN_AGE_HOURS env on the CI step
  • (Optional) Add alerting on either CI step failing (currently continue-on-error so silent degradation is possible)

Test plan

  • deno fmt / deno lint clean for new files
  • vitest run covers asset-fallback handle + R2 helpers + count/age retention selection (20 tests passing)
  • Manual: after bucket + secrets exist, run a PR build to confirm sync + prune both execute
  • Manual: DRY_RUN=1 deno task prune:immutable against the bucket to inspect what would be pruned

Context

Related to the iOS OAuth login break investigated in trakt-api#790 and trakt-web bug where stale chunks across deploys cause SPA boot failure. Companion to trakt-website kill-switch SW (separate PR) and the earlier client-side hooks.client.ts recovery (already on main as d4a8ea682).

After every deploy, SvelteKit emits new content-hashed chunks under
_app/immutable/. Existing clients (cached HTML, in-flight tabs, iOS
WebViews) still reference the previous deploy's hashed filenames, and
the CF ASSETS binding 404s them. Earlier client-side recovery only fires
once a chunk has already loaded, which leaves cold-start clients stuck.

This change keeps every deploy's immutable assets alive:
- New R2 bucket binding `R2_IMMUTABLE` (`trakt-web-immutable`)
- CI step between build and deploy syncs `.svelte-kit/cloudflare/_app/immutable/`
  to R2 additively (HEAD-skip on hit; content-hashed keys make this safe)
- New SK `handle` hook intercepts `/_app/immutable/*` paths that miss the
  ASSETS binding and serves them from R2 with immutable cache headers
- Sync step uses `continue-on-error: true` so transient R2 failures do
  not block deploys (intentional tradeoff; add alerting follow-up)

Required one-time setup before merge:
- Create `trakt-web-immutable` R2 bucket
- Add GitHub Action secrets `R2_ACCESS_KEY_ID` / `R2_SECRET_ACCESS_KEY`
  (R2 token scoped to object R/W on the bucket)
@deepsource-io
Copy link
Copy Markdown

deepsource-io Bot commented May 17, 2026

DeepSource Code Review

We reviewed changes in d4a8ea6...51a8746 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade   Security  

Reliability  

Complexity  

Hygiene  

Code Review Summary

Analyzer Status Updated (UTC) Details
JavaScript May 17, 2026 3:02p.m. Review ↗
Code coverage May 17, 2026 3:02p.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

… oldest

Bounds R2 retention pool storage growth. Each deploy now writes a small
`releases/<ISO-ts>_<sha>.json` manifest listing every immutable key that
deploy depends on. CI runs a prune step after sync that keeps the most
recent 10 manifests, unions their keys, and deletes anything under
`_app/immutable/` that no kept manifest references. Pruned manifests are
deleted as well.

- Content-hashed chunks dedupe across manifests (same key = same content),
  so the union approach never deletes a chunk still referenced by a
  retained release.
- KEEP_RELEASES configurable (default 10). DRY_RUN supported for ad-hoc
  inspection.
- Sync and prune share `_internal/r2.ts` (sigv4 + S3 ops) so we avoid an
  aws-sdk dependency. Spec covers manifest key naming + content-type map.
- CI step uses `continue-on-error: true` so transient prune failures do
  not block the deploy; the next run will catch up.
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses the issue of SPA boot failures caused by stale asset references after deployments. By persisting immutable chunks in an R2 bucket and implementing a server-side fallback mechanism, the application ensures that clients referencing assets from previous builds can still successfully load them, improving reliability for users with cached content or long-lived sessions.

Highlights

  • Immutable Asset Retention: Introduced a new R2 bucket ('trakt-web-immutable') to store content-hashed assets from previous deployments, preventing 404 errors for clients with cached HTML.
  • CI Sync Script: Added a Deno script to sync build assets to R2 additively during the CI process, ensuring older chunks remain available after new deployments.
  • SvelteKit Asset Fallback: Implemented a new server-side 'handle' hook that intercepts requests for missing assets and serves them from the R2 bucket with appropriate immutable cache headers.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci_cd.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Pure count-based retention is brittle under deploy bursts: 10 fast
deploys in an hour shrinks the recovery pool to ~1 hour, breaking
long-lived SPA tabs whose cached HTML still references chunks from a
deploy 2 hours ago.

Retention now keeps the union of two windows:
  1. KEEP_RELEASES newest manifests by count (default 10)
  2. Every manifest younger than MIN_AGE_HOURS (default 24)

- Slow week: rule 1 dominates → ~10 days of coverage.
- Deploy burst: rule 2 dominates → every deploy in the past day stays.

Implementation:
- Added `manifestKeyToDate(key)` that round-trips the encoded ISO
  timestamp in the manifest filename. Returns null for unparseable
  keys; stray objects are always pruned because we trust no provenance.
- Lifted env reading and `r2FromEnv()` instantiation into `main()` so
  the module is importable by unit tests (previously crashed vitest
  workers via Deno.exit at import time when env vars were missing).
- New unit tests cover the count/age union, the burst scenario, and
  unparseable-name rejection.
- CI step now passes `MIN_AGE_HOURS: '24'` alongside `KEEP_RELEASES: '10'`.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to sync and serve immutable assets from a Cloudflare R2 bucket, ensuring that clients with cached HTML can still resolve assets after a new deployment. Key changes include a new Deno script for additive R2 synchronization, a SvelteKit server hook for asset fallback, and updated project configurations. The review feedback focuses on optimizing header processing in the sync script to improve performance and extracting duplicated R2 path encoding logic into a reusable helper function.

I am having trouble creating individual review comments. Click here to see my feedback.

projects/client/.scripts/sync-immutable-to-r2.ts (125-135)

medium

The current implementation for generating canonical headers is inefficient because it performs an O(N) lookup using 'find' inside a 'map' operation, resulting in O(N^2) complexity for header processing. Normalizing headers into a single object ensures that keys used for lookup match the keys used for storage and improves performance.

const normalizedHeaders = Object.fromEntries(
  Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v.trim()]),
);
const sortedHeaderNames = Object.keys(normalizedHeaders).sort();
const canonicalHeaders =
  sortedHeaderNames
    .map((h) => h + ':' + normalizedHeaders[h])
    .join('\n') + '\n';
const signedHeaders = sortedHeaderNames.join(';');
References
  1. When working with maps or objects, ensure that the keys used for lookup match the keys used for storage to prevent incorrect data retrieval.

projects/client/.scripts/sync-immutable-to-r2.ts (137-139)

medium

The logic for constructing the encoded R2 key and the canonical URI is duplicated across 'sigv4Headers', 'headObject', and 'putObject'. Extracting this into a helper function improves maintainability and ensures consistency in how keys are escaped.

const getEncodedPath = (key: string) => '/' + BUCKET + '/' + key.split('/').map(encodeURIComponent).join('/');

const canonicalUri = getEncodedPath(key);
References
  1. Use factory functions or helpers to create request handlers for similar queries to avoid code duplication.

@vladjerca
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a system for syncing and serving immutable assets from Cloudflare R2 to support clients with stale HTML. It includes a minimal R2 client, scripts for asset synchronization and pruning, and a SvelteKit server hook for asset fallback. Review feedback pointed out that the R2 client fails to unescape keys from the S3 List API and needs more robust XML escaping for multi-object delete operations.

Comment thread projects/client/.scripts/_internal/r2.ts Outdated
Comment thread projects/client/.scripts/_internal/r2.ts Outdated
vladjerca added 2 commits May 17, 2026 17:01
- Drop non-null assertions on RELEASE_SHA by validating inside main() so
  the local const narrows. Threads the r2 client through uploadOne()
  instead of relying on module-level state.
- Replace the O(N^2) header lookup with an O(N) normalized-object pass
  for SigV4 canonical header building. The old code did a find() inside
  a map() and used a non-null assertion to silence the result.
- Extract keyPath() so head/put/getText share one place to encode the
  bucket-relative URL.
- XML-escape object keys on the multi-object delete body (covers all
  five predefined entities, not just & and <) and XML-unescape Key /
  NextContinuationToken on list responses. ListObjectsV2 returns those
  entities encoded; without unescaping, a key with & would round-trip
  as &amp; and miss the keep_set lookup during prune.
The internal r2.ts had drifted into a grab bag of unrelated concerns
(SigV4 client, manifest naming, content-type table, XML escape). This
splits the file along its natural seams so each piece is testable on
its own and the R2 client stays focused on the wire protocol.

- xml.ts: predefined-entity escape/unescape. New xml.spec.ts covers
  round-trip + order-of-replacement (the &amp; pass must run last on
  decode).
- manifest.ts: release manifest key naming + the IMMUTABLE_PREFIX /
  RELEASE_MANIFEST_PREFIX constants. manifest.spec.ts moved over from
  r2.spec.ts.
- content-type.ts: extension-to-MIME table. content-type.spec.ts moved
  over and picks up a no-extension fallback case + a case-insensitive
  case.
- r2.ts: now only the R2Client + r2FromEnv. Imports the xml helpers
  from xml.ts. Renamed the inline byte iterator (`b` → `byte`) flagged
  by DeepSource.
- Renamed `d` → `parsed` in manifestKeyToDate and `t` → `when` in the
  prune retention specs to clear the remaining JS-C1002 findings.
- Updated import paths in prune-immutable-r2.ts, prune-immutable-r2.spec.ts,
  and sync-immutable-to-r2.ts.
@vladjerca
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist thanks for the review. Addressed both suggestions from the first review body and the two inline comments on the second:

  • Canonical headers now build off a single normalized object, dropping the O(N^2) lookup and the non-null assertion.
  • Extracted keyPath() on R2Client so head/put/getText share the bucket-relative path builder; same shape used by list/deleteMany.
  • list unescapes <Key> and <NextContinuationToken>; deleteMany escapes the full set of five predefined XML entities. Helpers live in _internal/xml.ts with round-trip coverage.

Bonus while I was in there: split the R2 helpers into focused modules (xml.ts, manifest.ts, content-type.ts) so each is testable in isolation, and cleared the DeepSource findings on this run.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thanks for the update, @vladjerca. The refactoring into focused modules (xml.ts, manifest.ts, content-type.ts) and the addition of round-trip testing significantly improve the maintainability and robustness of the R2 client. The logic for canonical header normalization and the unified keyPath() builder look solid.

Since you've addressed the previous review comments and cleared the static analysis findings, this looks ready for final verification. Good luck with the deployment and testing steps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants