Skip to content

fix(redis): timeout hung fetchers in cachedFetchJson/cachedFetchJsonWithMeta#3672

Open
fuleinist wants to merge 1 commit into
koala73:mainfrom
fuleinist:fix/cachedfetchjson-timeout
Open

fix(redis): timeout hung fetchers in cachedFetchJson/cachedFetchJsonWithMeta#3672
fuleinist wants to merge 1 commit into
koala73:mainfrom
fuleinist:fix/cachedfetchjson-timeout

Conversation

@fuleinist
Copy link
Copy Markdown
Contributor

Summary

Fixes a bug where a fetcher that never settles (no internal AbortController / no signal timeout) would poison the inflight Map for the entire Vercel Edge isolate lifetime. Both cachedFetchJson and cachedFetchJsonWithMeta now race the fetcher against a 30-second timeout, ensuring the .finally() cleanup always runs.

Problem

cachedFetchJson and cachedFetchJsonWithMeta coalesce concurrent cache-miss requests by storing in-flight promises in a Map<string, Promise<unknown>>. If a fetcher never resolves or rejects (e.g., a Transport.fetch with no timeout signal), the entry is only cleaned up in .finally() — which never runs. Every concurrent and subsequent caller for that cache key gets the same unresolved promise and hangs indefinitely.

Solution

Wrap the fetcher in Promise.race([fetcher(), timeoutPromise]). If the fetcher doesn't settle within FETCHER_TIMEOUT_MS (30 seconds), the race rejects, the .catch() block logs the warning, and the .finally() block cleans up the inflight entry. Applied to both functions.

FETCHER_TIMEOUT_MS is exported for testability.

Testing

  • Unit test: hung fetcher is rejected after timeout (cachedFetchJson)
  • Unit test: hung fetcher is rejected after timeout (cachedFetchJsonWithMeta)
  • Unit test: fast fetcher still works correctly through the Promise.race path
  • All 21 tests in redis-caching.test.mjs pass

Files Changed

  • server/_shared/redis.ts — added FETCHER_TIMEOUT_MS constant and Promise.race wrappers in both functions
  • tests/redis-caching.test.mjs — added 'inflight timeout guard' describe block (3 tests)

Fixes #3539

…ithMeta

What was wrong:
- cachedFetchJson and cachedFetchJsonWithMeta store in-flight promises in a
  Map keyed by cache key. If a fetcher never settles (Transport.fetch with no
  internal AbortController / no signal timeout), the entry stays in the Map
  for the entire Vercel Edge isolate lifetime.
- Every concurrent and subsequent caller for that key gets the same unresolved
  promise and hangs forever. The inflight entry is only cleaned up in
  .finally(), which never runs for a never-settling promise.

What this fixes:
- Wraps the fetcher in Promise.race with a 30-second timeout. If the fetcher
  doesn't settle within FETCHER_TIMEOUT_MS, the race rejects and the .finally()
  block runs to clean up the inflight entry.
- Applied to both cachedFetchJson and cachedFetchJsonWithMeta.
- FETCHER_TIMEOUT_MS is exported so callers can verify the constant or
  override it in tests.

## Changes
- server/_shared/redis.ts: add FETCHER_TIMEOUT_MS (30s) and wrap fetcher in
  Promise.race in both functions
- tests/redis-caching.test.mjs: add 'inflight timeout guard' describe block
  with 3 tests (hung fetcher rejection for both functions + fast fetcher
  sanity check)

## Testing
- [x] Unit test added (inflight timeout guard suite, 3 tests)
- [x] All 21 tests pass
- [x] Existing tests unchanged

Fixes koala73#3539
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

@fuleinist is attempting to deploy a commit to the World Monitor Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added the trust:safe Brin: contributor trust score safe label May 12, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR fixes the inflight-map poisoning bug by wrapping fetchers in Promise.race against a 30-second timeout in both cachedFetchJson and cachedFetchJsonWithMeta, ensuring the .finally() cleanup always runs even when a fetcher never settles.

  • server/_shared/redis.ts: Adds FETCHER_TIMEOUT_MS = 30_000 and a Promise.race wrapper in both caching helpers; the setTimeout handle is not saved and never clearTimeout-ed, leaving a live 30-second timer in the event loop on every cache miss even when the fetcher resolves in milliseconds.
  • tests/redis-caching.test.mjs: Adds three timeout-guard tests; the two hung-fetcher cases assert elapsed < 5000 ms but importRedisFresh() loads the module with the real 30,000 ms constant and no fake-timer setup, so those assertions will always fail at runtime.

Confidence Score: 3/5

The production fix is logically correct but ships two defects: uncancelled timers on every successful cache miss and two tests that will fail in CI due to a mismatched elapsed-time assertion.

The core Promise.race approach correctly prevents the inflight-map poisoning described in the bug. However, neither cachedFetchJson nor cachedFetchJsonWithMeta saves the setTimeout return value, so the 30-second timer is never cancelled when the fetcher resolves early — every cache miss in a busy isolate accumulates a live timer for 30 s. Separately, both hung-fetcher tests assert elapsed < 5000 ms while importRedisFresh() loads the module with the unmodified FETCHER_TIMEOUT_MS = 30_000; those tests will time out or fail the elapsed assertion, making the test suite unreliable.

Both server/_shared/redis.ts (the clearTimeout omission in both caching functions) and tests/redis-caching.test.mjs (the elapsed-time assertions in the two hung-fetcher tests) need fixes before merging.

Important Files Changed

Filename Overview
server/_shared/redis.ts Adds Promise.race timeout guard to both cachedFetchJson and cachedFetchJsonWithMeta; the setTimeout handle is never cleared when the fetcher resolves early, leaking a live 30-second timer on every cache miss.
tests/redis-caching.test.mjs Adds three timeout-guard tests; the two hung-fetcher tests assert elapsed < 5000 ms but use the real 30,000 ms FETCHER_TIMEOUT_MS with no fake timers, so they will always fail the elapsed assertion.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant cachedFetchJson
    participant inflightMap as inflight Map
    participant Redis
    participant Fetcher
    participant Timer as setTimeout (30s)

    Caller->>cachedFetchJson: cachedFetchJson(key, ttl, fetcher)
    cachedFetchJson->>Redis: getCachedJson(key)
    Redis-->>cachedFetchJson: null (cache miss)
    cachedFetchJson->>inflightMap: "get(key) -> undefined"
    cachedFetchJson->>Timer: setTimeout(reject, 30000) [handle NOT saved]
    cachedFetchJson->>Fetcher: fetcher()
    cachedFetchJson->>inflightMap: set(key, promise)

    alt Fetcher resolves before timeout
        Fetcher-->>cachedFetchJson: result
        cachedFetchJson->>Redis: setCachedJson(key, result, ttl)
        cachedFetchJson->>inflightMap: delete(key) [finally]
        Note over Timer: Timer still live for ~30s (handle not cleared)
        cachedFetchJson-->>Caller: result
    else Fetcher hangs - timeout fires
        Timer-->>cachedFetchJson: reject(TimeoutError)
        cachedFetchJson->>cachedFetchJson: .catch() logs warning
        cachedFetchJson->>inflightMap: delete(key) [finally]
        cachedFetchJson-->>Caller: throw TimeoutError
    end
Loading

Comments Outside Diff (2)

  1. server/_shared/redis.ts, line 284-304 (link)

    P1 setTimeout handle never cleared on fast success, causing a timer leak per cache miss

    When the fetcher resolves before the 30-second deadline the setTimeout callback remains queued in the event loop for the remaining duration. In a Vercel Edge isolate that handles many cache misses the accumulated live timers add up. The orphaned callback calls reject() on an already-settled promise (harmless) but the timer itself is never cancelled. Clearing it in .finally() fixes both functions.

  2. server/_shared/redis.ts, line 375-405 (link)

    P1 Same timer-leak issue as in cachedFetchJson — the setTimeout handle for the race timeout is not cleared when the fetcher resolves early. Apply the same clearTimeout fix here.

Reviews (1): Last reviewed commit: "fix(redis): timeout hung fetchers in cac..." | Re-trigger Greptile

Comment on lines +988 to +994
// guard is not working.
assert.ok(elapsed < 5000, `timeout should fire well under 5s, got ${elapsed}ms`);
} finally {
globalThis.fetch = originalFetch;
restoreEnv();
}
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Elapsed-time assertions will always fail at the real timeout

Both hung-fetcher tests assert elapsed < 5000 (5 s), but FETCHER_TIMEOUT_MS is 30_000 ms. importRedisFresh() performs a real dynamic import of the unmodified module — no fake timers, no constant override. The test runner will wait ~30 s for the timeout to fire, then the assert.ok(elapsed < 5000, ...) assertion fails with "timeout should fire well under 5s, got ~30000ms". To fix this the tests need either (a) fake/mock timers so the 30-second setTimeout fires immediately, or (b) a test-specific override of FETCHER_TIMEOUT_MS via importPatchedTsModule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

trust:safe Brin: contributor trust score safe

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cachedFetchJson inflight Map has no timeout — hung fetcher poisons key for isolate lifetime

1 participant