fix(redis): timeout cachedFetchJson inflight fetcher (#3539) #3561
koala73 wants to merge 4 commits into
Conversation
…ey for isolate lifetime (#3539)

Wraps the fetcher in `cachedFetchJson` and `cachedFetchJsonWithMeta` with a 30 s `Promise.race` timeout so the inflight Map slot is guaranteed to settle even if the fetcher truly never resolves (no internal timeout, no `AbortController`).

Pre-fix, the inflight entry was only deleted in `.finally()`, so a hung fetcher persisted in the Vercel isolate forever — every subsequent caller for that key was handed the same unresolved promise. This was mostly mitigated when fetchers behaved (used `UPSTREAM_TIMEOUT_MS`), but a future caller forgetting its own timeout was a permanent hot-key trap. The cache layer is treated as the last line of defense.

Tests cover (a) a hung fetcher rejects with the timeout error and the inflight slot is released so the next caller re-fetches successfully, (b) prompt fetchers don't trip the timeout (no leaked timers / unhandled rejections), and (c) `cachedFetchJsonWithMeta` enforces the same bound.
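For orientation, a minimal sketch of the pattern that commit message describes — an assumed shape, not the literal contents of server/_shared/redis.ts; `getCachedJson` and `setCachedJson` stand in for the module's Redis helpers, and the exact option handling in the real file may differ:

// Sketch only: inflight dedup plus a Promise.race timeout as the last line of defense.
declare function getCachedJson<T>(key: string): Promise<T | null>;
declare function setCachedJson(key: string, value: unknown): Promise<void>;

const fetcherTimeoutMs = 30_000;
const inflight = new Map<string, Promise<unknown>>();

function withFetcherTimeout<T>(promise: Promise<T>, key: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`cachedFetchJson timeout after ${fetcherTimeoutMs}ms for "${key}"`)),
      fetcherTimeoutMs,
    );
  });
  // Whichever side settles first wins; always clear the timer so no handle leaks.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

async function cachedFetchJson<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const cached = await getCachedJson<T>(key);
  if (cached !== null) return cached;

  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>; // follower reuses the leader's promise

  const promise = withFetcherTimeout(fetcher(), key)
    .then(async (result) => {
      await setCachedJson(key, result);
      return result;
    })
    .finally(() => {
      // Guaranteed to run within the timeout even if the fetcher hangs,
      // so the key can never stay poisoned for the isolate lifetime.
      inflight.delete(key);
    });

  inflight.set(key, promise);
  return promise;
}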
Greptile Summary

This PR fixes a permanent inflight-map poisoning bug (#3539) by wrapping the fetcher in a `Promise.race` timeout so a hung fetcher can no longer hold its inflight slot forever.

Confidence Score: 4/5 — safe to merge; the core fix is correct and well-tested, and only P2 style suggestions remain. No P0 or P1 issues found. The two P2 style suggestions: the error message reads the module-level timeout variable at fire-time rather than capture-time, and it hard-codes "cachedFetchJson" for both call sites. Neither affects correctness in production. No files require special attention beyond the two P2 notes on server/_shared/redis.ts.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant C1 as Caller 1 (leader)
participant C2 as Caller 2 (follower)
participant CF as cachedFetchJson
participant WFT as withFetcherTimeout
participant F as Fetcher
participant T as setTimeout(30s)
participant IM as inflight Map
participant R as Redis
C1->>CF: cachedFetchJson(key)
CF->>R: getCachedJson(key) → null
CF->>WFT: withFetcherTimeout(fetcher(), key)
WFT->>F: fetcher() [already started]
WFT->>T: setTimeout(reject, 30s)
CF->>IM: inflight.set(key, promise)
CF-->>C1: promise
C2->>CF: cachedFetchJson(key)
CF->>R: getCachedJson(key) → null
CF->>IM: inflight.get(key) → existing
CF-->>C2: existing (same promise)
alt Fetcher resolves in time
F-->>WFT: result
WFT->>T: clearTimeout(timer)
WFT-->>CF: result
CF->>R: setCachedJson(key, result)
CF->>IM: inflight.delete(key)
CF-->>C1: result
CF-->>C2: result
else Fetcher hangs → timeout fires
T-->>WFT: reject(TimeoutError)
WFT-->>CF: TimeoutError
CF->>IM: inflight.delete(key)
CF-->>C1: ✗ TimeoutError
CF-->>C2: ✗ TimeoutError
Note over IM: Slot released — next caller starts fresh
end
Reviews (1): Last reviewed commit: "fix(redis): timeout inflight fetcher to ..."
const timeout = new Promise<never>((_, reject) => {
  timer = setTimeout(() => {
    reject(new Error(`cachedFetchJson timeout after ${fetcherTimeoutMs}ms for "${key}"`));
  }, fetcherTimeoutMs);
Error message captures timeout duration at fire-time, not creation-time
fetcherTimeoutMs is read inside the setTimeout callback via closure, so the message reflects the value of the variable when the timer fires, not the value passed to setTimeout. In production this never matters (the value is constant), but in tests where __setFetcherTimeoutForTests and __resetFetcherTimeoutForTests are called, if the reset races ahead of the timer firing the message could print the wrong duration. Capturing the value eagerly at call-time makes the message self-consistent with the actual delay used.
Suggested change:

-  const timeout = new Promise<never>((_, reject) => {
-    timer = setTimeout(() => {
-      reject(new Error(`cachedFetchJson timeout after ${fetcherTimeoutMs}ms for "${key}"`));
-    }, fetcherTimeoutMs);
+  const timeout = new Promise<never>((_, reject) => {
+    const timeoutMs = fetcherTimeoutMs;
+    timer = setTimeout(() => {
+      reject(new Error(`cachedFetchJson timeout after ${timeoutMs}ms for "${key}"`));
+    }, timeoutMs);
Pushing back on this one — timeoutMs in withFetcherTimeout is a local function parameter (line 287/290), captured at function-call time, not a live read against the module-level fetcherTimeoutDefaultMs. The closure inside setTimeout reads the parameter, which is immutable across the lifetime of that race. __resetFetcherTimeoutForTests mutates fetcherTimeoutDefaultMs, but that read happens at call site (line 335 / line 421) before withFetcherTimeout is invoked — by then the value is already snapshotted into the local parameter. So the message is already self-consistent with the actual delay. Closing as no-op; happy to revisit if I'm missing a vector.
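For illustration, a tiny self-contained example of that distinction (hypothetical names — `moduleLevelMs` and `schedule` are not the PR's identifiers): a default parameter is snapshotted when the function is called, so a later mutation of the module-level variable cannot change what the pending closure prints.

let moduleLevelMs = 30_000; // plays the role of the module-level default

function schedule(timeoutMs: number = moduleLevelMs): void {
  // `timeoutMs` was copied from `moduleLevelMs` at call time; the closure below
  // reads the frozen parameter, not the live module variable.
  setTimeout(() => console.log(`timeout after ${timeoutMs}ms`), timeoutMs);
}

schedule();            // snapshots 30_000 into the parameter
moduleLevelMs = 5_000; // later mutation (analogous to a test reset) has no effect
// ~30 s later the pending timer still logs "timeout after 30000ms".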
function withFetcherTimeout<T>(promise: Promise<T>, key: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`cachedFetchJson timeout after ${fetcherTimeoutMs}ms for "${key}"`));
    }, fetcherTimeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
Error message hard-codes "cachedFetchJson" for both call sites
withFetcherTimeout is shared by both cachedFetchJson and cachedFetchJsonWithMeta, but the rejection message always says "cachedFetchJson timeout …". This makes the timeout error from cachedFetchJsonWithMeta misleading in logs. Adding a callerName parameter (or even a short tag) would make it trivial to distinguish which call site timed out in production log searches.
Fixed in 5cca589 — added a callerName: 'cachedFetchJson' | 'cachedFetchJsonWithMeta' parameter to withFetcherTimeout, both call sites now brand their own error string. Tests tightened to assert the exact prefix on each helper.
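A sketch of what the branded helper could look like after that change (an assumed shape — commit 5cca589 is authoritative; `fetcherTimeoutMs` stands in for the module-level default described in this PR):

const fetcherTimeoutMs = 30_000; // module-level default, per the PR description

type CacheHelperName = 'cachedFetchJson' | 'cachedFetchJsonWithMeta';

function withFetcherTimeout<T>(
  promise: Promise<T>,
  key: string,
  callerName: CacheHelperName, // each call site passes its own name
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${callerName} timeout after ${fetcherTimeoutMs}ms for "${key}"`)),
      fetcherTimeoutMs,
    );
  });
  // The error prefix now names the helper that actually timed out,
  // so production log searches can distinguish the two call sites.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}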
…s for LLM/multi-stage callers

PR review: the hardcoded 30 s ceiling pre-empted callers that intentionally allow longer fetcher work. Concrete victims:

- deductSituation (deduct-situation.ts:121) wraps callLlmReasoning with DEDUCT_TIMEOUT_MS = 120_000 — at 30 s the cache wrapper rejected before the LLM's own 120 s timeout/fallback could complete; deductSituation didn't catch the rejection, so users saw a cold 500.
- analyzeStock (analyze-stock.ts:1185) chains 2× UPSTREAM_TIMEOUT_MS sequenced upstream fetches plus a 20 s LLM overlay inside one cached fetcher; the realistic worst case (~40 s) exceeds 30 s.

Fix: add opts.timeoutMs to cachedFetchJson and cachedFetchJsonWithMeta (see the sketch below). The default stays at 30 s for the common HTTP-fetch case so a forgetful new caller still can't poison the inflight Map. Long-running callers pass an explicit budget set above their internal bound (deduct: 125 s, analyze-stock: 60 s) so the safety net is the last resort, not the active enforcer. Also added test coverage for the override path on both helpers.
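A hedged sketch of the override path for one of the long-running callers named above — the helper signatures (`cachedFetchJson` options object, `callLlmReasoning`) and the key shape are assumptions for illustration, not the committed code:

// Assumed signatures for illustration only; the real module may differ.
declare function cachedFetchJson<T>(
  key: string,
  fetcher: () => Promise<T>,
  opts?: { timeoutMs?: number },
): Promise<T>;
declare function callLlmReasoning(prompt: string, opts: { timeoutMs: number }): Promise<string>;

const DEDUCT_TIMEOUT_MS = 120_000; // the fetcher's own LLM bound, per the commit message

async function deductSituationExample(situationId: string, prompt: string): Promise<string> {
  // The cache-layer ceiling (125 s) sits above the fetcher's internal 120 s bound,
  // so the safety net only fires if the fetcher itself never settles.
  return cachedFetchJson(
    `deduct:${situationId}`, // hypothetical key shape
    () => callLlmReasoning(prompt, { timeoutMs: DEDUCT_TIMEOUT_MS }),
    { timeoutMs: 125_000 },
  );
}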
Greptile P2 on #3561: withFetcherTimeout is shared by both cachedFetchJson and cachedFetchJsonWithMeta but the rejection always said "cachedFetchJson timeout …", making cachedFetchJsonWithMeta timeout errors misleading in logs. Add a `callerName` parameter so each call site brands its own error and Sentry log searches can grep the right helper. Tests tightened to assert the exact prefix on each helper.
Per residual-risk note on #3561 review: this helper only releases the inflight slot — the underlying fetcher promise keeps running in the background until isolate recycle. Document the limitation so the next reader knows what NOT to claim, and so a future "add real cancellation" PR has its scope laid out.
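One way that limitation could be spelled out at the helper (a hypothetical doc comment, not the committed wording):

/**
 * Races the fetcher against a timeout so the inflight Map slot always settles.
 *
 * Limitation: on timeout this only releases the inflight slot and rejects the
 * waiting callers — it does NOT cancel the underlying fetcher. That promise
 * keeps running in the background until the isolate is recycled. Real
 * cancellation (an AbortController plumbed into every fetcher) is out of scope
 * here and is left for a follow-up PR.
 */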
Summary
- Wraps the fetcher in `cachedFetchJson` and `cachedFetchJsonWithMeta` with a 30 s `Promise.race` timeout so the inflight Map slot is guaranteed to settle even if the fetcher truly never resolves (no internal timeout, no `AbortController`).
- Pre-fix, the inflight entry was only deleted in `.finally()`. A hung fetcher persisted in the Vercel isolate forever — every subsequent caller for that key was handed the same unresolved promise, permanently poisoning the key.
- Fetchers that behave and use `UPSTREAM_TIMEOUT_MS` (typically 5–15 s) never see the 30 s ceiling. The trap was a future caller forgetting to set their own timeout.
- `__setFetcherTimeoutForTests` / `__resetFetcherTimeoutForTests` exports follow the existing `__resetKeyPrefixCacheForTests` pattern so unit tests can exercise the timeout branch without sleeping for 30 s.
- The timer is `clearTimeout`'d via `.finally()` whenever the fetcher wins, so we don't leak handles or keep the isolate awake unnecessarily.

Closes #3539.
Test plan
- `npx tsx --test tests/redis-caching.test.mjs` — 21/21 pass, including 3 new cases: a hung fetcher rejects with the timeout error and the inflight slot is released; prompt fetchers don't trip the timeout (no leaked timers / unhandled rejections); `cachedFetchJsonWithMeta` enforces the same bound.
- `npx tsc --noEmit -p tsconfig.api.json` clean.
- `npx biome lint server/_shared/redis.ts tests/redis-caching.test.mjs` clean.
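As a concrete picture of that test pattern, a sketch of one case — written in TypeScript for readability even though the real suite is tests/redis-caching.test.mjs, with the import path and exact call shapes assumed rather than copied from the PR:

import test from 'node:test';
import assert from 'node:assert/strict';
import {
  cachedFetchJson,
  __setFetcherTimeoutForTests,
  __resetFetcherTimeoutForTests,
} from '../server/_shared/redis.js'; // assumed import path

test('hung fetcher times out and releases the inflight slot', async () => {
  __setFetcherTimeoutForTests(10); // exercise the timeout branch without sleeping 30 s
  try {
    const neverSettles = () => new Promise<never>(() => {});
    await assert.rejects(cachedFetchJson('test:hung-key', neverSettles), /timeout after 10ms/);

    // Slot released: a second call runs a fresh fetcher instead of reusing the hung promise.
    const value = await cachedFetchJson('test:hung-key', async () => ({ ok: true }));
    assert.deepEqual(value, { ok: true });
  } finally {
    __resetFetcherTimeoutForTests();
  }
});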