Skip to content

fix: retry rule load after a transient failure#49

Draft
sarahxsanders wants to merge 1 commit into
mainfrom
posthog-code/fix-rule-load-retry
Draft

fix: retry rule load after a transient failure#49
sarahxsanders wants to merge 1 commit into
mainfrom
posthog-code/fix-rule-load-retry

Conversation

@sarahxsanders

@sarahxsanders sarahxsanders commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

FROM SIGNALS, REVIEWING MYSELF BEFORE READY

Problem and acceptable goal

Problem this change addresses

scan() lazily assigns rulesPromise = loadRules() on first use and reuses that promise forever. If loadRules() rejects — a transient WASM read failure, a momentarily missing rules directory, or a compile error — the rejected promise stays cached. Every later scan() re-awaits the same rejected promise and throws, so a one-time recoverable failure permanently breaks scanning for the lifetime of the process.

Why the Warlock is the right home for this change:

This is a correctness bug in Warlock's own lazy-load cache inside src/scanner/engine.ts. The faulty caching lives entirely in the engine, so the fix belongs here — pushing retry logic into context-mill or any other consumer would just paper over an engine defect and force every consumer to reimplement the same workaround.

Changes

  • scan() now attaches a .catch to the cached loadRules() promise that resets rulesPromise to null and re-throws, so the next scan() retries the load. The success fast-path (compile once, reuse) is unchanged, and it's concurrency-safe: in-flight callers all see the rejection and only the next call triggers a fresh load.
  • New src/scanner/__tests__/engine.retry.test.ts forces a one-time readdirSync failure, asserts scan() rejects, then asserts the next scan() recovers. It lives in its own file because the module-level rulesPromise is shared across tests in a file — a fresh file gives a clean module instance so the failure path is actually exercised.

Anti-goal check

  • This is not a code-quality or best-practice check
  • This does not add policy or enforcement logic
  • This does not add phase, tool, or workflow awareness to the engine
  • This does not encode wizard-specific assumptions

Test plan

  • Existing tests pass (517 passed)
  • Build succeeds (npm run build)
  • New test covers the transient-failure-then-recovery path
  • For new rules: n/a (no rule changes)
  • For API changes: n/a (public scan() signature unchanged)

Created with PostHog Code

scan() cached the rulesPromise on first call and reused it forever. If
loadRules() rejected (transient WASM read failure, momentarily missing rules
directory, compile error), the rejected promise stayed cached and every later
scan() re-awaited it and threw — permanently breaking scanning for the lifetime
of a long-running host process.

Clear the cached promise on rejection so the next scan() retries the load, while
keeping the success fast-path (compile once, reuse) intact.

Generated-By: PostHog Code
Task-Id: 5b7b5c21-83c6-4c64-ad90-6bdb30815436
@sarahxsanders sarahxsanders requested a review from a team as a code owner June 12, 2026 20:46
@sarahxsanders sarahxsanders removed the request for review from a team June 12, 2026 20:51
@sarahxsanders sarahxsanders marked this pull request as draft June 12, 2026 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant