fix(api): make Slack connection check resilient to transient failures #786
Draft
posthog[bot] wants to merge 1 commit into
`fetchSlackConnected` issued a bare `axios.get` with no timeout or retry, so any transient network blip against `/integrations/` threw and the callers captured it as an exception. On a non-critical post-install poll that already degrades gracefully to the connect nudge, this produced steadily accumulating error-tracking noise across several transient failure modes (gateway 504s, socket resets, TLS-handshake disconnects).

Add a short timeout plus a small retry/backoff for gateway 5xx and retryable socket/TLS errors. Exhausted transient failures now degrade quietly to "not connected" without calling `captureException`; only genuine, non-transient errors (401/403, malformed responses) still throw for the caller to report.

Generated-By: PostHog Code
Task-Id: 54188ec4-3681-47ed-9228-a0628b42af27
Problem
The Slack-connection check on the post-install step (`fetchSlackConnected` in `src/lib/api.ts`) issued a bare `axios.get` against `/api/projects/:id/integrations/` with no timeout, retry, or transient-failure handling. Any flaky request (a gateway 504, a socket reset, a TLS-handshake disconnect) threw, and the `SlackConnectScreen` poll caught it and called `captureException`. The wizard already degrades gracefully to the connect nudge, so this was purely error-tracking noise: it turned routine network blips on a non-critical step into tracked exceptions, split across several error classes and per-release bundle filenames.
Changes
- Make `fetchSlackConnected` resilient to all transient failures: a short request timeout plus a small retry/backoff for gateway 5xx (502/503/504) and retryable socket/TLS errors (ECONNRESET, EADDRNOTAVAIL, ETIMEDOUT, axios timeouts, TLS-handshake disconnects, DNS blips).
- Exhausted transient failures degrade quietly to "not connected" (`false`) without calling `captureException`. Exception reporting is reserved for genuine, non-transient errors (401/403, malformed responses).
- Update the `SlackConnectScreen` catch comment to reflect that only genuine errors now reach it.

This changes what gets reported, not the UX: the fallback nudge already covered the degraded path.
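For reviewers who want the shape of the behaviour without opening the diff, here is a rough, dependency-free sketch of the classify-then-retry idea described above. It is not the code in `src/lib/api.ts`: `isTransient`, `checkWithRetry`, the attempt count, the backoff delays, and the exact set of error codes are all illustrative assumptions. The real change wraps an `axios.get` with a timeout, which this sketch leaves to the `request` callback.

```typescript
// Hypothetical sketch: every name and number here is an assumption for illustration,
// not the implementation in src/lib/api.ts.

// Minimal shape of the errors inspected (axios errors expose `code` and `response.status`).
interface RequestErrorLike {
  code?: string;
  response?: { status: number };
}

// Gateway statuses treated as transient, per the PR description.
const TRANSIENT_STATUSES = new Set([502, 503, 504]);

// Socket-level codes treated as transient (assumed list).
const TRANSIENT_CODES = new Set([
  'ECONNRESET',
  'EADDRNOTAVAIL',
  'ETIMEDOUT',
  'ECONNABORTED', // axios uses this code for its own timeouts by default
  'EAI_AGAIN', // temporary DNS lookup failure in Node
]);

function isTransient(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as RequestErrorLike;
  // A response was received: only gateway 5xx counts as transient.
  if (e.response) return TRANSIENT_STATUSES.has(e.response.status);
  // No response: decide from the socket/timeout error code.
  return e.code !== undefined && TRANSIENT_CODES.has(e.code);
}

async function checkWithRetry(
  request: () => Promise<boolean>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await request();
    } catch (err) {
      if (!isTransient(err)) throw err; // genuine error: the caller reports it
      if (attempt === maxAttempts) return false; // exhausted: degrade quietly
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Under these assumptions, a caller that still wraps the check in `try`/`catch` and calls `captureException` only ever sees the non-transient cases (for example 401/403), which matches the intent stated in the bullets above.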
Test plan
New unit tests in `src/lib/__tests__/fetch-slack-connected.test.ts` cover: success, retry-then-succeed, quiet degrade across each transient failure mode (504/502/503, EADDRNOTAVAIL, ECONNRESET, axios timeout, TLS disconnect), throw-on-genuine (401/403/404, malformed 200), cancellation propagation, and the already-aborted guard.
`pnpm build && pnpm test`: all 1141 tests pass; `pnpm fix` clean.
Created with PostHog Code from an inbox report.