fix: auth resilience -- graceful JWT fallback + placeholder detection (production outage postmortem)#264
Conversation
… postmortem tests
Production auth outage: SUPABASE_JWT_SECRET was a placeholder
('dev-secret-key') on Railway. PR OpenCodeIntel#253 switched JWT verification
from Supabase API calls to local decode. Local decode against a
wrong secret silently failed, causing 401 for all authenticated users.
Layer 1 -- Graceful fallback (services/auth.py):
When local JWT decode fails with InvalidTokenError, try Supabase
API verification before giving up. Wrong secret now degrades to
slow auth (API call per request) instead of broken auth (401).
Expired tokens and missing claims still fail immediately (no
point retrying those).
Layer 2 -- Startup validation:
- auth.py __init__: detects placeholder secrets (< 32 chars or
known placeholders like 'dev-secret-key'). Nulls them out and
logs error with instructions to get the real secret from
Supabase dashboard. Forces API verification path.
- startup_checks.py: warns if JWT secret is suspiciously short.
Layer 3 -- Postmortem tests (3 new):
- test_wrong_secret_falls_back_to_api: THE production scenario.
JWT secret is set but wrong. Verifies auth succeeds via API
fallback instead of returning 401.
- test_expired_token_does_not_fallback: expired is expired.
Verifies we don't waste an API call on genuinely expired tokens.
- test_placeholder_secret_nulled_at_startup: verifies
'dev-secret-key' gets detected and nulled at init time.
Updated 2 existing tests:
- test_wrong_secret_raises_error -> test_wrong_secret_falls_back_not_raises
- test_wrong_audience_raises_error -> test_wrong_audience_falls_back_not_raises
Both now verify fallback behavior instead of asserting errors.
292 tests pass (289 + 3 new). Zero flake8 errors.
|
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds runtime validation for SUPABASE_JWT_SECRET at startup and changes auth verification to prefer local JWT decode when a valid secret is configured, but fall back to Supabase API verification when local decoding fails or when the secret is missing/placeholder. Tests updated to cover fallback and expired-token behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant AuthService
participant LocalVerify as Local JWT Decoder
participant SupabaseAPI as Supabase API
Client->>AuthService: verify_jwt(token)
alt Valid JWT secret configured
AuthService->>LocalVerify: decode(token, secret)
alt decode succeeds
LocalVerify-->>AuthService: claims
AuthService-->>Client: user data
else decode fails (InvalidTokenError)
LocalVerify-->>AuthService: InvalidTokenError
AuthService->>SupabaseAPI: get_user(token)
SupabaseAPI-->>AuthService: user data
AuthService-->>Client: user data
else token expired / missing claim
LocalVerify-->>AuthService: TokenExpiredError / TokenMissingClaimError
AuthService-->>Client: raise error
end
else No/placeholder/too-short secret
AuthService->>SupabaseAPI: get_user(token)
SupabaseAPI-->>AuthService: user data
AuthService-->>Client: user data
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/services/auth.py (1)
112-129: Synchronous network call may block async endpoints.
_verify_via_apimakes a synchronous Supabase API call. Ifverify_jwtis called from an async request handler, this blocks the event loop during the fallback path.Given this is a fallback path (not the happy path), the impact is limited, but consider wrapping the call in
asyncio.to_thread()or making the method async for consistency with other I/O methods in this class (signup,login, etc.).♻️ Optional: Make async-safe
- def _verify_via_api(self, token: str) -> Dict[str, Any]: + async def _verify_via_api(self, token: str) -> Dict[str, Any]: """Fallback: verify via Supabase API call (requires network).""" + import asyncio try: - response = self.client.auth.get_user(token) + response = await asyncio.to_thread( + self.client.auth.get_user, token + )Note: This would require
verify_jwtto also become async or use a sync wrapper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/auth.py` around lines 112 - 129, _verify_via_api currently does a synchronous Supabase API call (self.client.auth.get_user) which can block the event loop; make it async-safe by running that blocking call in a thread: convert _verify_via_api to async and call the blocking call via asyncio.to_thread(lambda: self.client.auth.get_user(token)) (or keep it sync but move the to_thread call into verify_jwt if you prefer) and propagate the async change to verify_jwt (update its signature and callers) so exceptions and the existing return shape are preserved; keep the existing exception handling (AuthenticationError, InvalidTokenError) and logging behavior.backend/config/startup_checks.py (1)
56-65: Consider aligning placeholder detection with auth.py for consistency.The startup check only validates length (
< 32), butauth.py(lines 33-36) additionally checks for known placeholder strings like"super-secret-jwt-token-with-at-least-32-characters"which would pass the length check here.For defense-in-depth consistency, consider extracting the placeholder detection logic into a shared helper or at minimum adding the same string checks here.
♻️ Optional: Add placeholder string check
# Validate JWT secret looks real (not a placeholder) jwt_secret = os.getenv("SUPABASE_JWT_SECRET", "") - if jwt_secret and len(jwt_secret) < 32: + KNOWN_PLACEHOLDERS = { + "dev-secret-key", "secret", "your-jwt-secret", "change-me", + "test-secret-key", "super-secret-jwt-token-with-at-least-32-characters", + } + if jwt_secret and (len(jwt_secret) < 32 or jwt_secret in KNOWN_PLACEHOLDERS): logger.error( "SUPABASE_JWT_SECRET is too short -- likely a placeholder. "🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/config/startup_checks.py` around lines 56 - 65, The startup check using jwt_secret currently only enforces length (< 32) but misses known placeholder values used in auth.py; update the logic in startup_checks.py to mirror auth.py's placeholder detection by rejecting the specific placeholder string "super-secret-jwt-token-with-at-least-32-characters" (and any other hardcoded placeholders used elsewhere) or, better, extract the placeholder test into a shared helper (e.g., is_placeholder_jwt_secret(secret)) and call it from startup_checks.py (use jwt_secret variable) so the logger.error branch is triggered for both short secrets and recognized placeholder strings; ensure the logger.error call (logger.error(..., secret_length=...)) remains and is used when the helper indicates a placeholder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/test_jwt_local_decode.py`:
- Around line 216-225: Remove the dead `token` variable created by the
_make_token call: delete the lines that assign token = _make_token({"sub":
"expired-user"}, secret=JWT_SECRET) since the test uses `expired_token` produced
by pyjwt.encode; ensure no other references to `token` remain and run tests to
confirm nothing else depends on _make_token in this test.
---
Nitpick comments:
In `@backend/config/startup_checks.py`:
- Around line 56-65: The startup check using jwt_secret currently only enforces
length (< 32) but misses known placeholder values used in auth.py; update the
logic in startup_checks.py to mirror auth.py's placeholder detection by
rejecting the specific placeholder string
"super-secret-jwt-token-with-at-least-32-characters" (and any other hardcoded
placeholders used elsewhere) or, better, extract the placeholder test into a
shared helper (e.g., is_placeholder_jwt_secret(secret)) and call it from
startup_checks.py (use jwt_secret variable) so the logger.error branch is
triggered for both short secrets and recognized placeholder strings; ensure the
logger.error call (logger.error(..., secret_length=...)) remains and is used
when the helper indicates a placeholder.
In `@backend/services/auth.py`:
- Around line 112-129: _verify_via_api currently does a synchronous Supabase API
call (self.client.auth.get_user) which can block the event loop; make it
async-safe by running that blocking call in a thread: convert _verify_via_api to
async and call the blocking call via asyncio.to_thread(lambda:
self.client.auth.get_user(token)) (or keep it sync but move the to_thread call
into verify_jwt if you prefer) and propagate the async change to verify_jwt
(update its signature and callers) so exceptions and the existing return shape
are preserved; keep the existing exception handling (AuthenticationError,
InvalidTokenError) and logging behavior.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/config/startup_checks.pybackend/services/auth.pybackend/tests/test_jwt_local_decode.py
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
Incident
Production auth outage: all authenticated API calls returning 401 ('Invalid token or API key'). Dashboard shows 'No repositories yet' for logged-in users.
Root Cause
PR #253 (OPE-75) switched JWT verification from Supabase API calls to local HS256 decode. SUPABASE_JWT_SECRET on Railway was 'some_key' (a placeholder). Local decode against wrong secret failed silently. The code had no fallback. So if local decode failed, it assumed the token was bad instead of trying API verification.
Fix
Layer 1: Graceful fallback
When local JWT decode fails with InvalidTokenError, try Supabase API verification before giving up. Wrong/mismatched secret now degrades to 'slow auth' (API call per request) instead of 'broken auth' (401 for everyone).
Expired tokens and missing claims still fail immediately. No point retrying those via API.
Layer 2: Startup validation
auth.py __init__: detects placeholder secrets (< 32 chars or known values like 'some_key'). Nulls them out, logs error with Supabase dashboard instructions. Forces API path automatically.startup_checks.py: warns at boot if JWT secret is suspiciously short.Layer 3: Postmortem tests (3 new)
test_wrong_secret_falls_back_to_api: THE exact production scenario. Wrong secret, auth still works via API.test_expired_token_does_not_fallback: expired tokens fail fast, no wasted API calls.test_placeholder_secret_nulled_at_startup: 'some_key' detected and neutralized at init.Immediate Action Required
Update SUPABASE_JWT_SECRET on Railway with the real value from: Supabase Dashboard -> Settings -> API -> JWT Secret
After this PR merges, even a wrong secret will degrade gracefully instead of breaking auth.
Summary by CodeRabbit
Bug Fixes
Tests