Skip to content

fix: auth resilience -- graceful JWT fallback + placeholder detection (production outage postmortem)#264

Merged
DevanshuNEU merged 2 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:fix/auth-jwt-fallback
Feb 28, 2026
Merged

fix: auth resilience -- graceful JWT fallback + placeholder detection (production outage postmortem)#264
DevanshuNEU merged 2 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:fix/auth-jwt-fallback

Conversation

@DevanshuNEU

@DevanshuNEU DevanshuNEU commented Feb 28, 2026

Copy link
Copy Markdown
Collaborator

Incident

Production auth outage: all authenticated API calls returning 401 ('Invalid token or API key'). Dashboard shows 'No repositories yet' for logged-in users.

Root Cause

PR #253 (OPE-75) switched JWT verification from Supabase API calls to local HS256 decode. SUPABASE_JWT_SECRET on Railway was 'some_key' (a placeholder). Local decode against wrong secret failed silently. The code had no fallback. So if local decode failed, it assumed the token was bad instead of trying API verification.

Fix

Layer 1: Graceful fallback

When local JWT decode fails with InvalidTokenError, try Supabase API verification before giving up. Wrong/mismatched secret now degrades to 'slow auth' (API call per request) instead of 'broken auth' (401 for everyone).

Expired tokens and missing claims still fail immediately. No point retrying those via API.

Layer 2: Startup validation

  • auth.py __init__: detects placeholder secrets (< 32 chars or known values like 'some_key'). Nulls them out, logs error with Supabase dashboard instructions. Forces API path automatically.
  • startup_checks.py: warns at boot if JWT secret is suspiciously short.

Layer 3: Postmortem tests (3 new)

  • test_wrong_secret_falls_back_to_api: THE exact production scenario. Wrong secret, auth still works via API.
  • test_expired_token_does_not_fallback: expired tokens fail fast, no wasted API calls.
  • test_placeholder_secret_nulled_at_startup: 'some_key' detected and neutralized at init.

Immediate Action Required

Update SUPABASE_JWT_SECRET on Railway with the real value from: Supabase Dashboard -> Settings -> API -> JWT Secret

After this PR merges, even a wrong secret will degrade gracefully instead of breaking auth.

Summary by CodeRabbit

  • Bug Fixes

    • Added startup validation to detect and ignore weak/placeholder JWT secrets, preventing insecure local verification.
    • Token verification now falls back to server-side API verification when local decode fails (except for expired tokens), improving reliability and diagnostics.
    • Improved logging for authentication verification failures.
  • Tests

    • Updated tests to cover API-fallback scenarios, placeholder-secret behavior, and strict expired-token handling.

… postmortem tests

Production auth outage: SUPABASE_JWT_SECRET was a placeholder
('dev-secret-key') on Railway. PR OpenCodeIntel#253 switched JWT verification
from Supabase API calls to local decode. Local decode against a
wrong secret silently failed, causing 401 for all authenticated users.

Layer 1 -- Graceful fallback (services/auth.py):
When local JWT decode fails with InvalidTokenError, try Supabase
API verification before giving up. Wrong secret now degrades to
slow auth (API call per request) instead of broken auth (401).
Expired tokens and missing claims still fail immediately (no
point retrying those).

Layer 2 -- Startup validation:
- auth.py __init__: detects placeholder secrets (< 32 chars or
  known placeholders like 'dev-secret-key'). Nulls them out and
  logs error with instructions to get the real secret from
  Supabase dashboard. Forces API verification path.
- startup_checks.py: warns if JWT secret is suspiciously short.

Layer 3 -- Postmortem tests (3 new):
- test_wrong_secret_falls_back_to_api: THE production scenario.
  JWT secret is set but wrong. Verifies auth succeeds via API
  fallback instead of returning 401.
- test_expired_token_does_not_fallback: expired is expired.
  Verifies we don't waste an API call on genuinely expired tokens.
- test_placeholder_secret_nulled_at_startup: verifies
  'dev-secret-key' gets detected and nulled at init time.

Updated 2 existing tests:
- test_wrong_secret_raises_error -> test_wrong_secret_falls_back_not_raises
- test_wrong_audience_raises_error -> test_wrong_audience_falls_back_not_raises
Both now verify fallback behavior instead of asserting errors.

292 tests pass (289 + 3 new). Zero flake8 errors.
@vercel

vercel Bot commented Feb 28, 2026

Copy link
Copy Markdown

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Feb 28, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9515fda and 89e0553.

📒 Files selected for processing (1)
  • backend/tests/test_jwt_local_decode.py

📝 Walkthrough

Walkthrough

Adds runtime validation for SUPABASE_JWT_SECRET at startup and changes auth verification to prefer local JWT decode when a valid secret is configured, but fall back to Supabase API verification when local decoding fails or when the secret is missing/placeholder. Tests updated to cover fallback and expired-token behavior.

Changes

Cohort / File(s) Summary
Startup Validation
backend/config/startup_checks.py
Added a runtime check that detects placeholder/too-short SUPABASE_JWT_SECRET values (<32 chars) and logs detailed errors; does not change control flow but may nullify secret usage downstream.
Auth Service Logic
backend/services/auth.py
On init, treat short/placeholder secrets as absent (null) to force API verification. verify_jwt now: attempt local decode if a valid secret exists; on InvalidTokenError log and fall back to API verification; still propagate TokenExpiredError and missing-claim errors without fallback.
Tests
backend/tests/test_jwt_local_decode.py
Updated tests to expect API fallback when local decode fails (wrong secret/audience), added tests ensuring expired tokens still raise, and added test that startup nullifies placeholder secrets to force API path.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AuthService
    participant LocalVerify as Local JWT Decoder
    participant SupabaseAPI as Supabase API

    Client->>AuthService: verify_jwt(token)
    alt Valid JWT secret configured
        AuthService->>LocalVerify: decode(token, secret)
        alt decode succeeds
            LocalVerify-->>AuthService: claims
            AuthService-->>Client: user data
        else decode fails (InvalidTokenError)
            LocalVerify-->>AuthService: InvalidTokenError
            AuthService->>SupabaseAPI: get_user(token)
            SupabaseAPI-->>AuthService: user data
            AuthService-->>Client: user data
        else token expired / missing claim
            LocalVerify-->>AuthService: TokenExpiredError / TokenMissingClaimError
            AuthService-->>Client: raise error
        end
    else No/placeholder/too-short secret
        AuthService->>SupabaseAPI: get_user(token)
        SupabaseAPI-->>AuthService: user data
        AuthService-->>Client: user data
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I sniffed a short secret, hiding in the log,
I nudged it gently—now the API does the jog.
When local decode trips, I hop to the cloud,
Bringing users back safe, quiet and proud. 🥕🔐

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main changes: auth resilience improvements with graceful JWT fallback and placeholder detection in response to a production outage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/services/auth.py (1)

112-129: Synchronous network call may block async endpoints.

_verify_via_api makes a synchronous Supabase API call. If verify_jwt is called from an async request handler, this blocks the event loop during the fallback path.

Given this is a fallback path (not the happy path), the impact is limited, but consider wrapping the call in asyncio.to_thread() or making the method async for consistency with other I/O methods in this class (signup, login, etc.).

♻️ Optional: Make async-safe
-    def _verify_via_api(self, token: str) -> Dict[str, Any]:
+    async def _verify_via_api(self, token: str) -> Dict[str, Any]:
         """Fallback: verify via Supabase API call (requires network)."""
+        import asyncio
         try:
-            response = self.client.auth.get_user(token)
+            response = await asyncio.to_thread(
+                self.client.auth.get_user, token
+            )

Note: This would require verify_jwt to also become async or use a sync wrapper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/auth.py` around lines 112 - 129, _verify_via_api currently
does a synchronous Supabase API call (self.client.auth.get_user) which can block
the event loop; make it async-safe by running that blocking call in a thread:
convert _verify_via_api to async and call the blocking call via
asyncio.to_thread(lambda: self.client.auth.get_user(token)) (or keep it sync but
move the to_thread call into verify_jwt if you prefer) and propagate the async
change to verify_jwt (update its signature and callers) so exceptions and the
existing return shape are preserved; keep the existing exception handling
(AuthenticationError, InvalidTokenError) and logging behavior.
backend/config/startup_checks.py (1)

56-65: Consider aligning placeholder detection with auth.py for consistency.

The startup check only validates length (< 32), but auth.py (lines 33-36) additionally checks for known placeholder strings like "super-secret-jwt-token-with-at-least-32-characters" which would pass the length check here.

For defense-in-depth consistency, consider extracting the placeholder detection logic into a shared helper or at minimum adding the same string checks here.

♻️ Optional: Add placeholder string check
     # Validate JWT secret looks real (not a placeholder)
     jwt_secret = os.getenv("SUPABASE_JWT_SECRET", "")
-    if jwt_secret and len(jwt_secret) < 32:
+    KNOWN_PLACEHOLDERS = {
+        "dev-secret-key", "secret", "your-jwt-secret", "change-me",
+        "test-secret-key", "super-secret-jwt-token-with-at-least-32-characters",
+    }
+    if jwt_secret and (len(jwt_secret) < 32 or jwt_secret in KNOWN_PLACEHOLDERS):
         logger.error(
             "SUPABASE_JWT_SECRET is too short -- likely a placeholder. "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/config/startup_checks.py` around lines 56 - 65, The startup check
using jwt_secret currently only enforces length (< 32) but misses known
placeholder values used in auth.py; update the logic in startup_checks.py to
mirror auth.py's placeholder detection by rejecting the specific placeholder
string "super-secret-jwt-token-with-at-least-32-characters" (and any other
hardcoded placeholders used elsewhere) or, better, extract the placeholder test
into a shared helper (e.g., is_placeholder_jwt_secret(secret)) and call it from
startup_checks.py (use jwt_secret variable) so the logger.error branch is
triggered for both short secrets and recognized placeholder strings; ensure the
logger.error call (logger.error(..., secret_length=...)) remains and is used
when the helper indicates a placeholder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/tests/test_jwt_local_decode.py`:
- Around line 216-225: Remove the dead `token` variable created by the
_make_token call: delete the lines that assign token = _make_token({"sub":
"expired-user"}, secret=JWT_SECRET) since the test uses `expired_token` produced
by pyjwt.encode; ensure no other references to `token` remain and run tests to
confirm nothing else depends on _make_token in this test.

---

Nitpick comments:
In `@backend/config/startup_checks.py`:
- Around line 56-65: The startup check using jwt_secret currently only enforces
length (< 32) but misses known placeholder values used in auth.py; update the
logic in startup_checks.py to mirror auth.py's placeholder detection by
rejecting the specific placeholder string
"super-secret-jwt-token-with-at-least-32-characters" (and any other hardcoded
placeholders used elsewhere) or, better, extract the placeholder test into a
shared helper (e.g., is_placeholder_jwt_secret(secret)) and call it from
startup_checks.py (use jwt_secret variable) so the logger.error branch is
triggered for both short secrets and recognized placeholder strings; ensure the
logger.error call (logger.error(..., secret_length=...)) remains and is used
when the helper indicates a placeholder.

In `@backend/services/auth.py`:
- Around line 112-129: _verify_via_api currently does a synchronous Supabase API
call (self.client.auth.get_user) which can block the event loop; make it
async-safe by running that blocking call in a thread: convert _verify_via_api to
async and call the blocking call via asyncio.to_thread(lambda:
self.client.auth.get_user(token)) (or keep it sync but move the to_thread call
into verify_jwt if you prefer) and propagate the async change to verify_jwt
(update its signature and callers) so exceptions and the existing return shape
are preserved; keep the existing exception handling (AuthenticationError,
InvalidTokenError) and logging behavior.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6563d1 and 9515fda.

📒 Files selected for processing (3)
  • backend/config/startup_checks.py
  • backend/services/auth.py
  • backend/tests/test_jwt_local_decode.py

Comment thread backend/tests/test_jwt_local_decode.py Outdated
@vercel

vercel Bot commented Feb 28, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
opencodeintel Ignored Ignored Preview Feb 28, 2026 0:56am

@DevanshuNEU DevanshuNEU merged commit 0011fff into OpenCodeIntel:main Feb 28, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant