Summary
#1471 added lazy-retry for the JWKS fetch, so a failing fetch at boot no longer caches None indefinitely. But there is no cache-invalidation strategy once a successful JWKS is cached — if Auth0 rotates signing keys between fetches, tokens signed with the new key will fail validation until the process restarts or a manual cache-clear runs.
What goes wrong
- After a successful fetch, _jwks_cache[issuer] is held for the process lifetime.
- Auth0 rotates signing keys periodically; PyJWKClient has a 5-minute internal cache, but the outer _jwks_cache wrapper doesn't expire successful fetches at all.
- Result: a rotation event causes every token signed with the new kid to 401 on this API until pods restart.
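The failure mode above can be reproduced with a deliberately simplified model. This is only a sketch: `published_kids`, `cached_kids`, and `is_accepted` are hypothetical stand-ins, not the project's code, and no network or PyJWT calls are made.

```python
# Hypothetical model of an outer cache that never expires a successful fetch.
published_kids = {"kid-old"}  # stand-in for what the issuer currently publishes

_jwks_cache: dict[str, frozenset[str]] = {}  # outer cache, no expiry


def cached_kids(issuer: str) -> frozenset[str]:
    """Return the cached key IDs, taking a snapshot only on the first call."""
    if issuer not in _jwks_cache:
        _jwks_cache[issuer] = frozenset(published_kids)
    return _jwks_cache[issuer]


def is_accepted(issuer: str, kid: str) -> bool:
    """A token is accepted only if its kid is in the cached snapshot."""
    return kid in cached_kids(issuer)


issuer = "https://tenant.example/"  # placeholder issuer

accepted_before = is_accepted(issuer, "kid-old")          # populates the cache
published_kids.add("kid-new")                             # issuer rotates keys
accepted_after_rotation = is_accepted(issuer, "kid-new")  # stale snapshot

_jwks_cache.clear()                                       # models a pod restart
accepted_after_restart = is_accepted(issuer, "kid-new")
```

In this model the new kid is rejected until the cache is cleared, which matches the "401 until pods restart" behaviour described above.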
Suggested fix
Add a success-cache TTL consistent with rotation cadence (Auth0 default is days; conservative is 1 hour):
    import time

    from jwt import PyJWKClient

    _JWKS_SUCCESS_TTL_SECONDS = 3600

    def _fetch_jwks(issuer: str) -> PyJWKClient | None:
        cached = _jwks_cache.get(issuer)
        now = time.monotonic()
        if cached is not None:
            fetched_at, client = cached
            # Reuse a successful fetch only while it is younger than the TTL.
            if client is not None and now - fetched_at < _JWKS_SUCCESS_TTL_SECONDS:
                return client
            # success older than TTL → re-fetch
        # … rest of fetch logic
Alternatively, because PyJWKClient already handles per-kid lookup and keeps its own short cache, let it handle rotation internally: simplify the outer wrapper to cache only the client instance (not the JWKS itself) for the process lifetime. The client refetches the JWKS when a token's kid is not found.
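A minimal sketch of that alternative, under stated assumptions: `get_jwks_client`, `_jwks_clients`, and `_default_client_factory` are hypothetical names, and the JWKS URL is assumed to follow the Auth0 convention of `/.well-known/jwks.json` under the issuer. As far as I know, constructing a PyJWKClient does not make a network request, so there should be no failed fetch to cache at this layer; worth confirming against the pinned PyJWT version.

```python
from typing import Any, Callable

# Hypothetical outer cache: one long-lived client per issuer, no JWKS stored here.
_jwks_clients: dict[str, Any] = {}


def _default_client_factory(issuer: str) -> Any:
    # Imported lazily so the wrapper can be exercised without PyJWT installed.
    from jwt import PyJWKClient

    # Assumption: Auth0-style JWKS location relative to the issuer URL.
    return PyJWKClient(f"{issuer.rstrip('/')}/.well-known/jwks.json")


def get_jwks_client(
    issuer: str,
    factory: Callable[[str], Any] = _default_client_factory,
) -> Any:
    """Return the cached client for this issuer, creating it on first use."""
    client = _jwks_clients.get(issuer)
    if client is None:
        client = factory(issuer)
        _jwks_clients[issuer] = client
    return client
```

Callers would then resolve the key per request, for example with `get_jwks_client(issuer).get_signing_key_from_jwt(token).key`, leaving key caching and refresh to the client. The `factory` parameter exists only so the wrapper can be tested with a fake client.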
Severity
Medium: no active incident, but rotation-day outages are silent and recover only after pods restart.
Relates to
Follow-up to #1471 (merged) and #1468 (merged).