Infinite retry loop when certificate file is missing during ARI renewal #366

@ivan901229

What is your question?

I am running Caddy v2.8.4 with caddy-storage-redis. I am encountering a situation where certificate renewal gets stuck in an infinite loop due to a specific error handling behavior in the maintenance routine.

My question is:

  1. Is there logic in CertMagic that prevents it from switching to "Obtain" (issue a new cert) mode when the "Renew" check fails with "file does not exist"?

  2. Is there any configuration parameter (e.g., in the global options or tls directive) that can force Caddy to attempt obtaining a new certificate if the existing one is missing from storage during a renewal check?

The Scenario:

  1. Trigger: CertMagic correctly identifies that a certificate needs renewal based on the ARI (ACME Renewal Info) window.
  2. Storage Check: When it tries to load the stored certificate to check validity/expiration, the storage backend returns an error indicating the file does not exist. (I verified manually that the certificate is indeed missing from Redis.)
  3. The Loop: Instead of treating this as "certificate is missing -> obtain a new one immediately", Caddy seems to treat it as a storage check failure and retries the check every 10 minutes, indefinitely.
  4. Result: It never initiates the actual ACME Obtain flow, eventually leading to expiry.

Observed Behavior vs. Expected Behavior:

  • Current: Storage returns "file does not exist" -> Log warning -> Retry check later (Infinite Loop).
  • Expected: Storage returns "file does not exist" -> Assume no cert -> Trigger immediate issuance (Obtain).

What have you already tried?

Environment:

  • Caddy Version: v2.8.4 (built with xcaddy)
  • Storage: github.com/pberkel/caddy-storage-redis (Azure Cache for Redis)

Logs:
The following logs appear every 10 minutes, showing the loop:

// 1. Caddy detects renewal is needed (ARI Window)
{"level":"info","ts":1766246914.403092,"logger":"tls","msg":"certificate needs renewal based on ARI window","subjects":["api.arisan.io"],"expiration":1768707020,"ari_cert_id":"jw0TovYuftFQbDMYOF1ZjiNykco.BmPJu4XVb2bwq0ivvMrCC-b-","next_ari_update":1766261915.2208467,"renew_check_interval":600}

// 2. Fails to read the file from storage, returning "file does not exist"
{"level":"warn","ts":1766246914.4040234,"logger":"tls.cache.maintenance","msg":"error while checking if stored certificate is also expiring soon","identifiers":["api.arisan.io"],"error":"file does not exist"}

// 3. Queues for renewal (but seems to fail or loop back to check because the file is "missing")
{"level":"info","ts":1766246914.4049966,"logger":"tls.cache.maintenance","msg":"certificate expires soon; queuing for renewal","identifiers":["api.arisan.io"],"remaining":2460105.595003929}

Workaround:
Restarting the Caddy process immediately resolves the issue. Upon restart, CertMagic recognizes the certificate is missing and handles the lifecycle correctly.

Include any other information or discussion.

I originally reported this to the caddy-storage-redis maintainer. They suggested that since the Storage interface is a simple CRUD layer, the decision logic on how to handle a "file does not exist" error during a renewal attempt is handled by CertMagic.

Bonus: What do you use this package for, and does it help you?

We use Caddy/CertMagic to manage automatic HTTPS for a high-traffic production API gateway.
