Skip to content

fix: handshake worker starve when loading external certificate#369

Merged
mholt merged 2 commits intocaddyserver:masterfrom
bndcts:master
Feb 17, 2026
Merged

fix: handshake worker starve when loading external certificate#369
mholt merged 2 commits intocaddyserver:masterfrom
bndcts:master

Conversation

@bndcts
Copy link
Contributor

@bndcts bndcts commented Feb 13, 2026

addresses this issue #368 (comment)

  • Fix goroutine leak when using external certificate managers (e.g. Tailscale get_certificate)
  • External certs are not added to certmagic's in-memory cache, so the previous leader/waiter pattern caused waiters to recursively miss the cache, spawn new leaders, and accumulate
    goroutines indefinitely under concurrent load
  • Share the leader's result directly with waiting goroutines via a new certLoadWaiter struct, avoiding the cache round-trip for externally-managed certificates

}

// no other goroutine is currently trying to load this cert
waiter = &certLoadWaiter{done: make(chan struct{})}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pool these maybe to avoid some allocations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, the channel get's closed to wake all waiters at once and can't really be reused and needs to be freshly allocated everytime, so pooling would only save the small struct around it.
Putting it back in the pool is also not straightforward I think, because the waiters still hold references after the leader is done, so we'd probably need some kind of reference counting to know when it's safe.

It feels like a lot of complexity for one allocation per contended name, but happy to reconsider if you see it differently.

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks like a reasonable approach to me.

We can see if the allocations (as mentioned above) become an issue via profiling.

@mholt mholt merged commit 8a2d459 into caddyserver:master Feb 17, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants