Fix orphan MemcachedConnection after client being shutdown by shy-1234 · Pull Request #188 · Netflix/EVCache

shy-1234 · 2026-03-04T08:33:57Z

We found scenarios in which EVCacheClientPoolManager had been shut down, but an orphan connection thread was not shut down correctly. The sequence is:

1. Caller thread calls EVCacheClientPoolManager.shutdown()
2. asyncExecutor.shutdown() is called. This only prevents new tasks from being submitted; a refresh() task already in the executor queue (or currently running) continues to execute.
3. Caller thread then calls pool.shutdown() → serverGroupDisabled() → removes all server groups from the map and shuts down their connections (sets shutDown=true, running=false)
4. Meanwhile, on the executor thread, a previously scheduled refresh() runs (or was already running). refresh() has no _shutdown check. It:
   - Discovers instances from the provider
   - Creates brand new EVCacheClient instances (line 1081), each with a new EVCacheConnection thread
   - Calls setupNewClientsByServerGroup() (line 1093), which does map.put(sg, newClients) (line 877)
   - map.put() returns null as currentClients (because step 3 already removed the old ones)
   - Since currentClients == null, line 889 returns early, so the new clients are never shut down
5. The new EVCacheConnection thread is now running with shutDown=false and running=true
6. The bad node fails to connect → ConnectException → queueReconnect() → shutDown is false on this instance → node gets re-queued → infinite reconnect loop
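
The executor behavior this sequence depends on can be reproduced with the JDK alone. The following is a minimal sketch, not EVCache code: the class name, the latch, and the `refreshRan` flag are hypothetical stand-ins, and the second task only plays the role of a refresh() that was queued before shutdown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of EVCache.
class QueuedTaskAfterShutdown {

    /** Returns true if a task queued before shutdown() still ran afterwards. */
    static boolean queuedTaskStillRuns() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch gate = new CountDownLatch(1);
        AtomicBoolean refreshRan = new AtomicBoolean(false);

        // The first task occupies the only worker thread, so the next task stays queued.
        executor.execute(() -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stand-in for a refresh() that was scheduled before shutdown.
        executor.execute(() -> refreshRan.set(true));

        // shutdown() rejects new submissions but does not cancel queued tasks.
        executor.shutdown();
        gate.countDown(); // release the worker so the queued task can run

        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return refreshRan.get();
    }

    public static void main(String[] args) {
        System.out.println("queued task ran after shutdown(): " + queuedTaskStillRuns());
    }
}
```

ExecutorService.shutdownNow() would remove tasks that are still queued, but it only interrupts a task that is already running, so it would not by itself close the window described above.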

The fix is to guard refresh() with a _shutdown check. The check alone is not enough: refresh() could pass the _shutdown check right before shutdown is signaled, which would still result in the same issue. To avoid this, shutdown() needs to be marked synchronized, the same as refresh(). This way shutdown() and refresh() are mutually exclusive. Either:

- refresh() runs first to completion, then shutdown() acquires the lock and shuts down everything (including the newly created clients now in the map), or
- shutdown() runs first, empties the map, sets _shutdown = true, then refresh() acquires the lock, sees _shutdown == true, and returns immediately
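
The locking scheme can be sketched in a few lines. This is a simplified illustration, not the actual EVCacheClientPool code: the class `PoolSketch`, the string-valued map, and `liveClientCount()` are hypothetical, and "creating a client" is reduced to a map insert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the client pool; not the real EVCache class.
class PoolSketch {
    // Both fields are guarded by the instance monitor (synchronized methods).
    private final Map<String, String> clientsByServerGroup = new HashMap<>();
    private boolean shutdown = false;

    /** Simulates a refresh that would create clients for a server group. */
    synchronized void refresh() {
        if (shutdown) {
            return; // pool is shut down: do not create new clients
        }
        clientsByServerGroup.put("sg-1", "client-for-sg-1");
    }

    /** Marks the pool as shut down and drops every client it holds. */
    synchronized void shutdown() {
        shutdown = true;
        // In the real pool this is where each client's connection would be shut down.
        clientsByServerGroup.clear();
    }

    synchronized int liveClientCount() {
        return clientsByServerGroup.size();
    }

    public static void main(String[] args) {
        PoolSketch refreshFirst = new PoolSketch();
        refreshFirst.refresh();
        refreshFirst.shutdown();

        PoolSketch shutdownFirst = new PoolSketch();
        shutdownFirst.shutdown();
        shutdownFirst.refresh();

        System.out.println(refreshFirst.liveClientCount() + " " + shutdownFirst.liveClientCount());
    }
}
```

Because both methods hold the same monitor, refresh() can never observe a half-finished shutdown, and in either ordering no client outlives shutdown().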

respect shutdown in refresh

48739f6

shy-1234 requested review from Sunjeet and akashdeepgoel March 4, 2026 08:34

Sunjeet approved these changes Mar 5, 2026

shy-1234 merged commit 30df901 into master Mar 7, 2026
1 check passed

shy-1234 merged 1 commit into master from dev/sh/orphanconnection