Skip to content

Fix orphan MemcachedConnection after client being shutdown#188

Merged
shy-1234 merged 1 commit intomasterfrom
dev/sh/orphanconnection
Mar 7, 2026
Merged

Fix orphan MemcachedConnection after client being shutdown#188
shy-1234 merged 1 commit intomasterfrom
dev/sh/orphanconnection

Conversation

@shy-1234
Copy link
Contributor

@shy-1234 shy-1234 commented Mar 4, 2026

We found scenarios when the EVCacheClientPoolManager was shutdown, but somehow an orphan connection thread was not being shutdown correctly. The Sequence is like:

  1. Caller thread calls EVCacheClientPoolManager.shutdown()
  2. asyncExecutor.shutdown() is called — this only prevents new tasks from being submitted. A refresh() task already in the executor queue (or currently running) continues to execute.
  3. Caller thread then calls pool.shutdown() → serverGroupDisabled() → removes all server groups from the map and shuts down their connections (sets shutDown=true, running=false)
  4. Meanwhile on the executor thread, a previously scheduled refresh() runs (or was already running). refresh() has no _shutdown check. It:
    - Discovers instances from the provider
    - Creates brand new EVCacheClient instances (line 1081), each with a new EVCacheConnection thread
    - Calls setupNewClientsByServerGroup() (line 1093) which does map.put(sg, newClients) (line 877)
    - map.put() returns null as currentClients (because step 3 already removed the old ones)
    - Since currentClients == null, line 889 returns early — no shutdown of the new clients happens
  5. The new EVCacheConnection thread is now running with shutDown=false and running=true
  6. The bad node fails to connect → ConnectException → queueReconnect() → shutDown is false on this instance → node gets re-queued → infinite reconnect loop

The fix is to guard refresh() on _shutdown. One thing to note is that we could have gone through the _shutdown check right before the shutdown is signaled, which would still result in the same issue. To avoid this, we need to mark shutdown as synchronized the same as refresh(). This way shutdown() and refresh() are mutually exclusive. Either:

  • refresh() runs first to completion, then shutdown() acquires the lock and shuts down everything (including the newly created clients now in the map)
  • shutdown() runs first, empties the map, sets _shutdown = true, then refresh() acquires the lock, sees _shutdown == true, and returns immediately

@shy-1234 shy-1234 merged commit 30df901 into master Mar 7, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants