Summary
In the current leader-election example, serverId may be generated from a deterministic source such as:
String serverId = Integer.toHexString(random.nextInt());
where random is initialised using this.hashCode():
private Random random = new Random(this.hashCode());
This can cause the same serverId to be reused across application restarts.
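The effect of a repeated seed can be seen in isolation with a minimal sketch. It assumes the seed happens to be the same on two runs; the constant 42 is a hypothetical stand-in for a this.hashCode() value that repeats, and the class and method names are illustrative only, not taken from Master.java.

```java
import java.util.Random;

public class SeededIdDemo {
    // Mirrors the pattern from the issue: a Random built from a given seed,
    // then one nextInt() call rendered as hex.
    static String serverIdFor(long seed) {
        Random random = new Random(seed);
        return Integer.toHexString(random.nextInt());
    }

    public static void main(String[] args) {
        long seed = 42L; // hypothetical: stands in for a repeated this.hashCode() value
        String firstRun = serverIdFor(seed);
        String secondRun = serverIdFor(seed);
        // java.util.Random is deterministic for a given seed, so both IDs match.
        System.out.println(firstRun + " / " + secondRun
                + " equal=" + firstRun.equals(secondRun));
    }
}
```

Whether this.hashCode() actually repeats between JVM runs depends on the runtime, so this only shows that the ID is no more unique than the seed.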
If the process restarts quickly (before the previous ZooKeeper session has expired), the new instance may incorrectly believe it is already the master.
This leads to an inconsistent application state where:
• ZooKeeper still considers the old session alive,
• the ephemeral /master node still belongs to that session,
• but the new process thinks it owns the mastership because the stored serverId matches its own. ‼️
Steps to reproduce
ZooKeeper binds ephemeral nodes (like /master) to sessions, not to processes.
When Process A becomes the master:
- It opens session S1.
- It creates /master (ephemeral) with data X.
- It crashes, but ZooKeeper still keeps session S1 alive until the session timeout expires.
If the restarted instance, Process B, opens a new session S2 and also gets serverId = "X":
• It calls checkMaster().
• It sees /master with data X.
• It mistakenly concludes “I am the master.”
But in reality:
• /master still belongs to the dead process’s session S1.
• B does not actually hold the lock.
• There is no real leader during this window.
This is a subtle but serious correctness problem.
Impact: The new instance may skip leader election entirely because it believes it has already won.
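The scenario above can be modelled without a ZooKeeper server. The sketch below is a hypothetical in-memory stand-in: MasterNode, looksLikeMaster and actuallyOwns are names invented for illustration and are not part of the example's code or the ZooKeeper API. It contrasts a data-only comparison, as described for checkMaster(), with the session that really owns the node.

```java
import java.util.Objects;

public class FalseMasterDemo {
    // Hypothetical stand-in for the /master znode: its data and the owning session.
    static final class MasterNode {
        final String data;
        final long ownerSessionId;

        MasterNode(String data, long ownerSessionId) {
            this.data = data;
            this.ownerSessionId = ownerSessionId;
        }
    }

    // Data-only check: compares the stored serverId with our own.
    static boolean looksLikeMaster(MasterNode node, String myServerId) {
        return node != null && Objects.equals(node.data, myServerId);
    }

    // Ground truth in this model: the node belongs to our session.
    static boolean actuallyOwns(MasterNode node, long mySessionId) {
        return node != null && node.ownerSessionId == mySessionId;
    }

    public static void main(String[] args) {
        long s1 = 1L; // session of crashed Process A, not yet expired
        long s2 = 2L; // session of restarted Process B
        MasterNode master = new MasterNode("X", s1); // created by A

        System.out.println("B believes it is master: " + looksLikeMaster(master, "X"));
        System.out.println("B actually owns /master: " + actuallyOwns(master, s2));
    }
}
```

Running it prints true for the belief and false for the ownership, which is the window with no real leader described above.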
Expected Behaviour
A restarted process must always generate a new unique contender identity so that:
• If /master exists and contains a different ID, the new process correctly concludes that the old session is still considered alive.
• If /master is eventually deleted after session expiration, the new process can safely run for master again.
Proposed Fix
Replace any deterministic or repeatable serverId with a per-process unique value, for example:
String serverId = UUID.randomUUID().toString();
This ensures:
• Each process run has a unique identity.
• checkMaster() can correctly distinguish between:
  • “I created /master”
  • “The previous instance created /master, and ZooKeeper has not expired its session yet.”
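As a rough illustration of the proposed fix, the sketch below generates one ID per simulated process run and applies the same data comparison. The class name and the createdByMe helper are hypothetical; only UUID.randomUUID() comes from the proposal itself.

```java
import java.util.UUID;

public class UniqueIdDemo {
    // Per-process identity, as proposed in the fix.
    static String newServerId() {
        return UUID.randomUUID().toString();
    }

    // Hypothetical helper: the data comparison a checkMaster()-style method performs.
    static boolean createdByMe(String storedId, String myServerId) {
        return myServerId.equals(storedId);
    }

    public static void main(String[] args) {
        String previousRunId = newServerId(); // written to /master by the old instance
        String currentRunId = newServerId();  // generated after the restart

        // The restarted instance no longer mistakes the old node for its own.
        System.out.println("restart sees own node: "
                + createdByMe(previousRunId, currentRunId));
        // An instance that really created the node still recognises it.
        System.out.println("creator sees own node: "
                + createdByMe(currentRunId, currentRunId));
    }
}
```

With random UUIDs a collision between two runs is not strictly impossible, only negligibly unlikely, which is sufficient for the distinction checkMaster() needs to make here.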
Code reference: zookeeper-book-example/src/main/java/org/apache/zookeeper/book/Master.java, line 107 in 135b79b