Commit 72ca8f9

cli/interactive_tests: wait for liveness replication before node shutdowns
This commit extends the fix from #151573 to cover all node shutdown operations in test_demo_node_cmds.tcl.

Previously, the test only waited for the liveness range to have 5 voting replicas before decommissioning a node, but not before shutdown operations. Without this check, if the liveness range has fewer than 3 voting replicas when a node is shut down, and that node is one of them, the cluster can lose quorum on the liveness range. This causes subsequent queries to system tables (particularly crdb_internal.kv_node_liveness) to hang, resulting in test timeouts.

The fix adds the liveness range replication check in two places:
1. Before shutting down node 3 (the first shutdown operation)
2. Before shutting down node 6 (after adding a new node)

Release note: None
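The "fewer than 3 voting replicas" threshold in the message follows from majority-quorum arithmetic. Here is a minimal sketch, assuming a simple majority quorum over a fixed voter set and no rebalancing while the node is down. The function names `quorum` and `survives_shutdown` are hypothetical and are not part of CockroachDB or this test.

```python
def quorum(voters: int) -> int:
    """Smallest majority of a voter set of the given size (assumed simple majority)."""
    return voters // 2 + 1


def survives_shutdown(voters: int, stopped: int = 1) -> bool:
    """True if a range with `voters` voting replicas keeps a majority
    after `stopped` of those voters go offline."""
    return voters - stopped >= quorum(voters)


# Compare the replication factors mentioned in the commit message.
for n in (1, 2, 3, 5):
    print(f"voters={n} quorum={quorum(n)} survives one shutdown: {survives_shutdown(n)}")
```

Under these assumptions, a range with 1 or 2 voters cannot lose any of them, while a range with 3 voters tolerates one loss and a range with 5 voters tolerates two. That is consistent with the test waiting for 5 voters before each shutdown.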
1 parent 1c3e1bb commit 72ca8f9

1 file changed (+40, -0)
pkg/cli/interactive_tests/test_demo_node_cmds.tcl

Lines changed: 40 additions & 0 deletions
@@ -36,6 +36,26 @@ send "\\demo restart 2\r"
 eexpect "node 2 is already running"
 eexpect "defaultdb>"
 
+# Wait for the liveness range to have the default 5 voters. If its replication
+# factor is too low, shutting down the node below can cause it to lose quorum
+# and stall queries to system tables (example: #147867).
+set timeout 2
+set stmt "select range_id, array_length(voting_replicas,1) from crdb_internal.ranges where range_id=2;\r"
+send $stmt
+expect {
+    "2 | 5" {
+        puts "\rliveness range has 5 voters"
+    }
+    timeout {
+        puts "\rliveness range does not yet have 5 voters"
+        sleep 2
+        send $stmt
+        exp_continue
+    }
+}
+# Reset timeout back to 45 to match common.tcl.
+set timeout 45
+
 # Shut down a separate node.
 send "\\demo shutdown 3\r"
 eexpect "node 3 has been shutdown"
@@ -148,6 +168,26 @@ eexpect "5 | region=us-west1,az=b"
 eexpect "6 | region=ca-central,zone=a"
 eexpect "defaultdb>"
 
+# Wait for the liveness range to have the default 5 voters. If its replication
+# factor is too low, shutting down the node below can cause it to lose quorum
+# and stall queries to system tables (example: #147867).
+set timeout 2
+set stmt "select range_id, array_length(voting_replicas,1) from crdb_internal.ranges where range_id=2;\r"
+send $stmt
+expect {
+    "2 | 5" {
+        puts "\rliveness range has 5 voters"
+    }
+    timeout {
+        puts "\rliveness range does not yet have 5 voters"
+        sleep 2
+        send $stmt
+        exp_continue
+    }
+}
+# Reset timeout back to 45 to match common.tcl.
+set timeout 45
+
 # Shut down the newly created node.
 send "\\demo shutdown 6\r"
 set timeout 120
