I think there's a logic bug in the implementation of Heartbeat._bind_socket. The current code looks like this:
    max_attempts = 1 if self.original_port else 100
    for attempt in range(max_attempts):
        try:
            self._try_bind_socket()
        except zmq.ZMQError as ze:
            if attempt == max_attempts - 1:
                raise
            # Raise if we have any error not related to socket binding
            if ze.errno != errno.EADDRINUSE and ze.errno != win_in_use:
                raise
            # Raise if we have any error not related to socket binding
            if self.original_port == 0:
                self.pick_port()
            else:
                raise
If self._try_bind_socket fails with an exception, we try again; that part makes sense to me. But if self._try_bind_socket succeeds ..., we also enter the next iteration of the for loop and try again. I think there should be a return or a break at that point.
When I test this on my machine (in the case where self.original_port is falsy, so max_attempts is 100), the effect is that 50 different ports are tried, twice each: the first attempt of each pair succeeds, and the second one fails (because the port is already in use). The 100th attempt is the failing half of the last pair, so the error is re-raised. The end result is that the run method is abandoned and the heartbeat isn't ever properly started.
Changing 100 to 101 (or 99) allows the heartbeat to start. That's obviously not the right fix. :-)
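To make the behaviour concrete, here is a minimal simulation of the loop. It is not the real Heartbeat class: FakeHeartbeat, its port counter, the `fixed` flag and the `run` helper are all made up for illustration, and OSError stands in for zmq.ZMQError so that no ZeroMQ install is needed. With `fixed=True` the loop returns after the first successful bind, which is roughly the change I have in mind.

```python
import errno


class FakeHeartbeat:
    """Hypothetical stand-in for Heartbeat; no real sockets are involved."""

    def __init__(self, fixed, max_attempts=100):
        self.fixed = fixed              # True: return after a successful bind
        self.max_attempts = max_attempts
        self.original_port = 0          # 0 means "pick a port for me"
        self.bound = set()              # ports already bound by this object
        self.log = []                   # (port, succeeded) for every bind attempt
        self._next_port = 50000         # arbitrary starting point for fake ports
        self.pick_port()

    def pick_port(self):
        self.port = self._next_port
        self._next_port += 1

    def _try_bind_socket(self):
        # Binding a port that is already bound fails with EADDRINUSE.
        if self.port in self.bound:
            self.log.append((self.port, False))
            raise OSError(errno.EADDRINUSE, "Address already in use")
        self.bound.add(self.port)
        self.log.append((self.port, True))

    def _bind_socket(self):
        for attempt in range(self.max_attempts):
            try:
                self._try_bind_socket()
            except OSError as e:  # zmq.ZMQError in the real code
                if attempt == self.max_attempts - 1:
                    raise
                if e.errno != errno.EADDRINUSE:
                    raise
                if self.original_port == 0:
                    self.pick_port()
                else:
                    raise
            else:
                if self.fixed:
                    return  # proposed change: stop once the bind succeeds


def run(fixed, max_attempts=100):
    """Return (bind finished without error, attempts made, distinct ports tried)."""
    hb = FakeHeartbeat(fixed, max_attempts)
    try:
        hb._bind_socket()
        ok = True
    except OSError:
        ok = False
    return ok, len(hb.log), len({port for port, _ in hb.log})


print(run(fixed=False))                    # (False, 100, 50)
print(run(fixed=False, max_attempts=101))  # (True, 101, 51)
print(run(fixed=True))                     # (True, 1, 1)
```

The first result matches what I see: 100 attempts over 50 ports, ending in an exception. The second shows why 101 "works": the loop happens to end on a successful attempt.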