Close device socket in dispatcherClean to unblock EventReader thread #105
Draft
alicespetma-stack wants to merge 1 commit into luxonis:master from
When DispatcherWaitEventComplete times out and DispatcherClean is called, the EventReader thread remains stuck in recv() because the TCP socket is still open. This leaks a thread and socket for each failed connection attempt, and prevents clean retry. Fix: close the device socket at the start of dispatcherClean, before cleaning up events and semaphores. The shutdown(SHUT_RDWR) inside closeDeviceFd causes recv() to return an error, letting the EventReader exit its loop via the resetXLink flag. The dispatcherDeviceFdDown flag prevents double-close when dispatcherReset (which also closes the FD) calls dispatcherClean. We inline the flag check rather than calling dispatcherDeviceFdDown() to avoid deadlock on the non-recursive reset_mutex.
Summary
When DispatcherWaitEventComplete times out (e.g. after the timeout fixes in PR #104), DispatcherClean is called to clean up the dispatcher state. However, the EventReader thread remains stuck in recv() because the TCP socket is never closed during cleanup. This leaks a thread and a socket for each failed connection attempt and prevents a clean retry.

Problem
The cleanup flow today:
1. DispatcherWaitEventComplete times out and returns an error.
2. DispatcherClean → dispatcherClean sets resetXLink = 1 and destroys the semaphores.
3. The EventReader thread would check resetXLink at the top of its loop...
4. ...but it never gets there, because recv() is blocked forever.

The EventReader's recv() call (in tcpipPlatformRead) has no SO_RCVTIMEO, so it blocks until data arrives or the connection breaks. If the remote device is unresponsive, neither happens.

Fix
Close the device socket at the start of dispatcherClean, before cleaning up events and semaphores. The shutdown(SHUT_RDWR) inside closeDeviceFd makes the blocked recv() return immediately, allowing the EventReader thread to exit its loop.

The dispatcherDeviceFdDown flag prevents a double close when dispatcherReset (which also closes the FD) calls dispatcherClean. The flag check is inlined rather than calling dispatcherDeviceFdDown() to avoid a deadlock on the non-recursive reset_mutex, which dispatcherReset already holds when it calls dispatcherClean.

Why not SO_RCVTIMEO?
SO_RCVTIMEO was considered, but it would affect all recv() calls on the socket, including those on healthy connections where large data transfers may legitimately take a long time. Closing the socket on cleanup is more surgical: it only interrupts recv() when we already know the connection is dead (a timeout has occurred).

Files Modified
src/shared/XLinkDispatcher.c: dispatcherClean

Companion PR
This works together with PR #104 (configurable timeouts for XLinkConnect and XLinkOpenStream). PR #104 enables a timeout-based error return on the main thread; this PR ensures the EventReader thread also cleans up.

Test Plan
- XLinkConnect timeout: verify the EventReader thread exits (no leaked threads)
- XLinkConnect retry succeeds on a healthy device
- dispatcherReset path still works correctly (no double close)