GUACAMOLE-2238: Multiple guacamole-server code paths do not correctly…#643
GUACAMOLE-2238: Multiple guacamole-server code paths do not correctly…#643bbennett-ks wants to merge 1 commit intoapache:staging/1.6.1from
Conversation
|
Looks pretty good to me, @bbennett-ks. Just going to see if @mike-jumper has any comments or concerns before merging this. I'm wondering if this will help resolve some of the other bugs we have lingering, like 1998, which deals with hung processes and the like. |
2f630cc to
db0f828
Compare
|
An instance of this issue (GUACAMOLE-2239) was identified as the core cause of one intermittent issue. |
mike-jumper
left a comment
There was a problem hiding this comment.
It'd be good to be a bit more DRY. Otherwise, LGTM.
src/common/io.c
Outdated
| do { | ||
| bytes_written = write(fd, bytes, length); | ||
| } while (bytes_written < 0 && errno == EINTR); |
There was a problem hiding this comment.
There's a lot of this pattern repeated. If possible, I think we should pull anything like:
do {
retval = do_thing(params);
} while (retval < 0 && errno == EINTR);into libguac functions/macros that can be reused instead of duplicated.
There was a problem hiding this comment.
There's a lot of this pattern repeated. If possible, I think we should pull anything like:
do { retval = do_thing(params); } while (retval < 0 && errno == EINTR);into libguac functions/macros that can be reused instead of duplicated.
Does this approach look good?
db0f828 to
3076f6d
Compare
|
After looking at kernel & glib source, I have a few more changes I'd like to make. I've changed the PR back to a draft. |
3076f6d to
2f7ea03
Compare
Summary
This PR hardens blocking I/O and wait paths against signal interruption (EINTR).
When a signal is delivered (e.g. when a child process exits), some blocking syscalls can return early with -1 with errno = EINTR . This is not a hard failure: the syscall should be retried.
This PR adds EINTR retry loops around blocking calls in various guacamole-server. Equivalent handling for the Windows equivalent was also added.
Without retry, signal (async) delivery can cause a variety of errors: network connections errors, read/write failures, premature timeouts...
select Handling
int select(int nfds, fd_set *_Nullable restrict readfds, fd_set *_Nullable restrict writefds, fd_set *_Nullable restrict exceptfds, struct timeval *_Nullable restrict timeout);Special care has been taken with the select system call, as it has parameters which are modified by the system: how these parameters are handled on retry must be considered. Based on the Linux man page:
On error, -1 is returned, and is set to indicate the error; the file descriptor sets are unmodified, and timeout becomes undefined.select retry is based on this.