Skip to content

sbftcpconnection fixes#23

Open
lmcnulty4 wants to merge 1 commit intoblu-corner:masterfrom
lmcnulty4:bugfix/sbftcpconnection-races
Open

sbftcpconnection fixes#23
lmcnulty4 wants to merge 1 commit intoblu-corner:masterfrom
lmcnulty4:bugfix/sbftcpconnection-races

Conversation

@lmcnulty4
Copy link

@lmcnulty4 lmcnulty4 commented Feb 17, 2021

Removed mQueued field from sbfTcpConnection. This was being toggled in
different threads without synchronisation so was undefined behaviour. I
don't believe it provided any functionality since before enqueuing we
call bufferevent_disable and after dequeuing we call bufferevent_enable.

Removed mDestroyed for the same reason.

Reworked how sbfTcpConnection_destroy works. Previously, it would call
bufferevent_free on mEvent. However, other threads (in particular, the
work queue thread on to which read events are dispatched) could be using
the mEvent, causing use after frees (manifesting as hangs, as freed
mutexes inside libevent would be locked).

Now when calling sbfTcpConnection_destroy, all callbacks are cleared
(preventing further events being fired by libevent, hence preventing
further messages being enqueued on to the work queue), and then a final
item is enqueued on the work queue, which frees the resources of the
sbfTcpConnection.

To ensure the bufferevent is always alive when the dequeuing is done,
when enqueuing it we now increment the bufferevent's reference count,
using bufferevent_incref, and after dequeuing it we decrement it using
bufferevent_decref (with no incrementing done for the final destroy
event enqueued obviously).

I believe (though not certain) that clearing the callbacks is thread
safe, so no further events will fire after they're cleared (even on
other threads). If this is true then we can get rid of the ref counting
altogether (including the mRefCount) and just free things only in the
final destroy callback which is now enqueued when calling
sbfTcpConnection_destroy.

One other change is to no longer free the bufferevent whenever an error
occurs and we enqueue an error work item on the queue. I think freeing
the bufferevent here complicates things and doesn't provide anything
useful, it leaves the sbfTcpConnection in a weird state of being
half freed and half not freed, and there's no way to create a new
bufferevent for an existing sbfTcpConnection.

One consequence of the destroy rework is that the sbfQueue used by the
sbfTcpConnection must now be freed after the sbfTcpConnection has been
freed. This makes more intuitive sense than the opposite (since the
sbfTcpConnection uses the queue, it should be freed first), but its now
definitely required (since we enqueue when destroying which, if the
sbfQueue were already destroyed, would fail the assert when enqueuing)
as opposed to before where the queue could be destroyed first and almost
never would throw an assertion error (though it would if libevent tried
to enqueue something, if there was e.g. activity on a socket, so a
sporadic assertion failure would happen before if the queue was
destroyed first).

Removed mQueue field from sbfTcpConnection. This was being toggled in
different threads without synchronisation so was undefined behaviour. I
don't believe it provided any functionality since before enqueuing we
call bufferevent_disable and after dequeuing we call bufferevent_enable.

Removed mDestroyed for the same reason.

Reworked how sbfTcpConnection_destroy works. Previously, it would call
bufferevent_free on mEvent. However, other threads (in particular, the
work queue thread on to which read events are dispatched) could be using
the mEvent, causing use after frees (manifesting as hangs, as freed
mutexes inside libevent would be locked).

Now when calling sbfTcpConnection_destroy, all callbacks are cleared
(preventing further events being fired by libevent, hence preventing
further messages being enqueued on to the work queue), and then a final
item is enqueued on the work queue, which frees the resources of the
sbfTcpConnection.

To ensure the bufferevent is always alive when the dequeuing is done,
when enqueuing it we now increment the bufferevent's reference count,
using bufferevent_incref, and after dequeuing it we decrement it using
bufferevent_decref (with no incrementing done for the final destroy
event enqueued obviously).

I believe (though not certain) that clearing the callbacks is thread
safe, so no further events will fire after they're cleared (even on
other threads). If this is true then we can get rid of the ref counting
altogether (including the mRefCount) and just free things only in the
final destroy callback which is now enqueued when calling
sbfTcpConnection_destroy.

One other change is to no longer free the bufferevent whenever an error
occurs and we enqueue an error work item on the queue. I think freeing
the bufferevent here complicates things and doesn't provide anything
useful, it leaves the sbfTcpConnection in a weird state of being
half freed and half not freed, and there's no way to create a new
bufferevent for an existing sbfTcpConnection.

One consequence of the destroy rework is that the sbfQueue used by the
sbfTcpConnection must now be freed after the sbfTcpConnection has been
freed. This makes more intuitive sense than the opposite (since the
sbfTcpConnection uses the queue, it should be freed first), but its now
definitely required (since we enqueue when destroying which, if the
sbfQueue were already destroyed, would fail the assert when enqueuing)
as opposed to before where the queue could be destroyed first and almost
never would throw an assertion error (though it would if libevent tried
to enqueue something, if there was e.g. activity on a socket, so a
sporadic assertion failure would happen before if the queue was
destroyed first).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant