Skip to content

rtc: fail fast instead of hanging when FFI is used across fork()#740

Open
theomonnom wants to merge 1 commit into
mainfrom
theo/ffi-fork-guard
Open

rtc: fail fast instead of hanging when FFI is used across fork()#740
theomonnom wants to merge 1 commit into
mainfrom
theo/ffi-fork-guard

Conversation

@theomonnom

Copy link
Copy Markdown
Member

The native runtime (tokio worker/IO threads in liblivekit_ffi) does not survive fork(), and re-initializing only resets the event callback rather than rebuilding the runtime. A child process that inherited an already-initialized FfiClient singleton therefore hangs on every FFI request.

Guard FfiClient.instance with the owning pid and raise a clear RuntimeError instead of hanging. This does not affect spawn/forkserver (each process initializes its own FFI); it only turns the fork-after-init footgun into an actionable error.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@davidzhao davidzhao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment thread livekit-rtc/livekit/rtc/_ffi_client.py Outdated
assert FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
ffi = FfiClient._owned_instance()
if ffi is not None:
assert ffi._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will assert be stripped out if our customers use some optimization flags ?

if that gets stripped out, I will assume ffi._ffi_lib.livekit_ffi_drop_handle will never be called.

@xianshijing-lk xianshijing-lk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm assuming you will address the comment about 'assert'

@theomonnom theomonnom force-pushed the theo/ffi-fork-guard branch from e192426 to d06cfdd Compare July 2, 2026 00:52
devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Comment on lines 83 to +87
def dispose(self) -> None:
if self.handle != INVALID_HANDLE and not self._disposed:
self._disposed = True
assert FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
dropped = FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
assert dropped

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Fork-safety guard missing on native handle cleanup, allowing child processes to hang or crash on exit

Inherited native handles are released into the dead runtime (livekit_ffi_drop_handle at livekit-rtc/livekit/rtc/_ffi_client.py:86) without the same process-ID check added to requests, so forked child processes can hang or abort during garbage collection.

Impact: A forked child process may hang or crash at exit when Python's garbage collector cleans up inherited handle objects.

Mechanism: FfiHandle.dispose() bypasses the fork guard added to request()

The PR adds a PID check in request() (livekit-rtc/livekit/rtc/_ffi_client.py:274) and in the atexit handler (livekit-rtc/livekit/rtc/_ffi_client.py:265) to prevent calling into the native FFI library from a forked child process. However, FfiHandle.dispose() at livekit-rtc/livekit/rtc/_ffi_client.py:83-87 also calls into the native library via livekit_ffi_drop_handle, and has no such guard.

When a parent process creates livekit objects (Room, tracks, audio streams, etc.), each creates FfiHandle instances. After fork(), the child inherits these objects. When the child exits, Python's GC invokes __del__ (livekit-rtc/livekit/rtc/_ffi_client.py:76-77) on these inherited handles, which calls dispose(), which calls livekit_ffi_drop_handle on the dead native runtime — the exact scenario the PR is designed to prevent.

Multiple callers use FfiHandle extensively: room.py:550, audio_stream.py:138, video_stream.py:70, track.py:35, participant.py:134, and many others.

Suggested change
def dispose(self) -> None:
if self.handle != INVALID_HANDLE and not self._disposed:
self._disposed = True
assert FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
dropped = FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
assert dropped
def dispose(self) -> None:
if self.handle != INVALID_HANDLE and not self._disposed:
self._disposed = True
if FfiClient.instance._pid != os.getpid():
return # native runtime does not survive fork()
dropped = FfiClient.instance._ffi_lib.livekit_ffi_drop_handle(ctypes.c_uint64(self.handle))
assert dropped
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@theomonnom theomonnom force-pushed the theo/ffi-fork-guard branch from 42a8d60 to 3a092f3 Compare July 2, 2026 01:03
The native runtime (tokio worker/IO threads in liblivekit_ffi) does not
survive fork() and cannot be rebuilt by re-initializing, so a child that
inherited an initialized FfiClient would hang on every request. Guard
request() with the owning pid and raise a clear error instead. The atexit
dispose handler is likewise skipped in fork children (it needs the runtime
and would hang at exit). Also move the drop_handle side effect out of an
assert so it is not stripped under python -O.
@theomonnom theomonnom force-pushed the theo/ffi-fork-guard branch from 3a092f3 to 389b573 Compare July 2, 2026 01:26
@xianshijing-lk

Copy link
Copy Markdown
Contributor

oh, one question, is there any way to add a test for the fork behavior ? so that we can prevent regression

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants