fix: prevent stream localId collisions from destroying live stream state#102
Conversation
AdbConnection allocated stream IDs with random.nextInt() and never checked
them against streams already open on the connection. MessageQueue keys all
stream state on that raw ID, and stopListening removes it, so when two
concurrently-open streams collided on an ID, the first one to close wiped
the survivor's queues entry and its next take() failed with the fatal
IllegalStateException("Not listening for localId") even though it was
never closed.
- Allocate IDs sequentially (AtomicInteger), like AOSP's adb client, so
collisions within a connection are impossible.
- startListening now fails fast on a duplicate ID instead of silently
sharing state between two streams.
- A missing queues entry in poll() now surfaces as the graceful
AdbStreamClosed (which callers already handle) instead of
IllegalStateException.
What are the chances of this actually happening? |
|
A pure random collision is rare. With k streams open, each new Digging further into the source while writing this up: the frequent path to this exact exception doesn't need a collision at all. I've updated the PR description to lay out both paths. The change covers both: sequential IDs make collisions impossible, and converting the missing-entry case to (Pedro and Claudro thought through this) |
What this looks like in production
DuckDuckGo (and 20+ other orgs over the last 14 days, internal ref: MA-4060) run Maestro flows with
androidWebViewHierarchy: devtools, which fetches WebView hierarchies over Chrome DevTools using short-lived streams (HTTP + websocket via OkHttp) on the sameDadbconnection the driver uses for taps. Their flows fail on native taps that already matched: the tap is issued, and the command then dies with:For the affected flows this failed 4 out of 4 attempts across 4 different hosts. The exception escapes the devtools client's
catch (e: IOException)(it is anIllegalStateException) and fails the customer's command.Root cause
MessageQueuekeys all stream state on the rawlocalIdint, andstopListening(localId)removes it without holding any lock. There are two ways a live stream's state gets destroyed:1. Close-during-read race (no collision needed, the frequent path).
AdbStreamImpl.nextMessagealready handles "stream closed while reading" gracefully:But if another thread calls
close()(->stopListening) while a reader is blocked insidetake(), the nextpoll()findsqueues[localId] == nulland throwsIllegalStateException("Not listening for localId")instead ofAdbStreamClosed. Not anIOException, so it escapes the catch above and every caller's IO error handling. OkHttp closes sockets from watchdog/cancellation/pool-eviction threads while its reader threads are blocked, so the devtools path hits this interleaving routinely.2. localId collision (rare, the latent design flaw).
AdbConnection.newId()israndom.nextInt()with no uniqueness check. If two concurrently-open streams draw the same ID,startListeningsilently no-ops (putIfAbsent) and both streams share one state entry; the first to close wipes the survivor's state and its nexttake()throws the same fatal exception.Changes
AtomicInteger), like AOSP's adb client. Collisions within a connection become impossible.startListeningfails fast on a duplicate ID instead of silently sharing one state entry between two streams.poll()now throwsAdbStreamClosedinstead ofIllegalStateException. This makes the close-during-read race resolve through the existingIOExceptionhandling as a normal end-of-stream, which is what "the stream was closed under you" should mean.Testing
The new tests in
MessageQueueTestfail on master and pass with this change. The fullMessageQueueTestsuite (including the existing concurrency stress tests) passes. Device-dependent tests (DadbTest) were not run locally and rely on CI.