Skip to content

Fix test flakes from port allocation races#847

Merged
tnull merged 1 commit intolightningdevkit:mainfrom
joostjager:fix-port-allocation
Mar 26, 2026
Merged

Fix test flakes from port allocation races#847
tnull merged 1 commit intolightningdevkit:mainfrom
joostjager:fix-port-allocation

Conversation

@joostjager
Copy link
Contributor

@joostjager joostjager commented Mar 26, 2026

No description provided.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Mar 26, 2026

🎉 This PR is now ready for review!
Please choose at least one reviewer by assigning them on the right bar.
If no reviewers are assigned within 10 minutes, I'll automatically assign one.
Once the first reviewer has submitted a review, a second will be assigned if required.

@tnull tnull requested review from tnull and removed request for tnull March 26, 2026 08:47
@joostjager joostjager force-pushed the fix-port-allocation branch 2 times, most recently from b068c07 to efd2b64 Compare March 26, 2026 09:47
@joostjager joostjager changed the title Track actual bound listening addresses in Node Fix test flakes from port allocation races Mar 26, 2026
@joostjager joostjager requested a review from tnull March 26, 2026 10:16
@joostjager joostjager marked this pull request as ready for review March 26, 2026 10:16
Copy link
Collaborator

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think we considered that before but you were of the opinion it wouldn't work? Any case, open to test whether it actually makes a difference (I'm not positive the flakes are actually stemming from this at all).

Just one nit.

@joostjager joostjager force-pushed the fix-port-allocation branch from efd2b64 to 39c79bc Compare March 26, 2026 13:39
@joostjager
Copy link
Contributor Author

Okay, I think we considered that before but you were of the opinion it wouldn't work? Any case, open to test whether it actually makes a difference (I'm not positive the flakes are actually stemming from this at all).

I think I suggested it in #718 (comment), but not sure if I was of the opinion that it wouldn't work.

But yes, let's run with this for a while, see if it makes a difference. My naive repro prs failed to repro, so no quick way to test this I think.

Integration tests allocate listening ports for test nodes by binding
to 127.0.0.1:0, reading the OS-assigned port, then immediately
dropping the socket. Later, Node::start() tries to bind to that same
port. This has two problems:

Parallel test collisions: with 35 tests running concurrently, each
creating 2-4 nodes with 2 ports each, the OS can reassign a freed
port to another test's node before the original node calls start().
This is a classic TOCTOU race. In CI, it caused ~50% of Rust test
runs to fail with InvalidSocketAddress, always exactly one random
test per run.

Restart instability: when a test stops and restarts a node, the port
is released during stop() and must be re-acquired during start().
Another test's node can grab it in between. The peer store still has
the old address, so auto-reconnection also fails.

Both problems are inherent to any scheme that allocates a port and
later releases it. The only way to guarantee a port stays yours is
to never release it, or to never share the port space.

A deterministic atomic counter starting at a fixed base port (20000)
solves both problems: each fetch_add returns a unique value, so no
two nodes in the same process ever get the same port, and ports are
stable across restarts because the same node keeps the same config.
There is no external contention because CI runners are isolated.

AI tools were used in preparing this commit.
@joostjager joostjager force-pushed the fix-port-allocation branch from 39c79bc to a51c2bd Compare March 26, 2026 13:48
@tnull tnull merged commit 4c6dfec into lightningdevkit:main Mar 26, 2026
19 checks passed
@joostjager joostjager self-assigned this Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants