builders: bump max-silent-time to 4h#950

Open
booxter wants to merge 1 commit into NixOS:main from booxter:bump-silent-time

booxter commented Jan 31, 2026

After LTO was enabled for the nixpkgs Firefox darwin build, we started
hitting Hydra timeouts caused by long silence while linking XUL:

https://hydra.nixos.org/build/320487707
https://hydra.nixos.org/build/319560484

I believe this happens because nix-store does not propagate settings
from derivation meta to the remote nix-daemon after the initial
connection. (See NixOS/nix#15125 for more details and a potential fix.)

Until this issue is fixed in Nix, this patch bumps max-silent-time
to 4h for all builders, both darwin and linux. The 4h value comes from
the observation that all nixpkgs packages that do set meta.maxSilent
set it to 4h.
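
For illustration only, a builder-side bump could look roughly like the fragment below. This is a minimal sketch, not the diff from this PR: it assumes the builders are configured through the `nix.settings` module option, which the thread does not confirm. `max-silent-time` is the nix.conf setting named above, expressed in seconds.

```nix
# Hypothetical module fragment; the real change in this PR may be
# structured differently.
{
  nix.settings = {
    # Abort a build that produces no output for 4 hours.
    max-silent-time = 4 * 60 * 60; # 14400 seconds
  };
}
```

Because this applies to every build on the machine, it sidesteps the propagation problem described above rather than fixing it.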

Note: while we don't have an immediate need to bump the limit to 4h for
non-mac builders, it seems prudent to do so because at least some
meta.maxSilent settings in nixpkgs, including the one for Firefox,
originate from linux timeouts:

NixOS/nixpkgs#129212
NixOS/nixpkgs#129115
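
For context, the per-package setting referred to above is an attribute in a derivation's `meta`. The sketch below uses a made-up package (`example-slow-linker` and its trivial build are hypothetical, not the real Firefox expression) just to show where `maxSilent` lives and that 4h corresponds to 14400 seconds.

```nix
# Illustrative only: a made-up package, not taken from nixpkgs.
{ stdenv }:

stdenv.mkDerivation {
  pname = "example-slow-linker";
  version = "0.0.0";

  # No sources; the build just creates an empty output directory.
  dontUnpack = true;
  installPhase = "mkdir -p $out";

  meta = {
    # How long (in seconds) Hydra may wait without build output
    # before aborting the build.
    maxSilent = 4 * 60 * 60; # 14400 seconds
  };
}
```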

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Author

booxter commented Feb 2, 2026

@mweinelt FYI

Member

mweinelt commented Feb 2, 2026

Won't make this call alone. This will affect our darwin builders quite a bit.

Author

booxter commented Feb 7, 2026

To clarify, this patch affects both macOS and other builders. (The latter part can be reverted if it's a concern, but my thinking is that the reason for the macOS bump is not specific to macOS, even if it manifested itself on this platform.)

I don't think this change would affect builders much, since it only changes the silence timeout, and it does so relatively gradually. The regular default per-job timeout stays intact.

Regardless, I'm told that Hydra is switching to a new queue implementation that won't use the nix-store protocol to propagate derivations, and that hopefully won't be affected by the meta amnesia. If that happens soon, this patch won't be needed. I will leave the PR open in case you decide the Hydra update has to be postponed for some reason.

Member

mweinelt commented Feb 8, 2026

Yes, I expect the new queue-runner not to overload our machines as much as the current one does. That might alleviate the pain we have right now.

Author

booxter commented Mar 4, 2026

@mweinelt was the new Hydra queue management layer deployed? I still see timeouts that look like the silence timer not being respected: https://hydra.nixos.org/build/322983173

Member

Mic92 commented Mar 12, 2026

I don't think it's fully done yet.

@mweinelt
Member

What do y'all think about bumping the silence timeout?

Member

vcunat commented Mar 12, 2026

Are we sure that these Firefox (or some other) builds would significantly benefit from this? Could it be that the linking exhausted RAM and we got into some thrashing because of that?

Member

vcunat commented Mar 12, 2026

I'd say that one hour without any output is certainly suspicious in general. Deciding when to give up is alchemy, though, as either choice can lead to wasted resources.

Member

mweinelt commented Mar 12, 2026

Yeah, I'm worried that the process is just stalling because the host is thrashing, too. Mach builds are not generally quiet for longer durations except during the linker stage.
