Skip to content

evict .exec variant when its digest is evicted#2503

Open
MarcusSorealheis wants to merge 1 commit into
TraceMachina:mainfrom
MarcusSorealheis:bugfix/#2474-exec-accounting
Open

evict .exec variant when its digest is evicted#2503
MarcusSorealheis wants to merge 1 commit into
TraceMachina:mainfrom
MarcusSorealheis:bugfix/#2474-exec-accounting

Conversation

@MarcusSorealheis

@MarcusSorealheis MarcusSorealheis commented Jul 4, 2026

Copy link
Copy Markdown
Member

Description

Evict .exec variant when the digest is evicted. Pretty simple..

Fixes #2474

Type of change

Please delete options that aren't relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

bazel test

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@vercel

vercel Bot commented Jul 4, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
nativelink Ready Ready Preview, Comment Jul 5, 2026 3:27am
nativelink-aidm Ready Ready Preview, Comment Jul 5, 2026 3:27am

Request Review

@corcillo

corcillo commented Jul 5, 2026

Copy link
Copy Markdown
Collaborator

@MarcusSorealheis wondering thinking between your approach (deleting associated exec files on the call back post deletion, thus not counting in overall running size count), vs keeping track of the exec files in the size count? Basically if you wait to do the exec eviction like this the size count wont technically be accurate as it will be missing those exec sizes.

@corcillo

corcillo commented Jul 5, 2026

Copy link
Copy Markdown
Collaborator

Also, if a digest gets evicted while its .exec copy is still being created, the eviction's cleanup runs too early before there's anything to delete? Then the copy would get published with no owner and still leak into memory until restart. Not sure if this will happen a lot based on timing or if its an edge case?

@MarcusSorealheis

Copy link
Copy Markdown
Member Author

It's a good question.. I did consider giving .exec its own tracked size in evicting_map and went with the callback/tether approach instead, for a few reasons.

The first is, EvictingMap calls len() once at insert to add to sum_store_size, and again at removal to subtract it (evicting_map.rs:150, :645-656). Sothere's no resize hook in between. If a digest's len() grew after insert because an executable variant got materialized later, removal would subtract a bigger number than was ever added, underflowing sum_store_size. This is the core of the issue as I understand it today. @dkostyrev let me know if you have thoughts here.

The actual failure that @dkostyrev pointed out is unbounded disk growt:

roughly doubling the on-disk footprint and driving the host past max_bytes until it runs out of disk.

and here, deleting a digest's .exec sibling via the existing remove-callback hook whenever that digest is evicted from the primary map bounds total disk to ~2x max_bytes absolute worst-case instead of unbounded.

Also, I did not want to add any additional config options or accounting if not absolutely necessary.

@corcillo corcillo closed this Jul 5, 2026
@corcillo corcillo reopened this Jul 5, 2026
@corcillo

corcillo commented Jul 5, 2026

Copy link
Copy Markdown
Collaborator

@MarcusSorealheis Ok so for the race condition I mentioned, can you do something to recheck membership after create_executable_variant publishes the variant, so if the digest was evicted mid-copy and the exec file is now orphaned, we follow up by deleting it and returning NotFound?

@MarcusSorealheis

Copy link
Copy Markdown
Member Author

can you do something to recheck membership after create_executable_variant publishes the variant

yes, good thinking. hold my beer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Executable-variant cache (.exec) is not tracked by max_bytes and never evicted at runtime

2 participants