perf(mem): avoid eager scratch re-zero on restore #1605
Closed
simongdavies wants to merge 2 commits into
…pervisors

On snapshot restore the guest scratch is re-zeroed to preserve cross-restore isolation. On Linux/KVM SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on hypervisors that map guest memory up-front without host-MM notification (Windows/WHP, MSHV) it degrades to an eager O(size) memset -- ~2.37s for a 256 MiB scratch on WHP guest-shared memory, paid on every restore.

Instead, swap in a fresh demand-zero scratch section and remap it into the guest (the same path already taken when the scratch size changes): the OS zero-fills lazily on fault and the old section is released once it has been unmapped from the VM. Restore stays hermetic while the eager memset becomes an O(1) allocation -- ~150-160x faster on WHP (2.37s -> 14.6ms at 256 MiB). Selection mirrors zero()'s own madvise gate (fresh iff the in-place zero would be eager).

Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).
Member
Author
Closing this: the approach does not work; it is more expensive than just resetting the entire scratch to zero.
On snapshot restore, the guest scratch region is re-zeroed to reset state. On Linux/KVM, SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on mshv (which maps guest memory up-front without host-MM notification) and on Windows/WHP it degrades to an O(size) memset.
For Windows and mshv, this change instead swaps in a fresh demand-zero scratch region and remaps it into the guest (the same path already taken when the scratch size changes). The OS zero-fills lazily on fault, and the old region is released once it has been unmapped from the VM. Restore still resets the scratch state, while the eager memset becomes an O(1) allocation.
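For readers unfamiliar with the two mechanisms, here is a minimal sketch of the difference, using a plain `Vec<u8>` as a stand-in for the scratch region. The `Scratch` type and its methods are hypothetical and are not Hyperlight's API; the real change operates on mapped shared memory and also has to unmap and remap the region in the VM, which this sketch leaves out. It relies on `vec![0u8; n]` requesting already-zeroed memory from the allocator, which for large sizes is commonly served by pages the OS zero-fills on first touch (an assumption about the allocator, not a guarantee).

```rust
/// Hypothetical stand-in for the guest scratch region.
struct Scratch {
    bytes: Vec<u8>,
}

impl Scratch {
    fn new(size: usize) -> Self {
        Self { bytes: vec![0u8; size] }
    }

    /// Eager reset: writes every byte of the existing buffer, O(size).
    fn zero_in_place(&mut self) {
        self.bytes.fill(0);
    }

    /// Swap reset: installs a freshly allocated zeroed buffer of the same
    /// size and returns the old one, so the caller decides when to release
    /// it (in the PR, only after it has been unmapped from the VM).
    fn swap_fresh(&mut self) -> Vec<u8> {
        let size = self.bytes.len();
        std::mem::replace(&mut self.bytes, vec![0u8; size])
    }
}
```

Both methods leave the scratch all-zero and the same size; they differ only in whether the old buffer is rewritten or replaced. Which one is cheaper in practice depends on the cost of the remap that the swap path requires.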
Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).
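A strategy resolver of the kind described could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the `ZeroStrategy` enum, the function names, and the accepted values `"in-place"` and `"fresh"` are all hypothetical, since the description only names the HYPERLIGHT_SCRATCH_ZERO_STRATEGY variable. The default follows the stated rule: use a fresh region only when the in-place zero would be eager.

```rust
/// Hypothetical: how the scratch region is reset on restore.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZeroStrategy {
    /// Zero the existing mapping in place.
    InPlace,
    /// Swap in a fresh demand-zero region and remap it into the guest.
    FreshRegion,
}

/// Resolve the strategy from an optional override string and a flag saying
/// whether the in-place zero takes the lazy madvise path.
/// The accepted override values are assumptions made for this sketch.
fn resolve_strategy(override_value: Option<&str>, in_place_is_lazy: bool) -> ZeroStrategy {
    match override_value
        .map(|v| v.trim().to_ascii_lowercase())
        .as_deref()
    {
        Some("in-place") => ZeroStrategy::InPlace,
        Some("fresh") => ZeroStrategy::FreshRegion,
        // Unset or unrecognized: fresh iff the in-place zero would be eager.
        _ if in_place_is_lazy => ZeroStrategy::InPlace,
        _ => ZeroStrategy::FreshRegion,
    }
}

/// Read the override from the environment and resolve the strategy.
fn strategy_from_env(in_place_is_lazy: bool) -> ZeroStrategy {
    let raw = std::env::var("HYPERLIGHT_SCRATCH_ZERO_STRATEGY").ok();
    resolve_strategy(raw.as_deref(), in_place_is_lazy)
}
```

Keeping `resolve_strategy` free of environment access makes it straightforward to unit test every combination of override and platform flag, while `strategy_from_env` stays a thin wrapper.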