
perf(mem): avoid eager scratch re-zero on restore#1605

Closed
simongdavies wants to merge 2 commits into
hyperlight-dev:mainfrom
simongdavies:perf/restore-fresh-scratch

@simongdavies simongdavies commented Jul 1, 2026

Member

On snapshot restore, the guest scratch region is re-zeroed to reset state. On Linux/KVM, SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on mshv (which maps guest memory up-front without host-MM notification) and on Windows/WHP it degrades to an O(size) memset.

This change swaps in a fresh demand-zero scratch region and remaps it into the guest (the same path already taken when the scratch size changes) on Windows and mshv. The OS zero-fills lazily on fault, and the old region is released once it has been unmapped from the VM. Restore still resets the scratch, while the eager memset is replaced by an O(1) allocation.
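As a rough illustration of the two reset mechanisms contrasted above, here is a minimal user-space sketch. It is an analogy only: `Scratch`, `reset_in_place`, and `reset_by_swap` are hypothetical names, a `Vec<u8>` stands in for the guest-mapped shared memory, and the hypervisor unmap/remap step (which has its own cost) is not modelled at all.

```rust
/// Hypothetical stand-in for a guest scratch region.
/// This is NOT Hyperlight's SharedMemory type.
struct Scratch {
    buf: Vec<u8>,
}

impl Scratch {
    fn new(size: usize) -> Self {
        // `vec![0u8; n]` asks the allocator for already-zeroed memory; for
        // large sizes this is often served by fresh pages that the OS
        // zero-fills lazily on first touch.
        Scratch { buf: vec![0u8; size] }
    }

    /// Eager reset: writes every byte, so the cost grows with the size.
    fn reset_in_place(&mut self) {
        self.buf.fill(0);
    }

    /// Swap reset: allocate a fresh zeroed region, install it, then release
    /// the old one. In the real change this is where the old region would be
    /// unmapped from the VM and the new one mapped in (not shown here).
    fn reset_by_swap(&mut self) {
        let fresh = vec![0u8; self.buf.len()];
        let old = std::mem::replace(&mut self.buf, fresh);
        drop(old); // old region is freed only after the new one is in place
    }

    /// True if every byte of the scratch is zero.
    fn is_zeroed(&self) -> bool {
        self.buf.iter().all(|&b| b == 0)
    }
}
```

Both methods leave the region observably zeroed; they differ only in when the zeroing work is paid. Whether the swap is cheaper end to end depends on the remap cost, which this sketch leaves out.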

Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).

…pervisors

On snapshot restore the guest scratch is re-zeroed to preserve cross-restore isolation. On Linux/KVM SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on hypervisors that map guest memory up-front without host-MM notification (Windows/WHP, MSHV) it degrades to an eager O(size) memset -- ~2.37s for a 256 MiB scratch on WHP guest-shared memory, paid on every restore.

Instead, swap in a fresh demand-zero scratch section and remap it into the guest (the same path already taken when the scratch size changes): the OS zero-fills lazily on fault and the old section is released once it has been unmapped from the VM. Restore stays hermetic while the eager memset becomes an O(1) allocation -- ~150-160x faster on WHP (2.37s -> 14.6ms at 256 MiB). Selection mirrors zero()'s own madvise gate (fresh iff the in-place zero would be eager).

Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).
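For readers who want a picture of the strategy selection described above, a minimal sketch follows. Only the env var name HYPERLIGHT_SCRATCH_ZERO_STRATEGY comes from this PR; the enum, the function names, the accepted values `"fresh"` and `"in-place"`, and the fallback for unrecognised values are all assumptions made for illustration, not the PR's actual code.

```rust
/// Hypothetical strategy enum; the PR's real type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScratchZeroStrategy {
    /// Zero the existing region via zero() (madvise where available, else memset).
    InPlace,
    /// Swap in a fresh demand-zero region and remap it into the guest.
    Fresh,
}

/// Pure resolver, kept separate from the environment so it can be unit tested.
///
/// `override_value` is the raw env var content, if set. The accepted strings
/// "fresh" and "in-place" are assumed values, not taken from the PR.
/// `in_place_zero_is_eager` is true on platforms where zero() would fall back
/// to an O(size) memset.
fn resolve_strategy(
    override_value: Option<&str>,
    in_place_zero_is_eager: bool,
) -> ScratchZeroStrategy {
    match override_value
        .map(|v| v.trim().to_ascii_lowercase())
        .as_deref()
    {
        Some("fresh") => ScratchZeroStrategy::Fresh,
        Some("in-place") => ScratchZeroStrategy::InPlace,
        // Unset or unrecognised: use the default gate
        // (fresh iff the in-place zero would be eager).
        _ => {
            if in_place_zero_is_eager {
                ScratchZeroStrategy::Fresh
            } else {
                ScratchZeroStrategy::InPlace
            }
        }
    }
}

/// Thin wrapper that reads the env var named in the PR description.
fn strategy_from_env(in_place_zero_is_eager: bool) -> ScratchZeroStrategy {
    let raw = std::env::var("HYPERLIGHT_SCRATCH_ZERO_STRATEGY").ok();
    resolve_strategy(raw.as_deref(), in_place_zero_is_eager)
}
```

Keeping the resolver free of environment access is one way to make it easy to cover with unit tests on every platform, with the env lookup confined to the small wrapper.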
@simongdavies simongdavies added area/performance Addresses performance kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. labels Jul 1, 2026
Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>
@simongdavies

Member Author

Closing this: the approach does not work; it is more expensive than just resetting the entire scratch to zero.
