perf(mem): avoid eager scratch re-zero on restore by simongdavies · Pull Request #1605 · hyperlight-dev/hyperlight

simongdavies · 2026-07-01T10:53:40Z

On snapshot restore the guest scratch region is re-zeroed to reset state. On Linux/KVM, SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on mshv (which maps guest memory up-front without host-MM notification) and on Windows/WHP it degrades to an O(size) memset.

This change swaps in a fresh demand-zero scratch region and remaps it into the guest (the same path already taken when the scratch size changes) on Windows and mshv. The OS zero-fills lazily on fault, and the old region is released once it has been unmapped from the VM. Restore still resets the scratch state, while the eager memset becomes an O(1) allocation.
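The difference between the two reset paths can be sketched in plain Rust. This is only an analogy under stated assumptions, not the code in this PR: `Scratch` and its methods are hypothetical names, a `Vec<u8>` stands in for the guest-shared mapping, and the sketch relies on `vec![0u8; n]` typically being served by zeroed allocation (demand-zero pages for large sizes on common allocators) rather than an explicit memset.

```rust
/// Hypothetical stand-in for the guest scratch region. A real region would
/// be a shared mapping that is also mapped into the VM.
pub struct Scratch {
    buf: Vec<u8>,
}

impl Scratch {
    pub fn new(len: usize) -> Self {
        Self { buf: vec![0u8; len] }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.buf
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.buf
    }

    /// Eager reset: touches every byte, so the cost grows with the size.
    pub fn reset_in_place(&mut self) {
        self.buf.fill(0);
    }

    /// Swap reset: obtain a fresh zeroed buffer of the same length, install
    /// it, and only then release the old one (mirroring "release once
    /// unmapped from the VM").
    pub fn reset_by_swap(&mut self) {
        let fresh = vec![0u8; self.buf.len()];
        let old = std::mem::replace(&mut self.buf, fresh);
        drop(old);
    }
}
```

Both methods leave the region reading as all zeros; they differ only in when the zeroing cost is paid. The sketch leaves out the remap into the guest, which the swap path also has to perform.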

Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).
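A minimal sketch of what such a strategy resolver could look like, assuming the default follows the rule "fresh region iff the in-place zero would be eager". The enum, the function names, and the accepted override spellings ("in-place", "fresh") are hypothetical; only the HYPERLIGHT_SCRATCH_ZERO_STRATEGY variable name comes from this PR.

```rust
use std::env;

/// How the scratch region is reset on restore (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScratchZeroStrategy {
    /// Zero the existing mapping in place.
    InPlace,
    /// Swap in a fresh demand-zero region and remap it into the guest.
    FreshRegion,
}

/// Pure resolver, kept free of environment access so it is easy to unit test.
/// `override_value` is the raw env var content, if set. The spellings
/// "in-place" and "fresh" are assumptions made for this sketch.
pub fn resolve_strategy(
    override_value: Option<&str>,
    in_place_zero_is_eager: bool,
) -> ScratchZeroStrategy {
    let normalized = override_value.map(|v| v.trim().to_ascii_lowercase());
    match normalized.as_deref() {
        Some("in-place") => ScratchZeroStrategy::InPlace,
        Some("fresh") => ScratchZeroStrategy::FreshRegion,
        // Unset or unrecognized: use a fresh region only when zeroing in
        // place would be an eager O(size) memset.
        _ => {
            if in_place_zero_is_eager {
                ScratchZeroStrategy::FreshRegion
            } else {
                ScratchZeroStrategy::InPlace
            }
        }
    }
}

/// Thin wrapper that reads the override from the environment.
pub fn resolve_from_env(in_place_zero_is_eager: bool) -> ScratchZeroStrategy {
    let raw = env::var("HYPERLIGHT_SCRATCH_ZERO_STRATEGY").ok();
    resolve_strategy(raw.as_deref(), in_place_zero_is_eager)
}
```

Splitting the pure function from the environment lookup lets the resolver be tested without mutating process-wide state.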

…pervisors

On snapshot restore the guest scratch is re-zeroed to preserve cross-restore isolation. On Linux/KVM SharedMemory::zero() is an O(1) madvise(MADV_DONTNEED), but on hypervisors that map guest memory up-front without host-MM notification (Windows/WHP, MSHV) it degrades to an eager O(size) memset -- ~2.37s for a 256 MiB scratch on WHP guest-shared memory, paid on every restore.

Instead, swap in a fresh demand-zero scratch section and remap it into the guest (the same path already taken when the scratch size changes): the OS zero-fills lazily on fault and the old section is released once it has been unmapped from the VM. Restore stays hermetic while the eager memset becomes an O(1) allocation -- ~150-160x faster on WHP (2.37s -> 14.6ms at 256 MiB). Selection mirrors zero()'s own madvise gate (fresh iff the in-place zero would be eager).

Temporary scaffolding for CI validation (to be reduced to the one-line change before review): a HYPERLIGHT_SCRATCH_ZERO_STRATEGY env var to force either mechanism, unit tests for the strategy resolver, and tests/scratch_restore_perf.rs which measures per-restore timing and asserts no mapping/handle leak across 50 restores of large scratch regions on both strategies (captured to the CI step summary).

Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>

simongdavies · 2026-07-02T10:38:45Z

Closing this: the approach does not work; it is more expensive than just resetting the entire scratch to zero.

simongdavies added the area/performance and kind/enhancement labels Jul 1, 2026

fixes

3964052

Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>

simongdavies closed this Jul 2, 2026

simongdavies wants to merge 2 commits into hyperlight-dev:main from simongdavies:perf/restore-fresh-scratch
