Skip to content

Add snapshot golden tests#1446

Merged
ludfjig merged 3 commits into
hyperlight-dev:mainfrom
ludfjig:snapshot-golden-tests
Jul 1, 2026
Merged

Add snapshot golden tests#1446
ludfjig merged 3 commits into
hyperlight-dev:mainfrom
ludfjig:snapshot-golden-tests

Conversation

@ludfjig

@ludfjig ludfjig commented May 8, 2026

Copy link
Copy Markdown
Contributor

This PR adds tests to help notice breaking changes in the guest<->host ABI. This matters because any change there breaks every snapshot already persisted to disk (bad!).

  • persist a set of snapshots to GHCR, and on each PR pull them and run tests against them. Note that these snapshots are not expected to be regenerated often. I'm using GHCR instead of checking in raw binaries into repo to not clutter git repo (there might be a lot of them eventually), and also to exercise the "OCI-ness" of the snapshot format.
  • add compile-time tripwires that pin every value defining the format (SNAPSHOT_ABI_VERSION, the media-type strings, the OCI layout version, HyperlightPEB field offsets and size, and the OutBAction ports), helping develpers catch changes that would be breaking
  • add some additional regular tests for loading snapshots
  • define and outline the development process for breaking snapshot compatibility (and regenerating new golden snapshtos)

When a change to the format is intentional, the author bumps the version and applies the regen-goldens label. A workflow then regenerates the goldens against the branch and pushes the new set to GHCR.

See docs/snapshot-versioning.md for details.

@ludfjig ludfjig added regen-goldens Regenerate snapshot golden fixtures and removed regen-goldens Regenerate snapshot golden fixtures labels May 8, 2026
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch 2 times, most recently from db6f7fb to b5a0b52 Compare June 8, 2026 23:34
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch 5 times, most recently from e8bcea9 to 0115ba2 Compare June 24, 2026 19:22
@ludfjig ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jun 24, 2026
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch 6 times, most recently from 22a0cff to 0658445 Compare June 25, 2026 19:11
@ludfjig ludfjig marked this pull request as ready for review June 25, 2026 21:47
Copilot AI review requested due to automatic review settings June 25, 2026 21:47
@ludfjig ludfjig requested a review from danbugs as a code owner June 25, 2026 21:47
@ludfjig ludfjig requested a review from squillace as a code owner June 25, 2026 21:47
@ludfjig ludfjig changed the title [test do not merge] snapshot golden tests Add snapshot golden tests Jun 25, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds snapshot “golden” testing and ABI tripwires to detect breaking changes in the on-disk guest↔host snapshot format, with CI automation to pull/regenerate published baselines from GHCR and validate cross-CPU portability.

Changes:

  • Introduces a custom-harness snapshot_goldens test binary (libtest-mimic) plus fixtures/checks for snapshot load/round-trip validation against staged OCI image layouts.
  • Adds compile-time snapshot ABI tripwires pinning key format-defining constants and struct/layout offsets.
  • Extends CI/Justfile tooling to pull/publish/regenerate goldens via GHCR (oras), including a regen-goldens label-controlled path and cross-CPU verification.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/tests/rust_guests/simpleguest/src/main.rs Adds guest/host echo and round-trip functions plus heap-pattern helpers used by golden fixtures/checks.
src/hyperlight_host/tests/snapshot_goldens/platform.rs Detects platform (hypervisor/cpu/profile) and builds golden tag names for staging/verification.
src/hyperlight_host/tests/snapshot_goldens/oci.rs Locates the staged OCI image-layout directories under target/snapshot-goldens/.
src/hyperlight_host/tests/snapshot_goldens/main.rs Implements the custom harness entrypoint for verify and generate subcommands.
src/hyperlight_host/tests/snapshot_goldens/goldens_version.rs Defines GOLDENS_VERSION and compat version set for multi-major verification.
src/hyperlight_host/tests/snapshot_goldens/fixtures.rs Defines canonical sandbox config + deterministic mutation sequence for golden generation.
src/hyperlight_host/tests/snapshot_goldens/checks.rs Adds independent checks that load goldens and validate captured state + ABI wire formats.
src/hyperlight_host/tests/integration_test.rs Adjusts heap sizing in a couple tests to account for added guest-function registration overhead.
src/hyperlight_host/src/sandbox/snapshot/tripwires.rs Adds const-eval assertions pinning ABI-critical constants, offsets, and discriminants.
src/hyperlight_host/src/sandbox/snapshot/mod.rs Wires the new tripwires module into the snapshot module.
src/hyperlight_host/src/sandbox/snapshot/file/mod.rs Exposes snapshot format constants to support tripwires.
src/hyperlight_host/src/sandbox/snapshot/file/media_types.rs Tightens docs/visibility for snapshot media-type and ABI version constants.
src/hyperlight_host/src/sandbox/snapshot/file/config.rs Adds targeted load-validation tests and JSON schema “pinned” round-trip tests.
src/hyperlight_host/src/sandbox/snapshot/file_tests.rs Adds tests for from_snapshot config behavior and init-data permission round-tripping.
src/hyperlight_host/Cargo.toml Adds libtest-mimic and registers the snapshot_goldens custom harness test target.
Justfile Adds snapshot-goldens pull/generate/verify recipes and ensures clippy covers the custom harness target.
docs/snapshot-versioning.md Documents snapshot format versioning, enforcement mechanisms, and regen/publish workflows.
docs/github-labels.md Documents the regen-goldens workflow-affecting label.
Cargo.lock Locks new dependencies introduced for the golden test harness.
.github/workflows/ValidatePullRequest.yml Adds label-driven mode selection and introduces cross-CPU verification for regen path; disables fail-fast.
.github/workflows/RegenSnapshotGoldens.yml Adds a workflow to generate/publish versioned golden OCI layouts (with completion marker) to GHCR.
.github/workflows/dep_snapshot_cross_verify.yml Adds reusable workflow to verify regenerated goldens across CPU vendors.
.github/workflows/dep_build_test.yml Adds pull-vs-regen golden verification steps (oras + just recipes) and uploads regen artifacts.

Comment thread src/hyperlight_host/tests/snapshot_goldens/platform.rs Outdated
@danbugs danbugs added the ready-for-review PR is ready for (re-)review label Jun 25, 2026

@danbugs danbugs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just 2 minor comments--great work!! Let's just get 1 more person to look at this as it is a pretty big PR.

Comment thread .github/workflows/ValidatePullRequest.yml
Comment thread Justfile

@syntactically syntactically left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had time only for a pretty quick skim over this.

add compile-time tripwires that pin every value defining the format (SNAPSHOT_ABI_VERSION, the media-type strings, the OCI layout version, HyperlightPEB field offsets and size, and the OutBAction ports), helping develpers catch

Why are the PEB fields important? I think the are used only during creation of the original snapshot from a binary? If they are not important I would like to be careful not to constrain on them for no reason.

make sure snapshots taken on AMD are usable on Intel, and vice versa

Are we certain already that this is a goal? I thought we purposefully punted on this and compatibility across hypervisors until we had a better sense of whether it would be onerous.

Comment thread docs/snapshot-versioning.md Outdated
Comment thread Justfile
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch from 2219634 to 9ed1e1b Compare June 26, 2026 22:41
@ludfjig

ludfjig commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Why are the PEB fields important? I think the are used only during creation of the original snapshot from a binary? If they are not important I would like to be careful not to constrain on them for no reason.

You're right, only the size of the PEB struct matters, the fields themselves are never read back by the host.

make sure snapshots taken on AMD are usable on Intel, and vice versa

Are we certain already that this is a goal? I thought we purposefully punted on this and compatibility across hypervisors until we had a better sense of whether it would be onerous.

Addressed in #1587

@ludfjig ludfjig force-pushed the snapshot-golden-tests branch 3 times, most recently from c9b2a9d to 5e0d25e Compare June 29, 2026 21:32

@simongdavies simongdavies left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not fully reviewed this but I have a couple of suggestions:

Can we move all the login that is embedded as script in the PR workflows into Justfile recipes and/or scripts so that they can easily beused/testd locally?

Do we want to run this on every PR? Can we run nightly and/or on release and then open a release blocker issue if it fails? We might also want running this as a gate to creating a new release?

Comment thread .github/workflows/dep_build_test.yml
Comment thread .github/workflows/dep_build_test.yml Outdated
Comment thread .github/workflows/RegenSnapshotGoldens.yml
Comment thread docs/github-labels.md
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch 3 times, most recently from 5455de1 to 1a30967 Compare June 30, 2026 18:19
@ludfjig

ludfjig commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Do we want to run this on every PR? Can we run nightly and/or on release and then open a release blocker issue if it fails? We might also want running this as a gate to creating a new release?

It should be very lightweight on each PR, it's just a couple of small tests. I think it's worth it to catch early rather than to discover after the fact

@ludfjig ludfjig force-pushed the snapshot-golden-tests branch from 1a30967 to bd27c5a Compare June 30, 2026 20:27
@ludfjig

ludfjig commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Pushed some updates. I'd like feedback on the following

  • just test-like-ci runs the snapshot golden test without first pulling. You will need to manually pull images first using just snapshot-goldens-verify to pass the tests. Should we include pulling in this recipe (or pull-if-absent logic maybe).
  • there is duplicated logic in bash/rust for detectign host machine details. This is on purpose so users don't need to compile any rust in order to pull images
  • host_cpu_vendor_golden_tag is a new public method in hyperlight-host, needed for the new integration test harness. Ideally I want to avoid exposing this publicly, but since integration tests ignore cfg(test), i'm sure if there's a better way (adding a cargo feature just for this seems a little extreme as well?)

@ludfjig ludfjig force-pushed the snapshot-golden-tests branch from bd27c5a to 571a802 Compare June 30, 2026 22:52
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
@ludfjig ludfjig force-pushed the snapshot-golden-tests branch from 571a802 to bfdc6f1 Compare June 30, 2026 23:05
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
@simongdavies

Copy link
Copy Markdown
Member

Do we want to run this on every PR? Can we run nightly and/or on release and then open a release blocker issue if it fails? We might also want running this as a gate to creating a new release?

It should be very lightweight on each PR, it's just a couple of small tests. I think it's worth it to catch early rather than to discover after the fact

OK maybe we should add cache entries int the job for the golden images? or perhaps if add the oras cli install to the image then we should pull all the OCIs onto the CI image that way we dont waste time each run waiting for the pulls? Also means that if they change the changes will still get pulled and everything will work.

@simongdavies

Copy link
Copy Markdown
Member
  • just test-like-ci runs the snapshot golden test without first pulling. You will need to manually pull images first using just snapshot-goldens-verify to pass the tests. Should we include pulling in this recipe (or pull-if-absent logic maybe).

Maybe but perhaps we should add the tools as pre-reqs in the docs and then list the pull command/script needed in the docs also?

@simongdavies

Copy link
Copy Markdown
Member
  • host_cpu_vendor_golden_tag is a new public method in hyperlight-host, needed for the new integration test harness. Ideally I want to avoid exposing this publicly, but since integration tests ignore cfg(test), i'm sure if there's a better way (adding a cargo feature just for this seems a little extreme as well?)

Urghhh , maybe we can make this debug only and no-op the golden tests for release profile??

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
@ludfjig

ludfjig commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

OK maybe we should add cache entries int the job for the golden images? or perhaps if add the oras cli install to the image then we should pull all the OCIs onto the CI image that way we dont waste time each run waiting for the pulls? Also means that if they change the changes will still get pulled and everything will work.

I am not worried about this, the images are really very small and should pull super quick. If we really want to, we could do this in a follow-up. Installing oras into the image is a good idea though.

just test-like-ci runs the snapshot golden test without first pulling. You will need to manually pull images first using just snapshot-goldens-verify to pass the tests. Should we include pulling in this recipe (or pull-if-absent logic maybe).

Maybe but perhaps we should add the tools as pre-reqs in the docs and then list the pull command/script needed in the docs also?

Added to docs in snapshot-versioning.md

host_cpu_vendor_golden_tag is a new public method in hyperlight-host, needed for the new integration test harness. Ideally I want to avoid exposing this publicly, but since integration tests ignore cfg(test), i'm sure if there's a better way (adding a cargo feature just for this seems a little extreme as well?)

Urghhh , maybe we can make this debug only and no-op the golden tests for release profile??

Moved to hyperlight_host::__private::host_cpu_vendor_golden_tag(), __private module should signify nobody should use this but us. I do really want to test release as well

@ludfjig ludfjig merged commit 35f3d45 into hyperlight-dev:main Jul 1, 2026
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. ready-for-review PR is ready for (re-)review regen-goldens Regenerate snapshot golden fixtures

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants