Skip to content

orchestrator: get dirty memory from Firecracker#1937

Merged
bchalios merged 15 commits intomainfrom
feat/fc-dirty-memory
Feb 25, 2026
Merged

orchestrator: get dirty memory from Firecracker#1937
bchalios merged 15 commits intomainfrom
feat/fc-dirty-memory

Conversation

@bchalios
Copy link
Contributor

This PR uses the newly published versions of Firecracker, which support getting dirty memory from the /memory/dirty endpoint and use this instead of internal book-keeping for deciding which pages need to be included in the snapshot.

@bchalios bchalios force-pushed the feat/fc-dirty-memory branch 3 times, most recently from 277a44e to 1f77f35 Compare February 18, 2026 16:42
@bchalios bchalios marked this pull request as ready for review February 18, 2026 18:15
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 076111c8e8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, CURRENT_TIMESTAMP)
`, build.id, "FROM e2bdev/base:latest", dbtypes.BuildStatusUploaded,
2, 512, 512, 1982, "vmlinux-6.1.102", "v1.12.1_d990331", "0.2.4",
2, 512, 512, 1982, "vmlinux-6.1.102", "v1.12.1_a41d3fb", "0.2.4",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these just reference the shared feature flag default const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a const, I'd say it makes sense yes. Where do we define that?

@sitole sitole assigned djeebus and unassigned sitole Feb 19, 2026
@ValentaTomas
Copy link
Member

I'm testing one more thing here, so please wait before merging.

@ValentaTomas
Copy link
Member

ValentaTomas commented Feb 19, 2026

The last commit should add unit and integration tests that reproduce the bug fixable by uncommenting https://github.com/e2b-dev/infra/blob/feat/fc-dirty-memory/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L351.

You can run the unit test by:

sudo `which go` test -v -run "TestAsyncWriteProtection" -count=1 -timeout 120s ./internal/sandbox/uffd/userfaultfd/

@bchalios
Copy link
Contributor Author

The last commit should add unit and integration tests that reproduce the bug fixable by uncommenting https://github.com/e2b-dev/infra/blob/feat/fc-dirty-memory/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L351.

You can run the unit test by:

sudo `which go` test -v -run "TestAsyncWriteProtection" -count=1 -timeout 120s ./internal/sandbox/uffd/userfaultfd/

So, the bug Tomas found is a real one. What happens is that copy()/zero() in the UFFD clears the Write Protection bit unless we explicitly tell it not to.

The effect of this bug is that read-only pages are marked as dirty and included in the snapshot. This is not a functional issue, it "just" causes snapshots to be bigger. That's why the integration test doesn't fail.

In any case, the fix proposed by Tomas is the correct one; i.e. we need to copy() data with COPY_WP mode when the access causing the page fault is a read access.

bchalios and others added 7 commits February 23, 2026 11:19
Use new v1.10 and v1.12 Firecracker versions with support for getting
information about dirty memory calling the /memory/dirty endpoint.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Bump it to version v1.12 which now supports the logic for getting dirty
memory via the /memory/dirty endpoint. It also brings a few new features
from Firecracker, e.g. NetworkOverrides during loading snapshots.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Use Firecracker's /memory/dirty endpoint for getting information about
the guest's dirty memory.
Otherwise calling copy() on the UFFD will clear it. Also fix comment
regarding the structure of pagemap.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
@bchalios bchalios force-pushed the feat/fc-dirty-memory branch from 1573f51 to a976557 Compare February 23, 2026 10:20
@ValentaTomas
Copy link
Member

  • Refactored the wp test to control order of ops.
  • 🚧 Added (unit) test for checking the build-resume loop with all default FC versions as there is compatibility problem with the network override FC client when using FC 1.10. This should catch similar problems in the future
    • I have a fix prepared, but it modifies the swagger file, so I'll first wait for the tests to fail, so we are sure this can be caught in the future PRs too

@ValentaTomas
Copy link
Member

  • Refactored the wp test to control order of ops.

  • 🚧 Added (unit) test for checking the build-resume loop with all default FC versions as there is compatibility problem with the network override FC client when using FC 1.10. This should catch similar problems in the future

    • I have a fix prepared, but it modifies the swagger file, so I'll first wait for the tests to fail, so we are sure this can be caught in the future PRs too

Confirmed failure on the 1.10 in the tests:

The filesystem on /tmp/TestSmokeAllFCVersions2798398403/001/orchestrator/build-templates/dbdfff6a-c456-4de6-bb2c-cea7ef94124e/rootfs.filesystem.build is now 261376 (4k) blocks long.
smoke_test.go:95: build 23da1036-f8ef-4d1e-b8e3-0dcd222419d0 done
smoke_test.go:98: resuming build 23da1036-f8ef-4d1e-b8e3-0dcd222419d0
smoke_test.go:137: resumed in 149.661069ms
--- FAIL: TestSmokeAllFCVersions (58.56s)
--- FAIL: TestSmokeAllFCVersions/fc-v1.10 (24.74s)
--- PASS: TestSmokeAllFCVersions/fc-v1.12 (25.23s)
FAIL
FAIL github.com/e2b-dev/infra/packages/orchestrator/cmd/smoketest 58.832s

Copy link
Member

@ValentaTomas ValentaTomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything passes, I think this should be ready.

@bchalios bchalios merged commit 4ecbd1b into main Feb 25, 2026
35 checks passed
@bchalios bchalios deleted the feat/fc-dirty-memory branch February 25, 2026 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants