orchestrator: get dirty memory from Firecracker#1937
Conversation
277a44e to
1f77f35
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 076111c8e8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, CURRENT_TIMESTAMP) | ||
| `, build.id, "FROM e2bdev/base:latest", dbtypes.BuildStatusUploaded, | ||
| 2, 512, 512, 1982, "vmlinux-6.1.102", "v1.12.1_d990331", "0.2.4", | ||
| 2, 512, 512, 1982, "vmlinux-6.1.102", "v1.12.1_a41d3fb", "0.2.4", |
There was a problem hiding this comment.
Should these just reference the shared feature flag default const?
There was a problem hiding this comment.
If we have a const, I'd say it makes sense yes. Where do we define that?
|
I'm testing one more thing here, so please wait before merging. |
|
The last commit should add unit and integration tests that reproduce the bug fixable by uncommenting https://github.com/e2b-dev/infra/blob/feat/fc-dirty-memory/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L351. You can run the unit test by: sudo `which go` test -v -run "TestAsyncWriteProtection" -count=1 -timeout 120s ./internal/sandbox/uffd/userfaultfd/ |
So, the bug Tomas found is a real one. What happens is that The effect of this bug is that read-only pages are marked as dirty and included in the snapshot. This is not a functional issue, it "just" causes snapshots to be bigger. That's why the integration test doesn't fail. In any case, the fix proposed by Tomas is the correct one; i.e. we need to copy() data with COPY_WP mode when the access causing the page fault is a read access. |
44d4f09 to
970109a
Compare
Use new v1.10 and v1.12 Firecracker versions with support for getting information about dirty memory calling the /memory/dirty endpoint. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Bump it to version v1.12 which now supports the logic for getting dirty memory via the /memory/dirty endpoint. It also brings a few new features from Firecracker, e.g. NetworkOverrides during loading snapshots. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Use Firecracker's /memory/dirty endpoint for getting information about the guest's dirty memory.
Otherwise calling copy() on the UFFD will clear it. Also fix comment regarding the structure of pagemap. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
1573f51 to
a976557
Compare
|
Confirmed failure on the 1.10 in the tests:
|
ValentaTomas
left a comment
There was a problem hiding this comment.
If everything passes, I think this should be ready.
This PR uses the newly published versions of Firecracker, which support getting dirty memory from the
/memory/dirtyendpoint and use this instead of internal book-keeping for deciding which pages need to be included in the snapshot.