uffd: add support for UFFD_EVENT_REMOVE events#1896
Conversation
aef7cd8 to
eab27ea
Compare
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
a1d0a02 to
03e2966
Compare
b766eb3 to
4d197e4
Compare
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
e639acf to
88420f2
Compare
d2e5d09 to
8ea97e4
Compare
There was a problem hiding this comment.
Pull request overview
This PR upgrades Firecracker from v1.12.1 to v1.14.1 and adds support for UFFD_EVENT_REMOVE events in the userfaultfd handler, which is required to support Firecracker's free-page-reporting feature. When Firecracker calls madvise(MADV_DONTNEED) on guest memory ranges, the kernel emits REMOVE events; subsequent page faults on those addresses must be zero-filled rather than served from the snapshot file.
Changes:
- Adds
UFFD_EVENT_REMOVEhandling in the UFFD handler with a newpageTrackerto track per-page state (unfaulted,faulted,removed), replaces the oldblock.Trackerwith this new tracker, and introduces deferred fault retry logic for EAGAIN. - Bumps Firecracker default version to
v1.14.1_aba5ab3across feature flags, seeds, and CI, and regenerates the go-swagger client/models for the new Firecracker v1.14.1 OpenAPI spec (adding balloon hinting, pmem, serial device, memory hotplug, and structured CPU config). - Adds
FreePageReportingconfig flag propagated from the API layer through the orchestrator to Firecracker, enabled by default for v1.14+ sandboxes.
Reviewed changes
Copilot reviewed 68 out of 68 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
userfaultfd/userfaultfd.go |
Core UFFD handler: adds REMOVE event processing, page state tracking, deferred fault retry, zero-fill for removed pages |
userfaultfd/page_tracker.go |
New per-page state map (unfaulted/faulted/removed) |
userfaultfd/deferred.go |
Thread-safe collection for pagefaults that returned EAGAIN |
userfaultfd/prefault.go |
Extracted prefault logic from main handler |
userfaultfd/fd.go |
New UFFD constants/types: UFFD_EVENT_REMOVE, UFFDIO_ZEROPAGE, UffdRemove, etc. |
memory/mapping.go |
Simplified GetOffset/GetHostVirtAddr to return one fewer value |
block/tracker.go + tracker_test.go |
Deleted (replaced by pageTracker) |
fc/firecracker.yml |
OpenAPI spec bumped to v1.14.1 with new endpoints and definitions |
fc/models/* |
Regenerated go-swagger models for new Firecracker API |
fc/client/operations/* |
Regenerated go-swagger client operations |
sandbox/fc/client.go |
New enableFreePageReporting method |
sandbox/fc/process.go |
Passes freePageReporting flag to process creation |
sandbox/sandbox.go |
Config struct extended with FreePageReporting |
api/.../sandbox_features.go |
HasFreePageReporting() version check |
api/.../create_template.go |
Passes free-page-reporting flag when creating templates |
feature-flags/flags.go |
New v1.14 Firecracker version constant |
tests/integration/seed.go |
Seed data updated to v1.14.1 |
.github/actions/build-sandbox-template/action.yml |
CI updated to v1.14.1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
| u.wg.Go(func() error { | ||
| // The RLock must be called inside the goroutine to ensure RUnlock runs via defer, | ||
| // even if the errgroup is cancelled or the goroutine returns early. | ||
| // This check protects us against race condition between marking the request for prefetching and accessing the prefetchTracker. | ||
| u.settleRequests.RLock() | ||
| defer u.settleRequests.RUnlock() | ||
|
|
||
| var copyMode CULong | ||
| var accessType block.AccessType | ||
|
|
||
| // Performing copy() on UFFD clears the WP bit unless we explicitly tell | ||
| // it not to. We do that for faults caused by a read access. Write accesses | ||
| // would anyways cause clear the write-protection bit. | ||
| if pf.flags&UFFD_PAGEFAULT_FLAG_WRITE == 0 { | ||
| copyMode = UFFDIO_COPY_MODE_WP | ||
| accessType = block.Read | ||
| } else { | ||
| accessType = block.Write | ||
| } | ||
|
|
||
| handled, err := u.faultPage( | ||
| ctx, | ||
| addr, | ||
| offset, | ||
| source, | ||
| fdExit.SignalExit, | ||
| copyMode, | ||
| ) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| if handled { | ||
| u.pageTracker.setState(addr, addr+u.pageSize, faulted) | ||
| } else { | ||
| deferred.push(pf) | ||
| } | ||
|
|
||
| u.prefetchTracker.Add(offset, accessType) |
There was a problem hiding this comment.
There is a race condition between the goroutine that calls u.pageTracker.setState(addr, addr+u.pageSize, faulted) (line 321) and the main event loop that reads u.pageTracker.get(addr) (line 276). The pageTracker itself is mutex-protected for individual operations, but between the time the main loop reads a page as unfaulted, dispatches a goroutine, and the goroutine eventually marks it faulted, a second pagefault for the same address can be dispatched and both goroutines can race to call faultPage for the same address. While EEXIST is handled (line 376), this scenario can also cause the deferred.push(pf) path to be taken by both, resulting in indefinite re-queueing. More critically, in the same goroutine, deferred.push(pf) is called (line 323) without holding the settleRequests lock, while the main loop drains deferred (line 235) without coordinating with in-flight goroutines, meaning a deferred fault could be drained and forgotten before the goroutine that pushed it is done executing.
There was a problem hiding this comment.
Not sure right now if this can be a problem, so we should recheck.
There was a problem hiding this comment.
Every time we see there are available events to read from UFFD, we wait for all Go routines in the workgroup to complete before we start handling the events of the outer iteration. That happens at line 186.
| } | ||
|
|
||
| if !handled { | ||
| span.RecordError(fmt.Errorf("page already faulted")) |
There was a problem hiding this comment.
The Prefault function in prefault.go (line 37-39) records an error to the span when handled is false (EAGAIN case), treating it as an error event. However, when faultPage returns (false, nil), it means the write returned EAGAIN and should be retried — it is not an error condition. The span should not record this as an error; at most a span attribute or event noting the deferral would be appropriate.
| span.RecordError(fmt.Errorf("page already faulted")) | |
| span.AddEvent("prefault deferred: page already faulted or write returned EAGAIN") |
| pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0]))) | ||
| case UFFD_EVENT_REMOVE: | ||
| removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0]))) |
There was a problem hiding this comment.
In readEvents, the buf slice is reused across loop iterations, but the parsed UffdRemove and UffdPagefault pointers (at lines 122-124) point directly into the buffer's underlying memory via unsafe.Pointer. Since buf is overwritten by each subsequent syscall.Read call, all previously appended pointers now alias the same buffer memory and will reflect the data of the last-read event rather than the event they were parsed from.
Each parsed struct must be copied into its own allocation (e.g. v := *(*UffdPagefault)(unsafe.Pointer(&arg[0])); pagefaults = append(pagefaults, &v)) so that every pointer in the returned slices points to independent memory. Without this fix, processing multiple events in a single readEvents call leads to all events appearing identical (the last-read event), causing incorrect page state tracking and potentially silent data corruption in the guest VM's memory.
| pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0]))) | |
| case UFFD_EVENT_REMOVE: | |
| removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0]))) | |
| v := *(*UffdPagefault)(unsafe.Pointer(&arg[0])) | |
| pagefaults = append(pagefaults, &v) | |
| case UFFD_EVENT_REMOVE: | |
| v := *(*UffdRemove)(unsafe.Pointer(&arg[0])) | |
| removes = append(removes, &v) |
There was a problem hiding this comment.
I think this is a vaild one. We only allocate buf once and then reuse it across iterations, where we take a pointer to its arg field. I think all events will be pointing to the same memory. The solution is to move the allocation inside the loop.
There was a problem hiding this comment.
Actually it's not. Indeed we're re-using the same buffer (buf) but the arg bit is something we're getting calling getMsgArg which returns a [24]byte value, which essentially copies the msg.arg field. It's not the same memory. I have validated this, by manually inspecting the addresses of the various variables when the loop handles more than one events.
| func (pt *pageTracker) faultedOffsets(ma *memory.Mapping) ([]uint64, error) { | ||
| pt.mu.RLock() | ||
| defer pt.mu.RUnlock() | ||
|
|
||
| offsets := make([]uint64, 0, len(pt.m)) | ||
| for addr := range pt.m { | ||
| offset, err := ma.GetOffset(addr) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("address %#x not in mapping: %w", addr, err) | ||
| } | ||
| offsets = append(offsets, uint64(offset)) | ||
| } |
There was a problem hiding this comment.
In cross_process_helpers_test.go, the faultedOffsets helper iterates over all pages in pt.m regardless of their state, including pages in the removed state. This means removed pages will appear in the faulted offsets output, which may cause test assertions to match incorrectly. Only pages in the faulted state should be included in the faulted offsets list.
There was a problem hiding this comment.
That is ok, because for this test we want all pages that were faulted and to be removed it had to be faulted.
There was a problem hiding this comment.
Though, we might want to change this if we are adding a new test for the removed.
| case UFFD_EVENT_PAGEFAULT: | ||
| pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0]))) | ||
| case UFFD_EVENT_REMOVE: | ||
| removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0]))) |
There was a problem hiding this comment.
Event pointers reference ephemeral stack memory
High Severity
readEvents appends pointers created from arg := getMsgArg(msg), but arg is a per-iteration local copy. The stored *UffdPagefault and *UffdRemove pointers outlive that stack value, so later reads can use corrupted event data or repeated last-event values.
There was a problem hiding this comment.
Already noted in another comment.
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
| switch state := u.pageTracker.get(addr); state { | ||
| case faulted: | ||
| // TODO: Can we skip faulting pages that are already faulted? How does that play with prefaulting? | ||
| continue |
There was a problem hiding this comment.
Repeated faults are skipped indefinitely
Medium Severity
When a page is marked faulted, later pagefault events for that same address are ignored with continue. If a mapped page becomes missing again (for example after reclaim/swap behavior), the new fault is never resolved, leaving the guest thread blocked on an unhandled userfault.
There was a problem hiding this comment.
Added TODO there to resolve this.
There was a problem hiding this comment.
Do we actually use swap on the host? If not, the logic here is correct. The only reason why a page would become missing is because we previously removed it, and we are tracking that. If we do enable swap this whole logic is wrong. Actually it already was wrong, even before adding support for UFFD_EVENT_REMOVE. Imagine a page has been written. This means its contents have changed comparing to what is in the snapshot file. If that page gets swapped out, we have no way of knowing what data we should provide for it. I'm not really sure how swapping would interplay with UFFD at all.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06d14431fe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
Outdated
Show resolved
Hide resolved
|
We should add a test that explicitly tests the |
| case UFFD_EVENT_PAGEFAULT: | ||
| pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0]))) | ||
| case UFFD_EVENT_REMOVE: | ||
| removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0]))) |
There was a problem hiding this comment.
Uffd event pointers reference invalid stack memory
High Severity
readEvents stores pointers to arg, but arg is a per-iteration local copy ([24]byte). After the loop, those pointers no longer reliably reference valid event data. Serve then reads pf.flags, pf.address, and remove ranges from invalid memory, which can mis-handle faults or fail unpredictably under load.
Additional Locations (1)
There was a problem hiding this comment.
Not sure here, we should recheck.
There was a problem hiding this comment.
I think that if we move buf inside the loop then Go will understand that it escapes from the function's context and will just place it in heap.
There was a problem hiding this comment.
Event if we don't move buf inside the loop, Go is able to track that the memory (of arg) escapes the function context and puts it in the heap.
| span.RecordError(fmt.Errorf("page already faulted")) | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
Prefault silently succeeds after deferred write failure
Medium Severity
faultPage now returns handled=false on unix.EAGAIN, but Prefault treats !handled as non-fatal and still returns nil. This allows prefault requests to report success even when no page was mapped, leaving memory unfaulted and shifting failures to later runtime faults.
Additional Locations (1)
There was a problem hiding this comment.
The reason for this is that we get EAGAIN because of a UFFD_EVENT_REMOVE landing (concurrently) in the queue. When we get an EAGAIN during pre-faulting we don't explicitly handle a page fault, so ignoring here should be fine. I'm assuming here that when pre-faulting happens, we don't run the normal handler loop, right? If that's not the case, we might get issues if we are trying to "prefault" a page and handle a page fault to the same page at the same time.
There was a problem hiding this comment.
We do run the normal loop at the time of the prefaulting.
There was a problem hiding this comment.
Ok, I've spent some time thinking this. I don't think there's a problem with running prefaulter and on-demand faulting loop concurrently (as long as the page state tracker is handled/coordinated correctly). I can elaborate offline if you want why that is the case.
In any case, it's fine returning nil here. If EAGAIN happens, we will just not pre-fault the page. There isn't any functional problem with it.
| // would anyways cause clear the write-protection bit. | ||
| if pf.flags&UFFD_PAGEFAULT_FLAG_WRITE == 0 { | ||
| copyMode = UFFDIO_COPY_MODE_WP | ||
| accessType = block.Read |
There was a problem hiding this comment.
Wrong mode passed to zeropage ioctl
High Severity
Read faults set mode to UFFDIO_COPY_MODE_WP, and removed 4KB pages are handled via UFFDIO_ZEROPAGE. zero() forwards that mode directly to UFFDIO_ZEROPAGE, which does not use UFFDIO_COPY_MODE_WP. This can make zero-fill faults fail and break handling of UFFD_EVENT_REMOVE for standard pages.
Additional Locations (2)
There was a problem hiding this comment.
We should recheck if the zero does not handle mode wp.
There was a problem hiding this comment.
I'm pretty sure that it does. I will re-check
| deferred.push(pf) | ||
| } | ||
|
|
||
| u.prefetchTracker.Add(offset, accessType) |
There was a problem hiding this comment.
Prefetch tracker updated for deferred (unhandled) page faults
Medium Severity
u.prefetchTracker.Add(offset, accessType) is called unconditionally after faultPage, even when handled is false (i.e., the fault was deferred due to EAGAIN). This means the prefetch tracker records offsets for pages that haven't actually been served yet. When the deferred fault is later retried and succeeds, prefetchTracker.Add will be called a second time for the same offset. Previously, this call only happened after a successful copy. It likely needs to be inside the if handled block.
There was a problem hiding this comment.
This is valid; we should fix it.
| // would anyways cause clear the write-protection bit. | ||
| if accessType != block.Write { | ||
| copyMode |= UFFDIO_COPY_MODE_WP | ||
| writeErr = u.fd.copy(addr, u.pageSize, b, mode) |
There was a problem hiding this comment.
Nil source reaches default switch causing nil dereference
Medium Severity
In faultPage, the switch only handles source == nil for header.PageSize and header.HugepageSize. If source is nil (page was removed) and u.pageSize doesn't match either constant, execution falls through to the default case, which calls source.Slice(...) on a nil interface, causing a panic. The code assumes only two page sizes exist, but this isn't validated here.
There was a problem hiding this comment.
The page should not be a different size, but maybe we can add check somewhere to prevent accidens.
| offset, | ||
| directDataSource{data, int64(u.pageSize)}, | ||
| nil, | ||
| // TODO: What mode should we pass here? |
There was a problem hiding this comment.
@bchalios What should we pass here? Should we use mode WP if we are prefaulting?
There was a problem hiding this comment.
Maybe it makes sense to pass COPY_WP? That's what we do with read page faults.
There was a problem hiding this comment.
As mentioned in the other comment, the prefault can run at the same time as the normal serve loop, so we should sync about what needs to be done to make this correct.
There was a problem hiding this comment.
Ok, here we should pass COPY_MODE_WP. Otherwise we will clear the write protection bit and subsequent snapshots will potentially be bigger.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| deferred.push(pf) | ||
| } | ||
|
|
||
| u.prefetchTracker.Add(offset, accessType) |
There was a problem hiding this comment.
Prefetch tracker records offset for unhandled deferred faults
Medium Severity
When faultPage returns handled=false (the EAGAIN path), the pagefault is correctly pushed to deferred for retry, but u.prefetchTracker.Add(offset, accessType) is called unconditionally afterward. This records the offset in prefetch data even though the page was never actually served. When the deferred fault eventually succeeds on a later iteration, prefetchTracker.Add is called again, double-counting the offset. The prefetchTracker.Add call needs to be inside the if handled branch.
Bump the swagger file for Firecracker to v1.14 and regenerate APIs/models. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add support for Firecracker v1.14 and make it the default. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
As we're going to handle UFFD_EVENT_REMOVE events triggerred by
Firecracker, we need to keep track of the state of all the guest memory
pages.
Theis commit introduces 3 states:
* Unfaulted - A page that has not been faulted yet.
* Faulted - A page that we have previously faulted in.
* Removed - A page that we have received a remove event for and
haven't faulted in since.
It also adds the necessary book keeping of page state in all the memory
regions of the guest, along with methods for retrieving and setting the
state of pages.
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Import a few more bindings that we'll need for handling remove events. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Handle UFFD_EVENT_REMOVE events from the file descriptor. These events are triggerred when Firecracker calls madvise() with MADV_DONTNEED on some memory range that we are tracking. This Firecracker behaviour is support with version 1.14.0 onwards using the free page reporting and hinting features of balloon devices. What this means for us is that, we need to track removed pages because subsequent page faults need to be served with a zero page. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Balloon devices provide memory reclamation facilities through free page reporting, as a more efficient mechanism (in terms of latency and CPU time) than traditional ballooning. Free page reporting instructs the guest to periodically report memory that has been freed, so that we can reclaim it in the host side. It is enabled before starting the sandbox and does not require any further host-side orchestration. Enable free page reporting for all new templates using Firecracker versions >= v1.14.0. Also, allow users to optionally disable it for these versions. Older Firecracker versions don't support the feature. Trying to enable it for those, will return an error. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
10a0cfd to
5fa75fe
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Nil source dereference when zero-filling non-standard page sizes
faultPagenow handles allsource == nilpage sizes by zero-filling in a dedicated branch, preventing the nil dereference and silent defer loop.
- ✅ Fixed: Test helper
faultedOffsetsreturns all states, not just faulted- The test helper now filters
pageTrackerentries to include only pages in thefaultedstate before converting addresses to offsets.
- The test helper now filters
- ✅ Fixed: Prefault silently succeeds on EAGAIN without mapping page
Prefaultnow returns an error whenfaultPagereportshandled == false, so deferred EAGAIN cases are no longer reported as successful prefaults.
Or push these changes by commenting:
@cursor push b8ac3ea454
Preview (b8ac3ea454)
diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
@@ -409,7 +409,11 @@
defer pt.mu.RUnlock()
offsets := make([]uint64, 0, len(pt.m))
- for addr := range pt.m {
+ for addr, state := range pt.m {
+ if state != faulted {
+ continue
+ }
+
offset, err := ma.GetOffset(addr)
if err != nil {
return nil, fmt.Errorf("address %#x not in mapping: %w", addr, err)
diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
@@ -35,7 +35,10 @@
}
if !handled {
- span.RecordError(fmt.Errorf("page already faulted"))
+ err = fmt.Errorf("page fault was deferred and not mapped")
+ span.RecordError(err)
+
+ return err
}
return nil
diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
@@ -360,6 +360,8 @@
writeErr = u.fd.zero(addr, u.pageSize, mode)
case source == nil && u.pageSize == header.HugepageSize:
writeErr = u.fd.copy(addr, u.pageSize, header.EmptyHugePage, mode)
+ case source == nil:
+ writeErr = u.fd.zero(addr, u.pageSize, mode)
default:
b, dataErr := source.Slice(ctx, offset, int64(u.pageSize))
if dataErr != nil {| // would anyways cause clear the write-protection bit. | ||
| if accessType != block.Write { | ||
| copyMode |= UFFDIO_COPY_MODE_WP | ||
| writeErr = u.fd.copy(addr, u.pageSize, b, mode) |
There was a problem hiding this comment.
Nil source dereference when zero-filling non-standard page sizes
Medium Severity
In faultPage, when source is nil (removed page needing zero-fill), only 4KB (header.PageSize) and 2MB (header.HugepageSize) are handled explicitly. The default branch calls source.Slice() which would panic on a nil source. The panic recovery silently swallows the error, returning (false, nil), which causes the fault to be endlessly deferred via deferred.push(pf), permanently blocking the faulting thread.
| slices.Sort(offsets) | ||
| } | ||
|
|
||
| return offsets, nil |
There was a problem hiding this comment.
Test helper faultedOffsets returns all states, not just faulted
Low Severity
faultedOffsets iterates over all entries in pageTracker.m with for addr := range pt.m without filtering by pageState. Pages in removed state are included in the results alongside genuinely faulted pages. This would cause tests exercising the new remove event flow to report incorrect faulted offsets.
| span.RecordError(fmt.Errorf("page already faulted")) | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
Prefault silently succeeds on EAGAIN without mapping page
Medium Severity
When faultPage returns (false, nil) due to EAGAIN (concurrent remove event), Prefault treats it as a non-error by recording "page already faulted" and returning nil. The page is not actually mapped, but the caller believes prefaulting succeeded. Unlike the Serve loop which retries via deferred.push, Prefault has no retry mechanism for EAGAIN.



Enabling Firecracker free-page-reporting feature requires us to handle remove events (UFFD_EVENT_REMOVE) in our userfaultfd handler. These events are triggered whenever Firecracker calls madvise(MADV_DONTNEED) (or similar) on a range of guest memory addresses.
The main thing that changes on our logic is that page faults in a page that has previously been removed need to be served with a zero page rather than a page from the snapshot file.
This commit changes the page fault serving logic to:
This is dependent on the part of #1858 that enables free page reporting on the Firecracker side.
Note
High Risk
High risk because it changes low-level userfaultfd page-fault handling and VM startup configuration, where subtle race/IOCTL edge cases could cause sandbox boot failures or memory correctness issues.
Overview
This PR updates the sandbox stack to support Firecracker free page reporting by bumping the default Firecracker version to
v1.14.1and introducing an end-to-endFreePageReportingtoggle (API/proto, template build configs, and CLI with version-based defaulting). It significantly refactors the userfaultfd handler to processUFFD_EVENT_REMOVE, track per-page state, and serve removed pages as zero-filled rather than snapshot-backed data, including new UFFD zero-page support and adjusted memory-mapping helpers/tests to match the new flow.Written by Cursor Bugbot for commit 5fa75fe. This will update automatically on new commits. Configure here.