uffd: add support for UFFD_EVENT_REMOVE events by bchalios · Pull Request #1896 · e2b-dev/infra

bchalios · 2026-02-11T21:13:23Z

Enabling Firecracker free-page-reporting feature requires us to handle remove events (UFFD_EVENT_REMOVE) in our userfaultfd handler. These events are triggered whenever Firecracker calls madvise(MADV_DONTNEED) (or similar) on a range of guest memory addresses.

The main thing that changes on our logic is that page faults in a page that has previously been removed need to be served with a zero page rather than a page from the snapshot file.

This commit changes the page fault serving logic to:

Introduce tracking of the state of every page in the guest's memory mappings.
Add logic to handle the new UFFD_EVENT_REMOVE event
Modify existing logic to take into account current state when deciding how to handle each page fault

This is dependent on the part of #1858 that enables free page reporting on the Firecracker side.

Note

High Risk
High risk because it changes low-level userfaultfd page-fault handling and VM startup configuration, where subtle race/IOCTL edge cases could cause sandbox boot failures or memory correctness issues.

Overview
This PR updates the sandbox stack to support Firecracker free page reporting by bumping the default Firecracker version to v1.14.1 and introducing an end-to-end FreePageReporting toggle (API/proto, template build configs, and CLI with version-based defaulting). It significantly refactors the userfaultfd handler to process UFFD_EVENT_REMOVE, track per-page state, and serve removed pages as zero-filled rather than snapshot-backed data, including new UFFD zero-page support and adjusted memory-mapping helpers/tests to match the new flow.

^{Written by Cursor Bugbot for commit 5fa75fe. This will update automatically on new commits. Configure here.}

packages/orchestrator/internal/sandbox/uffd/memory/mapping.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

packages/orchestrator/internal/sandbox/uffd/memory/region.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

packages/orchestrator/internal/sandbox/fc/process.go

packages/shared/pkg/fc/firecracker.yml

packages/orchestrator/internal/sandbox/uffd/memory/region.go

Copilot

Pull request overview

This PR upgrades Firecracker from v1.12.1 to v1.14.1 and adds support for UFFD_EVENT_REMOVE events in the userfaultfd handler, which is required to support Firecracker's free-page-reporting feature. When Firecracker calls madvise(MADV_DONTNEED) on guest memory ranges, the kernel emits REMOVE events; subsequent page faults on those addresses must be zero-filled rather than served from the snapshot file.

Changes:

Adds UFFD_EVENT_REMOVE handling in the UFFD handler with a new pageTracker to track per-page state (unfaulted, faulted, removed), replaces the old block.Tracker with this new tracker, and introduces deferred fault retry logic for EAGAIN.
Bumps Firecracker default version to v1.14.1_aba5ab3 across feature flags, seeds, and CI, and regenerates the go-swagger client/models for the new Firecracker v1.14.1 OpenAPI spec (adding balloon hinting, pmem, serial device, memory hotplug, and structured CPU config).
Adds FreePageReporting config flag propagated from the API layer through the orchestrator to Firecracker, enabled by default for v1.14+ sandboxes.

Reviewed changes

Copilot reviewed 68 out of 68 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
`userfaultfd/userfaultfd.go`	Core UFFD handler: adds REMOVE event processing, page state tracking, deferred fault retry, zero-fill for removed pages
`userfaultfd/page_tracker.go`	New per-page state map (`unfaulted`/`faulted`/`removed`)
`userfaultfd/deferred.go`	Thread-safe collection for pagefaults that returned EAGAIN
`userfaultfd/prefault.go`	Extracted prefault logic from main handler
`userfaultfd/fd.go`	New UFFD constants/types: `UFFD_EVENT_REMOVE`, `UFFDIO_ZEROPAGE`, `UffdRemove`, etc.
`memory/mapping.go`	Simplified `GetOffset`/`GetHostVirtAddr` to return one fewer value
`block/tracker.go` + `tracker_test.go`	Deleted (replaced by `pageTracker`)
`fc/firecracker.yml`	OpenAPI spec bumped to v1.14.1 with new endpoints and definitions
`fc/models/*`	Regenerated go-swagger models for new Firecracker API
`fc/client/operations/*`	Regenerated go-swagger client operations
`sandbox/fc/client.go`	New `enableFreePageReporting` method
`sandbox/fc/process.go`	Passes `freePageReporting` flag to process creation
`sandbox/sandbox.go`	`Config` struct extended with `FreePageReporting`
`api/.../sandbox_features.go`	`HasFreePageReporting()` version check
`api/.../create_template.go`	Passes free-page-reporting flag when creating templates
`feature-flags/flags.go`	New v1.14 Firecracker version constant
`tests/integration/seed.go`	Seed data updated to v1.14.1
`.github/actions/build-sandbox-template/action.yml`	CI updated to v1.14.1

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

Copilot · 2026-03-04T07:48:56Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+			u.wg.Go(func() error {
+				// The RLock must be called inside the goroutine to ensure RUnlock runs via defer,
+				// even if the errgroup is cancelled or the goroutine returns early.
+				// This check protects us against race condition between marking the request for prefetching and accessing the prefetchTracker.
+				u.settleRequests.RLock()
+				defer u.settleRequests.RUnlock()
+
+				var copyMode CULong
+				var accessType block.AccessType
+
+				// Performing copy() on UFFD clears the WP bit unless we explicitly tell
+				// it not to. We do that for faults caused by a read access. Write accesses
+				// would anyways cause clear the write-protection bit.
+				if pf.flags&UFFD_PAGEFAULT_FLAG_WRITE == 0 {
+					copyMode = UFFDIO_COPY_MODE_WP
+					accessType = block.Read
+				} else {
+					accessType = block.Write
+				}
+
+				handled, err := u.faultPage(
+					ctx,
+					addr,
+					offset,
+					source,
+					fdExit.SignalExit,
+					copyMode,
+				)
+				if err != nil {
+					return err
+				}
+
+				if handled {
+					u.pageTracker.setState(addr, addr+u.pageSize, faulted)
+				} else {
+					deferred.push(pf)
+				}
+
+				u.prefetchTracker.Add(offset, accessType)


There is a race condition between the goroutine that calls u.pageTracker.setState(addr, addr+u.pageSize, faulted) (line 321) and the main event loop that reads u.pageTracker.get(addr) (line 276). The pageTracker itself is mutex-protected for individual operations, but between the time the main loop reads a page as unfaulted, dispatches a goroutine, and the goroutine eventually marks it faulted, a second pagefault for the same address can be dispatched and both goroutines can race to call faultPage for the same address. While EEXIST is handled (line 376), this scenario can also cause the deferred.push(pf) path to be taken by both, resulting in indefinite re-queueing. More critically, in the same goroutine, deferred.push(pf) is called (line 323) without holding the settleRequests lock, while the main loop drains deferred (line 235) without coordinating with in-flight goroutines, meaning a deferred fault could be drained and forgotten before the goroutine that pushed it is done executing.

Not sure right now if this can be a problem, so we should recheck.

Every time we see there are available events to read from UFFD, we wait for all Go routines in the workgroup to complete before we start handling the events of the outer iteration. That happens at line 186.

Copilot · 2026-03-04T07:48:56Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+	}
+
+	if !handled {
+		span.RecordError(fmt.Errorf("page already faulted"))


The Prefault function in prefault.go (line 37-39) records an error to the span when handled is false (EAGAIN case), treating it as an error event. However, when faultPage returns (false, nil), it means the write returned EAGAIN and should be retried — it is not an error condition. The span should not record this as an error; at most a span attribute or event noting the deferral would be appropriate.

Suggested change

span.RecordError(fmt.Errorf("page already faulted"))

span.AddEvent("prefault deferred: page already faulted or write returned EAGAIN")

We should clean this up.

Copilot · 2026-03-04T07:48:56Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+			pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0])))
+		case UFFD_EVENT_REMOVE:
+			removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0])))


In readEvents, the buf slice is reused across loop iterations, but the parsed UffdRemove and UffdPagefault pointers (at lines 122-124) point directly into the buffer's underlying memory via unsafe.Pointer. Since buf is overwritten by each subsequent syscall.Read call, all previously appended pointers now alias the same buffer memory and will reflect the data of the last-read event rather than the event they were parsed from.

Each parsed struct must be copied into its own allocation (e.g. v := *(*UffdPagefault)(unsafe.Pointer(&arg[0])); pagefaults = append(pagefaults, &v)) so that every pointer in the returned slices points to independent memory. Without this fix, processing multiple events in a single readEvents call leads to all events appearing identical (the last-read event), causing incorrect page state tracking and potentially silent data corruption in the guest VM's memory.

Suggested change

pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0])))

case UFFD_EVENT_REMOVE:

removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0])))

v := *(*UffdPagefault)(unsafe.Pointer(&arg[0]))

pagefaults = append(pagefaults, &v)

case UFFD_EVENT_REMOVE:

v := *(*UffdRemove)(unsafe.Pointer(&arg[0]))

removes = append(removes, &v)

We should look into this.

I think this is a vaild one. We only allocate buf once and then reuse it across iterations, where we take a pointer to its arg field. I think all events will be pointing to the same memory. The solution is to move the allocation inside the loop.

Actually it's not. Indeed we're re-using the same buffer (buf) but the arg bit is something we're getting calling getMsgArg which returns a [24]byte value, which essentially copies the msg.arg field. It's not the same memory. I have validated this, by manually inspecting the addresses of the various variables when the loop handles more than one events.

Copilot · 2026-03-04T07:48:56Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go

+func (pt *pageTracker) faultedOffsets(ma *memory.Mapping) ([]uint64, error) {
+	pt.mu.RLock()
+	defer pt.mu.RUnlock()
+
+	offsets := make([]uint64, 0, len(pt.m))
+	for addr := range pt.m {
+		offset, err := ma.GetOffset(addr)
+		if err != nil {
+			return nil, fmt.Errorf("address %#x not in mapping: %w", addr, err)
+		}
+		offsets = append(offsets, uint64(offset))
+	}


In cross_process_helpers_test.go, the faultedOffsets helper iterates over all pages in pt.m regardless of their state, including pages in the removed state. This means removed pages will appear in the faulted offsets output, which may cause test assertions to match incorrectly. Only pages in the faulted state should be included in the faulted offsets list.

That is ok, because for this test we want all pages that were faulted and to be removed it had to be faulted.

Though, we might want to change this if we are adding a new test for the removed.

packages/api/internal/sandbox/sandbox_features.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/page_tracker.go

packages/shared/pkg/fc/firecracker.yml

packages/shared/pkg/fc/models/balloon_stats.go

cursor · 2026-03-04T07:52:08Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+		case UFFD_EVENT_PAGEFAULT:
+			pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0])))
+		case UFFD_EVENT_REMOVE:
+			removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0])))


Event pointers reference ephemeral stack memory

High Severity

readEvents appends pointers created from arg := getMsgArg(msg), but arg is a per-iteration local copy. The stored *UffdPagefault and *UffdRemove pointers outlive that stack value, so later reads can use corrupted event data or repeated last-event values.

Already noted in another comment.

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

packages/api/internal/sandbox/sandbox_features.go

cursor · 2026-03-04T07:52:08Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+			switch state := u.pageTracker.get(addr); state {
+			case faulted:
+				// TODO: Can we skip faulting pages that are already faulted? How does that play with prefaulting?
+				continue


Repeated faults are skipped indefinitely

Medium Severity

When a page is marked faulted, later pagefault events for that same address are ignored with continue. If a mapped page becomes missing again (for example after reclaim/swap behavior), the new fault is never resolved, leaving the guest thread blocked on an unhandled userfault.

Added TODO there to resolve this.

Do we actually use swap on the host? If not, the logic here is correct. The only reason why a page would become missing is because we previously removed it, and we are tracking that. If we do enable swap this whole logic is wrong. Actually it already was wrong, even before adding support for UFFD_EVENT_REMOVE. Imagine a page has been written. This means its contents have changed comparing to what is in the snapshot file. If that page gets swapped out, we have no way of knowing what data we should provide for it. I'm not really sure how swapping would interplay with UFFD at all.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06d14431fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

ValentaTomas · 2026-03-04T07:57:10Z

We should add a test that explicitly tests the remove (and faulted?) flows in the uffd.

cursor · 2026-03-04T08:05:04Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+		case UFFD_EVENT_PAGEFAULT:
+			pagefaults = append(pagefaults, (*UffdPagefault)(unsafe.Pointer(&arg[0])))
+		case UFFD_EVENT_REMOVE:
+			removes = append(removes, (*UffdRemove)(unsafe.Pointer(&arg[0])))


Uffd event pointers reference invalid stack memory

High Severity

readEvents stores pointers to arg, but arg is a per-iteration local copy ([24]byte). After the loop, those pointers no longer reliably reference valid event data. Serve then reads pf.flags, pf.address, and remove ranges from invalid memory, which can mis-handle faults or fail unpredictably under load.

Additional Locations (1)

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L254-L267

Not sure here, we should recheck.

I think that if we move buf inside the loop then Go will understand that it escapes from the function's context and will just place it in heap.

Event if we don't move buf inside the loop, Go is able to track that the memory (of arg) escapes the function context and puts it in the heap.

packages/api/internal/sandbox/sandbox_features.go

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

cursor · 2026-03-04T08:05:04Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+		span.RecordError(fmt.Errorf("page already faulted"))
+	}
+
+	return nil


Prefault silently succeeds after deferred write failure

Medium Severity

faultPage now returns handled=false on unix.EAGAIN, but Prefault treats !handled as non-fatal and still returns nil. This allows prefault requests to report success even when no page was mapped, leaving memory unfaulted and shifting failures to later runtime faults.

Additional Locations (1)

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L383-L390

We should check the logic.

The reason for this is that we get EAGAIN because of a UFFD_EVENT_REMOVE landing (concurrently) in the queue. When we get an EAGAIN during pre-faulting we don't explicitly handle a page fault, so ignoring here should be fine. I'm assuming here that when pre-faulting happens, we don't run the normal handler loop, right? If that's not the case, we might get issues if we are trying to "prefault" a page and handle a page fault to the same page at the same time.

We do run the normal loop at the time of the prefaulting.

Ok, I've spent some time thinking this. I don't think there's a problem with running prefaulter and on-demand faulting loop concurrently (as long as the page state tracker is handled/coordinated correctly). I can elaborate offline if you want why that is the case.

In any case, it's fine returning nil here. If EAGAIN happens, we will just not pre-fault the page. There isn't any functional problem with it.

cursor · 2026-03-04T08:05:04Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+				// would anyways cause clear the write-protection bit.
+				if pf.flags&UFFD_PAGEFAULT_FLAG_WRITE == 0 {
+					copyMode = UFFDIO_COPY_MODE_WP
+					accessType = block.Read


Wrong mode passed to zeropage ioctl

High Severity

Read faults set mode to UFFDIO_COPY_MODE_WP, and removed 4KB pages are handled via UFFDIO_ZEROPAGE. zero() forwards that mode directly to UFFDIO_ZEROPAGE, which does not use UFFDIO_COPY_MODE_WP. This can make zero-fill faults fail and break handling of UFFD_EVENT_REMOVE for standard pages.

Additional Locations (2)

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go#L355-L357

packages/orchestrator/internal/sandbox/uffd/userfaultfd/fd.go#L152-L157

We should recheck if the zero does not handle mode wp.

I'm pretty sure that it does. I will re-check

cursor · 2026-03-04T08:18:25Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+					deferred.push(pf)
+				}
+
+				u.prefetchTracker.Add(offset, accessType)


Prefetch tracker updated for deferred (unhandled) page faults

Medium Severity

u.prefetchTracker.Add(offset, accessType) is called unconditionally after faultPage, even when handled is false (i.e., the fault was deferred due to EAGAIN). This means the prefetch tracker records offsets for pages that haven't actually been served yet. When the deferred fault is later retried and succeeds, prefetchTracker.Add will be called a second time for the same offset. Previously, this call only happened after a successful copy. It likely needs to be inside the if handled block.

This is valid; we should fix it.

cursor · 2026-03-04T08:18:25Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

-	// would anyways cause clear the write-protection bit.
-	if accessType != block.Write {
-		copyMode |= UFFDIO_COPY_MODE_WP
+		writeErr = u.fd.copy(addr, u.pageSize, b, mode)


Nil source reaches default switch causing nil dereference

Medium Severity

In faultPage, the switch only handles source == nil for header.PageSize and header.HugepageSize. If source is nil (page was removed) and u.pageSize doesn't match either constant, execution falls through to the default case, which calls source.Slice(...) on a nil interface, causing a panic. The code assumes only two page sizes exist, but this isn't validated here.

The page should not be a different size, but maybe we can add check somewhere to prevent accidens.

ValentaTomas · 2026-03-04T08:21:29Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+		offset,
+		directDataSource{data, int64(u.pageSize)},
+		nil,
+		// TODO: What mode should we pass here?


@bchalios What should we pass here? Should we use mode WP if we are prefaulting?

Maybe it makes sense to pass COPY_WP? That's what we do with read page faults.

As mentioned in the other comment, the prefault can run at the same time as the normal serve loop, so we should sync about what needs to be done to make this correct.

Ok, here we should pass COPY_MODE_WP. Otherwise we will clear the write protection bit and subsequent snapshots will potentially be bigger.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-04T08:31:58Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

+					deferred.push(pf)
+				}
+
+				u.prefetchTracker.Add(offset, accessType)


Prefetch tracker records offset for unhandled deferred faults

Medium Severity

When faultPage returns handled=false (the EAGAIN path), the pagefault is correctly pushed to deferred for retry, but u.prefetchTracker.Add(offset, accessType) is called unconditionally afterward. This records the offset in prefetch data even though the page was never actually served. When the deferred fault eventually succeeds on a later iteration, prefetchTracker.Add is called again, double-counting the offset. The prefetchTracker.Add call needs to be inside the if handled branch.

We should check this.

Bump the swagger file for Firecracker to v1.14 and regenerate APIs/models. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Add support for Firecracker v1.14 and make it the default. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

As we're going to handle UFFD_EVENT_REMOVE events triggerred by Firecracker, we need to keep track of the state of all the guest memory pages. Theis commit introduces 3 states: * Unfaulted - A page that has not been faulted yet. * Faulted - A page that we have previously faulted in. * Removed - A page that we have received a remove event for and haven't faulted in since. It also adds the necessary book keeping of page state in all the memory regions of the guest, along with methods for retrieving and setting the state of pages. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Import a few more bindings that we'll need for handling remove events. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Handle UFFD_EVENT_REMOVE events from the file descriptor. These events are triggerred when Firecracker calls madvise() with MADV_DONTNEED on some memory range that we are tracking. This Firecracker behaviour is support with version 1.14.0 onwards using the free page reporting and hinting features of balloon devices. What this means for us is that, we need to track removed pages because subsequent page faults need to be served with a zero page. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Balloon devices provide memory reclamation facilities through free page reporting, as a more efficient mechanism (in terms of latency and CPU time) than traditional ballooning. Free page reporting instructs the guest to periodically report memory that has been freed, so that we can reclaim it in the host side. It is enabled before starting the sandbox and does not require any further host-side orchestration. Enable free page reporting for all new templates using Firecracker versions >= v1.14.0. Also, allow users to optionally disable it for these versions. Older Firecracker versions don't support the feature. Trying to enable it for those, will return an error. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

✅ Fixed: Nil source dereference when zero-filling non-standard page sizes
- faultPage now handles all source == nil page sizes by zero-filling in a dedicated branch, preventing the nil dereference and silent defer loop.
✅ Fixed: Test helper faultedOffsets returns all states, not just faulted
- The test helper now filters pageTracker entries to include only pages in the faulted state before converting addresses to offsets.
✅ Fixed: Prefault silently succeeds on EAGAIN without mapping page
- Prefault now returns an error when faultPage reports handled == false, so deferred EAGAIN cases are no longer reported as successful prefaults.

Or push these changes by commenting:

@cursor push b8ac3ea454

Preview (b8ac3ea454)

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go
@@ -409,7 +409,11 @@
 	defer pt.mu.RUnlock()
 
 	offsets := make([]uint64, 0, len(pt.m))
-	for addr := range pt.m {
+	for addr, state := range pt.m {
+		if state != faulted {
+			continue
+		}
+
 		offset, err := ma.GetOffset(addr)
 		if err != nil {
 			return nil, fmt.Errorf("address %#x not in mapping: %w", addr, err)

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go
@@ -35,7 +35,10 @@
 	}
 
 	if !handled {
-		span.RecordError(fmt.Errorf("page already faulted"))
+		err = fmt.Errorf("page fault was deferred and not mapped")
+		span.RecordError(err)
+
+		return err
 	}
 
 	return nil

diff --git a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
--- a/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
+++ b/packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go
@@ -360,6 +360,8 @@
 		writeErr = u.fd.zero(addr, u.pageSize, mode)
 	case source == nil && u.pageSize == header.HugepageSize:
 		writeErr = u.fd.copy(addr, u.pageSize, header.EmptyHugePage, mode)
+	case source == nil:
+		writeErr = u.fd.zero(addr, u.pageSize, mode)
 	default:
 		b, dataErr := source.Slice(ctx, offset, int64(u.pageSize))
 		if dataErr != nil {

_{This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.}

cursor · 2026-03-06T19:17:36Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go

-	// would anyways cause clear the write-protection bit.
-	if accessType != block.Write {
-		copyMode |= UFFDIO_COPY_MODE_WP
+		writeErr = u.fd.copy(addr, u.pageSize, b, mode)


Nil source dereference when zero-filling non-standard page sizes

Medium Severity

In faultPage, when source is nil (removed page needing zero-fill), only 4KB (header.PageSize) and 2MB (header.HugepageSize) are handled explicitly. The default branch calls source.Slice() which would panic on a nil source. The panic recovery silently swallows the error, returning (false, nil), which causes the fault to be endlessly deferred via deferred.push(pf), permanently blocking the faulting thread.

cursor · 2026-03-06T19:17:36Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/cross_process_helpers_test.go

+		slices.Sort(offsets)
+	}
+
+	return offsets, nil


Test helper faultedOffsets returns all states, not just faulted

Low Severity

faultedOffsets iterates over all entries in pageTracker.m with for addr := range pt.m without filtering by pageState. Pages in removed state are included in the results alongside genuinely faulted pages. This would cause tests exercising the new remove event flow to report incorrect faulted offsets.

cursor · 2026-03-06T19:17:36Z

packages/orchestrator/internal/sandbox/uffd/userfaultfd/prefault.go

+		span.RecordError(fmt.Errorf("page already faulted"))
+	}
+
+	return nil


Prefault silently succeeds on EAGAIN without mapping page

Medium Severity

When faultPage returns (false, nil) due to EAGAIN (concurrent remove event), Prefault treats it as a non-error by recording "page already faulted" and returning nil. The page is not actually mapped, but the caller believes prefaulting succeeded. Unlike the Serve loop which retries via deferred.push, Prefault has no retry mechanism for EAGAIN.

e2b-request-same-site-reviewers bot assigned jakubno Feb 11, 2026

bchalios force-pushed the feat/free-page-reporting branch from aef7cd8 to eab27ea Compare February 11, 2026 21:13

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/mapping.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

claude bot reviewed Feb 11, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/region.go Outdated Show resolved Hide resolved

bchalios force-pushed the feat/free-page-reporting branch 2 times, most recently from a1d0a02 to 03e2966 Compare February 11, 2026 21:20

jakubno assigned ValentaTomas Feb 11, 2026

bchalios force-pushed the feat/free-page-reporting branch 5 times, most recently from b766eb3 to 4d197e4 Compare February 12, 2026 23:48

ValentaTomas reviewed Feb 12, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

ValentaTomas reviewed Feb 12, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

ValentaTomas reviewed Feb 12, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/fc/process.go Outdated Show resolved Hide resolved

ValentaTomas reviewed Feb 12, 2026

View reviewed changes

packages/shared/pkg/fc/firecracker.yml Show resolved Hide resolved

ValentaTomas reviewed Feb 12, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/region.go Outdated Show resolved Hide resolved

bchalios force-pushed the feat/free-page-reporting branch 2 times, most recently from e639acf to 88420f2 Compare February 13, 2026 00:45

ValentaTomas reviewed Feb 13, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/memory/region.go Outdated Show resolved Hide resolved

bchalios force-pushed the feat/free-page-reporting branch 8 times, most recently from d2e5d09 to 8ea97e4 Compare February 17, 2026 15:49

ValentaTomas unassigned arkamar Mar 4, 2026

Copilot started reviewing on behalf of ValentaTomas March 4, 2026 07:45 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

cursor bot reviewed Mar 4, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 4, 2026

View reviewed changes

packages/orchestrator/internal/sandbox/uffd/userfaultfd/userfaultfd.go Outdated Show resolved Hide resolved

cursor bot reviewed Mar 4, 2026

View reviewed changes

ValentaTomas reviewed Mar 4, 2026

View reviewed changes

cursor bot reviewed Mar 4, 2026

View reviewed changes

bchalios and others added 18 commits March 5, 2026 11:37

feat: update Firecracker client to v1.14

0f87625

Bump the swagger file for Firecracker to v1.14 and regenerate APIs/models. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

orchestrator: add support for Firecracker v1.14

25644cf

Add support for Firecracker v1.14 and make it the default. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

uffd: export more CGO bindings for UFFD

646dfeb

Import a few more bindings that we'll need for handling remove events. Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

chore: auto-commit generated changes

f5b68be

Cleanup and move free page reporting field to API

f06903b

Cleanup

f9e942d

Wrap onFailure

228f662

Clarify prefetch remove handling

2442750

Cleanup

95f4387

Fix lint

4091dc4

Fix lint supress

4e4c937

Clarify behavior; Fix bug

ecf0841

Cleanup

574ac78

Fix version check for major > 2

35f98d3

fix(uffd): clarifications and minor fixes

5fa75fe

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

bchalios force-pushed the feat/free-page-reporting branch from 10a0cfd to 5fa75fe Compare March 6, 2026 19:11

cursor bot reviewed Mar 6, 2026

View reviewed changes

	span.RecordError(fmt.Errorf("page already faulted"))
	span.AddEvent("prefault deferred: page already faulted or write returned EAGAIN")

Conversation

bchalios commented Feb 11, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bchalios Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot Mar 4, 2026

Choose a reason for hiding this comment

Event pointers reference ephemeral stack memory

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cursor bot Mar 4, 2026

Choose a reason for hiding this comment

Repeated faults are skipped indefinitely

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ValentaTomas commented Mar 4, 2026

Uh oh!

cursor bot Mar 4, 2026

Choose a reason for hiding this comment

bchalios commented Feb 11, 2026 •

edited by cursor bot

Loading

bchalios Mar 6, 2026 •

edited

Loading