[doc][netebpfext] Add proposal for async operations in netebpfext by matthewige · Pull Request #5189 · microsoft/ebpf-for-windows

matthewige · 2026-04-22T00:51:46Z

Description

Design proposal for asynchronous processing of verdicts in netebpfext. Adds docs/AsyncProcessing.md covering how WFP classifyFn callouts can PEND, return to WFP, and later COMPLETE from another context, including how this composes with eBPF program invocation, custom maps, and the existing ebpfcore extension contract.

Closes #5188.

This is a documentation-only PR -- no code or test changes.

Highlights

- Pend / complete model built on a custom continuation map plus the extension helpers bpf_pend_operation() / bpf_complete_operation(), with a per-bucket lock and an explicit PENDING -> COMPLETING -> COMPLETED lifecycle state machine.
- Layer-specific integration for ALE_AUTH_CONNECT_V{4,6} (synchronous reauth) and ALE_AUTH_RECV_ACCEPT_V{4,6} (direct FwpsCompleteOperation from ebpfext context).
- Threaded-DPC unwinding contract for AUTH_* deferred completion, and a best-effort PASSIVE-level fence around in-classifyFn completion (Race B).
- ebpfcore platform requirements consolidated, including kernel-mode CRUD APIs for custom maps and a provider-side map handle issued at registration so extensions can drive periodic stale-entry cleanup through the same CRUD path.
- netebpfext work breakdown for the PEND helpers, the threaded-DPC unwinding glue, and the PASSIVE fence.
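
As an illustration of the PENDING -> COMPLETING -> COMPLETED lifecycle named in the first highlight, here is a minimal user-mode sketch. It is not the proposal's implementation: the type and function names are hypothetical, and it uses a C11 atomic compare-exchange to pick a single completer, whereas the proposal describes a per-bucket lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lifecycle states; the names follow the PR text, not a real header. */
typedef enum {
    ENTRY_PENDING = 0,
    ENTRY_COMPLETING,
    ENTRY_COMPLETED
} entry_state_t;

typedef struct {
    _Atomic int state; /* one of entry_state_t */
    int verdict;       /* written only by the caller that won COMPLETING */
} pend_entry_t;

static void pend_entry_init(pend_entry_t* e)
{
    atomic_init(&e->state, ENTRY_PENDING);
    e->verdict = 0;
}

/* Exactly one caller can move PENDING -> COMPLETING; every other caller gets false. */
static bool pend_entry_try_begin_complete(pend_entry_t* e)
{
    int expected = ENTRY_PENDING;
    return atomic_compare_exchange_strong(&e->state, &expected, ENTRY_COMPLETING);
}

/* The winner records the verdict and publishes COMPLETED. Fails in any other state. */
static bool pend_entry_finish_complete(pend_entry_t* e, int verdict)
{
    if (atomic_load(&e->state) != ENTRY_COMPLETING) {
        return false;
    }
    e->verdict = verdict;
    atomic_store(&e->state, ENTRY_COMPLETED);
    return true;
}
```

The intermediate COMPLETING state is what lets a second, racing completion attempt be rejected before any verdict is written.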

Testing

n/a -- documentation only.

Documentation

This PR is the documentation.

Installation

n/a.

Copilot

Pull request overview

Adds a design proposal document for introducing pend/complete-style asynchronous verdict processing to netebpfext (eBPF network extension), intended to enable external orchestrators (user-mode and/or kernel-mode) to complete deferred WFP decisions.

Changes:

- Introduces a new proposal doc describing an async “PEND/COMPLETE/CONTINUE” lifecycle built around a custom map and a net_ebpf_ext_pend_operation() helper.
- Documents expected control flow for PEND, COMPLETE, CONTINUE, and common failure/edge cases across multiple WFP layers.
- Outlines required ebpfcore platform changes and a phased netebpfext work breakdown.

dthaler · 2026-04-27T14:48:58Z

+   [Supported WFP layers](#supported-wfp-layers)) must be able to
+   pend the current network operation and return control to WFP.
+2. An external async orchestrator must be able to complete the pended operation
+   with a verdict (PERMIT, BLOCK, or CONTINUE) at a later time.


Any constraints on "later"? How do you prevent a buggy ebpf program from pending and never completing, thus consuming all resources? Is there something the verifier can do to ensure safety?

Good question. To be clear, there is no 100% guarantee here -- bounding completion latency is the async orchestrator's responsibility, by design. The extension provides backstops, not a guarantee:

The pend map is bounded by max_entries. Once full, pend() returns an error and the program must return a non-PEND verdict, so a buggy/stalled orchestrator degrades gracefully (no kernel resource exhaustion).

A per-entry stale-entry watchdog (driven by the timestamp recorded at pend time) reclaims entries whose orchestrator never came back.

There is no verifier check that proves COMPLETE will eventually be called, since that depends entirely on user-mode behavior; the verifier can't reason about it.
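
The two backstops above (a bounded pend map and a timestamp-driven watchdog) can be sketched in plain C as follows. This is only a user-mode illustration under stated assumptions: the table layout, function names, and the 2000 ms threshold are hypothetical (the "2 seconds" figure is only floated later in this thread), and a real watchdog would also have to resolve the pended operation rather than just free the slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PEND_MAX_ENTRIES 4        /* stands in for the pend map's max_entries */
#define PEND_STALE_AFTER_MS 2000u /* illustrative threshold, not from the design doc */

typedef struct {
    bool in_use;
    uint64_t key;
    uint64_t pended_at_ms; /* timestamp recorded at pend time */
} pend_slot_t;

typedef struct {
    pend_slot_t slots[PEND_MAX_ENTRIES];
} pend_table_t;

/* Returns 0 on success, -1 when the table is full (caller must return a non-PEND verdict). */
static int pend_table_insert(pend_table_t* t, uint64_t key, uint64_t now_ms)
{
    for (size_t i = 0; i < PEND_MAX_ENTRIES; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (pend_slot_t){.in_use = true, .key = key, .pended_at_ms = now_ms};
            return 0;
        }
    }
    return -1;
}

/* Watchdog pass: frees every entry at least PEND_STALE_AFTER_MS old; returns how many. */
static size_t pend_table_reap_stale(pend_table_t* t, uint64_t now_ms)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < PEND_MAX_ENTRIES; i++) {
        pend_slot_t* s = &t->slots[i];
        if (s->in_use && now_ms - s->pended_at_ms >= PEND_STALE_AFTER_MS) {
            s->in_use = false; /* sketch only: the pended operation itself is not resolved here */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

With a fixed capacity and a periodic reap, the worst case for a stalled orchestrator is a full table for one threshold interval, after which new pends succeed again.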

Added a forward-link from the requirement to Edge case 1 in 4228b62 so the design overview points readers at these backstops. Happy to add a more explicit verifier-side check if you have one in mind (e.g., 'program must call pend() before returning PEND').

If there is no guarantee, I'm afraid this will degrade the value of ebpf itself. Would like to discuss this in a meeting. I didn't quite understand the point "A per-entry stale-entry watchdog (driven by the timestamp recorded at pend time) reclaims entries whose orchestrator never came back". If this is just saying that it's guaranteed to complete within (say) 2 seconds, by failing (and cleaning up state) if not completed, then that would be a type of guarantee of safety.

- Fix TOC anchor for AUTH_CONNECT/AUTH_RECV_ACCEPT (slash maps to double dash)
- Remove FwpsPendClassify reference in DATAGRAM_DATA section -- per the layer table, datagram uses ABSORB+reinject only, no WFP pend API
- Fix typos (is is, the the)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add forward link from the COMPLETE requirement to Edge case 1, and call out that bounding completion latency is the orchestrator's responsibility -- the extension only provides backstops (stale-entry watchdog, bounded max_entries). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Note that re-invocation is asynchronous on a worker thread at PASSIVE_LEVEL, not synchronous inside COMPLETE, and not pinned to the original classifyFn CPU. Adds a forward-link to the CONTINUE flow section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- PERMIT -> PERMIT_SOFT/PERMIT_HARD/BLOCK across COMPLETE step body, callback validation list, and sequence-diagram labels.
- Failure flow: REJECT only (remove contradictory PERMIT fallback).
- Trim redundancy in helper description, 'Why threaded DPC?' blockquote, and CPU hot-unplug fallback (cross-ref canonical sources).
- Add diagnostic trace note in 'Multiple attached programs and PEND'.
- Service-crash text simplified to brief 'no active cleanup' wording.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address PR review feedback: the raw WSACMSGHDR* pointer would not be valid after the original classifyFn returns. Add a comment specifying that control_data is deep-copied into extension-owned memory at pend time so it is safe to access during completion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Alan-Jowett · 2026-04-30T16:06:06Z

Minor issue:
Not all of the mermaid diagrams render correctly in GitHub ui.

- Add extension-side stale-entry watchdog (layer (e)) as kernel-owned backstop; cite netebpfext connect-redirect precedent.
- Tighten saved-state-for-CONTINUE: extension saves full helper-visible net_ebpf_sock_addr_t wrapper (hook_id, redirect_context, transport_endpoint_handle, process_id, access_information, original_context).
- Add keying-note callout for continuation map (CONTINUE is consumer-defined; conn-tuple holds for ALE; per-packet layer keying deferred to per-layer design).
- Soften 'cleanup orchestrator responsibility' to layered model (orchestrator primary; extension backstop).
- Threaded-DPC: remove duplicated 'Load-bearing assumption' callout; reframe as 'Implementation validation' (engineering validation of reimplementation, not uncertainty about underlying DDK pattern).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Document the best-effort KEVENT signal-at-end fence for layers where WFP delivers classifyFn at PASSIVE_LEVEL (TCP ALE_AUTH_CONNECT/ALE_AUTH_RECV_ACCEPT). The threaded-DPC ordering proof relies on the IRQL barrier and does not transfer to PASSIVE; the fence narrows but does not close the race window with WFP's post-classify pipeline cleanup. Flagged as an open dependency for WFP-team confirmation.

- New sub-section under Race B
- Per-layer callout under AUTH_CONNECT / AUTH_RECV_ACCEPT
- New netebpfext work-breakdown item for the fence implementation
- Fix broken Race B anchor (missing third hyphen in slug for ' -> ')

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Extension-driven periodic stale-entry purge (Edge case 1 back-stop) requires a kernel-mode handle to the custom map plus an enumeration entry point. Today the dispatch-table callbacks deliver per-key context only -- the provider has no map handle and no way to iterate independently of an inbound op. Added as ebpfcore platform requirement microsoft#5. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- ebpfcore platform requirements item 2 now includes the map-handle requirement and the extension-driven periodic stale-entry purge (folded in from a brief separate item 5 that was added then removed).
- Drop empty '## WFP implementation requirements' section (was a one-line filler heading missing from the TOC; per-layer requirements live under '## Per-layer async design').
- Helper-name typo: pend_operation() -> bpf_pend_operation() in the implicit-program-context-accessor narrative (matches every other reference to the helper in this doc).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the WESP-side change: add originating_program to net_ebpf_ext_pend_internal_state_t so netebpfext can identify which program in a chain returned PEND when re-dispatching on CONTINUE. This was a real gap when the pend map is shared across multiple programs attached at the same attach point. Layer + attach params were already captured (layer_id, compartment_id) and chain semantics were already pinned down (PEND short-circuits at program N, aggregate_verdict from 1..N-1 preserved, only one program may PEND). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the WESP-side correction. The doc had implied that user-mode CRUD support was something that needed to be added -- in fact, user-mode and BPF-program-driven CRUD via the existing dispatch-table callbacks (preprocess_map_update_element, preprocess_map_delete_element, postprocess_map_find_element, postprocess_map_delete) is fully wired and works today. The actual remaining ebpfcore gap is extension-initiated kernel-mode CRUD APIs that let a custom-map provider drive operations on its own map from kernel mode (from inside an extension helper handler such as pend(), and from the threaded DPC where there is no in-flight user-mode or BPF caller to drive the dispatch). Updated the design-overview note and ebpfcore work-breakdown item 2 to reflect this -- the user-mode/BPF-helper paths are explicitly called out as 'no new wiring needed', and only the kernel-mode CRUD path is flagged as new ebpfcore work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror updates from the WESP async-processing PR round 4 walkthrough:

- Synthesized COMPLETE: clarified how netebpfext identifies which entry to clean up via pend_id stored in private classifyFn context wrapper.
- Multiple attached programs: documented detection of competing PEND-capable programs via a new MAY_PEND attach-params flag plus client-side enumeration of bpf_link_info.attach_data.
- Smaller threads: padding/static_assert, netebpfext fail behavior.

netebpfext work breakdown updated to track new attach-params extension item.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…odern standby + generic)

New Edge case 4 documents two mitigation patterns for conditions that require the orchestrator to temporarily disallow new pends and force-drain already-pended operations (modern standby is the canonical example; in-place servicing and planned maintenance are others):

- Pattern (a): pend control custom map (BPF_MAP_TYPE_NET_EBPF_EXT_PEND_CONTROL, size 1, singleton per extension instance, opt-in). Map is zero-initialized to PEND_STATE_DISABLED so the default is fail-closed; orchestrator affirms readiness with an explicit PEND_STATE_ENABLED write at startup. Helper short-circuits to NET_EBPF_EXT_PEND_ERROR_DISABLED while disabled; the ENABLED -> DISABLED transition queues a kernel-side threaded DPC drain.
- Pattern (b): best-effort shared BPF_MAP_TYPE_ARRAY polling fallback. Race window for new pends; no kernel-side fail-safe drain; cheaper.

Neutral trade-off framing: (a) actually closes the 9F surface, (b) significantly lower cost but leaves the problem unresolved; (a) can be layered on top of (b) incrementally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
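
A rough sketch of the fail-closed gate in pattern (a), assuming a single control word per extension instance. Only PEND_STATE_DISABLED, PEND_STATE_ENABLED, and NET_EBPF_EXT_PEND_ERROR_DISABLED come from the commit message; the struct, the function names, the error code's numeric value, and the drain counter (which merely stands in for queuing the threaded DPC drain) are assumptions for illustration.

```c
#include <stdatomic.h>

/* DISABLED must be 0 so that a zero-initialized control word is fail-closed. */
enum { PEND_STATE_DISABLED = 0, PEND_STATE_ENABLED = 1 };

#define NET_EBPF_EXT_PEND_ERROR_DISABLED (-1) /* numeric value is an assumption */

typedef struct {
    _Atomic int state;          /* PEND_STATE_* */
    _Atomic int drain_requests; /* hypothetical stand-in for "queue a drain" */
} pend_control_t;

/* Called at the top of the pend helper: refuse new pends unless explicitly enabled. */
static int pend_control_check(pend_control_t* c)
{
    return atomic_load(&c->state) == PEND_STATE_ENABLED ? 0 : NET_EBPF_EXT_PEND_ERROR_DISABLED;
}

/* Orchestrator write path: only an ENABLED -> DISABLED transition requests a drain. */
static void pend_control_set(pend_control_t* c, int new_state)
{
    int old_state = atomic_exchange(&c->state, new_state);
    if (old_state == PEND_STATE_ENABLED && new_state == PEND_STATE_DISABLED) {
        atomic_fetch_add(&c->drain_requests, 1);
    }
}
```

Because the default is DISABLED, an orchestrator that never starts, or crashes before its startup write, leaves the helper refusing every pend.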

…CEPT to DISPATCH; narrow PASSIVE fence to AUTH_LISTEN Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…URITY_SUBJECT_CONTEXT)

Documents the design for identity-aware programs that need a full PACCESS_TOKEN or captured SECURITY_SUBJECT_CONTEXT (rather than just the DISPATCH-safe TOKEN_ACCESS_INFORMATION blob from FWPS_INCOMING_VALUE_ALE_USER_ID). Key points:

- DISPATCH-vs-PASSIVE applicability per WFP layer (token resolution via ObReferenceObjectByHandle / PsLookupProcessByProcessId is PASSIVE-only; AUTH_CONNECT and AUTH_RECV_ACCEPT are DISPATCH).
- New extension-specific helpers bpf_get_access_token / bpf_get_subject_context that return NULL at DISPATCH when not pre-resolved.
- New BPF_*_VERDICT_DEFER_TO_PASSIVE verdict + extension-driven PEND + threaded-DPC pinned to classifyFn CPU + worker re-invocation via the existing CONTINUE path.
- PID-reuse detection via token-pointer equality (PsReferencePrimaryToken vs ObReferenceObjectByHandle of saved HANDLE) before SeCaptureSubjectContextEx is called.
- PETHREAD limitation (Thread = NULL; matches WFP ALE access-check semantics).
- Lifecycle / cleanup table; double-defer and PASSIVE-layer-defer fail-closed guards.

Tracking issues: microsoft#5231 (parent), microsoft#5235 (bpf_get_access_token), microsoft#5236 (bpf_get_subject_context).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace two-helper design (bpf_get_access_token + bpf_get_subject_context) with single bpf_get_identity() returning bpf_identity_info_t. Errno set {0, -EAGAIN, -ENOENT, -EINVAL}.
- Classify-wrapper steps 1-11 cover PEND + DEFER_TO_PASSIVE re-invocation, PID-reuse + token-pointer-equality check, atomic-snapshot policy.
- Add No-leak invariants (pend-entry path) section documenting structured acquire/release pairing across all failure paths.
- Add Race windows R1-R10 with per-row mitigations.
- Add DATAGRAM_DATA / STREAM identity propagation: per-flow blob with independent refs published via FwpsFlowAssociateContext0 + flowDeleteFn; F1-F6 race table; open implementation questions.
- Add netebpfext work-breakdown item 10 (identity-aware programs).
- Reference single GH issue microsoft#5235 (supersedes microsoft#5236).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Restore subject clause in 'No-leak invariants (pend-entry path)' intro that lost its leading sentence in a prior edit.
- Drop filler closing sentence in Custom-map subsection.
- Drop 'Race protection: COMPLETE before pend API' callout in PEND flow; full argument lives in Race A and pend-API ordering note cross-references it.
- Drop CONTINUE re-invocation Note that restated steps 3 and 5 of the same flow verbatim.
- Drop section-preview sentence in Identity-aware programs intro (Design overview enumerates the same content one section below).
- Drop heading-restating opener sentences in Per-layer async design and Async orchestrator integration guide section intros.
- Tighten orchestrator-guide 'COMPLETE and cleanup' subsection to cross-references; full stale-entry cleanup design is in Edge case 1.

No technical detail removed; all design rationale, race tables, ownership/lifetime invariants, and per-layer specifics are preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mirror the corresponding refinements from the internal WESP design doc:

- Add Anti-loop guards callout in 'New verdict: defer to PASSIVE' (paired helper/verdict guards prevent runaway DISPATCH<->PASSIVE loops).
- Correct the MAY_PEND wording: the user-mode loader sets the existing MAY_PEND attach flag at bpf_link_create time for programs that use bpf_get_identity(); it is not auto-stamped from bytecode. Note that netebpfext treats DEFER_TO_PASSIVE as a chain-terminating verdict identical to PEND.
- Compress the DATAGRAM_DATA / STREAM identity-propagation section to a deferred-design note (status, constraint, required SET/GET flow-context mechanism, and an implementation note that the flow blob must hold its own independent refs rather than transferring ownership from the pend entry). The full design is deferred until those layers land in netebpfext.
- Remove the Edge cases subsection: the PETHREAD-not-available constraint is inlined into the helper prose; remaining bullets duplicated the race table, anti-loop guards, helper prototype docs, or chain-aggregation paragraphs.
- Reposition the DEFER_TO_PASSIVE sequence diagram as a wrap-up after Race windows, with notes showing the per-invocation 'do I actually need identity for this decision?' check and the no-defer fast path.
- Trim verifier-rationale prose around the bpf_get_identity prototype.
- Add a consolidated DATAGRAM_DATA / STREAM layer-support work item to the netebpfext work breakdown, gathering pend/complete, identity propagation (independent refs), and program-adaptation requirements.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Correct preprocess_map_update_element locking story: callback runs outside the per-bucket lock; rewrite serialization story around single-logical-writer-per-entry + per-value atomicity + single-winner-delete provided by the bucket lock.
- Rewrite CONTINUE-flow inline-reinvocation rationale (no longer a deadlock claim; now framed as re-entrancy concern).
- Switch pend_key struct from explicit reserved+C_ASSERT to `#pragma pack(push,1)` for layout determinism.
- Fix bucket-allocation comments on pend_map and continuation_map `max_entries`: bucket array IS allocated up front; size to expected concurrent pend count, not UINT32_MAX.
- Add bidirectional forward-compat safety paragraph.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
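
For readers unfamiliar with the `#pragma pack(push,1)` approach mentioned above, here is a minimal sketch of why a packed key gives a deterministic layout. The struct name and field set are hypothetical (the real pend_key definition lives in the design doc, not on this page); the point is only that a packed struct has no padding bytes, so a byte-wise comparison depends solely on the fields.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout, for illustration only. */
#pragma pack(push, 1)
typedef struct {
    uint32_t layer_id;
    uint32_t compartment_id;
    uint64_t pend_id;
    uint8_t protocol;
} example_pend_key_t;
#pragma pack(pop)

/* With 1-byte packing the size is exactly the sum of the fields: 4 + 4 + 8 + 1. */
static_assert(sizeof(example_pend_key_t) == 17, "packed key must contain no padding");
static_assert(offsetof(example_pend_key_t, protocol) == 16, "fields must be contiguous");

/* Byte-wise equality is safe because no uninitialized padding bytes can differ. */
static int example_pend_key_equal(const example_pend_key_t* a, const example_pend_key_t* b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without packing, the same struct would typically be padded to 24 bytes, and two keys built by field assignment could differ in the trailing padding.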

shankarseal · 2026-05-08T22:58:46Z

+Network callout drivers often need to defer a verdict on a connection or
+packet while waiting for an asynchronous decision from another component
+-- for example, a user-mode policy service or a kernel-mode classification
+driver. The Windows Filtering Platform (WFP) provides several async
+mechanisms at different layers (`FwpsPendOperation` /
+`FwpsCompleteOperation` at ALE authorize layers, `FwpsPendClassify` /
+`FwpsCompleteClassify` at resource assignment, ABSORB+reinject at
+datagram, DEFER/OOB at stream), but eBPF programs running through


Suggested change

-Network callout drivers often need to defer a verdict on a connection or
-packet while waiting for an asynchronous decision from another component
--- for example, a user-mode policy service or a kernel-mode classification
-driver. The Windows Filtering Platform (WFP) provides several async
-mechanisms at different layers (`FwpsPendOperation` /
-`FwpsCompleteOperation` at ALE authorize layers, `FwpsPendClassify` /
-`FwpsCompleteClassify` at resource assignment, ABSORB+reinject at
-datagram, DEFER/OOB at stream), but eBPF programs running through
+Network security applications often need to defer a verdict on a connection or
+packet while waiting for an asynchronous decision from another component
+-- for example, a policy service. But eBPF programs running through

shankarseal · 2026-05-08T23:01:01Z

+1. An eBPF program attached to a supported WFP hook point (see
+   [Supported WFP layers](#supported-wfp-layers)) must be able to
+   pend the current network operation and return control to WFP.


Suggested change

-1. An eBPF program attached to a supported WFP hook point (see
-   [Supported WFP layers](#supported-wfp-layers)) must be able to
-   pend the current network operation and return control to WFP.
+1. An eBPF program attached to SOCK_ADDR attach points must be able to
+   pend the current network operation and return a new verdict type.

We must avoid mentioning WFP in the context of the BPF program. The "API" is the eBPF hook, not WFP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

matthewige added 2 commits April 16, 2026 10:29

WIP - PR part 1

0fdc4d2

ready for CR

7698af7

matthewige requested review from Alan-Jowett, LakshK98, dthaler, mikeagun, mtfriesen, poornagmsft, saxena-anurag and shankarseal as code owners April 22, 2026 00:51

github-project-automation Bot added this to eBPF for Windows Triage Apr 22, 2026

github-project-automation Bot moved this to Todo in eBPF for Windows Triage Apr 22, 2026

shankarseal assigned keith-horton and shankarseal Apr 24, 2026

updates - add threadedDPC for completion

702e0a1

dthaler requested a review from Copilot April 27, 2026 14:47

Copilot started reviewing on behalf of dthaler April 27, 2026 14:48

Copilot AI reviewed Apr 27, 2026

dthaler reviewed Apr 27, 2026

matthewige and others added 5 commits April 28, 2026 12:12

Alan-Jowett reviewed Apr 30, 2026

Comment thread docs/AsyncProcessing.md Outdated

Alan-Jowett reviewed Apr 30, 2026

Comment thread docs/AsyncProcessing.md Outdated

Alan-Jowett reviewed Apr 30, 2026

Comment thread docs/AsyncProcessing.md Outdated

matthewige and others added 2 commits May 1, 2026 07:42

matthewige and others added 12 commits May 1, 2026 10:14

docs(async): add classifyFn IRQL column; correct AUTH_CONNECT/RECV_AC…

b064816

…CEPT to DISPATCH; narrow PASSIVE fence to AUTH_LISTEN Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

shankarseal reviewed May 8, 2026

shankarseal moved this from Todo to In Progress in eBPF for Windows Triage May 11, 2026

matthewige marked this pull request as draft May 12, 2026 19:12

docs: replace AsyncProcessing.md with v2 design

f53e8a9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Conversation

matthewige commented Apr 22, 2026 (edited)

dthaler Apr 27, 2026

matthewige Apr 28, 2026

dthaler May 14, 2026 (edited)

Alan-Jowett commented Apr 30, 2026

shankarseal May 8, 2026

6 participants