Remove reset attack surface by hlef · Pull Request #54 · CHERIoT-Platform/network-stack

hlef · 2024-12-06T01:15:57Z

One current limitation of the network stack reset is that a set of "reset-critical" variables either persist across resets or are used as part of resets, and will prevent resets from being effective if compromised. This is a relevant attack surface: compromising these variables fully DoSes the network stack, and the only repair is a full reboot. This limitation was tracked as part of #31.

In this PR, I intend to entirely eliminate this attack surface.

I go through the list of reset-critical variables and identify those which are not a problem (by construction, they cannot be attacked under our threat model). Thanks to @davidchisnall for discussing this.

This leaves us with two important pieces of data: the socket list, and the threadEntryGuard. I address these through refactoring:

- Use lock steps and conditions to make the socket list non-reset-critical (heavily based on the changes proposed in "Various changes needed to make network stack reset more reliable", cheriot-rtos#369)
- Reset `threadEntryGuard` at a different location in the code so that it is no longer vulnerable

Initially we intended to address this by adding an internal compartment to protect this state (similar to microreboots); however, we realized that this PR's approach is a much better option.

I tested these changes assuming a fully compromised socket list. The network stack reset still works like a charm, albeit visibly more slowly. In practice this should just be an edge case.

If a thread somehow corrupts the socket list, we lose the ability to retrieve references to socket locks, which ultimately prevents us from unblocking threads blocked on them. To handle that situation, ensure that threads block on the socket lock in steps, checking the network stack reset state at every step through `LockGuard` "conditions" (see recent additions to the `LockGuard` class). If a network stack reset is detected in this way, threads bail out. Note that the "condition" lambda is necessary here, since socket locks are allocated on a caller capability which we cannot heap-free-all.

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>
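The stepped-lock idea described in this commit message can be illustrated with a minimal sketch in standard C++. This is not the `LockGuard` API from cheriot-rtos: `resetInProgress`, `lock_in_steps`, and `waiter_bails_out_on_reset` are hypothetical names, and `std::timed_mutex` stands in for a socket lock. It only shows the pattern of blocking for a bounded step, then evaluating a caller-supplied condition before blocking again.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the network stack's reset state.
std::atomic<bool> resetInProgress{false};

// Try to take `lock`, blocking for at most `step` at a time. After every
// step that fails to acquire the lock, `condition` is evaluated; if it
// returns true, the caller gives up instead of blocking again.
template<typename Condition>
bool lock_in_steps(std::timed_mutex         &lock,
                   std::chrono::milliseconds step,
                   Condition               &&condition)
{
    assert(step.count() > 0);
    for (;;)
    {
        if (lock.try_lock_for(step))
        {
            return true; // Lock acquired; the caller must unlock it.
        }
        if (condition())
        {
            return false; // Bail out, e.g. because a reset started.
        }
    }
}

// Usage sketch: a waiter blocked on a lock that is never released bails
// out once the reset flag is raised. Returns true if the waiter gave up.
bool waiter_bails_out_on_reset()
{
    std::timed_mutex socketLock;
    socketLock.lock(); // Held for the whole lifetime of the waiter.
    bool acquired = true;
    std::thread waiter([&] {
        acquired = lock_in_steps(socketLock,
                                 std::chrono::milliseconds(10),
                                 [] { return resetInProgress.load(); });
    });
    resetInProgress = true; // Simulate the start of a network stack reset.
    waiter.join();
    socketLock.unlock();
    resetInProgress = false;
    return !acquired;
}
```

The step length bounds how long a blocked thread can take to notice the reset, which is one plausible reason a reset with a corrupted socket list would be slower than one that can unblock waiters directly.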

This commit addresses a number of issues in the reset when socket lists are corrupted. Together with the recent support of steps/conditions when waiting on socket locks and event queues, this removes the socket list from the set of reset-critical variables.

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>

Although `currentSocketEpoch` and `userThreadCount` are both reset-critical, they should be impossible to corrupt by construction, unless control-flow integrity or spatial memory safety is compromised. Document that.

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>

By moving the reset of `threadEntryGuard` to a different place in the execution flow, we can remove it from the set of reset-critical variables. The idea here is to reset `threadEntryGuard` only in the case of a crash, and only if the crash was triggered by a user thread, since resetting is not needed in the case of a network thread crash (due to deterministic execution flow, see comment in the code).

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>
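The decision described in this commit message can be sketched as follows. Only the name `threadEntryGuard` comes from the PR; this text does not describe its actual type or semantics, so modelling it as a boolean flag is an assumption made purely for illustration, and `CrashedThread` and `reset_guard_after_crash` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical classification of the thread that triggered the crash.
enum class CrashedThread
{
    User,
    Network
};

// Assumption: the guard is modelled as a simple flag for this sketch.
std::atomic<bool> threadEntryGuard{false};

// Called from a (hypothetical) crash path that leads to a reset.
// Returns true if the guard was reset.
bool reset_guard_after_crash(CrashedThread who)
{
    if (who == CrashedThread::User)
    {
        // Crash triggered by a user thread: reset the guard here.
        threadEntryGuard = false;
        return true;
    }
    // Crash triggered by the network thread: the commit message states
    // that no reset is needed, so the guard is left untouched.
    return false;
}
```

Placing the reset only on the crash path means that normal reset code never has to trust the guard's previous value.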

`networkThreadID` is not used anymore; we stopped using it with the stack-overflow-resilient handler.

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>

Currently the example ignores the failure and tries to re-open the listening socket, running into an infinite loop. This is not meaningful: if we cannot close the listening socket, we had better stop, since we will never be able to re-bind to the server port.

Signed-off-by: Hugo Lefeuvre <hugo.lefeuvre@ubc.ca>
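The control-flow change described in this commit message can be sketched roughly as below. This is not the code of the example in the repository: `serve_rounds` and the `closeListener` callback are hypothetical, and the actual socket calls are elided. The sketch only shows a server loop that stops as soon as closing the listening socket fails, rather than looping back to re-open it.

```cpp
#include <cassert>
#include <functional>

// Runs up to `maxRounds` open/serve/close rounds and returns how many
// completed. `closeListener` is a hypothetical callback that returns
// false when the listening socket could not be closed.
int serve_rounds(int maxRounds, const std::function<bool()> &closeListener)
{
    int completed = 0;
    while (completed < maxRounds)
    {
        // ... open the listening socket and serve one client here ...
        if (!closeListener())
        {
            // The server port stays bound, so re-opening the listening
            // socket could never succeed. Stop instead of retrying.
            break;
        }
        ++completed;
    }
    return completed;
}
```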

hlef · 2024-12-06T01:19:18Z

(The CI fails because the RTOS core PR has not yet been merged).

hlef added 6 commits December 5, 2024 15:58

Remove networkThreadID.

0941483


hlef requested a review from davidchisnall December 6, 2024 01:15

hlef mentioned this pull request Dec 6, 2024

Various changes needed to make network stack reset more reliable. CHERIoT-Platform/cheriot-rtos#369

Open

hlef wants to merge 6 commits into main from hlefeuvre/reset-reliability
