@@ -10,6 +10,45 @@ associated problem space.
10 10 point that out explicitly and clearly in the associated patches and Cc
11 11 `Christian Brauner <brauner (at) kernel (dot) org>`.**
12 12
13+ ### Dynamic No New Privileges (NNP) via bpf
14+
15+ On newer systems the use of privilege escalating binaries (suid, sgid,
16+ file capabilities) can be avoided. This model is illustrated in
17+ systemd's `run0` tool.
18+
19+ So it is possible to turn on `PR_SET_NO_NEW_PRIVS` (NNP) for systemd
20+ itself and thus for every process on the system. However, that breaks
21+ sandboxed workloads. Sandboxed workloads such as containers may run
22+ a single process without a full-fledged daemon that could supervise
23+ privileged operations. In such cases executing privilege escalating
24+ binaries must be allowed.
25+
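For reference, NNP is a per-thread flag that is set through `prctl(2)` and inherited across `fork()` and `execve()`, which is why enabling it in systemd would affect every process on the system. The following is a minimal sketch of setting and querying it; the helper names `enable_nnp()` and `nnp_enabled()` are illustrative and not part of any existing API.

```c
#include <sys/prctl.h>

/*
 * Set no_new_privs for the calling thread. Once set it cannot be
 * cleared, and it is inherited by children and preserved across
 * execve(). Returns 0 on success, -1 on error.
 */
static int enable_nnp(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Query no_new_privs for the calling thread.
 * Returns 1 if set, 0 if not set, -1 on error.
 */
static int nnp_enabled(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Because the flag cannot be cleared again, a workload started underneath such a process has no way to opt back in to executing suid, sgid, or file capability binaries with elevated privileges.
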
26+ Ideally, sandboxes that require execution of privilege escalating
27+ binaries should use a user namespace with a non-identity idmapping.
28+
29+ Instead of revamping the fairly inflexible NNP implementation, execution
30+ of privilege escalating binaries should be supervised by a bpf LSM.
31+
32+ When a privilege escalating binary is executed in the initial user
33+ namespace the bpf LSM program will cause the kernel to skip elevating
34+ privileges and instead execute the binary with the caller's privileges.
35+ This is equivalent to the NNP behavior.
36+
37+ If a privilege escalating binary is executed in a non-initial user
38+ namespace the bpf LSM program will allow the kernel to escalate the
39+ caller's privileges to a higher privilege level.
40+
41+ This will allow unprivileged containers to execute privilege escalating
42+ binaries but completely isolate regular services from doing so.
43+
44+ This can of course be made configurable on a per-service basis if needed.
45+
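The decision described above can be sketched as a small userspace model. This is not a bpf LSM program: the security hook it would attach to does not exist yet, so the names `nnp_policy()`, `enum exec_decision`, and the `service_exempt` flag (standing in for the per-service configuration) are all hypothetical.

```c
#include <stdbool.h>

/* Possible outcomes when a privilege escalating binary is executed. */
enum exec_decision {
	EXEC_KEEP_CALLER_PRIVS,	/* skip elevation, NNP-like behavior */
	EXEC_ALLOW_ELEVATION,	/* let the kernel escalate privileges */
};

/*
 * Hypothetical policy, consulted only for privilege escalating
 * binaries (suid, sgid, file capabilities).
 *
 * in_initial_userns: the caller runs in the initial user namespace.
 * service_exempt:    assumed per-service override that permits
 *                    elevation even in the initial user namespace.
 */
static enum exec_decision nnp_policy(bool in_initial_userns,
				     bool service_exempt)
{
	/* Non-initial user namespace: elevation is allowed. */
	if (!in_initial_userns)
		return EXEC_ALLOW_ELEVATION;

	/* Initial user namespace: only exempted services may elevate. */
	if (service_exempt)
		return EXEC_ALLOW_ELEVATION;

	return EXEC_KEEP_CALLER_PRIVS;
}
```

For example, `nnp_policy(true, false)` models a regular service in the initial user namespace and yields `EXEC_KEEP_CALLER_PRIVS`, while `nnp_policy(false, false)` models an unprivileged container and yields `EXEC_ALLOW_ELEVATION`.
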
46+ This will require adding a new security hook to the kernel's exec
47+ codepath.
48+
49+ **Use-Case:** Wean all of userspace off of privilege escalating
50+ binaries.
51+
13 52 ### xattrs for pidfd
14 53
15 54 Since pidfds have been moved to a separate pidfs filesystem it is easy