|
| 1 | +.. _pm_s2idle_psci: |
| 2 | + |
| 3 | +############################################# |
| 4 | +Suspend-to-Idle (S2Idle) and PSCI Integration |
| 5 | +############################################# |
| 6 | + |
| 7 | +********************************** |
| 8 | +Suspend-to-Idle (S2Idle) Overview |
| 9 | +********************************** |
| 10 | + |
| 11 | +Suspend-to-Idle (s2idle), also known as "freeze," is a generic, pure-software, lightweight variant of system suspend. |
| 12 | +In this state, the Linux kernel freezes user space tasks, suspends devices, and then puts all CPUs into their deepest available idle state. |
| 13 | + |
| 14 | +********************** |
| 15 | +PSCI as the Enabler |
| 16 | +********************** |
| 17 | + |
| 18 | +The Power State Coordination Interface (PSCI) acts as the fundamental enabler for s2idle on ARM platforms. It provides the abstraction layer |
| 19 | +that allows the Operating System (OS) to request power states without needing intimate knowledge of the underlying power controller hardware. |
| 20 | + |
| 21 | +When the Linux kernel initiates s2idle: |
| | + |
| 22 | +1. It freezes tasks and suspends devices. |
| 23 | +2. It invokes the ``cpuidle`` driver for each CPU. |
| 24 | +3. The ``cpuidle`` driver eventually calls the PSCI ``CPU_SUSPEND`` API to transition the CPU (and potentially higher-level topology nodes like clusters) |
| 25 | +   into a low-power state. |
| 26 | + |
| 27 | +The efficiency of s2idle depends heavily on the PSCI implementation's ability to coordinate these requests and enter the deepest possible hardware state. |
| 28 | + |
| 29 | +************************** |
| 30 | +OS Initiated (OSI) Mode |
| 31 | +************************** |
| 32 | + |
| 33 | +PSCI 1.0 introduced **OS Initiated (OSI)** mode, which shifts the responsibility of power state coordination from the platform firmware to the Operating System. |
| 34 | + |
| 35 | +In the default **Platform Coordinated (PC)** mode, the OS independently requests a state for each core. The firmware then aggregates these requests (voting) to |
| 36 | +determine if a cluster or the system can be powered down. |
| 37 | + |
| 38 | +In **OS Initiated (OSI)** mode, the OS explicitly manages the hierarchy. The OS determines when the last core in a power domain (e.g., a cluster) is going idle |
| 39 | +and explicitly requests the power-down of that domain. |
| 40 | + |
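The voting performed by firmware in Platform Coordinated mode can be sketched as follows. This is a minimal illustrative model, not firmware code: the ``State`` values and the ``pc_cluster_state`` helper are hypothetical names, and the model assumes that a larger number means a deeper state and that the firmware settles on the shallowest state any core voted for.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical state depths; a larger value means a deeper state."""
    RUN = 0
    RETENTION = 2
    POWER_DOWN = 3


def pc_cluster_state(votes):
    """Platform Coordinated voting for one cluster.

    Each core votes for the deepest cluster state it can tolerate.
    The cluster may only go as deep as the shallowest vote.
    """
    return min(votes)


# Three cores allow cluster power down, one only tolerates retention.
votes = [State.POWER_DOWN, State.POWER_DOWN, State.RETENTION, State.POWER_DOWN]
print(pc_cluster_state(votes).name)
```

In OSI mode this aggregation moves into the OS: the firmware no longer votes on behalf of the cores and only checks that the request it receives is consistent with what it observes.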
| 41 | +Why OSI? |
| 42 | +======== |
| 43 | + |
| 44 | +OSI mode allows the OS to make better power decisions because it has visibility into: |
| | + |
| 45 | +* **Task Scheduling:** The OS knows when other cores will wake up. |
| 46 | +* **Wakeup Latencies:** The OS can respect Quality of Service (QoS) latency constraints more accurately. |
| 47 | +* **Usage Patterns:** The OS can predict idle duration better than firmware. |
| 48 | + |
| 49 | +OSI Sequence |
| 50 | +============ |
| 51 | + |
| 52 | +The coordination in OSI mode follows a specific "Last Man Standing" sequence. The OS tracks the state of all cores in a topology node (e.g., a cluster). |
| 53 | + |
| 54 | +1. **First Core(s) Idle:** When the first cores in a cluster go idle, the OS requests a local idle state for those cores (e.g., Core Power Down) but keeps the cluster running. |
| 55 | +2. **Last Core Idle:** When the *last* active core in the cluster is ready to go idle, the OS recognizes that the entire cluster can now be powered down. |
| 56 | +3. **Composite Request:** The last core issues a ``CPU_SUSPEND`` call that requests a **composite state**: |
| | + |
| 57 | +   * **Core State:** Power Down |
| 58 | +   * **Cluster State:** Power Down |
| 59 | +   * **System State:** (Optional) Power Down or Retention |
| | + |
| 60 | +4. **PSCI Enforcement:** The PSCI implementation verifies that all other cores in the requested node are indeed idle. If they are not, it rejects the request with ``DENIED``, which guards against races between the OS's view of the node and the actual state of its cores. |
| 61 | + |
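The "Last Man Standing" sequence above can be modelled with a small simulation. This is only a sketch under stated assumptions: ``ToyFirmware`` and ``OsiGovernor`` are hypothetical classes invented for illustration, a single two-level topology (cores in one cluster) is assumed, and only the PSCI return codes ``SUCCESS`` (0) and ``DENIED`` (-3) are modelled.

```python
SUCCESS = 0   # PSCI return code
DENIED = -3   # PSCI return code


class ToyFirmware:
    """Hypothetical stand-in for the PSCI implementation (one cluster)."""

    def __init__(self, num_cores):
        self.idle = [False] * num_cores  # firmware's own view of each core

    def cpu_suspend(self, core, cluster_off):
        others_idle = all(s for i, s in enumerate(self.idle) if i != core)
        if cluster_off and not others_idle:
            # The composite request is inconsistent with the firmware's view.
            return DENIED
        self.idle[core] = True
        return SUCCESS


class OsiGovernor:
    """Hypothetical OS-side tracking of the last active core in the cluster."""

    def __init__(self, firmware, num_cores):
        self.fw = firmware
        self.active = set(range(num_cores))

    def idle(self, core):
        last_man = self.active == {core}  # no other core is still active
        rc = self.fw.cpu_suspend(core, cluster_off=last_man)
        if rc == SUCCESS:
            self.active.discard(core)
        return last_man, rc


fw = ToyFirmware(2)
gov = OsiGovernor(fw, 2)
print(gov.idle(0))  # first core: core-only request
print(gov.idle(1))  # last core: composite request including the cluster
```

A request that claims last-man status while another core is still running, for example ``ToyFirmware(2).cpu_suspend(0, cluster_off=True)``, returns ``DENIED`` in this model.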
| 62 | +************************************* |
| 63 | +Understanding the Suspend Parameter |
| 64 | +************************************* |
| 65 | + |
| 66 | +The ``power_state`` parameter passed to ``CPU_SUSPEND`` is the key to requesting these states. In OSI mode, this parameter must encode the intent for the entire hierarchy. |
| 67 | + |
| 68 | +Composite State Encoding |
| 69 | +======================== |
| 70 | + |
| 71 | +The ``power_state`` is a 32-bit value that encodes the requested state for the Core and its parent nodes (Cluster, System). |
| 72 | + |
| 73 | +.. list-table:: Recommended StateID Encoding (Example) |
| 74 | +   :widths: 15 85 |
| 75 | +   :header-rows: 1 |
| 76 | + |
| 77 | +   * - Bit Field |
| 78 | +     - Description |
| 79 | +   * - **[31:28]** |
| 80 | +     - **Last Man Indicator**: Specifies the highest power level for which the calling core is the "last man". |
| 81 | + |
| 82 | +       * ``0``: Core only. |
| 83 | +       * ``1``: Last core in the Cluster. |
| 84 | +       * ``2``: Last core in the System. |
| | + |
| 85 | +   * - **[27:16]** |
| 86 | +     - Reserved / Implementation Defined (often used for Extended StateID). |
| 87 | +   * - **[15:12]** |
| 88 | +     - Reserved. |
| 89 | +   * - **[11:8]** |
| 90 | +     - **System Level State**: e.g., 3 for Power Down, 2 for Retention. |
| 91 | +   * - **[7:4]** |
| 92 | +     - **Cluster Level State**: e.g., 3 for Power Down, 2 for Retention. |
| 93 | +   * - **[3:0]** |
| 94 | +     - **Core Level State**: e.g., 3 for Power Down, 1 for Retention, 0 for Run. |
| 95 | + |
| 96 | +**Example Scenario (OSI):** |
| 97 | + |
| 98 | +If the OS wants to power down the last core of a cluster together with the cluster itself, it might construct a ``power_state`` where: |
| | + |
| 99 | +* **Core State [3:0]** = 3 (Power Down) |
| 100 | +* **Cluster State [7:4]** = 3 (Power Down) |
| 101 | +* **Last Man Indicator [31:28]** = 1 (Last core in the Cluster) |
| 102 | + |
| 103 | +The PSCI implementation reads this, confirms the other cores in the cluster are OFF, and then proceeds to power gate the entire cluster. |
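
The example scenario can be worked through by packing the fields numerically. The sketch below follows the example bit layout from the table in this section (it is not taken from a particular platform's firmware), and the helper names ``encode_power_state`` and ``decode_power_state`` are hypothetical.

```python
# Field positions from the example StateID table in this section.
CORE_SHIFT = 0       # bits [3:0]
CLUSTER_SHIFT = 4    # bits [7:4]
SYSTEM_SHIFT = 8     # bits [11:8]
LAST_MAN_SHIFT = 28  # bits [31:28]
FIELD_MASK = 0xF     # every field is 4 bits wide


def encode_power_state(core, cluster=0, system=0, last_man=0):
    """Pack the four 4-bit fields into a 32-bit power_state value."""
    for name, value in (("core", core), ("cluster", cluster),
                        ("system", system), ("last_man", last_man)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name} must fit in 4 bits, got {value}")
    return ((last_man << LAST_MAN_SHIFT) | (system << SYSTEM_SHIFT)
            | (cluster << CLUSTER_SHIFT) | (core << CORE_SHIFT))


def decode_power_state(value):
    """Split a power_state value back into its fields."""
    return {
        "core": (value >> CORE_SHIFT) & FIELD_MASK,
        "cluster": (value >> CLUSTER_SHIFT) & FIELD_MASK,
        "system": (value >> SYSTEM_SHIFT) & FIELD_MASK,
        "last_man": (value >> LAST_MAN_SHIFT) & FIELD_MASK,
    }


# Last core in the cluster: core and cluster both power down.
state = encode_power_state(core=3, cluster=3, last_man=1)
print(hex(state))  # 0x10000033
```

Under this layout a core-only power down (no last-man claim) encodes as ``0x3``, so the firmware can distinguish the two requests from the value alone.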