| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | + |
| 3 | +============================================================ |
| 4 | +Hardware-Feedback Interface for scheduling on Intel Hardware |
| 5 | +============================================================ |
| 6 | + |
| 7 | +Overview |
| 8 | +-------- |
| 9 | + |
| 10 | +Intel has described the Hardware Feedback Interface (HFI) in the Intel 64 and |
| 11 | +IA-32 Architectures Software Developer's Manual (Intel SDM) Volume 3 Section |
| 12 | +14.6 [1]_. |
| 13 | + |
| 14 | +The HFI gives the operating system performance and energy efficiency |
| 15 | +capability data for each CPU in the system. Linux can use the information from |
| 16 | +the HFI to influence task placement decisions. |
| 17 | + |
| 18 | +The Hardware Feedback Interface |
| 19 | +------------------------------- |
| 20 | + |
| 21 | +The Hardware Feedback Interface provides to the operating system information |
| 22 | +about the performance and energy efficiency of each CPU in the system. Each |
| 23 | +capability is given as a unit-less quantity in the range [0-255]. Higher values |
| 24 | +indicate higher capability. Energy efficiency and performance are reported in |
| 25 | +separate capabilities. Even though on some systems these two metrics may be |
| 26 | +related, they are specified as independent capabilities in the Intel SDM. |
| 27 | + |
| 28 | +These capabilities may change at runtime as a result of changes in the |
| 29 | +operating conditions of the system or the action of external factors. The rate |
| 30 | +at which these capabilities are updated is specific to each processor model. On |
| 31 | +some models, capabilities are set at boot time and never change. On others, |
| 32 | +capabilities may change every tens of milliseconds. For instance, a remote |
| 33 | +mechanism may be used to lower Thermal Design Power. Such a change can be |
| 34 | +reflected in the HFI. Likewise, if the system needs to be throttled due to |
| 35 | +excessive heat, the HFI may reflect reduced performance on specific CPUs. |
| 36 | + |
| 37 | +The kernel or a userspace policy daemon can use these capabilities to modify |
| 38 | +task placement decisions. For instance, if either the performance or the energy |
| 39 | +efficiency capability of a given logical processor becomes zero, the hardware is |
| 40 | +recommending that the operating system not schedule any tasks on that processor |
| 41 | +for performance or energy efficiency reasons, respectively. |
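
As an illustration of how a policy might read these capabilities, the sketch below treats a zero performance capability as a hint to skip that CPU and otherwise prefers the highest value. It is a minimal userspace model, not kernel code: ``struct hfi_caps`` and the ``hfi_*`` helpers are hypothetical names invented for this example, and the only facts taken from the text above are the 0-255 range and the meaning of zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU view of the two HFI capabilities (each 0-255). */
struct hfi_caps {
	uint8_t perf;	/* performance capability */
	uint8_t eff;	/* energy efficiency capability */
};

/* Zero means the hardware advises against using the CPU for that goal. */
static bool hfi_avoid_for_perf(const struct hfi_caps *c)
{
	return c->perf == 0;
}

static bool hfi_avoid_for_eff(const struct hfi_caps *c)
{
	return c->eff == 0;
}

/*
 * Return the index of the CPU with the highest performance capability,
 * skipping CPUs whose capability is zero. Returns -1 if none qualifies.
 */
static int hfi_pick_perf_cpu(const struct hfi_caps *caps, int ncpus)
{
	int best = -1;

	for (int i = 0; i < ncpus; i++) {
		if (hfi_avoid_for_perf(&caps[i]))
			continue;
		if (best < 0 || caps[i].perf > caps[best].perf)
			best = i;
	}
	return best;
}
```

A real policy would weigh both capabilities together with load and affinity. The point here is only that the two values are read independently and that zero is treated as "avoid" rather than as an ordinary low score.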
| 42 | + |
| 43 | +Implementation details for Linux |
| 44 | +-------------------------------- |
| 45 | + |
| 46 | +The infrastructure to handle thermal event interrupts has two parts. The Local |
| 47 | +Vector Table of a CPU's local APIC contains a Thermal Monitor Register. This |
| 48 | +register controls how interrupts are delivered to a CPU when the thermal |
| 49 | +monitor generates an interrupt. Further details can be found in the Intel SDM |
| 50 | +Vol. 3 Section 10.5 [1]_. |
| 51 | + |
| 52 | +The thermal monitor may generate interrupts per CPU or per package. The HFI |
| 53 | +generates package-level interrupts. This monitor is configured and initialized |
| 54 | +via a set of machine-specific registers. Specifically, the HFI interrupt and |
| 55 | +status are controlled via designated bits in the IA32_PACKAGE_THERM_INTERRUPT |
| 56 | +and IA32_PACKAGE_THERM_STATUS registers, respectively. There exists one HFI |
| 57 | +table per package. Further details can be found in the Intel SDM Vol. 3 |
| 58 | +Section 14.9 [1]_. |
| 59 | + |
| 60 | +The hardware issues an HFI interrupt after updating the HFI table, when the table |
| 61 | +is ready for the operating system to consume. CPUs receive such an interrupt via |
| 62 | +the thermal entry in the Local APIC's Local Vector Table. |
| 63 | + |
| 64 | +When servicing such an interrupt, the HFI driver parses the updated table and |
| 65 | +relays the update to userspace using the thermal notification framework. Since |
| 66 | +there may be many HFI updates every second, updates relayed to userspace are |
| 67 | +throttled to at most one every CONFIG_HZ jiffies (that is, one per second). |
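
The throttling described above can be modelled as a simple tick-based gate. The sketch below is a simplified userspace illustration and not the driver's actual mechanism: ``SKETCH_HZ`` stands in for CONFIG_HZ, and ``struct relay_throttle`` and ``relay_allowed()`` are hypothetical names. It also drops updates that arrive too early, whereas a real implementation may instead defer and coalesce them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for CONFIG_HZ (ticks per second); the real value is a build option. */
#define SKETCH_HZ 250u

/* Hypothetical throttle state for relaying updates to userspace. */
struct relay_throttle {
	uint64_t last_relay;	/* tick count of the last relayed update */
	bool relayed_once;	/* false until the first update is relayed */
};

/*
 * Return true if an update arriving at tick 'now' may be relayed, i.e. if
 * at least SKETCH_HZ ticks (one second) have passed since the last relay.
 */
static bool relay_allowed(struct relay_throttle *t, uint64_t now)
{
	if (t->relayed_once && now - t->last_relay < SKETCH_HZ)
		return false;

	t->last_relay = now;
	t->relayed_once = true;
	return true;
}
```

With this gate, a burst of table updates inside the same second results in a single notification, which is the effect the text describes.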
| 68 | + |
| 69 | +References |
| 70 | +---------- |
| 71 | + |
| 72 | +.. [1] https://www.intel.com/sdm |