.. SPDX-License-Identifier: GPL-2.0

=======================
Energy Model of devices
=======================

1. Overview
-----------

The Energy Model (EM) framework serves as an interface between drivers that know
the power consumed by devices at various performance levels, and the kernel
subsystems that want to use that information to make energy-aware decisions.

The source of the information about the power consumed by devices can vary greatly
from one platform to another. These power costs can be estimated using
devicetree data in some cases. In others, the firmware will know better.
Alternatively, userspace might be best positioned, and so on. To avoid having
every client subsystem re-implement support for every possible source of
information on its own, the EM framework acts as an abstraction layer which
standardizes the format of power cost tables in the kernel and so avoids
redundant work.

The power values might be expressed in micro-Watts or in an 'abstract scale'.
Multiple subsystems might use the EM and it is up to the system integrator to
check that the requirements for the power value scale types are met. An example
can be found in the Energy-Aware Scheduler documentation
Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
powercap, power values expressed in an 'abstract scale' might cause issues.
These subsystems are more interested in an estimation of the power used in the
past, so real micro-Watts might be needed. An example of these requirements can
be found in the Intelligent Power Allocation documentation in
Documentation/driver-api/thermal/power_allocator.rst.
Kernel subsystems might implement automatic detection to check whether EM
registered devices have inconsistent scales (based on an EM internal flag).
Keep in mind that when the power values are expressed in an 'abstract scale',
deriving real energy in micro-Joules is not possible.

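
The automatic detection mentioned above could look roughly like the userspace
sketch below, which scans a set of performance domains and reports whether they
mix micro-Watts and abstract-scale values. The ``struct pd_info`` type, the
``PD_FLAG_MICROWATTS`` bit and ``pd_scales_consistent()`` are hypothetical
stand-ins invented for this illustration; they are not the kernel's own data
structures or flag names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit: set when a domain's power values are real micro-Watts. */
#define PD_FLAG_MICROWATTS (1UL << 0)

/* Hypothetical, simplified stand-in for a registered performance domain. */
struct pd_info {
	unsigned long flags;
};

/* Return true when every domain uses the same power scale as the first one. */
static bool pd_scales_consistent(const struct pd_info *pds, size_t nr_pds)
{
	size_t i;

	for (i = 1; i < nr_pds; i++) {
		/* XOR exposes a difference in the scale bit between two domains. */
		if ((pds[i].flags ^ pds[0].flags) & PD_FLAG_MICROWATTS)
			return false;
	}

	return true;
}
```

A client that needs real micro-Watts would additionally check that the common
scale is the micro-Watts one before deriving energy values.
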
The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
       +---------------+  +-----------------+  +---------------+
               |                   | em_cpu_energy()   |
               |                   | em_cpu_get()      |
               +---------+         |         +---------+
                         |         |         |
                         v         v         v
                        +---------------------+
                        |    Energy Model     |
                        |     Framework       |
                        +---------------------+
                           ^       ^       ^
                           |       |       | em_dev_register_perf_domain()
                +----------+       |       +---------+
                |                  |                 |
        +---------------+  +---------------+  +--------------+
        |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
        +---------------+  +---------------+  +--------------+
                ^                  ^                 ^
                |                  |                 |
        +--------------+   +---------------+  +--------------+
        | Device Tree  |   |   Firmware    |  |      ?       |
        +--------------+   +---------------+  +--------------+

In the case of CPU devices, the EM framework manages power cost tables per
'performance domain' in the system. A performance domain is a group of CPUs
whose performance is scaled together. Performance domains generally have a
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
required to have the same micro-architecture. CPUs in different performance
domains can have different micro-architectures.


2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Registration of 'advanced' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'advanced' EM gets its name from the fact that the driver is allowed
to provide a more precise power model. It is not limited to a math formula
implemented in the framework (as is the case for the 'simple' EM). It can
better reflect the real power measurements performed for each performance
state. Thus, this registration method should be preferred when accounting for
the EM static power (leakage) is important.

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
                struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);

Drivers must provide a callback function returning <frequency, power> tuples
for each performance state. The callback function provided by the driver is free
to fetch data from any relevant location (DT, firmware, ...), and by any means
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
performance domain using the 'cpus' cpumask argument. For devices other than
CPUs, the 'cpus' argument must be set to NULL.

It is important to set the last argument, 'microwatts', to the correct value.
Kernel subsystems which use the EM might rely on this flag to check whether all
EM devices use the same scale. If there are different scales, these subsystems
might decide to return a warning/error, stop working or panic.

See Section 3. for an example of a driver implementing this
callback, or Section 2.4 for further documentation on this API.

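
To make the <frequency, power> callback contract more concrete, the userspace
sketch below shows one way a framework could build a table by querying such a
callback once per performance state, each time asking for the first state above
the previous one. It is only loosely modeled on the description above: the
``struct perf_state``, ``active_power_fn``, ``foo_active_power()`` and
``build_perf_table()`` names, the simplified callback signature (no
``struct device``) and the table values are all assumptions made for this
example.

```c
#include <stddef.h>

/* Simplified stand-in for one <frequency, power> tuple. */
struct perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* micro-Watts or abstract scale */
};

/* Callback: round *freq up to the next supported state and report its power. */
typedef int (*active_power_fn)(unsigned long *power, unsigned long *freq);

/* Hypothetical driver data: three performance states, made-up values. */
static const struct perf_state foo_states[] = {
	{  500000, 100000 },
	{ 1000000, 300000 },
	{ 1500000, 700000 },
};

/* Hypothetical driver callback backed by the static table above. */
static int foo_active_power(unsigned long *power, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < sizeof(foo_states) / sizeof(foo_states[0]); i++) {
		if (foo_states[i].frequency >= *freq) {
			*freq = foo_states[i].frequency;
			*power = foo_states[i].power;
			return 0;
		}
	}

	return -1;	/* no state at or above the requested frequency */
}

/* Fill 'table' by querying the callback once per state, in ascending order. */
static int build_perf_table(struct perf_state *table, size_t nr_states,
			    active_power_fn cb)
{
	unsigned long freq = 0, power = 0;
	size_t i;

	for (i = 0; i < nr_states; i++) {
		freq++;	/* request the first state strictly above the previous one */
		if (cb(&power, &freq))
			return -1;
		table[i].frequency = freq;
		table[i].power = power;
	}

	return 0;
}
```

With this scheme, asking for more states than the driver actually supports
makes the callback fail, so the table construction fails as a whole.
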

Registration of EM using DT
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The EM can also be registered using the OPP framework and the information in
the DT "operating-points-v2" table. Each OPP entry in DT can be extended with
a property "opp-microwatt" containing a power value in micro-Watts. This OPP
DT property allows a platform to register EM power values which reflect the
total power (static + dynamic). These power values might come directly from
experiments and measurements.


Registration of 'artificial' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is an option to provide a custom callback for drivers missing detailed
knowledge about the power value of each performance state. The callback
.get_cost() is optional and provides the 'cost' values used by the EAS.
This is useful for platforms that only provide information on relative
efficiency between CPU types, where one could use the information to
create an abstract power model. But even an abstract power model can
sometimes be hard to fit in, given the input power value size restrictions.
The .get_cost() callback makes it possible to provide 'cost' values which
reflect the efficiency of the CPUs. This gives EAS information with a
different relation than the one that would be imposed by the EM internal
formulas calculating 'cost' values. To register an EM for such a platform, the
driver must set the flag 'microwatts' to 0, provide the .get_power() callback
and provide the .get_cost() callback. The EM framework handles such a platform
properly during registration. The flag EM_PERF_DOMAIN_ARTIFICIAL is set for
such a platform. Other frameworks which use the EM should take special care to
test for this flag and treat it properly.

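
As an illustration of the kind of data a .get_cost() callback could serve, the
userspace sketch below maps each supported frequency to a unitless 'cost'
taken from a relative-efficiency table. Everything here is an assumption made
for the example: the ``foo_get_cost()`` name, its signature, the
``struct foo_cost_entry`` type and the table values are hypothetical, and
``struct device`` is only forward-declared. The authoritative prototype is the
one in ``<linux/energy_model.h>``.

```c
#include <stddef.h>

struct device;	/* opaque stand-in for the kernel's struct device */

/* Hypothetical relative-efficiency table for one CPU type. */
struct foo_cost_entry {
	unsigned long freq;	/* kHz */
	unsigned long cost;	/* unitless, lower means more efficient */
};

static const struct foo_cost_entry foo_costs[] = {
	{  600000, 10 },
	{ 1200000, 25 },
	{ 1800000, 60 },
};

/* Report the 'cost' of running at exactly 'freq'; -1 if 'freq' is unknown. */
static int foo_get_cost(struct device *dev, unsigned long freq,
			unsigned long *cost)
{
	size_t i;

	(void)dev;	/* a real driver would look up per-device data here */

	for (i = 0; i < sizeof(foo_costs) / sizeof(foo_costs[0]); i++) {
		if (foo_costs[i].freq == freq) {
			*cost = foo_costs[i].cost;
			return 0;
		}
	}

	return -1;
}
```

Because the values are unitless, a domain described this way cannot be used by
clients that need real micro-Watts, which is why such clients have to check
the 'artificial' flag.
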

Registration of 'simple' EM
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 'simple' EM is registered using the framework helper function
cpufreq_register_em_with_opp(). It implements a power model which is tied to
the math formula::

  Power = C * V^2 * f

An EM registered using this method might not correctly reflect the
physics of a real device, e.g. when static power (leakage) is important.

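
The formula above can be worked through in a few lines of userspace C. The
units chosen here (a coefficient in uW/MHz/V^2, voltage in millivolts,
frequency in MHz, result in micro-Watts) and the ``simple_em_power_uw()`` name
are assumptions made for this sketch, not a statement about how the helper is
implemented.

```c
#include <stdint.h>

/*
 * Power = C * V^2 * f, assuming (for this sketch only):
 *   coeff  dynamic power coefficient in uW/MHz/V^2
 *   mv     voltage in millivolts
 *   mhz    frequency in MHz
 * The result is in micro-Watts.
 */
static uint64_t simple_em_power_uw(uint64_t coeff, uint64_t mv, uint64_t mhz)
{
	/* mv * mv is in mV^2; dividing by 10^6 converts it to V^2. */
	return coeff * mv * mv * mhz / 1000000;
}
```

For example, with a coefficient of 100, a state at 1000 mV and 1000 MHz yields
100 * 1 V^2 * 1000 MHz = 100000 uW. The model has no term that is independent
of frequency, which is why leakage is not captured.
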

2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two API functions which provide access to the energy model:
em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
a device pointer as an argument. Which interface to use depends on the
subsystem, but in the case of CPU devices both functions return the same
performance domain.

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The energy model tables are allocated once upon creation of
the performance domains, and kept in memory untouched.

The energy consumed by a performance domain can be estimated using the
em_cpu_energy() API. For CPU devices, the estimation is performed assuming that
the schedutil CPUFreq governor is in use. Currently this calculation is
not provided for other types of devices.

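
The following userspace sketch gives a rough idea of what such an estimation
can look like: it maps the highest CPU utilization in the domain to a
frequency request with some headroom (in the spirit of schedutil), picks the
lowest performance state able to serve it, and scales that state's power by
how busy the CPUs are. This is a simplified illustration under stated
assumptions, not the em_cpu_energy() implementation: the ``struct perf_state``
type, the ``estimate_domain_energy()`` name, the 25% headroom and the exact
arithmetic are all chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one row of a power cost table. */
struct perf_state {
	uint64_t frequency;	/* kHz, table sorted in ascending order */
	uint64_t power;		/* same scale for every row */
};

/*
 * Estimate the cost of running a performance domain.
 *   max_util  highest utilization among the domain's CPUs
 *   sum_util  sum of the utilization of all CPUs in the domain
 *   capacity  capacity of one CPU at the highest frequency (must be > 0)
 * The table must contain at least one state.
 */
static uint64_t estimate_domain_energy(const struct perf_state *table,
				       size_t nr_states, uint64_t max_util,
				       uint64_t sum_util, uint64_t capacity)
{
	uint64_t max_freq = table[nr_states - 1].frequency;
	uint64_t freq;
	size_t i;

	/* Assumed schedutil-like request: utilization plus 25% headroom. */
	freq = (max_freq + (max_freq >> 2)) * max_util / capacity;

	/* Lowest state able to serve the request, else the highest state. */
	for (i = 0; i < nr_states - 1; i++) {
		if (table[i].frequency >= freq)
			break;
	}

	/*
	 * Work takes longer at a lower frequency, hence the max_freq / freq
	 * factor; sum_util / capacity is the fraction of time spent busy.
	 */
	return table[i].power * max_freq / table[i].frequency
		* sum_util / capacity;
}
```

The result is expressed in the same scale as the table's power values, so it
is only meaningful for comparing alternatives within one consistent scale.
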
More details about the above APIs can be found in ``<linux/energy_model.h>``
or in Section 2.4.

2.4 Description details of this API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. kernel-doc:: include/linux/energy_model.h
   :internal:

.. kernel-doc:: kernel/power/energy_model.c
   :export:


3. Example driver
-----------------

The CPUFreq framework supports a dedicated callback for registering
the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
That callback has to be implemented properly for a given driver,
because the framework calls it at the right time during setup.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

  /* Callback reporting one <frequency, power> tuple to the EM framework */
  static int est_power(struct device *dev, unsigned long *mW,
                       unsigned long *KHz)
  {
          long freq, power;

          /* Use the 'foo' protocol to ceil the frequency */
          freq = foo_get_freq_ceil(dev, *KHz);
          if (freq < 0)
                  return freq;

          /* Estimate the power cost for the dev at the relevant freq. */
          power = foo_estimate_power(dev, freq);
          if (power < 0)
                  return power;

          /* Return the values to the EM framework */
          *mW = power;
          *KHz = freq;

          return 0;
  }

  static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
  {
          struct em_data_callback em_cb = EM_DATA_CB(est_power);
          struct device *cpu_dev;
          int nr_opp;

          cpu_dev = get_cpu_device(cpumask_first(policy->cpus));

          /* Find the number of OPPs for this policy */
          nr_opp = foo_get_nr_opp(policy);

          /* And register the new performance domain */
          em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
                                      true);
  }

  static struct cpufreq_driver foo_cpufreq_driver = {
          .register_em = foo_cpufreq_register_em,
  };
  203. 41 };