- .. SPDX-License-Identifier: GPL-2.0
- =======================
- Energy Model of devices
- =======================
- 1. Overview
- -----------
- The Energy Model (EM) framework serves as an interface between drivers knowing
- the power consumed by devices at various performance levels, and the kernel
- subsystems willing to use that information to make energy-aware decisions.
- The source of the information about the power consumed by devices can vary greatly
- from one platform to another. These power costs can be estimated using
- devicetree data in some cases. In others, the firmware will know better.
- Alternatively, userspace might be best positioned. And so on. To spare each
- client subsystem from re-implementing support for every possible source of
- information on its own, the EM framework acts as an abstraction layer which
- standardizes the format of power cost tables in the kernel, thereby avoiding
- redundant work.
- The power values might be expressed in micro-Watts or in an 'abstract scale'.
- Multiple subsystems might use the EM and it is up to the system integrator to
- check that the requirements for the power value scale types are met. An example
- can be found in the Energy-Aware Scheduler documentation
- Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
- powercap, power values expressed in an 'abstract scale' might cause issues.
- These subsystems are more interested in estimating the power used in the past,
- so real micro-Watts might be needed. An example of these requirements can
- be found in the Intelligent Power Allocation in
- Documentation/driver-api/thermal/power_allocator.rst.
- Kernel subsystems might implement automatic detection to check whether
- EM-registered devices use inconsistent scales (based on an EM internal flag).
- An important thing to keep in mind is that when the power values are expressed
- in an 'abstract scale', deriving real energy in micro-Joules is not possible.
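The scale check and the micro-Joule restriction described above can be illustrated with a minimal userspace sketch in C. The `em_dev_sketch` structure and both helpers are hypothetical names invented for this example and are not part of the kernel API; the sketch only assumes that each registered device carries a boolean 'microwatts' flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an EM-registered device. */
struct em_dev_sketch {
	const char *name;
	bool microwatts;	/* true: real micro-Watts, false: abstract scale */
};

/* Return true when every device reports power on the same scale. */
static bool em_scales_consistent(const struct em_dev_sketch *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);

	for (i = 1; i < n; i++) {
		if (devs[i].microwatts != devs[0].microwatts)
			return false;
	}
	return true;
}

/*
 * Energy in micro-Joules for a power in micro-Watts sustained for a duration
 * in microseconds. Refuses to compute anything for an abstract-scale device,
 * because the result would not be a physical quantity.
 */
static bool em_energy_uj(const struct em_dev_sketch *dev,
			 unsigned long long power_uw,
			 unsigned long long duration_us,
			 unsigned long long *energy_uj)
{
	if (!dev->microwatts)
		return false;

	/* uW * us = 1e-12 J = 1e-6 uJ */
	*energy_uj = power_uw * duration_us / 1000000ULL;
	return true;
}
```

A client that needs past power or energy figures (for instance a thermal-style consumer) would run a check of this kind before trusting the numbers, while a client that only compares relative costs could accept either scale as long as it is consistent.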
- The figure below depicts an example of drivers (Arm-specific here, but the
- approach is applicable to any architecture) providing power costs to the EM
- framework, and interested clients reading the data from it::
-        +---------------+  +-----------------+  +---------------+
-        | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
-        +---------------+  +-----------------+  +---------------+
-                |                   | em_cpu_energy()   |
-                |                   | em_cpu_get()      |
-                +---------+         |         +---------+
-                          |         |         |
-                          v         v         v
-                         +---------------------+
-                         |    Energy Model     |
-                         |     Framework       |
-                         +---------------------+
-                            ^       ^       ^
-                            |       |       | em_dev_register_perf_domain()
-                 +----------+       |       +---------+
-                 |                  |                 |
-         +---------------+  +---------------+  +--------------+
-         |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
-         +---------------+  +---------------+  +--------------+
-                 ^                  ^                 ^
-                 |                  |                 |
-         +--------------+   +---------------+  +--------------+
-         | Device Tree  |   |   Firmware    |  |      ?       |
-         +--------------+   +---------------+  +--------------+
- In the case of CPU devices, the EM framework manages power cost tables per
- 'performance domain' in the system. A performance domain is a group of CPUs
- whose performance is scaled together. Performance domains generally have a
- 1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
- required to have the same micro-architecture. CPUs in different performance
- domains can have different micro-architectures.
- 2. Core APIs
- ------------
- 2.1 Config options
- ^^^^^^^^^^^^^^^^^^
- CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
- 2.2 Registration of performance domains
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Registration of 'advanced' EM
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The 'advanced' EM gets its name from the fact that the driver is allowed
- to provide a more precise power model. It is not limited to a math formula
- implemented in the framework (as is the case for the 'simple' EM). It can better
- reflect the real power measurements performed for each performance state. Thus,
- this registration method should be preferred when accounting for EM static
- power (leakage) is important.
- Drivers are expected to register performance domains into the EM framework by
- calling the following API::
-   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-   		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts);
- Drivers must provide a callback function returning <frequency, power> tuples
- for each performance state. The callback function provided by the driver is free
- to fetch data from any relevant location (DT, firmware, ...), and by any means
- deemed necessary. For CPU devices only, drivers must specify the CPUs of the
- performance domain using a cpumask. For devices other than CPUs, the 'cpus'
- argument must be set to NULL.
- The last argument, 'microwatts', must be set to the correct value. Kernel
- subsystems which use the EM might rely on this flag to check whether all EM
- devices use the same scale. If there are different scales, these subsystems
- might decide to return a warning or an error, stop working, or panic.
- See Section 3. for an example of a driver implementing this
- callback, or Section 2.4 for further documentation on this API.
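To make the callback contract more concrete, the following userspace sketch in C mimics how a framework could build a table of <frequency, power> tuples by repeatedly asking a driver callback to "ceil" a frequency. It is an illustration only: `device_sketch`, `ps_sketch`, `foo_opps`, `foo_active_power()` and `build_table()` are hypothetical names, the OPP values are made up, and the loop is an assumption about the general approach, not a copy of the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

struct device_sketch { int id; };	/* stand-in for struct device */

struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* micro-Watts */
};

typedef int (*active_power_fn)(struct device_sketch *dev,
			       unsigned long *uW, unsigned long *KHz);

/* Made-up operating points, sorted by ascending frequency. */
static const struct ps_sketch foo_opps[] = {
	{  500000,  80000 },
	{ 1000000, 200000 },
	{ 1500000, 450000 },
};
#define FOO_NR_OPPS (sizeof(foo_opps) / sizeof(foo_opps[0]))

/* Driver side: round *KHz up to a supported frequency and report its power. */
static int foo_active_power(struct device_sketch *dev,
			    unsigned long *uW, unsigned long *KHz)
{
	size_t i;

	(void)dev;
	for (i = 0; i < FOO_NR_OPPS; i++) {
		if (foo_opps[i].frequency >= *KHz) {
			*KHz = foo_opps[i].frequency;
			*uW = foo_opps[i].power;
			return 0;
		}
	}
	return -1;	/* nothing at or above the requested frequency */
}

/* Framework side: query nr_states tuples, starting from the lowest one. */
static int build_table(struct device_sketch *dev, active_power_fn cb,
		       struct ps_sketch *table, size_t nr_states)
{
	unsigned long freq = 0, prev_freq = 0, power = 0;
	size_t i;

	assert(cb != NULL && table != NULL);

	/* freq++ asks for the first state strictly above the previous one. */
	for (i = 0; i < nr_states; i++, freq++) {
		int ret = cb(dev, &power, &freq);

		if (ret)
			return ret;
		if (freq <= prev_freq)
			return -1;	/* frequencies must be ascending */

		table[i].frequency = prev_freq = freq;
		table[i].power = power;
	}
	return 0;
}
```

With this shape the driver never has to expose its internal OPP storage: the framework only needs the number of states and a callback that rounds a frequency up and reports the matching power.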
- Registration of EM using DT
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The EM can also be registered using the OPP framework and the information in DT
- "operating-points-v2". Each OPP entry in DT can be extended with an
- "opp-microwatt" property containing a power value in micro-Watts. This OPP DT
- property allows a platform to register EM power values which reflect total
- power (static + dynamic). These power values might come directly from
- experiments and measurements.
- Registration of 'artificial' EM
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- There is an option to provide a custom callback for drivers that lack detailed
- knowledge of the power value for each performance state. The callback
- .get_cost() is optional and provides the 'cost' values used by the EAS.
- This is useful for platforms that only provide information on relative
- efficiency between CPU types, where one could use the information to
- create an abstract power model. But even an abstract power model can
- sometimes be hard to fit in, given the input power value size restrictions.
- The .get_cost() callback allows drivers to provide 'cost' values which reflect
- the efficiency of the CPUs. This lets them give EAS information which follows a
- different relation than the one imposed by the EM internal formulas
- calculating 'cost' values. To register an EM for such a platform, the
- driver must set the 'microwatts' flag to 0, provide the .get_power() callback
- and provide the .get_cost() callback. The EM framework handles such a platform
- properly during registration. The EM_PERF_DOMAIN_ARTIFICIAL flag is set for
- such a platform. Other frameworks which use the EM should take special care
- to test and treat this flag properly.
- Registration of 'simple' EM
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The 'simple' EM is registered using the framework helper function
- cpufreq_register_em_with_opp(). It implements a power model which is tied to
- the following math formula::
-   Power = C * V^2 * f
- The EM registered using this method might not correctly reflect the
- physics of a real device, e.g. when static power (leakage) is important.
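As a worked illustration of the Power = C * V^2 * f formula, here is a small userspace sketch in C. The function name `simple_em_power_uw()` is hypothetical, and the units are assumptions made for the example: the coefficient C is taken to be in uW/MHz/V^2, the voltage in millivolts and the frequency in kHz. Static power is deliberately not modelled, which is exactly the limitation mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Dynamic power in micro-Watts from C * V^2 * f.
 *
 * Assumed units (hypothetical for this sketch):
 *   cap - coefficient in uW/MHz/V^2
 *   mv  - voltage in millivolts
 *   khz - frequency in kHz
 */
static uint64_t simple_em_power_uw(uint32_t cap, uint32_t mv, uint32_t khz)
{
	uint64_t tmp;

	/* A powered and clocked state is expected. */
	assert(mv != 0 && khz != 0);

	/* mV^2 is 1e6 times V^2, so divide by 1e6 at the end. */
	tmp = (uint64_t)cap * mv * mv * (khz / 1000);
	return tmp / 1000000;
}
```

For example, with C = 100, 1000 mV and 1 GHz the function returns 100000 uW (100 mW); dropping to 800 mV and 500 MHz gives 32000 uW, showing how strongly the squared voltage term dominates.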
- 2.3 Accessing performance domains
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- There are two API functions which provide access to the energy model:
- em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which takes
- a device pointer. Which interface to use depends on the subsystem, but in the
- case of CPU devices both functions return the same performance domain.
- Subsystems interested in the energy model of a CPU can retrieve it using the
- em_cpu_get() API. The energy model tables are allocated once upon creation of
- the performance domains, and kept in memory untouched.
- The energy consumed by a performance domain can be estimated using the
- em_cpu_energy() API. For CPU devices, the estimation is performed assuming
- that the schedutil CPUfreq governor is in use. Currently this calculation is
- not provided for other types of devices.
- More details about the above APIs can be found in ``<linux/energy_model.h>``
- or in Section 2.4.
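To give an intuition for what an energy estimate over a performance domain can look like, here is a simplified userspace sketch in C. It is not the kernel's em_cpu_energy() implementation: `ps_sketch` and `pd_energy_sketch()` are hypothetical names, and the model is an assumption that the domain runs at the lowest performance state whose frequency covers the busiest CPU's utilization (with no governor headroom margin), and that the cost scales with the summed utilization relative to the capacity available at that state.

```c
#include <assert.h>
#include <stddef.h>

struct ps_sketch {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* same scale as registered in the EM */
};

/*
 * table     - performance states, sorted by ascending frequency
 * max_util  - utilization of the busiest CPU in the domain
 * sum_util  - summed utilization of all CPUs in the domain
 * scale_cpu - capacity of one CPU at the highest frequency (e.g. 1024)
 */
static unsigned long pd_energy_sketch(const struct ps_sketch *table,
				      size_t nr_states,
				      unsigned long max_util,
				      unsigned long sum_util,
				      unsigned long scale_cpu)
{
	const struct ps_sketch *ps;
	unsigned long long max_freq, freq;
	size_t i;

	assert(table != NULL && nr_states > 0 && scale_cpu > 0);

	/* Frequency request proportional to the busiest CPU's utilization. */
	max_freq = table[nr_states - 1].frequency;
	freq = max_freq * max_util / scale_cpu;

	/* Lowest state that satisfies the request, else the highest one. */
	ps = &table[nr_states - 1];
	for (i = 0; i < nr_states; i++) {
		if (table[i].frequency >= freq) {
			ps = &table[i];
			break;
		}
	}
	assert(ps->frequency > 0);

	/*
	 * Capacity at this state is scale_cpu * ps->frequency / max_freq, so
	 * the estimate is power * sum_util / capacity.
	 */
	return (unsigned long)((unsigned long long)ps->power * sum_util *
			       max_freq /
			       ((unsigned long long)ps->frequency * scale_cpu));
}
```

The returned number is on the same scale as the power values in the table, so it is only comparable across domains whose EMs use a consistent scale.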
- 2.4 Description details of this API
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- .. kernel-doc:: include/linux/energy_model.h
-    :internal:
- .. kernel-doc:: kernel/power/energy_model.c
-    :export:
- 3. Example driver
- -----------------
- The CPUFreq framework supports a dedicated callback for registering
- the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
- That callback has to be implemented properly for a given driver,
- because the framework calls it at the right time during setup.
- This section provides a simple example of a CPUFreq driver registering a
- performance domain in the Energy Model framework using the (fake) 'foo'
- protocol. The driver implements an est_power() function to be provided to the
- EM framework::
-   -> drivers/cpufreq/foo_cpufreq.c
-
-   static int est_power(struct device *dev, unsigned long *uW,
-   		unsigned long *KHz)
-   {
-   	long freq, power;
-
-   	/* Use the 'foo' protocol to ceil the frequency */
-   	freq = foo_get_freq_ceil(dev, *KHz);
-   	if (freq < 0)
-   		return freq;
-
-   	/* Estimate the power cost for the dev at the relevant freq. */
-   	power = foo_estimate_power(dev, freq);
-   	if (power < 0)
-   		return power;
-
-   	/* Return the values to the EM framework */
-   	*uW = power;
-   	*KHz = freq;
-
-   	return 0;
-   }
-
-   static void foo_cpufreq_register_em(struct cpufreq_policy *policy)
-   {
-   	struct em_data_callback em_cb = EM_DATA_CB(est_power);
-   	struct device *cpu_dev;
-   	int nr_opp;
-
-   	cpu_dev = get_cpu_device(cpumask_first(policy->cpus));
-
-   	/* Find the number of OPPs for this policy */
-   	nr_opp = foo_get_nr_opp(policy);
-
-   	/* And register the new performance domain (power in micro-Watts) */
-   	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
-   				    true);
-   }
-
-   static struct cpufreq_driver foo_cpufreq_driver = {
-   	.register_em = foo_cpufreq_register_em,
-   };