Merge tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull one more power management update from Rafael Wysocki:
 "Modify the intel_pstate driver to allow it to work in the passive
  mode with hardware-managed P-states (HWP) enabled"

* tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Implement passive mode with HWP enabled
@@ -54,10 +54,13 @@ registered (see `below <status_attr_>`_).
 Operation Modes
 ===============
 
-``intel_pstate`` can operate in three different modes: in the active mode with
-or without hardware-managed P-states support and in the passive mode. Which of
-them will be in effect depends on what kernel command line options are used and
-on the capabilities of the processor.
+``intel_pstate`` can operate in two different modes, active or passive. In the
+active mode, it uses its own internal performance scaling governor algorithm or
+allows the hardware to do performance scaling by itself, while in the passive
+mode it responds to requests made by a generic ``CPUFreq`` governor implementing
+a certain performance scaling algorithm. Which of them will be in effect
+depends on what kernel command line options are used and on the capabilities of
+the processor.
 
 Active Mode
 -----------
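
The rewritten paragraph above explains the difference between the active and the
passive mode. For reference, the driver reports the mode it is currently in
through its global ``status`` attribute; here is a minimal user-space sketch in
C, assuming the standard ``/sys/devices/system/cpu/intel_pstate/status``
location documented for the driver (illustrative only, not part of the patch)::

    #include <stdio.h>

    /* Print the current intel_pstate operation mode: "active", "passive" or "off". */
    int main(void)
    {
        char mode[32];
        FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/status", "r");

        if (!f || !fgets(mode, sizeof(mode), f)) {
            perror("intel_pstate status");
            return 1;
        }
        printf("intel_pstate mode: %s", mode);
        fclose(f);
        return 0;
    }
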
@@ -194,10 +197,11 @@ This is the default operation mode of ``intel_pstate`` for processors without
 hardware-managed P-states (HWP) support. It is always used if the
 ``intel_pstate=passive`` argument is passed to the kernel in the command line
 regardless of whether or not the given processor supports HWP. [Note that the
-``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
-without ``intel_pstate=active``.] Like in the active mode without HWP support,
-in this mode ``intel_pstate`` may refuse to work with processors that are not
-recognized by it.
+``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
+if it is not combined with ``intel_pstate=active``.] Like in the active mode
+without HWP support, in this mode ``intel_pstate`` may refuse to work with
+processors that are not recognized by it if HWP is prevented from being enabled
+through the kernel command line.
 
 If the driver works in this mode, the ``scaling_driver`` policy attribute in
 ``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
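
As the hunk above states, the per-policy ``scaling_driver`` attribute reads
"intel_cpufreq" in the passive mode (and "intel_pstate" in the active mode), so
user space can tell the two apart. A small sketch for CPU 0, assuming the usual
per-CPU ``cpufreq`` directory layout (illustrative only)::

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char drv[32] = "";
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver", "r");

        if (!f || !fgets(drv, sizeof(drv), f)) {
            perror("scaling_driver");
            return 1;
        }
        fclose(f);
        drv[strcspn(drv, "\n")] = '\0';

        if (!strcmp(drv, "intel_cpufreq"))
            puts("intel_pstate is in the passive mode");
        else if (!strcmp(drv, "intel_pstate"))
            puts("intel_pstate is in the active mode");
        else
            printf("scaling driver is %s\n", drv);
        return 0;
    }
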
@@ -318,10 +322,9 @@ manuals need to be consulted to get to it too.
 
 For this reason, there is a list of supported processors in ``intel_pstate`` and
 the driver initialization will fail if the detected processor is not in that
-list, unless it supports the `HWP feature <Active Mode_>`_. [The interface to
-obtain all of the information listed above is the same for all of the processors
-supporting the HWP feature, which is why they all are supported by
-``intel_pstate``.]
+list, unless it supports the HWP feature. [The interface to obtain all of the
+information listed above is the same for all of the processors supporting the
+HWP feature, which is why ``intel_pstate`` works with all of them.]
 
 
 User Space Interface in ``sysfs``

@@ -425,22 +428,16 @@ argument is passed to the kernel in the command line.
 	as well as the per-policy ones) are then reset to their default
 	values, possibly depending on the target operation mode.]
 
-	That only is supported in some configurations, though (for example, if
-	the `HWP feature is enabled in the processor <Active Mode With HWP_>`_,
-	the operation mode of the driver cannot be changed), and if it is not
-	supported in the current configuration, writes to this attribute will
-	fail with an appropriate error.
-
 ``energy_efficiency``
-	This attribute is only present on platforms, which have CPUs matching
-	Kaby Lake or Coffee Lake desktop CPU model. By default
-	energy efficiency optimizations are disabled on these CPU models in HWP
-	mode by this driver. Enabling energy efficiency may limit maximum
-	operating frequency in both HWP and non HWP mode. In non HWP mode,
-	optimizations are done only in the turbo frequency range. In HWP mode,
-	optimizations are done in the entire frequency range. Setting this
-	attribute to "1" enables energy efficiency optimizations and setting
-	to "0" disables energy efficiency optimizations.
+	This attribute is only present on platforms with CPUs matching the Kaby
+	Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
+	optimizations are disabled on these CPU models if HWP is enabled.
+	Enabling energy-efficiency optimizations may limit maximum operating
+	frequency with or without the HWP feature. With HWP enabled, the
+	optimizations are done only in the turbo frequency range. Without it,
+	they are done in the entire available frequency range. Setting this
+	attribute to "1" enables the energy-efficiency optimizations and setting
+	to "0" disables them.
 
 Interpretation of Policy Attributes
 -----------------------------------
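
For the ``energy_efficiency`` attribute described in the hunk above, enabling or
disabling the optimizations is a plain write of "1" or "0". A hedged sketch,
assuming the attribute sits in the driver's global
``/sys/devices/system/cpu/intel_pstate/`` directory and that the caller is
privileged::

    #include <stdio.h>

    /* Write "1" (enable) or "0" (disable) to the energy_efficiency attribute.
     * Returns 0 on success, -1 on failure. */
    static int set_energy_efficiency(int enable)
    {
        FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/energy_efficiency", "w");

        if (!f)
            return -1;
        fprintf(f, "%d\n", enable ? 1 : 0);
        return fclose(f) == 0 ? 0 : -1;
    }

    int main(void)
    {
        if (set_energy_efficiency(1)) {
            perror("energy_efficiency");
            return 1;
        }
        return 0;
    }

On CPUs other than the Kaby Lake/Coffee Lake desktop models named above, the
attribute is simply absent and the open() fails.
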
@@ -484,8 +481,8 @@ Next, the following policy attributes have special meaning if
 	policy for the time interval between the last two invocations of the
 	driver's utilization update callback by the CPU scheduler for that CPU.
 
-One more policy attribute is present if the `HWP feature is enabled in the
-processor <Active Mode With HWP_>`_:
+One more policy attribute is present if the HWP feature is enabled in the
+processor:
 
 ``base_frequency``
 	Shows the base frequency of the CPU. Any frequency above this will be

@@ -526,11 +523,11 @@ on the following rules, regardless of the current operation mode of the driver:
 
 3. The global and per-policy limits can be set independently.
 
-If the `HWP feature is enabled in the processor <Active Mode With HWP_>`_, the
-resulting effective values are written into its registers whenever the limits
-change in order to request its internal P-state selection logic to always set
-P-states within these limits. Otherwise, the limits are taken into account by
-scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
+In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
+resulting effective values are written into hardware registers whenever the
+limits change in order to request its internal P-state selection logic to always
+set P-states within these limits. Otherwise, the limits are taken into account
+by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
 every time before setting a new P-state for a CPU.
 
 Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
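
The paragraph above says that the effective limits are either pushed into the
HWP registers or applied by the driver before every P-state update. Purely as an
illustration of that clamping, not the driver's actual code: ``turbo_max``, the
percentages and the requested P-state below are hypothetical inputs, and the
rounding mirrors the ``DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100)``
computation visible later in this diff::

    #include <stdio.h>

    /* Illustrative only: clamp a requested P-state to limits derived from
     * percentage caps, rounding the minimum up and the maximum down. */
    static int clamp_pstate(int requested, int turbo_max, int min_pct, int max_pct)
    {
        int min_ps = (turbo_max * min_pct + 99) / 100;   /* round up */
        int max_ps = (turbo_max * max_pct) / 100;        /* round down */

        if (requested < min_ps)
            return min_ps;
        return requested > max_ps ? max_ps : requested;
    }

    int main(void)
    {
        /* e.g. turbo_max = 40, limits 25%..80%: a request for 36 is capped to 32 */
        printf("%d\n", clamp_pstate(36, 40, 25, 80));
        return 0;
    }
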
@@ -541,12 +538,11 @@ at all and the only way to set the limits is by using the policy attributes.
 Energy vs Performance Hints
 ---------------------------
 
-If ``intel_pstate`` works in the `active mode with the HWP feature enabled
-<Active Mode With HWP_>`_ in the processor, additional attributes are present
-in every ``CPUFreq`` policy directory in ``sysfs``. They are intended to allow
-user space to help ``intel_pstate`` to adjust the processor's internal P-state
-selection logic by focusing it on performance or on energy-efficiency, or
-somewhere between the two extremes:
+If the hardware-managed P-states (HWP) is enabled in the processor, additional
+attributes, intended to allow user space to help ``intel_pstate`` to adjust the
+processor's internal P-state selection logic by focusing it on performance or on
+energy-efficiency, or somewhere between the two extremes, are present in every
+``CPUFreq`` policy directory in ``sysfs``. They are:
 
 ``energy_performance_preference``
 	Current value of the energy vs performance hint for the given policy
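
The reworked paragraph above introduces the per-policy energy-vs-performance
hint attributes. A short sketch that dumps the available preferences and the
current one for CPU 0, assuming the usual per-CPU policy directory (illustrative
only)::

    #include <stdio.h>

    static void print_attr(const char *name)
    {
        char path[256], buf[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpufreq/%s", name);
        f = fopen(path, "r");
        if (f && fgets(buf, sizeof(buf), f))
            printf("%s: %s", name, buf);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        print_attr("energy_performance_available_preferences");
        print_attr("energy_performance_preference");
        return 0;
    }
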
@@ -650,12 +646,14 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
 	Do not register ``intel_pstate`` as the scaling driver even if the
 	processor is supported by it.
 
+``active``
+	Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
+	with.
+
 ``passive``
 	Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
 	start with.
 
-	This option implies the ``no_hwp`` one described below.
-
 ``force``
 	Register ``intel_pstate`` as the scaling driver instead of
 	``acpi-cpufreq`` even if the latter is preferred on the given system.

@@ -670,13 +668,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
 	driver is used instead of ``acpi-cpufreq``.
 
 ``no_hwp``
-	Do not enable the `hardware-managed P-states (HWP) feature
-	<Active Mode With HWP_>`_ even if it is supported by the processor.
+	Do not enable the hardware-managed P-states (HWP) feature even if it is
+	supported by the processor.
 
 ``hwp_only``
 	Register ``intel_pstate`` as the scaling driver only if the
-	`hardware-managed P-states (HWP) feature <Active Mode With HWP_>`_ is
-	supported by the processor.
+	hardware-managed P-states (HWP) feature is supported by the processor.
 
 ``support_acpi_ppc``
 	Take ACPI ``_PPC`` performance limits into account.
@@ -73,8 +73,6 @@ static inline bool has_target(void)
 static unsigned int __cpufreq_get(struct cpufreq_policy *policy);
 static int cpufreq_init_governor(struct cpufreq_policy *policy);
 static void cpufreq_exit_governor(struct cpufreq_policy *policy);
-static int cpufreq_start_governor(struct cpufreq_policy *policy);
-static void cpufreq_stop_governor(struct cpufreq_policy *policy);
 static void cpufreq_governor_limits(struct cpufreq_policy *policy);
 static int cpufreq_set_policy(struct cpufreq_policy *policy,
 			      struct cpufreq_governor *new_gov,

@@ -2266,7 +2264,7 @@ static void cpufreq_exit_governor(struct cpufreq_policy *policy)
 	module_put(policy->governor->owner);
 }
 
-static int cpufreq_start_governor(struct cpufreq_policy *policy)
+int cpufreq_start_governor(struct cpufreq_policy *policy)
 {
 	int ret;
 

@@ -2293,7 +2291,7 @@ static int cpufreq_start_governor(struct cpufreq_policy *policy)
 	return 0;
 }
 
-static void cpufreq_stop_governor(struct cpufreq_policy *policy)
+void cpufreq_stop_governor(struct cpufreq_policy *policy)
 {
 	if (cpufreq_suspended || !policy->governor)
 		return;
@@ -36,6 +36,7 @@
 #define INTEL_PSTATE_SAMPLING_INTERVAL	(10 * NSEC_PER_MSEC)
 
 #define INTEL_CPUFREQ_TRANSITION_LATENCY	20000
+#define INTEL_CPUFREQ_TRANSITION_DELAY_HWP	5000
 #define INTEL_CPUFREQ_TRANSITION_DELAY		500
 
 #ifdef CONFIG_ACPI

@@ -220,6 +221,7 @@ struct global_params {
 *			preference/bias
 * @epp_saved:		Saved EPP/EPB during system suspend or CPU offline
 *			operation
+* @epp_cached		Cached HWP energy-performance preference value
 * @hwp_req_cached:	Cached value of the last HWP Request MSR
 * @hwp_cap_cached:	Cached value of the last HWP Capabilities MSR
 * @last_io_update:	Last time when IO wake flag was set

@@ -257,6 +259,7 @@ struct cpudata {
 	s16 epp_policy;
 	s16 epp_default;
 	s16 epp_saved;
+	s16 epp_cached;
 	u64 hwp_req_cached;
 	u64 hwp_cap_cached;
 	u64 last_io_update;
@@ -639,6 +642,26 @@ static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data, int *raw
 	return index;
 }
 
+static int intel_pstate_set_epp(struct cpudata *cpu, u32 epp)
+{
+	/*
+	 * Use the cached HWP Request MSR value, because in the active mode the
+	 * register itself may be updated by intel_pstate_hwp_boost_up() or
+	 * intel_pstate_hwp_boost_down() at any time.
+	 */
+	u64 value = READ_ONCE(cpu->hwp_req_cached);
+
+	value &= ~GENMASK_ULL(31, 24);
+	value |= (u64)epp << 24;
+	/*
+	 * The only other updater of hwp_req_cached in the active mode,
+	 * intel_pstate_hwp_set(), is called under the same lock as this
+	 * function, so it cannot run in parallel with the update below.
+	 */
+	WRITE_ONCE(cpu->hwp_req_cached, value);
+	return wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
+}
+
 static int intel_pstate_set_energy_pref_index(struct cpudata *cpu_data,
 					      int pref_index, bool use_raw,
 					      u32 raw_epp)
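
The new ``intel_pstate_set_epp()`` above replaces bits 31:24 of the cached HWP
Request value with the requested EPP and then writes the MSR. The same bit
manipulation as a stand-alone sketch; the mask mirrors ``GENMASK_ULL(31, 24)``
from the code above, while the sample register image is made up::

    #include <stdint.h>
    #include <stdio.h>

    #define EPP_MASK  (0xffULL << 24)   /* same bits as GENMASK_ULL(31, 24) */

    /* Return a copy of an HWP Request image with the EPP field replaced. */
    static uint64_t set_epp_field(uint64_t hwp_req, uint8_t epp)
    {
        hwp_req &= ~EPP_MASK;
        hwp_req |= (uint64_t)epp << 24;
        return hwp_req;
    }

    int main(void)
    {
        uint64_t req = set_epp_field(0x80002a0aULL, 0x20);

        printf("HWP request image: %#llx\n", (unsigned long long)req);
        return 0;
    }
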
@@ -650,28 +673,12 @@ static int intel_pstate_set_energy_pref_index(struct cpudata *cpu_data,
 		epp = cpu_data->epp_default;
 
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
-		/*
-		 * Use the cached HWP Request MSR value, because the register
-		 * itself may be updated by intel_pstate_hwp_boost_up() or
-		 * intel_pstate_hwp_boost_down() at any time.
-		 */
-		u64 value = READ_ONCE(cpu_data->hwp_req_cached);
-
-		value &= ~GENMASK_ULL(31, 24);
-
 		if (use_raw)
 			epp = raw_epp;
 		else if (epp == -EINVAL)
 			epp = epp_values[pref_index - 1];
 
-		value |= (u64)epp << 24;
-		/*
-		 * The only other updater of hwp_req_cached in the active mode,
-		 * intel_pstate_hwp_set(), is called under the same lock as this
-		 * function, so it cannot run in parallel with the update below.
-		 */
-		WRITE_ONCE(cpu_data->hwp_req_cached, value);
-		ret = wrmsrl_on_cpu(cpu_data->cpu, MSR_HWP_REQUEST, value);
+		ret = intel_pstate_set_epp(cpu_data, epp);
 	} else {
 		if (epp == -EINVAL)
 			epp = (pref_index - 1) << 2;

@@ -697,10 +704,12 @@ static ssize_t show_energy_performance_available_preferences(
 
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
 
+static struct cpufreq_driver intel_pstate;
+
 static ssize_t store_energy_performance_preference(
 		struct cpufreq_policy *policy, const char *buf, size_t count)
 {
-	struct cpudata *cpu_data = all_cpu_data[policy->cpu];
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
 	char str_preference[21];
 	bool raw = false;
 	ssize_t ret;
@@ -725,15 +734,44 @@ static ssize_t store_energy_performance_preference(
 		raw = true;
 	}
 
+	/*
+	 * This function runs with the policy R/W semaphore held, which
+	 * guarantees that the driver pointer will not change while it is
+	 * running.
+	 */
+	if (!intel_pstate_driver)
+		return -EAGAIN;
+
 	mutex_lock(&intel_pstate_limits_lock);
 
-	ret = intel_pstate_set_energy_pref_index(cpu_data, ret, raw, epp);
-	if (!ret)
-		ret = count;
+	if (intel_pstate_driver == &intel_pstate) {
+		ret = intel_pstate_set_energy_pref_index(cpu, ret, raw, epp);
+	} else {
+		/*
+		 * In the passive mode the governor needs to be stopped on the
+		 * target CPU before the EPP update and restarted after it,
+		 * which is super-heavy-weight, so make sure it is worth doing
+		 * upfront.
+		 */
+		if (!raw)
+			epp = ret ? epp_values[ret - 1] : cpu->epp_default;
+
+		if (cpu->epp_cached != epp) {
+			int err;
+
+			cpufreq_stop_governor(policy);
+			ret = intel_pstate_set_epp(cpu, epp);
+			err = cpufreq_start_governor(policy);
+			if (!ret) {
+				cpu->epp_cached = epp;
+				ret = err;
+			}
+		}
+	}
 
 	mutex_unlock(&intel_pstate_limits_lock);
 
-	return ret;
+	return ret ?: count;
 }
 
 static ssize_t show_energy_performance_preference(
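
With the change above, a store to ``energy_performance_preference`` in the
passive mode goes through a governor stop/update/start cycle and fails with
``-EAGAIN`` when no driver is registered at the time. From user space it remains
an ordinary attribute write; a hedged sketch (the CPU 0 path is assumed, as in
the documentation part of this diff)::

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *path =
            "/sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference";
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            return 1;
        }
        /* Either a named preference or a raw 0-255 value can be written. */
        fprintf(f, "balance_performance\n");
        if (fclose(f) != 0) {   /* the buffered write is flushed here */
            fprintf(stderr, "write failed: %s%s\n", strerror(errno),
                    errno == EAGAIN ? " (no driver registered, retry later)" : "");
            return 1;
        }
        return 0;
    }
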
@@ -1145,8 +1183,6 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
 	return count;
 }
 
-static struct cpufreq_driver intel_pstate;
-
 static void update_qos_request(enum freq_qos_req_type type)
 {
 	int max_state, turbo_max, freq, i, perf_pct;

@@ -1330,9 +1366,10 @@ static const struct attribute_group intel_pstate_attr_group = {
 
 static const struct x86_cpu_id intel_pstate_cpu_ee_disable_ids[];
 
+static struct kobject *intel_pstate_kobject;
+
 static void __init intel_pstate_sysfs_expose_params(void)
 {
-	struct kobject *intel_pstate_kobject;
 	int rc;
 
 	intel_pstate_kobject = kobject_create_and_add("intel_pstate",

@@ -1357,17 +1394,31 @@ static void __init intel_pstate_sysfs_expose_params(void)
 	rc = sysfs_create_file(intel_pstate_kobject, &min_perf_pct.attr);
 	WARN_ON(rc);
 
-	if (hwp_active) {
-		rc = sysfs_create_file(intel_pstate_kobject,
-				       &hwp_dynamic_boost.attr);
-		WARN_ON(rc);
-	}
-
 	if (x86_match_cpu(intel_pstate_cpu_ee_disable_ids)) {
 		rc = sysfs_create_file(intel_pstate_kobject, &energy_efficiency.attr);
 		WARN_ON(rc);
 	}
 }
 
+static void intel_pstate_sysfs_expose_hwp_dynamic_boost(void)
+{
+	int rc;
+
+	if (!hwp_active)
+		return;
+
+	rc = sysfs_create_file(intel_pstate_kobject, &hwp_dynamic_boost.attr);
+	WARN_ON_ONCE(rc);
+}
+
+static void intel_pstate_sysfs_hide_hwp_dynamic_boost(void)
+{
+	if (!hwp_active)
+		return;
+
+	sysfs_remove_file(intel_pstate_kobject, &hwp_dynamic_boost.attr);
+}
+
 /************************** sysfs end ************************/
 
 static void intel_pstate_hwp_enable(struct cpudata *cpudata)
@@ -2247,7 +2298,10 @@ static int intel_pstate_verify_policy(struct cpufreq_policy_data *policy)
 
 static void intel_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
-	intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]);
+	if (hwp_active)
+		intel_pstate_hwp_force_min_perf(policy->cpu);
+	else
+		intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]);
 }
 
 static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)

@@ -2255,12 +2309,10 @@ static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
 	pr_debug("CPU %d exiting\n", policy->cpu);
 
 	intel_pstate_clear_update_util_hook(policy->cpu);
-	if (hwp_active) {
+	if (hwp_active)
 		intel_pstate_hwp_save_state(policy);
-		intel_pstate_hwp_force_min_perf(policy->cpu);
-	} else {
-		intel_cpufreq_stop_cpu(policy);
-	}
+
+	intel_cpufreq_stop_cpu(policy);
 }
 
 static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
@@ -2390,13 +2442,71 @@ static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, in
 		fp_toint(cpu->iowait_boost * 100));
 }
 
+static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate,
+				     bool fast_switch)
+{
+	u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;
+
+	value &= ~HWP_MIN_PERF(~0L);
+	value |= HWP_MIN_PERF(target_pstate);
+
+	/*
+	 * The entire MSR needs to be updated in order to update the HWP min
+	 * field in it, so opportunistically update the max too if needed.
+	 */
+	value &= ~HWP_MAX_PERF(~0L);
+	value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+
+	if (value == prev)
+		return;
+
+	WRITE_ONCE(cpu->hwp_req_cached, value);
+	if (fast_switch)
+		wrmsrl(MSR_HWP_REQUEST, value);
+	else
+		wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
+}
+
+static void intel_cpufreq_adjust_perf_ctl(struct cpudata *cpu,
+					  u32 target_pstate, bool fast_switch)
+{
+	if (fast_switch)
+		wrmsrl(MSR_IA32_PERF_CTL,
+		       pstate_funcs.get_val(cpu, target_pstate));
+	else
+		wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
+			      pstate_funcs.get_val(cpu, target_pstate));
+}
+
+static int intel_cpufreq_update_pstate(struct cpudata *cpu, int target_pstate,
+				       bool fast_switch)
+{
+	int old_pstate = cpu->pstate.current_pstate;
+
+	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+	if (target_pstate != old_pstate) {
+		cpu->pstate.current_pstate = target_pstate;
+		if (hwp_active)
+			intel_cpufreq_adjust_hwp(cpu, target_pstate,
+						 fast_switch);
+		else
+			intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
+						      fast_switch);
+	}
+
+	intel_cpufreq_trace(cpu, fast_switch ? INTEL_PSTATE_TRACE_FAST_SWITCH :
+			    INTEL_PSTATE_TRACE_TARGET, old_pstate);
+
+	return target_pstate;
+}
+
 static int intel_cpufreq_target(struct cpufreq_policy *policy,
 				unsigned int target_freq,
 				unsigned int relation)
 {
 	struct cpudata *cpu = all_cpu_data[policy->cpu];
 	struct cpufreq_freqs freqs;
-	int target_pstate, old_pstate;
+	int target_pstate;
 
 	update_turbo_state();
 
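
The new ``intel_cpufreq_adjust_hwp()`` above rewrites the minimum performance
field of the cached HWP Request value and opportunistically refreshes the
maximum, so that a single MSR write carries both. A stand-alone sketch of that
field update, assuming the conventional layout behind
``HWP_MIN_PERF``/``HWP_MAX_PERF`` (bits 7:0 and 15:8 respectively); the sample
values are made up::

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed field layout: min perf in bits 7:0, max perf in bits 15:8. */
    #define HWP_MIN_FIELD(x)  ((uint64_t)((x) & 0xff))
    #define HWP_MAX_FIELD(x)  ((uint64_t)((x) & 0xff) << 8)

    static uint64_t adjust_min_max(uint64_t req, unsigned int min, unsigned int max)
    {
        req &= ~HWP_MIN_FIELD(0xff);
        req |= HWP_MIN_FIELD(min);
        req &= ~HWP_MAX_FIELD(0xff);
        req |= HWP_MAX_FIELD(max);
        return req;
    }

    int main(void)
    {
        /* e.g. request P-state 18 (0x12) as the floor with a ceiling of 40 (0x28) */
        printf("%#llx\n", (unsigned long long)adjust_min_max(0x80001f0aULL, 18, 40));
        return 0;
    }
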
@@ -2404,6 +2514,7 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
 	freqs.new = target_freq;
 
 	cpufreq_freq_transition_begin(policy, &freqs);
+
 	switch (relation) {
 	case CPUFREQ_RELATION_L:
 		target_pstate = DIV_ROUND_UP(freqs.new, cpu->pstate.scaling);

@@ -2415,15 +2526,11 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
 		target_pstate = DIV_ROUND_CLOSEST(freqs.new, cpu->pstate.scaling);
 		break;
 	}
-	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
-	old_pstate = cpu->pstate.current_pstate;
-	if (target_pstate != cpu->pstate.current_pstate) {
-		cpu->pstate.current_pstate = target_pstate;
-		wrmsrl_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
-			      pstate_funcs.get_val(cpu, target_pstate));
-	}
+
+	target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, false);
+
 	freqs.new = target_pstate * cpu->pstate.scaling;
-	intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_TARGET, old_pstate);
+
 	cpufreq_freq_transition_end(policy, &freqs, false);
 
 	return 0;

@@ -2433,15 +2540,14 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 					      unsigned int target_freq)
 {
 	struct cpudata *cpu = all_cpu_data[policy->cpu];
-	int target_pstate, old_pstate;
+	int target_pstate;
 
 	update_turbo_state();
 
 	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
-	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
-	old_pstate = cpu->pstate.current_pstate;
-	intel_pstate_update_pstate(cpu, target_pstate);
-	intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_FAST_SWITCH, old_pstate);
+
+	target_pstate = intel_cpufreq_update_pstate(cpu, target_pstate, true);
+
 	return target_pstate * cpu->pstate.scaling;
 }
 

@@ -2461,7 +2567,6 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 		return ret;
 
 	policy->cpuinfo.transition_latency = INTEL_CPUFREQ_TRANSITION_LATENCY;
-	policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY;
 	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
 	policy->cur = policy->cpuinfo.min_freq;
 
@@ -2473,10 +2578,18 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
 	cpu = all_cpu_data[policy->cpu];
 
-	if (hwp_active)
+	if (hwp_active) {
+		u64 value;
+
 		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
-	else
+		policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY_HWP;
+		rdmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, &value);
+		WRITE_ONCE(cpu->hwp_req_cached, value);
+		cpu->epp_cached = (value & GENMASK_ULL(31, 24)) >> 24;
+	} else {
 		turbo_max = cpu->pstate.turbo_pstate;
+		policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY;
+	}
 
 	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
 	min_freq *= cpu->pstate.scaling;

@@ -2553,6 +2666,10 @@ static void intel_pstate_driver_cleanup(void)
 		}
 	}
 	put_online_cpus();
+
+	if (intel_pstate_driver == &intel_pstate)
+		intel_pstate_sysfs_hide_hwp_dynamic_boost();
+
 	intel_pstate_driver = NULL;
 }
 

@@ -2560,6 +2677,9 @@ static int intel_pstate_register_driver(struct cpufreq_driver *driver)
 {
 	int ret;
 
+	if (driver == &intel_pstate)
+		intel_pstate_sysfs_expose_hwp_dynamic_boost();
+
 	memset(&global, 0, sizeof(global));
 	global.max_perf_pct = 100;
 

@@ -2577,9 +2697,6 @@ static int intel_pstate_register_driver(struct cpufreq_driver *driver)
 
 static int intel_pstate_unregister_driver(void)
 {
-	if (hwp_active)
-		return -EBUSY;
-
 	cpufreq_unregister_driver(intel_pstate_driver);
 	intel_pstate_driver_cleanup();
 

@@ -2835,7 +2952,10 @@ static int __init intel_pstate_init(void)
 			hwp_active++;
 			hwp_mode_bdw = id->driver_data;
 			intel_pstate.attr = hwp_cpufreq_attrs;
-			default_driver = &intel_pstate;
+			intel_cpufreq.attr = hwp_cpufreq_attrs;
+			if (!default_driver)
+				default_driver = &intel_pstate;
+
 			goto hwp_cpu_matched;
 		}
 	} else {

@@ -2906,14 +3026,13 @@ static int __init intel_pstate_setup(char *str)
 	if (!str)
 		return -EINVAL;
 
-	if (!strcmp(str, "disable")) {
+	if (!strcmp(str, "disable"))
 		no_load = 1;
-	} else if (!strcmp(str, "active")) {
+	else if (!strcmp(str, "active"))
 		default_driver = &intel_pstate;
-	} else if (!strcmp(str, "passive")) {
+	else if (!strcmp(str, "passive"))
 		default_driver = &intel_cpufreq;
-		no_hwp = 1;
-	}
+
 	if (!strcmp(str, "no_hwp")) {
 		pr_info("HWP disabled\n");
 		no_hwp = 1;

@@ -576,6 +576,8 @@ unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
 unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
 int cpufreq_register_governor(struct cpufreq_governor *governor);
 void cpufreq_unregister_governor(struct cpufreq_governor *governor);
+int cpufreq_start_governor(struct cpufreq_policy *policy);
+void cpufreq_stop_governor(struct cpufreq_policy *policy);
 
 #define cpufreq_governor_init(__governor) \
 static int __init __governor##_init(void) \