Commit Graph

3613 Commits

Author SHA1 Message Date
Sebastian Andrzej Siewior
7f34b935e8 x86/mcheck: Be prepared for a rollback back to the ONLINE state
If we try a CPU down and fail in the middle then we roll back to the
online state. This means we would perform CPU_ONLINE / mce_device_create()
without invoking CPU_DEAD / mce_device_remove() for the cleanup of what was
allocated in CPU_ONLINE.

Be prepared for this and don't allocate the struct if we have it
already.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-4-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior
ec553abb31 x86/mcheck: Explicit cleanup on failure in mce_amd
If the ONLINE callback fails, the driver does not any clean up right
away instead it waits to get to the DEAD stage to do it. Yes, it waits.
Since we don't pass the error code back to the caller, no one knows.

Do the clean up right away so it does not look like a leak.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-3-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:17 +01:00
Sebastian Andrzej Siewior
0943637293 x86/mcheck: Move threshold_create_device()
Move the threshold_create_device() so it can use
threshold_remove_device() without a forward declaration.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-2-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-16 09:34:16 +01:00
Fenghua Yu
f410770293 x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
If CPUs are moved to or removed from a rdtgroup, the percpu closid storage
is updated. If tasks running on an affected CPU use the percpu closid then
the PQR_ASSOC MSR is only updated when the task runs through a context
switch. Up to the context switch the CPUs operate on the wrong closid. This
state is potentially unbound.
    
Make the change immediately effective by invoking a smp function call on
the affected CPUs which stores the new closid in the perpu storage and
calls the rdt_sched_in() function which updates the MSR, if the current
task uses the percpu closid.

[ tglx: Made it work and massaged changelog once more ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:50 +01:00
Fenghua Yu
c7cc0cc10c x86/intel_rdt: Reset per cpu closids on unmount
All CPUs in a rdtgroup are given back to the default rdtgroup before the
rdtgroup is removed during umount. After umount, the default rdtgroup
contains all online CPUs, but the per cpu closids are not cleared. As a
result the stale closid value will be used immediately after the next
mount.

Move all cpus to the default group and update the percpu closid storage.

[ tglx: Massaged changelong ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:50 +01:00
Thomas Gleixner
a2584e1d5a x86/intel_rdt: Prevent deadlock against hotplug lock
The cpu online/offline callbacks of intel_rdt lock rdtgroup_mutex nested
inside of cpu hotplug lock. rdtgroup_cpus_write() does it in reverse order.

Remove the get/put_online_cpus() calls from rdtgroup_cpus_write(). This is
safe against cpu hotplug as the resource group cpumasks are protected by
rdtgroup_mutex.

Found by review, but should have been found if authors would have bothered
to test cpu hotplug with lockdep enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
2016-11-15 18:35:49 +01:00
Fenghua Yu
f57b308728 x86/intel_rdt: Protect info directory from removal
The info directory and the per-resource subdirectories of the info
directory have no reference to a struct rdtgroup in kn->priv. An attempt to
remove one of those directories results in a NULL pointer dereference.

Protect the directories from removal and return -EPERM instead of -ENOENT.

[ tglx: Massaged changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1478912558-55514-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-15 18:35:49 +01:00
Borislav Petkov
54467353a9 x86/MCE: Correct TSC timestamping of error records
We did have logic in the MCE code which would TSC-timestamp an error
record only when it is exact - i.e., when it wasn't detected by polling.
This isn't the case anymore. So let's fix that:

We have a valid TSC timestamp in the error record only when it has been
a precise detection, i.e., either in the #MC handler or in one of the
interrupt handlers (thresholding, deferred, ...).

All other error records still have mce.time which contains the wall
time in order to be able to place the error record in time at least
approximately.

Also, this fixes another bug where machine_check_poll() would clear
mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
logging.

The proper fix would be to generate timestamp only when it has been
requested and not always. But that would require a more thorough code
audit of all mce_gather_info/mce_setup() users. Add a FIXME for now.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony <tony.luck@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel test robot <xiaolong.ye@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20161110131053.kybsijfs5venpjnf@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-11 08:08:24 +01:00
Sudeep Holla
fac5148257 drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
With CONFIG_OF enabled on x86, we get the following error on boot:
"
	Failed to find cpu0 device node
 	Unable to detect cache hierarchy from DT for CPU 0
"
and the cacheinfo fails to get populated in the corresponding sysfs
entries. This is because cache_setup_of_node looks for of_node for
setting up the shared cpu_map without checking that it's already
populated in the architecture specific callback.

In order to indicate that the shared cpu_map is already populated, this
patch introduces a boolean `cpu_map_populated` in struct cpu_cacheinfo
that can be used by the generic code to skip cache_shared_cpu_map_setup.

This patch also sets that boolean for x86.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10 17:30:53 +01:00
Thomas Gleixner
d49597fd3b x86/cpu: Deal with broken firmware (VMWare/XEN)
Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>,
Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 21:05:01 +01:00
Yazen Ghannam
b6a50cddbc x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature
These changes do not affect current hw - just a cleanup:

Currently, we assume that a system has a single Last Level Cache (LLC)
per node, and that the cpu_llc_id is thus equal to the node_id. This no
longer applies since Fam17h can have multiple last level caches within a
node.

So group the cpu_llc_id assignment by topology feature and family in
order to make the computation of cpu_llc_id on the different families
more clear.

Here is how the LLC ID is being computed on the different families:

The NODEID_MSR feature only applies to Fam10h in which case the LLC is
at the node level.

The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only
see multiple last level caches if L3 caches are available. Otherwise,
the cpu_llc_id will default to be the phys_proc_id.

We have L3 caches only on families 15h and 17h:

 - on Fam15h, the LLC is at the node level.

 - on Fam17h, the LLC is at the core complex level and can be found by
   right shifting the APIC ID. Also, keep the family checks explicit so that
   new families will fall back to the default, which will be node_id for
   TOPOEXT systems.

Single node systems in families 10h and 15h will have a Node ID of 0
which will be the same as the phys_proc_id, so we don't need to check
for multiple nodes before using the node_id.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Rewrote the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:07:43 +01:00
Ingo Molnar
ca4b2df651 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:07:11 +01:00
Yazen Ghannam
b0b6e86846 x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v4.4..
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3849e91f57 ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:06:08 +01:00
Borislav Petkov
c09a8c40e0 x86/RAS: Hide SMCA bank names
Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:15 +01:00
Borislav Petkov
a9a1c0ee04 x86/RAS: Rename smca_bank_names to smca_names
Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov
1ce9cd7f9f x86/RAS: Simplify SMCA HWID descriptor struct
Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov
79349f529a x86/RAS: Simplify SMCA bank descriptor struct
Call the struct simply smca_bank, it's instance ID can be simply ->id.
Makes the code much more readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-1-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:14 +01:00
Borislav Petkov
cd9c57cad3 x86/MCE: Dump MCE to dmesg if no consumers
When there are no error record consumers registered with the kernel, the
only thing that appears in dmesg is something like:

  [  300.000326] mce: [Hardware Error]: Machine check events logged

and the error records are gone. Which is seriously counterproductive.

So let's dump them to dmesg instead, in such a case.

Requested-by: Eric Morton <Eric.Morton@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20161101120911.13163-4-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:13 +01:00
Yinghai Lu
f5e886ef9b x86/MCE: Do not look at panic_on_oops in the severity grading
The MCE tolerance levels control whether we panic on a machine check or do
something else like generating a signal and logging error information. This
is controlled by the mce=<level> command line parameter.

However, if panic_on_oops is set, it will force a panic for such an MCE
even though the user didn't want to.

So don't check panic_on_oops in the severity grading anymore.

One of the use cases for that is recovery from uncorrectable errors with
mce=2.

 [ Boris: rewrite commit message. ]

Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20160916202325.4972-1-yinghai@kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-08 17:10:12 +01:00
Shaohua Li
53a114a690 x86/intel_rdt: Export the minimum number of set mask bits in sysfs
The minimum number of bits set for a cache mask is checked by the kernel
when writing a mask, but there is no way for the user to retrieve this
information.

Add a new file 'min_cbm_bits' to the info directory and export the
information to user space.

[ tglx: Massaged changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/e69b1ffa206d0353eea58101e1bf9b677d9732f7.1478207143.git.shli@fb.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07 12:20:52 +01:00
Shaohua Li
7bff0af510 x86/intel_rdt: Propagate error in rdt_mount() properly
gcc complains:
"warning: ‘dentry’ may be used uninitialized in this function"

The error exit path in rdt_mount(), which deals with a failure in
rdtgroup_create_info_dir(), does not set the error code in dentry and
returns the uninitialized dentry value.

Add the missing error propagation.

[tglx: Massaged changelog ]
Fixes: 4e978d06de ("x86/intel_rdt: Add "info" files to resctrl file system")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/a56a556f6768dc12cadbf97f49e000189056f90e.1478207143.git.shli@fb.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07 12:20:52 +01:00
Fenghua Yu
4f341a5e48 x86/intel_rdt: Add scheduler hook
Hook the x86 scheduler code to update closid based on whether the current
task is assigned to a specific closid or running on a CPU assigned to a
specific closid.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:16 -06:00
Tony Luck
60ec2440c6 x86/intel_rdt: Add schemata file
Last of the per resource group files. Also mode 0644. This one shows
the resources available to the group. Syntax depends on whether the
"cdp" mount option was given. With code/data prioritization disabled
it is simply a list of masks for each cache domain. Initial value
allows access to all of the L3 cache on all domains. E.g. on a 2 socket
Broadwell:
        L3:0=fffff;1=fffff
With CDP enabled, separate masks for data and instructions are provided:
        L3DATA:0=fffff;1=fffff
        L3CODE:0=fffff;1=fffff

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:16 -06:00
Fenghua Yu
e02737d5b8 x86/intel_rdt: Add tasks files
The root directory all subdirectories are automatically populated with a
read/write (mode 0644) file named "tasks". When read it will show all the
task IDs assigned to the resource group. Tasks can be added (one at a time)
to a group by writing the task ID to the file.  E.g.

Membership in a resource group is indicated by a new field in the
task_struct "int closid" which holds the CLOSID for each task. The default
resource group uses CLOSID=0 which means that all existing tasks when the
resctrl file system is mounted belong to the default group.

If a group is removed, tasks which are members of that group are moved to
the default group.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:15 -06:00
Tony Luck
12e0110c11 x86/intel_rdt: Add cpus file
Now we populate each directory with a read/write (mode 0644) file
named "cpus". This is used to over-ride the resources available
to processes in the default resource group when running on specific
CPUs.  Each "cpus" file reads as a cpumask showing which CPUs belong
to this resource group. Initially all online CPUs are assigned to
the default group. They can be added to other groups by writing a
cpumask to the "cpus" file in the directory for the resource group
(which will remove them from the previous group to which they were
assigned). CPU online/offline operations will delete CPUs that go
offline from whatever group they are in and add new CPUs to the
default group.

If there are CPUs assigned to a group when the directory is removed,
they are returned to the default group.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:15 -06:00
Fenghua Yu
60cf5e101f x86/intel_rdt: Add mkdir to resctrl file system
Resource control groups are represented as directories in the resctrl
file system. The root directory describes the default resources available
to tasks that have not been assigned specific resources. Other directories
can be created at the root level to make new resource groups. It is not
permitted to make directories within other directories.

Hardware uses a CLOSID (Class of service ID) to determine which resource
limits are currently in effect. The exact number available is enumerated
by CPUID leaf 0x10, but on current implementations it is a small number.
We implement a simple bitmask allocator for CLOSIDs.

Each resource control group uses one CLOSID, which limits the total number
of directories that can be created.

Resource groups can be removed using rmdir.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Fenghua Yu
4e978d06de x86/intel_rdt: Add "info" files to resctrl file system
For the convenience of applications we make the decoded values of some
of the CPUID values available in read-only (0444) files.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Fenghua Yu
5ff193fbde x86/intel_rdt: Add basic resctrl filesystem support
Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP).  The file system is not populated yet in this patch.

[ tglx: Fixed up a few nits and added cdp handling in case of error ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Tony Luck
2264d9c74d x86/intel_rdt: Build structures for each resource based on cache topology
We use the cpu hotplug notifier to catch each cpu in turn and look at
its cache topology w.r.t each of the resource groups. As we discover
new resources, we initialize the bitmask array for each to the default
(full access) value.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:13 -06:00
Alexey Makhalov
80e9a4f21f x86/vmware: Add paravirt sched clock
The default sched_clock() implementation is native_sched_clock(). It
contains code to handle non constant frequency TSCs, which creates
overhead for systems with constant frequency TSCs.

The vmware hypervisor guarantees a constant frequency TSC, so
native_sched_clock() is not required and slower than a dedicated function
which operates with one time calculated conversion factors.

Calculate the conversion factors at boot time from the tsc frequency and
install an optimized sched_clock() function via paravirt ops.

The paravirtualized clock can be disabled on the kernel command line with
the new 'no-vmw-sched-clock' option.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-4-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:08 +01:00
Alexey Makhalov
91d1e54ebd x86/vmware: Add basic paravirt ops support
Add basic paravirt support:

 1. Set pv_info.name to "VMware hypervisor" to have proper boot log message
	Booting paravirtualized kernel on VMware hypervisor
    instead of "... on bare hardware"

 2. Set pv_cpu_ops.io_delay() to empty function - paravirt_noop() to
    avoid vm-exits on IO delays because io delays they are not required.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-3-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:07 +01:00
Alexey Makhalov
687bca8d66 x86/vmware: Use tsc_khz value for calibrate_cpu()
Commit aa297292d7 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via
CPUID") separated the calibration mechanisms for cpu_khz and tsc_khz.

Since the vmware hypervisor provides a constant frequency TSC to the guest,
this change can lead to divergence between the tsc and the cpu frequency
after vmotion, which might confuse the user.

Solve this by overriding the x86 platform cpu calibration callback with the
vmware specific tsc calibration function.

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: linux-doc@vger.kernel.org
Cc: pv-drivers@vmware.com
Cc: corbet@lwn.net
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161028075432.90579-2-amakhalov@vmware.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 08:57:07 +01:00
Borislav Petkov
1c27f646b1 x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.

However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.

This makes this check fire:

	VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
			^^^^^^

due to the randomization offset.

The fix is as simple as using __pa_nodebug() because we do that
randomization offset accounting later in that function ourselves.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable@vger.kernel.org # 4.9
Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 10:29:59 +02:00
Fenghua Yu
c1c7c3f9d6 x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
Define struct rdt_resource to hold all the parameterized values for an RDT
resource and fill in the CPUID enumerated values from leaf 0x10 if
available. Hard code them for the MSR detected Haswells.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:39 +02:00
Fenghua Yu
113c60970c x86/intel_rdt: Add Haswell feature discovery
Some Haswell generation CPUs support RDT, but they don't enumerate this via
CPUID.  Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63
(INTEL_FAM6_HASWELL_X)

Move the relevant defines into a common header file which is shared between
RDT/CQM and RDT/Allocation to avoid duplication.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu
78e99b4a2b x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

Simple init() routine just checks which features are present. If they are
pr_info() one line summary for each feature for now.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu
4ab1586488 x86/cpufeature: Add RDT CPUID feature bits
Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3

[ tglx: Fixed up Borislavs comments and moved the feature bits into a gap ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: "Borislav Petkov" <bp@suse.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:38 +02:00
Fenghua Yu
d57e3ab7e3 x86/intel_cacheinfo: Enable cache id in cache info
Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

Also the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual"

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-26 23:12:37 +02:00
Borislav Petkov
14cfbe55c7 x86/microcode: Bump driver version, update copyrights
Let's increment that number finally: it is long overdue.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-13-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:59 +02:00
Borislav Petkov
06b8534cb7 x86/microcode: Rework microcode loading
Yeah, I know, I know, this is a huuge patch and reviewing it is hard.

Sorry but this is the only way I could think of in which I can rewrite
the microcode patches loading procedure without breaking (knowingly) the
driver.

So maybe this patch is easier to review if one looks at the files after
the patch has been applied instead at the diff. Because then it becomes
pretty obvious:

* The BSP-loading path - load_ucode_bsp() is working independently from
  the AP path now and it doesn't save any pointers or patches anymore -
  it solely parses the builtin or initrd microcode and applies the patch.
  That's it.

This fixes the CONFIG_RANDOMIZE_MEMORY offset fun more solidly.

* The AP-loading path - load_ucode_ap() then goes and scans
  builtin/initrd *again* for the microcode patches but it caches them this
  time so that we don't have to do that scan on each AP but only once.

This simplifies the code considerably.

Then, when we save the microcode from the initrd/builtin, we go and
add the relevant patches to our own cache. The AMD side did do that
and now the Intel side does it too. So no more pointer copying and
blabla, we save the microcode patches ourselves and are independent from
initrd/builtin.

This whole conversion gives us other benefits like unifying the
initrd parsing into a single function: find_microcode_in_initrd() is
used by both.

The diffstat speaks for itself: 456 insertions(+), 695 deletions(-)

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-12-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:59 +02:00
Borislav Petkov
8027923ab4 x86/microcode/intel: Remove intel_lib.c
Its functions are used in intel.c only now, so get rid of it. Make
functions static.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-11-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:59 +02:00
Borislav Petkov
76bd11c23a x86/microcode/amd: Move private inlines to .c and mark local functions static
Make them all static as they're used in a single file now.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:59 +02:00
Borislav Petkov
7f709d0c32 x86/microcode: Collect CPU info on resume
We need to reread the CPU's microcode revision after resume because
applied microcode gets "forgotten" depending on the sleep state.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-9-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
6b14b81899 x86/microcode: Issue the debug printk on resume only on success
Move it after the patch application function which also checks whether
we were successful.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
b3763a672d x86/microcode/amd: Hand down the CPU family
Will be needed in a following patch.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
058dc49803 x86/microcode: Export the microcode cache linked list
It will be used by both drivers so move it to core.c.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-6-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:58 +02:00
Borislav Petkov
f61337d984 x86/microcode/intel: Simplify generic_load_microcode()
Make it return the ucode_state directly instead of assigning to a state
variable and jumping to an out: label.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
5879a26752 x86/microcode: Move driver authors to CREDITS
They're not active anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
777284b66f x86/microcode: Run the AP-loading routine only on the application processors
cpu_init() is run also on the BSP (in addition to the APs):

 x86_64_start_kernel
 |-> x86_64_start_reservations
 |-> start_kernel
 |-> trap_init
 |-> cpu_init
 |-> load_ucode_ap
 ...

but we run the AP (Application Processors) microcode loading routine
there too even though we have a BSP-specific routine for that:
load_ucode_bsp().

Which is unnecessary. So let's limit the AP microcode loading routine to
the APs only.

Remove a useless comment while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 12:28:57 +02:00
Borislav Petkov
59c6f278bd x86/cpu: Get rid of the show_msr= boot option
It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too.
Also, it dumps only BSP MSRs as it gets called only for CPU 0.

And the MSR range array would need constant updating anyway, and so on
and so on...

Oh, and we have msr.ko and msr-tools which are the much better solution
anyway. So off it goes...

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 11:48:50 +02:00