This function is a misnomer on two levels:
1) it doesn't really manipulate TS on modern CPUs anymore; its
primary purpose is to save FPU state, which is used:
- when executing fork()/clone(): to copy current FPU state
to the child's FPU state.
- when handling math exceptions: to generate the math error
si_code in the signal frame.
2) even on legacy CPUs it doesn't actually 'unlazy'; if anything,
it lazies the FPU state: as a side effect of the old FNSAVE
instruction, which clears (destroys) FPU state, it's necessary
to set CR0::TS.
So rename it to fpu__save() to better reflect its purpose.
Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So while testing kernels using tools/kvm/ (kvmtool) I noticed that it
booted super slow:
[ 0.142991] Performance Events: no PMU driver, software events only.
[ 0.149265] x86: Booting SMP configuration:
[ 0.149765] .... node #0, CPUs: #1
[ 0.148304] kvm-clock: cpu 1, msr 2:1bfe9041, secondary cpu clock
[ 10.158813] KVM setup async PF for cpu 1
[ 10.159000] #2
[ 10.159000] kvm-stealtime: cpu 1, msr 211a4d400
[ 10.158829] kvm-clock: cpu 2, msr 2:1bfe9081, secondary cpu clock
[ 20.167805] KVM setup async PF for cpu 2
[ 20.168000] #3
[ 20.168000] kvm-stealtime: cpu 2, msr 211a8d400
[ 20.167818] kvm-clock: cpu 3, msr 2:1bfe90c1, secondary cpu clock
[ 30.176902] KVM setup async PF for cpu 3
[ 30.177000] #4
[ 30.177000] kvm-stealtime: cpu 3, msr 211acd400
One CPU booted up per 10 seconds. With 120 CPUs that takes a while.
Bisection pinpointed this commit:
853b160aaa ("Revert f5d6a52f51 ("x86/smpboot: Skip delays during SMP initialization similar to Xen")")
But that commit just restores previous behavior, so it cannot cause the
problem. After some head scratching it turns out that these two commits:
1a744cb356 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors")
d68921f9bd ("x86/smp/boot: Add cmdline "cpu_init_udelay=N" to specify cpu_up() delay")
added the following code to smpboot.c:
- mdelay(10);
+ mdelay(init_udelay);
Note the mismatch in the units: the delay is called 'udelay' and is set
to microseconds - while the function used here is actually 'mdelay',
which counts in milliseconds ...
So the delay for legacy systems is off by a factor of 1,000, so instead
of 10 msecs we waited for 10 seconds ...
The reason bisection pointed to 853b160aaa was that 853b160aaa removed
a (broken) boot-time speedup patch, which masked the factor 1,000 bug.
Fix it by using udelay(). This fixes my bootup problems.
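The unit mismatch can be illustrated with a small user-space sketch.
The sim_*() helpers below are hypothetical stand-ins for the kernel's
mdelay()/udelay(): instead of busy-waiting they return how many
microseconds the call would wait. The value 10000 for init_udelay is
assumed here, to match the legacy 10 msec delay described above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for mdelay()/udelay(): they do not wait,
 * they only report the requested wait time in microseconds.
 */
static uint64_t sim_mdelay_usecs(unsigned int msecs)
{
	return (uint64_t)msecs * 1000;	/* argument is in milliseconds */
}

static uint64_t sim_udelay_usecs(unsigned int usecs)
{
	return usecs;			/* argument is in microseconds */
}

/* Assumed for illustration: the legacy 10 msec delay, in microseconds. */
static const unsigned int init_udelay = 10000;

/* Buggy shape: a microsecond value handed to a millisecond delay. */
static uint64_t buggy_wait_usecs(void)
{
	return sim_mdelay_usecs(init_udelay);
}

/* Fixed shape: a microsecond value handed to a microsecond delay. */
static uint64_t fixed_wait_usecs(void)
{
	return sim_udelay_usecs(init_udelay);
}
```

With these assumptions the buggy variant asks for 10,000,000 usecs
(10 seconds) per CPU, while the fixed variant asks for 10,000 usecs
(10 msecs).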
Cc: Len Brown <len.brown@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Derek noticed that a critical MCE gets reported with the wrong
error type description:
[Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
[Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
[Hardware Error]: TSC 49587b8e321cb
[Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
[Hardware Error]: Some CPUs didn't answer in synchronization
[Hardware Error]: Machine check: Invalid
^^^^^^^
The last line with 'Invalid' should have printed the high level
MCE error type description we get from mce_severity, i.e.
something like:
[Hardware Error]: Machine check: Action required: data load error in a user process
This happens because mce_no_way_out() iterates over
all MCA banks and possibly overwrites the @msg argument which is
used in the panic printing later.
Change the behavior to keep only the message of the (last)
critical MCE it detects.
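The overwrite pattern and the changed behavior can be sketched in user
space as follows. This is not the kernel code: struct bank_result, the
two severity levels and both function names are hypothetical, and the
severity lookup is reduced to a precomputed table.

```c
#include <stddef.h>

/* Hypothetical severity levels, reduced to two for illustration. */
enum sev { SEV_KEEP, SEV_PANIC };

/* Hypothetical per-bank result: a severity plus its description. */
struct bank_result {
	enum sev severity;
	const char *desc;
};

/*
 * Buggy shape: every bank overwrites *msg, so a later non-critical
 * bank clobbers the description of an earlier critical one.
 */
static int no_way_out_overwrite(const struct bank_result *banks,
				size_t n, const char **msg)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		*msg = banks[i].desc;
		if (banks[i].severity >= SEV_PANIC)
			ret = 1;
	}
	return ret;
}

/*
 * Changed shape: look at the description through a temporary and
 * store it in *msg only when the bank is critical.
 */
static int no_way_out_keep_critical(const struct bank_result *banks,
				    size_t n, const char **msg)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		const char *tmp = banks[i].desc;

		if (banks[i].severity >= SEV_PANIC) {
			*msg = tmp;
			ret = 1;
		}
	}
	return ret;
}
```

Both variants report that a critical error was seen; they differ only
in which description is left in *msg when a non-critical bank follows
a critical one.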
Reported-by: Derek <denc716@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
smp.c and irq_work.c implement the same inline helper. Move it to
apic.h and use it everywhere.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Fix the following oops:
hpet_msi_get_hwirq+0x1f/0x27
msi_domain_alloc+0x35/0xfe
? trace_hardirqs_on_caller+0x16c/0x188
irq_domain_alloc_irqs_recursive+0x51/0x95
__irq_domain_alloc_irqs+0x151/0x223
hpet_assign_irq+0x5d/0x68
hpet_msi_capability_lookup+0x121/0x1cb
? hpet_enable+0x2b4/0x2b4
hpet_late_init+0x5f/0xf2
? hpet_enable+0x2b4/0x2b4
do_one_initcall+0x184/0x199
kernel_init_freeable+0x1af/0x237
? rest_init+0x13a/0x13a
kernel_init+0xe/0xd4
ret_from_fork+0x3f/0x70
? rest_init+0x13a/0x13a
Since 3cb96f0c97 ('x86/hpet: Enhance HPET IRQ to support
hierarchical irqdomains') hpet_msi_capability_lookup() uses
hpet_assign_irq(). The latter initializes irq_alloc_info on stack, but
passes a NULL pointer to irq_domain_alloc_irqs(), which causes a NULL
pointer dereference later in hpet_msi_get_hwirq().
Pass the pointer to the irq_alloc_info to irq_domain_alloc_irqs().
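The shape of the bug can be sketched in user space like this. All
names below are hypothetical stand-ins (struct alloc_info for
irq_alloc_info, alloc_irqs() for irq_domain_alloc_irqs(), get_hwirq()
for hpet_msi_get_hwirq()), and the stand-in returns -1 at the point
where the kernel code would have dereferenced the NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for irq_alloc_info. */
struct alloc_info {
	int hpet_index;
};

/*
 * Stand-in for the domain callback. It returns -1 where the real
 * code would have dereferenced a NULL pointer.
 */
static int get_hwirq(const struct alloc_info *info)
{
	if (!info)
		return -1;
	return info->hpet_index;
}

/* Stand-in for the allocation call that forwards the opaque argument. */
static int alloc_irqs(void *arg)
{
	return get_hwirq(arg);
}

/* Buggy shape: info is initialized on the stack but never passed on. */
static int assign_irq_buggy(int idx)
{
	struct alloc_info info = { .hpet_index = idx };

	(void)info;
	return alloc_irqs(NULL);
}

/* Fixed shape: the initialized info is actually handed to the callee. */
static int assign_irq_fixed(int idx)
{
	struct alloc_info info = { .hpet_index = idx };

	return alloc_irqs(&info);
}
```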
Fixes: 3cb96f0c97 'x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains'
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Link: http://lkml.kernel.org/r/20150512041444.GA1094@swordfish
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull RAS updates from Borislav Petkov:
- RAS: Add support for deferred errors on AMD (Aravind Gopalakrishnan)
This is an important RAS feature which adds hardware support for
poisoned data. That means roughly that the hardware marks data which it
has detected as corrupted but wasn't able to correct, as poisoned data
and raises an APIC interrupt to signal that in the form of a deferred
error. It is the OS's responsibility then to take proper recovery action
and thus prolong system lifetime as far as possible.
- Misc cleanups on top. (Borislav Petkov)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over
* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
drm/i915/gtt: Allocate va range only if vma is not bound
drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
drm/i915: fix intel_prepare_ddi
drm/i915: factor out ddi_get_encoder_port
drm/i915/hdmi: check port in ibx_infoframe_enabled
drm/i915/hdmi: fix vlv infoframe port check
drm/i915: Silence compiler warning in dvo
drm/i915: Update DRIVER_DATE to 20150423
drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
drm/i915: Move ppgtt_bind/unbind around
drm/i915: move i915_gem_restore_gtt_mappings around
drm/i915: Fix up the vma aliasing ppgtt binding
drm/i915: Remove misleading comment around bind_to_vm
drm/i915: Don't use atomics for pg_dirty_rings
drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
drm/i915/skl: Support Y tiling in MMIO flips
drm/i915: Fixup kerneldoc for struct intel_context
...
Conflicts:
drivers/gpu/drm/i915/i915_drv.c
iTLB-load-misses and LLC-load-misses count incorrectly on SLM.
There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK
should be used to count iTLB-load-misses. This event counts when an
instruction (I) page walk is completed or started. Since a page walk
implies a TLB miss, the number of TLB misses can be counted by counting
the number of pagewalks.
DMND_DATA_RD counts both demand and DCU prefetch data reads. However,
LLC-load-misses should only count demand reads. There is no way to not
include prefetches with a single counter on SLM. So the LLC-load-misses
support should be removed on SLM.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is useless and git history has it all detailed anyway. Update
copyright while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Those were leftovers of the x86 merge, see
081f75bbdc ("traps: x86: make traps_32.c and traps_64.c equal")
for example and are not needed now.
Signed-off-by: Borislav Petkov <bp@suse.de>
If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:
WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
NO_HZ FULL will not work with unstable sched clock
CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: events flush_to_ldisc
ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
0000000000000000 0000000000000003 0000000000000001 0000000000000001
Call Trace:
<IRQ> [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
[<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
[<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
[<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
[<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
[<ffffffff810511c5>] irq_exit+0xc5/0x130
[<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
[<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
[<ffffffff8108bbc8>] __wake_up+0x48/0x60
[<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
[<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
[<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
[<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
[<ffffffff81064d05>] process_one_work+0x1d5/0x540
[<ffffffff81064c81>] ? process_one_work+0x151/0x540
[<ffffffff81065191>] worker_thread+0x121/0x470
[<ffffffff81065070>] ? process_one_work+0x540/0x540
[<ffffffff8106b4df>] kthread+0xef/0x110
[<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
[<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
[<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
---[ end trace 06e3507544a38866 ]---
However, it turns out that kvmclock does provide a stable
sched_clock callback. So let the scheduler know this, which
in turn makes NOHZ_FULL work in the guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>