As Frederic pointed idle_cpu() may return false even if async fault
happened in the idle task if wake up is pending. In this case the code
will try to put idle task to sleep. Fix this by using is_idle_task() to
check for idle task.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Hook into generic pvclock vsyscall code, with the aim to
allow userspace to have visibility into pvclock data.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Originally from Jeremy Fitzhardinge.
Introduce generic, non hypervisor specific, pvclock initialization
routines.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Originally from Jeremy Fitzhardinge.
So code can be reused.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Originally from Jeremy Fitzhardinge.
We can copy the information directly from "struct pvclock_vcpu_time_info",
remove pvclock_shadow_time.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Originally from Jeremy Fitzhardinge.
pvclock_get_time_values, which contains the memory barriers
will be removed by next patch.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.
For a linear mapping from userspace, it is necessary that
entire page sized regions are used for array of pvclock
structures.
There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There appear to have been some 486 clones, including the "enhanced"
version of Am486, which have CPUID but not CR4. These 486 clones had
only the FPU flag, if any, unlike the Intel 486s with CPUID, which
also had VME and therefore needed CR4.
Therefore, look at the basic CPUID flags and require at least one bit
other than bit 0 before we modify CR4.
Thanks to Christian Ludloff of sandpile.org for confirming this as a
problem.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Issues that need to be handled:
* Handle PIC interrupts on any CPU irrespective of the apic mode
* In the apic lowest priority logical flat delivery mode, be prepared to
handle the interrupt on any CPU irrespective of what the IO-APIC RTE says.
* Because of above, when the IO-APIC starts handling the legacy PIC interrupt,
use the same vector that is being used by the PIC while programming the
corresponding IO-APIC RTE.
Start with all the cpu's in the legacy PIC interrupts cfg->domain.
By the time IO-APIC starts taking over the PIC interrupts, apic driver
model is finalized. So depend on the assign_irq_vector() to update the
cfg->domain and retain the same vector that was used by PIC before.
For the logical apic flat mode, cfg->domain is updated (during the first
call to assign_irq_vector()) to contain all the possible online cpu's (0xff).
Vector used for the legacy PIC interrupt doesn't change when the IO-APIC
starts handling the interrupt. Any interrupt migration after that
doesn't change the cfg->domain or the vector used.
For other apic modes like physical mode, cfg->domain is updated
(during the first call to assign_irq_vector()) to the boot cpu (cpu-0),
with the same vector that is being used by the PIC. When that interrupt is
migrated to a different cpu, cfg->domin and the vector assigned will change
accordingly.
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1353970176.21070.51.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined.
User can online CPU0 back after boot time. The default value of the switch is
off.
To debug CPU0 hotplug, you need to enable CPU0 offline/online feature by either
turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or giving
cpu0_hotplug kernel parameter at boot.
It's safe and early place to take down CPU0 after all hotplug notifiers
are installed and SMP is booted.
Please note that some applications or drivers, e.g. some versions of udevd,
during boot time may put CPU0 online again in this CPU0 hotplug debug mode.
In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
when CPU0 is onlined back after boot time. This is because pending interrupt in
IRR can not move to ISR. The warning is not CPU0 specfic and it can happen on
other CPUs as well. It is harmless except the first CPU0 online takes a bit
longer time. And so this debug mode is useful to expose this issue. I'll send
a seperate patch to fix this generic warning issue.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
code which is not a desired behavior for waking up BSP. To avoid the boot-strap
code, wake up CPU0 by NMI instead.
This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
physically hot removed and then hot added), NMI won't wake it up. We'll change
this code in the future to wake up hard offlined CPU0 if real platform and
request are available.
AP is still waken up as before by INIT, SIPI, SIPI sequence.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The ACPI specificiation would like us to save NVS at hibernation time,
but makes no mention of saving NVS over S3. Not all versions of
Windows do this either, and it is clear that not all machines need NVS
saved/restored over S3. Allow the user to improve their suspend/resume
time by disabling the NVS save/restore at S3 time, but continue to do
the NVS save/restore for S4 as specified.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If CONFIG_BOOTPARAM_HOTPLUG_CPU is turned on, CPU0 hotplug feature is enabled
by default.
If CONFIG_BOOTPARAM_HOTPLUG_CPU is not turned on, CPU0 hotplug feature is not
enabled by default. The kernel parameter cpu0_hotplug can enable CPU0 hotplug
feature at boot.
Currently the feature is supported on Intel platforms only.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.
Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.
v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.
Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The patch is based on a patch submitted by Hans Rosenfeld.
See http://marc.info/?l=linux-kernel&m=133908777200931
Note that CPUID Fn8000_001D_EAX slightly differs to Intel's CPUID function 4.
Bits 14-25 contain NumSharingCache. Actual number of cores sharing
this cache. SW to add value of one to get result.
The corresponding bits on Intel are defined as "maximum number of threads
sharing this cache" (with a "plus 1" encoding).
Thus a different method to determine which cores are sharing a cache
level has to be used.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019090209.GG26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Introduce cpu_has_topoext to check for AMD's CPUID topology extensions
support. It indicates support for
CPUID Fn8000_001D_EAX_x[N:0]-CPUID Fn8000_001E_EDX
See AMD's CPUID Specification, Publication # 25481
(as of Rev. 2.34 September 2010)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/r/20121019085813.GD26718@alberich
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull MCE fix from Tony Luck:
"Fix problem in CMCI rediscovery code that was illegally
migrating worker threads to other cpus."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No functional changes.
Now that default arch_uprobe_enable/disable_step() helpers do nothing,
x86 has no reason to reimplement them. Change arch_uprobe_*_xol() hooks
to do the necessary work and remove the x86-specific hooks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
The nested NMI modifies the place (instruction, flags and stack)
that the first NMI will iret to. However, the copy of registers
modified is exactly the one that is the part of pt_regs in
the first NMI. This can change the behaviour of the first NMI.
In particular, Google's arch_trigger_all_cpu_backtrace handler
also prints regions of memory surrounding addresses appearing in
registers. This results in handled exceptions, after which nested NMIs
start coming in. These nested NMIs change the value of registers
in pt_regs. This can cause the original NMI handler to produce
incorrect output.
We solve this problem by interchanging the position of the preserved
copy of the iret registers ("saved") and the copy subject to being
trampled by nested NMI ("copied").
Link: http://lkml.kernel.org/r/20121002002919.27236.14388.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Salman Qazi <sqazi@google.com>
[ Added a needed CFI_ADJUST_CFA_OFFSET ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If the TSC deadline mode is supported, LAPIC timer one-shot mode can be
implemented using IA32_TSC_DEADLINE MSR. An interrupt will be generated
when the TSC value equals or exceeds the value in the IA32_TSC_DEADLINE
MSR.
This enables us to skip the APIC calibration during boot. Also, in
xapic mode, this enables us to skip the uncached apic access to re-arm
the APIC timer.
As this timer ticks at the high frequency TSC rate, we use the
TSC_DIVISOR (32) to work with the 32-bit restrictions in the
clockevent API's to avoid 64-bit divides etc (frequency is u32 and
"unsigned long" in the set_next_event(), max_delta limits the next
event to 32-bit for 32-bit kernel).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: venki@google.com
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1350941878.6017.31.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
For TXT boot, while Linux kernel trys to shutdown/S3/S4/reboot, it
need to jump back to tboot code and do TXT teardown work. Previously
kernel zapped all mem page identity mapping (va=pa) after booting, so
tboot code mem address was mapped again with identity mapping. Now
kernel didn't zap the identity mapping page table, so tboot related
code can remove the remapping code before trapping back now.
Signed-off-by: Xiaoyan Zhang <xiaoyan.zhang@intel.com>
Acked-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
On x86-64 syscall exit, 3 non exclusive events may happen
looping in the following order:
1) Check if we need resched for user preemption, if so call
schedule_user()
2) Check if we have pending signals, if so call do_notify_resume()
3) Check if we do syscall tracing, if so call syscall_trace_leave()
However syscall_trace_leave() has been written assuming it directly
follows the syscall and forget about the above possible 1st and 2nd
steps.
Now schedule_user() and do_notify_resume() exit in RCU user mode
because they have most chances to resume userspace immediately and
this avoids an rcu_user_enter() call in the syscall fast path.
So by the time we call syscall_trace_leave(), we may well be in RCU
user mode. To fix this up, simply call rcu_user_exit() in the beginning
of this function.
This fixes some reported RCU uses in extended quiescent state.
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull x86 fixes from Ingo Molnar:
"This fixes a couple of nasty page table initialization bugs which were
causing kdump regressions. A clean rearchitecturing of the code is in
the works - meanwhile these are reverts that restore the
best-known-working state of the kernel.
There's also EFI fixes and other small fixes."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mm: Undo incorrect revert in arch/x86/mm/init.c
x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
x86, mm: Find_early_table_space based on ranges that are actually being mapped
x86, mm: Use memblock memory loop instead of e820_RAM
x86, mm: Trim memory in memblock to be page aligned
x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
x86/efi: Fix oops caused by incorrect set_memory_uc() usage
x86-64: Fix page table accounting
Revert "x86/mm: Fix the size calculation of mapping tables"
MAINTAINERS: Add EFI git repository location
Pull x86 RAS changes from Borislav Petkov:
"Rework all config variables used throughout the MCA code and collect
them together into a mca_config struct. This keeps them tightly and
neatly packed together instead of spilled all over the place.
Then, convert those which are used as booleans into real booleans and
save some space."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three
bools which need conversion. Move them to the mca_config struct and
adjust usage sites accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Move them into the mca_config struct and adjust code touching them
accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>