There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.
Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.
This replaces an earlier patch that disabled the SBOX.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.
Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.
One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").
The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.
default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.
Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the static ftrace_ops (like function tracer) enables tracing, and it
is the only callback that is referencing a function, a trampoline is
dynamically allocated to the function that calls the callback directly
instead of calling a loop function that iterates over all the registered
ftrace ops (if more than one ops is registered).
But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.
But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
becaues they can not schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Previous versions of the espfix had a single function which did setup
the pagetables. It was later split into BSP and AP version. Drop unused
leftovers after that split.
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull two fixes for early microcode loader on 32-bit from Borislav Petkov:
- access the dis_ucode_ldr chicken bit properly
- fix patch stashing on AMD on 32-bit
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.
This way we actually do find it on the APs instead of having to go
through the initrd each time.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf4 ("x86, microcode, AMD: Fix early ucode loading")
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Borislav Petkov <bp@suse.de>
We should be accessing it through a pointer, like on the BSP.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 65cef1311d ("x86, microcode: Add a disable chicken bit")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Borislav Petkov <bp@suse.de>
Observing that per-CPU data (in the SMP case) is reachable by
exploiting 64-bit address wraparound (building on the default kernel
load address being at 16Mb), the one byte shorter RIP-relative
addressing form can be used for most per-CPU accesses. The one
exception are the "stable" reads, where the use of the "P" operand
modifier prevents the compiler from using RIP-relative addressing, but
is unavoidable due to the use of the "p" constraint (side note: with
gcc 4.9.x the intended effect of this isn't being achieved anymore,
see gcc bug 63637).
With the dependency on the minimum kernel load address, arbitrarily
low values for CONFIG_PHYSICAL_START are now no longer possible. A
link time assertion is being added, directing to the need to increase
that value when it triggers.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5458A1780200007800044A9D@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both this_cpu_off and cpu_info aren't getting modified post boot, yet
are being accessed on enough code paths that grouping them with other
frequently read items seems desirable. For cpu_info this at the same
time implies removing the cache line alignment (which afaict became
pointless when it got converted to per-CPU data years ago).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54589BD20200007800044A84@mail.emea.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can use get_cpu() and put_cpu() to replace
preempt_disable()/cpu = smp_processor_id() and
preempt_enable() for slightly better code.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.
Because the ramdisk is exactly there:
[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]
and we fault at 0x35e04304.
And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.
So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!
Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init
Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.14+
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are some AMD CPU models which have thresholding banks but which
cannot generate a thresholding interrupt. This is denoted by the bit
MCi_MISC[IntP]. Make sure to check that bit before assigning the
thresholding interrupt handler.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
[ Boris: save an indentation level and rewrite commit message. ]
Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.
Add support for this feature in x86_64.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
These patches:
86a349a28b ("perf/x86/intel: Add Broadwell core support")
c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128https://lkml.org/lkml/2014/10/27/325https://lkml.org/lkml/2014/8/27/546https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make use of 8259A Programmable Interrupt Controllers.
Specifically convert output like this:
CPU0
0: 76573 XT-PIC-XT-PIC timer
1: 11 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
4: 8 XT-PIC-XT-PIC serial
6: 3 XT-PIC-XT-PIC floppy
7: 0 XT-PIC-XT-PIC parport0
8: 1 XT-PIC-XT-PIC rtc0
10: 448 XT-PIC-XT-PIC fddi0
12: 23 XT-PIC-XT-PIC eth0
14: 2464 XT-PIC-XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
to one like this:
CPU0
0: 122033 XT-PIC timer
1: 11 XT-PIC i8042
2: 0 XT-PIC cascade
4: 8 XT-PIC serial
6: 3 XT-PIC floppy
7: 0 XT-PIC parport0
8: 1 XT-PIC rtc0
10: 145 XT-PIC fddi0
12: 31 XT-PIC eth0
14: 2245 XT-PIC ide0
NMI: 0 Non-maskable interrupts
ERR: 0
that is one like we used to have from ~2.2 till it was changed
sometime.
The rationale is there is no value in this duplicate
information, it merely clutters output and looks ugly. We only
have one handler for 8259A interrupts so there is no need to
give it a name separate from the name already given to
irq_chip.
We could define meaningful names for handlers based on bits in
the ELCR register on systems that have it or the value of the
LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell edge-triggered and level-triggered
inputs apart. While that information does not affect 8259A
interrupt handlers it could help people determine which lines
are shareable and which are not. That is material for a
separate change though.
Any tools that parse /proc/interrupts are supposed not to be
affected since it was many years we used the format this change
converts back to.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:
cma: dma_contiguous_reserve: reserving 31 MiB for global area
BUG: Int 6: CR2 (null)
[<41850786>] dump_stack+0x16/0x18
[<41d2b1db>] early_idt_handler+0x6b/0x6b
[<41072227>] ? __phys_addr+0x2e/0xca
[<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
[<41d6d359>] dma_contiguous_reserve_area+0x27/0x47
[<41d6d4d1>] dma_contiguous_reserve+0x158/0x163
[<41d33e0f>] setup_arch+0x79b/0xc68
[<41d2b7cf>] start_kernel+0x9c/0x456
[<41d2b2ca>] i386_start_kernel+0x79/0x7d
(See details at: https://lkml.org/lkml/2014/10/8/708)
It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().
This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ok, new attempt, this time around with full ppgtt disabled again.
drm-intel-next-2014-10-03:
- first batch of skl stage 1 enabling
- fixes from Rodrigo to the PSR, fbc and sink crc code
- kerneldoc for the frontbuffer tracking code, runtime pm code and the basic
interrupt enable/disable functions
- smaller stuff all over
drm-intel-next-2014-09-19:
- bunch more i830M fixes from Ville
- full ppgtt now again enabled by default
- more ppgtt fixes from Michel Thierry and Chris Wilson
- plane config work from Gustavo Padovan
- spinlock clarifications
- piles of smaller improvements all over, as usual
* tag 'drm-intel-next-2014-10-03-no-ppgtt' of git://anongit.freedesktop.org/drm-intel: (114 commits)
Revert "drm/i915: Enable full PPGTT on gen7"
drm/i915: Update DRIVER_DATE to 20141003
drm/i915: Remove the duplicated logic between the two shrink phases
drm/i915: kerneldoc for interrupt enable/disable functions
drm/i915: Use dev_priv instead of dev in irq setup functions
drm/i915: s/pm._irqs_disabled/pm.irqs_enabled/
drm/i915: Clear TX FIFO reset master override bits on chv
drm/i915: Make sure hardware uses the correct swing margin/deemph bits on chv
drm/i915: make sink_crc return -EIO on aux read/write failure
drm/i915: Constify send buffer for intel_dp_aux_ch
drm/i915: De-magic the PSR AUX message
drm/i915: Reinstate error level message for non-simulated gpu hangs
drm/i915: Kerneldoc for intel_runtime_pm.c
drm/i915: Call runtime_pm_disable directly
drm/i915: Move intel_display_set_init_power to intel_runtime_pm.c
drm/i915: Bikeshed rpm functions name a bit.
drm/i915: Extract intel_runtime_pm.c
drm/i915: Remove intel_modeset_suspend_hw
drm/i915: spelling fixes for frontbuffer tracking kerneldoc
drm/i915: Tighting frontbuffer tracking around flips
...
git commit b4f0d3755c was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686 with the expected
"system can't boot" results. As noted in:
https://bugs.freedesktop.org/show_bug.cgi?id=85277
This patch stops fscking with pt_regs. Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way. It then does just a tiny tiny touch of magic. We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy
as a pair of pushes.
After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about it's business.
Reported-by: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>