Fix build error by generating elfcore.o only when ELF_CORE (depending on
COREDUMP) is selected:
arch/x86/um/built-in.o: In function `elf_core_write_extra_phdrs':
(.text+0x3e62): undefined reference to `dump_emit'
arch/x86/um/built-in.o: In function `elf_core_write_extra_data':
(.text+0x3eef): undefined reference to `dump_emit'
Fixes: 5d2acfc7b9 ("kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Add subarchitecture-independent implementation of asm-generic/syscall.h
allowing access to user system call parameters and results:
* syscall_get_nr()
* syscall_rollback()
* syscall_get_error()
* syscall_get_return_value()
* syscall_set_return_value()
* syscall_get_arguments()
* syscall_set_arguments()
* syscall_get_arch() provided by arch/x86/um/asm/syscall.h
This provides the necessary syscall helpers needed by
HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().
This is inspired from Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
Pull KVM fix from Paolo Bonzini:
"A simple fix. I'm sending it before the merge window, because it
refines a patch found in your master branch but not yet in the
kvm/next branch that is destined for 4.5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: only channel 0 of the i8254 is linked to the HPET
Pull x86 fixes from Ingo Molnar:
"A handful of x86 fixes:
- a syscall ABI fix, fixing an Android breakage
- a Xen PV guest fix relating to the RTC device, causing a
non-working console
- a Xen guest syscall stack frame fix
- an MCE hotplug CPU crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/numachip: Fix NumaConnect2 MMCFG PCI access
x86/entry: Restore traditional SYSENTER calling convention
x86/entry: Fix some comments
x86/paravirt: Prevent rtc_cmos platform device init on PV guests
x86/xen: Avoid fast syscall path for Xen PV guests
x86/mce: Ensure offline CPUs don't participate in rendezvous process
Consolidate updating the Hyper-V SynIC timers in a
single place: on guest entry in processing KVM_REQ_HV_STIMER
request. This simplifies the overall logic, and makes sure
the most current state of msrs and guest clock is used for
arming the timers (to achieve that, KVM_REQ_HV_STIMER
has to be processed after KVM_REQ_CLOCK_UPDATE).
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hypervisor Function Specification(HFS) doesn't require
to disable SynIC timer at timer config write if timer->count = 0.
So drop this check, this allow to load timers MSR's
during migration restore, because config are set before count
in QEMU side.
Also fix condition according to HFS doc(15.3.1):
"It is not permitted to set the SINTx field to zero for an
enabled timer. If attempted, the timer will be
marked disabled (that is, bit 0 cleared) immediately."
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h. Functions
that refer to architecture-specific requests are also moved
to arch/.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This makes sure the wall clock is updated only after an odd version value
is successfully written to guest memory.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While setting the KVM PIT counters in 'kvm_pit_load_count', if
'hpet_legacy_start' is set, the function disables the timer on
channel[0], instead of the respective index 'channel'. This is
because channels 1-3 are not linked to the HPET. Fix the caller
to only activate the special HPET processing for channel 0.
Reported-by: P J P <pjp@fedoraproject.org>
Fixes: 0185604c2d
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The comment had the meaning of mmu.gva_to_gpa and nested_mmu.gva_to_gpa
swapped. Fix that, and also add some details describing how each translation
works.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM/ARM changes for Linux v4.5
- Complete rewrite of the arm64 world switch in C, hopefully
paving the way for more sharing with the 32bit code, better
maintainability and easier integration of new features.
Also smaller and slightly faster in some cases...
- Support for 16bit VM identifiers
- Various cleanups
This is a long standing bug with the l1-dcache-stores generic event on
AMD machines. My perf_event testsuite has been complaining about this
for years and I'm finally getting around to trying to get it fixed.
The data_cache_refills:system event does not make sense for l1-dcache-stores.
Maybe this was a typo and it was meant to be for l1-dcache-store-misses?
In any case, the values returned are nowhere near correct for l1-dcache-stores
and in fact the umask values for the event have completely changed with
fam15h so it makes even less sense than ever. So just remove it.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1512091134350.24311@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes a bug in the filter_events() function.
The patch fixes the bug whereby if some mappings did not
exist, e.g., STALLED_CYCLES_FRONTEND, then any event after it
in the attrs array would disappear from the published list of
events in /sys/devices/cpu/events. This could be verified
easily on any system post SNB (which do not publish
STALLED_CYCLES_FRONTEND):
$ ./perf stat -e cycles,ref-cycles true
Performance counter stats for 'true':
1,217,348 cycles
<not supported> ref-cycles
The problem is that in filter_events() there is an assumption
that the argument (attrs) is organized in increasing continuous
event indexes related to the event_map(). But if we remove the
non-supported events by shifing the position in the array, then
the lookup x86_pmu.event_map() needs to compensate for it, otherwise
we are looking up the wrong index. This patch corrects this problem
by compensating for the deleted events and with that ref-cycles
reappears (here shown on Haswell):
$ perf stat -e ref-cycles,cycles true
Performance counter stats for 'true':
4,525,910 ref-cycles
1,064,920 cycles
0.002943888 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Fixes: 8300daa267 ("perf/x86: Filter out undefined events from sysfs events attribute")
Link: http://lkml.kernel.org/r/1449516805-6637-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as
base. The basic mechanism of abusing the inverse cmask to get all
cycles works the same as before.
PREC_DIST is available on Sandy Bridge or later. It had some problems
on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
on Broadwell and Skylake.
PREC_DIST has special support for avoiding shadow effects, which can
give better results compare to UOPS_RETIRED. The drawback is that
PREC_DIST can only schedule on counter 1, but that is ok for cycle
sampling, as there is normally no need to do multiple cycle sampling
runs in parallel. It is still possible to run perf top in parallel, as
that doesn't use precise mode. Also of course the multiplexing can
still allow parallel operation.
:pp stays with the previous event.
Example:
Sample a loop with 10 sqrt with old cycles:pp
0.14 │10: sqrtps %xmm1,%xmm0 <--------------
9.13 │ sqrtps %xmm1,%xmm0
11.58 │ sqrtps %xmm1,%xmm0
11.51 │ sqrtps %xmm1,%xmm0
6.27 │ sqrtps %xmm1,%xmm0
10.38 │ sqrtps %xmm1,%xmm0
12.20 │ sqrtps %xmm1,%xmm0
12.74 │ sqrtps %xmm1,%xmm0
5.40 │ sqrtps %xmm1,%xmm0
10.14 │ sqrtps %xmm1,%xmm0
10.51 │ ↑ jmp 10
We expect all 10 sqrt to get roughly the sample number of samples.
But you can see that the instruction directly after the JMP is
systematically underestimated in the result, due to sampling shadow
effects.
With the new PREC_DIST based sampling this problem is gone and all
instructions show up roughly evenly:
9.51 │10: sqrtps %xmm1,%xmm0
11.74 │ sqrtps %xmm1,%xmm0
11.84 │ sqrtps %xmm1,%xmm0
6.05 │ sqrtps %xmm1,%xmm0
10.46 │ sqrtps %xmm1,%xmm0
12.25 │ sqrtps %xmm1,%xmm0
12.18 │ sqrtps %xmm1,%xmm0
5.26 │ sqrtps %xmm1,%xmm0
10.13 │ sqrtps %xmm1,%xmm0
10.43 │ sqrtps %xmm1,%xmm0
0.16 │ ↑ jmp 10
Even with PREC_DIST there is still sampling skid and the result is not
completely even, but systematic shadow effects are significantly
reduced.
The improvements are mainly expected to make a difference in high IPC
code. With low IPC it should be similar.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for
cycles:pp. But the event is not documented for Skylake, and has some
issues.
The recommended replacement for cycles:pp is to use
INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy
Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The
event is not really new, but has been already used by perf before
Sandy Bridge for the original cycles:p
Note the SDM doesn't document that event either, but it's being
documented in the latest version of the event list on:
https://download.01.org/perfmon/SKL
This patch does:
- Remove UOPS_RETIRED.ALL from the Skylake PEBS event list
- Add INST_RETIRED.ANY to the Skylake PEBS event list, and an table entry to
allow cmask=16,inv=1 for cycles:pp
- We don't need an extra entry for the base INST_RETIRED event,
because it is already covered by the catch-all PEBS table entry.
- Switch Skylake to use the Core2 PEBS alias (which is
INST_RETIRED.TOTAL_CYCLES_PS)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) checks whether MEMBER
is last member of TYPE by evaluating:
offsetof(TYPE::MEMBER) + sizeof(TYPE::MEMBER) == sizeof(TYPE)
and ensuring TYPE::MEMBER is the last member of the TYPE.
This condition breaks on structs that are padded to be
aligned. This patch ensures the TYPE alignment is taken
into account.
This bug was revealed after adding cacheline alignment into
struct sched_entity, which broke task_struct::thread check:
CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450707930-3445-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using mremap() to shrink the map size of a VM_PFNMAP range causes
the following error message, and leaves the pfn range allocated.
x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff]
This is because rbt_memtype_erase(), called from free_memtype()
with spin_lock held, only supports to free a whole memtype node in
memtype_rbroot. Therefore, this patch changes rbt_memtype_erase()
to support a request that shrinks the size of a memtype node for
mremap().
memtype_rb_exact_match() is renamed to memtype_rb_match(), and
is enhanced to support EXACT_MATCH and END_MATCH in @match_type.
Since the memtype_rbroot tree allows overlapping ranges,
rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free
a whole node for the munmap case. If no such entry is found,
it then checks with END_MATCH, i.e. shrink the size of a node
from the end for the mremap case.
On the mremap case, rbt_memtype_erase() proceeds in two steps,
1) remove the node, and then 2) insert the updated node. This
allows proper update of augmented values, subtree_max_end, in
the tree.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: stsp@list.ru
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
WARN_ON_ONCE() message in untrack_pfn().
WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
Call Trace:
[<ffffffff817729ea>] dump_stack+0x45/0x57
[<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
[<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
[<ffffffff811d3725>] unmap_vmas+0x55/0xb0
[<ffffffff811d916c>] unmap_region+0xac/0x120
[<ffffffff811db86a>] do_munmap+0x28a/0x460
[<ffffffff811dec33>] move_vma+0x1b3/0x2e0
[<ffffffff811df113>] SyS_mremap+0x3b3/0x510
[<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71
MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is
called with the old vma after its pfnmap page table has been removed,
which causes follow_phys() to fail. The new vma has a new pfnmap to
the same pfn & cache type with VM_PAT set. Therefore, we only need to
clear VM_PAT from the old vma in this case.
Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
move_vma() is changed to call this function with the old vma when
VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn()
is a no-op since VM_PAT is cleared.
Reported-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is no need to worry about module and __init text disappearing
case, because that ftrace has a module notifier that is called when
a module is being unloaded and before the text goes away and this
code grabs the ftrace_lock mutex and removes the module functions
from the ftrace list, such that it will no longer do any
modifications to that module's text, the update to make functions
be traced or not is done under the ftrace_lock mutex as well.
And by now, __init section codes should not been modified
by ftrace, because it is black listed in recordmcount.c and
ignored by ftrace.
Link: http://lkml.kernel.org/r/1449367378-29430-6-git-send-email-huawei.libin@huawei.com
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
On a cancelled suspend the vcpu_info location does not change (it's
still in the per-cpu area registered by xen_vcpu_setup()). So do not
call xen_hvm_init_shared_info() which would make the kernel think its
back in the shared info. With the wrong vcpu_info, events cannot be
received and the domain will hang after a cancelled suspend.
Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>