Adam Buchbinder
6a6256f9e0
x86: Fix misspellings in comments
...
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:44:58 +01:00
Kees Cook
018ef8dcf3
x86/vdso: Mark the vDSO code read-only after init
...
The vDSO does not need to be writable after __init, so mark it as
__ro_after_init. This kills the exploit method of writing to the vDSO
from kernel space so that userspace then executes the modified code,
as shown here bypassing SMEP restrictions: http://itszn.com/blog/?p=21
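For reference, a minimal sketch of the annotation this change applies; the
variable and init function below are illustrative, not the actual vDSO image
definitions (__ro_after_init marks data that is write-protected once init
completes):

  #include <linux/cache.h>   /* __ro_after_init */
  #include <linux/init.h>

  /* Illustrative only: data placed in .data..ro_after_init is writable
   * during boot and becomes read-only before userspace starts. */
  static unsigned long example_vdso_base __ro_after_init;

  static int __init example_vdso_setup(void)
  {
          example_vdso_base = 0xffffffff81da1000UL; /* last legal write */
          return 0;
  }
  early_initcall(example_vdso_setup);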
The memory map (with added vDSO address reporting) shows the vDSO moving
into read-only memory:
Before:
[ 0.143067] vDSO @ ffffffff82004000
[ 0.143551] vDSO @ ffffffff82006000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e05000 20K ro GLB NX pte
0xffffffff81e05000-0xffffffff82000000 2028K ro NX pte
0xffffffff82000000-0xffffffff8214f000 1340K RW GLB NX pte
0xffffffff8214f000-0xffffffff82281000 1224K RW NX pte
0xffffffff82281000-0xffffffff82400000 1532K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd
After:
[ 0.145062] vDSO @ ffffffff81da1000
[ 0.146057] vDSO @ ffffffff81da4000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e0b000 44K ro GLB NX pte
0xffffffff81e0b000-0xffffffff82000000 2004K ro NX pte
0xffffffff82000000-0xffffffff8214c000 1328K RW GLB NX pte
0xffffffff8214c000-0xffffffff8227e000 1224K RW NX pte
0xffffffff8227e000-0xffffffff82400000 1544K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:51:39 +01:00
Borislav Petkov
8c72530699
x86/vdso: Use static_cpu_has()
...
... and simplify and speed up a tad.
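As a usage sketch (hedged: the helper names below are hypothetical stand-ins
for the real vDSO setup paths), static_cpu_has() compiles to a boot-time
patched branch instead of a runtime bit test on boot_cpu_data:

  #include <asm/cpufeature.h>

  /* Hypothetical stand-ins for the real setup paths. */
  static void setup_sysenter_vdso(void) { }
  static void setup_int80_vdso(void) { }

  static void choose_fast_syscall(void)
  {
          /* The branch is patched via alternatives (asm goto), so the
           * taken path costs one unconditional jump at runtime. */
          if (static_cpu_has(X86_FEATURE_SEP))
                  setup_sysenter_vdso();
          else
                  setup_int80_vdso();
  }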
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:23 +01:00
Borislav Petkov
cd4d09ec6f
x86/cpufeature: Carve out X86_FEATURE_*
...
Move them to a separate header and have the following
dependency:
x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h
This makes it easier to use the header in asm code, and avoids
having to include the whole of cpufeature.h and add guards for asm.
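A sketch of the resulting layering (file contents abridged and illustrative;
only the dependency direction comes from the message above):

  /* arch/x86/include/asm/cpufeatures.h: bare macros only, so .S
   * files can include it without pulling in C declarations. */
  #define X86_FEATURE_FPU         ( 0*32 +  0)
  #define X86_FEATURE_SEP         ( 0*32 + 11)

  /* arch/x86/include/asm/processor.h includes <asm/cpufeatures.h>. */

  /* arch/x86/include/asm/cpufeature.h: C-only helpers on top. */
  #ifndef __ASSEMBLY__
  #include <asm/processor.h>
  /* cpu_has()/static_cpu_has()-style helpers live here */
  #endif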
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:17 +01:00
Ingo Molnar
76b36fa896
Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new changes
...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 09:41:18 +01:00
Andrey Ryabinin
c6d308534a
UBSAN: run-time undefined behavior sanity checker
...
UBSAN uses compile-time instrumentation to catch undefined behavior
(UB). The compiler inserts code that performs certain kinds of checks
before operations that could cause UB. If a check fails (i.e. UB is
detected), a __ubsan_handle_* function is called to print an error
message.
So most of the work is done by the compiler. This patch just
implements the ubsan handlers that print the errors.
GCC has had this capability since 4.9.x [1] (see the
-fsanitize=undefined option and its suboptions).
However, GCC 5.x has more checkers implemented [2].
Article [3] has a bit more detail about UBSAN in GCC.
[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
[3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
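For illustration (not part of the original report), a minimal program that
UBSAN flags at run time; build with gcc -fsanitize=undefined demo.c:

  #include <stdio.h>

  int main(void)
  {
          volatile int shift = 33;  /* volatile defeats constant folding */
          unsigned int x = 1;

          /* UB: shift count >= width of the type. UBSAN prints
           * "shift exponent 33 is too large for 32-bit type". */
          unsigned int y = x << shift;

          printf("%u\n", y);
          return 0;
  }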
Issues which UBSAN has found thus far are:
Found bugs:
* out-of-bounds access - 97840cb67f ("netfilter: nfnetlink: fix
  insufficient validation in nfnetlink_bind")
undefined shifts:
* d48458d4a7 ("jbd2: use a better hash function for the revoke table")
* 10632008b9 ("clockevents: Prevent shift out of bounds")
* 'x << -1' shift in ext4 -
  http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com>
* undefined rol32(0) -
  http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com>
* undefined dirty_ratelimit calculation -
  http://lkml.kernel.org/r/<566594E2.3050306@odin.com>
* undefined rounddown_pow_of_two(0) -
  http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com>
* [WONTFIX] undefined shift in __bpf_prog_run -
  http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com>
  WONTFIX here because it should be fixed in the BPF program, not in the kernel.
signed overflows:
* 32a8df4e0b ("sched: Fix odd values in effective_load() calculations")
* mul overflow in ntp -
  http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com>
* incorrect conversion into rtc_time in rtc_time64_to_tm() -
  http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com>
* unvalidated timespec in io_getevents() -
  http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com>
* [NOTABUG] signed overflow in ktime_add_safe() -
  http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com>
[akpm@linux-foundation.org: fix unused local warning]
[akpm@linux-foundation.org: fix __int128 build woes]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yury Gribov <y.gribov@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Andy Lutomirski
78fd8c7288
x86/vdso/pvclock: Protect STABLE check with the seqcount
...
If the clock becomes unstable while we're reading it, we need to
bail. We can do this by simply moving the check into the
seqcount loop.
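A hedged sketch of the shape of the fix (field and flag names follow the
pvclock ABI; the surrounding reader and local declarations are abridged):

  /* Reading pvti->flags inside the version loop guarantees that the
   * stability bit belongs to the same snapshot as the timestamp. */
  do {
          version = pvti->version;
          smp_rmb();
          flags = pvti->flags;    /* now sampled under the version */
          tsc = rdtsc_ordered();
          delta = tsc - pvti->tsc_timestamp;
          smp_rmb();
  } while (unlikely(pvti->version != version));

  if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
          return 0;               /* bail: clock went unstable */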
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-13 11:46:29 +01:00
Andy Lutomirski
bd902c5362
x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
...
It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.
While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track
all the mappings), we can do almost as well by tracking which
vclocks have ever been used and only allowing pages associated
with used vclocks to be faulted in.
This will cause rogue programs that try to peek at the HPET to
get SIGBUS instead on most systems.
We can't restrict faults to vclock pages that are associated
with the currently selected vclock due to a race: a process
could start to access the HPET for the first time and race
against a switch away from the HPET as the current clocksource.
We can't segfault the process trying to peek at the HPET in this
case, even though the process isn't going to do anything useful
with the data.
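A hedged sketch of the gating in the vvar area's .fault handler
(vclock_was_used() reflects the ever-used tracking this change describes;
map_hpet_page() is a hypothetical helper):

  /* Refuse to materialize the HPET page unless the HPET vclock was
   * ever enabled, so a rogue reader takes SIGBUS instead of being
   * handed the timer MMIO page. */
  if (sym_offset == image->sym_hpet_page) {
          if (!vclock_was_used(VCLOCK_HPET))
                  return VM_FAULT_SIGBUS;
          return map_hpet_page(vma, vmf);  /* hypothetical helper */
  }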
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:35 +01:00
Andy Lutomirski
a48a704261
x86/vdso: Use ->fault() instead of remap_pfn_range() for the vvar mapping
...
This is IMO much less ugly, and it also opens the door to
disallowing unprivileged userspace HPET access on systems with
usable TSCs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c19c2909e5ee3c3d8742f916586676bb7c40345f.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:35 +01:00
Andy Lutomirski
05ef76b20f
x86/vdso: Use .fault for the vDSO text mapping
...
The old scheme for mapping the vDSO text is rather complicated.
vdso2c generates a struct vm_special_mapping and a blank .pages
array of the correct size for each vdso image. Init code in
vdso/vma.c populates the .pages array for each vDSO image, and
the mapping code selects the appropriate struct
vm_special_mapping.
With .fault, we can use a less roundabout approach: vdso_fault()
just returns the appropriate page for the selected vDSO image.
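A sketch close to what the description implies (exact signatures vary by
kernel version; context.vdso_image is the per-mm tracking added by the
mm-tracking patch in this series):

  static int vdso_fault(const struct vm_special_mapping *sm,
                        struct vm_area_struct *vma, struct vm_fault *vmf)
  {
          const struct vdso_image *image = vma->vm_mm->context.vdso_image;

          if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
                  return VM_FAULT_SIGBUS;

          /* Hand back the page backing the faulting offset. */
          vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
          get_page(vmf->page);
          return 0;
  }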
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
Andy Lutomirski
352b78c62f
x86/vdso: Track each mm's loaded vDSO image as well as its base
...
As we start to do more intelligent things with the vDSO at
runtime (as opposed to just at mm initialization time), we'll
need to know which vDSO is in use.
In principle, we could guess based on the mm type, but that's
over-complicated and error-prone. Instead, just track it in the
mmu context.
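A sketch of the tracking (hedged: field placement and the surrounding
struct are illustrative):

  /* x86 mmu context: alongside the existing vDSO base address, record
   * which image is mapped so later code can consult it. */
  typedef struct {
          /* ... existing fields elided ... */
          void *vdso;                           /* vDSO base address */
          const struct vdso_image *vdso_image;  /* image in use */
  } mm_context_t;

  /* At map time (illustrative):
   *     current->mm->context.vdso_image = &vdso_image_64;
   */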
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
Linus Torvalds
88cbfd0711
Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- vDSO and asm entry improvements (Andy Lutomirski)
- Xen paravirt entry enhancements (Boris Ostrovsky)
- asm entry labels enhancement (Borislav Petkov)
- and other misc changes (Thomas Gleixner, me)"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks"
x86/entry/64_compat: Make labels local
x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
x86/vdso: Enable vdso pvclock access on all vdso variants
x86/vdso: Remove pvclock fixmap machinery
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
x86/asm: Add asm macros for static keys/jump labels
x86/asm: Error out if asm/jump_label.h is included inappropriately
context_tracking: Switch to new static_branch API
x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
x86/paravirt: Remove the unused irq_enable_sysexit pv op
x86/xen: Avoid fast syscall path for Xen PV guests
2016-01-11 15:58:16 -08:00
Andy Lutomirski
30bfa7b348
x86/entry: Restore traditional SYSENTER calling convention
...
It turns out that some Android versions hardcode the SYSENTER
calling convention. This is buggy and will cause problems no
matter what the kernel does. Nonetheless, we should try to
support it.
Credit goes to Linus for pointing out a clean way to handle
the SYSENTER/SYSCALL clobber differences while preserving
straightforward DWARF annotations.
I believe that the original offending Android commit was:
https://android.googlesource.com/platform%2Fbionic/+/7dc3684d7a2587e43e6d2a8e0e3f39bf759bd535
Reported-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-and-tested-by: Borislav Petkov <bp@alien8.de>
Cc: <mark.gross@intel.com>
Cc: Su Tao <tao.su@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: <frank.wang@intel.com>
Cc: <borun.fu@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Mingwei Shi <mingwei.shi@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-21 16:05:01 +01:00
Andy Lutomirski
6a613ac6bc
x86/entry: Fix some comments
...
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-and-tested-by: Borislav Petkov <bp@alien8.de>
Cc: <mark.gross@intel.com>
Cc: Su Tao <tao.su@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: <qiuxu.zhuo@intel.com>
Cc: <frank.wang@intel.com>
Cc: <borun.fu@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Mingwei Shi <mingwei.shi@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-21 16:05:01 +01:00
Andy Lutomirski
76480a6a55
x86/vdso: Enable vdso pvclock access on all vdso variants
...
Now that pvclock doesn't require access to the fixmap, all vdso
variants can use it.
The kernel side isn't wired up for 32-bit kernels yet, but this
covers 32-bit and x32 userspace on 64-bit kernels.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/a7ef693b7a4c88dd2173dc1d4bf6bc27023626eb.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
cc1e24fdb0
x86/vdso: Remove pvclock fixmap machinery
...
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
dac16fba6f
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
...
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
6b078f5de7
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
...
The pvclock vdso code was too abstracted to understand easily
and excessively paranoid. Simplify it for a huge speedup.
This opens the door for additional simplifications, as the vdso
no longer accesses the pvti for any vcpu other than vcpu 0.
Before, vclock_gettime using kvm-clock took about 45ns on my
machine. With this change, it takes 29ns, which is almost as
fast as the pure TSC implementation.
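The heart of the reader is the pvclock scaling step; a sketch of the
arithmetic, assuming the standard pvclock parameters (the in-tree code uses
a non-truncating multiply helper such as mul_u64_u32_shr()):

  #include <linux/types.h>

  /* ns = (delta << shift) * mul_frac / 2^32, per the pvclock ABI;
   * shift may be negative. __int128 keeps the 64x32-bit product
   * from truncating. */
  static inline u64 example_scale_delta(u64 delta, u32 mul_frac, int shift)
  {
          if (shift < 0)
                  delta >>= -shift;
          else
                  delta <<= shift;
          return (u64)(((unsigned __int128)delta * mul_frac) >> 32);
  }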
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6b51dcc41f1b101f963945c5ec7093d72bdac429.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:02 +01:00
Linus Torvalds
639ab3eb38
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
...
Pull x86 mm changes from Ingo Molnar:
"The main changes are: continued PAT work by Toshi Kani, plus a new
boot time warning about insecure RWX kernel mappings, by Stephen
Smalley.
The new CONFIG_DEBUG_WX=y warning is marked default-y if
CONFIG_DEBUG_RODATA=y is already enabled, as a special exception, as
these bugs are hard to notice and this check already found several
live bugs"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Warn on W^X mappings
x86/mm: Fix no-change case in try_preserve_large_page()
x86/mm: Fix __split_large_page() to handle large PAT bit
x86/mm: Fix try_preserve_large_page() to handle large PAT bit
x86/mm: Fix gup_huge_p?d() to handle large PAT bit
x86/mm: Fix slow_virt_to_phys() to handle large PAT bit
x86/mm: Fix page table dump to show PAT bit
x86/asm: Add pud_pgprot() and pmd_pgprot()
x86/asm: Fix pud/pmd interfaces to handle large PAT bit
x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
x86/asm: Move PUD_PAGE macros to page_types.h
x86/vdso32: Define PGTABLE_LEVELS for the 32-bit VDSO
2015-11-03 21:23:56 -08:00
Andy Lutomirski
5f310f739b
x86/entry/32: Re-implement SYSENTER using the new C path
...
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/5b99659e8be70f3dd10cd8970a5c90293d9ad9a7.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:10 +02:00
Andy Lutomirski
a474e67c91
x86/vdso/compat: Wire up SYSENTER and SYSCALL for compat userspace
...
What, you didn't realize that SYSENTER and SYSCALL were actually
the same thing? :)
Unlike the old code, this actually passes the ptrace_syscall_32
test on AMD systems.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/b74615af58d785aa02d917213ec64e2022a2c796.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:09 +02:00
Andy Lutomirski
8242c6c84a
x86/vdso/32: Save extra registers in the INT80 vsyscall path
...
The goal is to integrate the SYSENTER and SYSCALL32 entry paths
with the INT80 path. SYSENTER clobbers ESP and EIP. SYSCALL32
clobbers ECX (and, invisibly, R11). SYSRETL (long mode to
compat mode) clobbers ECX and, invisibly, R11. SYSEXIT (which
we only need for native 32-bit) clobbers ECX and EDX.
This means that we'll need to provide ESP to the kernel in a
register (I chose ECX, since it's only needed for SYSENTER) and
we need to provide the args that normally live in ECX and EDX in
memory.
The epilogue needs to restore ECX and EDX, since user code
relies on regs being preserved.
We don't need to do anything special about EIP, since the kernel
already knows where we are. The kernel will eventually need to
know where int $0x80 lands, so add a vdso_image entry for it.
The only user-visible effect of this code is that ptrace-induced
changes to ECX and EDX during fast syscalls will be lost. This
is already the case for the SYSENTER path.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/b860925adbee2d2627a0671fbfe23a7fd04127f8.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
29c0ce9508
x86/vdso: Replace hex int80 CFI annotations with GAS directives
...
Maintaining the current CFI annotations written in R'lyehian is
difficult for most of us. Translate them to something a little
closer to English.
This will remove the CFI data for kernels built with extremely
old versions of binutils. I think this is a fair tradeoff for
the ability for mortals to edit the asm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/ae3ff4ff5278b4bfc1e1dab368823469866d4b71.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
f24f910884
x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
...
For the vDSO, user code wants runtime unwind info. Make sure
that, if we use .cfi directives, we generate it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/16e29ad8855e6508197000d8c41f56adb00d7580.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:05 +02:00
Andy Lutomirski
0a6d1fa0d2
x86/vdso: Remove runtime 32-bit vDSO selection
...
32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO. Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:08 +02:00
Toshi Kani
fb535ccb30
x86/vdso32: Define PGTABLE_LEVELS for the 32-bit VDSO
...
In the case of CONFIG_X86_64, vdso32/vclock_gettime.c fakes a 32-bit
non-PAE kernel configuration by re-defining it to CONFIG_X86_32.
However, it does not re-define CONFIG_PGTABLE_LEVELS, leaving it at 4
levels.
This mismatch leads <asm/pgtable_types.h> to NOT include
<asm-generic/pgtable-nopud.h> and <asm-generic/pgtable-nopmd.h>, which
will cause compile errors when a later patch enhances
<asm/pgtable_types.h> to use PUD_SHIFT and PMD_SHIFT. These -nopud &
-nopmd headers define those SHIFTs for the 32-bit non-PAE kernel.
Fix it by re-defining CONFIG_PGTABLE_LEVELS to 2 levels.
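The shape of the fix, as the message describes it (hedged: the exact
surrounding #undef block in vdso32/vclock_gettime.c is abridged):

  /* vdso32/vclock_gettime.c builds the 64-bit sources as if they were
   * a 32-bit non-PAE kernel, so the paging depth must be faked too. */
  #ifdef CONFIG_X86_64
  #undef CONFIG_X86_64
  #undef CONFIG_PGTABLE_LEVELS
  #define CONFIG_X86_32
  #define CONFIG_PGTABLE_LEVELS 2
  #endif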
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliot <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 21:27:32 +02:00
Andy Lutomirski
6b7e26547f
x86/vdso: Emit a GNU hash
...
Some dynamic loaders may be slightly faster if a GNU hash is
available. Strangely, this seems to have no effect at all on
the vdso size.
This is unlikely to have any measurable effect on the time it
takes to resolve vdso symbols (since there are so few of them).
In some contexts, it can be a win for a different reason: if
every DSO has a GNU hash section, then libc can avoid
calculating SysV hashes at all. Both musl and glibc appear to
have this optimization.
It's plausible that this breaks some ancient glibc version. If
so, then, depending on what glibc versions break, we could
either require COMPAT_VDSO for them or consider reverting.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Isaac Dunham <ibid.ag@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nathan Lynch <nathan_lynch@mentor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: musl@lists.openwall.com
Link: http://lkml.kernel.org/r/fd56cc057a2d62ab31c56a48d04fccb435b3fd4f.1438897382.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-08 10:42:07 +02:00
Brian Gerst
ab8b82ee6d
x86/compat: Don't build the 32-bit VDSO if not needed
...
Build the 32-bit vdso only when native 32-bit or 32-bit compat is
enabled. x32 should not force it to build.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1434974121-32575-7-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:28:56 +02:00
Andy Lutomirski
03b9730b76
x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites
...
rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires
more thought than should be necessary. Add an rdtsc_ordered()
helper and replace the trivial call sites with it.
This should not change generated code. The duplication of the
fence asm is temporary.
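A hedged sketch of the helper (the actual patch open-codes the fence rather
than calling rdtsc_barrier(), per the note above about temporarily
duplicated fence asm):

  /* Order the TSC read against earlier instructions so it cannot be
   * speculated ahead of them. */
  static __always_inline unsigned long long rdtsc_ordered(void)
  {
          rdtsc_barrier();
          return rdtsc();
  }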
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:29 +02:00
Andy Lutomirski
4ea1636b04
x86/asm/tsc: Rename native_read_tsc() to rdtsc()
...
Now that there is no paravirt TSC, the "native" is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:28 +02:00
Andy Lutomirski
c6e5ca35c4
x86/asm/tsc: Inline native_read_tsc() and remove __native_read_tsc()
...
In the following commit:
cdc7957d19 ("x86: move native_read_tsc() offline")
... native_read_tsc() was moved out of line, presumably for some
now-obsolete vDSO-related reason. Undo it.
The entire rdtsc, shl, or sequence is only 11 bytes, and calls
via rdtscl() and similar helpers were already inlined.
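A hedged sketch of the inlined reader (RDTSC returns the counter in
EDX:EAX; reassembling it in C yields exactly the short rdtsc/shl/or
sequence mentioned above):

  static __always_inline unsigned long long native_read_tsc(void)
  {
          unsigned int low, high;

          asm volatile("rdtsc" : "=a" (low), "=d" (high));
          return low | ((unsigned long long)high << 32);
  }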
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d05ffe2aaf8468ca475ebc00efad7b2fa174af19.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06 15:23:25 +02:00
Ingo Molnar
d603c8e184
x86/asm/entry, x86/vdso: Move the vDSO code to arch/x86/entry/vdso/
...
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03 18:51:37 +02:00