Commit Graph

8502 Commits

Author SHA1 Message Date
Tom Lendacky
7744ccdbc1 x86/mm: Add Secure Memory Encryption (SME) support
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/a6c34d16caaed3bc3e2d6f0987554275bd291554.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:37:59 +02:00
Tom Lendacky
872cbefd2d x86/cpu/AMD: Add the Secure Memory Encryption CPU feature
Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature.  SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG).  Only show the SME feature as available if reported by
CPUID, enabled by BIOS and not configured as CONFIG_X86_32=y.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/85c17ff450721abccddc95e611ae8df3f4d9718b.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:37:59 +02:00
Tom Lendacky
f7750a7956 x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings
The ioremap() function is intended for mapping MMIO. For RAM, the
memremap() function should be used. Convert calls from ioremap() to
memremap() when re-mapping RAM.

This will be used later by SME to control how the encryption mask is
applied to memory mappings, with certain memory locations being mapped
decrypted vs encrypted.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b13fccb9abbd547a7eef7b1fdfc223431b211c88.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:37:58 +02:00
Ingo Molnar
1ed7d32763 Merge branch 'x86/boot' into x86/mm, to pick up interacting changes
The SME patches we are about to apply add some E820 logic, so merge in
pending E820 code changes first, to have a single code base.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:36:53 +02:00
Josh Poimboeuf
76846bf3cb x86/asm: Add unwind hint annotations to sync_core()
This enables objtool to grok the iret in the middle of a C function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/b057be26193c11d2ed3337b2107bc7adcba42c99.1499786555.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 10:57:44 +02:00
Josh Poimboeuf
39358a033b objtool, x86: Add facility for asm code to provide unwind hints
Some asm (and inline asm) code does special things to the stack which
objtool can't understand.  (Nor can GCC or GNU assembler, for that
matter.)  In such cases we need a facility for the code to provide
annotations, so the unwinder can unwind through it.

This provides such a facility, in the form of unwind hints.  They're
similar to the GNU assembler .cfi* directives, but they give more
information, and are needed in far fewer places, because objtool can
fill in the blanks by following branches and adjusting the stack pointer
for pushes and pops.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0f5f3c9104fca559ff4088bece1d14ae3bca52d5.1499786555.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 10:57:44 +02:00
Roman Kagan
4c07f9046e x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT
A recent commit:

  d6e41f1151 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")

introduced a VM_WARN_ON(!in_atomic()) which generates false positives
on every VM entry on !CONFIG_PREEMPT_COUNT kernels.

Replace it with a test for preemptible(), which appears to match the
original intent and works across different CONFIG_PREEMPT* variations.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: linux-mm@kvack.org
Fixes: d6e41f1151 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 10:49:18 +02:00
Vitaly Kuznetsov
dd018597a0 x86/hyper-v: stash the max number of virtual/logical processor
Max virtual processor will be needed for 'extended' hypercalls supporting
more than 64 vCPUs. While on it, unify on 'Hyper-V' in mshyperv.c as we
currently have a mix, report acquired misc features as well.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17 17:20:28 +02:00
Vitaly Kuznetsov
2e252fbf77 x86/hyper-v: include hyperv/ only when CONFIG_HYPERV is set
Code is arch/x86/hyperv/ is only needed when CONFIG_HYPERV is set, the
'basic' support and detection lives in arch/x86/kernel/cpu/mshyperv.c
which is included when CONFIG_HYPERVISOR_GUEST is set.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17 17:19:42 +02:00
Linus Torvalds
89cbec71fe Merge branch 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uacess-unaligned removal from Al Viro:
 "That stuff had just one user, and an exotic one, at that - binfmt_flat
  on arm and m68k"

* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill {__,}{get,put}_user_unaligned()
  binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
2017-07-15 11:17:52 -07:00
Linus Torvalds
e37a07e0c2 Merge tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
 "Second batch of KVM updates for v4.13

  Common:
   - add uevents for VM creation/destruction
   - annotate and properly access RCU-protected objects

  s390:
   - rename IOCTL added in the first v4.13 merge

  x86:
   - emulate VMLOAD VMSAVE feature in SVM
   - support paravirtual asynchronous page fault while nested
   - add Hyper-V userspace interfaces for better migration
   - improve master clock corner cases
   - extend internal error reporting after EPT misconfig
   - correct single-stepping of emulated instructions in SVM
   - handle MCE during VM entry
   - fix nVMX VM entry checks and nVMX VMCS shadowing"

* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: x86: hyperv: make VP_INDEX managed by userspace
  KVM: async_pf: Let guest support delivery of async_pf from guest mode
  KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
  kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
  KVM: x86: make backwards_tsc_observed a per-VM variable
  KVM: trigger uevents when creating or destroying a VM
  KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
  KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
  KVM: SVM: Rename lbr_ctl field in the vmcb control area
  KVM: SVM: Prepare for new bit definition in lbr_ctl
  KVM: SVM: handle singlestep exception when skipping emulated instructions
  KVM: x86: take slots_lock in kvm_free_pit
  KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
  kvm: vmx: Properly handle machine check during VM-entry
  KVM: x86: update master clock before computing kvmclock_offset
  kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
  kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
  kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
  ...
2017-07-15 10:18:16 -07:00
Roman Kagan
d3457c877b kvm: x86: hyperv: make VP_INDEX managed by userspace
Hyper-V identifies vCPUs by Virtual Processor Index, which can be
queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
sequential number which can't exceed the maximum number of vCPUs per VM.
APIC ids can be sparse and thus aren't a valid replacement for VP
indices.

Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
it predictable and persistent across VM migrations, the userspace has to
control the value of VP_INDEX.

This patch achieves that, by storing vp_index explicitly on vcpu, and
allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
compatibility it's initialized to KVM vcpu index.  Also a few variables
are renamed to make clear distinction betweed this Hyper-V vp_index and
KVM vcpu_id (== APIC id).  Besides, a new capability,
KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
attempting msr writes where unsupported, to avoid spamming error logs.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-14 16:28:18 +02:00
Wanpeng Li
52a5c155cf KVM: async_pf: Let guest support delivery of async_pf from guest mode
Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-14 14:26:16 +02:00
Wanpeng Li
adfe20fb48 KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
Add an nested_apf field to vcpu->arch.exception to identify an async page
fault, and constructs the expected vm-exit information fields. Force a
nested VM exit from nested_vmx_check_exception() if the injected #PF is
async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-14 14:26:16 +02:00
Wanpeng Li
1261bfa326 KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
This patch adds the L1 guest async page fault #PF vmexit handler, such
by L1 similar to ordinary async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Passed insn parameters to kvm_mmu_page_fault().]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-14 14:25:24 +02:00
Wanpeng Li
cfcd20e5ca KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
This patch removes all arguments except the first in
kvm_x86_ops->queue_exception since they can extract the arguments from
vcpu->arch.exception themselves.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-14 14:24:28 +02:00
Roman Kagan
efc479e690 kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
There is a flaw in the Hyper-V SynIC implementation in KVM: when message
page or event flags page is enabled by setting the corresponding msr,
KVM zeroes it out.  This is problematic because on migration the
corresponding MSRs are loaded on the destination, so the content of
those pages is lost.

This went unnoticed so far because the only user of those pages was
in-KVM hyperv synic timers, which could continue working despite that
zeroing.

Newer QEMU uses those pages for Hyper-V VMBus implementation, and
zeroing them breaks the migration.

Besides, in newer QEMU the content of those pages is fully managed by
QEMU, so zeroing them is undesirable even when writing the MSRs from the
guest side.

To support this new scheme, introduce a new capability,
KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
pages aren't zeroed out in KVM.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-13 17:41:04 +02:00
Ladi Prosek
a826faf108 KVM: x86: make backwards_tsc_observed a per-VM variable
The backwards_tsc_observed global introduced in commit 16a9602 is never
reset to false. If a VM happens to be running while the host is suspended
(a common source of the TSC jumping backwards), master clock will never
be enabled again for any VM. In contrast, if no VM is running while the
host is suspended, master clock is unaffected. This is inconsistent and
unnecessarily strict. Let's track the backwards_tsc_observed variable
separately and let each VM start with a clean slate.

Real world impact: My Windows VMs get slower after my laptop undergoes a
suspend/resume cycle. The only way to get the perf back is unloading and
reloading the kvm module.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-13 17:25:39 +02:00
Joe Perches
0825f49f22 x86/efi: move asmlinkage before return type
Make the code like the rest of the kernel.

Link: http://lkml.kernel.org/r/1cd3d401626e51ea0e2333a860e76e80bc560a4c.1499284835.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:04 -07:00
Rik van Riel
bf9eb54438 x86: ascii armor the x86_64 boot init stack canary
Use the ascii-armor canary to prevent unterminated C string overflows
from being able to successfully overwrite the canary, even if they
somehow obtain the canary value.

Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
tree.

Link: http://lkml.kernel.org/r/20170524155751.424-4-riel@redhat.com
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Daniel Micay
6974f0c455 include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time.  Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation.  They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks.  Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code.  There will likely be issues caught in
regular use at runtime too.

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat), don't yet
  place a limit on reads from the source based on __builtin_object_size of
  the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
  some functions (C strings) to detect intra-object overflows (like
  glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
  approach to avoid likely compatibility issues.

* The compile-time checks should be made available via a separate config
  option which can be enabled by default (or always enabled) once enough
  time has passed to get the issues it catches fixed.

Kees said:
 "This is great to have. While it was out-of-tree code, it would have
  blocked at least CVE-2016-3858 from being exploitable (improper size
  argument to strlcpy()). I've sent a number of fixes for
  out-of-bounds-reads that this detected upstream already"

[arnd@arndb.de: x86: fix fortified memcpy]
  Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
  Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Janakarajan Natarajan
89c8a4984f KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
Enable the Virtual VMLOAD VMSAVE feature. This is done by setting bit 1
at position B8h in the vmcb.

The processor must have nested paging enabled, be in 64-bit mode and
have support for the Virtual VMLOAD VMSAVE feature for the bit to be set
in the vmcb.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:30 +02:00
Janakarajan Natarajan
76ff359249 KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
Define a new cpufeature definition for Virtual VMLOAD VMSAVE.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:29 +02:00
Janakarajan Natarajan
0dc92119b5 KVM: SVM: Rename lbr_ctl field in the vmcb control area
Rename the lbr_ctl variable to better reflect the purpose of the field -
provide support for virtualization extensions.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:29 +02:00
Janakarajan Natarajan
8a77e90966 KVM: SVM: Prepare for new bit definition in lbr_ctl
The lbr_ctl variable in the vmcb control area is used to enable or
disable Last Branch Record (LBR) virtualization. However, this is to be
done using only bit 0 of the variable. To correct this and to prepare
for a new feature, change the current usage to work only on a particular
bit.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12 22:38:28 +02:00
Kees Cook
eab09532d4 binfmt_elf: use ELF_ET_DYN_BASE only for PIE
The ELF_ET_DYN_BASE position was originally intended to keep loaders
away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
/bin/cat" might cause the subsequent load of /bin/cat into where the
loader had been loaded.)

With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
ELF_ET_DYN_BASE continued to be used since the kernel was only looking
at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
top 1/3rd of the TASK_SIZE, a substantial portion of the address space
is unused.

For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
loaded above the mmap region.  This means they can be made to collide
(CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
pathological stack regions.

Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
region in all cases, and will now additionally avoid programs falling
back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
if it would have collided with the stack, now it will fail to load
instead of falling back to the mmap region).

To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
are loaded into the mmap region, leaving space available for either an
ET_EXEC binary with a fixed location or PIE being loaded into mmap by
the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
which means architectures can now safely lower their values without risk
of loaders colliding with their subsequently loaded programs.

For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
the entire 32-bit address space for 32-bit pointers.

Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
suggestions on how to implement this solution.

Fixes: d1fd836dcf ("mm: split ET_DYN ASLR from mmap ASLR")
Link: http://lkml.kernel.org/r/20170621173201.GA114489@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Qualys Security Advisory <qsa@qualys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:36 -07:00
Linus Torvalds
2b97620341 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "The x86 updates contain:

   - A fix for a longstanding PAT bug, where PAT was reported on CPUs
     that do not support it, which leads to wrong caching attributes and
     missing MTRR updates

   - Prevent overwriting of the e820 firmware table, which causes kexec
     kernels to lose the fake mptable which is stored there.

   - Cleanup of the UV/BAU code, removing unused code and making local
     functions static"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table
  x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec
  x86/boot/e820: Avoid overwriting e820_table_firmware
  x86/mm/pat: Don't report PAT on CPUs that don't support it
  x86/platform/uv/BAU: Minor cleanup, make some local functions static
2017-07-09 11:21:31 -07:00
Linus Torvalds
f263fbb8d6 Merge tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:

  - add sysfs max_link_speed/width, current_link_speed/width (Wong Vee
    Khee)

  - make host bridge IRQ mapping much more generic (Matthew Minter,
    Lorenzo Pieralisi)

  - convert most drivers to pci_scan_root_bus_bridge() (Lorenzo
    Pieralisi)

  - mutex sriov_configure() (Jakub Kicinski)

  - mutex pci_error_handlers callbacks (Christoph Hellwig)

  - split ->reset_notify() into ->reset_prepare()/reset_done()
    (Christoph Hellwig)

  - support multiple PCIe portdrv interrupts for MSI as well as MSI-X
    (Gabriele Paoloni)

  - allocate MSI/MSI-X vector for Downstream Port Containment (Gabriele
    Paoloni)

  - fix MSI IRQ affinity pre/post/min_vecs issue (Michael Hernandez)

  - test INTx masking during enumeration, not at run-time (Piotr Gregor)

  - avoid using device_may_wakeup() for runtime PM (Rafael J. Wysocki)

  - restore the status of PCI devices across hibernation (Chen Yu)

  - keep parent resources that start at 0x0 (Ard Biesheuvel)

  - enable ECRC only if device supports it (Bjorn Helgaas)

  - restore PRI and PASID state after Function-Level Reset (CQ Tang)

  - skip DPC event if device is not present (Keith Busch)

  - check domain when matching SMBIOS info (Sujith Pandel)

  - mark Intel XXV710 NIC INTx masking as broken (Alex Williamson)

  - avoid AMD SB7xx EHCI USB wakeup defect (Kai-Heng Feng)

  - work around long-standing Macbook Pro poweroff issue (Bjorn Helgaas)

  - add Switchtec "running" status flag (Logan Gunthorpe)

  - fix dra7xx incorrect RW1C IRQ register usage (Arvind Yadav)

  - modify xilinx-nwl IRQ chip for legacy interrupts (Bharat Kumar
    Gogada)

  - move VMD SRCU cleanup after bus, child device removal (Jon Derrick)

  - add Faraday clock handling (Linus Walleij)

  - configure Rockchip MPS and reorganize (Shawn Lin)

  - limit Qualcomm TLP size to 2K (hardware issue) (Srinivas Kandagatla)

  - support Tegra MSI 64-bit addressing (Thierry Reding)

  - use Rockchip normal (not privileged) register bank (Shawn Lin)

  - add HiSilicon Kirin SoC PCIe controller driver (Xiaowei Song)

  - add Sigma Designs Tango SMP8759 PCIe controller driver (Marc
    Gonzalez)

  - add MediaTek PCIe host controller support (Ryder Lee)

  - add Qualcomm IPQ4019 support (John Crispin)

  - add HyperV vPCI protocol v1.2 support (Jork Loeser)

  - add i.MX6 regulator support (Quentin Schulz)

* tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits)
  PCI: tango: Add Sigma Designs Tango SMP8759 PCIe host bridge support
  PCI: Add DT binding for Sigma Designs Tango PCIe controller
  PCI: rockchip: Use normal register bank for config accessors
  dt-bindings: PCI: Add documentation for MediaTek PCIe
  PCI: Remove __pci_dev_reset() and pci_dev_reset()
  PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()
  PCI: xilinx: Make of_device_ids const
  PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts
  PCI: vmd: Move SRCU cleanup after bus, child device removal
  PCI: vmd: Correct comment: VMD domains start at 0x10000, not 0x1000
  PCI: versatile: Add local struct device pointers
  PCI: tegra: Do not allocate MSI target memory
  PCI: tegra: Support MSI 64-bit addressing
  PCI: rockchip: Use local struct device pointer consistently
  PCI: rockchip: Check for clk_prepare_enable() errors during resume
  MAINTAINERS: Remove Wenrui Li as Rockchip PCIe driver maintainer
  PCI: rockchip: Configure RC's MPS setting
  PCI: rockchip: Reconfigure configuration space header type
  PCI: rockchip: Split out rockchip_pcie_cfg_configuration_accesses()
  PCI: rockchip: Move configuration accesses into rockchip_pcie_cfg_atu()
  ...
2017-07-08 15:51:57 -07:00
Thomas Garnier
5ea0727b16 x86/syscalls: Check address limit on user-mode return
Ensure the address limit is a user-mode segment before returning to
user-mode. Otherwise a process can corrupt kernel-mode memory and elevate
privileges [1].

The set_fs function sets the TIF_SETFS flag to force a slow path on
return. In the slow path, the address limit is checked to be USER_DS if
needed.

The addr_limit_user_check function is added as a cross-architecture
function to check the address limit.

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Will Drewry <wad@chromium.org>
Cc: linux-api@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170615011203.144108-1-thgarnie@google.com
2017-07-08 14:05:32 +02:00
Linus Torvalds
58f051fc98 Merge tag 'kbuild-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Clean up Makefiles and scripts

 - Improve clang support

 - Remove unneeded genhdr-y syntax

 - Remove unneeded cc-option-align macro

 - Introduce __cc-option macro and use it to fix x86 boot code compiler
   flags

* tag 'kbuild-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: improve comments on KBUILD_SRC
  x86/build: Specify stack alignment for clang
  x86/build: Use __cc-option for boot code compiler options
  kbuild: Add __cc-option macro
  kbuild: remove cc-option-align
  kbuild: replace genhdr-y with generated-y
  kbuild: clang: Disable 'address-of-packed-member' warning
  kbuild: remove duplicated arch/*/include/generated/uapi include path
  kbuild: speed up checksyscalls.sh
  kbuild: simplify silent build (-s) detection
2017-07-07 14:09:24 -07:00
Linus Torvalds
b6ffe9ba46 Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with ->copy_from_iter() and ->flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...
2017-07-07 09:44:06 -07:00
Linus Torvalds
9f45efb928 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few hotfixes

 - various misc updates

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
  mm, memory_hotplug: move movable_node to the hotplug proper
  mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
  mm, memory_hotplug: drop artificial restriction on online/offline
  mm: memcontrol: account slab stats per lruvec
  mm: memcontrol: per-lruvec stats infrastructure
  mm: memcontrol: use generic mod_memcg_page_state for kmem pages
  mm: memcontrol: use the node-native slab memory counters
  mm: vmstat: move slab statistics from zone to node counters
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
  mm/zswap.c: improve a size determination in zswap_frontswap_init()
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
  mm/swapfile.c: sort swap entries before free
  mm/oom_kill: count global and memory cgroup oom kills
  mm: per-cgroup memory reclaim stats
  mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
  mm: kmemleak: factor object reference updating out of scan_block()
  mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
  mm, mempolicy: don't check cpuset seqlock where it doesn't matter
  mm, cpuset: always use seqlock when changing task's nodemask
  mm, mempolicy: simplify rebinding mempolicies when updating cpusets
  ...
2017-07-06 22:27:08 -07:00
Linus Torvalds
dc502142b6 Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull user access str* updates from Al Viro:
 "uaccess str...() dead code removal"

* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  s390 keyboard.c: don't open-code strndup_user()
  mips: get rid of unused __strnlen_user()
  get rid of unused __strncpy_from_user() instances
  kill strlen_user()
2017-07-06 22:07:44 -07:00
Linus Torvalds
c856863988 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc compat stuff updates from Al Viro:
 "This part is basically untangling various compat stuff. Compat
  syscalls moved to their native counterparts, getting rid of quite a
  bit of double-copying and/or set_fs() uses. A lot of field-by-field
  copyin/copyout killed off.

   - kernel/compat.c is much closer to containing just the
     copyin/copyout of compat structs. Not all compat syscalls are gone
     from it yet, but it's getting there.

   - ipc/compat_mq.c killed off completely.

   - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
     drivers/block/floppy.c where they belong. Yes, there are several
     drivers that implement some of the same ioctls. Some are m68k and
     one is 32bit-only pmac. drivers/block/floppy.c is the only one in
     that bunch that can be built on biarch"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: move compat syscalls to native ones
  usbdevfs: get rid of field-by-field copyin
  compat_hdio_ioctl: get rid of set_fs()
  take floppy compat ioctls to sodding floppy.c
  ipmi: get rid of field-by-field __get_user()
  ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
  rt_sigtimedwait(): move compat to native
  select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
  put_compat_rusage(): switch to copy_to_user()
  sigpending(): move compat to native
  getrlimit()/setrlimit(): move compat to native
  times(2): move compat to native
  compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
  fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
  do_sigaltstack(): lift copying to/from userland into callers
  take compat_sys_old_getrlimit() to native syscall
  trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06 20:57:13 -07:00
Linus Torvalds
f72e24a124 Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Linus Torvalds
6e6c5b9606 Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
 "Other than fixes and cleanups it contains:

   - support > 32 VCPUs at domain restore

   - support for new sysfs nodes related to Xen

   - some performance tuning for Linux running as Xen guest"

* tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: allow userspace access during hypercalls
  x86: xen: remove unnecessary variable in xen_foreach_remap_area()
  xen: allocate page for shared info page from low memory
  xen: avoid deadlock in xenbus driver
  xen: add sysfs node for hypervisor build id
  xen: sync include/xen/interface/version.h
  xen: add sysfs node for guest type
  doc,xen: document hypervisor sysfs nodes for xen
  xen/vcpu: Handle xen_vcpu_setup() failure at boot
  xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
  xen/pv: Fix OOPS on restore for a PV, !SMP domain
  xen/pvh*: Support > 32 VCPUs at domain restore
  xen/vcpu: Simplify xen_vcpu related code
  xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
  xen: avoid type warning in xchg_xen_ulong
  xen: fix HYPERVISOR_dm_op() prototype
  xen: don't print error message in case of missing Xenstore entry
  arm/xen: Adjust one function call together with a variable assignment
  arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
  arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-06 19:11:24 -07:00
Linus Torvalds
c136b84393 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
2017-07-06 18:38:31 -07:00
Aneesh Kumar K.V
e1073d1e79 mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE
This moves the #ifdef in C code to a Kconfig dependency.  Also we move
the gigantic_page_supported() function to be arch specific.

This allows architectures to conditionally enable runtime allocation of
gigantic huge page.  Architectures like ppc64 supports different
gigantic huge page size (16G and 1G) based on the translation mode
selected.  This provides an opportunity for ppc64 to enable runtime
allocation only w.r.t 1G hugepage.

No functional change in this patch.

Link: http://lkml.kernel.org/r/1494995292-4443-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:33 -07:00
Andy Lutomirski
660da7c922 x86/mm: Enable CR4.PCIDE on supported systems
We can use PCID if the CPU has PCID and PGE and we're not on Xen.

By itself, this has no effect. A followup patch will start using PCID.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:58 +02:00
Andy Lutomirski
cba4671af7 x86/mm: Disable PCID on 32-bit kernels
32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode.  Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2e391769192a4d31b808410c383c6bf0734bc6ea.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
Andy Lutomirski
43858b4f25 x86/mm: Stop calling leave_mm() in idle code
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.

This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().

This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
Andy Lutomirski
94b1b03b51 x86/mm: Rework lazy TLB mode and TLB freshness tracking
x86's lazy TLB mode used to be fairly weak -- it would switch to
init_mm the first time it tried to flush a lazy TLB.  This meant an
unnecessary CR3 write and, if the flush was remote, an unnecessary
IPI.

Rewrite it entirely.  When we enter lazy mode, we simply remove the
CPU from mm_cpumask.  This means that we need a way to figure out
whether we've missed a flush when we switch back out of lazy mode.
I use the tlb_gen machinery to track whether a context is up to
date.

Note to reviewers: this patch, my itself, looks a bit odd.  I'm
using an array of length 1 containing (ctx_id, tlb_gen) rather than
just storing tlb_gen, and making it at array isn't necessary yet.
I'm doing this because the next few patches add PCID support, and,
with PCID, we need ctx_id, and the array will end up with a length
greater than 1.  Making it an array now means that there will be
less churn and therefore less stress on your eyeballs.

NB: This is dubious but, AFAICT, still correct on Xen and UV.
xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
patch changes the way that mm_cpumask() works.  This should be okay,
since Xen *also* iterates all online CPUs to find all the CPUs it
needs to twiddle.

The UV tlbflush code is rather dated and should be changed.

Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
(turbo off, intel_pstate requesting max performance) under KVM with
the guest using idle=poll (to avoid artifacts when bouncing between
CPUs).  I haven't done any real statistics here -- I just ran them
in a loop and picked the fastest results that didn't look like
outliers.  Unpatched means commit a4eb8b9935, so all the
bookkeeping overhead is gone.

MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
This is intended to be a nearly worst-case test.

  patched:         13.4µs
  unpatched:       21.6µs

Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
nrounds = 100, 256M data

  patched:         1.1 seconds or so
  unpatched:       1.9 seconds or so

The sleepup on Vitaly's test appearss to be because it spends a lot
of time blocked on mmap_sem, and this patch avoids sending IPIs to
blocked CPUs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:57 +02:00
Andy Lutomirski
b0579ade7c x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
There are two kernel features that would benefit from tracking
how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
it up to date in real time:

 - Lazy mm switching currently works by switching to init_mm when
   it would otherwise flush.  This is wasteful: there isn't fundamentally
   any need to update CR3 at all when going lazy or when returning from
   lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
   we should just stop trying to keep the TLB coherent when we go lazy and,
   when unlazying, check whether we missed any flushes.

 - PCID will let us keep recent user contexts alive in the TLB.  If we
   start doing this, we need a way to decide whether those contexts are
   up to date.

On some paravirt systems, remote TLBs can be flushed without IPIs.
This won't update the target CPUs' tlb_gens, which may cause
unnecessary local flushes later on.  We can address this if it becomes
a problem by carefully updating the target CPU's tlb_gen directly.

By itself, this patch is a very minor optimization that avoids
unnecessary flushes when multiple TLB flushes targetting the same CPU
race.  The complexity in this patch would not be worth it on its own,
but it will enable improved lazy TLB tracking and PCID.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1210fb244bc9cbe7677f7f0b72db4d359675f24b.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:56 +02:00
Andy Lutomirski
f39681ed0f x86/mm: Give each mm TLB flush generation a unique ID
This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
ctx_id uniquely identifies the mm_struct and will never be reused.
For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
count of the number of times that a TLB flush has been requested.
The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
flush actions and will be used in subsequent patches to reliably
determine whether all needed TLB flushes have occurred on a given
CPU.

This patch is split out for ease of review.  By itself, it has no
real effect other than creating and updating the new variables.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/413a91c24dab3ed0caa5f4e4d017d87b0857f920.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:52:56 +02:00
Chen Yu
12df216c61 x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table
Add the real e820_tabel_firmware[] that will not be modified by the kernel
or the EFI boot stub under any circumstance.

In addition to that modify the code so that e820_table_firmwarep[] is
exposed via sysfs to represent the real firmware memory layout,
rather than exposing the e820_table_kexec[] table.

This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check
RAM layout consistency across hibernation/resume:

  The suspend kernel:
  [    0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable

  The resume kernel:
  [    0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable
  ...
  [   15.752088] PM: Using 3 thread(s) for decompression.
  [   15.752088] PM: Loading and decompressing image data (471870 pages)...
  [   15.764971] Hibernate inconsistent memory map detected!
  [   15.770833] PM: Image mismatch: architecture specific data

Actually it is safe to restore these pages because E820_TYPE_RAM and
E820_TYPE_RESERVED_KERN are treated the same during hibernation, so
the original e820 table provided by the bootloader is used for
hibernation MD5 fingerprint checking.

The side effect is that, this newly introduced variable might increase the
kernel size at compile time.

Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:09:02 +02:00
Chen Yu
a09bae0f8a x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec
Currently the e820_table_firmware[] table is mainly used by the kexec,
and it is not what it's supposed to be - despite its name it might be
modified by the kernel.

So change its name to e820_table_kexec[]. In the next patch we will
introduce the real e820_table_firmware[] table.

No functional change.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05 10:09:02 +02:00
Mikulas Patocka
99c13b8c88 x86/mm/pat: Don't report PAT on CPUs that don't support it
The pat_enabled() logic is broken on CPUs which do not support PAT and
where the initialization code fails to call pat_init(). Due to that the
enabled flag stays true and pat_enabled() returns true wrongfully.

As a consequence the mappings, e.g. for Xorg, are set up with the wrong
caching mode and the required MTRR setups are omitted.

To cure this the following changes are required:

  1) Make pat_enabled() return true only if PAT initialization was
     invoked and successful.

  2) Invoke init_cache_modes() unconditionally in setup_arch() and
     remove the extra callsites in pat_disable() and the pat disabled
     code path in pat_init().

Also rename __pat_enabled to pat_disabled to reflect the real purpose of
this variable.

Fixes: 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
2017-07-05 09:01:24 +02:00
Linus Torvalds
408c9861c6 Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "The big ticket items here are the rework of suspend-to-idle in order
  to add proper support for power button wakeup from it on recent Dell
  laptops and the rework of interfaces exporting the current CPU
  frequency on x86.

  In addition to that, support for a few new pieces of hardware is
  added, the PCI/ACPI device wakeup infrastructure is simplified
  significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
  bus locking infrastructure.

  Also, there are some functional improvements for intel_pstate, tools
  updates and small fixes and cleanups all over.

  Specifics:

   - Rework suspend-to-idle to allow it to take wakeup events signaled
     by the EC into account on ACPI-based platforms in order to properly
     support power button wakeup from suspend-to-idle on recent Dell
     laptops (Rafael Wysocki).

     That includes the core suspend-to-idle code rework, support for the
     Low Power S0 _DSM interface, and support for the ACPI INT0002
     Virtual GPIO device from Hans de Goede (required for USB keyboard
     wakeup from suspend-to-idle to work on some machines).

   - Stop trying to export the current CPU frequency via /proc/cpuinfo
     on x86 as that is inaccurate and confusing (Len Brown).

   - Rework the way in which the current CPU frequency is exported by
     the kernel (over the cpufreq sysfs interface) on x86 systems with
     the APERF and MPERF registers by always using values read from
     these registers, when available, to compute the current frequency
     regardless of which cpufreq driver is in use (Len Brown).

   - Rework the PCI/ACPI device wakeup infrastructure to remove the
     questionable and artificial distinction between "devices that can
     wake up the system from sleep states" and "devices that can
     generate wakeup signals in the working state" from it, which allows
     the code to be simplified quite a bit (Rafael Wysocki).

   - Fix the wakeup IRQ framework by making it use SRCU instead of RCU
     which doesn't allow sleeping in the read-side critical sections,
     but which in turn is expected to be allowed by the IRQ bus locking
     infrastructure (Thomas Gleixner).

   - Modify some computations in the intel_pstate driver to avoid
     rounding errors resulting from them (Srinivas Pandruvada).

   - Reduce the overhead of the intel_pstate driver in the HWP
     (hardware-managed P-states) mode and when the "performance" P-state
     selection algorithm is in use by making it avoid registering
     scheduler callbacks in those cases (Len Brown).

   - Rework the energy_performance_preference sysfs knob in intel_pstate
     by changing the values that correspond to different symbolic hint
     names used by it (Len Brown).

   - Make it possible to use more than one cpuidle driver at the same
     time on ARM (Daniel Lezcano).

   - Make it possible to prevent the cpuidle menu governor from using
     the 0 state by disabling it via sysfs (Nicholas Piggin).

   - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
     AMD systems (Yazen Ghannam).

   - Make the CPPC cpufreq driver take the lowest nonlinear performance
     information into account (Prashanth Prakash).

   - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
     driver and clean up the sfi, exynos5440 and intel_pstate drivers
     (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
     Wysocki, Tao Wang).

   - Fix a few minor issues in the generic power domains (genpd)
     framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
     Perttunen, Viresh Kumar).

   - Fix a couple of minor issues in the operating performance points
     (OPP) framework and clean it up somewhat (Viresh Kumar).

   - Fix a CONFIG dependency in the hibernation core and clean it up
     slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).

   - Add rk3228 support to the rockchip-io adaptive voltage scaling
     (AVS) driver (David Wu).

   - Fix an incorrect bit shift operation in the RAPL power capping
     driver (Adam Lessnau).

   - Add support for the EPP field in the HWP (hardware managed
     P-states) control register, HWP.EPP, to the x86_energy_perf_policy
     tool and update msr-index.h with HWP.EPP values (Len Brown).

   - Fix some minor issues in the turbostat tool (Len Brown).

   - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
     minor issue in it (Sherry Hurwitz).

   - Assorted cleanups, mostly related to the constification of some
     data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
     Kozlowski)"

* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  PM: hibernate: constify attribute_group structures.
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  PM / Domains: Fix missing default_power_down_ok comment
  PM / Domains: Fix unsafe iteration over modified list of domains
  PM / Domains: Fix unsafe iteration over modified list of domain providers
  PM / Domains: Fix unsafe iteration over modified list of device links
  PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
  PM / Domains: Call driver's noirq callbacks
  PM / core: Drop run_wake flag from struct dev_pm_info
  PCI / PM: Simplify device wakeup settings code
  PCI / PM: Drop pme_interrupt flag from struct pci_dev
  ACPI / PM: Consolidate device wakeup settings code
  ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
  PM / QoS: constify *_attribute_group.
  PM / AVS: rockchip-io: add io selectors and supplies for rk3228
  powercap/RAPL: prevent overridding bits outside of the mask
  PM / sysfs: Constify attribute groups
  ...
2017-07-04 13:39:41 -07:00
Linus Torvalds
f4dd029ee0 Merge tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
 "Here is the "big" char/misc driver patchset for 4.13-rc1.

  Lots of stuff in here, a large thunderbolt update, w1 driver header
  reorg, the new mux driver subsystem, google firmware driver updates,
  and a raft of other smaller things. Full details in the shortlog.

  All of these have been in linux-next for a while with the only
  reported issue being a merge problem with this tree and the jc-docs
  tree in the w1 documentation area"

* tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (147 commits)
  misc: apds990x: Use sysfs_match_string() helper
  mei: drop unreachable code in mei_start
  mei: validate the message header only in first fragment.
  DocBook: w1: Update W1 file locations and names in DocBook
  mux: adg792a: always require I2C support
  nvmem: rockchip-efuse: add support for rk322x-efuse
  nvmem: core: add locking to nvmem_find_cell
  nvmem: core: Call put_device() in nvmem_unregister()
  nvmem: core: fix leaks on registration errors
  nvmem: correct Broadcom OTP controller driver writes
  w1: Add subsystem kernel public interface
  drivers/fsi: Add module license to core driver
  drivers/fsi: Use asynchronous slave mode
  drivers/fsi: Add hub master support
  drivers/fsi: Add SCOM FSI client device driver
  drivers/fsi/gpio: Add tracepoints for GPIO master
  drivers/fsi: Add GPIO based FSI master
  drivers/fsi: Document FSI master sysfs files in ABI
  drivers/fsi: Add error handling for slave
  drivers/fsi: Add tracepoints for low-level operations
  ...
2017-07-03 20:55:59 -07:00
Linus Torvalds
4422d80ed7 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Thomas Gleixner:
 "The RAS updates for the 4.13 merge window:

   - Cleanup of the MCE injection facility (Borsilav Petkov)

   - Rework of the AMD/SMCA handling (Yazen Ghannam)

   - Enhancements for ACPI/APEI to handle new notitication types (Shiju
     Jose)

   - atomic_t to refcount_t conversion (Elena Reshetova)

   - A few fixes and enhancements all over the place"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  RAS/CEC: Check the correct variable in the debugfs error handling
  x86/mce: Always save severity in machine_check_poll()
  x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise
  x86/mce: Update bootlog description to reflect behavior on AMD
  x86/mce: Don't disable MCA banks when offlining a CPU on AMD
  x86/mce/mce-inject: Preset the MCE injection struct
  x86/mce: Clean up include files
  x86/mce: Get rid of register_mce_write_callback()
  x86/mce: Merge mce_amd_inj into mce-inject
  x86/mce/AMD: Use saved threshold block info in interrupt handler
  x86/mce/AMD: Use msr_stat when clearing MCA_STATUS
  x86/mce/AMD: Carve out SMCA bank configuration
  x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers
  x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t
  RAS: Make local function parse_ras_param() static
  ACPI/APEI: Handle GSIV and GPIO notification types
2017-07-03 18:33:03 -07:00