Commit Graph

14337 Commits

Author SHA1 Message Date
Linus Torvalds
cf626b0da7 Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull procfs updates from Al Viro:
 "Christoph's proc_create_... cleanups series"

* 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
  xfs, proc: hide unused xfs procfs helpers
  isdn/gigaset: add back gigaset_procinfo assignment
  proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
  tty: replace ->proc_fops with ->proc_show
  ide: replace ->proc_fops with ->proc_show
  ide: remove ide_driver_proc_write
  isdn: replace ->proc_fops with ->proc_show
  atm: switch to proc_create_seq_private
  atm: simplify procfs code
  bluetooth: switch to proc_create_seq_data
  netfilter/x_tables: switch to proc_create_seq_private
  netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
  neigh: switch to proc_create_seq_data
  hostap: switch to proc_create_{seq,single}_data
  bonding: switch to proc_create_seq_data
  rtc/proc: switch to proc_create_single_data
  drbd: switch to proc_create_single
  resource: switch to proc_create_seq_data
  staging/rtl8192u: simplify procfs code
  jfs: simplify procfs code
  ...
2018-06-04 10:00:01 -07:00
Ingo Molnar
24dd064d5b Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into x86/urgent
Merge these small and simple 1-2 commit branches into the urgent branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04 18:50:32 +02:00
Ingo Molnar
c52b5c5f96 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:27:56 +02:00
Ingo Molnar
52f2b34f46 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU fix from Paul E. McKenney:

 "This additional v4.18 pull request contains a single commit that fell
  through the cracks:

      Provide early rcu_cpu_starting() callback for the benefit of the
      x86/mtrr code, which needs RCU to be available on incoming CPUs
      earlier than has been the case in the past."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-30 07:55:39 +02:00
Christoph Hellwig
0ead51c3fb x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
Instead of globally disabling > 32bit DMA using the arch_dma_supported
hook, walk the PCI bus under the actually affected bridge and mark every
device with the dma_32bit_limit flag.  This also gets rid of the
arch_dma_supported hook entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28 12:48:25 +02:00
Christoph Hellwig
098afd9817 x86/pci-dma: remove the explicit nodac and allowdac option
This is something drivers should decide (modulo chipset quirks like
for VIA), which as far as I can tell is how things have been handled
for the last 15 years.

Note that we keep the usedac option for now, as it is used in the wild
to override the too generic VIA quirk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28 12:48:21 +02:00
Christoph Hellwig
06e9552f5f x86/pci-dma: remove the experimental forcesac boot option
Limiting the dma mask to avoid PCI (pre-PCIe) DAC cycles while paying
the huge overhead of an IOMMU is rather pointless, and this seriously
gets in the way of dma mapping work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-28 12:48:16 +02:00
Scott Wood
ff987fcf01 x86/microcode: Make the late update update_lock a raw lock for RT
__reload_late() is called from stop_machine context and thus cannot
acquire a non-raw spinlock on PREEMPT_RT.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Pei Zhang <pezhang@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20180524154420.24455-1-swood@redhat.com
2018-05-27 21:50:09 +02:00
Huaisheng Ye
884571f0de dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-25 11:23:06 +02:00
Alexey Budankov
10b1105004 perf/x86: Store user space frame-pointer value on a sample
Store user space frame-pointer value (BP register) into the perf trace
on a sample for a process so the value becomes available when
unwinding call stacks for functions gaining event samples.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/311d4a34-f81b-5535-3385-01427ac73b41@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-25 08:11:12 +02:00
Dominik Brodowski
8ecc4979b1 x86/speculation: Simplify the CPU bug detection logic
Only CPUs which speculate can speculate. Therefore, it seems prudent
to test for cpu_no_speculation first and only then determine whether
a specific speculating CPU is susceptible to store bypass speculation.
This is underlined by the fact that all CPUs currently listed in
cpu_no_speculation were present in cpu_no_spec_store_bypass as well.
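
The ordering described above can be sketched as follows. This is a minimal
illustration, not the kernel code: the model numbers, the flag names and the
cpu_bugs() helper are hypothetical stand-ins for the kernel's match tables.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bug flags; names and values are illustrative, not the kernel's. */
#define BUG_SPECTRE           (1u << 0)
#define BUG_SPEC_STORE_BYPASS (1u << 1)

/* Hypothetical model numbers standing in for the kernel's match tables. */
const int no_speculation[]       = { 1, 2 };
const int no_spec_store_bypass[] = { 3 };

static bool listed(const int *list, size_t n, int model)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == model)
			return true;
	return false;
}

unsigned int cpu_bugs(int model)
{
	unsigned int bugs;

	/* A CPU that does not speculate cannot have a speculation bug. */
	if (listed(no_speculation,
		   sizeof(no_speculation) / sizeof(no_speculation[0]), model))
		return 0;

	bugs = BUG_SPECTRE;

	/* Only now ask whether this speculating CPU is affected by SSB. */
	if (!listed(no_spec_store_bypass,
		    sizeof(no_spec_store_bypass) / sizeof(no_spec_store_bypass[0]),
		    model))
		bugs |= BUG_SPEC_STORE_BYPASS;

	return bugs;
}
```

With the early return in place, the second list no longer has to repeat the
entries of the first one.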

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: konrad.wilk@oracle.com
Link: https://lkml.kernel.org/r/20180522090539.GA24668@light.dominikbrodowski.net
2018-05-23 10:55:52 +02:00
Peter Zijlstra
f64c6013a2 rcu/x86: Provide early rcu_cpu_starting() callback
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.

RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.

Fix this by initializing RCU way early in the MTRR case.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22 16:12:26 -07:00
Linus Torvalds
3b78ce4a34 Merge branch 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge speculative store buffer bypass fixes from Thomas Gleixner:

 - rework of the SPEC_CTRL MSR management to accommodate the new fancy
   SSBD (Speculative Store Bypass Disable) bit handling.

 - the CPU bug and sysfs infrastructure for the exciting new Speculative
   Store Bypass 'feature'.

 - support for disabling SSB via LS_CFG MSR on AMD CPUs including
   Hyperthread synchronization on ZEN.

 - PRCTL support for dynamic runtime control of SSB

 - SECCOMP integration to automatically disable SSB for sandboxed
   processes with a filter flag for opt-out.

 - KVM integration to allow guests fiddling with SSBD including the new
   software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on
   AMD.

 - BPF protection against SSB

.. this is just the core and x86 side, other architecture support will
come separately.

* 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  bpf: Prevent memory disambiguation attack
  x86/bugs: Rename SSBD_NO to SSB_NO
  KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
  x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
  x86/bugs: Rework spec_ctrl base and mask logic
  x86/bugs: Remove x86_spec_ctrl_set()
  x86/bugs: Expose x86_spec_ctrl_base directly
  x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
  x86/speculation: Rework speculative_store_bypass_update()
  x86/speculation: Add virtualized speculative store bypass disable support
  x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
  x86/speculation: Handle HT correctly on AMD
  x86/cpufeatures: Add FEATURE_ZEN
  x86/cpufeatures: Disentangle SSBD enumeration
  x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
  x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
  KVM: SVM: Move spec control call after restore of GS
  x86/cpu: Make alternative_msr_write work for 32-bit code
  x86/bugs: Fix the parameters alignment and missing void
  x86/bugs: Make cpu_show_common() static
  ...
2018-05-21 11:23:26 -07:00
Linus Torvalds
8a6bd2f40e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "An unfortunately larger set of fixes, but a large portion is
  selftests:

   - Fix the missing clusterid initialization for x2apic cluster
     management which caused boot failures due to IPIs being sent to the
     wrong cluster

   - Drop TS_COMPAT when a 64bit executable is exec()'ed from a compat
     task

   - Wrap access to __supported_pte_mask in __startup_64() where clang
     compile fails due to a non PC relative access being generated.

   - Two fixes for 5 level paging fallout in the decompressor:

      - Handle GOT correctly for paging_prepare() and
        cleanup_trampoline()

      - Fix the page table handling in cleanup_trampoline() to avoid
        page table corruption.

   - Stop special casing protection key 0 as this is inconsistent with
     the manpage and also inconsistent with the allocation map handling.

   - Override the protection key when moving away from PROT_EXEC to
     prevent inaccessible memory.

   - Fix and update the protection key selftests to address breakage and
     to cover the above issue

   - Add a MOV SS self test"

[ Part of the x86 fixes were in the earlier core pull due to dependencies ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
  x86/apic/x2apic: Initialize cluster ID properly
  x86/boot/compressed/64: Fix moving page table out of trampoline memory
  x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
  x86/pkeys: Do not special case protection key 0
  x86/pkeys/selftests: Add a test for pkey 0
  x86/pkeys/selftests: Save off 'prot' for allocations
  x86/pkeys/selftests: Fix pointer math
  x86/pkeys: Override pkey when moving away from PROT_EXEC
  x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
  x86/pkeys/selftests: Add PROT_EXEC test
  x86/pkeys/selftests: Factor out "instruction page"
  x86/pkeys/selftests: Allow faults on unknown keys
  x86/pkeys/selftests: Avoid printf-in-signal deadlocks
  x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
  x86/pkeys/selftests: Stop using assert()
  x86/pkeys/selftests: Give better unexpected fault error messages
  x86/selftests: Add mov_to_ss test
  x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
  x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
  ...
2018-05-20 11:28:32 -07:00
Linus Torvalds
74cce52f9f Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fix from Thomas Gleixner:
 "Fix a regression in the new AMD SMCA code which issues an SMP function
  call from the early interrupt disabled region of CPU hotplug. To avoid
  that, use cached block addresses which can be used directly"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Cache SMCA MISC block addresses
2018-05-20 11:20:40 -07:00
Linus Torvalds
583dbad340 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - Unbreak the BPF compilation which got broken by the unconditional
   requirement of asm-goto, which is not supported by clang.

 - Prevent probing on exception masking instructions in uprobes and
   kprobes to avoid the issues of the delayed exceptions instead of
   having an ugly workaround.

 - Prevent a double free_page() in the error path of do_kexec_load()

 - A set of objtool updates addressing various issues mostly related to
   switch tables and the noreturn detection for recursive sibling calls

 - Header sync for tools.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Detect RIP-relative switch table references, part 2
  objtool: Detect RIP-relative switch table references
  objtool: Support GCC 8 switch tables
  objtool: Support GCC 8's cold subfunctions
  objtool: Fix "noreturn" detection for recursive sibling calls
  objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
  x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
  uprobes/x86: Prohibit probing on MOV SS instruction
  kprobes/x86: Prohibit probing on exception masking instructions
  x86/kexec: Avoid double free_page() upon do_kexec_load() failure
2018-05-20 10:01:38 -07:00
Borislav Petkov
fbf96cf904 x86/MCE/AMD: Read MCx_MISC block addresses on any CPU
We used rdmsr_safe_on_cpu() to make sure we're reading the proper CPU's
MISC block addresses. However, that caused trouble with CPU hotplug due to
the _on_cpu() helper issuing an IPI while IRQs are disabled.

But we don't have to do that: the block addresses are the same on any CPU
so we can read them on any CPU. (What practically happens is, we read them
on the BSP and cache them, and for later reads, we service them from the
cache).

Suggested-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-19 15:21:46 +02:00
Thomas Gleixner
95b5c0a592 Merge branch 'ras/urgent' into ras/core
Pick up urgent fix as pending patch depends on it.
2018-05-19 15:20:49 +02:00
Borislav Petkov
78ce241099 x86/MCE/AMD: Cache SMCA MISC block addresses
... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.
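
The caching idea can be sketched like this. It is a simplified illustration
only: the array dimensions, the 0 "not cached" sentinel, the made-up address
formula and the slow_reads counter are assumptions, not the kernel's
implementation.

```c
#include <stdint.h>

#define NR_BANKS  4
#define NR_BLOCKS 3

/* 0 means "not cached yet"; a sketch-only sentinel. */
static uint32_t block_addr_cache[NR_BANKS][NR_BLOCKS];

/* Counts slow-path reads so the caching effect is observable. */
unsigned int slow_reads;

/* Stand-in for the expensive MSR read; the address formula is made up. */
static uint32_t read_block_addr_slow(unsigned int bank, unsigned int block)
{
	slow_reads++;
	return 0x1000u + bank * 0x10u + block;
}

uint32_t get_block_addr(unsigned int bank, unsigned int block)
{
	/* Bounds check first, so the array is never indexed out of range. */
	if (bank >= NR_BANKS || block >= NR_BLOCKS)
		return 0;

	/* First access fills the cache; later accesses never hit the slow path. */
	if (!block_addr_cache[bank][block])
		block_addr_cache[bank][block] = read_block_addr_slow(bank, block);

	return block_addr_cache[bank][block];
}
```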

Fixes: 27bd595027 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
2018-05-19 15:19:30 +02:00
Colin Ian King
844ea8f626 x86/apm: Fix spelling mistake: "caculate" -> "calculate"
Trivial fix to spelling mistake in module parameter description text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: kernel-janitors@vger.kernel.org
Cc: "H . Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180428092448.6493-1-colin.king@canonical.com
2018-05-19 14:18:59 +02:00
Arnd Bergmann
e27c49291a x86: Convert x86_platform_ops to timespec64
The x86 platform operations are fairly isolated, so it's easy to change
them from using timespec to timespec64. It has been checked that all the
users and callers are safe, and there is only one critical function that is
broken beyond 2106:

  pvclock_read_wallclock() uses a 32-bit number of seconds since the epoch
  to communicate the boot time between host and guest in a virtual
  environment. This will work until 2106, but fixing this is outside the
  scope of this change. Add a comment at least.
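
  Where the 2106 limit comes from can be checked with a few lines of C. This
  is a standalone sketch; year_of_epoch_seconds() is a hypothetical helper,
  not a kernel function. It converts seconds since 1970 into a calendar year
  (UTC, Gregorian leap-year rules).

```c
#include <stdint.h>

static int is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Calendar year (UTC) containing the given number of seconds since 1970. */
int year_of_epoch_seconds(uint64_t secs)
{
	uint64_t days = secs / 86400;
	int year = 1970;

	for (;;) {
		uint64_t len = is_leap(year) ? 366 : 365;

		if (days < len)
			return year;
		days -= len;
		year++;
	}
}
```

  An unsigned 32-bit seconds counter runs out in 2106, while a signed one
  already runs out in 2038.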

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: jailhouse-dev@googlegroups.com
Cc: Borislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Cc: y2038@lists.linaro.org
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Link: https://lkml.kernel.org/r/20180427201435.3194219-1-arnd@arndb.de
2018-05-19 14:03:14 +02:00
Thomas Gleixner
b563ea676a Merge branch 'linus' into timers/2038
Merge upstream to pick up changes on which pending patches depend.
2018-05-19 13:55:40 +02:00
Vikas Shivappa
de73f38f76 x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
mba_sc is a feedback loop where we periodically read MBM counters and
try to restrict the bandwidth below a max value so the below is always
true:

  "current bandwidth(cur_bw) < user specified bandwidth(user_bw)"

The frequency of these checks is currently 1s and we just tag along the
MBM overflow timer to do the updates. Doing it once in a second also
makes the calculation of bandwidth easy. Each increase or decrease of
bandwidth is by the minimum granularity specified by the hardware.

Although the MBA's goal is to restrict the bandwidth below a maximum,
there may be a need to even increase the bandwidth. Since MBA controls
the L2 external bandwidth whereas MBM measures the L3 external
bandwidth, we may end up restricting some rdtgroups unnecessarily. This
may happen in the sequence where an rdtgroup (set of jobs) had high
"L3 <-> memory traffic" in its initial phases -> mba_sc kicks in and reduces
bandwidth percentage values -> but after some time it has mostly "L2 <-> L3"
traffic. In this scenario mba_sc increases the bandwidth percentage when
there is less memory traffic.
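
One iteration of such a feedback loop could look roughly like the sketch
below. It is deliberately simplified and is not the code of this patch:
mba_sc_step(), its parameters and the clamping policy are assumptions, and
any additional heuristics the real loop may apply are left out.

```c
#include <stdint.h>

#define MBA_MAX_PCT 100u

/*
 * Return the new throttle percentage after one check.
 * cur_pct:  percentage currently programmed
 * cur_bw:   bandwidth measured over the last interval, in MBps
 * user_bw:  user specified limit, in MBps
 * step:     hardware granularity of the percentage value
 * min_pct:  smallest percentage the hardware accepts
 */
unsigned int mba_sc_step(unsigned int cur_pct, uint32_t cur_bw,
			 uint32_t user_bw, unsigned int step,
			 unsigned int min_pct)
{
	if (cur_bw > user_bw) {
		/* Over the limit: throttle harder, but not below the minimum. */
		return (cur_pct >= min_pct + step) ? cur_pct - step : min_pct;
	}

	if (cur_bw < user_bw) {
		/* Under the limit: relax throttling, capped at 100%. */
		return (cur_pct + step <= MBA_MAX_PCT) ? cur_pct + step
						       : MBA_MAX_PCT;
	}

	return cur_pct;
}
```

Calling this once per second with the freshly measured bandwidth moves the
percentage by one granularity step at a time in either direction.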

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-7-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
ba0f26d852 x86/intel_rdt/mba_sc: Prepare for feedback loop
This is a preparatory patch for the mba feedback loop. Add support to
measure the "bandwidth in MBps" and the "delta bandwidth". Measure it by
reading the MBM IA32_QM_CTR MSRs and calculating the amount of "bytes"
moved. There is no user space interface for this and it will only be used
by the feedback loop patch.
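
As a rough sketch of the arithmetic, assuming the counter is sampled once per
second and that each counter unit corresponds to a fixed number of bytes
(chunk_bytes is a hypothetical scaling factor, and both helper names are made
up for illustration):

```c
#include <stdint.h>

/* Bandwidth in MBps from two counter samples taken one second apart. */
uint32_t bw_mbps(uint64_t prev_chunks, uint64_t cur_chunks,
		 uint32_t chunk_bytes)
{
	uint64_t bytes = (cur_chunks - prev_chunks) * chunk_bytes;

	/* Bytes moved in one second, converted to megabytes (2^20 bytes). */
	return (uint32_t)(bytes >> 20);
}

/* Absolute change in bandwidth between two consecutive intervals. */
uint32_t delta_bw_mbps(uint32_t prev_bw, uint32_t cur_bw)
{
	return cur_bw > prev_bw ? cur_bw - prev_bw : prev_bw - cur_bw;
}
```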

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-6-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
8205a078ba x86/intel_rdt/mba_sc: Add schemata support
Currently when user updates the "schemata" with new MBA percentage
values, kernel writes the corresponding bandwidth percentage values to
the IA32_MBA_THRTL_MSR.

When MBA is expressed in MBps, the schemata format is changed to have the
per package memory bandwidth in MBps instead of being specified in
percentage. Do not write the IA32_MBA_THRTL_MSRs when the schemata is
updated as that is handled separately.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-5-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
1bd2a63b4f x86/intel_rdt/mba_sc: Add initialization support
When MBA software controller is enabled, a per domain storage is required
for user specified bandwidth in "MBps" and the "percentage" values which
are programmed into the IA32_MBA_THRTL_MSR. Add support for these data
structures and initialization.

The MBA percentage values have a default max value of 100. However, the
max value in MBps is not available from the hardware so it's set to
U32_MAX.

This simply says that the control group can use all bandwidth by default
but does not say what the actual max bandwidth available is. The actual
bandwidth that is available may depend on a lot of factors like QPI link,
number of memory channels, memory channel frequency, its width and memory
speed, how many channels are configured and also if memory interleaving is
enabled. So there is no way to determine the maximum at runtime reliably.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-4-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:43 +02:00
Vikas Shivappa
19c635ab24 x86/intel_rdt/mba_sc: Enable/disable MBA software controller
Currently the user does memory bandwidth allocation (MBA) by specifying the
bandwidth in percentage via the resctrl schemata file:
	"/sys/fs/resctrl/schemata"

Add a new mount option "mba_MBps" to enable the user to specify MBA
in MBps:

$ mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-3-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:43 +02:00
Dmitry Safonov
acf4602001 x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit syscalls it selects mm->mmap_base and
for 32bit mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() was dropping TS_COMPAT flag, but
set_personality_64bit() has kept compat syscall flag making
in_compat_syscall() return true during the first exec() syscall.
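
The decision table above can be sketched in plain C as follows. This is an
illustration under assumptions: the SKETCH_* constants, the enum and both
functions are stand-ins written for this example, and the two bit values
should be treated as placeholders rather than as authoritative kernel
definitions.

```c
#include <stdint.h>

#define SKETCH_TS_COMPAT       0x0002u      /* placeholder flag value */
#define SKETCH_X32_SYSCALL_BIT 0x40000000u  /* placeholder bit value */

enum syscall_abi { ABI_IA32, ABI_X32, ABI_64 };

/* Mirrors the table: ia32 first, then x32, otherwise native 64-bit. */
enum syscall_abi classify_syscall(uint32_t thread_status, uint64_t orig_ax)
{
	if (thread_status & SKETCH_TS_COMPAT)
		return ABI_IA32;
	if (orig_ax & SKETCH_X32_SYSCALL_BIT)
		return ABI_X32;
	return ABI_64;
}

/* Compat means "anything that is not a native 64-bit syscall". */
int in_compat_syscall_sketch(uint32_t thread_status, uint64_t orig_ax)
{
	return classify_syscall(thread_status, orig_ax) != ABI_64;
}
```

A status word that still carries the compat flag after exec() of a 64-bit
binary is classified as ia32 regardless of orig_ax; clearing the flag makes
the same call come out as native 64-bit.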

This has user-visible effects, as mentioned by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c

#include <unistd.h>

int main(int argc, char *argv[]) {
  /* Re-execute the given command; only returns if exec fails. */
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE:         0x7f63b8309000
AT_BASE:         0x7faec143c000
AT_BASE:         0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE:         0xf7eff000
AT_BASE:         0xf7cee000
AT_BASE:         0x7f8b9774e000

Fixes: 1b028f784e ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481df ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
2018-05-19 12:31:05 +02:00
Kirill A. Shutemov
e4e961e36f x86/mm: Mark __pgtable_l5_enabled __initdata
__pgtable_l5_enabled shouldn't be needed after system has booted.
All preparation is done. We can now mark it as __initdata.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-8-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:58 +02:00
Kirill A. Shutemov
372fddf709 x86/mm: Introduce the 'no5lvl' kernel parameter
This kernel parameter allows forcing the kernel to use 4-level paging even
if hardware and kernel support 5-level paging.

The option may be useful to work around regressions related to 5-level
paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Kirill A. Shutemov
ed7588d5dc x86/mm: Stop pretending pgtable_l5_enabled is a variable
pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer
to it as a variable. This is misleading.

Make pgtable_l5_enabled() a function.

We cannot literally define it as a function due to circular dependencies
between header files. A function-like macro is close enough.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Kirill A. Shutemov
ad3fe525b9 x86/mm: Unify pgtable_l5_enabled usage in early boot code
Usually pgtable_l5_enabled is defined using cpu_feature_enabled().
cpu_feature_enabled() is not available in early boot code. We use
several different preprocessor tricks to get around it. It's messy.

Unify them all.

If cpu_feature_enabled() is not yet available, USE_EARLY_PGTABLE_L5 can
be defined before all includes. It makes pgtable_l5_enabled rely on
__pgtable_l5_enabled variable instead. This approach fits all early
users.
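
The selection mechanism can be sketched in a self-contained way like this.
The stand-in names (early_pgtable_l5_enabled, cpu_has_la57, paging_levels)
are hypothetical and only model the kernel's variable and its
cpu_feature_enabled() check; the real definitions live in kernel headers.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __pgtable_l5_enabled variable. */
unsigned int early_pgtable_l5_enabled = 1;

/* Stand-in for the cpu_feature_enabled() based check. */
bool cpu_has_la57(void)
{
	return false;
}

/* An early-boot file would define this before all includes. */
#define USE_EARLY_PGTABLE_L5

#ifdef USE_EARLY_PGTABLE_L5
/* Early code: rely on the plain variable. */
#define pgtable_l5_enabled (early_pgtable_l5_enabled != 0)
#else
/* Everyone else: rely on the CPU feature check. */
#define pgtable_l5_enabled cpu_has_la57()
#endif

int paging_levels(void)
{
	return pgtable_l5_enabled ? 5 : 4;
}
```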

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Ingo Molnar
177bfd725b Merge branches 'x86/urgent' and 'core/urgent' into x86/boot, to pick up fixes and avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 08:18:56 +02:00
Konrad Rzeszutek Wilk
240da953fc x86/bugs: Rename SSBD_NO to SSB_NO
The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence let's sync up.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-18 11:17:30 +02:00
Linus Torvalds
3acf4e3952 Merge tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
 "Two k10temp fixes:

   - fix race condition when accessing System Management Network
     registers

   - fix reading critical temperatures on F15h M60h and M70h

  Also add PCI IDs for the AMD Raven Ridge root bridge"

* tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (k10temp) Use API function to access System Management Network
  x86/amd_nb: Add support for Raven Ridge CPUs
  hwmon: (k10temp) Fix reading critical temperature register
2018-05-17 15:58:12 -07:00
Thomas Gleixner
fed71f7d98 x86/apic/x2apic: Initialize cluster ID properly
Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit which reworked the cluster
management.

The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits forever for the target CPU to handle the IPI.

Add the missing initialization when an upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.
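
The sharing scheme and the missing step can be sketched as below. This is a
simplified model, not the kernel code: the structure layout, the fixed-size
table and cluster_join() are assumptions for illustration, and the sketch
only handles CPU numbers below 32.

```c
#include <stddef.h>

#define MAX_CLUSTERS 8

struct cluster_mask {
	unsigned int clusterid;
	unsigned long cpus;	/* bitmask of member CPUs */
	int in_use;
};

static struct cluster_mask clusters[MAX_CLUSTERS];

/* Attach a CPU to the shared data of its cluster, creating it if needed. */
struct cluster_mask *cluster_join(unsigned int cpu, unsigned int clusterid)
{
	struct cluster_mask *free_slot = NULL;

	for (size_t i = 0; i < MAX_CLUSTERS; i++) {
		struct cluster_mask *c = &clusters[i];

		/* Later CPUs find the existing data by cluster ID. */
		if (c->in_use && c->clusterid == clusterid) {
			c->cpus |= 1ul << cpu;
			return c;
		}
		if (!c->in_use && !free_slot)
			free_slot = c;
	}

	if (!free_slot)
		return NULL;

	/* First CPU of a cluster: record the ID, otherwise it stays 0. */
	free_slot->in_use = 1;
	free_slot->clusterid = clusterid;
	free_slot->cpus = 1ul << cpu;
	return free_slot;
}
```

Without the clusterid assignment every entry would look like cluster 0, so
CPUs of any other cluster could never find data to share.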

Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
2018-05-17 21:00:12 +02:00
Michael S. Tsirkin
633711e828 kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
KVM_HINTS_DEDICATED seems to be somewhat confusing:

Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.

And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.

Also, the flag must be set on all vCPUs - current guests assume this.
Note so in the documentation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-17 19:12:13 +02:00
Tom Lendacky
bc226f07dc KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM.  This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17 17:09:21 +02:00
Thomas Gleixner
47c61b3955 x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to
x86_virt_spec_ctrl().  If either X86_FEATURE_LS_CFG_SSBD or
X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl
argument to check whether the state must be modified on the host. The
update reuses speculative_store_bypass_update() so the ZEN-specific sibling
coordination can be reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17 17:09:21 +02:00
Thomas Gleixner
be6fcb5478 x86/bugs: Rework spec_ctrl base and mask logic
x86_spec_ctrl_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"

Aside from that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.

Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.
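
A minimal sketch of such a sanitizing step, assuming the mask holds the bits
a guest is allowed to change. The helper names guest_modifiable_mask() and
sanitize_guest_spec_ctrl() are hypothetical, and starting the mask from IBRS
alone is an assumption made for illustration. Only the bit positions of IBRS
(bit 0) and STIBP (bit 1) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in the SPEC_CTRL MSR. */
#define SPEC_CTRL_IBRS	(1ULL << 0)
#define SPEC_CTRL_STIBP	(1ULL << 1)

/* Hypothetical helper: build the set of bits a guest may modify. */
static uint64_t guest_modifiable_mask(bool has_stibp)
{
	uint64_t mask = SPEC_CTRL_IBRS;

	if (has_stibp)
		mask |= SPEC_CTRL_STIBP;
	return mask;
}

/*
 * Keep the host value for every bit outside the mask and take the
 * guest value only for bits inside the mask.
 */
static uint64_t sanitize_guest_spec_ctrl(uint64_t host_base,
					 uint64_t guest_val, uint64_t mask)
{
	return (host_base & ~mask) | (guest_val & mask);
}
```

Without STIBP in the mask, a guest write of STIBP is silently dropped by the
second helper, which is the behaviour the paragraph above warns about.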

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17 17:09:20 +02:00
Thomas Gleixner
4b59bdb569 x86/bugs: Remove x86_spec_ctrl_set()
x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there
provide no real value as both call sites can just write x86_spec_ctrl_base
to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra
masking or checking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:20 +02:00
Thomas Gleixner
fa8ac49882 x86/bugs: Expose x86_spec_ctrl_base directly
x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification of that variable. However, the variable is read-only
after init and already globally visible.

Remove the function and export the variable instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Borislav Petkov
cc69b34989 x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
Function bodies are very similar and are going to grow more almost
identical code. Add a bool arg to determine whether SPEC_CTRL is being set
for the guest or restored to the host.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Thomas Gleixner
0270be3e34 x86/speculation: Rework speculative_store_bypass_update()
The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse
speculative_store_bypass_update() to avoid code duplication. Add an
argument for supplying a thread info (TIF) value and create a wrapper
speculative_store_bypass_update_current() which is used at the existing
call site.
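
The shape of that refactoring, sketched as a user-space model. All names here
(TIF_SSBD_FLAG, current_tif_flags, ssbd_msr_bit, ssb_update(),
ssb_update_current()) are hypothetical stand-ins, and the flag bit position
is an arbitrary choice rather than the kernel's value.

```c
#include <stdbool.h>

#define TIF_SSBD_FLAG	(1UL << 5)	/* hypothetical bit position */

/* Stand-ins for the current task's flags and the hardware SSBD bit. */
static unsigned long current_tif_flags;
static bool ssbd_msr_bit;

/* Generalized form: the caller supplies the thread info flags to apply. */
static void ssb_update(unsigned long tifn)
{
	ssbd_msr_bit = (tifn & TIF_SSBD_FLAG) != 0;
}

/* Thin wrapper that keeps the existing call site unchanged. */
static void ssb_update_current(void)
{
	ssb_update(current_tif_flags);
}
```

A second caller, such as a hypervisor path, could then pass a flags value
derived from guest state instead of the current task's flags.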

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Tom Lendacky
11fb068349 x86/speculation: Add virtualized speculative store bypass disable support
Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.
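
As an illustration of the enumeration, the check below tests bit 25 of an EBX
value that is assumed to have been read from CPUID leaf 0x80000008. The macro
and function names are hypothetical. Only the leaf, the bit number and the
MSR address come from the text above, and no CPUID instruction is executed.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_LEAF_EXT_FEATURES	0x80000008u	/* leaf that carries the bit */
#define VIRT_SSBD_EBX_BIT	25		/* 0x80000008_EBX[25] */
#define MSR_VIRT_SPEC_CTRL_ADDR	0xc001011fu	/* the new MSR */

/* Return true if the supplied EBX value advertises virtualized SSBD. */
static bool ebx_has_virt_ssbd(uint32_t ebx)
{
	return ((ebx >> VIRT_SSBD_EBX_BIT) & 1u) != 0;
}
```

A guest would only touch the MSR at MSR_VIRT_SPEC_CTRL_ADDR after this check
succeeds.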

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
ccbcd26744 x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
1f50ddb4f4 x86/speculation: Handle HT correctly on AMD
The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when
hyperthreading is enabled the SSBD bit toggle needs to take both sibling threads into
account. Otherwise the following situation can happen:

CPU0		CPU1

disable SSB
		disable SSB
		enable  SSB <- Enables it for the Core, i.e. for CPU0 as well

So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled
again.

On Intel the SSBD control is per core as well, but the synchronization
logic is implemented behind the per thread SPEC_CTRL MSR. It works like
this:

  CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL

i.e. if one of the threads enables a mitigation then this affects both and
the mitigation is only disabled in the core when both threads disabled it.

Add the necessary synchronization logic for AMD family 17H. Unfortunately
that requires a spinlock to serialize the access to the MSR, but the locks
are only shared between siblings.
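
One way to model this kind of sibling coordination is a per-core count of
threads that currently want SSBD, guarded by a lock shared only by the
threads of that core. The sketch below is a user-space illustration using a
C11 atomic_flag as the spinlock, not the kernel implementation. struct
core_ssb_state, CORE_SSB_STATE_INIT and thread_set_ssbd() are hypothetical
names, and msr_ssbd merely stands in for the per-core MSR bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core state shared by the sibling threads. */
struct core_ssb_state {
	atomic_flag lock;		/* spinlock shared between siblings */
	unsigned int ssbd_users;	/* threads that currently want SSBD */
	bool msr_ssbd;			/* models the per-core SSBD MSR bit */
};

#define CORE_SSB_STATE_INIT { ATOMIC_FLAG_INIT, 0, false }

static void core_lock(struct core_ssb_state *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the sibling releases the lock */
}

static void core_unlock(struct core_ssb_state *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Update one thread's SSBD request. The core bit is set when the first
 * thread asks for it and cleared only when the last thread drops it.
 */
static void thread_set_ssbd(struct core_ssb_state *c, bool *thread_ssbd,
			    bool enable)
{
	core_lock(c);
	if (enable && !*thread_ssbd) {
		if (c->ssbd_users++ == 0)
			c->msr_ssbd = true;
	} else if (!enable && *thread_ssbd) {
		if (--c->ssbd_users == 0)
			c->msr_ssbd = false;
	}
	*thread_ssbd = enable;
	core_unlock(c);
}
```

Replaying the CPU0/CPU1 sequence from the diagram against this model leaves
msr_ssbd set after CPU1 drops its request, because CPU0 still holds one.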

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
d1035d9718 x86/cpufeatures: Add FEATURE_ZEN
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations
can be built for ZEN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
52817587e7 x86/cpufeatures: Disentangle SSBD enumeration
The SSBD enumeration is, like the other bits, magically shared
between Intel and AMD even though the mechanisms are different.

Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific
features or family dependent setup.

Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is
controlled via MSR_SPEC_CTRL and fix up the usage sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:17 +02:00
Thomas Gleixner
7eb8956a7f x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on
Intel and implied by IBRS or STIBP support on AMD. That's just confusing,
and if an AMD CPU does not support IBRS because the underlying
problem has been fixed, but has another valid bit in the SPEC_CTRL MSR,
the thing falls apart.

Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the
availability on both Intel and AMD.

While at it replace the boot_cpu_has() checks with static_cpu_has() where
possible. This prevents late microcode loading from exposing SPEC_CTRL, but
late loading is already very limited as it does not reevaluate the
mitigation options and other bits and pieces. Having static_cpu_has() is
the simplest and least fragile solution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:17 +02:00