Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Add CONFIG_REFCOUNT_FULL=y to allow the disabling of the 'full'
(robustness checked) refcount_t implementation with slightly lower
runtime overhead. (Kees Cook)
The lighter weight variant is the default. The two variants use the
same API. Having this variant was a precondition by some
maintainers to merge refcount_t cleanups.
- Add lockdep support for rtmutexes (Peter Zijlstra)
- liblockdep fixes and improvements (Sasha Levin, Ben Hutchings)
- ... misc fixes and improvements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
locking/refcount: Remove the half-implemented refcount_sub() API
locking/refcount: Create unchecked atomic_t implementation
locking/rtmutex: Don't initialize lockdep when not required
locking/selftest: Add RT-mutex support
locking/selftest: Remove the bad unlock ordering test
rt_mutex: Add lockdep annotations
MAINTAINERS: Claim atomic*_t maintainership
locking/x86: Remove the unused atomic_inc_short() methd
tools/lib/lockdep: Remove private kernel headers
tools/lib/lockdep: Hide liblockdep output from test results
tools/lib/lockdep: Add dummy current_gfp_context()
tools/include: Add IS_ERR_OR_NULL to err.h
tools/lib/lockdep: Add empty __is_[module,kernel]_percpu_address
tools/lib/lockdep: Include err.h
tools/include: Add (mostly) empty include/linux/sched/mm.h
tools/lib/lockdep: Use LDFLAGS
tools/lib/lockdep: Remove double-quotes from soname
tools/lib/lockdep: Fix object file paths used in an out-of-tree build
tools/lib/lockdep: Fix compilation for 4.11
tools/lib/lockdep: Don't mix fd-based and stream IO
...
* pm-sleep:
PM: hibernate: constify attribute_group structures.
PM / hibernate: Drop redundant parameter of swsusp_alloc()
PM / hibernate: Use CONFIG_HAVE_SET_MEMORY for include condition
x86/power/64: Use char arrays for asm function names
* pm-tools:
cpupower: Add support for new AMD family 0x17
cpupower: Fix bug where return value was not used
tools/power turbostat: update version number
tools/power turbostat: decode MSR_IA32_MISC_ENABLE only on Intel
tools/power turbostat: stop migrating, unless '-m'
tools/power turbostat: if --debug, print sampling overhead
tools/power turbostat: hide SKL counters, when not requested
intel_pstate: use updated msr-index.h HWP.EPP values
tools/power x86_energy_perf_policy: support HWP.EPP
x86: msr-index.h: fix shifts to ULL results in HWP macros.
x86: msr-index.h: define HWP.EPP values
x86: msr-index.h: define EPB mid-points
Userspace application can do a hypercall through /dev/xen/privcmd, and
some for some hypercalls argument is a pointers to user-provided
structure. When SMAP is supported and enabled, hypervisor can't access.
So, lets allow it.
The same applies to HYPERVISOR_dm_op, where additionally privcmd driver
carefully verify buffer addresses.
Cc: stable@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Adds the plumbing to disable A/D bits in the MMU based on a new role
bit, ad_disabled. When A/D is disabled, the MMU operates as though A/D
aren't available (i.e., using access tracking faults instead).
To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
place, A/D disablement is now stored in the SPTE. This state is stored
in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
tracking. Rather than just setting SPTE_SPECIAL_MASK when an
access-tracking SPTE is non-present, we now always set
SPTE_SPECIAL_MASK for access-tracking SPTEs.
Signed-off-by: Peter Feiner <pfeiner@google.com>
[Use role.ad_disabled even for direct (non-shadow) EPT page tables. Add
documentation and a few MMU_WARN_ONs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the Hyper-V vPCI driver to use the Server-2016 version of the vPCI
protocol, fixing MSI creation and retargeting issues.
Signed-off-by: Jork Loeser <jloeser@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Some function pointer structures are used externally to the kernel, like
the paravirt structures. These should never be randomized, so mark them
as such, in preparation for enabling randstruct's automatic selection
of all-function-pointer structures.
These markings are verbatim from Brad Spengler/PaX Team's code in the
last public patch of grsecurity/PaX based on my understanding of the
code. Changes or omissions from the original code are mine and don't
reflect the original grsecurity/PaX code.
Signed-off-by: Kees Cook <keescook@chromium.org>
This marks many critical kernel structures for randomization. These are
structures that have been targeted in the past in security exploits, or
contain functions pointers, pointers to function pointer tables, lists,
workqueues, ref-counters, credentials, permissions, or are otherwise
sensitive. This initial list was extracted from Brad Spengler/PaX Team's
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.
Left out of this list is task_struct, which requires special handling
and will be covered in a subsequent patch.
Signed-off-by: Kees Cook <keescook@chromium.org>
KVM/ARM updates for 4.13
- vcpu request overhaul
- allow timer and PMU to have their interrupt number
selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
Conflicts:
arch/s390/include/asm/kvm_host.h
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
And instead wire it up as method for all the dma_map_ops instances.
Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull x86 fix from Thomas Gleixner:
"A single fix to unbreak the vdso32 build for 64bit kernels caused by
excess #includes in the mshyperv header"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mshyperv: Remove excess #includes from mshyperv.h
A recent commit included linux/slab.h in linux/irq.h. This breaks the build
of vdso32 on a 64-bit kernel.
The reason is that linux/irq.h gets included into the vdso code via
linux/interrupt.h which is included from asm/mshyperv.h. That makes the
32-bit vdso compile fail, because slab.h includes the pgtable headers for
64-bit on a 64-bit build.
Neither linux/clocksource.h nor linux/interrupt.h are needed in the
mshyperv.h header file itself - it has a dependency on <linux/atomic.h>.
Remove the includes and unbreak the build.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Fixes: dee863b571 ("hv: export current Hyper-V clocksource")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1706231038460.2647@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.
KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice. This does not affect Linux guests, as they use an IST or task gate
for #DB.
This fixes CVE-2017-7518.
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Originally, Linux reloaded the LDT whenever the prev mm or the next
mm had an LDT. It was changed in 2002 in:
0bbed3beb4f2 ("[PATCH] Thread-Local Storage (TLS) support")
(commit from the historical tree), like this:
- /* load_LDT, if either the previous or next thread
- * has a non-default LDT.
+ /*
+ * load the LDT, if the LDT is different:
*/
- if (next->context.size+prev->context.size)
+ if (unlikely(prev->context.ldt != next->context.ldt))
load_LDT(&next->context);
The current code is unlikely to avoid any LDT reloads, since different
mms won't share an LDT.
When we redo lazy mode to stop flush IPIs without switching to
init_mm, though, the current logic would become incorrect: it will
be possible to have real_prev == next but nonetheless have a stale
LDT descriptor.
Simplify the code to update LDTR if either the previous or the next
mm has an LDT, i.e. effectively restore the historical logic..
While we're at it, clean up the code by moving all the ifdeffery to
a header where it belongs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/2a859ac01245f9594c58f9d0a8b2ed8a7cd2507e.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This switches the hibernate_64.S function names into character arrays
to match other areas of the kernel where this is done (e.g., linker
scripts). Specifically this fixes a compile-time error noticed by the
future CONFIG_FORTIFY_SOURCE routines that complained about PAGE_SIZE
being copied out of the "single byte" core_restore_code variable.
Additionally drops the "acpi_save_state_mem" exern which does not
appear to be used anywhere else in the kernel.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Originally, generated-y and genhdr-y had different meaning, like
follows:
- generated-y: generated headers (other than asm-generic wrappers)
- header-y : headers to be exported
- genhdr-y : generated headers to be exported (generated-y + header-y)
Since commit fcc8487d47 ("uapi: export all headers under uapi
directories"), headers under UAPI directories are all exported.
So, there is no more difference between generated-y and genhdr-y.
We see two users of genhdr-y, arch/{arm,x86}/include/uapi/asm/Kbuild.
They generate some headers in arch/{arm,x86}/include/generated/uapi/asm
directories, which are obviously exported.
Replace them with generated-y, and abolish genhdr-y.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Conflicts:
kernel/sched/Makefile
Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
libnvdimm and the pmem driver directly we do not need to provide global
wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
driver will simply not link to arch_wb_cache_pmem() in that case. Same
as before, pmem flushing is only defined for x86_64, via
clean_cache_range(), but it is straightforward to add other archs in the
future.
arch_wb_cache_pmem() is an exported function since the pmem module needs
to find it, but it is privately declared in drivers/nvdimm/pmem.h because
there are no consumers outside of the pmem driver.
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The clear_pmem() helper simply combines a memset() plus a cache flush.
Now that the flush routine is optionally provided by the dax device
driver we can avoid unnecessary cache management on dax devices fronting
volatile memory.
With clear_pmem() gone we can follow on with a patch to make pmem cache
management completely defined within the pmem driver.
Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that all possible providers of the dax_operations copy_from_iter
method are implemented, switch filesytem-dax to call the driver rather
than copy_to_iter_pmem.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".
Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.
This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].
The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>