Commit Graph

58753 Commits

Author SHA1 Message Date
Arnd Bergmann
bad19e0d04 Merge tag 'tee-drv-dynamic-shm-for-v4.16' of https://git.linaro.org/people/jens.wiklander/linux-tee into next/drivers
Pull "tee dynamic shm for v4.16" from Jens Wiklander:

This pull request enables dynamic shared memory support in the TEE
subsystem as a whole and in OP-TEE in particular.

Global Platform TEE specification [1] allows client applications
to register part of own memory as a shared buffer between
application and TEE. This allows fast zero-copy communication between
TEE and REE. But current implementation of TEE in Linux does not support
this feature.

Also, current implementation of OP-TEE transport uses fixed size
pre-shared buffer for all communications with OP-TEE OS. This is okay
in the most use cases. But this prevents use of OP-TEE in virtualized
environments, because:
 a) We can't share the same buffer between different virtual machines
 b) Physically contiguous memory as seen by VM can be non-contiguous
    in reality (and as seen by OP-TEE OS) due to second stage of
    MMU translation.
 c) Size of this pre-shared buffer is limited.

So, first part of this pull request adds generic register/unregister
interface to tee subsystem. The second part adds necessary features into
OP-TEE driver, so it can use not only static pre-shared buffer, but
whole RAM to communicate with OP-TEE OS.

This change is backwards compatible allowing older secure world or
user space to work with newer kernels and vice versa.

[1] https://www.globalplatform.org/specificationsdevice.asp

* tag 'tee-drv-dynamic-shm-for-v4.16' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  tee: shm: inline tee_shm_get_id()
  tee: use reference counting for tee_context
  tee: optee: enable dynamic SHM support
  tee: optee: add optee-specific shared pool implementation
  tee: optee: store OP-TEE capabilities in private data
  tee: optee: add registered buffers handling into RPC calls
  tee: optee: add registered shared parameters handling
  tee: optee: add shared buffer registration functions
  tee: optee: add page list manipulation functions
  tee: optee: Update protocol definitions
  tee: shm: add page accessor functions
  tee: shm: add accessors for buffer size and page offset
  tee: add register user memory
  tee: flexible shared memory pool creation
2017-12-21 17:23:52 +01:00
Arnd Bergmann
a8e9f5f672 Merge tag 'tee-drv-async-supplicant-for-v4.16' of https://git.linaro.org/people/jens.wiklander/linux-tee into next/drivers
Pull "Enable async communication with tee supplicant" from Jens Wiklander:

This pull request enables asynchronous communication with TEE supplicant
by introducing meta parameters in the user space API. The meta
parameters can be used to tag requests with an id that can be matched
against an asynchronous response as is done here in the OP-TEE driver.

Asynchronous supplicant communication is needed by OP-TEE to implement
GlobalPlatforms TEE Sockets API Specification v1.0.1. The specification
is available at https://www.globalplatform.org/specificationsdevice.asp.

This change is backwards compatible allowing older supplicants to work
with newer kernels and vice versa.

* tag 'tee-drv-async-supplicant-for-v4.16' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  optee: support asynchronous supplicant requests
  tee: add TEE_IOCTL_PARAM_ATTR_META
  tee: add tee_param_is_memref() for driver use
2017-12-21 16:02:07 +01:00
Linus Torvalds
6ba64feff6 Merge branch 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Page Table Isolation (PTI) preparatory tree from Ingo Molnar:
 "This does a rename to free up linux/pti.h to be used by the upcoming
  page table isolation feature"

* 'WIP.x86-pti.base.prep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/misc/intel/pti: Rename the header file to free up the namespace
2017-12-17 13:54:31 -08:00
Ingo Molnar
1784f9144b drivers/misc/intel/pti: Rename the header file to free up the namespace
We'd like to use the 'PTI' acronym for 'Page Table Isolation' - free up the
namespace by renaming the <linux/pti.h> driver header to <linux/intel-pti.h>.

(Also standardize the header guard name while at it.)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: J Freyensee <james_p_freyensee@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-17 12:52:34 +01:00
Linus Torvalds
7a3c296ae0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.

 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
    Brueckner.

 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
    Berg.

 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.

 5) Don't advertise gigabit in sh_eth when not available, from Thomas
    Petazzoni.

 6) Check network namespace when delivering to netlink taps, from Kevin
    Cernekee.

 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.

 8) Use correct address in TCP md5 lookups when replying to an incoming
    segment, from Christoph Paasch.

 9) Add schedule points to BPF map alloc/free, from Eric Dumazet.

10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
    from Eric Dumazet.

11) Fix SKB leak in tipc, from Jon Maloy.

12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.

13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn.

14) Add some new qmi_wwan device IDs, from Daniele Palmas.

15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: qcom/emac: Reduce timeout for mdio read/write
  net: sched: fix static key imbalance in case of ingress/clsact_init error
  net: sched: fix clsact init error path
  ip_gre: fix wrong return value of erspan_rcv
  net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
  pkt_sched: Remove TC_RED_OFFLOADED from uapi
  net: sched: Move to new offload indication in RED
  net: sched: Add TCA_HW_OFFLOAD
  net: aquantia: Increment driver version
  net: aquantia: Fix typo in ethtool statistics names
  net: aquantia: Update hw counters on hw init
  net: aquantia: Improve link state and statistics check interval callback
  net: aquantia: Fill in multicast counter in ndev stats from hardware
  net: aquantia: Fill ndev stat couters from hardware
  net: aquantia: Extend stat counters to 64bit values
  net: aquantia: Fix hardware DMA stream overload on large MRRS
  net: aquantia: Fix actual speed capabilities reporting
  sock: free skb in skb_complete_tx_timestamp on error
  s390/qeth: update takeover IPs after configuration change
  s390/qeth: lock IP table while applying takeover changes
  ...
2017-12-15 13:08:37 -08:00
Linus Torvalds
1f76a75561 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Misc fixes:

   - Fix a S390 boot hang that was caused by the lock-break logic.
     Remove lock-break to begin with, as review suggested it was
     unreasonably fragile and our confidence in its continued good
     health is lower than our confidence in its removal.

   - Remove the lockdep cross-release checking code for now, because of
     unresolved false positive warnings. This should make lockdep work
     well everywhere again.

   - Get rid of the final (and single) ACCESS_ONCE() straggler and
     remove the API from v4.15.

   - Fix a liblockdep build warning"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools/lib/lockdep: Add missing declaration of 'pr_cont()'
  checkpatch: Remove ACCESS_ONCE() warning
  compiler.h: Remove ACCESS_ONCE()
  tools/include: Remove ACCESS_ONCE()
  tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
  locking/lockdep: Remove the cross-release locking checks
  locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
  locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
2017-12-15 11:44:59 -08:00
Volodymyr Babchuk
ef8e08d24c tee: shm: inline tee_shm_get_id()
Now, when struct tee_shm is defined in public header,
we can inline small getter functions like this one.

Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2017-12-15 13:36:21 +01:00
Volodymyr Babchuk
217e0250cc tee: use reference counting for tee_context
We need to ensure that tee_context is present until last
shared buffer will be freed.

Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2017-12-15 13:36:18 +01:00
Volodymyr Babchuk
e0c69ae8bf tee: shm: add page accessor functions
In order to register a shared buffer in TEE, we need accessor
function that return list of pages for that buffer.

Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2017-12-15 13:32:30 +01:00
Volodymyr Babchuk
b25946ad95 tee: shm: add accessors for buffer size and page offset
These two function will be needed for shared memory registration in OP-TEE

Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2017-12-15 13:32:30 +01:00
Jens Wiklander
033ddf12bc tee: add register user memory
Added new ioctl to allow users register own buffers as a shared memory.

Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
[jw: moved tee_shm_is_registered() declaration]
[jw: added space after __tee_shm_alloc() implementation]
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2017-12-15 13:32:20 +01:00
Jens Wiklander
e2aca5d892 tee: flexible shared memory pool creation
Makes creation of shm pools more flexible by adding new more primitive
functions to allocate a shm pool. This makes it easier to add driver
specific shm pool management.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
2017-12-15 12:37:29 +01:00
Linus Torvalds
032b4cc8ff Merge tag 'pm-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
 "This fixes an issue in two recent commits that may cause
  pm_runtime_enable() to be called for too many times for some devices
  during the "thaw" transition belonging to hibernation"

* tag 'pm-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume()
2017-12-14 18:25:03 -08:00
Linus Torvalds
0424378781 Merge tag 'trace-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Various fix-ups:

   - comment fixes

   - build fix

   - better memory alloction (don't use NR_CPUS)

   - configuration fix

   - build warning fix

   - enhanced callback parameter (to simplify users of trace hooks)

   - give up on stack tracing when RCU isn't watching (it's a lost
     cause)"

* tag 'trace-v4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have stack trace not record if RCU is not watching
  tracing: Pass export pointer as argument to ->write()
  ring-buffer: Remove unused function __rb_data_page_index()
  tracing: make PREEMPTIRQ_EVENTS depend on TRACING
  tracing: Allocate mask_str buffer dynamically
  tracing: always define trace_{irq,preempt}_{enable_disable}
  tracing: Fix code comments in trace.c
2017-12-14 18:21:33 -08:00
Linus Torvalds
c4f988ee51 Merge tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:

 - add a pci_get_domain_bus_and_slot() stub for the CONFIG_PCI=n case to
   avoid build breakage in the v4.16 merge window if a
   pci_get_bus_and_slot() -> pci_get_domain_bus_and_slot() patch gets
   merged before the PCI tree (Randy Dunlap)

 - fix an AMD boot regression in the 64bit BAR support added in v4.15
   (Christian König)

 - fix an R-Car use-after-free that causes a crash if no PCIe card is
   present (Geert Uytterhoeven)

* tag 'pci-v4.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: rcar: Fix use-after-free in probe error path
  x86/PCI: Only enable a 64bit BAR on single-socket AMD Family 15h
  x86/PCI: Fix infinite loop in search for 64bit BAR placement
  PCI: Add pci_get_domain_bus_and_slot() stub
2017-12-14 17:02:39 -08:00
Michal Hocko
4837fe37ad mm, oom_reaper: fix memory corruption
David Rientjes has reported the following memory corruption while the
oom reaper tries to unmap the victims address space

  BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000
  addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
  file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
  CPU: 2 PID: 1001 Comm: oom_reaper
  Call Trace:
     unmap_page_range+0x1068/0x1130
     __oom_reap_task_mm+0xd5/0x16b
     oom_reaper+0xff/0x14c
     kthread+0xc1/0xe0

Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient.  We only synchronize with the oom reaper if
tsk_is_oom_victim which is not true if the final __mmput is called from
a different context than the oom victim exit path.  This can trivially
happen from context of any task which has grabbed mm reference (e.g.  to
read /proc/<pid>/ file which requires mm etc.).

The race would look like this

  oom_reaper		oom_victim		task
						mmget_not_zero
			do_exit
			  mmput
  __oom_reap_task_mm				mmput
  						  __mmput
						    exit_mmap
						      remove_vma
    unmap_page_range

Fix this issue by providing a new mm_is_oom_victim() helper which
operates on the mm struct rather than a task.  Any context which
operates on a remote mm struct should use this helper in place of
tsk_is_oom_victim.  The flag is set in mark_oom_victim and never cleared
so it is stable in the exit_mmap path.

Debugged by Tetsuo Handa.

Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org
Fixes: 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: <stable@vger.kernel.org>	[4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:49 -08:00
Thiago Rafael Becker
bdcf0a423e kernel: make groups_sort calling a responsibility group_info allocators
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:49 -08:00
Arnd Bergmann
3756f6401c exec: avoid gcc-8 warning for get_task_comm
gcc-8 warns about using strncpy() with the source size as the limit:

  fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]

This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.

This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.

There are only 23 callsites which I all reviewed to ensure this is
currently the case.  We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.

Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:48 -08:00
Arnd Bergmann
146734b091 string.h: workaround for increased stack usage
The hardened strlen() function causes rather large stack usage in at
least one file in the kernel, in particular when CONFIG_KASAN is
enabled:

  drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init':
  drivers/media/usb/em28xx/em28xx-dvb.c:2062:1: error: the frame size of 3256 bytes is larger than 204 bytes [-Werror=frame-larger-than=]

Analyzing this problem led to the discovery that gcc fails to merge the
stack slots for the i2c_board_info[] structures after we strlcpy() into
them, due to the 'noreturn' attribute on the source string length check.

I reported this as a gcc bug, but it is unlikely to get fixed for gcc-8,
since it is relatively easy to work around, and it gets triggered
rarely.  An earlier workaround I did added an empty inline assembly
statement before the call to fortify_panic(), which works surprisingly
well, but is really ugly and unintuitive.

This is a new approach to the same problem, this time addressing it by
not calling the 'extern __real_strnlen()' function for string constants
where __builtin_strlen() is a compile-time constant and therefore known
to be safe.

We do this by checking if the last character in the string is a
compile-time constant '\0'.  If it is, we can assume that strlen() of
the string is also constant.

As a side-effect, this should also improve the object code output for
any other call of strlen() on a string constant.

[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20171205215143.3085755-1-arnd@arndb.de
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: https://patchwork.kernel.org/patch/9980413/
Link: https://patchwork.kernel.org/patch/9974047/
Fixes: 6974f0c455 ("include/linux/string.h: add the option of fortified string.h functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martin Wilck <mwilck@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:48 -08:00
Chris Wilson
338f1d9d1b lib/rbtree,drm/mm: add rbtree_replace_node_cached()
Add a variant of rbtree_replace_node() that maintains the leftmost cache
of struct rbtree_root_cached when replacing nodes within the rbtree.

As drm_mm is the only rb_replace_node() being used on an interval tree,
the mistake looks fairly self-contained.  Furthermore the only user of
drm_mm_replace_node() is its testsuite...

Testcase: igt/drm_mm/replace

Link: http://lkml.kernel.org/r/20171122100729.3742-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20171109212435.9265-1-chris@chris-wilson.co.uk
Fixes: f808c13fd3 ("lib/interval_tree: fast overlap detection")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:48 -08:00
Wei Wang
c47d7f56e9 include/linux/idr.h: add #include <linux/bug.h>
The <linux/bug.h> was removed from radix-tree.h by commit f5bba9d11a
("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>").

Since that commit, tools/testing/radix-tree/ couldn't pass compilation
due to tools/testing/radix-tree/idr.c:17: undefined reference to
WARN_ON_ONCE.  This patch adds the bug.h header to idr.h to solve the
issue.

Link: http://lkml.kernel.org/r/1511963726-34070-2-git-send-email-wei.w.wang@intel.com
Fixes: f5bba9d11a ("include/linux/radix-tree.h: remove unneeded #include <linux/bug.h>")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:48 -08:00
Mark Rutland
b899a85043 compiler.h: Remove ACCESS_ONCE()
There are no longer any kernelspace uses of ACCESS_ONCE(), so we can
remove the definition from <linux/compiler.h>.

This patch removes the ACCESS_ONCE() definition, and updates comments
which referred to it. At the same time, some inconsistent and redundant
whitespace is removed from comments.

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Link: http://lkml.kernel.org/r/20171127103824.36526-4-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12 13:22:10 +01:00
Ingo Molnar
e966eaeeb6 locking/lockdep: Remove the cross-release locking checks
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y),
while it found a number of old bugs initially, was also causing too many
false positives that caused people to disable lockdep - which is arguably
a worse overall outcome.

If we disable cross-release by default but keep the code upstream then
in practice the most likely outcome is that we'll allow the situation
to degrade gradually, by allowing entropy to introduce more and more
false positives, until it overwhelms maintenance capacity.

Another bad side effect was that people were trying to work around
the false positives by uglifying/complicating unrelated code. There's
a marked difference between annotating locking operations and
uglifying good code just due to bad lock debugging code ...

This gradual decrease in quality happened to a number of debugging
facilities in the kernel, and lockdep is pretty complex already,
so we cannot risk this outcome.

Either cross-release checking can be done right with no false positives,
or it should not be included in the upstream kernel.

( Note that it might make sense to maintain it out of tree and go through
  the false positives every now and then and see whether new bugs were
  introduced. )

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12 12:38:51 +01:00
Will Deacon
d89c70356a locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
When CONFIG_GENERIC_LOCKBEAK=y, locking structures grow an extra int ->break_lock
field which is used to implement raw_spin_is_contended() by setting the field
to 1 when waiting on a lock and clearing it to zero when holding a lock.
However, there are a few problems with this approach:

  - There is a write-write race between a CPU successfully taking the lock
    (and subsequently writing break_lock = 0) and a waiter waiting on
    the lock (and subsequently writing break_lock = 1). This could result
    in a contended lock being reported as uncontended and vice-versa.

  - On machines with store buffers, nothing guarantees that the writes
    to break_lock are visible to other CPUs at any particular time.

  - READ_ONCE/WRITE_ONCE are not used, so the field is potentially
    susceptible to harmful compiler optimisations,

Consequently, the usefulness of this field is unclear and we'd be better off
removing it and allowing architectures to implement raw_spin_is_contended() by
providing a definition of arch_spin_is_contended(), as they can when
CONFIG_GENERIC_LOCKBREAK=n.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-12 11:24:01 +01:00
Michael S. Tsirkin
a8ceb5dbfd ptr_ring: add barriers
Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array.  This was observed causing crashes.

To fix, add memory barriers.  The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian <george.cherian@cavium.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11 10:52:23 -05:00
Rafael J. Wysocki
3487972d7f PM / sleep: Avoid excess pm_runtime_enable() calls in device_resume()
Middle-layer code doing suspend-time optimizations for devices with
the DPM_FLAG_SMART_SUSPEND flag set (currently, the PCI bus type and
the ACPI PM domain) needs to make the core skip ->thaw_early and
->thaw callbacks for those devices in some cases and it sets the
power.direct_complete flag for them for this purpose.

However, it turns out that setting power.direct_complete outside of
the PM core is a bad idea as it triggers an excess invocation of
pm_runtime_enable() in device_resume().

For this reason, provide a helper to clear power.is_late_suspended
and power.is_suspended to be invoked by the middle-layer code in
question instead of setting power.direct_complete and make that code
call the new helper.

Fixes: c4b65157ae (PCI / PM: Take SMART_SUSPEND driver flag into account)
Fixes: 05087360fd (ACPI / PM: Take SMART_SUSPEND driver flag into account)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-12-11 14:32:56 +01:00
Linus Torvalds
c465fc11e5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
 "ARM:
   - A number of issues in the vgic discovered using SMATCH
   - A bit one-off calculation in out stage base address mask (32-bit
     and 64-bit)
   - Fixes to single-step debugging instructions that trap for other
     reasons such as MMMIO aborts
   - Printing unavailable hyp mode as error
   - Potential spinlock deadlock in the vgic
   - Avoid calling vgic vcpu free more than once
   - Broken bit calculation for big endian systems

 s390:
   - SPDX tags
   - Fence storage key accesses from problem state
   - Make sure that irq_state.flags is not used in the future

  x86:
   - Intercept port 0x80 accesses to prevent host instability (CVE)
   - Use userspace FPU context for guest FPU (mainly an optimization
     that fixes a double use of kernel FPU)
   - Do not leak one page per module load
   - Flush APIC page address cache from MMU invalidation notifiers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: x86: fix APIC page invalidation
  KVM: s390: Fix skey emulation permission check
  KVM: s390: mark irq_state.flags as non-usable
  KVM: s390: Remove redundant license text
  KVM: s390: add SPDX identifiers to the remaining files
  KVM: VMX: fix page leak in hardware_setup()
  KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
  x86,kvm: remove KVM emulator get_fpu / put_fpu
  x86,kvm: move qemu/guest FPU switching out to vcpu_run
  KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
  KVM: arm/arm64: kvm_arch_destroy_vm cleanups
  KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner
  kvm: arm: don't treat unavailable HYP mode as an error
  KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic
  kvm: arm64: handle single-step of hyp emulated mmio instructions
  kvm: arm64: handle single-step during SError exceptions
  kvm: arm64: handle single-step of userspace mmio instructions
  kvm: arm64: handle single-stepping trapped instructions
  KVM: arm/arm64: debug: Introduce helper for single-step
  arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
  ...
2017-12-10 08:24:16 -08:00
Michal Hocko
f335195adf kmemcheck: rip it out for real
Commit 4675ff05de ("kmemcheck: rip it out") has removed the code but
for some reason SPDX header stayed in place.  This looks like a rebase
mistake in the mmotm tree or the merge mistake.  Let's drop those
leftovers as well.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-08 13:40:17 -08:00
Linus Torvalds
e9ef1fe312 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
    drivers).

 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
    to userspace and broke some apps. From Johannes Berg.

 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
    Wang.

 4) Gianfar MAC can't do EEE so don't advertise it by default, from
    Claudiu Manoil.

 5) Relax strict netlink attribute validation, but emit a warning. From
    David Ahern.

 6) Fix regression in checksum offload of thunderx driver, from Florian
    Westphal.

 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.

 8) New card support in iwlwifi, from Ihab Zhaika.

 9) BBR congestion control bug fixes from Neal Cardwell.

10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.

11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.

12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.

13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  net: mvpp2: fix the RSS table entry offset
  tcp: evaluate packet losses upon RTT change
  tcp: fix off-by-one bug in RACK
  tcp: always evaluate losses in RACK upon undo
  tcp: correctly test congestion state in RACK
  bnxt_en: Fix sources of spurious netpoll warnings
  tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
  tcp_bbr: reset full pipe detection on loss recovery undo
  tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
  sfc: pass valid pointers from efx_enqueue_unwind
  gianfar: Disable EEE autoneg by default
  tcp: invalidate rate samples during SACK reneging
  can: peak/pcie_fd: fix potential bug in restarting tx queue
  can: usb_8dev: cancel urb on -EPIPE and -EPROTO
  can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
  can: esd_usb2: cancel urb on -EPIPE and -EPROTO
  can: ems_usb: cancel urb on -EPIPE and -EPROTO
  can: mcba_usb: cancel urb on -EPROTO
  usbnet: fix alignment for frames with no ethernet header
  tcp: use current time in tcp_rcv_space_adjust()
  ...
2017-12-08 13:32:44 -08:00
Yousuk Seung
d4761754b4 tcp: invalidate rate samples during SACK reneging
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.

< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000

In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.

This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.

Fixes: b9f64820fb ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08 10:07:02 -05:00
David S. Miller
195bd525d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-06

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fixing broken uapi for BPF tracing programs for s390 and arm64
   architectures due to pt_regs being in-kernel only, and not part
   of uapi right now. A wrapper is added that exports pt_regs in
   an asm-generic way. For arm64 this maps to existing user_pt_regs
   structure and for s390 a user_pt_regs structure exporting the
   beginning of pt_regs is added and uapi-exported, thus fixing the
   BPF issues seen in perf (and BPF selftests), all from Hendrik.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 16:22:51 -05:00
Bjørn Mork
a4abd7a80a usbnet: fix alignment for frames with no ethernet header
The qmi_wwan minidriver support a 'raw-ip' mode where frames are
received without any ethernet header. This causes alignment issues
because the skbs allocated by usbnet are "IP aligned".

Fix by allowing minidrivers to disable the additional alignment
offset. This is implemented using a per-device flag, since the same
minidriver also supports 'ethernet' mode.

Fixes: 32f7adf633 ("net: qmi_wwan: support "raw IP" mode")
Reported-and-tested-by: Jay Foster <jay@systech.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-07 14:32:30 -05:00
Linus Torvalds
61d6be3a7a Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Two fixes: use bool type consistently, plus a irq_matrix_available()
  bugfix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdesc: Use bool return type instead of int
  genirq/matrix: Fix the precedence fix for real
2017-12-06 15:47:51 -08:00
Randy Dunlap
7912af5c83 PCI: Add pci_get_domain_bus_and_slot() stub
The coretemp driver build fails when CONFIG_PCI is not enabled because it
uses a function that does not have a stub for that config case, so add the
function stub.

  ../drivers/hwmon/coretemp.c: In function 'adjust_tjmax':
  ../drivers/hwmon/coretemp.c:250:9: error: implicit declaration of function 'pci_get_domain_bus_and_slot' [-Werror=implicit-function-declaration]
    struct pci_dev *host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn);
  ../drivers/hwmon/coretemp.c:250:32: warning: initialization makes pointer from integer without a cast [enabled by default]
    struct pci_dev *host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn);

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
[bhelgaas: identical patch also by Arnd Bergmann <arnd@arndb.de>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
2017-12-06 14:55:05 -06:00
Greg Kroah-Hartman
af97a77bc0 efi: Move some sysfs files to be read-only by root
Thanks to the scripts/leaking_addresses.pl script, it was found that
some EFI values should not be readable by non-root users.

So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to
make this easier, and use it in other places at the same time.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: stable <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-06 19:31:39 +01:00
Eric Dumazet
d7efc6c11b net: remove hlist_nulls_add_tail_rcu()
Alexander Potapenko reported use of uninitialized memory [1]

This happens when inserting a request socket into TCP ehash,
in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.

Bug was added by commit d894ba18d4 ("soreuseport: fix ordering for
mixed v4/v6 sockets")

Note that d296ba60d8 ("soreuseport: Resolve merge conflict for v4/v6
ordering fix") missed the opportunity to get rid of
hlist_nulls_add_tail_rcu() :

Both UDP sockets and TCP/DCCP listeners no longer use
__sk_nulls_add_node_rcu() for their hash insertion.

Since all other sockets have unique 4-tuple, the reuseport status
has no special meaning, so we can always use hlist_nulls_add_head_rcu()
for them and save few cycles/instructions.

[1]

==================================================================
BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x185/0x1d0 lib/dump_stack.c:52
 kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
 __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
 __sk_nulls_add_node_rcu ./include/net/sock.h:684
 inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
 reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
 inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
 tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
 tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
 tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
 tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
 tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
 ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
 NF_HOOK ./include/linux/netfilter.h:248
 ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
 dst_input ./include/net/dst.h:477
 ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
 NF_HOOK ./include/linux/netfilter.h:248
 ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
 __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
 __netif_receive_skb net/core/dev.c:4336
 netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
 napi_skb_finish net/core/dev.c:4858
 napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
 e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
 e1000_clean_rx_irq+0x1492/0x1d30
drivers/net/ethernet/intel/e1000/e1000_main.c:4474
 e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
 napi_poll net/core/dev.c:5500
 net_rx_action+0x73c/0x1820 net/core/dev.c:5566
 __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364
 irq_exit+0x203/0x240 kernel/softirq.c:405
 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
 do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
 common_interrupt+0x86/0x86

Fixes: d894ba18d4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
Fixes: d296ba60d8 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Potapenko <glider@google.com>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 18:06:09 -05:00
Rik van Riel
f775b13eed x86,kvm: move qemu/guest FPU switching out to vcpu_run
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:

    [258270.527947]  __warn+0xcb/0xf0
    [258270.527948]  warn_slowpath_null+0x1d/0x20
    [258270.527951]  kernel_fpu_disable+0x3f/0x50
    [258270.527953]  __kernel_fpu_begin+0x49/0x100
    [258270.527955]  kernel_fpu_begin+0xe/0x10
    [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961]  crypto_shash_update+0x3f/0x110
    [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
    [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002]  shrink_slab.part.40+0x1f5/0x420
    [258270.528004]  shrink_node+0x22c/0x320
    [258270.528006]  do_try_to_free_pages+0xf5/0x330
    [258270.528008]  try_to_free_pages+0xe9/0x190
    [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011]  __alloc_pages_nodemask+0x209/0x260
    [258270.528014]  alloc_pages_vma+0x1f1/0x250
    [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021]  handle_mm_fault+0xfd3/0x1330
    [258270.528025]  __get_user_pages+0x113/0x640
    [258270.528027]  get_user_pages+0x4f/0x60
    [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108]  try_async_pf+0x66/0x230 [kvm]
    [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
 which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-12-05 21:16:43 +01:00
Linus Torvalds
13231cacce Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "A bunch of fixes for aacraid, a set of coherency fixes that only
  affect non-coherent platforms and one coccinelle detected null check
  after use"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: align sata_device's rps_resp on a cacheline
  scsi: use dma_get_cache_alignment() as minimum DMA alignment
  scsi: dma-mapping: always provide dma_get_cache_alignment
  scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg
  scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
  scsi: aacraid: Perform initialization reset only once
  scsi: aacraid: Check for PCI state of device in a generic way
2017-12-05 10:31:32 -08:00
Linus Torvalds
6a5e05a47b Merge tag 'char-misc-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
 "Here are some small misc driver fixes for 4.15-rc3 to resolve reported
  issues. Specifically these are:

   - binder fix for a memory leak

   - vpd driver fixes for a number of reported problems

   - hyperv driver fix for memory accesses where it shouldn't be.

  All of these have been in linux-next for a while. There's also one
  more MAINTAINERS file update that came in today to get the Android
  developer's emails correct, which is also in this pull request, that
  was not in linux-next, but should not be an issue"

* tag 'char-misc-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  MAINTAINERS: update Android driver maintainers.
  firmware: vpd: Fix platform driver and device registration/unregistration
  firmware: vpd: Tie firmware kobject to device lifetime
  firmware: vpd: Destroy vpd sections in remove function
  hv: kvp: Avoid reading past allocated blocks from KVP file
  Drivers: hv: vmbus: Fix a rescind issue
  ANDROID: binder: fix transaction leak.
2017-12-05 10:06:23 -08:00
Linus Torvalds
1fbd55c0cc Merge tag 'driver-core-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are 3 small fixes for some reported issues:

   - a debugfs build error that lots of people have reported

   - a Kconfig help text cleanup now that the firmware is not in the
     kernel tree

   - an ISA bus bug fix for a reported issue that has been there since
     2.6.18.

  All of these have been in linux-next with no reported issues"

* tag 'driver-core-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  firmware: cleanup FIRMWARE_IN_KERNEL message
  isa: Prevent NULL dereference in isa_bus driver callbacks
  debugfs: fix debugfs_real_fops() build error
2017-12-05 10:00:14 -08:00
Linus Torvalds
73996933b5 Merge tag 'staging-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and iio driver fixes from Greg KH:
 "Here are a number of small staging and iio driver fixes for reported
  issues for 4.15-rc3. Nothing major here, the majority is IIO issues,
  like normal, but there are also some small bugfixes for a few staging
  drivers as well.

  Full details are in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: stm32: fix adc/trigger link error
  iio: health: max30102: Temperature should be in milli Celsius
  iio: fix kernel-doc build errors
  iio: adc: meson-saradc: Meson8 and Meson8b do not have REG11 and REG13
  iio: adc: meson-saradc: initialize the bandgap correctly on older SoCs
  iio: adc: meson-saradc: fix the bit_idx of the adc_en clock
  iio: proximity: sx9500: Assign interrupt from GpioIo()
  iio: adc: cpcap: fix incorrect validation
  staging: octeon-usb: use __delay() instead of cvmx_wait()
  staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID
  staging: ccree: fix leak of import() after init()
  staging: comedi: ni_atmio: fix license warning.
2017-12-05 09:57:34 -08:00
Linus Torvalds
84dda2965d Merge tag 'tty-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
 "Here are some small serdev and serial fixes for 4.15-rc3. They resolve
  some reported problems:

   - a number of serdev fixes to resolve crashes

   - MIPS build fixes for their serial port

   - a new 8250 device id

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  MIPS: Add custom serial.h with BASE_BAUD override for generic kernel
  serdev: ttyport: fix tty locking in close
  serdev: ttyport: fix NULL-deref on hangup
  serdev: fix receive_buf return value when no callback
  serdev: ttyport: add missing receive_buf sanity checks
  serial: 8250_early: Only set divisor if valid clk & baud
  serial: 8250_pci: Add Amazon PCI serial device ID
2017-12-05 09:05:16 -08:00
Hendrik Brueckner
c895f6f703 bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type
Commit 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
program type") introduced the bpf_perf_event_data structure which
exports the pt_regs structure.  This is OK for multiple architectures
but fail for s390 and arm64 which do not export pt_regs.  Programs
using them, for example, the bpf selftest fail to compile on these
architectures.

For s390, exporting the pt_regs is not an option because s390 wants
to allow changes to it.  For arm64, there is a user_pt_regs structure
that covers parts of the pt_regs structure for use by user space.

To solve the broken uapi for s390 and arm64, introduce an abstract
type for pt_regs and add an asm/bpf_perf_event.h file that concretes
the type.  An asm-generic header file covers the architectures that
export pt_regs today.

The arch-specific enablement for s390 and arm64 follows in separate
commits.

Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 0515e5999a ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-05 15:02:40 +01:00
Will Deacon
4ce413d184 irqdesc: Use bool return type instead of int
The irq_balancing_disabled and irq_is_percpu{,_devid} functions are
clearly intended to return bool like the functions in
kernel/irq/settings.h, but actually return an int containing a masked
value of desc->status_use_accessors. This can lead to subtle breakage
if, for example, the return value is subsequently truncated when
assigned to a narrower type.

As Linus points out:

| In particular, what can (and _has_ happened) is that people end up
| using these functions that return true or false, and they assign the
| result to something like a bitfield (or a char) or whatever.
|
| And the code looks *obviously* correct, when you have things like
|
|      dev->percpu = irq_is_percpu_devid(dev->irq);
|
| and that "percpu" thing is just one status bit among many. It may even
| *work*, because maybe that "percpu" flag ends up not being all that
| important, or it just happens to never be set on the particular
| hardware that people end up testing.
|
| But while it looks obviously correct, and might even work, it's really
| fundamentally broken. Because that "true or false" function didn't
| actually return 0/1, it returned 0 or 0x20000.
|
| And 0x20000 may not fit in a bitmask or a "char" or whatever.

Fix the problem by consistently using bool as the return type for these
functions.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: marc.zyngier@arm.com
Link: https://lkml.kernel.org/r/1512142179-24616-1-git-send-email-will.deacon@arm.com
2017-12-04 20:51:12 +01:00
Linus Torvalds
236fa078c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various TCP control block fixes, including one that crashes with
    SELinux, from David Ahern and Eric Dumazet.

 2) Fix ACK generation in rxrpc, from David Howells.

 3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
    from Gao Feng.

 4) SIT configuration doesn't take on the frag_off ipv4 field
    configuration properly, fix from Hangbin Liu.

 5) TSO can fail after device down/up on stmmac, fix from Lars Persson.

 6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.

 7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
    performance problems). From Wei Xu.

 8) mvpps's TX descriptors were not zero initialized, from Yan Markman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
  tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
  rxrpc: Fix the MAINTAINERS record
  rxrpc: Use correct netns source in rxrpc_release_sock()
  liquidio: fix incorrect indentation of assignment statement
  stmmac: reset last TSO segment size after device open
  ipvlan: Add the skb->mark as flow4's member to lookup route
  s390/qeth: build max size GSO skbs on L2 devices
  s390/qeth: fix GSO throughput regression
  s390/qeth: fix thinko in IPv4 multicast address tracking
  tap: free skb if flags error
  tun: free skb in early errors
  vhost: fix skb leak in handle_rx()
  bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
  bnxt_en: fix dst/src fid for vxlan encap/decap actions
  bnxt_en: wildcard smac while creating tunnel decap filter
  bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
  phylink: ensure we take the link down when phylink_stop() is called
  sfp: warn about modules requiring address change sequence
  sfp: improve RX_LOS handling
  ...
2017-12-04 11:14:46 -08:00
Felipe Balbi
a773d41927 tracing: Pass export pointer as argument to ->write()
By passing an export descriptor to the write function, users don't need to
keep a global static pointer and can rely on container_of() to fetch their
own structure.

Link: http://lkml.kernel.org/r/20170602102025.5140-1-felipe.balbi@linux.intel.com

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-12-04 07:14:30 -05:00
Arnd Bergmann
6d745ee8b5 iio: stm32: fix adc/trigger link error
The ADC driver can trigger on either the timer or the lptim
trigger, but it only uses a Kconfig 'select' statement
to ensure that the first of the two is present. When the lptim
trigger is enabled as a loadable module, and the adc driver
is built-in, we now get a link error:

drivers/iio/adc/stm32-adc.o: In function `stm32_adc_get_trig_extsel':
stm32-adc.c:(.text+0x4e0): undefined reference to `is_stm32_lptim_trigger'

We could use a second 'select' statement and always have both
trigger drivers enabled when the adc driver is, but it seems that
the lptimer trigger was intentionally left optional, so it seems
better to keep it that way.

This adds a hack to use 'IS_REACHABLE()' rather than 'IS_ENABLED()',
which avoids the link error, but instead leads to the lptimer trigger
not being used in the broken configuration. I've added a runtime
warning for this case to help users figure out what they did wrong
if this should ever be done by accident.

Fixes: f0b638a7f6 ("iio: adc: stm32: add support for lptimer triggers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2017-12-02 11:21:31 +00:00
Linus Torvalds
e1ba1c99da Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
 "This contains a handful of small cleanups that are a result of
  feedback that didn't make it into our original patch set, either
  because the feedback hadn't been given yet, I missed the original
  emails, or we weren't ready to submit the changes yet.

  I've been maintaining the various cleanup patch sets I have as their
  own branches, which I then merged together and signed. Each merge
  commit has a short summary of the changes, and each branch is based on
  your latest tag (4.15-rc1, in this case). If this isn't the right way
  to do this then feel free to suggest something else, but it seems sane
  to me.

  Here's a short summary of the changes, roughly in order of how
  interesting they are.

   - libgcc.h has been moved from include/lib, where it's the only
     member, to include/linux. This is meant to avoid tab completion
     conflicts.

   - VDSO entries for clock_get/gettimeofday/getcpu have been added.
     These are simple syscalls now, but we want to let glibc use them
     from the start so we can make them faster later.

   - A VDSO entry for instruction cache flushing has been added so
     userspace can flush the instruction cache.

   - The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
     removed, as those VDSO entries don't actually exist.

   - __io_writes has been corrected to respect the given type.

   - A new READ_ONCE in arch_spin_is_locked().

   - __test_and_op_bit_ord() is now actually ordered.

   - Various small fixes throughout the tree to enable allmodconfig to
     build cleanly.

   - Removal of some dead code in our atomic support headers.

   - Improvements to various comments in our atomic support headers"

* tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
  RISC-V: __io_writes should respect the length argument
  move libgcc.h to include/linux
  RISC-V: Clean up an unused include
  RISC-V: Allow userspace to flush the instruction cache
  RISC-V: Flush I$ when making a dirty page executable
  RISC-V: Add missing include
  RISC-V: Use define for get_cycles like other architectures
  RISC-V: Provide stub of setup_profiling_timer()
  RISC-V: Export some expected symbols for modules
  RISC-V: move empty_zero_page definition to C and export it
  RISC-V: io.h: type fixes for warnings
  RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
  RISC-V: use generic serial.h
  RISC-V: remove spin_unlock_wait()
  RISC-V: `sfence.vma` orderes the instruction cache
  RISC-V: Add READ_ONCE in arch_spin_is_locked()
  RISC-V: __test_and_op_bit_ord should be strongly ordered
  RISC-V: Remove smb_mb__{before,after}_spinlock()
  RISC-V: Remove __smp_bp__{before,after}_atomic
  RISC-V: Comment on why {,cmp}xchg is ordered how it is
  ...
2017-12-01 19:39:12 -05:00
Christoph Hellwig
4db2b604c0 move libgcc.h to include/linux
Introducing a new include/lib directory just for this file totally
messes up tab completion for include/linux, which is highly annoying.

Move it to include/linux where we have headers for all kinds of other
lib/ code as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2017-12-01 13:09:40 -08:00
Linus Torvalds
9e0600f5cf Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:

 - x86 bugfixes: APIC, nested virtualization, IOAPIC

 - PPC bugfix: HPT guests on a POWER9 radix host

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: Let KVM_SET_SIGNAL_MASK work as advertised
  KVM: VMX: Fix vmx->nested freeing when no SMI handler
  KVM: VMX: Fix rflags cache during vCPU reset
  KVM: X86: Fix softlockup when get the current kvmclock
  KVM: lapic: Fixup LDR on load in x2apic
  KVM: lapic: Split out x2apic ldr calculation
  KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
  KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
  KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
  KVM: x86: ioapic: Preserve read-only values in the redirection table
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
  KVM: x86: fix em_fxstor() sleeping while in atomic
  KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  ...
2017-11-30 08:15:19 -08:00