Commit Graph

92392 Commits

Author SHA1 Message Date
Dan Williams
4e4f00a9b5 x86, dax, libnvdimm: remove wb_cache_pmem() indirection
With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
libnvdimm and the pmem driver directly we do not need to provide global
wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
driver will simply not link to arch_wb_cache_pmem() in that case.  Same
as before, pmem flushing is only defined for x86_64, via
clean_cache_range(), but it is straightforward to add other archs in the
future.

arch_wb_cache_pmem() is an exported function since the pmem module needs
to find it, but it is privately declared in drivers/nvdimm/pmem.h because
there are no consumers outside of the pmem driver.

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:35:24 -07:00
Dan Williams
81f558701a x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush
The clear_pmem() helper simply combines a memset() plus a cache flush.
Now that the flush routine is optionally provided by the dax device
driver we can avoid unnecessary cache management on dax devices fronting
volatile memory.

With clear_pmem() gone we can follow on with a patch to make pmem cache
management completely defined within the pmem driver.

Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:35:24 -07:00
Dan Williams
abebfbe2f7 dm: add ->flush() dax operation support
Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying flush implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:34:59 -07:00
Dan Williams
3c1cebff23 dax, pmem: introduce an optional 'flush' dax_operation
Filesystem-DAX flushes caches whenever it writes to the address returned
through dax_direct_access() and when writing back dirty radix entries.
That flushing is only required in the pmem case, so add a dax operation
to allow pmem to take this extra action, but skip it for other dax
capable devices that do not provide a flush routine.

An example for this differentiation might be a volatile ram disk where
there is no expectation of persistence. In fact the pmem driver itself might
front such an address range specified by the NFIT. So, this "no flush"
property might be something passed down by the bus / libnvdimm.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:34:59 -07:00
Dan Williams
fec53774fd filesystem-dax: convert to dax_copy_from_iter()
Now that all possible providers of the dax_operations copy_from_iter
method are implemented, switch filesytem-dax to call the driver rather
than copy_to_iter_pmem.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:34:59 -07:00
Dan Williams
7e026c8c0a dm: add ->copy_from_iter() dax operation support
Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09 09:22:21 -07:00
Dan Williams
0aed55af88 x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations
The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09 09:09:56 -07:00
Linus Torvalds
55cbdaf639 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
 "For the most part this is just a minor -rc cycle for the rdma
  subsystem. Even given that this is all of the -rc patches since the
  merge window closed, it's still only about 25 patches:

   - Multiple i40iw, nes, iw_cxgb4, hfi1, qib, mlx4, mlx5 fixes

   - A few upper layer protocol fixes (IPoIB, iSER, SRP)

   - A modest number of core fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (26 commits)
  RDMA/SA: Fix kernel panic in CMA request handler flow
  RDMA/umem: Fix missing mmap_sem in get umem ODP call
  RDMA/core: not to set page dirty bit if it's already set.
  RDMA/uverbs: Declare local function static and add brackets to sizeof
  RDMA/netlink: Reduce exposure of RDMA netlink functions
  RDMA/srp: Fix NULL deref at srp_destroy_qp()
  RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
  RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
  RDMA/qedr: add null check before pointer dereference
  RDMA/mlx5: set UMR wqe fence according to HCA cap
  net/mlx5: Define interface bits for fencing UMR wqe
  RDMA/mlx4: Fix MAD tunneling when SRIOV is enabled
  RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
  RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
  RDMA/hfi1: change PCI bar addr assignments to Linux API functions
  RDMA/hfi1: fix array termination by appending NULL to attr array
  RDMA/iw_cxgb4: fix the calculation of ipv6 header size
  RDMA/iw_cxgb4: calculate t4_eq_status_entries properly
  RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
  RDMA/nes: ACK MPA Reply frame
  ...
2017-06-04 10:41:32 -07:00
Linus Torvalds
f219764920 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  scripts/gdb: make lx-dmesg command work (reliably)
  mm: consider memblock reservations for deferred memory initialization sizing
  mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
  mlock: fix mlock count can not decrease in race condition
  mm/migrate: fix refcount handling when !hugepage_migration_supported()
  dax: fix race between colliding PMD & PTE entries
  mm: avoid spurious 'bad pmd' warning messages
  mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
  pcmcia: remove left-over %Z format
  slub/memcg: cure the brainless abuse of sysfs attributes
  initramfs: fix disabling of initramfs (and its compression)
  mm: clarify why we want kmalloc before falling backto vmallock
  frv: declare jiffies to be located in the .data section
  include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
  ksm: prevent crash after write_protect_page fails
2017-06-02 15:49:46 -07:00
Michal Hocko
864b9a393d mm: consider memblock reservations for deferred memory initialization sizing
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
James Morse
9a291a7c94 mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
special case.  (check_user_page_hwpoison())

When huge pages are involved, this doesn't work so well.
get_user_pages() calls follow_hugetlb_page(), which stops early if it
receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
-EFAULT to the caller.  The step to map this to -EHWPOISON based on the
FOLL_ flags is missing.  The hwpoison special case is skipped, and
-EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

Instead, move this VM_FAULT_ to errno mapping code into a header file
and use it from faultin_page() and follow_hugetlb_page().

With this, KVM works as expected.

This isn't a problem for arm64 today as we haven't enabled
MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
too, so I think this should be a fix.  This doesn't apply earlier than
stable's v4.11.1 due to all sorts of cleanup.

[james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
suggested.
  Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.11.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:38 -07:00
Matthias Kaehlcke
60b0a8c3d2 frv: declare jiffies to be located in the .data section
Commit 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with
____cacheline_aligned_in_smp") removed a section specification from the
jiffies declaration that caused conflicts on some platforms.

Unfortunately this change broke the build for frv:

  kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
      `jiffies' defined in *ABS* section in .tmp_vmlinux1
  kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
      symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
  ...

Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
include the section specification.  For all other platforms
__jiffy_arch_data (currently) has no effect.

Fixes: 7c30f352c8 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Michal Hocko
1bde33e051 include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
the time commit 7e7844226f ("lockdep: allow to disable reclaim lockup
detection") was written we still had __GFP_OTHER_NODE but I have removed
it in commit 41b6167e8f ("mm: get rid of __GFP_OTHER_NODE") and forgot
to lower the bit value.

The current value is outside of __GFP_BITS_SHIFT so it cannot be used
actually.

Fixes: 7e7844226f ("lockdep: allow to disable reclaim lockup detection")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Igor Stoppa <igor.stoppa@nokia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-02 15:07:37 -07:00
Linus Torvalds
46356945fc Merge tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm displayport quirk support:
 "DP quirk for usb c dongles.

  As mentioned I have a separate request for fixing a regression, but
  also keeping the broken hw working, for certain USB-C DP adapters they
  require a minimised n/m parameters, but an attempt to do this
  generically has failed, we need to quirk these specific adapters.
  However doing it generically regressed some eDP panels.

  This pull adds the infrastructure and a quirk for the adapter"

* tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Detect USB-C specific dongles before reducing M and N
  drm/dp: start a DPCD based DP sink/branch device quirk database
  drm/i915: use drm DP helper to read DPCD desc
  drm/dp: add helper for reading DP sink/branch device desc from DPCD
2017-06-02 11:32:38 -07:00
Linus Torvalds
3b1e342be2 Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "Revert patch accidentally included in the merge window pull request,
  and fix a crash that was likely a result of buggy client behavior"

* tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix null dereference on replay
  nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
2017-06-01 16:24:48 -07:00
Majd Dibbiny
d3957b86a4 RDMA/SA: Fix kernel panic in CMA request handler flow
Commit 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be specific attribute
for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.

This caused to the following kernel panic in the CMA request handler flow:

[   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[   27.075979] Call Trace:
[   27.076015]  radix_tree_lookup+0xd/0x10
[   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
[   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
[   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
[   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
[   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
[   27.076430]  ? sched_clock_cpu+0x11/0xb0
[   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
[   27.076525]  process_one_work+0x160/0x410
[   27.076565]  worker_thread+0x137/0x4a0
[   27.076614]  kthread+0x112/0x150
[   27.076684]  ? max_active_store+0x60/0x60
[   27.077642]  ? kthread_park+0x90/0x90
[   27.078530]  ret_from_fork+0x2c/0x40

This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.

Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.

Fixes: 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB ands
	ROCE specific fields)
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:14 -04:00
Leon Romanovsky
233c195583 RDMA/netlink: Reduce exposure of RDMA netlink functions
RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
ibnl_init() and ibnl_cleanup() don't need to be published
in public header file.

Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
functions to private header file.

CC: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:20:11 -04:00
Max Gurtovoy
1410a90ae4 net/mlx5: Define interface bits for fencing UMR wqe
HW can implement UMR wqe re-transmission in various ways.
Thus, add HCA cap to distinguish the needed fence for UMR to make
sure that the wqe wouldn't fail on mkey checks.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:05:04 -04:00
Linus Torvalds
393bcfaeb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the target-pending fixes for v4.12-rc4:

   - ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge
     window. (Bryant Ly and Michael Cyr)

   - Re-add a target-core check enforcing WRITE overflow reject that was
     relaxed in v4.3, to avoid unsupported iscsi-target immediate data
     overflow. (nab)

   - Fix a target-core-user OOPs during device removal. (MNC + Bryant
     Ly)

   - Fix a long standing iscsi-target potential issue where kthread exit
     did not wait for kthread_should_stop(). (Jiang Yi)

   - Fix a iscsi-target v3.12.y regression OOPs involving initial login
     PDU processing during asynchronous TCP connection close. (MNC +
     nab)

  This is a little larger than usual for an -rc4, primarily due to the
  iscsi-target v3.12.y regression OOPs bug-fix.

  However, it's an important patch as MNC + Hannes where both able to
  trigger it using a reduced iscsi initiator login timeout combined with
  a backend taking a long time to complete I/Os during iscsi login
  driven session reinstatement"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Always wait for kthread_should_stop() before kthread exit
  iscsi-target: Fix initial login PDU asynchronous socket close OOPs
  tcmu: fix crash during device removal
  target: Re-add check to reject control WRITEs with overflow data
  ibmvscsis: Fix the incorrect req_lim_delta
  ibmvscsis: Clear left-over abort_cmd pointers
2017-06-01 10:40:41 -07:00
Nicholas Bellinger
25cdda95fd iscsi-target: Fix initial login PDU asynchronous socket close OOPs
This patch fixes a OOPs originally introduced by:

   commit bb048357da
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-31 15:12:31 -07:00
Linus Torvalds
3f173bde7e Merge tag 'pinctrl-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
 "Here is an overdue pull request for pin control fixes, the most
  prominent feature is to make Intel Chromebooks (and I suspect any
  other Cherryview-based Intel thing) happy again, which we really want
  to see.

  There is a patch hitting drivers/firmware/* that I was uncertain to
  who actually manages, but I got Andy Shevchenko's and Dmitry Torokov's
  review tags on it and I trust them both 100% to do the right thing for
  Intel platform drivers.

  Summary:

   - Make a few Intel Chromebooks with Cherryview DMI firmware work
     smoothly.

   - A fix for some bogus allocations in the generic group management
     code.

   - Some GPIO descriptor lookup table stubs. Merged through the pin
     control tree for administrative reasons.

   - Revert the "bi-directional" and "output-enable" generic properties:
     we need more discussions around this. It seems other SoCs are using
     input/output gate enablement and these terms are not correct.

   - Fix mux and drive strength atomically in the MXS driver.

   - Fix the SPDIF function on sunxi A83T.

   - OF table terminators and other small fixes"

* tag 'pinctrl-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: Fix SPDIF function name for A83T
  pinctrl: mxs: atomically switch mux and drive strength config
  pinctrl: cherryview: Extend the Chromebook DMI quirk to Intel_Strago systems
  firmware: dmi: Add DMI_PRODUCT_FAMILY identification string
  pinctrl: core: Fix warning by removing bogus code
  gpiolib: Add stubs for gpiod lookup table interface
  Revert "pinctrl: generic: Add bi-directional and output-enable"
  pinctrl: cherryview: Add terminate entry for dmi_system_id tables
2017-05-29 10:05:19 -07:00
Jani Nikula
76fa998acd drm/dp: start a DPCD based DP sink/branch device quirk database
Face the fact, there are Display Port sink and branch devices out there
in the wild that don't follow the Display Port specifications, or they
have bugs, or just otherwise require special treatment. Start a common
quirk database the drivers can query based on the DP device
identification. At least for now, we leave the workarounds for the
drivers to implement as they see fit.

For starters, add a branch device that can't handle full 24-bit main
link Mdiv and Ndiv main link attributes properly. Naturally, the
workaround of reducing main link attributes for all devices ended up in
regressions for other devices. So here we are.

v2: Rebase on DRM DP desc read helpers

v3: Fix the OUI memcmp blunder (Clint)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Clint Taylor <clinton.a.taylor@intel.com>
Cc: Adam Jackson <ajax@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Tested-by: Clinton Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Clinton Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> # v2
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/91ec198dd95258dbf3bee2f6be739e0da73b4fdd.1495105635.git.jani.nikula@intel.com
2017-05-29 13:43:26 +03:00
Jani Nikula
118b90f3f1 drm/dp: add helper for reading DP sink/branch device desc from DPCD
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/acba54da7d80eafea9e59a893e27e3c31028c0ba.1495105635.git.jani.nikula@intel.com
2017-05-29 13:36:57 +03:00
Linus Torvalds
249f1efd8e Merge tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are some serial and tty fixes for 4.12-rc3. They are a bit bigger
  than normal, which is why I had them bake in linux-next for a few
  weeks and didn't send them to you for -rc2.

  They revert a few of the serdev patches from 4.12-rc1, and bring
  things back to how they were in 4.11, to try to make things a bit more
  stable there. Rob and Johan both agree that this is the way forward,
  so this isn't people squabbling over semantics. Other than that, just
  a few minor serial driver fixes that people have had problems with.

  All of these have been in linux-next for a few weeks with no reported
  issues"

* tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: altera_uart: call iounmap() at driver remove
  serial: imx: ensure UCR3 and UFCR are setup correctly
  MAINTAINERS/serial: Change maintainer of jsm driver
  serial: enable serdev support
  tty/serdev: add serdev registration interface
  serdev: Restore serdev_device_write_buf for atomic context
  serial: core: fix crash in uart_suspend_port
  tty: fix port buffer locking
  tty: ehv_bytechan: clean up init error handling
  serial: ifx6x60: fix use-after-free on module unload
  serial: altera_jtaguart: adding iounmap()
  serial: exar: Fix stuck MSIs
  serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
  serdev: fix tty-port client deregistration
  Revert "tty_port: register tty ports with serdev bus"
  drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
2017-05-27 09:39:09 -07:00
Linus Torvalds
6741d51699 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
    Borkmann.

 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
    Caratti.

 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
    driver, from Jesper Brouer.

 4) Defer slave->link updates until bonding is ready to do a full commit
    to the new settings, from Nithin Sujir.

 5) Properly reference count ipv4 FIB metrics to avoid use after free
    situations, from Eric Dumazet and several others including Cong Wang
    and Julian Anastasov.

 6) Fix races in llc_ui_bind(), from Lin Zhang.

 7) Fix regression of ESP UDP encapsulation for TCP packets, from
    Steffen Klassert.

 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.

 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
    Dawson.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add reference counting to metrics
  net: ethernet: ax88796: don't call free_irq without request_irq first
  ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
  sctp: fix ICMP processing if skb is non-linear
  net: llc: add lock_sock in llc_ui_bind to avoid a race condition
  bonding: Don't update slave->link until ready to commit
  test_bpf: Add a couple of tests for BPF_JSGE.
  bpf: add various verifier test cases
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  bpf: properly reset caller saved regs after helper call and ld_abs/ind
  bpf: fix incorrect pruning decision when alignment must be tracked
  arp: fixed -Wuninitialized compiler warning
  tcp: avoid fastopen API to be used on AF_UNSPEC
  net: move somaxconn init from sysctl code
  net: fix potential null pointer dereference
  geneve: fix fill_info when using collect_metadata
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  ...
2017-05-26 13:51:01 -07:00
Eric Dumazet
3fb07daff8 ipv4: add reference counting to metrics
Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe8 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 14:57:07 -04:00
Linus Torvalds
1b8f2ffc79 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes that should go into this series. This contains:

   - A set of NVMe fixes, pulled from Christoph. This includes a set of
     fixes for the fiber channel bits from James Smart, rdma queue depth
     fix from Marta, controller removal fixes from Ming, and some more
     APST quirk updates from Andy.

   - A blk-mq debugfs fix from Bart, fixing a problem with the
     untangling of the sysfs and debugfs blk-mq bits that was added in
     this series.

   - Error code fix in add_partition() from Dan.

   - A small series of fixes for the new blk-throttle code from Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block: (21 commits)
  blk-mq: Only register debugfs attributes for blk-mq queues
  nvme: Quirk APST on Intel 600P/P3100 devices
  nvme: only setup block integrity if supported by the driver
  nvme: replace is_flags field in nvme_ctrl_ops with a flags field
  nvme-pci: consistencly use ctrl->device for logging
  partitions/msdos: FreeBSD UFS2 file systems are not recognized
  block: fix an error code in add_partition()
  blk-throttle: force user to configure all settings for io.low
  blk-throttle: respect 0 bps/iops settings for io.low
  blk-throttle: output some debug info in trace
  blk-throttle: add hierarchy support for latency target and idle time
  nvme_fc: remove extra controller reference taken on reconnect
  nvme_fc: correct nvme status set on abort
  nvme_fc: set logging level on resets/deletes
  nvme_fc: revise comment on teardown
  nvme_fc: Support ctrl_loss_tmo
  nvme_fc: get rid of local reconnect_delay
  blk-mq: remove blk_mq_abort_requeue_list()
  nvme: avoid to use blk_mq_abort_requeue_list()
  nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
  ...
2017-05-26 11:05:22 -07:00
Linus Torvalds
6ce4782911 Merge tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:

 - fix PCI_ENDPOINT build error (merged for v4.12)

 - fix Switchtec driver (merged for v4.12)

 - fix imx6 config read timeouts, fallout from changing to non-postable
   reads

 - add PM "needs_resume" flag for i915 suspend issue

* tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/PM: Add needs_resume flag to avoid suspend complete optimization
  PCI: imx6: Fix config read timeout handling
  switchtec: Fix minor bug with partition ID register
  switchtec: Use new cdev_device_add() helper function
  PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA
2017-05-26 10:51:18 -07:00
Linus Torvalds
80941b2aeb Merge tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client
Pul ceph fixes from Ilya Dryomov:
 "A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
  messenger patch from Zheng and Luis' fallocate fix"

* tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client:
  ceph: check that the new inode size is within limits in ceph_fallocate()
  libceph: cleanup old messages according to reconnect seq
  libceph: NULL deref on crush_decode() error path
  libceph: fix error handling in process_one_ticket()
  libceph: validate blob_struct_v in process_one_ticket()
  libceph: drop version variable from ceph_monmap_decode()
  libceph: make ceph_msg_data_advance() return void
  libceph: use kbasename() and kill ceph_file_part()
2017-05-26 09:35:22 -07:00
Christoph Hellwig
83b4605b0c PCI/msi: fix the pci_alloc_irq_vectors_affinity stub
We need to return an error for any call that asks for MSI / MSI-X
vectors only, so that non-trivial fallback logic can work properly.

Also valid dev->irq and use the "correct" errno value based on feedback
from Linus.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: aff17164 ("PCI: Provide sensible IRQ vector alloc/free routines")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-26 08:45:56 -07:00
Daniel Borkmann
614d0d77b4 bpf: add various verifier test cases
This patch adds various verifier test cases:

1) A test case for the pruning issue when tracking alignment
   is used.
2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
   arithmetic turns such register into UNKNOWN_VALUE type.
3) Test cases for the special treatment of LD_ABS/LD_IND to
   make sure verifier doesn't break calling convention here.
   Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
   storing temporary data, so they really must be marked as
   NOT_INIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:28 -04:00
Vlad Yasevich
35d2f80b07 vlan: Fix tcp checksum offloads in Q-in-Q vlans
It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was execerbated by the
series
    commit afb0bc972b ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.

However, event without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 16:27:14 -04:00
David S. Miller
3f6b123bcc Merge tag 'mlx5-fixes-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-fixes-2017-05-23

Some TC offloads fixes from Or Gerlitz.
From Erez, mlx5 IPoIB RX fix to improve GRO.
From Mohamad, Command interface fix to improve mitigation against FW
commands timeouts.
From Tariq, Driver load Tolerance against affinity settings failures.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 15:43:57 -04:00
Linus Torvalds
2426125ab4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace fix from Eric Biederman:
 "This fixes a brown paper bag bug. When I fixed the ptrace interaction
  with user namespaces I added a new field ptracer_cred in struct_task
  and I failed to properly initialize it on fork.

  This dangling pointer wound up breaking runing setuid applications run
  from the enlightenment window manager.

  As this is the worst sort of bug. A regression breaking user space for
  no good reason let's get this fixed"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Properly initialize ptracer_cred on fork
2017-05-24 08:28:59 -07:00
Linus Torvalds
590d4b333d Merge tag 'mmc-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
 "A couple of MMC host fixes intended for v4.12 rc3:

   - sdhci-xenon: Don't free data for phy allocated by devm*
   - sdhci-iproc: Suppress spurious interrupts
   - cavium: Fix probing race with regulator
   - cavium: Prevent crash with incomplete DT
   - cavium-octeon: Use proper GPIO name for power control
   - cavium-octeon: Fix interrupt enable code"

* tag 'mmc-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
  mmc: cavium: Fix probing race with regulator
  of/platform: Make of_platform_device_destroy globally visible
  mmc: cavium: Prevent crash with incomplete DT
  mmc: cavium-octeon: Use proper GPIO name for power control
  mmc: cavium-octeon: Fix interrupt enable code
  mmc: sdhci-xenon: kill xenon_clean_phy()
2017-05-24 08:21:56 -07:00
Imre Deak
4d071c3238 PCI/PM: Add needs_resume flag to avoid suspend complete optimization
Some drivers - like i915 - may not support the system suspend direct
complete optimization due to differences in their runtime and system
suspend sequence.  Add a flag that when set resumes the device before
calling the driver's system suspend handlers which effectively disables
the optimization.

Needed by a future patch fixing suspend/resume on i915.

Suggested by Rafael.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org
2017-05-23 14:18:17 -05:00
Ilya Dryomov
6f4dbd149d libceph: use kbasename() and kill ceph_file_part()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2017-05-23 20:32:10 +02:00
Jesper Dangaard Brouer
12e8b570e7 mlx5: fix bug reading rss_hash_type from CQE
Masks for extracting part of the Completion Queue Entry (CQE)
field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and
CQE_RSS_HTYPE_L4.

The bug resulted in setting skb->l4_hash, even-though the
rss_hash_type indicated that hash was NOT computed over the
L4 (UDP or TCP) part of the packet.

Added comments from the datasheet, to make it more clear what
these masks are selecting.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23 11:03:31 -04:00
Oliver Neukum
7f65b1f5ad cdc-ether: divorce initialisation with a filter reset and a generic method
Some devices need their multicast filter reset but others are crashed by that.
So the methods need to be separated.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: "Ridgway, Keith" <kridgway@harris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23 11:01:28 -04:00
David S. Miller
2f9bfd3399 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2017-05-23

1) Fix wrong header offset for esp4 udpencap packets.

2) Fix a stack access out of bounds when creating a bundle
   with sub policies. From Sabrina Dubroca.

3) Fix slab-out-of-bounds in pfkey due to an incorrect
   sadb_x_sec_len calculation.

4) We checked the wrong feature flags when taking down
   an interface with IPsec offload enabled.
   Fix from Ilan Tayari.

5) Copy the anti replay sequence numbers when doing a state
   migration, otherwise we get out of sync with the sequence
   numbers. Fix from Antony Antony.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-23 10:51:32 -04:00
Mohamad Haj Yahia
73dd3a4839 net/mlx5: Avoid using pending command interface slots
Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.

Fixes: 9cba4ebcf3 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Or Gerlitz
3aa4266405 net/sched: act_csum: Add accessors for offloading drivers
Add the accessors for realizing if this is a csum action,
and for which fields checksum is needed.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-23 16:23:31 +03:00
Eric W. Biederman
c70d9d809f ptrace: Properly initialize ptracer_cred on fork
When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <tiwai@suse.de>
Fixes: 64b875f7ac ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-05-23 07:40:44 -05:00
Mika Westerberg
c61872c983 firmware: dmi: Add DMI_PRODUCT_FAMILY identification string
Sometimes it is more convenient to be able to match a whole family of
products, like in case of bunch of Chromebooks based on Intel_Strago to
apply a driver quirk instead of quirking each machine one-by-one.

This adds support for DMI_PRODUCT_FAMILY identification string and also
exports it to the userspace through sysfs attribute just like the
existing ones.

Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-05-23 10:04:41 +02:00
Linus Torvalds
86ca984cef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly netfilter bug fixes in here, but we have some bits elsewhere as
  well.

   1) Don't do SNAT replies for non-NATed connections in IPVS, from
      Julian Anastasov.

   2) Don't delete conntrack helpers while they are still in use, from
      Liping Zhang.

   3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
      Bruijn.

   4) Add proper RCU protection to nf_tables_dump_set() because we
      cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
      Liping Zhang.

   5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.

   6) smsc95xx devices can't handle IPV6 checksums fully, so don't
      advertise support for offloading them. From Nisar Sayed.

   7) Fix out-of-bounds access in __ip6_append_data(), from Eric
      Dumazet.

   8) Make atl2_probe() propagate the error code properly on failures,
      from Alexey Khoroshilov.

   9) arp_target[] in bond_check_params() is used uninitialized. This
      got changes from a global static to a local variable, which is how
      this mistake happened. Fix from Jarod Wilson.

  10) Fix fallout from unnecessary NULL check removal in cls_matchall,
      from Jiri Pirko. This is definitely brown paper bag territory..."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  net: sched: cls_matchall: fix null pointer dereference
  vsock: use new wait API for vsock_stream_sendmsg()
  bonding: fix randomly populated arp target array
  net: Make IP alignment calulations clearer.
  bonding: fix accounting of active ports in 3ad
  net: atheros: atl2: don't return zero on failure path in atl2_probe()
  ipv6: fix out of bound writes in __ip6_append_data()
  bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
  smsc95xx: Support only IPv4 TCP/UDP csum offload
  arp: always override existing neigh entries with gratuitous ARP
  arp: postpone addr_type calculation to as late as possible
  arp: decompose is_garp logic into a separate function
  arp: fixed error in a comment
  tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
  netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
  ebtables: arpreply: Add the standard target sanity check
  netfilter: nf_tables: revisit chain/object refcounting from elements
  netfilter: nf_tables: missing sanitization in data from userspace
  netfilter: nf_tables: can't assume lock is acquired when dumping set elems
  netfilter: synproxy: fix conntrackd interaction
  ...
2017-05-22 12:42:02 -07:00
Ming Lei
7254a50a5d blk-mq: remove blk_mq_abort_requeue_list()
No one uses it any more, so remove it.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-22 20:50:11 +02:00
Jan Glauber
c2372c2042 of/platform: Make of_platform_device_destroy globally visible
of_platform_device_destroy is the counterpart to
of_platform_device_create which is a non-static function.

After creating a platform device it might be neccessary
to destroy it to deal with -EPROBE_DEFER where a
repeated of_platform_device_create call would fail otherwise.

Therefore also make of_platform_device_destroy globally visible.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-05-22 18:01:46 +02:00
Anatolij Gustschin
020e0b1c8f gpiolib: Add stubs for gpiod lookup table interface
Add stubs for gpiod_add_lookup_table() and gpiod_remove_lookup_table()
for the !GPIOLIB case to prevent build errors.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-05-22 10:39:11 +02:00
Linus Walleij
b4d2ea2af9 Revert "pinctrl: generic: Add bi-directional and output-enable"
This reverts commit 8c58f1a7a4.

It turns out that applying these generic properties was
premature: the properties used in the driver using this
are of unclear electrical nature and the subject need to
be discussed.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-05-22 10:39:10 +02:00
David S. Miller
23416e2304 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) When using IPVS in direct-routing mode, normal traffic from the LVS
   host to a back-end server is sometimes incorrectly NATed on the way
   back into the LVS host. Patch to fix this from Julian Anastasov.

2) Calm down clang compilation warning in ctnetlink due to type
   mismatch, from Matthias Kaehlcke.

3) Do not re-setup NAT for conntracks that are already confirmed, this
   is fixing a problem that was introduced in the previous nf-next batch.
   Patch from Liping Zhang.

4) Do not allow conntrack helper removal from userspace cthelper
   infrastructure if already in used. This comes with an initial patch
   to introduce nf_conntrack_helper_put() that is required by this fix.
   From Liping Zhang.

5) Zero the pad when copying data to userspace, otherwise iptables fails
   to remove rules. This is a follow up on the patchset that sorts out
   the internal match/target structure pointer leak to userspace. Patch
   from the same author, Willem de Bruijn. This also comes with a build
   failure when CONFIG_COMPAT is not on, coming in the last patch of
   this series.

6) SYNPROXY crashes with conntrack entries that are created via
   ctnetlink, more specifically via conntrackd state sync. Patch from
   Eric Leblond.

7) RCU safe iteration on set element dumping in nf_tables, from
   Liping Zhang.

8) Missing sanitization of immediate date for the bitwise and cmp
   expressions in nf_tables.

9) Refcounting logic for chain and objects from set elements does not
   integrate into the nf_tables 2-phase commit protocol.

10) Missing sanitization of target verdict in ebtables arpreply target,
    from Gao Feng.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:00:02 -04:00