Commit Graph

63614 Commits

Author SHA1 Message Date
David S. Miller
ddc9cc0131 Merge tag 'mlx5e-updates-2018-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5e-updates-2018-09-05

This series provides updates to mlx5 ethernet driver.

1) Starting with a four-patch series to optimize flow counter updates,
from Vlad Buslov:
==============================================

By default, the mlx5 driver updates cached counters each second. The update
function consumes a noticeable amount of CPU resources. The goal of this
patch series is to optimize the update function.

Investigation revealed the following bottlenecks in the fs counters
implementation:
 1) Update code (scheduled each second) iterates over all counters twice
 (the first iteration finds and deletes counters that are marked for
 deletion, the second actually updates the counters).
 2) Counters are stored in an rb tree. Linear iteration over all rb tree
 elements (rb_next in profiling data) consumed ~65% of the time spent in
 the update function.

The following optimizations were implemented:
 1) Instead of just marking counters for deletion, store them in a
 standalone list. This removes the first iteration over the whole counters
 tree.
 2) Store counters in a sorted list to optimize traversing them and remove
 calls to rb_next.

The first implementation of these changes degraded performance instead of
improving it. Investigation revealed that the first cache line of struct
mlx5_fc is full and adding anything to it causes the number of cache
misses to double. To mitigate that, the following refactorings were
implemented:
 - Change the 'addlist' list type from double linked to single linked. This
 frees space for one additional pointer that is used to store the
 deletion list (optimization 1).
 - Substitute the rb tree with an idr. Idr is a non-intrusive data structure
 and doesn't require adding any new members to struct mlx5_fc. Use the free
 space that became available for a double linked sorted list that is used
 for traversing all counters (optimization 2).

The described changes reduced CPU time spent in mlx5_fc_stats_work from 70%
to 44% (global perf profile mode).
============================================

The rest of the series are misc updates:

2) From Kamal, Move mlx5e_priv_flags into en_ethtool.c, to avoid a
compilation warning.

3) From Roi Dayan, Move Q counters allocation and drop RQ to init_rx profile
function to avoid allocating Q counters when not required.

4) From Shay Agroskin, Replace PTP clock lock from RW lock to seq lock.
Almost double the packet rate when timestamping is active on multiple TX
queues.

5) From Natali Shechtman, set ECN for received packets using CQE indication.

6) From Alaa Hleihel, don't set CHECKSUM_COMPLETE on SCTP packets.
CHECKSUM_COMPLETE is not applicable to SCTP protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 15:42:04 -07:00
Ming Lei
7759eb23fd block: remove bio_rewind_iter()
It has been pointed out that bio_rewind_iter() is a very bad API [1]:

1) bio size may not be restored after rewinding

2) it causes some bogus changes, such as 5151842b9d (block: reset
bi_iter.bi_done after splitting bio)

3) rewinding really makes things complicated wrt. bio splitting

4) unnecessary updating of .bi_done in fast path

[1] https://marc.info/?t=153549924200005&r=1&w=2

So this patch takes Kent's suggestion to restore a bio to its original
state by saving the bio iterator (struct bvec_iter) in bio_integrity_prep(),
given that bio_rewind_iter() is now only used by the bio integrity code.
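
The save-and-restore idea can be illustrated with a small toy model. This is
only a sketch in Python, not the kernel code: the BvecIter class, its field
names (chosen to loosely resemble struct bvec_iter) and the advance() helper
are all hypothetical. It shows why a saved snapshot makes rewinding
unnecessary.

```python
from dataclasses import dataclass, replace


@dataclass
class BvecIter:
    # Toy stand-in for struct bvec_iter; field names are illustrative only.
    sector: int     # current device sector
    size: int       # bytes remaining
    idx: int        # current vector index
    bvec_done: int  # bytes completed in the current vector


def advance(it, nbytes, sector_size=512):
    """Consume nbytes from the iterator (toy model: only sector and size move)."""
    it.sector += nbytes // sector_size
    it.size -= nbytes


it = BvecIter(sector=2048, size=8192, idx=0, bvec_done=0)
saved = replace(it)   # snapshot taken up front, before any I/O progress
advance(it, 4096)     # the iterator moves as the I/O proceeds
it = replace(saved)   # restore by copying the snapshot back, no rewind logic
```

Restoring from a copy cannot lose the original size, which is the first
problem listed for the rewinding approach.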

Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Hannes Reinecke <hare@suse.com>
Suggested-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-06 15:12:24 -06:00
Linus Torvalds
ca16eb342e Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Small collection of fixes that should go into this release. This
  contains:

   - Small series that fixes a race between blkcg teardown and writeback
     (Dennis Zhou)

   - Fix disallowing invalid block size settings from the nbd ioctl (me)

   - BFQ fix for a use-after-free on last release of a bfqg (Konstantin
     Khlebnikov)

   - Fix for the "don't warn for flush" fix (Mikulas)"

* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
  block: bfq: swap puts in bfqg_and_blkg_put
  block: don't warn when doing fsync on read-only devices
  nbd: don't allow invalid blocksize settings
  blkcg: use tryget logic when associating a blkg with a bio
  blkcg: delay blkg destruction until after writeback has finished
  Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
2018-09-06 14:01:15 -07:00
Linus Torvalds
be65e2595b Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "This fixes two annoying bugs:

   - The first one is a side effect caused by using SRCU for rcuidle
     tracepoints. It seems that perf was depending on the rcuidle
     tracepoints to make RCU watch when it wasn't.

     The real fix will be to have perf use SRCU instead of depending on
     RCU watching, but that can't be done until SRCU is safe to use in
     NMI context (Paul's working on that).

   - The second bug fix is for a bug that's been periodically making my
     tests fail randomly for some time. I haven't had time to track it
     down, but finally have. It has to do with stressing NMIs (via perf)
     while enabling or disabling ftrace function handling with lockdep
     enabled.

     If an interrupt happens and just as it returns, it sets lockdep
     back to "interrupts enabled" but before it returns an NMI is
     triggered, and if this happens while printk_nmi_enter has a
     breakpoint attached to it (because ftrace is converting it to or
     from nop to call fentry), the breakpoint trap also calls into
     lockdep, and since returning from the NMI to an interrupt handler,
     interrupts were disabled when the NMI went off, lockdep keeps its
     state as interrupts disabled when it returns back from the
     interrupt handler where interrupts are enabled.

     This causes lockdep_assert_irqs_enabled() to trigger a false
     positive"

* tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk/tracing: Do not trace printk_nmi_enter()
  tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06 09:06:49 -07:00
Denis Bolotin
a3f723079d qed*: Utilize FW 8.37.7.0
This patch adds a new qed firmware with fixes and support for new features.

Fixes:
- Fix a rare case of device crash with iWARP, iSCSI or FCoE offload.
- Fix GRE tunneled traffic when iWARP offload is enabled.
- Fix RoCE failure in ib_send_bw when using inline data.
- Fix latency optimization flow for inline WQEs.
- BigBear 100G fix

RDMA:
- Reduce task context size.
- Application page sizes above 2GB support.
- Performance improvements.

ETH:
- Tenant DCB support.
- Replace RSS indirection table update interface.

Misc:
- Debug Tools changes.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-06 07:44:35 -07:00
Vincent Whitchurch
fa788d986a packet: add sockopt to ignore outgoing packets
Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them.  So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this.  Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.
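
For illustration, a user-space program might enable the new option roughly
as follows. This is a hedged sketch in Python rather than code from the
patch: the numeric values of SOL_PACKET and PACKET_IGNORE_OUTGOING are
hard-coded on the assumption that they match the kernel UAPI headers,
because the Python socket module may not export them, and the helper names
are hypothetical.

```python
import socket

# Assumed to match <linux/socket.h> and <linux/if_packet.h>.
SOL_PACKET = 263
PACKET_IGNORE_OUTGOING = 23
ETH_P_ALL = 0x0003


def ignore_outgoing(sock):
    """Ask the kernel not to deliver outgoing packets to this packet socket."""
    sock.setsockopt(SOL_PACKET, PACKET_IGNORE_OUTGOING, 1)


def open_rx_only_socket(ifname):
    """Open an AF_PACKET socket on ifname that only sees incoming packets.

    Needs Linux and CAP_NET_RAW, so it is not called in this sketch.
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    ignore_outgoing(sock)
    sock.bind((ifname, 0))
    return sock
```

On a kernel without this option the setsockopt() call is expected to fail,
so a real caller would likely catch OSError and fall back to a BPF filter.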

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:09:37 -07:00
Shay Agroskin
64109f1dc4 net/mlx5e: Replace PTP clock lock from RW lock to seq lock
Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock'
in order to improve packet rate performance.

Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
Sent 64b packets between two peers connected by ConnectX-5,
and measured packet rate for the receiver in three modes:
	no time-stamping (base rate)
	time-stamping using rw_lock (old lock) for critical region
	time-stamping using seq_lock (new lock) for critical region
Only the receiver time stamped its packets.

The measured packet rate improvements are:

	Single flow (multiple TX rings to single RX ring):
		without timestamping:	  4.26 (M packets)/sec
		with rw-lock (old lock):  4.1  (M packets)/sec
		with seq-lock (new lock): 4.16 (M packets)/sec
		1.46% improvement

	Multiple flows (multiple TX rings to six RX rings):
		without timestamping: 	  22   (M packets)/sec
		with rw-lock (old lock):  11.7 (M packets)/sec
		with seq-lock (new lock): 21.3 (M packets)/sec
		82.05% improvement

The packet rate improvement is due to the lack of atomic operations
for the 'readers' by the seq-lock.
Since there are many more 'readers' than 'writers' contending
on this lock, almost all atomic operations are saved.
This results in a dramatic decrease in overall
cache misses.
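
The quoted percentages appear to be the seq-lock rate relative to the
rw-lock rate, not relative to the untimestamped base rate. A short Python
check of that reading, using only the figures given above (the function
name is made up for this example):

```python
def improvement_pct(old_rate, new_rate):
    """Relative gain of new_rate over old_rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100.0


# Rates in millions of packets per second, taken from the message above.
single_flow = improvement_pct(4.1, 4.16)
multi_flow = improvement_pct(11.7, 21.3)

print(f"single flow: {single_flow:.2f}%")
print(f"multiple flows: {multi_flow:.2f}%")
```

Under this reading, the multiple-flow seq-lock rate (21.3) ends up close to
the untimestamped rate (22), which matches the claim that almost all of the
locking overhead is gone.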

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
12d6066c3b net/mlx5: Add flow counters idr
The previous patch in the series changed the flow counter storage structure
from rb_tree to linked list in order to improve flow counter traversal
performance. The drawback of such a solution is that flow counter lookup by
id becomes linear in complexity.

Store pointers to flow counters in an idr in order to improve lookup
performance to logarithmic again. Idr is a non-intrusive data structure and
doesn't require extending the flow counter struct with new elements. This
means that the idr can be used for lookup, while the linked list from the
previous patch is used for traversal, and struct mlx5_fc size is <= 2 cache
lines.
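
The split between a lookup structure and a traversal structure can be
sketched in a few lines. This is a hypothetical Python model, not the driver
code: a dict stands in for the idr, a sorted Python list stands in for the
kernel's sorted linked list, and the class and method names are invented for
this example.

```python
import bisect


class FlowCounterStore:
    """Toy model: one index for lookup by id, one ordered sequence for traversal."""

    def __init__(self):
        self._by_id = {}       # plays the role of the idr: id -> counter
        self._sorted_ids = []  # plays the role of the sorted list

    def add(self, counter_id, counter):
        if counter_id in self._by_id:
            raise KeyError(f"counter {counter_id} already exists")
        self._by_id[counter_id] = counter
        bisect.insort(self._sorted_ids, counter_id)

    def lookup(self, counter_id):
        """Find a counter by id without walking the traversal list."""
        return self._by_id.get(counter_id)

    def remove(self, counter_id):
        del self._by_id[counter_id]
        idx = bisect.bisect_left(self._sorted_ids, counter_id)
        del self._sorted_ids[idx]

    def traverse(self):
        """Visit all counters in id order, as the periodic stats work would."""
        for counter_id in self._sorted_ids:
            yield counter_id, self._by_id[counter_id]
```

The point of the design is that the lookup index lives outside the counter
objects, so the counter struct itself does not grow.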

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
9aff93d7d0 net/mlx5: Store flow counters in a list
In order to improve performance of the flow counter stats query loop that
traverses all configured flow counters, replace the rb_tree with a
double-linked list. This change improves performance of traversing flow
counters by removing the tree traversal. (Profiling data showed that the
call to rb_next was the top CPU consumer.)

However, lookup of a flow counter in the list becomes linear, instead of
logarithmic. This problem is fixed by the next patch in the series, which
adds an idr for fast lookup. Idr is to be used because it is not an
intrusive data structure and doesn't require adding any new members to
struct mlx5_fc, which allows its control data part to stay <= 1 cache line
in size.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
6e5e228391 net/mlx5: Add new list to store deleted flow counters
In order to prevent the flow counters stats work function from traversing
the whole flow counters tree while searching for deleted flow counters, a
new list to store deleted flow counters is added to struct mlx5_fc_stats. A
lockless NULL-terminated single linked list data type is used for the
following reasons:
 - This use case only needs to add a single element to the list and
 remove/iterate the whole list. A lockless list doesn't require any
 additional synchronization for these operations.
 - The first cache line of the flow counter data structure only has space
 to store a single additional pointer, which precludes usage of a double
 linked list.

Remove the flow counter 'deleted' flag that is no longer needed.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:57 -07:00
Vlad Buslov
83033688b7 net/mlx5: Change flow counters addlist type to single linked list
In order to prevent the flow counters stats work function from traversing
the whole flow counters tree while searching for deleted flow counters, a
new list to store deleted flow counters will be added to struct
mlx5_fc_stats. However, the flow counter structure itself has no space left
to store any more data in the first cache line. To free the space that is
needed to store an additional list node, convert the current addlist double
linked list (two pointers per node) to an atomic single linked list (one
pointer per node).

A lockless NULL-terminated single linked list data type doesn't require any
additional external synchronization for the operations used by the flow
counters module (add a single new element, remove all elements from the
list and traverse them). Remove addlist_lock that is no longer needed.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 21:14:56 -07:00
Tariq Toukan
a090362210 net/mlx5: Use u16 for Work Queue buffer strides offset
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.
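
The bound can be checked numerically. The Python sketch below is only an
illustration: the list of page sizes is an assumption (common Linux page
sizes), and the names are made up for this example.

```python
MIN_STRIDE = 16   # minimal stride size in bytes, from the message above
U16_MAX = 0xFFFF  # largest value an unsigned 16-bit integer can hold


def max_strides_per_fragment(page_size):
    """Upper bound on strides in one page-sized fragment."""
    return page_size // MIN_STRIDE


# Assumed page sizes: 4 KiB, 16 KiB and 64 KiB.
for page_size in (4096, 16384, 65536):
    strides = max_strides_per_fragment(page_size)
    assert strides <= U16_MAX
    print(f"PAGE_SIZE={page_size}: at most {strides} strides")
```

Even with 64 KiB pages the bound is 4096 strides, well below the u16 limit.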

Fixes: d7037ad73d ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Tariq Toukan
8d71e81850 net/mlx5: Use u16 for Work Queue buffer fragment size
Minimal stride size is 16.
Hence, the number of strides in a fragment (of PAGE_SIZE)
is <= PAGE_SIZE / 16 <= 4K.

u16 is sufficient to represent this.

Fixes: 388ca8be00 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Jack Morgenstein
76d5581c87 net/mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05 17:08:33 -07:00
Jason Gunthorpe
2c910cb75e Merge branch 'uverbs_dev_cleanups' into rdma.git for-next
For dependencies, branch based on rdma.git 'for-rc' of
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/

Pull 'uverbs_dev_cleanups' from Leon Romanovsky:

====================
Reuse the char device code interfaces to simplify ib_uverbs_device
creation and destruction. As part of this series, we are sending a fix to
the cleanup path, which was discovered during internal review.

The fix definitely can go to -rc, but it means that this series will be
dependent on rdma-rc.
====================

* branch 'uverbs_dev_cleanups':
  RDMA/uverbs: Use device.groups to initialize device attributes
  RDMA/uverbs: Use cdev_device_add() instead of cdev_add()
  RDMA/core: Depend on device_add() to add device attributes
  RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()

Resolved conflict in ib_device_unregister_sysfs()

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05 16:21:22 -06:00
Steven Rostedt (VMware)
865e63b04e tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
Borislav reported the following splat:

 =============================
 WARNING: suspicious RCU usage
 4.19.0-rc1+ #1 Not tainted
 -----------------------------
 ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 RCU used illegally from extended quiescent state!
 1 lock held by swapper/0/0:
  #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
 Call Trace:
  dump_stack+0x85/0xcb
  perf_event_output_forward+0xf6/0x130
  __perf_event_overflow+0x52/0xe0
  perf_swevent_overflow+0x91/0xb0
  perf_tp_event+0x11a/0x350
  ? find_held_lock+0x2d/0x90
  ? __lock_acquire+0x2ce/0x1350
  ? __lock_acquire+0x2ce/0x1350
  ? retint_kernel+0x2d/0x2d
  ? find_held_lock+0x2d/0x90
  ? tick_nohz_get_sleep_length+0x83/0xb0
  ? perf_trace_cpu+0xbb/0xd0
  ? perf_trace_buf_alloc+0x5a/0xa0
  perf_trace_cpu+0xbb/0xd0
  cpuidle_enter_state+0x185/0x340
  do_idle+0x1eb/0x260
  cpu_startup_entry+0x5f/0x70
  start_kernel+0x49b/0x4a6
  secondary_startup_64+0xa4/0xb0

This is due to the tracepoints moving to SRCU usage which does not require
RCU to be "watching". But perf uses these tracepoints with RCU and expects
it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
working in NMI context, and then perf can be converted to use that instead
of normal RCU.

Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home

Cc: x86-ml <x86@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: e6753f23d9 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-05 11:23:21 -04:00
Randy Dunlap
d23df2dc56 linux/mod_devicetable.h: fix kernel-doc missing notation for typec_device_id
Fix kernel-doc warning for missing struct member description:

../include/linux/mod_devicetable.h:763: warning: Function parameter or member 'driver_data' not described in 'typec_device_id'

Fixes: 8a37d87d72 ("usb: typec: Bus type for alternate modes")

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 14:36:53 +02:00
Harry Cutts
1ff2e1a44e HID: input: Create a utility class for counting scroll events
To avoid code duplication, this class counts high-resolution scroll
movements and emits the legacy low-resolution events when appropriate.
Drivers should be able to create one instance for each scroll wheel that
they need to handle.
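
One way such a counter could work is sketched below. This is a hypothetical
Python model, not the kernel helper: the class name, the constructor
parameter and the policy of emitting a legacy event once a full low-res
step has accumulated are assumptions, and the real code may use a different
threshold or reset policy.

```python
class ScrollCounter:
    """Accumulates hi-res scroll units and reports whole legacy steps."""

    def __init__(self, hi_res_per_lo_res):
        # Number of hi-res units that make up one legacy low-res step.
        self.hi_res_per_lo_res = hi_res_per_lo_res
        self.remainder = 0

    def feed(self, hi_res_delta):
        """Add a hi-res movement; return the signed number of legacy steps to emit."""
        self.remainder += hi_res_delta
        # int() truncates toward zero, so up and down scrolling are symmetric.
        lo_res = int(self.remainder / self.hi_res_per_lo_res)
        self.remainder -= lo_res * self.hi_res_per_lo_res
        return lo_res


wheel = ScrollCounter(hi_res_per_lo_res=8)
events = [wheel.feed(delta) for delta in (3, 3, 3, -9)]
```

A driver with two wheels would hold two such objects, one per wheel, which
is the per-wheel instance the message describes.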

Signed-off-by: Harry Cutts <hcutts@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-09-05 10:12:07 +02:00
Shaul Triebitz
add7453ad6 wireless: align to draft 11ax D3.0
Align to new 11ax draft D3.0.  Change/add new MAC and PHY capabilities
and update drivers' 11ax capabilities and mac80211's debugfs
accordingly.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:09:50 +02:00
Johannes Berg
b0aa75f0b1 ieee80211: add new VHT capability fields/parsing
IEEE 802.11-2016 extended the VHT capability fields to allow
indicating the number of spatial streams depending on the
actually used bandwidth, add support for decoding this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:14 +02:00
Sara Sharon
03512ceb60 ieee80211: remove redundant leading zeroes
The defines of IEEE80211_HE_OPERATION_VHT_OPER_INFO and
IEEE80211_HE_OPERATION_MULTI_BSSID_AP have leading zeroes
that make the numbers look like they are bigger than 32 bits.
This is misleading, remove them.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05 10:03:13 +02:00
Mark Bloch
50acec06f3 net/mlx5: Export packet reformat alloc/dealloc functions
This will allow for the RDMA side to allocate packet reformat context.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 08:11:26 +03:00
Mark Bloch
bea4e1f6c6 net/mlx5: Expose new packet reformat capabilities
Expose new abilities when creating a packet reformat context.

The new types which can be created are:
MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL: Ability to create generic encap
operation to be done by the HW.

MLX5_REFORMAT_TYPE_L3_TUNNEL_TO_L2: Ability to create generic decap
operation where the inner packet doesn't contain L2.

MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL: Ability to create generic encap
operation to be done by the HW. The L2 of the original packet
is dropped.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 08:11:09 +03:00
Mark Bloch
60786f0987 {net, RDMA}/mlx5: Rename encap to reformat packet
Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any functional change and is
needed to properly reflect the operation being done by this action.
For example, not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 08:10:59 +03:00
Mark Bloch
e0e7a3861b net/mlx5: Move header encap type to IFC header file
Those bits are part of the hardware specification and should be defined
in the IFC header file.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 08:10:51 +03:00
Mark Bloch
61444b458b net/mlx5: Break encap/decap into two separated flow table creation flags
Today we are able to attach encap and decap actions only to the FDB. In
preparation to enable those actions on the NIC flow tables, break the
single flag into two. Those flags control whether decap or encap
operations can be attached to the flow table created. For FDB, if
encapsulation is required, we set both of them.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 07:58:00 +03:00
Mark Bloch
90c1d1b8da net/mlx5: Export modify header alloc/dealloc functions
Those functions will be used by the RDMA side to create modify header
actions to be attached to flow steering rules via verbs.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 07:56:40 +03:00
Mark Bloch
8ce7825796 net/mlx5: Add proper NIC TX steering flow tables support
Extend the ability to add steering rules to NIC TX flow tables.
For now, we are only adding TX bypass (egress) which is used by the RDMA
side. This will allow shaping outgoing traffic and tweaking it if needed,
for example by performing encapsulation or rewriting headers.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-05 07:56:33 +03:00
David S. Miller
36302685f5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
2018-09-04 21:33:03 -07:00
Kurt Kanzenbach
ff8648f29f mtd: rawnand: fsl_ifc: fixup SRAM init for newer ctrl versions
Newer versions of the IFC controller use a different method of initializing the
internal SRAM: Instead of reading from flash, a bit in the NAND configuration
register has to be set in order to trigger the self-initializing process.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-09-04 23:37:38 +02:00
Boris Brezillon
7525c9518e mtd: rawnand: Get rid of the ->read_word() hook
Commit c120e75e0e ("mtd: nand: use read_oob() instead of cmdfunc()
for bad block check") removed the only user of the ->read_word()
method but kept the hook in place. Remove it now.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-09-04 22:53:13 +02:00
Linus Torvalds
28619527b8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Must perform TXQ teardown before unregistering interfaces in
    mac80211, from Toke Høiland-Jørgensen.

 2) Don't allow creating mac80211_hwsim with less than one channel, from
    Johannes Berg.

 3) Division by zero in cfg80211, fix from Johannes Berg.

 4) Fix endian issue in tipc, from Haiqing Bai.

 5) BPF sockmap use-after-free fixes from Daniel Borkmann.

 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.

 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.

 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
    kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.

 9) Fix deadlock in hv_netvsc, from Dexuan Cui.

10) Do not restart timewait timer on RST, from Florian Westphal.

11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.

12) Unsolicited report count handling is off-by-one, fix from Hangbin Liu.

13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.

14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.

15) Use-after-free in act_ife, fix from Cong Wang.

16) Missing hold to meta module in act_ife, from Vlad Buslov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
  net: phy: sfp: Handle unimplemented hwmon limits and alarms
  net: sched: action_ife: take reference to meta module
  act_ife: fix a potential use-after-free
  net/mlx5: Fix SQ offset in QPs with small RQ
  tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
  tipc: correct spelling errors for struct tipc_bc_base's comment
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
  bnxt_en: Clean up unused functions.
  bnxt_en: Fix firmware signaled resource change logic in open.
  sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
  sctp: fix invalid reference to the index variable of the iterator
  net/ibm/emac: wrong emac_calc_base call was used by typo
  net: sched: null actions array pointer before releasing action
  vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
  r8169: add support for NCube 8168 network card
  ip6_tunnel: respect ttl inherit for ip6tnl
  mac80211: shorten the IBSS debug messages
  mac80211: don't Tx a deauth frame if the AP forbade Tx
  mac80211: Fix station bandwidth setting after channel switch
  mac80211: fix a race between restart and CSA flows
  ...
2018-09-04 12:45:11 -07:00
Benjamin Tissoires
0d6c301140 HID: core: fix grouping by application
commit f07b3c1da9 ("HID: generic: create one input report per
application type") was effectively the same as MULTI_INPUT:
hidinput->report was never set, so hidinput_match_application()
always returned null.

Fix that by testing against the real application.

Note that this breaks some old eGalax touchscreens that expect MULTI_INPUT
instead of HID_QUIRK_INPUT_PER_APP. Enable this quirk for backward
compatibility on all non-Win8 touchscreens.

link: https://bugzilla.kernel.org/show_bug.cgi?id=200847
link: https://bugzilla.kernel.org/show_bug.cgi?id=200849
link: https://bugs.archlinux.org/task/59699
link: https://github.com/NixOS/nixpkgs/issues/45165

Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-09-04 21:31:43 +02:00
Alexander Popov
964c9dff00 stackleak: Allow runtime disabling of kernel stack erasing
Introduce CONFIG_STACKLEAK_RUNTIME_DISABLE option, which provides
'stack_erasing' sysctl. It can be used at runtime to control kernel
stack erasing for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-09-04 10:35:48 -07:00
Alexander Popov
c8d126275a fs/proc: Show STACKLEAK metrics in the /proc file system
Introduce CONFIG_STACKLEAK_METRICS providing STACKLEAK information about
tasks via the /proc file system. In particular, /proc/<pid>/stack_depth
shows the maximum kernel stack consumption for the current and previous
syscalls. Although this information is not precise, it can be useful for
estimating the STACKLEAK performance impact for your workloads.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-09-04 10:35:48 -07:00
Alexander Popov
afaef01c00 x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
The STACKLEAK feature (initially developed by PaX Team) has the following
benefits:

1. Reduces the information that can be revealed through kernel stack leak
   bugs. The idea of erasing the thread stack at the end of syscalls is
   similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
   crypto, which all comply with FDP_RIP.2 (Full Residual Information
   Protection) of the Common Criteria standard.

2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
   CVE-2010-2963). Bugs of that kind should be eliminated by improving C
   compilers in the future, which might take a long time.

This commit introduces the code filling the used part of the kernel
stack with a poison value before returning to userspace. Full
STACKLEAK feature also contains the gcc plugin which comes in a
separate commit.

The STACKLEAK feature is ported from grsecurity/PaX. More information at:
  https://grsecurity.net/
  https://pax.grsecurity.net/

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Performance impact:

Hardware: Intel Core i7-4770, 16 GB RAM

Test #1: building the Linux kernel on a single core
        0.91% slowdown

Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
        4.2% slowdown

So the STACKLEAK description in Kconfig includes: "The tradeoff is the
performance impact: on a single CPU system kernel compilation sees a 1%
slowdown, other systems and workloads may vary and you are advised to
test this feature on your expected workload before deploying it".

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-09-04 10:35:47 -07:00
Moni Shoua
aa7e80b220 net/mlx5: Fix atomic_mode enum values
The field atomic_mode is 4 bits wide and therefore can hold values
from 0x0 to 0xf. Remove the unnecessary 20-bit shift that made the values
incorrect. While at it, remove unused enum values.
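
A small sketch of why a shifted enum value cannot be stored in a 4-bit
field. The names below (DEMO_FIELD_MASK, DEMO_SHIFTED_CX, DEMO_PLAIN_CX)
are hypothetical and do not reproduce the mlx5 definitions; the value 2
is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FIELD_BITS 4u
#define DEMO_FIELD_MASK ((1u << DEMO_FIELD_BITS) - 1u)	/* 0xf */

/* Hypothetical enum values: one pre-shifted by 20 bits, one plain. */
enum demo_mode_shifted { DEMO_SHIFTED_CX = 2u << 20 };
enum demo_mode_plain { DEMO_PLAIN_CX = 2 };

/* What a 4-bit field keeps of an arbitrary value: the low 4 bits only. */
static uint32_t demo_truncate_to_field(uint32_t value)
{
	return value & DEMO_FIELD_MASK;
}

/* True if the value can be stored in the field without loss. */
static int demo_fits_field(uint32_t value)
{
	return value <= DEMO_FIELD_MASK;
}

/* Store a mode, refusing values that cannot fit. */
static uint32_t demo_encode_mode(uint32_t mode)
{
	assert(demo_fits_field(mode));
	return mode & DEMO_FIELD_MASK;
}
```

With these definitions the shifted value truncates to 0 when written to
the field, while the plain value is stored unchanged.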

Fixes: 57cda166bb ("net/mlx5: Add DCT command interface")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-04 15:03:06 +03:00
Linus Walleij
97feacc05d gpio: ts5500: Delete platform data handling
The TS5500 GPIO driver apparently supports platform data
without making any use of it whatsoever. Delete this code;
this is the last chance to speak up if you think it is needed.

Cc: kernel@savoirfairelinux.com
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Jerome Oufella <jerome.oufella@savoirfairelinux.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-09-04 08:22:47 +02:00
Linus Walleij
bf97279079 gpio: ts5500: Use SPDX header
Cut some boilerplate, use the SPDX license identifier.

Cc: kernel@savoirfairelinux.com
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Jerome Oufella <jerome.oufella@savoirfairelinux.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-09-04 08:22:47 +02:00
Martin K. Petersen
b76377543b crc-t10dif: Pick better transform if one becomes available
T10 CRC library is linked into the kernel thanks to block and SCSI. The
crypto accelerators are typically loaded later as modules and are
therefore not available when the T10 CRC library is initialized.

Use the crypto notifier facility to trigger a switch to a better algorithm
if one becomes available after the initial hash has been registered. Use
RCU to protect the original transform while the new one is being set up.
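
The switch-over pattern can be sketched in userspace as follows. This is
an assumption-laden model, not the kernel code: C11 atomics stand in for
RCU pointer publication (there is no grace period or freeing of the old
transform), the "accelerated" transform is a stand-in that reuses the
bitwise routine, and all demo_* names and priorities are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct demo_tfm {
	const char *name;
	int priority;	/* higher is better */
	uint16_t (*digest)(const uint8_t *buf, size_t len);
};

/* Bitwise CRC16 with the T10-DIF polynomial 0x8BB7, initial value 0. */
static uint16_t demo_crc_bitwise(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Stand-in for an accelerated implementation loaded later as a module. */
static uint16_t demo_crc_accel(const uint8_t *buf, size_t len)
{
	return demo_crc_bitwise(buf, len);
}

static const struct demo_tfm demo_generic = { "generic", 100, demo_crc_bitwise };
static const struct demo_tfm demo_accel = { "accel", 200, demo_crc_accel };

/* Currently published transform; readers load it, the notifier stores it. */
static _Atomic(const struct demo_tfm *) demo_current = &demo_generic;

/* Notifier-style callback: publish the candidate only if it is better. */
static void demo_on_alg_registered(const struct demo_tfm *candidate)
{
	const struct demo_tfm *old;

	assert(candidate != NULL);
	old = atomic_load_explicit(&demo_current, memory_order_acquire);
	if (candidate->priority > old->priority)
		atomic_store_explicit(&demo_current, candidate,
				      memory_order_release);
}

static int demo_current_priority(void)
{
	return atomic_load_explicit(&demo_current, memory_order_acquire)->priority;
}

/* Reader side: always hash through whichever transform is published. */
static uint16_t demo_crc_t10dif(const uint8_t *buf, size_t len)
{
	const struct demo_tfm *tfm =
		atomic_load_explicit(&demo_current, memory_order_acquire);

	return tfm->digest(buf, len);
}
```

Callers never see a half-initialized transform because the new pointer
is published only after the candidate is fully set up.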

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04 11:37:05 +08:00
Kees Cook
f3569fd613 crypto: shash - Remove VLA usage in unaligned hashing
In the quest to remove all stack VLA usage from the kernel[1], this uses
the newly defined max alignment to perform unaligned hashing to avoid
VLAs, and drops the helper function while adding sanity checks on the
resulting buffer sizes. Additionally, the __aligned_largest macro is
removed since this helper was the only user.
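
A rough userspace sketch of the general technique: a fixed-size scratch
buffer sized from a maximum alignment mask replaces a VLA, and a sanity
check guards the resulting size. DEMO_MAX_ALIGNMASK, its value of 63 and
the function name are assumptions made for this example; this is not the
shash code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ALIGNMASK 63u	/* assumed upper bound for this sketch */

/*
 * Stage the unaligned head of 'data' through an aligned, fixed-size
 * scratch buffer and sum the staged bytes into *sum. Returns the number
 * of bytes staged, 0 if 'data' is already aligned, or -1 if a sanity
 * check fails.
 */
static long demo_stage_unaligned_head(const uint8_t *data, size_t len,
				      size_t alignmask, uint32_t *sum)
{
	uint8_t scratch[DEMO_MAX_ALIGNMASK * 2];	/* fixed size, no VLA */
	uintptr_t base = (uintptr_t)scratch;
	size_t misalign, head, offset;
	uint8_t *aligned;

	assert(data != NULL && sum != NULL);
	*sum = 0;

	/* The mask must be 2^n - 1 and within the compile-time bound. */
	if (alignmask > DEMO_MAX_ALIGNMASK || (alignmask & (alignmask + 1)) != 0)
		return -1;

	misalign = (size_t)((uintptr_t)data & alignmask);
	if (misalign == 0)
		return 0;

	head = alignmask + 1 - misalign;	/* bytes up to the next boundary */
	if (head > len)
		head = len;

	/* Round the scratch start up to the requested alignment. */
	offset = (size_t)(((base + alignmask) & ~(uintptr_t)alignmask) - base);
	aligned = scratch + offset;

	/* Sanity check: the staged bytes must fit in the fixed buffer. */
	if (offset + head > sizeof(scratch))
		return -1;

	memcpy(aligned, data, head);
	for (size_t i = 0; i < head; i++)
		*sum += aligned[i];
	return (long)head;
}
```

Because the input is misaligned by at least one byte whenever staging
happens, both the alignment offset and the head length are at most the
mask value, so twice the maximum mask is enough room.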

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04 11:37:03 +08:00
Anthony Wong
9fd0e09a4e r8169: add support for NCube 8168 network card
This card identifies itself as:
  Ethernet controller [0200]: NCube Device [10ff:8168] (rev 06)
  Subsystem: TP-LINK Technologies Co., Ltd. Device [7470:3468]

Adding a new entry to rtl8169_pci_tbl makes the card work.
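
How an ID table entry makes a device match can be sketched with a toy
table. The struct and function names are hypothetical and this is not
the rtl8169_pci_tbl definition; only the 10ff:8168 pair is taken from
the output above, and the 10ec:8168 entry is included as an example of
an existing Realtek ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* Toy ID table, terminated by an all-zero entry. */
static const struct demo_pci_id demo_id_table[] = {
	{ 0x10ec, 0x8168 },	/* example of an existing Realtek entry */
	{ 0x10ff, 0x8168 },	/* newly added NCube entry */
	{ 0, 0 }
};

/* Returns 1 if the vendor/device pair is listed in the table, else 0. */
static int demo_pci_match(const struct demo_pci_id *table,
			  uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (; table->vendor != 0; table++) {
		if (table->vendor == vendor && table->device == device)
			return 1;
	}
	return 0;
}
```

Without the second entry, a lookup for 10ff:8168 finds no match and the
driver would never be bound to the card.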

Link: http://launchpad.net/bugs/1788730
Signed-off-by: Anthony Wong <anthony.wong@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-03 19:05:13 -07:00
Michał Mirosław
4d18975c78 fbdev: add remove_conflicting_pci_framebuffers()
Almost all PCI drivers using remove_conflicting_framebuffers() wrap it
with the same code.

v2: add kerneldoc for DRM helper
v3: propagate remove_conflicting_framebuffers() return value
  + move kerneldoc to where function is implemented

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/7db1c278276de420eb45a1b71d06b5eb6bbd49ef.1535810304.git.mirq-linux@rere.qmqm.pl
2018-09-03 18:15:40 +02:00
Marek Szyprowski
3edd79cf5a regulator: Fix 'do-nothing' value for regulators without suspend state
Some regulators don't have all states defined, and in such cases the
regulator core should not assume anything. However, in the current
implementation of of_get_regulation_constraints() the DO_NOTHING_IN_SUSPEND
enable value was set only for regulators which had a suspend node defined;
otherwise the default 0 value was used, which means DISABLE_IN_SUSPEND.
This led to broken system suspend/resume on boards with simple regulator
constraint definitions (without suspend state nodes).

To avoid further mismatches between the default and uninitialized values
of the suspend enabled/disabled states, change their values
so that the default '0' means DO_NOTHING_IN_SUSPEND.
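
The effect of the renumbering on zero-initialized state can be sketched
as below. The demo_* names are hypothetical; the only facts taken from
the description above are that 0 used to mean DISABLE_IN_SUSPEND and now
means DO_NOTHING_IN_SUSPEND. The remaining numeric values are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Old numbering: an untouched (zeroed) field reads as "disable". */
enum demo_suspend_old {
	DEMO_OLD_DO_NOTHING = -1,	/* assumed value */
	DEMO_OLD_DISABLE = 0,
	DEMO_OLD_ENABLE = 1		/* assumed value */
};

/* New numbering: an untouched (zeroed) field reads as "do nothing". */
enum demo_suspend_new {
	DEMO_NEW_DO_NOTHING = 0,
	DEMO_NEW_DISABLE = 1,		/* assumed value */
	DEMO_NEW_ENABLE = 2		/* assumed value */
};

/* Suspend state as it looks when no suspend node filled it in. */
struct demo_suspend_state {
	int enabled;
};

static int demo_old_would_disable(const struct demo_suspend_state *s)
{
	assert(s != NULL);
	return s->enabled == DEMO_OLD_DISABLE;
}

static int demo_new_would_disable(const struct demo_suspend_state *s)
{
	assert(s != NULL);
	return s->enabled == DEMO_NEW_DISABLE;
}
```

A zeroed state therefore no longer requests that the regulator be
switched off during suspend.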

Fixes: 72069f9957 ("regulator: leave one item to record whether regulator is enabled")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
2018-09-03 16:10:40 +01:00
Amir Goldstein
1e6cb72399 fsnotify: add super block object type
Add the infrastructure to attach a mark to a super_block struct
and detach all attached marks when super block is destroyed.

This is going to be used by fanotify backend to setup super block
marks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03 15:14:01 +02:00
Jann Horn
9da3f2b740 x86/fault: BUG() when uaccess helpers fault on kernel addresses
There have been multiple kernel vulnerabilities that permitted userspace to
pass completely unchecked pointers through to userspace accessors:

 - the waitid() bug - commit 96ca579a1e ("waitid(): Add missing
   access_ok() checks")
 - the sg/bsg read/write APIs
 - the infiniband read/write APIs

These don't happen all that often, but when they do happen, it is hard to
test for them properly; and it is probably also hard to discover them with
fuzzing. Even when an unmapped kernel address is supplied to such buggy
code, it just returns -EFAULT instead of doing a proper BUG() or at least
WARN().

Try to make such misbehaving code a bit more visible by refusing to do a
fixup in the pagefault handler code when a userspace accessor causes a #PF
on a kernel address and the current context isn't whitelisted.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: dvyukov@google.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-7-jannh@google.com
2018-09-03 15:12:09 +02:00
Eric Long
4ac6954647 dmaengine: sprd: Support DMA link-list mode
The Spreadtrum DMA controller supports a link-list transaction mode, in
which the controller performs transactions one by one automatically once
they have been linked through the link-list register.
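
The idea of chained transactions can be modelled in software. This is a
sketch under loud assumptions: the descriptor layout and the names
demo_dma_desc and demo_dma_run_chain are hypothetical, memcpy() stands
in for the hardware transfer, and nothing here reflects the Spreadtrum
register format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one transfer plus a link to the next one. */
struct demo_dma_desc {
	const void *src;
	void *dst;
	size_t len;
	struct demo_dma_desc *next;	/* NULL terminates the chain */
};

/*
 * Model of the controller walking a linked chain without software
 * intervention between transfers. Returns the number of descriptors
 * processed.
 */
static size_t demo_dma_run_chain(const struct demo_dma_desc *first)
{
	size_t done = 0;

	for (const struct demo_dma_desc *d = first; d != NULL; d = d->next) {
		assert(d->src != NULL && d->dst != NULL);
		memcpy(d->dst, d->src, d->len);	/* stand-in for the transfer */
		done++;
	}
	return done;
}
```

The point of the mode is that software sets up the chain once and the
controller follows the links on its own.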

Signed-off-by: Eric Long <eric.long@spreadtrum.com>
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2018-09-03 16:58:50 +05:30
Christian Borntraeger
c43c5e9f52 timekeeping: Fix declaration of read_persistent_wall_and_boot_offset()
The function is named read_persistent_wall_and_boot_offset(), not
read_persistent_clock_and_boot_offset().

Fixes: 3eca993740 ("timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Link: https://lkml.kernel.org/r/20180903081533.34366-1-borntraeger@de.ibm.com
2018-09-03 13:26:44 +02:00
Linus Torvalds
fd6868d82b Merge tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
 "A couple of new helper functions in preparation for some tree wide
  clean-ups.

  I'm sending these new helpers now for rc2 in order to simplify the
  dependencies on subsequent cleanups across the tree in 4.20"

* tag 'devicetree-fixes-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add device_type access helper functions
  of: add node name compare helper functions
  of: add helper to lookup compatible child node
2018-09-02 10:56:01 -07:00
David S. Miller
fd3c040b24 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-01

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.

2) BPF verifier improvements by giving each register its own liveness
   chain, which allows simplifying and getting rid of the skip_callee()
   logic, from Edward.

3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
   and percpu lru hashmap. Also add generic percpu formatted print on
   bpftool so the same can be dumped there, from Yonghong.

4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
   TCP_SAVED_SYN options to allow reflection of tos/tclass from received
   SYN packet, from Nikita.

5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
   interaction and removal of incorrect shutdown() calls, from John.

6) A few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 17:41:08 -07:00