Commit Graph

63614 Commits

Author SHA1 Message Date
Paolo Abeni
72a338bcc6 net: core: rework basic flow dissection helper
When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, rework the basic flow dissect
helper to use it, and apply this new helper where possible - namely in
skb_probe_transport_header(). The used flow dissector data structures
are renamed to match more closely the new role.

The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. Small, but measurable improvement is measured also
in macro benchmarking.

v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
  as per DaveM suggestion

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08 00:02:36 -04:00
Sudarsana Reddy Kalluru
cac6f69154 qed: Add support for Unified Fabric Port.
This patch adds driver changes for supporting the Unified Fabric Port
(UFP). This is a new paritioning mode wherein MFW provides the set of
parameters to be used by the device such as traffic class, outer-vlan
tag value, priority type etc. Drivers receives this info via notifications
from mfw and configures the hardware accordingly.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:46:10 -04:00
Sudarsana Reddy Kalluru
27bf96e32c qed: Remove unused data member 'is_mf_default'.
The data member 'is_mf_default' is not used by the qed/qede drivers,
removing the same.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:46:10 -04:00
Sudarsana Reddy Kalluru
0bc5fe8572 qed*: Refactor mf_mode to consist of bits.
`mf_mode' field indicates the multi-partitioning mode the device is
configured to. This method doesn't scale very well, adding a new MF mode
requires going over all the existing conditions, and deciding whether those
are needed for the new mode or not.
The patch defines a set of bit-fields for modes which are derived according
to the mode info shared by the MFW and all the configuration would be made
according to those. To add a new mode, there would be a single place where
we'll need to go and choose which bits apply and which don't.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:46:10 -04:00
David S. Miller
01adc4851a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'.  Thanks to Daniel for the merge resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:35:08 -04:00
Anna-Maria Gleixner
c36a68fab1 net: u64_stats_sync: Remove functions without user
Commit 67db3e4bfb ("tcp: no longer hold ehash lock while calling
tcp_get_info()") removes the only users of u64_stats_update_end/begin_raw()
without removing the function in header file.

Remove no longer used functions.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-07 23:25:24 -04:00
Gilles Buloz
17e8f0d4ce PCI: Check whether bridges allow access to extended config space
Even if a device supports extended config space, i.e., it is a PCI-X Mode 2
or a PCI Express device, the extended space may not be accessible if
there's a conventional PCI bus in the path to it.

We currently figure that out in pci_cfg_space_size() by reading the first
dword of extended config space.  On most platforms that returns ~0 data if
the space is inaccessible, but it may set error bits in PCI status
registers, and on some platforms it causes exceptions that we currently
don't recover from.

For example, a PCIe-to-conventional PCI bridge treats config transactions
with a non-zero Extended Register Address as an Unsupported Request on PCIe
and a received Master-Abort on the destination bus (see PCI Express to
PCI/PCI-X Bridge spec, r1.0, sec 4.1.3).

A sample case is a LS1043A CPU (NXP QorIQ Layerscape) platform with the
following bus topology:

  LS1043 PCIe Root Port
    -> PEX8112 PCIe-to-PCI bridge (doesn't support ext cfg on PCI side)
      -> PMC slot connector (for legacy PMC modules)

With a PMC module topology as follows:

  PMC connector
    -> PCI-to-PCIe bridge
      -> PCIe switch (4 ports)
        -> 4 PCIe devices (one on each port)

The PCIe devices on the PMC module support extended config space, but we
can't reach it because the PEX8112 can't generate accesses to the extended
space on its secondary bus.  Attempts to access it cause Unsupported
Request errors, which result in synchronous aborts on this platform.

To avoid these errors, check whether bridges are capable of generating
extended config space addresses on their secondary interfaces.  If they
can't, we restrict devices below the bridge to only the 256-byte
PCI-compatible config space.

Signed-off-by: Gilles Buloz <gilles.buloz@kontron.com>
[bhelgaas: changelog, rework patch so bus_flags testing is all in
pci_bridge_child_ext_cfg_accessible()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-07 16:54:35 -05:00
Bjorn Helgaas
d22b362184 PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits.  Command Completed is never set for writes that only change
software notification "Enable" bits.  This results in timeouts like this:

  pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)

When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.

Here's the text of the Intel erratum CF118.  We assume this applies to all
Intel parts:

  CF118        PCIe Slot Status Register Command Completed bit not always
               updated on any configuration write to the Slot Control
               Register

  Problem:     For PCIe root ports (devices 0 - 10) supporting hot-plug,
               the Slot Status Register (offset AAh) Command Completed
               (bit[4]) status is updated under the following condition:
               IOH will set Command Completed bit after delivering the new
               commands written in the Slot Controller register (offset
               A8h) to VPP. The IOH detects new commands written in Slot
               Control register by checking the change of value for Power
               Controller Control (bit[10]), Power Indicator Control
               (bits[9:8]), Attention Indicator Control (bits[7:6]), or
               Electromechanical Interlock Control (bit[11]) fields. Any
               other configuration writes to the Slot Control register
               without changing the values of these fields will not cause
               Command Completed bit to be set.

               The PCIe Base Specification Revision 2.0 or later describes
               the “Slot Control Register” in section 7.8.10, as follows
               (Reference section 7.8.10, Slot Control Register, Offset
               18h). In hot-plug capable Downstream Ports, a write to the
               Slot Control register must cause a hot-plug command to be
               generated (see Section 6.7.3.2 for details on hot-plug
               commands). A write to the Slot Control register in a
               Downstream Port that is not hotplug capable must not cause a
               hot-plug command to be executed.

               The PCIe Spec intended that every write to the Slot Control
               Register is a command and expected a command complete status
               to abstract the VPP implementation specific nuances from the
               OS software. IOH PCIe Slot Control Register implementation
               is not fully conforming to the PCIe Specification in this
               respect.

  Implication: Software checking on the Command Completed status after
               writing to the Slot Control register may time out.

  Workaround:  Software can read the Slot Control register and compare the
               existing and new values to determine if it should check the
               Command Completed status after writing to the Slot Control
               register.

Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.

Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de>	# Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de>	# Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>		# Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-07 16:25:08 -05:00
Bjorn Helgaas
333c8c1216 PCI: Add Qualcomm vendor ID
Add the Qualcomm vendor ID to pci_ids.h and use it in quirks.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-05-07 15:52:55 -05:00
Mauro Carvalho Chehab
04e7b3d7ea iio: iio.h: use nested struct support on kernel-doc markup
Solve those Sphinx warnings:

    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.sign' not described in 'iio_chan_spec'
    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.realbits' not described in 'iio_chan_spec'
    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.storagebits' not described in 'iio_chan_spec'
    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.shift' not described in 'iio_chan_spec'
    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.repeat' not described in 'iio_chan_spec'
    ./include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.endianness' not described in 'iio_chan_spec'

    ./include/linux/iio/iio.h:191: WARNING: Unexpected indentation.
    ./include/linux/iio/iio.h:192: WARNING: Block quote ends without a blank line; unexpected unindent.
    ./include/linux/iio/iio.h:198: WARNING: Definition list ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2018-05-07 18:18:18 +01:00
Chuck Lever
edb41e61a5 xprtrdma: Make rpc_rqst part of rpcrdma_req
This simplifies allocation of the generic RPC slot and xprtrdma
specific per-RPC resources.

It also makes xprtrdma more like the socket-based transports:
->buf_alloc and ->buf_free are now responsible only for send and
receive buffers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07 09:20:03 -04:00
Chuck Lever
a9cde23ab7 SUNRPC: Add a ->free_slot transport callout
Refactor: xprtrdma needs to have better control over when RPCs are
awoken from the backlog queue, so replace xprt_free_slot with a
transport op callout.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07 09:20:03 -04:00
Chuck Lever
37ac86c3a7 SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
alloc_slot is a transport-specific op, but initializing an rpc_rqst
is common to all transports. In addition, the only part of initial-
izing an rpc_rqst that needs serialization is getting a fresh XID.

Move rpc_rqst initialization to common code in preparation for
adding a transport-specific alloc_slot to xprtrdma.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07 09:20:03 -04:00
Chuck Lever
a2268cfbf5 xprtrdma: Add proper SPDX tags for NetApp-contributed source
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-05-07 09:20:03 -04:00
Christoph Hellwig
325ef1857f PCI: remove PCI_DMA_BUS_IS_PHYS
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2018-05-07 07:15:41 +02:00
Christoph Hellwig
1d3b9917df ide: kill ide_toggle_bounce
ide_toggle_bounce did select various strange block bounce limits, including
not bouncing at all as soon as an iommu is present in the system.  Given
that the dma_map routines now handle any required bounce buffering except
for ISA DMA, and the ide code already must handle either ISA DMA or highmem
at least for iommu equipped systems we can get rid of the block layer
bounce limit setting entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2018-05-07 07:15:41 +02:00
David S. Miller
90278871d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, more relevant updates in this batch are:

1) Add Maglev support to IPVS. Moreover, store lastest server weight in
   IPVS since this is needed by maglev, patches from from Inju Song.

2) Preparation works to add iptables flowtable support, patches
   from Felix Fietkau.

3) Hand over flows back to conntrack slow path in case of TCP RST/FIN
   packet is seen via new teardown state, also from Felix.

4) Add support for extended netlink error reporting for nf_tables.

5) Support for larger timeouts that 23 days in nf_tables, patch from
   Florian Westphal.

6) Always set an upper limit to dynamic sets, also from Florian.

7) Allow number generator to make map lookups, from Laura Garcia.

8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat.

9) Extend ip6tables SRH match to support previous, next and last SID,
   from Ahmed Abdelsalam.

10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez.

11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.

12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
   from Taehee Yoo.

13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
   Make rt and exthdr built-in too, again from Florian.

14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-06 21:51:37 -04:00
Fernando Fernandez Mancera
bfb15f2a95 netfilter: extract Passive OS fingerprint infrastructure from xt_osf
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-07 00:02:11 +02:00
Linus Torvalds
8e95cb336d Merge tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some USB driver fixes for 4.17-rc4.

  The majority of them are some USB gadget fixes that missed my last
  pull request. The "largest" patch in here is a fix for the old visor
  driver that syzbot found 6 months or so ago and I finally remembered
  to fix it.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Revert "usb: host: ehci: Use dma_pool_zalloc()"
  usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
  usb: typec: tcpm: Release the role mux when exiting
  USB: Accept bulk endpoints with 1024-byte maxpacket
  xhci: Fix use-after-free in xhci_free_virt_device
  USB: serial: visor: handle potential invalid device configuration
  USB: serial: option: adding support for ublox R410M
  usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
  usb: musb: host: fix potential NULL pointer dereference
  usb: gadget: composite Allow for larger configuration descriptors
  usb: dwc3: gadget: Fix list_del corruption in dwc3_ep_dequeue
  usb: dwc3: gadget: dwc3_gadget_del_and_unmap_request() can be static
  usb: dwc2: pci: Fix error return code in dwc2_pci_probe()
  usb: dwc2: WA for Full speed ISOC IN in DDMA mode.
  usb: dwc2: dwc2_vbus_supply_init: fix error check
  usb: gadget: f_phonet: fix pn_net_xmit()'s return type
2018-05-05 17:28:08 -10:00
Mauro Carvalho Chehab
6159e12e11 media: meye: allow building it with COMPILE_TEST on non-x86
This driver depends on sony-laptop driver, but this is available
only for x86. So, add a stub function, in order to allow building
it on non-x86 too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-05-05 11:41:58 -04:00
Ingo Molnar
12e2c41148 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-05 10:01:34 +02:00
Linus Torvalds
4a7a772986 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes froom Stephen Boyd:
 "A handful of fixes for the stm32mp1 clk driver came in during the
  merge window for the driver that got merged in the merge window.

  Plus a warning fix for unused PM ops and a couple fixes for the meson
  clk driver clk names that went unnoticed with the regmap rework.

  There's also another fix in here for the mux rounding flag which
  wasn't doing what it said it did, but now it does"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: meson: meson8b: fix meson8b_cpu_clk parent clock name
  clk: meson: meson8b: fix meson8b_fclk_div3_div clock name
  clk: meson: drop meson_aoclk_gate_regmap_ops
  clk: meson: honor CLK_MUX_ROUND_CLOSEST in clk_regmap
  clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
  clk: cs2000: mark resume function as __maybe_unused
  clk: stm32mp1: remove ck_apb_dbg clock
  clk: stm32mp1: set stgen_k clock as critical
  clk: stm32mp1: add missing tzc2 clock
  clk: stm32mp1: fix SAI3 & SAI4 clocks
  clk: stm32mp1: remove unused dfsdm_src[] const
  clk: stm32mp1: add missing static
2018-05-04 21:12:06 -10:00
Linus Torvalds
f93314732f Merge tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc
Pull remoteproc and rpmsg fixes from Bjorn Andersson:

 - fix screw-up when reversing boolean for rproc_stop()

 - add missing OF node refcounting dereferences

 - add missing MODULE_ALIAS in rpmsg_char

* tag 'rproc-v4.17-1' of git://github.com/andersson/remoteproc:
  rpmsg: added MODULE_ALIAS for rpmsg_char
  remoteproc: qcom: Fix potential device node leaks
  remoteproc: fix crashed parameter logic on stop call
2018-05-04 21:07:43 -10:00
Linus Torvalds
2f50037a1c Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes that should to into this release. This contains:

   - Set of bcache fixes from Coly, fixing regression in patches that
     went into this series.

   - Set of NVMe fixes by way of Keith.

   - Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
     fixing various issues around device addition/removal.

   - Two block inflight fixes from Omar, fixing issues around the
     transition to using tags for blk-mq inflight accounting that we
     did a few releases ago"

* tag 'for-linus-20180504' of git://git.kernel.dk/linux-block:
  bdi: Fix oops in wb_workfn()
  nvmet: switch loopback target state to connecting when resetting
  nvme/multipath: Fix multipath disabled naming collisions
  nvme/multipath: Disable runtime writable enabling parameter
  nvme: Set integrity flag for user passthrough commands
  nvme: fix potential memory leak in option parsing
  bdi: Fix use after free bug in debugfs_remove()
  bdi: wake up concurrent wb_shutdown() callers.
  bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
  bcache: set dc->io_disable to true in conditional_stop_bcache_device()
  bcache: add wait_for_kthread_stop() in bch_allocator_thread()
  bcache: count backing device I/O error for writeback I/O
  bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
  bcache: store disk name in struct cache and struct cached_dev
  blk-mq: fix sysfs inflight counter
  blk-mq: count allocated but not started requests in iostats inflight
2018-05-04 20:41:44 -10:00
Lokesh Vutla
1e0a601437 firmware: ti_sci: Switch to SPDX Licensing
Switch to SPDX licensing and drop the GPL text which comes redundant.

Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2018-05-04 23:10:23 -07:00
Thomas Gleixner
8bf37d8c06 seccomp: Move speculation migitation control to arch code
The migitation control is simpler to implement in architecture code as it
avoids the extra function call to check the mode. Aside of that having an
explicit seccomp enabled mode in the architecture mitigations would require
even more workarounds.

Move it into architecture code and provide a weak function in the seccomp
code. Remove the 'which' argument as this allows the architecture to decide
which mitigations are relevant for seccomp.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:44 +02:00
Kees Cook
00a02d0c50 seccomp: Add filter flag to opt-out of SSB mitigation
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:44 +02:00
Thomas Gleixner
356e4bfff2 prctl: Add force disable speculation
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:43 +02:00
Jakub Kicinski
0cd3cbed3c bpf: offload: allow offloaded programs to use perf event arrays
BPF_MAP_TYPE_PERF_EVENT_ARRAY is special as far as offload goes.
The map only holds glue to perf ring, not actual data.  Allow
non-offloaded perf event arrays to be used in offloaded programs.
Offload driver can extract the events from HW and put them in
the map for user space to retrieve.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04 23:41:03 +02:00
David Herrmann
aae7cfcbb7 security: add hook for socketpair()
Right now the LSM labels for socketpairs are always uninitialized,
since there is no security hook for the socketpair() syscall. This
patch adds the required hooks so LSMs can properly label socketpairs.
This allows SO_PEERSEC to return useful information on those sockets.

Note that the behavior of socketpair() can be emulated by creating a
listener socket, connecting to it, and then discarding the initial
listener socket. With this workaround, SO_PEERSEC would return the
caller's security context. However, with socketpair(), the uninitialized
context is returned unconditionally. This is unexpected and makes
socketpair() less useful in situations where the security context is
crucial to the application.

With the new socketpair-hook this disparity can be solved by making
socketpair() return the expected security context.

Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2018-05-04 12:48:54 -07:00
Bhadram Varka
23b8392201 net: phy: broadcom: add support for BCM89610 PHY
It adds support for BCM89610 (Single-Port 10/100/1000BASE-T)
transceiver which is used in P3310 Tegra186 platform.

Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04 12:45:55 -04:00
David S. Miller
a7b15ab887 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes in selftests Makefile.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-04 09:58:56 -04:00
Honghui Zhang
101c92dc80 PCI: mediatek: Set up vendor ID and class type for MT7622
MT7622's hardware default value of vendor ID and class type is not correct,
fix that by setup the correct values before linkup with Endpoint.

Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
2018-05-04 12:25:45 +01:00
Jiong Wang
9c8105bd44 bpf: centre subprog information fields
It is better to centre all subprog information fields into one structure.
This structure could later serve as function node in call graph.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04 11:58:36 +02:00
Jiong Wang
f910cefa32 bpf: unify main prog and subprog
Currently, verifier treat main prog and subprog differently. All subprogs
detected are kept in env->subprog_starts while main prog is not kept there.
Instead, main prog is implicitly defined as the prog start at 0.

There is actually no difference between main prog and subprog, it is better
to unify them, and register all progs detected into env->subprog_starts.

This could also help simplifying some code logic.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-04 11:58:35 +02:00
Ruslan Bilovol
9ea19e7e76 include: usb: audio-v3: add BADD-specific values
Add BADD-specific predefined values to audio-v3
so usb-audio in ALSA and UAC3 gadget can use them

Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-05-04 09:40:04 +02:00
Peter Zijlstra
b5bf9a90bb sched/core: Introduce set_special_state()
Gaurav reported a perceived problem with TASK_PARKED, which turned out
to be a broken wait-loop pattern in __kthread_parkme(), but the
reported issue can (and does) in fact happen for states that do not do
condition based sleeps.

When the 'current->state = TASK_RUNNING' store of a previous
(concurrent) try_to_wake_up() collides with the setting of a 'special'
sleep state, we can loose the sleep state.

Normal condition based wait-loops are immune to this problem, but for
sleep states that are not condition based are subject to this problem.

There already is a fix for TASK_DEAD. Abstract that and also apply it
to TASK_STOPPED and TASK_TRACED, both of which are also without
condition based wait-loop.

Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-04 07:54:54 +02:00
Linus Torvalds
e523a2562a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various sockmap fixes from John Fastabend (pinned map handling,
    blocking in recvmsg, double page put, error handling during redirect
    failures, etc.)

 2) Fix dead code handling in x86-64 JIT, from Gianluca Borello.

 3) Missing device put in RDS IB code, from Dag Moxnes.

 4) Don't process fast open during repair mode in TCP< from Yuchung
    Cheng.

 5) Move address/port comparison fixes in SCTP, from Xin Long.

 6) Handle add a bond slave's master into a bridge properly, from
    Hangbin Liu.

 7) IPv6 multipath code can operate on unitialized memory due to an
    assumption that the icmp header is in the linear SKB area. Fix from
    Eric Dumazet.

 8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave
    Watson.

9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann.

10) RDS leaks kernel memory to userspace, from Eric Dumazet.

11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
  dccp: fix tasklet usage
  smc: fix sendpage() call
  net/smc: handle unregistered buffers
  net/smc: call consolidation
  qed: fix spelling mistake: "offloded" -> "offloaded"
  net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
  tcp: restore autocorking
  rds: do not leak kernel memory to user land
  qmi_wwan: do not steal interfaces from class drivers
  ipv4: fix fnhe usage by non-cached routes
  bpf: sockmap, fix error handling in redirect failures
  bpf: sockmap, zero sg_size on error when buffer is released
  bpf: sockmap, fix scatterlist update on error path in send with apply
  net_sched: fq: take care of throttled flows before reuse
  ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
  bpf, x64: fix memleak when not converging on calls
  bpf, x64: fix memleak when not converging after image
  net/smc: restrict non-blocking connect finish
  8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
  sctp: fix the issue that the cookie-ack with auth can't get processed
  ...
2018-05-03 18:57:03 -10:00
Daniel Borkmann
e0cea7ce98 bpf: implement ld_abs/ld_ind in native bpf
The main part of this work is to finally allow removal of LD_ABS
and LD_IND from the BPF core by reimplementing them through native
eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and
keeping them around in native eBPF caused way more trouble than
actually worth it. To just list some of the security issues in
the past:

  * fdfaf64e75 ("x86: bpf_jit: support negative offsets")
  * 35607b02db ("sparc: bpf_jit: fix loads from negative offsets")
  * e0ee9c1215 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
  * 07aee94394 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
  * 6d59b7dbf7 ("bpf, s390x: do not reload skb pointers in non-skb context")
  * 87338c8e2c ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
these days due to their limitations and more efficient/flexible
alternatives that have been developed over time such as direct
packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
register, the load happens in host endianness and its exception
handling can yield unexpected behavior. The latter is explained
in depth in f6b1b3bf0d ("bpf: fix subprog verifier bypass by
div/mod by 0 exception") with similar cases of exceptions we had.
In native eBPF more recent program types will disable LD_ABS/LD_IND
altogether through may_access_skb() in verifier, and given the
limitations in terms of exception handling, it's also disabled
in programs that use BPF to BPF calls.

In terms of cBPF, the LD_ABS/LD_IND is used in networking programs
to access packet data. It is not used in seccomp-BPF but programs
that use it for socket filtering or reuseport for demuxing with
cBPF. This is mostly relevant for applications that have not yet
migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND is coming
from their implementation in the various JITs. Most of them keep
the model around from cBPF times by implementing a fastpath written
in asm. They use typically two from the BPF program hidden CPU
registers for caching the skb's headlen (skb->len - skb->data_len)
and skb->data. Throughout the JIT phase this requires to keep track
whether LD_ABS/LD_IND are used and if so, the two registers need
to be recached each time a BPF helper would change the underlying
packet data in native eBPF case. At least in eBPF case, available
CPU registers are rare and the additional exit path out of the
asm written JIT helper makes it also inflexible since not all
parts of the JITer are in control from plain C. A LD_ABS/LD_IND
implementation in eBPF therefore allows to significantly reduce
the complexity in JITs with comparable performance results for
them, e.g.:

test_bpf             tcpdump port 22             tcpdump complex
x64      - before    15 21 10                    14 19  18
         - after      7 10 10                     7 10  15
arm64    - before    40 91 92                    40 91 151
         - after     51 64 73                    51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
and cache the skb's headlen and data in the cBPF prologue. The
BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
used as a local temporary variable. This allows to shrink the
image on x86_64 also for seccomp programs slightly since mapping
to %rsi is not an ereg. In callee-saved R8 and R9 we now track
skb data and headlen, respectively. For normal prologue emission
in the JITs this does not add any extra instructions since R8, R9
are pushed to stack in any case from eBPF side. cBPF uses the
convert_bpf_ld_abs() emitter which probes the fast path inline
already and falls back to bpf_skb_load_helper_{8,16,32}() helper
relying on the cached skb data and headlen as well. R8 and R9
never need to be reloaded due to bpf_helper_changes_pkt_data()
since all skb access in cBPF is read-only. Then, for the case
of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
does neither cache skb data and headlen nor has an inlined fast
path. The reason for the latter is that native eBPF does not have
any extra registers available anyway, but even if there were, it
avoids any reload of skb data and headlen in the first place.
Additionally, for the negative offsets, we provide an alternative
bpf_skb_load_bytes_relative() helper in eBPF which operates
similarly as bpf_skb_load_bytes() and allows for more flexibility.
Tested myself on x64, arm64, s390x, from Sandipan on ppc64.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 16:49:19 -07:00
Daniel Borkmann
93731ef086 bpf: migrate ebpf ld_abs/ld_ind tests to test_verifier
Remove all eBPF tests involving LD_ABS/LD_IND from test_bpf.ko. Reason
is that the eBPF tests from test_bpf module do not go via BPF verifier
and therefore any instruction rewrites from verifier cannot take place.

Therefore, move them into test_verifier which runs out of user space,
so that verfier can rewrite LD_ABS/LD_IND internally in upcoming patches.
It will have the same effect since runtime tests are also performed from
there. This also allows to finally unexport bpf_skb_vlan_{push,pop}_proto
and keep it internal to core kernel.

Additionally, also add further cBPF LD_ABS/LD_IND test coverage into
test_bpf.ko suite.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 16:49:19 -07:00
Magnus Karlsson
865b03f211 dev: packet: make packet_direct_xmit a common function
The new dev_direct_xmit will be used by AF_XDP in later commits.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Björn Töpel
02671e23e7 xsk: wire up XDP_SKB side of AF_XDP
This commit wires up the xskmap to XDP_SKB layer.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Björn Töpel
fbfc504a24 bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP
The xskmap is yet another BPF map, very much inspired by
dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
adds AF_XDP sockets into the map, and by using the bpf_redirect_map
helper, an XDP program can redirect XDP frames to an AF_XDP socket.

Note that a socket that is bound to certain ifindex/queue index will
*only* accept XDP frames from that netdev/queue index. If an XDP
program tries to redirect from a netdev/queue index other than what
the socket is bound to, the frame will not be received on the socket.

A socket can reside in multiple maps.

v3: Fixed race and simplified code.
v2: Removed one indirection in map lookup.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:24 -07:00
Björn Töpel
68e8b849b2 net: initial AF_XDP skeleton
Buildable skeleton of AF_XDP without any functionality. Just what it
takes to register a new address family.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:23 -07:00
Tetsuo Handa
8236b0ae31 bdi: wake up concurrent wb_shutdown() callers.
syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
wb_shutdown() [1]. This seems to be because commit 5318ce7d46 ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).

Introduce a helper function clear_and_wake_up_bit() and use it, in order
to avoid similar errors in future.

[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+c0cf869505e03bdf1a24@syzkaller.appspotmail.com>
Fixes: 5318ce7d46 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-03 09:25:47 -06:00
Christoph Hellwig
3d6ce86ee7 drivers: remove force dma flag from buses
With each bus implementing its own DMA configuration callback, there is no
need for bus to explicitly set the force_dma flag.  Modify the
of_dma_configure function to accept an input parameter which specifies if
implicit DMA configuration is required when it is not described by the
firmware.

Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # PCI parts
Reviewed-by: Rob Herring <robh@kernel.org>
[hch: tweaked the changelog a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-03 16:25:08 +02:00
Nipun Gupta
07397df29e dma-mapping: move dma configuration to bus infrastructure
ACPI/OF support for configuration of DMA is a bus specific aspect, and
thus should be configured by the bus.  Introduces a 'dma_configure' bus
method so that busses can control their DMA capabilities.

Also update the PCI, Platform, ACPI and host1x buses to use the new
method.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>  # PCI parts
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[hch: simplified host1x_dma_configure based on a comment from Thierry,
      rewrote changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-03 16:22:18 +02:00
Kees Cook
7bbf1373e2 nospec: Allow getting/setting on non-current task
Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.

This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-03 13:55:51 +02:00
Thomas Gleixner
b617cfc858 prctl: Add speculation control prctls
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.

PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:

Bit  Define           Description
0    PR_SPEC_PRCTL    Mitigation can be controlled per task by
                      PR_SET_SPECULATION_CTRL
1    PR_SPEC_ENABLE   The speculation feature is enabled, mitigation is
                      disabled
2    PR_SPEC_DISABLE  The speculation feature is disabled, mitigation is
                      enabled

If all bits are 0 the CPU is not affected by the speculation misfeature.

If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.

PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.

The common return values are:

EINVAL  prctl is not implemented by the architecture or the unused prctl()
        arguments are not 0
ENODEV  arg2 is selecting a not supported speculation misfeature

PR_SET_SPECULATION_CTRL has these additional return values:

ERANGE  arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO   prctl control of the selected speculation misfeature is disabled

The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.

Based on an initial patch from Tim Chen and mostly rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-03 13:55:50 +02:00
Konrad Rzeszutek Wilk
c456442cd3 x86/bugs: Expose /sys/../spec_store_bypass
Add the sysfs file for the new vulerability. It does not do much except
show the words 'Vulnerable' for recent x86 cores.

Intel cores prior to family 6 are known not to be vulnerable, and so are
some Atoms and some Xeon Phi.

It assumes that older Cyrix, Centaur, etc. cores are immune.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:47 +02:00