Saeed Mahameed says:
====================
mlx5-updates-2019-09-05
1) Allover mlx5 cleanups
2) Added port congestion counters to ethtool stats:
Add 3 counters per priority to ethtool using PPCNT:
2.1) rx_prio[p]_buf_discard - the number of packets discarded by device
due to lack of per host receive buffers
2.2) rx_prio[p]_cong_discard - the number of packets discarded by device
due to per host congestion
2.3) rx_prio[p]_marked - the number of packets ECN marked by device due
to per host congestion
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:
recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
Will be translated to the following tc rule:
$ tc filter add dev dev1 ingress \
prio 1 chain 0 proto ip \
flower tcp ct_state -trk \
action ct pipe \
action goto chain 2
Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.
To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Map capability bit indicating that HCA supports port buffer's congestion
counters. Also map registers with the corresponding counters.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-09-04 j1939
this is a pull request for net-next/master consisting of 21 patches.
the first 12 patches are by me and target the CAN core infrastructure.
They clean up the names of variables , structs and struct members,
convert can_rx_register() to use max() instead of open coding it and
remove unneeded code from the can_pernet_exit() callback.
The next three patches are also by me and they introduce and make use of
the CAN midlayer private structure. It is used to hold protocol specific
per device data structures.
The next patch is by Oleksij Rempel, switches the
&net->can.rcvlists_lock from a spin_lock() to a spin_lock_bh(), so that
it can be used from NAPI (soft IRQ) context.
The next 4 patches are by Kurt Van Dijck, he first updates his email
address via mailmap and then extends sockaddr_can to include j1939
members.
The final patch is the collective effort of many entities (The j1939
authors: Oliver Hartkopp, Bastian Stender, Elenita Hinds, kbuild test
robot, Kurt Van Dijck, Maxime Jayat, Robin van der Gracht, Oleksij
Rempel, Marc Kleine-Budde). It adds support of SAE J1939 protocol to the
CAN networking stack.
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
P.S.: This pull request doesn't invalidate my last pull request:
"pull-request: can-next 2019-09-03".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Kleine-Budde says:
====================
pull-request: can-next 2019-09-03
this is a pull request for net-next/master consisting of 15 patches.
The first patch is by Christer Beskow, targets the kvaser_pciefd driver
and fixes the PWM generator's frequency.
The next three patches are by Dan Murphy, the tcan4x5x is updated to use
a proper interrupts/interrupt-parent DT binding to specify the devices
IRQ line. Further the unneeded wake ups of the device is removed from
the driver.
A patch by me for the mcp25xx driver removes the deprecated board file
setup example. Three patches by Andy Shevchenko simplify clock handling,
update the driver from OF to device property API and simplify the
mcp251x_can_suspend() function.
The remaining 7 patches are by me and clean up checkpatch warnings in
the generic CAN device infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SAE J1939 is the vehicle bus recommended practice used for communication
and diagnostics among vehicle components. Originating in the car and
heavy-duty truck industry in the United States, it is now widely used in
other parts of the world.
J1939, ISO 11783 and NMEA 2000 all share the same high level protocol.
SAE J1939 can be considered the replacement for the older SAE J1708 and
SAE J1587 specifications.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The size of this structure will be increased with J1939 support. To stay
binary compatible, the CAN_REQUIRED_SIZE macro is introduced for
existing CAN protocols.
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch removes the old method of allocating the per device protocol
specific memory via a netdevice_notifier. This had the drawback, that
the allocation can fail, leading to a lot of null pointer checks in the
code. This also makes the live cycle management of this memory quite
complicated.
This patch switches from the allocating the struct can_dev_rcv_lists in
a NETDEV_REGISTER call to using the dev->ml_priv, which is allocated by
the driver since the previous patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch introduces the CAN midlayer private structure ("struct
can_ml_priv") which should be used to hold protocol specific per device
data structures. For now it's only member is "struct can_dev_rcv_lists".
The CAN midlayer private is allocated via alloc_netdev()'s private and
assigned to "struct net_device::ml_priv" during device creation. This is
done transparently for CAN drivers using alloc_candev(). The slcan, vcan
and vxcan drivers which are not using alloc_candev() have been adopted
manually. The memory layout of the netdev_priv allocated via
alloc_candev() will looke like this:
+-------------------------+
| driver's priv |
+-------------------------+
| struct can_ml_priv |
+-------------------------+
| array of struct sk_buff |
+-------------------------+
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Saeed Mahameed says:
====================
mlx5-updates-2019-09-01 (Software steering support)
Abstract:
--------
Mellanox ConnetX devices supports packet matching, packet modification and
redirection. These functionalities are also referred to as flow-steering.
To configure a steering rule, the rule is written to the device owned
memory, this memory is accessed and cached by the device when processing
a packet.
Steering rules are constructed from multiple steering entries (STE).
Rules are configured using the Firmware command interface. The Firmware
processes the given driver command and translates them to STEs, then
writes them to the device memory in the current steering tables.
This process is slow due to the architecture of the command interface and
the processing complexity of each rule.
The highlight of this patchset is to cut the middle man (The firmware) and
do steering rules programming into device directly from the driver, with
no firmware intervention whatsoever.
Motivation:
-----------
Software (driver managed) steering allows for high rule insertion rates
compared to the FW steering described above, this is achieved by using
internal RDMA writes to the device owned memory instead of the slow
command interface to program steering rules.
Software (driver managed) steering, doesn't depend on new FW
for new steering functionality, new implementations can be done in the
driver skipping the FW layer.
Performance:
------------
The insertion rate on a single core using the new approach allows
programming ~300K rules per sec. (Done via direct raw test to the new mlx5
sw steering layer, without any kernel layer involved).
Test: TC L2 rules
33K/s with Software steering (this patchset).
5K/s with FW and current driver.
This will improve OVS based solution performance.
Architecture and implementation details:
----------------------------------------
Software steering will be dynamically selected via devlink device
parameter. Example:
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
mlx5 software steering module a.k.a (DR - Direct Rule) is implemented
and contained in mlx5/core/steering directory and controlled by
MLX5_SW_STEERING kconfig flag.
mlx5 core steering layer (fs_core) already provides a shim layer for
implementing different steering mechanisms, software steering will
leverage that as seen at the end of this series.
When Software Steering for a specific steering domain
(NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
targeting this domain to be created using SW steering instead of FW.
The implementation includes:
Domain - The steering domain is the object that all other object resides
in. It holds the memory allocator, send engine, locks and other shared
data needed by lower objects such as table, matcher, rule, action.
Each domain can contain multiple tables. Domain is equivalent to
namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented
currently in mlx5_core fs_core (flow steering core).
Table - Table objects are used for holding multiple matchers, each table
has a level used to prevent processing loops. Packets are being
directed to this table once it is set as the root table, this is done
by fs_core using a FW command. A packet is being processed inside the
table matcher by matcher until a successful hit, otherwise the packet
will perform the default action.
Matcher - Matchers objects are used to specify the fields mask for
matching when processing a packet. A matcher belongs to a table, each
matcher can hold multiple rules, each rule with different matching
values corresponding to the matcher mask. Each matcher has a priority
used for rule processing order inside the table.
Action - Action objects are created to specify different steering actions
such as count, reformat (encapsulate, decapsulate, ...), modify
header, forward to table and many other actions. When creating a rule
a sequence of actions can be provided to be executed on a successful
match.
Rule - Rule objects are used to specify a specific match on packets as
well as the actions that should be executed. A rule belongs to a
matcher.
STE - This layer is used to hold the specific STE format for the device
and to convert the requested rule to STEs. Each rule is constructed of
an STE chain, Multiple rules construct a steering graph. Each node in
the graph is a hash table containing multiple STEs. The index of each
STE in the hash table is being calculated using a CRC32 hash function.
Memory pool - Used for managing and caching device owned memory for rule
insertion. The memory is being allocated using DM (device memory) API.
Communication with device - layer for standard RDMA operation using RC QP
to configure the device steering.
Command utility - This module holds all of the FW commands that are
required for SW steering to function.
Patch planning and files:
-------------------------
1) First patch, adds the support to Add flow steering actions to fs_cmd
shim layer.
2) Next 12 patch will add a file per each Software steering
functionality/module as described above. (See patches with title: DR, *)
3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
build with the new files
4) Next two patches will add the support for software steering in mlx5
steering shim layer
net/mlx5: Add API to set the namespace steering mode
net/mlx5: Add direct rule fs_cmd implementation
5) Last two patches will add the new devlink parameter to select mlx5
steering mode, will be valid only for switchdev mode for now.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode this entities are created though the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it, for now SMFS will manages only the switchdev
eswitch steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull networking fixes from David Miller:
1) Fix some length checks during OGM processing in batman-adv, from
Sven Eckelmann.
2) Fix regression that caused netfilter conntrack sysctls to not be
per-netns any more. From Florian Westphal.
3) Use after free in netpoll, from Feng Sun.
4) Guard destruction of pfifo_fast per-cpu qdisc stats with
qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed
in pfifo_fast_enqueue().
5) Fix memory leak in mld_del_delrec(), from Eric Dumazet.
6) Handle neigh events on internal ports correctly in nfp, from John
Hurley.
7) Clear SKB timestamp in NF flow table code so that it does not
confuse fq scheduler. From Florian Westphal.
8) taprio destroy can crash if it is invoked in a failure path of
taprio_init(), because the list head isn't setup properly yet and
the list del is unconditional. Perform the list add earlier to
address this. From Vladimir Oltean.
9) Make sure to reapply vlan filters on device up, in aquantia driver.
From Dmitry Bogdanov.
10) sgiseeq driver releases DMA memory using free_page() instead of
dma_free_attrs(). From Christophe JAILLET.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
net: seeq: Fix the function used to release some memory in an error handling path
enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions
net: bcmgenet: use ethtool_op_get_ts_info()
tc-testing: don't hardcode 'ip' in nsPlugin.py
net: dsa: microchip: add KSZ8563 compatibility string
dt-bindings: net: dsa: document additional Microchip KSZ8563 switch
net: aquantia: fix out of memory condition on rx side
net: aquantia: linkstate irq should be oneshot
net: aquantia: reapply vlan filters on up
net: aquantia: fix limit of vlan filters
net: aquantia: fix removal of vlan 0
net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
taprio: Fix kernel panic in taprio_destroy
net: dsa: microchip: fill regmap_config name
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
amd-xgbe: Fix error path in xgbe_mod_init()
netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder
mac80211: Correctly set noencrypt for PAE frames
...
The patch adds driver support for configuring the grc dump config flags.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds driver support for reading the config id attributes from NVM
flash partition.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull tracing fixes from Steven Rostedt:
"Small fixes and minor cleanups for tracing:
- Make exported ftrace function not static
- Fix NULL pointer dereference in reading probes as they are created
- Fix NULL pointer dereference in k/uprobe clean up path
- Various documentation fixes"
* tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Correct kdoc formats
ftrace/x86: Remove mcount() declaration
tracing/probe: Fix null pointer dereference
tracing: Make exported ftrace_set_clr_event non-static
ftrace: Check for successful allocation of hash
ftrace: Check for empty hash and comment the race with registering probes
ftrace: Fix NULL pointer dereference in t_probe_next()
The function ftrace_set_clr_event is declared static and marked
EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
function was decided to be a part of API, this commit removes the static
attribute and adds the declaration to the header.
Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com
Fixes: f45d1225ad ("tracing: Kernel access to Ftrace instances")
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
I've noticed that the "slab" value in memory.stat is sometimes 0, even
if some children memory cgroups have a non-zero "slab" value. The
following investigation showed that this is the result of the kmem_cache
reparenting in combination with the per-cpu batching of slab vmstats.
At the offlining some vmstat value may leave in the percpu cache, not
being propagated upwards by the cgroup hierarchy. It means that stats
on ancestor levels are lower than actual. Later when slab pages are
released, the precise number of pages is substracted on the parent
level, making the value negative. We don't show negative values, 0 is
printed instead.
To fix this issue, let's flush percpu slab memcg and lruvec stats on
memcg offlining. This guarantees that numbers on all ancestor levels
are accurate and match the actual number of outstanding slab pages.
Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com
Fixes: fb2f2b0adb ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull ARM SoC fixes from Arnd Bergmann:
"The majority of the fixes this time are for OMAP hardware, here is a
breakdown of the significant changes:
Various device tree bug fixes:
- TI am57xx boards need a voltage level fix to avoid damaging SD
cards
- vf610-bk4 fails to detect its flash due to an incorrect description
- meson-g12a USB phy configuration fails
- meson-g12b reboot should not power off the SD card
- Some corrections for apparently harmless differences from the
documentation.
Regression fixes:
- ams-delta FIQ interrupts broke in 5.3
- TI am3/am4 mmc controllers broke in 5.2
The logic_pio driver (used on some Huawei ARM servers) got a few bug
fixes for reliability.
And a couple of compile-time warning fixes"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
soc: ti: pm33xx: Make two symbols static
soc: ti: pm33xx: Fix static checker warnings
ARM: OMAP: dma: Mark expected switch fall-throughs
ARM: dts: Fix incomplete dts data for am3 and am4 mmc
bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
ARM: dts: dra74x: Fix iodelay configuration for mmc3
ARM: dts: am335x: Fix UARTs length
ARM: OMAP2+: Fix omap4 errata warning on other SoCs
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
lib: logic_pio: Add logic_pio_unregister_range()
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Fix RCU usage
arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
arm64: dts: meson-g12a-sei510: enable IR controller
arm64: dts: meson-g12a: add missing dwc2 phy-names
ARM: dts: vf610-bk4: Fix qspi node description
ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
...
Hisilicon fixes for v5.3-rc
- Fixed RCU usage in logical PIO
- Added a function to unregister a logical PIO range in logical PIO
to support the fixes in the hisi-lpc driver
- Fixed and optimized hisi-lpc driver to avoid potential use-after-free
and driver unbind crash
* tag 'hisi-fixes-for-5.3' of git://github.com/hisilicon/linux-hisi:
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
lib: logic_pio: Add logic_pio_unregister_range()
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Fix RCU usage
Link: https://lore.kernel.org/r/5D562335.7000902@hisilicon.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport
Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
In mlx5_core initialization, query max ODP capabilities for DC transport
from FW and set as current capabilities.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk,
ptp clock and ptp_max_adj to 200MHz.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 34786005ec ("net: phy: prevent PHYs w/o Clause 22 regs from calling
genphy_config_aneg") introduced a check that aborts phy_config_aneg()
if the phy is a C45 phy.
This causes phy_state_machine() to call phy_error() so that the phy
ends up in PHY_HALTED state.
Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
(analogous to the C22 case) so that the state machine can run
correctly.
genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
in drivers/net/phy/marvell10g.c, excluding vendor specific
configurations for 1000BaseT.
Fixes: 22b56e8270 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")
Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce BPF_F_TEST_STATE_FREQ flag to stress test parentage chain
and state pruning.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a page lock leak in nfs_pageio_resend()
- Ensure O_DIRECT reports an error if the bytes read/written is 0
- Don't handle errors if the bind/connect succeeded
- Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
invalidat ed"
Bugfixes:
- Don't refresh attributes with mounted-on-file information
- Fix return values for nfs4_file_open() and nfs_finish_open()
- Fix pnfs layoutstats reporting of I/O errors
- Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
for soft I/O errors when the user specifies a hard mount.
- Various fixes to the error handling in sunrpc
- Don't report writepage()/writepages() errors twice"
* tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: remove set but not used variable 'mapping'
NFSv2: Fix write regression
NFSv2: Fix eof handling
NFS: Fix writepage(s) error handling to not report errors twice
NFS: Fix spurious EIO read errors
pNFS/flexfiles: Don't time out requests on hard mounts
SUNRPC: Handle connection breakages correctly in call_status()
Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
pNFS/flexfiles: Turn off soft RPC calls
SUNRPC: Don't handle errors if the bind/connect succeeded
NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
NFSv4: Fix return value in nfs_finish_open()
NFSv4: Fix return values for nfs4_file_open()
NFS: Don't refresh attributes with mounted-on-file information
Pull networking fixes from David Miller:
1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya
Leoshkevich.
2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
SMC from Jason Baron.
3) ipv6_mc_may_pull() should return 0 for malformed packets, not
-EINVAL. From Stefano Brivio.
4) Don't forget to unpin umem xdp pages in error path of
xdp_umem_reg(). From Ivan Khoronzhuk.
5) Fix sta object leak in mac80211, from Johannes Berg.
6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
switches. From Florian Fainelli.
7) Revert DMA sync removal from r8169 which was causing regressions on
some MIPS Loongson platforms. From Heiner Kallweit.
8) Use after free in flow dissector, from Jakub Sitnicki.
9) Fix NULL derefs of net devices during ICMP processing across
collect_md tunnels, from Hangbin Liu.
10) proto_register() memory leaks, from Zhang Lin.
11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
from John Fastabend.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
r8152: Set memory to all 0xFFs on failed reg reads
openvswitch: Fix conntrack cache with timeout
ipv4: mpls: fix mpls_xmit for iptunnel
nexthop: Fix nexthop_num_path for blackhole nexthops
net: rds: add service level support in rds-info
net: route dump netlink NLM_F_MULTI flag missing
s390/qeth: reject oversized SNMP requests
sock: fix potential memory leak in proto_register()
MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
ipv4/icmp: fix rt dst dev null pointer dereference
openvswitch: Fix log message in ovs conntrack
bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
bpf: fix use after free in prog symbol exposure
bpf: fix precision tracking in presence of bpf2bpf calls
flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
Revert "r8169: remove not needed call to dma_sync_single_for_device"
ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
net/ncsi: Fix the payload copying for the request coming from Netlink
qed: Add cleanup in qed_slowpath_start()
...
This reverts commit a79f194aa4.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1
Pull timekeeping fix from Thomas Gleixner:
"A single fix for a regression caused by the generic VDSO
implementation where a math overflow causes CLOCK_BOOTTIME to become a
random number generator"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
Pull dma-mapping fixes from Christoph Hellwig:
"Two fixes for regressions in this merge window:
- select the Kconfig symbols for the noncoherent dma arch helpers on
arm if swiotlb is selected, not just for LPAE to not break then Xen
build, that uses swiotlb indirectly through swiotlb-xen
- fix the page allocator fallback in dma_alloc_contiguous if the CMA
allocation fails"
* tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: fix zone selection after an unaddressable CMA allocation
arm: select the dma-noncoherent symbols for all swiotlb builds
Pull GPIO fixes from Linus Walleij:
"Here is a (hopefully last) set of GPIO fixes for the v5.3 kernel
cycle. Two are pretty core:
- Fix not reporting open drain/source lines to userspace as "input"
- Fix a minor build error found in randconfigs
- Fix a chip select quirk on the Freescale SPI
- Fix the irqchip initialization semantic order to reflect what it
was using the old API"
* tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix irqchip initialization order
gpio: of: fix Freescale SPI CS quirk handling
gpio: Fix build error of function redefinition
gpiolib: never report open-drain/source lines as 'input' to user-space
This is an unlikely case, use unlikely() on it seems logical.
Signed-off-by: xiaolinkui <xiaolinkui@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ceph fixes from Ilya Dryomov:
"Three important fixes tagged for stable (an indefinite hang, a crash
on an assert and a NULL pointer dereference) plus a small series from
Luis fixing instances of vfree() under spinlock"
* tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
libceph: fix PG split vs OSD (re)connect race
ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
ceph: clear page dirty before invalidate page
ceph: fix buffer free while holding i_ceph_lock in fill_inode()
ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the
nanoseconds based boot time offset left by the clocksource shift. That
overflows once the boot time offset becomes large enough. As a consequence
CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to
misbehave.
Fix it by storing a timespec64 representation of the offset when boot time
is adjusted and add that to the MONOTONIC base time value in the vdso data
page. Using the timespec64 representation avoids a 64bit division in the
update code.
Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
HV VHCA is a layer which provides PF to VF communication channel based on
HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.
The infrastructure is agent based. Each agent will be responsible of
contiguous buffer blocks in the VHCA config space. This infrastructure will
bind agents to their blocks, and those agents can only access read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. Invalidate
callback will be invoked if one of the blocks assigned to this agent was
invalidated. Cleanup will be invoked before the agent is being freed in
order to clean all of its open resources or deferred works.
Block-0 serves as the control block. All execution commands from the PF
will be written by the PF over this block. VF will ack on those by
writing on block-0 as well. Its format is described by struct
mlx5_hv_vhca_control_block layout.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This interface driver is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver. These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.
Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional. Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.
The usage model for these packets puts the responsibility for reading or
writing on the VF driver. The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.
If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks. This invalidation is delivered via a callback
supplied by the VF driver by this driver.
No protocol is implied, except that supplied by the PF and VF drivers.
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirement:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The new dma_alloc_contiguous hides if we allocate CMA or regular
pages, and thus fails to retry a ZONE_NORMAL allocation if the CMA
allocation succeeds but isn't addressable. That means we either fail
outright or dip into a small zone that might not succeed either.
Thanks to Hillf Danton for debugging this issue.
Fixes: b1d2dc009d ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Add the lag_tx_port_affinity HCA capability bit that indicates that
setting port affinity of TISes is supported.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose Fw indication that it supports Stateless Offloads for IP over IP
tunneled packets. The following offloads are supported for the inner
packets: RSS, RX & TX Checksum Offloads, LSO and Flow Steering.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>