No longer require RTNL lock in this code. Newly introduced mutexes take
care of guarding objagg and bloom filter. There is no need to guard
gen_pool_alloc()/gen_pool_free() as they are fine to be called lockless.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Relax dependency on rtnl mutex during vregion_rehash_intrvl_set(). The
vregion list is protected with newly introduced mutex.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protect objagg structures by adding a mutex to ERP code and take it
during the structure manipulation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For MR ACL profile is does not make sense to do periodical rehashes, as
there is only one mask in use during the whole vregion lifetime.
Therefore periodical work is scheduled but the rehash never happens.
So allow to enable/disable rehash for the whole group, which is added
per-profile. Disable rehashing for MR profile.
Addition to the vregion list is done only in case the rehash is enable
on the particular vregion. Also, the addition is moved after delayed
work init to avoid schedule of uninitialized work
from vregion_rehash_intrvl_set(). Symmetrically, deletion from
the list is done before canceling the delayed work so it is
not scheduled by vregion_rehash_intrvl_set() again.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bloom filter is shared within multiple regions. For updates, it needs to
be guarded by a separate mutex. Do that in order to not rely on RTNL
mutex.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to remove dependency on RTNL, introduce a mutex
to guard vregion structure, list of chunks and list of entries in
chunks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor existing _vchunk_assoc/_vchunk_deassoc() functions into
_vregion_get()/_vregion_put() to make the code simpler and prepared for
vregion locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to remove RTNL lock dependency, it is needed to protect
the regions list in a group. Introduce a mutex to do the job.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the existing group structure to contain fields needed for HW region
list manipulations. Move the rest of the fields into new vgroup struct.
This makes layering cleaner as the vgroup struct is on higher level than
low-level group struct. Also, this makes it possible to introduce
fine-grained locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-updates-2019-02-21
This series adds some misc updates to mlx5 driver,
1) Eli Britstein, Introduces tunnel entropy control from PCMR register
and fixes GRE key by controlling port tunnel entropy calculation.
2) Eran Ben Elisha, provides some mlx5 fixes to the latest tx devlink health
reporting mechanism.
3) Huy Nguyen, Added the support for ndo bridge_setlink to allow
VEPA/VEB E-Switch legacy mode configurations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Spectrum-2 ASIC support for the following new port types and speeds:
* 50Gbps 1-lane
* 100Gbps 2-lanes
* 200Gbps 4-lanes
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Spectrum-2 ASIC port type-speed operations.
Since multiple ethtool link modes are represented using a single bit in the
ASIC, the driver forces the user to configure all types per a specific
speed. For example, if the user wants to advertise 100Gbps 4-lanes speed,
he should advertise all the types of 100Gbps 4-lanes speed that are
supported by the ASIC as shown below:
Supported ethtool bits for 100Gbps 4-lanes:
0x1000000000 100000baseKR4 Full
0x2000000000 100000baseSR4 Full
0x4000000000 100000baseCR4 Full
0x8000000000 100000baseLR4_ER4 Full
Command for advertising 100Gbps 4-lanes:
ethtool -s enp3s0np1 advertise 0xF000000000
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTYS register introduces a new layout for port type-speed fields. These
fields extend the existing ones in order to handle more types and speeds.
For example, the new 200Gbps speed.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port type-speed operations in order to have different operations for
different ASICs. For now, both ASICs use the same pointer.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Give precision identifiers to the two snprintf() formatting the priority
and TC strings to avoid producing these two warnings:
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_prio_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
directive output may be truncated writing between 1 and 3 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
output between 3 and 36 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_prio_stats[i].str, prio);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_tc_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
directive output may be truncated writing between 1 and 11 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
output between 3 and 44 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_tc_stats[i].str, tc);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow enabling VEPA mode on the HCA's port in legacy devlink mode.
Example:
bridge link set dev ens1f0 hwmode vepa
will turn on VEPA mode on the netdev ens1f0.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips
the system internal virtual switch and forwards to external network
switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch.
To support this, an new FDB flow table are created with level 0 and
linked to the existing FDB flow table in legacy mode. By default,
VEPA is turned off and this FDB flow table is empty. When VEPA is
turned on, two rules are created. One rule to forward on uplink vport
traffic to the legacy FDB. The other rule forward all other traffic
to uplink vport.
Other design alternatives were not chosen as explained below:
1. Create a forward rule in ACL flow table (most efficient design).
This approach is the not chosen because firmware does not support
forward rule to uplink vport (0xffff) for ACL flow table.
2. Add additional source port criteria in all the FDB rules to make the
FDB rules to be received rules only. This approach is not chosen because
it is not efficient as there can many rules in the FDB and VEPA mode
cannot be controlled per vport.
3. Add a highest prioirty flow group in the existing legacy FDB Flow
Table instead of a new flow table. This approoach does not work because the
new flow group has the same match criteria as the promiscuous flow group
and mlx5_add_flow_rules does not allow specifying flow group.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If reporter is ERR_PTR or NULL, error code shall be returned. At all other
cases it shall return success. Fix that.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In case of lost interrupt recover, we shall return success. Fix that.
Fixes: 7d91126b1a ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When TX reporter was introduced, it took ownership over TX timeout error
handling. this introduced a regression in case TX reporter is not valid
(NET_DEVLINK is not set, or devlink_health_reporter_create failure).
Fix mlx5e_tx_reporter_timeout function so it can be called at all times.
In addition, remove a warning print that indicates that a TX timeout won't
be handled in case of no valid TX reporter.
Fixes: 7d91126b1a ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Print warning message in case of TX reporter creation failure, only if the
return value is ERR_PTR type. NULL pointer return indicates that
NET_DEVLINK is not set, and the warning print can be skipped.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Flow entropy is calculated on the inner packet headers and used for
flow distribution in processing, routing etc. For GRE-type
encapsulations the entropy value is placed in the eight LSB of the key
field in the GRE header as defined in NVGRE RFC 7637. For UDP based
encapsulations the entropy value is placed in the source port of the
UDP header.
The hardware may support entropy calculation specifically for GRE and
for all tunneling protocols. With commit df2ef3bff1 ("net/mlx5e: Add
GRE protocol offloading") GRE is offloaded, but the hardware is
configured by default to calculate flow entropy so packets transmitted
on the wire have a wrong key. To support UDP based tunnels (i.e VXLAN),
GRE (i.e. no flow entropy) and NVGRE (i.e. with flow entropy) the
hardware behaviour must be controlled by the driver.
Ensure port entropy calculation is enabled for offloaded VXLAN tunnels
and disable port entropy calculation in the presence of offloaded GRE
tunnels by monitoring the presence of entropy enabling tunnels (i.e
VXLAN) and entropy disabing tunnels (i.e GRE).
Fixes: df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently changing a PCMR field is done by setting the field in a
zeroed buffer, zeroing other unrelated fields.
Fix this behaviour by modifying only the required field after first
reading the current register values, as a pre-step towards using more
fields in PCMR register.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After AF_PACKET is fixed to calculate the transport header offset
correctly, trust the value set by the kernel. If the offset wasn't set,
it means there is no transport header in the packet.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_transport_offset() == 0 is not a special value. The only special
value is when skb->transport_header is ~0U, and it's checked by
skb_transport_header_was_set().
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cap_max_headroom_size holds maximum headroom size supported.
Overstepping that limit might under certain conditions lead to ASIC
freeze.
Query and store the value, and add mlxsw_sp_sb_max_headroom_cells() for
obtaining the stored value. In __mlxsw_sp_port_headroom_set(), reject
requests where the total port buffer is larger than the advertised
maximum.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recommendation for headroom size for 100Gbps port and 100m cable is
101.6KB, reduced accordingly for split ports. The closest higher number
evenly divisible by cell size for both Spectrum-1 and Spectrum-2, and
such that the number of cells can be further divided by maximum split
factor of 4, is 102528 bytes, or 25632 bytes per lane.
Update mlxsw_sp_port_pb_init() to compute the headroom taking into
account this recommended per-lane value and number of lanes actually
dedicated to a given port.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Customize the tables related to shared buffer configuration to match the
current recommendation for Spectrum-2 systems.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBMM register configures the shared buffer quota for MC packets
according to Switch-Priority. The default configuration depends on the
chip type. Therefore keep the table and length in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBCM register configures shared buffer quota according to
port-priority resp. port-TC. The default configuration depends on the
chip type. Therefore keep the tables and their lengths in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBPR register configures shared buffer pools. The default
configuration depends on the chip type. Therefore keep it in struct
mlxsw_sp_sb_vals. Redirect the one reference from the global array to
the field.
Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPR array.
Drop the now useless MLXSW_SP_SB_PRS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBPM register can be used to configure quotas for packets ingressing
from a certain pool to a certain port, and egressing from a certain pool
to a certain port. The default configuration depends on the chip type.
Therefore keep it in struct mlxsw_sp_sb_vals. Redirect the one reference
from the global array to the field.
Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPM array.
Drop the now useless MLXSW_SP_SB_PMS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep the table of pool descriptors and its length in struct
mlxsw_sp_sb_vals so that it can be specialized per chip type. Redirect
all users from the global definitions to the mlxsw_sp_sb fields.
Give mlxsw_sp_pool_count() an extra mlxsw_sp parameter so that it can
access the descriptor table.
Drop the now unnecessary MLXSW_SP_SB_POOL_DESS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-2 will be configured with a different set of pools than
Spectrum-1. The size of prs and pms buffers will therefore depend on the
chip type of the device.
Therefore, instead of reserving an array directly in a structure
definition, allocate the buffer in mlxsw_sp_sb_port{,s}_init().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-2 will be configured with a different shared buffer
configuration than Spectrum-1. Therefore introduce a structure for
keeping the chip-specific default and immutable configuration.
Configuration mutable in runtime will still be kept in struct
mlxsw_sp_sb.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the bridge no longer calling switchdev_port_attr_get() to obtain
the supported bridge port flags from a driver but instead trying to set
the bridge port flags directly and relying on driver to reject
unsupported configurations, we can effectively get rid of
switchdev_port_attr_get() entirely since this was the only place where
it was called.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have converted the bridge code and the drivers to check for
bridge port(s) flags at the time we try to set them, there is no need
for a get() -> set() sequence anymore and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused.
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for getting rid of switchdev_port_attr_get(), have mlxsw
check for the bridge flags being set through switchdev_port_attr_set()
when the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier is
used.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
mlx5-updates-2019-02-19
This series includes misc updates to mlx5 drivers and one ethtool update.
1) From Aya Levin:
- ethtool: Define 50Gbps per lane link modes
- add support for 50Gbps per lane link modes in mlx5 driver
2) From Tariq Toukan,
- Add a helper function to unify mlx5 resource reloading
3) From Vlad Buslov,
- Remove wrong and superfluous tc pedit header type check
4) From Tonghao Zhang,
- Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow
5) From Leon Romanovsky & Saeed,
- Compilation warning fixes
6) From Bodong wang,
- E-Switch fixes that are related to the SmarNIC series
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When disabling vport, relevant vport configurations will be cleaned
up. These cleanups should be done to the vports which had these configs
applied at vport enablement. As esw manager vport didn't have such
vport config applied, cleanup should not touch it.
Fixes: de9e6a8136c5 ("net/mlx5: E-Switch, Properly refer to host PF vport as other vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch gets vport data structure, the index should not be out
of the range of the vport array. Driver mistakenly used vport number
to check the range.
Fixes: 22b8ddc86bf4 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>