This new state is required when firmware indicates that the error
recovery process requires polling for firmware state to be completely
down before initiating reset. For example, firmware may take some
time to collect the crash dump before it is down and ready to be
reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware coredump messages take much longer than standard messages,
so increase the timeout accordingly.
Fixes: 6c5657d085 ("bnxt_en: Add support for ethtool get dump.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Health show command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw_fatal
pci/0000:af:00.0:
name fw_fatal
state healthy error 1 recover 1 grace_period 0 auto_recover true
Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.
We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call will handle fatal firmware errors by forcing a reset on the
firmware. The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages. From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.
The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller. Fix up some clean up functions to continue
and not abort when the firmware message function returns error.
This is preparation work to fully handle firmware error recovery
under fatal conditions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health reporter for the firmware reset event. Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the bnxt_fw_reset() main function to handle firmware reset. This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected. bnxt_fw_reset()
will first wait for all VFs to shutdown and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.
The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This event from firmware signals a coordinated reset initiated by the
firmware. It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.
We store the parameters from this event. Subsequent patches will
add logic to handle reset itself using devlink reporters.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create new FW devlink_health_reporter, to know the current health
status of FW.
Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw
pci/0000:af:00.0:
name fw
state healthy error 0 recover 0
FW status: Healthy; Reset count: 1
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new flag will be set in subsequent patches when firmware is
going through reset. If bnxt_close() is called while the new flag
is set, the FW reset sequence will have to be aborted because the
NIC is prematurely closed before FW reset has completed. We also
reject SRIOV configurations while FW reset is in progress.
v2: No longer drop rtnl_lock() in close and wait for FW reset to complete.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle the async event from the firmware that enables firmware health
monitoring. Store initial health metrics.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset. Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary. If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand.
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the 57508, 57504, and 57502 chip IDs that are all part of the
BNXT_CHIP_P5 family of chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new TPA feature in the 57500 chips, we need to discover the
feature first before setting up the netdev features. Refactor the
the firmware probe and init logic more cleanly into 2 functions and
and make these calls before setting up the netdev features.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new TPA implemantation has additional TPA counters that extend the
per-ring statistics block. Allocate the proper size accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new TPA feature on 57500 supports a larger number of concurrent TPAs
(up to 1024) divided among the functions. We need to add some logic to
map the hardware TPA ID to a software index that keeps track of each TPA
in progress. A 1:1 direct mapping without translation would be too
wasteful as we would have to allocate 1024 TPA structures for each RX
ring on each PCI function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an aggregation array to bnxt_tpa_info struct to keep track of the
aggregation completions. The aggregation completions are not
completed at the TPA_END completion on 57500 chips so we need to
keep track of them. The array is only allocated on the new chips
when required. An agg_count field is also added to keep track of the
number of these completions.
The maximum concurrent TPA is now discovered from firmware instead of
the hardcoded 64. Add a new bp->max_tpa to keep track of maximum
configured TPA.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new chips have a slightly modified TPA interface for LRO/GRO_HW.
Modify the TPA structures so that the same structures can also be
used on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver. The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.
v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds basic support for XDP_REDIRECT in the bnxt_en driver. Next
patch adds the more optimized page pool support.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.
Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mamameed says:
====================
Generic DIM
From: Tal Gilboa and Yamin Fridman
Implement net DIM over a generic DIM library, add RDMA DIM
dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.
We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.
Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.
Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.
Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
This new implementation will expose modified versions of profiles,
dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.
Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c.
This is both more structurally appealing and would allow to only
expose externally used functions.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Removed 'net' prefix from functions and structs used by external drivers.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Skip RDMA context memory allocations, reduce to 1 ring, and disable
TPA when running in the kdump kernel. Without this patch, the driver
fails to initialize with memory allocation errors when running in a
typical kdump kernel.
Fixes: cf6daed098 ("bnxt_en: Increase context memory allocations on 57500 chips for RDMA.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When making configuration changes, the driver calls bnxt_close_nic()
and then bnxt_open_nic() for the changes to take effect. A parameter
irq_re_init is passed to the call sequence to indicate if IRQ
should be re-initialized. This irq_re_init parameter needs to
be included in the bnxt_reserve_rings() call. bnxt_reserve_rings()
can only call pci_disable_msix() if the irq_re_init parameter is
true, otherwise it may hit BUG() because some IRQs may not have been
freed yet.
Fixes: 41e8d79837 ("bnxt_en: Modify the ring reservation functions for 57500 series chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In newer firmware, the context memory for MR (Memory Region)
and AH (Address Handle) to support RDMA are specified separately.
Modify driver to specify and allocate the 2 context memory types
separately when supported by the firmware.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HWRM_VER_GET firmware command returns package name that is running
actively on the adapter. Use this version instead of parsing from
the package log in NVRAM.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware now advertises the capability for extended stats
support. Check the new capability in addition to the existing
version check.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware understands the concept of a trusted VF, so propagate the
trusted VF attribute set by the PF admin. to the firmware. Also, check
the firmware trusted setting when considering the VF MAC address change
and reporting the trusted setting to the user.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BNXT only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get(). The conversion is
straight forward here since the PF and VF code use the same getter.
Since bnxt makes uses of switchdev_port_same_parent_id() convert it to
use netdev_port_same_parent_id().
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink_port with devlink and create initial port params
table for bnxt_en. The table consists of a generic parameter:
wake_on_lan: Enables Wake on Lan for this port when magic packet
is received with this port's MAC address using ACPI pattern.
If enabled, the controller asserts a wake pin upon reception of
WoL packet. ACPI (Advanced Configuration and Power Interface) is
an industry specification for the efficient handling of power
consumption in desktop and mobile computers.
v2->v3:
- Modify bnxt_dl_wol_validate(), to throw error message when user gives
value other than DEVLINK_PARAM_WAKE_MAGIC ot to disable WOL.
- Use netdev_err() instead of netdev_warn(), when devlink_port_register()
and devlink_port_params_register() returns error. Also, don't log rc
in this message.
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware only supports 16-bit flow handle, because of which the
number of flows that can be offloaded can’t scale beyond a point.
Newer firmware supports 64-bit flow handle enabling the host to scale
upto millions of flows. With the new 64-bit flow handle support, driver
has to query flow stats in a different way compared to the older approach.
This patch adds support for 64-bit flow handle and new way to query
flow stats.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If RDMA is supported on the 57500 chip, increase context memory
allocations for the resources used by RDMA.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls()
to allocate and free pages for context memory. The new functions
will handle the different levels of paging support and allocate/free
the pages accordingly using the existing functions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support level 2 context page memory structures, enhance the
bnxt_ring_mem_info structure with a "depth" field to specify the page
level and add a flag to specify using full pages for L1 and L2 page
tables. This is needed to support RDMA functionality on 57500 chips
since RDMA requires more context memory.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Earlier, some of the firmware commands (ex: CFA_FLOW_*) which are processed
by KONG processor were sent to the CHIMP processor from the host. This
approach was taken as there was no direct message channel to KONG.
CHIMP in turn used to send them to KONG. Newer firmware supports a new
message channel which the host can send messages directly to the KONG
processor.
This patch adds support for required changes needed in the driver
to support direct KONG message channel. This speeds up flow related
messages sent to the firmware for CLS_FLOWER offload.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These routines will be enhanced in the subsequent patch to
return the 2nd firmware comm. channel's hwrm response address &
sequence id respectively.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set hwrm_intr_seq_id value to its inverted value instead of
HWRM_SEQ_INVALID, when an hwrm completion of type
CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use
the complete 16-bit sequence ID space.
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current driver, the statistics reported by .ndo_get_stats64()
are reset when the device goes down. Store a snapshot of the
rtnl_link_stats64 before shutdown. This snapshot is added to the
current counters in .ndo_get_stats64() so that the counters will not
get reset when the device is down.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Display the CoS counters as additional priority counters by looking up
the priority to CoS queue mapping. If the TX extended port statistics
block size returned by firmware is big enough to cover the CoS counters,
then we will display the new priority counters. We call firmware to get
the up-to-date pri2cos mapping to convert the CoS counters to
priority counters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some minor differences when assigning VF resources on the
new chips. The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>