There's a bunch of overhead in spi-geni-qcom's prepare_message. Get
rid of it. Before this change spi_geni_prepare_message() took around
14.5 us. After this change, spi_geni_prepare_message() takes about
1.75 us (as measured by ftrace).
What's here:
* We're always in FIFO mode, so no need to call it for every transfer.
This avoids a whole ton of readl/writel calls.
* We don't need to write a whole pile of config registers if the mode
isn't changing. Cache the last mode and only do the work if needed.
* For several registers we were trying to do read/modify/write, but
there was no reason. The registers only have one thing in them, so
just write them.
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200701174506.3.I2b3d7aeb1ea622335482cce60c58d2f8381e61dd@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
In the patch ("spi: spi-geni-qcom: Avoid clock setting if not needed")
we avoid a whole pile of clock code. As part of that, we should have
restored the clock at runtime resume. Do that.
It turns out that, at least with today's configurations, this doesn't
actually matter. That's because none of the current device trees have
an OPP table for geni SPI yet. That makes dev_pm_opp_set_rate(dev, 0)
a no-op. This is why it wasn't noticed in the testing of the original
patch. It's still a good idea to fix, though.
Reviewed-by: Rajendra Nayak <rnayak@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200709074037.v2.1.I0b701fc23eca911a5bde4ae4fa7f97543d7f960e@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Every SPI transfer could have a different clock rate. The
spi-geni-qcom controller code to deal with this was never very well
optimized and has always had a lot of code plus some calls into the
clk framework which, at the very least, would grab a mutex. However,
until recently, the overhead wasn't _too_ much. That changed with
commit 0e3b8a81f5 ("spi: spi-geni-qcom: Add interconnect support")
we're now calling geni_icc_set_bw(), which leads to a bunch of math
plus:
geni_icc_set_bw()
icc_set_bw()
apply_constraints()
qcom_icc_set()
qcom_icc_bcm_voter_commit()
rpmh_invalidate()
rpmh_write_batch()
...and those rpmh commands can be a bit beefy if you call them too
often.
We already know what speed we were running at before, so if we see
that nothing has changed let's avoid the whole pile of code.
On my hardware, this made spi_geni_prepare_message() drop down from
~145 us down to ~14 us.
NOTE: Potentially it might also make sense to add some code into the
interconnect framework to avoid executing so much code when bandwidth
isn't changing, but even if we did that we still want to short circuit
here to save the extra math / clock calls.
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Akash Asthana<akashast@codeaurora.org>
Fixes: 0e3b8a81f5 ("spi: spi-geni-qcom: Add interconnect support")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200701174506.1.Icfdcee14649fc0a6c38e87477b28523d4e60bab3@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
This converts the two Freescale i.MX SPI drivers
Freescale i.MX (CONFIG_SPI_IMX) and Freescale i.MX LPSPI
(CONFIG_SPI_FSL_LPSPI) to use GPIO descriptors handled in
the SPI core for GPIO chip selects whether defined in
the device tree or a board file.
The reason why both are converted at the same time is
that they were both using the same platform data and
platform device population helpers when using
board files intertwining the code so this gives a cleaner
cut.
The platform device creation was passing a platform data
container from each boardfile down to the driver using
struct spi_imx_master from <linux/platform_data/spi-imx.h>,
but this was only conveying the number of chipselects and
an int * array of the chipselect GPIO numbers.
The imx27 and imx31 platforms had code passing the
now-unused platform data when creating the platform devices,
this has been repurposed to pass around GPIO descriptor
tables. The platform data struct that was just passing an
array of integers and number of chip selects for the GPIO
lines has been removed.
The number of chipselects used to be passed from the board
file, because this number also limits the number of native
chipselects that the platform can use. To deal with this we
just augment the i.MX (CONFIG_SPI_IMX) driver to support 3
chipselects if the platform does not define "num-cs" as a
device property (such as from the device tree). This covers
all the legacy boards as these use <= 3 native chip selects
(or GPIO lines, and in that case the number of chip selects
is determined by the core from the number of available
GPIO lines). Any new boards should use device tree, so
this is a reasonable simplification to cover all old
boards.
The LPSPI driver never assigned the number of chipselects
and thus always fall back to the core default of 1 chip
select if no GPIOs are defined in the device tree.
The Freescale i.MX driver was already partly utilizing
the SPI core to obtain the GPIO numbers from the device tree,
so this completes the transtion to let the core handle all
of it.
All board files and the core i.MX boardfile registration
code is augmented to account for these changes.
This has been compile-tested with the imx_v4_v5_defconfig
and the imx_v6_v7_defconfig.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Robin Gong <yibin.gong@nxp.com>
Cc: Trent Piepho <tpiepho@impinj.com>
Cc: Clark Wang <xiaoning.wang@nxp.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Link: https://lore.kernel.org/r/20200625200252.207614-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200708194400.22213-1-grandmaster@al2klimov.de
Signed-off-by: Mark Brown <broonie@kernel.org>
This series tries to reduce a whole bunch of overhead in each SPI
transfer. Much of this overhead is new with the recent interconnect
changes, but even without those changes we still had some overhead
that we could avoid. Let's avoid all of it.
These changes are atop the Qualcomm tree to avoid merge conflicts. If
they look good, the most expedient way to land them is probably to get
Ack's from Mark and land then via the Qualcomm tree.
Most testing was done on the Chrome OS 5.4 tree, but sanity check was
done on mainline.
Douglas Anderson (3):
spi: spi-geni-qcom: Avoid clock setting if not needed
spi: spi-geni-qcom: Set an autosuspend delay of 250 ms
spi: spi-geni-qcom: Get rid of most overhead in prepare_message()
drivers/spi/spi-geni-qcom.c | 67 ++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 35 deletions(-)
--
2.27.0.383.g050319c2ae-goog
Hello,
this series first fixes the calculation of the clock rate. The driver will
round up to the nearest clock rate instead of rounding down. Resulting in SPI
devices accessed with a too high SPI clock.
The remaining patches improve the performance of the driver. The changes range
from micro-optimizations like reducing MMIO writes to the controller to
reducing the number of needed interrupts in some use cases.
regards,
Marc
changes since v1:
- added Maxime Ripard's to the existing patches
- 06/10: (was 05/10 in v1)
"spi: spi-sun6i: sun6i_spi_drain_fifo(): introduce sun6i_spi_get_rx_fifo_count() and make use of it"
use FIELD_GET instead of open coding it
(tnx: Maxime Ripard)
- 05/10: "spi: spi-sun6i: sun6i_spi_get_tx_fifo_count: Convert manual shift+mask to FIELD_GET()"
new patch
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.orghttp://lists.infradead.org/mailman/listinfo/linux-arm-kernel
In commit 0e3b8a81f5 ("spi: spi-geni-qcom: Add interconnect
support") the spi_geni_runtime_suspend() and spi_geni_runtime_resume()
became a bit slower. Measuring on my hardware I see numbers in the
hundreds of microseconds now.
Let's use autosuspend to help avoid some of the overhead. Now if
we're doing a bunch of transfers we won't need to be constantly
chruning.
The number 250 ms for the autosuspend delay was picked a bit
arbitrarily, so if someone has measurements showing a better value we
could easily change this.
Fixes: 0e3b8a81f5 ("spi: spi-geni-qcom: Add interconnect support")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Akash Asthana<akashast@codeaurora.org>
Link: https://lore.kernel.org/r/20200701174506.2.I9b8f6bb1e7e6d8847e2ed2cf854ec55678db427f@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
In sun6i_spi_transfer_one() the RX FIFO Ready (SUN6I_INT_CTL_RF_RDY) is
unconditionally enabled.
A RX interrupt is only needed, if more data than fits into the FIFO is going to
be received during this transfer. As the RX-FIFO is drained during transfer
complete interrupt, enable the RX FIFO Ready interrupt only if the data doesn't
fit into the FIFO.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200706143443.9855-11-mkl@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The function sun6i_spi_fill_fifo() is called with a length argument of
"sspi->fifo_depth" and "SUN6I_FIFO_DEPTH".
The driver reads the number of free bytes in the FIFO from the hardware and
uses the length argument to limit this value. This is not needed as the number
of free bytes in the FIFO is always less or equal the depth of the FIFO.
This patch removes the length argument and check.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200706143443.9855-9-mkl@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
The function sun6i_spi_drain_fifo() is called with a length argument of
"sspi->fifo_depth" and "SUN6I_FIFO_DEPTH".
The driver reads the number of available bytes to read from the FIFO from the
hardware and uses the length argument to limit this value. This is not needed
as the FIFO can contain only the fifo depth number of bytes.
This patch removes the length argument and check.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200706143443.9855-8-mkl@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
In sun6i_spi_transfer_one() the driver ensures that the length of the transfer
is smaller or equal to SUN6I_MAX_XFER_SIZE. This means the masking of the
length to SUN6I_MAX_XFER_SIZE can be skipped when writing the transfer length
into the registers.
This patch removes the useless masking of the transfer length to
SUN6I_MAX_XFER_SIZE.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200706143443.9855-5-mkl@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
A SPI transfer defines the _maximum_ speed of the SPI transfer. However the
driver doesn't take into account that the clock divider is always rounded down
(due to integer arithmetics). This results in a too high clock rate for the SPI
transfer.
E.g.: with a mclk_rate of 24 MHz and a SPI transfer speed of 10 MHz, the
original code calculates a reg of "0", which results in a effective divider of
"2" and a 12 MHz clock for the SPI transfer.
This patch fixes the issue by using DIV_ROUND_UP() instead of a plain
integer division.
While there simplify the divider calculation for the CDR1 case, use
order_base_2() instead of two ilog2() calculations.
Fixes: 3558fe900e ("spi: sunxi: Add Allwinner A31 SPI controller driver")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20200706143443.9855-2-mkl@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Hi all,
Although Florian was concerned about a trivial inline check to deal with
shared IRQs adding overhead, the reality is that it would be so small as
to not be worth even thinking about unless the driver was already tuned
to squeeze out every last cycle. And a brief look over the code shows
that that clearly isn't the case.
This is an example of some of the easy low-hanging fruit that jumps out
just from code inspection. Based on disassembly and ARM1176 cycle
timings, patch #2 should save the equivalent of 2-3 shared interrupt
checks off the critical path in all cases, and patch #3 possibly up to
about 100x more. I don't have any means to test these patches, let alone
measure performance, so they're only backed by the principle that less
code - and in particular fewer memory accesses - is almost always
better.
There is almost certainly a *lot* more to be had from careful use of
relaxed I/O accessors, not doing a read-modify-write of CS at every
reset, tweaking the loops further to avoid unnecessary writebacks to
variables, and so on. However since I'm not invested in this personally
I'm not going to pursue it any further; I'm throwing these patches out
as more of a demonstration to back up my original drive-by review
comments, so if anyone want to pick them up and run with them then
please do so.
Robin.
Robin Murphy (3):
spi: bcm3835: Tidy up bcm2835_spi_reset_hw()
spi: bcm2835: Micro-optimise IRQ handler
spi: bcm2835: Micro-optimise FIFO loops
drivers/spi/spi-bcm2835.c | 45 +++++++++++++++++++--------------------
1 file changed, 22 insertions(+), 23 deletions(-)
--
2.23.0.dirty
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.orghttp://lists.infradead.org/mailman/listinfo/linux-arm-kernel
This converts the IMG SPFI SPI driver to use GPIO descriptors
as obtained from the core instead of GPIO numbers.
The driver was already relying on the core code to look up
the GPIO numbers from the device tree and allocate memory for
storing state etc. By moving to use descriptors handled by
the core we can delete the setup/cleanup functions and
the device state handler that were only dealing with this.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Ionela Voinescu <ionela.voinescu@imgtec.com>
Cc: Sifan Naeem <sifan.naeem@imgtec.com>
Link: https://lore.kernel.org/r/20200625201422.208640-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
The Nuvoton PSPI driver already uses the core to handle GPIO
chip selects but is using the old GPIO number method and
retrieveing the GPIOs in the probe() call.
Switch it over to using GPIO descriptors saving a bunch of
code and modernizing it.
Compile tested med ARMv7 multiplatform config augmented
with the Nuvoton arch and this driver.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Tomer Maimon <tmaimon77@gmail.com>
Link: https://lore.kernel.org/r/20200625225759.273911-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
On some SPI controllers (like spi-geni-qcom) setting the chip select
is a heavy operation. For instance on spi-geni-qcom, with the current
code, is was measured as taking upwards of 20 us. Even on SPI
controllers that aren't as heavy, setting the chip select is at least
something like a MMIO operation over some peripheral bus which isn't
as fast as a RAM access.
While it would be good to find ways to mitigate problems like this in
the drivers for those SPI controllers, it can also be noted that the
SPI framework could also help out. Specifically, in some situations,
we can see the SPI framework calling the driver's set_cs() with the
same parameter several times in a row. This is specifically observed
when looking at the way the Chrome OS EC SPI driver (cros_ec_spi)
works but other drivers likely trip it to some extent.
Let's solve this by caching the chip select state in the core and only
calling into the controller if there was a change. We check not only
the "enable" state but also the chip select mode (active high or
active low) since controllers may care about both the mode and the
enable flag in their callback.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200629164103.1.Ied8e8ad8bbb2df7f947e3bc5ea1c315e041785a2@changeid
Signed-off-by: Mark Brown <broonie@kernel.org>
The field mspi->reg_base is annotated as an __iomem pointer. Good.
However, this field is often assigned to a temporary variable:
before being used. For example:
struct fsl_spi_reg *reg_base = mspi->reg_base;
But this variable is missing the __iomem annotation.
So, add the missing __iomem and make sparse & the bot happier.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Link: https://lore.kernel.org/r/20200622162611.83694-1-luc.vanoostenryck@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The IRQ handler only needs the struct spi_controller for the sake of
the completion at the end of a transfer. Passing the struct bcm2835_spi
directly as the IRQ data allows that level of indirection to be pushed
into the completion path for the reverse lookup, and avoided entirely
in all other cases.
This saves one explicit load in the critical path, plus (for a GCC 8.3
build) two registers worth of stack frame overhead.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/6b401cb521539caffab21f05b4c8cba6c9d27c6e.1592261248.git.robin.murphy@arm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
The OMAP2 MCSPI has some kind of half-baked GPIO CS support:
it includes code like this:
if (gpio_is_valid(spi->cs_gpio)) {
ret = gpio_request(spi->cs_gpio, dev_name(&spi->dev));
(...)
But it doesn't parse the "cs-gpios" attribute in the device
tree to count the number of GPIOs or pick out the GPIO numbers
and put these in the SPI master's .cs_gpios property.
We complete the implementation of supporting CS GPIOs
from the device tree and switch it over to use the SPI core
for this.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20200625231257.280615-1-linus.walleij@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull spi fixes from Mark Brown:
"A batch of fixes for the Freescale DSPI driver fixing some serious
issues with removal of active devices and one resume case, plus a few
new PCI IDs for Intel platforms"
* tag 'spi-fix-v5.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pxa2xx: Add support for Intel Tiger Lake PCH-H
spi: spi-fsl-dspi: Initialize completion before possible interrupt
spi: spi-fsl-dspi: Fix external abort on interrupt in resume or exit paths
spi: spi-fsl-dspi: Fix lockup if device is shutdown during SPI transfer
spi: spi-fsl-dspi: Fix lockup if device is removed during SPI transfer
There is code for adjusting the clock both in setup_fifo_params()
(called from prepare_message()) and in setup_fifo_xfer() (called from
transfer_one()). The code is the same. Abstract it out to a shared
function.
This is a no-op cleanup patch. The only change is to the error string
if we fail to set the clock. Since the two paths has marginally
different error messages I picked the clean one.
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Akash Asthana <akashast@codeaurora.org>
Link: https://lore.kernel.org/r/1592908737-7068-6-git-send-email-akashast@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Add fallback to pio mode in case dma transfer failed with error status
SPI_TRANS_FAIL_NO_START.
If spi client driver want to enable this feature please set xfer->error in
the proper place such as dmaengine_prep_slave_sg() failure detect(but no
any data put into spi bus yet). Besides, add master->fallback checking in
its can_dma() so that spi core could switch to pio next time. Please refer
to spi-imx.c.
Signed-off-by: Robin Gong <yibin.gong@nxp.com>
Link: https://lore.kernel.org/r/1592347329-28363-2-git-send-email-yibin.gong@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull spi fixes from Mark Brown:
"Quite a lot of fixes here for no single reason.
There's a collection of the usual sort of device specific fixes and
also a bunch of people have been working on spidev and the userspace
test program spidev_test so they've got an unusually large collection
of small fixes"
* tag 'spi-fix-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spidev: fix a potential use-after-free in spidev_release()
spi: spidev: fix a race between spidev_release and spidev_remove
spi: stm32-qspi: Fix error path in case of -EPROBE_DEFER
spi: uapi: spidev: Use TABs for alignment
spi: spi-fsl-dspi: Free DMA memory with matching function
spi: tools: Add macro definitions to fix build errors
spi: tools: Make default_tx/rx and input_tx static
spi: dt-bindings: amlogic, meson-gx-spicc: Fix schema for meson-g12a
spi: rspi: Use requested instead of maximum bit rate
spi: spidev_test: Use %u to format unsigned numbers
spi: sprd: switch the sequence of setting WDG_LOAD_LOW and _HIGH
To follow onto Doug's latest spi geni series[1] this simplifies and
reduces the code a little more.
[1] https://lore.kernel.org/r/20200618150626.237027-1-dianders@chromium.org
Stephen Boyd (2):
spi: spi-geni-qcom: Simplify setup_fifo_xfer()
spi: spi-geni-qcom: Don't set {tx,rx}_rem_bytes unnecessarily
drivers/spi/spi-geni-qcom.c | 55 +++++++++++++++++--------------------
1 file changed, 25 insertions(+), 30 deletions(-)
base-commit: 7ba9bdcb91
--
Sent by a computer, using git, on the internet
We only need to test for these counters being non-zero when we see the
end of a transfer. If we're doing a CS change then they will already be
zero. This implies that we don't need to set these to 0 if we're
cancelling an in flight transfer too, because we only care to test these
counters when the 'DONE' bit is set in the hardware and we've set them
to non-zero for a transfer.
This is a non-functional change, just cleanup to consolidate code.
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200620022233.64716-3-swboyd@chromium.org
Signed-off-by: Mark Brown <broonie@kernel.org>