The rpmh-rsc code had both a driver-level lock (sometimes referred to
in comments as drv->lock) and a lock per-TCS. The idea was supposed
to be that there would be times where you could get by with just
locking a TCS lock and therefor other RPMH users wouldn't be blocked.
The above didn't work out so well.
Looking at tcs_write() the bigger drv->lock was held for most of the
function anyway. Only the __tcs_buffer_write() and
__tcs_set_trigger() calls were called without holding the drv->lock.
It actually turns out that in tcs_write() we don't need to hold the
drv->lock for those function calls anyway even if the per-TCS lock
isn't there anymore. From the newly added comments in the code, this
is because:
- We marked "tcs_in_use" under lock.
- Once "tcs_in_use" has been marked nobody else could be writing
to these registers until the interrupt goes off.
- The interrupt can't go off until we trigger w/ the last line
of __tcs_set_trigger().
Thus, from a tcs_write() point of view, the per-TCS lock was useless.
Looking at rpmh_rsc_write_ctrl_data(), only the per-TCS lock was held.
It turns out, though, that this function already needs to be called
with the equivalent of the drv->lock held anyway (we either need to
hold drv->lock as we will in a future patch or we need to know no
other CPUs could be running as happens today). Specifically
rpmh_rsc_write_ctrl_data() might be writing to a TCS that has been
borrowed for writing an active transation but it never checks this.
Let's eliminate this extra overhead and avoid possible AB BA locking
headaches.
Suggested-by: Maulik Shah <mkshah@codeaurora.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200504104917.v6.4.Ib8dccfdb10bf6b1fb1d600ca1c21d9c0db1ef746@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
When a PM Notifier returns NOTIFY_BAD it doesn't get called with
CPU_PM_ENTER_FAILED. It only get called for CPU_PM_ENTER_FAILED if
someone else (further down the notifier chain) returns NOTIFY_BAD.
Handle this case by taking our CPU out of the list of ones that have
entered PM. Without this it's possible we could detect that the last
CPU went down (and we would flush) even if some CPU was alive. That's
not good since our flushing routines currently assume they're running
on the last CPU for mutual exclusion.
Fixes: 985427f997 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200504104917.v6.2.I1927d1bca2569a27b2d04986baf285027f0818a2@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Our switch statement doesn't have entries for CPU_CLUSTER_PM_ENTER,
CPU_CLUSTER_PM_ENTER_FAILED, and CPU_CLUSTER_PM_EXIT and doesn't have
a default. This means that we'll try to do a flush in those cases but
we won't necessarily be the last CPU down. That's not so ideal since
our (lack of) locking assumes we're on the last CPU.
Luckily this isn't as big a problem as you'd think since (at least on
the SoC I tested) we don't get these notifications except on full
system suspend. ...and on full system suspend we get them on the last
CPU down. That means that the worst problem we hit is flushing twice.
Still, it's good to make it correct.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Fixes: 985427f997 ("soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20200504104917.v6.1.Ic7096b3b9b7828cdd41cd5469a6dee5eb6abf549@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
We pass the result of sizeof() here to tell the printk format specifier
how many bytes to print. That expects an int though and sizeof() isn't
that type. Cast to int to silence this warning:
drivers/soc/qcom/cmd-db.c: In function 'cmd_db_debugfs_dump':
drivers/soc/qcom/cmd-db.c:281:30: warning: field width specifier '*' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=]
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6815c5c43 ("soc: qcom: cmd-db: Add debugfs dumping file")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200415062033.66406-1-swboyd@chromium.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Auditing tcs_invalidate() made me worried. Specifically I saw that it
used spin_lock(), not spin_lock_irqsave(). That always worries me
unless I can trace for sure that I'm in the interrupt handler or that
someone else already disabled interrupts.
Looking more at it, there is actually no reason for these locks
anyway. Specifically the only reason you'd ever call
rpmh_rsc_invalidate() is if you cared that the sleep/wake TCSes were
empty. That means that they need to continue to be empty even after
rpmh_rsc_invalidate() returns. The only way that can happen is if the
caller already has done something to keep all other RPMH users out.
It should be noted that even though the caller is only worried about
making sleep/wake TCSes empty, they also need to worry about stopping
active-only transfers if they need to handle the case where
active-only transfers might borrow the wake TCS.
At the moment rpmh_rsc_invalidate() is only called in PM code from the
last CPU. If that later changes the caller will still need to solve
the above problems themselves, so these locks will never be useful.
Continuing to audit tcs_invalidate(), I found a bug. The function
didn't properly check for a borrowed TCS if we hadn't recently written
anything into the TCS. Specifically, if we've never written to the
WAKE_TCS (or we've flushed it recently) then tcs->slots is empty.
We'll early-out and we'll never call tcs_is_free().
I thought about fixing this bug by either deleting the early check for
bitmap_empty() or possibly only doing it if we knew we weren't on a
TCS that could be borrowed. However, I think it's better to just
delete the checks.
As argued above it's up to the caller to make sure that all other
users of RPMH are quiet before tcs_invalidate() is called. Since
callers need to handle the zero-active-TCS case anyway that means they
need to make sure that the active-only transfers are quiet before
calling too. The one way tcs_invalidate() gets called today is
through rpmh_rsc_cpu_pm_callback() which calls
rpmh_rsc_ctrlr_is_busy() to handle this. When we have another path to
get to tcs_invalidate() it will also need to come up with something
similar and it won't need this extra check either. If we later find
some code path that actually needs this check back in (and somehow
manages to be race free) we can always add it back in.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Maulik Shah <mkshah@codeaurora.org>
Link: https://lore.kernel.org/r/20200413100321.v4.9.I07c1f70e0e8f2dc0004bd38970b4e258acdc773e@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
I've been pouring through the rpmh-rsc code and trying to understand
it. Document everything to the best of my ability. All documentation
here is strictly from code analysis--no actual knowledge of the
hardware was used. If something is wrong in here I either
misunderstood the code, had a typo, or the code has a bug in it
leading to my incorrect understanding.
In a few places here I have documented things that don't make tons of
sense. A future patch will try to address this. While this means I'm
adding comments / todos and then later fixing them in the series, it
seemed more urgent to get things documented first so that people could
understand the later patches.
Any comments I adjusted I also tried to make match kernel-doc better.
Specifically:
- kernel-doc says do not leave a blank line between the function
description and the arguments
- kernel-doc examples always have things starting w/ a capital and
ending with a period.
This should be a no-op. It's just comment changes.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200413100321.v4.6.I52653eb85d7dc8981ee0dafcd0b6cc0f273e9425@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The "cmd_cache" in RPMH wasn't terribly sensible. Specifically:
- The current code doesn't really detect "conflicts" properly any case
where the sequence being checked has more than one entry. One
simple way to see this in the current code is that if cmd[0].addr
isn't found then cmd[1].addr is never checked.
- The code attempted to use the "cmd_cache" to update an existing
message in a sleep/wake TCS with new data. The goal appeared to be
to update part of a TCS while leaving the rest of the TCS alone. We
never actually do this. We always fully invalidate and re-write
everything.
- If/when we try to optimize things to not fully invalidate / re-write
every time we update the TCSes we'll need to think it through very
carefully. Specifically requirement of find_match() that the new
sequence of addrs must match exactly the old sequence of addrs seems
inflexible. It's also not documented in rpmh_write() and
rpmh_write_batch(). In any case, if we do decide to require updates
to keep the exact same sequence and length then presumably the API
and data structures should be updated to understand groups more
properly. The current algorithm doesn't really keep track of the
length of the old sequence and there are several boundary-condition
bugs because of that. Said another way: if we decide to do
something like this in the future we should start from scratch and
thus find_match() isn't useful to keep around.
This patch isn't quite a no-op. Specifically:
- It should be a slight performance boost of not searching through so
many arrays.
- The old code would have done something useful in one case: it would
allow someone calling rpmh_write() to override the data that came
from rpmh_write_batch(). I don't believe that actually happens in
reality.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Maulik Shah <mkshah@codeaurora.org>
Link: https://lore.kernel.org/r/20200413100321.v4.5.I6d3d0a3ec810dc72ff1df3cbf97deefdcdeb8eef@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
I was trying to write documentation for the functions in rpmh-rsc and
I got to tcs_ctrl_write(). The documentation for the function would
have been: "This is the core of rpmh_rsc_write_ctrl_data(); all the
caller does is error-check and then call this".
Having the error checks in a separate function doesn't help for
anything since:
- There are no other callers that need to bypass the error checks.
- It's less documenting. When I read tcs_ctrl_write() I kept
wondering if I need to handle cases other than ACTIVE_ONLY or cases
with more commands than could fit in a TCS. This is obvious when
the error checks and code are together.
- The function just isn't that long, so there's no problem
understanding the combined function.
Things were even more confusing because the two functions names didn't
make obvious (at least to me) their relationship.
Simplify by folding one function into the other.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Maulik Shah <mkshah@codeaurora.org>
Link: https://lore.kernel.org/r/20200413100321.v4.3.Ie88ce5ccfc0c6055903ccca5286ae28ed3b85ed3@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
When there are more than one WAKE TCS available and there is no dedicated
ACTIVE TCS available, invalidating all WAKE TCSes and waiting for current
transfer to complete in first WAKE TCS blocks using another free WAKE TCS
to complete current request.
Remove rpmh_rsc_invalidate() to happen from tcs_write() when WAKE TCSes
is re-purposed to be used for Active mode. Clear only currently used
WAKE TCS's register configuration.
Fixes: 2de4b8d33e (drivers: qcom: rpmh-rsc: allow active requests from wake TCS)
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1586703004-13674-7-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
For RSCs that have sleep & wake TCS but no dedicated active TCS, wake
TCS can be re-purposed to send active requests. Once the active requests
are sent and response is received, the active mode configuration needs
to be cleared so that controller can use wake TCS for sending wake
requests.
Introduce enable_tcs_irq() to enable completion IRQ for repurposed TCSes.
Fixes: 2de4b8d33e (drivers: qcom: rpmh-rsc: allow active requests from wake TCS)
Signed-off-by: Raju P.L.S.S.S.N <rplsssn@codeaurora.org>
[mkshah: call enable_tcs_irq() within drv->lock, update commit message]
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1586703004-13674-6-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Add changes to invoke rpmh flush() from CPU PM notification.
This is done when the last the cpu is entering deep CPU idle
states and controller is not busy.
Controllers that have 'HW solver' mode like display RSC do not need
to register for CPU PM notification. They may be in autonomous mode
executing low power mode and do not require rpmh_flush() to happen
from CPU PM notification.
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1586703004-13674-5-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
TCSes have previously programmed data when rpmh_flush() is called.
This can cause old data to trigger along with newly flushed.
Fix this by cleaning SLEEP and WAKE TCSes before new data is flushed.
With this there is no need to invoke rpmh_rsc_invalidate() call from
rpmh_invalidate().
Simplify rpmh_invalidate() by moving invalidate_batch() inside.
Fixes: 600513dfee ("drivers: qcom: rpmh: cache sleep/wake state requests")
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1586703004-13674-4-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Currently rpmh ctrlr dirty flag is set for all cases regardless of data
is really changed or not. Add changes to update dirty flag when data is
changed to newer values. Update dirty flag everytime when data in batch
cache is updated since rpmh_flush() may get invoked from any CPU instead
of only last CPU going to low power mode.
Also move dirty flag updates to happen from within cache_lock and remove
unnecessary INIT_LIST_HEAD() call and a default case from switch.
Fixes: 600513dfee ("drivers: qcom: rpmh: cache sleep/wake state requests")
Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Reviewed-by: Srinivas Rao L <lsrao@codeaurora.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/1586703004-13674-3-git-send-email-mkshah@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
QCOM_APR selects QCOM_PDR_HELPERS, which in turn selects
QCOM_QMI_HELPERS, which depends on NET. So ensure that APR's
dependencies are met by making it depend on NET as well.
Fixes: 8347356626 ("soc: qcom: apr: Add avs/audio tracking functionality")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Qualcomm SoCs (starting with MSM8998) allow for multiple protection domains
to run on the same Q6 sub-system. This allows for services like ATH10K WLAN
FW to have their own separate address space and crash/recover without
disrupting the modem and other PDs running on the same sub-system. The PDR
helpers introduces an abstraction that allows for tracking/controlling the
life cycle of protection domains running on various Q6 sub-systems.
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Link: https://lore.kernel.org/r/20200312120842.21991-2-sibis@codeaurora.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
This tracepoint is hit now that we call into the rpmh code from the cpu
idle path. Let's move this to be an rcuidle tracepoint so that we avoid
the RCU idle splat below
=============================
WARNING: suspicious RCU usage
5.4.10 #68 Tainted: G S
-----------------------------
drivers/soc/qcom/trace-rpmh.h:72 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
5 locks held by swapper/2/0:
#0: ffffff81745d6ee8 (&(&genpd->slock)->rlock){+.+.}, at: genpd_lock_spin+0x1c/0x2c
#1: ffffff81745da6e8 (&(&genpd->slock)->rlock/1){....}, at: genpd_lock_nested_spin+0x24/0x34
#2: ffffff8174f2ca20 (&(&genpd->slock)->rlock/2){....}, at: genpd_lock_nested_spin+0x24/0x34
#3: ffffff8174f2c300 (&(&drv->client.cache_lock)->rlock){....}, at: rpmh_flush+0x48/0x24c
#4: ffffff8174f2c150 (&(&tcs->lock)->rlock){+.+.}, at: rpmh_rsc_write_ctrl_data+0x74/0x270
stack backtrace:
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G S 5.4.10 #68
Call trace:
dump_backtrace+0x0/0x174
show_stack+0x20/0x2c
dump_stack+0xc8/0x124
lockdep_rcu_suspicious+0xe4/0x104
__tcs_buffer_write+0x230/0x2d0
rpmh_rsc_write_ctrl_data+0x210/0x270
rpmh_flush+0x84/0x24c
rpmh_domain_power_off+0x78/0x98
_genpd_power_off+0x40/0xc0
genpd_power_off+0x168/0x208
genpd_power_off+0x1e0/0x208
genpd_power_off+0x1e0/0x208
genpd_runtime_suspend+0x1ac/0x220
__rpm_callback+0x70/0xfc
rpm_callback+0x34/0x8c
rpm_suspend+0x218/0x4a4
__pm_runtime_suspend+0x88/0xac
psci_enter_domain_idle_state+0x3c/0xb4
cpuidle_enter_state+0xb8/0x284
cpuidle_enter+0x38/0x4c
call_cpuidle+0x3c/0x68
do_idle+0x194/0x260
cpu_startup_entry+0x24/0x28
secondary_start_kernel+0x150/0x15c
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Fixes: a65a397f24 ("cpuidle: psci: Add support for PM domains by using genpd")
Reported-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20200115013751.249588-1-swboyd@chromium.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>