Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always comes from flusher thread but
the BIOs are for different blk cgroups. Request could be requeued and
dispatched from completely different tasks. MD/DM are another examples.
This patch tries to fix the gap. We print out cgroup fhandle info in
blktrace. Userspace can use open_by_handle_at() syscall to find the
cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to
find fhandle for a cgroup and use a BPF program to filter out blktrace
for a specific cgroup.
We add a new 'blk_cgroup' trace option for blk tracer. It's default off.
Application which doesn't know the new option isn't affected. When it's
on, we output fhandle info right after blk_io_trace with an extra bit
set in event action. So from application point of view, blktrace with
the option will output new actions.
I didn't change blk trace event yet, since I'm not sure if changing the
trace event output is an ABI issue. If not, I'll do it later.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for the
cgroup.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Per-cpu workqueues have been tripping CPU affinity sanity checks while
a CPU is being offlined. A per-cpu kworker ends up running on a CPU
which isn't its target CPU while the CPU is online but inactive.
While the scheduler allows kthreads to wake up on an online but
inactive CPU, it doesn't allow a running kthread to be migrated to
such a CPU, which leads to an odd situation where setting affinity on
a sleeping and running kthread leads to different results.
Each mem-reclaim workqueue has one rescuer which guarantees forward
progress and the rescuer needs to bind itself to the CPU which needs
help in making forward progress; however, due to the above issue,
while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on
the correct CPU if the CPU is in the process of going offline,
tripping the sanity check and executing the work item on the wrong
CPU.
This patch updates __migrate_task() so that kthreads can be migrated
into an inactive but online CPU.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static. This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory. PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.
Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules. In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.
This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools. It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code. It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.
[With additions to the patch provided by Tejun Hao <tj@kernel.org>]
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period. Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not. As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.
In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing. It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.
This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.
Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Lance Roy <ldr709@gmail.com>
Cc: <stable@vger.kernel.org> # 4.12.x
That commit was part of the changes moving x86 to the generic CPU hotplug
interrupt migration code. The force flag was required on x86 before the
hierarchical irqdomain rework, but invoking set_affinity() with force=true
stayed and had no side effects.
At some point in the past, the force flag got repurposed to support the
exynos timer interrupt affinity setting to a not yet online CPU, so the
interrupt controller callback does not verify the supplied affinity mask
against cpu_online_mask.
Setting the flag in the CPU hotplug code causes the cpu online masking to
be blocked on these irq controllers and results in potentially affining an
interrupt to the CPU which is unplugged, i.e. instead of moving it away,
it's just reassigned to it.
As the force flags is not longer needed on x86, it's safe to revert that
patch so the ARM irqchips which use the force flag work again.
Add comments to that effect, so this won't happen again.
Note: The online mask handling should be done in the generic code and the
force flag and the masking in the irq chips removed all together, but
that's not a change possible for 4.13.
Fixes: 77f85e66aa ("genirq/cpuhotplug: Set force affinity flag on hotplug migration")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707271217590.3109@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
printk_late_init() is responsible for disabling boot consoles that
use init memory. It checks the address of struct console for this.
But this is not enough. For example, there are several early
consoles that have write() method in the init section and
struct console in the normal section. They are not disabled
and could cause fancy and hard to debug system states.
It is even more complicated by the macros EARLYCON_DECLARE() and
OF_EARLYCON_DECLARE() where various struct members are set at
runtime by the provided setup() function.
I have tried to reproduce this problem and forced the classic uart
early console to stay using keep_bootcon parameter. In particular
I used earlycon=uart,io,0x3f8 keep_bootcon console=ttyS0,115200.
The system did not boot:
[ 1.570496] PM: Image not found (code -22)
[ 1.570496] PM: Image not found (code -22)
[ 1.571886] PM: Hibernation image not present or could not be loaded.
[ 1.571886] PM: Hibernation image not present or could not be loaded.
[ 1.576407] Freeing unused kernel memory: 2528K
[ 1.577244] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
The double lines are caused by having both early uart console and
ttyS0 console enabled at the same time. The early console stopped
working when the init memory was freed. Fortunately, the invalid
call was caught by the NX-protexted page check and did not cause
any silent fancy problems.
This patch adds a check for many other addresses stored in
struct console. It omits setup() and match() that are used
only when the console is registered. Therefore they have
already been used at this point and there is no reason
to use them again.
Link: http://lkml.kernel.org/r/1500036673-7122-3-git-send-email-pmladek@suse.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: "Fabio M. Di Nitto" <fdinitto@redhat.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Commit 4c30c6f566 ("kernel/printk: do not turn off bootconsole in
printk_late_init() if keep_bootcon") added a check on keep_bootcon to
ensure that boot consoles were kept around until the real console is
registered.
This can lead to problems if the boot console data and code are in the
init section, since it can be freed before the boot console is
unregistered.
Commit 81cc26f2bd ("printk: only unregister boot consoles when
necessary") fixed this a better way. It allowed to keep boot consoles
that did not use init data. Unfortunately it did not remove the check
of keep_bootcon.
This can lead to crashes and weird panics when the bootconsole is
accessed after free, especially if page poisoning is in use and the
code / data have been overwritten with a poison value.
To prevent this, always free the boot console if it is within the init
section. In addition, print a warning about that the console is removed
prematurely.
Finally there is a new comment how to avoid the warning. It replaced
an explanation that duplicated a more comprehensive function
description few lines above.
Fixes: 4c30c6f566 ("kernel/printk: do not turn off bootconsole in printk_late_init() if keep_bootcon")
Link: http://lkml.kernel.org/r/1500036673-7122-2-git-send-email-pmladek@suse.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: "Fabio M. Di Nitto" <fdinitto@redhat.com>
Cc: linux-serial@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
[pmladek@suse.com: print the warning, code and comments clean up]
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Make iowait_boost and iowait_boost_max as unsigned int since its unit
is kHz and this is consistent with struct cpufreq_policy. Also change
the local variables in sugov_iowait_boost() to match this.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the iowait_boost feature in schedutil makes the frequency
go to max on iowait wakeups. This feature was added to handle a case
that Peter described where the throughput of operations involving
continuous I/O requests [1] is reduced due to running at a lower
frequency, however the lower throughput itself causes utilization to
be low and hence causing frequency to be low hence its "stuck".
Instead of going to max, its also possible to achieve the same effect
by ramping up to max if there are repeated in_iowait wakeups
happening. This patch is an attempt to do that. We start from a lower
frequency (policy->min) and double the boost for every consecutive
iowait update until we reach the maximum iowait boost frequency
(iowait_boost_max).
I ran a synthetic test (continuous O_DIRECT writes in a loop) on an
x86 machine with intel_pstate in passive mode using schedutil. In
this test the iowait_boost value ramped from 800MHz to 4GHz in 60ms.
The patch achieves the desired improved throughput as the existing
behavior.
[1] https://patchwork.kernel.org/patch/9735885/
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The newly added irqchip_fwnode_ops structure is not exported, which can
lead to link errors:
ERROR: "irqchip_fwnode_ops" [drivers/gpio/gpio-xgene-sb.ko] undefined!
I checked that all other such symbols that were introduced are
exported if they need to be, this is the only missing one.
Fixes: db3e50f323 (device property: Get rid of struct fwnode_handle type field)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Set dynamic_switching to 'true' to disallow use of schedutil governor
for platforms with transition_latency set to CPUFREQ_ETERNAL, as they
may not want to do automatic dynamic frequency switching.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After adopting callbacks from a newly offlined CPU, the adopting CPU
checks to make sure that its callback list's count is zero only if the
list has no callbacks and vice versa. Unfortunately, it does so after
enabling interrupts, which means that false positives are possible due to
interrupt handlers invoking call_rcu(). Although these false positives
are improbable, rcutorture did make it happen once.
This commit therefore moves this check to an irq-disabled region of code,
thus suppressing the false positive.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Given changes to callback migration, rcu_cblist_head(),
rcu_cblist_tail(), rcu_cblist_count_cbs(), rcu_segcblist_segempty(),
rcu_segcblist_dequeued_lazy(), and rcu_segcblist_new_cbs() are
no longer used. This commit therefore removes them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Given that the rcu_state structure's >orphan_pend and ->orphan_done
fields are used only during migration of callbacks from the recently
offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
rcu_adopt_orphan_cbs() are combined, these fields can become local
variables in the combined function. This commit therefore combines
rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
rcu_segcblist_merge() function and removes the ->orphan_pend and
->orphan_done fields.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
When migrating callbacks from a newly offlined CPU, we are already
holding the root rcu_node structure's lock, so it costs almost nothing
to advance and accelerate the newly migrated callbacks. This patch
therefore makes this advancing and acceleration happen.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The ->orphan_lock is acquired and released only within the
rcu_migrate_callbacks() function, which now acquires the root rcu_node
structure's ->lock. This commit therefore eliminates the ->orphan_lock
in favor of the root rcu_node structure's ->lock.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It is possible that the outgoing CPU is unaware of recent grace periods,
and so it is also possible that some of its pending callbacks are actually
ready to be invoked. The current callback-migration code would needlessly
force these callbacks to pass through another grace period. This commit
therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
order to give them full credit for having passed through any recent
grace periods.
This also fixes an odd theoretical bug where there are no callbacks in
the system except for those on the outgoing CPU, none of those callbacks
have yet been associated with a grace-period number, there is never again
another callback registered, and the surviving CPU never again takes a
scheduling-clock interrupt, never goes idle, and never enters nohz_full
userspace execution. Yes, this is (just barely) possible. It requires
that the surviving CPU be a nohz_full CPU, that its scheduler-clock
interrupt be shut off, and that it loop forever in the kernel. You get
bonus points if you can make this one happen! ;-)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
RCU's CPU-hotplug callback-migration code first moves the outgoing
CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
moves them to the NOCB callback list. This commit avoids the
extra step (and simplifies the code) by moving the callbacks directly
from the outgoing CPU's callback list to the NOCB callback list.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current CPU-hotplug RCU-callback-migration code checks
for the source (newly offlined) CPU being a NOCBs CPU down in
rcu_send_cbs_to_orphanage(). This commit simplifies callback migration a
bit by moving this check up to rcu_migrate_callbacks(). This commit also
adds a check for the source CPU having no callbacks, which eases analysis
of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
are updated, but never read. This commit therefore removes them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The torture status line contains a series of values preceded by "onoff:".
The last value in that line, the one preceding the "HZ=" string, is
always zero. The reason that it is always zero is that torture_offline()
was incrementing the sum_offl pointer instead of the value that this
pointer referenced. This commit therefore makes this increment operate
on the statistic rather than the pointer to the statistic.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The update of the ->expmaskinitnext and of ->ncpus are unsynchronized,
with the value of ->ncpus being incremented long before the corresponding
->expmaskinitnext mask is updated. If an RCU expedited grace period
sees ->ncpus change, it will update the ->expmaskinit masks from the new
->expmaskinitnext masks. But it is possible that ->ncpus has already
been updated, but the ->expmaskinitnext masks still have their old values.
For the current expedited grace period, no harm done. The CPU could not
have been online before the grace period started, so there is no need to
wait for its non-existent pre-existing readers.
But the next RCU expedited grace period is in a world of hurt. The value
of ->ncpus has already been updated, so this grace period will assume
that the ->expmaskinitnext masks have not changed. But they have, and
they won't be taken into account until the next never-been-online CPU
comes online. This means that RCU will be ignoring some CPUs that it
should be paying attention to.
The solution is to update ->ncpus and ->expmaskinitnext while holding
the ->lock for the rcu_node structure containing the ->expmaskinitnext
mask. Because smp_store_release() is now used to update ->ncpus and
smp_load_acquire() is now used to locklessly read it, if the expedited
grace period sees ->ncpus change, then the updating CPU has to
already be holding the corresponding ->lock. Therefore, when the
expedited grace period later acquires that ->lock, it is guaranteed
to see the new value of ->expmaskinitnext.
On the other hand, if the expedited grace period loads ->ncpus just
before an update, earlier full memory barriers guarantee that
the incoming CPU isn't far enough along to be running any RCU readers.
This commit therefore makes the required change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
RCU callbacks must be migrated away from an outgoing CPU, and this is
done near the end of the CPU-hotplug operation, after the outgoing CPU is
long gone. Unfortunately, this means that other CPU-hotplug callbacks
can execute while the outgoing CPU's callbacks are still immobilized
on the long-gone CPU's callback lists. If any of these CPU-hotplug
callbacks must wait, either directly or indirectly, for the invocation
of any of the immobilized RCU callbacks, the system will hang.
This commit avoids such hangs by migrating the callbacks away from the
outgoing CPU immediately upon its departure, shortly after the return
from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these
callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
callbacks to wait on these RCU callbacks without risk of a hang.
While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
dead code on the one hand and to avoid define-without-use warnings on the
other hand.
Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Richard Weinberger <richard@nod.at>
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1. Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.
This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
Explain cgroup_enable_threaded() and note that the function can never
be called on the root cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Waiman Long <longman@redhat.com>
cgroup_enable_threaded() checks that the cgroup doesn't have any tasks
or children and fails the operation if so. This test is unnecessary
because the first part is already checked by
cgroup_can_be_thread_root() and the latter is unnecessary. The latter
actually cause a behavioral oddity. Please consider the following
hierarchy. All cgroups are domains.
A
/ \
B C
\
D
If B is made threaded, C and D becomes invalid domains. Due to the no
children restriction, threaded mode can't be enabled on C. For C and
D, the only thing the user can do is removal.
There is no reason for this restriction. Remove it.
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore replaces the spin_unlock_wait() call in
task_work_run() with a spin_lock_irq() and a spin_unlock_irq() aruond
the cmpxchg() dequeue loop. This should be safe from a performance
perspective because ->pi_lock is local to the task and because calls to
the other side of the race, task_work_cancel(), should be rare.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The handling of RCU's no-CBs CPUs has a maintenance headache, namely
that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
wakeup must be defered to a point where we can be sure that scheduler
locks are not held. Of course, there are a lot of code paths leading
from an interrupts-disabled invocation of call_rcu(), and missing any
one of these can result in excessive callback-invocation latency, and
potentially even system hangs.
This commit therefore uses a timer to guarantee that the wakeup will
eventually occur. If one of the deferred-wakeup points kicks in, then
the timer is simply cancelled.
This commit also fixes up an incomplete removal of commits that were
intended to plug remaining exit paths, which should have the added
benefit of reducing the overhead of RCU's context-switch hooks. In
addition, it simplifies leader-to-follower callback-list handoff by
introducing locking. The call_rcu()-to-leader handoff continues to
use atomic operations in order to maintain good real-time latency for
common-case use of call_rcu().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]
ddebug_remove_module() use mod->name to find the ddebug_table of the
module and remove it. But dynamic_debug_setup() use the first
_ddebug->modname to create ddebug_table for the module. It's ok when
the _ddebug->modname is the same with the mod->name.
But livepatch module is special, it may contain _ddebugs of other
modules, the modname of which is different from the name of livepatch
module. So ddebug_remove_module() can't use mod->name to find the
right ddebug_table and remove it. It can cause kernel crash when we cat
the file <debugfs>/dynamic_debug/control.
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
We forgot to set the error code on two error paths which means that we
return ERR_PTR(0) which is NULL. The caller, find_and_alloc_map(), is
not expecting that and will have a NULL dereference.
Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux kernel invokes call_rcu() from various interrupt/softirq
handlers, but rcutorture does not. This commit therefore adds this
behavior to rcutorture's repertoire.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit augments the grace-period-kthread starvation debugging
messages by adding the last CPU that ran the kthread.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes an unused local variable named ts_rem that is
marked __maybe_unused. Yes, the variable was assigned to, but it
was never used beyond that point, hence not needed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It appears that at least some of the rcutorture writer stall messages
coincide with unusually long CPU-online operations, for example, no
fewer than 205 seconds in a recent test. It is of course possible that
the writer stall is not unrelated to this unusually long CPU-hotplug
operation, and so this commit adds the rcutorture writer task's CPU to
the stall message to gain more information about this possible connection.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Strings used in event tracing need to be specially handled, for example,
being copied to the trace buffer instead of being pointed to by the trace
buffer. Although the TPS() macro can be used to "launder" pointed-to
strings, this might not be all that effective within a loadable module.
This commit therefore copies rcutorture's strings to the trace buffer.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Now that it is legal to invoke srcu_read_lock() and srcu_read_unlock()
for a given srcu_struct from both process context and {soft,}irq
handlers, it is time to test it. This commit therefore enables
testing of SRCU readers from rcutorture's timer handler, using in_task()
to determine whether or not it is safe to sleep in the SRCU read-side
critical sections.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The synchronize_rcu_tasks() and call_rcu_tasks() APIs are now available
regardless of kernel configuration, so this commit removes the
CONFIG_TASKS_RCU ifdef from rcuperf.c.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds printing of SRCU lock/unlock totals, which are just
the sums of the per-CPU counts. Saves a bit of mental arithmetic.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving
the SRCU status printing to the SRCU implementations.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The function process_srcu() is not invoked outside of srcutree.c, so
this commit makes it static and drops the EXPORT_SYMBOL_GPL().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Other than lockdep support, Tiny RCU has no need for the
scheduler status. However, Tiny SRCU will need this to control
boot-time behavior independent of lockdep. Therefore, this commit
moves rcu_scheduler_starting() from kernel/rcu/tiny_plugin.h to
kernel/rcu/srcutiny.c. This in turn allows the complete removal of
kernel/rcu/tiny_plugin.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Define a common prefix ("PM:") for messages printed by the
code in kernel/power/suspend.c.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Some messages in suspend.c currently print state names from
pm_states[], but that may be confusing if the mem_sleep sysfs
attribute is changed to anything different from "mem", because
in those cases the messages will say either "freeze" or "standby"
after writing "mem" to /sys/power/state.
To avoid the confusion, use mem_sleep_labels[] strings in those
messages instead.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
The pm_test sysfs attribute is under CONFIG_PM_DEBUG, but it doesn't
make sense to provide it if CONFIG_PM_SLEEP is unset, so put it under
CONFIG_PM_SLEEP_DEBUG instead.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Restore the pm_wakeup_pending() check in __device_suspend_noirq()
removed by commit eed4d47efe (ACPI / sleep: Ignore spurious SCI
wakeups from suspend-to-idle) as that allows the function to return
earlier if there's a wakeup event pending already (so that it may
spend less time on carrying out operations that will be reversed
shortly anyway) and rework the main suspend-to-idle loop to take
that optimization into account.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As a preparation for subsequent changes, rearrange the core
suspend-to-idle code by moving the initial invocation of
dpm_suspend_noirq() into s2idle_loop().
This also causes debug messages from that code to appear in
a less confusing order.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>