The rcu_state structure's ->gp_start field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
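The pairing described above can be sketched in user space as follows. This is only an illustration: the two macros are simplified volatile-access stand-ins (an assumption, the kernel's definitions do more), and struct demo_state and the demo_*() helpers are hypothetical names that do not appear in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GNU C __typeof__ assumed). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical structure with one field that is read locklessly. */
struct demo_state {
	unsigned long gp_start;
};

/* Update side: marked so that the store pairs with lockless readers. */
static void demo_record_gp_start(struct demo_state *s, unsigned long now)
{
	assert(s != NULL);
	WRITE_ONCE(s->gp_start, now);
}

/* Lockless read side: the load is marked to match the update above. */
static unsigned long demo_gp_age(struct demo_state *s, unsigned long now)
{
	assert(s != NULL);
	return now - READ_ONCE(s->gp_start);
}
```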
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_segcblist_insert_pend_cbs() function currently (partially)
initializes the rcu_cblist that it pulls callbacks from. However, all
the resulting stores are dead because all callers pass in the address of
an on-stack cblist that is not used afterwards. This commit therefore
removes this pointless initialization.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The timer_pending() function is mostly used in lockless contexts, so
without proper annotations, KCSAN might detect a data-race [1].
Using hlist_unhashed_lockless() instead of hand-coding it seems
appropriate (as suggested by Paul E. McKenney).
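As a rough user-space sketch of the idea: a lockless "is this node hashed?" check reads ->pprev exactly once through a marked load, and the pending test is its negation. The demo_* types and the simplified READ_ONCE() below are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro (GNU C __typeof__ assumed). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical model of an hlist node. */
struct demo_hlist_node {
	struct demo_hlist_node *next, **pprev;
};

/* Hypothetical model of a timer that is queued through an hlist node. */
struct demo_timer {
	struct demo_hlist_node entry;
};

/* Lockless check: a node with a NULL ->pprev is not on any list. */
static int demo_hlist_unhashed_lockless(const struct demo_hlist_node *h)
{
	assert(h != NULL);
	return !READ_ONCE(h->pprev);
}

/* A timer is pending exactly when its entry is hashed. */
static int demo_timer_pending(const struct demo_timer *t)
{
	assert(t != NULL);
	return !demo_hlist_unhashed_lockless(&t->entry);
}
```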
[1]
BUG: KCSAN: data-race in del_timer / detach_if_pending
write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
__hlist_del include/linux/list.h:764 [inline]
detach_timer kernel/time/timer.c:815 [inline]
detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
del_timer+0x3b/0xb0 kernel/time/timer.c:1198
sk_stop_timer+0x25/0x60 net/core/sock.c:2845
inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
inet_release+0x86/0x100 net/ipv4/af_inet.c:427
__sock_release+0x85/0x160 net/socket.c:590
sock_close+0x24/0x30 net/socket.c:1268
__fput+0x1e1/0x520 fs/file_table.c:280
____fput+0x1f/0x30 fs/file_table.c:313
task_work_run+0xf6/0x130 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Pulled in Eric's later amendments. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_node structure's ->boost_kthread_status field is accessed
locklessly, so this commit causes all updates to use WRITE_ONCE() and
all reads to use READ_ONCE().
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_data structure's ->rcu_forced_tick field is read locklessly, so
this commit adds WRITE_ONCE() to all updates and READ_ONCE() to all
lockless reads.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_data structure's ->gpwrap field is read locklessly, and so
this commit adds the required READ_ONCE() to a pair of loads in order
to avoid destructive compiler optimizations.
This data race was reported by KCSAN.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Convert to plural and add a note that this is for Tree RCU.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The various RCU structures' ->gp_seq, ->gp_seq_needed, ->gp_req_activity,
and ->gp_activity fields are read locklessly, so they must be updated with
WRITE_ONCE() and, when read locklessly, with READ_ONCE(). This commit makes
these changes.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_segcblist structure's ->tails[] array entries are read
locklessly, so this commit adds the READ_ONCE() to a load in order to
avoid destructive compiler optimizations.
This data race was reported by KCSAN.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rt_mutex structure's ->owner field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
The rcu_state structure's ->qsmaskinitnext field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely for systems not doing incessant CPU-hotplug
operations.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_state structure's ->gp_req_activity field is read locklessly,
so this commit adds the WRITE_ONCE() to an update in order to provide
proper documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_node structure's ->gp_seq field is read locklessly, so this
commit adds the READ_ONCE() to several loads in order to avoid
destructive compiler optimizations.
This data race was reported by KCSAN. Not appropriate for backporting
because this affects only tracing and warnings.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_node structure's ->exp_seq_rq field is read locklessly, so
this commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_node structure's ->qsmask field is read locklessly, so this
commit adds the WRITE_ONCE() to an update in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adds "-g -fno-omit-frame-pointer" to ease interpretation
of KCSAN output, but only for CONFIG_KCSAN=y kernels.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcu_node structure's ->exp_seq_rq field is accessed locklessly, so
updates must use WRITE_ONCE(). This commit therefore adds the needed
WRITE_ONCE() invocation where it was missed.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The for_each_leaf_node_cpu_mask() and for_each_leaf_node_possible_cpu()
macros must be invoked only on leaf rcu_node structures. Failing to
abide by this restriction can result in infinite loops on systems with
more than 64 CPUs (or for more than 32 CPUs on 32-bit systems). This
commit therefore adds WARN_ON_ONCE() calls to make misuse of these two
macros easier to debug.
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Clear trace_state data structure when starting trace
in __synth_event_trace_start() internal function.
Currently trace_state is initialized only in the
synth_event_trace_start() API, but the trace_state
in synth_event_trace() and synth_event_trace_array()
are on the stack without initialization.
This means those APIs will see wrong parameters and
will skip closing process in __synth_event_trace_end()
because trace_state->disabled may be !0.
Link: http://lkml.kernel.org/r/158193315899.8868.1781259176894639952.stgit@devnote2
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The tracing selftests check various aspects of the tracing infrastructure,
and one is filtering. If trace_printk() is active during a self test, it can
cause the filtering to fail, which will disable that part of the trace.
To keep the selftests from failing because of trace_printk() calls,
trace_printk() checks the variable tracing_selftest_running, and if set, it
does not write to the tracing buffer.
As some tracers were registered earlier in boot, the selftest they triggered
would fail because not all the infrastructure was set up for the full
selftest. Thus, some of the tests were postponed to when their
infrastructure was ready (namely file system code). The postpone code did
not set the tracing_selftest_running variable, and could fail if a
trace_printk() was added and executed during their run.
Cc: stable@vger.kernel.org
Fixes: 9afecfbb95 ("tracing: Postpone tracer start-up tests till the system is more robust")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The test code that tests synthetic event creation pushes in as one of its
test fields the current CPU using "smp_processor_id()". As this is just
something to see if the value is correctly passed in, and the actual CPU
used does not matter, use raw_smp_processor_id(), otherwise with debug
preemption enabled, a warning happens as the smp_processor_id() is called
without preemption enabled.
Link: http://lkml.kernel.org/r/20200220162950.35162579@gandalf.local.home
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix a varargs-related bug in print_synth_event() which resulted in
strange output and oopses on 32-bit x86 systems. The problem is that
trace_seq_printf() expects the varargs to match the format string, but
print_synth_event() was always passing u64 values regardless. This
results in unspecified behavior when unpacking with va_arg() in
trace_seq_printf().
Add a function that takes the size into account when calling
trace_seq_printf().
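A minimal user-space sketch of the size-aware approach, assuming each field is stored as a u64 alongside its declared size. The demo_print_field() helper is hypothetical and uses snprintf() in place of trace_seq_printf(); the point is only that the argument is cast to the type the format string expects.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print "name=value" where the value is held in a u64 but was declared
 * with the given size in bytes. The cast makes the vararg match the
 * conversion specifier instead of always passing a 64-bit value.
 */
static int demo_print_field(char *buf, size_t len, const char *name,
			    size_t size, uint64_t val)
{
	assert(buf != NULL && name != NULL);

	switch (size) {
	case 1:
		return snprintf(buf, len, "%s=%u", name,
				(unsigned int)(uint8_t)val);
	case 2:
		return snprintf(buf, len, "%s=%u", name,
				(unsigned int)(uint16_t)val);
	case 4:
		return snprintf(buf, len, "%s=%" PRIu32, name, (uint32_t)val);
	default:
		return snprintf(buf, len, "%s=%" PRIu64, name, val);
	}
}
```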
Before:
modprobe-1731 [003] .... 919.039758: gen_synth_test: next_pid_field=777(null)next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=3(null)my_string_field=thneed my_int_field=598(null)
After:
insmod-1136 [001] .... 36.634590: gen_synth_test: next_pid_field=777 next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=1 my_string_field=thneed my_int_field=598
Link: http://lkml.kernel.org/r/a9b59eb515dbbd7d4abe53b347dccf7a8e285657.1581720155.git.zanussi@kernel.org
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
synth_event_trace() is the varargs version of synth_event_trace_array(),
which takes an array of u64, as do synth_event_add_val() et al.
To not only be consistent with those, but also to address the fact
that synth_event_trace() expects every arg to be of the same type
since it doesn't also pass in e.g. a format string, the caller needs
to make sure all args are of the same type, u64. u64 is used because
it needs to accommodate the largest type available in synthetic events,
which is u64.
This fixes the bug reported by the kernel test robot/Rong Chen.
Link: https://lore.kernel.org/lkml/20200212113444.GS12867@shao2-debian/
Link: http://lkml.kernel.org/r/894c4e955558b521210ee0642ba194a9e603354c.1581720155.git.zanussi@kernel.org
Fixes: 9fe41efaca ("tracing: Add synth event generation test module")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
SD_BALANCE_WAKE was previously added to lower sched_domain levels on
asymmetric CPU capacity systems by commit:
9ee1cda5ee ("sched/core: Enable SD_BALANCE_WAKE for asymmetric capacity systems")
to enable the use of find_idlest_cpu() and friends to find an appropriate
CPU for tasks.
That responsibility has now been shifted to select_idle_sibling() and
friends, and hence the flag can be removed. Note that this causes
asymmetric CPU capacity systems to no longer enter the slow wakeup path
(find_idlest_cpu()) on wakeups - only on execs and forks (which is aligned
with all other mainline topologies).
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[Changelog tweaks]
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20200206191957.12325-3-valentin.schneider@arm.com
Issue
=====
On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
drive select_task_rq_fair() towards either:
- its slow-path (find_idlest_cpu()) if either the previous or
current (waking) CPU has too little capacity for the waking task
- its fast-path (select_idle_sibling()) otherwise
Commit:
3273163c67 ("sched/fair: Let asymmetric CPU configurations balance at wake-up")
points out that this relies on the assumption that "[...]the CPU capacities
within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are homogeneous".
This assumption no longer holds on newer generations of big.LITTLE
systems (DynamIQ), which can accommodate CPUs of different compute capacity
within a single LLC domain. To hopefully paint a better picture, a regular
big.LITTLE topology would look like this:
  +---------+ +---------+
  |   L2    | |   L2    |
  +----+----+ +----+----+
  |CPU0|CPU1| |CPU2|CPU3|
  +----+----+ +----+----+
      ^^^         ^^^
    LITTLEs      bigs
which would result in the following scheduler topology:
  DIE [         ] <- sd_asym_cpucapacity
  MC  [   ] [   ] <- sd_llc
       0 1   2 3
Conversely, a DynamIQ topology could look like:
  +-------------------+
  |        L3         |
  +----+----+----+----+
  | L2 | L2 | L2 | L2 |
  +----+----+----+----+
  |CPU0|CPU1|CPU2|CPU3|
  +----+----+----+----+
     ^^^^^     ^^^^^
    LITTLEs    bigs
which would result in the following scheduler topology:
  MC [       ] <- sd_llc, sd_asym_cpucapacity
      0 1 2 3
What this means is that, on DynamIQ systems, we could pass the wake_cap()
test (IOW presume the waking task fits on the CPU capacities of some LLC
domain), thus go through select_idle_sibling().
This function operates on an LLC domain, which here spans both bigs and
LITTLEs, so it could very well pick a CPU of too small capacity for the
task, despite there being fitting idle CPUs - it very much depends on the
CPU iteration order, on which we have absolutely no guarantees
capacity-wise.
Implementation
==============
Introduce yet another select_idle_sibling() helper function that takes CPU
capacity into account. The policy is to pick the first idle CPU which is
big enough for the task (task_util * margin < cpu_capacity). If no
idle CPU is big enough, we pick the idle one with the highest capacity.
Unlike other select_idle_sibling() helpers, this one operates on the
sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
known CPU capacities in the system. As such, this will work for both
"legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
Note that this limits the scope of select_idle_sibling() to
select_idle_capacity() for asymmetric CPU capacity systems - the LLC domain
will not be scanned, and no further heuristic will be applied.
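The selection policy described above can be sketched as a stand-alone function. This is not the scheduler code: struct demo_cpu, demo_fits() and demo_select_idle_capacity() are hypothetical, and the 1280/1024 margin (about 25% headroom) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view: capacity plus whether the CPU is idle. */
struct demo_cpu {
	unsigned long capacity;
	bool idle;
};

/* task_util * margin < cpu_capacity, with an assumed margin of 1280/1024. */
static bool demo_fits(unsigned long task_util, unsigned long capacity)
{
	return task_util * 1280 < capacity * 1024;
}

/*
 * Return the first idle CPU that is big enough for the task. If none
 * fits, return the idle CPU with the highest capacity, or -1 if no CPU
 * is idle at all.
 */
static int demo_select_idle_capacity(const struct demo_cpu *cpus, int n,
				     unsigned long task_util)
{
	unsigned long best_cap = 0;
	int best = -1;

	assert(cpus != NULL || n == 0);

	for (int i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (demo_fits(task_util, cpus[i].capacity))
			return i;
		if (cpus[i].capacity > best_cap) {
			best_cap = cpus[i].capacity;
			best = i;
		}
	}
	return best;
}
```

The result depends only on capacity and idleness, so the same loop would serve whether the CPUs are split across MC domains or share one.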
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lkml.kernel.org/r/20200206191957.12325-2-valentin.schneider@arm.com
Alexei Starovoitov says:
====================
pull-request: bpf 2020-02-19
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).
The main changes are:
1) batched bpf hashtab fixes from Brian and Yonghong.
2) various selftests and libbpf fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Grabbing the spinlock for every bucket, even if it's empty, was causing
a significant performance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking bucket_cnt
first: only if the bucket has some entries do we grab the spinlock and
proceed with the batching.
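The check-before-lock idea can be modeled roughly as below. The sketch is single-threaded and hypothetical: struct demo_bucket and demo_dump() are illustrative names, the lock is modeled by a counter, and the handling needed when a bucket changes between the check and the lock is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket: entry count plus a counter modeling lock usage. */
struct demo_bucket {
	int count;	/* number of entries in this bucket */
	int lock_taken;	/* how many times the modeled lock was acquired */
};

/*
 * Walk all buckets and return the total number of entries "dumped".
 * Empty buckets are skipped without touching the lock at all.
 */
static int demo_dump(struct demo_bucket *b, size_t n)
{
	int total = 0;

	assert(b != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (b[i].count == 0)
			continue;	/* empty: no lock needed */
		b[i].lock_taken++;	/* stands in for taking the spinlock */
		total += b[i].count;
		/* the matching unlock would go here */
	}
	return total;
}
```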
Tested with a htab of size 50K and different numbers of populated entries.
Before:
Benchmark                   Time(ns)        CPU(ns)
---------------------------------------------------
BM_DumpHashMap/1             2759655        2752033
BM_DumpHashMap/10            2933722        2930825
BM_DumpHashMap/200           3171680        3170265
BM_DumpHashMap/500           3639607        3635511
BM_DumpHashMap/1000          4369008        4364981
BM_DumpHashMap/5k           11171919       11134028
BM_DumpHashMap/20k          69150080       69033496
BM_DumpHashMap/39k         190501036      190226162
After:
Benchmark                   Time(ns)        CPU(ns)
---------------------------------------------------
BM_DumpHashMap/1              202707         200109
BM_DumpHashMap/10             213441         210569
BM_DumpHashMap/200            478641         472350
BM_DumpHashMap/500            980061         967102
BM_DumpHashMap/1000          1863835        1839575
BM_DumpHashMap/5k            8961836        8902540
BM_DumpHashMap/20k          69761497       69322756
BM_DumpHashMap/39k         187437830      186551111
Fixes: 057996380a ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
s390 math emulation was removed with commit 5a79859ae0 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.
This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.
Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull dma-mapping fixes from Christoph Hellwig:
- give command line cma= precedence over the CONFIG_ option (Nicolas
Saenz Julienne)
- always allow 32-bit DMA, even for weirdly placed ZONE_DMA
- improve the debug printks when memory is not addressable, to help
find problems with swiotlb initialization
* tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: improve DMA mask overflow reporting
dma-direct: improve swiotlb error reporting
dma-direct: relax addressability checks in dma_direct_supported
dma-contiguous: CMA: give precedence to cmdline
The CON_CONSDEV flag was historically used to put/keep the preferred console
first in the console_drivers list, where the preferred console is the last
one on the command line.
The ordering is important only when opening /dev/console:
  + tty_kopen()
    + tty_lookup_driver()
      + console_device()
The flag was originally an implementation detail. But it was later
made accessible from userspace via /proc/consoles. It was used,
for example, by the tool "showconsole" to show the real tty
accessible via /dev/console, see
https://github.com/bitstreamout/showconsole
Now, the current code sets CON_CONSDEV only for the preferred
console or when a fallback console is added. The flag is not
set when the preferred console is defined on the command line
but is not registered for some reason.
A simple solution is to set the CON_CONSDEV flag for the first
registered console. It will work most of the time because:
+ Most real consoles have console->device defined.
+ Boot consoles are removed in printk_late_init().
+ unregister_console() moves CON_CONSDEV flag to the next
console.
A clean solution would require checking con->device when the
preferred console is registered and in unregister_console().
Conclusion:
Use the simple solution for now. It is better than the current
state and good enough.
The clean solution is not worth it. It would complicate the already
complicated code without too much gain. Instead the code would deserve
a complete rewrite.
Link: https://lore.kernel.org/r/20200213095133.23176-4-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[pmladek@suse.com: Correct reasoning in the commit message, comment update.]
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
In the following circumstances, the rule of selecting the console
corresponding to the last "console=" entry on the command line as
the preferred console (CON_CONSDEV, ie, /dev/console) fails. This
is a specific example, but it could happen with different consoles
that have a similar name aliasing mechanism.
- The kernel command line has both console=tty0 and console=ttyS0
in that order (the latter with speed etc... arguments).
This is common with some cloud setups such as Amazon Linux.
- add_preferred_console is called early to register "uart0". In
our case that happens from acpi_parse_spcr() on arm64 since the
"enable_console" argument is true on that architecture. This causes
"uart0" to become entry 0 of the console_cmdline array.
Now, because of the above, what happens is:
- add_preferred_console is called by the cmdline parsing for tty0
and ttyS0 respectively, thus occupying entries 1 and 2 of the
console_cmdline array (since this happens after ACPI SPCR parsing).
At that point preferred_console is set to 2 as expected.
- When the tty layer kicks in, it will call register_console for tty0.
This will match entry 1 in console_cmdline array. It isn't our
preferred console but because it's our only console at this point,
it will end up "first" in the consoles list.
- When 8250 probes the actual serial port later on, it calls
register_console for ttyS0. At that point the loop in register_console
tries to match it with the entries in the console_cmdline array.
Ideally this should match ttyS0 in entry 2, which is preferred, causing
it to be inserted first and to replace tty0 as CONSDEV. However, 8250
provides a "match" hook in its struct console, and that hook will match
"uart" as an alias to "ttyS". So we match uart0 at entry 0 in the array
which is not the preferred console and will not match entry 2 which is
since we break out of the loop on the first match. As a result,
we don't set CONSDEV and don't insert it first, but second in
the console list.
As a result, we end up with tty0 remaining first in the array, and thus
/dev/console going there instead of the last user specified one which
is ttyS0.
This tentative fix changes register_console() to scan first for consoles
specified on the command line, and only if none is found, to then
scan for consoles specified by the architecture.
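The two-pass scan can be illustrated with a small user-space model of the scenario above. Everything here is a sketch under assumptions: the demo_* names and the user_specified flag are hypothetical, and demo_matches() merely imitates a match hook that treats "uart" as an alias for "ttyS".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one console_cmdline entry. */
struct demo_cmdline_entry {
	const char *name;
	bool user_specified;	/* true if it came from the command line */
};

/* Imitates a match hook where "uart" is accepted as an alias of "ttyS". */
static bool demo_matches(const char *con, const char *entry)
{
	if (strcmp(con, entry) == 0)
		return true;
	return strcmp(con, "ttyS") == 0 && strcmp(entry, "uart") == 0;
}

/* One pass over the entries of a single origin. */
static int demo_scan(const struct demo_cmdline_entry *e, int n,
		     const char *con, bool user_specified)
{
	for (int i = 0; i < n; i++) {
		if (e[i].user_specified != user_specified)
			continue;
		if (demo_matches(con, e[i].name))
			return i;
	}
	return -1;
}

/* Command-line entries first; platform-provided ones only as a fallback. */
static int demo_find_entry(const struct demo_cmdline_entry *e, int n,
			   const char *con)
{
	int i;

	assert(e != NULL && con != NULL);

	i = demo_scan(e, n, con, true);
	return i >= 0 ? i : demo_scan(e, n, con, false);
}
```

With entries "uart" (platform), "tty" and "ttyS" (command line) in that order, a single first-match loop would stop at index 0 for "ttyS", whereas the two-pass version returns index 2.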
Link: https://lore.kernel.org/r/20200213095133.23176-3-pmladek@suse.com
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
        int stuff;
        struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
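For illustration, a typical allocation pattern with a flexible array member looks like the sketch below. The struct and helper names are hypothetical; the size computation uses sizeof on the structure and on one array element, never on the flexible array itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type. */
struct demo_item {
	int value;
};

/* Hypothetical container; the flexible array member must come last. */
struct demo_foo {
	int count;
	struct demo_item array[];
};

/* Allocate a zeroed demo_foo with room for n trailing elements. */
static struct demo_foo *demo_foo_alloc(int n)
{
	struct demo_foo *f;

	assert(n >= 0);

	f = calloc(1, sizeof(*f) + (size_t)n * sizeof(f->array[0]));
	if (f)
		f->count = n;
	return f;
}
```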
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
head is traversed using hlist_for_each_entry_rcu outside an RCU read-side
critical section but under the protection of hash_lock.
Hence, add corresponding lockdep expression to silence false-positive
lockdep warnings, and harden RCU lists.
[ tglx: Removed the macro and put the condition right where it's used ]
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200216074330.GA14025@workstation-portable
When working on commit 6dcd5d7a7a, a mistake was noticed by Linus:
schedule_timeout() was called without setting the task state to anything
particular.
It calls the scheduler, but doesn't delay anything, because the task stays
runnable. That happens because sched_submit_work() does nothing for tasks
in TASK_RUNNING state.
That turned out to be the intended behavior. Adding a WARN() is not useful
as the task could be woken up right after setting the state and before
reaching schedule_timeout().
Improve the comment about schedule_timeout() and describe that more
explicitly.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200117225900.16340-1-alex.popov@linux.com
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. Erroneously doing a bitwise
AND instead of a logical AND on the user- and kernel-space buffer sizes can
lead to no data being copied to user-space, especially when the user-space
size is a power of two and bigger than the kernel-space buffer.
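A small stand-alone illustration of the difference, using hypothetical helper names and made-up sizes (12 bytes available in the kernel, 16 requested by user space):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy guard: bitwise AND of the two lengths. */
static bool demo_want_copy_buggy(uint32_t kernel_len, uint32_t user_len)
{
	return (kernel_len & user_len) != 0;
}

/* Fixed guard: both lengths must be nonzero. */
static bool demo_want_copy_fixed(uint32_t kernel_len, uint32_t user_len)
{
	return kernel_len && user_len;
}

/* 12 is 0b01100 and 16 is 0b10000, so the two share no set bits. */
static void demo_selfcheck(void)
{
	assert(!demo_want_copy_buggy(12, 16));	/* copy wrongly skipped */
	assert(demo_want_copy_fixed(12, 16));	/* copy happens */
	assert(!demo_want_copy_fixed(0, 16));	/* nothing to copy */
}
```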
Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net