Commit Graph

32375 Commits

Author SHA1 Message Date
Brian Vazquez
492e0d0d6f bpf: Do not grab the bucket spinlock by default on htab batch ops
Grabbing the spinlock for every bucket even if it's empty, was causing
significant perfomance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking first the
bucket_cnt, if the bucket has some entries then we go and grab the
spinlock and proceed with the batching.

Tested with a htab of size 50K and different value of populated entries.

Before:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1       2759655        2752033
  BM_DumpHashMap/10      2933722        2930825
  BM_DumpHashMap/200     3171680        3170265
  BM_DumpHashMap/500     3639607        3635511
  BM_DumpHashMap/1000    4369008        4364981
  BM_DumpHashMap/5k     11171919       11134028
  BM_DumpHashMap/20k    69150080       69033496
  BM_DumpHashMap/39k   190501036      190226162

After:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1        202707         200109
  BM_DumpHashMap/10       213441         210569
  BM_DumpHashMap/200      478641         472350
  BM_DumpHashMap/500      980061         967102
  BM_DumpHashMap/1000    1863835        1839575
  BM_DumpHashMap/5k      8961836        8902540
  BM_DumpHashMap/20k    69761497       69322756
  BM_DumpHashMap/39k   187437830      186551111

Fixes: 057996380a ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
2020-02-19 15:59:30 -08:00
Stephen Kitt
140588bfed s390: remove obsolete ieee_emulation_warnings
s390 math emulation was removed with commit 5a79859ae0 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.

This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.

Link: https://lkml.kernel.org/r/20200214172628.3598516-1-steve@sk2.org
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-19 13:51:46 +01:00
Linus Torvalds
0a44cac810 Merge tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:

 - give command line cma= precedence over the CONFIG_ option (Nicolas
   Saenz Julienne)

 - always allow 32-bit DMA, even for weirdly placed ZONE_DMA

 - improve the debug printks when memory is not addressable, to help
   find problems with swiotlb initialization

* tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: improve DMA mask overflow reporting
  dma-direct: improve swiotlb error reporting
  dma-direct: relax addressability checks in dma_direct_supported
  dma-contiguous: CMA: give precedence to cmdline
2020-02-18 15:06:38 -08:00
Johannes Krude
e20d3a055a bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill
This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. By erroneously doing a bitwise
AND instead of a logical AND on user- and kernel-space buffer-size can lead
to no data being copied to user-space especially when user-space size is a
power of two and bigger then the kernel-space buffer.

Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
2020-02-17 16:53:49 +01:00
Linus Torvalds
ef78e5b7de Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes all over the place:

   - Fix NUMA over-balancing between lightly loaded nodes. This is
     fallout of the big load-balancer rewrite.

   - Fix the NOHZ remote loadavg update logic, which fixes anomalies
     like reported 150 loadavg on mostly idle CPUs.

   - Fix XFS performance/scalability

   - Fix throttled groups unbound task-execution bug

   - Fix PSI procfs boundary condition

   - Fix the cpu.uclamp.{min,max} cgroup configuration write checks

   - Fix DocBook annotations

   - Fix RCU annotations

   - Fix overly CPU-intensive housekeeper CPU logic loop on large CPU
     counts"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix kernel-doc warning in attach_entity_load_avg()
  sched/core: Annotate curr pointer in rq with __rcu
  sched/psi: Fix OOB write when writing 0 bytes to PSI files
  sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression
  sched/fair: Prevent unlimited runtime on throttled group
  sched/nohz: Optimize get_nohz_timer_target()
  sched/uclamp: Reject negative values in cpu_uclamp_write()
  sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domains
  timers/nohz: Update NOHZ load in remote tick
  sched/core: Don't skip remote tick for idle CPUs
2020-02-15 12:51:22 -08:00
Linus Torvalds
4e03e4e6d2 Merge tag 'pm-5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "Fix three issues related to the handling of wakeup events signaled
  through the ACPI SCI while suspended to idle (Rafael Wysocki) and
  unexport an internal cpufreq variable (Yangtao Li)"

* tag 'pm-5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
  ACPICA: Introduce acpi_any_gpe_status_set()
  ACPI: PM: s2idle: Avoid possible race related to the EC GPE
  ACPI: EC: Fix flushing of pending work
  cpufreq: Make cpufreq_global_kobject static
2020-02-14 12:34:30 -08:00
Thomas Gleixner
cba6437a18 genirq/proc: Reject invalid affinity masks (again)
Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.

It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.

If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in

  eee45269b0 ("[PATCH] Alpha: convert to generic irq framework (generic part)")

and subsequently made available for all architectures in

  1840475676 ("genirq: Expose default irq affinity mask (take 3)")

which introduced the circumvention of the affinity setting restrictions for
interrupt which cannot be moved in process context.

The whole exercise is bogus in various aspects:

  1) If the interrupt is already started up then there is absolutely
     no point to honour a bogus interrupt affinity setting from user
     space. The interrupt is already assigned to an online CPU and it
     does not make any sense to reassign it to some other randomly
     chosen online CPU.

  2) If the interupt is not yet started up then there is no point
     either. A subsequent startup of the interrupt will invoke
     irq_setup_affinity() anyway which will chose a valid target CPU.

So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has it's own magic sauce for this.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de
2020-02-14 09:43:17 +01:00
Michal Koutný
9c974c7724 cgroup: Iterate tasks that did not finish do_exit()
PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.

Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.

Reported-by: Suren Baghdasaryan <surenb@google.com>
Fixes: c03cd7738a ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:02:53 -05:00
Vasily Averin
2d4ecb030d cgroup: cgroup_procs_next should increase position index
If seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output:

1) dd bs=1 skip output of each 2nd elements
$ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
2
3
4
5
1+0 records in
1+0 records out
8 bytes copied, 0,000267297 s, 29,9 kB/s
[test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
2
4 <<< NB! 3 was skipped
6 <<<    ... and 5 too
8 <<<    ... and 7
8+0 records in
8+0 records out
8 bytes copied, 5,2123e-05 s, 153 kB/s

 This happen because __cgroup_procs_start() makes an extra
 extra cgroup_procs_next() call

2) read after lseek beyond end of file generates whole last line.
3) read after lseek into middle of last line generates
expected rest of last line and unexpected whole line once again.

Additionally patch removes an extra position index changes in
__cgroup_procs_start()

Cc: stable@vger.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:00:04 -05:00
Vasily Averin
db8dd96972 cgroup-v1: cgroup_pidlist_next should update position index
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

 # mount | grep cgroup
 # dd if=/mnt/cgroup.procs bs=1  # normal output
...
1294
1295
1296
1304
1382
584+0 records in
584+0 records out
584 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
83  <<< generates end of last line
1383  <<< ... and whole last line once again
0+1 records in
0+1 records out
8 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
1386  <<< generates last line anyway
0+1 records in
0+1 records out
5 bytes copied

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 16:53:35 -05:00
Linus Torvalds
61a7595403 Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Various fixes:

   - Fix an uninitialized variable

   - Fix compile bug to bootconfig userspace tool (in tools directory)

   - Suppress some error messages of bootconfig userspace tool

   - Remove unneded CONFIG_LIBXBC from bootconfig

   - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
     taking up static memory at boot up

   - Use of parse_args() to parse bootconfig instead of strstr() usage
     Prevents issues of double quotes containing the interested string

   - Fix missing ring_buffer_nest_end() on synthetic event error path

   - Return zero not -EINVAL on soft disabled synthetic event (soft
     disabling must be the same as hard disabling, which returns zero)

   - Consolidate synthetic event code (remove duplicate code)"

* tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Consolidate trace() functions
  tracing: Don't return -EINVAL when tracing soft disabled synth events
  tracing: Add missing nest end to synth_event_trace_start() error case
  tools/bootconfig: Suppress non-error messages
  bootconfig: Allocate xbc_nodes array dynamically
  bootconfig: Use parse_args() to find bootconfig and '--'
  tracing/kprobe: Fix uninitialized variable bug
  bootconfig: Remove unneeded CONFIG_LIBXBC
  tools/bootconfig: Fix wrong __VA_ARGS__ usage
2020-02-11 16:39:18 -08:00
Randy Dunlap
e9f5490c35 sched/fair: Fix kernel-doc warning in attach_entity_load_avg()
Fix kernel-doc warning in kernel/sched/fair.c, caused by a recent
function parameter removal:

  ../kernel/sched/fair.c:3526: warning: Excess function parameter 'flags' description in 'attach_entity_load_avg'

Fixes: a4f9a0e51b ("sched/fair: Remove redundant call to cpufreq_update_util()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/cbe964e4-6879-fd08-41c9-ef1917414af4@infradead.org
2020-02-11 13:05:10 +01:00
Madhuparna Bhowmik
4104a562e0 sched/core: Annotate curr pointer in rq with __rcu
This patch fixes the following sparse warnings in sched/core.c
and sched/membarrier.c:

  kernel/sched/core.c:2372:27: error: incompatible types in comparison expression
  kernel/sched/core.c:4061:17: error: incompatible types in comparison expression
  kernel/sched/core.c:6067:9: error: incompatible types in comparison expression
  kernel/sched/membarrier.c:108:21: error: incompatible types in comparison expression
  kernel/sched/membarrier.c:177:21: error: incompatible types in comparison expression
  kernel/sched/membarrier.c:243:21: error: incompatible types in comparison expression

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200201125803.20245-1-madhuparnabhowmik10@gmail.com
2020-02-11 13:00:37 +01:00
Suren Baghdasaryan
6fcca0fa48 sched/psi: Fix OOB write when writing 0 bytes to PSI files
Issuing write() with count parameter set to 0 on any file under
/proc/pressure/ will cause an OOB write because of the access to
buf[buf_size-1] when NUL-termination is performed. Fix this by checking
for buf_size to be non-zero.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20200203212216.7076-1-surenb@google.com
2020-02-11 13:00:02 +01:00
Rafael J. Wysocki
e3728b50cd ACPI: PM: s2idle: Avoid possible race related to the EC GPE
It is theoretically possible for the ACPI EC GPE to be set after the
s2idle_ops->wake() called from s2idle_loop() has returned and before
the subsequent pm_wakeup_pending() check is carried out.  If that
happens, the resulting wakeup event will cause the system to resume
even though it may be a spurious one.

To avoid that race, first make the ->wake() callback in struct
platform_s2idle_ops return a bool value indicating whether or not
to let the system resume and rearrange s2idle_loop() to use that
value instad of the direct pm_wakeup_pending() call if ->wake() is
present.

Next, rework acpi_s2idle_wake() to process EC events and check
pm_wakeup_pending() before re-arming the SCI for system wakeup
to prevent it from triggering prematurely and add comments to
that function to explain the rationale for the new code flow.

Fixes: 56b9918490 ("PM: sleep: Simplify suspend-to-idle control flow")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-02-11 10:11:02 +01:00
Tom Zanussi
7276531d40 tracing: Consolidate trace() functions
Move the checking, buffer reserve and buffer commit code in
synth_event_trace_start/end() into inline functions
__synth_event_trace_start/end() so they can also be used by
synth_event_trace() and synth_event_trace_array(), and then have all
those functions use them.

Also, change synth_event_trace_state.enabled to disabled so it only
needs to be set if the event is disabled, which is not normally the
case.

Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 22:00:21 -05:00
Tom Zanussi
0c62f6cd9e tracing: Don't return -EINVAL when tracing soft disabled synth events
There's no reason to return -EINVAL when tracing a synthetic event if
it's soft disabled - treat it the same as if it were hard disabled and
return normally.

Have synth_event_trace() and synth_event_trace_array() just return
normally, and have synth_event_trace_start set the trace state to
disabled and return.

Link: http://lkml.kernel.org/r/df5d02a1625aff97c9866506c5bada6a069982ba.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad95 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 22:00:13 -05:00
Tom Zanussi
d090409abb tracing: Add missing nest end to synth_event_trace_start() error case
If the ring_buffer reserve in synth_event_trace_start() fails, the
matching ring_buffer_nest_end() should be called in the error code,
since nothing else will ever call it in this case.

Link: http://lkml.kernel.org/r/20abc444b3eeff76425f895815380abe7aa53ff8.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad95 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 21:58:19 -05:00
Linus Torvalds
0a679e13ea Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "I made a mistake while removing cgroup task list lazy init
  optimization making the root cgroup.procs show entries for the
  init_tasks. The zero entries doesn't cause critical failures but does
  make systemd print out warning messages during boot.

  Fix it by omitting init_tasks as they should be"

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: init_tasks shouldn't be linked to the root cgroup
2020-02-10 17:07:05 -08:00
Hongbo Yao
2bf0eb9b3b bpf: Make btf_check_func_type_match() static
Fix the following sparse warning:

kernel/bpf/btf.c:4131:5: warning: symbol 'btf_check_func_type_match' was
not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200210011441.147102-1-yaohongbo@huawei.com
2020-02-11 00:22:47 +01:00
Gustavo A. R. Silva
10f129cb59 tracing/kprobe: Fix uninitialized variable bug
There is a potential execution path in which variable *ret* is returned
without being properly initialized, previously.

Fix this by initializing variable *ret* to 0.

Link: http://lkml.kernel.org/r/20200205223404.GA3379@embeddedor

Addresses-Coverity-ID: 1491142 ("Uninitialized scalar variable")
Fixes: 2a588dd1d5 ("tracing: Add kprobe event command generation functions")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 12:07:42 -05:00
Mel Gorman
52262ee567 sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression
The following XFS commit:

  8ab39f11d9 ("xfs: prevent CIL push holdoff in log recovery")

changed the logic from using bound workqueues to using unbound
workqueues. Functionally this makes sense but it was observed at the
time that the dbench performance dropped quite a lot and CPU migrations
were increased.

The current pattern of the task migration is straight-forward. With XFS,
an IO issuer delegates work to xlog_cil_push_work ()on an unbound kworker.
This runs on a nearby CPU and on completion, dbench wakes up on its old CPU
as it is still idle and no migration occurs. dbench then queues the real
IO on the blk_mq_requeue_work() work item which runs on a bound kworker
which is forced to run on the same CPU as dbench. When IO completes,
the bound kworker wakes dbench but as the kworker is a bound but,
real task, the CPU is not considered idle and dbench gets migrated by
select_idle_sibling() to a new CPU. dbench may ping-pong between two CPUs
for a while but ultimately it starts a round-robin of all CPUs sharing
the same LLC. High-frequency migration on each IO completion has poor
performance overall. It has negative implications both in commication
costs and power management. mpstat confirmed that at low thread counts
that all CPUs sharing an LLC has low level of activity.

Note that even if the CIL patch was reverted, there still would
be migrations but the impact is less noticeable. It turns out that
individually the scheduler, XFS, blk-mq and workqueues all made sensible
decisions but in combination, the overall effect was sub-optimal.

This patch special cases the IO issue/completion pattern and allows
a bound kworker waker and a task wakee to stack on the same CPU if
there is a strong chance they are directly related. The expectation
is that the kworker is likely going back to sleep shortly. This is not
guaranteed as the IO could be queued asynchronously but there is a very
strong relationship between the task and kworker in this case that would
justify stacking on the same CPU instead of migrating. There should be
few concerns about kworker starvation given that the special casing is
only when the kworker is the waker.

DBench on XFS
MMTests config: io-dbench4-async modified to run on a fresh XFS filesystem

UMA machine with 8 cores sharing LLC
                          5.5.0-rc7              5.5.0-rc7
                  tipsched-20200124           kworkerstack
Amean     1        22.63 (   0.00%)       20.54 *   9.23%*
Amean     2        25.56 (   0.00%)       23.40 *   8.44%*
Amean     4        28.63 (   0.00%)       27.85 *   2.70%*
Amean     8        37.66 (   0.00%)       37.68 (  -0.05%)
Amean     64      469.47 (   0.00%)      468.26 (   0.26%)
Stddev    1         1.00 (   0.00%)        0.72 (  28.12%)
Stddev    2         1.62 (   0.00%)        1.97 ( -21.54%)
Stddev    4         2.53 (   0.00%)        3.58 ( -41.19%)
Stddev    8         5.30 (   0.00%)        5.20 (   1.92%)
Stddev    64       86.36 (   0.00%)       94.53 (  -9.46%)

NUMA machine, 48 CPUs total, 24 CPUs share cache
                           5.5.0-rc7              5.5.0-rc7
                   tipsched-20200124      kworkerstack-v1r2
Amean     1         58.69 (   0.00%)       30.21 *  48.53%*
Amean     2         60.90 (   0.00%)       35.29 *  42.05%*
Amean     4         66.77 (   0.00%)       46.55 *  30.28%*
Amean     8         81.41 (   0.00%)       68.46 *  15.91%*
Amean     16       113.29 (   0.00%)      107.79 *   4.85%*
Amean     32       199.10 (   0.00%)      198.22 *   0.44%*
Amean     64       478.99 (   0.00%)      477.06 *   0.40%*
Amean     128     1345.26 (   0.00%)     1372.64 *  -2.04%*
Stddev    1          2.64 (   0.00%)        4.17 ( -58.08%)
Stddev    2          4.35 (   0.00%)        5.38 ( -23.73%)
Stddev    4          6.77 (   0.00%)        6.56 (   3.00%)
Stddev    8         11.61 (   0.00%)       10.91 (   6.04%)
Stddev    16        18.63 (   0.00%)       19.19 (  -3.01%)
Stddev    32        38.71 (   0.00%)       38.30 (   1.06%)
Stddev    64       100.28 (   0.00%)       91.24 (   9.02%)
Stddev    128      186.87 (   0.00%)      160.34 (  14.20%)

Dbench has been modified to report the time to complete a single "load
file". This is a more meaningful metric for dbench that a throughput
metric as the benchmark makes many different system calls that are not
throughput-related

Patch shows a 9.23% and 48.53% reduction in the time to process a load
file with the difference partially explained by the number of CPUs sharing
a LLC. In a separate run, task migrations were almost eliminated by the
patch for low client counts. In case people have issue with the metric
used for the benchmark, this is a comparison of the throughputs as
reported by dbench on the NUMA machine.

dbench4 Throughput (misleading but traditional)
                           5.5.0-rc7              5.5.0-rc7
                   tipsched-20200124      kworkerstack-v1r2
Hmean     1        321.41 (   0.00%)      617.82 *  92.22%*
Hmean     2        622.87 (   0.00%)     1066.80 *  71.27%*
Hmean     4       1134.56 (   0.00%)     1623.74 *  43.12%*
Hmean     8       1869.96 (   0.00%)     2212.67 *  18.33%*
Hmean     16      2673.11 (   0.00%)     2806.13 *   4.98%*
Hmean     32      3032.74 (   0.00%)     3039.54 (   0.22%)
Hmean     64      2514.25 (   0.00%)     2498.96 *  -0.61%*
Hmean     128     1778.49 (   0.00%)     1746.05 *  -1.82%*

Note that this is somewhat specific to XFS and ext4 shows no performance
difference as it does not rely on kworkers in the same way. No major
problem was observed running other workloads on different machines although
not all tests have completed yet.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200128154006.GD3466@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-10 11:24:37 +01:00
Linus Torvalds
89a47dd1af Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - fix randconfig to generate a sane .config

 - rename hostprogs-y / always to hostprogs / always-y, which are more
   natual syntax.

 - optimize scripts/kallsyms

 - fix yes2modconfig and mod2yesconfig

 - make multiple directory targets ('make foo/ bar/') work

* tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: make multiple directory targets work
  kconfig: Invalidate all symbols after changing to y or m.
  kallsyms: fix type of kallsyms_token_table[]
  scripts/kallsyms: change table to store (strcut sym_entry *)
  scripts/kallsyms: rename local variables in read_symbol()
  kbuild: rename hostprogs-y/always to hostprogs/always-y
  kbuild: fix the document to use extra-y for vmlinux.lds
  kconfig: fix broken dependency in randconfig-generated .config
2020-02-09 16:05:50 -08:00
Linus Torvalds
1a2a76c268 Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Ensure that the PIT is set up when the local APIC is disable or
     configured in legacy mode. This is caused by an ordering issue
     introduced in the recent changes which skip PIT initialization when
     the TSC and APIC frequencies are already known.

   - Handle malformed SRAT tables during early ACPI parsing which caused
     an infinite loop anda boot hang.

   - Fix a long standing race in the affinity setting code which affects
     PCI devices with non-maskable MSI interrupts. The problem is caused
     by the non-atomic writes of the MSI address (destination APIC id)
     and data (vector) fields which the device uses to construct the MSI
     message. The non-atomic writes are mandated by PCI.

     If both fields change and the device raises an interrupt after
     writing address and before writing data, then the MSI block
     constructs a inconsistent message which causes interrupts to be
     lost and subsequent malfunction of the device.

     The fix is to redirect the interrupt to the new vector on the
     current CPU first and then switch it over to the new target CPU.
     This allows to observe an eventually raised interrupt in the
     transitional stage (old CPU, new vector) to be observed in the APIC
     IRR and retriggered on the new target CPU and the new vector.

     The potential spurious interrupts caused by this are harmless and
     can in the worst case expose a buggy driver (all handlers have to
     be able to deal with spurious interrupts as they can and do happen
     for various reasons).

   - Add the missing suspend/resume mechanism for the HYPERV hypercall
     page which prevents resume hibernation on HYPERV guests. This
     change got lost before the merge window.

   - Mask the IOAPIC before disabling the local APIC to prevent
     potentially stale IOAPIC remote IRR bits which cause stale
     interrupt lines after resume"

* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Mask IOAPIC entries when disabling the local APIC
  x86/hyperv: Suspend/resume the hypercall page for hibernation
  x86/apic/msi: Plug non-maskable MSI affinity race
  x86/boot: Handle malformed SRAT tables during early ACPI parsing
  x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09 12:11:12 -08:00
Linus Torvalds
f41377609a Merge tag 'smp-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP fixes from Thomas Gleixner:
 "Two fixes for the SMP related functionality:

   - Make the UP version of smp_call_function_single() match SMP
     semantics when called for a not available CPU. Instead of emitting
     a warning and assuming that the function call target is CPU0,
     return a proper error code like the SMP version does.

   - Remove a superfluous check in smp_call_function_many_cond()"

* tag 'smp-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smp/up: Make smp_call_function_single() match SMP semantics
  smp: Remove superfluous cond_func check in smp_call_function_many_cond()
2020-02-09 12:09:43 -08:00
Linus Torvalds
ca21b9b370 Merge tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of fixes and improvements for the perf subsystem:

  Kernel fixes:

   - Install cgroup events to the correct CPU context to prevent a
     potential list double add

   - Prevent an integer underflow in the perf mlock accounting

   - Add a missing prototype for arch_perf_update_userpage()

  Tooling:

   - Add a missing unlock in the error path of maps__insert() in perf
     maps.

   - Fix the build with the latest libbfd

   - Fix the perf parser so it does not delete parse event terms, which
     caused a regression for using perf with the ARM CoreSight as the
     sink configuration was missing due to the deletion.

   - Fix the double free in the perf CPU map merging test case

   - Add the missing ustring support for the perf probe command"

* tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf maps: Add missing unlock to maps__insert() error case
  perf probe: Add ustring support for perf probe command
  perf: Make perf able to build with latest libbfd
  perf test: Fix test case Merge cpu map
  perf parse: Copy string to perf_evsel_config_term
  perf parse: Refactor 'struct perf_evsel_config_term'
  kernel/events: Add a missing prototype for arch_perf_update_userpage()
  perf/cgroups: Install cgroup events to correct cpuctx
  perf/core: Fix mlock accounting in perf_mmap()
2020-02-09 12:04:09 -08:00
Linus Torvalds
2fbc23c738 Merge tag 'timers-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Two small fixes for the time(r) subsystem:

   - Handle a subtle race between the clocksource watchdog and a
     concurrent clocksource watchdog stop/start sequence correctly to
     prevent a timer double add bug.

   - Fix the file path for the core time namespace file"

* tag 'timers-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Prevent double add_timer_on() for watchdog_timer
  MAINTAINERS: Correct path to time namespace source file
2020-02-09 12:00:12 -08:00
Linus Torvalds
f06bed87d7 Merge tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt fixes from Thomas Gleixner:
 "A set of fixes for the interrupt subsystem:

   - Provision only ACPI enabled redistributors on GICv3

   - Use the proper command colums when building the INVALL command for
     the GICv3-ITS

   - Ensure the allocation of the L2 vPE table for GICv4.1

   - Correct the GICv4.1 VPROBASER programming so it uses the proper
     size

   - A set of small GICv4.1 tidy up patches

   - Configuration cleanup for C-SKY interrupt chip

   - Clarify the function documentation for irq_set_wake() to document
     that the wakeup functionality is orthogonal to the irq
     disable/enable mechanism"

* tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors
  irqchip/gic-v3-its: Remove superfluous WARN_ON
  irqchip/gic-v4.1: Drop 'tmp' in inherit_vpe_l1_table_from_rd()
  irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level
  irqchip/gic-v4.1: Set vpe_l1_base for all redistributors
  irqchip/gic-v4.1: Fix programming of GICR_VPROPBASER_4_1_SIZE
  genirq: Clarify that irq wake state is orthogonal to enable/disable
  irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
  irqchip: Some Kconfig cleanup for C-SKY
  irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
2020-02-09 11:56:41 -08:00
Linus Torvalds
291abfea47 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Unbalanced locking in mwifiex_process_country_ie, from Brian Norris.

 2) Fix thermal zone registration in iwlwifi, from Andrei
    Otcheretianski.

 3) Fix double free_irq in sgi ioc3 eth, from Thomas Bogendoerfer.

 4) Use after free in mptcp, from Florian Westphal.

 5) Use after free in wireguard's root_remove_peer_lists, from Eric
    Dumazet.

 6) Properly access packets heads in bonding alb code, from Eric
    Dumazet.

 7) Fix data race in skb_queue_len(), from Qian Cai.

 8) Fix regression in r8169 on some chips, from Heiner Kallweit.

 9) Fix XDP program ref counting in hv_netvsc, from Haiyang Zhang.

10) Certain kinds of set link netlink operations can cause a NULL deref
    in the ipv6 addrconf code. Fix from Eric Dumazet.

11) Don't cancel uninitialized work queue in drop monitor, from Ido
    Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  net: thunderx: use proper interface type for RGMII
  mt76: mt7615: fix max_nss in mt7615_eeprom_parse_hw_cap
  bpf: Improve bucket_log calculation logic
  selftests/bpf: Test freeing sockmap/sockhash with a socket in it
  bpf, sockhash: Synchronize_rcu before free'ing map
  bpf, sockmap: Don't sleep while holding RCU lock on tear-down
  bpftool: Don't crash on missing xlated program instructions
  bpf, sockmap: Check update requirements after locking
  drop_monitor: Do not cancel uninitialized work item
  mlxsw: spectrum_dpipe: Add missing error path
  mlxsw: core: Add validation of hardware device types for MGPIR register
  mlxsw: spectrum_router: Clear offload indication from IPv6 nexthops on abort
  selftests: mlxsw: Add test cases for local table route replacement
  mlxsw: spectrum_router: Prevent incorrect replacement of local table routes
  net: dsa: microchip: enable module autoprobe
  ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()
  dpaa_eth: support all modes with rate adapting PHYs
  net: stmmac: update pci platform data to use phy_interface
  net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter
  net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter
  ...
2020-02-08 17:15:08 -08:00
Linus Torvalds
c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
David S. Miller
2696e1146d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-02-07

The following pull-request contains BPF updates for your *net* tree.

We've added 15 non-merge commits during the last 10 day(s) which contain
a total of 12 files changed, 114 insertions(+), 31 deletions(-).

The main changes are:

1) Various BPF sockmap fixes related to RCU handling in the map's tear-
   down code, from Jakub Sitnicki.

2) Fix macro state explosion in BPF sk_storage map when calculating its
   bucket_log on allocation, from Martin KaFai Lau.

3) Fix potential BPF sockmap update race by rechecking socket's established
   state under lock, from Lorenz Bauer.

4) Fix crash in bpftool on missing xlated instructions when kptr_restrict
   sysctl is set, from Toke Høiland-Jørgensen.

5) Fix i40e's XSK wakeup code to return proper error in busy state and
   various misc fixes in xdpsock BPF sample code, from Maciej Fijalkowski.

6) Fix the way modifiers are skipped in BTF in the verifier while walking
   pointers to avoid program rejection, from Alexei Starovoitov.

7) Fix Makefile for runqslower BPF tool to i) rebuild on libbpf changes and
   ii) to fix undefined reference linker errors for older gcc version due to
   order of passed gcc parameters, from Yulia Kartseva and Song Liu.

8) Fix a trampoline_count BPF kselftest warning about missing braces around
   initializer, from Andrii Nakryiko.

9) Fix up redundant "HAVE" prefix from large INSN limit kernel probe in
   bpftool, from Michal Rostecki.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-08 15:01:03 +01:00
Stephen Boyd
f9f21cea31 genirq: Clarify that irq wake state is orthogonal to enable/disable
There's some confusion around if an irq that's disabled with disable_irq()
can still wake the system from sleep states such as "suspend to RAM".

Clarify this in the kernel documentation for irq_set_irq_wake() so that
it's clear that an irq can be disabled and still wake the system if it has
been marked for wakeup.

Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lkml.kernel.org/r/20200206191521.94559-1-swboyd@chromium.org
2020-02-07 21:37:08 +01:00
Al Viro
58c025f0e8 cgroup1: switch to use of errorfc() et.al.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:43 -05:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Al Viro
fbc2d1686d get rid of cg_invalf()
pointless alias for invalf()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:31 -05:00
Paul E. McKenney
1e474b28e7 smp/up: Make smp_call_function_single() match SMP semantics
In CONFIG_SMP=y kernels, smp_call_function_single() returns -ENXIO when
invoked for a non-existent CPU.  In contrast, in CONFIG_SMP=n kernels,
a splat is emitted and smp_call_function_single() otherwise silently
ignores its "cpu" argument, instead pretending that the caller intended
to have something happen on CPU 0.  Given that there is now code that
expects smp_call_function_single() to return an error if a bad CPU was
specified, this difference in semantics needs to be addressed.

Bring the semantics of the CONFIG_SMP=n version of
smp_call_function_single() into alignment with its CONFIG_SMP=y
counterpart.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200205143409.GA7021@paulmck-ThinkPad-P72
2020-02-07 15:34:12 +01:00
Linus Torvalds
d854b2d639 Merge tag 'kgdb-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb fix from Daniel Thompson:
 "One of the simplifications added for 5.6-rc1 has caused build
  regressions on some platforms (it was reported for sparc64).

  This fixes it with a revert"

* tag 'kgdb-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs"
2020-02-06 09:05:42 -08:00
Daniel Thompson
fcf2736c82 Revert "kdb: Get rid of confusing diag msg from "rd" if current task has no regs"
This reverts commit bbfceba15f.

When DBG_MAX_REG_NUM is zero then a number of symbols are conditionally
defined. It is therefore not possible to check it using C expressions.

Reported-by: Anatoly Pugachev <matorola@gmail.com>
Acked-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2020-02-06 11:40:09 +00:00
Linus Torvalds
e310396bb8 Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - Added new "bootconfig".

   This looks for a file appended to initrd to add boot config options,
   and has been discussed thoroughly at Linux Plumbers.

   Very useful for adding kprobes at bootup.

   Only enabled if "bootconfig" is on the real kernel command line.

 - Created dynamic event creation.

   Merges common code between creating synthetic events and kprobe
   events.

 - Rename perf "ring_buffer" structure to "perf_buffer"

 - Rename ftrace "ring_buffer" structure to "trace_buffer"

   Had to rename existing "trace_buffer" to "array_buffer"

 - Allow trace_printk() to work withing (some) tracing code.

 - Sort of tracing configs to be a little better organized

 - Fixed bug where ftrace_graph hash was not being protected properly

 - Various other small fixes and clean ups

* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
  bootconfig: Show the number of nodes on boot message
  tools/bootconfig: Show the number of bootconfig nodes
  bootconfig: Add more parse error messages
  bootconfig: Use bootconfig instead of boot config
  ftrace: Protect ftrace_graph_hash with ftrace_sync
  ftrace: Add comment to why rcu_dereference_sched() is open coded
  tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
  tracing: Annotate ftrace_graph_hash pointer with __rcu
  bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
  tracing: Use seq_buf for building dynevent_cmd string
  tracing: Remove useless code in dynevent_arg_pair_add()
  tracing: Remove check_arg() callbacks from dynevent args
  tracing: Consolidate some synth_event_trace code
  tracing: Fix now invalid var_ref_vals assumption in trace action
  tracing: Change trace_boot to use synth_event interface
  tracing: Move tracing selftests to bottom of menu
  tracing: Move mmio tracer config up with the other tracers
  tracing: Move tracing test module configs together
  tracing: Move all function tracing configs together
  tracing: Documentation for in-kernel synthetic event API
  ...
2020-02-06 07:12:11 +00:00
Steven Rostedt (VMware)
54a16ff6f2 ftrace: Protect ftrace_graph_hash with ftrace_sync
As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:16:42 -05:00
Steven Rostedt (VMware)
16052dd5bd ftrace: Add comment to why rcu_dereference_sched() is open coded
Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:15:57 -05:00
Amol Grover
fd0e6852c4 tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

Use rcu_dereference_protected to dereference the newly annotated pointer.

Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:14:37 -05:00
Amol Grover
24a9729f83 tracing: Annotate ftrace_graph_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

Use rcu_dereference_protected to access the __rcu annotated pointer.

Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:14:26 -05:00
Christoph Hellwig
75467ee48a dma-direct: improve DMA mask overflow reporting
Remove the unset dma_mask case as that won't get into mapping calls
anymore, and also report the other errors unconditonally and with a
slightly improved message.  Remove the now pointless report_addr helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad@darnok.org>
2020-02-05 18:53:41 +01:00
Christoph Hellwig
4a47cbae04 dma-direct: improve swiotlb error reporting
Untangle the way how dma_direct_map_page calls into swiotlb to be able
to properly report errors where the swiotlb DMA address overflows the
mask separately from overflows in the !swiotlb case.  This means that
siotlb_map now has to do a little more work that duplicates
dma_direct_map_page, but doing so greatly simplifies the calling
convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-02-05 18:53:05 +01:00
Christoph Hellwig
91ef26f914 dma-direct: relax addressability checks in dma_direct_supported
dma_direct_supported tries to find the minimum addressable bitmask
based on the end pfn and optional magic that architectures can use
to communicate the size of the magic ZONE_DMA that can be used
for bounce buffering.  But between the DMA offsets that can change
per device (or sometimes even region), the fact the ZONE_DMA isn't
even guaranteed to be the lowest addresses and failure of having
proper interfaces to the MM code this fails at least for one
arm subarchitecture.

As all the legacy DMA implementations have supported 32-bit DMA
masks, and 32-bit masks are guranteed to always work by the API
contract (using bounce buffers if needed), we can short cut the
complicated check and always return true without breaking existing
assumptions.  Hopefully we can properly clean up the interaction
with the arch defined zones and the bootmem allocator eventually.

Fixes: ad3c7b18c5 ("arm: use swiotlb for bounce buffering on LPAE configs")
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
2020-02-05 18:50:55 +01:00
Ingo Molnar
fdff7c21ea Merge branch 'linus' into perf/urgent, to synchronize with upstream
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-05 08:44:22 +01:00
Linus Torvalds
72f582ff85 Merge branch 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs recursive removal updates from Al Viro:
 "We have quite a few places where synthetic filesystems do an
  equivalent of 'rm -rf', with varying amounts of code duplication,
  wrong locking, etc. That really ought to be a library helper.

  Only debugfs (and very similar tracefs) are converted here - I have
  more conversions, but they'd never been in -next, so they'll have to
  wait"

* 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems
2020-02-05 05:09:46 +00:00
Masahiro Yamada
cde26a6e17 kallsyms: fix type of kallsyms_token_table[]
kallsyms_token_table[] only contains ASCII characters. It should be
char instead of u8.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
2020-02-05 13:45:37 +09:00