Add support for AXI Multichannel Direct Memory Access (AXI MCDMA)
core, which is a soft Xilinx IP core that provides high-bandwidth
direct memory access between memory and AXI4-Stream target peripherals.
The AXI MCDMA core provides scatter-gather interface with multiple
independent transmit and receive channels. The driver supports
device_prep_slave_sg slave transfer mode.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Link: https://lore.kernel.org/r/1571763622-29281-7-git-send-email-radhey.shyam.pandey@xilinx.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.
However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped. The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.
Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before. Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.
Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.
The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.
This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.
Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The AXI DMA multichannel support is deprecated in the IP and it is no
longer actively supported. For multichannel support, refer to the AXI
multichannel direct memory access IP product guide(PG228) and MCDMA
driver(added in the subsequent commits). Inline with it remove axidma
multichannel optional properties i.e xlnx,mcdma and dma-channels from
the binding description.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/1571763622-29281-2-git-send-email-radhey.shyam.pandey@xilinx.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add USB ID for MOXA UPort 2210. This device contains mos7820 but
it passes GPIO0 check implemented by driver and it's detected as
mos7840. Hence product id check is added to force mos7820 mode.
Signed-off-by: Pavel Löbl <pavel@loebl.cz>
Cc: stable <stable@vger.kernel.org>
[ johan: rename id defines and add vendor-id check ]
Signed-off-by: Johan Hovold <johan@kernel.org>
The clean up commit 41672c0c24 ("ALSA: timer: Simplify error path in
snd_timer_open()") unified the error handling code paths with the
standard goto, but it introduced a subtle bug: the timer instance is
stored in snd_timer_open() incorrectly even if it returns an error.
This may eventually lead to UAF, as spotted by fuzzer.
The culprit is the snd_timer_open() code checks the
SNDRV_TIMER_IFLG_EXCLUSIVE flag with the common variable timeri.
This variable is supposed to be the newly created instance, but we
(ab-)used it for a temporary check before the actual creation of a
timer instance. After that point, there is another check for the max
number of instances, and it bails out if over the threshold. Before
the refactoring above, it worked fine because the code returned
directly from that point. After the refactoring, however, it jumps to
the unified error path that stores the timeri variable in return --
even if it returns an error. Unfortunately this stored value is kept
in the caller side (snd_timer_user_tselect()) in tu->timeri. This
causes inconsistency later, as if the timer was successfully
assigned.
In this patch, we fix it by not re-using timeri variable but a
temporary variable for testing the exclusive connection, so timeri
remains NULL at that point.
Fixes: 41672c0c24 ("ALSA: timer: Simplify error path in snd_timer_open()")
Reported-and-tested-by: Tristan Madani <tristmd@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191106165547.23518-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The fuzzer tries to open the timer instances as much as possible, and
this may cause a system hiccup easily. We've already introduced the
cap for the max number of available instances for the h/w timers, and
we should put such a limit also to the slave timers, too.
This patch introduces the limit to the multiple opened slave timers.
The upper limit is hard-coded to 1000 for now, which should suffice
for any practical usages up to now.
Link: https://lore.kernel.org/r/20191106154257.5853-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
While upgrading from 4.16 to 5.2, we noticed these allocation errors in
the log of the new kernel:
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
node 0: slabs: 5, objs: 170, free: 0
slab_out_of_memory+1
___slab_alloc+969
__slab_alloc+14
kmem_cache_alloc+346
inet_twsk_alloc+60
tcp_time_wait+46
tcp_fin+206
tcp_data_queue+2034
tcp_rcv_state_process+784
tcp_v6_do_rcv+405
__release_sock+118
tcp_close+385
inet_release+46
__sock_release+55
sock_close+17
__fput+170
task_work_run+127
exit_to_usermode_loop+191
do_syscall_64+212
entry_SYSCALL_64_after_hwframe+68
accompanied by an increase in machines going completely radio silent
under memory pressure.
One thing that changed since 4.16 is e699e2c6a6 ("net, mm: account
sock objects to kmemcg"), which made these slab caches subject to cgroup
memory accounting and control.
The problem with that is that cgroups, unlike the page allocator, do not
maintain dedicated atomic reserves. As a cgroup's usage hovers at its
limit, atomic allocations - such as done during network rx - can fail
consistently for extended periods of time. The kernel is not able to
operate under these conditions.
We don't want to revert the culprit patch, because it indeed tracks a
potentially substantial amount of memory used by a cgroup.
We also don't want to implement dedicated atomic reserves for cgroups.
There is no point in keeping a fixed margin of unused bytes in the
cgroup's memory budget to accomodate a consumer that is impossible to
predict - we'd be wasting memory and get into configuration headaches,
not unlike what we have going with min_free_kbytes. We do this for
physical mem because we have to, but cgroups are an accounting game.
Instead, account these privileged allocations to the cgroup, but let
them bypass the configured limit if they have to. This way, we get the
benefits of accounting the consumed memory and have it exert pressure on
the rest of the cgroup, but like with the page allocator, we shift the
burden of reclaimining on behalf of atomic allocations onto the regular
allocations that can block.
Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org
Fixes: e699e2c6a6 ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We recently started updating the node span based on the zone span to
avoid touching uninitialized memmaps.
Currently, we will always detect the node span to start at 0, meaning a
node can easily span too many pages. pgdat_is_empty() will still work
correctly if all zones span no pages. We should skip over all zones
without spanned pages and properly handle the first detected zone that
spans pages.
Unfortunately, in contrast to the zone span (/proc/zoneinfo), the node
span cannot easily be inspected and tested. The node span gives no real
guarantees when an architecture supports memory hotplug, meaning it can
easily contain holes or span pages of different nodes.
The node span is not really used after init on architectures that
support memory hotplug.
E.g., we use it in mm/memory_hotplug.c:try_offline_node() and in
mm/kmemleak.c:kmemleak_scan(). These users seem to be fine.
Link: http://lkml.kernel.org/r/20191027222714.5313-1-david@redhat.com
Fixes: 00d6c019b5 ("mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc's -freorder-blocks-and-partition option makes it group frequently
and infrequently used code in .text.hot and .text.unlikely sections
respectively. At least when building modules on s390, this option is
used by default.
gdb assumes that all code is located in .text section, and that .text
section is located at module load address. With such modules this is no
longer the case: there is code in .text.hot and .text.unlikely, and
either of them might precede .text.
Fix by explicitly telling gdb the addresses of code sections.
It might be tempting to do this for all sections, not only the ones in
the white list. Unfortunately, gdb appears to have an issue, when
telling it about e.g. loadable .note.gnu.build-id section causes it to
think that non-loadable .note.Linux section is loaded at address 0,
which in turn causes NULL pointers to be resolved to bogus symbols. So
keep using the white list approach for the time being.
Link: http://lkml.kernel.org/r/20191028152734.13065-1-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_cgroup_ino() doesn't return a valid memcg pointer for non-compound
slab pages, because it depends on PgHead AND PgSlab flags to be set to
determine the memory cgroup from the kmem_cache. It's correct for
compound pages, but not for generic small pages. Those don't have PgHead
set, so it ends up returning zero.
Fix this by replacing the condition to PageSlab() && !PageTail().
Before this patch:
[root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user@0.service/ | grep slab
0x0000000000000080 38 0 _______S___________________________________ slab
After this patch:
[root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user@0.service/ | grep slab
0x0000000000000080 147 0 _______S___________________________________ slab
Also, hwpoison_filter_task() uses output of page_cgroup_ino() in order
to filter error injection events based on memcg. So if
page_cgroup_ino() fails to return memcg pointer, we just fail to inject
memory error. Considering that hwpoison filter is for testing, affected
users are limited and the impact should be marginal.
[n-horiguchi@ah.jp.nec.com: changelog additions]
Link: http://lkml.kernel.org/r/20191031012151.2722280-1-guro@fb.com
Fixes: 4d96ba3530 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the current code, we use the atomic_cmpxchg() to serialize the output
of the dump_stack(), but this implementation suffers the thundering herd
problem. We have observed such kind of livelock on a Marvell cn96xx
board(24 cpus) when heavily using the dump_stack() in a kprobe handler.
Actually we can let the competitors to wait for the releasing of the
lock before jumping to atomic_cmpxchg(). This will definitely mitigate
the thundering herd problem. Thanks Linus for the suggestion.
[akpm@linux-foundation.org: fix comment]
Link: http://lkml.kernel.org/r/20191030031637.6025-1-haokexin@gmail.com
Fixes: b58d977432 ("dump_stack: serialize the output from dump_stack()")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While investigating a bug related to higher atomic allocation failures,
we noticed the failure warnings positively drowning the console, and in
our case trigger lockup warnings because of a serial console too slow to
handle all that output.
But even if we had a faster console, it's unclear what additional
information the current level of repetition provides.
Allocation failures happen for three reasons: The machine is OOM, the VM
is failing to handle reasonable requests, or somebody is making
unreasonable requests (and didn't acknowledge their opportunism with
__GFP_NOWARN). Having the memory dump, a callstack, and the ratelimit
stats on skipped failure warnings should provide enough information to
let users/admins/developers know whether something is wrong and point
them in the right direction for debugging, bpftracing etc.
Limit allocation failure warnings to one spew every ten seconds.
Link: http://lkml.kernel.org/r/20191028194906.26899-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pagetypeinfo_showfree_print is called by zone->lock held in irq mode.
This is not really nice because it blocks both any interrupts on that
cpu and the page allocator. On large machines this might even trigger
the hard lockup detector.
Considering the pagetypeinfo is a debugging tool we do not really need
exact numbers here. The primary reason to look at the outuput is to see
how pageblocks are spread among different migratetypes and low number of
pages is much more interesting therefore putting a bound on the number
of pages on the free_list sounds like a reasonable tradeoff.
The new output will simply tell
[...]
Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648
instead of
Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648
The limit has been chosen arbitrary and it is a subject of a future
change should there be a need for that.
While we are at it, also drop the zone lock after each free_list
iteration which will help with the IRQ and page allocator responsiveness
even further as the IRQ lock held time is always bound to those 100k
pages.
[akpm@linux-foundation.org: tweak comment text, per David Hildenbrand]
Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the extent tree is modified, it should be protected by inode
cluster lock and ip_alloc_sem.
The extent tree is accessed and modified in the
ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem.
The following is a case. The function ocfs2_fiemap is accessing the
extent tree, which is modified at the same time.
kernel BUG at fs/ocfs2/extent_map.c:475!
invalid opcode: 0000 [#1] SMP
Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...]
CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2
Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018
task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000
RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
Call Trace:
ocfs2_fiemap+0x1e3/0x430 [ocfs2]
do_vfs_ioctl+0x155/0x510
SyS_ioctl+0x81/0xa0
system_call_fastpath+0x18/0xd8
Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f
RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2]
---[ end trace c8aa0c8180e869dc ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
This issue can be reproduced every week in a production environment.
This issue is related to the usage mode. If others use ocfs2 in this
mode, the kernel will panic frequently.
[akpm@linux-foundation.org: coding style fixes]
[Fix new warning due to unused function by removing said function - Linus ]
Link: http://lkml.kernel.org/r/1568772175-2906-2-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have a usecase to use tmpfs as QEMU memory backend and we would like
to take the advantage of THP as well. But, our test shows the EPT is
not PMD mapped even though the underlying THP are PMD mapped on host.
The number showed by /sys/kernel/debug/kvm/largepage is much less than
the number of PMD mapped shmem pages as the below:
7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 579584 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
12
And some benchmarks do worse than with anonymous THPs.
By digging into the code we figured out that commit 127393fbe5 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when setting
up EPT map. But the _mapcount < 0 check doesn't work for page cache THP
since every subpage of page cache THP would get _mapcount inc'ed once it
is PMD mapped, so PageTransCompoundMap() always returns false for page
cache THP. This would prevent KVM from setting up PMD mapped EPT entry.
So we need handle page cache THP correctly. However, when page cache
THP's PMD gets split, kernel just remove the map instead of setting up
PTE map like what anonymous THP does. Before KVM calls get_user_pages()
the subpages may get PTE mapped even though it is still a THP since the
page cache THP may be mapped by other processes at the mean time.
Checking its _mapcount and whether the THP has PTE mapped or not.
Although this may report some false negative cases (PTE mapped by other
processes), it looks not trivial to make this accurate.
With this fix /sys/kernel/debug/kvm/largepage would show reasonable
pages are PMD mapped by EPT as the below:
7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 557056 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
271
And the benchmarks are as same as anonymous THPs.
[yang.shi@linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedde4 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches as well as the maximum number of pages allowed on a per-cpu
list. As zone->managed_pages is not up to date yet, the pcpu
initialisation calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with
the degree of severity depending on how many CPUs share a local zone and
the size of the zone. A private report indicated that kernel build
times were excessive with extremely high system CPU usage. A perf
profile indicated that a large chunk of time was lost on zone->lock
contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max Configuration was
modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before, this was the breakdown of batch and high values over all zones
was:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
[mgorman@techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MAP_HUGETLB ("-H" option) of gup_benchmark fails:
$ sudo ./gup_benchmark -H
mmap: Invalid argument
This is because gup_benchmark.c is passing in a file descriptor to
mmap(), but the fd came from opening up the /dev/zero file. This
confuses the mmap syscall implementation, which thinks that, if the
caller did not specify MAP_ANONYMOUS, then the file must be a huge page
file. So it attempts to verify that the file really is a huge page
file, as you can see here:
ksys_mmap_pgoff()
{
if (!(flags & MAP_ANONYMOUS)) {
retval = -EINVAL;
if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
goto out_fput; /* THIS IS WHERE WE END UP */
else if (flags & MAP_HUGETLB) {
...proceed normally, /dev/zero is ok here...
...and of course is_file_hugepages() returns "false" for the /dev/zero
file.
The problem is that the user space program, gup_benchmark.c, really just
wants anonymous memory here. The simplest way to get that is to pass
MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this
patch does.
Link: http://lkml.kernel.org/r/20191021212435.398153-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Variable error is being initialized with a value that is never read
and is being re-assigned a couple of statements later on. The
assignment is redundant and hence can be removed.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Scrubbing directories, quotas, and fs counters all involve iterating
some collection of metadata items. The per-item scrub functions for
these three are missing some of the components they need to be able to
check for a fatal signal and terminate early.
Per-item scrub functions need to call xchk_should_terminate to look for
fatal signals, and they need to check the scrub context's corruption
flag because there's no point in continuing a scan once we've decided
the data structure is bad. Add both of these where missing.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
i.MX defconfig update for 5.5:
- Enable i.MX7ULP watchdog, DA9052 touch and USB configfs support
in imx_v6_v7_defconfig.
- Enable newly added S32V234 SoC and its UART driver support in arm64
defconfig.
- Built i.MX8QXP SCU key driver as module in arm64 defconfig.
- Change AT803X Ethernet PHY driver from module to built-in, so that
we can boot i.MX8MM EVK board with rootfs on NFS.
* tag 'imx-defconfig-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: defconfig: Change CONFIG_AT803X_PHY from m to y
ARM: imx_v6_v7_defconfig: Enable CONFIG_TOUCHSCREEN_DA9052
arm64: defconfig: Enable configs for S32V234
arm64: defconfig: Enable CONFIG_KEYBOARD_IMX_SC_KEY as module
ARM: imx_v6_v7_defconfig: Build USB_CONFIGFS into kernel
ARM: imx_v6_v7_defconfig: Enable CONFIG_IMX7ULP_WDT by default
Link: https://lore.kernel.org/r/20191105150315.15477-7-shawnguo@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX arm64 device tree changes for 5.5:
- Add the initial support for a new arm64 family SoC from NXP:
S32V234 ("Treerunner") vision microprocessors which are targeted for
high-performance, computationally intensive vision and sensor fusion
applications that require automotive safety levels.
- New board support: i.MX8MN LPDDR4 EVK, i.MX8QXP Colibri and
S32V234 EVB.
- A series of patch from Andrey Smirnov to improve zii-ultra support by
fixing regulator and adding accelerometer, switch watchdog.
- Add system counter device and enable cpuidle support for i.MX8MN.
- Move usdhc clocks assignment from SoC to board level DTS for
i.MX8 based boards.
- Add PCA6416 on I2C3 bus for imx8mm-evk, and enable SCU key for
imx8qxp-mek board.
- Enable GPU passive throttling on i.MX8MQ SoC, and add DDR PMU device
for i.MX8MN.
- A series from Fabio Estevam to fix DTC W=1 warnings for LS1028A device.
- Update the clock providers for the Mali DP500 and '#clock-cells' of
DPCLK node for LS1028A SoC.
- Misc small updates on various boards.
* tag 'imx-dt64-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (40 commits)
arm64: dts: imx8mn-evk: Remove invalid Atheros properties
arm64: dts: freescale: add initial support for colibri imx8x
arm64: dts: ls1028a: Fix tmu unit address
arm64: dts: ls1028a: Move thermal-zone out of SoC
arm64: dts: ls1028a-qds: Remove unnecessary #address-cells/#size-cells
arm64: dts: imx8mn: Remove duplicated machine compatible
arm64: dts: imx8mm: Remove duplicated machine compatible
arm64: dts: imx8mq-evk: Add remote control
arm64: dts: imx8mn: Add LPDDR4 EVK board support
arm64: dts: imx8mn: Create EVK dtsi file for common use
arm64: dts: imx8mn: Move usdhc clocks assignment to board DT
arm64: dts: imx8mm: Move usdhc clocks assignment to board DT
arm64: dts: imx8mq: Move usdhc clocks assignment to board DT
arm64: dts: imx8qxp: Move usdhc clocks assignment to board DT
arm64: dts: fsl: Add device tree for S32V234-EVB
arm64: dts: imx8mm-evk: Assigned clocks for audio plls
arm64: dts: zii-ultra: Add node for switch watchdog
arm64: dts: zii-ultra: Add node for accelerometer
arm64: dts: zii-ultra: Fix regulator-3p3-main's name
arm64: dts: zii-ultra: Fix regulator-vsd-3v3's vin-supply
...
Link: https://lore.kernel.org/r/20191105150315.15477-5-shawnguo@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX device tree changes for 5.5:
- New board support: Netronix E60K02 and Kobo Clara HD, Kontron N6311
and N6411, OPOS6UL and OPOS6ULDev.
- Correct speed grading fuse settings and add opp-suspend property for
i.MX7D device tree.
- Move usdhc clocks assignment from SoC to board level DTS for imx7ulp,
and use APLL_PFD1 as usdhc's clock source on imx7ulp-evk board.
- Add missing cooling device properties for CPUs for i.MX6/7 SoCs.
- Add sensor GPIO regulator and assign power supplies for magnetometer
for imx6ul-14x14-evk board.
- Replace "simple-bus" with "simple-mfd" for ANATOP device for i.MX6/7
SoCs.
- Fix DTC W=1 warnings by not using simple-audio-card,dai-link on
imx6qdl-gw551x and imx6q-gw54xx board.
- Move to use DRM bindings for the Seiko 43WVF1G panel on imx53-qsb.
- A series from Frieder Schrempf to support more i.MX6UL/ULL-based SoMs
and boards from Kontron Electronics GmbH.
- A few patches from Michal Vokáč to enable more devices support on
imx6dl-yapp4 board.
- A patch series from Philippe Schenker to improve i.MX6/7 Apalis and
Colibri board support.
- A patch series from Sébastien Szymanski to update i.MX6 APF6/APF6Dev
device tree with more devices added and adopting DRM bindings for
display.
- Random improvements, clean-up and device additions on various boards.
* tag 'imx-dt-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (68 commits)
ARM: dts: imx6ul-kontron-n6x1x-s: Remove an obsolete comment and fix indentation
ARM: dts: imx6ul-kontron-n6x1x-s: Add vbus-supply and overcurrent polarity to usb nodes
ARM: dts: imx6ul-kontron-n6x1x: Add 'chosen' node with 'stdout-path'
ARM: dts: Add support for two more Kontron evalkit boards 'N6311 S' and 'N6411 S'
ARM: dts: imx6ul-kontron-n6310-s: Move common nodes to a separate file
ARM: dts: imx6ul-kontron-n6310-s: Disable the snvs-poweroff driver
ARM: dts: Add support for two more Kontron SoMs N6311 and N6411
ARM: dts: imx6ul-kontron-n6310: Move common SoM nodes to a separate file
ARM: dts: imx7ulp-evk: Use APLL_PFD1 as usdhc's clock source
ARM: dts: imx: add devicetree for Kobo Clara HD
ARM: dts: add Netronix E60K02 board common file
ARM: dts: vf-colibri: fix typo in top-level module compatible
ARM: dts: imx53-qsb: Use DRM bindings for the Seiko 43WVF1G panel
ARM: dts: imx53: Spelling s/configration/configuration/
ARM: dts: imx6ul-14x14-evk: Assign power supplies for magnetometer
ARM: dts: imx6ul-14x14-evk: Fix the magnetometer node name
ARM: dts: imx6ul-14x14-evk: Add sensors' GPIO regulator
ARM: dts: imx6ul: Disable gpt2 by default
ARM: dts: imx7d: Add missing cooling device properties for CPUs
ARM: dts: imx6dl: Add missing cooling device properties for CPUs
...
Link: https://lore.kernel.org/r/20191105150315.15477-4-shawnguo@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX SoC update for 5.5:
- Add arm64 Kconfig option for the NXP S32 platform.
- Drop imx_anatop_usb_chrg_detect_disable() function which becomes
unneeded, since all the necessary charger setup is done by the USB
PHY driver now.
- Add serial number support for i.MX6/7 SoCs by reading 64-bit SoC
unique ID from OCOTP block.
- Replace i.MX machine specific coherency exit implementation using
the generic v7_exit_coherency_flush() function.
* tag 'imx-soc-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx: use generic function to exit coherency
ARM: imx: Add serial number support for i.MX6/7 SoCs
ARM: imx: Drop imx_anatop_usb_chrg_detect_disable()
arm64: Introduce config for S32
Link: https://lore.kernel.org/r/20191105150315.15477-2-shawnguo@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
i.MX drivers update for 5.5:
- Skip return check for those SCU firmware APIs that are defined as
void function in firmware.
- Use established serial_number attribute instead of custom one to show
SoC's unique ID for i.MX8 SoC drivers.
- Read i.MX8MQ SOC revision from TF-A which parses ROM and exposes the
value through a SMC call. This improves the situation that SOC
revision reports 'unknown' on some older revisions.
- Add a check and warn on unexpected SCU RX to avoid potential stack
corruption in imx-scu driver.
- Fix a sparse warning in imx-scu-irq driver by adding missing header.
- Remove an unneeded call to devm_of_platform_populate() from imx-dsp
driver.
* tag 'imx-drivers-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
soc: imx8mq: Read SOC revision from TF-A
soc: imx-scu: Using existing serial_number instead of UID
soc: imx8: Using existing serial_number instead of UID
firmware: imx: add missing include of <linux/firmware/imx/sci.h>
firmware: imx: Remove call to devm_of_platform_populate
firmware: imx: Skip return value check for some special SCU firmware APIs
firmware: imx: warn on unexpected RX
Link: https://lore.kernel.org/r/20191105150315.15477-1-shawnguo@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
Samsung soc drivers changes for v5.5
1. Minor fixes to Exynos Chipid driver.
2. Add Exynos Adaptive Supply Voltage driver allowing to adjust voltages
used during CPU frequency scaling based on revision of SoC. This
also pulls dependency from PM/OPP tree - driver uses newly added
dev_pm_opp_adjust_voltage() function.
* tag 'samsung-drivers-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
soc: samsung: exynos-asv: Potential NULL dereference in exynos_asv_update_opps()
soc: samsung: chipid: Drop "syscon" compatible requirement
soc: samsung: Add Exynos Adaptive Supply Voltage driver
PM / OPP: Support adjusting OPP voltages at runtime
soc: samsung: chipid: Make exynos_chipid_early_init() static
Link: https://lore.kernel.org/r/20191104175902.12224-1-krzk@kernel.org
Signed-off-by: Olof Johansson <olof@lixom.net>
A few more DT patches for 5.5, mostly:
- USB3 support for the H6
- Deinterlacer support for the H3
- eDP Bridge support on the Teres-I
- More DT cleanups thanks to the validation
* tag 'sunxi-dt-for-5.5-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: h6: Remove useless reset name
ARM: dts: sun6i: Remove useless reset-names
arm64: dts: allwinner: orange-pi-3: Enable USB 3.0 host support
arm64: dts: allwinner: h6: add USB3 device nodes
dt-bindings: Add ANX6345 DP/eDP transmitter binding
arm64: dts: allwinner: a64: enable ANX6345 bridge on Teres-I
dts: arm: sun8i: h3: Enable deinterlace unit
ARM: dts: sunxi: h3/h5: Add MBUS controller node
dt-bindings: bus: sunxi: Add H3 MBUS compatible
Link: https://lore.kernel.org/r/58ad00a8-9579-4811-969a-a74e331ee9a2.lettre@localhost
Signed-off-by: Olof Johansson <olof@lixom.net>