1419b694038104c01bbb46e7ef968e8632e3dc5c
1776 Commits
24149445ad
ANDROID: vendor_hooks: Add hooks for memory when debug
Add vendor hooks for recording memory usage.
Bug: 182443489
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1
4c84191cbc
mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream.
There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory.
Such pages are currently initialized using the init_unavailable_mem() function, which iterates through PFNs in holes in memblock.memory and, if there is a struct page corresponding to a PFN, sets the fields of this page to default values and marks it as Reserved.
init_unavailable_mem() does not take into account the zone and node the page belongs to and sets both zone and node links in struct page to zero.
Before commit
134ac2d4dc
FROMLIST: mm: replace migrate_prep with lru_add_drain_all
Currently, migrate_prep is merely a wrapper of lru_add_drain_all. There is not much to gain from having the additional abstraction. Use lru_add_drain_all instead of migrate_prep, which is more descriptive.
Note: migrate_prep_local in compaction.c is changed into lru_add_drain to keep the old behavior and avoid the scheduling cost of involving many other CPUs.
Bug: 180018981
Link: https://lore.kernel.org/linux-mm/20210310161429.399432-1-minchan@kernel.org/
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I1bd3fcb13993e8a7a7961ceec817ac17304364cb
28f6641041
FROMLIST: mm: page_alloc: dump migrate-failed pages
Currently, debugging CMA allocation failures is quite limited. The most common source of these failures seems to be page migration, which doesn't provide any useful information about the reason for the failure by itself. alloc_contig_range can report those failures as it holds a list of migrate-failed pages.
The information logged by dump_page() has already proven helpful for debugging allocation issues, like identifying long-term pinnings on ZONE_MOVABLE or MIGRATE_CMA.
Let's use the dynamic debugging infrastructure, so that we avoid flooding the logs and creating a lot of noise on frequent alloc_contig_range() calls. This information is helpful for debugging only.
There are two ifdefery conditions to support common dyndbg options:
- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
  Enables the feature only for specific files, by adding ccflags.
- CONFIG_DYNAMIC_DEBUG
  Enables the feature globally, system-wide.
A simple example to enable the feature:
Admin could enable the dump like this (it is disabled by default):
echo "func alloc_contig_dump_pages +p" > control
Admin could disable it:
echo "func alloc_contig_dump_pages =_" > control
Details are in Documentation/admin-guide/dynamic-debug-howto.rst.
A concern is that utility functions in dump_page use inconsistent loglevels. In the future, we might want to make the loglevels used inside dump_page() consistent and eventually rework the way we log the information here. See [1].
[1] https://lore.kernel.org/linux-mm/YEh4doXvyuRl5BDB@google.com/
Bug: 182195592
Link: https://lore.kernel.org/linux-mm/20210311194042.825152-1-minchan@kernel.org/
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Change-Id: I4db99a76a543d559a05515d6800c66ab48196978
2c194f3178
FROMGIT: mm: remove lru_add_drain_all in alloc_contig_range
__alloc_contig_migrate_range already has a lru_add_drain_all call via
migrate_prep. It's necessary to move LRU target pages into the LRU list to be
able to isolate them. However, the lru_add_drain_all call after
__alloc_contig_migrate_range is pointless, since source page freeing has
changed from putback_lru_pages to put_page[1].
This patch removes it.
[1]
cd6aa9911d
UPSTREAM: mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo
Let's count the number of CMA pages per zone and print them in
/proc/zoneinfo.
Having access to the total number of CMA pages per zone is helpful for
debugging purposes to know where exactly the CMA pages ended up, and to
figure out how many pages of a zone might behave differently, even after
some of these pages might already have been allocated.
As one example, CMA pages part of a kernel zone cannot be used for
ordinary kernel allocations but instead behave more like ZONE_MOVABLE.
For now, we are only able to get the global nr+free cma pages from
/proc/meminfo and the free cma pages per zone from /proc/zoneinfo.
Example after this patch when booting a 6 GiB QEMU VM with
"hugetlb_cma=2G":
# cat /proc/zoneinfo | grep cma
cma 0
nr_free_cma 0
cma 0
nr_free_cma 0
cma 524288
nr_free_cma 493016
cma 0
cma 0
# cat /proc/meminfo | grep Cma
CmaTotal: 2097152 kB
CmaFree: 1972064 kB
Note: We print even without CONFIG_CMA, just like "nr_free_cma"; this way,
one can be sure when spotting "cma 0" that there are definitely no
CMA pages located in a zone.
[david@redhat.com: v2]
Link: https://lkml.kernel.org/r/20210128164533.18566-1-david@redhat.com
[david@redhat.com: v3]
Link: https://lkml.kernel.org/r/20210129113451.22085-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210127101813.6370-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3c381db1fac80373f2cc0d8c1d0bcfbf8bd4fb57)
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: Ia4496355853c7bb7201f09394369cd42f632c079
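The new per-zone `cma` counter can be cross-checked against `CmaTotal` in /proc/meminfo. Below is a minimal userspace sketch, not part of the patch: it assumes 4 KiB pages and the `key value` line layout shown in the example, and `sum_cma_pages()`, `struct cma_totals` and `PAGE_SIZE_KB` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical totals, in pages, gathered from zoneinfo-formatted text. */
struct cma_totals {
	long cma_pages;      /* sum of per-zone "cma" lines */
	long free_cma_pages; /* sum of per-zone "nr_free_cma" lines */
};

/* Assumption: 4 KiB pages, as on the x86-64 VM in the example. */
#define PAGE_SIZE_KB 4L

/* Scan the text line by line and sum the "cma" and "nr_free_cma"
 * counters; lines whose first token is anything else are ignored. */
static struct cma_totals sum_cma_pages(const char *text)
{
	struct cma_totals t = { 0, 0 };
	const char *line = text;

	assert(text != NULL);
	while (line && *line) {
		char key[64];
		long value;

		/* %63s skips leading whitespace and reads the key. */
		if (sscanf(line, "%63s %ld", key, &value) == 2) {
			if (strcmp(key, "cma") == 0)
				t.cma_pages += value;
			else if (strcmp(key, "nr_free_cma") == 0)
				t.free_cma_pages += value;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return t;
}
```

Fed the grep output from the example, it yields 524288 CMA pages and 493016 free CMA pages, i.e. 2097152 kB and 1972064 kB, which match `CmaTotal` and `CmaFree`.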
e909fe79d2
ANDROID: mm: export zone_watermark_ok
Export zone_watermark_ok and its friends so that modules can use them to determine whether zone watermarks are OK in the system.
Bug: 140294230
Change-Id: I958961150cf0c6db318f3e0daf1543ced00a9aab
Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org>
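As a rough illustration of the question a watermark check answers, here is a simplified userspace model loosely patterned on the kernel's `__zone_watermark_ok()`. It is not the kernel implementation: it ignores ALLOC_HIGH/ALLOC_HARDER adjustments, CMA accounting and per-migratetype free lists, and `struct zone_model`, `model_watermark_ok()` and `MODEL_MAX_ORDER` are hypothetical. A request passes only if at least `mark + lowmem_reserve` pages remain free after taking 2^order pages, and, for order > 0, a free block of at least that order exists.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 11 /* assumption: orders 0..10 */

/* Hypothetical snapshot of a zone's free-page state (model only). */
struct zone_model {
	long free_pages;              /* total free pages in the zone */
	long nr_free[MODEL_MAX_ORDER]; /* free blocks per order */
	long lowmem_reserve;          /* reserve for the requested zone index */
};

static bool model_watermark_ok(const struct zone_model *z, unsigned int order,
			       long mark)
{
	long free_pages;
	unsigned int o;

	assert(order < MODEL_MAX_ORDER);

	/* Equivalent to: free - 2^order >= mark + reserve. */
	free_pages = z->free_pages - ((1L << order) - 1);
	if (free_pages <= mark + z->lowmem_reserve)
		return false;

	/* Order-0 requests only need the page count check. */
	if (order == 0)
		return true;

	/* Higher orders also need a free block of at least that order. */
	for (o = order; o < MODEL_MAX_ORDER; o++)
		if (z->nr_free[o] > 0)
			return true;

	return false;
}
```

The second loop is why a zone with plenty of free pages can still fail a high-order check: fragmentation leaves no block large enough.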
369de37804
ANDROID: mm: Add vendor hook in rmqueue()
Add a vendor hook for costly-order page counting and other vendor-specific functions.
Bug: 174521902
Bug: 172987241
Signed-off-by: Chiawei Wang <chiaweiwang@google.com>
Change-Id: I89206727a462548cc3500b695d85c83ff003eec7
c11f7749f1
mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
commit ce8f86ee94fabcc98537ddccd7e82cfd360a4dc5 upstream.
The tracepoint *trace_mm_page_alloc_zone_locked()* in __rmqueue() does not currently cover all branches. Add the missing tracepoint and check the page before doing so.
[akpm@linux-foundation.org: use IS_ENABLED() to suppress warning]
Link: https://lkml.kernel.org/r/20201228132901.41523-1-carver4lio@163.com
Signed-off-by: Hailong liu <liu.hailong6@zte.com.cn>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
385eb1fe10
UPSTREAM: kasan, mm: fix resetting page_alloc tags for HW_TAGS
[ Upstream commit acb35b177c71d3d39b9a3b9ea213d926235066e3 ]
A previous commit added resetting KASAN page tags to kernel_init_free_pages() to avoid false positives due to accesses to metadata with the hardware tag-based mode.
That commit did reset page tags before the metadata access, but didn't restore them after. As a result, KASAN fails to detect bad accesses to page_alloc allocations on some configurations.
Fix this by recovering the tag after the metadata access.
Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I7bf87862c7524cb0a8178c584e238d6e3d84bac0
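The fix amounts to saving each page's tag, resetting it for the metadata access, and restoring it afterwards. The userspace model below illustrates only that pattern; `struct page_model`, `model_init_free_pages()` and the match-all value 0xFF (assumed here to mirror KASAN_TAG_KERNEL) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_TAG_MATCH_ALL 0xFF /* assumption: mirrors KASAN_TAG_KERNEL */
#define MODEL_PAGE_SIZE 64       /* tiny "page" for the model */

/* Hypothetical stand-in for a struct page plus its backing memory. */
struct page_model {
	uint8_t kasan_tag;
	unsigned char data[MODEL_PAGE_SIZE];
};

/* Zero a run of pages: save the tag, reset it to match-all for the
 * access, then restore it so later checks still see the real tag. */
static void model_init_free_pages(struct page_model *pages, int numpages)
{
	assert(numpages >= 0);
	for (int i = 0; i < numpages; i++) {
		uint8_t tag = pages[i].kasan_tag;         /* save */

		pages[i].kasan_tag = MODEL_TAG_MATCH_ALL; /* reset for access */
		memset(pages[i].data, 0, sizeof(pages[i].data));
		pages[i].kasan_tag = tag;                 /* restore: the missing step */
	}
}
```

Without the final assignment, every page would be left with the match-all tag, which is why bad accesses to those allocations went undetected.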
1daa298a04
Revert "mm: fix initialization of struct page for holes in memory layout"
commit 377bf660d07a47269510435d11f3b65d53edca20 upstream.
This reverts commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399.
Chris Wilson reports that it causes boot problems:
"We have half a dozen or so different machines in CI that are silently failing to boot, that we believe is bisected to this patch"
and the CI team confirmed that a revert fixed the issues.
The cause is unknown for now, so let's revert it.
Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
f2a79851c7
mm: fix initialization of struct page for holes in memory layout
commit d3921cb8be29ce5668c64e23ffdaeec5f8c69399 upstream.
There could be struct pages that are not backed by actual physical
memory. This can happen when the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem()
function that iterates through PFNs in holes in memblock.memory and if
there is a struct page corresponding to a PFN, the fields of this page
are set to default values and the page is marked as Reserved.
init_unavailable_mem() does not take into account the zone and node the page
belongs to and sets both zone and node links in struct page to zero.
On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
an unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.
Update init_unavailable_mem() to use zone constraints defined by an
architecture to properly set up the zone link and use the node ID of the
adjacent range in memblock.memory to set the node link.
Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes:
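A simplified userspace model of the updated hole handling: it walks the gaps between sorted memory ranges and fills in a zone from assumed architecture zone limits and a node ID from the range that follows each hole, instead of leaving both links at zero (which the model, like the report above, maps to ZONE_DMA). The zone limits, the choice of the following range for the node, and all names (`struct mem_range`, `struct page_desc`, `init_hole_pages()`) are assumptions for illustration, not the patch's code.

```c
#include <assert.h>
#include <limits.h>

enum model_zone { MZ_DMA, MZ_DMA32, MZ_NORMAL, MZ_NR };

struct mem_range {
	unsigned long start_pfn, end_pfn; /* [start, end) */
	int nid;
};

struct page_desc {
	int zone;
	int nid;
	int reserved;
};

/* Assumed architecture limits: each zone ends before this PFN. */
static const unsigned long zone_end_pfn[MZ_NR] = { 16, 32, ULONG_MAX };

static int zone_for_pfn(unsigned long pfn)
{
	for (int z = 0; z < MZ_NR; z++)
		if (pfn < zone_end_pfn[z])
			return z;
	return MZ_NR - 1;
}

/* Initialize pages in each hole [previous end, next start): zone from
 * the limits above, node from the following range. Returns the number
 * of hole pages initialized. */
static unsigned long init_hole_pages(const struct mem_range *mem, int nr,
				     struct page_desc *pages)
{
	unsigned long next = 0, count = 0;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		for (unsigned long pfn = next; pfn < mem[i].start_pfn; pfn++) {
			pages[pfn].zone = zone_for_pfn(pfn);
			pages[pfn].nid = mem[i].nid;
			pages[pfn].reserved = 1;
			count++;
		}
		next = mem[i].end_pfn;
	}
	return count;
}
```

In the model, a hole that straddles the DMA/DMA32 limit gets both zones, so a hole page in the DMA32 range no longer claims ZONE_DMA.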
20512940b8
FROMLIST: mm: failfast mode with __GFP_NORETRY in alloc_contig_range
Contiguous memory allocation can be stalled by waiting on page writeback and/or page lock, which causes unpredictable delay. It's an unavoidable cost for the requestor to get *big* contiguous memory, but it's expensive for *small* contiguous memory (e.g., order-4) because the caller could retry the request in a different range that has easily migratable pages, without stalling.
This patch introduces __GFP_NORETRY as the compaction gfp_mask in alloc_contig_range so it will fail fast without blocking when it encounters pages that need waiting.
Bug: 170340257
Bug: 120293424
Link: https://lore.kernel.org/linux-mm/YAnM5PbNJZlk%2F%2FiX@google.com/T/#m1362218ebb69e6e10c20d9361008b079745c4e6f
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I42ba8dd5aeb065d936978ab205e4baf84bf9a321
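The trade-off can be pictured with a toy cost model: in fail-fast mode a range that would block is rejected immediately, and a caller asking for a small block simply moves on to the next candidate range. Everything below (`struct candidate_range`, `try_range()`, `alloc_small_contig()`, the cost units) is hypothetical and only illustrates the reasoning, not alloc_contig_range itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical candidate range: does it contain a page that would force
 * the allocator to wait on writeback or a page lock? */
struct candidate_range {
	bool needs_wait;
};

/* One attempt. Fail-fast mode rejects a blocking range with -EBUSY;
 * otherwise the caller pays the (arbitrary) wait cost. */
static int try_range(const struct candidate_range *r, bool failfast, int *cost)
{
	const int wait_cost = 100; /* arbitrary units for the model */

	if (r->needs_wait) {
		if (failfast)
			return -EBUSY;
		*cost += wait_cost;
	}
	*cost += 1; /* migrating easily movable pages */
	return 0;
}

/* Small requests: walk the candidates in fail-fast mode and return the
 * index of the first range that succeeds, or -1 if none does. */
static int alloc_small_contig(const struct candidate_range *ranges, int nr,
			      int *cost)
{
	assert(nr >= 0);
	for (int i = 0; i < nr; i++)
		if (try_range(&ranges[i], true, cost) == 0)
			return i;
	return -1;
}
```

With one blocking range followed by a clean one, the fail-fast walk costs 1 unit, while waiting on the first range would cost 101.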
a8198d1062
ANDROID: mm: use alloc_flags for cma first alloc policy
rmqueue internal functions that allocate CMA memory should use alloc_flags instead of gfp_flags, because alloc_flags is more restrictive: it also reflects the allocation context. Otherwise, a page could be allocated from the CMA area even though the current context has already disabled CMA allocation.
For example, PF_MEMALLOC_NOCMA restricts the current allocation context from allocating pages from the CMA area in order to prevent long-term pins there, but that restriction is ignored, so a long-term pinned page could be allocated from the CMA area.
Bug: 178019362
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I61bd4642c91ecd9153f6c59f89e296e8b515f1ad
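A small model of why the decision must come from alloc_flags: they combine the gfp request with the task context, loosely in the spirit of the kernel's `current_alloc_flags()`, whereas a check on the gfp mask alone never sees PF_MEMALLOC_NOCMA. The flag values and function names below are hypothetical and do not match the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values for the model; not the kernel's definitions. */
#define M_GFP_MOVABLE       0x1u /* caller asked for a movable page */
#define M_PF_MEMALLOC_NOCMA 0x1u /* task context forbids CMA pages */
#define M_ALLOC_CMA         0x1u /* allocator may use CMA pageblocks */

/* alloc_flags fold the task context into the gfp request. */
static unsigned int model_alloc_flags(unsigned int gfp, unsigned int task_flags)
{
	unsigned int alloc_flags = 0;

	if ((gfp & M_GFP_MOVABLE) && !(task_flags & M_PF_MEMALLOC_NOCMA))
		alloc_flags |= M_ALLOC_CMA;
	return alloc_flags;
}

/* Buggy policy: looks only at the gfp mask, so the task restriction is lost. */
static bool cma_first_from_gfp(unsigned int gfp)
{
	return gfp & M_GFP_MOVABLE;
}

/* Fixed policy: looks at alloc_flags, which already honour the context. */
static bool cma_first_from_alloc_flags(unsigned int alloc_flags)
{
	return alloc_flags & M_ALLOC_CMA;
}
```

For a movable request from a PF_MEMALLOC_NOCMA task, the gfp-based check still says "use CMA", while the alloc_flags-based check correctly refuses.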
8d4b6fc236
UPSTREAM: kasan, mm: reset tags when accessing metadata
[ Upstream commit aa1ef4d7b3f67f7f17aa4aa34f5ec513c7e4db6c ]
Kernel allocator code accesses metadata for slab objects that may lie out-of-bounds of the object itself, or be accessed when an object is freed. Such accesses trigger tag faults and lead to false-positive reports with hardware tag-based KASAN.
Software KASAN modes disable instrumentation for allocator code via the KASAN_SANITIZE Makefile macro, and rely on kasan_enable/disable_current() annotations which are used to ignore KASAN reports.
With hardware tag-based KASAN, neither of those options is available, as it doesn't use compiler instrumentation, no tag faults are ignored, and MTE is disabled after the first one.
Instead, reset tags when accessing metadata (currently only for SLUB).
Link: https://lkml.kernel.org/r/a0f3cefbc49f34c843b664110842de4db28179d0.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I9e465a2b11b96938d2dc4d45d31a15b1c6c1d129
a878e24296
UPSTREAM: kasan, mm: untag page address in free_reserved_area
[ Upstream commit c746170d6a48b59d1233b375905f7faef6ce80bc ]
free_reserved_area() memsets the pages belonging to a given memory area. As that memory hasn't been allocated via page_alloc, the KASAN tags that those pages have are 0x00. As a result, the memset might result in a tag mismatch.
Untag the address to avoid spurious faults.
Link: https://lkml.kernel.org/r/ebef6425f4468d063e2f09c1b62ccbb2236b71d3.1606161801.git.andreyknvl@google.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I7ee2b3a75f390d26b82dec5e66e9d103bf3df8c4
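A simplified model of the mismatch and of the untagging fix, assuming the tag lives in the top byte of a 64-bit address and that an untagged kernel pointer carries the match-all value 0xFF. This ignores the finer details of arm64 MTE (which uses only bits 56 to 59), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TAG_SHIFT 56
#define MODEL_TAG_MATCH_ALL 0xFFu /* assumption: untagged kernel pointer */

static uint8_t ptr_tag(uint64_t addr)
{
	return (uint8_t)(addr >> MODEL_TAG_SHIFT);
}

static uint64_t set_ptr_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~(0xFFULL << MODEL_TAG_SHIFT)) |
	       ((uint64_t)tag << MODEL_TAG_SHIFT);
}

/* "Untag" by restoring the match-all top byte. */
static uint64_t reset_ptr_tag(uint64_t addr)
{
	return set_ptr_tag(addr, MODEL_TAG_MATCH_ALL);
}

/* Would an access through addr fault against memory tagged mem_tag? */
static bool access_faults(uint64_t addr, uint8_t mem_tag)
{
	uint8_t tag = ptr_tag(addr);

	return tag != MODEL_TAG_MATCH_ALL && tag != mem_tag;
}
```

A pointer tagged, say, 0xF4 faults against the 0x00 tags of reserved memory, while the same address after untagging does not.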
9cf2ceaffd
Merge 5.10.5 into android12-5.10
Changes in 5.10.5
net/sched: sch_taprio: reset child qdiscs before freeing them
mptcp: fix security context on server socket
ethtool: fix error paths in ethnl_set_channels()
ethtool: fix string set id check
md/raid10: initialize r10_bio->read_slot before use.
drm/amd/display: Add get_dig_frontend implementation for DCEx
io_uring: close a small race gap for files cancel
jffs2: Allow setting rp_size to zero during remounting
jffs2: Fix NULL pointer dereference in rp_size fs option parsing
spi: dw-bt1: Fix undefined devm_mux_control_get symbol
opp: fix memory leak in _allocate_opp_table
opp: Call the missing clk_put() on error
scsi: block: Fix a race in the runtime power management code
mm/hugetlb: fix deadlock in hugetlb_cow error path
mm: memmap defer init doesn't work as expected
lib/zlib: fix inflating zlib streams on s390
io_uring: don't assume mm is constant across submits
io_uring: use bottom half safe lock for fixed file data
io_uring: add a helper for setting a ref node
io_uring: fix io_sqe_files_unregister() hangs
uapi: move constants from <linux/kernel.h> to <linux/const.h>
tools headers UAPI: Sync linux/const.h with the kernel headers
cgroup: Fix memory leak when parsing multiple source parameters
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
scsi: cxgb4i: Fix TLS dependency
Bluetooth: hci_h5: close serdev device and free hu in h5_close
fbcon: Disable accelerated scrolling
reiserfs: add check for an invalid ih_entry_count
misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells()
media: gp8psk: initialize stats at power control logic
f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
ALSA: seq: Use bool for snd_seq_queue internal flags
ALSA: rawmidi: Access runtime->avail always in spinlock
bfs: don't use WARNING: string when it's just info.
ext4: check for invalid block size early when mounting a file system
fcntl: Fix potential deadlock in send_sig{io, urg}()
io_uring: check kthread stopped flag when sq thread is unparked
rtc: sun6i: Fix memleak in sun6i_rtc_clk_init
module: set MODULE_STATE_GOING state when a module fails to load
quota: Don't overflow quota file offsets
rtc: pl031: fix resource leak in pl031_probe
powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
i3c master: fix missing destroy_workqueue() on error in i3c_master_register
NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode
f2fs: avoid race condition for shrinker count
f2fs: fix race of pending_pages in decompression
module: delay kobject uevent until after module init call
powerpc/64: irq replay remove decrementer overflow check
fs/namespace.c: WARN if mnt_count has become negative
watchdog: rti-wdt: fix reference leak in rti_wdt_probe
um: random: Register random as hwrng-core device
um: ubd: Submit all data segments atomically
NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails
drm/amd/display: updated wm table for Renoir
tick/sched: Remove bogus boot "safety" check
s390: always clear kernel stack backchain before calling functions
io_uring: remove racy overflow list fast checks
ALSA: pcm: Clear the full allocated memory at hw_params
dm verity: skip verity work if I/O error when system is shutting down
ext4: avoid s_mb_prefetch to be zero in individual scenarios
device-dax: Fix range release
Linux 5.10.5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2b481bfac06bafdef2cf3cc1ac2c2a4ddf9913dc
98b57685c2
mm: memmap defer init doesn't work as expected
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream.
VMware observed a performance regression during memmap init on their platform, and bisected to commit
19057a6a6b
Merge 5.10.4 into android12-5.10
Changes in 5.10.4
hwmon: (k10temp) Remove support for displaying voltage and current on Zen CPUs
drm/gma500: fix double free of gma_connector
iio: adc: at91_adc: add Kconfig dep on the OF symbol and remove of_match_ptr()
drm/aspeed: Fix Kconfig warning & subsequent build errors
drm/mcde: Fix handling of platform_get_irq() error
drm/tve200: Fix handling of platform_get_irq() error
arm64: dts: renesas: hihope-rzg2-ex: Drop rxc-skew-ps from ethernet-phy node
arm64: dts: renesas: cat875: Remove rxc-skew-ps from ethernet-phy node
soc: renesas: rmobile-sysc: Fix some leaks in rmobile_init_pm_domains()
soc: mediatek: Check if power domains can be powered on at boot time
arm64: dts: mediatek: mt8183: fix gce incorrect mbox-cells value
arm64: dts: ipq6018: update the reserved-memory node
arm64: dts: qcom: sc7180: Fix one forgotten interconnect reference
soc: qcom: geni: More properly switch to DMA mode
Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"
RDMA/bnxt_re: Set queue pair state when being queried
rtc: pcf2127: fix pcf2127_nvmem_read/write() returns
RDMA/bnxt_re: Fix entry size during SRQ create
selinux: fix error initialization in inode_doinit_with_dentry()
ARM: dts: aspeed-g6: Fix the GPIO memory size
ARM: dts: aspeed: s2600wf: Fix VGA memory region location
RDMA/core: Fix error return in _ib_modify_qp()
RDMA/rxe: Compute PSN windows correctly
x86/mm/ident_map: Check for errors from ident_pud_init()
ARM: p2v: fix handling of LPAE translation in BE mode
RDMA/rtrs-clt: Remove destroy_con_cq_qp in case route resolving failed
RDMA/rtrs-clt: Missing error from rtrs_rdma_conn_established
RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex
x86/apic: Fix x2apic enablement without interrupt remapping
ASoC: qcom: fix unsigned int bitwidth compared to less than zero
sched/deadline: Fix sched_dl_global_validate()
sched: Reenable interrupts in do_sched_yield()
drm/amdgpu: fix incorrect enum type
crypto: talitos - Endianess in current_desc_hdr()
crypto: talitos - Fix return type of current_desc_hdr()
crypto: inside-secure - Fix sizeof() mismatch
ASoC: sun4i-i2s: Fix lrck_period computation for I2S justified mode
drm/msm: Add missing stub definition
ARM: dts: aspeed: tiogapass: Remove vuart
drm/amdgpu: fix build_coefficients() argument
powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
spi: img-spfi: fix reference leak in img_spfi_resume
f2fs: call f2fs_get_meta_page_retry for nat page
RDMA/mlx5: Fix corruption of reg_pages in mlx5_ib_rereg_user_mr()
perf test: Use generic event for expand_libpfm_events()
drm/msm/dp: DisplayPort PHY compliance tests fixup
drm/msm/dsi_pll_7nm: restore VCO rate during restore_state
drm/msm/dsi_pll_10nm: restore VCO rate during restore_state
drm/msm/dpu: fix clock scaling on non-sc7180 board
spi: spi-mem: fix reference leak in spi_mem_access_start
scsi: aacraid: Improve compat_ioctl handlers
pinctrl: core: Add missing #ifdef CONFIG_GPIOLIB
ASoC: pcm: DRAIN support reactivation
drm/bridge: tpd12s015: Fix irq registering in tpd12s015_probe
crypto: arm64/poly1305-neon - reorder PAC authentication with SP update
crypto: arm/aes-neonbs - fix usage of cbc(aes) fallback
crypto: caam - fix printing on xts fallback allocation error path
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
nl80211/cfg80211: fix potential infinite loop
spi: stm32: fix reference leak in stm32_spi_resume
bpf: Fix tests for local_storage
x86/mce: Correct the detection of invalid notifier priorities
drm/edid: Fix uninitialized variable in drm_cvt_modes()
ath11k: Initialize complete alpha2 for regulatory change
ath11k: Fix number of rules in filtered ETSI regdomain
ath11k: fix wmi init configuration
brcmfmac: Fix memory leak for unpaired brcmf_{alloc/free}
arm64: dts: exynos: Include common syscon restart/poweroff for Exynos7
arm64: dts: exynos: Correct psci compatible used on Exynos7
drm/panel: simple: Add flags to boe_nv133fhm_n61
Bluetooth: Fix null pointer dereference in hci_event_packet()
Bluetooth: Fix: LL PRivacy BLE device fails to connect
Bluetooth: hci_h5: fix memory leak in h5_close
spi: stm32-qspi: fix reference leak in stm32 qspi operations
spi: spi-ti-qspi: fix reference leak in ti_qspi_setup
spi: mt7621: fix missing clk_disable_unprepare() on error in mt7621_spi_probe
spi: tegra20-slink: fix reference leak in slink ops of tegra20
spi: tegra20-sflash: fix reference leak in tegra_sflash_resume
spi: tegra114: fix reference leak in tegra spi ops
spi: bcm63xx-hsspi: fix missing clk_disable_unprepare() on error in bcm63xx_hsspi_resume
spi: imx: fix reference leak in two imx operations
ASoC: qcom: common: Fix refcounting in qcom_snd_parse_of()
ath11k: Handle errors if peer creation fails
mwifiex: fix mwifiex_shutdown_sw() causing sw reset failure
drm/msm/a6xx: Clear shadow on suspend
drm/msm/a5xx: Clear shadow on suspend
firmware: tegra: fix strncpy()/strncat() confusion
drm/msm/dp: return correct connection status after suspend
drm/msm/dp: skip checking LINK_STATUS_UPDATED bit
drm/msm/dp: do not notify audio subsystem if sink doesn't support audio
selftests/run_kselftest.sh: fix dry-run typo
selftest/bpf: Add missed ip6ip6 test back
ASoC: wm8994: Fix PM disable depth imbalance on error
ASoC: wm8998: Fix PM disable depth imbalance on error
spi: sprd: fix reference leak in sprd_spi_remove
virtiofs fix leak in setup
ASoC: arizona: Fix a wrong free in wm8997_probe
RDMa/mthca: Work around -Wenum-conversion warning
ASoC: SOF: Intel: fix Kconfig dependency for SND_INTEL_DSP_CONFIG
arm64: dts: ti: k3-am65*/j721e*: Fix unit address format error for dss node
MIPS: BCM47XX: fix kconfig dependency bug for BCM47XX_BCMA
drm/amdgpu: fix compute queue priority if num_kcq is less than 4
soc: ti: omap-prm: Do not check rstst bit on deassert if already deasserted
crypto: Kconfig - CRYPTO_MANAGER_EXTRA_TESTS requires the manager
crypto: qat - fix status check in qat_hal_put_rel_rd_xfer()
firmware: arm_scmi: Fix missing destroy_workqueue()
drm/udl: Fix missing error code in udl_handle_damage()
staging: greybus: codecs: Fix reference counter leak in error handling
staging: gasket: interrupt: fix the missed eventfd_ctx_put() in gasket_interrupt.c
scripts: kernel-doc: Restore anonymous enum parsing
drm/amdkfd: Put ACPI table after using it
ionic: use mc sync for multicast filters
ionic: flatten calls to ionic_lif_rx_mode
ionic: change set_rx_mode from_ndo to can_sleep
media: tm6000: Fix sizeof() mismatches
media: platform: add missing put_device() call in mtk_jpeg_clk_init()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_dec_pm()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_release_dec_pm()
media: mtk-vcodec: add missing put_device() call in mtk_vcodec_init_enc_pm()
media: v4l2-fwnode: Return -EINVAL for invalid bus-type
media: v4l2-fwnode: v4l2_fwnode_endpoint_parse caller must init vep argument
media: ov5640: fix support of BT656 bus mode
media: staging: rkisp1: cap: fix runtime PM imbalance on error
media: cedrus: fix reference leak in cedrus_start_streaming
media: platform: add missing put_device() call in mtk_jpeg_probe() and mtk_jpeg_remove()
media: venus: core: change clk enable and disable order in resume and suspend
media: venus: core: vote for video-mem path
media: venus: core: vote with average bandwidth and peak bandwidth as zero
RDMA/cma: Add missing error handling of listen_id
ASoC: meson: fix COMPILE_TEST error
spi: dw: fix build error by selecting MULTIPLEXER
scsi: core: Fix VPD LUN ID designator priorities
media: venus: put dummy vote on video-mem path after last session release
media: solo6x10: fix missing snd_card_free in error handling case
video: fbdev: atmel_lcdfb: fix return error code in atmel_lcdfb_of_init()
mmc: sdhci: tegra: fix wrong unit with busy_timeout
drm/omap: dmm_tiler: fix return error code in omap_dmm_probe()
drm/meson: Free RDMA resources after tearing down DRM
drm/meson: Unbind all connectors on module removal
drm/meson: dw-hdmi: Register a callback to disable the regulator
drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
ASoC: intel: SND_SOC_INTEL_KEEMBAY should depend on ARCH_KEEMBAY
iommu/vt-d: include conditionally on CONFIG_INTEL_IOMMU_SVM
Input: ads7846 - fix race that causes missing releases
Input: ads7846 - fix integer overflow on Rt calculation
Input: ads7846 - fix unaligned access on 7845
bus: mhi: core: Remove double locking from mhi_driver_remove()
bus: mhi: core: Fix null pointer access when parsing MHI configuration
usb/max3421: fix return error code in max3421_probe()
spi: mxs: fix reference leak in mxs_spi_probe
selftests/bpf: Fix broken riscv build
powerpc: Avoid broken GCC __attribute__((optimize))
powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32
ARM: dts: tacoma: Fix node vs reg mismatch for flash memory
Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"
powerpc/powernv/sriov: fix unsigned int win compared to less than zero
mfd: htc-i2cpld: Add the missed i2c_put_adapter() in htcpld_register_chip_i2c()
mfd: MFD_SL28CPLD should depend on ARCH_LAYERSCAPE
mfd: stmfx: Fix dev_err_probe() call in stmfx_chip_init()
mfd: cpcap: Fix interrupt regression with regmap clear_ack
EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId
scsi: ufs: Avoid to call REQ_CLKS_OFF to CLKS_OFF
scsi: ufs: Fix clkgating on/off
rcu: Allow rcu_irq_enter_check_tick() from NMI
rcu,ftrace: Fix ftrace recursion
rcu/tree: Defer kvfree_rcu() allocation to a clean context
crypto: crypto4xx - Replace bitwise OR with logical OR in crypto4xx_build_pd
crypto: omap-aes - Fix PM disable depth imbalance in omap_aes_probe
crypto: sun8i-ce - fix two error path's memory leak
spi: fix resource leak for drivers without .remove callback
drm/meson: dw-hdmi: Disable clocks on driver teardown
drm/meson: dw-hdmi: Enable the iahb clock early enough
PCI: Disable MSI for Pericom PCIe-USB adapter
PCI: brcmstb: Initialize "tmp" before use
soc: ti: knav_qmss: fix reference leak in knav_queue_probe
soc: ti: Fix reference imbalance in knav_dma_probe
drivers: soc: ti: knav_qmss_queue: Fix error return code in knav_queue_probe
soc: qcom: initialize local variable
arm64: dts: qcom: sm8250: correct compatible for sm8250-mtp
arm64: dts: qcom: msm8916-samsung-a2015: Disable muic i2c pin bias
Input: omap4-keypad - fix runtime PM error handling
clk: meson: Kconfig: fix dependency for G12A
staging: mfd: hi6421-spmi-pmic: fix error return code in hi6421_spmi_pmic_probe()
ath11k: Fix the rx_filter flag setting for peer rssi stats
RDMA/cxgb4: Validate the number of CQEs
soundwire: Fix DEBUG_LOCKS_WARN_ON for uninitialized attribute
pinctrl: sunxi: fix irq bank map for the Allwinner A100 pin controller
memstick: fix a double-free bug in memstick_check
ARM: dts: at91: sam9x60: add pincontrol for USB Host
ARM: dts: at91: sama5d4_xplained: add pincontrol for USB Host
ARM: dts: at91: sama5d3_xplained: add pincontrol for USB Host
mmc: pxamci: Fix error return code in pxamci_probe
brcmfmac: fix error return code in brcmf_cfg80211_connect()
orinoco: Move context allocation after processing the skb
qtnfmac: fix error return code in qtnf_pcie_probe()
rsi: fix error return code in rsi_reset_card()
cw1200: fix missing destroy_workqueue() on error in cw1200_init_common
dmaengine: mv_xor_v2: Fix error return code in mv_xor_v2_probe()
arm64: dts: qcom: sdm845: Limit ipa iommu streams
leds: netxbig: add missing put_device() call in netxbig_leds_get_of_pdata()
leds: lp50xx: Fix an error handling path in 'lp50xx_probe_dt()'
leds: turris-omnia: check for LED_COLOR_ID_RGB instead LED_COLOR_ID_MULTI
arm64: tegra: Fix DT binding for IO High Voltage entry
RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind
soundwire: qcom: Fix build failure when slimbus is module
drm/imx/dcss: fix rotations for Vivante tiled formats
media: siano: fix memory leak of debugfs members in smsdvb_hotplug
platform/x86: mlx-platform: Remove PSU EEPROM from default platform configuration
platform/x86: mlx-platform: Remove PSU EEPROM from MSN274x platform configuration
arm64: dts: qcom: sc7180: limit IPA iommu streams
RDMA/hns: Only record vlan info for HIP08
RDMA/hns: Fix missing fields in address vector
RDMA/hns: Avoid setting loopback indicator when smac is same as dmac
serial: 8250-mtk: Fix reference leak in mtk8250_probe
samples: bpf: Fix lwt_len_hist reusing previous BPF map
media: imx214: Fix stop streaming
mips: cdmm: fix use-after-free in mips_cdmm_bus_discover
media: max2175: fix max2175_set_csm_mode() error code
slimbus: qcom-ngd-ctrl: Avoid sending power requests without QMI
RDMA/core: Track device memory MRs
drm/mediatek: Use correct aliases name for ovl
HSI: omap_ssi: Don't jump to free ID in ssi_add_controller()
ARM: dts: Remove non-existent i2c1 from 98dx3236
arm64: dts: armada-3720-turris-mox: update ethernet-phy handle name
power: supply: bq25890: Use the correct range for IILIM register
arm64: dts: rockchip: Set dr_mode to "host" for OTG on rk3328-roc-cc
power: supply: max17042_battery: Fix current_{avg,now} hiding with no current sense
power: supply: axp288_charger: Fix HP Pavilion x2 10 DMI matching
power: supply: bq24190_charger: fix reference leak
genirq/irqdomain: Don't try to free an interrupt that has no mapping
arm64: dts: ls1028a: fix ENETC PTP clock input
arm64: dts: ls1028a: fix FlexSPI clock input
arm64: dts: freescale: sl28: combine SPI MTD partitions
phy: tegra: xusb: Fix usb_phy device driver field
arm64: dts: qcom: c630: Polish i2c-hid devices
arm64: dts: qcom: c630: Fix pinctrl pins properties
PCI: Bounds-check command-line resource alignment requests
PCI: Fix overflow in command-line resource alignment requests
PCI: iproc: Fix out-of-bound array accesses
PCI: iproc: Invalidate correct PAXB inbound windows
arm64: dts: meson: fix spi-max-frequency on Khadas VIM2
arm64: dts: meson-sm1: fix typo in opp table
soc: amlogic: canvas: add missing put_device() call in meson_canvas_get()
scsi: hisi_sas: Fix up probe error handling for v3 hw
scsi: pm80xx: Do not sleep in atomic context
spi: spi-fsl-dspi: Use max_native_cs instead of num_chipselect to set SPI_MCR
ARM: dts: at91: at91sam9rl: fix ADC triggers
RDMA/hns: Fix 0-length sge calculation error
RDMA/hns: Bugfix for calculation of extended sge
mailbox: arm_mhu_db: Fix mhu_db_shutdown by replacing kfree with devm_kfree
soundwire: master: use pm_runtime_set_active() on add
platform/x86: dell-smbios-base: Fix error return code in dell_smbios_init
ASoC: Intel: Boards: tgl_max98373: update TDM slot_width
media: max9271: Fix GPIO enable/disable
media: rdacm20: Enable GPIO1 explicitly
media: i2c: imx219: Selection compliance fixes
ath11k: Don't cast ath11k_skb_cb to ieee80211_tx_info.control
ath11k: Reset ath11k_skb_cb before setting new flags
ath11k: Fix an error handling path
ath10k: Fix the parsing error in service available event
ath10k: Fix an error handling path
ath10k: Release some resources in an error handling path
SUNRPC: rpc_wake_up() should wake up tasks in the correct order
NFSv4.2: condition READDIR's mask for security label based on LSM state
SUNRPC: xprt_load_transport() needs to support the netid "rdma6"
NFSv4: Fix the alignment of page data in the getdeviceinfo reply
net: sunrpc: Fix 'snprintf' return value check in 'do_xprt_debugfs'
lockd: don't use interval-based rebinding over TCP
NFS: switch nfsiod to be an UNBOUND workqueue.
selftests/seccomp: Update kernel config
vfio-pci: Use io_remap_pfn_range() for PCI IO memory
hwmon: (ina3221) Fix PM usage counter unbalance in ina3221_write_enable
f2fs: fix double free of unicode map
media: tvp5150: Fix wrong return value of tvp5150_parse_dt()
media: saa7146: fix array overflow in vidioc_s_audio()
powerpc/perf: Fix crash with is_sier_available when pmu is not set
powerpc/64: Fix an EMIT_BUG_ENTRY in head_64.S
powerpc/xmon: Fix build failure for 8xx
powerpc/perf: Fix to update radix_scope_qual in power10
powerpc/perf: Update the PMU group constraints for l2l3 events in power10
powerpc/perf: Fix the PMU group constraints for threshold events in power10
clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path
clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()
clocksource/drivers/ingenic: Fix section mismatch
clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
iio: hrtimer-trigger: Mark hrtimer to expire in hard interrupt context
libbpf: Sanitise map names before pinning
ARM: dts: at91: sam9x60ek: remove bypass property
ARM: dts: at91: sama5d2: map securam as device
scripts: kernel-doc: fix parsing function-like typedefs
bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
selftests/bpf: Fix invalid use of strncat in test_sockmap
pinctrl: falcon: add missing put_device() call in pinctrl_falcon_probe()
soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe()
arm64: dts: rockchip: Fix UART pull-ups on rk3328
memstick: r592: Fix error return in r592_probe()
MIPS: Don't round up kernel sections size for memblock_add()
mt76: mt7663s: fix a possible ple quota underflow
mt76: mt7915: set fops_sta_stats.owner to THIS_MODULE
mt76: set fops_tx_stats.owner to THIS_MODULE
mt76: dma: fix possible deadlock running mt76_dma_cleanup
net/mlx5: Properly convey driver version to firmware
mt76: fix memory leak if device probing fails
mt76: fix tkip configuration for mt7615/7663 devices
ASoC: jz4740-i2s: add missed checks for clk_get()
ASoC: q6afe-clocks: Add missing parent clock rate
dm ioctl: fix error return code in target_message
ASoC: cros_ec_codec: fix uninitialized memory read
ASoC: atmel: mchp-spdifrx needs COMMON_CLK
ASoC: qcom: fix QDSP6 dependencies, attempt #3
phy: mediatek: allow compile-testing the hdmi phy
phy: renesas: rcar-gen3-usb2: disable runtime pm in case of failure
memory: ti-emif-sram: only build for ARMv7
memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe()
drm/msm: a5xx: Make preemption reset case reentrant
drm/msm: add IOMMU_SUPPORT dependency
clocksource/drivers/arm_arch_timer: Use stable count reader in erratum sne
clocksource/drivers/arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI
cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
cpufreq: st: Add missing MODULE_DEVICE_TABLE
cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
cpufreq: loongson1: Add missing MODULE_ALIAS
cpufreq: scpi: Add missing MODULE_ALIAS
cpufreq: vexpress-spc: Add missing MODULE_ALIAS
cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
macintosh/adb-iop: Always wait for reply message from IOP
macintosh/adb-iop: Send correct poll command
staging: bcm2835: fix vchiq_mmal dependencies
staging: greybus: audio: Fix possible leak free widgets in gbaudio_dapm_free_controls
spi: dw: Fix error return code in dw_spi_bt1_probe()
Bluetooth: btusb: Add the missed release_firmware() in btusb_mtk_setup_firmware()
Bluetooth: btmtksdio: Add the missed release_firmware() in mtk_setup_firmware()
Bluetooth: sco: Fix crash when using BT_SNDMTU/BT_RCVMTU option
block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
Bluetooth: btusb: Fix detection of some fake CSR controllers with a bcdDevice val of 0x0134
platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on some HP x360 models
adm8211: fix error return code in adm8211_probe()
mtd: spi-nor: sst: fix BPn bits for the SST25VF064C
mtd: spi-nor: ignore errors in spi_nor_unlock_all()
mtd: spi-nor: atmel: remove global protection flag
mtd: spi-nor: atmel: fix unlock_all() for AT25FS010/040
arm64: dts: meson: g12b: odroid-n2: fix PHY deassert timing requirements
arm64: dts: meson: fix PHY deassert timing requirements
ARM: dts: meson: fix PHY deassert timing requirements
arm64: dts: meson: g12a: x96-max: fix PHY deassert timing requirements
arm64: dts: meson: g12b: w400: fix PHY deassert timing requirements
clk: fsl-sai: fix memory leak
scsi: qedi: Fix missing destroy_workqueue() on error in __qedi_probe
scsi: pm80xx: Fix error return in pm8001_pci_probe()
scsi: iscsi: Fix inappropriate use of put_device()
seq_buf: Avoid type mismatch for seq_buf_init
scsi: fnic: Fix error return code in fnic_probe()
platform/x86: mlx-platform: Fix item counter assignment for MSN2700, MSN24xx systems
platform/x86: mlx-platform: Fix item counter assignment for MSN2700/ComEx system
ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops
powerpc/pseries/hibernation: remove redundant cacheinfo update
powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK
drm/mediatek: avoid dereferencing a null hdmi_phy on an error message
ASoC: amd: change clk_get() to devm_clk_get() and add missed checks
coresight: remove broken __exit annotations
ASoC: max98390: Fix error codes in max98390_dsm_init()
powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S
usb: ehci-omap: Fix PM disable depth umbalance in ehci_hcd_omap_probe
usb: oxu210hp-hcd: Fix memory leak in oxu_create
speakup: fix uninitialized flush_lock
nfsd: Fix message level for normal termination
NFSD: Fix 5 seconds delay when doing inter server copy
nfs_common: need lock during iterate through the list
x86/kprobes: Restore BTF if the single-stepping is cancelled
scsi: qla2xxx: Fix FW initialization error on big endian machines
scsi: qla2xxx: Fix N2N and NVMe connect retry failure
platform/chrome: cros_ec_spi: Don't overwrite spi::mode
misc: pci_endpoint_test: fix return value of error branch
bus: fsl-mc: add back accidentally dropped error check
bus: fsl-mc: fix error return code in fsl_mc_object_allocate()
fsi: Aspeed: Add mutex to protect HW access
s390/cio: fix use-after-free in ccw_device_destroy_console
iwlwifi: dbg-tlv: fix old length in is_trig_data_contained()
iwlwifi: mvm: hook up missing RX handlers
erofs: avoid using generic_block_bmap
clk: renesas: r8a779a0: Fix R and OSC clocks
can: m_can: m_can_config_endisable(): remove double clearing of clock stop request bit
powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
ALSA: hda/hdmi: fix silent stream for first playback to DP
RDMA/core: Do not indicate device ready when device enablement fails
RDMA/uverbs: Fix incorrect variable type
remoteproc/mediatek: change MT8192 CFG register base
remoteproc/mtk_scp: surround DT device IDs with CONFIG_OF
remoteproc: q6v5-mss: fix error handling in q6v5_pds_enable
remoteproc: qcom: fix reference leak in adsp_start
remoteproc: qcom: pas: fix error handling in adsp_pds_enable
remoteproc: k3-dsp: Fix return value check in k3_dsp_rproc_of_get_memories()
remoteproc: qcom: Fix potential NULL dereference in adsp_init_mmio()
remoteproc/mediatek: unprepare clk if scp_before_load fails
clk: qcom: gcc-sc7180: Use floor ops for sdcc clks
clk: tegra: Fix duplicated SE clock entry
mtd: rawnand: gpmi: fix reference count leak in gpmi ops
mtd: rawnand: meson: Fix a resource leak in init
mtd: rawnand: gpmi: Fix the random DMA timeout issue
samples/bpf: Fix possible hang in xdpsock with multiple threads
fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()
extcon: max77693: Fix modalias string
crypto: atmel-i2c - select CONFIG_BITREVERSE
mac80211: don't set set TDLS STA bandwidth wider than possible
mac80211: fix a mistake check for rx_stats update
ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
irqchip/alpine-msi: Fix freeing of interrupts on allocation error path
irqchip/ti-sci-inta: Fix printing of inta id on probe success
irqchip/ti-sci-intr: Fix freeing of irqs
dmaengine: ti: k3-udma: Correct normal channel offset when uchan_cnt is not 0
RDMA/hns: Limit the length of data copied between kernel and userspace
RDMA/hns: Normalization the judgment of some features
RDMA/hns: Do shift on traffic class when using RoCEv2
gpiolib: irq hooks: fix recursion in gpiochip_irq_unmask
ath11k: Fix incorrect tlvs in scan start command
irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
watchdog: armada_37xx: Add missing dependency on HAS_IOMEM
watchdog: sirfsoc: Add missing dependency on HAS_IOMEM
watchdog: sprd: remove watchdog disable from resume fail path
watchdog: sprd: check busy bit before new loading rather than after that
watchdog: Fix potential dereferencing of null pointer
ubifs: Fix error return code in ubifs_init_authentication()
um: Monitor error events in IRQ controller
um: tty: Fix handling of close in tty lines
um: chan_xterm: Fix fd leak
sunrpc: fix xs_read_xdr_buf for partial pages receive
RDMA/mlx5: Fix MR cache memory leak
RDMA/cma: Don't overwrite sgid_attr after device is released
nfc: s3fwrn5: Release the nfc firmware
drm: mxsfb: Silence -EPROBE_DEFER while waiting for bridge
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
powerpc/ps3: use dma_mapping_error()
perf test: Fix metric parsing test
drm/amdgpu: fix regression in vbios reservation handling on headless
mm/gup: reorganize internal_get_user_pages_fast()
mm/gup: prevent gup_fast from racing with COW during fork
mm/gup: combine put_compound_head() and unpin_user_page()
mm: memcg/slab: fix return of child memcg objcg for root memcg
mm: memcg/slab: fix use after free in obj_cgroup_charge
mm/rmap: always do TTU_IGNORE_ACCESS
sparc: fix handling of page table constructor failure
mm/vmalloc: Fix unlock order in s_stop()
mm/vmalloc.c: fix kasan shadow poisoning size
mm,memory_failure: always pin the page in madvise_inject_error
hugetlb: fix an error code in hugetlb_reserve_pages()
mm: don't wake kswapd prematurely when watermark boosting is disabled
proc: fix lookup in /proc/net subdirectories after setns(2)
checkpatch: fix unescaped left brace
s390/test_unwind: fix CALL_ON_STACK tests
lan743x: fix rx_napi_poll/interrupt ping-pong
ice, xsk: clear the status bits for the next_to_use descriptor
i40e, xsk: clear the status bits for the next_to_use descriptor
net: dsa: qca: ar9331: fix sleeping function called from invalid context bug
dpaa2-eth: fix the size of the mapped SGT buffer
net: bcmgenet: Fix a resource leak in an error handling path in the probe functin
net: mscc: ocelot: Fix a resource leak in the error handling path of the probe function
net: allwinner: Fix some resources leak in the error handling path of the probe and in the remove function
block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
block/rnbd-clt: Fix possible memleak
NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
net: korina: fix return value
devlink: use _BITUL() macro instead of BIT() in the UAPI header
libnvdimm/label: Return -ENXIO for no slot in __blk_label_update
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
watchdog: qcom: Avoid context switch in restart handler
watchdog: coh901327: add COMMON_CLK dependency
clk: ti: Fix memleak in ti_fapll_synth_setup
pwm: zx: Add missing cleanup in error path
pwm: lp3943: Dynamically allocate PWM chip base
pwm: imx27: Fix overflow for bigger periods
pwm: sun4i: Remove erroneous else branch
io_uring: cancel only requests of current task
tools build: Add missing libcap to test-all.bin target
perf record: Fix memory leak when using '--user-regs=?' to list registers
qlcnic: Fix error code in probe
nfp: move indirect block cleanup to flower app stop callback
vdpa/mlx5: Use write memory barrier after updating CQ index
virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
virtio_net: Fix error code in probe()
virtio_ring: Fix two use after free bugs
vhost scsi: fix error return code in vhost_scsi_set_endpoint()
epoll: check for events when removing a timed out thread from the wait queue
clk: bcm: dvp: Add MODULE_DEVICE_TABLE()
clk: at91: sama7g5: fix compilation error
clk: at91: sam9x60: remove atmel,osc-bypass support
clk: s2mps11: Fix a resource leak in error handling paths in the probe function
clk: sunxi-ng: Make sure divider tables have sentinel
clk: vc5: Use "idt,voltage-microvolt" instead of "idt,voltage-microvolts"
kconfig: fix return value of do_error_if()
powerpc/boot: Fix build of dts/fsl
powerpc/smp: Add __init to init_big_cores()
ARM: 9044/1: vfp: use undef hook for VFP support detection
ARM: 9036/1: uncompress: Fix dbgadtb size parameter name
perf probe: Fix memory leak when synthesizing SDT probes
io_uring: fix racy IOPOLL flush overflow
io_uring: cancel reqs shouldn't kill overflow list
Smack: Handle io_uring kernel thread privileges
proc mountinfo: make splice available again
io_uring: fix io_cqring_events()'s noflush
io_uring: fix racy IOPOLL completions
io_uring: always let io_iopoll_complete() complete polled io
vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
media: gspca: Fix memory leak in probe
io_uring: fix io_wqe->work_list corruption
io_uring: fix 0-iov read buffer select
io_uring: hold uring_lock while completing failed polled io in io_wq_submit_work()
io_uring: fix ignoring xa_store errors
io_uring: fix double io_uring free
io_uring: make ctx cancel on exit targeted to actual ctx
media: sunxi-cir: ensure IR is handled when it is continuous
media: netup_unidvb: Don't leak SPI master in probe error path
media: ipu3-cio2: Remove traces of returned buffers
media: ipu3-cio2: Return actual subdev format
media: ipu3-cio2: Serialise access to pad format
media: ipu3-cio2: Validate mbus format in setting subdev format
media: ipu3-cio2: Make the field on subdev format V4L2_FIELD_NONE
Input: cyapa_gen6 - fix out-of-bounds stack access
ALSA: hda/ca0132 - Change Input Source enum strings.
ACPI: NFIT: Fix input validation of bus-family
PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
Revert "ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks"
ACPI: PNP: compare the string length in the matching_id()
ALSA: hda: Fix regressions on clear and reconfig sysfs
ALSA: hda/ca0132 - Fix AE-5 rear headphone pincfg.
ALSA: hda/realtek: make bass spk volume adjustable on a yoga laptop
ALSA: hda/realtek - Enable headset mic of ASUS X430UN with ALC256
ALSA: hda/realtek - Enable headset mic of ASUS Q524UQK with ALC255
ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
ALSA: pcm: oss: Fix a few more UBSAN fixes
ALSA/hda: apply jack fixup for the Acer Veriton N4640G/N6640G/N2510G
ALSA: hda/realtek: Add quirk for MSI-GP73
ALSA: hda/realtek: Apply jack fixup for Quanta NL3
ALSA: hda/realtek: Remove dummy lineout on Acer TravelMate P648/P658
ALSA: hda/realtek - Supported Dell fixed type headset
ALSA: usb-audio: Add VID to support native DSD reproduction on FiiO devices
ALSA: usb-audio: Disable sample read check if firmware doesn't give back
ALSA: usb-audio: Add alias entry for ASUS PRIME TRX40 PRO-S
ALSA: core: memalloc: add page alignment for iram
s390/smp: perform initial CPU reset also for SMT siblings
s390/kexec_file: fix diag308 subcode when loading crash kernel
s390/idle: add missing mt_cycles calculation
s390/idle: fix accounting with machine checks
s390/dasd: fix hanging device offline processing
s390/dasd: prevent inconsistent LCU device data
s390/dasd: fix list corruption of pavgroup group list
s390/dasd: fix list corruption of lcu list
binder: add flag to clear buffer on txn complete
ASoC: cx2072x: Fix doubly definitions of Playback and Capture streams
ASoC: AMD Renoir - add DMI table to avoid the ACP mic probe (broken BIOS)
ASoC: AMD Raven/Renoir - fix the PCI probe (PCI revision)
staging: comedi: mf6x4: Fix AI end-of-conversion detection
z3fold: simplify freeing slots
z3fold: stricter locking and more careful reclaim
perf/x86/intel: Add event constraint for CYCLE_ACTIVITY.STALLS_MEM_ANY
perf/x86/intel: Fix rtm_abort_event encoding on Ice Lake
perf/x86/intel/lbr: Fix the return type of get_lbr_cycles()
powerpc/perf: Exclude kernel samples while counting events in user space.
cpufreq: intel_pstate: Use most recent guaranteed performance values
crypto: ecdh - avoid unaligned accesses in ecdh_set_secret()
crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata
m68k: Fix WARNING splat in pmac_zilog driver
Documentation: seqlock: s/LOCKTYPE/LOCKNAME/g
EDAC/i10nm: Use readl() to access MMIO registers
EDAC/amd64: Fix PCI component registration
cpuset: fix race between hotplug work and later CPU offline
dyndbg: fix use before null check
USB: serial: mos7720: fix parallel-port state restore
USB: serial: digi_acceleport: fix write-wakeup deadlocks
USB: serial: keyspan_pda: fix dropped unthrottle interrupts
USB: serial: keyspan_pda: fix write deadlock
USB: serial: keyspan_pda: fix stalled writes
USB: serial: keyspan_pda: fix write-wakeup use-after-free
USB: serial: keyspan_pda: fix tx-unthrottle use-after-free
USB: serial: keyspan_pda: fix write unthrottling
btrfs: do not shorten unpin len for caching block groups
btrfs: update last_byte_to_unpin in switch_commit_roots
btrfs: fix race when defragmenting leads to unnecessary IO
ext4: fix an IS_ERR() vs NULL check
ext4: fix a memory leak of ext4_free_data
ext4: fix deadlock with fs freezing and EA inodes
ext4: don't remount read-only with errors=continue on reboot
RISC-V: Fix usage of memblock_enforce_memory_limit
arm64: dts: ti: k3-am65: mark dss as dma-coherent
arm64: dts: marvell: keep SMMU disabled by default for Armada 7040 and 8040
KVM: arm64: Introduce handling of AArch32 TTBCR2 traps
KVM: x86: reinstate vendor-agnostic check on SPEC_CTRL cpuid bits
KVM: SVM: Remove the call to sev_platform_status() during setup
iommu/arm-smmu: Allow implementation specific write_s2cr
iommu/arm-smmu-qcom: Read back stream mappings
iommu/arm-smmu-qcom: Implement S2CR quirk
ARM: dts: pandaboard: fix pinmux for gpio user button of Pandaboard ES
ARM: dts: at91: sama5d2: fix CAN message ram offset and size
ARM: tegra: Populate OPP table for Tegra20 Ventana
xprtrdma: Fix XDRBUF_SPARSE_PAGES support
powerpc/32: Fix vmap stack - Properly set r1 before activating MMU on syscall too
powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter
powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()
powerpc/feature: Add CPU_FTR_NOEXECUTE to G2_LE
powerpc/xmon: Change printk() to pr_cont()
powerpc/8xx: Fix early debug when SMC1 is relocated
powerpc/mm: Fix verification of MMU_FTR_TYPE_44x
powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
powerpc/powernv/memtrace: Don't leak kernel memory to user space
powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
ovl: make ioctl() safe
ima: Don't modify file descriptor mode on the fly
um: Remove use of asprinf in umid.c
um: Fix time-travel mode
ceph: fix race in concurrent __ceph_remove_cap invocations
SMB3: avoid confusing warning message on mount to Azure
SMB3.1.1: remove confusing mount warning when no SPNEGO info on negprot rsp
SMB3.1.1: do not log warning message if server doesn't populate salt
ubifs: wbuf: Don't leak kernel memory to flash
jffs2: Fix GC exit abnormally
jffs2: Fix ignoring mounting options problem during remounting
fsnotify: generalize handle_inode_event()
inotify: convert to handle_inode_event() interface
fsnotify: fix events reported to watching parent and child
jfs: Fix array index bounds check in dbAdjTree
drm/panfrost: Fix job timeout handling
drm/panfrost: Move the GPU reset bits outside the timeout handler
platform/x86: mlx-platform: remove an unused variable
drm/amdgpu: only set DP subconnector type on DP and eDP connectors
drm/amd/display: Fix memory leaks in S3 resume
drm/dp_aux_dev: check aux_dev before use in drm_dp_aux_dev_get_by_minor()
drm/i915: Fix mismatch between misplaced vma check and vma insert
iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack
spi: pxa2xx: Fix use-after-free on unbind
spi: spi-sh: Fix use-after-free on unbind
spi: atmel-quadspi: Fix use-after-free on unbind
spi: spi-mtk-nor: Don't leak SPI master in probe error path
spi: ar934x: Don't leak SPI master in probe error path
spi: davinci: Fix use-after-free on unbind
spi: fsl: fix use of spisel_boot signal on MPC8309
spi: gpio: Don't leak SPI master in probe error path
spi: mxic: Don't leak SPI master in probe error path
spi: npcm-fiu: Disable clock in probe error path
spi: pic32: Don't leak DMA channels in probe error path
spi: rb4xx: Don't leak SPI master in probe error path
spi: rpc-if: Fix use-after-free on unbind
spi: sc18is602: Don't leak SPI master in probe error path
spi: spi-geni-qcom: Fix use-after-free on unbind
spi: spi-qcom-qspi: Fix use-after-free on unbind
spi: st-ssc4: Fix unbalanced pm_runtime_disable() in probe error path
spi: synquacer: Disable clock in probe error path
spi: mt7621: Disable clock in probe error path
spi: mt7621: Don't leak SPI master in probe error path
spi: atmel-quadspi: Disable clock in probe error path
spi: atmel-quadspi: Fix AHB memory accesses
soc: qcom: smp2p: Safely acquire spinlock without IRQs
mtd: spinand: Fix OOB read
mtd: parser: cmdline: Fix parsing of part-names with colons
mtd: core: Fix refcounting for unpartitioned MTDs
mtd: rawnand: qcom: Fix DMA sync on FLASH_STATUS register read
mtd: rawnand: meson: fix meson_nfc_dma_buffer_release() arguments
scsi: qla2xxx: Fix crash during driver load on big endian machines
scsi: lpfc: Fix invalid sleeping context in lpfc_sli4_nvmet_alloc()
scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi
scsi: lpfc: Re-fix use after free in lpfc_rq_buf_free()
openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
iio: buffer: Fix demux update
iio: adc: rockchip_saradc: fix missing clk_disable_unprepare() on error in rockchip_saradc_resume
iio: imu: st_lsm6dsx: fix edge-trigger interrupts
iio:light:rpr0521: Fix timestamp alignment and prevent data leak.
iio:light:st_uvis25: Fix timestamp alignment and prevent data leak.
iio:magnetometer:mag3110: Fix alignment and data leak issues.
iio:pressure:mpl3115: Force alignment of buffer
iio:imu:bmi160: Fix too large a buffer.
iio:imu:bmi160: Fix alignment and data leak issues
iio:adc:ti-ads124s08: Fix buffer being too long.
iio:adc:ti-ads124s08: Fix alignment and data leak issues.
md/cluster: block reshape with remote resync job
md/cluster: fix deadlock when node is doing resync job
pinctrl: sunxi: Always call chained_irq_{enter, exit} in sunxi_pinctrl_irq_handler
clk: ingenic: Fix divider calculation with div tables
clk: mvebu: a3700: fix the XTAL MODE pin to MPP1_9
clk: tegra: Do not return 0 on failure
counter: microchip-tcb-capture: Fix CMR value check
device-dax/core: Fix memory leak when rmmod dax.ko
dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
driver: core: Fix list corruption after device_del()
xen-blkback: set ring->xenblkd to NULL after kthread_stop()
xen/xenbus: Allow watches discard events before queueing
xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()
xen/xenbus/xen_bus_type: Support will_handle watch callback
xen/xenbus: Count pending messages for each watch
xenbus/xenbus_backend: Disallow pending watch messages
memory: jz4780_nemc: Fix an error pointer vs NULL check in probe()
memory: renesas-rpc-if: Fix a node reference leak in rpcif_probe()
memory: renesas-rpc-if: Return correct value to the caller of rpcif_manual_xfer()
memory: renesas-rpc-if: Fix unbalanced pm_runtime_enable in rpcif_{enable,disable}_rpm
libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labels
platform/x86: intel-vbtn: Allow switch events on Acer Switch Alpha 12
tracing: Disable ftrace selftests when any tracer is running
mt76: add back the SUPPORTS_REORDERING_BUFFER flag
of: fix linker-section match-table corruption
PCI: Fix pci_slot_release() NULL pointer dereference
regulator: axp20x: Fix DLDO2 voltage control register mask for AXP22x
remoteproc: sysmon: Ensure remote notification ordering
thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed
rtc: ep93xx: Fix NULL pointer dereference in ep93xx_rtc_read_time
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
null_blk: Fix zone size initialization
null_blk: Fail zone append to conventional zones
drm/edid: fix objtool warning in drm_cvt_modes()
x86/CPU/AMD: Save AMD NodeId as cpu_die_id
Linux 5.10.4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I25209e79d8b9faf5382087955a29b7404bdefe38

bd3f4b6fd9
mm: don't wake kswapd prematurely when watermark boosting is disabled
[ Upstream commit 597c892038e08098b17ccfe65afd9677e6979800 ]
On 2-node NUMA hosts we see bursts of kswapd reclaim and subsequent
pressure spikes and stalls from cache refaults while there is plenty of
free memory in the system.

Usually, kswapd is woken up when all eligible nodes in an allocation are
full. But the code related to watermark boosting can wake kswapd on one
full node while the other one is mostly empty. This may be justified to
fight fragmentation, but it is currently done unconditionally, whether or
not watermark boosting is occurring.

In our case, many of our workloads' throughput scales with available
memory, and pure utilization is a more tangible concern than trends
around longer-term fragmentation. As a result, we generally disable
watermark boosting.

Wake kswapd only when watermark boosting is requested.

Link: https://lkml.kernel.org/r/20201020175833.397286-1-hannes@cmpxchg.org
Fixes:

3d7ab504ec
ANDROID: mm: add cma pcp list
Add a PCP list for __GFP_CMA allocations so as not to deprive MIGRATE_MOVABLE allocations of quick access to order-zero pages. Bug: 158645321 Signed-off-by: Liam Mark <lmark@codeaurora.org> Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Change-Id: I601f686097de733dedeb1c47b00693bcc25829ed

7ff00a49a2
ANDROID: cma: redirect page allocation to CMA
CMA pages are designed to be used as a fallback for movable allocations and cannot be used for non-movable allocations. If CMA pages are utilized poorly, non-movable allocations may end up getting starved if all regular movable pages are allocated and the only pages left are CMA. Always using CMA pages first creates unacceptable performance problems. As a midway alternative, use CMA pages for certain userspace allocations. The userspace pages can be migrated or dropped quickly, which gives decent utilization. Additionally, add fallbacks for failed CMA allocations in rmqueue() and __rmqueue_pcplist() (the latter addition being driven by a report by the kernel test robot); these fallbacks were dealt with differently in the original version of the patch, as the rmqueue() call chain has changed. Bug: 158645321 Link: https://lore.kernel.org/lkml/cover.1604282969.git.cgoldswo@codeaurora.org/ Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Heesub Shin <heesub.shin@samsung.com> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> [cgoldswo@codeaurora.org: Place in bugfixes; remove cma_alloc zone flag] Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Change-Id: Ibca5eedfc5eacd44542ad483851d741166715f84

d53cfb36d9
Merge 4d02da974e ("Merge tag 'net-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") into android-mainline
Steps on the way to 5.10-rc5 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I00726ee0d08f08ae6ac5edd07c8fa502b41d4800

d8c19014bb
page_frag: Recover from memory pressure
An ethernet driver may allocate an skb (and skb->data) via napi_alloc_skb().
This ends up calling page_frag_alloc() to allocate skb->data from
page_frag_cache->va.
Under memory pressure, page_frag_cache->va may be allocated as a
pfmemalloc page. As a result, skb->pfmemalloc is always true, because
skb->data comes from page_frag_cache->va. The skb will be dropped if the
sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
under memory pressure.
However, once the kernel is no longer under memory pressure (suppose a large
amount of memory pages have just been reclaimed), page_frag_alloc() may still
re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
result, skb->pfmemalloc stays true until page_frag_cache->va is
re-allocated, even though the kernel is no longer under memory pressure.
Here is how the kernel runs into the issue.
1. The kernel is under memory pressure and allocation of
PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
a pfmemalloc page is allocated for page_frag_cache->va.
2. All skb->data from page_frag_cache->va (pfmemalloc) will have
skb->pfmemalloc=true. The skb will always be dropped by a sock without
SOCK_MEMALLOC. This is expected behaviour.
3. Suppose a large amount of pages are reclaimed and the kernel is no longer
under memory pressure. We expect skb->pfmemalloc drops to stop.
4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_cache->va and will always re-use the prior pfmemalloc page. So
skb->pfmemalloc stays true even though the kernel is no longer under memory
pressure.
Fix this by freeing and re-allocating the page instead of recycling it.
References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Bert Barbe <bert.barbe@oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: SRINIVAS <srinivas.eeda@oracle.com>
Fixes:
|
||
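The fix can be sketched with a small user-space model, assuming a cache that carves fragments downwards from a single page and that every fragment of an exhausted page has already been freed (the real page_frag_alloc() tracks this with a reference bias). `struct frag_cache`, `frag_alloc()` and `refill()` are hypothetical; `under_pressure` stands in for whether the page allocator had to dip into pfmemalloc reserves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define FRAG_PAGE_SIZE 4096u

/* Hypothetical stand-in for a page backing the fragment cache. */
struct frag_page {
	bool pfmemalloc;	/* allocated from emergency reserves */
};

/* Hypothetical, much simplified model of struct page_frag_cache. */
struct frag_cache {
	struct frag_page *page;
	unsigned int offset;	/* free bytes left; fragments are carved downwards */
};

/* Stand-in for the page allocator: pfmemalloc only under memory pressure. */
static struct frag_page *refill(bool under_pressure)
{
	struct frag_page *p = malloc(sizeof(*p));

	if (p)
		p->pfmemalloc = under_pressure;
	return p;
}

/* Returns 1 if the fragment sits on a pfmemalloc page, 0 if not, -1 on failure. */
static int frag_alloc(struct frag_cache *nc, unsigned int size, bool under_pressure)
{
	if (size > FRAG_PAGE_SIZE)
		return -1;
	if (!nc->page) {
		nc->page = refill(under_pressure);
		if (!nc->page)
			return -1;
		nc->offset = FRAG_PAGE_SIZE;
	}
	if (nc->offset < size) {
		/* Page used up; assume all fragments carved from it were freed. */
		if (nc->page->pfmemalloc) {
			/* The fix: do not recycle a pfmemalloc page, get a fresh one. */
			free(nc->page);
			nc->page = refill(under_pressure);
			if (!nc->page)
				return -1;
		}
		nc->offset = FRAG_PAGE_SIZE;	/* recycle, or start the new page */
	}
	nc->offset -= size;
	return nc->page->pfmemalloc ? 1 : 0;
}
```

Without the `pfmemalloc` check, calling `frag_alloc(&nc, 4096, true)` and then `frag_alloc(&nc, 4096, false)` would keep returning 1, which is the stale state described in step 4.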
|
|
05d2a661fd |
Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1 Resolves conflicts in: fs/userfaultfd.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af |
||
|
|
5a8acc99f7 |
Merge 9ff9b0d392 ("Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline
Steps on the way to 5.10-rc1 Resolves merge issues in: drivers/net/virtio_net.c net/xfrm/xfrm_state.c net/xfrm/xfrm_user.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3132e7802f25cb775eb02d0b3a03068da39a6fe2 |
||
|
|
75c90a8c3a |
Merge d5660df4a5 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
steps on the way to 5.10-rc1 Change-Id: Iddc84c25b6a9d71fa8542b927d6f69c364131c3d Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
c4cf498dc0 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: "155 patches. Subsystems affected by this patch series: mm (dax, debug, thp, readahead, page-poison, util, memory-hotplug, zram, cleanups), misc, core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch, binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan, romfs, and fault-injection" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits) lib, uaccess: add failure injection to usercopy functions lib, include/linux: add usercopy failure capability ROMFS: support inode blocks calculation ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang sched.h: drop in_ubsan field when UBSAN is in trap mode scripts/gdb/tasks: add headers and improve spacing format scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command kernel/relay.c: drop unneeded initialization panic: dump registers on panic_on_warn rapidio: fix the missed put_device() for rio_mport_add_riodev rapidio: fix error handling path nilfs2: fix some kernel-doc warnings for nilfs2 autofs: harden ioctl table ramfs: fix nommu mmap with gaps in the page cache mm: remove the now-unnecessary mmget_still_valid() hack mm/gup: take mmap_lock in get_dump_page() binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot coredump: rework elf/elf_fdpic vma_dump_size() into common helper coredump: refactor page range dumping into common helper coredump: let dump_emit() bail out on short writes ... |
||
|
|
ab130f9108 |
mm: rename page_order() to buddy_order()
The current page_order() can only be called on pages in the buddy allocator. For compound pages, you have to use compound_order(). This is confusing and led to a bug, so rename page_order() to buddy_order(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7fef431be9 |
mm/page_alloc: place pages to tail in __free_pages_core()
__free_pages_core() is used when exposing fresh memory to the buddy during system boot and when onlining memory in generic_online_page(). generic_online_page() is used in two cases: 1. Direct memory onlining in online_pages(). 2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV balloon and virtio-mem), when parts of a section are kept fake-offline to be fake-onlined later on. In 1, we already place pages to the tail of the freelist. Pages will be freed to MIGRATE_ISOLATE lists first and moved to the tail of the freelists via undo_isolate_page_range(). In 2, we currently don't implement a proper rule. In the case of virtio-mem, where we currently always online MAX_ORDER - 1 pages, the pages will be placed to the HEAD of the freelist - undesirable. While the hyper-v balloon calls generic_online_page() with single pages, usually it will call it on successive single pages in a larger block. The pages are fresh, so place them to the tail of the freelist and avoid the PCP. In __free_pages_core(), remove the now superfluous call to set_page_refcounted() and add a comment regarding page initialization and the refcount. Note: In 2. we currently don't shuffle. If ever relevant (page shuffling is usually of limited use in virtualized environments), we might want to shuffle after a sequence of generic_online_page() calls in the relevant callers. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "K. Y.
Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Scott Cheloha <cheloha@linux.ibm.com> Link: https://lkml.kernel.org/r/20201005121534.15649-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
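Why tail placement helps can be shown with a toy freelist from which the allocator always takes the head. This is a sketch under simplifying assumptions (a fixed-size array instead of a linked list, no orders or migratetypes); `struct freelist`, `fl_add()` and `fl_alloc()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FL_CAP 16

/* Hypothetical freelist model; index 0 is the head. */
struct freelist {
	unsigned long pfn[FL_CAP];
	int n;
};

/* Free a page to the head (likely cache-hot) or to the tail (fresh/cold). */
static bool fl_add(struct freelist *fl, unsigned long pfn, bool to_tail)
{
	if (fl->n == FL_CAP)
		return false;
	if (to_tail) {
		fl->pfn[fl->n] = pfn;
	} else {
		memmove(&fl->pfn[1], &fl->pfn[0], fl->n * sizeof(fl->pfn[0]));
		fl->pfn[0] = pfn;
	}
	fl->n++;
	return true;
}

/* The allocator always takes the page at the head of the freelist. */
static bool fl_alloc(struct freelist *fl, unsigned long *pfn)
{
	if (fl->n == 0)
		return false;
	*pfn = fl->pfn[0];
	fl->n--;
	memmove(&fl->pfn[0], &fl->pfn[1], fl->n * sizeof(fl->pfn[0]));
	return true;
}
```

With freshly onlined pfns added with `to_tail = true`, a previously freed page is handed out first and the new memory is used last; adding them at the head instead makes the new memory the very next allocation target, which is what turns it un-removable.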
|
|
293ffa5ebb |
mm/page_alloc: move pages to tail in move_to_free_list()
Whenever we move pages between freelists via move_to_free_list()/ move_freepages_block(), we don't actually touch the pages: 1. Page isolation doesn't actually touch the pages; it simply isolates pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist. When undoing isolation, we move the pages back to the target list. 2. Page stealing (steal_suitable_fallback()) moves free pages directly between lists without touching them. 3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves free pages directly between freelists without touching them. We already place pages to the tail of the freelists when undoing isolation via __putback_isolated_page(); let's do it in any case (e.g., if order <= pageblock_order) and document the behavior. To simplify, let's move the pages to the tail for all move_to_free_list()/move_freepages_block() users. In 2., the target list is empty, so there should be no change. In 3., we might observe a change; however, highatomic is more concerned about allocations succeeding than cache hotness - if we ever realize this change degrades a workload, we can special-case this instance and add a proper comment. This change results in all pages onlined via online_pages() being placed at the tail of the freelist. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y.
Srinivasan" <kys@microsoft.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Link: https://lkml.kernel.org/r/20201005121534.15649-4-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
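A small user-space sketch of "moving without touching": it re-implements a subset of the kernel's `<linux/list.h>` semantics (`list_add()`, `list_add_tail()`, `list_move_tail()`) and uses a hypothetical `move_to_free_list_model()` to show that a page moved between freelists is only relinked at the tail of the target list, so pages already at the head of that list are still allocated first. The real move_to_free_list() also takes the zone and order; this is not that code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal re-implementation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Link n between two adjacent entries prev and next. */
static void list_link_between(struct list_head *n, struct list_head *prev,
			      struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	list_link_between(n, h, h->next);	/* new first entry */
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	list_link_between(n, h->prev, h);	/* new last entry */
}

static void list_move_tail(struct list_head *e, struct list_head *h)
{
	e->prev->next = e->next;		/* unlink from the current list */
	e->next->prev = e->prev;
	list_add_tail(e, h);
}

/* Hypothetical free page: list linkage plus a pfn, no contents. */
struct free_page {
	struct list_head lru;
	unsigned long pfn;
};

#define NR_MT_MODEL 3	/* hypothetical number of migratetype lists */

struct free_area_model {
	struct list_head free_list[NR_MT_MODEL];
};

/* Relink the page at the tail of the target list; its contents are never read. */
static void move_to_free_list_model(struct free_page *p,
				    struct free_area_model *area, int mt)
{
	list_move_tail(&p->lru, &area->free_list[mt]);
}

/* First page on a list (what the allocator would take next), or NULL. */
static struct free_page *first_free_page(struct list_head *h)
{
	if (h->next == h)
		return NULL;
	return (struct free_page *)((char *)h->next - offsetof(struct free_page, lru));
}
```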
|
|
47b6a24a23 |
mm/page_alloc: place pages to tail in __putback_isolated_page()
__putback_isolated_page() already documents that pages will be placed to the tail of the freelist - this is, however, not the case for "order >= MAX_ORDER - 2" (see buddy_merge_likely()) - which should be the case for all existing users. This change affects two users: - free page reporting - page isolation, when undoing the isolation (including memory onlining). This behavior is desirable for pages that haven't really been touched lately, so exactly the two users that don't actually read/write page content, but rather move untouched pages. The new behavior is especially desirable for memory onlining, where we allow allocation of newly onlined pages via undo_isolate_page_range() in online_pages(). Right now, we always place them to the head of the freelist, resulting in undesirable behavior: Assume we add individual memory chunks via add_memory() and online them right away to the NORMAL zone. We create a dependency chain of unmovable allocations e.g., via the memmap. The memmap of the next chunk will be placed onto previous chunks - if the last block cannot get offlined+removed, all dependent ones cannot get offlined+removed. While this can already be observed with individual DIMMs, it's more of an issue for virtio-mem (and I suspect also ppc DLPAR). Document that this should only be used for optimizations, and no code should rely on this behavior for correctness (if the order of the freelists ever changes). We won't care about page shuffling: memory onlining already properly shuffles after onlining. free page reporting doesn't care about physically contiguous ranges, and there are already cases where page isolation will simply move (physically close) free pages to (currently) the head of the freelists via move_freepages_block() instead of shuffling. If this ever becomes relevant, we should shuffle the whole zone when undoing isolation of larger ranges, and after free_contig_range().
Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Link: https://lkml.kernel.org/r/20201005121534.15649-3-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
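The head-or-tail decision that this commit forces for __putback_isolated_page() can be modelled as follows. `choose_to_tail()`, `higher_buddy_pfn()` and the local `FPI_TO_TAIL` and `MAX_ORDER` values are assumptions for illustration (MAX_ORDER = 11 was a common default, not a universal one), and page shuffling is ignored. The point is the gap described above: without a tail flag, any order >= MAX_ORDER - 2 lands at the head.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11		/* assumption: common default of that era */
#define FPI_TO_TAIL (1u << 1)	/* local stand-in for the series' tail flag */

static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * pfn whose free state the merge heuristic inspects: the order + 1 buddy of
 * the block formed by merging pfn with its buddy at this order.
 */
static unsigned long higher_buddy_pfn(unsigned long pfn, unsigned int order)
{
	unsigned long combined = pfn & find_buddy_pfn(pfn, order);

	return find_buddy_pfn(combined, order + 1);
}

/*
 * With FPI_TO_TAIL the answer is always "tail". Otherwise the heuristic gives
 * up for order >= MAX_ORDER - 2 (the page goes to the head) and for lower
 * orders follows whether the higher buddy is free.
 */
static bool choose_to_tail(unsigned int fpi_flags, unsigned int order,
			   bool higher_buddy_free)
{
	if (fpi_flags & FPI_TO_TAIL)
		return true;
	if (order >= MAX_ORDER - 2)
		return false;
	return higher_buddy_free;
}
```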
|
|
f04a5d5d91 |
mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag
Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2. When adding separate memory blocks via add_memory*() and onlining them immediately, the metadata (especially the memmap) of the next block will be placed onto one of the just added+onlined blocks. This creates a chain of unmovable allocations: if the last memory block cannot get offlined+removed, neither can any of the dependent ones. We directly have unmovable allocations all over the place. This can be observed quite easily using virtio-mem; however, it can also be observed when using DIMMs. The freshly onlined pages will usually be placed at the head of the freelists, meaning they will be allocated next, usually turning the just-added memory immediately un-removable. The fresh pages are cold, so preferring to allocate others (that might be hot) also feels like the natural thing to do. It also applies to the hyper-v balloon, xen-balloon, and ppc64 dlpar: when adding separate, successive memory blocks, each memory block will have unmovable allocations on it; for example, gigantic pages will fail to allocate. While ZONE_NORMAL doesn't provide any guarantees that memory can get offlined+removed again (any kind of fragmentation with unmovable allocations is possible), there are many scenarios (hotplugging a lot of memory, running a workload, hotunplugging some memory/as much as possible) where we can offline+remove quite a lot with this patchset.
a) To visualize the problem, a very simple example: Start a VM with 4GB and 8GB of virtio-mem memory:
  [root@localhost ~]# lsmem
  RANGE                                 SIZE  STATE REMOVABLE   BLOCK
  0x0000000000000000-0x00000000bfffffff   3G online       yes    0-23
  0x0000000100000000-0x000000033fffffff   9G online       yes  32-103
  Memory block size: 128M
  Total online memory: 12G
  Total offline memory: 0B
Then try to unplug as much as possible using virtio-mem. Observe which memory blocks are still around. Without this patch set:
  [root@localhost ~]# lsmem
  RANGE                                 SIZE  STATE REMOVABLE   BLOCK
  0x0000000000000000-0x00000000bfffffff   3G online       yes    0-23
  0x0000000100000000-0x000000013fffffff   1G online       yes   32-39
  0x0000000148000000-0x000000014fffffff 128M online       yes      41
  0x0000000158000000-0x000000015fffffff 128M online       yes      43
  0x0000000168000000-0x000000016fffffff 128M online       yes      45
  0x0000000178000000-0x000000017fffffff 128M online       yes      47
  0x0000000188000000-0x0000000197ffffff 256M online       yes   49-50
  0x00000001a0000000-0x00000001a7ffffff 128M online       yes      52
  0x00000001b0000000-0x00000001b7ffffff 128M online       yes      54
  0x00000001c0000000-0x00000001c7ffffff 128M online       yes      56
  0x00000001d0000000-0x00000001d7ffffff 128M online       yes      58
  0x00000001e0000000-0x00000001e7ffffff 128M online       yes      60
  0x00000001f0000000-0x00000001f7ffffff 128M online       yes      62
  0x0000000200000000-0x0000000207ffffff 128M online       yes      64
  0x0000000210000000-0x0000000217ffffff 128M online       yes      66
  0x0000000220000000-0x0000000227ffffff 128M online       yes      68
  0x0000000230000000-0x0000000237ffffff 128M online       yes      70
  0x0000000240000000-0x0000000247ffffff 128M online       yes      72
  0x0000000250000000-0x0000000257ffffff 128M online       yes      74
  0x0000000260000000-0x0000000267ffffff 128M online       yes      76
  0x0000000270000000-0x0000000277ffffff 128M online       yes      78
  0x0000000280000000-0x0000000287ffffff 128M online       yes      80
  0x0000000290000000-0x0000000297ffffff 128M online       yes      82
  0x00000002a0000000-0x00000002a7ffffff 128M online       yes      84
  0x00000002b0000000-0x00000002b7ffffff 128M online       yes      86
  0x00000002c0000000-0x00000002c7ffffff 128M online       yes      88
  0x00000002d0000000-0x00000002d7ffffff 128M online       yes      90
  0x00000002e0000000-0x00000002e7ffffff 128M online       yes      92
  0x00000002f0000000-0x00000002f7ffffff 128M online       yes      94
  0x0000000300000000-0x0000000307ffffff 128M online       yes      96
  0x0000000310000000-0x0000000317ffffff 128M online       yes      98
  0x0000000320000000-0x0000000327ffffff 128M online       yes     100
  0x0000000330000000-0x000000033fffffff 256M online       yes 102-103
  Memory block size: 128M
  Total online memory: 8.1G
  Total offline memory: 0B
With this patch set:
  [root@localhost ~]# lsmem
  RANGE                                 SIZE  STATE REMOVABLE   BLOCK
  0x0000000000000000-0x00000000bfffffff   3G online       yes    0-23
  0x0000000100000000-0x000000013fffffff   1G online       yes   32-39
  Memory block size: 128M
  Total online memory: 4G
  Total offline memory: 0B
All memory can get unplugged, and all memory blocks can get removed. Of course, no workload ran and the system was basically idle, but it highlights the issue - the fairly deterministic chain of unmovable allocations. When a huge page for the 2MB memmap is needed, a just-onlined 4MB page will be split. The remaining 2MB page will be used for the memmap of the next memory block. So one memory block will hold the memmap of the two following memory blocks. Finally, the pages of the last-onlined memory block will get used for the next bigger allocations - if any allocation is unmovable, all dependent memory blocks cannot get unplugged and removed until that allocation is gone. Note that with bigger memory blocks (e.g., 256MB), *all* memory blocks are dependent and none can get unplugged again!
b) Experiment with a memory-intensive workload: I performed an experiment with an older version of this patch set (before we used undo_isolate_page_range() in online_pages()): Hotplug 56GB to a VM with an initial 4GB, onlining all memory to ZONE_NORMAL right from the kernel when adding it. I then ran various memory-intensive workloads that consume most system memory for a total of 45 minutes. Once finished, I tried to unplug as much memory as possible. With this change, I was able to remove 413 out of 448 added memory blocks via virtio-mem (adding individual 128MB memory blocks), and 380 out of 448 added memory blocks via individual (256MB) DIMMs. (I don't have any numbers without this patchset, but looking at the above example, it's at most half of the 448 memory blocks for virtio-mem, and most probably none for DIMMs).
Again, there are workloads that might behave very differently due to the nature of ZONE_NORMAL.
This change also affects (besides memory onlining):
- Other users of undo_isolate_page_range(): pages are always placed at the tail.
-- When memory offlining fails
-- When memory isolation fails after having isolated some pageblocks
-- When alloc_contig_range() either succeeds or fails
- Other users of __putback_isolated_page(): pages are always placed at the tail.
-- Free page reporting
- Other users of __free_pages_core()
-- AFAIK, any memory that is getting exposed to the buddy during boot. IIUC we will now usually allocate memory from lower addresses within a zone first (especially during boot).
- Other users of generic_online_page()
-- Hyper-V balloon
This patch (of 5): Let's prepare for additional flags and avoid long parameter lists of bools. Follow-up patches will also make use of the flags in __free_pages_ok(). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y.
Srinivasan" <kys@microsoft.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Link: https://lkml.kernel.org/r/20201005121534.15649-1-david@redhat.com Link: https://lkml.kernel.org/r/20201005121534.15649-2-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
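A sketch of the bool-to-flags conversion, assuming (as the message implies) that the old trailing boolean of __free_one_page() meant "notify free page reporting". In the kernel the flag type is additionally marked `__bitwise` so that sparse can flag misuse; this sketch uses a plain `unsigned int`, and `free_one_page_model()` with its counter is hypothetical. Follow-up patches in the series add further bits alongside the first one.

```c
#include <assert.h>

/* Plain-C stand-in for the kernel's fpi_t flag type. */
typedef unsigned int fpi_t;

#define FPI_NONE               ((fpi_t)0)
#define FPI_SKIP_REPORT_NOTIFY ((fpi_t)(1u << 0))

/* Hypothetical counter standing in for the page-reporting notification. */
struct free_stats {
	unsigned int report_notifications;
};

/*
 * Merging and freelist placement are omitted; only the flag test matters.
 * A flags word can grow new bits without changing every caller's signature.
 */
static void free_one_page_model(struct free_stats *st, unsigned int order,
				fpi_t fpi_flags)
{
	(void)order;
	if (!(fpi_flags & FPI_SKIP_REPORT_NOTIFY))
		st->report_notifications++;
}
```

Note the inverted sense: callers that used to pass `false` for the boolean now pass `FPI_SKIP_REPORT_NOTIFY`, while the common case becomes `FPI_NONE`.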
|
|
d882c0067d |
mm: pass migratetype into memmap_init_zone() and move_pfn_range_to_zone()
On the memory onlining path, we want to start with MIGRATE_ISOLATE, to un-isolate the pages after memory onlining is complete. Let's allow passing in the migratetype. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michel Lespinasse <walken@google.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Mel Gorman <mgorman@techsingularity.net> Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
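The onlining flow (initialize the new range as MIGRATE_ISOLATE so nothing can allocate from it, then un-isolate once onlining is complete) can be modelled with one migratetype per pageblock. `struct zone_model`, `init_pageblocks()`, `first_allocatable()`, `undo_isolate()` and the `MT_*` names are hypothetical stand-ins for memmap_init_zone(), the allocator and undo_isolate_page_range().

```c
#include <assert.h>

/* Hypothetical subset of migratetypes. */
enum mt_model { MT_UNMOVABLE, MT_MOVABLE, MT_ISOLATE };

#define NR_PAGEBLOCKS 8

/* Hypothetical zone: one migratetype per pageblock, nothing else. */
struct zone_model {
	enum mt_model pageblock[NR_PAGEBLOCKS];
};

/* Sketch of memmap_init_zone() taking the migratetype as a parameter. */
static void init_pageblocks(struct zone_model *z, int start, int nr,
			    enum mt_model mt)
{
	assert(start >= 0 && start + nr <= NR_PAGEBLOCKS);
	for (int i = start; i < start + nr; i++)
		z->pageblock[i] = mt;
}

/* The allocator never hands out pages from isolated pageblocks. */
static int first_allocatable(const struct zone_model *z)
{
	for (int i = 0; i < NR_PAGEBLOCKS; i++)
		if (z->pageblock[i] != MT_ISOLATE)
			return i;
	return -1;
}

/* Sketch of undo_isolate_page_range(): release isolated blocks as movable. */
static void undo_isolate(struct zone_model *z, int start, int nr)
{
	for (int i = start; i < start + nr; i++)
		if (z->pageblock[i] == MT_ISOLATE)
			z->pageblock[i] = MT_MOVABLE;
}
```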
|
|
4eb29bd9d0 |
mm/page_alloc: drop stale pageblock comment in memmap_init_zone*()
Commit
|
||
|
|
3fa0c7c79d |
mm/page_isolation: simplify return value of start_isolate_page_range()
Callers no longer need the number of isolated pageblocks. Let's simplify. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
257bea7158 |
mm/page_alloc: simplify __offline_isolated_pages()
offline_pages() is the only user. __offline_isolated_pages() never gets called with ranges that contain memory holes and we no longer care about the return value. Drop the return value handling and all pfn_valid() checks. Update the documentation. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
79f5f8fab4 |
mm,hwpoison: rework soft offline for in-use pages
This patch changes the way we set and handle in-use poisoned pages. Until
now, poisoned pages were released to the buddy allocator, trusting that
the checks that take place at allocation time would act as a safety net and
would skip that page.
This has proved to be wrong, as there are some pfn walkers out there, like
compaction, that only care whether the page is in a buddy freelist.
Although this might not be the only user, having poisoned pages in the
buddy allocator seems a bad idea as we should only have free pages that
are ready and meant to be used as such.
Before explaining the approach taken, let us break down the kinds of pages
we can soft offline.
- Anonymous THP (after the split, they end up being 4K pages)
- Hugetlb
- Order-0 pages (that can be either migrated or invalidated)
* Normal pages (order-0 and anon-THP)
- If they are clean and unmapped page cache pages, we invalidate
them by means of invalidate_inode_page().
- If they are mapped/dirty, we do the isolate-and-migrate dance.
Either way, do not call put_page directly from those paths. Instead, we
keep the page and send it to page_handle_poison to perform the right
handling.
page_handle_poison sets the HWPoison flag and does the last put_page.
Down the chain, we placed a check for HWPoison pages in
free_pages_prepare that just skips any poisoned page, so those pages
do not end up in any pcplist/freelist.
After that, we set the refcount on the page to 1 and we increment
the poisoned pages counter.
If we see that the check in free_pages_prepare creates trouble, we can
always do what we do for free pages:
- wait until the page hits buddy's freelists
- take it off, and flag it
The downside of the above approach is that we could race with an
allocation, so by the time we want to take the page off the buddy, the
page has already been allocated, so we cannot soft offline it.
But the user could always retry it.
* Hugetlb pages
- We isolate-and-migrate them
After the migration has been successful, we call dissolve_free_huge_page,
and we set HWPoison on the page if we succeed.
Hugetlb has a slightly different handling though.
While for non-hugetlb pages we cared about closing the race with an
allocation, doing so for hugetlb pages requires quite some additional
and intrusive code (we would need to hook in free_huge_page and some other
places).
So I decided not to make the code overly complicated and just fail
normally if the page was allocated in the meantime.
We can always build on top of this.
As a bonus, because of the way we now handle in-use pages, we no longer
need the put-as-isolation-migratetype dance that was guarding against poisoned
pages ending up in pcplists.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Aristeu Rozanski <aris@ruivo.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200922135650.1634-10-osalvador@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
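The in-use flow described above (flag the page, drop the last reference, have the free path refuse HWPoison pages, then pin the page again and count it) can be modelled as below. Every name is hypothetical; the real code works on `struct page` via `PageHWPoison()`, `free_pages_prepare()` and `page_handle_poison()`, whose exact bodies this sketch does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model with only the state this sketch needs. */
struct page_model {
	bool hwpoison;
	int refcount;
};

/* Hypothetical allocator state: free pages and the poisoned-page counter. */
struct buddy_model {
	int nr_free;
	unsigned long num_poisoned;
};

/* Idea of the free_pages_prepare() check: refuse order-0 HWPoison pages. */
static bool free_pages_prepare_model(const struct page_model *p, unsigned int order)
{
	return !(p->hwpoison && order == 0);
}

static void put_page_model(struct buddy_model *b, struct page_model *p)
{
	if (--p->refcount == 0 && free_pages_prepare_model(p, 0))
		b->nr_free++;	/* only healthy pages reach the freelists */
}

/* Flag the page, drop the last reference, then pin it and count it. */
static void page_handle_poison_model(struct buddy_model *b, struct page_model *p)
{
	p->hwpoison = true;
	put_page_model(b, p);	/* last put: the page is not freed to the buddy */
	p->refcount = 1;	/* keep a reference so the page stays out of use */
	b->num_poisoned++;
}
```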
|
|
06be6ff3d2 |
mm,hwpoison: rework soft offline for free pages
When trying to soft-offline a free page, we need to first take it off the buddy allocator. Once we know it is out of reach, we can safely flag it as poisoned. take_page_off_buddy will be used to take a page meant to be poisoned off the buddy allocator. take_page_off_buddy calls break_down_buddy_pages, which splits a higher-order page in case our page belongs to one. Once the page is under our control, we call page_handle_poison to set it as poisoned and grab a refcount on it. Signed-off-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Aristeu Rozanski <aris@ruivo.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200922135650.1634-9-osalvador@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
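A pfn-only sketch of how a higher-order free block can be broken down until only the target page is left, in the spirit of break_down_buddy_pages(): at each step the block is halved and the half that does not contain the target goes back to the freelist. `break_down_model()` and `struct free_block` are hypothetical; guard pages and real freelist bookkeeping are omitted.

```c
#include <assert.h>

/* A free block handed back to the (hypothetical) freelist. */
struct free_block {
	unsigned long pfn;
	unsigned int order;
};

/*
 * Split the free block [base, base + 2^high) down to order `low` around
 * `target`. Each half not containing the target is recorded in out[]
 * (which needs room for high - low entries). Returns the number recorded.
 */
static int break_down_model(unsigned long base, unsigned int high,
			    unsigned int low, unsigned long target,
			    struct free_block *out)
{
	unsigned long size = 1UL << high;
	int n = 0;

	assert(target >= base && target < base + size);
	while (high > low) {
		high--;
		size >>= 1;
		if (target >= base + size) {
			out[n++] = (struct free_block){ base, high };		/* lower half */
			base += size;
		} else {
			out[n++] = (struct free_block){ base + size, high };	/* upper half */
		}
	}
	return n;
}
```

For an order-3 block at pfn 0 and target pfn 5, the sketch returns the order-2 block at 0, the order-1 block at 6 and the order-0 block at 4, leaving exactly pfn 5 to be taken off the buddy and poisoned.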
|
|
8fb156c9ee |
mm/page_owner: change split_page_owner to take a count
The implementation of split_page_owner() prefers a count rather than the old order of the page. When we support a variable size THP, we won't have the order at this point, but we will have the number of pages. So change the interface to what the caller and callee would prefer. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: SeongJae Park <sjpark@amazon.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Link: https://lkml.kernel.org/r/20200908195539.25896-4-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9ff9b0d392 |
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multipath TCP (MPTCP), support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. 
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ... |
||
|
|
cc6de16805 |
memblock: use separate iterators for memory and reserved regions
for_each_memblock() is used to iterate over memblock.memory in a few places that use data from memblock_region rather than the memory ranges. Introduce separate for_each_mem_region() and for_each_reserved_mem_region() to improve encapsulation of memblock internals from its users.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS]
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
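Here is a minimal userspace sketch of the two per-type iterators. It assumes a heavily simplified `struct memblock` that holds only `cnt` and `regions` for each type. The sample regions and the `total_*()` helpers are hypothetical and follow only the naming in the commit message, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for memblock internals. */
typedef uint64_t phys_addr_t;

struct memblock_region {
	phys_addr_t base;
	phys_addr_t size;
};

struct memblock_type {
	size_t cnt;
	struct memblock_region *regions;
};

struct memblock {
	struct memblock_type memory;
	struct memblock_type reserved;
};

static struct memblock_region mem_regions[] = {
	{ 0x00000000, 0x00100000 },
	{ 0x00200000, 0x00300000 },
};

static struct memblock_region rsv_regions[] = {
	{ 0x00010000, 0x00001000 },
};

static struct memblock memblock = {
	.memory   = { 2, mem_regions },
	.reserved = { 1, rsv_regions },
};

/* One iterator per region type, so callers never touch
 * memblock.memory or memblock.reserved directly. */
#define for_each_mem_region(region)                                   \
	for (region = memblock.memory.regions;                          \
	     region < memblock.memory.regions + memblock.memory.cnt;    \
	     region++)

#define for_each_reserved_mem_region(region)                              \
	for (region = memblock.reserved.regions;                          \
	     region < memblock.reserved.regions + memblock.reserved.cnt;  \
	     region++)

/* Example users: sum the sizes of all regions of each type. */
static phys_addr_t total_memory(void)
{
	struct memblock_region *r;
	phys_addr_t sum = 0;

	for_each_mem_region(r)
		sum += r->size;
	return sum;
}

static phys_addr_t total_reserved(void)
{
	struct memblock_region *r;
	phys_addr_t sum = 0;

	for_each_reserved_mem_region(r)
		sum += r->size;
	return sum;
}
```

Callers pass only a cursor pointer. The choice between `memory` and `reserved` is encoded in the iterator's name rather than in a type argument.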
||
|
|
6e245ad4a1 |
memblock: reduce number of parameters in for_each_mem_range()
Currently, the for_each_mem_range() and for_each_mem_range_rev() iterators are the most generic way to traverse memblock regions. As such, they have 8 parameters and are hardly convenient to use. Most users choose one of their wrappers, and the only user that actually needs most of the parameters is memblock itself. To avoid yet another naming scheme for memblock iterators, rename the existing for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add new for_each_mem_range[_rev]() wrappers with only index, start and end parameters. The new wrapper fits nicely into init_unavailable_mem() and will be used in upcoming changes to simplify memblock traversals.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
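The split between a generic, many-parameter iterator and a narrow wrapper can be sketched in userspace C. Everything below is hypothetical. `next_range()`, `for_each_range_full()` and the sample regions stand in for memblock's generic `__for_each_mem_range()` machinery, and they take fewer parameters than the real 8-parameter macro. Only the shape of the wrapper (index, start, end) follows the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock types, not the kernel's definitions. */
typedef uint64_t phys_addr_t;
#define NUMA_NO_NODE (-1)

struct region {
	phys_addr_t base;
	phys_addr_t size;
	int nid;
};

struct region_type {
	size_t cnt;
	const struct region *regions;
};

static const struct region mem_regions[] = {
	{ 0x1000, 0x1000, 0 },
	{ 0x4000, 0x2000, 1 },
	{ 0x8000, 0x1000, 0 },
};
static const struct region_type memory = { 3, mem_regions };

/*
 * Generic step: starting at *idx, find the next region of @type on
 * node @nid (NUMA_NO_NODE matches any node), report it through the
 * optional out-pointers and leave *idx just past it. When nothing is
 * left, set *idx to SIZE_MAX to end the loop.
 */
static void next_range(size_t *idx, const struct region_type *type, int nid,
		       phys_addr_t *out_start, phys_addr_t *out_end,
		       int *out_nid)
{
	assert(type != NULL);
	while (*idx < type->cnt) {
		const struct region *r = &type->regions[(*idx)++];

		if (nid != NUMA_NO_NODE && r->nid != nid)
			continue;
		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size;
		if (out_nid)
			*out_nid = r->nid;
		return;
	}
	*idx = SIZE_MAX;
}

/* Full-featured iterator: every knob is a parameter. */
#define for_each_range_full(i, type, nid, p_start, p_end, p_nid)     \
	for (i = 0, next_range(&i, type, nid, p_start, p_end, p_nid);  \
	     i != SIZE_MAX;                                             \
	     next_range(&i, type, nid, p_start, p_end, p_nid))

/* Narrow wrapper: only index, start and end; the rest is defaulted. */
#define for_each_mem_range(i, p_start, p_end) \
	for_each_range_full(i, &memory, NUMA_NO_NODE, p_start, p_end, NULL)

static phys_addr_t total_all(void)
{
	size_t i;
	phys_addr_t start, end, total = 0;

	for_each_mem_range(i, &start, &end)
		total += end - start;
	return total;
}

static phys_addr_t total_on_node(int nid)
{
	size_t i;
	phys_addr_t start, end, total = 0;

	for_each_range_full(i, &memory, nid, &start, &end, NULL)
		total += end - start;
	return total;
}
```

Most callers only need the wrapper. The full form stays available for the rare user that filters by node or wants the node id back.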
||
|
|
e320d3012d |
mm/page_alloc.c: fix freeing non-compound pages
Here is a very rare race which leaks memory:
Page P0 is allocated to the page cache. Page P1 is free.
| Thread A | Thread B | Thread C |
|---|---|---|
| find_get_entry(): | | |
| xas_load() returns P0 | | |
| | Removes P0 from page cache | |
| | P0 finds its buddy P1 | |
| | | alloc_pages(GFP_KERNEL, 1) returns P0 |
| | | P0 has refcount 1 |
| page_cache_get_speculative(P0) | | |
| P0 has refcount 2 | | |
| | | __free_pages(P0) |
| | | P0 has refcount 1 |
| put_page(P0) | | |
| P1 is not freed | | |
Fix this by freeing all the pages in __free_pages() that won't be freed
by the call to put_page(). It's usually not a good idea to split a page,
but this is a very unlikely scenario.
Fixes:
|
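The leak and its fix can be replayed deterministically with a mock page array. This is a userspace sketch, not kernel code. `struct page`, `free_the_page()`, `put_page_single()` and the other helpers are hypothetical stand-ins. The race is serialized in the order shown in the table, starting from Thread C's allocation. `free_pages_fixed()` follows the approach the message describes. When the caller's reference is not the last one and the allocation is not compound, it frees every page except the first one, which the other holder's `put_page()` will free.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 8

/* Hypothetical mock of struct page: a refcount, a compound-head flag
 * and a "freed" marker so the sketch can detect leaks. */
struct page {
	int refcount;
	bool head;    /* head of a compound (__GFP_COMP) allocation */
	bool is_free;
};

static struct page mem_map[NR_PAGES];

/* Drop one reference; return true if it was the last one. */
static bool put_page_testzero(struct page *page)
{
	assert(page->refcount > 0);
	return --page->refcount == 0;
}

/* Return 2^order pages starting at @page to the mock allocator. */
static void free_the_page(struct page *page, unsigned int order)
{
	for (unsigned int i = 0; i < (1u << order); i++)
		page[i].is_free = true;
}

/* Fixed __free_pages(): if someone else still holds a reference to the
 * first page of a non-compound block, their put_page() will free only
 * that page, so free the upper halves (order-1, order-2, ... 0) here. */
static void free_pages_fixed(struct page *page, unsigned int order)
{
	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!page->head)
		while (order-- > 0)
			free_the_page(page + (1u << order), order);
}

/* put_page() on a non-compound page frees just that one page. */
static void put_page_single(struct page *page)
{
	if (put_page_testzero(page))
		free_the_page(page, 0);
}

/* Replay the race for an order-@order allocation starting at P0. */
static void replay_race(unsigned int order)
{
	struct page *p0 = &mem_map[0];

	assert((1u << order) <= NR_PAGES);
	for (int i = 0; i < NR_PAGES; i++)
		mem_map[i] = (struct page){ .refcount = 0 };

	p0->refcount = 1;            /* Thread C: alloc_pages(GFP_KERNEL, order) */
	p0->refcount++;              /* Thread A: page_cache_get_speculative(P0) */
	free_pages_fixed(p0, order); /* Thread C: __free_pages(P0) */
	put_page_single(p0);         /* Thread A: put_page(P0) */
}

/* True if every page of the order-@order block starting at P0 is free. */
static bool block_fully_freed(unsigned int order)
{
	for (unsigned int i = 0; i < (1u << order); i++)
		if (!mem_map[i].is_free)
			return false;
	return true;
}
```

If you drop the `else if` branch from `free_pages_fixed()`, `block_fully_freed()` returns false for any order above 0. That reproduces the "P1 is not freed" outcome from the table.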
||
|
|
30d8ec73e8 |
mmzone: clean code by removing unused macro parameter
Previously, the 'zlist' parameter of the 'for_next_zone_zonelist_nodemask' macro was unused, so this patch removes it.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200917211906.30059-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
2187e17b02 |
mm/page_alloc.c: __perform_reclaim should return 'unsigned long'
__perform_reclaim()'s single caller expects it to return 'unsigned long', hence change its return type and a local variable to 'unsigned long'.
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200916022138.16740-1-yanfei.xu@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a0622d0537 |
mm/page_alloc.c: clean code by merging two functions
finalise_ac() is just an epilogue for prepare_alloc_pages(). There is no need to keep both, so the content of finalise_ac() can be merged into prepare_alloc_pages(). This makes __alloc_pages_nodemask() easier to read.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lkml.kernel.org/r/20200916110118.6537-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
fdd4fa1cd9 |
mm/page_alloc.c: fix early params garbage value accesses
Previously, in '__init early_init_on_alloc' and '__init early_init_on_free', the return value of 'kstrtobool' was not handled properly, so a garbage value could be read from the variable 'bool_result'. This patch fixes the error handling.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200916214125.28271-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
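The sketch below shows the corrected pattern in userspace C: check the parser's return code before reading its output. `parse_bool()` is a hypothetical, simplified stand-in for the kernel's `kstrtobool()`. It looks only at the first character and, unlike `kstrtobool()`, does not accept "on"/"off". `init_on_alloc_enabled` is a hypothetical stand-in for the real static key.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kstrtobool(): returns 0 and writes *res on
 * success; returns -EINVAL and leaves *res untouched otherwise. */
static int parse_bool(const char *s, bool *res)
{
	assert(res != NULL);
	if (!s)
		return -EINVAL;
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Hypothetical stand-in for the init_on_alloc static key. */
static bool init_on_alloc_enabled;

/* Early-param style handler: bool_result is only read after the
 * parser has reported success. */
static int early_init_on_alloc(const char *buf)
{
	bool bool_result;
	int ret = parse_bool(buf, &bool_result);

	if (ret)
		return ret;   /* bool_result was never written: don't read it */
	init_on_alloc_enabled = bool_result;
	return 0;
}
```

On a parse failure, the handler returns the error and leaves the setting unchanged. It never branches on an uninitialized local.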
||
|
|
cfb4a54191 |
mm/page_alloc.c: micro-optimization remove unnecessary branch
Previously, the flags check was split into two separate checks with two separate branches. The presence of either flag has the same effect on control flow, so the checks can be merged and one branch avoided.
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200911092310.31136-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
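The commit message does not name the flags involved. The sketch below therefore uses hypothetical `FLAG_A`/`FLAG_B`/`FLAG_C` bits to show the general transformation: two tests that lead to the same outcome become a single test against the OR of both masks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define FLAG_A 0x1u
#define FLAG_B 0x2u
#define FLAG_C 0x4u

/* Before: two separate tests, two conditional branches. */
static bool takes_path_two_checks(unsigned int flags)
{
	if (flags & FLAG_A)
		return true;
	if (flags & FLAG_B)
		return true;
	return false;
}

/* After: one test against the combined mask, one branch. */
static bool takes_path_one_check(unsigned int flags)
{
	if (flags & (FLAG_A | FLAG_B))
		return true;
	return false;
}
```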
||
|
|
b630749f01 |
mm/page_alloc.c: clean code by removing unnecessary initialization
Previously, the variable 'tmp' was initialized but never read before being reassigned, so the initialization can be removed.
[akpm@linux-foundation.org: remove `tmp' altogether]
Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200904132422.17387-1-mateusznosek0@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |