xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Bing Han	034877c195	ANDROID: mm: export swapcache_free_entries Export swapcache_free_entries to be used in the alternative function android_vh_drain_slots_cache_cpu to swap entries in swap slot cache; its usage is similar to the usage in drain_slots_cache_cpu. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: Ia89b1728d540c5cc8995a939a918e12c23057266	2022-06-30 03:00:23 +00:00
Bing Han	06c2766cbc	ANDROID: mm: export symbols used in vendor hook android_vh_get_swap_page() Three symbols are exported to be used in vendor hook android_vh_get_swap_page: 1) check_cache_active, used to get a swap page from the specified swap location; its usage is similar to the usage in get_swap_page. 2) scan_swap_map_slots, used to get a swap page from the specified swap; its usage is similar to get_swap_pages. 3) swap_alloc_cluster, used to get a swap page from the specified swap; its usage is similar to get_swap_pages. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: Ie24c5d32a16c7cb87905d034095ec8fb070dbe0f	2022-06-30 03:00:23 +00:00
Bing Han	4506bcbba5	ANDROID: mm: export swap_type_to_swap_info The function swap_type_to_swap_info is exported to access the swap_info_struct of the specified swap, which is regarded as reserved extended memory. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: I0107e7d561150f1945a4c161e886e9e03383fff6	2022-06-30 03:00:23 +00:00
Bing Han	ed2b11d639	ANDROID: vendor_hook: Add hook in si_swapinfo() Provide a vendor hook android_vh_si_swapinfo to replace the process of updating nr_to_be_unused. When the page is swapped to a specified swap location, nr_to_be_unused should not be updated, because the specified swap is regarded as reserved extended memory. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: Ie41caec345658589bf908fb0f96d038d1fba21f3	2022-06-30 03:00:23 +00:00
Bing Han	667f0d71dc	ANDROID: vendor_hooks: Add hooks to extend the struct swap_info_struct Two vendor hooks are added to extend the struct swap_info_struct: android_vh_alloc_si, which extends the allocation of struct swap_info_struct by adding data to record the information of the specified reclaimed location; and android_vh_init_swap_info_struct, which initializes the extension of struct swap_info_struct. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: I0e1d8e38ba7dfd52b609b1c14eb78f8b0ef0f9e6	2022-06-30 03:00:23 +00:00
Bing Han	bc4c73c182	ANDROID: vendor_hook: Add hooks in unuse_pte_range() and try_to_unuse() When the page is unused, a vendor hook android_vh_unuse_swap_page should be called to specify that the page should not be swapped to the specified swap location any more. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: I3fc3675020517f7cc69c76a06150dfb2380dae21	2022-06-30 03:00:23 +00:00
Bing Han	d2fea0ba9a	ANDROID: vendor_hook: Add hook to update nr_swap_pages and total_swap_pages The specified swap is regarded as reserved extended memory. So nr_swap_pages and total_swap_pages should not be affected by the specified swap. Provide a vendor hook android_vh_account_swap_pages to replace the updating process of nr_swap_pages and total_swap_pages. When the page is swapped to the specified swap location, nr_swap_pages and total_swap_pages should not be updated. Bug: 234214858 Signed-off-by: Bing Han <bing.han@transsion.com> Change-Id: Ib8dfb355d190399a037b9d9eda478a81c436e224	2022-06-30 03:00:23 +00:00
Liujie Xie	d9845e9e5c	ANDROID: export walk_page_range and swp_swap_info Export walk_page_range and swp_swap_info for reading swap from backing device to zram. Bug: 225273514 Signed-off-by: Liujie Xie <xieliujie@oppo.com> Change-Id: If888cfc2823d8003b62bdb177740643696cf6f7e	2022-03-29 17:40:45 +00:00
Greg Kroah-Hartman	b1a6760ddf	Merge branch 'android12-5.10' into `android12-5.10-lts` Sync up with android12-5.10 for the following commits: `dd139186ef` ANDROID: usb: gadget: fix NULL pointer dereference in android_setup `07f65598af` ANDROID: GKI: Disable kmem cgroup accounting `309aa7e7a2` FROMLIST: mm, memcg: inline swap-related functions to improve disabled memcg config `3ae8e2f183` BACKPORT: FROMLIST: mm, memcg: inline mem_cgroup_{charge/uncharge} to improve disabled memcg config `f73d029485` FROMLIST: mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions `669df367a9` UPSTREAM: mm/memcg: bail early from swap accounting if memcg disabled `1f0c32a667` UPSTREAM: procfs/dmabuf: add inode number to /proc/*/fdinfo `0c8c125f57` UPSTREAM: procfs: allow reading fdinfo with PTRACE_MODE_READ `2e0476a465` Revert "FROMLIST: procfs: Allow reading fdinfo with PTRACE_MODE_READ" `5ded961aa2` Revert "FROMLIST: BACKPORT: procfs/dmabuf: Add inode number to /..." `3ee5565017` UPSTREAM: f2fs: initialize page->private when using for our internal use `dba79c3af3` ANDROID: mm: page_pinner: report test_page_isolation_failure `13362ab28e` ANDROID: mm: page_pinner: add state of page_pinner `3254948484` ANDROID: mm: page_pinner: add more struct page fields `0445b67bee` ANDROID: mm: page_pinner: change timestamp format `71da06728c` ANDROID: mm: page_pinner: print_page_pinner refactoring `b83e564914` ANDROID: mm: page_pinner: remove shared_count `849f048050` ANDROID: mm: page_pinner: remove WARN_ON_ONCE `9a453100fc` ANDROID: mm: page_pinner: fix typos `d012783a86` ANDROID: mm: page_pinner: reset migration failed page `470cce5085` ANDROID: mm: page_pinner: record every put_page `9f47e5fdda` ANDROID: mm: page_pinner: change function names `a8385d61f2` ANDROID: Allow vendor module to reclaim a memcg `f41a95eadc` ANDROID: Export memcg functions to allow module to add new files `46bf3b94e7` FROMGIT: dt-bindings: usb: dwc3: Update dwc3 TX fifo properties `b36b813e39` 
UPSTREAM: dt-bindings: usb: Convert DWC USB3 bindings to DT schema `9a80b7b728` FROMGIT: of: Add stub for of_add_property() `2742be5903` ANDROID: fips140: define fips_enabled to 1 to enable FIPS behavior `e886dd4c33` ANDROID: fips140: unregister existing DRBG algorithms `634445a640` ANDROID: fips140: fix deadlock in unregister_existing_fips140_algos() `0af06624ea` ANDROID: fips140: check for errors from initcalls `92de53472e` ANDROID: fips140: log already-live algorithms `0a7da21583` ANDROID: Update new mtk gki symbol `98085b5dd8` ANDROID: usb: Add vendor hook for usb suspend and resume `956db89e71` BACKPORT: FROMLIST: dma-heap: Let dma heap use dma_map_attrs to map & unmap iova `749d6e7f2c` ANDROID: abi_gki_aarch64_qcom: Add vendor hook for shmem_alloc_page `b05bbe48be` ANDROID: abi_gki_aarch64_qcom: Add reclaim_shmem_address_space `d80c70d7a8` ANDROID: android: export kernel function arch_mmap_rnd `25c7eb4932` ANDROID: mm: shmem: Fix build break with allnoconfig `1cdcf76b15` ANDROID: vendor_hooks: add hooks in mem_cgroup subsystem `726468dd4a` ANDROID: GKI: add vendor padding variable in struct skb_shared_info `fc79c93657` FROMLIST: scsi: ufs: add quirk to enable host controller without interface configuration `2d5ae6b787` FROMLIST: scsi: ufs: add quirk to handle broken UIC command `38abaebab7` ANDROID: syscall_check: add vendor hook for bpf syscall `a7a3b31d58` ANDROID: syscall_check: add vendor hook for open syscall `a5543c9cd7` ANDROID: syscall_check: add vendor hook for mmap syscall `1f0769279f` ANDROID: GKI: Add symbol to symbol list `2cff74e08c` ANDROID: vendor_hooks: Add vendor hook to the net `25edba0d4d` FROMLIST: scsi: ufs: Fix the SCSI abort handler `c0efdc4a5e` ANDROID: android: export kernel function vm_unmapped_area `964220d080` ANDROID: shmem: vendor hook in shmem_alloc_page `bd2ca0ba5b` FROMLIST: pstore/ram: Rework logic for detecting ramoops reserved memory region `daeabfe7fa` ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims 
`4c3dddf408` ANDROID: Update the generic ABI symbol list `4c4d8cbdef` ANDROID: GKI: refresh ABI XML `01e4a037d8` ANDROID: GKI: turn on TIDY_ABI `edf973fd24` ANDROID: Update symbol list for VIVO `1702d2c8b7` FROMGIT: net: cdc_ncm: switch to eth%d interface naming `f4d6e8324c` ANDROID: GKI: add allowed GKI symbol for Exynosauto SoC `444a0b7752` ANDROID: mm: add vendor hook for vmpressure `c799c6644b` ANDROID: fips140: adjust some log messages `091338cb39` ANDROID: fips140: add missing static keyword to fips140_init() `70bfd6a7e0` ANDROID: GKI: update allowed list for exynosauto SoC `3e3147b280` UPSTREAM: scsi: ufs: ufshcd: Fix some function doc-rot `2c553e754f` UPSTREAM: scsi: ufs: Adjust ufshcd_hold() during sending attribute requests `52ccdf90b9` FROMLIST: lockdep: Remove console_verbose when disable lock debugging `4458494476` ANDROID: ABI: qcom: Add symbols for 80211 `5c51579fde` ANDROID: fork: Export task_newtask tracepoint `e2a90797e8` ANDROID: Fix kernelci warnings for indentation in smp.c `bac33eaebf` ANDROID: irqchip: gic-v3: Move struct gic_chip_data to header `bdac4418bf` ANDROID: abi_gki_aarch64_qcom: Add android_vh_ufs_clock_scaling `65c1de0f06` ANDROID: Update symbol list for mtk `d4d02ab9b0` UPSTREAM: swiotlb: manipulate orig_addr when tlb_addr has offset `58aa0f2832` ANDROID: qcom: Add net related symbol `2f9f816445` ANDROID: Update the exynos symbol list `b2a9471239` ANDROID: Update symbol list for mtk `7c9599e204` FROMGIT: usb: dwc3: Create helper function getting MDWIDTH `0a24affb86` ANDROID: vendor_hooks: modify the function name `d686d5ffc6` ANDROID: GKI: Add some symbols to symbol list `bdfb11230b` ANDROID: cpuidle: Allow for an early exit from cpuidle_enter_state() `f0b280c395` ANDROID: cpuidle: Update cpuidle_uninstall_idle_handler() to wakeup all online CPUs `14dd90ab37` ANDROID: scsi: ufs: Add hook to influence the UFS clock scaling policy `00aec39e2e` FROMGIT: bpf: Support all gso types in bpf_skb_change_proto() Signed-off-by: Greg 
Kroah-Hartman <gregkh@google.com> Change-Id: I6e5be89f3f02c420237a549f4c6a08b5ed434581	2021-07-13 15:00:50 +02:00
Suren Baghdasaryan	309aa7e7a2	FROMLIST: mm, memcg: inline swap-related functions to improve disabled memcg config Inline mem_cgroup_try_charge_swap, mem_cgroup_uncharge_swap and cgroup_throttle_swaprate functions to perform mem_cgroup_disabled static key check inline before calling the main body of the function. This minimizes the memcg overhead in the pagefault and exit_mmap paths when memcgs are disabled using cgroup_disable=memory command-line option. This change results in ~1% overhead reduction when running PFT test [1] comparing {CONFIG_MEMCG=n} against {CONFIG_MEMCG=y, cgroup_disable=memory} configuration on an 8-core ARM64 Android device. [1] https://lkml.org/lkml/2006/8/29/294 also used in mmtests suite Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lore.kernel.org/patchwork/patch/1458908/ Bug: 191223209 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I18d59090ec908037b39324d1f1bb511d06e9c690	2021-07-12 18:34:30 -07:00
Suren Baghdasaryan	f73d029485	FROMLIST: mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions Add mem_cgroup_disabled check in vmpressure, mem_cgroup_uncharge_swap and cgroup_throttle_swaprate functions. This minimizes the memcg overhead in the pagefault and exit_mmap paths when memcgs are disabled using cgroup_disable=memory command-line option. This change results in ~2.1% overhead reduction when running PFT test [1] comparing {CONFIG_MEMCG=n, CONFIG_MEMCG_SWAP=n} against {CONFIG_MEMCG=y, CONFIG_MEMCG_SWAP=y, cgroup_disable=memory} configuration on an 8-core ARM64 Android device. [1] https://lkml.org/lkml/2006/8/29/294 also used in mmtests suite Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lore.kernel.org/patchwork/patch/1458906/ Bug: 191223209 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic1fc75eb1e4d7a9848cf641b9f232ad3262c490b	2021-07-12 18:26:15 -07:00
Greg Kroah-Hartman	948d38f94d	Merge 5.10.46 into android12-5.10-lts Changes in 5.10.46 dmaengine: idxd: add missing dsa driver unregister dmaengine: fsl-dpaa2-qdma: Fix error return code in two functions dmaengine: xilinx: dpdma: initialize registers before request_irq dmaengine: ALTERA_MSGDMA depends on HAS_IOMEM dmaengine: QCOM_HIDMA_MGMT depends on HAS_IOMEM dmaengine: SF_PDMA depends on HAS_IOMEM dmaengine: stedma40: add missing iounmap() on error in d40_probe() afs: Fix an IS_ERR() vs NULL check mm/memory-failure: make sure wait for page writeback in memory_failure kvm: LAPIC: Restore guard to prevent illegal APIC register access fanotify: fix copy_event_to_user() fid error clean up batman-adv: Avoid WARN_ON timing related checks mac80211: fix skb length check in ieee80211_scan_rx() mlxsw: reg: Spectrum-3: Enforce lowest max-shaper burst size of 11 mlxsw: core: Set thermal zone polling delay argument to real value at init libbpf: Fixes incorrect rx_ring_setup_done net: ipv4: fix memory leak in netlbl_cipsov4_add_std vrf: fix maximum MTU net: rds: fix memory leak in rds_recvmsg net: dsa: felix: re-enable TX flow control in ocelot_port_flush() net: lantiq: disable interrupt before sheduling NAPI netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local ice: add ndo_bpf callback for safe mode netdev ops ice: parameterize functions responsible for Tx ring management udp: fix race between close() and udp_abort() rtnetlink: Fix regression in bridge VLAN configuration net/sched: act_ct: handle DNAT tuple collision net/mlx5e: Remove dependency in IPsec initialization flows net/mlx5e: Fix page reclaim for dead peer hairpin net/mlx5: Consider RoCE cap before init RDMA resources net/mlx5: DR, Allow SW steering for sw_owner_v2 devices net/mlx5: DR, Don't use SW steering when RoCE is not supported net/mlx5e: Block offload of outer header csum for UDP tunnels netfilter: synproxy: Fix out of bounds when parsing TCP options mptcp: Fix out of bounds when parsing TCP 
options sch_cake: Fix out of bounds when parsing TCP options and header mptcp: try harder to borrow memory from subflow under pressure mptcp: do not warn on bad input from the network selftests: mptcp: enable syncookie only in absence of reorders alx: Fix an error handling path in 'alx_probe()' cxgb4: fix endianness when flashing boot image cxgb4: fix sleep in atomic when flashing PHY firmware cxgb4: halt chip before flashing PHY firmware image net: stmmac: dwmac1000: Fix extended MAC address registers definition net: make get_net_ns return error if NET_NS is disabled net: qualcomm: rmnet: Update rmnet device MTU based on real device net: qualcomm: rmnet: don't over-count statistics ethtool: strset: fix message length calculation qlcnic: Fix an error handling path in 'qlcnic_probe()' netxen_nic: Fix an error handling path in 'netxen_nic_probe()' cxgb4: fix wrong ethtool n-tuple rule lookup ipv4: Fix device used for dst_alloc with local routes net: qrtr: fix OOB Read in qrtr_endpoint_post bpf: Fix leakage under speculation on mispredicted branches ptp: improve max_adj check against unreasonable values net: cdc_ncm: switch to eth%d interface naming lantiq: net: fix duplicated skb in rx descriptor ring net: usb: fix possible use-after-free in smsc75xx_bind net: fec_ptp: fix issue caused by refactor the fec_devtype net: ipv4: fix memory leak in ip_mc_add1_src net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock net/mlx5: E-Switch, Read PF mac address net/mlx5: E-Switch, Allow setting GUID for host PF vport net/mlx5: Reset mkey index on creation be2net: Fix an error handling path in 'be_probe()' net: hamradio: fix memory leak in mkiss_close net: cdc_eem: fix tx fixup skb leak cxgb4: fix wrong shift. 
bnxt_en: Rediscover PHY capabilities after firmware reset bnxt_en: Fix TQM fastpath ring backing store computation bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path icmp: don't send out ICMP messages with a source address of 0.0.0.0 net: ethernet: fix potential use-after-free in ec_bhf_remove regulator: cros-ec: Fix error code in dev_err message regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting platform/x86: thinkpad_acpi: Add X1 Carbon Gen 9 second fan support ASoC: rt5659: Fix the lost powers for the HDA header phy: phy-mtk-tphy: Fix some resource leaks in mtk_phy_init() ASoC: fsl-asoc-card: Set .owner attribute when registering card. regulator: rtmv20: Fix to make regcache value first reading back from HW spi: spi-zynq-qspi: Fix some wrong goto jumps & missing error code sched/pelt: Ensure that _sum is always synced with _avg ASoC: tas2562: Fix TDM_CFG0_SAMPRATE values spi: stm32-qspi: Always wait BUSY bit to be cleared in stm32_qspi_wait_cmd() regulator: rt4801: Fix NULL pointer dereference if priv->enable_gpios is NULL ASoC: rt5682: Fix the fast discharge for headset unplugging in soundwire mode pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device ASoC: qcom: lpass-cpu: Fix pop noise during audio capture begin radeon: use memcpy_to/fromio for UVD fw upload hwmon: (scpi-hwmon) shows the negative temperature properly mm: relocate 'write_protect_seq' in struct mm_struct irqchip/gic-v3: Workaround inconsistent PMR setting on NMI entry bpf: Inherit expanded/patched seen count from old aux data bpf: Do not mark insn as seen under speculative path verification can: bcm: fix infoleak in struct bcm_msg_head can: bcm/raw/isotp: use per module netdevice notifier can: j1939: fix Use-after-Free, hold skb ref while in use can: mcba_usb: fix memory leak in mcba_usb usb: core: hub: Disable autosuspend for Cypress CY7C65632 usb: chipidea: imx: Fix Battery Charger 1.2 
CDP detection tracing: Do not stop recording cmdlines when tracing is off tracing: Do not stop recording comms if the trace file is being read tracing: Do no increment trace_clock_global() by one PCI: Mark TI C667X to avoid bus reset PCI: Mark some NVIDIA GPUs to avoid bus reset PCI: aardvark: Fix kernel panic during PIO transfer PCI: Add ACS quirk for Broadcom BCM57414 NIC PCI: Work around Huawei Intelligent NIC VF FLR erratum KVM: x86: Immediately reset the MMU context when the SMM flag is cleared KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU KVM: X86: Fix x86_emulator slab cache leak s390/mcck: fix calculation of SIE critical section size s390/ap: Fix hanging ioctl caused by wrong msg counter ARCv2: save ABI registers across signal handling x86/mm: Avoid truncating memblocks for SGX memory x86/process: Check PF_KTHREAD and not current->mm for kernel threads x86/ioremap: Map EFI-reserved memory as encrypted for SEV x86/pkru: Write hardware init value to PKRU when xstate is init x86/fpu: Prevent state corruption in __fpu__restore_sig() x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer x86/fpu: Reset state for all signal restore failures crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc mac80211: Fix NULL ptr deref for injected rate info cfg80211: make certificate generation more robust cfg80211: avoid double free of PMSR request drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell. drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue. 
net: ll_temac: Make sure to free skb when it is completely used net: ll_temac: Fix TX BD buffer overwrite net: bridge: fix vlan tunnel dst null pointer dereference net: bridge: fix vlan tunnel dst refcnt when egressing mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare mm/slub: clarify verification reporting mm/slub: fix redzoning for small allocations mm/slub: actually fix freelist pointer vs redzoning mm/slub.c: include swab.h net: stmmac: disable clocks in stmmac_remove_config_dt() net: fec_ptp: add clock rate zero check tools headers UAPI: Sync linux/in.h copy with the kernel sources perf beauty: Update copy of linux/socket.h with the kernel sources usb: dwc3: debugfs: Add and remove endpoint dirs dynamically usb: dwc3: core: fix kernel panic when do reboot Linux 5.10.46 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I99f37c9f257f90ccdb091306f3d4cfb7c32e3880	2021-06-23 17:53:08 +02:00
Peter Xu	12eb3c2c1a	mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare commit 099dd6878b9b12d6bbfa6bf29ce0c8ddd38f6901 upstream. I found by pure code review that pte_same_as_swp() of unuse_vma() didn't take the uffd-wp bit into account when comparing ptes. pte_same_as_swp() returning a false negative could cause failure to swapoff swap ptes that were write-protected by userfaultfd. Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com Fixes: `f45ec5ff16` ("userfaultfd: wp: support swap and page migration") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [5.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-23 14:42:53 +02:00
Greg Kroah-Hartman	28454baf9c	Merge 5.10.21 into android12-5.10 Changes in 5.10.21 net: usb: qmi_wwan: support ZTE P685M modem Input: elantech - fix protocol errors for some trackpoints in SMBus mode Input: elan_i2c - add new trackpoint report type 0x5F drm/virtio: use kvmalloc for large allocations x86/build: Treat R_386_PLT32 relocation as R_386_PC32 JFS: more checks for invalid superblock sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled udlfb: Fix memory leak in dlfb_usb_probe media: mceusb: sanity check for prescaler value erofs: fix shift-out-of-bounds of blkszbits media: v4l2-ctrls.c: fix shift-out-of-bounds in std_validate xfs: Fix assert failure in xfs_setattr_size() net/af_iucv: remove WARN_ONCE on malformed RX packets smackfs: restrict bytes count in smackfs write functions tomoyo: ignore data race while checking quota net: fix up truesize of cloned skb in skb_prepare_for_shift() riscv: Get rid of MAX_EARLY_MAPPING_SIZE nbd: handle device refs for DESTROY_ON_DISCONNECT properly mm/hugetlb.c: fix unnecessary address expansion of pmd sharing RDMA/rtrs: Do not signal for heatbeat RDMA/rtrs-clt: Use bitmask to check sess->flags RDMA/rtrs-srv: Do not signal REG_MR tcp: fix tcp_rmem documentation mptcp: do not wakeup listener for MPJ subflows net: bridge: use switchdev for port flags set through sysfs too net/sched: cls_flower: Reject invalid ct_state flags rules net: dsa: tag_rtl4_a: Support also egress tags net: ag71xx: remove unnecessary MTU reservation net: hsr: add support for EntryForgetTime net: psample: Fix netlink skb length with tunnel info net: fix dev_ifsioc_locked() race condition dt-bindings: ethernet-controller: fix fixed-link specification dt-bindings: net: btusb: DT fix s/interrupt-name/interrupt-names/ ASoC: qcom: Remove useless debug print rsi: Fix TX EAPOL packet handling against iwlwifi AP rsi: Move card interrupt handling to RX thread EDAC/amd64: Do not load on family 0x15, model 0x13 staging: fwserial: Fix error 
handling in fwserial_create x86/reboot: Add Zotac ZBOX CI327 nano PCI reboot quirk vt/consolemap: do font sum unsigned wlcore: Fix command execute failure 19 for wl12xx Bluetooth: hci_h5: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for btrtl Bluetooth: btusb: fix memory leak on suspend and resume mt76: mt7615: reset token when mac_reset happens pktgen: fix misuse of BUG_ON() in pktgen_thread_worker() ath10k: fix wmi mgmt tx queue full due to race condition net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant Bluetooth: Add new HCI_QUIRK_NO_SUSPEND_NOTIFIER quirk Bluetooth: Fix null pointer dereference in amp_read_loc_assoc_final_data staging: most: sound: add sanity check for function argument staging: bcm2835-audio: Replace unsafe strcpy() with strscpy() brcmfmac: Add DMI nvram filename quirk for Predia Basic tablet brcmfmac: Add DMI nvram filename quirk for Voyo winpad A15 tablet drm/hisilicon: Fix use-after-free crypto: tcrypt - avoid signed overflow in byte count fs: make unlazy_walk() error handling consistent drm/amdgpu: Add check to prevent IH overflow PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse ASoC: Intel: bytcr_rt5640: Add new BYT_RT5640_NO_SPEAKERS quirk-flag drm/amd/display: Guard against NULL pointer deref when get_i2c_info fails drm/amd/amdgpu: add error handling to amdgpu_virt_read_pf2vf_data media: uvcvideo: Allow entities with no pads f2fs: handle unallocated section and zone on pinned/atgc f2fs: fix to set/clear I_LINKABLE under i_lock nvme-core: add cancel tagset helpers nvme-rdma: add clean action for failed reconnection nvme-tcp: add clean action for failed reconnection ASoC: Intel: Add DMI quirk table to soc_intel_is_byt_cr() btrfs: fix error handling in commit_fs_roots perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[] ASoC: Intel: sof-sdw: indent and add quirks consistently ASoC: Intel: sof_sdw: detect DMIC number based on mach params parisc: Bump 64-bit IRQ stack size to 64 KB sched/features: Fix hrtick 
reprogramming ASoC: Intel: bytcr_rt5640: Add quirk for the Estar Beauty HD MID 7316R tablet ASoC: Intel: bytcr_rt5640: Add quirk for the Voyo Winpad A15 tablet ASoC: Intel: bytcr_rt5651: Add quirk for the Jumper EZpad 7 tablet ASoC: Intel: bytcr_rt5640: Add quirk for the Acer One S1002 tablet scsi: iscsi: Restrict sessions and handles to admin capabilities scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE scsi: iscsi: Verify lengths on passthrough PDUs Xen/gnttab: handle p2m update errors on a per-slot basis xen-netback: respect gnttab_map_refs()'s return value xen: fix p2m size in dom0 for disabled memory hotplug case zsmalloc: account the number of compacted pages correctly remoteproc/mediatek: Fix kernel test robot warning swap: fix swapfile read/write offset powerpc/sstep: Check instruction validity against ISA version before emulation powerpc/sstep: Fix incorrect return from analyze_instr() tty: fix up iterate_tty_read() EOVERFLOW handling tty: fix up hung_up_tty_read() conversion tty: clean up legacy leftovers from n_tty line discipline tty: teach n_tty line discipline about the new "cookie continuations" tty: teach the n_tty ICANON case about the new "cookie continuations" too media: v4l: ioctl: Fix memory leak in video_usercopy ALSA: hda/realtek: Add quirk for Clevo NH55RZQ ALSA: hda/realtek: Add quirk for Intel NUC 10 ALSA: hda/realtek: Apply dual codec quirks for MSI Godlike X570 board net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips Linux 5.10.21 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I52b1105b73d893779b3886b577accfabe9f83a16	2021-03-07 12:53:30 +01:00
Jens Axboe	04b049ac9c	swap: fix swapfile read/write offset commit caf6912f3f4af7232340d500a4a2008f81b93f14 upstream. We're not factoring in the start of the file for where to write and read the swapfile, which leads to very unfortunate side effects of writing where we should not be... Fixes: `dd6bd0d9c7` ("swap: use bdev_read_page() / bdev_write_page()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-03-07 12:34:15 +01:00
Will Deacon	cab48b24a8	BACKPORT: FROMGIT: mm: Use static initialisers for immutable fields of 'struct vm_fault' In preparation for const-ifying the anonymous struct field of 'struct vm_fault', ensure that it is initialised using designated initialisers. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will@kernel.org> Change-Id: Ib2c84bbc4d59fe1811465e59c89f8eb7f73e6229 Bug: 171278850 (cherry picked from commit 8c63ca5bc3e19f11128e8e285dcf20aac6768f97 https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/faultaround) Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>	2021-02-11 12:15:40 +00:00
Greg Kroah-Hartman	39564d70ad	Merge 5.10.12 into android12-5.10 Changes in 5.10.12 gpio: mvebu: fix pwm .get_state period calculation Revert "mm/slub: fix a memory leak in sysfs_slab_add()" futex: Ensure the correct return value from futex_lock_pi() futex: Replace pointless printk in fixup_owner() futex: Provide and use pi_state_update_owner() rtmutex: Remove unused argument from rt_mutex_proxy_unlock() futex: Use pi_state_update_owner() in put_pi_state() futex: Simplify fixup_pi_state_owner() futex: Handle faults correctly for PI futexes HID: wacom: Correct NULL dereference on AES pen proximity HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices media: Revert "media: videobuf2: Fix length check for single plane dmabuf queueing" media: v4l2-subdev.h: BIT() is not available in userspace RDMA/vmw_pvrdma: Fix network_hdr_type reported in WC iwlwifi: dbg: Don't touch the tlv data kernel/io_uring: cancel io_uring before task works io_uring: inline io_uring_attempt_task_drop() io_uring: add warn_once for io_uring_flush() io_uring: stop SQPOLL submit on creator's death io_uring: fix null-deref in io_disable_sqo_submit io_uring: do sqo disable on install_fd error io_uring: fix false positive sqo warning on flush io_uring: fix uring_flush in exit_files() warning io_uring: fix skipping disabling sqo on exec io_uring: dont kill fasync under completion_lock io_uring: fix sleeping under spin in __io_clean_op objtool: Don't fail on missing symbol table mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint mm: fix a race on nr_swap_pages tools: Factor HOSTCC, HOSTLD, HOSTAR definitions printk: fix buffer overflow potential for print_text() printk: fix string termination for record_print_text() Linux 5.10.12 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6d96ec78494ebbc0daf4fdecfc13e522c6bd6b42	2021-01-30 14:29:02 +01:00
Zhaoyang Huang	f472a59aa1	mm: fix a race on nr_swap_pages commit b50da6e9f42ade19141f6cf8870bb2312b055aa3 upstream. "Free swap = -4kB" appears on my system; it is caused by several get_swap_pages calls racing with each other while show_swap_cache_info runs simultaneously. No need to add a lock on get_swap_page_of_type as we remove "Presub/PosAdd" here. Race sequence: ProcessA: ngoals = 1; ProcessB: ngoals = 1; ProcessA: avail = nr_swap_pages(1); ProcessB: avail = nr_swap_pages(1); ProcessA: nr_swap_pages(1) -= ngoals; ProcessB: nr_swap_pages(0) -= ngoals; ProcessC: nr_swap_pages = -1. Link: https://lkml.kernel.org/r/1607050340-4535-1-git-send-email-zhaoyang.huang@unisoc.com Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-30 13:55:19 +01:00
Vijayanand Jitta	5e07d2eb08	ANDROID: mm: Export si_swapinfo Export si_swapinfo symbol which is used as part of meminfo collection from minidump module. Bug: 176277894 Change-Id: I5dc1672ce649c22dc33d4a544ee5a38f8376becf Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>	2021-01-11 06:05:40 +00:00
Qian Cai	b11a76b37a	mm/swapfile: do not sleep with a spin lock held We can't call kvfree() with a spin lock held, so defer it. Fixes a might_sleep() runtime warning. Fixes: `873d7bcfd0` ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation") Signed-off-by: Qian Cai <qcai@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-12-06 10:19:07 -08:00
Miaohe Lin	822bca52ee	mm/swapfile.c: fix potential memory leak in sys_swapon If we failed to drain inode, we would forget to free the swap address space allocated by init_swap_address_space() above. Fixes: `dc617f29db` ("vfs: don't allow writes to swap files") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lkml.kernel.org/r/20200930101803.53884-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-13 18:38:30 -07:00
Miaohe Lin	7a3d52e45e	mm/swapfile.c: remove unnecessary goto out in _swap_info_get() It's unnecessary to goto the out label while out label is just below. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200930102549.1885-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-13 18:38:30 -07:00
Yu Zhao	cc2828b21c	mm: remove activate_page() from unuse_pte() We don't initially add anon pages to active lruvec after commit `b518154e59` ("mm/vmscan: protect the workingset on anonymous LRU"). Remove activate_page() from unuse_pte(), which seems to be missed by the commit. And make the function static while we are at it. Before the commit, we called lru_cache_add_active_or_unevictable() to add new ksm pages to active lruvec. Therefore, activate_page() wasn't necessary for them in the first place. Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-13 18:38:30 -07:00
Gao Xiang	3264631548	swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS for now. Otherwise it will directly submit IO to blockdev according to swapfile extents reported by filesystems in advance. As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's rename to SWP_FS_OPS. [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.org Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-10-13 18:38:29 -07:00
Linus Torvalds	3ad11d7ac8	Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, separating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...	2020-10-13 12:12:44 -07:00
Linus Torvalds	6734e20e39	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's quite a lot of code here, but much of it is due to the addition of a new PMU driver as well as some arm64-specific selftests which is an area where we've traditionally been lagging a bit. In terms of exciting features, this includes support for the Memory Tagging Extension which narrowly missed 5.9, hopefully allowing userspace to run with use-after-free detection in production on CPUs that support it. Work is ongoing to integrate the feature with KASAN for 5.11. Another change that I'm excited about (assuming they get the hardware right) is preparing the ASID allocator for sharing the CPU page-table with the SMMU. Those changes will also come in via Joerg with the IOMMU pull. We do stray outside of our usual directories in a few places, mostly due to core changes required by MTE. Although much of this has been Acked, there were a couple of places where we unfortunately didn't get any review feedback. Other than that, we ran into a handful of minor conflicts in -next, but nothing that should pose any issues. Summary: - Userspace support for the Memory Tagging Extension introduced by Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11. - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context switching. - Fix and subsequent rewrite of our Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. - Support for the Armv8.3 Pointer Authentication enhancements. - Support for ASID pinning, which is required when sharing page-tables with the SMMU. - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op. - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and also support to handle CPU PMU IRQs as NMIs. - Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. - Implementation of ARCH_STACKWALK for unwinding. - Improve reporting of unexpected kernel traps due to BPF JIT failure. - Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. - Removal of TEXT_OFFSET. - Removal of some unused functions, parameters and prototypes. - Removal of MPIDR-based topology detection in favour of firmware description. - Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. - Cleanups to the SDEI driver in preparation for support in KVM. - Miscellaneous cleanups and refactoring work" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) Revert "arm64: initialize per-cpu offsets earlier" arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 ...	2020-10-12 10:00:51 -07:00
Gao Xiang	4166343058	mm, THP, swap: fix allocating cluster for swapfile by mistake SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS. So, !SWP_FS means non NFS for now, it could be either file backed or device backed. Something similar goes with legacy SWP_FILE. So in order to achieve the goal of the original patch, SWP_BLKDEV should be used instead. FS corruption can be observed with SSD device + XFS + fragmented swapfile due to CONFIG_THP_SWAP=y. I reproduced the issue with the following details: Environment: QEMU + upstream kernel + buildroot + NVMe (2 GB) Kernel config: CONFIG_BLK_DEV_NVME=y CONFIG_THP_SWAP=y Some reproducible steps: mkfs.xfs -f /dev/nvme0n1 mkdir /tmp/mnt mount /dev/nvme0n1 /tmp/mnt bs="32k" sz="1024m" # doesn't matter too much, I also tried 16m xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw mkswap /tmp/mnt/sw swapon /tmp/mnt/sw stress --vm 2 --vm-bytes 600M # doesn't matter too much as well Symptoms: - FS corruption (e.g. checksum failure) - memory corruption at: 0xd2808010 - segfault Fixes: `f0eea189e8` ("mm, THP, swap: Don't allocate huge cluster for file backed swap device") Fixes: `38d8b4e6bd` ("mm, THP, swap: delay splitting THP during swap out") Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-09-26 10:33:57 -07:00
Christoph Hellwig	1cb039f3dc	bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we can't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	a8b456d01c	bdi: remove BDI_CAP_SYNCHRONOUS_IO BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to decide if ->rw_page can be used on a block device. Just check up for the method instead. The only complication is that zram needs a second set of block_device_operations as it can switch between modes that actually support ->rw_page and those that don't. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	21bd900572	mm: split swap_type_of swap_type_of is used for two entirely different purposes: (1) check what swap type a given device/offset corresponds to (2) find the first available swap device that can be written to Mixing both in a single function creates an unreadable mess. Create two separate functions instead, and switch both to pass a dev_t instead of a struct block_device to further simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-23 10:43:19 -06:00
Christoph Hellwig	ef16e1d98c	mm: cleanup claim_swapfile Use blkdev_get_by_dev instead of bdgrab + blkdev_get. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-23 10:43:19 -06:00
Steven Price	8a84802e2a	mm: Add arch hooks for saving/restoring tags Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to every physical page, when swapping pages out to disk it is necessary to save these tags, and later restore them when reading the pages back. Add some hooks along with dummy implementations to enable the arch code to handle this. Three new hooks are added to the swap code: * arch_prepare_to_swap() and * arch_swap_invalidate_page() / arch_swap_invalidate_area(). One new hook is added to shmem: * arch_swap_restore() Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: add unlock_page() on the error path] [catalin.marinas@arm.com: dropped the _tags suffix] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>	2020-09-04 12:46:07 +01:00
Qian Cai	a449bf58e4	mm/swapfile: fix and annotate various data races swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could be accessed concurrently separately as noticed by KCSAN, === si.highest_bit === write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24: swap_range_alloc+0x81/0x130 swap_range_alloc at mm/swapfile.c:681 scan_swap_map_slots+0x371/0xb90 get_swap_pages+0x39d/0x5c0 get_swap_page+0xf2/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70: scan_swap_map_slots+0x4a6/0xb90 scan_swap_map_slots at mm/swapfile.c:892 get_swap_pages+0x39d/0x5c0 get_swap_page+0xf2/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 Reported by Kernel Concurrency Sanitizer on: CPU: 70 PID: 6672 Comm: oom01 Tainted: G W L 5.5.0-next-20200205+ #3 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 === si.swap_map[offset] === write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86: __swap_entry_free_locked+0x8c/0x100 __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4) __swap_entry_free.constprop.20+0x69/0xb0 free_swap_and_cache+0x53/0xa0 unmap_page_range+0x7f8/0x1d70 unmap_single_vma+0xcd/0x170 unmap_vmas+0x18b/0x220 exit_mmap+0xee/0x220 mmput+0x10e/0x270 do_exit+0x59b/0xf40 do_group_exit+0x8b/0x180 read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20: _swap_info_get+0x81/0xa0 _swap_info_get at mm/swapfile.c:1140 free_swap_and_cache+0x40/0xa0 unmap_page_range+0x7f8/0x1d70 unmap_single_vma+0xcd/0x170 unmap_vmas+0x18b/0x220 exit_mmap+0xee/0x220 mmput+0x10e/0x270 do_exit+0x59b/0xf40 do_group_exit+0x8b/0x180 === si.flags === write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23: scan_swap_map_slots+0x6fe/0xb50 scan_swap_map_slots at mm/swapfile.c:887 get_swap_pages+0x39d/0x5c0 get_swap_page+0x377/0x524 add_to_swap+0xe4/0x1c0 shrink_page_list+0x1795/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63: _swap_info_get+0x41/0xa0 __swap_info_get at mm/swapfile.c:1114 put_swap_page+0x84/0x490 __remove_mapping+0x384/0x5f0 shrink_page_list+0xff1/0x2870 shrink_inactive_list+0x316/0x880 shrink_lruvec+0x8dc/0x1380 shrink_node+0x317/0xd80 do_try_to_free_pages+0x1f7/0xa10 try_to_free_pages+0x26c/0x5e0 __alloc_pages_slowpath+0x458/0x1290 The writes are under si->lock but the reads are not. For si.highest_bit and si.swap_map[offset], data race could trigger logic bugs, so fix them by having WRITE_ONCE() for the writes and READ_ONCE() for the reads except those isolated reads where they compare against zero which a data race would cause no harm. Thus, annotate them as intentional data races using the data_race() macro. For si.flags, the readers are only interested in a single bit where a data race there would cause no issue there. [cai@lca.pw: add a missing annotation for si->flags in memory.c] Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-14 19:56:57 -07:00
Matthew Wilcox (Oracle)	6c357848b4	mm: replace hpage_nr_pages with thp_nr_pages The thp prefix is more frequently used than hpage and we should be consistent between the various functions. [akpm@linux-foundation.org: fix mm/migrate.c] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-14 19:56:56 -07:00
Joonsoo Kim	3852f6768e	mm/swapcache: support to handle the shadow entries Workingset detection for anonymous page will be implemented in the following patch and it requires to store the shadow entries into the swapcache. This patch implements an infrastructure to store the shadow entry in the swapcache. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/1595490560-15117-5-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:55 -07:00
Joonsoo Kim	b518154e59	mm/vmscan: protect the workingset on anonymous LRU In current implementation, newly created or swap-in anonymous page is started on active list. Growing active list results in rebalancing active/inactive list so old pages on active list are demoted to inactive list. Hence, the page on active list isn't protected at all. Following is an example of this situation. Assume that 50 hot pages on active list. Numbers denote the number of pages on active/inactive list (active \| inactive). 1. 50 hot pages on active list 50(h) \| 0 2. workload: 50 newly created (used-once) pages 50(uo) \| 50(h) 3. workload: another 50 newly created (used-once) pages 50(uo) \| 50(uo), swap-out 50(h) This patch tries to fix this issue. Like as file LRU, newly created or swap-in anonymous pages will be inserted to the inactive list. They are promoted to active list if enough reference happens. This simple modification changes the above example as following. 1. 50 hot pages on active list 50(h) \| 0 2. workload: 50 newly created (used-once) pages 50(h) \| 50(uo) 3. workload: another 50 newly created (used-once) pages 50(h) \| 50(uo), swap-out 50(uo) As you can see, hot pages on active list would be protected. Note that, this implementation has a drawback that the page cannot be promoted and will be swapped-out if re-access interval is greater than the size of inactive list but less than the size of total(active+inactive). To solve this potential issue, following patch will apply workingset detection similar to the one that's already applied to file LRU. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Link: http://lkml.kernel.org/r/1595490560-15117-3-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-08-12 10:57:55 -07:00
Christoph Hellwig	e556f6ba10	block: remove the bd_queue field from struct block_device Just use bd_disk->queue instead. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-01 08:08:20 -06:00
Michel Lespinasse	d8ed45c5dc	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock \| -down_write +mmap_write_lock \| -down_write_killable +mmap_write_lock_killable \| -down_write_trylock +mmap_write_trylock \| -up_write +mmap_write_unlock \| -downgrade_write +mmap_write_downgrade \| -down_read +mmap_read_lock \| -down_read_killable +mmap_read_lock_killable \| -down_read_trylock +mmap_read_trylock \| -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-09 09:39:14 -07:00
Mike Rapoport	e31cf2f4ca	mm: don't include asm/pgtable.h if linux/mm.h is already included Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definitions of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always the possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are removed with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-09 09:39:13 -07:00
Johannes Weiner	4c6355b25e	mm: memcontrol: charge swapin pages on instantiation Right now, users that are otherwise memory controlled can easily escape their containment and allocate significant amounts of memory that they're not being charged for. That's because swap readahead pages are not being charged until somebody actually faults them into their page table. This can be exploited with MADV_WILLNEED, which triggers arbitrary readahead allocations without charging the pages. There are additional problems with the delayed charging of swap pages: 1. To implement refault/workingset detection for anonymous pages, we need to have a target LRU available at swapin time, but the LRU is not determinable until the page has been charged. 2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be stable when the page is isolated from the LRU; otherwise, the locks change under us. But swapcache gets charged after it's already on the LRU, and even if we cannot isolate it ourselves (since charging is not exactly optional). The previous patch ensured we always maintain cgroup ownership records for swap pages. This patch moves the swapcache charging point from the fault handler to swapin time to fix all of the above problems. v2: simplify swapin error checking (Joonsoo) [hughd@google.com: fix livelock in __read_swap_cache_async()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:48 -07:00
Johannes Weiner	9d82c69438	mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API With the page->mapping requirement gone from memcg, we can charge anon and file-thp pages in one single step, right after they're allocated. This removes two out of three API calls - especially the tricky commit step that needed to happen at just the right time between when the page is "set up" and when it's "published" - somewhat vague and fluid concepts that varied by page type. All we need is a freshly allocated page and a memcg context to charge. v2: prevent double charges on pre-allocated hugepages in khugepaged [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL] Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:48 -07:00
Johannes Weiner	be5d0a74c6	mm: memcontrol: switch to native NR_ANON_MAPPED counter Memcg maintains a private MEMCG_RSS counter. This divergence from the generic VM accounting means unnecessary code overhead, and creates a dependency for memcg that page->mapping is set up at the time of charging, so that page types can be told apart. Convert the generic accounting sites to mod_lruvec_page_state and friends to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED. We use lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the same way we do for NR_FILE_MAPPED. With the previous patch removing MEMCG_CACHE and the private NR_SHMEM counter, this patch finally eliminates the need to have page->mapping set up at charge time. However, we need to have page->mem_cgroup set up by the time rmap runs and does the accounting, so switch the commit and the rmap callbacks around. v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo) Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:47 -07:00
Johannes Weiner	6caa6a0703	mm: memcontrol: move out cgroup swaprate throttling The cgroup swaprate throttling is about matching new anon allocations to the rate of available IO when that is being throttled. It's the io controller hooking into the VM, rather than a memory controller thing. Rename mem_cgroup_throttle_swaprate() to cgroup_throttle_swaprate(), and drop the @memcg argument which is only used to check whether the preceding page charge has succeeded and the fault is proceeding. We could decouple the call from mem_cgroup_try_charge() here as well, but that would cause unnecessary churn: the following patches convert all callsites to a new charge API and we'll decouple as we go along. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-5-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:47 -07:00
Johannes Weiner	3fba69a56e	mm: memcontrol: drop @compound parameter from memcg charging API The memcg charging API carries a boolean @compound parameter that tells whether the page we're dealing with is a hugepage. mem_cgroup_commit_charge() has another boolean @lrucare that indicates whether the page needs LRU locking or not while charging. The majority of callsites know those parameters at compile time, which results in a lot of naked "false, false" argument lists. This makes for cryptic code and is a breeding ground for subtle mistakes. Thankfully, the huge page state can be inferred from the page itself and doesn't need to be passed along. This is safe because charging completes before the page is published and somebody may split it. Simplify the callsites by removing @compound, and let memcg infer the state by using hpage_nr_pages() unconditionally. That function does PageTransHuge() to identify huge pages, which also helpfully asserts that nobody passes in tail pages by accident. The following patches will introduce a new charging API, best not to carry over unnecessary weight. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:47 -07:00
Randy Dunlap	6f7939405f	mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment Fix the heading and Size/Used/Priority field alignments in /proc/swaps. If the Size and/or Used value is >= 10000000 (8 bytes), then the alignment by using tab characters is broken. This patch maintains the use of tabs for alignment. If spaces are preferred, we can just use a Field Width specifier for the bytes and inuse fields. That way those fields don't have to be a multiple of 8 bytes in width. E.g., with a field width of 12, both Size and Used would always fit on the first line of an 80-column wide terminal (only Priority would be on the second line). There are actually 2 problems: heading alignment and field width. On an xterm, if Used is 7 bytes in length, the tab does nothing, and the display is like this, with no space/tab between the Used and Priority fields. (ugh) Filename Type Size Used Priority /dev/sda8 partition 16779260 2023012-1 To be clear, if one does 'cat /proc/swaps >/tmp/proc.swaps', it does look different, like so: Filename Type Size Used Priority /dev/sda8 partition 16779260 2086988 -1 Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: http://lkml.kernel.org/r/c0ffb41a-81ac-ddfa-d452-a9229ecc0387@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
Huang Ying	4907058881	swap: reduce lock contention on swap cache from swap slots allocation In some swap scalability test, it is found that there are heavy lock contention on swap cache even if we have split one swap cache radix tree per swap device to one swap cache radix tree every 64 MB trunk in commit `4b3ef9daa4` ("mm/swap: split swap cache into 64MB trunks"). The reason is as follow. After the swap device becomes fragmented so that there's no free swap cluster, the swap device will be scanned linearly to find the free swap slots. swap_info_struct->cluster_next is the next scanning base that is shared by all CPUs. So nearby free swap slots will be allocated for different CPUs. The probability for multiple CPUs to operate on the same 64 MB trunk is high. This causes the lock contention on the swap cache. To solve the issue, in this patch, for SSD swap device, a percpu version next scanning base (cluster_next_cpu) is added. Every CPU will use its own per-cpu next scanning base. And after finishing scanning a 64MB trunk, the per-cpu scanning base will be changed to the beginning of another randomly selected 64MB trunk. In this way, the probability for multiple CPUs to operate on the same 64 MB trunk is reduced greatly. Thus the lock contention is reduced too. For HDD, because sequential access is more important for IO performance, the original shared next scanning base is used. To test the patch, we have run 16-process pmbench memory benchmark on a 2-socket server machine with 48 cores. One ram disk is configured as the swap device per socket. The pmbench working-set size is much larger than the available memory so that swapping is triggered. The memory read/write ratio is 80/20 and the accessing pattern is random. In the original implementation, the lock contention on the swap cache is heavy. 
The perf profiling data of the lock contention code path is as following, _raw_spin_lock_irq.add_to_swap_cache.add_to_swap.shrink_page_list: 7.91 _raw_spin_lock_irqsave.__remove_mapping.shrink_page_list: 7.11 _raw_spin_lock.swapcache_free_entries.free_swap_slot.__swap_entry_free: 2.51 _raw_spin_lock_irqsave.swap_cgroup_record.mem_cgroup_uncharge_swap: 1.66 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 1.29 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.03 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 0.93 After applying this patch, it becomes, _raw_spin_lock.swapcache_free_entries.free_swap_slot.__swap_entry_free: 3.58 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 2.3 _raw_spin_lock_irqsave.swap_cgroup_record.mem_cgroup_uncharge_swap: 2.26 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 1.8 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.19 The lock contention on the swap cache is almost eliminated. And the pmbench score increases 18.5%. The swapin throughput increases 18.7% from 2.96 GB/s to 3.51 GB/s. While the swapout throughput increases 18.5% from 2.99 GB/s to 3.54 GB/s. We need really fast disk to show the benefit. I have tried this on 2 Intel P3600 NVMe disks. The performance improvement is only about 1%. The improvement should be better on the faster disks, such as Intel Optane disk. 
[ying.huang@intel.com: fix cluster_next_cpu allocation and freeing, per Daniel] Link: http://lkml.kernel.org/r/20200525002648.336325-1-ying.huang@intel.com [ying.huang@intel.com: v4] Link: http://lkml.kernel.org/r/20200529010840.928819-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200520031502.175659-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
Huang Ying	09fe06ce0b	mm/swapfile.c: use prandom_u32_max() To improve the code readability and take advantage of the common implementation. Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200512081013.520201-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
Wei Yang	33e16272fe	mm/swapfile.c: __swap_entry_free() always free 1 entry __swap_entry_free() always frees 1 entry. Let's remove the usage. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501015259.32237-2-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
Huang Ying	ed43af1097	swap: try to scan more free slots even when fragmented Now, the scalability of swap code will drop much when the swap device becomes fragmented, because the swap slots allocation batching stops working. To solve the problem, in this patch, we will try to scan a little more swap slots with restricted effort to batch the swap slots allocation even if the swap device is fragmented. Test shows that the benchmark score can increase up to 37.1% with the patch. Details are as follows. The swap code has a per-cpu cache of swap slots. These batch swap space allocations to improve swap subsystem scaling. In the following code path, add_to_swap() get_swap_page() refill_swap_slots_cache() get_swap_pages() scan_swap_map_slots() scan_swap_map_slots() and get_swap_pages() can return multiple swap slots for each call. These slots will be cached in the per-CPU swap slots cache, so that several following swap slot requests will be fulfilled there to avoid the lock contention in the lower level swap space allocation/freeing code path. But this only works when there are free swap clusters. If a swap device becomes so fragmented that there's no free swap clusters, scan_swap_map_slots() and get_swap_pages() will return only one swap slot for each call in the above code path. Effectively, this falls back to the situation before the swap slots cache was introduced, the heavy lock contention on the swap related locks kills the scalability. Why does it work in this way? Because the swap device could be large, and the free swap slot scanning could be quite time consuming, to avoid taking too much time to scanning free swap slots, the conservative method was used. In fact, this can be improved via scanning a little more free slots with strictly restricted effort. Which is implemented in this patch. In scan_swap_map_slots(), after the first free swap slot is gotten, we will try to scan a little more, but only if we haven't scanned too many slots (< LATENCY_LIMIT). 
That is, the added scanning latency is strictly restricted. To test the patch, we have run 16-process pmbench memory benchmark on a 2-socket server machine with 48 cores. Multiple ram disks are configured as the swap devices. The pmbench working-set size is much larger than the available memory so that swapping is triggered. The memory read/write ratio is 80/20 and the accessing pattern is random, so the swap space becomes highly fragmented during the test. In the original implementation, the lock contention on swap related locks is very heavy. The perf profiling data of the lock contention code path is as following, _raw_spin_lock.get_swap_pages.get_swap_page.add_to_swap: 21.03 _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 1.92 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 1.72 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 0.69 While after applying this patch, it becomes, _raw_spin_lock_irq.shrink_inactive_list.shrink_lruvec.shrink_node: 4.89 _raw_spin_lock_irq.shrink_active_list.shrink_lruvec.shrink_node: 3.85 _raw_spin_lock.free_pcppages_bulk.drain_pages_zone.drain_pages: 1.1 _raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.do_swap_page: 0.88 That is, the lock contention on the swap locks is eliminated. And the pmbench score increases 37.1%. The swapin throughput increases 45.7% from 2.02 GB/s to 2.94 GB/s. While the swapout throughput increases 45.3% from 2.04 GB/s to 2.97 GB/s. Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200427030023.264780-1-ying.huang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
Wei Yang	7b9e2de130	mm/swapfile.c: omit a duplicate code by compare tmp and max first There are two duplicate code to handle the case when there is no available swap entry. To avoid this, we can compare tmp and max first and let the second guard do its job. No functional change is expected. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/r/20200421213824.8099-3-richard.weiyang@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:09 -07:00
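
Commit 6f7939405f keeps tabs for alignment in /proc/swaps, but its message notes that a field width specifier would work if spaces were preferred. A minimal userspace sketch of that alternative follows. It is not the kernel code: `format_swap_row` and the 40/12/8 column widths for Filename, Type and Priority are assumptions made for illustration; only the width of 12 for Size and Used comes from the commit message.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): format one /proc/swaps-style
 * row with fixed field widths instead of tabs. Size and Used use a width of
 * 12, as suggested in the commit message; the other widths are assumed.
 */
static int format_swap_row(char *buf, size_t len, const char *name,
                           const char *type, unsigned long size_kb,
                           unsigned long used_kb, int prio)
{
    /* Left-align the text columns, right-align the numeric columns. */
    return snprintf(buf, len, "%-40s%-12s%12lu%12lu%8d\n",
                    name, type, size_kb, used_kb, prio);
}
```

With fixed widths, a 7-digit Used value can no longer run into the Priority column, which is the symptom the commit message shows for the tab-based layout.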

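
The per-CPU scanning base described in commit 4907058881 can be sketched in userspace C as below. This is a simplified illustration under stated assumptions, not the kernel implementation: the struct, the function `set_next_sketch`, the fixed CPU count, and the caller-supplied random value `rnd` are all hypothetical, and a 4 KiB page size is assumed so that a 64 MB trunk is 2^14 pages.

```c
#include <stdbool.h>

#define TRUNK_SHIFT    14                    /* 2^14 pages of 4 KiB = 64 MB (assumed page size) */
#define TRUNK_PAGES    (1UL << TRUNK_SHIFT)
#define NR_CPUS_SKETCH 4                     /* assumed CPU count for the sketch */

/* Hypothetical, simplified stand-in for the swap device state. */
struct swap_dev_sketch {
    bool solid_state;                                /* SSD: use per-CPU bases */
    unsigned long lowest_bit, highest_bit;           /* range that may hold free slots */
    unsigned long cluster_next;                      /* shared scanning base (HDD) */
    unsigned long cluster_next_cpu[NR_CPUS_SKETCH];  /* per-CPU scanning bases (SSD) */
};

/*
 * Record the next scanning base for `cpu`. On an SSD, when the new base
 * crosses into a different 64 MB trunk, jump to the start of a trunk chosen
 * from `rnd` so that CPUs tend to work on different trunks. On an HDD, keep
 * the single shared base to preserve sequential access.
 */
static void set_next_sketch(struct swap_dev_sketch *si, int cpu,
                            unsigned long next, unsigned long rnd)
{
    unsigned long prev;

    if (!si->solid_state) {
        si->cluster_next = next;
        return;
    }

    prev = si->cluster_next_cpu[cpu];
    if ((prev >> TRUNK_SHIFT) != (next >> TRUNK_SHIFT)) {
        if (si->highest_bit <= si->lowest_bit)
            return;                          /* no usable range to pick from */
        next = si->lowest_bit + rnd % (si->highest_bit - si->lowest_bit + 1);
        next &= ~(TRUNK_PAGES - 1);          /* align down to the trunk start */
        if (next < si->lowest_bit)
            next = si->lowest_bit;
    }
    si->cluster_next_cpu[cpu] = next;
}
```

Passing the random draw in as `rnd` keeps the sketch deterministic and testable; a caller would supply a value from whatever random source it has available.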

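
The bounded extra scanning described in commit ed43af1097 (keep looking for more free slots after the first one, but only within a strict budget) can be sketched as follows. This is a minimal userspace illustration, not the kernel's scan_swap_map_slots(): the function name, the byte-per-slot map, and the `SCAN_BUDGET` constant standing in for LATENCY_LIMIT (with 256 as an assumed value) are hypothetical.

```c
#include <stddef.h>

#define SCAN_BUDGET 256   /* stands in for LATENCY_LIMIT; the value is assumed */

/*
 * Scan `map` (0 = free slot, nonzero = in use) from `start` and collect up
 * to `want` free slot indices into `out`. The search for the first free slot
 * is not limited here, but once one slot has been found, scanning continues
 * only while fewer than SCAN_BUDGET slots have been examined in total.
 * Returns the number of slots collected.
 */
static size_t scan_free_slots_sketch(const unsigned char *map, size_t map_len,
                                     size_t start, size_t want, size_t *out)
{
    size_t found = 0;
    size_t scanned = 0;

    for (size_t i = start; i < map_len && found < want; i++) {
        /* After the first hit, stop as soon as the budget is used up. */
        if (found > 0 && scanned >= SCAN_BUDGET)
            break;
        scanned++;
        if (map[i] == 0)
            out[found++] = i;
    }
    return found;
}
```

On a fragmented map this returns several slots per call when they are close together, which is what lets a per-CPU slot cache be refilled in batches, while the budget caps the added scanning cost.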
400 Commits