android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Raghavendra Rao Ananta	c249464a77	Merge remote-tracking branch 'remotes/origin/tmp-aefd2d632eb0' into msm-lahaina * remotes/origin/tmp-aefd2d632eb0: ANDROID: binder: fix sleeping from invalid function caused by RT inheritance FROMGIT: scsi: ufs: override auto suspend tunables for ufs FROMGIT: scsi: core: allow auto suspend override by low-level driver Linux 5.4-rc6 net: fix installing orphaned programs net: cls_bpf: fix NULL deref on offload filter removal selftests: bpf: Skip write only files in debugfs selftests: net: reuseport_dualstack: fix uninitalized parameter r8169: fix wrong PHY ID issue with RTL8168dp net: dsa: bcm_sf2: Fix IMP setup for port different than 8 net: phylink: Fix phylink_dbg() macro gve: Fixes DMA synchronization. inet: stop leaking jiffies on the wire ixgbe: Remove duplicate clear_bit() call Documentation: networking: device drivers: Remove stray asterisks e1000: fix memory leaks i40e: Fix receive buffer starvation for AF_XDP igb: Fix constant media auto sense switching when no cable is connected net: ethernet: arc: add the missed clk_disable_unprepare ANDROID: gki_defconfig: enable CONFIG_KEYBOARD_GPIO NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() NFSv4: Don't allow a cached open with a revoked delegation arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core arm64: Brahma-B53 is SSB and spectre v2 safe arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core igb: Enable media autosense for the i350. 
igb/igc: Don't warn on fatal read failures when the device is removed tcp: increase tcp_max_syn_backlog max value net: increase SOMAXCONN to 4096 netdevsim: Fix use-after-free during device dismantle rxrpc: Fix handling of last subpacket of jumbo packet usb: dwc3: gadget: fix race when disabling ep with cancelled xfers ANDROID: Remove KVM_INTEL allmodconfig workaround ANDROID: Fix x86_64 allmodconfig build iocost: don't nest spin_lock_irq in ioc_weight_write() s390/idle: fix cpu idle time calculation s390/unwind: fix mixing regs and sp s390/cmm: fix information leak in cmm_timeout_handler() arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active kvm: call kvm_arch_destroy_vm if vm creation fails efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN x86, efi: Never relocate kernel below lowest acceptable address efi: libstub/arm: Account for firmware reserved memory at the base of RAM efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness efi/tpm: Return -EINVAL when determining tpm final events log size fails efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only hv_netvsc: Fix error handling in netvsc_attach() hv_netvsc: Fix error handling in netvsc_set_features() cxgb4: fix panic when attaching to ULD fail net: annotate lockless accesses to sk->sk_napi_id FROMLIST: scsi: ufs-qcom: enter and exit hibern8 during clock scaling FROMLIST: scsi: ufs: export hibern8 entry and exit ANDROID: scsi: ufs: UFS crypto variant operations API ANDROID: gki_defconfig: enable inline encryption FROMLIST: ext4: add inline encryption support FROMLIST: f2fs: add inline encryption support FROMLIST: fscrypt: add inline encryption support FROMLIST: scsi: ufs: Add inline encryption support to UFS FROMLIST: scsi: ufs: UFS crypto API FROMLIST: scsi: ufs: UFS driver v2.1 spec crypto additions FROMLIST: block: blk-crypto for Inline Encryption ANDROID: block: Fix bio_crypt_should_process WARN_ON 
ALSA: timer: Fix mutex deadlock at releasing card FROMLIST: block: Add encryption context to struct bio io_uring: ensure we clear io_kiocb->result before each issue parisc: fix frame pointer in ftrace_regs_caller() net: annotate accesses to sk->sk_incoming_cpu FROMLIST: block: Keyslot Manager for Inline Encryption FROMLIST: f2fs: add support for IV_INO_LBLK_64 encryption policies FROMLIST: ext4: add support for IV_INO_LBLK_64 encryption policies FROMLIST: fscrypt: add support for IV_INO_LBLK_64 policies FROMLIST: docs: ioctl-number: document fscrypt ioctl numbers FROMLIST: fscrypt: zeroize fscrypt_info before freeing FROMLIST: fscrypt: remove struct fscrypt_ctx FROMLIST: fscrypt: invoke crypto API for ESSIV handling mlxsw: core: Unpublish devlink parameters during reload qed: Optimize execution time for nvm attributes configuration. vxlan: fix unexpected failure of vxlan_changelink() qed: fix spelling mistake "queuess" -> "queues" SUNRPC: Destroy the back channel when we destroy the host transport SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding SUNRPC: The TCP back channel mustn't disappear while requests are outstanding drm/amdgpu: enable -msse2 for GCC 7.1+ users drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+ drm/amdgpu: fix stack alignment ABI mismatch for Clang drm/radeon: Fix EEH during kexec drm/amdgpu/gmc10: properly set BANK_SELECT and FRAGMENT_SIZE drm/amdgpu/powerplay/vega10: allow undervolting in p7 dc.c:use kzalloc without test drm/amd/display: setting the DIG_MODE to the correct value. drm/amd/display: Passive DP->HDMI dongle detection fix drm/amd/display: add 50us buffer as WA for pstate switch in active drm/amd/display: Allow inverted gamma drm/amd/display: do not synchronize "drr" displays drm/amdgpu: If amdgpu_ib_schedule fails return back the error. drm/sched: Set error to s_fence if HW job submission failed. 
drm/amdgpu/gfx10: update gfx golden settings for navi12 drm/amdgpu/gfx10: update gfx golden settings for navi14 drm/amdgpu/gfx10: update gfx golden settings drm/amd/display: Change Navi14's DWB flag to 1 drm/amdgpu/sdma5: do not execute 0-sized IBs (v2) drm/amdgpu: Fix SDMA hang when performing VKexample test iwlwifi: fw api: support new API for scan config cmd mt76: dma: fix buffer unmap with non-linear skbs mt76: mt76x2e: disable pcie_aspm by default ALSA: hda - Fix mutex deadlock in HDMI codec driver usb: cdns3: gadget: Fix g_audio use case when connected to Super-Speed host usb: cdns3: gadget: reset EP_CLAIMED flag while unloading gfs2: Fix initialisation of args for remount iommu/vt-d: Fix panic after kexec -p for kdump iommu/amd: Apply the same IVRS IOAPIC workaround to Acer Aspire A315-41 iommu/ipmmu-vmsa: Remove dev_err() on platform_get_irq() failure nl80211: fix validation of mesh path nexthop nl80211: Disallow setting of HT for channel 14 USB: serial: whiteheat: fix line-speed endianness USB: serial: whiteheat: fix potential slab corruption MAINTAINERS: Change to my personal email address drm/i915: Fix PCH reference clock for FDI on HSW/BDW net: rtnetlink: fix a typo fbd -> fdb net/smc: fix refcounting for non-blocking connect() bonding: fix using uninitialized mode_lock net: fec_ptp: Use platform_get_irq_xxx_optional() to avoid error message net: fec_main: Use platform_get_irq_byname_optional() to avoid error message MAINTAINERS: remove Dave Watson as TLS maintainer vxlan: check tun_info options_len properly erspan: fix the tun_info options_len check for erspan net: hisilicon: Fix ping latency when deal with high throughput net/mlx4_core: Dynamically set guaranteed amount of counters per VF net/mlx5e: Initialize on stack link modes bitmap net/mlx5e: Fix ethtool self test: link speed net/mlx5e: Fix handling of compressed CQEs in case of low NAPI budget net/mlx5e: Don't store direct pointer to action's tunnel info net/mlx5: Fix NULL pointer dereference in 
extended destination net/mlx5: Fix rtable reference leak net/mlx5e: Only skip encap flows update when encap init failed net/mlx5e: Replace kfree with kvfree when free vhca stats net/mlx5e: Remove incorrect match criteria assignment line net/mlx5e: Determine source port properly for vlan push action net/mlx5: Fix flow counter list auto bits struct net: mscc: ocelot: refuse to overwrite the port's native vlan net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle drm/i915/tgl: Fix doc not corresponding to code ANDROID: staging: ion: Fix dynamic heap ID assignment ANDROID: Fix typo for FROMLIST: section drm/panfrost: Don't dereference bogus MMU pointers ANDROID: media: increase video max frame number drm/panfrost: fix -Wmissing-prototypes warnings net: hisilicon: Fix "Trying to free already-free IRQ" fjes: Handle workqueue allocation failure Revert "sched: Rework pick_next_task() slow-path" arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003 drm/etnaviv: fix dumping of iommuv2 drm/etnaviv: reinstate MMUv1 command buffer window check drm/etnaviv: fix deadlock in GPU coredump arm64: Ensure VM_WRITE\|VM_SHARED ptes are clean by default um-ubd: Entrust re-queue to the upper layers nvme-multipath: remove unused groups_only mode in ana log nvme-multipath: fix possible io hang after ctrl reconnect powerpc/powernv: Fix CPU idle to be called with IRQs disabled sched/topology: Allow sched_asym_cpucapacity to be disabled sched/topology: Don't try to build empty sched domains USB: gadget: Reject endpoints with 0 maxpacket value powerpc/prom_init: Undo relocation before entering secure mode ANDROID: dummy_cpufreq: Implement get() ANDROID: gki_defconfig: enable CONFIG_CPUSETS ANDROID: virtio: virtio_input: Set the amount of multitouch slots in virtio input scsi: qla2xxx: stop timer in shutdown path hwmon: (ina3221) Fix read timeout issue net: usb: lan78xx: Disable interrupts before calling 
generic_handle_irq() Revert "ANDROID: Revert "kheaders: make headers archive reproducible"" net: dsa: sja1105: improve NET_DSA_SJA1105_TAS dependency net: ethernet: ftgmac100: Fix DMA coherency issue with SW checksum net: fix sk_page_frag() recursion from memory reclaim udp: fix data-race in udp_set_dev_scratch() net: dpaa2: Use the correct style for SPDX License Identifier net: add READ_ONCE() annotation in __skb_wait_for_more_packets() net: use skb_queue_empty_lockless() in busy poll contexts net: use skb_queue_empty_lockless() in poll() handlers udp: use skb_queue_empty_lockless() net: add skb_queue_empty_lockless() ANDROID: gki_defconfig: enable CONFIG_CPU_FREQ_GOV_CONSERVATIVE RDMA/hns: Prevent memory leaks of eq->buf_list RISC-V: Add PCIe I/O BAR memory mapping RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case RDMA/mlx5: Use irq xarray locking for mkey_table ANDROID: fix VIDEOBUF2_CORE dependency in 'allmodconfig' builds UAS: Revert commit `3ae62a4209` ("UAS: fix alignment of scatter/gather segments") usb-storage: Revert commit `747668dbc0` ("usb-storage: Set virt_boundary_mask to avoid SG overflows") usbip: Fix free of unallocated memory in vhci tx usbip: tools: Fix read_usb_vudc_device() error path handling usb: xhci: fix __le32/__le64 accessors in debugfs code usb: xhci: fix Immediate Data Transfer endianness xhci: Fix use-after-free regression in xhci clear hub TT implementation USB: ldusb: fix control-message timeout USB: ldusb: use unsigned size format specifiers USB: ldusb: fix ring-buffer locking USB: Skip endpoints with 0 maxpacket length io_uring: don't touch ctx in setup after ring fd install ANDROID: modpost: fix up merge issues due to namespace removal Revert "ALSA: hda: Flush interrupts on disabling" perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/ perf/x86/uncore: Fix event group support perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/amd/ibs: Fix reading of the IBS OpData 
register and thus precise RIP validity perf/core: Start rejecting the syscall with attr.__reserved_2 set vringh: fix copy direction of vringh_iov_push_kern() vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt' virtio_ring: fix stalls for packed rings riscv: for C functions called only from assembly, mark with __visible riscv: fp: add missing __user pointer annotations riscv: add missing header file includes riscv: mark some code and data as file-static riscv: init: merge split string literals in preprocessor directive riscv: add prototypes for assembly language functions from head.S io_uring: Fix leaked shadow_req fix memory leak in large read decrypt offload Linux 5.4-rc5 usb: cdns3: gadget: Don't manage pullups usb: dwc3: remove the call trace of USBx_GFLADJ usb: gadget: configfs: fix concurrent issue between composite APIs usb: dwc3: pci: prevent memory leak in dwc3_pci_probe usb: gadget: composite: Fix possible double free memory bug usb: gadget: udc: atmel: Fix interrupt storm in FIFO mode. usb: renesas_usbhs: fix type of buf usb: renesas_usbhs: Fix warnings in usbhsg_recip_handler_std_set_device() usb: gadget: udc: renesas_usb3: Fix __le16 warnings usb: renesas_usbhs: fix __le16 warnings usb: cdns3: include host-export,h for cdns3_host_init usb: mtu3: fix missing include of mtu3_dr.h usb: fsl: Check memory resource before releasing it usb: dwc3: select CONFIG_REGMAP_MMIO ANDROID: add README.md selftests: fib_tests: add more tests for metric update ipv4: fix route update on metric change. 
net: Zeroing the structure ethtool_wolinfo in ethtool_get_wol() ALSA: bebob: Fix prototype of helper function to return negative value cxgb4: request the TX CIDX updates to status page netns: fix GFP flags in rtnl_net_notifyid() net: ethernet: Use the correct style for SPDX License Identifier net/smc: keep vlan_id for SMC-R in smc_listen_work() net/smc: fix closing of fallback SMC sockets ANDROID: virt_wifi: Add data ops for scan data simulation ANDROID: Allow DRM_IOCTL_MODE__DUMB for render clients. riscv: cleanup do_trap_break net: hwbm: if CONFIG_NET_HWBM unset, make stub functions static ANDROID: cpufreq: create dummy cpufreq driver net: mvneta: make stub functions static inline net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware nbd: verify socket is supported during setup ata: libahci_platform: Fix regulator_get_optional() misuse nbd: handle racing with error'ed out commands nbd: protect cmd->status with cmd->lock Revert "ANDROID: x86: Remove a useless warning message" ANDROID: init: GKI: enable hidden configs for media ANDROID: gki_defconfig: add FORTIFY_SOURCE, remove SPMI_MSM_PMIC_ARB Revert "Revert "Revert "Revert "x86/mm: Identify the end of the kernel area to be reserved"""" build.config.: Link android-mainline kernels with LLD ANDROID: ALSA: jack: Update supported jack switch types ANDROID: ASoC: compress: fix unsigned integer overflow check io_uring: fix bad inflight accounting for SETUP_IOPOLL\|SETUP_SQTHREAD io_uring: used cached copies of sq->dropped and cq->overflow FROMLIST: iommu: Export core IOMMU functions to kernel modules FROMLIST: PCI: Export PCI ACS and DMA searching functions to modules FROMLIST: of: Export of_phandle_iterator_args() to modules ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 io_uring: Fix race for sqes with userspace io_uring: Fix broken links with offloading io_uring: Fix corrupted user_data xen: issue deprecation warning for 32-bit pv guest kvm: Allocate memslots and buses before calling 
kvm_arch_init_vm powerpc/powernv/eeh: Fix oops when probing cxl devices irqchip/sifive-plic: Skip contexts except supervisor in plic_init() ANDROID: soc: qcom: Add required header to irq.h ACPI: processor: Add QoS requests for all CPUs cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs CIFS: Fix use after free of file info structures CIFS: Fix retry mid list corruption on reconnects scsi: sd: define variable dif as unsigned int instead of bool ANDROID: v4l2-compat-ioctl32.c: copy reserved fields scsi: target: cxgbit: Fix cxgbit_fw4_ack() IB/core: Avoid deadlock during netlink message handling ANDROID: of: property: Enable of_devlink by default virt_wifi: fix refcnt leak in module exit routine net: remove unnecessary variables and callback vxlan: add adjacent link to limit depth level net: core: add ignore flag to netdev_adjacent structure macsec: fix refcnt leak in module exit routine team: fix nested locking lockdep warning bonding: use dynamic lockdep key instead of subclass bonding: fix unexpected IFF_BONDING bit unset net: core: add generic lockdep keys net: core: limit nested device depth keys: Fix memory leak in copy_net_ns ANDROID: of: property: Make sure child dependencies don't block probing of parent ANDROID: driver core: Allow fwnode_operations.add_links to differentiate errors ANDROID: driver core: Allow a device to wait on optional suppliers ANDROID: driver core: Add device link support for SYNC_STATE_ONLY flag FROMGIT: docs: driver-model: Add documentation for sync_state FROMGIT: driver: core: Improve documentation for fwnode_operations.add_links() FROMGIT: of: property: Minor code formatting/style clean ups i2c: stm32f7: remove warning when compiling with W=1 i2c: stm32f7: fix a race in slave mode with arbitration loss irq i2c: stm32f7: fix first byte to send in slave mode i2c: mt65xx: fix NULL ptr dereference RDMA/nldev: Skip counter if port doesn't match irqchip/gic-v3-its: Use the exact ITSList for VMOVP FROMLIST: drivers: pinctrl: 
msm: setup GPIO chip in hierarchy FROMLIST: drivers: irqchip: pdc: Add irqchip set/get state calls FROMLIST: genirq: Introduce irq_chip_get/set_parent_state calls FROMLIST: drivers: irqchip: pdc: additionally set type in SPI config registers FROMLIST: dt-bindings/interrupt-controller: pdc: add SPI config register FROMLIST: of: irq: document properties for wakeup interrupt parent FROMLIST: drivers: irqchip: add PDC irqdomain for wakeup capable GPIOs FROMLIST: drivers: irqchip: pdc: Do not toggle IRQ_ENABLE during mask/unmask FROMLIST: drivers: irqchip: qcom-pdc: update max PDC interrupts FROMLIST: irqdomain: add bus token DOMAIN_BUS_WAKEUP gfs2: Fix memory leak when gfs2meta's fs_context is freed ALSA: hda/realtek - Fix 2 front mics of codec 0x623 ALSA: hda/realtek - Add support for ALC623 ALSA: usb-audio: Add DSD support for Gustard U16/X26 USB Interface netfilter: nft_payload: fix missing check for matching length in offloads ipvs: move old_secure_tcp into struct netns_ipvs ipvs: don't ignore errors in case refcounting ip_vs module fails ANDROID: drop patches/ symbolic link mfd: mt6397: Fix probe after changing mt6397-core net: phy: smsc: LAN8740: add PHY_RST_AFTER_CLK_EN flag MIPS: tlbex: Fix build_restore_pagemask KScratch restore io_uring: correct timeout req sequence when inserting a new entry io_uring : correct timeout req sequence when waiting timeout io_uring: revert "io_uring: optimize submit_and_wait API" MIPS: bmips: mark exception vectors as char arrays xsk: Fix registration of Rx-only sockets net/flow_dissector: switch to siphash MAINTAINERS: Update the Spreadtrum SoC maintainer ANDROID: Revert "ANDROID: Removed check for asm-goto" riscv: cleanup <asm/bug.h> riscv: Fix undefined reference to vmemmap_populate_basepages riscv: Fix implicit declaration of 'page_to_section' riscv: fix fs/proc/kcore.c compilation with sparsemem enabled ANDROID: sdcardfs: evict dentries on fscrypt key removal ANDROID: fscrypt: add key removal notifier chain ANDROID: move up 
spin_unlock_bh() ahead of remove_proc_entry() ANDROID: Kconfig.gki: Add hidden MMC config support ANDROID: Kconfig.gki: Add Hidden QCOM configs ANDROID: Kconfig.gki: Add SND_PCM_ELD to HIDDEN_DRM configs ANDROID: Kconfig.gki: Add extra audio selections ANDROID: Kconfig.gki: Add extra GKI_HIDDEN_REGMAP_CONFIGS selections ANDROID: Four part re-add of asm-goto usage [4/4] ANDROID: Four part re-add of asm-goto usage [3/4] ANDROID: Four part re-add of asm-goto usage [2/4] ANDROID: Four part re-add of asm-goto usage [1/4] ANDROID: Move out patches/ and replace by link to kernel/common-patches project of: reserved_mem: add missing of_node_put() for proper ref-counting of: unittest: fix memory leak in unittest_data_add dt-bindings: riscv: Fix CPU schema errors MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB drm/v3d: Fix memory leak in v3d_submit_cl_ioctl panfrost: Properly undo pm_runtime_enable when deferring a probe dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle posix-cpu-timers: Fix two trivial comments timers/sched_clock: Include local timekeeping.h for missing declarations lib/vdso: Make clock_getres() POSIX compliant again fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() fuse: Add changelog entries for protocols 7.1 - 7.8 fuse: truncate pending writes on O_TRUNC fuse: flush dirty data/metadata before non-truncate setattr netfilter: nf_tables_offload: restore basechain deletion netfilter: nf_flow_table: set timeout before insertion into hashes rtlwifi: rtl_pci: Fix problem of too small skb->len iwlwifi: pcie: 0x2720 is qu and 0x30DC is not iwlwifi: pcie: add workaround for power gating in integrated 22000 iwlwifi: mvm: handle iwl_mvm_tvqm_enable_txq() error return iwlwifi: pcie: fix all 9460 entries for qnj iwlwifi: pcie: fix PCI ID 0x2720 configs that should be soc rtlwifi: Fix potential overflow on P2P code iwlwifi: pcie: fix merge damage on making QnJ exclusive scripts/nsdeps: use alternative sed delimiter virtiofs: Remove set 
but not used variable 'fc' fs/dax: Fix pmd vs pte conflict detection opp: Reinitialize the list_kref before adding the static OPPs again bpf: Fix use after free in bpf_get_prog_name ALSA: hda: Add Tigerlake/Jasperlake PCI ID scsi: qla2xxx: Fix partial flash write of MBI scsi: qla2xxx: Initialized mailbox to prevent driver load failure scsi: lpfc: Honor module parameter lpfc_use_adisc ipv6: include <net/addrconf.h> for missing declarations net: openvswitch: free vport unless register_netdevice() succeeds selftests: Make l2tp.sh executable net: sched: taprio: fix -Wmissing-prototypes warnings bnxt_en: Avoid disabling pci device in bnxt_remove_one() for already disabled device. bnxt_en: Minor formatting changes in FW devlink_health_reporter bnxt_en: Adjust the time to wait before polling firmware readiness. bnxt_en: Fix devlink NVRAM related byte order related issues. bnxt_en: Fix the size of devlink MSIX parameters. net: stmmac: Fix the problem of tso_xmit dynamic_debug: provide dynamic_hex_dump stub bpf: Fix use after free in subprog's jited symbol removal RDMA/uverbs: Prevent potential underflow KVM: nVMX: Don't leak L1 MMIO regions to L2 ARC: perf: Accommodate big-endian CPU ARC: [plat-hsdk]: Enable on-boardi SPI ADC IC ARC: [plat-hsdk]: Enable on-board SPI NOR flash IC KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update cpufreq: Cancel policy update work scheduled before freeing s390/kaslr: add support for R_390_GLOB_DAT relocation type s390/zcrypt: fix memleak at release ALSA: usb-audio: Fix copy&paste error in the validator perf/aux: Fix AUX output stopping kvm: clear kvmclock MSR on reset KVM: x86: fix bugon.cocci warnings KVM: VMX: Remove specialized handling of unexpected exit-reasons selftests: kvm: fix sync_regs_test with newer gccs selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported selftests: kvm: consolidate VMX support checks selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice KVM: Don't 
shrink/grow vCPU halt_poll_ns if host side polling is disabled selftests: kvm: synchronize .gitignore to Makefile kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID cpuidle: haltpoll: Take 'idle=' override into account ACPI: NFIT: Fix unlock on error in scrub_show() tracing: Fix race in perf_trace_buf initialization x86/cpu/vmware: Fix platform detection VMWARE_PORT macro x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm xdp: Handle device unregister for devmap_hash map type r8152: add device id for Lenovo ThinkPad USB-C Dock Gen 2 ipv4: fix IPSKB_FRAG_PMTU handling with fragmentation ARM: 8926/1: v7m: remove register save to stack before svc Input: st1232 - fix reporting multitouch coordinates Revert "pwm: Let pwm_get_state() return the last implemented state" mmc: mxs: fix flags passed to dmaengine_prep_slave_sg virtiofs: Retry request submission from worker context virtiofs: Count pending forgets as in_flight forgets virtiofs: Set FR_SENT flag only after request has been sent virtiofs: No need to check fpq->connected state virtiofs: Do not end request in submission context fuse: don't advise readdirplus for negative lookup drm/komeda: Fix typos in komeda_splitter_validate drm/komeda: Don't flush inactive pipes i2c: aspeed: fix master pending state handling mmc: cqhci: Commit descriptors before setting the doorbell mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C ALSA: hda/realtek - Add support for ALC711 perf/aux: Fix tracking of auxiliary trace buffer allocation fuse: don't dereference req->args on finished request opp: core: Revert "add regulators enable and disable" cifs: Fix missed free operations CIFS: avoid using MID 0xFFFF cifs: clarify comment about timestamp granularity for old servers cifs: Handle -EINPROGRESS only when noblockcnt is set PM: QoS: Drop frequency QoS types from device PM QoS cpufreq: Use per-policy frequency QoS PM: QoS: Introduce frequency QoS Linux 5.4-rc4 hwmon: (nct7904) Fix the incorrect value 
of vsen_mask & tcpu_mask & temp_mode in nct7904_data struct. perf/x86/intel/pt: Fix base for single entry topa KVM: arm64: pmu: Reset sample period on overflow handling KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems KVM: arm64: pmu: Fix cycle counter truncation net: reorder 'struct net' fields to avoid false sharing net: dsa: fix switch tree list net: ethernet: dwmac-sun8i: show message only when switching to promisc net: aquantia: add an error handling in aq_nic_set_multicast_list net: netem: correct the parent's backlog when corrupted packet was dropped net: netem: fix error path for corrupted GSO frames macb: propagate errors when getting optional clocks xen/netback: fix error path of xenvif_connect_data() net: hns3: fix mis-counting IRQ vector numbers issue scripts/gdb: fix debugging modules on s390 kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register mm/thp: allow dropping THP from page cache mm/vmscan.c: support removing arbitrary sized pages from mapping mm/thp: fix node page state in split_huge_page_to_list() proc/meminfo: fix output alignment mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition mm: include <linux/huge_mm.h> for is_vma_temporary_stack zram: fix race between backing_dev_show and backing_dev_store mm/memcontrol: update lruvec counters in mem_cgroup_move_account ocfs2: fix panic due to ocfs2_wq is null hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() mm: memblock: do not enforce current limit for memblock_phys* family mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size mm/gup: fix a misnamed "write" argument, and a related bug mm/gup_benchmark: add a missing "w" to getopt string ocfs2: fix error handling in ocfs2_setattr() mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer 
release mm/memunmap: don't access uninitialized memmap in memunmap_pages() mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span() mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo scripts/gdb: fix lx-dmesg when CONFIG_PRINTK_CALLER is set mm/memory-failure.c: don't access uninitialized memmaps in memory_failure() fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store() xdp: Prevent overflow in devmap_hash cost calculation for 32-bit builds filldir[64]: remove WARN_ON_ONCE() for bad directory entries scsi: ufs-bsg: Wake the device before sending raw upiu commands scsi: lpfc: Check queue pointer before use mips: vdso: Fix __arch_get_hw_counter() MAINTAINERS: Use @kernel.org address for Paul Burton scsi: qla2xxx: fixup incorrect usage of host_byte selftests/bpf: More compatible nc options in test_tc_edt net/mlx5: fix memory leak in mlx5_fw_fatal_reporter_dump net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq net/mlx5e: TX, Fix consumer index of error cqe dump net/mlx5e: kTLS, Enhance TX resync flow net/mlx5e: kTLS, Save a copy of the crypto info net/mlx5e: kTLS, Remove unneeded cipher type checks net/mlx5e: kTLS, Limit DUMP wqe size net/mlx5e: kTLS, Fix missing SQ edge fill net/mlx5e: kTLS, Fix page refcnt leak in TX resync error flow net/mlx5e: kTLS, Save by-value copy of the record frags net/mlx5e: kTLS, Save only the frag page to release at completion net/mlx5e: kTLS, Size of a Dump WQE is fixed net/mlx5e: kTLS, Release reference on DUMPed fragments in shutdown flow net/mlx5e: Tx, Zero-memset WQE info struct upon update net/mlx5e: Tx, Fix assumption of single WQEBB of NOP in cleanup flow usb: cdns3: Error out if USB_DR_MODE_UNKNOWN in cdns3_core_init_role() ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue USB: ldusb: fix read info leaks IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields 
RDMA/qedr: Fix reported firmware version RDMA/siw: free siw_base_qp in kref release routine tracing: Fix "gfp_t" format for synthetic events RDMA/iwcm: move iw_rem_ref() calls out of spinlock iw_cxgb4: fix ECN check on the passive accept net: usb: lan78xx: Connect PHY before registering MAC vsock/virtio: discard packets if credit is not respected vsock/virtio: send a credit update when buffer size is changed mlxsw: spectrum_trap: Push Ethernet header before reporting trap ASoC: SOF: control: return true when kcontrol values change ASoC: stm32: sai: fix sysclk management on shutdown ASoC: Intel: sof-rt5682: add a check for devm_clk_get ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting net: ensure correct skb->tstamp in various fragmenters net: bcmgenet: reset 40nm EPHY on energy detect net: bcmgenet: soft reset 40nm EPHYs before MAC init net: phy: bcm7xxx: define soft_reset for 40nm EPHY net: bcmgenet: don't set phydev->link from MAC bus: ti-sysc: Fix watchdog quirk handling ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs iommu/vt-d: Return the correct dma mask when we are bypassing the IOMMU iommu/amd: Check PM_LEVEL_SIZE() condition in locked section nvme-pci: Set the prp2 correctly when using more than 4k page HID: i2c-hid: add Trekstor Primebook C11B to descriptor override symbol namespaces: revert to previous __ksymtab name scheme modpost: make updating the symbol namespace explicit modpost: delegate updating namespaces to separate function HID: logitech-hidpp: do all FF cleanup in hidpp_ff_destroy() HID: logitech-hidpp: rework device validation HID: logitech-hidpp: split g920_get_config() HID: i2c-hid: Remove runtime power management x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard x86/hyperv: Set pv_info.name to "Hyper-V" ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit() dmaengine: qcom: bam_dma: Fix resource leak scsi: lpfc: remove left-over 
BUILD_NVME defines scsi: core: try to get module before removing device scsi: hpsa: add missing hunks in reset-patch scsi: target: core: Do not overwrite CDB byte 1 net: Update address for MediaTek ethernet driver in MAINTAINERS ipv4: fix race condition between route lookup and invalidation ipv4: Return -ENETUNREACH if we can't create route but saddr is valid net: phy: micrel: Update KSZ87xx PHY name net: phy: micrel: Discern KSZ8051 and KSZ8795 PHYs io_uring: fix logic error in io_timeout io_uring: fix up O_NONBLOCK handling for sockets drm/amdgpu/vce: fix allocation size in enc ring test drm/amdgpu: fix error handling in amdgpu_bo_list_create drm/amdgpu: fix potential VM faults drm/amdgpu: user pages array memory leak fix drm/amdgpu/vcn: fix allocation size in enc ring test drm/amdgpu/uvd7: fix allocation size in enc ring test (v2) drm/amdgpu/uvd6: fix allocation size in enc ring test (v2) IB/hfi1: Use a common pad buffer for 9B and 16B packets IB/hfi1: Avoid excessive retry for TID RDMA READ request RDMA/mlx5: Clear old rate limit when closing QP net: dsa: microchip: Add shared regmap mutex net: dsa: microchip: Do not reinit mutexes on KSZ87xx net: stmmac: fix argument to stmmac_pcs_ctrl_ane() dpaa2-eth: Fix TX FQID values dpaa2-eth: add irq for the dpmac connect/disconnect event usb: hso: obey DMA rules in tiocmget Btrfs: check for the full sync flag while holding the inode lock during fsync Btrfs: fix qgroup double free after failure to reserve metadata for delalloc coccinelle: api/devm_platform_ioremap_resource: remove useless script ALSA: hda - Force runtime PM on Nvidia HDMI codecs dm cache: fix bugs when a GFP_NOWAIT allocation fails ARM: davinci_all_defconfig: enable GPIO backlight ARM: davinci: dm365: Fix McBSP dma_slave_map entry binder: Don't modify VMA bounds in ->mmap handler btrfs: tracepoints: Fix bad entry members of qgroup events btrfs: tracepoints: Fix wrong parameter order for qgroup events stop_machine: Avoid potential race behaviour 
EDAC/ghes: Fix Use after free in ghes_edac remove path ALSA: hda/realtek - Enable headset mic on Asus MJ401TA ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers kheaders: substituting --sort in archive creation powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries. net: stmmac: disable/enable ptp_ref_clk in suspend/resume flow net: phy: Fix "link partner" information disappear issue rxrpc: use rcu protection while reading sk->sk_user_data drm/i915: Fixup preempt-to-busy vs resubmission of a virtual request drm/i915/userptr: Never allow userptr into the mappable GGTT drm/i915: Favor last VBT child device with conflicting AUX ch/DDC pin drm/i915/execlists: Refactor -EIO markup of hung requests Revert "blackhole_netdev: fix syzkaller reported issue" arm64: tags: Preserve tags for addresses translated via TTBR1 arm64: mm: fix inverted PAR_EL1.F check arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled md/raid0: fix warning message for parameter default_layout kthread: make __kthread_queue_delayed_work static pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups pinctrl: aspeed-g6: Fix UART13 group pinmux pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration pinctrl: aspeed-g6: Fix I2C14 SDA description pinctrl: aspeed-g6: Sort pins for sanity dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups perf kmem: Fix memory leak in compact_gfp_flags() usercopy: Avoid soft lockups in test_check_nonzero_user() pinctrl: berlin: as370: fix a typo s/spififib/spdifib ACPI: processor: Avoid NULL pointer dereferences at init time USB: serial: ti_usb_3410_5052: clean up serial data access USB: serial: ti_usb_3410_5052: fix port-close races xtensa: fix change_bit in exclusive access option HID: intel-ish-hid: fix wrong error handling in ishtp_cl_alloc_tx_ring() RISC-V: fix virtual address 
overlapped in FIXADDR_START and VMEMMAP_START net: usb: sr9800: fix uninitialized local variable net: bcmgenet: Fix RGMII_MODE_EN value for GENET v1/2/3 net: stmmac: make tc_flow_parsers static davinci_cpdma: make cpdma_chan_split_pool static net: i82596: fix dma_alloc_attr for sni_82596 sctp: change sctp_prot .no_autobind with true sched: etf: Fix ordering of packets with same txtime net: avoid potential infinite loop in tc_ctl_action() net: dsa: sja1105: Use the correct style for SPDX License Identifier tcp: fix a possible lockdep splat in tcp_done() arm: dts: mediatek: Update mt7629 dts to reflect the latest dt-binding net: ethernet: mediatek: Fix MT7629 missing GMII mode support Revert "Input: elantech - enable SMBus on new (2018+) systems" net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions net: avoid errors when trying to pop MLPS header on non-MPLS packets net: cavium: Use the correct style for SPDX License Identifier net: dsa: microchip: Use the correct style for SPDX License Identifier PCI: PM: Fix pci_power_up() xtensa: virt: fix PCI IO ports mapping libata/ahci: Fix PCS quirk application vfio/type1: Initialize resv_msi_base 8250-men-mcb: fix error checking when get_num_ports returns -ENODEV USB: usblp: fix use-after-free on disconnect usb: udc: lpc32xx: fix bad bit shift operation usb: cdns3: Fix dequeue implementation. 
USB: legousbtower: fix a signedness bug in tower_probe() USB: legousbtower: fix memleak on disconnect USB: ldusb: fix memleak on disconnect net: ethernet: broadcom: have drivers select DIMLIB as needed net: Update address for vrf and l3mdev in MAINTAINERS net: bcmgenet: Set phydev->dev_flags only for internal PHYs blackhole_netdev: fix syzkaller reported issue ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci sparc64: disable fast-GUP due to unexplained oopses btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() drm/panfrost: Handle resetting on timeout better blk-rq-qos: fix first node deletion of rq_qos_del() blkcg: Fix multiple bugs in blkcg_activate_policy() xfs: change the seconds fields in xfs_bulkstat to signed tools headers UAPI: Sync sched.h with the kernel rbd: cancel lock_dwork if the wait is interrupted ceph: just skip unrecognized info in ceph_reply_info_extra tools headers kvm: Sync kvm.h headers with the kernel sources tools headers kvm: Sync kvm headers with the kernel sources tools headers kvm: Sync kvm headers with the kernel sources perf c2c: Fix memory leak in build_cl_output() perf tools: Fix mode setting in copyfile_mode_ns() perf annotate: Fix multiple memory and file descriptor leaks io_uring: consider the overflow of sequence for timeout req perf tools: Fix resource leak of closedir() on the error paths perf evlist: Fix fix for freed id arrays perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy() scripts: setlocalversion: fix a bashism kbuild: update comment about KBUILD_ALLDIRS virtio-fs: don't show mount options nvme-tcp: fix possible leakage during error flow nvmet-loop: fix possible leakage during error flow btrfs: don't needlessly create extent-refs kernel thread iommu/amd: Fix incorrect PASID decoding from event log iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory iommu/rockchip: Don't use platform_get_irq to implicitly count irqs dmaengine: sprd: Fix the 
possible memory leak issue dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu x86/hyperv: Make vapic support x2apic mode KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use arm64: hibernate: check pgd table allocation arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled net: aquantia: correctly handle macvlan and multicast coexistence net: aquantia: do not pass lro session with invalid tcp checksum net: aquantia: when cleaning hw cache it should be toggled net: aquantia: temperature retrieval fix gpio: lynxpoint: set default handler to be handle_bad_irq() gpio: merrifield: Move hardware initialization to callback gpio: lynxpoint: Move hardware initialization to callback gpio: intel-mid: Move hardware initialization to callback gpiolib: Initialize the hardware with a callback gpio: merrifield: Restore use of irq_base xtensa: drop EXPORT_SYMBOL for outs/ins mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once mm/slab.c: fix kernel-doc warning for __ksize() xarray.h: fix kernel-doc warning bitmap.h: fix kernel-doc warning and typo fs/fs-writeback.c: fix kernel-doc warning fs/libfs.c: fix kernel-doc warning fs/direct-io.c: fix kernel-doc warning mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() mm, hugetlb: allow hugepage allocations to reclaim as needed lib/test_meminit: add a kmem_cache_alloc_bulk() test mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations lib/generic-radix-tree.c: add kmemleak annotations mm/slub: fix a deadlock in show_slab_objects() mm, page_owner: rename flag indicating that page is allocated mm, page_owner: decouple freeing stack trace from debug_pagealloc mm, page_owner: fix off-by-one error in __set_page_owner_handle() xtensa: fix type conversion in __get_user_[no]check xtensa: clean up assembly 
arguments in uaccess macros block: Fix elv_support_iosched() parisc: Remove 32-bit DMA enforcement from sba_iommu parisc: Fix vmap memory leak in ioremap()/iounmap() parisc: prefer __section from compiler_attributes.h parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices riscv: tlbflush: remove confusing comment on local_flush_tlb_all() riscv: dts: HiFive Unleashed: add default chosen/stdout-path riscv: remove the switch statement in do_trap_break() drm/panfrost: Add missing GPU feature registers bpf: lwtunnel: Fix reroute supplying invalid dst xtensa: fix {get,put}_user() for 64bit values kmemleak: Do not corrupt the object_list during clean-up nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL nvme: Wait for reset state when required nvme: Prevent resets during paused controller state nvme: Restart request timers in resetting state nvme: Remove ADMIN_ONLY state nvme-pci: Free tagset if no IO queues hrtimer: Annotate lockless access to timer->base staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided ARM: dts: imx7s: Correct GPT's ipg clock source ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect' drm/ttm: fix handling in ttm_bo_add_mem_to_lru drm/ttm: Restore ttm prefaulting drm/ttm: fix busy reference in ttm_mem_evict_first ARM: dts: imx6q-logicpd: Re-Enable SNVS power key ath10k: fix latency issue for QCA988x virtio-fs: Change module name to virtiofs.ko dmaengine: imx-sdma: fix size check for sdma script_number dmaengine: tegra210-adma: fix transfer failure arm64: dts: lx2160a: Correct CPU core idle state name dmaengine: sprd: Fix the link-list pointer register 
configuration issue batman-adv: Avoid free/alloc race when handling OGM buffer batman-adv: Avoid free/alloc race when handling OGM2 buffer netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit net/ibmvnic: Fix EOI when running in XIVE mode. net: lpc_eth: avoid resetting twice tcp: annotate sk->sk_wmem_queued lockless reads tcp: annotate sk->sk_sndbuf lockless reads tcp: annotate sk->sk_rcvbuf lockless reads tcp: annotate tp->urg_seq lockless reads tcp: annotate tp->snd_nxt lockless reads tcp: annotate tp->write_seq lockless reads tcp: annotate tp->copied_seq lockless reads tcp: annotate tp->rcv_nxt lockless reads tcp: add rcu protection around tp->fastopen_rsk vhost/test: stop device before reset tools/virtio: xen stub mailmap: Add Simon Arlott (replacement for expired email address) rxrpc: Fix possible NULL pointer access in ICMP handling drm/amdgpu/sdma5: fix mask value of POLL_REGMEM packet for pipe sync drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1 Revert "drm/radeon: Fix EEH during kexec" Input: synaptics-rmi4 - avoid processing unknown IRQs btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() drm/msm/dsi: Implement reset correctly Btrfs: add missing extents release on file extent cluster relocation error x86/boot/64: Round memory hole size up to next PMD page x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area arm64: Fix kcore macros after 52-bit virtual addressing fallout tools/virtio: more stubs net/smc: receive pending data after RCV_SHUTDOWN net/smc: receive returns without data net/smc: fix SMCD link group creation with VLAN id net: update net_dim documentation after rename r8169: fix jumbo packet handling on resume from suspend arm64: dts: rockchip: Fix override mode for rk3399-kevin panel arm64: dts: rockchip: Fix usb-c on Hugsun X99 TV Box arm64: dts: rockchip: fix RockPro64 sdmmc settings ARM: 8914/1: NOMMU: Fix exc_ret for XIP ARM: 8908/1: add __always_inline to functions 
called from __get_user_check() HID: google: add magnemite/masterball USB ids MAINTAINERS: Add BCM2711 to BCM2835 ARCH dma-buf/resv: fix exclusive fence get drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50 dm snapshot: rework COW throttling to fix deadlock dm snapshot: introduce account_start_copy() and account_end_copy() drm/tiny: Kconfig: Remove always-y THERMAL dep. from TINYDRM_REPAPER platform/x86: intel_punit_ipc: Avoid error message when retrieving IRQ platform/x86: classmate-laptop: remove unused variable opp: of: drop incorrect lockdep_assert_held() PM: sleep: include <linux/pm_runtime.h> for pm_wq cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist net: silence KCSAN warnings about sk->sk_backlog.len reads net: annotate sk->sk_rcvlowat lockless reads net: silence KCSAN warnings around sk_add_backlog() calls tcp: annotate lockless access to tcp_memory_pressure net: add {READ\|WRITE}_ONCE() annotations on ->rskq_accept_head net: avoid possible false sharing in sk_leave_memory_pressure() tun: remove possible false sharing in tun_flow_update() netfilter: conntrack: avoid possible false sharing netns: fix NLM_F_ECHO mechanism for RTM_NEWNSID scsi: ch: Make it possible to open a ch device multiple times again scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE scsi: sni_53c710: fix compilation error scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions scsi: qla2xxx: fix a potential NULL pointer dereference net: usb: qmi_wwan: add Telit 0x1050 composition act_mirred: Fix mirred_init_module error handling net: taprio: Fix returning EINVAL when configuring without flags s390/qeth: Fix initialization of vnicc cmd masks during set online s390/qeth: Fix error handling during VNICC initialization phylink: fix kernel-doc warnings sctp: add chunks to sk_backlog when the newsk sk_socket is not set bonding: fix potential NULL deref in bond_update_slave_arr net: 
stmmac: fix disabling flexible PPS output net: stmmac: fix length of PTP clock's name string ARM: mm: alignment: use "u32" for 32-bit instructions ARM: mm: fix alignment handler faults under memory pressure ARM: dts: Use level interrupt for omap4 & 5 wlcore drivers/amba: fix reset control error handling ASoC: simple_card_utils.h: Fix potential multiple redefinition error ASoC: msm8916-wcd-digital: add missing MIX2 path for RX1/2 drm/amdgpu/powerplay: fix typo in mvdd table setup iwlwifi: pcie: change qu with jf devices to use qu configuration iwlwifi: exclude GEO SAR support for 3168 iwlwifi: pcie: fix memory leaks in iwl_pcie_ctxt_info_gen3_init iwlwifi: dbg_ini: fix memory leak in alloc_sgtable iwlwifi: pcie: fix rb_allocator workqueue allocation iwlwifi: pcie: fix indexing in command dump for new HW iwlwifi: mvm: fix race in sync rx queue notification iwlwifi: mvm: force single phy init iwlwifi: fix ACPI table revision checks iwlwifi: don't access trans_cfg via cfg memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()' mmc: sdhci-iproc: fix spurious interrupts on Multiblock reads with bcm2711 pinctrl: armada-37xx: swap polarity on LED group arm64: dts: armada-3720-turris-mox: convert usb-phy to phy-supply ip6erspan: remove the incorrect mtu limit for ip6erspan Doc: networking/device_drivers/pensando: fix ionic.rst warnings NFC: pn533: fix use-after-free and memleaks Input: soc_button_array - partial revert of support for newer surface devices net_sched: fix backward compatibility for TCA_ACT_KIND net_sched: fix backward compatibility for TCA_KIND net/mlx5: DR, Allow insertion of duplicate rules selftests/bpf: More compatible nc options in test_lwt_ip_encap selftests/bpf: Set rp_filter in test_flow_dissector llc: fix sk_buff refcounting in llc_conn_state_process() llc: fix another potential sk_buff leak in llc_ui_sendmsg() llc: fix sk_buff leak in llc_conn_service() llc: fix sk_buff leak in llc_sap_state_process() dm clone: Make __hash_find static 
rt2x00: remove input-polldev.h header ARM: dts: am3874-iceboard: Fix 'i2c-mux-idle-disconnect' usage ARM: dts: omap5: fix gpu_cm clock provider name arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected arm64: Avoid Cavium TX2 erratum 219 when switching TTBR arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set mac80211: fix scan when operating on DFS channels in ETSI domains mac80211: accept deauth frames in IBSS mode cfg80211: fix a bunch of RCU issues in multi-bssid code nl80211: fix memory leak in nl80211_get_ftm_responder_stats ptp: fix typo of "mechanism" in Kconfig help text ionic: fix stats memory dereference ASoC: core: Fix pcm code debugfs error ARM: dts: sun7i: Drop the module clock from the device tree dt-bindings: media: sun4i-csi: Drop the module clock rxrpc: Fix call crypto state cleanup rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record rxrpc: Fix trace-after-put looking at the put call record rxrpc: Fix trace-after-put looking at the put connection record rxrpc: Fix trace-after-put looking at the put peer record media: dt-bindings: Fix building error for dt_binding_check rxrpc: Fix call ref leak ALSA: hdac: clear link output stream mapping ALSA: hda/realtek: Reduce the Headphone static noise on XPS 9350/9360 lib: test_user_copy: style cleanup net: stmmac: selftests: Fix L2 Hash Filter test net: stmmac: gmac4+: Not all Unicast addresses may be available net: stmmac: selftests: Check if filtering is available before running net: dsa: b53: Do not clear existing mirrored port mask arm64: dts: zii-ultra: fix ARM regulator states soc: imx: imx-scu: Getting UID from SCU should have response pinctrl: stmfx: fix null pointer on remove pinctrl: iproc: allow for error from platform_get_irq() nvme: retain split access workaround for capability reads nvme: fix possible deadlock when nvme_update_formats fails pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable() 
pinctrl: bcm-iproc: Use SPDX header pinctrl: armada-37xx: fix control of pins 32 and up arm64: dts: rockchip: fix RockPro64 sdhci settings arm64: dts: rockchip: fix RockPro64 vdd-log regulator settings regulator: qcom-rpmh: Fix PMIC5 BoB min voltage ARM: dts: logicpd-torpedo-som: Remove twl_keypad cfg80211: wext: avoid copying malformed SSIDs mac80211: Reject malformed SSID elements mac80211_hwsim: fix incorrect dev_alloc_name failure goto MAINTAINERS: Add hp_sdc drivers to parisc arch scsi: MAINTAINERS: Update qla2xxx driver scsi: zfcp: fix reaction on bit error threshold notification scsi: core: save/restore command resid for error handling dt-bindings: arm: rockchip: fix Theobroma-System board bindings arm64: dts: rockchip: fix Rockpro64 RK808 interrupt line HID: Fix assumption that devices have inputs ARM: omap2plus_defconfig: Fix selected panels after generic panel changes samples/bpf: Add a workaround for asm_inline xsk: Fix crash in poll when device does not support ndo_xsk_wakeup samples/bpf: Fix build for task_fd_query_user.c ASoc: rockchip: i2s: Fix RPM imbalance mmc: sh_mmcif: Use platform_get_irq_optional() for optional interrupt mmc: renesas_sdhi: Do not use platform_get_irq() to count interrupts ACPI: HMAT: ACPI_HMAT_MEMORY_PD_VALID is deprecated since ACPI-6.3 Input: goodix - add support for 9-bytes reports Input: da9063 - fix capability and drop KEY_SLEEP ASoC: wm_adsp: Don't generate kcontrols without READ flags sysfs: Fixes __BIN_ATTR_WO() macro rt2x00: initialize last_reset selftests/bpf: test_progs: Don't leak server_fd in test_sockopt_inherit selftests/bpf: test_progs: Don't leak server_fd in tcp_rtt regulator: pfuze100-regulator: Variable "val" in pfuze100_regulator_probe() could be uninitialized ASoC: intel: bytcr_rt5651: add null check to support_button_press ASoC: intel: sof_rt5682: add remove function to disable jack ASoC: rt5682: add NULL handler to set_jack function ASoC: intel: sof_rt5682: use separate route map for dmic ASoC: SOF: 
Intel: hda: Disable DMI L1 entry during capture ASoC: SOF: Intel: initialise and verify FW crash dump data. ASoC: SOF: Intel: hda: fix warnings during FW load ASoC: SOF: pcm: harden PCM STOP sequence ASoC: SOF: pcm: fix resource leak in hw_free ASoC: SOF: topology: fix parse fail issue for byte/bool tuple types ASoC: SOF: loader: fix kernel oops on firmware boot failure regulator: lochnagar: Add on_off_delay for VDDCORE ASoC: wm_adsp: Fix theoretical NULL pointer for alg_region pinctrl: cherryview: restore Strago DMI workaround for all versions pinctrl: intel: Allocate IRQ chip dynamic HID: prodikeys: make array keys static const, makes object smaller HID: fix error message in hid_open_report() ASoC: max98373: check for device node before parsing regulator: ti-abb: Fix timeout in ti_abb_wait_txdone/ti_abb_clear_all_txdone iommu/io-pgtable-arm: Support all Mali configurations iommu/io-pgtable-arm: Correct Mali attributes iommu/arm-smmu: Free context bitmap in the err path of arm_smmu_init_domain_context scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry() scsi: sd: Ignore a failure to sync cache due to lack of authorization arm64: dts: Fix gpio to pinmux mapping libbpf: handle symbol versioning properly for libbpf.a arm64: dts: allwinner: a64: sopine-baseboard: Add PHY regulator delay arm64: dts: allwinner: a64: Drop PMU node arm64: dts: allwinner: a64: pine64-plus: Add PHY regulator delay tools: bpf: Use !building_out_of_srctree to determine srctree ASoC: topology: Fix a signedness bug in soc_tplg_dapm_widget_create() scsi: core: fix dh and multipathing for SCSI hosts without request batching scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching regulator: da9062: fix suspend_enable/disable preparation dt-bindings: fixed-regulator: fix compatible enum regulator: fixed: Prevent NULL pointer dereference when !CONFIG_OF ASoC: soc-component: fix a couple missing error assignments ASoC: wm8994: Do not register inapplicable controls for 
WM1811 ASoC: samsung: arndale: Add missing OF node dereferencing irqchip/sifive-plic: Switch to fasteoi flow irqchip/gic-v3: Fix GIC_LINE_NR accessor regulator: core: make regulator_register() EPROBE_DEFER aware regulator: of: fix suspend-min/max-voltage parsing irqchip/atmel-aic5: Add support for sam9x60 irqchip irqchip/al-fic: Add support for irq retrigger Change-Id: I5e7fd941c93a36889378f480cc27d8ea77d11b39 Signed-off-by: Raghavendra Rao Ananta <rananta@codeaurora.org>	2019-11-04 17:30:19 -08:00
Greg Kroah-Hartman	9be46ff3b6	Merge 5.4-rc4 into android-mainline Linux 5.4-rc4 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0edccd72fad8b6443b24c8c1005b66d6b8f532ce	2019-10-26 19:24:41 +02:00
Prakash Gupta	8d8eacdb48	mm: run the showmem notifier in alloc failure When the page allocation fails, it's useful to be able to see the state of unaccounted memory in the system. Call the showmem notifier to get other clients to dump out their state. This is an example output with this patch. [ 457.125478] SLUB: Unable to allocate memory on node -1, gfp=0x2008000(GFP_NOWAIT\|__GFP_ZERO) [ 457.133982] cache: kmalloc-128, object size: 128, buffer size: 640, default order: 2, min order: 0 [ 457.143179] node 0: slabs: 5903, objs: 132755, free: 26 [ 457.906076] BootAnimation: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT\|__GFP_COMP\|__GFP_NOTRACK) [ 457.916395] CPU: 2 PID: 4752 Comm: BootAnimation Not tainted 4.9.37+ #43 [ 457.916398] Hardware name: Qualcomm Technologies, Inc. SDM845 v1 MTP (DT) [ 457.916402] Call trace: [ 457.916420] [<ffffff82c5c89504>] dump_backtrace+0x0/0x2c4 [ 457.916426] [<ffffff82c5c897e8>] show_stack+0x20/0x28 [ 457.916434] [<ffffff82c5fea888>] dump_stack+0xb8/0xf4 [ 457.916442] [<ffffff82c5dd136c>] warn_alloc+0x154/0x170 [ 457.916447] [<ffffff82c5dd184c>] __alloc_pages_nodemask+0x430/0xcdc [ 457.916454] [<ffffff82c5e1c154>] new_slab+0x344/0x430 [ 457.916458] [<ffffff82c5e1e404>] ___slab_alloc.constprop.72+0x2f4/0x398 [ 457.916463] [<ffffff82c5e1e4f0>] __slab_alloc.isra.69.constprop.71+0x48/0x80 [ 457.916467] [<ffffff82c5e1ea24>] kmem_cache_alloc_trace+0x210/0x2dc [ 457.916476] [<ffffff82c68ebf0c>] binder_transaction+0x280/0x2008 [ 457.916480] [<ffffff82c68ee68c>] binder_thread_write+0x9f8/0x136c [ 457.916484] [<ffffff82c68f08d0>] binder_ioctl_write_read+0x14c/0x3b0 [ 457.916488] [<ffffff82c68f0df0>] binder_ioctl+0x2bc/0x868 [ 457.916494] [<ffffff82c5e451e4>] do_vfs_ioctl+0xd0/0x858 [ 457.916498] [<ffffff82c5e459fc>] SyS_ioctl+0x90/0xa4 [ 457.916503] [<ffffff82c5c83770>] el0_svc_naked+0x24/0x28 [ 457.916505] Mem-Info: [ 457.916515] active_anon:83629 inactive_anon:212 isolated_anon:0\x0a active_file:5955 inactive_file:5745 
isolated_file:0\x0a unevictable:630956 dirty:0 writeback:0 unstable:0\x0a slab_reclaimable:16602 slab_unreclaimable:69384\x0a mapped:3609 shmem:308 pagetables:5737 bounce:0\x0a free:4446 free_pcp:482 free_cma:112 [ 457.916524] Node 0 active_anon:334516kB inactive_anon:848kB active_file:23820kB inactive_file:22980kB unevictable:2523824kB isolated(anon):0kB isolated(file):0kB mapped:14436kB dirty:0kB writeback:0kB shmem:1232kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no [ 457.916534] DMA free:12516kB min:3352kB low:4924kB high:6496kB active_anon:45336kB inactive_anon:64kB active_file:3588kB inactive_file:3404kB unevictable:1564224kB writepending:0kB present:1854120kB managed:1748064kB mlocked:1564224kB slab_reclaimable:10664kB slab_unreclaimable:24400kB kernel_stack:2608kB pagetables:6376kB bounce:0kB free_pcp:572kB local_pcp:0kB free_cma:448kB [ 457.916536] lowmem_reserve[]: 0 1901 1901 [ 457.916550] Normal free:5268kB min:4148kB low:6092kB high:8036kB active_anon:289180kB inactive_anon:784kB active_file:20232kB inactive_file:19576kB unevictable:959600kB writepending:0kB present:2068224kB managed:1984196kB mlocked:959600kB slab_reclaimable:55744kB slab_unreclaimable:253136kB kernel_stack:19232kB pagetables:16572kB bounce:0kB free_pcp:1356kB local_pcp:116kB free_cma:0kB [ 457.916552] lowmem_reserve[]: 0 0 0 [ 457.916560] DMA: 819*4kB (UMEC) 280*8kB (UMEC) 224*16kB (UME) 98*32kB (UME) 6*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12620kB [ 457.916594] Normal: 1118*4kB (UMEH) 38*8kB (UMH) 1*16kB (H) 2*32kB (H) 1*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5048kB [ 457.916627] 12071 total pagecache pages [ 457.916630] 0 pages in swap cache [ 457.916633] Swap cache stats: add 0, delete 0, find 0/0 [ 457.916634] Free swap = 0kB [ 457.916636] Total swap = 0kB [ 457.916639] 980586 pages RAM [ 457.916641] 0 pages HighMem/MovableOnly [ 457.916643] 47521 pages reserved [ 457.916645] 51200 pages cma reserved [ 457.916651] cma: 
cma-0 pages: => 0 used of 2048 total pages [ 457.916660] cma: cma-1 pages: => 0 used of 23552 total pages [ 457.916665] cma: cma-2 pages: => 1695 used of 3072 total pages [ 457.916670] cma: cma-3 pages: => 8277 used of 9216 total pages [ 457.916674] cma: cma-4 pages: => 186 used of 5120 total pages [ 457.916679] cma: cma-5 pages: => 3792 used of 8192 total pages [ 457.916685] Heap name Total heap size Total orphaned size [ 457.916687] --------------------------------- [ 457.916691] qsecom 0x ba000 0x 0 [ 457.916694] system 0x 44db000 0x 500000 [ 457.916705] ------------------------------------------------- [ 457.916708] uncached pool = 31027200 cached pool = 0 secure pool = 0 [ 457.916710] pool total (uncached + cached + secure) = 31027200 [ 457.916712] ------------------------------------------------- [ 457.916715] adsp 0x 614000 0x 0 [ 457.916720] spss 0x 0 0x 0 [ 457.916725] secure_display 0x 0 0x 0 [ 457.916727] secure_heap 0x 0 0x 0. Change-Id: Id01cce4abf331ff9c1c7ab9f0c0f9b1fc4146467 Signed-off-by: Prakash Gupta <guptap@codeaurora.org> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>	2019-10-24 10:21:16 -07:00
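As a cross-check of the dump above: the per-zone buddy lines list free blocks as count*size (the DMA line reads as `819*4kB`, `280*8kB`, and so on), and the products should sum to the total printed after the `=`. A minimal sketch, with the counts transcribed from the DMA and Normal lines; the helper name `buddy_total_kb` and the tuple layout are illustrative assumptions, not kernel API:

```python
# Sum one zone's buddy free-list line.
# Each entry is (number of free blocks, block size in kB).
# The helper name and data layout are illustrative, not kernel API.

def buddy_total_kb(entries):
    """Return the total free memory in kB described by one free-list line."""
    return sum(count * size_kb for count, size_kb in entries)

# Counts transcribed from the "DMA:" and "Normal:" lines of the dump.
# The larger orders (up to 4096kB) are all zero and therefore omitted.
dma = [(819, 4), (280, 8), (224, 16), (98, 32), (6, 64)]
normal = [(1118, 4), (38, 8), (1, 16), (2, 32), (1, 64), (1, 128)]

print(buddy_total_kb(dma))     # 12620, matching "= 12620kB"
print(buddy_total_kb(normal))  # 5048, matching "= 5048kB"
```

The DMA total differs slightly from the `free:12516kB` figure on the zone line; the two lines carry different timestamps, so they are most likely separate snapshots.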
Greg Kroah-Hartman	630839ac24	Merge 5.4-rc3 into android-mainline Linux 5.4-rc3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia87ba662738dd58ddb917e32c1fbd812861e7a46	2019-10-17 05:28:13 -07:00
David Rientjes	3f36d86694	mm, hugetlb: allow hugepage allocations to reclaim as needed Commit `b39d0ee263` ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") has changed the allocator to bail out early to prevent potentially excessive memory reclaim. __GFP_RETRY_MAYFAIL is designed to retry the allocation, reclaim and compaction loop as long as there is a reasonable chance to make forward progress. Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at the INIT_COMPACT_PRIORITY compaction attempt gives this feedback. The most obvious affected subsystem is hugetlbfs which allocates huge pages based on an admin request (or via admin configured overcommit). I have done a simple test which tries to allocate half of the memory for hugetlb pages while the memory is full of a clean page cache. This is not an unusual situation because we try to cache as much of the memory as possible and the sysctl/sysfs interface to allocate huge pages is there for flexibility to allocate hugetlb pages at any time. 
System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages after the memory is prefilled by a clean page cache: root@test1:~# cat hugetlb_test.sh set -x echo 0 > /proc/sys/vm/nr_hugepages echo 3 > /proc/sys/vm/drop_caches echo 1 > /proc/sys/vm/compact_memory dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10)) TS=$(date +%s) echo 256 > /proc/sys/vm/nr_hugepages cat /proc/sys/vm/nr_hugepages The results for 2 consecutive runs on clean 5.3 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s + date +%s + TS=1569905284 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s + date +%s + TS=1569905311 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 Now with `b39d0ee263` applied root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s + date +%s + TS=1569905516 + echo 256 + cat /proc/sys/vm/nr_hugepages 11 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s + date +%s + TS=1569905541 + echo 256 + cat /proc/sys/vm/nr_hugepages 12 The success rate went down by a factor of 20! Hugetlb allocation requests might fail, and it is reasonable to expect them to under extremely fragmented memory or when the memory is under heavy pressure, but the above situation is not that case. Fix the regression by reverting back to the previous behavior for __GFP_RETRY_MAYFAIL requests and disable the bail-out heuristic for those requests. 
Mike said: : hugetlbfs allocations are commonly done via sysctl/sysfs shortly after : boot where this may not be as much of an issue. However, I am aware of at : least three use cases where allocations are made after the system has been : up and running for quite some time: : : - DB reconfiguration. If sysctl/sysfs fails to get required number of : huge pages, system is rebooted to perform allocation after boot. : : - VM provisioning. If unable to get required number of huge pages, fall : back to base pages. : : - An application that does not preallocate pool, but rather allocates : pages at fault time for optimal NUMA locality. : : In all cases, I would expect `b39d0ee263` to cause regressions and : noticeable behavior changes. : : My quick/limited testing in : https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com : was insufficient. It was also mentioned that if something like : `b39d0ee263` went forward, I would like exemptions for __GFP_RETRY_MAYFAIL : requests as in this patch. [mhocko@suse.com: reworded changelog] Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org Fixes: `b39d0ee263` ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-10-14 15:04:01 -07:00
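The behavior change described in this commit message can be modeled in a few lines. This is a simplified sketch, not the kernel code: the function name, the string constants, and the boolean `retry_mayfail` argument are illustrative stand-ins, and the real check may carry further conditions that are omitted here. It shows only the rule stated above: a skipped or deferred first compaction attempt triggers the early bail-out, except for __GFP_RETRY_MAYFAIL requests.

```python
# Simplified model of the early bail-out heuristic and its exemption.
# All names are illustrative stand-ins, not the kernel's identifiers.

COMPACT_SKIPPED = "skipped"
COMPACT_DEFERRED = "deferred"
COMPACT_SUCCESS = "success"

def should_bail_out_early(compact_result, retry_mayfail):
    """Return True if the allocation gives up after the first compaction attempt."""
    # Skipped or deferred compaction gives no sign that retrying would help.
    no_feedback = compact_result in (COMPACT_SKIPPED, COMPACT_DEFERRED)
    # The fix: requests flagged __GFP_RETRY_MAYFAIL (such as hugetlb pool
    # allocations) are exempt and continue with reclaim/compaction retries.
    return no_feedback and not retry_mayfail

# An ordinary request bails out early when compaction was skipped.
print(should_bail_out_early(COMPACT_SKIPPED, retry_mayfail=False))  # True
# A __GFP_RETRY_MAYFAIL request keeps retrying.
print(should_bail_out_early(COMPACT_SKIPPED, retry_mayfail=True))   # False

# Shortfall in the test runs quoted above: 256 pages requested, 11 and 12 obtained.
print([round(256 / got, 1) for got in (11, 12)])  # [23.3, 21.3]
```

The last line reproduces the "factor of 20" figure from the two runs with the regression present.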
Qian Cai	234fdce892	mm/page_alloc.c: fix a crash in free_pages_prepare() On architectures like s390, arch_free_page() could mark the page unused (set_page_unused()) and any access later would trigger a kernel panic. Fix it by moving arch_free_page() after all possible accessing calls. Hardware name: IBM 2964 N96 400 (z/VM 6.4.0) Krnl PSW : 0404e00180000000 0000000026c2b96e (__free_pages_ok+0x34e/0x5d8) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000088d43af7 0000000000484000 000000000000007c 000000000000000f 000003d080012100 000003d080013fc0 0000000000000000 0000000000100000 00000000275cca48 0000000000000100 0000000000000008 000003d080010000 00000000000001d0 000003d000000000 0000000026c2b78a 000000002717fdb0 Krnl Code: 0000000026c2b95c: ec1100b30659 risbgn %r1,%r1,0,179,6 0000000026c2b962: e32014000036 pfd 2,1024(%r1) #0000000026c2b968: d7ff10001000 xc 0(256,%r1),0(%r1) >0000000026c2b96e: 41101100 la %r1,256(%r1) 0000000026c2b972: a737fff8 brctg %r3,26c2b962 0000000026c2b976: d7ff10001000 xc 0(256,%r1),0(%r1) 0000000026c2b97c: e31003400004 lg %r1,832 0000000026c2b982: ebff1430016a asi 5168(%r1),-1 Call Trace: __free_pages_ok+0x16a/0x5d8) memblock_free_all+0x206/0x290 mem_init+0x58/0x120 start_kernel+0x2b0/0x570 startup_continue+0x6a/0xc0 INFO: lockdep is turned off. Last Breaking-Event-Address: __free_pages_ok+0x372/0x5d8 Kernel panic - not syncing: Fatal exception: panic_on_oops 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 26A2379C In the past, only kernel_poison_pages() would trigger this but it needs "page_poison=on" kernel cmdline, and I suspect nobody tested that on s390. Recently, kernel_init_free_pages() (commit `6471384af2` ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")) was added and could trigger this as well. 
[akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/1569613623-16820-1-git-send-email-cai@lca.pw Fixes: `8823b1dbc0` ("mm/page_poison.c: enable PAGE_POISONING as a separate option") Fixes: `6471384af2` ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: <stable@vger.kernel.org> [5.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-10-07 15:47:19 -07:00
Greg Kroah-Hartman	cb33d78781	Merge 5.4-rc1 into android-mainline Linux 5.4-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I15eec52df70f829acf81ff614a1c2a5fb443a4e0	2019-10-02 19:10:07 +02:00
Greg Kroah-Hartman	94139142d9	Merge 5.4-rc1-prerelease into android-mainline To make the 5.4-rc1 merge easier, merge at a prerelease point in time before the final release happens. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e	2019-10-02 17:58:47 +02:00
Linus Torvalds	edf445ad7c	Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) Merge hugepage allocation updates from David Rientjes: "We (mostly Linus, Andrea, and myself) have been discussing offlist how to implement a sane default allocation strategy for hugepages on NUMA platforms. With these reverts in place, the page allocator will happily allocate a remote hugepage immediately rather than try to make a local hugepage available. This incurs a substantial performance degradation when memory compaction would have otherwise made a local hugepage available. This series reverts those reverts and attempts to propose a more sane default allocation strategy specifically for hugepages. Andrea acknowledges this is likely to fix the swap storms that he originally reported that resulted in the patches that removed __GFP_THISNODE from hugepage allocations. The immediate goal is to return 5.3 to the behavior the kernel has implemented over the past several years so that remote hugepages are not immediately allocated when local hugepages could have been made available because the increased access latency is untenable. The next goal is to introduce a sane default allocation strategy for hugepages allocations in general regardless of the configuration of the system so that we prevent thrashing of local memory when compaction is unlikely to succeed and can prefer remote hugepages over remote native pages when the local node is low on memory." Note on timing: this reverts the hugepage VM behavior changes that got introduced fairly late in the 5.3 cycle, and that fixed a huge performance regression for certain loads that had been around since 4.18. Andrea had this note: "The regression of 4.18 was that it was taking hours to start a VM where 3.10 was only taking a few seconds, I reported all the details on lkml when it was finally tracked down in August 2018. 
https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/ __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio workload degrade like in the "current upstream" above. And it still would have been that bad as above until 5.3-rc5" where the bad behavior ends up happening as you fill up a local node, and without that change, you'd get into the nasty swap storm behavior due to compaction working overtime to make room for more memory on the nodes. As a result 5.3 got the two performance fix reverts in rc5. However, David Rientjes then noted that those performance fixes in turn regressed performance for other loads - although not quite to the same degree. He suggested reverting the reverts and instead replacing them with two small changes to how hugepage allocations are done (patch descriptions rephrased by me): - "avoid expensive reclaim when compaction may not succeed": just admit that the allocation failed when you're trying to allocate a huge-page and compaction wasn't successful. - "allow hugepage fallback to remote nodes when madvised": when that node-local huge-page allocation failed, retry without forcing the local node. but by then I judged it too late to replace the fixes for a 5.3 release. So 5.3 was released with behavior that harked back to the pre-4.18 logic. But now we're in the merge window for 5.4, and we can see if this alternate model fixes not just the horrendous swap storm behavior, but also restores the performance regression that the late reverts caused. Fingers crossed. * emailed patches from David Rientjes <rientjes@google.com>: mm, page_alloc: allow hugepage fallback to remote nodes when madvised mm, page_alloc: avoid expensive reclaim when compaction may not succeed Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" Revert "Revert "mm, thp: restore node-local hugepage allocations""	2019-09-28 14:26:47 -07:00
David Rientjes	b39d0ee263	mm, page_alloc: avoid expensive reclaim when compaction may not succeed Memory compaction has a couple of significant drawbacks as the allocation order increases, specifically: - isolate_freepages() is responsible for finding free pages to use as migration targets and is implemented as a linear scan of memory starting at the end of a zone, - failing order-0 watermark checks in memory compaction does not account for how far below the watermarks the zone actually is: to enable migration, there must be some free memory available. Per the above, watermarks are not always sufficient if isolate_freepages() cannot find the free memory but it could require hundreds of MBs of reclaim to even reach this threshold (read: potentially very expensive reclaim with no indication compaction can be successful), and - if compaction at this order has failed recently so that it does not even run as a result of deferred compaction, looping through reclaim can often be pointless. For hugepage allocations, these are quite substantial drawbacks because these are very high order allocations (order-9 on x86) and falling back to doing reclaim can potentially be very expensive without any indication that compaction would even be successful. Reclaim itself is unlikely to free entire pageblocks and certainly no reliance should be put on it to do so in isolation (recall lumpy reclaim). This means we should avoid reclaim and simply fail hugepage allocation if compaction is deferred. It is also not helpful to thrash a zone by doing excessive reclaim if compaction may not be able to access that memory. If order-0 watermarks fail and the allocation order is sufficiently large, it is likely better to fail the allocation rather than thrashing the zone. 
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-28 14:05:38 -07:00
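The bail-out rule described in this commit can be sketched as a small decision function. This is a simplified model under stated assumptions, not the kernel's code: the enum, the function name and the `retry_mayfail` parameter are hypothetical, and the pageblock order of 9 is taken from the "order-9 on x86" remark. The `retry_mayfail` exemption reflects the later fix listed further up in this log.

```python
from enum import Enum, auto


class CompactResult(Enum):
    """Hypothetical subset of compaction outcomes relevant to the heuristic."""
    SKIPPED = auto()   # order-0 watermarks failed, compaction did not run
    DEFERRED = auto()  # compaction failed recently and was deferred
    OTHER = auto()     # any other outcome (not modelled here)


PAGEBLOCK_ORDER = 9    # order-9 on x86, per the commit message


def should_fail_without_reclaim(order, result, retry_mayfail=False):
    """Return True when the allocation should fail instead of entering reclaim.

    retry_mayfail models the exemption for __GFP_RETRY_MAYFAIL requests
    added by the follow-up fix; it is an illustrative parameter name.
    """
    if retry_mayfail:
        return False                      # exempted: keep retrying as before
    if order < PAGEBLOCK_ORDER:
        return False                      # heuristic targets hugepage-sized orders
    return result in (CompactResult.SKIPPED, CompactResult.DEFERRED)


if __name__ == "__main__":
    print(should_fail_without_reclaim(9, CompactResult.DEFERRED))        # fail fast
    print(should_fail_without_reclaim(9, CompactResult.DEFERRED, True))  # exempted
```

Keeping the exemption as an explicit argument makes it easy to see why hugetlb pool allocations, which use __GFP_RETRY_MAYFAIL, regressed until they were carved out.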
Yang Shi	7ae88534cd	mm: move mem_cgroup_uncharge out of __page_cache_release() A later patch makes THP deferred split shrinker memcg aware, but it needs page->mem_cgroup information in THP destructor, which is called after mem_cgroup_uncharge() now. So move mem_cgroup_uncharge() from __page_cache_release() to compound page destructor, which is called by both THP and other compound pages except HugeTLB. And call it in __put_single_page() for single order page. Link: http://lkml.kernel.org/r/1565144277-36240-3-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-24 15:54:11 -07:00
Yang Shi	364c1eebe4	mm: thp: extract split_queue_* into a struct Patch series "Make deferred split shrinker memcg aware", v6. Currently the THP deferred split shrinker is not memcg aware, which may cause premature OOM with some configurations. For example the below test would run into premature OOM easily: $ cgcreate -g memory:thp $ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes $ cgexec -g memory:thp transhuge-stress 4000 transhuge-stress comes from kernel selftest. It is easy to hit OOM while there are still a lot of THPs on the deferred split queue; memcg direct reclaim can't touch them since the deferred split shrinker is not memcg aware. Make the deferred split shrinker memcg aware by introducing a per memcg deferred split queue. The THP should be on either the per node or the per memcg deferred split queue if it belongs to a memcg. When the page is migrated to another memcg, it will be migrated to the target memcg's deferred split queue too. Reuse the second tail page's deferred_list for the per memcg list since the same THP can't be on multiple deferred split queues. Make the deferred split shrinker not depend on memcg kmem since it is not slab. It doesn't make sense to not shrink THP even though memcg kmem is disabled. With the above change the test demonstrated above doesn't trigger OOM even with cgroup.memory=nokmem. This patch (of 4): Put split_queue, split_queue_lock and split_queue_len into a struct in order to reduce code duplication when we convert deferred_split to memcg aware in the later patches. Link: http://lkml.kernel.org/r/1565144277-36240-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-24 15:54:11 -07:00
Vlastimil Babka	4943308556	mm, compaction: raise compaction priority after it withdraws Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours when should_compact_retry() would return true more often than it should. Specifically, this was in the case where compact_result was COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being made." The problem is that the compaction_withdrawn() test in should_compact_retry() includes compaction outcomes that are only possible on low compaction priority, and results in a retry without increasing the priority. This may result in further reclaim, and more incomplete compaction attempts. With this patch, compaction priority is raised when possible, or should_compact_retry() returns false. The COMPACT_SKIPPED result doesn't really fit together with the other outcomes in compaction_withdrawn(), as that's a result caused by insufficient order-0 pages, not due to low compaction priority. With this patch, it is moved to a new compaction_needs_reclaim() function, and for that outcome we keep the current logic of retrying if it looks like reclaim will be able to help. Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-24 15:54:10 -07:00
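The retry rule this commit describes can be modelled in a few lines. The sketch below is a deliberately simplified stand-in for should_compact_retry(), not its real body: the enum, the integer priority scale (a larger number means more aggressive compaction here, which is an illustrative convention and not the kernel's enum) and the `reclaim_can_help` flag are all assumptions.

```python
from enum import Enum, auto


class Result(Enum):
    """Hypothetical subset of compaction outcomes named in the commit message."""
    SKIPPED = auto()          # not enough order-0 pages: reclaim is needed
    DEFERRED = auto()         # withdrawn, only possible at low priority
    PARTIAL_SKIPPED = auto()  # withdrawn, only possible at low priority


WITHDRAWN = {Result.DEFERRED, Result.PARTIAL_SKIPPED}


def should_compact_retry(result, priority, max_priority, reclaim_can_help):
    """Return (retry, new_priority) for one failed compaction attempt.

    ASSUMPTION: priority is an int where larger means more aggressive.
    """
    if result is Result.SKIPPED:
        # The compaction_needs_reclaim() case: retry only if reclaim may help.
        return reclaim_can_help, priority
    if result in WITHDRAWN:
        if priority < max_priority:
            return True, priority + 1   # retry, but at a raised priority
        return False, priority          # nothing left to raise: give up
    raise NotImplementedError("other outcomes are not modelled in this sketch")


if __name__ == "__main__":
    prio, retry = 0, True
    while retry:
        retry, prio = should_compact_retry(Result.DEFERRED, prio, 2, False)
        print(retry, prio)
```

The loop in the usage example terminates after the priority reaches its maximum, which is the property the patch restores: a withdrawn result can no longer cause unbounded retries at the same priority.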
Matthew Wilcox (Oracle)	d8c6546b1a	mm: introduce compound_nr() Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability. Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-24 15:54:08 -07:00
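The substitution made by this commit is a one-line identity. As a minimal illustration (the `Page` class is a hypothetical stand-in for struct page, and only the shift itself comes from the commit message):

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a compound head page."""
    order: int  # what compound_order(page) would return


def compound_order(page: Page) -> int:
    return page.order


def compound_nr(page: Page) -> int:
    """Number of base pages in the compound page: 1 << compound_order(page)."""
    return 1 << compound_order(page)


if __name__ == "__main__":
    for order in (0, 1, 9):
        print(order, compound_nr(Page(order)))
```

An order-9 compound page therefore spans 512 base pages, the same figure that `1 << compound_order(page)` produced at each call site before the helper existed.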
Greg Kroah-Hartman	896be8f44d	Merge 5.4-rc1-prerelease into android-mainline To make the 5.4-rc1 merge easier, merge at a prerelease point in time before the final release happens. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I29b683c837ed1a3324644dbf9bf863f30740cd0b	2019-09-23 14:14:08 +02:00
Linus Torvalds	84da111de0	Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end 
mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...	2019-09-21 10:07:42 -07:00
Greg Kroah-Hartman	bfa0399bc8	Merge Linus's 5.4-rc1-prerelease branch into android-mainline This merges Linus's tree as of commit `b41dae061b` ("Merge tag 'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux") into android-mainline. This "early" merge makes it easier to test and handle merge conflicts instead of having to wait until the "end" of the merge window and handle all 10000+ commits at once. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812	2019-09-20 16:07:54 -07:00
Linus Torvalds	7e67a85999	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... 
and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...	2019-09-16 17:25:49 -07:00
Matt Fleming	a55c7454a8	sched/topology: Improve load balancing on AMD EPYC systems SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init() for any sched domains with a NUMA distance greater than 2 hops (RECLAIM_DISTANCE). The idea being that it's expensive to balance across domains that far apart. However, as is rather unfortunately explained in: commit `32e45ff43e` ("mm: increase RECLAIM_DISTANCE to 30") the value for RECLAIM_DISTANCE is based on node distance tables from 2011-era hardware. Current AMD EPYC machines have the following NUMA node distances: node distances: node 0 1 2 3 4 5 6 7 0: 10 16 16 16 32 32 32 32 1: 16 10 16 16 32 32 32 32 2: 16 16 10 16 32 32 32 32 3: 16 16 16 10 32 32 32 32 4: 32 32 32 32 10 16 16 16 5: 32 32 32 32 16 10 16 16 6: 32 32 32 32 16 16 10 16 7: 32 32 32 32 16 16 16 10 where 2 hops is 32. The result is that the scheduler fails to load balance properly across NUMA nodes on different sockets -- 2 hops apart. For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4 (CPUs 32-39) like so, $ numactl -C 0-7,32-39 ./spinner 16 causes all threads to fork and remain on node 0 until the active balancer kicks in after a few seconds and forcibly moves some threads to node 4. Override node_reclaim_distance for AMD Zen. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suravee.Suthikulpanit@amd.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas.Lendacky@amd.com Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190808195301.13222-3-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-03 09:17:37 +02:00
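The effect of the threshold on the distance table quoted above can be reproduced directly. This sketch assumes the comparison is "domain distance greater than the reclaim distance", as the commit message words it; the default of 30 comes from the cited RECLAIM_DISTANCE commit, while the override value of 32 is an assumption, since the message does not state the number chosen for AMD Zen. Function names are illustrative.

```python
# NUMA node distances for the AMD EPYC machine quoted in the commit message.
DISTANCES = [
    [10, 16, 16, 16, 32, 32, 32, 32],
    [16, 10, 16, 16, 32, 32, 32, 32],
    [16, 16, 10, 16, 32, 32, 32, 32],
    [16, 16, 16, 10, 32, 32, 32, 32],
    [32, 32, 32, 32, 10, 16, 16, 16],
    [32, 32, 32, 32, 16, 10, 16, 16],
    [32, 32, 32, 32, 16, 16, 10, 16],
    [32, 32, 32, 32, 16, 16, 16, 10],
]

DEFAULT_RECLAIM_DISTANCE = 30   # from "mm: increase RECLAIM_DISTANCE to 30"
ASSUMED_ZEN_OVERRIDE = 32       # ASSUMPTION: not stated in the commit message


def numa_levels(distances):
    """Distinct distances in the table, i.e. one per NUMA topology level."""
    return sorted({d for row in distances for d in row})


def balance_flags_stripped(level_distance, reclaim_distance):
    """True if SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE would be stripped."""
    return level_distance > reclaim_distance


if __name__ == "__main__":
    for limit in (DEFAULT_RECLAIM_DISTANCE, ASSUMED_ZEN_OVERRIDE):
        stripped = [d for d in numa_levels(DISTANCES)
                    if balance_flags_stripped(d, limit)]
        print(f"reclaim distance {limit}: flags stripped at distances {stripped}")
```

With the default, the cross-socket level (distance 32) loses its balancing flags, which matches the behavior where forked threads stay on node 0; with the assumed override, no level in this table exceeds the limit.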
Greg Kroah-Hartman	ad455d87e5	Merge 5.3-rc6 into android-mainline Linux 5.3-rc6 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id10580d48d56054408b3efe0bd1866d67aba2a3d	2019-08-26 16:45:30 +02:00
David Rientjes	cd96103838	mm, page_alloc: move_freepages should not examine struct page of reserved memory After commit `907ec5fca3` ("mm: zero remaining unavailable struct pages"), struct page of reserved memory is zeroed. This causes page->flags to be 0 and fixes issues related to reading /proc/kpageflags, for example, of reserved memory. The VM_BUG_ON() in move_freepages_block(), however, assumes that page_zone() is meaningful even for reserved memory. That assumption is no longer true after the aforementioned commit. There's no reason why move_freepages_block() should be testing the legitimacy of page_zone() for reserved memory; its scope is limited only to pages on the zone's freelist. Note that pfn_valid() can be true for reserved memory: there is a backing struct page. The check for page_to_nid(page) is also buggy but reserved memory normally only appears on node 0 so the zeroing doesn't affect this. Move the debug checks to after verifying PageBuddy is true. This isolates the scope of the checks to only be for buddy pages which are on the zone's freelist which move_freepages_block() is operating on. In this case, an incorrect node or zone is a bug worthy of being warned about (and the examination of struct page is acceptable because this memory is not reserved). Why does move_freepages_block() get called on reserved memory? It's simply math after finding a valid free page from the per-zone free area to use as fallback. We find the beginning and end of the pageblock of the valid page and that can bring us into memory that was reserved per the e820. pfn_valid() is still true (it's backed by a struct page), but since it's zero'd we shouldn't make any inferences here about comparing its node or zone. The current node check just happens to succeed most of the time by luck because reserved memory typically appears on node 0. The fix here is to validate that we actually have buddy pages before testing if there's any type of zone or node strangeness going on. 
We noticed it almost immediately after bringing `907ec5fca3` in on CONFIG_DEBUG_VM builds. It depends on finding specific free pages in the per-zone free area where the math in move_freepages() will bring the start or end pfn into reserved memory and wanting to claim that entire pageblock as a new migratetype. So the path will be rare, require CONFIG_DEBUG_VM, and require fallback to a different migratetype. Some struct pages were already zeroed from reserve pages before 907ec5fca3c so it theoretically could trigger before this commit. I think it's rare enough under a config option that most people don't run that others may not have noticed. I wouldn't argue against a stable tag and the backport should be easy enough, but probably wouldn't single out a commit that this is fixing. Mel said: : The overhead of the debugging check is higher with this patch although : it'll only affect debug builds and the path is not particularly hot. : If this was a concern, I think it would be reasonable to simply remove : the debugging check as the zone boundaries are checked in : move_freepages_block and we never expect a zone/node to be smaller than : a pageblock and stuck in the middle of another zone. Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-08-24 19:48:42 -07:00
Christoph Hellwig	fdc029b19d	memremap: remove the dev field in struct dev_pagemap The dev field in struct dev_pagemap is only used to print dev_name in two places, which are at best nice to have. Just remove the field and thus the name in those two messages. Link: https://lore.kernel.org/r/20190818090557.17853-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-20 09:41:35 -03:00
Greg Kroah-Hartman	37766c2946	Merge 5.3.0-rc1 into android-mainline Linus 5.3-rc1 release Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic171e37d4c21ffa495240c5538852bbb5a9dcce8	2019-07-23 16:21:59 -07:00
Dan Williams	ba72b4c8cf	mm/sparsemem: support sub-section hotplug The libnvdimm sub-system has suffered a series of hacks and broken workarounds for the memory-hotplug implementation's awkward section-aligned (128MB) granularity. For example the following backtrace is emitted when attempting arch_add_memory() with physical address ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section: # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory 100000000-1ffffffff : System RAM 200000000-303ffffff : Persistent Memory (legacy) 304000000-43fffffff : System RAM 440000000-23ffffffff : Persistent Memory 2400000000-43bfffffff : Persistent Memory 2400000000-43bfffffff : namespace2.0 WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60 [..] RIP: 0010:add_pages+0x5c/0x60 [..] Call Trace: devm_memremap_pages+0x460/0x6e0 pmem_attach_disk+0x29e/0x680 [nd_pmem] ? nd_dax_probe+0xfc/0x120 [libnvdimm] nvdimm_bus_probe+0x66/0x160 [libnvdimm] It was discovered that the problem goes beyond RAM vs PMEM collisions as some platforms produce PMEM vs PMEM collisions within a given section. The libnvdimm workaround for that case revealed that the libnvdimm section-alignment-padding implementation has been broken for a long while. A fix for that long-standing breakage introduces as many problems as it solves as it would require a backward-incompatible change to the namespace metadata interpretation. Instead of that dubious route [1], address the root problem in the memory-hotplug implementation. Note that EEXIST is no longer treated as success as that is how sparse_add_section() reports subsection collisions; it was also obviated by recent changes to perform the request_region() for 'System RAM' before arch_add_memory() in the add_memory() sequence. 
[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com [osalvador@suse.de: fix deactivate_section for early sections] Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
Dan Williams	46d945aeab	mm: kill is_dev_zone() helper Given there are no more usages of is_dev_zone() outside of 'ifdef CONFIG_ZONE_DEVICE' protection, kill off the compilation helper. Link: http://lkml.kernel.org/r/156092353211.979959.1489004866360828964.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: Michal Hocko <mhocko@suse.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
Dan Williams	f46edbd1b1	mm/sparsemem: add helpers track active portions of a section at boot Prepare for hot{plug,remove} of sub-ranges of a section by tracking a sub-section active bitmask, each bit representing a PMD_SIZE span of the architecture's memory hotplug section size. The implication of a partially populated section is that pfn_valid() needs to go beyond a valid_section() check and either determine that the section is an "early section", or read the sub-section active ranges from the bitmask. The expectation is that the bitmask (subsection_map) fits in the same cacheline as the valid_section() / early_section() data, so the incremental performance overhead to pfn_valid() should be negligible. The rationale for using early_section() to short-circuit the subsection_map check is that there are legacy code paths that use pfn_valid() at section granularity before validating the pfn against pgdat data. So, the early_section() check allows those traditional assumptions to persist while also permitting subsection_map to tell the truth for purposes of populating the unused portions of early sections with PMEM and other ZONE_DEVICE mappings. 
Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Qian Cai <cai@lca.pw> Tested-by: Jane Chu <jane.chu@oracle.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
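The sub-section check described in the commit above can be sketched in user space as follows. This is a minimal illustration, not the kernel implementation: the `toy_` names and `struct toy_section` are hypothetical, and the constants assume the x86 layout mentioned in the series (4 KiB pages, 128 MiB sections, 2 MiB sub-sections, hence a 64-bit mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values: 4 KiB pages, 128 MiB sections, 2 MiB sub-sections. */
#define PAGE_SHIFT              12
#define SECTION_SIZE_BITS       27
#define SUBSECTION_SHIFT        21
#define PFN_SECTION_SHIFT       (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PFN_SUBSECTION_SHIFT    (SUBSECTION_SHIFT - PAGE_SHIFT)
#define PAGES_PER_SECTION       (1UL << PFN_SECTION_SHIFT)
#define SUBSECTIONS_PER_SECTION (1UL << (SECTION_SIZE_BITS - SUBSECTION_SHIFT))

/* Hypothetical stand-in for the per-section state discussed above. */
struct toy_section {
    bool valid;               /* what valid_section() would report */
    bool early;               /* section describes boot memory */
    uint64_t subsection_map;  /* one bit per 2 MiB sub-section */
};

/* Index of the sub-section that contains pfn, relative to its section. */
static unsigned int toy_subsection_index(unsigned long pfn)
{
    return (unsigned int)((pfn & (PAGES_PER_SECTION - 1)) >> PFN_SUBSECTION_SHIFT);
}

/* Mark the sub-section containing pfn as active. */
static void toy_activate_subsection(struct toy_section *ms, unsigned long pfn)
{
    unsigned int idx = toy_subsection_index(pfn);

    assert(idx < SUBSECTIONS_PER_SECTION);
    ms->subsection_map |= UINT64_C(1) << idx;
}

/*
 * Section-level check first, then the early-section short-circuit,
 * and only then the sub-section bitmap lookup.
 */
static bool toy_pfn_valid(const struct toy_section *ms, unsigned long pfn)
{
    if (!ms->valid)
        return false;
    if (ms->early)
        return true;
    return (ms->subsection_map >> toy_subsection_index(pfn)) & 1;
}
```

With these assumed constants, pfns 512 through 1023 share sub-section index 1, so activating any one of them makes the whole 2 MiB span valid while neighbouring spans stay invalid.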
Dan Williams	f1eca35a0d	mm/sparsemem: introduce struct mem_section_usage Patch series "mm: Sub-section memory hotplug support", v10. The memory hotplug section is an arbitrary / convenient unit for memory hotplug. 'Section-size' units have bled into the user interface ('memblock' sysfs) and can not be changed without breaking existing userspace. The section-size constraint, while mostly benign for typical memory hotplug, has and continues to wreak havoc with 'device-memory' use cases, persistent memory (pmem) in particular. Recall that pmem uses devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a 'struct page' memmap for pmem. However, it does not use the 'bottom half' of memory hotplug, i.e. never marks pmem pages online and never exposes the userspace memblock interface for pmem. This leaves an opening to redress the section-size constraint. To date, the libnvdimm subsystem has attempted to inject padding to satisfy the internal constraints of arch_add_memory(). Beyond complicating the code, leading to bugs [2], wasting memory, and limiting configuration flexibility, the padding hack is broken when the platform changes this physical memory alignment of pmem from one boot to the next. Device failure (intermittent or permanent) and physical reconfiguration are events that can cause the platform firmware to change the physical placement of pmem on a subsequent boot, and device failure is an everyday event in a data-center. It turns out that sections are only a hard requirement of the user-facing interface for memory hotplug and with a bit more infrastructure sub-section arch_add_memory() support can be added for kernel internal usages like devm_memremap_pages(). Here is an analysis of the current design assumptions in the current code and how they are addressed in the new implementation: Current design assumptions: - Sections that describe boot memory (early sections) are never unplugged / removed. 
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a valid_section() check - __add_pages() and helper routines assume all operations occur in PAGES_PER_SECTION units. - The memblock sysfs interface only comprehends full sections New design assumptions: - Sections are instrumented with a sub-section bitmask to track (on x86) individual 2MB sub-divisions of a 128MB section. - Partially populated early sections can be extended with additional sub-sections, and those sub-sections can be removed with arch_remove_memory(). With this in place we no longer lose usable memory capacity to padding. - pfn_valid() is updated to look deeper than valid_section() to also check the active-sub-section mask. This indication is in the same cacheline as the valid_section() so the performance impact is expected to be negligible. So far the lkp robot has not reported any regressions. - Outside of the core vmemmap population routines which are replaced, other helper routines like shrink_{zone,pgdat}_span() are updated to handle the smaller granularity. Core memory hotplug routines that deal with online memory are not touched. - The existing memblock sysfs user api guarantees / assumptions are not touched since this capability is limited to !online !memblock-sysfs-accessible sections. Meanwhile the issue reports continue to roll in from users that do not understand when and how the 128MB constraint will bite them. The current implementation relied on being able to support at least one misaligned namespace, but that immediately falls over on any moderately complex namespace creation attempt. Beyond the initial problem of 'System RAM' colliding with pmem, and the unsolvable problem of physical alignment changes, Linux is now being exposed to platforms that collide pmem ranges with other pmem ranges by default [3]. 
In short, devm_memremap_pages() has pushed the venerable section-size constraint past the breaking point, and the simplicity of section-aligned arch_add_memory() is no longer tenable. These patches are exposed to the kbuild robot on a subsection-v10 branch [4], and a preview of the unit test for this functionality is available on the 'subsection-pending' branch of ndctl [5]. [2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com [3]: https://github.com/pmem/ndctl/issues/76 [4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10 [5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c This patch (of 13): Towards enabling memory hotplug to track partial population of a section, introduce 'struct mem_section_usage'. A pointer to a 'struct mem_section_usage' instance replaces the existing pointer to a 'pageblock_flags' bitmap. Effectively it adds one more 'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house a new 'subsection_map' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. SUBSECTION_SHIFT is defined as global constant instead of per-architecture value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of subsection users. Specifically a common subsection size allows for the possibility that persistent memory namespace configurations be made compatible across architectures. The primary motivation for this functionality is to support platforms that mix "System RAM" and "Persistent Memory" within a single section, or multiple PMEM ranges with different mapping lifetimes within a single section. The section restriction for hotplug has caused an ongoing saga of hacks and bugs for devm_memremap_pages() users. Beyond the fixups to teach existing paths how to retrieve the 'usemap' from a section, and updates to usemap allocation path, there are no expected behavior changes. 
Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Qian Cai <cai@lca.pw> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
Yafang Shao	e5ca8071fe	mm/vmscan.c: add a new member reclaim_state in struct shrink_control Patch series "mm/vmscan: calculate reclaimed slab in all reclaim paths". This patchset fixes the issues in shrinking slab. There are currently six different reclaim paths: - kswapd reclaim path - node reclaim path - hibernate preallocate memory reclaim path - direct reclaim path - memcg reclaim path - memcg softlimit reclaim path The slab caches reclaimed in these paths are only calculated in the first three paths above. The issues are explained in detail in patch #2. We should calculate the reclaimed slab caches in every reclaim path. In order to do so, the struct reclaim_state is placed into the struct shrink_control. In the node reclaim path, there is another issue with shrinking slab, which is addressed in "mm/vmscan: shrink slab in node reclaim" (https://lore.kernel.org/linux-mm/1559874946-22960-1-git-send-email-laoar.shao@gmail.com/). This patch (of 2): The struct reclaim_state is used to record how many slab caches are reclaimed in one reclaim path. The struct shrink_control is used to control one reclaim path. So we'd better put reclaim_state into shrink_control. [laoar.shao@gmail.com: remove reclaim_state assignment from __perform_reclaim()] Link: http://lkml.kernel.org/r/1561381582-13697-1-git-send-email-laoar.shao@gmail.com Link: http://lkml.kernel.org/r/1561112086-6169-2-git-send-email-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-16 19:23:21 -07:00
Linus Torvalds	fec88ab0af	Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull HMM updates from Jason Gunthorpe: "Improvements and bug fixes for the hmm interface in the kernel: - Improve clarity, locking and APIs related to the 'hmm mirror' feature merged last cycle. In linux-next we now see AMDGPU and nouveau to be using this API. - Remove old or transitional hmm APIs. These are hold overs from the past with no users, or APIs that existed only to manage cross tree conflicts. There are still a few more of these cleanups that didn't make the merge window cut off. - Improve some core mm APIs: - export alloc_pages_vma() for driver use - refactor into devm_request_free_mem_region() to manage DEVICE_PRIVATE resource reservations - refactor duplicative driver code into the core dev_pagemap struct - Remove hmm wrappers of improved core mm APIs, instead have drivers use the simplified API directly - Remove DEVICE_PUBLIC - Simplify the kconfig flow for the hmm users and core code" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits) mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR mm: remove the HMM config option mm: sort out the DEVICE_PRIVATE Kconfig mess mm: simplify ZONE_DEVICE page private data mm: remove hmm_devmem_add mm: remove hmm_vma_alloc_locked_page nouveau: use devm_memremap_pages directly nouveau: use alloc_page_vma directly PCI/P2PDMA: use the dev_pagemap internal refcount device-dax: use the dev_pagemap internal refcount memremap: provide an optional internal refcount in struct dev_pagemap memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag memremap: remove the data field in struct dev_pagemap memremap: add a migrate_to_ram method to struct dev_pagemap_ops memremap: lift the devmap_enable manipulation into devm_memremap_pages memremap: pass a struct dev_pagemap to ->kill and ->cleanup memremap: move dev_pagemap callbacks into a separate structure memremap: 
validate the pagemap type passed to devm_memremap_pages mm: factor out a devm_request_free_mem_region helper mm: export alloc_pages_vma ...	2019-07-14 19:42:11 -07:00
Alexander Potapenko	6471384af2	mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options Patch series "add init_on_alloc/init_on_free boot options", v10. Provide init_on_alloc and init_on_free boot options. These are aimed at preventing possible information leaks and making the control-flow bugs that depend on uninitialized values more deterministic. Enabling either of the options guarantees that the memory returned by the page allocator and SL[AU]B is initialized with zeroes. SLOB allocator isn't supported at the moment, as its emulation of kmem caches complicates handling of SLAB_TYPESAFE_BY_RCU caches correctly. Enabling init_on_free also guarantees that pages and heap objects are initialized right after they're freed, so it won't be possible to access stale data by using a dangling pointer. As suggested by Michal Hocko, right now we don't let the heap users to disable initialization for certain allocations. There's not enough evidence that doing so can speed up real-life cases, and introducing ways to opt-out may result in things going out of control. This patch (of 2): The new options are needed to prevent possible information leaks and make control-flow bugs that depend on uninitialized values more deterministic. This is expected to be on-by-default on Android and Chrome OS. And it gives the opportunity for anyone else to use it under distros too via the boot args. (The init_on_free feature is regularly requested by folks where memory forensics is included in their threat models.) init_on_alloc=1 makes the kernel initialize newly allocated pages and heap objects with zeroes. Initialization is done at allocation time at the places where checks for __GFP_ZERO are performed. init_on_free=1 makes the kernel initialize freed pages and heap objects with zeroes upon their deletion. This helps to ensure sensitive data doesn't leak via use-after-free accesses. Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator returns zeroed memory. 
The two exceptions are slab caches with constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never zero-initialized to preserve their semantics. Both init_on_alloc and init_on_free default to zero, but those defaults can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and CONFIG_INIT_ON_FREE_DEFAULT_ON. If either SLUB poisoning or page poisoning is enabled, those options take precedence over init_on_alloc and init_on_free: initialization is only applied to unpoisoned allocations. Slowdown for the new features compared to init_on_free=0, init_on_alloc=0: hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%) hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%) Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%) Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%) Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%) Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%) The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline is within the standard error. The new features are also going to pave the way for hardware memory tagging (e.g. arm64's MTE), which will require both on_alloc and on_free hooks to set the tags for heap objects. With MTE, tagging will have the same cost as memory initialization. Although init_on_free is rather costly, there are paranoid use-cases where in-memory data lifetime is desired to be minimized. There are various arguments for/against the realism of the associated threat models, but given that we'll need the infrastructure for MTE anyway, and there are people who want wipe-on-free behavior no matter what the performance cost, it seems reasonable to include it in this series. 
[glider@google.com: v8] Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com [glider@google.com: v9] Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com [glider@google.com: v10] Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts Acked-by: James Morris <jamorris@linux.microsoft.com>] Cc: Christoph Lameter <cl@linux.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Sandeep Patil <sspatil@android.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:46 -07:00
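The zero-on-allocation and zero-on-free behaviour described in the commit above can be sketched with a toy fixed-size object pool. This is only an illustration under stated assumptions: `struct toy_pool`, `toy_alloc()` and `toy_free()` are hypothetical names, the `want_zero` argument stands in for a caller passing __GFP_ZERO, and slab constructors, SLAB_TYPESAFE_BY_RCU and poisoning are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_OBJ_SIZE 64
#define TOY_NR_OBJS  4

/* Hypothetical object pool; the two flags mirror the boot options. */
struct toy_pool {
    unsigned char mem[TOY_NR_OBJS][TOY_OBJ_SIZE];
    bool in_use[TOY_NR_OBJS];
    bool init_on_alloc;
    bool init_on_free;
};

/* Hand out the first free object; zero it if asked or if init_on_alloc is set. */
static void *toy_alloc(struct toy_pool *p, bool want_zero)
{
    for (size_t i = 0; i < TOY_NR_OBJS; i++) {
        if (p->in_use[i])
            continue;
        p->in_use[i] = true;
        /* Same place where an explicit zeroing request would be honoured. */
        if (want_zero || p->init_on_alloc)
            memset(p->mem[i], 0, TOY_OBJ_SIZE);
        return p->mem[i];
    }
    return NULL;
}

/* Return an object to the pool; wipe it first if init_on_free is set. */
static void toy_free(struct toy_pool *p, void *obj)
{
    size_t i = (size_t)((unsigned char *)obj - &p->mem[0][0]) / TOY_OBJ_SIZE;

    assert(i < TOY_NR_OBJS && p->in_use[i]);
    if (p->init_on_free)
        memset(p->mem[i], 0, TOY_OBJ_SIZE);
    p->in_use[i] = false;
}
```

With both flags clear, stale contents survive a free and reappear on the next allocation; setting either flag is enough for the caller to always receive zeroed memory, and only `init_on_free` shortens the lifetime of the data inside the pool itself.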
Nicholas Piggin	e03a5125ec	mm/large system hash: clear hashdist when only one node with memory is booted CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally even when booting on single node machines. This causes the large system hashes to be allocated with vmalloc, and mapped with small pages. This change clears hashdist if only one node has come up with memory. This results in the important large inode and dentry hashes using memblock allocations. All others are within 4MB size up to about 128GB of RAM, which allows them to be allocated from the linear map on most non-NUMA images. Other big hashes like futex and TCP should eventually be moved over to the same style of allocation as those vfs caches that use HASH_EARLY if !hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images. This brings dTLB misses for linux kernel tree `git diff` from ~45,000 to ~8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off (performance is in the noise, under 1% difference, page tables are likely to be well cached for this workload). Link: http://lkml.kernel.org/r/20190605144814.29319-2-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:46 -07:00
Nicholas Piggin	ec11408a16	mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist The kernel currently clamps large system hashes to MAX_ORDER when hashdist is not set, which is rather arbitrary. vmalloc space is limited on 32-bit machines, but this shouldn't result in much more used because of small physical memory limiting system hash sizes. Include "vmalloc" or "linear" in the kernel log message. Link: http://lkml.kernel.org/r/20190605144814.29319-1-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:46 -07:00
Vlastimil Babka	3972f6bb1c	mm, debug_pagealloc: use a page type instead of page_ext flag When debug_pagealloc is enabled, we currently allocate the page_ext array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag. Now that we have the page_type field in struct page, we can use that instead, as guard pages are neither PageSlab nor mapped to userspace. This reduces memory overhead when debug_pagealloc is enabled and there are no other features requiring the page_ext array. Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:43 -07:00
Vlastimil Babka	4462b32c92	mm, page_alloc: more extensive free page checking with debug_pagealloc The page allocator checks struct pages for expected state (mapcount, flags etc) as pages are being allocated (check_new_page()) and freed (free_pages_check()) to provide some defense against errors in page allocator users. Prior to commits `479f854a20` ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") and `4db7548ccb` ("mm, page_alloc: defer debugging checks of freed pages until a PCP drain") this happened for order-0 pages as they were allocated from or freed to the per-cpu caches (pcplists). Since those are fast paths, the checks are now performed only when pages are moved between pcplists and global free lists. This however lowers the chances of catching errors soon enough. In order to increase the chances of the checks to catch errors, the kernel has to be rebuilt with CONFIG_DEBUG_VM, which also enables multiple other internal debug checks (VM_BUG_ON() etc), which is suboptimal when the goal is to catch errors in mm users, not in mm code itself. To catch some wrong users of the page allocator we have CONFIG_DEBUG_PAGEALLOC, which is designed to have virtually no overhead unless enabled at boot time. Memory corruptions when writing to freed pages often have the same underlying errors (use-after-free, double free) as corrupting the corresponding struct pages, so this existing debugging functionality is a good fit to extend by also performing struct page checks at least as often as if CONFIG_DEBUG_VM was enabled. Specifically, after this patch, when debug_pagealloc is enabled on boot, and CONFIG_DEBUG_VM disabled, pages are checked when allocated from or freed to the pcplists in addition to being moved between pcplists and free lists. When both debug_pagealloc and CONFIG_DEBUG_VM are enabled, pages are checked when being moved between pcplists and free lists in addition to when allocated from or freed to the pcplists. 
When debug_pagealloc is not enabled on boot, the overhead in fast paths should be virtually none thanks to the use of static key. Link: http://lkml.kernel.org/r/20190603143451.27353-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:43 -07:00
Vlastimil Babka	96a2b03f28	mm, debug_pagelloc: use static keys to enable debugging Patch series "debug_pagealloc improvements". I have been recently debugging some pcplist corruptions, where it would be useful to perform struct page checks immediately as pages are allocated from and freed to pcplists, which is now only possible by rebuilding the kernel with CONFIG_DEBUG_VM (details in Patch 2 changelog). To make this kind of debugging simpler in future on a distro kernel, I have improved CONFIG_DEBUG_PAGEALLOC so that it has even smaller overhead when not enabled at boot time (Patch 1) and also when enabled (Patch 3), and extended it to perform the struct page checks more often when enabled (Patch 2). Now it can be configured in when building a distro kernel without extra overhead, and debugging page use after free or double free can be enabled simply by rebooting with debug_pagealloc=on. This patch (of 3): CONFIG_DEBUG_PAGEALLOC has been redesigned by `031bc5743f` ("mm/debug-pagealloc: make debug-pagealloc boottime configurable") to allow being always enabled in a distro kernel, but only perform its expensive functionality when booted with debug_pagealloc=on. We can further reduce the overhead when not boot-enabled (including page allocator fast paths) using static keys. This patch introduces one for debug_pagealloc core functionality, and another for the optional guard page functionality (enabled by booting with debug_guardpage_minorder=X). Link: http://lkml.kernel.org/r/20190603143451.27353-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. 
Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:43 -07:00
Denis Efremov	98ef2046f2	mm: remove the exporting of totalram_pages Previously totalram_pages was a global variable. Currently, totalram_pages is a static inline function from include/linux/mm.h. However, the function is also marked as EXPORT_SYMBOL, which is at best an odd combination. Because there is no point in exporting a static inline function from a public header, this commit removes the EXPORT_SYMBOL() marking. It will still be possible to use the function in modules because all the symbols it depends on are exported. Link: http://lkml.kernel.org/r/20190710141031.15642-1-efremov@linux.com Fixes: `ca79b0c211` ("mm: convert totalram_pages and totalhigh_pages variables to atomic") Signed-off-by: Denis Efremov <efremov@linux.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-12 11:05:43 -07:00
Greg Kroah-Hartman	a4bbf3df04	Merge 5.2 into android-common Linux 5.2 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-07-08 08:24:40 +02:00
Juergen Gross	b9705d8778	mm/page_alloc.c: fix regression with deferred struct page init Commit `0e56acae4b` ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") is causing a regression on some systems when the kernel is booted as Xen dom0. The system will just hang in early boot. The reason is an endless loop in get_page_from_freelist() in case the first zone looked at has no free memory. deferred_grow_zone() is always returning true due to the following code snippet: /* If the zone is empty somebody else may have cleared out the zone */ if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, first_deferred_pfn)) { pgdat->first_deferred_pfn = ULONG_MAX; pgdat_resize_unlock(pgdat, &flags); return true; } This in turn results in the loop, as get_page_from_freelist() is assuming forward progress can be made by doing some more struct page initialization. Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com Fixes: `0e56acae4b` ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") Signed-off-by: Juergen Gross <jgross@suse.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-05 11:12:07 +09:00
Christoph Hellwig	8a164fef9c	mm: simplify ZONE_DEVICE page private data Remove the clumsy hmm_devmem_page_{get,set}_drvdata helpers, and instead just access the page directly. Also make the page data a void pointer, and thus much easier to use. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-07-02 14:32:45 -03:00
Christoph Hellwig	514caf23a7	memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag Add a flags field to struct dev_pagemap to replace the altmap_valid boolean to be a little more extensible. Also add a pgmap_altmap() helper to find the optional altmap and clean up the code using the altmap using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-07-02 14:32:44 -03:00
Greg Kroah-Hartman	b9c482880a	Merge 5.2-rc2 into android-mainline Linux 5.2-rc2 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-05-27 09:45:14 +02:00
Thomas Gleixner	457c899653	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:45 +02:00
Greg Kroah-Hartman	1226c72a32	Merge 5.2-rc1 into android-mainline Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-05-20 20:17:24 +02:00
Dan Williams	97500a4a54	mm: maintain randomization of page free lists When freeing a page with an order >= shuffle_page_order randomly select the front or back of the list for insertion. While the mm tries to defragment physical pages into huge pages this can tend to make the page allocator more predictable over time. Inject the front-back randomness to preserve the initial randomness established by shuffle_free_memory() when the kernel was booted. The overhead of this manipulation is constrained by only being applied for MAX_ORDER sized pages by default. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/154899812788.3165233.9066631950746578517.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:48 -07:00
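The front-or-back insertion described in the commit above can be sketched with a small array-backed list. This is a hedged illustration, not the kernel code: `struct toy_free_list` and the `toy_add_*` helpers are hypothetical, and the random bit is passed in by the caller as `coin` so the behaviour stays deterministic for inspection, whereas a real implementation would draw it from a random source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_LIST_CAP 16

/* Hypothetical free list; pages are represented by plain integers. */
struct toy_free_list {
    int pages[TOY_LIST_CAP];
    size_t len;
};

/* Insert at the front, shifting existing entries back by one slot. */
static void toy_add_head(struct toy_free_list *l, int page)
{
    assert(l->len < TOY_LIST_CAP);
    memmove(&l->pages[1], &l->pages[0], l->len * sizeof(l->pages[0]));
    l->pages[0] = page;
    l->len++;
}

/* Insert at the back. */
static void toy_add_tail(struct toy_free_list *l, int page)
{
    assert(l->len < TOY_LIST_CAP);
    l->pages[l->len++] = page;
}

/*
 * Choose front or back based on one random bit supplied by the caller.
 * Only frees at or above the shuffle order would take this path.
 */
static void toy_add_random(struct toy_free_list *l, int page, bool coin)
{
    if (coin)
        toy_add_head(l, page);
    else
        toy_add_tail(l, page);
}
```

Because each freed page lands at either end with equal probability, the order in which pages are later handed out no longer mirrors the order in which they were freed, which is the property the commit aims to preserve after boot-time shuffling.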
Dan Williams	b03641af68	mm: move buddy list manipulations into helpers In preparation for runtime randomization of the zone lists, take all (well, most of) the list_*() functions in the buddy allocator and put them in helper functions. Provide a common control point for injecting additional behavior when freeing pages. [dan.j.williams@intel.com: fix buddy list helpers] Link: http://lkml.kernel.org/r/155033679702.1773410.13041474192173212653.stgit@dwillia2-desk3.amr.corp.intel.com [vbabka@suse.cz: remove del_page_from_free_area() migratetype parameter] Link: http://lkml.kernel.org/r/4672701b-6775-6efd-0797-b6242591419e@suse.cz Link: http://lkml.kernel.org/r/154899812264.3165233.5219320056406926223.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:48 -07:00
Dan Williams	e900a918b0	mm: shuffle initial free memory to improve memory-side-cache utilization Patch series "mm: Randomize free memory", v10. This patch (of 3): Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1]. Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here: It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/ A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s." A replacement for zonesort was merged upstream in commit `cc9aec03e5` ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. 
A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case. The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance. Here are some performance impact details of the patches: 1/ An Intel internal synthetic memory bandwidth measurement tool saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived case used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated node they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node. 2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%. 3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case. 4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3]. 
While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems. Outside of memory-side-cache utilization concerns there is potentially a security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first-out behavior for physical pages. Pages are freed in physical address order when first onlined. Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc. I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator." While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches it leaves the vast bulk of memory to be predictably allocated in order. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization. Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time. 
Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch). The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. 10 (4MB); this trades off randomization granularity for time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side cache behavior improvements, and the expectation that the security implications of finer granularity randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work. This initial randomization can be undone over time so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask if the page free entropy is sufficient, but it is not enough due to the in-order initial freeing of pages. At the start of that process putting page1 in front or behind page0 still keeps them close together, page2 is still near page1 and has a high chance of being adjacent. As more pages are added ordering diversity improves, but there is still high page locality for the low address pages and this leads to no significant impact to the cache conflict rate. 
[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309 [dan.j.williams@intel.com: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:48 -07:00
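The Fisher-Yates shuffle named in the commit above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's shuffle_zone(): `sketch_rand` is a hypothetical xorshift64 stand-in for the kernel's random source, `sketch_shuffle_blocks` is a hypothetical helper, and the free list is modeled as a plain array holding the starting pfn of each order-10 block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical random source: xorshift64. The state must be nonzero. */
static uint64_t sketch_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Fisher-Yates shuffle over an array of block start pfns.
 * Walk from the last slot down to the second; swap each slot with a
 * randomly chosen slot at or below it. The modulo introduces a small
 * bias that is acceptable for a sketch.
 */
static void sketch_shuffle_blocks(unsigned long *pfn, size_t n, uint64_t *state)
{
    assert(*state != 0);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)(sketch_rand(state) % i);   /* j in [0, i-1] */
        unsigned long tmp = pfn[i - 1];
        pfn[i - 1] = pfn[j];
        pfn[j] = tmp;
    }
}
```

Shuffling whole order-10 blocks rather than individual pages reflects the trade-off stated in the message: coarser granularity in exchange for less time spent shuffling.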
Baruch Siach	136ac591f0	mm: update references to page _refcount Commit `0139aa7b7f` ("mm: rename _count, field of the struct page, to _refcount") left out a couple of references to the old field name. Fix that. Link: http://lkml.kernel.org/r/cedf87b02eb8a6b3eac57e8e91da53fb15c3c44c.1556537475.git.baruch@tkos.co.il Fixes: `0139aa7b7f` ("mm: rename _count, field of the struct page, to _refcount") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:47 -07:00
Mike Rapoport	350e88bad4	mm: memblock: make keeping memblock memory opt-in rather than opt-out Most architectures do not need the memblock memory after the page allocator is initialized, but only a few enable ARCH_DISCARD_MEMBLOCK in the arch Kconfig. Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the logic makes it clear which architectures actually use memblock after system initialization and removes the need to add ARCH_DISCARD_MEMBLOCK to the architectures that are still missing that option. Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:50 -07:00
Yafang Shao	1c52e6d068	mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist Because rmqueue_pcplist() is only called when order is 0, we don't need to use order as a parameter. Link: http://lkml.kernel.org/r/1555591709-11744-1-git-send-email-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:50 -07:00
Michal Hocko	5557c766ab	mm, memory_hotplug: cleanup memory offline path check_pages_isolated_cb currently accounts the whole pfn range as being offlined if test_pages_isolated succeeds on the range. This is based on the assumption that all pages in the range are freed, which is currently true in most cases but will not be with later changes, as pages marked as vmemmap won't be isolated. Move the offlined pages counting to offline_isolated_pages_cb and rely on __offline_isolated_pages to return the correct value. check_pages_isolated_cb will still do its primary job and check the pfn range. While we are at it, remove check_pages_isolated and offline_isolated_pages and use walk_system_ram_range directly, as is done in online_pages. Link: http://lkml.kernel.org/r/20190408082633.2864-2-osalvador@suse.de Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 09:47:49 -07:00

1535 Commits