ac14ef0f9f201e16e61d3be5b493e10acad957e7
1119 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
b1a6760ddf |
Merge branch 'android12-5.10' into android12-5.10-lts
Sync up with android12-5.10 for the following commits: |
||
![]() |
470cce5085 |
ANDROID: mm: page_pinner: record every put_page
There are some places using put_page_testzero instead of put_page. Thus, move page_pinner_put_page into put_page_testzero to catch all of put operations. Bug: 192475091 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: If33b2a28ceb64e3ccab83990eac2c1cc291c3b08 |
||
![]() |
9f47e5fdda |
ANDROID: mm: page_pinner: change function names
Make function name more clear to indicate what it's doing. Bug: 192475091 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: I6adabc0df6a54cf24d8287bf0f22cf7dcdc7ad03 |
||
![]() |
daeabfe7fa |
ANDROID: mm: add reclaim_shmem_address_space() for faster reclaims
Add the functionality that allow users of shmem to reclaim its pages without going through the kswapd/direct reclaim path. An example usecase is: Say that device allocates a larger amount of shmem pages and shares it with hardware. To faster reclaims such pages, drivers can register the shrinkers and call reclaim_shmem_address_space(). The implementation of this function is mostly borrowed from reclaim_address_space() implemented for per process reclaim[1]. [1] https://lore.kernel.org/patchwork/cover/378056/ Bug: 187798288 Change-Id: I03d2c3b9610612af977f89ddeabb63b8e9e50918 Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
194be71cc6 |
Merge 5.10.47 into android12-5.10-lts
Changes in 5.10.47 module: limit enabling module.sig_enforce Revert "drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue." Revert "drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell." drm: add a locked version of drm_is_current_master drm/nouveau: wait for moving fence after pinning v2 drm/radeon: wait for moving fence after pinning drm/amdgpu: wait for moving fence after pinning ARM: 9081/1: fix gcc-10 thumb2-kernel regression mmc: meson-gx: use memcpy_to/fromio for dram-access-quirk MIPS: generic: Update node names to avoid unit addresses arm64: Ignore any DMA offsets in the max_zone_phys() calculation arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required spi: spi-nxp-fspi: move the register operation after the clock enable Revert "PCI: PM: Do not read power state in pci_enable_device_flags()" drm/vc4: hdmi: Move the HSM clock enable to runtime_pm drm/vc4: hdmi: Make sure the controller is powered in detect x86/entry: Fix noinstr fail in __do_fast_syscall_32() x86/xen: Fix noinstr fail in exc_xen_unknown_trap() locking/lockdep: Improve noinstr vs errors perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context perf/x86/intel/lbr: Zero the xstate buffer on allocation dmaengine: zynqmp_dma: Fix PM reference leak in zynqmp_dma_alloc_chan_resourc() dmaengine: stm32-mdma: fix PM reference leak in stm32_mdma_alloc_chan_resourc() dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits mac80211: remove warning in ieee80211_get_sband() mac80211_hwsim: drop pending frames on stop cfg80211: call cfg80211_leave_ocb when switching away from OCB dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe() dmaengine: mediatek: free the proper desc in desc_free handler dmaengine: mediatek: do not issue a new desc if one is still current dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma net: ipv4: Remove unneed BUG() function mac80211: drop multicast fragments net: ethtool: clear heap allocations for ethtool function inet: annotate data race in inet_send_prepare() and inet_dgram_connect() ping: Check return value of function 'ping_queue_rcv_skb' net: annotate data race in sock_error() inet: annotate date races around sk->sk_txhash net/packet: annotate data race in packet_sendmsg() net: phy: dp83867: perform soft reset and retain established link riscv32: Use medany C model for modules net: caif: fix memory leak in ldisc_open net/packet: annotate accesses to po->bind net/packet: annotate accesses to po->ifindex r8152: Avoid memcpy() over-reading of ETH_SS_STATS sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS r8169: Avoid memcpy() over-reading of ETH_SS_STATS KVM: selftests: Fix kvm_check_cap() assertion net: qed: Fix memcpy() overflow of qed_dcbx_params() mac80211: reset profile_periodicity/ema_ap mac80211: handle various extensible elements correctly recordmcount: Correct st_shndx handling PCI: Add AMD RS690 quirk to enable 64-bit DMA net: ll_temac: Add memory-barriers for TX BD access net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY perf/x86: Track pmu in per-CPU cpu_hw_events pinctrl: stm32: fix the reported number of GPIO lines per bank i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access gpiolib: cdev: zero padding during conversion to gpioline_info_changed scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART) nilfs2: fix memory leak in nilfs_sysfs_delete_device_group s390/stack: fix possible register corruption with stack switch helper KVM: do not allow mapping valid but non-reference-counted pages i2c: robotfuzz-osif: fix control-request directions ceph: must hold snap_rwsem when filling inode for async create kthread_worker: split code for canceling the delayed work timer kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate() x86/fpu: Make init_fpstate correct with optimized XSAVE mm: add VM_WARN_ON_ONCE_PAGE() macro mm/rmap: remove unneeded semicolon in page_not_mapped() mm/rmap: use page_not_mapped in try_to_unmap() mm, thp: use head page in __migration_entry_wait() mm/thp: fix __split_huge_pmd_locked() on shmem migration entry mm/thp: make is_huge_zero_pmd() safe and quicker mm/thp: try_to_unmap() use TTU_SYNC for safe splitting mm/thp: fix vma_address() if virtual address below file offset mm/thp: fix page_address_in_vma() on file THP tails mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split mm: page_vma_mapped_walk(): use page for pvmw->page mm: page_vma_mapped_walk(): settle PageHuge on entry mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block mm: page_vma_mapped_walk(): crossing page table boundary mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): get vma_address_end() earlier mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() mm, futex: fix shared futex pgoff on shmem huge page KVM: SVM: Call SEV Guest Decommission if ASID binding fails swiotlb: manipulate orig_addr when tlb_addr has offset netfs: fix test for whether we can skip read when writing beyond EOF Revert "drm: add a locked version of drm_is_current_master" certs: Add EFI_CERT_X509_GUID support for dbx entries certs: Move load_system_certificate_list to a common function certs: Add ability to preload revocation certs integrity: Load mokx variables into the blacklist keyring Linux 5.10.47 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I68f731ad78a5db003c41093e4faf59f6f9f2e446 |
||
![]() |
0010275ca2 |
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]
There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted. This commit fixes that, but not
in the way I hoped.
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end. Nice. But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage. It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details. Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but
safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not. Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes:
|
||
![]() |
da33f6fa6c |
ANDROID: mm: Add hooks to filemap_fault for oem's optimization
Add vendor_hooks to filemap_fault to cache page for oem's optimization. Save the page for next time retry when VM_FAULT_RETRY returned. Add ANDROID_OEM_DATA to vm_fault to save the page. Bug: 188891314 Change-Id: Ibfc9ec950c3360e6f6ccb9546cab0acd89e5d316 Signed-off-by: Liangliang Li <liliangliang@vivo.com> |
||
![]() |
76002c201f |
Merge 5.10.38 into android12-5.10
Changes in 5.10.38 KEYS: trusted: Fix memory leak on object td tpm: fix error return code in tpm2_get_cc_attrs_tbl() tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt() tpm, tpm_tis: Reserve locality in tpm_tis_resume() KVM: x86/mmu: Remove the defunct update_pte() paging hook KVM/VMX: Invoke NMI non-IST entry instead of IST entry ACPI: PM: Add ACPI ID of Alder Lake Fan PM: runtime: Fix unpaired parent child_count for force_resume cpufreq: intel_pstate: Use HWP if enabled by platform firmware kvm: Cap halt polling at kvm->max_halt_poll_ns ath11k: fix thermal temperature read fs: dlm: fix debugfs dump fs: dlm: add errno handling to check callback fs: dlm: check on minimum msglen size fs: dlm: flush swork on shutdown tipc: convert dest node's address to network order ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath net: stmmac: Set FIFO sizes for ipq806x ASoC: rsnd: core: Check convert rate in rsnd_hw_params Bluetooth: Fix incorrect status handling in LE PHY UPDATE event i2c: bail out early when RDWR parameters are wrong ALSA: hdsp: don't disable if not enabled ALSA: hdspm: don't disable if not enabled ALSA: rme9652: don't disable if not enabled ALSA: bebob: enable to deliver MIDI messages for multiple ports Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default Bluetooth: initialize skb_queue_head at l2cap_chan_create() net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports net: bridge: when suppression is enabled exclude RARP packets Bluetooth: check for zapped sk before connecting selftests/powerpc: Fix L1D flushing tests for Power10 powerpc/32: Statically initialise first emergency context net: hns3: remediate a potential overflow risk of bd_num_list net: hns3: add handling for xmit skb with recursive fraglist ip6_vti: proper dev_{hold|put} in ndo_[un]init methods ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet ice: handle increasing Tx or Rx ring sizes Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip. ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055 i2c: Add I2C_AQ_NO_REP_START adapter quirk MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED coresight: Do not scan for graph if none is present IB/hfi1: Correct oversized ring allocation mac80211: clear the beacon's CRC after channel switch pinctrl: samsung: use 'int' for register masks in Exynos rtw88: 8822c: add LC calibration for RTL8822C mt76: mt7615: support loading EEPROM for MT7613BE mt76: mt76x0: disable GTK offloading mt76: mt7915: fix txpower init for TSSI off chips fuse: invalidate attrs when page writeback completes virtiofs: fix userns cuse: prevent clone iwlwifi: pcie: make cfg vs. trans_cfg more robust powerpc/mm: Add cond_resched() while removing hpte mappings ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init() Revert "iommu/amd: Fix performance counter initialization" iommu/amd: Remove performance counter pre-initialization test drm/amd/display: Force vsync flip when reconfiguring MPCC selftests: Set CC to clang in lib.mk if LLVM is set kconfig: nconf: stop endless search loops ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740 ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume sctp: Fix out-of-bounds warning in sctp_process_asconf_param() flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target() powerpc/smp: Set numa node before updating mask ASoC: rt286: Generalize support for ALC3263 codec ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user() net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule samples/bpf: Fix broken tracex1 due to kprobe argument change powerpc/pseries: Stop calling printk in rtas_stop_self() drm/amd/display: fixed divide by zero kernel crash during dsc enablement drm/amd/display: add handling for hdcp2 rx id list validation drm/amdgpu: Add mem sync flag for IB allocated by SA mt76: mt7615: fix entering driver-own state on mt7663 crypto: ccp: Free SEV device if SEV init fails wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth powerpc/iommu: Annotate nested lock for lockdep iavf: remove duplicate free resources calls net: ethernet: mtk_eth_soc: fix RX VLAN offload selftests: mlxsw: Increase the tolerance of backlog buildup selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test kbuild: generate Module.symvers only when vmlinux exists bnxt_en: Add PCI IDs for Hyper-V VF devices. ia64: module: fix symbolizer crash on fdescr watchdog: rename __touch_watchdog() to a better descriptive name watchdog: explicitly update timestamp when reporting softlockup watchdog/softlockup: remove logic that tried to prevent repeated reports watchdog: fix barriers when printing backtraces from all CPUs ASoC: rt286: Make RT286_SET_GPIO_* readable and writable thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params() f2fs: move ioctl interface definitions to separated file f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE f2fs: fix to allow migrating fully valid segment f2fs: fix panic during f2fs_resize_fs() f2fs: fix a redundant call to f2fs_balance_fs if an error occurs remoteproc: qcom_q6v5_mss: Replace ioremap with memremap remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc() PCI: Release OF node in pci_scan_device()'s error path ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook f2fs: fix to align to section for fallocate() on pinned file f2fs: fix to update last i_size if fallocate partially succeeds PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR PCI: endpoint: Add helper API to get the 'next' unreserved BAR PCI: endpoint: Make *_free_bar() to return error codes on failure PCI: endpoint: Fix NULL pointer dereference for ->get_features() f2fs: fix to avoid touching checkpointed data in get_victim() f2fs: fix to cover __allocate_new_section() with curseg_lock f2fs: Fix a hungtask problem in atomic write f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block() rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data() NFS: nfs4_bitmask_adjust() must not change the server global bitmasks NFS: Fix attribute bitmask in _nfs42_proc_fallocate() NFSv4.2: Always flush out writes in nfs42_proc_fallocate() NFS: Deal correctly with attribute generation counter overflow PCI: endpoint: Fix missing destroy_workqueue() pNFS/flexfiles: fix incorrect size check in decode_nfs_fh() NFSv4.2 fix handling of sr_eof in SEEK's reply SUNRPC: Move fault injection call sites SUNRPC: Remove trace_xprt_transmit_queued SUNRPC: Handle major timeout in xprt_adjust_timeout() thermal/drivers/tsens: Fix missing put_device error NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting nfsd: ensure new clients break delegations rtc: fsl-ftm-alarm: add MODULE_TABLE() dmaengine: idxd: Fix potential null dereference on pointer status dmaengine: idxd: fix dma device lifetime dmaengine: idxd: fix cdev setup and free device lifetime issues SUNRPC: fix ternary sign expansion bug in tracing pwm: atmel: Fix duty cycle calculation in .get_state() xprtrdma: Avoid Receive Queue wrapping xprtrdma: Fix cwnd update ordering xprtrdma: rpcrdma_mr_pop() already does list_del_init() swiotlb: Fix the type of index ceph: fix inode leak on getattr error in __fh_to_dentry scsi: qla2xxx: Prevent PRLI in target mode scsi: ufs: core: Do not put UFS power into LPM if link is broken scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend scsi: ufs: core: Narrow down fast path in system suspend path rtc: ds1307: Fix wday settings for rx8130 net: hns3: fix incorrect configuration for igu_egu_hw_err net: hns3: initialize the message content in hclge_get_link_mode() net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet() net: hns3: fix for vxlan gpe tx checksum bug net: hns3: use netif_tx_disable to stop the transmit queue net: hns3: disable phy loopback setting in hclge_mac_start_phy sctp: do asoc update earlier in sctp_sf_do_dupcook_a RISC-V: Fix error code returned by riscv_hartid_to_cpuid() sunrpc: Fix misplaced barrier in call_decode libbpf: Fix signed overflow in ringbuf_process_ring block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t block/rnbd-clt: Check the return value of the function rtrs_clt_query ethernet:enic: Fix a use after free bug in enic_hard_start_xmit sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b netfilter: xt_SECMARK: add new revision to fix structure layout xsk: Fix for xp_aligned_validate_desc() when len == chunk_size net: stmmac: Clear receive all(RA) bit when promiscuous mode is off drm/radeon: Fix off-by-one power_state index heap overwrite drm/radeon: Avoid power table parsing memory leaks arm64: entry: factor irq triage logic into macros arm64: entry: always set GIC_PRIO_PSR_I_SET during entry khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate() mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts() mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page() ksm: fix potential missing rmap_item for stable_node mm/gup: check every subpage of a compound page during isolation mm/gup: return an error on migration failure mm/gup: check for isolation errors ethtool: fix missing NLM_F_MULTI flag when dumping net: fix nla_strcmp to handle more then one trailing null character smc: disallow TCP_ULP in smc_setsockopt() netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check netfilter: nftables: Fix a memleak from userdata error path in new objects can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path can: mcp251x: fix resume from sleep before interface was brought up can: m_can: m_can_tx_work_queue(): fix tx_skb race condition sched: Fix out-of-bound access in uclamp sched/fair: Fix unfairness caused by missing load decay fs/proc/generic.c: fix incorrect pde_is_permanent check kernel: kexec_file: fix error return code of kexec_calculate_store_digests() kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources netfilter: nftables: avoid overflows in nft_hash_buckets() i40e: fix broken XDP support i40e: Fix use-after-free in i40e_client_subtask() i40e: fix the restart auto-negotiation after FEC modified i40e: Fix PHY type identifiers for 2.5G and 5G adapters mptcp: fix splat when closing unaccepted socket f2fs: avoid unneeded data copy in f2fs_ioc_move_range() ARC: entry: fix off-by-one error in syscall number validation ARC: mm: PAE: use 40-bit physical page mask ARC: mm: Use max_high_pfn as a HIGHMEM zone border powerpc/64s: Fix crashes when toggling stf barrier powerpc/64s: Fix crashes when toggling entry flush barrier hfsplus: prevent corruption in shrinking truncate squashfs: fix divide error in calculate_skip() userfaultfd: release page in error path to avoid BUG_ON kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled mm/hugetlb: fix F_SEAL_FUTURE_WRITE blk-iocost: fix weight updates of inner active iocgs arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache() btrfs: fix race leading to unpersisted data and metadata on fsync drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected drm/amd/display: Initialize attribute for hdcp_srm sysfs file drm/i915: Avoid div-by-zero on gen2 kvm: exit halt polling on need_resched() as well KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer drm/msm/dp: initialize audio_comp when audio starts KVM: x86: Cancel pvclock_gtod_work on module removal KVM: x86: Prevent deadlock against tk_core.seq dax: Add an enum for specifying dax wakup mode dax: Add a wakeup mode parameter to put_unlocked_entry() dax: Wake up all waiters after invalidating dax entry xen/unpopulated-alloc: consolidate pgmap manipulation xen/unpopulated-alloc: fix error return code in fill_list() perf tools: Fix dynamic libbpf link usb: dwc3: gadget: Free gadget structure only after freeing endpoints iio: light: gp2ap002: Fix rumtime PM imbalance on error iio: proximity: pulsedlight: Fix rumtime PM imbalance on error iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER usb: fotg210-hcd: Fix an error message hwmon: (occ) Fix poll rate limiting usb: musb: Fix an error message ACPI: scan: Fix a memory leak in an error handling path kyber: fix out of bounds access when preempted nvmet: add lba to sect conversion helpers nvmet: fix inline bio check for bdev-ns nvmet-rdma: Fix NULL deref when SEND is completed with error f2fs: compress: fix to free compress page correctly f2fs: compress: fix race condition of overwrite vs truncate f2fs: compress: fix to assign cc.cluster_idx correctly nbd: Fix NULL pointer in flush_workqueue blk-mq: plug request for shared sbitmap blk-mq: Swap two calls in blk_mq_exit_queue() usb: dwc3: omap: improve extcon initialization usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield usb: xhci: Increase timeout for HC halt usb: dwc2: Fix gadget DMA unmap direction usb: core: hub: fix race condition about TRSMRCY of resume usb: dwc3: gadget: Enable suspend events usb: dwc3: gadget: Return success always for kick transfer in ep queue usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4 usb: typec: ucsi: Put fwnode in any case during ->probe() xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI xhci: Do not use GFP_KERNEL in (potentially) atomic context xhci: Add reset resume quirk for AMD xhci controller. iio: gyro: mpu3050: Fix reported temperature value iio: tsl2583: Fix division by a zero lux_val cdc-wdm: untangle a circular dependency between callback and softint xen/gntdev: fix gntdev_mmap() error exit path KVM: x86: Emulate RDPID only if RDTSCP is supported KVM: x86: Move RDPID emulation intercept to its own enum KVM: nVMX: Always make an attempt to map eVMCS after migration KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported KVM: VMX: Disable preemption when probing user return MSRs Revert "iommu/vt-d: Remove WO permissions on second-level paging entries" Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL" iommu/vt-d: Preset Access/Dirty bits for IOVA over FL iommu/vt-d: Remove WO permissions on second-level paging entries mm: fix struct page layout on 32-bit systems MIPS: Reinstate platform `__div64_32' handler MIPS: Avoid DIVU in `__div64_32' is result would be zero MIPS: Avoid handcoded DIVU in `__div64_32' altogether clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address ARM: 9012/1: move device tree mapping out of linear region ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section usb: typec: tcpm: Fix error while calculating PPS out values kobject_uevent: remove warning in init_uevent_argv() drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp drm/i915: Read C0DRB3/C1DRB3 as 16 bits again drm/i915/overlay: Fix active retire callback alignment drm/i915: Fix crash in auto_retire clk: exynos7: Mark aclk_fsys1_200 as critical media: rkvdec: Remove of_match_ptr() i2c: mediatek: Fix send master code at more than 1MHz dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1 dt-bindings: serial: 8250: Remove duplicated compatible strings debugfs: Make debugfs_allow RO after init ext4: fix debug format string warning nvme: do not try to reconfigure APST when the controller is not live ASoC: rsnd: check all BUSIF status when error Linux 5.10.38 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65 |
||
![]() |
014868616d |
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
commit 22247efd822e6d263f3c8bd327f3f769aea9b1d9 upstream.
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.
Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program, which
seems that the program is hardly run with hugetlbfs pages (as by default
shmem).
Meanwhile I found another probably even more severe issue on that hugetlb
fork won't wr-protect child cow pages, so child can potentially write to
parent private pages. Patch 2 addresses that.
After this series applied, "memfd_test hugetlbfs" should start to pass.
This patch (of 2):
F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.
$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)
I think it's probably because no one is really running the hugetlbfs test.
Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we
do in shmem_mmap(). Generalize a helper for that.
Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes:
|
||
![]() |
458e81ecf7 |
ANDROID: mm: spf: fix task fault accounting
SPF has missed per-process fault account logic. Fix it. Bug: 187072626 Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ibd854bf350721917ef0be1af65055691d16b52f0 |
||
![]() |
ddc4a48797 |
ANDROID: mm: page_pinner: introduce failure_tracking feature
CMA allocation can fail by temporal page refcount increasement by get_page API as well as get_user_pages friends. However, since get_page is one of the most hot function, it is hard to hook get_page to get callstack everytime due to performance concern. Furthermore, get_page could be nested multiple times so we couldn't track all of the pin sites on limited space of page_pinner. Thus, here approach is keep tracking of put_page callsite rather than get_page once VM found the page migration failed. It's based on assumption: 1. Since it's temporal page refcount, it could be released soon before overflowing dmesg log buffer 2. developer can find the pair of get_page by reviewing put_page. By default, it's eanbled. If you want to disable it: echo 0 > $debugfs/page_pinner/failure_tracking You can capture the tracking using: cat $debugfs/page_pinner/alloc_contig_failed note: the example below is artificial: Page pinned ts 386067292 us count 0 PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked) __page_pinner_migration_failed+0x30/0x104 putback_lru_page+0x90/0xac putback_movable_pages+0xc4/0x204 __alloc_contig_migrate_range+0x290/0x31c alloc_contig_range+0x114/0x2bc cma_alloc+0x2d8/0x698 cma_alloc_write+0x58/0xb8 simple_attr_write+0xd4/0x124 debugfs_attr_write+0x50/0xd8 full_proxy_write+0x70/0xf8 vfs_write+0x168/0x3a8 ksys_write+0x7c/0xec __arm64_sys_write+0x20/0x30 el0_svc_common+0xa4/0x180 do_el0_svc+0x28/0x88 el0_svc+0x14/0x24 Page pinned ts 385867394 us count 0 PFN 10162530 Block 9924 type Isolate Flags 0x800000000008000c(uptodate|dirty|swapbacked) __page_pinner_migration_failed+0x30/0x104 __alloc_contig_migrate_range+0x200/0x31c alloc_contig_range+0x114/0x2bc cma_alloc+0x2d8/0x698 cma_alloc_write+0x58/0xb8 simple_attr_write+0xd4/0x124 debugfs_attr_write+0x50/0xd8 full_proxy_write+0x70/0xf8 vfs_write+0x168/0x3a8 ksys_write+0x7c/0xec __arm64_sys_write+0x20/0x30 el0_svc_common+0xa4/0x180 do_el0_svc+0x28/0x88 el0_svc+0x14/0x24 el0_sync_handler+0x88/0xec el0_sync+0x198/0x1c0 Bug: 183414571 Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ie79902c18390eb9f320d823839bb9d9a7fdcdb31 |
||
![]() |
6e12c5b7d4 |
ANDROID: mm: introduce page_pinner
For CMA allocation, it's really critical to migrate a page but sometimes it fails. One of the reasons is some driver holds a page refcount for a long time so VM couldn't migrate the page at that time. The concern here is there is no way to find the who hold the refcount of the page effectively. This patch introduces feature to keep tracking page's pinner. All get_page sites are vulnerable to pin a page for a long time but the cost to keep track it would be significat since get_page is the most frequent kernel operation. Furthermore, the page could be not user page but kernel page which is not related to the page migration failure. So, this patch keeps tracking only get_user_pages/follow_page with (FOLL_GET|PIN friends because they are the very common APIs to pin user pages which could cause migration failure and the less frequent than get_page so runtime cost wouldn't be that big but could cover many cases effectively. This patch also introduces put_user_page API. It aims for attributing "the pinner releases the page from now on" while it release the page refcount. Thus, any user of get_user_pages/follow_page(FOLL_GET) must use put_user_page as pair of those functions. Otherwise, page_pinner will treat them long term pinner as false postive but nothing should affect stability. * $debugfs/page_pinner/threshold It indicates threshold(microsecond) to flag long term pinning. It's configurable(Default is 300000us). Once you write new value to the threshold, old data will clear. * $debugfs/page_pinner/longterm_pinner It shows call sites where the duration of pinning was greater than the threshold. Internally, it uses a static array to keep 4096 elements and overwrites old ones once overflow happens. Therefore, you could lose some information. example) Page pinned ts 76953865787 us count 1 PFN 9856945 Block 9625 type Movable Flags 0x8000000000080014(uptodate|lru|swapbacked) __set_page_pinner+0x34/0xcc try_grab_page+0x19c/0x1a0 follow_page_pte+0x1c0/0x33c follow_page_mask+0xc0/0xc8 __get_user_pages+0x178/0x414 __gup_longterm_locked+0x80/0x148 internal_get_user_pages_fast+0x140/0x174 pin_user_pages_fast+0x24/0x40 CCC BBB AAA __arm64_sys_ioctl+0x94/0xd0 el0_svc_common+0xa4/0x180 do_el0_svc+0x28/0x88 el0_svc+0x14/0x24 note: page_pinner doesn't guarantee attributing/unattributing are atomic if they happen at the same time. It's just best effort so false-positive could happen. Bug: 183414571 Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Minchan Kim <minchan@google.com> Change-Id: Ife37ec360eef993d390b9c131732218a4dfd2f04 |
||
![]() |
35eacb5c87 |
ANDROID: mm: allow vmas with vm_ops to be speculatively handled
Right now only anonymous page faults are speculatively handled, thus leaving out a large percentage of faults still requiring to take mmap_sem. These were left out since there can be fault handlers mainly in the fs layer which may use vma in unknown ways. This patch enables speculative fault for ext4, f2fs and shmem. The feature is disabled by default and enabled via allow_file_spec_access kernel param. Bug: 171954515 Change-Id: I0d23ebf299000e4ac5e2c71bc0b7fc9006e98da9 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
4d3dd339de |
BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388132/ Conflicts: arch/arm64/Kconfig fs/userfaultfd.c mm/hugetlb.c (All related to SPF feature. Resolved by manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I43b37272d531341439ceaa03213d0e2415e04688 |
||
![]() |
6e63cc1fe2 |
kasan: fix per-page tags for non-page_alloc pages
commit cf10bd4c4aff8dd64d1aa7f2a529d0c672bc16af upstream.
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.
Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff... address for pages that
were not allocated via page_alloc.
This might cause problems. A particular case we encountered is a
conflict with KFENCE. If a KFENCE-allocated slab object is being freed
via kfree(page_address(page) + offset), the address passed to kfree()
will get tagged with 0x00 (as slab pages keep the default per-page
tags). This leads to is_kfence_address() check failing, and a KFENCE
object ending up in normal slab freelist, which causes memory
corruptions.
This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff. This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.
With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.
This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.
Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes:
|
||
![]() |
9538c5a8c5 |
FROMGIT: mm: introduce debug_pagealloc_{map,unmap}_pages() helpers
Patch series "arch, mm: improve robustness of direct map manipulation", v7. During recent discussion about KVM protected memory, David raised a concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC scope [1]. Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is possible that __kernel_map_pages() would fail, but since this function is void, the failure will go unnoticed. Moreover, there's lack of consistency of __kernel_map_pages() semantics across architectures as some guard this function with #ifdef DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation debugging is disabled at run time and some allow modifying the direct map regardless of DEBUG_PAGEALLOC settings. This set straightens this out by restoring dependency of __kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites accordingly. Since currently the only user of __kernel_map_pages() outside DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses there more explicit. [1] https://lore.kernel.org/lkml/2759b4bf-e1e3-d006-7d86-78a40348269d@redhat.com This patch (of 4): When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel direct mapping after free_pages(). The pages than need to be mapped back before they could be used. Theese mapping operations use __kernel_map_pages() guarded with with debug_pagealloc_enabled(). The only place that calls __kernel_map_pages() without checking whether DEBUG_PAGEALLOC is enabled is the hibernation code that presumes availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still, on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not enabled but set_direct_map_invalid_noflush() may render some pages not present in the direct map and hibernation code won't be able to save such pages. To make page allocation debugging and hibernation interaction more robust, the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be made more explicit. Start with combining the guard condition and the call to __kernel_map_pages() into debug_pagealloc_map_pages() and debug_pagealloc_unmap_pages() functions to emphasize that __kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use these new functions to map/unmap pages when page allocation debugging is enabled. Link: https://lkml.kernel.org/r/20201109192128.960-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20201109192128.960-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Len Brown <len.brown@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 77bc7fd607dee2ffb28daff6d0dd8ae42af61ea8 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Bug: 182930667 Signed-off-by: Alexander Potapenko <glider@google.com> Change-Id: I9f0dac574bc3a7ea7d88bff051b77eca19610ce9 |
||
![]() |
a2bbfa414c |
FROMGIT: kernel/power: allow hibernation with page_poison sanity checking
Page poisoning used to be incompatible with hibernation, as the state of poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION forces CONFIG_PAGE_POISONING_NO_SANITY. For the same reason, the poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable hibernation. The latter restriction was removed by commit |
||
![]() |
e871c7feeb |
FROMGIT: mm, page_poison: use static key more efficiently
Commit
|
||
![]() |
0879d44ddd |
BACKPORT: mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters
Patch series "cleanup page poisoning", v3. I have identified a number of issues and opportunities for cleanup with CONFIG_PAGE_POISON and friends: - interaction with init_on_alloc and init_on_free parameters depends on the order of parameters (Patch 1) - the boot time enabling uses static key, but inefficienty (Patch 2) - sanity checking is incompatible with hibernation (Patch 3) - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have init_on_free (Patch 4) - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we have init_on_free (Patch 5) This patch (of 5): Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1 produces a warning in dmesg that page_poison takes precedence. However, as these warnings are printed in early_param handlers for init_on_alloc/free, they are not printed if page_poison is enabled later on the command line (handlers are called in the order of their parameters), or when init_on_alloc/free is always enabled by the respective config option - before the page_poison early param handler is called, it is not considered to be enabled. This is inconsistent. We can remove the dependency on order by making the init_on_* parameters only set a boolean variable, and postponing the evaluation after all early params have been processed. Introduce a new init_mem_debugging_and_hardening() function for that, and move the related debug_pagealloc processing there as well. As a result init_mem_debugging_and_hardening() knows always accurately if init_on_* and/or page_poison options were enabled. Thus we can also optimize want_init_on_alloc() and want_init_on_free(). We don't need to check page_poisoning_enabled() there, we can instead not enable the init_on_* static keys at all, if page poisoning is enabled. This results in a simpler and more effective code. Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mateusz Nosek <mateusznosek0@gmail.com> Cc: Laura Abbott <labbott@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 04013513cc84c401c7de9023ff3eda7863fc4add https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Bug: 182930667 [glider: resolved a minor conflict in init/main.c - API change] Signed-off-by: Alexander Potapenko <glider@google.com> Change-Id: I6c0ffcb0d8e2f56a688986aa1dc201adf89de067 |
||
![]() |
ea1ffe7053 |
FROMGIT: kasan: fix per-page tags for non-page_alloc pages
To allow performing tag checks on page_alloc addresses obtained via
page_address(), tag-based KASAN modes store tags for page_alloc
allocations in page->flags.
Currently, the default tag value stored in page->flags is 0x00.
Therefore, page_address() returns a 0x00ffff... address for pages that
were not allocated via page_alloc.
This might cause problems. A particular case we encountered is a conflict
with KFENCE. If a KFENCE-allocated slab object is being freed via
kfree(page_address(page) + offset), the address passed to kfree() will get
tagged with 0x00 (as slab pages keep the default per-page tags). This
leads to is_kfence_address() check failing, and a KFENCE object ending up
in normal slab freelist, which causes memory corruptions.
This patch changes the way KASAN stores tag in page-flags: they are now
stored xor'ed with 0xff. This way, KASAN doesn't need to initialize
per-page flags for every created page, which might be slow.
With this change, page_address() returns natively-tagged (with 0xff)
pointers for pages that didn't have tags set explicitly.
This patch fixes the encountered conflict with KFENCE and prevents more
similar issues that can occur in the future.
Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
Fixes:
|
||
![]() |
8faaa07702 |
ANDROID: GKI: mm.h: add Android ABI padding to a structure
Try to mitigate potential future driver core api changes by adding a padding to struct vm_operations_struct. Based on a change made to the RHEL/CENTOS 8 kernel. Bug: 151154716 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I78f84148ef4d3524bd6c5b78e53e06503a4ac3ae |
||
![]() |
e1a763aea5 |
Merge 5.10.19 into android-5.10
Changes in 5.10.19 bpf: Fix truncation handling for mod32 dst reg wrt zero HID: make arrays usage and value to be the same RDMA: Lift ibdev_to_node from rds to common code nvme-rdma: Use ibdev_to_node instead of dereferencing ->dma_device USB: quirks: sort quirk entries usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable ceph: downgrade warning from mdsmap decode to debug ntfs: check for valid standard information attribute Bluetooth: btusb: Some Qualcomm Bluetooth adapters stop working arm64: tegra: Add power-domain for Tegra210 HDA hwmon: (dell-smm) Add XPS 15 L502X to fan control blacklist KVM: x86: Zap the oldest MMU pages, not the newest mm: unexport follow_pte_pmd mm: simplify follow_pte{,pmd} KVM: do not assume PTE is writable after follow_pfn mm: provide a saner PTE walking API for modules KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped() drm/xlnx: fix kmemleak by sending vblank_event in atomic_disable NET: usb: qmi_wwan: Adding support for Cinterion MV31 cxgb4: Add new T6 PCI device id 0x6092 cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath. kbuild: fix CONFIG_TRIM_UNUSED_KSYMS build for ppc64 scripts/recordmcount.pl: support big endian for ARCH sh Linux 5.10.19 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie460e26abc91311bcdd6b8484f5b42a7ffe1058f |
||
![]() |
a42150f1c9 |
mm: provide a saner PTE walking API for modules
commit 9fd6dad1261a541b3f5fa7dc5b152222306e6702 upstream. Currently, the follow_pfn function is exported for modules but follow_pte is not. However, follow_pfn is very easy to misuse, because it does not provide protections (so most of its callers assume the page is writable!) and because it returns after having already unlocked the page table lock. Provide instead a simplified version of follow_pte that does not have the pmdpp and range arguments. The older version survives as follow_invalidate_pte() for use by fs/dax.c. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
6d9c9ec0d8 |
mm: simplify follow_pte{,pmd}
commit ff5c19ed4b087073cea38ff0edc80c23d7256943 upstream. Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single follow_pte function and just pass two additional NULL arguments for the two previous follow_pte callers. [sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"] Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
![]() |
ccdd6170d1 |
BACKPORT: FROMGIT: Mark anonymous struct field of 'struct vm_fault' as 'const'
The fields of this struct are only ever read after being initialised, so mark it 'const' before somebody tries to modify it again. GCC will then complain (with an error) about modification of these fields after they have been initialised, although LLVM currently allows them without even a warning: https://bugs.llvm.org/show_bug.cgi?id=48755 Hopefully, future versions of LLVM will emit a warning. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will@kernel.org> Change-Id: I46e2de1252751b863442383eac22090c4b4606c0 Bug: 171278850 (cherry picked from commit 5857c9209ce58f8e262889539ccdf63e73ad7a93 https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/faultaround) [vinmenon: changes for speculative page fault] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
db6cf10907 |
BACKPORT: FROMGIT: mm: Pass 'address' to map to do_set_pte() and drop FAULT_FLAG_PREFAULT
Rather than modifying the 'address' field of the 'struct vm_fault' passed to do_set_pte(), leave that to identify the real faulting address and pass in the virtual address to be mapped by the new pte as a separate argument. This makes FAULT_FLAG_PREFAULT redundant, as a prefault entry can be identified simply by comparing the new address parameter with the faulting address, so remove the redundant flag at the same time. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will@kernel.org> Change-Id: I495c06047bac0f4e2241bc47a18b8ee8f97e4af8 Bug: 171278850 (cherry picked from commit 9d3af4b448a119ac81378d3bc775f1c4a2a7ff36 https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/faultaround) [vinmenon: changes for speculative page fault] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
249439d643 |
BACKPORT: FROMGIT: mm: Move immutable fields of 'struct vm_fault' into anonymous struct
'struct vm_fault' contains both information about the fault being serviced alongside mutable fields contributing to the state of the fault-handling logic. Unfortunately, the distinction between the two is not clear-cut, and a number of callers end up manipulating the structure temporarily before restoring it when returning. Try to clean this up by moving the immutable fault information into an anonymous struct, which will later be marked as 'const'. Ideally, the 'flags' field would be part of the new structure too, but it seems as though the ->page_mkwrite() path is not ready for this yet. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=whYs9XsO88iqJzN6NC=D-dp2m0oYXuOoZ=eWnvv=5OA+w@mail.gmail.com Signed-off-by: Will Deacon <will@kernel.org> Change-Id: If094fbaa416c31b7bf2d5b00f2474bd330a22cc5 Bug: 171278850 (cherry picked from commit 742d33729a0df11c9d8d4625dbf21dd20cdefd44 https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/faultaround) [vinmenon: changes for speculative page fault] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
ef3b732457 |
BACKPORT: FROMGIT: mm: Allow architectures to request 'old' entries when prefaulting
Commit |
||
![]() |
0aa300a252 |
BACKPORT: FROMGIT: mm: Cleanup faultaround and finish_fault() codepaths
alloc_set_pte() has two users with different requirements: in the faultaround code, it called from an atomic context and PTE page table has to be preallocated. finish_fault() can sleep and allocate page table as needed. PTL locking rules are also strange, hard to follow and overkill for finish_fault(). Let's untangle the mess. alloc_set_pte() has gone now. All locking is explicit. The price is some code duplication to handle huge pages in faultaround path, but it should be fine, having overall improvement in readability. Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [will: s/from from/from/ in comment; spotted by willy] Signed-off-by: Will Deacon <will@kernel.org> Change-Id: I2746b62adfe63e4f1b62e806df06b1b7a17574ad Bug: 171278850 (cherry picked from commit f9ce0be71d1fbb038ada15ced83474b0e63f264d https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/faultaround) [vinmenon: changes for speculative page fault] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
9e4d84273c |
ANDROID: Fix sparse warning in __handle_speculative_fault caused by SPF
SPF patchset introduced a sparse warning caused by the mismatch in
__handle_speculative_fault function's return type. Fix the return
type.
Fixes:
|
||
![]() |
c9201630e8 |
ANDROID: mm: use raw seqcount variants in vm_write_*
write_seqcount_begin expects to be called from a non-preemptible context to avoid preemption by a read section that can spin due to an odd value. But the readers of vm_sequence never retries and thus writers need not disable preemption. Use the non-lockdep variant as lockdep checks are now in-built to write_seqcount_begin. Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Change-Id: If4f0cddd7f0a79136495060d4acc1702abb46817 |
||
![]() |
99e15a0799 |
FROMLIST: mm: speculative page fault handler return VMA
When the speculative page fault handler is returning VM_RETRY, there is a chance that VMA fetched without grabbing the mmap_sem can be reused by the legacy page fault handler. By reusing it, we avoid calling find_vma() again. To achieve, that we must ensure that the VMA structure will not be freed in our back. This is done by getting the reference on it (get_vma()) and by assuming that the caller will call the new service can_reuse_spf_vma() once it has grabbed the mmap_sem. can_reuse_spf_vma() is first checking that the VMA is still in the RB tree , and then that the VMA's boundaries matched the passed address and release the reference on the VMA so that it can be freed if needed. In the case the VMA is freed, can_reuse_spf_vma() will have returned false as the VMA is no more in the RB tree. In the architecture page fault handler, the call to the new service reuse_spf_or_find_vma() should be made in place of find_vma(), this will handle the check on the spf_vma and if needed call find_vma(). Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Change-Id: Ia56dcf807e8bddf6788fd696dd80372db35476f0 Link: https://lore.kernel.org/lkml/1523975611-15978-23-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
1c53717440 |
FROMLIST: mm: provide speculative fault infrastructure
Provide infrastructure to do a speculative fault (not holding mmap_sem). The not holding of mmap_sem means we can race against VMA change/removal and page-table destruction. We use the SRCU VMA freeing to keep the VMA around. We use the VMA seqcount to detect change (including umapping / page-table deletion) and we use gup_fast() style page-table walking to deal with page-table races. Once we've obtained the page and are ready to update the PTE, we validate if the state we started the fault with is still valid, if not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the PTE and we're done. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Manage the newly introduced pte_spinlock() for speculative page fault to fail if the VMA is touched in our back] [Rename vma_is_dead() to vma_has_changed() and declare it here] [Fetch p4d and pud] [Set vmd.sequence in __handle_mm_fault()] [Abort speculative path when handle_userfault() has to be called] [Add additional VMA's flags checks in handle_speculative_fault()] [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()] [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed] [Remove warning comment about waiting for !seq&1 since we don't want to wait] [Remove warning about no huge page support, mention it explictly] [Don't call do_fault() in the speculative path as __do_fault() calls vma->vm_ops->fault() which may want to release mmap_sem] [Only vm_fault pointer argument for vma_has_changed()] [Fix check against huge page, calling pmd_trans_huge()] [Use READ_ONCE() when reading VMA's fields in the speculative path] [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for processing done in vm_normal_page()] [Check that vma->anon_vma is already set when starting the speculative path] [Check for memory policy as we can't support MPOL_INTERLEAVE case due to the processing done in mpol_misplaced()] [Don't support VMA growing up or down] [Move check on vm_sequence just before calling handle_pte_fault()] [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT] [Add mem cgroup oom check] [Use READ_ONCE to access p*d entries] [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()] [Don't fetch pte again in handle_pte_fault() when running the speculative path] [Check PMD against concurrent collapsing operation] [Try spin lock the pte during the speculative path to avoid deadlock with other CPU's invalidating the TLB and requiring this CPU to catch the inter processor's interrupt] [Move define of FAULT_FLAG_SPECULATIVE here] [Introduce __handle_speculative_fault() and add a check against mm->mm_users in handle_speculative_fault() defined in mm.h] Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-19-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Change-Id: I6a29e6edd9779bd34a9f7f4f6034e041a8487f30 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
6d8fc36d9e |
FROMLIST: mm: protect mm_rb tree with a rwlock
This change is inspired by the Peter's proposal patch [1] which was protecting the VMA using SRCU. Unfortunately, SRCU is not scaling well in that particular case, and it is introducing major performance degradation due to excessive scheduling operations. To allow access to the mm_rb tree without grabbing the mmap_sem, this patch is protecting it access using a rwlock. As the mm_rb tree is a O(log n) search it is safe to protect it using such a lock. The VMA cache is not protected by the new rwlock and it should not be used without holding the mmap_sem. To allow the picked VMA structure to be used once the rwlock is released, a use count is added to the VMA structure. When the VMA is allocated it is set to 1. Each time the VMA is picked with the rwlock held its use count is incremented. Each time the VMA is released it is decremented. When the use count hits zero, this means that the VMA is no more used and should be freed. This patch is preparing for 2 kind of VMA access : - as usual, under the control of the mmap_sem, - without holding the mmap_sem for the speculative page fault handler. Access done under the control the mmap_sem doesn't require to grab the rwlock to protect read access to the mm_rb tree, but access in write must be done under the protection of the rwlock too. This affects inserting and removing of elements in the RB tree. The patch is introducing 2 new functions: - vma_get() to find a VMA based on an address by holding the new rwlock. - vma_put() to release the VMA when its no more used. These services are designed to be used when access are made to the RB tree without holding the mmap_sem. When a VMA is removed from the RB tree, its vma->vm_rb field is cleared and we rely on the WMB done when releasing the rwlock to serialize the write with the RMB done in a later patch to check for the VMA's validity. When free_vma is called, the file associated with the VMA is closed immediately, but the policy and the file structure remained in used until the VMA's use count reach 0, which may happens later when exiting an in progress speculative page fault. [1] https://patchwork.kernel.org/patch/5108281/ Change-Id: I9ecc922b8efa4b28975cc6a8e9531284c24ac14e Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-18-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
10a5eb6be8 |
FROMLIST: mm: introduce __vm_normal_page()
When dealing with the speculative fault path we should use the VMA's field cached value stored in the vm_fault structure. Currently vm_normal_page() is using the pointer to the VMA to fetch the vm_flags value. This patch provides a new __vm_normal_page() which is receiving the vm_flags flags value as parameter. Note: The speculative path is turned on for architecture providing support for special PTE flag. So only the first block of vm_normal_page is used during the speculative path. Change-Id: I0f2c4ab1212fbca449bdf6e7993dafa0d41044bc Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-16-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
32507b6ff2 |
FROMLIST: mm: cache some VMA fields in the vm_fault structure
When handling speculative page fault, the vma->vm_flags and vma->vm_page_prot fields are read once the page table lock is released. So there is no more guarantee that these fields would not change in our back. They will be saved in the vm_fault structure before the VMA is checked for changes. This patch also set the fields in hugetlb_no_page() and __collapse_huge_page_swapin even if it is not need for the callee. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Change-Id: I9821f02ea32ef220b57b8bfd817992bbf71bbb1d Link: https://lore.kernel.org/lkml/1523975611-15978-13-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
0525756e6b |
FROMLIST: mm: protect mremap() against SPF hanlder
If a thread is remapping an area while another one is faulting on the destination area, the SPF handler may fetch the vma from the RB tree before the pte has been moved by the other thread. This means that the moved ptes will overwrite those create by the page fault handler leading to page leaked. CPU 1 CPU2 enter mremap() unmap the dest area copy_vma() Enter speculative page fault handler >> at this time the dest area is present in the RB tree fetch the vma matching dest area create a pte as the VMA matched Exit the SPF handler <data written in the new page> move_ptes() > it is assumed that the dest area is empty, > the move ptes overwrite the page mapped by the CPU2. To prevent that, when the VMA matching the dest area is extended or created by copy_vma(), it should be marked as non available to the SPF handler. The usual way to so is to rely on vm_write_begin()/end(). This is already in __vma_adjust() called by copy_vma() (through vma_merge()). But __vma_adjust() is calling vm_write_end() before returning which create a window for another thread. This patch adds a new parameter to vma_merge() which is passed down to vma_adjust(). The assumption is that copy_vma() is returning a vma which should be released by calling vm_raw_write_end() by the callee once the ptes have been moved. Change-Id: Icd338ad6e9b3c97b7334d3b8d30a8badfa2a4efa Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-11-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> |
||
![]() |
2ce6b11ac3 |
FROMLIST: mm: VMA sequence count
Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence counts such that we can easily test if a VMA is changed. The unmap_page_range() one allows us to make assumptions about page-tables; when we find the seqcount hasn't changed we can assume page-tables are still valid. The flip side is that we cannot distinguish between a vma_adjust() and the unmap_page_range() -- where with the former we could have re-checked the vma bounds against the address. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Port to 4.12 kernel] [Build depends on CONFIG_SPECULATIVE_PAGE_FAULT] [Introduce vm_write_* inline function depending on CONFIG_SPECULATIVE_PAGE_FAULT] [Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence by using vm_raw_write* functions] [Fix a lock dependency warning in mmap_region() when entering the error path] [move sequence initialisation INIT_VMA()] Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-9-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Change-Id: Ibc23ef3b9dbb80323c0f24cb06da34b4c3a8fa71 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
0076600734 |
FROMLIST: mm: introduce INIT_VMA()
Some VMA struct fields need to be initialized once the VMA structure is allocated. Currently this only concerns anon_vma_chain field but some other will be added to support the speculative page fault. Instead of spreading the initialization calls all over the code, let's introduce a dedicated inline function. Change-Id: I9f6b29dc74055354318b548e2b6b22c37d4c61bb Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Link: https://lore.kernel.org/lkml/1523975611-15978-8-git-send-email-ldufour@linux.vnet.ibm.com/ Bug: 161210518 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
![]() |
4e35a81bd8 |
UPSTREAM: kasan, mm: check kasan_enabled in annotations
[ Upstream commit 34303244f2615add92076a4bf2d4f39323bde4f2 ] Declare the kasan_enabled static key in include/linux/kasan.h and in include/linux/mm.h and check it in all kasan annotations. This allows to avoid any slowdown caused by function calls when kasan_enabled is disabled. Link: https://lkml.kernel.org/r/9f90e3c0aa840dbb4833367c2335193299f69023.1606162397.git.andreyknvl@google.com Co-developed-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Signed-off-by: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 172318110 Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Change-Id: I2fd783303872cb87b447dbcc97d5654b293fa080 |
||
![]() |
11167161e5 |
UPSTREAM: kasan, arm64: implement HW_TAGS runtime
[ Upstream commit 2e903b91479782b7dedd869603423d77e079d3de ] Provide implementation of KASAN functions required for the hardware tag-based mode. Those include core functions for memory and pointer tagging (tags_hw.c) and bug reporting (report_tags_hw.c). Also adapt common KASAN code to support the new mode. Link: https://lkml.kernel.org/r/cfd0fbede579a6b66755c98c88c108e54f9c56bf.1606161801.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 172318110 Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Change-Id: I01f73c7dad50345aa95272fa93eb26cbb1d6bf83 |
||
![]() |
6010ce3442 |
ANDROID: mm: add generic __va_function and __pa_function
We use non-canonical CFI jump tables with CONFIG_CFI_CLANG, which means the compiler replaces function address references with the address of the function's CFI jump table entry. This results in __pa_symbol(function), for example, returning the physical address of the jump table entry, which can lead to address space confusion since the jump table itself points to a virtual address. This change adds generic definitions for __pa/va_function, which architectures that support CFI can override. Bug: 145210207 Change-Id: I5b616901d5582478df613a4d28bf2b9c911edb46 Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
![]() |
9cf2ceaffd |
Merge 5.10.5 into android12-5.10
Changes in 5.10.5 net/sched: sch_taprio: reset child qdiscs before freeing them mptcp: fix security context on server socket ethtool: fix error paths in ethnl_set_channels() ethtool: fix string set id check md/raid10: initialize r10_bio->read_slot before use. drm/amd/display: Add get_dig_frontend implementation for DCEx io_uring: close a small race gap for files cancel jffs2: Allow setting rp_size to zero during remounting jffs2: Fix NULL pointer dereference in rp_size fs option parsing spi: dw-bt1: Fix undefined devm_mux_control_get symbol opp: fix memory leak in _allocate_opp_table opp: Call the missing clk_put() on error scsi: block: Fix a race in the runtime power management code mm/hugetlb: fix deadlock in hugetlb_cow error path mm: memmap defer init doesn't work as expected lib/zlib: fix inflating zlib streams on s390 io_uring: don't assume mm is constant across submits io_uring: use bottom half safe lock for fixed file data io_uring: add a helper for setting a ref node io_uring: fix io_sqe_files_unregister() hangs uapi: move constants from <linux/kernel.h> to <linux/const.h> tools headers UAPI: Sync linux/const.h with the kernel headers cgroup: Fix memory leak when parsing multiple source parameters zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c scsi: cxgb4i: Fix TLS dependency Bluetooth: hci_h5: close serdev device and free hu in h5_close fbcon: Disable accelerated scrolling reiserfs: add check for an invalid ih_entry_count misc: vmw_vmci: fix kernel info-leak by initializing dbells in vmci_ctx_get_chkpt_doorbells() media: gp8psk: initialize stats at power control logic f2fs: fix shift-out-of-bounds in sanity_check_raw_super() ALSA: seq: Use bool for snd_seq_queue internal flags ALSA: rawmidi: Access runtime->avail always in spinlock bfs: don't use WARNING: string when it's just info. ext4: check for invalid block size early when mounting a file system fcntl: Fix potential deadlock in send_sig{io, urg}() io_uring: check kthread stopped flag when sq thread is unparked rtc: sun6i: Fix memleak in sun6i_rtc_clk_init module: set MODULE_STATE_GOING state when a module fails to load quota: Don't overflow quota file offsets rtc: pl031: fix resource leak in pl031_probe powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() i3c master: fix missing destroy_workqueue() on error in i3c_master_register NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode f2fs: avoid race condition for shrinker count f2fs: fix race of pending_pages in decompression module: delay kobject uevent until after module init call powerpc/64: irq replay remove decrementer overflow check fs/namespace.c: WARN if mnt_count has become negative watchdog: rti-wdt: fix reference leak in rti_wdt_probe um: random: Register random as hwrng-core device um: ubd: Submit all data segments atomically NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow ceph: fix inode refcount leak when ceph_fill_inode on non-I_NEW inode fails drm/amd/display: updated wm table for Renoir tick/sched: Remove bogus boot "safety" check s390: always clear kernel stack backchain before calling functions io_uring: remove racy overflow list fast checks ALSA: pcm: Clear the full allocated memory at hw_params dm verity: skip verity work if I/O error when system is shutting down ext4: avoid s_mb_prefetch to be zero in individual scenarios device-dax: Fix range release Linux 5.10.5 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2b481bfac06bafdef2cf3cc1ac2c2a4ddf9913dc |
||
![]() |
98b57685c2 |
mm: memmap defer init doesn't work as expected
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream. VMware observed a performance regression during memmap init on their platform, and bisected to commit |
||
![]() |
07933490ae |
Merge 4ef8451b33 ("Merge tag 'perf-tools-for-v5.10-2020-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") into android-mainline
Steps on the way to 5.10-rc3 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ia09418a96a25f6c602af953db5d3258e032c0f30 |
||
![]() |
f8f6ae5d07 |
mm: always have io_remap_pfn_range() set pgprot_decrypted()
The purpose of io_remap_pfn_range() is to map IO memory, such as a
memory mapped IO exposed through a PCI BAR. IO devices do not
understand encryption, so this memory must always be decrypted.
Automatically call pgprot_decrypted() as part of the generic
implementation.
This fixes a bug where enabling AMD SME causes subsystems, such as RDMA,
using io_remap_pfn_range() to expose BAR pages to user space to fail.
The CPU will encrypt access to those BAR pages instead of passing
unencrypted IO directly to the device.
Places not mapping IO should use remap_pfn_range().
Fixes:
|
||
![]() |
b3dd1b5952 |
Merge 922a763ae1 ("Merge tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs") into android-mainline
Steps on the way to 5.10-rc1 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I520719ae5e0d992c3756e393cb299d77d650622e |
||
![]() |
05d2a661fd |
Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1 Resolves conflicts in: fs/userfaultfd.c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af |
||
![]() |
75c90a8c3a |
Merge d5660df4a5 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
steps on the way to 5.10-rc1 Change-Id: Iddc84c25b6a9d71fa8542b927d6f69c364131c3d Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
![]() |
1c84293163 |
Merge 6734e20e39 ("Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux") into android-mainline
Tiny steps on the way to 5.10-rc1. Change-Id: I8ff6cb398ac1c0623bf2cefd29860616d05be107 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |