Commit Graph

1931 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
141fbd343b Revert "oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup"
This reverts commit ed5d4efb4d which is
commit e4a38402c36e42df28eb1a5394be87e6571fb48a upstream.

It breaks the kernel ABI and should not be an issue for Android at this
point in time.  If it is, it can come back in a different, abi-stable
form.

Bug: 161946584
Fixes: ed5d4efb4d ("oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I729614c373a7b28546376913e719bbaf6dd306a9
2022-05-12 12:03:58 +02:00
Greg Kroah-Hartman
ca9b002a16 Merge 5.10.113 into android12-5.10-lts
Changes in 5.10.113
	etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
	mm: page_alloc: fix building error on -Werror=array-compare
	tracing: Dump stacktrace trigger to the corresponding instance
	perf tools: Fix segfault accessing sample_id xyarray
	gfs2: assign rgrp glock before compute_bitstructs
	net/sched: cls_u32: fix netns refcount changes in u32_change()
	ALSA: usb-audio: Clear MIDI port active flag after draining
	ALSA: hda/realtek: Add quirk for Clevo NP70PNP
	dm: fix mempool NULL pointer race when completing IO
	ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek
	ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component
	ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
	dmaengine: imx-sdma: Fix error checking in sdma_event_remap
	dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources
	spi: spi-mtk-nor: initialize spi controller after resume
	esp: limit skb_page_frag_refill use to a single page
	igc: Fix infinite loop in release_swfw_sync
	igc: Fix BUG: scheduling while atomic
	rxrpc: Restore removed timer deletion
	net/smc: Fix sock leak when release after smc_shutdown()
	net/packet: fix packet_sock xmit return value checking
	ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit()
	ip6_gre: Fix skb_under_panic in __gre6_xmit()
	net/sched: cls_u32: fix possible leak in u32_init_knode()
	l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
	ipv6: make ip6_rt_gc_expire an atomic_t
	netlink: reset network and mac headers in netlink_dump()
	net: stmmac: Use readl_poll_timeout_atomic() in atomic state
	dmaengine: idxd: add RO check for wq max_batch_size write
	dmaengine: idxd: add RO check for wq max_transfer_size write
	selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
	arm64/mm: Remove [PUD|PMD]_TABLE_BIT from [pud|pmd]_bad()
	arm64: mm: fix p?d_leaf()
	ARM: vexpress/spc: Avoid negative array index when !SMP
	reset: tegra-bpmp: Restore Handle errors in BPMP response
	platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative
	ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant
	arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes
	vxlan: fix error return code in vxlan_fdb_append
	cifs: Check the IOCB_DIRECT flag, not O_DIRECT
	net: atlantic: Avoid out-of-bounds indexing
	mt76: Fix undefined behavior due to shift overflowing the constant
	brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
	dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info()
	drm/msm/mdp5: check the return of kzalloc()
	net: macb: Restart tx only if queue pointer is lagging
	scsi: qedi: Fix failed disconnect handling
	stat: fix inconsistency between struct stat and struct compat_stat
	nvme: add a quirk to disable namespace identifiers
	nvme-pci: disable namespace identifiers for Qemu controllers
	EDAC/synopsys: Read the error count from the correct register
	mm, hugetlb: allow for "high" userspace addresses
	oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
	mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
	ata: pata_marvell: Check the 'bmdma_addr' beforing reading
	dma: at_xdmac: fix a missing check on list iterator
	net: atlantic: invert deep par in pm functions, preventing null derefs
	xtensa: patch_text: Fixup last cpu should be master
	xtensa: fix a7 clobbering in coprocessor context load/store
	openvswitch: fix OOB access in reserve_sfa_size()
	gpio: Request interrupts after IRQ is initialized
	ASoC: soc-dapm: fix two incorrect uses of list iterator
	e1000e: Fix possible overflow in LTR decoding
	ARC: entry: fix syscall_trace_exit argument
	arm_pmu: Validate single/group leader events
	sched/pelt: Fix attach_entity_load_avg() corner case
	perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
	drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
	drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
	KVM: PPC: Fix TCE handling for VFIO
	drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
	powerpc/perf: Fix power9 event alternatives
	perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
	ext4: fix fallocate to use file_modified to update permissions consistently
	ext4: fix symlink file size not match to file content
	ext4: fix use-after-free in ext4_search_dir
	ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
	ext4, doc: fix incorrect h_reserved size
	ext4: fix overhead calculation to account for the reserved gdt blocks
	ext4: force overhead calculation if the s_overhead_cluster makes no sense
	can: isotp: stop timeout monitoring when no first frame was sent
	jbd2: fix a potential race while discarding reserved buffers after an abort
	spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller
	staging: ion: Prevent incorrect reference counting behavour
	block/compat_ioctl: fix range check in BLKGETSIZE
	Revert "net: micrel: fix KS8851_MLL Kconfig"
	Linux 5.10.113

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4ed10699cbb32b89caf79b8b4a2a35b3d8824115
2022-05-12 11:23:35 +02:00
Nico Pache
ed5d4efb4d oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
commit e4a38402c36e42df28eb1a5394be87e6571fb48a upstream.

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
can be targeted by the oom reaper.  This mapping is used to store the
futex robust list head; the kernel does not keep a copy of the robust
list and instead references a userspace address to maintain the
robustness during a process death.

A race can occur between exit_mm and the oom reaper that allows the oom
reaper to free the memory of the futex robust list before the exit path
has handled the futex death:

    CPU1                               CPU2
    --------------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the
waiters on the robust_list, leaving userspace mutexes hung indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform
the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Based on a patch by Michal Hocko.

Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1]
Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com
Fixes: 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
Co-developed-by: Joel Savitz <jsavitz@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27 13:53:54 +02:00
Greg Kroah-Hartman
79553fad5c Merge 5.10.102 into android12-5.10-lts
Changes in 5.10.102
	drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
	mm: memcg: synchronize objcg lists with a dedicated spinlock
	rcu: Do not report strict GPs for outgoing CPUs
	fget: clarify and improve __fget_files() implementation
	fs/proc: task_mmu.c: don't read mapcount for migration entry
	can: isotp: prevent race between isotp_bind() and isotp_setsockopt()
	can: isotp: add SF_BROADCAST support for functional addressing
	scsi: lpfc: Fix mailbox command failure during driver initialization
	HID:Add support for UGTABLET WP5540
	Revert "svm: Add warning message for AVIC IPI invalid target"
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Drop __init from map_pages declaration
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	mmc: block: fix read single on recovery logic
	mm: don't try to NUMA-migrate COW pages that have other uses
	PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
	parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
	btrfs: send: in case of IO error log it
	platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
	platform/x86: ISST: Fix possible circular locking dependency detected
	selftests: rtc: Increase test timeout so that all tests run
	kselftest: signal all child processes
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	selftests: openat2: Print also errno in failure messages
	selftests: openat2: Add missing dependency in Makefile
	selftests: openat2: Skip testcases that fail with EOPNOTSUPP
	selftests: skip mincore.check_file_mmap when fs lacks needed support
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	scsi: pm8001: Fix use-after-free for aborted TMF sas_task
	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
	nvme: fix a possible use-after-free in controller reset during load
	nvme-tcp: fix possible use-after-free in transport error_recovery work
	nvme-rdma: fix possible use-after-free in transport error_recovery work
	drm/amdgpu: fix logic inversion in check
	x86/Xen: streamline (and fix) PV CPU enumeration
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	gcc-plugins/stackleak: Use noinstr in favor of notrace
	random: wake up /dev/random writers after zap
	kbuild: lto: merge module sections
	kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	drm/i915/opregion: check port number bounds for SWSCI display power state
	vsock: remove vsock from connected table when connect is interrupted by a signal
	drm/i915/gvt: Make DRM_I915_GVT depend on X86
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	selftests: netfilter: fix exit value for nft_concat_range
	netfilter: nft_synproxy: unregister hooks on init error path
	ipv6: per-netns exclusive flowlabel checks
	net: dsa: lan9303: fix reset on probe
	net: dsa: lantiq_gswip: fix use after free in gswip_remove()
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	bonding: force carrier update when releasing slave
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	net_sched: add __rcu annotation to netdev->qdisc
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	dpaa2-eth: Initialize mutex used in one step timestamping path
	perf bpf: Defer freeing string after possible strlen() on it
	selftests/exec: Add non-regular to TEST_GEN_PROGS
	ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
	ALSA: hda/realtek: Fix deadlock by COEF mutex
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	powerpc/lib/sstep: fix 'ptesync' build error
	mtd: rawnand: gpmi: don't leak PM reference in error path
	KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests
	ASoC: tas2770: Insert post reset delay
	block/wbt: fix negative inflight counter when remove scsi device
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	NFS: Do not report writeback errors in nfs_getattr()
	tty: n_tty: do not look ahead for EOL character past the end of the buffer
	mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
	KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
	KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	NFS: Don't set NFS_INO_INVALID_XATTR if there is no xattr cache
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
	phy: usb: Leave some clocks running during suspend
	irqchip/sifive-plic: Add missing thead,c900-plic match string
	netfilter: conntrack: don't refresh sctp entries in closed state
	arm64: dts: meson-gx: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
	pidfd: fix test failure due to stack overflow on some arches
	selftests: fixup build warnings in pidfd / clone3 tests
	kconfig: let 'shell' return enough output for deep path names
	lib/iov_iter: initialize "flags" in new pipe_buffer
	ata: libata-core: Disable TRIM on M88V29
	soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
	drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	kconfig: fix failing to generate auto.conf
	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
	dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
	i2c: qcom-cci: don't delete an unregistered adapter
	i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
	copy_process(): Move fd_install() out of sighand->siglock critical section
	i2c: brcmstb: fix support for DSL and CM variants
	lockdep: Correct lock_classes index mapping
	Linux 5.10.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ief12c0d77b23f9796b81a8fc3b79ac6589e81dc9
2022-02-23 12:56:37 +01:00
Igor Pylypiv
de55891e16 Revert "module, async: async_synchronize_full() on module init iff async is used"
[ Upstream commit 67d6212afda218d564890d1674bab28e8612170f ]

This reverts commit 774a1221e8.

We need to finish all async code before the module init sequence is
done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule().  Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked.  This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread.  As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();

Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.

Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-23 12:01:00 +01:00
Greg Kroah-Hartman
a336e746e3 Merge 5.10.74 into android12-5.10-lts
Changes in 5.10.74
	ext4: check and update i_disksize properly
	ext4: correct the error path of ext4_write_inline_data_end()
	ASoC: Intel: sof_sdw: tag SoundWire BEs as non-atomic
	HID: apple: Fix logical maximum and usage maximum of Magic Keyboard JIS
	netfilter: ip6_tables: zero-initialize fragment offset
	HID: wacom: Add new Intuos BT (CTL-4100WL/CTL-6100WL) device IDs
	ASoC: SOF: loader: release_firmware() on load failure to avoid batching
	netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic
	netfilter: nf_nat_masquerade: defer conntrack walk to work queue
	mac80211: Drop frames from invalid MAC address in ad-hoc mode
	m68k: Handle arrivals of multiple signals correctly
	hwmon: (ltc2947) Properly handle errors when looking for the external clock
	net: prevent user from passing illegal stab size
	mac80211: check return value of rhashtable_init
	vboxfs: fix broken legacy mount signature checking
	net: sun: SUNVNET_COMMON should depend on INET
	drm/amdgpu: fix gart.bo pin_count leak
	scsi: ses: Fix unsigned comparison with less than zero
	scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
	perf/core: fix userpage->time_enabled of inactive events
	sched: Always inline is_percpu_thread()
	hwmon: (pmbus/ibm-cffps) max_power_out swap changes
	Linux 5.10.74

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic5235aa8641c64ee2849355a7f10613ae204c74e
2021-10-17 11:42:28 +02:00
Peter Zijlstra
bb893f0754 sched: Always inline is_percpu_thread()
[ Upstream commit 83d40a61046f73103b4e5d8f1310261487ff63b0 ]

  vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-17 10:43:33 +02:00
Greg Kroah-Hartman
beafee90ec Merge 5.10.68 into android12-5.10-lts
Changes in 5.10.68
	drm/bridge: lt9611: Fix handling of 4k panels
	btrfs: fix upper limit for max_inline for page size 64K
	io_uring: ensure symmetry in handling iter types in loop_rw_iter()
	xen: reset legacy rtc flag for PV domU
	bnx2x: Fix enabling network interfaces without VFs
	arm64/sve: Use correct size when reinitialising SVE state
	PM: base: power: don't try to use non-existing RTC for storing data
	PCI: Add AMD GPU multi-function power dependencies
	drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10
	drm/etnaviv: return context from etnaviv_iommu_context_get
	drm/etnaviv: put submit prev MMU context when it exists
	drm/etnaviv: stop abusing mmu_context as FE running marker
	drm/etnaviv: keep MMU context across runtime suspend/resume
	drm/etnaviv: exec and MMU state is lost when resetting the GPU
	drm/etnaviv: fix MMU context leak on GPU reset
	drm/etnaviv: reference MMU context when setting up hardware state
	drm/etnaviv: add missing MMU context put when reaping MMU mapping
	s390/sclp: fix Secure-IPL facility detection
	x86/pat: Pass valid address to sanitize_phys()
	x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
	tipc: fix an use-after-free issue in tipc_recvmsg
	ethtool: Fix rxnfc copy to user buffer overflow
	net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert
	net-caif: avoid user-triggerable WARN_ON(1)
	ptp: dp83640: don't define PAGE0
	dccp: don't duplicate ccid when cloning dccp sock
	net/l2tp: Fix reference count leak in l2tp_udp_recv_core
	r6040: Restore MDIO clock frequency after MAC reset
	tipc: increase timeout in tipc_sk_enqueue()
	drm/rockchip: cdn-dp-core: Make cdn_dp_core_resume __maybe_unused
	perf machine: Initialize srcline string member in add_location struct
	net/mlx5: FWTrace, cancel work on alloc pd error flow
	net/mlx5: Fix potential sleeping in atomic context
	nvme-tcp: fix io_work priority inversion
	events: Reuse value read using READ_ONCE instead of re-reading it
	net: ipa: initialize all filter table slots
	gen_compile_commands: fix missing 'sys' package
	vhost_net: fix OoB on sendmsg() failure.
	net/af_unix: fix a data-race in unix_dgram_poll
	net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
	x86/uaccess: Fix 32-bit __get_user_asm_u64() when CC_HAS_ASM_GOTO_OUTPUT=y
	tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
	selftest: net: fix typo in altname test
	qed: Handle management FW error
	udp_tunnel: Fix udp_tunnel_nic work-queue type
	dt-bindings: arm: Fix Toradex compatible typo
	ibmvnic: check failover_pending in login response
	KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
	bnxt_en: make bnxt_free_skbs() safe to call after bnxt_free_mem()
	net: hns3: pad the short tunnel frame before sending to hardware
	net: hns3: change affinity_mask to numa node range
	net: hns3: disable mac in flr process
	net: hns3: fix the timing issue of VF clearing interrupt sources
	mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
	dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
	mfd: db8500-prcmu: Adjust map to reality
	PCI: Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms
	fuse: fix use after free in fuse_read_interrupt()
	PCI: tegra194: Fix handling BME_CHGED event
	PCI: tegra194: Fix MSI-X programming
	PCI: tegra: Fix OF node reference leak
	mfd: Don't use irq_create_mapping() to resolve a mapping
	PCI: rcar: Fix runtime PM imbalance in rcar_pcie_ep_probe()
	tracing/probes: Reject events which have the same name of existing one
	PCI: cadence: Use bitfield for *quirk_retrain_flag* instead of bool
	PCI: cadence: Add quirk flag to set minimum delay in LTSSM Detect.Quiet state
	PCI: j721e: Add PCIe support for J7200
	PCI: j721e: Add PCIe support for AM64
	PCI: Add ACS quirks for Cavium multi-function devices
	watchdog: Start watchdog in watchdog_set_last_hw_keepalive only if appropriate
	octeontx2-af: Add additional register check to rvu_poll_reg()
	Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
	net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
	block, bfq: honor already-setup queue merges
	PCI: ibmphp: Fix double unmap of io_mem
	ethtool: Fix an error code in cxgb2.c
	NTB: Fix an error code in ntb_msit_probe()
	NTB: perf: Fix an error code in perf_setup_inbuf()
	s390/bpf: Fix optimizing out zero-extensions
	s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
	s390/bpf: Fix branch shortening during codegen pass
	mfd: axp20x: Update AXP288 volatile ranges
	backlight: ktd253: Stabilize backlight
	PCI: of: Don't fail devm_pci_alloc_host_bridge() on missing 'ranges'
	PCI: iproc: Fix BCMA probe resource handling
	netfilter: Fix fall-through warnings for Clang
	netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
	KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
	PCI: Fix pci_dev_str_match_path() alloc while atomic bug
	mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
	tracing/boot: Fix a hist trigger dependency for boot time tracing
	mtd: mtdconcat: Judge callback existence based on the master
	mtd: mtdconcat: Check _read, _write callbacks existence before assignment
	KVM: arm64: Fix read-side race on updates to vcpu reset state
	KVM: arm64: Handle PSCI resets before userspace touches vCPU state
	PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
	mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
	ARC: export clear_user_page() for modules
	perf unwind: Do not overwrite FEATURE_CHECK_LDFLAGS-libunwind-{x86,aarch64}
	perf bench inject-buildid: Handle writen() errors
	gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
	gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
	net: dsa: tag_rtl4_a: Fix egress tags
	selftests: mptcp: clean tmp files in simult_flows
	net: hso: add failure handler for add_net_device
	net: dsa: b53: Fix calculating number of switch ports
	net: dsa: b53: Set correct number of ports in the DSA struct
	netfilter: socket: icmp6: fix use-after-scope
	fq_codel: reject silly quantum parameters
	qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
	ip_gre: validate csum_start only on pull
	net: dsa: b53: Fix IMP port setup on BCM5301x
	bnxt_en: fix stored FW_PSID version masks
	bnxt_en: Fix asic.rev in devlink dev info command
	bnxt_en: log firmware debug notifications
	bnxt_en: Consolidate firmware reset event logging.
	bnxt_en: Convert to use netif_level() helpers.
	bnxt_en: Improve logging of error recovery settings information.
	bnxt_en: Fix possible unintended driver initiated error recovery
	mfd: lpc_sch: Partially revert "Add support for Intel Quark X1000"
	mfd: lpc_sch: Rename GPIOBASE to prevent build error
	net: renesas: sh_eth: Fix freeing wrong tx descriptor
	x86/mce: Avoid infinite loop for copy from user recovery
	bnxt_en: Fix error recovery regression
	net: dsa: bcm_sf2: Fix array overrun in bcm_sf2_num_active_ports()
	Linux 5.10.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I542f48f8de516dcabce91d3d399583483aba0da7
2021-09-30 18:35:35 +02:00
Tony Luck
619d747c18 x86/mce: Avoid infinite loop for copy from user recovery
commit 81065b35e2486c024c7aa86caed452e1f01a59d4 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.

 [ bp: Massage commit message, drop noinstr, fix typo, extend panic
   messages. ]

Fixes: 5567d11c21 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:28:07 +02:00
Greg Kroah-Hartman
82658bfd88 Merge 5.10.44 into android12-5.10-lts
Changes in 5.10.44
	proc: Track /proc/$pid/attr/ opener mm_struct
	ASoC: max98088: fix ni clock divider calculation
	ASoC: amd: fix for pcm_read() error
	spi: Fix spi device unregister flow
	spi: spi-zynq-qspi: Fix stack violation bug
	bpf: Forbid trampoline attach for functions with variable arguments
	net/nfc/rawsock.c: fix a permission check bug
	usb: cdns3: Fix runtime PM imbalance on error
	ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
	ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
	vfio-ccw: Reset FSM state to IDLE inside FSM
	vfio-ccw: Serialize FSM IDLE state with I/O completion
	ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
	spi: sprd: Add missing MODULE_DEVICE_TABLE
	usb: chipidea: udc: assign interrupt number to USB gadget structure
	isdn: mISDN: netjet: Fix crash in nj_probe:
	bonding: init notify_work earlier to avoid uninitialized use
	netlink: disable IRQs for netlink_lock_table()
	net: mdiobus: get rid of a BUG_ON()
	cgroup: disable controllers at parse time
	wq: handle VM suspension in stall detection
	net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
	RDS tcp loopback connection can hang
	net:sfc: fix non-freed irq in legacy irq mode
	scsi: bnx2fc: Return failure if io_req is already in ABTS processing
	scsi: vmw_pvscsi: Set correct residual data length
	scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq
	scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
	net: macb: ensure the device is available before accessing GEMGXL control registers
	net: appletalk: cops: Fix data race in cops_probe1
	net: dsa: microchip: enable phy errata workaround on 9567
	nvme-fabrics: decode host pathing error for connect
	MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
	dm verity: fix require_signatures module_param permissions
	bnx2x: Fix missing error code in bnx2x_iov_init_one()
	nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
	nvmet: fix false keep-alive timeout when a controller is torn down
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
	powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
	spi: Don't have controller clean up spi device before driver unbind
	spi: Cleanup on failure of initial setup
	i2c: mpc: Make use of i2c_recover_bus()
	i2c: mpc: implement erratum A-004447 workaround
	ALSA: seq: Fix race of snd_seq_timer_open()
	ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
	ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly G2
	ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360 1040 G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
	ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8
	spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
	Revert "ACPI: sleep: Put the FACS table after using it"
	drm: Fix use-after-free read in drm_getunique()
	drm: Lock pointer access in drm_master_release()
	perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
	KVM: X86: MMU: Use the correct inherited permissions to get shadow page
	kvm: avoid speculation-based attacks from out-of-range memslot accesses
	staging: rtl8723bs: Fix uninitialized variables
	async_xor: check src_offs is not NULL before updating it
	btrfs: return value from btrfs_mark_extent_written() in case of error
	btrfs: promote debugging asserts to full-fledged checks in validate_super
	cgroup1: don't allow '\n' in renaming
	ftrace: Do not blindly read the ip address in ftrace_bug()
	mmc: renesas_sdhi: abort tuning when timeout detected
	mmc: renesas_sdhi: Fix HS400 on R-Car M3-W+
	USB: f_ncm: ncm_bitrate (speed) is unsigned
	usb: f_ncm: only first packet of aggregate needs to start timer
	usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms
	usb: dwc3-meson-g12a: fix usb2 PHY glue init when phy0 is disabled
	usb: dwc3: meson-g12a: Disable the regulator in the error handling path of the probe
	usb: dwc3: gadget: Bail from dwc3_gadget_exit() if dwc->gadget is NULL
	usb: dwc3: ep0: fix NULL pointer exception
	usb: musb: fix MUSB_QUIRK_B_DISCONNECT_99 handling
	usb: typec: wcove: Use LE to CPU conversion when accessing msg->header
	usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
	usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
	usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource()
	usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind
	USB: serial: ftdi_sio: add NovaTech OrionMX product ID
	USB: serial: omninet: add device id for Zyxel Omni 56K Plus
	USB: serial: quatech2: fix control-request directions
	USB: serial: cp210x: fix alternate function for CP2102N QFN20
	usb: gadget: eem: fix wrong eem header operation
	usb: fix various gadgets null ptr deref on 10gbps cabling.
	usb: fix various gadget panics on 10gbps cabling
	usb: typec: tcpm: cancel vdm and state machine hrtimer when unregister tcpm port
	usb: typec: tcpm: cancel frs hrtimer when unregister tcpm port
	regulator: core: resolve supply for boot-on/always-on regulators
	regulator: max77620: Use device_set_of_node_from_dev()
	regulator: bd718x7: Fix the BUCK7 voltage setting on BD71837
	regulator: fan53880: Fix missing n_voltages setting
	regulator: bd71828: Fix .n_voltages settings
	regulator: rtmv20: Fix .set_current_limit/.get_current_limit callbacks
	phy: usb: Fix misuse of IS_ENABLED
	usb: dwc3: gadget: Disable gadget IRQ during pullup disable
	usb: typec: mux: Fix copy-paste mistake in typec_mux_match
	drm/mcde: Fix off by 10^3 in calculation
	drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
	drm/msm/a6xx: update/fix CP_PROTECT initialization
	drm/msm/a6xx: avoid shadow NULL reference in failure path
	RDMA/ipoib: Fix warning caused by destroying non-initial netns
	RDMA/mlx4: Do not map the core_clock page to user space unless enabled
	ARM: cpuidle: Avoid orphan section warning
	vmlinux.lds.h: Avoid orphan section with !SMP
	tools/bootconfig: Fix error return code in apply_xbc()
	phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
	ASoC: core: Fix Null-point-dereference in fmt_single_name()
	ASoC: meson: gx-card: fix sound-dai dt schema
	phy: ti: Fix an error code in wiz_probe()
	gpio: wcd934x: Fix shift-out-of-bounds error
	perf: Fix data race between pin_count increment/decrement
	sched/fair: Keep load_avg and load_sum synced
	sched/fair: Make sure to update tg contrib for blocked load
	sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
	x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
	KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message
	IB/mlx5: Fix initializing CQ fragments buffer
	NFS: Fix a potential NULL dereference in nfs_get_client()
	NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
	perf session: Correct buffer copying when peeking events
	kvm: fix previous commit for 32-bit builds
	NFS: Fix use-after-free in nfs4_init_client()
	NFSv4: Fix second deadlock in nfs4_evict_inode()
	NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
	scsi: core: Fix error handling of scsi_host_alloc()
	scsi: core: Fix failure handling of scsi_add_host_with_dma()
	scsi: core: Put .shost_dev in failure path if host state changes to RUNNING
	scsi: core: Only put parent device if host state differs from SHOST_CREATED
	tracing: Correct the length check which causes memory corruption
	proc: only require mm_struct for writing
	Linux 5.10.44

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic64172b4e72ccb54d96000b3065dd8b33aa9fef5
2021-06-16 13:14:03 +02:00
Dietmar Eggemann
190a7f9089 sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
commit 68d7a190682aa4eb02db477328088ebad15acc83 upstream.

The util_est internal UTIL_AVG_UNCHANGED flag which is used to prevent
unnecessary util_est updates uses the LSB of util_est.enqueued. It is
exposed via _task_util_est() (and task_util_est()).

Commit 92a801e5d5 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
mentions that the LSB is lost for util_est resolution but
find_energy_efficient_cpu() checks if task_util_est() returns 0 to
return prev_cpu early.

_task_util_est() returns the max value of util_est.ewma and
util_est.enqueued or'ed w/ UTIL_AVG_UNCHANGED.
So task_util_est() returning the max of task_util() and
_task_util_est() will never return 0 under the default
SCHED_FEAT(UTIL_EST, true).

To fix this use the MSB of util_est.enqueued instead and keep the flag
util_est internal, i.e. don't export it via _task_util_est().

The maximal possible util_avg value for a task is 1024 so the MSB of
'unsigned int util_est.enqueued' isn't used to store a util value.

As a caveat the code behind the util_est_se trace point has to filter
UTIL_AVG_UNCHANGED to see the real util_est.enqueued value which should
be easy to do.

This also fixes an issue report by Xuewen Yan that util_est_update()
only used UTIL_AVG_UNCHANGED for the subtrahend of the equation:

  last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)

Fixes: b89997aa88f0b sched/pelt: Fix task util_est update filtering
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Xuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-16 12:01:46 +02:00
Liangliang Li
e1611d8751 ANDROID: GKI: Enlarge OEM data reserved in task_struct
Enlarge OEM data reserved in struct task_struct to support OEM's
feature.

Bug: 188018141
Change-Id: I242459e0ecf327475bb4814110f89f160cca0ad1
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
2021-06-04 11:15:16 -07:00
Greg Kroah-Hartman
2d118bc904 ANDROID: GKI: sched.h: add Android ABI padding to some structures
Try to mitigate potential future driver core api changes by adding a
pointer to struct signal_struct, struct sched_entity, struct
sched_rt_entity, and struct task_struct.

Based on a patch from Michal Marek <mmarek@suse.cz> from the SLES kernel

Leaf changes summary: 3 artifacts changed
Changed leaf types summary: 3 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

'struct sched_entity at sched.h:444:1' changed:
  type size changed from 3584 to 4096 (in bits)
  4 data member insertions:
    'u64 sched_entity::android_kabi_reserved1', at offset 3584 (in bits) at sched.h:481:1
    'u64 sched_entity::android_kabi_reserved2', at offset 3648 (in bits) at sched.h:482:1
    'u64 sched_entity::android_kabi_reserved3', at offset 3712 (in bits) at sched.h:483:1
    'u64 sched_entity::android_kabi_reserved4', at offset 3776 (in bits) at sched.h:484:1
  1435 impacted interfaces:

'struct sched_rt_entity at sched.h:481:1' changed:
  type size changed from 384 to 640 (in bits)
  4 data member insertions:
    'u64 sched_rt_entity::android_kabi_reserved1', at offset 384 (in bits) at sched.h:504:1
    'u64 sched_rt_entity::android_kabi_reserved2', at offset 448 (in bits) at sched.h:505:1
    'u64 sched_rt_entity::android_kabi_reserved3', at offset 512 (in bits) at sched.h:506:1
    'u64 sched_rt_entity::android_kabi_reserved4', at offset 576 (in bits) at sched.h:507:1
  1435 impacted interfaces:

'struct task_struct at sched.h:624:1' changed:
  type size changed from 28672 to 30208 (in bits)
  8 data member insertions:
    'u64 task_struct::android_kabi_reserved1', at offset 20992 (in bits) at sched.h:1294:1
    'u64 task_struct::android_kabi_reserved2', at offset 21056 (in bits) at sched.h:1295:1
    'u64 task_struct::android_kabi_reserved3', at offset 21120 (in bits) at sched.h:1296:1
    'u64 task_struct::android_kabi_reserved4', at offset 21184 (in bits) at sched.h:1297:1
    'u64 task_struct::android_kabi_reserved5', at offset 21248 (in bits) at sched.h:1298:1
    'u64 task_struct::android_kabi_reserved6', at offset 21312 (in bits) at sched.h:1299:1
    'u64 task_struct::android_kabi_reserved7', at offset 21376 (in bits) at sched.h:1300:1
    'u64 task_struct::android_kabi_reserved8', at offset 21440 (in bits) at sched.h:1301:1
  there are data member changes:
  1435 impacted interfaces:

Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1449735b836399e9b356608727092334daf5c36b
2021-03-22 15:06:24 +00:00
xieliujie
53e8099784 ANDROID: vendor_hooks: Add hooks for scheduler
Add hooks to support oem's scheduler feature.

Bug: 182441701
Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I31023844b20923bed1ca216831e7a2431e9ab0c9
2021-03-16 09:08:22 +00:00
Shaleen Agrawal
1e674650ff ANDROID: sched: Fix wake_q length tracking
The current approach to carry the wake_q length is exposed to an
intertask stack access. For example, if A sets the wake_q_head for
B but is preempted before it is able to set it back to NULL,
then B continues to point to an address corresponding to A's stack.
If B is then woken up by another task, it ends up accessing
the address pointing to A's stack. This causes a memory fault.

Replace this with a simple parameter which indicates the number
of tasks that are being woken up as part of the same event. This
avoids saving and accessing on stack pointers.

Bug: 173981591
Change-Id: I0031747d79a27673e680f7b1121eb4896ac7c699
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
2021-02-09 20:04:50 +00:00
Will Deacon
0d2b4a64a9 FROMLIST: sched: Introduce force_compatible_cpus_allowed_ptr() to limit CPU affinity
Asymmetric systems may not offer the same level of userspace ISA support
across all CPUs, meaning that some applications cannot be executed by
some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do
not feature support for 32-bit applications on both clusters.

Although userspace can carefully manage the affinity masks for such
tasks, one place where it is particularly problematic is execve()
because the CPU on which the execve() is occurring may be incompatible
with the new application image. In such a situation, it is desirable to
restrict the affinity mask of the task and ensure that the new image is
entered on a compatible CPU. From userspace's point of view, this looks
the same as if the incompatible CPUs have been hotplugged off in the
task's affinity mask.

In preparation for restricting the affinity mask for compat tasks on
arm64 systems without uniform support for 32-bit applications, introduce
force_compatible_cpus_allowed_ptr(), which restricts the affinity mask
for a task to contain only compatible CPUs.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Bug: 178507149
Link: https://lore.kernel.org/linux-arch/20201208132835.6151-11-will@kernel.org/
Signed-off-by: Will Deacon <willdeacon@google.com>
Change-Id: Ief3e47f6aa8179eadf2e009f207cc0161b76f466
2021-02-05 09:20:53 +00:00
Todd Kjos
403e11fa53 ANDROID: use ANDROID_OEM_DATA for OEM data
Change vendor fields to OEM fields by using
ANDROID_OEM_DATA when the vendor fields were
originally requested by an OEM.

Bug: 177481081
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: Iaf1e80e07bfd78efb5a9b7ff5894ff751f272f23
2021-01-28 02:50:00 +00:00
Andrey Konovalov
e0ae1141ab UPSTREAM: kasan, arm64: only use kasan_depth for software modes
[ Upstream commit d73b49365ee65ac48074bdb5aa717bb4644dbbb7 ]

This is a preparatory commit for the upcoming addition of a new hardware
tag-based (MTE-based) KASAN mode.

Hardware tag-based KASAN won't use kasan_depth.  Only define and use it
when one of the software KASAN modes are enabled.

No functional changes for software modes.

Link: https://lkml.kernel.org/r/e16f15aeda90bc7fb4dfc2e243a14b74cc5c8219.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 172318110
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I553d5ca1fa50ae80cd2eb929e328ac3cb3ce0e9f
2021-01-19 21:47:28 -08:00
Vincent Donnefort
e19b8ce907 ANDROID: cpu/hotplug: add migration to paused_cpus
paused_cpus intending to force CPUs to go idle as quickly as possible,
adding a migration step, to drain the rq from any running task.

Two steps are actually needed. The first one, "lazy", will run before the
cpu_active_mask has been synced. The second one will run after. It is
possible for another CPU, to observe an outdated version of that mask and
to enqueue a task on a rq that has just been marked inactive. The second
migration is there to catch any of those spurious move, while the first
one will drain the rq as quickly as possible to let the CPU reach an idle
state.

Bug: 161210528
Change-Id: Ie26c2e4c42665dd61d41a899a84536e56bf2b887
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
2020-12-08 19:09:07 +00:00
Quentin Perret
1a29b384b0 ANDROID: sched: Track wake_q length
Some partners have value-adds based on aosp/540066, which cannot be
carried in ACK in its entirety as it no longer makes sense as-is (the
select_idle_capacity() rework upstream solved the issue differently).

It seems that those partners do not actually need the wake-wide tweaks,
they only need to access the wake_q length for wake-up balance. To
support this, add minimal tracking to the wake_q infrastructure in the
core kernel, but do that by adding a pointer to the wake_q_head to
task_struct directly to not litter all sched classes with an additional
sibling_count_hint argument to the select_task_rq callbacks.

Modules needing to access the wake_q length can do so by dereferencing
p->wake_q_head in the wake-up path when it is non-NULL.

Bug: 173981591
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I9a98167face92e70aba847d9f04d0c216065478c
2020-12-01 00:48:34 +00:00
Greg Kroah-Hartman
5acba58e59 Merge 5.10-rc5 into android-mainline
Linux 5.10-rc5

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia5b23cceb3e0212c1c841f1297ecfab65cc9aaa6
2020-11-23 08:17:16 +01:00
Linus Torvalds
f4b936f5d6 Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "A couple of scheduler fixes:

   - Make the conditional update of the overutilized state work
     correctly by caching the relevant flags state before overwriting
     them and checking them afterwards.

   - Fix a data race in the wakeup path which caused loadavg on ARM64
     platforms to become a random number generator.

   - Fix the ordering of the iowaiter accounting operations so it can't
     be decremented before it is incremented.

   - Fix a bug in the deadline scheduler vs. priority inheritance when a
     non-deadline task A has inherited the parameters of a deadline task
     B and then blocks on a non-deadline task C.

     The second inheritance step used the static deadline parameters of
     task A, which are usually 0, instead of further propagating task
     B's parameters. The zero initialized parameters trigger a bug in
     the deadline scheduler"

* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix priority inheritance with multiple scheduling classes
  sched: Fix rq->nr_iowait ordering
  sched: Fix data-race in wakeup
  sched/fair: Fix overutilized update in enqueue_task_fair()
2020-11-22 13:26:07 -08:00
Juri Lelli
2279f540ea sched/deadline: Fix priority inheritance with multiple scheduling classes
Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).

 ------------[ cut here ]------------
 kernel BUG at kernel/sched/deadline.c:1462!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
 Hardware name: ...
 RIP: 0010:enqueue_task_dl+0x335/0x910
 Code: ...
 RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002
 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500
 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600
 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078
 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600
 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600
 FS:  00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  ? intel_pstate_update_util_hwp+0x13/0x170
  rt_mutex_setprio+0x1cc/0x4b0
  task_blocks_on_rt_mutex+0x225/0x260
  rt_spin_lock_slowlock_locked+0xab/0x2d0
  rt_spin_lock_slowlock+0x50/0x80
  hrtimer_grab_expiry_lock+0x20/0x30
  hrtimer_cancel+0x13/0x30
  do_nanosleep+0xa0/0x150
  hrtimer_nanosleep+0xe1/0x230
  ? __hrtimer_init_sleeper+0x60/0x60
  __x64_sys_nanosleep+0x8d/0xa0
  do_syscall_64+0x4a/0x100
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fa58b52330d
 ...
 ---[ end trace 0000000000000002 ]—

He also provided a simple reproducer creating the situation below:

 So the execution order of locking steps are the following
 (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
 are mutexes that are enabled * with priority inheritance.)

 Time moves forward as this timeline goes down:

 N1              N2               D1
 |               |                |
 |               |                |
 Lock(M1)        |                |
 |               |                |
 |             Lock(M2)           |
 |               |                |
 |               |              Lock(M2)
 |               |                |
 |             Lock(M1)           |
 |             (!!bug triggered!) |

Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).

Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().

Fix this by keeping track of the original donor and using its parameters
when a task is boosted.

Reported-by: Glenn Elliott <glenn@aurora.tech>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
2020-11-17 13:15:28 +01:00
Peter Zijlstra
f97bb5272d sched: Fix data-race in wakeup
Mel reported that on some ARM64 platforms loadavg goes bananas and
Will tracked it down to the following race:

  CPU0					CPU1

  schedule()
    prev->sched_contributes_to_load = X;
    deactivate_task(prev);

					try_to_wake_up()
					  if (p->on_rq &&) // false
					  if (smp_load_acquire(&p->on_cpu) && // true
					      ttwu_queue_wakelist())
					        p->sched_remote_wakeup = Y;

    smp_store_release(prev->on_cpu, 0);

where both p->sched_contributes_to_load and p->sched_remote_wakeup are
in the same word, and thus the stores X and Y race (and can clobber
one another's data).

Whereas prior to commit c6e7bd7afa ("sched/core: Optimize ttwu()
spinning on p->on_cpu") the p->on_cpu handoff serialized access to
p->sched_remote_wakeup (just as it still does with
p->sched_contributes_to_load) that commit broke that by calling
ttwu_queue_wakelist() with p->on_cpu != 0.

However, due to

  p->XXX = X			ttwu()
  schedule()			  if (p->on_rq && ...) // false
    smp_mb__after_spinlock()	  if (smp_load_acquire(&p->on_cpu) &&
    deactivate_task()		      ttwu_queue_wakelist())
      p->on_rq = 0;		        p->sched_remote_wakeup = Y;

We can be sure any 'current' store is complete and 'current' is
guaranteed asleep. Therefore we can move p->sched_remote_wakeup into
the current flags word.

Note: while the observed failure was loadavg accounting gone wrong due
to ttwu() cobbering p->sched_contributes_to_load, the reverse problem
is also possible where schedule() clobbers p->sched_remote_wakeup,
this could result in enqueue_entity() wrecking ->vruntime and causing
scheduling artifacts.

Fixes: c6e7bd7afa ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Debugged-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201117083016.GK3121392@hirez.programming.kicks-ass.net
2020-11-17 13:15:27 +01:00
Pavankumar Kondeti
0578248bed ANDROID: softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
Defer the softirq processing to ksoftirqd if a RT task is running
or queued on the current CPU. This complements the RT task placement
algorithm which tries to find a CPU that is not currently busy with
softirqs.

Currently NET_TX, NET_RX, BLOCK and TASKLET softirqs are only deferred
as they can potentially run for long time.

Bug: 168521633
Change-Id: Id7665244af6bbd5a96d9e591cf26154e9eaa860c
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:11 +00:00
Satya Durga Srinivasu Prabhala
7a2a316228 ANDROID: sched: gki: add padding to some structs to support WALT
Add padding to below structs to support WALT based accounting:
	1. struct cpu_topology
	2. struct task_struct
	3. struct sched_domain_shared
	4. struct task_group
	5. struct root_domain
	6. struct rq

To accommodate potential future changes, reserving more memory than
what WALT needs today.

Bug: 171858786
Change-Id: If6d901174fc7963be3ae44daa799cb2953669ec1
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2020-10-29 16:51:28 +00:00
Greg Kroah-Hartman
05d2a661fd Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/userfaultfd.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
2020-10-26 09:23:33 +01:00
Greg Kroah-Hartman
75c90a8c3a Merge d5660df4a5 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
steps on the way to 5.10-rc1

Change-Id: Iddc84c25b6a9d71fa8542b927d6f69c364131c3d
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-10-25 11:57:29 +01:00
Greg Kroah-Hartman
e24a525c0d Merge 6ad4bf6ea1 ("Merge tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block") into android-mainline
Resolves conflict with:
	kernel/fork.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6c031ef388bfddfe80a64929d62969718755f740
2020-10-25 11:50:09 +01:00
Greg Kroah-Hartman
c2f57c792b Merge edaa5ddf38 ("Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Steps on the way to 5.10-rc1

Change-Id: Ica8a8f76ad5735ebcf4f30f3156edf7f9f60e576
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-10-21 10:54:38 +02:00
Greg Kroah-Hartman
1d0156db22 Merge ca1b66922a ("Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Tiny steps on the way to 5.10-rc1

Change-Id: I7e303e545e85a0c989f21f98784a7fe6810478c7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-10-21 07:45:19 +02:00
Elena Petrova
5cf53f3ce3 sched.h: drop in_ubsan field when UBSAN is in trap mode
in_ubsan field of task_struct is only used in lib/ubsan.c, which in its
turn is used only `ifneq ($(CONFIG_UBSAN_TRAP),y)`.

Removing unnecessary field from a task_struct will help preserve the ABI
between vanilla and CONFIG_UBSAN_TRAP'ed kernels.  In particular, this
will help enabling bounds sanitizer transparently for Android's GKI.

Signed-off-by: Elena Petrova <lenaptr@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lkml.kernel.org/r/20200910134802.3160311-1-lenaptr@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Patricia Alfonso
393824f650 kasan/kunit: add KUnit Struct to Current Task
Patch series "KASAN-KUnit Integration", v14.

This patchset contains everything needed to integrate KASAN and KUnit.

KUnit will be able to:
(1) Fail tests when an unexpected KASAN error occurs
(2) Pass tests when an expected KASAN error occurs

Convert KASAN tests to KUnit with the exception of copy_user_test because
KUnit is unable to test those.

Add documentation on how to run the KASAN tests with KUnit and what to
expect when running these tests.

This patch (of 5):

In order to integrate debugging tools like KASAN into the KUnit framework,
add KUnit struct to the current task to keep track of the current KUnit
test.

Signed-off-by: Patricia Alfonso <trishalfonso@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Link: https://lkml.kernel.org/r/20200915035828.570483-1-davidgow@google.com
Link: https://lkml.kernel.org/r/20200915035828.570483-2-davidgow@google.com
Link: https://lkml.kernel.org/r/20200910070331.3358048-1-davidgow@google.com
Link: https://lkml.kernel.org/r/20200910070331.3358048-2-davidgow@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:32 -07:00
Linus Torvalds
6ad4bf6ea1 Merge tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:

 - Add blkcg accounting for io-wq offload (Dennis)

 - A use-after-free fix for io-wq (Hillf)

 - Cancelation fixes and improvements

 - Use proper files_struct references for offload

 - Cleanup of io_uring_get_socket() since that can now go into our own
   header

 - SQPOLL fixes and cleanups, and support for sharing the thread

 - Improvement to how page accounting is done for registered buffers and
   huge pages, accounting the real pinned state

 - Series cleaning up the xarray code (Willy)

 - Various cleanups, refactoring, and improvements (Pavel)

 - Use raw spinlock for io-wq (Sebastian)

 - Add support for ring restrictions (Stefano)

* tag 'io_uring-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (62 commits)
  io_uring: keep a pointer ref_node in file_data
  io_uring: refactor *files_register()'s error paths
  io_uring: clean file_data access in files_register
  io_uring: don't delay io_init_req() error check
  io_uring: clean leftovers after splitting issue
  io_uring: remove timeout.list after hrtimer cancel
  io_uring: use a separate struct for timeout_remove
  io_uring: improve submit_state.ios_left accounting
  io_uring: simplify io_file_get()
  io_uring: kill extra check in fixed io_file_get()
  io_uring: clean up ->files grabbing
  io_uring: don't io_prep_async_work() linked reqs
  io_uring: Convert advanced XArray uses to the normal API
  io_uring: Fix XArray usage in io_uring_add_task_file
  io_uring: Fix use of XArray in __io_uring_files_cancel
  io_uring: fix break condition for __io_uring_register() waiting
  io_uring: no need to call xa_destroy() on empty xarray
  io_uring: batch account ->req_issue and task struct references
  io_uring: kill callback_head argument for io_req_task_work_add()
  io_uring: move req preps out of io_issue_sqe()
  ...
2020-10-13 12:36:21 -07:00
Linus Torvalds
edaa5ddf38 Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - reorganize & clean up the SD* flags definitions and add a bunch of
   sanity checks. These new checks caught quite a few bugs or at least
   inconsistencies, resulting in another set of patches.

 - rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - add a new tracepoint to improve CPU capacity tracking

 - improve overloaded SMP system load-balancing behavior

 - tweak SMT balancing

 - energy-aware scheduling updates

 - NUMA balancing improvements

 - deadline scheduler fixes and improvements

 - CPU isolation fixes

 - misc cleanups, simplifications and smaller optimizations

* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  sched/deadline: Unthrottle PI boosted threads while enqueuing
  sched/debug: Add new tracepoint to track cpu_capacity
  sched/fair: Tweak pick_next_entity()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Use dst group while checking imbalance for NUMA balancer
  sched/fair: Reduce busy load balance interval
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Remove the force parameter of update_tg_load_avg()
  sched/fair: Fix wrong cpu selecting from isolated domain
  sched: Remove unused inline function uclamp_bucket_base_value()
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Fix stale throttling on de-/boosted tasks
  sched/numa: Use runnable_avg to classify node
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  ...
2020-10-12 12:56:01 -07:00
Linus Torvalds
ca1b66922a Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Extend the recovery from MCE in kernel space also to processes which
   encounter an MCE in kernel space but while copying from user memory
   by sending them a SIGBUS on return to user space and umapping the
   faulty memory, by Tony Luck and Youquan Song.

 - memcpy_mcsafe() rework by splitting the functionality into
   copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
   support for new hardware which can recover from a machine check
   encountered during a fast string copy and makes that the default and
   lets the older hardware which does not support that advance recovery,
   opt in to use the old, fragile, slow variant, by Dan Williams.

 - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

 - Do not use MSR-tracing accessors in #MC context and flag any fault
   while accessing MCA architectural MSRs as an architectural violation
   with the hope that such hw/fw misdesigns are caught early during the
   hw eval phase and they don't make it into production.

 - Misc fixes, improvements and cleanups, as always.

* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
  x86/mce: Decode a kernel instruction to determine if it is copying from user
  x86/mce: Recover from poison found while copying from user space
  x86/mce: Avoid tail copy when machine check terminated a copy from user
  x86/mce: Add _ASM_EXTABLE_CPY for copy user access
  x86/mce: Provide method to find out the type of an exception handler
  x86/mce: Pass pointer to saved pt_regs to severity calculation routines
  x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
  x86/mce: Add Skylake quirk for patrol scrub reported errors
  RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
  x86/mce: Annotate mce_rd/wrmsrl() with noinstr
  x86/mce/dev-mcelog: Do not update kflags on AMD systems
  x86/mce: Stop mce_reign() from re-computing severity for every CPU
  x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  x86/mce: Increase maximum number of banks to 64
  x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
  x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
  RAS/CEC: Fix cec_init() prototype
2020-10-12 10:14:38 -07:00
Tony Luck
c0ab7ffce2 x86/mce: Recover from poison found while copying from user space
Existing kernel code can only recover from a machine check on code that
is tagged in the exception table with a fault handling recovery path.

Add two new fields in the task structure to pass information from
machine check handler to the "task_work" that is queued to run before
the task returns to user mode:

+ mce_vaddr: will be initialized to the user virtual address of the fault
  in the case where the fault occurred in the kernel copying data from
  a user address.  This is so that kill_me_maybe() can provide that
  information to the user SIGBUS handler.

+ mce_kflags: copy of the struct mce.kflags needed by kill_me_maybe()
  to determine if mce_vaddr is applicable to this error.

Add code to recover from a machine check while copying data from user
space to the kernel. Action for this case is the same as if the user
touched the poison directly; unmap the page and send a SIGBUS to the task.

Use a new helper function to share common code between the "fault
in user mode" case and the "fault while copying from user" case.

New code paths will be activated by the next patch which sets
MCE_IN_KERNEL_COPYIN.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201006210910.21062-6-tony.luck@intel.com
2020-10-07 11:29:41 +02:00
Vincent Donnefort
51cf18c90c sched/debug: Add new tracepoint to track cpu_capacity
rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1598605249-72651-1-git-send-email-vincent.donnefort@arm.com
2020-10-03 16:30:52 +02:00
Jens Axboe
0f2122045b io_uring: don't rely on weak ->files references
Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.

With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.

Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-30 20:32:32 -06:00
Greg Kroah-Hartman
f022c0602c Merge 5.9-rc3 into android-mainline
Linux 5.9-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7758bc57a7d91861657388ddd015db5c5db5480
2020-08-31 19:51:25 +02:00
Sebastian Andrzej Siewior
01ccf59236 sched: Bring the PF_IO_WORKER and PF_WQ_WORKER bits closer together
The bits PF_IO_WORKER and PF_WQ_WORKER are tested together in
sched_submit_work() which is considered to be a hot path.
If the two bits cross the 8 or 16 bit boundary then most architecture
require multiple load instructions in order to create the constant
value. Also, such a value can not be encoded within the compare opcode.

By moving the bit definition within the same block, the compiler can
create/use one immediate value.

For some reason gcc-10 on ARM64 requires both bits to be next to each
other in order to issue "tst reg, val; bne label". Otherwise the result
is "mov reg1, val; tst reg, reg1; bne label".

Move PF_VCPU out of the way so that PF_IO_WORKER can be next to
PF_WQ_WORKER.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200819195505.y3fxk72sotnrkczi@linutronix.de
2020-08-26 12:41:58 +02:00
Marco Elver
c94a88f341 sched: Use __always_inline on is_idle_task()
is_idle_task() may be used from noinstr functions such as
irqentry_enter(). Since the compiler is free to not inline regular
inline functions, switch to using __always_inline.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200820172046.GA177701@elver.google.com
2020-08-26 12:41:51 +02:00
Sangmoon Kim
9ad8ff902e ANDROID: vendor_hooks: add waiting information for blocked tasks
- Add the hook to get mutex/rwsem information that the tasks
   are waiting for.

 - Add the hook to print messages for sched_show_task.

 - ANDROID_VENDOR_DATA_ARRAY added to task_struct

Bug: 162776704

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: Ib436fbd8d0ad509c3b5a73ea8f5170e0761a13fd
(cherry picked from commit b519ac4237)
2020-08-19 14:52:44 +00:00
Greg Kroah-Hartman
df7e491a37 Merge 4b6c093e21 ("Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block") into android-mainline
Steps on the way to 5.9-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I904678b5c31139b25fb49ee67cbacb4721a3c7bc
2020-08-17 09:19:35 +02:00
Linus Torvalds
b6b178e38f Merge tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more timer updates from Thomas Gleixner:
 "A set of posix CPU timer changes which allows to defer the heavy work
  of posix CPU timers into task work context. The tick interrupt is
  reduced to a quick check which queues the work which is doing the
  heavy lifting before returning to user space or going back to guest
  mode. Moving this out is deferring the signal delivery slightly but
  posix CPU timers are inaccurate by nature as they depend on the tick
  so there is no real damage. The relevant test cases all passed.

  This lifts the last offender for RT out of the hard interrupt context
  tick handler, but it also has the general benefit that the actual
  heavy work is accounted to the task/process and not to the tick
  interrupt itself.

  Further optimizations are possible to break long sighand lock hold and
  interrupt disabled (on !RT kernels) times when a massive amount of
  posix CPU timers (which are unpriviledged) is armed for a
  task/process.

  This is currently only enabled for x86 because the architecture has to
  ensure that task work is handled in KVM before entering a guest, which
  was just established for x86 with the new common entry/exit code which
  got merged post 5.8 and is not the case for other KVM architectures"

* tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Select POSIX_CPU_TIMERS_TASK_WORK
  posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  posix-cpu-timers: Split run_posix_cpu_timers()
2020-08-14 14:17:51 -07:00
Greg Kroah-Hartman
d7b0856eac Merge 00e4db5125 ("Merge tag 'perf-tools-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") into android-mainline
Tiny steps on the way to 5.9-rc1.

Fixes conflicts in:
	fs/f2fs/inline.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I16d863ae44a51156499458e8c3486587cbe2babe
2020-08-11 10:57:46 +02:00
Linus Torvalds
97d052ea3f Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Greg Kroah-Hartman
f04a11ac99 Merge 7b4ea9456d ("Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"") into android-mainline
Steps on the way to 5.9-rc1

Resolves conflicts in:
	drivers/irqchip/qcom-pdc.c
	include/linux/device.h
	net/xfrm/xfrm_state.c
	security/lsm_audit.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4aeb3d04f4717714a421721eb3ce690c099bb30a
2020-08-07 16:01:35 +02:00
Greg Kroah-Hartman
00d6a8a7ee Merge e4cbce4d13 ("Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Baby steps for 5.9-rc1

Resolves some kernel/sched/ merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I88cf5411ac7251f9795d9c50cb18b0df5bf0bcd6
2020-08-07 14:17:39 +02:00
Linus Torvalds
6d2b84a4e5 Merge tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched/fifo updates from Ingo Molnar:
 "This adds the sched_set_fifo*() encapsulation APIs to remove static
  priority level knowledge from non-scheduler code.

  The three APIs for non-scheduler code to set SCHED_FIFO are:

   - sched_set_fifo()
   - sched_set_fifo_low()
   - sched_set_normal()

  These are two FIFO priority levels: default (high), and a 'low'
  priority level, plus sched_set_normal() to set the policy back to
  non-SCHED_FIFO.

  Since the changes affect a lot of non-scheduler code, we kept this in
  a separate tree"

* tag 'sched-fifo-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  sched,tracing: Convert to sched_set_fifo()
  sched: Remove sched_set_*() return value
  sched: Remove sched_setscheduler*() EXPORTs
  sched,psi: Convert to sched_set_fifo_low()
  sched,rcutorture: Convert to sched_set_fifo_low()
  sched,rcuperf: Convert to sched_set_fifo_low()
  sched,locktorture: Convert to sched_set_fifo()
  sched,irq: Convert to sched_set_fifo()
  sched,watchdog: Convert to sched_set_fifo()
  sched,serial: Convert to sched_set_fifo()
  sched,powerclamp: Convert to sched_set_fifo()
  sched,ion: Convert to sched_set_normal()
  sched,powercap: Convert to sched_set_fifo*()
  sched,spi: Convert to sched_set_fifo*()
  sched,mmc: Convert to sched_set_fifo*()
  sched,ivtv: Convert to sched_set_fifo*()
  sched,drm/scheduler: Convert to sched_set_fifo*()
  sched,msm: Convert to sched_set_fifo*()
  sched,psci: Convert to sched_set_fifo*()
  sched,drbd: Convert to sched_set_fifo*()
  ...
2020-08-06 11:55:43 -07:00