Changes in 5.10.95
drm/i915: Flush TLBs before releasing backing store
bnx2x: Utilize firmware 7.13.21.0
bnx2x: Invalidate fastpath HSI version for VFs
rcu: Tighten rcu_advance_cbs_nowake() checks
KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
select: Fix indefinitely sleeping task in poll_schedule_timeout()
drm/vmwgfx: Fix stale file descriptors on failed usercopy
Linux 5.10.95
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2df6cec74b4d7c8f0539182c547db61a45cd093f
[ Upstream commit 98b0d890220d45418cfbc5157b3382e6da5a12ab ]
Rick reported performance regressions in bugzilla because of cpu frequency
being lower than before:
https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to:
commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after
removing the contribution of a task and before the next periodic sync. By
doing so util_sum is rounded to its lower bound and might lost up to
LOAD_AVG_MAX-1 of accumulated contribution which has not yet been
reflected in util_avg.
Instead of always setting util_sum to the low bound of util_avg, which can
significantly lower the utilization of root cfs_rq after propagating the
change down into the hierarchy, we revert the change of util_sum and
propagate the difference.
In addition, we also check that cfs's util_sum always stays above the
lower bound for a given util_avg as it has been observed that
sched_entity's util_sum is sometimes above cfs one.
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
Reported-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 09f5e7dc7ad705289e1b1ec065439aa3c42951c4 ]
Time readers that cannot take locks (due to NMI etc..) currently make
use of perf_event::shadow_ctx_time, which, for that event gives:
time' = now + (time - timestamp)
or, alternatively arranged:
time' = time + (now - timestamp)
IOW, the progression of time since the last time the shadow_ctx_time
was updated.
There's problems with this:
A) the shadow_ctx_time is per-event, even though the ctx_time it
reflects is obviously per context. The direct concequence of this
is that the context needs to iterate all events all the time to
keep the shadow_ctx_time in sync.
B) even with the prior point, the context itself might not be active
meaning its time should not advance to begin with.
C) shadow_ctx_time isn't consistently updated when ctx_time is
There are 3 users of this stuff, that suffer differently from this:
- calc_timer_values()
- perf_output_read()
- perf_event_update_userpage() /* A */
- perf_event_read_local() /* A,B */
In particular, perf_output_read() doesn't suffer at all, because it's
sample driven and hence only relevant when the event is actually
running.
This same was supposed to be true for perf_event_update_userpage(),
after all self-monitoring implies the context is active *HOWEVER*, as
per commit f79256532682 ("perf/core: fix userpage->time_enabled of
inactive events") this goes wrong when combined with counter
overcommit, in that case those events that do not get scheduled when
the context becomes active (task events typically) miss out on the
EVENT_TIME update and ENABLED time is inflated (for a little while)
with the time the context was inactive. Once the event gets rotated
in, this gets corrected, leading to a non-monotonic timeflow.
perf_event_read_local() made things even worse, it can request time at
any point, suffering all the problems perf_event_update_userpage()
does and more. Because while perf_event_update_userpage() is limited
by the context being active, perf_event_read_local() users have no
such constraint.
Therefore, completely overhaul things and do away with
perf_event::shadow_ctx_time. Instead have regular context time updates
keep track of this offset directly and provide perf_event_time_now()
to complement perf_event_time().
perf_event_time_now() will, in adition to being context wide, also
take into account if the context is active. For inactive context, it
will not advance time.
This latter property means the cgroup perf_cgroup_info context needs
to grow addition state to track this.
Additionally, since all this is strictly per-cpu, we can use barrier()
to order context activity vs context time.
Fixes: 7d9285e82d ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Song Liu <song@kernel.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 809232619f5b15e31fb3563985e705454f32621f upstream.
The membarrier command MEMBARRIER_CMD_QUERY allows querying the
available membarrier commands. When the membarrier-rseq fence commands
were added, a new MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK was
introduced with the intent to expose them with the MEMBARRIER_CMD_QUERY
command, the but it was never added to MEMBARRIER_CMD_BITMASK.
The membarrier-rseq fence commands are therefore not wired up with the
query command.
Rename MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK to
MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK (the bitmask is not a command
per-se), and change the erroneous
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK (which does not
actually exist) to MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ.
Wire up MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK in
MEMBARRIER_CMD_BITMASK. Fixing this allows discovering availability of
the membarrier-rseq fence feature.
Fixes: 2a36ab717e ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 5.10+
Link: https://lkml.kernel.org/r/20220117203010.30129-1-mathieu.desnoyers@efficios.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9d967b2ce40d71e968eb839f36c936b8a9cf1ea upstream.
The buffer handling in pm_show_wakelocks() is tricky, and hopefully
correct. Ensure it really is correct by using sysfs_emit_at() which
handles all of the tricky string handling logic in a PAGE_SIZE buffer
for us automatically as this is a sysfs file being read from.
Reviewed-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit cacc6c30e3 which is
commit 2e27e793e280ff12cb5c202a1214c08b0d3a0f26 upstream.
It breaks the Android kernel ABI and is not an issue for Android
systems, so revert it.
Bug: 161946584
Fixes: cacc6c30e3 ("clocksource: Reduce clocksource-skew threshold")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4d6c0cc253459fad083a55bcfc2b06ff217c5cab
This reverts commit fd99aeb978 which is
commit c86ff8c55b8ae68837b2fa59dc0c203907e9a15f upstream.
It breaks the Android kernel ABI and is not an issue for Android
systems, so revert it.
Bug: 161946584
Fixes: fd99aeb978 ("clocksource: Avoid accidental unstable marking of clocksources")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6ea81231f082921f85c3c671f120724a40a191fe
commit 614ddad17f22a22e035e2ea37a04815f50362017 upstream.
Currently, rcu_advance_cbs_nowake() checks that a grace period is in
progress, however, that grace period could end just after the check.
This commit rechecks that a grace period is still in progress while
holding the rcu_node structure's lock. The grace period cannot end while
the current CPU's rcu_node structure's ->lock is held, thus avoiding
false positives from the WARN_ON_ONCE().
As Daniel Vacek noted, it is not necessary for the rcu_node structure
to have a CPU that has not yet passed through its quiescent state.
Tested-by: Guillaume Morin <guillaume@morinfr.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Although it is usually safe to invoke synchronize_rcu_expedited() from a
preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
invoke a workqueue handler will hang due to RCU waiting on a CPU that
the scheduler is not paying attention to. This commit therefore expands
use of the existing workqueue-independent synchronize_rcu_expedited()
from early boot to also include CPUs that are being hotplugged.
Bug: 216238044
Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 710f460c395af6b81df1c81043308aaa60d5e25c
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next)
Change-Id: I3f81dee6deaf6a4504aec31e058785dc8cee6a3f
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Changes in 5.10.94
KVM: VMX: switch blocked_vcpu_on_cpu_lock to raw spinlock
HID: uhid: Fix worker destroying device without any protection
HID: wacom: Reset expected and received contact counts at the same time
HID: wacom: Ignore the confidence flag when a touch is removed
HID: wacom: Avoid using stale array indicies to read contact count
f2fs: fix to do sanity check in is_alive()
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
mtd: rawnand: gpmi: Add ERR007117 protection for nfc_apply_timings
mtd: rawnand: gpmi: Remove explicit default gpmi clock setting for i.MX6
mtd: Fixed breaking list in __mtd_del_partition.
mtd: rawnand: davinci: Don't calculate ECC when reading page
mtd: rawnand: davinci: Avoid duplicated page read
mtd: rawnand: davinci: Rewrite function description
x86/gpu: Reserve stolen memory for first integrated Intel GPU
tools/nolibc: x86-64: Fix startup code bug
tools/nolibc: i386: fix initial stack alignment
tools/nolibc: fix incorrect truncation of exit code
rtc: cmos: take rtc_lock while reading from CMOS
media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
media: flexcop-usb: fix control-message timeouts
media: mceusb: fix control-message timeouts
media: em28xx: fix control-message timeouts
media: cpia2: fix control-message timeouts
media: s2255: fix control-message timeouts
media: dib0700: fix undefined behavior in tuner shutdown
media: redrat3: fix control-message timeouts
media: pvrusb2: fix control-message timeouts
media: stk1160: fix control-message timeouts
media: cec-pin: fix interrupt en/disable handling
can: softing_cs: softingcs_probe(): fix memleak on registration failure
iio: adc: ti-adc081c: Partial revert of removal of ACPI IDs
lkdtm: Fix content of section containing lkdtm_rodata_do_nothing()
iommu/io-pgtable-arm-v7s: Add error handle for page table allocation failure
gpu: host1x: Add back arm_iommu_detach_device()
dma_fence_array: Fix PENDING_ERROR leak in dma_fence_array_signaled()
PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
mm_zone: add function to check if managed dma zone exists
dma/pool: create dma atomic pool only if dma zone has managed pages
mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode
drm/ttm: Put BO in its memory manager's lru list
Bluetooth: L2CAP: Fix not initializing sk_peer_pid
drm/bridge: display-connector: fix an uninitialized pointer in probe()
drm: fix null-ptr-deref in drm_dev_init_release()
drm/panel: kingdisplay-kd097d04: Delete panel on attach() failure
drm/panel: innolux-p079zca: Delete panel on attach() failure
drm/rockchip: dsi: Fix unbalanced clock on probe error
drm/rockchip: dsi: Hold pm-runtime across bind/unbind
drm/rockchip: dsi: Disable PLL clock on bind error
drm/rockchip: dsi: Reconfigure hardware on resume()
Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
clk: bcm-2835: Pick the closest clock rate
clk: bcm-2835: Remove rounding up the dividers
drm/vc4: hdmi: Set a default HSM rate
wcn36xx: ensure pairing of init_scan/finish_scan and start_scan/end_scan
wcn36xx: Indicate beacon not connection loss on MISSED_BEACON_IND
wcn36xx: Fix DMA channel enable/disable cycle
wcn36xx: Release DMA channel descriptor allocations
wcn36xx: Put DXE block into reset before freeing memory
wcn36xx: populate band before determining rate on RX
wcn36xx: fix RX BD rate mapping for 5GHz legacy rates
ath11k: Send PPDU_STATS_CFG with proper pdev mask to firmware
mtd: hyperbus: rpc-if: Check return value of rpcif_sw_init()
media: videobuf2: Fix the size printk format
media: atomisp: add missing media_device_cleanup() in atomisp_unregister_entities()
media: atomisp: fix punit_ddr_dvfs_enable() argument for mrfld_power up case
media: atomisp: fix inverted logic in buffers_needed()
media: atomisp: do not use err var when checking port validity for ISP2400
media: atomisp: fix inverted error check for ia_css_mipi_is_source_port_valid()
media: atomisp: fix ifdefs in sh_css.c
media: staging: media: atomisp: pci: Balance braces around conditional statements in file atomisp_cmd.c
media: atomisp: add NULL check for asd obtained from atomisp_video_pipe
media: atomisp: fix enum formats logic
media: atomisp: fix uninitialized bug in gmin_get_pmic_id_and_addr()
media: aspeed: fix mode-detect always time out at 2nd run
media: em28xx: fix memory leak in em28xx_init_dev
media: aspeed: Update signal status immediately to ensure sane hw state
arm64: dts: amlogic: meson-g12: Fix GPU operating point table node name
arm64: dts: amlogic: Fix SPI NOR flash node name for ODROID N2/N2+
arm64: dts: meson-gxbb-wetek: fix HDMI in early boot
arm64: dts: meson-gxbb-wetek: fix missing GPIO binding
fs: dlm: use sk->sk_socket instead of con->sock
fs: dlm: don't call kernel_getpeername() in error_report()
memory: renesas-rpc-if: Return error in case devm_ioremap_resource() fails
Bluetooth: stop proccessing malicious adv data
ath11k: Fix ETSI regd with weather radar overlap
ath11k: clear the keys properly via DISABLE_KEY
ath11k: reset RSN/WPA present state for open BSS
tee: fix put order in teedev_close_context()
fs: dlm: fix build with CONFIG_IPV6 disabled
drm/vboxvideo: fix a NULL vs IS_ERR() check
arm64: dts: renesas: cat875: Add rx/tx delays
media: dmxdev: fix UAF when dvb_register_device() fails
crypto: qce - fix uaf on qce_ahash_register_one
crypto: qce - fix uaf on qce_skcipher_register_one
mtd: hyperbus: rpc-if: fix bug in rpcif_hb_remove
ARM: dts: stm32: fix dtbs_check warning on ili9341 dts binding on stm32f429 disco
crypto: qat - fix spelling mistake: "messge" -> "message"
crypto: qat - remove unnecessary collision prevention step in PFVF
crypto: qat - make pfvf send message direction agnostic
crypto: qat - fix undetected PFVF timeout in ACK loop
ath11k: Use host CE parameters for CE interrupts configuration
arm64: dts: ti: k3-j721e: correct cache-sets info
tty: serial: atmel: Check return code of dmaengine_submit()
tty: serial: atmel: Call dma_async_issue_pending()
mfd: atmel-flexcom: Remove #ifdef CONFIG_PM_SLEEP
mfd: atmel-flexcom: Use .resume_noirq
media: rcar-csi2: Correct the selection of hsfreqrange
media: imx-pxp: Initialize the spinlock prior to using it
media: si470x-i2c: fix possible memory leak in si470x_i2c_probe()
media: mtk-vcodec: call v4l2_m2m_ctx_release first when file is released
media: coda: fix CODA960 JPEG encoder buffer overflow
media: venus: pm_helpers: Control core power domain manually
media: venus: core, venc, vdec: Fix probe dependency error
media: venus: core: Fix a potential NULL pointer dereference in an error handling path
media: venus: core: Fix a resource leak in the error handling path of 'venus_probe()'
thermal/drivers/imx: Implement runtime PM support
netfilter: bridge: add support for pppoe filtering
arm64: dts: qcom: msm8916: fix MMC controller aliases
cgroup: Trace event cgroup id fields should be u64
ACPI: EC: Rework flushing of EC work while suspended to idle
thermal/drivers/imx8mm: Enable ADC when enabling monitor
drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()
arm64: dts: ti: k3-j7200: Fix the L2 cache sets
arm64: dts: ti: k3-j721e: Fix the L2 cache sets
arm64: dts: ti: k3-j7200: Correct the d-cache-sets info
tty: serial: uartlite: allow 64 bit address
serial: amba-pl011: do not request memory region twice
floppy: Fix hang in watchdog when disk is ejected
staging: rtl8192e: return error code from rtllib_softmac_init()
staging: rtl8192e: rtllib_module: fix error handle case in alloc_rtllib()
Bluetooth: btmtksdio: fix resume failure
sched/fair: Fix detection of per-CPU kthreads waking a task
sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity
bpf: Adjust BTF log size limit.
bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD)
bpf: Remove config check to enable bpf support for branch records
arm64: lib: Annotate {clear, copy}_page() as position-independent
arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
media: dib8000: Fix a memleak in dib8000_init()
media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach()
media: si2157: Fix "warm" tuner state detection
wireless: iwlwifi: Fix a double free in iwl_txq_dyn_alloc_dma
sched/rt: Try to restart rt period timer when rt runtime exceeded
drm/msm/dp: displayPort driver need algorithm rational
rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
mwifiex: Fix possible ABBA deadlock
xfrm: fix a small bug in xfrm_sa_len()
x86/uaccess: Move variable into switch case statement
selftests: clone3: clone3: add case CLONE3_ARGS_NO_TEST
selftests: harness: avoid false negatives if test has no ASSERTs
crypto: stm32 - Fix last sparse warning in stm32_cryp_check_ctr_counter
crypto: stm32/cryp - fix CTR counter carry
crypto: stm32/cryp - fix xts and race condition in crypto_engine requests
crypto: stm32/cryp - check early input data
crypto: stm32/cryp - fix double pm exit
crypto: stm32/cryp - fix lrw chaining mode
crypto: stm32/cryp - fix bugs and crash in tests
crypto: stm32 - Revert broken pm_runtime_resume_and_get changes
ath11k: Fix deleting uninitialized kernel timer during fragment cache flush
ARM: dts: gemini: NAS4220-B: fis-index-block with 128 KiB sectors
media: dw2102: Fix use after free
media: msi001: fix possible null-ptr-deref in msi001_probe()
media: coda/imx-vdoa: Handle dma_set_coherent_mask error codes
ath11k: Fix a NULL pointer dereference in ath11k_mac_op_hw_scan()
arm64: dts: qcom: c630: Fix soundcard setup
arm64: dts: qcom: ipq6018: Fix gpio-ranges property
drm/msm/dpu: fix safe status debugfs file
drm/bridge: ti-sn65dsi86: Set max register for regmap
drm/tegra: vic: Fix DMA API misuse
media: hantro: Fix probe func error path
xfrm: interface with if_id 0 should return error
xfrm: state and policy should fail if XFRMA_IF_ID 0
ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding
usb: ftdi-elan: fix memory leak on device disconnect
arm64: dts: marvell: cn9130: add GPIO and SPI aliases
arm64: dts: marvell: cn9130: enable CP0 GPIO controllers
ARM: dts: armada-38x: Add generic compatible to UART nodes
iwlwifi: mvm: fix 32-bit build in FTM
iwlwifi: mvm: test roc running status bits before removing the sta
mmc: meson-mx-sdhc: add IRQ check
mmc: meson-mx-sdio: add IRQ check
selinux: fix potential memleak in selinux_add_opt()
um: fix ndelay/udelay defines
um: virtio_uml: Fix time-travel external time propagation
Bluetooth: L2CAP: Fix using wrong mode
bpftool: Enable line buffering for stdout
backlight: qcom-wled: Validate enabled string indices in DT
backlight: qcom-wled: Pass number of elements to read to read_u32_array
backlight: qcom-wled: Fix off-by-one maximum with default num_strings
backlight: qcom-wled: Override default length with qcom,enabled-strings
backlight: qcom-wled: Use cpu_to_le16 macro to perform conversion
backlight: qcom-wled: Respect enabled-strings in set_brightness
software node: fix wrong node passed to find nargs_prop
Bluetooth: hci_qca: Stop IBS timer during BT OFF
x86/boot/compressed: Move CLANG_FLAGS to beginning of KBUILD_CFLAGS
hwmon: (mr75203) fix wrong power-up delay value
x86/mce/inject: Avoid out-of-bounds write when setting flags
ACPI: scan: Create platform device for BCM4752 and LNV4752 ACPI nodes
pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region()
pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region()
power: reset: mt6397: Check for null res pointer
netfilter: ipt_CLUSTERIP: fix refcount leak in clusterip_tg_check()
bpf: Don't promote bogus looking registers after null check.
bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt().
netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone
ppp: ensure minimum packet size in ppp_write()
rocker: fix a sleeping in atomic bug
staging: greybus: audio: Check null pointer
fsl/fman: Check for null pointer after calling devm_ioremap
Bluetooth: hci_bcm: Check for error irq
Bluetooth: hci_qca: Fix NULL vs IS_ERR_OR_NULL check in qca_serdev_probe
usb: dwc3: qcom: Fix NULL vs IS_ERR checking in dwc3_qcom_probe
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_init
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_get_str_desc
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_huion_init
HID: hid-uclogic-params: Invalid parameter check in uclogic_params_frame_init_v1_buttonpad
debugfs: lockdown: Allow reading debugfs files that are not world readable
net/mlx5e: Fix page DMA map/unmap attributes
net/mlx5e: Don't block routes with nexthop objects in SW
Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"
net/mlx5: Set command entry semaphore up once got index free
lib/mpi: Add the return value check of kcalloc()
Bluetooth: L2CAP: uninitialized variables in l2cap_sock_setsockopt()
spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
ax25: uninitialized variable in ax25_setsockopt()
netrom: fix api breakage in nr_setsockopt()
regmap: Call regmap_debugfs_exit() prior to _init()
can: mcp251xfd: add missing newline to printed strings
tpm: add request_locality before write TPM_INT_ENABLE
tpm_tis: Fix an error handling path in 'tpm_tis_core_init()'
can: softing: softing_startstop(): fix set but not used variable warning
can: xilinx_can: xcan_probe(): check for error irq
pcmcia: fix setting of kthread task states
iwlwifi: mvm: Use div_s64 instead of do_div in iwl_mvm_ftm_rtt_smoothing()
net: mcs7830: handle usb read errors properly
ext4: avoid trim error on fs with small groups
ALSA: jack: Add missing rwsem around snd_ctl_remove() calls
ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls
ALSA: hda: Add missing rwsem around snd_ctl_remove() calls
RDMA/bnxt_re: Scan the whole bitmap when checking if "disabling RCFW with pending cmd-bit"
RDMA/hns: Validate the pkey index
scsi: pm80xx: Update WARN_ON check in pm8001_mpi_build_cmd()
clk: imx8mn: Fix imx8mn_clko1_sels
powerpc/prom_init: Fix improper check of prom_getprop()
ASoC: uniphier: drop selecting non-existing SND_SOC_UNIPHIER_AIO_DMA
dt-bindings: thermal: Fix definition of cooling-maps contribution property
powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
powerpc/perf: MMCR0 control for PMU registers under PMCC=00
powerpc/perf: move perf irq/nmi handling details into traps.c
powerpc/irq: Add helper to set regs->softe
powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
powerpc/32s: Fix shift-out-of-bounds in KASAN init
clocksource: Reduce clocksource-skew threshold
clocksource: Avoid accidental unstable marking of clocksources
ALSA: oss: fix compile error when OSS_DEBUG is enabled
ALSA: usb-audio: Drop superfluous '0' in Presonus Studio 1810c's ID
char/mwave: Adjust io port register size
binder: fix handling of error during copy
openrisc: Add clone3 ABI wrapper
iommu/io-pgtable-arm: Fix table descriptor paddr formatting
scsi: ufs: Fix race conditions related to driver data
RDMA/qedr: Fix reporting max_{send/recv}_wr attrs
PCI/MSI: Fix pci_irq_vector()/pci_irq_get_affinity()
powerpc/powermac: Add additional missing lockdep_register_key()
RDMA/core: Let ib_find_gid() continue search even after empty entry
RDMA/cma: Let cma_resolve_ib_dev() continue search even after empty entry
ASoC: rt5663: Handle device_property_read_u32_array error codes
of: unittest: fix warning on PowerPC frame size warning
of: unittest: 64 bit dma address test requires arch support
clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after system enter shell
mips: add SYS_HAS_CPU_MIPS64_R5 config for MIPS Release 5 support
mips: fix Kconfig reference to PHYS_ADDR_T_64BIT
dmaengine: pxa/mmp: stop referencing config->slave_id
iommu/amd: Remove iommu_init_ga()
iommu/amd: Restore GA log/tail pointer on host resume
ASoC: Intel: catpt: Test dmaengine_submit() result before moving on
iommu/iova: Fix race between FQ timeout and teardown
scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()
phy: uniphier-usb3ss: fix unintended writing zeros to PHY register
ASoC: mediatek: Check for error clk pointer
ASoC: samsung: idma: Check of ioremap return value
misc: lattice-ecp3-config: Fix task hung when firmware load failed
counter: stm32-lptimer-cnt: remove iio counter abi
arm64: tegra: Fix Tegra194 HDA {clock,reset}-names ordering
arm64: tegra: Remove non existent Tegra194 reset
mips: lantiq: add support for clk_set_parent()
mips: bcm63xx: add support for clk_set_parent()
powerpc/xive: Add missing null check after calling kmalloc
ASoC: fsl_mqs: fix MODULE_ALIAS
RDMA/cxgb4: Set queue pair state when being queried
ASoC: fsl_asrc: refine the check of available clock divider
clk: bm1880: remove kfrees on static allocations
of: base: Fix phandle argument length mismatch error message
ARM: dts: omap3-n900: Fix lp5523 for multi color
Bluetooth: Fix debugfs entry leak in hci_register_dev()
fs: dlm: filter user dlm messages for kernel locks
drm/lima: fix warning when CONFIG_DEBUG_SG=y & CONFIG_DMA_API_DEBUG=y
selftests/bpf: Fix bpf_object leak in skb_ctx selftest
ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply
drm/bridge: dw-hdmi: handle ELD when DRM_BRIDGE_ATTACH_NO_CONNECTOR
drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR
media: atomisp: fix try_fmt logic
media: atomisp: set per-device's default mode
media: atomisp-ov2680: Fix ov2680_set_fmt() clobbering the exposure
ARM: shmobile: rcar-gen2: Add missing of_node_put()
batman-adv: allow netlink usage in unprivileged containers
media: atomisp: handle errors at sh_css_create_isp_params()
ath11k: Fix crash caused by uninitialized TX ring
usb: gadget: f_fs: Use stream_open() for endpoint files
drm: panel-orientation-quirks: Add quirk for the Lenovo Yoga Book X91F/L
HID: apple: Do not reset quirks when the Fn key is not found
media: b2c2: Add missing check in flexcop_pci_isr:
EDAC/synopsys: Use the quirk for version instead of ddr version
ARM: imx: rename DEBUG_IMX21_IMX27_UART to DEBUG_IMX27_UART
drm/amd/display: check top_pipe_to_program pointer
drm/amdgpu/display: set vblank_disable_immediate for DC
soc: ti: pruss: fix referenced node in error message
mlxsw: pci: Add shutdown method in PCI driver
drm/bridge: megachips: Ensure both bridges are probed before registration
tty: serial: imx: disable UCR4_OREN in .stop_rx() instead of .shutdown()
gpiolib: acpi: Do not set the IRQ type if the IRQ is already in use
HSI: core: Fix return freed object in hsi_new_client
crypto: jitter - consider 32 LSB for APT
mwifiex: Fix skb_over_panic in mwifiex_usb_recv()
rsi: Fix use-after-free in rsi_rx_done_handler()
rsi: Fix out-of-bounds read in rsi_read_pkt()
ath11k: Avoid NULL ptr access during mgmt tx cleanup
media: venus: avoid calling core_clk_setrate() concurrently during concurrent video sessions
ACPI / x86: Drop PWM2 device on Lenovo Yoga Book from always present table
ACPI: Change acpi_device_always_present() into acpi_device_override_status()
ACPI / x86: Allow specifying acpi_device_override_status() quirks by path
ACPI / x86: Add not-present quirk for the PCI0.SDHB.BRC1 device on the GPD win
arm64: dts: ti: j7200-main: Fix 'dtbs_check' serdes_ln_ctrl node
usb: uhci: add aspeed ast2600 uhci support
floppy: Add max size check for user space request
x86/mm: Flush global TLB when switching to trampoline page-table
drm: rcar-du: Fix CRTC timings when CMM is used
media: uvcvideo: Increase UVC_CTRL_CONTROL_TIMEOUT to 5 seconds.
media: rcar-vin: Update format alignment constraints
media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach()
media: m920x: don't use stack on USB reads
thunderbolt: Runtime PM activate both ends of the device link
iwlwifi: mvm: synchronize with FW after multicast commands
iwlwifi: mvm: avoid clearing a just saved session protection id
ath11k: avoid deadlock by change ieee80211_queue_work for regd_update_work
ath10k: Fix tx hanging
net-sysfs: update the queue counts in the unregistration path
net: phy: prefer 1000baseT over 1000baseKX
gpio: aspeed: Convert aspeed_gpio.lock to raw_spinlock
selftests/ftrace: make kprobe profile testcase description unique
ath11k: Avoid false DEADLOCK warning reported by lockdep
x86/mce: Allow instrumentation during task work queueing
x86/mce: Mark mce_panic() noinstr
x86/mce: Mark mce_end() noinstr
x86/mce: Mark mce_read_aux() noinstr
net: bonding: debug: avoid printing debug logs when bond is not notifying peers
bpf: Do not WARN in bpf_warn_invalid_xdp_action()
HID: quirks: Allow inverting the absolute X/Y values
media: igorplugusb: receiver overflow should be reported
media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach()
mmc: core: Fixup storing of OCR for MMC_QUIRK_NONSTD_SDIO
audit: ensure userspace is penalized the same as the kernel when under pressure
arm64: dts: ls1028a-qds: move rtc node to the correct i2c bus
arm64: tegra: Adjust length of CCPLEX cluster MMIO region
PM: runtime: Add safety net to supplier device release
cpufreq: Fix initialization of min and max frequency QoS requests
usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0
ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream
rtw88: 8822c: update rx settings to prevent potential hw deadlock
PM: AVS: qcom-cpr: Use div64_ul instead of do_div
iwlwifi: fix leaks/bad data after failed firmware load
iwlwifi: remove module loading failure message
iwlwifi: mvm: Fix calculation of frame length
iwlwifi: pcie: make sure prph_info is set when treating wakeup IRQ
um: registers: Rename function names to avoid conflicts and build problems
ath11k: Fix napi related hang
Bluetooth: vhci: Set HCI_QUIRK_VALID_LE_STATES
xfrm: rate limit SA mapping change message to user space
drm/etnaviv: consider completed fence seqno in hang check
jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
ACPICA: actypes.h: Expand the ACPI_ACCESS_ definitions
ACPICA: Utilities: Avoid deleting the same object twice in a row
ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
ACPICA: Fix wrong interpretation of PCC address
ACPICA: Hardware: Do not flush CPU cache when entering S4 and S5
drm/amdgpu: fixup bad vram size on gmc v8
amdgpu/pm: Make sysfs pm attributes as read-only for VFs
ACPI: battery: Add the ThinkPad "Not Charging" quirk
btrfs: remove BUG_ON() in find_parent_nodes()
btrfs: remove BUG_ON(!eie) in find_parent_nodes
net: mdio: Demote probed message to debug print
mac80211: allow non-standard VHT MCS-10/11
dm btree: add a defensive bounds check to insert_at()
dm space map common: add bounds check to sm_ll_lookup_bitmap()
mlxsw: pci: Avoid flow control for EMAD packets
net: phy: marvell: configure RGMII delays for 88E1118
net: gemini: allow any RGMII interface mode
regulator: qcom_smd: Align probe function with rpmh-regulator
serial: pl010: Drop CR register reset on set_termios
serial: core: Keep mctrl register state and cached copy in sync
random: do not throw away excess input to crng_fast_load
parisc: Avoid calling faulthandler_disabled() twice
x86/kbuild: Enable CONFIG_KALLSYMS_ALL=y in the defconfigs
powerpc/6xx: add missing of_node_put
powerpc/powernv: add missing of_node_put
powerpc/cell: add missing of_node_put
powerpc/btext: add missing of_node_put
powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
i2c: i801: Don't silently correct invalid transfer size
powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
i2c: mpc: Correct I2C reset procedure
clk: meson: gxbb: Fix the SDM_EN bit for MPLL0 on GXBB
powerpc/powermac: Add missing lockdep_register_key()
KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
w1: Misuse of get_user()/put_user() reported by sparse
nvmem: core: set size for sysfs bin file
dm: fix alloc_dax error handling in alloc_dev
scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup
ALSA: seq: Set upper limit of processed events
MIPS: Loongson64: Use three arguments for slti
powerpc/40x: Map 32Mbytes of memory at startup
selftests/powerpc/spectre_v2: Return skip code when miss_percent is high
powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
udf: Fix error handling in udf_new_inode()
MIPS: OCTEON: add put_device() after of_find_device_by_node()
irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters
MIPS: Octeon: Fix build errors using clang
scsi: sr: Don't use GFP_DMA
ASoC: mediatek: mt8173: fix device_node leak
ASoC: mediatek: mt8183: fix device_node leak
phy: mediatek: Fix missing check in mtk_mipi_tx_probe
rpmsg: core: Clean up resources on announce_create failure.
crypto: omap-aes - Fix broken pm_runtime_and_get() usage
crypto: stm32/crc32 - Fix kernel BUG triggered in probe()
crypto: caam - replace this_cpu_ptr with raw_cpu_ptr
ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
tpm: fix NPE on probe for missing device
spi: uniphier: Fix a bug that doesn't point to private data correctly
xen/gntdev: fix unmap notification order
fuse: Pass correct lend value to filemap_write_and_wait_range()
serial: Fix incorrect rs485 polarity on uart open
cputime, cpuacct: Include guest time in user time in cpuacct.stat
tracing/kprobes: 'nmissed' not showed correctly for kretprobe
iwlwifi: mvm: Increase the scan timeout guard to 30 seconds
s390/mm: fix 2KB pgtable release race
device property: Fix fwnode_graph_devcon_match() fwnode leak
drm/etnaviv: limit submit sizes
drm/nouveau/kms/nv04: use vzalloc for nv04_display
drm/bridge: analogix_dp: Make PSR-exit block less
parisc: Fix lpa and lpa_user defines
powerpc/64s/radix: Fix huge vmap false positive
PCI: xgene: Fix IB window setup
PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
PCI: pci-bridge-emul: Make expansion ROM Base Address register read-only
PCI: pci-bridge-emul: Properly mark reserved PCIe bits in PCI config space
PCI: pci-bridge-emul: Fix definitions of reserved bits
PCI: pci-bridge-emul: Correctly set PCIe capabilities
PCI: pci-bridge-emul: Set PCI_STATUS_CAP_LIST for PCIe device
xfrm: fix policy lookup for ipv6 gre packets
btrfs: fix deadlock between quota enable and other quota operations
btrfs: check the root node for uptodate before returning it
btrfs: respect the max size in the header when activating swap file
ext4: make sure to reset inode lockdep class when quota enabling fails
ext4: make sure quota gets properly shutdown on error
ext4: fix a possible ABBA deadlock due to busy PA
ext4: initialize err_blk before calling __ext4_get_inode_loc
ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGE
ext4: set csum seed in tmp inode while migrating to extents
ext4: Fix BUG_ON in ext4_bread when write quota data
ext4: use ext4_ext_remove_space() for fast commit replay delete range
ext4: fast commit may miss tracking unwritten range during ftruncate
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
ext4: don't use the orphan list when migrating an inode
drm/radeon: fix error handling in radeon_driver_open_kms
of: base: Improve argument length mismatch error
firmware: Update Kconfig help text for Google firmware
can: mcp251xfd: mcp251xfd_tef_obj_read(): fix typo in error message
media: rcar-csi2: Optimize the selection PHTW register
drm/vc4: hdmi: Make sure the device is powered with CEC
media: correct MEDIA_TEST_SUPPORT help text
Documentation: dmaengine: Correctly describe dmatest with channel unset
Documentation: ACPI: Fix data node reference documentation
Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
Documentation: fix firewire.rst ABI file path error
Bluetooth: hci_sync: Fix not setting adv set duration
scsi: core: Show SCMD_LAST in text form
dmaengine: uniphier-xdmac: Fix type of address variables
RDMA/hns: Modify the mapping attribute of doorbell to device
RDMA/rxe: Fix a typo in opcode name
dmaengine: stm32-mdma: fix STM32_MDMA_CTBR_TSEL_MASK
Revert "net/mlx5: Add retry mechanism to the command entry index allocation"
powerpc/cell: Fix clang -Wimplicit-fallthrough warning
powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
block: Fix fsync always failed if once failed
bpftool: Remove inclusion of utilities.mak from Makefiles
xdp: check prog type before updating BPF link
perf evsel: Override attr->sample_period for non-libpfm4 events
ipv4: update fib_info_cnt under spinlock protection
ipv4: avoid quadratic behavior in netns dismantle
net/fsl: xgmac_mdio: Add workaround for erratum A-009885
net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries
f2fs: compress: fix potential deadlock of compress file
f2fs: fix to reserve space for IO align feature
af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress
clk: Emit a stern warning with writable debugfs enabled
clk: si5341: Fix clock HW provider cleanup
net/smc: Fix hung_task when removing SMC-R devices
net: axienet: increase reset timeout
net: axienet: Wait for PhyRstCmplt after core reset
net: axienet: reset core on initialization prior to MDIO access
net: axienet: add missing memory barriers
net: axienet: limit minimum TX ring size
net: axienet: Fix TX ring slot available check
net: axienet: fix number of TX ring slots for available check
net: axienet: fix for TX busy handling
net: axienet: increase default TX ring size to 128
HID: vivaldi: fix handling devices not using numbered reports
rtc: pxa: fix null pointer dereference
vdpa/mlx5: Fix wrong configuration of virtio_version_1_0
virtio_ring: mark ring unused on error
taskstats: Cleanup the use of task->exit_code
inet: frags: annotate races around fqdir->dead and fqdir->high_thresh
netns: add schedule point in ops_exit_list()
xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst()
libcxgb: Don't accidentally set RTO_ONLINK in cxgb_find_route()
perf script: Fix hex dump character output
dmaengine: at_xdmac: Don't start transactions at tx_submit level
dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending
dmaengine: at_xdmac: Print debug message after realeasing the lock
dmaengine: at_xdmac: Fix concurrency over xfers_list
dmaengine: at_xdmac: Fix lld view setting
dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
perf probe: Fix ppc64 'perf probe add events failed' case
devlink: Remove misleading internal_flags from health reporter dump
arm64: dts: qcom: msm8996: drop not documented adreno properties
net: bonding: fix bond_xmit_broadcast return value error bug
net_sched: restore "mpu xxx" handling
bcmgenet: add WOL IRQ check
net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config()
net: sfp: fix high power modules without diagnostic monitoring
net: mscc: ocelot: fix using match before it is set
dt-bindings: display: meson-dw-hdmi: add missing sound-name-prefix property
dt-bindings: display: meson-vpu: Add missing amlogic,canvas property
dt-bindings: watchdog: Require samsung,syscon-phandle for Exynos7
scripts/dtc: dtx_diff: remove broken example from help text
lib82596: Fix IRQ check in sni_82596_probe
mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault
lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test
mtd: nand: bbt: Fix corner case in bad block table handling
ath10k: Fix the MTU size on QCA9377 SDIO
scripts: sphinx-pre-install: add required ctex dependency
scripts: sphinx-pre-install: Fix ctex support on Debian
Linux 5.10.94
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I857f2417c899508815a1ba13d1285fd400a1f133
commit 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 upstream.
In the function bacct_add_task the code reading task->exit_code was
introduced in commit f3cef7a994 ("[PATCH] csa: basic accounting over
taskstats"), and it is not entirely clear what the taskstats interface
is trying to return as only returning the exit_code of the first task
in a process doesn't make a lot of sense.
As best as I can figure the intent is to return task->exit_code after
a task exits. The field is returned with per task fields, so the
exit_code of the entire process is not wanted. Only the value of the
first task is returned so this is not a useful way to get the per task
ptrace stop code. The ordinary case of returning this value is
returning after a task exits, which also precludes use for getting
a ptrace value.
It is common to for the first task of a process to also be the last
task of a process so this field may have done something reasonable by
accident in testing.
Make ac_exitcode a reliable per task value by always returning it for
every exited task.
Setting ac_exitcode in a sensible mannter makes it possible to continue
to provide this value going forward.
Cc: Balbir Singh <bsingharora@gmail.com>
Fixes: f3cef7a994 ("[PATCH] csa: basic accounting over taskstats")
Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8f110f530635af44fff1f4ee100ecef0bac62510 ]
Due to the audit control mutex necessary for serializing audit
userspace messages we haven't been able to block/penalize userspace
processes that attempt to send audit records while the system is
under audit pressure. The result is that privileged userspace
applications have a priority boost with respect to audit as they are
not bound by the same audit queue throttling as the other tasks on
the system.
This patch attempts to restore some balance to the system when under
audit pressure by blocking these privileged userspace tasks after
they have finished their audit processing, and dropped the audit
control mutex, but before they return to userspace.
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c86ff8c55b8ae68837b2fa59dc0c203907e9a15f ]
Since commit db3a34e17433 ("clocksource: Retry clock read if long delays
detected") and commit 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold"), it is found that tsc clocksource fallback to hpet can
sometimes happen on both Intel and AMD systems especially when they are
running stressful benchmarking workloads. Of the 23 systems tested with
a v5.14 kernel, 10 of them have switched to hpet clock source during
the test run.
The result of falling back to hpet is a drastic reduction of performance
when running benchmarks. For example, the fio performance tests can
drop up to 70% whereas the iperf3 performance can drop up to 80%.
4 hpet fallbacks happened during bootup. They were:
[ 8.749399] clocksource: timekeeping watchdog on CPU13: hpet read-back delay of 263750ns, attempt 4, marking unstable
[ 12.044610] clocksource: timekeeping watchdog on CPU19: hpet read-back delay of 186166ns, attempt 4, marking unstable
[ 17.336941] clocksource: timekeeping watchdog on CPU28: hpet read-back delay of 182291ns, attempt 4, marking unstable
[ 17.518565] clocksource: timekeeping watchdog on CPU34: hpet read-back delay of 252196ns, attempt 4, marking unstable
Other fallbacks happen when the systems were running stressful
benchmarks. For example:
[ 2685.867873] clocksource: timekeeping watchdog on CPU117: hpet read-back delay of 57269ns, attempt 4, marking unstable
[46215.471228] clocksource: timekeeping watchdog on CPU8: hpet read-back delay of 61460ns, attempt 4, marking unstable
Commit 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold"),
changed the skew margin from 100us to 50us. I think this is too small
and can easily be exceeded when running some stressful workloads on a
thermally stressed system. So it is switched back to 100us.
Even a maximum skew margin of 100us may be too small in for some systems
when booting up especially if those systems are under thermal stress. To
eliminate the case that the large skew is due to the system being too
busy slowing down the reading of both the watchdog and the clocksource,
an extra consecutive read of watchdog clock is being done to check this.
The consecutive watchdog read delay is compared against
WATCHDOG_MAX_SKEW/2. If the delay exceeds the limit, we assume that
the system is just too busy. A warning will be printed to the console
and the clock skew check is skipped for this round.
Fixes: db3a34e17433 ("clocksource: Retry clock read if long delays detected")
Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2e27e793e280ff12cb5c202a1214c08b0d3a0f26 ]
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
a 500-millisecond WATCHDOG_INTERVAL. This requires that clocks be skewed
by more than 12.5% in order to be marked unstable. Except that a clock
that is skewed by that much is probably destroying unsuspecting software
right and left. And given that there are now checks for false-positive
skews due to delays between reading the two clocks, it should be possible
to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
such as TSC.
Therefore, add a new uncertainty_margin field to the clocksource structure
that contains the maximum uncertainty in nanoseconds for the corresponding
clock. This field may be initialized manually, as it is for
clocksource_tsc_early and clocksource_jiffies, which is copied to
refined_jiffies. If the field is not initialized manually, it will be
computed at clock-registry time as the period of the clock in question
based on the scale and freq parameters to __clocksource_update_freq_scale()
function. If either of those two parameters are zero, the
tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative
to dividing by zero. No matter how the uncertainty_margin field is
calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100
microseconds.
Note that manually initialized uncertainty_margin fields are not adjusted,
but there is a WARN_ON_ONCE() that triggers if any such field is less than
twice WATCHDOG_MAX_SKEW. This WARN_ON_ONCE() is intended to discourage
production use of the one-nanosecond uncertainty_margin values that are
used to test the clock-skew code itself.
The actual clock-skew check uses the sum of the uncertainty_margin fields
of the two clocksource structures being compared. Integer overflow is
avoided because the largest computed value of the uncertainty_margin
fields is one billion (10^9), and double that value fits into an
unsigned int. However, if someone manually specifies (say) UINT_MAX,
they will get what they deserve.
Note that the refined_jiffies uncertainty_margin field is initialized to
TICK_NSEC, which means that skew checks involving this clocksource will
be sufficently forgiving. In a similar vein, the clocksource_tsc_early
uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
replicates the current behavior and allows custom setting if needed
in order to address the rare skews detected for this clocksource in
current mainline.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e60b0d12a95dcf16a63225cead4541567f5cb517 ]
If we ever get to a point again where we convert a bogus looking <ptr>_or_null
typed register containing a non-zero fixed or variable offset, then lets not
reset these bounds to zero since they are not and also don't promote the register
to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown
register could be an avenue as well, but then if we run into this case it would
allow to leak a kernel pointer this way.
Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b ]
Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which
takes two passes over the leaf rcu_node structure's CPUs. The first
pass gathers up the current CPU and CPUs that are in dynticks idle mode.
The workqueue will report a quiescent state on their behalf later.
The second pass sends IPIs to the rest of the CPUs, but excludes the
current CPU, incorrectly assuming it has been included in the first
pass's list of CPUs.
Unfortunately the current CPU may have changed between the first and
second pass, due to the fact that the various rcu_node structures'
->lock fields have been dropped, thus momentarily enabling preemption.
This means that if the second pass's CPU was not on the first pass's
list, it will be ignored completely. There will be no IPI sent to
it, and there will be no reporting of quiescent states on its behalf.
Unfortunately, the expedited grace period will nevertheless be waiting
for that CPU to report a quiescent state, but with that CPU having no
reason to believe that such a report is needed.
The result will be an expedited grace period stall.
Fix this by no longer excluding the current CPU from consideration during
the second pass.
Fixes: b9ad4d6ed1 ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()")
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b58e976b3b391c0cf02e038d53dd0478ed3013c ]
When rt_runtime is modified from -1 to a valid control value, it may
cause the task to be throttled all the time. Operations like the following
will trigger the bug. E.g:
1. echo -1 > /proc/sys/kernel/sched_rt_runtime_us
2. Run a FIFO task named A that executes while(1)
3. echo 950000 > /proc/sys/kernel/sched_rt_runtime_us
When rt_runtime is -1, The rt period timer will not be activated when task
A enqueued. And then the task will be throttled after setting rt_runtime to
950,000. The task will always be throttled because the rt period timer is
not activated.
Fixes: d0b27fa778 ("sched: rt-group: synchonised bandwidth period")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211203033618.11895-1-hucool.lihua@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit db52f57211b4e45f0ebb274e2c877b211dc18591 ]
Branch data available to BPF programs can be very useful to get stack traces
out of userspace application.
Commit fff7b64355 ("bpf: Add bpf_read_branch_records() helper") added BPF
support to capture branch records in x86. Enable this feature also for other
architectures as well by removing checks specific to x86.
If an architecture doesn't support branch records, bpf_read_branch_records()
still has appropriate checks and it will return an -EINVAL in that scenario.
Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures
should return -ENOENT in such case. Hence, update the appropriate check to
return -ENOENT instead.
Selftest 'perf_branches' result on power9 machine which has the branch stacks
support:
- Before this patch:
[command]# ./test_progs -t perf_branches
#88/1 perf_branches/perf_branches_hw:FAIL
#88/2 perf_branches/perf_branches_no_hw:OK
#88 perf_branches:FAIL
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
- After this patch:
[command]# ./test_progs -t perf_branches
#88/1 perf_branches/perf_branches_hw:OK
#88/2 perf_branches/perf_branches_no_hw:OK
#88 perf_branches:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Selftest 'perf_branches' result on power9 machine which doesn't have branch
stack report:
- After this patch:
[command]# ./test_progs -t perf_branches
#88/1 perf_branches/perf_branches_hw:SKIP
#88/2 perf_branches/perf_branches_no_hw:OK
#88 perf_branches:OK
Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED
Fixes: fff7b64355 ("bpf: Add bpf_read_branch_records() helper")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 866de407444398bc8140ea70de1dba5f91cc34ac ]
BPF_LOG_KERNEL is only used internally, so disallow bpf_btf_load()
to set log level as BPF_LOG_KERNEL. The same checking has already
been done in bpf_check(), so factor out a helper to check the
validity of log attributes and use it in both places.
Fixes: 8580ac9404 ("bpf: Process in-kernel BTF")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211203053001.740945-1-houtao1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5a2d43e998a821701029f23e25b62f9188e93ff ]
Make BTF log size limit to be the same as the verifier log size limit.
Otherwise tools that progressively increase log size and use the same log
for BTF loading and program loading will be hitting hard to debug EINVAL.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211201181040.23337-7-alexei.starovoitov@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 014ba44e8184e1acf93e0cbb7089ee847802f8f0 ]
select_idle_sibling() has a special case for tasks woken up by a per-CPU
kthread where the selected CPU is the previous one. For asymmetric CPU
capacity systems, the assumption was that the wakee couldn't have a
bigger utilization during task placement than it used to have during the
last activation. That was not considering uclamp.min which can completely
change between two task activations and as a consequence mandates the
fitness criterion asym_fits_capacity(), even for the exit path described
above.
Fixes: b4c9c9f156 ("sched/fair: Prefer prev cpu in asymmetric wakeup path")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20211129173115.4006346-1-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b4e74ccb582797f6f0b0a50372ebd9fd2372a27 ]
select_idle_sibling() has a special case for tasks woken up by a per-CPU
kthread, where the selected CPU is the previous one. However, the current
condition for this exit path is incomplete. A task can wake up from an
interrupt context (e.g. hrtimer), while a per-CPU kthread is running. A
such scenario would spuriously trigger the special case described above.
Also, a recent change made the idle task like a regular per-CPU kthread,
hence making that situation more likely to happen
(is_per_cpu_kthread(swapper) being true now).
Checking for task context makes sure select_idle_sibling() will not
interpret a wake up from any other context as a wake up by a per-CPU
kthread.
Fixes: 52262ee567 ("sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression")
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20211201143450.479472-1-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a674e48c5443d12a8a43c3ac42367aa39505d506 upstream.
Currently three dma atomic pools are initialized as long as the relevant
kernel codes are built in. While in kdump kernel of x86_64, this is not
right when trying to create atomic_pool_dma, because there's no managed
pages in DMA zone. In the case, DMA zone only has low 1M memory
presented and locked down by memblock allocator. So no pages are added
into buddy of DMA zone. Please check commit f1d4d47c5851 ("x86/setup:
Always reserve the first 1M of RAM").
Then in kdump kernel of x86_64, it always prints below failure message:
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
Call Trace:
dump_stack+0x7f/0xa1
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xf29/0xf50
__alloc_pages+0x24d/0x2c0
alloc_page_interleave+0x13/0xb0
atomic_pool_expand+0x118/0x210
__dma_atomic_pool_init+0x45/0x93
dma_atomic_pool_init+0xdb/0x176
do_one_initcall+0x67/0x320
kernel_init_freeable+0x290/0x2dc
kernel_init+0xa/0x111
ret_from_fork+0x22/0x30
Mem-Info:
......
DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
Here, let's check if DMA zone has managed pages, then create
atomic_pool_dma if yes. Otherwise just skip it.
Link: https://lkml.kernel.org/r/20211223094435.248523-3-bhe@redhat.com
Fixes: 6f599d8423 ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
prb_next_seq() always iterates from the first known sequence number.
In the worst case, it might loop 8k times for 256kB buffer,
15k times for 512kB buffer, and 64k times for 2MB buffer.
It was reported that polling and reading using syslog interface
might occupy 50% of CPU.
Speedup the search by storing @id of the last finalized descriptor.
The loop is still needed because the @id is stored and read in the best
effort way. An atomic variable is used to keep the @id consistent.
But the stores and reads are not serialized against each other.
The descriptor could get reused in the meantime. The related sequence
number will be used only when it is still valid.
An invalid value should be read _only_ when there is a flood of messages
and the ringbuffer is rapidly reused. The performance is the least
problem in this case.
Bug: 216238044
Reported-by: Chunlei Wang <chunlei.wang@mediatek.com>
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/1642770388-17327-1-git-send-email-quic_mojha@quicinc.com
Link: https://lore.kernel.org/lkml/YXlddJxLh77DKfIO@alley/T/#m43062e8b2a17f8dbc8c6ccdb8851fb0dbaabbb14
(cherry picked from commit f244b4dc53e520d4570b2610436aba0593ce6f55
https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git printk-rework)
Change-Id: I30a71e934422600f6b356da25ab705f71f3ad904
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
commit 7dd5d437c258bbf4cc15b35229e5208b87b8b4e0 upstream.
In 32-bit architecture, the result of sizeof() is a 32-bit integer so
the expression becomes the multiplication between 2 32-bit integer which
can potentially leads to integer overflow. As a result,
bpf_map_area_alloc() allocates less memory than needed.
Fix this by casting 1 operand to u64.
Fixes: 0d2c4f964050 ("bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash maps")
Fixes: 99c51064fb ("devmap: Use bpf_map_area_alloc() for allocating hash buckets")
Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210613143440.71975-1-minhquangbui99@gmail.com
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I9ce1991224a87eb39acf1da4923534e22380fc42
This reverts parts of commit f7d52eda9f.
The hook android_vh_cpu_down is not used by any vendor, so remove it to
help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 176152285
Cc: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I66e5308f32b8b11741dd4ea659cd3f79fa9a6526
This reverts half of commit 496b17304a.
This hook is not used, so the param is not needed at all. Step on the
way to remove the hook entirely.
Bug: 203756332
Bug: 203492840
Cc: Liujie Xie <xieliujie@oppo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I84c021f1139e0bbadc3573211a8ee8a24273f5e9
This reverts commit 8f3f46d77c.
The hook android_vh_create_worker is not used by any vendor, so remove
it to help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 184571803
Cc: Liujie Xie <xieliujie@oppo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3112f4e067c27ecb6889333bb58f9342df0ecb63
This reverts commit c9b8fa644f.
The hooks android_vh_alloc_uid and android_vh_free_user are not used by
any vendor, so remove them to help with merge issues with future LTS
releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 187458531
Cc: heshuai1 <heshuai1@xiaomi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2aa3f95ab52c2d81eb790ba8b64b864f8189222b
This reverts part of commit 54f66141a8.
The hook android_rvh_entity_tick is not used by any vendor, so remove
it to help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 183674818
Cc: lijianzhong <lijianzhong@xiaomi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I92acc11802939d1ebfcc8253490193019567695d
This reverts part of commit 53e8099784.
The hook android_rvh_check_preempt_wakeup_ignore is not used by any
vendor, so remove it to help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 182441701
Cc: xieliujie <xieliujie@oppo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I69bdc14143760bfdd07bbd72ed3d5a4127fbc4c8
Changes in 5.10.93
kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test
devtmpfs regression fix: reconfigure on each mount
orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()
remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided
vfs: fs_context: fix up param length parsing in legacy_parse_param
perf: Protect perf_guest_cbs with RCU
KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest
KVM: s390: Clarify SIGP orders versus STOP/RESTART
9p: only copy valid iattrs in 9P2000.L setattr implementation
video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
media: uvcvideo: fix division by zero at stream start
rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled
firmware: qemu_fw_cfg: fix sysfs information leak
firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entries
firmware: qemu_fw_cfg: fix kobject leak in probe error path
KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
ALSA: hda/realtek: Add speaker fixup for some Yoga 15ITL5 devices
ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master after reboot from Windows
ALSA: hda: ALC287: Add Lenovo IdeaPad Slim 9i 14ITL5 speaker quirk
ALSA: hda/realtek: Add quirk for Legion Y9000X 2020
ALSA: hda/realtek: Re-order quirk entries for Lenovo
powerpc/pseries: Get entry and uaccess flush required bits from H_GET_CPU_CHARACTERISTICS
mtd: fixup CFI on ixp4xx
Linux 5.10.93
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6913f176d30f4c258f45327bd9bcb50deefcea98
commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream.
Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily,
all paths that read perf_guest_cbs already require RCU protection, e.g. to
protect the callback chains, so only the direct perf_guest_cbs touchpoints
need to be modified.
Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
Fixed via the READ_ONCE() in rcu_dereference().
Bug #2 is that on weakly-ordered architectures, updates to the callbacks
themselves are not guaranteed to be visible before the pointer is made
visible to readers. Fixed by the smp_store_release() in
rcu_assign_pointer() when the new pointer is non-NULL.
Bug #3 is that, because the callbacks are global, it's possible for
readers to run in parallel with an unregisters, and thus a module
implementing the callbacks can be unloaded while readers are in flight,
resulting in a use-after-free. Fixed by a synchronize_rcu() call when
unregistering callbacks.
Bug #1 escaped notice because it's extremely unlikely a compiler will
reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded
for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
guard all but guarantees the consumer will win the race, e.g. to nullify
perf_guest_cbs, KVM has to completely exit the guest and teardown down
all VMs before KVM start its module unload / unregister sequence. This
also makes it all but impossible to encounter bug #3.
Bug #2 has not been a problem because all architectures that register
callbacks are strongly ordered and/or have a static set of callbacks.
But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
kvm_intel module load/unload leads to:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:perf_misc_flags+0x1c/0x70
Call Trace:
perf_prepare_sample+0x53/0x6b0
perf_event_output_forward+0x67/0x160
__perf_event_overflow+0x52/0xf0
handle_pmi_common+0x207/0x300
intel_pmu_handle_irq+0xcf/0x410
perf_event_nmi_handler+0x28/0x50
nmi_handle+0xc7/0x260
default_do_nmi+0x6b/0x170
exc_nmi+0x103/0x130
asm_exc_nmi+0x76/0xbf
Fixes: 39447b386c ("perf: Enhance perf to allow for guest statistic collection from host")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.92
md: revert io stats accounting
workqueue: Fix unbind_workers() VS wq_worker_running() race
bpf: Fix out of bounds access from invalid *_or_null type verification
Bluetooth: btusb: fix memory leak in btusb_mtk_submit_wmt_recv_urb()
Bluetooth: btusb: Add two more Bluetooth parts for WCN6855
Bluetooth: btusb: Add support for Foxconn MT7922A
Bluetooth: btusb: Add support for Foxconn QCA 0xe0d0
Bluetooth: bfusb: fix division by zero in send path
ARM: dts: exynos: Fix BCM4330 Bluetooth reset polarity in I9100
USB: core: Fix bug in resuming hub's handling of wakeup requests
USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status
ath11k: Fix buffer overflow when scanning with extraie
mmc: sdhci-pci: Add PCI ID for Intel ADL
veth: Do not record rx queue hint in veth_xmit
mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe()
can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data
can: isotp: convert struct tpcon::{idx,len} to unsigned int
can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved}
random: fix data race on crng_node_pool
random: fix data race on crng init time
random: fix crash on multiple early calls to add_bootloader_randomness()
media: Revert "media: uvcvideo: Set unique vdev name based in type"
staging: wlan-ng: Avoid bitwise vs logical OR warning in hfa384x_usb_throttlefn()
drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk()
staging: greybus: fix stack size warning with UBSAN
Linux 5.10.92
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If1a622474ca6cad5fbe08c171396f3df521bd9a0
[ no upstream commit given implicitly fixed through the larger refactoring
in c25b2ae136039ffa820c26138ed4a5e5f3ab3841 ]
While auditing some other code, I noticed missing checks inside the pointer
arithmetic simulation, more specifically, adjust_ptr_min_max_vals(). Several
*_OR_NULL types are not rejected whereas they are _required_ to be rejected
given the expectation is that they get promoted into a 'real' pointer type
for the success case, that is, after an explicit != NULL check.
One case which stands out and is accessible from unprivileged (iff enabled
given disabled by default) is BPF ring buffer. From crafting a PoC, the NULL
check can be bypassed through an offset, and its id marking will then lead
to promotion of mem_or_null to a mem type.
bpf_ringbuf_reserve() helper can trigger this case through passing of reserved
flags, for example.
func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (7a) *(u64 *)(r10 -8) = 0
1: R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm
1: (18) r1 = 0x0
3: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm
3: (b7) r2 = 8
4: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R10=fp0 fp-8_w=mmmmmmmm
4: (b7) r3 = 0
5: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R3_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
5: (85) call bpf_ringbuf_reserve#131
6: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
6: (bf) r6 = r0
7: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
7: (07) r0 += 1
8: R0_w=mem_or_null(id=2,ref_obj_id=2,off=1,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
8: (15) if r0 == 0x0 goto pc+4
R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
9: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
9: (62) *(u32 *)(r6 +0) = 0
R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
10: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
10: (bf) r1 = r6
11: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
11: (b7) r2 = 0
12: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R2_w=invP0 R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
12: (85) call bpf_ringbuf_submit#132
13: R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm
13: (b7) r0 = 0
14: R0_w=invP0 R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm
14: (95) exit
from 8 to 13: safe
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
OK
All three commits, that is b121b341e5 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support"),
457f44363a ("bpf: Implement BPF ring buffer and verifier support for it"), and the
afbf21dce6 ("bpf: Support readonly/readwrite buffers in verifier") suffer the same
cause and their *_OR_NULL type pendants must be rejected in adjust_ptr_min_max_vals().
Make the test more robust by reusing reg_type_may_be_null() helper such that we catch
all *_OR_NULL types we have today and in future.
Note that pointer arithmetic on PTR_TO_BTF_ID, PTR_TO_RDONLY_BUF, and PTR_TO_RDWR_BUF
is generally allowed.
Fixes: b121b341e5 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support")
Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Fixes: afbf21dce6 ("bpf: Support readonly/readwrite buffers in verifier")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07edfece8bcb0580a1828d939e6f8d91a8603eb2 upstream.
At CPU-hotplug time, unbind_worker() may preempt a worker while it is
waking up. In that case the following scenario can happen:
unbind_workers() wq_worker_running()
-------------- -------------------
if (!(worker->flags & WORKER_NOT_RUNNING))
//PREEMPTED by unbind_workers
worker->flags |= WORKER_UNBOUND;
[...]
atomic_set(&pool->nr_running, 0);
//resume to worker
atomic_inc(&worker->pool->nr_running);
After unbind_worker() resets pool->nr_running, the value is expected to
remain 0 until the pool ever gets rebound in case cpu_up() is called on
the target CPU in the future. But here the race leaves pool->nr_running
with a value of 1, triggering the following warning when the worker goes
idle:
WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
Modules linked in:
CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Workqueue: 0x0 (rcu_par_gp)
RIP: 0010:worker_enter_idle+0x95/0xc0
Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0
RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
worker_thread+0x89/0x3d0
? process_one_work+0x400/0x400
kthread+0x162/0x190
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
</TASK>
Also due to this incorrect "nr_running == 1", further queued work may
end up not being served, because no worker is awaken at work insert time.
This raises rcutorture writer stalls for example.
Fix this with disabling preemption in the right place in
wq_worker_running().
It's worth noting that if the worker migrates and runs concurrently with
unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
due to set_cpus_allowed_ptr() acquiring/releasing rq->lock.
Fixes: 6d25be5782 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add one CONFIG to control removing the macros or not. On some platform,
configureing out the macros removes the associated members from the
structs, this reduces the object size of the slabs related with the
structs, therefore reduces the total slab memory consumption of system.
Besides, this also reduces vmlinux size a bit, therefore the total
kernel memory size increses a bit.
The macros are ANDROID_KABI_RESERVE, ANDROID_VENDOR_DATA,
ANDROID_VENDOR_DATA_ARRAY, ANDROID_OEM_DATA, ANDROID_OEM_DATA_ARRAY.
Bug: 206561931
Signed-off-by: Qingqing Zhou <quic_qqzhou@quicinc.com>
Change-Id: I0868d299ccce3c4b39f42af17916828500be6cc4
Changes in 5.10.91
f2fs: quota: fix potential deadlock
selftests: x86: fix [-Wstringop-overread] warn in test_process_vm_readv()
tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
tracing: Tag trace_percpu_buffer as a percpu pointer
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
i40e: Fix to not show opcode msg on unsuccessful VF MAC change
iavf: Fix limit of total number of queues to active queues of VF
RDMA/core: Don't infoleak GRH fields
netrom: fix copying in user data in nr_setsockopt
RDMA/uverbs: Check for null return of kmalloc_array
mac80211: initialize variable have_higher_than_11mbit
sfc: The RX page_ring is optional
i40e: fix use-after-free in i40e_sync_filters_subtask()
i40e: Fix for displaying message regarding NVM version
i40e: Fix incorrect netdev's real number of RX/TX queues
ftrace/samples: Add missing prototypes direct functions
ipv4: Check attribute length for RTA_GATEWAY in multipath route
ipv4: Check attribute length for RTA_FLOW in multipath route
ipv6: Check attribute length for RTA_GATEWAY in multipath route
ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route
lwtunnel: Validate RTA_ENCAP_TYPE attribute length
batman-adv: mcast: don't send link-local multicast to mcast routers
sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
net: ena: Fix undefined state when tx request id is out of bounds
net: ena: Fix error handling when calculating max IO queues number
xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
power: supply: core: Break capacity loop
power: reset: ltc2952: Fix use of floating point literals
rndis_host: support Hytera digital radios
phonet: refcount leak in pep_sock_accep
power: bq25890: Enable continuous conversion for ADC at charging
ipv6: Continue processing multipath route even if gateway attribute is invalid
ipv6: Do cleanup if attribute validation fails in multipath route
usb: mtu3: fix interval value for intr and isoc
scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
net: udp: fix alignment problem in udp4_seq_show()
atlantic: Fix buff_ring OOB in aq_ring_rx_clean
mISDN: change function names to avoid conflicts
drm/amd/display: Added power down for DCN10
ipv6: raw: check passed optlen before reading
ARM: dts: gpio-ranges property is now required
Input: zinitix - make sure the IRQ is allocated before it gets enabled
Linux 5.10.91
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iea6f28f738081085f9c5df020ebdee6583b38dfb
commit 823e670f7ed616d0ce993075c8afe0217885f79d upstream.
With the new osnoise tracer, we are seeing the below splat:
Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel data access on read at 0xc7d880000
Faulting instruction address: 0xc0000000002ffa10
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
...
NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0
LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0
Call Trace:
[c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable)
[c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90
[c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290
[c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710
[c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130
[c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270
[c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180
[c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278
osnoise tracer on ppc64le is triggering osnoise_taint() for negative
duration in get_int_safe_duration() called from
trace_sched_switch_callback()->thread_exit().
The problem though is that the check for a valid trace_percpu_buffer is
incorrect in get_trace_buf(). The check is being done after calculating
the pointer for the current cpu, rather than on the main percpu pointer.
Fix the check to be against trace_percpu_buffer.
Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.10.90
Input: i8042 - add deferred probe support
Input: i8042 - enable deferred probe quirk for ASUS UM325UA
tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().
tomoyo: use hwight16() in tomoyo_domain_quota_is_ok()
parisc: Clear stale IIR value on instruction access rights trap
platform/x86: apple-gmux: use resource_size() with res
memblock: fix memblock_phys_alloc() section mismatch error
recordmcount.pl: fix typo in s390 mcount regex
selinux: initialize proto variable in selinux_ip_postroute_compat()
scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()
net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources
net/mlx5e: Wrap the tx reporter dump callback to extract the sq
net/mlx5e: Fix ICOSQ recovery flow for XSK
udp: using datalen to cap ipv6 udp max gso segments
selftests: Calculate udpgso segment count without header adjustment
sctp: use call_rcu to free endpoint
net/smc: fix using of uninitialized completions
net: usb: pegasus: Do not drop long Ethernet frames
net: ag71xx: Fix a potential double free in error handling paths
net: lantiq_xrx200: fix statistics of received bytes
NFC: st21nfca: Fix memory leak in device probe and remove
net/smc: improved fix wait on already cleared link
net/smc: don't send CDC/LLC message if link not ready
net/smc: fix kernel panic caused by race of smc_sock
igc: Fix TX timestamp support for non-MSI-X platforms
ionic: Initialize the 'lif->dbid_inuse' bitmap
net/mlx5e: Fix wrong features assignment in case of error
selftests/net: udpgso_bench_tx: fix dst ip argument
net/ncsi: check for error return from call to nla_put_u32
fsl/fman: Fix missing put_device() call in fman_port_probe
i2c: validate user data in compat ioctl
nfc: uapi: use kernel size_t to fix user-space builds
uapi: fix linux/nfc.h userspace compilation errors
drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
drm/amdgpu: add support for IP discovery gc_info table v2
xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
usb: mtu3: add memory barrier before set GPD's HWO
usb: mtu3: fix list_head check warning
usb: mtu3: set interval of FS intr and isoc endpoint
binder: fix async_free_space accounting for empty parcels
scsi: vmw_pvscsi: Set residual data length conditionally
Input: appletouch - initialize work before device registration
Input: spaceball - fix parsing of movement data packets
net: fix use-after-free in tw_timer_handler
perf script: Fix CPU filtering of a script's switch events
bpf: Add kconfig knob for disabling unpriv bpf by default
Linux 5.10.90
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I299d1e939d3b01b5d6f34f7b9ec701d624bbfde3