Commit Graph

100 Commits

Author SHA1 Message Date
Frederic Weisbecker
c3daae52af UPSTREAM: rcu/exp: Mark current CPU as exp-QS in IPI loop second pass
Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which
takes two passes over the leaf rcu_node structure's CPUs.  The first
pass gathers up the current CPU and CPUs that are in dynticks idle mode.
The workqueue will report a quiescent state on their behalf later.
The second pass sends IPIs to the rest of the CPUs, but excludes the
current CPU, incorrectly assuming it has been included in the first
pass's list of CPUs.

Unfortunately the current CPU may have changed between the first and
second pass, due to the fact that the various rcu_node structures'
->lock fields have been dropped, thus momentarily enabling preemption.
This means that if the second pass's CPU was not on the first pass's
list, it will be ignored completely.  There will be no IPI sent to
it, and there will be no reporting of quiescent states on its behalf.
Unfortunately, the expedited grace period will nevertheless be waiting
for that CPU to report a quiescent state, but with that CPU having no
reason to believe that such a report is needed.

The result will be an expedited grace period stall.

Fix this by no longer excluding the current CPU from consideration during
the second pass.

Bug: 216238044
Fixes: b9ad4d6ed1 ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()")
Change-Id: I6b7bab7d72ddc41ca7cbb6bdc0dbe1ad5ecb4f44
Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b)
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
2022-02-15 17:19:24 +05:30
Paul E. McKenney
6aa9e78d6e FROMGIT: rcu: Allow expedited RCU grace periods on incoming CPUs
Although it is usually safe to invoke synchronize_rcu_expedited() from a
preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
invoke a workqueue handler will hang due to RCU waiting on a CPU that
the scheduler is not paying attention to.  This commit therefore expands
use of the existing workqueue-independent synchronize_rcu_expedited()
from early boot to also include CPUs that are being hotplugged.

Bug: 216238044
Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 710f460c395af6b81df1c81043308aaa60d5e25c
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next)

Change-Id: I3f81dee6deaf6a4504aec31e058785dc8cee6a3f
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
2022-01-28 22:26:39 +00:00
Greg Kroah-Hartman
c553d9a246 Merge 5.10.80 into android12-5.10-lts
Changes in 5.10.80
	xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
	usb: xhci: Enable runtime-pm by default on AMD Yellow Carp platform
	binder: use euid from cred instead of using task
	binder: use cred instead of task for selinux checks
	binder: use cred instead of task for getsecid
	Input: iforce - fix control-message timeout
	Input: elantench - fix misreporting trackpoint coordinates
	Input: i8042 - Add quirk for Fujitsu Lifebook T725
	libata: fix read log timeout value
	ocfs2: fix data corruption on truncate
	scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
	scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file
	scsi: qla2xxx: Fix use after free in eh_abort path
	mmc: mtk-sd: Add wait dma stop done flow
	mmc: dw_mmc: Dont wait for DRTO on Write RSP error
	exfat: fix incorrect loading of i_blocks for large files
	parisc: Fix set_fixmap() on PA1.x CPUs
	parisc: Fix ptrace check on syscall return
	tpm: Check for integer overflow in tpm2_map_response_body()
	firmware/psci: fix application of sizeof to pointer
	crypto: s5p-sss - Add error handling in s5p_aes_probe()
	media: rkvdec: Do not override sizeimage for output format
	media: ite-cir: IR receiver stop working after receive overflow
	media: rkvdec: Support dynamic resolution changes
	media: ir-kbd-i2c: improve responsiveness of hauppauge zilog receivers
	media: v4l2-ioctl: Fix check_ext_ctrls
	ALSA: hda/realtek: Fix mic mute LED for the HP Spectre x360 14
	ALSA: hda/realtek: Add a quirk for HP OMEN 15 mute LED
	ALSA: hda/realtek: Add quirk for Clevo PC70HS
	ALSA: hda/realtek: Headset fixup for Clevo NH77HJQ
	ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
	ALSA: hda/realtek: Add quirk for ASUS UX550VE
	ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
	ALSA: ua101: fix division by zero at probe
	ALSA: 6fire: fix control and bulk message timeouts
	ALSA: line6: fix control and interrupt message timeouts
	ALSA: usb-audio: Line6 HX-Stomp XL USB_ID for 48k-fixed quirk
	ALSA: usb-audio: Add registration quirk for JBL Quantum 400
	ALSA: hda: Free card instance properly at probe errors
	ALSA: synth: missing check for possible NULL after the call to kstrdup
	ALSA: timer: Fix use-after-free problem
	ALSA: timer: Unconditionally unlink slave instances, too
	ext4: fix lazy initialization next schedule time computation in more granular unit
	ext4: ensure enough credits in ext4_ext_shift_path_extents
	ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
	fuse: fix page stealing
	x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
	x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
	x86/irq: Ensure PI wakeup handler is unregistered before module unload
	ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked()
	ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers
	cavium: Return negative value when pci_alloc_irq_vectors() fails
	scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
	scsi: qla2xxx: Fix unmap of already freed sgl
	mISDN: Fix return values of the probe function
	cavium: Fix return values of the probe function
	sfc: Export fibre-specific supported link modes
	sfc: Don't use netif_info before net_device setup
	hyperv/vmbus: include linux/bitops.h
	ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
	reset: socfpga: add empty driver allowing consumers to probe
	mmc: winbond: don't build on M68K
	drm: panel-orientation-quirks: Add quirk for Aya Neo 2021
	fcnal-test: kill hanging ping/nettest binaries on cleanup
	bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
	bpf: Prevent increasing bpf_jit_limit above max
	gpio: mlxbf2.c: Add check for bgpio_init failure
	xen/netfront: stop tx queues during live migration
	nvmet-tcp: fix a memory leak when releasing a queue
	spi: spl022: fix Microwire full duplex mode
	net: multicast: calculate csum of looped-back and forwarded packets
	watchdog: Fix OMAP watchdog early handling
	drm: panel-orientation-quirks: Add quirk for GPD Win3
	block: schedule queue restart after BLK_STS_ZONE_RESOURCE
	nvmet-tcp: fix header digest verification
	r8169: Add device 10ec:8162 to driver r8169
	vmxnet3: do not stop tx queues after netif_device_detach()
	nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
	net/smc: Fix smc_link->llc_testlink_time overflow
	net/smc: Correct spelling mistake to TCPF_SYN_RECV
	rds: stop using dmapool
	btrfs: clear MISSING device status bit in btrfs_close_one_device
	btrfs: fix lost error handling when replaying directory deletes
	btrfs: call btrfs_check_rw_degradable only if there is a missing device
	KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup
	ia64: kprobes: Fix to pass correct trampoline address to the handler
	selinux: fix race condition when computing ocontext SIDs
	hwmon: (pmbus/lm25066) Add offset coefficients
	regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
	regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
	mwifiex: fix division by zero in fw download path
	ath6kl: fix division by zero in send path
	ath6kl: fix control-message timeout
	ath10k: fix control-message timeout
	ath10k: fix division by zero in send path
	PCI: Mark Atheros QCA6174 to avoid bus reset
	rtl8187: fix control-message timeouts
	evm: mark evm_fixmode as __ro_after_init
	ifb: Depend on netfilter alternatively to tc
	wcn36xx: Fix HT40 capability for 2Ghz band
	wcn36xx: Fix tx_status mechanism
	wcn36xx: Fix (QoS) null data frame bitrate/modulation
	PM: sleep: Do not let "syscore" devices runtime-suspend during system transitions
	mwifiex: Read a PCI register after writing the TX ring write pointer
	mwifiex: Try waking the firmware until we get an interrupt
	libata: fix checking of DMA state
	wcn36xx: handle connection loss indication
	rsi: fix occasional initialisation failure with BT coex
	rsi: fix key enabled check causing unwanted encryption for vap_id > 0
	rsi: fix rate mask set leading to P2P failure
	rsi: Fix module dev_oper_mode parameter description
	perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
	perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
	RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
	signal: Remove the bogus sigkill_pending in ptrace_stop
	memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode
	signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
	soc: fsl: dpio: replace smp_processor_id with raw_smp_processor_id
	soc: fsl: dpio: use the combined functions to protect critical zone
	mtd: rawnand: socrates: Keep the driver compatible with on-die ECC engines
	power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
	power: supply: max17042_battery: use VFSOC for capacity when no rsns
	KVM: arm64: Extract ESR_ELx.EC only
	KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
	can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport
	can: j1939: j1939_can_recv(): ignore messages with invalid source address
	powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
	ring-buffer: Protect ring_buffer_reset() from reentrancy
	serial: core: Fix initializing and restoring termios speed
	ifb: fix building without CONFIG_NET_CLS_ACT
	ALSA: mixer: oss: Fix racy access to slots
	ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
	xen/balloon: add late_initcall_sync() for initial ballooning done
	ovl: fix use after free in struct ovl_aio_req
	PCI: pci-bridge-emul: Fix emulation of W1C bits
	PCI: cadence: Add cdns_plat_pcie_probe() missing return
	PCI: aardvark: Do not clear status bits of masked interrupts
	PCI: aardvark: Fix checking for link up via LTSSM state
	PCI: aardvark: Do not unmask unused interrupts
	PCI: aardvark: Fix reporting Data Link Layer Link Active
	PCI: aardvark: Fix configuring Reference clock
	PCI: aardvark: Fix return value of MSI domain .alloc() method
	PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
	PCI: aardvark: Fix support for bus mastering and PCI_COMMAND on emulated bridge
	PCI: aardvark: Fix support for PCI_BRIDGE_CTL_BUS_RESET on emulated bridge
	PCI: aardvark: Set PCI Bridge Class Code to PCI Bridge
	PCI: aardvark: Fix support for PCI_ROM_ADDRESS1 on emulated bridge
	quota: check block number when reading the block in quota file
	quota: correct error number in free_dqentry()
	pinctrl: core: fix possible memory leak in pinctrl_enable()
	coresight: cti: Correct the parameter for pm_runtime_put
	iio: dac: ad5446: Fix ad5622_write() return value
	iio: ad5770r: make devicetree property reading consistent
	USB: serial: keyspan: fix memleak on probe errors
	serial: 8250: fix racy uartclk update
	most: fix control-message timeouts
	USB: iowarrior: fix control-message timeouts
	USB: chipidea: fix interrupt deadlock
	power: supply: max17042_battery: Clear status bits in interrupt handler
	dma-buf: WARN on dmabuf release with pending attachments
	drm: panel-orientation-quirks: Update the Lenovo Ideapad D330 quirk (v2)
	drm: panel-orientation-quirks: Add quirk for KD Kurio Smart C15200 2-in-1
	drm: panel-orientation-quirks: Add quirk for the Samsung Galaxy Book 10.6
	Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
	Bluetooth: fix use-after-free error in lock_sock_nested()
	drm/panel-orientation-quirks: add Valve Steam Deck
	rcutorture: Avoid problematic critical section nesting on PREEMPT_RT
	platform/x86: wmi: do not fail if disabling fails
	MIPS: lantiq: dma: add small delay after reset
	MIPS: lantiq: dma: reset correct number of channel
	locking/lockdep: Avoid RCU-induced noinstr fail
	net: sched: update default qdisc visibility after Tx queue cnt changes
	rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop
	smackfs: Fix use-after-free in netlbl_catmap_walk()
	ath11k: Align bss_chan_info structure with firmware
	x86: Increase exception stack sizes
	mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
	mwifiex: Properly initialize private structure on interface type changes
	fscrypt: allow 256-bit master keys with AES-256-XTS
	drm/amdgpu: Fix MMIO access page fault
	ath11k: Avoid reg rules update during firmware recovery
	ath11k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
	ath11k: Change DMA_FROM_DEVICE to DMA_TO_DEVICE when map reinjected packets
	ath10k: high latency fixes for beacon buffer
	media: mt9p031: Fix corrupted frame after restarting stream
	media: netup_unidvb: handle interrupt properly according to the firmware
	media: atomisp: Fix error handling in probe
	media: stm32: Potential NULL pointer dereference in dcmi_irq_thread()
	media: uvcvideo: Set capability in s_param
	media: uvcvideo: Return -EIO for control errors
	media: uvcvideo: Set unique vdev name based in type
	media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
	media: s5p-mfc: Add checking to s5p_mfc_probe().
	media: imx: set a media_device bus_info string
	media: mceusb: return without resubmitting URB in case of -EPROTO error.
	ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
	rtw88: fix RX clock gate setting while fifo dump
	brcmfmac: Add DMI nvram filename quirk for Cyberbook T116 tablet
	media: rcar-csi2: Add checking to rcsi2_start_receiver()
	ipmi: Disable some operations during a panic
	fs/proc/uptime.c: Fix idle time reporting in /proc/uptime
	ACPICA: Avoid evaluating methods too early during system resume
	media: ipu3-imgu: imgu_fmt: Handle properly try
	media: ipu3-imgu: VIDIOC_QUERYCAP: Fix bus_info
	media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
	net-sysfs: try not to restart the syscall if it will fail eventually
	tracefs: Have tracefs directories not set OTH permission bits by default
	ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
	mmc: moxart: Fix reference count leaks in moxart_probe
	iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
	ACPI: battery: Accept charges over the design capacity as full
	drm/amdkfd: fix resume error when iommu disabled in Picasso
	net: phy: micrel: make *-skew-ps check more lenient
	leaking_addresses: Always print a trailing newline
	drm/msm: prevent NULL dereference in msm_gpu_crashstate_capture()
	block: bump max plugged deferred size from 16 to 32
	md: update superblock after changing rdev flags in state_store
	memstick: r592: Fix a UAF bug when removing the driver
	lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
	lib/xz: Validate the value before assigning it to an enum variable
	workqueue: make sysfs of unbound kworker cpumask more clever
	tracing/cfi: Fix cmp_entries_* functions signature mismatch
	mt76: mt7915: fix an off-by-one bound check
	mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
	block: remove inaccurate requeue check
	media: allegro: ignore interrupt if mailbox is not initialized
	nvmet: fix use-after-free when a port is removed
	nvmet-rdma: fix use-after-free when a port is removed
	nvmet-tcp: fix use-after-free when a port is removed
	nvme: drop scan_lock and always kick requeue list when removing namespaces
	PM: hibernate: Get block device exclusively in swsusp_check()
	selftests: kvm: fix mismatched fclose() after popen()
	selftests/bpf: Fix perf_buffer test on system with offline cpus
	iwlwifi: mvm: disable RX-diversity in powersave
	smackfs: use __GFP_NOFAIL for smk_cipso_doi()
	ARM: clang: Do not rely on lr register for stacktrace
	gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
	gfs2: Cancel remote delete work asynchronously
	gfs2: Fix glock_hash_walk bugs
	ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
	vrf: run conntrack only in context of lower/physdev for locally generated packets
	net: annotate data-race in neigh_output()
	ACPI: AC: Quirk GK45 to skip reading _PSR
	btrfs: reflink: initialize return value to 0 in btrfs_extent_same()
	btrfs: do not take the uuid_mutex in btrfs_rm_device
	spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
	wcn36xx: Correct band/freq reporting on RX
	x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
	drm/amd/display: dcn20_resource_construct reduce scope of FPU enabled
	selftests/core: fix conflicting types compile error for close_range()
	parisc: fix warning in flush_tlb_all
	task_stack: Fix end_of_stack() for architectures with upwards-growing stack
	erofs: don't trigger WARN() when decompression fails
	parisc/unwind: fix unwinder when CONFIG_64BIT is enabled
	parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
	netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state
	selftests/bpf: Fix strobemeta selftest regression
	Bluetooth: fix init and cleanup of sco_conn.timeout_work
	rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
	MIPS: lantiq: dma: fix burst length for DEU
	objtool: Add xen_start_kernel() to noreturn list
	x86/xen: Mark cpu_bringup_and_idle() as dead_end_function
	objtool: Fix static_call list generation
	drm/v3d: fix wait for TMU write combiner flush
	virtio-gpu: fix possible memory allocation failure
	lockdep: Let lock_is_held_type() detect recursive read as read
	net: net_namespace: Fix undefined member in key_remove_domain()
	cgroup: Make rebind_subsystems() disable v2 controllers all at once
	wcn36xx: Fix Antenna Diversity Switching
	wilc1000: fix possible memory leak in cfg_scan_result()
	Bluetooth: btmtkuart: fix a memleak in mtk_hci_wmt_sync
	crypto: caam - disable pkc for non-E SoCs
	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
	net: dsa: rtl8366rb: Fix off-by-one bug
	ath11k: fix some sleeping in atomic bugs
	ath11k: Avoid race during regd updates
	ath11k: fix packet drops due to incorrect 6 GHz freq value in rx status
	ath11k: Fix memory leak in ath11k_qmi_driver_event_work
	ath10k: Fix missing frame timestamp for beacon/probe-resp
	ath10k: sdio: Add missing BH locking around napi_schdule()
	drm/ttm: stop calling tt_swapin in vm_access
	arm64: mm: update max_pfn after memory hotplug
	drm/amdgpu: fix warning for overflow check
	media: em28xx: add missing em28xx_close_extension
	media: cxd2880-spi: Fix a null pointer dereference on error handling path
	media: dvb-usb: fix ununit-value in az6027_rc_query
	media: v4l2-ioctl: S_CTRL output the right value
	media: TDA1997x: handle short reads of hdmi info frame.
	media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
	media: radio-wl1273: Avoid card name truncation
	media: si470x: Avoid card name truncation
	media: tm6000: Avoid card name truncation
	media: cx23885: Fix snd_card_free call on null card pointer
	kprobes: Do not use local variable when creating debugfs file
	crypto: ecc - fix CRYPTO_DEFAULT_RNG dependency
	cpuidle: Fix kobject memory leaks in error paths
	media: em28xx: Don't use ops->suspend if it is NULL
	ath9k: Fix potential interrupt storm on queue reset
	PM: EM: Fix inefficient states detection
	EDAC/amd64: Handle three rank interleaving mode
	rcu: Always inline rcu_dynticks_task*_{enter,exit}()
	netfilter: nft_dynset: relax superfluous check on set updates
	media: dvb-frontends: mn88443x: Handle errors of clk_prepare_enable()
	crypto: qat - detect PFVF collision after ACK
	crypto: qat - disregard spurious PFVF interrupts
	hwrng: mtk - Force runtime pm ops for sleep ops
	b43legacy: fix a lower bounds test
	b43: fix a lower bounds test
	gve: Recover from queue stall due to missed IRQ
	mmc: sdhci-omap: Fix NULL pointer exception if regulator is not configured
	mmc: sdhci-omap: Fix context restore
	memstick: avoid out-of-range warning
	memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
	net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE
	hwmon: Fix possible memleak in __hwmon_device_register()
	hwmon: (pmbus/lm25066) Let compiler determine outer dimension of lm25066_coeff
	ath10k: fix max antenna gain unit
	kernel/sched: Fix sched_fork() access an invalid sched_task_group
	tcp: switch orphan_count to bare per-cpu counters
	drm/msm: potential error pointer dereference in init()
	drm/msm: uninitialized variable in msm_gem_import()
	net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
	media: ir_toy: assignment to be16 should be of correct type
	mmc: mxs-mmc: disable regulator on error and in the remove function
	platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
	mt76: mt7615: fix endianness warning in mt7615_mac_write_txwi
	mt76: mt76x02: fix endianness warnings in mt76x02_mac.c
	mt76: mt7915: fix possible infinite loop release semaphore
	mt76: mt7915: fix sta_rec_wtbl tag len
	mt76: mt7915: fix muar_idx in mt7915_mcu_alloc_sta_req()
	rsi: stop thread firstly in rsi_91x_init() error handling
	mwifiex: Send DELBA requests according to spec
	net: enetc: unmap DMA in enetc_send_cmd()
	phy: micrel: ksz8041nl: do not use power down mode
	nvme-rdma: fix error code in nvme_rdma_setup_ctrl
	PM: hibernate: fix sparse warnings
	clocksource/drivers/timer-ti-dm: Select TIMER_OF
	x86/sev: Fix stack type check in vc_switch_off_ist()
	drm/msm: Fix potential NULL dereference in DPU SSPP
	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
	KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
	KVM: selftests: Fix nested SVM tests when built with clang
	bpftool: Avoid leaking the JSON writer prepared for program metadata
	libbpf: Fix BTF data layout checks and allow empty BTF
	libbpf: Allow loading empty BTFs
	libbpf: Fix overflow in BTF sanity checks
	libbpf: Fix BTF header parsing checks
	s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
	KVM: s390: pv: avoid double free of sida page
	KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
	irq: mips: avoid nested irq_enter()
	tpm: fix Atmel TPM crash caused by too frequent queries
	tpm_tis_spi: Add missing SPI ID
	libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()
	tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
	spi: spi-rpc-if: Check return value of rpcif_sw_init()
	samples/kretprobes: Fix return value if register_kretprobe() failed
	KVM: s390: Fix handle_sske page fault handling
	libertas_tf: Fix possible memory leak in probe and disconnect
	libertas: Fix possible memory leak in probe and disconnect
	wcn36xx: add proper DMA memory barriers in rx path
	wcn36xx: Fix discarded frames due to wrong sequence number
	drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
	selftests: bpf: Convert sk_lookup ctx access tests to PROG_TEST_RUN
	selftests/bpf: Fix fd cleanup in sk_lookup test
	net: amd-xgbe: Toggle PLL settings during rate change
	net: phylink: avoid mvneta warning when setting pause parameters
	crypto: pcrypt - Delay write to padata->info
	selftests/bpf: Fix fclose/pclose mismatch in test_progs
	udp6: allow SO_MARK ctrl msg to affect routing
	ibmvnic: don't stop queue in xmit
	ibmvnic: Process crqs after enabling interrupts
	cgroup: Fix rootcg cpu.stat guest double counting
	bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
	bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
	of: unittest: fix EXPECT text for gpio hog errors
	iio: st_sensors: Call st_sensors_power_enable() from bus drivers
	iio: st_sensors: disable regulators after device unregistration
	RDMA/rxe: Fix wrong port_cap_flags
	ARM: dts: BCM5301X: Fix memory nodes names
	clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths
	ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
	arm64: dts: rockchip: Fix GPU register width for RK3328
	ARM: dts: qcom: msm8974: Add xo_board reference clock to DSI0 PHY
	RDMA/bnxt_re: Fix query SRQ failure
	arm64: dts: ti: k3-j721e-main: Fix "max-virtual-functions" in PCIe EP nodes
	arm64: dts: ti: k3-j721e-main: Fix "bus-range" upto 256 bus number for PCIe
	arm64: dts: meson-g12a: Fix the pwm regulator supply properties
	arm64: dts: meson-g12b: Fix the pwm regulator supply properties
	bus: ti-sysc: Fix timekeeping_suspended warning on resume
	ARM: dts: at91: tse850: the emac<->phy interface is rmii
	scsi: dc395: Fix error case unwinding
	MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
	JFS: fix memleak in jfs_mount
	arm64: dts: qcom: msm8916: Fix Secondary MI2S bit clock
	arm64: dts: renesas: beacon: Fix Ethernet PHY mode
	arm64: dts: qcom: pm8916: Remove wrong reg-names for rtc@6000
	ALSA: hda: Reduce udelay() at SKL+ position reporting
	ALSA: hda: Release controller display power during shutdown/reboot
	ALSA: hda: Fix hang during shutdown due to link reset
	ALSA: hda: Use position buffer for SKL+ again
	soundwire: debugfs: use controller id and link_id for debugfs
	scsi: pm80xx: Fix misleading log statement in pm8001_mpi_get_nvmd_resp()
	driver core: Fix possible memory leak in device_link_add()
	arm: dts: omap3-gta04a4: accelerometer irq fix
	ASoC: SOF: topology: do not power down primary core during topology removal
	soc/tegra: Fix an error handling path in tegra_powergate_power_up()
	memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
	clk: at91: check pmc node status before registering syscore ops
	video: fbdev: chipsfb: use memset_io() instead of memset()
	powerpc: Refactor is_kvm_guest() declaration to new header
	powerpc: Rename is_kvm_guest() to check_kvm_guest()
	powerpc: Reintroduce is_kvm_guest() as a fast-path check
	powerpc: Fix is_kvm_guest() / kvm_para_available()
	powerpc: fix unbalanced node refcount in check_kvm_guest()
	serial: 8250_dw: Drop wrong use of ACPI_PTR()
	usb: gadget: hid: fix error code in do_config()
	power: supply: rt5033_battery: Change voltage values to µV
	power: supply: max17040: fix null-ptr-deref in max17040_probe()
	scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
	RDMA/mlx4: Return missed an error if device doesn't support steering
	usb: musb: select GENERIC_PHY instead of depending on it
	staging: most: dim2: do not double-register the same device
	staging: ks7010: select CRYPTO_HASH/CRYPTO_MICHAEL_MIC
	pinctrl: renesas: checker: Fix off-by-one bug in drive register check
	ARM: dts: stm32: Reduce DHCOR SPI NOR frequency to 50 MHz
	ARM: dts: stm32: fix SAI sub nodes register range
	ARM: dts: stm32: fix AV96 board SAI2 pin muxing on stm32mp15
	ASoC: cs42l42: Correct some register default values
	ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
	soc: qcom: rpmhpd: Provide some missing struct member descriptions
	soc: qcom: rpmhpd: Make power_on actually enable the domain
	usb: typec: STUSB160X should select REGMAP_I2C
	iio: adis: do not disabe IRQs in 'adis_init()'
	scsi: ufs: Refactor ufshcd_setup_clocks() to remove skip_ref_clk
	scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
	serial: imx: fix detach/attach of serial console
	usb: dwc2: drd: fix dwc2_force_mode call in dwc2_ovr_init
	usb: dwc2: drd: fix dwc2_drd_role_sw_set when clock could be disabled
	usb: dwc2: drd: reset current session before setting the new one
	firmware: qcom_scm: Fix error retval in __qcom_scm_is_call_available()
	soc: qcom: apr: Add of_node_put() before return
	pinctrl: equilibrium: Fix function addition in multiple groups
	phy: qcom-qusb2: Fix a memory leak on probe
	phy: ti: gmii-sel: check of_get_address() for failure
	phy: qcom-snps: Correct the FSEL_MASK
	serial: xilinx_uartps: Fix race condition causing stuck TX
	clk: at91: sam9x60-pll: use DIV_ROUND_CLOSEST_ULL
	HID: u2fzero: clarify error check and length calculations
	HID: u2fzero: properly handle timeouts in usb_submit_urb
	powerpc/44x/fsp2: add missing of_node_put
	ASoC: cs42l42: Disable regulators if probe fails
	ASoC: cs42l42: Use device_property API instead of of_property
	ASoC: cs42l42: Correct configuring of switch inversion from ts-inv
	virtio_ring: check desc == NULL when using indirect with packed
	mips: cm: Convert to bitfield API to fix out-of-bounds access
	power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
	apparmor: fix error check
	rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
	nfsd: don't alloc under spinlock in rpc_parse_scope_id
	i2c: mediatek: fixing the incorrect register offset
	NFS: Fix dentry verifier races
	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
	drm/plane-helper: fix uninitialized variable reference
	PCI: aardvark: Don't spam about PIO Response Status
	PCI: aardvark: Fix preserving PCI_EXP_RTCTL_CRSSVE flag on emulated bridge
	opp: Fix return in _opp_add_static_v2()
	NFS: Fix deadlocks in nfs_scan_commit_list()
	fs: orangefs: fix error return code of orangefs_revalidate_lookup()
	mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
	PCI: uniphier: Serialize INTx masking/unmasking and fix the bit operation
	mtd: core: don't remove debugfs directory if device is in use
	remoteproc: Fix a memory leak in an error handling path in 'rproc_handle_vdev()'
	rtc: rv3032: fix error handling in rv3032_clkout_set_rate()
	dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
	NFS: Fix up commit deadlocks
	NFS: Fix an Oops in pnfs_mark_request_commit()
	Fix user namespace leak
	auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
	auxdisplay: ht16k33: Connect backlight to fbdev
	auxdisplay: ht16k33: Fix frame buffer device blanking
	soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
	netfilter: nfnetlink_queue: fix OOB when mac header was cleared
	dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
	signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
	m68k: set a default value for MEMORY_RESERVE
	watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
	ar7: fix kernel builds for compiler test
	scsi: qla2xxx: Changes to support FCP2 Target
	scsi: qla2xxx: Relogin during fabric disturbance
	scsi: qla2xxx: Fix gnl list corruption
	scsi: qla2xxx: Turn off target reset during issue_lip
	NFSv4: Fix a regression in nfs_set_open_stateid_locked()
	i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
	xen-pciback: Fix return in pm_ctrl_init()
	net: davinci_emac: Fix interrupt pacing disable
	ethtool: fix ethtool msg len calculation for pause stats
	openrisc: fix SMP tlb flush NULL pointer dereference
	net: vlan: fix a UAF in vlan_dev_real_dev()
	ice: Fix replacing VF hardware MAC to existing MAC filter
	ice: Fix not stopping Tx queues for VFs
	ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
	drm/nouveau/svm: Fix refcount leak bug and missing check against null bug
	net: phy: fix duplex out of sync problem while changing settings
	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
	mfd: core: Add missing of_node_put for loop iteration
	can: mcp251xfd: mcp251xfd_chip_start(): fix error handling for mcp251xfd_chip_rx_int_enable()
	mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
	zram: off by one in read_block_state()
	perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
	llc: fix out-of-bound array index in llc_sk_dev_hash()
	nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
	arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
	bpf, sockmap: Remove unhash handler for BPF sockmap usage
	bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding
	gve: Fix off by one in gve_tx_timeout()
	seq_file: fix passing wrong private data
	net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any
	net: hns3: fix kernel crash when unload VF while it is being reset
	net: hns3: allow configure ETS bandwidth of all TCs
	net: stmmac: allow a tc-taprio base-time of zero
	vsock: prevent unnecessary refcnt inc for nonblocking connect
	net/smc: fix sk_refcnt underflow on linkdown and fallback
	cxgb4: fix eeprom len when diagnostics not implemented
	selftests/net: udpgso_bench_rx: fix port argument
	ARM: 9155/1: fix early early_iounmap()
	ARM: 9156/1: drop cc-option fallbacks for architecture selection
	parisc: Fix backtrace to always include init funtion names
	MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
	x86/mce: Add errata workaround for Skylake SKX37
	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()
	irqchip/sifive-plic: Fixup EOI failed when masked
	f2fs: should use GFP_NOFS for directory inodes
	net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE
	9p/net: fix missing error check in p9_check_errors
	memcg: prohibit unconditional exceeding the limit of dying tasks
	powerpc/lib: Add helper to check if offset is within conditional branch range
	powerpc/bpf: Validate branch ranges
	powerpc/security: Add a helper to query stf_barrier type
	powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
	mm, oom: do not trigger out_of_memory from the #PF
	mfd: dln2: Add cell for initializing DLN2 ADC
	video: backlight: Drop maximum brightness override for brightness zero
	s390/cio: check the subchannel validity for dev_busid
	s390/tape: fix timer initialization in tape_std_assign()
	s390/ap: Fix hanging ioctl caused by orphaned replies
	s390/cio: make ccw_device_dma_* more robust
	mtd: rawnand: ams-delta: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: xway: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: mpc5121: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: gpio: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: pasemi: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: orion: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: plat_nand: Keep the driver compatible with on-die ECC engines
	mtd: rawnand: au1550nd: Keep the driver compatible with on-die ECC engines
	powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
	powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
	drm/sun4i: Fix macros in sun8i_csc.h
	PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
	PCI: aardvark: Fix PCIe Max Payload Size setting
	SUNRPC: Partial revert of commit 6f9f17287e
	ath10k: fix invalid dma_addr_t token assignment
	mmc: moxart: Fix null pointer dereference on pointer host
	selftests/bpf: Fix also no-alu32 strobemeta selftest
	arch/cc: Introduce a function to check for confidential computing features
	x86/sev: Add an x86 version of cc_platform_has()
	x86/sev: Make the #VC exception stacks part of the default stacks storage
	soc/tegra: pmc: Fix imbalanced clock disabling in error code path
	Linux 5.10.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I21c750863965fbf584251fa2de3c941ae5922d3f
2021-11-19 11:50:41 +01:00
Neeraj Upadhyay
226d68fb6c rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
[ Upstream commit f0b2b2df5423fb369ac762c77900bc7765496d58 ]

The sync_sched_exp_online_cleanup() checks to see if RCU needs
an expedited quiescent state from the incoming CPU, sending it
an IPI if so. Before sending IPI, it checks whether expedited
qs need has been already requested for the incoming CPU, by
checking rcu_data.cpu_no_qs.b.exp for the current cpu, on which
sync_sched_exp_online_cleanup() is running. This works for the
case where incoming CPU is same as self. However, for the case
where incoming CPU is different from self, expedited request
won't get marked, which can potentially delay reporting of
expedited quiescent state for the incoming CPU.

Fixes: e015a34112 ("rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 14:04:02 +01:00
Sangmoon Kim
f23fe39801 FROMGIT: rcu/tree: Add a trace event for RCU CPU stall warnings
This commit adds a trace event which allows tracing the beginnings of RCU
CPU stall warnings on systems where sysctl_panic_on_rcu_stall is disabled.

The first parameter is the name of RCU flavor like other trace events.
The second parameter indicates whether this is a stall of an expedited
grace period, a self-detected stall of a normal grace period, or a stall
of a normal grace period detected by some CPU other than the one that
is stalled.

RCU CPU stall warnings are often caused by external-to-RCU issues,
for example, in interrupt handling or task scheduling.  Therefore,
this event uses TRACE_EVENT, not TRACE_EVENT_RCU, to avoid requiring
those interested in tracing RCU CPU stalls to rebuild their kernels
with CONFIG_RCU_TRACE=y.

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 71580d6503f747f1b06ef8dbb7cdbbe20400a2ec
 git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev)
Bug: 177483057
Change-Id: Ie2f0340725a2fb582d026ecfaf5e1d35cc90fece
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
2021-03-11 21:06:44 +09:00
Paul E. McKenney
7487ea07df rcu: Initialize at declaration time in rcu_exp_handler()
This commit moves the initialization of the CONFIG_PREEMPT=n version of
the rcu_exp_handler() function's rdp and rnp local variables into their
respective declarations to save a couple lines of code.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 18:36:03 -07:00
Paul E. McKenney
68c2f27e01 rcu: Expedited grace-period sleeps to idle priority
This commit converts the schedule_timeout_uninterruptible() call used
by RCU's expedited grace-period processing to schedule_timeout_idle().
This conversion avoids polluting the load-average with RCU-related
sleeping.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Paul E. McKenney
f736e0f1a5 Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD
fixes.2020.04.27a:  Miscellaneous fixes.
kfree_rcu.2020.04.27a:  Changes related to kfree_rcu().
rcu-tasks.2020.04.27a:  Addition of new RCU-tasks flavors.
stall.2020.04.27a:  RCU CPU stall-warning updates.
torture.2020.05.07a:  Torture-test updates.
2020-05-07 10:18:32 -07:00
Paul E. McKenney
654db05cee rcu: Use data_race() for RCU expedited CPU stall-warning prints
Although the accesses used to determine whether or not an expedited
stall should be printed are an integral part of the concurrency algorithm
governing use of the corresponding variables, the values that are simply
printed are ancillary.  As such, it is best to use data_race() for these
accesses in order to provide the greatest latitude in the use of KCSAN
for the other accesses that are an integral part of the algorithm.  This
commit therefore changes the relevant uses of READ_ONCE() to data_race().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:04:37 -07:00
Lai Jiangshan
5f5fa7ea89 rcu: Don't use negative nesting depth in __rcu_read_unlock()
Now that RCU flavors have been consolidated, an RCU-preempt
rcu_read_unlock() in an interrupt or softirq handler cannot possibly
end the RCU read-side critical section.  Consider the old vulnerability
involving rcu_read_unlock() being invoked within such a handler that
interrupted an __rcu_read_unlock_special(), in which a wakeup might be
invoked with a scheduler lock held.  Because rcu_read_unlock_special()
no longer does wakeups in such situations, it is no longer necessary
for __rcu_read_unlock() to set the nesting level negative.

This commit therefore removes this recursion-protection code from
__rcu_read_unlock().

[ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ]
[ paulmck: Adjust other checks given no more negative nesting. ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:50 -07:00
Paul E. McKenney
314eeb43e5 rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus locking
There are lockless loads from the rcu_node structure's ->exp_tasks field,
so this commit causes all stores to use WRITE_ONCE() and all lockless
loads to use READ_ONCE() or data_race(), with the latter for debug
prints.  This code also did a unprotected traversal of the linked list
pointed into by ->exp_tasks, so this commit also acquires the rcu_node
structure's ->lock to properly protect this traversal.  This list was
traversed unprotected only when printing an RCU CPU stall warning for
an expedited grace period, so the odds of seeing this in production are
not all that high.

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:01:15 -07:00
Paul E. McKenney
aa93ec620b Merge branches 'doc.2020.02.27a', 'fixes.2020.03.21a', 'kfree_rcu.2020.02.20a', 'locktorture.2020.02.20a', 'ovld.2020.02.20a', 'rcu-tasks.2020.02.20a', 'srcu.2020.02.20a' and 'torture.2020.02.20a' into HEAD
doc.2020.02.27a: Documentation updates.
fixes.2020.03.21a: Miscellaneous fixes.
kfree_rcu.2020.02.20a: Updates to kfree_rcu().
locktorture.2020.02.20a: Lock torture-test updates.
ovld.2020.02.20a: Updates to callback-overload handling.
rcu-tasks.2020.02.20a: RCU-tasks updates.
srcu.2020.02.20a: SRCU updates.
torture.2020.02.20a: Torture-test updates.
2020-03-21 17:15:11 -07:00
Paul E. McKenney
58c53360b3 rcutorture: Allow boottime stall warnings to be suppressed
In normal production, an RCU CPU stall warning at boottime is often
just as bad as at any other time.  In fact, given the desire for fast
boot, any sort of long-term stall at boot is a bad idea.  However,
heavy rcutorture testing on large hyperthreaded systems can generate
boottime RCU CPU stalls as a matter of course.  This commit therefore
provides a kernel boot parameter that suppresses reporting of boottime
RCU CPU stall warnings and similarly of rcutorture writer stalls.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
59ee0326cc rcutorture: Suppress forward-progress complaints during early boot
Some larger systems can take in excess of 50 seconds to complete their
early boot initcalls prior to spawing init.  This does not in any way
help the forward-progress judgments of built-in rcutorture (when
rcutorture is built as a module, the insmod or modprobe command normally
cannot happen until some time after boot completes).  This commit
therefore suppresses such complaints until about the time that init
is spawned.

This also includes a fix to a stupid error located by kbuild test robot.

[ paulmck: Apply kbuild test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ paulmck: Fix to nohz_full slow-expediting recovery logic, per bpetkov. ]
[ paulmck: Restrict splat to CONFIG_PREEMPT_RT=y kernels and simplify. ]
Tested-by: Borislav Petkov <bp@alien8.de>
2020-02-20 16:03:30 -08:00
Paul E. McKenney
b0c18c8773 rcu: Add WRITE_ONCE to rcu_node ->exp_seq_rq store
The rcu_node structure's ->exp_seq_rq field is read locklessly, so
this commit adds the WRITE_ONCE() to a load in order to provide proper
documentation and READ_ONCE()/WRITE_ONCE() pairing.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Paul E. McKenney
24bb9eccf7 rcu: Fix exp_funnel_lock()/rcu_exp_wait_wake() datarace
The rcu_node structure's ->exp_seq_rq field is accessed locklessly, so
updates must use WRITE_ONCE().  This commit therefore adds the needed
WRITE_ONCE() invocation where it was missed.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:21 -08:00
Paul E. McKenney
59d8cc6b2e rcu: Forgive slow expedited grace periods at boot time
Boot-time processing often loops in the kernel longer than one might
prefer, which can prevent expedited grace periods from completing in
a timely manner.  This in turn triggers a splat In nohz_full CPUs  One
could argue that long-looping code should be fixed, but on the other hand,
boot time is a bit special.

This commit therefore removes the splat.  Later commits will add the
splat back in, but in a way that removes false positives.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-25 12:00:40 -08:00
Paul E. McKenney
0e247386d9 Merge branches 'doc.2019.12.10a', 'exp.2019.12.09a', 'fixes.2020.01.24a', 'kfree_rcu.2020.01.24a', 'list.2020.01.10a', 'preempt.2020.01.24a' and 'torture.2019.12.09a' into HEAD
doc.2019.12.10a: Documentations updates
exp.2019.12.09a: Expedited grace-period updates
fixes.2020.01.24a: Miscellaneous fixes
kfree_rcu.2020.01.24a: Batch kfree_rcu() work
list.2020.01.10a: RCU-protected-list updates
preempt.2020.01.24a: Preemptible RCU updates
torture.2019.12.09a: Torture-test updates
2020-01-24 10:37:27 -08:00
Lai Jiangshan
77339e61aa rcu: Provide wrappers for uses of ->rcu_read_lock_nesting
This commit provides wrapper functions for uses of ->rcu_read_lock_nesting
to improve readability and to ease future changes to support inlining
of __rcu_read_lock() and __rcu_read_unlock().

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24 10:27:33 -08:00
Sebastian Andrzej Siewior
90326f0521 rcu: Use CONFIG_PREEMPTION where appropriate
The config option `CONFIG_PREEMPT' is used for the preemption model
"Low-Latency Desktop". The config option `CONFIG_PREEMPTION' is enabled
when kernel preemption is enabled which is true for the preemption model
`CONFIG_PREEMPT' and `CONFIG_PREEMPT_RT'.

Use `CONFIG_PREEMPTION' if it applies to both preemption models and not
just to `CONFIG_PREEMPT'.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: rcu@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:37:51 -08:00
Paul E. McKenney
df1e849ae4 rcu: Enable tick for nohz_full CPUs slow to provide expedited QS
An expedited grace period can be stalled by a nohz_full CPU looping
in kernel context.  This possibility is currently handled by some
carefully crafted checks in rcu_read_unlock_special() that enlist help
from ksoftirqd when permitted by the scheduler.  However, it is exactly
these checks that require the scheduler avoid holding any of its rq or
pi locks across rcu_read_unlock() without also having held them across
the entire RCU read-side critical section.

It would therefore be very nice if expedited grace periods could
handle nohz_full CPUs looping in kernel context without such checks.
This commit therefore adds code to the expedited grace period's wait
and cleanup code that forces the scheduler-clock interrupt on for CPUs
that fail to quickly supply a quiescent state.  "Quickly" is currently
a hard-coded single-jiffy delay.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:32:59 -08:00
Paul E. McKenney
28f0361fdf rcu: Replace synchronize_sched_expedited_wait() "_sched" with "_rcu"
After RCU flavor consolidation, synchronize_sched_expedited_wait() does
both RCU-preempt and RCU-sched, whichever happens to have been built into
the running kernel.  This commit therefore changes this function's name
to synchronize_rcu_expedited_wait() to reflect its new generic nature.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:59 -08:00
Paul E. McKenney
de8cd0a533 rcu: Update tree_exp.h function-header comments
The function-header comments in kernel/rcu/tree_exp.h have gotten a bit
out of date, so this commit updates a number of them.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:58 -08:00
Paul E. McKenney
6c7d7dbf5b rcu: Rename sync_rcu_preempt_exp_done() to sync_rcu_exp_done()
Now that the RCU flavors have been consolidated, there is one common
function for checking to see if an expedited RCU grace period has
completed, namely sync_rcu_preempt_exp_done().  Because this function is
no longer specific to RCU-preempt, this commit removes the "_preempt" from
its name.  This commit also changes sync_rcu_preempt_exp_done_unlocked()
to sync_rcu_exp_done_unlocked() for the same reason.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:58 -08:00
Neeraj Upadhyay
4bc6b745e5 rcu: Allow only one expedited GP to run concurrently with wakeups
The current expedited RCU grace-period code expects that a task
requesting an expedited grace period cannot awaken until that grace
period has reached the wakeup phase.  However, it is possible for a long
preemption to result in the waiting task never sleeping.  For example,
consider the following sequence of events:

1.	Task A starts an expedited grace period by invoking
	synchronize_rcu_expedited().  It proceeds normally up to the
	wait_event() near the end of that function, and is then preempted
	(or interrupted or whatever).

2.	The expedited grace period completes, and a kworker task starts
	the awaken phase, having incremented the counter and acquired
	the rcu_state structure's .exp_wake_mutex.  This kworker task
	is then preempted or interrupted or whatever.

3.	Task A resumes and enters wait_event(), which notes that the
	expedited grace period has completed, and thus doesn't sleep.

4.	Task B starts an expedited grace period exactly as did Task A,
	complete with the preemption (or whatever delay) just before
	the call to wait_event().

5.	The expedited grace period completes, and another kworker
	task starts the awaken phase, having incremented the counter.
	However, it blocks when attempting to acquire the rcu_state
	structure's .exp_wake_mutex because step 2's kworker task has
	not yet released it.

6.	Steps 4 and 5 repeat, resulting in overflow of the rcu_node
	structure's ->exp_wq[] array.

In theory, this is harmless.  Tasks waiting on the various ->exp_wq[]
array will just be spuriously awakened, but they will just sleep again
on noting that the rcu_state structure's ->expedited_sequence value has
not advanced far enough.

In practice, this wastes CPU time and is an accident waiting to happen.
This commit therefore moves the rcu_exp_gp_seq_end() call that officially
ends the expedited grace period (along with associate tracing) until
after the ->exp_wake_mutex has been acquired.  This prevents Task A from
awakening prematurely, thus preventing more than one expedited grace
period from being in flight during a previous expedited grace period's
wakeup phase.

Fixes: 3b5f668e71 ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
[ paulmck: Added updated comment. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Neeraj Upadhyay
fd6bc19d76 rcu: Fix missed wakeup of exp_wq waiters
Tasks waiting within exp_funnel_lock() for an expedited grace period to
elapse can be starved due to the following sequence of events:

1.	Tasks A and B both attempt to start an expedited grace
	period at about the same time.	This grace period will have
	completed when the lower four bits of the rcu_state structure's
	->expedited_sequence field are 0b'0100', for example, when the
	initial value of this counter is zero.	Task A wins, and thus
	does the actual work of starting the grace period, including
	acquiring the rcu_state structure's .exp_mutex and sets the
	counter to 0b'0001'.

2.	Because task B lost the race to start the grace period, it
	waits on ->expedited_sequence to reach 0b'0100' inside of
	exp_funnel_lock(). This task therefore blocks on the rcu_node
	structure's ->exp_wq[1] field, keeping in mind that the
	end-of-grace-period value of ->expedited_sequence (0b'0100')
	is shifted down two bits before indexing the ->exp_wq[] field.

3.	Task C attempts to start another expedited grace period,
	but blocks on ->exp_mutex, which is still held by Task A.

4.	The aforementioned expedited grace period completes, so that
	->expedited_sequence now has the value 0b'0100'.  A kworker task
	therefore acquires the rcu_state structure's ->exp_wake_mutex
	and starts awakening any tasks waiting for this grace period.

5.	One of the first tasks awakened happens to be Task A.  Task A
	therefore releases the rcu_state structure's ->exp_mutex,
	which allows Task C to start the next expedited grace period,
	which causes the lower four bits of the rcu_state structure's
	->expedited_sequence field to become 0b'0101'.

6.	Task C's expedited grace period completes, so that the lower four
	bits of the rcu_state structure's ->expedited_sequence field now
	become 0b'1000'.

7.	The kworker task from step 4 above continues its wakeups.
	Unfortunately, the wake_up_all() refetches the rcu_state
	structure's .expedited_sequence field:

	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

	This results in the wakeup being applied to the rcu_node
	structure's ->exp_wq[2] field, which is unfortunate given that
	Task B is instead waiting on ->exp_wq[1].

On a busy system, no harm is done (or at least no permanent harm is done).
Some later expedited grace period will redo the wakeup.  But on a quiet
system, such as many embedded systems, it might be a good long time before
there was another expedited grace period.  On such embedded systems,
this situation could therefore result in a system hang.

This issue manifested as DPM device timeout during suspend (which
usually qualifies as a quiet time) due to a SCSI device being stuck in
_synchronize_rcu_expedited(), with the following stack trace:

	schedule()
	synchronize_rcu_expedited()
	synchronize_rcu()
	scsi_device_quiesce()
	scsi_bus_suspend()
	dpm_run_callback()
	__device_suspend()

This commit therefore prevents such delays, timeouts, and hangs by
making rcu_exp_wait_wake() use its "s" argument consistently instead of
refetching from rcu_state.expedited_sequence.

Fixes: 3b5f668e71 ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Paul E. McKenney
aca2991a25 rcu: Substitute lookup for bit-twiddling in sync_rcu_exp_select_node_cpus()
The code in sync_rcu_exp_select_node_cpus() calculates the current
CPU's mask within its rcu_node structure's bitmasks, but this has
already been computed in the ->grpmask field of that CPU's rcu_data
structure.  This commit therefore just uses this ->grpmask field.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-12-09 12:24:57 -08:00
Boqun Feng
9f08cf0886 rcu: Avoid modifying mask_ofl_ipi in sync_rcu_exp_select_node_cpus()
The "mask_ofl_ipi" is used to track which CPUs get IPIed, however
in the IPI sending loop, "mask_ofl_ipi" along with another variable
"mask_ofl_test" might also get modified to record which CPUs' quiesent
states must be reported by the sync_rcu_exp_select_node_cpus() at
the end of sync_rcu_exp_select_node_cpus().  This overlap of roles
can be confusing, so this patch cleans things a little by using
"mask_ofl_ipi" solely for determining which CPUs must be IPIed  and
"mask_ofl_test" for solely determining on behalf of  which CPUs
sync_rcu_exp_select_node_cpus() must report a quiscent state.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Marco Elver <elver@google.com>
2019-12-09 12:24:56 -08:00
Paul E. McKenney
15c7c972cd rcu: Use *_ONCE() to protect lockless ->expmask accesses
The rcu_node structure's ->expmask field is accessed locklessly when
starting a new expedited grace period and when reporting an expedited
RCU CPU stall warning.  This commit therefore handles the former by
taking a snapshot of ->expmask while the lock is held and the latter
by applying READ_ONCE() to lockless reads and WRITE_ONCE() to the
corresponding updates.

Link: https://lore.kernel.org/lkml/CANpmjNNmSOagbTpffHr4=Yedckx9Rm2NuGqC9UqE+AOz5f1-ZQ@mail.gmail.com
Reported-by: syzbot+134336b86f728d6e55a0@syzkaller.appspotmail.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marco Elver <elver@google.com>
2019-12-09 12:24:56 -08:00
Mukesh Ojha
511b44f759 rcu: Fix spelling mistake "greate"->"great"
This commit fixes a spelling mistake in file tree_exp.h.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-08-12 11:25:06 -07:00
Paul E. McKenney
fbad01af8c rcu: Add destroy_work_on_stack() to match INIT_WORK_ONSTACK()
The synchronize_rcu_expedited() function has an INIT_WORK_ONSTACK(),
but lacks the corresponding destroy_work_on_stack().  This commit
therefore adds destroy_work_on_stack().

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
2019-08-01 14:05:51 -07:00
Paul E. McKenney
11ca7a9d54 Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD
consolidate.2019.05.28a: RCU flavor consolidation cleanups and optmizations.
doc.2019.05.28a: Documentation updates.
fixes.2019.06.13a: Miscellaneous fixes.
srcu.2019.05.28a: SRCU updates.
sync.2019.05.28a: RCU-sync flavor consolidation.
torture.2019.05.28a: Torture-test updates.
2019-06-19 09:21:46 -07:00
Paul E. McKenney
96050c68be rcu: Upgrade sync_exp_work_done() to smp_mb()
The sync_exp_work_done() function uses smp_mb__before_atomic(), but
there is no obvious atomic in the ensuing code.  The ordering is
absolutely required for grace periods to work correctly, so this
commit upgrades the smp_mb__before_atomic() to smp_mb().

Fixes: 6fba2b3767 ("rcu: Remove deprecated RCU debugfs tracing code")
Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-06-13 15:33:19 -07:00
Jiang Biao
f0b6356273 rcu: Remove unused rdp local from synchronize_rcu_expedited()
Because rdp is initialized but never used in synchronize_rcu_expedited(),
this commit removes it.

Signed-off-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Paul E. McKenney
1bb336443c rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qs
The rcu_data structure's ->deferred_qs field is used to indicate that the
current CPU is blocking an expedited grace period (perhaps a future one).
Given that it is used only for expedited grace periods, its current name
is misleading, so this commit renames it to ->exp_deferred_qs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-05-28 08:48:19 -07:00
Paul E. McKenney
e015a34112 rcu: Avoid self-IPI in sync_sched_exp_online_cleanup()
The sync_sched_exp_online_cleanup() is invoked at online time to handle
the case where the start of an expedited grace period ran concurrently
with a CPU being taken offline and then immediately being placed online.
It checks to see if RCU needs an expedited quiescent state from the
incoming CPU, sending it an IPI if so.  However, it is quite possible
that sync_sched_exp_online_cleanup() is running on that CPU, in which
case it is considerably less overhead to simply request the quiescent
state locally instead of simulating a self-IPI.

This commit therefore places the last few lines of rcu_exp_handler()
into a new rcu_exp_need_qs() function, which is invoked both by
rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI
case.

This also reduces the rcu_exp_handler() function's state space by
removing the direct call that this smp_call_function_single() uses to
emulate the requested self-IPI.  This in turn will allow tighter error
checking in rcu_is_cpu_rrupt_from_idle().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2019-05-25 14:50:50 -07:00
Paul E. McKenney
b9ad4d6ed1 rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()
Although sync_rcu_exp_select_node_cpus() treats the current CPU as being
in a quiescent state, it might well migrate to some other CPU before
reaching the smp_call_function_single(), which could then result in an
unnecessary simulated self-IPI.  This commit therefore instead simply
refuses to invoke smp_call_function_single() on the current CPU, which
causes the later rcu_report_exp_cpu_mult() to report this CPU's quiescent
state with less overhead.

This also reduces the rcu_exp_handler() function's state space by removing
the direct call that this smp_call_function_single() uses to emulate the
requested self-IPI.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Use get_cpu() instead of preempt_disable() per Joel Fernandes. ]
2019-05-25 14:50:50 -07:00
Paul E. McKenney
6cdbc07a5a Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD
consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups.
doc.2019.03.26b: Documentation updates.
fixes.2019.03.26b: Miscellaneous fixes.
srcu.2019.03.26b: SRCU updates.
stall.2019.03.26b: RCU CPU stall warning updates.
torture.2019.03.26b: Torture-test updates.
2019-04-09 08:08:13 -07:00
Paul E. McKenney
d87cda5094 rcu: Move rcu_print_task_exp_stall() to tree_exp.h
Because expedited CPU stall warnings are contained within the
kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live
there too.  This commit carries out the required code motion.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:40:13 -07:00
Paul E. McKenney
add0d37b4f rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special
The task_struct structure's ->rcu_read_unlock_special field is only ever
read or written by the owning task, but it is accessed both at process
and interrupt levels.  It may therefore be accessed using plain reads
and writes while interrupts are disabled, but must be accessed using
READ_ONCE() and WRITE_ONCE() or better otherwise.  This commit makes a
few adjustments to align with this discipline.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney
f1a98045ab rcu: Fix typo in tree_exp.h comment
This commit changes a rcu_exp_handler() comment from rcu_preempt_defer_qs()
to rcu_preempt_deferred_qs() in order to better match reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26 14:38:38 -07:00
Paul E. McKenney
e7ffb4eb9a Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD
doc.2019.01.26a:  Documentation updates.
fixes.2019.01.26a:  Miscellaneous fixes.
sil.2019.01.26a:  Removal of a few more spin_is_locked() instances.
spdx.2019.02.09a:  Add SPDX identifiers to RCU files
srcu.2019.01.26a:  SRCU updates.
torture.2019.01.26a: Torture-test updates.
2019-02-09 08:47:52 -08:00
Paul E. McKenney
22e4092531 rcu/tree: Convert to SPDX license identifier
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Update .h file SPDX comment format per Joe Perches. ]
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-02-09 08:44:10 -08:00
Paul E. McKenney
5a0874c1d1 rcu: Remove preemption disabling from expedited CPU selection
It turns out that it is queue_delayed_work_on() rather than
queue_work_on() that has difficulties when used concurrently with
CPU-hotplug removal operations.  It is therefore unnecessary to protect
CPU identification and queue_work_on() with preempt_disable().

This commit therefore removes the preempt_disable() and preempt_enable()
from sync_rcu_exp_select_cpus(), which has the further benefit of reducing
the number of changes that must be maintained in the -rt patchset.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:23 -08:00
Paul E. McKenney
8923072664 rcu: Inline _synchronize_rcu_expedited() into synchronize_rcu_expedited()
Now that _synchronize_rcu_expedited() has only one caller, and given that
this is a tail call, this commit inlines _synchronize_rcu_expedited()
into synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:29 -08:00
Paul E. McKenney
e5bc3af773 rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu()
Now that rcu_blocking_is_gp() makes the correct immediate-return
decision for both PREEMPT and !PREEMPT, a single implementation of
synchronize_rcu() will work correctly under both configurations.
This commit therefore eliminates a few lines of code by consolidating
the two implementations of synchronize_rcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:28 -08:00
Paul E. McKenney
3cd4ca47aa rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu_expedited()
The CONFIG_PREEMPT=n and CONFIG_PREEMPT=y implementations of
synchronize_rcu_expedited() are quite similar, and with small
modifications to rcu_blocking_is_gp() can be made identical.  This commit
therefore makes this change in order to save a few lines of code and to
reduce the amount of duplicate code.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:27 -08:00
Paul E. McKenney
142d106d5e rcu: Determine expedited-GP IPI handler at build time
Back when there could be multiple RCU flavors running in the same kernel
at the same time, it was necessary to specify the expedited grace-period
IPI handler at runtime.  Now that there is only one RCU flavor, the
IPI handler can be determined at build time.  There is therefore no
longer any reason for the RCU-preempt and RCU-sched IPI handlers to
have different names, nor is there any reason to pass these handlers in
function arguments and in the data structures enclosing workqueues.

This commit therefore makes all these changes, pushing the specification
of the expedited grace-period IPI handler down to the point of use.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:27 -08:00
Paul E. McKenney
1de462ed85 rcu: Make expedited IPI handler return after handling critical section
During expedited RCU grace-period initialization, IPIs are sent to
all non-idle online CPUs.  The IPI handler checks to see if the CPU is
in quiescent state, reporting one if so.  This handler looks at three
different cases: (1) The CPU is not in an rcu_read_lock()-based critical
section, (2) The CPU is in the process of exiting an rcu_read_lock()-based
critical section, and (3) The CPU is in an rcu_read_lock()-based critical
section.  In case (2), execution falls through into case (3).

This is harmless from a functionality viewpoint, but can result in
needless overhead during an improbable corner case.  This commit therefore
adds the "return" statement needed to prevent fall-through.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:28:24 -08:00
Paul E. McKenney
05f415715c rcu: Speed up expedited GPs when interrupting RCU reader
In PREEMPT kernels, an expedited grace period might send an IPI to a
CPU that is executing an RCU read-side critical section.  In that case,
it would be nice if the rcu_read_unlock() directly interacted with the
RCU core code to immediately report the quiescent state.  And this does
happen in the case where the reader has been preempted.  But it would
also be a nice performance optimization if immediate reporting also
happened in the preemption-free case.

This commit therefore adds an ->exp_hint field to the task_struct structure's
->rcu_read_unlock_special field.  The IPI handler sets this hint when
it has interrupted an RCU read-side critical section, and this causes
the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
which, if preemption is enabled, reports the quiescent state immediately.
If preemption is disabled, then the report is required to be deferred
until preemption (or bottom halves or interrupts or whatever) is re-enabled.

Because this is a hint, it does nothing for more complicated cases.  For
example, if the IPI interrupts an RCU reader, but interrupts are disabled
across the rcu_read_unlock(), but another rcu_read_lock() is executed
before interrupts are re-enabled, the hint will already have been cleared.
If you do crazy things like this, reporting will be deferred until some
later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-11-12 09:03:59 -08:00