This reverts commit 8c3ac02bca.
The hook android_vh_mutex_start_check_new_owner is not used by any
vendor, so remove it to help with merge issues with future LTS releases.
If this is needed by any real user, it can easily be reverted to add it
back and then the symbol should be added to the abi list at the same
time to prevent it from being removed again later.
Bug: 203756332
Bug: 231647361
Cc: Liujie Xie <xieliujie@oppo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8c3bf787525d684f64b8d0654d379df78eb7b69e
Provide vendor hooks to record the time at which a lock is first held, so that
the rwsem/mutex locking process can be protected from preemption for a short
time in some cases.
- android_vh_record_mutex_lock_starttime
- android_vh_record_rtmutex_lock_starttime
- android_vh_record_rwsem_lock_starttime
- android_vh_record_percpu_rwsem_lock_starttime
Bug: 241191475
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0e967a1e8b77c32a1ad588acd54028fae2f90c4e
Because of optimistic spinning, we need to detect inside the spin loop
whether the owner of the lock has changed, so that priority inheritance
can be applied to the owner more accurately.
trace_android_vh_mutex_wait_start does not meet this need.
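A rough user-space sketch of the idea, not the kernel code: the spin loop
re-reads the owner on every iteration and reports each new owner to a
callback, which can then redo priority inheritance against the right task.
All names here (toy_task, toy_mutex, spin_watch_owner, boost_owner) are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_task {
	int prio;
};

struct toy_mutex {
	_Atomic(struct toy_task *) owner;	/* NULL when unlocked */
};

typedef void (*new_owner_fn)(struct toy_task *owner, void *data);

/* Example callback: raise the owner's priority to the waiter's priority. */
static void boost_owner(struct toy_task *owner, void *data)
{
	int waiter_prio = *(int *)data;

	if (owner->prio < waiter_prio)
		owner->prio = waiter_prio;
}

/*
 * Spin for at most max_iters iterations while the lock is held. Whenever
 * the owner seen in this iteration differs from the previous one, report
 * it through on_new_owner. Returns the number of owners reported.
 */
static int spin_watch_owner(struct toy_mutex *lock, int max_iters,
			    new_owner_fn on_new_owner, void *data)
{
	struct toy_task *last = NULL;
	int changes = 0;

	for (int i = 0; i < max_iters; i++) {
		struct toy_task *cur = atomic_load(&lock->owner);

		if (!cur)
			break;		/* lock released: stop spinning */
		if (cur != last) {
			on_new_owner(cur, data);
			last = cur;
			changes++;
		}
	}
	return changes;
}
```

A hook that fires only once, when waiting starts, would see just the first
owner; checking inside the loop is what lets the callback follow a handover
to a different owner.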
Bug: 231647361
Change-Id: Iab2832fd3c352d8c1229348a5e7befced70ee92e
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Current might be preempted after spin_unlock(&lock->wait_lock),
so add a hook after wake_up_q(&wake_q) in which to
disable the owner's privilege in the scheduler.
Bug: 231647361
Change-Id: I3016da2fd31b8bdc8435df4e800f91381a64af4f
Signed-off-by: Liujie Xie <xieliujie@oppo.com>
Changes in 5.10.65
locking/mutex: Fix HANDOFF condition
regmap: fix the offset of register error log
regulator: tps65910: Silence deferred probe error
crypto: mxs-dcp - Check for DMA mapping errors
sched/deadline: Fix reset_on_fork reporting of DL tasks
power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors
crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
rcu/tree: Handle VM stoppage in stall detection
EDAC/mce_amd: Do not load edac_mce_amd module on guests
posix-cpu-timers: Force next expiration recalc after itimer reset
hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
hrtimer: Ensure timerfd notification for HIGHRES=n
udf: Check LVID earlier
udf: Fix iocharset=utf8 mount option
isofs: joliet: Fix iocharset=utf8 mount option
bcache: add proper error unwinding in bcache_device_init
blk-throtl: optimize IOPS throttle for large IO scenarios
nvme-tcp: don't update queue count when failing to set io queues
nvme-rdma: don't update queue count when failing to set io queues
nvmet: pass back cntlid on successful completion
power: supply: smb347-charger: Add missing pin control activation
power: supply: max17042_battery: fix typo in MAx17042_TOFF
s390/cio: add dev_busid sysfs entry for each subchannel
s390/zcrypt: fix wrong offset index for APKA master key valid state
libata: fix ata_host_start()
crypto: omap - Fix inconsistent locking of device lists
crypto: qat - do not ignore errors from enable_vf2pf_comms()
crypto: qat - handle both source of interrupt in VF ISR
crypto: qat - fix reuse of completion variable
crypto: qat - fix naming for init/shutdown VF to PF notifications
crypto: qat - do not export adf_iov_putmsg()
fcntl: fix potential deadlock for &fasync_struct.fa_lock
udf_get_extendedattr() had no boundary checks.
s390/kasan: fix large PMD pages address alignment check
s390/pci: fix misleading rc in clp_set_pci_fn()
s390/debug: keep debug data on resize
s390/debug: fix debug area life cycle
s390/ap: fix state machine hang after failure to enable irq
power: supply: cw2015: use dev_err_probe to allow deferred probe
m68k: emu: Fix invalid free in nfeth_cleanup()
sched/numa: Fix is_core_idle()
sched: Fix UCLAMP_FLAG_IDLE setting
rcu: Fix to include first blocked task in stall warning
rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
m68k: Fix invalid RMW_INSNS on CPUs that lack CAS
block: return ELEVATOR_DISCARD_MERGE if possible
spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config
spi: spi-pic32: Fix issue with uninitialized dma_slave_config
genirq/timings: Fix error return code in irq_timings_test_irqs()
irqchip/loongson-pch-pic: Improve edge triggered interrupt support
lib/mpi: use kcalloc in mpi_resize
clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
block: nbd: add sanity check for first_minor
spi: coldfire-qspi: Use clk_disable_unprepare in the remove function
irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
crypto: qat - use proper type for vf_mask
certs: Trigger creation of RSA module signing key if it's not an RSA key
tpm: ibmvtpm: Avoid error message when process gets signal while waiting
x86/mce: Defer processing of early errors
spi: davinci: invoke chipselect callback
blk-crypto: fix check for too-large dun_bytes
regulator: vctrl: Use locked regulator_get_voltage in probe path
regulator: vctrl: Avoid lockdep warning in enable/disable ops
spi: sprd: Fix the wrong WDG_LOAD_VAL
spi: spi-zynq-qspi: use wait_for_completion_timeout to make zynq_qspi_exec_mem_op not interruptible
EDAC/i10nm: Fix NVDIMM detection
drm/panfrost: Fix missing clk_disable_unprepare() on error in panfrost_clk_init()
drm/gma500: Fix end of loop tests for list_for_each_entry
ASoC: mediatek: mt8183: Fix Unbalanced pm_runtime_enable in mt8183_afe_pcm_dev_probe
media: TDA1997x: enable EDID support
leds: is31fl32xx: Fix missing error code in is31fl32xx_parse_dt()
soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally
media: cxd2880-spi: Fix an error handling path
drm/of: free the right object
bpf: Fix a typo of reuseport map in bpf.h.
bpf: Fix potential memleak and UAF in the verifier.
drm/of: free the iterator object on failure
gve: fix the wrong AdminQ buffer overflow check
libbpf: Fix the possible memory leak on error
ARM: dts: aspeed-g6: Fix HVI3C function-group in pinctrl dtsi
arm64: dts: renesas: r8a77995: draak: Remove bogus adv7511w properties
i40e: improve locking of mac_filter_hash
soc: qcom: rpmhpd: Use corner in power_off
libbpf: Fix removal of inner map in bpf_object__create_map
gfs2: Fix memory leak of object lsi on error return path
firmware: fix theoretical UAF race with firmware cache and resume
driver core: Fix error return code in really_probe()
ionic: cleanly release devlink instance
media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init
media: dvb-usb: fix uninit-value in vp702x_read_mac_addr
media: dvb-usb: Fix error handling in dvb_usb_i2c_init
media: go7007: fix memory leak in go7007_usb_probe
media: go7007: remove redundant initialization
media: rockchip/rga: use pm_runtime_resume_and_get()
media: rockchip/rga: fix error handling in probe
media: coda: fix frame_mem_ctrl for YUV420 and YVU420 formats
media: atomisp: fix the uninitialized use and rename "retvalue"
Bluetooth: sco: prevent information leak in sco_conn_defer_accept()
6lowpan: iphc: Fix an off-by-one check of array index
drm/amdgpu/acp: Make PM domain really work
tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
ARM: dts: meson8: Use a higher default GPU clock frequency
ARM: dts: meson8b: odroidc1: Fix the pwm regulator supply properties
ARM: dts: meson8b: mxq: Fix the pwm regulator supply properties
ARM: dts: meson8b: ec100: Fix the pwm regulator supply properties
net/mlx5e: Prohibit inner indir TIRs in IPoIB
net/mlx5e: Block LRO if firmware asks for tunneled LRO
cgroup/cpuset: Fix a partition bug with hotplug
drm: mxsfb: Enable recovery on underflow
drm: mxsfb: Increase number of outstanding requests on V4 and newer HW
drm: mxsfb: Clear FIFO_CLEAR bit
net: cipso: fix warnings in netlbl_cipsov4_add_std
Bluetooth: mgmt: Fix wrong opcode in the response for add_adv cmd
arm64: dts: renesas: rzg2: Convert EtherAVB to explicit delay handling
arm64: dts: renesas: hihope-rzg2-ex: Add EtherAVB internal rx delay
devlink: Break parameter notification sequence to be before/after unload/load driver
net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set()
i2c: highlander: add IRQ check
leds: lt3593: Put fwnode in any case during ->probe()
leds: trigger: audio: Add an activate callback to ensure the initial brightness is set
media: em28xx-input: fix refcount bug in em28xx_usb_disconnect
media: venus: venc: Fix potential null pointer dereference on pointer fmt
PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
PCI: PM: Enable PME if it can be signaled from D3cold
bpf, samples: Add missing mprog-disable to xdp_redirect_cpu's optstring
soc: qcom: smsm: Fix missed interrupts if state changes while masked
debugfs: Return error during {full/open}_proxy_open() on rmmod
Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow
PM: EM: Increase energy calculation precision
selftests/bpf: Fix bpf-iter-tcp4 test to print correctly the dest IP
drm/msm/mdp4: refactor HW revision detection into read_mdp_hw_revision
drm/msm/mdp4: move HW revision detection to earlier phase
drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs
arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7
counter: 104-quad-8: Return error when invalid mode during ceiling_write
cgroup/cpuset: Miscellaneous code cleanup
cgroup/cpuset: Fix violation of cpuset locking rule
ASoC: Intel: Fix platform ID matching
Bluetooth: fix repeated calls to sco_sock_kill
drm/msm/dsi: Fix some reference counted resource leaks
net/mlx5: Register to devlink ingress VLAN filter trap
net/mlx5: Fix unpublish devlink parameters
ASoC: rt5682: Implement remove callback
ASoC: rt5682: Properly turn off regulators if wrong device ID
usb: dwc3: meson-g12a: add IRQ check
usb: dwc3: qcom: add IRQ check
usb: gadget: udc: at91: add IRQ check
usb: gadget: udc: s3c2410: add IRQ check
usb: phy: fsl-usb: add IRQ check
usb: phy: twl6030: add IRQ checks
usb: gadget: udc: renesas_usb3: Fix soc_device_match() abuse
selftests/bpf: Fix test_core_autosize on big-endian machines
devlink: Clear whole devlink_flash_notify struct
samples: pktgen: add missing IPv6 option to pktgen scripts
Bluetooth: Move shutdown callback before flushing tx and rx queue
PM: cpu: Make notifier chain use a raw_spinlock_t
usb: host: ohci-tmio: add IRQ check
usb: phy: tahvo: add IRQ check
libbpf: Re-build libbpf.so when libbpf.map changes
mac80211: Fix insufficient headroom issue for AMSDU
locking/lockdep: Mark local_lock_t
locking/local_lock: Add missing owner initialization
lockd: Fix invalid lockowner cast after vfs_test_lock
nfsd4: Fix forced-expiry locking
arm64: dts: marvell: armada-37xx: Extend PCIe MEM space
clk: staging: correct reference to config IOMEM to config HAS_IOMEM
i2c: synquacer: fix deferred probing
firmware: raspberrypi: Keep count of all consumers
firmware: raspberrypi: Fix a leak in 'rpi_firmware_get()'
usb: gadget: mv_u3d: request_irq() after initializing UDC
mm/swap: consider max pages in iomap_swapfile_add_extent
lkdtm: replace SCSI_DISPATCH_CMD with SCSI_QUEUE_RQ
Bluetooth: add timeout sanity check to hci_inquiry
i2c: iop3xx: fix deferred probing
i2c: s3c2410: fix IRQ check
i2c: fix platform_get_irq.cocci warnings
i2c: hix5hd2: fix IRQ check
gfs2: init system threads before freeze lock
rsi: fix error code in rsi_load_9116_firmware()
rsi: fix an error code in rsi_probe()
ASoC: Intel: kbl_da7219_max98927: Fix format selection for max98373
ASoC: Intel: Skylake: Leave data as is when invoking TLV IPCs
ASoC: Intel: Skylake: Fix module resource and format selection
mmc: sdhci: Fix issue with uninitialized dma_slave_config
mmc: dw_mmc: Fix issue with uninitialized dma_slave_config
mmc: moxart: Fix issue with uninitialized dma_slave_config
bpf: Fix possible out of bound write in narrow load handling
CIFS: Fix a potencially linear read overflow
i2c: mt65xx: fix IRQ check
i2c: xlp9xx: fix main IRQ check
usb: ehci-orion: Handle errors of clk_prepare_enable() in probe
usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available
usb: bdc: Fix a resource leak in the error handling path of 'bdc_probe()'
tty: serial: fsl_lpuart: fix the wrong mapbase value
ASoC: wcd9335: Fix a double irq free in the remove function
ASoC: wcd9335: Fix a memory leak in the error handling path of the probe function
ASoC: wcd9335: Disable irq on slave ports in the remove function
iwlwifi: follow the new inclusive terminology
iwlwifi: skip first element in the WTAS ACPI table
ice: Only lock to update netdev dev_addr
ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()
atlantic: Fix driver resume flow.
bcma: Fix memory leak for internally-handled cores
brcmfmac: pcie: fix oops on failure to resume and reprobe
ipv6: make exception cache less predictible
ipv4: make exception cache less predictible
net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
net: qualcomm: fix QCA7000 checksum handling
octeontx2-af: Fix loop in free and unmap counter
octeontx2-af: Fix static code analyzer reported issues
octeontx2-af: Set proper errorcode for IPv4 checksum errors
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
ASoC: rt5682: Remove unused variable in rt5682_i2c_remove()
iwlwifi Add support for ax201 in Samsung Galaxy Book Flex2 Alpha
f2fs: guarantee to write dirty data when enabling checkpoint back
time: Handle negative seconds correctly in timespec64_to_ns()
io_uring: IORING_OP_WRITE needs hash_reg_file set
bio: fix page leak bio_add_hw_page failure
tty: Fix data race between tiocsti() and flush_to_ldisc()
perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter
ARM: dts: at91: add pinctrl-{names, 0} for all gpios
fuse: truncate pagecache on atomic_o_trunc
fuse: flush extending writes
IMA: remove -Wmissing-prototypes warning
IMA: remove the dependency on CRYPTO_MD5
fbmem: don't allow too huge resolutions
backlight: pwm_bl: Improve bootloader/kernel device handover
clk: kirkwood: Fix a clocking boot regression
Linux 5.10.65
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie0b9306ba6ee4193de3200df7cdacaeba152b83e
[ Upstream commit 048661a1f963e9517630f080687d48af79ed784c ]
Yanfei reported that setting HANDOFF should not depend on recomputing
@first, only on @first state. Which would then give:
	if (ww_ctx || !first)
		first = __mutex_waiter_is_first(lock, &waiter);
	if (first)
		__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
But because 'ww_ctx || !first' is basically 'always' and the test for
first is relatively cheap, omit that first branch entirely.
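A minimal user-space model of the difference, under stated assumptions:
step_old() is inferred from the description above (HANDOFF set only on the
pass that recomputes first) and may not match the pre-fix source exactly;
step_new() follows the shape described in this message. toy_lock,
TOY_FLAG_HANDOFF and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FLAG_HANDOFF 0x02UL	/* stand-in for MUTEX_FLAG_HANDOFF */

struct toy_lock {
	unsigned long flags;
	const void *first_waiter;	/* head of the wait list, or NULL */
};

static bool toy_waiter_is_first(const struct toy_lock *lock,
				const void *waiter)
{
	return lock->first_waiter == waiter;
}

/* Assumed old shape: HANDOFF is tied to recomputing `first`. */
static bool step_old(struct toy_lock *lock, const void *waiter,
		     bool has_ww_ctx, bool first)
{
	if (has_ww_ctx || !first) {
		first = toy_waiter_is_first(lock, waiter);
		if (first)
			lock->flags |= TOY_FLAG_HANDOFF;
	}
	return first;
}

/* New shape: always recompute, then set HANDOFF from the state of `first`. */
static bool step_new(struct toy_lock *lock, const void *waiter)
{
	bool first = toy_waiter_is_first(lock, waiter);

	if (first)
		lock->flags |= TOY_FLAG_HANDOFF;
	return first;
}
```

In this model, a waiter that already knows it is first and has no ww_ctx
never sets the flag through step_old(), while step_new() sets it on every
pass.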
Reported-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com>
Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.40
firmware: arm_scpi: Prevent the ternary sign expansion bug
openrisc: Fix a memory leak
tee: amdtee: unload TA only when its refcount becomes 0
RDMA/siw: Properly check send and receive CQ pointers
RDMA/siw: Release xarray entry
RDMA/core: Prevent divide-by-zero error triggered by the user
RDMA/rxe: Clear all QP fields if creation failed
scsi: ufs: core: Increase the usable queue depth
scsi: qedf: Add pointer checks in qedf_update_link_speed()
scsi: qla2xxx: Fix error return code in qla82xx_write_flash_dword()
RDMA/mlx5: Recover from fatal event in dual port mode
RDMA/core: Don't access cm_id after its destruction
nvmet: remove unused ctrl->cqs
nvmet: fix memory leak in nvmet_alloc_ctrl()
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
nvme-tcp: rerun io_work if req_list is not empty
nvme-fc: clear q_live at beginning of association teardown
platform/mellanox: mlxbf-tmfifo: Fix a memory barrier issue
platform/x86: intel_int0002_vgpio: Only call enable_irq_wake() when using s2idle
platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
RDMA/mlx5: Fix query DCT via DEVX
RDMA/uverbs: Fix a NULL vs IS_ERR() bug
tools/testing/selftests/exec: fix link error
powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
nvmet: seset ns->file when open fails
perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
locking/lockdep: Correct calling tracepoints
locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
powerpc: Fix early setup to make early_ioremap() work
btrfs: avoid RCU stalls while running delayed iputs
cifs: fix memory leak in smb2_copychunk_range
misc: eeprom: at24: check suspend status before disable regulator
ALSA: dice: fix stream format for TC Electronic Konnekt Live at high sampling transfer frequency
ALSA: intel8x0: Don't update period unless prepared
ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field
ALSA: line6: Fix racy initialization of LINE6 MIDI
ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26
ALSA: firewire-lib: fix calculation for size of IR context payload
ALSA: usb-audio: Validate MS endpoint descriptors
ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro
ALSA: hda: fixup headset for ASUS GU502 laptop
Revert "ALSA: sb8: add a check for request_region"
ALSA: firewire-lib: fix check for the size of isochronous packet payload
ALSA: hda/realtek: reset eapd coeff to default value for alc287
ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293
ALSA: hda/realtek: Fix silent headphone output on ASUS UX430UA
ALSA: hda/realtek: Add fixup for HP OMEN laptop
ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx
uio_hv_generic: Fix a memory leak in error handling paths
Revert "rapidio: fix a NULL pointer dereference when create_workqueue() fails"
rapidio: handle create_workqueue() failure
Revert "serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference"
nvme-tcp: fix possible use-after-completion
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
drm/amdgpu: update gc golden setting for Navi12
drm/amdgpu: update sdma golden setting for Navi12
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
mmc: sdhci-pci-gli: increase 1.8V regulator wait
xen-pciback: redo VF placement in the virtual topology
xen-pciback: reconfigure also from backend watch handler
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
dm snapshot: fix crash with transient storage and zero chunk size
kcsan: Fix debugfs initcall return type
Revert "video: hgafb: fix potential NULL pointer dereference"
Revert "net: stmicro: fix a missing check of clk_prepare"
Revert "leds: lp5523: fix a missing check of return value of lp55xx_read"
Revert "hwmon: (lm80) fix a missing check of bus read in lm80 probe"
Revert "video: imsttfb: fix potential NULL pointer dereferences"
Revert "ecryptfs: replace BUG_ON with error handling code"
Revert "scsi: ufs: fix a missing check of devm_reset_control_get"
Revert "gdrom: fix a memory leak bug"
cdrom: gdrom: deallocate struct gdrom_unit fields in remove_gdrom
cdrom: gdrom: initialize global variable at init time
Revert "media: rcar_drif: fix a memory disclosure"
Revert "rtlwifi: fix a potential NULL pointer dereference"
Revert "qlcnic: Avoid potential NULL pointer dereference"
Revert "niu: fix missing checks of niu_pci_eeprom_read"
ethernet: sun: niu: fix missing checks of niu_pci_eeprom_read()
net: stmicro: handle clk_prepare() failure during init
scsi: ufs: handle cleanup correctly on devm_reset_control_get error
net: rtlwifi: properly check for alloc_workqueue() failure
ics932s401: fix broken handling of errors when word reading fails
leds: lp5523: check return value of lp5xx_read and jump to cleanup code
qlcnic: Add null check after calling netdev_alloc_skb
video: hgafb: fix potential NULL pointer dereference
vgacon: Record video mode changes with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vt: Fix character height handling with VT_RESIZEX
tty: vt: always invoke vc->vc_sw->con_resize callback
drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7
openrisc: mm/init.c: remove unused memblock_region variable in map_ram()
x86/Xen: swap NX determination and GDT setup on BSP
nvme-multipath: fix double initialization of ANA state
rtc: pcf85063: fallback to parent of_node
x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
nvmet: use new ana_log_size instead the old one
video: hgafb: correctly handle card detect failure during probe
Bluetooth: SMP: Fail if remote and local public keys are identical
Linux 5.10.40
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4523cf43d1da6bea507e4027bd83bc491a574f41
[ Upstream commit 3a010c493271f04578b133de977e0e5dd2848cea ]
When an interruptible mutex locker is interrupted by a signal
before acquiring the lock, it is removed from the wait queue.
If the mutex isn't contended enough to have a waiter
put into the wait queue again, the WAITER bit stays set and
forces every mutex locker into the slowpath to
acquire the lock. So if the wait queue is empty,
the WAITER bit needs to be cleared.
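A single-threaded toy model of the problem and the fix, assuming a state
word whose low bits are flags and whose fast path succeeds only when the
whole word is zero. The names (toy_mutex, TOY_FLAG_WAITERS, TOY_FLAGS,
TOY_OWNED and the helpers) are hypothetical, and there is no real atomicity
or sleeping here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FLAG_WAITERS 0x01UL	/* stand-in for the WAITER bit */
#define TOY_FLAGS        0x07UL	/* all flag bits */
#define TOY_OWNED        0x08UL	/* stand-in for "has an owner" */

struct toy_mutex {
	unsigned long state;
	int nr_waiters;
};

/* Fast path: only succeeds when no owner and no flag bits are set. */
static bool toy_trylock_fast(struct toy_mutex *m)
{
	if (m->state != 0)
		return false;
	m->state = TOY_OWNED;
	return true;
}

static void toy_add_waiter(struct toy_mutex *m)
{
	m->nr_waiters++;
	m->state |= TOY_FLAG_WAITERS;
}

/* The fix: when the last waiter leaves, drop the flag bits as well. */
static void toy_remove_waiter(struct toy_mutex *m)
{
	assert(m->nr_waiters > 0);
	m->nr_waiters--;
	if (m->nr_waiters == 0)
		m->state &= ~TOY_FLAGS;
}

/* Unlock drops the owner but keeps whatever flag bits are set. */
static void toy_unlock(struct toy_mutex *m)
{
	m->state &= TOY_FLAGS;
}
```

Without the clearing step in toy_remove_waiter(), a waiter that leaves on a
signal would leave TOY_FLAG_WAITERS behind, and toy_trylock_fast() would
fail from then on even though nobody is queued.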
Fixes: 040a0a3710 ("mutex: Add support for wound/wait style locks")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210517034005.30828-1-qiang.zhang@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 5.10.28
arm64: mm: correct the inside linear map range during hotplug check
bpf: Fix fexit trampoline.
virtiofs: Fail dax mount if device does not support it
ext4: shrink race window in ext4_should_retry_alloc()
ext4: fix bh ref count on error paths
fs: nfsd: fix kconfig dependency warning for NFSD_V4
rpc: fix NULL dereference on kmalloc failure
iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate
ASoC: rt1015: fix i2c communication error
ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10
ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe
ASoC: es8316: Simplify adc_pga_gain_tlv table
ASoC: soc-core: Prevent warning if no DMI table is present
ASoC: cs42l42: Fix Bitclock polarity inversion
ASoC: cs42l42: Fix channel width support
ASoC: cs42l42: Fix mixer volume control
ASoC: cs42l42: Always wait at least 3ms after reset
NFSD: fix error handling in NFSv4.0 callbacks
kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
vhost: Fix vhost_vq_reset()
io_uring: fix ->flags races by linked timeouts
scsi: st: Fix a use after free in st_open()
scsi: qla2xxx: Fix broken #endif placement
staging: comedi: cb_pcidas: fix request_irq() warn
staging: comedi: cb_pcidas64: fix request_irq() warn
ASoC: rt5659: Update MCLK rate in set_sysclk()
ASoC: rt711: add snd_soc_component remove callback
thermal/core: Add NULL pointer check before using cooling device stats
locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
nvmet-tcp: fix kmap leak when data digest in use
io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
static_call: Align static_call_is_init() patching condition
ext4: do not iput inode under running transaction in ext4_rename()
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
net: mvpp2: fix interrupt mask/unmask skip condition
flow_dissector: fix TTL and TOS dissection on IPv4 fragments
can: dev: move driver related infrastructure into separate subdir
net: introduce CAN specific pointer in the struct net_device
can: tcan4x5x: fix max register value
brcmfmac: clear EAP/association status bits on linkdown events
ath11k: add ieee80211_unregister_hw to avoid kernel crash caused by NULL pointer
rtw88: coex: 8821c: correct antenna switch function
netdevsim: dev: Initialize FIB module after debugfs
iwlwifi: pcie: don't disable interrupts for reg_lock
ath10k: hold RCU lock when calling ieee80211_find_sta_by_ifaddr()
net: ethernet: aquantia: Handle error cleanup of start on open
appletalk: Fix skb allocation size in loopback case
net: ipa: remove two unused register definitions
net: ipa: fix register write command validation
net: wan/lmc: unregister device when no matching device is found
net: 9p: advance iov on empty read
bpf: Remove MTU check in __bpf_skb_max_len
ACPI: tables: x86: Reserve memory occupied by ACPI tables
ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
ALSA: hda: Re-add dropped snd_poewr_change_state() calls
ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO
ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
xtensa: fix uaccess-related livelock in do_page_fault
xtensa: move coprocessor_flush to the .text section
KVM: SVM: load control fields from VMCB12 before checking them
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
PM: runtime: Fix race getting/putting suppliers at probe
PM: runtime: Fix ordering in pm_runtime_get_suppliers()
tracing: Fix stack trace event size
s390/vdso: copy tod_steering_delta value to vdso_data page
s390/vdso: fix tod_steering_delta type
mm: fix race by making init_zero_pfn() early_initcall
drm/amdkfd: dqm fence memory corruption
drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
drm/amdgpu: check alignment on CPU page for bo map
reiserfs: update reiserfs_xattrs_initialized() condition
drm/imx: fix memory leak when fails to init
drm/tegra: dc: Restore coupling of display controllers
drm/tegra: sor: Grab runtime PM reference across reset
vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
pinctrl: rockchip: fix restore error in resume
extcon: Add stubs for extcon_register_notifier_all() functions
extcon: Fix error handling in extcon_dev_register
firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
video: hyperv_fb: Fix a double free in hvfb_probe
firewire: nosy: Fix a use-after-free bug in nosy_ioctl()
usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
usb: musb: Fix suspend with devices connected for a64
usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
cdc-acm: fix BREAK rx code path adding necessary calls
USB: cdc-acm: untangle a circular dependency between callback and softint
USB: cdc-acm: downgrade message to debug
USB: cdc-acm: fix double free on probe failure
USB: cdc-acm: fix use-after-free after probe failure
usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
usb: dwc2: Prevent core suspend when port connection flag is 0
usb: dwc3: qcom: skip interconnect init for ACPI probe
usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
soc: qcom-geni-se: Cleanup the code to remove proxy votes
staging: rtl8192e: Fix incorrect source in memcpy()
staging: rtl8192e: Change state information from u16 to u8
driver core: clear deferred probe reason on probe retry
drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
riscv: evaluate put_user() arg before enabling user access
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG
Linux 5.10.28
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifdbbeda8de3ee22a7aa3f5d3b10becf0aba1a124
[ Upstream commit 5de2055d31ea88fd9ae9709ac95c372a505a60fa ]
The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the
function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx)
combination is repetitive.
In fact, ww_ctx should not be used at all if !use_ww_ctx. Simplify
ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() and
clear ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx &&
ww_ctx) by just (ww_ctx).
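The pattern can be sketched in user space as follows. This is an
illustration of the normalisation idea only; toy_ww_ctx,
toy_optimistic_spin() and toy_lock_common() are hypothetical names, not the
kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ww_acquire_ctx. */
struct toy_ww_ctx {
	unsigned long stamp;
};

/*
 * The spin helper takes only ww_ctx; NULL means "plain mutex".
 * Returns true when the ww-specific checks would run.
 */
static bool toy_optimistic_spin(const struct toy_ww_ctx *ww_ctx)
{
	return ww_ctx != NULL;
}

static bool toy_lock_common(const struct toy_ww_ctx *ww_ctx, bool use_ww_ctx)
{
	/* Normalise once at entry, so later tests need only (ww_ctx). */
	if (!use_ww_ctx)
		ww_ctx = NULL;

	return toy_optimistic_spin(ww_ctx);
}
```

Clearing the pointer once at the top means no later branch has to remember
the flag, which is the repetition this change removes.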
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
- Add the hook to get mutex/rwsem information that the tasks
are waiting for.
- Add the hook to print messages for sched_show_task.
- ANDROID_VENDOR_DATA_ARRAY added to task_struct
Bug: 162776704
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: Ib436fbd8d0ad509c3b5a73ea8f5170e0761a13fd
(cherry picked from commit b519ac4237)
This ended up causing some noise in places such as rxrpc running in softirq.
The warning is misleading in this case as the mutex trylock and unlock
operations are done within the same context; and therefore we need not
worry about the PI-boosting issues that come along with no single-owner
lock guarantees.
While we don't want to support this in mutexes, there is no way out of
this yet; so let's get rid of the WARNs for now, as it is only fair to
code that has historically relied on non-preemptible softirq guarantees.
In addition, changing the lock type is also unviable: exclusive rwsems
have the same issue (just not the WARN_ON) and counting semaphores
would introduce a performance hit as mutexes are a lot more optimized.
This reverts:
a0855d24fc: ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
Fixes: a0855d24fc: ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-afs@lists.infradead.org
Cc: linux-fsdevel@vger.kernel.org
Cc: will@kernel.org
Link: https://lkml.kernel.org/r/20191210220523.28540-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The patch moving bits into mutex.c was a little too much; by also
moving struct mutex_waiter, a few less common CONFIGs would no longer
build.
Fixes: 5f35d5a66b ("locking/mutex: Make __mutex_owner static to mutex.c")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
__mutex_owner() should only be used by the mutex APIs.
To enforce this restriction, move the __mutex_owner()
function definition from linux/mutex.h to mutex.c.
Some functions use __mutex_owner(), such as
mutex_is_locked() and mutex_trylock_recursive(), so
to keep existing users intact, move them as well and
export them.
Also move the mutex_waiter structure to keep it private to the
file.
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: mingo@redhat.com
Cc: will@kernel.org
Link: https://lkml.kernel.org/r/1564585504-3543-1-git-send-email-mojha@codeaurora.org
Convert the locking documents to ReST and add them to the
kernel development book where they belong.
Most of the changes here are just to make Sphinx properly
parse the text files, as they're already in good shape and
do not require massive changes in order to be parsed.
The conversion consists of:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die, a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wound-Wait
becomes inferior to Wait-Die in terms of elapsed time,
possibly due to the larger number of held locks of sleeping transactions.
Update documentation and callers.
Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-15
Each thread runs 100000 batches of locking / unlocking 800 ww mutexes
randomly chosen out of 100000. Four core Intel x86_64:
Algorithm    #threads    Rollbacks    time
Wound-Wait   4           ~100         ~17s.
Wait-Die     4           ~150000      ~19s.
Wound-Wait   16          ~360000      ~109s.
Wait-Die     16          ~450000      ~82s.
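The difference between the two algorithms compared above can be sketched
as a pure decision rule. This is only an illustration, not the kernel
code: the enum, the function names and the plain "<" stamp comparison
are assumptions made for the sketch (a lower stamp means an older
transaction).

```c
/* What a contending transaction (the requester) does when the lock
 * it wants is held by another transaction (the holder). */
enum ww_action {
	WW_WAIT,	/* requester sleeps until the lock is released */
	WW_DIE,		/* requester backs off and releases its locks */
	WW_WOUND	/* requester forces the holder to back off */
};

/* Wait-Die: an older requester waits, a younger requester dies. */
static enum ww_action wait_die(unsigned long requester, unsigned long holder)
{
	return requester < holder ? WW_WAIT : WW_DIE;
}

/* Wound-Wait: an older requester wounds (preempts) the holder,
 * a younger requester waits. */
static enum ww_action wound_wait(unsigned long requester, unsigned long holder)
{
	return requester < holder ? WW_WOUND : WW_WAIT;
}
```

In both rules it is always the younger transaction that backs off; what
differs is whether the younger one is the requester (Wait-Die) or the
holder (Wound-Wait), which is why Wound-Wait is the preemptive variant.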
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Co-authored-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
A few issues in some kernel-doc markups were
causing trouble with kernel-doc output in ReST format:
./kernel/futex.c:492: WARNING: Inline emphasis start-string without end-string.
./kernel/futex.c:1264: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:1721: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2338: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2426: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2899: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2972: WARNING: Block quote ends without a blank line; unexpected unindent.
Fix them.
No functional changes.
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/wake_q.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/wake_q.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- Implement wraparound-safe refcount_t and kref_t types based on
generic atomic primitives (Peter Zijlstra)
- Improve and fix the ww_mutex code (Nicolai Hähnle)
- Add self-tests to the ww_mutex code (Chris Wilson)
- Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
Bueso)
- Micro-optimize the current-task logic all around the core kernel
(Davidlohr Bueso)
- Tidy up after recent optimizations: remove stale code and APIs,
clean up the code (Waiman Long)
- ... plus misc fixes, updates and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
fork: Fix task_struct alignment
locking/spinlock/debug: Remove spinlock lockup detection code
lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
lkdtm: Convert to refcount_t testing
kref: Implement 'struct kref' using refcount_t
refcount_t: Introduce a special purpose refcount type
sched/wake_q: Clarify queue reinit comment
sched/wait, rcuwait: Fix typo in comment
locking/mutex: Fix lockdep_assert_held() fail
locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
locking/rwsem: Reinit wake_q after use
locking/rwsem: Remove unnecessary atomic_long_t casts
jump_labels: Move header guard #endif down where it belongs
locking/atomic, kref: Implement kref_put_lock()
locking/ww_mutex: Turn off __must_check for now
locking/atomic, kref: Avoid more abuse
locking/atomic, kref: Use kref_get_unless_zero() more
locking/atomic, kref: Kill kref_sub()
locking/atomic, kref: Add kref_read()
locking/atomic, kref: Add KREF_INIT()
...
In commit:
659cf9f582 ("locking/ww_mutex: Optimize ww-mutexes by waking at most one waiter for backoff when acquiring the lock")
I replaced a comment with a lockdep_assert_held(). However it turns out
we hide that lock from lockdep for hysterical raisins, which results
in the assertion always firing.
Remove the old debug code as lockdep will easily spot the abuse it was
meant to catch, which will make the lock visible to lockdep and make
the assertion work as intended.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Haehnle <Nicolai.Haehnle@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 659cf9f582 ("locking/ww_mutex: Optimize ww-mutexes by waking at most one waiter for backoff when acquiring the lock")
Link: http://lkml.kernel.org/r/20170117150609.GB32474@worktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the following scenario, thread #1 should back off its attempt to lock
ww1 and unlock ww2 (assuming the acquire context stamps are ordered
accordingly).
    Thread #0                       Thread #1
    ---------                       ---------
                                    successfully lock ww2
    set ww1->base.owner
                                    attempt to lock ww1
                                    confirm ww1->ctx == NULL
                                    enter mutex_spin_on_owner
    set ww1->ctx

What was likely to happen previously is:

    attempt to lock ww2
    refuse to spin because
      ww2->ctx != NULL
    schedule()
                                    detect thread #0 is off CPU
                                    stop optimistic spin
                                    return -EDEADLK
                                    unlock ww2
                                    wakeup thread #0
    lock ww2

Now, we are more likely to see:

                                    detect ww1->ctx != NULL
                                    stop optimistic spin
                                    return -EDEADLK
                                    unlock ww2
    successfully lock ww2
... because thread #1 will stop its optimistic spin as soon as possible.
The whole scenario is quite unlikely, since it requires thread #1 to get
between thread #0 setting the owner and setting the ctx. But since we're
idling here anyway, the additional check is basically free.
Found by inspection.
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1482346000-9927-10-git-send-email-nhaehnle@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of inlining __mutex_lock_common() 5 times, once for each
{state,ww} variant, reduce this to two: ww and !ww.
Then add __always_inline to mutex_optimistic_spin(), so that that will
get inlined all 4 remaining times, for all {waiter,ww} variants.
   text    data     bss     dec     hex filename
   6301       0       0    6301    189d defconfig-build/kernel/locking/mutex.o
   4053       0       0    4053     fd5 defconfig-build/kernel/locking/mutex.o
   4257       0       0    4257    10a1 defconfig-build/kernel/locking/mutex.o
This reduces total text size and better separates the ww and !ww mutex
code generation.
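The general shape of such a split can be sketched in userspace C. This
is not the mutex code: all names below are hypothetical and the body is
a stand-in. It only shows the pattern of one always-inlined common
function that receives a constant flag from two out-of-line wrappers,
so that the compiler emits one specialized copy per wrapper. The
attributes used are GCC/Clang extensions.

```c
#include <stdbool.h>

#define ALWAYS_INLINE	inline __attribute__((__always_inline__))
#define NOINLINE	__attribute__((__noinline__))

/* Common path. use_ww is a compile-time constant at every call site,
 * so the untaken branch can be dropped from each inlined copy. */
static ALWAYS_INLINE int lock_common(int state, bool use_ww)
{
	int work = state;

	if (use_ww)
		work += 100;	/* stands in for the ww-only handling */
	return work;
}

/* Exactly two out-of-line instances: !ww and ww. */
static NOINLINE int lock_plain(int state)
{
	return lock_common(state, false);
}

static NOINLINE int lock_ww(int state)
{
	return lock_common(state, true);
}
```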
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add regular waiters in stamp order. Keep adding waiters that have no
context in FIFO order and take care not to starve them.
While adding our task as a waiter, back off if we detect that there is
a waiter with a lower stamp in front of us.
Make sure to call lock_contended even when we back off early.
For w/w mutexes, being first in the wait list is only stable when
taking the lock without a context. Therefore, the purpose of the first
flag is split into two: 'first' remains to indicate whether we want to
spin optimistically, while 'handoff' indicates that we should be
prepared to accept a handoff.
For w/w locking with a context, we always accept handoffs after the
first schedule(), to handle the following sequence of events:
1. Task #0 unlocks and hands off to Task #2 which is first in line
2. Task #1 adds itself in front of Task #2
3. Task #2 wakes up and must accept the handoff even though it is no
longer first in line
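The ordering rule described above can be modelled on a small array. This
is a simplified userspace sketch, not the kernel implementation: the
types, the fixed-size array and the add_waiter() name are made up, stamps
are compared with a plain ">", and backing off only when the new waiter
already holds other locks (acquired > 0) is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_WAITERS 16

struct waiter {
	int has_ctx;		/* 0: waiter without an acquire context */
	unsigned long stamp;	/* lower = older; valid only if has_ctx */
};

struct wait_list {
	struct waiter w[MAX_WAITERS];	/* w[0] is the front of the queue */
	size_t n;
};

/* Returns 0 on success, -1 if the list is full, -EDEADLK on back off. */
static int add_waiter(struct wait_list *l, struct waiter nw, int acquired)
{
	size_t pos = l->n;	/* default: FIFO, append at the tail */
	size_t i;

	if (l->n == MAX_WAITERS)
		return -1;

	if (nw.has_ctx) {
		/* Walk from the tail towards the front. */
		for (i = l->n; i-- > 0; ) {
			const struct waiter *cur = &l->w[i];

			if (!cur->has_ctx)
				continue;	/* context-less waiters keep their place */
			if (nw.stamp > cur->stamp) {
				/* An older waiter is in front of us. */
				if (acquired > 0)
					return -EDEADLK;
				break;
			}
			pos = i;	/* cur is younger: go in front of it */
		}
	}

	memmove(&l->w[pos + 1], &l->w[pos], (l->n - pos) * sizeof(l->w[0]));
	l->w[pos] = nw;
	l->n++;
	return 0;
}
```

Waiters without a context are always appended, so they are never pushed
back by their own kind; waiters with a context only move in front of
younger context waiters.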
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1482346000-9927-7-git-send-email-nhaehnle@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While reviewing the ww_mutex patches, I noticed that it was still
possible to (incorrectly) succeed for (incorrect) code like:
mutex_lock(&a);
mutex_lock(&a);
This was possible if the second mutex_lock() would block (as expected)
but then receive a spurious wakeup. At that point it would find itself
at the front of the queue, request a handoff and instantly claim
ownership and continue, since owner would point to itself.
Avoid this scenario and simplify the code by introducing a third low
bit to signal handoff pickup. So once we request handoff, unlock
clears the handoff bit and sets the pickup bit along with the new
owner.
This also removes the need for the .handoff argument to
__mutex_trylock(), since that becomes superfluous with PICKUP.
In order to guarantee enough low bits, ensure task_struct alignment is
at least L1_CACHE_BYTES (which seems a good idea regardless).
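A non-atomic userspace model of the owner word may help to follow the
handoff/pickup protocol described above. It is a sketch under stated
assumptions: the flag values, the helper names and the use of plain
integers as task "pointers" are illustrative, and the real code operates
on the word atomically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the owner word; task addresses must leave them clear. */
#define FLAG_WAITERS	((uintptr_t)0x01)
#define FLAG_HANDOFF	((uintptr_t)0x02)
#define FLAG_PICKUP	((uintptr_t)0x04)
#define FLAG_MASK	((uintptr_t)0x07)

typedef uintptr_t owner_word;

static uintptr_t owner_task(owner_word w)  { return w & ~FLAG_MASK; }
static uintptr_t owner_flags(owner_word w) { return w & FLAG_MASK; }

/* Unlock side: give the lock to next. Clears HANDOFF, sets PICKUP. */
static owner_word handoff_to(owner_word w, uintptr_t next)
{
	owner_word n = owner_flags(w) & FLAG_WAITERS;

	n |= next;
	if (next)
		n |= FLAG_PICKUP;
	return n;
}

/* Lock side: true if curr now owns the lock. */
static bool try_acquire(owner_word *w, uintptr_t curr)
{
	uintptr_t task = owner_task(*w);
	uintptr_t flags = owner_flags(*w);

	if (task) {
		/* Owned: only the designated task may pick it up. This
		 * also rejects mutex_lock(&a); mutex_lock(&a); because
		 * PICKUP is not set for a lock we simply hold. */
		if (task != curr || !(flags & FLAG_PICKUP))
			return false;
		flags &= ~FLAG_PICKUP;
	}
	flags &= ~FLAG_HANDOFF;
	*w = curr | flags;
	return true;
}
```

With a separate PICKUP bit, "owner points to me" is no longer enough to
claim the lock, which is what closes the spurious-wakeup hole.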
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9d659ae14b ("locking/mutex: Add lock handoff to avoid starvation")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a nasty interface and setting the state of a foreign task must
not be done. As of the following commit:
be628be095 ("bcache: Make gc wakeup sane, remove set_task_state()")
... everyone in the kernel calls set_task_state() with current, allowing
the helper to be removed.
However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64; however, this has
been addressed now [1] and performance is up to par, making no difference
with either call.
Of all the callers, if any, it's the locking bits that would care most
about this -- ie: we end up passing a tsk pointer to a lot of the lock
slowpath, and setting ->state on that. The following numbers are based
on two tests. First, a custom ad-hoc microbenchmark that just measures
latencies (for ~65 million calls) of set_task_state() vs
set_current_state().
Second, for a higher-level overview, an unlink microbenchmark was used,
which pounds on a single file with open, close, unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex or now rwsem.
[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
== 1. x86-64 ==
Avg runtime set_task_state(): 601 msecs
Avg runtime set_current_state(): 552 msecs
                                  vanilla                 dirty
Hmean    unlink1-processes-2     36089.26 (  0.00%)    38977.33 (  8.00%)
Hmean    unlink1-processes-5     28555.01 (  0.00%)    29832.55 (  4.28%)
Hmean    unlink1-processes-8     37323.75 (  0.00%)    44974.57 ( 20.50%)
Hmean    unlink1-processes-12    43571.88 (  0.00%)    44283.01 (  1.63%)
Hmean    unlink1-processes-21    34431.52 (  0.00%)    38284.45 ( 11.19%)
Hmean    unlink1-processes-30    34813.26 (  0.00%)    37975.17 (  9.08%)
Hmean    unlink1-processes-48    37048.90 (  0.00%)    39862.78 (  7.59%)
Hmean    unlink1-processes-79    35630.01 (  0.00%)    36855.30 (  3.44%)
Hmean    unlink1-processes-110   36115.85 (  0.00%)    39843.91 ( 10.32%)
Hmean    unlink1-processes-141   32546.96 (  0.00%)    35418.52 (  8.82%)
Hmean    unlink1-processes-172   34674.79 (  0.00%)    36899.21 (  6.42%)
Hmean    unlink1-processes-203   37303.11 (  0.00%)    36393.04 ( -2.44%)
Hmean    unlink1-processes-224   35712.13 (  0.00%)    36685.96 (  2.73%)
== 2. ppc64le ==
Avg runtime set_task_state(): 938 msecs
Avg runtime set_current_state(): 940 msecs
                                  vanilla                 dirty
Hmean    unlink1-processes-2     19269.19 (  0.00%)    30704.50 ( 59.35%)
Hmean    unlink1-processes-5     20106.15 (  0.00%)    21804.15 (  8.45%)
Hmean    unlink1-processes-8     17496.97 (  0.00%)    17243.28 ( -1.45%)
Hmean    unlink1-processes-12    14224.15 (  0.00%)    17240.21 ( 21.20%)
Hmean    unlink1-processes-21    14155.66 (  0.00%)    15681.23 ( 10.78%)
Hmean    unlink1-processes-30    14450.70 (  0.00%)    15995.83 ( 10.69%)
Hmean    unlink1-processes-48    16945.57 (  0.00%)    16370.42 ( -3.39%)
Hmean    unlink1-processes-79    15788.39 (  0.00%)    14639.27 ( -7.28%)
Hmean    unlink1-processes-110   14268.48 (  0.00%)    14377.40 (  0.76%)
Hmean    unlink1-processes-141   14023.65 (  0.00%)    16271.69 ( 16.03%)
Hmean    unlink1-processes-172   13417.62 (  0.00%)    16067.55 ( 19.75%)
Hmean    unlink1-processes-203   15293.08 (  0.00%)    15440.40 (  0.96%)
Hmean    unlink1-processes-234   13719.32 (  0.00%)    16190.74 ( 18.01%)
Hmean    unlink1-processes-265   16400.97 (  0.00%)    16115.22 ( -1.74%)
Hmean    unlink1-processes-296   14388.60 (  0.00%)    16216.13 ( 12.70%)
Hmean    unlink1-processes-320   15771.85 (  0.00%)    15905.96 (  0.85%)
x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms) does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs (but always within a 20 to 50ms increase); ppc was more constant.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the wake_q data structure is defined by the WAKE_Q() macro.
This macro, however, looks like a function doing something as "wake" is
a verb. Even checkpatch.pl was confused, as it reported warnings like:
WARNING: Missing a blank line after declarations
#548: FILE: kernel/futex.c:3665:
+ int ret;
+ WAKE_Q(wake_q);
This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
what the macro is doing and eliminates the checkpatch.pl warnings.
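For readers unfamiliar with the pattern, here is a userspace sketch of a
declaration-style macro of this kind. The structure layout and the
WAKE_Q_TAIL sentinel are modelled on the kernel's wake_q, but this is
only an approximation, and the wake_q_empty() and wake_q_push() helpers
are hypothetical names added for the example.

```c
/* A singly linked queue terminated by a non-NULL sentinel. */
struct wake_q_node {
	struct wake_q_node *next;
};

struct wake_q_head {
	struct wake_q_node *first;
	struct wake_q_node **lastp;
};

#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

/* Declares and initializes an empty queue called name. The DEFINE_
 * prefix makes it obvious that this is a declaration, not a call. */
#define DEFINE_WAKE_Q(name) \
	struct wake_q_head name = { WAKE_Q_TAIL, &name.first }

/* Hypothetical helper: true if nothing has been queued yet. */
static int wake_q_empty(const struct wake_q_head *head)
{
	return head->first == WAKE_Q_TAIL;
}

/* Hypothetical helper: append node at the end of the queue. */
static void wake_q_push(struct wake_q_head *head, struct wake_q_node *node)
{
	node->next = WAKE_Q_TAIL;
	*head->lastp = node;
	head->lastp = &node->next;
}
```

Because DEFINE_WAKE_Q(wake_q); reads as a declaration, it can sit with
the other local variable declarations at the top of a function, which is
the layout checkpatch.pl expects.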
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
[ Resolved conflict and added missing rename. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>