Changes in 5.10.38
KEYS: trusted: Fix memory leak on object td
tpm: fix error return code in tpm2_get_cc_attrs_tbl()
tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
tpm, tpm_tis: Reserve locality in tpm_tis_resume()
KVM: x86/mmu: Remove the defunct update_pte() paging hook
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
ACPI: PM: Add ACPI ID of Alder Lake Fan
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
kvm: Cap halt polling at kvm->max_halt_poll_ns
ath11k: fix thermal temperature read
fs: dlm: fix debugfs dump
fs: dlm: add errno handling to check callback
fs: dlm: check on minimum msglen size
fs: dlm: flush swork on shutdown
tipc: convert dest node's address to network order
ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
net: stmmac: Set FIFO sizes for ipq806x
ASoC: rsnd: core: Check convert rate in rsnd_hw_params
Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
i2c: bail out early when RDWR parameters are wrong
ALSA: hdsp: don't disable if not enabled
ALSA: hdspm: don't disable if not enabled
ALSA: rme9652: don't disable if not enabled
ALSA: bebob: enable to deliver MIDI messages for multiple ports
Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
Bluetooth: initialize skb_queue_head at l2cap_chan_create()
net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
net: bridge: when suppression is enabled exclude RARP packets
Bluetooth: check for zapped sk before connecting
selftests/powerpc: Fix L1D flushing tests for Power10
powerpc/32: Statically initialise first emergency context
net: hns3: remediate a potential overflow risk of bd_num_list
net: hns3: add handling for xmit skb with recursive fraglist
ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
ice: handle increasing Tx or Rx ring sizes
Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
i2c: Add I2C_AQ_NO_REP_START adapter quirk
MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
coresight: Do not scan for graph if none is present
IB/hfi1: Correct oversized ring allocation
mac80211: clear the beacon's CRC after channel switch
pinctrl: samsung: use 'int' for register masks in Exynos
rtw88: 8822c: add LC calibration for RTL8822C
mt76: mt7615: support loading EEPROM for MT7613BE
mt76: mt76x0: disable GTK offloading
mt76: mt7915: fix txpower init for TSSI off chips
fuse: invalidate attrs when page writeback completes
virtiofs: fix userns
cuse: prevent clone
iwlwifi: pcie: make cfg vs. trans_cfg more robust
powerpc/mm: Add cond_resched() while removing hpte mappings
ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
Revert "iommu/amd: Fix performance counter initialization"
iommu/amd: Remove performance counter pre-initialization test
drm/amd/display: Force vsync flip when reconfiguring MPCC
selftests: Set CC to clang in lib.mk if LLVM is set
kconfig: nconf: stop endless search loops
ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
powerpc/smp: Set numa node before updating mask
ASoC: rt286: Generalize support for ALC3263 codec
ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
samples/bpf: Fix broken tracex1 due to kprobe argument change
powerpc/pseries: Stop calling printk in rtas_stop_self()
drm/amd/display: fixed divide by zero kernel crash during dsc enablement
drm/amd/display: add handling for hdcp2 rx id list validation
drm/amdgpu: Add mem sync flag for IB allocated by SA
mt76: mt7615: fix entering driver-own state on mt7663
crypto: ccp: Free SEV device if SEV init fails
wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
powerpc/iommu: Annotate nested lock for lockdep
iavf: remove duplicate free resources calls
net: ethernet: mtk_eth_soc: fix RX VLAN offload
selftests: mlxsw: Increase the tolerance of backlog buildup
selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
kbuild: generate Module.symvers only when vmlinux exists
bnxt_en: Add PCI IDs for Hyper-V VF devices.
ia64: module: fix symbolizer crash on fdescr
watchdog: rename __touch_watchdog() to a better descriptive name
watchdog: explicitly update timestamp when reporting softlockup
watchdog/softlockup: remove logic that tried to prevent repeated reports
watchdog: fix barriers when printing backtraces from all CPUs
ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
f2fs: move ioctl interface definitions to separated file
f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
f2fs: fix to allow migrating fully valid segment
f2fs: fix panic during f2fs_resize_fs()
f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
PCI: Release OF node in pci_scan_device()'s error path
ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
f2fs: fix to align to section for fallocate() on pinned file
f2fs: fix to update last i_size if fallocate partially succeeds
PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
PCI: endpoint: Add helper API to get the 'next' unreserved BAR
PCI: endpoint: Make *_free_bar() to return error codes on failure
PCI: endpoint: Fix NULL pointer dereference for ->get_features()
f2fs: fix to avoid touching checkpointed data in get_victim()
f2fs: fix to cover __allocate_new_section() with curseg_lock
f2fs: Fix a hungtask problem in atomic write
f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
NFS: Deal correctly with attribute generation counter overflow
PCI: endpoint: Fix missing destroy_workqueue()
pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
NFSv4.2 fix handling of sr_eof in SEEK's reply
SUNRPC: Move fault injection call sites
SUNRPC: Remove trace_xprt_transmit_queued
SUNRPC: Handle major timeout in xprt_adjust_timeout()
thermal/drivers/tsens: Fix missing put_device error
NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
nfsd: ensure new clients break delegations
rtc: fsl-ftm-alarm: add MODULE_TABLE()
dmaengine: idxd: Fix potential null dereference on pointer status
dmaengine: idxd: fix dma device lifetime
dmaengine: idxd: fix cdev setup and free device lifetime issues
SUNRPC: fix ternary sign expansion bug in tracing
pwm: atmel: Fix duty cycle calculation in .get_state()
xprtrdma: Avoid Receive Queue wrapping
xprtrdma: Fix cwnd update ordering
xprtrdma: rpcrdma_mr_pop() already does list_del_init()
swiotlb: Fix the type of index
ceph: fix inode leak on getattr error in __fh_to_dentry
scsi: qla2xxx: Prevent PRLI in target mode
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Narrow down fast path in system suspend path
rtc: ds1307: Fix wday settings for rx8130
net: hns3: fix incorrect configuration for igu_egu_hw_err
net: hns3: initialize the message content in hclge_get_link_mode()
net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
net: hns3: fix for vxlan gpe tx checksum bug
net: hns3: use netif_tx_disable to stop the transmit queue
net: hns3: disable phy loopback setting in hclge_mac_start_phy
sctp: do asoc update earlier in sctp_sf_do_dupcook_a
RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
sunrpc: Fix misplaced barrier in call_decode
libbpf: Fix signed overflow in ringbuf_process_ring
block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
block/rnbd-clt: Check the return value of the function rtrs_clt_query
ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
netfilter: xt_SECMARK: add new revision to fix structure layout
xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/radeon: Avoid power table parsing memory leaks
arm64: entry: factor irq triage logic into macros
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
ksm: fix potential missing rmap_item for stable_node
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
ethtool: fix missing NLM_F_MULTI flag when dumping
net: fix nla_strcmp to handle more then one trailing null character
smc: disallow TCP_ULP in smc_setsockopt()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
netfilter: nftables: Fix a memleak from userdata error path in new objects
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251x: fix resume from sleep before interface was brought up
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
sched: Fix out-of-bound access in uclamp
sched/fair: Fix unfairness caused by missing load decay
fs/proc/generic.c: fix incorrect pde_is_permanent check
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
netfilter: nftables: avoid overflows in nft_hash_buckets()
i40e: fix broken XDP support
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
mptcp: fix splat when closing unaccepted socket
f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
ARC: entry: fix off-by-one error in syscall number validation
ARC: mm: PAE: use 40-bit physical page mask
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
powerpc/64s: Fix crashes when toggling stf barrier
powerpc/64s: Fix crashes when toggling entry flush barrier
hfsplus: prevent corruption in shrinking truncate
squashfs: fix divide error in calculate_skip()
userfaultfd: release page in error path to avoid BUG_ON
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
blk-iocost: fix weight updates of inner active iocgs
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
btrfs: fix race leading to unpersisted data and metadata on fsync
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/i915: Avoid div-by-zero on gen2
kvm: exit halt polling on need_resched() as well
KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
drm/msm/dp: initialize audio_comp when audio starts
KVM: x86: Cancel pvclock_gtod_work on module removal
KVM: x86: Prevent deadlock against tk_core.seq
dax: Add an enum for specifying dax wakup mode
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Wake up all waiters after invalidating dax entry
xen/unpopulated-alloc: consolidate pgmap manipulation
xen/unpopulated-alloc: fix error return code in fill_list()
perf tools: Fix dynamic libbpf link
usb: dwc3: gadget: Free gadget structure only after freeing endpoints
iio: light: gp2ap002: Fix rumtime PM imbalance on error
iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
usb: fotg210-hcd: Fix an error message
hwmon: (occ) Fix poll rate limiting
usb: musb: Fix an error message
ACPI: scan: Fix a memory leak in an error handling path
kyber: fix out of bounds access when preempted
nvmet: add lba to sect conversion helpers
nvmet: fix inline bio check for bdev-ns
nvmet-rdma: Fix NULL deref when SEND is completed with error
f2fs: compress: fix to free compress page correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to assign cc.cluster_idx correctly
nbd: Fix NULL pointer in flush_workqueue
blk-mq: plug request for shared sbitmap
blk-mq: Swap two calls in blk_mq_exit_queue()
usb: dwc3: omap: improve extcon initialization
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
usb: xhci: Increase timeout for HC halt
usb: dwc2: Fix gadget DMA unmap direction
usb: core: hub: fix race condition about TRSMRCY of resume
usb: dwc3: gadget: Enable suspend events
usb: dwc3: gadget: Return success always for kick transfer in ep queue
usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
usb: typec: ucsi: Put fwnode in any case during ->probe()
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
xhci: Do not use GFP_KERNEL in (potentially) atomic context
xhci: Add reset resume quirk for AMD xhci controller.
iio: gyro: mpu3050: Fix reported temperature value
iio: tsl2583: Fix division by a zero lux_val
cdc-wdm: untangle a circular dependency between callback and softint
xen/gntdev: fix gntdev_mmap() error exit path
KVM: x86: Emulate RDPID only if RDTSCP is supported
KVM: x86: Move RDPID emulation intercept to its own enum
KVM: nVMX: Always make an attempt to map eVMCS after migration
KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
KVM: VMX: Disable preemption when probing user return MSRs
Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
iommu/vt-d: Remove WO permissions on second-level paging entries
mm: fix struct page layout on 32-bit systems
MIPS: Reinstate platform `__div64_32' handler
MIPS: Avoid DIVU in `__div64_32' is result would be zero
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
ARM: 9012/1: move device tree mapping out of linear region
ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
usb: typec: tcpm: Fix error while calculating PPS out values
kobject_uevent: remove warning in init_uevent_argv()
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
drm/i915/overlay: Fix active retire callback alignment
drm/i915: Fix crash in auto_retire
clk: exynos7: Mark aclk_fsys1_200 as critical
media: rkvdec: Remove of_match_ptr()
i2c: mediatek: Fix send master code at more than 1MHz
dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
dt-bindings: serial: 8250: Remove duplicated compatible strings
debugfs: Make debugfs_allow RO after init
ext4: fix debug format string warning
nvme: do not try to reconfigure APST when the controller is not live
ASoC: rsnd: check all BUSIF status when error
Linux 5.10.38
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65
[ Upstream commit 1bc503cb4a2638fb1c57801a7796aca57845ce63 ]
The softlockup detector does some gymnastic with the variable
soft_watchdog_warn. It was added by the commit 58687acba5
("lockup_detector: Combine nmi_watchdog and softlockup detector").
The purpose is not completely clear. There are the following clues. They
describe the situation how it looked after the above mentioned commit:
1. The variable was checked with a comment "only warn once".
2. The variable was set when softlockup was reported. It was cleared
only when the CPU was not longer in the softlockup state.
3. watchdog_touch_ts was not explicitly updated when the softlockup
was reported. Without this variable, the report would normally
be printed again during every following watchdog_timer_fn()
invocation.
The logic has got even more tangled up by the commit ed235875e2
("kernel/watchdog.c: print traces for all cpus on lockup detection").
After this commit, soft_watchdog_warn is set only when
softlockup_all_cpu_backtrace is enabled. But multiple reports from all
CPUs are prevented by a new variable soft_lockup_nmi_warn.
Conclusion:
The variable probably never worked as intended. In each case, it has not
worked last many years because the softlockup was reported repeatedly
after the full period defined by watchdog_thresh.
The reason is that watchdog gets touched in many known slow paths, for
example, in printk_stack_address(). This code is called also when
printing the softlockup report. It means that the watchdog timestamp gets
updated after each report.
Solution:
Simply remove the logic. People want the periodic report anyway.
Link: https://lkml.kernel.org/r/20210311122130.6788-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c9ad17c991492f4390f42598f6ab0531f87eed07 ]
The softlockup situation might stay for a long time or even forever. When
it happens, the softlockup debug messages are printed in regular intervals
defined by get_softlockup_thresh().
There is a mystery. The repeated message is printed after the full
interval that is defined by get_softlockup_thresh(). But the timer
callback is called more often as defined by sample_period. The code looks
like the soflockup should get reported in every sample_period when it was
once behind the thresh.
It works only by chance. The watchdog is touched when printing the stall
report, for example, in printk_stack_address().
Make the behavior clear and predictable by explicitly updating the
timestamp in watchdog_timer_fn() when the report gets printed.
Link: https://lkml.kernel.org/r/20210311122130.6788-3-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c0012f522c802d25be102bafe54f333168e6119 ]
Patch series "watchdog/softlockup: Report overall time and some cleanup", v2.
I dug deep into the softlockup watchdog history when time permitted this
year. And reworked the patchset that fixed timestamps and cleaned up the
code[2].
I split it into very small steps and did even more code clean up. The
result looks quite strightforward and I am pretty confident with the
changes.
[1] v2: https://lore.kernel.org/r/20201210160038.31441-1-pmladek@suse.com
[2] v1: https://lore.kernel.org/r/20191024114928.15377-1-pmladek@suse.com
This patch (of 6):
There are many touch_*watchdog() functions. They are called in situations
where the watchdog could report false positives or create unnecessary
noise. For example, when CPU is entering idle mode, a virtual machine is
stopped, or a lot of messages are printed in the atomic context.
These functions set SOFTLOCKUP_RESET instead of a real timestamp. It
allows to call them even in a context where jiffies might be outdated.
For example, in an atomic context.
The real timestamp is set by __touch_watchdog() that is called from the
watchdog timer callback.
Rename this callback to update_touch_ts(). It better describes the effect
and clearly distinguish is from the other touch_*watchdog() functions.
Another motivation is that two timestamps are going to be used. One will
be used for the total softlockup time. The other will be used to measure
time since the last report. The new function name will help to
distinguish which timestamp is being updated.
Link: https://lkml.kernel.org/r/20210311122130.6788-1-pmladek@suse.com
Link: https://lkml.kernel.org/r/20210311122130.6788-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Add hook to gather data of softlockup and summarize it with
other information.
Bug: 177483057
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I42b906f17ad689176f0cc5a1a46acd0b5971d6c5
After a recent change introduced by Vlastimil's series [0], kernel is
able now to handle sysctl parameters on kernel command line; also, the
series introduced a simple infrastructure to convert legacy boot
parameters (that duplicate sysctls) into sysctl aliases.
This patch converts the watchdog parameters softlockup_panic and
{hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
It fixes the documentation too, since the alias only accepts values 0 or
1, not the full range of integers.
We also took the opportunity here to improve the documentation of the
previously converted hung_task_panic (see the patch series [0]) and put
the alias table in alphabetical order.
[0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.
As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.
The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.
Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.
Reported-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de
commit 9cf57731b6 ("watchdog/softlockup: Replace "watchdog/%u" threads
with cpu_stop_work") ensures that the watchdog is reliably touched during
a task switch.
As a result the check for an unnoticed task switch is not longer needed.
Remove the relevant code, which effectively reverts commit b1a8de1f53
("softlockup: make detector be aware of task switch of processes hogging
cpu")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20191024114928.15377-2-pmladek@suse.com
sparse complains:
CHECK kernel/watchdog.c
kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
was not declared. Should it be static?
kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
was not declared. Should it be static?
They're not referenced by name from anyplace else, make them static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
The hard and soft lockup detector threshold has a default value of 10
seconds which can only be changed via sysctl.
During early boot lockup detection can trigger when noisy debugging emits
a large amount of messages to the console, but there is no way to set a
larger threshold on the kernel command line. The detector can only be
completely disabled.
Add a new watchdog_thresh= command line parameter to allow boot time
control over the threshold. It works in the same way as the sysctl and
affects both the soft and the hard lockup detectors.
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rdunlap@infradead.org
Cc: prarit@redhat.com
Link: https://lkml.kernel.org/r/1541079018-13953-1-git-send-email-loberman@redhat.com
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.
Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations. This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.
Fix it by adding the necessary notrace annotations.
Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
Oleg suggested to replace the "watchdog/%u" threads with
cpu_stop_work. That removes one thread per CPU while at the same time
fixes softlockup vs SCHED_DEADLINE.
But more importantly, it does away with the single
smpboot_update_cpumask_percpu_thread() user, which allows
cleanups/shrinkage of the smpboot interface.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function names made sense up to the point where the watchdog
(re)configuration was unified to use softlockup_reconfigure_threads() for
all configuration purposes. But that includes scenarios which solely
configure the nmi watchdog.
Rename softlockup_reconfigure_threads() and softlockup_init_threads() so
the function names match the functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.
The first call is via:
start_wd_on_cpu+0x80/0x2f0
watchdog_nmi_reconfigure+0x124/0x170
softlockup_reconfigure_threads+0x110/0x130
lockup_detector_init+0xbc/0xe0
kernel_init_freeable+0x18c/0x37c
kernel_init+0x2c/0x160
ret_from_kernel_thread+0x5c/0xbc
And then again via the CPU hotplug registration:
start_wd_on_cpu+0x80/0x2f0
cpuhp_invoke_callback+0x194/0x620
cpuhp_thread_fun+0x7c/0x1b0
smpboot_thread_fn+0x290/0x2a0
kthread+0x168/0x1b0
ret_from_kernel_thread+0x5c/0xbc
This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().
Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Aside of that the code implements the park/unpark facility of smp hotplug
threads on its own, which is even more pointless. We have functionality in
the smpboot thread code to do so.
Use the new thread management functions and get rid of the unholy mess.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.470370113@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Another horrible mechanism are the open coded park/unpark loops which are
used for reconfiguration of the watchdog. The smpboot infrastructure allows
exactly the same via smpboot_update_cpumask_thread_percpu(), which is cpu
hotplug safe. Using that instead of the open coded loops allows to get rid
of the hotplug locking mess in the watchdog code.
Implement a clean infrastructure which allows to replace the open coded
nonsense.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.377182587@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
b94f51183b ("kernel/watchdog: prevent false hardlockup on overloaded system")
tries to fix the following issue:
proc_write()
set_sample_period() <--- New sample period becoms visible
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
thread->park()
disable_nmi()
<----- Broken ends
The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible for the hrtimer function which
uses that value to rearm the timer. But the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered then the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired due to the longer period.
The patch 'fixed' this by adding a variable:
proc_write()
set_sample_period()
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
park_in_progress = 1
<----- Broken ends
nmi_watchdog()
if (park_in_progress)
return;
The only effect of this variable was to make the window where the breakage
can hit small enough that it was not longer observable in testing. From a
correctness point of view it is a pointless bandaid which merily papers
over the root cause: the unsychronized update of the variable.
Looking deeper into the related code pathes unearthed similar problems in
the watchdog_start()/stop() functions.
watchdog_start()
perf_nmi_event_start()
hrtimer_start()
watchdog_stop()
hrtimer_cancel()
perf_nmi_event_stop()
In both cases the call order is wrong because if the tasks gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.
Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.
Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.
Address the start/stop() ordering problem as well by reverting the call
order, so this part is at least correct now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following deadlock is possible in the watchdog hotplug code:
cpus_write_lock()
...
takedown_cpu()
smpboot_park_threads()
smpboot_park_thread()
kthread_park()
->park() := watchdog_disable()
watchdog_nmi_disable()
perf_event_release_kernel();
put_event()
_free_event()
->destroy() := hw_perf_event_destroy()
x86_release_hardware()
release_ds_buffers()
get_online_cpus()
when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.
To solve this add a deferring mechanism:
cpus_write_lock()
kthread_park()
watchdog_nmi_disable(deferred)
perf_event_disable(event);
move_event_to_deferred(event);
....
cpus_write_unlock()
cleaup_deferred_events()
perf_event_release_kernel()
This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.
This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.
Analyzed-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The self disabling feature is broken vs. CPU hotplug locking:
CPU 0 CPU 1
cpus_write_lock();
cpu_up(1)
wait_for_completion()
....
unpark_watchdog()
->unpark()
perf_event_create() <- fails
watchdog_enable &= ~NMI_WATCHDOG;
....
cpus_write_unlock();
CPU 2
cpus_write_lock()
cpu_down(2)
wait_for_completion()
wakeup(watchdog);
watchdog()
if (!(watchdog_enable & NMI_WATCHDOG))
watchdog_nmi_disable()
perf_event_disable()
....
cpus_read_lock();
stop_smpboot_threads()
park_watchdog();
wait_for_completion(watchdog->parked);
Result: End of hotplug and instantaneous full lockup of the machine.
There is a similar problem with disabling the watchdog via the user space
interface as the sysctl function fiddles with watchdog_enable directly.
It's very debatable whether this is required at all. If the watchdog works
nicely on N CPUs and it fails to enable on the N + 1 CPU either during
hotplug or because the user space interface disabled it via sysctl cpumask
and then some perf user grabbed the counter which is then unavailable for
the watchdog when the sysctl cpumask gets changed back.
There is no real justification for this.
One of the reasons WHY this is done is the utter stupidity of the init code
of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
is available and functional at all, it just does this check at run time
over and over when user space fiddles with the sysctl. That's broken beyond
repair along with the idiotic error code dependent warn level printks and
the even more silly printk rate limiting.
If the init code checks whether perf works at boot time, then this mess can
be more or less avoided completely. Perf does not come magically into life
at runtime. Brain usage while coding is overrated.
Remove the cruft and add a temporary safe guard which gets removed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>