Commit Graph

189 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
76002c201f Merge 5.10.38 into android12-5.10
Changes in 5.10.38
	KEYS: trusted: Fix memory leak on object td
	tpm: fix error return code in tpm2_get_cc_attrs_tbl()
	tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
	tpm, tpm_tis: Reserve locality in tpm_tis_resume()
	KVM: x86/mmu: Remove the defunct update_pte() paging hook
	KVM/VMX: Invoke NMI non-IST entry instead of IST entry
	ACPI: PM: Add ACPI ID of Alder Lake Fan
	PM: runtime: Fix unpaired parent child_count for force_resume
	cpufreq: intel_pstate: Use HWP if enabled by platform firmware
	kvm: Cap halt polling at kvm->max_halt_poll_ns
	ath11k: fix thermal temperature read
	fs: dlm: fix debugfs dump
	fs: dlm: add errno handling to check callback
	fs: dlm: check on minimum msglen size
	fs: dlm: flush swork on shutdown
	tipc: convert dest node's address to network order
	ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
	net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
	net: stmmac: Set FIFO sizes for ipq806x
	ASoC: rsnd: core: Check convert rate in rsnd_hw_params
	Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
	i2c: bail out early when RDWR parameters are wrong
	ALSA: hdsp: don't disable if not enabled
	ALSA: hdspm: don't disable if not enabled
	ALSA: rme9652: don't disable if not enabled
	ALSA: bebob: enable to deliver MIDI messages for multiple ports
	Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
	Bluetooth: initialize skb_queue_head at l2cap_chan_create()
	net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
	net: bridge: when suppression is enabled exclude RARP packets
	Bluetooth: check for zapped sk before connecting
	selftests/powerpc: Fix L1D flushing tests for Power10
	powerpc/32: Statically initialise first emergency context
	net: hns3: remediate a potential overflow risk of bd_num_list
	net: hns3: add handling for xmit skb with recursive fraglist
	ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
	ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
	ice: handle increasing Tx or Rx ring sizes
	Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
	ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
	i2c: Add I2C_AQ_NO_REP_START adapter quirk
	MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
	coresight: Do not scan for graph if none is present
	IB/hfi1: Correct oversized ring allocation
	mac80211: clear the beacon's CRC after channel switch
	pinctrl: samsung: use 'int' for register masks in Exynos
	rtw88: 8822c: add LC calibration for RTL8822C
	mt76: mt7615: support loading EEPROM for MT7613BE
	mt76: mt76x0: disable GTK offloading
	mt76: mt7915: fix txpower init for TSSI off chips
	fuse: invalidate attrs when page writeback completes
	virtiofs: fix userns
	cuse: prevent clone
	iwlwifi: pcie: make cfg vs. trans_cfg more robust
	powerpc/mm: Add cond_resched() while removing hpte mappings
	ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
	Revert "iommu/amd: Fix performance counter initialization"
	iommu/amd: Remove performance counter pre-initialization test
	drm/amd/display: Force vsync flip when reconfiguring MPCC
	selftests: Set CC to clang in lib.mk if LLVM is set
	kconfig: nconf: stop endless search loops
	ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
	ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
	ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
	sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
	flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
	powerpc/smp: Set numa node before updating mask
	ASoC: rt286: Generalize support for ALC3263 codec
	ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
	net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
	samples/bpf: Fix broken tracex1 due to kprobe argument change
	powerpc/pseries: Stop calling printk in rtas_stop_self()
	drm/amd/display: fixed divide by zero kernel crash during dsc enablement
	drm/amd/display: add handling for hdcp2 rx id list validation
	drm/amdgpu: Add mem sync flag for IB allocated by SA
	mt76: mt7615: fix entering driver-own state on mt7663
	crypto: ccp: Free SEV device if SEV init fails
	wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
	wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
	qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
	powerpc/iommu: Annotate nested lock for lockdep
	iavf: remove duplicate free resources calls
	net: ethernet: mtk_eth_soc: fix RX VLAN offload
	selftests: mlxsw: Increase the tolerance of backlog buildup
	selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
	kbuild: generate Module.symvers only when vmlinux exists
	bnxt_en: Add PCI IDs for Hyper-V VF devices.
	ia64: module: fix symbolizer crash on fdescr
	watchdog: rename __touch_watchdog() to a better descriptive name
	watchdog: explicitly update timestamp when reporting softlockup
	watchdog/softlockup: remove logic that tried to prevent repeated reports
	watchdog: fix barriers when printing backtraces from all CPUs
	ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
	thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
	f2fs: move ioctl interface definitions to separated file
	f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
	f2fs: fix to allow migrating fully valid segment
	f2fs: fix panic during f2fs_resize_fs()
	f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
	remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
	remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
	PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
	PCI: Release OF node in pci_scan_device()'s error path
	ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
	f2fs: fix to align to section for fallocate() on pinned file
	f2fs: fix to update last i_size if fallocate partially succeeds
	PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
	PCI: endpoint: Add helper API to get the 'next' unreserved BAR
	PCI: endpoint: Make *_free_bar() to return error codes on failure
	PCI: endpoint: Fix NULL pointer dereference for ->get_features()
	f2fs: fix to avoid touching checkpointed data in get_victim()
	f2fs: fix to cover __allocate_new_section() with curseg_lock
	f2fs: Fix a hungtask problem in atomic write
	f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
	rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
	NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
	NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
	NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
	NFS: Deal correctly with attribute generation counter overflow
	PCI: endpoint: Fix missing destroy_workqueue()
	pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
	NFSv4.2 fix handling of sr_eof in SEEK's reply
	SUNRPC: Move fault injection call sites
	SUNRPC: Remove trace_xprt_transmit_queued
	SUNRPC: Handle major timeout in xprt_adjust_timeout()
	thermal/drivers/tsens: Fix missing put_device error
	NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
	nfsd: ensure new clients break delegations
	rtc: fsl-ftm-alarm: add MODULE_TABLE()
	dmaengine: idxd: Fix potential null dereference on pointer status
	dmaengine: idxd: fix dma device lifetime
	dmaengine: idxd: fix cdev setup and free device lifetime issues
	SUNRPC: fix ternary sign expansion bug in tracing
	pwm: atmel: Fix duty cycle calculation in .get_state()
	xprtrdma: Avoid Receive Queue wrapping
	xprtrdma: Fix cwnd update ordering
	xprtrdma: rpcrdma_mr_pop() already does list_del_init()
	swiotlb: Fix the type of index
	ceph: fix inode leak on getattr error in __fh_to_dentry
	scsi: qla2xxx: Prevent PRLI in target mode
	scsi: ufs: core: Do not put UFS power into LPM if link is broken
	scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
	scsi: ufs: core: Narrow down fast path in system suspend path
	rtc: ds1307: Fix wday settings for rx8130
	net: hns3: fix incorrect configuration for igu_egu_hw_err
	net: hns3: initialize the message content in hclge_get_link_mode()
	net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
	net: hns3: fix for vxlan gpe tx checksum bug
	net: hns3: use netif_tx_disable to stop the transmit queue
	net: hns3: disable phy loopback setting in hclge_mac_start_phy
	sctp: do asoc update earlier in sctp_sf_do_dupcook_a
	RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
	sunrpc: Fix misplaced barrier in call_decode
	libbpf: Fix signed overflow in ringbuf_process_ring
	block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
	block/rnbd-clt: Check the return value of the function rtrs_clt_query
	ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
	sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
	netfilter: xt_SECMARK: add new revision to fix structure layout
	xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
	net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
	drm/radeon: Fix off-by-one power_state index heap overwrite
	drm/radeon: Avoid power table parsing memory leaks
	arm64: entry: factor irq triage logic into macros
	arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
	khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
	mm/hugeltb: handle the error case in hugetlb_fix_reserve_counts()
	mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
	ksm: fix potential missing rmap_item for stable_node
	mm/gup: check every subpage of a compound page during isolation
	mm/gup: return an error on migration failure
	mm/gup: check for isolation errors
	ethtool: fix missing NLM_F_MULTI flag when dumping
	net: fix nla_strcmp to handle more then one trailing null character
	smc: disallow TCP_ULP in smc_setsockopt()
	netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
	netfilter: nftables: Fix a memleak from userdata error path in new objects
	can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
	can: mcp251x: fix resume from sleep before interface was brought up
	can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
	sched: Fix out-of-bound access in uclamp
	sched/fair: Fix unfairness caused by missing load decay
	fs/proc/generic.c: fix incorrect pde_is_permanent check
	kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
	kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
	kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
	netfilter: nftables: avoid overflows in nft_hash_buckets()
	i40e: fix broken XDP support
	i40e: Fix use-after-free in i40e_client_subtask()
	i40e: fix the restart auto-negotiation after FEC modified
	i40e: Fix PHY type identifiers for 2.5G and 5G adapters
	mptcp: fix splat when closing unaccepted socket
	f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
	ARC: entry: fix off-by-one error in syscall number validation
	ARC: mm: PAE: use 40-bit physical page mask
	ARC: mm: Use max_high_pfn as a HIGHMEM zone border
	powerpc/64s: Fix crashes when toggling stf barrier
	powerpc/64s: Fix crashes when toggling entry flush barrier
	hfsplus: prevent corruption in shrinking truncate
	squashfs: fix divide error in calculate_skip()
	userfaultfd: release page in error path to avoid BUG_ON
	kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
	mm/hugetlb: fix F_SEAL_FUTURE_WRITE
	blk-iocost: fix weight updates of inner active iocgs
	arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
	arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
	btrfs: fix race leading to unpersisted data and metadata on fsync
	drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
	drm/amd/display: Initialize attribute for hdcp_srm sysfs file
	drm/i915: Avoid div-by-zero on gen2
	kvm: exit halt polling on need_resched() as well
	KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
	drm/msm/dp: initialize audio_comp when audio starts
	KVM: x86: Cancel pvclock_gtod_work on module removal
	KVM: x86: Prevent deadlock against tk_core.seq
	dax: Add an enum for specifying dax wakup mode
	dax: Add a wakeup mode parameter to put_unlocked_entry()
	dax: Wake up all waiters after invalidating dax entry
	xen/unpopulated-alloc: consolidate pgmap manipulation
	xen/unpopulated-alloc: fix error return code in fill_list()
	perf tools: Fix dynamic libbpf link
	usb: dwc3: gadget: Free gadget structure only after freeing endpoints
	iio: light: gp2ap002: Fix rumtime PM imbalance on error
	iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
	iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
	usb: fotg210-hcd: Fix an error message
	hwmon: (occ) Fix poll rate limiting
	usb: musb: Fix an error message
	ACPI: scan: Fix a memory leak in an error handling path
	kyber: fix out of bounds access when preempted
	nvmet: add lba to sect conversion helpers
	nvmet: fix inline bio check for bdev-ns
	nvmet-rdma: Fix NULL deref when SEND is completed with error
	f2fs: compress: fix to free compress page correctly
	f2fs: compress: fix race condition of overwrite vs truncate
	f2fs: compress: fix to assign cc.cluster_idx correctly
	nbd: Fix NULL pointer in flush_workqueue
	blk-mq: plug request for shared sbitmap
	blk-mq: Swap two calls in blk_mq_exit_queue()
	usb: dwc3: omap: improve extcon initialization
	usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
	usb: xhci: Increase timeout for HC halt
	usb: dwc2: Fix gadget DMA unmap direction
	usb: core: hub: fix race condition about TRSMRCY of resume
	usb: dwc3: gadget: Enable suspend events
	usb: dwc3: gadget: Return success always for kick transfer in ep queue
	usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
	usb: typec: ucsi: Put fwnode in any case during ->probe()
	xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
	xhci: Do not use GFP_KERNEL in (potentially) atomic context
	xhci: Add reset resume quirk for AMD xhci controller.
	iio: gyro: mpu3050: Fix reported temperature value
	iio: tsl2583: Fix division by a zero lux_val
	cdc-wdm: untangle a circular dependency between callback and softint
	xen/gntdev: fix gntdev_mmap() error exit path
	KVM: x86: Emulate RDPID only if RDTSCP is supported
	KVM: x86: Move RDPID emulation intercept to its own enum
	KVM: nVMX: Always make an attempt to map eVMCS after migration
	KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
	KVM: VMX: Disable preemption when probing user return MSRs
	Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
	Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
	iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
	iommu/vt-d: Remove WO permissions on second-level paging entries
	mm: fix struct page layout on 32-bit systems
	MIPS: Reinstate platform `__div64_32' handler
	MIPS: Avoid DIVU in `__div64_32' is result would be zero
	MIPS: Avoid handcoded DIVU in `__div64_32' altogether
	clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
	clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
	ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
	ARM: 9012/1: move device tree mapping out of linear region
	ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
	ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
	usb: typec: tcpm: Fix error while calculating PPS out values
	kobject_uevent: remove warning in init_uevent_argv()
	drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
	drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
	drm/i915/overlay: Fix active retire callback alignment
	drm/i915: Fix crash in auto_retire
	clk: exynos7: Mark aclk_fsys1_200 as critical
	media: rkvdec: Remove of_match_ptr()
	i2c: mediatek: Fix send master code at more than 1MHz
	dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
	dt-bindings: serial: 8250: Remove duplicated compatible strings
	debugfs: Make debugfs_allow RO after init
	ext4: fix debug format string warning
	nvme: do not try to reconfigure APST when the controller is not live
	ASoC: rsnd: check all BUSIF status when error
	Linux 5.10.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia32e01283b488a38be48015c58a0e481f09aaf65
2021-05-20 15:35:25 +02:00
Petr Mladek
5b66867966 watchdog: fix barriers when printing backtraces from all CPUs
[ Upstream commit 9f113bf760ca90d709f8f89a733d10abb1f04a83 ]

Any parallel softlockup reports are skipped when one CPU is already
printing backtraces from all CPUs.

The exclusive rights are synchronized using one bit in
soft_lockup_nmi_warn.  There is also one memory barrier that does not make
much sense.

Use two barriers on the right location to prevent mixing two reports.

[pmladek@suse.com: use bit lock operations to prevent multiple soft-lockup reports]
  Link: https://lkml.kernel.org/r/YFSVsLGVWMXTvlbk@alley

Link: https://lkml.kernel.org/r/20210311122130.6788-6-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:00 +02:00
Petr Mladek
a68c246065 watchdog/softlockup: remove logic that tried to prevent repeated reports
[ Upstream commit 1bc503cb4a2638fb1c57801a7796aca57845ce63 ]

The softlockup detector does some gymnastic with the variable
soft_watchdog_warn.  It was added by the commit 58687acba5
("lockup_detector: Combine nmi_watchdog and softlockup detector").

The purpose is not completely clear.  There are the following clues.  They
describe the situation how it looked after the above mentioned commit:

  1. The variable was checked with a comment "only warn once".

  2. The variable was set when softlockup was reported. It was cleared
     only when the CPU was not longer in the softlockup state.

  3. watchdog_touch_ts was not explicitly updated when the softlockup
     was reported. Without this variable, the report would normally
     be printed again during every following watchdog_timer_fn()
     invocation.

The logic has got even more tangled up by the commit ed235875e2
("kernel/watchdog.c: print traces for all cpus on lockup detection").
After this commit, soft_watchdog_warn is set only when
softlockup_all_cpu_backtrace is enabled.  But multiple reports from all
CPUs are prevented by a new variable soft_lockup_nmi_warn.

Conclusion:

The variable probably never worked as intended.  In each case, it has not
worked last many years because the softlockup was reported repeatedly
after the full period defined by watchdog_thresh.

The reason is that watchdog gets touched in many known slow paths, for
example, in printk_stack_address().  This code is called also when
printing the softlockup report.  It means that the watchdog timestamp gets
updated after each report.

Solution:

Simply remove the logic. People want the periodic report anyway.

Link: https://lkml.kernel.org/r/20210311122130.6788-5-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:13:00 +02:00
Petr Mladek
9413b1ee38 watchdog: explicitly update timestamp when reporting softlockup
[ Upstream commit c9ad17c991492f4390f42598f6ab0531f87eed07 ]

The softlockup situation might stay for a long time or even forever.  When
it happens, the softlockup debug messages are printed in regular intervals
defined by get_softlockup_thresh().

There is a mystery.  The repeated message is printed after the full
interval that is defined by get_softlockup_thresh().  But the timer
callback is called more often as defined by sample_period.  The code looks
like the soflockup should get reported in every sample_period when it was
once behind the thresh.

It works only by chance.  The watchdog is touched when printing the stall
report, for example, in printk_stack_address().

Make the behavior clear and predictable by explicitly updating the
timestamp in watchdog_timer_fn() when the report gets printed.

Link: https://lkml.kernel.org/r/20210311122130.6788-3-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:12:59 +02:00
Petr Mladek
018655f875 watchdog: rename __touch_watchdog() to a better descriptive name
[ Upstream commit 7c0012f522c802d25be102bafe54f333168e6119 ]

Patch series "watchdog/softlockup: Report overall time and some cleanup", v2.

I dug deep into the softlockup watchdog history when time permitted this
year.  And reworked the patchset that fixed timestamps and cleaned up the
code[2].

I split it into very small steps and did even more code clean up.  The
result looks quite strightforward and I am pretty confident with the
changes.

[1] v2: https://lore.kernel.org/r/20201210160038.31441-1-pmladek@suse.com
[2] v1: https://lore.kernel.org/r/20191024114928.15377-1-pmladek@suse.com

This patch (of 6):

There are many touch_*watchdog() functions.  They are called in situations
where the watchdog could report false positives or create unnecessary
noise.  For example, when CPU is entering idle mode, a virtual machine is
stopped, or a lot of messages are printed in the atomic context.

These functions set SOFTLOCKUP_RESET instead of a real timestamp.  It
allows to call them even in a context where jiffies might be outdated.
For example, in an atomic context.

The real timestamp is set by __touch_watchdog() that is called from the
watchdog timer callback.

Rename this callback to update_touch_ts().  It better describes the effect
and clearly distinguish is from the other touch_*watchdog() functions.

Another motivation is that two timestamps are going to be used.  One will
be used for the total softlockup time.  The other will be used to measure
time since the last report.  The new function name will help to
distinguish which timestamp is being updated.

Link: https://lkml.kernel.org/r/20210311122130.6788-1-pmladek@suse.com
Link: https://lkml.kernel.org/r/20210311122130.6788-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19 10:12:59 +02:00
Sangmoon Kim
b127af3af9 ANDROID: softlockup: add vendor hook for a softlockup task
Add hook to gather data of softlockup and summarize it with
other information.

Bug: 177483057

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Change-Id: I42b906f17ad689176f0cc5a1a46acd0b5971d6c5
2021-02-04 23:55:45 +00:00
Santosh Sivaraj
e7e046155a kernel/watchdog: fix watchdog_allowed_mask not used warning
Define watchdog_allowed_mask only when SOFTLOCKUP_DETECTOR is enabled.

Fixes: 7feeb9cd4f ("watchdog/sysctl: Clean up sysctl variable name space")
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20201106015025.1281561-1-santosh@fossix.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-14 11:26:03 -08:00
Guilherme G. Piccoli
f117955a22 kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases
After a recent change introduced by Vlastimil's series [0], kernel is
able now to handle sysctl parameters on kernel command line; also, the
series introduced a simple infrastructure to convert legacy boot
parameters (that duplicate sysctls) into sysctl aliases.

This patch converts the watchdog parameters softlockup_panic and
{hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
It fixes the documentation too, since the alias only accepts values 0 or
1, not the full range of integers.

We also took the opportunity here to improve the documentation of the
previously converted hung_task_panic (see the patch series [0]) and put
the alias table in alphabetical order.

[0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Christoph Hellwig
32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Thomas Gleixner
11e31f608b watchdog/softlockup: Enforce that timestamp is valid on boot
Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.

The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.

Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.

Reported-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de
2020-01-17 11:19:22 +01:00
Petr Mladek
3a51449b79 watchdog/softlockup: Remove obsolete check of last reported task
commit 9cf57731b6 ("watchdog/softlockup: Replace "watchdog/%u" threads
 with cpu_stop_work") ensures that the watchdog is reliably touched during
a task switch.

As a result the check for an unnoticed task switch is not longer needed.

Remove the relevant code, which effectively reverts commit b1a8de1f53
("softlockup: make detector be aware of task switch of processes hogging
cpu")

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Ziljstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20191024114928.15377-2-pmladek@suse.com
2020-01-16 14:52:48 +01:00
Jisheng Zhang
d129479f1f watchdog: Remove soft_lockup_hrtimer_cnt and related code
After commit 9cf57731b6 ("watchdog/softlockup: Replace "watchdog/%u"
threads with cpu_stop_work"), the percpu soft_lockup_hrtimer_cnt is
not used any more, so remove it and related code.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191218131720.4146aea2@xhacker.debian
2020-01-16 12:25:51 +01:00
Sebastian Andrzej Siewior
d2ab4cf494 watchdog: Mark watchdog_hrtimer to expire in hard interrupt context
The watchdog hrtimer must expire in hard interrupt context even on
PREEMPT_RT=y kernels as otherwise the hard/softlockup detection logic would
not work.

No functional change.

[ tglx: Split out from larger combo patch. Added changelog ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.262895510@linutronix.de
2019-08-01 20:51:20 +02:00
Arash Fotouhi
76e1552466 watchdog: Fix typo in comment
Signed-off-by: Arash Fotouhi <arash@arashfotouhi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: loberman@redhat.com
Cc: vincent.whitchurch@axis.com
Link: http://lkml.kernel.org/r/1553308112-3513-1-git-send-email-arash@arashfotouhi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18 14:05:51 +02:00
Thomas Gleixner
7dd4761711 watchdog: Respect watchdog cpumask on CPU hotplug
The rework of the watchdog core to use cpu_stop_work broke the watchdog
cpumask on CPU hotplug.

The watchdog_enable/disable() functions are now called unconditionally from
the hotplug callback, i.e. even on CPUs which are not in the watchdog
cpumask. As a consequence the watchdog can become unstoppable.

Only invoke them when the plugged CPU is in the watchdog cpumask.

Fixes: 9cf57731b6 ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903262245490.1789@nanos.tec.linutronix.de
2019-03-28 13:32:01 +01:00
Valdis Kletnieks
48084abf21 watchdog/core: Make variables static
sparse complains:
  CHECK   kernel/watchdog.c
kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
			 	  was not declared. Should it be static?
kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
			 	  was not declared. Should it be static?

They're not referenced by name from anyplace else, make them static.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
2019-03-22 13:40:17 +01:00
Laurence Oberman
1129505552 watchdog/core: Add watchdog_thresh command line parameter
The hard and soft lockup detector threshold has a default value of 10
seconds which can only be changed via sysctl.

During early boot lockup detection can trigger when noisy debugging emits
a large amount of messages to the console, but there is no way to set a
larger threshold on the kernel command line. The detector can only be
completely disabled.

Add a new watchdog_thresh= command line parameter to allow boot time
control over the threshold. It works in the same way as the sysctl and
affects both the soft and the hard lockup detectors.

Signed-off-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rdunlap@infradead.org
Cc: prarit@redhat.com
Link: https://lkml.kernel.org/r/1541079018-13953-1-git-send-email-loberman@redhat.com
2018-11-01 14:33:35 +01:00
Vincent Whitchurch
cb9d7fd51d watchdog: Mark watchdog touch functions as notrace
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.

Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations.  This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.

Fix it by adding the necessary notrace annotations.

Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
2018-08-30 12:56:40 +02:00
Peter Zijlstra
be45bf5395 watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
When scheduling is delayed for longer than the softlockup interrupt
period it is possible to double-queue the cpu_stop_work, causing list
corruption.

Cure this by adding a completion to track the cpu_stop_work's
progress.

Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Rong Chen <rong.a.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9cf57731b6 ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
Link: http://lkml.kernel.org/r/20180713104208.GW2494@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-15 23:51:19 +02:00
Peter Zijlstra
9cf57731b6 watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work
Oleg suggested to replace the "watchdog/%u" threads with
cpu_stop_work. That removes one thread per CPU while at the same time
fixes softlockup vs SCHED_DEADLINE.

But more importantly, it does away with the single
smpboot_update_cpumask_percpu_thread() user, which allows
cleanups/shrinkage of the smpboot interface.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:20:43 +02:00
Ingo Molnar
8a103df440 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 10:17:15 +01:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Frederic Weisbecker
de201559df sched/isolation: Introduce housekeeping flags
Before we implement isolcpus under housekeeping, we need the isolation
features to be more finegrained. For example some people want NOHZ_FULL
without the full scheduler isolation, others want full scheduler
isolation without NOHZ_FULL.

So let's cut all these isolation features piecewise, at the risk of
overcutting it right now. We can still merge some flags later if they
always make sense together.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 09:55:29 +02:00
Frederic Weisbecker
13316b31fd sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version
While trying to disable the watchog on nohz_full CPUs, the watchdog
implements an ad-hoc version of housekeeping_cpumask(). Lets replace
those re-invented lines.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 09:55:25 +02:00
Frederic Weisbecker
7863406143 sched/isolation: Move housekeeping related code to its own file
The housekeeping code is currently tied to the NOHZ code. As we are
planning to make housekeeping independent from it, start with moving
the relevant code to its own file.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-27 09:55:24 +02:00
Thomas Gleixner
0b62bf862d watchdog/core: Put softlockup_threads_initialized under ifdef guard
The variable is unused when the softlockup detector is disabled in Kconfig.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-04 11:30:50 +02:00
Thomas Gleixner
5587185ddb watchdog/core: Rename some softlockup_* functions
The function names made sense up to the point where the watchdog
(re)configuration was unified to use softlockup_reconfigure_threads() for
all configuration purposes. But that includes scenarios which solely
configure the nmi watchdog.

Rename softlockup_reconfigure_threads() and softlockup_init_threads() so
the function names match the functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
2017-10-04 10:53:54 +02:00
Thomas Gleixner
34ddaa3e5c powerpc/watchdog: Make use of watchdog_nmi_probe()
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.

The first call is via:

  start_wd_on_cpu+0x80/0x2f0
  watchdog_nmi_reconfigure+0x124/0x170
  softlockup_reconfigure_threads+0x110/0x130
  lockup_detector_init+0xbc/0xe0
  kernel_init_freeable+0x18c/0x37c
  kernel_init+0x2c/0x160
  ret_from_kernel_thread+0x5c/0xbc

And then again via the CPU hotplug registration:

  start_wd_on_cpu+0x80/0x2f0
  cpuhp_invoke_callback+0x194/0x620
  cpuhp_thread_fun+0x7c/0x1b0
  smpboot_thread_fn+0x290/0x2a0
  kthread+0x168/0x1b0
  ret_from_kernel_thread+0x5c/0xbc

This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().

Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04 10:53:54 +02:00
Thomas Gleixner
e31d6883f2 watchdog/core, powerpc: Lock cpus across reconfiguration
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and
threads and reaquiring for restart, the code and the protection rules
become more obvious when holding cpu hotplug lock across the full
reconfiguration.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04 10:53:54 +02:00
Thomas Gleixner
6b9dc4806b watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
code, which is functionally correct but not necessarily a piece of art.

Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().

Fixes: 6592ad2fcc ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-04 10:53:53 +02:00
Thomas Gleixner
ab5fe3ff38 watchdog/hardlockup: Clean up hotplug locking mess
All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.

The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.

No need to protect the sysctl writes against cpu hotplug anymore. Remove it
and add the now required protection to the powerpc arch_nmi_watchdog
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:09 +02:00
Thomas Gleixner
146c9d0e9d watchdog/hardlockup/perf: Use new perf CPU enable mechanism
Get rid of the hodgepodge which tries to be smart about perf being
unavailable and error printout rate limiting.

That's all not required simply because this is never invoked when the perf
NMI watchdog is not functional.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.259651788@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:08 +02:00
Thomas Gleixner
a994a3147e watchdog/hardlockup/perf: Implement init time detection of perf
Use the init time detection of the perf NMI watchdog to determine whether
the perf NMI watchdog is functional. If not disable it permanentely. It
won't come back magically at runtime.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.099799541@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:08 +02:00
Thomas Gleixner
091549858e watchdog/core: Get rid of the racy update loop
Letting user space poke directly at variables which are used at run time is
stupid and causes a lot of race conditions and other issues.

Seperate the user variables and on change invoke the reconfiguration, which
then stops the watchdogs, reevaluates the new user value and restarts the
watchdogs with the new parameters.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.939985640@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
6592ad2fcc watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.

     1) Stop all NMIs
     2) Read the new parameters and start NMIs

Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
7feeb9cd4f watchdog/sysctl: Clean up sysctl variable name space
Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:07 +02:00
Thomas Gleixner
e8b62b2dd1 watchdog/core: Further simplify sysctl handling
Use a single function to update sysctl changes. This is not a high
frequency user space interface and it's root only.

Preparatory patch to cleanup the sysctl variable handling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.549114957@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
d57108d4f6 watchdog/core: Get rid of the thread teardown/setup dance
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.

That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.

Aside of that the code implements the park/unpark facility of smp hotplug
threads on its own, which is even more pointless. We have functionality in
the smpboot thread code to do so.

Use the new thread management functions and get rid of the unholy mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.470370113@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
2eb2527f84 watchdog/core: Create new thread handling infrastructure
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.

That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.

Another horrible mechanism are the open coded park/unpark loops which are
used for reconfiguration of the watchdog. The smpboot infrastructure allows
exactly the same via smpboot_update_cpumask_thread_percpu(), which is cpu
hotplug safe. Using that instead of the open coded loops allows to get rid
of the hotplug locking mess in the watchdog code.

Implement a clean infrastructure which allows to replace the open coded
nonsense.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.377182587@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
0d85923c7a smpboot/threads, watchdog/core: Avoid runtime allocation
smpboot_update_cpumask_threads_percpu() allocates a temporary cpumask at
runtime. This is suboptimal because the call site needs more code size for
proper error handling than a statically allocated temporary mask requires
data size.

Add static temporary cpumask. The function is globaly serialized, so no
further protection required.

Remove the half baken error handling in the watchdog code and get rid of
the export as there are no in tree modular users of that function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.297288838@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
05ba3de74a watchdog/core: Split out cpumask write function
Split the write part of the cpumask proc handler out into a separate helper
to avoid deep indentation. This also reduces the patch complexity in the
following cleanups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.218075991@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:06 +02:00
Thomas Gleixner
368a7e2ce8 watchdog/core: Clean up the #ifdef maze
The #ifdef maze in this file is horrible, group stuff at least a bit so one
can figure out what belongs to what.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.139629546@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:05 +02:00
Thomas Gleixner
2b9d7f233b watchdog/core: Clean up stub functions
Having stub functions which take a full page is not helping the
readablility of code.

Condense them and move the doubled #ifdef variant into the SYSFS section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.045545271@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:05 +02:00
Thomas Gleixner
01f0a02701 watchdog/core: Remove the park_in_progress obfuscation
Commit:

  b94f51183b ("kernel/watchdog: prevent false hardlockup on overloaded system")

tries to fix the following issue:

proc_write()
   set_sample_period()    <--- New sample period becoms visible
			  <----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
        watchdog_park_threads()

					thread->park()
					  disable_nmi()
			  <----- Broken ends

The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible for the hrtimer function which
uses that value to rearm the timer. But the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered then the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired due to the longer period.

The patch 'fixed' this by adding a variable:

proc_write()
   set_sample_period()
					<----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
         watchdog_park_threads()
	  park_in_progress = 1
					<----- Broken ends
				        nmi_watchdog()
					  if (park_in_progress)
					     return;

The only effect of this variable was to make the window where the breakage
can hit small enough that it was not longer observable in testing. From a
correctness point of view it is a pointless bandaid which merily papers
over the root cause: the unsychronized update of the variable.

Looking deeper into the related code pathes unearthed similar problems in
the watchdog_start()/stop() functions.

 watchdog_start()
	perf_nmi_event_start()
	hrtimer_start()

 watchdog_stop()
	hrtimer_cancel()
	perf_nmi_event_stop()

In both cases the call order is wrong because if the tasks gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.

Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.

Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.

Address the start/stop() ordering problem as well by reverting the call
order, so this part is at least correct now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:05 +02:00
Thomas Gleixner
941154bd69 watchdog/hardlockup/perf: Prevent CPU hotplug deadlock
The following deadlock is possible in the watchdog hotplug code:

  cpus_write_lock()
    ...
      takedown_cpu()
        smpboot_park_threads()
          smpboot_park_thread()
            kthread_park()
              ->park() := watchdog_disable()
                watchdog_nmi_disable()
                  perf_event_release_kernel();
                    put_event()
                      _free_event()
                        ->destroy() := hw_perf_event_destroy()
                          x86_release_hardware()
                            release_ds_buffers()
                              get_online_cpus()

when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.

To solve this add a deferring mechanism:

  cpus_write_lock()
			   kthread_park()
			    watchdog_nmi_disable(deferred)
			      perf_event_disable(event);
			      move_event_to_deferred(event);
			   ....
  cpus_write_unlock()
  cleaup_deferred_events()
    perf_event_release_kernel()

This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.

This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.

Analyzed-by: Peter Zijlstra <peterz@infradead.org>

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:05 +02:00
Thomas Gleixner
20d853fd07 watchdog/hardlockup/perf: Remove broken self disable on failure
The self disabling feature is broken vs. CPU hotplug locking:

CPU 0			   CPU 1
cpus_write_lock();
 cpu_up(1)
   wait_for_completion()
			   ....
			   unpark_watchdog()
			   ->unpark()
			     perf_event_create() <- fails
			       watchdog_enable &= ~NMI_WATCHDOG;
			   ....
cpus_write_unlock();
			   CPU 2
cpus_write_lock()
 cpu_down(2)
   wait_for_completion()
			   wakeup(watchdog);
			     watchdog()
			     if (!(watchdog_enable & NMI_WATCHDOG))
				watchdog_nmi_disable()
				  perf_event_disable()
				  ....
				  cpus_read_lock();

			   stop_smpboot_threads()
			     park_watchdog();
			       wait_for_completion(watchdog->parked);

Result: End of hotplug and instantaneous full lockup of the machine.

There is a similar problem with disabling the watchdog via the user space
interface as the sysctl function fiddles with watchdog_enable directly.

It's very debatable whether this is required at all. If the watchdog works
nicely on N CPUs and it fails to enable on the N + 1 CPU either during
hotplug or because the user space interface disabled it via sysctl cpumask
and then some perf user grabbed the counter which is then unavailable for
the watchdog when the sysctl cpumask gets changed back.

There is no real justification for this.

One of the reasons WHY this is done is the utter stupidity of the init code
of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
is available and functional at all, it just does this check at run time
over and over when user space fiddles with the sysctl. That's broken beyond
repair along with the idiotic error code dependent warn level printks and
the even more silly printk rate limiting.

If the init code checks whether perf works at boot time, then this mess can
be more or less avoided completely. Perf does not come magically into life
at runtime. Brain usage while coding is overrated.

Remove the cruft and add a temporary safe guard which gets removed later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00
Thomas Gleixner
7a35582007 watchdog/core: Mark hardlockup_detector_disable() __init
The function is only used by the KVM init code. Mark it __init to prevent
creative abuse.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.727134632@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00
Thomas Gleixner
946d197794 watchdog/core: Rename watchdog_proc_mutex
Following patches will use the mutex for other purposes as well. Rename it
as it is not longer a proc specific thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.647714850@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00
Thomas Gleixner
b7a349819d watchdog/core: Rework CPU hotplug locking
The watchdog proc interface causes extensive recursive locking of the CPU
hotplug percpu rwsem, which is deadlock prone.

Replace the get/put_online_cpus() pairs with cpu_hotplug_disable()/enable()
calls for now. Later patches will remove that requirement completely.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.568079057@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00
Thomas Gleixner
5490125d77 watchdog/core: Remove broken suspend/resume interfaces
This interface has several issues:

 - It's causing recursive locking of the hotplug lock.

 - It's complete overkill to teardown all threads and then recreate them

The same can be achieved with the simple hardlockup_detector_perf_stop /
restart() interfaces. The abuse from the busy looping poweroff() loop of
PARISC has been solved as well.

Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14 11:41:04 +02:00