Commit Graph

134 Commits

Author SHA1 Message Date
SeongJae Park
a9ec7ed936 BACKPORT: timers: implement usleep_idle_range()
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.

This patchset fixes DAMON's fake load report issue.  The first patch
makes yet another variant of usleep_range() for this fix, and the second
patch fixes the issue of DAMON by making it using the newly introduced
function.

This patch (of 2):

Some kernel threads such as DAMON could need to repeatedly sleep in
micro seconds level.  Because usleep_range() sleeps in uninterruptible
state, however, such threads would make /proc/loadavg reports fake load.

To help such cases, this commit implements a variant of usleep_range()
called usleep_idle_range().  It is same to usleep_range() but sets the
state of the current task as TASK_IDLE while sleeping.

Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit e4779015fd5d2fb8390c258268addff24d6077c7)

Bug: 228223814
Signed-off-by: Hailong Tu <tuhailong@oppo.com>
Change-Id: Ie590ba5fcff22c981d0a7ecae6d8e551160136f3
2022-04-28 23:09:17 +08:00
Greg Kroah-Hartman
af3bdb4304 Merge 5.10.58 into android12-5.10-lts
Changes in 5.10.58
	Revert "ACPICA: Fix memory leak caused by _CID repair function"
	ALSA: seq: Fix racy deletion of subscriber
	bus: ti-sysc: Fix gpt12 system timer issue with reserved status
	net: xfrm: fix memory leak in xfrm_user_rcv_msg
	arm64: dts: ls1028a: fix node name for the sysclk
	ARM: imx: add missing iounmap()
	ARM: imx: add missing clk_disable_unprepare()
	ARM: dts: imx6qdl-sr-som: Increase the PHY reset duration to 10ms
	arm64: dts: ls1028: sl28: fix networking for variant 2
	ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz
	ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init
	ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins
	arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers
	arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode
	ALSA: usb-audio: fix incorrect clock source setting
	clk: stm32f4: fix post divisor setup for I2S/SAI PLLs
	ARM: dts: am437x-l4: fix typo in can@0 node
	omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator
	dmaengine: uniphier-xdmac: Use readl_poll_timeout_atomic() in atomic state
	clk: tegra: Implement disable_unused() of tegra_clk_sdmmc_mux_ops
	dmaengine: stm32-dma: Fix PM usage counter imbalance in stm32 dma ops
	dmaengine: stm32-dmamux: Fix PM usage counter unbalance in stm32 dmamux ops
	spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay
	spi: imx: mx51-ecspi: Fix low-speed CONFIGREG delay calculation
	scsi: sr: Return correct event when media event code is 3
	media: videobuf2-core: dequeue if start_streaming fails
	ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM
	ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM
	dmaengine: imx-dma: configure the generic DMA type to make it work
	net, gro: Set inner transport header offset in tcp/udp GRO hook
	net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add
	net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones
	net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too
	net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag
	net: phy: micrel: Fix detection of ksz87xx switch
	net: natsemi: Fix missing pci_disable_device() in probe and remove
	gpio: tqmx86: really make IRQ optional
	RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently
	sctp: move the active_key update after sh_keys is added
	nfp: update ethtool reporting of pauseframe control
	net: ipv6: fix returned variable type in ip6_skb_dst_mtu
	net: dsa: qca: ar9331: reorder MDIO write sequence
	net: sched: fix lockdep_set_class() typo error for sch->seqlock
	MIPS: check return value of pgtable_pmd_page_ctor
	mips: Fix non-POSIX regexp
	bnx2x: fix an error code in bnx2x_nic_load()
	net: pegasus: fix uninit-value in get_interrupt_interval
	net: fec: fix use-after-free in fec_drv_remove
	net: vxge: fix use-after-free in vxge_device_unregister
	blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit()
	Bluetooth: defer cleanup of resources in hci_unregister_dev()
	USB: usbtmc: Fix RCU stall warning
	USB: serial: option: add Telit FD980 composition 0x1056
	USB: serial: ch341: fix character loss at high transfer rates
	USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2
	firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs_fallback
	firmware_loader: fix use-after-free in firmware_fallback_sysfs
	drm/amdgpu/display: fix DMUB firmware version info
	ALSA: pcm - fix mmap capability check for the snd-dummy driver
	ALSA: hda/realtek: add mic quirk for Acer SF314-42
	ALSA: hda/realtek: Fix headset mic for Acer SWIFT SF314-56 (ALC256)
	ALSA: usb-audio: Fix superfluous autosuspend recovery
	ALSA: usb-audio: Add registration quirk for JBL Quantum 600
	usb: dwc3: gadget: Avoid runtime resume if disabling pullup
	usb: gadget: remove leaked entry from udc driver list
	usb: cdns3: Fixed incorrect gadget state
	usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers
	usb: gadget: f_hid: fixed NULL pointer dereference
	usb: gadget: f_hid: idle uses the highest byte for duration
	usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses
	usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events
	usb: otg-fsm: Fix hrtimer list corruption
	clk: fix leak on devm_clk_bulk_get_all() unwind
	scripts/tracing: fix the bug that can't parse raw_trace_func
	tracing / histogram: Give calculation hist_fields a size
	tracing: Reject string operand in the histogram expression
	tracing: Fix NULL pointer dereference in start_creating
	tracepoint: static call: Compare data on transition from 2->1 callees
	tracepoint: Fix static call function vs data state mismatch
	arm64: stacktrace: avoid tracing arch_stack_walk()
	optee: Clear stale cache entries during initialization
	tee: add tee_shm_alloc_kernel_buf()
	optee: Fix memory leak when failing to register shm pages
	optee: Refuse to load the driver under the kdump kernel
	optee: fix tee out of memory failure seen during kexec reboot
	tpm_ftpm_tee: Free and unregister TEE shared memory during kexec
	staging: rtl8723bs: Fix a resource leak in sd_int_dpc
	staging: rtl8712: get rid of flush_scheduled_work
	staging: rtl8712: error handling refactoring
	drivers core: Fix oops when driver probe fails
	media: rtl28xxu: fix zero-length control request
	pipe: increase minimum default pipe size to 2 pages
	ext4: fix potential htree corruption when growing large_dir directories
	serial: tegra: Only print FIFO error message when an error occurs
	serial: 8250_mtk: fix uart corruption issue when rx power off
	serial: 8250: Mask out floating 16/32-bit bus bits
	MIPS: Malta: Do not byte-swap accesses to the CBUS UART
	serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver
	serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts.
	fpga: dfl: fme: Fix cpu hotplug issue in performance reporting
	timers: Move clearing of base::timer_running under base:: Lock
	xfrm: Fix RCU vs hash_resize_mutex lock inversion
	net/xfrm/compat: Copy xfrm_spdattr_type_t atributes
	pcmcia: i82092: fix a null pointer dereference bug
	selinux: correct the return value when loads initial sids
	bus: ti-sysc: AM3: RNG is GP only
	Revert "gpio: mpc8xxx: change the gpio interrupt flags."
	ARM: omap2+: hwmod: fix potential NULL pointer access
	md/raid10: properly indicate failure when ending a failed write request
	KVM: x86: accept userspace interrupt only if no event is injected
	KVM: Do not leak memory for duplicate debugfs directories
	KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
	arm64: vdso: Avoid ISB after reading from cntvct_el0
	soc: ixp4xx: fix printing resources
	interconnect: Fix undersized devress_alloc allocation
	spi: meson-spicc: fix memory leak in meson_spicc_remove
	interconnect: Zero initial BW after sync-state
	interconnect: Always call pre_aggregate before aggregate
	interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes
	drm/i915: Correct SFC_DONE register offset
	soc: ixp4xx/qmgr: fix invalid __iomem access
	perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
	sched/rt: Fix double enqueue caused by rt_effective_prio
	drm/i915: avoid uninitialised var in eb_parse()
	libata: fix ata_pio_sector for CONFIG_HIGHMEM
	reiserfs: add check for root_inode in reiserfs_fill_super
	reiserfs: check directory items on read from disk
	virt_wifi: fix error on connect
	net: qede: Fix end of loop tests for list_for_each_entry
	alpha: Send stop IPI to send to online CPUs
	net/qla3xxx: fix schedule while atomic in ql_wait_for_drvr_lock and ql_adapter_reset
	smb3: rc uninitialized in one fallocate path
	drm/amdgpu/display: only enable aux backlight control for OLED panels
	arm64: fix compat syscall return truncation
	Linux 5.10.58

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2533667974c9dff419a14d63e0e8febfb3de80f1
2021-08-12 14:58:34 +02:00
Thomas Gleixner
23e36a8610 timers: Move clearing of base::timer_running under base:: Lock
commit bb7262b295472eb6858b5c49893954794027cd84 upstream.

syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().

This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:

 int data = 0;

 void timer_func(struct timer_list *t)
 {
    data = 1;
 }

 CPU 0                                            CPU 1
 ------------------------------                   --------------------------
 base = lock_timer_base(timer, &flags);           raw_spin_unlock(&base->lock);
 if (base->running_timer != timer)                call_timer_fn(timer, fn, baseclk);
   ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
 raw_spin_unlock_irqrestore(&base->lock, flags);  raw_spin_lock(&base->lock);

 x = data;

If the timer has previously executed on CPU 1 and then CPU 0 can observe
base->running_timer == NULL and returns, assuming the timer has completed,
but it's not guaranteed on all architectures. The comment for
del_timer_sync() makes that guarantee. Moving the assignment under
base->lock prevents this.

For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.

Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes: 030dcdd197 ("timers: Prepare support for PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-12 13:22:15 +02:00
Greg Kroah-Hartman
e4cac2c332 Merge 5.10.54 into android12-5.10-lts
Changes in 5.10.54
	igc: Fix use-after-free error during reset
	igb: Fix use-after-free error during reset
	igc: change default return of igc_read_phy_reg()
	ixgbe: Fix an error handling path in 'ixgbe_probe()'
	igc: Fix an error handling path in 'igc_probe()'
	igb: Fix an error handling path in 'igb_probe()'
	fm10k: Fix an error handling path in 'fm10k_probe()'
	e1000e: Fix an error handling path in 'e1000_probe()'
	iavf: Fix an error handling path in 'iavf_probe()'
	igb: Check if num of q_vectors is smaller than max before array access
	igb: Fix position of assignment to *ring
	gve: Fix an error handling path in 'gve_probe()'
	net: add kcov handle to skb extensions
	bonding: fix suspicious RCU usage in bond_ipsec_add_sa()
	bonding: fix null dereference in bond_ipsec_add_sa()
	ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops
	bonding: fix suspicious RCU usage in bond_ipsec_del_sa()
	bonding: disallow setting nested bonding + ipsec offload
	bonding: Add struct bond_ipesc to manage SA
	bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()
	bonding: fix incorrect return value of bond_ipsec_offload_ok()
	ipv6: fix 'disable_policy' for fwd packets
	stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()
	selftests: icmp_redirect: remove from checking for IPv6 route get
	selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect
	pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
	cxgb4: fix IRQ free race during driver unload
	mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join
	nvme-pci: do not call nvme_dev_remove_admin from nvme_remove
	KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
	perf inject: Fix dso->nsinfo refcounting
	perf map: Fix dso->nsinfo refcounting
	perf probe: Fix dso->nsinfo refcounting
	perf env: Fix sibling_dies memory leak
	perf test session_topology: Delete session->evlist
	perf test event_update: Fix memory leak of evlist
	perf dso: Fix memory leak in dso__new_map()
	perf test maps__merge_in: Fix memory leak of maps
	perf env: Fix memory leak of cpu_pmu_caps
	perf report: Free generated help strings for sort option
	perf script: Fix memory 'threads' and 'cpus' leaks on exit
	perf lzma: Close lzma stream on exit
	perf probe-file: Delete namelist in del_events() on the error path
	perf data: Close all files in close_dir()
	perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
	ASoC: wm_adsp: Correct wm_coeff_tlv_get handling
	spi: imx: add a check for speed_hz before calculating the clock
	spi: stm32: fixes pm_runtime calls in probe/remove
	regulator: hi6421: Use correct variable type for regmap api val argument
	regulator: hi6421: Fix getting wrong drvdata
	spi: mediatek: fix fifo rx mode
	ASoC: rt5631: Fix regcache sync errors on resume
	bpf, test: fix NULL pointer dereference on invalid expected_attach_type
	bpf: Fix tail_call_reachable rejection for interpreter when jit failed
	xdp, net: Fix use-after-free in bpf_xdp_link_release
	timers: Fix get_next_timer_interrupt() with no timers pending
	liquidio: Fix unintentional sign extension issue on left shift of u16
	s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
	bpf, sockmap: Fix potential memory leak on unlikely error case
	bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats
	bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats
	bpftool: Check malloc return value in mount_bpffs_for_pin
	net: fix uninit-value in caif_seqpkt_sendmsg
	usb: hso: fix error handling code of hso_create_net_device
	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
	efi/tpm: Differentiate missing and invalid final event log table.
	net: decnet: Fix sleeping inside in af_decnet
	KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
	KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
	net: sched: fix memory leak in tcindex_partial_destroy_work
	sctp: trim optlen when it's a huge value in sctp_setsockopt
	netrom: Decrease sock refcount when sock timers expire
	scsi: iscsi: Fix iface sysfs attr detection
	scsi: target: Fix protect handling in WRITE SAME(32)
	spi: cadence: Correct initialisation of runtime PM again
	ACPI: Kconfig: Fix table override from built-in initrd
	bnxt_en: don't disable an already disabled PCI device
	bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe()
	bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task()
	bnxt_en: Validate vlan protocol ID on RX packets
	bnxt_en: Check abort error state in bnxt_half_open_nic()
	net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition
	net/tcp_fastopen: fix data races around tfo_active_disable_stamp
	ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID
	net: hns3: fix possible mismatches resp of mailbox
	net: hns3: fix rx VLAN offload state inconsistent issue
	spi: spi-bcm2835: Fix deadlock
	net/sched: act_skbmod: Skip non-Ethernet packets
	ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
	ceph: don't WARN if we're still opening a session to an MDS
	nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
	Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
	afs: Fix tracepoint string placement with built-in AFS
	r8169: Avoid duplicate sysfs entry creation error
	nvme: set the PRACT bit when using Write Zeroes with T10 PI
	sctp: update active_key for asoc when old key is being replaced
	tcp: disable TFO blackhole logic by default
	net: dsa: sja1105: make VID 4095 a bridge VLAN too
	net: sched: cls_api: Fix the the wrong parameter
	drm/panel: raspberrypi-touchscreen: Prevent double-free
	cifs: only write 64kb at a time when fallocating a small region of a file
	cifs: fix fallocate when trying to allocate a hole.
	proc: Avoid mixing integer types in mem_rw()
	mmc: core: Don't allocate IDA for OF aliases
	s390/ftrace: fix ftrace_update_ftrace_func implementation
	s390/boot: fix use of expolines in the DMA code
	ALSA: usb-audio: Add missing proc text entry for BESPOKEN type
	ALSA: usb-audio: Add registration quirk for JBL Quantum headsets
	ALSA: sb: Fix potential ABBA deadlock in CSP driver
	ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine
	ALSA: hdmi: Expose all pins on MSI MS-7C94 board
	ALSA: pcm: Call substream ack() method upon compat mmap commit
	ALSA: pcm: Fix mmap capability check
	Revert "usb: renesas-xhci: Fix handling of unknown ROM state"
	usb: xhci: avoid renesas_usb_fw.mem when it's unusable
	xhci: Fix lost USB 2 remote wake
	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
	usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
	usb: hub: Fix link power management max exit latency (MEL) calculations
	USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
	usb: max-3421: Prevent corruption of freed memory
	usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
	USB: serial: option: add support for u-blox LARA-R6 family
	USB: serial: cp210x: fix comments for GE CS1000
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe
	usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode.
	usb: dwc2: gadget: Fix sending zero length packet in DDMA mode.
	usb: typec: stusb160x: register role switch before interrupt registration
	firmware/efi: Tell memblock about EFI iomem reservations
	tracepoints: Update static_call before tp_funcs when adding a tracepoint
	tracing/histogram: Rename "cpu" to "common_cpu"
	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
	tracing: Synthetic event field_pos is an index not a boolean
	btrfs: check for missing device in btrfs_trim_fs
	media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
	ixgbe: Fix packet corruption due to missing DMA sync
	bus: mhi: core: Validate channel ID when processing command completions
	posix-cpu-timers: Fix rearm racing against process tick
	selftest: use mmap instead of posix_memalign to allocate memory
	io_uring: explicitly count entries for poll reqs
	io_uring: remove double poll entry on arm failure
	userfaultfd: do not untag user pointers
	memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions
	hugetlbfs: fix mount mode command line processing
	rbd: don't hold lock_rwsem while running_list is being drained
	rbd: always kick acquire on "acquired" and "released" notifications
	misc: eeprom: at24: Always append device id even if label property is set.
	nds32: fix up stack guard gap
	driver core: Prevent warning when removing a device link from unregistered consumer
	drm: Return -ENOTTY for non-drm ioctls
	drm/amdgpu: update golden setting for sienna_cichlid
	net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz
	net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
	PCI: Mark AMD Navi14 GPU ATS as broken
	bonding: fix build issue
	skbuff: Release nfct refcount on napi stolen or re-used skbs
	Documentation: Fix intiramfs script name
	perf inject: Close inject.output on exit
	usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI
	drm/i915/gvt: Clear d3_entered on elsp cmd submission.
	sfc: ensure correct number of XDP queues
	xhci: add xhci_get_virt_ep() helper
	skbuff: Fix build with SKB extensions disabled
	Linux 5.10.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e
2021-07-28 15:23:47 +02:00
Nicolas Saenz Julienne
0ff2ea9d8f timers: Fix get_next_timer_interrupt() with no timers pending
[ Upstream commit aebacb7f6ca1926918734faae14d1f0b6fae5cb7 ]

31cd0e119d ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.

In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. Ultimately breaking the logic above.

A simplified example:

- Upon entering get_next_timer_interrupt() with:

	jiffies = 1
	base->clk = 0;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;

  'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
  returns KTIME_MAX.

- 'base->clk' is updated to the jiffies value.

- The next time we enter get_next_timer_interrupt(), taking into account
  no timer operations happened:

	base->clk = 1;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;

  'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
  returns a valid expire time, which is incorrect.

This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].

So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().

[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
    'struct timer_base' any bigger.

Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28 14:35:37 +02:00
Huang Yiwei
ff96a0a256 ANDROID: timer: calc_index vendor hook adjustment
Move the calc_index vendor hook one line ahead to cover more
use cases.

Bug: 181296757
Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org>
Change-Id: I52231a3ccbe622021232c7a54354c5ac02cf3952
2021-02-26 19:23:02 +00:00
Huang Yiwei
9789a88f42 ANDROID: timer: Add vendor hook for timer calc index
Since we're expecting timers more precisely in short period, add
a vendor hook to calc_index when adding timers. Then we can modify
the index this timer used to make it accurate.

Bug: 178758017
Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org>
Change-Id: Ie0e6493ae7ad53b0cc57eb1bbcf8a0a11f652828
2021-02-02 02:04:18 +00:00
Changki Kim
96844c1c84 ANDROID: timer: Export hrtimer_expire_entry/exit tracepoints
Export hrtimer_expire_entry/exit tracepoints, so that vendor modules
can register probes for these tracepoints.

Bug: 175936268
Change-Id: I739f369d3b56e09f8e9061fefdf25830e37e987e
Signed-off-by: Changki Kim <changki.kim@samsung.com>
2020-12-21 17:49:09 +00:00
YueHaibing
9010e3876e timers: Remove unused inline funtion debug_timer_free()
There is no caller in tree, remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200909134749.32300-1-yuehaibing@huawei.com
2020-10-26 11:39:21 +01:00
Willy Tarreau
3744741ada random32: add noise from network and scheduling activity
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec3b ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00
George Spelvin
c51f8f88d7 random32: make prandom_u32() output unpredictable
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2020-10-24 20:21:57 +02:00
Linus Torvalds
f5f59336a9 Merge tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping updates from Thomas Gleixner:
 "Updates for timekeeping, timers and related drivers:

  Core:

   - Early boot support for the NMI safe timekeeper by utilizing
     local_clock() up to the point where timekeeping is initialized.
     This allows printk() to store multiple timestamps in the ringbuffer
     which is useful for coordinating dmesg information across a fleet
     of machines.

   - Provide a multi-timestamp accessor for printk()

   - Make timer init more robust by checking for invalid timer flags.

   - Comma vs semicolon fixes

  Drivers:

   - Support for new platforms in existing drivers (SP804 and Renesas
     CMT)

   - Comma vs semicolon fixes

* tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/armada-370-xp: Use semicolons rather than commas to separate statements
  clocksource/drivers/mps2-timer: Use semicolons rather than commas to separate statements
  timers: Mask invalid flags in do_init_timer()
  clocksource/drivers/sp804: Enable Hisilicon sp804 timer 64bit mode
  clocksource/drivers/sp804: Add support for Hisilicon sp804 timer
  clocksource/drivers/sp804: Support non-standard register offset
  clocksource/drivers/sp804: Prepare for support non-standard register offset
  clocksource/drivers/sp804: Remove a mismatched comment
  clocksource/drivers/sp804: Delete the leading "__" of some functions
  clocksource/drivers/sp804: Remove unused sp804_timer_disable() and timer-sp804.h
  clocksource/drivers/sp804: Cleanup clk_get_sys()
  dt-bindings: timer: renesas,cmt: Document r8a774e1 CMT support
  dt-bindings: timer: renesas,cmt: Document r8a7742 CMT support
  alarmtimer: Convert comma to semicolon
  timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
  timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
2020-10-12 11:27:54 -07:00
Qianli Zhao
b952caf2d5 timers: Mask invalid flags in do_init_timer()
do_init_timer() accepts any combination of timer flags handed in by the
caller without a sanity check, but only TIMER_DEFFERABLE, TIMER_PINNED and
TIMER_IRQSAFE are valid.

If the supplied flags have other bits set, this could result in
malfunction. If bits are set in TIMER_CPUMASK the first timer usage could
deference a cpu base which is outside the range of possible CPUs. If
TIMER_MIGRATION is set, then the switch_timer_base() will live lock.

Prevent that with a sanity check which warns when invalid flags are
supplied and masks them out.

[ tglx: Made it WARN_ON_ONCE() and added context to the changelog ]

Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/9d79a8aa4eb56713af7379f99f062dedabcde140.1597326756.git.zhaoqianli@xiaomi.com
2020-09-24 22:12:18 +02:00
Stephen Boyd
f9e62f318f treewide: Make all debug_obj_descriptors const
This should make it harder for the kernel to corrupt the debug object
descriptor, used to call functions to fixup state and track debug objects,
by moving the structure to read-only memory.

Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org
2020-09-24 21:56:25 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Thomas Gleixner
1fb497dd00 posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
Running posix CPU timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine grained accounting it's just wrong to run this in context of
   the timer interrupt because that way a process specific CPU time is
   accounted to the timer interrupt.

 - Long running timer interrupts caused by a large amount of expiring
   timers which can be created and armed by unpriviledged user space.

There is no hard requirement to expire them in interrupt context.

If the signal is targeted at the task itself then it won't be delivered
before the task returns to user space anyway. If the signal is targeted at
a supervisor process then it might be slightly delayed, but posix CPU
timers are inaccurate anyway due to the fact that they are tied to the
tick.

Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before exiting to guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de
2020-08-06 16:50:59 +02:00
Linus Torvalds
442489c219 Merge tag 'timers-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Time, timers and related driver updates:

   - Prevent unnecessary timer softirq invocations by extending the
     tracking of the next expiring timer in the timer wheel beyond the
     existing NOHZ functionality.

     The tracking overhead at enqueue time is within the noise, but on
     sensitive workloads the avoidance of the soft interrupt invocation
     is a measurable improvement.

   - The obligatory new clocksource driver for Ingenic X100 OST

   - The usual fixes, improvements, cleanups and extensions for newer
     chip variants all over the driver space"

* tag 'timers-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  timers: Recalculate next timer interrupt only when necessary
  clocksource/drivers/ingenic: Add support for the Ingenic X1000 OST.
  dt-bindings: timer: Add Ingenic X1000 OST bindings.
  clocksource/drivers: Replace HTTP links with HTTPS ones
  clocksource/drivers/nomadik-mtu: Handle 32kHz clock
  clocksource/drivers/sh_cmt: Use "kHz" for kilohertz
  clocksource/drivers/imx: Add support for i.MX TPM driver with ARM64
  clocksource/drivers/ingenic: Add high resolution timer support for SMP/SMT.
  timers: Lower base clock forwarding threshold
  timers: Remove must_forward_clk
  timers: Spare timer softirq until next expiry
  timers: Expand clk forward logic beyond nohz
  timers: Reuse next expiry cache after nohz exit
  timers: Always keep track of next expiry
  timers: Optimize _next_timer_interrupt() level iteration
  timers: Add comments about calc_index() ceiling work
  timers: Move trigger_dyntick_cpu() to enqueue_timer()
  timers: Use only bucket expiry for base->next_expiry value
  timers: Preserve higher bits of expiration on index calculation
  clocksource/drivers/timer-atmel-tcb: Add sama5d2 support
  ...
2020-08-04 18:17:37 -07:00
Willy Tarreau
f227e3ec3b random32: update the net random state on interrupt and activity
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.

Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.

In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state.  For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.

Reported-by: Amit Klein <aksecurity@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-29 10:35:37 -07:00
Frederic Weisbecker
31cd0e119d timers: Recalculate next timer interrupt only when necessary
The nohz tick code recalculates the timer wheel's next expiry on each idle
loop iteration.

On the other hand, the base next expiry is now always cached and updated
upon timer enqueue and execution. Only timer dequeue may leave
base->next_expiry out of date (but then its stale value won't ever go past
the actual next expiry to be recalculated).

Since recalculating the next_expiry isn't a free operation, especially when
the last wheel level is reached to find out that no timer has been enqueued
at all, reuse the next expiry cache when it is known to be reliable, which
it is most of the time.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200723151641.12236-1-frederic@kernel.org
2020-07-24 12:49:40 +02:00
Frederic Weisbecker
36cd28a4cd timers: Lower base clock forwarding threshold
There is nothing that prevents from forwarding the base clock if it's one
jiffy off. The reason for this arbitrary limit of two jiffies is historical
and does not longer exist.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-13-frederic@kernel.org
2020-07-17 21:55:25 +02:00
Frederic Weisbecker
0975fb565b timers: Remove must_forward_clk
There is no reason to keep this guard around. The code makes sure that
base->clk remains sane and won't be forwarded beyond jiffies nor set
backward.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-12-frederic@kernel.org
2020-07-17 21:55:25 +02:00
Frederic Weisbecker
d4f7dae870 timers: Spare timer softirq until next expiry
Now that the core timer infrastructure doesn't depend anymore on
periodic base->clk increments, even when the CPU is not in NO_HZ mode,
timer softirqs can be skipped until there are timers to expire.

Some spurious softirqs can still remain since base->next_expiry doesn't
keep track of canceled timers but this still reduces the number of softirqs
significantly: ~15 times less for HZ=1000 and ~5 times less for HZ=100.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-11-frederic@kernel.org
2020-07-17 21:55:24 +02:00
Frederic Weisbecker
1f8a4212dc timers: Expand clk forward logic beyond nohz
As for next_expiry, the base->clk catch up logic will be expanded beyond
NOHZ in order to avoid triggering useless softirqs.

If softirqs should only fire to expire pending timers, periodic base->clk
increments must be skippable for random amounts of time.  Therefore prepare
to catch-up with missing updates whenever an up-to-date base clock is
needed.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-10-frederic@kernel.org
2020-07-17 21:55:24 +02:00
Frederic Weisbecker
90d52f65f3 timers: Reuse next expiry cache after nohz exit
Now that the next expiry it tracked unconditionally when a timer is added,
this information can be reused on a tick firing after exiting nohz.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-9-frederic@kernel.org
2020-07-17 21:55:23 +02:00
Frederic Weisbecker
dc2a0f1fb2 timers: Always keep track of next expiry
So far next expiry was only tracked while the CPU was in nohz_idle mode
in order to cope with missing ticks that can't increment the base->clk
periodically anymore.

This logic is going to be expanded beyond nohz in order to spare timer
softirqs so do it unconditionally.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-8-frederic@kernel.org
2020-07-17 21:55:23 +02:00
Frederic Weisbecker
001ec1b392 timers: Optimize _next_timer_interrupt() level iteration
If a level has a timer that expires before reaching the next level, there
is no need to iterate further.

The next level is reached when the 3 lower bits of the current level are
cleared. If the next event happens before/during that, the next levels
won't provide an earlier expiration.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-7-frederic@kernel.org
2020-07-17 21:55:22 +02:00
Frederic Weisbecker
4468897211 timers: Add comments about calc_index() ceiling work
calc_index() adds 1 unit of the level granularity to the expiry passed
in parameter to ensure that the timer doesn't expire too early. Add a
comment to explain that and the resulting layout in the wheel.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-6-frederic@kernel.org
2020-07-17 21:55:22 +02:00
Frederic Weisbecker
9a2b764b06 timers: Move trigger_dyntick_cpu() to enqueue_timer()
Consolidate the code by calling trigger_dyntick_cpu() from
enqueue_timer() instead of calling it from all its callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200717140551.29076-5-frederic@kernel.org
2020-07-17 21:55:22 +02:00
Anna-Maria Behnsen
1f32cab0db timers: Use only bucket expiry for base->next_expiry value
The bucket expiry time is the effective expriy time of timers and is
greater than or equal to the requested timer expiry time. This is due
to the guarantee that timers never expire early and the reduced expiry
granularity in the secondary wheel levels.

When a timer is enqueued, trigger_dyntick_cpu() checks whether the
timer is the new first timer. This check compares next_expiry with
the requested timer expiry value and not with the effective expiry
value of the bucket into which the timer was queued.

Storing the requested timer expiry value in base->next_expiry can lead
to base->clk going backwards if the requested timer expiry value is
smaller than base->clk. Commit 30c66fc30e ("timer: Prevent base->clk
from moving backward") worked around this by preventing the store when
timer->expiry is before base->clk, but did not fix the underlying
problem.

Use the expiry value of the bucket into which the timer is queued to
do the new first timer check. This fixes the base->clk going backward
problem.

The workaround of commit 30c66fc30e ("timer: Prevent base->clk from
moving backward") in trigger_dyntick_cpu() is not longer necessary as the
timers bucket expiry is guaranteed to be greater than or equal base->clk.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200717140551.29076-4-frederic@kernel.org
2020-07-17 21:55:21 +02:00
Frederic Weisbecker
3d2e83a2a6 timers: Preserve higher bits of expiration on index calculation
The higher bits of the timer expiration are cropped while calling
calc_index() due to the implicit cast from unsigned long to unsigned int.

This loss shouldn't have consequences on the current code since all the
computation to calculate the index is done on the lower 32 bits.

However to prepare for returning the actual bucket expiration from
calc_index() in order to properly fix base->next_expiry updates, the higher
bits need to be preserved.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200717140551.29076-3-frederic@kernel.org
2020-07-17 21:55:21 +02:00
Frederic Weisbecker
e2a71bdea8 timer: Fix wheel index calculation on last level
When an expiration delta falls into the last level of the wheel, that delta
has be compared against the maximum possible delay and reduced to fit in if
necessary.

However instead of comparing the delta against the maximum, the code
compares the actual expiry against the maximum. Then instead of fixing the
delta to fit in, it sets the maximum delta as the expiry value.

This can result in various undesired outcomes, the worst possible one
being a timer expiring 15 days ahead to fire immediately.

Fixes: 500462a9de ("timers: Switch to a non-cascading wheel")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org
2020-07-17 21:44:05 +02:00
Frederic Weisbecker
30c66fc30e timer: Prevent base->clk from moving backward
When a timer is enqueued with a negative delta (ie: expiry is below
base->clk), it gets added to the wheel as expiring now (base->clk).

Yet the value that gets stored in base->next_expiry, while calling
trigger_dyntick_cpu(), is the initial timer->expires value. The
resulting state becomes:

	base->next_expiry < base->clk

On the next timer enqueue, forward_timer_base() may accidentally
rewind base->clk. As a possible outcome, timers may expire way too
early, the worst case being that the highest wheel levels get spuriously
processed again.

To prevent from that, make sure that base->next_expiry doesn't get below
base->clk.

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200703010657.2302-1-frederic@kernel.org
2020-07-09 11:56:57 +02:00
Christoph Hellwig
32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Linus Torvalds
dbb381b619 Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and timer updates from Thomas Gleixner:
 "Core:

   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.

     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which
     is necessary to build the vDSO. These new headers are included from
     the kernel headers and the vDSO specific files.

   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by
     PPC.

   - Cleanup and consolidation of the exit related code in posix CPU
     timers.

   - Small cleanups and enhancements here and there

  Drivers:

   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support

   - Correct the clock rate of PIT64b global clock

   - setup_irq() cleanup

   - Preparation for PWM and suspend support for the TI DM timer

   - Expand the fttmr010 driver to support ast2600 systems

   - The usual small fixes, enhancements and cleanups all over the
     place"

* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
  vdso: Fix clocksource.h macro detection
  um: Fix header inclusion
  arm64: vdso32: Enable Clang Compilation
  lib/vdso: Enable common headers
  arm: vdso: Enable arm to use common headers
  x86/vdso: Enable x86 to use common headers
  mips: vdso: Enable mips to use common headers
  arm64: vdso32: Include common headers in the vdso library
  arm64: vdso: Include common headers in the vdso library
  arm64: Introduce asm/vdso/processor.h
  arm64: vdso32: Code clean up
  linux/elfnote.h: Replace elf.h with UAPI equivalent
  scripts: Fix the inclusion order in modpost
  common: Introduce processor.h
  linux/ktime.h: Extract common header for vDSO
  linux/jiffies.h: Extract common header for vDSO
  linux/time64.h: Extract common header for vDSO
  linux/time32.h: Extract common header for vDSO
  linux/time.h: Extract common header for vDSO
  ...
2020-03-30 18:51:47 -07:00
Eric Dumazet
90c018942c timer: Use hlist_unhashed_lockless() in timer_pending()
The timer_pending() function is mostly used in lockless contexts, so
Without proper annotations, KCSAN might detect a data-race [1].

Using hlist_unhashed_lockless() instead of hand-coding it seems
appropriate (as suggested by Paul E. McKenney).

[1]

BUG: KCSAN: data-race in del_timer / detach_if_pending

write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0:
 __hlist_del include/linux/list.h:764 [inline]
 detach_timer kernel/time/timer.c:815 [inline]
 detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832
 try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226
 del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365
 schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896
 rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639
 rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1:
 del_timer+0x3b/0xb0 kernel/time/timer.c:1198
 sk_stop_timer+0x25/0x60 net/core/sock.c:2845
 inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523
 tcp_clear_xmit_timers include/net/tcp.h:606 [inline]
 tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096
 inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836
 tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497
 inet_release+0x86/0x100 net/ipv4/af_inet.c:427
 __sock_release+0x85/0x160 net/socket.c:590
 sock_close+0x24/0x30 net/socket.c:1268
 __fput+0x1e1/0x520 fs/file_table.c:280
 ____fput+0x1f/0x30 fs/file_table.c:313
 task_work_run+0xf6/0x130 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Pulled in Eric's later amendments. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20 15:58:22 -08:00
Alexander Popov
6e317c32fd timer: Improve the comment describing schedule_timeout()
When working commit 6dcd5d7a7a, a mistake was noticed by Linus:
schedule_timeout() was called without setting the task state to anything
particular.

It calls the scheduler, but doesn't delay anything, because the task stays
runnable. That happens because sched_submit_work() does nothing for tasks
in TASK_RUNNING state.

That turned out to be the intended behavior. Adding a WARN() is not useful
as the task could be woken up right after setting the state and before
reaching schedule_timeout().

Improve the comment about schedule_timeout() and describe that more
explicitly.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200117225900.16340-1-alex.popov@linux.com
2020-02-17 20:12:19 +01:00
Li RongQing
e430d802d6 timer: Read jiffies once when forwarding base clk
The timer delayed for more than 3 seconds warning was triggered during
testing.

  Workqueue: events_unbound sched_tick_remote
  RIP: 0010:sched_tick_remote+0xee/0x100
  ...
  Call Trace:
   process_one_work+0x18c/0x3a0
   worker_thread+0x30/0x380
   kthread+0x113/0x130
   ret_from_fork+0x22/0x40

The reason is that the code in collect_expired_timers() uses jiffies
unprotected:

    if (next_event > jiffies)
        base->clk = jiffies;

As the compiler is allowed to reload the value base->clk can advance
between the check and the store and in the worst case advance farther than
next event. That causes the timer expiry to be delayed until the wheel
pointer wraps around.

Convert the code to use READ_ONCE()

Fixes: 236968383c ("timers: Optimize collect_expired_timers() for NOHZ")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com
2019-09-19 17:50:11 +02:00
Thomas Gleixner
dce3e8fd03 posix-cpu-timers: Remove tsk argument from run_posix_cpu_timers()
It's always current. Don't give people wrong ideas.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/20190819143801.945469967@linutronix.de
2019-08-21 20:27:16 +02:00
Anna-Maria Gleixner
030dcdd197 timers: Prepare support for PREEMPT_RT
When PREEMPT_RT is enabled, the soft interrupt thread can be preempted.  If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
    handler to complete. This can result in unbound priority inversion.

  - If the caller originates from the task which preempted the timer
    handler on the same CPU, then spin waiting for the timer handler to
    complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.

This addresses both the priority inversion and the life lock issues.

This mechanism is not used for timers which are marked IRQSAFE as for those
preemption is disabled accross the callback and therefore this situation
cannot happen. The callbacks for such timers need to be individually
audited for RT compliance.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de
2019-08-01 20:51:22 +02:00
Linus Torvalds
0968621917 Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Sakari Ailus
d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Anna-Maria Gleixner
f28d3d5346 timer/trace: Improve timer tracing
Timers are added to the timer wheel off by one. This is required in
case a timer is queued directly before incrementing jiffies to prevent
early timer expiry.

When reading a timer trace and relying only on the expiry time of the timer
in the timer_start trace point and on the now in the timer_expiry_entry
trace point, it seems that the timer fires late. With the current
timer_expiry_entry trace point information only now=jiffies is printed but
not the value of base->clk. This makes it impossible to draw a conclusion
to the index of base->clk and makes it impossible to examine timer problems
without additional trace points.

Therefore add the base->clk value to the timer_expire_entry trace
point, to be able to calculate the index the timer base is located at
during collecting expired timers.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-maria@linutronix.de
2019-03-24 20:29:33 +01:00
Anna-Maria Gleixner
dc1e7dc5ac timer: Move trace point to get proper index
When placing the timer_start trace point before the timer wheel bucket
index is calculated, the index information in the trace point is useless.

It is not possible to simply move the debug_activate() call after the index
calculation, because debug_object_activate() needs to be called before
touching the object.

Therefore split debug_activate() and move the trace point into
enqueue_timer() after the new index has been calculated. The
debug_object_activate() call remains at the original place.

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20190321120921.16463-3-anna-maria@linutronix.de
2019-03-24 20:29:32 +01:00
Linus Torvalds
3717f613f4 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU related changes in this cycle were:

   - Additional cleanups after RCU flavor consolidation

   - Grace-period forward-progress cleanups and improvements

   - Documentation updates

   - Miscellaneous fixes

   - spin_is_locked() conversions to lockdep

   - SPDX changes to RCU source and header files

   - SRCU updates

   - Torture-test updates, including nolibc updates and moving nolibc to
     tools/include"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  locking/locktorture: Convert to SPDX license identifier
  linux/torture: Convert to SPDX license identifier
  torture: Convert to SPDX license identifier
  linux/srcu: Convert to SPDX license identifier
  linux/rcutree: Convert to SPDX license identifier
  linux/rcutiny: Convert to SPDX license identifier
  linux/rcu_sync: Convert to SPDX license identifier
  linux/rcu_segcblist: Convert to SPDX license identifier
  linux/rcupdate: Convert to SPDX license identifier
  linux/rcu_node_tree: Convert to SPDX license identifier
  rcu/update: Convert to SPDX license identifier
  rcu/tree: Convert to SPDX license identifier
  rcu/tiny: Convert to SPDX license identifier
  rcu/sync: Convert to SPDX license identifier
  rcu/srcu: Convert to SPDX license identifier
  rcu/rcutorture: Convert to SPDX license identifier
  rcu/rcu_segcblist: Convert to SPDX license identifier
  rcu/rcuperf: Convert to SPDX license identifier
  rcu/rcu.h: Convert to SPDX license identifier
  RCU/torture.txt: Remove section MODULE PARAMETERS
  ...
2019-03-05 14:49:11 -08:00
Gustavo A. R. Silva
75b710af71 timers: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where fall through is indeed expected.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190123081413.GA3949@embeddedor
2019-01-29 20:08:42 +01:00
Paul E. McKenney
c98cac603f rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq()
The name rcu_check_callbacks() arguably made sense back in the early
2000s when RCU was quite a bit simpler than it is today, but it has
become quite misleading, especially with the advent of dyntick-idle
and NO_HZ_FULL.  The rcu_check_callbacks() function is RCU's hook into
the scheduling-clock interrupt, and is now but one of many ways that
callbacks get promoted to invocable state.

This commit therefore changes the name to rcu_sched_clock_irq(),
which is the same number of characters and clearly indicates this
function's relation to the rest of the Linux kernel.  In addition, for
the sake of consistency, rcu_flavor_check_callbacks() is also renamed
to rcu_flavor_sched_clock_irq().

While in the area, the header comments for both functions are reworked.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25 15:35:21 -08:00
Thomas Gleixner
35728b8209 time: Add SPDX license identifiers
Update the time(r) core files files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of the
full boiler plate text.

This work is based on a script and data from Philippe Ombredanne, Kate
Stewart and myself. The data has been created with two independent license
scanners and manual inspection.

The following files do not contain any direct license information and have
been omitted from the big initial SPDX changes:

  timeconst.bc: The .bc files were not touched
  time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL

As those files do not contain direct license references they fall under the
project license, i.e. GPL V2 only.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de
2018-11-23 11:51:20 +01:00
Thomas Gleixner
58c5fc2b96 time: Remove useless filenames in top level comments
Remove the pointless filenames in the top level comments. They have no
value at all and just occupy space. While at it tidy up some of the
comments and remove a stale one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Riley <davidriley@chromium.org>
Cc: Colin Cross <ccross@android.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de
2018-11-23 11:51:20 +01:00
Gaurav Kohli
363e934d88 timers: Clear timer_base::must_forward_clk with timer_base::lock held
timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.

The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:

  CPU0					CPU1
  run_timer_softirq			mod_timer

					base = lock_timer_base(timer);
  base->must_forward_clk = false
					if (base->must_forward_clk)
				       	    forward(base); -> skipped

					enqueue_timer(base, timer, idx);
					-> idx is calculated high due to
					   stale base
					unlock_timer_base(timer);
  base = lock_timer_base(timer);
  forward(base);

The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.

Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.

Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: linux-arm-msm@vger.kernel.org
Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
2018-08-02 12:52:38 +02:00
Yi Wang
3058758925 timer: Fix coding style
The call to wake_up_nohz_cpu() is incorrectly indented. Remove the surplus TAB.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: zhong.weidong@zte.com.cn
CC: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/1531721337-30284-1-git-send-email-wang.yi59@zte.com.cn
2018-07-19 16:52:40 +02:00