android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
SeongJae Park	a9ec7ed936	BACKPORT: timers: implement usleep_idle_range() Patch series "mm/damon: Fix fake /proc/loadavg reports", v3. This patchset fixes DAMON's fake load report issue. The first patch makes yet another variant of usleep_range() for this fix, and the second patch fixes the issue of DAMON by making it using the newly introduced function. This patch (of 2): Some kernel threads such as DAMON could need to repeatedly sleep in micro seconds level. Because usleep_range() sleeps in uninterruptible state, however, such threads would make /proc/loadavg reports fake load. To help such cases, this commit implements a variant of usleep_range() called usleep_idle_range(). It is same to usleep_range() but sets the state of the current task as TASK_IDLE while sleeping. Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e4779015fd5d2fb8390c258268addff24d6077c7) Bug: 228223814 Signed-off-by: Hailong Tu <tuhailong@oppo.com> Change-Id: Ie590ba5fcff22c981d0a7ecae6d8e551160136f3	2022-04-28 23:09:17 +08:00
Greg Kroah-Hartman	af3bdb4304	Merge 5.10.58 into android12-5.10-lts Changes in 5.10.58 Revert "ACPICA: Fix memory leak caused by _CID repair function" ALSA: seq: Fix racy deletion of subscriber bus: ti-sysc: Fix gpt12 system timer issue with reserved status net: xfrm: fix memory leak in xfrm_user_rcv_msg arm64: dts: ls1028a: fix node name for the sysclk ARM: imx: add missing iounmap() ARM: imx: add missing clk_disable_unprepare() ARM: dts: imx6qdl-sr-som: Increase the PHY reset duration to 10ms arm64: dts: ls1028: sl28: fix networking for variant 2 ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode ALSA: usb-audio: fix incorrect clock source setting clk: stm32f4: fix post divisor setup for I2S/SAI PLLs ARM: dts: am437x-l4: fix typo in can@0 node omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator dmaengine: uniphier-xdmac: Use readl_poll_timeout_atomic() in atomic state clk: tegra: Implement disable_unused() of tegra_clk_sdmmc_mux_ops dmaengine: stm32-dma: Fix PM usage counter imbalance in stm32 dma ops dmaengine: stm32-dmamux: Fix PM usage counter unbalance in stm32 dmamux ops spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay spi: imx: mx51-ecspi: Fix low-speed CONFIGREG delay calculation scsi: sr: Return correct event when media event code is 3 media: videobuf2-core: dequeue if start_streaming fails ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM dmaengine: imx-dma: configure the generic DMA type to make it work net, gro: Set inner transport header offset in tcp/udp GRO hook net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag net: phy: micrel: Fix detection of ksz87xx switch net: natsemi: Fix missing pci_disable_device() in probe and remove gpio: tqmx86: really make IRQ optional RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently sctp: move the active_key update after sh_keys is added nfp: update ethtool reporting of pauseframe control net: ipv6: fix returned variable type in ip6_skb_dst_mtu net: dsa: qca: ar9331: reorder MDIO write sequence net: sched: fix lockdep_set_class() typo error for sch->seqlock MIPS: check return value of pgtable_pmd_page_ctor mips: Fix non-POSIX regexp bnx2x: fix an error code in bnx2x_nic_load() net: pegasus: fix uninit-value in get_interrupt_interval net: fec: fix use-after-free in fec_drv_remove net: vxge: fix use-after-free in vxge_device_unregister blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit() Bluetooth: defer cleanup of resources in hci_unregister_dev() USB: usbtmc: Fix RCU stall warning USB: serial: option: add Telit FD980 composition 0x1056 USB: serial: ch341: fix character loss at high transfer rates USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2 firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs_fallback firmware_loader: fix use-after-free in firmware_fallback_sysfs drm/amdgpu/display: fix DMUB firmware version info ALSA: pcm - fix mmap capability check for the snd-dummy driver ALSA: hda/realtek: add mic quirk for Acer SF314-42 ALSA: hda/realtek: Fix headset mic for Acer SWIFT SF314-56 (ALC256) ALSA: usb-audio: Fix superfluous autosuspend recovery ALSA: usb-audio: Add registration quirk for JBL Quantum 600 usb: dwc3: gadget: Avoid runtime resume if disabling pullup usb: gadget: remove leaked entry from udc driver list usb: cdns3: Fixed incorrect gadget state usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers usb: gadget: f_hid: fixed NULL pointer dereference usb: gadget: f_hid: idle uses the highest byte for duration usb: host: ohci-at91: suspend/resume ports after/before OHCI accesses usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events usb: otg-fsm: Fix hrtimer list corruption clk: fix leak on devm_clk_bulk_get_all() unwind scripts/tracing: fix the bug that can't parse raw_trace_func tracing / histogram: Give calculation hist_fields a size tracing: Reject string operand in the histogram expression tracing: Fix NULL pointer dereference in start_creating tracepoint: static call: Compare data on transition from 2->1 callees tracepoint: Fix static call function vs data state mismatch arm64: stacktrace: avoid tracing arch_stack_walk() optee: Clear stale cache entries during initialization tee: add tee_shm_alloc_kernel_buf() optee: Fix memory leak when failing to register shm pages optee: Refuse to load the driver under the kdump kernel optee: fix tee out of memory failure seen during kexec reboot tpm_ftpm_tee: Free and unregister TEE shared memory during kexec staging: rtl8723bs: Fix a resource leak in sd_int_dpc staging: rtl8712: get rid of flush_scheduled_work staging: rtl8712: error handling refactoring drivers core: Fix oops when driver probe fails media: rtl28xxu: fix zero-length control request pipe: increase minimum default pipe size to 2 pages ext4: fix potential htree corruption when growing large_dir directories serial: tegra: Only print FIFO error message when an error occurs serial: 8250_mtk: fix uart corruption issue when rx power off serial: 8250: Mask out floating 16/32-bit bus bits MIPS: Malta: Do not byte-swap accesses to the CBUS UART serial: 8250_pci: Enumerate Elkhart Lake UARTs via dedicated driver serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts. fpga: dfl: fme: Fix cpu hotplug issue in performance reporting timers: Move clearing of base::timer_running under base:: Lock xfrm: Fix RCU vs hash_resize_mutex lock inversion net/xfrm/compat: Copy xfrm_spdattr_type_t atributes pcmcia: i82092: fix a null pointer dereference bug selinux: correct the return value when loads initial sids bus: ti-sysc: AM3: RNG is GP only Revert "gpio: mpc8xxx: change the gpio interrupt flags." ARM: omap2+: hwmod: fix potential NULL pointer access md/raid10: properly indicate failure when ending a failed write request KVM: x86: accept userspace interrupt only if no event is injected KVM: Do not leak memory for duplicate debugfs directories KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds arm64: vdso: Avoid ISB after reading from cntvct_el0 soc: ixp4xx: fix printing resources interconnect: Fix undersized devress_alloc allocation spi: meson-spicc: fix memory leak in meson_spicc_remove interconnect: Zero initial BW after sync-state interconnect: Always call pre_aggregate before aggregate interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes drm/i915: Correct SFC_DONE register offset soc: ixp4xx/qmgr: fix invalid __iomem access perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest sched/rt: Fix double enqueue caused by rt_effective_prio drm/i915: avoid uninitialised var in eb_parse() libata: fix ata_pio_sector for CONFIG_HIGHMEM reiserfs: add check for root_inode in reiserfs_fill_super reiserfs: check directory items on read from disk virt_wifi: fix error on connect net: qede: Fix end of loop tests for list_for_each_entry alpha: Send stop IPI to send to online CPUs net/qla3xxx: fix schedule while atomic in ql_wait_for_drvr_lock and ql_adapter_reset smb3: rc uninitialized in one fallocate path drm/amdgpu/display: only enable aux backlight control for OLED panels arm64: fix compat syscall return truncation Linux 5.10.58 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2533667974c9dff419a14d63e0e8febfb3de80f1	2021-08-12 14:58:34 +02:00
Thomas Gleixner	23e36a8610	timers: Move clearing of base::timer_running under base:: Lock commit bb7262b295472eb6858b5c49893954794027cd84 upstream. syzbot reported KCSAN data races vs. timer_base::timer_running being set to NULL without holding base::lock in expire_timers(). This looks innocent and most reads are clearly not problematic, but Frederic identified an issue which is: int data = 0; void timer_func(struct timer_list *t) { data = 1; } CPU 0 CPU 1 ------------------------------ -------------------------- base = lock_timer_base(timer, &flags); raw_spin_unlock(&base->lock); if (base->running_timer != timer) call_timer_fn(timer, fn, baseclk); ret = detach_if_pending(timer, base, true); base->running_timer = NULL; raw_spin_unlock_irqrestore(&base->lock, flags); raw_spin_lock(&base->lock); x = data; If the timer has previously executed on CPU 1 and then CPU 0 can observe base->running_timer == NULL and returns, assuming the timer has completed, but it's not guaranteed on all architectures. The comment for del_timer_sync() makes that guarantee. Moving the assignment under base->lock prevents this. For non-RT kernel it's performance wise completely irrelevant whether the store happens before or after taking the lock. For an RT kernel moving the store under the lock requires an extra unlock/lock pair in the case that there is a waiter for the timer, but that's not the end of the world. Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com Fixes: `030dcdd197` ("timers: Prepare support for PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-12 13:22:15 +02:00
Greg Kroah-Hartman	e4cac2c332	Merge 5.10.54 into android12-5.10-lts Changes in 5.10.54 igc: Fix use-after-free error during reset igb: Fix use-after-free error during reset igc: change default return of igc_read_phy_reg() ixgbe: Fix an error handling path in 'ixgbe_probe()' igc: Fix an error handling path in 'igc_probe()' igb: Fix an error handling path in 'igb_probe()' fm10k: Fix an error handling path in 'fm10k_probe()' e1000e: Fix an error handling path in 'e1000_probe()' iavf: Fix an error handling path in 'iavf_probe()' igb: Check if num of q_vectors is smaller than max before array access igb: Fix position of assignment to *ring gve: Fix an error handling path in 'gve_probe()' net: add kcov handle to skb extensions bonding: fix suspicious RCU usage in bond_ipsec_add_sa() bonding: fix null dereference in bond_ipsec_add_sa() ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops bonding: fix suspicious RCU usage in bond_ipsec_del_sa() bonding: disallow setting nested bonding + ipsec offload bonding: Add struct bond_ipesc to manage SA bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() bonding: fix incorrect return value of bond_ipsec_offload_ok() ipv6: fix 'disable_policy' for fwd packets stmmac: platform: Fix signedness bug in stmmac_probe_config_dt() selftests: icmp_redirect: remove from checking for IPv6 route get selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped cxgb4: fix IRQ free race during driver unload mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join nvme-pci: do not call nvme_dev_remove_admin from nvme_remove KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM perf inject: Fix dso->nsinfo refcounting perf map: Fix dso->nsinfo refcounting perf probe: Fix dso->nsinfo refcounting perf env: Fix sibling_dies memory leak perf test session_topology: Delete session->evlist perf test event_update: Fix memory leak of evlist perf dso: Fix memory leak in dso__new_map() perf test maps__merge_in: Fix memory leak of maps perf env: Fix memory leak of cpu_pmu_caps perf report: Free generated help strings for sort option perf script: Fix memory 'threads' and 'cpus' leaks on exit perf lzma: Close lzma stream on exit perf probe-file: Delete namelist in del_events() on the error path perf data: Close all files in close_dir() perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set ASoC: wm_adsp: Correct wm_coeff_tlv_get handling spi: imx: add a check for speed_hz before calculating the clock spi: stm32: fixes pm_runtime calls in probe/remove regulator: hi6421: Use correct variable type for regmap api val argument regulator: hi6421: Fix getting wrong drvdata spi: mediatek: fix fifo rx mode ASoC: rt5631: Fix regcache sync errors on resume bpf, test: fix NULL pointer dereference on invalid expected_attach_type bpf: Fix tail_call_reachable rejection for interpreter when jit failed xdp, net: Fix use-after-free in bpf_xdp_link_release timers: Fix get_next_timer_interrupt() with no timers pending liquidio: Fix unintentional sign extension issue on left shift of u16 s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1] bpf, sockmap: Fix potential memory leak on unlikely error case bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats bpftool: Check malloc return value in mount_bpffs_for_pin net: fix uninit-value in caif_seqpkt_sendmsg usb: hso: fix error handling code of hso_create_net_device dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} efi/tpm: Differentiate missing and invalid final event log table. net: decnet: Fix sleeping inside in af_decnet KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak net: sched: fix memory leak in tcindex_partial_destroy_work sctp: trim optlen when it's a huge value in sctp_setsockopt netrom: Decrease sock refcount when sock timers expire scsi: iscsi: Fix iface sysfs attr detection scsi: target: Fix protect handling in WRITE SAME(32) spi: cadence: Correct initialisation of runtime PM again ACPI: Kconfig: Fix table override from built-in initrd bnxt_en: don't disable an already disabled PCI device bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe() bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task() bnxt_en: Validate vlan protocol ID on RX packets bnxt_en: Check abort error state in bnxt_half_open_nic() net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition net/tcp_fastopen: fix data races around tfo_active_disable_stamp ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID net: hns3: fix possible mismatches resp of mailbox net: hns3: fix rx VLAN offload state inconsistent issue spi: spi-bcm2835: Fix deadlock net/sched: act_skbmod: Skip non-Ethernet packets ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions ceph: don't WARN if we're still opening a session to an MDS nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem" afs: Fix tracepoint string placement with built-in AFS r8169: Avoid duplicate sysfs entry creation error nvme: set the PRACT bit when using Write Zeroes with T10 PI sctp: update active_key for asoc when old key is being replaced tcp: disable TFO blackhole logic by default net: dsa: sja1105: make VID 4095 a bridge VLAN too net: sched: cls_api: Fix the the wrong parameter drm/panel: raspberrypi-touchscreen: Prevent double-free cifs: only write 64kb at a time when fallocating a small region of a file cifs: fix fallocate when trying to allocate a hole. proc: Avoid mixing integer types in mem_rw() mmc: core: Don't allocate IDA for OF aliases s390/ftrace: fix ftrace_update_ftrace_func implementation s390/boot: fix use of expolines in the DMA code ALSA: usb-audio: Add missing proc text entry for BESPOKEN type ALSA: usb-audio: Add registration quirk for JBL Quantum headsets ALSA: sb: Fix potential ABBA deadlock in CSP driver ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine ALSA: hdmi: Expose all pins on MSI MS-7C94 board ALSA: pcm: Call substream ack() method upon compat mmap commit ALSA: pcm: Fix mmap capability check Revert "usb: renesas-xhci: Fix handling of unknown ROM state" usb: xhci: avoid renesas_usb_fw.mem when it's unusable xhci: Fix lost USB 2 remote wake KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state usb: hub: Disable USB 3 device initiated lpm if exit latency is too high usb: hub: Fix link power management max exit latency (MEL) calculations USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS usb: max-3421: Prevent corruption of freed memory usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop() USB: serial: option: add support for u-blox LARA-R6 family USB: serial: cp210x: fix comments for GE CS1000 USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode. usb: dwc2: gadget: Fix sending zero length packet in DDMA mode. usb: typec: stusb160x: register role switch before interrupt registration firmware/efi: Tell memblock about EFI iomem reservations tracepoints: Update static_call before tp_funcs when adding a tracepoint tracing/histogram: Rename "cpu" to "common_cpu" tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. tracing: Synthetic event field_pos is an index not a boolean btrfs: check for missing device in btrfs_trim_fs media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf() ixgbe: Fix packet corruption due to missing DMA sync bus: mhi: core: Validate channel ID when processing command completions posix-cpu-timers: Fix rearm racing against process tick selftest: use mmap instead of posix_memalign to allocate memory io_uring: explicitly count entries for poll reqs io_uring: remove double poll entry on arm failure userfaultfd: do not untag user pointers memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions hugetlbfs: fix mount mode command line processing rbd: don't hold lock_rwsem while running_list is being drained rbd: always kick acquire on "acquired" and "released" notifications misc: eeprom: at24: Always append device id even if label property is set. nds32: fix up stack guard gap driver core: Prevent warning when removing a device link from unregistered consumer drm: Return -ENOTTY for non-drm ioctls drm/amdgpu: update golden setting for sienna_cichlid net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz PCI: Mark AMD Navi14 GPU ATS as broken bonding: fix build issue skbuff: Release nfct refcount on napi stolen or re-used skbs Documentation: Fix intiramfs script name perf inject: Close inject.output on exit usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI drm/i915/gvt: Clear d3_entered on elsp cmd submission. sfc: ensure correct number of XDP queues xhci: add xhci_get_virt_ep() helper skbuff: Fix build with SKB extensions disabled Linux 5.10.54 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e	2021-07-28 15:23:47 +02:00
Nicolas Saenz Julienne	0ff2ea9d8f	timers: Fix get_next_timer_interrupt() with no timers pending [ Upstream commit aebacb7f6ca1926918734faae14d1f0b6fae5cb7 ] `31cd0e119d` ("timers: Recalculate next timer interrupt only when necessary") subtly altered get_next_timer_interrupt()'s behaviour. The function no longer consistently returns KTIME_MAX with no timers pending. In order to decide if there are any timers pending we check whether the next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now. Unfortunately, the next expiry time and the timer base clock are no longer updated in unison. The former changes upon certain timer operations (enqueue, expire, detach), whereas the latter keeps track of jiffies as they move forward. Ultimately breaking the logic above. A simplified example: - Upon entering get_next_timer_interrupt() with: jiffies = 1 base->clk = 0; base->next_expiry = NEXT_TIMER_MAX_DELTA; 'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function returns KTIME_MAX. - 'base->clk' is updated to the jiffies value. - The next time we enter get_next_timer_interrupt(), taking into account no timer operations happened: base->clk = 1; base->next_expiry = NEXT_TIMER_MAX_DELTA; 'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function returns a valid expire time, which is incorrect. This ultimately might unnecessarily rearm sched's timer on nohz_full setups, and add latency to the system[1]. So, introduce 'base->timers_pending'[2], update it every time 'base->next_expiry' changes, and use it in get_next_timer_interrupt(). [1] See tick_nohz_stop_tick(). [2] A quick pahole check on x86_64 and arm64 shows it doesn't make 'struct timer_base' any bigger. Fixes: `31cd0e119d` ("timers: Recalculate next timer interrupt only when necessary") Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 14:35:37 +02:00
Huang Yiwei	ff96a0a256	ANDROID: timer: calc_index vendor hook adjustment Move the calc_index vendor hook one line ahead to cover more use cases. Bug: 181296757 Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org> Change-Id: I52231a3ccbe622021232c7a54354c5ac02cf3952	2021-02-26 19:23:02 +00:00
Huang Yiwei	9789a88f42	ANDROID: timer: Add vendor hook for timer calc index Since we're expecting timers more precisely in short period, add a vendor hook to calc_index when adding timers. Then we can modify the index this timer used to make it accurate. Bug: 178758017 Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org> Change-Id: Ie0e6493ae7ad53b0cc57eb1bbcf8a0a11f652828	2021-02-02 02:04:18 +00:00
Changki Kim	96844c1c84	ANDROID: timer: Export hrtimer_expire_entry/exit tracepoints Export hrtimer_expire_entry/exit tracepoints, so that vendor modules can register probes for these tracepoints. Bug: 175936268 Change-Id: I739f369d3b56e09f8e9061fefdf25830e37e987e Signed-off-by: Changki Kim <changki.kim@samsung.com>	2020-12-21 17:49:09 +00:00
YueHaibing	9010e3876e	timers: Remove unused inline funtion debug_timer_free() There is no caller in tree, remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200909134749.32300-1-yuehaibing@huawei.com	2020-10-26 11:39:21 +01:00
Willy Tarreau	3744741ada	random32: add noise from network and scheduling activity With the removal of the interrupt perturbations in previous random32 change (random32: make prandom_u32() output unpredictable), the PRNG has become 100% deterministic again. While SipHash is expected to be way more robust against brute force than the previous Tausworthe LFSR, there's still the risk that whoever has even one temporary access to the PRNG's internal state is able to predict all subsequent draws till the next reseed (roughly every minute). This may happen through a side channel attack or any data leak. This patch restores the spirit of commit `f227e3ec3b` ("random32: update the net random state on interrupt and activity") in that it will perturb the internal PRNG's statee using externally collected noise, except that it will not pick that noise from the random pool's bits nor upon interrupt, but will rather combine a few elements along the Tx path that are collectively hard to predict, such as dev, skb and txq pointers, packet length and jiffies values. These ones are combined using a single round of SipHash into a single long variable that is mixed with the net_rand_state upon each invocation. The operation was inlined because it produces very small and efficient code, typically 3 xor, 2 add and 2 rol. The performance was measured to be the same (even very slightly better) than before the switch to SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC (i40e), the connection rate dropped from 556k/s to 555k/s while the SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps. Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/ Cc: George Spelvin <lkml@sdf.org> Cc: Amit Klein <aksecurity@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: tytso@mit.edu Cc: Florian Westphal <fw@strlen.de> Cc: Marc Plumb <lkml.mplumb@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2020-10-24 20:21:57 +02:00
George Spelvin	c51f8f88d7	random32: make prandom_u32() output unpredictable Non-cryptographic PRNGs may have great statistical properties, but are usually trivially predictable to someone who knows the algorithm, given a small sample of their output. An LFSR like prandom_u32() is particularly simple, even if the sample is widely scattered bits. It turns out the network stack uses prandom_u32() for some things like random port numbers which it would prefer are not trivially predictable. Predictability led to a practical DNS spoofing attack. Oops. This patch replaces the LFSR with a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key. (The authors of SipHash have not been consulted about this abuse of their algorithm.) Speed is prioritized over security; attacks are rare, while performance is always wanted. Replacing all callers of prandom_u32() is the quick fix. Whether to reinstate a weaker PRNG for uses which can tolerate it is an open question. Commit `f227e3ec3b` ("random32: update the net random state on interrupt and activity") was an earlier attempt at a solution. This patch replaces it. Reported-by: Amit Klein <aksecurity@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: tytso@mit.edu Cc: Florian Westphal <fw@strlen.de> Cc: Marc Plumb <lkml.mplumb@gmail.com> Fixes: `f227e3ec3b` ("random32: update the net random state on interrupt and activity") Signed-off-by: George Spelvin <lkml@sdf.org> Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/ [ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions to prandom.h for later use; merged George's prandom_seed() proposal; inlined siprand_u32(); replaced the net_rand_state[] array with 4 members to fix a build issue; cosmetic cleanups to make checkpatch happy; fixed RANDOM32_SELFTEST build ] Signed-off-by: Willy Tarreau <w@1wt.eu>	2020-10-24 20:21:57 +02:00
Linus Torvalds	f5f59336a9	Merge tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "Updates for timekeeping, timers and related drivers: Core: - Early boot support for the NMI safe timekeeper by utilizing local_clock() up to the point where timekeeping is initialized. This allows printk() to store multiple timestamps in the ringbuffer which is useful for coordinating dmesg information across a fleet of machines. - Provide a multi-timestamp accessor for printk() - Make timer init more robust by checking for invalid timer flags. - Comma vs semicolon fixes Drivers: - Support for new platforms in existing drivers (SP804 and Renesas CMT) - Comma vs semicolon fixes * tag 'timers-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/armada-370-xp: Use semicolons rather than commas to separate statements clocksource/drivers/mps2-timer: Use semicolons rather than commas to separate statements timers: Mask invalid flags in do_init_timer() clocksource/drivers/sp804: Enable Hisilicon sp804 timer 64bit mode clocksource/drivers/sp804: Add support for Hisilicon sp804 timer clocksource/drivers/sp804: Support non-standard register offset clocksource/drivers/sp804: Prepare for support non-standard register offset clocksource/drivers/sp804: Remove a mismatched comment clocksource/drivers/sp804: Delete the leading "__" of some functions clocksource/drivers/sp804: Remove unused sp804_timer_disable() and timer-sp804.h clocksource/drivers/sp804: Cleanup clk_get_sys() dt-bindings: timer: renesas,cmt: Document r8a774e1 CMT support dt-bindings: timer: renesas,cmt: Document r8a7742 CMT support alarmtimer: Convert comma to semicolon timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot	2020-10-12 11:27:54 -07:00
Qianli Zhao	b952caf2d5	timers: Mask invalid flags in do_init_timer() do_init_timer() accepts any combination of timer flags handed in by the caller without a sanity check, but only TIMER_DEFFERABLE, TIMER_PINNED and TIMER_IRQSAFE are valid. If the supplied flags have other bits set, this could result in malfunction. If bits are set in TIMER_CPUMASK the first timer usage could deference a cpu base which is outside the range of possible CPUs. If TIMER_MIGRATION is set, then the switch_timer_base() will live lock. Prevent that with a sanity check which warns when invalid flags are supplied and masks them out. [ tglx: Made it WARN_ON_ONCE() and added context to the changelog ] Signed-off-by: Qianli Zhao <zhaoqianli@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/9d79a8aa4eb56713af7379f99f062dedabcde140.1597326756.git.zhaoqianli@xiaomi.com	2020-09-24 22:12:18 +02:00
Stephen Boyd	f9e62f318f	treewide: Make all debug_obj_descriptors const This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by moving the structure to read-only memory. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org	2020-09-24 21:56:25 +02:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Thomas Gleixner	1fb497dd00	posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Running posix CPU timers in hard interrupt context has a few downsides: - For PREEMPT_RT it cannot work as the expiry code needs to take sighand lock, which is a 'sleeping spinlock' in RT. The original RT approach of offloading the posix CPU timer handling into a high priority thread was clumsy and provided no real benefit in general. - For fine grained accounting it's just wrong to run this in context of the timer interrupt because that way a process specific CPU time is accounted to the timer interrupt. - Long running timer interrupts caused by a large amount of expiring timers which can be created and armed by unpriviledged user space. There is no hard requirement to expire them in interrupt context. If the signal is targeted at the task itself then it won't be delivered before the task returns to user space anyway. If the signal is targeted at a supervisor process then it might be slightly delayed, but posix CPU timers are inaccurate anyway due to the fact that they are tied to the tick. Provide infrastructure to schedule task work which allows splitting the posix CPU timer code into a quick check in interrupt context and a thread context expiry and signal delivery function. This has to be enabled by architectures as it requires that the architecture specific KVM implementation handles pending task work before exiting to guest mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de	2020-08-06 16:50:59 +02:00
Linus Torvalds	442489c219	Merge tag 'timers-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Time, timers and related driver updates: - Prevent unnecessary timer softirq invocations by extending the tracking of the next expiring timer in the timer wheel beyond the existing NOHZ functionality. The tracking overhead at enqueue time is within the noise, but on sensitive workloads the avoidance of the soft interrupt invocation is a measurable improvement. - The obligatory new clocksource driver for Ingenic X100 OST - The usual fixes, improvements, cleanups and extensions for newer chip variants all over the driver space" * tag 'timers-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) timers: Recalculate next timer interrupt only when necessary clocksource/drivers/ingenic: Add support for the Ingenic X1000 OST. dt-bindings: timer: Add Ingenic X1000 OST bindings. clocksource/drivers: Replace HTTP links with HTTPS ones clocksource/drivers/nomadik-mtu: Handle 32kHz clock clocksource/drivers/sh_cmt: Use "kHz" for kilohertz clocksource/drivers/imx: Add support for i.MX TPM driver with ARM64 clocksource/drivers/ingenic: Add high resolution timer support for SMP/SMT. timers: Lower base clock forwarding threshold timers: Remove must_forward_clk timers: Spare timer softirq until next expiry timers: Expand clk forward logic beyond nohz timers: Reuse next expiry cache after nohz exit timers: Always keep track of next expiry timers: Optimize _next_timer_interrupt() level iteration timers: Add comments about calc_index() ceiling work timers: Move trigger_dyntick_cpu() to enqueue_timer() timers: Use only bucket expiry for base->next_expiry value timers: Preserve higher bits of expiration on index calculation clocksource/drivers/timer-atmel-tcb: Add sama5d2 support ...	2020-08-04 18:17:37 -07:00
Willy Tarreau	f227e3ec3b	random32: update the net random state on interrupt and activity This modifies the first 32 bits out of the 128 bits of a random CPU's net_rand_state on interrupt or CPU activity to complicate remote observations that could lead to guessing the network RNG's internal state. Note that depending on some network devices' interrupt rate moderation or binding, this re-seeding might happen on every packet or even almost never. In addition, with NOHZ some CPUs might not even get timer interrupts, leaving their local state rarely updated, while they are running networked processes making use of the random state. For this reason, we also perform this update in update_process_times() in order to at least update the state when there is user or system activity, since it's the only case we care about. Reported-by: Amit Klein <aksecurity@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-07-29 10:35:37 -07:00
Frederic Weisbecker	31cd0e119d	timers: Recalculate next timer interrupt only when necessary The nohz tick code recalculates the timer wheel's next expiry on each idle loop iteration. On the other hand, the base next expiry is now always cached and updated upon timer enqueue and execution. Only timer dequeue may leave base->next_expiry out of date (but then its stale value won't ever go past the actual next expiry to be recalculated). Since recalculating the next_expiry isn't a free operation, especially when the last wheel level is reached to find out that no timer has been enqueued at all, reuse the next expiry cache when it is known to be reliable, which it is most of the time. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200723151641.12236-1-frederic@kernel.org	2020-07-24 12:49:40 +02:00
Frederic Weisbecker	36cd28a4cd	timers: Lower base clock forwarding threshold There is nothing that prevents from forwarding the base clock if it's one jiffy off. The reason for this arbitrary limit of two jiffies is historical and does not longer exist. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-13-frederic@kernel.org	2020-07-17 21:55:25 +02:00
Frederic Weisbecker	0975fb565b	timers: Remove must_forward_clk There is no reason to keep this guard around. The code makes sure that base->clk remains sane and won't be forwarded beyond jiffies nor set backward. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-12-frederic@kernel.org	2020-07-17 21:55:25 +02:00
Frederic Weisbecker	d4f7dae870	timers: Spare timer softirq until next expiry Now that the core timer infrastructure doesn't depend anymore on periodic base->clk increments, even when the CPU is not in NO_HZ mode, timer softirqs can be skipped until there are timers to expire. Some spurious softirqs can still remain since base->next_expiry doesn't keep track of canceled timers but this still reduces the number of softirqs significantly: ~15 times less for HZ=1000 and ~5 times less for HZ=100. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-11-frederic@kernel.org	2020-07-17 21:55:24 +02:00
Frederic Weisbecker	1f8a4212dc	timers: Expand clk forward logic beyond nohz As for next_expiry, the base->clk catch up logic will be expanded beyond NOHZ in order to avoid triggering useless softirqs. If softirqs should only fire to expire pending timers, periodic base->clk increments must be skippable for random amounts of time. Therefore prepare to catch-up with missing updates whenever an up-to-date base clock is needed. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-10-frederic@kernel.org	2020-07-17 21:55:24 +02:00
Frederic Weisbecker	90d52f65f3	timers: Reuse next expiry cache after nohz exit Now that the next expiry it tracked unconditionally when a timer is added, this information can be reused on a tick firing after exiting nohz. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-9-frederic@kernel.org	2020-07-17 21:55:23 +02:00
Frederic Weisbecker	dc2a0f1fb2	timers: Always keep track of next expiry So far next expiry was only tracked while the CPU was in nohz_idle mode in order to cope with missing ticks that can't increment the base->clk periodically anymore. This logic is going to be expanded beyond nohz in order to spare timer softirqs so do it unconditionally. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-8-frederic@kernel.org	2020-07-17 21:55:23 +02:00
Frederic Weisbecker	001ec1b392	timers: Optimize _next_timer_interrupt() level iteration If a level has a timer that expires before reaching the next level, there is no need to iterate further. The next level is reached when the 3 lower bits of the current level are cleared. If the next event happens before/during that, the next levels won't provide an earlier expiration. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-7-frederic@kernel.org	2020-07-17 21:55:22 +02:00
Frederic Weisbecker	4468897211	timers: Add comments about calc_index() ceiling work calc_index() adds 1 unit of the level granularity to the expiry passed in parameter to ensure that the timer doesn't expire too early. Add a comment to explain that and the resulting layout in the wheel. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-6-frederic@kernel.org	2020-07-17 21:55:22 +02:00
Frederic Weisbecker	9a2b764b06	timers: Move trigger_dyntick_cpu() to enqueue_timer() Consolidate the code by calling trigger_dyntick_cpu() from enqueue_timer() instead of calling it from all its callers. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200717140551.29076-5-frederic@kernel.org	2020-07-17 21:55:22 +02:00
Anna-Maria Behnsen	1f32cab0db	timers: Use only bucket expiry for base->next_expiry value The bucket expiry time is the effective expriy time of timers and is greater than or equal to the requested timer expiry time. This is due to the guarantee that timers never expire early and the reduced expiry granularity in the secondary wheel levels. When a timer is enqueued, trigger_dyntick_cpu() checks whether the timer is the new first timer. This check compares next_expiry with the requested timer expiry value and not with the effective expiry value of the bucket into which the timer was queued. Storing the requested timer expiry value in base->next_expiry can lead to base->clk going backwards if the requested timer expiry value is smaller than base->clk. Commit `30c66fc30e` ("timer: Prevent base->clk from moving backward") worked around this by preventing the store when timer->expiry is before base->clk, but did not fix the underlying problem. Use the expiry value of the bucket into which the timer is queued to do the new first timer check. This fixes the base->clk going backward problem. The workaround of commit `30c66fc30e` ("timer: Prevent base->clk from moving backward") in trigger_dyntick_cpu() is not longer necessary as the timers bucket expiry is guaranteed to be greater than or equal base->clk. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200717140551.29076-4-frederic@kernel.org	2020-07-17 21:55:21 +02:00
Frederic Weisbecker	3d2e83a2a6	timers: Preserve higher bits of expiration on index calculation The higher bits of the timer expiration are cropped while calling calc_index() due to the implicit cast from unsigned long to unsigned int. This loss shouldn't have consequences on the current code since all the computation to calculate the index is done on the lower 32 bits. However to prepare for returning the actual bucket expiration from calc_index() in order to properly fix base->next_expiry updates, the higher bits need to be preserved. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200717140551.29076-3-frederic@kernel.org	2020-07-17 21:55:21 +02:00
Frederic Weisbecker	e2a71bdea8	timer: Fix wheel index calculation on last level When an expiration delta falls into the last level of the wheel, that delta has be compared against the maximum possible delay and reduced to fit in if necessary. However instead of comparing the delta against the maximum, the code compares the actual expiry against the maximum. Then instead of fixing the delta to fit in, it sets the maximum delta as the expiry value. This can result in various undesired outcomes, the worst possible one being a timer expiring 15 days ahead to fire immediately. Fixes: `500462a9de` ("timers: Switch to a non-cascading wheel") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200717140551.29076-2-frederic@kernel.org	2020-07-17 21:44:05 +02:00
Frederic Weisbecker	30c66fc30e	timer: Prevent base->clk from moving backward When a timer is enqueued with a negative delta (ie: expiry is below base->clk), it gets added to the wheel as expiring now (base->clk). Yet the value that gets stored in base->next_expiry, while calling trigger_dyntick_cpu(), is the initial timer->expires value. The resulting state becomes: base->next_expiry < base->clk On the next timer enqueue, forward_timer_base() may accidentally rewind base->clk. As a possible outcome, timers may expire way too early, the worst case being that the highest wheel levels get spuriously processed again. To prevent from that, make sure that base->next_expiry doesn't get below base->clk. Fixes: `a683f390b9` ("timers: Forward the wheel clock whenever possible") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Tested-by: Juri Lelli <juri.lelli@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200703010657.2302-1-frederic@kernel.org	2020-07-09 11:56:57 +02:00
Christoph Hellwig	32927393dc	sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-04-27 02:07:40 -04:00
Linus Torvalds	dbb381b619	Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and timer updates from Thomas Gleixner: "Core: - Consolidation of the vDSO build infrastructure to address the difficulties of cross-builds for ARM64 compat vDSO libraries by restricting the exposure of header content to the vDSO build. This is achieved by splitting out header content into separate headers. which contain only the minimaly required information which is necessary to build the vDSO. These new headers are included from the kernel headers and the vDSO specific files. - Enhancements to the generic vDSO library allowing more fine grained control over the compiled in code, further reducing architecture specific storage and preparing for adopting the generic library by PPC. - Cleanup and consolidation of the exit related code in posix CPU timers. - Small cleanups and enhancements here and there Drivers: - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support - Correct the clock rate of PIT64b global clock - setup_irq() cleanup - Preparation for PWM and suspend support for the TI DM timer - Expand the fttmr010 driver to support ast2600 systems - The usual small fixes, enhancements and cleanups all over the place" * tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) Revert "clocksource/drivers/timer-probe: Avoid creating dead devices" vdso: Fix clocksource.h macro detection um: Fix header inclusion arm64: vdso32: Enable Clang Compilation lib/vdso: Enable common headers arm: vdso: Enable arm to use common headers x86/vdso: Enable x86 to use common headers mips: vdso: Enable mips to use common headers arm64: vdso32: Include common headers in the vdso library arm64: vdso: Include common headers in the vdso library arm64: Introduce asm/vdso/processor.h arm64: vdso32: Code clean up linux/elfnote.h: Replace elf.h with UAPI equivalent scripts: Fix the inclusion order in modpost common: Introduce processor.h linux/ktime.h: Extract common header for vDSO linux/jiffies.h: Extract common header for vDSO linux/time64.h: Extract common header for vDSO linux/time32.h: Extract common header for vDSO linux/time.h: Extract common header for vDSO ...	2020-03-30 18:51:47 -07:00
Eric Dumazet	90c018942c	timer: Use hlist_unhashed_lockless() in timer_pending() The timer_pending() function is mostly used in lockless contexts, so Without proper annotations, KCSAN might detect a data-race [1]. Using hlist_unhashed_lockless() instead of hand-coding it seems appropriate (as suggested by Paul E. McKenney). [1] BUG: KCSAN: data-race in del_timer / detach_if_pending write to 0xffff88808697d870 of 8 bytes by task 10 on cpu 0: __hlist_del include/linux/list.h:764 [inline] detach_timer kernel/time/timer.c:815 [inline] detach_if_pending+0xcd/0x2d0 kernel/time/timer.c:832 try_to_del_timer_sync+0x60/0xb0 kernel/time/timer.c:1226 del_timer_sync+0x6b/0xa0 kernel/time/timer.c:1365 schedule_timeout+0x2d2/0x6e0 kernel/time/timer.c:1896 rcu_gp_fqs_loop+0x37c/0x580 kernel/rcu/tree.c:1639 rcu_gp_kthread+0x143/0x230 kernel/rcu/tree.c:1799 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff88808697d870 of 8 bytes by task 12060 on cpu 1: del_timer+0x3b/0xb0 kernel/time/timer.c:1198 sk_stop_timer+0x25/0x60 net/core/sock.c:2845 inet_csk_clear_xmit_timers+0x69/0xa0 net/ipv4/inet_connection_sock.c:523 tcp_clear_xmit_timers include/net/tcp.h:606 [inline] tcp_v4_destroy_sock+0xa3/0x3f0 net/ipv4/tcp_ipv4.c:2096 inet_csk_destroy_sock+0xf4/0x250 net/ipv4/inet_connection_sock.c:836 tcp_close+0x6f3/0x970 net/ipv4/tcp.c:2497 inet_release+0x86/0x100 net/ipv4/af_inet.c:427 __sock_release+0x85/0x160 net/socket.c:590 sock_close+0x24/0x30 net/socket.c:1268 __fput+0x1e1/0x520 fs/file_table.c:280 ____fput+0x1f/0x30 fs/file_table.c:313 task_work_run+0xf6/0x130 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop+0x2b4/0x2c0 arch/x86/entry/common.c:163 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 12060 Comm: syz-executor.5 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> [ paulmck: Pulled in Eric's later amendments. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-02-20 15:58:22 -08:00
Alexander Popov	6e317c32fd	timer: Improve the comment describing schedule_timeout() When working commit `6dcd5d7a7a`, a mistake was noticed by Linus: schedule_timeout() was called without setting the task state to anything particular. It calls the scheduler, but doesn't delay anything, because the task stays runnable. That happens because sched_submit_work() does nothing for tasks in TASK_RUNNING state. That turned out to be the intended behavior. Adding a WARN() is not useful as the task could be woken up right after setting the state and before reaching schedule_timeout(). Improve the comment about schedule_timeout() and describe that more explicitly. Signed-off-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200117225900.16340-1-alex.popov@linux.com	2020-02-17 20:12:19 +01:00
Li RongQing	e430d802d6	timer: Read jiffies once when forwarding base clk The timer delayed for more than 3 seconds warning was triggered during testing. Workqueue: events_unbound sched_tick_remote RIP: 0010:sched_tick_remote+0xee/0x100 ... Call Trace: process_one_work+0x18c/0x3a0 worker_thread+0x30/0x380 kthread+0x113/0x130 ret_from_fork+0x22/0x40 The reason is that the code in collect_expired_timers() uses jiffies unprotected: if (next_event > jiffies) base->clk = jiffies; As the compiler is allowed to reload the value base->clk can advance between the check and the store and in the worst case advance farther than next event. That causes the timer expiry to be delayed until the wheel pointer wraps around. Convert the code to use READ_ONCE() Fixes: `236968383c` ("timers: Optimize collect_expired_timers() for NOHZ") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1568894687-14499-1-git-send-email-lirongqing@baidu.com	2019-09-19 17:50:11 +02:00
Thomas Gleixner	dce3e8fd03	posix-cpu-timers: Remove tsk argument from run_posix_cpu_timers() It's always current. Don't give people wrong ideas. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190819143801.945469967@linutronix.de	2019-08-21 20:27:16 +02:00
Anna-Maria Gleixner	030dcdd197	timers: Prepare support for PREEMPT_RT When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If the soft interrupt thread is preempted in the middle of a timer callback, then calling del_timer_sync() can lead to two issues: - If the caller is on a remote CPU then it has to spin wait for the timer handler to complete. This can result in unbound priority inversion. - If the caller originates from the task which preempted the timer handler on the same CPU, then spin waiting for the timer handler to complete is never going to end. To avoid these issues, add a new lock to the timer base which is held around the execution of the timer callbacks. If del_timer_sync() detects that the timer callback is currently running, it blocks on the expiry lock. When the callback is finished, the expiry lock is dropped by the softirq thread which wakes up the waiter and the system makes progress. This addresses both the priority inversion and the life lock issues. This mechanism is not used for timers which are marked IRQSAFE as for those preemption is disabled accross the callback and therefore this situation cannot happen. The callbacks for such timers need to be individually audited for RT compliance. The same issue can happen in virtual machines when the vCPU which runs a timer callback is scheduled out. If a second vCPU of the same guest calls del_timer_sync() it will spin wait for the other vCPU to be scheduled back in. The expiry lock mechanism would avoid that. It'd be trivial to enable this when paravirt spinlocks are enabled in a guest, but it's not clear whether this is an actual problem in the wild, so for now it's an RT only mechanism. As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant of del_timer_sync() needs to be used on UP as well. [ tglx: Refactored it for mainline ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de	2019-08-01 20:51:22 +02:00
Linus Torvalds	0968621917	Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()	2019-05-07 09:18:12 -07:00
Sakari Ailus	d75f773c86	treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' \| grep -v '^$tools\\|Documentation$/' \| \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-04-09 14:19:06 +02:00
Anna-Maria Gleixner	f28d3d5346	timer/trace: Improve timer tracing Timers are added to the timer wheel off by one. This is required in case a timer is queued directly before incrementing jiffies to prevent early timer expiry. When reading a timer trace and relying only on the expiry time of the timer in the timer_start trace point and on the now in the timer_expiry_entry trace point, it seems that the timer fires late. With the current timer_expiry_entry trace point information only now=jiffies is printed but not the value of base->clk. This makes it impossible to draw a conclusion to the index of base->clk and makes it impossible to examine timer problems without additional trace points. Therefore add the base->clk value to the timer_expire_entry trace point, to be able to calculate the index the timer base is located at during collecting expired timers. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-maria@linutronix.de	2019-03-24 20:29:33 +01:00
Anna-Maria Gleixner	dc1e7dc5ac	timer: Move trace point to get proper index When placing the timer_start trace point before the timer wheel bucket index is calculated, the index information in the trace point is useless. It is not possible to simply move the debug_activate() call after the index calculation, because debug_object_activate() needs to be called before touching the object. Therefore split debug_activate() and move the trace point into enqueue_timer() after the new index has been calculated. The debug_object_activate() call remains at the original place. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20190321120921.16463-3-anna-maria@linutronix.de	2019-03-24 20:29:32 +01:00
Linus Torvalds	3717f613f4	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU related changes in this cycle were: - Additional cleanups after RCU flavor consolidation - Grace-period forward-progress cleanups and improvements - Documentation updates - Miscellaneous fixes - spin_is_locked() conversions to lockdep - SPDX changes to RCU source and header files - SRCU updates - Torture-test updates, including nolibc updates and moving nolibc to tools/include" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) locking/locktorture: Convert to SPDX license identifier linux/torture: Convert to SPDX license identifier torture: Convert to SPDX license identifier linux/srcu: Convert to SPDX license identifier linux/rcutree: Convert to SPDX license identifier linux/rcutiny: Convert to SPDX license identifier linux/rcu_sync: Convert to SPDX license identifier linux/rcu_segcblist: Convert to SPDX license identifier linux/rcupdate: Convert to SPDX license identifier linux/rcu_node_tree: Convert to SPDX license identifier rcu/update: Convert to SPDX license identifier rcu/tree: Convert to SPDX license identifier rcu/tiny: Convert to SPDX license identifier rcu/sync: Convert to SPDX license identifier rcu/srcu: Convert to SPDX license identifier rcu/rcutorture: Convert to SPDX license identifier rcu/rcu_segcblist: Convert to SPDX license identifier rcu/rcuperf: Convert to SPDX license identifier rcu/rcu.h: Convert to SPDX license identifier RCU/torture.txt: Remove section MODULE PARAMETERS ...	2019-03-05 14:49:11 -08:00
Gustavo A. R. Silva	75b710af71	timers: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where fall through is indeed expected. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190123081413.GA3949@embeddedor	2019-01-29 20:08:42 +01:00
Paul E. McKenney	c98cac603f	rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq() The name rcu_check_callbacks() arguably made sense back in the early 2000s when RCU was quite a bit simpler than it is today, but it has become quite misleading, especially with the advent of dyntick-idle and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into the scheduling-clock interrupt, and is now but one of many ways that callbacks get promoted to invocable state. This commit therefore changes the name to rcu_sched_clock_irq(), which is the same number of characters and clearly indicates this function's relation to the rest of the Linux kernel. In addition, for the sake of consistency, rcu_flavor_check_callbacks() is also renamed to rcu_flavor_sched_clock_irq(). While in the area, the header comments for both functions are reworked. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-01-25 15:35:21 -08:00
Thomas Gleixner	35728b8209	time: Add SPDX license identifiers Update the time(r) core files files with the correct SPDX license identifier based on the license text in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This work is based on a script and data from Philippe Ombredanne, Kate Stewart and myself. The data has been created with two independent license scanners and manual inspection. The following files do not contain any direct license information and have been omitted from the big initial SPDX changes: timeconst.bc: The .bc files were not touched time.c, timer.c, timekeeping.c: Licence was deduced from EXPORT_SYMBOL_GPL As those files do not contain direct license references they fall under the project license, i.e. GPL V2 only. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20181031182252.879109557@linutronix.de	2018-11-23 11:51:20 +01:00
Thomas Gleixner	58c5fc2b96	time: Remove useless filenames in top level comments Remove the pointless filenames in the top level comments. They have no value at all and just occupy space. While at it tidy up some of the comments and remove a stale one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Riley <davidriley@chromium.org> Cc: Colin Cross <ccross@android.com> Cc: Mark Brown <broonie@kernel.org> Link: https://lkml.kernel.org/r/20181031182252.794898238@linutronix.de	2018-11-23 11:51:20 +01:00
Gaurav Kohli	363e934d88	timers: Clear timer_base::must_forward_clk with timer_base::lock held timer_base::must_forward_clock is indicating that the base clock might be stale due to a long idle sleep. The forwarding of the base clock takes place in the timer softirq or when a timer is enqueued to a base which is idle. If the enqueue of timer to an idle base happens from a remote CPU, then the following race can happen: CPU0 CPU1 run_timer_softirq mod_timer base = lock_timer_base(timer); base->must_forward_clk = false if (base->must_forward_clk) forward(base); -> skipped enqueue_timer(base, timer, idx); -> idx is calculated high due to stale base unlock_timer_base(timer); base = lock_timer_base(timer); forward(base); The root cause is that timer_base::must_forward_clk is cleared outside the timer_base::lock held region, so the remote queuing CPU observes it as cleared, but the base clock is still stale. This can cause large granularity values for timers, i.e. the accuracy of the expiry time suffers. Prevent this by clearing the flag with timer_base::lock held, so that the forwarding takes place before the cleared flag is observable by a remote CPU. Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org	2018-08-02 12:52:38 +02:00
Yi Wang	3058758925	timer: Fix coding style The call to wake_up_nohz_cpu() is incorrectly indented. Remove the surplus TAB. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: zhong.weidong@zte.com.cn CC: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: https://lkml.kernel.org/r/1531721337-30284-1-git-send-email-wang.yi59@zte.com.cn	2018-07-19 16:52:40 +02:00

1 2 3

134 Commits