We have an issue with timeout links that are deeper in the submit chain,
because we only handle it upfront, not from later submissions. Move the
prep + issue of the timeout link to the async work prep handler, and do
it normally for non-async queue. If we validate and prepare the timeout
links upfront when we first see them, there's nothing stopping us from
supporting any sort of nesting.
Fixes: 2665abfd75 ("io_uring: add support for linked SQE timeouts")
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are a few reasons for this:
- As a prep to improving the linked timeout logic
- io_timeout is the biggest member in the io_kiocb opcode union
This also enables a few cleanups, like unifying the timer setup between
IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple
arguments to the link/prep helpers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we don't use the normal completion path, we may skip killing links
that should be errored and freed. Add __io_double_put_req() for use
within the completion path itself, other calls should just use
io_double_put_req().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err,
but the caller doesn't care since the errors are handled inline. Clean
these up and just make them void.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have a linked request, this enables us to pass it back directly
without having to go through async context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull thread management updates from Christian Brauner:
- A pidfd's fdinfo file currently contains the field "Pid:\t<pid>"
where <pid> is the pid of the process in the pid namespace of the
procfs instance the fdinfo file for the pidfd was opened in.
The fdinfo file has now gained a new "NSpid:\t<ns-pid1>[\t<ns-pid2>[...]]"
field which lists the pids of the process in all child pid namespaces
provided the pid namespace of the procfs instance it is looked up
under has an ancestral relationship with the pid namespace of the
process. If it does not, 0 will be shown and no further pid namespaces
will be listed. Tests included. (Christian Kellner)
- If the process the pidfd references has already exited, print -1 for
the Pid and NSpid fields in the pidfd's fdinfo file. Tests included.
(me)
- Add CLONE_CLEAR_SIGHAND. This lets callers clear all signal handlers
that are not SIG_DFL or SIG_IGN at process creation time. This
originated as a feature request from glibc to improve performance and
eliminate races in their posix_spawn() implementation. Tests included.
(me)
- Add support for choosing a specific pid for a process with clone3().
This is the feature which was part of the thread update for v5.4 but
after a discussion at LPC in Lisbon we decided to delay it for one
more cycle in order to make the interface more generic. This has now
been done. It is now possible to choose a specific pid in a whole pid
namespace (sub)hierarchy instead of just one pid namespace. In order
to choose a specific pid the caller must have CAP_SYS_ADMIN in all
owning user namespaces of the target pid namespaces. Tests included.
(Adrian Reber)
- Test improvements and extensions. (Andrei Vagin, me)
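As a rough illustration of the fdinfo format described in the first two
items above, the sketch below parses one "Pid:" or "NSpid:" line into an
array of pids. The helper name parse_pid_field() is hypothetical; only the
field names, the tab separators and the special values 0 and -1 come from
the text above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a fdinfo line such as "NSpid:\t1234\t1\n".
 * Stores up to max values in out[] and returns how many were found,
 * or -1 if the line does not start with "<field>:".
 */
static int parse_pid_field(const char *line, const char *field,
			   long *out, int max)
{
	size_t flen = strlen(field);
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, field, flen) != 0 || line[flen] != ':')
		return -1;

	p = line + flen + 1;
	while (n < max) {
		/* strtol() skips the leading tab and accepts "-1" */
		long v = strtol(p, &end, 10);

		if (end == p)	/* no more numbers on this line */
			break;
		out[n++] = v;
		p = end;
	}
	return n;
}
```

A caller would treat -1 in the first slot as "process already exited", and
a single 0 in the NSpid field as "no ancestral relationship between the
namespaces".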
* tag 'threads-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: skip if clone3() is ENOSYS
selftests/clone3: check that all pids are released on error paths
selftests/clone3: report a correct number of fails
selftests/clone3: flush stdout and stderr before clone3() and _exit()
selftests: add tests for clone3() with *set_tid
fork: extend clone3() to support setting a PID
selftests: add tests for clone3()
tests: test CLONE_CLEAR_SIGHAND
clone3: add CLONE_CLEAR_SIGHAND
pid: use pid_has_task() in pidfd_open()
exit: use pid_has_task() in do_wait()
pid: use pid_has_task() in __change_pid()
test: verify fdinfo for pidfd of reaped process
pidfd: check pid has attached task in fdinfo
pidfd: add tests for NSpid info in fdinfo
pidfd: add NSpid entries to fdinfo
Pull EDAC updates from Borislav Petkov:
"A lot of changes this time around, details below.
From the next cycle onwards, we'll switch the EDAC tree to topic
branches (instead of a single edac-for-next branch) which should make
the changes handling more flexible, hopefully. We'll see.
Summary:
- Rework error logging functions to accept a count of errors
parameter (Hanna Hawa)
- Part one of substantial EDAC core + ghes_edac driver cleanup
(Robert Richter)
- Print additional useful logging information in skx_* (Tony Luck)
- Improve amd64_edac hw detection + cleanups (Yazen Ghannam)
- Misc cleanups, fixes and code improvements"
* tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits)
EDAC/altera: Use the Altera System Manager driver
EDAC/altera: Cleanup the ECC Manager
EDAC/altera: Use fast register IO for S10 IRQs
EDAC/ghes: Do not warn when incrementing refcount on 0
EDAC/Documentation: Describe CPER module definition and DIMM ranks
EDAC: Unify the mc_event tracepoint call
EDAC/ghes: Remove intermediate buffer pvt->detail_location
EDAC/ghes: Fix grain calculation
EDAC/ghes: Use standard kernel macros for page calculations
EDAC: Remove misleading comment in struct edac_raw_error_desc
EDAC/mc: Reduce indentation level in edac_mc_handle_error()
EDAC/mc: Remove needless zero string termination
EDAC/mc: Do not BUG_ON() in edac_mc_alloc()
EDAC: Introduce an mci_for_each_dimm() iterator
EDAC: Remove EDAC_DIMM_OFF() macro
EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
EDAC/amd64: Get rid of the ECC disabled long message
EDAC/ghes: Fix locking and memory barrier issues
EDAC/amd64: Check for memory before fully initializing an instance
EDAC/amd64: Use cached data when checking for ECC
...
Pull KVM updates from Paolo Bonzini:
"ARM:
- data abort report and injection
- steal time support
- GICv4 performance improvements
- vgic ITS emulation fixes
- simplify FWB handling
- enable halt polling counters
- make the emulated timer PREEMPT_RT compliant
s390:
- small fixes and cleanups
- selftest improvements
- yield improvements
PPC:
- add capability to tell userspace whether we can single-step the
guest
- improve the allocation of XIVE virtual processor IDs
- rewrite interrupt synthesis code to deliver interrupts in virtual
mode when appropriate.
- minor cleanups and improvements.
x86:
- XSAVES support for AMD
- more accurate report of nested guest TSC to the nested hypervisor
- retpoline optimizations
- support for nested 5-level page tables
- PMU virtualization optimizations, and improved support for nested
PMU virtualization
- correct latching of INITs for nested virtualization
- IOAPIC optimization
- TSX_CTRL virtualization for more TAA happiness
- improved allocation and flushing of SEV ASIDs
- many bugfixes and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
KVM: x86: Grab KVM's srcu lock when setting nested state
KVM: x86: Open code shared_msr_update() in its only caller
KVM: Fix jump label out_free_* in kvm_init()
KVM: x86: Remove a spurious export of a static function
KVM: x86: create mmu/ subdirectory
KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
KVM: x86: remove set but not used variable 'called'
KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
KVM: x86: do not modify masked bits of shared MSRs
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
...
Pull xen updates from Juergen Gross:
- a small series to remove the build constraint of Xen x86 MCE handling
to 64-bit only
- a bunch of minor cleanups
* tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix Kconfig indentation
xen/mcelog: also allow building for 32-bit kernels
xen/mcelog: add PPIN to record when available
xen/mcelog: drop __MC_MSR_MCGCAP
xen/gntdev: Use select for DMA_SHARED_BUFFER
xen: mm: make xen_mm_init static
xen: mm: include <xen/xen-ops.h> for missing declarations
Pull MIPS updates from Paul Burton:
"The main MIPS changes for 5.5:
- Atomics-related code sees some rework & cleanup, most notably
allowing Loongson LL/SC errata workarounds to be more bulletproof &
their correctness to be checked at build time.
- Command line setup code is simplified somewhat, resolving various
corner cases.
- MIPS kernels can now be built with kcov code coverage support.
- We can now build with CONFIG_FORTIFY_SOURCE=y.
- Miscellaneous cleanups.
And some platform specific changes:
- We now disable some broken TLB functionality on certain Ingenic
systems, and JZ4780 systems gain some devicetree nodes to support
more devices.
- Loongson support sees a number of cleanups, and we gain initial
support for Loongson 3A R4 systems.
- We gain support for MediaTek MT7688-based GARDENA Smart Gateway
systems.
- SGI IP27 (Origin 2*) sees a number of fixes, cleanups &
simplifications.
- SGI IP30 (Octane) systems are now supported"
* tag 'mips_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (107 commits)
MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 module
MIPS: PCI: Fix fake subdevice ID for IOC3
MIPS: Ingenic: Disable abandoned HPTLB function.
MIPS: PCI: remember nasid changed by set interrupt affinity
MIPS: SGI-IP27: Fix crash, when CPUs are disabled via nr_cpus parameter
mips: add support for folded p4d page tables
mips: drop __pXd_offset() macros that duplicate pXd_index() ones
mips: fix build when "48 bits virtual memory" is enabled
MIPS: math-emu: Reuse name array in debugfs_fpuemu()
MIPS: allow building with kcov coverage
MIPS: Loongson64: Drop setup_pcimap
MIPS: Loongson2ef: Convert to early_printk_8250
MIPS: Drop CPU_SUPPORTS_UNCACHED_ACCELERATED
MIPS: Loongson{2ef, 32, 64} convert to generic fw cmdline
MIPS: Drop pmon.h
MIPS: Loongson: Unify LOONGSON3/LOONGSON64 Kconfig usage
MIPS: Loongson: Rename LOONGSON1 to LOONGSON32
MIPS: Loongson: Fix return value of loongson_hwmon_init
MIPS: add support for SGI Octane (IP30)
MIPS: PCI: make phys_to_dma/dma_to_phys for pci-xtalk-bridge common
...
Pull m68k updates from Geert Uytterhoeven:
- Atari Falcon IDE platform driver conversion for module autoload
- defconfig updates (including enablement of Amiga ICY I2C)
- small fixes and cleanups
* tag 'm68k-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/atari: Convert Falcon IDE drivers to platform drivers
m68k: defconfig: Enable ICY I2C and LTC2990 on Amiga
m68k: defconfig: Update defconfigs for v5.4-rc1
m68k: q40: Fix info-leak in rtc_ioctl
nubus: Remove cast to void pointer
Pull RAS updates from Borislav Petkov:
- Fully reworked thermal throttling notifications, there should be no
more spamming of dmesg (Srinivas Pandruvada and Benjamin Berg)
- More enablement for the Intel-compatible CPUs Zhaoxin (Tony W
Wang-oc)
- PPIN support for Icelake (Tony Luck)
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/therm_throt: Optimize notifications of thermal throttle
x86/mce: Add Xeon Icelake to list of CPUs that support PPIN
x86/mce: Lower throttling MCE messages' priority to warning
x86/mce: Add Zhaoxin LMCE support
x86/mce: Add Zhaoxin CMCI support
x86/mce: Add Zhaoxin MCE support
x86/mce/amd: Make disable_err_thresholding() static
Pull x86 microcode updates from Borislav Petkov:
"This converts the late loading method to load the microcode in
parallel (vs sequentially currently). The patch remained in linux-next
for the maximum amount of time so that any potential and hard to debug
fallout be minimized.
Now cloud folks have their milliseconds back but all the normal people
should use early loading anyway :-)"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Issue the revision updated message only on the BSP
x86/microcode: Update late microcode in parallel
x86/microcode/amd: Fix two -Wunused-but-set-variable warnings
Pull s390 updates from Vasily Gorbik:
- Adjust PMU device drivers registration to avoid WARN_ON and a few other
perf improvements.
- Enhance tracing in vfio-ccw.
- A few stack unwinder fixes and improvements, convert get_wchan custom
stack unwinding to generic api usage.
- Fixes for mm helpers issues uncovered with tests validating
architecture page table helpers.
- Fix noexec bit handling when hardware doesn't support it.
- Fix memleak and unsigned value compared with zero bugs in crypto
code. Minor code simplification.
- Fix crash during kdump with kasan enabled kernel.
- Switch bug and alternatives from asm to asm_inline to improve
inlining decisions.
- Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add
z13s and z14 ZR1 to TUNE descriptions.
- Minor head64.S simplification.
- Fix physical to logical CPU map for SMT.
- Several cleanups in qdio code.
- Other minor cleanups and fixes all over the code.
* tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/cpumf: Adjust registration of s390 PMU device drivers
s390/smp: fix physical to logical CPU map for SMT
s390/early: move access registers setup in C code
s390/head64: remove unnecessary vdso_per_cpu_data setup
s390/early: move control registers setup in C code
s390/kasan: support memcpy_real with TRACE_IRQFLAGS
s390/crypto: Fix unsigned variable compared with zero
s390/pkey: use memdup_user() to simplify code
s390/pkey: fix memory leak within _copy_apqns_from_user()
s390/disassembler: don't hide instruction addresses
s390/cpum_sf: Assign error value to err variable
s390/cpum_sf: Replace function name in debug statements
s390/cpum_sf: Use consistant debug print format for sampling
s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr()
s390: add error handling to perf_callchain_kernel
s390: always inline current_stack_pointer()
s390/mm: add mm_pxd_folded() checks to pxd_free()
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
s390/mm: simplify page table helpers for large entries
s390/mm: make pmd/pud_bad() report large entries as bad
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-26
The following pull-request contains BPF updates for your *net-next* tree.
We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 2 files changed, 14 insertions(+), 3 deletions(-).
The main changes, 2 small fixes are:
1) Fix libbpf out of tree compilation which complained about unknown u32
type used in libbpf_find_vmlinux_btf_id() which needs to be __u32 instead,
from Andrii Nakryiko.
2) Follow-up fix for the prior BPF mmap series where kbuild bot complained
about missing vmalloc_user_node_flags() for no-MMU, also from Andrii Nakryiko.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull arm64 updates from Catalin Marinas:
"Apart from the arm64-specific bits (core arch and perf, new arm64
selftests), it touches the generic cow_user_page() (reviewed by
Kirill) together with a macro for x86 to preserve the existing
behaviour on this architecture.
Summary:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The
patches introduce an arch_faults_on_old_pte() macro, defined as
false on x86. When true, cow_user_page() makes the pte young before
attempting __copy_from_user_inatomic().
- Convert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a
MAINTAINERS update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with
the wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in
the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
stale macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endianness to help with allmodconfig"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: Kconfig: add a choice for endianness
kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
...
Pull kselftest KUnit support from Shuah Khan:
"This adds KUnit, a lightweight unit testing and mocking framework for
the Linux kernel from Brendan Higgins.
KUnit is not an end-to-end testing framework. It is currently
supported on UML and sub-systems can write unit tests and run them in
UML env. KUnit documentation is included in this update.
In addition, this KUnit update adds 3 new KUnit tests:
- proc sysctl test from Iurii Zaikin
- the 'list' doubly linked list test from David Gow
- ext4 tests for decoding extended timestamps from Iurii Zaikin
In the future KUnit will be linked to Kselftest framework to provide a
way to trigger KUnit tests from user-space"
* tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
lib/list-test: add a test for the 'list' doubly linked list
ext4: add kunit test for decoding extended timestamps
Documentation: kunit: Fix verification command
kunit: Fix '--build_dir' option
kunit: fix failure to build without printk
MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section
kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()
MAINTAINERS: add entry for KUnit the unit testing framework
Documentation: kunit: add documentation for KUnit
kunit: defconfig: add defconfigs for building KUnit tests
kunit: tool: add Python wrappers for running KUnit tests
kunit: test: add tests for KUnit managed resources
kunit: test: add the concept of assertions
kunit: test: add tests for kunit test abort
kunit: test: add support for test abort
objtool: add kunit_try_catch_throw to the noreturn list
kunit: test: add initial tests
lib: enable building KUnit in lib/
kunit: test: add the concept of expectations
kunit: test: add assertion printing library
...
Pull kselftest fixes from Shuah Khan:
"This consists of several fixes to tests and framework.
Masami Hiramatsu fixed several tests to build and run correctly on arm
and other 32bit architectures"
* tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: sync: Fix cast warnings on arm
selftests: net: Fix printf format warnings on arm
selftests: net: Use size_t and ssize_t for counting file size
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: proc: Make va_max 1MB
kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
selftests: Move kselftest_module.sh into kselftest/
selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
selftests: breakpoints: Fix a typo of function name
selftests: Fix O= and KBUILD_OUTPUT handling for relative paths
This reverts commit 825dbc6ff7.
I mistakenly applied this and only now have thought about it a little
more and had time to evaluate a kbuild error for dmaengine.
Once we're calling RELOC_HIDE, we're moving back into the __kernel
address space and letting users interact with the actual memory address
rather than in __percpu which is before adding the offsets.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
All vector drivers now allow a BPF program to be loaded and
associated with the RX socket in the host kernel.
1. The program can be loaded as an extra kernel command line
option to any of the vector drivers.
2. The program can also be loaded as "firmware", using the
ethtool flash option. It is possible to turn this facility
on or off using a command line option.
A simplistic wrapper for generating the BPF firmware for the raw
socket driver out of a tcpdump/libpcap filter expression can be
found at: https://github.com/kot-begemot-uk/uml_vector_utilities/
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
    $ sed -e 's/^        /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
This driver *can* be a module, but then its parameters (socket path)
are untrusted data from inside the VM, and that isn't allowed. Allow
the code to only be built-in to avoid that.
Fixes: 5d38f32499 ("um: drivers: Add virtio vhost-user driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
When we get an interrupt from the socket getting readable,
and start reading, there's a possibility for a race. This
depends on the implementation of the device, but e.g. with
qemu's libvhost-user, we can see:
device                   virtio_uml
---------------------------------------
write header
                         get interrupt
                         read header
                         read body -> returns -EAGAIN
write body
The -EAGAIN return is because the socket is non-blocking,
and then this leads us to abandon this message.
In fact, we've already read the header, so when we get
another signal/interrupt for the body, we again read it
as though it's a new message header, and also abandon it
for the same reason (wrong size etc.)
This essentially breaks things, and if that message was
one that required a response, it leads to a deadlock as
the device is waiting for the response but we'll never
reply.
Fix this by spinning on -EAGAIN as well when we read the
message body. We need to handle -EAGAIN as "no message"
while reading the header, since we share an interrupt.
Note that this situation is highly unlikely to occur in
normal usage, since there will be very few messages and
only in the startup phase. With the inband call feature
this does tend to happen (eventually) though.
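The read strategy described above can be sketched in userspace C as
follows. This is a minimal sketch and not the driver code: struct
msg_header, full_read() as written here and recv_msg() are hypothetical
names, and the header layout is an assumption made for illustration. The
point is only that -EAGAIN means "no message" for the header, while the
body read keeps retrying once a header has been consumed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical wire header: a request id followed by the body size. */
struct msg_header {
	uint32_t request;
	uint32_t size;
};

/*
 * Read exactly len bytes. EINTR is always retried; EAGAIN is retried
 * (busy-waiting) only when spin is non-zero.
 */
static int full_read(int fd, void *buf, size_t len, int spin)
{
	char *p = buf;

	while (len) {
		ssize_t n = read(fd, p, len);

		if (n < 0) {
			if (errno == EINTR || (spin && errno == EAGAIN))
				continue;
			return -errno;
		}
		if (n == 0)
			return -ECONNRESET;	/* peer closed the socket */
		p += n;
		len -= n;
	}
	return 0;
}

static int recv_msg(int fd, struct msg_header *hdr, void *body, size_t max)
{
	/* Header: -EAGAIN simply means there is no message to read. */
	int rc = full_read(fd, hdr, sizeof(*hdr), 0);

	if (rc)
		return rc;
	if (hdr->size > max)
		return -EPROTO;

	/* Body: the header is already consumed, so spin on -EAGAIN. */
	return full_read(fd, body, hdr->size, 1);
}
```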
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If the connection drops, just remove the device, we don't try
to recover from this right now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
In the main() code, we eventually enable signals just before
exec() or exit(), in order to not have signals pending and
delivered *after* the exec().
I've observed SIGSEGV loops at this point, and the reason seems
to be the irqflags tracing; this makes sense as the kernel is
no longer really functional at this point. Since there's really
no reason to use unblock_signals_trace() here (I had just done
a global search & replace), use the plain unblock_signals() in
this case to avoid going into the no longer functional kernel.
Fixes: 0dafcbe128 ("um: Implement TRACE_IRQFLAGS_SUPPORT")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The first generation i.MX6 processors do not send an interrupt when the
power key is pressed. It sends a power down request interrupt if the key is
released before a hard shutdown (5 second press). This should allow
software to bring down the SoC safely.
For this driver to work as a regular power key with the older SoCs, we need
to send a keypress AND release when we get the power down request irq.
Signed-off-by: Robin van der Gracht <robin@protonic.nl>
Link: https://lore.kernel.org/r/20191125161210.8275-1-robin@protonic.nl
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Changing alarm_itimer accidentally broke the logic for arithmetic
rounding of half seconds in the return code.
Change it to a constant based on NSEC_PER_SEC, as suggested by
Ben Hutchings.
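For illustration, half-second rounding with a constant derived from
NSEC_PER_SEC might look like the sketch below. The function name
remaining_to_seconds() is hypothetical, and the extra rule of never
returning 0 while time remains is an assumption based on alarm()
semantics (a return of 0 would mean no alarm was pending); it is not
stated in this message.

```c
#define NSEC_PER_SEC 1000000000L

/*
 * Sketch: convert a remaining time (sec, nsec) into the whole number
 * of seconds an alarm()-style call would report.
 */
static unsigned int remaining_to_seconds(long sec, long nsec)
{
	/*
	 * Assumed rule: do not report 0 while some time remains.
	 * Otherwise round up from the half second.
	 */
	if ((sec == 0 && nsec != 0) || nsec >= NSEC_PER_SEC / 2)
		sec++;

	return (unsigned int)sec;
}
```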
Fixes: bd40a17576 ("y2038: itimer: change implementation to timespec64")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The correct type on x32 is 64-bit wide, same as for the other struct
members around it, so use __kernel_long_t in place of the original
__kernel_time_t here, corresponding to the rest of the structure.
Fixes: caf5e32d4e ("y2038: ipc: remove __kernel_time_t reference from headers")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull fsverity updates from Eric Biggers:
"Expose the fs-verity bit through statx()"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
docs: fs-verity: mention statx() support
f2fs: support STATX_ATTR_VERITY
ext4: support STATX_ATTR_VERITY
statx: define STATX_ATTR_VERITY
docs: fs-verity: document first supported kernel version
Pull fscrypt updates from Eric Biggers:
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
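The IV layout is not spelled out in this message. As an assumption taken
from the fscrypt documentation for this flag, the 64-bit IV carries the
inode number in bits 32-63 and the logical block number in bits 0-31,
which a short sketch can show (iv_ino_lblk_64() is a hypothetical helper
name, not a kernel function):

```c
#include <stdint.h>

/*
 * Sketch of the assumed IV_INO_LBLK_64 layout: inode number in the
 * upper 32 bits, logical block number in the lower 32 bits.
 */
static uint64_t iv_ino_lblk_64(uint32_t ino, uint32_t lblk)
{
	return ((uint64_t)ino << 32) | lblk;
}
```

Both inputs are limited to 32 bits here, which is why the whole IV fits
in the 64 bits that such hardware accepts.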
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add support for IV_INO_LBLK_64 encryption policies
ext4: add support for IV_INO_LBLK_64 encryption policies
fscrypt: add support for IV_INO_LBLK_64 policies
fscrypt: avoid data race on fscrypt_mode::logged_impl_name
docs: ioctl-number: document fscrypt ioctl numbers
fscrypt: zeroize fscrypt_info before freeing
fscrypt: remove struct fscrypt_ctx
fscrypt: invoke crypto API for ESSIV handling
Pull AFFS updates from David Sterba:
"A minor bugfix and cleanup for AFFS"
* tag 'affs-for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: fix a memory leak in affs_remount
affs: Replace binary semaphores with mutexes
Pull btrfs updates from David Sterba:
"User visible changes:
- new block group profiles: RAID1 with 3- and 4- copies
- RAID1 in btrfs has always 2 copies, now add support for 3 and 4
- this is an incompat feature (named RAID1C34)
- recommended use of RAID1C3 is replacement of RAID6 profile on
metadata, this brings a more reliable resiliency against 2
device loss/damage
- support for new checksums
- per-filesystem, set at mkfs time
- fast hash (crc32c successor): xxhash, 64bit digest
- strong hashes (both 256bit): sha256 (slower, FIPS), blake2b
(faster)
- the blake2b module goes via the crypto tree, btrfs.ko has a
soft dependency
- speed up lseek, don't take inode locks unnecessarily, this can
speed up parallel SEEK_CUR/SEEK_SET/SEEK_END by 80%
- send:
- allow clone operations within the same file
- limit maximum number of sent clone references to avoid slow
backref walking
- error message improvements: device scan prints process name and PID
Core changes:
- cleanups
- remove unique workqueue helpers, used to provide a way to avoid
deadlocks in the workqueue code, now done in a simpler way
- remove lots of indirect function calls in compression code
- extent IO tree code moved out of extent_io.c
- cleanup backup superblock handling at mount time
- transaction life cycle documentation and cleanups
- locking code cleanups, annotations and documentation
- add more cold, const, pure function attributes
- removal of unused or redundant struct members or variables
- new tree-checker sanity tests
- try to detect missing INODE_ITEM, cross-reference checks of
DIR_ITEM, DIR_INDEX, INODE_REF, and XATTR_* items
- remove own bio scheduling code (used to avoid checksum submissions
being stuck behind other IO), replaced by cgroup controller-based
code to allow better control and avoid priority inversions in cases
where the custom and cgroup scheduling disagreed
Fixes:
- avoid getting stuck during cyclic writebacks
- fix trimming of ranges crossing block group boundaries
- fix rename exchange on subvolumes, all involved subvolumes need to
be recorded in the transaction"
* tag 'for-5.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (137 commits)
btrfs: drop bdev argument from submit_extent_page
btrfs: remove extent_map::bdev
btrfs: drop bio_set_dev where not needed
btrfs: get bdev directly from fs_devices in submit_extent_page
btrfs: record all roots for rename exchange on a subvol
Btrfs: fix block group remaining RO forever after error during device replace
btrfs: scrub: Don't check free space before marking a block group RO
btrfs: change btrfs_fs_devices::rotating to bool
btrfs: change btrfs_fs_devices::seeding to bool
btrfs: rename btrfs_block_group_cache
btrfs: block-group: Reuse the item key from caller of read_one_block_group()
btrfs: block-group: Refactor btrfs_read_block_groups()
btrfs: document extent buffer locking
btrfs: access eb::blocking_writers according to ACCESS_ONCE policies
btrfs: set blocking_writers directly, no increment or decrement
btrfs: merge blocking_writers branches in btrfs_tree_read_lock
btrfs: drop incompat bit for raid1c34 after last block group is gone
btrfs: add incompat for raid1 with 3, 4 copies
btrfs: add support for 4-copy replication (raid1c4)
btrfs: add support for 3-copy replication (raid1c3)
...
Pull MTD updates from Miquel Raynal:
"MTD core:
- drop inactive maintainers, update the repositories and add IRC
channel
- debugfs functions improvements
- initialize more structure parameters
- misc fixes reported by robots
MTD devices:
- spear_smi: Fixed Write Burst mode
- new Intel IXP4xx flash probing hook
Raw NAND core:
- useless extra checks dropped
- update the detection of the bad block markers position
Raw NAND controller drivers:
- Cadence: new driver
- Brcmnand: support for flash-dma v0 + fixes
- Denali: drop support for the legacy controller/chip DT representation
- superfluous dev_err() calls removed
SPI NOR core changes:
- introduce 'struct spi_nor_controller_ops'
- clean the Register Operations methods
- use dev_dbg instead of dev_err for low level info
- fix retlen handling in sst_write()
- fix silent truncations in spi_nor_read and spi_nor_read_raw()
- fix the clearing of QE bit on lock()/unlock()
- rework the disabling of the block write protection
- rework the Quad Enable methods
- make sure nor->spimem and nor->controller_ops are mutually exclusive
- set default Quad Enable method for ISSI flashes
- add support for few flashes
SPI NOR controller drivers changes:
- intel-spi:
- support chips without software sequencer
- add support for Intel Cannon Lake and Intel Comet Lake-H flashes
CFI core changes:
- code cleanups related to useless initializers and coding style issues
- fix for a possible double free problem in cfi_cmdset_0002
- improved HyperFlash error reporting and handling in cfi_cmdset_0002 core"
* tag 'mtd/for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (73 commits)
mtd: devices: fix mchp23k256 read and write
mtd: no need to check return value of debugfs_create functions
mtd: spi-nor: Set default Quad Enable method for ISSI flashes
mtd: spi-nor: Add support for is25wp256
mtd: spi-nor: Add support for w25q256jw
mtd: spi-nor: Move condition to avoid a NULL check
mtd: spi-nor: Make sure nor->spimem and nor->controller_ops are mutually exclusive
mtd: spi-nor: Rename Quad Enable methods
mtd: spi-nor: Merge spansion Quad Enable methods
mtd: spi-nor: Rename CR_QUAD_EN_SPAN to SR2_QUAD_EN_BIT1
mtd: spi-nor: Extend the SR Read Back test
mtd: spi-nor: Rework the disabling of block write protection
mtd: spi-nor: Fix clearing of QE bit on lock()/unlock()
mtd: cfi_cmdset_0002: fix delayed error detection on HyperFlash
mtd: cfi_cmdset_0002: only check errors when ready in cfi_check_err_status()
mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
mtd: cfi_cmdset_*: kill useless 'ret' variable initializers
mtd: cfi_util: use DIV_ROUND_UP() in cfi_udelay()
mtd: spi-nor: Print debug message when the read back test fails
mtd: spi-nor: Check all the bits written, not just the BP ones
...
Pull device mapper updates from Mike Snitzer:
- Fix DM core to disallow stacking request-based DM on partitions.
- Fix DM raid target to properly resync raidset even if bitmap needed
additional pages.
- Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
IO and crypt workqueues.
- Fix DM integrity metadata layout that was aligned on 128K boundary
rather than the intended 4K boundary (removes 124K of wasted space
for each metadata block).
- Improve the DM thin, cache and clone targets to use spin_lock_irq
rather than spin_lock_irqsave where possible.
- Fix DM thin single thread performance that was lost due to needless
workqueue wakeups.
- Fix DM zoned target performance that was lost due to excessive
backing device checks.
- Add ability to trigger write failure with the DM dust test target.
- Fix whitespace indentation in drivers/md/Kconfig.
- Various small fixes and cleanups (e.g. use struct_size, fix
uninitialized variable, variable renames, etc).
* tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
dm: Fix Kconfig indentation
dm thin: wakeup worker only when deferred bios exist
dm integrity: fix excessive alignment of metadata runs
dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
dm zoned: reduce overhead of backing device checks
dm dust: add limited write failure mode
dm dust: change ret to r in dust_map_read and dust_map
dm dust: change result vars to r
dm cache: replace spin_lock_irqsave with spin_lock_irq
dm bio prison: replace spin_lock_irqsave with spin_lock_irq
dm thin: replace spin_lock_irqsave with spin_lock_irq
dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
dm clone: replace spin_lock_irqsave with spin_lock_irq
dm writecache: handle REQ_FUA
dm writecache: fix uninitialized variable warning
dm stripe: use struct_size() in kmalloc()
dm raid: streamline rs_get_progress() and its raid_status() caller side
dm raid: simplify rs_setup_recovery call chain
dm raid: to ensure resynchronization, perform raid set grow in preresume
...
Pull disk revalidation updates from Jens Axboe:
"This continues the work that Jan Kara started to thoroughly clean up
and consolidate how we handle rescans and revalidations"
* tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
block: move clearing bd_invalidated into check_disk_size_change
block: remove (__)blkdev_reread_part as an exported API
block: fix bdev_disk_changed for non-partitioned devices
block: move rescan_partitions to fs/block_dev.c
block: merge invalidate_partitions into rescan_partitions
block: refactor rescan_partitions
Pull zoned block device update from Jens Axboe:
"Enhancements and improvements to the zoned device support"
* tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
scsi: sd_zbc: Remove set but not used variable 'buflen'
block: rework zone reporting
scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
null_blk: Add zone_nr_conv to features
null_blk: clean up report zones
null_blk: clean up the block device operations
block: Remove partition support for zoned block devices
block: Simplify report zones execution
block: cleanup the !zoned case in blk_revalidate_disk_zones
block: Enhance blk_revalidate_disk_zones()
Pull additional block driver updates from Jens Axboe:
"Here's another block driver update, done to avoid conflicts with the
zoned changes coming next.
This contains:
- Prepare SCSI sd for zone open/close/finish support
- Small NVMe pull request
- hwmon support (Akinobu)
- add new co-maintainer (Christoph)
- work-around for a discard issue on non-conformant drives
(Eduard)
- Small nbd leak fix"
* tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block:
nbd: prevent memory leak
nvme: hwmon: add quirk to avoid changing temperature threshold
nvme: hwmon: provide temperature min and max values for each sensor
nvmet: add another maintainer
nvme: Discard workaround for non-conformant devices
nvme: Add hardware monitoring support
scsi: sd_zbc: add zone open, close, and finish support
Pull block driver updates from Jens Axboe:
"Here are the main block driver updates for 5.5. Nothing major in here,
mostly just fixes. This contains:
- a set of bcache changes via Coly
- MD changes from Song
- loop unmap write-zeroes fix (Darrick)
- spelling fixes (Geert)
- zoned additions and cleanups to null_blk/dm (Ajay)
- allow null_blk online submit queue changes (Bart)
- NVMe changes via Keith, nothing major here either"
* tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits)
Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()"
drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET
bcache: don't export symbols
bcache: remove the extra cflags for request.o
bcache: at least try to shrink 1 node in bch_mca_scan()
bcache: add idle_max_writeback_rate sysfs interface
bcache: add code comments in bch_btree_leaf_dirty()
bcache: fix deadlock in bcache_allocator
bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front()
bcache: deleted code comments for dead code in bch_data_insert_keys()
bcache: add more accurate error messages in read_super()
bcache: fix static checker warning in bcache_device_free()
bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
bcache: fix fifo index swapping condition in journal_pin_cmp()
md/raid10: prevent access of uninitialized resync_pages offset
md: avoid invalid memory access for array sb->dev_roles
md/raid1: avoid soft lockup under high load
null_blk: add zone open, close, and finish support
dm: add zone open, close and finish support
...
slip_open() doesn't remove a device whose registration failed from the
slip_devs device list. On the next open after such a failure, this list
is iterated and the freed device is accessed. Fix this by calling
sl_free_netdev() in the error path.
Here is the trace from syzbot:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:634
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
sl_sync drivers/net/slip/slip.c:725 [inline]
slip_open+0xecd/0x11b7 drivers/net/slip/slip.c:801
tty_ldisc_open.isra.0+0xa3/0x110 drivers/tty/tty_ldisc.c:469
tty_set_ldisc+0x30e/0x6b0 drivers/tty/tty_ldisc.c:596
tiocsetd drivers/tty/tty_io.c:2334 [inline]
tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2594
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 3b5a39979d ("slip: Fix memory leak in slip_open error path")
Reported-by: syzbot+4d5170758f3762109542@syzkaller.appspotmail.com
Cc: David Miller <davem@davemloft.net>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull core block updates from Jens Axboe:
"Due to more granular branches, this one is small and will be followed
by other core branches that add specific features. I meant to just
have a core and drivers branch, but due to external dependencies we
ended up adding a few more that are also core.
The changes are:
- Fixes and improvements for the zoned device support (Ajay, Damien)
- sed-opal table writing and datastore UID (Revanth)
- blk-cgroup (and bfq) stat fixes (Tejun)
- Improvements to the block stats tracking (Pavel)
- Fix for overrunning sysfs buffer for large number of CPUs (Ming)
- Optimization for small IO (Ming, Christoph)
- Fix typo in RWH lifetime hint (Eugene)
- Dead code removal and documentation (Bart)
- Reduction in memory usage for queue and tag set (Bart)
- Kerneldoc header documentation (André)
- Device/partition revalidation fixes (Jan)
- Stats tracking for flush requests (Konstantin)
- Various other little fixes here and there (et al)"
* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
Revert "block: split bio if the only bvec's length is > SZ_4K"
block: add iostat counters for flush requests
block,bfq: Skip tracing hooks if possible
block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
block: Don't disable interrupts in trigger_softirq()
sbitmap: Delete sbitmap_any_bit_clear()
blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
block: split bio if the only bvec's length is > SZ_4K
block: still try to split bio if the bvec crosses pages
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
blk-cgroup: reimplement basic IO stats using cgroup rstat
blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
blk-throtl: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: relocate bfqg_*rwstat*() helpers
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
...
Pull libata updates from Jens Axboe:
"Just a few fixes all over the place, support for the Annapurna SATA
controller, and a patchset that cleans up the error defines and
ultimately fixes an issue with sata_mv"
* tag 'for-5.5/libata-20191121' of git://git.kernel.dk/linux-block:
ata: pata_artop: make arrays static const, makes object smaller
ata_piix: remove open-coded dmi_match(DMI_OEM_STRING)
ata: sata_mv, avoid trigerrable BUG_ON
ata: make qc_prep return ata_completion_errors
ata: define AC_ERR_OK
ata: Documentation, fix function names
libata: Ensure ata_port probe has completed before detach
ahci: tegra: use regulator_bulk_set_supply_names()
ahci: Add support for Amazon's Annapurna Labs SATA controller
This function was using the devicetree configuration of port 0 for all
ports. If the CPU port was not port 0, its delay settings were ignored.
As a result, communication between the CPU and the switch did not work.
Fixes: f5b8631c29 ("net: dsa: sja1105: Error out if RGMII delays are requested in DT")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While enqueueing a broadcast skb to port->bc_queue, schedule_work()
is called to add port->bc_work, which processes the skbs in
bc_queue, to the "events" work queue. If port->bc_queue is full, the
skb is discarded and schedule_work(&port->bc_work) is not
called. However, if port->bc_queue is full and port->bc_work is not
running or pending, port->bc_queue stays full, schedule_work()
is never called again, and all broadcast skbs to macvlan are
discarded. This case can happen:
macvlan_process_broadcast() is the work function of port->bc_work;
it moves all the skbs in port->bc_queue to the queue "list" and
processes the skbs in "list". Meanwhile, new skbs keep being
added to port->bc_queue in macvlan_broadcast_enqueue(), and
port->bc_queue may already be full when macvlan_process_broadcast()
returns. This may happen especially when there are a lot of real-time
threads and the process is preempted.
Fix this by calling schedule_work(&port->bc_work) in
macvlan_broadcast_enqueue() even if port->bc_queue is full.
Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
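The stall described in the macvlan commit above can be illustrated with
a small userspace model. This is only a sketch in Python, not kernel
code: the Port class, the limit of 4, and the helper names are
hypothetical, and the "full queue, no work pending" state is constructed
directly rather than reproduced through a real race. It shows only why
scheduling the work on the full-queue path lets the queue drain again.

```python
from collections import deque


class Port:
    """Toy stand-in for a macvlan port (illustrative, not kernel code)."""

    def __init__(self, limit, schedule_when_full):
        self.bc_queue = deque()
        self.limit = limit                      # hypothetical queue limit
        self.schedule_when_full = schedule_when_full
        self.work_pending = False               # models port->bc_work being queued
        self.dropped = 0

    def enqueue(self, skb):
        """Model of macvlan_broadcast_enqueue()."""
        if len(self.bc_queue) < self.limit:
            self.bc_queue.append(skb)
            self.work_pending = True            # schedule_work(&port->bc_work)
            return True
        self.dropped += 1                       # queue full: skb is discarded
        if self.schedule_when_full:
            self.work_pending = True            # the fix: schedule anyway
        return False

    def run_pending_work(self):
        """Model of macvlan_process_broadcast(); runs only if scheduled."""
        if not self.work_pending:
            return []
        self.work_pending = False
        batch = list(self.bc_queue)             # move everything to a local list
        self.bc_queue.clear()
        return batch


def stuck_port(schedule_when_full, limit=4):
    """Build the problematic state directly: full queue, no work pending."""
    port = Port(limit, schedule_when_full)
    port.bc_queue.extend(range(limit))
    return port


def demo(schedule_when_full):
    """Return (skbs drained, skbs still queued, skbs dropped)."""
    port = stuck_port(schedule_when_full)
    port.enqueue("new-skb")                     # dropped in both variants
    drained = port.run_pending_work()
    return len(drained), len(port.bc_queue), port.dropped
```

With schedule_when_full=False the queue never drains and every later
skb is dropped; with schedule_when_full=True the incoming skb is still
dropped, but the work runs and the queue accepts skbs again.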