Pull year 2038 updates from Thomas Gleixner:
"Another round of changes to make the kernel ready for 2038. After lots
of preparatory work this is the first set of syscalls which are 2038
safe:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceive_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
The syscall numbers are identical all over the architectures"
* 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
riscv: Use latest system call ABI
checksyscalls: fix up mq_timedreceive and stat exceptions
unicore32: Fix __ARCH_WANT_STAT64 definition
asm-generic: Make time32 syscall numbers optional
asm-generic: Drop getrlimit and setrlimit syscalls from default list
32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
compat ABI: use non-compat openat and open_by_handle_at variants
y2038: add 64-bit time_t syscalls to all 32-bit architectures
y2038: rename old time and utime syscalls
y2038: remove struct definition redirects
y2038: use time32 syscall names on 32-bit
syscalls: remove obsolete __IGNORE_ macros
y2038: syscalls: rename y2038 compat syscalls
x86/x32: use time64 versions of sigtimedwait and recvmmsg
timex: change syscalls to use struct __kernel_timex
timex: use __kernel_timex internally
sparc64: add custom adjtimex/clock_adjtime functions
time: fix sys_timer_settime prototype
time: Add struct __kernel_timex
time: make adjtime compat handling available for 32 bit
...
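The motivation for the time64 syscalls in the year 2038 pull above can be illustrated with a small userspace sketch. It formats the last second that a signed 32-bit time_t can represent, and fetches the current time into a structure with two 64-bit fields. The names timespec64_sketch, last_time32_second() and realtime64() are made up for this illustration. The struct layout is an assumption about what clock_gettime64 expects, and the raw syscall is only attempted where the libc headers define __NR_clock_gettime64 (typically 32-bit builds); otherwise the sketch falls back to plain clock_gettime().

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a 64-bit time layout: two 64-bit fields. */
struct timespec64_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Format the last second a signed 32-bit time_t can hold, in UTC. */
int last_time32_second(char *buf, size_t len)
{
	time_t t = (time_t)INT32_MAX;
	struct tm tm;

	if (!gmtime_r(&t, &tm))
		return -1;
	return strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", &tm) ? 0 : -1;
}

/* Read CLOCK_REALTIME into 64-bit fields; returns 0 on success. */
int realtime64(struct timespec64_sketch *out)
{
#ifdef __NR_clock_gettime64
	/* Assumption: the kernel fills two 64-bit fields for this call. */
	return (int)syscall(__NR_clock_gettime64, CLOCK_REALTIME, out);
#else
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) < 0)
		return -1;
	out->tv_sec = ts.tv_sec;
	out->tv_nsec = ts.tv_nsec;
	return 0;
#endif
}
```

last_time32_second() yields 2038-01-19T03:14:07Z. One second later no longer fits into 32 bits, which is why the new syscalls carry seconds in a 64-bit field.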
Pull x86/pti update from Thomas Gleixner:
"Just a single change from the anti-performance department:
- Add a new PR_SPEC_DISABLE_NOEXEC option which allows applying the
speculation protections to a process without inheriting the state
on exec.
This remedies a situation where a Java-launcher has speculation
protections enabled because that's the default for JVMs which
causes the launched regular harmless processes to inherit the
protection state which results in unintended performance
degradation"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
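A launcher that wants the behaviour described in the x86/pti pull above might request it roughly as follows. This is a sketch, not taken from the patch: the helper name disable_ssb_until_exec() is made up, the fallback constant values are assumptions for older headers, and the choice of PR_SPEC_STORE_BYPASS as the controlled misfeature is likewise an assumption.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; the numeric values are assumptions. */
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_STORE_BYPASS
#define PR_SPEC_STORE_BYPASS 0
#endif
#ifndef PR_SPEC_DISABLE_NOEXEC
#define PR_SPEC_DISABLE_NOEXEC (1UL << 4)
#endif

/*
 * Ask the kernel to apply the speculation protection to the calling
 * task only until the next exec. Returns 0 on success, or the errno
 * value reported by prctl(2).
 */
int disable_ssb_until_exec(void)
{
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
		  PR_SPEC_DISABLE_NOEXEC, 0, 0) < 0)
		return errno;
	return 0;
}
```

On kernels or CPUs without support the call is expected to fail, so a caller should treat a non-zero return as "protection state unchanged" rather than as fatal.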
Pull irq updates from Thomas Gleixner:
"The interrupt department delivers this time:
- New infrastructure to manage NMIs on platforms which have a sane
NMI delivery, i.e. identifiable NMI vectors instead of a single
lump.
- Simplification of the interrupt affinity management so drivers
don't have to implement ugly loops around the PCI/MSI enablement.
- Speedup for interrupt statistics in /proc/stat
- Provide a function to retrieve the default irq domain
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- The usual small fixes, improvements and cleanups all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irqchip/imx-irqsteer: Add multi output interrupts support
irqchip/imx-irqsteer: Change to use reg_num instead of irq_group
dt-bindings: irq: imx-irqsteer: Add multi output interrupts support
dt-binding: irq: imx-irqsteer: Use irq number instead of group number
irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
irqdomain: Allow the default irq domain to be retrieved
irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
irqchip/sifive-plic: Differentiate between PLIC handler and context
irqchip/sifive-plic: Add warning in plic_init() if handler already present
irqchip/sifive-plic: Pre-compute context hart base and enable base
PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
genirq/affinity: Remove the leftovers of the original set support
nvme-pci: Simplify interrupt allocation
genirq/affinity: Add new callback for (re)calculating interrupt sets
genirq/affinity: Store interrupt sets size in struct irq_affinity
genirq/affinity: Code consolidation
irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
irqchip/i8259: Fix shutdown order by moving syscore_ops registration
dt-bindings: interrupt-controller: loongson ls1x intc
...
Pull timer and clockevent updates from Thomas Gleixner:
"The time(r) core and clockevent updates are mostly boring this time:
- A new driver for the Tegra210 timer
- Small fixes and improvements all over the place
- Documentation updates and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
soc/tegra: default select TEGRA_TIMER for Tegra210
clocksource/drivers/tegra: Add Tegra210 timer support
dt-bindings: timer: add Tegra210 timer
clocksource/drivers/timer-cs5535: Rename the file for consistency
clocksource/drivers/timer-pxa: Rename the file for consistency
clocksource/drivers/tango-xtal: Rename the file for consistency
dt-bindings: timer: gpt: update binding doc
clocksource/drivers/exynos_mct: Remove unused header includes
dt-bindings: timer: mediatek: update bindings for MT7629 SoC
clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
clocksource/drivers/exynos_mct: Remove dead code
clocksource/drivers/riscv: Add required checks during clock source init
dt-bindings: timer: renesas: tmu: Document r8a774c0 bindings
dt-bindings: timer: renesas, cmt: Document r8a774c0 CMT support
clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
timers: Mark expected switch fall-throughs
timekeeping/debug: No need to check return value of debugfs_create functions
...
Add a "create" module parameter, which allows device-mapper targets to
be configured at boot time. This enables early use of DM targets in the
boot process (as the root device or otherwise) without the need of an
initramfs.
The syntax used in the boot param is based on the concise format from
the dmsetup tool to follow the rule of least surprise:
dmsetup table --concise /dev/mapper/lroot
Which is:
dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
Where,
<name> ::= The device name.
<uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
<minor> ::= The device minor number | ""
<flags> ::= "ro" | "rw"
<table> ::= <start_sector> <num_sectors> <target_type> <target_args>
<target_type> ::= "verity" | "linear" | ...
For example, the following could be added in the boot parameters:
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
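To make the field layout concrete, here is a minimal userspace sketch that splits one device description into its parts. It is not the kernel's parser: struct dm_create_spec, copy_field(), parse_dm_create() and the size limits are hypothetical. It only handles the first device (it stops at ';') and does not handle escaped commas.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TABLES 8 /* arbitrary limit for this sketch */

struct dm_create_spec {
	char name[64];
	char uuid[64];
	char minor[16];
	char flags[8];
	char tables[MAX_TABLES][128];
	int num_tables;
};

/* Copy src[0..len) into dst with surrounding spaces stripped. */
static int copy_field(char *dst, size_t dstlen, const char *src, size_t len)
{
	while (len > 0 && *src == ' ') {
		src++;
		len--;
	}
	while (len > 0 && src[len - 1] == ' ')
		len--;
	if (len >= dstlen)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/* Parse "<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]"; 0 on success. */
int parse_dm_create(const char *arg, struct dm_create_spec *spec)
{
	const char *p = arg;
	int field = 0;

	memset(spec, 0, sizeof(*spec));
	for (;;) {
		size_t len = strcspn(p, ",;");
		int ret;

		switch (field) {
		case 0:
			ret = copy_field(spec->name, sizeof(spec->name), p, len);
			break;
		case 1:
			ret = copy_field(spec->uuid, sizeof(spec->uuid), p, len);
			break;
		case 2:
			ret = copy_field(spec->minor, sizeof(spec->minor), p, len);
			break;
		case 3:
			ret = copy_field(spec->flags, sizeof(spec->flags), p, len);
			break;
		default:
			if (spec->num_tables >= MAX_TABLES)
				return -1;
			ret = copy_field(spec->tables[spec->num_tables],
					 sizeof(spec->tables[0]), p, len);
			if (ret == 0)
				spec->num_tables++;
			break;
		}
		if (ret < 0)
			return -1;
		field++;
		if (p[len] != ',')
			break; /* end of string or next device */
		p += len + 1;
	}
	/* A device needs a name, valid flags and at least one table. */
	if (field < 5 || spec->name[0] == '\0')
		return -1;
	if (strcmp(spec->flags, "ro") != 0 && strcmp(spec->flags, "rw") != 0)
		return -1;
	return 0;
}
```

Fed the example string above, the sketch yields the name "lroot", empty uuid and minor fields, the flags "rw" and two table lines, one per linear segment.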
Only targets that were tested, and that don't change any block device
when the device is created as read-only, are allowed. For
example, mirror and cache targets are not allowed. The rationale behind
this is that if the user makes a mistake, choosing the wrong device to
be the mirror or the cache can corrupt data.
The only targets initially allowed are:
* crypt
* delay
* linear
* snapshot-origin
* striped
* verity
Co-developed-by: Will Drewry <wad@chromium.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A dm-raid array with devices larger than 4GB won't assemble on
a 32 bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long"
which is 32bits (4GB) on a 32bit host. Using "unsigned long long"
is more correct.
Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.
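The truncation can be reproduced in userspace with fixed-width types. The sketch below assumes that to_sector() is a plain right shift by 9 (512-byte sectors); the two function names are made up, with uint32_t standing in for "unsigned long" on a 32-bit host.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/* Models a 32-bit "unsigned long" argument: upper bits are lost. */
uint64_t to_sector_ulong32(uint64_t bytes)
{
	uint32_t n = (uint32_t)bytes;

	return n >> SECTOR_SHIFT;
}

/* Models the fixed variant taking "unsigned long long". */
uint64_t to_sector_ull(unsigned long long bytes)
{
	return bytes >> SECTOR_SHIFT;
}
```

For a 5 GiB device (5368709120 bytes) the 32-bit model returns 2097152 sectors, as if the device were 1 GiB, while the 64-bit variant returns the correct 10485760.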
Fixes: 0cf4503174 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnd's compat wrapper generation series
- Pass information about the KVM guest to the host in the form of the
control program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels work again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
them as event raw data.
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
If the number of caps exceeds the limit, ceph_trim_dentries() also trims
dentries with valid leases. Trimming a dentry releases references to the
associated inode, which may evict the inode and release caps.
By default, there is no limit for caps count.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull crypto update from Herbert Xu:
"API:
- Add helper for simple skcipher modes.
- Add helper to register multiple templates.
- Set CRYPTO_TFM_NEED_KEY when setkey fails.
- Require neither or both of export/import in shash.
- AEAD decryption test vectors are now generated from encryption
ones.
- New option CONFIG_CRYPTO_MANAGER_EXTRA_TESTS that includes random
fuzzing.
Algorithms:
- Conversions to skcipher and helper for many templates.
- Add more test vectors for nhpoly1305 and adiantum.
Drivers:
- Add crypto4xx prng support.
- Add xcbc/cmac/ecb support in caam.
- Add AES support for Exynos5433 in s5p.
- Remove sha384/sha512 from artpec7 as hardware cannot do partial
hash"
[ There is a merge of the Freescale SoC tree in order to pull in changes
required by patches to the caam/qi2 driver. ]
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (174 commits)
crypto: s5p - add AES support for Exynos5433
dt-bindings: crypto: document Exynos5433 SlimSSS
crypto: crypto4xx - add missing of_node_put after of_device_is_available
crypto: cavium/zip - fix collision with generic cra_driver_name
crypto: af_alg - use struct_size() in sock_kfree_s()
crypto: caam - remove redundant likely/unlikely annotation
crypto: s5p - update iv after AES-CBC op end
crypto: x86/poly1305 - Clear key material from stack in SSE2 variant
crypto: caam - generate hash keys in-place
crypto: caam - fix DMA mapping xcbc key twice
crypto: caam - fix hash context DMA unmap size
hwrng: bcm2835 - fix probe as platform device
crypto: s5p-sss - Use AES_BLOCK_SIZE define instead of number
crypto: stm32 - drop pointless static qualifier in stm32_hash_remove()
crypto: chelsio - Fixed Traffic Stall
crypto: marvell - Remove set but not used variable 'ivsize'
crypto: ccp - Update driver messages to remove some confusion
crypto: adiantum - add 1536 and 4096-byte test vectors
crypto: nhpoly1305 - add a test vector with len % 16 != 0
crypto: arm/aes-ce - update IV after partial final CTR block
...
Pull networking updates from David Miller:
"Here we go, another merge window full of networking and #ebpf changes:
1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
range without dealing with floods of ARP traffic, from Linus
Lüssing.
2) Throttle buffered multicast packet transmission in mt76, from
Felix Fietkau.
3) Support adaptive interrupt moderation in ice, from Brett Creeley.
4) A lot of struct_size conversions, from Gustavo A. R. Silva.
5) Add peek/push/pop commands to bpftool, as well as bash completion,
from Stanislav Fomichev.
6) Optimize sk_msg_clone(), from Vakul Garg.
7) Add SO_BINDTOIFINDEX, from David Herrmann.
8) Be more conservative with local resends due to local congestion,
from Yuchung Cheng.
9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.
10) Add health buffer support to devlink, from Eran Ben Elisha.
11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.
12) Add statistics to basic packet scheduler filter, from Cong Wang.
13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.
14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.
15) Add 3ad stats to bonding, from Nikolay Aleksandrov.
16) Lots of probing improvements for bpftool, from Quentin Monnet.
17) Various nfp driver #ebpf JIT improvements from Jakub Kicinski.
18) Allow #ebpf programs to access gso_segs from skb shared info, from
Eric Dumazet.
19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.
20) Support 22260 iwlwifi devices, from Luca Coelho.
21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.
22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.
23) Add spinlock support to #ebpf, from Alexei Starovoitov.
24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.
25) Add device information API to devlink, from Jakub Kicinski.
26) Add new timestamping socket options which are y2038 safe, from
Deepa Dinamani.
27) Add RX checksum offloading for various sh_eth chips, from Sergei
Shtylyov.
28) Flow offload infrastructure, from Pablo Neira Ayuso.
29) Numerous cleanups, improvements, and bug fixes to the PHY layer
and many drivers from Heiner Kallweit.
30) Lots of changes to try and make packet scheduler classifiers run
lockless as much as possible, from Vlad Buslov.
31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.
32) Add concurrency tests to tc-tests infrastructure, from Vlad
Buslov.
33) Add hwmon support to aquantia, from Heiner Kallweit.
34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.
And I would be remiss if I didn't thank the various major networking
subsystem maintainers for integrating much of this work before I even
saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
Johannes Berg, Kalle Valo, and many others. Thank you!"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
net/sched: avoid unused-label warning
net: ignore sysctl_devconf_inherit_init_net without SYSCTL
phy: mdio-mux: fix Kconfig dependencies
net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
selftest/net: Remove duplicate header
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net/mlx5e: Update tx reporter status in case channels were successfully opened
devlink: Add support for direct reporter health state update
devlink: Update reporter state to error even if recover aborted
sctp: call iov_iter_revert() after sending ABORT
team: Free BPF filter when unregistering netdev
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
cxgb4/chtls: Prefix adapter flags with CXGB4
net-sysfs: Switch to bitmap_zalloc()
mellanox: Switch to bitmap_zalloc()
bpf: add test cases for non-pointer sanitiation logic
mlxsw: i2c: Extend initialization by querying resources data
...
The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].
This patch uses file descriptors (fd) from /proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).
/* prototype and argument */
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);
/* syscall number 424 */
The syscall number was chosen to be 424 to align with Arnd's rework in his
y2038 series to minimize merge conflicts (cf. [25]).
In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.
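The argument rules above can be exercised with a small helper that signals the calling process through an fd for /proc/self. This is a sketch under assumptions: the helper name signal_self_via_procfd() is made up, and the fallback syscall number is the 424 quoted in this message, which only matters when the libc headers do not define __NR_pidfd_send_signal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424 /* number quoted in this message */
#endif

/*
 * Send sig to the calling process through an fd for /proc/self, with a
 * NULL siginfo_t. Returns 0 on success or the errno value of the
 * failing call.
 */
int signal_self_via_procfd(int sig, unsigned int flags)
{
	int fd, ret, err;

	fd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return errno;

	ret = (int)syscall(__NR_pidfd_send_signal, fd, sig, NULL, flags);
	err = ret < 0 ? errno : 0;
	close(fd);
	return err;
}
```

With sig set to 0 and flags set to 0 this acts as an existence and permission check, just as kill(getpid(), 0) would. Passing flag bits the kernel does not know, for example ~0u, should return EINVAL.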
/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.
/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signals to threads and
process groups, appropriately named flags (e.g. PIDFD_TYPE_PGID and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]). Clarification for this has been
requested by Eric (cf. [19]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]).
/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that, given that the
flags argument decides the scope of the signal instead of different types
of fds, it might make sense to settle for either "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13]) and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.
The "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
former because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.
/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.
/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).
/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]). The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_any() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. [12]).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.
/* testing */
This patch was tested on x64 and x86.
/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the raw syscall; fails with ENOSYS if the headers lack its number. */
static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
				       unsigned int flags)
{
#ifdef __NR_pidfd_send_signal
	return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
#else
	/* Report failure the way syscall(2) does so that errno is meaningful. */
	errno = ENOSYS;
	return -1;
#endif
}

/* Usage: <prog> /proc/<pid> <signal-number> */
int main(int argc, char *argv[])
{
	int fd, ret, saved_errno, sig;

	if (argc < 3)
		exit(EXIT_FAILURE);

	/* The fd on the /proc/<pid> directory is the stable process handle. */
	fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
	if (fd < 0) {
		printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
		exit(EXIT_FAILURE);
	}

	sig = atoi(argv[2]);

	printf("Sending signal %d to process %s\n", sig, argv[1]);
	ret = do_pidfd_send_signal(fd, sig, NULL, 0);

	/* close() may clobber errno, so preserve it for the error message. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	if (ret < 0) {
		printf("%s - Failed to send signal %d to process %s\n",
		       strerror(errno), sig, argv[1]);
		exit(EXIT_FAILURE);
	}

	exit(EXIT_SUCCESS);
}
/* Q&A
* Given that it seems the same questions get asked again by people who are
* late to the party it makes sense to add a Q&A section to the commit
* message so it's hopefully easier to avoid duplicate threads.
*
* For the sake of progress please consider these arguments settled unless
* there is a new point that desperately needs to be addressed. Please make
* sure to check the links to the threads in this commit message whether
* this has not already been covered.
*/
Q-01: (Florian Weimer [20], Andrew Morton [21])
What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).
Q-02: (Andrew Morton [21])
Is the task_struct pinned by the fd?
A-02: No. A reference to struct pid is kept. struct pid - as far as I
understand - was created exactly for the reason to not require to
pin struct task_struct (cf. [22]).
Q-03: (Andrew Morton [21])
Does the entire procfs directory remain visible? Just one entry
within it?
A-03: The same thing that happens right now when you hold a file descriptor
to /proc/<pid> open (cf. [22]).
Q-04: (Andrew Morton [21])
Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle not that pids are not
recycled (cf. [22]).
Q-05: (Andrew Morton [21])
Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.
Q-06: (Andrew Morton [22])
Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs
so this is something that we will need to support anyway. Hence,
there's no immediate need to add another syscall just to make
pidfd_send_signal() not dependent on the presence of procfs. However,
adding a syscall to get such file descriptors is planned for a
future patchset (cf. [22]).
Q-07: (Andrew Morton [21] and others)
This fd-for-a-process sounds like a handy thing and people may well
think up other uses for it in the future, probably unrelated to
signals. Are the code and the interface designed to permit such
future applications?
A-07: Yes (cf. [22]).
Q-08: (Andrew Morton [21] and others)
Now I think about it, why a new syscall? This thing is looking
rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is
preferred for a variety of reasons. Here are just a few taken from
prior threads. Syscalls are safer than ioctl()s especially when
signaling to fds. Processes are a core kernel concept so a syscall
seems more appropriate. The layout of the syscall with its four
arguments would require the addition of a custom struct for the
ioctl() thereby causing at least the same amount or even more
complexity for userspace than a simple syscall. The new syscall will
replace multiple other pid-based syscalls (see description above).
The file-descriptors-for-processes concept introduced with this
syscall will be extended with other syscalls in the future. See also
[22], [23] and various other threads already linked in here.
Q-09: (Florian Weimer [24])
What happens if you use the new interface with an O_PATH descriptor?
A-09:
pidfds opened as O_PATH fds cannot be used to send signals to a
process (cf. [2]). Signaling processes through pidfds is the
equivalent of writing to a file. Thus, this is not an operation that
operates "purely at the file descriptor level" as required by the
open(2) manpage. See also [4].
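The behaviour described in A-01 and A-04 can be checked with a short program fragment: take an fd on a child's /proc/<pid> directory, let the child exit and reap it, then try to signal it through the now stale fd. This is a sketch under assumptions: the helper name signal_exited_child() is made up, and the fallback syscall number 424 is the one quoted in this message.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424 /* number quoted in this message */
#endif

/*
 * Returns the errno value from pidfd_send_signal() on an exited and
 * reaped child, 0 if the call unexpectedly succeeded, or -1 if the
 * setup (fork, open, waitpid) failed.
 */
int signal_exited_child(void)
{
	char path[64];
	int fd, ret, err;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0); /* child: exit at once and stay a zombie until reaped */

	/* /proc/<pid> still exists for the zombie, so the open cannot race. */
	snprintf(path, sizeof(path), "/proc/%d", (int)pid);
	fd = open(path, O_DIRECTORY | O_CLOEXEC);

	/* Reap the child; from here on its pid may be recycled. */
	if (waitpid(pid, NULL, 0) < 0 || fd < 0) {
		if (fd >= 0)
			close(fd);
		return -1;
	}

	ret = (int)syscall(__NR_pidfd_send_signal, fd, 0, NULL, 0);
	err = ret < 0 ? errno : 0;
	close(fd);
	return err;
}
```

On a kernel with this patch the helper is expected to return ESRCH, even if the numeric pid has since been handed to a new process, because the fd refers to the original struct pid and not to the number.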
/* References */
[1]: https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]: https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]: https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]: https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]: https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]: https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]: https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
Pull LED updates from Jacek Anaszewski:
- finalize previously announced support for initialization of pattern
triggers from Device Tree
- fix for null deref on firmware load failure in leds-lp55xx-common.c
* tag 'leds-for-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: lp55xx: fix null deref on firmware load failure
leds: trigger: timer: Add initialization from Device Tree
leds: trigger: oneshot: Add initialization from Device Tree
leds: trigger: pattern: Add pattern initialization from Device Tree
leds: Add helper for getting default pattern from Device Tree
dt-bindings: leds: Add pattern initialization from Device Tree
Pull spi updates from Mark Brown:
"A fairly quiet release for SPI, the biggest thing is the conversion to
use GPIO descriptors which is now 90% done but still needs some
stragglers converting.
Summary:
- Support for inter-word delays
- Conversion of the core and most drivers to use GPIO descriptors for
GPIO controlled chip selects
- New drivers for NXP FlexSPI and QuadSPI, SiFive and Spreadtrum"
* tag 'spi-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (104 commits)
spi: sh-msiof: Restrict bits per word to 8/16/24/32 on R-Car Gen2/3
spi: sifive: Remove redundant dev_err call in sifive_spi_probe()
spi: sifive: Remove spi_master_put in sifive_spi_remove()
spi: spi-gpio: fix SPI_CS_HIGH capability
spi: pxa2xx: Setup maximum supported DMA transfer length
spi: sifive: Add driver for the SiFive SPI controller
spi: sifive: Add DT documentation for SiFive SPI controller
spi: sprd: Add a prefix for SPI DMA channel macros
spi: sprd: spi: sprd: Add DMA mode support
dt-bindings: spi: Add the DMA properties for the SPI dma mode
spi: sprd: Add the SPI irq function for the SPI DMA mode
dt-bindings: spi: imx: Add an entry for the i.MX8QM compatible
spi: use gpio[d]_set_value_cansleep for setting chipselect GPIO
spi: gpio: Advertise support for SPI_CS_HIGH
spi: sh-msiof: Replace spi_master by spi_controller
spi: sh-hspi: Replace spi_master by spi_controller
spi: rspi: Replace spi_master by spi_controller
spi: atmel-quadspi: add support for sam9x60 qspi controller
dt-bindings: spi: atmel-quadspi: QuadSPI driver for Microchip SAM9X60
spi: atmel-quadspi: add support for named peripheral clock
...
Pull regulator updates from Mark Brown:
"The bulk of the standout changes in this release are cleanups, with
the core work being a combination of factoring out common code into
helpers and the completion of the conversion of the core to use GPIO
descriptors.
Summary:
- Addition of helper functions for current limits and conversion of
drivers to use them by Axel Lin.
- Lots and lots of cleanups from Axel Lin.
- Conversion of the core to use GPIO descriptors rather than numbers
by Linus Walleij.
- New drivers for Maxim MAX77650 and ROHM BD70528"
* tag 'regulator-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (131 commits)
regulator: mc13xxx: Constify regulator_ops variables
regulator: palmas: Constify palmas_smps_ramp_delay array
regulator: wm831x-dcdc: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88090: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88080: Convert to use regulator_set/get_current_limit_regmap
regulator: pv88060: Convert to use regulator_set/get_current_limit_regmap
regulator: max77650: Convert to use regulator_set/get_current_limit_regmap
regulator: lp873x: Convert to use regulator_set/get_current_limit_regmap
regulator: lp872x: Convert to use regulator_set/get_current_limit_regmap
regulator: da9210: Convert to use regulator_set/get_current_limit_regmap
regulator: da9055: Convert to use regulator_set/get_current_limit_regmap
regulator: core: Add set/get_current_limit helpers for regmap users
regulator: Fix comment for csel_reg and csel_mask
regulator: stm32-vrefbuf: add power management support
regulator: 88pm8607: Remove unused fields from struct pm8607_regulator_info
regulator: 88pm8607: Simplify pm8607_list_voltage implementation
regulator: cpcap: Constify omap4_regulators and xoom_regulators
regulator: cpcap: Remove unused vsel_shift from struct cpcap_regulator
dt-bindings: regulator: tps65218: rectify units of LS3
dt-bindings: regulator: add LS2 load switch documentation
...
Pull regmap updates from Mark Brown:
"There are only two changes here:
- fix for conflicting attributes on the rbtree node structure
- implementation of main status register support in the interrupt
code which supports chips that have a register to cut down on the
number of per-interrupt status registers that need to be checked
when handling interrupts"
* tag 'regmap-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: Remove attribute packed from struct 'regcache_rbtree_node'
regmap: regmap-irq: Add main status register support
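The main status register idea described above can be sketched as a toy model. This is not the regmap-irq code: pending_interrupts(), the read_reg callback, and the one-bit-per-sub-register mapping are all assumptions made for illustration.

```python
def pending_interrupts(read_reg, main_status_reg, sub_status_regs):
    """Return {register: status} for flagged sub-status registers only.

    Assumption for this sketch: bit N of the main status register says
    whether sub_status_regs[N] has anything pending.
    """
    main = read_reg(main_status_reg)
    result = {}
    for bit, reg in enumerate(sub_status_regs):
        if main & (1 << bit):
            # Only registers flagged in the main status are read at all.
            result[reg] = read_reg(reg)
    return result
```

With three sub-status registers and a main status of 0b101, only the first and third sub-status registers would be read, which is the saving the pull message refers to.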
Pull MMC updates from Ulf Hansson:
"MMC core:
- Fixup max_discard/trim calculations
- Announce SD specs greater than 4.0
- Add discard support for SD cards
- Don't do retries for CMD6 (SWITCH command)
- Various cleanups and re-structuring
MMC host:
- cqhci:
* Add maintainers for eMMC CQHCI driver
- sdhci:
* Consolidate WP GPIO code
* Add ADMA3 DMA support for V4 enabled host
* Fixup card detect support in pci-o2micro driver
* Add support for CMDQ and SDMMC pads auto-calibration in tegra
driver
* Add DCMD support and CMDQ support, support for i.MX6ULL variant,
fixup HS400 timing issue and add HS400_ES support for i.MX8QXP
to esdhc-imx driver
* Avoid CRC errors by adjusting settings to speed mode and fixup
card initialization for high speed mode in renesas_sdhi
* Fixup timeout settings for omap
* Enable 8 bits bus-width support in atmel-mci
* Convert some legacy code in jz4740 driver to use modern APIs
* Send a CMD12 to clear DPSM at errors for STM32 sdmmc mmci
driver"
* tag 'mmc-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (69 commits)
mmc:fix a bug when max_discard is 0
mmc: core: Add a debug print when the card may have been replaced
mmc: core: Add sd discard timeout
mmc: core: Add discard support to sd
mmc: sdhci-esdhc-imx: clear the HALT bit when enable CQE
mmc: core: do not retry CMD6 in __mmc_switch()
mmc: core: Convert mmc_align_data_size() into an SDIO specific function
mmc: core: Move mmc_of_parse_voltage() to host.c
mmc: core: Convert mmc_regulator_get_ocrmask() to static
mmc: core: Move regulator helpers to separate file
mmc: of_mmc_spi: Convert to mmc_of_parse_voltage()
mmc: core: Drop retries as in-parameter to mmc_wait_for_app_cmd()
mmc: core: Convert mmc_wait_for_app_cmd() to static
mmc: renesas_sdhi: Change HW adjustment register according to speed mode
mmc: mmci: Send a CMD12 to clear the DPSM at errors
mmc: sdhci-xenon: Fixup already marked switch fall-through
mmc: sdhci-tegra: drop ->get_ro() implementation
mmc: sdhci-omap: drop ->get_ro() implementation
mmc: sdhci: use WP GPIO in sdhci_check_ro()
mmc: wmt-sdmmc: Drop unused include
...
Pull MTD updates from Boris Brezillon:
"Core MTD changes:
- Use struct_size() where appropriate
- mtd_{read,write}() as wrappers around mtd_{read,write}_oob()
- Fix misuse of PTR_ERR() in docg3
- Coding style improvements in mtdcore.c
SPI NOR changes:
Core changes:
- Add support of octal mode I/O transfer
- Add a bunch of SPI NOR entries to the flash_info table
SPI NOR controller driver changes:
- cadence-quadspi:
* Add support for Octal SPI controller
* write up to 8 bytes of data in STIG mode
- mtk-quadspi:
* rename config to a common one
* add SNOR_HWCAPS_READ to spi_nor_hwcaps mask
- Add Tudor as SPI-NOR co-maintainer
NAND changes:
NAND core changes:
- Fourth batch of fixes/cleanup to the raw NAND core impacting
various controller drivers (Sunxi, Marvell, MTK, TMIO, OMAP2).
- Check the return code of nand_reset() and nand_readid_op().
- Remove ->legacy.erase and single_erase().
- Simplify the locking.
- Several implicit fall through annotations.
Raw NAND controllers drivers changes:
- Fix various possible object reference leaks (MTK, JZ4780, Atmel)
- ST:
* Add support for STM32 FMC2 NAND flash controller
- Meson:
* Add support for Amlogic NAND flash controller
- Denali:
* Several cleanup patches
- Sunxi:
* Several cleanup patches
- FSMC:
* Disable NAND on remove()
* Reset NAND timings on resume()
SPI-NAND drivers changes:
- Toshiba:
* Add support for all Toshiba products.
- Macronix:
* Fix ECC status read.
- Gigadevice:
* Add support for GD5F1GQ4UExxG"
* tag 'mtd/for-5.1' of git://git.infradead.org/linux-mtd: (64 commits)
mtd: spi-nor: Fix wrong abbreviation HWCPAS
mtd: spi-nor: cadence-quadspi: fix spelling mistake: "Couldnt't" -> "Couldn't"
mtd: spi-nor: Add support for en25qh64
mtd: spi-nor: Add support for MX25V8035F
mtd: spi-nor: Add support for EN25Q80A
mtd: spi-nor: cadence-quadspi: Add support for Octal SPI controller
dt-bindings: cadence-quadspi: Add new compatible for AM654 SoC
mtd: spi-nor: split s25fl128s into s25fl128s0 and s25fl128s1
mtd: spi-nor: cadence-quadspi: write upto 8-bytes data in STIG mode
mtd: spi-nor: Add support for mx25u3235f
mtd: rawnand: denali_dt: remove single anonymous clock support
mtd: rawnand: mtk: fix possible object reference leak
mtd: rawnand: jz4780: fix possible object reference leak
mtd: rawnand: atmel: fix possible object reference leak
mtd: rawnand: fsmc: Disable NAND on remove()
mtd: rawnand: fsmc: Reset NAND timings on resume()
mtd: spinand: Add support for GigaDevice GD5F1GQ4UExxG
mtd: rawnand: denali: remove unused dma_addr field from denali_nand_info
mtd: rawnand: denali: remove unused function argument 'raw'
mtd: rawnand: denali: remove unneeded denali_reset_irq() call
...
Pull VFIO updates from Alex Williamson:
- Switch mdev to generic UUID API (Andy Shevchenko)
- Fixup platform reset include paths (Masahiro Yamada)
- Fix usage of MINORMASK (Chengguang Xu)
- Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy)
- Restore device state after PM reset (Alex Williamson)
- Ensure memory translation enabled for PCI ROM access (Eric Auger)
* tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio:
vfio_pci: Enable memory accesses before calling pci_map_rom
vfio/pci: Restore device state on PM transition
vfio/spapr_tce: Skip unsetting already unset table
samples/vfio-mdev/mtty: expand minor range when registering chrdev region
samples/vfio-mdev/mdpy: expand minor range when registering chrdev region
samples/vfio-mdev/mbochs: expand minor range when registering chrdev region
vfio: expand minor range when registering chrdev region
vfio: platform: reset: fix up include directives to remove ccflags-y
vfio-mdev: Switch to use new generic UUID API
Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:
"In more details - normally IOCB_CMD_POLL handling looks so:
1) io_submit(2) allocates aio_kiocb instance and passes it to
aio_poll()
2) aio_poll() resolves the descriptor to struct file by req->file =
fget(iocb->aio_fildes)
3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
aio_kiocb to 2 (bumps by 1, that is).
4) aio_poll() calls vfs_poll(). After sanity checks (basically,
"poll_wait() had been called and only once") it locks the queue.
That's what the extra reference to iocb had been for - we know we
can safely access it.
5) With queue locked, we check if ->woken has already been set to
true (by aio_poll_wake()) and, if it had been, we unlock the
queue, drop a reference to aio_kiocb and bugger off - at that
point it's a responsibility to aio_poll_wake() and the stuff
called/scheduled by it. That code will drop the reference to file
in req->file, along with the other reference to our aio_kiocb.
6) otherwise, we see whether we need to wait. If we do, we unlock the
queue, drop one reference to aio_kiocb and go away - eventual
wakeup (or cancel) will deal with the reference to file and with
the other reference to aio_kiocb
7) otherwise we remove ourselves from waitqueue (still under the
queue lock), so that wakeup won't get us. No async activity will
be happening, so we can safely drop req->file and iocb ourselves.
If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
won't get freed under us, so we can do all the checks and locking
safely. And we don't touch ->file if we detect that case.
However, vfs_poll() most certainly *does* touch the file it had been
given. So wakeup coming while we are still in ->poll() might end up
doing fput() on that file. That case is not too rare, and usually we
are saved by the still present reference from descriptor table - that
fput() is not the final one.
But if another thread closes that descriptor right after our fget()
and wakeup does happen before ->poll() returns, we are in trouble -
final fput() done while we are in the middle of a method:
Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.
Fixes: bfe4037e72 ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
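The lifetime problem described above can be illustrated with a toy reference-count model. This is a sketch only, not kernel code: the File class and its get()/put() methods are simplified stand-ins for struct file, fget() and fput(), and the bad interleaving is hard-coded rather than raced.

```python
class File:
    """Toy stand-in for struct file with a reference count."""

    def __init__(self):
        self.refs = 1        # reference held by the descriptor table
        self.freed = False

    def get(self):           # models fget()
        assert not self.freed
        self.refs += 1

    def put(self):           # models fput()
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


def poll_with_racing_wakeup(keep_until_completion):
    """Replay the interleaving from the commit message.

    Returns True if the file was freed while ->poll() was still running.
    """
    f = File()
    f.get()                  # submission: req->file = fget(fd)
    # --- vfs_poll() starts using the file here ---
    f.put()                  # another thread closes the descriptor
    if not keep_until_completion:
        f.put()              # old scheme: wakeup drops req->file mid-poll
    freed_during_poll = f.freed
    # --- vfs_poll() returns ---
    if keep_until_completion:
        f.put()              # new scheme: dropped once the aio has completed
    return freed_during_poll
```

In this model poll_with_racing_wakeup(False) reports a free during ->poll(), while poll_with_racing_wakeup(True) does not, because the submission path keeps its reference until completion.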
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-03-04
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
AF_XDP applications by offering higher-level APIs that hide many
of the details of the AF_XDP uapi. Sample programs are converted
over to this new interface as well, from Magnus.
2) Introduce a new cant_sleep() macro for annotation of functions
that cannot sleep and use it in BPF_PROG_RUN() to assert that
BPF programs run under preemption disabled context, from Peter.
3) Introduce per BPF prog stats in order to monitor the usage
of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
knob where monitoring tools can make use of this to efficiently
determine the average cost of programs, from Alexei.
4) Split up BPF selftest's test_progs similarly as we already
did with test_verifier. This allows to further reduce merge
conflicts in future and to get more structure into our
quickly growing BPF selftest suite, from Stanislav.
5) Fix a bug in BTF's dedup algorithm which can cause an infinite
loop in some circumstances; also various BPF doc fixes and
improvements, from Andrii.
6) Various BPF sample cleanups and migration to libbpf in order
to further isolate the old sample loader code (so we can get
rid of it at some point), from Jakub.
7) Add a new BPF helper for BPF cgroup skb progs that allows
to set ECN CE code point and a Host Bandwidth Manager (HBM)
sample program for limiting the bandwidth used by v2 cgroups,
from Lawrence.
8) Enable write access to skb->queue_mapping from tc BPF egress
programs in order to let BPF pick TX queue, from Jesper.
9) Fix a bug in BPF spinlock handling for map-in-map which did
not propagate spin_lock_off to the meta map, from Yonghong.
10) Fix a bug in the new per-CPU BPF prog counters to properly
initialize stats for each CPU, from Eric.
11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
from Willem.
12) Fix various BPF samples bugs in XDP and tracing progs,
from Toke, Daniel and Yonghong.
13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
enforces it now everywhere, from Anders.
14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
get error handling working, from Dan.
15) Fix bpftool documentation and auto-completion with regards
to stream_{verdict,parser} attach types, from Alban.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-cpufreq: (48 commits)
cpufreq: kryo: Release OPP tables on module removal
cpufreq: ap806: add missing of_node_put after of_device_is_available
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
cpufreq: Pass updated policy to driver ->setpolicy() callback
cpufreq: Fix two debug messages in cpufreq_set_policy()
cpufreq: Reorder and simplify cpufreq_update_policy()
cpufreq: Add kerneldoc comments for two core functions
cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
cpufreq: intel_pstate: Avoid redundant initialization of local vars
cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
ACPI / CPPC: Add a helper to get desired performance
cpufreq: davinci: move configuration to include/linux/platform_data
cpufreq: speedstep: convert BUG() to BUG_ON()
cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
cpufreq: longhaul: remove unneeded semicolon
cpufreq: pcc-cpufreq: remove unneeded semicolon
cpufreq: Replace double NOT (!!) with single NOT (!)
cpufreq: intel_pstate: Add reasons for failure and debug messages
cpufreq: dt: Implement online/offline() callbacks
...
* pm-cpuidle:
ACPI / processor: Set P_LVL{2,3} idle state descriptions
intel_idle: add support for Jacobsville
cpuidle: dt: bail out if the idle-state DT node is not compatible
cpuidle: use BIT() for idle state flags and remove CPUIDLE_DRIVER_FLAGS_MASK
Documentation: driver-api: PM: Add cpuidle document
cpuidle: New timer events oriented governor for tickless systems
* powercap:
powercap/intel_rapl: add Ice Lake mobile
powercap: intel_rapl: add support for Jacobsville
* pm-core:
PM / core: Add support to skip power management in device/driver model
PM / suspend: Print debug messages for device using direct-complete
PM-runtime: update time accounting only when enabled
PM-runtime: Switch accounting over to ktime_get_mono_fast_ns()
PM-runtime: Optimize pm_runtime_autosuspend_expiration()
PM-runtime: Replace jiffies-based accounting with ktime-based accounting
PM-runtime: update accounting_timestamp on enable
PM: clock_ops: fix missing clk_prepare() return value check
drm/i915: Move on the new pm runtime interface
PM-runtime: Add new interface to get accounted time
* pm-sleep:
PM / wakeup: fix kerneldoc comment for pm_wakeup_dev_event()
* pm-qos:
PM: QoS: no need to check return value of debugfs_create functions
* pm-domains:
PM / Domains: Mark "name" const in dev_pm_domain_attach_by_name()
PM / Domains: Mark "name" const in genpd_dev_pm_attach_by_name()
PM: domains: no need to check return value of debugfs_create functions
* pm-em:
PM / EM: Expose the Energy Model in debugfs
* acpi-apei: (29 commits)
efi: cper: Fix possible out-of-bounds access
ACPI: APEI: Fix possible out-of-bounds access to BERT region
MAINTAINERS: Add James Morse to the list of APEI reviewers
ACPI / APEI: Add support for the SDEI GHES Notification type
firmware: arm_sdei: Add ACPI GHES registration helper
ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications
ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry()
ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length
ACPI / APEI: Make GHES estatus header validation more user friendly
ACPI / APEI: Pass ghes and estatus separately to avoid a later copy
ACPI / APEI: Let the notification helper specify the fixmap slot
ACPI / APEI: Move locking to the notification helper
arm64: KVM/mm: Move SEA handling behind a single 'claim' interface
KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing
ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue
ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI
ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors
ACPI / APEI: Generalise the estatus queue's notify code
ACPI / APEI: Don't update struct ghes' flags in read/clear estatus
ACPI / APEI: Remove spurious GHES_TO_CLEAR check
...
* acpi-tables:
ACPI/PPTT: Add acpi_pptt_warn_missing() to consolidate logs
ACPI / tables: table override from built-in initrd
* acpi-debug:
ACPI: debug: Clean up acpi_aml_init()
ACPI: no need to check return value of debugfs_create functions
* acpi-ec:
Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
ACPI: EC: Simplify boot EC checks in acpi_ec_add()
ACPI: EC: Eliminate acpi_config_boot_ec()
ACPI: EC: Make acpi_ec_dsdt_probe() more straightforward
ACPI: EC: Make acpi_ec_ecdt_probe() more straightforward
ACPI: EC: Declare boot_ec as static
ACPI: EC: Clean up probing for early EC
* acpi-dptf:
ACPI / DPTF: remove header search path to the parent directory
genphy_no_soft_reset and gen10g_no_soft_reset are both the same no-ops,
one is enough.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen10g_read_status is deprecated, therefore stop exporting it.
We don't want to encourage anybody to use it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported
and advertising bitmap because it's part of PHY_10GBIT_FEATURES.
And all users of gen10g_config_init use PHY_10GBIT_FEATURES.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_suspend() and phy_resume() are no-ops anyway if no callback is
defined. Therefore we don't need these stubs.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set
will receive all IPv6 RA packets from all namespaces.
IPV6_ROUTER_ALERT_ISOLATE socket option restricts packets received by
the socket to be only from the socket's namespace.
Signed-off-by: Maxim Martynov <maxim@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
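The delivery rule described above can be sketched as a small filter. This is a toy model, not the kernel implementation: receivers() and the dictionary fields 'name', 'netns' and 'isolate' are hypothetical names chosen for illustration.

```python
def receivers(packet_netns, sockets):
    """Return names of router-alert sockets that would see the packet.

    Each socket is a dict with 'name', 'netns' and 'isolate' keys.
    'isolate' models IPV6_ROUTER_ALERT_ISOLATE being set on the socket.
    """
    return [
        s["name"]
        for s in sockets
        # Non-isolated sockets see packets from every namespace;
        # isolated ones only see packets from their own namespace.
        if not s["isolate"] or s["netns"] == packet_netns
    ]
```

For a packet arriving in namespace "a", a non-isolated socket in namespace "b" is still listed, while an isolated socket in namespace "b" is not.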
There is a really hairy resolution involving amdgpu fixes, that I'd rather confirm here.
Also some misc fixes are landed by me, but the pr has them as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
By setting curr_table, n_current_limits, csel_reg and csel_mask, the
regmap users can use regulator_set_current_limit_regmap and
regulator_get_current_limit_regmap for set/get_current_limit callbacks.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
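How the four fields fit together can be sketched with a toy model. This is an illustration under stated assumptions, not the kernel helpers: FakeRegmap, set_current_limit() and get_current_limit() are hypothetical Python stand-ins, the descriptor is a plain dict, and picking the highest table entry inside the requested range is an assumed selection policy.

```python
class FakeRegmap:
    """Minimal in-memory register map (hypothetical stand-in)."""

    def __init__(self):
        self.regs = {}

    def read(self, reg):
        return self.regs.get(reg, 0)

    def update_bits(self, reg, mask, val):
        self.regs[reg] = (self.read(reg) & ~mask) | (val & mask)


def _shift(mask):
    """Bit position of the lowest set bit in mask."""
    return (mask & -mask).bit_length() - 1


def set_current_limit(regmap, desc, min_uA, max_uA):
    """Write the selector of the highest limit within [min_uA, max_uA]."""
    fits = [(ua, sel) for sel, ua in enumerate(desc["curr_table"])
            if min_uA <= ua <= max_uA]
    if not fits:
        return False         # no table entry satisfies the request
    _, sel = max(fits)
    mask = desc["csel_mask"]
    regmap.update_bits(desc["csel_reg"], mask, sel << _shift(mask))
    return True


def get_current_limit(regmap, desc):
    """Read the selector field back and translate it through curr_table."""
    mask = desc["csel_mask"]
    sel = (regmap.read(desc["csel_reg"]) & mask) >> _shift(mask)
    return desc["curr_table"][sel]
```

Here n_current_limits corresponds to the length of curr_table, so a driver only has to describe the table, the register and the mask.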
The csel_reg and csel_mask fields in struct regulator_desc need to
be generic for drivers, not just for TPS65218.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Saeed Mahameed says:
====================
mlx5-updates-2019-03-01
This series adds multipath offload support and contains some small updates
to mlx5 driver.
Multipath offload support from Roi Dayan:
We are going to track SW multipath route and related nexthops and reflect
that as port affinity to the HW.
1) Some patches are preparation.
2) add the multipath mode and fib events handling.
3) add support to handle offload failure for net error, i.e.
port down.
4) Small updates to match the behavior of multipath
Two small updates from Eran Ben Elisha,
5) Make a function static
6) Update PCIe supported devices list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
After this change, we can remove list of extensions too to simplify this
codebase.
2) Update amanda conntrack helper to support v3.4, from Florian Tham.
3) Get rid of the obsolete BUGPRINT macro in ebtables, from
Florian Westphal.
4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
From Florian Westphal.
5) Patchset to remove nf_nat_l3proto structure to get rid of
indirections, from Florian Westphal.
6) Skip unnecessary conntrack timeout updates in case the value is
still the same, also from Florian Westphal.
7) Remove unnecessary 'fall through' comments in empty switch cases,
from Li RongQing.
8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.
9) Fix incorrect logic in the deactivate path of fixed size hashtable
sets; the element was being compared with itself.
10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
keys.
11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.
12) Enter close state in conntrack if RST matches exact sequence number,
from Florian Westphal.
13) Initialize dst_cache in tunnel extension, from wenxu.
14) Pass protocol as u16 to xt_check_match and xt_check_target, from
Li RongQing.
15) SCTP header is guaranteed to be in a linear area from IPVS NAT handler,
from Xin Long.
16) Don't steal packets coming from slave VRF device from the
ip_sabotage_in() path, from David Ahern.
17) Fix unsafe update of basechain stats, from Li RongQing.
18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
modulo operation as bitwise AND, from Li RongQing.
19) Use device_attribute instead of internal definition in the IDLETIMER
target, from Sami Tolvanen.
20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
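The reasoning behind item 18 can be checked with a short sketch: when the divisor is a power of two, taking the remainder is the same as masking with the divisor minus one. The function names and the value 1024 used below are illustrative assumptions, not taken from the conntrack code.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def lock_index(hash_value, n_locks):
    """Pick a lock slot for a hash value.

    For a power-of-two n_locks the modulo reduces to a bitwise AND,
    which is the rewrite a compiler can apply on its own.
    """
    if is_power_of_two(n_locks):
        return hash_value & (n_locks - 1)
    return hash_value % n_locks
```

For any non-negative hash value, lock_index(h, 1024) equals h % 1024, so the mask form loses nothing while avoiding a division.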
Pull networking fixes from David Miller:
1) Fix refcount leak in act_ipt during replace, from Davide Caratti.
2) Set task state properly in tun during blocking reads, from Timur
Celik.
3) Leaked reference in DSA, from Wen Yang.
4) NULL deref in act_tunnel_key, from Vlad Buslov.
5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
thus referencing garbage, from Nazarov Sergey.
6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
attributes make no sense.
7) Fix hung sendto in tipc, from Tung Nguyen.
8) Out-of-bounds access in netlabel, from Paul Moore.
9) Grant reference leak in xen-netback, from Igor Druzhinin.
10) Fix tx stalls with lan743x, from Bryan Whitehead.
11) Fix interrupt storm with mv88e6xxx, from Heiner Kallweit.
12) Memory leak in sit on device registry failure, from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: sit: fix memory leak in sit_init_net()
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
geneve: correctly handle ipv6.disable module parameter
net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
bpf: fix sanitation rewrite in case of non-pointers
ipv4: Add ICMPv6 support when parse route ipproto
MIPS: eBPF: Fix icache flush end address
lan743x: Fix TX Stall Issue
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: aquantia: regression on cpus with high cores: set mode with 8 queues
selftests: fixes for UDP GRO
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
net: dsa: mv88e6xxx: Fix u64 statistics
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
sctp: chunk.c: correct format string for size_t in printk
net: netem: fix skb length BUG_ON in __skb_to_sgvec
netlabel: fix out-of-bounds memory accesses
ipv4: Pass original device to ip_rcv_finish_core
...
There are two new fields added to mlxreg core structure:
features - supported features of device and
identity - device identity name.
Add new defines for watchdog features.
Signed-off-by: Michael Shych <michaelsh@mellanox.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). We may end up forced to spam the MDS with a
lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Under the multipath offload scheme, as part of handling fib events, emit
an mlx5 port affinity event on the enabled ports, which will be handled
by the tc offloads code.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to offload the ECMP-on-host scheme where next-hop routes are used,
we will make use of HW LAG. Add an accessor function to let upper layers
in the driver determine whether the LAG acts in multipath mode.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This static inline is unnecessary and can be removed
by using the vsprintf %ph extension.
This reduces overall object size by more than 2K.
Reported-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
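For readers unfamiliar with the %ph extension, its output shape can be emulated in a few lines. This is a Python approximation for illustration only: format_ph() is a hypothetical helper, and the 64-byte cap and the C/D/N separator variants reflect the documented behaviour of the kernel format specifier as best understood here.

```python
# Separator for each variant: plain %ph uses a space, C a colon,
# D a dash, N nothing at all.
SEPARATORS = {"": " ", "C": ":", "D": "-", "N": ""}


def format_ph(data, variant=""):
    """Render a small buffer as two-digit hex bytes, like %*ph."""
    chunk = bytes(data)[:64]   # the specifier is meant for short buffers
    return SEPARATORS[variant].join(f"{b:02x}" for b in chunk)
```

A single format specifier producing this string is what lets a hand-written hex dump helper be dropped.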
Sasha writes:
1. Exposing counters for state changes of channel ring buffers; this is
useful to investigate performance issues. By Kimberly Brown.
2. Switching to the new generic UUID API, by Andy Shevchenko.
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Expose counters for interrupts and full conditions
vmbus: Switch to use new generic UUID API