Commit Graph

7143 Commits

Author SHA1 Message Date
Randy Dunlap
7deff7b5b4 hyperv: hyperv.h: drop a duplicated word
Drop the repeated word "the" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Link: https://lore.kernel.org/r/20200719002841.20369-1-rdunlap@infradead.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-07-23 17:55:20 +00:00
Randy Dunlap
1859f4ebcf android: binder.h: drop a duplicated word
Drop the repeated word "the" in a comment.

Cc: Arve Hjønnevåg <arve@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: devel@driverdev.osuosl.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20200719002738.20210-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-23 09:35:36 +02:00
Greg Kroah-Hartman
cb0cec23ce Merge tag 'fpga-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga into char-misc-next
Moritz writes:

FPGA Manager changes for 5.9-rc1

Here is the (slightly larger than usual) patch set for the 5.9-rc1 merge
window.

DFL:
- Xu's changes add support for AFU interrupt handling and puts them to
  use for error handling.
- Xu's other change also adds another device-id for the Intel FPGA PAC N3000.
- John's change converts from using get_user_pages() to
  pin_user_pages().
- Gustavo's patch cleans up some of the allocation by using
  struct_size().

Xilinx:
- Luca's changes clean up the xilinx-spi and xilinx-slave-serial drivers
  and updates the comments and dt-bindings to reflect the fact it also
  supports 7 series devices.

Core:
- Tom cleaned up the fpga-bridge / fpga-mgr core by removing some
  dead-stores.

All patches have been reviewed on the mailing list, and have been in the
last few linux-next releases (as part of my for-next branch) without issues.

Signed-off-by: Moritz Fischer <mdf@kernel.org>

* tag 'fpga-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga:
  fpga: dfl: pci: add device id for Intel FPGA PAC N3000
  Documentation: fpga: dfl: add descriptions for interrupt related interfaces.
  fpga: dfl: afu: add AFU interrupt support
  fpga: dfl: fme: add interrupt support for global error reporting
  fpga: dfl: afu: add interrupt support for port error reporting
  fpga: dfl: introduce interrupt trigger setting API
  fpga: dfl: pci: add irq info for feature devices enumeration
  fpga: dfl: parse interrupt info for feature devices on enumeration
  fpga manager: xilinx-spi: check INIT_B pin during write_init
  dt-bindings: fpga: xilinx-slave-serial: add optional INIT_B GPIO
  fpga: Fix dead store in fpga-bridge.c
  fpga: Fix dead store fpga-mgr.c
  fpga: dfl: Use struct_size() in kzalloc()
  fpga manager: xilinx-spi: remove unneeded, mistyped variables
  fpga manager: xilinx-spi: valid for the 7 Series too
  dt-bindings: fpga: xilinx-slave-serial: valid for the 7 Series too
  fpga: dfl: afu: convert get_user_pages() --> pin_user_pages()
2020-07-23 09:24:26 +02:00
David S. Miller
dee72f8a0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-21

The following pull-request contains BPF updates for your *net-next* tree.

We've added 46 non-merge commits during the last 6 day(s) which contain
a total of 68 files changed, 4929 insertions(+), 526 deletions(-).

The main changes are:

1) Run BPF program on socket lookup, from Jakub.

2) Introduce cpumap, from Lorenzo.

3) s390 JIT fixes, from Ilya.

4) teach riscv JIT to emit compressed insns, from Luke.

5) use build time computed BTF ids in bpf iter, from Yonghong.
====================

Purely independent overlapping changes in both filter.h and xdp.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22 12:35:33 -07:00
Randy Dunlap
c333f9495c raid: md_p.h: drop duplicated word in a comment
Drop the doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-07-21 22:05:32 -07:00
Martin Varghese
4787dd582d bareudp: Reverted support to enable & disable rx metadata collection
The commit fe80536acf ("bareudp: Added attribute to enable & disable
rx metadata collection") breaks the the original(5.7) default behavior of
bareudp module to collect RX metadadata at the receive. It was added to
avoid the crash at the kernel neighbour subsytem when packet with metadata
from bareudp is processed. But it is no more needed as the
commit 394de110a7 ("net: Added pointer check for
dst->ops->neigh_lookup in dst_neigh_lookup_skb") solves this crash.

Fixes: fe80536acf ("bareudp: Added attribute to enable & disable rx metadata collection")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 18:30:47 -07:00
Max Englander
b43870c74f audit: report audit wait metric in audit status reply
In environments where the preservation of audit events and predictable
usage of system memory are prioritized, admins may use a combination of
--backlog_wait_time and -b options at the risk of degraded performance
resulting from backlog waiting. In some cases, this risk may be
preferred to lost events or unbounded memory usage. Ideally, this risk
can be mitigated by making adjustments when backlog waiting is detected.

However, detection can be difficult using the currently available
metrics. For example, an admin attempting to debug degraded performance
may falsely believe a full backlog indicates backlog waiting. It may
turn out the backlog frequently fills up but drains quickly.

To make it easier to reliably track degraded performance to backlog
waiting, this patch makes the following changes:

Add a new field backlog_wait_time_total to the audit status reply.
Initialize this field to zero. Add to this field the total time spent
by the current task on scheduled timeouts while the backlog limit is
exceeded. Reset field to zero upon request via AUDIT_SET.

Tested on Ubuntu 18.04 using complementary changes to the
audit-userspace and audit-testsuite:
- https://github.com/linux-audit/audit-userspace/pull/134
- https://github.com/linux-audit/audit-testsuite/pull/97

Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-07-21 11:21:44 -04:00
Peter Zijlstra
6c0246a458 perf: Add perf_event_mmap_page::cap_user_time_short ABI
In order to support short clock counters, provide an ABI extension.

As a whole:

    u64 time, delta, cyc = read_cycle_counter();

+   if (cap_user_time_short)
+	cyc = time_cycle + ((cyc - time_cycle) & time_mask);

    delta = mul_u64_u32_shr(cyc, time_mult, time_shift);

    if (cap_user_time_zero)
	time = time_zero + delta;

    delta += time_offset;

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Link: https://lore.kernel.org/r/20200716051130.4359-6-leo.yan@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2020-07-20 11:50:47 +01:00
Greg Kroah-Hartman
c4d41d0055 Merge v5.8-rc6 into char-misc-next
We need the char/misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-20 09:43:40 +02:00
Greg Kroah-Hartman
eed3c957dd Merge 5.8-rc6 into usb-next
We need the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-20 09:41:30 +02:00
Greg Kroah-Hartman
6f2c6599ba Merge 5.8-rc6 into tty-next
We need the serial/tty fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-20 09:39:11 +02:00
Vladimir Oltean
b6bd41363a ptp: introduce a phase offset in the periodic output request
Some PHCs like the ocelot/felix switch cannot emit generic periodic
output, but just PPS (pulse per second) signals, which:
- don't start from arbitrary absolute times, but are rather
  phase-aligned to the beginning of [the closest next] second.
- have an optional phase offset relative to that beginning of the
  second.

For those, it was initially established that they should reject any
other absolute time for the PTP_PEROUT_REQUEST than 0.000000000 [1].

But when it actually came to writing an application [2] that makes use
of this functionality, we realized that we can't really deal generically
with PHCs that support absolute start time, and with PHCs that don't,
without an explicit interface. Namely, in an ideal world, PHC drivers
would ensure that the "perout.start" value written to hardware will
result in a functional output. This means that if the PTP time has
become in the past of this PHC's current time, it should be
automatically fast-forwarded by the driver into a close enough future
time that is known to work (note: this is necessary only if the hardware
doesn't do this fast-forward by itself). But we don't really know what
is the status for PHC drivers in use today, so in the general sense,
user space would be risking to have a non-functional periodic output if
it simply asked for a start time of 0.000000000.

So let's introduce a flag for this type of reduced-functionality
hardware, named PTP_PEROUT_PHASE. The start time is just "soon", the
only thing we know for sure about this signal is that its rising edge
events, Rn, occur at:

Rn = perout.phase + n * perout.period

The "phase" in the periodic output structure is simply an alias to the
"start" time, since both cannot logically be specified at the same time.
Therefore, the binary layout of the structure is not affected.

[1]: https://patchwork.ozlabs.org/project/netdev/patch/20200320103726.32559-7-yangbo.lu@nxp.com/
[2]: https://www.mail-archive.com/linuxptp-devel@lists.sourceforge.net/msg04142.html

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Vladimir Oltean
f65b71aa25 ptp: add ability to configure duty cycle for periodic output
There are external event timestampers (PHCs with support for
PTP_EXTTS_REQUEST) that timestamp both event edges.

When those edges are very close (such as in the case of a short pulse),
there is a chance that the collected timestamp might be of the rising,
or of the falling edge, we never know.

There are also PHCs capable of generating periodic output with a
configurable duty cycle. This is good news, because we can space the
rising and falling edge out enough in time, that the risks to overrun
the 1-entry timestamp FIFO of the extts PHC are lower (example: the
perout PHC can be configured for a period of 1 second, and an "on" time
of 0.5 seconds, resulting in a duty cycle of 50%).

A flag is introduced for signaling that an on time is present in the
perout request structure, for preserving compatibility. Logically
speaking, the duty cycle cannot exceed 100% and the PTP core checks for
this.

PHC drivers that don't support this flag emit a periodic output of an
unspecified duty cycle, same as before.

The duty cycle is encoded as an "on" time, similar to the "start" and
"period" times, and reuses the reserved space while preserving overall
binary layout.

Pahole reported before:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        unsigned int               rsv[4];               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

And now:

struct ptp_perout_request {
        struct ptp_clock_time start;                     /*     0    16 */
        struct ptp_clock_time period;                    /*    16    16 */
        unsigned int               index;                /*    32     4 */
        unsigned int               flags;                /*    36     4 */
        union {
                struct ptp_clock_time on;                /*    40    16 */
                unsigned int       rsv[4];               /*    40    16 */
        };                                               /*    40    16 */

        /* size: 56, cachelines: 1, members: 5 */
        /* last cacheline: 56 bytes */
};

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:22:56 -07:00
Willem de Bruijn
eba75c587e icmp: support rfc 4884
Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an
extension struct if present.

ICMP messages may include an extension structure after the original
datagram. RFC 4884 standardized this behavior. It stores the offset
in words to the extension header in u8 icmphdr.un.reserved[1].

The field is valid only for ICMP types destination unreachable, time
exceeded and parameter problem, if length is at least 128 bytes and
entire packet does not exceed 576 bytes.

Return the offset to the start of the extension struct when reading an
ICMP error from the error queue, if it matches the above constraints.

Do not return the raw u8 field. Return the offset from the start of
the user buffer, in bytes. The kernel does not return the network and
transport headers, so subtract those.

Also validate the headers. Return the offset regardless of validation,
as an invalid extension must still not be misinterpreted as part of
the original datagram. Note that !invalid does not imply valid. If
the extension version does not match, no validation can take place,
for instance.

For backward compatibility, make this optional, set by setsockopt
SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c

For forward compatibility, reserve only setsockopt value 1, leaving
other bits for additional icmp extensions.

Changes
  v1->v2:
  - convert word offset to byte offset from start of user buffer
    - return in ee_data as u8 may be insufficient
  - define extension struct and object header structs
  - return len only if constraints met
  - if returning len, also validate

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 19:20:22 -07:00
Michael Walle
c4471ad9a5 net: phy: add USXGMII link partner ability constants
The constants are taken from the USXGMII Singleport Copper Interface
specification. The naming are based on the SGMII ones, but with an MDIO_
prefix.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19 18:05:49 -07:00
Adrian Reber
124ea650d3 capabilities: Introduce CAP_CHECKPOINT_RESTORE
This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.

Over the last years, The CRIU (Checkpoint/Restore In Userspace) team has
been asked numerous times if it is possible to checkpoint/restore a
process as non-root. The answer usually was: 'almost'.

The main blocker to restore a process as non-root was to control the PID
of the restored process. This feature available via the clone3 system
call, or via /proc/sys/kernel/ns_last_pid is unfortunately guarded by
CAP_SYS_ADMIN.

In the past two years, requests for non-root checkpoint/restore have
increased due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a
  resource manager distributing jobs where users are always running as
  non-root. There is a desire to provide a way to checkpoint and
  restore long running jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
  CRIU into a Java VM to decrease the startup time. These
  checkpoint/restore applications are not meant to be running with
  CAP_SYS_ADMIN.

We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
  See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
  Unfortunately, this helper delegation technique is impossible to use
  with clone3, and is thus prone to races.
  See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
  This has been demonstrated to work with cycling rates of 100,000 PIDs/s
  See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
  a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
  use in typical container environments (e.g., Kubernetes) as /proc is
  typically protected with read-only layers (e.g., /proc/sys) for
  hardening purposes. Read-only layers prevent additional /proc mounts
  (due to proc's SB_I_USERNS_VISIBLE property), making the use of new
  PID namespaces limited as certain applications need access to /proc
  matching their PID namespace.

The introduced capability allows to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
  for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
  CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
  recovering files that are unreachable via the file system such as
  deleted files, or memfd files.

See corresponding selftest for an example with clone3().

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200719100418.2112740-2-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-07-19 20:14:42 +02:00
Ezequiel Garcia
b3ab1c6058 media: Add V4L2_TYPE_IS_CAPTURE helper
It's all too easy to get confused by the V4L2_TYPE_IS_OUTPUT
macro, when it's used as !V4L2_TYPE_IS_OUTPUT.

Reduce the risk of confusion with macro to explicitly
check for the CAPTURE queue type case.

This change does not affect functionality, and it's
only intended to make the code more readable.

Suggested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: checkpatch: align with parenthesis]
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-07-19 08:13:24 +02:00
Jakub Sitnicki
e9ddbb7707 bpf: Introduce SK_LOOKUP program type with a dedicated attach point
Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
2020-07-17 20:18:16 -07:00
Priyaranjan Jha
e3a5a1e8b6 tcp: add SNMP counter for no. of duplicate segments reported by DSACK
There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv,
which are incremented depending on whether the DSACKed range is below
the cumulative ACK sequence number or not. Unfortunately, these both
implicitly assume each DSACK covers only one segment. This makes these
counters unusable for estimating spurious retransmit rates,
or real/non-spurious loss rate.

This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
the estimated number of duplicate segments based on:
(DSACKed sequence range) / MSS. This counter is usable for estimating
spurious retransmit rates, or real/non-spurious loss rate.

Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17 12:54:30 -07:00
Randy Dunlap
bfdfa51702 bpf: Drop duplicated words in uapi helper comments
Drop doubled words "will" and "attach".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/6b9f71ae-4f8e-0259-2c5d-187ddaefe6eb@infradead.org
2020-07-16 21:00:09 +02:00
Linus Torvalds
3e543a4d30 Merge tag 'char-misc-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc into master
Pull char/misc fixes from Greg KH:
 "Here are number of small char/misc driver fixes for 5.8-rc6

  Not that many complex fixes here, just a number of small fixes for
  reported issues, and some new device ids. Nothing fancy.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
  virtio: virtio_console: add missing MODULE_DEVICE_TABLE() for rproc serial
  intel_th: Fix a NULL dereference when hub driver is not loaded
  intel_th: pci: Add Emmitsburg PCH support
  intel_th: pci: Add Tiger Lake PCH-H support
  intel_th: pci: Add Jasper Lake CPU support
  virt: vbox: Fix guest capabilities mask check
  virt: vbox: Fix VBGL_IOCTL_VMMDEV_REQUEST_BIG and _LOG req numbers to match upstream
  uio_pdrv_genirq: fix use without device tree and no interrupt
  uio_pdrv_genirq: Remove warning when irq is not specified
  coresight: etmv4: Fix CPU power management setup in probe() function
  coresight: cti: Fix error handling in probe
  Revert "zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()"
  mei: bus: don't clean driver pointer
  misc: atmel-ssc: lock with mutex instead of spinlock
  phy: sun4i-usb: fix dereference of pointer phy0 before it is null checked
  phy: rockchip: Fix return value of inno_dsidphy_probe()
  phy: ti: j721e-wiz: Constify structs
  phy: ti: am654-serdes: Constify regmap_config
  phy: intel: fix enum type mismatch warning
  phy: intel: Fix compilation error on FIELD_PREP usage
  ...
2020-07-16 11:26:40 -07:00
Lorenzo Bianconi
9216477449 bpf: cpumap: Add the possibility to attach an eBPF program to cpumap
Introduce the capability to attach an eBPF program to cpumap entries.
The idea behind this feature is to add the possibility to define on
which CPU run the eBPF program if the underlying hw does not support
RSS. Current supported verdicts are XDP_DROP and XDP_PASS.

This patch has been tested on Marvell ESPRESSObin using xdp_redirect_cpu
sample available in the kernel tree to identify possible performance
regressions. Results show there are no observable differences in
packet-per-second:

$./xdp_redirect_cpu --progname xdp_cpu_map0 --dev eth0 --cpu 1
rx: 354.8 Kpps
rx: 356.0 Kpps
rx: 356.8 Kpps
rx: 356.3 Kpps
rx: 356.6 Kpps
rx: 356.6 Kpps
rx: 356.7 Kpps
rx: 355.8 Kpps
rx: 356.8 Kpps
rx: 356.8 Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/5c9febdf903d810b3415732e5cd98491d7d9067a.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Lorenzo Bianconi
644bfe51fa cpumap: Formalize map value as a named struct
As it has been already done for devmap, introduce 'struct bpf_cpumap_val'
to formalize the expected values that can be passed in for a CPUMAP.
Update cpumap code to use the struct.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/754f950674665dae6139c061d28c1d982aaf4170.1594734381.git.lorenzo@kernel.org
2020-07-16 17:00:32 +02:00
Randy Dunlap
c201324b54 net: caif: drop duplicate words in comments
Drop doubled words "or" and "the" in several comments.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-07-15 20:34:11 -07:00
Linus Torvalds
0665a4e9a1 Merge tag 'dmaengine-fix-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine into master
Pull dmaengine fixes from Vinod Koul:

 - update dmaengine tree location to kernel.org

 - dmatest fix for completing threads

 - driver fixes for k3dma, fsl-dma, idxd, ,tegra, and few other drivers

* tag 'dmaengine-fix-5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (21 commits)
  dmaengine: ioat setting ioat timeout as module parameter
  dmaengine: fsl-edma: fix wrong tcd endianness for big-endian cpu
  dmaengine: dmatest: stop completed threads when running without set channel
  dmaengine: fsl-edma-common: correct DSIZE_32BYTE
  dmaengine: dw: Initialize channel before each transfer
  dmaengine: idxd: fix misc interrupt handler thread unmasking
  dmaengine: idxd: cleanup workqueue config after disabling
  dmaengine: tegra210-adma: Fix runtime PM imbalance on error
  dmaengine: mcf-edma: Fix NULL pointer exception in mcf_edma_tx_handler
  dmaengine: fsl-edma: Fix NULL pointer exception in fsl_edma_tx_handler
  dmaengine: fsl-edma: Add lockdep assert for exported function
  dmaengine: idxd: fix hw descriptor fields for delta record
  dmaengine: ti: k3-udma: add missing put_device() call in of_xudma_dev_get()
  dmaengine: sh: usb-dmac: set tx_result parameters
  dmaengine: ti: k3-udma: Fix delayed_work usage for tx drain workaround
  dmaengine: idxd: fix cdev locking for open and release
  dmaengine: imx-sdma: Fix: Remove 'always true' comparison
  MAINTAINERS: switch dmaengine tree to kernel.org
  dmaengine: ti: k3-udma: Fix the running channel handling in alloc_chan_resources
  dmaengine: ti: k3-udma: Fix cleanup code for alloc_chan_resources
  ...
2020-07-15 15:58:11 -07:00
Amber Lin
91e2c19192 include/uapi/linux: Update KFD ioctl version
Bump KFD ioctl after adding SMI events support

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-15 13:27:34 -04:00
Amber Lin
938a0650aa drm/amdkfd: Provide SMI events watch
When the compute is malfunctioning or performance drops, the system admin
will use SMI (System Management Interface) tool to monitor/diagnostic what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe events they are interested. After
registered, the user can use annoymous file descriptor's poll function
with wait-time specified and wait for events to happen. Once an event
happens, the user can use read() to retrieve information related to the
event.

VM fault event is done in this patch.

v2: - remove UNREGISTER and add event ENABLE/DISABLE
    - correct kfifo usage
    - move event message API to kfd_ioctl.h
v3: send the event msg in text than in binary
v4: support multiple clients
v5: move events enablement from ioctl to fd write
v6: sparse fix

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-15 13:27:34 -04:00
Sargun Dhillon
7cf97b1254 seccomp: Introduce addfd ioctl to seccomp user notifier
The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
an fd. It is often used in settings where a supervising task emulates
syscalls on behalf of a supervised task in userspace, either to further
restrict the supervisee's syscall abilities or to circumvent kernel
enforced restrictions the supervisor deems safe to lift (e.g. actually
performing a mount(2) for an unprivileged container).

While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
only a certain subset of syscalls could be correctly emulated. Over the
last few development cycles, the set of syscalls which can't be emulated
has been reduced due to the addition of pidfd_getfd(2). With this we are
now able to, for example, intercept syscalls that require the supervisor
to operate on file descriptors of the supervisee such as connect(2).

However, syscalls that cause new file descriptors to be installed can not
currently be correctly emulated since there is no way for the supervisor
to inject file descriptors into the supervisee. This patch adds a
new addfd ioctl to remove this restriction by allowing the supervisor to
install file descriptors into the intercepted task. By implementing this
feature via seccomp the supervisor effectively instructs the supervisee
to install a set of file descriptors into its own file descriptor table
during the intercepted syscall. This way it is possible to intercept
syscalls such as open() or accept(), and install (or replace, like
dup2(2)) the supervisor's resulting fd into the supervisee. One
replacement use-case would be to redirect the stdout and stderr of a
supervisee into log file descriptors opened by the supervisor.

The ioctl handling is based on the discussions[1] of how Extensible
Arguments should interact with ioctls. Instead of building size into
the addfd structure, make it a function of the ioctl command (which
is how sizes are normally passed to ioctls). To support forward and
backward compatibility, just mask out the direction and size, and match
everything. The size (and any future direction) checks are done along
with copy_struct_from_user() logic.

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues
with uapi highlighted before[2][3]. Although we could overload the
newfd field and use -1 to indicate that it is not to be used, doing
so requires changing the size of the fd field, and introduces struct
packing complexity.

[1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
[3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal

Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Jann Horn <jannh@google.com>
Cc: Robert Sesek <rsesek@google.com>
Cc: Chris Palmer <palmer@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Suggested-by: Matt Denton <mpdenton@google.com>
Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Reviewed-by: Will Drewry <wad@chromium.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-14 16:29:42 -07:00
Horatiu Vultur
ffb3adba64 net: bridge: Add port attribute IFLA_BRPORT_MRP_IN_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_IN_OPEN, which
allows to notify the userspace when the node lost the contiuity of
MRP_InTest frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:43 -07:00
Horatiu Vultur
559139cb04 bridge: uapi: mrp: Extend MRP_INFO attributes for interconnect status
Extend the existing MRP_INFO to return status of MRP interconnect. In
case there is no MRP interconnect on the node then the role will be
disabled so the other attributes can be ignored.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:43 -07:00
Horatiu Vultur
2801758391 bridge: uapi: mrp: Extend MRP attributes for MRP interconnect
Extend the existing MRP netlink attributes to allow to configure MRP
Interconnect:

IFLA_BRIDGE_MRP_IN_ROLE - the parameter type is br_mrp_in_role which
  contains the interconnect id, the ring id, the interconnect role(MIM
  or MIC) and the port ifindex that represents the interconnect port.

IFLA_BRIDGE_MRP_IN_STATE - the parameter type is br_mrp_in_state which
  contains the interconnect id and the interconnect state.

IFLA_BRIDGE_MRP_IN_TEST - the parameter type is br_mrp_start_in_test
  which contains the interconnect id, the interval at which to send
  MRP_InTest frames, how many test frames can be missed before declaring
  the interconnect ring open and the period which represents for how long
  to send MRP_InTest frames.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-14 13:46:42 -07:00
Linus Torvalds
e9919e11e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few quirks for the Elan touchpad driver, another Thinkpad is being
  switched over from PS/2 to native RMI4 interface, and we gave a brand
  new SW_MACHINE_COVER switch definition"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add more hardware ID for Lenovo laptops
  Input: i8042 - add Lenovo XiaoXin Air 12 to i8042 nomux list
  Revert "Input: elants_i2c - report resolution information for touch major"
  Input: elan_i2c - only increment wakeup count on touch
  Input: synaptics - enable InterTouch for ThinkPad X1E 1st gen
  ARM: dts: n900: remove mmc1 card detect gpio
  Input: add `SW_MACHINE_COVER`
2020-07-13 18:31:15 -07:00
David S. Miller
07dd1b7e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-07-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 36 non-merge commits during the last 7 day(s) which contain
a total of 62 files changed, 2242 insertions(+), 468 deletions(-).

The main changes are:

1) Avoid trace_printk warning banner by switching bpf_trace_printk to use
   its own tracing event, from Alan.

2) Better libbpf support on older kernels, from Andrii.

3) Additional AF_XDP stats, from Ciara.

4) build time resolution of BTF IDs, from Jiri.

5) BPF_CGROUP_INET_SOCK_RELEASE hook, from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 18:04:05 -07:00
Alexander A. Klimov
ed757328c3 atm: Replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13 17:01:44 -07:00
Ciara Loftus
0d80cb4612 xsk: Add xdp statistics to xsk_diag
Add xdp statistics to the information dumped through the xsk_diag interface

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-4-ciara.loftus@intel.com
2020-07-13 15:32:56 -07:00
Ciara Loftus
8aa5a33578 xsk: Add new statistics
It can be useful for the user to know the reason behind a dropped packet.
Introduce new counters which track drops on the receive path caused by:
1. rx ring being full
2. fill ring being empty

Also, on the tx path introduce a counter which tracks the number of times
we attempt pull from the tx ring when it is empty.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200708072835.4427-2-ciara.loftus@intel.com
2020-07-13 15:32:56 -07:00
Frank van der Linden
95ad37f90c NFSv4.2: add client side xattr caching.
Implement client side caching for NFSv4.2 extended attributes. The cache
is a per-inode hashtable, with name/value entries. There is one special
entry for the listxattr cache.

NFS inodes have a pointer to a cache structure. The cache structure is
allocated on demand, freed when the cache is invalidated.

Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
are collected by a separate shrinker, and freed more aggressively
than others.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 17:52:46 -04:00
Frank van der Linden
c132621047 nfs,nfsd: NFSv4.2 extended attribute protocol definitions
Add definitions for the new operations, errors and flags as defined
in RFC 8276 (File System Extended Attributes in NFSv4).

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-07-13 17:20:49 -04:00
Kent Gibson
21249616f0 gpio: uapi: fix misplaced comment line
The second line of the description for event_type is before the first.
Move it to after the first line.

Signed-off-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-07-12 10:22:01 +02:00
David S. Miller
71930d6102 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
All conflicts seemed rather trivial, with some guidance from
Saeed Mameed on the tc_ct.c one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11 00:46:00 -07:00
Linus Torvalds
5a764898af Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Restore previous behavior of CAP_SYS_ADMIN wrt loading networking
    BPF programs, from Maciej Żenczykowski.

 2) Fix dropped broadcasts in mac80211 code, from Seevalamuthu
    Mariappan.

 3) Slay memory leak in nl80211 bss color attribute parsing code, from
    Luca Coelho.

 4) Get route from skb properly in ip_route_use_hint(), from Miaohe Lin.

 5) Don't allow anything other than ARPHRD_ETHER in llc code, from Eric
    Dumazet.

 6) xsk code dips too deeply into DMA mapping implementation internals.
    Add dma_need_sync and use it. From Christoph Hellwig

 7) Enforce power-of-2 for BPF ringbuf sizes. From Andrii Nakryiko.

 8) Check for disallowed attributes when loading flow dissector BPF
    programs. From Lorenz Bauer.

 9) Correct packet injection to L3 tunnel devices via AF_PACKET, from
    Jason A. Donenfeld.

10) Don't advertise checksum offload on ipa devices that don't support
    it. From Alex Elder.

11) Resolve several issues in TCP MD5 signature support. Missing memory
    barriers, bogus options emitted when using syncookies, and failure
    to allow md5 key changes in established states. All from Eric
    Dumazet.

12) Fix interface leak in hsr code, from Taehee Yoo.

13) VF reset fixes in hns3 driver, from Huazhong Tan.

14) Make loopback work again with ipv6 anycast, from David Ahern.

15) Fix TX starvation under high load in fec driver, from Tobias
    Waldekranz.

16) MLD2 payload lengths not checked properly in bridge multicast code,
    from Linus Lüssing.

17) Packet scheduler code that wants to find the inner protocol
    currently only works for one level of VLAN encapsulation. Allow
    Q-in-Q situations to work properly here, from Toke
    Høiland-Jørgensen.

18) Fix route leak in l2tp, from Xin Long.

19) Resolve conflict between the sk->sk_user_data usage of bpf reuseport
    support and various protocols. From Martin KaFai Lau.

20) Fix socket cgroup v2 reference counting in some situations, from
    Cong Wang.

21) Cure memory leak in mlx5 connection tracking offload support, from
    Eli Britstein.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  mlxsw: pci: Fix use-after-free in case of failed devlink reload
  mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()
  net: macb: fix call to pm_runtime in the suspend/resume functions
  net: macb: fix macb_suspend() by removing call to netif_carrier_off()
  net: macb: fix macb_get/set_wol() when moving to phylink
  net: macb: mark device wake capable when "magic-packet" property present
  net: macb: fix wakeup test in runtime suspend/resume routines
  bnxt_en: fix NULL dereference in case SR-IOV configuration fails
  libbpf: Fix libbpf hashmap on (I)LP32 architectures
  net/mlx5e: CT: Fix memory leak in cleanup
  net/mlx5e: Fix port buffers cell size value
  net/mlx5e: Fix 50G per lane indication
  net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash
  net/mlx5e: Fix VXLAN configuration restore after function reload
  net/mlx5e: Fix usage of rcu-protected pointer
  net/mxl5e: Verify that rpriv is not NULL
  net/mlx5: E-Switch, Fix vlan or qos setting in legacy mode
  net/mlx5: Fix eeprom support for SFP module
  cgroup: Fix sock_cgroup_data on big-endian.
  selftests: bpf: Fix detach from sockmap tests
  ...
2020-07-10 18:16:22 -07:00
Kees Cook
47e33c05f9 seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.

Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10 16:01:52 -07:00
Mohammed Gamal
3edd68399d KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
allows userspace to query if the underlying architecture would
support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
(e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X)

The complications in this patch are due to unexpected (but documented)
behaviour we see with NPF vmexit handling in AMD processor.  If
SVM is modified to add guest physical address checks in the NPF
and guest #PF paths, we see the followning error multiple times in
the 'access' test in kvm-unit-tests:

            test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
            Dump mapping: address: 0x123400000000
            ------L4: 24c3027
            ------L3: 24c4027
            ------L2: 24c5021
            ------L1: 1002000021

This is because the PTE's accessed bit is set by the CPU hardware before
the NPF vmexit. This is handled completely by hardware and cannot be fixed
in software.

Therefore, availability of the new capability depends on a boolean variable
allow_smaller_maxphyaddr which is set individually by VMX and SVM init
routines. On VMX it's always set to true, on SVM it's only set to true
when NPT is not enabled.

CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Babu Moger <babu.moger@amd.com>
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Message-Id: <20200710154811.418214-10-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:53 -04:00
Jakub Kicinski
c7d759eb7b ethtool: add tunnel info interface
Add an interface to report offloaded UDP ports via ethtool netlink.

Now that core takes care of tracking which UDP tunnel ports the NICs
are aware of we can quite easily export this information out to
user space.

The responsibility of writing the netlink dumps is split between
ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
not always be loaded, yet we should always report the capabilities
of the NIC.

$ ethtool --show-tunnels eth0
Tunnel information for eth0:
  UDP port table 0:
    Size: 4
    Types: vxlan
    No entries
  UDP port table 1:
    Size: 4
    Types: geneve, vxlan-gpe
    Entries (1):
        port 1230, vxlan-gpe

v4:
 - back to v2, build fix is now directly in udp_tunnel.h
v3:
 - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
   not set.
v2:
 - fix string set count,
 - reorder enums in the uAPI,
 - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
   in docs and comments.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 13:54:00 -07:00
Linus Torvalds
a581387e41 Merge tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:

 - Fix memleak for error path in registered files (Yang)

 - Export CQ overflow state in flags, necessary to fix a case where
   liburing doesn't know if it needs to enter the kernel (Xiaoguang)

 - Fix for a regression in when user memory is accounted freed, causing
   issues with back-to-back ring exit + init if the ulimit -l setting is
   very tight.

* tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block:
  io_uring: account user memory freed when exit has been queued
  io_uring: fix memleak in io_sqe_files_register()
  io_uring: fix memleak in __io_sqe_files_update()
  io_uring: export cq overflow status to userspace
2020-07-10 09:57:57 -07:00
Masahiro Yamada
04aaca197f char: raw: do not leak CONFIG_MAX_RAW_DEVS to userspace
include/uapi/linux/raw.h leaks CONFIG_MAX_RAW_DEVS to userspace.

Userspace programs cannot use MAX_RAW_MINORS since CONFIG_MAX_RAW_DEVS
is not available anyway.

Remove the MAX_RAW_MINORS definition from the exported header, and use
CONFIG_MAX_RAW_DEVS in drivers/char/raw.c

While I was here, I converted printk(KERN_WARNING ...) to pr_warn(...)
and stretched the warning message.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20200617083313.183184-1-masahiroy@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 14:50:51 +02:00
Hans de Goede
316b003540 virt: vbox: Add a few new vmmdev request types to the userspace whitelist
Upstream VirtualBox has defined and is using a few new request types for
vmmdev requests passed through /dev/vboxguest to the hypervisor.

Add the defines for these to vbox_vmmdev_types.h and add add them to the
whitelists of vmmdev requests which userspace is allowed to make.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-7-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 13:45:32 +02:00
Hans de Goede
631beddc54 virt: vbox: Add support for the new VBG_IOCTL_ACQUIRE_GUEST_CAPABILITIES ioctl
Add support for the new VBG_IOCTL_ACQUIRE_GUEST_CAPABILITIES ioctl, this
is necessary for automatic resizing of the guest resolution to match the
VM-window size to work with the new VMSVGA virtual GPU which is now the
new default in VirtualBox.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1789545
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-6-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 13:45:32 +02:00
Hans de Goede
f794db6841 virt: vbox: Fix VBGL_IOCTL_VMMDEV_REQUEST_BIG and _LOG req numbers to match upstream
Until this commit the mainline kernel version (this version) of the
vboxguest module contained a bug where it defined
VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG using
_IOC(_IOC_READ | _IOC_WRITE, 'V', ...) instead of
_IO(V, ...) as the out of tree VirtualBox upstream version does.

Since the VirtualBox userspace bits are always built against VirtualBox
upstream's headers, this means that so far the mainline kernel version
of the vboxguest module has been failing these 2 ioctls with -ENOTTY.
I guess that VBGL_IOCTL_VMMDEV_REQUEST_BIG is never used causing us to
not hit that one and sofar the vboxguest driver has failed to actually
log any log messages passed it through VBGL_IOCTL_LOG.

This commit changes the VBGL_IOCTL_VMMDEV_REQUEST_BIG and VBGL_IOCTL_LOG
defines to match the out of tree VirtualBox upstream vboxguest version,
while keeping compatibility with the old wrong request defines so as
to not break the kernel ABI in case someone has been using the old
request defines.

Fixes: f6ddd094f5 ("virt: Add vboxguest driver for Virtual Box Guest integration UAPI")
Cc: stable@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200709120858.63928-2-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-10 13:40:19 +02:00
Danielle Ratson
a0f49b5486 devlink: Add a new devlink port split ability attribute and pass to netlink
Add a new attribute that indicates the split ability of devlink port.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:30 -07:00