Very few (mlxsw) upstream drivers seem to allow offload of chains
other than 0. Save driver developers typing and add a helper for
checking both if ethtool's TC offload flag is on and if chain is 0.
This helper will set the extack appropriately in both error cases.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for calling sock_ops BPF program when there is a TCP state
change. Two arguments are used; one for the old state and another for
the new state.
There is a new enum in include/uapi/linux/bpf.h that exports the TCP
states that prepends BPF_ to the current TCP state names. If it is ever
necessary to change the internal TCP state values (other than adding
more to the end), then it will become necessary to convert from the
internal TCP state value to the BPF value before calling the BPF
sock_ops function. There are a set of compile checks added in tcp.c
to detect if the internal and BPF values differ so we can make the
necessary fixes.
New op: BPF_SOCK_OPS_STATE_CB.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adds support for calling sock_ops BPF program when there is a
retransmission. Three arguments are used; one for the sequence number,
another for the number of segments retransmitted, and the last one for
the return value of tcp_transmit_skb (0 => success).
Does not include syn-ack retransmissions.
New op: BPF_SOCK_OPS_RETRANS_CB.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adds an optional call to sock_ops BPF program based on whether the
BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
The BPF program is passed 2 arguments: icsk_retransmits and whether the
RTO has expired.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
use is to determine if there should be calls to sock_ops bpf program at
various points in the TCP code. The field is initialized to zero,
disabling the calls. A sock_ops BPF program can set it, per connection and
as necessary, when the connection is established.
It also adds support for reading and writting the field within a
sock_ops BPF program. Reading is done by accessing the field directly.
However, writing is done through the helper function
bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
is trying to set a callback that is not supported in the current kernel
(i.e. running an older kernel). The helper function returns 0 if it was
able to set all of the bits set in the argument, a positive number
containing the bits that could not be set, or -EINVAL if the socket is
not a full TCP socket.
Examples of where one could call the bpf program:
1) When RTO fires
2) When a packet is retransmitted
3) When the connection terminates
4) When a packet is sent
5) When a packet is received
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adds support for passing up to 4 arguments to sock_ops bpf functions. It
reusues the reply union, so the bpf_sock_ops structures are not
increased in size.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
struct tcp_sock or struct sock fields. This required adding a new
field "temp" to struct bpf_sock_ops_kern for temporary storage that
is used by sock_ops_convert_ctx_access. It is used to store and recover
the contents of a register, so the register can be used to store the
address of the sk. Since we cannot overwrite the dst_reg because it
contains the pointer to ctx, nor the src_reg since it contains the value
we want to store, we need an extra register to contain the address
of the sk.
Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
GET or SET macros depending on the value of the TYPE field.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since commit e76004093d ("fs/buffer.c: remove unnecessary init
operation after allocating buffer_head"), there are no callers of
init_buffer() outside of init_page_buffers(). So just fold it into
init_page_buffers().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Since commit 9c630ebefe ("ovl: simplify permission checking"),
overlayfs doesn't call __inode_permission() anymore, which leaves no
users other than inode_permission(). So just fold it back into
inode_permission().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is the per-I/O equivalent of O_APPEND to support atomic append
operations on any open file.
If a file is opened with O_APPEND, pwrite() ignores the offset and
always appends data to the end of the file. RWF_APPEND enables atomic
append and pwrite() with offset on a single file descriptor.
Signed-off-by: Jürg Billeter <j@bitron.ch>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Johan Hedberg says:
====================
pull request: bluetooth-next 2018-01-25
Here's one last bluetooth-next pull request for the 4.16 kernel:
- Improved support for Intel controllers
- New set_parity method to serdev (agreed with maintainers to be taken
through bluetooth-next)
- Fix error path in hci_bcm (missing call to serdev close)
- New ID for BCM4343A0 UART controller
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at (null)"
Let's add a helper to check if update_pmtu is available before calling it.
Fixes: 52a589d51f ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff44 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the DT PCI code is merged into drivers/pci, of_irq_parse_pci() can
be static.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Frank Rowand <frowand.list@gmail.com>
To allow accelerated implementations to fall back to the generic
routines, e.g., in contexts where a SIMD based implementation is
not allowed to run, expose the generic SHA3 init/update/final
routines to other modules.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In preparation of exposing the generic SHA3 implementation to other
versions as a fallback, simplify the code, and remove an inconsistency
in the output handling (endian swabbing rsizw words of state before
writing the output does not make sense)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some of the drivers may use the macro at runtime flow, like
struct property_entry p[10];
...
p[index++] = PROPERTY_ENTRY_U8("u8 property", u8_data);
In that case and absence of the data type compiler fails the build:
drivers/char/ipmi/ipmi_dmi.c:79:29: error: Expected ; at end of statement
drivers/char/ipmi/ipmi_dmi.c:79:29: error: got {
Acked-by: Corey Minyard <cminyard@mvista.com>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull networking fixes from David Miller:
1) Avoid negative netdev refcount in error flow of xfrm state add, from
Aviad Yehezkel.
2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
From Yossi Kuperman.
3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
failing to allocate an appropriate amount of headroom. From
Guillaume Nault.
4) Fix memory leak in vmxnet3 driver, from Neil Horman.
5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
from Wolfgang Bumiller.
6) Restrict what kinds of sockets can be bound to the KCM multiplexer
and also disallow when another layer has attached to the socket and
made use of sk_user_data. From Tom Herbert.
7) Fix use before init of IOTLB in vhost code, from Jason Wang.
8) Correct STACR register write bit definition in IBM emac driver, from
Ivan Mikhaylov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/ibm/emac: wrong bit is used for STA control register write
net/ibm/emac: add 8192 rx/tx fifo size
vhost: do not try to access device IOTLB when not initialized
vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
i40e: flower: check if TC offload is enabled on a netdev
qed: Free reserved MR tid
qed: Remove reserveration of dpi for kernel
kcm: Check if sk_user_data already set in kcm_attach
kcm: Only allow TCP sockets to be attached to a KCM mux
net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
net: sched: em_nbyte: don't add the data offset twice
mlxsw: spectrum_router: Don't log an error on missing neighbor
vmxnet3: repair memory leak
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
pppoe: take ->needed_headroom of lower device into account on xmit
xfrm: fix boolean assignment in xfrm_get_type_offload
xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
xfrm: fix error flow in case of add state fails
xfrm: Add SA to hardware at the end of xfrm_state_construct()
Only two of dev_ioctl() callers may pass SIOCGIFCONF to it.
Separating that codepath from the rest of dev_ioctl() allows both
to simplify dev_ioctl() itself (all other cases work with struct ifreq *)
*and* seriously simplify the compat side of that beast: all it takes
is passing to inet_gifconf() an extra argument - the size of individual
records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)). With
dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf()
that's easy to arrange.
As the result, compat side of SIOCGIFCONF doesn't need any
allocations, copy_in_user() back and forth, etc.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
cls_bpf now guarantees that only device-bound programs are
allowed with skip_sw. The drivers no longer pay attention to
flags on filter load, therefore the bpf_offload member can be
removed. If flags are needed again they should probably be
added to struct tc_cls_common_offload instead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the tc_cls_common_offload_init() helper function to
tc_cls_common_offload_init_deprecated() and add a new implementation
which also takes flags argument. We will only set extack if flags
indicate that offload is forced (skip_sw) otherwise driver errors
should be ignored, as they don't influence the overall filter
installation.
Note that we need the tc_skip_hw() helper for new version, therefore
it is added later in the file.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate extack to cls->destroy callbacks when called from
non-error paths. On error paths pass NULL to avoid overwriting
the failure message.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCF_LAYER_LINK and TCF_LAYER_NETWORK returned the same pointer as
skb->data points to the network header.
Use skb_mac_header instead.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull tracing fixes from Steven Rostedt:
"With the new ORC unwinder, ftrace stack tracing became disfunctional.
One was that ORC didn't know how to handle the ftrace callbacks in
general (which Josh fixed).
The other was that ORC would just bail if it hit a dynamically
allocated trampoline. Which means all ftrace stack tracing that
happens from the function tracer would produce no results (that
includes killing the max stack size tracer). I added a check to the
ORC unwinder to see if the trampoline belonged to ftrace, and if it
did, use the orc entry of the static trampoline that was used to
create the dynamic one (it would be identical).
Finally, I noticed that the skip values of the stack tracing were out
of whack. I went through and fixed them up"
* tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Update stack trace skipping for ORC unwinder
ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
x86/ftrace: Fix ORC unwinding from ftrace handlers
commit fbb16563c6 ("ASoC: snd_soc_component_driver has pmdown_time")
added new .pmdown_time which is for inverted version of current
.ignore_pmdown_time
But it is confusable name. Let's rename it to .use_pmdown_time
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Those helpers are going to be used by overlayfs to implement
NFS export decode.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Commit 513674b5a2 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.
The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled. Fix it to return the effective value, whether
that has been set at the socket or net namespace level.
Fixes: 513674b5a2 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on
act_csum configuration, to reduce the effects of contention in the data
path when multiple readers are present.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...
So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling a in-kernel library.
It is still a bit theoretical, for lack of real hardware, and will
likely need adjustements down the road. But we used the cxlflash
driver as a guinea pig.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.
For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process and a write to that page will trigger an interrupt.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.
The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.
The driver exposes the device AFUs as a char device in /dev/ocxl/
Note that the driver currently doesn't handle memory attached to the
opencapi device.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the opencapi protocol, host memory contexts are referenced by a
'actag'. During setup, a driver must tell the device how many actags
it can used, and what values are acceptable.
On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.
This is not powernv specific per say, but rather a consequence of the
opencapi configuration specification being quite general. The number
of available actags on POWER9 makes it more likely to be hit. This is
somewhat mitigated by the fact that existing AFUs are coded by
requesting a reasonable count of actags and existing devices carry
only one AFU.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.
Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic
transctions to be requested by, routed through and completed by PCIe
components. Routing and completion do not require software support.
Component support for each is detectable via the DEVCAP2 register.
A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is
set. This should be set only if the Completer and all intermediate routing
elements support AtomicOps.
A concrete example is the AMD Fiji-class GPU (which is capable of making
AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing)
with a Haswell host bridge (advertising AtomicOp completion support).
Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp
requests. This checks to be sure the Root Port supports completion of the
desired AtomicOp sizes and the path to the Root Port supports routing the
AtomicOps.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
[bhelgaas: changelog, comments, whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
NFS-over-RDMA client updates for Linux 4.16
New features:
- xprtrdma tracepoints
Bugfixes and cleanups:
- Fix memory leak if rpcrdma_buffer_create() fails
- Fix allocating extra rpcrdma_reps for the backchannel
- Remove various unused and redundant variables and lock cycles
- Fix IPv6 support in xprt_rdma_set_port()
- Fix memory leak by calling buf_free for callback replies
- Fix "bytes registered" accounting
- Fix kernel-doc comments
- SUNRPC tracepoint cleanups for consistent information
- Optimizations for __rpc_execute()
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.
The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Backchannel tasks will not have a reference to the rpc_clnt. Return -1 for
cl_clid in that case.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trondmy@gmail.com>