1M is mandatory to be supported by LE controllers and the same
would be set in power on. This patch defines hdev flags for
LE PHYs and set 1M to default.
Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently nft uses inlined variants for common operations
such as 'ip saddr 1.2.3.4' instead of an indirect call.
Also handle meta get operations and lookups without indirect call,
both are builtin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The meter code would create an entry for each new meter. However, it
would not set the meter id in the new entry, so every meter would appear
to have a meter id of zero. This commit properly sets the meter id when
adding the entry.
Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an invalid MTU value is set through rtnetlink return extra error
information instead of putting message in kernel log. For other cases
where there is no visible API, keep the error report in the log.
Example:
# ip li set dev enp12s0 mtu 10000
Error: mtu greater than device maximum.
# ifconfig enp12s0 mtu 10000
SIOCSIFMTU: Invalid argument
# dmesg | tail -1
[ 2047.795467] enp12s0: mtu greater than device maximum
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the minimum and maximum MTU allowed on a device
via netlink so that it can be displayed by tools like
ip link.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the feature described in rfc1812#section-5.3.5.2
and rfc2644. It allows the router to forward directed broadcast when
sysctl bc_forwarding is enabled.
Note that this feature could be done by iptables -j TEE, but it would
cause some problems:
- target TEE's gateway param has to be set with a specific address,
and it's not flexible especially when the route wants forward all
directed broadcasts.
- this duplicates the directed broadcasts so this may cause side
effects to applications.
Besides, to keep consistent with other os router like BSD, it's also
necessary to implement it in the route rx path.
Note that route cache needs to be flushed when bc_forwarding is
changed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When freebind feature is set of an IPv6 socket, any source address can
be used when sending UDP datagrams using IPv6 PKTINFO ancillary
message. Global non-local bind feature was added in commit
35a256fee5 ("ipv6: Nonlocal bind") for IPv6. This commit also allows
IPv6 source address spoofing when non-local bind feature is enabled.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code is problematic because the iov_iter is reverted and
never advanced in the non-error case. This patch skips the revert in the
non-error case. This patch also fixes the amount by which the iov_iter
is reverted. Currently, iov_iter is reverted by size, which can be
greater than the amount by which the iter was actually advanced.
Instead, only revert by the amount that the iter was advanced.
Fixes: 4718799817 ("tls: Fix zerocopy_from_iter iov handling")
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tls_push_record either returns 0 on success or a negative value on failure.
This patch removes code that would only be executed if tls_push_record
were to return a positive value.
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some very small BDPs (with just a few packets) there was a
quantization effect where the target number of packets in flight
during the super-unity-gain (1.25x) phase of gain cycling was
implicitly truncated to a number of packets no larger than the normal
unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
scenarios some flows could get stuck with a lower bandwidth, because
they did not push enough packets inflight to discover that there was
more bandwidth available. This was really only an issue in multi-flow
LAN scenarios, where RTTs and BDPs are low enough for this to be an
issue.
This fix ensures that gain cycling can raise inflight for small BDPs
by ensuring that in PROBE_BW mode target inflight values with a
super-unity gain are always greater than inflight values with a gain
<= 1. Importantly, this applies whether the inflight value is
calculated for use as a cwnd value, or as a target inflight value for
the end of the super-unity phase in bbr_is_next_cycle_phase() (both
need to be bigger to ensure we can probe with more packets in flight
reliably).
This is a candidate fix for stable releases.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.
Found with the help of Smatch:
net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) API fixes for libbpf's BTF mapping of map key/value types in order
to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
markings, from Martin.
2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
consumer pointer of the RX queue, from Björn.
3) Fix __xdp_return() to check for NULL pointer after the rhashtable
lookup that retrieves the allocator object, from Taehee.
4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
by 4 bytes which got removed from overall stack usage, from Wang.
5) Fix bpf_skb_load_bytes_relative() length check to use actual
packet length, from Daniel.
6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
handler, from Thomas.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove BUG_ON() from fib_compute_spec_dst routine and check
in_dev pointer during flowi4 data structure initialization.
fib_compute_spec_dst routine can be run concurrently with device removal
where ip_ptr net_device pointer is set to NULL. This can happen
if userspace enables pkt info on UDP rx socket and the device
is removed while traffic is flowing
Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The len > skb_headlen(skb) cannot be used as a maximum upper bound
for the packet length since it does not have any relation to the full
linear packet length when filtering is used from upper layers (e.g.
in case of reuseport BPF programs) as by then skb->data, skb->len
already got mangled through __skb_pull() and others.
Fixes: 4e1ec56cdc ("bpf: add skb_load_bytes_relative helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
tipc_bcast_init() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_nametbl_init() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This field is not used.
Treat PPPIOC*MRU the same way as PPPIOC*FLAGS: "get" requests return 0,
while "set" requests vadidate the user supplied pointer but discard its
value.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This field is not used.
Keep validating user input in PPPIOCSFLAGS. Even though we discard the
value, it would look wrong to succeed if an invalid address was passed
from userspace.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ingress, a network device such as a switch assigns to packets
priority based on various criteria. Common options include interpreting
PCP and DSCP fields according to user configuration. When a packet
egresses the switch, a reverse process may rewrite PCP and/or DSCP
values according to packet priority.
The following three functions support a) obtaining a DSCP-to-priority
map or vice versa, and b) finding default-priority entries in APP
database.
The DCB subsystem supports for APP entries a very generous M:N mapping
between priorities and protocol identifiers. Understandably,
several (say) DSCP values can map to the same priority. But this
asymmetry holds the other way around as well--one priority can map to
several DSCP values. For this reason, the following functions operate in
terms of bitmaps, with ones in positions that match some APP entry.
- dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice
a map of DSCP-to-priority-mask, which gives for each DSCP value a
bitmap of priorities related to that DSCP value by APP, along the
lines of dcb_ieee_getapp_mask().
- dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given
netdevice a map from priorities to a bitmap of DSCPs.
- dcb_ieee_getapp_default_prio_mask() which finds all default-priority
rules for a given port in APP database, and returns a mask of
priorities allowed by these default-priority rules.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function dcb_app_lookup walks the list of specified DCB APP entries,
looking for one that matches a given criteria: ifindex, selector,
protocol ID and optionally also priority. The "don't care" value for
priority is set to 0, because that priority has not been allowed under
CEE regime, which predates the IEEE standardization.
Under IEEE, 0 is a valid priority number. But because dcb_app_lookup
considers zero a wild card, attempts to add an APP entry with priority 0
fail when other entries exist for a given ifindex / selector / PID
triplet.
Fix by changing the wild-card value to -1.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a chain is empty and not explicitly created by a user,
such chain should not exist. The only exception is if there is
an action "goto chain" pointing to it. In that case, don't show the
chain in the dump. Track the chain references held by actions and
use them to find out if a chain should or should not be shown
in chain dump.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2018-07-27
1) Extend the output_mark to also support the input direction
and masking the mark values before applying to the skb.
2) Add a new lookup key for the upcomming xfrm interfaces.
3) Extend the xfrm lookups to match xfrm interface IDs.
4) Add virtual xfrm interfaces. The purpose of these interfaces
is to overcome the design limitations that the existing
VTI devices have.
The main limitations that we see with the current VTI are the
following:
VTI interfaces are L3 tunnels with configurable endpoints.
For xfrm, the tunnel endpoint are already determined by the SA.
So the VTI tunnel endpoints must be either the same as on the
SA or wildcards. In case VTI tunnel endpoints are same as on
the SA, we get a one to one correlation between the SA and
the tunnel. So each SA needs its own tunnel interface.
On the other hand, we can have only one VTI tunnel with
wildcard src/dst tunnel endpoints in the system because the
lookup is based on the tunnel endpoints. The existing tunnel
lookup won't work with multiple tunnels with wildcard
tunnel endpoints. Some usecases require more than on
VTI tunnel of this type, for example if somebody has multiple
namespaces and every namespace requires such a VTI.
VTI needs separate interfaces for IPv4 and IPv6 tunnels.
So when routing to a VTI, we have to know to which address
family this traffic class is going to be encapsulated.
This is a lmitation because it makes routing more complex
and it is not always possible to know what happens behind the
VTI, e.g. when the VTI is move to some namespace.
VTI works just with tunnel mode SAs. We need generic interfaces
that ensures transfomation, regardless of the xfrm mode and
the encapsulated address family.
VTI is configured with a combination GRE keys and xfrm marks.
With this we have to deal with some extra cases in the generic
tunnel lookup because the GRE keys on the VTI are actually
not GRE keys, the GRE keys were just reused for something else.
All extensions to the VTI interfaces would require to add
even more complexity to the generic tunnel lookup.
So to overcome this, we developed xfrm interfaces with the
following design goal:
It should be possible to tunnel IPv4 and IPv6 through the same
interface.
No limitation on xfrm mode (tunnel, transport and beet).
Should be a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind the
interface.
Interfaces should be configured with a new key that must match a
new policy/SA lookup key.
The lookup logic should stay in the xfrm codebase, no need to
change or extend generic routing and tunnel lookups.
Should be possible to use IPsec hardware offloads of the underlying
interface.
5) Remove xfrm pcpu policy cache. This was added after the flowcache
removal, but it turned out to make things even worse.
From Florian Westphal.
6) Allow to update the set mark on SA updates.
From Nathan Harold.
7) Convert some timestamps to time64_t.
From Arnd Bergmann.
8) Don't check the offload_handle in xfrm code,
it is an opaque data cookie for the driver.
From Shannon Nelson.
9) Remove xfrmi interface ID from flowi. After this pach
no generic code is touched anymore to do xfrm interface
lookups. From Benedict Wong.
10) Allow to update the xfrm interface ID on SA updates.
From Nathan Harold.
11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
From YueHaibing.
12) Return more detailed errors on xfrm interface creation.
From Benedict Wong.
13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
From the kbuild test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2018-07-27
1) Fix PMTU handling of vti6. We update the PMTU on
the xfrm dst_entry which is not cached anymore
after the flowchache removal. So update the
PMTU of the original dst_entry instead.
From Eyal Birger.
2) Fix a leak of kernel memory to userspace.
From Eric Dumazet.
3) Fix a possible dst_entry memleak in xfrm_lookup_route.
From Tommi Rantala.
4) Fix a skb leak in case we can't call nlmsg_multicast
from xfrm_nlmsg_multicast. From Florian Westphal.
5) Fix a leak of a temporary buffer in the error path of
esp6_input. From Zhen Lei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 44e2b838c2 ("xfrm: Return detailed errors from xfrmi_newlink")
CC: Benedict Wong <benedictwong@google.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
rhashtable_lookup() can return NULL. so that NULL pointer
check routine should be added.
Fixes: 02b55e5657 ("xdp: add MEM_TYPE_ZERO_COPY")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Once user manually deletes the chain using "chain del", the chain cannot
be marked as explicitly created anymore.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 32a4f5ecd7 ("net: sched: introduce chain object to uapi")
Signed-off-by: David S. Miller <davem@davemloft.net>
The zerocopy path ultimately calls iov_iter_get_pages, which defines the
step function for ITER_KVECs as simply, return -EFAULT. Taking the
non-zerocopy path for ITER_KVECs avoids the unnecessary fallback.
See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u
for a discussion of why zerocopy for vmalloc data is not a good idea.
Discovered while testing NBD traffic encrypted with ktls.
Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code at line 1850 is unreachable. Fix this by removing the break
statement above it, so the code for case RTM_GETCHAIN can be
properly executed.
Addresses-Coverity-ID: 1472050 ("Structurally dead code")
Fixes: 32a4f5ecd7 ("net: sched: introduce chain object to uapi")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tunnel reception hook is only used by l2tp_ppp for skipping PPP
framing bytes. This is a session specific operation, but once a PPP
session sets ->recv_payload_hook on its tunnel, all frames received by
the tunnel will enter pppol2tp_recv_payload_hook(), including those
targeted at Ethernet sessions (an L2TPv3 tunnel can multiplex PPP and
Ethernet sessions).
So this mechanism is wrong, and uselessly complex. Let's just move this
functionality to the pppol2tp rx handler and drop ->recv_payload_hook.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
when tipc_own_id failed to obtain node identity,dev_put should
be call before return -EINVAL.
Fixes: 682cd3cf94 ("tipc: confgiure and apply UDP bearer MTU on running links")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed checks against non-NULL before calling kfree_skb() and
crypto_free_aead(). These functions are safe to be called with NULL
as an argument.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix dev_change_tx_queue_len so it rolls back original value
upon a failure in dev_qdisc_change_tx_queue_len.
This is already done for notifirers' failures, share the code.
In case of failure in dev_qdisc_change_tx_queue_len, some tx queues
would still be of the new length, while they should be reverted.
Currently, the revert is not done, and is marked with a TODO label
in dev_qdisc_change_tx_queue_len, and should find some nice solution
to do it.
Yet it is still better to not apply the newly requested value.
Fixes: 48bfd55e7e ("net_sched: plug in qdisc ops change_tx_queue_len")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will allow to install a child qdisc under cbs. The main use case
is to install ETF (Earliest TxTime First) qdisc under cbs, so there's
another level of control for time-sensitive traffic.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call in svc_rdma_post_chunk_ctxt() does actually use bad_wr.
Fixes: ed288d74a9 ("net/xprtrdma: Simplify ib_post_(send|recv|srq_recv)() calls")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, code at label *out* is unreachable. Fix this by updating
variable *ret* with -EINVAL, so the jump to *out* can be properly
executed instead of directly returning from function.
Addresses-Coverity-ID: 1472059 ("Structurally dead code")
Fixes: 1e2b44e78e ("rds: Enable RDS IPv6 support")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build error, implicit declaration of function __inet6_ehashfn shows up
When RDS is enabled but not IPV6.
net/rds/connection.c: In function ‘rds_conn_bucket’:
net/rds/connection.c:67:9: error: implicit declaration of function ‘__inet6_ehashfn’; did you mean ‘__inet_ehashfn’? [-Werror=implicit-function-declaration]
hash = __inet6_ehashfn(lhash, 0, fhash, 0, rds_hash_secret);
^~~~~~~~~~~~~~~
__inet_ehashfn
Current code adds IPV6 as a depends on in config RDS.
Fixes: eee2fa6ab3 ("rds: Changing IP address internal representation to struct in6_addr")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send an orderly DELETE LINK request before termination of a link group,
add support for client triggered DELETE LINK processing. And send a
disorderly DELETE LINK before module is unloaded.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remember the fallback reason code and the peer diagnosis code for
smc sockets, and provide them in smc_diag.c to the netlink interface.
And add more detailed reason codes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMC code uses the base gid for VLAN traffic. The gids exchanged in
the CLC handshake and the gid index used for the QP have to switch
from the base gid to the appropriate vlan gid.
When searching for a matching IB device port for a certain vlan
device, it does not make sense to return an IB device port, which
is not enabled for the used vlan_id. Add another check whether a
vlan gid exists for a certain IB device port.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>