Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_bpf.flags is the input member for installing the program.
netdev_bpf.prog_flags is the output member for querying. Set
the correct one on query.
Fixes: 92f0292b35 ("net: xdp: report flags program was installed with on query")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Build bot reported warning about invalid printk formats on 32bit
architectures. Use %zu for size_t and %zd ptr diff.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For XPB registers reads, some island IDs require special handling (e.g.
ARM island), which is already taken care of in nfp_xpb_readl(), so use
that instead of a straight CPP read.
Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as
0xffffffff. It has also been observed to cause a system reboot.
With this fix correct values are reported, none of which are 0xffffffff.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: 0e6c4955e1 ("nfp: dump CPP, XPB and direct ME CSRs")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In TLV-based ethtool debug dumps, don't do a CPP read for absolute
rtsyms, use the addr field in the symbol table directly as the value.
Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros.
With this fix the correct value, 0x0000004a 0x00000000 is reported.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: e1e798e3fd ("nfp: dump rtsyms")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware flashing takes around 60s (specified to not take more than
70s). Prevent hogging the RTNL lock in this time and make use of the
longer timeout for the NSP command. The timeout is set to 2.5 * 70
seconds.
We only allow flashing the firmware from reprs or PF netdevs. VFs do not
have an app reference.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware flashing NSP operation takes longer to execute than the
current default timeout. We need a mechanism to set a longer timeout for
some commands. This patch adds the infrastructure to this.
The default timeout is still 30 seconds.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the program is simple and has only one adjust head call
with constant parameters, we can check that the call will
always succeed at translation time. We need to track the
location of the call and make sure parameters are always
the same. We also have to check the parameters against
datapath constraints and ETH_HLEN.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support bpf_xdp_adjust_head(). We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add skeleton of verifier checks and translation handler
for call instructions. Make sure jump target resolution
will not treat them as jumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF FW creates a run time symbol called bpf_capabilities which
contains TLV-formatted capability information. Allocate app
private structure to store parsed capabilities and add a skeleton
of parsing logic.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow users outside of core reading area sizes. This was not needed
previously because whatever entity created the area would usually know
what size it asked for. The nfp_rtsym_map() helper, however, will
allocate the area based on the size of an RT-symbol with given name.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Convert the requested dump level parameter to big-endian at the start of
nfp_net_dump_calculate_size() and nfp_net_dump_populate_buffer(), then
compare and assign it directly where needed in the traversal and prolog
code. This decreases the total number of conversions used.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port matching is selected by default on every rule so remove check for it
and delete 'else' side of the statement. Remove nfp_flower_meta_one as now
it will not feature in the code. Rename nfp_flower_meta_two given that one
has been removed.
'Additional metadata' if statement can never be true so remove it as well.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the matching of mac/mpls as a default selection. These are not
necessarily set by a TC rule (unlike the port). Previously a mac/mpls
field would exist in every match and be masked out if not used. This patch
has no impact on functionality but removes unnessary memory assignment in
the match cmsg.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.
If we would support moving target netdev across netns, using
pointer would be better than ifindex.
This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The spec defines CSR address ranges for indirect ME CSRs. For Each TLV
chunk in the spec, dump a chunk that includes the spec and the data
over the defined address range.
- Each indirect CSR has 8 contexts. To read one context, first write the
context to a specific derived address, read it back, and then read the
register value.
- For each address, read and dump all 8 contexts in this manner.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add spec TLV for hwinfo field, containing key string as data.
- Add dump TLV for hwinfo field, with data being key and value as packed
zero-terminated strings.
- If specified hwinfo field is not found, dump the spec TLV as -ENOENT
error.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Dump hwinfo as separate TLV chunk, in a packed format containing
zero-separated key and value strings.
- This provides additional debug context, if requested by the dumpspec.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Load the TLV-based binary specification of what needs to be included in
a dump, from the "_abi_dump_spec" rtsymbol. If the symbol is not defined,
then dumps for levels >= 1 are not supported.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Skeleton code to perform a binary debug dump via ethtoolops
"set_dump", "get_dump_flags" and "get_dump_data", i.e. the ethtool
-W/w mechanism.
- Skeleton functions for debugdump operations provided.
- An integer "dump level" can be specified, this is stored between
ethtool invocations. Dump level 0 is still the "arm.diag" resource for
backward compatibility. Other dump levels each define a set of state
information to include in the dump, driven by a spec from FW.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we swapped the tx_packets, tx_bytes and tx_dropped counters
with rx_packets, rx_bytes and rx_dropped counters, respectively. This
behaviour is correct and expected for VF representors but it should not
be swapped for physical port mac representors.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since day one of XDP drivers had to remember to free the program
on the remove path. This leads to code duplication and is error
prone. Make the stack query the installed programs on unregister
and if something is installed, remove the program. Freeing of
program attached to XDP generic is moved from free_netdev() as well.
Because the remove will now be called before notifiers are
invoked, BPF offload state of the program will not get destroyed
before uninstall.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some drivers enforce that flags on program replacement and
removal must match the flags passed on install. This leaves
the possibility open to enable simultaneous loading
of XDP programs both to HW and DRV.
Allow such drivers to report the flags back to the stack.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch add the optimization frontend, but adding a new eBPF IR scan
pass "nfp_bpf_opt_ldst_gather".
The pass will traverse the IR to recognize the load/store pairs sequences
that come from lowering of memory copy builtins.
The gathered memory copy information will be kept in the meta info
structure of the first load instruction in the sequence and will be
consumed by the optimization backend added in the previous patches.
NOTE: a sequence with cross memory access doesn't qualify this
optimization, i.e. if one load in the sequence will load from place that
has been written by previous store. This is because when we turn the
sequence into single CPP operation, we are reading all contents at once
into NFP transfer registers, then write them out as a whole. This is not
identical with what the original load/store sequence is doing.
Detecting cross memory access for two random pointers will be difficult,
fortunately under XDP/eBPF's restrictied runtime environment, the copy
normally happen among map, packet data and stack, they do not overlap with
each other.
And for cases supported by NFP, cross memory access will only happen on
PTR_TO_PACKET. Fortunately for this, there is ID information that we could
do accurate memory alias check.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When the gathered copy length is bigger than 32-bytes and within 128-bytes
(the maximum length a single CPP Pull/Push request can finish), the
strategy of read/write are changeed into:
* Read.
- use direct reference mode when length is within 32-bytes.
- use indirect mode when length is bigger than 32-bytes.
* Write.
- length <= 8-bytes
use write8 (direct_ref).
- length <= 32-byte and 4-bytes aligned
use write32 (direct_ref).
- length <= 32-bytes but not 4-bytes aligned
use write8 (indirect_ref).
- length > 32-bytes and 4-bytes aligned
use write32 (indirect_ref).
- length > 32-bytes and not 4-bytes aligned and <= 40-bytes
use write32 (direct_ref) to finish the first 32-bytes.
use write8 (direct_ref) to finish all remaining hanging part.
- length > 32-bytes and not 4-bytes aligned
use write32 (indirect_ref) to finish those 4-byte aligned parts.
use write8 (direct_ref) to finish all remaining hanging part.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For NFP, we want to re-group a sequence of load/store pairs lowered from
memcpy/memmove into single memory bulk operation which then could be
accelerated using NFP CPP bus.
This patch extends the existing load/store auxiliary information by adding
two new fields:
struct bpf_insn *paired_st;
s16 ldst_gather_len;
Both fields are supposed to be carried by the the load instruction at the
head of the sequence. "paired_st" is the corresponding store instruction at
the head and "ldst_gather_len" is the gathered length.
If "ldst_gather_len" is negative, then the sequence is doing memory
load/store in descending order, otherwise it is in ascending order. We need
this information to detect overlapped memory access.
This patch then optimize memory bulk copy when the copy length is within
32-bytes.
The strategy of read/write used is:
* Read.
Use read32 (direct_ref), always.
* Write.
- length <= 8-bytes
write8 (direct_ref).
- length <= 32-bytes and is 4-byte aligned
write32 (direct_ref).
- length <= 32-bytes but is not 4-byte aligned
write8 (indirect_ref).
NOTE: the optimization should not change program semantics. The destination
register of the last load instruction should contain the same value before
and after this optimization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It is usual that we need to check if one BPF insn is for loading/storeing
data from/to memory.
Therefore, it makes sense to factor out related code to become common
helper functions.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When immed is used with No-Dest, the emitter should use reg.dst instead of
reg.areg for the destination, using the latter will actually encode
register zero.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The NFP normally requires the source operands to be difference addressing
modes, but we should rule out the very special NN_REG_NONE type.
There are instruction that ignores both A/B operands, for example:
local_csr_rd
For these instructions, we might pass the same operand type, NN_REG_NONE,
for both A/B operands.
NOTE: in current NFP ISA, it is only possible for instructions with
unrestricted operands to take none operands, but in case there is new and
similar instructoin in restricted form, they would follow similar rules,
so swreg_to_restricted is updated as well.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP eBPF offload JIT engine is doing some instruction combine based
optimizations which however must not be safe if the combined sequences
are across basic block boarders.
Currently, there are post checks during fixing jump destinations. If the
jump destination is found to be eBPF insn that has been combined into
another one, then JIT engine will raise error and abort.
This is not optimal. The JIT engine ought to disable the optimization on
such cross-bb-border sequences instead of abort.
As there is no control flow information in eBPF infrastructure that we
can't do basic block based optimizations, this patch extends the existing
jump destination record pass to also flag the jump destination, then in
instruction combine passes we could skip the optimizations if insns in the
sequence are jump targets.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
eBPF insns are internally organized as dual-list inside NFP offload JIT.
Random access to an insn needs to be done by either forward or backward
traversal along the list.
One place we need to do such traversal is at nfp_fixup_branches where one
traversal is needed for each jump insn to find the destination. Such
traversals could be avoided if jump destinations are collected through a
single travesal in a pre-scan pass, and such information could also be
useful in other places where jump destination info are needed.
This patch adds such jump destination collection in nfp_prog_prepare.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support for backward jump on NFP.
- restrictions on backward jump in various functions have been removed.
- nfp_fixup_branches now supports backward jump.
There is one thing to note, currently an input eBPF JMP insn may generate
several NFP insns, for example,
NFP imm move insn A \
NFP compare insn B --> 3 NFP insn jited from eBPF JMP insn M
NFP branch insn C /
---
NFP insn X --> 1 NFP insn jited from eBPF insn N
---
...
therefore, we are doing sanity check to make sure the last jited insn from
an eBPF JMP is a NFP branch instruction.
Once backward jump is allowed, it is possible an eBPF JMP insn is at the
end of the program. This is however causing trouble for the sanity check.
Because the sanity check requires the end index of the NFP insns jited from
one eBPF insn while only the start index is recorded before this patch that
we can only get the end index by:
start_index_of_the_next_eBPF_insn - 1
or for the above example:
start_index_of_eBPF_insn_N (which is the index of NFP insn X) - 1
nfp_fixup_branches was using nfp_for_each_insn_walk2 to expose *next* insn
to each iteration during the traversal so the last index could be
calculated from which. Now, it needs some extra code to handle the last
insn. Meanwhile, the use of walk2 is actually unnecessary, we could simply
use generic single instruction walk to do this, the next insn could be
easily calculated using list_next_entry.
So, this patch migrates the jump fixup traversal method to
*list_for_each_entry*, this simplifies the code logic a little bit.
The other thing to note is a new state variable "last_bpf_off" is
introduced to track the index of the last jited NFP insn. This is necessary
because NFP is generating special purposes epilogue sequences, so the index
of the last jited NFP insn is *not* always nfp_prog->prog_len - 1.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Since commit 3a025e1d1c ("Add optional check for bad kernel-doc
comments") when built with W=1 build will complain about kdoc errors.
Fix the kdoc issues we have. kdoc is still confused by defines in
nfp_net_ctrl.h but those are not really errors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-11-23
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Several BPF offloading fixes, from Jakub. Among others:
- Limit offload to cls_bpf and XDP program types only.
- Move device validation into the driver and don't make
any assumptions about the device in the classifier due
to shared blocks semantics.
- Don't pass offloaded XDP program into the driver when
it should be run in native XDP instead. Offloaded ones
are not JITed for the host in such cases.
- Don't destroy device offload state when moved to
another namespace.
- Revert dumping offload info into user space for now,
since ifindex alone is not sufficient. This will be
redone properly for bpf-next tree.
2) Fix test_verifier to avoid using bpf_probe_write_user()
helper in test cases, since it's dumping a warning into
kernel log which may confuse users when only running tests.
Switch to use bpf_trace_printk() instead, from Yonghong.
3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
before it becomes uabi, from Gianluca. More specifically:
- Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
by bpf_csum_diff(), where the argument is either a
valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
then enforces a valid pointer in case of non-0 size
or a valid pointer or NULL in case of size 0. Given
that, the semantics for ARG_PTR_TO_MEM in combination
with ARG_CONST_SIZE_OR_ZERO are now such that in case
of size 0, the pointer must always be valid and cannot
be NULL. This fix in semantics allows for bpf_probe_read()
to drop the recently added size == 0 check in the helper
that would become part of uabi otherwise once released.
At the same time we can then fix bpf_probe_read_str() and
bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
instead of ARG_CONST_SIZE in order to fix recently
reported issues by Arnaldo et al, where LLVM optimizes
two boundary checks into a single one for unknown
variables where the verifier looses track of the variable
bounds and thus rejects valid programs otherwise.
4) A fix for the verifier for the case when it detects
comparison of two constants where the branch is guaranteed
to not be taken at runtime. Verifier will rightfully prune
the exploration of such paths, but we still pass the program
to JITs, where they would complain about using reserved
fields, etc. Track such dead instructions and sanitize
them with mov r0,r0. Rejection is not possible since LLVM
may generate them for valid C code and doesn't do as much
data flow analysis as verifier. For bpf-next we might
implement removal of such dead code and adjust branches
instead. Fix from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With TC shared block changes we can't depend on correct netdev
pointer being available in cls_bpf. Move the device validation
to the driver. Core will only make sure that offloaded programs
are always attached in the driver (or in HW by the driver). We
trust that drivers which implement offload callbacks will perform
necessary checks.
Moving the checks to the driver is generally a useful thing,
in practice the check should be against a switchdev instance,
not a netdev, given that most ASICs will probably allow using
the same program on many ports.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking fixes from David Miller:
1) Revert regression inducing change to the IPSEC template resolver,
from Steffen Klassert.
2) Peeloffs can cause the wrong sk to be waken up in SCTP, fix from Xin
Long.
3) Min packet MTU size is wrong in cpsw driver, from Grygorii Strashko.
4) Fix build failure in netfilter ctnetlink, from Arnd Bergmann.
5) ISDN hisax driver checks pnp_irq() for errors incorrectly, from
Arvind Yadav.
6) Fix fealnx driver build failure on MIPS, from Huacai Chen.
7) Fix into leak in SCTP, the scope_id of socket addresses is not
always filled in. From Eric W. Biederman.
8) MTU inheritance between physical function and representor fix in nfp
driver, from Dirk van der Merwe.
9) Fix memory leak in rsi driver, from Colin Ian King.
10) Fix expiration and generation ID handling of cached ipv4 redirect
routes, from Xin Long.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
net: usb: hso.c: remove unneeded DRIVER_LICENSE #define
ibmvnic: fix dma_mapping_error call
ipvlan: NULL pointer dereference panic in ipvlan_port_destroy
route: also update fnhe_genid when updating a route cache
route: update fnhe_expires for redirect when the fnhe exists
sctp: set frag_point in sctp_setsockopt_maxseg correctly
rsi: fix memory leak on buf and usb_reg_buf
net/netlabel: Add list_next_rcu() in rcu_dereference().
nfp: remove false positive offloads in flower vxlan
nfp: register flower reprs for egress dev offload
nfp: inherit the max_mtu from the PF netdev
nfp: fix vlan receive MAC statistics typo
nfp: fix flower offload metadata flag usage
virto_net: remove empty file 'virtio_net.'
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
fealnx: Fix building error on MIPS
isdn: hisax: Fix pnp_irq's error checking for setup_teles3
isdn: hisax: Fix pnp_irq's error checking for setup_sedlbauer_isapnp
isdn: hisax: Fix pnp_irq's error checking for setup_niccy
isdn: hisax: Fix pnp_irq's error checking for setup_ix1micro
...
Pass information to the match offload on whether or not the repr is the
ingress or egress dev. Only accept tunnel matches if repr is the egress
dev.
This means rules such as the following are successfully offloaded:
tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0
While rules such as the following are rejected:
tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0
Also reject non tunnel flows that are offloaded to an egress dev.
Non tunnel matches assume that the offload dev is the ingress port and
offload a match accordingly.
Fixes: 611aec101a ("nfp: compile flower vxlan tunnel metadata match fields")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register a callback for offloading flows that have a repr as their egress
device. The new egdev_register function is added to net-next for the 4.15
release.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>