Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.
The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Expand kvm_apic_mda comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement. KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.
The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.
Changes to MSI addresses follow the format used by interrupt remapping
unit. The upper address word, that used to be 0, contains upper 24 bits
of the LAPIC address in its upper 24 bits. Lower 8 bits are reserved as
0. Using the upper address word is not backward-compatible either as we
didn't check that userspace zeroed the word. Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* topic/vsp1: (36 commits)
[media] v4l: vsp1: wpf: Add flipping support
[media] v4l: vsp1: rwpf: Support runtime modification of controls
[media] v4l: vsp1: Simplify alpha propagation
[media] v4l: vsp1: clu: Support runtime modification of controls
[media] v4l: vsp1: lut: Support runtime modification of controls
[media] v4l: vsp1: Support runtime modification of controls
[media] v4l: vsp1: Add Cubic Look Up Table (CLU) support
[media] v4l: vsp1: lut: Expose configuration through a control
[media] v4l: vsp1: lut: Initialize the mutex
[media] v4l: vsp1: dl: Don't free fragments with interrupts disabled
[media] v4l: vsp1: Set entities functions
[media] v4l: vsp1: Don't create LIF entity when the userspace API is enabled
[media] v4l: vsp1: Don't register media device when userspace API is disabled
[media] v4l: vsp1: Base link creation on availability of entities
[media] media: Add video statistics computation functions
[media] media: Add video processing entity functions
[media] v4l: vsp1: sru: Fix intensity control ID
[media] v4l: vsp1: Stop the pipeline upon the first STREAMOFF
[media] v4l: vsp1: Constify operation structures
[media] v4l: vsp1: pipe: Fix typo in comment
...
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
to dump the prsctp statistics info from the asoc. The prsctp statistics
includes abandoned_sent/unsent from the asoc. abandoned_sent is the
count of the packets we drop packets from retransmit/transmited queue,
and abandoned_unsent is the count of the packets we drop from out_queue
according to the policy.
Note: another option for prsctp statistics dump described in rfc is
SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
info from each stream. But by now, linux doesn't yet have per stream
statistics info, it needs rfc6525 to be implemented. As the prsctp
statistics for each stream has to be based on per stream statistics,
we will delay it until rfc6525 is done in linux.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
to set/get sctp Partially Reliable Policies' default params,
which includes 3 policies (ttl, rtx, prio) and their values.
Still, if we set policy params in sndinfo, we will use the params
of sndinfo against chunks, instead of the default params.
In this patch, we will use 5-8bit of sp/asoc->default_flags
to store prsctp policies, and reuse asoc->default_timetolive
to store their values. It means if we enable and set prsctp
policy, prior ttl timeout in sctp will not work any more.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
We will add prsctp_enable to both asoc and ep, and replace the places
where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
also modify it's value through sockopt SCTP_PR_SUPPORTED.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reviewing the documentation gaps on LIRC, it was
noticed that several ioctls aren't used by any LIRC drivers
(nor at staging or mainstream).
It doesn't make sense to document them, as they're not used
anywhere. So, let's remove those from the lirc header.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
As was suggested this patch adds support for the different versions of MLD
and IGMP query types. Since the user visible structure is still in net-next
we can augment it instead of adding netlink attributes.
The distinction between the different IGMP/MLD query types is done as
suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based
on query payload size and code for IGMP. Since all IGMP packets go through
multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be
sure that at least the ip/ipv6 header can be directly used.
[1] https://tools.ietf.org/html/rfc3376#section-7
[2] https://tools.ietf.org/html/rfc3810#section-8.1
Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
over time there were multiple requests to access different data
structures and fields of task_struct current, so finally add
the helper to access 'current' as-is. Tracing bpf programs will do
the rest of walking the pointers via bpf_probe_read().
Note that current can be null and bpf program has to deal it with,
but even dumb passing null into bpf_probe_read() is still safe.
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux 4.7-rc6
* tag 'v4.7-rc6': (1245 commits)
Linux 4.7-rc6
ovl: warn instead of error if d_type is not supported
MIPS: Fix possible corruption of cache mode by mprotect.
locks: use file_inode()
usb: dwc3: st: Use explicit reset_control_get_exclusive() API
phy: phy-stih407-usb: Use explicit reset_control_get_exclusive() API
phy: miphy28lp: Inform the reset framework that our reset line may be shared
namespace: update event counter when umounting a deleted dentry
9p: use file_dentry()
lockd: unregister notifier blocks if the service fails to come up completely
ACPI,PCI,IRQ: correct operator precedence
fuse: serialize dirops by default
drm/i915: Fix missing unlock on error in i915_ppgtt_info()
powerpc: Initialise pci_io_base as early as possible
mfd: da9053: Fix compiler warning message for uninitialised variable
mfd: max77620: Fix FPS switch statements
phy: phy-stih407-usb: Inform the reset framework that our reset line may be shared
usb: dwc3: st: Inform the reset framework that our reset line may be shared
usb: host: ehci-st: Inform the reset framework that our reset line may be shared
usb: host: ohci-st: Inform the reset framework that our reset line may be shared
...
Johannes Berg says:
====================
One more set of new features:
* beacon report (for radio measurement) support in cfg80211/mac80211
* hwsim: allow wmediumd in namespaces
* mac80211: extend 160MHz workaround to CSA IEs
* mesh: properly encrypt group-addressed privacy action frames
* mesh: allow setting peer AID
* first steps for MU-MIMO monitor mode
* along with various other cleanups and improvements
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/usb/r8152.c
All three conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next,
they are:
1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.
2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.
3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():
4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.
5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.
6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries in 11% per entry.
From Florian Westphal.
7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.
9) Remove zone extension as place it in the conntrack object, this is
always include in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.
10) Owner match now works from any namespace, from Eric Bierdeman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.
12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.
13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This allows fixes interferences with ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.
15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.
16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.
18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.
19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.
21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.
22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
doesn't exists already for 10 years, from Moritz Sichert.
23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.
24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fixes seems to
leave the former behaviour there, so we don't break backward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, mesh power management functionality works only with kernel
MPM. Because user space MPM did not report mesh peer AID to kernel,
the kernel could not identify the bit in TIM element. So this patch
adds mesh peer AID setting API.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Beacon report radio measurement requires reporting observed BSSs
on the channels specified in the beacon request. If the measurement
mode is set to passive or active, it requires actually performing a
scan (passive or active, accordingly), and reporting the time that
the scan was started and the time each beacon/probe was received
(both in terms of TSF of the BSS of the requesting AP). If the
request mode is table, this information is optional.
In addition, the radio measurement request specifies the channel
dwell time for the measurement.
In order to use scan for beacon report when the mode is active or
passive, add a parameter to scan request that specifies the
channel dwell time, and add scan start time and beacon received time
to scan results information.
Supporting beacon report is required for Multi Band Operation (MBO).
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
add API to support VHT MU-MIMO air sniffer.
in MU-MIMO there are parallel frames on the air while the HW
has only one RX.
add the capability to sniff one of the MU-MIMO parallel frames by
giving the sniffer additional information so it'll know which
of the parallel frames it shall follow.
Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting
a MU-MIMO groupID in order to monitor packets from that group
using VHT MU-MIMO.
And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing
MAC address to monitor mode.
that option will be used by VHT MU-MIMO air sniffer to follow a
station according to it's MAC address using VHT MU-MIMO.
Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Simon Wunderlich says:
====================
This feature patchset includes the following changes:
- Cleanup work by Markus Pargmann and Sven Eckelmann (six patches)
- Initial Netlink support by Matthias Schiffer (two patches)
- Throughput Meter implementation by Antonio Quartulli, a kernel-space
traffic generator to estimate link speeds. This feature is useful on
low-end WiFi APs where running iperf or netperf from userspace
gives wrong results due to heavy userspace/kernelspace overhead.
(two patches)
- API clean-up work by Antonio Quartulli (one patch)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If skb_clear_hash() was invoked due to mangling of relevant headers and
BPF program needs skb->hash later on, we can add a helper to trigger hash
recalculation via bpf_get_hash_recalc().
The helper will return the newly retrieved hash directly, but later access
can also be done via skb context again through skb->hash directly (inline)
without needing to call the helper once more.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extremely useful for setting packet type to host so i dont
have to modify the dst mac address using pedit (which requires
that i know the mac address)
Example usage:
tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \
match ip src 5.5.5.5/32 \
flowid 1:5 action skbedit ptype host
This will tag all packets incoming from 5.5.5.5 with type
PACKET_HOST
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The throughput meter module is a simple, kernel-space replacement for
throughtput measurements tool like iperf and netperf. It is intended to
approximate TCP behaviour.
It is invoked through batctl: the protocol is connection oriented, with
cumulative acknowledgment and a dynamic-size sliding window.
The test *can* be interrupted by batctl. A receiver side timeout avoids
unlimited waitings for sender packets: after one second of inactivity, the
receiver abort the ongoing test.
Based on a prototype from Edo Monticelli <montik@autistici.org>
Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com>
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
BATADV_CMD_GET_MESH_INFO is used to query basic information about a
batman-adv softif (name, index and MAC address for both the softif and
the primary hardif; routing algorithm; batman-adv version).
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[sven.eckelmann@open-mesh.com: Reduce the number of changes to
BATADV_CMD_GET_MESH_INFO, add missing kerneldoc, add policy for attributes]
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
debugfs is currently severely broken virtually everywhere in the kernel
where files are dynamically added and removed (see
http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02196.html for some
details). In addition to that, debugfs is not namespace-aware.
Instead of adding new debugfs entries, the whole infrastructure should be
moved to netlink. This will fix the long standing problem of large buffers
for debug tables and hard to parse text files.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[sven.eckelmann@open-mesh.com: Strip down patch to only add genl family,
add missing kerneldoc]
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Pull fuse fix from Miklos Szeredi:
"This makes sure userspace filesystems are not broken by the parallel
lookups and readdir feature"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: serialize dirops by default
Add the commands to set and show the mode of SRIOV E-Switch, two modes
are supported:
* legacy: operating in the "old" L2 based mode (DMAC --> VF vport)
* switchdev: the E-Switch is referred to as whitebox switch configured
using standard tools such as tc, bridge, openvswitch etc. To allow
working with the tools, for each VF, a VF representor netdevice is
created by the E-Switch manager vendor device driver instance (e.g PF).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2. It is similar to the
feature added in netfilter:
commit c38c4597e4 ("netfilter: implement xt_cgroup cgroup2 path match")
The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.
Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We found that sometimes a restored tcp socket doesn't work.
A reason of this bug is incorrect window parameters and in this case
tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The
other side drops packets with this seq, because seq is less than
tp->rcv_nxt ( tcp_sequence() ).
Data from a send queue is sent only if there is enough space in a
window, so when we restore unacked data, we need to expand a window to
fit this data.
This was in a first version of this patch:
"tcp: extend window to fit all restored unacked data in a send queue"
Then Alexey recommended me to restore window parameters instead of
adjusted them according with data in a sent queue. This sounds resonable.
rcv_wnd has to be restored, because it was reported to another side
and the offered window is never shrunk.
One of reasons why we need to restore snd_wnd was described above.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
This patch adds stats support for the currently used IGMP/MLD types by the
bridge. The stats are per-port (plus one stat per-bridge) and per-direction
(RX/TX). The stats are exported via netlink via the new linkxstats API
(RTM_GETSTATS). In order to minimize the performance impact, a new option
is used to enable/disable the stats - multicast_stats_enabled, similar to
the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
lookups and checks, we make use of the current "igmp" member of the bridge
private skb->cb region to record the type on Rx (both host-generated and
external packets pass by multicast_rcv()). We can do that since the igmp
member was used as a boolean and all the valid IGMP/MLD types are positive
values. The normal bridge fast-path is not affected at all, the only
affected paths are the flooding ones and since we make use of the IGMP/MLD
type, we can quickly determine if the packet should be counted using
cache-hot data (cb's igmp member). We add counters for:
* IGMP Queries
* IGMP Leaves
* IGMP v1/v2/v3 reports
* MLD Queries
* MLD Leaves
* MLD v1/v2 reports
These are invaluable when monitoring or debugging complex multicast setups
with bridges.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
which allows to export per-slave statistics if the master device supports
the linkxstats callback. The attribute is passed down to the linkxstats
callback and it is up to the callback user to use it (an example has been
added to the only current user - the bridge). This allows us to query only
specific slaves of master devices like bridge ports and export only what
we're interested in instead of having to dump all ports and searching only
for a single one. This will be used to export per-port IGMP/MLD stats and
also per-port vlan stats in the future, possibly other statistics as well.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work adds a helper for changing skb->pkt_type in a controlled way.
We only allow a subset of possible values and can extend that in future
should other use cases come up. Doing this as a helper has the advantage
that errors can be handeled gracefully and thus helper kept extensible.
It's a write counterpart to pkt_type member we can already read from
struct __sk_buff context. Major use case is to change incoming skbs to
PACKET_HOST in a programmatic way instead of having to recirculate via
redirect(..., BPF_F_INGRESS), for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a minimal helper for doing the groundwork of changing
the skb->protocol in a controlled way. Currently supported is v4 to
v6 and vice versa transitions, which allows f.e. for a minimal, static
nat64 implementation where applications in containers that still
require IPv4 can be transparently operated in an IPv6-only environment.
For example, host facing veth of the container can transparently do
the transitions in a programmatic way with the help of clsact qdisc
and cls_bpf.
Idea is to separate concerns for keeping complexity of the helper
lower, which means that the programs utilize bpf_skb_change_proto(),
bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done,
instead of doing everything in a single helper (and thus partially
duplicating helper functionality). Also, bpf_skb_change_proto()
shouldn't need to deal with raw packet data as this is done by other
helpers.
bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
operate on a private one, push or pop additionally required header
space and migrate the gso/gro meta data from the shared info. We do
mark the gso type as dodgy so that headers are checked and segs
recalculated by the gso/gro engine. The gso_size target is adapted
as well. The flags argument added is currently reserved and can be
used for future extensions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up commit to 1e33759c78 ("bpf, trace: add BPF_F_CURRENT_CPU
flag for bpf_perf_event_output") to add the same functionality into
bpf_perf_event_read() helper. The split of index into flags and index
component is also safe here, since such large maps are rejected during
map allocation time.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"I've been traveling so this accumulates more than week or so of bug
fixing. It perhaps looks a little worse than it really is.
1) Fix deadlock in ath10k driver, from Ben Greear.
2) Increase scan timeout in iwlwifi, from Luca Coelho.
3) Unbreak STP by properly reinjecting STP packets back into the
stack. Regression fix from Ido Schimmel.
4) Mediatek driver fixes (missing malloc failure checks, leaking of
scratch memory, wrong indexing when mapping TX buffers, etc.) from
John Crispin.
5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
Sowa.
6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.
7) Fix netlink notifications in ovs for tunnels, delete link messages
are never emitted because of how the device registry state is
handled. From Nicolas Dichtel.
8) Conntrack module leaks kmemcache on unload, from Florian Westphal.
9) Prevent endless jump loops in nft rules, from Liping Zhang and
Pablo Neira Ayuso.
10) Not early enough spinlock initialization in mlx4, from Eric
Dumazet.
11) Bind refcount leak in act_ipt, from Cong WANG.
12) Missing RCU locking in HTB scheduler, from Florian Westphal.
13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
barrier, using heap for SG and IV, and erroneous use of async flag
when allocating AEAD conext.)
14) RCU handling fix in TIPC, from Ying Xue.
15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
SIT driver, from Simon Horman.
16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.
17) Fix potential deadlock in team enslave, from Ido Schimmel.
18) Memory leak in KCM procfs handling, from Jiri Slaby.
19) ESN generation fix in ipv4 ESP, from Herbert Xu.
20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
WANG.
21) Use after free in netem, from Eric Dumazet.
22) Uninitialized last assert time in multicast router code, from Tom
Goff.
23) Skip raw sockets in sock_diag destruction broadcast, from Willem
de Bruijn.
24) Fix link status reporting in thunderx, from Sunil Goutham.
25) Limit resegmentation of retransmit queue so that we do not
retransmit too large GSO frames. From Eric Dumazet.
26) Delay bpf program release after grace period, from Daniel
Borkmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
openvswitch: fix conntrack netlink event delivery
qed: Protect the doorbell BAR with the write barriers.
neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
e1000e: keep VLAN interfaces functional after rxvlan off
cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
bpf, perf: delay release of BPF prog after grace period
net: bridge: fix vlan stats continue counter
tcp: do not send too big packets at retransmit time
ibmvnic: fix to use list_for_each_safe() when delete items
net: thunderx: Fix TL4 configuration for secondary Qsets
net: thunderx: Fix link status reporting
net/mlx5e: Reorganize ethtool statistics
net/mlx5e: Fix number of PFC counters reported to ethtool
net/mlx5e: Prevent adding the same vxlan port
net/mlx5e: Check for BlueFlame capability before allocating SQ uar
net/mlx5e: Change enum to better reflect usage
net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
net/mlx5: Update command strings
net: marvell: Add separate config ANEG function for Marvell 88E1111
...
Some devices with a pen may have a switch that can be used to detect
when the pen is inserted or removed to a slot on the device. Let's add
a define to the input event codes so that everyone can be on the same
page for what event we should generate when the pen is inserted or
removed.
In general the pen switch could be used by the software on the device to
kick off any number of actions when the pen is inserted or removed.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The formats are interleaved with the YUV packed and miscellaneous
formats, making the result confusing especially with the YUV444 format
being packed and not planar like YUV410 or YUV420. Move them to their
own group as the 2 planes or 3 non-contiguous planes formats to clarify
the header.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add support to inet_diag facility to filter sockets based on device
index. If an interface index is in the filter only sockets bound
to that index (sk_bound_dev_if) are returned.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fixes from Dmitry Torokhov.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: vmmouse - remove port reservation
Input: elantech - add more IC body types to the list
Input: wacom_w8001 - ignore invalid pen data packets
Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
Input: xpad - fix oops when attaching an unknown Xbox One gamepad
MAINTAINERS: add Pali Rohár as reviewer of ALPS PS/2 touchpad driver
Input: add HDMI CEC specific keycodes
Input: add BUS_CEC type
Input: xpad - fix rumble on Xbox One controllers with 2015 firmware
CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on
the equivalent CISPO code. The main difference is due to manipulating
the options in the hop-by-hop header.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
CALIPSO is a packet labelling protocol for IPv6 which is very similar
to CIPSO. It is specified in RFC 5570. Much of the code is based on
the current CIPSO code.
This adds support for adding passthrough-type CALIPSO DOIs through the
NLBL_CALIPSO_C_ADD command. It requires attributes:
NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS.
NLBL_CALIPSO_A_DOI.
In passthrough mode the CALIPSO engine will map MLS secattr levels
and categories directly to the packet label.
At this stage, the major difference between this and the CIPSO
code is that IPv6 may be compiled as a module. To allow for
this the CALIPSO functions are registered at module init time.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>