Pull pidfd updates from Christian Brauner:
"This adds two main features.
- First, it adds polling support for pidfds. This allows process
managers to know when a (non-parent) process dies in a race-free
way.
The notification mechanism used follows the same logic that is
currently used when the parent of a task is notified of a child's
death. With this patchset it is possible to put pidfds in an
{e}poll loop and get reliable notifications for process (i.e.
thread-group) exit.
- The second feature compliments the first one by making it possible
to retrieve pollable pidfds for processes that were not created
using CLONE_PIDFD.
A lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these
processes a caller can currently not create a pollable pidfd. This
is a problem for Android's low memory killer (LMK) and service
managers such as systemd.
Both patchsets are accompanied by selftests.
It's perhaps worth noting that the work done so far and the work done
in this branch for pidfd_open() and polling support do already see
some adoption:
- Android is in the process of backporting this work to all their LTS
kernels [1]
- Service managers make use of pidfd_send_signal but will need to
wait until we enable waiting on pidfds for full adoption.
- And projects I maintain make use of both pidfd_send_signal and
CLONE_PIDFD [2] and will use polling support and pidfd_open() too"
[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22
[2] aab6e3eb73/src/lxc/start.c (L1753)
* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add pidfd_open() tests
arch: wire-up pidfd_open()
pid: add pidfd_open()
pidfd: add polling selftests
pidfd: add polling support
Pull m68nommu updates from Greg Ungerer:
"A series of cleanups for the FLAT format binary loader, binfmt_flat,
from Christoph.
The end goal is to support no-MMU on RISC-V, and the last patch
enables that"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
riscv: add binfmt_flat support
binfmt_flat: don't offset the data start
binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
binfmt_flat: provide an asm-generic/flat.h
binfmt_flat: make support for old format binaries optional
binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
binfmt_flat: add endianess annotations
binfmt_flat: use fixed size type for the on-disk format
binfmt_flat: consolidate two version of flat_v2_reloc_t
binfmt_flat: remove the unused OLD_FLAT_FLAG_RAM definition
binfmt_flat: remove the uapi <linux/flat.h> header
binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
binfmt_flat: remove flat_old_ram_flag
binfmt_flat: provide a default version of flat_get_relocate_addr
binfmt_flat: remove flat_set_persistent
binfmt_flat: remove flat_reloc_valid
Commit 9903c8dc73 changed TC_ETF defines to use _BITUL instead of BIT
but did not add the dependecy on linux/const.h. As a consequence,
importing the uapi headers into iproute2 causes builds to fail. Add
the dependency.
Fixes: 9903c8dc73 ("etf: Don't use BIT() in UAPI headers.")
Cc: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds hardware offload support for nftables through the
existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER
classifier and the flow rule API. This hardware offload support is
available for the NFPROTO_NETDEV family and the ingress hook.
Each nftables expression has a new ->offload interface, that is used to
populate the flow rule object that is attached to the transaction
object.
There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload
an entire table, including all of its chains.
This patch supports for basic metadata (layer 3 and 4 protocol numbers),
5-tuple payload matching and the accept/drop actions; this also includes
basechain hardware offload only.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done through IORING_OP_RECVMSG. This opcode uses the same
sqe->msg_flags that IORING_OP_SENDMSG added, and we pass in the
msghdr struct in the sqe->addr field as well.
We use MSG_DONTWAIT to force an inline fast path if recvmsg() doesn't
block, and punt to async execution if it would have.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is done through IORING_OP_SENDMSG. There's a new sqe->msg_flags
for the flags argument, and the msghdr struct is passed in the
sqe->addr field.
We use MSG_DONTWAIT to force an inline fast path if sendmsg() doesn't
block, and punt to async execution if it would have.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow sending a packet to conntrack module for connection tracking.
The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.
In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.
Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_state +trk+new \
action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 1 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_0
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Changelog:
V5->V6:
Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
Added strict_start_type for act_ct policy
V2->V3:
Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
Refactored NAT_PORT_MIN_MAX range handling as well
Added ipv4/ipv6 defragmentation
Removed extra skb pull push of nw offset in exectute nat
Refactored tcf_ct_skb_network_trim after pull
Removed TCA_ACT_CT define
Signed-off-by: David S. Miller <davem@davemloft.net>
In an eswitch, PCI VF may have port which is normally represented using
a representor netdevice.
To have better visibility of eswitch port, its association with VF,
and its representor netdevice, introduce a PCI VF port flavour.
When devlink port flavour is PCI VF, fill up PCI VF attributes of
the port.
Extend port name creation using PCI PF and VF number scheme on best
effort basis, so that vendor drivers can skip defining their own scheme.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an eswitch, PCI PF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with
PF and a representor netdevice, introduce a PCI PF port
flavour and port attriute.
When devlink port flavour is PCI PF, fill up PCI PF attributes of the
port.
Extend port name creation using PCI PF number on best effort basis.
So that vendor drivers can skip defining their own scheme.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block updates from Jens Axboe:
"This is the main block updates for 5.3. Nothing earth shattering or
major in here, just fixes, additions, and improvements all over the
map. This contains:
- Series of documentation fixes (Bart)
- Optimization of the blk-mq ctx get/put (Bart)
- null_blk removal race condition fix (Bob)
- req/bio_op() cleanups (Chaitanya)
- Series cleaning up the segment accounting, and request/bio mapping
(Christoph)
- Series cleaning up the page getting/putting for bios (Christoph)
- block cgroup cleanups and moving it to where it is used (Christoph)
- block cgroup fixes (Tejun)
- Series of fixes and improvements to bcache, most notably a write
deadlock fix (Coly)
- blk-iolatency STS_AGAIN and accounting fixes (Dennis)
- Series of improvements and fixes to BFQ (Douglas, Paolo)
- debugfs_create() return value check removal for drbd (Greg)
- Use struct_size(), where appropriate (Gustavo)
- Two lighnvm fixes (Heiner, Geert)
- MD fixes, including a read balance and corruption fix (Guoqing,
Marcos, Xiao, Yufen)
- block opal shadow mbr additions (Jonas, Revanth)
- sbitmap compare-and-exhange improvemnts (Pavel)
- Fix for potential bio->bi_size overflow (Ming)
- NVMe pull requests:
- improved PCIe suspent support (Keith Busch)
- error injection support for the admin queue (Akinobu Mita)
- Fibre Channel discovery improvements (James Smart)
- tracing improvements including nvmetc tracing support (Minwoo Im)
- misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
Kulkarni)"
- Various little fixes and improvements to drivers and core"
* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
blk-iolatency: fix STS_AGAIN handling
block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
blk-mq: simplify blk_mq_make_request()
blk-mq: remove blk_mq_put_ctx()
sbitmap: Replace cmpxchg with xchg
block: fix .bi_size overflow
block: sed-opal: check size of shadow mbr
block: sed-opal: ioctl for writing to shadow mbr
block: sed-opal: add ioctl for done-mark of shadow mbr
block: never take page references for ITER_BVEC
direct-io: use bio_release_pages in dio_bio_complete
block_dev: use bio_release_pages in bio_unmap_user
block_dev: use bio_release_pages in blkdev_bio_end_io
iomap: use bio_release_pages in iomap_dio_bio_end_io
block: use bio_release_pages in bio_map_user_iov
block: use bio_release_pages in bio_unmap_user
block: optionally mark pages dirty in bio_release_pages
block: move the BIO_NO_PAGE_REF check into bio_release_pages
block: skd_main.c: Remove call to memset after dma_alloc_coherent
block: mtip32xx: Remove call to memset after dma_alloc_coherent
...
Pull sound updates from Takashi Iwai:
"Many updates in this development cycle are found in ASoC where it got
a wide range of changes for the continued refactoring.
Some highlights are below.
ASoC:
- Continued refactoring work by Morimoto-san toward the full
componentization; the changes are seen allover the places
- Support for force disconnecting muxes in DAPM
- Continued development of ASoC Intel SOF stuff
- New drivers for Cirrus Logic CS47L35, CS47L85 and CS47L90, Conexant
CX2072X, Realtek RT1011 and RT1308
HD-audio:
- More fixes and adjustments for ASoC SOF HD-audio
- Fix for resume problem on some Realtek codecs
USB-audio:
- A few fixes for the issues reported by syzbot USB fuzzer
- Fix for UAC2 extension unit parser
- Quirks for Line6 Helix, Emgaic Unitor 8
FireWire:
- Lots of code refactoring and fixes in most of its components"
* tag 'sound-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (626 commits)
ALSA: firewire-lib: code refactoring for local variables
ALSA: firewire-lib: code refactoring for post operation to data block counter
ALSA: firewire-lib: code refactoring for error path of parser for CIP header
ALSA: firewire-lib: fix different data block counter between probed event and transferred isochronous packet
ALSA: firewire-lib: fix initial value of data block count for IR context without CIP_DBC_IS_END_EVENT
ALSA: firewire-lib/fireface: fix initial value of data block counter for IR context with CIP_NO_HEADER
ALSA: firewire-lib: fix invalid length of rx packet payload for tracepoint events
ALSA: usb-audio: fix Line6 Helix audio format rates
firewire-motu: fix wrong reference count for stream functionality at error path of rawmidi interface
ALSA: firewire-digi00x: fix wrong reference count for stream functionality at error path of rawmidi interface
ALSA: dice: fix wrong reference count for stream functionality at error path of rawmidi interface
ALSA: oxfw: fix wrong reference count for stream functionality at error path of rawmidi interface
ALSA: fireworks: fix wrong reference count for stream functionality at error path of rawmidi interface
ALSA: bebob: fix wrong reference count for stream functionality at error path of rawmidi interface
ASoC: SOF: Intel: implement runtime idle for CNL/APL
ASoC: SOF: add runtime idle callback
ASoC: hdac_hdmi: report codec link up/down status to bus
ASoC: SOF: debug: fix possible memory leak in sof_dfsentry_write()
ASoC: sunxi: sun50i-codec-analog: Add earpiece
ASoC: rt5665: remove redundant assignment to variable idx
...
Pull media updates from Mauro Carvalho Chehab:
- new Atmel microship ISC driver
- coda has gained support for mpeg2 and mpeg4
- cxusb gained support for analog TV
- rockchip staging driver was split into two separate staging drivers
- added a new staging driver for Allegro DVT video IP core
- added a new staging driver for Amlogic Meson video decoder
- lots of improvements and cleanups
* tag 'media/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (398 commits)
media: allegro: use new v4l2_m2m_ioctl_try_encoder_cmd funcs
media: doc-rst: Fix typos
media: radio-raremono: change devm_k*alloc to k*alloc
media: stv0297: fix frequency range limit
media: rc: Prefer KEY_NUMERIC_* for number buttons on remotes
media: dvb_frontend: split dvb_frontend_handle_ioctl function
media: mceusb: disable "nonsensical irdata" messages
media: rc: remove redundant dev_err message
media: cec-notifier: add new notifier functions
media: cec: add struct cec_connector_info support
media: cec-notifier: rename variables, check kstrdup and n->conn_name
media: MAINTAINERS: Add maintainers for Media Controller
media: staging: media: tegra-vde: Defer dmabuf's unmapping
media: staging: media: tegra-vde: Add IOMMU support
media: hdpvr: fix locking and a missing msleep
media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
media: atmel: atmel-isc: fix i386 build error
media: v4l2-ctrl: Move compound control initialization
media: hantro: Use vb2_get_buffer
media: pci: cx88: Change the type of 'missed' to u64
...
Pull iommu updates from Joerg Roedel:
- Make the dma-iommu code more generic so that it can be used outside
of the ARM context with other IOMMU drivers. Goal is to make use of
it on x86 too.
- Generic IOMMU domain support for the Intel VT-d driver. This driver
now makes more use of common IOMMU code to allocate default domains
for the devices it handles.
- An IOMMU fault reporting API to userspace. With that the IOMMU fault
handling can be done in user-space, for example to forward the faults
to a VM.
- Better handling for reserved regions requested by the firmware. These
can be 'relaxed' now, meaning that those don't prevent a device being
attached to a VM.
- Suspend/Resume support for the Renesas IOMMU driver.
- Added support for dumping SVA related fields of the DMAR table in the
Intel VT-d driver via debugfs.
- A pile of smaller fixes and cleanups.
* tag 'iommu-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (90 commits)
iommu/omap: No need to check return value of debugfs_create functions
iommu/arm-smmu-v3: Invalidate ATC when detaching a device
iommu/arm-smmu-v3: Fix compilation when CONFIG_CMA=n
iommu/vt-d: Cleanup unused variable
iommu/amd: Flush not present cache in iommu_map_page
iommu/amd: Only free resources once on init error
iommu/amd: Move gart fallback to amd_iommu_init
iommu/amd: Make iommu_disable safer
iommu/io-pgtable: Support non-coherent page tables
iommu/io-pgtable: Replace IO_PGTABLE_QUIRK_NO_DMA with specific flag
iommu/io-pgtable-arm: Add support to use system cache
iommu/arm-smmu-v3: Increase maximum size of queues
iommu/vt-d: Silence a variable set but not used
iommu/vt-d: Remove an unused variable "length"
iommu: Fix integer truncation
iommu: Add padding to struct iommu_fault
iommu/vt-d: Consolidate domain_init() to avoid duplication
iommu/vt-d: Cleanup after delegating DMA domain to generic iommu
iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
iommu/vt-d: Allow DMA domain attaching to rmrr locked device
...
Pull cgroup updates from Tejun Heo:
"Documentation updates and the addition of cgroup_parse_float() which
will be used by new controllers including blk-iocost"
* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: convert docs to ReST and rename to *.rst
cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS
cgroup: add cgroup_parse_float()
Pull keyring ACL support from David Howells:
"This changes the permissions model used by keys and keyrings to be
based on an internal ACL by the following means:
- Replace the permissions mask internally with an ACL that contains a
list of ACEs, each with a specific subject with a permissions mask.
Potted default ACLs are available for new keys and keyrings.
ACE subjects can be macroised to indicate the UID and GID specified
on the key (which remain). Future commits will be able to add
additional subject types, such as specific UIDs or domain
tags/namespaces.
Also split a number of permissions to give finer control. Examples
include splitting the revocation permit from the change-attributes
permit, thereby allowing someone to be granted permission to revoke
a key without allowing them to change the owner; also the ability
to join a keyring is split from the ability to link to it, thereby
stopping a process accessing a keyring by joining it and thus
acquiring use of possessor permits.
- Provide a keyctl to allow the granting or denial of one or more
permits to a specific subject. Direct access to the ACL is not
granted, and the ACL cannot be viewed"
* tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Provide KEYCTL_GRANT_PERMISSION
keys: Replace uid/gid/perm permissions checking with an ACL
Currently, TC offers the ability to match on the MPLS fields of a packet
through the use of the flow_dissector_key_mpls struct. However, as yet, TC
actions do not allow the modification or manipulation of such fields.
Add a new module that registers TC action ops to allow manipulation of
MPLS. This includes the ability to push and pop headers as well as modify
the contents of new or existing headers. A further action to decrement the
TTL field of an MPLS header is also provided with a new helper added to
support this.
Examples of the usage of the new action with flower rules to push and pop
MPLS labels are:
tc filter add dev eth0 protocol ip parent ffff: flower \
action mpls push protocol mpls_uc label 123 \
action mirred egress redirect dev eth1
tc filter add dev eth0 protocol mpls_uc parent ffff: flower \
action mpls pop protocol ipv4 \
action mirred egress redirect dev eth1
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull keyring namespacing from David Howells:
"These patches help make keys and keyrings more namespace aware.
Firstly some miscellaneous patches to make the process easier:
- Simplify key index_key handling so that the word-sized chunks
assoc_array requires don't have to be shifted about, making it
easier to add more bits into the key.
- Cache the hash value in the key so that we don't have to calculate
on every key we examine during a search (it involves a bunch of
multiplications).
- Allow keying_search() to search non-recursively.
Then the main patches:
- Make it so that keyring names are per-user_namespace from the point
of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not
accessible cross-user_namespace.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.
- Move the user and user-session keyrings to the user_namespace
rather than the user_struct. This prevents them propagating
directly across user_namespaces boundaries (ie. the KEY_SPEC_*
flags will only pick from the current user_namespace).
- Make it possible to include the target namespace in which the key
shall operate in the index_key. This will allow the possibility of
multiple keys with the same description, but different target
domains to be held in the same keyring.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.
- Make it so that keys are implicitly invalidated by removal of a
domain tag, causing them to be garbage collected.
- Institute a network namespace domain tag that allows keys to be
differentiated by the network namespace in which they operate. New
keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned
the network domain in force when they are created.
- Make it so that the desired network namespace can be handed down
into the request_key() mechanism. This allows AFS, NFS, etc. to
request keys specific to the network namespace of the superblock.
This also means that the keys in the DNS record cache are
thenceforth namespaced, provided network filesystems pass the
appropriate network namespace down into dns_query().
For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other
cache keyrings, such as idmapper keyrings, also need to set the
domain tag - for which they need access to the network namespace of
the superblock"
* tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Pass the network namespace into request_key mechanism
keys: Network namespace domain tag
keys: Garbage collect keys for which the domain has been removed
keys: Include target namespace in match criteria
keys: Move the user and user-session keyrings to the user_namespace
keys: Namespace keyring names
keys: Add a 'recurse' flag for keyring searches
keys: Cache the hash value to avoid lots of recalculation
keys: Simplify key description management
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-09
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Lots of libbpf improvements: i) addition of new APIs to attach BPF
programs to tracing entities such as {k,u}probes or tracepoints,
ii) improve specification of BTF-defined maps by eliminating the
need for data initialization for some of the members, iii) addition
of a high-level API for setting up and polling perf buffers for
BPF event output helpers, all from Andrii.
2) Add "prog run" subcommand to bpftool in order to test-run programs
through the kernel testing infrastructure of BPF, from Quentin.
3) Improve verifier for BPF sockaddr programs to support 8-byte stores
for user_ip6 and msg_src_ip6 members given clang tends to generate
such stores, from Stanislav.
4) Enable the new BPF JIT zero-extension optimization for further
riscv64 ALU ops, from Luke.
5) Fix a bpftool json JIT dump crash on powerpc, from Jiri.
6) Fix an AF_XDP race in generic XDP's receive path, from Ilya.
7) Various smaller fixes from Ilya, Yue and Arnd.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull misc keyring updates from David Howells:
"These are some miscellaneous keyrings fixes and improvements:
- Fix a bunch of warnings from sparse, including missing RCU bits and
kdoc-function argument mismatches
- Implement a keyctl to allow a key to be moved from one keyring to
another, with the option of prohibiting key replacement in the
destination keyring.
- Grant Link permission to possessors of request_key_auth tokens so
that upcall servicing daemons can more easily arrange things such
that only the necessary auth key is passed to the actual service
program, and not all the auth keys a daemon might possesss.
- Improvement in lookup_user_key().
- Implement a keyctl to allow keyrings subsystem capabilities to be
queried.
The keyutils next branch has commits to make available, document and
test the move-key and capabilities code:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log
They're currently on the 'next' branch"
* tag 'keys-misc-20190619' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Add capability-checking keyctl function
keys: Reuse keyring_index_key::desc_len in lookup_user_key()
keys: Grant Link permission to possessers of request_key auth keys
keys: Add a keyctl to move a key between keyrings
keys: Hoist locking out of __key_link_begin()
keys: Break bits out of key_unlink()
keys: Change keyring_serialise_link_sem to a mutex
keys: sparse: Fix kdoc mismatches
keys: sparse: Fix incorrect RCU accesses
keys: sparse: Fix key_fs[ug]id_changed()
Pull audit updates from Paul Moore:
"This pull request is a bit early, but with some vacation time coming
up I wanted to send this out now just in case the remote Internet Gods
decide not to smile on me once the merge window opens. The patchset
for v5.3 is pretty minor this time, the highlights include:
- When the audit daemon is sent a signal, ensure we deliver
information about the sender even when syscall auditing is not
enabled/supported.
- Add the ability to filter audit records based on network address
family.
- Tighten the audit field filtering restrictions on string based
fields.
- Cleanup the audit field filtering verification code.
- Remove a few BUG() calls from the audit code"
* tag 'audit-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove the BUG() calls in the audit rule comparison functions
audit: enforce op for string fields
audit: add saddr_fam filter field
audit: re-structure audit field valid checks
audit: deliver signal_info regarless of syscall
Added parameter in ib_device for enabling dynamic interrupt moderation so
that it can be configured in userspace using rdma tool.
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]
Please set on/off.
rdma dev show
0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu.
2) Support for bridge pvid matching, from wenxu.
3) Support for bridge vlan protocol matching, also from wenxu.
4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid
from packet path.
5) Prefer specific family extension in nf_tables.
6) Autoload specific family extension in case it is missing.
7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera.
8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko.
9) ICMP handling for GRE encapsulation, from Julian Anastasov.
10) Remove unused parameter in nf_queue, from Florian Westphal.
11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring.
12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes
public.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce
a new struct with the same layout and use it instead.
Fixes: dabac6e460 ("IB/hfi1: Move receive work queue struct into uapi directory")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since commit cd17d77705 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:
# 17: (18) r2 = 0x100000000000000
# ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
# 19: (7b) *(u64 *)(r1 +16) = r2
# invalid bpf_context access off=16 size=8
>From the compiler point of view it does look like a correct thing
to do, so let's support it on the kernel side.
Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: cd17d77705 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ASoC: Updates for v5.3
This is a very big update, mainly thanks to Morimoto-san's refactoring
work and some fairly large new drivers.
- Lots more work on moving towards a component based framework from
Morimoto-san.
- Support for force disconnecting muxes from Jerome Brunet.
- New drivers for Cirrus Logic CS47L35, CS47L85 and CS47L90, Conexant
CX2072X, Realtek RT1011 and RT1308.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linux kernel tolerates C++ style comments these days. Actually, the
SPDX License tags for .c files start with //.
On the other hand, uapi headers are written in more strict C, where
the C++ comment style is forbidden.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
This patch adds virtio-pmem driver for KVM guest.
Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.
This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jakub Staron <jstaron@google.com>
Tested-by: Jakub Staron <jstaron@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch allows you to match on bridge vlan protocol, eg.
nft add rule bridge firewall zones counter meta ibrvproto 0x8100
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows you to match on the bridge port pvid, eg.
nft add rule bridge firewall zones counter meta ibrpvid 10
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add synproxy support for nf_tables. This behaves like the iptables
synproxy target but it is structured in a way that allows us to propose
improvements in the future.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Provide an option to allow users to manually bind a qp with a counter
through RDMA netlink. Limit it to users with ADMIN capability only.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In manual mode a QP is bound to a counter manually. If counter is not
specified then a new one will be allocated.
Manual mode is enabled when user binds a QP, and disabled when the last
manually bound QP is unbound.
When auto-mode is turned off and there are counters left, manual mode is
enabled so that the user is able to access these counters.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds the ability to return all available counters together with
their properties and hwstats.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Provide an option to enable/disable per-port counter auto mode through
RDMA netlink. Limit it to users with ADMIN capability only.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Extension Unit (XU) is used to have a compatible layout with
Processing Unit (PU) on UAC1, and the usb-audio driver code assumed it
for parsing the descriptors. Meanwhile, on UAC2, XU became slightly
incompatible with PU; namely, XU has a one-byte bmControls bitmap
while PU has two bytes bmControls bitmap. This incompatibility
results in the read of a wrong address for the last iExtension field,
which ended up with an incorrect string for the mixer element name, as
recently reported for Focusrite Scarlett 18i20 device.
This patch corrects this misalignment by introducing a couple of new
macros and calling them depending on the descriptor type.
Fixes: 23caaf19b1 ("ALSA: usb-mixer: Add support for Audio Class v2.0")
Reported-by: Stefan Sauer <ensonic@hora-obscura.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-03
The following pull-request contains BPF updates for your *net-next* tree.
There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:
** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
static int mlx5e_open_cq(struct mlx5e_channel *c,
struct dim_cq_moder moder,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq)
=======
int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
>>>>>>> e5a3e259ef
Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...
drivers/net/ethernet/mellanox/mlx5/core/en.h +977
... and in mlx5e_open_xsk() ...
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64
... needs the same rename from net_dim_cq_moder into dim_cq_moder.
** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
struct dim_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
=======
struct net_dim_cq_moder icocq_moder = {0, 0};
>>>>>>> e5a3e259ef
Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.
Let me know if you run into any issues. Anyway, the main changes are:
1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
2) Addition of two new per-cgroup BPF hooks for getsockopt and
setsockopt along with a new sockopt program type which allows more
fine-grained pass/reject settings for containers. Also add a sock_ops
callback that can be selectively enabled on a per-socket basis and is
executed for every RTT to help tracking TCP statistics, both features
from Stanislav.
3) Follow-up fix from loops in precision tracking which was not propagating
precision marks and as a result verifier assumed that some branches were
not taken and therefore wrongly removed as dead code, from Alexei.
4) Fix BPF cgroup release synchronization race which could lead to a
double-free if a leaf's cgroup_bpf object is released and a new BPF
program is attached to the one of ancestor cgroups in parallel, from Roman.
5) Support for bulking XDP_TX on veth devices which improves performance
in some cases by around 9%, from Toshiaki.
6) Allow for lookups into BPF devmap and improve feedback when calling into
bpf_redirect_map() as lookup is now performed right away in the helper
itself, from Toke.
7) Add support for fq's Earliest Departure Time to the Host Bandwidth
Manager (HBM) sample BPF program, from Lawrence.
8) Various cleanups and minor fixes all over the place from many others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, gratuitous ARP/ND packets are sent every `miimon'
milliseconds. This commit allows a user to specify a custom delay
through a new option, `peer_notif_delay'.
Like for `updelay' and `downdelay', this delay should be a multiple of
`miimon' to avoid managing an additional work queue. The configuration
logic is copied from `updelay' and `downdelay'. However, the default
value cannot be set using a module parameter: Netlink or sysfs should
be used to configure this feature.
When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
observe the 500 ms delay is respected:
20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
In bond_mii_monitor(), I have tried to keep the lock logic readable.
The change is due to the fact we cannot rely on a notification to
lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
only triggered once every N times, while we need to decrement the
counter each time.
iproute2 also needs to be updated to be able to specify this new
attribute through `ip link'.
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
windows real servers can handle gre tunnels, this patch allows
gre encapsulation with the tunneling method, thereby letting ipvs
be load balancer for windows-based services
Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Uppercase is a reminiscence from the iptables infrastructure, rename
this header before this is included in stable kernels.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Implement DEVX dispatching event by looking up for the applicable
subscriptions for the reported event and using their target fd to
signal/set the event.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Enable subscription for device events over DEVX.
Each subscription is added to the two level xarray data structure
according to its event number and the DEVX object information in case was
given with the given target fd.
Those events will be reported over the given fd once will occur.
Downstream patches will mange the dispatching to any subscription.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial
implementation.
This object is from type class FD and will be used to read DEVX
async events.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-07-03
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix the interpreter to properly handle BPF_ALU32 | BPF_ARSH
on BE architectures, from Jiong.
2) Fix several bugs in the x32 BPF JIT for handling shifts by 0,
from Luke and Xi.
3) Fix NULL pointer deref in btf_type_is_resolve_source_only(),
from Stanislav.
4) Properly handle the check that forwarding is enabled on the device
in bpf_ipv6_fib_lookup() helper code, from Anton.
5) Fix UAPI bpf_prog_info fields alignment for archs that have 16 bit
alignment such as m68k, from Baruch.
6) Fix kernel hanging in unregister_netdevice loop while unregistering
device bound to XDP socket, from Ilya.
7) Properly terminate tail update in xskq_produce_flush_desc(), from Nathan.
8) Fix broken always_inline handling in test_lwt_seg6local, from Jiri.
9) Fix bpftool to use correct argument in cgroup errors, from Jakub.
10) Fix detaching dummy prog in XDP redirect sample code, from Prashant.
11) Add Jonathan to AF_XDP reviewers, from Björn.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>