There are two potential problems with the existing implementation.
1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.
Introduce a lock and perform error checking.
Fixes: a6f7d2aff6 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If two programs simultaneously try to write to the same part of a file
via direct IO and buffered IO, there's a chance that the post-diowrite
pagecache invalidation will fail on the dirty page. When this happens,
the dio write succeeded, which means that the page cache is no longer
coherent with the disk!
Programs are not supposed to mix IO types and this is a clear case of
data corruption, so store an EIO which will be reflected to userspace
during the next fsync. Replace the WARN_ON with a ratelimited pr_crit
so that the developers have /some/ kind of breadcrumb to track down the
offending program(s) and file(s) involved.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Add a missing command interface to work with a DCT. It includes: creating,
destroying and get events for.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The matching of the counters was not taken into account, fixed.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This abstraction has no clients anymore, remove it.
This is what remains from previous authors, so correct copyright
statement after recent modifications and code removal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We cannot make a direct call to nf_ip6_reroute() because that would result
in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define reroute indirection in nf_ipv6_ops where this really
belongs to.
For IPv4, we can indeed make a direct function call, which is faster,
given IPv4 is built-in in the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline
stub for IPv4 in such case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We cannot make a direct call to nf_ip6_route() because that would result
in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define route indirection in nf_ipv6_ops where this really
belongs to.
For IPv4, we can indeed make a direct function call, which is faster,
given IPv4 is built-in in the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline
stub for IPv4 in such case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is only used by nf_queue.c and this function comes with no symbol
dependencies with IPv6, it just refers to structure layouts. Therefore,
we can replace it by a direct function call from where it belongs.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We cannot make a direct call to nf_ip6_checksum_partial() because that
would result in autoloading the 'ipv6' module because of symbol
dependencies. Therefore, define checksum_partial indirection in
nf_ipv6_ops where this really belongs to.
For IPv4, we can indeed make a direct function call, which is faster,
given IPv4 is built-in in the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline
stub for IPv4 in such case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We cannot make a direct call to nf_ip6_checksum() because that would
result in autoloading the 'ipv6' module because of symbol dependencies.
Therefore, define checksum indirection in nf_ipv6_ops where this really
belongs to.
For IPv4, we can indeed make a direct function call, which is faster,
given IPv4 is built-in in the networking code by default. Still,
CONFIG_INET=n and CONFIG_NETFILTER=y is possible, so define empty inline
stub for IPv4 in such case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netfilter NAT core cannot deal with more than one NAT hook per hook
location (prerouting, input ...), because the NAT hooks install a NAT null
binding in case the iptables nat table (iptable_nat hooks) or the
corresponding nftables chain (nft nat hooks) doesn't specify a nat
transformation.
Null bindings are needed to detect port collsisions between NAT-ed and
non-NAT-ed connections.
This causes nftables NAT rules to not work when iptable_nat module is
loaded, and vice versa because nat binding has already been attached
when the second nat hook is consulted.
The netfilter core is not really the correct location to handle this
(hooks are just hooks, the core has no notion of what kinds of side
effects a hook implements), but its the only place where we can check
for conflicts between both iptables hooks and nftables hooks without
adding dependencies.
So add nat annotation to hook_ops to describe those hooks that will
add NAT bindings and then make core reject if such a hook already exists.
The annotation fills a padding hole, in case further restrictions appar
we might change this to a 'u8 type' instead of bool.
iptables error if nft nat hook active:
iptables -t nat -A POSTROUTING -j MASQUERADE
iptables v1.4.21: can't initialize iptables table `nat': File exists
Perhaps iptables or your kernel needs to be upgraded.
nftables error if iptables nat table present:
nft -f /etc/nftables/ipv4-nat
/usr/etc/nftables/ipv4-nat:3:1-2: Error: Could not process rule: File exists
table nat {
^^
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
currently we always return -ENOENT to userspace if we can't find
a particular table, or if the table initialization fails.
Followup patch will make nat table init fail in case nftables already
registered a nat hook so this change makes xt_find_table_lock return
an ERR_PTR to return the errno value reported from the table init
function.
Add xt_request_find_table_lock as try_then_request_module replacement
and use it where needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This can be same as NF_INET_NUMHOOKS if we don't support DECNET.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need to define hook points if the family isn't supported.
Because we need these hooks for either nftables, arp/ebtables
or the 'call-iptables' hack we have in the bridge layer add two
new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE}, and have the
users select them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
no need to define hook points if the family isn't supported.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The kernel already has defines for this, but they are in uapi exposed
headers.
Including these from netns.h causes build errors and also adds unneeded
dependencies on heads that we don't need.
So move these defines to netfilter_defs.h and place the uapi ones
in ifndef __KERNEL__ to keep them for userspace.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
struct net contains:
struct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
which store the hook entry point locations for the various protocol
families and the hooks.
Using array results in compact c code when doing accesses, i.e.
x = rcu_dereference(net->nf.hooks[pf][hook]);
but its also wasting a lot of memory, as most families are
not used.
So split the array into those families that are used, which
are only 5 (instead of 13). In most cases, the 'pf' argument is
constant, i.e. gcc removes switch statement.
struct net before:
/* size: 5184, cachelines: 81, members: 46 */
after:
/* size: 4672, cachelines: 73, members: 46 */
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Giuseppe Scrivano says:
"SELinux, if enabled, registers for each new network namespace 6
netfilter hooks."
Cost for this is high. With synchronize_net() removed:
"The net benefit on an SMP machine with two cores is that creating a
new network namespace takes -40% of the original time."
This patch replaces synchronize_net+kvfree with call_rcu().
We store rcu_head at the tail of a structure that has no fixed layout,
i.e. we cannot use offsetof() to compute the start of the original
allocation. Thus store this information right after the rcu head.
We could simplify this by just placing the rcu_head at the start
of struct nf_hook_entries. However, this structure is used in
packet processing hotpath, so only place what is needed for that
at the beginning of the struct.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Kernel-doc for @use_count does not currently have a field identifier.
All the rest of the fields do. @use_count is used internally and should
not be accessed directly by the driver so it should be marked as so.
Add [INTERN] identifier to @use_count field.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building kernel documentation sphinx emits the following warning
warning: No description found for parameter 'owner'
Add description for struct member 'owner'.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building kernel documentation sphinx emits the following warnings
No description found for parameter 'iio_dev'
Excess function parameter 'indio_dev' description in 'iio_device_register'
These warnings are caused by a macro with a different argument identifier to the
one listed in the kernel-docs.
Change macro argument to be the same as the docs.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
memblock_virt_alloc() works for both memblock and bootmem, so use it and
make early_init_dt_alloc_memory_arch a static function. The arches using
bootmem define early_init_dt_alloc_memory_arch as either:
__alloc_bootmem(size, align, __pa(MAX_DMA_ADDRESS))
or:
alloc_bootmem_align(size, align)
Both of these evaluate to the same thing as does memblock_virt_alloc for
bootmem. So we can disable the arch specific functions by making
early_init_dt_alloc_memory_arch static and they can be removed in
subsequent commits.
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Chanwoo writes:
Update extcon for 4.16
Detailed description for this pull request:
1. Add support to notify USB connector for "ChromeOS Embedded Controller".
- extcon-usbc-cros-ec driver detects the EXTCON_USB and EXTCON_USB_HOST
connector type and then notify the state/properties to the consumer device.
2. Update the detection on probe time and clean-up code for "X-Power AXP288".
- Detect the state of connector after a couple of seconds after probe()
becasue extcon-axp288.c driver depends on other device driver like mux.
In order to guarantee the correct state, the extcon-axp288.c uses the
delayed_work.
- Set EXTCON_CHG_USB_SDP type as the safe default type if unknown connector
is attached because the data sheet of axp288 doesn't handle
the all exception cases.
- Remove unused code
3. Fix the minor issue of extcon driver
- Fix platform get_irq's error checking for extcon-adc-jack.
- Delete unneeded initialization for extcon-max8997/max77693.
Felipe writes:
usb: changes for v4.16 merge window
Not many changes here, the most important being an improvement for TI's
AM57xx and DRA7xx devices which allows them to disable a metastability
workaround in situations where we know what's going on.
Other than that, we have a set of changes on Renesas UDC to make the
code a little easier to read and maintain while also better supporting
extcon framework.
The u_serial adaptation layer learned to use kfifo instead of cooking
its own FIFO implementation.
DWC3 learned to decode a few more USB requests on the trace output.
All zero read and write masks in the regmap config are used to signal no
special mask is needed and the bus defaults are used. In some devices
all zero read/write masks are the special mask and bus defaults should
not be used. To signal this a new variable is added.
For example SPI often sets bit 7 in address to signal to the device a
read is requested. On TI AFE44xx parts with SPI interfaces no bit
needs to be set as registers are either read or write only and the
operation can be determined from the address only. For this case both
masks must be zero to not effect the address.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The powerhold mask for TPS65917 is different when comapred to
the other palmas versions. Hence assign the right mask that enables
power off of tps65917 pmic correctly.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Commit 5e572cab92 ("tpm: Enable CLKRUN protocol for Braswell
systems") disabled CLKRUN protocol during TPM transactions and re-enabled
once the transaction is completed. But there were still some corner cases
observed where, reading of TPM header failed for savestate command
while going to suspend, which resulted in suspend failure.
To fix this issue keep the CLKRUN protocol disabled for the entire
duration of a single TPM command and not disabling and re-enabling
again for every TPM transaction. For the other TPM accesses outside
TPM command flow, add a higher level of disabling and re-enabling
the CLKRUN protocol, instead of doing for every TPM transaction.
Fixes: 5e572cab92 ("tpm: Enable CLKRUN protocol for Braswell systems")
Signed-off-by: Azhar Shaikh <azhar.shaikh@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Device number (the character device index) is not a stable identifier
for a TPM chip. That is the reason why every call site passes
TPM_ANY_NUM to tpm_chip_find_get().
This commit changes the API in a way that instead a struct tpm_chip
instance is given and NULL means the default chip. In addition, this
commit refines the documentation to be up to date with the
implementation.
Suggested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> (@chip_num -> @chip part)
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@ziepe.ca>
Tested-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
With TPM 2.0 specification, the event logs may only be accessible by
calling an EFI Boot Service. Modify the EFI stub to copy the log area to
a new Linux-specific EFI configuration table so it remains accessible
once booted.
When calling this service, it is possible to specify the expected format
of the logs: TPM 1.2 (SHA1) or TPM 2.0 ("Crypto Agile"). For now, only the
first format is retrieved.
Signed-off-by: Thiebaud Weksteen <tweek@google.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Javier Martinez Canillas <javierm@redhat.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Add a driver for RAVE Supervisory Processor, an MCU implementing
various bits of housekeeping functionality (watchdoging, backlight
control, LED control, etc) on RAVE family of products by Zodiac
Inflight Innovations.
This driver implementes core MFD/serdev device as well as
communication subroutines necessary for commanding the device.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a start of a framework for extending struct xdp_buff without
having the overhead of populating every data at runtime. Idea
is to have a new per-queue struct xdp_rxq_info that holds read
mostly data (currently that is, queue number and a pointer to
the corresponding netdev) which is set up during rxqueue config
time. When a XDP program is invoked, struct xdp_buff holds a
pointer to struct xdp_rxq_info that the BPF program can then
walk. The user facing BPF program that uses struct xdp_md for
context can use these members directly, and the verifier rewrites
context access transparently by walking the xdp_rxq_info and
net_device pointers to load the data, from Jesper.
2) Redo the reporting of offload device information to user space
such that it works in combination with network namespaces. The
latter is reported through a device/inode tuple as similarly
done in other subsystems as well (e.g. perf) in order to identify
the namespace. For this to work, ns_get_path() has been generalized
such that the namespace can be retrieved not only from a specific
task (perf case), but also from a callback where we deduce the
netns (ns_common) from a netdevice. bpftool support using the new
uapi info and extensive test cases for test_offload.py in BPF
selftests have been added as well, from Jakub.
3) Add two bpftool improvements: i) properly report the bpftool
version such that it corresponds to the version from the kernel
source tree. So pick the right linux/version.h from the source
tree instead of the installed one. ii) fix bpftool and also
bpf_jit_disasm build with bintutils >= 2.9. The reason for the
build breakage is that binutils library changed the function
signature to select the disassembler. Given this is needed in
multiple tools, add a proper feature detection to the
tools/build/features infrastructure, from Roman.
4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
stacktrace map. It is currently unimplemented, but there are
use cases where user space needs to walk all stacktrace map
entries e.g. for dumping or deleting map entries w/o having to
close and recreate the map. Add BPF selftests along with it,
from Yonghong.
5) Few follow-up cleanups for the bpftool cgroup code: i) rename
the cgroup 'list' command into 'show' as we have it for other
subcommands as well, ii) then alias the 'show' command such that
'list' is accepted which is also common practice in iproute2,
and iii) remove couple of newlines from error messages using
p_err(), from Jakub.
6) Two follow-up cleanups to sockmap code: i) remove the unused
bpf_compute_data_end_sk_skb() function and ii) only build the
sockmap infrastructure when CONFIG_INET is enabled since it's
only aware of TCP sockets at this time, from John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ext4 needs to pass through error from its iomap handler to the page
fault handler so that it can properly detect ENOSPC and force
transaction commit and retry the fault (and block allocation). Add
argument to dax_iomap_fault() for passing such error.
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
similar to memdup_user(), but does *not* guarantee that result will
be physically contiguous; use only in cases where that's not a requirement
and free it with kvfree().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs fixes from Al Viro:
- untangle sys_close() abuses in xt_bpf
- deal with register_shrinker() failures in sget()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'"
sget(): handle failures of register_shrinker()
mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
eventfd_ctx_get() is not used outside of eventfd.c, so unexport it and
fold it into eventfd_ctx_fileget().
(eventfd_ctx_get() was apparently added years ago for KVM irqfd's, but
was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
eventfd_ctx_read() is not used outside of eventfd.c, so unexport it and
fold it into eventfd_read(). This slightly simplifies the code and
makes it more analogous to eventfd_write().
(eventfd_ctx_read() was apparently added years ago for KVM irqfd's, but
was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nothing actually calls eventfd_file_create() besides the eventfd2()
system call itself. So simplify things by folding it into the system
call and using anon_inode_getfd() instead of anon_inode_getfile(). This
removes over 40 lines with no change in functionality.
(eventfd_file_create() was apparently added years ago for KVM irqfd's,
but was never used.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>