OF graph want to count its endpoint number, same as
of_get_child_count(). This patch adds of_graph_get_endpoint_count()
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Linux kernel already has of_graph_get_remote_port_parent(),
but, sometimes we want to get own port parent.
This patch adds of_graph_get_port_parent()
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
It should use same method to get same result.
To getting remote-endpoint node,
let's use of_graph_get_remote_endpoint()
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The markup inside the #if 0 comment actually refers to a
kernel-doc markup. As we're getting rid of DocBook update it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The PCM notifier code is used only by OSS emulation layer, so we can
build it conditionally for reducing the size.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The ak4113, ak4114 and ak4117 i2c drivers have some tricky codes to
access the struct fields in the callback. This can be simplified by
replacing the struct fields with the array.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This reverts commit 51f5677777 "nfsd: check for oversized NFSv2/v3
arguments", which breaks support for NFSv3 ACLs.
That patch was actually an earlier draft of a fix for the problem that
was eventually fixed by e6838a29ec "nfsd: check for oversized NFSv2/v3
arguments". But somehow I accidentally left this earlier draft in the
branch that was part of my 2.12 pull request.
Reported-by: Eryu Guan <eguan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
commit fa8cddaf90 ("net phylib: Remove unnecessary condition check in phy")
removed the only place where the PHY flag PHY_HAS_MAGICANEG was
checked. But it left the flag being set in the drivers. Remove the flag.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
BBR congestion control depends on pacing, and pacing is
currently handled by sch_fq packet scheduler for performance reasons,
and also because implemening pacing with FQ was convenient to truly
avoid bursts.
However there are many cases where this packet scheduler constraint
is not practical.
- Many linux hosts are not focusing on handling thousands of TCP
flows in the most efficient way.
- Some routers use fq_codel or other AQM, but still would like
to use BBR for the few TCP flows they initiate/terminate.
This patch implements an automatic fallback to internal pacing.
Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option.
If sch_fq happens to be in the egress path, pacing is delegated to
the qdisc, otherwise pacing is done by TCP itself.
One advantage of pacing from TCP stack is to get more precise rtt
estimations, and less work done from TX completion, since TCP Small
queue limits are not generally hit. Setups with single TX queue but
many cpus might even benefit from this.
Note that unlike sch_fq, we do not take into account header sizes.
Taking care of these headers would add additional complexity for
no practical differences in behavior.
Some performance numbers using 800 TCP_STREAM flows rate limited to
~48 Mbit per second on 40Gbit NIC.
If MQ+pfifo_fast is used on the NIC :
$ sar -n DEV 1 5 | grep eth
14:48:44 eth0 725743.00 2932134.00 46776.76 4335184.68 0.00 0.00 1.00
14:48:45 eth0 725349.00 2932112.00 46751.86 4335158.90 0.00 0.00 0.00
14:48:46 eth0 725101.00 2931153.00 46735.07 4333748.63 0.00 0.00 0.00
14:48:47 eth0 725099.00 2931161.00 46735.11 4333760.44 0.00 0.00 1.00
14:48:48 eth0 725160.00 2931731.00 46738.88 4334606.07 0.00 0.00 0.00
Average: eth0 725290.40 2931658.20 46747.54 4334491.74 0.00 0.00 0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 0 259825920 45644 2708324 0 0 21 2 247 98 0 0 100 0 0
4 0 0 259823744 45644 2708356 0 0 0 0 2400825 159843 0 19 81 0 0
0 0 0 259824208 45644 2708072 0 0 0 0 2407351 159929 0 19 81 0 0
1 0 0 259824592 45644 2708128 0 0 0 0 2405183 160386 0 19 80 0 0
1 0 0 259824272 45644 2707868 0 0 0 32 2396361 158037 0 19 81 0 0
Now use MQ+FQ :
lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc
lpaa23:~# tc qdisc replace dev eth0 root mq
$ sar -n DEV 1 5 | grep eth
14:49:57 eth0 678614.00 2727930.00 43739.13 4033279.14 0.00 0.00 0.00
14:49:58 eth0 677620.00 2723971.00 43674.69 4027429.62 0.00 0.00 1.00
14:49:59 eth0 676396.00 2719050.00 43596.83 4020125.02 0.00 0.00 0.00
14:50:00 eth0 675197.00 2714173.00 43518.62 4012938.90 0.00 0.00 1.00
14:50:01 eth0 676388.00 2719063.00 43595.47 4020171.64 0.00 0.00 0.00
Average: eth0 676843.00 2720837.40 43624.95 4022788.86 0.00 0.00 0.40
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 259832240 46008 2710912 0 0 21 2 223 192 0 1 99 0 0
1 0 0 259832896 46008 2710744 0 0 0 0 1702206 198078 0 17 82 0 0
0 0 0 259830272 46008 2710596 0 0 0 0 1696340 197756 1 17 83 0 0
4 0 0 259829168 46024 2710584 0 0 16 0 1688472 197158 1 17 82 0 0
3 0 0 259830224 46024 2710408 0 0 0 0 1692450 197212 0 18 82 0 0
As expected, number of interrupts per second is very different.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
under udp flood the sk_receive_queue spinlock is heavily contended.
This patch try to reduce the contention on such lock adding a
second receive queue to the udp sockets; recvmsg() looks first
in such queue and, only if empty, tries to fetch the data from
sk_receive_queue. The latter is spliced into the newly added
queue every time the receive path has to acquire the
sk_receive_queue lock.
The accounting of forward allocated memory is still protected with
the sk_receive_queue lock, so udp_rmem_release() needs to acquire
both locks when the forward deficit is flushed.
On specific scenarios we can end up acquiring and releasing the
sk_receive_queue lock multiple times; that will be covered by
the next patch
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And update __sk_queue_drop_skb() to work on the specified queue.
This will help the udp protocol to use an additional private
rx queue in a later patch.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HID sensor hubs using Integrated Senor Hub (ISH) has added capability to
support batch mode. This allows host processor to go to sleep for extended
duration, while the sensor hub is storing samples in its internal buffers.
'Commit f4f4673b75 ("iio: add support for hardware fifo")' implements
feature in IIO core to implement such feature. This feature is used in
bmc150-accel-core.c to implement batch mode. This implementation allows
software device buffer watermark to be used as a hint to adjust hardware
FIFO.
But HID sensor hubs don't allow to change internal buffer size of FIFOs.
Instead an additional usage id to set "maximum report latency" is defined.
This allows host to go to sleep upto this latency period without getting
any report. Since there is no ABI to set this latency, a new attribute
"hwfifo_timeout" is added so that user mode can specify a latency.
This change checks presence of usage id to get/set maximum report latency
and if present, it will expose hwfifo_timeout.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
The libata documentation is now using ReST. Update references
to it to point to the new place.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This adds spk_ttyio.c file. It contains a set of functions which implement
those methods in spk_synth struct which relate to sending bytes out using
serial comms. Implementations in this file perform the same function but
using TTY subsystem instead. Currently synths access serial ports, directly
poking standard ISA ports by trying to steal them from serial driver. Some ISA
cards actually need this way of doing it, but most other synthesizers don't,
and can actually work by using the proper TTY subsystem through a new N_SPEAKUP
line discipline. So this adds the methods for drivers to switch to accessing
serial ports through the TTY subsystem, whenever appropriate.
Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This exports tty_open_by_driver so that it can be called from other
places inside the kernel. The checks for null file pointer are based on
Alan Cox's patch here:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1215095.html.
Description below is quoted from it:
"[RFC] tty_port: allow a port to be opened with a tty that has no file handle
Let us create tty objects entirely in kernel space. Untested proposal to
show why all the ideas around rewriting half the uart stack are not needed.
With this a kernel created non file backed tty object could be used to handle
data, and set terminal modes. Not all ldiscs can cope with this as N_TTY in
particular has to work back to the fs/tty layer.
The tty_port code is however otherwise clean of file handles as far as I can
tell as is the low level tty port write path used by the ldisc, the
configuration low level interfaces and most of the ldiscs.
Currently you don't have any exposure to see tty hangups because those are
built around the file layer. However a) it's a fixed port so you probably
don't care about that b) if you do we can add a callback and c) you almost
certainly don't want the userspace tear down/rebuild behaviour anyway.
This should however be sufficient if we wanted for example to enumerate all
the bluetooth bound fixed ports via ACPI and make them directly available.
It doesn't deal with the case of a user opening a port that's also kernel
opened and that would need some locking out (so it returned EBUSY if bound
to a kernel device of some kind). That needs resolving along with how you
"up" or "down" your new bluetooth device, or enumerate it while providing
the existing tty API to avoid regressions (and to debug)."
The exported funtion is used later in this patch set to gain access to tty_struct.
[changed export symbol level - gkh]
Signed-off-by: Okash Khawaja <okash.khawaja@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The filesystem documentation was moved from DocBook to
Documentation/filesystems/. Update it at the sources.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The libata documentation is now using ReST. Update references
to it to point to the new place.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The "%" escape code of kernel-doc only handle letters. It
doesn't handle special chars. So, use the ``literal``
notation. That fixes this warning:
./include/linux/skbuff.h:2695: WARNING: Inline literal start-string without end-string.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
There are a few issues on some kernel-doc markups that was
causing troubles with kernel-doc output on ReST format:
./kernel/futex.c:492: WARNING: Inline emphasis start-string without end-string.
./kernel/futex.c:1264: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:1721: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2338: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2426: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2899: WARNING: Block quote ends without a blank line; unexpected unindent.
./kernel/futex.c:2972: WARNING: Block quote ends without a blank line; unexpected unindent.
Fix them.
No functional changes.
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The info->target comes from userspace and it would be used directly.
So we need to add the sanity check to make sure it is a valid standard
target, although the ebtables tool has already checked it. Kernel needs
to validate anything coming from userspace.
If the target is set as an evil value, it would break the ebtables
and cause a panic. Because the non-standard target is treated as one
offset.
Now add one helper function ebt_invalid_target, and we would replace
the macro INVALID_TARGET later.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework pass values from raid6_gfexp as coefficients
for each source to prep_dma_pq() callback of DMA channel with PQ
capability. This creates problem for RAID6 offload engines (such as
Broadcom SBA) which take disk position (i.e. log of {2}) instead of
multiplicative cofficients from raid6_gfexp table.
This patch adds raid6_gflog table having log-of-2 value for any given
x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. The RAID6
offload engine driver can use this newly added raid6_gflog table to
get disk position from multiplicative coefficient.
Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Acked-by: Shaohua Li <shli@fb.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Pull networking fixes from David Miller:
1) Track alignment in BPF verifier so that legitimate programs won't be
rejected on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.
2) Make tail calls work properly in arm64 BPF JIT, from Deniel
Borkmann.
3) Make the configuration and semantics Generic XDP make more sense and
don't allow both generic XDP and a driver specific instance to be
active at the same time. Also from Daniel.
4) Don't crash on resume in xen-netfront, from Vitaly Kuznetsov.
5) Fix use-after-free in VRF driver, from Gao Feng.
6) Use netdev_alloc_skb_ip_align() to avoid unaligned IP headers in
qca_spi driver, from Stefan Wahren.
7) Always run cleanup routines in BPF samples when we get SIGTERM, from
Andy Gospodarek.
8) The mdio phy code should bring PHYs out of reset using the shared
GPIO lines before invoking bus->reset(). From Florian Fainelli.
9) Some USB descriptor access endian fixes in various drivers from
Johan Hovold.
10) Handle PAUSE advertisements properly in mlx5 driver, from Gal
Pressman.
11) Fix reversed test in mlx5e_setup_tc(), from Saeed Mahameed.
12) Cure netdev leak in AF_PACKET when using timestamping via control
messages. From Douglas Caetano dos Santos.
13) netcp doesn't support HWTSTAMP_FILTER_ALl, reject it. From Miroslav
Lichvar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
ldmvsw: stop the clean timer at beginning of remove
ldmvsw: unregistering netdev before disable hardware
net: netcp: fix check of requested timestamping filter
ipv6: avoid dad-failures for addresses with NODAD
qed: Fix uninitialized data in aRFS infrastructure
mdio: mux: fix device_node_continue.cocci warnings
net/packet: fix missing net_device reference release
net/mlx4_core: Use min3 to select number of MSI-X vectors
macvlan: Fix performance issues with vlan tagged packets
net: stmmac: use correct pointer when printing normal descriptor ring
net/mlx5: Use underlay QPN from the root name space
net/mlx5e: IPoIB, Only support regular RQ for now
net/mlx5e: Fix setup TC ndo
net/mlx5e: Fix ethtool pause support and advertise reporting
net/mlx5e: Use the correct pause values for ethtool advertising
vmxnet3: ensure that adapter is in proper state during force_close
sfc: revert changes to NIC revision numbers
net: ch9200: add missing USB-descriptor endianness conversions
net: irda: irda-usb: fix firmware name on big-endian hosts
net: dsa: mv88e6xxx: add default case to switch
...
This patch starts adding support to Octo SPI protocols (SPI x-y-8).
Op codes for Fast Read and/or Page Program operations using Octo SPI
protocols are not known yet (no JEDEC specification has defined them yet)
but we'd rather introduce the Octo SPI protocols now so it's done as it
should be.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
This patch introduces support to Double Transfer Rate (DTR) SPI protocols.
DTR is used only for Fast Read operations.
According to manufacturer datasheets, whatever the number of I/O lines
used during instruction (x) and address/mode/dummy (y) clock cycles, DTR
is used only during data (z) clock cycles of SPI x-y-z protocols.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
This patch changes the prototype of spi_nor_scan(): its 3rd parameter
is replaced by a 'struct spi_nor_hwcaps' pointer, which tells the spi-nor
framework about the actual hardware capabilities supported by the SPI
controller and its driver.
Besides, this patch also introduces a new 'struct spi_nor_flash_parameter'
telling the spi-nor framework about the hardware capabilities supported by
the SPI flash memory and the associated settings required to use those
hardware caps.
Then, to improve the readability of spi_nor_scan(), the discovery of the
memory settings and the memory initialization are now split into two
dedicated functions.
1 - spi_nor_init_params()
The spi_nor_init_params() function is responsible for initializing the
'struct spi_nor_flash_parameter'. Currently this structure is filled with
legacy values but further patches will allow to override some parameter
values dynamically, for instance by reading the JESD216 Serial Flash
Discoverable Parameter (SFDP) tables from the SPI memory.
The spi_nor_init_params() function only deals with the hardware
capabilities of the SPI flash memory: especially it doesn't care about
the hardware capabilities supported by the SPI controller.
2 - spi_nor_setup()
The second function is called once the 'struct spi_nor_flash_parameter'
has been initialized by spi_nor_init_params().
With both 'struct spi_nor_flash_parameter' and 'struct spi_nor_hwcaps',
the new argument of spi_nor_scan(), spi_nor_setup() computes the best
match between hardware caps supported by both the (Q)SPI memory and
controller hence selecting the relevant settings for (Fast) Read and Page
Program operations.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reviewed-by: Marek Vasut <marek.vasut@gmail.com>
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pass struct rpc_request as the first argument instead of an untyped blob,
and mark the data object as const.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
AVR32 is gone. Now it's time to clean up the driver by removing
leftovers that was used by AVR32 related code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The nand_read_page_raw() and nand_write_page_raw() functions might be
re-used by vendor-specific implementations of the read_page/write_page
functions. Instead of having vendor-specific code duplicate this code,
it is much better to export those functions and allow them to be
re-used.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
A number of NAND flashes have a capability called "on-die ECC" where the
NAND chip itself is capable of detecting and correcting errors.
Linux already has support for using the ECC implementation of the NAND
controller, or a software based ECC implementation, but not for using
the ECC implementation of the NAND controller. However, such an
implementation is sometimes useful in situations where the NAND
controller provides ECC algorithms that are not strong enough for the
NAND chip used on the system. A typical case is a NAND chip that
requires a 4-bit ECC, while the NAND controller only provides a 1-bit
ECC algorithm.
This commit introduces the support for the NAND_ECC_ON_DIE ECC mode:
- Parsing of the "on-die" value for the "nand-ecc-mode" Device Tree
property
- Handling NAND_ECC_ON_DIE case in nand_scan_tail(). The idea is that
the vendor specific code for the NAND chip must implement
->read_page() and ->write_page(). It may optionally provide its own
->read_page_raw() and ->write_page_raw() as well. For OOB operation,
we assume the standard operations are good enough, but they can be
overridden by the vendor specific code if needed.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Andreas reports that the following incremental update using our commit
protocol doesn't work.
# nft -f incremental-update.nft
delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
delete chain ip filter CIn_1
... Error: Could not process rule: Device or resource busy
The existing code is not well-integrated into the commit phase protocol,
since element deletions do not result in refcount decrement from the
preparation phase. This results in bogus EBUSY errors like the one
above.
Two new functions come with this patch:
* nft_set_elem_activate() function is used from the abort path, to
restore the set element refcounting on objects that occurred from
the preparation phase.
* nft_set_elem_deactivate() that is called from nft_del_setelem() to
decrement set element refcounting on objects from the preparation
phase in the commit protocol.
The nft_data_uninit() has been renamed to nft_data_release() since this
function does not uninitialize any data store in the data register,
instead just releases the references to objects. Moreover, a new
function nft_data_hold() has been introduced to be used from
nft_set_elem_activate().
Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When looking up an iptables rule, the iptables binary compares the
aligned match and target data (XT_ALIGN). In some cases this can
exceed the actual data size to include padding bytes.
Before commit f77bc5b23f ("iptables: use match, target and data
copy_to_user helpers") the malloc()ed bytes were overwritten by the
kernel with kzalloced contents, zeroing the padding and making the
comparison succeed. After this patch, the kernel copies and clears
only data, leaving the padding bytes undefined.
Extend the clear operation from data size to aligned data size to
include the padding bytes, if any.
Padding bytes can be observed in both match and target, and the bug
triggered, by issuing a rule with match icmp and target ACCEPT:
iptables -t mangle -A INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
iptables -t mangle -D INPUT -i lo -p icmp --icmp-type 1 -j ACCEPT
Fixes: f77bc5b23f ("iptables: use match, target and data copy_to_user helpers")
Reported-by: Paul Moore <pmoore@redhat.com>
Reported-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>