This adds ring resize support. Seems to be necessary as
users such as tun allow userspace control over queue size.
If resize is used, this costs us ability to peek at queue without
consumer lock - should not be a big deal as peek and consumer are
usually run on the same CPU.
If ring is made bigger, ring contents is preserved. If ring is made
smaller, extra pointers are passed to an optional destructor callback.
Cleanup function also gains destructor callback such that
all pointers in queue can be cleaned up.
This changes some APIs but we don't have any users yet,
so it won't break bisect.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A simple array based FIFO of pointers. Intended for net stack so uses
skbs for type safety. Implemented as a set of wrappers around ptr_ring.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct media_devnode is currently embedded at struct media_device.
While this works fine during normal usage, it leads to a race
condition during devnode unregister. the problem is that drivers
assume that, after calling media_device_unregister(), the struct
that contains media_device can be freed. This is not true, as it
can't be freed until userspace closes all opened /dev/media devnodes.
In other words, if the media devnode is still open, and media_device
gets freed, any call to an ioctl will make the core to try to access
struct media_device, with will cause an use-after-free and even GPF.
Fix this by dynamically allocating the struct media_devnode and only
freeing it when it is safe.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
A simple array based FIFO of pointers. Intended for net stack which
commonly has a single consumer/producer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Along all media controller code, "mdev" is used to represent
a pointer to struct media_device, and "devnode" for a pointer
to struct media_devnode.
However, inside media-devnode.[ch], "mdev" is used to represent
a pointer to struct media_devnode.
This is very confusing and may lead to development errors.
So, let's change all occurrences at media-devnode.[ch] to
also use "devnode" for such pointers.
This patch doesn't make any functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
IPv6 multicast and link-local addresses require special handling by the
VRF driver:
1. Rather than using the VRF device index and full FIB lookups,
packets to/from these addresses should use direct FIB lookups based on
the VRF device table.
2. fail sends/receives on a VRF device to/from a multicast address
(e.g, make ping6 ff02::1%<vrf> fail)
3. move the setting of the flow oif to the first dst lookup and revert
the change in icmpv6_echo_reply made in ca254490c8 ("net: Add VRF
support to IPv6 stack"). Linklocal/mcast addresses require use of the
skb->dev.
With this change connections into and out of a VRF enslaved device work
for multicast and link-local addresses work (icmp, tcp, and udp)
e.g.,
1. packets into VM with VRF config:
ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
ping6 -c3 ff02::1%br1
ssh -6 fe80::e0:f9ff:fe1c:b974%br1
2. packets going out a VRF enslaved device:
ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
ping6 -c3 ff02::1%eth1
ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow drivers to pass flow arg to functions where the arg is not const
and allow the driver to make updates as needed (eg., setting oif).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spec allows backchannels for multiple clients to share the same tcp
connection. When that happens, we need to use the same xprt for all of
them. Similarly, we need the same xps.
This fixes list corruption introduced by the multipath code.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>
Liping Zhang says:
"Users may add such a wrong nft rules successfully, which will cause an
endless jump loop:
# nft add rule filter test tcp dport vmap {1: jump test}
This is because before we commit, the element in the current anonymous
set is inactive, so osp->walk will skip this element and miss the
validate check."
To resolve this problem, this patch passes the generation mask to the
walk function through the iter container structure depending on the code
path:
1) If we're dumping the elements, then we have to check if the element
is active in the current generation. Thus, we check for the current
bit in the genmask.
2) If we're checking for loops, then we have to check if the element is
active in the next generation, as we're in the middle of a
transaction. Thus, we check for the next bit in the genmask.
Based on original patch from Liping Zhang.
Reported-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>
Return the raw key with no other processing so that the caller
can copy it or MPI parse it, etc.
The scope is to have only one ANS.1 parser for all RSA
implementations.
Update the RSA software implementation so that it does
the MPI conversion on top.
Signed-off-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Hardware cipher implementation may require aligned buffers. All buffers
that potentially are processed with a cipher are now aligned.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CTR DRBG derives its random data from the CTR that is encrypted with
AES.
This patch now changes the CTR DRBG implementation such that the
CTR AES mode is employed. This allows the use of steamlined CTR AES
implementation such as ctr-aes-aesni.
Unfortunately there are the following subtile changes we need to apply
when using the CTR AES mode:
- the CTR mode increments the counter after the cipher operation, but
the CTR DRBG requires the increment before the cipher op. Hence, the
crypto_inc is applied to the counter (drbg->V) once it is
recalculated.
- the CTR mode wants to encrypt data, but the CTR DRBG is interested in
the encrypted counter only. The full CTR mode is the XOR of the
encrypted counter with the plaintext data. To access the encrypted
counter, the patch uses a NULL data vector as plaintext to be
"encrypted".
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This adds an ABI for listening to events on GPIO lines.
The mechanism returns an anonymous file handle to a request
to listen to a specific offset on a specific gpiochip.
To fetch the stream of events from the file handle, userspace
simply reads an event.
- Events can be requested with the same flags as ordinary
handles, i.e. open drain or open source. An ioctl() call
GPIO_GET_LINEEVENT_IOCTL is issued indicating the desired
line.
- Events can be requested for falling edge events, rising
edge events, or both.
- All events are timestamped using the kernel real time
nanosecond timestamp (the same as is used by IIO).
- The supplied consumer label will appear in "lsgpio"
listings of the lines, and in /proc/interrupts as the
mechanism will request an interrupt from the gpio chip.
- Events are not supported on gpiochips that do not serve
interrupts (no legal .to_irq() call). The event interrupt
is threaded to avoid any realtime problems.
- It is possible to also directly read the current value
of the registered GPIO line by issuing the same
GPIOHANDLE_GET_LINE_VALUES_IOCTL as used by the
line handles. Setting the value is not supported: we
do not listen to events on output lines.
This ABI is strongly influenced by Industrial I/O and surpasses
the old sysfs ABI by providing proper precision timestamps,
making it possible to set flags like open drain, and put
consumer names on the GPIO lines.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This adds a userspace ABI for reading and writing GPIO lines.
The mechanism returns an anonymous file handle to a request
to read/write n offsets from a gpiochip. This file handle
in turn accepts two ioctl()s: one that reads and one that
writes values to the selected lines.
- Handles can be requested as input/output, active low,
open drain, open source, however when you issue a request
for n lines with GPIO_GET_LINEHANDLE_IOCTL, they must all
have the same flags, i.e. all inputs or all outputs, all
open drain etc. If a granular control of the flags for
each line is desired, they need to be requested
individually, not in a batch.
- The GPIOHANDLE_GET_LINE_VALUES_IOCTL read ioctl() can be
issued also to output lines to verify that the hardware
is in the expected state.
- It reads and writes up to GPIOHANDLES_MAX lines at once,
utilizing the .set_multiple() call in the driver if
possible, making the call efficient if several lines
can be written with a single register update.
The limitation of GPIOHANDLES_MAX to 64 lines is done under
the assumption that we may expect hardware that can issue a
transaction updating 64 bits at an instant but unlikely
anything larger than that.
ChangeLog v2->v3:
- Use gpiod_get_value_cansleep() so we support also slowpath
GPIO drivers.
- Fix up the UAPI docs kerneldoc.
- Allocate the anonymous fd last, so that the release
function don't get called until that point of something
fails. After this point, skip the errorpath.
ChangeLog v1->v2:
- Handle ioctl_compat() properly based on a similar patch
to the other ioctl() handling code.
- Use _IOWR() as we pass pointers both in and out of the
ioctl()
- Use kmalloc() and kfree() for the linehandled, do not
try to be fancy with devm_* it doesn't work the way I
thought.
- Fix const-correctness on the linehandle name field.
Acked-by: Michael Welling <mwelling@ieee.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Nothing is using its return value so change it to return void.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit breaks torture_online() and torture_offline() out of
torture_onoff() in preparation for allowing waketorture finer-grained
control of its CPU-hotplug workload.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
For the third time in three years, I'm changing my e-mail at
Samsung. That's bad, as it may stop communications with me for
a while. So, this time, I'll also the mchehab@kernel.org e-mail,
as it remains stable since ever.
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Since nothing is using the 2-phase API, and it adds more complexity than
benefit, remove it.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Currently, if arch code wants to supply seccomp_data directly to
seccomp (which is generally much faster than having seccomp do it
using the syscall_get_xyz() API), it has to use the two-phase
seccomp hooks. Add it to the easy hooks, too.
Cc: linux-arch@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reset controller changes for v4.8, part 2
- add Amlogic Meson SoC Reset Controller driver
* tag 'reset-for-4.8-2' of git://git.pengutronix.de/git/pza/linux:
dt-bindings: reset: Add bindings for the Meson SoC Reset Controller
reset: Add support for the Amlogic Meson SoC Reset Controller
Signed-off-by: Olof Johansson <olof@lixom.net>
The NMI watchdog uses either the fixed cycles or a generic cycles
counter. This causes a lot of conflicts with users of the PMU who want
to run a full group including the cycles fixed counter, for example
the --topdown support recently added to perf stat. The code needs to
fall back to not use groups, which can cause measurement inaccuracy
due to multiplexing errors.
This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU Frequency
due to Turbo mode.
The ref cycles always tick at their frequency, or slower when
the system is idling. That means the NMI watchdog can never
expire too early, unlike with cycles.
The reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
crypto_alloc_hash returns an ERR_PTR(), not NULL.
Also reset peer_integrity_tfm to NULL, to not call crypto_free_hash()
on an errno in the cleanup path.
Reported-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This contains various cosmetic fixes ranging from simple typos to
const-ifying, and using booleans properly.
Original commit messages from Fabian's patch set:
drbd: debugfs: constify drbd_version_fops
drbd: use seq_put instead of seq_print where possible
drbd: include linux/uaccess.h instead of asm/uaccess.h
drbd: use const char * const for drbd strings
drbd: kerneldoc warning fix in w_e_end_data_req()
drbd: use unsigned for one bit fields
drbd: use bool for peer is_ states
drbd: fix typo
drbd: use | for bitmask combination
drbd: use true/false for bool
drbd: fix drbd_bm_init() comments
drbd: introduce peer state union
drbd: fix maybe_pull_ahead() locking comments
drbd: use bool for growing
drbd: remove redundant declarations
drbd: replace if/BUG by BUG_ON
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Make sure we have at least 67 (> AL_UPDATES_PER_TRANSACTION)
al-extents available, and allow up to half of that to be
discarded in one bio.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We can avoid spurious data divergence caused by partially-ignored
discards on certain backends with discard_zeroes_data=0, if we
translate partial unaligned discard requests into explicit zero-out.
The relevant use case is LVM/DM thin.
If on different nodes, DRBD is backed by devices with differing
discard characteristics, discards may lead to data divergence
(old data or garbage left over on one backend, zeroes due to
unmapped areas on the other backend). Online verify would now
potentially report tons of spurious differences.
While probably harmless for most use cases (fstrim on a file system),
DRBD cannot have that, it would violate our promise to upper layers
that our data instances on the nodes are identical.
To be correct and play safe (make sure data is identical on both copies),
we would have to disable discard support, if our local backend (on a
Primary) does not support "discard_zeroes_data=true".
We'd also have to translate discards to explicit zero-out on the
receiving (typically: Secondary) side, unless the receiving side
supports "discard_zeroes_data=true".
Which both would allocate those blocks, instead of unmapping them,
in contrast with expectations.
LVM/DM thin does set discard_zeroes_data=0,
because it silently ignores discards to partial chunks.
We can work around this by checking the alignment first.
For unaligned (wrt. alignment and granularity) or too small discards,
we zero-out the initial (and/or) trailing unaligned partial chunks,
but discard all the aligned full chunks.
At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
Arguably it should behave this way internally, by default,
and we'll try to make that happen.
But our workaround is still valid for already deployed setups,
and for other devices that may behave this way.
Setting discard-zeroes-if-aligned=yes will allow DRBD to use
discards, and to announce discard_zeroes_data=true, even on
backends that announce discard_zeroes_data=false.
Setting discard-zeroes-if-aligned=no will cause DRBD to always
fall-back to zero-out on the receiving side, and to not even
announce discard capabilities on the Primary, if the respective
backend announces discard_zeroes_data=false.
We used to ignore the discard_zeroes_data setting completely.
To not break established and expected behaviour, and suddenly
cause fstrim on thin-provisioned LVs to run out-of-space,
instead of freeing up space, the default value is "yes".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
As long as the value is 0 the feature is disabled. With setting
it to a positive value, DRBD limits and aligns its resync requests
to the rs-discard-granularity setting. If the sync source detects
all zeros in such a block, the resync target discards the range
on disk.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Renesas ARM Based SoC R-Car SYSC Updates for v4.8
e0c98b9171 soc: renesas: rcar-sysc: Add support for R-Car M3-W power areas
74699228b9 soc: renesas: Add r8a7796 SYSC PM Domain Binding Definitions
2ff1bf77e4 soc: renesas: rcar-sysc: Document r8a7796 support
* tag 'renesas-rcar-sysc-for-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
soc: renesas: rcar-sysc: Add support for R-Car M3-W power areas
soc: renesas: Add r8a7796 SYSC PM Domain Binding Definitions
soc: renesas: rcar-sysc: Document r8a7796 support
Signed-off-by: Olof Johansson <olof@lixom.net>
Topic branch for adding Exynos 5410 Odroid XU board for v4.8.
This brings support for Hardkernel's Odroid XU board. It was the first
design with big.LITTLE SoC from Samsung: Exynos5410. The board is not
very popular. Newer XU3 and XU4 got more attention.
Board details:
1. Exynos5410 octa-core (A15+A7, however as of now only one cluster is
enabled),
2. 2 GB DDR3 RAM,
3. PowerVR SGX544MP3 GPU (not enabled in DTS),
4. USB 3.0 Host x 1, USB 3.0 OTG x 1, USB 2.0 Host x 4,
5. HDMI 1.4a, MIPI DSI and Display Port (Display Port not on all of
revisions though),
6. eMMC 4.5 and microSD slots.
* tag 'samsung-dt-odroid-xu-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: (28 commits)
ARM: dts: exynos: Add watchdog and Security SubSystem to Exynos5410
ARM: dts: exynos: Configure PWM, usb3503, PMIC and thermal on Odroid XU board
ARM: dts: exynos: Add Thermal Management Unit to Exynos5410
ARM: dts: exynos: Interrupt for USB DWC3-1 differs between Exynos5420 and 5410
dt-bindings: clock: Add watchdog and SSS clock IDs to Exynos5410
dt-bindings: clock: Add TMU clock ID to Exynos5410
ARM: dts: exynos: Add RTC and I2C to Exynos5410
ARM: dts: exynos: Add I2C, PWM and UART pinctrl to Exynos5410
ARM: dts: exynos: Move HSI2C nodes to exynos54xx.dtsi
ARM: dts: exynos: Add initial support for Odroid XU board
ARM: dts: exynos: Add USB to Exynos5410
ARM: dts: exynos: Move common Exynos5410/542x/5800 nodes to new DTSI
ARM: dts: exynos: MCT is not an interrupt controller and extend length of iomap
ARM: dts: exynos: Enable UART3 on Exynos5410
ARM: dts: exynos: Include common exynos5 in exynos5410.dtsi
ARM: dts: exynos: Move Exynos5250 and Exynos5420 nodes under soc
ARM: dts: exynos: Use phandle to get parent node in exynos5250-snow
ARM: dts: exynos: Prepare for inclusion of exynos5.dtsi in exynos5410.dtsi
ARM: dts: exynos: Move common nodes to exynos5.dtsi
ARM: dts: exynos: Split Odroid XU3 LEDs to separate DTSI
...
Signed-off-by: Olof Johansson <olof@lixom.net>
Some I2C devices have multiple addresses assigned, for example each address
corresponding to a different internal register map page of the device.
So far drivers which need support for this have handled this with a driver
specific and non-generic implementation, e.g. passing the additional address
via platform data.
This patch provides a new helper function called i2c_new_secondary_device()
which is intended to provide a generic way to get the secondary address
as well as instantiate a struct i2c_client for the secondary address.
The function expects a pointer to the primary i2c_client, a name
for the secondary address and an optional default address. The name is used
as a handle to specify which secondary address to get.
The default address is used as a fallback in case no secondary address
was explicitly specified. In case no secondary address and no default
address were specified the function returns NULL.
For now the function only supports look-up of the secondary address
from devicetree, but it can be extended in the future
to for example support board files and/or ACPI.
Signed-off-by: Jean-Michel Hautbois <jean-michel.hautbois@veo-labs.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Currently the Linux PCI core does not touch power state of PCI bridges and
PCIe ports when system suspend is entered. Leaving them in D0 consumes
power unnecessarily and may prevent the CPU from entering deeper C-states.
With recent PCIe hardware we can power down the ports to save power given
that we take into account few restrictions:
- The PCIe port hardware is recent enough, starting from 2015.
- Devices connected to PCIe ports are effectively in D3cold once the port
is transitioned to D3 (the config space is not accessible anymore and
the link may be powered down).
- Devices behind the PCIe port need to be allowed to transition to D3cold
and back. There is a way both drivers and userspace can forbid this.
- If the device behind the PCIe port is capable of waking the system it
needs to be able to do so from D3cold.
This patch adds a new flag to struct pci_device called 'bridge_d3'. This
flag is set and cleared by the PCI core whenever there is a change in power
management state of any of the devices behind the PCIe port. When system
later on is suspended we only need to check this flag and if it is true
transition the port to D3 otherwise we leave it in D0.
Also provide override mechanism via command line parameter
"pcie_port_pm=[off|force]" that can be used to disable or enable the
feature regardless of the BIOS manufacturing date.
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the low latency transport workqueue to process the task that is
next in line on the xprt->sending queue.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
rpciod can easily get congested due to the long list of queued rpc_tasks.
Having the receive queue wait in turn for those tasks to complete can
therefore be a bottleneck.
Address the problem by separating the workqueues into:
- rpciod: manages rpc_tasks
- xprtiod: manages transport related work.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>