This patch enables a hybrid polling mode. Instead of polling after IO
submission, we can induce an artificial delay, and then poll after that.
For example, if the IO is presumed to complete in 8 usecs from now, we
can sleep for 4 usecs, wake up, and then do our polling. This still puts
a sleep/wakeup cycle in the IO path, but instead of the wakeup happening
after the IO has completed, it'll happen before. With this hybrid
scheme, we can achieve big latency reductions while still using the same
(or less) amount of CPU.
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-By: Stephen Bates <sbates@raithlin.com>
Reviewed-By: Stephen Bates <sbates@raithlin.com>
Prior to 3.15, there was a race between zap_pte_range() and
page_mkclean() where writes to a page could be lost. Dave Hansen
discovered by inspection that there is a similar race between
move_ptes() and page_mkclean().
We've been able to reproduce the issue by enlarging the race window with
a msleep(), but have not been able to hit it without modifying the code.
So, we think it's a real issue, but is difficult or impossible to hit in
practice.
The zap_pte_range() issue is fixed by commit 1cf35d47712d("mm: split
'tlb_flush_mmu()' into tlb flushing and memory freeing parts"). And
this patch is to fix the race between page_mkclean() and mremap().
Here is one possible way to hit the race: suppose a process mmapped a
file with READ | WRITE and SHARED, it has two threads and they are bound
to 2 different CPUs, e.g. CPU1 and CPU2. mmap returned X, then thread
1 did a write to addr X so that CPU1 now has a writable TLB for addr X
on it. Thread 2 starts mremaping from addr X to Y while thread 1
cleaned the page and then did another write to the old addr X again.
The 2nd write from thread 1 could succeed but the value will get lost.
thread 1 thread 2
(bound to CPU1) (bound to CPU2)
1: write 1 to addr X to get a
writeable TLB on this CPU
2: mremap starts
3: move_ptes emptied PTE for addr X
and setup new PTE for addr Y and
then dropped PTL for X and Y
4: page laundering for N by doing
fadvise FADV_DONTNEED. When done,
pageframe N is deemed clean.
5: *write 2 to addr X
6: tlb flush for addr X
7: munmap (Y, pagesize) to make the
page unmapped
8: fadvise with FADV_DONTNEED again
to kick the page off the pagecache
9: pread the page from file to verify
the value. If 1 is there, it means
we have lost the written 2.
*the write may or may not cause segmentation fault, it depends on
if the TLB is still on the CPU.
Please note that this is only one specific way of how the race could
occur, it didn't mean that the race could only occur in exact the above
config, e.g. more than 2 threads could be involved and fadvise() could
be done in another thread, etc.
For anonymous pages, they could race between mremap() and page reclaim:
THP: a huge PMD is moved by mremap to a new huge PMD, then the new huge
PMD gets unmapped/splitted/pagedout before the flush tlb happened for
the old huge PMD in move_page_tables() and we could still write data to
it. The normal anonymous page has similar situation.
To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and
if any, did the flush before dropping the PTL. If we did the flush for
every move_ptes()/move_huge_pmd() call then we do not need to do the
flush in move_pages_tables() for the whole range. But if we didn't, we
still need to do the whole range flush.
Alternatively, we can track which part of the range is flushed in
move_ptes()/move_huge_pmd() and which didn't to avoid flushing the whole
range in move_page_tables(). But that would require multiple tlb
flushes for the different sub-ranges and should be less efficient than
the single whole range flush.
KBuild test on my Sandybridge desktop doesn't show any noticeable change.
v4.9-rc4:
real 5m14.048s
user 32m19.800s
sys 4m50.320s
With this commit:
real 5m13.888s
user 32m19.330s
sys 4m51.200s
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split callbacks for RX and async notification events on mei bus to
eliminate synchronization problems and to open way for RX optimizations.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GF(2^128) multiplication tables are typically used for secret
information, so it's a good idea to zero them on free.
Signed-off-by: Alex Cope <alexcope@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Defined device API strings. Vendor driver using mediated device
framework should use corresponding string for device_api attribute.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Vendor driver using mediated device framework would use same mechnism to
validate and prepare IRQs. Introducing this function to reduce code
replication in multiple drivers.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Vendor driver using mediated device framework should use
vfio_info_add_capability() to add capabilities.
Introduced this function to reduce code duplication in vendor drivers.
vfio_info_cap_shift() manipulated a data buffer to add an offset to each
element in a chain. This data buffer is documented in a uapi header.
Changing vfio_info_cap_shift symbol to be available to all drivers.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
about DMA_UNMAP.
Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
Notifier should be registered, if external user wants to use
vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
mappings.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Added APIs for pining and unpining set of pages. These call back into
backend iommu module to actually pin and unpin pages.
Added two new callback functions to struct vfio_iommu_driver_ops. Backend
IOMMU module that supports pining and unpinning pages for mdev devices
should provide these functions.
Renamed static functions in vfio_type1_iommu.c to resolve conflicts
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Design for Mediated Device Driver:
Main purpose of this driver is to provide a common interface for mediated
device management that can be used by different drivers of different
devices.
This module provides a generic interface to create the device, add it to
mediated bus, add device to IOMMU group and then add it to vfio group.
Below is the high Level block diagram, with Nvidia, Intel and IBM devices
as example, since these are the devices which are going to actively use
this module as of now.
+---------------+
| |
| +-----------+ | mdev_register_driver() +--------------+
| | | +<------------------------+ __init() |
| | mdev | | | |
| | bus | +------------------------>+ |<-> VFIO user
| | driver | | probe()/remove() | vfio_mdev.ko | APIs
| | | | | |
| +-----------+ | +--------------+
| |
| MDEV CORE |
| MODULE |
| mdev.ko |
| +-----------+ | mdev_register_device() +--------------+
| | | +<------------------------+ |
| | | | | nvidia.ko |<-> physical
| | | +------------------------>+ | device
| | | | callback +--------------+
| | Physical | |
| | device | | mdev_register_device() +--------------+
| | interface | |<------------------------+ |
| | | | | i915.ko |<-> physical
| | | +------------------------>+ | device
| | | | callback +--------------+
| | | |
| | | | mdev_register_device() +--------------+
| | | +<------------------------+ |
| | | | | ccw_device.ko|<-> physical
| | | +------------------------>+ | device
| | | | callback +--------------+
| +-----------+ |
+---------------+
Core driver provides two types of registration interfaces:
1. Registration interface for mediated bus driver:
/**
* struct mdev_driver - Mediated device's driver
* @name: driver name
* @probe: called when new device created
* @remove:called when device removed
* @driver:device driver structure
*
**/
struct mdev_driver {
const char *name;
int (*probe) (struct device *dev);
void (*remove) (struct device *dev);
struct device_driver driver;
};
Mediated bus driver for mdev device should use this interface to register
and unregister with core driver respectively:
int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
void mdev_unregister_driver(struct mdev_driver *drv);
Mediated bus driver is responsible to add/delete mediated devices to/from
VFIO group when devices are bound and unbound to the driver.
2. Physical device driver interface
This interface provides vendor driver the set APIs to manage physical
device related work in its driver. APIs are :
* dev_attr_groups: attributes of the parent device.
* mdev_attr_groups: attributes of the mediated device.
* supported_type_groups: attributes to define supported type. This is
mandatory field.
* create: to allocate basic resources in vendor driver for a mediated
device. This is mandatory to be provided by vendor driver.
* remove: to free resources in vendor driver when mediated device is
destroyed. This is mandatory to be provided by vendor driver.
* open: open callback of mediated device
* release: release callback of mediated device
* read : read emulation callback.
* write: write emulation callback.
* ioctl: ioctl callback.
* mmap: mmap emulation callback.
Drivers should use these interfaces to register and unregister device to
mdev core driver respectively:
extern int mdev_register_device(struct device *dev,
const struct parent_ops *ops);
extern void mdev_unregister_device(struct device *dev);
There are no locks to serialize above callbacks in mdev driver and
vfio_mdev driver. If required, vendor driver can have locks to serialize
above APIs in their driver.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Jike Song <jike.song@intel.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This will aid the seamless removal of the current probe()'s, more
commonly unused than used second parameter. Most I2C drivers can
simply switch over to the new interface, others which have DT
support can use its own matching instead and others can call
i2c_match_id() themselves. This brings I2C's device probe method
into line with other similar interfaces in the kernel and prevents
the requirement to pass an i2c_device_id table.
Suggested-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
[Kieran: fix rebase conflicts and adapt for dev_pm_domain_{attach,detach}]
Tested-by: Kieran Bingham <kieran@bingham.xyz>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
When there was no other way to match a I2C device to driver i2c_match_id()
was exclusively used. However, now there are other types of tables which
are commonly supplied, matching on an i2c_device_id table is used less
frequently. Instead of _always_ calling i2c_match_id() from within the
framework, we only need to do so from drivers which have no other way of
matching. This patch makes i2c_match_id() available to the aforementioned
device drivers.
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Kieran Bingham <kieran@bingham.xyz>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
This function provides a single call for all I2C devices which need to
match firstly using traditional OF means i.e by of_node, then if that
fails we attempt to match using the supplied I2C client name with a
list of supplied compatible strings with the '<vendor>,' string
removed. The latter is required due to the unruly naming conventions
used currently by I2C devices.
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
[Kieran: Fix static inline usage on !CONFIG_OF]
Tested-by: Kieran Bingham <kieran@bingham.xyz>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Kieran Bingham <kieran@bingham.xyz>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Tvrtko needs
commit b3c11ac267
Author: Eric Engestrom <eric@engestrom.ch>
Date: Sat Nov 12 01:12:56 2016 +0000
drm: move allocation out of drm_get_format_name()
to be able to apply his patches without conflicts.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Mounting proc in user namespace containers fails if the xenbus
filesystem is mounted on /proc/xen because this directory fails
the "permanently empty" test. proc_create_mount_point() exists
specifically to create such mountpoints in proc but is currently
proc-internal. Export this interface to modules, then use it in
xenbus when creating /proc/xen.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
It has been suggested that having per-plane modifiers is making life
more difficult for userspace, so let's just retire modifier[1-3] and
use modifier[0] to apply to the entire framebuffer.
Obviosuly this means that if individual planes need different tiling
layouts and whatnot we will need a new modifier for each combination
of planes with different tiling layouts.
For a bit of extra backwards compatilbilty the kernel will allow
non-zero modifier[1+] but it require that they will match modifier[0].
This in case there's existing userspace out there that sets
modifier[1+] to something non-zero with planar formats.
Mostly a cocci job, with a bit of manual stuff mixed in.
@@
struct drm_framebuffer *fb;
expression E;
@@
- fb->modifier[E]
+ fb->modifier
@@
struct drm_framebuffer fb;
expression E;
@@
- fb.modifier[E]
+ fb.modifier
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Cc: dczaplejewicz@collabora.co.uk
Suggested-by: Kristian Høgsberg <hoegsberg@gmail.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1479295996-26246-1-git-send-email-ville.syrjala@linux.intel.com
This patch adds support for the new channel request API introduced
in commit a8135d0d79
"dmaengine: core: Introduce new, universal API to request a channel".
param field of struct dma_slave_map type entries in the platform
data structure should be pointing to struct pl08x_channel_data
of related DMA channel.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Tested-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
to hash a node to one chain. If in one host thousands of assocs connect
to one server with the same lport and different laddrs (although it's
not a normal case), all the transports would be hashed into the same
chain.
It may cause to keep returning -EBUSY when inserting a new node, as the
chain is too long and sctp inserts a transport node in a loop, which
could even lead to system hangs there.
The new rhlist interface works for this case that there are many nodes
with the same key in one chain. It puts them into a list then makes this
list be as a node of the chain.
This patch is to replace rhashtable_ interface with rhltable_ interface.
Since a chain would not be too long and it would not return -EBUSY with
this fix when inserting a node, the reinsert loop is also removed here.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When MAD arrives to the hypervisor, we need to identify which slave it
should be sent by destination GID. When L3 protocol is IPv4 the
GRH is replaced by an IPv4 header. This patch detects when IPv4 header
needs to be parsed instead of GRH.
Fixes: b6ffaeffae ('mlx4: In RoCE allow guests to have multiple GIDS')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Callers of netpoll_poll_lock() own NAPI_STATE_SCHED
Callers of netpoll_poll_unlock() have BH blocked between
the NAPI_STATE_SCHED being cleared and poll_lock is released.
We can avoid the spinlock which has no contention, and use cmpxchg()
on poll_owner which we need to set anyway.
This removes a possible lockdep violation after the cited commit,
since sk_busy_loop() re-enables BH before calling busy_poll_stop()
Fixes: 217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acpi_video.c passed the ACPI_VIDEO_NOTIFY_* defines as type code to
acpi_notifier_call_chain(). Move these defines to acpi/video.h so
that acpi_notifier listeners can check the type code using these
defines.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Another pile of misc:
- Explicit fencing for atomic! Big thanks to Gustavo, Sean, Rob 3x, Brian
and anyone else I've forgotten to make this happen.
- roll out fbdev helper ops to drivers (Stefan Christ)
- last bits of drm_crtc split-up&kerneldoc
- some drm_irq.c crtc functions cleanup
- prepare_fb helper for cma, works correctly with explicit fencing (Marek
Vasut)
- misc small patches all over
* tag 'drm-misc-next-2016-11-16' of git://anongit.freedesktop.org/git/drm-misc: (51 commits)
drm/fence: add out-fences support
drm/fence: add fence timeline to drm_crtc
drm/fence: add in-fences support
drm/bridge: analogix_dp: return error if transfer none byte
drm: drm_irq.h header cleanup
drm/irq: Unexport drm_vblank_on/off
drm/irq: Unexport drm_vblank_count
drm/irq: Make drm_vblank_pre/post_modeset internal
drm/nouveau: Use drm_crtc_vblank_off/on
drm/amdgpu: Use drm_crtc_vblank_on/off for dce6
drm/color: document NULL values and default settings better
drm: Drop externs from drm_crtc.h
drm: Move tile group code into drm_connector.c
drm: Extract drm_mode_config.[hc]
Revert "drm: Add aspect ratio parsing in DRM layer"
Revert "drm: Add and handle new aspect ratios in DRM layer"
drm/print: Move kerneldoc next to definition
drm: Consolidate dumb buffer docs
drm: Clean up kerneldoc for struct drm_driver
drm: Extract drm_drv.h
...
This patch changes the lwtunnel_headroom() function which is called
in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
value when the lwtunnel state is OUTPUT_REDIRECT.
This patch enables e.g. SR-IPv6 encapsulations to work without
manually setting the route mtu.
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function declared twice, so remove one declaration of it.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid breaking cross-compiled ACPI tools builds by rearranging the
handling of kernel header files.
This patch also contains OUTPUT/srctree cleanups in order to make above fix
working for various build environments.
Fixes: e323c02dee (ACPICA: MSVC9: Fix <sys/stat.h> inclusion order issue)
Reported-and-tested-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
FS threading brings performance improvements of 0-20% in glmark2.
The validation code checks for thread switch signals and ensures that
the registers of the other thread are not touched, and that our clamps
are not live across thread switches. It also checks that the
threading and branching instructions do not interfere.
(Original patch by Jonas, changes by anholt for style cleanup,
removing validation the kernel doesn't need to do, and adding the flag
for userspace).
v2: Minor style fixes from checkpatch.
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
Pull Allwinner clock changes from Maxime Ripard:
The usual patches from us, but most notably the introduction of the A64
clocks unit.
* tag 'sunxi-clk-for-4.10' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
clk: sunxi-ng: sun8i-h3: Set CLK_SET_RATE_PARENT for audio module clocks
clk: sunxi-ng: sun8i-a23: Set CLK_SET_RATE_PARENT for audio module clocks
clk: sunxi-ng: Add A64 clocks
clk: sunxi-ng: Implement minimum for multipliers
clk: sunxi-ng: Add minimums for all the relevant structures and clocks
clk: sunxi-ng: Finish to convert to structures for arguments
clk: sunxi-ng: Remove the use of rational computations
clk: sunxi-ng: Rename the internal structures
clk: sunxi: mod0: improve function-level documentation
Pull i.MX clock updates from Shawn Guo:
- A patch series to fix the long standing issue with glitchy parent
mux of ldb_di_clk, which can hang up LVDS display when ipu_di_clk
is sourced from ldb_di_clk.
- A patch to add imx6ull clock support on top of imx6ul clock driver.
* tag 'imx-clk-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
clk: imx: clk-imx6ul: add clk support for imx6ull
clk: imx6: Fix procedure to switch the parent of LDB_DI_CLK
clk: imx6: Make the LDB_DI0 and LDB_DI1 clocks read-only
clk: imx6: Mask mmdc_ch1 handshake for periph2_sel and mmdc_ch1_axi_podf
NAPI drivers use napi_complete_done() or napi_complete() when
they drained RX ring and right before re-enabling device interrupts.
In busy polling, we can avoid interrupts being delivered since
we are polling RX ring in a controlled loop.
Drivers can chose to use napi_complete_done() return value
to reduce interrupts overhead while busy polling is active.
This is optional, legacy drivers should work fine even
if not updated.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 4cd13c21b2 ("softirq: Let ksoftirqd do its job"),
sk_busy_loop() needs a bit of care :
softirqs might be delayed since we do not allow preemption yet.
This patch adds preemptiom points in sk_busy_loop(),
and makes sure no unnecessary cache line dirtying
or atomic operations are done while looping.
A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
so that sk_busy_loop() does not have to grab it again.
Similarly, netpoll_poll_lock() is done one time.
This gives about 10 to 20 % improvement in various busy polling
tests, especially when many threads are busy polling in
configurations with large number of NIC queues.
This should allow experimenting with bigger delays without
hurting overall latencies.
Tested:
On a 40Gb mlx4 NIC, 32 RX/TX queues.
echo 70 >/proc/sys/net/core/busy_read
for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
Before: After:
1: 90072 92819
2: 157289 184007
3: 235772 213504
4: 344074 357513
5: 394755 458267
6: 461151 487819
7: 549116 625963
8: 544423 716219
9: 720460 738446
10: 794686 837612
11: 915998 923960
12: 937507 925107
13: 1019677 971506
14: 1046831 1113650
15: 1114154 1148902
16: 1105221 1179263
17: 1266552 1299585
18: 1258454 1383817
19: 1341453 1312194
20: 1363557 1488487
21: 1387979 1501004
22: 1417552 1601683
23: 1550049 1642002
24: 1568876 1601915
25: 1560239 1683607
26: 1640207 1745211
27: 1706540 1723574
28: 1638518 1722036
29: 1734309 1757447
30: 1782007 1855436
31: 1724806 1888539
32: 1717716 1944297
33: 1778716 1869118
34: 1805738 1983466
35: 1815694 2020758
36: 1893059 2035632
37: 1843406 2034653
38: 1888830 2086580
39: 1972827 2143567
40: 1877729 2181851
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch that removed the FIB offload infrastructure was a bit too
aggressive and also removed code needed to clean up us splitting the table
if additional rules were added. Specifically the function
fib_trie_flush_external was called at the end of a new rule being added to
flush the foreign trie entries from the main trie.
I updated the code so that we only call fib_trie_flush_external on the main
table so that we flush the entries for local from main. This way we don't
call it for every rule change which is what was happening previously.
Fixes: 347e3b28c1 ("switchdev: remove FIB offload infrastructure")
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
invalid accesses to bpf map entries. Fix this up by doing a few things
1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real
life and just adds extra complexity.
2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
minimum value to 0 for positive AND's.
3) Don't do operations on the ranges if they are set to the limits, as they are
by definition undefined, and allowing arithmetic operations on those values
could make them appear valid when they really aren't.
This fixes the testcase provided by Jann as well as a few other theoretical
problems.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last open issues have been addressed, so it is time to move
this out of staging and into the mainline and to move the public
cec headers to include/uapi/linux.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
CDC-Only CEC devices are CEC devices that can only handle CDC messages,
all other messages are ignored.
Add a flag to signal that this is a CDC-Only device and act accordingly.
Also add helper functions to identify if a CEC device is configured as a
CDC-Only device, a second TV, a switch or a processor, since these variations
cannot be determined by the logical address alone.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Give the caller more control over how replies to a transmit are
handled. By default the reply will only go to the filehandle that
called CEC_TRANSMIT. If this new flag is set, then the reply will
also go to all followers.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
By default the CEC_MSG_USER_CONTROL_PRESSED/RELEASED messages
are passed on to the follower(s) only. If the new
CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU flag is set in the
flags field of struct cec_log_addrs then these messages are also
passed on to the remote control input subsystem and they will appear
as keystrokes.
This used to be the default behavior, but now you have to explicitly
enable it. This is done to force the caller to think about possible
security issues (e.g. if these messages are used to enter passwords).
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add a helper to find timings based on the CEA-861 VIC code. Also,
add a helper that returns the pixel aspect ratio based on the
v4l2_dv_timings struct.
[mchehab@s-opensource.com: fix coding style]
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add the CEA-861 VIC, the HDMI VIC and the picture aspect ratio information
where applicable.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add picture aspect ratio information, the CEA-861 VIC (Video Identification
Code) and the HDMI VIC to struct v4l2_bt_timings.
The picture aspect was chosen rather than the pixel aspect since 1) the
CEA-861 standard uses picture aspect, and 2) pixel aspect ratio can become
tricky when dealing with pixel repeat timings. While we don't support those
yet at the moment, this might become necessary. And in that case using
picture aspect ratio makes more sense. And converting picture aspect ratio
to pixel aspect ratio is easy enough.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Jonathan writes:
Third set of IIO new device support, features and cleanup for the 4.10 cycle.
Includes Peter Rosin's interesting drivers for a comparator. First complex
use we have had with an analog front end made from discrete components.
Brian Masney's work on moving the tsl2583 driver out of staging also
feature extensively!
New Drivers
* DAC based on a digital potentiometer
- New driver for the use of a dpot as a DAC. Includes bindings and Axentia
entry in vendor prefixes.
* Envelope detector baed on DAC and a comparator including device tree
bindings.
Staging Graduation
* tsl2583.
Core new features
- Core provision for _available attributes. This one had been stalled for
a long time until Peter picked it up and ran with it!
- In kernel interface helpers to retrieve available info from channels.
Driver new features
* mcp4531
- Add range of available raw values (used for the dpot dac driver).
Driver cleanups and fixes for issues introduced
* ad7766
- Testing the wrong variable following devm_regulator_bulk_get introduced
with the driver earlier in this cycle.
* ad9832
- Fix a wrong ordering in the probe introduced in the previous set of
patches. A use before allocation bug.
* cros_ec_sensors
- Testing for an error in a u8 will never work.
* mpu3050
- Remove duplicate initializer for the module owner.
- Add missing i2c dependency.
- Inform the i2c mux core how it is used - step one in implifying device
tree bindings.
* st-sensors
- Get rid of large number of uninformative defines in favour of putting the
constants where they are relevant. It is clear what they are from where
they are used.
* tsl2583
- Fix unused function warning when CONFIG_PM disabled and remove the
ifdefs in favour of __maybe_unused.
- Refactor taos_chip_on to only read relevant registers.
- Make sure calibscale and integration time are being set.
- Verify chip is in ready to be used before calibration.
- Remove some repeated checks for chip status (it's protected by a mutex
so can't change until it's released)
- Change current state storage from a tristate enum to a boolean seeing as
only two values are actually used now.
- Drop a redundant write to the control regiser in taos_probe (it's a noop)
- Drop the FSF mailing address.
- Clean up logging to not use hard coded function names (use __func__
instead).
- Cleanup up variable and function name prefixes.
- Alignment of #define fixes.
- Fix comparison between signed and unsigned integer warnings.
- Add some newlines in favour of readability.
- Combine the two sysfs ABI docs that somehow ended up in different places.
- Fix multiline comment syntax.
- Move a code block to inside an else statement as it makes more sense there.
- Change tsl2583_als_calibrate to return 0 rather than a value nothing
reads.
- Drop some pointless brackets
- Don't assume 32bit unsigned int.
- Change to a per device instance lux table.
- Add missing tsl2583 to the list of supported devices in the intro comments.
- Improve commment on clearing of interrupts.
- Drop some uninformative comments.
- Drop a memset call that doesn't do anything useful any more.
- Don't initialize some return variables that are always set.
- Add Brian Masney as a module author after all these changes.
Format comments according to what checkpatch wants.
No other changes.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Memory-to-memory V4L2 devices all have file handle specific context.
Say this in the API documentation so that the user space may rely on it
being the case.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>