Straight switch over to using iomap for direct I/O - we already have the
non-COW dio path in write_begin for DAX and files with extent size hints,
so nothing to add there. The COW path is ported over from the old
get_blocks version and a bit of a mess, but I have some work in progress
to make it look more like the buffered I/O COW path.
This gets rid of xfs_get_blocks_direct and the last caller of
xfs_get_blocks with the create flag set, so all that code can be removed.
Last but not least I've removed a comment in xfs_filemap_fault that
refers to xfs_get_blocks entirely instead of updating it - while the
reference is correct, the whole DAX fault path looks different than
the non-DAX one, so it seems rather pointless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This adds a full fledget direct I/O implementation using the iomap
interface. Full fledged in this case means all features are supported:
AIO, vectored I/O, any iov_iter type including kernel pointers, bvecs
and pipes, support for hole filling and async apending writes. It does
not mean supporting all the warts of the old generic code. We expect
i_rwsem to be held over the duration of the call, and we expect to
maintain i_dio_count ourselves, and we pass on any kinds of mapping
to the file system for now.
The algorithm used is very simple: We use iomap_apply to iterate over
the range of the I/O, and then we use the new bio_iov_iter_get_pages
helper to lock down the user range for the size of the extent.
bio_iov_iter_get_pages can currently lock down twice as many pages as
the old direct I/O code did, which means that we will have a better
batch factor for everything but overwrites of badly fragmented files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Tested-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
We want to use the per-sb completion workqueue from the new iomap
direct I/O code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which
recently replaced i_mutex instead. This means we only have to take
one lock instead of two in many fast path operations, and we can
also shrink the xfs_inode structure. Thanks to the xfs_ilock family
there is very little churn, the only thing of note is that we need
to switch to use the lock_two_directory helper for taking the i_rwsem
on two inodes in a few places to make sure our lock order matches
the one used in the VFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Christoph requested lockdep_assert_held() variants that distinguish
between held-for-read or held-for-write.
Provide:
int lock_is_held_type(struct lockdep_map *lock, int read)
which takes the same argument as lock_acquire(.read) and matches it to
the held_lock instance.
Use of this function should be gated by the debug_locks variable. When
that is 0 the return value of the lock_is_held_type() function is
undefined. This is done to allow both negative and positive tests for
holding locks.
By default we provide (positive) lockdep_assert_held{,_exclusive,_read}()
macros.
Requested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Several versions of DW DMAC have multi block transfers hardware
support. Hardware support of multi block transfers is disabled
by default if we use DT to configure DMAC and software emulation
of multi block transfers used instead.
Add multi-block property, so it is possible to enable hardware
multi block transfers (if present) via DT.
Switch from per device is_nollp variable to multi_block array
to be able enable/disable multi block transfers separately per
channel.
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
The last param set in a transfer should always be pointing to dummy
param set in non-cyclic mode. When system wakes from low power state
EDMA PARAM slots may be reset to random values. Hence, re-initialize
dummy slot to dummy param set on system resume.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Based on the src/dst_port_window_size - if it is set - configure the DMA
channel to use double indexing in order to be able to loop within the
address window.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Some slave devices uses address window instead of single register for read
and/or write of data. With the src/dst_port_window_size the address window
can be specified and the DMAengine driver should use this information to
correctly set up the transfer to loop within the provided window.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
We should use dma_pool_zalloc instead of dma_pool_alloc/memset.
Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
We should use dma_pool_zalloc instead of dma_pool_alloc/memset.
Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Existing implementation does not honor the alignment restrictions imposed
by the DMA engines. Allocate buffers with built in slack for honoring
alignment restrictions. Creating new arrays to hold the aligned pointers
and use those pointers for operations.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
From Cyrille Pitchen:
"""
This pull request contains the following notable changes:
- add support to new memory parts.
- fix of spansion_quad_enable().
- fix of the Candence QSPI driver.
- constify some structure instances of the Freescale QSPI driver.
"""
Pull ARC fixes from Vineet Gupta:
- fix PAE40 crash [Yuriy]
- disable IO-Coherency by default
- use a different inline asm constraint for Zero Overhead loops
* tag 'arc-4.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: PAE40: Fix crash at munmap
ARC: mm: IOC: Don't enable IOC by default
ARC: Don't use "+l" inline asm constraint
From Boris Brezillon:
"""
This pull request contains the following notable changes:
- new tango NAND controller driver
- new ox820 NAND controller driver
- addition of a new full-ID entry in the nand_ids table
- rework of the s3c240 driver to support DT
- extension of the nand_sdr_timings to expose tCCS, tPROG and tR
- addition of a new flag to ask the core to wait for tCCS when sending
a RNDIN/RNDOUT command
- addition of a new flag to ask the core to let the controller driver
send the READ/PROGPAGE command
This pull request also contains minor fixes/cleanup/cosmetic changes:
- properly support 512 ECC step size in the sunxi driver
- improve the error messages in the pxa probe path
- fix module autoload in the omap2 driver
- cleanup of several nand drivers to return nand_scan{_tail}() error
code instead of returning -EIO
- various cleanups in the denali driver
- cleanups in the ooblayout handling (MTD core)
- fix an error check in nandsim
"""
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.
Replace the wrmsr/rdmrs_on_cpu() calls in the hotplug callbacks as they are
guaranteed to be invoked on the incoming/outgoing cpu.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Packages are kept in a list, which must be searched over and over.
We can be smarter than that and just store the package pointers in an array
which is allocated at init time. Sizing of the array is determined from the
topology information. That makes the package search a simple array lookup.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Delayed work structs are held in a static percpu storage, which makes no
sense at all because work is strictly per package and we never schedule
more than one work per package.
Aside of that the work cancelation in the hotplug is broken when the work
is queued on the outgoing cpu and canceled. Nothing reschedules the work on
another online cpu in the package, so the interrupts stay disabled and the
work_scheduled flag stays active.
Move the delayed work struct into the package struct, which is the only
sensible place to have it.
To simplify the cancelation logic schedule the work always on the cpu which
is the target for the sysfs files. This is required so the cancelation
logic in the cpu offline path cancels only when the outgoing cpu is the
current target and reschedule the work when there is still a online
CPU in the package.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Storage for a boolean information whether work is scheduled for a package
is kept in separate allocated storage, which is resized when the number of
detected packages grows.
With the proper locking in place this is a completely pointless exercise
because we can simply stick it into the per package struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The work cancellation code, the thermal zone unregistering, the work code
and the interrupt notification function are racy against each other and
against cpu hotplug and module exit. The random locking sprinkeled all
over the place does not help anything and probably exists to make people
feel good. The resulting issues (mainly use after free) are probably
hard to trigger, but they clearly exist
Protect the package list with a spinlock so it can be accessed from the
interrupt notifier and also from the work function. The add/removal code in
the hotplug callbacks take the lock for list manipulation. That makes sure
that on removal neither the interrupt notifier nor the work function can
access the about to be freed package structure anymore.
The thermal zone unregistering is another trainwreck. It's not serialized
against the work function. So unregistering the zone device can race with
the work function and cause havoc.
Protect the thermal zone with a mutex, which is held in the work
function to make sure that the zone device is not being unregistered
concurrently.
To solve the module exit issues, we simply invoke the cpu offline callback
and let it work its magic. For that it's required to keep track of the
participating cpus in a package, because topology_core_mask is not affected
by calling the offline callback for teardown of the driver, so it would
never free the package as there is always a valid target in
topology_core_mask.
Use proper names for the locks so it's clear what they are for and add a
pile of comments to explain the protection rules.
It's amazing that fixing the locking and adding 30 lines of comments
explaining it still removes more lines than it adds.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Coding style fixups and replacement of overly complex constructs and random
error codes instead of returning the real ones. This mess makes the eyes bleeding.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Any randomly chosen struct name is more descriptive than phy_dev_entry.
Rename the whole thing to struct pkg_device, which describes the content
reasonably well and use the same variable name throughout the code so it
gets readable. Rename the msr struct members as well.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
There is no point in the whole package data refcounting dance because
topology_core_cpumask tells us whether this is the last cpu in the
package. If yes, then the package can go, if not it stays. It's already
serialized via the hotplug code.
While at it rename the first_cpu member of the package structure to
cpu. The first has absolutely no meaning.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The threshold callbacks are installed before the initialization of the
online cpus has succeeded and removed after the teardown has been
done. That's both wrong as callbacks might be invoked into a half
initialized or torn down state.
Move them to the proper places: Last in init() and first in exit().
While at it shorten the insane long and horrible named function names.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
find_next_sibling() iterates over the online cpus and searches for a cpu
with the same package id as the current cpu. This is a pointless exercise
as topology_core_cpumask() allows a simple cpumask search for an online cpu
on the same package.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
In pkg_temp_thermal_device_remove() the package device is searched at the
beginning of the function. When the device refcount becomes zero another
search for the same device is conducted. Remove the pointless loop and use
the device pointer which was retrieved at the beginning of the function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Wenn a package is removed nothing restores the thermal interrupt MSR so
the content will be stale when a CPU of that package becomes online again.
Aside of that the work function reenables interrupts before acknowledging
the current one, which is the wrong order to begin with.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
In the critical sysfs entry the thermal hwmon was returning wrong
temperature to the user-space. It was reporting the temperature of the
first trip point instead of the temperature of critical trip point.
For example:
/sys/class/hwmon/hwmon0/temp1_crit:50000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
/sys/class/thermal/thermal_zone0/trip_point_0_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
/sys/class/thermal/thermal_zone0/trip_point_3_type:critical
Since commit e68b16abd9 ("thermal: add hwmon sysfs I/F") the driver
have been registering a sysfs entry if get_crit_temp() callback was
provided. However when accessed, it was calling get_trip_temp() instead
of the get_crit_temp().
Fixes: e68b16abd9 ("thermal: add hwmon sysfs I/F")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This is a helper that pins down a range from an iov_iter and adds it to
a bio without requiring a separate memory allocation for the page array.
It will be used for upcoming direct I/O implementations for block devices
and iomap based file systems.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: ported to the iov_iter interface, renamed and added comments.
All blame should be directed to me and all fame should go to Kent
after this!]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9cd56d916aa481ce8f56d9c5302a6ed90c2e0b5f)
When we change MTU or the number of channels on a netvsc device we get the
following logged:
hv_netvsc bf5edba8...: net device safe to remove
hv_netvsc: hv_netvsc channel opened successfully
hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560
hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up
This information is useful as debug at most.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: couple of enhancements and fixes
Couple of enhancements and fixes from Ido.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.
This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.
Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.
While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.
Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.
Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned
With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be symmetric to hashtable insert and remove filter from hashtable only
in case skip sw flag is not set.
Fixes: e69985c67c ("net/sched: cls_flower: Introduce support in SKIP SW flag")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
in dccp_invalid_packet() or risk use after free.
Bug found by Andrey Konovalov using syzkaller.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added switchib driver fails to link if MLXSW_PCI=m:
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'
The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.
Fixes: d1ba526384 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device_release_driver(&phydev->mdio.dev) is called, it releases all
resources belonging to the PHY device. Hence the subsequent call to
phy_led_triggers_unregister() will access already freed memory when
unregistering the LEDs.
Move the call to phy_led_triggers_unregister() before the possible call
to device_release_driver() to fix this.
Fixes: 2e0bc452f4 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a hardware issue happened as described by inline comments, the register
write pattern looks like the following:
<write ~MACB_BIT(RE)>
+ wmb();
<write MACB_BIT(RE)>
There might be a memory barrier between these two write operations, so add wmb
to ensure an flip from 0 to 1 for NCR.
Signed-off-by: Zumeng Chen <zumeng.chen@windriver.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On macb only (not gem), when a RX queue corruption was detected from
macb_rx(), the RX queue was reset: during this process the RX ring
buffer descriptor was initialized by macb_init_rx_ring() but we forgot
to also set bp->rx_tail to 0.
Indeed, when processing the received frames, bp->rx_tail provides the
macb driver with the index in the RX ring buffer of the next buffer to
process. So when the whole ring buffer is reset we must also reset
bp->rx_tail so the driver is synchronized again with the hardware.
Since macb_init_rx_ring() is called from many locations, currently from
macb_rx() and macb_init_rings(), we'd rather add the "bp->rx_tail = 0;"
line inside macb_init_rx_ring() than add the very same line after each
call of this function.
Without this fix, the rx queue is not reset properly to recover from
queue corruption and connection drop may occur.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Fixes: 9ba723b081 ("net: macb: remove BUG_ON() and reset the queue to handle RX errors")
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix comments, add some new, and make debugfs output consistent.
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion. This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.
Fixes: 21e4902aea ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>