dw_mci_init_dma() may decide to fall back to PIO instead of DMA, for
example because of wrong DMA settings in the system.
Until now, dw_mci_init_slot() always assumed that DMA is available
when CONFIG_MMC_DW_IDMAC was defined, ignoring the host->use_dma variable
set during DMA init.
So when the DMA init failed for whatever reason, the transfer sizes
would still be set for DMA transfers, notably including the maximum
block count calculated from host->ring_size, resulting in a
[ 4.991109] ------------[ cut here ]------------
[ 4.991111] kernel BUG at drivers/mmc/core/core.c:256!
[ 4.991113] Internal error: Oops - BUG: 0 [#1] SMP ARM
because host->ring_size is 0 in this case and the slot init code uses
the wrong code to calculate the values.
Fix this by selecting the correct calculations using the host->use_dma
variable instead of the CONFIG_MMC_DW_IDMAC config option.
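As a rough illustration of the idea, the following user-space sketch picks
the transfer limits from a run-time use_dma value instead of a compile-time
option. The struct, the function name and the concrete limit values are
assumptions made for this example and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the limits a slot advertises. */
struct limits_sketch {
	uint32_t max_segs;
	uint32_t max_blk_size;
	uint32_t max_seg_size;
	uint32_t max_req_size;
	uint32_t max_blk_count;
};

/*
 * Select the limits from the run-time DMA state. The numbers are
 * illustrative; the point is that the PIO branch never reads ring_size.
 */
static struct limits_sketch slot_limits(bool use_dma, uint32_t ring_size)
{
	struct limits_sketch l;

	/* The DMA branch is only meaningful with a non-empty ring. */
	assert(!use_dma || ring_size != 0);

	l.max_blk_size = 65536;
	if (use_dma) {
		/* DMA path: limits scale with the descriptor ring. */
		l.max_segs = ring_size;
		l.max_seg_size = 0x1000;
		l.max_req_size = l.max_seg_size * ring_size;
		l.max_blk_count = l.max_req_size / 512;
	} else {
		/* PIO path: fixed limits, independent of ring_size. */
		l.max_segs = 64;
		l.max_blk_count = 512;
		l.max_req_size = l.max_blk_size * l.max_blk_count;
		l.max_seg_size = l.max_req_size;
	}
	return l;
}
```

With use_dma false and ring_size 0, the block count stays non-zero, which
is the condition the BUG above trips over.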
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
On the Freescale QorIQ LS1021AQDS board, an SDIO interrupt occurs
during resume even when no SD adapter is inserted, because of some
unknown issue. The driver doesn't assign the sdio_irq_thread pointer
in this case, which blocks the resume of the kernel. This patch
avoids using a NULL sdio_irq_thread pointer.
Signed-off-by: Yangbo Lu <yangbo.lu@freescale.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In the (not so unlikely) case that the mmc controller timeout budget is
enough for exactly one erase-group, the simplification of allowing one
sector has an enormous performance penalty. We optimize this special case
by introducing a flag that prohibits erase-group boundary crossing, so
that we can allow trimming more than one sector at a time.
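A minimal sketch of the boundary rule, assuming sector-based addressing:
a request may cover several sectors as long as it stops at the end of the
erase group it starts in. The function and parameter names are
hypothetical and are not the ones used in the mmc core.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a trim request of `nr` sectors starting at sector `from` so that
 * it never crosses an erase-group boundary. `erase_group` is the
 * erase-group size in sectors.
 */
static uint32_t clamp_to_erase_group(uint32_t from, uint32_t nr,
				     uint32_t erase_group)
{
	assert(erase_group != 0);

	/* Sectors left between `from` and the end of its erase group. */
	uint32_t room = erase_group - (from % erase_group);

	return nr < room ? nr : room;
}
```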
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Inline function __dma_request_slave_channel_compat() doesn't modify "name"
argument but passes it to dma_request_slave_channel() which already takes
it as a constant.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
This adds a new descriptor flag that lets a client reuse a descriptor by
submitting it multiple times, for example for a video buffer.
Add helper APIs for this as well.
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Now that the AEAD conversion is complete we can rip out the old
AEAD interface and associated code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This commit moves the Intersil/Techwell PCI vendor ID and
the device IDs for the TW68 PCI video capture cards.
This will allow supporting future Intersil/Techwell devices
without duplicating the IDs.
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Provide new function get_vaddr_frames(). This function maps virtual
addresses from given start and fills given array with page frame numbers of
the corresponding pages. If given start belongs to a normal vma, the function
grabs reference to each of the pages to pin them in memory. If start
belongs to VM_IO | VM_PFNMAP vma, we don't touch page structures. Caller
must make sure pfns aren't reused for anything else while he is using
them.
This function is created for various drivers to simplify handling of
their buffers.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
We can remove everything from struct sb_writers except frozen
and add the array of percpu_rw_semaphore's instead.
This patch doesn't remove sb_writers->wait_unfrozen yet, we keep
it for get_super_thawed(). We will probably remove it later.
This change tries to address the following problems:
- Firstly, __sb_start_write() looks simply buggy. It does
__sb_end_write() if it sees ->frozen, but if it migrates
to another CPU before percpu_counter_dec(), sb_wait_write()
can wrongly succeed if there is another task which holds
the same "semaphore": sb_wait_write() can miss the result
of the previous percpu_counter_inc() but see the result
of this percpu_counter_dec().
- As Dave Hansen reports, it is suboptimal. The trivial
microbenchmark that writes to a tmpfs file in a loop runs
12% faster if we change this code to rely on RCU and kill
the memory barriers.
- This code doesn't look simple. It would be better to rely
on the generic locking code.
According to Dave, this change adds the same performance
improvement.
Note: with this change both freeze_super() and thaw_super() will do
synchronize_sched_expedited() 3 times. This is just ugly. But:
- This will be "fixed" by the rcu_sync changes we are going
to merge. After that freeze_super()->percpu_down_write()
will use synchronize_sched(), and thaw_super() won't use
synchronize() at all.
This doesn't need any changes in fs/super.c.
- Once we merge rcu_sync changes, we can also change super.c
so that all sb_writers->rw_sem's will share the single ->rss
in struct sb_writers, then freeze_super() will need only one
synchronize_sched().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
Of course, this patch is ugly as hell. It will be (partially)
reverted later. We add it to ensure that other WIP changes in
percpu_rw_semaphore won't break fs/super.c.
We do not even need this change right now, percpu_free_rwsem()
is fine in atomic context. But we are going to change this, it
will be might_sleep() after we merge the rcu_sync() patches.
And even after that we do not really need destroy_super_work(),
we will kill it in any case. Instead, destroy_super_rcu() should
just check that rss->cb_state == CB_IDLE and do call_rcu() again
in the (very unlikely) case this is not true.
So this is just the temporary kludge which helps us to avoid the
conflicts with the changes which will be (hopefully) routed via
rcu tree.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
Add percpu_rwsem_release() and percpu_rwsem_acquire() for the users
which need to return to userspace with percpu-rwsem lock held and/or
pass the ownership to another thread.
TODO: change percpu_rwsem_release() to use rwsem_clear_owner(). We can
either fold kernel/locking/rwsem.h into include/linux/rwsem.h, or add
the non-inline percpu_rwsem_clear_owner().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Preparation to hide the sb->s_writers internals from xfs and btrfs.
Add 2 trivial define's they can use rather than play with ->s_writers
directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jan Kara <jack@suse.com>
With this patch a flag instead of a variable
is used for the default device authorization.
Signed-off-by: Stefan Koch <skoch@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Interfaces are authorized by default.
This can be disabled or enabled (again) by writing 0 or 1 to
/sys/bus/usb/devices/usbX/interface_authorized_default
Signed-off-by: Stefan Koch <skoch@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The attribute authorized shows the authorization state for an interface.
Signed-off-by: Stefan Koch <skoch@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In high-speed mode, incoming packets are randomly corrupted by the
hardware, resulting in multiple errors. This workaround makes FS the
default mode on all affected SoCs by disabling HS chirp
signalling. The erratum does not affect FS and LS modes.
Force all HS devices to connect in FS mode on all SoCs
affected by this erratum:
P3041 and P2041 rev 1.0 and 1.1
P5020 and P5010 rev 1.0 and 2.0
P5040, P1010 and T4240 rev 1.0
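The list above can be read as a lookup table. The sketch below encodes it
that way; the struct and function names are made up for this example, and
matching SoCs by name string is an assumption, not how the driver
identifies silicon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One affected SoC revision from the list above. */
struct affected_rev {
	const char *soc;
	int major;
	int minor;
};

static const struct affected_rev affected[] = {
	{ "P3041", 1, 0 }, { "P3041", 1, 1 },
	{ "P2041", 1, 0 }, { "P2041", 1, 1 },
	{ "P5020", 1, 0 }, { "P5020", 2, 0 },
	{ "P5010", 1, 0 }, { "P5010", 2, 0 },
	{ "P5040", 1, 0 }, { "P1010", 1, 0 }, { "T4240", 1, 0 },
};

/* Return true if HS devices should be forced to connect in FS mode. */
static bool must_force_full_speed(const char *soc, int major, int minor)
{
	assert(soc != NULL);

	for (size_t i = 0; i < sizeof(affected) / sizeof(affected[0]); i++) {
		if (strcmp(affected[i].soc, soc) == 0 &&
		    affected[i].major == major &&
		    affected[i].minor == minor)
			return true;
	}
	return false;
}
```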
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter writes:
USB: chipidea updates for v4.3-rc1
The main changes add several system interfaces for tuning
performance, which each vendor can adjust according to its
design configuration.
The others are small improvements, such as better siTD support,
USB_DEVICE_A_HNP_SUPPORT support, etc.
Felipe writes:
usb: patches for v4.3 merge window
New support for Allwinner SoCs on the MUSB driver has been added to the list of
glue layers. MUSB also got support for building all DMA engines in one binary;
this will be great for distros.
DWC3 now has no trace of dev_dbg()/dev_vdbg() usage. We will rely solely on
tracing to debug DWC3. There was also a fix for memory corruption with EP0 when
maxpacket size transfers are > 512 bytes.
Robert's EP capabilities flags are making EP selection a lot simpler. UDCs are
now required to set these flags up when adding endpoints to the framework.
Other than these, we have the usual set of miscellaneous cleanups and minor
fixes.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Kill arch_memremap_pmem() and just let the architecture specify the
flags to be passed to memremap(). Default to writethrough.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Existing users of ioremap_cache() are mapping memory that is known in
advance to not have i/o side effects. These users are forced to cast
away the __iomem annotation, or otherwise neglect to fix the sparse
errors thrown when dereferencing pointers to this memory. Provide
memremap() as a non __iomem annotated ioremap_*() in the case when
ioremap is otherwise a pointer to cacheable memory. Empirically,
ioremap_<cacheable-type>() call sites are seeking memory-like semantics
(e.g. speculative reads, and prefetching permitted).
memremap() is a break from the ioremap implementation pattern of adding
a new memremap_<type>() for each mapping type and having silent
compatibility fall backs. Instead, the implementation defines flags
that are passed to the central memremap(), and if a mapping type is not
supported by an arch, memremap() returns NULL.
We introduce a memremap prototype as a trivial wrapper of
ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and
ioremap_wt() usage has been removed from drivers we teach archs to
implement arch_memremap() with the ability to strictly enforce the
mapping type.
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Having the EWMA parameters stored in the runtime struct imposes
memory requirements for the constant values that could just be
inlined in the code. This particularly makes sense if there are
a lot of such structs, for example in mac80211 in the station
table where each station has a number of these in an array, and
there can be many stations.
Provide a macro DECLARE_EWMA() that declares the necessary struct
and inline functions to access it with the parameters hard-coded;
using this also means the user no longer needs to 'select AVERAGE'
as it's entirely self-contained.
In the mac80211 case, on x86-64, this actually slightly *reduces*
code size, while also saving 80 bytes of runtime memory per sta.
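To make the idea concrete, here is a simplified user-space sketch of such
a declaring macro. It is not the kernel's DECLARE_EWMA(): the name
DECLARE_EWMA_SKETCH and the choice to pass both parameters as shift counts
(log2 of the scaling factor and of the weight) are assumptions for this
example. The struct holds only the running value; the parameters are
compiled into the accessors.

```c
#include <assert.h>

/*
 * Declare struct ewma_<name> plus init/read/add helpers with the
 * parameters hard-coded. The internal value is the average scaled by
 * 2^factor_shift; each new sample gets a weight of 1/2^weight_shift.
 */
#define DECLARE_EWMA_SKETCH(name, factor_shift, weight_shift)		\
	struct ewma_##name { unsigned long internal; };			\
	static inline void ewma_##name##_init(struct ewma_##name *e)	\
	{								\
		e->internal = 0;					\
	}								\
	static inline unsigned long					\
	ewma_##name##_read(const struct ewma_##name *e)			\
	{								\
		return e->internal >> (factor_shift);			\
	}								\
	static inline void						\
	ewma_##name##_add(struct ewma_##name *e, unsigned long val)	\
	{								\
		e->internal = e->internal				\
			? ((e->internal << (weight_shift)) -		\
			   e->internal +				\
			   (val << (factor_shift))) >> (weight_shift)	\
			: val << (factor_shift);			\
	}

DECLARE_EWMA_SKETCH(signal, 10, 3)

/* Feed two samples and return the resulting average. */
static unsigned long ewma_demo(void)
{
	struct ewma_signal avg;

	ewma_signal_init(&avg);
	ewma_signal_add(&avg, 100);
	/* The first sample seeds the average directly. */
	assert(ewma_signal_read(&avg) == 100);
	ewma_signal_add(&avg, 200);
	return ewma_signal_read(&avg);
}
```

In this sketch sizeof(struct ewma_signal) is a single unsigned long, since
no per-instance parameters are stored.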
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* pci/hotplug:
PCI: pciehp: Remove ignored MRL sensor interrupt events
PCI: pciehp: Remove unused interrupt events
PCI: pciehp: Handle invalid data when reading from non-existent devices
PCI: Hold pci_slot_mutex while searching bus->slots list
PCI: Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem
PCI: pciehp: Simplify pcie_poll_cmd()
PCI: Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot
* pci/iommu:
PCI: Remove pci_ats_enabled()
PCI: Stop caching ATS Invalidate Queue Depth
PCI: Move ATS declarations to linux/pci.h so they're all together
PCI: Clean up ATS error handling
PCI: Use pci_physfn() rather than looking up physfn by hand
PCI: Inline the ATS setup code into pci_ats_init()
PCI: Rationalize pci_ats_queue_depth() error checking
PCI: Reduce size of ATS structure elements
PCI: Embed ATS info directly into struct pci_dev
PCI: Allocate ATS struct during enumeration
iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth
* pci/irq:
PCI: Kill off set_irq_flags() usage
* pci/virtualization:
PCI: Add ACS quirks for Intel I219-LM/V
Add a VRF_MASTER flag for interfaces and helper functions for determining
if a device is a VRF_MASTER.
Add link attribute for passing VRF_TABLE id.
Add vrf_ptr to netdevice.
Add various macros for determining if a device is a VRF device, the index
of the master VRF device and table associated with VRF device.
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:
net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...
When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.
1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium
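The effect on nexthop choice can be sketched with the two 9000::/8
entries from the listing above. This is a deliberately simplified model
(lowest metric wins, optionally skipping linkdown nexthops); the struct
and function names are hypothetical and the real route lookup is more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nexthop_sketch {
	const char *dev;
	unsigned int metric;
	bool linkdown;
};

/* The two 9000::/8 entries from the listing. */
static const struct nexthop_sketch routes_9000[] = {
	{ "p8p1", 2048, false },
	{ "p7p1", 1024, true },
};

/*
 * Return the index of the lowest-metric nexthop, or -1 if none is
 * usable. With ignore_linkdown set, nexthops on down links are skipped.
 */
static int select_nexthop(const struct nexthop_sketch *nh, size_t n,
			  bool ignore_linkdown)
{
	int best = -1;

	assert(nh != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ignore_linkdown && nh[i].linkdown)
			continue;
		if (best < 0 || nh[i].metric < nh[best].metric)
			best = (int)i;
	}
	return best;
}
```

With the sysctl at 0 the lower-metric p7p1 entry is still picked even
though its link is down; with it set to 1 the p8p1 entry is used instead.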
This also adds devconf support and notification when sysctl values
change.
v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The system bus and chipidea IP have different limitations for
both host and device mode.
For example, with the erratum below, we need to enable SDIS (Stream
Disable Mode) in host mode, but we don't want it for device mode on the
same system.
TAR 9000378958
Title: Non-Double Word Aligned Buffer Address Sometimes Causes Host to
Hang on OUT Retry
Impacted Configuration: Host mode, all transfer types
Description:
The host core operating in streaming mode may under run while sending
the data packet of an OUT transaction. This under run can occur if
there are unexpected system delays in fetching the remaining packet
data from memory. The host forces a bad CRC on the packet, the device
detects the error and discards the packet. The host then retries a Bulk,
Interrupt, or Control transfer if an under run occurs according to the
USB specification. During simulations, it was found that the host does
not issue the retry of the failed bulk OUT. It does not issue any other
transactions except SOF packets that have incorrect frame numbers.
The second failure mode occurs if the under run occurs on an ISO OUT
transaction and the next ISO transaction is a zero byte packet. The host
does not issue any transactions (including SOFs). The device detects a
Suspend condition, reverts to full speed, and waits for resume signaling.
A third failure mode occurs when the host under runs on an ISO OUT and
the next ISO in the schedule is an ISO OUT with two max packets of 1024
bytes each. The host should issue MDATA for the first OUT followed by
DATA1 for the second. However, it drops the MDATA transaction, and
issues the DATA1 transaction.
The system impact of this bug is the same regardless of the failure mode
observed. The host core hangs, the ehci_ctrl state machine waits for the
protocol engine to send the completion status for the corrupted
transaction, which never occurs. No indication is sent to the host
controller driver, no register bits change and no interrupts occur.
Eventually the requesting application times out.
Detailed internal behavior:
The EHCI control state machine (ehci_ctrl) in the DMA block is responsible
for parsing the schedules and initiating all transactions. The ehci_ctrl
state machine passes the transaction details to the protocol block by
writing the transaction information in to the TxFIFO. It then asserts
the pe_hst_run_pkt signal to inform the host protocol state machine
(pe_hst_state) that there is a packet in the TxFIFO.
A tag of 0x0 indicates a start of packet with the data providing the
following information:
35:32 Tag
31:30 Reserved
29:23 Endpoint (lowest 4 bits)
22:16 Address
15:10 Reserved
9:8 Endpoint speed
7:6 Endpoint type
5:4 Data Toggle
3:0 PID
The pe_hst_state reads the packet information and constructs the packet
and issues it to the PHY interface.
The ehci_ctrl state machine writes the start transaction information in
to the TxFIFO as 0x03002910c for the OUT packet that had the under run
error. However, it writes 0xC3002910C for the retry of the Out
transaction, which is incorrect.
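The difference between the two words is visible in the tag field alone.
The sketch below extracts bits 35:32 (tag) and 3:0 (PID) from both values
quoted above; the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) from a TxFIFO word. */
static uint64_t field(uint64_t word, unsigned int hi, unsigned int lo)
{
	assert(hi >= lo && hi < 64);

	unsigned int width = hi - lo + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (word >> lo) & mask;
}

/* Tag lives in bits 35:32, PID in bits 3:0. */
static unsigned int txfifo_tag(uint64_t word)
{
	return (unsigned int)field(word, 35, 32);
}

static unsigned int txfifo_pid(uint64_t word)
{
	return (unsigned int)field(word, 3, 0);
}

/* A tag of 0x0 marks a start-of-packet command. */
static int is_start_of_packet(uint64_t word)
{
	return txfifo_tag(word) == 0x0;
}
```

0x03002910C decodes to tag 0x0, a start of packet, while 0xC3002910C
decodes to tag 0xC, so it is not recognized as one even though the low
32 bits are identical.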
The pe_hst_state enters a bus timeout state after sending the bad CRC
for the packet that under ran. It then purges any data that was back
filled in to the TxFIFO for the packet that under ran. The pe_hst_state
machine stops purging the TxFIFO when it is empty or if it reads a
location that has a tag of 0x0, indicating a start of packet command.
The pe_hst_state reads 0xC3002910C and discards it as it does not decode
to a start of packet command. It continues to purge the OUT data that
has been pre-buffered for the OUT retry. The pe_hst_state detects the
hst_packet_run signal and attempts to read the PID and address
information from the TxFIFO. This location has packet data and so does
not decode to a valid PID and so falls through to the PE_HST_SOF_LOAD
state where the frame_num_counter is updated. The frame_num_counter
is updated with the data in the TxFIFO. In this case, the data is
incorrect as the ehci_ctrl state machine did not initiate the load.
The hst_pe_state machine detects the SOF request signal and sends an
SOF with the bad frame number. Meanwhile, the ehci_ctrl state machine
waits indefinitely in the run_pkt state waiting for the completion
status from pe_hst_state machine, which will never happen.
The ISO failure case is similar except that there is no retry for ISO.
The ehci_ctrl state machine moves to the next transfer in the periodic
schedule. If the under run occurs on the last entry of the periodic
list then it moves to the Async schedule.
In the case of ISO OUT simulations, the next ISO is a zero byte OUT
and again the start of packet command gets corrupted. The TxFIFO is
empty when the hst_pe_state attempts to read the Address and PID
information as the transaction is a zero byte packet. This results
in the hst_pe_state machine staying in the GET_PID state, which means
that it does not issue any transactions (including SOFs). The device
detects a Suspend condition and reverts to full speed mode and waits
for a Resume or Reset signal.
The EHCI specification allows a Non-DoubleWord (32 bits) offset to
be used as a current offset for Buffer Pointer Page 0 of the qTD.
In Non-DoubleWord aligned cases, the core reads the packet data
from the AHB memory, performs the alignment operation before writing
it in to the TxFIFO as a 32 bit data word. An End Of Packet tag (EOP)
is written to the TxFIFO after all the packet data has been written
in to the TxFIFO. The alignment function is reset to Idle by the EOP
tag. The corruption of the start of packet command arises because the
packet buffer for the OUT transaction that under ran is not aligned
to a DoubleWord, and hence no EOP tag is written to the TxFIFO. The
alignment function is still active when the start packet information
is written in to the TxFIFO for the retry of the bulk packet or for
the next transaction in the case of an under run on an ISO. This
results in the corruption of the start tag and the transaction
information.
Click for waveform showing the command 0x 0000300291 being written in
to the TX FIFO for the Out that under ran.
Click for waveform showing the command 0xC3002910C written to the
TxFIFO instead of 0x 0000300291
Versions affected: Versions 2.10a and previous versions
How discovered: Customer simulation
Workaround:
1- The EHCI specification allows a non-DoubleWord offset to be used
as a current offset for Buffer Pointer Page 0 of the qTD. However,
if a DoubleWord offset is used then this issue does not arise.
2- Use non streaming mode to eliminate under runs.
Resolution:
The fix involves changes to the traffic state machine in the
vusb_hs_dma_traf block. The ehci_ctrl state machine updates the context
information by encoding the transaction results on the
hst_op_context_update signals at the end of a transaction. The signal
hst_op_context_update is added to the traffic state machine, and the
tx_fifo_under_ran_r signal is generated if the transaction results in
an under run error. Click for waveform
The traffic state machine then traverses to the do_eop states if the
tx_fifo_under_ran error is asserted. Thus an EOP tag is written in to
the TxFIFO as shown in this waveform .
The EOP tag resets the align state machine to the Idle state ensuring
that the next command written by the ehci_ctrl state machine does not
get corrupted.
File(s) modified:
RTL code fixed: …..
Method of reproducing: This failure cannot be reproduced in the current
test bench.
Date Found: March 2010
Date Fixed: June 2010
Update information:
Added the RTL code fix
Signed-off-by: Peter Chen <peter.chen@freescale.com>
ITC (Interrupt Threshold Control) sets the maximum rate at which the
host/device controller issues interrupts. Its default value is 8 (1ms).
The EHCI core changes it to 1, but device mode keeps the default
value.
In some use cases, such as Android ADB, there is only one usb request for
each direction and the maximum payload is only 4KB, so the speed is 4MB/s
at most. The controller needs to trigger interrupts as fast as possible
to increase the speed; USB performance is better if the interrupt
can be triggered faster.
Reducing the ITC value benefits USB performance, but it also increases the
number of interrupts, which may increase cpu utilization too.
Most use cases care about performance, but some may care about
cpu utilization, so we leave a platform interface for the user.
We set ITC as 1 (1 micro-frame) as default value which is aligned
with ehci core default value.
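The 4MB/s figure follows from simple arithmetic, sketched below. It
assumes one ITC unit is one 125us micro-frame and that exactly one
request completes per interrupt; both the helper names and the second
assumption are simplifications for this example, so the result is an
upper bound and not a measured rate.

```c
#include <assert.h>

/* Interrupt interval in microseconds for a given ITC value. */
static unsigned int itc_interval_us(unsigned int itc)
{
	/* An ITC of 0 is not modelled here. */
	assert(itc != 0);
	return itc * 125u;
}

/*
 * Upper bound on throughput in bytes per second when one request of
 * `payload` bytes completes per interrupt.
 */
static unsigned long max_bytes_per_sec(unsigned long payload,
				       unsigned int itc)
{
	return payload * (1000000ul / itc_interval_us(itc));
}
```

With a 4096 byte payload, ITC = 8 gives 4096000 bytes/s (about 4MB/s),
while ITC = 1 raises the bound to 32768000 bytes/s.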
Signed-off-by: Peter Chen <peter.chen@freescale.com>
The ttctrl.ttha register field is described as follows:
- Internal TT Hub Address Representation
- RW
- Default = 0000000b
This field is used to match against the Hub Address field in QH & siTD
to determine if the packet is routed to the internal TT for directly
attached FS/LS devices. If the Hub Address in the QH or siTD does not
match this address then the packet will be broadcast on the High Speed
ports destined for a downstream High Speed hub with the address in the QH/siTD.
In the silicon RTL, this entry only affects QH and siTD, and hub.addr in
both QH and siTD is 0 in the ehci core for chipidea (with hcd->has_tt = 1).
So, for QH, if the "usage_tt" flag in the RTL is 0, setting
CI_HDRC_SET_NON_ZERO_TTHA does not affect the QH (with a non-hs device).
For siTD, setting this flag changes the remaining space requirement for
the last transaction from 1023 bytes to 188 bytes, which can increase the
number of transactions within one frame. The ehci periodic schedule code
does not queue the packet if the frame space is full, so it is safe to set
this flag for siTD.
Setting this flag fixes the problem Alan Stern reported here:
http://www.spinics.net/lists/linux-usb/msg123125.html
It may fix Michael Tessier's problem too:
http://www.spinics.net/lists/linux-usb/msg118679.html
CC: stern@rowland.harvard.edu
CC: michael.tessier@axiontech.ca
Signed-off-by: Peter Chen <peter.chen@freescale.com>
OF has some helper functions for parsing MAC and PHY settings.
In cases where the platform provides this information rather
than the device itself, similar functions are needed for ACPI.
These functions are slightly modified versions of the ones in
of_net, and can use information provided via either DT or ACPI.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/cavium/Kconfig
The cavium conflict was overlapping dependency
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove pci_ats_enabled(). There are no callers outside the ATS code
itself. We don't need to check ats_cap, because if we don't find an ATS
capability, we'll never set ats_enabled.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Stop caching the Invalidate Queue Depth in struct pci_dev.
pci_ats_queue_depth() is typically called only once per device, and it
returns a fixed value per-device, so callers who need the value frequently
can cache it themselves.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Move ATS declarations to linux/pci.h so they're all in one place.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
The extended capabilities list is linked with 12-bit pointers, and the ATS
Smallest Translation Unit and Invalidate Queue Depth fields are both 5
bits.
Use u16 and u8 to hold the extended capability address and the stu and qdep
values. No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
The pci_ats struct is small and will get smaller, so I don't think it's
worth allocating it separately from the pci_dev struct.
Embed the ATS fields directly into struct pci_dev.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Previously, we allocated pci_ats structures when an IOMMU driver called
pci_enable_ats(). An SR-IOV VF shares the STU setting with its PF, so when
enabling ATS on the VF, we allocated a pci_ats struct for the PF if it
didn't already have one. We held the sriov->lock to serialize threads
concurrently enabling ATS on several VFs so only one would allocate the PF
pci_ats.
Gregor reported a deadlock here:
pci_enable_sriov
sriov_enable
virtfn_add
mutex_lock(dev->sriov->lock) # acquire sriov->lock
pci_device_add
device_add
BUS_NOTIFY_ADD_DEVICE notifier chain
iommu_bus_notifier
amd_iommu_add_device # iommu_ops.add_device
init_iommu_group
iommu_group_get_for_dev
iommu_group_add_device
__iommu_attach_device
amd_iommu_attach_device # iommu_ops.attach_device
attach_device
pci_enable_ats
mutex_lock(dev->sriov->lock) # deadlock
There's no reason to delay allocating the pci_ats struct, and if we
allocate it for each device at enumeration-time, there's no need for
locking in pci_enable_ats().
Allocate pci_ats struct during enumeration, when we initialize other
capabilities.
Note that this implementation requires ATS to be enabled on the PF before
it is enabled on any of the VFs, because the PF controls the STU for all
the VFs.
Link: http://permalink.gmane.org/gmane.linux.kernel.iommu/9433
Reported-by: Gregor Dick <gdick@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.
But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them. In the future this will
let us delete merge_bvec_fn and a bunch of other code.
We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.
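As a rough model of what splitting as needed means, the sketch below cuts
an oversized request into pieces by repeatedly taking the largest front
piece the device accepts. It only models a single max_sectors limit; the
function names are hypothetical, and the real blk_queue_split() has to
consider more queue limits than this.

```c
#include <assert.h>

/* Length of the piece to split off the front of a request. */
static unsigned long front_piece_len(unsigned long nr_sectors,
				     unsigned long max_sectors)
{
	assert(max_sectors != 0);
	return nr_sectors < max_sectors ? nr_sectors : max_sectors;
}

/*
 * Count how many pieces a request of nr_sectors ends up as when the
 * front piece is split off until nothing remains.
 */
static unsigned long count_pieces(unsigned long nr_sectors,
				  unsigned long max_sectors)
{
	unsigned long pieces = 0;

	while (nr_sectors > 0) {
		nr_sectors -= front_piece_len(nr_sectors, max_sectors);
		pieces++;
	}
	return pieces;
}
```

The caller no longer needs to know max_sectors when building the request;
a request that already fits comes back as a single piece.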
Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:
* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns
Some others are almost certainly safe to remove now, but will be left
for future patches.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>