Commit Graph

460 Commits

Author SHA1 Message Date
Mike Marciniszyn
7d7632add8 IB/qib: Modify software pma counters to use percpu variables
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.

The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.

This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn
1ed88dd7d0 IB/qib: Add percpu counter replacing qib_devdata int_counter
This patch replaces the dd->int_counter with a percpu counter.

The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.

There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)

A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Mike Marciniszyn
f8b6c47a44 IB/qib: Fix debugfs ordering issue with multiple HCAs
The debugfs init code was incorrectly called before the idr mechanism
is used to get the unit number, so the dd->unit hasn't been
initialized.  This caused the unit relative directory creation to fail
after the first.

This patch moves the init for the debugfs stuff until after all of the
failures and after the unit number has been determined.

A bug in unwind code in qib_alloc_devdata() is also fixed.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Dennis Dalessandro
1c20c81909 IB/qib: Fix potential buffer overrun in sending diag packet routine
Guard against a potential buffer overrun.  Right now the qib driver is
protected by the fact that the data structure in question is only 16
bits.  Should that ever change the problem will be exposed. There is a
similar defect in the ipath driver and this brings the two code paths
into sync.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 16:16:51 -07:00
Yishai Hadas
eeb8461e36 IB: Refactor umem to use linear SG table
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-04 10:34:28 -08:00
Mike Marciniszyn
2f75e12c44 IB/qib: Add missing serdes init sequence
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.

This patch add that sequence.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:52:44 -08:00
Roland Dreier
fb1b5034e4 Merge branch 'ip-roce' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
2014-01-22 23:24:21 -08:00
Ira Weiny
6e0ea9e6cb IB/qib: Fix QP check when looping back to/from QP1
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP.  This was found when testing ibacm on the same node
as an SA.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:16:47 -08:00
Matan Barak
dd5f03beb4 IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:20:54 -08:00
Linus Torvalds
1ea406c0e0 Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Linus Torvalds
2f466d33f5 Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
 "Resource management
    - Fix host bridge window coalescing (Alexey Neyman)
    - Pass type, width, and prefetchability for window alignment (Wei Yang)

  PCI device hotplug
    - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)

  Power management
    - Remove pci_pm_complete() (Liu Chuansheng)

  MSI
    - Fail initialization if device is not in PCI_D0 (Yijing Wang)

  MPS (Max Payload Size)
    - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
    - Use pcie_set_readrq() to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)

  SR-IOV
    - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
    - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)

  Virtualization
    - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)

  Freescale i.MX6
    - Support i.MX6 PCIe controller (Sean Cross)
    - Increase link startup timeout (Marek Vasut)
    - Probe PCIe in fs_initcall() (Marek Vasut)
    - Fix imprecise abort handler (Tim Harvey)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Renesas R-Car
    - Support Gen2 internal PCIe controller (Valentine Barshak)

  Samsung Exynos
    - Add MSI support (Jingoo Han)
    - Turn off power when link fails (Jingoo Han)
    - Add Jingoo Han as maintainer (Jingoo Han)
    - Add clk_disable_unprepare() on error path (Wei Yongjun)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Synopsys DesignWare
    - Add irq_create_mapping() (Pratyush Anand)
    - Add header guards (Seungwon Jeon)

  Miscellaneous
    - Enable native PCIe services by default on non-ACPI (Andrew Murray)
    - Cleanup _OSC usage and messages (Bjorn Helgaas)
    - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
    - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
    - Remove unused pci_mem_start (Myron Stowe)
    - Make sysfs functions static (Sachin Kamat)
    - Warn on invalid return from driver probe (Stephen M. Cameron)
    - Remove Intel Haswell D3 delays (Todd E Brandt)
    - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
    - Use pci_is_pcie() to simplify code (Yijing Wang)
    - Use PCIe capability accessors to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
    - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
    - Simplify sysfs CPU affinity implementation (Yijing Wang)"

* tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits)
  PCI: Enable upstream bridges even for VFs on virtual buses
  PCI: Add pci_upstream_bridge()
  PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
  PCI: Warn on driver probe return value greater than zero
  PCI: Drop warning about drivers that don't use pci_set_master()
  PCI: Workaround missing pci_set_master in pci drivers
  powerpc/pci: Use pci_is_pcie() to simplify code [fix]
  PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms
  PCI: imx6: Probe the PCIe in fs_initcall()
  PCI: Add R-Car Gen2 internal PCI support
  PCI: imx6: Remove redundant of_match_ptr
  PCI: Report pci_pme_active() kmalloc failure
  mn10300/PCI: Remove useless pcibios_last_bus
  frv/PCI: Remove pcibios_last_bus
  PCI: imx6: Increase link startup timeout
  PCI: exynos: Remove redundant of_match_ptr
  PCI: imx6: Fix imprecise abort handler
  PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0
  PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe()
  x86/PCI: Coalesce multiple overlapping host bridge windows
  ...
2013-11-14 14:02:00 +09:00
Al Viro
441a9d0e1e qib_fs: fix (some) dcache abuses
* lookup_one_len() really wants i_mutex held on directory.
* leaks galore - just mount ipathfs, then
cd /sys/bus/pci/drivers/qib_ib; echo *:*:*.* >unbind
on a box with that card present and try to umount ipathfs...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-13 08:08:19 -05:00
Mike Marciniszyn
2fadd83184 IB/qib: Fix txselect regression
Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in
removing a simple_strtoul for a parse routine, setup_txselect().  That
routine is required to handle a multi-value string.

Unwind that aspect of the fix.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:12 -08:00
Mike Marciniszyn
78a5886472 IB/qib: Fix checkpatch __packed warnings
Convert __attribute__ ((packed)) to __packed.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:12 -08:00
Jan Kara
603e772992 IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()
qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.

So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.

This deadlock has actually been observed in the wild when the node
is under memory pressure.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:11 -08:00
Bjorn Helgaas
03078633a6 IB/qib: Drop qib_tune_pcie_caps() and qib_tune_pcie_coalesce() return values
The callers of qib_tune_pcie_caps() and qib_tune_pcie_coalesce() don't
check the return values, so this patch drops the return values altogether.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-10-04 14:30:19 -06:00
Yijing Wang
0ce0e62f1f IB/qib: Use pcie_set_mps() and pcie_get_mps() to simplify code
Refactor qib_tune_pcie_caps().  Use pcie_get_mps(), pcie_set_mps(),
pcie_get_readrq(), and pcie_set_readrq() to simplify the code.  The PCI
core caches the "PCIe Max Payload Size Supported" in pci_dev->pcie_mpss,
so use that instead of pcie_capability_read_word().  Remove the unused
val2fld() and fld2val().

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-09-24 14:17:06 -06:00
Yijing Wang
dcaa73dc34 IB/qib: Use pci_is_root_bus() to check whether it is a root bus
Use pci_is_root_bus() instead of "if (bus->parent)" statement
for better readability.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-09-24 12:13:03 -06:00
Martin Schwidefsky
0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Ira Weiny
0318f68521 IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") caused statements to leak pass the header guard.
Fix this.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:22:20 -07:00
Yijing Wang
b29b076394 IB/qib: Clean up unnecessary MSI/MSI-X capability find
PCI core will initialize device MSI/MSI-X capability in
pci_msi_init_pci_dev().  So device drivers should use
pci_dev->msi_cap/msix_cap to determine whether a device supports
MSI/MSI-X instead of using pci_find_capability(pci_dev,
PCI_CAP_ID_MSI/MSIX).  Access to PCIe device config space again will
consume more time.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:17:23 -07:00
Paul Bolle
bea25e82c6 IB/qib: Make qib_driver static
struct pci_driver qib_driver is only used in qib_init.c.  Remove it
from qib.h and make it static in qib_init.c.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:14:51 -07:00
CQ Tang
4668e4b527 IB/qib: Improve SDMA performance
1. The code accepts chunks of messages, and splits the chunk into
   packets when converting packets into sdma queue entries.  Adjacent
   packets will use user buffer pages smartly to avoid pinning the
   same page multiple times.

2. Instead of discarding all the work when SDMA queue is full, the
   work is saved in a pending queue.  Whenever there are enough SDMA
   queue free entries, pending queue is directly put onto SDMA queue.

3. An interrupt handler is used to progress this pending queue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>

[ Fixed up sparse warnings.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:14:34 -07:00
Mike Marciniszyn
b268e4db3d IB/qib: Add err_decode() call for ring dump
Commit 0b3ddf380c ("Log all SDMA errors unconditionally") missed
part of the patch.

This also corrects a format warning when dma_addr_t is 32 bits
on a 64 bit system.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-30 10:13:00 -07:00
Linus Torvalds
c552441373 Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - new mlx5 driver for Mellanox Connect-IB adapters (including post
   merge request fixes)
 - SRP fixes from Bart Van Assche (including fix to first merge request)
 - qib HW driver updates
 - resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - other small changes and fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  mlx5: Return -EFAULT instead of -EPERM
  IB/qib: Log all SDMA errors unconditionally
  IB/qib: Fix module-level leak
  mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
  IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  mlx5_core: Fixes for sparse warnings
  IB/mlx5: Make profile[] static in main.c
  mlx5: Fix parameter type of health_handler_t
  mlx5: Add driver for Mellanox Connect-IB adapters
  IB/core: Add reserved values to enums for low-level driver use
  IB/srp: Bump driver version and release date
  IB/srp: Make HCA completion vector configurable
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Fail I/O fast if target offline
  IB/srp: Skip host settle delay
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Fix remove_one crash due to resource exhaustion
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane
  IB/core: Fix error return code in add_port()
  ...
2013-07-13 12:57:21 -07:00
Dean Luick
0b3ddf380c IB/qib: Log all SDMA errors unconditionally
This patch adds code to log SDMA errors for supportability purposes.

Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:47:06 -07:00
Mike Marciniszyn
308c813b19 IB/qib: Fix module-level leak
The vzalloc()'ed field physshadow is leaked on module unload.

This patch adds vfree after the sibling page shadow is freed.

Reported-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:46:44 -07:00
Kees Cook
02aa2a3763 drivers: avoid format string in dev_set_name
Calling dev_set_name with a single paramter causes it to be handled as a
format string.  Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Mitko Haralanov
22baa407f9 IB/qib: New transmitter tunning settings for Dell 1.1 backplane
The Dell blade chassis got an updated backplane which requires new
transmitter tuning settings.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-26 09:27:50 -07:00
Mike Marciniszyn
1dd173b01f IB/qib: Add qp_stats debug file
This adds a seq_file iterator for reporting the QP hash table when the
qp_stats file is read.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:52 -07:00
Mike Marciniszyn
17db3a92c1 IB/qib: Add per-context stats interface
This patch adds a debugfs stats interface for per kernel contexts
packet counts.

The code uses the opcode stats count and eliminates the counter in the
context.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:51 -07:00
Mike Marciniszyn
ddb8876589 IB/qib: Convert opcode counters to per-context
This fix changes the opcode relative counters for receive to per
context.

Profiling has shown that when mulitple contexts are being used there
is a lot of cache activity associated with these counters.

The code formerly kept these counters per port, but only provided the
interface to read per HCA.  This patch converts the read of counters
to per HCA and adds the debugfs hooks to be able to read the file as a
sequence of opcodes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:50 -07:00
Mike Marciniszyn
85caafe307 IB/qib: Optimize CQ callbacks
The current workqueue implemention has the following performance
deficiencies on QDR HCAs:

- The CQ call backs tend to run on the CPUs processing the
  receive queues
- The single thread queue isn't optimal for multiple HCAs

This patch adds a dedicated per HCA bound thread to process CQ callbacks.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:49 -07:00
Ramkrishna Vepa
c804f07248 IB/qib: Add dual-rail NUMA awareness for PSM processes
The driver currently selects a HCA based on the algorithm that PSM
chooses, contexts within a HCA or across. The HCA can also be chosen
by the user. Either way, this patch assigns a CPU on the NUMA node
local to the selected HCA. This patch also tries to select the HCA
closest to the NUMA node of the CPU assigned via taskset to PSM
process. If this HCA is unusable then another unit is selected based
on the algorithm that is currently enforced or selected by PSM - round
robin context selection 'within' or 'across' HCA's.

Fixed a bug wherein contexts are setup on the NUMA node on which the
processes are opened (setup_ctxt()) and not on the NUMA node that the
driver recommends the CPU on.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:49 -07:00
Ramkrishna Vepa
e0f30baca1 IB/qib: Add optional NUMA affinity
This patch adds context relative numa affinity conditioned on the
module parameter numa_aware. The qib_ctxtdata has an additional
node_id member and qib_create_ctxtdata() has an addition node_id
parameter.

The allocations within the hdr queue and eager queue setup routines
now take this additional member and adjust allocations as necesary.
PSM will pass the either current numa node or the node closest to the
HCA depending on numa_aware. Verbs will always use the node closest to
the HCA.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:48 -07:00
Vinit Agnihotri
ab4a13d69b IB/qib: Update minor version number
External PSM repositories have advanced the minor number for a variety
of reasons. The driver needs to increase to avoid warnings.

Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:47 -07:00
Mike Marciniszyn
f7cf9a618b IB/qib: Remove atomic_inc_not_zero() from QP RCU
Follow Documentation/RCU/rcuref.txt guidance in removing
atomic_inc_not_zero() from QP RCU implementation.

This patch also removes an unneeded synchronize_rcu() in the add path.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:46 -07:00
Mike Marciniszyn
8469ba39a6 IB/qib: Add DCA support
This patch adds DCA cache warming for systems that support DCA.

The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>

[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
  build due to undefined struct irq_affinity_notify.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-21 17:19:38 -07:00
Mike Marciniszyn
f3bdf34465 IB/qib: Fix lockdep splat in qib_alloc_lkey()
The following backtrace is reported with CONFIG_PROVE_RCU:

    drivers/infiniband/hw/qib/qib_keys.c:64 suspicious rcu_dereference_check() usage!
    other info that might help us debug this:
    rcu_scheduler_active = 1, debug_locks = 1
    4 locks held by kworker/0:1/56:
    #0:  (events){.+.+.+}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0
    #1:  ((&wfc.work)){+.+.+.}, at: [<ffffffff8107a4f5>] process_one_work+0x165/0x4a0
    #2:  (device_mutex){+.+.+.}, at: [<ffffffffa0148dd8>] ib_register_device+0x38/0x220 [ib_core]
    #3:  (&(&dev->lk_table.lock)->rlock){......}, at: [<ffffffffa017e81c>] qib_alloc_lkey+0x3c/0x1b0 [ib_qib]

    stack backtrace:
    Pid: 56, comm: kworker/0:1 Not tainted 3.10.0-rc1+ #6
    Call Trace:
    [<ffffffff810c0b85>] lockdep_rcu_suspicious+0xe5/0x130
    [<ffffffffa017e8e1>] qib_alloc_lkey+0x101/0x1b0 [ib_qib]
    [<ffffffffa0184886>] qib_get_dma_mr+0xa6/0xd0 [ib_qib]
    [<ffffffffa01461aa>] ib_get_dma_mr+0x1a/0x50 [ib_core]
    [<ffffffffa01678dc>] ib_mad_port_open+0x12c/0x390 [ib_mad]
    [<ffffffff810c2c55>] ?  trace_hardirqs_on_caller+0x105/0x190
    [<ffffffffa0167b92>] ib_mad_init_device+0x52/0x110 [ib_mad]
    [<ffffffffa01917c0>] ?  sl2vl_attr_show+0x30/0x30 [ib_qib]
    [<ffffffffa0148f49>] ib_register_device+0x1a9/0x220 [ib_core]
    [<ffffffffa01b1685>] qib_register_ib_device+0x735/0xa40 [ib_qib]
    [<ffffffff8106ba98>] ? mod_timer+0x118/0x220
    [<ffffffffa017d425>] qib_init_one+0x1e5/0x400 [ib_qib]
    [<ffffffff812ce86e>] local_pci_probe+0x4e/0x90
    [<ffffffff81078118>] work_for_cpu_fn+0x18/0x30
    [<ffffffff8107a566>] process_one_work+0x1d6/0x4a0
    [<ffffffff8107a4f5>] ?  process_one_work+0x165/0x4a0
    [<ffffffff8107c9c9>] worker_thread+0x119/0x370
    [<ffffffff8107c8b0>] ?  manage_workers+0x180/0x180
    [<ffffffff8108294e>] kthread+0xee/0x100
    [<ffffffff81082860>] ?  __init_kthread_worker+0x70/0x70
    [<ffffffff815c04ac>] ret_from_fork+0x7c/0xb0
    [<ffffffff81082860>] ?  __init_kthread_worker+0x70/0x70

Per Documentation/RCU/lockdep-splat.txt, the code now uses rcu_access_pointer()
vs. rcu_dereference().

Reported-by: Jay Fenlason <fenlason@redhat.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:05:20 -07:00
Linus Torvalds
e0fd9affeb Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
 - XRC transport fixes
 - Fix DHCP on IPoIB
 - mlx4 preparations for flow steering
 - iSER fixes
 - miscellaneous other fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
  IB/iser: Add support for iser CM REQ additional info
  IB/iser: Return error to upper layers on EAGAIN registration failures
  IB/iser: Move informational messages from error to info level
  IB/iser: Add module version
  mlx4_core: Expose a few helpers to fill DMFS HW strucutures
  mlx4_core: Directly expose fields of DMFS HW rule control segment
  mlx4_core: Change a few DMFS fields names to match firmare spec
  mlx4: Match DMFS promiscuous field names to firmware spec
  mlx4_core: Move DMFS HW structs to common header file
  IB/mlx4: Set link type for RAW PACKET QPs in the QP context
  IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
  mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
  RDMA/iwcm: Don't touch cmid after dropping reference
  IB/qib: Correct qib_verbs_register_sysfs() error handling
  IB/ipath: Correct ipath_verbs_register_sysfs() error handling
  RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
  SRPT: Fix odd use of WARN_ON()
  IPoIB: Fix ipoib_hard_header() return value
  RDMA: Rename random32() to prandom_u32()
  RDMA/cxgb3: Fix uninitialized variable
  ...
2013-05-08 15:29:48 -07:00
Kent Overstreet
a27bb332c0 aio: don't include aio.h in sched.h
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 20:16:25 -07:00
Mike Marciniszyn
c9bdad3c81 IB/qib: Correct qib_verbs_register_sysfs() error handling
qib_verbs_register_sysfs() never cleans up from a failure.
Additionally, the caller of qib_verbs_register_sysfs() doesn't
return the correct "ret" value.

This patch resolves both of those issues.

Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Dean Luick <dean.luick@intel.com>

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 23:04:36 -07:00
Mike Marciniszyn
ff802e31b5 firmware,IB/qib: revert firmware file move
Commit e2eed58b4f ("IB/qib: change QLogic to Intel") moved a firmware
file potentially breaking the ABI.

This patch reverts that aspect of the fix as well as reverting the
firmware name as used in qib.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-05 12:19:39 -07:00
Vinit Agnihotri
e2eed58b4f IB/qib: change QLogic to Intel
These changes modify the qib driver as part of acquiring
the InfiniBand assets of QLogic.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-03-22 18:07:04 -07:00
Eric W. Biederman
7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Tejun Heo
80f22b4430 IB/qib: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Marciniszyn <infinipath@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:17 -08:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Mike Marciniszyn
bcc9b67a5b IB/qib: Fix QP locate/remove race
remove_qp() can execute concurrently with a qib_lookup_qpn() on
another CPU, which in of itself, is ok, given the RCU locking.

The issue is that remove_qp() NULLs out the qp->next field so that a
qib_lookup_qpn() might fail to find a qp if it occurs after the one
that is being deleted.  This is a momentary issue and subsequent
qib_lookup_qpn() calls would find the qp's since the search restarts
from the bucket head.  At scale, the issue might causes dropped
packets and unnecessary retransmissions.

The fix just deletes the qp->next NULL assignment to prevent the
remove_qp() from hiding qp's from qib_lookup_qpn().

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-14 17:04:18 -08:00
Mike Marciniszyn
d359f35430 IB/qib: Fix for broken sparse warning fix
Commit 1fb9fed6d4 ("IB/qib: Fix QP RCU sparse warning") broke QP
hash list deletion in qp_remove() badly.

This patch restores the former for loop behavior, while still fixing
the sparse warnings.

Cc: <stable@vger.kernel.org>
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:43:09 -08:00