Commit Graph

57000 Commits

Author SHA1 Message Date
Mike Christie
5d9fb5cc1b [SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new state
This has scsi_internal_device_unblock/scsi_target_unblock take
the new state to set the devices as an argument instead of
always setting to running. The patch also converts users of these
functions.

This allows the FC and iSCSI class to transition devices from blocked
to transport-offline, so that when fast_io_fail/replacement_timeout
has fired we do not set the devices back to running. Instead, we
set them to SDEV_TRANSPORT_OFFLINE.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:22 +01:00
Mike Christie
1b8d262061 [SCSI] add new SDEV_TRANSPORT_OFFLINE state
This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will
be used by transport classes to offline devices for cases like
when the fast_io_fail/recovery_tmo fires. In those cases we
want all IO to fail, and we have not yet escalated to dev_loss_tmo
behavior where we are removing the devices.

Currently to handle this state, transport classes are setting
the scsi_device's state to running, setting their internal
session/port structs state to something that indicates failed,
and then failing IO from some transport check in the queuecommand.

The reason for the new value is so that users can distinguish
between a device failure that is a result of a transport problem
vs the wide range of errors that devices get offlined for
when a scsi command times out and we offline the devices there.
It also fixes the confusion as to why the transport class is
failing IO, but has set the device state from blocked to running.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:21 +01:00
Paul Mundt
9ff561fdf7 Merge branch 'common/pinctrl' into sh-latest 2012-07-20 16:42:59 +09:00
Vasu Dev
4e5fae7adb [SCSI] libfc: update fcp and exch stats
Updates newly added stats from fc_get_host_stats,
added new function fc_exch_update_stats to
update exches related stats from fc_exch.c
by going thru internal ema_list elements.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Acked-by : Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:31:48 +01:00
Vasu Dev
0f02a66528 [SCSI] libfc: adds FCP failures stats
Adds stats to track FCP pkt and frame alloc
failure.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Acked-by : Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:31:48 +01:00
Vasu Dev
1bd49b4820 [SCSI] libfc, fcoe, bnx2fc: cleanup fcoe_dev_stats
The libfc is used by fcoe but fcoe agnostic,
and therefore should not have any fcoe references.

So renaming fcoe_dev_stats from libfc as its for fc_stats.
After that libfc is fcoe string free except some strings for
Open-FCoE.org.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Acked-by : Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Acked-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:31:47 +01:00
Vasu Dev
e58abb0ca4 [SCSI] fc: add some more FC specific stats to fc_host
The libfc provides more flexibility and with that
we can monitor some more FC specific stats for
FC exches or FCP error cases, this patch add
such new FC stats.

The patch adds *only* FC specific new stats to
existing fc_host attribute container.

Added stats names are self explanatory as
existing FC stats already has, however anyway
still added commentary along their definition
to describe them.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Acked-by : Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:31:47 +01:00
Guennadi Liakhovetski
1ff8df4f53 dma: sh: provide a migration path for slave drivers to stop using .private
This patch extends the sh dmaengine driver to support the preferred channel
selection and configuration method, instead of using the "private" field
from struct dma_chan. We add a standard filter function to be used by
slave drivers instead of implementing their own ones, and add support for
the DMA_SLAVE_CONFIG control operation, which must accompany the new
channel selection method. We still support the legacy .private channel
allocation method to cater for a smooth driver migration.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
[applied a trvial checkpath fix]
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-07-20 11:28:20 +05:30
Guennadi Liakhovetski
c2cdb7e4d1 dma: sh: use an integer slave ID to improve API compatibility
Initially struct shdma_slave has been introduced with the only member - an
unsigned slave ID - to describe common properties of DMA slaves in an
extensible way. However, experience shows, that a slave ID is indeed the
only parameter, needed to identify DMA slaves. This is also, what is used
by the core dmaengine API in struct dma_slave_config. We switch to using
the slave_id directly, instead of passing a pointer to struct shdma_slave
to improve compatibility with the core. We also make the slave_id signed
for easier error checking.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-07-20 11:23:45 +05:30
Guennadi Liakhovetski
ecf90fbbdc dmaengine: shdma: prepare to stop using struct dma_chan::private
Using struct dma_chan::private is deprecated. To update the shdma driver to
stop using it we first have to eliminate internal runtime uses of it. After
that we will also be able to stop using it for channel configuration.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
2012-07-20 11:23:44 +05:30
Dave Airlie
e6b0b6a82f Merge tag 'v3.5-rc7' into drm-next
Merge Linus tree into drm to fixup conflicts in radeon code for further
testing before upstream merge.

Signed-off-by: Dave Airlie <airlied@redhat.com>

Conflicts:
	drivers/gpu/drm/i915/i915_dma.c
	drivers/gpu/drm/i915/intel_display.c
	drivers/gpu/drm/radeon/radeon_gart.c
2012-07-20 00:53:28 -04:00
Daniel Vetter
83bc5fd29a drm/sis: fixup sis_mm ioctl structs
Userspace uses long in quite a few places more than the kernel. Which
gives me neat proof that I'm the only guy on this side of the galaxy
who ever tried to run glxgears on a 64bit machine with sis graphics on
linux.

Note that the longs in drm_sis_mem_t aren't aligned properly, so this
won't even work with 32bit userspace on 64bit kernel as-is. Hence the
patch can't break that, either.

Nope, I'm not nuts enough to write the 32bit ioctl compat layer for
this and test it with some wine app. Even though hunting the ebay
dungeons for a sis card actually supported by the mesa drivers casts
some doubts on this ...

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:51:58 -04:00
Daniel Vetter
26587e6994 drm: kill i915/i830 ids from drm_pciids.h
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:51:12 -04:00
Daniel Vetter
a344a7e7c2 drm: kill dma queue support
Absolutely unused. All the values are only ever initialized and
then used at most in some debug printout functions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:50:55 -04:00
Daniel Vetter
b0071efe82 drm: kill reclaim_buffers callback
All leftover users either haven't set DRIVER_HAVE_DMA, in which
case this will never be called, or use the drm_core implementation.

Call that directly in the only callsite.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:50:28 -04:00
Daniel Vetter
923d1fe86b drm: kill reclaim_buffers_locked
i810 was the last user of this code, with that gone, kill it.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:49:58 -04:00
Daniel Vetter
3ae6b64400 drm: kill reclaim_buffers_idlelocked functions
The only two users are now folded into the drivers preclose functions,
so this is unused.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:49:27 -04:00
Dave Airlie
f42977841f drm/pci: add support for getting the supported link bw.
This should work for PCIE3.0 as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:29:25 -04:00
Dave Airlie
cdcac9cd77 pci_regs: define LNKSTA2 pcie cap + bits.
We need these for detecting the max link speed for drm drivers.

Acked-by: Bjorn Helgaas <bhelgass@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 22:28:44 -04:00
Laurent Pinchart
e811f5ae19 drm: Make the .mode_fixup() operations mode argument a const pointer
The passed mode must not be modified by the operation, make it const.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 21:52:38 -04:00
Daniel Vetter
59fd415ded drm: remove the list_head from drm_mode_set
It's unused. At it confused me quite a bit until I've discovered that.

Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-19 21:27:51 -04:00
Xiao Guangrong
d566104853 KVM: remove the unused parameter of gfn_to_pfn_memslot
The parameter, 'kvm', is not used in gfn_to_pfn_memslot, we can happily remove
it

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19 21:25:24 -03:00
Xiao Guangrong
f340a51b7e KVM: remove is_error_hpa
Remove them since they are not used anymore

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19 21:23:49 -03:00
Xiao Guangrong
ca0565f573 KVM: make bad_pfn static to kvm_main.c
bad_pfn is not used out of kvm_main.c, so mark it static, also move it near
hwpoison_pfn and fault_pfn

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19 21:17:10 -03:00
Xiao Guangrong
903816fa4d KVM: using get_fault_pfn to get the fault pfn
Using get_fault_pfn to cleanup the code

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-19 21:15:25 -03:00
Linus Torvalds
85efc72a02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull last minute Ceph fixes from Sage Weil:
 "The important one fixes a bug in the socket failure handling behavior
  that was turned up in some recent failure injection testing.  The
  other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry
2012-07-19 16:11:28 -07:00
Rob Herring
137f8a7213 clk: fix compile for OF && !COMMON_CLK
With commit 766e6a4ec6 (clk: add DT clock binding support),
compiling with OF && !COMMON_CLK is broken.

Reported-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com>
Reported-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2012-07-19 14:07:56 -07:00
Shawn Guo
9f1612d351 clk: fix clk_get on of_clk_get_by_name return check
The commit 766e6a4 (clk: add DT clock binding support) plugs device
tree clk lookup of_clk_get_by_name into clk_get, and fall on non-DT
lookup clk_get_sys if DT lookup fails.

The return check on of_clk_get_by_name takes (clk != NULL) as a
successful DT lookup.  But it's not the case.  For any system that
does not define clk lookup in device tree, ERR_PTR(-ENOENT) will be
returned, and consequently, all the client drivers calling clk_get
in their probe functions will fail to probe with error code -ENOENT
returned.

Fix the issue by checking of_clk_get_by_name return with !IS_ERR(clk),
and update of_clk_get and of_clk_get_by_name for !CONFIG_OF build
correspondingly.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Tested-by: Marek Vasut <marex@denx.de>
Tested-by: Lauri Hintsala <lauri.hintsala@bluegiga.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2012-07-19 14:07:45 -07:00
Olaf Hering
00e37bdb01 xen PVonHVM: move shared_info to MMIO before kexec
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into
MMIO space before the kexec boot.

The pfn containing the shared_info is located somewhere in RAM. This
will cause trouble if the current kernel is doing a kexec boot into a
new kernel. The new kernel (and its startup code) can not know where the
pfn is, so it can not reserve the page. The hypervisor will continue to
update the pfn, and as a result memory corruption occours in the new
kernel.

One way to work around this issue is to allocate a page in the
xen-platform pci device's BAR memory range. But pci init is done very
late and the shared_info page is already in use very early to read the
pvclock. So moving the pfn from RAM to MMIO is racy because some code
paths on other vcpus could access the pfn during the small   window when
the old pfn is moved to the new pfn. There is even a  small window were
the old pfn is not backed by a mfn, and during that time all reads
return -1.

Because it is not known upfront where the MMIO region is located it can
not be used right from the start in xen_hvm_init_shared_info.

To minimise trouble the move of the pfn is done shortly before kexec.
This does not eliminate the race because all vcpus are still online when
the syscore_ops will be called. But hopefully there is no work pending
at this point in time. Also the syscore_op is run last which reduces the
risk further.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:05 -04:00
Olaf Hering
254d1a3f02 xen/pv-on-hvm kexec: shutdown watches from old kernel
Add xs_reset_watches function to shutdown watches from old kernel after
kexec boot.  The old kernel does not unregister all watches in the
shutdown path.  They are still active, the double registration can not
be detected by the new kernel.  When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL).  An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.

With this change the xenstored is instructed to wipe all active watches
for the guest.  However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:52:02 -04:00
Liu, Jinsong
f65c9bb3fb xen/pcpu: Xen physical cpus online/offline sys interface
This patch provide Xen physical cpus online/offline sys interface.
User can use it for their own purpose, like power saving:
by offlining some cpus when light workload it save power greatly.

Its basic workflow is, user online/offline cpu via sys interface,
then hypercall xen to implement, after done xen inject virq back to dom0,
and then dom0 sync cpu status.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:39 -04:00
Liu, Jinsong
cef12ee52b xen/mce: Add mcelog support for Xen platform
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.

This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.

By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.

To test follow directions outlined in Documentation/acpi/apei/einj.txt

Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:36 -04:00
David S. Miller
abaa72d7fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19 11:17:30 -07:00
Yuchung Cheng
67da22d23f net-tcp: Fast Open client - cookie-less mode
In trusted networks, e.g., intranet, data-center, the client does not
need to use Fast Open cookie to mitigate DoS attacks. In cookie-less
mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless
of cookie availability.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
aab4874355 net-tcp: Fast Open client - detecting SYN-data drops
On paths with firewalls dropping SYN with data or experimental TCP options,
Fast Open connections will have experience SYN timeout and bad performance.
The solution is to track such incidents in the cookie cache and disables
Fast Open temporarily.

Since only the original SYN includes data and/or Fast Open option, the
SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
such drops. If a path has recurring Fast Open SYN drops, Fast Open is
disabled for 2^(recurring_losses) minutes starting from four minutes up to
roughly one and half day. sendmsg with MSG_FASTOPEN flag will succeed but
it behaves as connect() then write().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
cf60af03ca net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.

For blocking socket, sendmsg() blocks until all the data are buffered
locally and the handshake is completed like connect() call. It
returns similar errno like connect() if the TCP handshake fails.

For non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if cookie is available. If cookie
is not available, it transmits a data-less SYN packet with Fast Open
cookie request option and returns -EINPROGRESS like connect().

Using MSG_FASTOPEN on connecting or connected socket will result in
simlar errno like repeating connect() calls. Therefore the application
should only use this flag on new sockets.

The buffer size of sendmsg() is independent of the MSS of the connection.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
783237e8da net-tcp: Fast Open client - sending SYN-data
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
  the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng
1fe4c481ba net-tcp: Fast Open client - cookie cache
With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
The basic ones are MSS and the cookies. Later patch will cache more to
handle unfriendly middleboxes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
Yuchung Cheng
2100c8d2d9 net-tcp: Fast Open base
This patch impelements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an
   option number assigned by IANA yet, it shares the experiment option
   code 254 by implementing draft-ietf-tcpm-experimental-options
   with a 16 bits magic number 0xF989. This enables global experiments
   without clashing the scarce(2) experimental options available for TCP.

   When the draft status becomes standard (maybe), the client should
   switch to the new option number assigned while the server supports
   both numbers for transistion.

2. The new sysctl tcp_fastopen

3. A place holder init function

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:55:36 -07:00
David S. Miller
d8f1641b58 net: Fix warnings in dst_ops.h
include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:43:03 -07:00
Eric Dumazet
be9f4a44e7 ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:35:30 -07:00
Julian Anastasov
aee06da672 ipv4: use seqlock for nh_exceptions
Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:30:14 -07:00
Steven Rostedt
08f6fba503 ftrace/x86: Add separate function to save regs
Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.

If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.

x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.

It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.

Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org

Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:20:03 -04:00
Steven Rostedt
a1e2e31d17 ftrace: Return pt_regs to function trace callback
Return as the 4th paramater to the function tracer callback the pt_regs.

Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.

If an arch can pass full regs, then it should define:
 ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1

Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:18:49 -04:00
Steven Rostedt
ccf3672d53 ftrace: Consolidate arch dependent functions with 'list' function
As the function tracer starts to get more features, the support for
theses features will spread out throughout the different architectures
over time. These features boil down to what each arch does in the
mcount trampoline (the ftrace_caller).

Currently there's two features that are not the same throughout the
archs.

 1) Support to stop function tracing before the callback
 2) passing of the ftrace ops

Both of these require placing an indirect function to support the
features if the mcount trampoline does not.

On a side note, for all architectures, when more than one callback
is registered to the function tracer, an intermediate 'list' function
is called by the mcount trampoline to iterate through the callbacks
that are registered.

Instead of making a separate function for each of these features,
and requiring several indirect calls, just use the single 'list' function
as the intermediate, to handle all cases. If an arch does not support
the 'stop function tracing' or the passing of ftrace ops, just force
it to use the list function that will handle the features required.

This makes the code cleaner and simpler and removes a lot of
 #ifdefs in the code.

Link: http://lkml.kernel.org/r/20120612225424.495625483@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:18:22 -04:00
Steven Rostedt
2f5f6ad939 ftrace: Pass ftrace_ops as third parameter to function trace callback
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.

Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:17:35 -04:00
Amir Vadai
d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
122733a189 net/rps: Protect cpu_rmap.h from double inclusion
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
af22d9de45 net/mlx4: Move MAC_MASK to a common place
Define this macro is one common place instead of duplicating it over the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Theodore Ts'o
b4237003cf random: final removal of IRQF_SAMPLE_RANDOM
The IRQF_SAMPLE_RANDOM flag is finally gone from the kernel tree, only
three years late.  :-)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-19 10:40:42 -04:00