Commit Graph

48594 Commits

Author SHA1 Message Date
Ingo Molnar
d4f7743542 Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi
Pull EFI build fix from Matt Fleming:

  - Fix ESRT build breakage on ia64 reported by Guenter Roeck. (Peter Jones)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-11 16:42:49 +02:00
Joerg Roedel
d290f1e70d iommu: Introduce iommu_request_dm_for_dev()
This function can be called by an IOMMU driver to request
that a device's default domain is direct mapped.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-11 09:01:55 +02:00
Florian Fainelli
8bc84b7926 net: phy: broadcom: define Broadcom pseudo-PHY address in brcmphy.h
Define the pseudo-PHY address (30) which is used by all Broadcom
Ethernet switches in a shared header file.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 23:33:58 -07:00
Florian Fainelli
4f822c625f net: phy: broadcom: include phy.h for brcmphy.h
We utilize inline functions from the PHY library, make sure that we do
include phy.h in brcmphy.h in order for the code including brcmphy.h not
to have to resolve this inclusion dependency.

Fixes: 705314797b ("net: phy: broadcom: move shadow 0x1C register accessors to brcmphy.h")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-10 23:33:58 -07:00
Greg Kroah-Hartman
78a66b00d9 Merge tag 'iio-for-v4.2c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:

Third round of new IIO drivers, cleanups and functionality for the 4.2 cycle.

Given Linus announced a 4.1-rc8 coming up, hopefully time for one more
lot of IIO patches this cycle.  Some of these are actually
improvements / fixes for patches earlier in the cycle.

New device support
* st_accel driver - support devices with 8 bit channels.

Cleanup
* A general cleanup of the iio tools under /tools/ from Hartmut.
  I'm more than a little embarrassed by how bad some of these were! Ah well,
  much more refined and less bug prone now.
  These cover lots of stuff like unhandled error returns, memory leaks as
  well as general refactoring to tidy the code up.
* iio_simple_dummy - fix memory leaks in the init functions, drop some
  pointless error returns from functions that never generate errors and
  make the module parameter explicitly unsigned.
* More buffer handling reworks from Lars-Peter, this time targeting hardware
  buffers (a little used corner that looks likely to get more use in the near
  future). Specifically:
  - Always compute the masklength as inkernel buffer users may need it.
  - Add a means of labeling which buffer modes a given buffer implementation
    supports.
  - In the case of hardware buffers, require strict scan matching rather than
    matching to a superset.  Currently the demux is bypassed by these drivers
    (this may well not change for efficiency reasons) so allowing a superset
    of channels to be selected would otherwise lead to more data than requested
    confusing userspace.

Driver functionality improvements
* mmc35240 - adds a compensation to the raw values as borrowed from Memsic's
  own input driver.
* mma8452
  - event support
  - event debouncing
  - high pass filter configuration
  - triggers
* vf610 - allow conversion mode to be adjusted

Fixlets
* mmc35240
  - Off by one error that by coincidence had no real effect.
  - i2c_device_name should be lowercase.
  - Lack of null terminator at end of attributes array.
  - Avoid computing the fractional part of the magnetic field by moving
    the scaling into userspace where floating point is available to simplify
    the maths.
  - Use a smaller sleep before assuming the measurement is done.  This is
    safe and improves the possible polling rate.
  - Fix sensitivity on z-axis - datasheet disagrees with Memsic's released
    code and the value used in the code seems to be correct.
* stk3310 - make a local variable signed to ensure error handling works.
* twl4030
  - fix calculation of the temperature sense current - bug unlikely
    to have ever been noticed as the difference is small.
  - Fix errors in descriptions.
2015-06-10 20:48:34 -07:00
Zhang Rui
53daf9383f Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal into thermal-soc
2015-06-11 10:55:42 +08:00
Chuck Lever
4a06825839 SUNRPC: Transport fault injection
It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss.  To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.

Fault injection is disabled by default. It is enabled with

  $ echo xxx | sudo tee /sys/kernel/debug/sunrpc/inject_fault/disconnect

where "xxx" is a large positive number of transport method calls
before a disconnect. A value of several thousand is usually a good
number that allows reasonable forward progress while still causing a
lot of connection drops.
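
The countdown described above can be modelled in plain C. This is only a userspace sketch under assumed names (struct fault_injector and both functions are hypothetical and are not taken from the SUNRPC code); it shows the "every N calls" behaviour, with an interval of 0 meaning disabled.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a fault-injection countdown. */
struct fault_injector {
	unsigned int interval;   /* calls between injected disconnects; 0 = disabled */
	unsigned int remaining;  /* calls left until the next injected disconnect */
};

/* Models writing "xxx" to the debugfs file. */
void fault_injector_set(struct fault_injector *fi, unsigned int interval)
{
	fi->interval = interval;
	fi->remaining = interval;
}

/* Call once per transport method call; returns true when the caller
 * should simulate a connection loss. */
bool fault_injector_should_disconnect(struct fault_injector *fi)
{
	if (fi->interval == 0)
		return false;            /* disabled by default */
	if (--fi->remaining > 0)
		return false;            /* not yet time to inject */
	fi->remaining = fi->interval;    /* re-arm for the next round */
	return true;
}
```

With an interval of 3, the third call reports a disconnect and the counter re-arms, so connection drops recur while the calls in between still make forward progress.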

These hooks are disabled when SUNRPC_DEBUG is turned off.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:37:26 -04:00
Anna Schumaker
11598b8ff2 NFS: Remove unused nfs_rw_ops->rw_release() function
This was only ever set to nfs_writeback_release_common(), a function
which is completely empty.  Let's just drop this function pointer and
simplify the code a bit.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:32:40 -04:00
Jeff Layton
d67fa4d85a sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the
xs_swapper_enable/disable functions will likely oops when fed an RDMA
xprt. Turn these functions into rpc_xprt_ops so that that doesn't
occur. For now the RDMA versions are no-ops that just return -EINVAL
on an attempt to swapon.
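
A minimal sketch of the idea, assuming a trimmed-down ops table: every type and function name below is hypothetical and stands in for the real rpc_xprt_ops members. Generic code dispatches through the table, so it never has to assume that a socket sits behind the transport.

```c
#include <errno.h>

struct xprt;

/* Hypothetical stand-in for a per-transport operations table. */
struct xprt_ops {
	int  (*enable_swap)(struct xprt *xprt);
	void (*disable_swap)(struct xprt *xprt);
};

struct xprt {
	const struct xprt_ops *ops;
	int swap_users;
};

/* Socket-style transport: swapping is supported. */
int sock_enable_swap(struct xprt *xprt)
{
	xprt->swap_users++;
	return 0;
}

void sock_disable_swap(struct xprt *xprt)
{
	xprt->swap_users--;
}

/* RDMA-style transport: no-ops that refuse swapon. */
int rdma_enable_swap(struct xprt *xprt)
{
	(void)xprt;
	return -EINVAL;
}

void rdma_disable_swap(struct xprt *xprt)
{
	(void)xprt;
}

const struct xprt_ops sock_ops = { sock_enable_swap, sock_disable_swap };
const struct xprt_ops rdma_ops = { rdma_enable_swap, rdma_disable_swap };

/* Generic entry point: dispatch through the table. */
int xprt_enable_swap(struct xprt *xprt)
{
	return xprt->ops->enable_swap(xprt);
}
```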

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:26 -04:00
Jeff Layton
8e2281330f sunrpc: make xprt->swapper an atomic_t
Split xs_swapper into enable/disable functions and eliminate the
"enable" flag.

Currently, it's racy if you have multiple swapon/swapoff operations
running in parallel over the same xprt. Also fix it so that we only
set it to a memalloc socket on a 0->1 transition and only clear it
on a 1->0 transition.
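
The transition rule can be sketched with C11 atomics. This is a userspace model under assumed names (struct xprt_model and its memalloc flag are hypothetical); it covers only the counting, not whatever serialisation the real code needs around the socket itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: 'memalloc' stands in for the socket's memalloc state. */
struct xprt_model {
	atomic_int swapper;  /* number of active swap users */
	bool memalloc;
};

void swapper_enable(struct xprt_model *x)
{
	/* atomic_fetch_add() returns the old value, so 0 means a 0->1 transition. */
	if (atomic_fetch_add(&x->swapper, 1) == 0)
		x->memalloc = true;
}

void swapper_disable(struct xprt_model *x)
{
	/* An old value of 1 means a 1->0 transition. */
	if (atomic_fetch_sub(&x->swapper, 1) == 1)
		x->memalloc = false;
}
```

Because each caller learns the previous count from its own atomic operation, only one caller sees each transition even when several run in parallel.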

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:18 -04:00
Jeff Layton
3c87ef6efb sunrpc: keep a count of swapfiles associated with the rpc_clnt
Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.

To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of a layering violation anyway.

Fix this by adding a set of activate/deactivate functions that take an
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.

Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.

This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.

Acked-by: Mel Gorman <mgorman@suse.de>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-10 18:26:14 -04:00
Rafael J. Wysocki
b064a8fa77 ACPI / init: Switch over platform to the ACPI mode later
Commit 73f7d1ca32 "ACPI / init: Run acpi_early_init() before
timekeeping_init()" moved the ACPI subsystem initialization,
including the ACPI mode enabling, to an earlier point in the
initialization sequence, to allow the timekeeping subsystem
to use ACPI early.  Unfortunately, that resulted in boot regressions
on some systems and the early ACPI initialization was moved toward
its original position in the kernel initialization code by commit
c4e1acbb35 "ACPI / init: Invoke early ACPI initialization later".

However, that turns out to be insufficient, as boot is still broken
on the Tyan S8812 mainboard.

To fix that issue, split the ACPI early initialization code into
two pieces so the majority of it is still located in acpi_early_init()
and the part switching over the platform into the ACPI mode goes into
a new function, acpi_subsystem_init(), executed at the original early
ACPI initialization spot.

That fixes the Tyan S8812 boot problem, but still allows ACPI
tables to be loaded earlier which is useful to the EFI code in
efi_enter_virtual_mode().

Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
Fixes: 73f7d1ca32 "ACPI / init: Run acpi_early_init() before timekeeping_init()"
Reported-and-tested-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
2015-06-10 23:51:27 +02:00
Daniel Thompson
3037e9ea78 clk: fixed: Add comment to clk_fixed_set_rate
Currently it is not made explicit why clk_fixed_set_rate() can ignore
its arguments and unconditionally return success. Add a comment
to explain this.

We also mark the clk_ops table const since it should never be
modified at runtime.

Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-06-10 14:19:43 -07:00
Hans de Goede
fe27e1dfe9 power: Add devm_power_supply_get_by_phandle() helper function
This commit adds a resource-managed version of the
power_supply_get_by_phandle() function.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-06-10 16:15:54 +02:00
Krzysztof Kozlowski
5c6e3a97e9 power_supply: sysfs: Bring back write to writeable properties
The fix for a NULL pointer dereference, triggered by a uevent arriving
before probe had finished, caused all writeable properties to be marked
as non-writeable. The writeable check ran before the initial increment
of the power supply usage counter and went through a wrapper over
property_is_writeable(). The wrapper returns -ENODEV if the usage
counter is still 0.

The call trace looked like:
  device probe:
    power_supply_register()
      use_cnt = 0;
      device_add()
        create sysfs entries
          power_supply_attr_is_visible()
            power_supply_property_is_writeable()
              if (use_cnt == 0) return -ENODEV;
      use_cnt++;

Replace the wrapper with a direct call to the driver's
property_is_writeable(). This should be safe during device probe because
implementations of this callback just return 0/1 for different
properties and do not access any of the driver's internal data.
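
The ordering problem in the call trace can be reproduced in a small userspace model. All names here (struct psy_model, the property ids, and the helper functions) are hypothetical; the sketch only contrasts a visibility check that goes through the use_cnt-guarded wrapper with one that asks the driver callback directly.

```c
#include <errno.h>

enum { PROP_STATUS, PROP_CHARGE_LIMIT };  /* hypothetical property ids */

struct psy_model {
	int use_cnt;                             /* 0 until registration completes */
	int (*property_is_writeable)(int prop);  /* driver callback, returns 0 or 1 */
};

/* Example driver callback: only the charge limit is writeable. */
int demo_is_writeable(int prop)
{
	return prop == PROP_CHARGE_LIMIT;
}

/* Wrapper that refuses to call into the driver before use_cnt is raised. */
int wrapped_is_writeable(const struct psy_model *psy, int prop)
{
	if (psy->use_cnt <= 0)
		return -ENODEV;
	return psy->property_is_writeable(prop);
}

/* Buggy visibility check: during probe the wrapper returns -ENODEV,
 * so no attribute is ever marked writeable. */
int attr_writeable_via_wrapper(const struct psy_model *psy, int prop)
{
	return wrapped_is_writeable(psy, prop) > 0;
}

/* Fixed visibility check: ask the driver callback directly. */
int attr_writeable_direct(const struct psy_model *psy, int prop)
{
	return psy->property_is_writeable(prop) > 0;
}
```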

Fixes: 8e59c7f234 ("power_supply: Fix NULL pointer dereference during bq27x00_battery probe")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-06-10 16:10:59 +02:00
Jens Axboe
0bb979472a cfq-iosched: fix the setting of IOPS mode on SSDs
A previous commit wanted to make CFQ default to IOPS mode on
non-rotational storage, however it did so when the queue was
initialized and the non-rotational flag is only set later on
in the probe.

Add an elevator hook that gets called off the add_disk() path. At
that point we know that feature probing has finished, and
we can reliably check for the various flags that drivers can
set.

Fixes: 41c0126b ("block: Make CFQ default to IOPS mode on SSDs")
Tested-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-10 08:01:20 -06:00
Herbert Xu
c2719503f5 random: Remove kernel blocking API
This patch removes the kernel blocking API as it has been completely
replaced by the callback API.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-10 19:14:04 +08:00
Herbert Xu
205a525c33 random: Add callback API for random pool readiness
The get_blocking_random_bytes API is broken because the wait can
be arbitrarily long (potentially forever) so there is no safe way
of calling it from within the kernel.

This patch replaces it with a callback API instead.  The callback
is invoked potentially from interrupt context so the user needs
to schedule their own work thread if necessary.

In addition to adding callbacks, they can also be removed as
otherwise this opens up a way for user-space to allocate kernel
memory with no bound (by opening algif_rng descriptors and then
closing them).
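
A rough userspace sketch of such a readiness-callback list follows. The names (struct pool, struct ready_cb, pool_add_ready_cb() and so on) are assumptions made for illustration and are not the kernel's API; locking is omitted. It shows why removal matters: a callback that is unlinked before the pool becomes ready is never invoked and no longer occupies the list.

```c
#include <stdbool.h>
#include <stddef.h>

struct ready_cb {
	struct ready_cb *next;
	void (*func)(struct ready_cb *cb);
};

struct pool {
	bool ready;
	struct ready_cb *head;   /* callbacks waiting for readiness */
};

/* Returns 1 if the pool is already ready (nothing queued), 0 if queued. */
int pool_add_ready_cb(struct pool *p, struct ready_cb *cb)
{
	if (p->ready)
		return 1;
	cb->next = p->head;
	p->head = cb;
	return 0;
}

/* Unlink a callback that is no longer wanted, e.g. on descriptor close. */
void pool_del_ready_cb(struct pool *p, struct ready_cb *cb)
{
	struct ready_cb **pp;

	for (pp = &p->head; *pp; pp = &(*pp)->next) {
		if (*pp == cb) {
			*pp = cb->next;
			cb->next = NULL;
			return;
		}
	}
}

/* Mark the pool ready and run every callback still queued, once. */
void pool_mark_ready(struct pool *p)
{
	struct ready_cb *cb = p->head;

	p->ready = true;
	p->head = NULL;
	while (cb) {
		struct ready_cb *next = cb->next;

		cb->next = NULL;
		cb->func(cb);
		cb = next;
	}
}

/* Example callback: just counts how often it fired. */
int fired;

void count_fired(struct ready_cb *cb)
{
	(void)cb;
	fired++;
}
```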

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-10 19:13:56 +08:00
Christophe Ricard
ed06aeefda nfc: st-nci: Rename st21nfcb to st-nci
The STMicroelectronics NFC NCI chip family is growing with the new
ST21NFCC, which uses the AMS AS39230 RF booster.
The st21nfcb driver is relevant for this solution and
may be for future products as well.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-06-10 12:51:44 +02:00
Nicholas Mc Guire
c569a23d65 time: Allow gcc to fold usecs_to_jiffies(constant)
To allow constant folding, usecs_to_jiffies() conditionally calls
the HZ-dependent _usecs_to_jiffies() helpers or, when gcc cannot
figure out constant folding, __usecs_to_jiffies(), which is the renamed
original usecs_to_jiffies() function.
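
The dispatch pattern can be sketched in userspace with the GCC/Clang builtin __builtin_constant_p(). The function names and the HZ value of 250 are assumptions for this sketch, and the clamping that the kernel applies to very large values is left out.

```c
#define HZ 250UL               /* assumed tick rate for this sketch */
#define USEC_PER_SEC 1000000UL

/* Out-of-line path for values only known at run time. */
unsigned long slow_usecs_to_ticks(unsigned int u)
{
	return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
}

/* Inline helper that the compiler can fold when 'u' is a constant. */
static inline unsigned long const_usecs_to_ticks(unsigned int u)
{
	return (u + (USEC_PER_SEC / HZ) - 1) / (USEC_PER_SEC / HZ);
}

/* Pick the foldable helper for compile-time constants, else the call. */
static inline unsigned long usecs_to_ticks(unsigned int u)
{
	if (__builtin_constant_p(u))
		return const_usecs_to_ticks(u);
	return slow_usecs_to_ticks(u);
}
```

Both paths round up to whole ticks, so a caller gets the same result either way; the only difference is whether the division happens at compile time.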

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-2-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10 11:31:14 +02:00
Nicholas Mc Guire
ae60d6a0e3 time: Refactor usecs_to_jiffies
Refactor the usecs_to_jiffies conditional code part in time.c and
jiffies.h putting it into conditional functions rather than #ifdefs
to improve readability. This is analogous to the msecs_to_jiffies()
cleanup in commit ca42aaf0c8 ("time: Refactor msecs_to_jiffies")

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Joe Perches <joe@perches.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andrew Hunter <ahh@google.com>
Cc: Paul Turner <pjt@google.com>
Cc: Michal Marek <mmarek@suse.cz>
Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-10 11:31:13 +02:00
Guo Zeng
b1999477ed ARM: prima2: move to use REGMAP APIs for rtciobrg
All devices behind rtciobrg need a special access method. Currently they
use a platform-specific API.
This patch moves to REGMAP so that clients can use regmap APIs to read/write.
For the moment, the old APIs are still kept; once all clients have moved to
regmap, the old APIs will be dropped.

This patch also does minor cleanup of comments and author statements.

Signed-off-by: Guo Zeng <Guo.Zeng@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
2015-06-10 15:10:26 +08:00
Steve Twiss
5179f0ce2f Input: add OnKey driver for DA9063 MFD part
This adds OnKey driver support for DA9063.

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2015-06-09 11:23:24 -07:00
David Woodhouse
bd00c606a6 iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register
The existing hardware implementations with PASID support advertised in
bit 28? Forget them. They do not exist. Bit 28 means nothing. When we
have something that works, it'll use bit 40. Do not attempt to infer
anything meaningful from bit 28.
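
As a small illustration, a capability test of this kind reduces to a shift and a mask on the 64-bit register value. The macro and function names below are hypothetical.

```c
#include <stdint.h>

#define ECAP_PASID_BIT 40   /* bit position named in this commit message */

/* Returns 1 if the extended capability value advertises PASID support. */
int ecap_has_pasid(uint64_t ecap)
{
	return (int)((ecap >> ECAP_PASID_BIT) & 1);
}
```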

This will be reflected in an updated VT-d spec in the extremely near
future.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-06-09 15:06:55 +01:00
Joerg Roedel
6827ca8369 iommu: Add function to query the default domain of a group
This will be used to handle unity mappings in the iommu
drivers.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-09 08:55:24 +02:00
Joerg Roedel
a1015c2b99 iommu: Introduce direct mapped region handling
Add two new functions to the IOMMU-API to allow the IOMMU
drivers to export the requirements for direct mapped regions
per device.
This is useful for exporting the information in Intel VT-d's
RMRR entries or AMD-Vi's unity mappings.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-09 08:55:23 +02:00
Joerg Roedel
2c1296d92a iommu: Add iommu_get_domain_for_dev function
This function can be used to request the current domain a
device is attached to.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-06-09 08:55:23 +02:00
David S. Miller
941742f497 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
2015-06-08 20:06:56 -07:00
Greg Kroah-Hartman
19915e6234 Merge 4.1-rc7 into usb-next
This resolves a merge issue in musb_core.c and we want the fixes that
were in Linus's tree in this branch as well for testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:57:51 -07:00
Greg Kroah-Hartman
00fda1682e Merge 4.1-rc7 into tty-next
This fixes up a merge issue with the amba-pl011.c driver, and we want
the fixes in this branch as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:49:28 -07:00
Greg Kroah-Hartman
6394d6d01b Merge 4.1-rc7 into staging-testing
We want the staging tree fixes in here too to help with testing and
merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:34:44 -07:00
Greg Kroah-Hartman
987aec39a7 Merge 4.1-rc7 into driver-core-next
We want the fixes in this branch as well for testing and merge
resolution.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:19:40 -07:00
Borislav Petkov
d711b8b30c hrtimers: Make sure hrtimer_resolution is unsigned int
... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
this one:

net/sched/sch_api.c: In function ‘psched_show’:
net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
      (u32)NSEC_PER_SEC / hrtimer_resolution);

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-06-08 15:46:06 +02:00
Bjorn Helgaas
01d72a9518 PCI: Remove unused pci_dma_burst_advice()
pci_dma_burst_advice() was added by e24c2d963a ("[PATCH] PCI: DMA
bursting advice") but apparently never used.  Remove it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michal Simek <monstr@monstr.eu>	# microblaze
CC: David S. Miller <davem@davemloft.net>
2015-06-08 07:56:43 -05:00
Rafał Miłecki
a8077d6573 bcma: make calls to PCI hostmode functions config-safe
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-06-08 14:33:26 +03:00
Peter Jones
3846c15820 efi: Work around ia64 build problem with ESRT driver
So, I'm told this problem exists in the world:

 > Subject: Build error in -next due to 'efi: Add esrt support'
 >
 > Building ia64:defconfig ... failed
 > --------------
 > Error log:
 >
 > drivers/firmware/efi/esrt.c:28:31: fatal error: asm/early_ioremap.h: No such file or directory
 >

I'm not really sure how it's okay that we have things in asm-generic on
some platforms but not others - is having it the same everywhere not the
whole point of asm-generic?

That said, ia64 doesn't have early_ioremap.h .  So instead, since it's
difficult to imagine new IA64 machines with UEFI 2.5, just don't build
this code there.

To me this looks like a workaround - doing something like:

generic-y += early_ioremap.h

in arch/ia64/include/asm/Kbuild would appear to be more correct, but
ia64 has its own early_memremap() decl in arch/ia64/include/asm/io.h ,
and it's a macro.  So adding the above /and/ requiring that asm/io.h be
included /after/ asm/early_ioremap.h in all cases would fix it, but
that's pretty ugly as well.  Since I'm not going to spend the rest of my
life rectifying ia64 headers vs "generic" headers that aren't generic,
it's much simpler to just not build there.

Note that I've only actually tried to build this patch on x86_64, but
esrt.o still gets built there, and that would seem to demonstrate that
the conditional building is working correctly at all the places the code
built before.  I no longer have any ia64 machines handy to test that the
exclusion actually works there.

Signed-off-by: Peter Jones <pjones@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
(Compile-)Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-06-08 10:51:31 +01:00
Aleksa Sarai
cb4a316752 cgroup: use bitmask to filter for_each_subsys
Add a new macro for_each_subsys_which that allows all enabled cgroup
subsystems to be filtered by a bitmask, such that mask & (1 << ssid)
determines if the subsystem is to be processed in the loop body (where
ssid is the unique id of the subsystem).
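
A simplified userspace sketch of a mask-filtered iteration macro is shown here. The macro name, SUBSYS_COUNT and collect_ssids() are assumptions for illustration; the kernel macro is not reproduced.

```c
#define SUBSYS_COUNT 4   /* assumed number of subsystems for this sketch */

/* Run the loop body only for ids whose bit is set in ss_mask. */
#define for_each_ssid_in_mask(ssid, ss_mask)                     \
	for ((ssid) = 0; (ssid) < SUBSYS_COUNT; (ssid)++)        \
		if (!((ss_mask) & (1UL << (ssid))))              \
			continue;                                \
		else

/* Example user: record the ids visited for a given mask. */
int collect_ssids(unsigned long ss_mask, int *out)
{
	int ssid, n = 0;

	for_each_ssid_in_mask(ssid, ss_mask)
		out[n++] = ssid;
	return n;
}
```

Ending the macro in a dangling else lets the caller attach an ordinary statement or block as the loop body.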

Also replace the need_forkexit_callback with two separate bitmasks for
each callback to make (ss->{fork,exit}) checks unnecessary.

tj: add a short comment for "if (!CGROUP_SUBSYS_COUNT)".

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2015-06-08 18:17:32 +09:00
Majd Dibbiny
7cf7fa529d net/mlx5_core: Fix static checker warnings around system guid query flow
Fix static checker warnings in the flow of system guid query.

Fixes: 707c4602cd ('net/mlx5_core: Add new query HCA vport commands')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 20:11:17 -07:00
Kan Liang
f38b0dbb49 perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES
After enlarging the PEBS interrupt threshold, there may be some mixed up
PEBS samples which are discarded by the kernel.

This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with
the number of possible discarded records when it is impossible to demux
the samples.

It makes sure the user is not left in the dark about such discards.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:09:02 +02:00
Yan, Zheng
21509084f9 perf/x86/intel: Handle multiple records in the PEBS buffer
When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible
scenarios to demux.

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
  - the corresponding bit in GLOBAL_STATUS gets set
  - we start arming the hardware assist
  < some unspecified amount of time later -- this could cover multiple
    events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
  - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
  - if we auto-reload we (re)set CTRn
  - we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

  A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though it's not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close'. The
'close' can be several cycles. In some cases even the complete assist,
if the event is something that doesn't need retirement.

For instance, consider this chain of events:

  A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
generate but a single record, but again with both counters listed in the
status field.

This time the record pertains to both events.

Note that these two cases are different but indistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.
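
The discard rule can be written down as a small decision function. This is a sketch under assumed names (pebs_record_owner() and its parameters are hypothetical), not the driver's code: mask the status with the set of PEBS-enabled counters, then accept the record only if exactly one bit remains.

```c
#include <stdint.h>

/* Returns the index of the counter that owns the record, or -1 when the
 * record must be discarded because zero or several PEBS bits are set. */
int pebs_record_owner(uint64_t status, uint64_t pebs_enabled)
{
	uint64_t pebs_status = status & pebs_enabled;  /* ignore !PEBS counters */
	int bit;

	/* x & (x - 1) clears the lowest set bit; nonzero means more than one. */
	if (pebs_status == 0 || (pebs_status & (pebs_status - 1)) != 0)
		return -1;

	for (bit = 0; !(pebs_status & 1); bit++)
		pebs_status >>= 1;
	return bit;
}
```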

The assumption/hope is that such discards will be rare.

Here are some possible ways you may get a high discard rate:

  - when you count the same thing multiple times. But it is not a useful
    configuration.
  - you can be unfortunate if you measure with a userspace only PEBS
    event along with either a kernel or unrestricted PEBS event. Imagine
    the event triggering and setting the overflow flag right before
    entering the kernel. Then all kernel side events will end up with
    multiple bits set.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
[ Changelog improvements. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:45 +02:00
Frederic Weisbecker
9a92e3dc6a preempt: Reorganize the notrace definitions a bit
preempt.h has two separate "#ifdef CONFIG_PREEMPT" sections: one to
define preempt_enable() and another to define preempt_enable_notrace().

Let's gather both.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:57:43 +02:00
Frederic Weisbecker
4eaca0a887 preempt: Use preempt_schedule_context() as the official tracing preemption point
preempt_schedule_context() is a tracing safe preemption point but it's
only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing
recursion issues since commit:

  b30f0e3ffe ("sched/preempt: Optimize preemption operations on __schedule() callers")

introduced function based preempt_count_*() ops.

Let's make it available on all configs and give it a more appropriate
name for its new position.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 15:57:42 +02:00
Alexei Starovoitov
d691f9e8d4 bpf: allow programs to write to certain skb fields
Allow programs to read/write the skb->mark and tc_index fields and
((struct qdisc_skb_cb *)cb)->data.

mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call() which can
be seen in sockex3_kern.c example.

All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
mark, tc_index are writeable from tc_cls_act only.
cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
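
The access rules listed above can be summarised as a lookup. The enums and function names below are hypothetical and only model the policy as stated in this message; FIELD_LEN stands in for any field not named as writeable.

```c
#include <stdbool.h>

enum prog_type { PROG_SOCKET_FILTER, PROG_TC_CLS_ACT };
enum skb_field { FIELD_LEN, FIELD_MARK, FIELD_TC_INDEX, FIELD_CB };

/* Every field is readable by both program types. */
bool field_readable(enum prog_type type, enum skb_field field)
{
	(void)type;
	(void)field;
	return true;
}

/* cb[] is writeable by both; mark and tc_index only by tc_cls_act. */
bool field_writeable(enum prog_type type, enum skb_field field)
{
	switch (field) {
	case FIELD_CB:
		return true;
	case FIELD_MARK:
	case FIELD_TC_INDEX:
		return type == PROG_TC_CLS_ACT;
	default:
		return false;   /* everything else stays read-only */
	}
}
```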

Add verifier tests and improve sample code.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 02:01:33 -07:00
Linus Torvalds
37ef1647b7 Merge tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are two fixes for the driver core that resolve some reported
  issues.

  One is a regression from 4.0, the other fixes a reported oops that
  has been there since 3.19.

  Both have been in linux-next for a while with no problems"

* tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base: cacheinfo: handle absence of caches
  drivers: of/base: move of_init to driver_init
2015-06-06 22:37:45 -07:00
Dinh Nguyen
2e61dfb360 clk: of: helper for filling parent clock array and return num of parents
Sprinkled all through the platform clock drivers is code like this to
fill the clock parent array:

for (i = 0; i < num_parents; ++i)
	parent_names[i] = of_clk_get_parent_name(np, i);

The of_clk_parent_fill() will do the same as the code above, and while
at it, return the number of parents as well since the logic of the
function is to walk the clock node to look for the parent.
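
A userspace sketch of such a fill helper, assuming a mock node type: struct node, node_parent_name() and parent_fill() are hypothetical stand-ins for the device-tree node and the kernel helpers. The loop stops at the first missing parent or when the array is full, and the count falls out of the walk.

```c
#include <stddef.h>

/* Mock of a clock node that knows the names of its parents. */
struct node {
	const char *const *clock_names;
	unsigned int count;
};

/* Returns the name of parent 'i', or NULL when there is no such parent. */
const char *node_parent_name(const struct node *np, unsigned int i)
{
	return i < np->count ? np->clock_names[i] : NULL;
}

/* Fill 'parents' with up to 'size' names and return how many were found. */
int parent_fill(const struct node *np, const char **parents, unsigned int size)
{
	unsigned int i = 0;

	while (i < size && (parents[i] = node_parent_name(np, i)) != NULL)
		i++;
	return (int)i;
}
```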

Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
[sboyd@codeaurora.org: Fixed kernel-doc]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-06-05 18:10:33 -07:00
Thomas Gleixner
9f61f62544 Merge branch 'linus' into irq/core
Get the urgent fixes from upstream to avoid conflicts.
2015-06-05 22:25:01 +02:00
Linus Torvalds
a0e9c6efa5 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The biggest chunk of the changes are two regression fixes: a HT
  workaround fix and an event-group scheduling fix.  It's been verified
  with 5 days of fuzzer testing.

  Other fixes:

   - eBPF fix
   - a BIOS breakage detection fix
   - PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix a refactoring bug
  perf/x86: Tweak broken BIOS rules during check_hw_exists()
  perf/x86/intel/pt: Untangle pt_buffer_reset_markers()
  perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode
  perf/x86: Improve HT workaround GP counter constraint
  perf/x86: Fix event/group validation
  perf: Fix race in BPF program unregister
2015-06-05 10:00:53 -07:00
Keith Busch
a5768aa887 NVMe: Automatic namespace rescan
Namespaces may be dynamically allocated and deleted or attached and
detached. This has the driver rescan the device for namespace changes
after each device reset or namespace change asynchronous event.

There could potentially be many detached namespaces that we don't want
polluting /dev/ with unusable block handles, so this will delete disks
if the namespace is not active as indicated by the response from identify
namespace. This also skips adding the disk if no capacity is provisioned
to the namespace in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05 10:58:34 -06:00
Jens Axboe
b281ebb817 Merge branch 'for-4.2/core' into for-4.2/drivers
2015-06-05 10:58:28 -06:00
Jens Axboe
3f21c265cd block: add blk_set_queue_dying() to blkdev.h
We export this function and NVMe wants to use it, but for some reason
it was never added to the block header. Do that.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05 10:57:37 -06:00