Commit Graph

36256 Commits

Author SHA1 Message Date
Rafał Miłecki
e570bd0472 ssb: add struct for serial flash
This data allow writing for example MTD driver.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-17 14:38:54 -04:00
Mark Brown
5464389755 Merge remote-tracking branch 'asoc/topic/wm8994' into asoc-next 2013-06-17 17:20:32 +01:00
Mark Brown
bfe617d33d Merge remote-tracking branch 'asoc/topic/ssm2518' into asoc-next 2013-06-17 17:20:29 +01:00
Mark Brown
9912b30f95 Merge remote-tracking branch 'asoc/topic/arizona' into asoc-next 2013-06-17 17:20:14 +01:00
Linus Walleij
5ca3353bce pinctrl: establish pull-up/pull-down terminology
It is counter-intuitive to have "0" mean disable in a boolean
manner for electronic properties of pins such as pull-up and
pull-down. Therefore, define that a pull-up/pull-down argument
of 0 to such a generic option means that the pin is
short-circuited to VDD or GROUND. Pull disablement shall be
done using PIN_CONFIG_BIAS_DISABLE.

Cc: James Hogan <james.hogan@imgtec.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by Heiko Stuebner <heiko@sntech.de>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17 18:18:32 +02:00
Christian Ruppert
56a59911e5 Fix comment on pinctrl_gpio_range.pin_base
The comment introduced with the recently added pinctrl_gpio_range.pins
element was wrong. This corrects it.
Thanks to Patrice Chotard for pointing this out.

Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17 18:18:29 +02:00
Linus Walleij
ff73ceed0b pinctrl: move the pm state stubs
The stubs for the !PINCTRL case were placed in the wrong
part of the file, causing breakage in linux-next when compiling
SH without pinctrl. Fix it up by moving the stubs to the right
place.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17 18:18:28 +02:00
Christian Ruppert
c8587eeef8 pinctrl: add pin list based GPIO ranges
Traditionally, GPIO ranges are based on consecutive ranges of both GPIO
and pin numbers. This patch allows for GPIO ranges with arbitrary lists
of pin numbers.

Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17 18:18:28 +02:00
Linus Walleij
b263e9b887 pinctrl: get rid of all platform data for coh901
This deletes the dependency on any platform data for
the COH901 pin controller. There is only one user in the
kernel, and if we at some point want to support more
variants, they shall provide their variant info through
the device tree.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17 13:54:38 +02:00
Ulrich Hecht
f303b364b4 serial: sh-sci: HSCIF support
Adds support for "High Speed Serial Communications Interface with FIFO",
essentially a SCIF with 128-byte FIFOs and more accurate baud rate
generator.

Signed-off-by: Ulrich Hecht <ulrich.hecht@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-06-17 18:09:53 +09:00
Tejun Heo
a4244454df percpu-refcount: use RCU-sched insted of normal RCU
percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu().  6a24474da8 ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by converting the
preepmtion operations with rcu_read_[un]lock() citing that there isn't
any advantage in using sched-RCU over using the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().

In a contrived microbench which repeats the followings,

 - percpu_ref_get()
 - copy 32 bytes of data into percpu buffer
 - percpu_put_get()
 - copy 32 bytes of data into percpu buffer

rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.

As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications.  Convert to RCU-sched.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-06-16 16:12:26 -07:00
Linus Walleij
14005ee270 drivers: pinctrl sleep and idle states in the core
If a device have sleep and idle states in addition to the
default state, look up these in the core and stash them in
the pinctrl state container.

Add accessor functions for pinctrl consumers to put the pins
into "default", "sleep" and "idle" states passing nothing but
the struct device * affected.

Solution suggested by Kevin Hilman, Mark Brown and Dmitry
Torokhov in response to a patch series from Hebbar
Gururaja.

Cc: Hebbar Gururaja <gururaja.hebbar@ti.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-16 11:56:52 +02:00
Heiko Stübner
7970cb770d pinctrl: add pinconf-generic define for a pin-default pull
There exist controllers that don't support to set the pull to up or down
separately but instead automatically set the pull direction based on
embedded knowledge inside the controller, for example depending on the
selected mux function of the pin.

Therefore this patch adds another config option to use this default
pull-state for a pin where it is not possible to know or decide if the
pin will be pulled up or down.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-16 11:56:51 +02:00
James Hogan
a2df4269ca pinconf-generic: add BIAS_BUS_HOLD pinconf
Add a new PIN_CONFIG_BIAS_BUS_HOLD pin configuration for a bus holder
pin mode (also known as bus keeper, or repeater). This is a weak latch
which drives the last value on a tristate bus. Another device on the bus
can drive the bus high or low before going tristate to change the value
driven by the pin.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-16 11:56:50 +02:00
Haojian Zhuang
045779942c clk: gate: add CLK_GATE_HIWORD_MASK
In Rockchip Cortex-A9 based chips, they don't use paradigm of
reading-changing-writing the register contents.  Instead they
use a hiword mask to indicate the changed bits.

When b1 should be set as gate, it also needs to indicate the change
by setting hiword mask (b1 << 16).

The patch adds gate flag for this usage.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-15 20:23:53 -07:00
Haojian Zhuang
d57dfe7508 clk: divider: add CLK_DIVIDER_HIWORD_MASK flag
In both Hisilicon & Rockchip Cortex-A9 based chips, they don't use the
paradigm of reading-changing-writing the register contents.
Instead they use a hiword mask to indicate the changed bits.

When b01 should be set as setting divider, it also needs to indicate
the change by setting hiword mask (b11 << 16).

The patch adds divider flag for this usage.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-15 20:23:49 -07:00
Haojian Zhuang
ba492e9007 clk: mux: add CLK_MUX_HIWORD_MASK
In both Hisilicon & Rockchip Cortex-A9 based chips, they don't use the
paradigm of reading-changing-writing the register contents.
Instead they use a hiword mask to indicate the changed bits.

When b01 should be set as switching mux, it also needs to indicate
the change by setting hiword mask (b11 << 16).

The patch adds mux flag for this usage.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-15 20:23:36 -07:00
Rafael J. Wysocki
592a55d835 Merge tag 'omap-pm-v3.11/avs' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap-pm into pm-omap
Pull PM / AVS: SmartReflex: misc. cleanups for 3.11 from Kevin Hilman.
2013-06-16 01:30:46 +02:00
David Daney
f21afc25f9 smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
Thanks to commit f91eb62f71 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.

With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
result of the time_init() call.  Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
      show_stack+0x68/0x80
      warn_slowpath_common+0x78/0xb0
      warn_slowpath_fmt+0x38/0x48
      start_kernel+0x250/0x410

Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
need a flags variable, make it a static inline to avoid name space
issues.

[ Change from v1: Convert on_each_cpu to a static inline function, add
  #include <linux/irqflags.h> to avoid build breakage on some files.

  on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
  on_each_cpu(), but they are not causing !SMP bugs for me, so I will
  defer changing them to a less urgent patch. ]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-14 19:24:42 -10:00
Olof Johansson
38d77ff90a Merge tag 'renesas-gpio-rcar-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers
From Simon Horman:
Renesas ARM based SoC GPIO R-Car updates for v3.11

DT support to GPIO R-Car driver by Laurent Pinchart.

* tag 'renesas-gpio-rcar-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (131 commits)
  gpio-rcar: Add DT support
2013-06-14 17:45:39 -07:00
Olof Johansson
a114926964 Merge tag 'renesas-phy-rcar-usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/soc
From Simon Horman:
Renesas USB updates for v3.11

These updates are by Sergei Shtylyov to clean-up USB support
present for R8A7779/Marzen and then extend USB support coverage to
R8A7778/BOCK-W.

* tag 'renesas-phy-rcar-usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: BOCK-W: add USB support
  ARM: shmobile: r8a7778: add USB support
  phy-rcar-usb: add R8A7778 support
  phy-rcar-usb: handle platform data
  ARM: shmobile: Marzen: pass platform data to USB PHY device
  phy-rcar-usb: add platform data
  phy-rcar-usb: correct base address
  ARM: shmobile: r8a7779: remove USB PHY 2nd memory resource
  phy-rcar-usb: remove EHCI internal buffer setup
  ARM: shmobile: r8a7779: setup EHCI internal buffer
  ehci-platform: add pre_setup() method to platform data
  ARM: shmobile: Marzen: move USB EHCI, OHCI, and PHY devices to R8A7779 code

Conflicts:
	arch/arm/mach-shmobile/board-marzen.c
	arch/arm/mach-shmobile/setup-r8a7778.c

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14 17:36:30 -07:00
Olof Johansson
2c3165ebb6 Merge tag 'ux500-dma40-for-arm-soc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into next/drivers
From Linus Walleij:
Second set of DMA40 changes: refactorings and device tree
support for the DMA40. Now with MUSB and some platform
data removal.

* tag 'ux500-dma40-for-arm-soc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  dmaengine: ste_dma40: Fetch disabled channels from DT
  dmaengine: ste_dma40: Fetch the number of physical channels from DT
  ARM: ux500: Stop passing DMA platform data though AUXDATA
  dmaengine: ste_dma40: Allow memcpy channels to be configured from DT
  dmaengine: ste_dma40_ll: Replace meaningless register set with comment
  dmaengine: ste_dma40: Convert data_width from register bit format to value
  dmaengine: ste_dma40_ll: Use the BIT macro to replace ugly '(1 << x)'s
  ARM: ux500: Remove recently unused stedma40_xfer_dir enums
  dmaengine: ste_dma40: Replace ST-E's home-brew DMA direction defs with generic ones
  ARM: ux500: Replace ST-E's home-brew DMA direction definition with the generic one
  dmaengine: ste_dma40: Use the BIT macro to replace ugly '(1 << x)'s
  ARM: ux500: Remove empty function u8500_of_init_devices()
  ARM: ux500: Remove ux500-musb platform registation when booting with DT
  usb: musb: ux500: add device tree probing support
  usb: musb: ux500: attempt to find channels by name before using pdata
  usb: musb: ux500: harden checks for platform data
  usb: musb: ux500: take the dma_mask from coherent_dma_mask
  usb: musb: ux500: move the MUSB HDRC configuration into the driver
  usb: musb: ux500: move channel number knowledge into the driver
2013-06-14 16:53:54 -07:00
Bjorn Helgaas
df58f46c0f Merge branch 'pci/jiang-bus-lock-v3' into next
* pci/jiang-bus-lock-v3:
  PCI: Return early on allocation failures to unindent mainline code
  PCI: Simplify IOV implementation and fix reference count races
  PCI: Drop redundant setting of bus->is_added in virtfn_add_bus()
  unicore32/PCI: Remove redundant call of pci_bus_add_devices()
  m68k/PCI: Remove redundant call of pci_bus_add_devices()
  PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()
  PCI: Fix refcount issue in pci_create_root_bus() error recovery path
  ia64/PCI: Clean up pci_scan_root_bus() usage
  PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)
  PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev()
  PCI: Introduce pci_bus_{get|put}() to manage PCI bus reference count

Conflicts:
	drivers/pci/probe.c
2013-06-14 17:47:46 -06:00
Bjorn Helgaas
726246d2e6 Merge branch 'pci/misc' into next
* pci/misc:
  PCI / ACPI / PM: Use correct power state strings in messages
  PCI: Fix comment typo for pcie_pme_remove()
  PCI: Add pcibios_release_device()
2013-06-14 17:08:48 -06:00
Olof Johansson
10f8902b47 Merge tag 'omap-for-v3.11/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/soc
From Tony Lindgren:
Omap SoC changes. Mostly improves am33xx support, and adds
minimal support for am43x SoCs.

* tag 'omap-for-v3.11/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2+: AM43x: SRAM base and size
  ARM: OMAP2+: AM43x: GP or HS ?
  ARM: OMAP2+: AM43x: early init
  ARM: OMAP2+: AM43x: static mapping
  ARM: OMAP2+: AM437x: SoC revision detection
  ARM: OMAP2+: AM43x: soc_is support
  ARM: OMAP2+: AM43x: kbuild
  ARM: OMAP2+: AM43x: Kconfig
  ARM: OMAP2+: separate out OMAP4 restart
  ARM: AM33XX: clk: Add clock node for EHRPWM TBCLK
  ARM: OMAP3: clock data: get rid of unused USB host clock aliases and dummies
  ARM: OMAP2+: AM33xx: Add missing reset status info to GFX hwmod
  + Linux 3.10-rc5

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14 14:32:01 -07:00
Olof Johansson
777d466d13 Merge tag 'omap-for-v3.11/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup
From Tony Lindgren:
Move omap4 over to device tree based booting. This allows us to get rid
a big pile of platform init code for things that are already handled by
device tree related code. As am33xx is already device tree based, we
can also remove the same data for am33xx.

* tag 'omap-for-v3.11/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP4: hwmod data: Remove irq entries from mcspi, mmc hwmods
  ARM: OMAP4: hwmod data: add DSS data back
  ARM: OMAP4: hwmod data: Clean up the data file
  ARM: AM33XX: hwmod data: irq, dma and addr info clean up
  ARM: OMAP2+: Remove omap4 ocp2scp pdata
  ARM: OMAP2+: Remove omap4 pdata for USB
  ARM: OMAP2+: Remove omap4 pdata from hsmmc.c
  ARM: OMAP2+: Remove legacy mux data for omap4
  ARM: OMAP2+: Remove board-omap4panda.c
  ARM: OMAP2+: Remove board-4430sdp.c
  ARM: OMAP2+: Legacy support for wl12xx when booted with devicetree

Resolved merge conflict due to a fix for 3.10 (the fix is removed since
the code is no longer used -- data comes from device tree).

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14 14:28:22 -07:00
Olof Johansson
214099b028 Merge tag 'omap-for-v3.10/fixes-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup
Pulling in a set of fixes from Tony Lindgren to resolve conflicts with later
cleanup branch:

A set of small fixes for omaps for the -rc cycle:

- am7303 iva2 reset PM regression fix
- am33xx uart2 dma channel fix
- am33xx gpmc properties fix
- omap44xx rtc wake-up mux fix for nirq pins
- omap36xx clock divider restore fix

There's also one tiny non-critical .dts fix for omap5
timer pwm properties.

* tag 'omap-for-v3.10/fixes-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: omap3: clock: fix wrong container_of in clock36xx.c
  ARM: dts: OMAP5: Fix missing PWM capability to timer nodes
  ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line
  ARM: dts: AM33xx: Fix properties on gpmc node
  arm: omap2: fix AM33xx hwmod infos for UART2
  + Linux 3.10-rc4

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14 14:26:06 -07:00
Olof Johansson
9d6dec733b Merge tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/fixes-non-critical
From Tony Lindgren:
Non-critical fixes for omaps for v3.11 merge window.

* tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: OMAP2+: omap-usb-host: Fix memory leaks
  ARM: OMAP2+: Fix serial init for device tree based booting
  arm/omap: use const char properly
  ARM: OMAP2+: devices: Do not print error when dss_hdmi hwmod lookup fails
  ARM: OMAP2+: devices: Do not print error when DMIC hwmod lookup fails
  ARM: OMAP2+: devices: Do not print error when McPDM hwmod lookup fails
  ARM: OMAP: add vdds_sdi supply for omapdss_sdi.0
  ARM: OMAP: add vdds_dsi supply for omapdss_dpi.0
  ARM: OMAP: fix dsi regulator names
  + Linux 3.10-rc5

Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14 14:07:53 -07:00
Steve Capper
dfa5e237e9 mm: thp: Correct the HPAGE_PMD_ORDER check.
All Transparent Huge Pages are allocated by the buddy allocator.

A compile time check is in place that fails when the order of a
transparent huge page is too large to be allocated by the buddy
allocator. Unfortunately that compile time check passes when:
HPAGE_PMD_ORDER == MAX_ORDER
( which is incorrect as the buddy allocator can only allocate
memory of order strictly less than MAX_ORDER. )

This patch updates the compile time check to fail in the above
case.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:40:28 +01:00
Steve Capper
3212b535f2 mm: hugetlb: Copy huge_pmd_share from x86 to mm.
Under x86, multiple puds can be made to reference the same bank of
huge pmds provided that they represent a full PUD_SIZE of shared
huge memory that is aligned to a PUD_SIZE boundary.

The code to share pmds does not require any architecture specific
knowledge other than the fact that pmds can be indexed, thus can
be beneficial to some other architectures.

This patch copies the huge pmd sharing (and unsharing) logic from
x86/ to mm/ and introduces a new config option to activate it:
CONFIG_ARCH_WANTS_HUGE_PMD_SHARE

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 09:33:47 +01:00
Tejun Heo
f63674fd0d cgroup: update sane_behavior documentation
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and
cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent"
cgroup files insane") forgot to update the changed behavior
documentation in cgroup.h.  Update it.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 20:02:26 -07:00
Tejun Heo
d3daf28da1 cgroup: use percpu refcnt for cgroup_subsys_states
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller.  As such, it can be used in hot paths across the various
subsystems different controllers are associated with.

One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability.  For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.

In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable.  In its
usage, css refcnting isn't very different from module refcnting.

This patch converts css refcnting to use the recently added
percpu_ref.  css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.

The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s.  This is resolved collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs.  The previous patches already splitted destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.

This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id().  While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it.  Just drop the sanity check and use
rcu_dereference_raw() instead.

v2: - init_cgroup_css() was calling percpu_ref_init() without checking
      the return value.  This causes two problems - the obvious lack
      of error handling and percpu_ref_init() being called from
      cgroup_init_subsys() before the allocators are up, which
      triggers warnings but doesn't cause actual problems as the
      refcnt isn't used for roots anyway.  Fix both by moving
      percpu_ref_init() to cgroup_create().

    - The base references were put too early by
      percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
      refs one extra time.  This wasn't noticeable because css's go
      through another RCU grace period before being freed.  Update
      cgroup_destroy_locked() to grab an extra reference before
      killing the refcnts.  This problem was noticed by Kent.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
2013-06-13 19:43:12 -07:00
Tejun Heo
2b0e53a7c8 Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11
This is to receive percpu_refcount which will replace atomic_t
reference count in cgroup_subsys_state.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 19:42:22 -07:00
Tejun Heo
ea15f8ccdb cgroup: split cgroup destruction into two steps
Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item.  The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent.  The separation is to allow using percpu refcnt for css.

Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name.  As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine.  A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.

As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both.  This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes.  INIT_WORK() is now performed right before queueing the work
item.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 19:27:42 -07:00
Tejun Heo
dbece3a0f1 percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed.  Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
    continue to succeed on some CPUs after kill() returns as suggested
    by Kent.

v3: Function comment in percpu_ref_kill_and_confirm() updated warning
    people to not depend on the implied RCU grace period from the
    confirm callback as it's an implementation detail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 19:23:53 -07:00
Rony Efraim
948e306d7d net/mlx4: Add VF link state support
Add support to change the link state of VF (vPort)

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 17:51:04 -07:00
Rony Efraim
1d8faf48c7 net/core: Add VF link state control
Add netlink directives and ndo entry to allow for controling
VF link, which can be in one of three states:

Auto - VF link state reflects the PF link state (default)

Up - VF link state is up, traffic from VF to VF works even if
the actual PF link is down

Down - VF link state is down, no traffic from/to this VF, can be of
use while configuring the VF

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 17:51:04 -07:00
Willem de Bruijn
5f121b9a83 net-rps: fixes for rps flow limit
Caught by sparse:
- __rcu: missing annotation to sd->flow_limit
- __user: direct access in cpumask_scnprintf

Also
- add endline character when printing bitmap if room in buffer
- avoid bucket overflow by reducing FLOW_LIMIT_HISTORY

The last item warrants some explanation. The hashtable buckets are
subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
to bucket size, since all packets may end up in a single bucket. The
current (rather arbitrary) history value of 256 happens to match the
buffer size (u8).

As a result, with a single flow, the first 128 packets are accepted
(correct), the second 128 packets dropped (correct) and then the
history[] array has filled, so that each subsequent new packet
causes an increment in the bucket for new_flow plus a decrement
for old_flow: a steady state.

This is fine if packets are dropped, as the steady state goes away
as soon as a mix of traffic reappears. But, because the 256th packet
overflowed the bucket to 0: no packets are dropped.

Instead of explicitly adding an overflow check, this patch changes
FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
(first item)

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 17:13:05 -07:00
Linus Torvalds
cb7e9704d5 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fixes from Paul McKenney:
 "I must confess that this past merge window was not RCU's best showing.
  This series contains three more fixes for RCU regressions:

   1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
        interrupt from idle rather than as a task switch from idle.
        This change is needed due to the recent use of _rcuidle()
        tracepoints that can be invoked from interrupt handlers as well
        as from idle.  Without this fix, invoking _rcuidle() tracepoints
        from interrupt handlers results in splats and (more seriously)
        confusion on RCU's part as to whether a given CPU is idle or not.
        This confusion can in turn result in too-short grace periods and
        therefore random memory corruption.

   2.   A fix to a subtle deadlock that could result due to RCU doing
        a wakeup while holding one of its rcu_node structure's locks.
        Although the probability of occurrence is low, it really
        does happen.  The fix, courtesy of Steven Rostedt, uses
        irq_work_queue() to avoid the deadlock.

   3.   A fix to a silent deadlock (invisible to lockdep) due to the
        interaction of timeouts posted by RCU debug code enabled by
        CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
        hotplug operations.  This will not occur in production kernels,
        but really does occur in randconfig testing.  Diagnosis courtesy
        of Steven Rostedt"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
  rcu: Don't call wakeup() with rcu_node structure ->lock held
  trace: Allow idle-safe tracepoints to be called from irq
2013-06-13 12:36:42 -07:00
Tejun Heo
bc497bd33b percpu-refcount: implement percpu_ref_cancel_init()
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously.  The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.

This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release.  To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.

v2: Explain the weird name and usage restriction in the function
    comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 11:08:27 -07:00
Tejun Heo
acac7883ee percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()
Two small changes.

* Unlike most init functions, percpu_ref_init() allocates memory and
  may fail.  Let's mark it with __must_check in case the caller
  forgets.

* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
  dereference @ref->pcpu_count, which can be misleading.  The pointer
  is guaranteed to be valid and visible and can't change underneath
  the function.  Drop ACCESS_ONCE().

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 11:08:26 -07:00
Tejun Heo
6f3d828f0f cgroup: remove cgroup->count and use
cgroup->count tracks the number of css_sets associated with the cgroup
and used only to verify that no css_set is associated when the cgroup
is being destroyed.  It's superflous as the destruction path can
simply check whether cgroup->cset_links is empty instead.

Drop cgroup->count and check ->cset_links directly from
cgroup_destroy_locked().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 10:55:18 -07:00
Tejun Heo
54766d4a1d cgroup: rename CGRP_REMOVED to CGRP_DEAD
We will add another flag indicating that the cgroup is in the process
of being killed.  REMOVING / REMOVED is more difficult to distinguish
and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
later percpu_ref usage will involve "kill"ing the refcnt.

 s/CGRP_REMOVED/CGRP_DEAD/
 s/cgroup_is_removed()/cgroup_is_dead()

This patch is purely cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 10:55:18 -07:00
Tejun Heo
5de0107e63 cgroup: clean up css_[try]get() and css_put()
* __css_get() isn't used by anyone.  Fold it into css_get().

* Add proper function comments to all css reference functions.

This patch is purely cosmetic.

v2: Typo fix as per Li.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 10:55:17 -07:00
Tejun Heo
69d0206c79 cgroup: bring some sanity to naming around cg_cgroup_link
cgroups and css_sets are mapped M:N and this M:N mapping is
represented by struct cg_cgroup_link which forms linked lists on both
sides.  The naming around this mapping is already confusing and struct
cg_cgroup_link exacerbates the situation quite a bit.

>From cgroup side, it starts off ->css_sets and runs through
->cgrp_link_list.  From css_set side, it starts off ->cg_links and
runs through ->cg_link_list.  This is rather reversed as
cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
Also, this is the only place which is still using the confusing "cg"
for css_sets.  This patch cleans it up a bit.

* s/cgroup->css_sets/cgroup->cset_links/
  s/css_set->cg_links/css_set->cgrp_links/
  s/cgroup_iter->cg_link/cgroup_iter->cset_link/

* s/cg_cgroup_link/cgrp_cset_link/

* s/cgrp_cset_link->cg/cgrp_cset_link->cset/
  s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
  s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/

* s/init_css_set_link/init_cgrp_cset_link/
  s/free_cg_links/free_cgrp_cset_links/
  s/allocate_cg_links/allocate_cgrp_cset_links/

* s/cgl[12]/link[12]/ in compare_css_sets()

* s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar
  adustments.

* Comment and whiteline adjustments.

After the changes, we have

	list_for_each_entry(link, &cont->cset_links, cset_link) {
		struct css_set *cset = link->cset;

instead of

	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
		struct css_set *cset = link->cg;

This patch is purely cosmetic.

v2: Fix broken sentences in the patch description.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 10:55:17 -07:00
Tejun Heo
3fc3db9a3a cgroup: remove now unused css_depth()
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 10:55:17 -07:00
Li Zefan
88fa523bff cpuset: allow to move tasks to empty cpusets
Currently some cpuset behaviors are not friendly when cpuset is co-mounted
with other cgroup controllers.

Now with this patchset if cpuset is mounted with sane_behavior option,
it behaves differently:

- Tasks will be kept in empty cpusets when hotplug happens and take
  masks of ancestors with non-empty cpus/mems, instead of being moved to
  an ancestor.

- A task can be moved into an empty cpuset, and again it takes masks of
  ancestors, so the user can drop a task into a newly created cgroup without
  having to do anything for it.

As tasks can reside in empy cpusets, here're some rules:

- They can be moved to another cpuset, regardless it's empty or not.

- Though it takes masks from ancestors, it takes other configs from the
  empty cpuset.

- If the ancestors' masks are changed, those tasks will also be updated
  to take new masks.

v2: add documentation in include/linux/cgroup.h

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 10:48:33 -07:00
Li Zefan
5c5cc62321 cpuset: allow to keep tasks in empty cpusets
To achieve this:

- We call update_tasks_cpumask/nodemask() for empty cpusets when
hotplug happens, instead of moving tasks out of them.

- When a cpuset's masks are changed by writing cpuset.cpus/mems,
we also update tasks in child cpusets which are empty.

v3:
- do propagation work in one place for both hotplug and unplug

v2:
- drop rcu_read_lock before calling update_task_nodemask() and
  update_task_cpumask(), instead of using workqueue.
- add documentation in include/linux/cgroup.h

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 10:48:32 -07:00
Samuel Ortiz
d8c5d658f4 Merge tag 'am335x_tsc-adc' of git://breakpoint.cc/bigeasy/linux
A complete refurbished series inclunding:
- DT support for the MFD, TSC and ADC driver & platform device support,
  which has no users, has been killed.
- iio_map from last series is gone and replaced by proper nodes in the
  device tree.
- suspend fixes which means correct data structs are taken and no
  interrupt storm
- fifo split which should problem with TSC & ADC beeing used at the same
  time
- The ADC channels are now checked before blindly applied. That means the
  touch part reads X, Y and Z coordinates and does not mix them up. Same
  goes for the IIO ADC driver.
- The IIO ADC driver now creates files named in_voltageX_raw where X
  represents the ADC line instead of a number starting at 0. A read from
  this file can return -EBUSY in case touch is busy and the ADC didn't
  collect a value.
2013-06-13 12:14:59 +02:00
Samuel Ortiz
c5fa44d134 Merge branch 'for-mfd-next' of git://git.linaro.org/people/ljones/linux-3.0-ux500 2013-06-13 12:06:15 +02:00