Commit Graph

30926 Commits

Author SHA1 Message Date
Paul E. McKenney
867f236bd1 rcu: Make srcu_read_lock_held() call common lockdep-enabled function
A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock_held() does not
use it.  This commit therefore brings srcu_read_lock_held() up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:34 -08:00
Paul E. McKenney
ff195cb69b rcu: Warn when srcu_read_lock() is used in an extended quiescent state
Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() is
used from within RCU-idle mode.

Frederic reworked this to allow for the new versions of his patches
that check for extended quiescent states.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:33 -08:00
Paul E. McKenney
d8ab29f8be rcu: Remove one layer of abstraction from PROVE_RCU checking
Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:32 -08:00
Frederic Weisbecker
00f49e5729 rcu: Warn when rcu_read_lock() is used in extended quiescent state
We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so the earlier commit that checks for use of rcu_dereference() and
friends while in RCU-idle mode misses some error conditions.  This commit
therefore adds extended quiescent state checking to rcu_read_lock() and
friends.

Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.
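The check described above can be modeled in plain C. This is a sketch under stated assumptions, not the kernel's lockdep code: `eqs_idle_enter()`, `eqs_idle_exit()` and `model_rcu_read_lock()` are hypothetical stand-ins for rcu_idle_enter(), rcu_idle_exit() and rcu_read_lock(), and a single global flag replaces RCU's per-CPU state. It only shows the idea of complaining when a read-side critical section begins inside an extended quiescent state.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-CPU "RCU is ignoring this CPU" state. */
static bool eqs_active;
/* Number of warnings issued so far (a PROVE_RCU-style "splat" counter). */
static int eqs_splats;

static void eqs_idle_enter(void) { eqs_active = true; }  /* like rcu_idle_enter() */
static void eqs_idle_exit(void)  { eqs_active = false; } /* like rcu_idle_exit() */

/* Model of rcu_read_lock() with the extended-quiescent-state check added. */
static void model_rcu_read_lock(void)
{
	if (eqs_active) {
		eqs_splats++;
		fprintf(stderr, "warning: read-side critical section entered while RCU-idle\n");
	}
	/* ...the real read-side entry would follow here... */
}
```

Because RCU ignores a CPU in this state, a reader that slipped through would not delay any grace period; the warning is the only way such a bug becomes visible.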

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:31 -08:00
Frederic Weisbecker
e6b80a3b09 rcu: Detect illegal rcu dereference in extended quiescent state
Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.

This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick-idle mode or can simply avoid taking part in any
global grace-period completion while in the idle loop.

Uses of RCU from such mode are totally ignored by RCU, hence the
importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:30 -08:00
Paul E. McKenney
91afaf3002 rcu: Add failure tracing to rcutorture
Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:26 -08:00
Paul E. McKenney
9b2e4f1880 rcu: Track idleness independent of idle tasks
Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
prevents the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dynticks_nesting counter.
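A userspace sketch of the counting scheme described above. Assumptions: all names are hypothetical, a single global replaces the per-CPU ->dynticks_nesting, and the split between the task-level field and the interrupt field (bit 40 here) is chosen only for illustration; the kernel's actual layout may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits from NEST_TASK_SHIFT upward record process-level (non-idle)
 * nesting; the low bits count interrupt entries minus exits. */
#define NEST_TASK_SHIFT 40
#define NEST_TASK_VALUE ((int64_t)1 << NEST_TASK_SHIFT)

static int64_t dynticks_nesting = NEST_TASK_VALUE; /* CPU starts non-idle */

/* Process/idle transitions hard-reset the counter, discarding any
 * irq imbalance accumulated since the previous transition. */
static void model_idle_enter(void) { dynticks_nesting = 0; }
static void model_idle_exit(void)  { dynticks_nesting = NEST_TASK_VALUE; }

/* Interrupt entry/exit only adjust the low-order part. */
static void model_irq_enter(void) { dynticks_nesting++; }
static void model_irq_exit(void)  { dynticks_nesting--; }

/* RCU may ignore this CPU only when nothing is nested at all. */
static bool model_rcu_idle(void) { return dynticks_nesting == 0; }
```

An upcall that looks like "half an interrupt" (an irq_enter() with no matching irq_exit()) leaves the counter off by one, but the next idle transition overwrites it instead of carrying the error forward.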

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (these checks will need some adjustment before applying
Frederic's OS-jitter patches: http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11 10:31:24 -08:00
Stefan Nilsson XK
6de5fc9cf7 mmc: core: Add quirk for long data read time
Adds a quirk that sets the data read timeout to a fixed value instead
of relying on the information in the CSD. The timeout value chosen
is 300ms since that has proven enough for the problematic cards found,
but could be increased if other cards require this.

This patch also enables this quirk for certain Micron cards known to
have this problem.

Signed-off-by: Stefan Nilsson XK <stefan.xk.nilsson@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <stable@kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2011-12-10 16:18:35 -05:00
Paul Gortmaker
9deaa53ac7 serial: add irq handler for Freescale 16550 errata.
Sending a break on the SOC UARTs found in some MPC83xx/85xx/86xx
chips seems to cause a short-lived IRQ storm (/proc/interrupts
typically shows somewhere between 300 and 1500 events).  Unfortunately
this renders SysRQ over the serial console completely inoperable.

The suggested workaround in the errata is to read the Rx register,
wait one character period, and then read the Rx register again.
We achieve this by tracking the old LSR value: on the subsequent
interrupt event after a break, we don't read the LSR; instead we just
read the RBR again and return immediately.
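A minimal model of the workaround, assuming a simplified receive path. `model_uart` and `model_fsl_handle_irq()` are hypothetical; register reads are replaced by counters and the LSR value is passed in. The "wait one character period" is implicit: the model acts on the next interrupt rather than sleeping. LSR_BI is the 16550 Line Status Register break-interrupt bit (0x10).

```c
#include <stdbool.h>
#include <stdint.h>

#define LSR_BI 0x10 /* Break Interrupt bit of the 16550 Line Status Register */

struct model_uart {
	uint8_t lsr_saved_flags; /* LSR bits remembered from the previous interrupt */
	int lsr_reads;           /* how often the LSR was read */
	int rbr_reads;           /* how often the RBR (receive buffer) was read */
};

/* 'hw_lsr' is what the LSR would return if read.  Returns true when the
 * post-break workaround path handled the interrupt. */
static bool model_fsl_handle_irq(struct model_uart *up, uint8_t hw_lsr)
{
	if (up->lsr_saved_flags & LSR_BI) {
		/* First interrupt after a break: skip the LSR, read RBR once more. */
		up->lsr_saved_flags &= (uint8_t)~LSR_BI;
		up->rbr_reads++;
		return true;
	}

	up->lsr_reads++;              /* normal path: read the LSR ... */
	up->rbr_reads++;              /* ... then receive (greatly simplified) */
	up->lsr_saved_flags = hw_lsr; /* remember whether this was a break */
	return false;
}
```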

The "fsl,ns16550" string is used in the compatible field of the serial
device to mark UARTs known to have this issue.

Thanks to Scott Wood for providing the errata data which led to
a much cleaner fix.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 19:14:13 -08:00
Paul Gortmaker
3986fb2ba6 serial: export the key functions for an 8250 IRQ handler
For drivers that need to construct their own IRQ handler, the
three components are seen in the current handle_port -- i.e.
Rx, Tx and modem_status.

Make these exported symbols so that "almost" 8250 UARTs can
construct their own IRQ handler with these shared components,
while working around their own unique errata issues.

The function names are given a serial8250 prefix, since they
are now entering the global namespace.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 19:14:13 -08:00
Matt Fleming
55839d5154 efi: Add EFI file I/O data types
The x86 EFI stub needs to access files, for example when loading
initrds. Add the required data types.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:51 -08:00
Matt Fleming
e2527a7cbe efi.h: Add boottime->locate_handle search types
The x86 EFI boot stub needs to locate handles for various protocols.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:49 -08:00
Matt Fleming
0f7c5d477f efi.h: Add graphics protocol guids
The x86 EFI boot stub uses the Graphics Output Protocol and Universal
Graphics Adapter (UGA) protocol guids when initialising graphics
during boot.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:46 -08:00
Matt Fleming
bb05e4ba45 efi.h: Add allocation types for boottime->allocate_pages()
Add the allocation types detailed in section 6.2 - "AllocatePages()"
of the UEFI 2.3 specification. These definitions will be used by the
x86 EFI boot stub which needs to allocate memory during boot.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:44 -08:00
Matt Fleming
8e84f345e2 efi.h: Add efi_image_loaded_t
Add the EFI loaded image structure and protocol guid which are
required by the x86 EFI boot stub. The EFI boot stub uses the
structure to figure out where it was loaded in memory and to pass
command line arguments to the kernel.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:42 -08:00
Matt Fleming
f30ca6ba0b efi.h: Add struct definition for boot time services
With the forthcoming EFI stub code we're going to need to access boot
time services, so let's define a struct through which we can access the functions.

Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-09 17:35:40 -08:00
Uwe Kleine-König
5a3072be6c drivers_base: make argument to platform_device_register_full const
platform_device_register_full() doesn't modify *pdevinfo, so it can be
marked as const without further adaptations.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 16:23:49 -08:00
Aman Deep
7bf01185c5 USB: Adding #define in hub_configure() and hcd.c file
This patch follows up on the previous patch
commit c842114792
        xHCI: Adding #define values used for hub descriptor

#define values for the hub descriptor characteristics are used in
hub_configure() in place of magic numbers, as requested by Alan Stern.
The indentation of the switch and case statements is also made consistent.

Some #define values are added to ch11.h for the hub class protocols
and are used in hub.c and hcd.c, where magic values were previously
used for the hub class protocols.

Signed-off-by: Aman Deep <amandeep3986@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 16:20:38 -08:00
Clemens Ladisch
bc677d5b64 usb: fix number of mapped SG DMA entries
Add a new field num_mapped_sgs to struct urb so that we have a place to
store the number of mapped entries and can also retain the original
value of entries in num_sgs.  Previously, usb_hcd_map_urb_for_dma()
would overwrite this with the number of mapped entries, which would
break dma_unmap_sg() because it requires the original number of entries.
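The bookkeeping rule can be sketched as follows. Everything here is hypothetical: `dbg_map_count` plays the role of dma-debug's record of the mapped count, and `model_dma_map_sg()` pretends an IOMMU coalesced every entry into one. The point is that the host controller is programmed with num_mapped_sgs while the unmap must use the untouched num_sgs.

```c
static int dbg_map_count; /* what a dma-debug-like checker remembers */

/* Pretend mapper: records the caller's count, returns fewer entries. */
static int model_dma_map_sg(int nents)
{
	dbg_map_count = nents;
	return nents > 0 ? 1 : 0;
}

/* Returns 0 if the unmap count matches the map count, -1 otherwise. */
static int model_dma_unmap_sg(int nents)
{
	return nents == dbg_map_count ? 0 : -1;
}

struct model_urb {
	int num_sgs;        /* original number of entries: needed for unmap */
	int num_mapped_sgs; /* entries actually mapped: used to program the HC */
};

static void model_map_urb(struct model_urb *urb)
{
	urb->num_mapped_sgs = model_dma_map_sg(urb->num_sgs); /* num_sgs untouched */
}

static int model_unmap_urb(const struct model_urb *urb)
{
	return model_dma_unmap_sg(urb->num_sgs);
}
```

Passing num_mapped_sgs to the unmap instead reproduces the "[map count=4] [unmap count=1]" mismatch from the warning below.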

This fixes warnings like the following when using USB storage devices:
 ------------[ cut here ]------------
 WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695()
 ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1]
 Modules linked in: ohci_hcd ehci_hcd
 Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319
 Call Trace:
  <IRQ>  [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98
  [<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43
  [<ffffffff811fa5ae>] check_unmap+0x4e4/0x695
  [<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf
  [<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50
  [<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117
  [<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188
  [<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22
  [<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0
  [<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd]
  [<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd]
  [<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd]
  ...
 ---[ end trace f29ac88a5a48c580 ]---
 Mapped at:
  [<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139
  [<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478
  [<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa
  [<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de
  [<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 16:18:19 -08:00
Greg Kroah-Hartman
a36ae95c4e Merge v3.2-rc4 into usb-next
This lets us handle the PS3 merge more easily, as well as sync up with
other USB fixes already in the -rc4 tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-09 16:07:48 -08:00
Srivatsa S. Bhat
b298d289c7 PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
Commit a144c6a (PM: Print a warning if firmware is requested when tasks
are frozen) introduced usermodehelper_is_disabled() to warn and exit
immediately if firmware is requested when usermodehelpers are disabled.

However, it is racy. Consider the following scenario, currently used in
drivers/base/firmware_class.c:

...
if (usermodehelper_is_disabled())
        goto out;

/* Do actual work */
...

out:
        return err;

Nothing prevents someone from disabling usermodehelpers just after the check
in the 'if' condition, which means that it is quite possible to try doing the
"actual work" with usermodehelpers disabled, leading to undesirable
consequences.

In particular, this race condition in _request_firmware() causes task freezing
failures whenever suspend/hibernation is in progress, because it wrongly waits
to get the firmware/microcode image from userspace when the usermodehelpers
are actually disabled or userspace has been frozen.
Some example scenarios that cause freezing failures due to this race
are those that depend on userspace via request_firmware(), such as x86
microcode module initialization and microcode image reload.

Previous discussions about this issue can be found at:
http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

This patch adds proper synchronization to fix this issue.

Note that this patchset fixes the freezing failures but doesn't
remove the warnings. In other words, it does not attempt to add explicit
synchronization to the x86 microcode driver to avoid requesting the microcode
image at inopportune moments, because the warnings were introduced to highlight
such cases in the first place. We need not silence the warnings: since we take
care of the *real* problem (freezing failure), the warnings are pretty harmless
anyway.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-09 23:36:36 +01:00
Mark Brown
925b44a273 PM / Domains: Provide an always on power domain governor
Since systems are likely to have power domains that can't be turned off
for various reasons, at least temporarily while power domain support is
being implemented, provide a default governor which will always refuse to
power off the domain, saving platforms from having to implement their own.

Since the code is so tiny, don't bother with a Kconfig symbol for it.
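A minimal sketch of what an always-on governor amounts to, assuming a governor is a struct of callbacks that the core consults before powering a domain down. The names (`model_governor`, `power_down_ok`, `model_try_power_off`) are hypothetical and only illustrate the veto pattern, not the genpd API.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_domain;

/* Hypothetical governor interface: asked before a domain is powered off. */
struct model_governor {
	bool (*power_down_ok)(struct model_domain *dom);
};

struct model_domain {
	const struct model_governor *gov; /* NULL means "no objection" */
	bool powered_on;
};

/* The always-on policy: never agree to power the domain down. */
static bool always_on_power_down_ok(struct model_domain *dom)
{
	(void)dom;
	return false;
}

static const struct model_governor model_always_on_gov = {
	.power_down_ok = always_on_power_down_ok,
};

/* Power off the domain unless its governor vetoes it. */
static void model_try_power_off(struct model_domain *dom)
{
	if (dom->gov && dom->gov->power_down_ok && !dom->gov->power_down_ok(dom))
		return;
	dom->powered_on = false;
}
```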

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-09 23:22:41 +01:00
Pavel Emelyanov
1942c518ca inet_diag: Generalize inet_diag dump and get_exact calls
Introduce two callbacks in inet_diag_handler -- one for dumping all
sockets (with filters) and the other one for dumping a single sk.

Replace direct calls to icsk handlers with indirect calls to callbacks
provided by handlers.

Make existing TCP and DCCP handlers use provided helpers for icsk-s.

The UDP diag module will provide its own.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
3c4d05c805 inet_diag: Introduce the inet socket dumping routine
The existing inet_csk_diag_fill dumps the inet connection sock info
into the netlink inet_diag_message. Prepare this routine to be able
to dump only the inet_sock part of a socket if the icsk part is missing.

This will be used by UDP diag module when dumping UDP sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
8d07d1518a inet_diag: Introduce the byte-code run on an inet socket
The upcoming UDP module will require exactly this ability, so just
move the existing code into a routine that provides it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
b005ab4ef8 inet_diag: Export inet diag cookie checking routine
The netlink diag subsystem stores sk address bits in the nl message
as a "cookie" and uses it when dumping details about a particular
socket.

The same will be required for the UDP diag module, so introduce a helper
in the inet_diag module.
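A sketch of the cookie idea, under assumptions: the socket address is split into two 32-bit words, and a request carrying the all-ones "no cookie" marker in both words skips the check. `MODEL_NOCOOKIE` and the function names are hypothetical; the return code -ESTALE signals that the cookie no longer matches the socket.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_NOCOOKIE (~0U) /* assumed "no cookie supplied" marker */

/* Split a socket's address into two 32-bit cookie words. */
static void model_store_cookie(const void *sk, uint32_t cookie[2])
{
	uint64_t addr = (uint64_t)(uintptr_t)sk;

	cookie[0] = (uint32_t)addr;
	cookie[1] = (uint32_t)(addr >> 32); /* zero on 32-bit hosts */
}

/* 0 if the cookie matches 'sk' or was not supplied, -ESTALE otherwise. */
static int model_check_cookie(const void *sk, const uint32_t cookie[2])
{
	uint32_t expect[2];

	if (cookie[0] == MODEL_NOCOOKIE && cookie[1] == MODEL_NOCOOKIE)
		return 0;
	model_store_cookie(sk, expect);
	if (cookie[0] != expect[0] || cookie[1] != expect[1])
		return -ESTALE;
	return 0;
}
```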

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
7b35eadd7e inet_diag: Remove indirect sizeof from inet diag handlers
There's an info_size value stored on inet_diag_handler, but for existing
code this value is effectively constant, so just use sizeof(struct tcp_info)
where required.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:07 -05:00
Eric Dumazet
a73ed26bba sch_red: generalize accurate MAX_P support to RED/GRED/CHOKE
Now that RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are not aware of the new attributes, and still set/get Plog.

New tc binaries set/get both Plog and max_p for backward compatibility;
they display "probability value" if they get max_p from new kernels.

# tc -d  qdisc show dev ...
...
qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 13:46:15 -05:00
Jeremy Fitzhardinge
8351665195 power_supply: allow a power supply to explicitly point to powered device
If a power supply has a scope of "Device", then allow the power supply
to indicate what device it actually powers. This is represented in the
power supply's sysfs directory as a symlink named "powers", which points to
the sysfs directory of the powered device.

If the device has children, then the sub-devices are also powered by
the same power supply.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Richard Hughes <richard@hughsie.com>
2011-12-09 09:52:07 -08:00
Jeremy Fitzhardinge
25a0bc2dfc power_supply: add SCOPE attribute to power supplies
This adds a "scope" attribute to a power_supply, which indicates how
much of the system it powers.  It appears in sysfs as "scope" or in
the uevent file as POWER_SUPPLY_SCOPE=.  There are presently three
possible values:
	Unknown - unknown power topology
	System - the power supply powers the whole system
	Device - it powers a specific device, or tree of devices

A power supply which doesn't have a "scope" attribute should be assumed to
have "System" scope.

In general, usermode should assume that loss of all System-scoped power
supplies will power off the whole system, but any single one is sufficient
to power the system.
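How usermode might apply these rules, as a sketch. The struct and function names are hypothetical and this is not the sysfs interface itself; it only encodes the two stated rules: a supply without a "scope" attribute counts as System-scoped, and the system stays powered while any System-scoped supply is online.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_scope { SCOPE_UNKNOWN, SCOPE_SYSTEM, SCOPE_DEVICE };

struct model_supply {
	bool has_scope;         /* does the supply expose a "scope" attribute? */
	enum model_scope scope; /* meaningful only if has_scope */
	bool online;
};

/* Supplies without a "scope" attribute are assumed to be System-scoped. */
static enum model_scope effective_scope(const struct model_supply *s)
{
	return s->has_scope ? s->scope : SCOPE_SYSTEM;
}

/* The system keeps running while any System-scoped supply is online. */
static bool system_powered(const struct model_supply *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (effective_scope(&s[i]) == SCOPE_SYSTEM && s[i].online)
			return true;
	return false;
}
```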

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Richard Hughes <richard@hughsie.com>
2011-12-09 09:42:05 -08:00
Linus Torvalds
53523d5263 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.
2011-12-09 08:08:57 -08:00
Konstantin Khlebnikov
83aeeada7c vmscan: use atomic-long for shrinker batching
Use atomic-long operations instead of looping around cmpxchg().
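The difference can be illustrated with C11 `<stdatomic.h>` as a stand-in for the kernel's atomic_long_t. Which kernel helpers the patch actually uses is not stated here, so the function names below are hypothetical: claiming a whole deferred batch with a compare-and-exchange retry loop is equivalent to a single atomic exchange with zero.

```c
#include <stdatomic.h>

/* Before: claim the whole deferred batch with a compare-and-exchange loop. */
static long take_batch_cmpxchg(atomic_long *nr)
{
	long old = atomic_load(nr);

	/* On failure, 'old' is reloaded with the current value; retry. */
	while (!atomic_compare_exchange_weak(nr, &old, 0))
		;
	return old;
}

/* After: a single atomic exchange does the same job. */
static long take_batch_xchg(atomic_long *nr)
{
	return atomic_exchange(nr, 0);
}

/* Hand unused work back; returns the new total (add-and-return). */
static long defer_batch(atomic_long *nr, long n)
{
	return atomic_fetch_add(nr, n) + n;
}
```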

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09 07:50:27 -08:00
Paul Mundt
0d376945d0 Merge branches 'common/clkfwk', 'common/pfc' and 'common/serial-rework' into sh-latest
2011-12-09 18:11:09 +09:00
Magnus Damm
b0e10211cb sh: pfc: ioremap() support
Add support for non-entity mapped PFC registers through
the use of struct resource and ioremap()/iounmap().

The PFC main data structure gets updated with a pointer
to an array of struct resource that points out all register
windows used by the PFC instance. The register definitions
are kept as physical addresses, but the PFC code will do
transparent conversion into virtual addresses whenever
register windows are specified using struct resource.

To introduce as little performance penalty as possible, the
virtual address of each data register is cached in memory.
The virtual address of each configuration register is, however,
calculated at run time. This is because configuration is
considered a slow path, so the focus is instead on keeping the
memory footprint as small as possible.

This patch also updates the PFC register access code from
__raw_readN() / __raw_writeN() to ioreadN() / iowriteN().

This patch is needed to support the PFC block in r8a7779.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-12-09 18:07:15 +09:00
Magnus Damm
eda2030a5b sh: extend clock struct with mapped_reg member
Add a "mapped_reg" member to struct clk and use that
to keep the ioremapped register based on enable_reg.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-12-09 18:01:05 +09:00
Eric Dumazet
8af2a218de sch_red: Adaptative RED AQM
Adaptive RED AQM for Linux, based on the paper by Sally Floyd,
Ramakrishna Gummadi, and Scott Shenker, August 2001:

http://icir.org/floyd/papers/adaptiveRed.pdf

The goal of Adaptive RED is to make max_p a dynamic value between 1% and
50% so as to reach the target average queue: min_th + (max_th - min_th) / 2

Every 500 ms:
 if (avg > target and max_p <= 0.5)
  increase max_p : max_p += alpha;
 else if (avg < target and max_p >= 0.01)
  decrease max_p : max_p *= beta;

target : [min_th + 0.4*(max_th - min_th),
          min_th + 0.6*(max_th - min_th)]
alpha : min(0.01, max_p / 4)
beta : 0.9
max_p is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)
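The update rule above can be written with Q0.32 arithmetic roughly as follows. This is a sketch assuming thresholds in bytes and an already-computed average queue length; the names (`ared`, `ared_adapt`, `Q32_ONE_PERCENT`) are hypothetical and the real sch_red code differs in detail. The integer forms of alpha and beta mirror the pseudocode: alpha is capped at 1%, and beta = 0.9 becomes divide by 10, multiply by 9.

```c
#include <stdint.h>

/* 1% as a Q0.32 fraction, i.e. round(2^32 / 100). */
#define Q32_ONE_PERCENT ((uint32_t)((((uint64_t)1 << 32) + 50) / 100))
#define MAX_P_MIN (1 * Q32_ONE_PERCENT)  /* 1%  */
#define MAX_P_MAX (50 * Q32_ONE_PERCENT) /* 50% */

struct ared {
	uint32_t max_p;  /* Q0.32: probability = max_p / 2^32 */
	uint32_t min_th; /* bytes */
	uint32_t max_th; /* bytes */
};

/* One adaptation step, meant to run every 500 ms with the average queue. */
static void ared_adapt(struct ared *r, uint32_t avg)
{
	uint64_t delta = r->max_th - r->min_th;
	uint32_t target_min = r->min_th + (uint32_t)(delta * 4 / 10);
	uint32_t target_max = r->min_th + (uint32_t)(delta * 6 / 10);

	if (avg > target_max && r->max_p <= MAX_P_MAX) {
		uint32_t alpha = r->max_p / 4;   /* alpha = min(0.01, max_p / 4) */
		if (alpha > MAX_P_MIN)
			alpha = MAX_P_MIN;
		r->max_p += alpha;
	} else if (avg < target_min && r->max_p >= MAX_P_MIN) {
		r->max_p = r->max_p / 10 * 9;    /* beta = 0.9 */
	}
}
```

With min 30000 and max 90000 (as in the tc example), the target window is [54000, 66000] bytes: a larger average raises max_p, a smaller one lowers it, and anything inside leaves it alone.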

Changes compared with our RED implementation:

max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
fixed point number, to allow the full range described in the Adaptive RED paper.

To deliver a random number, we now use a reciprocal divide (that's really
a multiply), but this operation is done once per marked/dropped packet
when in the RED_BETWEEN_TRESH window, so the added cost (compared to the
previous AND operation) is near zero.

The dump operation gives the current max_p value in a new TCA_RED_MAX_P
attribute.

Example on a 10Mbit link :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
   limit 400000 min 30000 max 90000 avpkt 1000 \
   burst 55 ecn adaptative bandwidth 10Mbit

# tc -s -d qdisc show dev eth3
...
qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
adaptative ewma 5 max_p=0.113335 Scell_log 15
 Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
 rate 9749Kbit 831pps backlog 72056b 16p requeues 0
  marked 1357 early 35 pdrop 0 other 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:43 -05:00
Jiri Pirko
348a1443cc vlan: introduce functions to do mass addition/deletion of vids by another device
Introduce functions handy to copy vlan ids from one driver's list to
another.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
5b9ea6e022 vlan: introduce vid list with reference counting
This allows keeping track of the vids that need to be in the rx vlan
filters of devices even if they are used in bond/team etc.

vlan_info, like vlan_group previously, is allocated when the first
vid is added and deallocated when the last vid is deleted.

The vlan_group definition is moved to a private header.
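A sketch of a reference-counted vid list, assuming (as in the bond/team case) that several users may ask for the same vid on one lower device. The names are hypothetical and the hardware filter is represented by counters: the filter is programmed only for the first user and cleared only when the last user goes away.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct vid_entry {
	uint16_t vid;
	unsigned int refcount;
	struct vid_entry *next;
};

struct vid_list {
	struct vid_entry *head;
	unsigned int hw_adds;  /* times the rx filter would be programmed */
	unsigned int hw_kills; /* times the rx filter entry would be removed */
};

static int vid_get(struct vid_list *l, uint16_t vid)
{
	struct vid_entry *e;

	for (e = l->head; e; e = e->next)
		if (e->vid == vid) {
			e->refcount++; /* already filtered: just take a reference */
			return 0;
		}
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->vid = vid;
	e->refcount = 1;
	e->next = l->head;
	l->head = e;
	l->hw_adds++;          /* first user: add to the hardware filter */
	return 0;
}

static void vid_put(struct vid_list *l, uint16_t vid)
{
	struct vid_entry **pp;

	for (pp = &l->head; *pp; pp = &(*pp)->next)
		if ((*pp)->vid == vid) {
			struct vid_entry *e = *pp;

			if (--e->refcount == 0) { /* last user gone */
				*pp = e->next;
				free(e);
				l->hw_kills++;
			}
			return;
		}
}
```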

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
87002b03ba net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls
This patch adds wrappers for the ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid
functions. The check for the NETIF_F_HW_VLAN_FILTER feature is done in
these wrappers.
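A sketch of the wrapper pattern for the add side (the kill side would be analogous). Assumptions: `MODEL_F_HW_VLAN_FILTER` stands in for NETIF_F_HW_VLAN_FILTER, the ops struct is a cut-down hypothetical version of the driver ops, and `sample_rx_add_vid()` is an invented driver callback that rejects vid 4095 so that error propagation can be seen.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_F_HW_VLAN_FILTER (1u << 0) /* stand-in for NETIF_F_HW_VLAN_FILTER */

struct model_dev;

struct model_ops {
	int (*rx_add_vid)(struct model_dev *dev, uint16_t vid);
};

struct model_dev {
	unsigned int features;
	const struct model_ops *ops;
};

/* Only call into the driver if it advertises a hardware VLAN filter. */
static int model_vlan_vid_add(struct model_dev *dev, uint16_t vid)
{
	if ((dev->features & MODEL_F_HW_VLAN_FILTER) && dev->ops->rx_add_vid)
		return dev->ops->rx_add_vid(dev, vid); /* propagate driver error */
	return 0; /* no hardware filter: nothing to program */
}

/* Example driver callback (hypothetical): the filter rejects vid 4095. */
static int sample_rx_add_vid(struct model_dev *dev, uint16_t vid)
{
	(void)dev;
	return vid == 4095 ? -EINVAL : 0;
}
```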

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
8e586137e6 net: make vlan ndo_vlan_rx_[add/kill]_vid return error value
Let the caller know the result of adding/removing a vlan id to/from the
vlan filter.

In some drivers these functions simply return 0. In drivers that can
tell whether the hardware setup went correctly, the return value is
set appropriately.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:37 -05:00
Jiri Pirko
7da82c06de vlan: rename vlan_dev_info to vlan_dev_priv
As this structure is private data (priv), name it appropriately. Also use
the name "vlan" for pointers to it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:51:30 -05:00
Srivatsa S. Bhat
9b6fc5dc87 PM / Sleep: Make [un]lock_system_sleep() generic
The [un]lock_system_sleep() APIs were originally introduced to mutually
exclude memory hotplug and hibernation.

Directly using mutex_lock(&pm_mutex) to achieve mutual exclusion with
suspend or hibernation code can lead to freezing failures. However, the
APIs [un]lock_system_sleep() can be safely used to achieve the same,
without causing freezing failures.

So, since it would be beneficial to modify all the existing users of
mutex_lock(&pm_mutex) (in all parts of the kernel) to use these safe
APIs instead, make these APIs generic by removing the restriction that
they work only when CONFIG_HIBERNATE_CALLBACKS is set. Moreover, that
restriction didn't buy us anything anyway.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-08 23:22:21 +01:00
Srivatsa S. Bhat
33e638b907 PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
Now that freezer_count() and freezer_do_not_count() don't have the restriction
that they are effective only when called by userspace processes, use
them in lock_system_sleep() and unlock_system_sleep() instead of open-coding
their parts.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-08 23:22:09 +01:00
Rafael J. Wysocki
43753e58b1 Merge branch 'pm-freezer' into pm-sleep
* pm-freezer:
  PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
2011-12-08 23:21:25 +01:00
Srivatsa S. Bhat
467de1fc67 PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
At present, the functions freezer_count() and freezer_do_not_count()
impose the restriction that they are effective only for userspace processes.
However, these functions have now found more utility than originally
intended by the commit which introduced them: ba96a0c8 (freezer:
fix vfork problem). Moreover, even the vfork issue does not actually
need the above restriction in these functions.

So, modify these functions to make them work even for kernel threads, so
that they can be used at other places in the kernel, where the userspace
restriction doesn't apply.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-08 23:21:01 +01:00
Tejun Heo
7bd0b0f0da memblock: Reimplement memblock allocation using reverse free area iterator
Now that all early memory information is in memblock when enabled, we
can implement a reverse free area iterator and use it to implement a
NUMA-aware allocator, which is then wrapped for simpler variants, instead
of the confusing and inefficient mending of information in a separate
NUMA-aware allocator.

Implement for_each_free_mem_range_reverse(), use it to reimplement
memblock_find_in_range_node() which in turn is used by all allocators.
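A sketch of a top-down search over free ranges visited in reverse order, which is our reading of how a reverse free area iterator serves an allocator. Assumptions: the free ranges (memory minus reserved regions) are precomputed and sorted by ascending address, node filtering is omitted, `align` is a power of two, and 0 doubles as "not found". All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A free physical range [base, end). */
struct free_range {
	uint64_t base;
	uint64_t end;
};

/* Return the highest 'align'-aligned address where 'size' bytes fit
 * inside both a free range and [start, end), or 0 if none does. */
static uint64_t find_in_range_top_down(const struct free_range *fr, size_t n,
				       uint64_t start, uint64_t end,
				       uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {            /* reverse iteration */
		uint64_t lo = fr[i].base > start ? fr[i].base : start;
		uint64_t hi = fr[i].end < end ? fr[i].end : end;
		uint64_t cand;

		if (hi <= lo || hi - lo < size)
			continue;
		cand = (hi - size) & ~(align - 1); /* round down to alignment */
		if (cand >= lo)
			return cand;
	}
	return 0;
}
```

Allocating from the top keeps low memory free for users that need it, and the simpler allocator variants can be thin wrappers that pass different start/end limits.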

The visible allocator interface is inconsistent and can probably use
some cleanup too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:09 -08:00
Tejun Heo
0ee332c145 memblock: Kill early_node_map[]
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEMBLOCK_NODE_MAP -
there's no user of early_node_map[] left.  Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately a one-to-one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them.  Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
 CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
 mmzone.h.  Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:09 -08:00
Tejun Heo
7fb0bc3f06 memblock: Implement memblock_add_node()
Implement memblock_add_node() which can add a new memblock memory
region with specific node ID.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:08 -08:00
Tejun Heo
1aadc0560f memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
The only function of memblock_analyze() is now allowing resize of
memblock region arrays.  Rename it to memblock_allow_resize() and
update its users.

* The following users remain the same other than renaming.

  arm/mm/init.c::arm_memblock_init()
  microblaze/kernel/prom.c::early_init_devtree()
  powerpc/kernel/prom.c::early_init_devtree()
  openrisc/kernel/prom.c::early_init_devtree()
  sh/mm/init.c::paging_init()
  sparc/mm/init_64.c::paging_init()
  unicore32/mm/init.c::uc32_memblock_init()

* In the following users, analyze was used to update total size which
  is no longer necessary.

  powerpc/kernel/machine_kexec.c::reserve_crashkernel()
  powerpc/kernel/prom.c::early_init_devtree()
  powerpc/mm/init_32.c::MMU_init()
  powerpc/mm/tlb_nohash.c::__early_init_mmu()
  powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
  powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
  sh/kernel/machine_kexec.c::reserve_crashkernel()

* x86/kernel/e820.c::memblock_x86_fill() was directly setting
  memblock_can_resize before populating memblock and calling analyze
  afterwards.  Call memblock_allow_resize() before starting to populate.

memblock_can_resize is now static inside memblock.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:08 -08:00
Tejun Heo
1440c4e2c9 memblock: Track total size of regions automatically
The total size of memory regions was calculated by memblock_analyze(),
requiring the function to be called explicitly between operations which
can change memory regions and possible users of the total size, which is
cumbersome and fragile.

This patch makes each memblock_type track its total size automatically,
with minor modifications to the memblock manipulation functions, and removes
the requirement to call memblock_analyze().  [__]memblock_dump_all()
now also dumps the total size of reserved regions.
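A sketch of keeping the total in sync at every mutation instead of recomputing it in a separate analyze pass. The fixed-size container and the names are hypothetical, and the real memblock code also merges and splits regions, which this model omits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_REGIONS 8

struct model_region {
	uint64_t base;
	uint64_t size;
};

/* memblock_type-like container that keeps total_size up to date. */
struct model_type {
	size_t cnt;
	uint64_t total_size;
	struct model_region regions[MODEL_MAX_REGIONS];
};

static int model_add_region(struct model_type *t, uint64_t base, uint64_t size)
{
	if (t->cnt == MODEL_MAX_REGIONS)
		return -1;
	t->regions[t->cnt].base = base;
	t->regions[t->cnt].size = size;
	t->cnt++;
	t->total_size += size;         /* updated on every insertion */
	return 0;
}

/* Caller must pass idx < t->cnt. */
static void model_remove_region(struct model_type *t, size_t idx)
{
	t->total_size -= t->regions[idx].size; /* ...and on every removal */
	memmove(&t->regions[idx], &t->regions[idx + 1],
		(t->cnt - idx - 1) * sizeof(t->regions[0]));
	t->cnt--;
}
```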

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
2011-12-08 10:22:08 -08:00