498657a478 incorrectly assumed
that preempt wasn't disabled around context_switch() and thus
was fixing imaginary problem. It also broke KVM because it
depended on ->sched_in() to be called with irq enabled so that
it can do smp calls from there.
Revert the incorrect commit and add comment describing different
contexts under with the two callbacks are invoked.
Avi: spotted transposed in/out in the added comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Keyboard handler should not attempt to traverse handler->h_list on
its own, without any locking, otherwise it races with registering
and unregistering of input handles which leads to crashes.
Introduce input_handler_for_each_handle() helper that allows safely
iterate over all handles attached to a particular handler and switch
keyboard handler to use it.
Reported-by: Jim Paradis <jparadis@redhat.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
I will need this shortly to implement network namespace shutdown
batching. For sanity sake network devices should be removed in
the reverse order they were created in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use only one prof_sysenter_enable() instead of
prof_sysenter_enable_##sname()
use only one prof_sysenter_disable() instead of
prof_sysenter_disable_##sname()
use only one prof_sysexit_enable() instead of
prof_sysexit_enable_##sname()
use only one prof_sysexit_disable() instead of
prof_sysexit_disable_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A1.8060304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements a new method by which hw_random hardware drivers
can pass data to the core more efficiently, using a shared buffer.
The old methods have been retained as a compatability layer until all the
drivers have been updated.
Signed-off-by: Ian Molton <ian.molton@collabora.co.uk>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
CacheFiles: Don't log lookup/create failing with ENOBUFS
CacheFiles: Catch an overly long wait for an old active object
CacheFiles: Better showing of debugging information in active object problems
CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
CacheFiles: Handle truncate unlocking the page we're reading
CacheFiles: Don't write a full page if there's only a partial page to cache
FS-Cache: Actually requeue an object when requested
FS-Cache: Start processing an object's operations on that object's death
FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
FS-Cache: Add a retirement stat counter
FS-Cache: Handle pages pending storage that get evicted under OOM conditions
FS-Cache: Handle read request vs lookup, creation or other cache failure
FS-Cache: Don't delete pending pages from the page-store tracking tree
FS-Cache: Fix lock misorder in fscache_write_op()
FS-Cache: The object-available state can't rely on the cookie to be available
FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
...
This file breaks out the SuperH PFC code from
arch/sh/kernel/gpio.c + arch/sh/include/asm/gpio.h
to drivers/sh/pfc.c + include/linux/sh_pfc.h.
Similar to the INTC stuff. The non-SuperH specific
file location makes it possible to share the code
between multiple architectures.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch moves the KEYSC header file from the
SuperH specific asm directory to a place where
it can be shared by multiple architectures.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch adds platform data with a function for power
control to the SDHI driver. The idea is that board specific
code can provide their own functions so power can be enabled
and disabled for the sd-cards.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
fork() clones all thread_info flags, including
TIF_USER_RETURN_NOTIFY; if the new task is first scheduled on a cpu
which doesn't have user return notifiers set, this causes user
return notifiers to trigger without any way of clearing itself.
This is easy to trigger with a forky workload on the host in
parallel with kvm, resulting in a cpu in an endless loop on the
verge of returning to userspace.
Fix by dropping the TIF_USER_RETURN_NOTIFY immediately after fork.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1259505288-16559-1-git-send-email-avi@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is an interface to set, delete and flush PMKIDs through nl80211.
Main users would be fullmac devices which firmwares are capable of
generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Kernel breakpoints are created using functions in which we pass
breakpoint parameters as individual variables: address, length
and type.
Although it fits well for x86, this just does not scale across
architectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.
Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In-kernel user breakpoints are created using functions in which
we pass breakpoint parameters as individual variables: address,
length and type.
Although it fits well for x86, this just does not scale across
archictectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.
Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the UP/DOWN state of VLANs is synchronized to the state of the
underlying device, meaning all VLANs are set down once the underlying
device is set down. This causes all routes to the VLAN devices to vanish.
Add a flag to specify a "loose binding" mode, in which only the operstate
is transfered, but the VLAN device state is independant.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.
This depends on a matching patch to iproute2
in order to become accessible in user land.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.
Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of msecs_to_jiffies() for nsecs_to_cputime() have some
problems:
- The type of msecs_to_jiffies()'s argument is unsigned int, so
it cannot convert msecs greater than UINT_MAX = about 49.7 days.
- msecs_to_jiffies() returns MAX_JIFFY_OFFSET if MSB of argument
is set, assuming that input was negative value. So it cannot
convert msecs greater than INT_MAX = about 24.8 days too.
This patch defines a new function nsecs_to_jiffies() that can
deal greater values, and that can deal all incoming values as
unsigned.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Amrico Wang <xiyou.wangcong@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
LKML-Reference: <4B0E16E7.5070307@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Functions task_{u,s}time() are called in pair in almost all
cases. However task_stime() is implemented to call task_utime()
from its inside, so such paired calls run task_utime() twice.
It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime which can be obtained at same time by one set
of divisions.
This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in better, optimized
way.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
LKML-Reference: <4B0E16AE.906@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There seems to be a regression in direct write path due to following
commit in for-2.6.33 branch of block tree.
commit 1af60fbd75
Author: Jeff Moyer <jmoyer@redhat.com>
Date: Fri Oct 2 18:56:53 2009 -0400
block: get rid of the WRITE_ODIRECT flag
Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets
the NOIDLE flag in bio and hence in request. This tells CFQ to not expect
more request from the queue and not idle on it (despite the fact that
queue's think time is less and it is not seeky).
So direct writers lose big time when competing with sequential readers.
Using fio, I have run one direct writer and two sequential readers and
following are the results with 2.6.32-rc7 kernel and with for-2.6.33
branch.
Test
====
1 direct writer and 2 sequential reader running simultaneously.
[global]
directory=/mnt/sdc/fio/
runtime=10
[seqwrite]
rw=write
size=4G
direct=1
[seqread]
rw=read
size=2G
numjobs=2
2.6.32-rc7
==========
direct writes: aggrb=2,968KB/s
readers : aggrb=101MB/s
for-2.6.33 branch
=================
direct write: aggrb=19KB/s
readers aggrb=137MB/s
This patch brings back the WRITE_ODIRECT flag, with the difference that we
don't set the BIO_RW_UNPLUG flag so that device is not unplugged after
submission of request and an explicit unplug from submitter is required.
That way we fix the jeff's issue of not enough merging taking place in aio
path as well as make sure direct writes get their fair share.
After the fix
=============
for-2.6.33 + fix
----------------
direct writes: aggrb=2,728KB/s
reads: aggrb=103MB/s
Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Mtdblock driver doesn't call flush_dcache_page for pages in request. So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases. The patch fixes this.
The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op. Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.
See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.
Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
does: does it define an event as well beyond defining a template?
To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
the various 'DECLARE_*()' idioms we already have in the kernel:
DECLARE_EVENT_CLASS(class)
DEFINE_EVENT(class, event1)
DEFINE_EVENT(class, event2)
DEFINE_EVENT(class, event3)
To complete this logic we should also rename TRACE_EVENT() to:
DEFINE_SINGLE_EVENT(single_event)
... but in a more quiet moment of the kernel cycle.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The new XFRMA_ALG_AUTH_TRUNC attribute taking a xfrm_algo_auth as
argument allows the installation of authentication algorithms with
a truncation length specified in userspace, i.e. SHA256 with 128 bit
instead of 96 bit truncation.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce pci_is_pcie() which returns true if the specified PCI device
is PCI Express capable, false otherwise.
The purpose of pci_is_pcie() is removing 'is_pcie' flag in the struct
pci_dev, which is not needed because we can check it using 'pcie_cap'
field. To remove 'is_pcie', we need to update user of 'is_pcie' to use
pci_is_pcie() instead first.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce pci_pcie_cap() API that returns saved PCIe capability offset
(currently it is saved in 'pcie_cap' field in the struct PCI dev).
Using pci_pcie_cap() instead of pci_find_capability() avoids
unnecessary search in PCI configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After creating the TRACE_EVENT_TEMPLATE I started to look at other
trace points to see what duplication was made. I noticed that there
are several trace points where they are almost identical except for
the name and the output format. Since TRACE_EVENT_TEMPLATE was successful
in bringing down the size of trace events, I added a DEFINE_EVENT_PRINT.
DEFINE_EVENT_PRINT is used just like DEFINE_EVENT is. That is, the
DEFINE_EVENT_PRINT also uses a TRACE_EVENT_TEMPLATE, but it allows the
developer to overwrite the print format. If there are two or more
TRACE_EVENTS that are identical except for the name and print, then
they can be converted to use a TRACE_EVENT_TEMPLATE. Since the
TRACE_EVENT_TEMPLATE already does the print output, the first trace event
would have its print format held in the TRACE_EVENT_TEMPLATE and
be defined with a DEFINE_EVENT. The rest will use the DEFINE_EVENT_PRINT
and override the print format.
Converting the sched trace points to both DEFINE_EVENT and
DEFINE_EVENT_PRINT. Five were converted to DEFINE_EVENT and two were
converted to DEFINE_EVENT_PRINT.
I was able to get the following:
$ size kernel/sched.o-*
text data bss dec hex filename
79299 6776 2520 88595 15a13 kernel/sched.o-notrace
101941 11896 2584 116421 1c6c5 kernel/sched.o-templ
104779 11896 2584 119259 1d1db kernel/sched.o-trace
sched.o-notrace is the scheduler compiled with no trace points.
sched.o-templ is with the use of DEFINE_EVENT and DEFINE_EVENT_PRINT
sched.o-trace is the current trace events.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are some places in the kernel that define several tracepoints and
they are all identical besides the name. The code to enable, disable and
record is created for every trace point even if most of the code is
identical.
This patch adds TRACE_EVENT_TEMPLATE that lets the developer create
a template TRACE_EVENT and create trace points with DEFINE_EVENT, which
is based off of a given template. Each trace point used by this
will share most of the code, and bring down the size of the kernel
when there are several duplicate events.
Usage is:
TRACE_EVENT_TEMPLATE(name, proto, args, tstruct, assign, print);
Which would be the same as defining a normal TRACE_EVENT.
To create the trace events that the trace points will use:
DEFINE_EVENT(template, name, proto, args) is done. The template
is the name of the TRACE_EVENT_TEMPLATE to use. The name is the
name of the trace point. The parameters proto and args must be the same
as the proto and args of the template. If they are not the same,
then a compile error will result. I tried hard removing this duplication
but the C preprocessor is not powerful enough (or my CPP magic
experience points is not at a high enough level) to not need them.
A lot of trace events are coming in with new XFS development. Most of
the trace points are identical except for the name. The following shows
the advantage of having TRACE_EVENT_TEMPLATE:
$ size fs/xfs/xfs.o.*
text data bss dec hex filename
452114 2788 3520 458422 6feb6 fs/xfs/xfs.o.old
638482 38116 3744 680342 a6196 fs/xfs/xfs.o.template
996954 38116 4480 1039550 fdcbe fs/xfs/xfs.o.trace
xfs.o.old is without any tracepoints.
xfs.o.template uses the new TRACE_EVENT_TEMPLATE.
xfs.o.trace uses the current TRACE_EVENT macros.
Requested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.
Why is this needed:
Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations
Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.
All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq
Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations
# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation
CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>