We still can apply DaveM's generation count optimization to
BFS, based on the following idea:
- before doing each BFS, increase the global generation id
by 1
- if one node in the graph has been visited, mark it as
visited by storing the current global generation id into
the node's dep_gen_id field
- so we can decide if one node has been visited already, by
comparing the node's dep_gen_id with the global generation id.
By applying DaveM's generation count optimization to current
implementation of BFS, we gain the following advantages:
- we save MAX_LOCKDEP_ENTRIES/8 bytes memory;
- we remove the bitmap_zero(bfs_accessed, MAX_LOCKDEP_ENTRIES);
in each BFS, which is very time-consuming since
MAX_LOCKDEP_ENTRIES may be very large.(16384UL)
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "David S. Miller" <davem@davemloft.net>
LKML-Reference: <1248274089-6358-1-git-send-email-tom.leiming@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
spin_lock_nest_lock() allows to take many instances of the same
class, this can easily lead to overflow of MAX_LOCK_DEPTH.
To avoid this overflow, we'll stop accounting instances but
start reference counting the class in the held_lock structure.
[ We could maintain a list of instances, if we'd move the hlock
stuff into __lock_acquired(), but that would require
significant modifications to the current code. ]
We restrict this mode to spin_lock_nest_lock() only, because it
degrades the lockdep quality due to lost of instance.
For lockstat this means we don't track lock statistics for any
but the first lock in the series.
Currently nesting is limited to 11 bits because that was the
spare space available in held_lock. This yields a 2048
instances maximium.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a lockdep helper to validate that we indeed are the owner
of a lock.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We currently have an explicit "needs_post" vtable method which
returns a stack variable for whether we should later run
post-schedule. This leads to an awkward exchange of the
variable as it bubbles back up out of the context switch. Peter
Zijlstra observed that this information could be stored in the
run-queue itself instead of handled on the stack.
Therefore, we revert to the method of having context_switch
return void, and update an internal rq->post_schedule variable
when we require further processing.
In addition, we fix a race condition where we try to access
current->sched_class without holding the rq->lock. This is
technically racy, as the sched-class could change out from
under us. Instead, we reference the per-rq post_schedule
variable with the runqueue unlocked, but with preemption
disabled to see if we need to reacquire the rq->lock.
Finally, we clean the code up slightly by removing the #ifdef
CONFIG_SMP conditionals from the schedule() call, and implement
some inline helper functions instead.
This patch passes checkpatch, and rt-migrate.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The might_sleep() test inside cond_resched_lock() assumes the
spinlock is held and then preemption is disabled. This is true
with CONFIG_PREEMPT but the preempt_count() doesn't change
otherwise.
Check by starting from the appropriate preempt offset depending
on the config.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248458723-12146-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to be able to distinguish between no samples due to
inactivity and no samples due to task ended, Arjan asked for
PERF_EVENT_EXIT events. This is useful to the boot delay
instrumentation (bootchart) app.
This patch changes the PERF_EVENT_FORK to be emitted on every
clone, and adds PERF_EVENT_EXIT to be emitted on task exit,
after the task's counters have been closed.
This task tracing is controlled through: attr.comm || attr.mmap
and through the new attr.task field.
Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ cleaned up perf_counter.h a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds new line discipline name an number to include/linux/tty.h. The
line discipline will be used by the Amstrad E3 (Delta) sound driver that will
come next in this series of patches.
Created against linux-2.6.31-rc3.
Applies to linux-omap-2.6 commit 7c5cb7862d32cb344be7831d466535d5255e35ac as
well.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
io context: fix ref counting
block: make the end_io functions be non-GPL exports
block: fix improper kobject release in blk_integrity_unregister
block: always assign default lock to queues
mg_disk: Add missing ready status check on mg_write()
mg_disk: fix issue with data integrity on error in mg_write()
mg_disk: fix reading invalid status when use polling driver
mg_disk: remove prohibited sleep operation
the code allready uses flush_kernel_dcache_page(). This patch updates the
driver to the recent sg API changes which require that either SG_MITER_TO_SG
or SG_MITER_FROM_SG is set. SG_MITER_TO_SG calls flush_kernel_dcache_page()
in sg_mitter_stop()
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
sg_miter_start() is currently unaware of the direction of the copy
process (to or from the scatter list). It is important to know the
direction because the page has to be flushed in case the data written
is seen on a different mapping in user land on cache incoherent
architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <pierre@ossman.eu>
Header to define the standard registers for an
PL093 SSMC memory controller.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
This adds a function that indicates that a battery is fully charged.
It also includes functions to get a power_supply device from the class
of registered devices by name reference. These can be used to find a
specific battery to call power_supply_set_battery_charged() on.
Some battery drivers might need this information to calibrate
themselves.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Ian Molton <spyro@f2s.com>
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
I've been doing this for years, and akpm picked me up on it about 12
months ago. lguest partly serves as example code, so let's do it Right.
Also, remove two unused fields in struct vblk_info in the example launcher.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
I don't really notice it (except to begrudge the extra vertical
space), but Ingo does. And he pointed out that one excuse of lguest
is as a teaching tool, it should set a good example.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Once a structure goes over PAGE_SIZE*2, we see occasional allocation
failures. Some people have chosen to switch over to things like vmalloc()
that will let them keep array-like access to such a large structures.
But, vmalloc() has plenty of downsides.
Here's an alternative. I think it's what Andrew was suggesting here:
http://lkml.org/lkml/2009/7/2/518
I call it a flexible array. It does all of its work in PAGE_SIZE bits, so
never does an order>0 allocation. The base level has
PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level.
So, with a 32-bit arch, you get about 4MB (4183112 bytes) of total
storage when the objects pack nicely into a page. It is half that on
64-bit because the pointers are twice the size. There's a table detailing
this in the code.
There are kerneldocs for the functions, but here's an
overview:
flex_array_alloc() - dynamically allocate a base structure
flex_array_free() - free the array and all of the
second-level pages
flex_array_free_parts() - free the second-level pages, but
not the base (for static bases)
flex_array_put() - copy into the array at the given index
flex_array_get() - copy out of the array at the given index
flex_array_prealloc() - preallocate the second-level pages
between the given indexes to
guarantee no allocs will occur at
put() time.
We could also potentially just pass the "element_size" into each of the
API functions instead of storing it internally. That would get us one
more base pointer on 32-bit.
I've been testing this by running it in userspace. The header and patch
that I've been using are here, as well as the little script I'm using to
generate the size table which goes in the kerneldocs.
http://sr71.net/~dave/linux/flexarray/
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit ec64f51545 ("cgroup: fix
frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg)
doesn't return -EBUSY by temporary ref counts. That commit expects all
refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can
wait permanently. This patch tries to fix that and change followings.
- set CGRP_WAIT_ON_RMDIR flag before pre_destroy().
- clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case.
if there are sleeping ones, wakes them up.
- rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set.
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Balbir Sigh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bug was introduced by commit cc31edceee
("cgroups: convert tasks file to use a seq_file with shared pid array").
We cache a pid array for all threads that are opening the same "tasks"
file, but the pids in the array are always from the namespace of the
last process that opened the file, so all other threads will read pids
from that namespace instead of their own namespaces.
To fix it, we maintain a list of pid arrays, which is keyed by pid_ns.
The list will be of length 1 at most time.
Reported-by: Paul Menage <menage@google.com>
Idea-by: Paul Menage <menage@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: accept late unlocking of HPA
libata: Updates and fixes for pata_at91 driver
ata_piix: Add new short cable ID
ata_piix: Add new laptop short cable IDs
ahci: add device IDs for Ibex Peak ahci controllers
libata: remove superfluous NULL pointer checks
libata: add missing NULL pointer check to ata_eh_reset()
pata_pcmcia: add CNF-CDROM-ID
We really don't want to mark the pty as a low-latency device, because as
Alan points out, the ->write method can be called from an IRQ (ppp?),
and that means we can't use ->low_latency=1 as we take mutexes in the
low_latency case.
So rather than using low_latency to force the written data to be pushed
to the ldisc handling at 'write()' time, just make the reader side (or
the poll function) do the flush when it checks whether there is data to
be had.
This also fixes the problem with lost data in an emacs compile buffer
(bugzilla 13815), and we can thus revert the low_latency pty hack
(commit 3a54297478: "pty: quickfix for the
pty ENXIO timing problems").
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[ Modified to do the tty_flush_to_ldisc() inside input_available_p() so
that it triggers for both read and poll() - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create bdgrab(). This function copies an existing reference to a
block_device. It is safe to call from any context.
Hibernation code wishes to copy a reference to the active swap device.
Right now it calls bdget() under a spinlock, but this is wrong because
bdget() can sleep. It doesn't need a full bdget() because we already
hold a reference to active swap devices (and the spinlock protects
against swapoff).
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
On certain configurations, HPA isn't or can't be unlocked during
probing but it somehow ends up unlocked afterwards. In the following
thread, the problem can be reliably reproduced after resuming from
STR. The BIOS turns on HPA during boot but forgets to do it during
resume.
http://thread.gmane.org/gmane.linux.kernel/858310
This patch updates libata revalidation such that it considers native
n_sectors. If the device size has increased to match native
n_sectors, it's assumed that HPA has been unlocked involuntarily and
the device is recognized as the same one. This should be fairly safe
while nicely working around the problem.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christof Warlich <christof@warlich.name>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.
The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceeded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.
The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.
nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.
Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache.
All other errors get cached.
Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.
For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This converts swiotlb to use phys_to_dma and dma_to_phys instead of
swiotlb_phys_to_bus() and swiotlb_bus_to_phys().
swiotlb_phys_to_bus() and swiotlb_bus_to_phys() are not necessary so
this patch also removes them.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
is_buffer_dma_capable() was replaced with dma_capable().
is_buffer_dma_capable() tells if a buffer is dma-capable or
not. However, it doesn't take a pointer to struct device so it doesn't
work for POWERPC.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Even though reverse path filter was changed from simple boolean to
trinary control, the loose mode only works if both all and device are
configured because of this logic error.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: use GFP_NOFS under potential memory pressure
fsnotify: fix inotify tail drop check with path entries
inotify: check filename before dropping repeat events
fsnotify: use def_bool in kconfig instead of letting the user choose
inotify: fix error paths in inotify_update_watch
inotify: do not leak inode marks in inotify_add_watch
inotify: drop user watch count when a watch is removed
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
cnic: Fix ISCSI_KEVENT_IF_DOWN message handling.
net: irda: init spinlock after memcpy
ixgbe: fix for 82599 errata marking UDP checksum errors
r8169: WakeOnLan fix for the 8168
netxen: reset ring consumer during cleanup
net/bridge: use kobject_put to release kobject in br_add_if error path
smc91x.h: add config for Nomadik evaluation kit
NET: ROSE: Don't use static buffer.
eepro: Read buffer overflow
tokenring: Read buffer overflow
at1700: Read buffer overflow
fealnx: Write outside array bounds
ixgbe: remove unnecessary call to device_init_wakeup
ixgbe: Don't priority tag control frames in DCB mode
ixgbe: Enable FCoE offload when DCB is enabled for 82599
net: Rework mdio-ofgpio driver to use of_mdio infrastructure
register at91_ether using platform_driver_probe
skge: Enable WoL by default if supported
net: KS8851 needs to depend on MII
be2net: Bug fix in the non-lro path. Size of received packet was not updated in statistics properly.
...
In order to make cfg80211/nl80211 aware of network namespaces,
we have to do the following things:
* del_virtual_intf method takes an interface index rather
than a netdev pointer - simply change this
* nl80211 uses init_net a lot, it changes to use the sender's
network namespace
* scan requests use the interface index, hold a netdev pointer
and reference instead
* we want a wiphy and its associated virtual interfaces to be
in one netns together, so
- we need to be able to change ns for a given interface, so
export dev_change_net_namespace()
- for each virtual interface set the NETIF_F_NETNS_LOCAL
flag, and clear that flag only when the wiphy changes ns,
to disallow breaking this invariant
* when a network namespace goes away, we need to reparent the
wiphy to init_net
* cfg80211 users that support creating virtual interfaces must
create them in the wiphy's namespace, currently this affects
only mac80211
The end result is that you can now switch an entire wiphy into
a different network namespace with the new command
iw phy#<idx> set netns <pid>
and all virtual interfaces will follow (or the operation fails).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (34 commits)
V4L/DVB (12303): cx23885: check pointers before dereferencing in dprintk macro
V4L/DVB (12302): cx23885-417: fix broken IOCTL handling
V4L/DVB (12300): bttv: fix regression: tvaudio must be loaded before tuner
V4L/DVB (12291): b2c2: fix frontends compiled into kernel
V4L/DVB (12286): sn9c20x: reorder includes to be like other drivers
V4L/DVB (12284): gspca - jpeg subdrivers: Check the result of kmalloc(jpeg header).
V4L/DVB (12283): gspca - sn9c20x: New subdriver for sn9c201 and sn9c202 bridges.
V4L/DVB (12282): gspca - main: Support for vidioc_g_chip_ident and vidioc_g/s_register.
V4L/DVB (12269): af9013: auto-detect parameters in case of garbage given by app
V4L/DVB (12267): gspca - sonixj: Bad sensor init of non ov76xx sensors.
V4L/DVB (12265): em28xx: fix tuning problem in HVR-900 (R1)
V4L/DVB (12263): em28xx: set demod profile for Pinnacle Hybrid Pro 320e
V4L/DVB (12262): em28xx: Make sure the tuner is initialized if generic empia USB id was used
V4L/DVB (12261): em28xx: set GPIO properly for Pinnacle Hybrid Pro analog support
V4L/DVB (12260): em28xx: make support work for the Pinnacle Hybrid Pro (eb1a:2881)
V4L/DVB (12258): em28xx: fix typo in mt352 init sequence for Terratec Cinergy T XS USB
V4L/DVB (12257): em28xx: make tuning work for Terratec Cinergy T XS USB (mt352 variant)
V4L/DVB (12245): em28xx: add support for mt9m001 webcams
V4L/DVB (12244): em28xx: adjust vinmode/vinctl based on the stream input format
V4L/DVB (12243): em28xx: allow specifying sensor xtal frequency
...
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm table: pass correct dev area size to device_area_is_valid
dm: remove queue next_ordered workaround for barriers
dm raid1: wake kmirrord when requeueing delayed bios after remote recovery
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
jbd: fix race between write_metadata_buffer and get_write_access
ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()
jbd: Fix a race between checkpointing code and journal_get_write_access()
ext3: Fix truncation of symlinks after failed write
jbd: Fail to load a journal if it is too short
1. add intel's sdio vendor id to sdio_ids.h
2. move iwmc3200 sdio devices' ids to sdio_ids.h
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>