swap can do cluster discard for SSD, which is good, but there are some
problems here:
1. swap do the discard just before page reclaim gets a swap entry and
writes the disk sectors. This is useless for high end SSD, because an
overwrite to a sector implies a discard to original sector too. A
discard + overwrite == overwrite.
2. the purpose of doing discard is to improve SSD firmware garbage
collection. Idealy we should send discard as early as possible, so
firmware can do something smart. Sending discard just after swap entry
is freed is considered early compared to sending discard before write.
Of course, if workload is already bound to gc speed, sending discard
earlier or later doesn't make
3. block discard is a sync API, which will delay scan_swap_map()
significantly.
4. Write and discard command can be executed parallel in PCIe SSD.
Making swap discard async can make execution more efficiently.
This patch makes swap discard async and moves discard to where swap entry
is freed. Discard and write have no dependence now, so above issues can
be avoided. Idealy we should do discard for any freed sectors, but some
SSD discard is very slow. This patch still does discard for a whole
cluster.
My test does a several round of 'mmap, write, unmap', which will trigger a
lot of swap discard. In a fusionio card, with this patch, the test
runtime is reduced to 18% of the time without it, so around 5.5x faster.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm using a fast SSD to do swap. scan_swap_map() sometimes uses up to
20~30% CPU time (when cluster is hard to find, the CPU time can be up to
80%), which becomes a bottleneck. scan_swap_map() scans a byte array to
search a 256 page cluster, which is very slow.
Here I introduced a simple algorithm to search cluster. Since we only
care about 256 pages cluster, we can just use a counter to track if a
cluster is free. Every 256 pages use one int to store the counter. If
the counter of a cluster is 0, the cluster is free. All free clusters
will be added to a list, so searching cluster is very efficient. With
this, scap_swap_map() overhead disappears.
This might help low end SD card swap too. Because if the cluster is
aligned, SD firmware can do flash erase more efficiently.
We only enable the algorithm for SSD. Hard disk swap isn't fast enough
and has downside with the algorithm which might introduce regression (see
below).
The patch slightly changes which cluster is choosen. It always adds free
cluster to list tail. This can help wear leveling for low end SSD too.
And if no cluster found, the scan_swap_map() will do search from the end
of last cluster. So if no cluster found, the scan_swap_map() will do
search from the end of last free cluster, which is random. For SSD, this
isn't a problem at all.
Another downside is the cluster must be aligned to 256 pages, which will
reduce the chance to find a cluster. I would expect this isn't a big
problem for SSD because of the non-seek penality. (And this is the reason
I only enable the algorithm for SSD).
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.
UP systems do not do remote TLB flushes, so compile those counters out on
UP.
arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4. It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I was investigating some TLB flush scaling issues and realized that we do
not have any good methods for figuring out how many TLB flushes we are
doing.
It would be nice to be able to do these in generic code, but the
arch-independent calls don't explicitly specify whether we actually need
to do remote flushes or not. In the end, we really need to know if we
actually _did_ global vs. local invalidations, so that leaves us with few
options other than to muck with the counters from arch-specific code.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Read block device partition table from command line. The partition used
for fixed block device (eMMC) embedded device. It is no MBR, save
storage space. Bootloader can be easily accessed by absolute address of
data on the block device. Users can easily change the partition.
This code reference MTD partition, source "drivers/mtd/cmdlinepart.c"
About the partition verbose reference
"Documentation/block/cmdline-partition.txt"
[akpm@linux-foundation.org: fix printk text]
[yongjun_wei@trendmicro.com.cn: fix error return code in parse_parts()]
Signed-off-by: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Karel Zak <kzak@redhat.com>
Cc: "Wanglin (Albert)" <albert.wanglin@huawei.com>
Cc: Marius Groeger <mag@sysgo.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pm-cpufreq:
intel_pstate: Add Haswell CPU models
Revert "cpufreq: make sure frequency transitions are serialized"
cpufreq: Use signed type for 'ret' variable, to store negative error values
cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
cpufreq: Split __cpufreq_remove_dev() into two parts
cpufreq: Fix wrong time unit conversion
cpufreq: serialize calls to __cpufreq_governor()
cpufreq: don't allow governor limits to be changed when it is disabled
Pull battery/power supply driver updates from Anton Vorontsov:
"New drivers:
- APM X-Gene system reboot driver by Feng Kan and Loc Ho (APM).
- Qualcomm MSM reboot/poweroff driver by Abhimanyu Kapur (Codeaurora).
- Texas Instruments BQ24190 charger driver by Mark A. Greer (Animal
Creek Technologies).
- Texas Instruments TWL4030 MADC battery driver by Lukas Märdian and
Marek Belisko (Golden Delicious Computers). The driver is used on
Freerunner GTA04 phones.
Highlighted fixes and improvements:
- Suspend/wakeup logic improvements: power supply objects will block
system suspend until all power supply events are processed. Thanks
to Zoran Markovic (Linaro), Arve Hjonnevag and Todd Poynor (Google)"
* tag 'for-v3.12' of git://git.infradead.org/battery-2.6:
rx51_battery: Fix channel number when reading adc value
power: Add twl4030_madc battery driver.
bq24190_charger: Workaround SS definition problem on i386 builds
power_supply: Prevent suspend until power supply events are processed
vexpress-poweroff: Should depend on the required infrastructure
twl4030-charger: Fix compiler warning with regulator_enable()
rx51_battery: Replace hardcoded channels values.
bq24190_charger: Add support for TI BQ24190 Battery Charger
ab8500-charger: We print an unintended error message
max8925_power: Fix missing of_node_put
power_supply: Replace strict_strtol() with kstrtol()
power: Add APM X-Gene system reboot driver
power_supply: tosa_battery: Get rid of irq_to_gpio usage
power supply: collie_battery: Convert to use dev_pm_ops
power_supply: Make goldfish_battery depend on GOLDFISH || COMPILE_TEST
power: reset: Add msm restart support
MAINTAINERS: drivers/power: add entry for SmartReflex AVS drivers
Pull drm fixes from Dave Airlie:
"Daniel had some fixes queued up, that were delayed, the stolen memory
ones and vga arbiter ones are quite useful, along with his usual bunch
of stuff, nothing for HSW outputs yet.
The one nouveau fix is for a regression I caused with the poweroff stuff"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (30 commits)
drm/nouveau: fix oops on runtime suspend/resume
drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done
drm/i915: try not to lose backlight CBLV precision
drm/i915: Confine page flips to BCS on Valleyview
drm/i915: Skip stolen region initialisation if none is reserved
drm/i915: fix gpu hang vs. flip stall deadlocks
drm/i915: Hold an object reference whilst we shrink it
drm/i915: fix i9xx_crtc_clock_get for multiplied pixels
drm/i915: handle sdvo input pixel multiplier correctly again
drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock
drm/i915: fix up the relocate_entry refactoring
drm/i915: Fix pipe config warnings when dealing with LVDS fixed mode
drm/i915: Don't call sg_free_table() if sg_alloc_table() fails
i915: Update VGA arbiter support for newer devices
vgaarb: Fix VGA decodes changes
vgaarb: Don't disable resources that are not owned
drm/i915: Pin pages whilst mapping the dma-buf
drm/i915: enable trickle feed on Haswell
x86: add early quirk for reserving Intel graphics stolen memory v5
drm/i915: split PCI IDs out into i915_drm.h v4
...
Pull nfsd updates from Bruce Fields:
"This was a very quiet cycle! Just a few bugfixes and some cleanup"
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux:
rpc: let xdr layer allocate gssproxy receieve pages
rpc: fix huge kmalloc's in gss-proxy
rpc: comment on linux_cred encoding, treat all as unsigned
rpc: clean up decoding of gssproxy linux creds
svcrpc: remove unused rq_resused
nfsd4: nfsd4_create_clid_dir prints uninitialized data
nfsd4: fix leak of inode reference on delegation failure
Revert "nfsd: nfs4_file_get_access: need to be more careful with O_RDWR"
sunrpc: prepare NFS for 2038
nfsd4: fix setlease error return
nfsd: nfs4_file_get_access: need to be more careful with O_RDWR
There are no more users of this API, so kill it dead, dead, dead and
quietly bury the corpse in a shallow, unmarked grave in a dark forest deep
in the hills...
[glommer@openvz.org: added flowers to the grave]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The list_lru infrastructure already keeps per-node LRU lists in its
node-specific list_lru_node arrays and provide us with a per-node API, and
the shrinkers are properly equiped with node information. This means that
we can now focus our shrinking effort in a single node, but the work that
is deferred from one run to another is kept global at nr_in_batch. Work
can be deferred, for instance, during direct reclaim under a GFP_NOFS
allocation, where situation, all the filesystem shrinkers will be
prevented from running and accumulate in nr_in_batch the amount of work
they should have done, but could not.
This creates an impedance problem, where upon node pressure, work deferred
will accumulate and end up being flushed in other nodes. The problem we
describe is particularly harmful in big machines, where many nodes can
accumulate at the same time, all adding to the global counter nr_in_batch.
As we accumulate more and more, we start to ask for the caches to flush
even bigger numbers. The result is that the caches are depleted and do
not stabilize. To achieve stable steady state behavior, we need to tackle
it differently.
In this patch we keep the deferred count per-node, in the new array
nr_deferred[] (the name is also a bit more descriptive) and will never
accumulate that to other nodes.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The list_lru implementation has one function, list_lru_dispose_all, with
only one user (the dentry code). At first, such function appears to make
sense because we are really not interested in the result of isolating each
dentry separately - all of them are going away anyway. However, it's
implementation is buggy in the following way:
When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
marking them with DCACHE_SHRINK_LIST. However, this is done without the
nlru->lock taken. The imediate result of that is that someone else may
add or remove the dentry from the LRU at the same time. When list_lru_del
happens in that scenario we will see an element that is not yet marked
with DCACHE_SHRINK_LIST (even though it will be in the future) and
obviously remove it from an lru where the element no longer is. Since
list_lru_dispose_all will in effect count down nlru's nr_items and
list_lru_del will do the same, this will lead to an imbalance.
The solution for this would not be so simple: we can obviously just keep
the lru_lock taken, but then we have no guarantees that we will be able to
acquire the dentry lock (dentry->d_lock). To properly solve this, we need
a communication mechanism between the lru and dentry code, so they can
coordinate this with each other.
Such mechanism already exists in the form of the list_lru_walk_cb
callback. So it is possible to construct a dcache-side prune function
that does the right thing only by calling list_lru_walk in a loop until no
more dentries are available.
With only one user, plus the fact that a sane solution for the problem
would involve boucing between dcache and list_lru anyway, I see little
justification to keep the special case list_lru_dispose_all in tree.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have an LRU list API, we can start to enhance the
implementation. This splits the single LRU list into per-node lists and
locks to enhance scalability. Items are placed on lists according to the
node the memory belongs to. To make scanning the lists efficient, also
track whether the per-node lists have entries in them in a active
nodemask.
Note: We use a fixed-size array for the node LRU, this struct can be very
big if MAX_NUMNODES is big. If this becomes a problem this is fixable by
turning this into a pointer and dynamically allocating this to
nr_node_ids. This quantity is firwmare-provided, and still would provide
room for all nodes at the cost of a pointer lookup and an extra
allocation. Because that allocation will most likely come from a may very
well fail.
[glommer@openvz.org: fix warnings, added note about node lru]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Several subsystems use the same construct for LRU lists - a list head, a
spin lock and and item count. They also use exactly the same code for
adding and removing items from the LRU. Create a generic type for these
LRU lists.
This is the beginning of generic, node aware LRUs for shrinkers to work
with.
[glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert superblock shrinker to use the new count/scan API, and propagate
the API changes through to the filesystem callouts. The filesystem
callouts already use a count/scan API, so it's just changing counters to
longs to match the VM API.
This requires the dentry and inode shrinker callouts to be converted to
the count/scan API. This is mainly a mechanical change.
[glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With the dentry LRUs being per-sb structures, there is no real need for
a global dentry_lru_lock. The locking can be made more fine-grained by
moving to a per-sb LRU lock, isolating the LRU operations of different
filesytsems completely from each other. The need for this is independent
of any performance consideration that may arise: in the interest of
abstracting the lru operations away, it is mandatory that each lru works
around its own lock instead of a global lock for all of them.
[glommer@openvz.org: updated changelog ]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This series reworks our current object cache shrinking infrastructure in
two main ways:
* Noticing that a lot of users copy and paste their own version of LRU
lists for objects, we put some effort in providing a generic version.
It is modeled after the filesystem users: dentries, inodes, and xfs
(for various tasks), but we expect that other users could benefit in
the near future with little or no modification. Let us know if you
have any issues.
* The underlying list_lru being proposed automatically and
transparently keeps the elements in per-node lists, and is able to
manipulate the node lists individually. Given this infrastructure, we
are able to modify the up-to-now hammer called shrink_slab to proceed
with node-reclaim instead of always searching memory from all over like
it has been doing.
Per-node lru lists are also expected to lead to less contention in the lru
locks on multi-node scans, since we are now no longer fighting for a
global lock. The locks usually disappear from the profilers with this
change.
Although we have no official benchmarks for this version - be our guest to
independently evaluate this - earlier versions of this series were
performance tested (details at
http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
visible performance regressions while yielding a better qualitative
behavior in NUMA machines.
With this infrastructure in place, we can use the list_lru entry point to
provide memcg isolation and per-memcg targeted reclaim. Historically,
those two pieces of work have been posted together. This version presents
only the infrastructure work, deferring the memcg work for a later time,
so we can focus on getting this part tested. You can see more about the
history of such work at http://lwn.net/Articles/552769/
Dave Chinner (18):
dcache: convert dentry_stat.nr_unused to per-cpu counters
dentry: move to per-sb LRU locks
dcache: remove dentries from LRU before putting on dispose list
mm: new shrinker API
shrinker: convert superblock shrinkers to new API
list: add a new LRU list type
inode: convert inode lru list to generic lru list code.
dcache: convert to use new lru list infrastructure
list_lru: per-node list infrastructure
shrinker: add node awareness
fs: convert inode and dentry shrinking to be node aware
xfs: convert buftarg LRU to generic code
xfs: rework buffer dispose list tracking
xfs: convert dquot cache lru to list_lru
fs: convert fs shrinkers to new scan/count API
drivers: convert shrinkers to new count/scan API
shrinker: convert remaining shrinkers to count/scan API
shrinker: Kill old ->shrink API.
Glauber Costa (7):
fs: bump inode and dentry counters to long
super: fix calculation of shrinkable objects for small numbers
list_lru: per-node API
vmscan: per-node deferred work
i915: bail out earlier when shrinker cannot acquire mutex
hugepage: convert huge zero page shrinker to new shrinker API
list_lru: dynamically adjust node arrays
This patch:
There are situations in very large machines in which we can have a large
quantity of dirty inodes, unused dentries, etc. This is particularly true
when umounting a filesystem, where eventually since every live object will
eventually be discarded.
Dave Chinner reported a problem with this while experimenting with the
shrinker revamp patchset. So we believe it is time for a change. This
patch just moves int to longs. Machines where it matters should have a
big long anyway.
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For a long time no filesystem has been using vfs_follow_link, and as seen
by recent filesystem submissions any new use is accidental as well.
Remove vfs_follow_link, document the replacement in
Documentation/filesystems/porting and also rename __vfs_follow_link
to match its only caller better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adds the PCI ID's for HP Smart Array Gen9 controllers. Please
consider this patch for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Pull device tree core updates from Grant Likely:
"Generally minor changes. A bunch of bug fixes, particularly for
initialization and some refactoring. Most notable change if feeding
the entire flattened tree into the random pool at boot. May not be
significant, but shouldn't hurt either"
Tim Bird questions whether the boot time cost of the random feeding may
be noticeable. And "add_device_randomness()" is definitely not some
speed deamon of a function.
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
of/platform: add error reporting to of_amba_device_create()
irq/of: Fix comment typo for irq_of_parse_and_map
of: Feed entire flattened device tree into the random pool
of/fdt: Clean up casting in unflattening path
of/fdt: Remove duplicate memory clearing on FDT unflattening
gpio: implement gpio-ranges binding document fix
of: call __of_parse_phandle_with_args from of_parse_phandle
of: introduce of_parse_phandle_with_fixed_args
of: move of_parse_phandle()
of: move documentation of of_parse_phandle_with_args
of: Fix missing memory initialization on FDT unflattening
of: consolidate definition of early_init_dt_alloc_memory_arch()
of: Make of_get_phy_mode() return int i.s.o. const int
include: dt-binding: input: create a DT header defining key codes.
of/platform: Staticize of_platform_device_create_pdata()
of: Specify initrd location using 64-bit
dt: Typo fix
OF: make of_property_for_each_{u32|string}() use parameters if OF is not enabled
Pull slave-dmaengine updates from Vinod Koul:
"This pull brings:
- Andy's DW driver updates
- Guennadi's sh driver updates
- Pl08x driver fixes from Tomasz & Alban
- Improvements to mmp_pdma by Daniel
- TI EDMA fixes by Joel
- New drivers:
- Hisilicon k3dma driver
- Renesas rcar dma driver
- New API for publishing slave driver capablities
- Various fixes across the subsystem by Andy, Jingoo, Sachin etc..."
* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (94 commits)
dma: edma: Remove limits on number of slots
dma: edma: Leave linked to Null slot instead of DUMMY slot
dma: edma: Find missed events and issue them
ARM: edma: Add function to manually trigger an EDMA channel
dma: edma: Write out and handle MAX_NR_SG at a given time
dma: edma: Setup parameters to DMA MAX_NR_SG at a time
dmaengine: pl330: use dma_set_max_seg_size to set the sg limit
dmaengine: dma_slave_caps: remove sg entries
dma: replace devm_request_and_ioremap by devm_ioremap_resource
dma: ste_dma40: Fix potential null pointer dereference
dma: ste_dma40: Remove duplicate const
dma: imx-dma: Remove redundant NULL check
dma: dmagengine: fix function names in comments
dma: add driver for R-Car HPB-DMAC
dma: k3dma: use devm_ioremap_resource() instead of devm_request_and_ioremap()
dma: imx-sdma: Staticize sdma_driver_data structures
pch_dma: Add MODULE_DEVICE_TABLE
dmaengine: PL08x: Add cyclic transfer support
dmaengine: PL08x: Fix reading the byte count in cctl
dmaengine: PL08x: Add support for different maximum transfer size
...
Pull MMC updates from Chris Ball:
"MMC highlights for 3.12:
Core:
- Support Allocation Units 8MB-64MB in SD3.0, previous max was 4MB.
- The slot-gpio helper can now handle GPIO debouncing card-detect.
- Read supported voltages from DT "voltage-ranges" property.
Drivers:
- dw_mmc: Add support for ARC architecture, and support exynos5420.
- mmc_spi: Support CD/RO GPIOs.
- sh_mobile_sdhi: Add compatibility for more Renesas SoCs.
- sh_mmcif: Add DT support for DMA channels"
* tag 'mmc-updates-for-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (50 commits)
Revert "mmc: tmio-mmc: Remove .set_pwr() callback from platform data"
mmc: dw_mmc: Add support for ARC
mmc: sdhci-s3c: initialize host->quirks2 for using quirks2
mmc: sdhci-s3c: fix the wrong register value, when clock is disabled
mmc: esdhc: add support to get voltage from device-tree
mmc: sdhci: get voltage from sdhc host
mmc: core: parse voltage from device-tree
mmc: omap_hsmmc: use the generic config for omap2plus devices
mmc: omap_hsmmc: clear status flags before starting a new command
mmc: dw_mmc: exynos: Add a new compatible string for exynos5420
mmc: sh_mmcif: revision-specific CLK_CTRL2 handling
mmc: sh_mmcif: revision-specific Command Completion Signal handling
mmc: sh_mmcif: add support for Device Tree DMA bindings
mmc: sh_mmcif: move header include from header into .c
mmc: SDHI: add DT compatibility strings for further SoCs
mmc: dw_mmc-pci: enable bus-mastering mode
mmc: dw_mmc-pci: get resources from a proper BAR
mmc: tmio-mmc: Remove .set_pwr() callback from platform data
mmc: tmio-mmc: Remove .get_cd() callback from platform data
mmc: sh_mobile_sdhi: Remove .set_pwr() callback from platform data
...
Pull device-mapper updates from Mike Snitzer:
"Add the ability to collect I/O statistics on user-defined regions of a
device-mapper device. This dm-stats code required the reintroduction
of a div64_u64_rem() helper, but as a separate method that doesn't
slow down div64_u64() -- especially on 32-bit systems.
Allow the error target to replace request-based DM devices (e.g.
multipath) in addition to bio-based DM devices.
Various other small code fixes and improvements to thin-provisioning,
DM cache and the DM ioctl interface"
* tag 'dm-3.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm stripe: silence a couple sparse warnings
dm: add statistics support
dm thin: always return -ENOSPC if no_free_space is set
dm ioctl: cleanup error handling in table_load
dm ioctl: increase granularity of type_lock when loading table
dm ioctl: prevent rename to empty name or uuid
dm thin: set pool read-only if breaking_sharing fails block allocation
dm thin: prefix pool error messages with pool device name
dm: allow error target to replace bio-based and request-based targets
math64: New separate div64_u64_rem helper
dm space map: optimise sm_ll_dec and sm_ll_inc
dm btree: prefetch child nodes when walking tree for a dm_btree_del
dm btree: use pop_frame in dm_btree_del to cleanup code
dm cache: eliminate holes in cache structure
dm cache: fix stacking of geometry limits
dm thin: fix stacking of geometry limits
dm thin: add data block size limits to Documentation
dm cache: add data block size limits to code and Documentation
dm cache: document metadata device is exclussive to a cache
dm: stop using WQ_NON_REENTRANT
Pull md update from Neil Brown:
"Headline item is multithreading for RAID5 so that more IO/sec can be
supported on fast (SSD) devices. Also TILE-Gx SIMD suppor for RAID6
calculations and an assortment of bug fixes"
* tag 'md/3.12' of git://neil.brown.name/md:
raid5: only wakeup necessary threads
md/raid5: flush out all pending requests before proceeding with reshape.
md/raid5: use seqcount to protect access to shape in make_request.
raid5: sysfs entry to control worker thread number
raid5: offload stripe handle to workqueue
raid5: fix stripe release order
raid5: make release_stripe lockless
md: avoid deadlock when dirty buffers during md_stop.
md: Don't test all of mddev->flags at once.
md: Fix apparent cut-and-paste error in super_90_validate
raid6/test: replace echo -e with printf
RAID: add tilegx SIMD implementation of raid6
md: fix safe_mode buglet.
md: don't call md_allow_write in get_bitmap_file.
Pull vfs pile 3 (of many) from Al Viro:
"Waiman's conversion of d_path() and bits related to it,
kern_path_mountpoint(), several cleanups and fixes (exportfs
one is -stable fodder, IMO).
There definitely will be more... ;-/"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
split read_seqretry_or_unlock(), convert d_walk() to resulting primitives
dcache: Translating dentry into pathname without taking rename_lock
autofs4 - fix device ioctl mount lookup
introduce kern_path_mountpoint()
rename user_path_umountat() to user_path_mountpoint_at()
take unlazy_walk() into umount_lookup_last()
Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c
prune_super(): sb->s_op is never NULL
exportfs: don't assume that ->iterate() won't feed us too long entries
afs: get rid of redundant ->d_name.len checks
Pull dmaengine update from Dan Williams:
"Collection of random updates to the core and some end-driver fixups
for ioatdma and mv_xor:
- NUMA aware channel allocation
- Cleanup dmatest debugfs interface
- ioat: make raid-support Atom only
- mv_xor: big endian
Aside from the top three commits these have all had some soak time in
-next. The top commit fixes a recent build breakage.
It has been a long while since my last pull request, hopefully it does
not show. Thanks to Vinod for keeping an eye on drivers/dma/ this
past year"
* tag 'dmaengine-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
dmaengine: dma_sync_wait and dma_find_channel undefined
MAINTAINERS: update email for Dan Williams
dma: mv_xor: Fix incorrect error path
ioatdma: silence GCC warnings
dmaengine: make dma_channel_rebalance() NUMA aware
dmaengine: make dma_submit_error() return an error code
ioatdma: disable RAID on non-Atom platforms and reenable unaligned copies
mv_xor: support big endian systems using descriptor swap feature
mv_xor: use {readl, writel}_relaxed instead of __raw_{readl, writel}
dmatest: print message on debug level in case of no error
dmatest: remove IS_ERR_OR_NULL checks of debugfs calls
dmatest: make module parameters writable
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications. However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:
WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
__cpufreq_notify_transition+0x238/0x260()
In middle of another frequency transition
accompanied by a call trace similar to this one:
[<ffffffff81720daa>] dump_stack+0x46/0x58
[<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
[<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
[<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
[<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
[<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
[<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
[<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
[<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
[<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
[<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
[<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
[<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
[<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
[<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
[<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
[<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
[<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
[<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff81081060>] process_one_work+0x170/0x4a0
[<ffffffff81082121>] worker_thread+0x121/0x390
[<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
[<ffffffff81088fe0>] kthread+0xc0/0xd0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.
Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.
For example, consider following scenario:
__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
And so store() will eventually result in a crash if cur_policy is
NULL at this point.
Introduce an additional variable which would guarantee serialization
here.
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
dma_sync_wait and dma_find_channel are declared regardless of whether
CONFIG_DMA_ENGINE is enabled, but calling the function without
CONFIG_DMA_ENGINE enabled results "undefined reference" errors.
To get around this, declare dma_sync_wait and dma_find_channel as inline
functions if CONFIG_DMA_ENGINE is undefined.
Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull ARM SoC late changes from Kevin Hilman:
"These are changes that arrived a little late before the merge window,
or had dependencies on previous branches.
Highlights:
- ux500: misc. cleanup, fixup I2C devices
- exynos: DT updates for RTC; PM updates
- at91: DT updates for NAND; new platforms added to generic defconfig
- sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks
- highbank: LPAE fixes, select necessary ARM errata
- omap: PM fixes and improvements; OMAP5 mailbox support
- omap: basic support for new DRA7xx SoCs"
* tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits)
ARM: dts: vexpress: Add CCI node to TC2 device-tree
ARM: EXYNOS: Skip C1 cpuidle state for exynos5440
ARM: EXYNOS: always enable PM domains support for EXYNOS4X12
ARM: highbank: clean-up some unused includes
ARM: sun7i: Enable the A20 clocks in the DTSI
ARM: sun6i: Enable clock support in the DTSI
ARM: sun5i: dt: Use the A10s gates in the DTSI
ARM: at91: at91_dt_defconfig: enable rm9200 support
ARM: dts: add ADC device tree node for exynos5420/5250
ARM: dts: Add RTC DT node to Exynos5420 SoC
ARM: dts: Update the "status" property of RTC DT node for Exynos5250 SoC
ARM: dts: Fix the RTC DT node name for Exynos5250
irqchip: mmp: avoid to include irqs head file
ARM: mmp: avoid to include head file in mach-mmp
irqchip: mmp: support irqchip
irqchip: move mmp irq driver
ARM: OMAP: AM33xx: clock: Add RNG clock data
ARM: OMAP: TI81XX: add always-on powerdomain for TI81XX
ARM: OMAP4: clock: Lock PLLs in the right sequence
ARM: OMAP: AM33XX: hwmod: Add hwmod data for debugSS
...
Pull ARM SoC driver update from Kevin Hilman:
"This contains the ARM SoC related driver updates for v3.12. The only
thing this cycle are core PM updates and CPUidle support for ARM's TC2
big.LITTLE development platform"
* tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
cpuidle: big.LITTLE: vexpress-TC2 CPU idle driver
ARM: vexpress: tc2: disable GIC CPU IF in tc2_pm_suspend
drivers: irq-chip: irq-gic: introduce gic_cpu_if_down()
Pull clock framework changes from Michael Turquette:
"The common clk framework changes for 3.12 are dominated by clock
driver patches, both new drivers and fixes to existing. A high
percentage of these are for Samsung platforms like Exynos. Core
framework fixes and some new features like automagical clock
re-parenting round out the patches"
* tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux: (102 commits)
clk: only call get_parent if there is one
clk: samsung: exynos5250: Simplify registration of PLL rate tables
clk: samsung: exynos4: Register PLL rate tables for Exynos4x12
clk: samsung: exynos4: Register PLL rate tables for Exynos4210
clk: samsung: exynos4: Reorder registration of mout_vpllsrc
clk: samsung: pll: Add support for rate configuration of PLL46xx
clk: samsung: pll: Use new registration method for PLL46xx
clk: samsung: pll: Add support for rate configuration of PLL45xx
clk: samsung: pll: Use new registration method for PLL45xx
clk: samsung: exynos4: Rename exynos4_plls to exynos4x12_plls
clk: samsung: exynos4: Remove checks for DT node
clk: samsung: exynos4: Remove unused static clkdev aliases
clk: samsung: Modify _get_rate() helper to use __clk_lookup()
clk: samsung: exynos4: Use separate aliases for cpufreq related clocks
clocksource: samsung_pwm_timer: Get clock from device tree
ARM: dts: exynos4: Specify PWM clocks in PWM node
pwm: samsung: Update DT bindings documentation to cover clocks
clk: Move symbol export to proper location
clk: fix new_parent dereference before null check
clk: wm831x: Initialise wm831x pointer on init
...
Percpu frontend for allocating ids. With percpu allocation (that works),
it's impossible to guarantee it will always be possible to allocate all
nr_tags - typically, some will be stuck on a remote percpu freelist
where the current job can't get to them.
We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we then pick a remote cpu at random to
steal from.
Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.
From akpm:
"It looks OK to me (that's as close as I get to an ack :))
v6 changes:
- Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to
make alpha/arc builds happy (Fengguang)
- Move second (cpu >= nr_cpu_ids) check inside of first check scope
in steal_tags() (akpm + nab)
v5 changes:
- Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
- Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
- Convert steal_tags() to use cpumask_weight() + cpumask_next() +
cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
- Add comment for alloc_global_tags() (kmo + akpm)
- Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
- Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
- Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
(kmo + akpm)
- Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
(kmo + akpm)
- Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
- Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)
v4 changes:
- Fix tags.c reference in percpu_ida_init (akpm)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull xfs updates from Ben Myers:
"For 3.12-rc1 there are a number of bugfixes in addition to work to
ease usage of shared code between libxfs and the kernel, the rest of
the work to enable project and group quotas to be used simultaneously,
performance optimisations in the log and the CIL, directory entry file
type support, fixes for log space reservations, some spelling/grammar
cleanups, and the addition of user namespace support.
- introduce readahead to log recovery
- add directory entry file type support
- fix a number of spelling errors in comments
- introduce new Q_XGETQSTATV quotactl for project quotas
- add USER_NS support
- log space reservation rework
- CIL optimisations
- kernel/userspace libxfs rework"
* tag 'xfs-for-linus-v3.12-rc1' of git://oss.sgi.com/xfs/xfs: (112 commits)
xfs: XFS_MOUNT_QUOTA_ALL needed by userspace
xfs: dtype changed xfs_dir2_sfe_put_ino to xfs_dir3_sfe_put_ino
Fix wrong flag ASSERT in xfs_attr_shortform_getvalue
xfs: finish removing IOP_* macros.
xfs: inode log reservations are too small
xfs: check correct status variable for xfs_inobt_get_rec() call
xfs: inode buffers may not be valid during recovery readahead
xfs: check LSN ordering for v5 superblocks during recovery
xfs: btree block LSN escaping to disk uninitialised
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568
xfs: fix bad dquot buffer size in log recovery readahead
xfs: don't account buffer cancellation during log recovery readahead
xfs: check for underflow in xfs_iformat_fork()
xfs: xfs_dir3_sfe_put_ino can be static
xfs: introduce object readahead to log recovery
xfs: Simplify xfs_ail_min() with list_first_entry_or_null()
xfs: Register hotcpu notifier after initialization
xfs: add xfs sb v4 support for dirent filetype field
xfs: Add write support for dirent filetype field
xfs: Add read-only support for dirent filetype field
...
Without a way to flush the osd client's notify workqueue, a watch
event that is unregistered could continue receiving callbacks
indefinitely.
Unregistering the event simply means no new notifies are added to the
queue, but there may still be events in the queue that will call the
watch callback for the event. If the queue is flushed after the event
is unregistered, the caller can be sure no more watch callbacks will
occur for the canceled watch.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Pull mtd updates from David Woodhouse:
- factor out common code from MTD tests
- nand-gpio cleanup and portability to non-ARM
- m25p80 support for 4-byte addressing chips, other new chips
- pxa3xx cleanup and support for new platforms
- remove obsolete alauda, octagon-5066 drivers
- erase/write support for bcm47xxsflash
- improve detection of ECC requirements for NAND, controller setup
- NFC acceleration support for atmel-nand, read/write via SRAM
- etc
* tag 'for-linus-20130909' of git://git.infradead.org/linux-mtd: (184 commits)
mtd: chips: Add support for PMC SPI Flash chips in m25p80.c
mtd: ofpart: use for_each_child_of_node() macro
mtd: mtdswap: replace strict_strtoul() with kstrtoul()
mtd cs553x_nand: use kzalloc() instead of memset
mtd: atmel_nand: fix error return code in atmel_nand_probe()
mtd: bcm47xxsflash: writing support
mtd: bcm47xxsflash: implement erasing support
mtd: bcm47xxsflash: convert to module_platform_driver instead of init/exit
mtd: bcm47xxsflash: convert kzalloc to avoid invalid access
mtd: remove alauda driver
mtd: nand: mxc_nand: mark 'const' properly
mtd: maps: cfi_flagadm: add missing __iomem annotation
mtd: spear_smi: add missing __iomem annotation
mtd: r852: Staticize local symbols
mtd: nandsim: Staticize local symbols
mtd: impa7: add missing __iomem annotation
mtd: sm_ftl: Staticize local symbols
mtd: m25p80: add support for mr25h10
mtd: m25p80: make CONFIG_M25PXX_USE_FAST_READ safe to enable
mtd: m25p80: Pass flags through CAT25_INFO macro
...
Pull DMA mapping update from Marek Szyprowski:
"This contains an addition of Device Tree support for reserved memory
regions (Contiguous Memory Allocator is one of the drivers for it) and
changes required by the KVM extensions for PowerPC architectue"
* 'for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: init: add support for reserved memory defined by device tree
drivers: of: add initialization code for dma reserved memory
drivers: of: add function to scan fdt nodes given by path
drivers: dma-contiguous: clean source code and prepare for device tree
Pull VFIO update from Alex Williamson:
"VFIO updates include safer default file flags for VFIO device fds, an
external user interface exported to allow other modules to hold
references to VFIO groups, a fix to test for extended config space on
PCIe and PCI-x, and new hot reset interfaces for PCI devices which
allows the user to do PCI bus/slot resets when all of the devices
affected by the reset are owned by the user.
For this last feature, the PCI bus reset interface, I depend on
changes already merged from Bjorn's PCI pull request. I therefore
merged my tree up to commit cb3e433, which I think was the correct
action, but as Stephen Rothwell noted, I failed to provide a commit
message indicating why the merge was required. Sorry for that.
Thanks, Alex"
* tag 'vfio-v3.12-rc0' of git://github.com/awilliam/linux-vfio:
vfio: fix documentation
vfio-pci: PCI hot reset interface
vfio-pci: Test for extended config space
vfio-pci: Use fdget() rather than eventfd_fget()
vfio: Add O_CLOEXEC flag to vfio device fd
vfio: use get_unused_fd_flags(0) instead of get_unused_fd()
vfio: add external user support