Commit Graph

39896 Commits

Author SHA1 Message Date
Michal Hocko
d715ae08f2 memcg: rename high level charging functions
mem_cgroup_newpage_charge is used only for charging anonymous memory so
it is better to rename it to mem_cgroup_charge_anon.

mem_cgroup_cache_charge is used for file backed memory so rename it to
mem_cgroup_charge_file.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:57 -07:00
Johannes Weiner
df38197546 memcg: get_mem_cgroup_from_mm()
Instead of returning NULL from try_get_mem_cgroup_from_mm() when the mm
owner is exiting, just return root_mem_cgroup.  This makes sense for all
callsites and gets rid of some of them having to fallback manually.

[fengguang.wu@intel.com: fix warnings]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:56 -07:00
Kirill A. Shutemov
d230dec18d mm: use 'const char *' insted of 'char *' for reason in dump_page()
I tried to use 'dump_page(page, __func__)' for debugging, but it triggers
warning:

  warning: passing argument 2 of `dump_page' discards `const' qualifier from pointer target type [enabled by default]

Let's convert 'reason' to 'const char *' in dump_page() and friends: we
shouldn't modify it anyway.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:55 -07:00
David Rientjes
539a13b47e res_counter: remove interface for locked charging and uncharging
The res_counter_{charge,uncharge}_locked() variants are not used in the
kernel outside of the resource counter code itself, so remove the
interface.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
David Rientjes
f0432d1596 mm, mempolicy: remove per-process flag
PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
There's no significant performance degradation to checking
current->mempolicy rather than current->flags & PF_MEMPOLICY in the
allocation path, especially since this is considered unlikely().

Running TCP_RR with netperf-2.4.5 through localhost on 16 cpu machine with
64GB of memory and without a mempolicy:

	threads		before		after
	16		1249409		1244487
	32		1281786		1246783
	48		1239175		1239138
	64		1244642		1241841
	80		1244346		1248918
	96		1266436		1254316
	112		1307398		1312135
	128		1327607		1326502

Per-process flags are a scarce resource so we should free them up whenever
possible and make them available.  We'll be using it shortly for memcg oom
reserves.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
David Rientjes
2a389610a7 mm, mempolicy: rename slab_node for clarity
slab_node() is actually a mempolicy function, so rename it to
mempolicy_slab_node() to make it clearer that it used for processes with
mempolicies.

At the same time, cleanup its code by saving numa_mem_id() in a local
variable (since we require a node with memory, not just any node) and
remove an obsolete comment that assumes the mempolicy is actually passed
into the function.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:54 -07:00
Davidlohr Bueso
615d6e8756 mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Kirill A. Shutemov
f1820361f8 mm: implement ->map_pages for page cache
filemap_map_pages() is generic implementation of ->map_pages() for
filesystems who uses page cache.

It should be safe to use filemap_map_pages() for ->map_pages() if
filesystem use filemap_fault() for ->fault().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:53 -07:00
Kirill A. Shutemov
8c6e50b029 mm: introduce vm_ops->map_pages()
Here's new version of faultaround patchset.  It took a while to tune it
and collect performance data.

First patch adds new callback ->map_pages to vm_operations_struct.

->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from
"pgoff" till "max_pgoff".  ->map_pages() is called with page table
locked and must not block.  If it's not possible to reach a page without
blocking, filesystem should skip it.  Filesystem should use do_set_pte()
to setup page table entry.  Pointer to entry associated with offset
"pgoff" is passed in "pte" field in vm_fault structure.  Pointers to
entries for other offsets should be calculated relative to "pte".

Currently VM use ->map_pages only on read page fault path.  We try to
map FAULT_AROUND_PAGES a time.  FAULT_AROUND_PAGES is 16 for now.
Performance data for different FAULT_AROUND_ORDER is below.

TODO:
 - implement ->map_pages() for shmem/tmpfs;
 - modify get_user_pages() to be able to use ->map_pages() and implement
   mmap(MAP_POPULATE|MAP_NONBLOCK) on top.

=========================================================================
Tested on 4-socket machine (120 threads) with 128GiB of RAM.

Few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
somewhere between 3 and 5. Let's say 4 :)

Linux build (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
Linux rebuild (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
Git test suite (make -j60 test)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413

Two synthetic tests: access every word in file in sequential/random order.
It doesn't improve much after FAULT_AROUND_ORDER == 4.

Sequential access 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
 8 threads
	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
 32 threads
	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
 60 threads
	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
 120 threads
	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
Random access 1GiB file
 1 thread
	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
 8 threads
	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
 32 threads
	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
 60 threads
	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
 120 threads
	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
 8 threads
	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
 32 threads
	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
 60 threads
	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
 120 threads
	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125

This patch (of 2):

Introduce new vm_ops callback ->map_pages() and uses it for mapping easy
accessible pages around fault address.

On read page fault, if filesystem provides ->map_pages(), we try to map up
to FAULT_AROUND_PAGES pages around page fault address in hope to reduce
number of minor page faults.

We call ->map_pages first and use ->fault() as fallback if page by the
offset is not ready to be mapped (cold page cache or something).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Alex Thorlton
a0715cc226 mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE
Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
also adds a prctl control which allows us to set the THP disable bit in
mm->def_flags so that VMs will pick up the setting as they are created.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Linus Torvalds
467a9e1633 Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
 "The purpose of this single series of commits from Srivatsa S Bhat
  (with a small piece from Gautham R Shenoy) touching multiple
  subsystems that use CPU hotplug notifiers is to provide a way to
  register them that will not lead to deadlocks with CPU online/offline
  operations as described in the changelog of commit 93ae4f978c ("CPU
  hotplug: Provide lockless versions of callback registration
  functions").

  The first three commits in the series introduce the API and document
  it and the rest simply goes through the users of CPU hotplug notifiers
  and converts them to using the new method"

* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
  net/iucv/iucv.c: Fix CPU hotplug callback registration
  net/core/flow.c: Fix CPU hotplug callback registration
  mm, zswap: Fix CPU hotplug callback registration
  mm, vmstat: Fix CPU hotplug callback registration
  profile: Fix CPU hotplug callback registration
  trace, ring-buffer: Fix CPU hotplug callback registration
  xen, balloon: Fix CPU hotplug callback registration
  hwmon, via-cputemp: Fix CPU hotplug callback registration
  hwmon, coretemp: Fix CPU hotplug callback registration
  thermal, x86-pkg-temp: Fix CPU hotplug callback registration
  octeon, watchdog: Fix CPU hotplug callback registration
  oprofile, nmi-timer: Fix CPU hotplug callback registration
  intel-idle: Fix CPU hotplug callback registration
  clocksource, dummy-timer: Fix CPU hotplug callback registration
  drivers/base/topology.c: Fix CPU hotplug callback registration
  acpi-cpufreq: Fix CPU hotplug callback registration
  zsmalloc: Fix CPU hotplug callback registration
  scsi, fcoe: Fix CPU hotplug callback registration
  scsi, bnx2fc: Fix CPU hotplug callback registration
  scsi, bnx2i: Fix CPU hotplug callback registration
  ...
2014-04-07 14:55:46 -07:00
Arnd Bergmann
b8780c363d sched: remove sleep_on() and friends
This is the final piece in the puzzle, as all patches to remove the
last users of \(interruptible_\|\)sleep_on\(_timeout\|\) have made it
into the 3.15 merge window. The work was long overdue, and this
interface in particular should not have survived the BKL removal
that was done a couple of years ago.

Citing Jon Corbet from http://lwn.net/2001/0201/kernel.php3":

 "[...] it was suggested that the janitors look for and fix all code
  that calls sleep_on() [...] since (1) almost all such code is
  incorrect, and (2) Linus has agreed that those functions should
  be removed in the 2.5 development series".

We haven't quite made it for 2.5, but maybe we can merge this for 3.15.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 11:24:06 -07:00
Linus Torvalds
240cd6a817 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "The biggest chunk is a series of patches from Ilya that add support
  for new Ceph osd and crush map features, including some new tunables,
  primary affinity, and the new encoding that is needed for erasure
  coding support.  This brings things into parity with the server side
  and the looming firefly release.  There is also support for allocation
  hints in RBD that help limit fragmentation on the server side.

  There is also a series of patches from Zheng fixing NFS reexport,
  directory fragmentation support, flock vs fnctl behavior, and some
  issues with clustered MDS.

  Finally, there are some miscellaneous fixes from Yunchuan Wen for
  fscache, Fabian Frederick for ACLs, and from me for fsync(dirfd)
  behavior"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (79 commits)
  ceph: skip invalid dentry during dcache readdir
  libceph: dump pool {read,write}_tier to debugfs
  libceph: output primary affinity values on osdmap updates
  ceph: flush cap release queue when trimming session caps
  ceph: don't grabs open file reference for aborted request
  ceph: drop extra open file reference in ceph_atomic_open()
  ceph: preallocate buffer for readdir reply
  libceph: enable PRIMARY_AFFINITY feature bit
  libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting()
  libceph: add support for osd primary affinity
  libceph: add support for primary_temp mappings
  libceph: return primary from ceph_calc_pg_acting()
  libceph: switch ceph_calc_pg_acting() to new helpers
  libceph: introduce apply_temps() helper
  libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers
  libceph: ceph_can_shift_osds(pool) and pool type defines
  libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
  libceph: enable OSDMAP_ENC feature bit
  libceph: primary_affinity decode bits
  libceph: primary_affinity infrastructure
  ...
2014-04-07 11:09:13 -07:00
Jon Mason
53ca4fea0b NTB: Code Style Clean-up
Some white space and 80 char overruns corrected.

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-04-07 10:59:19 -07:00
Jon Mason
403c63cb6d NTB: client event cleanup
Provide a better event interface between the client and transport

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-04-07 10:59:19 -07:00
Linus Torvalds
3021112598 Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce large directory support
   - introduce f2fs_issue_flush to merge redundant flush commands
   - merge write IOs as much as possible aligned to the segment
   - add sysfs entries to tune the f2fs configuration
   - use radix_tree for the free_nid_list to reduce in-memory operations
   - remove costly bit operations in f2fs_find_entry
   - enhance the readahead flow for CP/NAT/SIT/SSA blocks

  The other bug fixes are as follows:
   - recover xattr node blocks correctly after sudden-power-cut
   - fix to calculate the maximum number of node ids
   - enhance to handle many error cases

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
  f2fs: fix wrong statistics of inline data
  f2fs: check the acl's validity before setting
  f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
  f2fs: fix to cover io->bio with io_rwsem
  f2fs: fix error path when fail to read inline data
  f2fs: use list_for_each_entry{_safe} for simplyfying code
  f2fs: avoid free slab cache under spinlock
  f2fs: avoid unneeded lookup when xattr name length is too long
  f2fs: avoid unnecessary bio submit when wait page writeback
  f2fs: return -EIO when node id is not matched
  f2fs: avoid RECLAIM_FS-ON-W warning
  f2fs: skip unnecessary node writes during fsync
  f2fs: introduce fi->i_sem to protect fi's info
  f2fs: change reclaim rate in percentage
  f2fs: add missing documentation for dir_level
  f2fs: remove unnecessary threshold
  f2fs: throttle the memory footprint with a sysfs entry
  f2fs: avoid to drop nat entries due to the negative nr_shrink
  f2fs: call f2fs_wait_on_page_writeback instead of native function
  f2fs: introduce nr_pages_to_write for segment alignment
  ...
2014-04-07 10:55:36 -07:00
Linus Torvalds
e5744abb2f Merge tag 'mfd-for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "Changes to existing drivers:
   - Use of managed resources - omap, twl4030, ti_am335x_tscadc
   - Advanced error handling - omap
   - Rework clk management - omap
   - Device Tree (re-)work - tc3589x, pm8921, da9055, sec
   - IRC management overhaul and !BROKEN - pm8921
   - Convert to regmap - ssbi, pm8921
   - Use simple power-management ops - ucb1x00
   - Include file clean-up - adp5520, cs5535, janz, lpc_ich,
      - lpc_sch, max14577, mcp-sa11x0, pcf50633-adc, rc5t583,
      	rdc321x-southbridge, retu, smsc-ece1099, ti-ssp, ti_am335x_tscadc,
	tps65912, vexpress-config, wm8350, ywm8350
   - Various bug fixes across the subsystem
      - NULL/invalid pointer dereference prevention
      - Resource leak mitigation,
      - Variable used initialised
      - Staticise various containers
      - Enforce return value checks

  New drivers/supported devices:
   - Add support for s2mps14 and s2mpa01 to sec
   - Add support for da9063 (v5) to da9063
   - Add support for atom-c2000 to gpio-ich
   - Add support for come-{mbt10,cbt6,chl6} to kempld
   - Add support for da9053 to da9052
   - Add support for itco-wdt (v3) and baytrail to lpc_ich
   - Add new drivers for tps65218, rtsx_usb, bcm590xx

  (Re-)moved drivers:
   - twl4030 ==> drivers/iio
   - ti-ssp  ==> /dev/null"

* tag 'mfd-for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (103 commits)
  mfd: wm5110: Correct default for HEADPHONE_DETECT_1
  mfd: arizona: Correct small errors in the DT binding documentation
  mfd: arizona: Mark DSP clocking register as volatile
  mfd: devicetree: bindings: Add pm8xxx RTC description
  mfd: kempld-core: Fix potential hang-up during boot
  mfd: sec-core: Fix uninitialized 'regmap_rtc' on S2MPA01
  mfd: tps65910: Fix regmap_irq_chip_data leak on mfd_add_devices fail
  mfd: tps65910: Fix possible invalid pointer dereference on regmap_add_irq_chip fail
  mfd: sec-core: Fix I2C dummy device resource leak on probe failure
  mfd: sec-core: Add of_compatible strings for clock MFD cells
  mfd: Remove obsolete ti-ssp driver
  Documentation: mfd: s2mps11: Describe S5M8767 and S2MPS14 clocks
  mfd: bcm590xx: Fix type argument for module device table
  mfd: lpc_ich: Add support for Intel Bay Trail SoC
  mfd: lpc_ich: Add support for NM10 GPIO
  mfd: lpc_ich: Change Avoton to iTCO v3
  watchdog: iTCO_wdt: Add support for v3 silicon
  mfd: lpc_ich: Add support for iTCO v3
  mfd: lpc_ich: Remove lpc_ich_cfg struct use
  mfd: lpc_ich: Only configure watchdog or GPIO when present
  ...
2014-04-07 10:24:18 -07:00
Linus Torvalds
c29aa153ef Merge tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
 - A few SPI NOR ID definitions
 - Kill the NAND "max pagesize" restriction
 - Fix some x16 bus-width NAND support
 - Add NAND JEDEC parameter page support
 - DT bindings for NAND ECC
 - GPMI NAND updates (subpage reads)
 - More OMAP NAND refactoring
 - New STMicro SPI NOR driver (now in 40 patches!)
 - A few other random bugfixes

* tag 'for-linus-20140405' of git://git.infradead.org/linux-mtd: (120 commits)
  Fix index regression in nand_read_subpage
  mtd: diskonchip: mem resource name is not optional
  mtd: nand: fix mention to CONFIG_MTD_NAND_ECC_BCH
  mtd: nand: fix GET/SET_FEATURES address on 16-bit devices
  mtd: omap2: Use devm_ioremap_resource()
  mtd: denali_dt: Use devm_ioremap_resource()
  mtd: devices: elm: update DRIVER_NAME as "omap-elm"
  mtd: devices: elm: configure parallel channels based on ecc_steps
  mtd: devices: elm: clean elm_load_syndrome
  mtd: devices: elm: check for hardware engine's design constraints
  mtd: st_spi_fsm: Succinctly reorganise .remove()
  mtd: st_spi_fsm: Allow loop to run at least once before giving up CPU
  mtd: st_spi_fsm: Correct vendor name spelling issue - missing "M"
  mtd: st_spi_fsm: Avoid duplicating MTD core code
  mtd: st_spi_fsm: Remove useless consts from function arguments
  mtd: st_spi_fsm: Convert ST SPI FSM (NOR) Flash driver to new DT partitions
  mtd: st_spi_fsm: Move runtime configurable msg sequences into device's struct
  mtd: st_spi_fsm: Supply the W25Qxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the S25FLxxx chip specific configuration call-back
  mtd: st_spi_fsm: Supply the MX25xxx chip specific configuration call-back
  ...
2014-04-07 10:17:30 -07:00
Viresh Kumar
7f4b04614a cpufreq: create another field .flags in cpufreq_frequency_table
Currently cpufreq frequency table has two fields: frequency and driver_data.
driver_data is only for drivers' internal use and cpufreq core shouldn't use
it at all. But with the introduction of BOOST frequencies, this assumption
was broken and we started using it as a flag instead.

There are two problems due to this:
- It is against the description of this field, as driver's data is used by
  the core now.
- if drivers fill it with -3 for any frequency, then those frequencies are
  never considered by cpufreq core as it is exactly same as value of
  CPUFREQ_BOOST_FREQ, i.e. ~2.

The best way to get this fixed is by creating another field flags which
will be used for such flags. This patch does that. Along with that various
drivers need modifications due to the change of struct cpufreq_frequency_table.

Reviewed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-04-07 14:43:50 +02:00
Linus Torvalds
2b3a8fd735 Merge tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Stable fix for a use after free issue in the NFSv4.1 open code
   - Fix the SUNRPC bi-directional RPC code to account for TCP segmentation
   - Optimise usage of readdirplus when confronted with 'ls -l' situations
   - Soft mount bugfixes
   - NFS over RDMA bugfixes
   - NFSv4 close locking fixes
   - Various NFSv4.x client state management optimisations
   - Rename/unlink code cleanups"

* tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
  nfs: pass string length to pr_notice message about readdir loops
  NFSv4: Fix a use-after-free problem in open()
  SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status
  SUNRPC: Don't let rpc_delay() clobber non-timeout errors
  SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
  SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks
  NFSv4: Ensure we respect soft mount timeouts during trunking discovery
  NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted
  NFS: advertise only supported callback netids
  SUNRPC: remove KERN_INFO from dprintk() call sites
  SUNRPC: Fix large reads on NFS/RDMA
  NFS: Clean up: revert increase in READDIR RPC buffer max size
  SUNRPC: Ensure that call_bind times out correctly
  SUNRPC: Ensure that call_connect times out correctly
  nfs: emit a fsnotify_nameremove call in sillyrename codepath
  nfs: remove synchronous rename code
  nfs: convert nfs_rename to use async_rename infrastructure
  nfs: make nfs_async_rename non-static
  nfs: abstract out code needed to complete a sillyrename
  NFSv4: Clear the open state flags if the new stateid does not match
  ...
2014-04-06 10:09:38 -07:00
Linus Torvalds
6f4c98e1c2 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
 "Nothing major: the stricter permissions checking for sysfs broke a
  staging driver; fix included.  Greg KH said he'd take the patch but
  hadn't as the merge window opened, so it's included here to avoid
  breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  staging: fix up speakup kobject mode
  Use 'E' instead of 'X' for unsigned module taint flag.
  VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
  kallsyms: fix percpu vars on x86-64 with relocation.
  kallsyms: generalize address range checking
  module: LLVMLinux: Remove unused function warning from __param_check macro
  Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
  module: remove MODULE_GENERIC_TABLE
  module: allow multiple calls to MODULE_DEVICE_TABLE() per module
  module: use pr_cont
2014-04-06 09:38:07 -07:00
Linus Torvalds
04535d273e Merge tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper changes from Mike Snitzer:

 - Fix dm-cache corruption caused by discard_block_size > cache_block_size

 - Fix a lock-inversion detected by LOCKDEP in dm-cache

 - Fix a dangling bio bug in the dm-thinp target's process_deferred_bios
   error path

 - Fix corruption due to non-atomic transaction commit which allowed a
   metadata superblock to be written before all other metadata was
   successfully written -- this is common to all targets that use the
   persistent-data library's transaction manager (dm-thinp, dm-cache and
   dm-era).

 - Various small cleanups in the DM core

 - Add the dm-era target which is useful for keeping track of which
   blocks were written within a user defined period of time called an
   'era'.  Use cases include tracking changed blocks for backup
   software, and partially invalidating the contents of a cache to
   restore cache coherency after rolling back a vendor snapshot.

 - Improve the on-disk layout of multithreaded writes to the
   dm-thin-pool by splitting the pool's deferred bio list to be a
   per-thin device list and then sorting that list using an rb_tree.
   The subsequent read throughput of the data written via multiple
   threads improved by ~70%.

 - Simplify the multipath target's handling of queuing IO by pushing
   requests back to the request queue rather than queueing the IO
   internally.

* tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
  dm cache: fix a lock-inversion
  dm thin: sort the per thin deferred bios using an rb_tree
  dm thin: use per thin device deferred bio lists
  dm thin: simplify pool_is_congested
  dm thin: fix dangling bio in process_deferred_bios error path
  dm mpath: print more useful warnings in multipath_message()
  dm-mpath: do not activate failed paths
  dm mpath: remove extra nesting in map function
  dm mpath: remove map_io()
  dm mpath: reduce memory pressure when requeuing
  dm mpath: remove process_queued_ios()
  dm mpath: push back requests instead of queueing
  dm table: add dm_table_run_md_queue_async
  dm mpath: do not call pg_init when it is already running
  dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind
  dm: stop using bi_private
  dm: remove dm_get_mapinfo
  dm: make dm_table_alloc_md_mempools static
  dm: take care to copy the space map roots before locking the superblock
  dm transaction manager: fix corruption due to non-atomic transaction commit
  ...
2014-04-05 18:49:31 -07:00
Linus Torvalds
3f583bc219 Merge tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU upates from Joerg Roedel:
 "This time a few more updates queued up.

   - Rework VT-d code to support ACPI devices

   - Improvements for memory and PCI hotplug support in the VT-d driver

   - Device-tree support for OMAP IOMMU

   - Convert OMAP IOMMU to use devm_* interfaces

   - Fixed PASID support for AMD IOMMU

   - Other random cleanups and fixes for OMAP, ARM-SMMU and SHMOBILE
     IOMMU

  Most of the changes are in the VT-d driver because some rework was
  necessary for better hotplug and ACPI device support"

* tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (75 commits)
  iommu/vt-d: Fix error handling in ANDD processing
  iommu/vt-d: returning free pointer in get_domain_for_dev()
  iommu/vt-d: Only call dmar_acpi_dev_scope_init() if DRHD units present
  iommu/vt-d: Check for NULL pointer in dmar_acpi_dev_scope_init()
  iommu/amd: Fix logic to determine and checking max PASID
  iommu/vt-d: Include ACPI devices in iommu=pt
  iommu/vt-d: Finally enable translation for non-PCI devices
  iommu/vt-d: Remove to_pci_dev() in intel_map_page()
  iommu/vt-d: Remove pdev from intel_iommu_attach_device()
  iommu/vt-d: Remove pdev from iommu_no_mapping()
  iommu/vt-d: Make domain_add_dev_info() take struct device
  iommu/vt-d: Make domain_remove_one_dev_info() take struct device
  iommu/vt-d: Rename 'hwdev' variables to 'dev' now that that's the norm
  iommu/vt-d: Remove some pointless to_pci_dev() calls
  iommu/vt-d: Make get_valid_domain_for_dev() take struct device
  iommu/vt-d: Make iommu_should_identity_map() take struct device
  iommu/vt-d: Handle RMRRs for non-PCI devices
  iommu/vt-d: Make get_domain_for_dev() take struct device
  iommu/vt-d: Make domain_context_mapp{ed,ing}() take struct device
  iommu/vt-d: Make device_to_iommu() cope with non-PCI devices
  ...
2014-04-05 18:46:26 -07:00
Linus Torvalds
19bc2eec3c Merge tag 'clk-for-linus-3.15' of git://git.linaro.org/people/mike.turquette/linux
Pull clock framework changes from Mike Turquette:
 "The clock framework changes for 3.15 look similar to past pull
  requests.  Mostly clock driver updates, more Device Tree support in
  the form of common functions useful across platforms and a handful of
  features and fixes to the framework core"

* tag 'clk-for-linus-3.15' of git://git.linaro.org/people/mike.turquette/linux: (86 commits)
  clk: shmobile: fix setting paretn clock rate
  clk: shmobile: rcar-gen2: fix lb/sd0/sd1/sdh clock parent to pll1
  clk: Fix minor errors in of_clk_init() function comments
  clk: reverse default clk provider initialization order in of_clk_init()
  clk: sirf: update copyright years to 2014
  clk: mmp: try to use closer one when do round rate
  clk: mmp: fix the wrong calculation formula
  clk: mmp: fix wrong mask when calculate denominator
  clk: st: Adds quadfs clock binding
  clk: st: Adds clockgen-vcc and clockgen-mux clock binding
  clk: st: Adds clockgen clock binding
  clk: st: Adds divmux and prediv clock binding
  clk: st: Support for A9 MUX clocks
  clk: st: Support for ClockGenA9/DDR/GPU
  clk: st: Support for QUADFS inside ClockGenB/C/D/E/F
  clk: st: Support for VCC-mux and MUX clocks
  clk: st: Support for PLLs inside ClockGenA(s)
  clk: st: Support for DIVMUX and PreDiv Clocks
  clk: support hardware-specific debugfs entries
  clk: s2mps11: Use of_get_child_by_name
  ...
2014-04-05 18:39:18 -07:00
Linus Torvalds
9712d3c377 Merge tag 'pwm/for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm changes from Thierry Reding:
 "The legacy HAVE_PWM Kconfig symbol is finally being retired.  Thanks a
  lot to Sascha Hauer for doing that.

  Three new drivers are added: Freescale FTM, Cirrus Logic CLPS711X and
  Intel Low Power Subsystem.

  An assortment of fixes and cleanups rounds things off for this release
  cycle"

* tag 'pwm/for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: pxa: Constify OF match table
  pwm: pxa: Fix typo "pwm" -> "PWM"
  Revert "pwm: pxa: Use of_match_ptr()"
  pwm: add support for Intel Low Power Subsystem PWM
  pwm: Add CLPS711X PWM support
  pwm: atmel: correct CDTY calculation
  pwm: atmel: Fix polarity handling
  Documentation: Add device tree bindings for Freescale FTM PWM.
  pwm: Add Freescale FTM PWM driver support
  pwm: pxa: Use of_match_ptr()
  pwm: samsung: Use SIMPLE_DEV_PM_OPS macro
  pwm: renesas-tpu: Add dependency on HAS_IOMEM
  pwm: Remove obsolete HAVE_PWM Kconfig symbol
2014-04-05 18:32:31 -07:00
Linus Torvalds
2bf73dd61a Merge tag 'tags/cleanup2-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC late cleanups from Arnd Bergmann:
 "These could not be part of the first cleanup branch, because they
  either came too late in the cycle, or they have dependencies on other
  branches.  Important changes are:

   - The integrator platform is almost multiplatform capable after some
     reorganization (Linus Walleij)
   - Minor cleanups on Zynq (Michal Simek)
   - Lots of changes for Exynos and other Samsung platforms, including
     further preparations for multiplatform support and the clocks
     bindings are rearranged"

* tag 'tags/cleanup2-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits)
  devicetree: fix newly added exynos sata bindings
  ARM: EXYNOS: Fix compilation error in cpuidle.c
  ARM: S5P64X0: Explicitly include linux/serial_s3c.h in mach/pm-core.h
  ARM: EXYNOS: Remove hardware.h file
  ARM: SAMSUNG: Remove hardware.h inclusion
  ARM: S3C24XX: Remove invalid code from hardware.h
  dt-bindings: clock: Move exynos-audss-clk.h to dt-bindings/clock
  ARM: dts: Keep some essential LDOs enabled for arndale-octa board
  ARM: dts: Disable MDMA1 node for arndale-octa board
  ARM: S3C64XX: Fix build for implicit serial_s3c.h inclusion
  serial: s3c: Fix build of header without serial_core.h preinclusion
  ARM: EXYNOS: Allow wake-up using GIC interrupts
  ARM: EXYNOS: Stop using legacy Samsung PM code
  ARM: EXYNOS: Remove PM initcalls and useless indirection
  ARM: EXYNOS: Fix abuse of CONFIG_PM
  ARM: SAMSUNG: Move s3c_pm_check_* prototypes to plat/pm-common.h
  ARM: SAMSUNG: Move common save/restore helpers to separate file
  ARM: SAMSUNG: Move Samsung PM debug code into separate file
  ARM: SAMSUNG: Consolidate PM debug functions
  ARM: SAMSUNG: Use debug_ll_addr() to get UART base address
  ...
2014-04-05 15:46:37 -07:00
Linus Torvalds
cbda94e039 Merge tag 'drivers-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver changes from Arnd Bergmann:
 "These changes are mostly for ARM specific device drivers that either
  don't have an upstream maintainer, or that had the maintainer ask us
  to pick up the changes to avoid conflicts.

  A large chunk of this are clock drivers (bcm281xx, exynos, versatile,
  shmobile), aside from that, reset controllers for STi as well as a
  large rework of the Marvell Orion/EBU watchdog driver are notable"

* tag 'drivers-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (99 commits)
  Revert "dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac."
  Revert "net: stmmac: Add SOCFPGA glue driver"
  ARM: shmobile: r8a7791: Fix SCIFA3-5 clocks
  ARM: STi: Add reset controller support to mach-sti Kconfig
  drivers: reset: stih416: add softreset controller
  drivers: reset: stih415: add softreset controller
  drivers: reset: Reset controller driver for STiH416
  drivers: reset: Reset controller driver for STiH415
  drivers: reset: STi SoC system configuration reset controller support
  dts: socfpga: Add sysmgr node so the gmac can use to reference
  dts: socfpga: Add support for SD/MMC on the SOCFPGA platform
  reset: Add optional resets and stubs
  ARM: shmobile: r7s72100: fix bus clock calculation
  Power: Reset: Generalize qnap-poweroff to work on Synology devices.
  dts: socfpga: Update clock entry to support multiple parents
  ARM: socfpga: Update socfpga_defconfig
  dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac.
  net: stmmac: Add SOCFPGA glue driver
  watchdog: orion_wdt: Use %pa to print 'phys_addr_t'
  drivers: cci: Export CCI PMU revision
  ...
2014-04-05 15:37:40 -07:00
Linus Torvalds
f83ccb9358 Merge tag 'dt-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC device tree changes from Arnd Bergmann:
 "A large part of the arm-soc patches are nowadays DT changes, adding
  support for new SoCs, boards and devices without changing kernel
  source.  The plan is still to move the devicetree files out of the
  kernel tree and reduce the amount of churn going on here, but we keep
  finding reasons to delay doing that.

  Changes are really all over the place, with little sticking out
  particularly.  We have contributions from a total of 116 people in
  this branch.

  Unfortunately, the size of this branch also causes a significant
  number of conflicts at the moment, typically when subsystem
  maintainers merge patches that change the driver at the same time as
  the dts files.  In most cases this could be avoided because the dts
  changes are supposed to be compatible in both ways, and we are asking
  everyone to send ARM dts changes through our tree only"

* tag 'dt-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (541 commits)
  dts: stmmac: Document the clocks property in the stmmac base document
  dts: socfpga: Add DTS entry for adding the stmmac glue layer for stmmac.
  ARM: STi: stih41x: Add support for the FSM Serial Flash Controller
  ARM: STi: stih416: Add support for the FSM Serial Flash Controller
  ARM: tegra: fix Dalmore pinctrl configuration
  ARM: dts: keystone: use common "ti,keystone" compatible instead of -evm
  ARM: dts: k2hk-evm: set ubifs partition size for 512M NAND
  ARM: dts: Build all keystone dt blobs
  ARM: dts: keystone: Fix control register range for clktsip
  ARM: dts: keystone: Fix domain register range for clkfftc1
  ARM: dts: bcm28155-ap: leave camldo1 on to fix reboot
  ARM: dts: add bcm590xx pmu support and enable for bcm28155-ap
  ARM: dts: bcm21664: Add device tree files.
  ARM: DT: bcm21664: Device tree bindings
  ARM: efm32: properly namespace i2c location property
  ARM: efm32: fix unit address part in USART2 device nodes' names
  ARM: mvebu: Enable NAND controller in Armada 385-DB
  ARM: mvebu: Add support for NAND controller in Armada 38x SoC
  ARM: mvebu: Add the Core Divider clock to Armada 38x SoCs
  ARM: mvebu: Add a 2 GHz fixed-clock on Armada 38x SoCs
  ...
2014-04-05 15:29:04 -07:00
Linus Torvalds
ff050ad12c Merge tag 'soc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC specific changes from Arnd Bergmann:
 "Lots of changes specific to one of the SoC families.  Some that stick
  out are:

   - mach-qcom gains new features, most importantly SMP support for the
     newer chips (Stephen Boyd, Rohit Vaswani)
   - mvebu gains support for three new SoCs: Armada 375, 380 and 385
     (Thomas Petazzoni and Free-electrons team)
   - SMP support for Rockchips (Heiko Stübner)
   - Lots of i.MX changes (Shawn Guo)
   - Added support for BCM5301x SoC (Hauke Mehrtens)
   - Multiplatform support for Marvell Kirkwood and Dove (Andrew Lunn
     and Sebastian Hesselbarth doing the final part of a long journey)
   - Unify davinci platforms and remove obsolete ones (Sekhar Nori, Arnd
     Bergmann)"

* tag 'soc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (126 commits)
  ARM: sunxi: Select HAVE_ARM_ARCH_TIMER
  ARM: cache-tauros2: remove ARMv6 code
  ARM: mvebu: don't select CONFIG_NEON
  ARM: davinci: fix DT booting with default defconfig
  ARM: configs: bcm_defconfig: enable bcm590xx regulator support
  ARM: davinci: remove tnetv107x support
  MAINTAINERS: Update ARM STi maintainers
  ARM: restrict BCM_KONA_UART to ARCH_BCM_MOBILE
  ARM: bcm21664: Add board support.
  ARM: sunxi: Add the new watchog compatibles to the reboot code
  ARM: enable ARM_HAS_SG_CHAIN for multiplatform
  ARM: davinci: remove da8xx_omapl_defconfig
  ARM: davinci: da8xx: fix multiple watchdog device registration
  ARM: davinci: add da8xx specific configs to davinci_all_defconfig
  ARM: davinci: enable da8xx build concurrently with older devices
  ARM: BCM5301X: workaround suppress fault
  ARM: BCM5301X: add early debugging support
  ARM: BCM5301X: initial support for the BCM5301X/BCM470X SoCs with ARM CPU
  ARM: mach-bcm: Remove GENERIC_TIME
  ARM: shmobile: APMU: Fix warnings due to improper printk formats
  ...
2014-04-05 14:19:54 -07:00
Linus Torvalds
dfc25e4503 Merge tag 'cleanup-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Arnd Bergmann:
 "These cleanup patches are mainly move stuff around and should all be
  harmless.  They are mainly split out so that other branches can be
  based on top to avoid conflicts.

  Notable changes are:

   - We finally remove all mach/timex.h, after CLOCK_TICK_RATE is no
     longer used (Uwe Kleine-König)
   - The Qualcomm MSM platform is split out into legacy mach-msm and
     new-style mach-qcom, to allow easier maintainance of the new
     hardware support without regressions (Kumar Gala)
   - A rework of some of the Kconfig logic to simplify multiplatform
     support (Rob Herring)
   - Samsung Exynos gets closer to supporting multiplatform (Sachin
     Kamat and others)
   - mach-bcm3528 gets merged into mach-bcm (Stephen Warren)
   - at91 gains some common clock framework support (Alexandre Belloni,
     Jean-Jacques Hiblot and other French people)"

* tag 'cleanup-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (89 commits)
  ARM: hisi: select HAVE_ARM_SCU only for SMP
  ARM: efm32: allow uncompress debug output
  ARM: prima2: build reset code standalone
  ARM: at91: add PWM clock
  ARM: at91: move sam9261 SoC to common clk
  ARM: at91: prepare common clk transition for sam9261 SoC
  ARM: at91: updated the at91_dt_defconfig with support for the ADS7846
  ARM: at91: dt: sam9261: Device Tree support for the at91sam9261ek
  ARM: at91: dt: defconfig: Added the sam9261 to the list of DT-enabled SOCs
  ARM: at91: dt: Add at91sam9261 dt SoC support
  ARM: at91: switch sam9rl to common clock framework
  ARM: at91/dt: define main clk frequency of at91sam9rlek
  ARM: at91/dt: define at91sam9rl clocks
  ARM: at91: prepare common clk transition for sam9rl SoCs
  ARM: at91: prepare sam9 dt boards transition to common clk
  ARM: at91: dt: sam9rl: Device Tree for the at91sam9rlek
  ARM: at91/defconfig: Add the sam9rl to the list of DT-enabled SOCs
  ARM: at91: Add at91sam9rl DT SoC support
  ARM: at91: prepare at91sam9rl DT transition
  ARM: at91/defconfig: refresh at91sam9260_9g20_defconfig
  ...
2014-04-05 13:51:19 -07:00
Linus Torvalds
9f800363bb Merge tag 'fixes-non-critical-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC non-critical bug fixes from Arnd Bergmann:
 "Lots of isolated bug fixes that were not found to be important enough
  to be submitted before the merge window or backported into stable
  kernels.

  The vast majority of these came out of Arnd's randconfig testing and
  just prevents running into build-time bugs in configurations that we
  do not care about in practice"

* tag 'fixes-non-critical-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (75 commits)
  ARM: at91: fix a typo
  ARM: moxart: fix CPU selection
  ARM: tegra: fix board DT pinmux setup
  ARM: nspire: Fix compiler warning
  IXP4xx: Fix DMA masks.
  Revert "ARM: ixp4xx: Make dma_set_coherent_mask common, correct implementation"
  IXP4xx: Fix Goramo Multilink GPIO conversion.
  Revert "ARM: ixp4xx: fix gpio rework"
  ARM: tegra: make debug_ll code build for ARMv6
  ARM: sunxi: fix build for THUMB2_KERNEL
  ARM: exynos: add missing include of linux/module.h
  ARM: exynos: fix l2x0 saved regs handling
  ARM: samsung: select CRC32 for SAMSUNG_PM_CHECK
  ARM: samsung: select ATAGS where necessary
  ARM: samsung: fix SAMSUNG_PM_DEBUG Kconfig logic
  ARM: samsung: allow serial driver to be disabled
  ARM: s5pv210: enable IDE support in MACH_TORBRECK
  ARM: s5p64x0: fix building with only one soc type
  ARM: s3c64xx: select power domains only when used
  ARM: s3c64xx: MACH_SMDK6400 needs HSMMC1
  ...
2014-04-05 13:44:27 -07:00
Linus Torvalds
2d1eb87ae1 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM changes from Russell King:

 - Perf updates from Will Deacon:
   - Support for Qualcomm Krait processors (run perf on your phone!)
   - Support for Cortex-A12 (run perf stat on your FPGA!)
   - Support for perf_sample_event_took, allowing us to automatically decrease
     the sample rate if we can't handle the PMU interrupts quickly enough
     (run perf record on your FPGA!).

 - Basic uprobes support from David Long:
     This patch series adds basic uprobes support to ARM. It is based on
     patches developed earlier by Rabin Vincent. That approach of adding
     hooks into the kprobes instruction parsing code was not well received.
     This approach separates the ARM instruction parsing code in kprobes out
     into a separate set of functions which can be used by both kprobes and
     uprobes. Both kprobes and uprobes then provide their own semantic action
     tables to process the results of the parsing.

 - ARMv7M (microcontroller) updates from Uwe Kleine-König

 - OMAP DMA updates (recently added Vinod's Ack even though they've been
   sitting in linux-next for a few months) to reduce the reliance of
   omap-dma on the code in arch/arm.

 - SA11x0 changes from Dmitry Eremin-Solenikov and Alexander Shiyan

 - Support for Cortex-A12 CPU

 - Align support for ARMv6 with ARMv7 so they can cooperate better in a
   single zImage.

 - Addition of first AT_HWCAP2 feature bits for ARMv8 crypto support.

 - Removal of IRQ_DISABLED from various ARM files

 - Improved efficiency of virt_to_page() for single zImage

 - Patch from Ulf Hansson to permit runtime PM callbacks to be available for
   AMBA devices for suspend/resume as well.

 - Finally kill asm/system.h on ARM.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (89 commits)
  dmaengine: omap-dma: more consolidation of CCR register setup
  dmaengine: omap-dma: move IRQ handling to omap-dma
  dmaengine: omap-dma: move register read/writes into omap-dma.c
  ARM: omap: dma: get rid of 'p' allocation and clean up
  ARM: omap: move dma channel allocation into plat-omap code
  ARM: omap: dma: get rid of errata global
  ARM: omap: clean up DMA register accesses
  ARM: omap: remove almost-const variables
  ARM: omap: remove references to disable_irq_lch
  dmaengine: omap-dma: cleanup errata 3.3 handling
  dmaengine: omap-dma: provide register read/write functions
  dmaengine: omap-dma: use cached CCR value when enabling DMA
  dmaengine: omap-dma: move barrier to omap_dma_start_desc()
  dmaengine: omap-dma: move clnk_ctrl setting to preparation functions
  dmaengine: omap-dma: improve efficiency loading C.SA/C.EI/C.FI registers
  dmaengine: omap-dma: consolidate clearing channel status register
  dmaengine: omap-dma: move CCR buffering disable errata out of the fast path
  dmaengine: omap-dma: provide register definitions
  dmaengine: omap-dma: consolidate setup of CCR
  dmaengine: omap-dma: consolidate setup of CSDP
  ...
2014-04-05 13:20:43 -07:00
Dave Airlie
9f97ba806a Merge tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel into drm-next
Merge window -fixes pull request as usual. Well, I did sneak in Jani's
drm_i915_private_t typedef removal, need to have fun with a big sed job
too ;-)

Otherwise:
- hdmi interlaced fixes (Jesse&Ville)
- pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but
  not cc: stable since only really relevant for igt runs)
- large cursor wm fixes (Chris)
- fix gpu turbo boost/throttle again, was getting stuck due to vlv rps
  patches (Chris+Imre)
- fix runtime pm fallout (Paulo)
- bios framebuffer inherit fix (Chris)
- a few smaller things

* tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits)
  Skip intel_crt_init for Dell XPS 8700
  drm/i915: vlv: fix RPS interrupt mask setting
  Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec"
  drm/i915: move power domain init earlier during system resume
  drm/i915: Fix the computation of required fb size for pipe
  drm/i915: don't get/put runtime PM at the debugfs forcewake file
  drm/i915: fix WARNs when reading DDI state while suspended
  drm/i915: don't read cursor registers on powered down pipes
  drm/i915: get runtime PM at i915_display_info
  drm/i915: don't read pp_ctrl_reg if we're suspended
  drm/i915: get runtime PM at i915_reg_read_ioctl
  drm/i915: don't schedule force_wake_timer at gen6_read
  drm/i915: vlv: reserve the GT power context only once during driver init
  drm/i915: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t
  drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t
  ...
2014-04-05 16:14:21 +10:00
Dave Airlie
82c68b6ccd Merge tag 'drm/tegra/for-3.15-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Changes for v3.15-rc1

Implement eDP support for Tegra124 and support the PRIME vmap()/vunmap()
operations.

A symbol that is required for upcoming V4L2 support is now exported by
the host1x driver.

Relicense drivers under the GPL v2 for consistency. One exception is the
public header file, which is relicensed under MIT to abide by the common
rule.

* tag 'drm/tegra/for-3.15-rc1' of git://anongit.freedesktop.org/tegra/linux:
  drm/tegra: Use standard GPL v2 license text
  drm/tegra: Relicense under GPL v2
  drm/tegra: Relicense public header under MIT
  drm/tegra: Add eDP support
  gpu: host1x: export host1x_syncpt_incr_max() function
  drm/tegra: prime: Add vmap support
2014-04-05 16:13:08 +10:00
Linus Torvalds
8e0c083234 Merge tag 'fbdev-main-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev changes from Tomi Valkeinen:
 "Various fbdev fixes and improvements, but nothing big"

* tag 'fbdev-main-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (38 commits)
  fbdev: Make the switch from generic to native driver less alarming
  Video: atmel: avoid the id of fix screen info is overwritten
  video: imxfb: Add DT default contrast control register property.
  video: atmel_lcdfb: ensure the hardware is initialized with the correct mode
  fbdev: vesafb: add dev->remove() callback
  fbdev: efifb: add dev->remove() callback
  video: pxa3xx-gcu: switch to devres functions
  video: pxa3xx-gcu: provide an empty .open call
  video: pxa3xx-gcu: pass around struct device *
  video: pxa3xx-gcu: rename some symbols
  sisfb: fix 1280x720 resolution support
  video: fbdev: uvesafb: Remove impossible code path in uvesafb_init_info
  video: fbdev: uvesafb: Remove redundant NULL check in uvesafb_remove
  fbdev: FB_OPENCORES should depend on HAS_DMA
  OMAPDSS: convert pixel clock to common videomode style
  OMAPDSS: Remove unused get_context_loss_count support
  OMAPDSS: use DISPC register to detect context loss
  video: da8xx-fb: Use "SIMPLE_DEV_PM_OPS" macro
  video: imxfb: Convert to SIMPLE_DEV_PM_OPS
  video: imxfb: Resolve mismatch between backlight/contrast
  ...
2014-04-04 21:28:36 -07:00
Linus Torvalds
d1d9cfc330 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random changes from Ted Ts'o:
 "A number of cleanups plus support for the RDSEED instruction, which
  will be showing up in Intel Broadwell CPU's"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: Add arch_has_random[_seed]()
  random: If we have arch_get_random_seed*(), try it before blocking
  random: Use arch_get_random_seed*() at init time and once a second
  x86, random: Enable the RDSEED instruction
  random: use the architectural HWRNG for the SHA's IV in extract_buf()
  random: clarify bits/bytes in wakeup thresholds
  random: entropy_bytes is actually bits
  random: simplify accounting code
  random: tighten bound on random_read_wakeup_thresh
  random: forget lock in lockless accounting
  random: simplify accounting logic
  random: fix comment on "account"
  random: simplify loop in random_read
  random: fix description of get_random_bytes
  random: fix comment on proc_do_uuid
  random: fix typos / spelling errors in comments
2014-04-04 21:25:28 -07:00
Ilya Dryomov
18cb95af2d libceph: enable PRIMARY_AFFINITY feature bit
Announce our support for osdmaps with non-default primary affinity
values.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:20 -07:00
Ilya Dryomov
8008ab1080 libceph: return primary from ceph_calc_pg_acting()
In preparation for adding support for primary_temp, stop assuming
primaryness: add a primary out parameter to ceph_calc_pg_acting() and
change call sites accordingly.  Primary is now specified separately
from the order of osds in the set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:14 -07:00
Ilya Dryomov
ac972230e2 libceph: switch ceph_calc_pg_acting() to new helpers
Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(),
raw_to_up_osds() and apply_temps().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:13 -07:00
Ilya Dryomov
2abebdbca7 libceph: ceph_can_shift_osds(pool) and pool type defines
Bring in pg_pool_t::can_shift_osds() counterpart along with pool type
defines.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:08 -07:00
Ilya Dryomov
246138fa67 libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions
Sync up with ceph.git definitions.  Bring in ceph_osd_is_down().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:07 -07:00
Ilya Dryomov
ddf3a21a03 libceph: enable OSDMAP_ENC feature bit
Announce our support for "new" (v7 - split and separately versioned
client and osd sections) osdmap enconding.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:05 -07:00
Ilya Dryomov
2cfa34f2d6 libceph: primary_affinity infrastructure
Add primary_affinity infrastructure.  primary_affinity values are
stored in an max_osd-sized array, hanging off ceph_osdmap, similar to
a osd_weight array.

Introduce {get,set}_primary_affinity() helpers, primarily to return
CEPH_OSD_DEFAULT_PRIMARY_AFFINITY when no affinity has been set and to
abstract out osd_primary_affinity array allocation and initialization.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:08:02 -07:00
Ilya Dryomov
9686f94c8c libceph: primary_temp infrastructure
Add primary_temp mappings infrastructure.  struct ceph_pg_mapping is
overloaded, primary_temp mappings are stored in an rb-tree, rooted at
ceph_osdmap, in a manner similar to pg_temp mappings.

Dump primary_temp mappings to /sys/kernel/debug/ceph/<client>/osdmap,
one 'primary_temp <pgid> <osd>' per line, e.g:

    primary_temp 2.6 4

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:58 -07:00
Ilya Dryomov
35a935d75d libceph: generalize ceph_pg_mapping
In preparation for adding support for primary_temp mappings, generalize
struct ceph_pg_mapping so it can hold mappings other than pg_temp.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:57 -07:00
Ilya Dryomov
a2505d63ee libceph: split osdmap allocation and decode steps
Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-04-04 21:07:37 -07:00
Ilya Dryomov
07bd7de47a crush: support chooseleaf_vary_r tunable (tunables3) by default
Add TUNABLES3 feature (chooseleaf_vary_r tunable) to a set of features
supported by default.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:29 -07:00
Ilya Dryomov
d83ed858f1 crush: add SET_CHOOSELEAF_VARY_R step
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:28 -07:00
Ilya Dryomov
e2b149cc4b crush: add chooseleaf_vary_r tunable
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-04-04 21:07:26 -07:00
Yan, Zheng
eb13e832f8 ceph: use fl->fl_file as owner identifier of flock and posix lock
flock and posix lock should use fl->fl_file instead of process ID
as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner
is usually equal to fl->fl_file, but it also can be a customized
value). The process ID of who holds the lock is just for F_GETLK
fcntl(2).

The fix is rename the 'pid' fields of struct ceph_mds_request_args
and struct ceph_filelock to 'owner', rename 'pid_namespace' fields
to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages.
We also set the most significant bit of the 'owner' field. MDS can
use that bit to distinguish between old and new clients.

The MDS counterpart of this patch modifies the flock code to not
take the 'pid_namespace' into consideration when checking conflict
locks.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 21:07:11 -07:00