Commit Graph

48594 Commits

Author SHA1 Message Date
Linus Torvalds
6c0f568e84 Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - a few misc things

 - Andy's "ambient capabilities"

 - fs/nofity updates

 - the ocfs2 queue

 - kernel/watchdog.c updates and feature work.

 - some of MM.  Includes Andrea's userfaultfd feature.

[ Hadn't noticed that userfaultfd was 'default y' when applying the
  patches, so that got fixed in this merge instead.  We do _not_ mark
  new features that nobody uses yet 'default y'   - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  mm/hugetlb.c: make vma_has_reserves() return bool
  mm/madvise.c: make madvise_behaviour_valid() return bool
  mm/memory.c: make tlb_next_batch() return bool
  mm/dmapool.c: change is_page_busy() return from int to bool
  mm: remove struct node_active_region
  mremap: simplify the "overlap" check in mremap_to()
  mremap: don't do uneccesary checks if new_len == old_len
  mremap: don't do mm_populate(new_addr) on failure
  mm: move ->mremap() from file_operations to vm_operations_struct
  mremap: don't leak new_vma if f_op->mremap() fails
  mm/hugetlb.c: make vma_shareable() return bool
  mm: make GUP handle pfn mapping unless FOLL_GET is requested
  mm: fix status code which move_pages() returns for zero page
  mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
  genalloc: add support of multiple gen_pools per device
  genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
  mm/memblock: WARN_ON when nid differs from overlap region
  Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
  mm: defer flush of writable TLB entries
  mm: send one IPI per CPU to TLB flush all entries after unmapping pages
  ...
2015-09-05 14:27:38 -07:00
minkyung88.kim
4e6dab4233 mm: remove struct node_active_region
struct node_active_region is not used anymore.  Remove it.

Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Oleg Nesterov
5477e70a64 mm: move ->mremap() from file_operations to vm_operations_struct
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users.  Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy
c98c36355d genalloc: add support of multiple gen_pools per device
This change fills devm_gen_pool_create()/gen_pool_get() "name" argument
stub with contents and extends of_gen_pool_get() functionality on this
basis.

If there is no associated platform device with a device node passed to
of_gen_pool_get(), the function attempts to get a label property or device
node name (= repeats MTD OF partition standard) and seeks for a named
gen_pool registered by device of the parent device node.

The main idea of the change is to allow registration of independent
gen_pools under the same umbrella device, say "partitions" on "storage
device", the original functionality of one "partition" per "storage
device" is untouched.

[akpm@linux-foundation.org: fix constness in devres_find()]
[dan.carpenter@oracle.com: freeing const data pointers]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Vladimir Zapolskiy
7385817359 genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
This change modifies gen_pool_get() and devm_gen_pool_create() client
interfaces adding one more argument "name" of a gen_pool object.

Due to implementation gen_pool_get() is capable to retrieve only one
gen_pool associated with a device even if multiple gen_pools are created,
fortunately right at the moment it is sufficient for the clients, hence
provide NULL as a valid argument on both producer devm_gen_pool_create()
and consumer gen_pool_get() sides.

Because only one created gen_pool per device is addressable, explicitly
add a restriction to devm_gen_pool_create() to create only one gen_pool
per device, this implies two possible error codes returned by the
function, account it on client side (only misc/sram).  This completes
client side changes related to genalloc updates.

[akpm@linux-foundation.org: gen_pool_get() cleanup]
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
d950c9477d mm: defer flush of writable TLB entries
If a PTE is unmapped and it's dirty then it was writable recently.  Due to
deferred TLB flushing, it's best to assume a writable TLB cache entry
exists.  With that assumption, the TLB must be flushed before any IO can
start or the page is freed to avoid lost writes or data corruption.  This
patch defers flushing of potentially writable TLBs as long as possible.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
72b252aed5 mm: send one IPI per CPU to TLB flush all entries after unmapping pages
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Mel Gorman
5b74283ab2 x86, mm: trace when an IPI is about to be sent
When unmapping pages it is necessary to flush the TLB.  If that page was
accessed by another CPU then an IPI is used to flush the remote CPU.  That
is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
second.

There already is a window between when a page is unmapped and when it is
TLB flushed.  This series increases the window so multiple pages can be
flushed using a single IPI.  This should be safe or the kernel is hosed
already.

Patch 1 simply made the rest of the series easier to write as ftrace
        could identify all the senders of TLB flush IPIS.

Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
        to flush the entire TLB.

Patch 3 tracks when there potentially are writable TLB entries that
        need to be batched differently

Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes

The performance impact is documented in the changelogs but in the optimistic
case on a 4-socket machine the full series reduces interrupts from 900K
interrupts/second to 60K interrupts/second.

This patch (of 4):

It is easy to trace when an IPI is received to flush a TLB but harder to
detect what event sent it.  This patch makes it easy to identify the
source of IPIs being transmitted for TLB flushes on x86.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
c1a4de99fa userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation
This implements mcopy_atomic and mfill_zeropage that are the lowlevel
VM methods that are invoked respectively by the UFFDIO_COPY and
UFFDIO_ZEROPAGE userfaultfd commands.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
1380fca084 userfaultfd: activate syscall
This activates the userfaultfd syscall.

[sfr@canb.auug.org.au: activate syscall fix]
[akpm@linux-foundation.org: don't enable userfaultfd on powerpc]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
19a809afe2 userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx
vma->vm_userfaultfd_ctx is yet another vma parameter that vma_merge
must be aware about so that we can merge vmas back like they were
originally before arming the userfaultfd on some memory range.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
16ba6f811d userfaultfd: add VM_UFFD_MISSING and VM_UFFD_WP
These two flags gets set in vma->vm_flags to tell the VM common code
if the userfaultfd is armed and in which mode (only tracking missing
faults, only tracking wrprotect faults or both). If neither flags is
set it means the userfaultfd is not armed on the vma.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
745f234be1 userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct
This adds the vm_userfaultfd_ctx to the vm_area_struct.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
932b18e0ae userfaultfd: linux/userfaultfd_k.h
Kernel header defining the methods needed by the VM common code to
interact with the userfaultfd.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
51360155ec userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key
userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead
of the current hardcoded 1 (that would wake just the first waitqueue in
the head list).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Christoph Lameter
484748f0b6 slab: infrastructure for bulk object allocation and freeing
Add the basic infrastructure for alloc/free operations on pointer arrays.
It includes a generic function in the common slab code that is used in
this infrastructure patch to create the unoptimized functionality for slab
bulk operations.

Allocators can then provide optimized allocation functions for situations
in which large numbers of objects are needed.  These optimization may
avoid taking locks repeatedly and bypass metadata creation if all objects
in slab pages can be used to provide the objects required.

Allocators can extend the skeletons provided and add their own code to the
bulk alloc and free functions.  They can keep the generic allocation and
freeing and just fall back to those if optimizations would not work (like
for example when debugging is on).

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Ulrich Obergfell
ec6a90661a watchdog: rename watchdog_suspend() and watchdog_resume()
Rename watchdog_suspend() to lockup_detector_suspend() and
watchdog_resume() to lockup_detector_resume() to avoid confusion with the
watchdog subsystem and to be consistent with the existing name
lockup_detector_init().

Also provide comment blocks to explain the watchdog_running and
watchdog_suspended variables and their relationship.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Ulrich Obergfell
999bbe49ea watchdog: use suspend/resume interface in fixup_ht_bug()
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
these functions are no longer needed.  If a subsystem has a need to
deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.

[akpm@linux-foundation.org: fix build with CONFIG_LOCKUP_DETECTOR=m]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Ulrich Obergfell
8c073d27d7 watchdog: introduce watchdog_suspend() and watchdog_resume()
This interface can be utilized to deactivate the hard and soft lockup
detector temporarily.  Callers are expected to minimize the duration of
deactivation.  Multiple deactivations are allowed to occur in parallel but
should be rare in practice.

[akpm@linux-foundation.org: remove unneeded static initialization]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Stephane Eranian <eranian@google.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Guenter Roeck
aacfbe6a97 kernel/watchdog: move NMI function header declarations from watchdog.h to nmi.h
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.

The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Frederic Weisbecker
230ec93909 smpboot: allow passing the cpumask on per-cpu thread registration
It makes the registration cheaper and simpler for the smpboot per-cpu
kthread users that don't need to always update the cpumask after threads
creation.

[sfr@canb.auug.org.au: fix for allow passing the cpumask on per-cpu thread registration]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Kees Cook
a068acf2ee fs: create and use seq_show_option for escaping
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
4712e722f9 fsnotify: get rid of fsnotify_destroy_mark_locked()
fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex.  To avoid future problems with this
function, split it into two.

fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex.  This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
925d1132a0 fsnotify: remove mark->free_list
Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away.  However we can free all of
the marks without using a special list with some care.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
1e39fc0183 fsnotify: document mark locking
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andy Lutomirski
58319057b7 capabilities: ambient capabilities
Credit where credit is due: this idea comes from Christoph Lameter with
a lot of valuable input from Serge Hallyn.  This patch is heavily based
on Christoph's patch.

===== The status quo =====

On Linux, there are a number of capabilities defined by the kernel.  To
perform various privileged tasks, processes can wield capabilities that
they hold.

Each task has four capability masks: effective (pE), permitted (pP),
inheritable (pI), and a bounding set (X).  When the kernel checks for a
capability, it checks pE.  The other capability masks serve to modify
what capabilities can be in pE.

Any task can remove capabilities from pE, pP, or pI at any time.  If a
task has a capability in pP, it can add that capability to pE and/or pI.
If a task has CAP_SETPCAP, then it can add any capability to pI, and it
can remove capabilities from X.

Tasks are not the only things that can have capabilities; files can also
have capabilities.  A file can have no capabilty information at all [1].
If a file has capability information, then it has a permitted mask (fP)
and an inheritable mask (fI) as well as a single effective bit (fE) [2].
File capabilities modify the capabilities of tasks that execve(2) them.

A task that successfully calls execve has its capabilities modified for
the file ultimately being excecuted (i.e.  the binary itself if that
binary is ELF or for the interpreter if the binary is a script.) [3] In
the capability evolution rules, for each mask Z, pZ represents the old
value and pZ' represents the new value.  The rules are:

  pP' = (X & fP) | (pI & fI)
  pI' = pI
  pE' = (fE ? pP' : 0)
  X is unchanged

For setuid binaries, fP, fI, and fE are modified by a moderately
complicated set of rules that emulate POSIX behavior.  Similarly, if
euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
(primary, fP and fI usually end up being the full set).  For nonroot
users executing binaries with neither setuid nor file caps, fI and fP
are empty and fE is false.

As an extra complication, if you execute a process as nonroot and fE is
set, then the "secure exec" rules are in effect: AT_SECURE gets set,
LD_PRELOAD doesn't work, etc.

This is rather messy.  We've learned that making any changes is
dangerous, though: if a new kernel version allows an unprivileged
program to change its security state in a way that persists cross
execution of a setuid program or a program with file caps, this
persistent state is surprisingly likely to allow setuid or file-capped
programs to be exploited for privilege escalation.

===== The problem =====

Capability inheritance is basically useless.

If you aren't root and you execute an ordinary binary, fI is zero, so
your capabilities have no effect whatsoever on pP'.  This means that you
can't usefully execute a helper process or a shell command with elevated
capabilities if you aren't root.

On current kernels, you can sort of work around this by setting fI to
the full set for most or all non-setuid executable files.  This causes
pP' = pI for nonroot, and inheritance works.  No one does this because
it's a PITA and it isn't even supported on most filesystems.

If you try this, you'll discover that every nonroot program ends up with
secure exec rules, breaking many things.

This is a problem that has bitten many people who have tried to use
capabilities for anything useful.

===== The proposed change =====

This patch adds a fifth capability mask called the ambient mask (pA).
pA does what most people expect pI to do.

pA obeys the invariant that no bit can ever be set in pA if it is not
set in both pP and pI.  Dropping a bit from pP or pI drops that bit from
pA.  This ensures that existing programs that try to drop capabilities
still do so, with a complication.  Because capability inheritance is so
broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
then calling execve effectively drops capabilities.  Therefore,
setresuid from root to nonroot conditionally clears pA unless
SECBIT_NO_SETUID_FIXUP is set.  Processes that don't like this can
re-add bits to pA afterwards.

The capability evolution rules are changed:

  pA' = (file caps or setuid or setgid ? 0 : pA)
  pP' = (X & fP) | (pI & fI) | pA'
  pI' = pI
  pE' = (fE ? pP' : pA')
  X is unchanged

If you are nonroot but you have a capability, you can add it to pA.  If
you do so, your children get that capability in pA, pP, and pE.  For
example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
automatically bind low-numbered ports.  Hallelujah!

Unprivileged users can create user namespaces, map themselves to a
nonzero uid, and create both privileged (relative to their namespace)
and unprivileged process trees.  This is currently more or less
impossible.  Hallelujah!

You cannot use pA to try to subvert a setuid, setgid, or file-capped
program: if you execute any such program, pA gets cleared and the
resulting evolution rules are unchanged by this patch.

Users with nonzero pA are unlikely to unintentionally leak that
capability.  If they run programs that try to drop privileges, dropping
privileges will still work.

It's worth noting that the degree of paranoia in this patch could
possibly be reduced without causing serious problems.  Specifically, if
we allowed pA to persist across executing non-pA-aware setuid binaries
and across setresuid, then, naively, the only capabilities that could
leak as a result would be the capabilities in pA, and any attacker
*already* has those capabilities.  This would make me nervous, though --
setuid binaries that tried to privilege-separate might fail to do so,
and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
unexpected side effects.  (Whether these unexpected side effects would
be exploitable is an open question.) I've therefore taken the more
paranoid route.  We can revisit this later.

An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
ambient capabilities.  I think that this would be annoying and would
make granting otherwise unprivileged users minor ambient capabilities
(CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
it is with this patch.

===== Footnotes =====

[1] Files that are missing the "security.capability" xattr or that have
unrecognized values for that xattr end up with has_cap set to false.
The code that does that appears to be complicated for no good reason.

[2] The libcap capability mask parsers and formatters are dangerously
misleading and the documentation is flat-out wrong.  fE is *not* a mask;
it's a single bit.  This has probably confused every single person who
has tried to use file capabilities.

[3] Linux very confusingly processes both the script and the interpreter
if applicable, for reasons that elude me.  The results from thinking
about a script's file capabilities and/or setuid bits are mostly
discarded.

Preliminary userspace code is here, but it needs updating:
https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2

Here is a test program that can be used to verify the functionality
(from Christoph):

/*
 * Test program for the ambient capabilities. This program spawns a shell
 * that allows running processes with a defined set of capabilities.
 *
 * (C) 2015 Christoph Lameter <cl@linux.com>
 * Released under: GPL v3 or later.
 *
 *
 * Compile using:
 *
 *	gcc -o ambient_test ambient_test.o -lcap-ng
 *
 * This program must have the following capabilities to run properly:
 * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
 *
 * A command to equip the binary with the right caps is:
 *
 *	setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
 *
 *
 * To get a shell with additional caps that can be inherited by other processes:
 *
 *	./ambient_test /bin/bash
 *
 *
 * Verifying that it works:
 *
 * From the bash spawed by ambient_test run
 *
 *	cat /proc/$$/status
 *
 * and have a look at the capabilities.
 */

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <cap-ng.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Definitions from the kernel header files. These are going to be removed
 * when the /usr/include files have these defined.
 */
#define PR_CAP_AMBIENT 47
#define PR_CAP_AMBIENT_IS_SET 1
#define PR_CAP_AMBIENT_RAISE 2
#define PR_CAP_AMBIENT_LOWER 3
#define PR_CAP_AMBIENT_CLEAR_ALL 4

static void set_ambient_cap(int cap)
{
	int rc;

	capng_get_caps_process();
	rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
	if (rc) {
		printf("Cannot add inheritable cap\n");
		exit(2);
	}
	capng_apply(CAPNG_SELECT_CAPS);

	/* Note the two 0s at the end. Kernel checks for these */
	if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
		perror("Cannot set cap");
		exit(1);
	}
}

int main(int argc, char **argv)
{
	int rc;

	set_ambient_cap(CAP_NET_RAW);
	set_ambient_cap(CAP_NET_ADMIN);
	set_ambient_cap(CAP_SYS_NICE);

	printf("Ambient_test forking shell\n");
	if (execv(argv[1], argv + 1))
		perror("Cannot exec");

	return 0;
}

Signed-off-by: Christoph Lameter <cl@linux.com> # Original author
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrew Morton
e9f069868d kernel/kthread.c:kthread_create_on_node(): clarify documentation
- Make it clear that the `node' arg refers to memory allocations only:
  kthread_create_on_node() does not pin the new thread to that node's
  CPUs.

- Encourage the use of NUMA_NO_NODE.

[nzimmer@sgi.com: use NUMA_NO_NODE in kthread_create() also]
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
f377ea88b8 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main pull request for the drm for 4.3.  Nouveau is
  probably the biggest amount of changes in here, since it missed 4.2.
  Highlights below, along with the usual bunch of fixes.

  All stuff outside drm should have applicable acks.

  Highlights:

   - new drivers:
        freescale dcu kms driver

   - core:
        more atomic fixes
        disable some dri1 interfaces on kms drivers
        drop fb panic handling, this was just getting more broken, as more locking was required.
        new core fbdev Kconfig support - instead of each driver enable/disabling it
        struct_mutex cleanups

   - panel:
        more new panels
        cleanup Kconfig

   - i915:
        Skylake support enabled by default
        legacy modesetting using atomic infrastructure
        Skylake fixes
        GEN9 workarounds

   - amdgpu:
        Fiji support
        CGS support for amdgpu
        Initial GPU scheduler - off by default
        Lots of bug fixes and optimisations.

   - radeon:
        DP fixes
        misc fixes

   - amdkfd:
        Add Carrizo support for amdkfd using amdgpu.

   - nouveau:
        long pending cleanup to complete driver,
        fully bisectable which makes it larger,
        perfmon work
        more reclocking improvements
        maxwell displayport fixes

   - vmwgfx:
        new DX device support, supports OpenGL 3.3
        screen targets support

   - mgag200:
        G200eW support
        G200e new revision support

   - msm:
        dragonboard 410c support, msm8x94 support, msm8x74v1 support
        yuv format support
        dma plane support
        mdp5 rotation
        initial hdcp

   - sti:
        atomic support

   - exynos:
        lots of cleanups
        atomic modesetting/pageflipping support
        render node support

   - tegra:
        tegra210 support (dc, dsi, dp/hdmi)
        dpms with atomic modesetting support

   - atmel:
        support for 3 more atmel SoCs
        new input formats, PRIME support.

   - dwhdmi:
        preparing to add audio support

   - rockchip:
        yuv plane support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1369 commits)
  drm/amdgpu: rename gmc_v8_0_init_compute_vmid
  drm/amdgpu: fix vce3 instance handling
  drm/amdgpu: remove ib test for the second VCE Ring
  drm/amdgpu: properly enable VM fault interrupts
  drm/amdgpu: fix warning in scheduler
  drm/amdgpu: fix buffer placement under memory pressure
  drm/amdgpu/cz: fix cz_dpm_update_low_memory_pstate logic
  drm/amdgpu: fix typo in dce11 watermark setup
  drm/amdgpu: fix typo in dce10 watermark setup
  drm/amdgpu: use top down allocation for non-CPU accessible vram
  drm/amdgpu: be explicit about cpu vram access for driver BOs (v2)
  drm/amdgpu: set MEC doorbell range for Fiji
  drm/amdgpu: implement burst NOP for SDMA
  drm/amdgpu: add insert_nop ring func and default implementation
  drm/amdgpu: add amdgpu_get_sdma_instance helper function
  drm/amdgpu: add AMDGPU_MAX_SDMA_INSTANCES
  drm/amdgpu: add burst_nop flag for sdma
  drm/amdgpu: add count field for the SDMA NOP packet v2
  drm/amdgpu: use PT for VM sync on unmap
  drm/amdgpu: make wait_event uninterruptible in push_job
  ...
2015-09-04 15:49:32 -07:00
Linus Torvalds
51e771c0d2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
 "Drivers, drivers, drivers...  No interesting input core changes this
  time"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (74 commits)
  Input: elan_i2c - use iap_version to get firmware information
  Input: max8997_haptic - fix module alias
  Input: elan_i2c - fix typos for validpage_count
  Input: psmouse - add small delay for IBM trackpoint pass-through mode
  Input: synaptics - fix handling of disabling gesture mode
  Input: elan_i2c - enable ELAN0100 acpi panels
  Input: gpio-keys - report error when disabling unsupported key
  Input: sur40 - fix error return code
  Input: sentelic - silence some underflow warnings
  Input: zhenhua - switch to using bitrev8()
  Input: cros_ec_keyb - replace KEYBOARD_CROS_EC dependency
  Input: cap11xx - add LED support
  Input: elants_i2c - fix for devm_gpiod_get API change
  Input: elan_i2c - enable asynchronous probing
  Input: elants_i2c - enable asynchronous probing
  Input: elants_i2c - wire up regulator support
  Input: do not emit unneeded EV_SYN when suspending
  Input: elants_i2c - disable idle mode before updating firmware
  MAINTAINERS: Add maintainer for atmel_mxt_ts
  Input: atmel_mxt_ts - remove warning on zero T44 count
  ...
2015-09-04 12:02:11 -07:00
Linus Torvalds
abebcdfb64 Merge tag 'sound-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "There are little changes in core part, but lots of development are
  found in drivers, especially ASoC.  The diffstat shows regmap-related
  changes for a slight API additions / changes, and that's all.

  Looking at the code size statistics, the most significant addition is
  for Intel Skylake.  (Note that SKL support is still underway, the
  codec driver is missing.) Also STI controller driver is a major
  addition as well as a few new codec drivers.

  In HD-audio side, there are fewer changes than the past.  The
  noticeable change is the support of ELD notification from i915
  graphics driver.  Thus this pull request carries a few changes in
  drm/i915.

  Other than that, USB-audio got a rewrite of runtime PM code.  It was
  initiated by lockdep warning, but resulted in a good cleanup in the
  end.

  Below are the highlights:

  Common:
   - Factoring out of AC'97 reset code from ASoC into the core helper
   - A few regmap API extensions (in case it's not pulled yet)

  ASoC:
   - New drivers for Cirrus CS4349, GTM601, InvenSense ICS43432, Realtek
     RT298 and ST STI controllers
   - Machine drivers for Rockchip systems with MAX98090 and RT5645 and
     RT5650
   - Initial driver support for Intel Skylake devices
   - Lots of rsnd cleanup and enhancements
   - A few DAPM fixes and cleanups
   - A large number of cleanups in various drivers (conversion and
     standardized to regmap, component) mostly by Lars-Peter and Axel

  HD-audio:
   - Extended HD-audio core for Intel Skylake controller support
   - Quirks for Dell headsets, Alienware 15
   - Clean up of pin-based quirk tables for Realtek codecs
   - ELD notifier implenetation for Intel HDMI/DP

  USB-audio:
   - Refactor runtime PM code to make lockdep happier"

* tag 'sound-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (411 commits)
  drm/i915: Add locks around audio component bind/unbind
  drm/i915: Drop port_mst_index parameter from pin/eld callback
  ALSA: hda - Fix missing inline for dummy snd_hdac_set_codec_wakeup()
  ALSA: hda - Wake the codec up on pin/ELD notify events
  ALSA: hda - allow codecs to access the i915 pin/ELD callback
  drm/i915: Call audio pin/ELD notify function
  drm/i915: Add audio pin sense / ELD callback
  ASoC: zx296702-i2s: Fix resource leak when unload module
  ASoC: sti_uniperif: Ensure component is unregistered when unload module
  ASoC: au1x: psc-i2s: Convert to use devm_ioremap_resource
  ASoC: sh: dma-sh7760: Convert to devm_snd_soc_register_platform
  ASoC: spear_pcm: Use devm_snd_dmaengine_pcm_register to fix resource leak
  ALSA: fireworks/bebob/dice/oxfw: fix substreams counting at vmalloc failure
  ASoC: Clean up docbook warnings
  ASoC: txx9: Convert to devm_snd_soc_register_platform
  ASoC: pxa: Convert to devm_snd_soc_register_platform
  ASoC: nuc900: Convert to devm_snd_soc_register_platform
  ASoC: blackfin: Convert to devm_snd_soc_register_platform
  ASoC: au1x: Convert to devm_snd_soc_register_platform
  ASoC: qcom: Constify asoc_qcom_lpass_cpu_dai_ops
  ...
2015-09-04 11:46:02 -07:00
Linus Torvalds
670c039dee Merge tag 'backlight-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
 - Stop using LP855X Platform Data to control regulators
 - Move PWM8941 WLED driver into Backlight
 - Remove invalid use of IS_ERR_VALUE() macro
 - Remove duplicate check for NULL data before unregistering
 - Export I2C Device ID structure

* tag 'backlight-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: tosa: Export I2C module alias information
  backlight: lp8788_bl: Delete a check before backlight_device_unregister()
  backlight: sky81452: Remove unneeded use of IS_ERR_VALUE() macro
  backlight: pm8941-wled: Move PM8941 WLED driver to backlight
  backlight: lp855x: Use private data for regulator control
2015-09-04 11:40:40 -07:00
Linus Torvalds
8bd8fd0a29 Merge tag 'mfd-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "New Device Support:
   - New Clocksource driver from ST
   - New MFD/ACPI/DMA drivers for Intel's Sunrisepoint PCH based platforms
   - Add support for Arizona WM8998 and WM1814
   - Add support for Dialog Semi DA9062 and DA9063
   - Add support for Kontron COMe-bBL6 and COMe-cBW6
   - Add support for X-Powers AXP152
   - Add support for Atmel, many
   - Add support for STMPE, many
   - Add support for USB in X-Powers AXP22X

  Core Frameworks:
   - New Base API to traverse devices and their children in reverse order

  Bug Fixes:
   - Fix race between runtime-suspend and IRQs
   - Obtain platform data form more reliable source

  Fix-ups:
   - Constifying things
   - Variable signage changes
   - Kconfig depends|selects changes
   - Make use of BIT() macro
   - Do not supply .owner attribute in *_driver structures
   - MAINTAINERS entries
   - Stop using set_irq_flags()
   - Start using irq_set_chained_handler_and_data()
   - Export DT device ID structures"

* tag 'mfd-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (69 commits)
  mfd: jz4740-adc: Init mask cache in generic IRQ chip
  mfd: cros_ec: spi: Add OF match table
  mfd: stmpe: Add OF match table
  mfd: max77686: Split out regulator part from the DT binding
  mfd: Add DT binding for Maxim MAX77802 IC
  mfd: max77686: Use a generic name for the PMIC node in the example
  mfd: max77686: Don't suggest in binding to use a deprecated property
  mfd: Add MFD_CROS_EC dependencies
  mfd: cros_ec: Remove CROS_EC_PROTO dependency for SPI and I2C drivers
  mfd: axp20x: Add a cell for the usb power_supply part of the axp20x PMICs
  mfd: axp20x: Add missing registers, and mark more registers volatile
  mfd: arizona: Fixup some formatting/white space errors
  mfd: wm8994: Fix NULL pointer exception on missing pdata
  of: Add vendor prefix for Nuvoton
  mfd: mt6397: Implement wake handler and suspend/resume to handle wake up event
  mfd: atmel-hlcdc: Add support for new SoCs
  mfd: Export OF module alias information in missing drivers
  mfd: stw481x: Export I2C module alias information
  mfd: da9062: Support for the DA9063 OnKey in the DA9062 core
  mfd: max899x: Avoid redundant irq_data lookup
  ...
2015-09-04 11:35:03 -07:00
Linus Torvalds
3527122745 Merge tag 'dmaengine-4.3-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
 "This time we have aded a new capability for scatter-gathered memset
  using dmaengine APIs.  This is supported in xdmac & hdmac drivers

  We have added support for reusing descriptors for examples like video
  buffers etc.  Driver will follow

  The behaviour of descriptor ack has been clarified and documented

  New devices added are:
   - dma controller in sun[457]i SoCs
   - lpc18xx dmamux
   - ZTE ZX296702 dma controller
   - Analog Devices AXI-DMAC DMA controller
   - eDMA support for dma-crossbar
   - imx6sx support in imx-sdma driver
   - imx-sdma device to device support

  Other:
   - jz4780 fixes
   - ioatdma large refactor and cleanup for removal of ioat v1 and v2
     which is deprecated and fixes
   - ACPI support in X-Gene DMA engine driver
   - ipu irq fixes
   - mvxor fixes
   - minor fixes spread thru drivers"

[ The Kconfig and Makefile entries got re-sorted alphabetically, and I
  handled the conflict with the new Intel integrated IDMA driver by
  slightly mis-sorting it on purpose: "IDMA64" got sorted after "IMX" in
  order to keep the Intel entries together.  I think it might be a good
  idea to just rename the IDMA64 config entry to INTEL_IDMA64 to make
  the sorting be a true sort, not this mismash.

  Also, this merge disables the COMPILE_TEST for the sun4i DMA
  controller, because it does not compile cleanly at all.     - Linus ]

* tag 'dmaengine-4.3-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (89 commits)
  dmaengine: ioatdma: add Broadwell EP ioatdma PCI dev IDs
  dmaengine :ipu: change ipu_irq_handler() to remove compile warning
  dmaengine: ioatdma: Fix variable array length
  dmaengine: ioatdma: fix sparse "error" with prep lock
  dmaengine: hdmac: Add memset capabilities
  dmaengine: sort the sh Makefile
  dmaengine: sort the sh Kconfig
  dmaengine: sort the dw Kconfig
  dmaengine: sort the Kconfig
  dmaengine: sort the makefile
  drivers/dma: make mv_xor.c driver explicitly non-modular
  dmaengine: Add support for the Analog Devices AXI-DMAC DMA controller
  devicetree: Add bindings documentation for Analog Devices AXI-DMAC
  dmaengine: xgene-dma: Fix the lock to allow client for further submission of requests
  dmaengine: ioatdma: fix coccinelle warning
  dmaengine: ioatdma: fix zero day warning on incompatible pointer type
  dmaengine: tegra-apb: Simplify locking for device using global pause
  dmaengine: tegra-apb: Remove unnecessary return statements and variables
  dmaengine: tegra-apb: Avoid unnecessary channel base address calculation
  dmaengine: tegra-apb: Remove unused variables
  ...
2015-09-04 11:10:18 -07:00
Linus Torvalds
8d2faea672 Merge tag 'gpio-v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.3 kernel cycle.

  There is quite a lot going on in the GPIO subsystem this merge window,
  so the main matter is decribed below.

  The hits in other subsystems when making the GPIO flags optional are
  all ACKed by their respective subsystem maintainers.

  Core changes:

   - Root out the wrapper devm_gpiod_get() and gpiod_get() etc versions
     of the descriptor calls that did not use the flags argument on the
     end.  This was around for too long and eventually Uwe Kleine-König
     took the time to clean it out and the last users are removed along
     with the macros in this tag.  In several cases the use of flags
     simplifies the code.  For this reason we have (ACKed) patches
     hitting in DRM, IIO, media, NFC, USB+PHY up until we hammer in the
     nail with removing the macros.

   - Add a fat document describing how much ready-made GPIO stuff we
     have i the kernel to discourage people from reinventing a square
     wheel in userspace, as so often happens.

   - Create a separate lockdep class for each instance of a GPIO IRQ
     chip instead of using one class for all chips, as the current code
     will not work with systems with several GPIO chips doing lockdep
     debugging.

   - Protect against driver unloading also when a GPIO line is only used
     as IRQ for the GPIOLIB_IRQCHIP helpers.

   - If the GPIO chip has no designated owner, assign the parent device
     driver owner as owner.

   - Consolidation of chained IRQ handler install/remove replacing all
     call sites where irq_set_handler_data() and
     irq_set_chained_handler() were done in succession with a combined
     call to irq_set_chained_handler_and_data().

     This series was created by Thomas Gleixner after the problem was
     observed by Russell King.

   - Tglx also made another series of patches switching
     __irq_set_handler_locked() for irq_set_handler_locked() which is
     way cleaner.

   - Tglx and Jiang Liu wrote a good bunch of patches to make use of
     irq_desc_get_xxx() accessors and avoid looking up irq_descs from
     IRQ numbers.  The goal is to get rid of the irq number from the
     handlers in the IRQ flow which is nice.

   - Rob Herring killed off the set_irq_flags() for all GPIO drivers.
     This was an ARM specific function that is replaced with the generic
     irq_modify_status() where special flags are actually needed.

   - When an OF node has a pin range for its GPIOs, return -EPROBE_DEFER
     if the pin controller isn't available.  Pretty logical, yet needed
     to be fixed.

   - If a driver using GPIOLIB_IRQCHIP has its own irq_*_resources call
     back, then call these instead of the defaults provided by the
     GPIOLIB.

   - Fix an undocumented ABI hole: named GPIOs were not properly
     documented.

  Driver improvements:

   - Add get_direction() support to the generic GPIO driver, it's
     strange that we didn't have that before.

   - Make it possible to have input-only GPIO chips using the generic
     GPIO driver.

   - Clean out platform data support from the Emma Mobile (EM) driver

   - Finegrained runtime PM support for the RCAR driver.

   - Support r8a7795 (R-car H3) in the RCAR driver.

   - Support interrupts on GPIOs 16 thru 31 in the DaVinci driver.

   - Some consolidation and new support in the MPC8xxx driver, we now
     support MPC5125.

   - Preempt-RT-friendly patches: the OMAP, MPC8xxx, drivers uses raw
     spinlocks making it work better with the realime patches.

   - Interrupt support for the EXTRAXFS GPIO driver.

   - Make the ETRAXFS GPIO driver support also ARTPEC-3.

   - Interrupt and wakeup support for the BRCMSTB driver, also for
     wakeup from S5 cold boot.

   - Mask MXC IRQs during suspend.

   - Improve OMAP2 GPIO set_debounce() to work according to spec.

   - The VF610 driver handles IRQs properly.

  New drivers:

   - ZTE ZX GPIO driver"

* tag 'gpio-v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (87 commits)
  Revert "gpio: extraxfs: fix returnvar.cocci warnings"
  gpio: tc3589x: use static container helper
  gpio: xlp: fix error return code
  gpio: vf610: handle level IRQ's properly
  gpio: max732x: Fix error handling in probe()
  gpio: omap: fix clk_prepare/unprepare usage
  gpio: omap: protect regs access in omap_gpio_irq_handler
  gpio: omap: fix omap2_set_gpio_debounce
  gpio: omap: switch to use platform_get_irq
  gpio: omap: remove wrong irq_domain_remove usage in probe
  gpiolib: add description for gpio irqchip fields in struct gpio_chip
  gpio: extraxfs: fix returnvar.cocci warnings
  gpiolib: irqchip: use different lockdep class for each gpio irqchip
  gpio/grgpio: fix deadlock in grgpio_irq_unmap()
  Documentation: gpio: consumer: describe active low property
  gpio: mxc: fix section mismatch warning
  gpio/mxc: mask gpio interrupts in suspend
  gpio: omap: Fix missing raw locks conversion
  gpio: brcmstb: support wakeup from S5 cold boot
  gpio: brcmstb: Add interrupt and wakeup source support
  ...
2015-09-04 10:07:45 -07:00
Mark Brown
072502a67c Merge remote-tracking branches 'regmap/topic/lockdep' and 'regmap/topic/seq-delay' into regmap-next 2015-09-04 17:22:10 +01:00
Mark Brown
84fb9015d2 Merge remote-tracking branches 'regmap/topic/debugfs' and 'regmap/topic/force-update' into regmap-next 2015-09-04 17:22:09 +01:00
Mark Brown
a458a6d411 Merge remote-tracking branch 'regmap/topic/core' into regmap-next 2015-09-04 17:22:08 +01:00
Greg Kroah-Hartman
062a68a5e0 Revert "uart: pl011: Add support to ZTE ZX296702 uart"
This reverts commit 8cd90e50d1 as with
this patch the serial console is broken on lots of platforms.

Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jun Nie <jun.nie@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-04 09:14:20 -07:00
Linus Torvalds
807249d3ad Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for 4.3 for MIPS.  Here's the summary:

  Three fixes that didn't make 4.2-stable:

   - a -Os build might compile the kernel using the MIPS16 instruction
     set but the R2 optimized inline functions in <uapi/asm/swab.h> are
     implemented using 32-bit wide instructions which is invalid.

   - a build error in pgtable-bits.h for a particular kernel
     configuration.

   - accessing registers of the CM GCR might have been compiled to use
     64 bit accesses but these registers are onl 32 bit wide.

  And also a few new bits:

   - move the ATH79 GPIO driver to drivers/gpio

   - the definition of IRQCHIP_DECLARE has moved to linux/irqchip.h,
     change ATH79 accordingly.

   - fix definition of pgprot_writecombine

   - add an implementation of dma_map_ops.mmap

   - fix alignment of quiet build output for vmlinuz link

   - BCM47xx: Use kmemdup rather than duplicating its implementation

   - Netlogic: Fix 0x0x prefixes of constants.

   - merge Bjorn Helgaas' series to remove most of the weak keywords
     from function declarations.

   - CP0 and CP1 registers are best considered treated as unsigned
     values to avoid large values from becoming negative values.

   - improve support for the MIPS GIC timer.

   - enable common clock framework for Malta and SEAD3.

   - a number of improvments and fixes to dump_tlb().

   - document the MIPS TLB dump functionality in Magic SysRq.

   - Cavium Octeon CN68XX improvments.

   - NetLogic improvments.

   - irq: Use access helper irq_data_get_affinity_mask.

   - handle MSA unaligned accesses.

   - a number of R6-related math-emu fixes.

   - support for I6400.

   - improvments to MSA support.

   - add uprobes support.

   - move from deprecated __initcall to arch_initcall.

   - remove finish_arch_switch().

   - IRQ cleanups by Thomas Gleixner.

   - migrate to new 'set-state' interface.

   - random small cleanups"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (148 commits)
  MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.
  MIPS: Fix alignment of quiet build output for vmlinuz link
  MIPS: math-emu: Remove unused handle_dsemul function declaration
  MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 CLASS FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 RINT FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 SELNEZ FPU instruction
  MIPS: math-emu: Add support for the MIPS R6 SELEQZ FPU instruction
  MIPS: math-emu: Add support for the CMP.condn.fmt R6 instruction
  MIPS: inst.h: Add new MIPS R6 FPU opcodes
  MIPS: Octeon: Fix management port MII address on Kontron S1901
  MIPS: BCM47xx: Use kmemdup rather than duplicating its implementation
  STAGING: Octeon: Use common helpers for determining interface and port
  MIPS: Octeon: Support interfaces 4 and 5
  MIPS: Octeon: Set up 1:1 mapping between CN68XX PKO queues and ports
  MIPS: Octeon: Initialize CN68XX PKO
  STAGING: Octeon: Support CN68XX style WQE
  ...
2015-09-03 16:55:55 -07:00
Linus Torvalds
ff474e8ca8 Merge tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:

 - support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask
   from Benjamin Herrenschmidt

 - EEH fixes for SRIOV from Gavin

 - introduce rtas_get_sensor_fast() for IRQ handlers from Thomas Huth

 - use hardware RNG for arch_get_random_seed_* not arch_get_random_*
   from Paul Mackerras

 - seccomp filter support from Michael Ellerman

 - opal_cec_reboot2() handling for HMIs & machine checks from Mahesh
   Salgaonkar

 - add powerpc timebase as a trace clock source from Naveen N.  Rao

 - misc cleanups in the xmon, signal & SLB code from Anshuman Khandual

 - add an inline function to update POWER8 HID0 from Gautham R.  Shenoy

 - fix pte_pagesize_index() crash on 4K w/64K hash from Michael Ellerman

 - drop support for 64K local store on 4K kernels from Michael Ellerman

 - move dma_get_required_mask() from pnv_phb to pci_controller_ops from
   Andrew Donnellan

 - initialize distance lookup table from drconf path from Nikunj A
   Dadhania

 - enable RTC class support from Vaibhav Jain

 - disable automatically blocked PCI config from Gavin Shan

 - add LEDs driver for PowerNV platform from Vasant Hegde

 - fix endianness issues in the HVSI driver from Laurent Dufour

 - kexec endian fixes from Samuel Mendoza-Jonas

 - fix corrupted pdn list from Gavin Shan

 - fix fenced PHB caused by eeh_slot_error_detail() from Gavin Shan

 - Freescale updates from Scott: Highlights include 32-bit memcpy/memset
   optimizations, checksum optimizations, 85xx config fragments and
   updates, device tree updates, e6500 fixes for non-SMP, and misc
   cleanup and minor fixes.

 - a ton of cxl updates & fixes:
    - add explicit precision specifiers from Rasmus Villemoes
    - use more common format specifier from Rasmus Villemoes
    - destroy cxl_adapter_idr on module_exit from Johannes Thumshirn
    - destroy afu->contexts_idr on release of an afu from Johannes
      Thumshirn
    - compile with -Werror from Daniel Axtens
    - EEH support from Daniel Axtens
    - plug irq_bitmap getting leaked in cxl_context from Vaibhav Jain
    - add alternate MMIO error handling from Ian Munsie
    - allow release of contexts which have been OPENED but not STARTED
      from Andrew Donnellan
    - remove use of macro DEFINE_PCI_DEVICE_TABLE from Vaishali Thakkar
    - release irqs if memory allocation fails from Vaibhav Jain
    - remove racy attempt to force EEH invocation in reset from Daniel
      Axtens
    - fix + cleanup error paths in cxl_dev_context_init from Ian Munsie
    - fix force unmapping mmaps of contexts allocated through the kernel
      api from Ian Munsie
    - set up and enable PSL Timebase from Philippe Bergheaud

* tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (140 commits)
  cxl: Set up and enable PSL Timebase
  cxl: Fix force unmapping mmaps of contexts allocated through the kernel api
  cxl: Fix + cleanup error paths in cxl_dev_context_init
  powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()
  powerpc/pseries: Cleanup on pci_dn_reconfig_notifier()
  powerpc/pseries: Fix corrupted pdn list
  powerpc/powernv: Enable LEDS support
  powerpc/iommu: Set default DMA offset in dma_dev_setup
  cxl: Remove racy attempt to force EEH invocation in reset
  cxl: Release irqs if memory allocation fails
  cxl: Remove use of macro DEFINE_PCI_DEVICE_TABLE
  powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver
  powerpc/powernv: Reset HILE before kexec_sequence()
  powerpc/kexec: Reset secondary cpu endianness before kexec
  powerpc/hvsi: Fix endianness issues in the HVSI driver
  leds/powernv: Add driver for PowerNV platform
  powerpc/powernv: Create LED platform device
  powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states
  powerpc/powernv: Fix the log message when disabling VF
  cxl: Allow release of contexts which have been OPENED but not STARTED
  ...
2015-09-03 16:41:38 -07:00
Linus Torvalds
c706c7eb0d Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM development updates from Russell King:
 "Included in this update:

   - moving PSCI code from ARM64/ARM to drivers/

   - removal of some architecture internals from global kernel view

   - addition of software based "privileged no access" support using the
     old domains register to turn off the ability for kernel
     loads/stores to access userspace.  Only the proper accessors will
     be usable.

   - addition of early fixup support for early console

   - re-addition (and reimplementation) of OMAP special interconnect
     barrier

   - removal of finish_arch_switch()

   - only expose cpuX/online in sysfs if hotpluggable

   - a number of code cleanups"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (41 commits)
  ARM: software-based priviledged-no-access support
  ARM: entry: provide uaccess assembly macro hooks
  ARM: entry: get rid of multiple macro definitions
  ARM: 8421/1: smp: Collapse arch_cpu_idle_dead() into cpu_die()
  ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore()
  ARM: mm: improve do_ldrd_abort macro
  ARM: entry: ensure that IRQs are enabled when calling syscall_trace_exit()
  ARM: entry: efficiency cleanups
  ARM: entry: get rid of asm_trace_hardirqs_on_cond
  ARM: uaccess: simplify user access assembly
  ARM: domains: remove DOMAIN_TABLE
  ARM: domains: keep vectors in separate domain
  ARM: domains: get rid of manager mode for user domain
  ARM: domains: move initial domain setting value to asm/domains.h
  ARM: domains: provide domain_mask()
  ARM: domains: switch to keeping domain value in register
  ARM: 8419/1: dma-mapping: harmonize definition of DMA_ERROR_CODE
  ARM: 8417/1: refactor bitops functions with BIT_MASK() and BIT_WORD()
  ARM: 8416/1: Feroceon: use of_iomap() to map register base
  ARM: 8415/1: early fixmap support for earlycon
  ...
2015-09-03 16:27:01 -07:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Linus Torvalds
4c12ab7e5e Merge tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "The major work includes fixing and enhancing the existing extent_cache
  feature, which has been well settling down so far and now it becomes a
  default mount option accordingly.

  Also, this version newly registers a f2fs memory shrinker to reclaim
  several objects consumed by a couple of data structures in order to
  avoid memory pressures.

  Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
  triggers a cleaning job explicitly by users.

  Most of the other patches are to fix bugs occurred in the corner cases
  across the whole code area"

* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
  f2fs: upset segment_info repair
  f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
  f2fs: update extent tree in batches
  f2fs: fix to release inode correctly
  f2fs: handle f2fs_truncate error correctly
  f2fs: avoid unneeded initializing when converting inline dentry
  f2fs: atomically set inode->i_flags
  f2fs: fix wrong pointer access during try_to_free_nids
  f2fs: use __GFP_NOFAIL to avoid infinite loop
  f2fs: lookup neighbor extent nodes for merging later
  f2fs: split __insert_extent_tree_ret for readability
  f2fs: kill dead code in __insert_extent_tree
  f2fs: adjust showing of extent cache stat
  f2fs: add largest/cached stat in extent cache
  f2fs: fix incorrect mapping for bmap
  f2fs: add annotation for space utilization of regular/inline dentry
  f2fs: fix to update cached_en of extent tree properly
  f2fs: fix typo
  f2fs: check the node block address of newly allocated nid
  f2fs: go out for insert_inode_locked failure
  ...
2015-09-03 13:10:22 -07:00
Hidehiro Kawai
82802f968b ipmi: Don't flush messages in sender() in run-to-completion mode
When flushing queued messages in run-to-completion mode,
smi_event_handler() is recursively called.

flush_messages()
 smi_event_handler()
  handle_transaction_done()
   deliver_recv_msg()
    ipmi_smi_msg_received()
     smi_recv_tasklet()
      sender()
       flush_messages()
        smi_event_handler()
         ...

The depth of the recursive call depends on the number of queued
messages, so it can cause a stack overflow if many messages have
been queued.

To solve this problem, this patch removes flush_messages()
from sender()@ipmi_si_intf.c.  Instead, add flush_messages() to
caller side of sender() if needed.  Additionally, to implement this,
add new handler flush_messages to struct ipmi_smi_handlers.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

Fixed up a comment and some spacing issues.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-09-03 15:02:28 -05:00
Corey Minyard
81d02b7f8c ipmi: Make some data const that was only read
Several data structures were only used for reading, so make them
const.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-09-03 15:02:27 -05:00
Linus Torvalds
ea814ab9aa Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Pretty much all bug fixes and clean ups for 4.3, after a lot of
  features and other churn going into 4.2"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Revert "ext4: remove block_device_ejected"
  ext4: ratelimit the file system mounted message
  ext4: silence a format string false positive
  ext4: simplify some code in read_mmp_block()
  ext4: don't manipulate recovery flag when freezing no-journal fs
  jbd2: limit number of reserved credits
  ext4 crypto: remove duplicate header file
  ext4: update c/mtime on truncate up
  jbd2: avoid infinite loop when destroying aborted journal
  ext4, jbd2: add REQ_FUA flag when recording an error in the superblock
  ext4 crypto: fix spelling typo in comment
  ext4 crypto: exit cleanly if ext4_derive_key_aes() fails
  ext4: reject journal options for ext2 mounts
  ext4: implement cgroup writeback support
  ext4: replace ext4_io_submit->io_op with ->io_wbc
  ext4 crypto: check for too-short encrypted file names
  ext4 crypto: use a jbd2 transaction when adding a crypto policy
  jbd2: speedup jbd2_journal_dirty_metadata()
2015-09-03 12:52:19 -07:00
Linus Torvalds
e31fb9e005 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 removal, quota & udf fixes from Jan Kara:
 "The biggest change in the pull is the removal of ext3 filesystem
  driver (~28k lines removed).  Ext4 driver is a full featured
  replacement these days and both RH and SUSE use it for several years
  without issues.  Also there are some workarounds in VM & block layer
  mainly for ext3 which we could eventually get rid of.

  Other larger change is addition of proper error handling for
  dquot_initialize().  The rest is small fixes and cleanups"

[ I wasn't convinced about the ext3 removal and worried about things
  falling through the cracks for legacy users, but ext4 maintainers
  piped up and were all unanimously in favor of removal, and maintaining
  all legacy ext3 support inside ext4.   - Linus ]

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Don't modify filesystem for read-only mounts
  quota: remove an unneeded condition
  ext4: memory leak on error in ext4_symlink()
  mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
  ext4: Improve ext4 Kconfig test
  block: Remove forced page bouncing under IO
  fs: Remove ext3 filesystem driver
  doc: Update doc about journalling layer
  jfs: Handle error from dquot_initialize()
  reiserfs: Handle error from dquot_initialize()
  ocfs2: Handle error from dquot_initialize()
  ext4: Handle error from dquot_initialize()
  ext2: Handle error from dquot_initalize()
  quota: Propagate error from ->acquire_dquot()
2015-09-03 12:28:30 -07:00
Jens Axboe
5e7c4274a7 block: Check for gaps on front and back merges
We are checking for gaps to previous bio_vec, which can
only detect back merges gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for gap to prev in a back merge
attempt and check for a gap to next in a front merge attempt.

Signed-off-by: Jens Axboe <axboe@fb.com>
[sagig: Minor rename change]
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
2015-09-03 10:33:09 -06:00
Linus Torvalds
dd5cdb48ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Another merge window, another set of networking changes.  I've heard
  rumblings that the lightweight tunnels infrastructure has been voted
  networking change of the year.  But what do I know?

   1) Add conntrack support to openvswitch, from Joe Stringer.

   2) Initial support for VRF (Virtual Routing and Forwarding), which
      allows the segmentation of routing paths without using multiple
      devices.  There are some semantic kinks to work out still, but
      this is a reasonably strong foundation.  From David Ahern.

   3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

   4) Ignore route nexthops with a link down state in ipv6, just like
      ipv4.  From Andy Gospodarek.

   5) Remove spinlock from fast path of act_gact and act_mirred, from
      Eric Dumazet.

   6) Document the DSA layer, from Florian Fainelli.

   7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
      from Florian Fainelli.

   8) Add Mellanox Switch Driver and core infrastructure, from Jiri
      Pirko.

   9) Add support for "light weight tunnels", which allow for
      encapsulation and decapsulation without bearing the overhead of a
      full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
      others.

  10) Add Identifier Locator Addressing support for ipv6, from Tom
      Herbert.

  11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

  12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

  13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

  14) Stop using a zero TX queue length to mean that a device shouldn't
      have a qdisc attached, use an explicit flag instead.  From Phil
      Sutter.

  15) Use generic geneve netdevice infrastructure in openvswitch, from
      Pravin B Shelar.

  16) Add infrastructure to avoid re-forwarding a packet in software
      that was already forwarded by a hardware switch.  From Scott
      Feldman.

  17) Allow AF_PACKET fanout function to be implemented in a bpf
      program, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
  netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
  netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
  net: fec: clear receive interrupts before processing a packet
  ipv6: fix exthdrs offload registration in out_rt path
  xen-netback: add support for multicast control
  bgmac: Update fixed_phy_register()
  sock, diag: fix panic in sock_diag_put_filterinfo
  flow_dissector: Use 'const' where possible.
  flow_dissector: Fix function argument ordering dependency
  ixgbe: Resolve "initialized field overwritten" warnings
  ixgbe: Remove bimodal SR-IOV disabling
  ixgbe: Add support for reporting 2.5G link speed
  ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
  ixgbe: support for ethtool set_rxfh
  ixgbe: Avoid needless PHY access on copper phys
  ixgbe: cleanup to use cached mask value
  ixgbe: Remove second instance of lan_id variable
  ixgbe: use kzalloc for allocating one thing
  flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
  ixgbe: Remove unused PCI bus types
  ...
2015-09-03 08:08:17 -07:00
Daniel Borkmann
62da98656b netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
Fengguang reported, that some randconfig generated the following linker
issue with nf_ct_zone_dflt object involved:

  [...]
  CC      init/version.o
  LD      init/built-in.o
  net/built-in.o: In function `ipv4_conntrack_defrag':
  nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
  net/built-in.o: In function `ipv6_defrag':
  nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
  make: *** [vmlinux] Error 1

Given that configurations exist where we have a built-in part, which is
accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
area when netfilter is configured in general.

Therefore, split the more generic parts into a common header under
include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
section that already holds parts related to CONFIG_NF_CONNTRACK in the
netfilter core. This fixes the issue on my side.

Fixes: 308ac9143e ("netfilter: nf_conntrack: push zone object into functions")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-02 16:32:56 -07:00