Commit Graph

54235 Commits

Author SHA1 Message Date
Hugh Dickins
3a4f8a0b3f mm: remove shmem_mapping() shmem_zero_setup() duplicates
Remove the prototypes for shmem_mapping() and shmem_zero_setup() from
linux/mm.h, since they are already provided in linux/shmem_fs.h.  But
shmem_fs.h must then provide the inline stub for shmem_mapping() when
CONFIG_SHMEM is not set, and a few more cfiles now need to #include it.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1702081658250.1549@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
David Rientjes
def5efe037 mm, madvise: fail with ENOMEM when splitting vma will hit max_map_count
If madvise(2) advice will result in the underlying vma being split and
the number of areas mapped by the process will exceed
/proc/sys/vm/max_map_count as a result, return ENOMEM instead of EAGAIN.

EAGAIN is returned by madvise(2) when a kernel resource, such as slab,
is temporarily unavailable.  It indicates that userspace should retry
the advice in the near future.  This is important for advice such as
MADV_DONTNEED which is often used by malloc implementations to free
memory back to the system: we really do want to free memory back when
madvise(2) returns EAGAIN because slab allocations (for vmas, anon_vmas,
or mempolicies) cannot be allocated.

Encountering /proc/sys/vm/max_map_count is not a temporary failure,
however, so return ENOMEM to indicate this is a more serious issue.  A
followup patch to the man page will specify this behavior.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701241431120.42507@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Lucas Stach
712c604dcd mm: wire up GFP flag passing in dma_alloc_from_contiguous
The callers of the DMA alloc functions already provide the proper
context GFP flags.  Make sure to pass them through to the CMA allocator,
to make the CMA compaction context aware.

Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Lucas Stach
e2f466e32f mm: cma_alloc: allow to specify GFP mask
Most users of this interface just want to use it with the default
GFP_KERNEL flags, but for cases where DMA memory is allocated it may be
called from a different context.

No functional change yet, just passing through the flag to the
underlying alloc_contig_range function.

Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Lucas Stach
ca96b62534 mm: alloc_contig_range: allow to specify GFP mask
Currently alloc_contig_range assumes that the compaction should be done
with the default GFP_KERNEL flags.  This is probably right for all
current uses of this interface, but may change as CMA is used in more
use-cases (including being the default DMA memory allocator on some
platforms).

Change the function prototype, to allow for passing through the GFP mask
set by upper layers.

Also respect global restrictions by applying memalloc_noio_flags to the
passed in flags.

Link: http://lkml.kernel.org/r/20170127172328.18574-1-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Mike Rapoport
ca49ca7114 userfaultfd: non-cooperative: add event for exit() notification
Allow userfaultfd monitor track termination of the processes that have
memory backed by the uffd.

[rppt@linux.vnet.ibm.com: add comment]
  Link: http://lkml.kernel.org/r/20170202135448.GB19804@rapoport-lnxLink: http://lkml.kernel.org/r/1485542673-24387-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Mike Rapoport
897ab3e0c4 userfaultfd: non-cooperative: add event for memory unmaps
When a non-cooperative userfaultfd monitor copies pages in the
background, it may encounter regions that were already unmapped.
Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely
changes in the virtual memory layout.

Since there might be different uffd contexts for the affected VMAs, we
first should create a temporary representation for the unmap event for
each uffd context and then notify them one by one to the appropriate
userfault file descriptors.

The event notification occurs after the mmap_sem has been released.

[arnd@arndb.de: fix nommu build]
  Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
[mhocko@suse.com: fix nommu build]
  Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Kirill A. Shutemov
d53a8b49a6 mm: drop page_check_address{,_transhuge}
All users are gone. Let's drop them.

Link: http://lkml.kernel.org/r/20170129173858.45174-12-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Kirill A. Shutemov
ace71a19ce mm: introduce page_vma_mapped_walk()
Introduce a new interface to check if a page is mapped into a vma.  It
aims to address shortcomings of page_check_address{,_transhuge}.

Existing interface is not able to handle PTE-mapped THPs: it only finds
the first PTE.  The rest lefted unnoticed.

page_vma_mapped_walk() iterates over all possible mapping of the page in
the vma.

Link: http://lkml.kernel.org/r/20170129173858.45174-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Yisheng Xie
cbae0170e5 mm/migration: make isolate_movable_page always defined
Define isolate_movable_page as a static inline function when
CONFIG_MIGRATION is not enable.  It should return -EBUSY here which
means failed to isolate movable pages.

This patch do not have any functional change but prepare for later
patch.

Link: http://lkml.kernel.org/r/1485867981-16037-3-git-send-email-ysxie@foxmail.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Yisheng Xie
9e5bcd610f mm/migration: make isolate_movable_page() return int type
Patch series "HWPOISON: soft offlining for non-lru movable page", v6.

After Minchan's commit bda807d444 ("mm: migrate: support non-lru
movable page migration"), some type of non-lru page like zsmalloc and
virtio-balloon page also support migration.

Therefore, we can:

1) soft offlining no-lru movable pages, which means when memory
   corrected errors occur on a non-lru movable page, we can stop to use
   it by migrating data onto another page and disable the original
   (maybe half-broken) one.

2) enable memory hotplug for non-lru movable pages, i.e. we may offline
   blocks, which include such pages, by using non-lru page migration.

This patchset is heavily dependent on non-lru movable page migration.

This patch (of 4):

Change the return type of isolate_movable_page() from bool to int.  It
will return 0 when isolate movable page successfully, and return -EBUSY
when it isolates failed.

There is no functional change within this patch but prepare for later
patch.

[xieyisheng1@huawei.com: v6]
  Link: http://lkml.kernel.org/r/1486108770-630-2-git-send-email-xieyisheng1@huawei.com
Link: http://lkml.kernel.org/r/1485867981-16037-2-git-send-email-ysxie@foxmail.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:55 -08:00
Dave Jiang
c791ace1e7 mm: replace FAULT_FLAG_SIZE with parameter to huge_fault
Since the introduction of FAULT_FLAG_SIZE to the vm_fault flag, it has
been somewhat painful with getting the flags set and removed at the
correct locations.  More than one kernel oops was introduced due to
difficulties of getting the placement correctly.

Remove the flag values and introduce an input parameter to huge_fault
that indicates the size of the page entry.  This makes the code easier
to trace and should avoid the issues we see with the fault flags where
removal of the flag was necessary in the fallback paths.

Link: http://lkml.kernel.org/r/148615748258.43180.1690152053774975329.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Matthew Wilcox
a00cc7d9dd mm, x86: add support for PUD-sized transparent hugepages
The current transparent hugepage code only supports PMDs.  This patch
adds support for transparent use of PUDs with DAX.  It does not include
support for anonymous pages.  x86 support code also added.

Most of this patch simply parallels the work that was done for huge
PMDs.  The only major difference is how the new ->pud_entry method in
mm_walk works.  The ->pmd_entry method replaces the ->pte_entry method,
whereas the ->pud_entry method works along with either ->pmd_entry or
->pte_entry.  The pagewalk code takes care of locking the PUD before
calling ->pud_walk, so handlers do not need to worry whether the PUD is
stable.

[dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
  Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
[dave.jiang@intel.com: native_pud_clear missing on i386 build]
  Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
a2d581675d mm,fs,dax: change ->pmd_fault to ->huge_fault
Patch series "1G transparent hugepage support for device dax", v2.

The following series implements support for 1G trasparent hugepage on
x86 for device dax.  The bulk of the code was written by Mathew Wilcox a
while back supporting transparent 1G hugepage for fs DAX.  I have
forward ported the relevant bits to 4.10-rc.  The current submission has
only the necessary code to support device DAX.

Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs.  Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file.  We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].

[1]: https://lkml.org/lkml/2016/1/31/52

Comments from Nilesh @ Oracle:

There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:

processes         : 10,000
memory            :    6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB

As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally brings
it down to a reasonable level.  Memory sizes will keep increasing; so
this number will keep increasing.

An argument can be made to convert the applications from process model
to thread model, but in the real world that may not be always practical.
Hopefully this helps explain the use case where this is valuable.

This patch (of 3):

In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault.  The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.

[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
  Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
  Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Johannes Weiner
c55e8d035b mm: vmscan: move dirty pages out of the way until they're flushed
We noticed a performance regression when moving hadoop workloads from
3.10 kernels to 4.0 and 4.6.  This is accompanied by increased pageout
activity initiated by kswapd as well as frequent bursts of allocation
stalls and direct reclaim scans.  Even lowering the dirty ratios to the
equivalent of less than 1% of memory would not eliminate the issue,
suggesting that dirty pages concentrate where the scanner is looking.

This can be traced back to recent efforts of thrash avoidance.  Where
3.10 would not detect refaulting pages and continuously supply clean
cache to the inactive list, a thrashing workload on 4.0+ will detect and
activate refaulting pages right away, distilling used-once pages on the
inactive list much more effectively.  This is by design, and it makes
sense for clean cache.  But for the most part our workload's cache
faults are refaults and its use-once cache is from streaming writes.  We
end up with most of the inactive list dirty, and we don't go after the
active cache as long as we have use-once pages around.

But waiting for writes to avoid reclaiming clean cache that *might*
refault is a bad trade-off.  Even if the refaults happen, reads are
faster than writes.  Before getting bogged down on writeback, reclaim
should first look at *all* cache in the system, even active cache.

To accomplish this, activate pages that are dirty or under writeback
when they reach the end of the inactive LRU.  The pages are marked for
immediate reclaim, meaning they'll get moved back to the inactive LRU
tail as soon as they're written back and become reclaimable.  But in the
meantime, by reducing the inactive list to only immediately reclaimable
pages, we allow the scanner to deactivate and refill the inactive list
with clean cache from the active list tail to guarantee forward
progress.

[hannes@cmpxchg.org: update comment]
  Link: http://lkml.kernel.org/r/20170202191957.22872-8-hannes@cmpxchg.org
Link: http://lkml.kernel.org/r/20170123181641.23938-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Johannes Weiner
726d061fbd mm: vmscan: kick flushers when we encounter dirty pages on the LRU
Memory pressure can put dirty pages at the end of the LRU without
anybody running into dirty limits.  Don't start writing individual pages
from kswapd while the flushers might be asleep.

Unlike the old direct reclaim flusher wakeup (removed in the next patch)
that flushes the number of pages just scanned, this patch wakes the
flushers for all outstanding dirty pages.  That seemed to perform better
in a synthetic test that pushes dirty pages to the end of the LRU and
into reclaim, because we know LRU aging outstrips writeback already, and
this way we give younger dirty pages a headstart rather than wait until
reclaim runs into them as well.  It also means less plugging and risk of
exhausting the struct request pool from reclaim.

There is a concern that this will cause temporary files that used to get
dirtied and truncated before writeback to now get written to disk under
memory pressure.  If this turns out to be a real problem, we'll have to
revisit this and tame the reclaim flusher wakeups.

[hannes@cmpxchg.org: mention dirty expiration as a condition]
  Link: http://lkml.kernel.org/r/20170126174739.GA30636@cmpxchg.org
Link: http://lkml.kernel.org/r/20170123181641.23938-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Johannes Weiner
1276ad68e2 mm: vmscan: scan dirty pages even in laptop mode
Patch series "mm: vmscan: fix kswapd writeback regression".

We noticed a regression on multiple hadoop workloads when moving from
3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page
writeout, causing direct reclaim herds that also don't make progress.

I tracked it down to the thrash avoidance efforts after 3.10 that make
the kernel better at keeping use-once cache and use-many cache sorted on
the inactive and active list, with more aggressive protection of the
active list as long as there is inactive cache.  Unfortunately, our
workload's use-once cache is mostly from streaming writes.  Waiting for
writes to avoid potential reloads in the future is not a good tradeoff.

These patches do the following:

1. Wake the flushers when kswapd sees a lump of dirty pages. It's
   possible to be below the dirty background limit and still have cache
   velocity push them through the LRU. So start a-flushin'.

2. Let kswapd only write pages that have been rotated twice. This makes
   sure we really tried to get all the clean pages on the inactive list
   before resorting to horrible LRU-order writeback.

3. Move rotating dirty pages off the inactive list. Instead of churning
   or waiting on page writeback, we'll go after clean active cache. This
   might lead to thrashing, but in this state memory demand outstrips IO
   speed anyway, and reads are faster than writes.

Mel backported the series to 4.10-rc5 with one minor conflict and ran a
couple of tests on it.  Mix of read/write random workload didn't show
anything interesting.  Write-only database didn't show much difference
in performance but there were slight reductions in IO -- probably in the
noise.

simoop did show big differences although not as big as Mel expected.
This is Chris Mason's workload that similate the VM activity of hadoop.
Mel won't go through the full details but over the samples measured
during an hour it reported

                                         4.10.0-rc5            4.10.0-rc5
                                            vanilla         johannes-v1r1
Amean    p50-Read             21346531.56 (  0.00%) 21697513.24 ( -1.64%)
Amean    p95-Read             24700518.40 (  0.00%) 25743268.98 ( -4.22%)
Amean    p99-Read             27959842.13 (  0.00%) 28963271.11 ( -3.59%)
Amean    p50-Write                1138.04 (  0.00%)      989.82 ( 13.02%)
Amean    p95-Write             1106643.48 (  0.00%)    12104.00 ( 98.91%)
Amean    p99-Write             1569213.22 (  0.00%)    36343.38 ( 97.68%)
Amean    p50-Allocation          85159.82 (  0.00%)    79120.70 (  7.09%)
Amean    p95-Allocation         204222.58 (  0.00%)   129018.43 ( 36.82%)
Amean    p99-Allocation         278070.04 (  0.00%)   183354.43 ( 34.06%)
Amean    final-p50-Read       21266432.00 (  0.00%) 21921792.00 ( -3.08%)
Amean    final-p95-Read       24870912.00 (  0.00%) 26116096.00 ( -5.01%)
Amean    final-p99-Read       28147712.00 (  0.00%) 29523968.00 ( -4.89%)
Amean    final-p50-Write          1130.00 (  0.00%)      977.00 ( 13.54%)
Amean    final-p95-Write       1033216.00 (  0.00%)     2980.00 ( 99.71%)
Amean    final-p99-Write       1517568.00 (  0.00%)    32672.00 ( 97.85%)
Amean    final-p50-Allocation    86656.00 (  0.00%)    78464.00 (  9.45%)
Amean    final-p95-Allocation   211712.00 (  0.00%)   116608.00 ( 44.92%)
Amean    final-p99-Allocation   287232.00 (  0.00%)   168704.00 ( 41.27%)

The latencies are actually completely horrific in comparison to 4.4 (and
4.10-rc5 is worse than 4.9 according to historical data for reasons Mel
hasn't analysed yet).

Still, 95% of write latency (p95-write) is halved by the series and
allocation latency is way down.  Direct reclaim activity is one fifth of
what it was according to vmstats.  Kswapd activity is higher but this is
not necessarily surprising.  Kswapd efficiency is unchanged at 99% (99%
of pages scanned were reclaimed) but direct reclaim efficiency went from
77% to 99%

In the vanilla kernel, 627MB of data was written back from reclaim
context.  With the series, no data was written back.  With or without
the patch, pages are being immediately reclaimed after writeback
completes.  However, with the patch, only 1/8th of the pages are
reclaimed like this.

This patch (of 5):

We have an elaborate dirty/writeback throttling mechanism inside the
reclaim scanner, but for that to work the pages have to go through
shrink_page_list() and get counted for what they are.  Otherwise, we
mess up the LRU order and don't match reclaim speed to writeback.

Especially during deactivation, there is never a reason to skip dirty
pages; nothing is even trying to write them out from there.  Don't mess
up the LRU order for nothing, shuffle these pages along.

Link: http://lkml.kernel.org/r/20170123181641.23938-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Mike Rapoport
d811914d87 userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE
Patch series "userfaultfd: non-cooperative: add madvise() event for
MADV_REMOVE request".

These patches add notification of madvise(MADV_REMOVE) event to
non-cooperative userfaultfd monitor.

The first pacth renames EVENT_MADVDONTNEED to EVENT_REMOVE along with
relevant functions and structures.  Using _REMOVE instead of
_MADVDONTNEED describes the event semantics more clearly and I hope it's
not too late for such change in the ABI.

This patch (of 3):

The UFFD_EVENT_MADVDONTNEED purpose is to notify uffd monitor about
removal of certain range from address space tracked by userfaultfd.
Hence, UFFD_EVENT_REMOVE seems to better reflect the operation
semantics.  Respectively, 'madv_dn' field of uffd_msg is renamed to
'remove' and the madvise_userfault_dontneed callback is renamed to
userfaultfd_remove.

Link: http://lkml.kernel.org/r/1484814154-1557-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Heiko Carstens
0262d9c845 memblock: embed memblock type name within struct memblock_type
Provide the name of each memblock type with struct memblock_type.  This
allows to get rid of the function memblock_type_name() and duplicating
the type names in __memblock_dump_all().

The only memblock_type usage out of mm/memblock.c seems to be
arch/s390/kernel/crash_dump.c.  While at it, give it a name.

Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Dan Williams
3fc2192410 mm: validate device_hotplug is held for memory hotplug
mem_hotplug_begin() assumes that it can set mem_hotplug.active_writer
and run the hotplug process without racing another thread.  Validate
this assumption with a lockdep assertion.

Link: http://lkml.kernel.org/r/148693886229.16345.1770484669403334689.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:53 -08:00
Linus Torvalds
f1ef09fde1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "There is a lot here. A lot of these changes result in subtle user
  visible differences in kernel behavior. I don't expect anything will
  care but I will revert/fix things immediately if any regressions show
  up.

  From Seth Forshee there is a continuation of the work to make the vfs
  ready for unpriviled mounts. We had thought the previous changes
  prevented the creation of files outside of s_user_ns of a filesystem,
  but it turns we missed the O_CREAT path. Ooops.

  Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
  standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
  children that are forked after the prctl are considered and not
  children forked before the prctl. The only known user of this prctl
  systemd forks all children after the prctl. So no userspace
  regressions will occur. Holding earlier forked children to the same
  rules as later forked children creates a semantic that is sane enough
  to allow checkpoing of processes that use this feature.

  There is a long delayed change by Nikolay Borisov to limit inotify
  instances inside a user namespace.

  Michael Kerrisk extends the API for files used to maniuplate
  namespaces with two new trivial ioctls to allow discovery of the
  hierachy and properties of namespaces.

  Konstantin Khlebnikov with the help of Al Viro adds code that when a
  network namespace exits purges it's sysctl entries from the dcache. As
  in some circumstances this could use a lot of memory.

  Vivek Goyal fixed a bug with stacked filesystems where the permissions
  on the wrong inode were being checked.

  I continue previous work on ptracing across exec. Allowing a file to
  be setuid across exec while being ptraced if the tracer has enough
  credentials in the user namespace, and if the process has CAP_SETUID
  in it's own namespace. Proc files for setuid or otherwise undumpable
  executables are now owned by the root in the user namespace of their
  mm. Allowing debugging of setuid applications in containers to work
  better.

  A bug I introduced with permission checking and automount is now
  fixed. The big change is to mark the mounts that the kernel initiates
  as a result of an automount. This allows the permission checks in sget
  to be safely suppressed for this kind of mount. As the permission
  check happened when the original filesystem was mounted.

  Finally a special case in the mount namespace is removed preventing
  unbounded chains in the mount hash table, and making the semantics
  simpler which benefits CRIU.

  The vfs fix along with related work in ima and evm I believe makes us
  ready to finish developing and merge fully unprivileged mounts of the
  fuse filesystem. The cleanups of the mount namespace makes discussing
  how to fix the worst case complexity of umount. The stacked filesystem
  fixes pave the way for adding multiple mappings for the filesystem
  uids so that efficient and safer containers can be implemented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc/sysctl: Don't grab i_lock under sysctl_lock.
  vfs: Use upper filesystem inode in bprm_fill_uid()
  proc/sysctl: prune stale dentries during unregistering
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  prctl: propagate has_child_subreaper flag to every descendant
  introduce the walk_process_tree() helper
  nsfs: Add an ioctl() to return owner UID of a userns
  fs: Better permission checking for submounts
  exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
  vfs: open() with O_CREAT should not create inodes with unknown ids
  nsfs: Add an ioctl() to return the namespace type
  proc: Better ownership of files for non-dumpable tasks in user namespaces
  exec: Remove LSM_UNSAFE_PTRACE_CAP
  exec: Test the ptracer's saved cred to see if the tracee can gain caps
  exec: Don't reset euid and egid when the tracee has CAP_SETUID
  inotify: Convert to using per-namespace limits
2017-02-23 20:33:51 -08:00
Linus Torvalds
ef96152e6a Merge tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for v4.11.

  Nothing too major, the tinydrm and mmu-less support should make
  writing smaller drivers easier for some of the simpler platforms, and
  there are a bunch of documentation updates.

  Intel grew displayport MST audio support which is hopefully useful to
  people, and FBC is on by default for GEN9+ (so people know where to
  look for regressions). AMDGPU has a lot of fixes that would like new
  firmware files installed for some GPUs.

  Other than that it's pretty scattered all over.

  I may have a follow up pull request as I know BenH has a bunch of AST
  rework and fixes and I'd like to get those in once they've been tested
  by AST, and I've got at least one pull request I'm just trying to get
  the author to fix up.

  Core:
   - drm_mm reworked
   - Connector list locking and iterators
   - Documentation updates
   - Format handling rework
   - MMU-less support for fbdev helpers
   - drm_crtc_from_index helper
   - Core CRC API
   - Remove drm_framebuffer_unregister_private
   - Debugfs cleanup
   - EDID/Infoframe fixes
   - Release callback
   - Tinydrm support (smaller drivers for simple hw)

  panel:
   - Add support for some new simple panels

  i915:
   - FBC by default for gen9+
   - Shared dpll cleanups and docs
   - GEN8 powerdomain cleanup
   - DMC support on GLK
   - DP MST audio support
   - HuC loading support
   - GVT init ordering fixes
   - GVT IOMMU workaround fix

  amdgpu/radeon:
   - Power/clockgating improvements
   - Preliminary SR-IOV support
   - TTM buffer priority and eviction fixes
   - SI DPM quirks removed due to firmware fixes
   - Powerplay improvements
   - VCE/UVD powergating fixes
   - Cleanup SI GFX code to match CI/VI
   - Support for > 2 displays on 3/5 crtc asics
   - SI headless fixes

  nouveau:
   - Rework securre boot code in prep for GP10x secure boot
   - Channel recovery improvements
   - Initial power budget code
   - MMU rework preperation

  vmwgfx:
   - Bunch of fixes and cleanups

  exynos:
   - Runtime PM support for MIC driver
   - Cleanups to use atomic helpers
   - UHD Support for TM2/TM2E boards
   - Trigger mode fix for Rinato board

  etnaviv:
   - Shader performance fix
   - Command stream validator fixes
   - Command buffer suballocator

  rockchip:
   - CDN DisplayPort support
   - IOMMU support for arm64 platform

  imx-drm:
   - Fix i.MX5 TV encoder probing
   - Remove lower fb size limits

  msm:
   - Support for HW cursor on MDP5 devices
   - DSI encoder cleanup
   - GPU DT bindings cleanup

  sti:
   - stih410 cleanups
   - Create fbdev at binding
   - HQVDP fixes
   - Remove stih416 chip functionality
   - DVI/HDMI mode selection fixes
   - FPS statistic reporting

  omapdrm:
   - IRQ code cleanup

  dwi-hdmi bridge:
   - Cleanups and fixes

  adv-bridge:
   - Updates for nexus

  sii8520 bridge:
   - Add interlace mode support
   - Rework HDMI and lots of fixes

  qxl:
   - probing/teardown cleanups

  ZTE drm:
   - HDMI audio via SPDIF interface
   - Video Layer overlay plane support
   - Add TV encoder output device

  atmel-hlcdc:
   - Rework fbdev creation logic

  tegra:
   - OF node fix

  fsl-dcu:
   - Minor fixes

  mali-dp:
   - Assorted fixes

  sunxi:
   - Minor fix"

[ This was the "fixed" pull, that still had build warnings due to people
  not even having build tested the result. I'm not a happy camper

  I've fixed the things I noticed up in this merge.      - Linus ]

* tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
  lib/Kconfig: make PRIME_NUMBERS not user selectable
  drm/tinydrm: helpers: Properly fix backlight dependency
  drm/tinydrm: mipi-dbi: Fix field width specifier warning
  drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
  drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
  drm/amd/powerplay: fix PSI feature on Polars12
  drm/amdgpu: refuse to reserve io mem for split VRAM buffers
  drm/ttm: fix use-after-free races in vm fault handling
  drm/tinydrm: Add support for Multi-Inno MI0283QT display
  dt-bindings: Add Multi-Inno MI0283QT binding
  dt-bindings: display/panel: Add common rotation property
  of: Add vendor prefix for Multi-Inno
  drm/tinydrm: Add MIPI DBI support
  drm/tinydrm: Add helper functions
  drm: Add DRM support for tiny LCD displays
  drm/amd/amdgpu: post card if there is real hw resetting performed
  drm/nouveau/tmr: provide backtrace when a timeout is hit
  drm/nouveau/pci/g92: Fix rearm
  drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
  drm/nouveau/hwmon: expose power_max and power_crit
  ..
2017-02-23 18:58:18 -08:00
Linus Torvalds
b2e3c4319d Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
 "Driver updates for ARM SoCs.

  A handful of driver changes this time around. The larger changes are:

   - Reset drivers for hi3660 and zx2967

   - AHCI driver for Davinci, acked by Tejun and brought in here due to
     platform dependencies

   - Cleanups of atmel-ebi (External Bus Interface)

   - Tweaks for Rockchip GRF (General Register File) usage (kitchensink
     misc register range on the SoCs)

   - PM domains changes for support of two new ZTE SoCs (zx296718 and
     zx2967)"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (53 commits)
  soc: samsung: pmu: Add register defines for pad retention control
  reset: make zx2967 explicitly non-modular
  reset: core: fix reset_control_put
  soc: samsung: pm_domains: Read domain name from the new label property
  soc: samsung: pm_domains: Remove message about failed memory allocation
  soc: samsung: pm_domains: Remove unused name field
  soc: samsung: pm_domains: Use full names in subdomains registration log
  sata: ahci-da850: un-hardcode the MPY bits
  sata: ahci-da850: add a workaround for controller instability
  sata: ahci: export ahci_do_hardreset() locally
  sata: ahci-da850: implement a workaround for the softreset quirk
  sata: ahci-da850: add device tree match table
  sata: ahci-da850: get the sata clock using a connection id
  soc: samsung: pmu: Remove duplicated define for ARM_L2_OPTION register
  memory: atmel-ebi: Enable the SMC clock if specified
  soc: samsung: pmu: Remove unused and duplicated defines
  memory: atmel-ebi: Properly handle multiple reference to the same CS
  memory: atmel-ebi: Fix the test to enable generic SMC logic
  soc: samsung: pm_domains: Add new Exynos5433 compatible
  soc: samsung: pmu: Add dummy support for Exynos5433 SoC
  ...
2017-02-23 15:57:04 -08:00
Linus Torvalds
60e8d3e116 Merge tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:

 - add ASPM L1 substate support

 - enable PCIe Extended Tags when supported

 - configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx

 - increase VPD access timeout

 - add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432

 - use new pci_irq_alloc_vectors() in more drivers

 - fix MSI affinity memory leak

 - remove unused MSI interfaces and update documentation

 - remove unused AER .link_reset() callback

 - avoid pci_lock / p->pi_lock deadlock seen with perf

 - serialize sysfs enable/disable num_vfs operations

 - move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and
   refactor so we can support both hosts and endpoints

 - add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers

 - add Rockchip system power management support

 - add Thunder-X cn81xx and cn83xx support

 - add Exynos 5440 PCIe PHY support

* tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits)
  PCI: dwc: Remove dependency of designware on CONFIG_PCI
  PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host
  PCI: dwc: Split pcie-designware.c into host and core files
  PCI: dwc: designware: Fix style errors in pcie-designware.c
  PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc()
  PCI: dwc: all: Split struct pcie_port into host-only and core structures
  PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init()
  PCI: dwc: all: Rename cfg_read/cfg_write to read/write
  PCI: dwc: all: Use platform_set_drvdata() to save private data
  PCI: dwc: designware: Move register defines to designware header file
  PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code
  PCI: dra7xx: Group PHY API invocations
  PCI: dra7xx: Enable MSI and legacy interrupts simultaneously
  PCI: dra7xx: Add support to force RC to work in GEN1 mode
  PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional()
  PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory
  PCI: exynos: Support the PHY generic framework
  Documentation: binding: Modify the exynos5440 PCIe binding
  phy: phy-exynos-pcie: Add support for Exynos PCIe PHY
  Documentation: samsung-phy: Add exynos-pcie-phy binding
  ...
2017-02-23 11:53:22 -08:00
Linus Torvalds
190c3ee06a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Some 'const'ing in qlogic networking drivers, from Bhumika Goyal.

 2) Fix scheduling while atomic in l2tp network namespace exit by
    deferring the work to the workqueue. From Ridge Kennedy.

 3) Fix use after free in dccp timewait handling, from Andrey Ryabinin.

 4) mlx5e CQE compression engine not initialized properly, from Tariq
    Toukan.

 5) Some UAPI header fixes from Dmitry V. Levin.

 6) Don't overwrite module parameter value in mlx4 driver, from Majd
    Dibbiny.

 7) Fix divide by zero in xt_hashlimit netfilter module, from Alban
    Browaeys.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
  bpf: Fix bpf_xdp_event_output
  net/mlx4_en: Use __skb_fill_page_desc()
  net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
  net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
  net/mlx4: Spoofcheck and zero MAC can't coexist
  net/mlx4: Change ENOTSUPP to EOPNOTSUPP
  uapi: fix linux/rds.h userspace compilation errors
  uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors
  lib: Remove string from parman config selection
  forcedeth: Remove return from a void function
  bpf: fix spelling mistake: "proccessed" -> "processed"
  uapi: fix linux/llc.h userspace compilation error
  uapi: fix linux/ip6_tunnel.h userspace compilation errors
  net/mlx5e: Fix wrong CQE decompression
  net/mlx5e: Update MPWQE stride size when modifying CQE compress state
  net/mlx5e: Fix broken CQE compression initialization
  net/mlx5e: Do not reduce LRO WQE size when not using build_skb
  net/mlx5e: Register/unregister vport representors on interface attach/detach
  net/mlx5e: s390 system compilation fix
  tcp: account for ts offset only if tsecr not zero
  ...
2017-02-23 11:36:06 -08:00
Linus Torvalds
af17fe7a63 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for 4.11 merge window

  Because the Mellanox code required being based on a net-next tree, I
  keept it separate from the remainder of the RDMA stack submission that
  is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes

   - Support for adding a tag match rule to flow specs

   - Support for cvlan offload operation for raw ethernet QPs

   - A change to the core IB code to recognize raw eth capabilities and
     enumerate them (touches non-Mellanox code)

   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-23 11:27:49 -08:00
Linus Torvalds
5bcbe22ca4 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:
   - Try to catch hash output overrun in testmgr
   - Introduce walksize attribute for batched walking
   - Make crypto_xor() and crypto_inc() alignment agnostic

  Algorithms:
   - Add time-invariant AES algorithm
   - Add standalone CBCMAC algorithm

  Drivers:
   - Add NEON acclerated chacha20 on ARM/ARM64
   - Expose AES-CTR as synchronous skcipher on ARM64
   - Add scalar AES implementation on ARM64
   - Improve scalar AES implementation on ARM
   - Improve NEON AES implementation on ARM/ARM64
   - Merge CRC32 and PMULL instruction based drivers on ARM64
   - Add NEON acclerated CBCMAC/CMAC/XCBC AES on ARM64
   - Add IPsec AUTHENC implementation in atmel
   - Add Support for Octeon-tx CPT Engine
   - Add Broadcom SPU driver
   - Add MediaTek driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (142 commits)
  crypto: xts - Add ECB dependency
  crypto: cavium - switch to pci_alloc_irq_vectors
  crypto: cavium - switch to pci_alloc_irq_vectors
  crypto: cavium - remove dead MSI-X related define
  crypto: brcm - Avoid double free in ahash_finup()
  crypto: cavium - fix Kconfig dependencies
  crypto: cavium - cpt_bind_vq_to_grp could return an error code
  crypto: doc - fix typo
  hwrng: omap - update Kconfig help description
  crypto: ccm - drop unnecessary minimum 32-bit alignment
  crypto: ccm - honour alignmask of subordinate MAC cipher
  crypto: caam - fix state buffer DMA (un)mapping
  crypto: caam - abstract ahash request double buffering
  crypto: caam - fix error path for ctx_dma mapping failure
  crypto: caam - fix DMA API leaks for multiple setkey() calls
  crypto: caam - don't dma_map key for hash algorithms
  crypto: caam - use dma_map_sg() return code
  crypto: caam - replace sg_count() with sg_nents_for_len()
  crypto: caam - check sg_count() return value
  crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc()
  ..
2017-02-23 09:54:19 -08:00
Linus Torvalds
1db934a5b7 Merge tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
 "This introduces an interface for user space to directly communicate on
  rpmsg endpoints without the implementation of specific kernel drivers,
  which is useful for e.g. debug channels:

* tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc:
  rpmsg: rpmsg_create_ept() returns NULL on error
  rpmsg: qcom: smd: Return positively when not enabled
  rpmsg: unlock on error in rpmsg_eptdev_read()
  rpmsg: char: add CONFIG_NET dependency
  rpmsg: smd: Register rpmsg user space interface for edges
  rpmsg: Driver for user space endpoint interface
  rpmsg: qcom_smd: Implement endpoint "poll"
  rpmsg: Introduce "poll" to endpoint ops
  rpmsg: qcom_smd: Add support for "label" property
2017-02-23 09:41:03 -08:00
Linus Torvalds
a3919caaa2 Merge tag 'rproc-v4.11' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
 "This introduces support for booting the dedicated sensor core in the
  Qualcomm MSM8996, updates the Qualcomm ADSP and Hexagon drivers to
  utilize SMD subdevice helpers for properly handle shutdowns and
  restarts of the remoteproc, add virtio support to the ST remoteproc
  and refactor the Qualcomm Hexagon driver to handle variations between
  platforms.

  The support code for parsing, loading and authenticating Qualcomm
  firmware files (MDT) is refactored and move to drivers/soc/qcom, to
  allow for non-remoteproc drivers to utilize this.

  Finally it brings some cleanups to the remoteproc core"

* tag 'rproc-v4.11' of git://github.com/andersson/remoteproc: (27 commits)
  remoteproc: qcom: mdt_loader: Use signed type for offset
  remoteproc: st: add virtio communication support
  remoteproc: st: correct probe error management
  remoteproc: Modify the function names
  remoteproc: Reduce asynchronous request_firmware to auto-boot only
  remoteproc: Drop qcom_scm_pas_supported() from adsp_probe()
  MAINTAINERS: Add missing rpmsg include path
  remoteproc: qcom: Use common SMD edge handler
  remoteproc: qcom: wcnss: Make SMD handling common
  remoteproc: Move qcom_mdt_loader into drivers/soc/qcom
  remoteproc: qcom: mdt_loader: Refactor MDT loader
  remoteproc: qcom: mdt_loader: Don't overwrite firmware object
  remoteproc: qcom: Extract non-mdt related helper
  remoteproc: qcom: q6v5: Decouple driver from MDT loader
  remoteproc: qcom: q6v5: Remove mss supply from 8916
  remoteproc: qcom: fix initializers for qcom_mss_reg_res array
  remoteproc: Drop firmware_loading_complete
  remoteproc: Add RPROC_DELETED state
  remoteproc: Move rproc_delete_debug_dir() to rproc_del()
  remoteproc: qcom: Add SLPI rproc support to load and boot slpi proc.
  ...
2017-02-23 09:38:10 -08:00
Linus Torvalds
28cbc335d2 Merge tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "Here is the update of sound bits for 4.11: again at this time, no big
  changes in ALSA and ASoC core but only cosmetic changes like
  consitifaction.

  Meanwhile, quite a lot of developments are seen in a few driver side.

  ALSA Core:
   - Clean up, consitification of some ops

  HD-audio:
   - A slight behavior change of single_cmd option
   - Quirks for AmigaOne X1000, Samsung Ativ Book 8, Dell AiO, ALC221
     HP, and fixes for Lewisburg controller
   - Realtek ALC299, ALC1220 codecs

  Others:
   - USB-audio: Tascam US-16x08 DSP mixer quirk
   - Intel HDMI LPE audio support for Baytrail / Cherrytrail; this
     contains some updates in drm/i915 for the new platform binding

  ASoC:
   - Lots of updates in Intel drivers, mostly for DisplayPort and HDMI
     on Skylake and onwards, as well as more Baytrail / Cherrytrail
     boards support
   - Channel mapping support for HDMI
   - Support for AllWinner A31 and A33, Everest Semiconductor ES8328,
     Nuvoton NAU8540.

* tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (323 commits)
  ALSA: usb-audio: Tidy up mixer_us16x08.c
  ALSA: usb-audio: Fix memory leak and corruption in mixer_us16x08.c
  ALSA: usb-audio: purge needless variable length array
  ALSA: x86: hdmi: select CONFIG_SND_PCM
  ALSA: x86: Don't enable runtime PM as default
  ALSA: x86: Use runtime PM autosuspend
  ALSA: usb-audio: localize function without external linkage
  ALSA: usb-audio: localize one-referrer variable
  ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk
  ALSA: emu10k1: constify snd_emux_operators structure
  ASoC: sun4i-spdif: drop unnessary snd_soc_unregister_component()
  ASoC: Intel: bxt: Add jack port initialize in bxt_rt298 machine
  ASoC: nau8825: automatic BCLK and LRC divde in master mode
  ASoC: hdac_hdmi: Add device id for Geminilake
  ASoC: Intel: Skylake: Add Geminlake IDs
  ASoC: rt298: Add DMI match for Geminilake reference platform
  ASoC: Intel: Skylake: Check device type to get endpoint configuration
  ASoC: Intel: bxt: Add jack port initialize in da7219_max98357a machine
  ASoC: Intel: Skylake: Add jack port initialize in nau88l25_ssm4567 machine
  ASoC: Intel: Skylake: Add jack port initialize in nau88l25_max98357a machine
  ...
2017-02-23 08:50:22 -08:00
Linus Torvalds
1ec5c1867a Merge tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.11 cycle

  Core changes:

   - Augment fwnode_get_named_gpiod() to configure the GPIO pin
     immediately after requesting it like all other APIs do. This is a
     treewide change also updating all users.

   - Pass a GPIO label down to gpiod_request() from
     fwnode_get_named_gpiod(). This makes debugfs and the userspace ABI
     correctly reflect the current in-kernel consumer of a pin taken
     using this abstraction. This is a treewide change also updating all
     users.

   - Rename devm_get_gpiod_from_child() to
     devm_fwnode_get_gpiod_from_child() to reflect the fact that this
     function is operating on a fwnode object. This is a treewide change
     also updating all users.

   - Make it possible to take multiple GPIOs in a single hog of device
     tree hogs.

   - The refactorings switching GPIO chips to use the .set_config()
     callback using standard pin control properties and providing a
     backend into the pin control subsystem that were also merged into
     the pin control tree naturally appear here too.

  Testing instrumentation:

   - A whole slew of cleanups and improvements to the mockup GPIO
     driver. We now have an extended userspace test exercising the
     subsystem, and we can inject interrupts etc from userspace to fully
     test the core GPIO functionality.

  New drivers:

   - New driver for the Cortina Systems Gemini GPIO controller.

   - New driver for the Exar XR17V352/354/358 chips.

   - New driver for the ACCES PCI-IDIO-16 PCI GPIO card.

  Driver changes:

   - RCAR: set the irqchip parent device, add fine-grained runtime PM
     support.

   - pca953x: support optional RESET control line on the chip.

   - DaVinci: cleanups and simplifications. Add support for multiple
     instances.

   - .set_multiple() and naming of lines on more or less all of the
     ISA/PCI GPIO controllers.

   - mcp23s08: refactored to use regmap as a first step to further
     rewrites and modernizations"

* tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
  gpio: reintroduce devm_get_gpiod_from_child()
  gpio: pci-idio-16: Fix PCI BAR index
  gpio: pci-idio-16: Fix PCI device ID code
  gpio: mockup: implement event injecting over debugfs
  gpio: mockup: add a dummy irqchip
  gpio: mockup: implement naming the lines
  gpio: mockup: code shrink
  gpio: mockup: readability tweaks
  gpio: Add GPIO support for the ACCES PCI-IDIO-16
  gpio: Add the devm_fwnode_get_index_gpiod_from_child() helper
  gpio: Rename devm_get_gpiod_from_child()
  gpio: mcp23s08: Select REGMAP/REGMAP_I2C to fix build error
  gpio: ws16c48: Add support for GPIO names
  gpio: gpio-mm: Add support for GPIO names
  gpio: 104-idio-16: Add support for GPIO names
  gpio: 104-idi-48: Add support for GPIO names
  gpio: 104-dio-48e: Add support for GPIO names
  gpio: ws16c48: Remove unnecessary driver_data set
  gpio: gpio-mm: Remove unnecessary driver_data set
  gpio: 104-idio-16: Remove unnecessary driver_data set
  ...
2017-02-23 08:46:04 -08:00
Linus Torvalds
d5dee39b27 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry:

 - a new driver for Zeitech touchscreen controller

 - a new driver for Samsung "touchkeys"

 - touchscreen driver for Moorestown platform has been removed because
   platform support is gone

 - MPU3050 accelerometer driver was removed in favor of IIO driver

 - miscellaneous driver cleanup and fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits)
  Input: zet6223 - export OF device ID as module aliases
  Input: tsc2004/5 - switch to using generic device properties
  Input: tsc2004/5 - fix regulator handling
  Input: tsc2005 - add OF device table
  Input: add driver for Zeitec ZET6223
  Input: joydev - do not report stale values on first open
  Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest
  Input: synaptics-rmi4 - clean up F30 implementation
  Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons
  Input: psmouse - add a custom serio protocol to send extra information
  Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts()
  Input: xpad - restore LED state after device resume
  Input: synaptics-rmi4 - add rmi_find_function()
  Input: xpad - fix stuck mode button on Xbox One S pad
  Input: joydev - use clamp() macro
  Input: refuse to register absolute devices without absinfo
  Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs
  Input: synaptics-rmi4 - add sysfs attribute update_fw_status
  Input: mousedev - stop offering PS/2 to userspace by default
  Input: tca8418 - switch to using generic device properties
  ...
2017-02-23 08:39:40 -08:00
Linus Torvalds
df9cdc1727 Merge tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "Core Frameworks:
   - Add new !TOUCHSCREEN_SUN4I dependency for SUN4I_GPADC
   - List include/dt-bindings/mfd/* to files supported in MAINTAINERS

  New Drivers:
   - Intel Apollo Lake SPI NOR
   - ST STM32 Timers (Advanced, Basic and PWM)
   - Motorola 6556002 CPCAP (PMIC)

  New Device Support:
   - Add support for AXP221 to axp20x
   - Add support for Intel Gemini Lake to intel-lpss-pci
   - Add support for MT6323 LED to mt6397-core
   - Add support for COMe-bBD#, COMe-bSL6, COMe-bKL6, COMe-cAL6 and
     COMe-cKL6 to kempld-core

  New Functionality:
   - Add support for Analog CODAC to sun6i-prcm
   - Add support for Watchdog to lpc_ich

  Fix-ups:
   - Error handling improvements; axp288_charger, axp20x, ab8500-sysctrl
   - Adapt platform data handling; axp20x
   - IRQ handling improvements; arizona, axp20x
   - Remove superfluous code; arizona, axp20x, lpc_ich
   - Trivial coding style/spelling fixes; axp20x, abx500, mfd.txt
   - Regmap fix-ups; axp20x
   - DT changes; mfd.txt, aspeed-lpc, aspeed-gfx, ab8500-core, tps65912,
     mt6397
   - Use new I2C probing mechanism; max77686
   - Constification; rk808

  Bug Fixes:
   - Stop data transfer whilst suspended; cros_ec"

* tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (43 commits)
  mfd: lpc_ich: Enable watchdog on Intel Apollo Lake PCH
  mfd: lpc_ich: Remove useless comments in core part
  mfd: Add support for several boards to Kontron PLD driver
  mfd: constify regmap_irq_chip structures
  MAINTAINERS: Add include/dt-bindings/mfd to MFD entry
  mfd: cpcap: Add minimal support
  mfd: mt6397: Add MT6323 LED support into MT6397 driver
  Documentation: devicetree: Add LED subnode binding for MT6323 PMIC
  mfd: tps65912: Export OF device ID table as module aliases
  mfd: ab8500-core: Rename clock device and compatible
  mfd: cros_ec: Send correct suspend/resume event to EC
  mfd: max77686: Remove I2C device ID table
  mfd: max77686: Use the struct i2c_driver .probe_new instead of .probe
  mfd: max77686: Use of_device_get_match_data() helper
  mfd: max77686: Don't attempt to get i2c_device_id .data
  mfd: ab8500-sysctrl: Handle probe deferral
  mfd: intel-lpss: Add Intel Gemini Lake PCI IDs
  mfd: axp20x: Fix AXP806 access errors on cold boot
  mfd: cros_ec: Send suspend state notification to EC
  mfd: cros_ec: Prevent data transfer while device is suspended
  ...
2017-02-23 08:18:01 -08:00
Eugenia Emantayev
745d8ae462 net/mlx4: Spoofcheck and zero MAC can't coexist
Spoofcheck can't be enabled if VF MAC is zero.
Vice versa, can't zero MAC if spoofcheck is on.

Fixes: 8f7ba3ca12 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-23 10:57:56 -05:00
Linus Torvalds
bc49a7831b Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "142 patches:

   - DAX updates

   - various misc bits

   - OCFS2 updates

   - most of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits)
  mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
  mm: fix <linux/pagemap.h> stray kernel-doc notation
  zram: remove obsolete sysfs attrs
  mm/memblock.c: remove unnecessary log and clean up
  oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
  mm: drop unused argument of zap_page_range()
  mm: drop zap_details::check_swap_entries
  mm: drop zap_details::ignore_dirty
  mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
  mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
  mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
  mm: consolidate GFP_NOFAIL checks in the allocator slowpath
  lib/show_mem.c: teach show_mem to work with the given nodemask
  arch, mm: remove arch specific show_mem
  mm, page_alloc: warn_alloc print nodemask
  mm, page_alloc: do not report all nodes in show_mem
  Revert "mm: bail out in shrink_inactive_list()"
  mm, vmscan: consider eligible zones in get_scan_count
  mm, vmscan: cleanup lru size claculations
  mm, vmscan: do not count freed pages as PGDEACTIVATE
  ...
2017-02-22 19:29:24 -08:00
Linus Torvalds
be5165a51d Merge tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
 "Pretty standard stuff with dtc upstream sync being the biggest piece.

   - Sync dtc to upstream commit 0931cea3ba20. This picks up overlay
     support in dtc.

   - Set dma_ops for reserved memory users.

   - Make references to IOMMU consistent in DT bindings.

   - Cleanup references to pm_power_off in bindings.

   - Move some display bindings that snuck into the old bindings/video/
     path.

   - Fix some wrong documentation paths caused from binding
     restructuring.

   - Vendor prefixes for Faraday and Fujitsu.

   - Fix an of_node ref counting leak in of_find_node_opts_by_path

   - Introduce new graph helper of_graph_get_remote_node() which will be
     used by DRM drivers in 4.12"

* tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits)
  DT: add Faraday Tec. as vendor
  of: introduce of_graph_get_remote_node
  of: Add missing space at end of pr_fmt().
  of: make of_device_make_bus_id() static
  of: fix of_node leak caused in of_find_node_opts_by_path
  dt-bindings: net: remove reference to fixed link support
  dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off
  dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off
  dt-bindings: mfd: as3722: Drop reference to pm_power_off
  dt-bindings: display: move ANX7814 and SiI8620 bridge bindings
  of/unittest: Swap arguments of of_unittest_apply_overlay()
  Documentation: usb: fix wrong documentation paths
  serial: fsl-imx-uart.txt: Remove generic property
  devicetree: Add Fujitsu Ltd. vendor prefix
  Documentation: display: fix wrong documentation paths
  of: remove redundant memset in overlay
  bus:qcom : Fix typo in qcom,ebi2.txt
  dt-bindings: qman: Remove pool channel node
  Documentation: panel-dpi: fix path to display-timing.txt
  devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP
  ...
2017-02-22 19:23:14 -08:00
Linus Torvalds
c1aac62f36 Merge tag 'docs-4.11' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "A slightly quieter cycle for documentation this time around.

  Three more DocBook template files have been converted to RST; only 21
  to go. There are various build improvements and the usual array of
  documentation improvements and fixes"

* tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
  docs / driver-api: Fix structure references in device_link.rst
  PM / docs: Fix structure references in device.rst
  Add a target to check broken external links in the Documentation
  Documentation: Fix linux-api list typo
  Documentation: DocBook/Makefile comment typo
  Improve sparse documentation
  Documentation: make Makefile.sphinx no-ops quieter
  Documentation: DMA-ISA-LPC.txt
  Documentation: input: fix path to input code definitions
  docs: Remove the copyright year from conf.py
  docs: Fix a warning in the Korean HOWTO.rst translation
  PM / sleep / docs: Convert PM notifiers document to reST
  PM / core / docs: Convert sleep states API document to reST
  PM / core: Update kerneldoc comments in pm.h
  doc-rst: Fix recursive make invocation from macros
  doc-rst: Delete output of failed dot-SVG conversion
  doc-rst: Break shell command sequences on failure
  Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
  doc-rst: fixed cleandoc target when used with O=dir
  Documentation/sphinx: prevent generation of .pyc files in the source tree
  ...
2017-02-22 18:51:29 -08:00
Linus Torvalds
fd7e9a8834 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "4.11 is going to be a relatively large release for KVM, with a little
  over 200 commits and noteworthy changes for most architectures.

  ARM:
   - GICv3 save/restore
   - cache flushing fixes
   - working MSI injection for GICv3 ITS
   - physical timer emulation

  MIPS:
   - various improvements under the hood
   - support for SMP guests
   - a large rewrite of MMU emulation. KVM MIPS can now use MMU
     notifiers to support copy-on-write, KSM, idle page tracking,
     swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
     also supported, so that writes to some memory regions can be
     treated as MMIO. The new MMU also paves the way for hardware
     virtualization support.

  PPC:
   - support for POWER9 using the radix-tree MMU for host and guest
   - resizable hashed page table
   - bugfixes.

  s390:
   - expose more features to the guest
   - more SIMD extensions
   - instruction execution protection
   - ESOP2

  x86:
   - improved hashing in the MMU
   - faster PageLRU tracking for Intel CPUs without EPT A/D bits
   - some refactoring of nested VMX entry/exit code, preparing for live
     migration support of nested hypervisors
   - expose yet another AVX512 CPUID bit
   - host-to-guest PTP support
   - refactoring of interrupt injection, with some optimizations thrown
     in and some duct tape removed.
   - remove lazy FPU handling
   - optimizations of user-mode exits
   - optimizations of vcpu_is_preempted() for KVM guests

  generic:
   - alternative signaling mechanism that doesn't pound on
     tsk->sighand->siglock"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
  x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
  x86/paravirt: Change vcp_is_preempted() arg type to long
  KVM: VMX: use correct vmcs_read/write for guest segment selector/base
  x86/kvm/vmx: Defer TR reload after VM exit
  x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
  x86/kvm/vmx: Simplify segment_base()
  x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
  x86/kvm/vmx: Don't fetch the TSS base from the GDT
  x86/asm: Define the kernel TSS limit in a macro
  kvm: fix page struct leak in handle_vmon
  KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
  KVM: Return an error code only as a constant in kvm_get_dirty_log()
  KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
  KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
  KVM: x86: remove code for lazy FPU handling
  KVM: race-free exit from KVM_RUN without POSIX signals
  KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
  KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
  KVM: Support vCPU-based gfn->hva cache
  KVM: use separate generations for each address space
  ...
2017-02-22 18:22:53 -08:00
Dave Airlie
94000cc329 Merge tag 'v4.10-rc8' into drm-next
Linux 4.10-rc8

Backmerge Linus rc8 to fix some conflicts, but also
to avoid pulling it in via a fixes pull from someone.
2017-02-23 12:10:12 +10:00
Linus Torvalds
a27fcb0cd1 Merge tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
 "Here are the XFS changes for 4.11. We aren't introducing any major
  features in this release cycle except for this being the first merge
  window I've managed on my own. :)

  Changes since last update:

   - Various cleanups

   - Livelock fixes for eofblocks scanning

   - Improved input verification for on-disk metadata

   - Fix races in the copy on write remap mechanism

   - Fix buffer io error timeout controls

   - Streamlining of directio copy on write

   - Asynchronous discard support

   - Fix asserts when splitting delalloc reservations

   - Don't bloat bmbt when right shifting extents

   - Inode alignment fixes for 32k block sizes"

* tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits)
  xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG
  xfs: simplify xfs_rtallocate_extent
  xfs: tune down agno asserts in the bmap code
  xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
  xfs: don't reserve blocks for right shift transactions
  xfs: fix len comparison in xfs_extent_busy_trim
  xfs: fix uninitialized variable in _reflink_convert_cow
  xfs: split indlen reservations fairly when under reserved
  xfs: handle indlen shortage on delalloc extent merge
  xfs: resurrect debug mode drop buffered writes mechanism
  xfs: clear delalloc and cache on buffered write failure
  xfs: don't block the log commit handler for discards
  xfs: improve busy extent sorting
  xfs: improve handling of busy extents in the low-level allocator
  xfs: don't fail xfs_extent_busy allocation
  xfs: correct null checks and error processing in xfs_initialize_perag
  xfs: update ctime and mtime on clone destinatation inodes
  xfs: allocate direct I/O COW blocks in iomap_begin
  xfs: go straight to real allocations for direct I/O COW writes
  xfs: return the converted extent in __xfs_reflink_convert_cow
  ...
2017-02-22 18:05:23 -08:00
Linus Torvalds
7d91de7443 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:

 - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
   Rostedt as the printk reviewer. This idea came up after the
   discussion about printk issues at Kernel Summit. It was formulated
   and discussed at lkml[1].

 - Extend a lock-less NMI per-cpu buffers idea to handle recursive
   printk() calls by Sergey Senozhatsky[2]. It is the first step in
   sanitizing printk as discussed at Kernel Summit.

   The change allows to see messages that would normally get ignored or
   would cause a deadlock.

   Also it allows to enable lockdep in printk(). This already paid off.
   The testing in linux-next helped to discover two old problems that
   were hidden before[3][4].

 - Remove unused parameter by Sergey Senozhatsky. Clean up after a past
   change.

[1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
[2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
[3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
[4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: drop call_console_drivers() unused param
  printk: convert the rest to printk-safe
  printk: remove zap_locks() function
  printk: use printk_safe buffers in printk
  printk: report lost messages in printk safe/nmi contexts
  printk: always use deferred printk when flush printk_safe lines
  printk: introduce per-cpu safe_print seq buffer
  printk: rename nmi.c and exported api
  printk: use vprintk_func in vprintk()
  MAINTAINERS: Add printk maintainers
2017-02-22 17:33:34 -08:00
Linus Torvalds
6ef192f225 Merge tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
 "Summary of modules changes for the 4.11 merge window:

   - A few small code cleanups

   - Add modules git tree url to MAINTAINERS"

* tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  MAINTAINERS: add tree for modules
  module: fix memory leak on early load_module() failures
  module: Optimize search_module_extables()
  modules: mark __inittest/__exittest as __maybe_unused
  livepatch/module: print notice of TAINT_LIVEPATCH
  module: Drop redundant declaration of struct module
2017-02-22 17:08:33 -08:00
Randy Dunlap
083fb8edda mm: fix <linux/pagemap.h> stray kernel-doc notation
Delete stray (second) function description in find_lock_page()
kernel-doc notation.

Note: scripts/kernel-doc just ignores the second function description.

Fixes: 2457aec637 ("mm: non-atomically mark page accessed during page cache allocation where possible")
Link: http://lkml.kernel.org/r/b037e9a3-516c-ec02-6c8e-fa5479747ba6@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Kirill A. Shutemov
ecf1385d72 mm: drop unused argument of zap_page_range()
There's no users of zap_page_range() who wants non-NULL 'details'.
Let's drop it.

Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Kirill A. Shutemov
3e8715fdc0 mm: drop zap_details::check_swap_entries
detail == NULL would give the same functionality as
.check_swap_entries==true.

Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Kirill A. Shutemov
da162e9368 mm: drop zap_details::ignore_dirty
The only user of ignore_dirty is oom-reaper.  But it doesn't really use
it.

ignore_dirty only has effect on file pages mapped with dirty pte.  But
oom-repear skips shared VMAs, so there's no way we can dirty file pte in
them.

Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Michal Hocko
9af744d743 lib/show_mem.c: teach show_mem to work with the given nodemask
show_mem() allows to filter out node specific data which is irrelevant
to the allocation request via SHOW_MEM_FILTER_NODES.  The filtering is
done in skip_free_areas_node which skips all nodes which are not in the
mems_allowed of the current process.  This works most of the time as
expected because the nodemask shouldn't be outside of the allocating
task but there are some exceptions.  E.g.  memory hotplug might want to
request allocations from outside of the allowed nodes (see
new_node_page).

Get rid of this hardcoded behavior and push the allocation mask down the
show_mem path and use it instead of cpuset_current_mems_allowed.  NULL
nodemask is interpreted as cpuset_current_mems_allowed.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Michal Hocko
a8e99259e7 mm, page_alloc: warn_alloc print nodemask
warn_alloc is currently used for to report an allocation failure or an
allocation stall.  We print some details of the allocation request like
the gfp mask and the request order.  We do not print the allocation
nodemask which is important when debugging the reason for the allocation
failure as well.  We alreaddy print the nodemask in the OOM report.

Add nodemask to warn_alloc and print it in warn_alloc as well.

Link: http://lkml.kernel.org/r/20170117091543.25850-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Michal Hocko
fd53880373 mm, vmscan: cleanup lru size claculations
lruvec_lru_size returns the full size of the LRU list while we sometimes
need a value reduced only to eligible zones (e.g.  for lowmem requests).
inactive_list_is_low is one such user.  Later patches will add more of
them.  Add a new parameter to lruvec_lru_size and allow it filter out
zones which are not eligible for the given context.

Link: http://lkml.kernel.org/r/20170117103702.28542-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00