If madvise(2) advice will result in the underlying vma being split and
the number of areas mapped by the process will exceed
/proc/sys/vm/max_map_count as a result, return ENOMEM instead of EAGAIN.
EAGAIN is returned by madvise(2) when a kernel resource, such as slab,
is temporarily unavailable. It indicates that userspace should retry
the advice in the near future. This is important for advice such as
MADV_DONTNEED which is often used by malloc implementations to free
memory back to the system: we really do want to free memory back when
madvise(2) returns EAGAIN because slab allocations (for vmas, anon_vmas,
or mempolicies) cannot be allocated.
Encountering /proc/sys/vm/max_map_count is not a temporary failure,
however, so return ENOMEM to indicate this is a more serious issue. A
followup patch to the man page will specify this behavior.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701241431120.42507@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "1G transparent hugepage support for device dax", v2.
The following series implements support for 1G trasparent hugepage on
x86 for device dax. The bulk of the code was written by Mathew Wilcox a
while back supporting transparent 1G hugepage for fs DAX. I have
forward ported the relevant bits to 4.10-rc. The current submission has
only the necessary code to support device DAX.
Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs. Given expected capacities of persistent memory devices an
in-memory database may want to reduce tlb pressure beyond what they can
already achieve with 2MB mappings of a device-dax file. We have
customer feedback to that effect as Willy mentioned in his previous
version of these patches [1].
[1]: https://lkml.org/lkml/2016/1/31/52
Comments from Nilesh @ Oracle:
There are applications which have a process model; and if you assume
10,000 processes attempting to mmap all the 6TB memory available on a
server; we are looking at the following:
processes : 10,000
memory : 6TB
pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB
pmd @ 2M page size: 120,000 / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB
As you can see with 2M pages, this system will use up an exorbitant
amount of DRAM to hold the page tables; but the 1G pages finally brings
it down to a reasonable level. Memory sizes will keep increasing; so
this number will keep increasing.
An argument can be made to convert the applications from process model
to thread model, but in the real world that may not be always practical.
Hopefully this helps explain the use case where this is valuable.
This patch (of 3):
In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.
[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We noticed a performance regression when moving hadoop workloads from
3.10 kernels to 4.0 and 4.6. This is accompanied by increased pageout
activity initiated by kswapd as well as frequent bursts of allocation
stalls and direct reclaim scans. Even lowering the dirty ratios to the
equivalent of less than 1% of memory would not eliminate the issue,
suggesting that dirty pages concentrate where the scanner is looking.
This can be traced back to recent efforts of thrash avoidance. Where
3.10 would not detect refaulting pages and continuously supply clean
cache to the inactive list, a thrashing workload on 4.0+ will detect and
activate refaulting pages right away, distilling used-once pages on the
inactive list much more effectively. This is by design, and it makes
sense for clean cache. But for the most part our workload's cache
faults are refaults and its use-once cache is from streaming writes. We
end up with most of the inactive list dirty, and we don't go after the
active cache as long as we have use-once pages around.
But waiting for writes to avoid reclaiming clean cache that *might*
refault is a bad trade-off. Even if the refaults happen, reads are
faster than writes. Before getting bogged down on writeback, reclaim
should first look at *all* cache in the system, even active cache.
To accomplish this, activate pages that are dirty or under writeback
when they reach the end of the inactive LRU. The pages are marked for
immediate reclaim, meaning they'll get moved back to the inactive LRU
tail as soon as they're written back and become reclaimable. But in the
meantime, by reducing the inactive list to only immediately reclaimable
pages, we allow the scanner to deactivate and refill the inactive list
with clean cache from the active list tail to guarantee forward
progress.
[hannes@cmpxchg.org: update comment]
Link: http://lkml.kernel.org/r/20170202191957.22872-8-hannes@cmpxchg.org
Link: http://lkml.kernel.org/r/20170123181641.23938-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory pressure can put dirty pages at the end of the LRU without
anybody running into dirty limits. Don't start writing individual pages
from kswapd while the flushers might be asleep.
Unlike the old direct reclaim flusher wakeup (removed in the next patch)
that flushes the number of pages just scanned, this patch wakes the
flushers for all outstanding dirty pages. That seemed to perform better
in a synthetic test that pushes dirty pages to the end of the LRU and
into reclaim, because we know LRU aging outstrips writeback already, and
this way we give younger dirty pages a headstart rather than wait until
reclaim runs into them as well. It also means less plugging and risk of
exhausting the struct request pool from reclaim.
There is a concern that this will cause temporary files that used to get
dirtied and truncated before writeback to now get written to disk under
memory pressure. If this turns out to be a real problem, we'll have to
revisit this and tame the reclaim flusher wakeups.
[hannes@cmpxchg.org: mention dirty expiration as a condition]
Link: http://lkml.kernel.org/r/20170126174739.GA30636@cmpxchg.org
Link: http://lkml.kernel.org/r/20170123181641.23938-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: vmscan: fix kswapd writeback regression".
We noticed a regression on multiple hadoop workloads when moving from
3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page
writeout, causing direct reclaim herds that also don't make progress.
I tracked it down to the thrash avoidance efforts after 3.10 that make
the kernel better at keeping use-once cache and use-many cache sorted on
the inactive and active list, with more aggressive protection of the
active list as long as there is inactive cache. Unfortunately, our
workload's use-once cache is mostly from streaming writes. Waiting for
writes to avoid potential reloads in the future is not a good tradeoff.
These patches do the following:
1. Wake the flushers when kswapd sees a lump of dirty pages. It's
possible to be below the dirty background limit and still have cache
velocity push them through the LRU. So start a-flushin'.
2. Let kswapd only write pages that have been rotated twice. This makes
sure we really tried to get all the clean pages on the inactive list
before resorting to horrible LRU-order writeback.
3. Move rotating dirty pages off the inactive list. Instead of churning
or waiting on page writeback, we'll go after clean active cache. This
might lead to thrashing, but in this state memory demand outstrips IO
speed anyway, and reads are faster than writes.
Mel backported the series to 4.10-rc5 with one minor conflict and ran a
couple of tests on it. Mix of read/write random workload didn't show
anything interesting. Write-only database didn't show much difference
in performance but there were slight reductions in IO -- probably in the
noise.
simoop did show big differences although not as big as Mel expected.
This is Chris Mason's workload that similate the VM activity of hadoop.
Mel won't go through the full details but over the samples measured
during an hour it reported
4.10.0-rc5 4.10.0-rc5
vanilla johannes-v1r1
Amean p50-Read 21346531.56 ( 0.00%) 21697513.24 ( -1.64%)
Amean p95-Read 24700518.40 ( 0.00%) 25743268.98 ( -4.22%)
Amean p99-Read 27959842.13 ( 0.00%) 28963271.11 ( -3.59%)
Amean p50-Write 1138.04 ( 0.00%) 989.82 ( 13.02%)
Amean p95-Write 1106643.48 ( 0.00%) 12104.00 ( 98.91%)
Amean p99-Write 1569213.22 ( 0.00%) 36343.38 ( 97.68%)
Amean p50-Allocation 85159.82 ( 0.00%) 79120.70 ( 7.09%)
Amean p95-Allocation 204222.58 ( 0.00%) 129018.43 ( 36.82%)
Amean p99-Allocation 278070.04 ( 0.00%) 183354.43 ( 34.06%)
Amean final-p50-Read 21266432.00 ( 0.00%) 21921792.00 ( -3.08%)
Amean final-p95-Read 24870912.00 ( 0.00%) 26116096.00 ( -5.01%)
Amean final-p99-Read 28147712.00 ( 0.00%) 29523968.00 ( -4.89%)
Amean final-p50-Write 1130.00 ( 0.00%) 977.00 ( 13.54%)
Amean final-p95-Write 1033216.00 ( 0.00%) 2980.00 ( 99.71%)
Amean final-p99-Write 1517568.00 ( 0.00%) 32672.00 ( 97.85%)
Amean final-p50-Allocation 86656.00 ( 0.00%) 78464.00 ( 9.45%)
Amean final-p95-Allocation 211712.00 ( 0.00%) 116608.00 ( 44.92%)
Amean final-p99-Allocation 287232.00 ( 0.00%) 168704.00 ( 41.27%)
The latencies are actually completely horrific in comparison to 4.4 (and
4.10-rc5 is worse than 4.9 according to historical data for reasons Mel
hasn't analysed yet).
Still, 95% of write latency (p95-write) is halved by the series and
allocation latency is way down. Direct reclaim activity is one fifth of
what it was according to vmstats. Kswapd activity is higher but this is
not necessarily surprising. Kswapd efficiency is unchanged at 99% (99%
of pages scanned were reclaimed) but direct reclaim efficiency went from
77% to 99%
In the vanilla kernel, 627MB of data was written back from reclaim
context. With the series, no data was written back. With or without
the patch, pages are being immediately reclaimed after writeback
completes. However, with the patch, only 1/8th of the pages are
reclaimed like this.
This patch (of 5):
We have an elaborate dirty/writeback throttling mechanism inside the
reclaim scanner, but for that to work the pages have to go through
shrink_page_list() and get counted for what they are. Otherwise, we
mess up the LRU order and don't match reclaim speed to writeback.
Especially during deactivation, there is never a reason to skip dirty
pages; nothing is even trying to write them out from there. Don't mess
up the LRU order for nothing, shuffle these pages along.
Link: http://lkml.kernel.org/r/20170123181641.23938-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "userfaultfd: non-cooperative: add madvise() event for
MADV_REMOVE request".
These patches add notification of madvise(MADV_REMOVE) event to
non-cooperative userfaultfd monitor.
The first pacth renames EVENT_MADVDONTNEED to EVENT_REMOVE along with
relevant functions and structures. Using _REMOVE instead of
_MADVDONTNEED describes the event semantics more clearly and I hope it's
not too late for such change in the ABI.
This patch (of 3):
The UFFD_EVENT_MADVDONTNEED purpose is to notify uffd monitor about
removal of certain range from address space tracked by userfaultfd.
Hence, UFFD_EVENT_REMOVE seems to better reflect the operation
semantics. Respectively, 'madv_dn' field of uffd_msg is renamed to
'remove' and the madvise_userfault_dontneed callback is renamed to
userfaultfd_remove.
Link: http://lkml.kernel.org/r/1484814154-1557-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull namespace updates from Eric Biederman:
"There is a lot here. A lot of these changes result in subtle user
visible differences in kernel behavior. I don't expect anything will
care but I will revert/fix things immediately if any regressions show
up.
From Seth Forshee there is a continuation of the work to make the vfs
ready for unpriviled mounts. We had thought the previous changes
prevented the creation of files outside of s_user_ns of a filesystem,
but it turns we missed the O_CREAT path. Ooops.
Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
children that are forked after the prctl are considered and not
children forked before the prctl. The only known user of this prctl
systemd forks all children after the prctl. So no userspace
regressions will occur. Holding earlier forked children to the same
rules as later forked children creates a semantic that is sane enough
to allow checkpoing of processes that use this feature.
There is a long delayed change by Nikolay Borisov to limit inotify
instances inside a user namespace.
Michael Kerrisk extends the API for files used to maniuplate
namespaces with two new trivial ioctls to allow discovery of the
hierachy and properties of namespaces.
Konstantin Khlebnikov with the help of Al Viro adds code that when a
network namespace exits purges it's sysctl entries from the dcache. As
in some circumstances this could use a lot of memory.
Vivek Goyal fixed a bug with stacked filesystems where the permissions
on the wrong inode were being checked.
I continue previous work on ptracing across exec. Allowing a file to
be setuid across exec while being ptraced if the tracer has enough
credentials in the user namespace, and if the process has CAP_SETUID
in it's own namespace. Proc files for setuid or otherwise undumpable
executables are now owned by the root in the user namespace of their
mm. Allowing debugging of setuid applications in containers to work
better.
A bug I introduced with permission checking and automount is now
fixed. The big change is to mark the mounts that the kernel initiates
as a result of an automount. This allows the permission checks in sget
to be safely suppressed for this kind of mount. As the permission
check happened when the original filesystem was mounted.
Finally a special case in the mount namespace is removed preventing
unbounded chains in the mount hash table, and making the semantics
simpler which benefits CRIU.
The vfs fix along with related work in ima and evm I believe makes us
ready to finish developing and merge fully unprivileged mounts of the
fuse filesystem. The cleanups of the mount namespace makes discussing
how to fix the worst case complexity of umount. The stacked filesystem
fixes pave the way for adding multiple mappings for the filesystem
uids so that efficient and safer containers can be implemented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc/sysctl: Don't grab i_lock under sysctl_lock.
vfs: Use upper filesystem inode in bprm_fill_uid()
proc/sysctl: prune stale dentries during unregistering
mnt: Tuck mounts under others instead of creating shadow/side mounts.
prctl: propagate has_child_subreaper flag to every descendant
introduce the walk_process_tree() helper
nsfs: Add an ioctl() to return owner UID of a userns
fs: Better permission checking for submounts
exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
vfs: open() with O_CREAT should not create inodes with unknown ids
nsfs: Add an ioctl() to return the namespace type
proc: Better ownership of files for non-dumpable tasks in user namespaces
exec: Remove LSM_UNSAFE_PTRACE_CAP
exec: Test the ptracer's saved cred to see if the tracee can gain caps
exec: Don't reset euid and egid when the tracee has CAP_SETUID
inotify: Convert to using per-namespace limits
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.11.
Nothing too major, the tinydrm and mmu-less support should make
writing smaller drivers easier for some of the simpler platforms, and
there are a bunch of documentation updates.
Intel grew displayport MST audio support which is hopefully useful to
people, and FBC is on by default for GEN9+ (so people know where to
look for regressions). AMDGPU has a lot of fixes that would like new
firmware files installed for some GPUs.
Other than that it's pretty scattered all over.
I may have a follow up pull request as I know BenH has a bunch of AST
rework and fixes and I'd like to get those in once they've been tested
by AST, and I've got at least one pull request I'm just trying to get
the author to fix up.
Core:
- drm_mm reworked
- Connector list locking and iterators
- Documentation updates
- Format handling rework
- MMU-less support for fbdev helpers
- drm_crtc_from_index helper
- Core CRC API
- Remove drm_framebuffer_unregister_private
- Debugfs cleanup
- EDID/Infoframe fixes
- Release callback
- Tinydrm support (smaller drivers for simple hw)
panel:
- Add support for some new simple panels
i915:
- FBC by default for gen9+
- Shared dpll cleanups and docs
- GEN8 powerdomain cleanup
- DMC support on GLK
- DP MST audio support
- HuC loading support
- GVT init ordering fixes
- GVT IOMMU workaround fix
amdgpu/radeon:
- Power/clockgating improvements
- Preliminary SR-IOV support
- TTM buffer priority and eviction fixes
- SI DPM quirks removed due to firmware fixes
- Powerplay improvements
- VCE/UVD powergating fixes
- Cleanup SI GFX code to match CI/VI
- Support for > 2 displays on 3/5 crtc asics
- SI headless fixes
nouveau:
- Rework securre boot code in prep for GP10x secure boot
- Channel recovery improvements
- Initial power budget code
- MMU rework preperation
vmwgfx:
- Bunch of fixes and cleanups
exynos:
- Runtime PM support for MIC driver
- Cleanups to use atomic helpers
- UHD Support for TM2/TM2E boards
- Trigger mode fix for Rinato board
etnaviv:
- Shader performance fix
- Command stream validator fixes
- Command buffer suballocator
rockchip:
- CDN DisplayPort support
- IOMMU support for arm64 platform
imx-drm:
- Fix i.MX5 TV encoder probing
- Remove lower fb size limits
msm:
- Support for HW cursor on MDP5 devices
- DSI encoder cleanup
- GPU DT bindings cleanup
sti:
- stih410 cleanups
- Create fbdev at binding
- HQVDP fixes
- Remove stih416 chip functionality
- DVI/HDMI mode selection fixes
- FPS statistic reporting
omapdrm:
- IRQ code cleanup
dwi-hdmi bridge:
- Cleanups and fixes
adv-bridge:
- Updates for nexus
sii8520 bridge:
- Add interlace mode support
- Rework HDMI and lots of fixes
qxl:
- probing/teardown cleanups
ZTE drm:
- HDMI audio via SPDIF interface
- Video Layer overlay plane support
- Add TV encoder output device
atmel-hlcdc:
- Rework fbdev creation logic
tegra:
- OF node fix
fsl-dcu:
- Minor fixes
mali-dp:
- Assorted fixes
sunxi:
- Minor fix"
[ This was the "fixed" pull, that still had build warnings due to people
not even having build tested the result. I'm not a happy camper
I've fixed the things I noticed up in this merge. - Linus ]
* tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits)
lib/Kconfig: make PRIME_NUMBERS not user selectable
drm/tinydrm: helpers: Properly fix backlight dependency
drm/tinydrm: mipi-dbi: Fix field width specifier warning
drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized
drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files
drm/amd/powerplay: fix PSI feature on Polars12
drm/amdgpu: refuse to reserve io mem for split VRAM buffers
drm/ttm: fix use-after-free races in vm fault handling
drm/tinydrm: Add support for Multi-Inno MI0283QT display
dt-bindings: Add Multi-Inno MI0283QT binding
dt-bindings: display/panel: Add common rotation property
of: Add vendor prefix for Multi-Inno
drm/tinydrm: Add MIPI DBI support
drm/tinydrm: Add helper functions
drm: Add DRM support for tiny LCD displays
drm/amd/amdgpu: post card if there is real hw resetting performed
drm/nouveau/tmr: provide backtrace when a timeout is hit
drm/nouveau/pci/g92: Fix rearm
drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios
drm/nouveau/hwmon: expose power_max and power_crit
..
Pull ARM SoC driver updates from Arnd Bergmann:
"Driver updates for ARM SoCs.
A handful of driver changes this time around. The larger changes are:
- Reset drivers for hi3660 and zx2967
- AHCI driver for Davinci, acked by Tejun and brought in here due to
platform dependencies
- Cleanups of atmel-ebi (External Bus Interface)
- Tweaks for Rockchip GRF (General Register File) usage (kitchensink
misc register range on the SoCs)
- PM domains changes for support of two new ZTE SoCs (zx296718 and
zx2967)"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (53 commits)
soc: samsung: pmu: Add register defines for pad retention control
reset: make zx2967 explicitly non-modular
reset: core: fix reset_control_put
soc: samsung: pm_domains: Read domain name from the new label property
soc: samsung: pm_domains: Remove message about failed memory allocation
soc: samsung: pm_domains: Remove unused name field
soc: samsung: pm_domains: Use full names in subdomains registration log
sata: ahci-da850: un-hardcode the MPY bits
sata: ahci-da850: add a workaround for controller instability
sata: ahci: export ahci_do_hardreset() locally
sata: ahci-da850: implement a workaround for the softreset quirk
sata: ahci-da850: add device tree match table
sata: ahci-da850: get the sata clock using a connection id
soc: samsung: pmu: Remove duplicated define for ARM_L2_OPTION register
memory: atmel-ebi: Enable the SMC clock if specified
soc: samsung: pmu: Remove unused and duplicated defines
memory: atmel-ebi: Properly handle multiple reference to the same CS
memory: atmel-ebi: Fix the test to enable generic SMC logic
soc: samsung: pm_domains: Add new Exynos5433 compatible
soc: samsung: pmu: Add dummy support for Exynos5433 SoC
...
Pull PCI updates from Bjorn Helgaas:
- add ASPM L1 substate support
- enable PCIe Extended Tags when supported
- configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx
- increase VPD access timeout
- add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432
- use new pci_irq_alloc_vectors() in more drivers
- fix MSI affinity memory leak
- remove unused MSI interfaces and update documentation
- remove unused AER .link_reset() callback
- avoid pci_lock / p->pi_lock deadlock seen with perf
- serialize sysfs enable/disable num_vfs operations
- move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and
refactor so we can support both hosts and endpoints
- add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers
- add Rockchip system power management support
- add Thunder-X cn81xx and cn83xx support
- add Exynos 5440 PCIe PHY support
* tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits)
PCI: dwc: Remove dependency of designware on CONFIG_PCI
PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host
PCI: dwc: Split pcie-designware.c into host and core files
PCI: dwc: designware: Fix style errors in pcie-designware.c
PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc()
PCI: dwc: all: Split struct pcie_port into host-only and core structures
PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init()
PCI: dwc: all: Rename cfg_read/cfg_write to read/write
PCI: dwc: all: Use platform_set_drvdata() to save private data
PCI: dwc: designware: Move register defines to designware header file
PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code
PCI: dra7xx: Group PHY API invocations
PCI: dra7xx: Enable MSI and legacy interrupts simultaneously
PCI: dra7xx: Add support to force RC to work in GEN1 mode
PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional()
PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory
PCI: exynos: Support the PHY generic framework
Documentation: binding: Modify the exynos5440 PCIe binding
phy: phy-exynos-pcie: Add support for Exynos PCIe PHY
Documentation: samsung-phy: Add exynos-pcie-phy binding
...
Pull networking fixes from David Miller:
1) Some 'const'ing in qlogic networking drivers, from Bhumika Goyal.
2) Fix scheduling while atomic in l2tp network namespace exit by
deferring the work to the workqueue. From Ridge Kennedy.
3) Fix use after free in dccp timewait handling, from Andrey Ryabinin.
4) mlx5e CQE compression engine not initialized properly, from Tariq
Toukan.
5) Some UAPI header fixes from Dmitry V. Levin.
6) Don't overwrite module parameter value in mlx4 driver, from Majd
Dibbiny.
7) Fix divide by zero in xt_hashlimit netfilter module, from Alban
Browaeys.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
bpf: Fix bpf_xdp_event_output
net/mlx4_en: Use __skb_fill_page_desc()
net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
net/mlx4: Spoofcheck and zero MAC can't coexist
net/mlx4: Change ENOTSUPP to EOPNOTSUPP
uapi: fix linux/rds.h userspace compilation errors
uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors
lib: Remove string from parman config selection
forcedeth: Remove return from a void function
bpf: fix spelling mistake: "proccessed" -> "processed"
uapi: fix linux/llc.h userspace compilation error
uapi: fix linux/ip6_tunnel.h userspace compilation errors
net/mlx5e: Fix wrong CQE decompression
net/mlx5e: Update MPWQE stride size when modifying CQE compress state
net/mlx5e: Fix broken CQE compression initialization
net/mlx5e: Do not reduce LRO WQE size when not using build_skb
net/mlx5e: Register/unregister vport representors on interface attach/detach
net/mlx5e: s390 system compilation fix
tcp: account for ts offset only if tsecr not zero
...
Pull Mellanox rdma updates from Doug Ledford:
"Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree, I
keept it separate from the remainder of the RDMA stack submission that
is based on 4.10-rc3.
This branch contains:
- Various mlx4 and mlx5 fixes and minor changes
- Support for adding a tag match rule to flow specs
- Support for cvlan offload operation for raw ethernet QPs
- A change to the core IB code to recognize raw eth capabilities and
enumerate them (touches non-Mellanox code)
- Implicit On-Demand Paging memory registration support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
IB/mlx5: Fix configuration of port capabilities
IB/mlx4: Take source GID by index from HW GID table
IB/mlx5: Fix blue flame buffer size calculation
IB/mlx4: Remove unused variable from function declaration
IB: Query ports via the core instead of direct into the driver
IB: Add protocol for USNIC
IB/mlx4: Support raw packet protocol
IB/mlx5: Support raw packet protocol
IB/core: Add raw packet protocol
IB/mlx5: Add implicit MR support
IB/mlx5: Expose MR cache for mlx5_ib
IB/mlx5: Add null_mkey access
IB/umem: Indicate that process is being terminated
IB/umem: Update on demand page (ODP) support
IB/core: Add implicit MR flag
IB/mlx5: Support creation of a WQ with scatter FCS offload
IB/mlx5: Enable QP creation with cvlan offload
IB/mlx5: Enable WQ creation and modification with cvlan offload
IB/mlx5: Expose vlan offloads capabilities
IB/uverbs: Enable QP creation with cvlan offload
...
Pull rpmsg updates from Bjorn Andersson:
"This introduces an interface for user space to directly communicate on
rpmsg endpoints without the implementation of specific kernel drivers,
which is useful for e.g. debug channels:
* tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc:
rpmsg: rpmsg_create_ept() returns NULL on error
rpmsg: qcom: smd: Return positively when not enabled
rpmsg: unlock on error in rpmsg_eptdev_read()
rpmsg: char: add CONFIG_NET dependency
rpmsg: smd: Register rpmsg user space interface for edges
rpmsg: Driver for user space endpoint interface
rpmsg: qcom_smd: Implement endpoint "poll"
rpmsg: Introduce "poll" to endpoint ops
rpmsg: qcom_smd: Add support for "label" property
Pull remoteproc updates from Bjorn Andersson:
"This introduces support for booting the dedicated sensor core in the
Qualcomm MSM8996, updates the Qualcomm ADSP and Hexagon drivers to
utilize SMD subdevice helpers for properly handle shutdowns and
restarts of the remoteproc, add virtio support to the ST remoteproc
and refactor the Qualcomm Hexagon driver to handle variations between
platforms.
The support code for parsing, loading and authenticating Qualcomm
firmware files (MDT) is refactored and move to drivers/soc/qcom, to
allow for non-remoteproc drivers to utilize this.
Finally it brings some cleanups to the remoteproc core"
* tag 'rproc-v4.11' of git://github.com/andersson/remoteproc: (27 commits)
remoteproc: qcom: mdt_loader: Use signed type for offset
remoteproc: st: add virtio communication support
remoteproc: st: correct probe error management
remoteproc: Modify the function names
remoteproc: Reduce asynchronous request_firmware to auto-boot only
remoteproc: Drop qcom_scm_pas_supported() from adsp_probe()
MAINTAINERS: Add missing rpmsg include path
remoteproc: qcom: Use common SMD edge handler
remoteproc: qcom: wcnss: Make SMD handling common
remoteproc: Move qcom_mdt_loader into drivers/soc/qcom
remoteproc: qcom: mdt_loader: Refactor MDT loader
remoteproc: qcom: mdt_loader: Don't overwrite firmware object
remoteproc: qcom: Extract non-mdt related helper
remoteproc: qcom: q6v5: Decouple driver from MDT loader
remoteproc: qcom: q6v5: Remove mss supply from 8916
remoteproc: qcom: fix initializers for qcom_mss_reg_res array
remoteproc: Drop firmware_loading_complete
remoteproc: Add RPROC_DELETED state
remoteproc: Move rproc_delete_debug_dir() to rproc_del()
remoteproc: qcom: Add SLPI rproc support to load and boot slpi proc.
...
Pull sound updates from Takashi Iwai:
"Here is the update of sound bits for 4.11: again at this time, no big
changes in ALSA and ASoC core but only cosmetic changes like
consitifaction.
Meanwhile, quite a lot of developments are seen in a few driver side.
ALSA Core:
- Clean up, consitification of some ops
HD-audio:
- A slight behavior change of single_cmd option
- Quirks for AmigaOne X1000, Samsung Ativ Book 8, Dell AiO, ALC221
HP, and fixes for Lewisburg controller
- Realtek ALC299, ALC1220 codecs
Others:
- USB-audio: Tascam US-16x08 DSP mixer quirk
- Intel HDMI LPE audio support for Baytrail / Cherrytrail; this
contains some updates in drm/i915 for the new platform binding
ASoC:
- Lots of updates in Intel drivers, mostly for DisplayPort and HDMI
on Skylake and onwards, as well as more Baytrail / Cherrytrail
boards support
- Channel mapping support for HDMI
- Support for AllWinner A31 and A33, Everest Semiconductor ES8328,
Nuvoton NAU8540.
* tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (323 commits)
ALSA: usb-audio: Tidy up mixer_us16x08.c
ALSA: usb-audio: Fix memory leak and corruption in mixer_us16x08.c
ALSA: usb-audio: purge needless variable length array
ALSA: x86: hdmi: select CONFIG_SND_PCM
ALSA: x86: Don't enable runtime PM as default
ALSA: x86: Use runtime PM autosuspend
ALSA: usb-audio: localize function without external linkage
ALSA: usb-audio: localize one-referrer variable
ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk
ALSA: emu10k1: constify snd_emux_operators structure
ASoC: sun4i-spdif: drop unnessary snd_soc_unregister_component()
ASoC: Intel: bxt: Add jack port initialize in bxt_rt298 machine
ASoC: nau8825: automatic BCLK and LRC divde in master mode
ASoC: hdac_hdmi: Add device id for Geminilake
ASoC: Intel: Skylake: Add Geminlake IDs
ASoC: rt298: Add DMI match for Geminilake reference platform
ASoC: Intel: Skylake: Check device type to get endpoint configuration
ASoC: Intel: bxt: Add jack port initialize in da7219_max98357a machine
ASoC: Intel: Skylake: Add jack port initialize in nau88l25_ssm4567 machine
ASoC: Intel: Skylake: Add jack port initialize in nau88l25_max98357a machine
...
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.11 cycle
Core changes:
- Augment fwnode_get_named_gpiod() to configure the GPIO pin
immediately after requesting it like all other APIs do. This is a
treewide change also updating all users.
- Pass a GPIO label down to gpiod_request() from
fwnode_get_named_gpiod(). This makes debugfs and the userspace ABI
correctly reflect the current in-kernel consumer of a pin taken
using this abstraction. This is a treewide change also updating all
users.
- Rename devm_get_gpiod_from_child() to
devm_fwnode_get_gpiod_from_child() to reflect the fact that this
function is operating on a fwnode object. This is a treewide change
also updating all users.
- Make it possible to take multiple GPIOs in a single hog of device
tree hogs.
- The refactorings switching GPIO chips to use the .set_config()
callback using standard pin control properties and providing a
backend into the pin control subsystem that were also merged into
the pin control tree naturally appear here too.
Testing instrumentation:
- A whole slew of cleanups and improvements to the mockup GPIO
driver. We now have an extended userspace test exercising the
subsystem, and we can inject interrupts etc from userspace to fully
test the core GPIO functionality.
New drivers:
- New driver for the Cortina Systems Gemini GPIO controller.
- New driver for the Exar XR17V352/354/358 chips.
- New driver for the ACCES PCI-IDIO-16 PCI GPIO card.
Driver changes:
- RCAR: set the irqchip parent device, add fine-grained runtime PM
support.
- pca953x: support optional RESET control line on the chip.
- DaVinci: cleanups and simplifications. Add support for multiple
instances.
- .set_multiple() and naming of lines on more or less all of the
ISA/PCI GPIO controllers.
- mcp23s08: refactored to use regmap as a first step to further
rewrites and modernizations"
* tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits)
gpio: reintroduce devm_get_gpiod_from_child()
gpio: pci-idio-16: Fix PCI BAR index
gpio: pci-idio-16: Fix PCI device ID code
gpio: mockup: implement event injecting over debugfs
gpio: mockup: add a dummy irqchip
gpio: mockup: implement naming the lines
gpio: mockup: code shrink
gpio: mockup: readability tweaks
gpio: Add GPIO support for the ACCES PCI-IDIO-16
gpio: Add the devm_fwnode_get_index_gpiod_from_child() helper
gpio: Rename devm_get_gpiod_from_child()
gpio: mcp23s08: Select REGMAP/REGMAP_I2C to fix build error
gpio: ws16c48: Add support for GPIO names
gpio: gpio-mm: Add support for GPIO names
gpio: 104-idio-16: Add support for GPIO names
gpio: 104-idi-48: Add support for GPIO names
gpio: 104-dio-48e: Add support for GPIO names
gpio: ws16c48: Remove unnecessary driver_data set
gpio: gpio-mm: Remove unnecessary driver_data set
gpio: 104-idio-16: Remove unnecessary driver_data set
...
Pull input updates from Dmitry:
- a new driver for Zeitech touchscreen controller
- a new driver for Samsung "touchkeys"
- touchscreen driver for Moorestown platform has been removed because
platform support is gone
- MPU3050 accelerometer driver was removed in favor of IIO driver
- miscellaneous driver cleanup and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits)
Input: zet6223 - export OF device ID as module aliases
Input: tsc2004/5 - switch to using generic device properties
Input: tsc2004/5 - fix regulator handling
Input: tsc2005 - add OF device table
Input: add driver for Zeitec ZET6223
Input: joydev - do not report stale values on first open
Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest
Input: synaptics-rmi4 - clean up F30 implementation
Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons
Input: psmouse - add a custom serio protocol to send extra information
Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts()
Input: xpad - restore LED state after device resume
Input: synaptics-rmi4 - add rmi_find_function()
Input: xpad - fix stuck mode button on Xbox One S pad
Input: joydev - use clamp() macro
Input: refuse to register absolute devices without absinfo
Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs
Input: synaptics-rmi4 - add sysfs attribute update_fw_status
Input: mousedev - stop offering PS/2 to userspace by default
Input: tca8418 - switch to using generic device properties
...
Pull MFD updates from Lee Jones:
"Core Frameworks:
- Add new !TOUCHSCREEN_SUN4I dependency for SUN4I_GPADC
- List include/dt-bindings/mfd/* to files supported in MAINTAINERS
New Drivers:
- Intel Apollo Lake SPI NOR
- ST STM32 Timers (Advanced, Basic and PWM)
- Motorola 6556002 CPCAP (PMIC)
New Device Support:
- Add support for AXP221 to axp20x
- Add support for Intel Gemini Lake to intel-lpss-pci
- Add support for MT6323 LED to mt6397-core
- Add support for COMe-bBD#, COMe-bSL6, COMe-bKL6, COMe-cAL6 and
COMe-cKL6 to kempld-core
New Functionality:
- Add support for Analog CODAC to sun6i-prcm
- Add support for Watchdog to lpc_ich
Fix-ups:
- Error handling improvements; axp288_charger, axp20x, ab8500-sysctrl
- Adapt platform data handling; axp20x
- IRQ handling improvements; arizona, axp20x
- Remove superfluous code; arizona, axp20x, lpc_ich
- Trivial coding style/spelling fixes; axp20x, abx500, mfd.txt
- Regmap fix-ups; axp20x
- DT changes; mfd.txt, aspeed-lpc, aspeed-gfx, ab8500-core, tps65912,
mt6397
- Use new I2C probing mechanism; max77686
- Constification; rk808
Bug Fixes:
- Stop data transfer whilst suspended; cros_ec"
* tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (43 commits)
mfd: lpc_ich: Enable watchdog on Intel Apollo Lake PCH
mfd: lpc_ich: Remove useless comments in core part
mfd: Add support for several boards to Kontron PLD driver
mfd: constify regmap_irq_chip structures
MAINTAINERS: Add include/dt-bindings/mfd to MFD entry
mfd: cpcap: Add minimal support
mfd: mt6397: Add MT6323 LED support into MT6397 driver
Documentation: devicetree: Add LED subnode binding for MT6323 PMIC
mfd: tps65912: Export OF device ID table as module aliases
mfd: ab8500-core: Rename clock device and compatible
mfd: cros_ec: Send correct suspend/resume event to EC
mfd: max77686: Remove I2C device ID table
mfd: max77686: Use the struct i2c_driver .probe_new instead of .probe
mfd: max77686: Use of_device_get_match_data() helper
mfd: max77686: Don't attempt to get i2c_device_id .data
mfd: ab8500-sysctrl: Handle probe deferral
mfd: intel-lpss: Add Intel Gemini Lake PCI IDs
mfd: axp20x: Fix AXP806 access errors on cold boot
mfd: cros_ec: Send suspend state notification to EC
mfd: cros_ec: Prevent data transfer while device is suspended
...
Spoofcheck can't be enabled if VF MAC is zero.
Vice versa, can't zero MAC if spoofcheck is on.
Fixes: 8f7ba3ca12 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge updates from Andrew Morton:
"142 patches:
- DAX updates
- various misc bits
- OCFS2 updates
- most of MM"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits)
mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
mm: fix <linux/pagemap.h> stray kernel-doc notation
zram: remove obsolete sysfs attrs
mm/memblock.c: remove unnecessary log and clean up
oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
mm: drop unused argument of zap_page_range()
mm: drop zap_details::check_swap_entries
mm: drop zap_details::ignore_dirty
mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
mm: consolidate GFP_NOFAIL checks in the allocator slowpath
lib/show_mem.c: teach show_mem to work with the given nodemask
arch, mm: remove arch specific show_mem
mm, page_alloc: warn_alloc print nodemask
mm, page_alloc: do not report all nodes in show_mem
Revert "mm: bail out in shrink_inactive_list()"
mm, vmscan: consider eligible zones in get_scan_count
mm, vmscan: cleanup lru size claculations
mm, vmscan: do not count freed pages as PGDEACTIVATE
...
Pull DeviceTree updates from Rob Herring:
"Pretty standard stuff with dtc upstream sync being the biggest piece.
- Sync dtc to upstream commit 0931cea3ba20. This picks up overlay
support in dtc.
- Set dma_ops for reserved memory users.
- Make references to IOMMU consistent in DT bindings.
- Cleanup references to pm_power_off in bindings.
- Move some display bindings that snuck into the old bindings/video/
path.
- Fix some wrong documentation paths caused from binding
restructuring.
- Vendor prefixes for Faraday and Fujitsu.
- Fix an of_node ref counting leak in of_find_node_opts_by_path
- Introduce new graph helper of_graph_get_remote_node() which will be
used by DRM drivers in 4.12"
* tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits)
DT: add Faraday Tec. as vendor
of: introduce of_graph_get_remote_node
of: Add missing space at end of pr_fmt().
of: make of_device_make_bus_id() static
of: fix of_node leak caused in of_find_node_opts_by_path
dt-bindings: net: remove reference to fixed link support
dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off
dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off
dt-bindings: mfd: as3722: Drop reference to pm_power_off
dt-bindings: display: move ANX7814 and SiI8620 bridge bindings
of/unittest: Swap arguments of of_unittest_apply_overlay()
Documentation: usb: fix wrong documentation paths
serial: fsl-imx-uart.txt: Remove generic property
devicetree: Add Fujitsu Ltd. vendor prefix
Documentation: display: fix wrong documentation paths
of: remove redundant memset in overlay
bus:qcom : Fix typo in qcom,ebi2.txt
dt-bindings: qman: Remove pool channel node
Documentation: panel-dpi: fix path to display-timing.txt
devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP
...
Pull documentation updates from Jonathan Corbet:
"A slightly quieter cycle for documentation this time around.
Three more DocBook template files have been converted to RST; only 21
to go. There are various build improvements and the usual array of
documentation improvements and fixes"
* tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
docs / driver-api: Fix structure references in device_link.rst
PM / docs: Fix structure references in device.rst
Add a target to check broken external links in the Documentation
Documentation: Fix linux-api list typo
Documentation: DocBook/Makefile comment typo
Improve sparse documentation
Documentation: make Makefile.sphinx no-ops quieter
Documentation: DMA-ISA-LPC.txt
Documentation: input: fix path to input code definitions
docs: Remove the copyright year from conf.py
docs: Fix a warning in the Korean HOWTO.rst translation
PM / sleep / docs: Convert PM notifiers document to reST
PM / core / docs: Convert sleep states API document to reST
PM / core: Update kerneldoc comments in pm.h
doc-rst: Fix recursive make invocation from macros
doc-rst: Delete output of failed dot-SVG conversion
doc-rst: Break shell command sequences on failure
Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
doc-rst: fixed cleandoc target when used with O=dir
Documentation/sphinx: prevent generation of .pyc files in the source tree
...
Pull KVM updates from Paolo Bonzini:
"4.11 is going to be a relatively large release for KVM, with a little
over 200 commits and noteworthy changes for most architectures.
ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU
notifiers to support copy-on-write, KSM, idle page tracking,
swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
also supported, so that writes to some memory regions can be
treated as MMIO. The new MMU also paves the way for hardware
virtualization support.
PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
s390:
- expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown
in and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
generic:
- alternative signaling mechanism that doesn't pound on
tsk->sighand->siglock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
x86/paravirt: Change vcp_is_preempted() arg type to long
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
x86/kvm/vmx: Defer TR reload after VM exit
x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
x86/kvm/vmx: Simplify segment_base()
x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
x86/kvm/vmx: Don't fetch the TSS base from the GDT
x86/asm: Define the kernel TSS limit in a macro
kvm: fix page struct leak in handle_vmon
KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
KVM: Return an error code only as a constant in kvm_get_dirty_log()
KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
KVM: x86: remove code for lazy FPU handling
KVM: race-free exit from KVM_RUN without POSIX signals
KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
KVM: Support vCPU-based gfn->hva cache
KVM: use separate generations for each address space
...
Pull xfs updates from Darrick Wong:
"Here are the XFS changes for 4.11. We aren't introducing any major
features in this release cycle except for this being the first merge
window I've managed on my own. :)
Changes since last update:
- Various cleanups
- Livelock fixes for eofblocks scanning
- Improved input verification for on-disk metadata
- Fix races in the copy on write remap mechanism
- Fix buffer io error timeout controls
- Streamlining of directio copy on write
- Asynchronous discard support
- Fix asserts when splitting delalloc reservations
- Don't bloat bmbt when right shifting extents
- Inode alignment fixes for 32k block sizes"
* tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits)
xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG
xfs: simplify xfs_rtallocate_extent
xfs: tune down agno asserts in the bmap code
xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
xfs: don't reserve blocks for right shift transactions
xfs: fix len comparison in xfs_extent_busy_trim
xfs: fix uninitialized variable in _reflink_convert_cow
xfs: split indlen reservations fairly when under reserved
xfs: handle indlen shortage on delalloc extent merge
xfs: resurrect debug mode drop buffered writes mechanism
xfs: clear delalloc and cache on buffered write failure
xfs: don't block the log commit handler for discards
xfs: improve busy extent sorting
xfs: improve handling of busy extents in the low-level allocator
xfs: don't fail xfs_extent_busy allocation
xfs: correct null checks and error processing in xfs_initialize_perag
xfs: update ctime and mtime on clone destinatation inodes
xfs: allocate direct I/O COW blocks in iomap_begin
xfs: go straight to real allocations for direct I/O COW writes
xfs: return the converted extent in __xfs_reflink_convert_cow
...
Pull printk updates from Petr Mladek:
- Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
Rostedt as the printk reviewer. This idea came up after the
discussion about printk issues at Kernel Summit. It was formulated
and discussed at lkml[1].
- Extend a lock-less NMI per-cpu buffers idea to handle recursive
printk() calls by Sergey Senozhatsky[2]. It is the first step in
sanitizing printk as discussed at Kernel Summit.
The change allows to see messages that would normally get ignored or
would cause a deadlock.
Also it allows to enable lockdep in printk(). This already paid off.
The testing in linux-next helped to discover two old problems that
were hidden before[3][4].
- Remove unused parameter by Sergey Senozhatsky. Clean up after a past
change.
[1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
[2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
[3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
[4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: drop call_console_drivers() unused param
printk: convert the rest to printk-safe
printk: remove zap_locks() function
printk: use printk_safe buffers in printk
printk: report lost messages in printk safe/nmi contexts
printk: always use deferred printk when flush printk_safe lines
printk: introduce per-cpu safe_print seq buffer
printk: rename nmi.c and exported api
printk: use vprintk_func in vprintk()
MAINTAINERS: Add printk maintainers
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 4.11 merge window:
- A few small code cleanups
- Add modules git tree url to MAINTAINERS"
* tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
MAINTAINERS: add tree for modules
module: fix memory leak on early load_module() failures
module: Optimize search_module_extables()
modules: mark __inittest/__exittest as __maybe_unused
livepatch/module: print notice of TAINT_LIVEPATCH
module: Drop redundant declaration of struct module
show_mem() allows to filter out node specific data which is irrelevant
to the allocation request via SHOW_MEM_FILTER_NODES. The filtering is
done in skip_free_areas_node which skips all nodes which are not in the
mems_allowed of the current process. This works most of the time as
expected because the nodemask shouldn't be outside of the allocating
task but there are some exceptions. E.g. memory hotplug might want to
request allocations from outside of the allowed nodes (see
new_node_page).
Get rid of this hardcoded behavior and push the allocation mask down the
show_mem path and use it instead of cpuset_current_mems_allowed. NULL
nodemask is interpreted as cpuset_current_mems_allowed.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>