
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.

- Background

The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster, so we are trying to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd (low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements, but those performance improvements will vary based on the memory requirements of individual workloads.

- Problem

Naturally, cached apps were the dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidates for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed (we measured performance swapping out vs. zapping the memory by killing a process; unsurprisingly, zapping is 10x faster even though we use zram, which is much faster than real storage), so a kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap.

- Approach

The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel's LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could give the platform many more opportunities to use its knowledge of the workload to optimize memory efficiency.

To achieve this, the patchset introduces two new options for madvise. One is MADV_COLD, which will deactivate active pages, and the other is MADV_PAGEOUT, which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in that it hints to the kernel that the memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in that it hints to the kernel that the memory region is not currently needed and should be reclaimed when memory pressure rises.
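As a rough illustration of the intended userspace usage (a sketch only, not part of this series' diffs): the program below maps an anonymous region, marks it cold, and then asks for immediate reclaim. The fallback #defines are for illustration; MADV_COLD's value (20) is the one added by this patch, while MADV_PAGEOUT and its value are only introduced by a later patch in the series, and on kernels without these hints both madvise() calls fail with EINVAL.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLD
#define MADV_COLD	20	/* deactivate these pages (this patch) */
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT	21	/* added by a later patch in this series */
#endif

int main(void)
{
	size_t len = 64UL << 20;	/* arbitrary 64MB scratch region */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0xaa, len);		/* populate the pages */

	/* Age the range: deactivate it, but keep the contents intact. */
	if (madvise(buf, len, MADV_COLD))
		perror("madvise(MADV_COLD)");

	/* Reclaim it right away: the data is swapped out, not discarded. */
	if (madvise(buf, len, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	munmap(buf, len);
	return 0;
}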
This patch (of 5):

When a process expects no accesses to a certain memory range, it could give a hint to the kernel that the pages can be reclaimed when memory pressure happens, but the data should be preserved for future use. This could reduce working set eviction, so it ends up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help the kernel in deciding which pages to evict early during memory pressure. It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the head of the inactive file LRU, because MADV_COLD has slightly different semantics. MADV_FREE means it's okay to discard the page under memory pressure because the content of the page is *garbage*, so freeing such pages has almost zero overhead: we don't need to swap them out, and a later access causes just a minor fault. Thus, it makes sense to put those freeable pages on the inactive file LRU to compete with other used-once pages. It also makes sense from an implementation point of view, because such a page is no longer swap-backed until it is re-dirtied. It even gives a bonus in that they can be reclaimed on a swapless system.

However, MADV_COLD doesn't mean garbage, so reclaiming those pages requires a swap-out (and a swap-in on later access) in the end, which is a bigger cost. Since VM LRU aging is designed around a cost model, anonymous cold pages are better positioned on the inactive anon LRU list, not the file LRU. Furthermore, this helps avoid unnecessary scanning if the system doesn't have a swap device. Let's start with this simpler way, without adding complexity at this moment. However, keep in mind the caveat that workloads with a lot of page cache are likely to effectively ignore MADV_COLD on anonymous memory, because we rarely age the anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

	Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages.

	MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
 * This file is subject to the terms and conditions of the GNU General Public
 * License. See the file "COPYING" in the main directory of this archive
 * for more details.
 *
 * Copyright (C) 1995, 1999, 2002 by Ralf Baechle
 */
#ifndef _ASM_MMAN_H
#define _ASM_MMAN_H

/*
 * Protections are chosen from these bits, OR'd together. The
 * implementation does not necessarily support PROT_EXEC or PROT_WRITE
 * without PROT_READ. The only guarantees are that no writing will be
 * allowed without PROT_WRITE and no access will be allowed for PROT_NONE.
 */
#define PROT_NONE	0x00		/* page can not be accessed */
#define PROT_READ	0x01		/* page can be read */
#define PROT_WRITE	0x02		/* page can be written */
#define PROT_EXEC	0x04		/* page can be executed */
/*			0x08		reserved for PROT_EXEC_NOFLUSH */
#define PROT_SEM	0x10		/* page may be used for atomic ops */
#define PROT_GROWSDOWN	0x01000000	/* mprotect flag: extend change to start of growsdown vma */
#define PROT_GROWSUP	0x02000000	/* mprotect flag: extend change to end of growsup vma */

/*
 * Flags for mmap
 */
/* 0x01 - 0x03 are defined in linux/mman.h */
#define MAP_TYPE	0x00f		/* Mask for type of mapping */
#define MAP_FIXED	0x010		/* Interpret addr exactly */

/* not used by linux, but here to make sure we don't clash with ABI defines */
#define MAP_RENAME	0x020		/* Assign page to file */
#define MAP_AUTOGROW	0x040		/* File may grow by writing */
#define MAP_LOCAL	0x080		/* Copy on fork/sproc */
#define MAP_AUTORSRV	0x100		/* Logical swap reserved on demand */

/* These are linux-specific */
#define MAP_NORESERVE	0x0400		/* don't check for reservations */
#define MAP_ANONYMOUS	0x0800		/* don't use a file */
#define MAP_GROWSDOWN	0x1000		/* stack-like segment */
#define MAP_DENYWRITE	0x2000		/* ETXTBSY */
#define MAP_EXECUTABLE	0x4000		/* mark it as an executable */
#define MAP_LOCKED	0x8000		/* pages are locked */
#define MAP_POPULATE	0x10000		/* populate (prefault) pagetables */
#define MAP_NONBLOCK	0x20000		/* do not block on IO */
#define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB	0x80000		/* create a huge page mapping */
#define MAP_FIXED_NOREPLACE 0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */

/*
 * Flags for msync
 */
#define MS_ASYNC	0x0001		/* sync memory asynchronously */
#define MS_INVALIDATE	0x0002		/* invalidate mappings & caches */
#define MS_SYNC		0x0004		/* synchronous memory sync */

/*
 * Flags for mlockall
 */
#define MCL_CURRENT	1		/* lock all current mappings */
#define MCL_FUTURE	2		/* lock all future mappings */
#define MCL_ONFAULT	4		/* lock all pages that are faulted in */

/*
 * Flags for mlock
 */
#define MLOCK_ONFAULT	0x01		/* Lock pages in range after they are
					   faulted in, do not prefault */

#define MADV_NORMAL	0		/* no further special treatment */
#define MADV_RANDOM	1		/* expect random page references */
#define MADV_SEQUENTIAL	2		/* expect sequential page references */
#define MADV_WILLNEED	3		/* will need these pages */
#define MADV_DONTNEED	4		/* don't need these pages */

/* common parameters: try to keep these consistent across architectures */
#define MADV_FREE	8		/* free pages only if memory pressure */
#define MADV_REMOVE	9		/* remove these pages & resources */
#define MADV_DONTFORK	10		/* don't inherit across fork */
#define MADV_DOFORK	11		/* do inherit across fork */

#define MADV_MERGEABLE	12		/* KSM may merge identical pages */
#define MADV_UNMERGEABLE 13		/* KSM may not merge identical pages */
#define MADV_HWPOISON	100		/* poison a page for testing */

#define MADV_HUGEPAGE	14		/* Worth backing with hugepages */
#define MADV_NOHUGEPAGE	15		/* Not worth backing with hugepages */

#define MADV_DONTDUMP	16		/* Explicity exclude from the core dump,
					   overrides the coredump filter bits */
#define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */

#define MADV_WIPEONFORK	18		/* Zero memory on fork, child only */
#define MADV_KEEPONFORK	19		/* Undo MADV_WIPEONFORK */

#define MADV_COLD	20		/* deactivate these pages */

/* compatibility flags */
#define MAP_FILE	0

#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
				 PKEY_DISABLE_WRITE)

#endif /* _ASM_MMAN_H */
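Since the man-page text above only says "since Linux x.x", portable userspace has to discover support at run time: kernels without this patch reject the new advice value with EINVAL. Below is a minimal probe sketch under that assumption; the helper name madv_cold_supported is chosen only for illustration and is not part of this patch.

#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD	20	/* matches the definition added above */
#endif

/*
 * Returns 1 if the running kernel accepts MADV_COLD, 0 if it rejects it
 * with EINVAL (pre-MADV_COLD kernel), and -1 on unrelated errors.
 */
static int madv_cold_supported(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret;

	if (p == MAP_FAILED)
		return -1;
	ret = madvise(p, page, MADV_COLD);	/* probe on a throwaway mapping */
	munmap(p, page);
	if (ret == 0)
		return 1;
	return (errno == EINVAL) ? 0 : -1;
}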