memcg: introduce memory.min

Memory controller implements the memory.low best-effort memory
protection mechanism, which works perfectly in many cases and allows
protecting working sets of important workloads from sudden reclaim.

But its semantics has a significant limitation: it works only as long as
there is a supply of reclaimable memory.  This makes it pretty useless
against any sort of slow memory leaks or memory usage increases.  This
is especially true for swapless systems.  If swap is enabled, memory
soft protection effectively postpones problems, allowing a leaking
application to fill all swap area, which makes no sense.  The only
effective way to guarantee the memory protection in this case is to
invoke the OOM killer.

It's possible to handle this case in userspace by reacting on MEMCG_LOW
events; but there is still a place for a fail-safe in-kernel mechanism
to provide stronger guarantees.

This patch introduces the memory.min interface for cgroup v2 memory
controller.  It works very similarly to memory.low (sharing the same
hierarchical behavior), except that it's not disabled if there is no
more reclaimable memory in the system.

If cgroup is not populated, its memory.min is ignored, because otherwise
even the OOM killer wouldn't be able to reclaim the protected memory,
and the system can stall.

[guro@fb.com: s/low/min/ in docs]
Link: http://lkml.kernel.org/r/20180510130758.GA9129@castle.DHCP.thefacebook.com
Link: http://lkml.kernel.org/r/20180509180734.GA4856@castle.DHCP.thefacebook.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Roman Gushchin
2018-06-07 17:07:46 -07:00
committed by Linus Torvalds
parent fb52bbaee5
commit bf8d5d52ff
6 changed files with 202 additions and 50 deletions

View File

@@ -1001,6 +1001,29 @@ PAGE_SIZE multiple when read back.
The total amount of memory currently being used by the cgroup
and its descendants.
memory.min
A read-write single value file which exists on non-root
cgroups. The default is "0".
Hard memory protection. If the memory usage of a cgroup
is within its effective min boundary, the cgroup's memory
won't be reclaimed under any conditions. If there is no
unprotected reclaimable memory available, OOM killer
is invoked.
Effective min boundary is limited by memory.min values of
all ancestor cgroups. If there is memory.min overcommitment
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to its
actual memory usage below memory.min.
Putting more memory than generally available under this
protection is discouraged and may lead to constant OOMs.
If a memory cgroup is not populated with processes,
its memory.min is ignored.
memory.low
A read-write single value file which exists on non-root
cgroups. The default is "0".
@@ -1012,9 +1035,9 @@ PAGE_SIZE multiple when read back.
Effective low boundary is limited by memory.low values of
all ancestor cgroups. If there is memory.low overcommitment
(child cgroup or cgroups are requiring more protected memory,
(child cgroup or cgroups are requiring more protected memory
than parent will allow), then each child cgroup will get
the part of parent's protection proportional to the its
the part of parent's protection proportional to its
actual memory usage below memory.low.
Putting more memory than generally available under this