kernel: res_counter: remove the unused API
All memory accounting and limiting has been switched over to the lockless page counters. Bye, res_counter! [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt] [mhocko@suse.cz: ditch the last remainings of res_counter] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:

committed by
Linus Torvalds

parent
71f87bee38
commit
5b1efc027c
@@ -116,16 +116,16 @@ The memory controller is the first controller developed.
|
||||
|
||||
2.1. Design
|
||||
|
||||
The core of the design is a counter called the res_counter. The res_counter
|
||||
tracks the current memory usage and limit of the group of processes associated
|
||||
with the controller. Each cgroup has a memory controller specific data
|
||||
structure (mem_cgroup) associated with it.
|
||||
The core of the design is a counter called the page_counter. The
|
||||
page_counter tracks the current memory usage and limit of the group of
|
||||
processes associated with the controller. Each cgroup has a memory controller
|
||||
specific data structure (mem_cgroup) associated with it.
|
||||
|
||||
2.2. Accounting
|
||||
|
||||
+--------------------+
|
||||
| mem_cgroup |
|
||||
| (res_counter) |
|
||||
| mem_cgroup |
|
||||
| (page_counter) |
|
||||
+--------------------+
|
||||
/ ^ \
|
||||
/ | \
|
||||
@@ -352,9 +352,8 @@ set:
|
||||
0. Configuration
|
||||
|
||||
a. Enable CONFIG_CGROUPS
|
||||
b. Enable CONFIG_RESOURCE_COUNTERS
|
||||
c. Enable CONFIG_MEMCG
|
||||
d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
|
||||
b. Enable CONFIG_MEMCG
|
||||
c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
|
||||
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
|
||||
|
||||
1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
|
||||
|
@@ -1,197 +0,0 @@
|
||||
|
||||
The Resource Counter
|
||||
|
||||
The resource counter, declared at include/linux/res_counter.h,
|
||||
is supposed to facilitate the resource management by controllers
|
||||
by providing common stuff for accounting.
|
||||
|
||||
This "stuff" includes the res_counter structure and routines
|
||||
to work with it.
|
||||
|
||||
|
||||
|
||||
1. Crucial parts of the res_counter structure
|
||||
|
||||
a. unsigned long long usage
|
||||
|
||||
The usage value shows the amount of a resource that is consumed
|
||||
by a group at a given time. The units of measurement should be
|
||||
determined by the controller that uses this counter. E.g. it can
|
||||
be bytes, items or any other unit the controller operates on.
|
||||
|
||||
b. unsigned long long max_usage
|
||||
|
||||
The maximal value of the usage over time.
|
||||
|
||||
This value is useful when gathering statistical information about
|
||||
the particular group, as it shows the actual resource requirements
|
||||
for a particular group, not just some usage snapshot.
|
||||
|
||||
c. unsigned long long limit
|
||||
|
||||
The maximal allowed amount of resource to consume by the group. In
|
||||
case the group requests for more resources, so that the usage value
|
||||
would exceed the limit, the resource allocation is rejected (see
|
||||
the next section).
|
||||
|
||||
d. unsigned long long failcnt
|
||||
|
||||
The failcnt stands for "failures counter". This is the number of
|
||||
resource allocation attempts that failed.
|
||||
|
||||
c. spinlock_t lock
|
||||
|
||||
Protects changes of the above values.
|
||||
|
||||
|
||||
|
||||
2. Basic accounting routines
|
||||
|
||||
a. void res_counter_init(struct res_counter *rc,
|
||||
struct res_counter *rc_parent)
|
||||
|
||||
Initializes the resource counter. As usual, should be the first
|
||||
routine called for a new counter.
|
||||
|
||||
The struct res_counter *parent can be used to define a hierarchical
|
||||
child -> parent relationship directly in the res_counter structure,
|
||||
NULL can be used to define no relationship.
|
||||
|
||||
c. int res_counter_charge(struct res_counter *rc, unsigned long val,
|
||||
struct res_counter **limit_fail_at)
|
||||
|
||||
When a resource is about to be allocated it has to be accounted
|
||||
with the appropriate resource counter (controller should determine
|
||||
which one to use on its own). This operation is called "charging".
|
||||
|
||||
This is not very important which operation - resource allocation
|
||||
or charging - is performed first, but
|
||||
* if the allocation is performed first, this may create a
|
||||
temporary resource over-usage by the time resource counter is
|
||||
charged;
|
||||
* if the charging is performed first, then it should be uncharged
|
||||
on error path (if the one is called).
|
||||
|
||||
If the charging fails and a hierarchical dependency exists, the
|
||||
limit_fail_at parameter is set to the particular res_counter element
|
||||
where the charging failed.
|
||||
|
||||
d. u64 res_counter_uncharge(struct res_counter *rc, unsigned long val)
|
||||
|
||||
When a resource is released (freed) it should be de-accounted
|
||||
from the resource counter it was accounted to. This is called
|
||||
"uncharging". The return value of this function indicate the amount
|
||||
of charges still present in the counter.
|
||||
|
||||
The _locked routines imply that the res_counter->lock is taken.
|
||||
|
||||
e. u64 res_counter_uncharge_until
|
||||
(struct res_counter *rc, struct res_counter *top,
|
||||
unsigned long val)
|
||||
|
||||
Almost same as res_counter_uncharge() but propagation of uncharge
|
||||
stops when rc == top. This is useful when kill a res_counter in
|
||||
child cgroup.
|
||||
|
||||
2.1 Other accounting routines
|
||||
|
||||
There are more routines that may help you with common needs, like
|
||||
checking whether the limit is reached or resetting the max_usage
|
||||
value. They are all declared in include/linux/res_counter.h.
|
||||
|
||||
|
||||
|
||||
3. Analyzing the resource counter registrations
|
||||
|
||||
a. If the failcnt value constantly grows, this means that the counter's
|
||||
limit is too tight. Either the group is misbehaving and consumes too
|
||||
many resources, or the configuration is not suitable for the group
|
||||
and the limit should be increased.
|
||||
|
||||
b. The max_usage value can be used to quickly tune the group. One may
|
||||
set the limits to maximal values and either load the container with
|
||||
a common pattern or leave one for a while. After this the max_usage
|
||||
value shows the amount of memory the container would require during
|
||||
its common activity.
|
||||
|
||||
Setting the limit a bit above this value gives a pretty good
|
||||
configuration that works in most of the cases.
|
||||
|
||||
c. If the max_usage is much less than the limit, but the failcnt value
|
||||
is growing, then the group tries to allocate a big chunk of resource
|
||||
at once.
|
||||
|
||||
d. If the max_usage is much less than the limit, but the failcnt value
|
||||
is 0, then this group is given too high limit, that it does not
|
||||
require. It is better to lower the limit a bit leaving more resource
|
||||
for other groups.
|
||||
|
||||
|
||||
|
||||
4. Communication with the control groups subsystem (cgroups)
|
||||
|
||||
All the resource controllers that are using cgroups and resource counters
|
||||
should provide files (in the cgroup filesystem) to work with the resource
|
||||
counter fields. They are recommended to adhere to the following rules:
|
||||
|
||||
a. File names
|
||||
|
||||
Field name File name
|
||||
---------------------------------------------------
|
||||
usage usage_in_<unit_of_measurement>
|
||||
max_usage max_usage_in_<unit_of_measurement>
|
||||
limit limit_in_<unit_of_measurement>
|
||||
failcnt failcnt
|
||||
lock no file :)
|
||||
|
||||
b. Reading from file should show the corresponding field value in the
|
||||
appropriate format.
|
||||
|
||||
c. Writing to file
|
||||
|
||||
Field Expected behavior
|
||||
----------------------------------
|
||||
usage prohibited
|
||||
max_usage reset to usage
|
||||
limit set the limit
|
||||
failcnt reset to zero
|
||||
|
||||
|
||||
|
||||
5. Usage example
|
||||
|
||||
a. Declare a task group (take a look at cgroups subsystem for this) and
|
||||
fold a res_counter into it
|
||||
|
||||
struct my_group {
|
||||
struct res_counter res;
|
||||
|
||||
<other fields>
|
||||
}
|
||||
|
||||
b. Put hooks in resource allocation/release paths
|
||||
|
||||
int alloc_something(...)
|
||||
{
|
||||
if (res_counter_charge(res_counter_ptr, amount) < 0)
|
||||
return -ENOMEM;
|
||||
|
||||
<allocate the resource and return to the caller>
|
||||
}
|
||||
|
||||
void release_something(...)
|
||||
{
|
||||
res_counter_uncharge(res_counter_ptr, amount);
|
||||
|
||||
<release the resource>
|
||||
}
|
||||
|
||||
In order to keep the usage value self-consistent, both the
|
||||
"res_counter_ptr" and the "amount" in release_something() should be
|
||||
the same as they were in the alloc_something() when the releasing
|
||||
resource was allocated.
|
||||
|
||||
c. Provide the way to read res_counter values and set them (the cgroups
|
||||
still can help with it).
|
||||
|
||||
c. Compile and run :)
|
Reference in New Issue
Block a user