writeback, blkio: add documentation for cgroup writeback support

Update Documentation/cgroups/blkio-controller.txt to reflect the
recently added cgroup writeback support.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: cgroups@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-16 18:48:32 -04:00
parent 46b15caa7c
commit 3e1534cf4a
1 changed files with 78 additions and 5 deletions
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -387,8 +387,81 @@ groups and put applications in that group which are not driving enough
 IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
 on individual groups and throughput should improve.
-What works
-==========
- Currently only sync IO queues are support. All the buffered writes are
-  still system wide and not per group. Hence we will not see service
-  differentiation between buffered writes between groups.
+Writeback
+=========
+
+Page cache is dirtied through buffered writes and shared mmaps and
+written asynchronously to the backing filesystem by the writeback
 mechanism.  Writeback sits between the memory and IO domains and
 regulates the proportion of dirty memory by balancing dirtying and
 write IOs.

 On traditional cgroup hierarchies, relationships between different
 controllers cannot be established, making it impossible for writeback
 to take cgroup resource restrictions into account.  As a result, all
 writeback IOs are attributed to the root cgroup.

 If both the blkio and memory controllers are used on the v2 hierarchy
 and the filesystem supports cgroup writeback, writeback operations
 correctly follow the resource restrictions imposed by both memory and
 blkio controllers.

 Writeback examines both system-wide and per-cgroup dirty memory status
 and enforces the more restrictive of the two.  Also, writeback control
 parameters which are absolute values - vm.dirty_bytes and
 vm.dirty_background_bytes - are distributed across cgroups according
 to their current writeback bandwidth.
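The two rules in the paragraph above can be illustrated with a minimal userspace sketch. This is not kernel code: the helper names cgroup_dirty_share() and effective_dirty_limit() are hypothetical, and the plain proportional split is an assumption made for illustration; the kernel's actual bookkeeping is more involved.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split an absolute limit such as vm.dirty_bytes
 * in proportion to a cgroup's share of the total writeback bandwidth.
 * Assumes limit_bytes * cgroup_bw fits in 64 bits.
 */
uint64_t cgroup_dirty_share(uint64_t limit_bytes, uint64_t cgroup_bw,
			    uint64_t total_bw)
{
	if (total_bw == 0)
		return limit_bytes;	/* no bandwidth data: arbitrary choice for this sketch */
	return limit_bytes * cgroup_bw / total_bw;
}

/* Hypothetical helper: the more restrictive of the two limits wins. */
uint64_t effective_dirty_limit(uint64_t system_limit, uint64_t cgroup_limit)
{
	return system_limit < cgroup_limit ? system_limit : cgroup_limit;
}
```

Under this assumption, a 256 MiB limit shared by two cgroups whose writeback bandwidths stand in a 30:10 ratio splits into 192 MiB and 64 MiB.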
 There's a peculiarity stemming from the discrepancy in ownership
 granularity between memory controller and writeback.  While memory
 controller tracks ownership per page, writeback operates on a
 per-inode basis.  cgroup writeback bridges the gap by tracking
 ownership by inode but migrating ownership if too many foreign pages,
 i.e. pages which don't match the current inode ownership, have been
 encountered while writing back the inode.
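The ownership migration described above can be sketched in userspace as follows. This is a deliberately simplified model, not the kernel's heuristic: struct wb_session and its helpers are hypothetical names, the "more than half of the pages" threshold is an assumption, and the new owner is picked with a simple majority vote over the foreign pages seen in one writeback session.

```c
#include <stddef.h>

/* Hypothetical per-inode state for one writeback session. */
struct wb_session {
	int owner;		/* cgroup id currently owning the inode */
	int candidate;		/* most likely dominant foreign cgroup */
	size_t votes;		/* majority-vote counter for candidate */
	size_t foreign_pages;	/* pages not owned by 'owner' */
	size_t total_pages;	/* all pages written in this session */
};

void wb_session_start(struct wb_session *s, int owner)
{
	s->owner = owner;
	s->candidate = owner;
	s->votes = 0;
	s->foreign_pages = 0;
	s->total_pages = 0;
}

/* Record one page written back, tagged with the cgroup that owns it. */
void wb_session_account_page(struct wb_session *s, int page_cgroup)
{
	s->total_pages++;
	if (page_cgroup == s->owner)
		return;

	s->foreign_pages++;
	if (s->votes == 0) {
		s->candidate = page_cgroup;
		s->votes = 1;
	} else if (s->candidate == page_cgroup) {
		s->votes++;
	} else {
		s->votes--;
	}
}

/*
 * End of session: migrate ownership if foreign pages made up more than
 * half of the session (assumed threshold).  Returns the resulting owner.
 */
int wb_session_finish(struct wb_session *s)
{
	if (s->foreign_pages * 2 > s->total_pages)
		s->owner = s->candidate;
	return s->owner;
}
```

In this model, an inode owned by cgroup 1 whose session writes one page from cgroup 1 and three from cgroup 2 ends up owned by cgroup 2, while a session with a single foreign page out of three leaves the ownership unchanged.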
 This is a conscious design choice as writeback operations are
 inherently tied to inodes, making strictly following page ownership
 complicated and inefficient.  The only use case which suffers from
 this compromise is multiple cgroups concurrently dirtying disjoint
 regions of the same inode, which is considered unlikely and is
 deliberately left unsupported.  Note that as memory controller assigns
 page ownership on the first use and doesn't update it until the page
 is released, even if cgroup writeback strictly followed page
 ownership, multiple cgroups dirtying overlapping areas wouldn't work
 as expected.  In general, write-sharing an inode across multiple
 cgroups is not well supported.

 Filesystem support for cgroup writeback
 ---------------------------------------

 A filesystem can make writeback IOs cgroup-aware by updating
 address_space_operations->writepage[s]() to annotate bios using the
 following two functions.

 * wbc_init_bio(@wbc, @bio)

  Should be called for each bio carrying writeback data and associates
  the bio with the inode's owner cgroup.  Can be called anytime
  between bio allocation and submission.

 * wbc_account_io(@wbc, @page, @bytes)

  Should be called for each data segment being written out.  While
  this function doesn't care exactly when it's called during the
  writeback session, it's the easiest and most natural to call it as
  data segments are added to a bio.
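The call pattern for the two functions above can be sketched as a userspace mock-up. Everything here is a stand-in: the struct definitions and their fields are hypothetical, the two wbc_*() functions are stubs that only mirror the argument order given above, and myfs_writepages_sketch() is an invented name for a filesystem's writeback path. Real kernel code would use the kernel's own types and bio helpers.

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types; all fields are hypothetical. */
struct page { int id; };
struct writeback_control { int owner_cgroup; size_t accounted_bytes; };
struct bio { int cgroup; size_t size; };

/* Stub: associate the bio with the inode's owner cgroup. */
void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
{
	bio->cgroup = wbc->owner_cgroup;
}

/* Stub: account one data segment against the writeback session. */
void wbc_account_io(struct writeback_control *wbc, struct page *page,
		    size_t bytes)
{
	(void)page;
	wbc->accounted_bytes += bytes;
}

/*
 * Hypothetical writeback path: annotate the bio once, then account
 * each segment as it is added to the bio.  Returns the bio size.
 */
size_t myfs_writepages_sketch(struct writeback_control *wbc, struct bio *bio,
			      struct page *pages, size_t nr_pages,
			      size_t page_size)
{
	size_t i;

	bio->size = 0;
	wbc_init_bio(wbc, bio);		/* any time before submission */

	for (i = 0; i < nr_pages; i++) {
		bio->size += page_size;	/* stands in for adding the page */
		wbc_account_io(wbc, &pages[i], page_size);
	}
	return bio->size;
}
```

Accounting inside the loop that fills the bio keeps the accounted bytes equal to the bio size without a second pass over the pages.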
 With writeback bios annotated, cgroup support can be enabled per
 super_block by setting MS_CGROUPWB in ->s_flags.  This allows for
 selective disabling of cgroup writeback support, which is helpful when
 certain filesystem features, e.g. journaled data mode, are
 incompatible.

 wbc_init_bio() binds the specified bio to its cgroup.  Depending on
 the configuration, the bio may be executed at a lower priority, and if
 the writeback session is holding shared resources, e.g. a journal
 entry, this may lead to priority inversion.  There is no one easy
 solution for the problem.  Filesystems can try to work around specific
 problem cases by skipping wbc_init_bio() or using
 bio_associate_blkcg() directly.