.. SPDX-License-Identifier: GPL-2.0

=====================
Fake NUMA For CPUSets
=====================

:Author: David Rientjes <[email protected]>

Using numa=fake and CPUSets for Resource Management

This document describes how the numa=fake x86_64 command-line option can be used
in conjunction with cpusets for coarse memory management. Using this feature,
you can create fake NUMA nodes that represent contiguous chunks of memory and
assign them to cpusets and their attached tasks. This is a way of limiting the
amount of system memory that is available to a certain class of tasks.

For more information on the features of cpusets, see
Documentation/admin-guide/cgroup-v1/cpusets.rst.

There are a number of different configurations you can use for your needs. For
more information on the numa=fake command-line option and its various ways of
configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.

For the purposes of this introduction, we'll assume a very primitive NUMA
emulation setup of "numa=fake=4*512,". This will split our system memory into
four equal chunks of 512M each that we can then assign to cpusets. As you
become more familiar with using this combination for resource control, you'll
determine a better setup to minimize the number of nodes you have to deal with.

A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::

        Faking node 0 at 0000000000000000-0000000020000000 (512MB)
        Faking node 1 at 0000000020000000-0000000040000000 (512MB)
        Faking node 2 at 0000000040000000-0000000060000000 (512MB)
        Faking node 3 at 0000000060000000-0000000080000000 (512MB)
        ...
        On node 0 totalpages: 130975
        On node 1 totalpages: 131072
        On node 2 totalpages: 131072
        On node 3 totalpages: 131072
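
The address ranges and page counts above follow from simple arithmetic. As a
rough sketch, the shell function below recomputes them for an even split that
starts at address 0. The name ``fake_node_ranges`` is hypothetical, and the
4K page size is an assumption. The sketch ignores memory holes and reserved
regions, which is presumably why node 0 reports slightly fewer pages than the
other nodes.

```shell
#!/bin/bash
# Hypothetical helper (not part of the kernel or any tool): print the address
# range each fake node would cover for an even "numa=fake=<count>*<size_mb>"
# split starting at address 0. Assumes 4K pages.
fake_node_ranges() {
    local count=$1 size_mb=$2
    local size_bytes=$(( size_mb * 1024 * 1024 ))
    local node start end
    for (( node = 0; node < count; node++ )); do
        start=$(( node * size_bytes ))
        end=$(( start + size_bytes ))
        printf 'node %d: %016x-%016x (%dMB), %d 4K pages\n' \
            "$node" "$start" "$end" "$size_mb" $(( size_bytes / 4096 ))
    done
}

fake_node_ranges 4 512
```

For 4 nodes of 512M each, the printed ranges match the dmesg lines above, with
131072 pages per node.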

Now, following the instructions for mounting the cpusets filesystem from
Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes
(i.e. contiguous memory address spaces) to individual cpusets::

        [root@xroads /]# mkdir exampleset
        [root@xroads /]# mount -t cpuset none exampleset
        [root@xroads /]# mkdir exampleset/ddset
        [root@xroads /]# cd exampleset/ddset
        [root@xroads /exampleset/ddset]# echo 0-1 > cpus
        [root@xroads /exampleset/ddset]# echo 0-1 > mems

Now this cpuset, 'ddset', will only be allowed access to fake nodes 0 and 1 for
memory allocations (1G).
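
If you create cpusets often, the steps above can be wrapped in a small shell
function. This is only a sketch: ``make_cpuset`` is a hypothetical helper
name. It assumes the cpuset filesystem is already mounted at the directory
passed as the first argument, and that the control files are named ``cpus``
and ``mems`` as in this document.

```shell
#!/bin/bash
# Hypothetical helper: create a child cpuset under an already-mounted cpuset
# filesystem and assign it CPUs and memory nodes.
#   $1 = mount point, $2 = cpuset name, $3 = CPU list, $4 = memory node list
make_cpuset() {
    local root=$1 name=$2 cpus=$3 mems=$4
    mkdir -p "$root/$name" || return 1
    echo "$cpus" > "$root/$name/cpus" || return 1
    echo "$mems" > "$root/$name/mems" || return 1
    # Read the values back so the caller can see what was actually stored.
    printf '%s: cpus=%s mems=%s\n' "$name" \
        "$(cat "$root/$name/cpus")" "$(cat "$root/$name/mems")"
}
```

With the mount from the example above, ``make_cpuset /exampleset ddset 0-1 0-1``
would perform the same assignment as the manual steps.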

You can now assign tasks to these cpusets to limit the memory resources
available to them according to the fake nodes assigned as mems::

        [root@xroads /exampleset/ddset]# echo $$ > tasks
        [root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G
        [1] 13425
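
One way to confirm that a task really is confined to the intended nodes is to
read the ``Mems_allowed_list`` field of its ``/proc/<pid>/status`` file. The
sketch below does this. ``mems_allowed`` is a hypothetical helper name, and
the optional argument exists only so the function can be pointed at another
task's status file.

```shell
#!/bin/bash
# Hypothetical helper: print the memory nodes a task may allocate from, parsed
# from a /proc/<pid>/status-style file. With no argument it reads
# /proc/self/status, which resolves to the awk child process; that child
# inherits the cpuset of the calling shell.
mems_allowed() {
    local status_file=${1:-/proc/self/status}
    awk -F':[ \t]*' '$1 == "Mems_allowed_list" { print $2 }' "$status_file"
}
```

Run from the shell that was written to ``tasks`` above, ``mems_allowed`` should
report ``0-1``. To check another task, pass its status file, for example
``mems_allowed /proc/13425/status``.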

Notice the difference between the system memory usage as reported by
/proc/meminfo between the restricted cpuset case above and the unrestricted
case (i.e. running the same 'dd' command without assigning it to a fake NUMA
cpuset):

        ========        ============    ==========
        Name            Unrestricted    Restricted
        ========        ============    ==========
        MemTotal        3091900 kB      3091900 kB
        MemFree         42113 kB        1513236 kB
        ========        ============    ==========
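
To collect such numbers yourself, you can sample /proc/meminfo while the
workload runs. A minimal sketch follows. ``meminfo_kb`` is a hypothetical
helper name, and the optional file argument is there only to make the function
easy to try on a saved copy of /proc/meminfo.

```shell
#!/bin/bash
# Hypothetical helper: print the value (in kB) of one field from a
# /proc/meminfo-style file, e.g. "MemFree" or "MemTotal".
meminfo_kb() {
    local field=$1 file=${2:-/proc/meminfo}
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}
```

For example, running ``meminfo_kb MemFree`` once with 'dd' inside the cpuset and
once without it gives the two values to compare.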

This allows for coarse memory management for the tasks you assign to particular
cpusets. Since cpusets can form a hierarchy, you can create some pretty
interesting combinations of use-cases for various classes of tasks for your
memory management needs.