acpi, memory-hotplug: support getting hotplug info from SRAT

Provide an option for users who do not want to specify physical memory
addresses on the kernel command line.

         /*
          * For movablemem_map=acpi:
          *
          * SRAT:                |_____| |_____| |_________| |_________| ......
          * node id:                0       1         1           2
          * hotpluggable:           n       y         y           n
          * movablemem_map:              |_____| |_________|
          *
          * Using movablemem_map, we can prevent memblock from allocating memory
          * on ZONE_MOVABLE at boot time.
          */

The user just specifies movablemem_map=acpi, and the kernel uses the
hotpluggable information in SRAT to determine which memory ranges should
be set as ZONE_MOVABLE.
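
As a rough illustration of the selection described above, the following
userspace sketch walks a table shaped like the SRAT diagram and keeps only
the hotpluggable ranges.  It is not the kernel code from this patch:
struct srat_mem, movablemem_map_is_acpi() and collect_movable() are
hypothetical names chosen for the example, and the only detail taken from
the commit is that the option value is compared with strcmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one SRAT memory affinity entry. */
struct srat_mem {
	uint64_t start;
	uint64_t end;		/* exclusive */
	int nid;		/* NUMA node id */
	bool hotpluggable;	/* Hot Pluggable bit from the SRAT flags */
};

/* Return true when the option value selects the SRAT-driven mode. */
static bool movablemem_map_is_acpi(const char *arg)
{
	return arg != NULL && strcmp(arg, "acpi") == 0;
}

/*
 * Copy every hotpluggable range from srat[] into out[] and return how
 * many were copied.  out[] must have room for n entries.
 */
static size_t collect_movable(const struct srat_mem *srat, size_t n,
			      struct srat_mem *out)
{
	size_t count = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (srat[i].hotpluggable)
			out[count++] = srat[i];
	}
	return count;
}
```

With the four ranges from the diagram (nodes 0, 1, 1, 2 with hotpluggable
n, y, y, n), only the two ranges on node 1 end up in out[].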

If all the memory ranges in SRAT are hotpluggable, then no memory can be
used by the kernel.  But before SRAT is parsed, memblock has already
reserved some memory ranges for other purposes, such as the kernel image.
We cannot prevent the kernel from using this memory, so we need to
exclude these ranges even if the memory is hotpluggable.

Furthermore, there could be several memory ranges in the single node in
which the kernel resides.  We may skip one range that has memory
reserved by memblock, but if the rest of the memory is too small, the
kernel will fail to boot.  So make the whole node in which the kernel
resides un-hotpluggable.  Then the kernel has enough memory to use.
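
The whole-node rule can be sketched in userspace C as two passes: first
mark every node that contains a range overlapping a reserved region, then
skip all ranges on those nodes when collecting movable memory.  This is
only a sketch under stated assumptions, not the code added by this patch:
struct mem_range, struct resv_range, MAX_NODES, find_kernel_nodes() and
collect_movable_ranges() are hypothetical, and the reserved array merely
stands in for what memblock has reserved before SRAT is parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8	/* hypothetical limit for this sketch */

struct mem_range {
	uint64_t start;
	uint64_t end;		/* exclusive */
	int nid;
	bool hotpluggable;
};

struct resv_range {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

static bool ranges_overlap(uint64_t s1, uint64_t e1,
			   uint64_t s2, uint64_t e2)
{
	return s1 < e2 && s2 < e1;
}

/* Pass 1: flag every node that holds memory already reserved at boot. */
static void find_kernel_nodes(const struct mem_range *srat, size_t nr_srat,
			      const struct resv_range *resv, size_t nr_resv,
			      bool kernel_node[MAX_NODES])
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		kernel_node[nid] = false;

	for (size_t i = 0; i < nr_srat; i++) {
		assert(srat[i].nid >= 0 && srat[i].nid < MAX_NODES);
		for (size_t j = 0; j < nr_resv; j++) {
			if (ranges_overlap(srat[i].start, srat[i].end,
					   resv[j].start, resv[j].end))
				kernel_node[srat[i].nid] = true;
		}
	}
}

/*
 * Pass 2: keep hotpluggable ranges, except those on a flagged node.
 * Returns the number of ranges written to out[].
 */
static size_t collect_movable_ranges(const struct mem_range *srat,
				     size_t nr_srat,
				     const bool kernel_node[MAX_NODES],
				     struct mem_range *out)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_srat; i++) {
		if (!srat[i].hotpluggable)
			continue;
		if (kernel_node[srat[i].nid])
			continue;	/* whole node stays usable by the kernel */
		out[count++] = srat[i];
	}
	return count;
}
```

Skipping by node rather than by range means the second range on the
kernel's node is also left to the kernel, even though nothing in it is
reserved.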

NOTE: Using this option degrades NUMA performance because the whole
      node will be set as ZONE_MOVABLE, and the kernel cannot use memory
      on it.  Users who do not want to lose NUMA performance should not
      use it.

[akpm@linux-foundation.org: fix warning]
[akpm@linux-foundation.org: use strcmp()]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Tang Chen
2013-02-22 16:33:49 -08:00
committed by Linus Torvalds
parent 27168d38fa
commit 01a178a94e
4 changed files with 113 additions and 11 deletions

@@ -1640,15 +1640,30 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
movablemem_map=acpi
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
This option informs the kernel to use Hot Pluggable bit
in flags from SRAT from ACPI BIOS to determine which
memory devices could be hotplugged. The corresponding
memory ranges will be set as ZONE_MOVABLE.
NOTE: Whatever node the kernel resides in will always
be un-hotpluggable.
movablemem_map=nn[KMG]@ss[KMG]
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
If more areas are all within one node, then from
lowest ss to the end of the node will be ZONE_MOVABLE.
If an area covers two or more nodes, the area from
ss to the end of the 1st node will be ZONE_MOVABLE,
and all the rest nodes will only have ZONE_MOVABLE.
If user specifies memory ranges, the info in SRAT will
be ignored. And it works like the following:
- If more ranges are all within one node, then from
lowest ss to the end of the node will be ZONE_MOVABLE.
- If a range is within a node, then from ss to the end
of the node will be ZONE_MOVABLE.
- If a range covers two or more nodes, then from ss to
the end of the 1st node will be ZONE_MOVABLE, and all
the rest nodes will only have ZONE_MOVABLE.
If memmap is specified at the same time, the
movablemem_map will be limited within the memmap
areas. If kernelcore or movablecore is also specified,
@@ -1656,6 +1671,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
satisfied. So the administrator should be careful that
the amount of movablemem_map areas are not too large.
Otherwise kernel won't have enough memory to start.
NOTE: We don't stop users specifying the node the
kernel resides in as hotpluggable so that this
option can be used as a workaround of firmware
bugs.
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>