Linus pointed out that relying on the compiler to pack structures with
enums is fragile not just for the kernel, but for external tooling as
well which might rely on our UAPI headers.
So separate the two from each other: introduce 'struct boot_e820_entry',
which is the boot protocol entry format.
This actually simplifies the code, as e820__update_table() is now never
called directly with boot protocol table entries - we can rely on
append_e820_table() and do a e820__update_table() call afterwards.
( This will allow further simplifications of __e820__update_table(),
but that will be done in a separate patch. )
This change also has the side effect of not modifying the bootparams structure
anymore - which might be useful for debugging. In theory we could even constify
the boot_params structure - at least from the E820 code's point of view.
Remove the uapi/asm/e820/types.h file, as it's not used anymore - all
kernel side E820 types are defined in asm/e820/types.h.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The e820__update_table() parameters are pretty complex:
arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
But 90% of the usage is trivial:
arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries))
arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries) < 0)
arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
arch/x86/kernel/early-quirks.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/platform/efi/efi.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
arch/x86/xen/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
as it only uses an exiting struct e820_table's entries array, its size and
its current number of entries as input and output arguments.
Only one use is non-trivial:
arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
... which call updates the E820 table in the zeropage in-situ, and the layout there does not
match that of 'struct e820_table' (in particular nr_entries is at a different offset,
hardcoded by the boot protocol).
Simplify all this by introducing a low level __e820__update_table() API that
the zeropage update call can use, and simplifying the main e820__update_table()
call signature down to:
int e820__update_table(struct e820_table *table);
This visibly simplifies all the call sites:
arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_table *table);
arch/x86/include/asm/e820/types.h: * call to e820__update_table() to remove duplicates. The allowance
arch/x86/kernel/e820.c: * The return value from e820__update_table() is zero if it
arch/x86/kernel/e820.c:int __init e820__update_table(struct e820_table *table)
arch/x86/kernel/e820.c: if (e820__update_table(e820_table))
arch/x86/kernel/e820.c: e820__update_table(e820_table_firmware);
arch/x86/kernel/e820.c: e820__update_table(e820_table);
arch/x86/kernel/e820.c: e820__update_table(e820_table);
arch/x86/kernel/e820.c: if (e820__update_table(e820_table) < 0)
arch/x86/kernel/early-quirks.c: e820__update_table(e820_table);
arch/x86/kernel/setup.c: e820__update_table(e820_table);
arch/x86/kernel/setup.c: e820__update_table(e820_table);
arch/x86/platform/efi/efi.c: e820__update_table(e820_table);
arch/x86/xen/setup.c: e820__update_table(&xen_e820_table);
arch/x86/xen/setup.c: e820__update_table(e820_table);
arch/x86/xen/setup.c: e820__update_table(&xen_e820_table);
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have these three related functions:
extern void e820_add_region(u64 start, u64 size, int type);
extern u64 e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype);
But it's not clear from the naming that they are 3 operations based around the
same 'memory range' concept. Rename them to better signal this, and move
the prototypes next to each other:
extern void e820__range_add (u64 start, u64 size, int type);
extern u64 e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype);
Note that this improved organization of the functions shows another problem that was easy
to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this
will be fixed in a separate patch.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
sanitize_e820_table() is a minor misnomer in that it suggests that
the E820 table requires sanitizing - which implies that it will only
do anything if the E820 table is irregular (not sane).
That is wrong, because sanitize_e820_table() also does a very regular
sorting of the E820 table, which is a necessity in the basic
append-only flow of E820 updates the kernel is allowed to perform to
it.
So rename it to e820__update_table() to include that purpose as well.
This also lines up all the table-update functions into a coherent
naming family:
int e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
void e820__update_table_print(void);
void e820__update_table_firmware(void);
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
early_reserve_e820() is an early hack for kexec that does a limited fixup of the
mptable and passes it to the kexec kernel as if it was the real thing.
For this it needs to allocate memory - but no memory allocator is available yet
beyond the memblock allocator, so early_reserve_e820() is really a wrapper
around memblock_alloc() plus a hack to update the e820_table_firmware entries.
The name 'reserve' is really a bit of a misnomer, as 'reserved' memory typically
means memory completely inaccessible to the kernel - while here what we want to do
is a special RAM allocation for our own purposes and insert that as RAM_RESERVED.
Rename the function to e820__memblock_alloc_reserved() to better signal this dual
purpose, plus document it better, which was omitted when it was merged. The barely
comprehensible and cryptic comment:
/*
* pre allocated 4k and reserved it in memblock and e820_table_firmware
*/
u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
... does not count as documentation, replace it with:
/*
* Allocate the requested number of bytes with the requsted alignment
* and return (the physical address) to the caller. Also register this
* range in the 'firmware' E820 table.
*
* This allows kexec to fake a new mptable, as if it came from the real
* system.
*/
u64 __init e820__memblock_alloc_reserved(u64 size, u64 align)
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We introduced memblock_find_dma_reserve() in this commit:
6f2a75369e x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
But there's several problems with it:
- The changelog is full of typos and is incomprehensible in general, and
the comments in the code are not much better either.
- The function was inexplicably placed into e820.c, while it has very
little connection to the E820 table: when we call
memblock_find_dma_reserve() then memblock is already set up and we
are not using the E820 table anymore.
- The function is a wrapper around set_dma_reserve(), but changed the 'set'
name to 'find' - actively misleading about its primary purpose, which is
still to set the DMA-reserve value.
- The function is limited to 64-bit systems, but neither the changelog nor
the comments explain why. The change would appear to be relevant to
32-bit systems as well, as the ISA DMA zone is the first 16 MB of RAM.
So address some of these problems:
- Move it into arch/x86/mm/init.c, next to the other zone setup related
functions.
- Clean up the code flow and names of local variables a bit.
- Rename it to memblock_set_dma_reserve()
- Improve the comments.
No change in functionality. Enabling it for 32-bit systems is left
for a separate patch.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So memblock_x86_fill() is another E820 code misnomer:
- nothing in its name tells us that it's part of the E820 subsystem ...
- The 'fill' wording is ambiguous and doesn't tell us whether it's a single
entry or some process - while the _real_ purpose of the function is hidden,
which is to do a complete setup of the (platform independent) memblock regions.
So rename it accordingly, to e820__memblock_setup().
Also translate this incomprehensible and misleading comment:
/*
* EFI may have more than 128 entries
* We are safe to enable resizing, beause memblock_x86_fill()
* is rather later for x86
*/
memblock_allow_resize();
The worst aspect of this comment isn't even the sloppy typos, but that it
casually mentions a '128' number with no explanation, which makes one lead
to the assumption that this is related to the well-known limit of a maximum
of 128 E820 entries passed via legacy bootloaders.
But no, the _real_ meaning of 128 here is that of the memblock subsystem,
which too happens to have a 128 entries limit for very early memblock
regions (which is unrelated to E820), via INIT_MEMBLOCK_REGIONS ...
So change the comment to a more comprehensible version:
/*
* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
* than that - so allow memblock resizing.
*
* This is safe, because this call happens pretty late during x86 setup,
* so we know about reserved memory regions already. (This is important
* so that memblock resizing does no stomp over reserved areas.)
*/
memblock_allow_resize();
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So the 'e820_table_saved' is a bit of a misnomer that hides its real purpose.
At first sight the name suggests that it's some sort save/restore mechanism,
as this is how we typically name such facilities in the kernel.
But that is not so, e820_table_saved is the original firmware version of the
e820 table, not modified by the kernel. This table is displayed in the
/sys/firmware/memmap file, and it's also used by the hibernation code to
calculate a physical memory layout MD5 fingerprint checksum which is
invariant of the kernel.
So rename it to 'e820_table_firmware' and update all the comments to better
describe the main e820 data strutures.
Also rename:
'initial_e820_table_saved' => 'e820_table_firmware_init'
'e820_update_range_saved' => 'e820_update_range_firmware'
... to better match the new nomenclature.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'e820entry' and 'e820map' names have various annoyances:
- the missing underscore departs from the usual kernel style
and makes the code look weird,
- in the past I kept confusing the 'map' with the 'entry', because
a 'map' is ambiguous in that regard,
- it's not really clear from the 'e820map' that this is a regular
C array.
Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.
( Leave the legacy UAPI header alone but do the rename in the bootparam.h
and e820/types.h file - outside tools relying on these defines should
either adjust their code, or should use the legacy header, or should
create their private copies for the definitions. )
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>