Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued work to add support for 5-level paging provided by future
Intel CPUs. In particular we switch the x86 GUP code to the generic
implementation. (Kirill A. Shutemov)
- Continued work to add PCID CPU support to native kernels as well.
In this round most of the focus is on reworking/refreshing the TLB
flush infrastructure for the upcoming PCID changes. (Andy
Lutomirski)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Delete a big outdated comment about TLB flushing
x86/mm: Don't reenter flush_tlb_func_common()
x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
x86/ftrace: Exclude functions in head64.c from function-tracing
x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
x86/mm: Remove reset_lazy_tlbstate()
x86/ldt: Simplify the LDT switching logic
x86/boot/64: Put __startup_64() into .head.text
x86/mm: Add support for 5-level paging for KASLR
x86/mm: Make kernel_physical_mapping_init() support 5-level paging
x86/mm: Add sync_global_pgds() for configuration with 5-level paging
x86/boot/64: Add support of additional page table level during early boot
x86/boot/64: Rename init_level4_pgt and early_level4_pgt
x86/boot/64: Rewrite startup_64() in C
x86/boot/compressed: Enable 5-level paging during decompression stage
x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
x86/boot/efi: Cleanup initialization of GDT entries
x86/asm: Fix comment in return_from_SYSCALL_64()
x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Rework the EFI capsule loader to allow for workarounds for
non-compliant firmware (Ard Biesheuvel)
- Implement a capsule loader quirk for Quark X102x (Jan Kiszka)
- Enable SMBIOS/DMI support for the ARM architecture (Ard Biesheuvel)
- Add CONFIG_EFI_PGT_DUMP=y support for x86-32 and kexec (Sai
Praneeth)
- Fixes for EFI support for Xen dom0 guests running under x86-64
hosts (Daniel Kiper)"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen/efi: Initialize only the EFI struct members used by Xen
efi: Process the MEMATTR table only if EFI_MEMMAP is enabled
efi/arm: Enable DMI/SMBIOS
x86/efi: Extend CONFIG_EFI_PGT_DUMP support to x86_32 and kexec as well
efi/efi_test: Use memdup_user() helper
efi/capsule: Add support for Quark security header
efi/capsule-loader: Use page addresses rather than struct page pointers
efi/capsule-loader: Redirect calls to efi_capsule_setup_info() via weak alias
efi/capsule: Remove NULL test on kmap()
efi/capsule-loader: Use a cached copy of the capsule header
efi/capsule: Adjust return type of efi_capsule_setup_info()
efi/capsule: Clean up pr_err/_info() messages
efi/capsule: Remove pr_debug() on ENOMEM or EFAULT
efi/capsule: Fix return code on failing kmap/vmap
For EFI with the 'efi=old_map' kernel option specified, the kernel will panic
when KASLR is enabled:
BUG: unable to handle kernel paging request at 000000007febd57e
IP: 0x7febd57e
PGD 1025a067
PUD 0
Oops: 0010 [#1] SMP
Call Trace:
efi_enter_virtual_mode()
start_kernel()
x86_64_start_reservations()
x86_64_start_kernel()
start_cpu()
The root cause is that the identity mapping is not built correctly
in the 'efi=old_map' case.
On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE
aligned. We can borrow the PUD table from the direct mappings safely. Given a
physical address X, we have pud_index(X) == pud_index(__va(X)).
However, on KASLR kernels, PAGE_OFFSET is PUD_SIZE aligned. For a given physical
address X, pud_index(X) != pud_index(__va(X)). We can't just copy the PGD entry
from direct mapping to build identity mapping, instead we need to copy the
PUD entries one by one from the direct mapping.
Fix it.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Frank Ramsay <frank.ramsay@hpe.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-5-matt@codeblueprint.co.uk
[ Fixed and reworded the changelog and code comments to be more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Booting kexec kernel with "efi=old_map" in kernel command line hits
kernel panic as shown below.
BUG: unable to handle kernel paging request at ffff88007fe78070
IP: virt_efi_set_variable.part.7+0x63/0x1b0
PGD 7ea28067
PUD 7ea2b067
PMD 7ea2d067
PTE 0
[...]
Call Trace:
virt_efi_set_variable()
efi_delete_dummy_variable()
efi_enter_virtual_mode()
start_kernel()
x86_64_start_reservations()
x86_64_start_kernel()
start_cpu()
[ efi=old_map was never intended to work with kexec. The problem with
using efi=old_map is that the virtual addresses are assigned from the
memory region used by other kernel mappings; vmalloc() space.
Potentially there could be collisions when booting kexec if something
else is mapped at the virtual address we allocated for runtime service
regions in the initial boot - Matt Fleming ]
Since kexec was never intended to work with efi=old_map, disable
runtime services in kexec if booted with efi=old_map, so that we don't
panic.
Tested-by: Lee Chun-Yi <jlee@suse.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-4-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscall returning address
over 4Gb in applications, launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...
Pull x86 boot updates from Ingo Molnar:
"The biggest changes in this cycle were:
- reworking of the e820 code: separate in-kernel and boot-ABI data
structures and apply a whole range of cleanups to the kernel side.
No change in functionality.
- enable KASLR by default: it's used by all major distros and it's
out of the experimental stage as well.
- ... misc fixes and cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails
x86/reboot: Turn off KVM when halting a CPU
x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
x86: Enable KASLR by default
boot/param: Move next_arg() function to lib/cmdline.c for later reuse
x86/boot: Fix Sparse warning by including required header file
x86/boot/64: Rename start_cpu()
x86/xen: Update e820 table handling to the new core x86 E820 code
x86/boot: Fix pr_debug() API braindamage
xen, x86/headers: Add <linux/device.h> dependency to <asm/xen/page.h>
x86/boot/e820: Simplify e820__update_table()
x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures
x86/boot/e820: Fix and clean up e820_type switch() statements
x86/boot/e820: Rename the remaining E820 APIs to the e820__*() prefix
x86/boot/e820: Remove unnecessary #include's
x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions()
x86/boot/e820: Rename e820_reserve_resources*() to e820__reserve_resources*()
x86/boot/e820: Use bool in query APIs
x86/boot/e820: Document e820__reserve_setup_data()
x86/boot/e820: Clean up __e820__update_table() et al
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- move BGRT handling to drivers/acpi so it can be shared between x86
and ARM
- bring the EFI stub's initrd and FDT allocation logic in line with
the latest changes to the arm64 boot protocol
- improvements and fixes to the EFI stub's command line parsing
routines
- randomize the virtual mapping of the UEFI runtime services on
ARM/arm64
- ... and other misc enhancements, cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: Don't use TASK_SIZE when randomizing the RT space
ef/libstub/arm/arm64: Randomize the base of the UEFI rt services region
efi/libstub/arm/arm64: Disable debug prints on 'quiet' cmdline arg
efi/libstub: Unify command line param parsing
efi/libstub: Fix harmless command line parsing bug
efi/arm32-stub: Allow boot-time allocations in the vmlinux region
x86/efi: Clean up a minor mistake in comment
efi/pstore: Return error code (if any) from efi_pstore_write()
efi/bgrt: Enable ACPI BGRT handling on arm64
x86/efi/bgrt: Move efi-bgrt handling out of arch/x86
efi/arm-stub: Round up FDT allocation to mapping size
efi/arm-stub: Correct FDT and initrd allocation rules for arm64
Reserving a runtime region results in splitting the EFI memory
descriptors for the runtime region. This results in runtime region
descriptors with bogus memory mappings, leading to interesting crashes
like the following during a kexec:
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05 09/30/2016
RIP: 0010:virt_efi_set_variable()
...
Call Trace:
efi_delete_dummy_variable()
efi_enter_virtual_mode()
start_kernel()
? set_init_arg()
x86_64_start_reservations()
x86_64_start_kernel()
start_cpu()
...
Kernel panic - not syncing: Fatal exception
Runtime regions will not be freed and do not need to be reserved, so
skip the memmap modification in this case.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's a conflict between ongoing level-5 paging support and
the E820 rewrite. Since the E820 rewrite is essentially ready,
merge it into x86/mm to reduce tree conflicts.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UEFI v2.6 introduces EFI_MEMORY_ATTRIBUTES_TABLE which describes memory
protections that may be applied to the EFI Runtime code and data regions by
the kernel. This enables the kernel to map these regions more strictly thereby
increasing security.
Presently, the only valid bits for the attribute field of a memory descriptor
are EFI_MEMORY_RO and EFI_MEMORY_XP, hence use these bits to update the
mappings in efi_pgd.
The UEFI specification recommends to use this feature instead of
EFI_PROPERTIES_TABLE and hence while updating EFI mappings we first
check for EFI_MEMORY_ATTRIBUTES_TABLE and if it's present we update
the mappings according to this table and hence disregarding
EFI_PROPERTIES_TABLE even if it's published by the firmware. We consider
EFI_PROPERTIES_TABLE only when EFI_MEMORY_ATTRIBUTES_TABLE is absent.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1485868902-20401-6-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The e820__update_table() parameters are pretty complex:
arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
But 90% of the usage is trivial:
arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries))
arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/e820.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/e820.c: if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries) < 0)
arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
arch/x86/kernel/early-quirks.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/kernel/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/platform/efi/efi.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
arch/x86/xen/setup.c: e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
arch/x86/xen/setup.c: e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
as it only uses an exiting struct e820_table's entries array, its size and
its current number of entries as input and output arguments.
Only one use is non-trivial:
arch/x86/kernel/e820.c: e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
... which call updates the E820 table in the zeropage in-situ, and the layout there does not
match that of 'struct e820_table' (in particular nr_entries is at a different offset,
hardcoded by the boot protocol).
Simplify all this by introducing a low level __e820__update_table() API that
the zeropage update call can use, and simplifying the main e820__update_table()
call signature down to:
int e820__update_table(struct e820_table *table);
This visibly simplifies all the call sites:
arch/x86/include/asm/e820/api.h:extern int e820__update_table(struct e820_table *table);
arch/x86/include/asm/e820/types.h: * call to e820__update_table() to remove duplicates. The allowance
arch/x86/kernel/e820.c: * The return value from e820__update_table() is zero if it
arch/x86/kernel/e820.c:int __init e820__update_table(struct e820_table *table)
arch/x86/kernel/e820.c: if (e820__update_table(e820_table))
arch/x86/kernel/e820.c: e820__update_table(e820_table_firmware);
arch/x86/kernel/e820.c: e820__update_table(e820_table);
arch/x86/kernel/e820.c: e820__update_table(e820_table);
arch/x86/kernel/e820.c: if (e820__update_table(e820_table) < 0)
arch/x86/kernel/early-quirks.c: e820__update_table(e820_table);
arch/x86/kernel/setup.c: e820__update_table(e820_table);
arch/x86/kernel/setup.c: e820__update_table(e820_table);
arch/x86/platform/efi/efi.c: e820__update_table(e820_table);
arch/x86/xen/setup.c: e820__update_table(&xen_e820_table);
arch/x86/xen/setup.c: e820__update_table(e820_table);
arch/x86/xen/setup.c: e820__update_table(&xen_e820_table);
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have these three related functions:
extern void e820_add_region(u64 start, u64 size, int type);
extern u64 e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype);
But it's not clear from the naming that they are 3 operations based around the
same 'memory range' concept. Rename them to better signal this, and move
the prototypes next to each other:
extern void e820__range_add (u64 start, u64 size, int type);
extern u64 e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type);
extern u64 e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype);
Note that this improved organization of the functions shows another problem that was easy
to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this
will be fixed in a separate patch.
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
sanitize_e820_table() is a minor misnomer in that it suggests that
the E820 table requires sanitizing - which implies that it will only
do anything if the E820 table is irregular (not sane).
That is wrong, because sanitize_e820_table() also does a very regular
sorting of the E820 table, which is a necessity in the basic
append-only flow of E820 updates the kernel is allowed to perform to
it.
So rename it to e820__update_table() to include that purpose as well.
This also lines up all the table-update functions into a coherent
naming family:
int e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
void e820__update_table_print(void);
void e820__update_table_firmware(void);
No change in functionality.
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
stopped creating 1:1 mappings for all RAM, when running in native 64-bit mode.
It turns out though that there are 64-bit EFI implementations in the wild
(this particular problem has been reported on a Lenovo Yoga 710-11IKB),
which still make use of the first physical page for their own private use,
even though they explicitly mark it EFI_CONVENTIONAL_MEMORY in the memory
map.
In case there is no mapping for this particular frame in the EFI pagetables,
as soon as firmware tries to make use of it, a triple fault occurs and the
system reboots (in case of the Yoga 710-11IKB this is very early during bootup).
Fix that by always mapping the first page of physical memory into the EFI
pagetables. We're free to hand this page to the BIOS, as trim_bios_range()
will reserve the first page and isolate it away from memory allocators anyway.
Note that just reverting 129766708 alone is not enough on v4.9-rc1+ to fix the
regression on affected hardware, as this commit:
ab72a27da ("x86/efi: Consolidate region mapping logic")
later made the first physical frame not to be mapped anyway.
Reported-by: Hanka Pavlikova <hanka@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vojtech Pavlik <vojtech@ucw.cz>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: linux-efi@vger.kernel.org
Cc: stable@kernel.org # v4.8+
Fixes: 129766708 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
Link: http://lkml.kernel.org/r/20170127222552.22336-1-matt@codeblueprint.co.uk
[ Tidied up the changelog and the comment. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
These machines fail to boot after the following commit,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Fix this by removing such bogus entries from the memory map.
Furthermore, currently the log output for this case (with efi=debug)
looks like:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)
It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
It then removes these entries from the memory map.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the following commit:
4bc9f92e64 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
... efi_bgrt_init() calls into the memblock allocator through
efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.
Indeed, KASAN reports a bad read access later on in efi_free_boot_services():
BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
at addr ffff88022de12740
Read of size 4 by task swapper/0/0
page:ffffea0008b78480 count:0 mapcount:-127
mapping: (null) index:0x1 flags: 0x5fff8000000000()
[...]
Call Trace:
dump_stack+0x68/0x9f
kasan_report_error+0x4c8/0x500
kasan_report+0x58/0x60
__asan_load4+0x61/0x80
efi_free_boot_services+0xae/0x24c
start_kernel+0x527/0x562
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x157/0x17a
start_cpu+0x5/0x14
The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().
Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.
So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.
Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org> # v4.9
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 4bc9f92e64 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix this when building on 32-bit:
arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)
However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.
So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 boot updates from Ingo Molnar:
"The changes in this cycle were:
- Save e820 table RAM footprint on larger kernel configurations.
(Denys Vlasenko)
- pmem related fixes (Dan Williams)
- theoretical e820 boundary condition fix (Wei Yang)"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
x86/e820: Use much less memory for e820/e820_saved, save up to 120k
x86/e820: Prepare e280 code for switch to dynamic storage
x86/e820: Mark some static functions __init
x86/e820: Fix very large 'size' handling boundary condition
Pull EFI fix from Matt Fleming:
* Fix a boot crash reported by Mike Galbraith and Mike Krinkin. The
new EFI memory map reservation code didn't align reservations to
EFI_PAGE_SIZE boundaries causing bogus regions to be inserted into
the global EFI memory map (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mike Galbraith reported that his machine started rebooting during boot
after,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
The ESRT table on his machine is 56 bytes and at no point in the
efi_arch_mem_reserve() call path is that size rounded up to
EFI_PAGE_SIZE, nor is the start address on an EFI_PAGE_SIZE boundary.
Since the EFI memory map only deals with whole pages, inserting an EFI
memory region with 56 bytes results in a new entry covering zero
pages, and completely screws up the calculations for the old regions
that were trimmed.
Round all sizes upwards, and start addresses downwards, to the nearest
EFI_PAGE_SIZE boundary.
Additionally, efi_memmap_insert() expects the mem::range::end value to
be one less than the end address for the region.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Tested-by: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus reuse the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>