Sparse complains about wrong address space used in __acpi_map_table()
and in __acpi_unmap_table().
arch/x86/kernel/acpi/boot.c:127:29: warning: incorrect type in return expression (different address spaces)
arch/x86/kernel/acpi/boot.c:127:29: expected char *
arch/x86/kernel/acpi/boot.c:127:29: got void [noderef] <asn:2>*
arch/x86/kernel/acpi/boot.c:135:23: warning: incorrect type in argument 1 (different address spaces)
arch/x86/kernel/acpi/boot.c:135:23: expected void [noderef] <asn:2>*addr
arch/x86/kernel/acpi/boot.c:135:23: got char *map
Correct address space to be in align of type of returned and passed
parameter.
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some code in acpi_parse_x2apic() conditionally compiled, though parts of
it are being used in any case. This annoys gcc.
arch/x86/kernel/acpi/boot.c: In function ‘acpi_parse_x2apic’:
arch/x86/kernel/acpi/boot.c:203:5: warning: variable ‘enabled’ set but not used [-Wunused-but-set-variable]
u8 enabled;
^~~~~~~
arch/x86/kernel/acpi/boot.c:202:6: warning: variable ‘apic_id’ set but not used [-Wunused-but-set-variable]
int apic_id;
^~~~~~~
Re-arrange the code to avoid compiling unused variables.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
On x86, 5-level paging enables 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. It collides with valid pointers with 5-level paging and
leads to crashes.
To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.
But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 47-bits.
If hint address set above 47-bit, but MAP_FIXED is not specified, we try
to look for unmapped area by specified address. If it's already
occupied, we look for unmapped area in *full* address space, rather than
from 47-bit window.
A high hint address would only affect the allocation in question, but not
any future mmap()s.
Specifying high hint address on older kernel or on machine without 5-level
paging support is safe. The hint will be ignored and kernel will fall back
to allocation from 47-bit address space.
This approach helps to easily make application's memory allocator aware
about large address space without manually tracking allocated virtual
address space.
The patch puts all machinery in place, but not yet allows userspace to have
mappings above 47-bit -- TASK_SIZE_MAX has to be raised to get the effect.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-7-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One of the rarely executed code pathes in check_timer() calls
unmask_ioapic_irq() passing irq_get_chip_data(0) as argument.
That's wrong as unmask_ioapic_irq() expects a pointer to the irq data of
interrupt 0. irq_get_chip_data(0) returns NULL, so the following
dereference in unmask_ioapic_irq() causes a kernel panic.
The issue went unnoticed in the first place because irq_get_chip_data()
returns a void pointer so the compiler cannot do a type check on the
argument. The code path was added for machines with broken configuration,
but it seems that those machines are either not running current kernels or
simply do not longer exist.
Hand in irq_get_irq_data(0) as argument which provides the correct data.
[ tglx: Rewrote changelog ]
Fixes: 4467715a44 ("x86/irq: Move irq_cfg.irq_2_pin into io_apic.c")
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1500369644-45767-1-git-send-email-kkamagui@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The bus_irq argument of mp_override_legacy_irq() is used as the index into
the isa_irq_to_gsi[] array. The bus_irq argument originates from
ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
tables, but is nowhere sanity checked.
That allows broken or malicious ACPI tables to overwrite memory, which
might cause malfunction, panic or arbitrary code execution.
Add a sanity check and emit a warning when that triggers.
[ tglx: Added warning and rewrote changelog ]
Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2nd round of 4.14 features:
- prep for deferred fbdev setup
- refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar)
- more cnl paches (Rodrigo, Imre et al)
- tighten context cleanup and handling (Chris Wilson)
- fix interlaced handling on skl+ (Mahesh Kumar)
- small bits as usual
* tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits)
drm/i915: Update DRIVER_DATE to 20170717
drm/i915: Protect against deferred fbdev setup
drm/i915/fbdev: Always forward hotplug events
drm/i915/skl+: unify cpp value in WM calculation
drm/i915/skl+: WM calculation don't require height
drm/i915: Addition wrapper for fixed16.16 operation
drm/i915: cleanup fixed-point wrappers naming
drm/i915: Always perform internal fixed16 division in 64 bits
drm/i915: take-out common clamping code of fixed16 wrappers
drm/i915/cnl: Add missing type case.
drm/i915/cnl: Add max allowed Cannonlake DC.
drm/i915: Make DP-MST connector info work
drm/i915/cnl: Get DDI clock based on PLLs.
drm/i915/cnl: Inherit RPS stuff from previous platforms.
drm/i915/cnl: Gen10 render context size.
drm/i915/cnl: Don't trust VBT's alternate pin for port D for now.
drm/i915: Fix the kernel panic when using aliasing ppgtt
drm/i915/cnl: Cannonlake color init.
drm/i915/cnl: Add force wake for gen10+.
x86/gpu: CNL uses the same GMS values as SKL
...
Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros. Since the memory encryption mask will now be part of the regular
pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
_KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
without the encryption mask before SME becomes active. Two new pgprot()
macros are defined to allow setting or clearing the page encryption mask.
The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO. SME does
not support encryption for MMIO areas so this define removes the encryption
mask from the page attribute.
Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask. These are used when
working with the cr3 register so that the PGD can be encrypted. The current
__va() macro is updated so that the virtual address is generated based off
of the physical address without the encryption mask thus allowing the same
virtual address to be generated regardless of whether encryption is enabled
for that physical location or not.
Also, an early initialization function is added for SME. If SME is active,
this function:
- Updates the early_pmd_flags so that early page faults create mappings
with the encryption mask.
- Updates the __supported_pte_mask to include the encryption mask.
- Updates the protection_map entries to include the encryption mask so
that user-space allocations will automatically have the encryption mask
applied.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The SME patches we are about to apply add some E820 logic, so merge in
pending E820 code changes first, to have a single code base.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Max virtual processor will be needed for 'extended' hypercalls supporting
more than 64 vCPUs. While on it, unify on 'Hyper-V' in mshyperv.c as we
currently have a mix, report acquired misc features as well.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM updates for v4.13
Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objects
s390:
- rename IOCTL added in the first v4.13 merge
x86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing"
* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: x86: hyperv: make VP_INDEX managed by userspace
KVM: async_pf: Let guest support delivery of async_pf from guest mode
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
KVM: x86: make backwards_tsc_observed a per-VM variable
KVM: trigger uevents when creating or destroying a VM
KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
KVM: SVM: Rename lbr_ctl field in the vmcb control area
KVM: SVM: Prepare for new bit definition in lbr_ctl
KVM: SVM: handle singlestep exception when skipping emulated instructions
KVM: x86: take slots_lock in kvm_free_pit
KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
kvm: vmx: Properly handle machine check during VM-entry
KVM: x86: update master clock before computing kvmclock_offset
kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
...
Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.
This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor. Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.
An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.
sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.
[npiggin@gmail.com: fix]
Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com> [sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Eric said,
"what we need to do is move the variable vmcoreinfo_note out of the
kernel's .bss section. And modify the code to regenerate and keep this
information in something like the control page.
Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures. I clearly was not
watching closely the data someone decided to keep this silly thing in
the kernel's .bss section."
This patch allocates extra pages for these vmcoreinfo_XXX variables, one
advantage is that it enhances some safety of vmcoreinfo, because
vmcoreinfo now is kept far away from other kernel data structures.
Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Resync with the main drm-next pull request for 4.13. What we really
need is to fully resync with pending drm-misc, but that's not yet
possible due to the still ongoing merge window.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Pull x86 fixes from Thomas Gleixner:
"The x86 updates contain:
- A fix for a longstanding PAT bug, where PAT was reported on CPUs
that do not support it, which leads to wrong caching attributes and
missing MTRR updates
- Prevent overwriting of the e820 firmware table, which causes kexec
kernels to lose the fake mptable which is stored there.
- Cleanup of the UV/BAU code, removing unused code and making local
functions static"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table
x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec
x86/boot/e820: Avoid overwriting e820_table_firmware
x86/mm/pat: Don't report PAT on CPUs that don't support it
x86/platform/uv/BAU: Minor cleanup, make some local functions static
Pull dma-mapping infrastructure from Christoph Hellwig:
"This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls to
->mapping_error so that the dma_map_ops instances are more self
contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)"
* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
ARM: dma-mapping: Remove traces of NOMMU code
ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
ARM: NOMMU: Introduce dma operations for noMMU
drivers: dma-mapping: allow dma_common_mmap() for NOMMU
drivers: dma-coherent: Introduce default DMA pool
drivers: dma-coherent: Account dma_pfn_offset when used with device tree
dma: Take into account dma_pfn_offset
dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
dma-mapping: remove dmam_free_noncoherent
crypto: qat - avoid an uninitialized variable warning
au1100fb: remove a bogus dma_free_nonconsistent call
MAINTAINERS: add entry for dma mapping helpers
powerpc: merge __dma_set_mask into dma_set_mask
dma-mapping: remove the set_dma_mask method
powerpc/cell: use the dma_supported method for ops switching
powerpc/cell: clean up fixed mapping dma_ops initialization
tile: remove dma_supported and mapping_error methods
xen-swiotlb: remove xen_swiotlb_set_dma_mask
arm: implement ->dma_supported instead of ->set_dma_mask
mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
...
Add the real e820_tabel_firmware[] that will not be modified by the kernel
or the EFI boot stub under any circumstance.
In addition to that modify the code so that e820_table_firmwarep[] is
exposed via sysfs to represent the real firmware memory layout,
rather than exposing the e820_table_kexec[] table.
This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check
RAM layout consistency across hibernation/resume:
The suspend kernel:
[ 0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable
The resume kernel:
[ 0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable
...
[ 15.752088] PM: Using 3 thread(s) for decompression.
[ 15.752088] PM: Loading and decompressing image data (471870 pages)...
[ 15.764971] Hibernate inconsistent memory map detected!
[ 15.770833] PM: Image mismatch: architecture specific data
Actually it is safe to restore these pages because E820_TYPE_RAM and
E820_TYPE_RESERVED_KERN are treated the same during hibernation, so
the original e820 table provided by the bootloader is used for
hibernation MD5 fingerprint checking.
The side effect is that, this newly introduced variable might increase the
kernel size at compile time.
Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the e820_table_firmware[] table is mainly used by the kexec,
and it is not what it's supposed to be - despite its name it might be
modified by the kernel.
So change its name to e820_table_kexec[]. In the next patch we will
introduce the real e820_table_firmware[] table.
No functional change.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit in 2013:
77ea8c9489 ("x86: Reserve setup_data ranges late after parsing memmap cmdline")
has fixed the issue of losing setup_data information by deferring the
e820_reserve_setup_data() call until the early params have been parsed.
But this also introduced a new problem that, during early params parsing,
the kexec kernel might fake a mptable and saves it into the e820_table_firmware[]
table (without saving the mptable to the e820_table[]), however the subsequent
invoking of e820_reserve_setup_data() will overwrite the e820_table_firmware[]
according to the e820_table[], thus the fake mptable information is lost.
Fix this issue by updating the e820_table_firmware[] according to
the setup_data information, but without overwriting it.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pat_enabled() logic is broken on CPUs which do not support PAT and
where the initialization code fails to call pat_init(). Due to that the
enabled flag stays true and pat_enabled() returns true wrongfully.
As a consequence the mappings, e.g. for Xorg, are set up with the wrong
caching mode and the required MTRR setups are omitted.
To cure this the following changes are required:
1) Make pat_enabled() return true only if PAT initialization was
invoked and successful.
2) Invoke init_cache_modes() unconditionally in setup_arch() and
remove the extra callsites in pat_disable() and the pat disabled
code path in pat_init().
Also rename __pat_enabled to pat_disabled to reflect the real purpose of
this variable.
Fixes: 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
Pull power management updates from Rafael Wysocki:
"The big ticket items here are the rework of suspend-to-idle in order
to add proper support for power button wakeup from it on recent Dell
laptops and the rework of interfaces exporting the current CPU
frequency on x86.
In addition to that, support for a few new pieces of hardware is
added, the PCI/ACPI device wakeup infrastructure is simplified
significantly and the wakeup IRQ framework is fixed to unbreak the IRQ
bus locking infrastructure.
Also, there are some functional improvements for intel_pstate, tools
updates and small fixes and cleanups all over.
Specifics:
- Rework suspend-to-idle to allow it to take wakeup events signaled
by the EC into account on ACPI-based platforms in order to properly
support power button wakeup from suspend-to-idle on recent Dell
laptops (Rafael Wysocki).
That includes the core suspend-to-idle code rework, support for the
Low Power S0 _DSM interface, and support for the ACPI INT0002
Virtual GPIO device from Hans de Goede (required for USB keyboard
wakeup from suspend-to-idle to work on some machines).
- Stop trying to export the current CPU frequency via /proc/cpuinfo
on x86 as that is inaccurate and confusing (Len Brown).
- Rework the way in which the current CPU frequency is exported by
the kernel (over the cpufreq sysfs interface) on x86 systems with
the APERF and MPERF registers by always using values read from
these registers, when available, to compute the current frequency
regardless of which cpufreq driver is in use (Len Brown).
- Rework the PCI/ACPI device wakeup infrastructure to remove the
questionable and artificial distinction between "devices that can
wake up the system from sleep states" and "devices that can
generate wakeup signals in the working state" from it, which allows
the code to be simplified quite a bit (Rafael Wysocki).
- Fix the wakeup IRQ framework by making it use SRCU instead of RCU
which doesn't allow sleeping in the read-side critical sections,
but which in turn is expected to be allowed by the IRQ bus locking
infrastructure (Thomas Gleixner).
- Modify some computations in the intel_pstate driver to avoid
rounding errors resulting from them (Srinivas Pandruvada).
- Reduce the overhead of the intel_pstate driver in the HWP
(hardware-managed P-states) mode and when the "performance" P-state
selection algorithm is in use by making it avoid registering
scheduler callbacks in those cases (Len Brown).
- Rework the energy_performance_preference sysfs knob in intel_pstate
by changing the values that correspond to different symbolic hint
names used by it (Len Brown).
- Make it possible to use more than one cpuidle driver at the same
time on ARM (Daniel Lezcano).
- Make it possible to prevent the cpuidle menu governor from using
the 0 state by disabling it via sysfs (Nicholas Piggin).
- Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on
AMD systems (Yazen Ghannam).
- Make the CPPC cpufreq driver take the lowest nonlinear performance
information into account (Prashanth Prakash).
- Add support for hi3660 to the cpufreq-dt driver, fix the imx6q
driver and clean up the sfi, exynos5440 and intel_pstate drivers
(Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael
Wysocki, Tao Wang).
- Fix a few minor issues in the generic power domains (genpd)
framework and clean it up somewhat (Krzysztof Kozlowski, Mikko
Perttunen, Viresh Kumar).
- Fix a couple of minor issues in the operating performance points
(OPP) framework and clean it up somewhat (Viresh Kumar).
- Fix a CONFIG dependency in the hibernation core and clean it up
slightly (Balbir Singh, Arvind Yadav, BaoJun Luo).
- Add rk3228 support to the rockchip-io adaptive voltage scaling
(AVS) driver (David Wu).
- Fix an incorrect bit shift operation in the RAPL power capping
driver (Adam Lessnau).
- Add support for the EPP field in the HWP (hardware managed
P-states) control register, HWP.EPP, to the x86_energy_perf_policy
tool and update msr-index.h with HWP.EPP values (Len Brown).
- Fix some minor issues in the turbostat tool (Len Brown).
- Add support for AMD family 0x17 CPUs to the cpupower tool and fix a
minor issue in it (Sherry Hurwitz).
- Assorted cleanups, mostly related to the constification of some
data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof
Kozlowski)"
* tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits)
cpufreq: Update scaling_cur_freq documentation
cpufreq: intel_pstate: Clean up after performance governor changes
PM: hibernate: constify attribute_group structures.
cpuidle: menu: allow state 0 to be disabled
intel_idle: Use more common logging style
PM / Domains: Fix missing default_power_down_ok comment
PM / Domains: Fix unsafe iteration over modified list of domains
PM / Domains: Fix unsafe iteration over modified list of domain providers
PM / Domains: Fix unsafe iteration over modified list of device links
PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device
PM / Domains: Call driver's noirq callbacks
PM / core: Drop run_wake flag from struct dev_pm_info
PCI / PM: Simplify device wakeup settings code
PCI / PM: Drop pme_interrupt flag from struct pci_dev
ACPI / PM: Consolidate device wakeup settings code
ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
PM / QoS: constify *_attribute_group.
PM / AVS: rockchip-io: add io selectors and supplies for rk3228
powercap/RAPL: prevent overridding bits outside of the mask
PM / sysfs: Constify attribute groups
...
Pull RAS updates from Thomas Gleixner:
"The RAS updates for the 4.13 merge window:
- Cleanup of the MCE injection facility (Borsilav Petkov)
- Rework of the AMD/SMCA handling (Yazen Ghannam)
- Enhancements for ACPI/APEI to handle new notitication types (Shiju
Jose)
- atomic_t to refcount_t conversion (Elena Reshetova)
- A few fixes and enhancements all over the place"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS/CEC: Check the correct variable in the debugfs error handling
x86/mce: Always save severity in machine_check_poll()
x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise
x86/mce: Update bootlog description to reflect behavior on AMD
x86/mce: Don't disable MCA banks when offlining a CPU on AMD
x86/mce/mce-inject: Preset the MCE injection struct
x86/mce: Clean up include files
x86/mce: Get rid of register_mce_write_callback()
x86/mce: Merge mce_amd_inj into mce-inject
x86/mce/AMD: Use saved threshold block info in interrupt handler
x86/mce/AMD: Use msr_stat when clearing MCA_STATUS
x86/mce/AMD: Carve out SMCA bank configuration
x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers
x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t
RAS: Make local function parse_ras_param() static
ACPI/APEI: Handle GSIV and GPIO notification types
Pull SMP hotplug updates from Thomas Gleixner:
"This update is primarily a cleanup of the CPU hotplug locking code.
The hotplug locking mechanism is an open coded RWSEM, which allows
recursive locking. The main problem with that is the recursive nature
as it evades the full lockdep coverage and hides potential deadlocks.
The rework replaces the open coded RWSEM with a percpu RWSEM and
establishes full lockdep coverage that way.
The bulk of the changes fix up recursive locking issues and address
the now fully reported potential deadlocks all over the place. Some of
these deadlocks have been observed in the RT tree, but on mainline the
probability was low enough to hide them away."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
cpu/hotplug: Constify attribute_group structures
powerpc: Only obtain cpu_hotplug_lock if called by rtasd
ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
cpu/hotplug: Remove unused check_for_tasks() function
perf/core: Don't release cred_guard_mutex if not taken
cpuhotplug: Link lock stacks for hotplug callbacks
acpi/processor: Prevent cpu hotplug deadlock
sched: Provide is_percpu_thread() helper
cpu/hotplug: Convert hotplug locking to percpu rwsem
s390: Prevent hotplug rwsem recursion
arm: Prevent hotplug rwsem recursion
arm64: Prevent cpu hotplug rwsem recursion
kprobes: Cure hotplug lock ordering issues
jump_label: Reorder hotplug lock and jump_label_lock
perf/tracing/cpuhotplug: Fix locking order
ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
PCI: Replace the racy recursion prevention
PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
x86/perf: Drop EXPORT of perf_check_microcode
...
Pull x86 timers updates from Thomas Gleixner:
"This update contains:
- The solution for the TSC deadline timer borkage, which is caused by
a hardware problem in the TSC_ADJUST/TSC_DEADLINE_TIMER logic.
The problem is documented now and fixed with a microcode update, so
we can remove the workaround and just check for the microcode version.
If the microcode is not up to date, then the TSC deadline timer is
disabled. If the borkage is fixed by the proper microcode version,
then the deadline timer can be used. In both cases the restrictions
to the range of the TSC_ADJUST value, which were added as
workarounds, are removed.
- A few simple fixes and updates to the timer related x86 code"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc()
x86/hpet: Do not use smp_processor_id() in preemptible code
x86/time: Make setup_default_timer_irq() static
x86/tsc: Remove the TSC_ADJUST clamp
x86/apic: Add TSC_DEADLINE quirk due to errata
x86/apic: Change the lapic name in deadline mode
Pull irq updates from Thomas Gleixner:
"The irq department delivers:
- Expand the generic infrastructure handling the irq migration on CPU
hotplug and convert X86 over to it. (Thomas Gleixner)
Aside of consolidating code this is a preparatory change for:
- Finalizing the affinity management for multi-queue devices. The
main change here is to shut down interrupts which are affine to a
outgoing CPU and reenabling them when the CPU comes online again.
That avoids moving interrupts pointlessly around and breaking and
reestablishing affinities for no value. (Christoph Hellwig)
Note: This contains also the BLOCK-MQ and NVME changes which depend
on the rework of the irq core infrastructure. Jens acked them and
agreed that they should go with the irq changes.
- Consolidation of irq domain code (Marc Zyngier)
- State tracking consolidation in the core code (Jeffy Chen)
- Add debug infrastructure for hierarchical irq domains (Thomas
Gleixner)
- Infrastructure enhancement for managing generic interrupt chips via
devmem (Bartosz Golaszewski)
- Constification work all over the place (Tobias Klauser)
- Two new interrupt controller drivers for MVEBU (Thomas Petazzoni)
- The usual set of fixes, updates and enhancements all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
irqchip/or1k-pic: Fix interrupt acknowledgement
irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
nvme: Allocate queues for all possible CPUs
blk-mq: Create hctx for each present CPU
blk-mq: Include all present CPUs in the default queue mapping
genirq: Avoid unnecessary low level irq function calls
genirq: Set irq masked state when initializing irq_desc
genirq/timings: Add infrastructure for estimating the next interrupt arrival time
genirq/timings: Add infrastructure to track the interrupt timings
genirq/debugfs: Remove pointless NULL pointer check
irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID
irqchip/gic-v3-its: Add ACPI NUMA node mapping
irqchip/gic-v3-its-platform-msi: Make of_device_ids const
irqchip/gic-v3-its: Make of_device_ids const
irqchip/irq-mvebu-icu: Add new driver for Marvell ICU
irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP
dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU
genirq/irqdomain: Remove auto-recursive hierarchy support
irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access
...