For being able to relocate pre-allocated data areas like initrd or
p2m list it is mandatory to find a contiguous memory area which is
not yet in use and doesn't conflict with the memory map we want to
be in effect.
In case such an area is found reserve it at once as this will be
required to be done in any case.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Memory pages in the initial memory setup done by the Xen hypervisor
conflicting with the target E820 map are remapped. In order to do this
those pages are counted and remapped in xen_set_identity_and_remap().
Split the counting from the remapping operation to be able to setup
the needed memory sizes in time but doing the remap operation at a
later time. This enables us to simplify the interface to
xen_set_identity_and_remap() as the number of remapped and released
pages is no longer needed here.
Finally move the remapping further down to prepare relocating
conflicting memory contents before the memory might be clobbered by
xen_set_identity_and_remap(). This requires to not destroy the Xen
E820 map when the one for the system is being constructed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Instead of using a function local static e820 map in xen_memory_setup()
and calling various functions in the same source with the map as a
parameter use a map directly accessible by all functions in the source.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.
As the initial P->M table is copied rather early during boot to
domain private memory and it's initial virtual mapping is dropped,
the easiest way to avoid virtual address conflicts with other
addresses in the kernel is to use a user address area for the
virtual address of the initial P->M table. This allows us to just
throw away the page tables of the initial mapping after the copy
without having to care about address invalidation.
It should be noted that this patch won't enable a pv-domain to USE
more than 512 GB of RAM. It just enables it to be started with a
P->M table covering more memory. This is especially important for
being able to boot a Dom0 on a system with more than 512 GB memory.
Signed-off-by: Juergen Gross <jgross@suse.com>
Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
In case the Xen tools indicate they don't need the p2m 3 level tree
as they support the virtual mapped linear p2m list, just omit building
the tree.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The virtual address of the linear p2m list should be stored in the
shared info structure read by the Xen tools to be able to support
64 bit pv-domains larger than 512 GB. Additionally the linear p2m
list interface includes a generation count which is changed prior
to and after each mapping change of the p2m list. Reading the
generation count the Xen tools can detect changes of the mappings
and re-read the p2m list eventually.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Currently, the event channel rebind code is gated with the presence of
the vector callback.
The virtual interrupt controller on ARM has the concept of per-CPU
interrupt (PPI) which allow us to support per-VCPU event channel.
Therefore there is no need of vector callback for ARM.
Xen is already using a free PPI to notify the guest VCPU of an event.
Furthermore, the xen code initialization in Linux (see
arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ.
Introduce new helper xen_support_evtchn_rebind to allow architecture
decide whether rebind an event is support or not. It will always return
true on ARM and keep the same behavior on x86.
This is also allow us to drop the usage of xen_have_vector_callback
entirely in the ARM code.
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xen_has_pv_devices() has no parameters, so use the normal void
parameter convention to make it match the prototype in the header file
include/xen/platform_pci.h.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Since commit feb44f1f7a (x86/xen:
Provide a "Xen PV" APIC driver to support >255 VCPUs) Xen guests need
a full APIC driver and thus should depend on X86_LOCAL_APIC.
This fixes an i386 build failure with !SMP && !CONFIG_X86_UP_APIC by
disabling Xen support in this configuration.
Users needing Xen support in a non-SMP i386 kernel will need to enable
CONFIG_X86_UP_APIC.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it. Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty. Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in. Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:
1/ Letting e820_pmem support be a module allows building and testing
libnvdimm.ko changes without rebooting
2/ All the normal policy around modules can be applied to e820_pmem
(unbind to disable and/or blacklisting the module from loading by
default)
3/ Moving the driver to a generic location and converting it to scan
"iomem_resource" rather than "e820.map" means any other architecture can
take advantage of this simple nvdimm resource discovery mechanism by
registering a resource named "Persistent Memory (legacy)"
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Alex Deucher, Mark Rustad and Alexander Holler reported a regression
with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
With multi-MSI capable SATA controllers, only the first port works,
all other ports time out when executing SATA commands.
This happens because the first argument to assign_irq_vector_policy()
is always the base linux irq number of the multi MSI interrupt block,
so all subsequent vector assignments operate on the base linux irq
number, so all MSI irqs are handled as the first irq number. Therefor
the other MSI irqs of a device are never set up correctly and never
fire.
Add the loop iterator to the base irq number so all vectors are
assigned correctly.
Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Reported-and-tested-by: Alex Deucher <alexdeucher@gmail.com>
Reported-and-tested-by: Mark Rustad <mrustad@gmail.com>
Reported-and-tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439911228-9880-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge x86 fixes from Ingo Molnar:
"Two followup fixes related to the previous LDT fix"
Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
x86/ldt: Further fix FPU emulation
x86/ldt: Correct FPU emulation access to LDT
x86/ldt: Correct LDT access in single stepping logic
Pull KVM fixes from Paolo Bonzini:
"Just two very small & simple patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
KVM: x86: zero IDT limit on entry to SMM
VMX encodes access rights differently from LAR, and the latter is
most likely what x86 people think of when they think of "access
rights".
Rename them to avoid confusion.
Cc: kvm@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kill arch_memremap_pmem() and just let the architecture specify the
flags to be passed to memremap(). Default to writethrough by default.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Conflicts:
drivers/net/ethernet/cavium/Kconfig
The cavium conflict was overlapping dependency
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3f5159a922 ("x86/asm/entry/32: Update -ENOSYS handling to match
the 64-bit logic") broke the ENOSYS handling for the 32-bit compat case.
The proper error return value was never loaded into %rax, except if
things just happened to go through the audit paths, which ended up
reloading the return value.
This moves the loading or %rax into the normal system call path, just to
make sure the error case triggers it. It's kind of sad, since it adds a
useless instruction to reload the register to the fast path, but it's
not like that single load from the stack is going to be noticeable.
Reported-by: David Drysdale <drysdale@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull xen bug fixes from David Vrabel:
- revert a fix from 4.2-rc5 that was causing lots of WARNING spam.
- fix a memory leak affecting backends in HVM guests.
- fix PV domU hang with certain configurations.
* tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
Revert "xen/events/fifo: Handle linked events when closing a port"
x86/xen: build "Xen PV" APIC driver for domU as well
This reverts commits 9a036b93a3 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and c6f2062935 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").
They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).
Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Liguang reported the following issue:
1) System detects a CMCI storm on the current CPU.
2) Kernel disables the CMCI interrupt on banks owned by the
current CPU and switches to poll mode
3) After the CMCI storm subsides, kernel switches back to
interrupt mode
4) We expect the system to reenable the CMCI interrupt on banks
owned by the current CPU
mce_intel_adjust_timer
|-> cmci_reenable
|-> cmci_discover # owned banks are ignored here
static void cmci_discover(int banks)
...
for (i = 0; i < banks; i++) {
...
if (test_bit(i, owned)) # ownd banks is ignore here
continue;
So convert cmci_storm_disable_banks() to
cmci_toggle_interrupt_mode() which controls whether to enable or
disable CMCI interrupts with its argument.
NB: We cannot clear the owned bit because the banks won't be
polled, otherwise. See:
27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")
for more info.
Reported-by: Zhang Liguang <zhangliguang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.15+
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: huawei.libin@huawei.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: rui.xiang@huawei.com
Link: http://lkml.kernel.org/r/1439396985-12812-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
printk() is not safe to use in MCE context. Add a lockless
memory allocator pool to save error records in MCE context.
Those records will be issued later, in a printk-safe context.
The idea is inspired by the APEI/GHES driver.
We're very conservative and allocate only two pages for it but
since we're going to use those pages throughout the system's
lifetime, we allocate them statically to avoid early boot time
allocation woes.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Rewrite. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Boris reported that gcc version 4.4.4 20100503 (Red Hat
4.4.4-2) fails to build linux-next kernels that have
this fresh commit via the locking tree:
11276d5306 ("locking/static_keys: Add a new static_key interface")
The problem appears to be that even though @key and @branch are
compile time constants, it doesn't see the following expression
as an immediate value:
&((char *)key)[branch]
More recent GCCs don't appear to have this problem.
In particular, Red Hat backported the 'asm goto' feature into 4.4,
'normal' 4.4 compilers will not have this feature and thus not
run into this asm.
The workaround is to supply both values to the asm as immediates
and do the addition in asm.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A PKCS#7 or CMS message can have per-signature authenticated attributes
that are digested as a lump and signed by the authorising key for that
signature. If such attributes exist, the content digest isn't itself
signed, but rather it is included in a special authattr which then
contributes to the signature.
Further, we already require the master message content type to be
pkcs7_signedData - but there's also a separate content type for the data
itself within the SignedData object and this must be repeated inside the
authattrs for each signer [RFC2315 9.2, RFC5652 11.1].
We should really validate the authattrs if they exist or forbid them
entirely as appropriate. To this end:
(1) Alter the PKCS#7 parser to reject any message that has more than one
signature where at least one signature has authattrs and at least one
that does not.
(2) Validate authattrs if they are present and strongly restrict them.
Only the following authattrs are permitted and all others are
rejected:
(a) contentType. This is checked to be an OID that matches the
content type in the SignedData object.
(b) messageDigest. This must match the crypto digest of the data.
(c) signingTime. If present, we check that this is a valid, parseable
UTCTime or GeneralTime and that the date it encodes fits within
the validity window of the matching X.509 cert.
(d) S/MIME capabilities. We don't check the contents.
(e) Authenticode SP Opus Info. We don't check the contents.
(f) Authenticode Statement Type. We don't check the contents.
The message is rejected if (a) or (b) are missing. If the message is
an Authenticode type, the message is rejected if (e) is missing; if
not Authenticode, the message is rejected if (d) - (f) are present.
The S/MIME capabilities authattr (d) unfortunately has to be allowed
to support kernels already signed by the pesign program. This only
affects kexec. sign-file suppresses them (CMS_NOSMIMECAP).
The message is also rejected if an authattr is given more than once or
if it contains more than one element in its set of values.
(3) Add a parameter to pkcs7_verify() to select one of the following
restrictions and pass in the appropriate option from the callers:
(*) VERIFYING_MODULE_SIGNATURE
This requires that the SignedData content type be pkcs7-data and
forbids authattrs. sign-file sets CMS_NOATTR. We could be more
flexible and permit authattrs optionally, but only permit minimal
content.
(*) VERIFYING_FIRMWARE_SIGNATURE
This requires that the SignedData content type be pkcs7-data and
requires authattrs. In future, this will require an attribute
holding the target firmware name in addition to the minimal set.
(*) VERIFYING_UNSPECIFIED_SIGNATURE
This requires that the SignedData content type be pkcs7-data but
allows either no authattrs or only permits the minimal set.
(*) VERIFYING_KEXEC_PE_SIGNATURE
This only supports the Authenticode SPC_INDIRECT_DATA content type
and requires at least an SpcSpOpusInfo authattr in addition to the
minimal set. It also permits an SPC_STATEMENT_TYPE authattr (and
an S/MIME capabilities authattr because the pesign program doesn't
remove these).
(*) VERIFYING_KEY_SIGNATURE
(*) VERIFYING_KEY_SELF_SIGNATURE
These are invalid in this context but are included for later use
when limiting the use of X.509 certs.
(4) The pkcs7_test key type is given a module parameter to select between
the above options for testing purposes. For example:
echo 1 >/sys/module/pkcs7_test_key/parameters/usage
keyctl padd pkcs7_test foo @s </tmp/stuff.pkcs7
will attempt to check the signature on stuff.pkcs7 as if it contains a
firmware blob (1 being VERIFYING_FIRMWARE_SIGNATURE).
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
Pull RCU changes from Paul E. McKenney:
- The combination of tree geometry-initialization simplifications
and OS-jitter-reduction changes to expedited grace periods.
These two are stacked due to the large number of conflicts
that would otherwise result.
[ With one addition, a temporary commit to silence a lockdep false
positive. Additional changes to the expedited grace-period
primitives (queued for 4.4) remove the cause of this false
positive, and therefore include a revert of this temporary commit. ]
- Documentation updates.
- Torture-test updates.
- Miscellaneous fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enable CONFIG_JUMP_LABEL in the defconfigs, the feature already deals with
GCC not having the asm-goto feature so will not break the build on
older compilers.
Having it enabled generates a faster kernel at very little extra cost
since we already include all the code patching code by having KPROBES
enabled.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>