Nick Meier reported a regression with HyperV that "
After rebooting the VM, the following messages are logged in syslog
when trying to load the tulip driver:
tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007)
tulip: 0000:00:0a.0: PCI INT A: failed to register GSI
tulip: Cannot enable tulip board #0, aborting
tulip: probe of 0000:00:0a.0 failed with error -16
Errors occur in 3.19.0 kernel
Works in 3.17 kernel.
"
According to the ACPI dump file posted by Nick at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
The ACPI MADT table includes an interrupt source overridden entry for
ACPI SCI:
[236h 0566 1] Subtable Type : 02 <Interrupt Source Override>
[237h 0567 1] Length : 0A
[238h 0568 1] Bus : 00
[239h 0569 1] Source : 09
[23Ah 0570 4] Interrupt : 00000009
[23Eh 0574 2] Flags (decoded below) : 000D
Polarity : 1
Trigger Mode : 3
And in DSDT table, we have _PRT method to define PCI interrupts, which
eventually goes to:
Name (PRSA, ResourceTemplate ()
{
IRQ (Level, ActiveLow, Shared, )
{3,4,5,7,9,10,11,12,14,15}
})
Name (PRSB, ResourceTemplate ()
{
IRQ (Level, ActiveLow, Shared, )
{3,4,5,7,9,10,11,12,14,15}
})
Name (PRSC, ResourceTemplate ()
{
IRQ (Level, ActiveLow, Shared, )
{3,4,5,7,9,10,11,12,14,15}
})
Name (PRSD, ResourceTemplate ()
{
IRQ (Level, ActiveLow, Shared, )
{3,4,5,7,9,10,11,12,14,15}
})
According to the MADT and DSDT tables, IRQ 9 may be used for:
1) ACPI SCI in level, high mode
2) PCI legacy IRQ in level, low mode
So there's a conflict in polarity setting for IRQ 9.
Prior to commit cd68f6bd53 ("x86, irq, acpi: Get rid of special
handling of GSI for ACPI SCI"), ACPI SCI is handled specially and
there's no check for conflicts between ACPI SCI and PCI legagy IRQ.
And it seems that the HyperV hypervisor doesn't make use of the
polarity configuration in IOAPIC entry, so it just works.
Commit cd68f6bd53 gets rid of the specially handling of ACPI SCI,
and then the pin attribute checking code discloses the conflicts
between ACPI SCI and PCI legacy IRQ on HyperV virtual machine,
and rejects the request to assign IRQ9 to PCI devices.
So penalize legacy IRQ used by ACPI SCI and mark it unusable if ACPI
SCI attributes conflict with PCI IRQ attributes.
Please refer to following links for more information:
https://bugzilla.kernel.org/show_bug.cgi?id=101301https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
Fixes: cd68f6bd53 ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI")
Reported-and-tested-by: Nick Meier <nmeier@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: 3.19+ <stable@vger.kernel.org> # 3.19+
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Kconfig currently controlling compilation of this code is:
arch/x86/Kconfig:config X86_CHECK_BIOS_CORRUPTION
arch/x86/Kconfig: bool "Check for low memory corruption"
...meaning that it currently is not being built as a module by
anyone.
Lets remove the couple traces of modularity so that when reading
the code there is no doubt it is builtin-only.
Since module_init translates to device_initcall in the
non-modular case, the init ordering remains unchanged with this
commit.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1440459295-21814-4-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the recent x2apic cleanup I got two things really wrong:
1) The safety check in __disable_x2apic which allows the function to
be called unconditionally is backwards. The check is there to
prevent access to the apic MSR in case that the machine has no
apic. Though right now it returns if the machine has an apic and
therefor the disabling of x2apic is never invoked.
2) x2apic_disable() sets x2apic_mode to 0 after registering the local
apic. That's wrong, because register_lapic_address() checks x2apic
mode and therefor takes the wrong code path.
This results in boot failures on machines with x2apic preenabled by
BIOS and can also lead to an fatal MSR access on machines without
apic.
The solutions are simple:
1) Correct the sanity check for apic availability
2) Clear x2apic_mode _before_ calling register_lapic_address()
Fixes: 659006bf3a 'x86/x2apic: Split enable and setup function'
Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
Cc: stable@vger.kernel.org # 4.0+
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
During later stages of math-emu bootup the following crash triggers:
math_emulate: 0060:c100d0a8
Kernel panic - not syncing: Math emulation needed in kernel
CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
[...]
Call Trace:
[<c181d50d>] dump_stack+0x41/0x52
[<c181c918>] panic+0x77/0x189
[<c1003530>] ? math_error+0x140/0x140
[<c164c2d7>] math_emulate+0xba7/0xbd0
[<c100d0a8>] ? fpu__copy+0x138/0x1c0
[<c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
[<c136ac20>] ? proc_clear_tty+0x40/0x70
[<c136ac6e>] ? session_clear_tty+0x1e/0x30
[<c1003530>] ? math_error+0x140/0x140
[<c1003575>] do_device_not_available+0x45/0x70
[<c100d0a8>] ? fpu__copy+0x138/0x1c0
[<c18258e6>] error_code+0x5a/0x60
[<c1003530>] ? math_error+0x140/0x140
[<c100d0a8>] ? fpu__copy+0x138/0x1c0
[<c100c205>] arch_dup_task_struct+0x25/0x30
[<c1048cea>] copy_process.part.51+0xea/0x1480
[<c115a8e5>] ? dput+0x175/0x200
[<c136af70>] ? no_tty+0x30/0x30
[<c1157242>] ? do_vfs_ioctl+0x322/0x540
[<c104a21a>] _do_fork+0xca/0x340
[<c1057b06>] ? SyS_rt_sigaction+0x66/0x90
[<c104a557>] SyS_clone+0x27/0x30
[<c1824a80>] sysenter_do_call+0x12/0x12
The reason is the incorrect assumption in fpu_copy(), that FNSAVE
can be executed from math-emu kernels as well.
Don't try to copy the registers, the soft state will be copied
by fork anyway, so the child task inherits the parent task's
soft math state.
With this fix applied math-emu kernels boot up fine on modern
hardware and the 'no387 nofxsr' boot options.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On a math-emu bootup the following crash occurs:
Initializing CPU#0
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/traps.c:779!
invalid opcode: 0000 [#1] SMP
[...]
EIP is at do_device_not_available+0xe/0x70
[...]
Call Trace:
[<c18238e6>] error_code+0x5a/0x60
[<c1002bd0>] ? math_error+0x140/0x140
[<c100bbd9>] ? fpu__init_cpu+0x59/0xa0
[<c1012322>] cpu_init+0x202/0x330
[<c104509f>] ? __native_set_fixmap+0x1f/0x30
[<c1b56ab0>] trap_init+0x305/0x346
[<c1b548af>] start_kernel+0x1a5/0x35d
[<c1b542b4>] i386_start_kernel+0x82/0x86
The reason is that in the following commit:
b1276c48e9 ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")
I failed to consider math-emu's limitation that it cannot execute the
FNINIT instruction in kernel mode.
The long term fix might be to allow math-emu to execute (certain) kernel
mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
fix: initialize the emulation state explicitly without trapping out to
the FPU emulator.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Hyper-V top-level functional specification states, that
"algorithms should be resilient to sudden jumps forward or
backward in the TSC value", this means that we should consider
TSC as unstable. In some cases tsc tests are able to detect the
instability, it was detected in 543 out of 646 boots in my
testing:
Measured 6277 cycles TSC warp between CPUs, turning off TSC clock.
tsc: Marking TSC unstable due to check_tsc_sync_source failed
This is, however, just a heuristic. On Hyper-V platform there
are two good clocksources: MSR-based hyperv_clocksource and
recently introduced TSC page.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/1440003264-9949-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We currently register a platform device for e820 type-12 memory and
register a nvdimm bus beneath it. Registering the platform device
triggers the device-core machinery to probe for a driver, but that
search currently comes up empty. Building the nvdimm-bus registration
into the e820_pmem platform device registration in this way forces
libnvdimm to be built-in. Instead, convert the built-in portion of
CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
rest of the logic to the driver for e820_pmem, for the following
reasons:
1/ Letting e820_pmem support be a module allows building and testing
libnvdimm.ko changes without rebooting
2/ All the normal policy around modules can be applied to e820_pmem
(unbind to disable and/or blacklisting the module from loading by
default)
3/ Moving the driver to a generic location and converting it to scan
"iomem_resource" rather than "e820.map" means any other architecture can
take advantage of this simple nvdimm resource discovery mechanism by
registering a resource named "Persistent Memory (legacy)"
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Alex Deucher, Mark Rustad and Alexander Holler reported a regression
with the latest v4.2-rc4 kernel, which breaks some SATA controllers.
With multi-MSI capable SATA controllers, only the first port works,
all other ports time out when executing SATA commands.
This happens because the first argument to assign_irq_vector_policy()
is always the base linux irq number of the multi MSI interrupt block,
so all subsequent vector assignments operate on the base linux irq
number, so all MSI irqs are handled as the first irq number. Therefor
the other MSI irqs of a device are never set up correctly and never
fire.
Add the loop iterator to the base irq number so all vectors are
assigned correctly.
Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Reported-and-tested-by: Alex Deucher <alexdeucher@gmail.com>
Reported-and-tested-by: Mark Rustad <mrustad@gmail.com>
Reported-and-tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439911228-9880-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge x86 fixes from Ingo Molnar:
"Two followup fixes related to the previous LDT fix"
Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
x86/ldt: Further fix FPU emulation
x86/ldt: Correct FPU emulation access to LDT
x86/ldt: Correct LDT access in single stepping logic
This reverts commits 9a036b93a3 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and c6f2062935 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").
They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).
Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhang Liguang reported the following issue:
1) System detects a CMCI storm on the current CPU.
2) Kernel disables the CMCI interrupt on banks owned by the
current CPU and switches to poll mode
3) After the CMCI storm subsides, kernel switches back to
interrupt mode
4) We expect the system to reenable the CMCI interrupt on banks
owned by the current CPU
mce_intel_adjust_timer
|-> cmci_reenable
|-> cmci_discover # owned banks are ignored here
static void cmci_discover(int banks)
...
for (i = 0; i < banks; i++) {
...
if (test_bit(i, owned)) # ownd banks is ignore here
continue;
So convert cmci_storm_disable_banks() to
cmci_toggle_interrupt_mode() which controls whether to enable or
disable CMCI interrupts with its argument.
NB: We cannot clear the owned bit because the banks won't be
polled, otherwise. See:
27f6c573e0 ("x86, CMCI: Add proper detection of end of CMCI storms")
for more info.
Reported-by: Zhang Liguang <zhangliguang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.15+
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: huawei.libin@huawei.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: rui.xiang@huawei.com
Link: http://lkml.kernel.org/r/1439396985-12812-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
printk() is not safe to use in MCE context. Add a lockless
memory allocator pool to save error records in MCE context.
Those records will be issued later, in a printk-safe context.
The idea is inspired by the APEI/GHES driver.
We're very conservative and allocate only two pages for it but
since we're going to use those pages throughout the system's
lifetime, we allocate them statically to avoid early boot time
allocation woes.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Rewrite. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1439396985-12812-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A PKCS#7 or CMS message can have per-signature authenticated attributes
that are digested as a lump and signed by the authorising key for that
signature. If such attributes exist, the content digest isn't itself
signed, but rather it is included in a special authattr which then
contributes to the signature.
Further, we already require the master message content type to be
pkcs7_signedData - but there's also a separate content type for the data
itself within the SignedData object and this must be repeated inside the
authattrs for each signer [RFC2315 9.2, RFC5652 11.1].
We should really validate the authattrs if they exist or forbid them
entirely as appropriate. To this end:
(1) Alter the PKCS#7 parser to reject any message that has more than one
signature where at least one signature has authattrs and at least one
that does not.
(2) Validate authattrs if they are present and strongly restrict them.
Only the following authattrs are permitted and all others are
rejected:
(a) contentType. This is checked to be an OID that matches the
content type in the SignedData object.
(b) messageDigest. This must match the crypto digest of the data.
(c) signingTime. If present, we check that this is a valid, parseable
UTCTime or GeneralTime and that the date it encodes fits within
the validity window of the matching X.509 cert.
(d) S/MIME capabilities. We don't check the contents.
(e) Authenticode SP Opus Info. We don't check the contents.
(f) Authenticode Statement Type. We don't check the contents.
The message is rejected if (a) or (b) are missing. If the message is
an Authenticode type, the message is rejected if (e) is missing; if
not Authenticode, the message is rejected if (d) - (f) are present.
The S/MIME capabilities authattr (d) unfortunately has to be allowed
to support kernels already signed by the pesign program. This only
affects kexec. sign-file suppresses them (CMS_NOSMIMECAP).
The message is also rejected if an authattr is given more than once or
if it contains more than one element in its set of values.
(3) Add a parameter to pkcs7_verify() to select one of the following
restrictions and pass in the appropriate option from the callers:
(*) VERIFYING_MODULE_SIGNATURE
This requires that the SignedData content type be pkcs7-data and
forbids authattrs. sign-file sets CMS_NOATTR. We could be more
flexible and permit authattrs optionally, but only permit minimal
content.
(*) VERIFYING_FIRMWARE_SIGNATURE
This requires that the SignedData content type be pkcs7-data and
requires authattrs. In future, this will require an attribute
holding the target firmware name in addition to the minimal set.
(*) VERIFYING_UNSPECIFIED_SIGNATURE
This requires that the SignedData content type be pkcs7-data but
allows either no authattrs or only permits the minimal set.
(*) VERIFYING_KEXEC_PE_SIGNATURE
This only supports the Authenticode SPC_INDIRECT_DATA content type
and requires at least an SpcSpOpusInfo authattr in addition to the
minimal set. It also permits an SPC_STATEMENT_TYPE authattr (and
an S/MIME capabilities authattr because the pesign program doesn't
remove these).
(*) VERIFYING_KEY_SIGNATURE
(*) VERIFYING_KEY_SELF_SIGNATURE
These are invalid in this context but are included for later use
when limiting the use of X.509 certs.
(4) The pkcs7_test key type is given a module parameter to select between
the above options for testing purposes. For example:
echo 1 >/sys/module/pkcs7_test_key/parameters/usage
keyctl padd pkcs7_test foo @s </tmp/stuff.pkcs7
will attempt to check the signature on stuff.pkcs7 as if it contains a
firmware blob (1 being VERIFYING_FIRMWARE_SIGNATURE).
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
Pull RCU changes from Paul E. McKenney:
- The combination of tree geometry-initialization simplifications
and OS-jitter-reduction changes to expedited grace periods.
These two are stacked due to the large number of conflicts
that would otherwise result.
[ With one addition, a temporary commit to silence a lockdep false
positive. Additional changes to the expedited grace-period
primitives (queued for 4.4) remove the cause of this false
positive, and therefore include a revert of this temporary commit. ]
- Documentation updates.
- Torture-test updates.
- Miscellaneous fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently we only update the sysfs event files per available MSR, we
didn't actually disallow creating unlisted events.
Rework things such that the dectection, sysfs listing and event
creation are better coordinated.
Sadly it appears it's impossible to probe R/O MSRs under virt. This
means we have to do the full model table to avoid listing all MSRs all
the time.
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Migrate i8253 driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Its return value is not used by the subsys core and nothing meaningful
can be done with it, even if we want to use it. The subsys device is
anyway getting removed.
Update prototype of ->remove_dev() to make its return type as void. Fix
all usage sites as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>