Handle the misconfiguration where CONFIG_PHYSICAL_START is
incompatible with CONFIG_PHYSICAL_ALIGN. This is a configuration
error, but one which arises easily since Kconfig doesn't have the
smarts to express the true relationship between these two variables.
Hence, align __PHYSICAL_START the same way we align LOAD_PHYSICAL_ADDR
in <asm/boot.h>.
For non-relocatable kernels, this would cause the boot to fail.
[ Impact: fix boot failures for non-relocatable kernels ]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Ed found that on 32-bit, boot_cpu_physical_apicid is not read right,
when the mptable is broken.
Interestingly, actually three paths use/set it:
1. acpi: at that time that is already read from reg
2. mptable: only read from mptable
3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit
so we could read the apic id for the 2/3 path. We trust the hardware
register more than we trust a BIOS data structure (the mptable).
We can also avoid the double set_fixmap() when acpi_lapic
is used, and also need to move cpu_has_apic earlier and
call apic_disable().
Also when need to update the apic id, we'd better read and
set the apic version as well - so that quirks are applied precisely.
v2: make path 3 with 64bit, use -1 as apic id, so could read it later.
v3: fix whitespace problem pointed out by Ed Swierk
v5: fix boot crash
[ Impact: get correct apic id for bsp other than acpi path ]
Reported-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <49FC85A9.2070702@kernel.org>
[ v4: sanity-check in the ACPI case too ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: both topics modify the APIC code but were able to do it in
parallel so far. An upcoming patch generates a conflict so
merge them to avoid the conflict.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Solve issues described in 6f66cbc630
in a way that doesn't resort to set_cpus_allowed();
* in fact, only collect_cpu_info and apply_microcode callbacks
must run on a target cpu, others will do just fine on any other.
smp_call_function_single() (as suggested by Ingo) is used to run
these callbacks on a target cpu.
* cleanup of synchronization logic of the 'microcode_core' part
The generic 'microcode_core' part guarantees that only a single cpu
(be it a full-fledged cpu, one of the cores or HT)
is being updated at any particular moment of time.
In general, there is no need for any additional sync. mechanism in
arch-specific parts (the patch removes existing spinlocks).
See also the "Synchronization" section in microcode_core.c.
* return -EINVAL instead of -1 (which is translated into -EPERM) in
microcode_write(), reload_cpu() and mc_sysdev_add(). Other suggestions
for an error code?
* use 'enum ucode_state' as return value of request_microcode_{fw, user}
to gain more flexibility by distinguishing between real error cases
and situations when an appropriate ucode was not found (which is not an
error per-se).
* some minor cleanups
Thanks a lot to Hugh Dickins for review/suggestions/testing!
Reference: http://marc.info/?l=linux-kernel&m=124025889012541&w=2
[ Impact: refactor and clean up microcode driver locking code ]
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Oruba <peter.oruba@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <1242078507.5560.9.camel@earth>
[ did some more cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/include/asm/microcode.h | 25 ++
arch/x86/kernel/microcode_amd.c | 58 ++----
arch/x86/kernel/microcode_core.c | 326 +++++++++++++++++++++-----------------
arch/x86/kernel/microcode_intel.c | 92 +++-------
4 files changed, 261 insertions(+), 240 deletions(-)
(~20 new comment lines)
A long ago, in days of yore, it all began with a god named Thor.
There were vikings and boats and some plans for a Linux kernel
header. Unfortunately, a single 8-bit field was used for bootloader
type and version. This has generally worked without *too* much pain,
but we're getting close to flat running out of ID fields.
Add extension fields for both type and version. The type will be
extended if it the old field is 0xE; the version is a simple MSB
extension.
Keep /proc/sys/kernel/bootloader_type containing
(type << 4) + (ver & 0xf) for backwards compatiblity, but also add
/proc/sys/kernel/bootloader_version which contains the full version
number.
[ Impact: new feature to support more bootloaders ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Make the kernel_alignment field adjustable; this allows us to set it
to a large value (intended to be 16 MB to avoid ZONE_DMA contention,
memory holes and other weirdness) while a smart bootloader can still
force a loading at a lesser alignment if absolutely necessary.
Also export pref_address (preferred loading address, corresponding to
the link-time address) and init_size, the total amount of linear
memory the kernel will require during initialization.
[ Impact: allows better kernel placement, gives bootloader more info ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix to prevent sched_mc_power_saving from being exported through sysfs
for multi-scoket single core system. Max cores should be always greater than
one (1). My earlier patch that introduced fix for not exporting
'sched_mc_power_saving' on laptops broke it on multi-socket single core
system. This fix addresses issue on both laptop and multi-socket single
core system.
Below are the Test results:
1. Single socket - multi-core
Before Patch: Does not export 'sched_mc_power_saving'
After Patch: Does not export 'sched_mc_power_saving'
Result: Pass
2. Multi Socket - single core
Before Patch: exports 'sched_mc_power_saving'
After Patch: Does not export 'sched_mc_power_saving'
Result: Pass
3. Multi Socket - Multi core
Before Patch: exports 'sched_mc_power_saving'
After Patch: exports 'sched_mc_power_saving'
[ Impact: make the sched_mc_power_saving control available more consistently ]
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090511143914.GB4853@dirshya.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- the byte operand constraints were wrong for 32-bit
- the to-op's input operands weren't properly parenthesized
[ Impact: fix possible miscompilation or build failure ]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Both print_local_APIC (used when apic=debug kernel param is set) and
cpu_debug code missed support for some extended APIC registers that
I'd like to see.
This adds support to show:
- extended APIC feature register
- extended APIC control register
- extended LVT registers
[ Impact: print more debug info ]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090508162350.GO29045@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
setup_force_cpu_cap() only have one user (Xen guest code),
but it should not reuse cleared_cpu_cpus, otherwise it
will have problems on SMP.
Need to have a separate cpu_cpus_set array too, for forced-on
flags, beyond the forced-off flags.
Also need to setup handling before all cpus caps are combined.
[ Impact: fix the forced-set CPU feature flag logic ]
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch to call mp_config_acpi_gsi() from the ACPI IRQ registration
code never got mainline because there were open discussions about it.
This call is needed to properly update the kernel's copy of the mptable,
when the update_mptable boot parameter is needed.
Now that the dust has settled with the APIC unification, and since there
were no objections when the patch was re-submitted, try this again.
[ Impact: fix the update_mptable boot parameter ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A01C387.7090103@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Native x86-64 uses the IST mechanism to run int3 and debug traps on
an alternative stack. Xen does not do this, and so the frames were
being misinterpreted by the ptrace code. This change special-cases
these two exceptions by using Xen variants which run on the normal
kernel stack properly.
Impact: avoid crash or bad data when IST trap is invoked under Xen
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
MSRC001_0015 Hardware Configuration Register (HWCR) is already defined
as MSR_K7_HWCR.
And HWCR is available for >= K7.
So MSR_K8_HWCR is not required and no-one is using it.
[ Impact: cleanup, no object code change ]
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Conflicts:
arch/frv/include/asm/pgtable.h
arch/x86/include/asm/required-features.h
arch/x86/xen/mmu.c
Merge reason: x86/xen was on a .29 base still, move it to a fresher
branch and pick up Xen fixes as well, plus resolve
conflicts
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: this topic is ready for upstream now. It passed
Oleg's review and Andrew had no further mm/*
objections/observations either.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: tracing/core was on a .30-rc1 base and was missing out on
on a handful of tracing fixes present in .30-rc5-almost.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend the maximum addressable memory on x86-64 from 2^44 to
2^46 bytes. This requires some shuffling around of the vmalloc
and virtual memmap memory areas, to keep them away from the
direct mapping of up to 64TB of physical memory.
This patch also introduces a guard hole between the vmalloc
area and the virtual memory map space. There's really no
good reason why we wouldn't have a guard hole there.
[ Impact: future hardware enablement ]
Signed-off-by: Rik van Riel <riel@redhat.com>
LKML-Reference: <20090505172856.6820db22@cuia.bos.redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86, mce: fix boot logging logic
x86, mce: make polling timer interval per CPU
Conflicts:
arch/x86/kernel/apic/io_apic.c
Merge reason: non-trivial interaction between ongoing work in io_apic.c
and the NUMA migration feature in the irq tree.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is necessary to avoid the conflict of syscall numbers.
Conflicts:
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
Fixes up the borked syscall numbers of perfcounters versus
preadv/pwritev as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The patch adds kernel parameter intel_iommu=pt to set up pass through
mode in context mapping entry. This disables DMAR in linux kernel; but
KVM still runs on VT-d and interrupt remapping still works.
In this mode, kernel uses swiotlb for DMA API functions but other VT-d
functionalities are enabled for KVM. KVM always uses multi level
translation page table in VT-d. By default, pass though mode is disabled
in kernel.
This is useful when people don't want to enable VT-d DMAR in kernel but
still want to use KVM and interrupt remapping for reasons like DMAR
performance concern or debug purpose.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Weidong Han <weidong@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Alternative header duplicates assembly that could be merged in
one single macro. Merging this into this macro also allows to
directly declare ALTERNATIVE() statements within assembly code.
Uses a __stringify() of the feature bits rather than passing a
"i" operand. Leave the old %0 operand as-is (set to 0), unused
to stay compatible with API.
(v2: tab alignment fixes)
[ Impact: cleanup ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <20090428151346.GA31212@Krystal>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/ptrace.c
Merge reason: fix the conflict above, and also pick up the CONFIG_BROKEN
dependency change from upstream so that we can remove it
here.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
currently these are paravirtulaized, doesn't appear any callers rely on
this (no pv_ops backends are using native_tlb and overriding cr3/4
access).
[ Impact: fix lockdep warning with paravirt and function tracer ]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <20090423172138.GR3036@sequoia.sous-sol.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move
the weak version from common.c to i386.c, and before calling, make sure it's a
root bus.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The earlier patch to change the poller to a separate function subtly
broke the boot logging logic. This could lead to machine checks
getting logged at boot even when disabled or defaulting to off
on some systems. Fix that.
[ Impact: bug fix - avoid spurious MCE in log ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When interrupt-remapping is enabled, we are relying on
setup_IO_APIC_irqs() to configure remapped entries in the
IO-APIC, which comes little bit later after enabling
interrupt-remapping.
Meanwhile, restoration of old io-apic entries after enabling
interrupt-remapping will not make the interrupts through
io-apic functional anyway.
So remove the unnecessary reinit_intr_remapped_IO_APIC() step.
The longer story:
When interrupt-remapping is enabled, IO-APIC entries need to be
setup in the re-mappable format (pointing to
interrupt-remapping table entries setup by the OS). This
remapping configuration is happening in the same place where we
traditionally configure IO-APIC (i.e., in
setup_IO_APIC_irqs()).
So when we enable interrupt-remapping successfully, there is no
need to restore old io-apic RTE entries before we actually do a
complete configuration shortly in setup_IO_APIC_irqs(). Old
IO-APIC RTE's may be in traditional format (non re-mappable) or
in re-mappable format pointing to interrupt-remapping table
entries setup by BIOS. Restoring both of these will not make
IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for
proper configuration by OS.
So I am removing this unnecessary and broken step.
[ Impact: remove unnecessary/broken IO-APIC setup step ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Weidong Han <weidong.han@intel.com>
Cc: dwmw2@infradead.org
LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU(). This means that
architectures that have a small data section references relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.
On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section. The linker throws
up the following errors:
kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU(). However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes guest crash 'lguest: bad read address 0x4800000 len 256'
The new per-cpu allocator ends up handing a non-linear address to
write_gdt_entry. We do __pa() on it, and hand it to the host, which
kills us.
I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
code, but had no pressing reason until now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
Currently, when x2apic is not enabled, interrupt remapping
will be enabled in init_dmars(), where it is too late to remap
ioapic interrupts, that is, ioapic interrupts are really in
compatibility mode, not remappable mode.
This patch always enables interrupt remapping before ioapic
setup, it guarantees all interrupts will be remapped when
interrupt remapping is enabled. Thus it doesn't need to set
the compatibility interrupt bit.
[ Impact: refactor intr-remap init sequence, enable fuller remap mode ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because
ack_apic_edge() does more than just ack: it also does irq migration
in the non-interrupt-remapping case. But there is no such need for
interrupt-remapping case, as irq migration is done in the process
context.
Similarly, ir_ack_apic_level() shouldn't call ack_apic_level, and
instead should do the local cpu's EOI + directed EOI to the io-apic.
ack_x2APIC_irq() is not neccessary, because ack_APIC_irq() will use MSR
write for x2apic, and uncached write for non-x2apic.
[ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix microcode driver newly spewing warnings
x86, PAT: Remove page granularity tracking for vm_insert_pfn maps
x86: disable X86_PTRACE_BTS for now
x86, documentation: kernel-parameters replace X86-32,X86-64 with X86
x86: pci-swiotlb.c swiotlb_dma_ops should be static
x86, PAT: Remove duplicate memtype reserve in devmem mmap
x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()
x86, PAT: Changing memtype to WC ensuring no WB alias
x86, PAT: Handle faults cleanly in set_memory_ APIs
x86, PAT: Change order of cpa and free in set_memory_wb
x86, CPA: Change idmap attribute before ioremap attribute setup
* 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: UV BAU distribution and payload MMRs
x86: UV: BAU partition-relative distribution map
x86, uv: add Kconfig dependency on NUMA for UV systems
x86: prevent /sys/firmware/sgi_uv from being created on non-uv systems
x86, UV: Fix for nodes with memory and no cpus
x86, UV: system table in bios accessed after unmap
x86: UV BAU messaging timeouts
x86: UV BAU and nodes with no memory
Converting node_to_k8_nb_misc() from a macro to an inline function
makes compiler see the 'node' parameter in the !CONFIG_K8_NB too,
which eliminates these compiler warnings:
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘show_cache_disable’:
arch/x86/kernel/cpu/intel_cacheinfo.c:712: warning: unused variable ‘node’
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘store_cache_disable’:
arch/x86/kernel/cpu/intel_cacheinfo.c:739: warning: unused variable ‘node’
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1239730477.2966.26.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>