The map size counter passed into, and back out of, sanitize_e820_map(),
was an eight bit type (char or u8), as derived from its origins in
legacy BIOS E820 structures. This patch changes that type to an 'int',
to allow this sanitize routine to also be used on larger maps (larger
than the 256 count that fits in a char). The legacy BIOS E820 interface
of course does not change; that remains at 8 bits for this count, holding
up to E820MAX == 128 entries. But the kernel internals can handle more
when those additional memory map entries are passed from the BIOS via
EFI interfaces.
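For illustration, the declaration ends up roughly as follows (the return type and parameter names here are assumptions, not copied from the patch):
  /* before: the count pointer forces the map size to fit in 8 bits */
  int sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, char *pnr_map);
  /* after: an int count can describe maps with more than 256 entries */
  int sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, int *pnr_map);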
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend the internal boot-time memory tables to allow for up to
three entries per node; across many nodes, that total can exceed
the 128 E820MAX entries handled by the legacy BIOS E820 interface.
The EFI interface, if present, is capable of passing memory map
entries for these larger node counts.
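One plausible shape for the enlarged bound, purely as a sketch; the macro name and formula below are assumptions, not taken from the patch:
  #ifdef CONFIG_EFI
  #define E820_X_MAX (E820MAX + 3 * MAX_NUMNODES)  /* up to 3 extra entries per node */
  #else
  #define E820_X_MAX E820MAX                       /* legacy BIOS path unchanged */
  #endif
  struct e820entry map[E820_X_MAX];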
This patch requires an earlier patch that rewrote code depending
on these array sizes to size its loops with ARRAY_SIZE() of the
applicable array, instead of with E820MAX explicitly.
Another patch following this one will provide the code to pick
up additional memory entries passed via the EFI interface from
the BIOS and insert them in the following, now enlarged, arrays.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is motivated by a subsequent patch which will allow for more
memory map entries on EFI supported systems than can be passed via the x86
legacy BIOS E820 interface. The legacy interface is limited to E820MAX ==
128 memory entries, and that "E820MAX" manifest constant was used as the
size for several arrays and loops over those arrays.
The primary change in this patch is to convert the loops over those
arrays from using the constant E820MAX as their bound to using the
ARRAY_SIZE() macro, evaluated on the array being looped over. That way,
a subsequent patch can change the size of some of these arrays without
breaking this code.
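As a sketch of the conversion (process_entry() here is a stand-in, not actual kernel code):
  /* before: bound hard-wired to the legacy limit */
  for (i = 0; i < E820MAX; i++)
          process_entry(&e820.map[i]);
  /* after: bound follows whatever size the array actually has */
  for (i = 0; i < ARRAY_SIZE(e820.map); i++)
          process_entry(&e820.map[i]);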
This patch also adds a parameter to the sanitize_e820_map() routine,
which previously assumed an implicit size of E820MAX entries for the
array passed to it. The new parameter passes the size of that array
explicitly. Once again,
this will allow a subsequent patch to change that array size for some
calls to sanitize_e820_map() without breaking the code.
As part of enhancing the sanitize_e820_map() interface this way, I further
combined the unnecessarily distinct x86_32 and x86_64 declarations for
this routine into a single, commonly used, declaration.
This patch in itself should make no difference to the resulting kernel
binary.
[ mingo@elte.hu: merged to -tip ]
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The use of #defines with '##' pre-processor concatenation is a useful
way to form several symbol names with a common pattern. But when there
is just a single name obtained from that #define, it's just obfuscation.
Better to just write the plain symbol name, as is.
The following patch is the result of my wasting ten minutes looking
through the kernel to figure out what 'PB_migrate_end' meant, and
forgetting what I had come to do by the time I figured out that the
PB_range() #define defined it.
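For reference, the pattern looks roughly like this (reconstructed for illustration, so the exact values are approximate):
  /* before (obfuscated): 'PB_migrate_end' only comes into existence via '##' */
  #define PB_range(name, required_bits) \
          name, name ## _end = (name + required_bits) - 1
  enum pageblock_bits {
          PB_range(PB_migrate, 3),        /* silently defines PB_migrate and PB_migrate_end */
          NR_PAGEBLOCK_BITS
  };
  /* after (clearer): spell the symbols out so grep can find them */
  enum pageblock_bits {
          PB_migrate,
          PB_migrate_end = PB_migrate + 3 - 1,
          NR_PAGEBLOCK_BITS
  };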
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove three extern declarations for routines
that don't exist. Fix a typo in a comment.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch:
x86: convert cpu_to_apicid to be a per cpu variable
introduced a dependency of ipi.h on smp.h in x86 allnoconfig
builds. Including smp.h in ipi.h fixes this build error:
In file included from arch/x86/kernel/traps_64.c:48:
include/asm/ipi.h: In function 'send_IPI_mask_sequence':
include/asm/ipi.h:114: error: 'per_cpu__x86_cpu_to_apicid' undeclared (first use in this function)
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When converting the MTRR layout from continuous to discrete, we can
sometimes run out of MTRRs. So add gran_sizek, which prevents that by
dropping small pieces of RAM smaller than gran_sizek.
The previous trimming could only handle the range from the highest_pfn
derived from the MTRRs up to the end_pfn from the e820 map. When more
than 4G of RAM is installed there are holes below 4G, so we also need to
check that the RAM below 4G is covered properly.
This needs to be applied after:
[PATCH] x86: mtrr cleanup for converting continuous to discrete layout v7
Signed-off-by: Yinghai Lu <yinghai.lu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_64 code has centralized the memory setup code in
e820_64.c. This patch copies that approach to i386:
- early_param("mem", ...) parsing is moved from
setup_32.c to e820_32.c.
- setup_memory_map() and finish_e820_parsing() are
  factored out from setup_arch(), and declarations
  (sketched below) are added to e820_32.h.
- print_memory_map() is made static and removed from
e820_32.h.
- user_defined_memmap is marked as __initdata.
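The declarations added to e820_32.h amount to something like this sketch (exact annotations may differ):
  extern void setup_memory_map(void);
  extern void finish_e820_parsing(void);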
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Looked at this file because of __memcpy warnings.
Thought it could use a style/checkpatch cleanup.
No change in vmlinux.
[tglx: fixed the remaining issues ]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simply stitch these together. There are just two definitions that are
shared, but the file is reasonably small, and putting these things
together shows that further unification requires unifying the
per cpu / pda handling between both arches.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
X86_VM_MASK is a kernel-specific flag mask, so hide it from userland
programs. It should be defined *before* ptrace.h is included, because
of the circular link between these files.
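A minimal sketch of the intended layout, assuming the usual __KERNEL__ guard is what does the hiding (the values are from memory and approximate):
  #ifdef __KERNEL__
  #ifdef CONFIG_VM86
  #define X86_VM_MASK  X86_EFLAGS_VM
  #else
  #define X86_VM_MASK  0  /* no VM86 support */
  #endif
  #endif /* __KERNEL__ */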
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/mmconf-fam10h_64.c is missing the prototypes, which
are currently declared in arch/x86/kernel/setup_64.c. Move the prototypes
and the inline stubs to the appropriate header file.
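The header then follows the usual prototype-or-empty-stub pattern, roughly (the function names below are my reading of the fam10h MMCONFIG code and should be treated as assumptions):
  #ifdef CONFIG_PCI_MMCONFIG
  extern void fam10h_check_enable_mmcfg(void);
  extern void check_enable_amd_mmconf_dmi(void);
  #else
  static inline void fam10h_check_enable_mmcfg(void) { }
  static inline void check_enable_amd_mmconf_dmi(void) { }
  #endif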
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As git-grep shows, open_softirq() is always called with the last
argument being NULL:
block/blk-core.c: open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
kernel/hrtimer.c: open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq, NULL);
kernel/rcuclassic.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/rcupreempt.c: open_softirq(RCU_SOFTIRQ, rcu_process_callbacks, NULL);
kernel/sched.c: open_softirq(SCHED_SOFTIRQ, run_rebalance_domains, NULL);
kernel/softirq.c: open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
kernel/softirq.c: open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
kernel/timer.c: open_softirq(TIMER_SOFTIRQ, run_timer_softirq, NULL);
net/core/dev.c: open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);
net/core/dev.c: open_softirq(NET_RX_SOFTIRQ, net_rx_action, NULL);
This observation has already been made by Matthew Wilcox in June 2002
(http://www.cs.helsinki.fi/linux/linux-kernel/2002-25/0687.html)
"I notice that none of the current softirq routines use the data element
passed to them."
and the situation hasn't changed since then. So it appears we can safely
remove that extra argument, saving 128 bytes of kernel data and 54 bytes
of text.
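The interface change amounts to this, sketched from the current definitions (layout from memory, details may differ slightly):
  /* before */
  struct softirq_action {
          void    (*action)(struct softirq_action *);
          void    *data;
  };
  void open_softirq(int nr, void (*action)(struct softirq_action *), void *data);
  /* after: the never-used data pointer is gone */
  struct softirq_action {
          void    (*action)(struct softirq_action *);
  };
  void open_softirq(int nr, void (*action)(struct softirq_action *));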
Signed-off-by: Carlos R. Mafra <crmafra@ift.unesp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LAPIC interrupts, which don't go through the generic interrupt handling
code, aren't accounted for in /proc/stat. Hence this patch adds a
mechanism that architectures can use to adjust the statistics accordingly.
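A minimal sketch of such a hook; the names arch_irq_stat_cpu()/arch_irq_stat() reflect my understanding of the patch and should be treated as assumptions:
  /* default: architectures that don't hook in contribute nothing */
  #ifndef arch_irq_stat_cpu
  #define arch_irq_stat_cpu(cpu) 0
  #endif
  #ifndef arch_irq_stat
  #define arch_irq_stat() 0
  #endif
  /* /proc/stat then folds these into its per-cpu and system-wide irq counts */
  sum += arch_irq_stat_cpu(i);
  sum += arch_irq_stat();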
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
.. allowing it to be write-protected just as other read-only data
under CONFIG_DEBUG_RODATA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While examining holes in the percpu section I found this:
c05f5000 D per_cpu__current_task
c05f5000 D __per_cpu_start
c05f5004 D per_cpu__cpu_number
c05f5008 D per_cpu__irq_regs
c05f500c d per_cpu__cpu_devices
c05f5040 D per_cpu__cyc2ns
<Big Hole of about 4000 bytes>
c05f6000 d per_cpu__cpuid4_info
c05f6004 d per_cpu__cache_kobject
c05f6008 d per_cpu__index_kobject
<Big Hole of about 4000 bytes>
c05f7000 D per_cpu__gdt_page
This is because gdt_page is a percpu variable defined with page
alignment, and the linker is doing its job, twice, because of .o
nesting in the build process.
I introduced a new macro, DEFINE_PER_CPU_PAGE_ALIGNED(), to avoid
wasting this space. All page-aligned variables (only one at this time)
are put in a separate subsection, .data.percpu.page_aligned, at the
very beginning of the percpu zone.
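The macro is a close cousin of DEFINE_PER_CPU(), differing only in the target subsection; roughly (the attribute spelling is approximate):
  #define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)                    \
          __attribute__((__section__(".data.percpu.page_aligned")))  \
          __typeof__(type) per_cpu__##name
  /* usage: today's only page-aligned percpu variable */
  DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page);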
Before the patch, on an x86_32 machine:
.data.percpu    30232   3227471872
After the patch:
.data.percpu    22168   3227471872
That's 8064 bytes saved for each CPU.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix unaligned access errors when setting softlockup_thresh on
64 bit platforms.
Allow softlockup detection to be disabled by setting
softlockup_thresh <= 0.
Detect earlier in softlockup_tick() that boot-time softlockup
detection has been disabled.
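For example, assuming the existing sysctl path is kept, disabling at runtime would look like:
  echo 0 > /proc/sys/kernel/softlockup_thresh   # <= 0 turns the detector off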
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
allow users to configure the softlockup detector to generate a panic
instead of a warning message.
high-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.
also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.
The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.
it's default-disabled.
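Assuming the obvious spellings for the three knobs (they are not spelled out in this message), usage would be:
  softlockup_panic=1                            # kernel boot option
  echo 1 > /proc/sys/kernel/softlockup_panic    # runtime sysctl
  CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y           # Kconfig default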
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: prevent PGE flush from interruption/preemption
x86: use explicit copy in vdso_gettimeofday()
namespacecheck: automated fixes
x86/xen: fix arbitrary_virt_to_machine()
x86: don't read maxlvt before checking if APIC is mapped
x86: disable TSC for sched_clock() when calibration failed
x86: distangle user disabled TSC from unstable
x86: fix setup of cyc2ns in tsc_64.c
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.
For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.
We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
- We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
  which causes the recovery not to be checkpointed.
- We remove spares and then re-add them, which loses important state
  information.
The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed. If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error. So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.
Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded). Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.
Issue: If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available, do we want to:
1/ complete the current rebuild, then go back and rebuild the new spare or
2/ restart the rebuild from the start and rebuild both devices in
parallel.
Both options can be argued for. The code currently takes option 2 as
a/ this requires least code change
b/ this results in a minimally-degraded array in minimal time.
Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some configurations, a raid6 resync can be limited by CPU speed
(calculating P and Q and moving data) rather than by device speed. In
these cases there is nothing to be gained by serialising the resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.
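Usage would then look something like the following; the attribute name here is an assumption:
  echo 1 > /sys/block/md0/md/sync_force_parallel   # allow parallel resync for md0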
Signed-off-by: Bernd Schubert <bs@q-leap.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The atomic_t type is 32-bit, but a 64-bit system can have more than 2^32
pages of virtual address space available. Without this change we overflow
on ludicrously large mappings.
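A minimal sketch of the kind of change (the counter name is illustrative, not the one this patch touches):
  /* before: wraps once the page count no longer fits in 32 bits */
  static atomic_t mapped_pages;
  atomic_add(nr_pages, &mapped_pages);
  /* after: atomic_long_t matches the native word size on 64-bit */
  static atomic_long_t mapped_pages;
  atomic_long_add(nr_pages, &mapped_pages);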
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
<linux/types.h> can't be used together with <sys/ustat.h> because they
both define struct ustat:
$ cat test.c
#include <sys/ustat.h>
#include <linux/types.h>
$ gcc -c test.c
In file included from test.c:2:
/usr/include/linux/types.h:165: error: redefinition of 'struct ustat'
This was reported to Debian a while ago, but seems to have been
lost in cat fighting: http://bugs.debian.org/429064
Signed-off-by: maximilian attems <max@stro.at>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>