Now that smp_ops->smp_message_pass is always called with an (online) cpu
number for the target, remove the checks for MSG_ALL and MSG_ALL_BUT_SELF.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The only user of MSG_ALL_BUT_SELF in the whole kernel tree is powerpc,
and it only uses it to start the debugger. Both debuggers always call
smp_send_debugger_break with MSG_ALL_BUT_SELF, and only mpic can do
anything more optimal than a loop over all online cpus, but all message
passing implementations have to code for this special delivery target.
Convert smp_send_debugger_break to take void and loop calling the smp_ops
message_pass function for each of the other cpus in the online cpumask.
Use raw_smp_processor_id() because we are either entering the debugger
or trying to start kdump, and the additional warning is not useful were
it to trigger.
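As a rough illustration of the loop described above, here is a minimal user-space sketch. Every name in it (the sketch_ prefix, the array standing in for the online cpumask, the counter standing in for message delivery) is hypothetical and is not the kernel's code; it only shows the shape of "send to every online cpu except the caller".

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 8   /* hypothetical size, for illustration only */

/* Hypothetical stand-ins for the online cpumask and for delivery state. */
bool sketch_cpu_online[SKETCH_NR_CPUS];
int sketch_msgs_received[SKETCH_NR_CPUS];

/* Stand-in for the smp_ops message_pass hook: always one explicit target. */
void sketch_message_pass(int cpu, int msg)
{
	(void)msg;
	sketch_msgs_received[cpu]++;
}

/*
 * Takes no target argument: the only delivery pattern is "every online
 * cpu other than the caller", so the loop lives here instead of in each
 * message passing implementation.
 */
void sketch_send_debugger_break(int me)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!sketch_cpu_online[cpu] || cpu == me)
			continue;
		sketch_message_pass(cpu, 0);
	}
}
```

With the loop in the caller, a backend only ever has to handle a single cpu number as the target.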
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
c1854e0072 (powerpc: Set nr_cpu_ids early
and use it to free PACAs) copied the formerly static setup_nr_cpu_ids
from init/main.c but 34db18a054 (smp:
move smp setup functions to kernel/smp.c) moved it to kernel/smp.c
with a declaration in include/linux/smp.h, so we can call it instead of
replicating it.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we never set a cpu above nr_cpu_ids possible we can
limit our initial paca allocation to nr_cpu_ids. We can then
clamp the number of cpus in platforms/iseries/setup.c.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We should not set cpus above nr_cpu_ids to possible. While we
will trigger a warning with CONFIG_CPUMASK_DEBUG, even then the mask
initializers will set the bits beyond what the iterators check and cause
nr_cpu_ids to increase.
Respecting nr_cpu_ids during setup will allow us to use it in our initial
paca allocation. It can be reduced from NR_CPUS by the existing early param
nr_cpus=, which was added in 2b633e3fac (smp:
Use nr_cpus= to set nr_cpu_ids early). We already call parse_early_param
between finding the command line and allocating the pacas.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Starting with 1426d5a3bd (powerpc:
Dynamically allocate pacas) the space for pacas beyond cpu_possible
is freed, but we failed to update the loop in crash.c.
Since c1854e0072 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) the number of pacas allocated is
always nr_cpu_ids.
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Starting with 1426d5a3bd (powerpc:
Dynamically allocate pacas) we free the memory for pacas beyond
cpu_possible, but we failed to update the loop the secondary cpus use
to find their paca. If the system has running cpu threads for which
the kernel did not allocate a paca, they will search the memory that
was freed. For instance this could happen when the device tree for
a kdump kernel was not updated after a cpu hotplug, or the kernel is
running with more cpus than the kernel was configured for.
Since c1854e0072 (powerpc: Set nr_cpu_ids
early and use it to free PACAs) we set nr_cpu_ids before telling the
cpus to advance, so use that to limit the search.
We can't reference nr_cpu_ids without CONFIG_SMP because it is defined
as 1 instead of a memory location, but any extra threads should be sent
to kexec_wait in that case anyways, so make that explicit and remove
the search loop for UP.
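The bounded search can be pictured with the following C sketch. The real code is assembly in the early boot path; the struct, the function name and the -1 return convention here are hypothetical and only illustrate limiting the scan to the pacas that were actually allocated.

```c
#include <stddef.h>

/* Hypothetical, stripped-down paca: only the field the search compares. */
struct sketch_paca {
	int hw_cpu_id;
};

/*
 * Scan only the first nr_cpu_ids entries, since nothing beyond them is
 * allocated. Returns the logical cpu index, or -1 when the hardware
 * thread has no paca (such a thread would be parked in kexec_wait).
 */
int sketch_find_paca(const struct sketch_paca *pacas, size_t nr_cpu_ids,
		     int hw_id)
{
	for (size_t i = 0; i < nr_cpu_ids; i++) {
		if (pacas[i].hw_cpu_id == hw_id)
			return (int)i;
	}
	return -1;
}
```

A thread whose hardware id lies past the limit gets -1 even if stale data beyond the allocation happens to match, which is the point of bounding the loop.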
Note to stable: The fix also requires
c1854e0072 (powerpc: Set
nr_cpu_ids early and use it to free PACAs) to function. Also
9d07bc841c (Properly handshake CPUs going
out of boot spin loop) affects the second chunk, specifically the branch
target was 3b before and is 4b after that patch, and there was a blank
line before the #ifdef CONFIG_SMP that was removed.
Cc: <stable@kernel.org> # .34.x: c1854e0072 powerpc: Set nr_cpu_ids early
Cc: <stable@kernel.org> # .34.x
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 1fc711f7ff (powerpc/kexec: Fix race
in kexec shutdown) moved the write to signal the cpu had exited the kernel
from before the transition to real mode in kexec_smp_wait to kexec_wait.
Unfortunately it missed that kexec_wait is used both by cpus leaving the
kernel and by secondary slave cpus that were not allocated a paca for
whatever reason: they could be beyond nr_cpus or not described in
the current device tree (for example, kexec-load was not refreshed
after a cpu hotplug operation). Cpus coming through that path will
write to paca[NR_CPUS], which is beyond the space allocated for the
paca data, and overwrite memory that is not allocated to pacas but is
very likely still real mode accessible.
Move the write back to kexec_smp_wait, which is used only by cpus that
found their paca, but after the transition to real mode.
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <stable@kernel.org> # (1fc711f was backported to 2.6.32)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
After looking at our system call path, Mary Brown suggested that we
should put all mfspr SRR* instructions before any mtspr SRR*.
To test this I used a very simple null syscall (actually getppid)
testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c
I tested with the following changes against the pseries_defconfig:
CONFIG_VIRT_CPU_ACCOUNTING=n
CONFIG_AUDIT=n
to remove the overhead of virtual CPU accounting and syscall
auditing.
POWER6:
baseline: mean = 757.2 cycles sd = 2.108
modified: mean = 759.1 cycles sd = 2.020
POWER7:
baseline: mean = 411.4 cycles sd = 0.138
modified: mean = 404.1 cycles sd = 0.109
So we have a 1.77% improvement on POWER7, which looks significant. The
POWER6 results suggest a 0.25% slowdown, but the difference is within 1
standard deviation and may be in the noise.
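The percentages quoted above follow directly from the reported means. A small sketch of the arithmetic (the function name is made up for illustration):

```c
/*
 * Relative change of the modified mean against the baseline mean, in
 * percent. Negative means fewer cycles, i.e. an improvement.
 */
double sketch_percent_change(double baseline, double modified)
{
	return (modified - baseline) / baseline * 100.0;
}
```

For POWER7, sketch_percent_change(411.4, 404.1) is about -1.77, and for POWER6, sketch_percent_change(757.2, 759.1) is about +0.25. The POWER6 difference of 1.9 cycles is smaller than either reported standard deviation (2.108 and 2.020), which is why it may be noise.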
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A static branch hint will override dynamic branch prediction on
recent POWER CPUs. Since we are about to use more altivec in the
kernel remove the static hint in giveup_altivec that assumes
a userspace task is using altivec.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To make it easier to add optimised versions of copy_page, remove
the 4kB loop for 64kB pages and just do all the work in copy_page.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch below removes an unused config variable found by using a kernel
cleanup script.
Note: I did try to cross compile these but hit errors while doing so
(gcc is not set up to cross compile), and I am unsure if any more needs to be done.
Please have a look if/when anybody has free time.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit c56e58537d breaks SMP support on the PPC_47x chip.
secondary_ti must be set to the current thread info before calling kick_cpu, or else
start_secondary_47x will jump into the void when trying to return to C code.
In the current setup secondary_ti is initialized before the CPU idle task is started
and only the boot core will start. I am not sure this is the correct solution, but it
makes SMP possible on my chip.
Note: the HOTPLUG support probably needs some fixing too. There is no trampoline code
available in head_44x.S - start_secondary_resume?
Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit b987812b3f left
crash_kexec_wait_realmode() undefined for UP.
Commit 7c7a81b53e defined it for UP but
left it undefined for 32-bit SMP.
Seems like people are getting confused by nested #ifdef's, so move the
definitions of crash_kexec_wait_realmode() after the #ifdef CONFIG_SMP
section.
Compile-tested with 32-bit UP, 32-bit SMP and 64-bit SMP configurations.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.
This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.
So it's time to delete the line. This is good as we need all the space
we can get for oops messages at times on consoles.
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch drops the reference to a global 'cmd_line' variable from
early_init_dt_scan_chosen, and instead passes the pointer to the command
line string via the *data argument. Each architecture does something
slightly different with the initial command line, so it makes sense for
the architecture to be able to specify the variable name.
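The pattern is the usual "pass the destination through an opaque data pointer" idiom. The sketch below is a simplified stand-in, not the real scan callback: the function name, the reduced argument list and the buffer size are all assumptions made for illustration.

```c
#include <string.h>

#define SKETCH_COMMAND_LINE_SIZE 64   /* hypothetical size */

/*
 * Simplified stand-in for a "chosen" node scan callback. It does not
 * write to a global command line; it copies the bootargs string into
 * whatever buffer the caller handed in through data.
 */
int sketch_scan_chosen(const char *bootargs, void *data)
{
	char *cmdline = data;

	if (!bootargs || !cmdline)
		return 0;

	strncpy(cmdline, bootargs, SKETCH_COMMAND_LINE_SIZE - 1);
	cmdline[SKETCH_COMMAND_LINE_SIZE - 1] = '\0';
	return 1;
}
```

Each architecture can then pass its own buffer, whatever it is called, as the data argument.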
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
We make use of ptrace_get_breakpoints() / ptrace_put_breakpoints() to
protect ptrace_set_debugreg() even if CONFIG_HAVE_HW_BREAKPOINT is off.
However in this case, these APIs are not implemented.
To fix this, push the protection down inside the relevant ifdef.
Ideally the code inside CONFIG_HAVE_HW_BREAKPOINT would be moved into
a standalone function, to clean up the ifdefury there, with the
breakpoint ref API called inside it. But as that is more invasive,
it is better done in an -rc1.
Fixes this build error:
arch/powerpc/kernel/ptrace.c:1594: error: implicit declaration of function 'ptrace_get_breakpoints'
make[2]: ***
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LPPC <linuxppc-dev@lists.ozlabs.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: v2.6.33.. <stable@kernel.org>
Link: http://lkml.kernel.org/r/1304639598-4707-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
slb0_limit() wasn't a very descriptive name. This changes it along with
a comment explaining what it's used for, and provides a 64-bit BookE
implementation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Adds support for page coalescing, which is a feature on IBM Power servers
which allows for coalescing identical pages between logical partitions.
Hint text pages as coalesce candidates, since they are the most likely
pages to be able to be coalesced between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.
[BenH: Moved a couple of things around to fix compile problems]
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Adapt to the new API.
Almost all of the changes are trivial. The most important change is the
line below, because we plan to change the task->cpus_allowed implementation.
- ctx->cpus_allowed = current->cpus_allowed;
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
Address Register" (CFAR), that records the address of the most recent
branch or rfid (return from interrupt) instruction for debugging purposes.
This saves the value of the CFAR in the exception entry code and stores
it in the exception frame. We also make xmon print the CFAR value in
its register dump code.
Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
field, which is only used for system calls, and use it for the CFAR value
for all exceptions/interrupts other than system calls. This means we
don't save the CFAR on system calls, which is not a great problem since
system calls tend not to happen unexpectedly, and also avoids adding the
overhead of reading the CFAR to the system call entry path.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we take an interrupt or exception from kernel mode and the stack
pointer is obviously not a kernel address (i.e. the top bit is 0), we
switch to an emergency stack, save register values and panic. However,
on 64-bit server machines, we don't actually save the values of r9 - r13
at the time of the interrupt, but rather values corrupted by the
exception entry code for r12-r13, and nothing at all for r9-r11.
This fixes it by passing a pointer to the register save area in the paca
through to the bad_stack code in r3. The register values are saved in
one of the paca register save areas (depending on which exception this
is). Using the pointer in r3, the bad_stack code now retrieves the
saved values of r9 - r13 and stores them in the exception frame on the
emergency stack. This also stores the normal exception frame marker
("regshere") in the exception frame.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of the 64bit PPC CPU features are MMU-related, so this patch moves
them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to
mmu_has_feature(), and seven feature bits are freed as a result.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
RTAS returns extended error codes as a hint of how long the
OS might want to wait before retrying a call. If we have nothing
else useful to do we may as well call back straight away.
This was found when testing the new dynamic dma window feature.
Firmware split the zeroing of the TCE table into 32k chunks but
returned 9901 (which is a suggested wait of 10ms). All up this took
about 10 minutes to complete since msleep is jiffies based and will
round 10ms up to 20ms.
With the patch below we take 3 seconds to complete the same test.
The hint firmware is returning in the RTAS call should definitely
be decreased, but even if we slept 1ms each iteration this would
take 32s.
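For reference, a sketch of how an extended status code maps to a suggested wait. The message above only states that 9901 means 10ms; the general rule used here (codes 9900 through 9905 meaning 10^(code - 9900) milliseconds) is my understanding of the RTAS convention and should be treated as an assumption, as should every sketch_ name.

```c
/* Assumed range of the RTAS extended delay status codes. */
#define SKETCH_RTAS_EXTENDED_DELAY_MIN 9900
#define SKETCH_RTAS_EXTENDED_DELAY_MAX 9905

/*
 * Suggested wait in milliseconds for an extended delay status, or 0 if
 * the status is not an extended delay code at all.
 */
unsigned int sketch_rtas_hint_ms(int status)
{
	unsigned int ms = 1;

	if (status < SKETCH_RTAS_EXTENDED_DELAY_MIN ||
	    status > SKETCH_RTAS_EXTENDED_DELAY_MAX)
		return 0;

	/* One factor of ten per step above 9900. */
	for (int order = status - SKETCH_RTAS_EXTENDED_DELAY_MIN; order > 0; order--)
		ms *= 10;

	return ms;
}
```

Reading the 1ms/32s figure above as roughly 32,000 calls (an inference, not something the message states), sleeping 20ms per call comes to about 640 seconds, which is in line with the reported ten minutes.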
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use the new MSR_64BIT in a few places. Some of these are already ifdef'ed
for BOOKE vs BOOKS, but it's still clearer: MSR_SF does not immediately
parse as "MSR bit for 64bit".
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This can be useful for differentiating interrupts on the same host
but with different chip data.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allows some control over the prefetch
of data streams.
This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
If a thread starts with a non-default value in the sysfs entry,
all child threads inherit this non-default value even if
the sysfs value is changed later.
Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we set up the TLB for ourselves on Book3E, we need to flush out any
old mappings established by the firmware or bootloader. At present we
attempt this with a tlbilx to flush everything, but this will leave behind
any entries with the IPROT bit set.
There are several good reasons firmware might establish mappings with IPROT,
and in fact ePAPR compliant firmwares are required to establish their
initial mapped area with IPROT.
This patch therefore adds more complex code to scan through the TLB upon
entry and flush away any entries that are not our own.
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
An erratum on A2 can lead to the bolted entry we insert for the linear
mapping being evicted. To avoid that, write the bolted entry to way 3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In exc_lvl_ctx_init() we index into the crit/dbg/mcheck stacks using
the hard cpu id, but that assumes the hard cpu id is zero based and
contiguous. That is not the case on A2.
The root of the problem is that the 32bit code has no equivalent of the
paca to allow it to do the hard->soft mapping in assembler. Until the
32bit code is updated to handle that, index the stacks using the soft
cpu ids on 64bit and hard on 32 bit.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we start a cpu we use smp_ops->kick_cpu(), which currently
returns void, but it should be able to fail. Convert it to return
int, and update all uses.
Convert all the current error cases to return -ENOENT, which is
what would eventually be returned by __cpu_up() currently when
it doesn't detect the cpu as coming up in time.
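A minimal sketch of what the conversion buys the caller. The struct, the functions and the validity check are hypothetical stand-ins; only the idea of an int-returning kick_cpu hook and the -ENOENT error value come from the description above.

```c
#include <errno.h>

#define SKETCH_KICK_NR_CPUS 4   /* hypothetical, for illustration only */

/* Stand-in for smp_ops with the converted, int-returning hook. */
struct sketch_smp_ops {
	int (*kick_cpu)(int nr);
};

/* Example platform hook: reject a cpu it cannot start. */
int sketch_kick_cpu(int nr)
{
	if (nr < 0 || nr >= SKETCH_KICK_NR_CPUS)
		return -ENOENT;
	return 0;
}

/*
 * Caller side: a failed kick is reported immediately instead of being
 * discovered later when the cpu never shows up.
 */
int sketch_cpu_up(const struct sketch_smp_ops *ops, int cpu)
{
	int rc = ops->kick_cpu(cpu);

	if (rc)
		return rc;
	return 0;
}
```

With a void hook the caller could only wait for the cpu to appear and time out; with an int it can propagate the platform's error straight away.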
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to do that to guarantee they see any code change done by
dynamic patching during boot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to wait a bit for them to have done their CPU setup,
or we might end up with translation and EE on with different
LPCR values between threads.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We do it before we loop on the PACA start flag. This way, we get a
chance to set critical SPRs on all CPUs before Linux tries to start
them up, which avoids problems when changing some bits such as LPCR
bits that need to be identical on all threads of a core or similar
things like that. Ideally, some of that should also be done before
the MMU is enabled, but that's a separate issue which would require
moving some of the SMP startup code earlier. Let's not go there
for now; it works with this change alone.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>