The Bestcomm driver requests a memory region larger than the one
described in the device tree. This is due to an extra undocumented field
in the bestcomm register structure. This hasn't been a problem up to
now, but there is a patch pending to make the DT platform_bus support
code use platform_device_add() which tightens the rules and provides
extra checks for drivers to stay within the specified register regions.
Alternately, I could have removed the extra field from the structure,
but I'm not sure if it is still needed for resume to work. Better be
safe and leave it in.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anatolij Gustschin <agust@denx.de>
Fix warnings:
symbol 'clockctl' was not declared. Should it be static?
symbol 'rate_clks' was not declared. Should it be static?
symbol 'dev_clks' was not declared. Should it be static?
symbol 'mpc5121_clk_init' was not declared. Should it be static?
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Add ability to configure chip select (CS) parameters for devices
that need different CS parameters setup after their configuration.
I.e. an FPGA device on LP bus can require different CS parameters
for its bus interface after loading firmware into it. A driver
can easily reconfigure the LPC CS parameters using this function.
Acked-by: Timur Tabi <timur@tabi.org>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
The ASM version of hash computation function was truncating the upper bit.
Make the ASM version similar to hpt_hash function. Remove masking vsid bits.
Without this patch, we observed hang during bootup due to not satisfying page
fault request correctly. The fault handler used wrong hash values to update
the HPTE. Hence we kept looping with page fault.
hash_page(ea=000001003e260008, access=203, trap=300 ip=3fff91787134 dsisr 42000000
The computed value of hash 000000000f22f390
update: avpnv=4003e46054003e00, hash=000000000722f390, f=80000006, psize: 2 ...
BenH: The over-masking has been there for ever but only hurts with the
new 64T support introduced in 3.7
Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.7]
Only alpha and sparc are unusual - they have ka_restorer in it.
And nobody needs that exposed to userland.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Switch from __ARCH_WANT_SYS_RT_SIGACTION to opposite
(!CONFIG_ODD_RT_SIGACTION); the only two architectures that
need it are alpha and sparc. The reason for use of CONFIG_...
instead of __ARCH_... is that it's needed only kernel-side
and doing it that way avoids a mess with include order on many
architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make some POWER7-specific perf events available in sysfs.
$ /bin/ls -1 /sys/bus/event_source/devices/cpu/events/
branch-instructions
branch-misses
cache-misses
cache-references
cpu-cycles
instructions
PM_BRU_FIN
PM_BRU_MPRED
PM_CMPLU_STALL
PM_CYC
PM_GCT_NOSLOT_CYC
PM_INST_CMPL
PM_LD_MISS_L1
PM_LD_REF_L1
stalled-cycles-backend
stalled-cycles-frontend
where the 'PM_*' events are POWER specific and the others are the
generic events.
This will enable users to specify these events with their symbolic
names rather than with their raw code.
perf stat -e 'cpu/PM_CYC' ...
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062528.GE13720@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
Early driver probing can fail due to not available clocks
(clk_get() fails) since the clk API init didn't take place yet.
Move clocks init before bus probing.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
There are now two kinds of DMA windows that might be presented by
PowerVM DDW support -- huge windows (that can map all of system memory
regardless of the LPAR configuration) and non-huge windows (which
can't). They are implemented slightly differently in PowerVM, and thus
have different characteristics. The most obvious is that slot isolate
doesn't clear the TCEs/window for us with non-huge windows. Thus, when a
DLPAR operation occurs on a slot using a non-huge window, TCEs are still
present (the notifier chain doesn't currently remove them explicitly)
and the DLPAR fails. Fix this by calling remove_ddw() first, which will
unmap the DDW TCEs.
Note: a corresponding change to drmgr is needed to actually successfully
DLPAR, such that the device-tree update (which causes the notifier chain
to fire) occurs before slot isolate.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
tce_clearrange_multi_pSeriesLP is attempting to iterate over all TCEs in
a given range. However, is it not advancing the dma_offset value passed
to plpar_tce_stuff via the next value. This prevents DLPAR from
completing, because TCEs are still present at slot isolation time.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the hardware breakpoint code so that we can support wider ranged
breakpoints.
This means both ptrace and perf hardware breakpoints can use upto 512 byte long
breakpoints when using the DAWR and only 8 byte when using the DABR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we set the length field in the DAWR to 0 which defaults it to one
double word (64bits) which is the same as the DABR.
Change this so that we can set it to longer values as supported by the DAWR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
perf/Power: PERF_EVENT_IOC_ENABLE does not reenable event
If we disable a perf event because we exceeded the specified ->event_limit,
power_pmu_stop() sets the PERF_HES_STOPPED flag on the event.
If the application then re-enables the event using PERF_EVENT_IOC_ENABLE
ioctl, we don't ever clear this STOPPED flag. Consequently, the user space
is never notified of the event.
Following message has more background and test case.
http://lists.eecs.utk.edu/pipermail/ptools-perfapi/2012-October/002528.html
Used the following test cases to verify that this patch works on latest PAPI.
$ papi.git/src/ctests/nonthread PAPI_TOT_CYC@5000000
$ papi.git/src/ctests/overflow_single_event
Changelog[v2]:
- [Paul Mackerras] Also clear PERF_HES_UPTODATE flag since we are
restarting the event; cleanup comments and patch description.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use local_paca directly in macro SHARED_PROCESSOR, as all processors
have the same value for the field shared_proc, so we don't need care
racy here.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified
the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no
such luxury.
Consolidate other users that depend on sstep and create a new config option.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CTS-1000 is based on P4080. GPIO 27 is used to signal the FPGA to
switch off power, and also associates IRQ 8 with front-panel button
press (which we use to call orderly_poweroff()).
The relevant device-tree looks like this:
gpio0: gpio@130000 {
compatible = "fsl,qoriq-gpio";
reg = <0x130000 0x1000>;
interrupts = <55 2 0 0>;
#gpio-cells = <2>;
gpio-controller;
/* Allows powering off the system via GPIO signal. */
gpio-halt@27 {
compatible = "sgy,gpio-halt";
gpios = <&gpio0 27 0>;
interrupts = <8 1 0 0>;
};
};
Because the driver cannot match on sgy,gpio-halt (because the node is never
processed through of_platform), it matches on fsl,qoriq-gpio and then
checks child nodes for the matching sgy,gpio-halt. This also ensures that
the GPIO controller is detected prior to sgy_cts1000's probe callback,
since that node wont match via of_platform until the controller is
registered.
Also, because the GPIO handler for triggering system poweroff might sleep,
the IRQ uses a workqueue to call orderly_poweroff().
As a final note, this driver may be expanded for other features specific to
the CTS-1000.
Signed-off-by: Ben Collins <ben.c@servergy.com>
Cc: Jack Smith <jack.s@servergy.com>
Cc: Vihar Rai <vihar.r@servergy.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While this should be harmless now that distribute_irqs
obeys MPIC_SINGLE_DEST_CPU, there's no reason to enable this
on mpc85xx/mpc86xx since MPIC_SINGLE_DEST_CPU will always be set.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Previously we were setting an illegal configuration on mpc85xx
MPICs if CONFIG_IRQ_ALL_CPUS is enabled (which for some reason it is
in mpc85xx_smp_defconfig).
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a mix of decimal and hex here, so lets make them consistently
hex. Also, strace will print them in hex if it can't decode them, so
having them in hex here makes it easier to match up.
No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The only persistent change made by this loop is calling
memblock_set_node() once for each memblock, which is not useful (and has
no effect) as memblock_set_node() is not called with any
memblock-specific parameters.
Subsistute a single memblock_set_node().
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With lazy interrupt, we always call __check_irq_replaysome with
decrementers_next_tb to check if we need to replay timer interrupt.
So in hotplug case we also need to set decrementers_next_tb as MAX
to make sure __check_irq_replay don't replay timer interrupt
when return as we expect, otherwise we'll trap here infinitely.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
the variable backup_current_thread_info isn't freed before existing the
function.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In preempt case current arch_local_irq_restore() from
preempt_schedule_irq() may enable hard interrupt but we really
should disable interrupts when we return from the interrupt,
and so that we don't get interrupted after loading SRR0/1.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
error. The calculation is should be to shift left by (max_cntrs - cntr) times
the width of the pmsel field width. However, the #define OPROFILE_MAX_PMC_NUM
was used instead of OPROFILE_PMSEL_FIELD_WIDTH. This patch fixes the
calculation.
Signed-off-by: Carl Love <cel@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit f96972f2dc "kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()"
added a call to disable_nonboot_cpus() on kernel_restart(), which tries
to shutdown all the CPUs except the first one. The issue with the PA
Semi, is that it does not support CPU hotplug.
When the call is made to __cpu_down(), it calls the notifiers
CPU_DOWN_PREPARE, and then tries to take the CPU down.
One of the notifiers to the CPU hotplug code, is the cpufreq. The
DOWN_PREPARE will call __cpufreq_remove_dev() which calls
cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
that is used by an interrupt that goes off constantly
(system_reset_common, but it goes off during normal system operations
too). I'm not sure exactly what this interrupt does.
Running a simple function trace, you can see it goes off quite a bit:
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [001] 1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [001] 1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861437: .pasemi_system_reset_exception <-.system_reset_exception
When the region is unmapped, the system crashes with:
Disabling non-boot CPUs ...
Error taking CPU1 down: -38
Unable to handle kernel paging request for data at address 0xd0000800903a0100
Faulting instruction address: 0xc000000000055fcc
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in: shpchp
NIP: c000000000055fcc LR: c000000000055fb4 CTR: c0000000000df1fc
REGS: c0000000012175d0 TRAP: 0300 Not tainted (3.8.0-rc4-test-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000088 XER: 00000000
SOFTE: 0
DAR: d0000800903a0100, DSISR: 42000000
TASK = c0000000010e9008[0] 'swapper/0' THREAD: c000000001214000 CPU: 0
GPR00: d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
GPR04: 0000000000000000 0000000000000724 0000000000000724 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000a70000
GPR12: 0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
GPR16: ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
GPR20: 00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
GPR24: 0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
GPR28: c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
NIP [c000000000055fcc] .set_astate+0x5c/0xa4
LR [c000000000055fb4] .set_astate+0x44/0xa4
Call Trace:
[c000000001217850] [c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
[c0000000012178f0] [c00000000005647c] .restore_astate+0x2c/0x34
[c000000001217980] [c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
[c000000001217a00] [c000000000019ef0] .system_reset_exception+0x48/0x84
[c000000001217a80] [c000000000001e40] system_reset_common+0x140/0x180
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we want to stop the tick further idle, we need to be
able to account the cputime without using the tick.
Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.
However implementing CONFIG_VIRT_CPU_ACCOUNTING require
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU needs by archs which plan to shut down the tick
outside idle.
This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
There are some upsides of doing this:
- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).
- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.
And one downside:
- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
The guest TLB handling code should not have any insight into how the host
TLB shadow code works.
kvmppc_e500_tlbil_all() is a function that is used for distinction between
e500v2 and e500mc (E.HV) on how to flush shadow entries. This function really
is private between the e500.c/e500mc.c file and e500_mmu_host.c.
Instead of this one, use the public kvmppc_core_flush_tlb() function to flush
all shadow TLB entries. As a nice side effect, with this we also end up
flushing TLB1 entries which we forgot to do before.
Signed-off-by: Alexander Graf <agraf@suse.de>