It has come to my attention that kprobe event stack tracing does not
work on powerpc. You can see this with the following:
# cd /sys/kernel/debug/tracing
# echo stacktrace > trace_options
# echo 'p kfree' > kprobe_events
# echo 1 > events/kprobes/enable
This will print the following warning:
save_stack_trace_regs() not implemented yet.
Although save_stack_trace() (which normal event stack traces use) is
implemented, save_stack_trace_regs() which kprobe events use is not.
This is a cheap attempt to implement that function.
Note, this may have issues if a task tries to get a stack trace from
another task with its regs, because it just passes in "current" to
save_context_stack(). But this does solve the issue with stack tracing
kprobe events.
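The C sketch below is a rough illustration of what the missing function has to do. It walks a powerpc-style back chain in userspace. It starts from the stack pointer a kprobe would find in its regs (gpr[1] on powerpc) and records each frame's saved return address. The frame layout, struct sketch_trace and sketch_save_context_stack() are hypothetical stand-ins. The kernel's save_context_stack() works on real stack memory and must also guard against walking off the task's stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame layout: on powerpc the first word of each stack frame
 * is the "back chain" pointer to the caller's frame, and a fixed slot holds
 * a saved link register. Modeled here as a plain struct. */
struct sketch_frame {
	struct sketch_frame *back_chain; /* caller's frame, NULL at the top */
	unsigned long saved_lr;          /* return address saved in this frame */
};

/* Hypothetical stand-in for the kernel's struct stack_trace. */
struct sketch_trace {
	unsigned long *entries;
	unsigned int max_entries;
	unsigned int nr_entries;
};

/* Walk the back chain starting at sp (what save_stack_trace_regs() would
 * take from regs->gpr[1]) and record each saved return address until the
 * chain ends or the trace buffer is full. */
void sketch_save_context_stack(struct sketch_trace *trace,
			       const struct sketch_frame *sp)
{
	assert(trace != NULL && trace->entries != NULL);
	while (sp != NULL && trace->nr_entries < trace->max_entries) {
		trace->entries[trace->nr_entries++] = sp->saved_lr;
		sp = sp->back_chain;
	}
}
```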
Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 2fc251a8dd ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.
That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.
Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.
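The sketch below shows the general shape of the fix, assuming a simplified mm context. The paca carries the slice fields only when slices are configured, and otherwise it falls back to a copy of user_psize. Apart from mm_ctx_user_psize, all struct, field and macro names are hypothetical. CONFIG_PPC_MM_SLICES is left undefined here to model the configuration that failed to build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the Kconfig option; left undefined to model
 * CONFIG_PPC_MM_SLICES=n. */
/* #define CONFIG_PPC_MM_SLICES 1 */

#define SKETCH_SLICE_ARRAY_SIZE 8 /* hypothetical size */

/* Hypothetical subset of mm_context_t. */
struct sketch_mm_context {
	uint16_t user_psize;
	uint64_t low_slices_psize;
	unsigned char high_slices_psize[SKETCH_SLICE_ARRAY_SIZE];
};

/* Hypothetical paca: only what the hot paths read, per configuration. */
struct sketch_paca {
#ifdef CONFIG_PPC_MM_SLICES
	uint64_t mm_ctx_low_slices_psize;
	unsigned char mm_ctx_high_slices_psize[SKETCH_SLICE_ARRAY_SIZE];
#else
	uint16_t mm_ctx_user_psize; /* the member added for the no-slices case */
#endif
};

/* Populate the paca from the mm context for whichever layout is built. */
void sketch_copy_mm_to_paca(struct sketch_paca *paca,
			    const struct sketch_mm_context *ctx)
{
	assert(paca != NULL && ctx != NULL);
#ifdef CONFIG_PPC_MM_SLICES
	paca->mm_ctx_low_slices_psize = ctx->low_slices_psize;
	memcpy(paca->mm_ctx_high_slices_psize, ctx->high_slices_psize,
	       sizeof(paca->mm_ctx_high_slices_psize));
#else
	paca->mm_ctx_user_psize = ctx->user_psize;
#endif
}
```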
Fixes: 2fc251a8dd ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h. Functions
that refer to architecture-specific requests are also moved
to arch/.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Have mdio_alloc() create the array of interrupt numbers and
initialize it to POLLING. This is what most MDIO drivers want, so it
allows code to be removed from the drivers.
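The idea can be sketched in plain C as an allocator that fills the whole per-address IRQ table with a polling marker. Drivers then only override the addresses that really have an interrupt line. The constants mirror the kernel's PHY_MAX_ADDR and PHY_POLL but are redefined locally, with values assumed to be 32 and -1. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PHY_MAX_ADDR 32 /* MDIO addresses are 5 bits wide */
#define SKETCH_PHY_POLL (-1)   /* "no interrupt, poll the PHY" marker */

/* Hypothetical, trimmed-down bus structure with an embedded IRQ array. */
struct sketch_mii_bus {
	int irq[SKETCH_PHY_MAX_ADDR];
};

/* Allocate a bus and default every PHY address to polling, so drivers
 * only need to set the entries that really have an interrupt. */
struct sketch_mii_bus *sketch_mdiobus_alloc(void)
{
	struct sketch_mii_bus *bus = calloc(1, sizeof(*bus));

	if (bus == NULL)
		return NULL;
	for (int i = 0; i < SKETCH_PHY_MAX_ADDR; i++)
		bus->irq[i] = SKETCH_PHY_POLL;
	return bus;
}
```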
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
KVM/ARM changes for Linux v4.5
- Complete rewrite of the arm64 world switch in C, hopefully
paving the way for more sharing with the 32bit code, better
maintainability and easier integration of new features.
Also smaller and slightly faster in some cases...
- Support for 16bit VM identifiers
- Various cleanups
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value. All the BPF JITs fail to clear A if this is used as
the first instruction in a filter. This was found using american fuzzy
lop.
Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs. Except for ARM, the
rest have only been compile-tested.
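A sketch of such a helper follows, assuming the classic BPF opcode encodings from the uapi filter header. These are redefined locally so the example builds without kernel headers. The struct and function names are hypothetical, and the in-tree helper may be organised differently. The point is which first instructions let a JIT skip zeroing A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Classic BPF opcode pieces and ancillary offsets, defined locally
 * (values as in the uapi filter headers). */
#define BPF_LD   0x00
#define BPF_RET  0x06
#define BPF_W    0x00
#define BPF_H    0x08
#define BPF_B    0x10
#define BPF_K    0x00
#define BPF_ABS  0x20
#define BPF_LEN  0x80
#define SKF_AD_OFF       (-0x1000)
#define SKF_AD_ALU_XOR_X 40

struct sketch_sock_filter {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	uint32_t k;
};

/* Decide whether a JIT must zero A before running a filter whose first
 * instruction is *first. Only instructions that overwrite A without
 * reading it (or return a constant) make the clear unnecessary. */
bool sketch_needs_clear_a(const struct sketch_sock_filter *first)
{
	assert(first != NULL);
	switch (first->code) {
	case BPF_RET | BPF_K:
	case BPF_LD | BPF_W | BPF_LEN:
		return false;
	case BPF_LD | BPF_W | BPF_ABS:
	case BPF_LD | BPF_H | BPF_ABS:
	case BPF_LD | BPF_B | BPF_ABS:
		/* The XOR ancillary computes A ^= X, so it reads A. */
		return first->k == (uint32_t)(SKF_AD_OFF + SKF_AD_ALU_XOR_X);
	default:
		/* Conservatively clear for anything else (e.g. ALU ops read A). */
		return true;
	}
}
```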
Fixes: 3480593131 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a new device driver for a high performance SR-IOV assisted virtual
network for IBM System p and IBM System i systems. The SR-IOV VF will be
attached to the VIOS partition and mapped to the Linux client via the
hypervisor's VNIC protocol that this driver implements.
This driver is able to perform basic tx and rx; new features
and improvements will be added as they are developed and tested.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix off-by-one error in opal_mce_check_early_recovery() when checking
whether the NIP falls within OPAL space.
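The original comparison is not shown here. Assuming the bug was an inclusive upper bound, the correct test treats the OPAL region as the half-open range [base, base + size). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if nip lies inside [base, base + size).
 * base + size is the first byte *after* the region, so the upper bound
 * must be exclusive. Written as nip - base < size so that base + size
 * can never overflow. */
bool sketch_nip_in_region(uint64_t nip, uint64_t base, uint64_t size)
{
	return nip >= base && nip - base < size;
}
```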
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A few of the config prompts for powerpc self-tests have periods at the
end, which is inconsistent with the rest of the prompts. Remove the
periods.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Only delay opal_rtc_read() when firmware is busy and we are going to retry.
This has the advantage of possibly saving a massive 10ms off booting!
Kudos to Stewart for noticing.
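The retry policy can be sketched with a scripted fake firmware. The read is retried only while it reports busy, and the 10 ms delay is taken only when another attempt will follow. A read that succeeds the first time therefore costs no delay. The return codes, struct sketch_fw and the retry limit are hypothetical. The real code calls OPAL and the kernel's delay routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware return codes. */
enum { SKETCH_SUCCESS = 0, SKETCH_BUSY = 1, SKETCH_HW_ERROR = 2 };

/* Scripted fake firmware: hands out return codes in order and counts
 * how many reads and delays happened. */
struct sketch_fw {
	const int *script;
	int len;
	int calls;
	int delays;
};

static int sketch_fw_read(struct sketch_fw *fw)
{
	/* Once the script runs out, report success. */
	int rc = fw->calls < fw->len ? fw->script[fw->calls] : SKETCH_SUCCESS;

	fw->calls++;
	return rc;
}

static void sketch_delay_10ms(struct sketch_fw *fw)
{
	fw->delays++; /* a real implementation would sleep here */
}

/* Read the RTC, retrying while busy. Delay only when the firmware is
 * busy *and* another attempt will follow. */
int sketch_rtc_read(struct sketch_fw *fw, int max_tries)
{
	int rc = SKETCH_BUSY;

	assert(fw != NULL && max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		rc = sketch_fw_read(fw);
		if (rc != SKETCH_BUSY)
			break;
		if (i + 1 < max_tries)
			sketch_delay_10ms(fw);
	}
	return rc;
}
```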
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called. When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.
Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines. Before this
patch, however, only partial output would be printed during the timeout wait.
This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost. It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.
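The sketch below is a minimal model of that panic-time logic. It assumes a firmware buffer that each poller pass drains in fixed-size chunks. If firmware provides an explicit flush call, the model uses it. Otherwise it runs the pollers a bounded number of times. Everything here is hypothetical and stands in for OPAL_CONSOLE_FLUSH and the OPAL pollers, including struct sketch_console, the 16-byte chunk and the poll limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a console buffer held by firmware. */
struct sketch_console {
	int pending;         /* bytes still buffered in firmware */
	bool has_flush_call; /* does firmware offer an explicit flush? */
	int poll_calls;      /* how often the pollers were run */
};

/* One poller pass drains a bounded chunk (assumed 16 bytes here). */
void sketch_run_pollers(struct sketch_console *c)
{
	c->poll_calls++;
	c->pending = c->pending > 16 ? c->pending - 16 : 0;
}

/* Explicit flush call: assumed to drain everything at once. */
void sketch_flush_call(struct sketch_console *c)
{
	c->pending = 0;
}

/* Panic-time dump hook: prefer the explicit flush; otherwise run the
 * pollers up to max_polls times, which hopefully drains the buffer. */
void sketch_panic_flush(struct sketch_console *c, int max_polls)
{
	assert(c != NULL);
	if (c->has_flush_call) {
		sketch_flush_call(c);
		return;
	}
	for (int i = 0; i < max_polls && c->pending > 0; i++)
		sketch_run_pollers(c);
}
```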
The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic(). As such, the "end Kernel panic"
message may still be truncated as follows:
>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not
This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we copy the whole mm_context_t to the paca but only access a
few bits of it. This wastes space in the paca and also takes quite
some time in the hot path of context switching.
This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.
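The sketch below illustrates the idea with a simplified context. The switch path copies just the fields it needs into the per-CPU paca rather than the whole structure. The structs, field names and the 64-word padding are hypothetical. They are chosen only to make the size difference visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mm context: a few fields the context-switch path needs,
 * plus bulk state it never reads from the paca. */
struct sketch_mm_context {
	uint64_t id;
	uint16_t user_psize;
	uint64_t unused_bulk[64]; /* stands in for everything else */
};

/* Hypothetical paca holding only the required mm_ctx_* copies. */
struct sketch_paca {
	uint64_t mm_ctx_id;
	uint16_t mm_ctx_user_psize;
};

/* Copy just the fields the hot path uses instead of the whole context. */
void sketch_switch_mm_ctx(struct sketch_paca *paca,
			  const struct sketch_mm_context *ctx)
{
	assert(paca != NULL && ctx != NULL);
	paca->mm_ctx_id = ctx->id;
	paca->mm_ctx_user_psize = ctx->user_psize;
}
```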
Benchmarking this (on top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:
http://ozlabs.org/~anton/junkcode/context_switch2.c
./context_switch2 --test=yield --process 0 0
1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html
Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Also add nodes and properties for thermal management support. Meanwhile,
preprocessor support is needed because the nodes use the thermal OF framework.
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
p1010rdb uses irq[4:5] for PCIe INTA and INTB. These lines are
active-high, so set them as such.
Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
e6500 has threads but does not have TLB write conditional. Thus,
the hugetlb code needs to take the same lock that the normal TLB miss
handlers take, to ensure that the tlbsx and tlbwe are atomic.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Enable the TWR_P102x option by default in mpc85xx_basic_defconfig to
support the p1025twr board.
Signed-off-by: Pengbo Li <Pengbo.Li@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
This code was reworked in commit 905e75c46d.
This change removed the fsl_add_bridge() call, which originally came before
the addition of the pci_exclude_device function. I think the assumption was that
pci_exclude_device would prevent changes to the bridge PCI config after
it had been added. It seems it wasn't fully tested on MPC85xx ADS, because
if fsl_add_bridge() is moved so that pci_exclude_device is already set in the
machine description, then the PCI config can never be updated, since the
exclude function prevents it. This disrupts things like DMA.
This issue was extensively debugged by David Beazley.
Cc: xe-kernel@external.cisco.com
Cc: dbeazley@cisco.com
Cc: dwalker@fifo99.com
Signed-off-by: Daniel Walker <danielwa@cisco.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
FMan V3H has 2 different MURAM sizes:
In B4860/4420 the MURAM size is 512KB.
In T4240 and T2080 the MURAM size is 384KB.
The MURAM size in FMan V3H device tree is 384KB.
This patch updates the MURAM size for B4 to 512KB.
Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
1. Use machine_arch_initcall to hook mpc85xx_common_publish_devices.
This ensures that the PCI controllers have been probed and added to the
hose_list before pcibios_init() is called.
2. Add a workaround for erratum A-005434
For the BSC9132, PEX_PEXIWARn[TRGT] for all windows defaults to 0xF,
which is mapped to CCSRBAR. However, for other products, 0xF is
mapped to the local memory. Therefore, for the BSC9132, any default
PCI Express access to the local memory (DDR) will instead access the
CCSRBAR. This patch changes the mapping of targets of inbound windows
PEX_PEXIWARn[TRGT] to the local address space, 0x0 (from 0xF).
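The register change amounts to a read-modify-write of the TRGT field. The sketch below assumes TRGT occupies bits 23..20 of the 32-bit PEX_PEXIWARn value (mask 0x00f00000). That position and the helper name are assumptions to check against the BSC9132 reference manual. Real code would also use the kernel's big-endian MMIO accessors (in_be32()/out_be32() on powerpc) rather than plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the TRGT field in PEX_PEXIWARn: bits 23..20
 * (mask 0x00f00000). Verify against the reference manual. */
#define SKETCH_PIWAR_TRGT_MASK  0x00f00000u
#define SKETCH_PIWAR_TRGT_SHIFT 20

/* Return piwar with its TRGT field replaced by trgt, leaving every
 * other bit unchanged. The workaround would pass trgt = 0x0 (local
 * address space) instead of the 0xF default. */
uint32_t sketch_piwar_set_trgt(uint32_t piwar, uint32_t trgt)
{
	assert(trgt <= 0xf);
	return (piwar & ~SKETCH_PIWAR_TRGT_MASK) |
	       (trgt << SKETCH_PIWAR_TRGT_SHIFT);
}
```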
Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Hou Zhiqiang <B48286@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Pull kvm fixes from Paolo Bonzini:
- A series of fixes to the MTRR emulation, tested in the BZ by several
users so they should be safe this late
- A fix for a division by zero
- Two very simple ARM and PPC fixes
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Reload pit counters for all channels when restoring state
KVM: MTRR: treat memory as writeback if MTRR is disabled in guest CPUID
KVM: MTRR: observe maxphyaddr from guest CPUID, not host
KVM: MTRR: fix fixed MTRR segment look up
KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check
kvm: x86: move tracepoints outside extended quiescent state
KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
LS1 has a QE and an ARM CPU.
Move QE from arch/powerpc to drivers/soc/fsl
so that it can be used on both PowerPC and ARM.
Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Use subsys_initcall to initialize QE so that it also works on ARM.
Remove qe_reset from the PowerPC platform file.
Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
QE and CPM have the same MURAM and use the same management
functions. Now that QE supports both ARM and PowerPC, it is necessary
to move QE to "drivers/soc", so move the MURAM management functions
from cpm_common to qe_common in preparation for moving the QE code to "drivers/soc".
Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
This adds a function to copy the mm->context to the paca. This is
only a basic conversion for now but will be used more extensively in
the next patch.
This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 25642e1459 ("powerpc/opal-irqchip: Fix double endian
conversion") fixed an endian bug by calling opal_handle_events() in
opal_event_unmask().
However this introduced a deadlock if we find an event is active
during unmasking and call opal_handle_events() again. The bad call
sequence is:
opal_interrupt()
  -> opal_handle_events()
    -> generic_handle_irq()
      -> handle_level_irq()
        -> raw_spin_lock(&desc->lock)
           handle_irq_event(desc)
           unmask_irq(desc)
             -> opal_event_unmask()
               -> opal_handle_events()
                 -> generic_handle_irq()
                   -> handle_level_irq()
                     -> raw_spin_lock(&desc->lock) (BOOM)
When generating multiple opal events in quick succession this would lead
to the following stall warnings:
EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
INFO: rcu_sched detected stalls on CPUs/tasks:
12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
(detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
INFO: rcu_sched detected stalls on CPUs/tasks:
12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
(detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)
This patch corrects the problem by queuing the work if an event is
active during unmasking, which is similar to the pre-endian fix
behaviour.
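The lock recursion and the fix can be modeled in userspace. A non-recursive lock flag stands in for desc->lock. When the event is still active, the unmask path queues deferred work instead of calling back into the handler. All names are hypothetical. The sketch turns the would-be self-deadlock into an assertion failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an IRQ descriptor with a non-recursive lock. */
struct sketch_irq {
	bool locked;       /* models desc->lock */
	bool event_active; /* firmware reports the event still pending */
	int handled;       /* number of times the handler ran */
	int queued;        /* deferred re-handling requests */
};

static void sketch_lock(struct sketch_irq *d)
{
	/* Re-taking a held spinlock from the same context would spin
	 * forever; the sketch reports it as an assertion failure instead. */
	assert(!d->locked);
	d->locked = true;
}

static void sketch_unlock(struct sketch_irq *d)
{
	d->locked = false;
}

/* Unmask: if the event is still active, defer handling (queue work)
 * rather than re-entering the handler while desc->lock is held. */
static void sketch_unmask(struct sketch_irq *d)
{
	if (d->event_active)
		d->queued++;
}

/* Level handler: lock, handle, unmask, unlock. */
void sketch_handle_level_irq(struct sketch_irq *d)
{
	sketch_lock(d);
	d->handled++;
	sketch_unmask(d);
	sketch_unlock(d);
}

/* Deferred work runs without desc->lock held, so calling the handler
 * again is safe. Assumes one pass consumes the pending event. */
void sketch_run_queued_work(struct sketch_irq *d)
{
	while (d->queued > 0) {
		d->queued--;
		d->event_active = false;
		sketch_handle_level_irq(d);
	}
}
```

Calling sketch_handle_level_irq() directly from sketch_unmask() would trip the assertion in sketch_lock(). That is the deadlock shown in the call sequence above.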
Fixes: 25642e1459 ("powerpc/opal-irqchip: Fix double endian conversion")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
cppcheck picked up that there were a couple of missing va_end()
calls in functions using va_start().
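For reference, this is the pattern the check enforces: every va_start() needs a matching va_end() on every exit path, including early returns. The function below is a hypothetical example, not code from the patch.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments, rejecting negatives with -1.
 * Every va_start() is paired with a va_end() on each return path. */
int sketch_sum_ints(int count, ...)
{
	va_list ap;
	int total = 0;

	assert(count >= 0);
	va_start(ap, count);
	for (int i = 0; i < count; i++) {
		int v = va_arg(ap, int);

		if (v < 0) {
			va_end(ap); /* the early-return path needs it too */
			return -1;
		}
		total += v;
	}
	va_end(ap);
	return total;
}
```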
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Enable new kernel cpu hotplug functionality by allowing cpu dlpar requests
to be initiated from sysfs.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the ability to hotplug add cpus via rtas hotplug events by either
specifying the drc index of the CPU to add, or providing a count of the
number of CPUs to add.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the ability to dlpar remove CPUs via hotplug rtas events, either by
specifying the drc-index of the CPU to remove or providing a count of cpus
to remove.
To remove multiple CPUs in a single request, we create a list of possible
DR (Dynamic Reconfiguration) CPUs and their drc indexes that can be
removed. We can then traverse the list, remove each CPU, and easily clean
up in case of failure.
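The collect-then-remove approach with rollback can be sketched as below. The sketch assumes that removing a single CPU can fail. It also assumes that re-adding a CPU is just a matter of marking it present again, whereas the real code has to re-run the add path. The CPU table, the failure hook and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 8

/* Hypothetical CPU table entry. */
struct sketch_cpu {
	uint32_t drc_index;
	bool present;
	bool fail_removal; /* test hook: make removing this CPU fail */
};

static bool sketch_remove_one(struct sketch_cpu *c)
{
	if (c->fail_removal)
		return false;
	c->present = false;
	return true;
}

/* Remove `count` CPUs: first collect candidates, then remove them one
 * by one, re-adding the already-removed ones if any removal fails.
 * Returns 0 on success, -1 on failure (with no net change). */
int sketch_dlpar_remove_by_count(struct sketch_cpu *cpus, int ncpus, int count)
{
	struct sketch_cpu *victims[SKETCH_MAX_CPUS];
	int nvictims = 0;

	assert(ncpus <= SKETCH_MAX_CPUS);
	for (int i = 0; i < ncpus && nvictims < count; i++)
		if (cpus[i].present)
			victims[nvictims++] = &cpus[i];
	if (nvictims < count)
		return -1; /* not enough removable CPUs: change nothing */

	for (int i = 0; i < nvictims; i++) {
		if (!sketch_remove_one(victims[i])) {
			while (i-- > 0)
				victims[i]->present = true; /* roll back */
			return -1;
		}
	}
	return 0;
}
```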
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Update the cpu dlpar add/remove paths to do better error recovery when
a failure occurs during the add/remove operation.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Re-factor the cpu hotplug code to support doing cpu hotplug completely in
the kernel and using the existing sysfs probe/release interfaces. This
patch pulls out pieces of existing cpu hotplug code into common routines,
dlpar_cpu_add() and dlpar_cpu_remove(), to be used by both interfaces.
There are no functional changes introduced.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No functional changes; this patch simply moves the cpu hotplug
code from pseries/dlpar.c to pseries/hotplug-cpu.c. This is in an effort
to consolidate all of the cpu hotplug code in a common place.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When DLPAR adding a CPU we should verify that the CPU does not already
exist. Failure to do so can generate a kernel oops:
[ 9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382!
[ 9.465796] Oops: Exception in kernel mode, sig: 5 [#1]
This oops can be generated by causing a probe to be performed on a cpu
by writing to the sysfs cpu probe file (/sys/devices/system/cpu/probe).
This patch adds a check for the existence of the CPU prior to probing it
so userspace doing the wrong thing won't trigger a BUG_ON().
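The sketch below shows the added guard, assuming a simple table of known drc indexes. Probing an index that is already present returns an error instead of continuing into the add path. The table, the function names and the choice of -EINVAL are assumptions, not necessarily what the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_SLOTS 4

/* Hypothetical table of CPUs currently known, keyed by drc index. */
struct sketch_cpu_table {
	uint32_t drc_index[SKETCH_NR_SLOTS];
	bool used[SKETCH_NR_SLOTS];
};

static bool sketch_cpu_exists(const struct sketch_cpu_table *t, uint32_t drc)
{
	for (int i = 0; i < SKETCH_NR_SLOTS; i++)
		if (t->used[i] && t->drc_index[i] == drc)
			return true;
	return false;
}

/* Probe (add) a CPU: reject an already-present drc index up front
 * instead of tripping a BUG_ON() further down the add path. */
int sketch_cpu_probe(struct sketch_cpu_table *t, uint32_t drc)
{
	assert(t != NULL);
	if (sketch_cpu_exists(t, drc))
		return -EINVAL;
	for (int i = 0; i < SKETCH_NR_SLOTS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->drc_index[i] = drc;
			return 0;
		}
	}
	return -ENOSPC;
}
```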
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PPC476FPE has a different PVR from previous PPC476 processors. The
kexec code checks the PVR in order to correctly set up the MMU. When
the initial support for 476FPE processors was added the corresponding
change in the kexec code was missed. This patch simply adds the check
and solves the following bug on kexec:
kexec: Starting new kernel
Bye!
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xee9a50f8
cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20]
pc: ee9a50f8
lr: ee9a50e4
sp: ee9d7dd0
msr: 21020
current = 0xee40f000
pid = 960, comm = kexec
enter ? for help
[link register ] ee9a50e4
[ee9d7dd0] c0013748 default_machine_kexec+0x58/0x70 (unreliable)
[ee9d7df0] c0012f04 machine_kexec+0x34/0x40
[ee9d7e00] c00aa1ec kernel_kexec+0x9c/0xb0
[ee9d7e20] c005d704 SyS_reboot+0x1f4/0x220
[ee9d7f40] c000db68 ret_from_syscall+0x0/0x3c
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
NVLink is a high speed interconnect that is used in conjunction with a
PCI-E connection to create an interface between CPU and GPU that
provides very high data bandwidth. A PCI-E connection to a GPU is used
as the control path to initiate and report status of large data
transfers sent via the NVLink.
On IBM Power systems the NVLink processing unit (NPU) is similar to
the existing PHB3. This patch adds support for a new NPU PHB type. DMA
operations on the NPU are not supported as this patch sets the TCE
translation tables to be the same as the related GPU PCIe device for
each NVLink. Therefore all DMA operations are set up and controlled via
the PCIe device.
EEH is not presently supported for the NPU devices, although it may be
added in future.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to
include/asm/io.h so that it can be used by other code.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 250c7b277c removed the pcidev field from struct pci_dn as it was no
longer in use by the kernel. However to support finding the
association of Nvlink devices to GPU devices from the device-tree this
field is required.
This reverts commit 250c7b277c ("powerpc/pci: Remove unused struct
pci_dn.pcidev field").
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The name of the PCI root bus's M64 resource isn't initialized properly.
When dumping "/proc/iomem", "<BAD>" is seen for those M64 resources
on PCI root buses.
~# cat /proc/iomem | grep -e "BAD"
3b0000000000-3b0fefffffff : <BAD>
3b1000000000-3b1fefffffff : <BAD>
3c0000000000-3c0fefffffff : <BAD>
3c1000000000-3c1fefffffff : <BAD>
3c2000000000-3c2fefffffff : <BAD>
This fixes the issue by setting the name of the PCI root bus's M64
resource to the full name of the PHB's device node. With the patch,
no "<BAD>" is seen in "/proc/iomem".
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The user space checkpoint and restart tool (CRIU) needs page changes
to be soft-tracked. This allows it to do a pre-checkpoint and then dump
only the touched pages.
This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
the page is swapped out.
To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
and hash 64k pte, the bits already defined in hash-*4k.h should be
shifted left by one.
The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
the swap pte. A check is added to ensure that the bit is not
overwritten by _PAGE_HPTEFLAGS.
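The sketch below shows the swap-side bit handling under a made-up layout. The soft-dirty swap bit is placed immediately above the swap type field. A compile-time assertion checks that it does not overlap the hash PTE flags. All bit positions and names are hypothetical, since the real values depend on the 4K/64K hash configuration. Here _Static_assert plays the role a BUILD_BUG_ON() would play in kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; not the real powerpc values. */
#define SKETCH_PAGE_HPTEFLAGS 0x0000000fULL /* bits owned by the hash code */
#define SKETCH_SWP_TYPE_SHIFT 4
#define SKETCH_SWP_TYPE_BITS  5
/* Soft-dirty for swapped-out pages sits just above the swap type. */
#define SKETCH_PAGE_SWP_SOFT_DIRTY \
	(1ULL << (SKETCH_SWP_TYPE_SHIFT + SKETCH_SWP_TYPE_BITS))

/* Compile-time guard: the swap soft-dirty bit must not overlap the
 * hash PTE flags. */
_Static_assert((SKETCH_PAGE_SWP_SOFT_DIRTY & SKETCH_PAGE_HPTEFLAGS) == 0,
	       "swap soft-dirty bit overlaps the hash PTE flags");

uint64_t sketch_swp_mksoft_dirty(uint64_t pte)
{
	return pte | SKETCH_PAGE_SWP_SOFT_DIRTY;
}

int sketch_swp_soft_dirty(uint64_t pte)
{
	return (pte & SKETCH_PAGE_SWP_SOFT_DIRTY) != 0;
}

uint64_t sketch_swp_clear_soft_dirty(uint64_t pte)
{
	return pte & ~SKETCH_PAGE_SWP_SOFT_DIRTY;
}
```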
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The STD_EXCEPTION_PSERIES macro takes both a vector number, and a
location (memory address). However both are always identical, so combine
them to save repeating ourselves.
This does mean an exception handler must always exist at the location in
memory that matches its vector number. But that's OK because this is the
"STD" macro (standard), which does exactly that. We have other macros
for the other cases, e.g. STD_EXCEPTION_PSERIES_OOL (out of line).
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>