This ensures that all MSR definitions are consistently unsigned long,
and that MSR_CM does not become 0xffffffff80000000 (this is usually
harmless because MSR is 32-bit on booke and is mainly noticeable when
debugging, but still I'd rather avoid it).
Signed-off-by: Scott Wood <scottwood@freescale.com>
DCR handling was only needed for 440 KVM. Since we removed it, we can also
remove handling of DCR accesses.
Signed-off-by: Alexander Graf <agraf@suse.de>
We're going to implement guest code interpretation in KVM for some rare
corner cases. This code needs to be able to inject data and instruction
faults into the guest when it encounters them.
Expose generic APIs to do this in a reasonably subarch agnostic fashion.
Signed-off-by: Alexander Graf <agraf@suse.de>
Today the instruction emulator can get called via 2 separate code paths. It
can either be called by MMIO emulation detection code or by privileged
instruction traps.
This is bad, as both code paths prepare the environment differently. For MMIO
emulation we already know the virtual address we faulted on, so instructions
there don't have to actually fetch that information.
Split out the two separate use cases into separate files.
Signed-off-by: Alexander Graf <agraf@suse.de>
We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as
well access the magic page. Special case it out so that we can properly access
it.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have enough common infrastructure now to resolve GVA->GPA mappings at
runtime. With this we can move our book3s specific helpers to load / store
in guest virtual address space to common code as well.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a nice API to find the translated GPAs of a GVA including protection
flags. So far we only use it on Book3S, but there's no reason the same shouldn't
be used on BookE as well.
Implement a kvmppc_xlate() version for BookE and clean it up to make it more
readable in general.
Signed-off-by: Alexander Graf <agraf@suse.de>
When calculating the lower bits of AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove stale comment.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has way to store and load contents of L2 or
L2+MRU way of L3 cache". We initiate the storing of the log (list of
addresses) using the logmpp instruction and start restore by writing
to a SPR.
The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
- 32kb for L2
- 128kb for L2+L3
- Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)
We should abort any ongoing logging before initiating one.
To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
32 cycles)
The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.
The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.
This implementation just does L2 store/load. In my benchmarks this proves
to be useful.
Benchmark 1:
- 16 core POWER8
- 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
- No split core/SMT
- two guests running sysbench memory test.
sysbench --test=memory --num-threads=8 run
- one guest running apache bench (of default HTML page)
ab -n 490000 -c 400 http://localhost/
This benchmark aims to measure performance of real world application (apache)
where other guests are cache hot with their own workloads. The sysbench memory
benchmark does pointer sized writes to a (small) memory buffer in a loop.
In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times were largely unchanged.
benchmark 2:
- Same VM config as benchmark 1
- all three guests running sysbench memory benchmark
This benchmark aims to see if there is a positive or negative affect to this
cache heavy benchmark. Although due to the nature of the benchmark (stores) we
may not see a difference in performance, but rather hopefully an improvement
in consistency of performance (when vcore switched in, don't have to wait
many times for cachelines to be pulled in)
The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.
benchmark 3:
- same 3 guests and CPU configuration as benchmark 1 and 2.
- two idle guests
- 1 guest running STREAM benchmark
This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and triad, it was around 10% (or more).
benchmark 4:
- same 3 guests as previous benchmarks
- two guests running sysbench --memory, distinctly different cache heavy
workload
- one guest running STREAM benchmark.
Similar improvements to benchmark 3.
benchmark 5:
- 1 guest, 8 VCPUs, Ubuntu 14.04
- Host configured with split core (SMT8, subcores-per-core=4)
- STREAM benchmark
In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.
Based on preliminary investigation and microbenchmarks
by Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The 440 target hasn't been properly functioning for a few releases and
before I was the only one who fixes a very serious bug that indicates to
me that nobody used it before either.
Furthermore KVM on 440 is slow to the extent of unusable.
We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.
Signed-off-by: Alexander Graf <agraf@suse.de>
Scott Wood pointed out that We are no longer using SPRG1 for vcpu pointer,
but using SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment
is not valid now.
Note: SPRN_SPRG3R is not supported (do not see any need as of now),
and if we want to support this in future then we have to shift to using
SPRG1 for VCPU pointer.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
SPRN_SPRG is used by debug interrupt handler, so this is required for
debug support.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On book3e, guest last instruction is read on the exit path using load
external pid (lwepx) dedicated instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.
This patch lay down the path for an alternative solution to read the guest
last instruction, by allowing kvmppc_get_lat_inst() function to fail.
Architecture specific implmentations of kvmppc_load_last_inst() may read
last guest instruction and instruct the emulation layer to re-execute the
guest in case of failure.
Make kvmppc_get_last_inst() definition common between architectures.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add mising defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
kvmppc_set_epr() is already defined in asm/kvm_ppc.h, So
rename and move get_epr helper function to same file.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
[agraf: remove duplicate return]
Signed-off-by: Alexander Graf <agraf@suse.de>
Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on
BOOKE-HV and these shadow registers are guest accessible.
So these shadow registers needs to be updated on BOOKE-HV.
This patch adds new macro for get/set helper of shadow register .
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The magic page is defined as a 4k page of per-vCPU data that is shared
between the guest and the host to accelerate accesses to privileged
registers.
However, when the host is using 64k page size granularity we weren't quite
as strict about that rule anymore. Instead, we partially treated all of the
upper 64k as magic page and mapped only the uppermost 4k with the actual
magic contents.
This works well enough for Linux which doesn't use any memory in kernel
space in the upper 64k, but Mac OS X got upset. So this patch makes magic
page actually stay in a 4k range even on 64k page size hosts.
This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
hosts for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.
This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.
However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.
It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.
The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter however would not receive a great amount of testing and be a lot of
bloat for a reasonably isolated corner case.
So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:
Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).
That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).
Signed-off-by: Alexander Graf <agraf@suse.de>
When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
get out of the ld immediate range for dereferences inside that struct.
Move the array to the end of our kvm_arch struct. This fixes compilation
issues with NR_CPUS=2048 for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration the guest could simply benefit from it, so emulate guest request
as a general store.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When running on an LE host all data structures are kept in little endian
byte order. However, the HTAB still needs to be maintained in big endian.
So every time we access any HTAB we need to make sure we do so in the right
byte order. Fix up all accesses to manually byte swap.
Signed-off-by: Alexander Graf <agraf@suse.de>
From assembly code we might not only have to explicitly BE access 64bit values,
but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx
in their respective byte-reverse or native form.
Signed-off-by: Alexander Graf <agraf@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tlb search operation used for victim hint relies on the default tlb set by the
host. When hardware tablewalk support is enabled in the host, the default tlb is
TLB1 which leads KVM to evict the bolted entry. Set and restore the default tlb
when searching for victim hint.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds support for the H_SET_MODE hcall. This hcall is a
multiplexer that has several functions, some of which are called
rarely, and some which are potentially called very frequently.
Here we add support for the functions that set the debug registers
CIABR (Completed Instruction Address Breakpoint Register) and
DAWR/DAWRX (Data Address Watchpoint Register and eXtension),
since they could be updated by the guest as often as every context
switch.
This also adds a kvmppc_power8_compatible() function to test to see
if a guest is compatible with POWER8 or not. The CIABR and DAWR/X
only exist on POWER8.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
capability is used to enable or disable in-kernel handling of an
hcall, that the hcall is actually implemented by the kernel.
If not an EINVAL error is returned.
This also checks the default-enabled list of hcalls and prints a
warning if any hcall there is not actually implemented.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This provides a way for userspace controls which sPAPR hcalls get
handled in the kernel. Each hcall can be individually enabled or
disabled for in-kernel handling, except for H_RTAS. The exception
for H_RTAS is because userspace can already control whether
individual RTAS functions are handled in-kernel or not via the
KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
H_RTAS is out of the normal sequence of hcall numbers.
Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
The args field of the struct kvm_enable_cap specifies the hcall number
in args[0] and the enable/disable flag in args[1]; 0 means disable
in-kernel handling (so that the hcall will always cause an exit to
userspace) and 1 means enable. Enabling or disabling in-kernel
handling of an hcall is effective across the whole VM.
The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM
capability advertises that this ability exists.
When a VM is created, an initial set of hcalls are enabled for
in-kernel handling. The set that is enabled is the set that have
an in-kernel implementation at this point. Any new hcall
implementations from this point onwards should not be added to the
default set without a good reason.
No distinction is made between real-mode and virtual-mode hcall
implementations; the one setting controls them both.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Some compilers complain about uninitialized variables in the compute_tlbie_rb
function. When you follow the code path you'll realize that we'll never get
to that point, but the compiler isn't all that smart.
So just default to 4k page sizes for everything, making the compiler happy
and the code slightly easier to read.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
virtual time base register is a per VM, per cpu register that needs
to be saved and restored on vm exit and entry. Writing to VTB is not
allowed in the privileged mode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <agraf@suse.de>
To support per-event exclude settings on Power8 we need access to the
struct perf_events in compute_mmcr().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
DISABLE_INTS has a long and storied history, but for some time now it
has not actually disabled interrupts.
For the open-coded exception handlers, just stop using it, instead call
RECONCILE_IRQ_STATE directly. This has the benefit of removing a level
of indirection, and making it clear that r10 & r11 are used at that
point.
For the addition case we still need a macro, so rename it to clarify
what it actually does.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The comment on TRACE_ENABLE_INTS is incorrect, and appears to have
always been incorrect since the code was merged. It probably came from
an original out-of-tree patch.
Replace it with something that's correct. Also propagate the message to
RECONCILE_IRQ_STATE(), because it's potentially subtle.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a strange #define in cputable.h called CLASSIC_PPC.
Although it is defined for 32 & 64bit, it's only used for 32bit and
it's basically a duplicate of CONFIG_PPC_BOOK3S_32, so let's use
the latter.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Although the name CONFIG_POWER4 suggests that it controls support for
power4 cpus, this symbol is actually misnamed.
It is a historical wart from the powermac code, which used to support
building a 32-bit kernel for power4. CONFIG_POWER4 was used in that
context to guard code that was 64-bit only.
In the powermac code we can just use CONFIG_PPC64 instead, and in other
places it is a synonym for CONFIG_PPC_BOOK3S_64.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We no longer support these cpus, so we don't need oprofile support for
them either.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we have dropped power3 support we can remove CONFIG_POWER3. The
usage in pgtable_32.c was already dead code as CONFIG_POWER3 was not
selectable on PPC32.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We now only support cpus that use an SLB, so we don't need an MMU
feature to indicate that.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had
a Segment Table (STAB). Now that we've dropped support for those cpus,
we can remove the STAB support entirely.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We inadvertently broke power3 support back in 3.4 with commit
f5339277eb "powerpc: Remove FW_FEATURE ISERIES from arch code".
No one noticed until at least 3.9.
By then we'd also broken it with the optimised memcpy, copy_to/from_user
and clear_user routines. We don't want to add any more complexity to
those just to support ancient cpus, so it seems like it's a good time to
drop support for power3 and earlier.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we have sys_sigpending and sys_old_getrlimit defined to use
COMPAT_SYS() in systbl.h, but then both are #defined to sys_ni_syscall
in systbl.S.
This seems to have been done when ppc and ppc64 were merged, in commit
9994a33 "Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S".
AFAICS there's no longer (or never was) any need for this, we can just
use SYSX() for both and remove the #defines to sys_ni_syscall.
The expansion before was:
#define COMPAT_SYS(func) .llong .sys_##func,.compat_sys_##func
#define sys_old_getrlimit sys_ni_syscall
COMPAT_SYS(old_getrlimit)
=>
.llong .sys_old_getrlimit,.compat_sys_old_getrlimit
=>
.llong .sys_ni_syscall,.compat_sys_old_getrlimit
After is:
#define SYSX(f, f3264, f32) .llong .f,.f3264
SYSX(sys_ni_syscall, compat_sys_old_getrlimit, sys_old_getrlimit)
=>
.llong .sys_ni_syscall,.compat_sys_old_getrlimit
ie. they are equivalent.
Finally both COMPAT_SYS() and SYSX() evaluate to sys_ni_syscall in the
Cell SPU code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc fixes from Ben Herrenschmidt:
"Here is a handful of powerpc fixes for 3.16. They are all pretty
simple and self contained and should still make this release"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: use _GLOBAL_TOC for memmove
powerpc/pseries: dynamically added OF nodes need to call of_node_init
powerpc: subpage_protect: Increase the array size to take care of 64TB
powerpc: Fix bugs in emulate_step()
powerpc: Disable doorbells on Power8 DD1.x
We now support TASK_SIZE of 16TB, hence the array should be 8.
Fixes the below crash:
Unable to handle kernel paging request for data at address 0x000100bd
Faulting instruction address: 0xc00000000004f914
cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90]
pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0
lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0
sp: c000000fea75fd10
msr: 9000000000009032
dar: 100bd
dsisr: 40000000
current = 0xc000000fea6ae490
paca = 0xc00000000fb8ab00 softe: 0 irq_happened: 0x00
pid = 8237, comm = a.out
enter ? for help
[c000000fea75fe30] c00000000000a164 syscall_exit+0x0/0x98
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These processors do not currently support doorbell IPIs, so remove them
from the feature list if we are at DD 1.xx for the 0x004d part.
This fixes a regression caused by d4e58e5928 (powerpc/powernv: Enable
POWER8 doorbell IPIs). With that patch the kernel would hang at boot
when calling smp_call_function_many, as the doorbell would not be
received by the target CPUs:
.smp_call_function_many+0x2bc/0x3c0 (unreliable)
.on_each_cpu_mask+0x30/0x100
.cpuidle_register_driver+0x158/0x1a0
.cpuidle_register+0x2c/0x110
.powernv_processor_idle_init+0x23c/0x2c0
.do_one_initcall+0xd4/0x260
.kernel_init_freeable+0x25c/0x33c
.kernel_init+0x1c/0x120
.ret_from_kernel_thread+0x58/0x7c
Fixes: d4e58e5928 (powerpc/powernv: Enable POWER8 doorbell IPIs)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull kvm fixes from Paolo Bonzini:
"These are mostly PPC changes for 3.16-new things. However, there is
an x86 change too and it is a regression from 3.14. As it only
affects nested virtualization and there were other changes in this
area in 3.16, I am not nominating it for 3.15-stable"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Check for nested events if there is an injectable interrupt
KVM: PPC: RTAS: Do byte swaps explicitly
KVM: PPC: Book3S PR: Fix ABIv2 on LE
KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
PPC: Add _GLOBAL_TOC for 32bit
KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
KVM: PPC: Book3E: Unlock mmu_lock when setting caching atttribute
Knowing how long we spend in firmware calls is an important part of
minimising OS jitter.
This patch adds tracepoints to each OPAL call. If tracepoints are
enabled we branch out to a common routine that calls an entry and exit
tracepoint.
This allows us to write tools that monitor the frequency and duration
of OPAL calls, eg:
name count total(ms) min(ms) max(ms) avg(ms) period(ms)
OPAL_HANDLE_INTERRUPT 5 0.199 0.037 0.042 0.040 12547.545
OPAL_POLL_EVENTS 204 2.590 0.012 0.036 0.013 2264.899
OPAL_PCI_MSI_EOI 2830 3.066 0.001 0.005 0.001 81.166
We use jump labels if configured, which means we only add a single
nop instruction to every OPAL call when the tracepoints are disabled.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>