Commit Graph

3873 Commits

Author SHA1 Message Date
Aneesh Kumar K.V
438e69b52b powerpc/mm/radix: Skip ptesync in pte update helpers
We do them at the start of tlb flush, and we are sure a pte update will be
followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:41 +11:00
Aneesh Kumar K.V
f4894b80b1 powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
This helps us to do some optimization for application exit case, where we can
skip the DD1 style pte update sequence.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:40 +11:00
Aneesh Kumar K.V
ca94573b9c powerpc/mm/radix: Update pte update sequence for pte clear case
In the kernel we do follow the below sequence in different code paths.
pte = ptep_get_clear(ptep)
....
set_pte_at(ptep, pte)

We do that for mremap, autonuma protection update and softdirty clearing. This
implies our optimization to skip a tlb flush when clearing a pte update is
not valid, because for DD1 system that followup set_pte_at will be done witout
doing the required tlbflush. Fix that by always doing the dd1 style pte update
irrespective of new_pte value. In a later patch we will optimize the application
exit case.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:40 +11:00
Michael Ellerman
36b390fd62 powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
The recently merged HPT (Hash Page Table) resize support broke the build
when BOOK3S_64=n (ie. 32-bit or 64-bit Book3E) and MEMORY_HOTPLUG=y:

  arch/powerpc/mm/mem.o: In function `.arch_add_memory':
  (.text+0x4e4): undefined reference to `.resize_hpt_for_hotplug'

Fix it by adding a dummy version.

Fixes: 438cc81a41 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 19:58:39 +11:00
Michael Ellerman
da0e7e6276 Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we're sharing with the kvm-ppc tree.
2017-02-14 17:18:29 +11:00
Michael Ellerman
aad71e3928 powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
If we enable RADIX but disable HUGETLBFS, the build breaks with:

  arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
  arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'

Fix it by stubbing those functions when HUGETLBFS=n.

Fixes: 4b5d62ca17 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-14 16:54:21 +11:00
Linus Torvalds
3ebc703316 Merge tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes friom Michael Ellerman:
 "Apologies for the late pull request, but Ben has been busy finding bugs.

   - Userspace was semi-randomly segfaulting on radix due to us
     incorrectly handling a fault triggered by autonuma, caused by a
     patch we merged earlier in v4.10 to prevent the kernel executing
     userspace.

   - We weren't marking host IPIs properly for KVM in the OPAL ICP
     backend.

   - The ERAT flushing on radix was missing an isync and was incorrectly
     marked as DD1 only.

   - The powernv CPU hotplug code was missing a wakeup type and failing
     to flush the interrupt correctly when using OPAL ICP

  Thanks to Benjamin Herrenschmidt"

* tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Properly set "host-ipi" on IPIs
  powerpc/powernv: Fix CPU hotplug to handle waking on HVI
  powerpc/mm/radix: Update ERAT flushes when invalidating TLB
  powerpc/mm: Fix spurrious segfaults on radix with autonuma
2017-02-10 14:10:35 -08:00
Michael Ellerman
99ad503287 powerpc: Add a prototype for mcount() so it can be versioned
Currently we get a warning that _mcount() can't be versioned:

  WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.

Add a prototype to asm-prototypes.h to fix it.

The prototype is not really correct, mcount() is not a normal function,
it has a special ABI. But for the purpose of versioning it doesn't
matter.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:06 +11:00
Anju T
51c9c08439 powerpc/kprobes: Implement Optprobes
Current infrastructure of kprobe uses the unconditional trap instruction
to probe a running kernel. Optprobe allows kprobe to replace the trap
with a branch instruction to a detour buffer. Detour buffer contains
instructions to create an in memory pt_regs. Detour buffer also has a
call to optimized_callback() which in turn call the pre_handler(). After
the execution of the pre-handler, a call is made for instruction
emulation. The NIP is determined in advanced through dummy instruction
emulation and a branch instruction is created to the NIP at the end of
the trampoline.

To address the limitation of branch instruction in POWER architecture,
detour buffer slot is allocated from a reserved area. For the time
being, 64KB is reserved in memory for this purpose.

Instructions which can be emulated using analyse_instr() are the
candidates for optimization. Before optimization ensure that the address
range between the detour buffer allocated and the instruction being
probed is within +/- 32MB.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:04 +11:00
Naveen N. Rao
30176466e3 powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
  - one, having to do with a missing include for linux/module.h for
    MODULE_NAME_LEN -- this didn't show up previously since the only
    users of kprobe_lookup_name were in kprobes.c, which included
    linux/module.h through other headers, and
  - two, with a missing const qualifier for a local variable which ends
    up referring a string literal. Again, this is unique to how
    kprobe_lookup_name is being invoked in optprobes.c

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:04 +11:00
Anju T
ebfa50df43 powerpc: Add helper to check if offset is within relative branch range
To permit the use of relative branch instruction in powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (24 bit filed) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.

This patch verifies whether the target address is within +/- 32MB
range or not.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:03 +11:00
Naveen N. Rao
c233f5979b powerpc/bpf: Introduce __PPC_SH64()
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:03 +11:00
David Gibson
438cc81a41 powerpc/pseries: Automatically resize HPT for memory hot add/remove
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:02 +11:00
David Gibson
0de0fb09bb powerpc/pseries: Advertise HPT resizing support via CAS
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.

If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expends to that maximum
size.

This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:01 +11:00
David Gibson
dbcf929c00 powerpc/pseries: Add support for hash table resizing
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.

The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().

This also adds a debugfs file which can be used to manually control
HPT resizing or testing purposes.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:27:55 +11:00
Paolo Bonzini
2e751dfb5f Merge tag 'kvmarm-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
kvmarm updates for 4.11

- GICv3 save restore
- Cache flushing fixes
- MSI injection fix for GICv3 ITS
- Physical timer emulation support
2017-02-09 16:01:23 +01:00
David Gibson
64b40ffbc8 powerpc/pseries: Add hypercall wrappers for hash page table resizing
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.

These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".

It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-09 21:45:48 +11:00
Benjamin Herrenschmidt
9b25671497 powerpc/powernv: Fix CPU hotplug to handle waking on HVI
The IPIs come in as HVI not EE, so we need to test the appropriate
SRR1 bits. The encoding is such that it won't have false positives
on P7 and P8 so we can just test it like that. We also need to handle
the icp-opal variant of the flush.

Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-09 15:19:57 +11:00
Paul Mackerras
a4a741a048 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in a fix which touches both PPC and KVM code,
which was therefore put into a topic branch in the powerpc
tree.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-08 19:35:34 +11:00
Benjamin Herrenschmidt
ab9bad0ead powerpc/powernv: Remove separate entry for OPAL real mode calls
All entry points already read the MSR so they can easily do
the right thing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:40:18 +11:00
Nicholas Piggin
2337d20728 powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:39:44 +11:00
Aneesh Kumar K.V
a5ecdad484 powerpc/mm: Add MMU_FTR_KERNEL_RO to possible feature mask
Without this we will always find the feature disabled.

Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:29:55 +11:00
Nicholas Piggin
1a6822d194 powerpc/64s: Use (start, size) rather than (start, end) for exception handlers
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 07:26:57 +11:00
Nicholas Piggin
852e5da99d powerpc/64s: Tidy up after exception handler rework
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 07:26:57 +11:00
Benjamin Herrenschmidt
98a5f361b8 powerpc: Add new cache geometry aux vectors
This adds AUX vectors for the L1I,D, L2 and L3 cache levels
providing for each cache level the size of the cache in bytes
and the geometry (line size and number of ways).

We chose to not use the existing alpha/sh definition which
packs all the information in a single entry per cache level as
it is too restricted to represent some of the geometries used
on POWER.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
65e01f386f powerpc/64: Add L2 and L3 cache shape info
Retrieved from device-tree when available

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
e2827fe5c1 powerpc/64: Clean up ppc64_caches using a struct per cache
We have two set of identical struct members for the I and D sides
and mostly identical bunches of code to parse the device-tree to
populate them. Instead make a ppc_cache_info structure with one
copy for I and one for D

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
5d451a87e5 powerpc/64: Retrieve number of L1 cache sets from device-tree
It will be used to calculate the associativity

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
bd067f83b0 powerpc/64: Fix naming of cache block vs. cache line
In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture, means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).

We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
2a196e24b3 powerpc: Move ARCH_DLINFO out of uapi
It's an kernel private macro, it doesn't belong there

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Linus Torvalds
57480b98af Merge tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "The main change is we're reverting the initial stack protector support
  we merged this cycle. It turns out to not work on toolchains built
  with libc support, and fixing it will be need to wait for another
  release.

  And the rest are all fairly minor:

   - Some pasemi machines were not booting due to a missing error check
     in prom_find_boot_cpu()

   - In EEH we were checking a pointer rather than the bool it pointed
     to

   - The clang build was broken by a BUILD_BUG_ON() we added.

   - The radix (Power9 only) version of map_kernel_page() was broken if
     our memory size was a multiple of 2MB, which it generally isn't

  Thanks to: Darren Stevens, Gavin Shan, Reza Arbab"

* tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Use the correct pointer when setting a 2MB pte
  powerpc: Fix build failure with clang due to BUILD_BUG_ON()
  powerpc: Revert the initial stack protector support
  powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe()
  powerpc: Add missing error check to prom_find_boot_cpu()
2017-02-03 11:10:06 -08:00
Ard Biesheuvel
71810db27c modversions: treat symbol CRCs as 32 bit quantities
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.

This has a couple of downsides:

 - Given that the CRCs are treated as memory addresses, we waste 4 bytes
   for each CRC on 64 bit architectures,

 - On architectures that support runtime relocation, a R_<arch>_RELATIVE
   relocation entry is emitted for each CRC value, which identifies it
   as a quantity that requires fixing up based on the actual runtime
   load offset of the kernel. This results in corrupted CRCs unless we
   explicitly undo the fixup (and this is currently being handled in the
   core module code)

 - Such runtime relocation entries take up 24 bytes of __init space
   each, resulting in a x8 overhead in [uncompressed] kernel size for
   CRCs.

Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset.  Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant.  Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.

So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff).  To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.

Note that this mostly reverts commit d4703aefdb ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 08:28:25 -08:00
John Allen
e70d59700f powerpc/pseries: Introduce memory hotplug READD operation
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:57:39 +11:00
Daniel Axtens
f2ca809059 powerpc/sparse: Constify the address pointer in __get_user_nosleep()
In __get_user_nosleep, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nosleep macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:21 +11:00
Daniel Axtens
d466f6c5ca powerpc/sparse: Constify the address pointer in __get_user_nocheck()
In __get_user_nocheck, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nocheck macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:17 +11:00
Daniel Axtens
f84ed59a61 powerpc/sparse: Constify the address pointer in __get_user_check()
In __get_user_check, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_check macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:11 +11:00
Frederic Weisbecker
b672592f02 sched/cputime: Remove generic asm headers
cputime_t is now only used by two architectures:

	* powerpc (when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y)
	* s390

And since the core doesn't use it anymore, we don't need any arch support
from the others. So we can remove their stub implementations.

A final cleanup would be to provide an efficient pure arch
implementation of cputime_to_nsec() for s390 and powerpc and finally
remove include/linux/cputime.h .

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-36-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:14:07 +01:00
Frederic Weisbecker
e7f340ca9c powerpc, sched/cputime: Remove unused cputime definitions
Since the core doesn't deal with cputime_t anymore, most of these APIs
have been left unused. Lets remove these.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-33-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:14:04 +01:00
Ingo Molnar
ed5c8c854f Merge branch 'linus' into sched/core, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:12:25 +01:00
David Gibson
5e9859699a KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation
This adds a not yet working outline of the HPT resizing PAPR
extension.  Specifically it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and synchronization between these, the guest page fault
path and guest HPT update path.

The actual guts of the implementation isn't here yet, so for now the
calls will always fail.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:56 +11:00
David Gibson
f98a8bf9ee KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it will be cleared but always sized as from
the first call.

With upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.

This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour.  In fact the documentation was already slightly
incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl"

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:45 +11:00
David Gibson
aae0777f1e KVM: PPC: Book3S HV: Split HPT allocation from activation
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM.  For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function.  Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator.  Now, we
try first CMA, then the page allocator at each size.  As far as I can tell
this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
call to kvmppc_free_lpid() that was there, we move to the single caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:39 +11:00
David Gibson
3d089f84c6 KVM: PPC: Book3S HV: Don't store values derivable from HPT order
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:34 +11:00
David Gibson
3f9d4f5a5f KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:28 +11:00
David Gibson
db9a290d9c KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:23 +11:00
Paul Mackerras
8cf4ecc0ca KVM: PPC: Book3S HV: Enable radix guest support
This adds a few last pieces of the support for radix guests:

* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
  KVM_PPC_GET_RMMU_INFO ioctls for radix guests

* On POWER9, allow secondary threads to be on/off-lined while guests
  are running.

* Set up LPCR and the partition table entry for radix guests.

* Don't allocate the rmap array in the kvm_memory_slot structure
  on radix.

* Don't try to initialize the HPT for radix guests, since they don't
  have an HPT.

* Take out the code that prevents the HV KVM module from
  initializing on radix hosts.

At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode.  Thus a guest cannot switch from one mode to the other,
which enables some simplifications.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:52 +11:00
Paul Mackerras
a29ebeaf55 KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions.  Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu.  However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another.  Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly.  The usual symptom of this
is random segfaults in userspace programs in the guest.

To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed.  If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest.  This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush.  The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:

	CPU 0			CPU 1			CPU 4
	(core 0)			(core 0)			(core 1)

	VCPU 0 runs task X      VCPU 1 runs
	core 0 TLB gets
	entries from task X
	VCPU 0 moves to CPU 4
							VCPU 0 runs task X
							Unmap pages of task X
							tlbiel

				(still VCPU 1)			task X moves to VCPU 1
				task X runs
				task X sees stale TLB
				entries

That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.

Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core.  To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it.  This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:51 +11:00
Paul Mackerras
8f7b79b837 KVM: PPC: Book3S HV: Implement dirty page logging for radix guests
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out).  This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:50 +11:00
Paul Mackerras
01756099e0 KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix.  These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:49 +11:00
Paul Mackerras
5a319350a4 KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU.  Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.

As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions.  For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.

This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables.  There is no support for 1GB huge pages
yet.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:49 +11:00