Commit Graph

153218 Commits

Author SHA1 Message Date
David S. Miller
3c2b2d9408 sparc: Really use linker with LDFLAGS.
Rather than funneling through CC.

Also, use --hash-style=both just like other VDSO architectures and
glibc do.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 16:02:05 -07:00
David S. Miller
5615edcca9 sparc: Improve VDSO CFLAGS.
Do not set any special register usage options, use the default which
is exactly what we should use for userspace code.

Make sure we remove the gcc plugin options from the 64-bit build.
The 32-bit cflags got it right already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 15:56:11 -07:00
David S. Miller
44231b7fee sparc: Set DISABLE_BRANCH_PROFILING in VDSO CFLAGS.
Not in vclock_gettime.c itself.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 15:51:45 -07:00
David S. Miller
3fe5d7e861 sparc: Don't bother masking out TICK_PRIV_BIT in VDSO code.
If the TICK_PRIV_BIT was set, we would not be able to read the tick
register in user space, which is where this code runs.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 15:31:38 -07:00
David S. Miller
794b88e047 sparc: Inline VDSO gettime code aggressively.
One interesting thing we need to do is stop using
__builtin_return_address() in get_vvar_data().

Simply read the %pc register instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 15:27:49 -07:00
David S. Miller
2f6c9bf31a sparc: Improve VDSO instruction patching.
The current VDSO patch mechanism has several problems:

1) It assumes how gcc will emit a function, with a register
   window, an initial save instruction and then immediately
   the %tick read when compiling vread_tick().

   There is no such guarantees, code generation could change
   at any time, gcc could put a nop between the save and
   the %tick read, etc.

   So this is extremely fragile and would fail some day.

2) It disallows us to properly inline vread_tick() into the callers
   and thus get the best possible code sequences.

So fix this to patch properly, with location based annotations.

We have to be careful because we cannot do it the way we do
patches elsewhere in the kernel.  Those use a sequence like:

	1:
	insn
	.section	.whatever_patch, "ax"
	.word		1b
	replacement_insn
	.previous

This is a dynamic shared object, so that .word cannot be resolved at
build time, and thus cannot be used to execute the patches when the
kernel initializes the images.

Even trying to use label difference equations doesn't work in the
above kind of scheme:

	1:
	insn
	.section	.whatever_patch, "ax"
	.word		. - 1b
	replacement_insn
	.previous

The assembler complains that it cannot resolve that computation.
The issue is that this is contained in an executable section.

Borrow the sequence used by x86 alternatives, which is:

	1:
	insn
	.pushsection	.whatever_patch, "a"
	.word		. - 1b, . - 1f
	.popsection
	.pushsection	.whatever_patch_replacements, "ax"
	1:
	replacement_insn
	.previous

This works, allows us to inline vread_tick() as much as we like, and
can be used for arbitrary kinds of VDSO patching in the future.

Also, reverse the condition for patching.  Most systems are %stick
based, so if we only patch on %tick systems the patching code will
get little or no testing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 15:22:14 -07:00
Linus Torvalds
cff229491a Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
 "First batch of dma-mapping changes for 4.20.

  There will be a second PR as some big changes were only applied just
  before the end of the merge window, and I want to give them a few more
  days in linux-next.

  Summary:

   - mostly more consolidation of the direct mapping code, including
     converting over hexagon, and merging the coherent and non-coherent
     code into a single dma_map_ops instance (me)

   - cleanups for the dma_configure/dma_unconfigure callchains (me)

   - better handling of dma_masks in odd setups (me, Alexander Duyck)

   - better debugging of passing vmalloc address to the DMA API (Stephen
     Boyd)

   - CMA command line parsing fix (He Zhe)"

* tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits)
  dma-direct: respect DMA_ATTR_NO_WARN
  dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN
  dma-direct: document the zone selection logic
  dma-debug: Check for drivers mapping invalid addresses in dma_map_single()
  dma-direct: fix return value of dma_direct_supported
  dma-mapping: move dma_default_get_required_mask under ifdef
  dma-direct: always allow dma mask <= physiscal memory size
  dma-direct: implement complete bus_dma_mask handling
  dma-direct: refine dma_direct_alloc zone selection
  dma-direct: add an explicit dma_direct_get_required_mask
  dma-mapping: make the get_required_mask method available unconditionally
  unicore32: remove swiotlb support
  Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops"
  dma-mapping: support non-coherent devices in dma_common_get_sgtable
  dma-mapping: consolidate the dma mmap implementations
  dma-mapping: merge direct and noncoherent ops
  dma-mapping: move the dma_coherent flag to struct device
  MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT
  dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
  dma-mapping: fix panic caused by passing empty cma command line argument
  ...
2018-10-22 18:16:03 +01:00
Linus Torvalds
6ab9e09238 Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main pull request for block changes for 4.20. This
  contains:

   - Series enabling runtime PM for blk-mq (Bart).

   - Two pull requests from Christoph for NVMe, with items such as;
      - Better AEN tracking
      - Multipath improvements
      - RDMA fixes
      - Rework of FC for target removal
      - Fixes for issues identified by static checkers
      - Fabric cleanups, as prep for TCP transport
      - Various cleanups and bug fixes

   - Block merging cleanups (Christoph)

   - Conversion of drivers to generic DMA mapping API (Christoph)

   - Series fixing ref count issues with blkcg (Dennis)

   - Series improving BFQ heuristics (Paolo, et al)

   - Series improving heuristics for the Kyber IO scheduler (Omar)

   - Removal of dangerous bio_rewind_iter() API (Ming)

   - Apply single queue IPI redirection logic to blk-mq (Ming)

   - Set of fixes and improvements for bcache (Coly et al)

   - Series closing a hotplug race with sysfs group attributes (Hannes)

   - Set of patches for lightnvm:
      - pblk trace support (Hans)
      - SPDX license header update (Javier)
      - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0
        specs behind a common core interface. (Javier, Matias)
      - Enable pblk to use a common interface to retrieve chunk metadata
        (Matias)
      - Bug fixes (Various)

   - Set of fixes and updates to the blk IO latency target (Josef)

   - blk-mq queue number updates fixes (Jianchao)

   - Convert a bunch of drivers from the old legacy IO interface to
     blk-mq. This will conclude with the removal of the legacy IO
     interface itself in 4.21, with the rest of the drivers (me, Omar)

   - Removal of the DAC960 driver. The SCSI tree will introduce two
     replacement drivers for this (Hannes)"

* tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits)
  block: setup bounce bio_sets properly
  blkcg: reassociate bios when make_request() is called recursively
  blkcg: fix edge case for blk_get_rl() under memory pressure
  nvme-fabrics: move controller options matching to fabrics
  nvme-rdma: always have a valid trsvcid
  mtip32xx: fully switch to the generic DMA API
  rsxx: switch to the generic DMA API
  umem: switch to the generic DMA API
  sx8: switch to the generic DMA API
  sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code
  skd: switch to the generic DMA API
  ubd: remove use of blk_rq_map_sg
  nvme-pci: remove duplicate check
  drivers/block: Remove DAC960 driver
  nvme-pci: fix hot removal during error handling
  nvmet-fcloop: suppress a compiler warning
  nvme-core: make implicit seed truncation explicit
  nvmet-fc: fix kernel-doc headers
  nvme-fc: rework the request initialization code
  nvme-fc: introduce struct nvme_fcp_op_w_sgl
  ...
2018-10-22 17:46:08 +01:00
Linus Torvalds
5289851171 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "Apart from some new arm64 features and clean-ups, this also contains
  the core mmu_gather changes for tracking the levels of the page table
  being cleared and a minor update to the generic
  compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ.

  Summary:

   - Core mmu_gather changes which allow tracking the levels of
     page-table being cleared together with the arm64 low-level flushing
     routines

   - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to
     mitigate Spectre-v4 dynamically without trapping to EL3 firmware

   - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack

   - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4

   - Support for Common Not Private (CnP) translations allowing threads
     of the same CPU to share the TLB entries

   - Accelerated crc32 routines

   - Move swapper_pg_dir to the rodata section

   - Trap WFI instruction executed in user space

   - ARM erratum 1188874 workaround (arch_timer)

   - Miscellaneous fixes and clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
  arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work
  arm64: cpufeature: Trap CTR_EL0 access only where it is necessary
  arm64: cpufeature: Fix handling of CTR_EL0.IDC field
  arm64: cpufeature: ctr: Fix cpu capability check for late CPUs
  Documentation/arm64: HugeTLB page implementation
  arm64: mm: Use __pa_symbol() for set_swapper_pgd()
  arm64: Add silicon-errata.txt entry for ARM erratum 1188873
  Revert "arm64: uaccess: implement unsafe accessors"
  arm64: mm: Drop the unused cpu parameter
  MAINTAINERS: fix bad sdei paths
  arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines
  arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c
  arm64: xen: Use existing helper to check interrupt status
  arm64: Use daifflag_restore after bp_hardening
  arm64: daifflags: Use irqflags functions for daifflags
  arm64: arch_timer: avoid unused function warning
  arm64: Trap WFI executed in userspace
  arm64: docs: Document SSBS HWCAP
  arm64: docs: Fix typos in ELF hwcaps
  arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe
  ...
2018-10-22 17:30:06 +01:00
Vasily Gorbik
cf3dbe5dac s390/kasan: support preemptible kernel build
When the kernel is built with:
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
"stfle" function used by kasan initialization code makes additional
call to preempt_count_add/preempt_count_sub. To avoid removing kasan
instrumentation from sched code where those functions leave split stfle
function and provide __stfle variant without preemption handling to be
used by Kasan.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-10-22 08:37:45 +02:00
Christophe Leroy
977e4be5eb x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()
The following commit:

  d7880812b3 ("idle: Add the stack canary init to cpu_startup_entry()")

... added an x86 specific boot_init_stack_canary() call to the generic
cpu_startup_entry() as a temporary hack, with the intention to remove
the #ifdef CONFIG_X86 later.

More than 5 years later let's finally realize that plan! :-)

While implementing stack protector support for PowerPC, we found
that calling boot_init_stack_canary() is also needed for PowerPC
which uses per task (TLS) stack canary like the X86.

However, calling boot_init_stack_canary() would break architectures
using a global stack canary (ARM, SH, MIPS and XTENSA).

Instead of modifying the #ifdef CONFIG_X86 to an even messier:

   #if defined(CONFIG_X86) || defined(CONFIG_PPC)

PowerPC implemented the call to boot_init_stack_canary() in the function
calling cpu_startup_entry().

Let's try the same cleanup on the x86 side as well.

On x86 we have two functions calling cpu_startup_entry():

 - start_secondary()
 - cpu_bringup_and_idle()

start_secondary() already calls boot_init_stack_canary(), so
it's good, and this patch adds the call to boot_init_stack_canary()
in cpu_bringup_and_idle().

I.e. now x86 catches up to the rest of the world and the ugly init
sequence in init/main.c can be removed from cpu_startup_entry().

As a final benefit we can also remove the <linux/stackprotector.h>
dependency from <linux/sched.h>.

[ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ]
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20181020072649.5B59310483E@pc16082vm.idsi0.si.c-s.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-22 04:07:24 +02:00
Masami Hiramatsu
2e62024c26 kprobes/x86: Use preempt_enable() in optimized_callback()
The following commit:

  a19b2e3d78 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”)

removed local_irq_save/restore() from optimized_callback(), the handler
might be interrupted by the rescheduling interrupt and might be
rescheduled - so we must not use the preempt_enable_no_resched() macro.

Use preempt_enable() instead, to not lose preemption events.

[ mingo: Improved the changelog. ]

Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dwmw@amazon.co.uk
Fixes: a19b2e3d78 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized kprobes”)
Link: http://lkml.kernel.org/r/154002887331.7627.10194920925792947001.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-22 03:31:01 +02:00
David S. Miller
21ea1d36f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David Ahern's dump indexing bug fix in 'net' overlapped the
change of the function signature of inet6_fill_ifaddr() in
'net-next'.  Trivially resolved.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-21 11:54:28 -07:00
Mark Brown
4fd1f509e8 Merge branch 'regulator-4.20' into regulator-next 2018-10-21 17:00:02 +01:00
Paolo Bonzini
574c0cfbc7 Merge tag 'kvm-ppc-next-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
Second PPC KVM update for 4.20.

Two commits; one is an optimization for PCI pass-through, and the
other disables nested HV-KVM on early POWER9 chips that need a
particular hardware bug workaround.
2018-10-21 11:47:01 +02:00
Dave Hansen
1620414251 x86/mm: Kill stray kernel fault handling comment
I originally had matching user and kernel comments, but the kernel
one got improved.  Some errant conflict resolution kicked the commment
somewhere wrong.  Kill it.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: aa37c51b94 ("x86/mm: Break out user address space handling")
Link: http://lkml.kernel.org/r/20181019140842.12F929FA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-21 10:58:10 +02:00
Christophe Leroy
0f99153def powerpc/msi: Fix compile error on mpc83xx
mpic_get_primary_version() is not defined when not using MPIC.
The compile error log like:

arch/powerpc/sysdev/built-in.o: In function `fsl_of_msi_probe':
fsl_msi.c:(.text+0x150c): undefined reference to `fsl_mpic_primary_get_version'

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Fixes: 807d38b73b ("powerpc/mpic: Add get_version API both for internal and external use")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-21 19:32:07 +11:00
Michael Ellerman
b6aeddea74 powerpc: Fix stack protector crashes on CPU hotplug
Recently in commit 7241d26e81 ("powerpc/64: properly initialise
the stackprotector canary on SMP.") we fixed a crash with stack
protector on SMP by initialising the stack canary in
cpu_idle_thread_init().

But this can also causes crashes, when a CPU comes back online after
being offline:

  Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pnv_smp_cpu_kill_self+0x2a0/0x2b0
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc3-gcc-7.3.1-00168-g4ffe713b7587 #94
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    panic+0x144/0x328
    __stack_chk_fail+0x2c/0x30
    pnv_smp_cpu_kill_self+0x2a0/0x2b0
    cpu_die+0x48/0x70
    arch_cpu_idle_dead+0x20/0x40
    do_idle+0x274/0x390
    cpu_startup_entry+0x38/0x50
    start_secondary+0x5e4/0x600
    start_secondary_prolog+0x10/0x14

Looking at the stack we see that the canary value in the stack frame
doesn't match the canary in the task/paca. That is because we have
reinitialised the task/paca value, but then the CPU coming online has
returned into a function using the old canary value. That causes the
comparison to fail.

Instead we can call boot_init_stack_canary() from start_secondary()
which never returns. This is essentially what the generic code does in
cpu_startup_entry() under #ifdef X86, we should make that non-x86
specific in a future patch.

Fixes: 7241d26e81 ("powerpc/64: properly initialise the stackprotector canary on SMP.")
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
2018-10-21 19:32:00 +11:00
Camelia Groza
0400d65501 powerpc/dts/fsl: t2080rdb: reorder the Cortina PHY XFI lanes
According to the T2080RDB schematics, for the CS4315 PHY, the XFI 1 lane is
connected to SFP 2 and the XFI 2 lane is connected to SFP 1. Change the
device tree to reflect the correct PHY order and port association.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2018-10-20 18:23:56 -05:00
Helge Deller
e543b3a620 parisc: Retrieve and display the PDC PAT capabilities
Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-20 21:10:37 +02:00
John David Anglin
4c5fe5db1a parisc: Optimze cache flush algorithms
The attached patch implements three optimizations:

1) Loops in flush_user_dcache_range_asm, flush_kernel_dcache_range_asm,
purge_kernel_dcache_range_asm, flush_user_icache_range_asm, and
flush_kernel_icache_range_asm are unrolled to reduce branch overhead.

2) The static branch prediction for cmpb instructions in pacache.S have
been reviewed and the operand order adjusted where necessary.

3) For flush routines in cache.c, we purge rather flush when we have no
context.  The pdc instruction at level 0 is not required to write back
dirty lines to memory. This provides a performance improvement over the
fdc instruction if the feature is implemented.

Version 2 adds alternative patching.

The patch provides an average improvement of about 2%.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-20 21:10:26 +02:00
John David Anglin
5a23237f14 parisc: Remove pte_inserted define
The attached change removes the pte_inserted from pgtable.h.  As a
result, we always flush the TLB entry when the associated page table
entry is changed.

This change doesn't impact performance signifcantly and it may catch
some cases where the TLB needs flushing but wasn't.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-20 21:09:30 +02:00
Bjorn Helgaas
525fde0750 Merge branch 'remotes/lorenzo/pci/dwc'
- Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)

  - Add initial power management for i.MX7 (Leonard Crestez)

  - Add PME_Turn_Off support for i.MX7 (Leonard Crestez)

  - Fix qcom runtime power management error handling (Bjorn Andersson)

  - Update TI dra7xx unaligned access errata workaround for host mode as
    well as endpoint mode (Vignesh R)

  - Fix kirin section mismatch warning (Nathan Chancellor)

* remotes/lorenzo/pci/dwc:
  PCI: imx: Add PME_Turn_Off support
  ARM: dts: imx7d: Add turnoff reset
  dt-bindings: imx6q-pcie: Add turnoff reset for imx7d
  reset: imx7: Add PCIE_CTRL_APPS_TURNOFF
  PCI: kirin: Fix section mismatch warning
  PCI: dwc: pci-dra7xx: Enable errata i870 for both EP and RC mode
  dt-bindings: PCI: dra7xx: Add bindings for unaligned access in host mode
  PCI: qcom: Fix error handling in runtime PM support
  PCI: imx: Initial imx7d pm support
  PCI: imx6: Support MPLL reconfiguration for 100MHz and 200MHz refclock
2018-10-20 11:45:49 -05:00
Bjorn Helgaas
6aa0459e75 Merge branch 'pci/host-vmd'
- Fix VMD AERSID quirk Device ID matching (Jon Derrick)

* pci/host-vmd:
  x86/PCI: Apply VMD's AERSID fixup generically
2018-10-20 11:45:44 -05:00
Bjorn Helgaas
20634dc361 Merge branch 'pci/hotplug'
- Differentiate between pciehp surprise and safe removal (Lukas Wunner)

  - Remove unnecessary pciehp includes (Lukas Wunner)

  - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)

  - Tolerate PCIe Slot Presence Detect being hardwired to zero to
    workaround broken hardware, e.g., the Wilocity switch/wireless device
    (Lukas Wunner)

  - Unify pciehp controller & slot structs (Lukas Wunner)

  - Constify hotplug_slot_ops (Lukas Wunner)

  - Drop hotplug_slot_info (Lukas Wunner)

  - Embed hotplug_slot struct into users instead of allocating it
    separately (Lukas Wunner)

  - Initialize PCIe port service drivers directly instead of relying on
    initcall ordering (Keith Busch)

  - Restore PCI config state after a slot reset (Keith Busch)

  - Save/restore DPC config state along with other PCI config state (Keith
    Busch)

  - Reference count devices during AER handling to avoid race issue with
    concurrent hot removal (Keith Busch)

  - If an Upstream Port reports ERR_FATAL, don't try to read the Port's
    config space because it is probably unreachable (Keith Busch)

  - During error handling, use slot-specific reset instead of secondary
    bus reset to avoid link up/down issues on hotplug ports (Keith Busch)

  - Restore previous AER/DPC handling that does not remove and re-enumerate
    devices on ERR_FATAL (Keith Busch)

  - Notify all drivers that may be affected by error recovery resets (Keith
    Busch)

  - Always generate error recovery uevents, even if a driver doesn't have
    error callbacks (Keith Busch)

  - Make PCIe link active reporting detection generic (Keith Busch)

  - Support D3cold in PCIe hierarchies during system sleep and runtime,
    including hotplug and Thunderbolt ports (Mika Westerberg)

  - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
    are empty or occupied (Jon Derrick)

  - Remove duplicated include from pci/pcie/err.c and unused variable from
    cpqphp (YueHaibing)

  - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
    Pawandeep)

  - Uninline PCI bus accessors for better ftracing (Keith Busch)

  - Remove unused AER Root Port .error_resume method (Keith Busch)

  - Use kfifo in AER instead of a local version (Keith Busch)

  - Use threaded IRQ in AER bottom half (Keith Busch)

  - Use managed resources in AER core (Keith Busch)

  - Reuse pcie_port_find_device() for AER injection (Keith Busch)

  - Abstract AER interrupt handling to disconnect error injection (Keith
    Busch)

  - Refactor AER injection callbacks to simplify future improvments (Keith
    Busch)

* pci/hotplug:
  PCI/AER: Refactor error injection fallbacks
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Use managed resource allocations
  PCI/AER: Use threaded IRQ for bottom half
  PCI/AER: Use kfifo_in_spinlocked() to insert locked elements
  PCI/AER: Use kfifo for tracking events instead of reimplementing it
  PCI/AER: Remove error source from AER struct aer_rpc
  PCI/AER: Remove unused aer_error_resume()
  PCI: Uninline PCI bus accessors for better ftracing
  PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls
  PCI: pnv_php: Use kmemdup()
  PCI: cpqphp: Remove set but not used variable 'physical_slot'
  PCI/ERR: Remove duplicated include from err.c
  PCI: Equalize hotplug memory and io for occupied and empty slots
  PCI / ACPI: Whitelist D3 for more PCIe hotplug ports
  ACPI / property: Allow multiple property compatible _DSD entries
  PCI/PME: Implement runtime PM callbacks
  PCI: pciehp: Implement runtime PM callbacks
  PCI/portdrv: Add runtime PM hooks for port service drivers
  PCI/portdrv: Resume upon exit from system suspend if left runtime suspended
  PCI: pciehp: Do not handle events if interrupts are masked
  PCI: pciehp: Disable hotplug interrupt during suspend
  PCI / ACPI: Enable wake automatically for power managed bridges
  PCI: Do not skip power-managed bridges in pci_enable_wake()
  PCI: Make link active reporting detection generic
  PCI: Unify device inaccessible
  PCI/ERR: Always report current recovery status for udev
  PCI/ERR: Simplify broadcast callouts
  PCI/ERR: Run error recovery callbacks for all affected devices
  PCI/ERR: Handle fatal error recovery
  PCI/ERR: Use slot reset if available
  PCI/AER: Don't read upstream ports below fatal errors
  PCI/AER: Take reference on error devices
  PCI/DPC: Save and restore config state
  PCI: portdrv: Restore PCI config state on slot reset
  PCI: portdrv: Initialize service drivers directly
  PCI: hotplug: Document TODOs
  PCI: hotplug: Embed hotplug_slot
  PCI: hotplug: Drop hotplug_slot_info
  PCI: hotplug: Constify hotplug_slot_ops
  PCI: pciehp: Reshuffle controller struct for clarity
  PCI: pciehp: Rename controller struct members for clarity
  PCI: pciehp: Unify controller and slot structs
  PCI: pciehp: Tolerate Presence Detect hardwired to zero
  PCI: pciehp: Drop hotplug_slot_ops wrappers
  PCI: pciehp: Drop unnecessary includes
  PCI: pciehp: Differentiate between surprise and safe removal
  PCI: Simplify disconnected marking
2018-10-20 11:45:29 -05:00
Greg Kroah-Hartman
b0d04fb56b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
  "x86 fixes:

   It's 4 misc fixes, 3 build warning fixes and 3 comment fixes.

   In hindsight I'd have left out the 3 comment fixes to make the pull
   request look less scary at such a late point in the cycle. :-/"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels
  x86/fpu: Fix i486 + no387 boot crash by only saving FPU registers on context switch if there is an FPU
  x86/fpu: Remove second definition of fpu in __fpu__restore_sig()
  x86/entry/64: Further improve paranoid_entry comments
  x86/entry/32: Clear the CS high bits
  x86/boot: Add -Wno-pointer-sign to KBUILD_CFLAGS
  x86/time: Correct the attribute on jiffies' definition
  x86/entry: Add some paranoid entry/exit CR3 handling comments
  x86/percpu: Fix this_cpu_read()
  x86/tsc: Force inlining of cyc2ns bits
2018-10-20 15:04:23 +02:00
Alexey Kardashevskiy
6e301a8e56 KVM: PPC: Optimize clearing TCEs for sparse tables
The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE
table and a table with userspace addresses. These tables are radix trees,
we allocate indirect levels when they are written to. Since
the memory allocation is problematic in real mode, we have 2 accessors
to the entries:
- for virtual mode: it allocates the memory and it is always expected
to return non-NULL;
- fr real mode: it does not allocate and can return NULL.

Also, DMA windows can span to up to 55 bits of the address space and since
we never have this much RAM, such windows are sparse. However currently
the SPAPR TCE IOMMU driver walks through all TCEs to unpin DMA memory.

Since we maintain a userspace addresses table for VFIO which is a mirror
of the hardware table, we can use it to know which parts of the DMA
window have not been mapped and skip these so does this patch.

The bare metal systems do not have this problem as they use a bypass mode
of a PHB which maps RAM directly.

This helps a lot with sparse DMA windows, reducing the shutdown time from
about 3 minutes per 1 billion TCEs to a few seconds for 32GB sparse guest.
Just skipping the last level seems to be good enough.

As non-allocating accessor is used now in virtual mode as well, rename it
from IOMMU_TABLE_USERSPACE_ENTRY_RM (real mode) to _RO (read only).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-20 20:47:02 +11:00
Christophe Leroy
daf00ae71d powerpc/traps: restore recoverability of machine_check interrupts
commit b96672dd84 ("powerpc: Machine check interrupt is a non-
maskable interrupt") added a call to nmi_enter() at the beginning of
machine check restart exception handler. Due to that, in_interrupt()
always returns true regardless of the state before entering the
exception, and die() panics even when the system was not already in
interrupt.

This patch calls nmi_exit() before calling die() in order to restore
the interrupt state we had before calling nmi_enter()

Fixes: b96672dd84 ("powerpc: Machine check interrupt is a non-maskable interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Nicholas Piggin
b851ba02a6 powerpc/64/module: REL32 relocation range check
The recent module relocation overflow crash demonstrated that we
have no range checking on REL32 relative relocations. This patch
implements a basic check, the same kernel that previously oopsed
and rebooted now continues with some of these errors when loading
the module:

  module_64: x_tables: REL32 527703503449812 out of range!

Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have
overflow checks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Nicholas Piggin
dd76ff5af3 powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
0d923962ab powerpc/mm: Fix page table dump to work on Radix
When we're running on Book3S with the Radix MMU enabled the page table
dump currently prints the wrong addresses because it uses the wrong
start address.

Fix it to use PAGE_OFFSET rather than KERN_VIRT_START.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
afb6d0647f powerpc/mm/radix: Display if mappings are exec or not
At boot we print the ranges we've mapped for the linear mapping and
what page size we've used. Also track whether the range is mapped
executable or not and display that as well.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
232aa40763 powerpc/mm/radix: Simplify split mapping logic
If we look closely at the logic in create_physical_mapping(), when
we're doing STRICT_KERNEL_RWX, we do the following steps:
  - determine the gap from where we are to the end of the range
  - choose an appropriate mapping_size based on the gap
  - check if that mapping_size would overlap the __init_begin
    boundary, and if not choose an appropriate mapping_size

We can simplify the logic by taking the __init_begin boundary into
account when we calculate the initial gap.

So add a next_boundary() function which tells us what the next
boundary is, either the __init_begin boundary or end. In future we can
add more boundaries.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
57306c663d powerpc/mm/radix: Remove the retry in the split mapping logic
When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel
text read only.

The current logic uses a goto inside the for loop, which works, but is
hard to reason about.

When we hit the goto retry case we set max_mapping_size to PMD_SIZE
and go back to the start.

Setting max_mapping_size means we skip the PUD case and go to the PMD
case.

We know we will pass the alignment and gap checks because the only
reason we are there is we hit the goto retry, and that is guarded by
mapping_size == PUD_SIZE, which means addr is PUD aligned and gap is
greater or equal to PUD_SIZE.

So the only part of the check that can fail is the mmu_psize_defs
check for the 2M page size.

If we just duplicate that check we can avoid the goto, and we get the
same result.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
81d1b54dec powerpc/mm/radix: Fix small page at boundary when splitting
When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel
text read only.

Currently we always use a small page at the text/data boundary, even
when that's not necessary:

  Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
  Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
  Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages

This is because the check that the mapping crosses the __init_begin
boundary is too strict, it also returns true when we map exactly up to
the boundary.

So fix it to check that the mapping would actually map past
__init_begin, and with that we see:

  Mapped 0x0000000000000000-0x0000000040000000 with 2.00 MiB pages
  Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
3b5657ed5b powerpc/mm/radix: Fix overuse of small pages in splitting logic
When we have CONFIG_STRICT_KERNEL_RWX enabled, we want to split the
linear mapping at the text/data boundary so we can map the kernel text
read only.

But the current logic uses small pages for the entire text section,
regardless of whether a larger page size would fit. eg. with the
boundary at 16M we could use 2M pages, but instead we use 64K pages up
to the 16M boundary:

  Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
  Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
  Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages

This is because the test is checking if addr is < __init_begin
and addr + mapping_size is >= _stext. But that is true for all pages
between _stext and __init_begin.

Instead what we want to check is if we are crossing the text/data
boundary, which is at __init_begin. With that fixed we see:

  Mapped 0x0000000000000000-0x0000000000e00000 with 2.00 MiB pages
  Mapped 0x0000000000e00000-0x0000000001000000 with 64.0 KiB pages
  Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
  Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages

ie. we're correctly using 2MB pages below __init_begin, but we still
drop down to 64K pages unnecessarily at the boundary.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Michael Ellerman
5c6499b704 powerpc/mm/radix: Fix off-by-one in split mapping logic
When we have CONFIG_STRICT_KERNEL_RWX enabled, we try to split the
kernel linear (1:1) mapping so that the kernel text is in a separate
page to kernel data, so we can mark the former read-only.

We could achieve that just by always using 64K pages for the linear
mapping, but we try to be smarter. Instead we use huge pages when
possible, and only switch to smaller pages when necessary.

However we have an off-by-one bug in that logic, which causes us to
calculate the wrong boundary between text and data.

For example with the end of the kernel text at 16M we see:

  radix-mmu: Mapped 0x0000000000000000-0x0000000001200000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000001200000-0x0000000040000000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages

ie. we mapped from 0 to 18M with 64K pages, even though the boundary
between text and data is at 16M.

With the fix we see we're correctly hitting the 16M boundary:

  radix-mmu: Mapped 0x0000000000000000-0x0000000001000000 with 64.0 KiB pages
  radix-mmu: Mapped 0x0000000001000000-0x0000000040000000 with 2.00 MiB pages
  radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Naveen N. Rao
67361cf807 powerpc/ftrace: Handle large kernel configs
Currently, we expect to be able to reach ftrace_caller() from all
ftrace-enabled functions through a single relative branch. With large
kernel configs, we see functions outside of 32MB of ftrace_caller()
causing ftrace_init() to bail.

In such configurations, gcc/ld emits two types of trampolines for mcount():
1. A long_branch, which has a single branch to mcount() for functions that
   are one hop away from mcount():
	c0000000019e8544 <00031b56.long_branch._mcount>:
	c0000000019e8544:	4a 69 3f ac 	b       c00000000007c4f0 <._mcount>

2. A plt_branch, for functions that are farther away from mcount():
	c0000000051f33f8 <0008ba04.plt_branch._mcount>:
	c0000000051f33f8:	3d 82 ff a4 	addis   r12,r2,-92
	c0000000051f33fc:	e9 8c 04 20 	ld      r12,1056(r12)
	c0000000051f3400:	7d 89 03 a6 	mtctr   r12
	c0000000051f3404:	4e 80 04 20 	bctr

We can reuse those trampolines for ftrace if we can have those
trampolines go to ftrace_caller() instead. However, with ABIv2, we
cannot depend on r2 being valid. As such, we use only the long_branch
trampolines by patching those to instead branch to ftrace_caller or
ftrace_regs_caller.

In addition, we add additional trampolines around .text and .init.text
to catch locations that are covered by the plt branches. This allows
ftrace to work with most large kernel configurations.

For now, we always patch the trampolines to go to ftrace_regs_caller,
which is slightly inefficient. This can be optimized further at a later
point.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Aneesh Kumar K.V
dd0e144a63 powerpc/mm: Fix WARN_ON with THP NUMA migration
WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
 Modules linked in:
 CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
 NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
 REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
 MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
 CFAR: c000000000484ee8 IRQMASK: 0
 GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
 GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
 GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
 GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
 GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
 GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
 GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
 GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
 NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
 LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
 Call Trace:
 [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
 [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
 [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
 [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
 [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
 [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
 [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
 [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
 [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
 [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
 [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
 [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
 [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
 [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
 [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
 [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80

We removed the pte_protnone check earlier with the understanding that we
mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
autonuma still use the set_pmd_at directly. This is ok because a protnone pte
won't have translation cache in TLB.

Fixes: da7ad366b4 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
51eeef9e13 powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected
If CONFIG_PPC_SPLPAR is not selected, steal_time will always
be NUL, so accounting it is pointless

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
abcff86df2 powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64
scaled cputime is only meaningfull when the processor has
SPURR and/or PURR, which means only on PPC64.

Removing it on PPC32 significantly reduces the size of
vtime_account_system() and vtime_account_idle() on an 8xx:

Before:
00000000 l     F .text	000000a8 vtime_delta
00000280 g     F .text	0000010c vtime_account_system
0000038c g     F .text	00000048 vtime_account_idle

After:
(vtime_delta gets inlined inside the two functions)
000001d8 g     F .text	000000a0 vtime_account_system
00000278 g     F .text	00000038 vtime_account_idle

In terms of performance, we also get approximatly 7% improvement on
task switch. The following small benchmark app is run with perf stat:

void *thread(void *arg)
{
	int i;

	for (i = 0; i < atoi((char*)arg); i++)
		pthread_yield();
}

int main(int argc, char **argv)
{
	pthread_t th1, th2;

	pthread_create(&th1, NULL, thread, argv[1]);
	pthread_create(&th2, NULL, thread, argv[1]);
	pthread_join(th1, NULL);
	pthread_join(th2, NULL);

	return 0;
}

Before the patch:

 Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):

       8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
            200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )

After the patch:

 Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):

       7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
            200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
b38a181c11 powerpc/time: isolate scaled cputime accounting in dedicated functions.
scaled cputime is only meaningfull when the processor has
SPURR and/or PURR, which means only on PPC64.

In preparation of the following patch that will remove
CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves
all scaled cputing accounting logic into dedicated functions.

This patch doesn't change any functionality. It's only code
reorganisation.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
fb978ca207 powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()
Generic implementation fails to remove breakpoints after init
when CONFIG_STRICT_KERNEL_RWX is selected:

[   13.251285] KGDB: BP remove failed: c001c338
[   13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa
[   13.268969] KGDB: re-enter exception: ALL breakpoints killed
[   13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 #860
[   13.282836] Call Trace:
[   13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable)
[   13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[   13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700
[   13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4
[   13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4
[   13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158
[   13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c
[   13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720
[   13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc
[   13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac
[   13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4
[   13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
[   13.356346] Kernel panic - not syncing: Recursive entry to debugger

This patch creates powerpc specific version of
kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint()
using patch_instruction()

Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
6beb3381b1 powerpc/sysdev/ipic: check primary_ipic NULL pointer before using it
ipic_get_mcp_status() is used by targets implementing NMI
watchdog in target specific machine check handler in order
to known whether a machine check results from a watchdog
NMI reset.

In case of very early machine check, primary_ipic pointer
might not have been set yet, so ipic_get_mcp_status() needs
to check it for nullity before using it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
37e9c674e7 powerpc/mm: fix always true/false warning in slice.c
This patch fixes the following warnings (obtained with make W=1).

arch/powerpc/mm/slice.c: In function 'slice_range_to_mask':
arch/powerpc/mm/slice.c:73:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
  if (start < SLICE_LOW_TOP) {
            ^
arch/powerpc/mm/slice.c:81:20: error: comparison is always false due to limited range of data type [-Werror=type-limits]
  if ((start + len) > SLICE_LOW_TOP) {
                    ^
arch/powerpc/mm/slice.c: In function 'slice_mask_for_free':
arch/powerpc/mm/slice.c:136:17: error: comparison is always true due to limited range of data type [-Werror=type-limits]
  if (high_limit <= SLICE_LOW_TOP)
                 ^
arch/powerpc/mm/slice.c: In function 'slice_check_range_fits':
arch/powerpc/mm/slice.c:185:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
  if (start < SLICE_LOW_TOP) {
            ^
arch/powerpc/mm/slice.c:195:39: error: comparison is always false due to limited range of data type [-Werror=type-limits]
  if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
                                       ^
arch/powerpc/mm/slice.c: In function 'slice_scan_available':
arch/powerpc/mm/slice.c:306:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
  if (addr < SLICE_LOW_TOP) {
           ^
arch/powerpc/mm/slice.c: In function 'get_slice_psize':
arch/powerpc/mm/slice.c:709:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
  if (addr < SLICE_LOW_TOP) {
           ^

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
aa5456abdc powerpc/mm: fix missing prototypes in slice.c
This patch fixes the following warnings (obtained with make W=1).

arch/powerpc/mm/slice.c: At top level:
arch/powerpc/mm/slice.c:682:15: error: no previous prototype for 'arch_get_unmapped_area' [-Werror=missing-prototypes]
 unsigned long arch_get_unmapped_area(struct file *filp,
               ^
arch/powerpc/mm/slice.c:692:15: error: no previous prototype for 'arch_get_unmapped_area_topdown' [-Werror=missing-prototypes]
 unsigned long arch_get_unmapped_area_topdown(struct file *filp,
               ^

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
8114c36ea6 powerpc/mm: Trace tlbia instruction
Add a trace point for tlbia (Translation Lookaside Buffer Invalidate
All) instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
cf4a608515 powerpc/mm: Add missing tracepoint for tlbie
commit 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
added tracepoints for tlbie calls, but _tlbil_va() was forgotten

Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
3ff38e1874 powerpc/book3s64: fix dump_linuxpagetables "present" flag
Since commit bd0dbb73e0 ("powerpc/mm/books3s: Add new pte bit to
mark pte temporarily invalid."), _PAGE_PRESENT doesn't mean exactly
that a page is present. A page is also considered preset when
_PAGE_INVALID is set.

This patch changes the meaning of "present" and adds a status "valid"
associated to the _PAGE_PRESENT flag.

Fixes: bd0dbb73e0 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Aravinda Prasad
c6c26fb55e powerpc/pseries: Export raw per-CPU VPA data via debugfs
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.

v3: Removed offline CPU check.

v2: Included offline CPU check and other review comments.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00