The sketchy bypass uses 256M pages so add this page size as well.
This should cause no behavioral change but will be used later.
Fixes: 477afd6ea6 "powerpc/ioda: Use ibm,supported-tce-sizes for IOMMU page size mask"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Memory reservation for crashkernel could fail if there are holes around
kdump kernel offset (128M). Fail gracefully in such cases and print an
error message.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: David Gibson <dgibson@redhat.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simple overlapping changes in stmmac driver.
Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
debugfs doesn't support mmap(), so this code is never used.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.
Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in other
words we can create a window as big as 1<<58 but DMA simply won't work.
This reduces the upper limit from 59 to 55 bits to let the userspace know
about the hardware limits.
Fixes: 7aafac11e3 "powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 59f47eff03 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper")
removed the 'oirq' variable, but kept memsetting it when the DEBUG macro is
defined.
When setting DEBUG macro for debugging purpose, the kernel fails to build since
'oirq' is not defined anymore.
This patch simply remove the debug block, since it does not seem to sense
now.
Fixes: 59f47eff03 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The workaround has been removed. What stays is just code to find the
memory hole so the BATs can be configured properly in the function below.
Fixes: 57deb8fea0 ("powerpc/wii: Don't rely on the reserved memory hack")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
get_monotonic_boottime() is deprecated, and may not be safe to call in
every context, as it has to read a hardware clocksource.
This changes xmon to print the time using ktime_get_coarse_boottime64()
instead, which avoids the old timespec type and the HW access.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Due to recent refactoring in EEH in:
commit b9fde58db7 ("powerpc/powernv: Rework EEH initialization on
powernv")
a misleading message was seen in the kernel message buffer:
[ 0.108431] EEH: PowerNV platform initialized
[ 0.589979] EEH: No capable adapters found
This happened due to the removal of the initialization delay for powernv
platform.
Even though the EEH infrastructure for the devices is eventually
initialized and still works just fine the eeh device probe step is
postponed in order to assure the PEs are created. Later
pnv_eeh_post_init does the probe devices job but at that point the
message was already shown right after eeh_init flow.
This patch introduces a new flag EEH_POSTPONED_PROBE to represent that
temporary state and avoid the message mentioned above and showing the
follow one instead:
[ 0.107724] EEH: PowerNV platform initialized
[ 4.844825] EEH: PCI Enhanced I/O Error Handling Enabled
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Tested-by:Venkat Rao B <vrbagal1@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull Kbuild fixes from Masahiro Yamada:
- introduce __diag_* macros and suppress -Wattribute-alias warnings
from GCC 8
- fix stack protector test script for x86_64
- fix line number handling in Kconfig
- document that '#' starts a comment in Kconfig
- handle P_SYMBOL property in dump debugging of Kconfig
- correct help message of LD_DEAD_CODE_DATA_ELIMINATION
- fix occasional segmentation faults in Kconfig
* tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: loop boundary condition fix
kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
kconfig: handle P_SYMBOL in print_symbol()
kconfig: document Kconfig source file comments
kconfig: fix line numbers for if-entries in menu tree
stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
powerpc: Remove -Wattribute-alias pragmas
disable -Wattribute-alias warning for SYSCALL_DEFINEx()
kbuild: add macro for controlling warnings to linux/compiler.h
Pull powerpc fixes from Michael Ellerman:
"Two regression fixes, and a new syscall wire-up:
- A fix for the recent conversion to time64_t in the powermac RTC
routines, which caused time to go backward.
- Another fix for fallout from the split PMD PTL conversion.
- Wire up the new io_pgetevents() syscall.
Thanks to: Aneesh Kumar K.V, Arnd Bergmann, Breno Leitao, Mathieu
Malaterre"
* tag 'powerpc-4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powermac: Fix rtc read/write functions
powerpc/mm/32: Fix pgtable_page_dtor call
powerpc: Wire up io_pgetevents
As Mathieu pointed out, my conversion to time64_t was incorrect and
resulted in negative times to be read from the RTC. The problem is
that during the conversion from a byte array to a time64_t, the
'unsigned char' variable holding the top byte gets turned into a
negative signed 32-bit integer before being assigned to the 64-bit
variable for any times after 1972.
This changes the logic to cast to an unsigned 32-bit number first for
the Macintosh time and then convert that to the Unix time, which then
gives us a time in the documented 1904..2040 year range. I decided not
to use the longer 1970..2106 range that other drivers use, for
consistency with the literal interpretation of the register, but that
could be easily changed if we decide we want to support any Mac after
2040.
Just to be on the safe side, I'm also adding a WARN_ON that will
trigger if either the year 2040 has come and is observed by this
driver, or we run into an RTC that got set back to a pre-1970 date for
some reason (the two are indistinguishable).
For the RTC write functions, Andreas found another problem: both
pmu_request() and cuda_request() are varargs functions, so changing
the type of the arguments passed into them from 32 bit to 64 bit
breaks the API for the set_rtc_time functions. This changes it back to
32 bits.
The same code exists in arch/m68k/ and is patched in an identical way
now in a separate patch.
Fixes: 5bfd643583 ("powerpc: use time64_t in read_persistent_clock")
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
added a call for pgtable_page_dtor in the rcu page table free routine. We missed
the fact that for 32 bit platforms we did call the 'dtor' early. Drop the extra
call for pgtable_page_dtor. We remove the call from __pte_free_tlb so that we
do the page table free and 'dtor' call together. This should help when we
switch these platforms to pte fragments.
Fixes: 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch is to move ptp timer node out of fman.
Because ptp timer will be probed by ptp_qoriq driver,
it should be an independent device in case of conflict
memory mapping.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With SYSCALL_DEFINEx() disabling -Wattribute-alias generically, there's
no need to duplicate that for PowerPC syscalls.
This reverts commit 4155203739 ("powerpc: fix build failure by
disabling attribute-alias warning in pci_32") and commit 2479bfc9bc
("powerpc: Fix build by disabling attribute-alias warning for
SYSCALL_DEFINEx").
Signed-off-by: Paul Burton <paul.burton@mips.com>
Acked-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull rseq fixes from Thomas Gleixer:
"A pile of rseq related fixups:
- Prevent infinite recursion when delivering SIGSEGV
- Remove the abort of rseq critical section on fork() as syscalls
inside rseq critical sections are explicitely forbidden. So no
point in doing the abort on the child.
- Align the rseq structure on 32 bytes in the ARM selftest code.
- Fix file permissions of the test script"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Avoid infinite recursion when delivering SIGSEGV
rseq/cleanup: Do not abort rseq c.s. in child on fork()
rseq/selftests/arm: Align 'struct rseq_cs' on 32 bytes
rseq/selftests: Make run_param_test.sh executable
Wire up io_pgetevents system call on powerpc.
io_pgetevents is a new syscall to read asynchronous I/O events from the
completion queue.
Tested with libaio branch aio-poll[1] and the io_pgetevents test (#22) passed
on both ppc64 LE and BE modes.
[1] https://pagure.io/libaio/branch/aio-poll
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When delivering a signal to a task that is using rseq, we call into
__rseq_handle_notify_resume() so that the registers pushed in the
sigframe are updated to reflect the state of the restartable sequence
(for example, ensuring that the signal returns to the abort handler if
necessary).
However, if the rseq management fails due to an unrecoverable fault when
accessing userspace or certain combinations of RSEQ_CS_* flags, then we
will attempt to deliver a SIGSEGV. This has the potential for infinite
recursion if the rseq code continuously fails on signal delivery.
Avoid this problem by using force_sigsegv() instead of force_sig(), which
is explicitly designed to reset the SEGV handler to SIG_DFL in the case
of a recursive fault. In doing so, remove rseq_signal_deliver() from the
internal rseq API and have an optional struct ksignal * parameter to
rseq_handle_notify_resume() instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: peterz@infradead.org
Cc: paulmck@linux.vnet.ibm.com
Cc: boqun.feng@gmail.com
Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
The conditional inc/dec ops differ for atomic_t and atomic64_t:
- atomic_inc_unless_positive() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_unless_negative() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_if_positive is optional for atomic_t, and is mandatory for atomic64_t.
Let's make these consistently optional for both. At the same time, let's
clean up the existing fallbacks to use atomic_try_cmpxchg().
The instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-18-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the atomics return the result of a test applied after the atomic
operation, and almost all architectures implement these as trivial
wrappers around the underlying atomic. Specifically:
* <atomic>_inc_and_test(v) is (<atomic>_inc_return(v) == 0)
* <atomic>_dec_and_test(v) is (<atomic>_dec_return(v) == 0)
* <atomic>_sub_and_test(i, v) is (<atomic>_sub_return(i, v) == 0)
* <atomic>_add_negative(i, v) is (<atomic>_add_return(i, v) < 0)
Rather than have these definitions duplicated in all architectures, with
minor inconsistencies in formatting and documentation, let's make these
operations optional, with default fallbacks as above. Implementations
must now provide a preprocessor symbol.
The instrumented atomics are updated accordingly.
Both x86 and m68k have custom implementations, which are left as-is,
given preprocessor symbols to avoid being overridden.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Several architectures these have a near-identical implementation based
on atomic_read() and atomic_cmpxchg() which we can instead define in
<linux/atomic.h>, so let's do so, using something close to the existing
x86 implementation with try_cmpxchg().
Where an architecture provides its own atomic_fetch_add_unless(), it
must define a preprocessor symbol for it. The instrumented atomics are
updated accordingly.
Note that arch/arc's existing atomic_fetch_add_unless() had redundant
barriers, as these are already present in its atomic_cmpxchg()
implementation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We define a trivial fallback for atomic_inc_not_zero(), but don't do
the same for atomic64_inc_not_zero(), leading most architectures to
define the same boilerplate.
Let's add a fallback in <linux/atomic.h>, and remove the redundant
implementations. Note that atomic64_add_unless() is always defined in
<linux/atomic.h>, and promotes its arguments to the requisite types, so
we need not do this explicitly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While __atomic_add_unless() was originally intended as a building-block
for atomic_add_unless(), it's now used in a number of places around the
kernel. It's the only common atomic operation named __atomic*(), rather
than atomic_*(), and for consistency it would be better named
atomic_fetch_add_unless().
This lack of consistency is slightly confusing, and gets in the way of
scripting atomics. Given that, let's clean things up and promote it to
an official part of the atomics API, in the form of
atomic_fetch_add_unless().
This patch converts definitions and invocations over to the new name,
including the instrumented version, using the following script:
----
git grep -w __atomic_add_unless | while read line; do
sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
done
git grep -w __arch_atomic_add_unless | while read line; do
sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
done
----
Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
be introduced by later patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.
Update pgtable_free to handle hugetlb hugepd directory table.
Fixes: 0c4d268029 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If possible CPUs are limited (e.g., by kexec), then the kvm prefetch
workaround function can access the paca pointer for a !possible CPU.
Fixes: d2e60075a3 ("powerpc/64: Use array of paca pointers and allocate pacas individually")
Cc: stable@kernel.org
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I broke the build when CONFIG_NMI_IPI=n with my recent commit to add
arch_trigger_cpumask_backtrace(), eg:
stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'
We should rework the CONFIG symbols here in future to avoid these
double barrelled ifdefs but for now they fix the build.
Fixes: 5cc05910f2 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Reported-by: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Similar to previous patches, hard disable interrupts when a CPU is
in panic. This reduces the chance the watchdog has to interfere with
the panic, and avoids any other type of masked interrupt being
executed when crashing which minimises the length of the crash path.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Marking CPUs stopped by smp_send_stop as offline can cause warnings
due to cross-CPU wakeups. This trace was noticed on a busy system
running a sysrq+c crash test, after the injected crash:
WARNING: CPU: 51 PID: 1546 at kernel/sched/core.c:1179 set_task_cpu+0x22c/0x240
CPU: 51 PID: 1546 Comm: kworker/u352:1 Tainted: G D
Workqueue: mlx5e mlx5e_update_stats_work [mlx5_core]
[...]
NIP [c00000000017c21c] set_task_cpu+0x22c/0x240
LR [c00000000017d580] try_to_wake_up+0x230/0x720
Call Trace:
[c000000001017700] runqueues+0x0/0xb00 (unreliable)
[c00000000017d580] try_to_wake_up+0x230/0x720
[c00000000015a214] insert_work+0x104/0x140
[c00000000015adb0] __queue_work+0x230/0x690
[c000003fc5007910] [c00000000015b26c] queue_work_on+0x5c/0x90
[c0080000135fc8f8] mlx5_cmd_exec+0x538/0xcb0 [mlx5_core]
[c008000013608fd0] mlx5_core_access_reg+0x140/0x1d0 [mlx5_core]
[c00800001362777c] mlx5e_update_pport_counters.constprop.59+0x6c/0x90 [mlx5_core]
[c008000013628868] mlx5e_update_ndo_stats+0x28/0x90 [mlx5_core]
[c008000013625558] mlx5e_update_stats_work+0x68/0xb0 [mlx5_core]
[c00000000015bcec] process_one_work+0x1bc/0x5f0
[c00000000015ecac] worker_thread+0xac/0x6b0
[c000000000168338] kthread+0x168/0x1b0
[c00000000000b628] ret_from_kernel_thread+0x5c/0xb4
This happens because firstly the CPU is not really offline in the
usual sense, processes and interrupts have not been migrated away.
Secondly smp_send_stop does not happen atomically on all CPUs, so
one CPU can have marked itself offline, while another CPU is still
running processes or interrupts which can affect the first CPU.
Fix this by just not marking the CPU as offline. It's more like
frozen in time, so offline does not really reflect its state properly
anyway. There should be nothing in the crash/panic path that walks
online CPUs and synchronously waits for them, so this change should
not introduce new hangs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Similarly to commit 855bfe0de1 ("powerpc: hard disable irqs in
smp_send_stop loop"), irqs should be hard disabled by
panic_smp_self_stop.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the device tree CPU features quirk code we want to set
CPU_FTR_POWER9_DD2_1 on all Power9s that aren't DD2.0 or earlier. But
we got the logic wrong and instead set it on all CPUs that aren't
Power9 DD2.0 or earlier, ie. including Power8.
Fix it by making sure we're on a Power9. This isn't a bug in practice
because the only code that checks the feature is Power9 only to begin
with. But we'll backport it anyway to avoid confusion.
Fixes: 9e9626ed3a ("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features")
Cc: stable@vger.kernel.org # v4.17+
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch 99baac21e4 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") added a force flush mode to the mmu_gather flush, which
unconditionally flushes the entire address range being invalidated
(even if actual ptes only covered a smaller range), to solve a problem
with concurrent threads invalidating the same PTEs causing them to
miss TLBs that need flushing.
This does not work with powerpc that invalidates mmu_gather batches
according to page size. Have powerpc flush all possible page sizes in
the range if it encounters this concurrency condition.
Patch 4647706ebe ("mm: always flush VMA ranges affected by
zap_page_range") does add a TLB flush for all page sizes on powerpc for
the zap_page_range case, but that is to be removed and replaced with
the mmu_gather flush to avoid redundant flushing. It is also thought to
not cover other obscure race conditions:
https://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
Hash does not have a problem because it invalidates TLBs inside the
page table locks.
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull more kvm updates from Paolo Bonzini:
"Mostly the PPC part of the release, but also switching to Arnd's fix
for the hyperv config issue and a typo fix.
Main PPC changes:
- reimplement the MMIO instruction emulation
- transactional memory support for PR KVM
- improve radix page table handling"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
KVM: x86: fix typo at kvm_arch_hardware_setup comment
KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
KVM: PPC: Book3S PR: Handle additional interrupt types
KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
KVM: PPC: Book3S PR: Add emulation for trechkpt.
KVM: PPC: Book3S PR: Add emulation for treclaim.
KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
...
Pull more Kbuild updates from Masahiro Yamada:
- fix some bugs introduced by the recent Kconfig syntax extension
- add some symbols about compiler information in Kconfig, such as
CC_IS_GCC, CC_IS_CLANG, GCC_VERSION, etc.
- test compiler capability for the stack protector in Kconfig, and
clean-up Makefile
- test compiler capability for GCC-plugins in Kconfig, and clean-up
Makefile
- allow to enable GCC-plugins for COMPILE_TEST
- test compiler capability for KCOV in Kconfig and correct dependency
- remove auto-detect mode of the GCOV format, which is now more nicely
handled in Kconfig
- test compiler capability for mprofile-kernel on PowerPC, and clean-up
Makefile
- misc cleanups
* tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
linux/linkage.h: replace VMLINUX_SYMBOL_STR() with __stringify()
kconfig: fix localmodconfig
sh: remove no-op macro VMLINUX_SYMBOL()
powerpc/kbuild: move -mprofile-kernel check to Kconfig
Documentation: kconfig: add recommended way to describe compiler support
gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST
gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
gcc-plugins: test plugin support in Kconfig and clean up Makefile
gcc-plugins: move GCC version check for PowerPC to Kconfig
kcov: test compiler capability in Kconfig and correct dependency
gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT
arm64: move GCC version check for ARCH_SUPPORTS_INT128 to Kconfig
kconfig: add CC_IS_CLANG and CLANG_VERSION
kconfig: add CC_IS_GCC and GCC_VERSION
stack-protector: test compiler capability in Kconfig and drop AUTO mode
kbuild: fix endless syncconfig in case arch Makefile sets CROSS_COMPILE