Commit Graph

4521 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
e27e0a9465 powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Nicholas Piggin
5a6099346c powerpc/64s/radix: tlb do not flush on page size when fullmm
When the mm is being torn down there will be a full PID flush, so
there is no need to flush the TLB on page size changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:26 +10:00
Paolo Bonzini
d2ce98ca0a Merge tag 'v4.18-rc6' into HEAD
Pull bug fixes into the KVM development tree to avoid nasty conflicts.
2018-08-06 17:31:36 +02:00
Frederic Barrat
cca19f0b68 powerpc/64s/radix: Fix missing global invalidations when removing copro
With the optimizations for TLB invalidation from commit 0cef77c779
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.

When calling mm_context_remove_copro(), the 'copros' counter is
decremented before flushing. This can have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations, breaking any nMMU user in a bad and unpredictable
way.

Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.

Fixes: 0cef77c779 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-01 23:23:41 +10:00
Shilpasri G Bhat
04baaf28f4 powerpc/powernv: Add support to enable sensor groups
Add support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that need to be copied to
main memory by the OCC. Sensor groups like power, temperature, current,
voltage, frequency and utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:45 +10:00
Akshay Adiga
1961acad2f powernv/cpuidle: Use parsed device tree values for cpuidle_init
Export pnv_idle_states and nr_pnv_idle_states so that they are accessible
to the cpuidle driver. Use properties from the pnv_idle_states structure
for the powernv cpuidle_init.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Akshay Adiga
9c7b185ab2 powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding the idle state to
be used for hotplug and once during cpuidle init. Parsing the device tree
once and caching the result reduces this code duplication. The parsing
code has been moved from pnv_probe_idle_states() to pnv_parse_cpuidle_dt().
In addition to the properties in the device tree, the number of available
states is also required.
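
A sketch of the kind of cached structure this describes (field names are
illustrative, based on the description above, not the verbatim kernel
definition):

#include <linux/types.h>

struct pnv_idle_states_t {
	char name[16];		/* e.g. "stop0" */
	u32 latency_ns;		/* exit latency from the device tree */
	u32 residency_ns;	/* minimum useful residency */
	u64 psscr_val;		/* PSSCR value requesting this state */
	u64 psscr_mask;		/* significant PSSCR bits */
	u32 flags;		/* OPAL_PM_* flags for the state */
	bool valid;		/* parsed successfully */
};

/* Exported so the cpuidle driver can reuse the parsed data. */
extern struct pnv_idle_states_t *pnv_idle_states;
extern int nr_pnv_idle_states;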

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Michael Ellerman
a984506c54 powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Paul Menzel reported that kmemleak was producing reports such as:

  unreferenced object 0xc0000000f8b80000 (size 16384):
    comm "init", pid 1, jiffies 4294937416 (age 312.240s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000d997deb7>] __pud_alloc+0x80/0x190
      [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
      [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
      [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
      [<0000000060871529>] load_elf_binary+0x41c/0x1648
      [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
      [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
      [<000000005f953a6e>] sys_execve+0x58/0x70
      [<000000009700a858>] system_call+0x5c/0x70

Indicating that a PUD was being leaked.

However what's really happening is that kmemleak is not able to
recognise the references from the PGD to the PUD, because they are not
fully qualified pointers.

We can confirm that in xmon, eg:

Find the task struct for pid 1 "init":
  0:mon> P
       task_struct     ->thread.ksp    PID   PPID S  P CMD
  c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd

Dump virtual address 0 to find the PGD:
  0:mon> dv 0 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000

Dump the memory of the PGD:
  0:mon> d c0000000f8b01000
  c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
  c0000000f8b01010 0000000000000000 0000000000000000  |................|
  c0000000f8b01020 0000000000000000 0000000000000000  |................|
  c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                    ^^^^^^^^^^^^^^^^

There we can see the reference to our supposedly leaked PUD. But
because it's missing the leading 0xc, kmemleak won't recognise it.

We can confirm it's still in use by translating an address that is
mapped via it:
  0:mon> dv 7fff94000000 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000
  pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
  pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
  pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
  ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
  Maps physical address = 0x00000001d5ce0000
  Flags = Accessed Dirty Read Write

The fix is fairly simple. We need to tell kmemleak to ignore PUD
allocations and never report them as leaks. We can also tell it not to
scan the PGD, because it will never find pointers in there. However it
will still notice if we allocate a PGD and then leak it.
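
A sketch of the shape of the fix, using the standard kmemleak hints
kmemleak_no_scan() and kmemleak_ignore(); the allocation helpers are
abbreviated from the powerpc headers:

#include <linux/kmemleak.h>

pgd_t *pgd_alloc(struct mm_struct *mm)
{
	pgd_t *pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
				      pgtable_gfp_flags(mm, GFP_KERNEL));

	/*
	 * The PGD's entries lack the leading 0xc, so kmemleak can't find
	 * pointers in it: don't scan it. It is still tracked, so leaking
	 * the PGD itself is still reported.
	 */
	kmemleak_no_scan(pgd);
	return pgd;
}

pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
	pud_t *pud = kmem_cache_alloc(PGT_CACHE(PUD_CACHE_INDEX),
				      pgtable_gfp_flags(mm, GFP_KERNEL));

	/*
	 * The PGD's reference to this PUD is invisible to kmemleak (see
	 * above), so tell kmemleak to never report the PUD as a leak.
	 */
	kmemleak_ignore(pud);
	return pud;
}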

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy
405cb4024e powerpc: split asm/tlbflush.h
Split asm/tlbflush.h into:
asm/nohash/tlbflush.h
asm/book3s/32/tlbflush.h
asm/book3s/64/tlbflush.h (already existing)
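
The resulting top-level asm/tlbflush.h then reduces to a dispatch on the
MMU family, along these lines (a sketch, not the verbatim header):

#ifndef _ASM_POWERPC_TLBFLUSH_H
#define _ASM_POWERPC_TLBFLUSH_H

#ifdef CONFIG_PPC_BOOK3S
#include <asm/book3s/tlbflush.h>
#else
#include <asm/nohash/tlbflush.h>
#endif

#endif /* _ASM_POWERPC_TLBFLUSH_H */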

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy
45ef5992e0 powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy
7bc396958c powerpc/44x: remove page.h from mmu-44x.h
mmu-44x.h doesn't need asm/page.h if the uses of PAGE_SHIFT are replaced by CONFIG_PPC_XX_PAGES

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy
0c295d0e9c powerpc/nohash: fix hash related comments in pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
62b8426578 powerpc: fix includes in asm/processor.h
Remove superfluous includes and add missing ones

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
6b62266911 powerpc/book3s: Remove PPC_PIN_SIZE
PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
b5ac51d747 powerpc: declare set_breakpoint() static
set_breakpoint() is only used in process.c so make it static

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy
e8cb7a55eb powerpc: remove superflous inclusions of asm/fixmap.h
Files not using fixmap consts or functions don't need asm/fixmap.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy
2c86cd188f powerpc: clean inclusions of asm/feature-fixups.h
Files not using feature fixups don't need asm/feature-fixups.h;
files that do use them need it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
5c35a02c54 powerpc: clean the inclusion of stringify.h
Only include linux/stringify.h in files using __stringify()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
ec0c464cdb powerpc: move ASM_CONST and stringify_in_c() into asm-const.h
This patch moves ASM_CONST() and stringify_in_c() into a
dedicated asm-const.h, then cleans up all related inclusions.
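
The new header is small; in essence (a sketch of asm-const.h):

#include <linux/stringify.h>

#ifdef __ASSEMBLY__
#  define stringify_in_c(...)	__VA_ARGS__
#  define ASM_CONST(x)		x
#else
/* In C, quote for use in inline asm and add the UL suffix. */
#  define stringify_in_c(...)	__stringify(__VA_ARGS__)
#  define __ASM_CONST(x)	x##UL
#  define ASM_CONST(x)		__ASM_CONST(x)
#endif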

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:16 +10:00
Christophe Leroy
36a7eeaff7 powerpc/405: move PPC405_ERR77 in asm-405.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:13 +10:00
Christophe Leroy
8c58259bba powerpc: remove unneeded inclusions of cpu_has_feature.h
Files not using cpu_has_feature() don't need cpu_has_feature.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:54 +10:00
Christophe Leroy
db0a2b633d powerpc: remove kdump.h from page.h
page.h doesn't need kdump.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:53 +10:00
Paul Mackerras
1ebe6b81eb KVM: PPC: Book3S HV: Allow creating max number of VCPUs on POWER9
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full
VCPU ID space", 2018-07-25) allowed use of VCPU IDs up to
KVM_MAX_VCPU_ID on POWER9 in all guest SMT modes and guest emulated
hardware SMT modes.  However, with the current definition of
KVM_MAX_VCPU_ID, a guest SMT mode of 1 and an emulated SMT mode of 8,
it is only possible to create KVM_MAX_VCPUS / 2 VCPUs, because
threads_per_subcore is 4 on POWER9 CPUs.  (Using an emulated SMT mode
of 8 is useful when migrating VMs to or from POWER8 hosts.)

This increases KVM_MAX_VCPU_ID to 8 * KVM_MAX_VCPUS when HV KVM is
configured in, so that a full complement of KVM_MAX_VCPUS VCPUs can
be created on POWER9 in all guest SMT modes and emulated hardware
SMT modes.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-07-26 14:53:54 +10:00
Sam Bobroff
1e175d2e07 KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space
It is not currently possible to create the full number of possible
VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
threads per core than its core stride (or "VSMT mode"). This is
because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
even though the VCPU ID is less than KVM_MAX_VCPU_ID.

To address this, "pack" the VCORE ID and XIVE offsets by using
knowledge of the way the VCPU IDs will be used when there are fewer
guest threads per core than the core stride. The primary thread of
each core will always be used first. Then, if the guest uses more than
one thread per core, these secondary threads will sequentially follow
the primary in each core.

So the only way an ID above KVM_MAX_VCPUS can be seen is if the
VCPUs are being spaced apart, so at least half of each core is empty,
and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
into the second half of each core (4..7, in an 8-thread core).

Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
each core is being left empty, and we can map down into the second and
third quarters of each core (2, 3 and 5, 6 in an 8-thread core).

Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
threads are being used and 7/8 of the core is empty, allowing use of
the 1, 5, 3 and 7 thread slots.

(Strides less than 8 are handled similarly.)

This allows the VCORE ID or offset to be calculated quickly from the
VCPU ID or XIVE server numbers, without access to the VCPU structure.
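
A standalone sketch of the packing arithmetic for an 8-thread core (the
KVM_MAX_VCPUS value and helper name here are illustrative; the in-kernel
helper reads the guest and emulated SMT modes from struct kvm):

#include <assert.h>
#include <stdio.h>

#define KVM_MAX_VCPUS	1024	/* illustrative value */
#define MAX_SMT_THREADS	8

/* Thread-slot offsets for successive blocks of KVM_MAX_VCPUS IDs:
 * primary slots first, then the second half of the core, then the
 * second and fourth quarters, then the remaining odd slots. */
static const int block_offsets[MAX_SMT_THREADS] = { 0, 4, 2, 6, 1, 3, 5, 7 };

static unsigned int pack_vcpu_id(unsigned int id, unsigned int stride)
{
	unsigned int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);

	assert(block < MAX_SMT_THREADS);
	return (id % KVM_MAX_VCPUS) + block_offsets[block];
}

int main(void)
{
	/* Stride 8, guest SMT 1: IDs 0, 1024 and 2048 pack to thread
	 * slots 0, 4 and 2 of the core, all below KVM_MAX_VCPUS. */
	printf("%u %u %u\n",
	       pack_vcpu_id(0, 8),
	       pack_vcpu_id(KVM_MAX_VCPUS, 8),
	       pack_vcpu_id(2 * KVM_MAX_VCPUS, 8));
	return 0;
}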

[paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE
 to pr_devel, wrapped line, fixed id check.]

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-07-26 13:23:52 +10:00
Mark Rutland
fd2efaa4eb locking/atomics: Rework ordering barriers
Currently architectures can override __atomic_op_*() to define the barriers
used before/after a relaxed atomic when used to build acquire/release/fence
variants.

This has the unfortunate property of requiring the architecture to define the
full wrapper for the atomics, rather than just the barriers they care about,
and gets in the way of generating atomics which can be easily read.

Instead, this patch has architectures define an optional set of barriers:

* __atomic_acquire_fence()
* __atomic_release_fence()
* __atomic_pre_full_fence()
* __atomic_post_full_fence()

... which <linux/atomic.h> uses to build the wrappers.
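
For example, the acquire variant of an op can then be generated from its
relaxed form plus the acquire barrier, roughly as follows (a sketch of
the <linux/atomic.h> wrapper):

#ifndef __atomic_op_acquire
#define __atomic_op_acquire(op, args...)				\
({									\
	typeof(op##_relaxed(args)) __ret = op##_relaxed(args);		\
	__atomic_acquire_fence();					\
	__ret;								\
})
#endif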

It would be nice if we could undef these, along with the __atomic_op_*()
wrappers, but that would break the cmpxchg() wrappers, which are written
in the preprocessor. Undefs would have been nice, but alas.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andy.shevchenko@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: catalin.marinas@arm.com
Cc: dvyukov@google.com
Cc: glider@google.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: peter@hurleysoftware.com
Link: http://lkml.kernel.org/r/20180716113017.3909-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:53:59 +02:00
Ingo Molnar
93081caaae Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-25 11:47:02 +02:00
Nicholas Piggin
17cc1dd492 powerpc/powernv: implement opal_put_chars_atomic
The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin
d2a2262e68 powerpc/powernv: Implement and use opal_flush_console
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Simon Guo
d58badfb7c powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
This patch adds VMX primitives to do memcmp() when the compare size
is equal to or greater than 4K bytes. The KSM feature can benefit from this.

Test result with the following test program:
------
# cat tools/testing/selftests/powerpc/stringloops/memcmp.c
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "utils.h"

#define SIZE (1024 * 1024 * 900)
#define ITERATIONS 40

int test_memcmp(const void *s1, const void *s2, size_t n);

static int testcase(void)
{
        char *s1;
        char *s2;
        unsigned long i;

        s1 = memalign(128, SIZE);
        if (!s1) {
                perror("memalign");
                exit(1);
        }

        s2 = memalign(128, SIZE);
        if (!s2) {
                perror("memalign");
                exit(1);
        }

        for (i = 0; i < SIZE; i++)  {
                s1[i] = i & 0xff;
                s2[i] = i & 0xff;
        }
        for (i = 0; i < ITERATIONS; i++) {
                int ret = test_memcmp(s1, s2, SIZE);

                if (ret) {
                        printf("return %d at[%ld]! should have returned zero\n", ret, i);
                        abort();
                }
        }

        return 0;
}

int main(void)
{
        return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch "powerpc/64: Align bytes
before fall back to .Lshort in powerpc64 memcmp()." in the series):
	4.726728762 seconds time elapsed                                          ( +-  3.54%)
With VMX patch:
	4.234335473 seconds time elapsed                                          ( +-  2.63%)
There is a ~10% improvement.

Testing with an unaligned and different-offset version (shifting s1 and s2
by a random offset within 16 bytes) can achieve an improvement higher than 10%.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:21 +10:00
Simon Guo
f1ecbaf466 powerpc: add vcmpequd/vcmpequb ppc instruction macro
Some old toolchains don't know about instructions like vcmpequd.

This patch adds .long macros for vcmpequd and vcmpequb, in
preparation for optimizing ppc64 memcmp() with VMX instructions.
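
The pattern is to hand-encode the VX-form instruction as a .long, along
these lines (a sketch following the ppc-opcode.h conventions; the
bit-field helpers are abbreviated):

/* vcmpequd.: primary opcode 4, extended opcode 199, record bit set */
#define PPC_INST_VCMPEQUD	0x100000c7
#define ___PPC_RT(t)		(((t) & 0x1f) << 21)
#define ___PPC_RA(a)		(((a) & 0x1f) << 16)
#define ___PPC_RB(b)		(((b) & 0x1f) << 11)
#define __PPC_RC21		(0x1 << 10)

#define VCMPEQUD_RC(vrt, vra, vrb)	stringify_in_c(.long PPC_INST_VCMPEQUD | \
			___PPC_RT(vrt) | ___PPC_RA(vra) | \
			___PPC_RB(vrb) | __PPC_RC21)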

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:20 +10:00
Aneesh Kumar K.V
a833280b4a powerpc/mm/hash: Add hpte_get_old_v and use that instead of opencoding
No functional change

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:18 +10:00
Aneesh Kumar K.V
7d4340bb92 powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config
We do this only with the VMEMMAP config, so that our page_to_[nid/section]
etc. are not impacted.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:17 +10:00
Nicholas Piggin
5b73151fff powerpc: NMI IPI make NMI IPIs fully sychronous
There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
for all CPUs to call in to the handler, but it does not wait for
completion of the handler. This is a needless complication, so remove
it and always wait synchronously.

The synchronous wait allows the caller to easily time out and clear
the wait for completion (zero nmi_ipi_busy_count) in the case of badly
behaved handlers. This would have prevented the recent smp_send_stop
NMI IPI bug from causing the system to hang.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:14 +10:00
Nicholas Piggin
9b81c0211c powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely
When the masked interrupt handler clears MSR[EE] for an interrupt in
the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS,
which lets the two get out of sync.

With that taken into account, it's only low-level irq manipulation
(and interrupt entry before reconcile) where they can be out of sync.
This makes the code less surprising.

It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value
and not have to mtmsrd again in this case (e.g., for an external
interrupt that has been masked). The bigger benefit might just be
that there is not such an element of surprise in these two bits of
state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:14 +10:00
Ram Pai
07f522d203 powerpc/pkeys: make protection key 0 less special
Applications need the ability to associate an address range with some
key and later revert it to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications from explicitly associating pkey-0
with an address range.

Let's make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power. The
ability to free pkey-0 must be used responsibly, since pkey-0 is
associated with almost all address ranges by default.

Even with this change, pkey-0 continues to be slightly more special
from the following points of view:
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.
(c) its permissions cannot be modified by userspace.

NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
with all pages including kernel pages, and pkeys are also active in
kernel mode. If any permission is denied on pkey-0, the kernel running
in the context of the application will be unable to operate.
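
From userspace the flow this enables looks like the following sketch
(glibc >= 2.27 wrappers; error handling elided):

#define _GNU_SOURCE
#include <sys/mman.h>

void demo(void *addr, size_t len)
{
	/* Protect the range with a freshly allocated key... */
	int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);

	pkey_mprotect(addr, len, PROT_READ | PROT_WRITE, pkey);

	/* ...and later revert it to the default key. Explicitly naming
	 * pkey 0 here is what this patch makes possible. */
	pkey_mprotect(addr, len, PROT_READ | PROT_WRITE, 0);
	pkey_free(pkey);
}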

Tested on powerpc.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Drop #define PKEY_0 0 in favour of plain old 0]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 21:43:24 +10:00
Ram Pai
4a4a5e5d2a powerpc/pkeys: key allocation/deallocation must not change pkey registers
Key allocation and deallocation has the side effect of programming the
UAMOR/AMR/IAMR registers. This is wrong, since it's the responsibility of
the application, and not of the kernel, to modify the permissions on
the key.

Do not modify the pkey registers at key allocation/deallocation.

This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
bits of the key, thus making its permissions unmodifiable from user
space. Later if the same key gets reallocated from a different thread
this thread will no longer be able to change the permissions on the key.

Fixes: cf43d3b264 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 21:34:08 +10:00
Michael Ellerman
ce57c6610c Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the KVM tree.

I manually propagated the change from commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into
pci-ioda-tce.c.

Conflicts:
        arch/powerpc/include/asm/cputable.h
        arch/powerpc/platforms/powernv/pci-ioda.c
        arch/powerpc/platforms/powernv/pci.h
2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy
76fa4975f3 KVM: PPC: Check if IOMMU page is contained in the pinned physical page
A VM which has:
 - a DMA capable device passed through to it (eg. network card);
 - a malicious kernel that ignores H_PUT_TCE failure;
 - the capability of using IOMMU pages bigger than physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device, when only 64K was allocated to the VM.

The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.

The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.

We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet, because the very first time the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
userspace, which in turn calls the VFIO IOMMU driver; this fails and
the guest does not retry.

This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.

This calculates the maximum page size as the minimum of the natural region
alignment and the compound page size. For the page shift it uses the shift
returned by find_linux_pte(), which indicates how the page is mapped into
the current userspace: if the page is huge and the shift is non-zero, then
it is a leaf pte and the page is mapped within the range.
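
In essence, the fastpath gains the same containment test the VFIO driver
already performs; a sketch of the check folded into the mm_iommu_xxx API
(names abbreviated):

static long tce_check_contained(struct mm_iommu_table_group_mem_t *mem,
				unsigned int tce_page_shift)
{
	/* mem->pageshift caches the smallest host page shift seen across
	 * the preregistered region; an IOMMU page must not exceed it. */
	if (tce_page_shift > mem->pageshift)
		return H_PARAMETER;
	return H_SUCCESS;
}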

Fixes: 121f80ba68 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-18 16:17:17 +10:00
Nicholas Mc Guire
0abb75b7a1 KVM: PPC: Book3S HV: Fix constant size warning
The constants are 64-bit but not explicitly declared UL, resulting
in sparse warnings. Fix this by declaring the constants UL.
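
The issue in miniature (a hypothetical constant; only the suffix matters):

/* sparse: "constant 0x8000000000000000 is so big it is unsigned long" */
#define MASK_IMPLICIT	0x8000000000000000	/* implicitly sized */
#define MASK_EXPLICIT	0x8000000000000000UL	/* explicitly UL */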

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-07-18 15:14:45 +10:00
Simon Guo
4eeb85568e KVM: PPC: Remove mmio_vsx_tx_sx_enabled in KVM MMIO emulation
Originally PPC KVM MMIO emulation used only 0~31 (5 bits) for the VSR
reg number, together with the mmio_vsx_tx_sx_enabled field to cover
VSR regs 0~63.

Currently PPC KVM MMIO emulation is reimplemented with analyse_instr()
assistance.  analyse_instr() returns 0~63 for VSR register number, so
it is not necessary to use additional mmio_vsx_tx_sx_enabled field
any more.

This patch extends the related reg bits (expanding io_gpr from u8 to
u16 and using 6 bits for the VSR reg number), so that
mmio_vsx_tx_sx_enabled can be removed.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-07-18 15:14:45 +10:00
Ingo Molnar
52b544bd38 Merge tag 'v4.18-rc5' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17 09:27:43 +02:00
Alexey Kardashevskiy
a68bd1267b powerpc/powernv/ioda: Allocate indirect TCE levels on demand
At the moment we allocate the entire TCE table, twice (the hardware part
and the userspace translation cache). This normally works, as we typically
have contiguous memory and the guest will map the entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used, as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do, but if it is a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds the ability to allocate only the first level, saving memory.

This changes iommu_table::free() to avoid allocating an extra level;
iommu_table::set() will do this when needed.

This adds an @alloc parameter to iommu_table::exchange() to tell the
callback if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when the
userspace cache table is requested, which is the case for VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover it all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4656MB

If we allocate only the necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).
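
The arithmetic can be reproduced with a few lines of C (standalone; the
same shifts as above, i.e. one 8-byte TCE per 64K page):

#include <stdio.h>

int main(void)
{
	unsigned long long full = 0x244000000000ULL + 0x2000000000ULL;
	unsigned long long used = 0x400000000ULL + 0x400000000ULL;

	printf("full window: %llu MB\n", ((full >> 16) * 8) >> 20); /* 4656 */
	printf("actual RAM:  %llu MB\n", ((used >> 16) * 8) >> 20); /* 4 */
	return 0;
}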

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy
090bad39b2 powerpc/powernv: Add indirect levels to it_userspace
We want to support sparse memory, and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window is big enough to require 2 or more
indirect levels, and the DMA window is used to map all RAM (which is
the default case for a 64bit window), we can actually save some memory by
not allocating TCEs for regions which we are not going to map anyway.

The hardware tables already support indirect levels, but we also keep a
host-physical-to-userspace translation array, which is allocated by
vmalloc() and, being a flat array, might use quite some memory.

This converts it_userspace from a vmalloc'ed array to a multilevel table.

As the format becomes platform dependent, this replaces the direct access
to it_userspace with an iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; a future extension will return
NULL if the level was not allocated.
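
The shape of the hook, per the description above (a sketch; the
follow-up "Allocate indirect TCE levels on demand" above extends this
with an allocation flag so that NULL can be returned):

struct iommu_table_ops {
	/* ... */
	/* Return a pointer to the userspace view of the TCE at @index. */
	__be64 *(*useraddrptr)(struct iommu_table *tbl, long index);
	/* ... */
};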

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy
00a5c58d94 KVM: PPC: Make iommu_table::it_userspace big endian
We are going to reuse the multilevel TCE code for the userspace copy of
the TCE table, and since the hardware table is big endian, let's make
the copy big endian too.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:09 +10:00
Nicholas Piggin
2bf1071a8d powerpc/64s: Remove POWER9 DD1 support
POWER9 DD1 was never a product. It is no longer supported by upstream
firmware, and it is not effectively supported in Linux due to lack of
testing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Remove arch_make_huge_pte() entirely]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 11:37:21 +10:00
Joel Stanley
e11b64b1ef powerpc: Remove Power8 DD1 from cputable
This was added to support an early version of Power8 that did not have
working doorbells. These machines were not publicly available, and all of
the internal users have long since upgraded.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-12 21:08:09 +10:00
Alastair D'Silva
8bf6b91a51 Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"
Remove abandoned capi support for the Mellanox CX4.

This reverts commit 4361b03430.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:32 +10:00
Mauro S. M. Rodrigues
ee8c446fed powerpc/eeh: Avoid misleading message "EEH: no capable adapters found"
Due to recent EEH refactoring in commit b9fde58db7 ("powerpc/powernv:
Rework EEH initialization on powernv"), a misleading message was seen
in the kernel message buffer:

[    0.108431] EEH: PowerNV platform initialized
[    0.589979] EEH: No capable adapters found

This happened due to the removal of the initialization delay for the
powernv platform.

Even though the EEH infrastructure for the devices is eventually
initialized and still works just fine, the EEH device probe step is
postponed in order to ensure the PEs are created. Later,
pnv_eeh_post_init() does the device probing, but at that point the
message has already been shown, right after the eeh_init flow.
This patch introduces a new flag, EEH_POSTPONED_PROBE, to represent that
temporary state, avoiding the message mentioned above and showing the
following one instead:

[    0.107724] EEH: PowerNV platform initialized
[    4.844825] EEH: PCI Enhanced I/O Error Handling Enabled

Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Tested-by: Venkat Rao B <vrbagal1@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:26 +10:00
Aneesh Kumar K.V
941c06d585 powerpc/mm/32: Fix pgtable_page_dtor call
Commit 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
added a call to pgtable_page_dtor() in the rcu page table free routine. We
missed the fact that for 32-bit platforms we already call the 'dtor' early.
Drop the extra call to pgtable_page_dtor(). We remove the call from
__pte_free_tlb() so that we do the page table free and the 'dtor' call
together. This should help when we switch these platforms to pte fragments.

Fixes: 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-26 23:43:14 +10:00
Frederic Weisbecker
cffbb3bd44 perf/hw_breakpoint: Remove default hw_breakpoint_arch_parse()
All architectures have implemented it, so we can now remove the poor
weak version.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-11-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:58 +02:00