android_kernel_xiaomi_sm8450

xiaomi-sm8450/android_kernel_xiaomi_sm8450

Author	SHA1	Message	Date
Christophe Leroy	925ac141d1	powerpc/mm: Allocate static page tables for fixmap Allocate static page tables for the fixmap area. This allows setting mappings through page tables before memblock is ready. That's needed to use early_ioremap() early and to use standard page mappings with fixmap. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4f4b1412d34de6801b8e925cb88fc69d056ff536.1589866984.git.christophe.leroy@csgroup.eu	2020-05-26 22:22:19 +10:00
Michael Ellerman	595d153dd1	powerpc/64s: Fix restore of NV GPRs after facility unavailable exception Commit `702f098052` ("powerpc/64s/exception: Remove lite interrupt return") changed the interrupt return path to not restore non-volatile registers by default, and explicitly restore them in paths where it is required. But it missed that the facility unavailable exception can sometimes modify user registers, ie. when it does emulation of move from DSCR. This is seen as a failure of the dscr_sysfs_thread_test: test: dscr_sysfs_thread_test [cpu 0] User DSCR should be 1 but is 0 failure: dscr_sysfs_thread_test So restore non-volatile GPRs after facility unavailable exceptions. Currently the hypervisor facility unavailable exception is also wired up to call facility_unavailable_exception(). In practice we should never take a hypervisor facility unavailable exception for the DSCR. On older bare metal systems we set HFSCR_DSCR unconditionally in __init_HFSCR, or on newer systems it should be enabled via the "data-stream-control-register" device tree CPU feature. Even if it's not, since commit `f3c99f97a3` ("KVM: PPC: Book3S HV: Don't access HFSCR, LPIDR or LPCR when running nested"), the KVM code has unconditionally set HFSCR_DSCR when running guests. So we should only get a hypervisor facility unavailable for the DSCR if skiboot has disabled the "data-stream-control-register" feature, and we are somehow in guest context but not via KVM. Given all that, it should be unnecessary to add a restore of non-volatile GPRs after the hypervisor facility exception, because we never expect to hit that path. But equally we may as well add the restore, because we never expect to hit that path, and if we ever did, at least we would correctly restore the registers to their post emulation state. In future we can split the non-HV and HV facility unavailable handling so that there is no emulation in the HV handler, and then remove the restore for the HV case. Fixes: `702f098052` ("powerpc/64s/exception: Remove lite interrupt return") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200526061808.2472279-1-mpe@ellerman.id.au	2020-05-26 17:32:37 +10:00
Christophe Leroy	40bb0e9042	Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits." This reverts commit `697ece78f8`. The implementation of SWAP on powerpc requires page protection bits to not be one of the least significant PTE bits. Until the SWAP implementation is changed and this requirement voids, we have to keep at least _PAGE_RW outside of the 3 last bits. For now, revert to previous PTE bits order. A further rework may come later. Fixes: `697ece78f8` ("powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.") Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b34706f8de87f84d135abb5f3ede6b6f16fb1f41.1589969799.git.christophe.leroy@csgroup.eu	2020-05-20 22:35:52 +10:00
Peter Zijlstra	69ea03b56e	hardirq/nmi: Allow nested nmi_enter() Since there are already a number of sites (ARM64, PowerPC) that effectively nest nmi_enter(), make the primitive support this before adding even more. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lkml.kernel.org/r/20200505134100.864179229@linutronix.de	2020-05-19 15:51:17 +02:00
Thomas Gleixner	6553896666	vmlinux.lds.h: Create section for protection against instrumentation Some code pathes, especially the low level entry code, must be protected against instrumentation for various reasons: - Low level entry code can be a fragile beast, especially on x86. - With NO_HZ_FULL RCU state needs to be established before using it. Having a dedicated section for such code allows to validate with tooling that no unsafe functions are invoked. Add the .noinstr.text section and the noinstr attribute to mark functions. noinstr implies notrace. Kprobes will gain a section check later. Provide also a set of markers: instrumentation_begin()/end() These are used to mark code inside a noinstr function which calls into regular instrumentable text section as safe. The instrumentation markers are only active when CONFIG_DEBUG_ENTRY is enabled as the end marker emits a NOP to prevent the compiler from merging the annotation points. This means the objtool verification requires a kernel compiled with this option. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200505134100.075416272@linutronix.de	2020-05-19 15:47:20 +02:00
Ravi Bangoria	29da4f91c0	powerpc/watchpoint: Don't allow concurrent perf and ptrace events With Book3s DAWR, ptrace and perf watchpoints on powerpc behaves differently. Ptrace watchpoint works in one-shot mode and generates signal before executing instruction. It's ptrace user's job to single-step the instruction and re-enable the watchpoint. OTOH, in case of perf watchpoint, kernel emulates/single-steps the instruction and then generates event. If perf and ptrace creates two events with same or overlapping address ranges, it's ambiguous to decide who should single-step the instruction. Because of this issue, don't allow perf and ptrace watchpoint at the same time if their address range overlaps. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-15-ravi.bangoria@linux.ibm.com	2020-05-19 00:14:45 +10:00
Ravi Bangoria	74c6881019	powerpc/watchpoint: Prepare handler to handle more than one watchpoint Currently we assume that we have only one watchpoint supported by hw. Get rid of that assumption and use dynamic loop instead. This should make supporting more watchpoints very easy. With more than one watchpoint, exception handler needs to know which DAWR caused the exception, and hw currently does not provide it. So we need sw logic for the same. To figure out which DAWR caused the exception, check all different combinations of user specified range, DAWR address range, actual access range and DAWRX constrains. For ex, if user specified range and actual access range overlaps but DAWRX is configured for readonly watchpoint and the instruction is store, this DAWR must not have caused exception. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Michael Neuling <mikey@neuling.org> [mpe: Unsplit multi-line printk() strings, fix some sparse warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200514111741.97993-14-ravi.bangoria@linux.ibm.com	2020-05-19 00:14:37 +10:00
Ravi Bangoria	e68ef121c1	powerpc/watchpoint: Use builtin ALIGN*() macros Currently we calculate hw aligned start and end addresses manually. Replace them with builtin ALIGN_DOWN() and ALIGN() macros. So far end_addr was inclusive but this patch makes it exclusive (by avoiding -1) for better readability. Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-13-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:05 +10:00
Ravi Bangoria	c9e82aeb19	powerpc/watchpoint: Introduce is_ptrace_bp() function Introduce is_ptrace_bp() function and move the check inside the function. It will be utilize more in later set of patches. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-12-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:05 +10:00
Ravi Bangoria	6b424efa11	powerpc/watchpoint: Use loop for thread_struct->ptrace_bps ptrace_bps is already an array of size HBP_NUM_MAX. But we use hardcoded index 0 while fetching/updating it. Convert such code to loop over array. ptrace interface to use multiple watchpoint remains same. eg: two PPC_PTRACE_SETHWDEBUG calls will create two watchpoint if underneath hw supports it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-11-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:05 +10:00
Ravi Bangoria	303e6a9ddc	powerpc/watchpoint: Convert thread_struct->hw_brk to an array So far powerpc hw supported only one watchpoint. But Power10 is introducing 2nd DAWR. Convert thread_struct->hw_brk into an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-10-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:05 +10:00
Ravi Bangoria	22a214e461	powerpc/watchpoint: Disable all available watchpoints when !dawr_force_enable Instead of disabling only first watchpoint, disable all available watchpoints while clearing dawr_force_enable. Callback function is used only for disabling watchpoint, rename it to disable_dawrs_cb(). And null_brk parameter is not really required while disabling watchpoint, remove it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-9-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:05 +10:00
Ravi Bangoria	4a8a9379f2	powerpc/watchpoint: Provide DAWR number to __set_breakpoint Introduce new parameter 'nr' to __set_breakpoint() which indicates which DAWR should be programed. Also convert current_brk variable to an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-7-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:04 +10:00
Ravi Bangoria	a18b834625	powerpc/watchpoint: Provide DAWR number to set_dawr Introduce new parameter 'nr' to set_dawr() which indicates which DAWR should be programed. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-6-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:04 +10:00
Ravi Bangoria	45093b382e	powerpc/watchpoint/ptrace: Return actual num of available watchpoints User can ask for num of available watchpoints(dbginfo.num_data_bps) using ptrace(PPC_PTRACE_GETHWDBGINFO). Return actual number of available watchpoints on the machine rather than hardcoded 1. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-5-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:04 +10:00
Ravi Bangoria	a6ba44e879	powerpc/watchpoint: Introduce function to get nr watchpoints dynamically So far we had only one watchpoint, so we have hardcoded HBP_NUM to 1. But Power10 is introducing 2nd DAWR and thus kernel should be able to dynamically find actual number of watchpoints supported by hw it's running on. Introduce function for the same. Also convert HBP_NUM macro to HBP_NUM_MAX, which will now represent maximum number of watchpoints supported by Powerpc. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-4-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:04 +10:00
Ravi Bangoria	09f82b063a	powerpc/watchpoint: Rename current DAWR macros Power10 is introducing second DAWR. Use real register names from ISA for current macros: s/SPRN_DAWR/SPRN_DAWR0/ s/SPRN_DAWRX/SPRN_DAWRX0/ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-2-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:03 +10:00
Jordan Niethe	9409d2f9da	powerpc: Support prefixed instructions in alignment handler If a prefixed instruction results in an alignment exception, the SRR1_PREFIXED bit is set. The handler attempts to emulate the responsible instruction and then increment the NIP past it. Use SRR1_PREFIXED to determine by how much the NIP should be incremented. Prefixed instructions are not permitted to cross 64-byte boundaries. If they do the alignment interrupt is invoked with SRR1 BOUNDARY bit set. If this occurs send a SIGBUS to the offending process if in user mode. If in kernel mode call bad_page_fault(). Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-29-jniethe5@gmail.com	2020-05-19 00:11:03 +10:00
Jordan Niethe	b4657f7650	powerpc/kprobes: Don't allow breakpoints on suffixes Do not allow inserting breakpoints on the suffix of a prefix instruction in kprobes. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-28-jniethe5@gmail.com	2020-05-19 00:11:03 +10:00
Jordan Niethe	650b55b707	powerpc: Add prefixed instructions to instruction data type For powerpc64, redefine the ppc_inst type so both word and prefixed instructions can be represented. On powerpc32 the type will remain the same. Update places which had assumed instructions to be 4 bytes long. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> [mpe: Rework the get_user_inst() macros to be parameterised, and don't assign to the dest if an error occurred. Use CONFIG_PPC64 not __powerpc64__ in a few places. Address other comments from Christophe. Fix some sparse complaints.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-24-jniethe5@gmail.com	2020-05-19 00:10:39 +10:00
Jordan Niethe	7a8818e0df	powerpc/optprobes: Add register argument to patch_imm64_load_insns() Currently patch_imm32_load_insns() is used to load an instruction to r4 to be emulated by emulate_step(). For prefixed instructions we would like to be able to load a 64bit immediate to r4. To prepare for this make patch_imm64_load_insns() take an argument that decides which register to load an immediate to - rather than hardcoding r3. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200516115449.4168796-1-mpe@ellerman.id.au	2020-05-19 00:10:38 +10:00
Alistair Popple	2aa6195e43	powerpc: Enable Prefixed Instructions Prefix instructions have their own FSCR bit which needs to enabled via a CPU feature. The kernel will save the FSCR for problem state but it needs to be enabled initially. If prefixed instructions are made unavailable by the [H]FSCR, attempting to use them will cause a facility unavailable exception. Add "PREFIX" to the facility_strings[]. Currently there are no prefixed instructions that are actually emulated by emulate_instruction() within facility_unavailable_exception(). However, when caused by a prefixed instructions the SRR1 PREFIXED bit is set. Prepare for dealing with emulated prefixed instructions by checking for this bit. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200506034050.24806-22-jniethe5@gmail.com	2020-05-19 00:10:38 +10:00
Jordan Niethe	622cf6f436	powerpc: Introduce a function for reporting instruction length Currently all instructions have the same length, but in preparation for prefixed instructions introduce a function for returning instruction length. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-18-jniethe5@gmail.com	2020-05-19 00:10:38 +10:00
Jordan Niethe	5249385ad7	powerpc: Define and use get_user_instr() et. al. Define specialised get_user_instr(), __get_user_instr() and __get_user_instr_inatomic() macros for reading instructions from user and/or kernel space. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> [mpe: Squash in addition of get_user_instr() & __user annotations] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-17-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	a8646f43ba	powerpc/kprobes: Use patch_instruction() Instead of using memcpy() and flush_icache_range() use patch_instruction() which not only accomplishes both of these steps but will also make it easier to add support for prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-16-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	95b980a00d	powerpc: Add a probe_kernel_read_inst() function Introduce a probe_kernel_read_inst() function to use in cases where probe_kernel_read() is used for getting an instruction. This will be more useful for prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> [mpe: Don't write to *inst on error] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-15-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	f8faaffaa7	powerpc: Use a function for reading instructions Prefixed instructions will mean there are instructions of different length. As a result dereferencing a pointer to an instruction will not necessarily give the desired result. Introduce a function for reading instructions from memory into the instruction data type. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-13-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	94afd069d9	powerpc: Use a datatype for instructions Currently unsigned ints are used to represent instructions on powerpc. This has worked well as instructions have always been 4 byte words. However, ISA v3.1 introduces some changes to instructions that mean this scheme will no longer work as well. This change is Prefixed Instructions. A prefixed instruction is made up of a word prefix followed by a word suffix to make an 8 byte double word instruction. No matter the endianness of the system the prefix always comes first. Prefixed instructions are only planned for powerpc64. Introduce a ppc_inst type to represent both prefixed and word instructions on powerpc64 while keeping it possible to exclusively have word instructions on powerpc32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Fix compile error in emulate_spe()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	217862d9b9	powerpc: Introduce functions for instruction equality In preparation for an instruction data type that can not be directly used with the '==' operator use functions for checking equality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balamuruhan S <bala24@linux.ibm.com> Link: https://lore.kernel.org/r/20200506034050.24806-11-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	aabd2233b6	powerpc: Use a function for byte swapping instructions Use a function for byte swapping instructions in preparation of a more complicated instruction type. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balamuruhan S <bala24@linux.ibm.com> Link: https://lore.kernel.org/r/20200506034050.24806-10-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	8094892d1a	powerpc: Use a function for getting the instruction op code In preparation for using a data type for instructions that can not be directly used with the '>>' operator use a function for getting the op code of an instruction. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-9-jniethe5@gmail.com	2020-05-19 00:10:37 +10:00
Jordan Niethe	777e26f0ed	powerpc: Use an accessor for instructions In preparation for introducing a more complicated instruction type to accommodate prefixed instructions use an accessor for getting an instruction as a u32. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-8-jniethe5@gmail.com	2020-05-19 00:10:36 +10:00
Jordan Niethe	7534625128	powerpc: Use a macro for creating instructions from u32s In preparation for instructions having a more complex data type start using a macro, ppc_inst(), for making an instruction out of a u32. A macro is used so that instructions can be used as initializer elements. Currently this does nothing, but it will allow for creating a data type that can represent prefixed instructions. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Change include guard to _ASM_POWERPC_INST_H] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-7-jniethe5@gmail.com	2020-05-19 00:10:36 +10:00
Jordan Niethe	7c95d8893f	powerpc: Change calling convention for create_branch() et. al. create_branch(), create_cond_branch() and translate_branch() return the instruction that they create, or return 0 to signal an error. Separate these concerns in preparation for an instruction type that is not just an unsigned int. Fill the created instruction to a pointer passed as the first parameter to the function and use a non-zero return value to signify an error. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-6-jniethe5@gmail.com	2020-05-19 00:10:36 +10:00
Jordan Niethe	4eff2b4f32	powerpc/xmon: Move breakpoints to text section The instructions for xmon's breakpoint are stored bpt_table[] which is in the data section. This is problematic as the data section may be marked as no execute. Move bpt_table[] to the text section. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200506034050.24806-4-jniethe5@gmail.com	2020-05-19 00:10:36 +10:00
Nicholas Piggin	265d6e588d	powerpc/traps: Make unrecoverable NMIs die instead of panic System Reset and Machine Check interrupts that are not recoverable due to being nested or interrupting when RI=0 currently panic. This is not necessary, and can often just kill the current context and recover. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com	2020-05-19 00:10:35 +10:00
Nicholas Piggin	bbbc8032b0	powerpc/traps: Do not trace system reset Similarly to the previous patch, do not trace system reset. This code is used when there is a crash or hang, and tracing disturbs the system more and has been known to crash in the crash handling path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-15-npiggin@gmail.com	2020-05-19 00:10:35 +10:00
Nicholas Piggin	abd106fb43	powerpc/64s: machine check do not trace real-mode handler Rather than notrace annotations throughout a significant part of the machine check code across kernel/ pseries/ and powernv/ which can easily be broken and is infrequently tested, use paca->ftrace_enabled to blanket-disable tracing of the real-mode non-maskable handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-14-npiggin@gmail.com	2020-05-19 00:10:34 +10:00
Nicholas Piggin	116ac378bb	powerpc/64s: machine check interrupt update NMI accounting machine_check_early() is taken as an NMI, so nmi_enter() is used there. machine_check_exception() is no longer taken as an NMI (it's invoked via irq_work in the case a machine check hits in kernel mode), so remove the nmi_enter() from that case. In NMI context, hash faults don't try to refill the hash table, which can lead to crashes accessing non-pinned kernel pages. System reset still has this potential problem. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop change in show_regs() which breaks Book3E] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com	2020-05-19 00:10:34 +10:00
Nicholas Piggin	d2cbbd45d4	powerpc/pseries: Limit machine check stack to 4GB This allows rtas_args to be put on the machine check stack, which avoids a lot of complications with re-entrancy deadlocks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-10-npiggin@gmail.com	2020-05-19 00:10:34 +10:00
Nicholas Piggin	f0fd9dd3c2	powerpc/64s/exceptions: Machine check reconcile irq state pseries fwnmi machine check code pops the soft-irq checks in rtas_call (after the next patch to remove rtas_token from this call path). Rather than play whack a mole with these and forever having fragile code, it seems better to have the early machine check handler perform the same kind of reconcile as the other NMI interrupts. WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 493 Comm: a Tainted: G W NIP: c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000 REGS: c0000001fffd38b0 TRAP: 0700 Tainted: G W MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000 GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000042c40] unlock_rtas+0x30/0x90 Call Trace: [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable) [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280 [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70 [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540 [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70 [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0 --- interrupt: 200 at 0x13f1307c8 LR = 0x7fff888b8528 Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-5-npiggin@gmail.com	2020-05-18 21:58:44 +10:00
Nicholas Piggin	16754d25bd	powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT A spare interrupt stack slot is needed to save irq state when reconciling NMIs (sreset and decrementer soft-nmi). _DAR is used for this, but we want to reconcile machine checks as well, which do use _DAR. Switch to using RESULT instead, as it's used by system calls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-4-npiggin@gmail.com	2020-05-18 21:58:44 +10:00
Nicholas Piggin	ac2a2a1417	powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-3-npiggin@gmail.com	2020-05-18 21:58:44 +10:00
Nicholas Piggin	8a5054d8cb	powerpc/64s/exception: Fix machine check no-loss idle wakeup The architecture allows for machine check exceptions to cause idle wakeups which resume at the 0x200 address which has to return via the idle wakeup code, but the early machine check handler is run first. The case of a no state-loss sleep is broken because the early handler uses non-volatile register r1 , which is needed for the wakeup protocol, but it is not restored. Fix this by loading r1 from the MCE exception frame before returning to the idle wakeup code. Also update the comment which has become stale since the idle rewrite in C. This crash was found and fix confirmed with a machine check injection test in qemu powernv model (which is not upstream in qemu yet). Fixes: `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200508043408.886394-2-npiggin@gmail.com	2020-05-18 21:58:44 +10:00
Sam Bobroff	466381ecdc	powerpc/eeh: Release EEH device state synchronously EEH device state is currently removed (by eeh_remove_device()) during the device release handler, which is invoked as the device's reference count drops to zero. This may take some time, or forever, as other threads may hold references. However, the PCI device state is released synchronously by pci_stop_and_remove_bus_device(). This mismatch causes problems, for example the device may be re-discovered as a new device before the release handler has been called, leaving the PCI and EEH state mismatched. So instead, call eeh_remove_device() from the bus device removal handlers, which are called synchronously in the removal path. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0a1f5105d3a33b1c090bba31de63eb0cdd25de7b.1588045502.git.sbobroff@linux.ibm.com	2020-05-18 21:58:44 +10:00
Michael Ellerman	d93e5e2d03	powerpc/64: Update Speculation_Store_Bypass in /proc/<pid>/status Currently we don't report anything useful in /proc/<pid>/status: $ grep Speculation_Store_Bypass /proc/self/status Speculation_Store_Bypass: unknown Our mitigation is currently always a barrier instruction, which doesn't map that well onto the existing possibilities for the PR_SPEC values. However even if we added a "barrier" type PR_SPEC value, userspace would still need to consult some other source to work out which type of barrier to use. So reporting "vulnerable" seems sufficient, as userspace can see that and then consult its source to determine what barrier to use. Signed-off-by: Gustavo Walbon <gwalbon@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402124929.3574166-1-mpe@ellerman.id.au	2020-05-18 21:58:43 +10:00
Michael Ellerman	7ffa8b7dc1	powerpc/64: Don't initialise init_task->thread.regs Aneesh increased the size of struct pt_regs by 16 bytes and started seeing this WARN_ON: smp: Bringing up secondary CPUs ... ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 NIP: c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000 REGS: c0000000026d3980 TRAP: 0700 Not tainted (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48048224 XER: 00000000 CFAR: c000000000019cc8 IRQMASK: 1 GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033 GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164 GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030 GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066 GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890 GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080 GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90 GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200 NIP [c00000000001a2b4] giveup_all+0xb4/0x110 LR [c00000000001a29c] giveup_all+0x9c/0x110 Call Trace: [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable) [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480 [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970 [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70 [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0 [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40 [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8 [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0 [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400 Which was unexpected. The warning is checking the thread.regs->msr value of the task we are switching from: usermsr = tsk->thread.regs->msr; ... WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC))); ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set. Dumping tsk->thread.regs->msr we see that it's: 0x1db60000 Which is not a normal looking MSR, in fact the only valid bit is MSR_VSX, all the other bits are reserved in the current definition of the MSR. We can see from the oops that it was swapper/0 that we were switching from when we hit the warning, ie. init_task. So its thread.regs points to the base (high addresses) in init_stack. Dumping the content of init_task->thread.regs, with the members of pt_regs annotated (the 16 bytes larger version), we see: 0000000000000000 c000000002780080 gpr[0] gpr[1] 0000000000000000 c000000002666008 gpr[2] gpr[3] c0000000026d3ed0 0000000000000078 gpr[4] gpr[5] c000000000011b68 c000000002780080 gpr[6] gpr[7] 0000000000000000 0000000000000000 gpr[8] gpr[9] c0000000026d3f90 0000800000002200 gpr[10] gpr[11] c000000002004820 c0000000026d7200 gpr[12] gpr[13] 000000001db60000 c0000000010aabe8 gpr[14] gpr[15] c0000000010aabe8 c0000000010aabe8 gpr[16] gpr[17] c00000000294d598 0000000000000000 gpr[18] gpr[19] 0000000000000000 0000000000001ff8 gpr[20] gpr[21] 0000000000000000 c00000000206d608 gpr[22] gpr[23] c00000000278e0cc 0000000000000000 gpr[24] gpr[25] 000000002fff0000 c000000000000000 gpr[26] gpr[27] 0000000002000000 0000000000000028 gpr[28] gpr[29] 000000001db60000 0000000004750000 gpr[30] gpr[31] 0000000002000000 000000001db60000 nip msr 0000000000000000 0000000000000000 orig_r3 ctr c00000000000c49c 0000000000000000 link xer 0000000000000000 0000000000000000 ccr softe 0000000000000000 0000000000000000 trap dar 0000000000000000 0000000000000000 dsisr result 0000000000000000 0000000000000000 ppr kuap 0000000000000000 0000000000000000 pad[2] pad[3] This looks suspiciously like stack frames, not a pt_regs. If we look closely we can see return addresses from the stack trace above, c000000002004820 (start_kernel) and c00000000000c49c (start_here_common). init_task->thread.regs is setup at build time in processor.h: #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs )INIT_SP - 1, / XXX bogus, I think / \ The early boot code where we setup the initial stack is: LOAD_REG_ADDR(r3,init_thread_union) / set up a stack pointer */ LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) add r1,r3,r1 li r0,0 stdu r0,-STACK_FRAME_OVERHEAD(r1) Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD). Which is far too small to contain a pt_regs. So the result is init_task->thread.regs is pointing at some stack frames on the init stack, not at a pt_regs. We have gotten away with this for so long because with pt_regs at its current size the MSR happens to point into the first frame, at a location that is not written to by the early asm. With the 16 byte expansion the MSR falls into the second frame, which is used by the compiler, and collides with a saved register that tends to be non-zero. As far as I can see this has been wrong since the original merge of 64-bit ppc support, back in 2002. Conceptually swapper should have no regs, it never entered from userspace, and in fact that's what we do on 32-bit. It's also presumably what the "bogus" comment is referring to. So I think the right fix is to just not-initialise regs at all. I'm slightly worried this will break some code that isn't prepared for a NULL regs, but we'll have to see. Remove the comment in head_64.S which refers to us setting up the regs (even though we never did), and is otherwise not really accurate any more. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	4e0e45b07d	powerpc: Use trap metadata to prevent double restart rather than zeroing trap It's not very nice to zero trap for this, because then system calls no longer have trap_is_syscall(regs) invariant, and we can't distinguish between sc and scv system calls (in a later patch). Take one last unused bit from the low bits of the pt_regs.trap word for this instead. There is not a really good reason why it should be in trap as opposed to another field, but trap has some concept of flags and it exists. Ideally I think we would move trap to 2-byte field and have 2 more bytes available independently. Add a selftests case for this, which can be seen to fail if trap_norestart() is changed to return false. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make them static inlines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-4-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	912237ea16	powerpc: trap_is_syscall() helper to hide syscall trap number A new system call interrupt will be added with a new trap number. Hide the explicit 0xc00 test behind an accessor to reduce churn in callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-3-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	db30144b5c	powerpc: Use set_trap() and avoid open-coding trap masking The pt_regs.trap field keeps 4 low bits for some metadata about the trap or how it was handled, which is masked off in order to test the architectural trap number. Add a set_trap() accessor to set this, equivalent to TRAP() for returning it. This is actually not quite the equivalent of TRAP() because it always clears the low bits, which may be harmless if it can only be updated via ptrace syscall, but it seems dangerous. In fact settting TRAP from ptrace doesn't seem like a great idea so maybe it's better deleted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline rather than a shouty macro] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-2-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00

... 3 4 5 6 7 ...

7423 Commits