The advancing of the PC when completing an MMIO load is done before
re-entering the guest, i.e. before restoring the guest ASID. However if
the load is in a branch delay slot it may need to access guest code to
read the prior branch instruction. This isn't safe in TLB mapped code at
the moment, nor in the future when we'll access unmapped guest segments
using direct user accessors too, as it could read the branch from host
user memory instead.
Therefore calculate the resume PC in advance while we're still in the
right context and save it in the new vcpu->arch.io_pc (replacing the no
longer needed vcpu->arch.pending_load_cause), and restore it on MMIO
completion.
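The idea can be sketched roughly as below. This is a simplified model,
not the actual KVM code: toy_vcpu, toy_mmio_load_begin() and
toy_mmio_load_complete() are hypothetical names, and the branch handling
is reduced to a precomputed branch_resume value. Only the io_pc field
name comes from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the relevant vcpu state. */
struct toy_vcpu {
	uint32_t pc;	/* current guest PC (address of the MMIO load) */
	uint32_t io_pc;	/* resume PC saved when the MMIO load is issued */
};

/*
 * Called while the guest context is still active. In the real code,
 * resolving the branch would mean reading the guest branch instruction;
 * here the result is passed in as branch_resume (an assumption made for
 * illustration).
 */
static void toy_mmio_load_begin(struct toy_vcpu *vcpu, int in_delay_slot,
				uint32_t branch_resume)
{
	/* Outside a delay slot, simply step over the 4-byte load. */
	vcpu->io_pc = in_delay_slot ? branch_resume : vcpu->pc + 4;
}

/* Called on MMIO completion: no guest memory access is needed any more. */
static void toy_mmio_load_complete(struct toy_vcpu *vcpu)
{
	vcpu->pc = vcpu->io_pc;
}
```

The point of the split is that the completion step only copies a saved
value, so it no longer matters which context is active at that time.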
Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ERET instruction to return from exception is used for returning from
exception level (Status.EXL) and error level (Status.ERL). If both bits
are set however we should be returning from ERL first, as ERL can
interrupt EXL, for example when an NMI is taken. KVM however checks EXL
first.
Fix the order of the checks to match the pseudocode in the instruction
set manual.
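As a rough illustration of the intended ordering (a standalone model, not
the KVM emulation code; toy_cp0 and toy_emulate_eret() are hypothetical
names, and returning -1 when neither bit is set is a choice made for
this sketch only):

```c
#include <assert.h>
#include <stdint.h>

#define ST0_EXL 0x00000002u	/* Status.EXL, exception level */
#define ST0_ERL 0x00000004u	/* Status.ERL, error level */

/* Hypothetical, simplified guest CP0 state for illustration only. */
struct toy_cp0 {
	uint32_t status;
	uint32_t epc;
	uint32_t error_epc;
};

/*
 * Model of ERET: ERL is checked before EXL, so when both are set the
 * return goes to ErrorEPC and EXL stays set for a later ERET.
 * Returns 0 on success, -1 if neither bit is set (toy behaviour).
 */
static int toy_emulate_eret(struct toy_cp0 *cp0, uint32_t *pc)
{
	if (cp0->status & ST0_ERL) {
		cp0->status &= ~ST0_ERL;
		*pc = cp0->error_epc;
		return 0;
	}
	if (cp0->status & ST0_EXL) {
		cp0->status &= ~ST0_EXL;
		*pc = cp0->epc;
		return 0;
	}
	return -1;
}
```

With both bits set, the first ERET returns to ErrorEPC and the second
one returns to EPC.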
Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_mips_check_asids() runs before entering the guest and performs lazy
regeneration of host ASID for guest usermode, using last_user_gasid to
track the last guest ASID in the VCPU that was used by guest usermode on
any host CPU.
last_user_gasid is reset after performing the lazy ASID regeneration on
the current CPU, and by kvm_arch_vcpu_load() if the host ASID for guest
usermode is regenerated due to staleness (to cancel outstanding lazy
ASID regenerations). Unfortunately neither case handles SMP hosts
correctly:
- When the lazy ASID regeneration is performed it should apply to all
CPUs (as last_user_gasid does), so reset the ASID on other CPUs to
zero to trigger regeneration when the VCPU is next loaded on those
CPUs.
- When the ASID is found to be stale on the current CPU, we should not
cancel lazy ASID regenerations globally, so drop the reset of
last_user_gasid altogether here.
Both cases would require a guest ASID change and two host CPU migrations
(and in the latter case one of the CPUs to start a new ASID cycle)
before guest usermode could potentially access stale user pages from a
previously running ASID in the same VCPU.
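The two changes can be modelled roughly as follows. This is a sketch
under simplifying assumptions, not the real implementation: toy_vcpu,
toy_check_asids() and toy_vcpu_load() are hypothetical, an ASID of zero
stands for "regenerate on next load", and staleness is reduced to that
zero check.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical model: one host ASID per host CPU for guest usermode. */
struct toy_vcpu {
	uint32_t guest_user_asid[TOY_NR_CPUS];	/* 0: regenerate on load */
	uint32_t last_user_gasid;		/* shared by all host CPUs */
};

static uint32_t toy_next_asid = 100;

static void toy_regenerate_asid(struct toy_vcpu *vcpu, int cpu)
{
	vcpu->guest_user_asid[cpu] = toy_next_asid++;
}

/* Lazy regeneration before entering guest usermode on 'cpu'. */
static void toy_check_asids(struct toy_vcpu *vcpu, int cpu, uint32_t gasid)
{
	int i;

	if (gasid == vcpu->last_user_gasid)
		return;

	toy_regenerate_asid(vcpu, cpu);
	/* The regeneration must apply to every other CPU as well. */
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (i != cpu)
			vcpu->guest_user_asid[i] = 0;
	vcpu->last_user_gasid = gasid;
}

/* Loading the VCPU on 'cpu': fix up a stale ASID on this CPU only. */
static void toy_vcpu_load(struct toy_vcpu *vcpu, int cpu)
{
	if (vcpu->guest_user_asid[cpu] == 0)
		toy_regenerate_asid(vcpu, cpu);
	/* last_user_gasid is deliberately left untouched here. */
}
```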
Fixes: 25b08c7fb0 ("KVM: MIPS: Invalidate TLB by regenerating ASIDs")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 784d5699ed ("x86: move exports to actual definitions") removed the
EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
is that function_hook isn't a function at all, but a macro that is defined
as either mcount or __fentry__ depending on the support from gcc.
Originally, I thought this was a macro issue, like what __stringify()
is used for. But the problem is a bit deeper. The Makefile.build has
some magic that does post processing of files to create the CRC
bindings. It does some searches for EXPORT_SYMBOL() and because it
finds a macro name and not the actual functions, this causes
function_hook not to be converted into mcount or __fentry__ and they
are missed.
Instead of adding more magic to Makefile.build, just add
EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
Since this is assembly and not C, it doesn't require being set after
the function is defined.
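For reference, the "macro issue" mentioned above is the usual one where
the '#' operator sees the macro name rather than its expansion. The
snippet below only illustrates that preprocessor behaviour in plain C;
it is not the Makefile.build post-processing, and the TOY_* macros and
helper functions are made up for the example. A textual search for
EXPORT_SYMBOL() behaves like the direct variant: it only ever sees
"function_hook".

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the kernel's definition when gcc supports -mfentry. */
#define function_hook __fentry__

/* One level: '#' stringifies the argument without expanding it. */
#define TOY_STR_DIRECT(x) #x
/* Two levels: the argument is expanded before being stringified. */
#define TOY_STR_EXPANDED(x) TOY_STR_DIRECT(x)

static const char *toy_direct(void)
{
	return TOY_STR_DIRECT(function_hook);
}

static const char *toy_expanded(void)
{
	return TOY_STR_EXPANDED(function_hook);
}
```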
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Gabriel C <nix.or.die@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The drivers are now converted to not use the DMA resource.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
A recent change to the mm code in:
87744ab383 mm: fix cache mode tracking in vm_insert_mixed()
started enforcing a check of the memory type against the registered list
for mixed pfn insertion mappings. It happens that the drm drivers for a
number of GPUs relied on this being broken. Until now the drivers only
inserted VRAM mappings into the tracking table when they came from the
kernel, and userspace mappings never landed in the table. This led to a
regression where all the mappings now end up as UC instead of WC.
I've considered a number of solutions, but since this needs to be fixed
in fixes and not next, and some of the solutions would introduce
overhead that wasn't there before, I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but those APIs have a bunch of fast paths I didn't want to unwind to add
this to.
The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mappings that won't get degraded to UC.
v1.1: use CONFIG_X86_PAT + add some comments in io.h
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Cc: mcgrof@suse.com
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.
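A toy model of the heuristic, for illustration only. The names
(toy_tsb, toy_flush_range() and friends), the 8K page size, the
direct-mapped indexing and the "invalid" tag value are all assumptions
of this sketch and do not reflect the real TSB layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	13	/* 8K pages, an assumption of this sketch */
#define TOY_PAGE_SIZE	((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_TSB_INVALID	UINT64_MAX	/* hypothetical "empty slot" tag */

struct toy_tsb {
	uint64_t *tag;		/* tag[i]: virtual page number in slot i */
	size_t nentries;	/* must be a power of two */
};

/* Probe the one slot each page in [start, end) could occupy. */
static void toy_flush_by_probe(struct toy_tsb *tsb, uint64_t start,
			       uint64_t end)
{
	uint64_t v;

	for (v = start; v < end; v += TOY_PAGE_SIZE) {
		uint64_t vpn = v >> TOY_PAGE_SHIFT;
		size_t slot = vpn & (tsb->nentries - 1);

		if (tsb->tag[slot] == vpn)
			tsb->tag[slot] = TOY_TSB_INVALID;
	}
}

/* Walk every slot once and drop entries that fall inside the range. */
static void toy_flush_by_scan(struct toy_tsb *tsb, uint64_t start,
			      uint64_t end)
{
	uint64_t first = start >> TOY_PAGE_SHIFT;
	uint64_t last = end >> TOY_PAGE_SHIFT;
	size_t i;

	for (i = 0; i < tsb->nentries; i++) {
		uint64_t vpn = tsb->tag[i];

		if (vpn != TOY_TSB_INVALID && vpn >= first && vpn < last)
			tsb->tag[i] = TOY_TSB_INVALID;
	}
}

/* Returns 1 if the whole-table scan was used, 0 if pages were probed. */
static int toy_flush_range(struct toy_tsb *tsb, uint64_t start, uint64_t end)
{
	uint64_t npages = (end - start) >> TOY_PAGE_SHIFT;

	if (npages > 2 * tsb->nentries) {
		toy_flush_by_scan(tsb, start, end);
		return 1;
	}
	toy_flush_by_probe(tsb, start, end);
	return 0;
}
```

The scan costs one pass over the table regardless of the range size,
which is why it wins once the range is much larger than the table.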
Based upon a patch and report by James Clarke.
Signed-off-by: David S. Miller <davem@davemloft.net>
Additionally, if the offset will overflow the immediate for a ba,pt
instruction, fall back on a standard ba to get an extra 3 bits.
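The range check behind this can be sketched as follows, assuming the
usual SPARC V9 encodings: a signed 19-bit word displacement for ba,pt
and a signed 22-bit word displacement for ba. The helper names are
hypothetical and this is not the patching code itself.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DISP19_BITS 19	/* ba,pt: signed 19-bit word displacement */
#define TOY_DISP22_BITS 22	/* ba: signed 22-bit word displacement */

/* Return 1 if byte_offset fits a signed 'bits'-bit word displacement. */
static int toy_disp_fits(int64_t byte_offset, unsigned int bits)
{
	int64_t words, lo, hi;

	if (byte_offset % 4 != 0)
		return 0;	/* branch targets are word aligned */

	words = byte_offset / 4;
	lo = -((int64_t)1 << (bits - 1));
	hi = ((int64_t)1 << (bits - 1)) - 1;
	return words >= lo && words <= hi;
}

enum toy_branch { TOY_BA_PT, TOY_BA, TOY_OUT_OF_RANGE };

/* Prefer ba,pt and fall back on plain ba for its 3 extra bits. */
static enum toy_branch toy_pick_branch(int64_t byte_offset)
{
	if (toy_disp_fits(byte_offset, TOY_DISP19_BITS))
		return TOY_BA_PT;
	if (toy_disp_fits(byte_offset, TOY_DISP22_BITS))
		return TOY_BA;
	return TOY_OUT_OF_RANGE;
}
```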
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.
Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.
Use an absolute jmpl to fix this problem.
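The arithmetic behind the problem, as a minimal sketch with made-up
helper names and addresses: a displacement computed at the original
location no longer reaches an external symbol once the code runs from
its new location, whereas a branch whose target moves together with the
code is unaffected.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement a PC-relative branch at branch_addr needs to hit target. */
static int64_t toy_encode_rel(uint64_t branch_addr, uint64_t target)
{
	return (int64_t)(target - branch_addr);
}

/* Where a PC-relative branch lands when it executes at exec_addr. */
static uint64_t toy_rel_target(uint64_t exec_addr, int64_t disp)
{
	return exec_addr + (uint64_t)disp;
}
```

For example, a branch assembled at 0x1000 towards an external symbol at
0x5000 lands at 0x6000 after the code is copied to 0x2000. An absolute
jump does not depend on where the instruction executes.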
Signed-off-by: David S. Miller <davem@davemloft.net>
The removed clock.h file is a leftover from moving the platform to a
common clock framework driver. It contains an unused "struct clk"
definition, which under some circumstances may clash with the generic
"struct clk" declaration for clock consumers. Also remove the useless
include of the removed local file from the single source file that
used it, mach-lpc32xx/pm.c.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
The removed LPC32xx mach/irqs.h file is not included by any source
file, and the lpc32xx_init_irq() function declaration is also unused.
Remove both as leftovers from switching to a new interrupt controller
driver.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sylvain Lemieux <slemieux.tyco@gmail.com>
Set up the peripheral clock (PERIPH_CLK) as the default parent clock
for PWM1 & PWM2.
Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
For mostly historical reasons, the x86 oops dump shows the raw stack
values:
...
[registers]
Stack:
ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
0000000000000002 0000000000000000 0000000000000000 0000000000000000
Call Trace:
...
This seems to be an artifact from long ago, and probably isn't needed
anymore. It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.
Linus says:
"The stack dump actually goes back to forever, and it used to be
useful back in 1992 or so. But it used to be useful mainly because
stacks were simpler and we didn't have very good call traces anyway. I
definitely remember having used them - I just do not remember having
used them in the last ten+ years.
Of course, it's still true that if you can trigger an oops, you've
likely already lost the security game, but since the stack dump is so
useless, let's aim to just remove it and make games like the above
harder."
This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.
It can be a security issue because it leaks kernel addresses. It also
affects the usefulness of the stack dump. Linus says:
"I actually spend time cleaning up commit messages in logs, because
useless data that isn't actually information (random hex numbers) is
actively detrimental.
It makes commit logs less legible.
It also makes it harder to parse dumps.
It's not useful. That makes it actively bad.
I probably look at more oops reports than most people. I have not
found the hex numbers useful for the last five years, because they are
just randomized crap.
The stack content thing just makes code scroll off the screen etc, for
example."
The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names. However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.
There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:
$ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
write_sysrq_trigger+0x51/0x60:
write_sysrq_trigger at drivers/tty/sysrq.c:1098
Or gdb can be used:
$ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
(gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).
(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds. faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)
Here's an example of what a stack dump looks like after this change:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: sysrq_handle_crash+0x45/0x80
PGD 36bfa067 [ 29.650644] PUD 7aca3067
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
task: ffff880078582a40 task.stack: ffffc90000ba8000
RIP: 0010:sysrq_handle_crash+0x45/0x80
RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296
RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292
RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000
FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0
Stack:
ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002
0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20
ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40
Call Trace:
__handle_sysrq+0x138/0x220
? __handle_sysrq+0x5/0x220
write_sysrq_trigger+0x51/0x60
proc_reg_write+0x42/0x70
__vfs_write+0x37/0x140
? preempt_count_sub+0xa1/0x100
? __sb_start_write+0xf5/0x210
? vfs_write+0x183/0x1a0
vfs_write+0xb8/0x1a0
SyS_write+0x58/0xc0
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x7ffb42f55940
RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940
RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001
RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700
R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10
R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010
Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8
CR2: 0000000000000000
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All of the SoCFPGA boards have at least 1GB of RAM, so enabling HIGHMEM
is necessary to avoid the following warning:
[ 0.000000] Truncating RAM at 0x00000000-0x40000000 to -0x30000000
[ 0.000000] Consider using a HIGHMEM enabled kernel.
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
When the patch referenced in the Fixes tag was applied, the
allwinner,drive and allwinner,pull properties were removed.
Although they're described as optional in the devicetree binding,
without them the pinmux cannot be initialized, and the uart cannot
be used.
Add them back to fix the problem and make the bluetooth on the iNet
D978 Rev2 board work.
Fixes: 82eec38424 (ARM: dts: sun8i: add pinmux for UART1 at PG)
Signed-off-by: Icenowy Zheng <icenowy@aosc.xyz>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Yeah, I know, I know, this is a huuge patch and reviewing it is hard.
Sorry but this is the only way I could think of in which I can rewrite
the microcode patches loading procedure without breaking (knowingly) the
driver.
So maybe this patch is easier to review if one looks at the files after
the patch has been applied instead of at the diff. Because then it
becomes pretty obvious:
* The BSP-loading path - load_ucode_bsp() is working independently from
the AP path now and it doesn't save any pointers or patches anymore -
it solely parses the builtin or initrd microcode and applies the patch.
That's it.
This fixes the CONFIG_RANDOMIZE_MEMORY offset fun more solidly.
* The AP-loading path - load_ucode_ap() then goes and scans
builtin/initrd *again* for the microcode patches but it caches them this
time so that we don't have to do that scan on each AP but only once.
This simplifies the code considerably.
Then, when we save the microcode from the initrd/builtin, we go and
add the relevant patches to our own cache. The AMD side did do that
and now the Intel side does it too. So no more pointer copying and
blabla, we save the microcode patches ourselves and are independent from
initrd/builtin.
This whole conversion gives us other benefits like unifying the
initrd parsing into a single function: find_microcode_in_initrd() is
used by both.
The diffstat speaks for itself: 456 insertions(+), 695 deletions(-)
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-12-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vince Weaver reported the following bug:
WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
Call Trace:
<NMI> ? dump_stack+0x46/0x59
? __warn+0xd5/0xee
? vmalloc_fault+0x58/0x1f0
? __do_page_fault+0x6d/0x48e
? perf_log_throttle+0xa4/0xf4
? trace_page_fault+0x22/0x30
? __unwind_start+0x28/0x42
? perf_callchain_kernel+0x75/0xac
? get_perf_callchain+0x13a/0x1f0
? perf_callchain+0x6a/0x6c
? perf_prepare_sample+0x71/0x2eb
? perf_event_output_forward+0x1a/0x54
? __default_send_IPI_shortcut+0x10/0x2d
? __perf_event_overflow+0xfb/0x167
? x86_pmu_handle_irq+0x113/0x150
? native_read_msr+0x6/0x34
? perf_event_nmi_handler+0x22/0x39
? perf_ibs_nmi_handler+0x4a/0x51
? perf_event_nmi_handler+0x22/0x39
? nmi_handle+0x4d/0xf0
? perf_ibs_handle_irq+0x3d1/0x3d1
? default_do_nmi+0x3c/0xd5
? do_nmi+0x92/0x102
? end_repeat_nmi+0x1a/0x1e
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
<EOE> ---[ end trace 632723104d47d31a ]---
BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
kernel stack overflow (page fault): 0000 [#1] SMP
...
The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty. The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.
Add a check in the guess unwinder to deal with an empty stack. (The
frame pointer unwinder already has such a check.)
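Conceptually the check amounts to something like the following. This is
a standalone model rather than the actual unwinder: the struct and
function names are hypothetical and the stack is just an array of words.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal unwind state for illustration. */
struct toy_unwind_state {
	unsigned long *sp;	/* NULL when there is nothing to unwind */
	unsigned long *end;	/* one past the highest stack word */
};

/*
 * Start unwinding at 'first'. If 'first' is the top-of-stack pointer
 * (that is, stack_end) the stack is empty and it must not be
 * dereferenced, so mark the unwind as finished right away.
 */
static void toy_unwind_start(struct toy_unwind_state *state,
			     unsigned long *first,
			     unsigned long *stack_begin,
			     unsigned long *stack_end)
{
	state->end = stack_end;

	if (first < stack_begin || first >= stack_end) {
		state->sp = NULL;
		return;
	}
	state->sp = first;
}

static int toy_unwind_done(const struct toy_unwind_state *state)
{
	return state->sp == NULL;
}
```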
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7c7900f897 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all of the user copy routines are converted to return
accurate residual lengths when an exception occurs, we no longer need
the broken fixup routines.
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
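The "running remaining length" idea, sketched in plain C. The real
routines are assembly with exception table entries; here
toy_copy_with_fault() is a hypothetical byte-wise copy and the fault is
simulated by a fault_at offset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy up to len bytes from src to dst. A fault is simulated when the
 * byte at offset fault_at is about to be accessed. Returns the number
 * of bytes NOT copied, taken from the running remaining count, so the
 * result is exact rather than an estimate.
 */
static size_t toy_copy_with_fault(char *dst, const char *src, size_t len,
				  size_t fault_at)
{
	size_t remaining = len;
	size_t i = 0;

	while (remaining > 0) {
		if (i == fault_at)
			return remaining;	/* exact residual */
		dst[i] = src[i];
		i++;
		remaining--;
	}
	return 0;
}
```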
Signed-off-by: David S. Miller <davem@davemloft.net>
bcm2837-rpi-3-b.dts, its only in-tree user, was overriding it as
"brcm,bcm2837" already.
Fixes: 9d56c22a78 ("ARM: bcm2835: Add devicetree for the Raspberry Pi 3.")
Cc: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Eric Anholt <eric@anholt.net>
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull xen fixes from David Vrabel:
- advertise control feature flags in xenstore
- fix x86 build when XEN_PVHVM is disabled
* tag 'for-linus-4.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: check return value of xenbus_scanf()
xenbus: prefer list_for_each()
x86: xen: move cpu_up functions out of ifdef
xenbus: advertise control feature flags
Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.
Signed-off-by: David S. Miller <davem@davemloft.net>