Commit Graph

6717 Commits

Author SHA1 Message Date
Kai Huang
1c91cad423 KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from
'slot id' to 'struct kvm_memory_slot *' to align with kvm_x86_ops dirty logging
hooks, which will be introduced in further patch.

Better way is to change second parameter of kvm_arch_commit_memory_region from
'struct kvm_userspace_memory_region *' to 'struct kvm_memory_slot * new', but it
requires changes on other non-x86 ARCH too, so avoid it now.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:37 +01:00
Kai Huang
f4b4b18086 KVM: MMU: Add mmu help functions to support PML
This patch adds new mmu layer functions to clear/set D-bit for memory slot, and
to write protect superpages for memory slot.

In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU
updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages,
instead, we only need to clear D-bit in order to log that GPA.

For superpages, we still write protect it and let page fault code to handle
dirty page logging, as we still need to split superpage to 4K pages in PML.

As PML is always enabled during guest's lifetime, to eliminate unnecessary PML
GPA logging, we set D-bit manually for the slot with dirty logging disabled.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29 15:31:29 +01:00
Ingo Molnar
b3890e4704 Merge branch 'perf/hw_breakpoints' into perf/core
The new hw_breakpoint bits are now ready for v3.20, merge them
into the main branch, to avoid conflicts.

Conflicts:
	tools/perf/Documentation/perf-record.txt

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28 15:48:59 +01:00
Ingo Molnar
772a9aca12 Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm
Pull x86/entry enhancements from Andy Lutomirski:

" This is my accumulated x86 entry work, part 1, for 3.20.  The meat
  of this is an IST rework.  When an IST exception interrupts user
  space, we will handle it on the per-thread kernel stack instead of
  on the IST stack.  This sounds messy, but it actually simplifies the
  IST entry/exit code, because it eliminates some ugly games we used
  to play in order to handle rescheduling, signal delivery, etc on the
  way out of an IST exception.

  The IST rework introduces proper context tracking to IST exception
  handlers.  I haven't seen any bug reports, but the old code could
  have incorrectly treated an IST exception handler as an RCU extended
  quiescent state.

  The memory failure change (included in this pull request with
  Borislav and Tony's permission) eliminates a bunch of code that
  is no longer needed now that user memory failure handlers are
  called in process context.

  Finally, this includes a few on Denys' uncontroversial and Obviously
  Correct (tm) cleanups.

  The IST and memory failure changes have been in -next for a while.

  LKML references:

  IST rework:
  http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net

  Memory failure change:
  http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com

  Denys' cleanups:
  http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com
"

This tree semantically depends on and is based on the following RCU commit:

  734d168013 ("rcu: Make rcu_nmi_enter() handle nesting")

... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28 15:33:26 +01:00
David Vrabel
0bb599fd30 xen: remove scratch frames for ballooned pages and m2p override
The scratch frame mappings for ballooned pages and the m2p override
are broken.  Remove them in preparation for replacing them with
simpler mechanisms that works.

The scratch pages did not ensure that the page was not in use.  In
particular, the foreign page could still be in use by hardware.  If
the guest reused the frame the hardware could read or write that
frame.

The m2p override did not handle the same frame being granted by two
different grant references.  Trying an M2P override lookup in this
case is impossible.

With the m2p override removed, the grant map/unmap for the kernel
mappings (for x86 PV) can be easily batched in
set_foreign_p2m_mapping() and clear_foreign_p2m_mapping().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28 14:03:10 +00:00
David Vrabel
853d028934 xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
When unmapping grants, instead of converting the kernel map ops to
unmap ops on the fly, pre-populate the set of unmap ops.

This allows the grant unmap for the kernel mappings to be trivially
batched in the future.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-01-28 14:03:10 +00:00
Nadav Amit
801806d956 KVM: x86: IRET emulation does not clear NMI masking
The IRET instruction should clear NMI masking, but the current implementation
does not do so.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26 12:14:42 +01:00
K. Y. Srinivasan
4061ed9e2a Drivers: hv: vmbus: Implement a clockevent device
Implement a clockevent device based on the timer support available on
Hyper-V.
In this version of the patch I have addressed Jason's review comments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-25 09:17:57 -08:00
Paolo Bonzini
1c6007d59a Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added
trace symbols, and adding an explicit VGIC init device control IOCTL.

Conflicts:
	arch/arm64/include/asm/kvm_arm.h
	arch/arm64/kvm/handle_exit.c
2015-01-23 13:39:51 +01:00
Andy Lutomirski
3669ef9fa7 x86, tls: Interpret an all-zero struct user_desc as "no segment"
The Witcher 2 did something like this to allocate a TLS segment index:

        struct user_desc u_info;
        bzero(&u_info, sizeof(u_info));
        u_info.entry_number = (uint32_t)-1;

        syscall(SYS_set_thread_area, &u_info);

Strictly speaking, this code was never correct.  It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real.  The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working.  In
particular, using the code above to allocate two segments will
allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.

Fixes: 41bdc78544 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:45:07 +01:00
Andy Lutomirski
e30ab185c4 x86, tls, ldt: Stop checking lm in LDT_empty
32-bit programs don't have an lm bit in their ABI, so they can't
reliably cause LDT_empty to return true without resorting to memset.
They shouldn't need to do this.

This should fix a longstanding, if minor, issue in all 64-bit kernels
as well as a potential regression in the TLS hardening code.

Fixes: 41bdc78544 x86/tls: Validate TLS entries to protect espfix
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:11:06 +01:00
Dave Hansen
c922228efe x86, mpx: Fix potential performance issue on unmaps
The 3.19 merge window saw some TLB modifications merged which caused a
performance regression. They were fixed in commit 045bbb9fa.

Once that fix was applied, I also noticed that there was a small
but intermittent regression still present.  It was not present
consistently enough to bisect reliably, but I'm fairly confident
that it came from (my own) MPX patches.  The source was reading
a relatively unused field in the mm_struct via arch_unmap.

I also noted that this code was in the main instruction flow of
do_munmap() and probably had more icache impact than we want.

This patch does two things:
1. Adds a static (via Kconfig) and dynamic (via cpuid) check
   for MPX with cpu_feature_enabled().  This keeps us from
   reading that cacheline in the mm and trades it for a check
   of the global CPUID variables at least on CPUs without MPX.
2. Adds an unlikely() to ensure that the MPX call ends up out
   of the main instruction flow in do_munmap().  I've added
   a detailed comment about why this was done and why we want
   it even on systems where MPX is present.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: luto@amacapital.net
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 21:11:06 +01:00
Thomas Gleixner
374aab339f x86/apic: Reuse apic_bsp_setup() for UP APIC setup
Extend apic_bsp_setup() so the same code flow can be used for
APIC_init_uniprocessor().

Folded Jiangs fix to provide proper ordering of the UP setup.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211704.084765674@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:56 +01:00
Thomas Gleixner
05f7e46d2a x86/smpboot: Move apic init code to apic.c
We better provide proper functions which implement the required code
flow in the apic code rather than letting the smpboot code open code
it. That allows to make more functions static and confines the APIC
functionality to apic.c where it belongs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211703.907616730@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:56 +01:00
Thomas Gleixner
8686608336 x86/ioapic: Provide stub functions for IOAPIC%3Dn
To avoid lots of ifdeffery provide proper stubs for setup_IO_APIC(),
enable_IO_APIC() and setup_ioapic_dest().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211703.397170414@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
f77aa308e5 x86/smpboot: Move smpboot inlines to code
No point for a separate header file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.304126687@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
659006bf3a x86/x2apic: Split enable and setup function
enable_x2apic() is a convoluted unreadable mess because it is used for
both enablement in early boot and for setup in cpu_init().

Split the code into x2apic_enable() for enablement and x2apic_setup()
for setup of (secondary cpus). Make use of the new state tracking to
simplify the logic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
44e25ff9e6 x86/x2apic: Disable x2apic from nox2apic setup
There is no point in postponing the hardware disablement of x2apic. It
can be disabled right away in the nox2apic setup function.

Disable it right away and set the state to DISABLED . This allows to
remove all the nox2apic conditionals all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211703.051214090@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:55 +01:00
Thomas Gleixner
55eae7de72 x86/x2apic: Move code in conditional region
No point in having try_to_enable_x2apic() outside of the
CONFIG_X86_X2APIC section and having inline functions and more ifdefs
to deal with it. Move the code into the existing ifdef section and
remove the inline cruft.

Fixup the printk about not enabling interrupt remapping as suggested
by Boris.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.795388613@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
d524165cb8 x86/apic: Check x2apic early
No point in delaying the x2apic detection for the CONFIG_X86_X2APIC=n
case to enable_IR_x2apic(). We rather detect that before we try to
setup anything there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.702479404@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
2ca5b40479 x86/ioapic: Check x2apic really
The x2apic_preenabled flag is just a horrible hack and if X2APIC
support is disabled it does not reflect the actual hardware
state. Check the hardware instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20150115211702.541280622@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
81a46dd824 x86/apic: Make x2apic_mode depend on CONFIG_X86_X2APIC
No point in having a static variable around which is always 0. Let the
compiler optimize code out if disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.363274310@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Thomas Gleixner
8d80696060 x86/apic: Avoid open coded x2apic detection
enable_IR_x2apic() grew a open coded x2apic detection. Implement a
proper helper function which shares the code with the already existing
x2apic_enabled().

Made it use rdmsrl_safe as suggested by Boris.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20150115211702.285038186@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-22 15:10:54 +01:00
Borislav Petkov
cfaa790a3f kvm: Fix CR3_PCID_INVD type on 32-bit
arch/x86/kvm/emulate.c: In function ‘check_cr_write’:
arch/x86/kvm/emulate.c:3552:4: warning: left shift count >= width of type
    rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;

happens because sizeof(UL) on 32-bit is 4 bytes but we shift it 63 bits
to the left.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-21 15:59:09 +01:00
Marcelo Tosatti
54750f2cf0 KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue
SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp
and rdtsc is larger than a given threshold:

 * If we get more than the below threshold into the future, we rerequest
 * the real time from the host again which has only little offset then
 * that we need to adjust using the TSC.
 *
 * For now that threshold is 1/5th of a jiffie. That should be good
 * enough accuracy for completely broken systems, but also give us swing
 * to not call out to the host all the time.
 */
#define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5)

Disable masterclock support (which increases said delta) in case the
boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW.

Upstreams kernels which support pvclock vsyscalls (and therefore make
use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-20 20:38:39 +01:00
Oleg Nesterov
7575637ab2 x86, fpu: Fix math_state_restore() race with kernel_fpu_begin()
math_state_restore() can race with kernel_fpu_begin() if irq comes
right after __thread_fpu_begin(), __save_init_fpu() will overwrite
fpu->state we are going to restore.

Add 2 simple helpers, kernel_fpu_disable() and kernel_fpu_enable()
which simply set/clear in_kernel_fpu, and change math_state_restore()
to exclude kernel_fpu_begin() in between.

Alternatively we could use local_irq_save/restore, but probably these
new helpers can have more users.

Perhaps they should disable/enable preemption themselves, in this case
we can remove preempt_disable() in __restore_xstate_sig().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: matt.fleming@intel.com
Cc: bp@suse.de
Cc: pbonzini@redhat.com
Cc: luto@amacapital.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150115192028.GD27332@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 13:53:07 +01:00
Oleg Nesterov
14e153ef75 x86, fpu: Introduce per-cpu in_kernel_fpu state
interrupted_kernel_fpu_idle() tries to detect if kernel_fpu_begin()
is safe or not. In particular it should obviously deny the nested
kernel_fpu_begin() and this logic looks very confusing.

If use_eager_fpu() == T we rely on a) __thread_has_fpu() check in
interrupted_kernel_fpu_idle(), and b) on the fact that _begin() does
__thread_clear_has_fpu().

Otherwise we demand that the interrupted task has no FPU if it is in
kernel mode, this works because __kernel_fpu_begin() does clts() and
interrupted_kernel_fpu_idle() checks X86_CR0_TS.

Add the per-cpu "bool in_kernel_fpu" variable, and change this code
to check/set/clear it. This allows to do more cleanups and fixes, see
the next changes.

The patch also moves WARN_ON_ONCE() under preempt_disable() just to
make this_cpu_read() look better, this is not really needed. And in
fact I think we should move it into __kernel_fpu_begin().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: matt.fleming@intel.com
Cc: bp@suse.de
Cc: pbonzini@redhat.com
Cc: luto@amacapital.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150115191943.GB27332@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 13:53:07 +01:00
Andy Shevchenko
0e1540208e x86: pmc_atom: Expose contents of PSS
The PSS register reflects the power state of each island on SoC. It would be
useful to know which of the islands is on or off at the momemnt.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
Link: http://lkml.kernel.org/r/1421253575-22509-6-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 12:50:14 +01:00
Jiang Liu
8abb850a03 x86/xen: Override ACPI IRQ management callback __acpi_unregister_gsi
Xen overrides __acpi_register_gsi and leaves __acpi_unregister_gsi as is.
That means, an IRQ allocated by acpi_register_gsi_xen_hvm() or
acpi_register_gsi_xen() will be freed by acpi_unregister_gsi_ioapic(),
which may cause undesired effects. So override __acpi_unregister_gsi to
NULL for safety.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Graeme Gregory <graeme.gregory@linaro.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Link: http://lkml.kernel.org/r/1421720467-7709-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-20 11:44:41 +01:00
Christian Borntraeger
bccec2a0a2 x86/spinlock: Leftover conversion ACCESS_ONCE->READ_ONCE
commit 78bff1c868 ("x86/ticketlock: Fix spin_unlock_wait() livelock")
introduced two additional ACCESS_ONCE cases in x86 spinlock.h.
Lets change those as well.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2015-01-19 14:14:20 +01:00
Paolo Bonzini
e108ff2f80 KVM: x86: switch to kvm_get_dirty_log_protect
We now have a generic function that does most of the work of
kvm_vm_ioctl_get_dirty_log, now use it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-16 14:40:14 +01:00
Jiang Liu
c392f56c94 iommu/irq_remapping: Kill function irq_remapping_supported() and related code
Simplify irq_remapping code by killing irq_remapping_supported() and
related interfaces.

Joerg posted a similar patch at https://lkml.org/lkml/2014/12/15/490,
so assume an signed-off from Joerg.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Link: http://lkml.kernel.org/r/1420615903-28253-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:23 +01:00
Thomas Gleixner
a1dafe857d iommu, x86: Restructure setup of the irq remapping feature
enable_IR_x2apic() calls setup_irq_remapping_ops() which by default
installs the intel dmar remapping ops and then calls the amd iommu irq
remapping prepare callback to figure out whether we are running on an
AMD machine with irq remapping hardware.

Right after that it calls irq_remapping_prepare() which pointlessly
checks:
	if (!remap_ops || !remap_ops->prepare)
               return -ENODEV;
and then calls

    remap_ops->prepare()

which is silly in the AMD case as it got called from
setup_irq_remapping_ops() already a few microseconds ago.

Simplify this and just collapse everything into
irq_remapping_prepare().

The irq_remapping_prepare() remains still silly as it assigns blindly
the intel ops, but that's not scope of this patch.

The scope here is to move the preperatory work, i.e. memory
allocations out of the atomic section which is required to enable irq
remapping.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-and-tested-by: Joerg Roedel <joro@8bytes.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oren Twaig <oren@scalemp.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141205084147.232633738@linutronix.de
Link: http://lkml.kernel.org/r/1420615903-28253-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-15 11:24:22 +01:00
Arnd Bergmann
643165c8bb Merge tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost into asm-generic
Merge "uaccess: fix sparse warning on get/put_user for bitwise types" from Michael S. Tsirkin:

At the moment, if p and x are both tagged as bitwise types,
some of get_user(x, p), put_user(x, p), __get_user(x, p), __put_user(x, p)
might produce a sparse warning on many architectures.
This is a false positive: *p on these architectures is loaded into long
(typically using asm), then cast back to typeof(*p).

When typeof(*p) is a bitwise type (which is uncommon), such a cast needs
__force, otherwise sparse produces a warning.

Some architectures already have the __force tag, add it
where it's missing.

I verified that adding these __force casts does not supress any useful warnings.

Specifically, vhost wants to read/write bitwise types in userspace memory
using get_user/put_user.
At the moment this triggers sparse errors, since the value is passed through an
integer.

For example:
    __le32 __user *p;
    __u32 x;

both
    put_user(x, p);
and
    get_user(x, p);
should be safe, but produce warnings on some architectures.

While there, I noticed that a bunch of architectures violated
coding style rules within uaccess macros.
Included patches to fix them up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

* tag 'uaccess_for_upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits)
  sparc32: nocheck uaccess coding style tweaks
  sparc64: nocheck uaccess coding style tweaks
  xtensa: macro whitespace fixes
  sh: macro whitespace fixes
  parisc: macro whitespace fixes
  m68k: macro whitespace fixes
  m32r: macro whitespace fixes
  frv: macro whitespace fixes
  cris: macro whitespace fixes
  avr32: macro whitespace fixes
  arm64: macro whitespace fixes
  arm: macro whitespace fixes
  alpha: macro whitespace fixes
  blackfin: macro whitespace fixes
  sparc64: uaccess_64 macro whitespace fixes
  sparc32: uaccess_32 macro whitespace fixes
  avr32: whitespace fix
  sh: fix put_user sparse errors
  metag: fix put_user sparse errors
  ia64: fix put_user sparse errors
  ...
2015-01-14 23:17:49 +01:00
Denys Vlasenko
af9cfe270d x86: entry_64.S: delete unused code
A define, two macros and an unreferenced bit of assembly are gone.

Acked-by: Borislav Petkov <bp@suse.de>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-13 14:00:33 -08:00
Michael S. Tsirkin
e182c570e9 x86/uaccess: fix sparse errors
virtio wants to read bitwise types from userspace using get_user.  At the
moment this triggers sparse errors, since the value is passed through an
integer.

Fix that up using __force.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-13 15:22:59 +02:00
Jiri Kosina
b9dfe0bed9 livepatch: handle ancient compilers with more grace
We are aborting a build in case when gcc doesn't support fentry on x86_64
(regs->ip modification can't really reliably work with mcount).

This however breaks allmodconfig for people with older gccs that don't
support -mfentry.

Turn the build-time failure into runtime failure, resulting in the whole
infrastructure not being initialized if CC_USING_FENTRY is unset.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
2015-01-09 10:55:10 +01:00
Nadav Amit
c205fb7d7d KVM: x86: #PF error-code on R/W operations is wrong
When emulating an instruction that reads the destination memory operand (i.e.,
instructions without the Mov flag in the emulator), the operand is first read.
If a page-fault is detected in this phase, the error-code which would be
delivered to the VM does not indicate that the access that caused the exception
is a write one. This does not conform with real hardware, and may cause the VM
to enter the page-fault handler twice for no reason (once for read, once for
write).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-09 10:24:11 +01:00
Marcelo Tosatti
7c6a98dfa1 KVM: x86: add method to test PIR bitmap vector
kvm_x86_ops->test_posted_interrupt() returns true/false depending
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:18 +01:00
Eugene Korenevsky
e9ac033e6b KVM: nVMX: Improve nested msr switch checking
This patch improve checks required by Intel Software Developer Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

Also it adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:15 +01:00
Wincy Van
ff651cb613 KVM: nVMX: Add nested msr load/restore algorithm
Several hypervisors need MSR auto load/restore feature.
We read MSRs from VM-entry MSR load area which specified by L1,
and load them via kvm_set_msr in the nested entry.
When nested exit occurs, we get MSRs via kvm_get_msr, writing
them to L1`s MSR store area. After this, we read MSRs from VM-exit
MSR load area, and load them via kvm_set_msr.

Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08 22:45:14 +01:00
Luck, Tony
d4812e169d x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
We now switch to the kernel stack when a machine check interrupts
during user mode.  This means that we can perform recovery actions
in the tail of do_machine_check()

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-07 07:47:42 -08:00
Andy Lutomirski
bced35b65a x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
In some IST handlers, if the interrupt came from user mode,
we can safely enable preemption.  Add helpers to do it safely.

This is intended to be used my the memory failure code in
do_machine_check.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Andy Lutomirski
83653c16da x86: Clean up current_stack_pointer
There's no good reason for it to be a macro, and x86_64 will want to
use it, so it should be in a header.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Andy Lutomirski
9592747538 x86, traps: Track entry into and exit from IST context
We currently pretend that IST context is like standard exception
context, but this is incorrect.  IST entries from userspace are like
standard exceptions except that they use per-cpu stacks, so they are
atomic.  IST entries from kernel space are like NMIs from RCU's
perspective -- they are not quiescent states even if they
interrupted the kernel during a quiescent state.

Add and use ist_enter and ist_exit to track IST context.  Even
though x86_32 has no IST stacks, we track these interrupts the same
way.

This fixes two issues:

 - Scheduling from an IST interrupt handler will now warn.  It would
   previously appear to work as long as we got lucky and nothing
   overwrote the stack frame.  (I don't know of any bugs in this
   that would trigger the warning, but it's good to be on the safe
   side.)

 - RCU handling in IST context was dangerous.  As far as I know,
   only machine checks were likely to trigger this, but it's good to
   be on the safe side.

Note that the machine check handlers appears to have been missing
any context tracking at all before this patch.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-01-02 10:22:46 -08:00
Ingo Molnar
2aba73a614 Merge tag 'pr-20141223-x86-vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/urgent
Pull VDSO fix from Andy Lutomirski:

 "This is hopefully the last vdso fix for 3.19.  It should be very
  safe (it just adds a volatile).

  I don't think it fixes an actual bug (the __getcpu calls in the
  pvclock code may not have been needed in the first place), but
  discussion on that point is ongoing.

  It also fixes a big performance issue in 3.18 and earlier in which
  the lsl instructions in vclock_gettime got hoisted so far up the
  function that they happened even when the function they were in was
  never called.  n 3.19, the performance issue seems to be gone due to
  the whims of my compiler and some interaction with a branch that's
  now gone.

  I'll hopefully have a much bigger overhaul of the pvclock code
  for 3.20, but it needs careful review."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-01 22:21:22 +01:00
Andy Lutomirski
1ddf0b1b11 x86, vdso: Use asm volatile in __getcpu
In Linux 3.18 and below, GCC hoists the lsl instructions in the
pvclock code all the way to the beginning of __vdso_clock_gettime,
slowing the non-paravirt case significantly.  For unknown reasons,
presumably related to the removal of a branch, the performance issue
is gone as of

e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu

but I don't trust GCC enough to expect the problem to stay fixed.

There should be no correctness issue, because the __getcpu calls in
__vdso_vlock_gettime were never necessary in the first place.

Note to stable maintainers: In 3.18 and below, depending on
configuration, gcc 4.9.2 generates code like this:

     9c3:       44 0f 03 e8             lsl    %ax,%r13d
     9c7:       45 89 eb                mov    %r13d,%r11d
     9ca:       0f 03 d8                lsl    %ax,%ebx

This patch won't apply as is to any released kernel, but I'll send a
trivial backported version if needed.

Fixes: 51c19b4f59 x86: vdso: pvclock gettime support
Cc: stable@vger.kernel.org # 3.8+
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-12-23 13:05:30 -08:00
Borislav Petkov
6ca7a8a150 x86/fpu: Use a symbolic name for asm operand
Fix up the else-case in fpu_fxsave() which seems like it has
been overlooked. Correct comment style in restore_fpu_checking()
while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1419170543-11393-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-23 10:41:12 +01:00
Li Bin
b5bfc51707 livepatch: move x86 specific ftrace handler code to arch/x86
The execution flow redirection related implemention in the livepatch
ftrace handler is depended on the specific architecture. This patch
introduces klp_arch_set_pc(like kgdb_arch_set_pc) interface to change
the pt_regs.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:49 +01:00
Seth Jennings
b700e7f03d livepatch: kernel: add support for live patching
This commit introduces code for the live patching core.  It implements
an ftrace-based mechanism and kernel interface for doing live patching
of kernel and kernel module functions.

It represents the greatest common functionality set between kpatch and
kgraft and can accept patches built using either method.

This first version does not implement any consistency mechanism that
ensures that old and new code do not run together.  In practice, ~90% of
CVEs are safe to apply in this way, since they simply add a conditional
check.  However, any function change that can not execute safely with
the old version of the function can _not_ be safely applied in this
version.

[ jkosina@suse.cz: due to the number of contributions that got folded into
  this original patch from Seth Jennings, add SUSE's copyright as well, as
  discussed via e-mail ]

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:49 +01:00