Commit Graph

10132 Commits

Author SHA1 Message Date
Thomas Gleixner
54859f59fc x86: Remove create/destroy_irq()
No more users. Remove the cruft

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154336.760446122@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:20 +02:00
Thomas Gleixner
d07c9f1875 x86: Get rid of get_nr_irqs_gsi()
No need to expose this outside of the ioapic code. The dynamic
allocations are guaranteed not to happen in the gsi space. See commit
62a08ae2a.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Thomas Gleixner
be47be6c28 x86: ioapic: Use irq_alloc/free_hwirq()
No functional change just less crap.

This does not replace the requirement to move x86 to irq domains, but
it limits the mess to some degree.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154335.749579081@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:19 +02:00
Thomas Gleixner
499c2b75e9 x86: hpet: Use irq_alloc/free_hwirq()
Use the new interfaces. No functional change.

This does not replace the requirement to move x86 to irq domains, but
it limits the mess to some degree.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.991589924@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:18 +02:00
Thomas Gleixner
b1ee544174 x86: Implement arch_setup/teardown_hwirq()
This is just a cleanup to get rid of the create/destroy_irq variants
which were designed in hell.

The long term solution for x86 is to switch over to irq domains and
cleanup the whole vector allocation mess.

The generic irq_alloc_hwirqs() interface deliberately prevents
multi-MSI vector allocation to further enforce the irq domain
conversion (aside of the desire to support ioapic hotplug).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20140507154334.482904047@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-16 14:05:18 +02:00
Linus Torvalds
fa81511bb0 x86-64, modify_ldt: Make support for 16-bit segments a runtime option
Checkin:

b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

disabled 16-bit segments on 64-bit kernels due to an information
leak.  However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.

A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.

It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.

The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Cc: <stable@vger.kernel.org>
2014-05-14 16:33:54 -07:00
Steven Rostedt
e18eead3c3 ftrace/x86: Move the mcount/fentry code out of entry_64.S
As the mcount code gets more complex, it really does not belong
in the entry.S file. By moving it into its own file "mcount.S"
keeps things a bit cleaner.

Link: http://lkml.kernel.org/p/20140508152152.2130e8cf@gandalf.local.home

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:31 -04:00
Steven Rostedt (Red Hat)
f1b2f2bd58 ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag
As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:30 -04:00
Steven Rostedt (Red Hat)
7413af1fb7 ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.

This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.

ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.

ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:29 -04:00
Steven Rostedt (Red Hat)
94792ea07c ftrace/x86: Get the current mcount addr for add_breakpoint()
The add_breakpoint() code in the ftrace updating gets the address
of what the call will become, but if the mcount address is changing
from regs to non-regs ftrace_caller or vice versa, it will use what
the record currently is.

This is rather silly as the code should always use what is currently
there regardless of if it's changing the regs function or just converting
to a nop.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-05-14 11:37:28 -04:00
Oleg Nesterov
b02ef20a9f uprobes/x86: Fix the wrong ->si_addr when xol triggers a trap
If the probed insn triggers a trap, ->si_addr = regs->ip is technically
correct, but this is not what the signal handler wants; we need to pass
the address of the probed insn, not the address of xol slot.

Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
fill_trap_info() and math_error() to use it. !CONFIG_UPROBES case in
uprobes.h uses a macro to avoid include hell and ensure that it can be
compiled even if an architecture doesn't define instruction_pointer().

Test-case:

	#include <signal.h>
	#include <stdio.h>
	#include <unistd.h>

	extern void probe_div(void);

	void sigh(int sig, siginfo_t *info, void *c)
	{
		int passed = (info->si_addr == probe_div);
		printf(passed ? "PASS\n" : "FAIL\n");
		_exit(!passed);
	}

	int main(void)
	{
		struct sigaction sa = {
			.sa_sigaction	= sigh,
			.sa_flags	= SA_SIGINFO,
		};

		sigaction(SIGFPE, &sa, NULL);

		asm (
			"xor %ecx,%ecx\n"
			".globl probe_div; probe_div:\n"
			"idiv %ecx\n"
		);

		return 0;
	}

it fails if probe_div() is probed.

Note: show_unhandled_signals users should probably use this helper too,
but we need to cleanup them first.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-05-14 13:57:28 +02:00
Oleg Nesterov
0eb14833d5 x86/traps: Kill DO_ERROR_INFO()
Now that DO_ERROR_INFO() doesn't differ from DO_ERROR() we can remove
it and use DO_ERROR() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:28 +02:00
Oleg Nesterov
1c326c4dfe x86/traps: Shift fill_trap_info() from DO_ERROR_INFO() to do_error_trap()
Move the callsite of fill_trap_info() into do_error_trap() and remove
the "siginfo_t *info" argument.

This obviously breaks DO_ERROR() which passed info == NULL, we simply
change fill_trap_info() to return "siginfo_t *" and add the "default"
case which returns SEND_SIG_PRIV.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:27 +02:00
Oleg Nesterov
958d3d7298 x86/traps: Introduce fill_trap_info(), simplify DO_ERROR_INFO()
Extract the fill-siginfo code from DO_ERROR_INFO() into the new helper,
fill_trap_info().

It can calculate si_code and si_addr looking at trapnr, so we can remove
these arguments from DO_ERROR_INFO() and simplify the source code. The
generated code is the same, __builtin_constant_p(trapnr) == T.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:27 +02:00
Oleg Nesterov
dff0796e53 x86/traps: Introduce do_error_trap()
Move the common code from DO_ERROR() and DO_ERROR_INFO() into the new
helper, do_error_trap(). This simplifies define's and shaves 527 bytes
from traps.o.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:27 +02:00
Oleg Nesterov
38cad57be9 x86/traps: Use SEND_SIG_PRIV instead of force_sig()
force_sig() is just force_sig_info(SEND_SIG_PRIV). Imho it should die,
we have too many ugly "send signal" helpers.

And do_trap() looks just ugly because it uses force_sig_info() or
force_sig() depending on info != NULL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:26 +02:00
Oleg Nesterov
5e1b05beec x86/traps: Make math_error() static
Trivial, make math_error() static.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:26 +02:00
Denys Vlasenko
1ea30fb645 uprobes/x86: Fix scratch register selection for rip-relative fixups
Before this patch, instructions such as div, mul, shifts with count
in CL, cmpxchg are mishandled.

This patch adds vex prefix handling. In particular, it avoids colliding
with register operand encoded in vex.vvvv field.

Since we need to avoid two possible register operands, the selection of
scratch register needs to be from at least three registers.

After looking through a lot of CPU docs, it looks like the safest choice
is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
BX by cmpxchg8b.

Test-case:

	#include <stdio.h>

	static const char *const pass[] = { "FAIL", "pass" };

	long two = 2;
	void test1(void)
	{
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			xor	%%edx,%%edx\n"
	"			lea	2(%%edx),%%eax\n"
	// We divide 2 by 2. Result (in eax) should be 1:
	"	probe1:		.globl	probe1\n"
	"			divl	two(%%rip)\n"
	// If we have a bug (eax mangled on entry) the result will be 2,
	// because eax gets restored by probe machinery.
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[ax == 1]
		);
	}

	long val2 = 0;
	void test2(void)
	{
		long old_val = val2;
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			mov	val2,%%eax\n"     // eax := val2
	"			lea	1(%%eax),%%edx\n" // edx := eax+1
	// eax is equal to val2. cmpxchg should store edx to val2:
	"	probe2:		.globl  probe2\n"
	"			cmpxchg %%edx,val2(%%rip)\n"
	// If we have a bug (eax mangled on entry), val2 will stay unchanged
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[val2 == old_val + 1]
		);
	}

	long val3[2] = {0,0};
	void test3(void)
	{
		long old_val = val3[0];
		long ax = 0, dx = 0;
		asm volatile("\n"
	"			mov	val3,%%eax\n"  // edx:eax := val3
	"			mov	val3+4,%%edx\n"
	"			mov	%%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
	"			mov	%%edx,%%ecx\n"
	"			add	$1,%%ebx\n"
	"			adc	$0,%%ecx\n"
	// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
	"	probe3:		.globl  probe3\n"
	"			cmpxchg8b val3(%%rip)\n"
	// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
	// If ecx:edx in mangled, val3 will get wrong value.
		: "=a" (ax), "=d" (dx) /*out*/
		: "0" (ax), "1" (dx) /*in*/
		: "cx", "bx", "memory" /*clobber*/
		);
		dprintf(2, "%s: %s\n", __func__,
			pass[val3[0] == old_val + 1 && val3[1] == 0]
		);
	}

	int main(int argc, char **argv)
	{
		test1();
		test2();
		test3();
		return 0;
	}

Before this change all tests fail if probe{1,2,3} are probed.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:25 +02:00
Denys Vlasenko
50204c6f6d uprobes/x86: Simplify rip-relative handling
It is possible to replace rip-relative addressing mode with addressing
mode of the same length: (reg+disp32). This eliminates the need to fix
up immediate and correct for changing instruction length.

And we can kill arch_uprobe->def.riprel_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-05-14 13:57:25 +02:00
Rob Herring
eafd370dfe Merge branch 'dt-bus-name' into for-next 2014-05-13 18:34:35 -05:00
Ville Syrjälä
36dfcea47a x86/gpu: Sprinkle const, __init and __initconst to stolen memory quirks
gen8_stolen_size() is missing __init, so add it.

Also all the intel_stolen_funcs structures can be marked
__initconst.

intel_stolen_ids[] can also be made const if we replace the
__initdata with __initconst.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13 14:13:23 +02:00
Damien Lespiau
3e3b2c3908 x86/gpu: Implement stolen memory size early quirk for CHV
CHV uses the same bits as SNB/VLV to code the Graphics Mode Select field
(GFX stolen memory size) with the addition of finer granularity modes:
4MB increments from 0x11 (8MB) to 0x1d.

Values strictly above 0x1d are either reserved or not supported.

v2: 4MB increments, not 8MB. 32MB has been omitted from the list of new
    values (Ville Syrjälä)

v3: Also correctly interpret GGMS (GTT Graphics Memory Size) (Ville
    Syrjälä)

v4: Don't assign a value that needs 20bits or more to a u16 (Rafael
    Barbalho)

[vsyrjala: v5: Split from i915 changes and add chv_stolen_funcs]

Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Tested-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-13 14:13:23 +02:00
H. Peter Anvin
7a5091d584 x86, rdrand: When nordrand is specified, disable RDSEED as well
One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.

Reported-by: Stephan Mueller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-11 20:25:20 -07:00
Ong Boon Leong
04725ad594 x86, iosf: Add PCI ID macros for better readability
Introduce PCI IDs macro for the list of supported product:
BayTrail & Quark X1000.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-5-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:57:35 -07:00
Ong Boon Leong
90916e048c x86, iosf: Add Quark X1000 PCI ID
Add PCI device ID, i.e. that of the Host Bridge,
for IOSF MBI driver.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-4-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:57:23 -07:00
David E. Box
6b8f0c8780 x86, iosf: Make IOSF driver modular and usable by more drivers
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-09 14:56:15 -07:00
Boris Ostrovsky
28b92e09e2 x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
	(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
	((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)

So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.

We need an explicit cast.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-05-09 08:45:52 -07:00
Peter Zijlstra
f80c5b39b8 sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG
Standardize the idle polling indicator to TIF_POLLING_NRFLAG such that
both TIF_NEED_RESCHED and TIF_POLLING_NRFLAG are in the same word.
This will allow us, using fetch_or(), to both set NEED_RESCHED and
check for POLLING_NRFLAG in a single operation and avoid pointless
wakeups.

Changing from the non-atomic thread_info::status flags to the atomic
thread_info::flags shouldn't be a big issue since most polling state
changes were followed/preceded by a full memory barrier anyway.

Also, fix up the apm_32 idle function, clearly that was forgotten in
the last conversion. The default idle state is !POLLING so just kill
the lot.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Link: http://lkml.kernel.org/n/tip-7yksmqtlv4nfowmlqr1rifoi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 09:16:56 +02:00
Feng Tang
62187910b0 x86/intel: Add quirk to disable HPET for the Baytrail platform
HPET on current Baytrail platform has accuracy problem to be
used as reliable clocksource/clockevent, so add a early quirk to
disable it.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
Feng Tang
f10f383d84 x86/hpet: Make boot_hpet_disable extern
HPET on some platform has accuracy problem. Making
"boot_hpet_disable" extern so that we can runtime disable
the HPET timer by using quirk to check the platform.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-08 08:15:34 +02:00
Ingo Molnar
37b16beaa9 Merge branch 'perf/urgent' into perf/core, to avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 13:39:22 +02:00
Yan, Zheng
a4b4f11b27 perf/x86/intel: Fix Silvermont's event constraints
Event 0x013c is not the same as fixed counter2, remove it from
Silvermont's event constraints.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 11:33:16 +02:00
Christian Gmeiner
aadca6fa40 x86/reboot: Add reboot quirk for Certec BPC600
Certec BPC600 needs reboot=pci to actually reboot.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07 11:22:10 +02:00
Andi Kleen
2605fc216f asmlinkage, x86: Add explicit __visible to arch/x86/*
As requested by Linus add explicit __visible to the asmlinkage users.
This marks all functions visible to assembler.

Tree sweep for arch/x86/*

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 16:07:44 -07:00
Andy Lutomirski
f40c330091 x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
This makes the 64-bit and x32 vdsos use the same mechanism as the
32-bit vdso.  Most of the churn is deleting all the old fixmap code.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:19:01 -07:00
Andy Lutomirski
6f121e548f x86, vdso: Reimplement vdso.so preparation in build-time C
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel.  Replace all of that with plain C code that
runs at build time.

All five vdso images now generate .c files that are compiled and
linked in to the kernel image.

This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be.  Everything
outside the loadable segment is dropped.  In particular, this causes
the section table and section name strings to be missing.  This
should be fine: real dynamic loaders don't load or inspect these
tables anyway.  The result is roughly equivalent to eu-strip's
--strip-sections option.

The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment.  Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page.  This happens whenever the load segment is just under a
multiple of PAGE_SIZE.

The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'.  This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall.  That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation.  Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work.  vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.

(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:51 -07:00
Andy Lutomirski
cfda7bb9ec x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
This code is used during CPU setup, and it isn't strictly speaking
related to the 32-bit vdso.  It's easier to understand how this
works when the code is closer to its callers.

This also lets syscall32_cpu_init be static, which might save some
trivial amount of kernel text.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-05 13:18:47 -07:00
H. Peter Anvin
34273f41d5 x86, espfix: Make it possible to disable 16-bit support
Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-05-04 12:27:37 -07:00
H. Peter Anvin
197725de65 x86, espfix: Make espfix64 a Kconfig option, fix UML
Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.

This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
2014-05-04 10:00:49 -07:00
Linus Torvalds
0384dcae2b Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "This udpate delivers:

   - A fix for dynamic interrupt allocation on x86 which is required to
     exclude the GSI interrupts from the dynamic allocatable range.

     This was detected with the newfangled tablet SoCs which have GPIOs
     and therefor allocate a range of interrupts.  The MSI allocations
     already excluded the GSI range, so we never noticed before.

   - The last missing set_irq_affinity() repair, which was delayed due
     to testing issues

   - A few bug fixes for the armada SoC interrupt controller

   - A memory allocation fix for the TI crossbar interrupt controller

   - A trivial kernel-doc warning fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: irq-crossbar: Not allocating enough memory
  irqchip: armanda: Sanitize set_irq_affinity()
  genirq: x86: Ensure that dynamic irq allocation does not conflict
  linux/interrupt.h: fix new kernel-doc warnings
  irqchip: armada-370-xp: Fix releasing of MSIs
  irqchip: armada-370-xp: implement the ->check_device() msi_chip operation
  irqchip: armada-370-xp: fix invalid cast of signed value into unsigned variable
2014-05-03 08:32:48 -07:00
Linus Torvalds
0845e11c2a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Two very small changes: one fix for the vSMP Foundation platform, and
  one to help LLVM not choke on options it doesn't understand (although
  it probably should)"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vsmp: Fix irq routing
  x86: LLVMLinux: Wrap -mno-80387 with cc-option
2014-05-02 14:04:52 -07:00
H. Peter Anvin
e1fe9ed8d2 x86, espfix: Move espfix definitions into a separate header file
Sparse warns that the percpu variables aren't declared before they are
defined.  Rather than hacking around it, move espfix definitions into
a proper header file.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-01 14:16:15 -07:00
H. Peter Anvin
246f2d2ee1 x86-32, espfix: Remove filter for espfix32 due to race
It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back.  Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30 14:14:49 -07:00
H. Peter Anvin
3891a04aaf x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer.  This
causes some 16-bit software to break, but it also leaks kernel state
to user space.  We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.

In checkin:

    b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels

we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.

This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart.  When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace.  The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.

(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)

Special thanks to:

- Andy Lutomirski, for the suggestion of using very small stack slots
  and copy (as opposed to map) the IRET frame there, and for the
  suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge
2014-04-30 14:14:28 -07:00
Oleg Nesterov
c90a695012 uprobes/x86: Simplify riprel_{pre,post}_xol() and make them similar
Ignoring the "correction" logic riprel_pre_xol() and riprel_post_xol()
are very similar but look quite differently.

1. Add the "UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX" check at the start
   of riprel_pre_xol(), like the same check in riprel_post_xol().

2. Add the trivial scratch_reg() helper which returns the address of
   scratch register pre_xol/post_xol need to change.

3. Change these functions to use the new helper and avoid copy-and-paste
   under if/else branches.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:41 +02:00
Oleg Nesterov
7f55e82bac uprobes/x86: Kill the "autask" arg of riprel_pre_xol()
default_pre_xol_op() passes &current->utask->autask to riprel_pre_xol()
and this is just ugly because it still needs to load current->utask to
read ->vaddr.

Remove this argument, change riprel_pre_xol() to use current->utask.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:41 +02:00
Oleg Nesterov
1475ee7fad uprobes/x86: Rename *riprel* helpers to make the naming consistent
handle_riprel_insn(), pre_xol_rip_insn() and handle_riprel_post_xol()
look confusing and inconsistent. Rename them into riprel_analyze(),
riprel_pre_xol(), and riprel_post_xol() respectively.

No changes in compiled code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-30 19:10:41 +02:00
Oleg Nesterov
83cd591485 uprobes/x86: Cleanup the usage of UPROBE_FIX_IP/UPROBE_FIX_CALL
Now that UPROBE_FIX_IP/UPROBE_FIX_CALL are mutually exclusive we can
use a single "fix_ip_or_call" enum instead of 2 fix_* booleans. This
way the logic looks more understandable and clean to me.

While at it, join "case 0xea" with other "ip is correct" ret/lret cases.
Also change default_post_xol_op() to use "else if" for the same reason.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:40 +02:00
Oleg Nesterov
1dc76e6eac uprobes/x86: Kill adjust_ret_addr(), simplify UPROBE_FIX_CALL logic
The only insn which could have both UPROBE_FIX_IP and UPROBE_FIX_CALL
was 0xe8 "call relative", and now it is handled by branch_xol_ops.

So we can change default_post_xol_op(UPROBE_FIX_CALL) to simply push
the address of next insn == utask->vaddr + insn.length, just we need
to record insn.length into the new auprobe->def.ilen member.

Note: if/when we teach branch_xol_ops to support jcxz/loopz we can
remove the "correction" logic, UPROBE_FIX_IP can use the same address.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:39 +02:00
Oleg Nesterov
2b82cadffc uprobes/x86: Introduce push_ret_address()
Extract the "push return address" code from branch_emulate_op() into
the new simple helper, push_ret_address(). It will have more users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2014-04-30 19:10:38 +02:00