DISABLE_BRANCH_PROFILING turns off branch profiling (i.e. a
redefinition of 'if'). Branch profiling depends on a bunch of
kernel-internal symbols and generates extra output sections, none of
which are useful or functional in the vDSO.
It's currently turned off for vclock_gettime.c, but vgetcpu.c also
triggers branch profiling, so just turn it off in the makefile.
This fixes the build on some configurations: the vdso could contain
undefined symbols, and the fake section table overflowed due to
ftrace's added sections.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/bf1ec29e03b2bbc081f6dcaefa64db1c3a83fb21.1403642755.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently, vdso.so files are prepared and analyzed by a combination
of objcopy, nm, some linker script tricks, and some simple ELF
parsers in the kernel. Replace all of that with plain C code that
runs at build time.
All five vdso images now generate .c files that are compiled and
linked in to the kernel image.
This should cause only one userspace-visible change: the loaded vDSO
images are stripped more heavily than they used to be. Everything
outside the loadable segment is dropped. In particular, this causes
the section table and section name strings to be missing. This
should be fine: real dynamic loaders don't load or inspect these
tables anyway. The result is roughly equivalent to eu-strip's
--strip-sections option.
The purpose of this change is to enable the vvar and hpet mappings
to be moved to the page following the vDSO load segment. Currently,
it is possible for the section table to extend into the page after
the load segment, so, if we map it, it risks overlapping the vvar or
hpet page. This happens whenever the load segment is just under a
multiple of PAGE_SIZE.
The only real subtlety here is that the old code had a C file with
inline assembler that did 'call VDSO32_vsyscall' and a linker script
that defined 'VDSO32_vsyscall = __kernel_vsyscall'. This most
likely worked by accident: the linker script entry defines a symbol
associated with an address as opposed to an alias for the real
dynamic symbol __kernel_vsyscall. That caused ld to relocate the
reference at link time instead of leaving an interposable dynamic
relocation. Since the VDSO32_vsyscall hack is no longer needed, I
now use 'call __kernel_vsyscall', and I added -Bsymbolic to make it
work. vdso2c will generate an error and abort the build if the
resulting image contains any dynamic relocations, so we won't
silently generate bad vdso images.
(Dynamic relocations are a problem because nothing will even attempt
to relocate the vdso.)
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/2c4fcf45524162a34d87fdda1eb046b2a5cecee7.1399317206.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch add the VDSO time support for the IA32 Emulation Layer.
Due the nature of the kernel headers and the LP64 compiler where the
size of a long and a pointer differs against a 32 bit compiler, there
is some type hacking necessary for optimal performance.
The vsyscall_gtod_data struture must be a rearranged to serve 32- and
64-bit code access at the same time:
- The seqcount_t was replaced by an unsigned, this makes the
vsyscall_gtod_data intedepend of kernel configuration and internal functions.
- All kernel internal structures are replaced by fix size elements
which works for 32- and 64-bit access
- The inner struct clock was removed to pack the whole struct.
The "unsigned seq" would be handled by functions derivated from seqcount_t.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Link: http://lkml.kernel.org/r/1395094933-14252-11-git-send-email-stefani@seibold.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Linux as a guest on KVM hypervisor, the only user of the pvclock
vsyscall interface, does not require notification on task migration
because:
1. cpu ID number maps 1:1 to per-CPU pvclock time info.
2. per-CPU pvclock time info is updated if the
underlying CPU changes.
3. that version is increased whenever underlying CPU
changes.
Which is sufficient to guarantee nanoseconds counter
is calculated properly.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
vread_hpet() uses "0xf0" as the offset of the hpet counter. To
clarify the meaning of this code, it should use symbolic name,
HPET_COUNTER, instead.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Improve performance of time system calls when using Linux pvclock,
by reading time info from fixmap visible copy of pvclock data.
Originally from Jeremy Fitzhardinge.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We used to store the wall-to-monotonic offset and the realtime base.
It's faster to precompute the monotonic base.
This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC.
It's much more impressive for CLOCK_MONOTONIC_COARSE.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The update of the vdso data happens under xtime_lock, so adding a
nested lock is pointless. Just use a seqcount to sync the readers.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
When switching from a vsyscall capable to a non-vsyscall capable
clocksource, there was a small race, where the last vsyscall
gettimeofday before the switch might return a invalid time value
using the new non-vsyscall enabled clocksource values after the
switch is complete.
This is due to the vsyscall code checking the vclock_mode once
outside of the seqcount protected section. After it reads the
vclock mode, it doesn't re-check that the sampled clock data
that is obtained in the seqcount critical section still matches.
The fix is to sample vclock_mode inside the protected section,
and as long as it isn't VCLOCK_NONE, return the calculated
value. If it has changed and is now VCLOCK_NONE, fall back
to the syscall gettime calculation.
v2:
* Cleanup checks as suggested by tglx
* Also fix same issue present in gettimeofday path
CC: Andy Lutomirski <luto@amacapital.net>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Variables that are shared between the vdso and the kernel are
currently a bit of a mess. They are each defined with their own
magic, they are accessed differently in the kernel, the vsyscall page,
and the vdso, and one of them (vsyscall_clock) doesn't even really
exist.
This changes them all to use a common mechanism. All of them are
delcared in vvar.h with a fixed address (validated by the linker
script). In the kernel (as before), they look like ordinary
read-write variables. In the vsyscall page and the vdso, they are
accessed through a new macro VVAR, which gives read-only access.
The vdso is now loaded verbatim into memory without any fixups. As a
side bonus, access from the vdso is faster because a level of
indirection is removed.
While we're at it, pack jiffies and vgetcpu_mode into the same
cacheline.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
After talking with some application writers who want very fast, but not
fine-grained timestamps, I decided to try to implement new clock_ids
to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
which returns the time at the last tick. This is very fast as we don't
have to access any hardware (which can be very painful if you're using
something like the acpi_pm clocksource), and we can even use the vdso
clock_gettime() method to avoid the syscall. The only trade off is you
only get low-res tick grained time resolution.
This isn't a new idea, I know Ingo has a patch in the -rt tree that made
the vsyscall gettimeofday() return coarse grained time when the
vsyscall64 sysctrl was set to 2. However this affects all applications
on a system.
With this method, applications can choose the proper speed/granularity
trade-off for themselves.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: nikolag@ca.ibm.com
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: arjan@infradead.org
Cc: jonathan@jonmasters.org
LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
According to the gettimeofday(2) manual:
If either tv or tz is NULL, the corresponding structure is not
set or returned.
Since it is legal to give NULL as the tv argument, the code should make
sure tv is not NULL before trying to dereference it.
This issue manifests itself on x86_64 when vdso=0 is not on the kernel
command-line and libc uses the vDSO for gettimeofday() (e.g. glibc >=
2.7). A simple reproducer:
#include <stdio.h>
#include <sys/time.h>
int main(void)
{
struct timezone tz;
gettimeofday(NULL, &tz);
return 0;
}
See http://bugs.debian.org/466491 for more details.
[ Impact: fix gettimeofday(NULL, &tz) segfault ]
Signed-off-by: John Wright <john.wright@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: John Wright <john.wright@hp.com>
LKML-Reference: <1241037121-14805-1-git-send-email-john.wright@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: name change of unlikely tracer and profiler
Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bootup crash
the branch tracer missed arch/x86/vdso/vclock_gettime.c from
disabling tracing, which caused such bootup crashes:
[ 201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
instrumentation on a per file basis.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy's gcc 3.4 seems to be unable to inline a 8 byte memcpy. But the
vdso doesn't support external references. Copy the structure members
of struct timezone explicitely instead.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>