Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Thomas Gleixner:
 "The x86 entry, exception and interrupt code rework

  This all started about 6 month ago with the attempt to move the Posix
  CPU timer heavy lifting out of the timer interrupt code and just have
  lockless quick checks in that code path. Trivial 5 patches.

  This unearthed an inconsistency in the KVM handling of task work and
  the review requested to move all of this into generic code so other
  architectures can share.

  Valid request and solved with another 25 patches but those unearthed
  inconsistencies vs. RCU and instrumentation.

  Digging into this made it obvious that there are quite some
  inconsistencies vs. instrumentation in general. The int3 text poke
  handling in particular was completely unprotected and with the batched
  update of trace events even more likely to expose to endless int3
  recursion.

  In parallel the RCU implications of instrumenting fragile entry code
  came up in several discussions.

  The conclusion of the x86 maintainer team was to go all the way and
  make the protection against any form of instrumentation of fragile and
  dangerous code pathes enforcable and verifiable by tooling.

  A first batch of preparatory work hit mainline with commit
  d5f744f9a2 ("Pull x86 entry code updates from Thomas Gleixner")

  That (almost) full solution introduced a new code section
  '.noinstr.text' into which all code which needs to be protected from
  instrumentation of all sorts goes into. Any call into instrumentable
  code out of this section has to be annotated. objtool has support to
  validate this.

  Kprobes now excludes this section fully which also prevents BPF from
  fiddling with it and all 'noinstr' annotated functions also keep
  ftrace off. The section, kprobes and objtool changes are already
  merged.

  The major changes coming with this are:

    - Preparatory cleanups

    - Annotating of relevant functions to move them into the
      noinstr.text section or enforcing inlining by marking them
      __always_inline so the compiler cannot misplace or instrument
      them.

    - Splitting and simplifying the idtentry macro maze so that it is
      now clearly separated into simple exception entries and the more
      interesting ones which use interrupt stacks and have the paranoid
      handling vs. CR3 and GS.

    - Move quite some of the low level ASM functionality into C code:

       - enter_from and exit to user space handling. The ASM code now
         calls into C after doing the really necessary ASM handling and
         the return path goes back out without bells and whistels in
         ASM.

       - exception entry/exit got the equivivalent treatment

       - move all IRQ tracepoints from ASM to C so they can be placed as
         appropriate which is especially important for the int3
         recursion issue.

    - Consolidate the declaration and definition of entry points between
      32 and 64 bit. They share a common header and macros now.

    - Remove the extra device interrupt entry maze and just use the
      regular exception entry code.

    - All ASM entry points except NMI are now generated from the shared
      header file and the corresponding macros in the 32 and 64 bit
      entry ASM.

    - The C code entry points are consolidated as well with the help of
      DEFINE_IDTENTRY*() macros. This allows to ensure at one central
      point that all corresponding entry points share the same
      semantics. The actual function body for most entry points is in an
      instrumentable and sane state.

      There are special macros for the more sensitive entry points, e.g.
      INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
      They allow to put the whole entry instrumentation and RCU handling
      into safe places instead of the previous pray that it is correct
      approach.

    - The INT3 text poke handling is now completely isolated and the
      recursion issue banned. Aside of the entry rework this required
      other isolation work, e.g. the ability to force inline bsearch.

    - Prevent #DB on fragile entry code, entry relevant memory and
      disable it on NMI, #MC entry, which allowed to get rid of the
      nested #DB IST stack shifting hackery.

    - A few other cleanups and enhancements which have been made
      possible through this and already merged changes, e.g.
      consolidating and further restricting the IDT code so the IDT
      table becomes RO after init which removes yet another popular
      attack vector

    - About 680 lines of ASM maze are gone.

  There are a few open issues:

   - An escape out of the noinstr section in the MCE handler which needs
     some more thought but under the aspect that MCE is a complete
     trainwreck by design and the propability to survive it is low, this
     was not high on the priority list.

   - Paravirtualization

     When PV is enabled then objtool complains about a bunch of indirect
     calls out of the noinstr section. There are a few straight forward
     ways to fix this, but the other issues vs. general correctness were
     more pressing than parawitz.

   - KVM

     KVM is inconsistent as well. Patches have been posted, but they
     have not yet been commented on or picked up by the KVM folks.

   - IDLE

     Pretty much the same problems can be found in the low level idle
     code especially the parts where RCU stopped watching. This was
     beyond the scope of the more obvious and exposable problems and is
     on the todo list.

  The lesson learned from this brain melting exercise to morph the
  evolved code base into something which can be validated and understood
  is that once again the violation of the most important engineering
  principle "correctness first" has caused quite a few people to spend
  valuable time on problems which could have been avoided in the first
  place. The "features first" tinkering mindset really has to stop.

  With that I want to say thanks to everyone involved in contributing to
  this effort. Special thanks go to the following people (alphabetical
  order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
  Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
  Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
  Vitaly Kuznetsov, and Will Deacon"

* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
  x86/entry: Force rcu_irq_enter() when in idle task
  x86/entry: Make NMI use IDTENTRY_RAW
  x86/entry: Treat BUG/WARN as NMI-like entries
  x86/entry: Unbreak __irqentry_text_start/end magic
  x86/entry: __always_inline CR2 for noinstr
  lockdep: __always_inline more for noinstr
  x86/entry: Re-order #DB handler to avoid *SAN instrumentation
  x86/entry: __always_inline arch_atomic_* for noinstr
  x86/entry: __always_inline irqflags for noinstr
  x86/entry: __always_inline debugreg for noinstr
  x86/idt: Consolidate idt functionality
  x86/idt: Cleanup trap_init()
  x86/idt: Use proper constants for table size
  x86/idt: Add comments about early #PF handling
  x86/idt: Mark init only functions __init
  x86/entry: Rename trace_hardirqs_off_prepare()
  x86/entry: Clarify irq_{enter,exit}_rcu()
  x86/entry: Remove DBn stacks
  x86/entry: Remove debug IDT frobbing
  x86/entry: Optimize local_db_save() for virt
  ...
This commit is contained in:
Linus Torvalds
2020-06-13 10:05:47 -07:00
111 changed files with 2754 additions and 2441 deletions

View File

@@ -1011,28 +1011,29 @@ struct bp_patching_desc {
static struct bp_patching_desc *bp_desc;
static inline struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
static __always_inline
struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
{
struct bp_patching_desc *desc = READ_ONCE(*descp); /* rcu_dereference */
struct bp_patching_desc *desc = __READ_ONCE(*descp); /* rcu_dereference */
if (!desc || !atomic_inc_not_zero(&desc->refs))
if (!desc || !arch_atomic_inc_not_zero(&desc->refs))
return NULL;
return desc;
}
static inline void put_desc(struct bp_patching_desc *desc)
static __always_inline void put_desc(struct bp_patching_desc *desc)
{
smp_mb__before_atomic();
atomic_dec(&desc->refs);
arch_atomic_dec(&desc->refs);
}
static inline void *text_poke_addr(struct text_poke_loc *tp)
static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
{
return _stext + tp->rel_addr;
}
static int notrace patch_cmp(const void *key, const void *elt)
static __always_inline int patch_cmp(const void *key, const void *elt)
{
struct text_poke_loc *tp = (struct text_poke_loc *) elt;
@@ -1042,9 +1043,8 @@ static int notrace patch_cmp(const void *key, const void *elt)
return 1;
return 0;
}
NOKPROBE_SYMBOL(patch_cmp);
int notrace poke_int3_handler(struct pt_regs *regs)
int noinstr poke_int3_handler(struct pt_regs *regs)
{
struct bp_patching_desc *desc;
struct text_poke_loc *tp;
@@ -1077,9 +1077,9 @@ int notrace poke_int3_handler(struct pt_regs *regs)
* Skip the binary search if there is a single member in the vector.
*/
if (unlikely(desc->nr_entries > 1)) {
tp = bsearch(ip, desc->vec, desc->nr_entries,
sizeof(struct text_poke_loc),
patch_cmp);
tp = __inline_bsearch(ip, desc->vec, desc->nr_entries,
sizeof(struct text_poke_loc),
patch_cmp);
if (!tp)
goto out_put;
} else {
@@ -1118,7 +1118,6 @@ out_put:
put_desc(desc);
return ret;
}
NOKPROBE_SYMBOL(poke_int3_handler);
#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
static struct text_poke_loc tp_vec[TP_VEC_MAX];

View File

@@ -1088,23 +1088,14 @@ static void local_apic_timer_interrupt(void)
* [ if a single-CPU system runs an SMP kernel then we call the local
* interrupt as well. Thus we cannot inline the local irq ... ]
*/
__visible void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
{
struct pt_regs *old_regs = set_irq_regs(regs);
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
*
* update_process_times() expects us to have done irq_enter().
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
entering_ack_irq();
ack_APIC_irq();
trace_local_timer_entry(LOCAL_TIMER_VECTOR);
local_apic_timer_interrupt();
trace_local_timer_exit(LOCAL_TIMER_VECTOR);
exiting_irq();
set_irq_regs(old_regs);
}
@@ -2120,15 +2111,21 @@ void __init register_lapic_address(unsigned long address)
* Local APIC interrupts
*/
/*
* This interrupt should _never_ happen with our APIC/SMP architecture
/**
* spurious_interrupt - Catch all for interrupts raised on unused vectors
* @regs: Pointer to pt_regs on stack
* @vector: The vector number
*
* This is invoked from ASM entry code to catch all interrupts which
* trigger on an entry which is routed to the common_spurious idtentry
* point.
*
* Also called from sysvec_spurious_apic_interrupt().
*/
__visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_IRQ(spurious_interrupt)
{
u8 vector = ~regs->orig_ax;
u32 v;
entering_irq();
trace_spurious_apic_entry(vector);
inc_irq_stat(irq_spurious_count);
@@ -2158,13 +2155,17 @@ __visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
}
out:
trace_spurious_apic_exit(vector);
exiting_irq();
}
DEFINE_IDTENTRY_SYSVEC(sysvec_spurious_apic_interrupt)
{
__spurious_interrupt(regs, SPURIOUS_APIC_VECTOR);
}
/*
* This interrupt should never happen with our APIC/SMP architecture
*/
__visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_error_interrupt)
{
static const char * const error_interrupt_reason[] = {
"Send CS error", /* APIC Error Bit 0 */
@@ -2178,7 +2179,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
};
u32 v, i = 0;
entering_irq();
trace_error_apic_entry(ERROR_APIC_VECTOR);
/* First tickle the hardware, only then report what went on. -- REW */
@@ -2202,7 +2202,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
apic_printk(APIC_DEBUG, KERN_CONT "\n");
trace_error_apic_exit(ERROR_APIC_VECTOR);
exiting_irq();
}
/**

View File

@@ -115,7 +115,8 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
* denote it as spurious which is no harm as this is a rare event
* and interrupt handlers have to cope with spurious interrupts
* anyway. If the vector is unused, then it is marked so it won't
* trigger the 'No irq handler for vector' warning in do_IRQ().
* trigger the 'No irq handler for vector' warning in
* common_interrupt().
*
* This requires to hold vector lock to prevent concurrent updates to
* the affected vector.

View File

@@ -861,13 +861,13 @@ static void free_moved_vector(struct apic_chip_data *apicd)
apicd->move_in_progress = 0;
}
asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
{
struct hlist_head *clhead = this_cpu_ptr(&cleanup_list);
struct apic_chip_data *apicd;
struct hlist_node *tmp;
entering_ack_irq();
ack_APIC_irq();
/* Prevent vectors vanishing under us */
raw_spin_lock(&vector_lock);
@@ -892,7 +892,6 @@ asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
}
raw_spin_unlock(&vector_lock);
exiting_irq();
}
static void __send_cleanup_vector(struct apic_chip_data *apicd)

View File

@@ -57,9 +57,6 @@ int main(void)
BLANK();
#undef ENTRY
OFFSET(TSS_ist, tss_struct, x86_tss.ist);
DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
offsetof(struct cea_exception_stacks, DB1_stack));
BLANK();
#ifdef CONFIG_STACKPROTECTOR

View File

@@ -10,10 +10,10 @@
*/
#include <linux/interrupt.h>
#include <asm/acrn.h>
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/hypervisor.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
static uint32_t __init acrn_detect(void)
@@ -24,7 +24,7 @@ static uint32_t __init acrn_detect(void)
static void __init acrn_init_platform(void)
{
/* Setup the IDT for ACRN hypervisor callback */
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
}
static bool acrn_x2apic_available(void)
@@ -39,7 +39,7 @@ static bool acrn_x2apic_available(void)
static void (*acrn_intr_handler)(void);
__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_acrn_hv_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
@@ -50,13 +50,12 @@ __visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
* will block the interrupt whose vector is lower than
* HYPERVISOR_CALLBACK_VECTOR.
*/
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(irq_hv_callback_count);
if (acrn_intr_handler)
acrn_intr_handler();
exiting_irq();
set_irq_regs(old_regs);
}

View File

@@ -1706,25 +1706,6 @@ void syscall_init(void)
X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
}
DEFINE_PER_CPU(int, debug_stack_usage);
DEFINE_PER_CPU(u32, debug_idt_ctr);
void debug_stack_set_zero(void)
{
this_cpu_inc(debug_idt_ctr);
load_current_idt();
}
NOKPROBE_SYMBOL(debug_stack_set_zero);
void debug_stack_reset(void)
{
if (WARN_ON(!this_cpu_read(debug_idt_ctr)))
return;
if (this_cpu_dec_return(debug_idt_ctr) == 0)
load_current_idt();
}
NOKPROBE_SYMBOL(debug_stack_reset);
#else /* CONFIG_X86_64 */
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;

View File

@@ -907,14 +907,13 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
mce_log(&m);
}
asmlinkage __visible void __irq_entry smp_deferred_error_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
{
entering_irq();
trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
inc_irq_stat(irq_deferred_error_count);
deferred_error_int_vector();
trace_deferred_error_apic_exit(DEFERRED_ERROR_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}
/*

View File

@@ -130,7 +130,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
noinstr void mce_setup(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
m->cpu = m->extcpu = smp_processor_id();
@@ -140,12 +140,12 @@ void mce_setup(struct mce *m)
m->cpuid = cpuid_eax(1);
m->socketid = cpu_data(m->extcpu).phys_proc_id;
m->apicid = cpu_data(m->extcpu).initial_apicid;
rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
rdmsrl(MSR_PPIN, m->ppin);
m->ppin = __rdmsr(MSR_PPIN);
else if (this_cpu_has(X86_FEATURE_AMD_PPIN))
rdmsrl(MSR_AMD_PPIN, m->ppin);
m->ppin = __rdmsr(MSR_AMD_PPIN);
m->microcode = boot_cpu_data.microcode;
}
@@ -1100,13 +1100,15 @@ static void mce_clear_state(unsigned long *toclear)
* kdump kernel establishing a new #MC handler where a broadcasted MCE
* might not get handled properly.
*/
static bool __mc_check_crashing_cpu(int cpu)
static noinstr bool mce_check_crashing_cpu(void)
{
unsigned int cpu = smp_processor_id();
if (cpu_is_offline(cpu) ||
(crashing_cpu != -1 && crashing_cpu != cpu)) {
u64 mcgstatus;
mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
if (mcgstatus & MCG_STATUS_LMCES)
@@ -1114,7 +1116,7 @@ static bool __mc_check_crashing_cpu(int cpu)
}
if (mcgstatus & MCG_STATUS_RIPV) {
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
__wrmsr(MSR_IA32_MCG_STATUS, 0, 0);
return true;
}
}
@@ -1230,12 +1232,11 @@ static void kill_me_maybe(struct callback_head *cb)
* backing the user stack, tracing that reads the user stack will cause
* potentially infinite recursion.
*/
void noinstr do_machine_check(struct pt_regs *regs, long error_code)
void noinstr do_machine_check(struct pt_regs *regs)
{
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
struct mca_config *cfg = &mca_cfg;
int cpu = smp_processor_id();
struct mce m, *final;
char *msg = NULL;
int worst = 0;
@@ -1264,11 +1265,6 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
*/
int lmce = 1;
if (__mc_check_crashing_cpu(cpu))
return;
nmi_enter();
this_cpu_inc(mce_exception_count);
mce_gather_info(&m, regs);
@@ -1356,7 +1352,7 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
sync_core();
if (worst != MCE_AR_SEVERITY && !kill_it)
goto out_ist;
return;
/* Fault was in user mode and we need to take some action */
if ((m.cs & 3) == 3) {
@@ -1370,12 +1366,9 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
current->mce_kill_me.func = kill_me_now;
task_work_add(current, &current->mce_kill_me, true);
} else {
if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
mce_panic("Failed kernel mode recovery", &m, msg);
}
out_ist:
nmi_exit();
}
EXPORT_SYMBOL_GPL(do_machine_check);
@@ -1902,21 +1895,84 @@ bool filter_mce(struct mce *m)
}
/* Handle unconfigured int18 (should never happen) */
static void unexpected_machine_check(struct pt_regs *regs, long error_code)
static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
instrumentation_begin();
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
instrumentation_end();
}
/* Call the installed machine check handler for this CPU setup. */
void (*machine_check_vector)(struct pt_regs *, long error_code) =
unexpected_machine_check;
void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
dotraplinkage notrace void do_mce(struct pt_regs *regs, long error_code)
static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
{
machine_check_vector(regs, error_code);
/*
* Only required when from kernel mode. See
* mce_check_crashing_cpu() for details.
*/
if (machine_check_vector == do_machine_check &&
mce_check_crashing_cpu())
return;
nmi_enter();
/*
* The call targets are marked noinstr, but objtool can't figure
* that out because it's an indirect call. Annotate it.
*/
instrumentation_begin();
trace_hardirqs_off_finish();
machine_check_vector(regs);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
NOKPROBE_SYMBOL(do_mce);
static __always_inline void exc_machine_check_user(struct pt_regs *regs)
{
idtentry_enter_user(regs);
instrumentation_begin();
machine_check_vector(regs);
instrumentation_end();
idtentry_exit_user(regs);
}
#ifdef CONFIG_X86_64
/* MCE hit kernel mode */
DEFINE_IDTENTRY_MCE(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
exc_machine_check_kernel(regs);
local_db_restore(dr7);
}
/* The user mode variant. */
DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
exc_machine_check_user(regs);
local_db_restore(dr7);
}
#else
/* 32bit unified entry point */
DEFINE_IDTENTRY_MCE(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
if (user_mode(regs))
exc_machine_check_user(regs);
else
exc_machine_check_kernel(regs);
local_db_restore(dr7);
}
#endif
/*
* Called for each booted CPU to set up machine checks.

View File

@@ -146,9 +146,9 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
regs.cs = m->cs;
pregs = &regs;
}
/* in mcheck exeception handler, irq will be disabled */
/* do_machine_check() expects interrupts disabled -- at least */
local_irq_save(flags);
do_machine_check(pregs, 0);
do_machine_check(pregs);
local_irq_restore(flags);
m->finished = 0;
}

View File

@@ -9,7 +9,7 @@
#include <asm/mce.h>
/* Pointer to the installed machine check handler for this CPU setup. */
extern void (*machine_check_vector)(struct pt_regs *, long error_code);
extern void (*machine_check_vector)(struct pt_regs *);
enum severity_level {
MCE_NO_SEVERITY,

View File

@@ -21,12 +21,11 @@
int mce_p5_enabled __read_mostly;
/* Machine check handler for Pentium class Intel CPUs: */
static void pentium_machine_check(struct pt_regs *regs, long error_code)
static noinstr void pentium_machine_check(struct pt_regs *regs)
{
u32 loaddr, hi, lotype;
nmi_enter();
instrumentation_begin();
rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
@@ -39,8 +38,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
}
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
nmi_exit();
instrumentation_end();
}
/* Set up machine check reporting for processors with Intel style MCE: */

View File

@@ -614,14 +614,13 @@ static void unexpected_thermal_interrupt(void)
static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_thermal)
{
entering_irq();
trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}
/* Thermal monitoring depends on APIC, ACPI and clock modulation */

View File

@@ -21,12 +21,11 @@ static void default_threshold_interrupt(void)
void (*mce_threshold_vector)(void) = default_threshold_interrupt;
asmlinkage __visible void __irq_entry smp_threshold_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_threshold)
{
entering_irq();
trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}

View File

@@ -17,14 +17,12 @@
#include "internal.h"
/* Machine check handler for WinChip C6: */
static void winchip_machine_check(struct pt_regs *regs, long error_code)
static noinstr void winchip_machine_check(struct pt_regs *regs)
{
nmi_enter();
instrumentation_begin();
pr_emerg("CPU0: Machine Check Exception.\n");
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
nmi_exit();
instrumentation_end();
}
/* Set up machine check reporting on the Winchip C6 series */

View File

@@ -23,6 +23,7 @@
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/desc.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
#include <asm/i8259.h>
#include <asm/apic.h>
@@ -40,11 +41,10 @@ static void (*hv_stimer0_handler)(void);
static void (*hv_kexec_handler)(void);
static void (*hv_crash_handler)(struct pt_regs *regs);
__visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_irq();
inc_irq_stat(irq_hv_callback_count);
if (vmbus_handler)
vmbus_handler();
@@ -52,7 +52,6 @@ __visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs)
if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
ack_APIC_irq();
exiting_irq();
set_irq_regs(old_regs);
}
@@ -73,19 +72,16 @@ EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
* Routines to do per-architecture handling of stimer0
* interrupts when in Direct Mode
*/
__visible void __irq_entry hv_stimer0_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_stimer0)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_irq();
inc_irq_stat(hyperv_stimer0_count);
if (hv_stimer0_handler)
hv_stimer0_handler();
add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0);
ack_APIC_irq();
exiting_irq();
set_irq_regs(old_regs);
}
@@ -331,17 +327,19 @@ static void __init ms_hyperv_init_platform(void)
x86_platform.apic_post_init = hyperv_init;
hyperv_setup_mmu_ops();
/* Setup the IDT for hypervisor callback */
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, hyperv_callback_vector);
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);
/* Setup the IDT for reenlightenment notifications */
if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT)
if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT) {
alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
hyperv_reenlightenment_vector);
asm_sysvec_hyperv_reenlightenment);
}
/* Setup the IDT for stimer0 */
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE)
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
alloc_intr_gate(HYPERV_STIMER0_VECTOR,
hv_stimer0_callback_vector);
asm_sysvec_hyperv_stimer0);
}
# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;

View File

@@ -10,7 +10,6 @@
#include <asm/desc.h>
#include <asm/traps.h>
extern void double_fault(void);
#define ptr_ok(x) ((x) > PAGE_OFFSET && (x) < PAGE_OFFSET + MAXMEM)
#define TSS(x) this_cpu_read(cpu_tss_rw.x86_tss.x)
@@ -21,7 +20,7 @@ static void set_df_gdt_entry(unsigned int cpu);
* Called by double_fault with CR0.TS and EFLAGS.NT cleared. The CPU thinks
* we're running the doublefault task. Cannot return.
*/
asmlinkage notrace void __noreturn doublefault_shim(void)
asmlinkage noinstr void __noreturn doublefault_shim(void)
{
unsigned long cr2;
struct pt_regs regs;
@@ -40,7 +39,7 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
* Fill in pt_regs. A downside of doing this in C is that the unwinder
* won't see it (no ENCODE_FRAME_POINTER), so a nested stack dump
* won't successfully unwind to the source of the double fault.
* The main dump from do_double_fault() is fine, though, since it
* The main dump from exc_double_fault() is fine, though, since it
* uses these regs directly.
*
* If anyone ever cares, this could be moved to asm.
@@ -70,7 +69,7 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
regs.cx = TSS(cx);
regs.bx = TSS(bx);
do_double_fault(&regs, 0, cr2);
exc_double_fault(&regs, 0, cr2);
/*
* x86_32 does not save the original CR3 anywhere on a task switch.
@@ -84,7 +83,6 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
*/
panic("cannot return from double fault\n");
}
NOKPROBE_SYMBOL(doublefault_shim);
DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
.tss = {
@@ -95,7 +93,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
.ldt = 0,
.io_bitmap_base = IO_BITMAP_OFFSET_INVALID,
.ip = (unsigned long) double_fault,
.ip = (unsigned long) asm_exc_double_fault,
.flags = X86_EFLAGS_FIXED,
.es = __USER_DS,
.cs = __KERNEL_CS,

View File

@@ -22,15 +22,13 @@
static const char * const exception_stack_names[] = {
[ ESTACK_DF ] = "#DF",
[ ESTACK_NMI ] = "NMI",
[ ESTACK_DB2 ] = "#DB2",
[ ESTACK_DB1 ] = "#DB1",
[ ESTACK_DB ] = "#DB",
[ ESTACK_MCE ] = "#MC",
};
const char *stack_type_name(enum stack_type type)
{
BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
if (type == STACK_TYPE_IRQ)
return "IRQ";
@@ -79,7 +77,6 @@ static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
EPAGERANGE(DF),
EPAGERANGE(NMI),
EPAGERANGE(DB1),
EPAGERANGE(DB),
EPAGERANGE(MCE),
};
@@ -91,7 +88,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
struct pt_regs *regs;
unsigned int k;
BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
/*

View File

@@ -12,7 +12,7 @@
#include <asm/frame.h>
.code64
.section .entry.text, "ax"
.section .text, "ax"
#ifdef CONFIG_FRAME_POINTER
/* Save parent and function stack frames (rip and rbp) */

View File

@@ -29,15 +29,16 @@
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/asm-offsets.h>
#include <asm/paravirt.h>
#define GET_CR2_INTO(reg) GET_CR2_INTO_AX ; _ASM_MOV %_ASM_AX, reg
#else
#define INTERRUPT_RETURN iretq
#define GET_CR2_INTO(reg) _ASM_MOV %cr2, reg
#endif
/* we are not able to switch in one step to the final KERNEL ADDRESS SPACE
/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
* because we need identity-mapped pages.
*
*/
#define l4_index(x) (((x) >> 39) & 511)
#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

View File

@@ -32,6 +32,8 @@
#include <asm/processor.h>
#include <asm/debugreg.h>
#include <asm/user.h>
#include <asm/desc.h>
#include <asm/tlbflush.h>
/* Per cpu debug control register value */
DEFINE_PER_CPU(unsigned long, cpu_dr7);
@@ -97,6 +99,8 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
unsigned long *dr7;
int i;
lockdep_assert_irqs_disabled();
for (i = 0; i < HBP_NUM; i++) {
struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
@@ -115,6 +119,12 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
dr7 = this_cpu_ptr(&cpu_dr7);
*dr7 |= encode_dr7(i, info->len, info->type);
/*
* Ensure we first write cpu_dr7 before we set the DR7 register.
* This ensures an NMI never see cpu_dr7 0 when DR7 is not.
*/
barrier();
set_debugreg(*dr7, 7);
if (info->mask)
set_dr_addr_mask(info->mask, i);
@@ -134,9 +144,11 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
void arch_uninstall_hw_breakpoint(struct perf_event *bp)
{
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned long *dr7;
unsigned long dr7;
int i;
lockdep_assert_irqs_disabled();
for (i = 0; i < HBP_NUM; i++) {
struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
@@ -149,12 +161,20 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp)
if (WARN_ONCE(i == HBP_NUM, "Can't find any breakpoint slot"))
return;
dr7 = this_cpu_ptr(&cpu_dr7);
*dr7 &= ~__encode_dr7(i, info->len, info->type);
dr7 = this_cpu_read(cpu_dr7);
dr7 &= ~__encode_dr7(i, info->len, info->type);
set_debugreg(*dr7, 7);
set_debugreg(dr7, 7);
if (info->mask)
set_dr_addr_mask(0, i);
/*
* Ensure the write to cpu_dr7 is after we've set the DR7 register.
* This ensures an NMI never see cpu_dr7 0 when DR7 is not.
*/
barrier();
this_cpu_write(cpu_dr7, dr7);
}
static int arch_bp_generic_len(int x86_len)
@@ -227,10 +247,76 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
}
/*
* Checks whether the range [addr, end], overlaps the area [base, base + size).
*/
static inline bool within_area(unsigned long addr, unsigned long end,
unsigned long base, unsigned long size)
{
return end >= base && addr < (base + size);
}
/*
* Checks whether the range from addr to end, inclusive, overlaps the fixed
* mapped CPU entry area range or other ranges used for CPU entry.
*/
static inline bool within_cpu_entry(unsigned long addr, unsigned long end)
{
int cpu;
/* CPU entry erea is always used for CPU entry */
if (within_area(addr, end, CPU_ENTRY_AREA_BASE,
CPU_ENTRY_AREA_TOTAL_SIZE))
return true;
for_each_possible_cpu(cpu) {
/* The original rw GDT is being used after load_direct_gdt() */
if (within_area(addr, end, (unsigned long)get_cpu_gdt_rw(cpu),
GDT_SIZE))
return true;
/*
* cpu_tss_rw is not directly referenced by hardware, but
* cpu_tss_rw is also used in CPU entry code,
*/
if (within_area(addr, end,
(unsigned long)&per_cpu(cpu_tss_rw, cpu),
sizeof(struct tss_struct)))
return true;
/*
* cpu_tlbstate.user_pcid_flush_mask is used for CPU entry.
* If a data breakpoint on it, it will cause an unwanted #DB.
* Protect the full cpu_tlbstate structure to be sure.
*/
if (within_area(addr, end,
(unsigned long)&per_cpu(cpu_tlbstate, cpu),
sizeof(struct tlb_state)))
return true;
}
return false;
}
static int arch_build_bp_info(struct perf_event *bp,
const struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw)
{
unsigned long bp_end;
bp_end = attr->bp_addr + attr->bp_len - 1;
if (bp_end < attr->bp_addr)
return -EINVAL;
/*
* Prevent any breakpoint of any type that overlaps the CPU
* entry area and data. This protects the IST stacks and also
* reduces the chance that we ever find out what happens if
* there's a data breakpoint on the GDT, IDT, or TSS.
*/
if (within_cpu_entry(attr->bp_addr, bp_end))
return -EINVAL;
hw->address = attr->bp_addr;
hw->mask = 0;
@@ -439,7 +525,7 @@ static int hw_breakpoint_handler(struct die_args *args)
{
int i, cpu, rc = NOTIFY_STOP;
struct perf_event *bp;
unsigned long dr7, dr6;
unsigned long dr6;
unsigned long *dr6_p;
/* The DR6 value is pointed by args->err */
@@ -454,9 +540,6 @@ static int hw_breakpoint_handler(struct die_args *args)
if ((dr6 & DR_TRAP_BITS) == 0)
return NOTIFY_DONE;
get_debugreg(dr7, 7);
/* Disable breakpoints during exception handling */
set_debugreg(0UL, 7);
/*
* Assert that local interrupts are disabled
* Reset the DRn bits in the virtualized register value.
@@ -513,7 +596,6 @@ static int hw_breakpoint_handler(struct die_args *args)
(dr6 & (~DR_TRAP_BITS)))
rc = NOTIFY_DONE;
set_debugreg(dr7, 7);
put_cpu();
return rc;

View File

@@ -4,6 +4,8 @@
*/
#include <linux/interrupt.h>
#include <asm/cpu_entry_area.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
#include <asm/proto.h>
#include <asm/desc.h>
@@ -51,15 +53,23 @@ struct idt_data {
#define TSKG(_vector, _gdt) \
G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)
#define IDT_TABLE_SIZE (IDT_ENTRIES * sizeof(gate_desc))
static bool idt_setup_done __initdata;
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_idts[] = {
INTG(X86_TRAP_DB, debug),
SYSG(X86_TRAP_BP, int3),
INTG(X86_TRAP_DB, asm_exc_debug),
SYSG(X86_TRAP_BP, asm_exc_int3),
#ifdef CONFIG_X86_32
INTG(X86_TRAP_PF, page_fault),
/*
* Not possible on 64-bit. See idt_setup_early_pf() for details.
*/
INTG(X86_TRAP_PF, asm_exc_page_fault),
#endif
};
@@ -70,33 +80,33 @@ static const __initconst struct idt_data early_idts[] = {
* set up TSS.
*/
static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_DE, divide_error),
INTG(X86_TRAP_NMI, nmi),
INTG(X86_TRAP_BR, bounds),
INTG(X86_TRAP_UD, invalid_op),
INTG(X86_TRAP_NM, device_not_available),
INTG(X86_TRAP_OLD_MF, coprocessor_segment_overrun),
INTG(X86_TRAP_TS, invalid_TSS),
INTG(X86_TRAP_NP, segment_not_present),
INTG(X86_TRAP_SS, stack_segment),
INTG(X86_TRAP_GP, general_protection),
INTG(X86_TRAP_SPURIOUS, spurious_interrupt_bug),
INTG(X86_TRAP_MF, coprocessor_error),
INTG(X86_TRAP_AC, alignment_check),
INTG(X86_TRAP_XF, simd_coprocessor_error),
INTG(X86_TRAP_DE, asm_exc_divide_error),
INTG(X86_TRAP_NMI, asm_exc_nmi),
INTG(X86_TRAP_BR, asm_exc_bounds),
INTG(X86_TRAP_UD, asm_exc_invalid_op),
INTG(X86_TRAP_NM, asm_exc_device_not_available),
INTG(X86_TRAP_OLD_MF, asm_exc_coproc_segment_overrun),
INTG(X86_TRAP_TS, asm_exc_invalid_tss),
INTG(X86_TRAP_NP, asm_exc_segment_not_present),
INTG(X86_TRAP_SS, asm_exc_stack_segment),
INTG(X86_TRAP_GP, asm_exc_general_protection),
INTG(X86_TRAP_SPURIOUS, asm_exc_spurious_interrupt_bug),
INTG(X86_TRAP_MF, asm_exc_coprocessor_error),
INTG(X86_TRAP_AC, asm_exc_alignment_check),
INTG(X86_TRAP_XF, asm_exc_simd_coprocessor_error),
#ifdef CONFIG_X86_32
TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
#else
INTG(X86_TRAP_DF, double_fault),
INTG(X86_TRAP_DF, asm_exc_double_fault),
#endif
INTG(X86_TRAP_DB, debug),
INTG(X86_TRAP_DB, asm_exc_debug),
#ifdef CONFIG_X86_MCE
INTG(X86_TRAP_MC, &machine_check),
INTG(X86_TRAP_MC, asm_exc_machine_check),
#endif
SYSG(X86_TRAP_OF, overflow),
SYSG(X86_TRAP_OF, asm_exc_overflow),
#if defined(CONFIG_IA32_EMULATION)
SYSG(IA32_SYSCALL_VECTOR, entry_INT80_compat),
#elif defined(CONFIG_X86_32)
@@ -109,95 +119,63 @@ static const __initconst struct idt_data def_idts[] = {
*/
static const __initconst struct idt_data apic_idts[] = {
#ifdef CONFIG_SMP
INTG(RESCHEDULE_VECTOR, reschedule_interrupt),
INTG(CALL_FUNCTION_VECTOR, call_function_interrupt),
INTG(CALL_FUNCTION_SINGLE_VECTOR, call_function_single_interrupt),
INTG(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt),
INTG(REBOOT_VECTOR, reboot_interrupt),
INTG(RESCHEDULE_VECTOR, asm_sysvec_reschedule_ipi),
INTG(CALL_FUNCTION_VECTOR, asm_sysvec_call_function),
INTG(CALL_FUNCTION_SINGLE_VECTOR, asm_sysvec_call_function_single),
INTG(IRQ_MOVE_CLEANUP_VECTOR, asm_sysvec_irq_move_cleanup),
INTG(REBOOT_VECTOR, asm_sysvec_reboot),
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
INTG(THERMAL_APIC_VECTOR, thermal_interrupt),
INTG(THERMAL_APIC_VECTOR, asm_sysvec_thermal),
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
INTG(THRESHOLD_APIC_VECTOR, threshold_interrupt),
INTG(THRESHOLD_APIC_VECTOR, asm_sysvec_threshold),
#endif
#ifdef CONFIG_X86_MCE_AMD
INTG(DEFERRED_ERROR_VECTOR, deferred_error_interrupt),
INTG(DEFERRED_ERROR_VECTOR, asm_sysvec_deferred_error),
#endif
#ifdef CONFIG_X86_LOCAL_APIC
INTG(LOCAL_TIMER_VECTOR, apic_timer_interrupt),
INTG(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi),
INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt),
INTG(X86_PLATFORM_IPI_VECTOR, asm_sysvec_x86_platform_ipi),
# ifdef CONFIG_HAVE_KVM
INTG(POSTED_INTR_VECTOR, kvm_posted_intr_ipi),
INTG(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi),
INTG(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi),
INTG(POSTED_INTR_VECTOR, asm_sysvec_kvm_posted_intr_ipi),
INTG(POSTED_INTR_WAKEUP_VECTOR, asm_sysvec_kvm_posted_intr_wakeup_ipi),
INTG(POSTED_INTR_NESTED_VECTOR, asm_sysvec_kvm_posted_intr_nested_ipi),
# endif
# ifdef CONFIG_IRQ_WORK
INTG(IRQ_WORK_VECTOR, irq_work_interrupt),
INTG(IRQ_WORK_VECTOR, asm_sysvec_irq_work),
# endif
#ifdef CONFIG_X86_UV
INTG(UV_BAU_MESSAGE, uv_bau_message_intr1),
#endif
INTG(SPURIOUS_APIC_VECTOR, spurious_interrupt),
INTG(ERROR_APIC_VECTOR, error_interrupt),
# ifdef CONFIG_X86_UV
INTG(UV_BAU_MESSAGE, asm_sysvec_uv_bau_message),
# endif
INTG(SPURIOUS_APIC_VECTOR, asm_sysvec_spurious_apic_interrupt),
INTG(ERROR_APIC_VECTOR, asm_sysvec_error_interrupt),
#endif
};
#ifdef CONFIG_X86_64
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_pf_idts[] = {
INTG(X86_TRAP_PF, page_fault),
};
/*
* Override for the debug_idt. Same as the default, but with interrupt
* stack set to DEFAULT_STACK (0). Required for NMI trap handling.
*/
static const __initconst struct idt_data dbg_idts[] = {
INTG(X86_TRAP_DB, debug),
};
#endif
/* Must be page-aligned because the real IDT is used in a fixmap. */
gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
/* Must be page-aligned because the real IDT is used in the cpu entry area */
static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
struct desc_ptr idt_descr __ro_after_init = {
.size = (IDT_ENTRIES * 2 * sizeof(unsigned long)) - 1,
.size = IDT_TABLE_SIZE - 1,
.address = (unsigned long) idt_table,
};
#ifdef CONFIG_X86_64
/* No need to be aligned, but done to keep all IDTs defined the same way. */
gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
void load_current_idt(void)
{
lockdep_assert_irqs_disabled();
load_idt(&idt_descr);
}
/*
* The exceptions which use Interrupt stacks. They are setup after
* cpu_init() when the TSS has been initialized.
*/
static const __initconst struct idt_data ist_idts[] = {
ISTG(X86_TRAP_DB, debug, IST_INDEX_DB),
ISTG(X86_TRAP_NMI, nmi, IST_INDEX_NMI),
ISTG(X86_TRAP_DF, double_fault, IST_INDEX_DF),
#ifdef CONFIG_X86_MCE
ISTG(X86_TRAP_MC, &machine_check, IST_INDEX_MCE),
#endif
};
/*
* Override for the debug_idt. Same as the default, but with interrupt
* stack set to DEFAULT_STACK (0). Required for NMI trap handling.
*/
const struct desc_ptr debug_idt_descr = {
.size = IDT_ENTRIES * 16 - 1,
.address = (unsigned long) debug_idt_table,
};
#ifdef CONFIG_X86_F00F_BUG
bool idt_is_f00f_address(unsigned long address)
{
return ((address - idt_descr.address) >> 3) == 6;
}
#endif
static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
@@ -214,7 +192,7 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
#endif
}
static void
static __init void
idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sys)
{
gate_desc desc;
@@ -227,7 +205,7 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
}
}
static void set_intr_gate(unsigned int n, const void *addr)
static __init void set_intr_gate(unsigned int n, const void *addr)
{
struct idt_data data;
@@ -266,6 +244,27 @@ void __init idt_setup_traps(void)
}
#ifdef CONFIG_X86_64
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_pf_idts[] = {
INTG(X86_TRAP_PF, asm_exc_page_fault),
};
/*
* The exceptions which use Interrupt stacks. They are setup after
* cpu_init() when the TSS has been initialized.
*/
static const __initconst struct idt_data ist_idts[] = {
ISTG(X86_TRAP_DB, asm_exc_debug, IST_INDEX_DB),
ISTG(X86_TRAP_NMI, asm_exc_nmi, IST_INDEX_NMI),
ISTG(X86_TRAP_DF, asm_exc_double_fault, IST_INDEX_DF),
#ifdef CONFIG_X86_MCE
ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
#endif
};
/**
* idt_setup_early_pf - Initialize the idt table with early pagefault handler
*
@@ -273,8 +272,10 @@ void __init idt_setup_traps(void)
* cpu_init() is invoked and sets up TSS. The IST variant is installed
* after that.
*
* FIXME: Why is 32bit and 64bit installing the PF handler at different
* places in the early setup code?
* Note, that X86_64 cannot install the real #PF handler in
* idt_setup_early_traps() because the memory intialization needs the #PF
* handler from the early_idt_handler_array to initialize the early page
* tables.
*/
void __init idt_setup_early_pf(void)
{
@@ -289,18 +290,21 @@ void __init idt_setup_ist_traps(void)
{
idt_setup_from_table(idt_table, ist_idts, ARRAY_SIZE(ist_idts), true);
}
/**
* idt_setup_debugidt_traps - Initialize the debug idt table with debug traps
*/
void __init idt_setup_debugidt_traps(void)
{
memcpy(&debug_idt_table, &idt_table, IDT_ENTRIES * 16);
idt_setup_from_table(debug_idt_table, dbg_idts, ARRAY_SIZE(dbg_idts), false);
}
#endif
static void __init idt_map_in_cea(void)
{
/*
* Set the IDT descriptor to a fixed read-only location in the cpu
* entry area, so that the "sidt" instruction will not leak the
* location of the kernel, and to defend the IDT against arbitrary
* memory write vulnerabilities.
*/
cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
PAGE_KERNEL_RO);
idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
}
/**
* idt_setup_apic_and_irq_gates - Setup APIC/SMP and normal interrupt gates
*/
@@ -318,11 +322,23 @@ void __init idt_setup_apic_and_irq_gates(void)
#ifdef CONFIG_X86_LOCAL_APIC
for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
/*
* Don't set the non assigned system vectors in the
* system_vectors bitmap. Otherwise they show up in
* /proc/interrupts.
*/
entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
set_intr_gate(i, entry);
}
#endif
/* Map IDT into CPU entry area and reload it. */
idt_map_in_cea();
load_idt(&idt_descr);
/* Make the IDT table read only */
set_memory_ro((unsigned long)&idt_table, 1);
idt_setup_done = true;
}
/**
@@ -352,16 +368,14 @@ void idt_invalidate(void *addr)
load_idt(&idt);
}
void __init update_intr_gate(unsigned int n, const void *addr)
void __init alloc_intr_gate(unsigned int n, const void *addr)
{
if (WARN_ON_ONCE(!test_bit(n, system_vectors)))
if (WARN_ON(n < FIRST_SYSTEM_VECTOR))
return;
set_intr_gate(n, addr);
}
void alloc_intr_gate(unsigned int n, const void *addr)
{
BUG_ON(n < FIRST_SYSTEM_VECTOR);
if (!test_and_set_bit(n, system_vectors))
if (WARN_ON(idt_setup_done))
return;
if (!WARN_ON(test_and_set_bit(n, system_vectors)))
set_intr_gate(n, addr);
}

View File

@@ -13,12 +13,14 @@
#include <linux/export.h>
#include <linux/irq.h>
#include <asm/irq_stack.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
#include <asm/irq.h>
#include <asm/mce.h>
#include <asm/hw_irq.h>
#include <asm/desc.h>
#include <asm/traps.h>
#define CREATE_TRACE_POINTS
#include <asm/trace/irq_vectors.h>
@@ -26,9 +28,6 @@
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
DEFINE_PER_CPU(struct pt_regs *, irq_regs);
EXPORT_PER_CPU_SYMBOL(irq_regs);
atomic_t irq_err_count;
/*
@@ -224,35 +223,35 @@ u64 arch_irq_stat(void)
return sum;
}
static __always_inline void handle_irq(struct irq_desc *desc,
struct pt_regs *regs)
{
if (IS_ENABLED(CONFIG_X86_64))
run_on_irqstack_cond(desc->handle_irq, desc, regs);
else
__handle_irq(desc, regs);
}
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
* handlers).
* common_interrupt() handles all normal device IRQ's (the special SMP
* cross-CPU interrupts have their own entry points).
*/
__visible void __irq_entry do_IRQ(struct pt_regs *regs)
DEFINE_IDTENTRY_IRQ(common_interrupt)
{
struct pt_regs *old_regs = set_irq_regs(regs);
struct irq_desc * desc;
/* high bit used in ret_from_ code */
unsigned vector = ~regs->orig_ax;
struct irq_desc *desc;
entering_irq();
/* entering_irq() tells RCU that we're not quiescent. Check it. */
/* entry code tells RCU that we're not quiescent. Check it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
desc = __this_cpu_read(vector_irq[vector]);
if (likely(!IS_ERR_OR_NULL(desc))) {
if (IS_ENABLED(CONFIG_X86_32))
handle_irq(desc, regs);
else
generic_handle_irq_desc(desc);
handle_irq(desc, regs);
} else {
ack_APIC_irq();
if (desc == VECTOR_UNUSED) {
pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
__func__, smp_processor_id(),
vector);
} else {
@@ -260,8 +259,6 @@ __visible void __irq_entry do_IRQ(struct pt_regs *regs)
}
}
exiting_irq();
set_irq_regs(old_regs);
}
@@ -271,17 +268,16 @@ void (*x86_platform_ipi_callback)(void) = NULL;
/*
* Handler for X86_PLATFORM_IPI_VECTOR.
*/
__visible void __irq_entry smp_x86_platform_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_x86_platform_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
inc_irq_stat(x86_platform_ipis);
if (x86_platform_ipi_callback)
x86_platform_ipi_callback();
trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
exiting_irq();
set_irq_regs(old_regs);
}
#endif
@@ -302,41 +298,29 @@ EXPORT_SYMBOL_GPL(kvm_set_posted_intr_wakeup_handler);
/*
* Handler for POSTED_INTERRUPT_VECTOR.
*/
__visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_ipis);
exiting_irq();
set_irq_regs(old_regs);
}
/*
* Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
*/
__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_posted_intr_wakeup_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_wakeup_ipis);
kvm_posted_intr_wakeup_handler();
exiting_irq();
set_irq_regs(old_regs);
}
/*
* Handler for POSTED_INTERRUPT_NESTED_VECTOR.
*/
__visible void smp_kvm_posted_intr_nested_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_nested_ipis);
exiting_irq();
set_irq_regs(old_regs);
}
#endif

View File

@@ -148,7 +148,7 @@ void do_softirq_own_stack(void)
call_on_stack(__do_softirq, isp);
}
void handle_irq(struct irq_desc *desc, struct pt_regs *regs)
void __handle_irq(struct irq_desc *desc, struct pt_regs *regs)
{
int overflow = check_stack_overflow();

View File

@@ -20,6 +20,7 @@
#include <linux/sched/task_stack.h>
#include <asm/cpu_entry_area.h>
#include <asm/irq_stack.h>
#include <asm/io_apic.h>
#include <asm/apic.h>
@@ -70,3 +71,8 @@ int irq_init_percpu_irqstack(unsigned int cpu)
return 0;
return map_irq_stack(cpu);
}
void do_softirq_own_stack(void)
{
run_on_irqstack_cond(__do_softirq, NULL, NULL);
}

View File

@@ -9,18 +9,18 @@
#include <linux/irq_work.h>
#include <linux/hardirq.h>
#include <asm/apic.h>
#include <asm/idtentry.h>
#include <asm/trace/irq_vectors.h>
#include <linux/interrupt.h>
#ifdef CONFIG_X86_LOCAL_APIC
__visible void __irq_entry smp_irq_work_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_irq_work)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_irq_work_entry(IRQ_WORK_VECTOR);
inc_irq_stat(apic_irq_work_irqs);
irq_work_run();
trace_irq_work_exit(IRQ_WORK_VECTOR);
exiting_irq();
}
void arch_irq_work_raise(void)

View File

@@ -1073,13 +1073,6 @@ NOKPROBE_SYMBOL(kprobe_fault_handler);
int __init arch_populate_kprobe_blacklist(void)
{
int ret;
ret = kprobe_add_area_blacklist((unsigned long)__irqentry_text_start,
(unsigned long)__irqentry_text_end);
if (ret)
return ret;
return kprobe_add_area_blacklist((unsigned long)__entry_text_start,
(unsigned long)__entry_text_end);
}

View File

@@ -286,9 +286,7 @@ static int can_optimize(unsigned long paddr)
* stack handling and registers setup.
*/
if (((paddr >= (unsigned long)__entry_text_start) &&
(paddr < (unsigned long)__entry_text_end)) ||
((paddr >= (unsigned long)__irqentry_text_start) &&
(paddr < (unsigned long)__irqentry_text_end)))
(paddr < (unsigned long)__entry_text_end)))
return 0;
/* Check there is enough space for a relative jump. */

View File

@@ -217,7 +217,7 @@ again:
}
EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
u32 kvm_read_and_reset_apf_flags(void)
noinstr u32 kvm_read_and_reset_apf_flags(void)
{
u32 flags = 0;
@@ -229,11 +229,11 @@ u32 kvm_read_and_reset_apf_flags(void)
return flags;
}
EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
{
u32 reason = kvm_read_and_reset_apf_flags();
bool rcu_exit;
switch (reason) {
case KVM_PV_REASON_PAGE_NOT_PRESENT:
@@ -243,6 +243,9 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
return false;
}
rcu_exit = idtentry_enter_cond_rcu(regs);
instrumentation_begin();
/*
* If the host managed to inject an async #PF into an interrupt
* disabled region, then die hard as this is not going to end well
@@ -257,13 +260,13 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
/* Page is swapped out by the host. */
kvm_async_pf_task_wait_schedule(token);
} else {
rcu_irq_enter();
kvm_async_pf_task_wake(token);
rcu_irq_exit();
}
instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit);
return true;
}
NOKPROBE_SYMBOL(__kvm_handle_async_pf);
static void __init paravirt_ops_setup(void)
{

View File

@@ -303,7 +303,7 @@ NOKPROBE_SYMBOL(unknown_nmi_error);
static DEFINE_PER_CPU(bool, swallow_nmi);
static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
static void default_do_nmi(struct pt_regs *regs)
static noinstr void default_do_nmi(struct pt_regs *regs)
{
unsigned char reason = 0;
int handled;
@@ -329,6 +329,9 @@ static void default_do_nmi(struct pt_regs *regs)
__this_cpu_write(last_nmi_rip, regs->ip);
instrumentation_begin();
trace_hardirqs_off_finish();
handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
if (handled) {
@@ -342,7 +345,7 @@ static void default_do_nmi(struct pt_regs *regs)
*/
if (handled > 1)
__this_cpu_write(swallow_nmi, true);
return;
goto out;
}
/*
@@ -374,7 +377,7 @@ static void default_do_nmi(struct pt_regs *regs)
#endif
__this_cpu_add(nmi_stats.external, 1);
raw_spin_unlock(&nmi_reason_lock);
return;
goto out;
}
raw_spin_unlock(&nmi_reason_lock);
@@ -412,8 +415,12 @@ static void default_do_nmi(struct pt_regs *regs)
__this_cpu_add(nmi_stats.swallow, 1);
else
unknown_nmi_error(reason, regs);
out:
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
}
NOKPROBE_SYMBOL(default_do_nmi);
/*
* NMIs can page fault or hit breakpoints which will cause it to lose
@@ -467,44 +474,9 @@ enum nmi_states {
};
static DEFINE_PER_CPU(enum nmi_states, nmi_state);
static DEFINE_PER_CPU(unsigned long, nmi_cr2);
static DEFINE_PER_CPU(unsigned long, nmi_dr7);
#ifdef CONFIG_X86_64
/*
* In x86_64, we need to handle breakpoint -> NMI -> breakpoint. Without
* some care, the inner breakpoint will clobber the outer breakpoint's
* stack.
*
* If a breakpoint is being processed, and the debug stack is being
* used, if an NMI comes in and also hits a breakpoint, the stack
* pointer will be set to the same fixed address as the breakpoint that
* was interrupted, causing that stack to be corrupted. To handle this
* case, check if the stack that was interrupted is the debug stack, and
* if so, change the IDT so that new breakpoints will use the current
* stack and not switch to the fixed address. On return of the NMI,
* switch back to the original IDT.
*/
static DEFINE_PER_CPU(int, update_debug_stack);
static bool notrace is_debug_stack(unsigned long addr)
{
struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
unsigned long top = CEA_ESTACK_TOP(cs, DB);
unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
if (__this_cpu_read(debug_stack_usage))
return true;
/*
* Note, this covers the guard page between DB and DB1 as well to
* avoid two checks. But by all means @addr can never point into
* the guard page.
*/
return addr >= bot && addr < top;
}
NOKPROBE_SYMBOL(is_debug_stack);
#endif
dotraplinkage notrace void
do_nmi(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_RAW(exc_nmi)
{
if (IS_ENABLED(CONFIG_SMP) && cpu_is_offline(smp_processor_id()))
return;
@@ -517,18 +489,7 @@ do_nmi(struct pt_regs *regs, long error_code)
this_cpu_write(nmi_cr2, read_cr2());
nmi_restart:
#ifdef CONFIG_X86_64
/*
* If we interrupted a breakpoint, it is possible that
* the nmi handler will have breakpoints too. We need to
* change the IDT such that breakpoints that happen here
* continue to use the NMI stack.
*/
if (unlikely(is_debug_stack(regs->sp))) {
debug_stack_set_zero();
this_cpu_write(update_debug_stack, 1);
}
#endif
this_cpu_write(nmi_dr7, local_db_save());
nmi_enter();
@@ -539,12 +500,7 @@ nmi_restart:
nmi_exit();
#ifdef CONFIG_X86_64
if (unlikely(this_cpu_read(update_debug_stack))) {
debug_stack_reset();
this_cpu_write(update_debug_stack, 0);
}
#endif
local_db_restore(this_cpu_read(nmi_dr7));
if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
write_cr2(this_cpu_read(nmi_cr2));
@@ -554,7 +510,6 @@ nmi_restart:
if (user_mode(regs))
mds_user_clear_cpu_buffers();
}
NOKPROBE_SYMBOL(do_nmi);
void stop_nmi(void)
{

View File

@@ -27,6 +27,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/apic.h>
#include <asm/idtentry.h>
#include <asm/nmi.h>
#include <asm/mce.h>
#include <asm/trace/irq_vectors.h>
@@ -130,13 +131,11 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
/*
* this function calls the 'stop' function on all other CPUs in the system.
*/
asmlinkage __visible void smp_reboot_interrupt(void)
DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
{
ipi_entering_ack_irq();
ack_APIC_irq();
cpu_emergency_vmxoff();
stop_this_cpu(NULL);
irq_exit();
}
static int register_stop_handler(void)
@@ -221,47 +220,33 @@ static void native_stop_other_cpus(int wait)
/*
* Reschedule call back. KVM uses this interrupt to force a cpu out of
* guest mode
* guest mode.
*/
__visible void __irq_entry smp_reschedule_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi)
{
ack_APIC_irq();
trace_reschedule_entry(RESCHEDULE_VECTOR);
inc_irq_stat(irq_resched_count);
kvm_set_cpu_l1tf_flush_l1d();
if (trace_resched_ipi_enabled()) {
/*
* scheduler_ipi() might call irq_enter() as well, but
* nested calls are fine.
*/
irq_enter();
trace_reschedule_entry(RESCHEDULE_VECTOR);
scheduler_ipi();
trace_reschedule_exit(RESCHEDULE_VECTOR);
irq_exit();
return;
}
scheduler_ipi();
trace_reschedule_exit(RESCHEDULE_VECTOR);
}
__visible void __irq_entry smp_call_function_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_call_function)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_call_function_entry(CALL_FUNCTION_VECTOR);
inc_irq_stat(irq_call_count);
generic_smp_call_function_interrupt();
trace_call_function_exit(CALL_FUNCTION_VECTOR);
exiting_irq();
}
__visible void __irq_entry smp_call_function_single_interrupt(struct pt_regs *r)
DEFINE_IDTENTRY_SYSVEC(sysvec_call_function_single)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
inc_irq_stat(irq_call_count);
generic_smp_call_function_single_interrupt();
trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
exiting_irq();
}
static int __init nonmi_ipi_setup(char *str)

View File

@@ -25,20 +25,3 @@ void trace_pagefault_unreg(void)
{
static_branch_dec(&trace_pagefault_key);
}
#ifdef CONFIG_SMP
DEFINE_STATIC_KEY_FALSE(trace_resched_ipi_key);
int trace_resched_ipi_reg(void)
{
static_branch_inc(&trace_resched_ipi_key);
return 0;
}
void trace_resched_ipi_unreg(void)
{
static_branch_dec(&trace_resched_ipi_key);
}
#endif

View File

@@ -97,24 +97,6 @@ int is_valid_bugaddr(unsigned long addr)
return ud == INSN_UD0 || ud == INSN_UD2;
}
int fixup_bug(struct pt_regs *regs, int trapnr)
{
if (trapnr != X86_TRAP_UD)
return 0;
switch (report_bug(regs->ip, regs)) {
case BUG_TRAP_TYPE_NONE:
case BUG_TRAP_TYPE_BUG:
break;
case BUG_TRAP_TYPE_WARN:
regs->ip += LEN_UD2;
return 1;
}
return 0;
}
static nokprobe_inline int
do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
struct pt_regs *regs, long error_code)
@@ -145,7 +127,7 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
* process no chance to handle the signal and notice the
* kernel fault information, so that won't result in polluting
* the information about previously queued, but not yet
* delivered, faults. See also do_general_protection below.
* delivered, faults. See also exc_general_protection below.
*/
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
@@ -190,42 +172,120 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
/*
* WARN*()s end up here; fix them up before we call the
* notifier chain.
*/
if (!user_mode(regs) && fixup_bug(regs, trapnr))
return;
if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
NOTIFY_STOP) {
cond_local_irq_enable(regs);
do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
cond_local_irq_disable(regs);
}
}
#define IP ((void __user *)uprobe_get_trap_addr(regs))
#define DO_ERROR(trapnr, signr, sicode, addr, str, name) \
dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
{ \
do_error_trap(regs, error_code, str, trapnr, signr, sicode, addr); \
/*
* Posix requires to provide the address of the faulting instruction for
* SIGILL (#UD) and SIGFPE (#DE) in the si_addr member of siginfo_t.
*
* This address is usually regs->ip, but when an uprobe moved the code out
* of line then regs->ip points to the XOL code which would confuse
* anything which analyzes the fault address vs. the unmodified binary. If
* a trap happened in XOL code then uprobe maps regs->ip back to the
* original instruction address.
*/
static __always_inline void __user *error_get_trap_addr(struct pt_regs *regs)
{
return (void __user *)uprobe_get_trap_addr(regs);
}
DO_ERROR(X86_TRAP_DE, SIGFPE, FPE_INTDIV, IP, "divide error", divide_error)
DO_ERROR(X86_TRAP_OF, SIGSEGV, 0, NULL, "overflow", overflow)
DO_ERROR(X86_TRAP_UD, SIGILL, ILL_ILLOPN, IP, "invalid opcode", invalid_op)
DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overrun", coprocessor_segment_overrun)
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
#undef IP
DEFINE_IDTENTRY(exc_divide_error)
{
do_error_trap(regs, 0, "divide_error", X86_TRAP_DE, SIGFPE,
FPE_INTDIV, error_get_trap_addr(regs));
}
dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_overflow)
{
do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
}
#ifdef CONFIG_X86_F00F_BUG
void handle_invalid_op(struct pt_regs *regs)
#else
static inline void handle_invalid_op(struct pt_regs *regs)
#endif
{
do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
ILL_ILLOPN, error_get_trap_addr(regs));
}
DEFINE_IDTENTRY_RAW(exc_invalid_op)
{
bool rcu_exit;
/*
* Handle BUG/WARN like NMIs instead of like normal idtentries:
* if we bugged/warned in a bad RCU context, for example, the last
* thing we want is to BUG/WARN again in the idtentry code, ad
* infinitum.
*/
if (!user_mode(regs) && is_valid_bugaddr(regs->ip)) {
enum bug_trap_type type;
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
type = report_bug(regs->ip, regs);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
if (type == BUG_TRAP_TYPE_WARN) {
/* Skip the ud2. */
regs->ip += LEN_UD2;
return;
}
/*
* Else, if this was a BUG and report_bug returns or if this
* was just a normal #UD, we want to continue onward and
* crash.
*/
}
rcu_exit = idtentry_enter_cond_rcu(regs);
instrumentation_begin();
handle_invalid_op(regs);
instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit);
}
DEFINE_IDTENTRY(exc_coproc_segment_overrun)
{
do_error_trap(regs, 0, "coprocessor segment overrun",
X86_TRAP_OLD_MF, SIGFPE, 0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
{
do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
{
do_error_trap(regs, error_code, "segment not present", X86_TRAP_NP,
SIGBUS, 0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
{
do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
{
char *str = "alignment check";
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
return;
@@ -271,12 +331,19 @@ __visible void __noreturn handle_stack_overflow(const char *message,
* from the TSS. Returning is, in principle, okay, but changes to regs will
* be lost. If, for some reason, we need to return to a context with modified
* regs, the shim code could be adjusted to synchronize the registers.
*
* The 32bit #DF shim provides CR2 already as an argument. On 64bit it needs
* to be read before doing anything else.
*/
dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsigned long cr2)
DEFINE_IDTENTRY_DF(exc_double_fault)
{
static const char str[] = "double fault";
struct task_struct *tsk = current;
#ifdef CONFIG_VMAP_STACK
unsigned long address = read_cr2();
#endif
#ifdef CONFIG_X86_ESPFIX64
extern unsigned char native_irq_return_iret[];
@@ -299,6 +366,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
regs->ip == (unsigned long)native_irq_return_iret)
{
struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
unsigned long *p = (unsigned long *)regs->sp;
/*
* regs->sp points to the failing IRET frame on the
@@ -306,7 +374,11 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* in gpregs->ss through gpregs->ip.
*
*/
memmove(&gpregs->ip, (void *)regs->sp, 5*8);
gpregs->ip = p[0];
gpregs->cs = p[1];
gpregs->flags = p[2];
gpregs->sp = p[3];
gpregs->ss = p[4];
gpregs->orig_ax = 0; /* Missing (lost) #GP error code */
/*
@@ -320,7 +392,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* which is what the stub expects, given that the faulting
* RIP will be the IRET instruction.
*/
regs->ip = (unsigned long)general_protection;
regs->ip = (unsigned long)asm_exc_general_protection;
regs->sp = (unsigned long)&gpregs->orig_ax;
return;
@@ -328,6 +400,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
#endif
nmi_enter();
instrumentation_begin();
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
tsk->thread.error_code = error_code;
@@ -371,27 +444,31 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* stack even if the actual trigger for the double fault was
* something else.
*/
if ((unsigned long)task_stack_page(tsk) - 1 - cr2 < PAGE_SIZE)
handle_stack_overflow("kernel stack overflow (double-fault)", regs, cr2);
if ((unsigned long)task_stack_page(tsk) - 1 - address < PAGE_SIZE) {
handle_stack_overflow("kernel stack overflow (double-fault)",
regs, address);
}
#endif
pr_emerg("PANIC: double fault, error_code: 0x%lx\n", error_code);
die("double fault", regs, error_code);
panic("Machine halted.");
instrumentation_end();
}
dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_bounds)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
if (notify_die(DIE_TRAP, "bounds", regs, 0,
X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
return;
cond_local_irq_enable(regs);
if (!user_mode(regs))
die("bounds", regs, error_code);
die("bounds", regs, 0);
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, 0, 0, NULL);
cond_local_irq_disable(regs);
}
enum kernel_gp_hint {
@@ -438,7 +515,7 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
#define GPFSTR "general protection fault"
dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
{
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
enum kernel_gp_hint hint = GP_NO_HINT;
@@ -446,17 +523,17 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
unsigned long gp_addr;
int ret;
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
return;
goto exit;
}
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
local_irq_disable();
return;
}
@@ -468,12 +545,11 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
force_sig(SIGSEGV);
return;
goto exit;
}
if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
return;
goto exit;
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
@@ -485,11 +561,11 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
if (!preemptible() &&
kprobe_running() &&
kprobe_fault_handler(regs, X86_TRAP_GP))
return;
goto exit;
ret = notify_die(DIE_GPF, desc, regs, error_code, X86_TRAP_GP, SIGSEGV);
if (ret == NOTIFY_STOP)
return;
goto exit;
if (error_code)
snprintf(desc, sizeof(desc), "segment-related " GPFSTR);
@@ -511,47 +587,74 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
die_addr(desc, regs, error_code, gp_addr);
exit:
cond_local_irq_disable(regs);
}
NOKPROBE_SYMBOL(do_general_protection);
dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
static bool do_int3(struct pt_regs *regs)
{
if (poke_int3_handler(regs))
return;
/*
* Unlike any other non-IST entry, we can be called from pretty much
* any location in the kernel through kprobes -- text_poke() will most
* likely be handled by poke_int3_handler() above. This means this
* handler is effectively NMI-like.
*/
if (!user_mode(regs))
nmi_enter();
int res;
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
goto exit;
if (kgdb_ll_trap(DIE_INT3, "int3", regs, 0, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
return true;
#endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */
#ifdef CONFIG_KPROBES
if (kprobe_int3_handler(regs))
goto exit;
return true;
#endif
res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP);
if (notify_die(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
goto exit;
return res == NOTIFY_STOP;
}
static void do_int3_user(struct pt_regs *regs)
{
if (do_int3(regs))
return;
cond_local_irq_enable(regs);
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, 0, NULL);
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, 0, 0, NULL);
cond_local_irq_disable(regs);
exit:
if (!user_mode(regs))
nmi_exit();
}
NOKPROBE_SYMBOL(do_int3);
DEFINE_IDTENTRY_RAW(exc_int3)
{
/*
* poke_int3_handler() is completely self contained code; it does (and
* must) *NOT* call out to anything, lest it hits upon yet another
* INT3.
*/
if (poke_int3_handler(regs))
return;
/*
* idtentry_enter_user() uses static_branch_{,un}likely() and therefore
* can trigger INT3, hence poke_int3_handler() must be done
* before. If the entry came from kernel mode, then use nmi_enter()
* because the INT3 could have been hit in any context including
* NMI.
*/
if (user_mode(regs)) {
idtentry_enter_user(regs);
instrumentation_begin();
do_int3_user(regs);
instrumentation_end();
idtentry_exit_user(regs);
} else {
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
if (!do_int3(regs))
die("int3", regs, 0);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
}
#ifdef CONFIG_X86_64
/*
@@ -559,21 +662,20 @@ NOKPROBE_SYMBOL(do_int3);
* to switch to the normal thread stack if the interrupted code was in
* user mode. The actual stack switch is done in entry_64.S
*/
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
{
struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
if (regs != eregs)
*regs = *eregs;
return regs;
}
NOKPROBE_SYMBOL(sync_regs);
struct bad_iret_stack {
void *error_entry_ret;
struct pt_regs regs;
};
asmlinkage __visible notrace
asmlinkage __visible noinstr
struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
{
/*
@@ -584,19 +686,21 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
* just below the IRET frame) and we want to pretend that the
* exception came from the IRET target.
*/
struct bad_iret_stack *new_stack =
(struct bad_iret_stack *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
struct bad_iret_stack tmp, *new_stack =
(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
/* Copy the IRET target to the new stack. */
memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
/* Copy the IRET target to the temporary storage. */
memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
/* Copy the remainder of the stack from the current stack. */
memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
/* Update the entry stack */
memcpy(new_stack, &tmp, sizeof(tmp));
BUG_ON(!user_mode(&new_stack->regs));
return new_stack;
}
NOKPROBE_SYMBOL(fixup_bad_iret);
#endif
static bool is_sysenter_singlestep(struct pt_regs *regs)
@@ -622,6 +726,43 @@ static bool is_sysenter_singlestep(struct pt_regs *regs)
#endif
}
static __always_inline void debug_enter(unsigned long *dr6, unsigned long *dr7)
{
/*
* Disable breakpoints during exception handling; recursive exceptions
* are exceedingly 'fun'.
*
* Since this function is NOKPROBE, and that also applies to
* HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
* HW_BREAKPOINT_W on our stack)
*
* Entry text is excluded for HW_BP_X and cpu_entry_area, which
* includes the entry stack is excluded for everything.
*/
*dr7 = local_db_save();
/*
* The Intel SDM says:
*
* Certain debug exceptions may clear bits 0-3. The remaining
* contents of the DR6 register are never cleared by the
* processor. To avoid confusion in identifying debug
* exceptions, debug handlers should clear the register before
* returning to the interrupted task.
*
* Keep it simple: clear DR6 immediately.
*/
get_debugreg(*dr6, 6);
set_debugreg(0, 6);
/* Filter out all the reserved bits which are preset to 1 */
*dr6 &= ~DR6_RESERVED;
}
static __always_inline void debug_exit(unsigned long dr7)
{
local_db_restore(dr7);
}
/*
* Our handling of the processor debug registers is non-trivial.
* We do not clear them on entry and exit from the kernel. Therefore
@@ -646,86 +787,54 @@ static bool is_sysenter_singlestep(struct pt_regs *regs)
*
* May run on IST stack.
*/
dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
{
struct task_struct *tsk = current;
int user_icebp = 0;
unsigned long dr6;
bool user_icebp;
int si_code;
nmi_enter();
get_debugreg(dr6, 6);
/*
* The Intel SDM says:
*
* Certain debug exceptions may clear bits 0-3. The remaining
* contents of the DR6 register are never cleared by the
* processor. To avoid confusion in identifying debug
* exceptions, debug handlers should clear the register before
* returning to the interrupted task.
*
* Keep it simple: clear DR6 immediately.
*/
set_debugreg(0, 6);
/* Filter out all the reserved bits which are preset to 1 */
dr6 &= ~DR6_RESERVED;
/*
* The SDM says "The processor clears the BTF flag when it
* generates a debug exception." Clear TIF_BLOCKSTEP to keep
* TIF_BLOCKSTEP in sync with the hardware BTF flag.
*/
clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
clear_thread_flag(TIF_BLOCKSTEP);
if (unlikely(!user_mode(regs) && (dr6 & DR_STEP) &&
is_sysenter_singlestep(regs))) {
dr6 &= ~DR_STEP;
if (!dr6)
goto exit;
/*
* else we might have gotten a single-step trap and hit a
* watchpoint at the same time, in which case we should fall
* through and handle the watchpoint.
*/
}
/*
* If DR6 is zero, no point in trying to handle it. The kernel is
* not using INT1.
*/
if (!user && !dr6)
return;
/*
* If dr6 has no reason to give us about the origin of this trap,
* then it's very likely the result of an icebp/int01 trap.
* User wants a sigtrap for that.
*/
if (!dr6 && user_mode(regs))
user_icebp = 1;
user_icebp = user && !dr6;
/* Store the virtualized DR6 value */
tsk->thread.debugreg6 = dr6;
#ifdef CONFIG_KPROBES
if (kprobe_debug_handler(regs))
goto exit;
if (kprobe_debug_handler(regs)) {
return;
}
#endif
if (notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, error_code,
SIGTRAP) == NOTIFY_STOP)
goto exit;
/*
* Let others (NMI) know that the debug stack is in use
* as we may switch to the interrupt stack.
*/
debug_stack_usage_inc();
if (notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, 0,
SIGTRAP) == NOTIFY_STOP) {
return;
}
/* It's safe to allow irq's after DR6 has been saved */
cond_local_irq_enable(regs);
if (v8086_mode(regs)) {
handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
X86_TRAP_DB);
cond_local_irq_disable(regs);
debug_stack_usage_dec();
goto exit;
handle_vm86_trap((struct kernel_vm86_regs *) regs, 0,
X86_TRAP_DB);
goto out;
}
if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
@@ -739,23 +848,91 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs->flags &= ~X86_EFLAGS_TF;
}
si_code = get_si_code(tsk->thread.debugreg6);
if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS) || user_icebp)
send_sigtrap(regs, error_code, si_code);
cond_local_irq_disable(regs);
debug_stack_usage_dec();
send_sigtrap(regs, 0, si_code);
exit:
out:
cond_local_irq_disable(regs);
}
static __always_inline void exc_debug_kernel(struct pt_regs *regs,
unsigned long dr6)
{
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
/*
* Catch SYSENTER with TF set and clear DR_STEP. If this hit a
* watchpoint at the same time then that will still be handled.
*/
if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
dr6 &= ~DR_STEP;
handle_debug(regs, dr6, false);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
NOKPROBE_SYMBOL(do_debug);
static __always_inline void exc_debug_user(struct pt_regs *regs,
unsigned long dr6)
{
idtentry_enter_user(regs);
instrumentation_begin();
handle_debug(regs, dr6, true);
instrumentation_end();
idtentry_exit_user(regs);
}
#ifdef CONFIG_X86_64
/* IST stack entry */
DEFINE_IDTENTRY_DEBUG(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
exc_debug_kernel(regs, dr6);
debug_exit(dr7);
}
/* User entry, runs on regular task stack */
DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
exc_debug_user(regs, dr6);
debug_exit(dr7);
}
#else
/* 32 bit does not have separate entry points. */
DEFINE_IDTENTRY_DEBUG(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
if (user_mode(regs))
exc_debug_user(regs, dr6);
else
exc_debug_kernel(regs, dr6);
debug_exit(dr7);
}
#endif
/*
* Note that we play around with the 'TS' bit in an attempt to get
* the correct behaviour even in the presence of the asynchronous
* IRQ13 behaviour
*/
static void math_error(struct pt_regs *regs, int error_code, int trapnr)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
struct fpu *fpu = &task->thread.fpu;
@@ -766,16 +943,16 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
cond_local_irq_enable(regs);
if (!user_mode(regs)) {
if (fixup_exception(regs, trapnr, error_code, 0))
return;
if (fixup_exception(regs, trapnr, 0, 0))
goto exit;
task->thread.error_code = error_code;
task->thread.error_code = 0;
task->thread.trap_nr = trapnr;
if (notify_die(DIE_TRAP, str, regs, error_code,
trapnr, SIGFPE) != NOTIFY_STOP)
die(str, regs, error_code);
return;
if (notify_die(DIE_TRAP, str, regs, 0, trapnr,
SIGFPE) != NOTIFY_STOP)
die(str, regs, 0);
goto exit;
}
/*
@@ -784,32 +961,37 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
fpu__save(fpu);
task->thread.trap_nr = trapnr;
task->thread.error_code = error_code;
task->thread.error_code = 0;
si_code = fpu__exception_code(fpu, trapnr);
/* Retry when we get spurious exceptions: */
if (!si_code)
return;
goto exit;
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs));
exit:
cond_local_irq_disable(regs);
}
dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_coprocessor_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
math_error(regs, error_code, X86_TRAP_MF);
math_error(regs, X86_TRAP_MF);
}
dotraplinkage void
do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_simd_coprocessor_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
math_error(regs, error_code, X86_TRAP_XF);
if (IS_ENABLED(CONFIG_X86_INVD_BUG)) {
/* AMD 486 bug: INVD in CPL 0 raises #XF instead of #GP */
if (!static_cpu_has(X86_FEATURE_XMM)) {
__exc_general_protection(regs, 0);
return;
}
}
math_error(regs, X86_TRAP_XF);
}
dotraplinkage void
do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
{
/*
* This addresses a Pentium Pro Erratum:
@@ -832,13 +1014,10 @@ do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
*/
}
dotraplinkage void
do_device_not_available(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_device_not_available)
{
unsigned long cr0 = read_cr0();
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
struct math_emu_info info = { };
@@ -847,6 +1026,8 @@ do_device_not_available(struct pt_regs *regs, long error_code)
info.regs = regs;
math_emulate(&info);
cond_local_irq_disable(regs);
return;
}
#endif
@@ -861,22 +1042,20 @@ do_device_not_available(struct pt_regs *regs, long error_code)
* to kill the task than getting stuck in a never-ending
* loop of #NM faults.
*/
die("unexpected #NM exception", regs, error_code);
die("unexpected #NM exception", regs, 0);
}
}
NOKPROBE_SYMBOL(do_device_not_available);
#ifdef CONFIG_X86_32
dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_SW(iret_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
local_irq_enable();
if (notify_die(DIE_TRAP, "iret exception", regs, error_code,
if (notify_die(DIE_TRAP, "iret exception", regs, 0,
X86_TRAP_IRET, SIGILL) != NOTIFY_STOP) {
do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, error_code,
do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, 0,
ILL_BADSTK, (void __user *)NULL);
}
local_irq_disable();
}
#endif
@@ -887,21 +1066,10 @@ void __init trap_init(void)
idt_setup_traps();
/*
* Set the IDT descriptor to a fixed read-only location, so that the
* "sidt" instruction will not leak the location of the kernel, and
* to defend the IDT against arbitrary memory write vulnerabilities.
* It will be reloaded in cpu_init() */
cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
PAGE_KERNEL_RO);
idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
/*
* Should be a barrier for any external CPU state:
*/
cpu_init();
idt_setup_ist_traps();
idt_setup_debugidt_traps();
}

View File

@@ -74,13 +74,7 @@ static bool in_entry_code(unsigned long ip)
{
char *addr = (char *)ip;
if (addr >= __entry_text_start && addr < __entry_text_end)
return true;
if (addr >= __irqentry_text_start && addr < __irqentry_text_end)
return true;
return false;
return addr >= __entry_text_start && addr < __entry_text_end;
}
static inline unsigned long *last_frame(struct unwind_state *state)

View File

@@ -134,7 +134,6 @@ SECTIONS
KPROBES_TEXT
ALIGN_ENTRY_TEXT_BEGIN
ENTRY_TEXT
IRQENTRY_TEXT
ALIGN_ENTRY_TEXT_END
SOFTIRQENTRY_TEXT
*(.fixup)