x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
SS == 0 results in an invalid usermode state in which SS is apparently
equal to __USER_DS but causes #SS if used.
Work around the issue by setting SS to __KERNEL_DS in __switch_to(), thus
ensuring that SYSRET never happens with SS set to NULL.
This was exposed by a recent vDSO cleanup.
Fixes: e7d6eefaaa ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 61f01dd941
parent 1190944f4b
@@ -419,6 +419,34 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 		     task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
 		__switch_to_xtra(prev_p, next_p, tss);
 
+	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
+		/*
+		 * AMD CPUs have a misfeature: SYSRET sets the SS selector but
+		 * does not update the cached descriptor.  As a result, if we
+		 * do SYSRET while SS is NULL, we'll end up in user mode with
+		 * SS apparently equal to __USER_DS but actually unusable.
+		 *
+		 * The straightforward workaround would be to fix it up just
+		 * before SYSRET, but that would slow down the system call
+		 * fast paths.  Instead, we ensure that SS is never NULL in
+		 * system call context.  We do this by replacing NULL SS
+		 * selectors at every context switch.  SYSCALL sets up a valid
+		 * SS, so the only way to get NULL is to re-enter the kernel
+		 * from CPL 3 through an interrupt.  Since that can't happen
+		 * in the same task as a running syscall, we are guaranteed to
+		 * context switch between every interrupt vector entry and a
+		 * subsequent SYSRET.
+		 *
+		 * We read SS first because SS reads are much faster than
+		 * writes.  Out of caution, we force SS to __KERNEL_DS even if
+		 * it previously had a different non-NULL value.
+		 */
+		unsigned short ss_sel;
+		savesegment(ss, ss_sel);
+		if (ss_sel != __KERNEL_DS)
+			loadsegment(ss, __KERNEL_DS);
+	}
+
 	return prev_p;
 }