There's no good reason for it to be a macro, and x86_64 will want to
use it, so it should be in a header.
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
We currently pretend that IST context is like standard exception
context, but this is incorrect. IST entries from userspace are like
standard exceptions except that they use per-cpu stacks, so they are
atomic. IST entries from kernel space are like NMIs from RCU's
perspective -- they are not quiescent states even if they
interrupted the kernel during a quiescent state.
Add and use ist_enter and ist_exit to track IST context. Even
though x86_32 has no IST stacks, we track these interrupts the same
way.
This fixes two issues:
- Scheduling from an IST interrupt handler will now warn. It would
previously appear to work as long as we got lucky and nothing
overwrote the stack frame. (I don't know of any bugs in this
that would trigger the warning, but it's good to be on the safe
side.)
- RCU handling in IST context was dangerous. As far as I know,
only machine checks were likely to trigger this, but it's good to
be on the safe side.
Note that the machine check handlers appears to have been missing
any context tracking at all before this patch.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
This causes all non-NMI, non-double-fault kernel entries from
userspace to run on the normal kernel stack. Double-fault is
exempt to minimize confusion if we double-fault directly from
userspace due to a bad kernel stack.
This is, suprisingly, simpler and shorter than the current code. It
removes the IMO rather frightening paranoid_userspace path, and it
make sync_regs much simpler.
There is no risk of stack overflow due to this change -- the kernel
stack that we switch to is empty.
This will also enable us to create non-atomic sections within
machine checks from userspace, which will simplify memory failure
handling. It will also allow the upcoming fsgsbase code to be
simplified, because it doesn't need to worry about usergs when
scheduling in paranoid_exit, as that code no longer exists.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
This commit introduces code for the live patching core. It implements
an ftrace-based mechanism and kernel interface for doing live patching
of kernel and kernel module functions.
It represents the greatest common functionality set between kpatch and
kgraft and can accept patches built using either method.
This first version does not implement any consistency mechanism that
ensures that old and new code do not run together. In practice, ~90% of
CVEs are safe to apply in this way, since they simply add a conditional
check. However, any function change that can not execute safely with
the old version of the function can _not_ be safely applied in this
version.
[ jkosina@suse.cz: due to the number of contributions that got folded into
this original patch from Seth Jennings, add SUSE's copyright as well, as
discussed via e-mail ]
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull x86 apic updates from Thomas Gleixner:
"After stopping the full x86/apic branch, I took some time to go
through the first block of patches again, which are mostly cleanups
and preparatory work for the irqdomain conversion and ioapic hotplug
support.
Unfortunaly one of the real problematic commits was right at the
beginning, so I rebased this portion of the pending patches without
the offenders.
It would be great to get this into 3.19. That makes reworking the
problematic parts simpler. The usual tip testing did not unearth any
issues and it is fully bisectible now.
I'm pretty confident that this wont affect the calmness of the xmas
season.
Changes:
- Split the convoluted io_apic.c code into domain specific parts
(vector, ioapic, msi, htirq)
- Introduce proper helper functions to retrieve irq specific data
instead of open coded dereferencing of pointers
- Preparatory work for ioapic hotplug and irqdomain conversion
- Removal of the non functional pci-ioapic driver
- Removal of unused irq entry stubs
- Make native_smp_prepare_cpus() preemtible to avoid GFP_ATOMIC
allocations for everything which is called from there.
- Small cleanups and fixes"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
iommu/amd: Use helpers to access irq_cfg data structure associated with IRQ
iommu/vt-d: Use helpers to access irq_cfg data structure associated with IRQ
x86: irq_remapping: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Use helpers to access irq_cfg data structure associated with IRQ
x86, irq: Make MSI and HT_IRQ indepenent of X86_IO_APIC
x86, irq: Move IRQ initialization routines from io_apic.c into vector.c
x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h
x86, irq: Move HT IRQ related code from io_apic.c into htirq.c
x86, irq: Move PCI MSI related code from io_apic.c into msi.c
x86, irq: Replace printk(KERN_LVL) with pr_lvl() utilities
x86, irq: Make UP version of irq_complete_move() an inline stub
x86, irq: Move local APIC related code from io_apic.c into vector.c
x86, irq: Introduce helpers to access struct irq_cfg
x86, irq: Protect __clear_irq_vector() with vector_lock
x86, irq: Rename local APIC related functions in io_apic.c as apic_xxx()
x86, irq: Refine hw_irq.h to prepare for irqdomain support
x86, irq: Convert irq_2_pin list to generic list
x86, irq: Kill useless parameter 'irq_attr' of IO_APIC_get_PCI_irq_vector()
x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI
x86, irq: Introduce helper to check whether an IOAPIC has been registered
...
Pull x86 MPX fixes from Thomas Gleixner:
"Three updates for the new MPX infrastructure:
- Use the proper error check in the trap handler
- Add a proper config option for it
- Bring documentation up to date"
* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mpx: Give MPX a real config option prompt
x86, mpx: Update documentation
x86_64/traps: Fix always true condition
Pull x86 fix from Ingo Molnar:
"This contains a single TLS ABI validation fix from Andy Lutomirski"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Don't validate lm in set_thread_area() after all
Pull perf fixes and cleanups from Ingo Molnar:
"A kernel fix plus mostly tooling fixes, but also some tooling
restructuring and cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
perf: Fix building warning on ARM 32
perf symbols: Fix use after free in filename__read_build_id
perf evlist: Use roundup_pow_of_two
tools: Adopt roundup_pow_of_two
perf tools: Make the mmap length autotuning more robust
tools: Adopt rounddown_pow_of_two and deps
tools: Adopt fls_long and deps
tools: Move bitops.h from tools/perf/util to tools/
tools: Introduce asm-generic/bitops.h
tools lib: Move asm-generic/bitops/find.h code to tools/include and tools/lib
tools: Whitespace prep patches for moving bitops.h
tools: Move code originally from asm-generic/atomic.h into tools/include/asm-generic/
tools: Move code originally from linux/log2.h to tools/include/linux/
tools: Move __ffs implementation to tools/include/asm-generic/bitops/__ffs.h
perf evlist: Do not use hard coded value for a mmap_pages default
perf trace: Let the perf_evlist__mmap autosize the number of pages to use
perf evlist: Improve the strerror_mmap method
perf evlist: Clarify sterror_mmap variable names
perf evlist: Fixup brown paper bag on "hint" for --mmap-pages cmdline arg
perf trace: Provide a better explanation when mmap fails
...
Pull more ACPI and power management updates from Rafael Wysocki:
"These are regression fixes (leds-gpio, ACPI backlight driver,
operating performance points library, ACPI device enumeration
messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device
PM), some cleanups in the operating performance points (OPP)
framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of
minor intel_pstate driver changes, a new MAINTAINERS entry for it and
an ACPI fan driver change needed for better support of thermal
management in user space.
Specifics:
- Fix a regression in leds-gpio introduced by a recent commit that
inadvertently changed the name of one of the properties used by the
driver (Fabio Estevam).
- Fix a regression in the ACPI backlight driver introduced by a
recent fix that missed one special case that had to be taken into
account (Aaron Lu).
- Drop the level of some new kernel messages from the ACPI core
introduced by a recent commit to KERN_DEBUG which they should have
used from the start and drop some other unuseful KERN_ERR messages
printed by ACPI (Rafael J Wysocki).
- Revert an incorrect commit modifying the cpupower tool (Prarit
Bhargava).
- Fix two regressions introduced by recent commits in the OPP library
and clean up some existing minor issues in that code (Viresh
Kumar).
- Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the
tree (or drop it where that can be done) in order to make it
possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf
Hansson, Ludovic Desroches).
There will be one more "CONFIG_PM_RUNTIME removal" batch after this
one, because some new uses of it have been introduced during the
current merge window, but that should be sufficient to finally get
rid of it.
- Make the ACPI EC driver more robust against race conditions related
to GPE handler installation failures (Lv Zheng).
- Prevent the ACPI device PM core code from attempting to disable
GPEs that it has not enabled which confuses ACPICA and makes it
report errors unnecessarily (Rafael J Wysocki).
- Add a "force" command line switch to the intel_pstate driver to
make it possible to override the blacklisting of some systems in
that driver if needed (Ethan Zhao).
- Improve intel_pstate code documentation and add a MAINTAINERS entry
for it (Kristen Carlson Accardi).
- Make the ACPI fan driver create cooling device interfaces witn
names that reflect the IDs of the ACPI device objects they are
associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B").
That's necessary for user space thermal management tools to be able
to connect the fans with the parts of the system they are supposed
to be cooling properly. From Srinivas Pandruvada"
* tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits)
MAINTAINERS: add entry for intel_pstate
ACPI / video: update the skip case for acpi_video_device_in_dod()
power / PM: Eliminate CONFIG_PM_RUNTIME
NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
ACPI / EC: Fix unexpected ec_remove_handlers() invocations
Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()"
tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c
PM: Remove the SET_PM_RUNTIME_PM_OPS() macro
mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro
PM / Kconfig: Replace PM_RUNTIME with PM in dependencies
ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM
ACPI / PM: Do not disable wakeup GPEs that have not been enabled
ACPI / utils: Drop error messages from acpi_evaluate_reference()
...
Pull KVM update from Paolo Bonzini:
"3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-
assisted virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken
because the (non-KVM) XSAVES patches inadvertently changed the KVM
userspace ABI whenever XSAVES was enabled; hence, this part is
going to stable. Guest support is just a matter of exposing the
feature and CPUID leaves support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
KVM: move APIC types to arch/x86/
KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
KVM: PPC: Book3S HV: Improve H_CONFER implementation
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
KVM: PPC: Book3S HV: Remove code for PPC970 processors
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
arch: powerpc: kvm: book3s_pr.c: Remove unused function
arch: powerpc: kvm: book3s.c: Remove some unused functions
arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
KVM: PPC: Book3S HV: ptes are big endian
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
KVM: PPC: Book3S HV: Fix KSM memory corruption
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
KVM: PPC: Book3S HV: Fix computation of tlbie operand
KVM: PPC: Book3S HV: Add missing HPTE unlock
KVM: PPC: BookE: Improve irq inject tracepoint
arm/arm64: KVM: Require in-kernel vgic for the arch timers
...
It turns out that there's a lurking ABI issue. GCC, when
compiling this in a 32-bit program:
struct user_desc desc = {
.entry_number = idx,
.base_addr = base,
.limit = 0xfffff,
.seg_32bit = 1,
.contents = 0, /* Data, grow-up */
.read_exec_only = 0,
.limit_in_pages = 1,
.seg_not_present = 0,
.useable = 0,
};
will leave .lm uninitialized. This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.
Revert the .lm check in set_thread_area(). The value never did
anything in the first place.
Fixes: 0e58af4e1d ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # Only if 0e58af4e1d is backported
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
first_system_vector will never end up being higher than
LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
0xef...0xff is pointlessly reducing code density. Deal with this at
build time already.
Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
simplify (and hence make easier to read and more consistent with the
change done here) some #if-s in arch/x86/kernel/irqinit.c.
While we could further improve the packing of the IRQ entry stubs (the
four ones now left in the last set could be fit into the four padding
bytes each of the final four sets have) this doesn't seem to provide
any real benefit: Both irq_entries_start and common_interrupt getting
cache line aligned, eliminating the 30th set would just produce 32
bytes of padding between the 29th and common_interrupt.
[ tglx: Folded lguest fix from Dan Carpenter ]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: lguest@lists.ozlabs.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull drm updates from Dave Airlie:
"Highlights:
- AMD KFD driver merge
This is the AMD HSA interface for exposing a lowlevel interface for
GPGPU use. They have an open source userspace built on top of this
interface, and the code looks as good as it was going to get out of
tree.
- Initial atomic modesetting work
The need for an atomic modesetting interface to allow userspace to
try and send a complete set of modesetting state to the driver has
arisen, and been suffering from neglect this past year. No more,
the start of the common code and changes for msm driver to use it
are in this tree. Ongoing work to get the userspace ioctl finished
and the code clean will probably wait until next kernel.
- DisplayID 1.3 and tiled monitor exposed to userspace.
Tiled monitor property is now exposed for userspace to make use of.
- Rockchip drm driver merged.
- imx gpu driver moved out of staging
Other stuff:
- core:
panel - MIPI DSI + new panels.
expose suggested x/y properties for virtual GPUs
- i915:
Initial Skylake (SKL) support
gen3/4 reset work
start of dri1/ums removal
infoframe tracking
fixes for lots of things.
- nouveau:
tegra k1 voltage support
GM204 modesetting support
GT21x memory reclocking work
- radeon:
CI dpm fixes
GPUVM improvements
Initial DPM fan control
- rcar-du:
HDMI support added
removed some support for old boards
slave encoder driver for Analog Devices adv7511
- exynos:
Exynos4415 SoC support
- msm:
a4xx gpu support
atomic helper conversion
- tegra:
iommu support
universal plane support
ganged-mode DSI support
- sti:
HDMI i2c improvements
- vmwgfx:
some late fixes.
- qxl:
use suggested x/y properties"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
drm: sti: fix module compilation issue
drm/i915: save/restore GMBUS freq across suspend/resume on gen4
drm: sti: correctly cleanup CRTC and planes
drm: sti: add HQVDP plane
drm: sti: add cursor plane
drm: sti: enable auxiliary CRTC
drm: sti: fix delay in VTG programming
drm: sti: prepare sti_tvout to support auxiliary crtc
drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
drm: sti: fix hdmi avi infoframe
drm: sti: remove event lock while disabling vblank
drm: sti: simplify gdp code
drm: sti: clear all mixer control
drm: sti: remove gpio for HDMI hot plug detection
drm: sti: allow to change hdmi ddc i2c adapter
drm/doc: Document drm_add_modes_noedid() usage
drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
drm: Zero out DRM object memory upon cleanup
drm/i915/bdw: Fix the write setting up the WIZ hashing mode
...
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
Pull x86 fixes from Ingo Molnar:
"Misc fixes (mainly Andy's TLS fixes), plus a cleanup"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Disallow unusual TLS segments
x86/tls: Validate TLS entries to protect espfix
MAINTAINERS: Add me as x86 VDSO submaintainer
x86/asm: Unify segment selector defines
x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly
x86_64, switch_to(): Load TLS descriptors before switching DS and ES
x86/mm: Use min() instead of min_t() in the e820 printout code
x86/mm: Fix zone ranges boot printout
x86/doc: Update documentation after file shuffling
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.
For completeness, block attempts to set the L bit. (Prior to
this patch, the L bit would have been silently dropped.)
This is an ABI break. I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.
Note to stable maintainers: this is a hardening patch that fixes
no known bugs. Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # optional
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull tracing fixes from Steven Rostedt:
"Here's two fixes:
1) Discovered by Fengguang Wu's tests. I changed the parameters to
the function graph x86 prepare_ftrace_return call but forgot to
update the call from entry_32 (i386 version). This patch corrects
that.
2) I was tracing some code and found that the sched_switch tracepoint
was showing tasks in the INTERRUPTIBLE state as RUNNING. This was
due to the updates to convert preempt_count into a per_cpu
variable. The tracepoint logic was made to use the tasks
saved_preempt_count which could hold a stale "PREEMPT_ACTIVE",
instead of using the current preempt_count() call"
* tag 'trace-fixes-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/sched: Check preempt_count() for current when reading task->state
ftrace/x86: Update i386 call to prepare_ftrace_return()
After commit b2b49ccbdd (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.
Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>