Third set of PPC KVM fixes for 4.19
One patch here, fixing a potential host crash introduced (or at least
exacerbated) by a previous fix for corruption relating to radix guest
page faults and THP operations.
This adds a mode where the vcore scheduling logic in HV KVM limits itself
to scheduling only virtual cores from the same VM on any given physical
core. This is enabled via a new module parameter on the kvm-hv module
called "one_vm_per_core". For this to work on POWER9, it is necessary to
set indep_threads_mode=N. (On POWER8, hardware limitations mean that KVM
is never in independent threads mode, regardless of the indep_threads_mode
setting.)
Thus the settings needed for this to work are:
1. The host is in SMT1 mode.
2. On POWER8, the host is not in 2-way or 4-way static split-core mode.
3. On POWER9, the indep_threads_mode parameter is N.
4. The one_vm_per_core parameter is Y.
With these settings, KVM can run up to 4 vcpus on a core at the same
time on POWER9, or up to 8 vcpus on POWER8 (depending on the guest
threading mode), and will ensure that all of the vcpus belong to the
same VM.
This is intended for use in security-conscious settings where users are
concerned about possible side-channel attacks between threads which could
perhaps enable one VM to attack another VM on the same core, or the host.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When an OS (currently only classic Mac OS) is running in KVM-PR and makes a
linked jump from code with split hack addressing enabled into code that does
not, LR is not correctly updated and reflects the previously munged PC.
To fix this, this patch undoes the address munge when exiting split
hack mode so that code relying on LR being a proper address will now
execute. This does not affect OS X or other operating systems running
on KVM-PR.
Signed-off-by: Cameron Kaiser <spectre@floodgap.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
For historical reasons, the AES-NI based implementation of the PCBC
chaining mode uses a special FPU chaining mode wrapper template to
amortize the FPU start/stop overhead over multiple blocks.
When this FPU wrapper was introduced, it supported widely used
chaining modes such as XTS and CTR (as well as LRW), but currently,
PCBC is the only remaining user.
Since there are no known users of pcbc(aes) in the kernel, let's remove
this special driver, and rely on the generic pcbc driver to encapsulate
the AES-NI core cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The code flow for the vclocks is convoluted as it requires the vclocks
which can be invalidated separately from the vsyscall_gtod_data sequence to
store the fact in a separate variable. That's inefficient.
Restructure the code so the vclock readout returns cycles and the
conversion to nanoseconds is handled at the call site.
If the clock gets invalidated or vclock is already VCLOCK_NONE, return
U64_MAX as the cycle value, which is invalid for all clocks and leave the
sequence loop immediately in that case by calling the fallback function
directly.
This allows to remove the gettimeofday fallback as it now uses the
clock_gettime() fallback and does the nanoseconds to microseconds
conversion in the same way as it does when the vclock is functional. It
does not make a difference whether the division by 1000 happens in the
kernel fallback or in userspace.
Generates way better code and gains a few cycles back.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.657928937@linutronix.de
It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This
results either in indirect calls due to the larger switch case, which then
requires retpolines or when the compiler is forced to avoid jump tables it
results in even more conditionals.
To avoid both variants which are bad for performance the high resolution
functions and the coarse grained functions will be collapsed into one for
each. That requires to store the clock specific base time in an array.
Introcude struct vgtod_ts for storage and convert the data store, the
update function and the individual clock functions over to use it.
The new storage does not longer use gtod_long_t for seconds depending on 32
or 64 bit compile because this needs to be the full 64bit value even for
32bit when a Y2038 function is added. No point in keeping the distinction
alive in the internal representation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de
As the v4l2-fwnode now allows drivers to set defaults, and eventually
override them by specifying properties in DTS, use defaults for the CEU
driver.
Also remove endpoint properties from the gr-peach-audiocamerashield as
they match the defaults now specified in the driver code
(h/vsync-active and bus-width) or are not relevant to the interface
as they cannot be configured (pclk-sample).
Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Russell writes:
"A couple of small ARM fixes from Stefan and Thomas:
- Adding the io_pgetevents syscall
- Fixing a bounds check in pci_ioremap_io()"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8799/1: mm: fix pci_ioremap_io() offset check
ARM: 8787/1: wire up io_pgetevents syscall
Palmer writes:
"A Single RISC-V Fix for 4.19-rc7
This tag contains a single patch that managed to get lost in the
shuffle, which explains why it's so late. This single line has been
floating around in various patch sets for months, and fixes our DMA32
region."
* tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISCV: Fix end PFN for low memory
This board has a Micron MT29F8G08ABACAWP chip which requires a ECC
strength of 8/512. Rather than hard coding any particular strength the
the nand controller auto-detect the ECC strength based on the ONFI data.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Renesas ARM64 Based SoC SoC Updates for v4.20
* Add support for RZ/G2E (r8a774c0) and RZ/G2M (r8a774a1) SoCs
* Enable Compare Match Timer (CMT) and Timer Unit (TMU)
for Renesas SoCs
* Remove no longer needed ARCH_SHMOBILE Kconfig symbol
* tag 'renesas-arm64-soc-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
arm64: Add Renesas R8A774C0 support
arm64: Add Renesas R8A774A1 support
arm64: enable CMT/TMU support for Renesas SoC
arm64: renesas: Remove the ARCH_SHMOBILE Kconfig symbol
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
i.MX fixes for 4.19, round 2:
- i.MX53 QSB board stops working when cpufreq driver is enabled,
because the default OPP table has the maximum CPU frequency at
1.2GHz, while the board is only capable of running 1GHz. Fix up
the OPP table for the board to get it work with cpufreq driver.
* tag 'imx-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx53-qsb: disable 1.2GHz OPP
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
DaVinci DT updates for v4.20
----------------------------
- A non-critical fix to reduce errors on A/DC on Lego Mindstorms EV3
- Support for GPIO expander on DA850 EVM
* tag 'davinci-for-v4.20/dt' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: dts: da850-lego-ev3: slow down A/DC as much as possible
ARM: dts: da850-evm: Enable tca6416 on baseboard
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
KVM: s390: Features for 4.20
- Initial version of AP crypto virtualization via vfio-mdev
- Set the host program identifier
- Optimize page table locking
Commit b5861e5cf2 introduced a check on
the interrupt-window and NMI-window CPU execution controls in order to
inject an external interrupt vmexit before the first guest instruction
executes. However, when APIC virtualization is enabled the host does not
need a vmexit in order to inject an interrupt at the next interrupt window;
instead, it just places the interrupt vector in RVI and the processor will
inject it as soon as possible. Therefore, on machines with APICv it is
not enough to check the CPU execution controls: the same scenario can also
happen if RVI>vPPR.
Fixes: b5861e5cf2
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nodes for the newly support rk3188 display controller, a fix for a new
dtc warning, gpio setting for the sdmmc regulator on radxarock and a
new board the "S" variant of the rk3288-based Tinker board, that sports
an added emmc.
* tag 'v4.20-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: add rk3288-based Tinker board S
ARM: dts: rockchip: move shared tinker-board nodes to a common dtsi
ARM: dts: rockchip: explicitly set vcc_sd0 pin to gpio on rk3188-radxarock
ARM: dts: rockchip: Fix erroneous SPI bus dtc warnings on rk3036
ARM: dts: rockchip: add rk3188 lcd controller nodes
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Due to the electrical design of the A/DC circuits on LEGO MINDSTORMS EV3,
if we are reading analog values as fast as possible (i.e. using DMA to
service the SPI) the A/DC chip will read incorrect values - as much as
0.1V off when the SPI is running at 10MHz. (This has to do with the
capacitor charge time when channels are muxed in the A/DC.)
This patch slows down the SPI as much as possible (if CPU is at 456MHz,
SPI runs at 1/2 of that, so 228MHz and has a max prescalar of 256, so
we could get ~891kHz, but we're just rounding it to 1MHz). We also use
the max allowable value for WDELAY to slow things down even more.
These changes reduce the error of the analog values to about 5mV, which
is tolerable.
Commits a3762b13a5 ("spi: spi-davinci: Add support for SPI_CS_WORD")
and e2540da86e ("iio: adc: ti-ads7950: use SPI_CS_WORD to reduce
CPU usage") introduce changes that allow DMA transfers to be used, so
this slow down is needed now.
Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
There is a GPIO expander on both the UI board as well as the
baseboard. This patch enables the second tca6416 and identifies
it as being on the baseboard using _bb as the suffix.
Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Since the very beginning, unsigned int .irq member of struct
plat_serial8250_port introduced by commit eff443df67 ("OMAP1:
AMS_DELTA: add modem support") was statically initialized to a negative
value -EINVAL. Moreover, commit 0812db9437 ("ARM: OMAP1: ams-delta:
assign MODEM IRQ from GPIO descriptor") has introduced some new code
which checks for that member carrying a negative value which is
impossible.
Use IRQ_NOTCONNECTED instead of -EINVAL. Also, drop the valueless check
and let the modem device be registered regardless of .irq value, and
the value handled by "serial8250" driver.
Fixes: 0812db9437 ("ARM: OMAP1: ams-delta: assign MODEM IRQ from GPIO descriptor")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Some additional new boards, the rk3399-based RockPro64 from Pine64, as well
as the Vamrs Rock960. Another big feature is display support including hdmi
and the Innosilicon hdmiphy on the rk3328, right now enabled on the rock64.
The rock64 also got its spi-nor and spdif enabled. On the px30 we can see
dwc2-based usb support now and finally some misc fixes, like for a new dtc
warning, missing address and size cells and microSD fix on sapphire.
* tag 'v4.20-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: enable display nodes on rk3328-rock64
arm64: dts: rockchip: add rk3328 display nodes
arm64: dts: rockchip: add Innosilicon hdmi phy node to rk3328
arm64: dts: rockchip: add missing address and size cells for rk3399 mipi dsi
arm64: dts: rockchip: Enable SPI NOR flash on Rock64
arm64: dts: rockchip: add initial dts support for Rockpro64
arm64: dts: rockchip: enable dwc2-based otg controller on px30-evb
arm64: dts: rockchip: add dwc2 otg controller on px30
dt-bindings: usb: dwc2: add description for px30
arm64: dts: rockchip: Enable SD card detection for Rock960 boards
arm64: dts: rockchip: Add support for Rock960 board
dt-bindings: arm: rockchip: Add binding for Rock960 board
arm64: dts: rockchip: Split out common nodes for Rock960 based boards
arm64: dts: rockchip: add spdif sound node for rock64
arm64: dts: rockchip: Fix microSD in rk3399 sapphire board
arm64: dts: rockchip: Fix I2C bus unit-address error on rk3399-puma-haikou
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
More devicetree changes for omap variants
There's a patch for previously merged dts changes to remove the last
remaining use of legacy phy_id property.
For dra7, we have a non-urgent PCIe dts fix, enable a PCIe errata for
unaligned access.
For omap5, we enable omap5 USB OTG mode for DWC3 controller.
And we add support for am335x based Moxa UC-2100 series of industrial
computers.
* tag 'omap-for-v4.20/dt-signed-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x: Replace remaining legacy phy_id with phy-handle
ARM: dts: am335x: add support for Moxa UC-2101 open platform
ARM: dts: am335x: add common file for UC-2100 series
ARM: dts: omap5: enable OTG role for DWC3 controller
ARM: dts: dra7: Enable workaround for errata i870 in PCIe host mode
ARM: dts: dra7: Fix up unaligned access setting for PCIe EP
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The purpose of CONFIG_CPU_32v3 is to avoid ldrh/strh on the RiscPC,
which is pretty much an ARMv4 device, except its bus will choke on the
half-words. The way to make the C compiler not output ldrh/strh is with
-march=armv3, which doesn't support them in the ISA. However, this
prevents certain cryptography code from working that uses instructions
like umull. Fortunately there's also -march=armv3m that does support
those, making it possible to continue assembling optimized cryptography
routines for our beloved RiscPC.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
While in theory multiple unwinders could be compiled in, it does
not make sense in practise. Use a choice to make the unwinder
selection mutually exclusive and mandatory.
Already before this commit it has not been possible to deselect
FRAME_POINTER. Remove the obsolete comment.
Furthermore, to produce a meaningful backtrace with FRAME_POINTER
enabled the kernel needs a specific function prologue:
mov ip, sp
stmfd sp!, {fp, ip, lr, pc}
sub fp, ip, #4
To get to the required prologue gcc uses apcs and no-sched-prolog.
This compiler options are not available on clang, and clang is not
able to generate the required prologue. Make the FRAME_POINTER
config symbol depending on !clang.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Local radix TLB flush operations that operate on congruence classes
have explicit ERAT flushes for POWER9. The process scoped LPID flush
did not have a flush, so add it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced
with POWER9, and the result is undefined on earlier CPUs.
Commits 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") and
d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on
POWER9") caused POWER7/8 code to use this instruction. Remove it. An
ERAT flush can be made by invalidatig the SLB, but before POWER9 that
requires a flush and rebolt.
Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to
0x7fffffff:
if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
set_dec(0x7fffffff);
else
set_dec(decrementer_max);
If there are no future events, we don't reprogram the decrementer
after this and we end up with 0x7fffffff even on a large decrementer
capable system.
As suggested by Nick, add a set_state_oneshot_stopped callback
so we program the decrementer with decrementer_max if there are
no future events.
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We currently cap the decrementer clockevent at 4 seconds, even on systems
with large decrementer support. Fix this by converting the code to use
clockevents_register_device() which calculates the upper bound based on
the max_delta passed in.
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As of commit 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls
have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
whereas previously KVM would allow a nested guest to enable
VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware. That is,
KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
(always) allow setting it when kvm-intel.flexpriority=0, and may even
initially allow the control and then clear it when the nested guest
writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
functional issues.
Hide the control completely when the module parameter is cleared.
reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Return early from vmx_set_virtual_apic_mode() if the processor doesn't
support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
which reside in SECONDARY_VM_EXEC_CONTROL. This eliminates warnings
due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
on processors without secondary exec controls.
Remove the similar check for TPR shadowing as it is incorporated in the
flexpriority_enabled check and the APIC-related code in
vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.
Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>