Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.4

- New ITS translation cache
- Allow up to 512 CPUs to be supported with GICv3 (for real this time)
- Now call kvm_arch_vcpu_blocking early in the blocking sequence
- Tidy-up device mappings in S2 when DIC is available
- Clean icache invalidation on VMID rollover
- General cleanup
This commit is contained in:
Paolo Bonzini
2019-09-10 19:09:14 +02:00
793 changed files with 8258 additions and 4486 deletions

View File

@@ -41,10 +41,11 @@ Related CVEs
The following CVE entries describe Spectre variants:
============= ======================= =================
============= ======================= ==========================
CVE-2017-5753 Bounds check bypass Spectre variant 1
CVE-2017-5715 Branch target injection Spectre variant 2
============= ======================= =================
CVE-2019-1125 Spectre v1 swapgs Spectre variant 1 (swapgs)
============= ======================= ==========================
Problem
-------
@@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data
over the network, see :ref:`[12] <spec_ref12>`. However such attacks
are difficult, low bandwidth, fragile, and are considered low risk.
Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
only about user-controlled array bounds checks. It can affect any
conditional checks. The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks. Those may be problematic
in the context of Spectre v1, as kernel code can speculatively run with
a user GS.
Spectre variant 2 (Branch Target Injection)
-------------------------------------------
@@ -132,6 +140,9 @@ not cover all possible attack vectors.
1. A user process attacking the kernel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Spectre variant 1
~~~~~~~~~~~~~~~~~
The attacker passes a parameter to the kernel via a register or
via a known address in memory during a syscall. Such parameter may
be used later by the kernel as an index to an array or to derive
@@ -144,7 +155,40 @@ not cover all possible attack vectors.
potentially be influenced for Spectre attacks, new "nospec" accessor
macros are used to prevent speculative loading of data.
Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
Spectre variant 1 (swapgs)
~~~~~~~~~~~~~~~~~~~~~~~~~~
An attacker can train the branch predictor to speculatively skip the
swapgs path for an interrupt or exception. If they initialize
the GS register to a user-space value, if the swapgs is speculatively
skipped, subsequent GS-related percpu accesses in the speculation
window will be done with the attacker-controlled GS value. This
could cause privileged memory to be accessed and leaked.
For example:
::
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg
mov (%reg), %reg1
When coming from user space, the CPU can speculatively skip the
swapgs, and then do a speculative percpu load using the user GS
value. So the user can speculatively force a read of any kernel
value. If a gadget exists which uses the percpu value as an address
in another load/store, then the contents of the kernel value may
become visible via an L1 side channel attack.
A similar attack exists when coming from kernel space. The CPU can
speculatively do the swapgs, causing the user GS to get used for the
rest of the speculative window.
Spectre variant 2
~~~~~~~~~~~~~~~~~
A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
target buffer (BTB) before issuing syscall to launch an attack.
After entering the kernel, the kernel could use the poisoned branch
target buffer on indirect jump and jump to gadget code in speculative
@@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is:
The possible values in this file are:
======================================= =================================
'Mitigation: __user pointer sanitation' Protection in kernel on a case by
case base with explicit pointer
sanitation.
======================================= =================================
.. list-table::
* - 'Not affected'
- The processor is not vulnerable.
* - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
- The swapgs protections are disabled; otherwise it has
protection in the kernel on a case by case base with explicit
pointer sanitation and usercopy LFENCE barriers.
* - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
- Protection in the kernel on a case by case base with explicit
pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
barriers.
However, the protections are put in place on a case by case basis,
and there is no guarantee that all possible attack vectors for Spectre
@@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2
1. Kernel mitigation
^^^^^^^^^^^^^^^^^^^^
Spectre variant 1
~~~~~~~~~~~~~~~~~
For the Spectre variant 1, vulnerable kernel code (as determined
by code audit or scanning tools) is annotated on a case by case
basis to use nospec accessor macros for bounds clipping :ref:`[2]
<spec_ref2>` to avoid any usable disclosure gadgets. However, it may
not cover all attack vectors for Spectre variant 1.
Copy-from-user code has an LFENCE barrier to prevent the access_ok()
check from being mis-speculated. The barrier is done by the
barrier_nospec() macro.
For the swapgs variant of Spectre variant 1, LFENCE barriers are
added to interrupt, exception and NMI entry where needed. These
barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
FENCE_SWAPGS_USER_ENTRY macros.
Spectre variant 2
~~~~~~~~~~~~~~~~~
For Spectre variant 2 mitigation, the compiler turns indirect calls or
jumps in the kernel into equivalent return trampolines (retpolines)
:ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
@@ -473,6 +539,12 @@ Mitigation control on the kernel command line
Spectre variant 2 mitigation can be disabled or force enabled at the
kernel command line.
nospectre_v1
[X86,PPC] Disable mitigations for Spectre Variant 1
(bounds check bypass). With this option data leaks are
possible in the system.
nospectre_v2
[X86] Disable all mitigations for the Spectre variant 2

View File

@@ -2604,7 +2604,7 @@
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
kpti=0 [ARM64]
nospectre_v1 [PPC]
nospectre_v1 [X86,PPC]
nobp=0 [S390]
nospectre_v2 [X86,PPC,S390,ARM64]
spectre_v2_user=off [X86]
@@ -2965,9 +2965,9 @@
nosmt=force: Force disable SMT, cannot be undone
via the sysfs control file.
nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds
check bypass). With this option data leaks are possible
in the system.
nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1
(bounds check bypass). With this option data leaks are
possible in the system.
nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
the Spectre variant 2 (indirect branch prediction)

View File

@@ -1,162 +0,0 @@
===================
RISC-V CPU Bindings
===================
The device tree allows to describe the layout of CPUs in a system through
the "cpus" node, which in turn contains a number of subnodes (ie "cpu")
defining properties for every cpu.
Bindings for CPU nodes follow the Devicetree Specification, available from:
https://www.devicetree.org/specifications/
with updates for 32-bit and 64-bit RISC-V systems provided in this document.
===========
Terminology
===========
This document uses some terminology common to the RISC-V community that is not
widely used, the definitions of which are listed here:
* hart: A hardware execution context, which contains all the state mandated by
the RISC-V ISA: a PC and some registers. This terminology is designed to
disambiguate software's view of execution contexts from any particular
microarchitectural implementation strategy. For example, my Intel laptop is
described as having one socket with two cores, each of which has two hyper
threads. Therefore this system has four harts.
=====================================
cpus and cpu node bindings definition
=====================================
The RISC-V architecture, in accordance with the Devicetree Specification,
requires the cpus and cpu nodes to be present and contain the properties
described below.
- cpus node
Description: Container of cpu nodes
The node name must be "cpus".
A cpus node must define the following properties:
- #address-cells
Usage: required
Value type: <u32>
Definition: must be set to 1
- #size-cells
Usage: required
Value type: <u32>
Definition: must be set to 0
- cpu node
Description: Describes a hart context
PROPERTIES
- device_type
Usage: required
Value type: <string>
Definition: must be "cpu"
- reg
Usage: required
Value type: <u32>
Definition: The hart ID of this CPU node
- compatible:
Usage: required
Value type: <stringlist>
Definition: must contain "riscv", may contain one of
"sifive,rocket0"
- mmu-type:
Usage: optional
Value type: <string>
Definition: Specifies the CPU's MMU type. Possible values are
"riscv,sv32"
"riscv,sv39"
"riscv,sv48"
- riscv,isa:
Usage: required
Value type: <string>
Definition: Contains the RISC-V ISA string of this hart. These
ISA strings are defined by the RISC-V ISA manual.
Example: SiFive Freedom U540G Development Kit
---------------------------------------------
This system contains two harts: a hart marked as disabled that's used for
low-level system tasks and should be ignored by Linux, and a second hart that
Linux is allowed to run on.
cpus {
#address-cells = <1>;
#size-cells = <0>;
timebase-frequency = <1000000>;
cpu@0 {
clock-frequency = <1600000000>;
compatible = "sifive,rocket0", "riscv";
device_type = "cpu";
i-cache-block-size = <64>;
i-cache-sets = <128>;
i-cache-size = <16384>;
next-level-cache = <&L15 &L0>;
reg = <0>;
riscv,isa = "rv64imac";
status = "disabled";
L10: interrupt-controller {
#interrupt-cells = <1>;
compatible = "riscv,cpu-intc";
interrupt-controller;
};
};
cpu@1 {
clock-frequency = <1600000000>;
compatible = "sifive,rocket0", "riscv";
d-cache-block-size = <64>;
d-cache-sets = <64>;
d-cache-size = <32768>;
d-tlb-sets = <1>;
d-tlb-size = <32>;
device_type = "cpu";
i-cache-block-size = <64>;
i-cache-sets = <64>;
i-cache-size = <32768>;
i-tlb-sets = <1>;
i-tlb-size = <32>;
mmu-type = "riscv,sv39";
next-level-cache = <&L15 &L0>;
reg = <1>;
riscv,isa = "rv64imafdc";
status = "okay";
tlb-split;
L13: interrupt-controller {
#interrupt-cells = <1>;
compatible = "riscv,cpu-intc";
interrupt-controller;
};
};
};
Example: Spike ISA Simulator with 1 Hart
----------------------------------------
This device tree matches the Spike ISA golden model as run with `spike -p1`.
cpus {
cpu@0 {
device_type = "cpu";
reg = <0x00000000>;
status = "okay";
compatible = "riscv";
riscv,isa = "rv64imafdc";
mmu-type = "riscv,sv48";
clock-frequency = <0x3b9aca00>;
interrupt-controller {
#interrupt-cells = <0x00000001>;
interrupt-controller;
compatible = "riscv,cpu-intc";
}
}
}

View File

@@ -10,6 +10,18 @@ maintainers:
- Paul Walmsley <paul.walmsley@sifive.com>
- Palmer Dabbelt <palmer@sifive.com>
description: |
This document uses some terminology common to the RISC-V community
that is not widely used, the definitions of which are listed here:
hart: A hardware execution context, which contains all the state
mandated by the RISC-V ISA: a PC and some registers. This
terminology is designed to disambiguate software's view of execution
contexts from any particular microarchitectural implementation
strategy. For example, an Intel laptop containing one socket with
two cores, each of which has two hyperthreads, could be described as
having four harts.
properties:
compatible:
items:
@@ -50,6 +62,10 @@ properties:
User-Level ISA document, available from
https://riscv.org/specifications/
While the isa strings in ISA specification are case
insensitive, letters in the riscv,isa string must be all
lowercase to simplify parsing.
timebase-frequency:
type: integer
minimum: 1

View File

@@ -19,7 +19,7 @@ properties:
compatible:
items:
- enum:
- sifive,freedom-unleashed-a00
- sifive,hifive-unleashed-a00
- const: sifive,fu540-c000
- const: sifive,fu540
...

View File

@@ -73,7 +73,6 @@ patternProperties:
Compatible of the SPI device.
reg:
maxItems: 1
minimum: 0
maximum: 256
description:

View File

@@ -13,7 +13,8 @@ a) SMB3 (and SMB3.1.1) missing optional features:
- T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
currently the only two server side copy mechanisms supported)
b) improved sparse file support
b) improved sparse file support (fiemap and SEEK_HOLE are implemented
but additional features would be supportable by the protocol).
c) Directory entry caching relies on a 1 second timer, rather than
using Directory Leases, currently only the root file handle is cached longer
@@ -21,9 +22,13 @@ using Directory Leases, currently only the root file handle is cached longer
d) quota support (needs minor kernel change since quota calls
to make it to network filesystems or deviceless filesystems)
e) Additional use cases where we use "compoounding" (e.g. open/query/close
and open/setinfo/close) to reduce the number of roundtrips, and also
open to reduce redundant opens (using deferred close and reference counts more).
e) Additional use cases can be optimized to use "compounding"
(e.g. open/query/close and open/setinfo/close) to reduce the number
of roundtrips to the server and improve performance. Various cases
(stat, statfs, create, unlink, mkdir) already have been improved by
using compounding but more can be done. In addition we could significantly
reduce redundant opens by using deferred close (with handle caching leases)
and better using reference counters on file handles.
f) Finish inotify support so kde and gnome file list windows
will autorefresh (partially complete by Asser). Needs minor kernel
@@ -43,18 +48,17 @@ mount or a per server basis to client UIDs or nobody if no mapping
exists. Also better integration with winbind for resolving SID owners
k) Add tools to take advantage of more smb3 specific ioctls and features
(passthrough ioctl/fsctl for sending various SMB3 fsctls to the server
is in progress, and a passthrough query_info call is already implemented
in cifs.ko to allow smb3 info levels queries to be sent from userspace)
(passthrough ioctl/fsctl is now implemented in cifs.ko to allow sending
various SMB3 fsctls and query info and set info calls directly from user space)
Add tools to make setting various non-POSIX metadata attributes easier
from tools (e.g. extending what was done in smb-info tool).
l) encrypted file support
m) improved stats gathering tools (perhaps integration with nfsometer?)
to extend and make easier to use what is currently in /proc/fs/cifs/Stats
n) allow setting more NTFS/SMB3 file attributes remotely (currently limited to compressed
file attribute via chflags) and improve user space tools for managing and
viewing them.
n) Add support for claims based ACLs ("DAC")
o) mount helper GUI (to simplify the various configuration options on mount)
@@ -82,6 +86,8 @@ so far).
w) Add support for additional strong encryption types, and additional spnego
authentication mechanisms (see MS-SMB2)
x) Finish support for SMB3.1.1 compression
KNOWN BUGS
====================================
See http://bugzilla.samba.org - search on product "CifsVFS" for

View File

@@ -424,13 +424,24 @@ Statistics
Following minimum set of TLS-related statistics should be reported
by the driver:
* ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
* ``tx_tls_encrypted`` - number of in-order TLS segments passed to device
for encryption
* ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
which were part of a TLS stream.
* ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
which were successfully decrypted.
* ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
for encryption of their TLS payload.
* ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
passed to the device for encryption.
* ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
encryption.
* ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
but did not arrive in the expected order
* ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
they arrived out of order and associated record could not be found
but did not arrive in the expected order.
* ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
a TLS stream dropped, because they arrived out of order and associated
record could not be found.
* ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
stream dropped, because they contain both data that has been encrypted by
software and data that expects hardware crypto offload.
Notable corner cases, exceptions and additional requirements
============================================================

View File

@@ -753,8 +753,8 @@ in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
like this:
 bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 |
field: | irq_type | vcpu_index | irq_id |
 bits: | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 0 |
field: | vcpu2_index | irq_type | vcpu_index | irq_id |
The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
@@ -766,6 +766,14 @@ The irq_type field has the following values:
In both cases, level is used to assert/deassert the line.
When KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, the target vcpu is
identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index
must be zero.
Note that on arm/arm64, the KVM_CAP_IRQCHIP capability only conditions
injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always
be used for a userspace interrupt controller.
struct kvm_irq_level {
union {
__u32 irq; /* GSI */

View File

@@ -237,7 +237,7 @@ The usage pattern is::
ret = hmm_range_snapshot(&range);
if (ret) {
up_read(&mm->mmap_sem);
if (ret == -EAGAIN) {
if (ret == -EBUSY) {
/*
* No need to check hmm_range_wait_until_valid() return value
* on retry we will get proper error with hmm_range_snapshot()