Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.4 - New ITS translation cache - Allow up to 512 CPUs to be supported with GICv3 (for real this time) - Now call kvm_arch_vcpu_blocking early in the blocking sequence - Tidy-up device mappings in S2 when DIC is available - Clean icache invalidation on VMID rollover - General cleanup
2019-09-10 19:09:14 +02:00
parent 8146856b0a 92f35b751c
commit 32d1d15c52
793 changed files with 8258 additions and 4486 deletions
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -41,10 +41,11 @@ Related CVEs

 The following CVE entries describe Spectre variants:

-   =============   =======================  =================
+   =============   =======================  ==========================
   CVE-2017-5753   Bounds check bypass      Spectre variant 1
   CVE-2017-5715   Branch target injection  Spectre variant 2
-   =============   =======================  =================
+   CVE-2019-1125   Spectre v1 swapgs        Spectre variant 1 (swapgs)
+   =============   =======================  ==========================

 Problem
 -------
@@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data
 over the network, see :ref:`[12] <spec_ref12>`. However such attacks
 are difficult, low bandwidth, fragile, and are considered low risk.

+Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
+only about user-controlled array bounds checks.  It can affect any
+conditional checks.  The kernel entry code interrupt, exception, and NMI
+handlers all have conditional swapgs checks.  Those may be problematic
+in the context of Spectre v1, as kernel code can speculatively run with
+a user GS.
+
 Spectre variant 2 (Branch Target Injection)
 -------------------------------------------

@@ -132,6 +140,9 @@ not cover all possible attack vectors.
 1. A user process attacking the kernel
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
   The attacker passes a parameter to the kernel via a register or
   via a known address in memory during a syscall. Such parameter may
   be used later by the kernel as an index to an array or to derive
@@ -144,7 +155,40 @@ not cover all possible attack vectors.
   potentially be influenced for Spectre attacks, new "nospec" accessor
   macros are used to prevent speculative loading of data.

-   Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
+Spectre variant 1 (swapgs)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   An attacker can train the branch predictor to speculatively skip the
+   swapgs path for an interrupt or exception.  If they initialize
+   the GS register to a user-space value, if the swapgs is speculatively
+   skipped, subsequent GS-related percpu accesses in the speculation
+   window will be done with the attacker-controlled GS value.  This
+   could cause privileged memory to be accessed and leaked.
+
+   For example:
+
+   ::
+
+     if (coming from user space)
+         swapgs
+     mov %gs:<percpu_offset>, %reg
+     mov (%reg), %reg1
+
+   When coming from user space, the CPU can speculatively skip the
+   swapgs, and then do a speculative percpu load using the user GS
+   value.  So the user can speculatively force a read of any kernel
+   value.  If a gadget exists which uses the percpu value as an address
+   in another load/store, then the contents of the kernel value may
+   become visible via an L1 side channel attack.
+
+   A similar attack exists when coming from kernel space.  The CPU can
+   speculatively do the swapgs, causing the user GS to get used for the
+   rest of the speculative window.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
+   A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
   target buffer (BTB) before issuing syscall to launch an attack.
   After entering the kernel, the kernel could use the poisoned branch
   target buffer on indirect jump and jump to gadget code in speculative
@@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is:

 The possible values in this file are:

-  =======================================  =================================
-  'Mitigation: __user pointer sanitation'  Protection in kernel on a case by
-                                           case base with explicit pointer
-                                           sanitation.
-  =======================================  =================================
+  .. list-table::
+
+     * - 'Not affected'
+       - The processor is not vulnerable.
+     * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
+       - The swapgs protections are disabled; otherwise it has
+         protection in the kernel on a case by case base with explicit
+         pointer sanitation and usercopy LFENCE barriers.
+     * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
+       - Protection in the kernel on a case by case base with explicit
+         pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
+         barriers.

 However, the protections are put in place on a case by case basis,
 and there is no guarantee that all possible attack vectors for Spectre
@@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2
 1. Kernel mitigation
 ^^^^^^^^^^^^^^^^^^^^

+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
   For the Spectre variant 1, vulnerable kernel code (as determined
   by code audit or scanning tools) is annotated on a case by case
   basis to use nospec accessor macros for bounds clipping :ref:`[2]
   <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
   not cover all attack vectors for Spectre variant 1.

+   Copy-from-user code has an LFENCE barrier to prevent the access_ok()
+   check from being mis-speculated.  The barrier is done by the
+   barrier_nospec() macro.
+
+   For the swapgs variant of Spectre variant 1, LFENCE barriers are
+   added to interrupt, exception and NMI entry where needed.  These
+   barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
+   FENCE_SWAPGS_USER_ENTRY macros.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
   For Spectre variant 2 mitigation, the compiler turns indirect calls or
   jumps in the kernel into equivalent return trampolines (retpolines)
   :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
@@ -473,6 +539,12 @@ Mitigation control on the kernel command line
 Spectre variant 2 mitigation can be disabled or force enabled at the
 kernel command line.

+	nospectre_v1
+
+		[X86,PPC] Disable mitigations for Spectre Variant 1
+		(bounds check bypass). With this option data leaks are
+		possible in the system.
+
 	nospectre_v2

 		[X86] Disable all mitigations for the Spectre variant 2
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2604,7 +2604,7 @@
 				expose users to several CPU vulnerabilities.
 				Equivalent to: nopti [X86,PPC]
 					       kpti=0 [ARM64]
-					       nospectre_v1 [PPC]
+					       nospectre_v1 [X86,PPC]
 					       nobp=0 [S390]
 					       nospectre_v2 [X86,PPC,S390,ARM64]
 					       spectre_v2_user=off [X86]
@@ -2965,9 +2965,9 @@
 			nosmt=force: Force disable SMT, cannot be undone
 				     via the sysfs control file.

-	nospectre_v1	[PPC] Disable mitigations for Spectre Variant 1 (bounds
-			check bypass). With this option data leaks are possible
-			in the system.
+	nospectre_v1	[X86,PPC] Disable mitigations for Spectre Variant 1
+			(bounds check bypass). With this option data leaks are
+			possible in the system.

 	nospectre_v2	[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
 			the Spectre variant 2 (indirect branch prediction)
--- a/Documentation/devicetree/bindings/riscv/cpus.txt
+++ b/Documentation/devicetree/bindings/riscv/cpus.txt
@@ -1,162 +0,0 @@
-===================
-RISC-V CPU Bindings
-===================
-
-The device tree allows to describe the layout of CPUs in a system through
-the "cpus" node, which in turn contains a number of subnodes (ie "cpu")
-defining properties for every cpu.
-
-Bindings for CPU nodes follow the Devicetree Specification, available from:
-
-https://www.devicetree.org/specifications/
-
-with updates for 32-bit and 64-bit RISC-V systems provided in this document.
-
-===========
-Terminology
-===========
-
-This document uses some terminology common to the RISC-V community that is not
-widely used, the definitions of which are listed here:
-
-* hart: A hardware execution context, which contains all the state mandated by
-  the RISC-V ISA: a PC and some registers.  This terminology is designed to
-  disambiguate software's view of execution contexts from any particular
-  microarchitectural implementation strategy.  For example, my Intel laptop is
-  described as having one socket with two cores, each of which has two hyper
-  threads.  Therefore this system has four harts.
-
-=====================================
-cpus and cpu node bindings definition
-=====================================
-
-The RISC-V architecture, in accordance with the Devicetree Specification,
-requires the cpus and cpu nodes to be present and contain the properties
-described below.
-
- cpus node
-
-        Description: Container of cpu nodes
-
-        The node name must be "cpus".
-
-        A cpus node must define the following properties:
-
-        - #address-cells
-                Usage: required
-                Value type: <u32>
-                Definition: must be set to 1
-        - #size-cells
-                Usage: required
-                Value type: <u32>
-                Definition: must be set to 0
-
- cpu node
-
-        Description: Describes a hart context
-
-        PROPERTIES
-
-        - device_type
-                Usage: required
-                Value type: <string>
-                Definition: must be "cpu"
-        - reg
-                Usage: required
-                Value type: <u32>
-                Definition: The hart ID of this CPU node
-        - compatible:
-                Usage: required
-                Value type: <stringlist>
-                Definition: must contain "riscv", may contain one of
-                            "sifive,rocket0"
-        - mmu-type:
-                Usage: optional
-                Value type: <string>
-                Definition: Specifies the CPU's MMU type.  Possible values are
-                            "riscv,sv32"
-                            "riscv,sv39"
-                            "riscv,sv48"
-        - riscv,isa:
-                Usage: required
-                Value type: <string>
-                Definition: Contains the RISC-V ISA string of this hart.  These
-                            ISA strings are defined by the RISC-V ISA manual.
-
-Example: SiFive Freedom U540G Development Kit
---------------------------------------------
-
-This system contains two harts: a hart marked as disabled that's used for
-low-level system tasks and should be ignored by Linux, and a second hart that
-Linux is allowed to run on.
-
-        cpus {
-                #address-cells = <1>;
-                #size-cells = <0>;
-                timebase-frequency = <1000000>;
-                cpu@0 {
-                        clock-frequency = <1600000000>;
-                        compatible = "sifive,rocket0", "riscv";
-                        device_type = "cpu";
-                        i-cache-block-size = <64>;
-                        i-cache-sets = <128>;
-                        i-cache-size = <16384>;
-                        next-level-cache = <&L15 &L0>;
-                        reg = <0>;
-                        riscv,isa = "rv64imac";
-                        status = "disabled";
-                        L10: interrupt-controller {
-                                #interrupt-cells = <1>;
-                                compatible = "riscv,cpu-intc";
-                                interrupt-controller;
-                        };
-                };
-                cpu@1 {
-                        clock-frequency = <1600000000>;
-                        compatible = "sifive,rocket0", "riscv";
-                        d-cache-block-size = <64>;
-                        d-cache-sets = <64>;
-                        d-cache-size = <32768>;
-                        d-tlb-sets = <1>;
-                        d-tlb-size = <32>;
-                        device_type = "cpu";
-                        i-cache-block-size = <64>;
-                        i-cache-sets = <64>;
-                        i-cache-size = <32768>;
-                        i-tlb-sets = <1>;
-                        i-tlb-size = <32>;
-                        mmu-type = "riscv,sv39";
-                        next-level-cache = <&L15 &L0>;
-                        reg = <1>;
-                        riscv,isa = "rv64imafdc";
-                        status = "okay";
-                        tlb-split;
-                        L13: interrupt-controller {
-                                #interrupt-cells = <1>;
-                                compatible = "riscv,cpu-intc";
-                                interrupt-controller;
-                        };
-                };
-        };
-
-Example: Spike ISA Simulator with 1 Hart
----------------------------------------
-
-This device tree matches the Spike ISA golden model as run with `spike -p1`.
-
-        cpus {
-                cpu@0 {
-                        device_type = "cpu";
-                        reg = <0x00000000>;
-                        status = "okay";
-                        compatible = "riscv";
-                        riscv,isa = "rv64imafdc";
-                        mmu-type = "riscv,sv48";
-                        clock-frequency = <0x3b9aca00>;
-                        interrupt-controller {
-                                #interrupt-cells = <0x00000001>;
-                                interrupt-controller;
-                                compatible = "riscv,cpu-intc";
-                        }
-                }
-        }
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -10,6 +10,18 @@ maintainers:
  - Paul Walmsley <paul.walmsley@sifive.com>
  - Palmer Dabbelt <palmer@sifive.com>

+description: |
+  This document uses some terminology common to the RISC-V community
+  that is not widely used, the definitions of which are listed here:
+
+  hart: A hardware execution context, which contains all the state
+  mandated by the RISC-V ISA: a PC and some registers.  This
+  terminology is designed to disambiguate software's view of execution
+  contexts from any particular microarchitectural implementation
+  strategy.  For example, an Intel laptop containing one socket with
+  two cores, each of which has two hyperthreads, could be described as
+  having four harts.
+
 properties:
  compatible:
    items:
@@ -50,6 +62,10 @@ properties:
      User-Level ISA document, available from
      https://riscv.org/specifications/

+      While the isa strings in ISA specification are case
+      insensitive, letters in the riscv,isa string must be all
+      lowercase to simplify parsing.
+
  timebase-frequency:
    type: integer
    minimum: 1
--- a/Documentation/devicetree/bindings/riscv/sifive.yaml
+++ b/Documentation/devicetree/bindings/riscv/sifive.yaml
@@ -19,7 +19,7 @@ properties:
  compatible:
    items:
      - enum:
-          - sifive,freedom-unleashed-a00
+          - sifive,hifive-unleashed-a00
      - const: sifive,fu540-c000
      - const: sifive,fu540
 ...
--- a/Documentation/devicetree/bindings/spi/spi-controller.yaml
+++ b/Documentation/devicetree/bindings/spi/spi-controller.yaml
@@ -73,7 +73,6 @@ patternProperties:
          Compatible of the SPI device.

      reg:
-        maxItems: 1
        minimum: 0
        maximum: 256
        description:
--- a/Documentation/filesystems/cifs/TODO
+++ b/Documentation/filesystems/cifs/TODO
@@ -13,7 +13,8 @@ a) SMB3 (and SMB3.1.1) missing optional features:
   - T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
     currently the only two server side copy mechanisms supported)

-b) improved sparse file support
+b) improved sparse file support (fiemap and SEEK_HOLE are implemented
+but additional features would be supportable by the protocol).

 c) Directory entry caching relies on a 1 second timer, rather than
 using Directory Leases, currently only the root file handle is cached longer
@@ -21,9 +22,13 @@ using Directory Leases, currently only the root file handle is cached longer
 d) quota support (needs minor kernel change since quota calls
 to make it to network filesystems or deviceless filesystems)

-e) Additional use cases where we use "compoounding" (e.g. open/query/close
-and open/setinfo/close) to reduce the number of roundtrips, and also
-open to reduce redundant opens (using deferred close and reference counts more).
+e) Additional use cases can be optimized to use "compounding"
+(e.g. open/query/close and open/setinfo/close) to reduce the number
+of roundtrips to the server and improve performance. Various cases
+(stat, statfs, create, unlink, mkdir) already have been improved by
+using compounding but more can be done.  In addition we could significantly
+reduce redundant opens by using deferred close (with handle caching leases)
+and better using reference counters on file handles.

 f) Finish inotify support so kde and gnome file list windows
 will autorefresh (partially complete by Asser). Needs minor kernel
@@ -43,18 +48,17 @@ mount or a per server basis to client UIDs or nobody if no mapping
 exists. Also better integration with winbind for resolving SID owners

 k) Add tools to take advantage of more smb3 specific ioctls and features
-(passthrough ioctl/fsctl for sending various SMB3 fsctls to the server
-is in progress, and a passthrough query_info call is already implemented
-in cifs.ko to allow smb3 info levels queries to be sent from userspace)
+(passthrough ioctl/fsctl is now implemented in cifs.ko to allow sending
+various SMB3 fsctls and query info and set info calls directly from user space)
+Add tools to make setting various non-POSIX metadata attributes easier
+from tools (e.g. extending what was done in smb-info tool).

 l) encrypted file support

 m) improved stats gathering tools (perhaps integration with nfsometer?)
 to extend and make easier to use what is currently in /proc/fs/cifs/Stats

-n) allow setting more NTFS/SMB3 file attributes remotely (currently limited to compressed
-file attribute via chflags) and improve user space tools for managing and
-viewing them.
+n) Add support for claims based ACLs ("DAC")

 o) mount helper GUI (to simplify the various configuration options on mount)

@@ -82,6 +86,8 @@ so far).
 w) Add support for additional strong encryption types, and additional spnego
 authentication mechanisms (see MS-SMB2)

+x) Finish support for SMB3.1.1 compression
+
 KNOWN BUGS
 ====================================
 See http://bugzilla.samba.org - search on product "CifsVFS" for
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -424,13 +424,24 @@ Statistics
 Following minimum set of TLS-related statistics should be reported
 by the driver:

- * ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
- * ``tx_tls_encrypted`` - number of in-order TLS segments passed to device
-   for encryption
+ * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
+   which were part of a TLS stream.
+ * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
+   which were successfully decrypted.
+ * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
+   for encryption of their TLS payload.
+ * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
+   passed to the device for encryption.
+ * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
+   encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
-   but did not arrive in the expected order
- * ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
-   they arrived out of order and associated record could not be found
+   but did not arrive in the expected order.
+ * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
+   a TLS stream dropped, because they arrived out of order and associated
+   record could not be found.
+ * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
+   stream dropped, because they contain both data that has been encrypted by
+   software and data that expects hardware crypto offload.

 Notable corner cases, exceptions and additional requirements
 ============================================================
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -753,8 +753,8 @@ in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
 use PPIs designated for specific cpus.  The irq field is interpreted
 like this:

-  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
-  field: | irq_type  | vcpu_index |     irq_id     |
+  bits:  |  31 ... 28  | 27 ... 24 | 23  ... 16 | 15 ... 0 |
+  field: | vcpu2_index | irq_type  | vcpu_index |  irq_id  |

 The irq_type field has the following values:
 - irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
@@ -766,6 +766,14 @@ The irq_type field has the following values:

 In both cases, level is used to assert/deassert the line.

+When KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, the target vcpu is
+identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index
+must be zero.
+
+Note that on arm/arm64, the KVM_CAP_IRQCHIP capability only conditions
+injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always
+be used for a userspace interrupt controller.
+
 struct kvm_irq_level {
 	union {
 		__u32 irq;     /* GSI */
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -237,7 +237,7 @@ The usage pattern is::
      ret = hmm_range_snapshot(&range);
      if (ret) {
          up_read(&mm->mmap_sem);
-          if (ret == -EAGAIN) {
+          if (ret == -EBUSY) {
            /*
             * No need to check hmm_range_wait_until_valid() return value
             * on retry we will get proper error with hmm_range_snapshot()