.. SPDX-License-Identifier: GPL-2.0

===============================
Software Guard eXtensions (SGX)
===============================

Overview
========

Software Guard eXtensions (SGX) hardware enables user space applications
to set aside private memory regions of code and data:

* Privileged (ring-0) ENCLS functions orchestrate the construction of the
  regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
  execute inside the regions.

These memory regions are called enclaves. An enclave can only be entered at a
fixed set of entry points. Each entry point can hold a single hardware thread
at a time. While the enclave is loaded from a regular binary file by using
ENCLS functions, only the threads inside the enclave can access its memory. The
CPU denies access to the region from outside the enclave and encrypts its
contents before they leave the last-level cache (LLC).

Support can be determined by running

   ``grep sgx /proc/cpuinfo``

SGX must both be supported in the processor and enabled by the BIOS. If SGX
appears to be unsupported on a system which has hardware support, ensure
support is enabled in the BIOS. If a BIOS presents a choice between "Enabled"
and "Software Enabled" modes for SGX, choose "Enabled".

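
The ``grep`` check above can also be done programmatically. The following is a
minimal sketch, not part of the kernel sources: the helper names
``flags_line_has`` and ``cpuinfo_has_sgx`` are hypothetical, and the code
assumes the usual ``flags`` line layout of ``/proc/cpuinfo``. Unlike a plain
``grep sgx``, it matches only the exact ``sgx`` flag and not related flags such
as ``sgx_lc``.

.. code-block:: c

   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   /* Return true if `flag` appears as a whole word in a space-separated
    * flags line such as the "flags" line of /proc/cpuinfo. */
   static bool flags_line_has(const char *line, const char *flag)
   {
       size_t len = strlen(flag);
       const char *p = line;

       if (len == 0)
           return false;

       while ((p = strstr(p, flag)) != NULL) {
           bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
           bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

           if (starts && ends)
               return true;
           p += len;
       }
       return false;
   }

   /* Scan a cpuinfo-style file for the "sgx" flag.
    * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
   static int cpuinfo_has_sgx(const char *path)
   {
       char buf[8192];
       FILE *f = fopen(path, "r");
       int found = 0;

       if (!f)
           return -1;
       while (fgets(buf, sizeof(buf), f)) {
           if (strncmp(buf, "flags", 5) == 0 && flags_line_has(buf, "sgx")) {
               found = 1;
               break;
           }
       }
       fclose(f);
       return found;
   }

A caller would typically pass ``"/proc/cpuinfo"`` as the path. A result of 0 on
SGX-capable hardware usually points at the BIOS setting described above.
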
Enclave Page Cache
==================

SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated
with an enclave. It is contained in a BIOS-reserved region of physical memory.
Unlike pages used for regular memory, EPC pages can only be accessed from
outside the enclave during enclave construction, using special, limited SGX
instructions.

Only a CPU executing inside an enclave can directly access enclave memory.
However, a CPU executing inside an enclave may access normal memory outside the
enclave.

The kernel manages enclave memory similarly to how it treats device memory.

Enclave Page Types
------------------

**SGX Enclave Control Structure (SECS)**
   The enclave's address range, attributes and other global data are defined
   by this structure.

**Regular (REG)**
   Regular EPC pages contain the code and data of an enclave.

**Thread Control Structure (TCS)**
   Thread Control Structure pages define the entry points to an enclave and
   track the execution state of an enclave thread.

**Version Array (VA)**
   Version Array pages contain 512 slots, each of which can contain a version
   number for a page evicted from the EPC.

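
The 512-slot figure follows from the page geometry: a 4 KiB page divided into
512 slots leaves 8 bytes per slot. As a rough illustration, the sketch below
computes how many VA pages are needed to track a given number of evicted pages.
The macro and function names are hypothetical and are not taken from the kernel
sources.

.. code-block:: c

   #include <stdint.h>

   #define EPC_PAGE_SIZE  4096u  /* EPC pages are 4 KiB */
   #define VA_SLOT_SIZE   8u     /* 4096 bytes / 512 slots */
   #define VA_SLOT_COUNT  (EPC_PAGE_SIZE / VA_SLOT_SIZE)  /* 512 */

   /* Number of VA pages needed to hold version numbers for `evicted`
    * pages, rounding up to whole VA pages. */
   static uint64_t va_pages_needed(uint64_t evicted)
   {
       return (evicted + VA_SLOT_COUNT - 1) / VA_SLOT_COUNT;
   }

For example, evicting 513 pages requires two VA pages, because the first VA
page is full after 512 evictions.
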
Enclave Page Cache Map
----------------------

The processor tracks EPC pages in a hardware metadata structure called the
*Enclave Page Cache Map (EPCM)*. The EPCM contains an entry for each EPC page
which describes the owning enclave, access rights and page type, among other
things.

EPCM permissions are separate from the normal page tables. This prevents the
kernel from, for instance, allowing writes to data which an enclave wishes to
remain read-only. EPCM permissions may only impose additional restrictions on
top of normal x86 page permissions.

For all intents and purposes, the SGX architecture allows the processor to
invalidate all EPCM entries at will. This requires that software be prepared to
handle an EPCM fault at any time. In practice, this can happen on events like
power transitions when the ephemeral key that encrypts enclave memory is lost.

Application interface
=====================

Enclave build functions
-----------------------

In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device. Since enclave memory is protected from direct
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_create
               sgx_ioc_enclave_add_pages
               sgx_ioc_enclave_init
               sgx_ioc_enclave_provision

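
The build steps map onto a short sequence of ioctl() calls on the enclave
device. The sketch below shows only the ordering, and assumes that the kernel
UAPI header ``<asm/sgx.h>`` is installed. ``build_enclave`` and its parameters
are hypothetical. Preparing the SECS, SECINFO and SIGSTRUCT contents and
reserving the enclave address range are out of scope, and a real loader would
also handle partial completion of the add-pages step.

.. code-block:: c

   #include <errno.h>
   #include <fcntl.h>
   #include <stdint.h>
   #include <sys/ioctl.h>
   #include <unistd.h>
   #include <asm/sgx.h>

   /*
    * Hypothetical helper: create, populate and initialize an enclave.
    * The caller supplies a prepared SECS page, page-aligned source data,
    * a SECINFO structure and a SIGSTRUCT.
    * Returns an open enclave fd on success or a negative errno value.
    */
   static int build_enclave(const char *dev, const void *secs,
                            const void *src, uint64_t length,
                            const void *secinfo, const void *sigstruct)
   {
       struct sgx_enclave_create create = { .src = (uintptr_t)secs };
       struct sgx_enclave_add_pages add = {
           .src = (uintptr_t)src,
           .offset = 0,                 /* offset within the enclave */
           .length = length,
           .secinfo = (uintptr_t)secinfo,
           .flags = SGX_PAGE_MEASURE,   /* include pages in the measurement */
       };
       struct sgx_enclave_init init = { .sigstruct = (uintptr_t)sigstruct };
       int fd, err;

       fd = open(dev, O_RDWR);
       if (fd < 0)
           return -errno;

       if (ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create) < 0)
           goto fail;
       if (ioctl(fd, SGX_IOC_ENCLAVE_ADD_PAGES, &add) < 0)
           goto fail;
       if (ioctl(fd, SGX_IOC_ENCLAVE_INIT, &init) < 0)
           goto fail;
       return fd;

   fail:
       err = errno;
       close(fd);
       return -err;
   }

On a system without SGX the open() step fails and the helper returns the
corresponding negative errno value without issuing any ioctl().
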
Enclave runtime management
--------------------------

Systems supporting SGX2 additionally support changes to initialized
enclaves: modifying enclave page permissions and type, and dynamically
adding and removing enclave pages. When an enclave accesses an address
within its address range that does not have a backing page, a new
regular page is dynamically added to the enclave. The enclave is
still required to run EACCEPT on the new page before it can be used.

.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
   :functions: sgx_ioc_enclave_restrict_permissions
               sgx_ioc_enclave_modify_types
               sgx_ioc_enclave_remove_pages

Enclave vDSO
------------

Entering an enclave can only be done through SGX-specific EENTER and ERESUME
functions, and is a non-trivial process. Because of the complexity of
transitioning to and from an enclave, enclaves typically utilize a library to
handle the actual transitions. This is roughly analogous to how glibc
implementations are used by most applications to wrap system calls.

Another crucial characteristic of enclaves is that they can generate exceptions
as part of their normal operation that need to be handled in the enclave or are
unique to SGX.

Instead of the traditional signal mechanism to handle these exceptions, SGX
can leverage special exception fixup provided by the vDSO. The kernel-provided
vDSO function wraps low-level transitions to/from the enclave like EENTER and
ERESUME. The vDSO function intercepts exceptions that would otherwise generate
a signal and returns the fault information directly to its caller. This avoids
the need to juggle signal handlers.

.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
   :functions: vdso_sgx_enter_enclave_t

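
After the vDSO function returns, the caller can inspect the
``struct sgx_enclave_run`` it passed in to tell a normal exit from a reported
exception. The sketch below covers only that inspection step. It assumes
``<asm/sgx.h>`` is installed, the helper ``describe_enclave_exit`` is
hypothetical, and locating and calling the vDSO symbol is not shown.

.. code-block:: c

   #include <stdio.h>
   #include <asm/sgx.h>

   /* ENCLU leaf numbers as defined by the SGX architecture. */
   enum { ENCLU_EENTER = 2, ENCLU_ERESUME = 3, ENCLU_EEXIT = 4 };

   /*
    * Hypothetical helper: summarize how the enclave was left, based on the
    * struct sgx_enclave_run that the vDSO function filled in.
    * Returns 0 for a normal EEXIT and 1 if an exception was reported.
    */
   static int describe_enclave_exit(const struct sgx_enclave_run *run,
                                    char *buf, size_t len)
   {
       if (run->function == ENCLU_EEXIT) {
           snprintf(buf, len, "enclave exited normally");
           return 0;
       }
       /* The last ENCLU leaf was EENTER or ERESUME: a fault was intercepted. */
       snprintf(buf, len, "exception vector %u (error code 0x%x) at 0x%llx",
                (unsigned int)run->exception_vector,
                (unsigned int)run->exception_error_code,
                (unsigned long long)run->exception_addr);
       return 1;
   }

A runtime would typically use the exception information to decide whether to
re-enter the enclave with ERESUME or to treat the fault as fatal.
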
ksgxd
=====

SGX support includes a kernel thread called *ksgxd*.

EPC sanitization
----------------

ksgxd is started when SGX initializes. Enclave memory is typically ready
for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.

The sanitization is done by walking the EPC address space and applying the
EREMOVE function to each physical page. Some enclave pages, like SECS pages,
have hardware dependencies on other pages which prevent EREMOVE from
functioning. Executing two EREMOVE passes removes the dependencies.

Page reclaimer
--------------

Similar to the core kswapd, ksgxd is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
*ksgxd* “swaps” enclave memory to normal memory.

Launch Control
==============

SGX provides a launch control mechanism. After all enclave pages have been
copied, the kernel executes the EINIT function, which initializes the enclave.
Only after this can the CPU execute inside the enclave.

The EINIT function takes an RSA-3072 signature of the enclave measurement. The
function checks that the measurement is correct and that the signature was made
with the key whose SHA256 hash is held in the four
**IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs.

Those MSRs can be configured by the BIOS to be either readable or writable.
Linux supports only the writable configuration in order to give the kernel full
control over launch control policy. Before calling the EINIT function, the
driver sets the MSRs to match the enclave's signing key.

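
A SHA256 digest is 32 bytes, which is why four 64-bit MSRs are needed to hold
it. The sketch below splits a digest into the four values. The function name is
hypothetical, and the byte order is an assumption: each 8-byte chunk is read as
a little-endian integer, which is what a plain memory copy yields on x86.
Computing the digest itself is not shown.

.. code-block:: c

   #include <stdint.h>

   /*
    * Split a 32-byte SHA256 digest into four 64-bit values, one per
    * IA32_SGXLEPUBKEYHASH{0, 1, 2, 3} MSR. Assumption: each 8-byte chunk
    * is interpreted as a little-endian integer.
    */
   static void digest_to_lepubkeyhash(const uint8_t digest[32], uint64_t msr[4])
   {
       for (int i = 0; i < 4; i++) {
           msr[i] = 0;
           /* Walk the chunk from its last byte so byte 0 ends up lowest. */
           for (int j = 7; j >= 0; j--)
               msr[i] = (msr[i] << 8) | digest[i * 8 + j];
       }
   }
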
Encryption engines
==================

In order to conceal the enclave data while it is out of the CPU package, the
memory controller has an encryption engine to transparently encrypt and decrypt
enclave memory.

In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to
encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with root in
SRAM to maintain integrity of the encrypted data. This provides integrity and
anti-replay protection but does not scale to large memory sizes because the time
required to update the Merkle tree grows logarithmically in relation to the
memory size.

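
The logarithmic growth can be made concrete by counting tree levels. The sketch
below is purely illustrative arithmetic: the function name is hypothetical, and
the arity and leaf counts used with it are example values, not a description of
the actual MEE layout.

.. code-block:: c

   #include <stdint.h>

   /* Levels needed for a tree with `arity` children per node to cover
    * `leaves` leaf blocks: the smallest d with arity^d >= leaves. */
   static unsigned int tree_levels(uint64_t leaves, unsigned int arity)
   {
       unsigned int levels = 0;
       uint64_t covered = 1;

       if (arity < 2)
           return 0;
       while (covered < leaves) {
           /* One more level always suffices once the product would overflow. */
           if (covered > UINT64_MAX / arity)
               return levels + 1;
           covered *= arity;
           levels++;
       }
       return levels;
   }

With an example arity of 8, covering 8 times as many leaf blocks adds one level
to the tree, so every update has one more node to recompute on the path to the
root.
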
CPUs starting from Ice Lake use Total Memory Encryption (TME) in place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay attacks are not mitigated. However, TME includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being created.

DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).

Usage Models
============

Shared Library
--------------

Sensitive data and the code that acts on it are partitioned from the application
into a separate library. The library is then linked as a DSO which can be loaded
into an enclave. The application can then make individual function calls into
the enclave through special SGX instructions. A run-time within the enclave is
configured to marshal function parameters into and out of the enclave and to
call the correct library function.

Application Container
---------------------

An application may be loaded into a container enclave which is specially
configured with a library OS and run-time which permits the application to run.
The enclave run-time and library OS work together to execute the application
when a thread enters the enclave.

Impact of Potential Kernel SGX Bugs
===================================

EPC leaks
---------

When EPC page leaks happen, a WARNING like this is shown in dmesg:

"EREMOVE returned ... and an EPC page was leaked.  SGX may become unusable..."

This is effectively a kernel use-after-free of an EPC page, and due
to the way SGX works, the bug is detected at freeing. Rather than
adding the page back to the pool of available EPC pages, the kernel
intentionally leaks the page to avoid additional errors in the future.

When this happens, the kernel will likely soon leak more EPC pages, and
SGX will likely become unusable because the memory available to SGX is
limited. However, while this may be fatal to SGX, the rest of the kernel
is unlikely to be impacted and should continue to work.

As a result, when this happens, users should stop running any new
SGX workloads (or just any new workloads) and migrate all valuable
workloads. Although a machine reboot can recover all EPC memory, the bug
should be reported to Linux developers.

Virtual EPC
===========

The implementation also has a virtual EPC driver to support SGX enclaves
in guests. Unlike the SGX driver, an EPC page allocated by the virtual
EPC driver doesn't have a specific enclave associated with it. This is
because KVM doesn't track how a guest uses EPC pages.

As a result, the SGX core page reclaimer doesn't support reclaiming EPC
pages allocated to KVM guests through the virtual EPC driver. If the
user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by subtracting the
total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.

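
The reservation rule is a simple subtraction. The sketch below illustrates it
with a hypothetical helper, ``host_epc_bytes``. The sizes passed to it come from
the administrator's own configuration, and the function is not a kernel or KVM
interface.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* EPC left for host enclaves after subtracting every VM's virtual EPC.
    * Returns 0 if the guests alone already exceed the physical EPC. */
   static uint64_t host_epc_bytes(uint64_t physical_epc,
                                  const uint64_t *vepc_sizes, size_t n)
   {
       uint64_t remaining = physical_epc;

       for (size_t i = 0; i < n; i++) {
           if (vepc_sizes[i] > remaining)
               return 0;
           remaining -= vepc_sizes[i];
       }
       return remaining;
   }

For example, with 128 MiB of physical EPC and two guests using 32 MiB and
16 MiB of virtual EPC, 80 MiB remains for host enclaves.
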
The architectural behavior is to restore all EPC pages to an uninitialized
state after a guest reboot as well. Because this state can be reached only
through the privileged ``ENCLS[EREMOVE]`` instruction, ``/dev/sgx_vepc``
provides the ``SGX_IOC_VEPC_REMOVE_ALL`` ioctl to execute the instruction
on all pages in the virtual EPC.

``EREMOVE`` can fail for three reasons. Userspace must pay attention
to expected failures and handle them as follows:

1. Page removal will always fail when any thread is running in the
   enclave to which the page belongs. In this case the ioctl will
   return ``EBUSY`` regardless of whether it has successfully removed
   some pages; userspace can avoid these failures by preventing execution
   of any vcpu which maps the virtual EPC.

2. Page removal will cause a general protection fault if two calls to
   ``EREMOVE`` happen concurrently for pages that refer to the same
   "SECS" metadata pages. This can happen if there are concurrent
   invocations to ``SGX_IOC_VEPC_REMOVE_ALL``, or if a ``/dev/sgx_vepc``
   file descriptor in the guest is closed at the same time as
   ``SGX_IOC_VEPC_REMOVE_ALL``; it will also be reported as ``EBUSY``.
   This can be avoided in userspace by serializing calls to the ioctl()
   and to close(), but in general it should not be a problem.

3. Finally, page removal will fail for SECS metadata pages which still
   have child pages. Child pages can be removed by executing
   ``SGX_IOC_VEPC_REMOVE_ALL`` on all ``/dev/sgx_vepc`` file descriptors
   mapped into the guest. This means that the ioctl() must be called
   twice: an initial set of calls to remove child pages and a subsequent
   set of calls to remove SECS pages. The second set of calls is only
   required for those mappings that returned a nonzero value from the
   first call. It indicates a bug in the kernel or the userspace client
   if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
   a return code other than 0.

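
The two rounds of calls described in item 3 can be sketched as follows. This is
a minimal illustration, not a reference implementation: ``vepc_reset`` is a
hypothetical helper, the caller is assumed to have stopped all vcpus and to
serialize calls as described in items 1 and 2, and the fallback definition of
``SGX_IOC_VEPC_REMOVE_ALL`` only mirrors the value in ``<asm/sgx.h>`` for
systems whose installed headers predate the ioctl.

.. code-block:: c

   #include <errno.h>
   #include <stddef.h>
   #include <stdlib.h>
   #include <sys/ioctl.h>

   #ifndef SGX_IOC_VEPC_REMOVE_ALL
   #define SGX_IOC_VEPC_REMOVE_ALL _IO(0xA4, 0x04)  /* mirrors <asm/sgx.h> */
   #endif

   /*
    * Hypothetical helper: reset all virtual EPC mappings of one guest.
    * Returns 0 on success, a negative errno value if an ioctl fails, or
    * -EIO if pages remain after the second round (unexpected, see item 3).
    */
   static int vepc_reset(const int *fds, size_t n)
   {
       unsigned char *pending = calloc(n ? n : 1, 1);
       int ret = 0;

       if (!pending)
           return -ENOMEM;

       /* Round 1: removes child pages; SECS pages with children remain. */
       for (size_t i = 0; i < n; i++) {
           int r = ioctl(fds[i], SGX_IOC_VEPC_REMOVE_ALL);

           if (r < 0) {
               ret = -errno;
               goto out;
           }
           pending[i] = r > 0;
       }

       /* Round 2: only for mappings that returned a nonzero value. */
       for (size_t i = 0; i < n; i++) {
           int r;

           if (!pending[i])
               continue;
           r = ioctl(fds[i], SGX_IOC_VEPC_REMOVE_ALL);
           if (r < 0) {
               ret = -errno;
               goto out;
           }
           if (r > 0) {
               ret = -EIO;
               goto out;
           }
       }
   out:
       free(pending);
       return ret;
   }

A return value of ``-EBUSY`` corresponds to the expected failures in items 1
and 2 and can be retried once the cause has been addressed.
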