.. SPDX-License-Identifier: GPL-2.0

==========================
The Linux Microcode Loader
==========================

:Authors: - Fenghua Yu <[email protected]>
          - Borislav Petkov <[email protected]>
          - Ashok Raj <[email protected]>

The kernel has an x86 microcode loading facility which provides
microcode loading methods in the OS. Potential use cases are
updating the microcode on platforms beyond the OEM End-Of-Life support,
and updating the microcode on long-running systems without rebooting.

The loader supports three loading methods:

Early load microcode
====================

The kernel can update microcode very early during boot. Loading
microcode early can fix CPU issues before they are observed during
kernel boot time.

The microcode is stored in an initrd file. During boot, it is read from
the initrd and loaded into the CPU cores.

The format of the combined initrd image is microcode in (uncompressed)
cpio format followed by the (possibly compressed) initrd image. The
loader parses the combined initrd image during boot.

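
As a quick sanity check of the layout described above, the following
sketch tests whether an image begins with an uncompressed newc cpio
archive by looking for the ``070701`` magic at offset 0. The function
name ``starts_with_newc_cpio`` is made up for illustration, and the check
says nothing about what the archive actually contains::

  #!/bin/bash
  # Hypothetical helper: succeed if FILE starts with the newc cpio magic.
  starts_with_newc_cpio() {
      local magic
      # Read the first six bytes; drop NULs so binary input is harmless.
      magic="$(head -c 6 -- "$1" 2>/dev/null | tr -d '\0')"
      [ "$magic" = "070701" ]
  }

A compressed initrd without a prepended microcode archive would start
with the compressor's own magic instead, so the function returns
non-zero for it.
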
The microcode files in cpio name space are:

on Intel:
  kernel/x86/microcode/GenuineIntel.bin
on AMD  :
  kernel/x86/microcode/AuthenticAMD.bin

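
The mapping from CPU vendor to archive member can be sketched as a small
shell function. It assumes the vendor string is the ``vendor_id`` value
as shown in ``/proc/cpuinfo``; the function name is made up for this
example::

  #!/bin/bash
  # Hypothetical helper: print the cpio member name for a vendor_id string.
  ucode_cpio_path() {
      case "$1" in
          GenuineIntel|AuthenticAMD)
              echo "kernel/x86/microcode/$1.bin" ;;
          *)
              # Vendors not listed in this document are not handled here.
              return 1 ;;
      esac
  }

On a running system it could be fed with
``awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo``.
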
During BSP (BootStrapping Processor) boot (pre-SMP), the kernel
scans the microcode file in the initrd. If microcode matching the
CPU is found, it will be applied in the BSP and later on in all APs
(Application Processors).

The loader also saves the matching microcode for the CPU in memory.
Thus, the cached microcode patch is applied when CPUs resume from a
sleep state.

Here's a crude example of how to prepare an initrd with microcode (this
is normally done automatically by the distribution when recreating the
initrd, so you don't really have to do it yourself. It is documented
here for future reference only).
::

  #!/bin/bash
  # Prepend an uncompressed microcode cpio archive to an existing initrd.

  if [ -z "$1" ]; then
      echo "You need to supply an initrd file"
      exit 1
  fi

  # Resolve to an absolute path because the script changes directory below.
  INITRD="$(readlink -f "$1")"
  DSTDIR=kernel/x86/microcode
  TMPDIR=/tmp/initrd

  rm -rf "$TMPDIR"
  mkdir "$TMPDIR"
  cd "$TMPDIR" || exit 1
  mkdir -p "$DSTDIR"

  # Concatenate all vendor blobs into the file name the early loader expects.
  if [ -d /lib/firmware/amd-ucode ]; then
      cat /lib/firmware/amd-ucode/microcode_amd*.bin > "$DSTDIR/AuthenticAMD.bin"
  fi

  if [ -d /lib/firmware/intel-ucode ]; then
      cat /lib/firmware/intel-ucode/* > "$DSTDIR/GenuineIntel.bin"
  fi

  # Pack the tree in newc format; the archive lands in /tmp/ucode.cpio.
  find . | cpio -o -H newc > ../ucode.cpio
  cd ..
  # Keep the original initrd and write the combined image in its place.
  mv "$INITRD" "$INITRD.orig"
  cat ucode.cpio "$INITRD.orig" > "$INITRD"

  rm -rf "$TMPDIR"

The system needs to have the microcode packages installed into
/lib/firmware, or you need to fix up the paths above if yours are
somewhere else and/or you've downloaded them directly from the processor
vendor's site.

Late loading
============

You simply install the microcode packages your distro supplies and
run::

  # echo 1 > /sys/devices/system/cpu/microcode/reload

as root.

The loading mechanism looks for microcode blobs in
/lib/firmware/{intel-ucode,amd-ucode}. The default distro installation
packages already put them there.

Since kernel 5.19, late loading is not enabled by default.

The /dev/cpu/microcode method has been removed in 5.19.

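
To see whether a reload changed anything, the revision can be compared
before and after. A minimal sketch, assuming an x86 ``/proc/cpuinfo``
that reports a ``microcode`` field for each logical CPU; the helper name
``ucode_revisions`` is made up for illustration::

  #!/bin/bash
  # Hypothetical helper: print each distinct microcode revision found in a
  # cpuinfo-style file (defaults to /proc/cpuinfo), one per line.
  ucode_revisions() {
      awk -F': *' '/^microcode[[:space:]]*:/ { print $2 }' \
          "${1:-/proc/cpuinfo}" | sort -u
  }

Running ``ucode_revisions`` before and after the ``echo 1`` above shows
whether the revision moved. More than one line of output means that not
all logical CPUs report the same revision.
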
Why is late loading dangerous?
==============================

Synchronizing all CPUs
----------------------

The microcode engine which receives the microcode update is shared
between the two logical threads in an SMT system. Therefore, when
the update is executed on one SMT thread of the core, the sibling
"automatically" gets the update.

Since the microcode can "simulate" MSRs too, while the microcode update
is in progress, those simulated MSRs transiently cease to exist. This
can result in unpredictable results if the SMT sibling thread happens to
be in the middle of an access to such an MSR. The usual observation is
that such MSR accesses cause #GPs to be raised to signal that the MSRs
are not present.

The disappearing MSRs are just one common issue which is being observed.
Any other instruction that's being patched and gets concurrently
executed by the other SMT sibling can also result in similar,
unpredictable behavior.

To eliminate this case, a stop_machine()-based CPU synchronization was
introduced as a way to guarantee that all logical CPUs will not execute
any code but just wait in a spin loop, polling an atomic variable.

While this took care of device or external interrupts and IPIs, including
LVT ones such as CMCI, it cannot address other special interrupts
that can't be shut off. Those are Machine Check (#MC), System Management
(#SMI) and Non-Maskable interrupts (#NMI).

Machine Checks
--------------

Machine Checks (#MC) are non-maskable. There are two kinds of MCEs:
fatal unrecoverable MCEs and recoverable MCEs. While unrecoverable
errors are always fatal, recoverable errors which happen in kernel
context are also treated as fatal by the kernel.

On certain Intel machines, MCEs are also broadcast to all threads in a
system. If one thread is in the middle of executing WRMSR, an MCE will be
taken at the end of the flow. Either way, the other threads will wait for
the thread performing the wrmsr(0x79) to rendezvous in the MCE handler,
and the system will eventually shut down if any of the threads fail to
check in to the MCE rendezvous.

To be paranoid and get predictable behavior, the OS can choose to set
MCG_STATUS.MCIP. Since at most one MCE can be in progress in a system,
an MCE signaled while MCIP is set is promoted to a system reset
automatically. The OS can turn off MCIP at the end of the update for that
core.

System Management Interrupt
---------------------------

SMIs are also broadcast to all CPUs in the platform. Microcode update
requests exclusive access to the core before writing to MSR 0x79. So if
one thread is in the WRMSR flow and the second thread gets an SMI, that
second thread will be stopped in the first instruction in the SMI
handler.

Since the secondary thread is stopped in the first instruction in SMI,
there is very little chance that it would be in the middle of executing
an instruction being patched. Besides, the OS has no way to stop SMIs
from happening.

Non-Maskable Interrupts
-----------------------

When thread0 of a core is doing the microcode update, if thread1 is
pulled into an NMI, that can cause unpredictable behavior due to the
reasons above.

The OS can choose a variety of methods to avoid running into this
situation.

Is the microcode suitable for late loading?
-------------------------------------------

Late loading is done when the system is fully operational and running
real workloads. Late loading behavior depends on what the base patch on
the CPU is before upgrading to the new patch.

This is true for Intel CPUs.

Consider, for example, a CPU at patch level 1 which is being updated to
patch level 3.

Between patch1 and patch3, patch2 might have deprecated a software-visible
feature.

This is unacceptable if software is even potentially using that feature.
For instance, if MSR_X is no longer available after an update,
accessing that MSR will cause a #GP fault.

Basically there is no way to declare a new microcode update suitable
for late-loading. This is another one of the problems that caused late
loading to be not enabled by default.

Builtin microcode
=================

The loader also supports loading of builtin microcode supplied through
the regular builtin firmware method CONFIG_EXTRA_FIRMWARE. Only 64-bit is
currently supported.

Here's an example::

  CONFIG_EXTRA_FIRMWARE="intel-ucode/06-3a-09 amd-ucode/microcode_amd_fam15h.bin"
  CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"

This basically means you have the following tree structure locally::

  /lib/firmware/
  |-- amd-ucode
  ...
  |   |-- microcode_amd_fam15h.bin
  ...
  |-- intel-ucode
  ...
  |   |-- 06-3a-09
  ...

so that the build system can find those files and integrate them into
the final kernel image. The early loader finds them and applies them.

Needless to say, this method is not the most flexible one because it
requires rebuilding the kernel each time updated microcode from the CPU
vendor is available.

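
The Intel file name in the example above follows a
family-model-stepping pattern in two-digit lowercase hex. A minimal
sketch for deriving it, assuming the decimal ``cpu family``, ``model``
and ``stepping`` values as printed in ``/proc/cpuinfo``; the function
name ``intel_ucode_name`` is hypothetical::

  #!/bin/bash
  # Hypothetical helper: format decimal family, model and stepping as the
  # file name used under intel-ucode/.
  intel_ucode_name() {
      printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
  }

For example, ``intel_ucode_name 6 58 9`` prints ``06-3a-09``, the name
used in the CONFIG_EXTRA_FIRMWARE line above.
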