.. SPDX-License-Identifier: GPL-2.0

===============
Boot Interrupts
===============

:Author: - Sean V Kelley <[email protected]>

Overview
========

On PCI Express, interrupts are represented with either MSI or inbound
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
given Core IO converts the legacy interrupt messages from PCI Express to
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
IO-APIC table entries), the messages are routed to the legacy PCH. This
in-band interrupt mechanism was traditionally necessary for systems that
did not support the IO-APIC and for boot. Intel in the past has used the
term "boot interrupts" to describe this mechanism. Further, the PCI Express
protocol describes this in-band legacy wire-interrupt INTx mechanism for
I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
describe problems with the Core IO handling of INTx message routing to the
PCH and mitigation within BIOS and the OS.

Issue
=====

When in-band legacy INTx messages are forwarded to the PCH, they in turn
trigger a new interrupt for which the OS likely lacks a handler. Interrupts
that go unhandled over time are tracked by the Linux kernel as spurious
interrupts. The kernel disables the IRQ after the unhandled count reaches a
specific threshold and reports the error "nobody cared". The disabled IRQ
then prevents valid usage by any existing interrupt that happens to share
the IRQ line::

  irq 19: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
  Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
  Call Trace:

  <IRQ>
   ? dump_stack+0x46/0x5e
   ? __report_bad_irq+0x2e/0xb0
   ? note_interrupt+0x242/0x290
   ? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
   ? handle_irq_event_percpu+0x55/0x70
   ? handle_irq_event+0x4f/0x80
   ? handle_fasteoi_irq+0x81/0x180
   ? handle_irq+0x1c/0x30
   ? do_IRQ+0x41/0xd0
   ? common_interrupt+0x84/0x84
  </IRQ>

  handlers:
    irq_default_primary_handler threaded usb_hcd_irq
  Disabling IRQ #19
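
The accounting that leads to "nobody cared" can be modelled in a few lines
of userspace C. The sketch below is a simplified illustration, not the
kernel's code: the names irq_model and model_note_interrupt are
hypothetical, and the window of 100000 interrupts with a limit of 99900
unhandled ones is an assumption based on kernel/irq/spurious.c that should
be checked against the tree in use. The kernel's time-based reset of the
unhandled counter is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds; verify against kernel/irq/spurious.c in your tree. */
#define IRQ_WINDOW      100000u  /* interrupts per accounting window */
#define UNHANDLED_LIMIT  99900u  /* unhandled count that trips the disable */

/* Hypothetical per-line state, loosely modelled on the kernel's irq_desc. */
struct irq_model {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* set once the line has been shut off */
};

/*
 * Account for one interrupt on the line. 'handled' is false when no
 * registered handler claimed the interrupt. Returns true only on the
 * call that disables the line.
 */
static bool model_note_interrupt(struct irq_model *irq, bool handled)
{
	assert(irq != NULL);

	if (irq->disabled)
		return false;

	if (!handled)
		irq->irqs_unhandled++;

	if (++irq->irq_count < IRQ_WINDOW)
		return false;

	/* Window complete: evaluate it, then start a fresh one. */
	irq->irq_count = 0;
	if (irq->irqs_unhandled > UNHANDLED_LIMIT) {
		irq->disabled = true; /* the kernel reports "nobody cared" here */
		return true;
	}
	irq->irqs_unhandled = 0;
	return false;
}
```

Feeding the model 100000 consecutive unhandled interrupts disables the
line, which is the situation a stream of forwarded boot interrupts creates
on a shared IRQ. A line on which half of the interrupts are handled stays
enabled.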

Conditions
==========

The use of threaded interrupts is the most likely condition to trigger
this problem today. With a threaded interrupt, the interrupt line may not
be re-enabled as soon as the primary IRQ handler wakes the thread. These
"one shot" conditions mean that the interrupt line needs to stay masked
until the threaded handler has run. Especially when dealing with high data
rate interrupts, the thread needs to run to completion; otherwise some
handlers will end up in stack overflows since the interrupt of the issuing
device is still active.

Affected Chipsets
=================

The legacy interrupt forwarding mechanism exists today in a number of
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
Intel. Changes made through the mitigations below have been applied to
drivers/pci/quirks.c.

Starting with ICX there are no longer any IO-APICs in the Core IO's
devices. The IO-APIC is only in the PCH. Devices connected to the Core IO's
PCIe Root Ports will use native MSI/MSI-X mechanisms.

Mitigations
===========

The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
added. [1]_

Intel® 6300ESB I/O Controller Hub
  Alternate Base Address Register:
    BIE: Boot Interrupt Enable

      ==  ===========================
      0   Boot interrupt is enabled.
      1   Boot interrupt is disabled.
      ==  ===========================

Intel® Sandy Bridge through Sky Lake based Xeon servers:
  Coherent Interface Protocol Interrupt Control
    dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
      When this bit is set, local INTx messages received from the
      Intel® Quick Data DMA/PCI Express ports are not routed to the legacy
      PCH; they are either converted into MSI via the integrated IO-APIC
      (if the IO-APIC mask bit is clear in the appropriate entries)
      or cause no further action (when the mask bit is set).
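
A quirk of this kind comes down to a read-modify-write of one configuration
register. As a rough userspace sketch of that pattern for the Xeon case,
the following models configuration space as a byte array. The fake_pci_dev
type, the accessors, and the EXAMPLE_ offset and bit position are
hypothetical placeholders, not values stated in this document; the real
ones must be taken from the datasheets listed under More Documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 4096u  /* size of PCIe extended configuration space */

/* Hypothetical placeholders; take the real values from the datasheet. */
#define EXAMPLE_CIPINTRC_OFFSET    0x14Cu
#define EXAMPLE_DIS_INTX_ROUTE2PCH (1u << 25)

/* Stand-in for a device: configuration space as little-endian bytes. */
struct fake_pci_dev {
	uint8_t cfg[CFG_SPACE_SIZE];
};

/* Read an aligned 32-bit register from the modelled configuration space. */
static uint32_t cfg_read32(const struct fake_pci_dev *dev, size_t off)
{
	assert(off % 4 == 0 && off + 4 <= CFG_SPACE_SIZE);
	return (uint32_t)dev->cfg[off] |
	       (uint32_t)dev->cfg[off + 1] << 8 |
	       (uint32_t)dev->cfg[off + 2] << 16 |
	       (uint32_t)dev->cfg[off + 3] << 24;
}

/* Write an aligned 32-bit register to the modelled configuration space. */
static void cfg_write32(struct fake_pci_dev *dev, size_t off, uint32_t val)
{
	assert(off % 4 == 0 && off + 4 <= CFG_SPACE_SIZE);
	for (int i = 0; i < 4; i++)
		dev->cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/*
 * Set the "do not route INTx to the PCH" bit and leave every other bit
 * untouched. Returns 1 if the register was changed, 0 if the bit was
 * already set.
 */
static int disable_boot_interrupt(struct fake_pci_dev *dev)
{
	uint32_t val = cfg_read32(dev, EXAMPLE_CIPINTRC_OFFSET);

	if (val & EXAMPLE_DIS_INTX_ROUTE2PCH)
		return 0;

	cfg_write32(dev, EXAMPLE_CIPINTRC_OFFSET,
		    val | EXAMPLE_DIS_INTX_ROUTE2PCH);
	return 1;
}
```

Checking the bit before writing keeps the operation idempotent, so running
the quirk a second time (for example on resume) does not touch the register
again.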

In the absence of a way to directly disable the routing, another approach
has been to make use of PCI Interrupt pin to INTx routing tables for
purposes of redirecting the interrupt handler to the rerouted interrupt
line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ
line due to excessive unhandled counts. [2]_

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute. [3]_
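
The precedence between the config option and the command-line overrides can
be sketched as follows. This is a userspace illustration with a
hypothetical reroute_enabled helper, not the kernel's option parser. It
assumes that the pci= value is a comma-separated list and that, when both
overrides appear, the later one wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the token p[0..len) is exactly the option name 'opt'. */
static bool token_is(const char *p, size_t len, const char *opt)
{
	return len == strlen(opt) && strncmp(p, opt, len) == 0;
}

/*
 * Decide whether handler rerouting is active.
 *
 * config_default models X86_REROUTE_FOR_BROKEN_BOOT_IRQS (true means =y).
 * pci_opts is the comma-separated value of the pci= parameter, or NULL
 * if the parameter was not given.
 */
static bool reroute_enabled(bool config_default, const char *pci_opts)
{
	bool enabled = config_default;
	const char *p = pci_opts;

	while (p != NULL && *p != '\0') {
		size_t len = strcspn(p, ",");

		if (token_is(p, len, "ioapicreroute"))
			enabled = true;
		else if (token_is(p, len, "noioapicreroute"))
			enabled = false;
		/* Any other pci= option leaves the setting unchanged. */

		p += len;
		if (*p == ',')
			p++;
	}
	return enabled;
}
```

With this model, a kernel built without the option still reroutes when
booted with pci=ioapicreroute, and a kernel built with it stops rerouting
when booted with pci=noioapicreroute.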

More Documentation
==================

There is an overview of the legacy interrupt handling in several datasheets
(6300ESB and 6700PXH below). While largely the same, they provide insight
into how the handling evolved across chipsets.

Example of disabling of the boot interrupt
------------------------------------------

      - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
        5.7.3 Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf

      - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
        Datasheet - Volume 2: Registers (Document # 330784-003)
        6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
        https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Example of handler rerouting
----------------------------

      - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
        2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf

If you have any legacy PCI interrupt questions that aren't answered, email me.

| Cheers,
|    Sean V Kelley
|    [email protected]

.. [1] https://lore.kernel.org/r/[email protected]/
.. [2] https://lore.kernel.org/r/[email protected]/
.. [3] https://lore.kernel.org/r/[email protected]/