docs: add IRQ documentation at the core-api book
There are 4 IRQ documentation files under Documentation/*.txt. Move them into a new directory (core-api/irq) and add a new index file for it. While here, use a title markup for the Debugging section of the irq-domain.rst file. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/2da7485c3718e1442e6b4c2dd66857b776e8899b.1588345503.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This commit is contained in:

committed by
Jonathan Corbet

parent
a74e2a2264
commit
e00b0ab86c
24
Documentation/core-api/irq/concepts.rst
Normal file
24
Documentation/core-api/irq/concepts.rst
Normal file
@@ -0,0 +1,24 @@
|
||||
===============
|
||||
What is an IRQ?
|
||||
===============
|
||||
|
||||
An IRQ is an interrupt request from a device.
|
||||
Currently they can come in over a pin, or over a packet.
|
||||
Several devices may be connected to the same pin thus
|
||||
sharing an IRQ.
|
||||
|
||||
An IRQ number is a kernel identifier used to talk about a hardware
|
||||
interrupt source. Typically this is an index into the global irq_desc
|
||||
array, but except for what linux/interrupt.h implements the details
|
||||
are architecture specific.
|
||||
|
||||
An IRQ number is an enumeration of the possible interrupt sources on a
|
||||
machine. Typically what is enumerated is the number of input pins on
|
||||
all of the interrupt controller in the system. In the case of ISA
|
||||
what is enumerated are the 16 input pins on the two i8259 interrupt
|
||||
controllers.
|
||||
|
||||
Architectures can assign additional meaning to the IRQ numbers, and
|
||||
are encouraged to in the case where there is any manual configuration
|
||||
of the hardware involved. The ISA IRQs are a classic example of
|
||||
assigning this kind of additional meaning.
|
11
Documentation/core-api/irq/index.rst
Normal file
11
Documentation/core-api/irq/index.rst
Normal file
@@ -0,0 +1,11 @@
|
||||
====
|
||||
IRQs
|
||||
====
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
concepts
|
||||
irq-affinity
|
||||
irq-domain
|
||||
irqflags-tracing
|
70
Documentation/core-api/irq/irq-affinity.rst
Normal file
70
Documentation/core-api/irq/irq-affinity.rst
Normal file
@@ -0,0 +1,70 @@
|
||||
================
|
||||
SMP IRQ affinity
|
||||
================
|
||||
|
||||
ChangeLog:
|
||||
- Started by Ingo Molnar <mingo@redhat.com>
|
||||
- Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||
|
||||
|
||||
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
|
||||
which target CPUs are permitted for a given IRQ source. It's a bitmask
|
||||
(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not
|
||||
allowed to turn off all CPUs, and if an IRQ controller does not support
|
||||
IRQ affinity then the value will not change from the default of all cpus.
|
||||
|
||||
/proc/irq/default_smp_affinity specifies default affinity mask that applies
|
||||
to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
|
||||
will be set to the default mask. It can then be changed as described above.
|
||||
Default mask is 0xffffffff.
|
||||
|
||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||
it to CPU4-7 (this is an 8-CPU SMP box)::
|
||||
|
||||
[root@moon 44]# cd /proc/irq/44
|
||||
[root@moon 44]# cat smp_affinity
|
||||
ffffffff
|
||||
|
||||
[root@moon 44]# echo 0f > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
0000000f
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
...
|
||||
--- hell ping statistics ---
|
||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||
|
||||
As can be seen from the line above IRQ44 was delivered only to the first four
|
||||
processors (0-3).
|
||||
Now lets restrict that IRQ to CPU(4-7).
|
||||
|
||||
::
|
||||
|
||||
[root@moon 44]# echo f0 > smp_affinity
|
||||
[root@moon 44]# cat smp_affinity
|
||||
000000f0
|
||||
[root@moon 44]# ping -f h
|
||||
PING hell (195.4.7.3): 56 data bytes
|
||||
..
|
||||
--- hell ping statistics ---
|
||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||
[root@moon 44]# cat /proc/interrupts | 'CPU\|44:'
|
||||
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||
|
||||
This time around IRQ44 was delivered only to the last four processors.
|
||||
i.e counters for the CPU0-3 did not change.
|
||||
|
||||
Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
|
||||
|
||||
[root@moon 44]# echo 1024-1031 > smp_affinity_list
|
||||
[root@moon 44]# cat smp_affinity_list
|
||||
1024-1031
|
||||
|
||||
Note that to do this with a bitmask would require 32 bitmasks of zero
|
||||
to follow the pertinent one.
|
270
Documentation/core-api/irq/irq-domain.rst
Normal file
270
Documentation/core-api/irq/irq-domain.rst
Normal file
@@ -0,0 +1,270 @@
|
||||
===============================================
|
||||
The irq_domain interrupt number mapping library
|
||||
===============================================
|
||||
|
||||
The current design of the Linux kernel uses a single large number
|
||||
space where each separate IRQ source is assigned a different number.
|
||||
This is simple when there is only one interrupt controller, but in
|
||||
systems with multiple interrupt controllers the kernel must ensure
|
||||
that each one gets assigned non-overlapping allocations of Linux
|
||||
IRQ numbers.
|
||||
|
||||
The number of interrupt controllers registered as unique irqchips
|
||||
show a rising tendency: for example subdrivers of different kinds
|
||||
such as GPIO controllers avoid reimplementing identical callback
|
||||
mechanisms as the IRQ core system by modelling their interrupt
|
||||
handlers as irqchips, i.e. in effect cascading interrupt controllers.
|
||||
|
||||
Here the interrupt number loose all kind of correspondence to
|
||||
hardware interrupt numbers: whereas in the past, IRQ numbers could
|
||||
be chosen so they matched the hardware IRQ line into the root
|
||||
interrupt controller (i.e. the component actually fireing the
|
||||
interrupt line to the CPU) nowadays this number is just a number.
|
||||
|
||||
For this reason we need a mechanism to separate controller-local
|
||||
interrupt numbers, called hardware irq's, from Linux IRQ numbers.
|
||||
|
||||
The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
|
||||
irq numbers, but they don't provide any support for reverse mapping of
|
||||
the controller-local IRQ (hwirq) number into the Linux IRQ number
|
||||
space.
|
||||
|
||||
The irq_domain library adds mapping between hwirq and IRQ numbers on
|
||||
top of the irq_alloc_desc*() API. An irq_domain to manage mapping is
|
||||
preferred over interrupt controller drivers open coding their own
|
||||
reverse mapping scheme.
|
||||
|
||||
irq_domain also implements translation from an abstract irq_fwspec
|
||||
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
|
||||
be easily extended to support other IRQ topology data sources.
|
||||
|
||||
irq_domain usage
|
||||
================
|
||||
|
||||
An interrupt controller driver creates and registers an irq_domain by
|
||||
calling one of the irq_domain_add_*() functions (each mapping method
|
||||
has a different allocator function, more on that later). The function
|
||||
will return a pointer to the irq_domain on success. The caller must
|
||||
provide the allocator function with an irq_domain_ops structure.
|
||||
|
||||
In most cases, the irq_domain will begin empty without any mappings
|
||||
between hwirq and IRQ numbers. Mappings are added to the irq_domain
|
||||
by calling irq_create_mapping() which accepts the irq_domain and a
|
||||
hwirq number as arguments. If a mapping for the hwirq doesn't already
|
||||
exist then it will allocate a new Linux irq_desc, associate it with
|
||||
the hwirq, and call the .map() callback so the driver can perform any
|
||||
required hardware setup.
|
||||
|
||||
When an interrupt is received, irq_find_mapping() function should
|
||||
be used to find the Linux IRQ number from the hwirq number.
|
||||
|
||||
The irq_create_mapping() function must be called *atleast once*
|
||||
before any call to irq_find_mapping(), lest the descriptor will not
|
||||
be allocated.
|
||||
|
||||
If the driver has the Linux IRQ number or the irq_data pointer, and
|
||||
needs to know the associated hwirq number (such as in the irq_chip
|
||||
callbacks) then it can be directly obtained from irq_data->hwirq.
|
||||
|
||||
Types of irq_domain mappings
|
||||
============================
|
||||
|
||||
There are several mechanisms available for reverse mapping from hwirq
|
||||
to Linux irq, and each mechanism uses a different allocation function.
|
||||
Which reverse map type should be used depends on the use case. Each
|
||||
of the reverse map types are described below:
|
||||
|
||||
Linear
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_linear()
|
||||
irq_domain_create_linear()
|
||||
|
||||
The linear reverse map maintains a fixed size table indexed by the
|
||||
hwirq number. When a hwirq is mapped, an irq_desc is allocated for
|
||||
the hwirq, and the IRQ number is stored in the table.
|
||||
|
||||
The Linear map is a good choice when the maximum number of hwirqs is
|
||||
fixed and a relatively small number (~ < 256). The advantages of this
|
||||
map are fixed time lookup for IRQ numbers, and irq_descs are only
|
||||
allocated for in-use IRQs. The disadvantage is that the table must be
|
||||
as large as the largest possible hwirq number.
|
||||
|
||||
irq_domain_add_linear() and irq_domain_create_linear() are functionally
|
||||
equivalent, except for the first argument is different - the former
|
||||
accepts an Open Firmware specific 'struct device_node', while the latter
|
||||
accepts a more general abstraction 'struct fwnode_handle'.
|
||||
|
||||
The majority of drivers should use the linear map.
|
||||
|
||||
Tree
|
||||
----
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_tree()
|
||||
irq_domain_create_tree()
|
||||
|
||||
The irq_domain maintains a radix tree map from hwirq numbers to Linux
|
||||
IRQs. When an hwirq is mapped, an irq_desc is allocated and the
|
||||
hwirq is used as the lookup key for the radix tree.
|
||||
|
||||
The tree map is a good choice if the hwirq number can be very large
|
||||
since it doesn't need to allocate a table as large as the largest
|
||||
hwirq number. The disadvantage is that hwirq to IRQ number lookup is
|
||||
dependent on how many entries are in the table.
|
||||
|
||||
irq_domain_add_tree() and irq_domain_create_tree() are functionally
|
||||
equivalent, except for the first argument is different - the former
|
||||
accepts an Open Firmware specific 'struct device_node', while the latter
|
||||
accepts a more general abstraction 'struct fwnode_handle'.
|
||||
|
||||
Very few drivers should need this mapping.
|
||||
|
||||
No Map
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_nomap()
|
||||
|
||||
The No Map mapping is to be used when the hwirq number is
|
||||
programmable in the hardware. In this case it is best to program the
|
||||
Linux IRQ number into the hardware itself so that no mapping is
|
||||
required. Calling irq_create_direct_mapping() will allocate a Linux
|
||||
IRQ number and call the .map() callback so that driver can program the
|
||||
Linux IRQ number into the hardware.
|
||||
|
||||
Most drivers cannot use this mapping.
|
||||
|
||||
Legacy
|
||||
------
|
||||
|
||||
::
|
||||
|
||||
irq_domain_add_simple()
|
||||
irq_domain_add_legacy()
|
||||
irq_domain_add_legacy_isa()
|
||||
|
||||
The Legacy mapping is a special case for drivers that already have a
|
||||
range of irq_descs allocated for the hwirqs. It is used when the
|
||||
driver cannot be immediately converted to use the linear mapping. For
|
||||
example, many embedded system board support files use a set of #defines
|
||||
for IRQ numbers that are passed to struct device registrations. In that
|
||||
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
|
||||
mapping should be used.
|
||||
|
||||
The legacy map assumes a contiguous range of IRQ numbers has already
|
||||
been allocated for the controller and that the IRQ number can be
|
||||
calculated by adding a fixed offset to the hwirq number, and
|
||||
visa-versa. The disadvantage is that it requires the interrupt
|
||||
controller to manage IRQ allocations and it requires an irq_desc to be
|
||||
allocated for every hwirq, even if it is unused.
|
||||
|
||||
The legacy map should only be used if fixed IRQ mappings must be
|
||||
supported. For example, ISA controllers would use the legacy map for
|
||||
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
|
||||
numbers.
|
||||
|
||||
Most users of legacy mappings should use irq_domain_add_simple() which
|
||||
will use a legacy domain only if an IRQ range is supplied by the
|
||||
system and will otherwise use a linear domain mapping. The semantics
|
||||
of this call are such that if an IRQ range is specified then
|
||||
descriptors will be allocated on-the-fly for it, and if no range is
|
||||
specified it will fall through to irq_domain_add_linear() which means
|
||||
*no* irq descriptors will be allocated.
|
||||
|
||||
A typical use case for simple domains is where an irqchip provider
|
||||
is supporting both dynamic and static IRQ assignments.
|
||||
|
||||
In order to avoid ending up in a situation where a linear domain is
|
||||
used and no descriptor gets allocated it is very important to make sure
|
||||
that the driver using the simple domain call irq_create_mapping()
|
||||
before any irq_find_mapping() since the latter will actually work
|
||||
for the static IRQ assignment case.
|
||||
|
||||
Hierarchy IRQ domain
|
||||
--------------------
|
||||
|
||||
On some architectures, there may be multiple interrupt controllers
|
||||
involved in delivering an interrupt from the device to the target CPU.
|
||||
Let's look at a typical interrupt delivering path on x86 platforms::
|
||||
|
||||
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
|
||||
|
||||
There are three interrupt controllers involved:
|
||||
|
||||
1) IOAPIC controller
|
||||
2) Interrupt remapping controller
|
||||
3) Local APIC controller
|
||||
|
||||
To support such a hardware topology and make software architecture match
|
||||
hardware architecture, an irq_domain data structure is built for each
|
||||
interrupt controller and those irq_domains are organized into hierarchy.
|
||||
When building irq_domain hierarchy, the irq_domain near to the device is
|
||||
child and the irq_domain near to CPU is parent. So a hierarchy structure
|
||||
as below will be built for the example above::
|
||||
|
||||
CPU Vector irq_domain (root irq_domain to manage CPU vectors)
|
||||
^
|
||||
|
|
||||
Interrupt Remapping irq_domain (manage irq_remapping entries)
|
||||
^
|
||||
|
|
||||
IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
|
||||
|
||||
There are four major interfaces to use hierarchy irq_domain:
|
||||
|
||||
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
|
||||
controller related resources to deliver these interrupts.
|
||||
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
|
||||
related resources associated with these interrupts.
|
||||
3) irq_domain_activate_irq(): activate interrupt controller hardware to
|
||||
deliver the interrupt.
|
||||
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
|
||||
to stop delivering the interrupt.
|
||||
|
||||
Following changes are needed to support hierarchy irq_domain:
|
||||
|
||||
1) a new field 'parent' is added to struct irq_domain; it's used to
|
||||
maintain irq_domain hierarchy information.
|
||||
2) a new field 'parent_data' is added to struct irq_data; it's used to
|
||||
build hierarchy irq_data to match hierarchy irq_domains. The irq_data
|
||||
is used to store irq_domain pointer and hardware irq number.
|
||||
3) new callbacks are added to struct irq_domain_ops to support hierarchy
|
||||
irq_domain operations.
|
||||
|
||||
With support of hierarchy irq_domain and hierarchy irq_data ready, an
|
||||
irq_domain structure is built for each interrupt controller, and an
|
||||
irq_data structure is allocated for each irq_domain associated with an
|
||||
IRQ. Now we could go one step further to support stacked(hierarchy)
|
||||
irq_chip. That is, an irq_chip is associated with each irq_data along
|
||||
the hierarchy. A child irq_chip may implement a required action by
|
||||
itself or by cooperating with its parent irq_chip.
|
||||
|
||||
With stacked irq_chip, interrupt controller driver only needs to deal
|
||||
with the hardware managed by itself and may ask for services from its
|
||||
parent irq_chip when needed. So we could achieve a much cleaner
|
||||
software architecture.
|
||||
|
||||
For an interrupt controller driver to support hierarchy irq_domain, it
|
||||
needs to:
|
||||
|
||||
1) Implement irq_domain_ops.alloc and irq_domain_ops.free
|
||||
2) Optionally implement irq_domain_ops.activate and
|
||||
irq_domain_ops.deactivate.
|
||||
3) Optionally implement an irq_chip to manage the interrupt controller
|
||||
hardware.
|
||||
4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
|
||||
they are unused with hierarchy irq_domain.
|
||||
|
||||
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
|
||||
support other architectures, such as ARM, ARM64 etc.
|
||||
|
||||
Debugging
|
||||
=========
|
||||
|
||||
Most of the internals of the IRQ subsystem are exposed in debugfs by
|
||||
turning CONFIG_GENERIC_IRQ_DEBUGFS on.
|
52
Documentation/core-api/irq/irqflags-tracing.rst
Normal file
52
Documentation/core-api/irq/irqflags-tracing.rst
Normal file
@@ -0,0 +1,52 @@
|
||||
=======================
|
||||
IRQ-flags state tracing
|
||||
=======================
|
||||
|
||||
:Author: started by Ingo Molnar <mingo@redhat.com>
|
||||
|
||||
The "irq-flags tracing" feature "traces" hardirq and softirq state, in
|
||||
that it gives interested subsystems an opportunity to be notified of
|
||||
every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
|
||||
happens in the kernel.
|
||||
|
||||
CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
|
||||
and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
|
||||
code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
|
||||
CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
|
||||
are locking APIs that are not used in IRQ context. (the one exception
|
||||
for rwsems is worked around)
|
||||
|
||||
Architecture support for this is certainly not in the "trivial"
|
||||
category, because lots of lowlevel assembly code deal with irq-flags
|
||||
state changes. But an architecture can be irq-flags-tracing enabled in a
|
||||
rather straightforward and risk-free manner.
|
||||
|
||||
Architectures that want to support this need to do a couple of
|
||||
code-organizational changes first:
|
||||
|
||||
- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
|
||||
|
||||
and then a couple of functional changes are needed as well to implement
|
||||
irq-flags-tracing support:
|
||||
|
||||
- in lowlevel entry code add (build-conditional) calls to the
|
||||
trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
|
||||
closely guards whether the 'real' irq-flags matches the 'virtual'
|
||||
irq-flags state, and complains loudly (and turns itself off) if the
|
||||
two do not match. Usually most of the time for arch support for
|
||||
irq-flags-tracing is spent in this state: look at the lockdep
|
||||
complaint, try to figure out the assembly code we did not cover yet,
|
||||
fix and repeat. Once the system has booted up and works without a
|
||||
lockdep complaint in the irq-flags-tracing functions arch support is
|
||||
complete.
|
||||
- if the architecture has non-maskable interrupts then those need to be
|
||||
excluded from the irq-tracing [and lock validation] mechanism via
|
||||
lockdep_off()/lockdep_on().
|
||||
|
||||
In general there is no risk from having an incomplete irq-flags-tracing
|
||||
implementation in an architecture: lockdep will detect that and will
|
||||
turn itself off. I.e. the lock validator will still be reliable. There
|
||||
should be no crashes due to irq-tracing bugs. (except if the assembly
|
||||
changes break other code by modifying conditions or registers that
|
||||
shouldn't be)
|
||||
|
Reference in New Issue
Block a user