docs: add IRQ documentation at the core-api book

There are 4 IRQ documentation files under Documentation/*.txt.

Move them into a new directory (core-api/irq) and add a new
index file for it.

While here, use a title markup for the Debugging section of the
irq-domain.rst file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/2da7485c3718e1442e6b4c2dd66857b776e8899b.1588345503.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This commit is contained in:
Mauro Carvalho Chehab
2020-05-01 17:37:51 +02:00
committed by Jonathan Corbet
parent a74e2a2264
commit e00b0ab86c
12 changed files with 22 additions and 9 deletions

View File

@@ -0,0 +1,24 @@
===============
What is an IRQ?
===============
An IRQ is an interrupt request from a device.
Currently they can come in over a pin, or over a packet.
Several devices may be connected to the same pin thus
sharing an IRQ.
An IRQ number is a kernel identifier used to talk about a hardware
interrupt source. Typically this is an index into the global irq_desc
array, but except for what linux/interrupt.h implements the details
are architecture specific.
An IRQ number is an enumeration of the possible interrupt sources on a
machine. Typically what is enumerated is the number of input pins on
all of the interrupt controller in the system. In the case of ISA
what is enumerated are the 16 input pins on the two i8259 interrupt
controllers.
Architectures can assign additional meaning to the IRQ numbers, and
are encouraged to in the case where there is any manual configuration
of the hardware involved. The ISA IRQs are a classic example of
assigning this kind of additional meaning.

View File

@@ -0,0 +1,11 @@
====
IRQs
====
.. toctree::
:maxdepth: 1
concepts
irq-affinity
irq-domain
irqflags-tracing

View File

@@ -0,0 +1,70 @@
================
SMP IRQ affinity
================
ChangeLog:
- Started by Ingo Molnar <mingo@redhat.com>
- Update by Max Krasnyansky <maxk@qualcomm.com>
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
which target CPUs are permitted for a given IRQ source. It's a bitmask
(smp_affinity) or cpu list (smp_affinity_list) of allowed CPUs. It's not
allowed to turn off all CPUs, and if an IRQ controller does not support
IRQ affinity then the value will not change from the default of all cpus.
/proc/irq/default_smp_affinity specifies default affinity mask that applies
to all non-active IRQs. Once IRQ is allocated/activated its affinity bitmask
will be set to the default mask. It can then be changed as described above.
Default mask is 0xffffffff.
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
it to CPU4-7 (this is an 8-CPU SMP box)::
[root@moon 44]# cd /proc/irq/44
[root@moon 44]# cat smp_affinity
ffffffff
[root@moon 44]# echo 0f > smp_affinity
[root@moon 44]# cat smp_affinity
0000000f
[root@moon 44]# ping -f h
PING hell (195.4.7.3): 56 data bytes
...
--- hell ping statistics ---
6029 packets transmitted, 6027 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/0.4 ms
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
As can be seen from the line above IRQ44 was delivered only to the first four
processors (0-3).
Now lets restrict that IRQ to CPU(4-7).
::
[root@moon 44]# echo f0 > smp_affinity
[root@moon 44]# cat smp_affinity
000000f0
[root@moon 44]# ping -f h
PING hell (195.4.7.3): 56 data bytes
..
--- hell ping statistics ---
2779 packets transmitted, 2777 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.5/585.4 ms
[root@moon 44]# cat /proc/interrupts | 'CPU\|44:'
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
This time around IRQ44 was delivered only to the last four processors.
i.e counters for the CPU0-3 did not change.
Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
[root@moon 44]# echo 1024-1031 > smp_affinity_list
[root@moon 44]# cat smp_affinity_list
1024-1031
Note that to do this with a bitmask would require 32 bitmasks of zero
to follow the pertinent one.

View File

@@ -0,0 +1,270 @@
===============================================
The irq_domain interrupt number mapping library
===============================================
The current design of the Linux kernel uses a single large number
space where each separate IRQ source is assigned a different number.
This is simple when there is only one interrupt controller, but in
systems with multiple interrupt controllers the kernel must ensure
that each one gets assigned non-overlapping allocations of Linux
IRQ numbers.
The number of interrupt controllers registered as unique irqchips
show a rising tendency: for example subdrivers of different kinds
such as GPIO controllers avoid reimplementing identical callback
mechanisms as the IRQ core system by modelling their interrupt
handlers as irqchips, i.e. in effect cascading interrupt controllers.
Here the interrupt number loose all kind of correspondence to
hardware interrupt numbers: whereas in the past, IRQ numbers could
be chosen so they matched the hardware IRQ line into the root
interrupt controller (i.e. the component actually fireing the
interrupt line to the CPU) nowadays this number is just a number.
For this reason we need a mechanism to separate controller-local
interrupt numbers, called hardware irq's, from Linux IRQ numbers.
The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
irq numbers, but they don't provide any support for reverse mapping of
the controller-local IRQ (hwirq) number into the Linux IRQ number
space.
The irq_domain library adds mapping between hwirq and IRQ numbers on
top of the irq_alloc_desc*() API. An irq_domain to manage mapping is
preferred over interrupt controller drivers open coding their own
reverse mapping scheme.
irq_domain also implements translation from an abstract irq_fwspec
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
be easily extended to support other IRQ topology data sources.
irq_domain usage
================
An interrupt controller driver creates and registers an irq_domain by
calling one of the irq_domain_add_*() functions (each mapping method
has a different allocator function, more on that later). The function
will return a pointer to the irq_domain on success. The caller must
provide the allocator function with an irq_domain_ops structure.
In most cases, the irq_domain will begin empty without any mappings
between hwirq and IRQ numbers. Mappings are added to the irq_domain
by calling irq_create_mapping() which accepts the irq_domain and a
hwirq number as arguments. If a mapping for the hwirq doesn't already
exist then it will allocate a new Linux irq_desc, associate it with
the hwirq, and call the .map() callback so the driver can perform any
required hardware setup.
When an interrupt is received, irq_find_mapping() function should
be used to find the Linux IRQ number from the hwirq number.
The irq_create_mapping() function must be called *atleast once*
before any call to irq_find_mapping(), lest the descriptor will not
be allocated.
If the driver has the Linux IRQ number or the irq_data pointer, and
needs to know the associated hwirq number (such as in the irq_chip
callbacks) then it can be directly obtained from irq_data->hwirq.
Types of irq_domain mappings
============================
There are several mechanisms available for reverse mapping from hwirq
to Linux irq, and each mechanism uses a different allocation function.
Which reverse map type should be used depends on the use case. Each
of the reverse map types are described below:
Linear
------
::
irq_domain_add_linear()
irq_domain_create_linear()
The linear reverse map maintains a fixed size table indexed by the
hwirq number. When a hwirq is mapped, an irq_desc is allocated for
the hwirq, and the IRQ number is stored in the table.
The Linear map is a good choice when the maximum number of hwirqs is
fixed and a relatively small number (~ < 256). The advantages of this
map are fixed time lookup for IRQ numbers, and irq_descs are only
allocated for in-use IRQs. The disadvantage is that the table must be
as large as the largest possible hwirq number.
irq_domain_add_linear() and irq_domain_create_linear() are functionally
equivalent, except for the first argument is different - the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.
The majority of drivers should use the linear map.
Tree
----
::
irq_domain_add_tree()
irq_domain_create_tree()
The irq_domain maintains a radix tree map from hwirq numbers to Linux
IRQs. When an hwirq is mapped, an irq_desc is allocated and the
hwirq is used as the lookup key for the radix tree.
The tree map is a good choice if the hwirq number can be very large
since it doesn't need to allocate a table as large as the largest
hwirq number. The disadvantage is that hwirq to IRQ number lookup is
dependent on how many entries are in the table.
irq_domain_add_tree() and irq_domain_create_tree() are functionally
equivalent, except for the first argument is different - the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.
Very few drivers should need this mapping.
No Map
------
::
irq_domain_add_nomap()
The No Map mapping is to be used when the hwirq number is
programmable in the hardware. In this case it is best to program the
Linux IRQ number into the hardware itself so that no mapping is
required. Calling irq_create_direct_mapping() will allocate a Linux
IRQ number and call the .map() callback so that driver can program the
Linux IRQ number into the hardware.
Most drivers cannot use this mapping.
Legacy
------
::
irq_domain_add_simple()
irq_domain_add_legacy()
irq_domain_add_legacy_isa()
The Legacy mapping is a special case for drivers that already have a
range of irq_descs allocated for the hwirqs. It is used when the
driver cannot be immediately converted to use the linear mapping. For
example, many embedded system board support files use a set of #defines
for IRQ numbers that are passed to struct device registrations. In that
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
mapping should be used.
The legacy map assumes a contiguous range of IRQ numbers has already
been allocated for the controller and that the IRQ number can be
calculated by adding a fixed offset to the hwirq number, and
visa-versa. The disadvantage is that it requires the interrupt
controller to manage IRQ allocations and it requires an irq_desc to be
allocated for every hwirq, even if it is unused.
The legacy map should only be used if fixed IRQ mappings must be
supported. For example, ISA controllers would use the legacy map for
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
numbers.
Most users of legacy mappings should use irq_domain_add_simple() which
will use a legacy domain only if an IRQ range is supplied by the
system and will otherwise use a linear domain mapping. The semantics
of this call are such that if an IRQ range is specified then
descriptors will be allocated on-the-fly for it, and if no range is
specified it will fall through to irq_domain_add_linear() which means
*no* irq descriptors will be allocated.
A typical use case for simple domains is where an irqchip provider
is supporting both dynamic and static IRQ assignments.
In order to avoid ending up in a situation where a linear domain is
used and no descriptor gets allocated it is very important to make sure
that the driver using the simple domain call irq_create_mapping()
before any irq_find_mapping() since the latter will actually work
for the static IRQ assignment case.
Hierarchy IRQ domain
--------------------
On some architectures, there may be multiple interrupt controllers
involved in delivering an interrupt from the device to the target CPU.
Let's look at a typical interrupt delivering path on x86 platforms::
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
There are three interrupt controllers involved:
1) IOAPIC controller
2) Interrupt remapping controller
3) Local APIC controller
To support such a hardware topology and make software architecture match
hardware architecture, an irq_domain data structure is built for each
interrupt controller and those irq_domains are organized into hierarchy.
When building irq_domain hierarchy, the irq_domain near to the device is
child and the irq_domain near to CPU is parent. So a hierarchy structure
as below will be built for the example above::
CPU Vector irq_domain (root irq_domain to manage CPU vectors)
^
|
Interrupt Remapping irq_domain (manage irq_remapping entries)
^
|
IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
There are four major interfaces to use hierarchy irq_domain:
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
controller related resources to deliver these interrupts.
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
related resources associated with these interrupts.
3) irq_domain_activate_irq(): activate interrupt controller hardware to
deliver the interrupt.
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
to stop delivering the interrupt.
Following changes are needed to support hierarchy irq_domain:
1) a new field 'parent' is added to struct irq_domain; it's used to
maintain irq_domain hierarchy information.
2) a new field 'parent_data' is added to struct irq_data; it's used to
build hierarchy irq_data to match hierarchy irq_domains. The irq_data
is used to store irq_domain pointer and hardware irq number.
3) new callbacks are added to struct irq_domain_ops to support hierarchy
irq_domain operations.
With support of hierarchy irq_domain and hierarchy irq_data ready, an
irq_domain structure is built for each interrupt controller, and an
irq_data structure is allocated for each irq_domain associated with an
IRQ. Now we could go one step further to support stacked(hierarchy)
irq_chip. That is, an irq_chip is associated with each irq_data along
the hierarchy. A child irq_chip may implement a required action by
itself or by cooperating with its parent irq_chip.
With stacked irq_chip, interrupt controller driver only needs to deal
with the hardware managed by itself and may ask for services from its
parent irq_chip when needed. So we could achieve a much cleaner
software architecture.
For an interrupt controller driver to support hierarchy irq_domain, it
needs to:
1) Implement irq_domain_ops.alloc and irq_domain_ops.free
2) Optionally implement irq_domain_ops.activate and
irq_domain_ops.deactivate.
3) Optionally implement an irq_chip to manage the interrupt controller
hardware.
4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
they are unused with hierarchy irq_domain.
Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc.
Debugging
=========
Most of the internals of the IRQ subsystem are exposed in debugfs by
turning CONFIG_GENERIC_IRQ_DEBUGFS on.

View File

@@ -0,0 +1,52 @@
=======================
IRQ-flags state tracing
=======================
:Author: started by Ingo Molnar <mingo@redhat.com>
The "irq-flags tracing" feature "traces" hardirq and softirq state, in
that it gives interested subsystems an opportunity to be notified of
every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
happens in the kernel.
CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
are locking APIs that are not used in IRQ context. (the one exception
for rwsems is worked around)
Architecture support for this is certainly not in the "trivial"
category, because lots of lowlevel assembly code deal with irq-flags
state changes. But an architecture can be irq-flags-tracing enabled in a
rather straightforward and risk-free manner.
Architectures that want to support this need to do a couple of
code-organizational changes first:
- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
and then a couple of functional changes are needed as well to implement
irq-flags-tracing support:
- in lowlevel entry code add (build-conditional) calls to the
trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
closely guards whether the 'real' irq-flags matches the 'virtual'
irq-flags state, and complains loudly (and turns itself off) if the
two do not match. Usually most of the time for arch support for
irq-flags-tracing is spent in this state: look at the lockdep
complaint, try to figure out the assembly code we did not cover yet,
fix and repeat. Once the system has booted up and works without a
lockdep complaint in the irq-flags-tracing functions arch support is
complete.
- if the architecture has non-maskable interrupts then those need to be
excluded from the irq-tracing [and lock validation] mechanism via
lockdep_off()/lockdep_on().
In general there is no risk from having an incomplete irq-flags-tracing
implementation in an architecture: lockdep will detect that and will
turn itself off. I.e. the lock validator will still be reliable. There
should be no crashes due to irq-tracing bugs. (except if the assembly
changes break other code by modifying conditions or registers that
shouldn't be)