Merge v5.3-rc1 into drm-misc-next

Noralf needs some SPI patches in 5.3 to merge some work on tinydrm.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This commit is contained in:
Maxime Ripard
2019-07-22 21:24:10 +02:00
16695 changed files with 979854 additions and 436075 deletions

View File

@@ -5,7 +5,7 @@ Description: It is possible to switch the cpi setting of the mouse with the
press of a button.
When read, this file returns the raw number of the actual cpi
setting reported by the mouse. This number has to be further
processed to receive the real dpi value.
processed to receive the real dpi value:
VALUE DPI
1 400

View File

@@ -11,7 +11,7 @@ Description:
Kernel code may export it for complete or partial access.
GPIOs are identified as they are inside the kernel, using integers in
the range 0..INT_MAX. See Documentation/gpio for more information.
the range 0..INT_MAX. See Documentation/admin-guide/gpio for more information.
/sys/class/gpio
/export ... asks the kernel to export a GPIO to userspace

View File

@@ -1,6 +1,6 @@
rfkill - radio frequency (RF) connector kill switch support
For details to this subsystem look at Documentation/rfkill.txt.
For details to this subsystem look at Documentation/driver-api/rfkill.rst.
What: /sys/class/rfkill/rfkill[0-9]+/claim
Date: 09-Jul-2007

View File

@@ -423,23 +423,6 @@ Description:
(e.g. driver restart on the VM which owns the VF).
sysfs interface for NetEffect RNIC Low-Level iWARP driver (nes)
---------------------------------------------------------------
What: /sys/class/infiniband/nesX/hw_rev
What: /sys/class/infiniband/nesX/hca_type
What: /sys/class/infiniband/nesX/board_id
Date: Feb, 2008
KernelVersion: v2.6.25
Contact: linux-rdma@vger.kernel.org
Description:
hw_rev: (RO) Hardware revision number
hca_type: (RO) Host Channel Adapter type (NEX020)
board_id: (RO) Manufacturing board id
sysfs interface for Chelsio T4/T5 RDMA driver (cxgb4)
-----------------------------------------------------

View File

@@ -1,6 +1,6 @@
rfkill - radio frequency (RF) connector kill switch support
For details to this subsystem look at Documentation/rfkill.txt.
For details to this subsystem look at Documentation/driver-api/rfkill.rst.
For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
Documentation/ABI/removed/sysfs-class-rfkill.

View File

@@ -61,7 +61,7 @@ Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
The node's hit/miss statistics, in units of pages.
See Documentation/numastat.txt
See Documentation/admin-guide/numastat.rst
What: /sys/devices/system/node/nodeX/distance
Date: October 2002

View File

@@ -1,5 +1,4 @@
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
asic_health
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/asic_health
Date: June 2018
KernelVersion: 4.19
@@ -9,9 +8,8 @@ Description: This file shows ASIC health status. The possible values are:
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
cpld1_version
cpld2_version
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld1_version
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld2_version
Date: June 2018
KernelVersion: 4.19
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -20,8 +18,7 @@ Description: These files show with which CPLD versions have been burned
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
fan_dir
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/fan_dir
Date: December 2018
KernelVersion: 5.0
@@ -32,8 +29,7 @@ Description: This file shows the system fans direction:
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
jtag_enable
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
Date: November 2018
KernelVersion: 5.0
@@ -43,8 +39,7 @@ Description: These files show with which CPLD versions have been burned
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
jtag_enable
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/jtag_enable
Date: November 2018
KernelVersion: 5.0
@@ -87,16 +82,15 @@ Description: These files allow asserting system power cycling, switching
The files are write only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
reset_aux_pwr_or_ref
reset_asic_thermal
reset_hotswap_or_halt
reset_hotswap_or_wd
reset_fw_reset
reset_long_pb
reset_main_pwr_fail
reset_short_pb
reset_sw_reset
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_aux_pwr_or_ref
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_asic_thermal
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_hotswap_or_halt
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_hotswap_or_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_fw_reset
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_long_pb
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_main_pwr_fail
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_short_pb
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sw_reset
Date: June 2018
KernelVersion: 4.19
Contact: Vadim Pasternak <vadimpmellanox.com>
@@ -110,11 +104,10 @@ Description: These files show the system reset cause, as following: power
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/
reset_comex_pwr_fail
reset_from_comex
reset_system
reset_voltmon_upgrade_fail
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_pwr_fail
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_comex
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_system
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_voltmon_upgrade_fail
Date: November 2018
KernelVersion: 5.0
@@ -127,3 +120,23 @@ Description: These files show the system reset cause, as following: ComEx
the last reset cause.
The files are read only.
Date: June 2019
KernelVersion: 5.3
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show the system reset cause, as following:
COMEX thermal shutdown; wathchdog power off or reset was derived
by one of the next components: COMEX, switch board or by Small Form
Factor mezzanine, reset requested from ASIC, reset cuased by BIOS
reload. Value 1 in file means this is reset cause, 0 - otherwise.
Only one of the above causes could be 1 at the same time, representing
only last reset cause.
The files are read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_thermal
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_comex_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_from_asic
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_reload_bios
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_sff_wd
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/reset_swb_wd

View File

@@ -1,6 +1,6 @@
What: /sys/kernel/debug/cec/*/error-inj
Date: March 2018
Contact: Hans Verkuil <hans.verkuil@cisco.com>
Contact: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Description:
The CEC Framework allows for CEC error injection commands through

View File

@@ -0,0 +1,56 @@
What: /sys/kernel/debug/<cros-ec-device>/console_log
Date: September 2017
KernelVersion: 4.13
Description:
If the EC supports the CONSOLE_READ command type, this file
can be used to grab the EC logs. The kernel polls for the log
and keeps its own buffer but userspace should grab this and
write it out to some logs.
What: /sys/kernel/debug/<cros-ec-device>/panicinfo
Date: September 2017
KernelVersion: 4.13
Description:
This file dumps the EC panic information from the previous
reboot. This file will only exist if the PANIC_INFO command
type is supported by the EC.
What: /sys/kernel/debug/<cros-ec-device>/pdinfo
Date: June 2018
KernelVersion: 4.17
Description:
This file provides the port role, muxes and power debug
information for all the USB PD/type-C ports available. If
the are no ports available, this file will be just an empty
file.
What: /sys/kernel/debug/<cros-ec-device>/uptime
Date: June 2019
KernelVersion: 5.3
Description:
A u32 providing the time since EC booted in ms. This is
is used for synchronizing the AP host time with the EC
log. An error is returned if the command is not supported
by the EC or there is a communication problem.
What: /sys/kernel/debug/<cros-ec-device>/last_resume_result
Date: June 2019
KernelVersion: 5.3
Description:
Some ECs have a feature where they will track transitions to
the (Intel) processor's SLP_S0 line, in order to detect cases
where a system failed to go into S0ix. When the system resumes,
an EC with this feature will return a summary of SLP_S0
transitions that occurred. The last_resume_result file returns
the most recent response from the AP's resume message to the EC.
The bottom 31 bits contain a count of the number of SLP_S0
transitions that occurred since the suspend message was
received. Bit 31 is set if the EC attempted to wake the
system due to a timeout when watching for SLP_S0 transitions.
Callers can use this to detect a wake from the EC due to
S0ix timeouts. The result will be zero if no suspend
transitions have been attempted, or the EC does not support
this feature.
Output will be in the format: "0x%08x\n".

View File

@@ -3,7 +3,10 @@ Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Sets the device address to be used for read or write through
PCI bar. The acceptable value is a string that starts with "0x"
PCI bar, or the device VA of a host mapped memory to be read or
written directly from the host. The latter option is allowed
only when the IOMMU is disabled.
The acceptable value is a string that starts with "0x"
What: /sys/kernel/debug/habanalabs/hl<n>/command_buffers
Date: Jan 2019
@@ -33,10 +36,12 @@ Contact: oded.gabbay@gmail.com
Description: Allows the root user to read or write directly through the
device's PCI bar. Writing to this file generates a write
transaction while reading from the file generates a read
transcation. This custom interface is needed (instead of using
transaction. This custom interface is needed (instead of using
the generic Linux user-space PCI mapping) because the DDR bar
is very small compared to the DDR memory and only the driver can
move the bar before and after the transaction
move the bar before and after the transaction.
If the IOMMU is disabled, it also allows the root user to read
or write from the host a device VA of a host mapped memory
What: /sys/kernel/debug/habanalabs/hl<n>/device
Date: Jan 2019
@@ -46,6 +51,13 @@ Description: Enables the root user to set the device to specific state.
Valid values are "disable", "enable", "suspend", "resume".
User can read this property to see the valid values
What: /sys/kernel/debug/habanalabs/hl<n>/engines
Date: Jul 2019
KernelVersion: 5.3
Contact: oded.gabbay@gmail.com
Description: Displays the status registers values of the device engines and
their derived idle status
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_addr
Date: Jan 2019
KernelVersion: 5.1

View File

@@ -23,11 +23,9 @@ Description:
For writing, bytes 0-1 indicate the message type, one of enum
wilco_ec_msg_type. Byte 2+ consist of the data passed in the
request, starting at MBOX[0]
At least three bytes are required for writing, two for the type
and at least a single byte of data. Only the first
EC_MAILBOX_DATA_SIZE bytes of MBOX will be used.
request, starting at MBOX[0]. At least three bytes are required
for writing, two for the type and at least a single byte of
data.
Example:
// Request EC info type 3 (EC firmware build date)
@@ -40,7 +38,7 @@ Description:
$ cat /sys/kernel/debug/wilco_ec/raw
00 00 31 32 2f 32 31 2f 31 38 00 38 00 01 00 2f 00 ..12/21/18.8...
Note that the first 32 bytes of the received MBOX[] will be
printed, even if some of the data is junk. It is up to you to
know how many of the first bytes of data are the actual
response.
Note that the first 16 bytes of the received MBOX[] will be
printed, even if some of the data is junk, and skipping bytes
17 to 32. It is up to you to know how many of the first bytes of
data are the actual response.

View File

@@ -24,11 +24,11 @@ Description:
[euid=] [fowner=] [fsname=]]
lsm: [[subj_user=] [subj_role=] [subj_type=]
[obj_user=] [obj_role=] [obj_type=]]
option: [[appraise_type=]] [permit_directio]
option: [[appraise_type=]] [template=] [permit_directio]
base: func:= [BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
[KEXEC_CMDLINE]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
[[^]MAY_EXEC]
fsmagic:= hex value
@@ -38,6 +38,8 @@ Description:
fowner:= decimal value
lsm: are LSM specific
option: appraise_type:= [imasig]
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
default policy:

View File

@@ -29,4 +29,4 @@ Description:
17 - sectors discarded
18 - time spent discarding
For more details refer to Documentation/iostats.txt
For more details refer to Documentation/admin-guide/iostats.rst

View File

@@ -3,18 +3,28 @@ Date: August 2017
Contact: Daniel Colascione <dancol@google.com>
Description:
This file provides pre-summed memory information for a
process. The format is identical to /proc/pid/smaps,
process. The format is almost identical to /proc/pid/smaps,
except instead of an entry for each VMA in a process,
smaps_rollup has a single entry (tagged "[rollup]")
for which each field is the sum of the corresponding
fields from all the maps in /proc/pid/smaps.
For more details, see the procfs man page.
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps. These fields represent
the sum of the Pss field of each type (anon, file, shmem).
For more details, see Documentation/filesystems/proc.txt
and the procfs man page.
Typical output looks like this:
00100000-ff709000 ---p 00000000 00:00 0 [rollup]
Size: 1192 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 884 kB
Pss: 385 kB
Pss_Anon: 301 kB
Pss_File: 80 kB
Pss_Shmem: 4 kB
Shared_Clean: 696 kB
Shared_Dirty: 0 kB
Private_Clean: 120 kB

View File

@@ -1,6 +1,6 @@
Where: /sys/fs/pstore/... (or /dev/pstore/...)
What: /sys/fs/pstore/... (or /dev/pstore/...)
Date: March 2011
Kernel Version: 2.6.39
KernelVersion: 2.6.39
Contact: tony.luck@intel.com
Description: Generic interface to platform dependent persistent storage.

View File

@@ -15,7 +15,7 @@ Description:
9 - I/Os currently in progress
10 - time spent doing I/Os (ms)
11 - weighted time spent doing I/Os (ms)
For more details refer Documentation/iostats.txt
For more details refer Documentation/admin-guide/iostats.rst
What: /sys/block/<disk>/<part>/stat

View File

@@ -45,7 +45,7 @@ Description:
- Values below -2 are rejected with -EINVAL
For more information, see
Documentation/laptops/disk-shock-protection.txt
Documentation/admin-guide/laptops/disk-shock-protection.rst
What: /sys/block/*/device/ncq_prio_enable

View File

@@ -33,3 +33,26 @@ Description: Contains the PIM/PAM/POM values, as reported by the
in sync with the values current in the channel subsystem).
Note: This is an I/O-subchannel specific attribute.
Users: s390-tools, HAL
What: /sys/bus/css/devices/.../driver_override
Date: June 2019
Contact: Cornelia Huck <cohuck@redhat.com>
linux-s390@vger.kernel.org
Description: This file allows the driver for a device to be specified. When
specified, only a driver with a name matching the value written
to driver_override will have an opportunity to bind to the
device. The override is specified by writing a string to the
driver_override file (echo vfio-ccw > driver_override) and
may be cleared with an empty string (echo > driver_override).
This returns the device to standard matching rules binding.
Writing to driver_override does not automatically unbind the
device from its current driver or make any attempt to
automatically load the specified driver. If no driver with a
matching name is currently loaded in the kernel, the device
will not bind to any driver. This also allows devices to
opt-out of driver binding using a driver_override name such as
"none". Only a single driver may be specified in the override,
there is no support for parsing delimiters.
Note that unlike the mechanism of the same name for pci, this
file does not allow to override basic matching rules. I.e.,
the driver must still match the subchannel type of the device.

View File

@@ -1,6 +1,6 @@
Where: /sys/bus/event_source/devices/<dev>/format
What: /sys/bus/event_source/devices/<dev>/format
Date: January 2012
Kernel Version: 3.3
KernelVersion: 3.3
Contact: Jiri Olsa <jolsa@redhat.com>
Description:
Attribute group to describe the magic bits that go into

View File

@@ -1,20 +1,20 @@
Where: /sys/bus/i2c/devices/.../heading0_input
What: /sys/bus/i2c/devices/.../heading0_input
Date: April 2010
Kernel Version: 2.6.36?
KernelVersion: 2.6.36?
Contact: alan.cox@intel.com
Description: Reports the current heading from the compass as a floating
point value in degrees.
Where: /sys/bus/i2c/devices/.../power_state
What: /sys/bus/i2c/devices/.../power_state
Date: April 2010
Kernel Version: 2.6.36?
KernelVersion: 2.6.36?
Contact: alan.cox@intel.com
Description: Sets the power state of the device. 0 sets the device into
sleep mode, 1 wakes it up.
Where: /sys/bus/i2c/devices/.../calibration
What: /sys/bus/i2c/devices/.../calibration
Date: April 2010
Kernel Version: 2.6.36?
KernelVersion: 2.6.36?
Contact: alan.cox@intel.com
Description: Sets the calibration on or off (1 = on, 0 = off). See the
chip data sheet.

View File

@@ -61,8 +61,11 @@ What: /sys/bus/iio/devices/triggerX/sampling_frequency_available
KernelVersion: 2.6.35
Contact: linux-iio@vger.kernel.org
Description:
When the internal sampling clock can only take a small
discrete set of values, this file lists those available.
When the internal sampling clock can only take a specific set of
frequencies, we can specify the available values with:
- a small discrete set of values like "0 2 4 6 8"
- a range with minimum, step and maximum frequencies like
"[min step max]"
What: /sys/bus/iio/devices/iio:deviceX/oversampling_ratio
KernelVersion: 2.6.38

View File

@@ -18,11 +18,11 @@ Description:
values are 'base' and 'lid'.
What: /sys/bus/iio/devices/iio:deviceX/id
Date: Septembre 2017
Date: September 2017
KernelVersion: 4.14
Contact: linux-iio@vger.kernel.org
Description:
This attribute is exposed by the CrOS EC legacy accelerometer
driver and represents the sensor ID as exposed by the EC. This
ID is used by the Android sensor service hardware abstraction
layer (sensor HAL) through the Android container on ChromeOS.
This attribute is exposed by the CrOS EC sensors driver and
represents the sensor ID as exposed by the EC. This ID is used
by the Android sensor service hardware abstraction layer (sensor
HAL) through the Android container on ChromeOS.

View File

@@ -1,4 +1,4 @@
What /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
What: /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
Date: January 2017
KernelVersion: 4.11
Contact: linux-iio@vger.kernel.org
@@ -6,7 +6,7 @@ Description:
Show or set the gain boost of the amp, from 0-31 range.
default 31
What /sys/bus/iio/devices/iio:deviceX/sensor_max_range
What: /sys/bus/iio/devices/iio:deviceX/sensor_max_range
Date: January 2017
KernelVersion: 4.11
Contact: linux-iio@vger.kernel.org

View File

@@ -0,0 +1,44 @@
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Stores the PLL frequency in Hz for channel Y.
Reading returns the actual frequency in Hz.
The ADF4371 has an integrated VCO with fundamendal output
frequency ranging from 4000000000 Hz 8000000000 Hz.
out_altvoltage0_frequency:
A divide by 1, 2, 4, 8, 16, 32 or circuit generates
frequencies from 62500000 Hz to 8000000000 Hz.
out_altvoltage1_frequency:
This channel duplicates the channel 0 frequency
out_altvoltage2_frequency:
A frequency doubler generates frequencies from
8000000000 Hz to 16000000000 Hz.
out_altvoltage3_frequency:
A frequency quadrupler generates frequencies from
16000000000 Hz to 32000000000 Hz.
Note: writes to one of the channels will affect the frequency of
all the other channels, since it involves changing the VCO
fundamental output frequency.
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_name
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
Reading returns the datasheet name for channel Y:
out_altvoltage0_name: RF8x
out_altvoltage1_name: RFAUX8x
out_altvoltage2_name: RF16x
out_altvoltage3_name: RF32x
What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_powerdown
KernelVersion:
Contact: linux-iio@vger.kernel.org
Description:
This attribute allows the user to power down the PLL and it's
RFOut buffers.
Writing 1 causes the specified channel to power down.
Clearing returns to normal operation.

View File

@@ -1,4 +1,4 @@
What /sys/bus/iio/devices/iio:deviceX/in_proximity_input
What: /sys/bus/iio/devices/iio:deviceX/in_proximity_input
Date: March 2014
KernelVersion: 3.15
Contact: Matt Ranostay <matt.ranostay@konsulko.com>
@@ -6,7 +6,7 @@ Description:
Get the current distance in meters of storm (1km steps)
1000-40000 = distance in meters
What /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
What: /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
Date: March 2014
KernelVersion: 3.15
Contact: Matt Ranostay <matt.ranostay@konsulko.com>

View File

@@ -9,9 +9,9 @@ errors may be "seen" / reported by the link partner and not the
problematic endpoint itself (which may report all counters as 0 as it never
saw any problems).
Where: /sys/bus/pci/devices/<dev>/aer_dev_correctable
What: /sys/bus/pci/devices/<dev>/aer_dev_correctable
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: List of correctable errors seen and reported by this
PCI device using ERR_COR. Note that since multiple errors may
@@ -31,9 +31,9 @@ Header Log Overflow 0
TOTAL_ERR_COR 2
-------------------------------------------------------------------------
Where: /sys/bus/pci/devices/<dev>/aer_dev_fatal
What: /sys/bus/pci/devices/<dev>/aer_dev_fatal
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: List of uncorrectable fatal errors seen and reported by this
PCI device using ERR_FATAL. Note that since multiple errors may
@@ -62,9 +62,9 @@ TLP Prefix Blocked Error 0
TOTAL_ERR_FATAL 0
-------------------------------------------------------------------------
Where: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
What: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: List of uncorrectable nonfatal errors seen and reported by this
PCI device using ERR_NONFATAL. Note that since multiple errors
@@ -103,20 +103,20 @@ collectors) that are AER capable. These indicate the number of error messages as
device, so these counters include them and are thus cumulative of all the error
messages on the PCI hierarchy originating at that root port.
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
What: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: Total number of ERR_COR messages reported to rootport.
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
What: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: Total number of ERR_FATAL messages reported to rootport.
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
What: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
Date: July 2018
Kernel Version: 4.19.0
KernelVersion: 4.19.0
Contact: linux-pci@vger.kernel.org, rajatja@google.com
Description: Total number of ERR_NONFATAL messages reported to rootport.

View File

@@ -1,68 +1,68 @@
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
Date: March 2009
Kernel Version: 2.6.30
KernelVersion: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 model for logical drive
Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
Date: March 2009
Kernel Version: 2.6.30
KernelVersion: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 revision for logical
drive Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
Date: March 2009
Kernel Version: 2.6.30
KernelVersion: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 83 serial number for logical
drive Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
Date: March 2009
Kernel Version: 2.6.30
KernelVersion: 2.6.30
Contact: iss_storagedev@hp.com
Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
Date: March 2009
Kernel Version: 2.6.30
KernelVersion: 2.6.30
Contact: iss_storagedev@hp.com
Description: A symbolic link to /sys/block/cciss!cXdY
Where: /sys/bus/pci/devices/<dev>/ccissX/rescan
What: /sys/bus/pci/devices/<dev>/ccissX/rescan
Date: August 2009
Kernel Version: 2.6.31
KernelVersion: 2.6.31
Contact: iss_storagedev@hp.com
Description: Kicks of a rescan of the controller to discover logical
drive topology changes.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/lunid
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/lunid
Date: August 2009
Kernel Version: 2.6.31
KernelVersion: 2.6.31
Contact: iss_storagedev@hp.com
Description: Displays the 8-byte LUN ID used to address logical
drive Y of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/raid_level
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/raid_level
Date: August 2009
Kernel Version: 2.6.31
KernelVersion: 2.6.31
Contact: iss_storagedev@hp.com
Description: Displays the RAID level of logical drive Y of
controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/usage_count
What: /sys/bus/pci/devices/<dev>/ccissX/cXdY/usage_count
Date: August 2009
Kernel Version: 2.6.31
KernelVersion: 2.6.31
Contact: iss_storagedev@hp.com
Description: Displays the usage count (number of opens) of logical drive Y
of controller X.
Where: /sys/bus/pci/devices/<dev>/ccissX/resettable
What: /sys/bus/pci/devices/<dev>/ccissX/resettable
Date: February 2011
Kernel Version: 2.6.38
KernelVersion: 2.6.38
Contact: iss_storagedev@hp.com
Description: Value of 1 indicates the controller can honor the reset_devices
kernel parameter. Value of 0 indicates reset_devices cannot be
@@ -71,9 +71,9 @@ Description: Value of 1 indicates the controller can honor the reset_devices
a dump device, as kdump requires resetting the device in order
to work reliably.
Where: /sys/bus/pci/devices/<dev>/ccissX/transport_mode
What: /sys/bus/pci/devices/<dev>/ccissX/transport_mode
Date: July 2011
Kernel Version: 3.0
KernelVersion: 3.0
Contact: iss_storagedev@hp.com
Description: Value of "simple" indicates that the controller has been placed
in "simple mode". Value of "performant" indicates that the

View File

@@ -1,6 +1,6 @@
What: /sys/bus/siox/devices/siox-X/active
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
On reading represents the current state of the bus. If it
contains a "0" the bus is stopped and connected devices are
@@ -12,7 +12,7 @@ Description:
What: /sys/bus/siox/devices/siox-X/device_add
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Write-only file. Write
@@ -27,13 +27,13 @@ Description:
What: /sys/bus/siox/devices/siox-X/device_remove
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Write-only file. A single write removes the last device in the siox chain.
What: /sys/bus/siox/devices/siox-X/poll_interval_ns
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Defines the interval between two poll cycles in nano seconds.
Note this is rounded to jiffies on writing. On reading the current value
@@ -41,33 +41,33 @@ Description:
What: /sys/bus/siox/devices/siox-X-Y/connected
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value. "0" means the Yth device on siox bus X isn't "connected" i.e.
communication with it is not ensured. "1" signals a working connection.
What: /sys/bus/siox/devices/siox-X-Y/inbytes
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the inbytes value provided to siox-X/device_add
What: /sys/bus/siox/devices/siox-X-Y/status_errors
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Counts the number of time intervals when the read status byte doesn't yield the
expected value.
What: /sys/bus/siox/devices/siox-X-Y/type
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the type value provided to siox-X/device_add.
What: /sys/bus/siox/devices/siox-X-Y/watchdog
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting if the watchdog of the siox device is
active. "0" means the watchdog is not active and the device is expected to
@@ -75,13 +75,13 @@ Description:
What: /sys/bus/siox/devices/siox-X-Y/watchdog_errors
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the number to time intervals when the
watchdog was active.
What: /sys/bus/siox/devices/siox-X-Y/outbytes
KernelVersion: 4.16
Contact: Gavin Schenk <g.schenk@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Contact: Thorsten Scherer <t.scherer@eckelmann.de>, Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Description:
Read-only value reporting the outbytes value provided to siox-X/device_add.

View File

@@ -1,14 +1,14 @@
Where: /sys/bus/usb/.../powered
What: /sys/bus/usb/.../powered
Date: August 2008
Kernel Version: 2.6.26
KernelVersion: 2.6.26
Contact: Harrison Metzger <harrisonmetz@gmail.com>
Description: Controls whether the device's display will powered.
A value of 0 is off and a non-zero value is on.
Where: /sys/bus/usb/.../mode_msb
Where: /sys/bus/usb/.../mode_lsb
What: /sys/bus/usb/.../mode_msb
What: /sys/bus/usb/.../mode_lsb
Date: August 2008
Kernel Version: 2.6.26
KernelVersion: 2.6.26
Contact: Harrison Metzger <harrisonmetz@gmail.com>
Description: Controls the devices display mode.
For a 6 character display the values are
@@ -16,24 +16,24 @@ Description: Controls the devices display mode.
for an 8 character display the values are
MSB 0x08; LSB 0xFF.
Where: /sys/bus/usb/.../textmode
What: /sys/bus/usb/.../textmode
Date: August 2008
Kernel Version: 2.6.26
KernelVersion: 2.6.26
Contact: Harrison Metzger <harrisonmetz@gmail.com>
Description: Controls the way the device interprets its text buffer.
raw: each character controls its segment manually
hex: each character is between 0-15
ascii: each character is between '0'-'9' and 'A'-'F'.
Where: /sys/bus/usb/.../text
What: /sys/bus/usb/.../text
Date: August 2008
Kernel Version: 2.6.26
KernelVersion: 2.6.26
Contact: Harrison Metzger <harrisonmetz@gmail.com>
Description: The text (or data) for the device to display
Where: /sys/bus/usb/.../decimals
What: /sys/bus/usb/.../decimals
Date: August 2008
Kernel Version: 2.6.26
KernelVersion: 2.6.26
Contact: Harrison Metzger <harrisonmetz@gmail.com>
Description: Controls the decimal places on the device.
To set the nth decimal place, give this field

View File

@@ -4,7 +4,7 @@ KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Get the ALS output channel used as input in
ALS-current-control mode (0, 1), where
ALS-current-control mode (0, 1), where:
0 - out_current0 (backlight 0)
1 - out_current1 (backlight 1)
@@ -28,7 +28,7 @@ Date: April 2012
KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the brightness-mapping mode (0, 1), where
Set the brightness-mapping mode (0, 1), where:
0 - exponential mode
1 - linear mode
@@ -38,7 +38,7 @@ Date: April 2012
KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the PWM-input control mask (5 bits), where
Set the PWM-input control mask (5 bits), where:
bit 5 - PWM-input enabled in Zone 4
bit 4 - PWM-input enabled in Zone 3

View File

@@ -1,6 +1,6 @@
Note: Attributes that are shared between devices are stored in the directory
pointed to by the symlink device/.
Example: The real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is
Please note that attributes that are shared between devices are stored in
the directory pointed to by the symlink device/.
For example, the real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is
/sys/class/cxl/afu0.0s/device/irqs_max, i.e. /sys/class/cxl/afu0.0/irqs_max.

View File

@@ -47,7 +47,7 @@ Description:
What: /sys/class/devfreq/.../trans_stat
Date: October 2012
Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
Descrtiption:
Description:
This ABI shows the statistics of devfreq behavior on a
specific device. It shows the time spent in each state and
the number of transitions between states.

View File

@@ -4,7 +4,7 @@ KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the ALS output channel to use as input in
ALS-current-control mode (1, 2), where
ALS-current-control mode (1, 2), where:
1 - out_current1
2 - out_current2
@@ -22,7 +22,7 @@ Date: April 2012
KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the pattern generator fall and rise times (0..7), where
Set the pattern generator fall and rise times (0..7), where:
0 - 2048 us
1 - 262 ms
@@ -45,7 +45,7 @@ Date: April 2012
KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the brightness-mapping mode (0, 1), where
Set the brightness-mapping mode (0, 1), where:
0 - exponential mode
1 - linear mode
@@ -55,7 +55,7 @@ Date: April 2012
KernelVersion: 3.5
Contact: Johan Hovold <jhovold@gmail.com>
Description:
Set the PWM-input control mask (5 bits), where
Set the PWM-input control mask (5 bits), where:
bit 5 - PWM-input enabled in Zone 4
bit 4 - PWM-input enabled in Zone 3

View File

@@ -5,7 +5,7 @@ Contact: Janne Kanniainen <janne.kanniainen@gmail.com>
Description:
Set the mode of LEDs. You should notice that changing the mode
of one LED will update the mode of its two sibling devices as
well.
well. Possible values are:
0 - normal
1 - audio
@@ -13,4 +13,4 @@ Description:
Normal: LEDs are fully on when enabled
Audio: LEDs brightness depends on sound level
Breathing: LEDs brightness varies at human breathing rate
Breathing: LEDs brightness varies at human breathing rate

View File

@@ -41,3 +41,11 @@ Description:
xgmii, moca, qsgmii, trgmii, 1000base-x, 2500base-x, rxaui,
xaui, 10gbase-kr, unknown
What: /sys/class/mdio_bus/<bus>/<device>/phy_standalone
Date: May 2019
KernelVersion: 5.3
Contact: netdev@vger.kernel.org
Description:
Boolean value indicating whether the PHY device is used in
standalone mode, without a net_device associated, by PHYLINK.
Attribute created only when this is the case.

View File

@@ -29,7 +29,7 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 127 to add a qmap mux
Write a number ranging from 1 to 254 to add a qmap mux
based network device, supported by recent Qualcomm based
modems.
@@ -46,5 +46,5 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 127 to delete a previously
Write a number ranging from 1 to 254 to delete a previously
created qmap mux based network device.

View File

@@ -376,10 +376,42 @@ Description:
supply. Normally this is configured based on the type of
connection made (e.g. A configured SDP should output a maximum
of 500mA so the input current limit is set to the same value).
Use preferably input_power_limit, and for problems that can be
solved using power limit use input_current_limit.
Access: Read, Write
Valid values: Represented in microamps
What: /sys/class/power_supply/<supply_name>/input_voltage_limit
Date: May 2019
Contact: linux-pm@vger.kernel.org
Description:
This entry configures the incoming VBUS voltage limit currently
set in the supply. Normally this is configured based on
system-level knowledge or user input (e.g. This is part of the
Pixel C's thermal management strategy to effectively limit the
input power to 5V when the screen is on to meet Google's skin
temperature targets). Note that this feature should not be
used for safety critical things.
Use preferably input_power_limit, and for problems that can be
solved using power limit use input_voltage_limit.
Access: Read, Write
Valid values: Represented in microvolts
What: /sys/class/power_supply/<supply_name>/input_power_limit
Date: May 2019
Contact: linux-pm@vger.kernel.org
Description:
This entry configures the incoming power limit currently set
in the supply. Normally this is configured based on
system-level knowledge or user input. Use preferably this
feature to limit the incoming power and use current/voltage
limit only for problems that can be solved using power limit.
Access: Read, Write
Valid values: Represented in microwatts
What: /sys/class/power_supply/<supply_name>/online,
Date: May 2007
Contact: linux-pm@vger.kernel.org

View File

@@ -0,0 +1,30 @@
What: /sys/class/power_supply/wilco-charger/charge_type
Date: April 2019
KernelVersion: 5.2
Description:
What charging algorithm to use:
Standard: Fully charges battery at a standard rate.
Adaptive: Battery settings adaptively optimized based on
typical battery usage pattern.
Fast: Battery charges over a shorter period.
Trickle: Extends battery lifespan, intended for users who
primarily use their Chromebook while connected to AC.
Custom: A low and high threshold percentage is specified.
Charging begins when level drops below
charge_control_start_threshold, and ceases when
level is above charge_control_end_threshold.
What: /sys/class/power_supply/wilco-charger/charge_control_start_threshold
Date: April 2019
KernelVersion: 5.2
Description:
Used when charge_type="Custom", as described above. Measured in
percentages. The valid range is [50, 95].
What: /sys/class/power_supply/wilco-charger/charge_control_end_threshold
Date: April 2019
KernelVersion: 5.2
Description:
Used when charge_type="Custom", as described above. Measured in
percentages. The valid range is [55, 100].

View File

@@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
Description:
The powercap/ class sub directory belongs to the power cap
subsystem. Refer to
Documentation/power/powercap/powercap.txt for details.
Documentation/power/powercap/powercap.rst for details.
What: /sys/class/powercap/<control type>
Date: September 2013
@@ -147,6 +147,6 @@ What: /sys/class/powercap/.../<power zone>/enabled
Date: September 2013
KernelVersion: 3.13
Contact: linux-pm@vger.kernel.org
Description
Description:
This allows to enable/disable power capping at power zone level.
This applies to current power zone and its children.

View File

@@ -1,6 +1,6 @@
switchtec - Microsemi Switchtec PCI Switch Management Endpoint
For details on this subsystem look at Documentation/switchtec.txt.
For details on this subsystem look at Documentation/driver-api/switchtec.rst.
What: /sys/class/switchtec
Date: 05-Jan-2017

View File

@@ -125,12 +125,6 @@ Description:
The EUI-48 of this device in colon separated hex
octets.
What: /sys/class/uwb_rc/uwbN/<EUI-48>/BPST
Date: July 2008
KernelVersion: 2.6.27
Contact: linux-usb@vger.kernel.org
Description:
What: /sys/class/uwb_rc/uwbN/<EUI-48>/IEs
Date: July 2008
KernelVersion: 2.6.27

View File

@@ -34,7 +34,7 @@ Description: CPU topology files that describe kernel limits related to
present: cpus that have been identified as being present in
the system.
See Documentation/cputopology.txt for more information.
See Documentation/admin-guide/cputopology.rst for more information.
What: /sys/devices/system/cpu/probe
@@ -103,7 +103,7 @@ Description: CPU topology files that describe a logical CPU's relationship
thread_siblings_list: human-readable list of cpu#'s hardware
threads within the same core as cpu#
See Documentation/cputopology.txt for more information.
See Documentation/admin-guide/cputopology.rst for more information.
What: /sys/devices/system/cpu/cpuidle/current_driver
@@ -137,7 +137,8 @@ Description: Discover cpuidle policy and mechanism
current_governor: (RW) displays current idle policy. Users can
switch the governor at runtime by writing to this file.
See files in Documentation/cpuidle/ for more information.
See Documentation/admin-guide/pm/cpuidle.rst and
Documentation/driver-api/pm/cpuidle.rst for more information.
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/name
@@ -538,3 +539,26 @@ Description: Intel Energy and Performance Bias Hint (EPB)
This attribute is present for all online CPUs supporting the
Intel EPB feature.
What: /sys/devices/system/cpu/umwait_control
/sys/devices/system/cpu/umwait_control/enable_c02
/sys/devices/system/cpu/umwait_control/max_time
Date: May 2019
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Umwait control
enable_c02: Read/write interface to control umwait C0.2 state
Read returns C0.2 state status:
0: C0.2 is disabled
1: C0.2 is enabled
Write 'y' or '1' or 'on' to enable C0.2 state.
Write 'n' or '0' or 'off' to disable C0.2 state.
The interface is case insensitive.
max_time: Read/write interface to control umwait maximum time
in TSC-quanta that the CPU can reside in either C0.1
or C0.2 state. The time is an unsigned 32-bit number.
Note that a value of zero means there is no limit.
Low order two bits must be zero.

View File

@@ -1,6 +1,6 @@
What: /sys/bus/pci/drivers/altera-cvp/chkcfg
Date: May 2017
Kernel Version: 4.13
KernelVersion: 4.13
Contact: Anatolij Gustschin <agust@denx.de>
Description:
Contains either 1 or 0 and controls if configuration

View File

@@ -62,18 +62,20 @@ What: /sys/class/habanalabs/hl<n>/ic_clk
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Allows the user to set the maximum clock frequency of the
Interconnect fabric. Writes to this parameter affect the device
only when the power management profile is set to "manual" mode.
The device IC clock might be set to lower value then the
Description: Allows the user to set the maximum clock frequency, in Hz, of
the Interconnect fabric. Writes to this parameter affect the
device only when the power management profile is set to "manual"
mode. The device IC clock might be set to lower value than the
maximum. The user should read the ic_clk_curr to see the actual
frequency value of the IC
frequency value of the IC. This property is valid only for the
Goya ASIC family
What: /sys/class/habanalabs/hl<n>/ic_clk_curr
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Displays the current clock frequency of the Interconnect fabric
Description: Displays the current clock frequency, in Hz, of the Interconnect
fabric. This property is valid only for the Goya ASIC family
What: /sys/class/habanalabs/hl<n>/infineon_ver
Date: Jan 2019
@@ -92,18 +94,20 @@ What: /sys/class/habanalabs/hl<n>/mme_clk
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Allows the user to set the maximum clock frequency of the
MME compute engine. Writes to this parameter affect the device
only when the power management profile is set to "manual" mode.
The device MME clock might be set to lower value then the
Description: Allows the user to set the maximum clock frequency, in Hz, of
the MME compute engine. Writes to this parameter affect the
device only when the power management profile is set to "manual"
mode. The device MME clock might be set to lower value than the
maximum. The user should read the mme_clk_curr to see the actual
frequency value of the MME
frequency value of the MME. This property is valid only for the
Goya ASIC family
What: /sys/class/habanalabs/hl<n>/mme_clk_curr
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Displays the current clock frequency of the MME compute engine
Description: Displays the current clock frequency, in Hz, of the MME compute
engine. This property is valid only for the Goya ASIC family
What: /sys/class/habanalabs/hl<n>/pci_addr
Date: Jan 2019
@@ -163,18 +167,20 @@ What: /sys/class/habanalabs/hl<n>/tpc_clk
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Allows the user to set the maximum clock frequency of the
TPC compute engines. Writes to this parameter affect the device
only when the power management profile is set to "manual" mode.
The device TPC clock might be set to lower value then the
Description: Allows the user to set the maximum clock frequency, in Hz, of
the TPC compute engines. Writes to this parameter affect the
device only when the power management profile is set to "manual"
mode. The device TPC clock might be set to lower value than the
maximum. The user should read the tpc_clk_curr to see the actual
frequency value of the TPC
frequency value of the TPC. This property is valid only for
Goya ASIC family
What: /sys/class/habanalabs/hl<n>/tpc_clk_curr
Date: Jan 2019
KernelVersion: 5.1
Contact: oded.gabbay@gmail.com
Description: Displays the current clock frequency of the TPC compute engines
Description: Displays the current clock frequency, in Hz, of the TPC compute
engines. This property is valid only for the Goya ASIC family
What: /sys/class/habanalabs/hl<n>/uboot_ver
Date: Jan 2019

View File

@@ -1,6 +1,6 @@
What: For USB devices : /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
For BT devices : /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
Symlink : /sys/class/hidraw/hidraw<num>/device/report_descriptor
What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
What: /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/report_descriptor
What: /sys/class/hidraw/hidraw<num>/device/report_descriptor
Date: Jan 2011
KernelVersion: 2.0.39
Contact: Alan Ott <alan@signal11.us>
@@ -9,9 +9,9 @@ Description: When read, this file returns the device's raw binary HID
This file cannot be written.
Users: HIDAPI library (http://www.signal11.us/oss/hidapi)
What: For USB devices : /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
For BT devices : /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
Symlink : /sys/class/hidraw/hidraw<num>/device/country
What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
What: /sys/class/bluetooth/hci<addr>/<hid-bus>:<vendor-id>:<product-id>.<num>/country
What: /sys/class/hidraw/hidraw<num>/device/country
Date: February 2015
KernelVersion: 3.19
Contact: Olivier Gay <ogay@logitech.com>

View File

@@ -5,7 +5,7 @@ Description: It is possible to switch the dpi setting of the mouse with the
press of a button.
When read, this file returns the raw number of the actual dpi
setting reported by the mouse. This number has to be further
processed to receive the real dpi value.
processed to receive the real dpi value:
VALUE DPI
1 800

View File

@@ -1,6 +1,6 @@
What: /sys/class/tpm/tpmX/ppi/
Date: August 2012
Kernel Version: 3.6
KernelVersion: 3.6
Contact: xiaoyan.zhang@intel.com
Description:
This folder includes the attributes related with PPI (Physical

View File

@@ -1,6 +1,6 @@
What: /sys/bus/scsi/drivers/st/debug_flag
Date: October 2015
Kernel Version: ?.?
KernelVersion: ?.?
Contact: shane.seymour@hpe.com
Description:
This file allows you to turn debug output from the st driver

View File

@@ -1,6 +1,6 @@
What: /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/speed
Date: April 2010
Kernel Version: 2.6.35
KernelVersion: 2.6.35
Contact: linux-bluetooth@vger.kernel.org
Description:
The /sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/speed file

View File

@@ -243,3 +243,11 @@ Description:
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
What: /sys/fs/f2fs/<disk>/unusable
Date April 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description:
If checkpoint=disable, it displays the number of blocks that are unusable.
If checkpoint=enable it displays the enumber of blocks that would be unusable
if checkpoint=disable were to be set.

View File

@@ -2,7 +2,7 @@ What: /sys/kernel/fscaps
Date: February 2011
KernelVersion: 2.6.38
Contact: Ludwig Nussel <ludwig.nussel@suse.de>
Description
Description:
Shows whether file system capabilities are honored
when executing a binary

View File

@@ -24,3 +24,12 @@ Description: /sys/kernel/iommu_groups/reserved_regions list IOVA
region is described on a single line: the 1st field is
the base IOVA, the second is the end IOVA and the third
field describes the type of the region.
What: /sys/kernel/iommu_groups/reserved_regions
Date: June 2019
KernelVersion: v5.3
Contact: Eric Auger <eric.auger@redhat.com>
Description: In case an RMRR is used only by graphics or USB devices
it is now exposed as "direct-relaxable" instead of "direct".
In device assignment use case, for instance, those RMRR
are considered to be relaxable and safe.

View File

@@ -11,4 +11,4 @@ Description:
example would be, if User A has shares = 1024 and user
B has shares = 2048, User B will get twice the CPU
bandwidth user A will. For more details refer
Documentation/scheduler/sched-design-CFS.txt
Documentation/scheduler/sched-design-CFS.rst

View File

@@ -4,7 +4,7 @@ KernelVersion: 2.6.24
Contact: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Kexec Mailing List <kexec@lists.infradead.org>
Vivek Goyal <vgoyal@redhat.com>
Description
Description:
Shows physical address and size of vmcoreinfo ELF note.
First value contains physical address of note in hex and
second value contains the size of note in hex. This ELF

View File

@@ -31,7 +31,7 @@ Description:
To control the LED display, use the following :
echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
where T control the 3 letters display, and DDD the 3 digits display.
The DDD table can be found in Documentation/laptops/asus-laptop.txt
The DDD table can be found in Documentation/admin-guide/laptops/asus-laptop.rst
What: /sys/devices/platform/asus_laptop/bluetooth
Date: January 2007

View File

@@ -36,3 +36,13 @@ KernelVersion: 3.5
Contact: "AceLan Kao" <acelan.kao@canonical.com>
Description:
Resume on lid open. 1 means on, 0 means off.
What: /sys/devices/platform/<platform>/fan_boost_mode
Date: Sep 2019
KernelVersion: 5.3
Contact: "Yurii Pavlovskyi" <yurii.pavlovskyi@gmail.com>
Description:
Fan boost mode:
* 0 - normal,
* 1 - overboost,
* 2 - silent

View File

@@ -1,7 +1,7 @@
What: /sys/devices/platform/<i2c-demux-name>/available_masters
Date: January 2016
KernelVersion: 4.6
Contact: Wolfram Sang <wsa@the-dreams.de>
Contact: Wolfram Sang <wsa+renesas@sang-engineering.com>
Description:
Reading the file will give you a list of masters which can be
selected for a demultiplexed bus. The format is
@@ -12,7 +12,7 @@ Description:
What: /sys/devices/platform/<i2c-demux-name>/current_master
Date: January 2016
KernelVersion: 4.6
Contact: Wolfram Sang <wsa@the-dreams.de>
Contact: Wolfram Sang <wsa+renesas@sang-engineering.com>
Description:
This file selects/shows the active I2C master for a demultiplexed
bus. It uses the <index> value from the file 'available_masters'.

View File

@@ -0,0 +1,40 @@
What: /sys/bus/platform/devices/GOOG000C\:00/boot_on_ac
Date: April 2019
KernelVersion: 5.3
Description:
Boot on AC is a policy which makes the device boot from S5
when AC power is connected. This is useful for users who
want to run their device headless or with a dock.
Input should be parseable by kstrtou8() to 0 or 1.
What: /sys/bus/platform/devices/GOOG000C\:00/build_date
Date: May 2019
KernelVersion: 5.3
Description:
Display Wilco Embedded Controller firmware build date.
Output will a MM/DD/YY string.
What: /sys/bus/platform/devices/GOOG000C\:00/build_revision
Date: May 2019
KernelVersion: 5.3
Description:
Display Wilco Embedded Controller build revision.
Output will a version string be similar to the example below:
d2592cae0
What: /sys/bus/platform/devices/GOOG000C\:00/model_number
Date: May 2019
KernelVersion: 5.3
Description:
Display Wilco Embedded Controller model number.
Output will a version string be similar to the example below:
08B6
What: /sys/bus/platform/devices/GOOG000C\:00/version
Date: May 2019
KernelVersion: 5.3
Description:
Display Wilco Embedded Controller firmware version.
The format of the string is x.y.z. Where x is major, y is minor
and z is the build number. For example: 95.00.06

View File

@@ -300,4 +300,4 @@ Description:
attempt.
Using this sysfs file will override any values that were
set using the kernel command line for disk offset.
set using the kernel command line for disk offset.

View File

@@ -212,7 +212,7 @@ The standard 64-bit addressing device would do something like this::
If the device only supports 32-bit addressing for descriptors in the
coherent allocations, but supports full 64-bits for streaming mappings
it would look like this:
it would look like this::
if (dma_set_mask(dev, DMA_BIT_MASK(64))) {
dev_warn(dev, "mydev: No suitable DMA available\n");

View File

@@ -198,7 +198,7 @@ call to set the mask to the value returned.
::
size_t
dma_direct_max_mapping_size(struct device *dev);
dma_max_mapping_size(struct device *dev);
Returns the maximum size of a mapping for the device. The size parameter
of the mapping functions like dma_map_single(), dma_map_page() and

13
Documentation/Kconfig Normal file
View File

@@ -0,0 +1,13 @@
config WARN_MISSING_DOCUMENTS
bool "Warn if there's a missing documentation file"
depends on COMPILE_TEST
help
It is not uncommon that a document gets renamed.
This option makes the Kernel to check for missing dependencies,
warning when something is missing. Works only if the Kernel
is built from a git tree.
If unsure, select 'N'.

View File

@@ -4,6 +4,11 @@
subdir-y := devicetree/bindings/
# Check for broken documentation file references
ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
$(shell $(srctree)/scripts/documentation-file-ref-check --warn)
endif
# You can set these variables from the command line.
SPHINXBUILD = sphinx-build
SPHINXOPTS =
@@ -23,11 +28,13 @@ ifeq ($(HAVE_SPHINX),0)
.DEFAULT:
$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
@echo
@./scripts/sphinx-pre-install
@$(srctree)/scripts/sphinx-pre-install
@echo " SKIP Sphinx $@ target."
else # HAVE_SPHINX
export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN')
# User-friendly check for pdflatex and latexmk
HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
@@ -70,12 +77,14 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
$(abspath $(BUILDDIR)/$3/$4)
htmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
linkcheckdocs:
@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
latexdocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
ifeq ($(HAVE_PDFLATEX),0)
@@ -87,14 +96,17 @@ pdfdocs:
else # HAVE_PDFLATEX
pdfdocs: latexdocs
@$(srctree)/scripts/sphinx-pre-install --version-check
$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
endif # HAVE_PDFLATEX
epubdocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
xmldocs:
@$(srctree)/scripts/sphinx-pre-install --version-check
@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
endif # HAVE_SPHINX

View File

@@ -1,4 +1,8 @@
ACPI considerations for PCI host bridges
.. SPDX-License-Identifier: GPL-2.0
========================================
ACPI considerations for PCI host bridges
========================================
The general rule is that the ACPI namespace should describe everything the
OS might use unless there's another way for the OS to find it [1, 2].
@@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge
[4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
General Flags: Bit [0] Ignored
General Flags: Bit [0] Ignored
Extended Address Space Descriptor (.4)
General Flags: Bit [0] Consumer/Producer:
1This device consumes this resource
0This device produces and consumes this resource
General Flags: Bit [0] Consumer/Producer:
* 1 This device consumes this resource
* 0 This device produces and consumes this resource
[5] ACPI 6.2, sec 19.6.43:
ResourceUsage specifies whether the Memory range is consumed by

View File

@@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
======================
PCI Endpoint Framework
======================
.. toctree::
:maxdepth: 2
pci-endpoint
pci-endpoint-cfs
pci-test-function
pci-test-howto

View File

@@ -1,41 +1,51 @@
CONFIGURING PCI ENDPOINT USING CONFIGFS
Kishon Vijay Abraham I <kishon@ti.com>
.. SPDX-License-Identifier: GPL-2.0
=======================================
Configuring PCI Endpoint Using CONFIGFS
=======================================
:Author: Kishon Vijay Abraham I <kishon@ti.com>
The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
PCI endpoint function and to bind the endpoint function
with the endpoint controller. (For introducing other mechanisms to
configure the PCI Endpoint Function refer to [1]).
*) Mounting configfs
Mounting configfs
=================
The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
directory. configfs can be mounted using the following command.
directory. configfs can be mounted using the following command::
mount -t configfs none /sys/kernel/config
*) Directory Structure
Directory Structure
===================
The pci_ep configfs has two directories at its root: controllers and
functions. Every EPC device present in the system will have an entry in
the *controllers* directory and and every EPF driver present in the system
will have an entry in the *functions* directory.
::
/sys/kernel/config/pci_ep/
.. controllers/
.. functions/
/sys/kernel/config/pci_ep/
.. controllers/
.. functions/
*) Creating EPF Device
Creating EPF Device
===================
Every registered EPF driver will be listed in controllers directory. The
entries corresponding to EPF driver will be created by the EPF core.
::
/sys/kernel/config/pci_ep/functions/
.. <EPF Driver1>/
... <EPF Device 11>/
... <EPF Device 21>/
.. <EPF Driver2>/
... <EPF Device 12>/
... <EPF Device 22>/
/sys/kernel/config/pci_ep/functions/
.. <EPF Driver1>/
... <EPF Device 11>/
... <EPF Device 21>/
.. <EPF Driver2>/
... <EPF Device 12>/
... <EPF Device 22>/
In order to create a <EPF device> of the type probed by <EPF Driver>, the
user has to create a directory inside <EPF DriverN>.
@@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
used to configure the standard configuration header of the endpoint function.
(These entries are created by the framework when any new <EPF Device> is
created)
::
.. <EPF Driver1>/
... <EPF Device 11>/
... vendorid
... deviceid
... revid
... progif_code
... subclass_code
... baseclass_code
... cache_line_size
... subsys_vendor_id
... subsys_id
... interrupt_pin
.. <EPF Driver1>/
... <EPF Device 11>/
... vendorid
... deviceid
... revid
... progif_code
... subclass_code
... baseclass_code
... cache_line_size
... subsys_vendor_id
... subsys_id
... interrupt_pin
*) EPC Device
EPC Device
==========
Every registered EPC device will be listed in controllers directory. The
entries corresponding to EPC device will be created by the EPC core.
::
/sys/kernel/config/pci_ep/controllers/
.. <EPC Device1>/
... <Symlink EPF Device11>/
... <Symlink EPF Device12>/
... start
.. <EPC Device2>/
... <Symlink EPF Device21>/
... <Symlink EPF Device22>/
... start
/sys/kernel/config/pci_ep/controllers/
.. <EPC Device1>/
... <Symlink EPF Device11>/
... <Symlink EPF Device12>/
... start
.. <EPC Device2>/
... <Symlink EPF Device21>/
... <Symlink EPF Device22>/
... start
The <EPC Device> directory will have a list of symbolic links to
<EPF Device>. These symbolic links should be created by the user to
@@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
"1" is written to this field, the endpoint device will be ready to
establish the link with the host. This is usually done after
all the EPF devices are created and linked with the EPC device.
::
| controllers/
| <Directory: EPC name>/
@@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
| interrupt_pin
| function
[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
[1] :doc:`pci-endpoint`

View File

@@ -1,11 +1,13 @@
PCI ENDPOINT FRAMEWORK
Kishon Vijay Abraham I <kishon@ti.com>
.. SPDX-License-Identifier: GPL-2.0
:Author: Kishon Vijay Abraham I <kishon@ti.com>
This document is a guide to use the PCI Endpoint Framework in order to create
endpoint controller driver, endpoint function driver, and using configfs
interface to bind the function driver to the controller driver.
1. Introduction
Introduction
============
Linux has a comprehensive PCI subsystem to support PCI controllers that
operates in Root Complex mode. The subsystem has capability to scan PCI bus,
@@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an
EP system which can have a wide variety of use cases from testing or
validation, co-processor accelerator, etc.
2. PCI Endpoint Core
PCI Endpoint Core
=================
The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
library, the Endpoint Function library, and the configfs layer to bind the
endpoint function with the endpoint controller.
2.1 PCI Endpoint Controller(EPC) Library
PCI Endpoint Controller(EPC) Library
------------------------------------
The EPC library provides APIs to be used by the controller that can operate
in endpoint mode. It also provides APIs to be used by function driver/library
in order to implement a particular endpoint function.
2.1.1 APIs for the PCI controller Driver
APIs for the PCI controller Driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI controller driver.
*) devm_pci_epc_create()/pci_epc_create()
* devm_pci_epc_create()/pci_epc_create()
The PCI controller driver should implement the following ops:
* write_header: ops to populate configuration space header
* set_bar: ops to configure the BAR
* clear_bar: ops to reset the BAR
@@ -51,110 +57,116 @@ by the PCI controller driver.
The PCI controller driver can then create a new EPC device by invoking
devm_pci_epc_create()/pci_epc_create().
*) devm_pci_epc_destroy()/pci_epc_destroy()
* devm_pci_epc_destroy()/pci_epc_destroy()
The PCI controller driver can destroy the EPC device created by either
devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
pci_epc_destroy().
*) pci_epc_linkup()
* pci_epc_linkup()
In order to notify all the function devices that the EPC device to which
they are linked has established a link with the host, the PCI controller
driver should invoke pci_epc_linkup().
*) pci_epc_mem_init()
* pci_epc_mem_init()
Initialize the pci_epc_mem structure used for allocating EPC addr space.
*) pci_epc_mem_exit()
* pci_epc_mem_exit()
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
2.1.2 APIs for the PCI Endpoint Function Driver
APIs for the PCI Endpoint Function Driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
*) pci_epc_write_header()
* pci_epc_write_header()
The PCI endpoint function driver should use pci_epc_write_header() to
write the standard configuration header to the endpoint controller.
*) pci_epc_set_bar()
* pci_epc_set_bar()
The PCI endpoint function driver should use pci_epc_set_bar() to configure
the Base Address Register in order for the host to assign PCI addr space.
Register space of the function driver is usually configured
using this API.
*) pci_epc_clear_bar()
* pci_epc_clear_bar()
The PCI endpoint function driver should use pci_epc_clear_bar() to reset
the BAR.
*) pci_epc_raise_irq()
* pci_epc_raise_irq()
The PCI endpoint function driver should use pci_epc_raise_irq() to raise
Legacy Interrupt, MSI or MSI-X Interrupt.
*) pci_epc_mem_alloc_addr()
* pci_epc_mem_alloc_addr()
The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
allocate memory address from EPC addr space which is required to access
RC's buffer
*) pci_epc_mem_free_addr()
* pci_epc_mem_free_addr()
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
free the memory space allocated using pci_epc_mem_alloc_addr().
2.1.3 Other APIs
Other APIs
~~~~~~~~~~
There are other APIs provided by the EPC library. These are used for binding
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
using these APIs.
*) pci_epc_get()
* pci_epc_get()
Get a reference to the PCI endpoint controller based on the device name of
the controller.
*) pci_epc_put()
* pci_epc_put()
Release the reference to the PCI endpoint controller obtained using
pci_epc_get()
*) pci_epc_add_epf()
* pci_epc_add_epf()
Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
can have up to 8 functions according to the specification.
*) pci_epc_remove_epf()
* pci_epc_remove_epf()
Remove the PCI endpoint function from PCI endpoint controller.
*) pci_epc_start()
* pci_epc_start()
The PCI endpoint function driver should invoke pci_epc_start() once it
has configured the endpoint function and wants to start the PCI link.
*) pci_epc_stop()
* pci_epc_stop()
The PCI endpoint function driver should invoke pci_epc_stop() to stop
the PCI LINK.
2.2 PCI Endpoint Function(EPF) Library
PCI Endpoint Function(EPF) Library
----------------------------------
The EPF library provides APIs to be used by the function driver and the EPC
library to provide endpoint mode functionality.
2.2.1 APIs for the PCI Endpoint Function Driver
APIs for the PCI Endpoint Function Driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
*) pci_epf_register_driver()
* pci_epf_register_driver()
The PCI Endpoint Function driver should implement the following ops:
* bind: ops to perform when a EPC device has been bound to EPF device
@@ -166,50 +178,54 @@ by the PCI endpoint function driver.
The PCI Function driver can then register the PCI EPF driver by using
pci_epf_register_driver().
*) pci_epf_unregister_driver()
* pci_epf_unregister_driver()
The PCI Function driver can unregister the PCI EPF driver by using
pci_epf_unregister_driver().
*) pci_epf_alloc_space()
* pci_epf_alloc_space()
The PCI Function driver can allocate space for a particular BAR using
pci_epf_alloc_space().
*) pci_epf_free_space()
* pci_epf_free_space()
The PCI Function driver can free the allocated space
(using pci_epf_alloc_space) by invoking pci_epf_free_space().
2.2.2 APIs for the PCI Endpoint Controller Library
APIs for the PCI Endpoint Controller Library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint controller library.
*) pci_epf_linkup()
* pci_epf_linkup()
The PCI endpoint controller library invokes pci_epf_linkup() when the
EPC device has established the connection to the host.
2.2.2 Other APIs
Other APIs
~~~~~~~~~~
There are other APIs provided by the EPF library. These are used to notify
the function driver when the EPF device is bound to the EPC device.
pci-ep-cfs.c can be used as reference for using these APIs.
*) pci_epf_create()
* pci_epf_create()
Create a new PCI EPF device by passing the name of the PCI EPF device.
This name will be used to bind the the EPF device to a EPF driver.
*) pci_epf_destroy()
* pci_epf_destroy()
Destroy the created PCI EPF device.
*) pci_epf_bind()
* pci_epf_bind()
pci_epf_bind() should be invoked when the EPF device has been bound to
a EPC device.
*) pci_epf_unbind()
* pci_epf_unbind()
pci_epf_unbind() should be invoked when the binding between EPC device
and EPF device is lost.

View File

@@ -1,5 +1,10 @@
PCI TEST
Kishon Vijay Abraham I <kishon@ti.com>
.. SPDX-License-Identifier: GPL-2.0
=================
PCI Test Function
=================
:Author: Kishon Vijay Abraham I <kishon@ti.com>
Traditionally PCI RC has always been validated by using standard
PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
@@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers:
8) PCI_ENDPOINT_TEST_IRQ_TYPE
9) PCI_ENDPOINT_TEST_IRQ_NUMBER
*) PCI_ENDPOINT_TEST_MAGIC
* PCI_ENDPOINT_TEST_MAGIC
This register will be used to test BAR0. A known pattern will be written
and read back from MAGIC register to verify BAR0.
*) PCI_ENDPOINT_TEST_COMMAND:
* PCI_ENDPOINT_TEST_COMMAND
This register will be used by the host driver to indicate the function
that the endpoint device must perform.
Bitfield Description:
Bit 0 : raise legacy IRQ
Bit 1 : raise MSI IRQ
Bit 2 : raise MSI-X IRQ
Bit 3 : read command (read data from RC buffer)
Bit 4 : write command (write data to RC buffer)
Bit 5 : copy command (copy data from one RC buffer to another
RC buffer)
======== ================================================================
Bitfield Description
======== ================================================================
Bit 0 raise legacy IRQ
Bit 1 raise MSI IRQ
Bit 2 raise MSI-X IRQ
Bit 3 read command (read data from RC buffer)
Bit 4 write command (write data to RC buffer)
Bit 5 copy command (copy data from one RC buffer to another RC buffer)
======== ================================================================
*) PCI_ENDPOINT_TEST_STATUS
* PCI_ENDPOINT_TEST_STATUS
This register reflects the status of the PCI endpoint device.
Bitfield Description:
Bit 0 : read success
Bit 1 : read fail
Bit 2 : write success
Bit 3 : write fail
Bit 4 : copy success
Bit 5 : copy fail
Bit 6 : IRQ raised
Bit 7 : source address is invalid
Bit 8 : destination address is invalid
======== ==============================
Bitfield Description
======== ==============================
Bit 0 read success
Bit 1 read fail
Bit 2 write success
Bit 3 write fail
Bit 4 copy success
Bit 5 copy fail
Bit 6 IRQ raised
Bit 7 source address is invalid
Bit 8 destination address is invalid
======== ==============================
*) PCI_ENDPOINT_TEST_SRC_ADDR
* PCI_ENDPOINT_TEST_SRC_ADDR
This register contains the source address (RC buffer address) for the
COPY/READ command.
*) PCI_ENDPOINT_TEST_DST_ADDR
* PCI_ENDPOINT_TEST_DST_ADDR
This register contains the destination address (RC buffer address) for
the COPY/WRITE command.
*) PCI_ENDPOINT_TEST_IRQ_TYPE
* PCI_ENDPOINT_TEST_IRQ_TYPE
This register contains the interrupt type (Legacy/MSI) triggered
for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
Possible types:
- Legacy : 0
- MSI : 1
- MSI-X : 2
*) PCI_ENDPOINT_TEST_IRQ_NUMBER
====== ==
Legacy 0
MSI 1
MSI-X 2
====== ==
* PCI_ENDPOINT_TEST_IRQ_NUMBER
This register contains the triggered ID interrupt.
Admissible values:
- Legacy : 0
- MSI : [1 .. 32]
- MSI-X : [1 .. 2048]
====== ===========
Legacy 0
MSI [1 .. 32]
MSI-X [1 .. 2048]
====== ===========

View File

@@ -1,38 +1,51 @@
PCI TEST USERGUIDE
Kishon Vijay Abraham I <kishon@ti.com>
.. SPDX-License-Identifier: GPL-2.0
===================
PCI Test User Guide
===================
:Author: Kishon Vijay Abraham I <kishon@ti.com>
This document is a guide to help users use pci-epf-test function driver
and pci_endpoint_test host driver for testing PCI. The list of steps to
be followed in the host side and EP side is given below.
1. Endpoint Device
Endpoint Device
===============
1.1 Endpoint Controller Devices
Endpoint Controller Devices
---------------------------
To find the list of endpoint controller devices in the system:
To find the list of endpoint controller devices in the system::
# ls /sys/class/pci_epc/
51000000.pcie_ep
If PCI_ENDPOINT_CONFIGFS is enabled
If PCI_ENDPOINT_CONFIGFS is enabled::
# ls /sys/kernel/config/pci_ep/controllers
51000000.pcie_ep
1.2 Endpoint Function Drivers
To find the list of endpoint function drivers in the system:
Endpoint Function Drivers
-------------------------
To find the list of endpoint function drivers in the system::
# ls /sys/bus/pci-epf/drivers
pci_epf_test
If PCI_ENDPOINT_CONFIGFS is enabled
If PCI_ENDPOINT_CONFIGFS is enabled::
# ls /sys/kernel/config/pci_ep/functions
pci_epf_test
1.3 Creating pci-epf-test Device
Creating pci-epf-test Device
----------------------------
PCI endpoint function device can be created using the configfs. To create
pci-epf-test device, the following commands can be used
pci-epf-test device, the following commands can be used::
# mount -t configfs none /sys/kernel/config
# cd /sys/kernel/config/pci_ep/
@@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
be probed by pci_epf_test driver.
The PCI endpoint framework populates the directory with the following
configurable fields.
configurable fields::
# ls functions/pci_epf_test/func1
baseclass_code interrupt_pin progif_code subsys_id
@@ -51,67 +64,83 @@ configurable fields.
The PCI endpoint function driver populates these entries with default values
when the device is bound to the driver. The pci-epf-test driver populates
vendorid with 0xffff and interrupt_pin with 0x0001
vendorid with 0xffff and interrupt_pin with 0x0001::
# cat functions/pci_epf_test/func1/vendorid
0xffff
# cat functions/pci_epf_test/func1/interrupt_pin
0x0001
1.4 Configuring pci-epf-test Device
Configuring pci-epf-test Device
-------------------------------
The user can configure the pci-epf-test device using configfs entry. In order
to change the vendorid and the number of MSI interrupts used by the function
device, the following commands can be used.
device, the following commands can be used::
# echo 0x104c > functions/pci_epf_test/func1/vendorid
# echo 0xb500 > functions/pci_epf_test/func1/deviceid
# echo 16 > functions/pci_epf_test/func1/msi_interrupts
# echo 8 > functions/pci_epf_test/func1/msix_interrupts
1.5 Binding pci-epf-test Device to EP Controller
Binding pci-epf-test Device to EP Controller
--------------------------------------------
In order for the endpoint function device to be useful, it has to be bound to
a PCI endpoint controller driver. Use the configfs to bind the function
device to one of the controller driver present in the system.
device to one of the controller driver present in the system::
# ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/
Once the above step is completed, the PCI endpoint is ready to establish a link
with the host.
1.6 Start the Link
Start the Link
--------------
In order for the endpoint device to establish a link with the host, the _start_
field should be populated with '1'.
field should be populated with '1'::
# echo 1 > controllers/51000000.pcie_ep/start
2. RootComplex Device
2.1 lspci Output
RootComplex Device
==================
Note that the devices listed here correspond to the value populated in 1.4 above
lspci Output
------------
Note that the devices listed here correspond to the value populated in 1.4
above::
00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
01:00.0 Unassigned class [ff00]: Texas Instruments Device b500
2.2 Using Endpoint Test function Device
Using Endpoint Test function Device
-----------------------------------
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
tests. To compile this tool the following commands should be used:
tests. To compile this tool the following commands should be used::
# cd <kernel-dir>
# make -C tools/pci
or if you desire to compile and install in your system:
or if you desire to compile and install in your system::
# cd <kernel-dir>
# make -C tools/pci install
The tool and script will be located in <rootfs>/usr/bin/
2.2.1 pcitest.sh Output
pcitest.sh Output
~~~~~~~~~~~~~~~~~
::
# pcitest.sh
BAR tests

View File

@@ -0,0 +1,18 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
Linux PCI Bus Subsystem
=======================
.. toctree::
:maxdepth: 2
:numbered:
pci
picebus-howto
pci-iov-howto
msi-howto
acpi-info
pci-error-recovery
pcieaer-howto
endpoint/index

View File

@@ -1,13 +1,16 @@
The MSI Driver Guide HOWTO
Tom L Nguyen tom.l.nguyen@intel.com
10/03/2003
Revised Feb 12, 2004 by Martine Silbermann
email: Martine.Silbermann@hp.com
Revised Jun 25, 2004 by Tom L Nguyen
Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
Copyright 2003, 2008 Intel Corporation
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
1. About this guide
==========================
The MSI Driver Guide HOWTO
==========================
:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox
:Copyright: 2003, 2008 Intel Corporation
About this guide
================
This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
@@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
try if a device doesn't support MSIs.
2. What are MSIs?
What are MSIs?
==============
A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.
@@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
a time.
3. Why use MSIs?
Why use MSIs?
=============
There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.
@@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.
4. How to use MSIs
How to use MSIs
===============
PCI devices are initialised to use pin-based interrupts. The device
driver has to set up the device to use MSI or MSI-X. Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.
4.1 Include kernel support for MSIs
Include kernel support for MSIs
-------------------------------
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled. This option is only available on some architectures,
@@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.
4.2 Using MSI
Using MSI
---------
Most of the hard work is done for the driver in the PCI layer. The driver
simply has to request that the PCI layer set up the MSI capability for this
device.
To automatically use MSI or MSI-X interrupt vectors, use the following
function:
function::
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags);
@@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
vectors, use the following function:
vectors, use the following function::
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
Any allocated resources should be freed before removing the device using
the following function:
the following function::
void pci_free_irq_vectors(struct pci_dev *dev);
@@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device. If nvec is
larger than the number supported by the device it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand:
vectors supported beforehand::
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES)
if (nvec < 0)
@@ -135,7 +143,7 @@ vectors supported beforehand:
If a driver is unable or unwilling to deal with a variable number of MSI
interrupts it can request a particular number of interrupts by passing that
number to pci_alloc_irq_vectors() function as both 'min_vecs' and
'max_vecs' parameters:
'max_vecs' parameters::
ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
if (ret < 0)
@@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
The most notorious example of the request type described above is enabling
the single MSI mode for a device. It could be done by passing two 1s as
'min_vecs' and 'max_vecs':
'min_vecs' and 'max_vecs'::
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
if (ret < 0)
goto out_err;
Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable:
the driver can specify that only MSI or MSI-X is acceptable::
nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
if (nvec < 0)
goto out_err;
4.3 Legacy APIs
Legacy APIs
-----------
The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code:
not be used in new code::
pci_enable_msi() /* deprecated */
pci_disable_msi() /* deprecated */
@@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count
of vectors we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.
4.4 Considerations when using MSIs
Considerations when using MSIs
------------------------------
4.4.1 Spinlocks
Spinlocks
~~~~~~~~~
Most device drivers have a per-device spinlock which is taken in the
interrupt handler. With pin-based interrupts or a single MSI, it is not
@@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
and acquire the lock (see Documentation/kernel-hacking/locking.rst).
4.5 How to tell whether MSI/MSI-X is enabled on a device
How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
@@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
or "-" (disabled).
5. MSI quirks
MSI quirks
==========
Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:
@@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs:
2. on all devices behind a specific bridge
3. on a single device
5.1. Disabling MSIs globally
Disabling MSIs globally
-----------------------
Some host chipsets simply don't support MSIs properly. If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
@@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including a full 'lspci -v' so we can add the quirks to the kernel.
5.2. Disabling MSIs below a bridge
Disabling MSIs below a bridge
-----------------------------
Some PCI bridges are not able to route MSIs between busses properly.
In this case, MSIs must be disabled on all devices behind the bridge.
@@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable
MSIs in configuration space using whatever method you know works, then
enable MSIs on that bridge by doing:
enable MSIs on that bridge by doing::
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
@@ -244,7 +259,8 @@ below this bridge.
Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.
5.3. Disabling MSIs on a single device
Disabling MSIs on a single device
---------------------------------
Some devices are known to have faulty MSI implementations. Usually this
is handled in the individual device driver, but occasionally it's necessary
@@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use
of MSI. While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.
5.4. Finding why MSIs are disabled on a device
Finding why MSIs are disabled on a device
-----------------------------------------
From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device. Your first step should
@@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine. You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.
Then, 'lspci -t' gives the list of bridges above a device. Reading
/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
Then, 'lspci -t' gives the list of bridges above a device. Reading
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
or disabled (0). If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.

View File

@@ -1,12 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
PCI Error Recovery
------------------
February 2, 2006
==================
PCI Error Recovery
==================
Current document maintainer:
Linas Vepstas <linasvepstas@gmail.com>
updated by Richard Lary <rlary@us.ibm.com>
and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
:Authors: - Linas Vepstas <linasvepstas@gmail.com>
- Richard Lary <rlary@us.ibm.com>
- Mike Mason <mmlnx@us.ibm.com>
Many PCI bus controllers are able to detect a variety of hardware
@@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
Detailed Design
---------------
===============
Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
@@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent. The
arch/powerpc implementation will simulate a PCI hotplug remove/add.
This structure has the form:
struct pci_error_handlers
{
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
int (*mmio_enabled)(struct pci_dev *dev);
int (*slot_reset)(struct pci_dev *dev);
void (*resume)(struct pci_dev *dev);
};
This structure has the form::
The possible channel states are:
enum pci_channel_state {
pci_channel_io_normal, /* I/O channel is in normal state */
pci_channel_io_frozen, /* I/O to channel is blocked */
pci_channel_io_perm_failure, /* PCI card is dead */
};
struct pci_error_handlers
{
int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
int (*mmio_enabled)(struct pci_dev *dev);
int (*slot_reset)(struct pci_dev *dev);
void (*resume)(struct pci_dev *dev);
};
Possible return values are:
enum pci_ers_result {
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
};
The possible channel states are::
enum pci_channel_state {
pci_channel_io_normal, /* I/O channel is in normal state */
pci_channel_io_frozen, /* I/O to channel is blocked */
pci_channel_io_perm_failure, /* PCI card is dead */
};
Possible return values are::
enum pci_ers_result {
PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
};
A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
@@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
All drivers participating in this system must implement this call.
The driver must return one of the following result codes:
- PCI_ERS_RESULT_CAN_RECOVER:
Driver returns this if it thinks it might be able to recover
the HW by just banging IOs or if it wants to be given
a chance to extract some diagnostic information (see
mmio_enable, below).
- PCI_ERS_RESULT_NEED_RESET:
Driver returns this if it can't recover without a
slot reset.
- PCI_ERS_RESULT_DISCONNECT:
Driver returns this if it doesn't want to recover at all.
- PCI_ERS_RESULT_CAN_RECOVER
Driver returns this if it thinks it might be able to recover
the HW by just banging IOs or if it wants to be given
a chance to extract some diagnostic information (see
mmio_enable, below).
- PCI_ERS_RESULT_NEED_RESET
Driver returns this if it can't recover without a
slot reset.
- PCI_ERS_RESULT_DISCONNECT
Driver returns this if it doesn't want to recover at all.
The next step taken will depend on the result codes returned by the
drivers.
@@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).
>>> The current powerpc implementation assumes that a device driver will
>>> *not* schedule or semaphore in this routine; the current powerpc
>>> implementation uses one kernel thread to notify all devices;
>>> thus, if one device sleeps/schedules, all devices are affected.
>>> Doing better requires complex multi-threaded logic in the error
>>> recovery implementation (e.g. waiting for all notification threads
>>> to "join" before proceeding with recovery.) This seems excessively
>>> complex and not worth implementing.
.. note::
>>> The current powerpc implementation doesn't much care if the device
>>> attempts I/O at this point, or not. I/O's will fail, returning
>>> a value of 0xff on read, and writes will be dropped. If more than
>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
>>> assumes that the device driver has gone into an infinite loop
>>> and prints an error to syslog. A reboot is then required to
>>> get the device working again.
The current powerpc implementation assumes that a device driver will
*not* schedule or semaphore in this routine; the current powerpc
implementation uses one kernel thread to notify all devices;
thus, if one device sleeps/schedules, all devices are affected.
Doing better requires complex multi-threaded logic in the error
recovery implementation (e.g. waiting for all notification threads
to "join" before proceeding with recovery.) This seems excessively
complex and not worth implementing.
The current powerpc implementation doesn't much care if the device
attempts I/O at this point, or not. I/O's will fail, returning
a value of 0xff on read, and writes will be dropped. If more than
EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
assumes that the device driver has gone into an infinite loop
and prints an error to syslog. A reboot is then required to
get the device working again.
STEP 2: MMIO Enabled
-------------------
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.
@@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
>>> The following is proposed; no platform implements this yet:
>>> Proposal: All I/O's should be done _synchronously_ from within
>>> this callback, errors triggered by them will be returned via
>>> the normal pci_check_whatever() API, no new error_detected()
>>> callback will be issued due to an error happening here. However,
>>> such an error might cause IOs to be re-blocked for the whole
>>> segment, and thus invalidate the recovery that other devices
>>> on the same segment might have done, forcing the whole segment
>>> into one of the next states, that is, link reset or slot reset.
.. note::
The following is proposed; no platform implements this yet:
Proposal: All I/O's should be done _synchronously_ from within
this callback, errors triggered by them will be returned via
the normal pci_check_whatever() API, no new error_detected()
callback will be issued due to an error happening here. However,
such an error might cause IOs to be re-blocked for the whole
segment, and thus invalidate the recovery that other devices
on the same segment might have done, forcing the whole segment
into one of the next states, that is, link reset or slot reset.
The driver should return one of the following result codes:
- PCI_ERS_RESULT_RECOVERED
Driver returns this if it thinks the device is fully
functional and thinks it is ready to start
normal driver operations again. There is no
guarantee that the driver will actually be
allowed to proceed, as another driver on the
same segment might have failed and thus triggered a
slot reset on platforms that support it.
- PCI_ERS_RESULT_RECOVERED
Driver returns this if it thinks the device is fully
functional and thinks it is ready to start
normal driver operations again. There is no
guarantee that the driver will actually be
allowed to proceed, as another driver on the
same segment might have failed and thus triggered a
slot reset on platforms that support it.
- PCI_ERS_RESULT_NEED_RESET
Driver returns this if it thinks the device is not
recoverable in its current state and it needs a slot
reset to proceed.
- PCI_ERS_RESULT_NEED_RESET
Driver returns this if it thinks the device is not
recoverable in its current state and it needs a slot
reset to proceed.
- PCI_ERS_RESULT_DISCONNECT
Same as above. Total failure, no recovery even after
reset driver dead. (To be defined more precisely)
- PCI_ERS_RESULT_DISCONNECT
Same as above. Total failure, no recovery even after
reset driver dead. (To be defined more precisely)
The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
@@ -293,31 +303,33 @@ device will be considered "dead" in this case.
Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53cxx2
driver performs device init only from PCI function 0:
driver performs device init only from PCI function 0::
+ if (PCI_FUNC(pdev->devfn) == 0)
+ sym_reset_scsi_bus(np, 0);
+ if (PCI_FUNC(pdev->devfn) == 0)
+ sym_reset_scsi_bus(np, 0);
Result codes:
- PCI_ERS_RESULT_DISCONNECT
Same as above.
Result codes:
- PCI_ERS_RESULT_DISCONNECT
Same as above.
Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types:
PCI card types::
+ /* Set EEH reset type to fundamental if required by hba */
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
+ pdev->needs_freset = 1;
+
+ /* Set EEH reset type to fundamental if required by hba */
+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
+ pdev->needs_freset = 1;
+
Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).
>>> The current powerpc implementation does not try a power-cycle
>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
>>> However, it probably should.
.. note::
The current powerpc implementation does not try a power-cycle
reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
However, it probably should.
STEP 5: Resume Operations
@@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:
- There is no guarantee that interrupt delivery can proceed from any
device on the segment starting from the error detection and until the
slot_reset callback is called, at which point interrupts are expected
to be fully operational.
device on the segment starting from the error detection and until the
slot_reset callback is called, at which point interrupts are expected
to be fully operational.
- There is no guarantee that interrupt delivery is stopped, that is,
a driver that gets an interrupt after detecting an error, or that detects
an error within the interrupt handler such that it prevents proper
ack'ing of the interrupt (and thus removal of the source) should just
return IRQ_NOTHANDLED. It's up to the platform to deal with that
condition, typically by masking the IRQ source during the duration of
the error handling. It is expected that the platform "knows" which
interrupts are routed to error-management capable slots and can deal
with temporarily disabling that IRQ number during error processing (this
isn't terribly complex). That means some IRQ latency for other devices
sharing the interrupt, but there is simply no other way. High end
platforms aren't supposed to share interrupts between many devices
anyway :)
a driver that gets an interrupt after detecting an error, or that detects
an error within the interrupt handler such that it prevents proper
ack'ing of the interrupt (and thus removal of the source) should just
return IRQ_NOTHANDLED. It's up to the platform to deal with that
condition, typically by masking the IRQ source during the duration of
the error handling. It is expected that the platform "knows" which
interrupts are routed to error-management capable slots and can deal
with temporarily disabling that IRQ number during error processing (this
isn't terribly complex). That means some IRQ latency for other devices
sharing the interrupt, but there is simply no other way. High end
platforms aren't supposed to share interrupts between many devices
anyway :)
>>> Implementation details for the powerpc platform are discussed in
>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
.. note::
>>> As of this writing, there is a growing list of device drivers with
>>> patches implementing error recovery. Not all of these patches are in
>>> mainline yet. These may be used as "examples":
>>>
>>> drivers/scsi/ipr
>>> drivers/scsi/sym53c8xx_2
>>> drivers/scsi/qla2xxx
>>> drivers/scsi/lpfc
>>> drivers/next/bnx2.c
>>> drivers/next/e100.c
>>> drivers/net/e1000
>>> drivers/net/e1000e
>>> drivers/net/ixgb
>>> drivers/net/ixgbe
>>> drivers/net/cxgb3
>>> drivers/net/s2io.c
>>> drivers/net/qlge
Implementation details for the powerpc platform are discussed in
the file Documentation/powerpc/eeh-pci-error-recovery.txt
The End
-------
As of this writing, there is a growing list of device drivers with
patches implementing error recovery. Not all of these patches are in
mainline yet. These may be used as "examples":
- drivers/scsi/ipr
- drivers/scsi/sym53c8xx_2
- drivers/scsi/qla2xxx
- drivers/scsi/lpfc
- drivers/next/bnx2.c
- drivers/next/e100.c
- drivers/net/e1000
- drivers/net/e1000e
- drivers/net/ixgb
- drivers/net/ixgbe
- drivers/net/cxgb3
- drivers/net/s2io.c
- drivers/net/qlge

View File

@@ -1,14 +1,19 @@
PCI Express I/O Virtualization Howto
Copyright (C) 2009 Intel Corporation
Yu Zhao <yu.zhao@intel.com>
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
Update: November 2012
-- sysfs-based SRIOV enable-/disable-ment
Donald Dutile <ddutile@redhat.com>
====================================
PCI Express I/O Virtualization Howto
====================================
1. Overview
:Copyright: |copy| 2009 Intel Corporation
:Authors: - Yu Zhao <yu.zhao@intel.com>
- Donald Dutile <ddutile@redhat.com>
1.1 What is SR-IOV
Overview
========
What is SR-IOV
--------------
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
capability which makes one physical device appear as multiple virtual
@@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
operates on the register set so it can be functional and appear as a
real existing PCI device.
2. User Guide
User Guide
==========
2.1 How can I enable SR-IOV capability
How can I enable SR-IOV capability
----------------------------------
Multiple methods are available for SR-IOV enablement.
In the first method, the device driver (PF driver) will control the
@@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
numvfs <= totalvfs.
The second method is the recommended method for new/future VF devices.
2.2 How can I use the Virtual Functions
How can I use the Virtual Functions
-----------------------------------
The VF is treated as hot-plugged PCI devices in the kernel, so they
should be able to work in the same way as real PCI devices. The VF
requires device driver that is same as a normal PCI device's.
3. Developer Guide
Developer Guide
===============
3.1 SR-IOV API
SR-IOV API
----------
To enable SR-IOV capability:
(a) For the first method, in the driver:
(a) For the first method, in the driver::
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
'nr_virtfn' is number of VFs to be enabled.
(b) For the second method, from sysfs:
'nr_virtfn' is number of VFs to be enabled.
(b) For the second method, from sysfs::
echo 'nr_virtfn' > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
To disable SR-IOV capability:
(a) For the first method, in the driver:
(a) For the first method, in the driver::
void pci_disable_sriov(struct pci_dev *dev);
(b) For the second method, from sysfs:
(b) For the second method, from sysfs::
echo 0 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs
To enable auto probing VFs by a compatible driver on the host, run
command below before enabling SR-IOV capabilities. This is the
default behavior.
::
echo 1 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
To disable auto probing VFs by a compatible driver on the host, run
command below before enabling SR-IOV capabilities. Updating this
entry will not affect VFs which are already probed.
::
echo 0 > \
/sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
3.2 Usage example
Usage example
-------------
Following piece of code illustrates the usage of the SR-IOV API.
::
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
pci_enable_sriov(dev, NR_VIRTFN);
static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
pci_enable_sriov(dev, NR_VIRTFN);
...
return 0;
}
static void dev_remove(struct pci_dev *dev)
{
pci_disable_sriov(dev);
...
}
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
{
...
return 0;
}
static int dev_resume(struct pci_dev *dev)
{
...
return 0;
}
static void dev_shutdown(struct pci_dev *dev)
{
...
}
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
{
if (numvfs > 0) {
...
pci_enable_sriov(dev, numvfs);
...
return numvfs;
}
if (numvfs == 0) {
....
pci_disable_sriov(dev);
...
return 0;
}
}
static struct pci_driver dev_driver = {
.name = "SR-IOV Physical Function driver",
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
.suspend = dev_suspend,
.resume = dev_resume,
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};
static void dev_remove(struct pci_dev *dev)
{
pci_disable_sriov(dev);
...
}
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
{
...
return 0;
}
static int dev_resume(struct pci_dev *dev)
{
...
return 0;
}
static void dev_shutdown(struct pci_dev *dev)
{
...
}
static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
{
if (numvfs > 0) {
...
pci_enable_sriov(dev, numvfs);
...
return numvfs;
}
if (numvfs == 0) {
....
pci_disable_sriov(dev);
...
return 0;
}
}
static struct pci_driver dev_driver = {
.name = "SR-IOV Physical Function driver",
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
.suspend = dev_suspend,
.resume = dev_resume,
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};

View File

@@ -1,10 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
How To Write Linux PCI Drivers
==============================
How To Write Linux PCI Drivers
==============================
by Martin Mares <mj@ucw.cz> on 07-Feb-2000
updated by Grant Grundler <grundler@parisc-linux.org> on 23-Dec-2006
:Authors: - Martin Mares <mj@ucw.cz>
- Grant Grundler <grundler@parisc-linux.org>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is the PCI support
@@ -15,8 +17,7 @@ PCI device drivers.
A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
http://lwn.net/Kernel/LDD3/
http://lwn.net/Kernel/LDD3/.
However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.
@@ -25,9 +26,8 @@ Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.
0. Structure of PCI drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around. When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
@@ -42,24 +42,25 @@ pointers and thus dictates the high level structure of a driver.
Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:
Enable the device
Request MMIO/IOP resources
Set the DMA mask size (for both coherent and streaming DMA)
Allocate and initialize shared control data (pci_allocate_coherent())
Access device configuration space (if needed)
Register IRQ handler (request_irq())
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
Enable DMA/processing engines
- Enable the device
- Request MMIO/IOP resources
- Set the DMA mask size (for both coherent and streaming DMA)
- Allocate and initialize shared control data (pci_allocate_coherent())
- Access device configuration space (if needed)
- Register IRQ handler (request_irq())
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
- Enable DMA/processing engines
When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the follow steps:
Disable the device from generating IRQs
Release the IRQ (free_irq())
Stop all DMA activity
Release DMA buffers (both streaming and coherent)
Unregister from other subsystems (e.g. scsi or netdev)
Release MMIO/IOP resources
Disable the device
- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- Release DMA buffers (both streaming and coherent)
- Unregister from other subsystems (e.g. scsi or netdev)
- Release MMIO/IOP resources
- Disable the device
Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h> .
@@ -70,99 +71,38 @@ completely empty or just returning an appropriate error codes to avoid
lots of ifdefs in the drivers.
pci_register_driver() call
==========================
1. pci_register_driver() call
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PCI device drivers call pci_register_driver() during their
PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(struct pci_driver):
(``struct pci_driver``):
field name Description
---------- ------------------------------------------------------
id_table Pointer to table of device ID's the driver is
interested in. Most drivers should export this
table using MODULE_DEVICE_TABLE(pci,...).
.. kernel-doc:: include/linux/pci.h
:functions: pci_driver
probe This probing function gets called (during execution
of pci_register_driver() for already existing
devices or later if a new device gets inserted) for
all PCI devices which match the ID table and are not
"owned" by the other drivers yet. This function gets
passed a "struct pci_dev *" for each device whose
entry in the ID table matches the device. The probe
function returns zero when the driver chooses to
take "ownership" of the device or an error code
(negative number) otherwise.
The probe function always gets called from process
context, so it can sleep.
remove The remove() function gets called whenever a device
being handled by this driver is removed (either during
deregistration of the driver or when it's manually
pulled out of a hot-pluggable slot).
The remove function always gets called from process
context, so it can sleep.
suspend Put device into low power state.
suspend_late Put device into low power state.
resume_early Wake device from low power state.
resume Wake device from low power state.
(Please see Documentation/power/pci.txt for descriptions
of PCI Power Management and the related functions.)
shutdown Hook into reboot_notifier_list (kernel/sys.c).
Intended to stop any idling DMA operations.
Useful for enabling wake-on-lan (NIC) or changing
the power state of a device before reboot.
e.g. drivers/net/e100.c.
err_handler See Documentation/PCI/pci-error-recovery.txt
The ID table is an array of struct pci_device_id entries ending with an
The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry. Definitions with static const are generally preferred.
Each entry consists of:
.. kernel-doc:: include/linux/mod_devicetable.h
:functions: pci_device_id
vendor,device Vendor and device ID to match (or PCI_ANY_ID)
subvendor, Subsystem vendor and device ID to match (or PCI_ANY_ID)
subdevice,
class Device class, subclass, and "interface" to match.
See Appendix D of the PCI Local Bus Spec or
include/linux/pci_ids.h for a full list of classes.
Most drivers do not need to specify class/class_mask
as vendor/device is normally sufficient.
class_mask limit which sub-fields of the class field are compared.
See drivers/scsi/sym53c8xx_2/ for example of usage.
driver_data Data private to the driver.
Most drivers don't need to use driver_data field.
Best practice is to use driver_data as an index
into a static list of equivalent device types,
instead of using it as a pointer.
Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up
Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.
New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below:
as shown below::
echo "vendor device subvendor subdevice class class_mask driver_data" > \
/sys/bus/pci/drivers/{driver}/new_id
echo "vendor device subvendor subdevice class class_mask driver_data" > \
/sys/bus/pci/drivers/{driver}/new_id
All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need pass only as many optional fields as necessary:
o subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
o class and classmask fields default to 0
o driver_data defaults to 0UL.
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
- class and classmask fields default to 0
- driver_data defaults to 0UL.
Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
@@ -175,29 +115,31 @@ When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.
1.1 "Attributes" for driver functions/data
"Attributes" for driver functions/data
--------------------------------------
Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):
====== =================================================
__init Initialization code. Thrown away after the driver
initializes.
__exit Exit code. Ignored for non-modular drivers.
====== =================================================
Tips on when/where to use the above attributes:
o The module_init()/module_exit() functions (and all
- The module_init()/module_exit() functions (and all
initialization functions called _only_ from these)
should be marked __init/__exit.
o Do not mark the struct pci_driver.
- Do not mark the struct pci_driver.
o Do NOT mark a function if you are not sure which mark to use.
- Do NOT mark a function if you are not sure which mark to use.
Better to not mark the function than mark the function wrong.
2. How to find PCI devices manually
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to find PCI devices manually
================================
PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
@@ -207,17 +149,17 @@ E.g. combined serial/parallel port/floppy controller.
A manual search may be performed using the following constructs:
Searching by vendor and device ID:
Searching by vendor and device ID::
struct pci_dev *dev = NULL;
while (dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev))
configure_device(dev);
Searching by class ID (iterate in a similar way):
Searching by class ID (iterate in a similar way)::
pci_get_class(CLASS_ID, dev)
Searching by both vendor/device and subsystem vendor/device ID:
Searching by both vendor/device and subsystem vendor/device ID::
pci_get_subsys(VENDOR_ID,DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev).
@@ -230,21 +172,20 @@ the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().
3. Device Initialization Steps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Device Initialization Steps
===========================
As noted in the introduction, most PCI drivers need the following steps
for device initialization:
Enable the device
Request MMIO/IOP resources
Set the DMA mask size (for both coherent and streaming DMA)
Allocate and initialize shared control data (pci_allocate_coherent())
Access device configuration space (if needed)
Register IRQ handler (request_irq())
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
Enable DMA/processing engines.
- Enable the device
- Request MMIO/IOP resources
- Set the DMA mask size (for both coherent and streaming DMA)
- Allocate and initialize shared control data (pci_allocate_coherent())
- Access device configuration space (if needed)
- Register IRQ handler (request_irq())
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
- Enable DMA/processing engines.
The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
@@ -252,26 +193,29 @@ that will just result in a PCI Bus Master Abort and config reads
will return garbage).
3.1 Enable the PCI device
~~~~~~~~~~~~~~~~~~~~~~~~~
Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:
o wake up the device if it was in suspended state,
o allocate I/O and memory regions of the device (if BIOS did not),
o allocate an IRQ (if BIOS did not).
NOTE: pci_enable_device() can fail! Check the return value.
- wake up the device if it was in suspended state,
- allocate I/O and memory regions of the device (if BIOS did not),
- allocate an IRQ (if BIOS did not).
[ OS BUG: we don't check resource allocations before enabling those
resources. The sequence would make more sense if we called
pci_request_resources() before calling pci_enable_device().
Currently, the device drivers can't detect the bug when when two
devices have been allocated the same range. This is not a common
problem and unlikely to get fixed soon.
.. note::
pci_enable_device() can fail! Check the return value.
.. warning::
OS BUG: we don't check resource allocations before enabling those
resources. The sequence would make more sense if we called
pci_request_resources() before calling pci_enable_device().
Currently, the device drivers can't detect the bug when when two
devices have been allocated the same range. This is not a common
problem and unlikely to get fixed soon.
This has been discussed before but not changed as of 2.6.19:
http://lkml.org/lkml/2006/3/2/194
This has been discussed before but not changed as of 2.6.19:
http://lkml.org/lkml/2006/3/2/194
]
pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
@@ -288,8 +232,8 @@ pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.
3.2 Request MMIO/IOP resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Request MMIO/IOP resources
--------------------------
Memory (MMIO), and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
@@ -304,9 +248,10 @@ Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.
[ See OS BUG comment above. Currently (2.6.19), The driver can only
determine MMIO and IO Port resource availability _after_ calling
pci_enable_device(). ]
.. tip::
See OS BUG comment above. Currently (2.6.19), The driver can only
determine MMIO and IO Port resource availability _after_ calling
pci_enable_device().
Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
@@ -316,12 +261,13 @@ BARs.
Also see pci_request_selected_regions() below.
3.3 Set the DMA mask size
~~~~~~~~~~~~~~~~~~~~~~~~~
[ If anything below doesn't make sense, please refer to
Documentation/DMA-API.txt. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces. ]
Set the DMA mask size
---------------------
.. note::
If anything below doesn't make sense, please refer to
Documentation/DMA-API.txt. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces.
While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
@@ -342,23 +288,23 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
("consistent") data.
3.4 Setup shared control data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory. See Documentation/DMA-API.txt for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
3.5 Initialize device registers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" register initialized or reset.
E.g. clearing pending interrupts.
3.6 Register IRQ handler
~~~~~~~~~~~~~~~~~~~~~~~~
Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.
@@ -396,6 +342,7 @@ and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors.
There are (at least) two really good reasons for using MSI:
1) MSI is an exclusive interrupt vector by definition.
This means the interrupt handler doesn't have to verify
its device caused the interrupt.
@@ -410,24 +357,23 @@ See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
of MSI/MSI-X usage.
4. PCI device shutdown
~~~~~~~~~~~~~~~~~~~~~~~
PCI device shutdown
===================
When a PCI device driver is being unloaded, most of the following
steps need to be performed:
Disable the device from generating IRQs
Release the IRQ (free_irq())
Stop all DMA activity
Release DMA buffers (both streaming and consistent)
Unregister from other subsystems (e.g. scsi or netdev)
Disable device from responding to MMIO/IO Port addresses
Release MMIO/IO Port resource(s)
- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- Release DMA buffers (both streaming and consistent)
- Unregister from other subsystems (e.g. scsi or netdev)
- Disable device from responding to MMIO/IO Port addresses
- Release MMIO/IO Port resource(s)
4.1 Stop IRQs on the device
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.
@@ -446,16 +392,16 @@ MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.
4.2 Release the IRQ
~~~~~~~~~~~~~~~~~~~
Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the drivers IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.
4.3 Stop all DMA activity
~~~~~~~~~~~~~~~~~~~~~~~~~
Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.
@@ -467,8 +413,8 @@ While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.
4.4 Release DMA buffers
~~~~~~~~~~~~~~~~~~~~~~~
Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there is one.
@@ -478,8 +424,8 @@ Then clean up "consistent" buffers which contain the control data.
See Documentation/DMA-API.txt for details on unmapping interfaces.
4.5 Unregister from other subsystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
@@ -487,31 +433,30 @@ If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.
4.6 Disable Device from responding to MMIO/IO Port addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Disable Device from responding to MMIO/IO Port addresses
--------------------------------------------------------
io_unmap() MMIO or IO Port resources and then call pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().
4.7 Release MMIO/IO Port Resource(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.
How to access PCI config space
==============================
5. How to access PCI config space
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can use pci_(read|write)_config_(byte|word|dword) to access the config
space of a device represented by struct pci_dev *. All these functions return 0
when successful or an error code (PCIBIOS_...) which can be translated to a text
string by pcibios_strerror. Most drivers expect that accesses to valid PCI
You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.
If you don't have a struct pci_dev available, you can call
pci_bus_(read|write)_config_(byte|word|dword) to access a given device
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.
If you access fields in the standard portion of the config header, please
@@ -522,10 +467,10 @@ pci_find_capability() for the particular capability and it will find the
corresponding register block for you.
Other interesting functions
===========================
6. Other interesting functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
============================= ================================================
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
bus and slot and number. If the device is
found, its reference count is increased.
@@ -539,11 +484,11 @@ pci_set_drvdata() Set private driver data pointer for a pci_dev
pci_get_drvdata() Return private driver data pointer for a pci_dev
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
============================= ================================================
7. Miscellaneous hints
~~~~~~~~~~~~~~~~~~~~~~
Miscellaneous hints
===================
When displaying PCI device names to the user (for example when a driver wants
to tell the user what card has it found), please use pci_name(pci_dev).
@@ -559,9 +504,8 @@ on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.
8. Vendor and device identifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Vendor and device identifications
=================================
Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers. You can add private definitions in
@@ -575,28 +519,27 @@ There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
and https://github.com/pciutils/pciids.
9. Obsolete functions
~~~~~~~~~~~~~~~~~~~~~
Obsolete functions
==================
There are several functions which you might come across when trying to
port an old driver to the new PCI interface. They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.
================= ===========================================
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot() Superseded by pci_get_domain_bus_and_slot()
pci_get_slot() Superseded by pci_get_domain_bus_and_slot()
================= ===========================================
The alternative is the traditional PCI device driver that walks PCI
device lists. This is still possible but discouraged.
10. MMIO Space and "Write Posting"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MMIO Space and "Write Posting"
==============================
Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
@@ -609,14 +552,14 @@ the CPU before the transaction has reached its destination.
Thus, timing sensitive code should add readl() where the CPU is
expected to wait before doing other work. The classic "bit banging"
sequence works fine for I/O Port space:
sequence works fine for I/O Port space::
for (i = 8; --i; val >>= 1) {
outb(val & 1, ioport_reg); /* write bit */
udelay(10);
}
The same sequence for MMIO space should be:
The same sequence for MMIO space should be::
for (i = 8; --i; val >>= 1) {
writeb(val & 1, mmio_reg); /* write bit */
@@ -633,4 +576,3 @@ handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl(). Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a."Hard Fail").

View File

@@ -1,21 +1,29 @@
The PCI Express Advanced Error Reporting Driver Guide HOWTO
T. Long Nguyen <tom.l.nguyen@intel.com>
Yanmin Zhang <yanmin.zhang@intel.com>
07/29/2006
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
===========================================================
The PCI Express Advanced Error Reporting Driver Guide HOWTO
===========================================================
1. Overview
:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
- Yanmin Zhang <yanmin.zhang@intel.com>
1.1 About this guide
:Copyright: |copy| 2006 Intel Corporation
Overview
===========
About this guide
----------------
This guide describes the basics of the PCI Express Advanced Error
Reporting (AER) driver and provides information on how to use it, as
well as how to enable the drivers of endpoint devices to conform with
PCI Express AER driver.
1.2 Copyright (C) Intel Corporation 2006.
1.3 What is the PCI Express AER Driver?
What is the PCI Express AER Driver?
-----------------------------------
PCI Express error signaling can occur on the PCI Express link itself
or on behalf of transactions initiated on the link. PCI Express
@@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
Express Advanced Error Reporting capability. The PCI Express AER
driver provides three basic functions:
- Gathers the comprehensive error information if errors occurred.
- Reports error to the users.
- Performs error recovery actions.
- Gathers the comprehensive error information if errors occurred.
- Reports error to the users.
- Performs error recovery actions.
AER driver only attaches root ports which support PCI-Express AER
capability.
2. User Guide
User Guide
==========
2.1 Include the PCI Express AER Root Driver into the Linux Kernel
Include the PCI Express AER Root Driver into the Linux Kernel
-------------------------------------------------------------
The PCI Express AER Root driver is a Root Port service driver attached
to the PCI Express Port Bus driver. If a user wants to use it, the driver
@@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
CONFIG_PCIEAER = y.
2.2 Load PCI Express AER Root Driver
Load PCI Express AER Root Driver
--------------------------------
Some systems have AER support in firmware. Enabling Linux AER support at
the same time the firmware handles AER may result in unpredictable
@@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
Specification for details regarding _OSC usage.
2.3 AER error output
AER error output
----------------
When a PCIe AER error is captured, an error message will be output to
console. If it's a correctable error, it is output as a warning.
Otherwise, it is printed as an error. So users could choose different
log level to filter out correctable error messages.
Below shows an example:
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
0000:50:00.0: [20] Unsupported Request (First)
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
Below shows an example::
0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
0000:50:00.0: [20] Unsupported Request (First)
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
In the example, 'Requester ID' means the ID of the device who sends
the error message to root port. Pls. refer to pci express specs for
other fields.
2.4 AER Statistics / Counters
AER Statistics / Counters
-------------------------
When PCIe AER errors are captured, the counters / statistics are also exposed
in the form of sysfs attributes which are documented at
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
3. Developer Guide
Developer Guide
===============
To enable AER aware support requires a software driver to configure
the AER capability structure within its device and to provide callbacks.
@@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
errors because device specific errors will still get sent directly to
the device driver.
3.1 Configure the AER capability structure
Configure the AER capability structure
--------------------------------------
AER aware drivers of PCI Express component need change the device
control registers to enable AER. They also could change AER registers,
@@ -128,9 +144,11 @@ including mask and severity registers. Helper function
pci_enable_pcie_error_reporting could be used to enable AER. See
section 3.3.
3.2. Provide callbacks
Provide callbacks
-----------------
3.2.1 callback reset_link to reset pci express link
callback reset_link to reset pci express link
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This callback is used to reset the pci express physical link when a
fatal error happens. The root port aer service driver provides a
@@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
In struct pcie_port_service_driver, a new pointer, reset_link, is
added.
::
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
pci_ers_result_t (*reset_link) (struct pci_dev *dev);
Section 3.2.2.2 provides more detailed info on when to call
reset_link.
3.2.2 PCI error-recovery callbacks
PCI error-recovery callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The PCI Express AER Root driver uses error callbacks to coordinate
with downstream device drivers associated with a hierarchy in question
@@ -161,7 +181,8 @@ definitions of the callbacks.
Below sections specify when to call the error callback functions.
3.2.2.1 Correctable errors
Correctable errors
~~~~~~~~~~~~~~~~~~
Correctable errors pose no impacts on the functionality of
the interface. The PCI Express protocol can recover without any
@@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.
3.2.2.2 Non-correctable (non-fatal and fatal) errors
Non-correctable (non-fatal and fatal) errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If an error message indicates a non-fatal error, performing link reset
at upstream is not required. The AER driver calls error_detected(dev,
pci_channel_io_normal) to all drivers associated within a hierarchy in
question. for example,
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
question. for example::
EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
If Upstream port A captures an AER error, the hierarchy consists of
Downstream port B and EndPoint.
@@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.
3.3 helper functions
helper functions
----------------
::
int pci_enable_pcie_error_reporting(struct pci_dev *dev);
3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
pci_enable_pcie_error_reporting enables the device to send error
messages to root port when an error is detected. Note that devices
don't enable the error reporting by default, so device drivers need
call this function to enable it.
3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
::
int pci_disable_pcie_error_reporting(struct pci_dev *dev);
pci_disable_pcie_error_reporting disables the device to send error
messages to root port when an error is detected.
3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
::
int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
error status register.
3.4 Frequent Asked Questions
Frequent Asked Questions
------------------------
Q: What happens if a PCI Express device driver does not provide an
error recovery handler (pci_driver->err_handler is equal to NULL)?
Q:
What happens if a PCI Express device driver does not provide an
error recovery handler (pci_driver->err_handler is equal to NULL)?
A: The devices attached with the driver won't be recovered. If the
error is fatal, kernel will print out warning messages. Please refer
to section 3 for more information.
A:
The devices attached with the driver won't be recovered. If the
error is fatal, kernel will print out warning messages. Please refer
to section 3 for more information.
Q: What happens if an upstream port service driver does not provide
callback reset_link?
Q:
What happens if an upstream port service driver does not provide
callback reset_link?
A: Fatal error recovery will fail if the errors are reported by the
upstream ports who are attached by the service driver.
A:
Fatal error recovery will fail if the errors are reported by the
upstream ports who are attached by the service driver.
Q: How does this infrastructure deal with driver that is not PCI
Express aware?
Q:
How does this infrastructure deal with driver that is not PCI
Express aware?
A: This infrastructure calls the error callback functions of the
driver when an error happens. But if the driver is not aware of
PCI Express, the device might not report its own errors to root
port.
A:
This infrastructure calls the error callback functions of the
driver when an error happens. But if the driver is not aware of
PCI Express, the device might not report its own errors to root
port.
Q: What modifications will that driver need to make it compatible
with the PCI Express AER Root driver?
Q:
What modifications will that driver need to make it compatible
with the PCI Express AER Root driver?
A: It could call the helper functions to enable AER in devices and
cleanup uncorrectable status register. Pls. refer to section 3.3.
A:
It could call the helper functions to enable AER in devices and
cleanup uncorrectable status register. Pls. refer to section 3.3.
4. Software error injection
Software error injection
========================
Debugging PCIe AER error recovery code is quite difficult because it
is hard to trigger real hardware errors. Software based error
@@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named
Then, you need a user space tool named aer-inject, which can be gotten
from:
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
More information about aer-inject can be found in the document comes

View File

@@ -1,16 +1,23 @@
The PCI Express Port Bus Driver Guide HOWTO
Tom L Nguyen tom.l.nguyen@intel.com
11/03/2004
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
1. About this guide
===========================================
The PCI Express Port Bus Driver Guide HOWTO
===========================================
:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
:Copyright: |copy| 2004 Intel Corporation
About this guide
================
This guide describes the basics of the PCI Express Port Bus driver
and provides information on how to enable the service drivers to
register/unregister with the PCI Express Port Bus Driver.
2. Copyright 2004 Intel Corporation
3. What is the PCI Express Port Bus Driver
What is the PCI Express Port Bus Driver
=======================================
A PCI Express Port is a logical PCI-PCI Bridge structure. There
are two types of PCI Express Port: the Root Port and the Switch
@@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
be handled by a single complex driver or be individually distributed
and handled by corresponding service drivers.
4. Why use the PCI Express Port Bus Driver?
Why use the PCI Express Port Bus Driver?
========================================
In existing Linux kernels, the Linux Device Driver Model allows a
physical device to be handled by only a single driver. The PCI
@@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
to the corresponding service drivers as required. Some key
advantages of using the PCI Express Port Bus driver are listed below:
- Allow multiple service drivers to run simultaneously on
a PCI-PCI Bridge Port device.
- Allow multiple service drivers to run simultaneously on
a PCI-PCI Bridge Port device.
- Allow service drivers implemented in an independent
staged approach.
- Allow service drivers implemented in an independent
staged approach.
- Allow one service driver to run on multiple PCI-PCI Bridge
Port devices.
- Allow one service driver to run on multiple PCI-PCI Bridge
Port devices.
- Manage and distribute resources of a PCI-PCI Bridge Port
device to requested service drivers.
- Manage and distribute resources of a PCI-PCI Bridge Port
device to requested service drivers.
5. Configuring the PCI Express Port Bus Driver vs. Service Drivers
Configuring the PCI Express Port Bus Driver vs. Service Drivers
===============================================================
5.1 Including the PCI Express Port Bus Driver Support into the Kernel
Including the PCI Express Port Bus Driver Support into the Kernel
-----------------------------------------------------------------
Including the PCI Express Port Bus driver depends on whether the PCI
Express support is included in the kernel config. The kernel will
automatically include the PCI Express Port Bus driver as a kernel
driver when the PCI Express support is enabled in the kernel.
5.2 Enabling Service Driver Support
Enabling Service Driver Support
-------------------------------
PCI device drivers are implemented based on Linux Device Driver Model.
All service drivers are PCI device drivers. As discussed above, it is
@@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
Failure to do so will result an identity mismatch, which prevents
the PCI Express Port Bus driver from loading a service driver.
5.2.1 pcie_port_service_register
pcie_port_service_register
~~~~~~~~~~~~~~~~~~~~~~~~~~
::
int pcie_port_service_register(struct pcie_port_service_driver *new)
int pcie_port_service_register(struct pcie_port_service_driver *new)
This API replaces the Linux Driver Model's pci_register_driver API. A
service driver should always calls pcie_port_service_register at
@@ -99,69 +112,76 @@ module init. Note that after service driver being loaded, calls
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
necessary since these calls are executed by the PCI Port Bus driver.
5.2.2 pcie_port_service_unregister
pcie_port_service_unregister
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
void pcie_port_service_unregister(struct pcie_port_service_driver *new)
pcie_port_service_unregister replaces the Linux Driver Model's
pci_unregister_driver. It's always called by service driver when a
module exits.
5.2.3 Sample Code
Sample Code
~~~~~~~~~~~
Below is sample service driver code to initialize the port service
driver data structure.
::
static struct pcie_port_service_id service_id[] = { {
.vendor = PCI_ANY_ID,
.device = PCI_ANY_ID,
.port_type = PCIE_RC_PORT,
.service_type = PCIE_PORT_SERVICE_AER,
}, { /* end: all zeroes */ }
};
static struct pcie_port_service_id service_id[] = { {
.vendor = PCI_ANY_ID,
.device = PCI_ANY_ID,
.port_type = PCIE_RC_PORT,
.service_type = PCIE_PORT_SERVICE_AER,
}, { /* end: all zeroes */ }
};
static struct pcie_port_service_driver root_aerdrv = {
.name = (char *)device_name,
.id_table = &service_id[0],
static struct pcie_port_service_driver root_aerdrv = {
.name = (char *)device_name,
.id_table = &service_id[0],
.probe = aerdrv_load,
.remove = aerdrv_unload,
.probe = aerdrv_load,
.remove = aerdrv_unload,
.suspend = aerdrv_suspend,
.resume = aerdrv_resume,
};
.suspend = aerdrv_suspend,
.resume = aerdrv_resume,
};
Below is a sample code for registering/unregistering a service
driver.
::
static int __init aerdrv_service_init(void)
{
int retval = 0;
static int __init aerdrv_service_init(void)
{
int retval = 0;
retval = pcie_port_service_register(&root_aerdrv);
if (!retval) {
/*
* FIX ME
*/
}
return retval;
}
retval = pcie_port_service_register(&root_aerdrv);
if (!retval) {
/*
* FIX ME
*/
}
return retval;
}
static void __exit aerdrv_service_exit(void)
{
pcie_port_service_unregister(&root_aerdrv);
}
static void __exit aerdrv_service_exit(void)
{
pcie_port_service_unregister(&root_aerdrv);
}
module_init(aerdrv_service_init);
module_exit(aerdrv_service_exit);
module_init(aerdrv_service_init);
module_exit(aerdrv_service_exit);
6. Possible Resource Conflicts
Possible Resource Conflicts
===========================
Since all service drivers of a PCI-PCI Bridge Port device are
allowed to run simultaneously, below lists a few of possible resource
conflicts with proposed solutions.
6.1 MSI and MSI-X Vector Resource
MSI and MSI-X Vector Resource
-----------------------------
Once MSI or MSI-X interrupts are enabled on a device, it stays in this
mode until they are disabled again. Since service drivers of the same
@@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
call request_irq/free_irq. In addition, the interrupt mode is stored
in the field interrupt_mode of struct pcie_device.
6.3 PCI Memory/IO Mapped Regions
PCI Memory/IO Mapped Regions
----------------------------
Service drivers for PCI Express Power Management (PME), Advanced
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
@@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
that all service drivers will be well behaved and not overwrite
other service driver's configuration settings.
6.4 PCI Config Registers
PCI Config Registers
--------------------
Each service driver runs its PCI config operations on its own
capability structure except the PCI Express capability structure, in

View File

@@ -1,17 +1,19 @@
RCU on Uniprocessor Systems
.. _up_doc:
RCU on Uniprocessor Systems
===========================
A common misconception is that, on UP systems, the call_rcu() primitive
may immediately invoke its function. The basis of this misconception
is that since there is only one CPU, it should not be necessary to
wait for anything else to get done, since there are no other CPUs for
anything else to be happening on. Although this approach will -sort- -of-
anything else to be happening on. Although this approach will *sort of*
work a surprising amount of the time, it is a very bad idea in general.
This document presents three examples that demonstrate exactly how bad
an idea this is.
Example 1: softirq Suicide
--------------------------
Suppose that an RCU-based algorithm scans a linked list containing
elements A, B, and C in process context, and can delete elements from
@@ -28,8 +30,8 @@ your kernel.
This same problem can occur if call_rcu() is invoked from a hardware
interrupt handler.
Example 2: Function-Call Fatality
---------------------------------
Of course, one could avert the suicide described in the preceding example
by having call_rcu() directly invoke its arguments only if it was called
@@ -46,11 +48,13 @@ its arguments would cause it to fail to make the fundamental guarantee
underlying RCU, namely that call_rcu() defers invoking its arguments until
all RCU read-side critical sections currently executing have completed.
Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
this case?
Quick Quiz #1:
Why is it *not* legal to invoke synchronize_rcu() in this case?
:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
Example 3: Death by Deadlock
----------------------------
Suppose that call_rcu() is invoked while holding a lock, and that the
callback function must acquire this same lock. In this case, if
@@ -76,25 +80,30 @@ there are cases where this can be quite ugly:
If call_rcu() directly invokes the callback, painful locking restrictions
or API changes would be required.
Quick Quiz #2: What locking restriction must RCU callbacks respect?
Quick Quiz #2:
What locking restriction must RCU callbacks respect?
:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
Summary
-------
Permitting call_rcu() to immediately invoke its arguments breaks RCU,
even on a UP system. So do not do it! Even on a UP system, the RCU
infrastructure -must- respect grace periods, and -must- invoke callbacks
infrastructure *must* respect grace periods, and *must* invoke callbacks
from a known environment in which no locks are held.
Note that it -is- safe for synchronize_rcu() to return immediately on
UP systems, including !PREEMPT SMP builds running on UP systems.
Note that it *is* safe for synchronize_rcu() to return immediately on
UP systems, including PREEMPT SMP builds running on UP systems.
Quick Quiz #3: Why can't synchronize_rcu() return immediately on
UP systems running preemptable RCU?
Quick Quiz #3:
Why can't synchronize_rcu() return immediately on UP systems running
preemptable RCU?
.. _answer_quick_quiz_up:
Answer to Quick Quiz #1:
Why is it -not- legal to invoke synchronize_rcu() in this case?
Why is it *not* legal to invoke synchronize_rcu() in this case?
Because the calling function is scanning an RCU-protected linked
list, and is therefore within an RCU read-side critical section.
@@ -104,12 +113,13 @@ Answer to Quick Quiz #1:
Answer to Quick Quiz #2:
What locking restriction must RCU callbacks respect?
Any lock that is acquired within an RCU callback must be
acquired elsewhere using an _irq variant of the spinlock
primitive. For example, if "mylock" is acquired by an
RCU callback, then a process-context acquisition of this
lock must use something like spin_lock_irqsave() to
acquire the lock.
Any lock that is acquired within an RCU callback must be acquired
elsewhere using an _bh variant of the spinlock primitive.
For example, if "mylock" is acquired by an RCU callback, then
a process-context acquisition of this lock must use something
like spin_lock_bh() to acquire the lock. Please note that
it is also OK to use _irq variants of spinlocks, for example,
spin_lock_irqsave().
If the process-context code were to simply use spin_lock(),
then, since RCU callbacks can be invoked from softirq context,
@@ -119,7 +129,7 @@ Answer to Quick Quiz #2:
This restriction might seem gratuitous, since very few RCU
callbacks acquire locks directly. However, a great many RCU
callbacks do acquire locks -indirectly-, for example, via
callbacks do acquire locks *indirectly*, for example, via
the kfree() primitive.
Answer to Quick Quiz #3:

View File

@@ -0,0 +1,19 @@
.. _rcu_concepts:
============
RCU concepts
============
.. toctree::
:maxdepth: 1
rcu
listRCU
UP
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

View File

@@ -1,5 +1,7 @@
Using RCU to Protect Read-Mostly Linked Lists
.. _list_rcu_doc:
Using RCU to Protect Read-Mostly Linked Lists
=============================================
One of the best applications of RCU is to protect read-mostly linked lists
("struct list_head" in list.h). One big advantage of this approach
@@ -7,8 +9,8 @@ is that all of the required memory barriers are included for you in
the list macros. This document describes several applications of RCU,
with the best fits first.
Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
----------------------------------------------------------------------
The best applications are cases where, if reader-writer locking were
used, the read-side lock would be dropped before taking any action
@@ -24,7 +26,7 @@ added or deleted, rather than being modified in place.
A straightforward example of this use of RCU may be found in the
system-call auditing support. For example, a reader-writer locked
implementation of audit_filter_task() might be as follows:
implementation of audit_filter_task() might be as follows::
static enum audit_state audit_filter_task(struct task_struct *tsk)
{
@@ -48,7 +50,7 @@ the corresponding value is returned. By the time that this value is acted
on, the list may well have been modified. This makes sense, since if
you are turning auditing off, it is OK to audit a few extra system calls.
This means that RCU can be easily applied to the read side, as follows:
This means that RCU can be easily applied to the read side, as follows::
static enum audit_state audit_filter_task(struct task_struct *tsk)
{
@@ -73,7 +75,7 @@ become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
insert the read-side memory barriers that are required on DEC Alpha CPUs.
The changes to the update side are also straightforward. A reader-writer
lock might be used as follows for deletion and insertion:
lock might be used as follows for deletion and insertion::
static inline int audit_del_rule(struct audit_rule *rule,
struct list_head *list)
@@ -106,7 +108,7 @@ lock might be used as follows for deletion and insertion:
return 0;
}
Following are the RCU equivalents for these two functions:
Following are the RCU equivalents for these two functions::
static inline int audit_del_rule(struct audit_rule *rule,
struct list_head *list)
@@ -154,13 +156,13 @@ otherwise cause concurrent readers to fail spectacularly.
So, when readers can tolerate stale data and when entries are either added
or deleted, without in-place modification, it is very easy to use RCU!
Example 2: Handling In-Place Updates
------------------------------------
The system-call auditing code does not update auditing rules in place.
However, if it did, reader-writer-locked code to do so might look as
follows (presumably, the field_count is only permitted to decrease,
otherwise, the added fields would need to be filled in):
otherwise, the added fields would need to be filled in)::
static inline int audit_upd_rule(struct audit_rule *rule,
struct list_head *list,
@@ -187,7 +189,7 @@ otherwise, the added fields would need to be filled in):
The RCU version creates a copy, updates the copy, then replaces the old
entry with the newly updated entry. This sequence of actions, allowing
concurrent reads while doing a copy to perform an update, is what gives
RCU ("read-copy update") its name. The RCU code is as follows:
RCU ("read-copy update") its name. The RCU code is as follows::
static inline int audit_upd_rule(struct audit_rule *rule,
struct list_head *list,
@@ -216,8 +218,8 @@ RCU ("read-copy update") its name. The RCU code is as follows:
Again, this assumes that the caller holds audit_netlink_sem. Normally,
the reader-writer lock would become a spinlock in this sort of code.
Example 3: Eliminating Stale Data
---------------------------------
The auditing examples above tolerate stale data, as do most algorithms
that are tracking external state. Because there is a delay from the
@@ -231,13 +233,16 @@ per-entry spinlock, and, if the "deleted" flag is set, pretends that the
entry does not exist. For this to be helpful, the search function must
return holding the per-entry spinlock, as ipc_lock() does in fact do.
Quick Quiz: Why does the search function need to return holding the
per-entry lock for this deleted-flag technique to be helpful?
Quick Quiz:
Why does the search function need to return holding the per-entry lock for
this deleted-flag technique to be helpful?
:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
If the system-call audit module were to ever need to reject stale data,
one way to accomplish this would be to add a "deleted" flag and a "lock"
spinlock to the audit_entry structure, and modify audit_filter_task()
as follows:
as follows::
static enum audit_state audit_filter_task(struct task_struct *tsk)
{
@@ -268,7 +273,7 @@ audit_upd_rule() would need additional memory barriers to ensure
that the list_add_rcu() was really executed before the list_del_rcu().
The audit_del_rule() function would need to set the "deleted"
flag under the spinlock as follows:
flag under the spinlock as follows::
static inline int audit_del_rule(struct audit_rule *rule,
struct list_head *list)
@@ -290,8 +295,8 @@ flag under the spinlock as follows:
return -EFAULT; /* No matching rule */
}
Summary
-------
Read-mostly list-based data structures that can tolerate stale data are
the most amenable to use of RCU. The simplest case is where entries are
@@ -302,8 +307,9 @@ If stale data cannot be tolerated, then a "deleted" flag may be used
in conjunction with a per-entry spinlock in order to allow the search
function to reject newly deleted data.
.. _answer_quick_quiz_list:
Answer to Quick Quiz
Answer to Quick Quiz:
Why does the search function need to return holding the per-entry
lock for this deleted-flag technique to be helpful?

92
Documentation/RCU/rcu.rst Normal file
View File

@@ -0,0 +1,92 @@
.. _rcu_doc:
RCU Concepts
============
The basic idea behind RCU (read-copy update) is to split destructive
operations into two parts, one that prevents anyone from seeing the data
item being destroyed, and one that actually carries out the destruction.
A "grace period" must elapse between the two parts, and this grace period
must be long enough that any readers accessing the item being deleted have
since dropped their references. For example, an RCU-protected deletion
from a linked list would first remove the item from the list, wait for
a grace period to elapse, then free the element. See the
Documentation/RCU/listRCU.rst file for more information on using RCU with
linked lists.
Frequently Asked Questions
--------------------------
- Why would anyone want to use RCU?
The advantage of RCU's two-part approach is that RCU readers need
not acquire any locks, perform any atomic instructions, write to
shared memory, or (on CPUs other than Alpha) execute any memory
barriers. The fact that these operations are quite expensive
on modern CPUs is what gives RCU its performance advantages
in read-mostly situations. The fact that RCU readers need not
acquire locks can also greatly simplify deadlock-avoidance code.
- How can the updater tell when a grace period has completed
if the RCU readers give no indication when they are done?
Just as with spinlocks, RCU readers are not permitted to
block, switch to user-mode execution, or enter the idle loop.
Therefore, as soon as a CPU is seen passing through any of these
three states, we know that that CPU has exited any previous RCU
read-side critical sections. So, if we remove an item from a
linked list, and then wait until all CPUs have switched context,
executed in user mode, or executed in the idle loop, we can
safely free up that item.
Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
same effect, but require that the readers manipulate CPU-local
counters. These counters allow limited types of blocking within
RCU read-side critical sections. SRCU also uses CPU-local
counters, and permits general blocking within RCU read-side
critical sections. These variants of RCU detect grace periods
by sampling these counters.
- If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
See the Documentation/RCU/UP.rst file for more information.
- How can I see where RCU is currently used in the Linux kernel?
Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
"synchronize_srcu", and the other RCU primitives. Or grab one
of the cscope databases from:
(http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
- What guidelines should I follow when writing code that uses RCU?
See the checklist.txt file in this directory.
- Why the name "RCU"?
"RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
has more information on where this name came from, search for
"read-copy update" to find it.
- I hear that RCU is patented? What is with that?
Yes, it is. There are several known patents related to RCU,
search for the string "Patent" in RTFP.txt to find them.
Of these, one was allowed to lapse by the assignee, and the
others have been contributed to the Linux kernel under GPL.
There are now also LGPL implementations of user-level RCU
available (http://liburcu.org/).
- I hear that RCU needs work in order to support realtime kernels?
Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
kernel configuration parameter.
- Where can I find more information on RCU?
See the RTFP.txt file in this directory.
Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).

View File

@@ -1,89 +0,0 @@
RCU Concepts
The basic idea behind RCU (read-copy update) is to split destructive
operations into two parts, one that prevents anyone from seeing the data
item being destroyed, and one that actually carries out the destruction.
A "grace period" must elapse between the two parts, and this grace period
must be long enough that any readers accessing the item being deleted have
since dropped their references. For example, an RCU-protected deletion
from a linked list would first remove the item from the list, wait for
a grace period to elapse, then free the element. See the listRCU.txt
file for more information on using RCU with linked lists.
Frequently Asked Questions
o Why would anyone want to use RCU?
The advantage of RCU's two-part approach is that RCU readers need
not acquire any locks, perform any atomic instructions, write to
shared memory, or (on CPUs other than Alpha) execute any memory
barriers. The fact that these operations are quite expensive
on modern CPUs is what gives RCU its performance advantages
in read-mostly situations. The fact that RCU readers need not
acquire locks can also greatly simplify deadlock-avoidance code.
o How can the updater tell when a grace period has completed
if the RCU readers give no indication when they are done?
Just as with spinlocks, RCU readers are not permitted to
block, switch to user-mode execution, or enter the idle loop.
Therefore, as soon as a CPU is seen passing through any of these
three states, we know that that CPU has exited any previous RCU
read-side critical sections. So, if we remove an item from a
linked list, and then wait until all CPUs have switched context,
executed in user mode, or executed in the idle loop, we can
safely free up that item.
Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
same effect, but require that the readers manipulate CPU-local
counters. These counters allow limited types of blocking within
RCU read-side critical sections. SRCU also uses CPU-local
counters, and permits general blocking within RCU read-side
critical sections. These variants of RCU detect grace periods
by sampling these counters.
o If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
See the UP.txt file in this directory.
o How can I see where RCU is currently used in the Linux kernel?
Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
"synchronize_srcu", and the other RCU primitives. Or grab one
of the cscope databases from:
http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html
o What guidelines should I follow when writing code that uses RCU?
See the checklist.txt file in this directory.
o Why the name "RCU"?
"RCU" stands for "read-copy update". The file listRCU.txt has
more information on where this name came from, search for
"read-copy update" to find it.
o I hear that RCU is patented? What is with that?
Yes, it is. There are several known patents related to RCU,
search for the string "Patent" in RTFP.txt to find them.
Of these, one was allowed to lapse by the assignee, and the
others have been contributed to the Linux kernel under GPL.
There are now also LGPL implementations of user-level RCU
available (http://liburcu.org/).
o I hear that RCU needs work in order to support realtime kernels?
Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
kernel configuration parameter.
o Where can I find more information on RCU?
See the RTFP.txt file in this directory.
Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.

View File

@@ -12,6 +12,7 @@ please read on.
Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:
CODE LISTING A:
1. 2.
add() search_and_reference()
{ {
@@ -28,7 +29,8 @@ add() search_and_reference()
release_referenced() delete()
{ {
... write_lock(&list_lock);
atomic_dec(&el->rc, relfunc) ...
if(atomic_dec_and_test(&el->rc)) ...
kfree(el);
... remove_element
} write_unlock(&list_lock);
...
@@ -44,6 +46,7 @@ search_and_reference() could potentially hold reference to an element which
has already been deleted from the list/array. Use atomic_inc_not_zero()
in this scenario as follows:
CODE LISTING B:
1. 2.
add() search_and_reference()
{ {
@@ -79,6 +82,7 @@ search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:
CODE LISTING C:
1. 2.
add() search_and_reference()
{ {
@@ -114,6 +118,17 @@ element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
A clear advantage of the RCU-based pattern in listing C over the one
in listing B is that any call to search_and_reference() that locates
a given object will succeed in obtaining a reference to that object,
even given a concurrent invocation of delete() for that same object.
Similarly, a clear advantage of both listings B and C over listing A is
that a call to delete() is not delayed even if there are an arbitrarily
large number of calls to search_and_reference() searching for the same
object that delete() was invoked on. Instead, all that is delayed is
the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.
In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete as follows:
@@ -130,3 +145,7 @@ delete()
kfree(el);
...
}
As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
struct posix_acl.

View File

@@ -153,7 +153,7 @@ rcupdate.rcu_task_stall_timeout
This boot/sysfs parameter controls the RCU-tasks stall warning
interval. A value of zero or less suppresses RCU-tasks stall
warnings. A positive value sets the stall-warning interval
in jiffies. An RCU-tasks stall warning starts with the line:
in seconds. An RCU-tasks stall warning starts with the line:
INFO: rcu_tasks detected stalls on tasks:

View File

@@ -212,7 +212,7 @@ synchronize_rcu()
rcu_assign_pointer()
typeof(p) rcu_assign_pointer(p, typeof(p) v);
void rcu_assign_pointer(p, typeof(p) v);
Yes, rcu_assign_pointer() -is- implemented as a macro, though it
would be cool to be able to declare a function in this manner.
@@ -220,9 +220,9 @@ rcu_assign_pointer()
The updater uses this function to assign a new value to an
RCU-protected pointer, in order to safely communicate the change
in value from the updater to the reader. This function returns
the new value, and also executes any memory-barrier instructions
required for a given CPU architecture.
in value from the updater to the reader. This macro does not
evaluate to an rvalue, but it does execute any memory-barrier
instructions required for a given CPU architecture.
Perhaps just as important, it serves to document (1) which
pointers are protected by RCU and (2) the point at which a

View File

@@ -1,3 +1,7 @@
==================
Control Groupstats
==================
Control Groupstats is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics as
suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.
@@ -19,9 +23,9 @@ about tasks blocked on I/O. If CONFIG_TASK_DELAY_ACCT is disabled, this
information will not be available.
To extract cgroup statistics a utility very similar to getdelays.c
has been developed, the sample output of the utility is shown below
has been developed, the sample output of the utility is shown below::
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a"
sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup"
sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup/a"
sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
~/balbir/cgroupstats # ./getdelays -C "/sys/fs/cgroup"
sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2

View File

@@ -1,5 +1,6 @@
================
Delay accounting
----------------
================
Tasks encounter delays in execution when they wait
for some kernel resource to become available e.g. a
@@ -39,7 +40,9 @@ in detail in a separate document in this directory. Taskstats returns a
generic data structure to userspace corresponding to per-pid and per-tgid
statistics. The delay accounting functionality populates specific fields of
this structure. See
include/linux/taskstats.h
for a description of the fields pertaining to delay accounting.
It will generally be in the form of counters returning the cumulative
delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
@@ -61,13 +64,16 @@ also serves as an example of using the taskstats interface.
Usage
-----
Compile the kernel with
Compile the kernel with::
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASKSTATS=y
Delay accounting is enabled by default at boot up.
To disable, add
To disable, add::
nodelayacct
to the kernel boot options. The rest of the instructions
below assume this has not been done.
@@ -78,40 +84,43 @@ The utility also allows a given command to be
executed and the corresponding delays to be
seen.
General format of the getdelays command
General format of the getdelays command::
getdelays [-t tgid] [-p pid] [-c cmd...]
getdelays [-t tgid] [-p pid] [-c cmd...]
Get delays, since system boot, for pid 10
# ./getdelays -p 10
(output similar to next case)
Get delays, since system boot, for pid 10::
Get sum of delays, since system boot, for all pids with tgid 5
# ./getdelays -t 5
# ./getdelays -p 10
(output similar to next case)
Get sum of delays, since system boot, for all pids with tgid 5::
# ./getdelays -t 5
CPU count real total virtual total delay total
7876 92005750 100000000 24001500
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
CPU count real total virtual total delay total
7876 92005750 100000000 24001500
IO count delay total
0 0
SWAP count delay total
0 0
RECLAIM count delay total
0 0
Get delays seen in executing a given simple command
# ./getdelays -c ls /
Get delays seen in executing a given simple command::
bin data1 data3 data5 dev home media opt root srv sys usr
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
# ./getdelays -c ls /
bin data1 data3 data5 dev home media opt root srv sys usr
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
CPU count real total virtual total delay total
CPU count real total virtual total delay total
6 4000250 4000000 0
IO count delay total
IO count delay total
0 0
SWAP count delay total
SWAP count delay total
0 0
RECLAIM count delay total
RECLAIM count delay total
0 0

View File

@@ -0,0 +1,14 @@
.. SPDX-License-Identifier: GPL-2.0
==========
Accounting
==========
.. toctree::
:maxdepth: 1
cgroupstats
delay-accounting
psi
taskstats
taskstats-struct

View File

@@ -35,14 +35,14 @@ Pressure interface
Pressure information for each resource is exported through the
respective file in /proc/pressure/ -- cpu, memory, and io.
The format for CPU is as such:
The format for CPU is as such::
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
and for memory and IO:
and for memory and IO::
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
The "some" line indicates the share of time in which at least some
tasks are stalled on a given resource.
@@ -77,9 +77,9 @@ To register a trigger user has to open psi interface file under
/proc/pressure/ representing the resource to be monitored and write the
desired threshold and time window. The open file descriptor should be
used to wait for trigger events using select(), poll() or epoll().
The following format is used:
The following format is used::
<some|full> <stall amount in us> <time window in us>
<some|full> <stall amount in us> <time window in us>
For example writing "some 150000 1000000" into /proc/pressure/memory
would add 150ms threshold for partial memory stall measured within
@@ -115,18 +115,20 @@ trigger is closed.
Userspace monitor usage example
===============================
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
::
/*
* Monitor memory partial stall with 1s tracking window size
* and 150ms threshold.
*/
int main() {
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
/*
* Monitor memory partial stall with 1s tracking window size
* and 150ms threshold.
*/
int main() {
const char trig[] = "some 150000 1000000";
struct pollfd fds;
int n;
@@ -165,7 +167,7 @@ int main() {
}
return 0;
}
}
Cgroup2 interface
=================

View File

@@ -1,5 +1,6 @@
====================
The struct taskstats
--------------------
====================
This document contains an explanation of the struct taskstats fields.
@@ -10,16 +11,24 @@ There are three different groups of fields in the struct taskstats:
the common fields and basic accounting fields are collected for
delivery at do_exit() of a task.
2) Delay accounting fields
These fields are placed between
/* Delay accounting fields start */
and
/* Delay accounting fields end */
These fields are placed between::
/* Delay accounting fields start */
and::
/* Delay accounting fields end */
Their values are collected if CONFIG_TASK_DELAY_ACCT is set.
3) Extended accounting fields
These fields are placed between
/* Extended accounting fields start */
and
/* Extended accounting fields end */
These fields are placed between::
/* Extended accounting fields start */
and::
/* Extended accounting fields end */
Their values are collected if CONFIG_TASK_XACCT is set.
4) Per-task and per-thread context switch count statistics
@@ -31,31 +40,33 @@ There are three different groups of fields in the struct taskstats:
Future extension should add fields to the end of the taskstats struct, and
should not change the relative position of each field within the struct.
::
struct taskstats {
struct taskstats {
1) Common and basic accounting fields::
1) Common and basic accounting fields:
/* The version number of this struct. This field is always set to
* TAKSTATS_VERSION, which is defined in <linux/taskstats.h>.
* Each time the struct is changed, the value should be incremented.
*/
__u16 version;
/* The exit code of a task. */
/* The exit code of a task. */
__u32 ac_exitcode; /* Exit status */
/* The accounting flags of a task as defined in <linux/acct.h>
/* The accounting flags of a task as defined in <linux/acct.h>
* Defined values are AFORK, ASU, ACOMPAT, ACORE, and AXSIG.
*/
__u8 ac_flag; /* Record flags */
/* The value of task_nice() of a task. */
/* The value of task_nice() of a task. */
__u8 ac_nice; /* task_nice */
/* The name of the command that started this task. */
/* The name of the command that started this task. */
char ac_comm[TS_COMM_LEN]; /* Command name */
/* The scheduling discipline as set in task->policy field. */
/* The scheduling discipline as set in task->policy field. */
__u8 ac_sched; /* Scheduling discipline */
__u8 ac_pad[3];
@@ -64,26 +75,27 @@ struct taskstats {
__u32 ac_pid; /* Process ID */
__u32 ac_ppid; /* Parent process ID */
/* The time when a task begins, in [secs] since 1970. */
/* The time when a task begins, in [secs] since 1970. */
__u32 ac_btime; /* Begin time [sec since 1970] */
/* The elapsed time of a task, in [usec]. */
/* The elapsed time of a task, in [usec]. */
__u64 ac_etime; /* Elapsed time [usec] */
/* The user CPU time of a task, in [usec]. */
/* The user CPU time of a task, in [usec]. */
__u64 ac_utime; /* User CPU time [usec] */
/* The system CPU time of a task, in [usec]. */
/* The system CPU time of a task, in [usec]. */
__u64 ac_stime; /* System CPU time [usec] */
/* The minor page fault count of a task, as set in task->min_flt. */
/* The minor page fault count of a task, as set in task->min_flt. */
__u64 ac_minflt; /* Minor Page Fault Count */
/* The major page fault count of a task, as set in task->maj_flt. */
__u64 ac_majflt; /* Major Page Fault Count */
2) Delay accounting fields:
2) Delay accounting fields::
/* Delay accounting fields start
*
* All values, until the comment "Delay accounting fields end" are
@@ -134,7 +146,8 @@ struct taskstats {
/* version 1 ends here */
3) Extended accounting fields
3) Extended accounting fields::
/* Extended accounting fields start */
/* Accumulated RSS usage in duration of a task, in MBytes-usecs.
@@ -145,15 +158,15 @@ struct taskstats {
*/
__u64 coremem; /* accumulated RSS usage in MB-usec */
/* Accumulated virtual memory usage in duration of a task.
/* Accumulated virtual memory usage in duration of a task.
* Same as acct_rss_mem1 above except that we keep track of VM usage.
*/
__u64 virtmem; /* accumulated VM usage in MB-usec */
/* High watermark of RSS usage in duration of a task, in KBytes. */
/* High watermark of RSS usage in duration of a task, in KBytes. */
__u64 hiwater_rss; /* High-watermark of RSS usage */
/* High watermark of VM usage in duration of a task, in KBytes. */
/* High watermark of VM usage in duration of a task, in KBytes. */
__u64 hiwater_vm; /* High-water virtual memory usage */
/* The following four fields are I/O statistics of a task. */
@@ -164,17 +177,23 @@ struct taskstats {
/* Extended accounting fields end */
4) Per-task and per-thread statistics
4) Per-task and per-thread statistics::
__u64 nvcsw; /* Context voluntary switch counter */
__u64 nivcsw; /* Context involuntary switch counter */
5) Time accounting for SMT machines
5) Time accounting for SMT machines::
__u64 ac_utimescaled; /* utime scaled on frequency etc */
__u64 ac_stimescaled; /* stime scaled on frequency etc */
__u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
6) Extended delay accounting fields for memory reclaim
6) Extended delay accounting fields for memory reclaim::
/* Delay waiting for memory reclaim */
__u64 freepages_count;
__u64 freepages_delay_total;
}
::
}

View File

@@ -1,5 +1,6 @@
=============================
Per-task statistics interface
-----------------------------
=============================
Taskstats is a netlink-based interface for sending per-task and
@@ -65,7 +66,7 @@ taskstats.h file.
The data exchanged between user and kernel space is a netlink message belonging
to the NETLINK_GENERIC family and using the netlink attributes interface.
The messages are in the format
The messages are in the format::
+----------+- - -+-------------+-------------------+
| nlmsghdr | Pad | genlmsghdr | taskstats payload |
@@ -167,15 +168,13 @@ extended and the number of cpus grows large.
To avoid losing statistics, userspace should do one or more of the following:
- increase the receive buffer sizes for the netlink sockets opened by
listeners to receive exit data.
listeners to receive exit data.
- create more listeners and reduce the number of cpus being listened to by
each listener. In the extreme case, there could be one listener for each cpu.
Users may also consider setting the cpu affinity of the listener to the subset
of cpus to which it listens, especially if they are listening to just one cpu.
each listener. In the extreme case, there could be one listener for each cpu.
Users may also consider setting the cpu affinity of the listener to the subset
of cpus to which it listens, especially if they are listening to just one cpu.
Despite these measures, if the userspace receives ENOBUFS error messages
indicated overflow of receive buffers, it should take measures to handle the
loss of data.
----

View File

@@ -96,4 +96,4 @@ where
<URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
referenced 2019-02-21.
[7] Documentation/acpi/dsd/data-node-reference.txt
[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst

View File

@@ -19,3 +19,13 @@ block device backing the filesystem is not read-only, a sysctl is
created to toggle pinning: ``/proc/sys/kernel/loadpin/enabled``. (Having
a mutable filesystem means pinning is mutable too, but having the
sysctl allows for easy testing on systems with a mutable filesystem.)
It's also possible to exclude specific file types from LoadPin using kernel
command line option "``loadpin.exclude``". By default, all files are
included, but they can be excluded using kernel command line option such
as "``loadpin.exclude=kernel-module,kexec-image``". This allows to use
different mechanisms such as ``CONFIG_MODULE_SIG`` and
``CONFIG_KEXEC_VERIFY_SIG`` to verify kernel module and kernel image while
still use LoadPin to protect the integrity of other files kernel loads. The
full list of valid file types can be found in ``kernel_read_file_str``
defined in ``include/linux/fs.h``.

View File

@@ -227,7 +227,7 @@ Configuring the kernel
"make tinyconfig" Configure the tiniest possible kernel.
You can find more information on using the Linux kernel config tools
in Documentation/kbuild/kconfig.txt.
in Documentation/kbuild/kconfig.rst.
- NOTES on ``make config``:

View File

@@ -1,3 +1,6 @@
Introduction
============
ATA over Ethernet is a network protocol that provides simple access to
block storage on the LAN.
@@ -17,12 +20,13 @@ driver. The aoetools are on sourceforge.
http://aoetools.sourceforge.net/
The scripts in this Documentation/aoe directory are intended to
The scripts in this Documentation/admin-guide/aoe directory are intended to
document the use of the driver and are not necessary if you install
the aoetools.
CREATING DEVICE NODES
Creating Device Nodes
=====================
Users of udev should find the block device nodes created
automatically, but to create all the necessary device nodes, use the
@@ -38,7 +42,8 @@ CREATING DEVICE NODES
confusing when an AoE device is not present the first time the a
command is run but appears a second later.
USING DEVICE NODES
Using Device Nodes
==================
"cat /dev/etherd/err" blocks, waiting for error diagnostic output,
like any retransmitted packets.
@@ -55,7 +60,7 @@ USING DEVICE NODES
by sysfs counterparts. Using the commands in aoetools insulates
users from these implementation details.
The block devices are named like this:
The block devices are named like this::
e{shelf}.{slot}
e{shelf}.{slot}p{part}
@@ -64,7 +69,8 @@ USING DEVICE NODES
first shelf (shelf address zero). That's the whole disk. The first
partition on that disk would be "e0.2p1".
USING SYSFS
Using sysfs
===========
Each aoe block device in /sys/block has the extra attributes of
state, mac, and netif. The state attribute is "up" when the device
@@ -78,29 +84,29 @@ USING SYSFS
There is a script in this directory that formats this information in
a convenient way. Users with aoetools should use the aoe-stat
command.
command::
root@makki root# sh Documentation/aoe/status.sh
e10.0 eth3 up
e10.1 eth3 up
e10.2 eth3 up
e10.3 eth3 up
e10.4 eth3 up
e10.5 eth3 up
e10.6 eth3 up
e10.7 eth3 up
e10.8 eth3 up
e10.9 eth3 up
e4.0 eth1 up
e4.1 eth1 up
e4.2 eth1 up
e4.3 eth1 up
e4.4 eth1 up
e4.5 eth1 up
e4.6 eth1 up
e4.7 eth1 up
e4.8 eth1 up
e4.9 eth1 up
root@makki root# sh Documentation/admin-guide/aoe/status.sh
e10.0 eth3 up
e10.1 eth3 up
e10.2 eth3 up
e10.3 eth3 up
e10.4 eth3 up
e10.5 eth3 up
e10.6 eth3 up
e10.7 eth3 up
e10.8 eth3 up
e10.9 eth3 up
e4.0 eth1 up
e4.1 eth1 up
e4.2 eth1 up
e4.3 eth1 up
e4.4 eth1 up
e4.5 eth1 up
e4.6 eth1 up
e4.7 eth1 up
e4.8 eth1 up
e4.9 eth1 up
Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
option discussed below) instead of /dev/etherd/interfaces to limit
@@ -113,12 +119,13 @@ USING SYSFS
for this purpose. You can also directly use the
/dev/etherd/discover special file described above.
DRIVER OPTIONS
Driver Options
==============
There is a boot option for the built-in aoe driver and a
corresponding module parameter, aoe_iflist. Without this option,
all network interfaces may be used for ATA over Ethernet. Here is a
usage example for the module parameter.
usage example for the module parameter::
modprobe aoe_iflist="eth1 eth3"

View File

@@ -0,0 +1,23 @@
Example of udev rules
---------------------
.. include:: udev.txt
:literal:
Example of udev install rules script
------------------------------------
.. literalinclude:: udev-install.sh
:language: shell
Example script to get status
----------------------------
.. literalinclude:: status.sh
:language: shell
Example of AoE autoload script
------------------------------
.. literalinclude:: autoload.sh
:language: shell

View File

@@ -0,0 +1,17 @@
=======================
ATA over Ethernet (AoE)
=======================
.. toctree::
:maxdepth: 1
aoe
todo
examples
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

View File

@@ -1,3 +1,6 @@
TODO
====
There is a potential for deadlock when allocating a struct sk_buff for
data that needs to be written out to aoe storage. If the data is
being written from a dirty page in order to free that page, and if

Some files were not shown because too many files have changed in this diff Show More