Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
 "As agreed with Jon, I'm sending this big series directly to you, c/c
  him, as this series required a special care, in order to avoid
  conflicts with other trees"

* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
  docs: kbuild: fix build with pdf and fix some minor issues
  docs: block: fix pdf output
  docs: arm: fix a breakage with pdf output
  docs: don't use nested tables
  docs: gpio: add sysfs interface to the admin-guide
  docs: locking: add it to the main index
  docs: add some directories to the main documentation index
  docs: add SPDX tags to new index files
  docs: add a memory-devices subdir to driver-api
  docs: phy: place documentation under driver-api
  docs: serial: move it to the driver-api
  docs: driver-api: add remaining converted dirs to it
  docs: driver-api: add xilinx driver API documentation
  docs: driver-api: add a series of orphaned documents
  docs: admin-guide: add a series of orphaned documents
  docs: cgroup-v1: add it to the admin-guide book
  docs: aoe: add it to the driver-api book
  docs: add some documentation dirs to the driver-api book
  docs: driver-model: move it to the driver-api book
  docs: lp855x-driver.rst: add it to the driver-api book
  ...
81
Documentation/driver-api/backlight/lp855x-driver.rst
Normal file
@@ -0,0 +1,81 @@
|
||||
====================
|
||||
Kernel driver lp855x
|
||||
====================
|
||||
|
||||
Backlight driver for LP855x ICs
|
||||
|
||||
Supported chips:
|
||||
|
||||
Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
|
||||
LP8557
|
||||
|
||||
Author: Milo(Woogyom) Kim <milo.kim@ti.com>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
* Brightness control
|
||||
|
||||
Brightness can be controlled by the pwm input or the i2c command.
|
||||
The lp855x driver supports both cases.
|
||||
|
||||
* Device attributes
|
||||
|
||||
1) bl_ctl_mode
|
||||
|
||||
Backlight control mode.
|
||||
|
||||
Value: pwm based or register based
|
||||
|
||||
2) chip_id
|
||||
|
||||
The lp855x chip id.
|
||||
|
||||
Value: lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
|
||||
|
||||
Platform data for lp855x
|
||||
------------------------
|
||||
|
||||
For supporting platform specific data, the lp855x platform data can be used.
|
||||
|
||||
* name:
|
||||
Backlight driver name. If it is not defined, a default name is set.
|
||||
* device_control:
|
||||
Value of DEVICE CONTROL register.
|
||||
* initial_brightness:
|
||||
Initial value of backlight brightness.
|
||||
* period_ns:
|
||||
Platform-specific PWM period value, in nanoseconds.
Only valid when brightness is controlled in PWM input mode.
|
||||
* size_program:
|
||||
Total size of lp855x_rom_data.
|
||||
* rom_data:
|
||||
List of new eeprom/eprom registers.
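The period_ns field is a period, while PWM sources are often specified by frequency. A minimal sketch of the conversion follows. The helper name pwm_period_ns_from_hz is hypothetical and not part of the lp855x driver; it only illustrates the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the lp855x driver): convert a PWM
 * frequency in hertz to a period in nanoseconds. Returns 0 for a zero
 * frequency, which has no finite period.
 */
uint32_t pwm_period_ns_from_hz(uint32_t freq_hz)
{
	if (freq_hz == 0)
		return 0;
	return (uint32_t)(1000000000ULL / freq_hz);
}
```

With this conversion, a 1 kHz PWM input corresponds to a period_ns of 1000000.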
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
1) lp8552 platform data: i2c register mode with new eeprom data::
|
||||
|
||||
#define EEPROM_A5_ADDR 0xA5
|
||||
#define EEPROM_A5_VAL 0x4f /* EN_VSYNC=0 */
|
||||
|
||||
static struct lp855x_rom_data lp8552_eeprom_arr[] = {
|
||||
{EEPROM_A5_ADDR, EEPROM_A5_VAL},
|
||||
};
|
||||
|
||||
static struct lp855x_platform_data lp8552_pdata = {
|
||||
.name = "lcd-bl",
|
||||
.device_control = I2C_CONFIG(LP8552),
|
||||
.initial_brightness = INITIAL_BRT,
|
||||
.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
|
||||
.rom_data = lp8552_eeprom_arr,
|
||||
};
|
||||
|
||||
2) lp8556 platform data: pwm input mode with default rom data::
|
||||
|
||||
static struct lp855x_platform_data lp8556_pdata = {
|
||||
.device_control = PWM_CONFIG(LP8556),
|
||||
.initial_brightness = INITIAL_BRT,
|
||||
.period_ns = 1000000,
|
||||
};
|
62
Documentation/driver-api/bt8xxgpio.rst
Normal file
@@ -0,0 +1,62 @@
|
||||
===================================================================
|
||||
A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio)
|
||||
===================================================================
|
||||
|
||||
For advanced documentation, see http://www.bu3sch.de/btgpio.php
|
||||
|
||||
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
||||
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
||||
Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily
|
||||
find them used for low prices on the net.
|
||||
|
||||
The bt8xx chip does have 24 digital GPIO ports.
|
||||
These ports are accessible via 24 pins on the SMD chip package.
|
||||
|
||||
|
||||
How to physically access the GPIO pins
|
||||
======================================
|
||||
|
||||
There are several ways to access these pins. One might unsolder the whole chip
|
||||
and put it on a custom PCI board, or one might only unsolder each individual
|
||||
GPIO pin and solder it to a thin wire. Because the chip package is tiny,
advanced soldering skills are needed in either case.
|
||||
|
||||
The physical pinouts are drawn in the following ASCII art.
|
||||
The GPIO pins are marked with G00-G23::
|
||||
|
||||
G G G G G G G G G G G G G G G G G G
|
||||
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
||||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
|
||||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||
---------------------------------------------------------------------------
|
||||
--| ^ ^ |--
|
||||
--| pin 86 pin 67 |--
|
||||
--| |--
|
||||
--| pin 61 > |-- G18
|
||||
--| |-- G19
|
||||
--| |-- G20
|
||||
--| |-- G21
|
||||
--| |-- G22
|
||||
--| pin 56 > |-- G23
|
||||
--| |--
|
||||
--| Brooktree 878/879 |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| |--
|
||||
--| O |--
|
||||
--| |--
|
||||
---------------------------------------------------------------------------
|
||||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||
^
|
||||
This is pin 1
|
||||
|
156
Documentation/driver-api/connector.rst
Normal file
@@ -0,0 +1,156 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================
|
||||
Kernel Connector
|
||||
================
|
||||
|
||||
Kernel connector - new netlink based userspace <-> kernel space easy
|
||||
to use communication module.
|
||||
|
||||
The Connector driver makes it easy to connect various agents using a
|
||||
netlink based network. One must register a callback and an identifier.
|
||||
When the driver receives a special netlink message with the appropriate
|
||||
identifier, the appropriate callback will be called.
|
||||
|
||||
From the userspace point of view it's quite straightforward:
|
||||
|
||||
- socket();
|
||||
- bind();
|
||||
- send();
|
||||
- recv();
|
||||
|
||||
But if kernelspace wants to use the full power of such connections, the
|
||||
driver writer must create special sockets, must know about struct sk_buff
|
||||
handling, etc... The Connector driver allows any kernelspace agents to use
|
||||
netlink based networking for inter-process communication in a significantly
|
||||
easier way::
|
||||
|
||||
int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
|
||||
void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
|
||||
void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);
|
||||
|
||||
struct cb_id
|
||||
{
|
||||
__u32 idx;
|
||||
__u32 val;
|
||||
};
|
||||
|
||||
idx and val are unique identifiers which must be registered in the
connector.h header for in-kernel usage.
`void (*callback) (struct cn_msg *, struct netlink_skb_parms *)` is a
callback function which will be called when a message with the above idx.val
is received by the connector core. The first argument of that function
points to the received `struct cn_msg`::
|
||||
|
||||
struct cn_msg
|
||||
{
|
||||
struct cb_id id;
|
||||
|
||||
__u32 seq;
|
||||
__u32 ack;
|
||||
|
||||
__u32 len; /* Length of the following data */
|
||||
__u8 data[0];
|
||||
};
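As a rough illustration of how the len field relates to the trailing data, here is a userspace sketch that mirrors the layout shown above. The names cb_id_sketch, cn_msg_sketch and cn_msg_sketch_new are hypothetical; the kernel header remains the authoritative definition and its field types may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace mirror of the layout shown above (hypothetical names). */
struct cb_id_sketch {
	uint32_t idx;
	uint32_t val;
};

struct cn_msg_sketch {
	struct cb_id_sketch id;
	uint32_t seq;
	uint32_t ack;
	uint32_t len;	/* length of the following data */
	uint8_t data[];	/* flexible array member holding the payload */
};

/*
 * Allocate a message with room for len payload bytes and copy them in.
 * seq and ack start at zero. The caller releases the result with free().
 * Returns NULL if the allocation fails.
 */
struct cn_msg_sketch *cn_msg_sketch_new(uint32_t idx, uint32_t val,
					const void *payload, uint32_t len)
{
	struct cn_msg_sketch *msg = calloc(1, sizeof(*msg) + len);

	if (!msg)
		return NULL;
	msg->id.idx = idx;
	msg->id.val = val;
	msg->len = len;
	if (len)
		memcpy(msg->data, payload, len);
	return msg;
}
```

The header and payload live in one allocation, so the whole message can be handed to a send routine as a single buffer of sizeof(header) + len bytes.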
|
||||
|
||||
Connector interfaces
|
||||
====================
|
||||
|
||||
.. kernel-doc:: include/linux/connector.h
|
||||
|
||||
Note:
|
||||
When registering new callback user, connector core assigns
|
||||
netlink group to the user which is equal to its id.idx.
|
||||
|
||||
Protocol description
|
||||
====================
|
||||
|
||||
The current framework offers a transport layer with fixed headers. The
|
||||
recommended protocol which uses such a header is as follows:
|
||||
|
||||
msg->seq and msg->ack are used to determine message genealogy. When
|
||||
someone sends a message, they use a locally unique sequence and random
|
||||
acknowledge number. The sequence number may be copied into
|
||||
nlmsghdr->nlmsg_seq too.
|
||||
|
||||
The sequence number is incremented with each message sent.
|
||||
|
||||
If you expect a reply to the message, then the sequence number in the
|
||||
received message MUST be the same as in the original message, and the
|
||||
acknowledge number MUST be the same + 1.
|
||||
|
||||
If we receive a message and its sequence number is not equal to one we
|
||||
are expecting, then it is a new message. If we receive a message and
|
||||
its sequence number is the same as one we are expecting, but its
|
||||
acknowledge is not equal to the sequence number in the original
|
||||
message + 1, then it is a new message.
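The rules above can be sketched as a small predicate. This is only an illustration of the recommended protocol, reading "the same + 1" as the original sequence number plus one, as the last paragraph states; the function name cn_is_reply is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when a received (seq, ack) pair answers a
 * message that was sent with sequence number sent_seq. Anything else is
 * treated as a new message by the rules above.
 */
bool cn_is_reply(uint32_t sent_seq, uint32_t recv_seq, uint32_t recv_ack)
{
	return recv_seq == sent_seq && recv_ack == (uint32_t)(sent_seq + 1);
}
```

The cast keeps the comparison well defined when the sequence number wraps around at 2^32.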
|
||||
|
||||
Obviously, the protocol header contains the above id.
|
||||
|
||||
The connector allows event notification in the following form: a kernel
driver or userspace process can ask the connector to notify it when
selected ids are turned on or off (that is, when their callbacks are
registered or unregistered). This is done by sending a special command to
the connector driver (it also registers itself with id={-1, -1}).
|
||||
|
||||
An example of this usage can be found in the cn_test.c module which
|
||||
uses the connector to request notification and to send messages.
|
||||
|
||||
Reliability
|
||||
===========
|
||||
|
||||
Netlink itself is not a reliable protocol. That means that messages can
|
||||
be lost due to memory pressure or an overflowing receive queue in the
process, so callers must be prepared for message loss. That is why the struct
|
||||
cn_msg [main connector's message header] contains u32 seq and u32 ack
|
||||
fields.
|
||||
|
||||
Userspace usage
|
||||
===============
|
||||
|
||||
2.6.14 has a new netlink socket implementation, which by default does not
|
||||
allow people to send data to netlink groups other than 1.
|
||||
So, if you wish to use a netlink socket (for example using connector)
|
||||
with a different group number, the userspace application must subscribe to
|
||||
that group first. It can be achieved by the following pseudocode::
|
||||
|
||||
s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
|
||||
|
||||
l_local.nl_family = AF_NETLINK;
|
||||
l_local.nl_groups = 12345;
|
||||
l_local.nl_pid = 0;
|
||||
|
||||
if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
|
||||
perror("bind");
|
||||
close(s);
|
||||
return -1;
|
||||
}
|
||||
|
||||
{
|
||||
int on = l_local.nl_groups;
|
||||
setsockopt(s, 270, 1, &on, sizeof(on));
|
||||
}
|
||||
|
||||
Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket
|
||||
option. To drop a multicast subscription, one should call the above socket
|
||||
option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0.
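For reference, the pseudocode above can be written as a complete function along these lines. This is a sketch under a few assumptions: the function name cn_subscribe is hypothetical, SOL_NETLINK is defined locally as 270 when the C library headers do not provide it, and nl_groups is left at zero because that field is a bitmask that can only name groups 1 to 32, so the membership is requested through setsockopt() alone.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270	/* value quoted in the text above */
#endif

/*
 * Hypothetical helper: open a connector socket and join one multicast
 * group. Returns the socket descriptor, or -1 on any failure.
 */
int cn_subscribe(int group)
{
	struct sockaddr_nl l_local;
	int s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);

	if (s == -1)
		return -1;

	memset(&l_local, 0, sizeof(l_local));
	l_local.nl_family = AF_NETLINK;
	l_local.nl_groups = 0;	/* membership is added via setsockopt() */
	l_local.nl_pid = 0;	/* let the kernel pick the port id */

	if (bind(s, (struct sockaddr *)&l_local, sizeof(l_local)) == -1 ||
	    setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
		       &group, sizeof(group)) == -1) {
		close(s);
		return -1;
	}
	return s;
}
```

The caller owns the returned descriptor and should close() it when done. Whether a given group number is accepted depends on the limits described in the following paragraphs.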
|
||||
|
||||
2.6.14 netlink code only allows selecting a group which is less than or equal to
|
||||
the maximum group number, which is used at netlink_kernel_create() time.
|
||||
In case of connector it is CN_NETLINK_USERS + 0xf, so if you want to use
|
||||
group number 12345, you must increment CN_NETLINK_USERS to that number.
|
||||
Additional 0xf numbers are allocated to be used by non-in-kernel users.
|
||||
|
||||
Due to this limitation, group 0xffffffff does not work now, so one can
|
||||
not use add/remove connector's group notifications, but as far as I know,
|
||||
only cn_test.c test module used it.
|
||||
|
||||
Some work in the netlink area is still being done, so things may change in
the 2.6.15 timeframe. If that happens, the documentation will be updated for
that kernel.
|
||||
|
||||
Code samples
|
||||
============
|
||||
|
||||
Sample code for a connector test module and user space can be found
|
||||
in samples/connector/. To build this code, enable CONFIG_CONNECTOR
|
||||
and CONFIG_SAMPLES.
|
152
Documentation/driver-api/console.rst
Normal file
@@ -0,0 +1,152 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Console Drivers
|
||||
===============
|
||||
|
||||
The Linux kernel has 2 general types of console drivers. The first type is
|
||||
assigned by the kernel to all the virtual consoles during the boot process.
|
||||
This type will be called 'system driver', and only one system driver is allowed
|
||||
to exist. The system driver is persistent and it can never be unloaded, though
|
||||
it may become inactive.
|
||||
|
||||
The second type has to be explicitly loaded and unloaded. This will be called
|
||||
'modular driver' by this document. Multiple modular drivers can coexist at
|
||||
any time with each driver sharing the console with other drivers including
|
||||
the system driver. However, modular drivers cannot take over the console
|
||||
that is currently occupied by another modular driver. (Exception: Drivers that
|
||||
call do_take_over_console() will succeed in the takeover regardless of the type
|
||||
of driver occupying the consoles.) They can only take over the console that is
|
||||
occupied by the system driver. By the same token, if the modular driver is
|
||||
released by the console, the system driver will take over.
|
||||
|
||||
Modular drivers, from the programmer's point of view, have to call::
|
||||
|
||||
do_take_over_console() - load and bind driver to console layer
|
||||
give_up_console() - unload driver; it will only work if driver
|
||||
is fully unbound
|
||||
|
||||
In newer kernels, the following are also available::
|
||||
|
||||
do_register_con_driver()
|
||||
do_unregister_con_driver()
|
||||
|
||||
If sysfs is enabled, the contents of /sys/class/vtconsole can be
|
||||
examined. This shows the console backends currently registered by the
|
||||
system which are named vtcon<n> where <n> is an integer from 0 to 15.
|
||||
Thus::
|
||||
|
||||
ls /sys/class/vtconsole
|
||||
. .. vtcon0 vtcon1
|
||||
|
||||
Each directory in /sys/class/vtconsole has 3 files::
|
||||
|
||||
ls /sys/class/vtconsole/vtcon0
|
||||
. .. bind name uevent
|
||||
|
||||
What do these files signify?
|
||||
|
||||
1. bind - this is a read/write file. It shows the status of the driver if
|
||||
read, or acts to bind or unbind the driver to the virtual consoles
|
||||
when written to. The possible values are:
|
||||
|
||||
0
|
||||
- means the driver is not bound and if echo'ed, commands the driver
|
||||
to unbind
|
||||
|
||||
1
|
||||
- means the driver is bound and if echo'ed, commands the driver to
|
||||
bind
|
||||
|
||||
2. name - read-only file. Shows the name of the driver in this format::
|
||||
|
||||
cat /sys/class/vtconsole/vtcon0/name
|
||||
(S) VGA+
|
||||
|
||||
'(S)' stands for a (S)ystem driver, i.e., it cannot be directly
|
||||
commanded to bind or unbind
|
||||
|
||||
'VGA+' is the name of the driver
|
||||
|
||||
cat /sys/class/vtconsole/vtcon1/name
|
||||
(M) frame buffer device
|
||||
|
||||
In this case, '(M)' stands for a (M)odular driver, one that can be
|
||||
directly commanded to bind or unbind.
|
||||
|
||||
3. uevent - ignore this file
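A tool that inspects these files could classify a backend from the prefix of its name line. The sketch below assumes the "(S) " and "(M) " prefixes shown above; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum vtcon_kind { VTCON_UNKNOWN, VTCON_SYSTEM, VTCON_MODULAR };

/*
 * Hypothetical helper: classify one line read from
 * /sys/class/vtconsole/vtcon<n>/name by its "(S) " or "(M) " prefix.
 */
enum vtcon_kind vtcon_kind_from_name(const char *line)
{
	if (line == NULL)
		return VTCON_UNKNOWN;
	if (strncmp(line, "(S) ", 4) == 0)
		return VTCON_SYSTEM;
	if (strncmp(line, "(M) ", 4) == 0)
		return VTCON_MODULAR;
	return VTCON_UNKNOWN;
}
```

Only a backend reported as modular can be directly commanded to bind or unbind through its bind file.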
|
||||
|
||||
When unbinding, the modular driver is detached first, and then the system
|
||||
driver takes over the consoles vacated by the driver. Binding, on the other
|
||||
hand, will bind the driver to the consoles that are currently occupied by a
|
||||
system driver.
|
||||
|
||||
NOTE1:
|
||||
Binding and unbinding must be selected in Kconfig. It's under::
|
||||
|
||||
Device Drivers ->
|
||||
Character devices ->
|
||||
Support for binding and unbinding console drivers
|
||||
|
||||
NOTE2:
|
||||
If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
|
||||
unbinding will not succeed. An example of an application that sets the
|
||||
console to KD_GRAPHICS is X.
|
||||
|
||||
How useful is this feature? This is very useful for console driver
|
||||
developers. By unbinding the driver from the console layer, one can unload the
|
||||
driver, make changes, recompile, reload and rebind the driver without any need
|
||||
for rebooting the kernel. For regular users who may want to switch from
|
||||
framebuffer console to VGA console and vice versa, this feature also makes
|
||||
this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
|
||||
for more details.)
|
||||
|
||||
Notes for developers
|
||||
====================
|
||||
|
||||
do_take_over_console() is now broken up into::
|
||||
|
||||
do_register_con_driver()
|
||||
do_bind_con_driver() - private function
|
||||
|
||||
give_up_console() is a wrapper to do_unregister_con_driver(), and a driver must
|
||||
be fully unbound for this call to succeed. con_is_bound() will check if the
|
||||
driver is bound or not.
|
||||
|
||||
Guidelines for console driver writers
|
||||
=====================================
|
||||
|
||||
In order for binding to and unbinding from the console to properly work,
|
||||
console drivers must follow these guidelines:
|
||||
|
||||
1. All drivers, except system drivers, must call either do_register_con_driver()
|
||||
or do_take_over_console(). do_register_con_driver() will just add the driver
|
||||
to the console's internal list. It won't take over the
|
||||
console. do_take_over_console(), as its name implies, will also take over (or
|
||||
bind to) the console.
|
||||
|
||||
2. All resources allocated during con->con_init() must be released in
|
||||
con->con_deinit().
|
||||
|
||||
3. All resources allocated in con->con_startup() must be released when the
|
||||
driver, which was previously bound, becomes unbound. The console layer
|
||||
does not have a complementary call to con->con_startup() so it's up to the
|
||||
driver to check when it's legal to release these resources. Calling
|
||||
con_is_bound() in con->con_deinit() will help. If the call returns
false, then it's safe to release the resources. This balance has to be
|
||||
ensured because con->con_startup() can be called again when a request to
|
||||
rebind the driver to the console arrives.
|
||||
|
||||
4. Upon exit of the driver, ensure that the driver is totally unbound. If the
|
||||
condition is satisfied, then the driver must call do_unregister_con_driver()
|
||||
or give_up_console().
|
||||
|
||||
5. do_unregister_con_driver() can also be called on conditions which make it
|
||||
impossible for the driver to service console requests. This can happen
|
||||
with the framebuffer console that suddenly lost all of its drivers.
|
||||
|
||||
The current crop of console drivers should still work correctly, but binding
|
||||
and unbinding them may cause problems. With minimal fixes, these drivers can
|
||||
be made to work correctly.
|
||||
|
||||
Antonino Daplas <adaplas@pol.net>
|
99
Documentation/driver-api/dcdbas.rst
Normal file
@@ -0,0 +1,99 @@
|
||||
===================================
|
||||
Dell Systems Management Base Driver
|
||||
===================================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
The Dell Systems Management Base Driver provides a sysfs interface for
|
||||
systems management software such as Dell OpenManage to perform system
|
||||
management interrupts and host control actions (system power cycle or
|
||||
power off after OS shutdown) on certain Dell systems.
|
||||
|
||||
Dell OpenManage requires this driver on the following Dell PowerEdge systems:
|
||||
300, 1300, 1400, 400SC, 500SC, 1500SC, 1550, 600SC, 1600SC, 650, 1655MC,
|
||||
700, and 750. Other Dell software such as the open source libsmbios project
|
||||
is expected to make use of this driver, and it may include the use of this
|
||||
driver on other Dell systems.
|
||||
|
||||
The Dell libsmbios project aims towards providing access to as much BIOS
|
||||
information as possible. See http://linux.dell.com/libsmbios/main/ for
|
||||
more information about the libsmbios project.
|
||||
|
||||
|
||||
System Management Interrupt
|
||||
===========================
|
||||
|
||||
On some Dell systems, systems management software must access certain
|
||||
management information via a system management interrupt (SMI). The SMI data
|
||||
buffer must reside in 32-bit address space, and the physical address of the
|
||||
buffer is required for the SMI. The driver maintains the memory required for
|
||||
the SMI and provides a way for the application to generate the SMI.
|
||||
The driver creates the following sysfs entries for systems management
|
||||
software to perform these system management interrupts::
|
||||
|
||||
/sys/devices/platform/dcdbas/smi_data
|
||||
/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
|
||||
/sys/devices/platform/dcdbas/smi_data_buf_size
|
||||
/sys/devices/platform/dcdbas/smi_request
|
||||
|
||||
Systems management software must perform the following steps to execute
|
||||
a SMI using this driver:
|
||||
|
||||
1) Lock smi_data.
|
||||
2) Write system management command to smi_data.
|
||||
3) Write "1" to smi_request to generate a calling interface SMI or
|
||||
"2" to generate a raw SMI.
|
||||
4) Read system management command response from smi_data.
|
||||
5) Unlock smi_data.
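Step 3 of the sequence above amounts to writing a short string to a sysfs attribute. A minimal sketch follows; the helper names attr_write and dcdbas_request_calling_interface_smi are hypothetical, and the locking in steps 1 and 5 is left out because the text does not say which mechanism is used.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write value to the file at path.
 * Returns 0 on success, -1 on error.
 */
int attr_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Step 3: write "1" to smi_request to generate a calling interface SMI. */
int dcdbas_request_calling_interface_smi(void)
{
	return attr_write("/sys/devices/platform/dcdbas/smi_request", "1");
}
```

Checking the result of fclose() matters here because a buffered write may only reach the attribute, and report its error, when the stream is flushed.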
|
||||
|
||||
|
||||
Host Control Action
|
||||
===================
|
||||
|
||||
Dell OpenManage supports a host control feature that allows the administrator
|
||||
to perform a power cycle or power off of the system after the OS has finished
|
||||
shutting down. On some Dell systems, this host control feature requires that
|
||||
a driver perform a SMI after the OS has finished shutting down.
|
||||
|
||||
The driver creates the following sysfs entries for systems management software
|
||||
to schedule the driver to perform a power cycle or power off host control
|
||||
action after the system has finished shutting down:
|
||||
|
||||
/sys/devices/platform/dcdbas/host_control_action
|
||||
/sys/devices/platform/dcdbas/host_control_smi_type
|
||||
/sys/devices/platform/dcdbas/host_control_on_shutdown
|
||||
|
||||
Dell OpenManage performs the following steps to execute a power cycle or
|
||||
power off host control action using this driver:
|
||||
|
||||
1) Write host control action to be performed to host_control_action.
|
||||
2) Write type of SMI that driver needs to perform to host_control_smi_type.
|
||||
3) Write "1" to host_control_on_shutdown to enable host control action.
|
||||
4) Initiate OS shutdown.
|
||||
(Driver will perform host control SMI when it is notified that the OS
|
||||
has finished shutting down.)
|
||||
|
||||
|
||||
Host Control SMI Type
|
||||
=====================
|
||||
|
||||
The following table shows the value to write to host_control_smi_type to
|
||||
perform a power cycle or power off host control action:
|
||||
|
||||
=================== =====================
|
||||
PowerEdge System Host Control SMI Type
|
||||
=================== =====================
|
||||
300 HC_SMITYPE_TYPE1
|
||||
1300 HC_SMITYPE_TYPE1
|
||||
1400 HC_SMITYPE_TYPE2
|
||||
500SC HC_SMITYPE_TYPE2
|
||||
1500SC HC_SMITYPE_TYPE2
|
||||
1550 HC_SMITYPE_TYPE2
|
||||
600SC HC_SMITYPE_TYPE2
|
||||
1600SC HC_SMITYPE_TYPE2
|
||||
650 HC_SMITYPE_TYPE2
|
||||
1655MC HC_SMITYPE_TYPE2
|
||||
700 HC_SMITYPE_TYPE3
|
||||
750 HC_SMITYPE_TYPE3
|
||||
=================== =====================
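Management software might encode the table above as a lookup. The sketch below returns the symbolic names as strings because the numeric values of the HC_SMITYPE_* constants are not given in this document; the structure and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct hc_smi_entry {
	const char *model;	/* PowerEdge model, as listed in the table */
	const char *smi_type;	/* symbolic SMI type name from the table */
};

static const struct hc_smi_entry hc_smi_table[] = {
	{ "300",    "HC_SMITYPE_TYPE1" },
	{ "1300",   "HC_SMITYPE_TYPE1" },
	{ "1400",   "HC_SMITYPE_TYPE2" },
	{ "500SC",  "HC_SMITYPE_TYPE2" },
	{ "1500SC", "HC_SMITYPE_TYPE2" },
	{ "1550",   "HC_SMITYPE_TYPE2" },
	{ "600SC",  "HC_SMITYPE_TYPE2" },
	{ "1600SC", "HC_SMITYPE_TYPE2" },
	{ "650",    "HC_SMITYPE_TYPE2" },
	{ "1655MC", "HC_SMITYPE_TYPE2" },
	{ "700",    "HC_SMITYPE_TYPE3" },
	{ "750",    "HC_SMITYPE_TYPE3" },
};

/*
 * Hypothetical helper: return the SMI type name for a PowerEdge model,
 * or NULL if the model does not appear in the table.
 */
const char *hc_smi_type_for_model(const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(hc_smi_table) / sizeof(hc_smi_table[0]); i++)
		if (strcmp(hc_smi_table[i].model, model) == 0)
			return hc_smi_table[i].smi_type;
	return NULL;
}
```

Returning NULL for an unlisted model lets the caller refuse to schedule a host control action rather than guess an SMI type.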
|
128
Documentation/driver-api/dell_rbu.rst
Normal file
@@ -0,0 +1,128 @@
|
||||
=============================================================
|
||||
Usage of the new open sourced rbu (Remote BIOS Update) driver
|
||||
=============================================================
|
||||
|
||||
Purpose
|
||||
=======
|
||||
|
||||
This document demonstrates the use of the Dell Remote BIOS Update driver
for updating BIOS images on Dell servers and desktops.
|
||||
|
||||
Scope
|
||||
=====
|
||||
|
||||
This document discusses the functionality of the rbu driver only.
|
||||
It does not cover the support needed from applications to enable the BIOS to
|
||||
update itself with the image downloaded into memory.
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This driver works with Dell OpenManage or Dell Update Packages for updating
|
||||
the BIOS on Dell servers (starting from servers sold since 1999), desktops
|
||||
and notebooks (starting from those sold in 2005).
|
||||
|
||||
Please go to http://support.dell.com and register to find information on
|
||||
OpenManage and Dell Update packages (DUP).
|
||||
|
||||
Libsmbios can also be used to update the BIOS on Dell systems; go to
|
||||
http://linux.dell.com/libsmbios/ for details.
|
||||
|
||||
Dell_RBU driver supports BIOS update using the monolithic image and packetized
|
||||
image methods. In the monolithic case, the driver allocates a contiguous chunk
of physical pages to hold the BIOS image. In the packetized case, the
application using the driver breaks the image into packets of a fixed size,
and the driver places each packet in contiguous physical memory. The driver
also maintains a linked list of packets for reading them back.
|
||||
|
||||
If the dell_rbu driver is unloaded all the allocated memory is freed.
|
||||
|
||||
The rbu driver needs an application (as mentioned above) which will
|
||||
inform the BIOS to enable the update in the next system reboot.
|
||||
|
||||
The user should not unload the rbu driver after downloading the BIOS image
|
||||
or updating.
|
||||
|
||||
The driver load creates the following directories under the /sys file system::
|
||||
|
||||
/sys/class/firmware/dell_rbu/loading
|
||||
/sys/class/firmware/dell_rbu/data
|
||||
/sys/devices/platform/dell_rbu/image_type
|
||||
/sys/devices/platform/dell_rbu/data
|
||||
/sys/devices/platform/dell_rbu/packet_size
|
||||
|
||||
The driver supports two types of update mechanism: monolithic and packetized.
Which mechanism is used depends upon the BIOS currently running on the system.
|
||||
Most of the Dell systems support a monolithic update where the BIOS image is
|
||||
copied to a single contiguous block of physical memory.
|
||||
|
||||
With the packet mechanism, the single block can be broken into smaller chunks
of contiguous memory, and the BIOS image is scattered across these packets.
|
||||
|
||||
By default the driver uses monolithic memory for the update type. This can be
|
||||
changed to packets during the driver load time by specifying the load
|
||||
parameter image_type=packet. This can also be changed later as below::
|
||||
|
||||
echo packet > /sys/devices/platform/dell_rbu/image_type
|
||||
|
||||
In packet update mode the packet size has to be given before any packets can
|
||||
be downloaded. It is done as below::
|
||||
|
||||
echo XXXX > /sys/devices/platform/dell_rbu/packet_size
|
||||
|
||||
In the packet update mechanism, the user needs to create a new file having
packets of data arranged back to back. It can be done as follows: the user
creates a packet header, takes a chunk of the BIOS image and places it next
to the packet header. The packet header and the BIOS image chunk together
should match the specified packet_size. This makes one packet. The user needs
to create more such packets out of the entire BIOS image file and then
arrange all these packets back to back into one single file.
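The sizing rule in this paragraph, header plus image chunk equals packet_size, determines how many packets an image needs. The sketch below only illustrates that arithmetic. The header size is passed in as a parameter because this document does not define the header layout, and how a short final chunk is padded is likewise not specified here; the function name rbu_packet_count is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of packets needed for image_size bytes
 * when each packet carries header_size bytes of header followed by
 * (packet_size - header_size) bytes of image data. Returns 0 if the
 * sizes leave no room for image data.
 */
size_t rbu_packet_count(size_t image_size, size_t packet_size,
			size_t header_size)
{
	size_t chunk;

	if (packet_size <= header_size)
		return 0;
	chunk = packet_size - header_size;
	/* Round up so a partial final chunk still gets its own packet. */
	return image_size / chunk + (image_size % chunk != 0);
}
```

The resulting file would then be roughly the packet count times packet_size bytes long, depending on how the final packet is handled.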
|
||||
|
||||
This file is then copied to /sys/class/firmware/dell_rbu/data.
|
||||
Once this file gets to the driver, the driver extracts packet_size data from
|
||||
the file and spreads it across the physical memory in contiguous packet_sized
|
||||
space.
|
||||
|
||||
This method makes sure that all the packets get to the driver in a single operation.
|
||||
|
||||
In monolithic update the user simply gets the BIOS image (.hdr file) and copies
it to the data file as is, without any change to the BIOS image itself.
|
||||
|
||||
Do the steps below to download the BIOS image.
|
||||
|
||||
1) echo 1 > /sys/class/firmware/dell_rbu/loading
|
||||
2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
|
||||
3) echo 0 > /sys/class/firmware/dell_rbu/loading
|
||||
|
||||
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
|
||||
done.
|
||||
|
||||
::
|
||||
|
||||
echo -1 > /sys/class/firmware/dell_rbu/loading
|
||||
|
||||
Until this step is completed the driver cannot be unloaded.
|
||||
|
||||
Also, echoing either mono, packet or init into image_type will free up the
|
||||
memory allocated by the driver.
|
||||
|
||||
If a user by accident executes steps 1 and 3 above without executing step 2;
|
||||
it will make the /sys/class/firmware/dell_rbu/ entries disappear.
|
||||
|
||||
The entries can be recreated by doing the following::
|
||||
|
||||
echo init > /sys/devices/platform/dell_rbu/image_type
|
||||
|
||||
.. note:: echoing init in image_type does not change its original value.
|
||||
|
||||
Also the driver provides /sys/devices/platform/dell_rbu/data readonly file to
|
||||
read back the image downloaded.
|
||||
|
||||
.. note::
|
||||
|
||||
After updating the BIOS image a user mode application needs to execute
|
||||
code which sends the BIOS update request to the BIOS. So on the next reboot
|
||||
the BIOS knows about the new image downloaded and it updates itself.
|
||||
Also don't unload the rbu driver if the image has to be updated.
|
||||
|
98
Documentation/driver-api/driver-model/binding.rst
Normal file
@@ -0,0 +1,98 @@
|
||||
==============
|
||||
Driver Binding
|
||||
==============
|
||||
|
||||
Driver binding is the process of associating a device with a device
|
||||
driver that can control it. Bus drivers have typically handled this
|
||||
because there have been bus-specific structures to represent the
|
||||
devices and the drivers. With generic device and device driver
|
||||
structures, most of the binding can take place using common code.
|
||||
|
||||
|
||||
Bus
|
||||
~~~
|
||||
|
||||
The bus type structure contains a list of all devices that are on that bus
|
||||
type in the system. When device_register is called for a device, it is
|
||||
inserted into the end of this list. The bus object also contains a
|
||||
list of all drivers of that bus type. When driver_register is called
|
||||
for a driver, it is inserted at the end of this list. These are the
|
||||
two events which trigger driver binding.
|
||||
|
||||
|
||||
device_register
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
When a new device is added, the bus's list of drivers is iterated over
|
||||
to find one that supports it. In order to determine that, the device
|
||||
ID of the device must match one of the device IDs that the driver
|
||||
supports. The format and semantics for comparing IDs is bus-specific.
|
||||
Instead of trying to derive a complex state machine and matching
|
||||
algorithm, it is up to the bus driver to provide a callback to compare
|
||||
a device against the IDs of a driver. The bus returns 1 if a match was
|
||||
found; 0 otherwise.
|
||||
|
||||
int match(struct device * dev, struct device_driver * drv);
|
||||
|
||||
If a match is found, the device's driver field is set to the driver
|
||||
and the driver's probe callback is called. This gives the driver a
|
||||
chance to verify that it really does support the hardware, and that
|
||||
it's in a working state.
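The matching step described above can be sketched in plain userspace C. This is only an illustration, not the kernel implementation: the structures are simplified stand-ins, and toy_match(), toy_bind_device() and toy_probe_ok() are hypothetical names. Clearing the driver field again when probe fails is an assumption of this sketch.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct device_driver;

struct device {
	int id;                       /* bus-specific device ID */
	struct device_driver *driver; /* NULL until a driver is bound */
};

struct device_driver {
	const int *ids;               /* device IDs this driver supports */
	size_t n_ids;
	int (*probe)(struct device *dev);
};

/* Bus-provided match callback: returns 1 if a match was found, 0 otherwise. */
static int toy_match(struct device *dev, struct device_driver *drv)
{
	for (size_t i = 0; i < drv->n_ids; i++)
		if (drv->ids[i] == dev->id)
			return 1;
	return 0;
}

/* Example probe callback that always accepts the device. */
static int toy_probe_ok(struct device *dev)
{
	(void)dev;
	return 0;
}

/*
 * Walk the bus's driver list for a newly added device. On a match, set the
 * device's driver field and call probe. Returns 1 if the device was bound.
 */
static int toy_bind_device(struct device *dev,
			   struct device_driver **drivers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_match(dev, drivers[i]))
			continue;
		dev->driver = drivers[i];
		if (drivers[i]->probe(dev) == 0)
			return 1;
		dev->driver = NULL; /* assumption: undo binding if probe fails */
	}
	return 0;
}
```

A device whose ID appears in a driver's table ends up with its driver field pointing at that driver; any other device stays unbound.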
|
||||
|
||||
Device Class
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Upon the successful completion of probe, the device is registered with
|
||||
the class to which it belongs. Device drivers belong to one and only one
|
||||
class, and that is set in the driver's devclass field.
|
||||
devclass_add_device is called to enumerate the device within the class
|
||||
and actually register it with the class, which happens with the
|
||||
class's register_dev callback.
|
||||
|
||||
|
||||
Driver
|
||||
~~~~~~
|
||||
|
||||
When a driver is attached to a device, the device is inserted into the
|
||||
driver's list of devices.
|
||||
|
||||
|
||||
sysfs
|
||||
~~~~~
|
||||
|
||||
A symlink is created in the bus's 'devices' directory that points to
|
||||
the device's directory in the physical hierarchy.
|
||||
|
||||
A symlink is created in the driver's 'devices' directory that points
|
||||
to the device's directory in the physical hierarchy.
|
||||
|
||||
A directory for the device is created in the class's directory. A
|
||||
symlink is created in that directory that points to the device's
|
||||
physical location in the sysfs tree.
|
||||
|
||||
A symlink can be created (though this isn't done yet) in the device's
|
||||
physical directory to either its class directory, or the class's
|
||||
top-level directory. One can also be created to point to its driver's
|
||||
directory.
|
||||
|
||||
|
||||
driver_register
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The process is almost identical when a new driver is added.
|
||||
The bus's list of devices is iterated over to find a match. Devices
|
||||
that already have a driver are skipped. All the devices are iterated
|
||||
over, to bind as many devices as possible to the driver.
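The driver-side pass can be sketched the same way. This is a userspace illustration under simplifying assumptions: the toy_* names are hypothetical, and a plain integer comparison stands in for the bus's match callback.

```c
#include <stddef.h>

struct toy_driver {
	int id;                    /* the single device ID this driver supports */
};

struct toy_device {
	int id;
	struct toy_driver *driver; /* NULL while unbound */
};

/*
 * Iterate over all devices on the bus and bind every matching, still
 * unbound device to the new driver. Returns the number of devices bound.
 */
static size_t toy_driver_attach(struct toy_driver *drv,
				struct toy_device *devs, size_t n)
{
	size_t bound = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].driver)         /* already has a driver: skip */
			continue;
		if (devs[i].id != drv->id)  /* stand-in for the match callback */
			continue;
		devs[i].driver = drv;
		bound++;
	}
	return bound;
}
```

Note that the loop does not stop at the first match, so one driver can pick up several devices in a single pass.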
|
||||
|
||||
|
||||
Removal
|
||||
~~~~~~~
|
||||
|
||||
When a device is removed, the reference count for it will eventually
|
||||
go to 0. When it does, the remove callback of the driver is called. It
|
||||
is removed from the driver's list of devices and the reference count
|
||||
of the driver is decremented. All symlinks between the two are removed.
|
||||
|
||||
When a driver is removed, the list of devices that it supports is
|
||||
iterated over, and the driver's remove callback is called for each
|
||||
one. The device is removed from that list and the symlinks removed.
|
146
Documentation/driver-api/driver-model/bus.rst
Normal file
@@ -0,0 +1,146 @@
|
||||
=========
|
||||
Bus Types
|
||||
=========
|
||||
|
||||
Definition
|
||||
~~~~~~~~~~
|
||||
See the kerneldoc for the struct bus_type.
|
||||
|
||||
int bus_register(struct bus_type * bus);
|
||||
|
||||
|
||||
Declaration
|
||||
~~~~~~~~~~~
|
||||
|
||||
Each bus type in the kernel (PCI, USB, etc.) should declare one static
object of this type. The declaration must initialize the name field, and
may optionally initialize the match callback::
|
||||
|
||||
struct bus_type pci_bus_type = {
|
||||
.name = "pci",
|
||||
.match = pci_bus_match,
|
||||
};
|
||||
|
||||
The structure should be exported to drivers in a header file::
|
||||
|
||||
extern struct bus_type pci_bus_type;
|
||||
|
||||
|
||||
Registration
|
||||
~~~~~~~~~~~~
|
||||
|
||||
When a bus driver is initialized, it calls bus_register. This
|
||||
initializes the rest of the fields in the bus object and inserts it
|
||||
into a global list of bus types. Once the bus object is registered,
|
||||
the fields in it are usable by the bus driver.
|
||||
|
||||
|
||||
Callbacks
|
||||
~~~~~~~~~
|
||||
|
||||
match(): Attaching Drivers to Devices
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The format of device ID structures and the semantics for comparing
|
||||
them are inherently bus-specific. Drivers typically declare an array
|
||||
of device IDs of devices they support that reside in a bus-specific
|
||||
driver structure.
|
||||
|
||||
The purpose of the match callback is to give the bus an opportunity to
|
||||
determine if a particular driver supports a particular device by
|
||||
comparing the device IDs the driver supports with the device ID of a
|
||||
particular device, without sacrificing bus-specific functionality or
|
||||
type-safety.
|
||||
|
||||
When a driver is registered with the bus, the bus's list of devices is
|
||||
iterated over, and the match callback is called for each device that
|
||||
does not have a driver associated with it.
|
||||
|
||||
|
||||
|
||||
Device and Driver Lists
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The lists of devices and drivers are intended to replace the local
|
||||
lists that many buses keep. They are lists of struct devices and
|
||||
struct device_drivers, respectively. Bus drivers are free to use the
|
||||
lists as they please, but conversion to the bus-specific type may be
|
||||
necessary.
|
||||
|
||||
The LDM core provides helper functions for iterating over each list::
|
||||
|
||||
int bus_for_each_dev(struct bus_type * bus, struct device * start,
|
||||
void * data,
|
||||
int (*fn)(struct device *, void *));
|
||||
|
||||
int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
|
||||
void * data, int (*fn)(struct device_driver *, void *));
|
||||
|
||||
These helpers iterate over the respective list, and call the callback
|
||||
for each device or driver in the list. All list accesses are
|
||||
synchronized by taking the bus's lock (currently a read lock). The reference
|
||||
count on each object in the list is incremented before the callback is
|
||||
called; it is decremented after the next object has been obtained. The
|
||||
lock is not held when calling the callback.
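The calling convention of these helpers (an opaque data pointer plus a callback invoked once per object) can be illustrated with a userspace model. The toy_* names are hypothetical, and locking and reference counting are left out. Stopping at the first nonzero callback return mirrors what the kernel helpers do.

```c
#include <stddef.h>

struct toy_device {
	int id;
};

/*
 * Call fn for every device in the array. If fn returns nonzero, stop
 * iterating and return that value; otherwise return 0.
 */
static int toy_for_each_dev(struct toy_device *devs, size_t n, void *data,
			    int (*fn)(struct toy_device *, void *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&devs[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the devices through the opaque data pointer. */
static int toy_count_dev(struct toy_device *dev, void *data)
{
	(void)dev;
	(*(int *)data)++;
	return 0;
}
```

The data pointer lets the callback accumulate a result without any global state.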
|
||||
|
||||
|
||||
sysfs
|
||||
~~~~~~~~
|
||||
There is a top-level directory named 'bus'.
|
||||
|
||||
Each bus gets a directory in the bus directory, along with two default
|
||||
directories::
|
||||
|
||||
/sys/bus/pci/
|
||||
|-- devices
|
||||
`-- drivers
|
||||
|
||||
Drivers registered with the bus get a directory in the bus's drivers
|
||||
directory::
|
||||
|
||||
/sys/bus/pci/
|
||||
|-- devices
|
||||
`-- drivers
|
||||
|-- Intel ICH
|
||||
|-- Intel ICH Joystick
|
||||
|-- agpgart
|
||||
`-- e100
|
||||
|
||||
Each device that is discovered on a bus of that type gets a symlink in
|
||||
the bus's devices directory to the device's directory in the physical
|
||||
hierarchy::
|
||||
|
||||
/sys/bus/pci/
|
||||
|-- devices
|
||||
| |-- 00:00.0 -> ../../../root/pci0/00:00.0
|
||||
| |-- 00:01.0 -> ../../../root/pci0/00:01.0
|
||||
| `-- 00:02.0 -> ../../../root/pci0/00:02.0
|
||||
`-- drivers
|
||||
|
||||
|
||||
Exporting Attributes
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
struct bus_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct bus_type *, char * buf);
|
||||
ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
|
||||
};
|
||||
|
||||
Bus drivers can export attributes using the BUS_ATTR_RW macro that works
|
||||
similarly to the DEVICE_ATTR_RW macro for devices. For example, a
|
||||
definition like this::
|
||||
|
||||
static BUS_ATTR_RW(debug);
|
||||
|
||||
is equivalent to declaring::
|
||||
|
||||
    static struct bus_attribute bus_attr_debug;
|
||||
|
||||
This can then be used to add and remove the attribute from the bus's
|
||||
sysfs directory using::
|
||||
|
||||
int bus_create_file(struct bus_type *, struct bus_attribute *);
|
||||
void bus_remove_file(struct bus_type *, struct bus_attribute *);
|
149
Documentation/driver-api/driver-model/class.rst
Normal file
@@ -0,0 +1,149 @@
|
||||
==============
|
||||
Device Classes
|
||||
==============
|
||||
|
||||
Introduction
|
||||
~~~~~~~~~~~~
|
||||
A device class describes a type of device, like an audio or network
|
||||
device. The following device classes have been identified:
|
||||
|
||||
<Insert List of Device Classes Here>
|
||||
|
||||
|
||||
Each device class defines a set of semantics and a programming interface
|
||||
that devices of that class adhere to. Device drivers are the
|
||||
implementation of that programming interface for a particular device on
|
||||
a particular bus.
|
||||
|
||||
Device classes are agnostic with respect to what bus a device resides
|
||||
on.
|
||||
|
||||
|
||||
Programming Interface
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
The device class structure looks like::
|
||||
|
||||
|
||||
typedef int (*devclass_add)(struct device *);
|
||||
typedef void (*devclass_remove)(struct device *);
|
||||
|
||||
See the kerneldoc for the struct class.
|
||||
|
||||
A typical device class definition would look like::
|
||||
|
||||
struct device_class input_devclass = {
|
||||
.name = "input",
|
||||
.add_device = input_add_device,
|
||||
.remove_device = input_remove_device,
|
||||
};
|
||||
|
||||
Each device class structure should be exported in a header file so it
|
||||
can be used by drivers, extensions and interfaces.
|
||||
|
||||
Device classes are registered and unregistered with the core using::
|
||||
|
||||
int devclass_register(struct device_class * cls);
|
||||
void devclass_unregister(struct device_class * cls);
|
||||
|
||||
|
||||
Devices
|
||||
~~~~~~~
|
||||
As devices are bound to drivers, they are added to the device class
|
||||
that the driver belongs to. Before the driver model core, this would
|
||||
typically happen during the driver's probe() callback, once the device
|
||||
has been initialized. It now happens after the probe() callback
|
||||
finishes from the core.
|
||||
|
||||
The device is enumerated in the class. Each time a device is added to
|
||||
the class, the class's devnum field is incremented and assigned to the
|
||||
device. The field is never decremented, so if the device is removed
|
||||
from the class and re-added, it will receive a different enumerated
|
||||
value.
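The enumeration rule can be shown with a small userspace model. The toy_* names are hypothetical, and whether the counter is read before or after the increment is an assumption of this sketch.

```c
struct toy_class {
	unsigned int devnum;    /* next enumeration value; never decremented */
};

struct toy_device {
	unsigned int class_num; /* value assigned when added to the class */
	int in_class;
};

/* Assign the class's current devnum to the device, then advance it. */
static void toy_class_add(struct toy_class *cls, struct toy_device *dev)
{
	dev->class_num = cls->devnum++;
	dev->in_class = 1;
}

/* Removal leaves devnum untouched, so values are never reused. */
static void toy_class_remove(struct toy_class *cls, struct toy_device *dev)
{
	(void)cls;
	dev->in_class = 0;
}
```

Adding two devices, removing the first and adding it again yields the values 0, 1 and then 2 for the re-added device.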
|
||||
|
||||
The class is allowed to create a class-specific structure for the
|
||||
device and store it in the device's class_data pointer.
|
||||
|
||||
There is no list of devices in the device class. Each driver has a
|
||||
list of devices that it supports. The device class has a list of
|
||||
drivers of that particular class. To access all of the devices in the
|
||||
class, iterate over the device lists of each driver in the class.
|
||||
|
||||
|
||||
Device Drivers
|
||||
~~~~~~~~~~~~~~
|
||||
Device drivers are added to device classes when they are registered
|
||||
with the core. A driver specifies the class it belongs to by setting
|
||||
the struct device_driver::devclass field.
|
||||
|
||||
|
||||
sysfs directory structure
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
There is a top-level sysfs directory named 'class'.
|
||||
|
||||
Each class gets a directory in the class directory, along with two
|
||||
default subdirectories::
|
||||
|
||||
class/
|
||||
`-- input
|
||||
|-- devices
|
||||
`-- drivers
|
||||
|
||||
|
||||
Drivers registered with the class get a symlink in the drivers/ directory
|
||||
that points to the driver's directory (under its bus directory)::
|
||||
|
||||
class/
|
||||
`-- input
|
||||
|-- devices
|
||||
`-- drivers
|
||||
`-- usb:usb_mouse -> ../../../bus/drivers/usb_mouse/
|
||||
|
||||
|
||||
Each device gets a symlink in the devices/ directory that points to the
|
||||
device's directory in the physical hierarchy::
|
||||
|
||||
class/
|
||||
`-- input
|
||||
|-- devices
|
||||
| `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
|
||||
`-- drivers
|
||||
|
||||
|
||||
Exporting Attributes
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
struct devclass_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device_class *, char * buf, size_t count, loff_t off);
|
||||
ssize_t (*store)(struct device_class *, const char * buf, size_t count, loff_t off);
|
||||
};
|
||||
|
||||
Class drivers can export attributes using the DEVCLASS_ATTR macro that works
|
||||
similarly to the DEVICE_ATTR macro for devices. For example, a definition
|
||||
like this::
|
||||
|
||||
static DEVCLASS_ATTR(debug,0644,show_debug,store_debug);
|
||||
|
||||
is equivalent to declaring::
|
||||
|
||||
    static struct devclass_attribute devclass_attr_debug;
|
||||
|
||||
The bus driver can add and remove the attribute from the class's
|
||||
sysfs directory using::
|
||||
|
||||
int devclass_create_file(struct device_class *, struct devclass_attribute *);
|
||||
void devclass_remove_file(struct device_class *, struct devclass_attribute *);
|
||||
|
||||
In the example above, the file will be named 'debug' and placed in the
class's directory in sysfs.
|
||||
|
||||
|
||||
Interfaces
|
||||
~~~~~~~~~~
|
||||
There may exist multiple mechanisms for accessing the same device of a
|
||||
particular class type. Device interfaces describe these mechanisms.
|
||||
|
||||
When a device is added to a device class, the core attempts to add it
|
||||
to every interface that is registered with the device class.
|
116
Documentation/driver-api/driver-model/design-patterns.rst
Normal file
@@ -0,0 +1,116 @@
|
||||
=============================
|
||||
Device Driver Design Patterns
|
||||
=============================
|
||||
|
||||
This document describes a few common design patterns found in device drivers.
|
||||
It is likely that subsystem maintainers will ask driver developers to
|
||||
conform to these design patterns.
|
||||
|
||||
1. State Container
|
||||
2. container_of()
|
||||
|
||||
|
||||
1. State Container
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
While the kernel contains a few device drivers that assume that they will
only be probed once on a certain system (singletons), it is customary to
assume that the device the driver binds to will appear in several instances.
This means that the probe() function and all callbacks need to be reentrant.
|
||||
|
||||
The most common way to achieve this is to use the state container design
|
||||
pattern. It usually has this form::
|
||||
|
||||
struct foo {
|
||||
spinlock_t lock; /* Example member */
|
||||
(...)
|
||||
};
|
||||
|
||||
static int foo_probe(...)
|
||||
{
|
||||
struct foo *foo;
|
||||
|
||||
foo = devm_kzalloc(dev, sizeof(*foo), GFP_KERNEL);
|
||||
if (!foo)
|
||||
return -ENOMEM;
|
||||
spin_lock_init(&foo->lock);
|
||||
(...)
|
||||
}
|
||||
|
||||
This will create an instance of struct foo in memory every time probe() is
|
||||
called. This is our state container for this instance of the device driver.
|
||||
Of course it is then necessary to always pass this instance of the
|
||||
state around to all functions that need access to the state and its members.
|
||||
|
||||
For example, if the driver is registering an interrupt handler, you would
|
||||
pass around a pointer to struct foo like this::
|
||||
|
||||
static irqreturn_t foo_handler(int irq, void *arg)
|
||||
{
|
||||
struct foo *foo = arg;
|
||||
(...)
|
||||
}
|
||||
|
||||
static int foo_probe(...)
|
||||
{
|
||||
struct foo *foo;
|
||||
|
||||
(...)
|
||||
ret = request_irq(irq, foo_handler, 0, "foo", foo);
|
||||
}
|
||||
|
||||
This way you always get a pointer back to the correct instance of foo in
|
||||
your interrupt handler.
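The effect of passing the state container as the handler argument can be tried out in userspace. In this sketch toy_request_irq() and toy_fire() are hypothetical stand-ins for interrupt registration and delivery. They are not kernel APIs.

```c
/* Per-instance state container. */
struct foo {
	int irq_count;
};

typedef void (*toy_handler_t)(int irq, void *arg);

/* Hypothetical stand-in for one registered interrupt line. */
struct toy_irq_slot {
	toy_handler_t handler;
	void *arg; /* opaque pointer handed back to the handler */
};

static void toy_request_irq(struct toy_irq_slot *slot, toy_handler_t handler,
			    void *arg)
{
	slot->handler = handler;
	slot->arg = arg;
}

/* Simulate the interrupt firing. */
static void toy_fire(struct toy_irq_slot *slot, int irq)
{
	slot->handler(irq, slot->arg);
}

/* The handler recovers its own instance from the opaque argument. */
static void foo_handler(int irq, void *arg)
{
	struct foo *foo = arg;

	(void)irq;
	foo->irq_count++;
}
```

Two instances registered with the same handler keep separate counters, because each registration carries its own struct foo pointer.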
|
||||
|
||||
|
||||
2. container_of()
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Continuing the above example, we add offloaded work::
|
||||
|
||||
    struct foo {
        spinlock_t lock;
        struct workqueue_struct *wq;
        struct work_struct offload;
        (...)
    };

    static void foo_work(struct work_struct *work)
    {
        struct foo *foo = container_of(work, struct foo, offload);

        (...)
    }

    static irqreturn_t foo_handler(int irq, void *arg)
    {
        struct foo *foo = arg;

        queue_work(foo->wq, &foo->offload);
        (...)
    }

    static int foo_probe(...)
    {
        struct foo *foo;

        /* Allocate the state container as in the previous example */
        foo = devm_kzalloc(dev, sizeof(*foo), GFP_KERNEL);
        if (!foo)
            return -ENOMEM;

        foo->wq = create_singlethread_workqueue("foo-wq");
        INIT_WORK(&foo->offload, foo_work);
        (...)
    }
|
||||
|
||||
The design pattern is the same for an hrtimer or something similar that will
|
||||
return a single argument which is a pointer to a struct member in the
|
||||
callback.
|
||||
|
||||
container_of() is a macro defined in <linux/kernel.h>.

Given a pointer to a member, container_of() obtains a pointer to the
containing struct by a simple subtraction using the offsetof() macro from
standard C, which allows something similar to object-oriented behaviour.
Notice that the contained member must not be a pointer, but an actual member,
for this to work.
|
||||
|
||||
We can see here that we avoid having global pointers to our struct foo *
|
||||
instance this way, while still keeping the number of parameters passed to the
|
||||
work function to a single pointer.
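The pointer arithmetic behind container_of() can be reproduced in userspace. The macro below is a minimal sketch without the type checking of the kernel version. toy_container_of, struct toy_work and foo_id_from_work() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Subtract the member's offset from the member pointer. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct work_struct. */
struct toy_work {
	int pending;
};

struct foo {
	int id;
	struct toy_work offload; /* embedded member, not a pointer */
};

/* A callback that only receives the member can still reach its container. */
static int foo_id_from_work(struct toy_work *work)
{
	struct foo *foo = toy_container_of(work, struct foo, offload);

	return foo->id;
}
```

If offload were declared as a pointer, the address passed to the callback would lie outside struct foo and the subtraction would produce a bogus pointer.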
|
109
Documentation/driver-api/driver-model/device.rst
Normal file
@@ -0,0 +1,109 @@
|
||||
==========================
|
||||
The Basic Device Structure
|
||||
==========================
|
||||
|
||||
See the kerneldoc for the struct device.
|
||||
|
||||
|
||||
Programming Interface
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
The bus driver that discovers the device uses this to register the
|
||||
device with the core::
|
||||
|
||||
int device_register(struct device * dev);
|
||||
|
||||
The bus should initialize the following fields:
|
||||
|
||||
- parent
|
||||
- name
|
||||
- bus_id
|
||||
- bus
|
||||
|
||||
A device is removed from the core when its reference count goes to 0.
The reference count can be adjusted using::
|
||||
|
||||
struct device * get_device(struct device * dev);
|
||||
void put_device(struct device * dev);
|
||||
|
||||
get_device() will return a pointer to the struct device passed to it
if the reference count is not already 0 (that is, if the device is not
already in the process of being removed).
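The rule described above can be modelled in a few lines of userspace C. This sketch follows the text of this document rather than the kernel source. The toy_* names are hypothetical, and the counter is a plain int, so the model is not safe for concurrent use.

```c
#include <stddef.h>

struct toy_device {
	int refcount;
	int released; /* set once the last reference is dropped */
};

/* Take a reference; fail if the device is already on its way out. */
static struct toy_device *toy_get_device(struct toy_device *dev)
{
	if (!dev || dev->refcount == 0)
		return NULL;
	dev->refcount++;
	return dev;
}

/* Drop a reference; the device is released when the count reaches 0. */
static void toy_put_device(struct toy_device *dev)
{
	if (dev && dev->refcount > 0 && --dev->refcount == 0)
		dev->released = 1;
}
```

Every successful toy_get_device() must be paired with one toy_put_device(), and the pointer must not be used after the final put.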
|
||||
|
||||
A driver can access the lock in the device structure using::
|
||||
|
||||
void lock_device(struct device * dev);
|
||||
void unlock_device(struct device * dev);
|
||||
|
||||
|
||||
Attributes
|
||||
~~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
struct device_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||
char *buf);
|
||||
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||
const char *buf, size_t count);
|
||||
};
|
||||
|
||||
Attributes of devices can be exported by a device driver through sysfs.
|
||||
|
||||
Please see Documentation/filesystems/sysfs.txt for more information
|
||||
on how sysfs works.
|
||||
|
||||
As explained in Documentation/kobject.txt, device attributes must be
|
||||
created before the KOBJ_ADD uevent is generated. The only way to realize
|
||||
that is by defining an attribute group.
|
||||
|
||||
Attributes are declared using a macro called DEVICE_ATTR::
|
||||
|
||||
#define DEVICE_ATTR(name,mode,show,store)
|
||||
|
||||
Example::
|
||||
|
||||
static DEVICE_ATTR(type, 0444, show_type, NULL);
|
||||
static DEVICE_ATTR(power, 0644, show_power, store_power);
|
||||
|
||||
This declares two structures of type struct device_attribute with respective
|
||||
names 'dev_attr_type' and 'dev_attr_power'. These two attributes can be
|
||||
organized as follows into a group::
|
||||
|
||||
static struct attribute *dev_attrs[] = {
|
||||
&dev_attr_type.attr,
|
||||
&dev_attr_power.attr,
|
||||
NULL,
|
||||
};
|
||||
|
||||
static struct attribute_group dev_attr_group = {
|
||||
.attrs = dev_attrs,
|
||||
};
|
||||
|
||||
static const struct attribute_group *dev_attr_groups[] = {
|
||||
&dev_attr_group,
|
||||
NULL,
|
||||
};
|
||||
|
||||
This array of groups can then be associated with a device by setting the
|
||||
group pointer in struct device before device_register() is invoked::
|
||||
|
||||
dev->groups = dev_attr_groups;
|
||||
device_register(dev);
|
||||
|
||||
The device_register() function will use the 'groups' pointer to create the
|
||||
device attributes and the device_unregister() function will use this pointer
|
||||
to remove the device attributes.
|
||||
|
||||
Word of warning: While the kernel allows device_create_file() and
|
||||
device_remove_file() to be called on a device at any time, userspace has
|
||||
strict expectations on when attributes get created. When a new device is
|
||||
registered in the kernel, a uevent is generated to notify userspace (like
|
||||
udev) that a new device is available. If attributes are added after the
|
||||
device is registered, then userspace won't get notified and userspace will
|
||||
not know about the new attributes.
|
||||
|
||||
This is important for device drivers that need to publish additional
|
||||
attributes for a device at driver probe time. If the device driver simply
|
||||
calls device_create_file() on the device structure passed to it, then
|
||||
userspace will never be notified of the new attributes.
|
414
Documentation/driver-api/driver-model/devres.rst
Normal file
@@ -0,0 +1,414 @@
|
||||
================================
|
||||
Devres - Managed Device Resource
|
||||
================================
|
||||
|
||||
Tejun Heo <teheo@suse.de>
|
||||
|
||||
First draft 10 January 2007
|
||||
|
||||
.. contents
|
||||
|
||||
1. Intro : Huh? Devres?
|
||||
2. Devres : Devres in a nutshell
|
||||
3. Devres Group : Group devres'es and release them together
|
||||
4. Details : Life time rules, calling context, ...
|
||||
5. Overhead : How much do we have to pay for this?
|
||||
6. List of managed interfaces: Currently implemented managed interfaces
|
||||
|
||||
|
||||
1. Intro
|
||||
--------
|
||||
|
||||
devres came up while trying to convert libata to use iomap. Each
|
||||
iomapped address should be kept and unmapped on driver detach. For
|
||||
example, a plain SFF ATA controller (that is, good old PCI IDE) in
|
||||
native mode makes use of 5 PCI BARs and all of them should be
|
||||
maintained.
|
||||
|
||||
As with many other device drivers, libata low level drivers have
sufficient bugs in ->remove and ->probe failure paths. Well, yes,
that's probably because libata low level driver developers are a lazy
bunch, but aren't all low level driver developers? After spending a
day fiddling with braindamaged hardware with no documentation or
braindamaged documentation, if it's finally working, well, it's working.
|
||||
|
||||
For one reason or another, low level drivers don't receive as much
|
||||
attention or testing as core code, and bugs on driver detach or
|
||||
initialization failure don't happen often enough to be noticeable.
|
||||
The init failure path is worse because it's much less travelled while
needing to handle multiple entry points.
|
||||
|
||||
So, many low level drivers end up leaking resources on driver detach
|
||||
and having half broken failure path implementation in ->probe() which
|
||||
would leak resources or even cause oops when failure occurs. iomap
|
||||
adds more to this mix. So do msi and msix.
|
||||
|
||||
|
||||
2. Devres
|
||||
---------
|
||||
|
||||
devres is basically a linked list of arbitrarily sized memory areas
|
||||
associated with a struct device. Each devres entry is associated with
|
||||
a release function. A devres can be released in several ways. No
|
||||
matter what, all devres entries are released on driver detach. On
|
||||
release, the associated release function is invoked and then the
|
||||
devres entry is freed.
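The idea can be modelled in userspace with a singly linked list of entries, each carrying a release function. This is a sketch under simplifying assumptions, not the kernel implementation. The toy_* names are hypothetical, each entry stores a data pointer instead of an inline memory area, and releasing the newest entry first is a choice of this sketch.

```c
#include <stdlib.h>

struct toy_devres {
	struct toy_devres *next;
	void (*release)(void *data); /* called when the entry is released */
	void *data;
};

struct toy_device {
	struct toy_devres *resources; /* newest entry first */
};

/* Record a resource with its release function. Returns 0, or -1 on OOM. */
static int toy_devres_add(struct toy_device *dev, void (*release)(void *),
			  void *data)
{
	struct toy_devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return -1;
	dr->release = release;
	dr->data = data;
	dr->next = dev->resources;
	dev->resources = dr;
	return 0;
}

/* "Driver detach": invoke each release function, then free the entry. */
static void toy_devres_release_all(struct toy_device *dev)
{
	while (dev->resources) {
		struct toy_devres *dr = dev->resources;

		dev->resources = dr->next;
		dr->release(dr->data);
		free(dr);
	}
}

/* Example release function: count how often the resource was released. */
static void toy_mark_released(void *data)
{
	(*(int *)data)++;
}
```

Because everything hangs off the device, a single call on detach or on probe failure is enough to undo all acquisitions.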
|
||||
|
||||
Managed interfaces are created for resources commonly used by device
drivers using devres. For example, coherent DMA memory is acquired
using dma_alloc_coherent(). The managed version is called
dmam_alloc_coherent(). It is identical to dma_alloc_coherent() except
that the DMA memory allocated using it is managed and will be
automatically released on driver detach. The implementation looks like
the following::
|
||||
|
||||
struct dma_devres {
|
||||
size_t size;
|
||||
void *vaddr;
|
||||
dma_addr_t dma_handle;
|
||||
};
|
||||
|
||||
static void dmam_coherent_release(struct device *dev, void *res)
|
||||
{
|
||||
struct dma_devres *this = res;
|
||||
|
||||
dma_free_coherent(dev, this->size, this->vaddr, this->dma_handle);
|
||||
}
|
||||
|
||||
dmam_alloc_coherent(dev, size, dma_handle, gfp)
|
||||
{
|
||||
struct dma_devres *dr;
|
||||
void *vaddr;
|
||||
|
||||
dr = devres_alloc(dmam_coherent_release, sizeof(*dr), gfp);
|
||||
...
|
||||
|
||||
/* alloc DMA memory as usual */
|
||||
vaddr = dma_alloc_coherent(...);
|
||||
...
|
||||
|
||||
/* record size, vaddr, dma_handle in dr */
|
||||
dr->vaddr = vaddr;
|
||||
...
|
||||
|
||||
devres_add(dev, dr);
|
||||
|
||||
return vaddr;
|
||||
}
|
||||
|
||||
If a driver uses dmam_alloc_coherent(), the area is guaranteed to be
|
||||
freed whether initialization fails half-way or the device gets
|
||||
detached. If most resources are acquired using managed interface, a
|
||||
driver can have much simpler init and exit code. Init path basically
|
||||
looks like the following::
|
||||
|
||||
my_init_one()
|
||||
{
|
||||
struct mydev *d;
|
||||
|
||||
d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL);
|
||||
if (!d)
|
||||
return -ENOMEM;
|
||||
|
||||
d->ring = dmam_alloc_coherent(...);
|
||||
if (!d->ring)
|
||||
return -ENOMEM;
|
||||
|
||||
if (check something)
|
||||
return -EINVAL;
|
||||
...
|
||||
|
||||
return register_to_upper_layer(d);
|
||||
}
|
||||
|
||||
And exit path::
|
||||
|
||||
my_remove_one()
|
||||
{
|
||||
unregister_from_upper_layer(d);
|
||||
shutdown_my_hardware();
|
||||
}
|
||||
|
||||
As shown above, low level drivers can be simplified a lot by using
|
||||
devres. Complexity is shifted from less maintained low level drivers
|
||||
to better maintained higher layer. Also, as init failure path is
|
||||
shared with exit path, both can get more testing.
|
||||
|
||||
Note though that when converting current calls or assignments to
managed devm_* versions it is up to you to check whether internal operations,
like allocating memory, have failed. Managed resources pertain to the
freeing of these resources *only* - all other checks needed are still
on you. In some cases this may mean introducing checks that were not
necessary before moving to the managed devm_* calls.
|
||||
|
||||
|
||||
3. Devres group
|
||||
---------------
|
||||
|
||||
Devres entries can be grouped using a devres group. When a group is
released, all contained normal devres entries and properly nested
groups are released. One usage is to roll back a series of acquired
resources on failure. For example::
|
||||
|
||||
if (!devres_open_group(dev, NULL, GFP_KERNEL))
|
||||
return -ENOMEM;
|
||||
|
||||
acquire A;
|
||||
if (failed)
|
||||
goto err;
|
||||
|
||||
acquire B;
|
||||
if (failed)
|
||||
goto err;
|
||||
...
|
||||
|
||||
devres_remove_group(dev, NULL);
|
||||
return 0;
|
||||
|
||||
err:
|
||||
devres_release_group(dev, NULL);
|
||||
return err_code;
|
||||
|
||||
As resource acquisition failure usually means probe failure, constructs
like the above are usually useful in a midlayer driver (e.g. libata core
layer) where an interface function shouldn't have side effects on failure.
For LLDs, just returning an error code suffices in most cases.
|
||||
|
||||
Each group is identified by `void *id`. It can either be explicitly
|
||||
specified by @id argument to devres_open_group() or automatically
|
||||
created by passing NULL as @id as in the above example. In both
|
||||
cases, devres_open_group() returns the group's id. The returned id
|
||||
can be passed to other devres functions to select the target group.
|
||||
If NULL is given to those functions, the latest open group is
|
||||
selected.
|
||||
|
||||
For example, you can do something like the following::
|
||||
|
||||
int my_midlayer_create_something()
|
||||
{
|
||||
if (!devres_open_group(dev, my_midlayer_create_something, GFP_KERNEL))
|
||||
return -ENOMEM;
|
||||
|
||||
...
|
||||
|
||||
devres_close_group(dev, my_midlayer_create_something);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void my_midlayer_destroy_something()
|
||||
{
|
||||
devres_release_group(dev, my_midlayer_create_something);
|
||||
}
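Group release can be modelled in userspace as a stack with marker entries. This sketch covers only properly nested groups and is not the kernel implementation. The toy_* names are hypothetical, and an unknown id releases everything in this model.

```c
#include <stdlib.h>

struct toy_node {
	struct toy_node *next;
	void (*release)(void *data); /* NULL marks a group-open marker */
	void *data;                  /* resource, or group id for a marker */
};

struct toy_device {
	struct toy_node *head; /* newest entry first */
};

/* Push a resource entry (release must be non-NULL). Returns 0 or -1. */
static int toy_push(struct toy_device *dev, void (*release)(void *),
		    void *data)
{
	struct toy_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->release = release;
	n->data = data;
	n->next = dev->head;
	dev->head = n;
	return 0;
}

/* Open a group by pushing a marker that carries the group id. */
static int toy_open_group(struct toy_device *dev, void *id)
{
	return toy_push(dev, NULL, id);
}

/* Release everything acquired since the group with this id was opened. */
static void toy_release_group(struct toy_device *dev, void *id)
{
	while (dev->head) {
		struct toy_node *n = dev->head;
		int is_target = !n->release && n->data == id;

		dev->head = n->next;
		if (n->release)
			n->release(n->data);
		free(n);
		if (is_target)
			break;
	}
}

/* Example release function: flag the resource as released. */
static void toy_mark_released(void *data)
{
	*(int *)data = 1;
}
```

Entries pushed before the group was opened survive the release, which is what makes the construct usable for rolling back a partial acquisition.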
|
||||
|
||||
|
||||
4. Details
|
||||
----------
|
||||
|
||||
Lifetime of a devres entry begins on devres allocation and finishes
|
||||
when it is released or destroyed (removed and freed) - no reference
|
||||
counting.
|
||||
|
||||
devres core guarantees atomicity to all basic devres operations and
|
||||
has support for single-instance devres types (atomic
|
||||
lookup-and-add-if-not-found). Other than that, synchronizing
|
||||
concurrent accesses to allocated devres data is the caller's
responsibility. This is usually a non-issue because bus ops and
resource allocations already do the job.
|
||||
|
||||
For an example of single-instance devres type, read pcim_iomap_table()
|
||||
in lib/devres.c.
|
||||
|
||||
All devres interface functions can be called without context if the
|
||||
right gfp mask is given.
|
||||
|
||||
|
||||
5. Overhead
|
||||
-----------
|
||||
|
||||
The bookkeeping info of each devres entry is allocated together with the
requested data area. With the debug option turned off, the bookkeeping info
occupies 16 bytes on 32bit machines and 24 bytes on 64bit (three pointers
rounded up to ull alignment). If a singly linked list is used, it can be
reduced to two pointers (8 bytes on 32bit, 16 bytes on 64bit).
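The figures above follow from rounding the pointer storage up to the alignment of unsigned long long. The helper below reproduces the arithmetic, assuming that alignment is 8 bytes. The toy_* names are hypothetical.

```c
#include <stddef.h>

/* Round size up to the next multiple of align. */
static size_t toy_round_up(size_t size, size_t align)
{
	return (size + align - 1) / align * align;
}

/* Bookkeeping bytes for n_ptrs pointers of ptr_size bytes each. */
static size_t toy_bookkeeping_bytes(size_t ptr_size, size_t n_ptrs)
{
	return toy_round_up(ptr_size * n_ptrs, 8); /* assumed ull alignment */
}
```

With three pointers this gives 16 bytes for 4-byte pointers (12 rounded up) and 24 bytes for 8-byte pointers. With two pointers it gives 8 and 16 bytes.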
|
||||
|
||||
Each devres group occupies 8 pointers. It can be reduced to 6 if a
singly linked list is used.
|
||||
|
||||
Memory space overhead on an ahci controller with two ports is between 300
and 400 bytes on a 32bit machine after naive conversion (we can
certainly invest a bit more effort into libata core layer).
|
||||
|
||||
|
||||
6. List of managed interfaces
|
||||
-----------------------------
|
||||
|
||||
CLOCK
|
||||
devm_clk_get()
|
||||
devm_clk_get_optional()
|
||||
devm_clk_put()
|
||||
devm_clk_hw_register()
|
||||
devm_of_clk_add_hw_provider()
|
||||
devm_clk_hw_register_clkdev()
|
||||
|
||||
DMA
|
||||
dmaenginem_async_device_register()
|
||||
dmam_alloc_coherent()
|
||||
dmam_alloc_attrs()
|
||||
dmam_free_coherent()
|
||||
dmam_pool_create()
|
||||
dmam_pool_destroy()
|
||||
|
||||
DRM
|
||||
devm_drm_dev_init()
|
||||
|
||||
GPIO
|
||||
devm_gpiod_get()
|
||||
devm_gpiod_get_index()
|
||||
devm_gpiod_get_index_optional()
|
||||
devm_gpiod_get_optional()
|
||||
devm_gpiod_put()
|
||||
devm_gpiod_unhinge()
|
||||
devm_gpiochip_add_data()
|
||||
devm_gpio_request()
|
||||
devm_gpio_request_one()
|
||||
devm_gpio_free()
|
||||
|
||||
I2C
|
||||
devm_i2c_new_dummy_device()
|
||||
|
||||
IIO
|
||||
devm_iio_device_alloc()
|
||||
devm_iio_device_free()
|
||||
devm_iio_device_register()
|
||||
devm_iio_device_unregister()
|
||||
devm_iio_kfifo_allocate()
|
||||
devm_iio_kfifo_free()
|
||||
devm_iio_triggered_buffer_setup()
|
||||
devm_iio_triggered_buffer_cleanup()
|
||||
devm_iio_trigger_alloc()
|
||||
devm_iio_trigger_free()
|
||||
devm_iio_trigger_register()
|
||||
devm_iio_trigger_unregister()
|
||||
devm_iio_channel_get()
|
||||
devm_iio_channel_release()
|
||||
devm_iio_channel_get_all()
|
||||
devm_iio_channel_release_all()
|
||||
|
||||
INPUT
|
||||
devm_input_allocate_device()
|
||||
|
||||
IO region
|
||||
devm_release_mem_region()
|
||||
devm_release_region()
|
||||
devm_release_resource()
|
||||
devm_request_mem_region()
|
||||
devm_request_region()
|
||||
devm_request_resource()
|
||||
|
||||
IOMAP
|
||||
devm_ioport_map()
|
||||
devm_ioport_unmap()
|
||||
devm_ioremap()
|
||||
devm_ioremap_nocache()
|
||||
devm_ioremap_wc()
|
||||
devm_ioremap_resource() : checks resource, requests memory region, ioremaps
|
||||
devm_iounmap()
|
||||
pcim_iomap()
|
||||
pcim_iomap_regions() : do request_region() and iomap() on multiple BARs
|
||||
pcim_iomap_table() : array of mapped addresses indexed by BAR
|
||||
pcim_iounmap()
|
||||
|
||||
IRQ
|
||||
devm_free_irq()
|
||||
devm_request_any_context_irq()
|
||||
devm_request_irq()
|
||||
devm_request_threaded_irq()
|
||||
devm_irq_alloc_descs()
|
||||
devm_irq_alloc_desc()
|
||||
devm_irq_alloc_desc_at()
|
||||
devm_irq_alloc_desc_from()
|
||||
devm_irq_alloc_descs_from()
|
||||
devm_irq_alloc_generic_chip()
|
||||
devm_irq_setup_generic_chip()
|
||||
devm_irq_sim_init()
|
||||
|
||||
LED
|
||||
devm_led_classdev_register()
|
||||
devm_led_classdev_unregister()
|
||||
|
||||
MDIO
|
||||
devm_mdiobus_alloc()
|
||||
devm_mdiobus_alloc_size()
|
||||
devm_mdiobus_free()
|
||||
|
||||
MEM
|
||||
devm_free_pages()
|
||||
devm_get_free_pages()
|
||||
devm_kasprintf()
|
||||
devm_kcalloc()
|
||||
devm_kfree()
|
||||
devm_kmalloc()
|
||||
devm_kmalloc_array()
|
||||
devm_kmemdup()
|
||||
devm_kstrdup()
|
||||
devm_kvasprintf()
|
||||
devm_kzalloc()
|
||||
|
||||
MFD
|
||||
devm_mfd_add_devices()
|
||||
|
||||
MUX
|
||||
devm_mux_chip_alloc()
|
||||
devm_mux_chip_register()
|
||||
devm_mux_control_get()
|
||||
|
||||
PER-CPU MEM
|
||||
devm_alloc_percpu()
|
||||
devm_free_percpu()
|
||||
|
||||
PCI
|
||||
devm_pci_alloc_host_bridge() : managed PCI host bridge allocation
|
||||
devm_pci_remap_cfgspace() : ioremap PCI configuration space
|
||||
devm_pci_remap_cfg_resource() : ioremap PCI configuration space resource
|
||||
pcim_enable_device() : after success, all PCI ops become managed
|
||||
pcim_pin_device() : keep PCI device enabled after release
|
||||
|
||||
PHY
|
||||
devm_usb_get_phy()
|
||||
devm_usb_put_phy()
|
||||
|
||||
PINCTRL
|
||||
devm_pinctrl_get()
|
||||
devm_pinctrl_put()
|
||||
devm_pinctrl_register()
|
||||
devm_pinctrl_unregister()
|
||||
|
||||
POWER
|
||||
devm_reboot_mode_register()
|
||||
devm_reboot_mode_unregister()
|
||||
|
||||
PWM
|
||||
devm_pwm_get()
|
||||
devm_pwm_put()
|
||||
|
||||
REGULATOR
|
||||
devm_regulator_bulk_get()
|
||||
devm_regulator_get()
|
||||
devm_regulator_put()
|
||||
devm_regulator_register()
|
||||
|
||||
RESET
|
||||
devm_reset_control_get()
|
||||
devm_reset_controller_register()
|
||||
|
||||
SERDEV
|
||||
devm_serdev_device_open()
|
||||
|
||||
SLAVE DMA ENGINE
|
||||
devm_acpi_dma_controller_register()
|
||||
|
||||
SPI
|
||||
devm_spi_register_master()
|
||||
|
||||
WATCHDOG
|
||||
devm_watchdog_register_device()
|
223
Documentation/driver-api/driver-model/driver.rst
Normal file
@@ -0,0 +1,223 @@
|
||||
==============
|
||||
Device Drivers
|
||||
==============
|
||||
|
||||
See the kerneldoc for the struct device_driver.
|
||||
|
||||
|
||||
Allocation
|
||||
~~~~~~~~~~
|
||||
|
||||
Device drivers are statically allocated structures. Though there may
|
||||
be multiple devices in a system that a driver supports, struct
|
||||
device_driver represents the driver as a whole (not a particular
|
||||
device instance).
|
||||
|
||||
Initialization
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The driver must initialize at least the name and bus fields. It should
|
||||
also initialize the devclass field (when it arrives), so it may obtain
|
||||
the proper linkage internally. It should also initialize as many of
|
||||
the callbacks as possible, though each is optional.
|
||||
|
||||
Declaration
|
||||
~~~~~~~~~~~
|
||||
|
||||
As stated above, struct device_driver objects are statically
|
||||
allocated. Below is an example declaration of the eepro100
|
||||
driver. This declaration is hypothetical only; it relies on the driver
|
||||
being converted completely to the new model::
|
||||
|
||||
static struct device_driver eepro100_driver = {
|
||||
.name = "eepro100",
|
||||
.bus = &pci_bus_type,
|
||||
|
||||
.probe = eepro100_probe,
|
||||
.remove = eepro100_remove,
|
||||
.suspend = eepro100_suspend,
|
||||
.resume = eepro100_resume,
|
||||
};
|
||||
|
||||
Most drivers will not be able to be converted completely to the new
|
||||
model because the bus they belong to has a bus-specific structure with
|
||||
bus-specific fields that cannot be generalized.
|
||||
|
||||
The most common examples of this are device ID structures. A driver
|
||||
typically defines an array of device IDs that it supports. The format
|
||||
of these structures and the semantics for comparing device IDs are
|
||||
completely bus-specific. Defining them as generic entities would
|
||||
sacrifice type-safety, so we keep bus-specific structures around.
|
||||
|
||||
Bus-specific drivers should include a generic struct device_driver in
|
||||
the definition of the bus-specific driver. Like this::
|
||||
|
||||
struct pci_driver {
|
||||
const struct pci_device_id *id_table;
|
||||
struct device_driver driver;
|
||||
};
|
||||
|
||||
A definition that included bus-specific fields would look like
|
||||
(using the eepro100 driver again)::
|
||||
|
||||
static struct pci_driver eepro100_driver = {
|
||||
.id_table = eepro100_pci_tbl,
|
||||
.driver = {
|
||||
.name = "eepro100",
|
||||
.bus = &pci_bus_type,
|
||||
.probe = eepro100_probe,
|
||||
.remove = eepro100_remove,
|
||||
.suspend = eepro100_suspend,
|
||||
.resume = eepro100_resume,
|
||||
},
|
||||
};
|
||||
|
||||
Some may find the syntax of embedded struct initialization awkward or
|
||||
even a bit ugly. So far, it's the best way we've found to do what we want...
|
||||
|
||||
Registration
|
||||
~~~~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
int driver_register(struct device_driver *drv);
|
||||
|
||||
The driver registers the structure on startup. For drivers that have
|
||||
no bus-specific fields (i.e. don't have a bus-specific driver
|
||||
structure), they would use driver_register and pass a pointer to their
|
||||
struct device_driver object.
|
||||
|
||||
Most drivers, however, will have a bus-specific structure and will
|
||||
need to register with the bus using something like pci_register_driver().
|
||||
|
||||
It is important that drivers register their driver structure as early as
|
||||
possible. Registration with the core initializes several fields in the
|
||||
struct device_driver object, including the reference count and the
|
||||
lock. These fields are assumed to be valid at all times and may be
|
||||
used by the device model core or the bus driver.
|
||||
|
||||
|
||||
Transition Bus Drivers
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
By defining wrapper functions, the transition to the new model can be
|
||||
made easier. Drivers can ignore the generic structure altogether and
|
||||
let the bus wrapper fill in the fields. For the callbacks, the bus can
|
||||
define generic callbacks that forward the call to the bus-specific
|
||||
callbacks of the drivers.
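The forwarding idea can be sketched in plain userspace C. Everything below is a stand-in written for illustration: the ``foo`` bus, ``foo_bus_probe()``, ``foo_register_driver()`` and ``sample_probe()`` are hypothetical names, and the two generic structures are reduced to the few fields the example needs.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the generic types; not the kernel definitions. */
struct device;

struct device_driver {
	const char *name;
	int (*probe)(struct device *dev);
};

struct device {
	struct device_driver *driver;
};

/* Hypothetical bus "foo" with bus-specific device and driver structures. */
struct foo_device {
	int slot;
	struct device dev;
};

struct foo_driver {
	int (*probe)(struct foo_device *fdev);	/* bus-specific callback */
	struct device_driver driver;		/* embedded generic part */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic callback owned by the bus: forwards to the bus-specific probe. */
int foo_bus_probe(struct device *dev)
{
	struct foo_device *fdev = container_of(dev, struct foo_device, dev);
	struct foo_driver *fdrv = container_of(dev->driver, struct foo_driver,
					       driver);

	return fdrv->probe ? fdrv->probe(fdev) : -ENODEV;
}

/* Bus wrapper: fills in the generic fields so the driver can ignore them. */
int foo_register_driver(struct foo_driver *fdrv, const char *name)
{
	fdrv->driver.name = name;
	fdrv->driver.probe = foo_bus_probe;
	return 0;
}

/* Example unconverted driver that only knows the bus-specific interface. */
int last_probed_slot = -1;

int sample_probe(struct foo_device *fdev)
{
	last_probed_slot = fdev->slot;
	return 0;
}
```

The driver sets only its bus-specific ``probe``; the wrapper installs the generic callback, which recovers the bus-specific objects and forwards the call.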
|
||||
|
||||
This solution is intended to be only temporary. In order to get class
|
||||
information in the driver, the drivers must be modified anyway. Since
|
||||
converting drivers to the new model should reduce some infrastructural
|
||||
complexity and code size, it is recommended that they are converted as
|
||||
class information is added.
|
||||
|
||||
Access
|
||||
~~~~~~
|
||||
|
||||
Once the object has been registered, the driver may access the common fields of
|
||||
the object, like the lock and the list of devices::
|
||||
|
||||
int driver_for_each_dev(struct device_driver *drv, void *data,
|
||||
int (*callback)(struct device *dev, void *data));
|
||||
|
||||
The devices field is a list of all the devices that have been bound to
|
||||
the driver. The LDM core provides a helper function to operate on all
|
||||
the devices a driver controls. This helper locks the driver on each
|
||||
node access, and does proper reference counting on each device as it
|
||||
accesses it.
|
||||
|
||||
|
||||
sysfs
|
||||
~~~~~
|
||||
|
||||
When a driver is registered, a sysfs directory is created in its
|
||||
bus's directory. In this directory, the driver can export an interface
|
||||
to userspace to control operation of the driver on a global basis;
|
||||
e.g. toggling debugging output in the driver.
|
||||
|
||||
A future feature of this directory will be a 'devices' directory. This
|
||||
directory will contain symlinks to the directories of devices it
|
||||
supports.
|
||||
|
||||
|
||||
|
||||
Callbacks
|
||||
~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
int (*probe) (struct device *dev);
|
||||
|
||||
The probe() entry is called in task context, with the bus's rwsem locked
|
||||
and the driver partially bound to the device. Drivers commonly use
|
||||
container_of() to convert "dev" to a bus-specific type, both in probe()
|
||||
and other routines. That type often provides device resource data, such
|
||||
as pci_dev.resource[] or platform_device.resources, which is used in
|
||||
addition to dev->platform_data to initialize the driver.
|
||||
|
||||
This callback holds the driver-specific logic to bind the driver to a
|
||||
given device. That includes verifying that the device is present, that
|
||||
it's a version the driver can handle, that driver data structures can
|
||||
be allocated and initialized, and that any hardware can be initialized.
|
||||
Drivers often store a pointer to their state with dev_set_drvdata().
|
||||
When the driver has successfully bound itself to that device, then probe()
|
||||
returns zero and the driver model code will finish its part of binding
|
||||
the driver to that device.
|
||||
|
||||
A driver's probe() may return a negative errno value to indicate that
|
||||
the driver did not bind to this device, in which case it should have
|
||||
released all resources it allocated::
|
||||
|
||||
int (*remove) (struct device *dev);
|
||||
|
||||
remove is called to unbind a driver from a device. This may be
|
||||
called if a device is physically removed from the system, if the
|
||||
driver module is being unloaded, during a reboot sequence, or
|
||||
in other cases.
|
||||
|
||||
It is up to the driver to determine if the device is present or
|
||||
not. It should free any resources allocated specifically for the
|
||||
device; i.e. anything in the device's driver_data field.
|
||||
|
||||
If the device is still present, it should quiesce the device and place
|
||||
it into a supported low-power state::
|
||||
|
||||
int (*suspend) (struct device *dev, pm_message_t state);
|
||||
|
||||
suspend is called to put the device in a low power state::
|
||||
|
||||
int (*resume) (struct device *dev);
|
||||
|
||||
Resume is used to bring a device back from a low power state.
|
||||
|
||||
|
||||
Attributes
|
||||
~~~~~~~~~~
|
||||
|
||||
::
|
||||
|
||||
struct driver_attribute {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct device_driver *driver, char *buf);
|
||||
ssize_t (*store)(struct device_driver *, const char *buf, size_t count);
|
||||
};
|
||||
|
||||
Device drivers can export attributes via their sysfs directories.
|
||||
Drivers can declare attributes using the DRIVER_ATTR_RW and DRIVER_ATTR_RO
macros, which work identically to the DEVICE_ATTR_RW and DEVICE_ATTR_RO
|
||||
macros.
|
||||
|
||||
Example::
|
||||
|
||||
DRIVER_ATTR_RW(debug);
|
||||
|
||||
This is equivalent to declaring::
|
||||
|
||||
struct driver_attribute driver_attr_debug;
|
||||
|
||||
This can then be used to add and remove the attribute from the
|
||||
driver's directory using::
|
||||
|
||||
int driver_create_file(struct device_driver *, const struct driver_attribute *);
|
||||
void driver_remove_file(struct device_driver *, const struct driver_attribute *);
|
24
Documentation/driver-api/driver-model/index.rst
Normal file
@@ -0,0 +1,24 @@
|
||||
============
|
||||
Driver Model
|
||||
============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
binding
|
||||
bus
|
||||
class
|
||||
design-patterns
|
||||
device
|
||||
devres
|
||||
driver
|
||||
overview
|
||||
platform
|
||||
porting
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
124
Documentation/driver-api/driver-model/overview.rst
Normal file
@@ -0,0 +1,124 @@
|
||||
=============================
|
||||
The Linux Kernel Device Model
|
||||
=============================
|
||||
|
||||
Patrick Mochel <mochel@digitalimplant.org>
|
||||
|
||||
Drafted 26 August 2002
|
||||
Updated 31 January 2006
|
||||
|
||||
|
||||
Overview
|
||||
~~~~~~~~
|
||||
|
||||
The Linux Kernel Driver Model is a unification of all the disparate driver
|
||||
models that were previously used in the kernel. It is intended to augment the
|
||||
bus-specific drivers for bridges and devices by consolidating a set of data
|
||||
and operations into globally accessible data structures.
|
||||
|
||||
Traditional driver models implemented some sort of tree-like structure
|
||||
(sometimes just a list) for the devices they control. There wasn't any
|
||||
uniformity across the different bus types.
|
||||
|
||||
The current driver model provides a common, uniform data model for describing
|
||||
a bus and the devices that can appear under the bus. The unified bus
|
||||
model includes a set of common attributes which all busses carry, and a set
|
||||
of common callbacks, such as device discovery during bus probing, bus
|
||||
shutdown, bus power management, etc.
|
||||
|
||||
The common device and bridge interface reflects the goals of the modern
|
||||
computer: namely the ability to do seamless device "plug and play", power
|
||||
management, and hot plug. In particular, the model dictated by Intel and
|
||||
Microsoft (namely ACPI) ensures that almost every device on almost any bus
|
||||
on an x86-compatible system can work within this paradigm. Of course,
|
||||
not every bus is able to support all such operations, although most
|
||||
buses support most of those operations.
|
||||
|
||||
|
||||
Downstream Access
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Common data fields have been moved out of individual bus layers into a common
|
||||
data structure. These fields must still be accessed by the bus layers,
|
||||
and sometimes by the device-specific drivers.
|
||||
|
||||
Other bus layers are encouraged to do what has been done for the PCI layer.
|
||||
struct pci_dev now looks like this::
|
||||
|
||||
struct pci_dev {
|
||||
...
|
||||
|
||||
struct device dev; /* Generic device interface */
|
||||
...
|
||||
};
|
||||
|
||||
Note first that the struct device dev within the struct pci_dev is
|
||||
statically allocated. This means only one allocation on device discovery.
|
||||
|
||||
Note also that struct device dev is not necessarily defined at the
|
||||
front of the pci_dev structure. This is to make people think about what
|
||||
they're doing when switching between the bus driver and the global driver,
|
||||
and to discourage meaningless and incorrect casts between the two.
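A minimal userspace sketch of why a plain cast is wrong and how the conversion is done instead. The structures are cut-down stand-ins, and ``to_pci_dev()`` is written here as an ordinary function that mimics the kernel helper of the same name; the field layout is an assumption made for the example.

```c
#include <stddef.h>

/* Cut-down stand-ins; the real structures have many more fields. */
struct device {
	int refcount;
};

struct pci_dev {
	unsigned int devfn;
	struct device dev;	/* deliberately not the first member */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Convert a generic device pointer back to its enclosing pci_dev. */
struct pci_dev *to_pci_dev(struct device *d)
{
	return container_of(d, struct pci_dev, dev);
}
```

Because ``dev`` sits at a non-zero offset, casting a ``struct device *`` directly to ``struct pci_dev *`` would point into the middle of the object; subtracting the member offset yields the correct address.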
|
||||
|
||||
The PCI bus layer freely accesses the fields of struct device. It knows about
|
||||
the structure of struct pci_dev, and it should know the structure of struct
|
||||
device. Individual PCI device drivers that have been converted to the current
|
||||
driver model generally do not and should not touch the fields of struct device,
|
||||
unless there is a compelling reason to do so.
|
||||
|
||||
The above abstraction prevents unnecessary pain during transitional phases.
|
||||
If it were not done this way, then when a field was renamed or removed, every
|
||||
downstream driver would break. On the other hand, if only the bus layer
|
||||
(and not the device layer) accesses the struct device, it is only the bus
|
||||
layer that needs to change.
|
||||
|
||||
|
||||
User Interface
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
By virtue of having a complete hierarchical view of all the devices in the
|
||||
system, exporting a complete hierarchical view to userspace becomes relatively
|
||||
easy. This has been accomplished by implementing a special purpose virtual
|
||||
file system named sysfs.
|
||||
|
||||
Almost all mainstream Linux distros mount this filesystem automatically; you
|
||||
can see some variation of the following in the output of the "mount" command::
|
||||
|
||||
$ mount
|
||||
...
|
||||
none on /sys type sysfs (rw,noexec,nosuid,nodev)
|
||||
...
|
||||
$
|
||||
|
||||
The auto-mounting of sysfs is typically accomplished by an entry similar to
|
||||
the following in the /etc/fstab file::
|
||||
|
||||
none /sys sysfs defaults 0 0
|
||||
|
||||
or something similar in the /lib/init/fstab file on Debian-based systems::
|
||||
|
||||
none /sys sysfs nodev,noexec,nosuid 0 0
|
||||
|
||||
If sysfs is not automatically mounted, you can always do it manually with::
|
||||
|
||||
# mount -t sysfs sysfs /sys
|
||||
|
||||
Whenever a device is inserted into the tree, a directory is created for it.
|
||||
This directory may be populated at each layer of discovery - the global layer,
|
||||
the bus layer, or the device layer.
|
||||
|
||||
The global layer currently creates two files - 'name' and 'power'. The
|
||||
former only reports the name of the device. The latter reports the
|
||||
current power state of the device. It will also be used to set the current
|
||||
power state.
|
||||
|
||||
The bus layer may also create files for the devices it finds while probing the
|
||||
bus. For example, the PCI layer currently creates 'irq' and 'resource' files
|
||||
for each PCI device.
|
||||
|
||||
A device-specific driver may also export files in its directory to expose
|
||||
device-specific data or tunable interfaces.
|
||||
|
||||
More information about the sysfs directory layout can be found in
|
||||
the other documents in this directory and in the file
|
||||
Documentation/filesystems/sysfs.txt.
|
246
Documentation/driver-api/driver-model/platform.rst
Normal file
@@ -0,0 +1,246 @@
|
||||
============================
|
||||
Platform Devices and Drivers
|
||||
============================
|
||||
|
||||
See <linux/platform_device.h> for the driver model interface to the
|
||||
platform bus: platform_device, and platform_driver. This pseudo-bus
|
||||
is used to connect devices on busses with minimal infrastructure,
|
||||
like those used to integrate peripherals on many system-on-chip
|
||||
processors, or some "legacy" PC interconnects; as opposed to large
|
||||
formally specified ones like PCI or USB.
|
||||
|
||||
|
||||
Platform devices
|
||||
~~~~~~~~~~~~~~~~
|
||||
Platform devices are devices that typically appear as autonomous
|
||||
entities in the system. This includes legacy port-based devices and
|
||||
host bridges to peripheral buses, and most controllers integrated
|
||||
into system-on-chip platforms. What they usually have in common
|
||||
is direct addressing from a CPU bus. Rarely, a platform_device will
|
||||
be connected through a segment of some other kind of bus; but its
|
||||
registers will still be directly addressable.
|
||||
|
||||
Platform devices are given a name, used in driver binding, and a
|
||||
list of resources such as addresses and IRQs::
|
||||
|
||||
struct platform_device {
|
||||
const char *name;
|
||||
u32 id;
|
||||
struct device dev;
|
||||
u32 num_resources;
|
||||
struct resource *resource;
|
||||
};
|
||||
|
||||
|
||||
Platform drivers
|
||||
~~~~~~~~~~~~~~~~
|
||||
Platform drivers follow the standard driver model convention, where
|
||||
discovery/enumeration is handled outside the drivers, and drivers
|
||||
provide probe() and remove() methods. They support power management
|
||||
and shutdown notifications using the standard conventions::
|
||||
|
||||
struct platform_driver {
|
||||
int (*probe)(struct platform_device *);
|
||||
int (*remove)(struct platform_device *);
|
||||
void (*shutdown)(struct platform_device *);
|
||||
int (*suspend)(struct platform_device *, pm_message_t state);
|
||||
int (*suspend_late)(struct platform_device *, pm_message_t state);
|
||||
int (*resume_early)(struct platform_device *);
|
||||
int (*resume)(struct platform_device *);
|
||||
struct device_driver driver;
|
||||
};
|
||||
|
||||
Note that probe() should in general verify that the specified device hardware
|
||||
actually exists; sometimes platform setup code can't be sure. The probing
|
||||
can use device resources, including clocks, and device platform_data.
|
||||
|
||||
Platform drivers register themselves the normal way::
|
||||
|
||||
int platform_driver_register(struct platform_driver *drv);
|
||||
|
||||
Or, in common situations where the device is known not to be hot-pluggable,
|
||||
the probe() routine can live in an init section to reduce the driver's
|
||||
runtime memory footprint::
|
||||
|
||||
int platform_driver_probe(struct platform_driver *drv,
|
||||
int (*probe)(struct platform_device *));
|
||||
|
||||
Kernel modules can be composed of several platform drivers. The platform core
|
||||
provides helpers to register and unregister an array of drivers::
|
||||
|
||||
int __platform_register_drivers(struct platform_driver * const *drivers,
|
||||
unsigned int count, struct module *owner);
|
||||
void platform_unregister_drivers(struct platform_driver * const *drivers,
|
||||
unsigned int count);
|
||||
|
||||
If one of the drivers fails to register, all drivers registered up to that
|
||||
point will be unregistered in reverse order. Note that there is a convenience
|
||||
macro that passes THIS_MODULE as owner parameter::
|
||||
|
||||
#define platform_register_drivers(drivers, count)
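The all-or-nothing behaviour described above can be modelled in userspace as follows. This is a sketch only: ``struct mock_driver`` and the ``mock_*`` functions are hypothetical stand-ins, and the ``fail`` flag exists purely to simulate a registration error.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver; 'fail' simulates a failed register. */
struct mock_driver {
	const char *name;
	int fail;
	int registered;
};

int mock_register(struct mock_driver *drv)
{
	if (drv->fail)
		return -1;
	drv->registered = 1;
	return 0;
}

void mock_unregister(struct mock_driver *drv)
{
	drv->registered = 0;
}

/* Register every driver in the array, or none of them. */
int mock_register_drivers(struct mock_driver *const *drivers,
			  unsigned int count)
{
	unsigned int i;
	int err;

	for (i = 0; i < count; i++) {
		err = mock_register(drivers[i]);
		if (err) {
			/* Undo the earlier registrations in reverse order. */
			while (i--)
				mock_unregister(drivers[i]);
			return err;
		}
	}
	return 0;
}
```

A caller therefore never has to clean up a partially registered array: on error, the state is the same as before the call.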
|
||||
|
||||
|
||||
Device Enumeration
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
As a rule, platform specific (and often board-specific) setup code will
|
||||
register platform devices::
|
||||
|
||||
int platform_device_register(struct platform_device *pdev);
|
||||
|
||||
int platform_add_devices(struct platform_device **pdevs, int ndev);
|
||||
|
||||
The general rule is to register only those devices that actually exist,
|
||||
but in some cases extra devices might be registered. For example, a kernel
|
||||
might be configured to work with an external network adapter that might not
|
||||
be populated on all boards, or likewise to work with an integrated controller
|
||||
that some boards might not hook up to any peripherals.
|
||||
|
||||
In some cases, boot firmware will export tables describing the devices
|
||||
that are populated on a given board. Without such tables, often the
|
||||
only way for system setup code to set up the correct devices is to build
|
||||
a kernel for a specific target board. Such board-specific kernels are
|
||||
common with embedded and custom systems development.
|
||||
|
||||
In many cases, the memory and IRQ resources associated with the platform
|
||||
device are not enough to let the device's driver work. Board setup code
|
||||
will often provide additional information using the device's platform_data
|
||||
field.
|
||||
|
||||
Embedded systems frequently need one or more clocks for platform devices,
|
||||
which are normally kept off until they're actively needed (to save power).
|
||||
System setup also associates those clocks with the device, so that
|
||||
calls to clk_get(&pdev->dev, clock_name) return them as needed.
|
||||
|
||||
|
||||
Legacy Drivers: Device Probing
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Some drivers are not fully converted to the driver model, because they take
|
||||
on a non-driver role: the driver registers its platform device, rather than
|
||||
leaving that for system infrastructure. Such drivers can't be hotplugged
|
||||
or coldplugged, since those mechanisms require device creation to be in a
|
||||
different system component than the driver.
|
||||
|
||||
The only "good" reason for this is to handle older system designs which, like
|
||||
original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware
|
||||
configuration. Newer systems have largely abandoned that model, in favor of
|
||||
bus-level support for dynamic configuration (PCI, USB), or device tables
|
||||
provided by the boot firmware (e.g. PNPACPI on x86). There are too many
|
||||
conflicting options about what might be where, and even educated guesses by
|
||||
an operating system will be wrong often enough to make trouble.
|
||||
|
||||
This style of driver is discouraged. If you're updating such a driver,
|
||||
please try to move the device enumeration to a more appropriate location,
|
||||
outside the driver. This will usually be cleanup, since such drivers
|
||||
tend to already have "normal" modes, such as ones using device nodes that
|
||||
were created by PNP or by platform device setup.
|
||||
|
||||
Nonetheless, there are some APIs to support such legacy drivers. Avoid
|
||||
using these calls except with such hotplug-deficient drivers::
|
||||
|
||||
struct platform_device *platform_device_alloc(
|
||||
const char *name, int id);
|
||||
|
||||
You can use platform_device_alloc() to dynamically allocate a device, which
|
||||
you will then initialize with resources and platform_device_register().
|
||||
A better solution is usually::
|
||||
|
||||
struct platform_device *platform_device_register_simple(
|
||||
const char *name, int id,
|
||||
struct resource *res, unsigned int nres);
|
||||
|
||||
You can use platform_device_register_simple() as a one-step call to allocate
|
||||
and register a device.
|
||||
|
||||
|
||||
Device Naming and Driver Binding
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The platform_device.dev.bus_id is the canonical name for the devices.
|
||||
It's built from two components:
|
||||
|
||||
* platform_device.name ... which is also used for driver matching.
|
||||
|
||||
* platform_device.id ... the device instance number, or else "-1"
|
||||
to indicate there's only one.
|
||||
|
||||
These are concatenated, so name/id "serial"/0 indicates bus_id "serial.0", and
"serial"/3 indicates bus_id "serial.3"; both would use the platform_driver
named "serial". By contrast, "my_rtc"/-1 would be bus_id "my_rtc" (no instance id)
|
||||
and use the platform_driver called "my_rtc".
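The naming rule can be written down as a small userspace function. ``format_bus_id()`` is a hypothetical name chosen for this sketch and is not how the platform core spells it; only the concatenation rule is taken from the text above.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the canonical device name from name and id.
 * An id of -1 means "only one instance", so no ".id" suffix is appended.
 * Returns the snprintf() result.
 */
int format_bus_id(char *buf, size_t len, const char *name, int id)
{
	if (id == -1)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%s.%d", name, id);
}
```

Driver matching uses only the name part, which is why "serial.0" and "serial.3" both bind to the driver named "serial".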
|
||||
|
||||
Driver binding is performed automatically by the driver core, invoking
|
||||
driver probe() after finding a match between device and driver. If the
|
||||
probe() succeeds, the driver and device are bound as usual. There are
|
||||
three different ways to find such a match:
|
||||
|
||||
- Whenever a device is registered, the drivers for that bus are
|
||||
checked for matches. Platform devices should be registered very
|
||||
early during system boot.
|
||||
|
||||
- When a driver is registered using platform_driver_register(), all
|
||||
unbound devices on that bus are checked for matches. Drivers
|
||||
usually register later during booting, or by module loading.
|
||||
|
||||
- Registering a driver using platform_driver_probe() works just like
|
||||
using platform_driver_register(), except that the driver won't
|
||||
be probed later if another device registers. (Which is OK, since
|
||||
this interface is only for use with non-hotpluggable devices.)
|
||||
|
||||
|
||||
Early Platform Devices and Drivers
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The early platform interfaces provide platform data to platform device
|
||||
drivers early on during the system boot. The code is built on top of the
|
||||
early_param() command line parsing and can be executed very early on.
|
||||
|
||||
Example: "earlyprintk" class early serial console in 6 steps
|
||||
|
||||
1. Registering early platform device data
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The architecture code registers platform device data using the function
|
||||
early_platform_add_devices(). In the case of early serial console this
|
||||
should be hardware configuration for the serial port. Devices registered
|
||||
at this point will later on be matched against early platform drivers.
|
||||
|
||||
2. Parsing kernel command line
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The architecture code calls parse_early_param() to parse the kernel
|
||||
command line. This will execute all matching early_param() callbacks.
|
||||
User specified early platform devices will be registered at this point.
|
||||
For the early serial console case the user can specify the port on the
|
||||
kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
|
||||
the class string, "serial" is the name of the platform driver and
|
||||
0 is the platform device id. If the id is -1 then the dot and the
|
||||
id can be omitted.
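Splitting the value part of such a parameter into driver name and device id might look like the userspace sketch below. ``parse_early_dev()`` is a hypothetical helper written for illustration; it is not the parser used by the early platform code, and it assumes the class string and the "=" have already been stripped.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "name.id" into its parts. When there is no
 * ".id" suffix the id is reported as -1. Returns 0 on success, or -1 if
 * the name is empty, does not fit in the buffer, or the id is not a number.
 */
int parse_early_dev(const char *arg, char *name, size_t name_len, int *id)
{
	const char *dot = strrchr(arg, '.');
	size_t n = dot ? (size_t)(dot - arg) : strlen(arg);
	char *end;
	long val;

	if (n == 0 || n >= name_len)
		return -1;
	memcpy(name, arg, n);
	name[n] = '\0';

	if (!dot) {
		*id = -1;	/* dot and id omitted */
		return 0;
	}

	val = strtol(dot + 1, &end, 10);
	if (end == dot + 1 || *end != '\0')
		return -1;
	*id = (int)val;
	return 0;
}
```

For "serial.0" this yields the name "serial" and id 0; for plain "serial" it yields id -1.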
|
||||
|
||||
3. Installing early platform drivers belonging to a certain class
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The architecture code may optionally force registration of all early
|
||||
platform drivers belonging to a certain class using the function
|
||||
early_platform_driver_register_all(). User specified devices from
|
||||
step 2 have priority over these. This step is omitted by the serial
|
||||
driver example since the early serial driver code should be disabled
|
||||
unless the user has specified a port on the kernel command line.
|
||||
|
||||
4. Early platform driver registration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Compiled-in platform drivers making use of early_platform_init() are
|
||||
automatically registered during step 2 or 3. The serial driver example
|
||||
should use early_platform_init("earlyprintk", &platform_driver).
|
||||
|
||||
5. Probing of early platform drivers belonging to a certain class
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The architecture code calls early_platform_driver_probe() to match
|
||||
registered early platform devices associated with a certain class with
|
||||
registered early platform drivers. Matched devices will get probed().
|
||||
This step can be executed at any point during the early boot. As soon
|
||||
as possible may be good for the serial port case.
|
||||
|
||||
6. Inside the early platform driver probe()
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The driver code needs to take special care during early boot, especially
|
||||
when it comes to memory allocation and interrupt registration. The code
|
||||
in the probe() function can use is_early_platform_device() to check if
|
||||
it is called at early platform device or at the regular platform device
|
||||
time. The early serial driver performs register_console() at this point.
|
||||
|
||||
For further information, see <linux/platform_device.h>.
|
448
Documentation/driver-api/driver-model/porting.rst
Normal file
@@ -0,0 +1,448 @@
|
||||
=======================================
|
||||
Porting Drivers to the New Driver Model
|
||||
=======================================
|
||||
|
||||
Patrick Mochel
|
||||
|
||||
7 January 2003
|
||||
|
||||
|
||||
Overview
|
||||
|
||||
Please refer to `Documentation/driver-api/driver-model/*.rst` for definitions of
|
||||
various driver types and concepts.
|
||||
|
||||
Most of the work of porting device drivers to the new model happens
|
||||
at the bus driver layer. This was intentional, to minimize the
|
||||
negative effect on kernel drivers, and to allow a gradual transition
|
||||
of bus drivers.
|
||||
|
||||
In a nutshell, the driver model consists of a set of objects that can
|
||||
be embedded in larger, bus-specific objects. Fields in these generic
|
||||
objects can replace fields in the bus-specific objects.
|
||||
|
||||
The generic objects must be registered with the driver model core. By
|
||||
doing so, they will be exported via the sysfs filesystem. sysfs can be
|
||||
mounted by doing::
|
||||
|
||||
# mount -t sysfs sysfs /sys
|
||||
|
||||
|
||||
|
||||
The Process
|
||||
|
||||
Step 0: Read include/linux/device.h for object and function definitions.
|
||||
|
||||
Step 1: Registering the bus driver.
|
||||
|
||||
|
||||
- Define a struct bus_type for the bus driver::
|
||||
|
||||
struct bus_type pci_bus_type = {
|
||||
.name = "pci",
|
||||
};
|
||||
|
||||
|
||||
- Register the bus type.
|
||||
|
||||
This should be done in the initialization function for the bus type,
|
||||
which is usually the module_init(), or equivalent, function::
|
||||
|
||||
static int __init pci_driver_init(void)
|
||||
{
|
||||
return bus_register(&pci_bus_type);
|
||||
}
|
||||
|
||||
subsys_initcall(pci_driver_init);
|
||||
|
||||
|
||||
The bus type may be unregistered (if the bus driver may be compiled
|
||||
as a module) by doing::
|
||||
|
||||
bus_unregister(&pci_bus_type);
|
||||
|
||||
|
||||
- Export the bus type for others to use.
|
||||
|
||||
Other code may wish to reference the bus type, so declare it in a
|
||||
shared header file and export the symbol.
|
||||
|
||||
From include/linux/pci.h::
|
||||
|
||||
extern struct bus_type pci_bus_type;
|
||||
|
||||
|
||||
From the file the above code appears in::
|
||||
|
||||
EXPORT_SYMBOL(pci_bus_type);
|
||||
|
||||
|
||||
|
||||
- This will cause the bus to show up in /sys/bus/pci/ with two
|
||||
subdirectories: 'devices' and 'drivers'::
|
||||
|
||||
# tree -d /sys/bus/pci/
|
||||
/sys/bus/pci/
|
||||
|-- devices
|
||||
`-- drivers
|
||||
|
||||
|
||||
|
||||
Step 2: Registering Devices.
|
||||
|
||||
struct device represents a single device. It mainly contains metadata
|
||||
describing the relationship the device has to other entities.
|
||||
|
||||
|
||||
- Embed a struct device in the bus-specific device type::
|
||||
|
||||
|
||||
struct pci_dev {
|
||||
...
|
||||
struct device dev; /* Generic device interface */
|
||||
...
|
||||
};
|
||||
|
||||
It is recommended that the generic device not be the first item in
|
||||
the struct to discourage programmers from doing mindless casts
|
||||
between the object types. Instead macros, or inline functions,
|
||||
should be created to convert from the generic object type::
|
||||
|
||||
|
||||
#define to_pci_dev(n) container_of(n, struct pci_dev, dev)
|
||||
|
||||
or
|
||||
|
||||
static inline struct pci_dev * to_pci_dev(struct device * n)
{
	return container_of(n, struct pci_dev, dev);
}
|
||||
|
||||
This allows the compiler to verify type-safety of the operations
|
||||
that are performed (which is Good).
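The conversion helper can be tried outside the kernel. The following is a
minimal userspace sketch, assuming heavily simplified stand-ins for
struct device and struct pci_dev (the real definitions have many more
fields) and a local re-implementation of container_of() built on
offsetof(). It only illustrates that the helper recovers the enclosing
object even though the generic device is not the first member.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures (assumption). */
struct device {
	const char *init_name;
};

struct pci_dev {
	unsigned int devfn;	/* bus-specific field placed first */
	struct device dev;	/* generic device, deliberately not first */
};

/* Typed conversion: only a struct device pointer is accepted. */
static inline struct pci_dev *to_pci_dev(struct device *n)
{
	return container_of(n, struct pci_dev, dev);
}
```

Because to_pci_dev() takes a struct device pointer, passing any other
pointer type produces a compiler diagnostic, which a plain cast would
silently hide.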
|
||||
|
||||
|
||||
- Initialize the device on registration.
|
||||
|
||||
When devices are discovered or registered with the bus type, the
|
||||
bus driver should initialize the generic device. The most important
|
||||
things to initialize are the bus_id, parent, and bus fields.
|
||||
|
||||
The bus_id is an ASCII string that contains the device's address on
|
||||
the bus. The format of this string is bus-specific. This is
|
||||
necessary for representing devices in sysfs.
|
||||
|
||||
parent is the physical parent of the device. It is important that
|
||||
the bus driver sets this field correctly.
|
||||
|
||||
The driver model maintains an ordered list of devices that it uses
|
||||
for power management. This list must be in order to guarantee that
|
||||
devices are shut down before their physical parents, and vice versa.
|
||||
The order of this list is determined by the parent of registered
|
||||
devices.
|
||||
|
||||
Also, the location of the device's sysfs directory depends on a
|
||||
device's parent. sysfs exports a directory structure that mirrors
|
||||
the device hierarchy. Accurately setting the parent guarantees that
|
||||
sysfs will accurately represent the hierarchy.
|
||||
|
||||
The device's bus field is a pointer to the bus type the device
|
||||
belongs to. This should be set to the bus_type that was declared
|
||||
and initialized before.
|
||||
|
||||
Optionally, the bus driver may set the device's name and release
|
||||
fields.
|
||||
|
||||
The name field is an ASCII string describing the device, like
|
||||
|
||||
"ATI Technologies Inc Radeon QD"
|
||||
|
||||
The release field is a callback that the driver model core calls
|
||||
when the device has been removed, and all references to it have
|
||||
been released. More on this in a moment.
|
||||
|
||||
|
||||
- Register the device.
|
||||
|
||||
Once the generic device has been initialized, it can be registered
|
||||
with the driver model core by doing::
|
||||
|
||||
device_register(&dev->dev);
|
||||
|
||||
It can later be unregistered by doing::
|
||||
|
||||
device_unregister(&dev->dev);
|
||||
|
||||
This should happen on buses that support hotpluggable devices.
|
||||
If a bus driver unregisters a device, it should not immediately free
|
||||
it. It should instead wait for the driver model core to call the
|
||||
device's release method, then free the bus-specific object.
|
||||
(There may be other code that is currently referencing the device
|
||||
structure, and it would be rude to free the device while that is
|
||||
happening).
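The rule "free only from the release method" can be modelled in a few
lines of userspace C. This is a sketch under stated assumptions: the
struct device below is a toy with a plain integer reference count (the
kernel uses a kobject for this), get_device() and put_device() are local
imitations of the kernel functions of the same name, and pci_alloc_dev(),
pci_release_dev() and released_count are names chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-in for the kernel's struct device (assumption). */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

struct pci_dev {
	unsigned int devfn;
	struct device dev;
};

static int released_count;	/* demo only: counts completed releases */

/* Release callback: the only place the bus-specific object is freed. */
static void pci_release_dev(struct device *dev)
{
	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);

	free(pdev);
	released_count++;
}

/* Local imitation: take an additional reference. */
static struct device *get_device(struct device *dev)
{
	dev->refcount++;
	return dev;
}

/* Local imitation: drop a reference, release on the last one. */
static void put_device(struct device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Hypothetical allocator: the creator holds the initial reference. */
static struct pci_dev *pci_alloc_dev(unsigned int devfn)
{
	struct pci_dev *pdev = calloc(1, sizeof(*pdev));

	if (!pdev)
		return NULL;
	pdev->devfn = devfn;
	pdev->dev.refcount = 1;
	pdev->dev.release = pci_release_dev;
	return pdev;
}
```

If some other code still holds a reference when the bus driver drops its
own, the object survives until that last reference is put, and only then
does the release callback free it.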
|
||||
|
||||
|
||||
When the device is registered, a directory in sysfs is created.
|
||||
The PCI tree in sysfs looks like::
|
||||
|
||||
/sys/devices/pci0/
|-- 00:00.0
|-- 00:01.0
|   `-- 01:00.0
|-- 00:02.0
|   `-- 02:1f.0
|       `-- 03:00.0
|-- 00:1e.0
|   `-- 04:04.0
|-- 00:1f.0
|-- 00:1f.1
|   |-- ide0
|   |   |-- 0.0
|   |   `-- 0.1
|   `-- ide1
|       `-- 1.0
|-- 00:1f.2
|-- 00:1f.3
`-- 00:1f.5
|
||||
|
||||
Also, symlinks are created in the bus's 'devices' directory
|
||||
that point to the device's directory in the physical hierarchy::
|
||||
|
||||
/sys/bus/pci/devices/
|
||||
|-- 00:00.0 -> ../../../devices/pci0/00:00.0
|
||||
|-- 00:01.0 -> ../../../devices/pci0/00:01.0
|
||||
|-- 00:02.0 -> ../../../devices/pci0/00:02.0
|
||||
|-- 00:1e.0 -> ../../../devices/pci0/00:1e.0
|
||||
|-- 00:1f.0 -> ../../../devices/pci0/00:1f.0
|
||||
|-- 00:1f.1 -> ../../../devices/pci0/00:1f.1
|
||||
|-- 00:1f.2 -> ../../../devices/pci0/00:1f.2
|
||||
|-- 00:1f.3 -> ../../../devices/pci0/00:1f.3
|
||||
|-- 00:1f.5 -> ../../../devices/pci0/00:1f.5
|
||||
|-- 01:00.0 -> ../../../devices/pci0/00:01.0/01:00.0
|
||||
|-- 02:1f.0 -> ../../../devices/pci0/00:02.0/02:1f.0
|
||||
|-- 03:00.0 -> ../../../devices/pci0/00:02.0/02:1f.0/03:00.0
|
||||
`-- 04:04.0 -> ../../../devices/pci0/00:1e.0/04:04.0
|
||||
|
||||
|
||||
|
||||
Step 3: Registering Drivers.
|
||||
|
||||
struct device_driver is a simple driver structure that contains a set
|
||||
of operations that the driver model core may call.
|
||||
|
||||
|
||||
- Embed a struct device_driver in the bus-specific driver.
|
||||
|
||||
Just like with devices, do something like::
|
||||
|
||||
struct pci_driver {
|
||||
...
|
||||
struct device_driver driver;
|
||||
};
|
||||
|
||||
|
||||
- Initialize the generic driver structure.
|
||||
|
||||
When the driver registers with the bus (e.g. doing pci_register_driver()),
|
||||
initialize the necessary fields of the driver: the name and bus
|
||||
fields.
|
||||
|
||||
|
||||
- Register the driver.
|
||||
|
||||
After the generic driver has been initialized, call::
|
||||
|
||||
driver_register(&drv->driver);
|
||||
|
||||
to register the driver with the core.
|
||||
|
||||
When the driver is unregistered from the bus, unregister it from the
|
||||
core by doing::
|
||||
|
||||
driver_unregister(&drv->driver);
|
||||
|
||||
Note that this will block until all references to the driver have
|
||||
gone away. Normally, there will not be any.
|
||||
|
||||
|
||||
- Sysfs representation.
|
||||
|
||||
Drivers are exported via sysfs in their bus's 'drivers' directory.
|
||||
For example::
|
||||
|
||||
/sys/bus/pci/drivers/
|
||||
|-- 3c59x
|
||||
|-- Ensoniq AudioPCI
|
||||
|-- agpgart-amdk7
|
||||
|-- e100
|
||||
`-- serial
|
||||
|
||||
|
||||
Step 4: Define Generic Methods for Drivers.
|
||||
|
||||
struct device_driver defines a set of operations that the driver model
|
||||
core calls. Most of these operations are probably similar to
|
||||
operations the bus already defines for drivers, but taking different
|
||||
parameters.
|
||||
|
||||
It would be difficult and tedious to force every driver on a bus to
|
||||
simultaneously convert their drivers to generic format. Instead, the
|
||||
bus driver should define single instances of the generic methods that
|
||||
forward calls to the bus-specific drivers. For instance::
|
||||
|
||||
|
||||
static int pci_device_remove(struct device * dev)
{
	struct pci_dev * pci_dev = to_pci_dev(dev);
	struct pci_driver * drv = pci_dev->driver;

	if (drv) {
		if (drv->remove)
			drv->remove(pci_dev);
		pci_dev->driver = NULL;
	}
	return 0;
}
|
||||
|
||||
|
||||
The generic driver should be initialized with these methods before it
|
||||
is registered::
|
||||
|
||||
/* initialize common driver fields */
|
||||
drv->driver.name = drv->name;
|
||||
drv->driver.bus = &pci_bus_type;
|
||||
drv->driver.probe = pci_device_probe;
|
||||
drv->driver.resume = pci_device_resume;
|
||||
drv->driver.suspend = pci_device_suspend;
|
||||
drv->driver.remove = pci_device_remove;
|
||||
|
||||
/* register with core */
|
||||
driver_register(&drv->driver);
|
||||
|
||||
|
||||
Ideally, the bus should only initialize the fields if they are not
|
||||
already set. This allows the drivers to implement their own generic
|
||||
methods.
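The "only if not already set" advice can be sketched as follows. This is
a userspace illustration, not the PCI core's actual code: the structures
are cut down to the fields needed here, pci_init_generic_driver() is a
hypothetical helper name, and my_probe() stands for a driver that brings
its own generic probe method.

```c
#include <stddef.h>

struct device;	/* opaque in this sketch */

/* Cut-down stand-ins for the kernel structures (assumption). */
struct device_driver {
	const char *name;
	int (*probe)(struct device *dev);
	int (*remove)(struct device *dev);
};

struct pci_driver {
	const char *name;
	struct device_driver driver;
};

/* Bus-provided generic methods, reduced to stubs. */
static int pci_device_probe(struct device *dev)
{
	(void)dev;
	return 0;
}

static int pci_device_remove(struct device *dev)
{
	(void)dev;
	return 0;
}

/* A driver that supplies its own generic probe method. */
static int my_probe(struct device *dev)
{
	(void)dev;
	return 0;
}

/* Fill in bus defaults without overriding what the driver already set. */
static void pci_init_generic_driver(struct pci_driver *drv)
{
	drv->driver.name = drv->name;
	if (!drv->driver.probe)
		drv->driver.probe = pci_device_probe;
	if (!drv->driver.remove)
		drv->driver.remove = pci_device_remove;
}
```

A driver that sets driver.probe before registering keeps its own method,
while any field it left NULL receives the bus-wide forwarding function.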
|
||||
|
||||
|
||||
Step 5: Support generic driver binding.
|
||||
|
||||
The model assumes that a device or driver can be dynamically
|
||||
registered with the bus at any time. When registration happens,
|
||||
devices must be bound to a driver, or drivers must be bound to all
|
||||
devices that they support.
|
||||
|
||||
A driver typically contains a list of device IDs that it supports. The
|
||||
bus driver compares these IDs to the IDs of devices registered with it.
|
||||
The format of the device IDs, and the semantics for comparing them are
|
||||
bus-specific, so the generic model does not attempt to generalize them.
|
||||
|
||||
Instead, a bus may supply a method in struct bus_type that does the
|
||||
comparison::
|
||||
|
||||
int (*match)(struct device * dev, struct device_driver * drv);
|
||||
|
||||
match should return a positive value if the driver supports the device,
|
||||
and zero otherwise. It may also return an error code (for example
|
||||
-EPROBE_DEFER) if determining that given driver supports the device is
|
||||
not possible.
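A match method along these lines might look like the sketch below. It is
a simplified userspace model, not the real PCI matching code: the ID
structure holds only a vendor/device pair, the ID table is assumed to end
with an all-zero entry, and demo_bus_match() is a name invented for the
example.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-ins for the generic objects (assumption). */
struct device { int unused; };
struct device_driver { const char *name; };

/* Simplified bus-specific ID: just a vendor/device pair. */
struct pci_device_id {
	unsigned int vendor, device;
};

struct pci_dev {
	unsigned int vendor, device;
	struct device dev;
};

struct pci_driver {
	const struct pci_device_id *id_table;	/* ends with { 0, 0 } */
	struct device_driver driver;
};

/* Return 1 if drv lists the device's IDs, 0 otherwise. */
static int demo_bus_match(struct device *dev, struct device_driver *drv)
{
	struct pci_dev *pdev = container_of(dev, struct pci_dev, dev);
	struct pci_driver *pdrv = container_of(drv, struct pci_driver, driver);
	const struct pci_device_id *id;

	if (!pdrv->id_table)
		return 0;

	for (id = pdrv->id_table; id->vendor || id->device; id++)
		if (id->vendor == pdev->vendor && id->device == pdev->device)
			return 1;

	return 0;
}
```

The generic core only ever sees struct device and struct device_driver;
the bus-specific comparison stays entirely inside the callback.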
|
||||
|
||||
When a device is registered, the bus's list of drivers is iterated
|
||||
over. bus->match() is called for each one until a match is found.
|
||||
|
||||
When a driver is registered, the bus's list of devices is iterated
|
||||
over. bus->match() is called for each device that is not already
|
||||
claimed by a driver.
|
||||
|
||||
When a device is successfully bound to a driver, device->driver is
|
||||
set, the device is added to a per-driver list of devices, and a
|
||||
symlink is created in the driver's sysfs directory that points to the
|
||||
device's physical directory::
|
||||
|
||||
/sys/bus/pci/drivers/
|
||||
|-- 3c59x
|
||||
| `-- 00:0b.0 -> ../../../../devices/pci0/00:0b.0
|
||||
|-- Ensoniq AudioPCI
|
||||
|-- agpgart-amdk7
|
||||
| `-- 00:00.0 -> ../../../../devices/pci0/00:00.0
|
||||
|-- e100
|
||||
| `-- 00:0c.0 -> ../../../../devices/pci0/00:0c.0
|
||||
`-- serial
|
||||
|
||||
|
||||
This driver binding should replace the existing driver binding
|
||||
mechanism the bus currently uses.
|
||||
|
||||
|
||||
Step 6: Supply a hotplug callback.
|
||||
|
||||
Whenever a device is registered with the driver model core, the
|
||||
userspace program /sbin/hotplug is called to notify userspace.
|
||||
Users can define actions to perform when a device is inserted or
|
||||
removed.
|
||||
|
||||
The driver model core passes several arguments to userspace via
|
||||
environment variables, including
|
||||
|
||||
- ACTION: set to 'add' or 'remove'
|
||||
- DEVPATH: set to the device's physical path in sysfs.
|
||||
|
||||
A bus driver may also supply additional parameters for userspace to
|
||||
consume. To do this, a bus must implement the 'hotplug' method in
|
||||
struct bus_type::
|
||||
|
||||
int (*hotplug) (struct device *dev, char **envp,
|
||||
int num_envp, char *buffer, int buffer_size);
|
||||
|
||||
This is called immediately before /sbin/hotplug is executed.
|
||||
|
||||
|
||||
Step 7: Cleaning up the bus driver.
|
||||
|
||||
The generic bus, device, and driver structures provide several fields
|
||||
that can replace those defined privately to the bus driver.
|
||||
|
||||
- Device list.
|
||||
|
||||
struct bus_type contains a list of all devices registered with the bus
|
||||
type. This includes all devices on all instances of that bus type.
|
||||
An internal list that the bus uses may be removed, in favor of using
|
||||
this one.
|
||||
|
||||
The core provides an iterator to access these devices::
|
||||
|
||||
int bus_for_each_dev(struct bus_type * bus, struct device * start,
|
||||
void * data, int (*fn)(struct device *, void *));
|
||||
|
||||
|
||||
- Driver list.
|
||||
|
||||
struct bus_type also contains a list of all drivers registered with
|
||||
it. An internal list of drivers that the bus driver maintains may
|
||||
be removed in favor of using the generic one.
|
||||
|
||||
The drivers may be iterated over, like devices::
|
||||
|
||||
int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
|
||||
void * data, int (*fn)(struct device_driver *, void *));
|
||||
|
||||
|
||||
Please see drivers/base/bus.c for more information.
|
||||
|
||||
|
||||
- rwsem
|
||||
|
||||
struct bus_type contains an rwsem that protects all core accesses to
|
||||
the device and driver lists. This can be used by the bus driver
|
||||
internally, and should be used when accessing the device or driver
|
||||
lists the bus maintains.
|
||||
|
||||
|
||||
- Device and driver fields.
|
||||
|
||||
Some of the fields in struct device and struct device_driver duplicate
|
||||
fields in the bus-specific representations of these objects. Feel free
|
||||
to remove the bus-specific ones and favor the generic ones. Note
|
||||
though, that this will likely mean fixing up all the drivers that
|
||||
reference the bus-specific fields (though those should all be 1-line
|
||||
changes).
|
119
Documentation/driver-api/early-userspace/buffer-format.rst
Normal file
@@ -0,0 +1,119 @@
|
||||
=======================
|
||||
initramfs buffer format
|
||||
=======================
|
||||
|
||||
Al Viro, H. Peter Anvin
|
||||
|
||||
Last revision: 2002-01-13
|
||||
|
||||
Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
|
||||
getting {replaced/complemented} with the new "initial ramfs"
|
||||
(initramfs) protocol. The initramfs contents is passed using the same
|
||||
memory buffer protocol used by the initrd protocol, but the contents
|
||||
is different. The initramfs buffer contains an archive which is
|
||||
expanded into a ramfs filesystem; this document details the format of
|
||||
the initramfs buffer.
|
||||
|
||||
The initramfs buffer format is based around the "newc" or "crc" CPIO
|
||||
formats, and can be created with the cpio(1) utility. The cpio
|
||||
archive can be compressed using gzip(1). One valid version of an
|
||||
initramfs buffer is thus a single .cpio.gz file.
|
||||
|
||||
The full format of the initramfs buffer is defined by the following
|
||||
grammar, where::
|
||||
|
||||
* is used to indicate "0 or more occurrences of"
|
||||
(|) indicates alternatives
|
||||
+ indicates concatenation
|
||||
GZIP() indicates the gzip(1) of the operand
|
||||
ALGN(n) means padding with null bytes to an n-byte boundary
|
||||
|
||||
initramfs := ("\0" | cpio_archive | cpio_gzip_archive)*
|
||||
|
||||
cpio_gzip_archive := GZIP(cpio_archive)
|
||||
|
||||
cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
|
||||
|
||||
cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
|
||||
|
||||
cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
|
||||
|
||||
|
||||
In human terms, the initramfs buffer contains a collection of
|
||||
compressed and/or uncompressed cpio archives (in the "newc" or "crc"
|
||||
formats); arbitrary amounts of zero bytes (for padding) can be added
|
||||
between members.
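The ALGN(4) terms in the grammar amount to rounding an offset up to the
next multiple of four. A minimal sketch, assuming offsets are counted
from the start of the archive being written and using the 110-byte
header size that follows from the field table in this document; the
function names are made up for this example.

```c
#include <stddef.h>

/* Round offset up to the next 4-byte boundary (ALGN(4)). */
static size_t cpio_align4(size_t offset)
{
	return (offset + 3) & ~(size_t)3;
}

/*
 * Offset at which the file data starts, given the offset of the header
 * and c_namesize (which already counts the trailing "\0").
 */
static size_t cpio_data_offset(size_t header_offset, size_t namesize)
{
	return cpio_align4(header_offset + 110 + namesize);
}
```

For a header at offset 0 and a 5-byte name field, the data starts at
offset 116, so one null byte of padding follows the name.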
|
||||
|
||||
The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
|
||||
not ignored; see "handling of hard links" below.
|
||||
|
||||
The structure of the cpio_header is as follows (all fields contain
|
||||
hexadecimal ASCII numbers fully padded with '0' on the left to the
|
||||
full width of the field, for example, the integer 4780 is represented
|
||||
by the ASCII string "000012ac"):
|
||||
|
||||
============= ================== ==============================================
Field name    Field size         Meaning
============= ================== ==============================================
c_magic       6 bytes            The string "070701" or "070702"
c_ino         8 bytes            File inode number
c_mode        8 bytes            File mode and permissions
c_uid         8 bytes            File uid
c_gid         8 bytes            File gid
c_nlink       8 bytes            Number of links
c_mtime       8 bytes            Modification time
c_filesize    8 bytes            Size of data field
c_maj         8 bytes            Major part of file device number
c_min         8 bytes            Minor part of file device number
c_rmaj        8 bytes            Major part of device node reference
c_rmin        8 bytes            Minor part of device node reference
c_namesize    8 bytes            Length of filename, including final \0
c_chksum      8 bytes            Checksum of data field if c_magic is 070702;
                                 otherwise zero
============= ================== ==============================================
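The header layout can be produced with a single snprintf() call. The
sketch below is an illustration only: struct cpio_entry and
cpio_format_header() are hypothetical names, the header is always written
with the "070701" magic and a zero c_chksum, and lowercase hexadecimal
digits are used to match the "000012ac" example above.

```c
#include <stdio.h>

#define CPIO_HEADER_LEN 110	/* 6 + 13 * 8 bytes */

/* Hypothetical container for the numeric header fields. */
struct cpio_entry {
	unsigned int ino, mode, uid, gid, nlink, mtime;
	unsigned int filesize, maj, min, rmaj, rmin, namesize;
};

/*
 * Write a "newc" header for e into buf. Returns the number of
 * characters written (110) or -1 if buf is too small.
 */
static int cpio_format_header(char *buf, size_t buflen,
			      const struct cpio_entry *e)
{
	if (buflen < CPIO_HEADER_LEN + 1)
		return -1;

	return snprintf(buf, buflen,
			"070701"
			"%08x%08x%08x%08x%08x%08x"
			"%08x%08x%08x%08x%08x%08x"
			"%08x",
			e->ino, e->mode, e->uid, e->gid, e->nlink, e->mtime,
			e->filesize, e->maj, e->min, e->rmaj, e->rmin,
			e->namesize,
			0u /* c_chksum: zero for magic 070701 */);
}
```

With filesize set to 4780, the eight characters starting at offset 54 of
the result are "000012ac", matching the representation described above.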
|
||||
|
||||
The c_mode field matches the contents of st_mode returned by stat(2)
|
||||
on Linux, and encodes the file type and file permissions.
|
||||
|
||||
The c_filesize should be zero for any file which is not a regular file
|
||||
or symlink.
|
||||
|
||||
The c_chksum field contains a simple 32-bit unsigned sum of all the
|
||||
bytes in the data field. cpio(1) refers to this as "crc", which is
|
||||
clearly incorrect (a cyclic redundancy check is a different and
|
||||
significantly stronger integrity check), however, this is the
|
||||
algorithm used.
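The checksum described above is a plain byte sum, which can be written as
follows. The function name cpio_chksum is chosen for this example; the
sum wraps modulo 2^32 because it is accumulated in a 32-bit unsigned
integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 32-bit unsigned sum of all bytes in the data field. */
static uint32_t cpio_chksum(const unsigned char *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += data[i];

	return sum;
}
```

For the three bytes "abc" the result is 97 + 98 + 99 = 294.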
|
||||
|
||||
If the filename is "TRAILER!!!" this is actually an end-of-archive
|
||||
marker; the c_filesize for an end-of-archive marker must be zero.
|
||||
|
||||
|
||||
Handling of hard links
|
||||
======================
|
||||
|
||||
When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
|
||||
tuple is looked up in a tuple buffer. If not found, it is entered in
|
||||
the tuple buffer and the entry is created as usual; if found, a hard
|
||||
link rather than a second copy of the file is created. It is not
|
||||
necessary (but permitted) to include a second copy of the file
|
||||
contents; if the file contents is not included, the c_filesize field
|
||||
should be set to zero to indicate no data section follows. If data is
|
||||
present, the previous instance of the file is overwritten; this allows
|
||||
the data-carrying instance of a file to occur anywhere in the sequence
|
||||
(GNU cpio is reported to attach the data to the last instance of a
|
||||
file only.)
|
||||
|
||||
c_filesize must not be zero for a symlink.
|
||||
|
||||
When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
|
||||
reset. This permits archives which are generated independently to be
|
||||
concatenated.
|
||||
|
||||
To combine file data from different sources (without having to
|
||||
regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
|
||||
the following techniques can be used:
|
||||
|
||||
a) Separate the different file data sources with a "TRAILER!!!"
|
||||
end-of-archive marker, or
|
||||
|
||||
b) Make sure c_nlink == 1 for all nondirectory entries.
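The tuple buffer behaviour described in this section can be modelled with
a small lookup table. This is a sketch for illustration, not the kernel's
implementation: it uses a fixed-size array with a linear search, and
struct link_table, link_table_lookup_or_add() and link_table_reset() are
names invented here.

```c
#include <stddef.h>

#define LINK_TABLE_MAX 64	/* arbitrary capacity for the sketch */

struct link_key {
	unsigned int maj, min, ino;
};

struct link_table {
	struct link_key keys[LINK_TABLE_MAX];
	size_t count;
};

/*
 * Call for each nondirectory entry with c_nlink > 1.
 * Returns 1 if the tuple was seen before (create a hard link),
 * 0 if it was newly recorded (create the entry as usual),
 * or -1 if the table is full.
 */
static int link_table_lookup_or_add(struct link_table *t, unsigned int maj,
				    unsigned int min, unsigned int ino)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->keys[i].maj == maj && t->keys[i].min == min &&
		    t->keys[i].ino == ino)
			return 1;

	if (t->count == LINK_TABLE_MAX)
		return -1;

	t->keys[t->count].maj = maj;
	t->keys[t->count].min = min;
	t->keys[t->count].ino = ino;
	t->count++;
	return 0;
}

/* Call when a "TRAILER!!!" end-of-archive marker is seen. */
static void link_table_reset(struct link_table *t)
{
	t->count = 0;
}
```

Resetting the table at each trailer is what keeps identical
(c_maj,c_min,c_ino) tuples from independently generated archives from
being mistaken for hard links to each other.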
|
@@ -0,0 +1,154 @@
|
||||
=======================
|
||||
Early userspace support
|
||||
=======================
|
||||
|
||||
Last update: 2004-12-20 tlh
|
||||
|
||||
|
||||
"Early userspace" is a set of libraries and programs that provide
|
||||
various pieces of functionality that are important enough to be
|
||||
available while a Linux kernel is coming up, but that don't need to be
|
||||
run inside the kernel itself.
|
||||
|
||||
It consists of several major infrastructure components:
|
||||
|
||||
- gen_init_cpio, a program that builds a cpio-format archive
|
||||
containing a root filesystem image. This archive is compressed, and
|
||||
the compressed image is linked into the kernel image.
|
||||
- initramfs, a chunk of code that unpacks the compressed cpio image
|
||||
midway through the kernel boot process.
|
||||
- klibc, a userspace C library, currently packaged separately, that is
|
||||
optimized for correctness and small size.
|
||||
|
||||
The cpio file format used by initramfs is the "newc" (aka "cpio -H newc")
|
||||
format, and is documented in the file "buffer-format.rst". There are
|
||||
two ways to add an early userspace image: specify an existing cpio
|
||||
archive to be used as the image or have the kernel build process build
|
||||
the image from specifications.
|
||||
|
||||
CPIO ARCHIVE method
|
||||
-------------------
|
||||
|
||||
You can create a cpio archive that contains the early userspace image.
|
||||
Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it
|
||||
will be used directly. Only a single cpio file may be specified in
|
||||
CONFIG_INITRAMFS_SOURCE and directory and file names are not allowed in
|
||||
combination with a cpio archive.
|
||||
|
||||
IMAGE BUILDING method
|
||||
---------------------
|
||||
|
||||
The kernel build process can also build an early userspace image from
|
||||
source parts rather than supplying a cpio archive. This method provides
|
||||
a way to create images with root-owned files even though the image was
|
||||
built by an unprivileged user.
|
||||
|
||||
The image is specified as one or more sources in
|
||||
CONFIG_INITRAMFS_SOURCE. Sources can be either directories or files -
|
||||
cpio archives are *not* allowed when building from sources.
|
||||
|
||||
A source directory is packaged together with all of its contents. The
|
||||
specified directory name will be mapped to '/'. When packaging a
|
||||
directory, limited user and group ID translation can be performed.
|
||||
INITRAMFS_ROOT_UID can be set to a user ID that needs to be mapped to
|
||||
user root (0). INITRAMFS_ROOT_GID can be set to a group ID that needs
|
||||
to be mapped to group root (0).
|
||||
|
||||
A source file must contain directives in the format required by the
|
||||
usr/gen_init_cpio utility (run 'usr/gen_init_cpio -h' to get the
|
||||
file format). The directives in the file will be passed directly to
|
||||
usr/gen_init_cpio.
|
||||
|
||||
When a combination of directories and files is specified then the
|
||||
initramfs image will be an aggregate of all of them. In this way a user
|
||||
can create a 'root-image' directory and install all files into it.
|
||||
Because device-special files cannot be created by an unprivileged user,
|
||||
special files can be listed in a 'root-files' file. Both 'root-image'
|
||||
and 'root-files' can be listed in CONFIG_INITRAMFS_SOURCE and a complete
|
||||
early userspace image can be built by an unprivileged user.
|
||||
|
||||
As a technical note, when directories and files are specified, the
|
||||
entire CONFIG_INITRAMFS_SOURCE is passed to
|
||||
usr/gen_initramfs_list.sh. This means that CONFIG_INITRAMFS_SOURCE
|
||||
can really be interpreted as any legal argument to
|
||||
gen_initramfs_list.sh. If a directory is specified as an argument then
|
||||
the contents are scanned, uid/gid translation is performed, and
|
||||
usr/gen_init_cpio file directives are output. If a file is
|
||||
specified as an argument to usr/gen_initramfs_list.sh then the
|
||||
contents of the file are simply copied to the output. All of the output
|
||||
directives from directory scanning and file contents copying are
|
||||
processed by usr/gen_init_cpio.
|
||||
|
||||
See also 'usr/gen_initramfs_list.sh -h'.
|
||||
|
||||
Where's this all leading?
|
||||
=========================
|
||||
|
||||
The klibc distribution contains some of the necessary software to make
|
||||
early userspace useful. The klibc distribution is currently
|
||||
maintained separately from the kernel.
|
||||
|
||||
You can obtain somewhat infrequent snapshots of klibc from
|
||||
https://www.kernel.org/pub/linux/libs/klibc/
|
||||
|
||||
For active users, you are better off using the klibc git
|
||||
repository, at http://git.kernel.org/?p=libs/klibc/klibc.git
|
||||
|
||||
The standalone klibc distribution currently provides three components,
|
||||
in addition to the klibc library:
|
||||
|
||||
- ipconfig, a program that configures network interfaces. It can
|
||||
configure them statically, or use DHCP to obtain information
|
||||
dynamically (aka "IP autoconfiguration").
|
||||
- nfsmount, a program that can mount an NFS filesystem.
|
||||
- kinit, the "glue" that uses ipconfig and nfsmount to replace the old
|
||||
support for IP autoconfig, mount a filesystem over NFS, and continue
|
||||
system boot using that filesystem as root.
|
||||
|
||||
kinit is built as a single statically linked binary to save space.
|
||||
|
||||
Eventually, several more chunks of kernel functionality will hopefully
|
||||
move to early userspace:
|
||||
|
||||
- Almost all of init/do_mounts* (the beginning of this is already in
|
||||
place)
|
||||
- ACPI table parsing
|
||||
- Insert unwieldy subsystem that doesn't really need to be in kernel
|
||||
space here
|
||||
|
||||
If kinit doesn't meet your current needs and you've got bytes to burn,
|
||||
the klibc distribution includes a small Bourne-compatible shell (ash)
|
||||
and a number of other utilities, so you can replace kinit and build
|
||||
custom initramfs images that meet your needs exactly.
|
||||
|
||||
For questions and help, you can sign up for the early userspace
|
||||
mailing list at http://www.zytor.com/mailman/listinfo/klibc
|
||||
|
||||
How does it work?
|
||||
=================
|
||||
|
||||
The kernel currently has 3 ways to mount the root filesystem:
|
||||
|
||||
a) all required device and filesystem drivers compiled into the kernel, no
|
||||
initrd. init/main.c:init() will call prepare_namespace() to mount the
|
||||
final root filesystem, based on the root= option and optional init= to run
|
||||
some other init binary than listed at the end of init/main.c:init().
|
||||
|
||||
b) some device and filesystem drivers built as modules and stored in an
|
||||
initrd. The initrd must contain a binary '/linuxrc' which is supposed to
|
||||
load these driver modules. It is also possible to mount the final root
|
||||
filesystem via linuxrc and use the pivot_root syscall. The initrd is
|
||||
mounted and executed via prepare_namespace().
|
||||
|
||||
c) using initramfs. The call to prepare_namespace() must be skipped.
|
||||
This means that a binary must do all the work. Said binary can be stored
|
||||
into initramfs either via modifying usr/gen_init_cpio.c or via the new
|
||||
initrd format, a cpio archive. It must be called "/init". This binary
|
||||
is responsible for doing all the things prepare_namespace() would do.
|
||||
|
||||
To maintain backwards compatibility, the /init binary will only run if it
|
||||
comes via an initramfs cpio archive. If this is not the case,
|
||||
init/main.c:init() will run prepare_namespace() to mount the final root
|
||||
and exec one of the predefined init binaries.
|
||||
|
||||
Bryan O'Sullivan <bos@serpentine.com>
|
18
Documentation/driver-api/early-userspace/index.rst
Normal file
@@ -0,0 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Early Userspace
|
||||
===============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
early_userspace_support
|
||||
buffer-format
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
58
Documentation/driver-api/edid.rst
Normal file
@@ -0,0 +1,58 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====
|
||||
EDID
|
||||
====
|
||||
|
||||
In the good old days when graphics parameters were configured explicitly
|
||||
in a file called xorg.conf, even broken hardware could be managed.
|
||||
|
||||
Today, with the advent of Kernel Mode Setting, a graphics board is
|
||||
either correctly working because all components follow the standards -
|
||||
or the computer is unusable, because the screen remains dark after
|
||||
booting or it displays the wrong area. Cases when this happens are:
|
||||
- The graphics board does not recognize the monitor.
|
||||
- The graphics board is unable to detect any EDID data.
|
||||
- The graphics board incorrectly forwards EDID data to the driver.
|
||||
- The monitor sends no or bogus EDID data.
|
||||
- A KVM sends its own EDID data instead of querying the connected monitor.
|
||||
Adding the kernel parameter "nomodeset" helps in most cases, but causes
|
||||
restrictions later on.
|
||||
|
||||
As a remedy for such situations, the kernel configuration item
|
||||
CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows you to provide an
|
||||
individually prepared or corrected EDID data set in the /lib/firmware
|
||||
directory from where it is loaded via the firmware interface. The code
|
||||
(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
|
||||
commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
|
||||
1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
|
||||
not contain code to create these data. In order to elucidate the origin
|
||||
of the built-in binary EDID blobs and to facilitate the creation of
|
||||
individual data for a specific misbehaving monitor, commented sources
|
||||
and a Makefile environment are given here.
|
||||
|
||||
To create binary EDID and C source code files from the existing data
|
||||
material, simply type "make".
|
||||
|
||||
If you want to create your own EDID file, copy the file 1024x768.S,
|
||||
replace the settings with your own data and add a new target to the
|
||||
Makefile. Please note that the EDID data structure expects the timing
|
||||
values in a different way as compared to the standard X11 format.
|
||||
|
||||
X11:
  HTimings:
    hdisp hsyncstart hsyncend htotal
  VTimings:
    vdisp vsyncstart vsyncend vtotal
|
||||
|
||||
EDID::
|
||||
|
||||
#define XPIX hdisp
|
||||
#define XBLANK htotal-hdisp
|
||||
#define XOFFSET hsyncstart-hdisp
|
||||
#define XPULSE hsyncend-hsyncstart
|
||||
|
||||
#define YPIX vdisp
|
||||
#define YBLANK vtotal-vdisp
|
||||
#define YOFFSET vsyncstart-vdisp
|
||||
#define YPULSE vsyncend-vsyncstart
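The mapping from X11 timings to the EDID values is plain subtraction, as
the sketch below shows. The structure and function names are invented for
this example; the same function serves both the horizontal and the
vertical direction because the two sets of defines have the same shape.

```c
/* One direction of an X11 mode line: display, sync start/end, total. */
struct x11_timing {
	int disp, syncstart, syncend, total;
};

/* The corresponding EDID values (XPIX/XBLANK/XOFFSET/XPULSE or Y...). */
struct edid_timing {
	int pix, blank, offset, pulse;
};

/* Convert one direction of an X11 timing to the EDID representation. */
static struct edid_timing x11_to_edid(struct x11_timing t)
{
	struct edid_timing e;

	e.pix = t.disp;
	e.blank = t.total - t.disp;
	e.offset = t.syncstart - t.disp;
	e.pulse = t.syncend - t.syncstart;
	return e;
}
```

As sample input only (not taken from the 1024x768.S file), horizontal
timings of 1024 1048 1184 1344 give XPIX 1024, XBLANK 320, XOFFSET 24 and
XPULSE 136, and vertical timings of 768 771 777 806 give YPIX 768,
YBLANK 38, YOFFSET 3 and YPULSE 6.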
|
230
Documentation/driver-api/eisa.rst
Normal file
@@ -0,0 +1,230 @@
|
||||
================
|
||||
EISA bus support
|
||||
================
|
||||
|
||||
:Author: Marc Zyngier <maz@wild-wind.fr.eu.org>
|
||||
|
||||
This document groups random notes about porting EISA drivers to the
|
||||
new EISA/sysfs API.
|
||||
|
||||
Starting from version 2.5.59, the EISA bus is almost given the same
|
||||
status as other much more mainstream busses such as PCI or USB. This
|
||||
has been possible through sysfs, which defines a nice enough set of
|
||||
abstractions to manage busses, devices and drivers.
|
||||
|
||||
Although the new API is quite simple to use, converting existing
|
||||
drivers to the new infrastructure is not an easy task (mostly because
|
||||
detection code is generally also used to probe ISA cards). Moreover,
|
||||
most EISA drivers are among the oldest Linux drivers so, as you can
|
||||
imagine, some dust has settled here over the years.
|
||||
|
||||
The EISA infrastructure is made up of three parts:
|
||||
|
||||
- The bus code implements most of the generic code. It is shared
|
||||
among all the architectures that the EISA code runs on. It
|
||||
implements bus probing (detecting EISA cards available on the bus),
|
||||
allocates I/O resources, allows fancy naming through sysfs, and
|
||||
offers interfaces for drivers to register.
|
||||
|
||||
- The bus root driver implements the glue between the bus hardware
|
||||
and the generic bus code. It is responsible for discovering the
|
||||
device implementing the bus, and setting it up to be later probed
|
||||
by the bus code. This can go from something as simple as reserving
|
||||
an I/O region on x86, to the rather more complex, like the hppa
|
||||
EISA code. This is the part to implement in order to have EISA
|
||||
running on a "new" platform.
|
||||
|
||||
- The driver offers the bus a list of devices that it manages, and
|
||||
implements the necessary callbacks to probe and release devices
|
||||
whenever told to.
|
||||
|
||||
Every function/structure below lives in <linux/eisa.h>, which depends
|
||||
heavily on <linux/device.h>.
|
||||
|
||||
Bus root driver
|
||||
===============
|
||||
|
||||
::
|
||||
|
||||
int eisa_root_register (struct eisa_root_device *root);
|
||||
|
||||
The eisa_root_register function is used to declare a device as the
|
||||
root of an EISA bus. The eisa_root_device structure holds a reference
|
||||
to this device, as well as some parameters for probing purposes::
|
||||
|
||||
struct eisa_root_device {
|
||||
struct device *dev; /* Pointer to bridge device */
|
||||
struct resource *res;
|
||||
unsigned long bus_base_addr;
|
||||
int slots; /* Max slot number */
|
||||
int force_probe; /* Probe even when no slot 0 */
|
||||
u64 dma_mask; /* from bridge device */
|
||||
int bus_nr; /* Set by eisa_root_register */
|
||||
struct resource eisa_root_res; /* ditto */
|
||||
};
|
||||
|
||||
============= ======================================================
|
||||
node used for eisa_root_register internal purpose
|
||||
dev pointer to the root device
|
||||
res root device I/O resource
|
||||
bus_base_addr slot 0 address on this bus
|
||||
slots max slot number to probe
|
||||
force_probe Probe even when slot 0 is empty (no EISA mainboard)
|
||||
dma_mask Default DMA mask. Usually the bridge device dma_mask.
|
||||
bus_nr unique bus id, set by eisa_root_register
|
||||
============= ======================================================
|
||||
|
||||
Driver
|
||||
======
|
||||
|
||||
::
|
||||
|
||||
int eisa_driver_register (struct eisa_driver *edrv);
|
||||
void eisa_driver_unregister (struct eisa_driver *edrv);
|
||||
|
||||
Clear enough?
|
||||
|
||||
::
|
||||
|
||||
struct eisa_device_id {
|
||||
char sig[EISA_SIG_LEN];
|
||||
unsigned long driver_data;
|
||||
};
|
||||
|
||||
struct eisa_driver {
|
||||
const struct eisa_device_id *id_table;
|
||||
struct device_driver driver;
|
||||
};
|
||||
|
||||
=============== ====================================================
|
||||
id_table an array of NULL terminated EISA id strings,
|
||||
followed by an empty string. Each string can
|
||||
optionally be paired with a driver-dependent value
|
||||
(driver_data).
|
||||
|
||||
driver a generic driver, such as described in
|
||||
Documentation/driver-api/driver-model/driver.rst. Only .name,
|
||||
.probe and .remove members are mandatory.
|
||||
=============== ====================================================
|
||||
|
||||
An example is the 3c59x driver::
|
||||
|
||||
static struct eisa_device_id vortex_eisa_ids[] = {
|
||||
{ "TCM5920", EISA_3C592_OFFSET },
|
||||
{ "TCM5970", EISA_3C597_OFFSET },
|
||||
{ "" }
|
||||
};
|
||||
|
||||
static struct eisa_driver vortex_eisa_driver = {
|
||||
.id_table = vortex_eisa_ids,
|
||||
.driver = {
|
||||
.name = "3c59x",
|
||||
.probe = vortex_eisa_probe,
|
||||
.remove = vortex_eisa_remove
|
||||
}
|
||||
};
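To illustrate how an id table ending in an empty string can be walked, here is a userspace sketch. The helper find_eisa_id() is hypothetical, the value of EISA_SIG_LEN is an assumption made for this example, and the real matching is done by the EISA bus code, which also takes the device state into account.

```c
#include <stddef.h>
#include <string.h>

#define EISA_SIG_LEN 8	/* assumed here: 7 characters plus the NUL byte */

/* Userspace copy of the structure shown above. */
struct eisa_device_id {
	char sig[EISA_SIG_LEN];
	unsigned long driver_data;
};

/*
 * Hypothetical helper: walk an id table until the empty-string
 * terminator and return the entry whose signature equals sig,
 * or NULL when no entry matches.
 */
const struct eisa_device_id *
find_eisa_id(const struct eisa_device_id *table, const char *sig)
{
	const struct eisa_device_id *id;

	for (id = table; id->sig[0] != '\0'; id++) {
		if (strcmp(id->sig, sig) == 0)
			return id;
	}
	return NULL;
}
```

The driver_data of the matching entry is what a driver would later see in the device's id.driver_data.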
|
||||
|
||||
Device
|
||||
======
|
||||
|
||||
The sysfs framework calls .probe and .remove functions upon device
|
||||
discovery and removal (note that the .remove function is only called
|
||||
when the driver is built as a module).
|
||||
|
||||
Both functions are passed a pointer to a 'struct device', which is
|
||||
encapsulated in a 'struct eisa_device' described as follows::
|
||||
|
||||
struct eisa_device {
|
||||
struct eisa_device_id id;
|
||||
int slot;
|
||||
int state;
|
||||
unsigned long base_addr;
|
||||
struct resource res[EISA_MAX_RESOURCES];
|
||||
u64 dma_mask;
|
||||
struct device dev; /* generic device */
|
||||
};
|
||||
|
||||
======== ============================================================
|
||||
id EISA id, as read from device. id.driver_data is set from the
|
||||
matching driver EISA id.
|
||||
slot slot number which the device was detected on
|
||||
state set of flags indicating the state of the device. Current
|
||||
flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
|
||||
res set of four 256-byte I/O regions allocated to this device
|
||||
dma_mask DMA mask set from the parent device.
|
||||
dev generic device (see Documentation/driver-api/driver-model/device.rst)
|
||||
======== ============================================================
|
||||
|
||||
You can get the 'struct eisa_device' from 'struct device' using the
|
||||
'to_eisa_device' macro.
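The idea behind such a macro is the usual "container of" pattern: subtract the offset of the embedded member from its address. Below is a userspace sketch with reduced stand-in structures; to_eisa_device_sketch and probe_slot are names made up for this example, and the kernel's own definition lives in <linux/eisa.h>.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel structures; only the fields needed here. */
struct device {
	void *driver_data;
};

struct eisa_device {
	int slot;
	struct device dev;	/* generic device, embedded by value */
};

/*
 * Recover the enclosing eisa_device from a pointer to its embedded
 * struct device by subtracting the member offset.
 */
#define to_eisa_device_sketch(d) \
	((struct eisa_device *)((char *)(d) - offsetof(struct eisa_device, dev)))

/* Example probe-style callback: it only receives the generic device. */
int probe_slot(struct device *dev)
{
	struct eisa_device *edev = to_eisa_device_sketch(dev);

	return edev->slot;
}
```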
|
||||
|
||||
Misc stuff
|
||||
==========
|
||||
|
||||
::
|
||||
|
||||
void eisa_set_drvdata (struct eisa_device *edev, void *data);
|
||||
|
||||
Stores data into the device's driver_data area.
|
||||
|
||||
::
|
||||
|
||||
void *eisa_get_drvdata (struct eisa_device *edev);
|
||||
|
||||
Gets the pointer previously stored into the device's driver_data area.
|
||||
|
||||
::
|
||||
|
||||
int eisa_get_region_index (void *addr);
|
||||
|
||||
Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given
|
||||
address.
|
||||
|
||||
Kernel parameters
|
||||
=================
|
||||
|
||||
eisa_bus.enable_dev
|
||||
A comma-separated list of slots to be enabled, even if the firmware
|
||||
set the card as disabled. The driver must be able to properly
|
||||
initialize the device in such conditions.
|
||||
|
||||
eisa_bus.disable_dev
|
||||
A comma-separated list of slots to be disabled, even if the firmware
|
||||
set the card as enabled. The driver won't be called to handle this
|
||||
device.
|
||||
|
||||
virtual_root.force_probe
|
||||
Force the probing code to probe EISA slots even when it cannot find an
|
||||
EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
|
||||
(don't force), and set to 1 (force probing) when either
|
||||
CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING is set.
|
||||
|
||||
Random notes
|
||||
============
|
||||
|
||||
Converting an EISA driver to the new API mostly involves *deleting*
|
||||
code (since probing is now in the core EISA code). Unfortunately, most
|
||||
drivers share their probing routine between ISA and EISA. Special
|
||||
care must be taken when ripping out the EISA code, so other busses
|
||||
won't suffer from these surgical strikes...
|
||||
|
||||
You *must not* expect any EISA device to be detected when returning
|
||||
from eisa_driver_register, since the chances are that the bus has not
|
||||
yet been probed. In fact, that's what happens most of the time (the
|
||||
bus root driver usually kicks in rather late in the boot process).
|
||||
Unfortunately, most drivers are doing the probing by themselves, and
|
||||
expect to have explored the whole machine when they exit their probe
|
||||
routine.
|
||||
|
||||
For example, switching your favorite EISA SCSI card to the "hotplug"
|
||||
model is "the right thing"(tm).
|
||||
|
||||
Thanks
|
||||
======
|
||||
|
||||
I'd like to thank the following people for their help:
|
||||
|
||||
- Xavier Benigni for lending me a wonderful Alpha Jensen,
|
||||
- James Bottomley, Jeff Garzik for getting this stuff into the kernel,
|
||||
- Andries Brouwer for contributing numerous EISA ids,
|
||||
- Catrin Jones for coping with far too many machines at home.
|
@@ -399,7 +399,7 @@ symbol:
|
||||
will pass the struct gpio_chip* for the chip to all IRQ callbacks, so the
|
||||
callbacks need to embed the gpio_chip in its state container and obtain a
|
||||
pointer to the container using container_of().
|
||||
(See Documentation/driver-model/design-patterns.rst)
|
||||
(See Documentation/driver-api/driver-model/design-patterns.rst)
|
||||
|
||||
- gpiochip_irqchip_add_nested(): adds a nested cascaded irqchip to a gpiochip,
|
||||
as discussed above regarding different types of cascaded irqchips. The
|
||||
|
@@ -14,8 +14,10 @@ available subsections can be seen below.
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
driver-model/index
|
||||
basics
|
||||
infrastructure
|
||||
early-userspace/index
|
||||
pm/index
|
||||
clk
|
||||
device-io
|
||||
@@ -36,6 +38,7 @@ available subsections can be seen below.
|
||||
i2c
|
||||
ipmb
|
||||
i3c/index
|
||||
interconnect
|
||||
hsi
|
||||
edac
|
||||
scsi
|
||||
@@ -44,8 +47,11 @@ available subsections can be seen below.
|
||||
mtdnand
|
||||
miscellaneous
|
||||
mei/index
|
||||
mtd/index
|
||||
mmc/index
|
||||
nvdimm/index
|
||||
w1
|
||||
rapidio
|
||||
rapidio/index
|
||||
s390-drivers
|
||||
vme
|
||||
80211/index
|
||||
@@ -53,13 +59,48 @@ available subsections can be seen below.
|
||||
firmware/index
|
||||
pinctl
|
||||
gpio/index
|
||||
md/index
|
||||
misc_devices
|
||||
nfc/index
|
||||
dmaengine/index
|
||||
slimbus
|
||||
soundwire/index
|
||||
fpga/index
|
||||
acpi/index
|
||||
backlight/lp855x-driver.rst
|
||||
bt8xxgpio
|
||||
connector
|
||||
console
|
||||
dcdbas
|
||||
dell_rbu
|
||||
edid
|
||||
eisa
|
||||
isa
|
||||
isapnp
|
||||
generic-counter
|
||||
lightnvm-pblk
|
||||
memory-devices/index
|
||||
men-chameleon-bus
|
||||
ntb
|
||||
nvmem
|
||||
parport-lowlevel
|
||||
pps
|
||||
ptp
|
||||
phy/index
|
||||
pti_intel_mid
|
||||
pwm
|
||||
rfkill
|
||||
serial/index
|
||||
sgi-ioc4
|
||||
sm501
|
||||
smsc_ece1099
|
||||
switchtec
|
||||
sync_file
|
||||
vfio-mediated-device
|
||||
vfio
|
||||
xilinx/index
|
||||
xillybus
|
||||
zorro
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
|
93
Documentation/driver-api/interconnect.rst
Normal file
@@ -0,0 +1,93 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================
|
||||
GENERIC SYSTEM INTERCONNECT SUBSYSTEM
|
||||
=====================================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
This framework is designed to provide a standard kernel interface to control
|
||||
the settings of the interconnects on an SoC. These settings can be throughput,
|
||||
latency and priority between multiple interconnected devices or functional
|
||||
blocks. This can be controlled dynamically in order to save power or provide
|
||||
maximum performance.
|
||||
|
||||
The interconnect bus is hardware with configurable parameters, which can be
|
||||
set on a data path according to the requests received from various drivers.
|
||||
Examples of interconnect buses are the interconnects between various
|
||||
components or functional blocks in chipsets. There can be multiple interconnects
|
||||
on an SoC that can be multi-tiered.
|
||||
|
||||
Below is a simplified diagram of a real-world SoC interconnect bus topology.
|
||||
|
||||
::
|
||||
|
||||
+----------------+ +----------------+
|
||||
| HW Accelerator |--->| M NoC |<---------------+
|
||||
+----------------+ +----------------+ |
|
||||
| | +------------+
|
||||
+-----+ +-------------+ V +------+ | |
|
||||
| DDR | | +--------+ | PCIe | | |
|
||||
+-----+ | | Slaves | +------+ | |
|
||||
^ ^ | +--------+ | | C NoC |
|
||||
| | V V | |
|
||||
+------------------+ +------------------------+ | | +-----+
|
||||
| |-->| |-->| |-->| CPU |
|
||||
| |-->| |<--| | +-----+
|
||||
| Mem NoC | | S NoC | +------------+
|
||||
| |<--| |---------+ |
|
||||
| |<--| |<------+ | | +--------+
|
||||
+------------------+ +------------------------+ | | +-->| Slaves |
|
||||
^ ^ ^ ^ ^ | | +--------+
|
||||
| | | | | | V
|
||||
+------+ | +-----+ +-----+ +---------+ +----------------+ +--------+
|
||||
| CPUs | | | GPU | | DSP | | Masters |-->| P NoC |-->| Slaves |
|
||||
+------+ | +-----+ +-----+ +---------+ +----------------+ +--------+
|
||||
|
|
||||
+-------+
|
||||
| Modem |
|
||||
+-------+
|
||||
|
||||
Terminology
|
||||
-----------
|
||||
|
||||
Interconnect provider is the software definition of the interconnect hardware.
|
||||
The interconnect providers on the above diagram are M NoC, S NoC, C NoC, P NoC
|
||||
and Mem NoC.
|
||||
|
||||
Interconnect node is the software definition of the interconnect hardware
|
||||
port. Each interconnect provider consists of multiple interconnect nodes,
|
||||
which are connected to other SoC components including other interconnect
|
||||
providers. The point on the diagram where the CPUs connect to the memory is
|
||||
called an interconnect node, which belongs to the Mem NoC interconnect provider.
|
||||
|
||||
Interconnect endpoints are the first or the last element of the path. Every
|
||||
endpoint is a node, but not every node is an endpoint.
|
||||
|
||||
Interconnect path is everything between two endpoints including all the nodes
|
||||
that have to be traversed to reach from a source to destination node. It may
|
||||
include multiple master-slave pairs across several interconnect providers.
|
||||
|
||||
Interconnect consumers are the entities which make use of the data paths exposed
|
||||
by the providers. The consumers send requests to providers requesting various
|
||||
throughput, latency and priority. Usually the consumers are device drivers that
|
||||
send requests based on their needs. An example of a consumer is a video decoder
|
||||
that supports various formats and image sizes.
|
||||
|
||||
Interconnect providers
|
||||
----------------------
|
||||
|
||||
Interconnect provider is an entity that implements methods to initialize and
|
||||
configure interconnect bus hardware. The interconnect provider drivers should
|
||||
be registered with the interconnect provider core.
|
||||
|
||||
.. kernel-doc:: include/linux/interconnect-provider.h
|
||||
|
||||
Interconnect consumers
|
||||
----------------------
|
||||
|
||||
Interconnect consumers are the clients which use the interconnect APIs to
|
||||
get paths between endpoints and set their bandwidth/latency/QoS requirements
|
||||
for these interconnect paths. These interfaces are not currently
|
||||
documented.
|
122
Documentation/driver-api/isa.rst
Normal file
@@ -0,0 +1,122 @@
|
||||
===========
|
||||
ISA Drivers
|
||||
===========
|
||||
|
||||
The following text is adapted from the commit message of the initial
|
||||
commit of the ISA bus driver authored by Rene Herman.
|
||||
|
||||
During the recent "isa drivers using platform devices" discussion it was
|
||||
pointed out that (ALSA) ISA drivers ran into the problem of not having
|
||||
the option to fail driver load (device registration rather) upon not
|
||||
finding their hardware due to a probe() error not being passed up
|
||||
through the driver model. In the course of that, I suggested a separate
|
||||
ISA bus might be best; Russell King agreed and suggested this bus could
|
||||
use the .match() method for the actual device discovery.
|
||||
|
||||
The attached does this. For this old non (generically) discoverable ISA
|
||||
hardware only the driver itself can do discovery so as a difference with
|
||||
the platform_bus, this isa_bus also distributes match() up to the
|
||||
driver.
|
||||
|
||||
As another difference: these devices only exist in the driver model due
|
||||
to the driver creating them because it might want to drive them, meaning
|
||||
that all device creation has been made internal as well.
|
||||
|
||||
The usage model this provides is nice, and has been acked from the ALSA
|
||||
side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's
|
||||
now (for oldisa-only drivers) become::
|
||||
|
||||
static int __init alsa_card_foo_init(void)
|
||||
{
|
||||
return isa_register_driver(&snd_foo_isa_driver, SNDRV_CARDS);
|
||||
}
|
||||
|
||||
static void __exit alsa_card_foo_exit(void)
|
||||
{
|
||||
isa_unregister_driver(&snd_foo_isa_driver);
|
||||
}
|
||||
|
||||
Quite like the other bus models therefore. This removes a lot of
|
||||
duplicated init code from the ALSA ISA drivers.
|
||||
|
||||
The passed in isa_driver struct is the regular driver struct embedding a
|
||||
struct device_driver, the normal probe/remove/shutdown/suspend/resume
|
||||
callbacks, and as indicated that .match callback.
|
||||
|
||||
The "SNDRV_CARDS" you see being passed in is an "unsigned int ndev"
|
||||
parameter, indicating how many devices to create and call our methods
|
||||
with.
|
||||
|
||||
The platform_driver callbacks are called with a platform_device param;
|
||||
the isa_driver callbacks are being called with a ``struct device *dev,
|
||||
unsigned int id`` pair directly -- with the device creation completely
|
||||
internal to the bus it's much cleaner to not leak isa_dev's by passing
|
||||
them in at all. The id is the only thing we ever want other than the
|
||||
struct device anyways, and it makes for nicer code in the callbacks as
|
||||
well.
|
||||
|
||||
With this additional .match() callback ISA drivers have all options. If
|
||||
ALSA would want to keep the old non-load behaviour, it could stick all
|
||||
of the old .probe in .match, which would only keep them registered after
|
||||
everything was found to be present and accounted for. If it wanted the
|
||||
behaviour of always loading as it inadvertently did for a bit after the
|
||||
changeover to platform devices, it could just not provide a .match() and
|
||||
do everything in .probe() as before.
|
||||
|
||||
If it, as Takashi Iwai already suggested earlier as a way of following
|
||||
the model from saner buses more closely, wants to load when a later bind
|
||||
could conceivably succeed, it could use .match() for the prerequisites
|
||||
(such as checking the user wants the card enabled and that port/irq/dma
|
||||
values have been passed in) and .probe() for everything else. This is
|
||||
the nicest model.
|
||||
|
||||
To the code...
|
||||
|
||||
This exports only two functions: isa_register_driver() and isa_unregister_driver().
|
||||
|
||||
isa_register_driver() registers the struct device_driver, and then
|
||||
loops over the passed in ndev creating devices and registering them.
|
||||
This causes the bus match method to be called for them, which is::
|
||||
|
||||
int isa_bus_match(struct device *dev, struct device_driver *driver)
|
||||
{
|
||||
struct isa_driver *isa_driver = to_isa_driver(driver);
|
||||
|
||||
if (dev->platform_data == isa_driver) {
|
||||
if (!isa_driver->match ||
|
||||
isa_driver->match(dev, to_isa_dev(dev)->id))
|
||||
return 1;
|
||||
dev->platform_data = NULL;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
The first thing this does is check if this device is in fact one of this
|
||||
driver's devices by seeing if the device's platform_data pointer is set
|
||||
to this driver. Platform devices compare strings, but we don't need to
|
||||
do that with everything being internal, so isa_register_driver() abuses
|
||||
dev->platform_data as an isa_driver pointer which we can then check here.
|
||||
I believe platform_data is available for this, but if rather not, moving
|
||||
the isa_driver pointer to the private struct isa_dev is of course fine as
|
||||
well.
|
||||
|
||||
Then, if the driver did not provide a .match, it matches. If it did,
|
||||
the driver match() method is called to determine a match.
|
||||
|
||||
If it did **not** match, dev->platform_data is reset to indicate this to
|
||||
isa_register_driver which can then unregister the device again.
|
||||
|
||||
If during all this, there's any error, or no devices matched at all
|
||||
everything is backed out again and the error, or -ENODEV, is returned.
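The overall outcome of the registration loop can be modelled in userspace as follows. This is only a sketch of the behaviour described above: register_model() and the example callbacks are hypothetical, and no real devices are created or unregistered.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's .match callback. */
typedef int (*match_fn)(unsigned int id);

/*
 * Model of the registration loop: try ndev device ids and count
 * those that match. A NULL match callback accepts every device,
 * as in isa_bus_match(). Returns 0 when at least one device was
 * kept, or -ENODEV when none matched.
 */
int register_model(match_fn match, unsigned int ndev, unsigned int *nkept)
{
	unsigned int id, count = 0;

	for (id = 0; id < ndev; id++) {
		if (!match || match(id))
			count++;	/* device stays registered */
		/* otherwise the device would be unregistered again */
	}
	*nkept = count;
	return count ? 0 : -ENODEV;
}

/* Example callback: pretend only even-numbered cards are present. */
int match_even(unsigned int id)
{
	return (id % 2) == 0;
}

/* Example callback: no hardware found at all. */
int match_none(unsigned int id)
{
	(void)id;
	return 0;
}
```

With match_even and ndev set to 5, three devices remain; with match_none the call fails with -ENODEV, which is the "fail driver load" behaviour the ALSA drivers were missing.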
|
||||
|
||||
isa_unregister_driver() just unregisters the matched devices and the
|
||||
driver itself.
|
||||
|
||||
module_isa_driver is a helper macro for ISA drivers which do not do
|
||||
anything special in module init/exit. This eliminates a lot of
|
||||
boilerplate code. Each module may only use this macro once, and calling
|
||||
it replaces module_init and module_exit.
|
||||
|
||||
max_num_isa_dev is a macro to determine the maximum possible number of
|
||||
ISA devices which may be registered in the I/O port address space given
|
||||
the address extent of the ISA devices.
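As a worked example of that calculation, assume the ISA I/O port space offers 1024 possible base addresses (0x000 through 0x3FF); the maximum device count is then that number divided by the address extent. The macro name below is a local stand-in for illustration; consult include/linux/isa.h for the kernel's definition.

```c
/* Assumed number of possible ISA base addresses (0x000..0x3FF). */
#define ISA_BASE_ADDRESSES 1024

/* Local stand-in: maximum number of devices for a given address extent. */
#define max_num_isa_dev_sketch(ext) (ISA_BASE_ADDRESSES / (ext))

/* Example: a hypothetical device occupying 16 consecutive ports. */
enum { MAX_FOO_DEVS = max_num_isa_dev_sketch(16) };
```

Under these assumptions a 16-port device can be registered at most 64 times, so a driver could size its per-device parameter arrays with MAX_FOO_DEVS.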
|
15
Documentation/driver-api/isapnp.rst
Normal file
@@ -0,0 +1,15 @@
|
||||
==========================================================
|
||||
ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz>
|
||||
==========================================================
|
||||
|
||||
Interface /proc/isapnp
|
||||
======================
|
||||
|
||||
The interface has been removed. See pnp.txt for more details.
|
||||
|
||||
Interface /proc/bus/isapnp
|
||||
==========================
|
||||
|
||||
This directory allows access to ISA PnP cards and logical devices.
|
||||
The regular files contain the contents of ISA PnP registers for
|
||||
a logical device.
|
21
Documentation/driver-api/lightnvm-pblk.rst
Normal file
@@ -0,0 +1,21 @@
|
||||
pblk: Physical Block Device Target
|
||||
==================================
|
||||
|
||||
pblk implements a fully associative, host-based FTL that exposes a traditional
|
||||
block I/O interface. Its primary responsibilities are:
|
||||
|
||||
- Map logical addresses onto physical addresses (4KB granularity) in a
|
||||
logical-to-physical (L2P) table.
|
||||
- Maintain the integrity and consistency of the L2P table as well as its
|
||||
recovery from normal tear down and power outage.
|
||||
- Deal with controller- and media-specific constraints.
|
||||
- Handle I/O errors.
|
||||
- Implement garbage collection.
|
||||
- Maintain consistency across the I/O stack during synchronization points.
|
||||
|
||||
For more information please refer to:
|
||||
|
||||
http://lightnvm.io
|
||||
|
||||
which maintains updated FAQs, manual pages, technical documentation, tools,
|
||||
contacts, etc.
|
12
Documentation/driver-api/md/index.rst
Normal file
@@ -0,0 +1,12 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====
|
||||
RAID
|
||||
====
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
md-cluster
|
||||
raid5-cache
|
||||
raid5-ppl
|
385
Documentation/driver-api/md/md-cluster.rst
Normal file
@@ -0,0 +1,385 @@
|
||||
==========
|
||||
MD Cluster
|
||||
==========
|
||||
|
||||
The cluster MD is a shared-device RAID for a cluster; it supports
|
||||
two levels: raid1 and raid10 (limited support).
|
||||
|
||||
|
||||
1. On-disk format
|
||||
=================
|
||||
|
||||
Separate write-intent-bitmaps are used for each cluster node.
|
||||
The bitmaps record all writes that may have been started on that node,
|
||||
and may not yet have finished. The on-disk layout is::
|
||||
|
||||
0 4k 8k 12k
|
||||
-------------------------------------------------------------------
|
||||
| idle | md super | bm super [0] + bits |
|
||||
| bm bits[0, contd] | bm super[1] + bits | bm bits[1, contd] |
|
||||
| bm super[2] + bits | bm bits [2, contd] | bm super[3] + bits |
|
||||
| bm bits [3, contd] | | |
|
||||
|
||||
During "normal" functioning we assume the filesystem ensures that only
|
||||
one node writes to any given block at a time, so a write request will
|
||||
|
||||
- set the appropriate bit (if not already set)
|
||||
- commit the write to all mirrors
|
||||
- schedule the bit to be cleared after a timeout.
|
||||
|
||||
Reads are just handled normally. It is up to the filesystem to ensure
|
||||
one node doesn't read from a location where another node (or the same
|
||||
node) is writing.
|
||||
|
||||
|
||||
2. DLM Locks for management
|
||||
===========================
|
||||
|
||||
There are three groups of locks for managing the device:
|
||||
|
||||
2.1 Bitmap lock resource (bm_lockres)
|
||||
-------------------------------------
|
||||
|
||||
The bm_lockres protects individual node bitmaps. They are named in
|
||||
the form bitmap000 for node 1, bitmap001 for node 2 and so on. When a
|
||||
node joins the cluster, it acquires the lock in PW mode and it stays
|
||||
so during the lifetime the node is part of the cluster. The lock
|
||||
resource number is based on the slot number returned by the DLM
|
||||
subsystem. Since DLM starts node count from one and bitmap slots
|
||||
start from zero, one is subtracted from the DLM slot number to arrive
|
||||
at the bitmap slot number.
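The naming rule can be sketched in a few lines of C. bitmap_lock_name() is a hypothetical helper written for this example; it is not the function md-cluster uses.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the bitmap lock resource name for a
 * node from its DLM slot number (DLM slots start at 1, bitmap
 * slots at 0). Returns the bitmap slot, or -1 on invalid input.
 */
int bitmap_lock_name(int dlm_slot, char *buf, size_t len)
{
	int slot = dlm_slot - 1;

	if (dlm_slot < 1)
		return -1;
	snprintf(buf, len, "bitmap%03d", slot);
	return slot;
}
```

DLM slot 1 therefore maps to "bitmap000" and DLM slot 2 to "bitmap001", matching the names given above.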
|
||||
|
||||
The LVB of the bitmap lock for a particular node records the range
|
||||
of sectors that are being re-synced by that node. No other
|
||||
node may write to those sectors. This is used when a new node
|
||||
joins the cluster.
|
||||
|
||||
2.2 Message passing locks
|
||||
-------------------------
|
||||
|
||||
Each node has to communicate with other nodes when starting or ending
|
||||
resync, and for metadata superblock updates. This communication is
|
||||
managed through three locks: "token", "message", and "ack", together
|
||||
with the Lock Value Block (LVB) of the "message" lock.
|
||||
|
||||
2.3 new-device management
|
||||
-------------------------
|
||||
|
||||
A single lock: "no-new-dev" is used to co-ordinate the addition of
|
||||
new devices - this must be synchronized across the array.
|
||||
Normally all nodes hold a concurrent-read lock on this device.
|
||||
|
||||
3. Communication
|
||||
================
|
||||
|
||||
Messages can be broadcast to all nodes, and the sender waits for all
|
||||
other nodes to acknowledge the message before proceeding. Only one
|
||||
message can be processed at a time.
|
||||
|
||||
3.1 Message Types
|
||||
-----------------
|
||||
|
||||
There are six types of messages which are passed:
|
||||
|
||||
3.1.1 METADATA_UPDATED
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
informs other nodes that the metadata has
|
||||
been updated, and the node must re-read the md superblock. This is
|
||||
performed synchronously. It is primarily used to signal device
|
||||
failure.
|
||||
|
||||
3.1.2 RESYNCING
|
||||
^^^^^^^^^^^^^^^
|
||||
informs other nodes that a resync is initiated or
|
||||
ended so that each node may suspend or resume the region. Each
|
||||
RESYNCING message identifies a range of the devices that the
|
||||
sending node is about to resync. This overrides any previous
|
||||
notification from that node: only one range can be resynced at a
|
||||
time per-node.
|
||||
|
||||
3.1.3 NEWDISK
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
informs other nodes that a device is being added to
|
||||
the array. Message contains an identifier for that device. See
|
||||
below for further details.
|
||||
|
||||
3.1.4 REMOVE
|
||||
^^^^^^^^^^^^
|
||||
|
||||
A failed or spare device is being removed from the
|
||||
array. The slot-number of the device is included in the message.
|
||||
|
||||
3.1.5 RE_ADD
^^^^^^^^^^^^
|
||||
|
||||
A failed device is being re-activated - the assumption
|
||||
is that it has been determined to be working again.
|
||||
|
||||
3.1.6 BITMAP_NEEDS_SYNC:
|
||||
|
||||
If a node is stopped locally but the bitmap
|
||||
isn't clean, then another node is informed to take the ownership of
|
||||
resync.
|
||||
|
||||
3.2 Communication mechanism
|
||||
---------------------------
|
||||
|
||||
The DLM LVB is used to communicate between nodes of the cluster. There
|
||||
are three resources used for the purpose:
|
||||
|
||||
3.2.1 token
|
||||
^^^^^^^^^^^
|
||||
The resource which protects the entire communication
|
||||
system. The node having the token resource is allowed to
|
||||
communicate.
|
||||
|
||||
3.2.2 message
|
||||
^^^^^^^^^^^^^
|
||||
The lock resource which carries the data to communicate.
|
||||
|
||||
3.2.3 ack
|
||||
^^^^^^^^^
|
||||
|
||||
The resource, acquiring which means the message has been
|
||||
acknowledged by all nodes in the cluster. The BAST of the resource
|
||||
is used to inform the receiving node that a node wants to
|
||||
communicate.
|
||||
|
||||
The algorithm is:
|
||||
|
||||
1. receive status - all nodes have concurrent-reader lock on "ack"::
|
||||
|
||||
sender receiver receiver
|
||||
"ack":CR "ack":CR "ack":CR
|
||||
|
||||
2. sender get EX on "token",
|
||||
sender get EX on "message"::
|
||||
|
||||
sender receiver receiver
|
||||
"token":EX "ack":CR "ack":CR
|
||||
"message":EX
|
||||
"ack":CR
|
||||
|
||||
Sender checks that it still needs to send a message. Messages
|
||||
received or other events that happened while waiting for the
|
||||
"token" may have made this message inappropriate or redundant.
|
||||
|
||||
3. sender writes LVB
|
||||
|
||||
sender down-convert "message" from EX to CW
|
||||
|
||||
sender try to get EX of "ack"
|
||||
|
||||
::
|
||||
|
||||
[ wait until all receivers have *processed* the "message" ]
|
||||
|
||||
[ triggered by bast of "ack" ]
|
||||
receiver get CR on "message"
|
||||
receiver read LVB
|
||||
receiver processes the message
|
||||
[ wait finish ]
|
||||
receiver releases "ack"
|
||||
receiver tries to get PR on "message"
|
||||
|
||||
sender receiver receiver
|
||||
"token":EX "message":CR "message":CR
|
||||
"message":CW
|
||||
"ack":EX
|
||||
|
||||
4. triggered by grant of EX on "ack" (indicating all receivers
|
||||
have processed message)
|
||||
|
||||
sender down-converts "ack" from EX to CR
|
||||
|
||||
sender releases "message"
|
||||
|
||||
sender releases "token"
|
||||
|
||||
::
|
||||
|
||||
receiver upconvert to PR on "message"
|
||||
receiver get CR of "ack"
|
||||
receiver release "message"
|
||||
|
||||
sender receiver receiver
|
||||
"ack":CR "ack":CR "ack":CR
|
||||
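The waiting in the algorithm above follows from the standard DLM lock mode compatibility rules. The sketch below encodes that matrix in userspace C; the enum and function names are stand-ins chosen for this example and are not the kernel's DLM constants.

```c
/* DLM lock modes, weakest to strongest (stand-in names). */
enum lock_mode {
	MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX, NR_MODES
};

/*
 * Standard DLM compatibility matrix: compat[held][requested] is 1
 * when the request can be granted while another holder keeps "held".
 */
static const int compat[NR_MODES][NR_MODES] = {
	/*        NL CR CW PR PW EX */
	/* NL */ { 1, 1, 1, 1, 1, 1 },
	/* CR */ { 1, 1, 1, 1, 1, 0 },
	/* CW */ { 1, 1, 1, 0, 0, 0 },
	/* PR */ { 1, 1, 0, 1, 0, 0 },
	/* PW */ { 1, 1, 0, 0, 0, 0 },
	/* EX */ { 1, 0, 0, 0, 0, 0 },
};

/* Can "requested" be granted given the modes held by n other nodes? */
int can_grant(enum lock_mode requested, const enum lock_mode *held, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!compat[held[i]][requested])
			return 0;
	}
	return 1;
}
```

This shows why the sender's EX request on "ack" is only granted once every receiver has released its CR lock, and why a receiver's PR request on "message" has to wait while the sender still holds CW.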
|
||||
|
||||
4. Handling Failures
|
||||
====================
|
||||
|
||||
4.1 Node Failure
|
||||
----------------
|
||||
|
||||
When a node fails, the DLM informs the cluster with the slot
|
||||
number. The node starts a cluster recovery thread. The cluster
|
||||
recovery thread:
|
||||
|
||||
- acquires the bitmap<number> lock of the failed node
|
||||
- opens the bitmap
|
||||
- reads the bitmap of the failed node
|
||||
- copies the set bitmap to local node
|
||||
- cleans the bitmap of the failed node
|
||||
- releases bitmap<number> lock of the failed node
|
||||
- initiates resync of the bitmap on the current node
|
||||
md_check_recovery is invoked within recover_bitmaps; it then calls
metadata_update_start/finish, which locks the communication channel
via lock_comm. This means that while one node is resyncing, it blocks
all other nodes from writing anywhere on the array.
|
||||
|
||||
The resync process is the regular md resync. However, in a clustered
|
||||
environment when a resync is performed, it needs to tell other nodes
|
||||
of the areas which are suspended. Before a resync starts, the node
|
||||
sends out RESYNCING with the (lo,hi) range of the area which needs to
|
||||
be suspended. Each node maintains a suspend_list, which contains the
|
||||
list of ranges which are currently suspended. On receiving RESYNCING,
|
||||
the node adds the range to the suspend_list. Similarly, when the node
|
||||
performing resync finishes, it sends RESYNCING with an empty range to
|
||||
other nodes and other nodes remove the corresponding entry from the
|
||||
suspend_list.
|
||||
|
||||
A helper function, ->area_resyncing() can be used to check if a
|
||||
particular I/O range should be suspended or not.
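The check described above amounts to an interval-overlap test against the
suspended ranges. The following is a minimal sketch, assuming half-open
sector ranges [lo, hi) kept in a plain array; the structure and function
names are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one suspend_list entry: sectors [lo, hi). */
struct toy_suspend_range {
	uint64_t lo;
	uint64_t hi;
};

/* Return true if [lo, hi) overlaps any range a peer reported via RESYNCING. */
static bool toy_range_is_suspended(const struct toy_suspend_range *list,
				   size_t n, uint64_t lo, uint64_t hi)
{
	assert(lo <= hi);
	for (size_t i = 0; i < n; i++) {
		/* Two half-open ranges overlap when each starts before the other ends. */
		if (lo < list[i].hi && hi > list[i].lo)
			return true;
	}
	return false;
}
```

An I/O that only touches sectors between two suspended ranges is not
reported as suspended, so it can proceed without waiting for the resync.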
|
||||
|
||||
4.2 Device Failure
|
||||
------------------
|
||||
|
||||
Device failures are handled and communicated with the metadata update
|
||||
routine. When a node detects a device failure it does not allow
|
||||
any further writes to that device until the failure has been
|
||||
acknowledged by all other nodes.
|
||||
|
||||
5. Adding a new Device
|
||||
======================
|
||||
|
||||
For adding a new device, it is necessary that all nodes "see" the new
|
||||
device to be added. For this, the following algorithm is used:
|
||||
|
||||
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
|
||||
ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
|
||||
2. Node 1 sends a NEWDISK message with uuid and slot number
|
||||
3. Other nodes issue kobject_uevent_env with uuid and slot number
|
||||
(Steps 4,5 could be a udev rule)
|
||||
4. In userspace, the node searches for the disk, perhaps
|
||||
using blkid -t SUB_UUID=""
|
||||
5. Other nodes issue either of the following depending on whether
|
||||
the disk was found:
|
||||
ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
|
||||
disc.number set to slot number)
|
||||
ioctl(CLUSTERED_DISK_NACK)
|
||||
6. Other nodes drop lock on "no-new-dev" (CR) if device is found
|
||||
7. Node 1 attempts EX lock on "no-new-dev"
|
||||
8. If node 1 gets the lock, it sends METADATA_UPDATED after
|
||||
unmarking the disk as SpareLocal
|
||||
9. If not (get "no-new-dev" lock), it fails the operation and sends
|
||||
METADATA_UPDATED.
|
||||
10. Other nodes get the information whether a disk is added or not
|
||||
by the following METADATA_UPDATED.
|
||||
|
||||
6. Module interface
|
||||
===================
|
||||
|
||||
There are 17 call-backs which the md core can make to the cluster
|
||||
module. Understanding these can give a good overview of the whole
|
||||
process.
|
||||
|
||||
6.1 join(nodes) and leave()
|
||||
---------------------------
|
||||
|
||||
These are called when an array is started with a clustered bitmap,
|
||||
and when the array is stopped. join() ensures the cluster is
|
||||
available and initializes the various resources.
|
||||
Only the first 'nodes' nodes in the cluster can use the array.
|
||||
|
||||
6.2 slot_number()
|
||||
-----------------
|
||||
|
||||
Reports the slot number advised by the cluster infrastructure.
|
||||
Range is from 0 to nodes-1.
|
||||
|
||||
6.3 resync_info_update()
|
||||
------------------------
|
||||
|
||||
This updates the resync range that is stored in the bitmap lock.
|
||||
The starting point is updated as the resync progresses. The
|
||||
end point is always the end of the array.
|
||||
It does *not* send a RESYNCING message.
|
||||
|
||||
6.4 resync_start(), resync_finish()
|
||||
-----------------------------------
|
||||
|
||||
These are called when resync/recovery/reshape starts or stops.
|
||||
They update the resyncing range in the bitmap lock and also
|
||||
send a RESYNCING message. resync_start reports the whole
|
||||
array as resyncing, resync_finish reports none of it.
|
||||
|
||||
resync_finish() also sends a BITMAP_NEEDS_SYNC message which
|
||||
allows some other node to take over.
|
||||
|
||||
6.5 metadata_update_start(), metadata_update_finish(), metadata_update_cancel()
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
metadata_update_start is used to get exclusive access to
|
||||
the metadata. If a change is still needed once that access is
|
||||
gained, metadata_update_finish() will send a METADATA_UPDATED
|
||||
message to all other nodes, otherwise metadata_update_cancel()
|
||||
can be used to release the lock.
|
||||
|
||||
6.6 area_resyncing()
|
||||
--------------------
|
||||
|
||||
This combines two elements of functionality.
|
||||
|
||||
Firstly, it will check if any node is currently resyncing
|
||||
anything in a given range of sectors. If any resync is found,
|
||||
then the caller will avoid writing or read-balancing in that
|
||||
range.
|
||||
|
||||
Secondly, while node recovery is happening it reports that
|
||||
all areas are resyncing for READ requests. This avoids races
|
||||
between the cluster-filesystem and the cluster-RAID handling
|
||||
a node failure.
|
||||
|
||||
6.7 add_new_disk_start(), add_new_disk_finish(), new_disk_ack()
|
||||
---------------------------------------------------------------
|
||||
|
||||
These are used to manage the new-disk protocol described above.
|
||||
When a new device is added, add_new_disk_start() is called before
|
||||
it is bound to the array and, if that succeeds, add_new_disk_finish()
|
||||
is called once the device is fully added.
|
||||
|
||||
When a device is added in acknowledgement to a previous
|
||||
request, or when the device is declared "unavailable",
|
||||
new_disk_ack() is called.
|
||||
|
||||
6.8 remove_disk()
|
||||
-----------------
|
||||
|
||||
This is called when a spare or failed device is removed from
|
||||
the array. It causes a REMOVE message to be sent to other nodes.
|
||||
|
||||
6.9 gather_bitmaps()
|
||||
--------------------
|
||||
|
||||
This sends a RE_ADD message to all other nodes and then
|
||||
gathers bitmap information from all bitmaps. This combined
|
||||
bitmap is then used to recover the re-added device.
|
||||
|
||||
6.10 lock_all_bitmaps() and unlock_all_bitmaps()
|
||||
------------------------------------------------
|
||||
|
||||
These are called when the bitmap is changed to none. If a node plans
to clear the cluster RAID's bitmap, it needs to make sure no other
nodes are using the array. This is achieved by taking all bitmap
locks within the cluster; the locks are released again afterwards.
|
||||
|
||||
7. Unsupported features
|
||||
=======================
|
||||
|
||||
There are some things which are not supported by cluster MD yet.
|
||||
|
||||
- change array_sectors.
|
111
Documentation/driver-api/md/raid5-cache.rst
Normal file
@@ -0,0 +1,111 @@
|
||||
================
|
||||
RAID 4/5/6 cache
|
||||
================
|
||||
|
||||
RAID 4/5/6 can include an extra disk for data cache besides normal RAID
|
||||
disks. The role of RAID disks isn't changed with the cache disk. The cache disk
|
||||
caches data to the RAID disks. The cache can be in write-through (supported
|
||||
since 4.4) or write-back mode (supported since 4.10). mdadm (supported since
|
||||
3.4) has a new option '--write-journal' to create an array with a cache. Please
|
||||
refer to mdadm manual for details. By default (RAID array starts), the cache is
|
||||
in write-through mode. A user can switch it to write-back mode by::
|
||||
|
||||
echo "write-back" > /sys/block/md0/md/journal_mode
|
||||
|
||||
And switch it back to write-through mode by::
|
||||
|
||||
echo "write-through" > /sys/block/md0/md/journal_mode
|
||||
|
||||
In both modes, all writes to the array will hit cache disk first. This means
|
||||
the cache disk must be fast and sustainable.
|
||||
|
||||
write-through mode
|
||||
==================
|
||||
|
||||
This mode mainly fixes the 'write hole' issue. For a RAID 4/5/6 array, an unclean
|
||||
shutdown can cause data in some stripes to be in an inconsistent state, e.g. data
|
||||
and parity don't match. The reason is that a stripe write involves several RAID
|
||||
disks and it's possible the writes don't hit all RAID disks yet before the
|
||||
unclean shutdown. We call an array degraded if it has inconsistent data. MD
|
||||
tries to resync the array to bring it back to normal state. But before the
|
||||
resync completes, any system crash will expose the chance of real data
|
||||
corruption in the RAID array. This problem is called 'write hole'.
|
||||
|
||||
The write-through cache will cache all data on cache disk first. After the data
|
||||
is safe on the cache disk, the data will be flushed onto RAID disks. The
|
||||
two-step write will guarantee MD can recover correct data after unclean
|
||||
shutdown even if the array is degraded. Thus the cache can close the 'write hole'.
|
||||
|
||||
In write-through mode, MD reports IO completion to upper layer (usually
|
||||
filesystems) after the data is safe on RAID disks, so cache disk failure
|
||||
doesn't cause data loss. Of course cache disk failure means the array is
|
||||
exposed to 'write hole' again.
|
||||
|
||||
In write-through mode, the cache disk isn't required to be big. Several
|
||||
hundred megabytes are enough.
|
||||
|
||||
write-back mode
|
||||
===============
|
||||
|
||||
write-back mode fixes the 'write hole' issue too, since all write data is
|
||||
cached on cache disk. But the main goal of 'write-back' cache is to speed up
|
||||
write. If a write crosses all RAID disks of a stripe, we call it full-stripe
|
||||
write. For non-full-stripe writes, MD must read old data before the new parity
|
||||
can be calculated. These synchronous reads hurt write throughput. Some writes
|
||||
which are sequential but not dispatched at the same time will suffer from this
|
||||
overhead too. Write-back cache will aggregate the data and flush the data to
|
||||
RAID disks only after the data becomes a full stripe write. This will
|
||||
completely avoid the overhead, so it's very helpful for some workloads. A
|
||||
typical workload which does sequential write followed by fsync is an example.
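The aggregation idea can be sketched with a per-stripe bitmask that records
which data chunks are already cached; the stripe becomes eligible for a
full-stripe write once every bit is set. This is only an illustration under
simplifying assumptions (chunk-granular tracking, at most 64 data chunks);
the names are hypothetical and the real cache also flushes for other reasons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-stripe bookkeeping: one bit per data chunk. */
struct toy_stripe {
	unsigned int ndata;	/* number of data chunks, 1..64 */
	uint64_t dirty;		/* bit i set: chunk i is held in the cache */
};

/* Bitmask with the low 'ndata' bits set. */
static uint64_t toy_full_mask(unsigned int ndata)
{
	assert(ndata >= 1 && ndata <= 64);
	return ndata == 64 ? ~UINT64_C(0) : (UINT64_C(1) << ndata) - 1;
}

/* Record a cached write to one chunk; return true once the stripe is full. */
static bool toy_cache_write(struct toy_stripe *s, unsigned int chunk)
{
	assert(chunk < s->ndata);
	s->dirty |= UINT64_C(1) << chunk;
	return s->dirty == toy_full_mask(s->ndata);
}
```

Rewriting a chunk that is already cached does not change the mask, so
repeated small writes to the same chunk never trigger a flush by themselves.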
|
||||
|
||||
In write-back mode, MD reports IO completion to upper layer (usually
|
||||
filesystems) right after the data hits cache disk. The data is flushed to raid
|
||||
disks later after specific conditions are met. So cache disk failure will cause
|
||||
data loss.
|
||||
|
||||
In write-back mode, MD also caches data in memory. The memory cache includes
|
||||
the same data stored on cache disk, so a power loss doesn't cause data loss.
|
||||
The memory cache size has performance impact for the array. It's recommended
|
||||
the size is big. A user can configure the size by::
|
||||
|
||||
echo "2048" > /sys/block/md0/md/stripe_cache_size
|
||||
|
||||
A cache disk that is too small will make the write aggregation less efficient in this
|
||||
mode depending on the workloads. It's recommended to use a cache disk with at
|
||||
least several gigabytes in size in write-back mode.
|
||||
|
||||
The implementation
|
||||
==================
|
||||
|
||||
The write-through and write-back cache use the same disk format. The cache disk
|
||||
is organized as a simple write log. The log consists of 'meta data' and 'data'
|
||||
pairs. The meta data describes the data. It also includes checksum and sequence
|
||||
ID for recovery identification. Data can be IO data and parity data. Data is
|
||||
checksummed too. The checksum is stored in the meta data ahead of the data. The
|
||||
checksum is an optimization because MD can write meta and data freely without
|
||||
worrying about the order. The MD superblock has a field pointing to the valid meta data
|
||||
of log head.
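How a sequence ID and a data checksum together identify a valid log record
can be sketched as follows. This is a simplified illustration, not the
on-disk format: the structure layout, the field names and the toy checksum
are all assumptions, and the checksum of the meta data itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy checksum (assumption, not the one MD uses). */
static uint32_t toy_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Hypothetical meta data block that precedes its data in the log. */
struct toy_log_meta {
	uint64_t seq;		/* sequence ID of this record */
	uint32_t data_len;	/* number of data bytes that follow */
	uint32_t data_checksum;	/* checksum of those data bytes */
};

/* A record is usable only if it has the expected sequence ID and intact data. */
static bool toy_record_valid(const struct toy_log_meta *meta,
			     const uint8_t *data, uint64_t expected_seq)
{
	assert(meta != NULL && data != NULL);
	if (meta->seq != expected_seq)
		return false;	/* stale record from an earlier pass over the log */
	return toy_checksum(data, meta->data_len) == meta->data_checksum;
}
```

Because the checksum of the data travels in the meta data, a record whose
data never reached the disk is detected at recovery time, whatever order the
two writes completed in.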
|
||||
|
||||
The log implementation is pretty straightforward. The difficult part is the
|
||||
order in which MD writes data to cache disk and RAID disks. Specifically, in
|
||||
write-through mode, MD calculates parity for IO data, writes both IO data and
|
||||
parity to the log, writes the data and parity to RAID disks after the data and
|
||||
parity are settled in the log, and finally the IO is finished. Read just reads
|
||||
from raid disks as usual.
|
||||
|
||||
In write-back mode, MD writes IO data to the log and reports IO completion. The
|
||||
data is also fully cached in memory at that time, which means read must query
|
||||
memory cache. If some conditions are met, MD will flush the data to RAID disks.
|
||||
MD will calculate parity for the data and write parity into the log. After this
|
||||
is finished, MD will write both data and parity into RAID disks, then MD can
|
||||
release the memory cache. The flush conditions could be: a stripe becomes a full
|
||||
stripe write, free cache disk space is low or free in-kernel memory cache space
|
||||
is low.
|
||||
|
||||
After an unclean shutdown, MD does recovery. MD reads all meta data and data
|
||||
from the log. The sequence ID and checksum will help us detect corrupted meta
|
||||
data and data. If MD finds a stripe with data and valid parities (1 parity for
|
||||
raid4/5 and 2 for raid6), MD will write the data and parities to RAID disks. If
|
||||
parities are incomplete, they are discarded. If part of data is corrupted,
|
||||
they are discarded too. MD then loads valid data and writes them to RAID disks
|
||||
in normal way.
|
47
Documentation/driver-api/md/raid5-ppl.rst
Normal file
@@ -0,0 +1,47 @@
|
||||
==================
|
||||
Partial Parity Log
|
||||
==================
|
||||
|
||||
Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
|
||||
addressed by PPL is that after a dirty shutdown, parity of a particular stripe
|
||||
may become inconsistent with data on other member disks. If the array is also
|
||||
in degraded state, there is no way to recalculate parity, because one of the
|
||||
disks is missing. This can lead to silent data corruption when rebuilding the
|
||||
array or using it as degraded - data calculated from parity for array blocks
|
||||
that have not been touched by a write request during the unclean shutdown can
|
||||
be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
|
||||
this, md by default does not allow starting a dirty degraded array.
|
||||
|
||||
Partial parity for a write operation is the XOR of stripe data chunks not
|
||||
modified by this write. It is just enough data needed for recovering from the
|
||||
write hole. XORing partial parity with the modified chunks produces parity for
|
||||
the stripe, consistent with its state before the write operation, regardless of
|
||||
which chunk writes have completed. If one of the unmodified data disks of
|
||||
this stripe is missing, this updated parity can be used to recover its
|
||||
contents. PPL recovery is also performed when starting an array after an
|
||||
unclean shutdown and all disks are available, eliminating the need to resync
|
||||
the array. Because of this, using write-intent bitmap and PPL together is not
|
||||
supported.
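The XOR relationship described above can be checked with a small worked
example. This is a sketch with a toy chunk size and hypothetical helper
names, not the driver's implementation: partial parity is built from the
chunks a write leaves untouched, and XORing it with the modified chunks
yields the parity of the whole stripe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 8	/* toy chunk size in bytes (assumption) */

/* dst ^= src, byte by byte. */
static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Partial parity: XOR of the data chunks NOT modified by the write. */
static void partial_parity(uint8_t *pp, uint8_t data[][CHUNK],
			   const int *modified, size_t ndata)
{
	assert(ndata > 0);
	memset(pp, 0, CHUNK);
	for (size_t i = 0; i < ndata; i++)
		if (!modified[i])
			xor_into(pp, data[i], CHUNK);
}

/* Stripe parity rebuilt from partial parity plus the modified chunks. */
static void parity_from_pp(uint8_t *parity, const uint8_t *pp,
			   uint8_t data[][CHUNK], const int *modified,
			   size_t ndata)
{
	memcpy(parity, pp, CHUNK);
	for (size_t i = 0; i < ndata; i++)
		if (modified[i])
			xor_into(parity, data[i], CHUNK);
}
```

Since the rebuilt parity covers every data chunk, an unmodified chunk on a
missing disk can be recovered by XORing that parity with the remaining
chunks.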
|
||||
|
||||
When handling a write request PPL writes partial parity before new data and
|
||||
parity are dispatched to disks. PPL is a distributed log - it is stored on
|
||||
array member drives in the metadata area, on the parity drive of a particular
|
||||
stripe. It does not require a dedicated journaling drive. Write performance is
|
||||
reduced by up to 30%-40% but it scales with the number of drives in the array
|
||||
and the journaling drive does not become a bottleneck or a single point of
|
||||
failure.
|
||||
|
||||
Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
|
||||
not a true journal. It does not protect from losing in-flight data, only from
|
||||
silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
|
||||
performed for this stripe (parity is not updated). So it is possible to have
|
||||
arbitrary data in the written part of a stripe if that disk is lost. In such
|
||||
case the behavior is the same as in plain raid5.
|
||||
|
||||
PPL is available for md version-1 metadata and external (specifically IMSM)
|
||||
metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.
|
||||
|
||||
There is a limit of 64 disks in the array for PPL. This keeps the data
structures and implementation simple. RAID5 arrays with so many disks
are unlikely due to the high risk of multiple disk failures, so this
restriction should not be a limitation in practice.
|
18
Documentation/driver-api/memory-devices/index.rst
Normal file
@@ -0,0 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================
|
||||
Memory Controller drivers
|
||||
=========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
ti-emif
|
||||
ti-gpmc
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
64
Documentation/driver-api/memory-devices/ti-emif.rst
Normal file
@@ -0,0 +1,64 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============================
|
||||
TI EMIF SDRAM Controller Driver
|
||||
===============================
|
||||
|
||||
Author
|
||||
======
|
||||
Aneesh V <aneesh@ti.com>
|
||||
|
||||
Location
|
||||
========
|
||||
drivers/memory/emif.c
|
||||
|
||||
Supported SoCs:
|
||||
===============
|
||||
TI OMAP44xx
|
||||
TI OMAP54xx
|
||||
|
||||
Menuconfig option:
|
||||
==================
|
||||
Device Drivers
|
||||
Memory devices
|
||||
Texas Instruments EMIF driver
|
||||
|
||||
Description
|
||||
===========
|
||||
This driver is for the EMIF module available in Texas Instruments
|
||||
SoCs. EMIF is an SDRAM controller that, based on its revision,
|
||||
supports one or more of DDR2, DDR3, and LPDDR2 SDRAM protocols.
|
||||
This driver takes care of only LPDDR2 memories presently. The
|
||||
functions of the driver include re-configuring AC timing
|
||||
parameters and other settings during frequency, voltage and
|
||||
temperature changes.
|
||||
|
||||
Platform Data (see include/linux/platform_data/emif_plat.h)
|
||||
===========================================================
|
||||
DDR device details and other board dependent and SoC dependent
|
||||
information can be passed through platform data (struct emif_platform_data)
|
||||
|
||||
- DDR device details: 'struct ddr_device_info'
|
||||
- Device AC timings: 'struct lpddr2_timings' and 'struct lpddr2_min_tck'
|
||||
- Custom configurations: customizable policy options through
|
||||
'struct emif_custom_configs'
|
||||
- IP revision
|
||||
- PHY type
|
||||
|
||||
Interface to the external world
|
||||
===============================
|
||||
EMIF driver registers notifiers for voltage and frequency changes
|
||||
affecting EMIF and takes appropriate actions when these are invoked.
|
||||
|
||||
- freq_pre_notify_handling()
|
||||
- freq_post_notify_handling()
|
||||
- volt_notify_handling()
|
||||
|
||||
Debugfs
|
||||
=======
|
||||
The driver creates two debugfs entries per device.
|
||||
|
||||
- regcache_dump : dump of register values calculated and saved for all
|
||||
frequencies used so far.
|
||||
- mr4 : last polled value of MR4 register in the LPDDR2 device. MR4
|
||||
indicates the current temperature level of the device.
|
179
Documentation/driver-api/memory-devices/ti-gpmc.rst
Normal file
@@ -0,0 +1,179 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================================
|
||||
GPMC (General Purpose Memory Controller)
|
||||
========================================
|
||||
|
||||
GPMC is a unified memory controller dedicated to interfacing external
|
||||
memory devices like
|
||||
|
||||
* Asynchronous SRAM like memories and application specific integrated
|
||||
circuit devices.
|
||||
* Asynchronous, synchronous, and page mode burst NOR flash devices
|
||||
* NAND flash
|
||||
* Pseudo-SRAM devices
|
||||
|
||||
GPMC is found on Texas Instruments SoCs (OMAP based)
|
||||
IP details: http://www.ti.com/lit/pdf/spruh73 section 7.1
|
||||
|
||||
|
||||
GPMC generic timing calculation:
|
||||
================================
|
||||
|
||||
GPMC has certain timings that have to be programmed for proper
|
||||
functioning of the peripheral, while peripheral has another set of
|
||||
timings. To have peripheral work with gpmc, peripheral timings have to
|
||||
be translated to the form gpmc can understand. The way it has to be
|
||||
translated depends on the connected peripheral. Also there is a
|
||||
dependency for certain gpmc timings on gpmc clock frequency. Hence a
|
||||
generic timing routine was developed to achieve above requirements.
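The dependency on the gpmc clock frequency comes from the fact that a time
taken from a datasheet has to be expressed as a whole number of clock ticks.
The following is a minimal sketch of that conversion, assuming times in
picoseconds and rounding up so that a programmed timing is never shorter
than required. The helper names are hypothetical and are not the driver's
API.

```c
#include <assert.h>
#include <stdint.h>

/* Clock period in picoseconds for a clock of freq_hz, rounded up. */
static uint64_t toy_clk_period_ps(uint32_t freq_hz)
{
	assert(freq_hz > 0);
	return (UINT64_C(1000000000000) + freq_hz - 1) / freq_hz;
}

/* Smallest number of whole clock ticks that covers time_ps. */
static uint64_t toy_ps_to_ticks(uint64_t time_ps, uint32_t freq_hz)
{
	uint64_t period = toy_clk_period_ps(freq_hz);

	return (time_ps + period - 1) / period;
}
```

For example, at a 100 MHz clock the period is 10000 ps, so a 25000 ps
requirement needs 3 ticks, while a 20000 ps requirement fits in exactly 2.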
|
||||
|
||||
Generic routine provides a generic method to calculate gpmc timings
|
||||
from gpmc peripheral timings. struct gpmc_device_timings fields have to
|
||||
be updated with timings from the datasheet of the peripheral that is
|
||||
connected to gpmc. A few of the peripheral timings can be fed either
|
||||
in time or in cycles, provision to handle this scenario has been
|
||||
provided (refer to struct gpmc_device_timings definition). It may so
|
||||
happen that timing as specified by peripheral datasheet is not present
|
||||
in timing structure, in this scenario, try to correlate peripheral
|
||||
timing to the one available. If that doesn't work, try to add a new
|
||||
field as required by peripheral, educate generic timing routine to
|
||||
handle it, make sure that it does not break any of the existing ones.
|
||||
Then there may be cases where peripheral datasheet doesn't mention
|
||||
certain fields of struct gpmc_device_timings, zero those entries.
|
||||
|
||||
Generic timing routine has been verified to work properly on
|
||||
multiple onenand's and tusb6010 peripherals.
|
||||
|
||||
A word of caution: generic timing routine has been developed based
|
||||
on understanding of gpmc timings, peripheral timings, available
|
||||
custom timing routines, a kind of reverse engineering without
|
||||
most of the datasheets & hardware (to be exact none of those supported
|
||||
in mainline having custom timing routine) and by simulation.
|
||||
|
||||
gpmc timing dependency on peripheral timings:
|
||||
|
||||
[<gpmc_timing>: <peripheral timing1>, <peripheral timing2> ...]
|
||||
|
||||
1. common
|
||||
|
||||
cs_on:
|
||||
t_ceasu
|
||||
adv_on:
|
||||
t_avdasu, t_ceavd
|
||||
|
||||
2. sync common
|
||||
|
||||
sync_clk:
|
||||
clk
|
||||
page_burst_access:
|
||||
t_bacc
|
||||
clk_activation:
|
||||
t_ces, t_avds
|
||||
|
||||
3. read async muxed
|
||||
|
||||
adv_rd_off:
|
||||
t_avdp_r
|
||||
oe_on:
|
||||
t_oeasu, t_aavdh
|
||||
access:
|
||||
t_iaa, t_oe, t_ce, t_aa
|
||||
rd_cycle:
|
||||
t_rd_cycle, t_cez_r, t_oez
|
||||
|
||||
4. read async non-muxed
|
||||
|
||||
adv_rd_off:
|
||||
t_avdp_r
|
||||
oe_on:
|
||||
t_oeasu
|
||||
access:
|
||||
t_iaa, t_oe, t_ce, t_aa
|
||||
rd_cycle:
|
||||
t_rd_cycle, t_cez_r, t_oez
|
||||
|
||||
5. read sync muxed
|
||||
|
||||
adv_rd_off:
|
||||
t_avdp_r, t_avdh
|
||||
oe_on:
|
||||
t_oeasu, t_ach, cyc_aavdh_oe
|
||||
access:
|
||||
t_iaa, cyc_iaa, cyc_oe
|
||||
rd_cycle:
|
||||
t_cez_r, t_oez, t_ce_rdyz
|
||||
|
||||
6. read sync non-muxed
|
||||
|
||||
adv_rd_off:
|
||||
t_avdp_r
|
||||
oe_on:
|
||||
t_oeasu
|
||||
access:
|
||||
t_iaa, cyc_iaa, cyc_oe
|
||||
rd_cycle:
|
||||
t_cez_r, t_oez, t_ce_rdyz
|
||||
|
||||
7. write async muxed
|
||||
|
||||
adv_wr_off:
|
||||
t_avdp_w
|
||||
we_on, wr_data_mux_bus:
|
||||
t_weasu, t_aavdh, cyc_aavdh_we
|
||||
we_off:
|
||||
t_wpl
|
||||
cs_wr_off:
|
||||
t_wph
|
||||
wr_cycle:
|
||||
t_cez_w, t_wr_cycle
|
||||
|
||||
8. write async non-muxed
|
||||
|
||||
adv_wr_off:
|
||||
t_avdp_w
|
||||
we_on, wr_data_mux_bus:
|
||||
t_weasu
|
||||
we_off:
|
||||
t_wpl
|
||||
cs_wr_off:
|
||||
t_wph
|
||||
wr_cycle:
|
||||
t_cez_w, t_wr_cycle
|
||||
|
||||
9. write sync muxed
|
||||
|
||||
adv_wr_off:
|
||||
t_avdp_w, t_avdh
|
||||
we_on, wr_data_mux_bus:
|
||||
t_weasu, t_rdyo, t_aavdh, cyc_aavdh_we
|
||||
we_off:
|
||||
t_wpl, cyc_wpl
|
||||
cs_wr_off:
|
||||
t_wph
|
||||
wr_cycle:
|
||||
t_cez_w, t_ce_rdyz
|
||||
|
||||
10. write sync non-muxed
|
||||
|
||||
adv_wr_off:
|
||||
t_avdp_w
|
||||
we_on, wr_data_mux_bus:
|
||||
t_weasu, t_rdyo
|
||||
we_off:
|
||||
t_wpl, cyc_wpl
|
||||
cs_wr_off:
|
||||
t_wph
|
||||
wr_cycle:
|
||||
t_cez_w, t_ce_rdyz
|
||||
|
||||
|
||||
Note:
|
||||
Many of gpmc timings are dependent on other gpmc timings (a few
|
||||
gpmc timings purely dependent on other gpmc timings, a reason that
|
||||
some of the gpmc timings are missing above), and it will result in
|
||||
indirect dependency of peripheral timings to gpmc timings other than
|
||||
mentioned above, refer to the timing routine for more details. To know what
|
||||
these peripheral timings correspond to, please see explanations in
|
||||
struct gpmc_device_timings definition. And for gpmc timings refer to
|
||||
IP details (link above).
|
175
Documentation/driver-api/men-chameleon-bus.rst
Normal file
@@ -0,0 +1,175 @@
|
||||
=================
|
||||
MEN Chameleon Bus
|
||||
=================
|
||||
|
||||
.. Table of Contents
|
||||
=================
|
||||
1 Introduction
|
||||
1.1 Scope of this Document
|
||||
1.2 Limitations of the current implementation
|
||||
2 Architecture
|
||||
2.1 MEN Chameleon Bus
|
||||
2.2 Carrier Devices
|
||||
2.3 Parser
|
||||
3 Resource handling
|
||||
3.1 Memory Resources
|
||||
3.2 IRQs
|
||||
4 Writing an MCB driver
|
||||
4.1 The driver structure
|
||||
4.2 Probing and attaching
|
||||
4.3 Initializing the driver
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
This document describes the architecture and implementation of the MEN
|
||||
Chameleon Bus (called MCB throughout this document).
|
||||
|
||||
Scope of this Document
|
||||
----------------------
|
||||
|
||||
This document is intended to be a short overview of the current
|
||||
implementation and does by no means describe the complete possibilities of MCB
|
||||
based devices.
|
||||
|
||||
Limitations of the current implementation
|
||||
-----------------------------------------
|
||||
|
||||
The current implementation is limited to PCI and PCIe based carrier devices
|
||||
that only use a single memory resource and share the PCI legacy IRQ. Not
|
||||
implemented are:
|
||||
|
||||
- Multi-resource MCB devices like the VME Controller or M-Module carrier.
|
||||
- MCB devices that need another MCB device, like SRAM for a DMA Controller's
|
||||
buffer descriptors or a video controller's video memory.
|
||||
- A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
|
||||
per MCB device like PCIe based carriers with MSI or MSI-X support.
|
||||
|
||||
Architecture
|
||||
============
|
||||
|
||||
MCB is divided into 3 functional blocks:
|
||||
|
||||
- The MEN Chameleon Bus itself,
|
||||
- drivers for MCB Carrier Devices and
|
||||
- the parser for the Chameleon table.
|
||||
|
||||
MEN Chameleon Bus
|
||||
-----------------
|
||||
|
||||
The MEN Chameleon Bus is an artificial bus system that attaches to a so
|
||||
called Chameleon FPGA device found on some hardware produced by MEN Mikro
|
||||
Elektronik GmbH. These devices are multi-function devices implemented in a
|
||||
single FPGA and usually attached via some sort of PCI or PCIe link. Each
|
||||
FPGA contains a header section describing the content of the FPGA. The
|
||||
header lists the device id, PCI BAR, offset from the beginning of the PCI
|
||||
BAR, size in the FPGA, interrupt number and some other properties currently
|
||||
not handled by the MCB implementation.
|
||||
|
||||
Carrier Devices
|
||||
---------------
|
||||
|
||||
A carrier device is just an abstraction for the real world physical bus the
|
||||
Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
|
||||
properties of the carrier device (like querying the IRQ number of a PCI
|
||||
device). To provide abstraction from the real hardware bus, an MCB carrier
|
||||
device provides callback methods to translate the driver's MCB function calls
|
||||
to hardware related function calls. For example a carrier device may
|
||||
implement the get_irq() method which can be translated into a hardware bus
|
||||
query for the IRQ number the device should use.
|
||||
|
||||
Parser
|
||||
------
|
||||
|
||||
The parser reads the first 512 bytes of a Chameleon device and parses the
|
||||
Chameleon table. Currently the parser only supports the Chameleon v2 variant
|
||||
of the Chameleon table but can easily be adapted to support an older or
|
||||
possible future variant. While parsing the table's entries new MCB devices
|
||||
are allocated and their resources are assigned according to the resource
|
||||
assignment in the Chameleon table. After resource assignment is finished, the
|
||||
MCB devices are registered at the MCB and thus at the driver core of the
|
||||
Linux kernel.
|
||||
|
||||
Resource handling
|
||||
=================
|
||||
|
||||
The current implementation assigns exactly one memory and one IRQ resource
|
||||
per MCB device. But this is likely going to change in the future.
|
||||
|
||||
Memory Resources
|
||||
----------------
|
||||
|
||||
Each MCB device has exactly one memory resource, which can be requested from
|
||||
the MCB bus. This memory resource is the physical address of the MCB device
|
||||
inside the carrier and is intended to be passed to ioremap() and friends. It
|
||||
is already requested from the kernel by calling request_mem_region().
|
||||
|
||||
IRQs
|
||||
----
|
||||
|
||||
Each MCB device has exactly one IRQ resource, which can be requested from the
|
||||
MCB bus. If a carrier device driver implements the ->get_irq() callback
|
||||
method, the IRQ number assigned by the carrier device will be returned,
|
||||
otherwise the IRQ number inside the Chameleon table will be returned. This
|
||||
number is suitable to be passed to request_irq().
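The fallback rule for the IRQ number can be sketched as an optional callback
with a default. The types and names below are hypothetical stand-ins and do
not reproduce the MCB core's structures; they only illustrate "use the
carrier's answer if it provides one, otherwise use the Chameleon table
entry".

```c
#include <assert.h>
#include <stddef.h>

struct toy_mcb_device;

/* Hypothetical carrier with an optional get_irq() callback. */
struct toy_carrier {
	int (*get_irq)(struct toy_mcb_device *mdev);
};

struct toy_mcb_device {
	struct toy_carrier *carrier;
	int table_irq;	/* IRQ number taken from the Chameleon table */
};

/* Example carrier callback: always reports the (made-up) IRQ number 42. */
static int toy_pci_get_irq(struct toy_mcb_device *mdev)
{
	(void)mdev;
	return 42;
}

/* Prefer the carrier's IRQ number, fall back to the table entry. */
static int toy_get_irq(struct toy_mcb_device *mdev)
{
	assert(mdev != NULL);
	if (mdev->carrier && mdev->carrier->get_irq)
		return mdev->carrier->get_irq(mdev);
	return mdev->table_irq;
}
```

A driver never needs to know which path was taken; in both cases it just
passes the returned number on.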
|
||||
|
||||
Writing an MCB driver
|
||||
=====================
|
||||
|
||||
The driver structure
|
||||
--------------------
|
||||
|
||||
Each MCB driver has a structure to identify the device driver as well as
|
||||
device ids which identify the IP Core inside the FPGA. The driver structure
|
||||
also contains callback methods which get executed on driver probe and
|
||||
removal from the system::
|
||||
|
||||
static const struct mcb_device_id foo_ids[] = {
|
||||
{ .device = 0x123 },
|
||||
{ }
|
||||
};
|
||||
MODULE_DEVICE_TABLE(mcb, foo_ids);
|
||||
|
||||
static struct mcb_driver foo_driver = {
|
||||
.driver = {
|
||||
.name = "foo-bar",
|
||||
.owner = THIS_MODULE,
|
||||
},
|
||||
.probe = foo_probe,
|
||||
.remove = foo_remove,
|
||||
.id_table = foo_ids,
|
||||
};
|
||||
|
||||
Probing and attaching
|
||||
---------------------
|
||||
|
||||
When a driver is loaded and the MCB devices it services are found, the MCB
|
||||
core will call the driver's probe callback method. When the driver is removed
|
||||
from the system, the MCB core will call the driver's remove callback method::
|
||||
|
||||
static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
|
||||
static void foo_remove(struct mcb_device *mdev);
|
||||
|
||||
Initializing the driver
|
||||
-----------------------
|
||||
|
||||
When the kernel is booted or your foo driver module is inserted, you have to
|
||||
perform driver initialization. Usually it is enough to register your driver
|
||||
module at the MCB core::
|
||||
|
||||
static int __init foo_init(void)
|
||||
{
|
||||
return mcb_register_driver(&foo_driver);
|
||||
}
|
||||
module_init(foo_init);
|
||||
|
||||
static void __exit foo_exit(void)
|
||||
{
|
||||
mcb_unregister_driver(&foo_driver);
|
||||
}
|
||||
module_exit(foo_exit);
|
||||
|
||||
The module_mcb_driver() macro can be used to reduce the above code::
|
||||
|
||||
module_mcb_driver(foo_driver);
|
13
Documentation/driver-api/mmc/index.rst
Normal file
@@ -0,0 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================
|
||||
MMC/SD/SDIO card support
|
||||
========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
mmc-dev-attrs
|
||||
mmc-dev-parts
|
||||
mmc-async-req
|
||||
mmc-tools
|
98
Documentation/driver-api/mmc/mmc-async-req.rst
Normal file
@@ -0,0 +1,98 @@
|
||||
========================
|
||||
MMC Asynchronous Request
|
||||
========================
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
How significant is the cache maintenance overhead?
|
||||
|
||||
It depends. Fast eMMC and multiple cache levels with speculative cache
|
||||
pre-fetch make the cache overhead relatively significant. If the DMA
|
||||
preparations for the next request are done in parallel with the current
|
||||
transfer, the DMA preparation overhead would not affect the MMC performance.
|
||||
|
||||
The intention of non-blocking (asynchronous) MMC requests is to minimize the
|
||||
time between when an MMC request ends and another MMC request begins.
|
||||
|
||||
Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
|
||||
dma_unmap_sg are processing. Using non-blocking MMC requests makes it
|
||||
possible to prepare the caches for the next job in parallel with an active
|
||||
MMC request.
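
As a rough illustration of why overlapping the preparation with the transfer
helps, the following toy model compares the total time for a series of
requests in both cases. It is not kernel code: the function names, the
abstract time units and the assumption that every request has the same
preparation and transfer cost are made up for this sketch.

```c
#include <stdint.h>

/* Blocking: every request pays preparation and transfer back to back. */
uint64_t blocking_total(uint64_t n, uint64_t prep, uint64_t xfer)
{
	return n * (prep + xfer);
}

/*
 * Non-blocking: only the first preparation is exposed. After that, the
 * preparation of the next request overlaps the transfer of the current
 * one, so each overlapped step costs max(prep, xfer).
 */
uint64_t pipelined_total(uint64_t n, uint64_t prep, uint64_t xfer)
{
	uint64_t step = prep > xfer ? prep : xfer;

	if (n == 0)
		return 0;
	return prep + (n - 1) * step + xfer;
}
```

In this model the saving grows with the preparation time, as long as the
preparation stays shorter than the transfer it overlaps with.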
|
||||
|
||||
MMC block driver
|
||||
================
|
||||
|
||||
The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
|
||||
|
||||
The increase in throughput is proportional to the time it takes to
|
||||
prepare (major part of preparations are dma_map_sg() and dma_unmap_sg())
|
||||
a request and how fast the memory is. The faster the MMC/SD is the
|
||||
more significant the prepare request time becomes. Roughly the expected
|
||||
performance gain is 5% for large writes and 10% for large reads on an L2 cache
|
||||
platform. In power save mode, when clocks run on a lower frequency, the DMA
|
||||
preparation may cost even more. As long as these slower preparations are run
|
||||
in parallel with the transfer, performance won't be affected.
|
||||
|
||||
Details on measurements from IOZone and mmc_test
|
||||
================================================
|
||||
|
||||
https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
|
||||
|
||||
MMC core API extension
|
||||
======================
|
||||
|
||||
There is one new public function mmc_start_req().
|
||||
|
||||
It starts a new MMC command request for a host. The function isn't
|
||||
truly non-blocking. If there is an ongoing async request it waits
|
||||
for completion of that request and starts the new one and returns. It
|
||||
doesn't wait for the new request to complete. If there is no ongoing
|
||||
request it starts the new request and returns immediately.
|
||||
|
||||
MMC host extensions
|
||||
===================
|
||||
|
||||
There are two optional members in the mmc_host_ops -- pre_req() and
|
||||
post_req() -- that the host driver may implement in order to move work
|
||||
to before and after the actual mmc_host_ops.request() function is called.
|
||||
|
||||
In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
|
||||
descriptor, and post_req() runs the dma_unmap_sg().
|
||||
|
||||
Optimize for the first request
|
||||
==============================
|
||||
|
||||
The first request in a series of requests can't be prepared in parallel
|
||||
with the previous transfer, since there is no previous request.
|
||||
|
||||
The argument is_first_req in pre_req() indicates that there is no previous
|
||||
request. The host driver may optimize for this scenario to minimize
|
||||
the performance loss. A way to optimize for this is to split the current
|
||||
request in two chunks, prepare the first chunk and start the request,
|
||||
and finally prepare the second chunk and start the transfer.
|
||||
|
||||
Pseudocode to handle is_first_req scenario with minimal prepare overhead::
|
||||
|
||||
if (is_first_req && req->size > threshold)
|
||||
/* start MMC transfer for the complete transfer size */
|
||||
mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
|
||||
|
||||
/*
|
||||
* Begin to prepare DMA while cmd is being processed by MMC.
|
||||
* The first chunk of the request should take the same time
|
||||
* to prepare as the "MMC process command time".
|
||||
* If prepare time exceeds MMC cmd time
|
||||
* the transfer is delayed, guesstimate max 4k as first chunk size.
|
||||
*/
|
||||
prepare_1st_chunk_for_dma(req);
|
||||
/* flush pending desc to the DMAC (dmaengine.h) */
|
||||
dma_issue_pending(req->dma_desc);
|
||||
|
||||
prepare_2nd_chunk_for_dma(req);
|
||||
/*
|
||||
* The second issue_pending should be called before MMC runs out
|
||||
* of the first chunk. If the MMC runs out of the first data chunk
|
||||
* before this call, the transfer is delayed.
|
||||
*/
|
||||
dma_issue_pending(req->dma_desc);
|
91
Documentation/driver-api/mmc/mmc-dev-attrs.rst
Normal file
@@ -0,0 +1,91 @@
|
||||
==================================
|
||||
SD and MMC Block Device Attributes
|
||||
==================================
|
||||
|
||||
These attributes are defined for the block devices associated with the
|
||||
SD or MMC device.
|
||||
|
||||
The following attributes are read/write.
|
||||
|
||||
======== ===============================================
|
||||
force_ro Enforce read-only access even if write protect switch is off.
|
||||
======== ===============================================
|
||||
|
||||
SD and MMC Device Attributes
|
||||
============================
|
||||
|
||||
All attributes are read-only.
|
||||
|
||||
====================== ===============================================
|
||||
cid Card Identification Register
|
||||
csd Card Specific Data Register
|
||||
scr SD Card Configuration Register (SD only)
|
||||
date Manufacturing Date (from CID Register)
|
||||
fwrev Firmware/Product Revision (from CID Register)
|
||||
(SD and MMCv1 only)
|
||||
hwrev Hardware/Product Revision (from CID Register)
|
||||
(SD and MMCv1 only)
|
||||
manfid Manufacturer ID (from CID Register)
|
||||
name Product Name (from CID Register)
|
||||
oemid OEM/Application ID (from CID Register)
|
||||
prv Product Revision (from CID Register)
|
||||
(SD and MMCv4 only)
|
||||
serial Product Serial Number (from CID Register)
|
||||
erase_size Erase group size
|
||||
preferred_erase_size Preferred erase size
|
||||
raw_rpmb_size_mult RPMB partition size
|
||||
rel_sectors Reliable write sector count
|
||||
ocr Operation Conditions Register
|
||||
dsr Driver Stage Register
|
||||
cmdq_en Command Queue enabled:
|
||||
|
||||
1 => enabled, 0 => not enabled
|
||||
====================== ===============================================
|
||||
|
||||
Note on Erase Size and Preferred Erase Size:
|
||||
|
||||
"erase_size" is the minimum size, in bytes, of an erase
|
||||
operation. For MMC, "erase_size" is the erase group size
|
||||
reported by the card. Note that "erase_size" does not apply
|
||||
to trim or secure trim operations where the minimum size is
|
||||
always one 512 byte sector. For SD, "erase_size" is 512
|
||||
if the card is block-addressed, 0 otherwise.
|
||||
|
||||
SD/MMC cards can erase an arbitrarily large area up to and
|
||||
including the whole card. When erasing a large area it may
|
||||
be desirable to do it in smaller chunks for three reasons:
|
||||
|
||||
1. A single erase command will make all other I/O on
|
||||
the card wait. This is not a problem if the whole card
|
||||
is being erased, but erasing one partition will make
|
||||
I/O for another partition on the same card wait for the
|
||||
duration of the erase - which could be several
|
||||
minutes.
|
||||
2. To be able to inform the user of erase progress.
|
||||
3. The erase timeout becomes too large to be very
|
||||
useful. Because the erase timeout contains a margin
|
||||
which is multiplied by the size of the erase area,
|
||||
the value can end up being several minutes for large
|
||||
areas.
|
||||
|
||||
"erase_size" is not the most efficient unit to erase
|
||||
(especially for SD where it is just one sector),
|
||||
hence "preferred_erase_size" provides a good chunk
|
||||
size for erasing large areas.
|
||||
|
||||
For MMC, "preferred_erase_size" is the high-capacity
|
||||
erase size if a card specifies one, otherwise it is
|
||||
based on the capacity of the card.
|
||||
|
||||
For SD, "preferred_erase_size" is the allocation unit
|
||||
size specified by the card.
|
||||
|
||||
"preferred_erase_size" is in bytes.
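
The chunking described above can be sketched as follows. The helper names
are hypothetical, the chunk size is assumed to be the value read from
"preferred_erase_size", and alignment of the erase range to erase groups is
deliberately ignored here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of erase commands needed to cover `len`
 * bytes when each command erases at most `chunk` bytes.
 */
uint64_t erase_chunk_count(uint64_t len, uint64_t chunk)
{
	if (chunk == 0)
		return 0;	/* no usable chunk size reported */
	return len / chunk + (len % chunk != 0);
}

/* Length of chunk number `idx` (0-based); only the last one may be shorter. */
uint64_t erase_chunk_len(uint64_t len, uint64_t chunk, uint64_t idx)
{
	uint64_t start;

	if (idx >= erase_chunk_count(len, chunk))
		return 0;
	start = idx * chunk;
	return (len - start < chunk) ? len - start : chunk;
}
```

Issuing one erase per chunk lets other I/O run between the commands and
gives a natural point at which to report progress.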
|
||||
|
||||
Note on raw_rpmb_size_mult:
|
||||
|
||||
"raw_rpmb_size_mult" is a multiple of 128kB blocks.
|
||||
|
||||
RPMB size in bytes is calculated by using the following equation:
|
||||
|
||||
RPMB partition size = 128kB x raw_rpmb_size_mult
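
A minimal sketch of that calculation in C. The function name is made up for
this example, 128kB is taken to mean 128 * 1024 bytes, and the multiplier is
assumed to fit in one byte.

```c
#include <stdint.h>

/* One raw_rpmb_size_mult unit, assumed here to be 128 * 1024 bytes. */
#define RPMB_UNIT_BYTES (128u * 1024u)

/* Convert the value read from raw_rpmb_size_mult into a size in bytes. */
uint64_t rpmb_size_bytes(uint8_t raw_rpmb_size_mult)
{
	return (uint64_t)raw_rpmb_size_mult * RPMB_UNIT_BYTES;
}
```

With these assumptions, a multiplier of 32 corresponds to a 4 MiB RPMB
partition.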
|
41
Documentation/driver-api/mmc/mmc-dev-parts.rst
Normal file
@@ -0,0 +1,41 @@
|
||||
============================
|
||||
SD and MMC Device Partitions
|
||||
============================
|
||||
|
||||
Device partitions are additional logical block devices present on the
|
||||
SD/MMC device.
|
||||
|
||||
As of this writing, MMC boot partitions are supported and exposed as
|
||||
/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the
|
||||
parent /dev/mmcblkX.
|
||||
|
||||
MMC Boot Partitions
|
||||
===================
|
||||
|
||||
Read and write access is provided to the two MMC boot partitions. Due to
|
||||
the sensitive nature of the boot partition contents, which often store
|
||||
a bootloader or bootloader configuration tables crucial to booting the
|
||||
platform, write access is disabled by default to reduce the chance of
|
||||
accidental bricking.
|
||||
|
||||
To enable write access to /dev/mmcblkXbootY, disable the forced read-only
|
||||
access with::
|
||||
|
||||
echo 0 > /sys/block/mmcblkXbootY/force_ro
|
||||
|
||||
To re-enable read-only access::
|
||||
|
||||
echo 1 > /sys/block/mmcblkXbootY/force_ro
|
||||
|
||||
The boot partitions can also be locked read only until the next power on,
|
||||
with::
|
||||
|
||||
echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on
|
||||
|
||||
This is a feature of the card and not of the kernel. If the card does
|
||||
not support boot partition locking, the file will not exist. If the
|
||||
feature has been disabled on the card, the file will be read-only.
|
||||
|
||||
The boot partitions can also be locked permanently, but this feature is
|
||||
not accessible through sysfs in order to avoid accidental or malicious
|
||||
bricking.
|
37
Documentation/driver-api/mmc/mmc-tools.rst
Normal file
@@ -0,0 +1,37 @@
|
||||
======================
|
||||
MMC tools introduction
|
||||
======================
|
||||
|
||||
There is one MMC test tool called mmc-utils, which is maintained by Chris Ball,
|
||||
you can find it at the public git repository below:
|
||||
|
||||
http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc-utils.git/
|
||||
|
||||
Functions
|
||||
=========
|
||||
|
||||
The mmc-utils tools can do the following:
|
||||
|
||||
- Print and parse extcsd data.
|
||||
- Determine the eMMC writeprotect status.
|
||||
- Set the eMMC writeprotect status.
|
||||
- Set the eMMC data sector size to 4KB by disabling emulation.
|
||||
- Create general purpose partition.
|
||||
- Enable the enhanced user area.
|
||||
- Enable write reliability per partition.
|
||||
- Print the response to STATUS_SEND (CMD13).
|
||||
- Enable the boot partition.
|
||||
- Set Boot Bus Conditions.
|
||||
- Enable the eMMC BKOPS feature.
|
||||
- Permanently enable the eMMC H/W Reset feature.
|
||||
- Permanently disable the eMMC H/W Reset feature.
|
||||
- Send Sanitize command.
|
||||
- Program authentication key for the device.
|
||||
- Read the counter value of the rpmb device to stdout.
|
||||
- Read from rpmb device to output.
|
||||
- Write to rpmb device from data file.
|
||||
- Enable the eMMC cache feature.
|
||||
- Disable the eMMC cache feature.
|
||||
- Print and parse CID data.
|
||||
- Print and parse CSD data.
|
||||
- Print and parse SCR data.
|
12
Documentation/driver-api/mtd/index.rst
Normal file
@@ -0,0 +1,12 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============================
|
||||
Memory Technology Device (MTD)
|
||||
==============================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
intel-spi
|
||||
nand_ecc
|
||||
spi-nor
|
90
Documentation/driver-api/mtd/intel-spi.rst
Normal file
@@ -0,0 +1,90 @@
|
||||
==============================
|
||||
Upgrading BIOS using intel-spi
|
||||
==============================
|
||||
|
||||
Many Intel CPUs like Baytrail and Braswell include an SPI serial flash host
|
||||
controller which is used to hold BIOS and other platform specific data.
|
||||
Since the contents of the SPI serial flash are crucial for the machine to function,
|
||||
it is typically protected by different hardware protection mechanisms to
|
||||
avoid accidental (or on purpose) overwrite of the content.
|
||||
|
||||
Not all manufacturers protect the SPI serial flash, mainly because it
|
||||
allows upgrading the BIOS image directly from an OS.
|
||||
|
||||
The intel-spi driver makes it possible to read and write the SPI serial
|
||||
flash, if certain protection bits are not set and locked. If it finds
|
||||
any of them set, the whole MTD device is made read-only to prevent
|
||||
partial overwrites. By default the driver exposes SPI serial flash
|
||||
contents as read-only but it can be changed from kernel command line,
|
||||
passing "intel-spi.writeable=1".
|
||||
|
||||
Please keep in mind that overwriting the BIOS image on SPI serial flash
|
||||
might render the machine unbootable and requires special equipment like
|
||||
Dediprog to revive. You have been warned!
|
||||
|
||||
Below are the steps to upgrade the MinnowBoard MAX BIOS directly from
|
||||
Linux.
|
||||
|
||||
1) Download and extract the latest Minnowboard MAX BIOS SPI image
|
||||
[1]. At the time of writing, the latest image is v92.
|
||||
|
||||
2) Install mtd-utils package [2]. We need this in order to erase the SPI
|
||||
serial flash. Distros like Debian and Fedora have this prepackaged with
|
||||
name "mtd-utils".
|
||||
|
||||
3) Add "intel-spi.writeable=1" to the kernel command line and reboot
|
||||
the board (you can also reload the driver passing "writeable=1" as
|
||||
module parameter to modprobe).
|
||||
|
||||
4) Once the board is up and running again, find the right MTD partition
|
||||
(it is named as "BIOS")::
|
||||
|
||||
# cat /proc/mtd
|
||||
dev: size erasesize name
|
||||
mtd0: 00800000 00001000 "BIOS"
|
||||
|
||||
So here it will be /dev/mtd0 but it may vary.
|
||||
|
||||
5) Make a backup of the existing image first::
|
||||
|
||||
# dd if=/dev/mtd0ro of=bios.bak
|
||||
16384+0 records in
|
||||
16384+0 records out
|
||||
8388608 bytes (8.4 MB) copied, 10.0269 s, 837 kB/s
|
||||
|
||||
6) Verify the backup::
|
||||
|
||||
# sha1sum /dev/mtd0ro bios.bak
|
||||
fdbb011920572ca6c991377c4b418a0502668b73 /dev/mtd0ro
|
||||
fdbb011920572ca6c991377c4b418a0502668b73 bios.bak
|
||||
|
||||
The SHA1 sums must match. Otherwise do not continue any further!
|
||||
|
||||
7) Erase the SPI serial flash. After this step, do not reboot the
|
||||
board! Otherwise it will not start anymore::
|
||||
|
||||
# flash_erase /dev/mtd0 0 0
|
||||
Erasing 4 Kibyte @ 7ff000 -- 100 % complete
|
||||
|
||||
8) Once completed without errors you can write the new BIOS image::
|
||||
|
||||
# dd if=MNW2MAX1.X64.0092.R01.1605221712.bin of=/dev/mtd0
|
||||
|
||||
9) Verify that the new content of the SPI serial flash matches the new
|
||||
BIOS image::
|
||||
|
||||
# sha1sum /dev/mtd0ro MNW2MAX1.X64.0092.R01.1605221712.bin
|
||||
9b4df9e4be2057fceec3a5529ec3d950836c87a2 /dev/mtd0ro
|
||||
9b4df9e4be2057fceec3a5529ec3d950836c87a2 MNW2MAX1.X64.0092.R01.1605221712.bin
|
||||
|
||||
The SHA1 sums should match.
|
||||
|
||||
10) Now you can reboot your board and observe the new BIOS starting up
|
||||
properly.
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
[1] https://firmware.intel.com/sites/default/files/MinnowBoard%2EMAX_%2EX64%2E92%2ER01%2Ezip
|
||||
|
||||
[2] http://www.linux-mtd.infradead.org/
|
763
Documentation/driver-api/mtd/nand_ecc.rst
Normal file
@@ -0,0 +1,763 @@
|
||||
==========================
|
||||
NAND Error-correction Code
|
||||
==========================
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Having looked at the linux mtd/nand driver and more specifically at nand_ecc.c
|
||||
I felt there was room for optimisation. I bashed the code for a few hours
|
||||
performing tricks like table lookup, removing superfluous code, etc.
|
||||
After that the speed was increased by 35-40%.
|
||||
Still I was not too happy as I felt there was additional room for improvement.
|
||||
|
||||
Bad! I was hooked.
|
||||
I decided to annotate my steps in this file. Perhaps it is useful to someone
|
||||
or someone learns something from it.
|
||||
|
||||
|
||||
The problem
|
||||
===========
|
||||
|
||||
NAND flash (at least SLC one) typically has sectors of 256 bytes.
|
||||
However NAND flash is not extremely reliable so some error detection
|
||||
(and sometimes correction) is needed.
|
||||
|
||||
This is done by means of a Hamming code. I'll try to explain it in
|
||||
layman's terms (and apologies to all the pros in the field in case I do
|
||||
not use the right terminology, my coding theory class was almost 30
|
||||
years ago, and I must admit it was not one of my favourites).
|
||||
|
||||
As I said before the ecc calculation is performed on sectors of 256
|
||||
bytes. This is done by calculating several parity bits over the rows and
|
||||
columns. The parity used is even parity which means that the parity bit = 1
|
||||
if the data over which the parity is calculated is 1 and the parity bit = 0
|
||||
if the data over which the parity is calculated is 0. So the total
|
||||
number of bits over the data over which the parity is calculated + the
|
||||
parity bit is even. (see wikipedia if you can't follow this).
|
||||
Parity is often calculated by means of an exclusive or operation,
|
||||
sometimes also referred to as xor. In C the operator for xor is ^
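
As a small illustration of parity by xor, the helper below (its name is
made up for this sketch) folds the eight bits of a byte onto each other with
^ until a single bit is left. That bit is 1 exactly when the byte has an odd
number of bits set, so byte plus parity bit together always contain an even
number of ones.

```c
/* Even-parity bit of one byte: 1 when the number of set bits is odd. */
unsigned char parity_bit(unsigned char b)
{
	b ^= b >> 4;	/* fold the upper nibble onto the lower nibble */
	b ^= b >> 2;	/* fold the remaining four bits into two */
	b ^= b >> 1;	/* fold the remaining two bits into one */
	return b & 1;
}
```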
|
||||
|
||||
Back to ecc.
|
||||
Let's give a small figure:
|
||||
|
||||
========= ==== ==== ==== ==== ==== ==== ==== ==== === === === === ====
|
||||
byte 0: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp2 rp4 ... rp14
|
||||
byte 1: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp2 rp4 ... rp14
|
||||
byte 2: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp3 rp4 ... rp14
|
||||
byte 3: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp4 ... rp14
|
||||
byte 4: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp2 rp5 ... rp14
|
||||
...
|
||||
byte 254: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp0 rp3 rp5 ... rp15
|
||||
byte 255: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp5 ... rp15
|
||||
cp1 cp0 cp1 cp0 cp1 cp0 cp1 cp0
|
||||
cp3 cp3 cp2 cp2 cp3 cp3 cp2 cp2
|
||||
cp5 cp5 cp5 cp5 cp4 cp4 cp4 cp4
|
||||
========= ==== ==== ==== ==== ==== ==== ==== ==== === === === === ====
|
||||
|
||||
This figure represents a sector of 256 bytes.
|
||||
cp is my abbreviation for column parity, rp for row parity.
|
||||
|
||||
Let's start to explain column parity.
|
||||
|
||||
- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
|
||||
|
||||
so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
|
||||
|
||||
Similarly cp1 is the sum of all bit1, bit3, bit5 and bit7.
|
||||
|
||||
- cp2 is the parity over bit0, bit1, bit4 and bit5
|
||||
- cp3 is the parity over bit2, bit3, bit6 and bit7.
|
||||
- cp4 is the parity over bit0, bit1, bit2 and bit3.
|
||||
- cp5 is the parity over bit4, bit5, bit6 and bit7.
|
||||
|
||||
Note that each of cp0 .. cp5 is exactly one bit.
|
||||
|
||||
Row parity actually works almost the same.
|
||||
|
||||
- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
|
||||
- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
|
||||
- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
|
||||
(so handle two bytes, then skip 2 bytes).
|
||||
- rp3 covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
|
||||
- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
|
||||
|
||||
so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...
|
||||
- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ...
|
||||
|
||||
The story now becomes quite boring. I guess you get the idea.
|
||||
|
||||
- rp6 covers 8 bytes then skips 8 etc
|
||||
- rp7 skips 8 bytes then covers 8 etc
|
||||
- rp8 covers 16 bytes then skips 16 etc
|
||||
- rp9 skips 16 bytes then covers 16 etc
|
||||
- rp10 covers 32 bytes then skips 32 etc
|
||||
- rp11 skips 32 bytes then covers 32 etc
|
||||
- rp12 covers 64 bytes then skips 64 etc
|
||||
- rp13 skips 64 bytes then covers 64 etc
|
||||
- rp14 covers 128 bytes then skips 128
|
||||
- rp15 skips 128 bytes then covers 128
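
The list above follows one rule: the row parity bits come in pairs
(rp0/rp1, rp2/rp3, ..., rp14/rp15), and pair number k is selected by bit k
of the byte index. A minimal sketch of that rule (the function name is an
assumption made for this example):

```c
/*
 * Returns 1 if row parity bit `rp` (0..15) covers the byte at
 * `byte_index` (0..255), 0 otherwise. The even-numbered member of a
 * pair covers the byte when the selecting index bit is clear, the
 * odd-numbered member covers it when the bit is set.
 */
int rp_covers(int rp, int byte_index)
{
	int k = rp / 2;				/* which index bit selects */
	int bit_set = (byte_index >> k) & 1;

	return (rp & 1) == bit_set;
}
```

For byte 254, for example, this selects rp0, rp3, rp5 and rp15, which
matches the row for byte 254 in the figure.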
|
||||
|
||||
In the end the parity bits are grouped together in three bytes as
|
||||
follows:
|
||||
|
||||
===== ===== ===== ===== ===== ===== ===== ===== =====
|
||||
ECC Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
|
||||
===== ===== ===== ===== ===== ===== ===== ===== =====
|
||||
ECC 0 rp07 rp06 rp05 rp04 rp03 rp02 rp01 rp00
|
||||
ECC 1 rp15 rp14 rp13 rp12 rp11 rp10 rp09 rp08
|
||||
ECC 2 cp5 cp4 cp3 cp2 cp1 cp0 1 1
|
||||
===== ===== ===== ===== ===== ===== ===== ===== =====
|
||||
|
||||
I discovered after writing this that ST application note AN1823
|
||||
(http://www.st.com/stonline/) gives a much
|
||||
nicer picture (but they use line parity as the term where I use row parity).
|
||||
Oh well, I'm graphically challenged, so suffer with me for a moment :-)
|
||||
|
||||
And I could not reuse the ST picture anyway for copyright reasons.
|
||||
|
||||
|
||||
Attempt 0
|
||||
=========
|
||||
|
||||
Implementing the parity calculation is pretty simple.
|
||||
In C pseudocode::
|
||||
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if (i & 0x01)
|
||||
rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
|
||||
else
|
||||
rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
|
||||
if (i & 0x02)
|
||||
rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
|
||||
else
|
||||
rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
|
||||
if (i & 0x04)
|
||||
rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
|
||||
else
|
||||
rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
|
||||
if (i & 0x08)
|
||||
rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
|
||||
else
|
||||
rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
|
||||
if (i & 0x10)
|
||||
rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
|
||||
else
|
||||
rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
|
||||
if (i & 0x20)
|
||||
rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
|
||||
else
|
||||
rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
|
||||
if (i & 0x40)
|
||||
rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
|
||||
else
|
||||
rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
|
||||
if (i & 0x80)
|
||||
rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
|
||||
else
|
||||
rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
|
||||
cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
|
||||
cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
|
||||
cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
|
||||
cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3;
|
||||
cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4;
|
||||
cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5;
|
||||
}
|
||||
|
||||
|
||||
Analysis 0
|
||||
==========
|
||||
|
||||
C does have bitwise operators but not really operators to do the above
|
||||
efficiently (and most hardware has no such instructions either).
|
||||
Therefore without implementing this it was clear that the code above was
|
||||
not going to bring me a Nobel prize :-)
|
||||
|
||||
Fortunately the exclusive or operation is commutative and associative, so we can combine
|
||||
the values in any order. So instead of calculating all the bits
|
||||
individually, let us try to rearrange things.
|
||||
For the column parity this is easy. We can just xor the bytes and in the
|
||||
end filter out the relevant bits. This is pretty nice as it will bring
|
||||
all cp calculation out of the for loop.
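
To make the rearrangement concrete, the sketch below computes the six
column parity bits in two ways: once by updating every cp bit for every
byte, and once by xoring all bytes first and filtering the bits at the end.
The function names and the packing (cp0 in bit 0 up to cp5 in bit 5) are
choices made for this example only.

```c
#include <stddef.h>

/* Bit masks selecting the data bits that cp0..cp5 cover. */
static const unsigned char cp_mask[6] = { 0x55, 0xaa, 0x33, 0xcc, 0x0f, 0xf0 };

static unsigned char parity8(unsigned char b)
{
	b ^= b >> 4;
	b ^= b >> 2;
	b ^= b >> 1;
	return b & 1;
}

/* Per-byte approach: every cp bit is updated for every byte. */
unsigned char cp_per_byte(const unsigned char *buf, size_t len)
{
	unsigned char cp = 0;
	size_t i;
	int k;

	for (i = 0; i < len; i++)
		for (k = 0; k < 6; k++)
			cp ^= (unsigned char)(parity8(buf[i] & cp_mask[k]) << k);
	return cp;
}

/* Rearranged: xor all bytes first, filter out the relevant bits once. */
unsigned char cp_xor_first(const unsigned char *buf, size_t len)
{
	unsigned char par = 0, cp = 0;
	size_t i;
	int k;

	for (i = 0; i < len; i++)
		par ^= buf[i];
	for (k = 0; k < 6; k++)
		cp |= (unsigned char)(parity8(par & cp_mask[k]) << k);
	return cp;
}
```

Both functions return the same value for any buffer; the second one leaves
only a single xor per byte inside the loop.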
|
||||
|
||||
Similarly we can first xor the bytes for the various rows.
|
||||
This leads to:
|
||||
|
||||
|
||||
Attempt 1
|
||||
=========
|
||||
|
||||
::
|
||||
|
||||
const char parity[256] = {
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
|
||||
0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
|
||||
};
|
||||
|
||||
void ecc1(const unsigned char *buf, unsigned char *code)
|
||||
{
|
||||
int i;
|
||||
const unsigned char *bp = buf;
|
||||
unsigned char cur;
|
||||
unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
|
||||
unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
|
||||
unsigned char par;
|
||||
|
||||
par = 0;
|
||||
rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
|
||||
rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
|
||||
rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
|
||||
rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
|
||||
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
cur = *bp++;
|
||||
par ^= cur;
|
||||
if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
|
||||
if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
|
||||
if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
|
||||
if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
|
||||
if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
|
||||
if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
|
||||
if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
|
||||
if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
|
||||
}
|
||||
code[0] =
|
||||
(parity[rp7] << 7) |
|
||||
(parity[rp6] << 6) |
|
||||
(parity[rp5] << 5) |
|
||||
(parity[rp4] << 4) |
|
||||
(parity[rp3] << 3) |
|
||||
(parity[rp2] << 2) |
|
||||
(parity[rp1] << 1) |
|
||||
(parity[rp0]);
|
||||
code[1] =
|
||||
(parity[rp15] << 7) |
|
||||
(parity[rp14] << 6) |
|
||||
(parity[rp13] << 5) |
|
||||
(parity[rp12] << 4) |
|
||||
(parity[rp11] << 3) |
|
||||
(parity[rp10] << 2) |
|
||||
(parity[rp9] << 1) |
|
||||
(parity[rp8]);
|
||||
code[2] =
|
||||
(parity[par & 0xf0] << 7) |
|
||||
(parity[par & 0x0f] << 6) |
|
||||
(parity[par & 0xcc] << 5) |
|
||||
(parity[par & 0x33] << 4) |
|
||||
(parity[par & 0xaa] << 3) |
|
||||
(parity[par & 0x55] << 2);
|
||||
code[0] = ~code[0];
|
||||
code[1] = ~code[1];
|
||||
code[2] = ~code[2];
|
||||
}
|
||||
|
||||
Still pretty straightforward. The last three invert statements are there to
|
||||
give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
|
||||
all data is 0xff, so the checksum then matches.
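
The empty-flash property can be checked with a compact restatement of the
byte-wise scheme. This is a sketch for experimentation, not the driver code:
the function name is made up, the sector size is fixed at 256 bytes, and the
parity is folded with shifts instead of a lookup table.

```c
static unsigned char parity8(unsigned char b)
{
	b ^= b >> 4;
	b ^= b >> 2;
	b ^= b >> 1;
	return b & 1;
}

/* Compute the three ECC bytes for one 256-byte sector. */
void ecc_sketch(const unsigned char *buf, unsigned char code[3])
{
	unsigned char rp[16] = { 0 };
	unsigned char par = 0;
	unsigned char c0 = 0, c1 = 0, c2;
	int i, k;

	for (i = 0; i < 256; i++) {
		par ^= buf[i];
		/* bit k of the index picks rp(2k) or rp(2k+1) */
		for (k = 0; k < 8; k++)
			rp[2 * k + ((i >> k) & 1)] ^= buf[i];
	}
	for (k = 0; k < 8; k++) {
		c0 |= (unsigned char)(parity8(rp[k]) << k);
		c1 |= (unsigned char)(parity8(rp[k + 8]) << k);
	}
	c2 = (unsigned char)((parity8(par & 0xf0) << 7) |
			     (parity8(par & 0x0f) << 6) |
			     (parity8(par & 0xcc) << 5) |
			     (parity8(par & 0x33) << 4) |
			     (parity8(par & 0xaa) << 3) |
			     (parity8(par & 0x55) << 2));
	/* invert so that an erased (all 0xff) sector yields 0xff 0xff 0xff */
	code[0] = (unsigned char)~c0;
	code[1] = (unsigned char)~c1;
	code[2] = (unsigned char)~c2;
}
```

For a sector of all 0xff every xor accumulator ends up as 0 (each one
combines an even number of identical bytes), so all parity bits are 0 and
the inversion turns the three code bytes into 0xff.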
|
||||
|
||||
I also introduced the parity lookup. I expected this to be the fastest
|
||||
way to calculate the parity, but I will investigate alternatives later
|
||||
on.
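
The lookup table does not have to be typed in by hand. One way to generate
it, shown here as a sketch with made-up names, uses the fact that the parity
of i is the parity of i >> 1 xor the lowest bit of i:

```c
unsigned char parity_table[256];

/* Fill parity_table so that parity_table[i] is the even-parity bit of i. */
void build_parity_table(void)
{
	int i;

	parity_table[0] = 0;
	for (i = 1; i < 256; i++)
		parity_table[i] = parity_table[i >> 1] ^ (i & 1);
}
```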
|
||||
|
||||
|
||||
Analysis 1
|
||||
==========
|
||||
|
||||
The code works, but is not terribly efficient. On my system it took
|
||||
almost 4 times as much time as the linux driver code. But hey, if it was
|
||||
*that* easy this would have been done long before.
|
||||
No pain, no gain.
|
||||
|
||||
Fortunately there is plenty of room for improvement.
|
||||
|
||||
In step 1 we moved from bit-wise calculation to byte-wise calculation.
|
||||
However in C we can also use the unsigned long data type and virtually
|
||||
every modern microprocessor supports 32 bit operations, so why not try
|
||||
to write our code in such a way that we process data in 32 bit chunks.
|
||||
|
||||
Of course this means some modification as the row parity is byte by
|
||||
byte. A quick analysis:
|
||||
for the column parity we use the par variable. When extending to 32 bits
|
||||
we can in the end easily calculate rp0 and rp1 from it.
|
||||
(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
|
||||
respectively, from MSB to LSB)
|
||||
also rp2 and rp3 can be easily retrieved from par as rp3 covers the
|
||||
first two MSBs and rp2 covers the last two LSBs.
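
A sketch of that folding step, under the assumption that buffer byte 0 of
each 32 bit word sits in bits 0..7 (little-endian layout). The struct and
function names are invented for this example, and fixed-width types are
used so the sketch does not depend on the size of unsigned long.

```c
#include <stdint.h>

/* Byte-wide accumulators; the parity of each byte is taken afterwards. */
struct rp_low {
	uint8_t rp0, rp1, rp2, rp3;
};

/* Derive the rp0..rp3 accumulators from the xor of all 32 bit words. */
struct rp_low fold_par(uint32_t par)
{
	struct rp_low r;
	uint32_t hi = par >> 16;	/* bytes 2 and 3 of every word */
	uint32_t lo = par & 0xffff;	/* bytes 0 and 1 of every word */

	r.rp3 = (uint8_t)((hi ^ (hi >> 8)) & 0xff);
	r.rp2 = (uint8_t)((lo ^ (lo >> 8)) & 0xff);
	/* now bits 0..7 hold byte0 ^ byte2 and bits 8..15 hold byte1 ^ byte3 */
	par ^= par >> 16;
	r.rp1 = (uint8_t)((par >> 8) & 0xff);
	r.rp0 = (uint8_t)(par & 0xff);
	return r;
}
```

On a big-endian machine the roles of the bytes inside the word are
reversed, so the same shifts would produce the pairs swapped.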
|
||||
|
||||
Note that of course now the loop is executed only 64 times (256/4).
|
||||
And note that care must be taken wrt byte ordering. The way bytes are
|
||||
ordered in a long is machine dependent, and might affect us.
|
||||
Anyway, if there is an issue: this code is developed on x86 (to be
|
||||
precise: a DELL PC with a D920 Intel CPU)
|
||||
|
||||
And of course the performance might depend on alignment, but I expect
|
||||
that the I/O buffers in the nand driver are aligned properly (and
|
||||
otherwise that should be fixed to get maximum performance).
|
||||
|
||||
Let's give it a try...
|
||||
|
||||
|
||||
Attempt 2
|
||||
=========
|
||||
|
||||
::
|
||||
|
||||
extern const char parity[256];
|
||||
|
||||
void ecc2(const unsigned char *buf, unsigned char *code)
|
||||
{
|
||||
int i;
|
||||
const unsigned long *bp = (const unsigned long *)buf;
|
||||
unsigned long cur;
|
||||
unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
|
||||
unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
|
||||
unsigned long par;
|
||||
|
||||
par = 0;
|
||||
rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
|
||||
rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
|
||||
rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
|
||||
rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
|
||||
|
||||
for (i = 0; i < 64; i++)
|
||||
{
|
||||
cur = *bp++;
|
||||
par ^= cur;
|
||||
if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
|
||||
if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
|
||||
if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
|
||||
if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
|
||||
if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
|
||||
if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
|
||||
}
|
||||
/*
|
||||
we need to adapt the code generation for the fact that rp vars are now
|
||||
long; also the column parity calculation needs to be changed.
|
||||
we'll bring rp4 to 15 back to single byte entities by shifting and
|
||||
xoring
|
||||
*/
|
||||
rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
|
||||
rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
|
||||
rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
|
||||
rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
|
||||
rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
|
||||
rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
|
||||
rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
|
||||
rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
|
||||
rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
|
||||
rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
|
||||
rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
|
||||
rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
|
||||
rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
|
||||
rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
|
||||
par ^= (par >> 16);
|
||||
rp1 = (par >> 8); rp1 &= 0xff;
|
||||
rp0 = (par & 0xff);
|
||||
par ^= (par >> 8); par &= 0xff;
|
||||
|
||||
code[0] =
|
||||
(parity[rp7] << 7) |
|
||||
(parity[rp6] << 6) |
|
||||
(parity[rp5] << 5) |
|
||||
(parity[rp4] << 4) |
|
||||
(parity[rp3] << 3) |
|
||||
(parity[rp2] << 2) |
|
||||
(parity[rp1] << 1) |
|
||||
(parity[rp0]);
|
||||
code[1] =
|
||||
(parity[rp15] << 7) |
|
||||
(parity[rp14] << 6) |
|
||||
(parity[rp13] << 5) |
|
||||
(parity[rp12] << 4) |
|
||||
(parity[rp11] << 3) |
|
||||
(parity[rp10] << 2) |
|
||||
(parity[rp9] << 1) |
|
||||
(parity[rp8]);
|
||||
code[2] =
|
||||
(parity[par & 0xf0] << 7) |
|
||||
(parity[par & 0x0f] << 6) |
|
||||
(parity[par & 0xcc] << 5) |
|
||||
(parity[par & 0x33] << 4) |
|
||||
(parity[par & 0xaa] << 3) |
|
||||
(parity[par & 0x55] << 2);
|
||||
code[0] = ~code[0];
|
||||
code[1] = ~code[1];
|
||||
code[2] = ~code[2];
|
||||
}
|
||||
|
||||
The parity array is not shown any more. Note also that for these
|
||||
examples I kinda deviated from my regular programming style by allowing
|
||||
multiple statements on a line, not using { } in then and else blocks
|
||||
with only a single statement and by using operators like ^=.
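Since the parity array is no longer shown, here is a minimal sketch of how such a table could be generated, together with the shift-and-xor folding used above to bring a long back to a single byte. The names parity_tab, build_parity_table(), fold_to_byte() and word_parity() are assumptions made for this example, and uint32_t stands in for a 4-byte unsigned long.

```c
#include <stdint.h>

/* Hypothetical stand-in for the parity[256] table the text omits:
 * entry b is 1 when byte value b has an odd number of set bits. */
static unsigned char parity_tab[256];

static void build_parity_table(void)
{
    unsigned int b, v, p;

    for (b = 0; b < 256; b++) {
        p = 0;
        for (v = b; v; v >>= 1)
            p ^= v & 1;
        parity_tab[b] = (unsigned char)p;
    }
}

/* Fold a 32-bit accumulator into one byte, as done for rp4..rp15:
 * the result is the xor of the four bytes of x. */
static uint32_t fold_to_byte(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    return x & 0xff;
}

/* Parity of all 32 bits: fold first, then a single table lookup. */
static unsigned int word_parity(uint32_t x)
{
    return parity_tab[fold_to_byte(x)];
}
```

The folding works because xor-ing the bytes of a word together preserves the overall parity, so one lookup on the folded byte replaces four lookups on the separate bytes.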
|
||||
|
||||
|
||||
Analysis 2
|
||||
==========
|
||||
|
||||
The code (of course) works, and hurray: we are a little bit faster than
|
||||
the Linux driver code (about 15%). But wait, don't cheer too quickly.
|
||||
There is more to be gained.
|
||||
If we look at e.g. rp14 and rp15 we see that we either xor our data with
|
||||
rp14 or with rp15. However we also have par which goes over all data.
|
||||
This means there is no need to calculate rp14 as it can be calculated from
|
||||
rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
|
||||
(or if desired we can avoid calculating rp15 and calculate it from
|
||||
rp14). That is why some places refer to inverse parity.
|
||||
Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
|
||||
Effectively this means we can eliminate the else clause from the if
|
||||
statements. Also we can optimise the calculation in the end a little bit
|
||||
by going from long to byte first. Actually we can even avoid the table
|
||||
lookups.
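The relation par = rp14 ^ rp15 is easy to check in isolation. The sketch below is not the driver code; split_parity() is a hypothetical helper that mirrors a single if/else pair from attempt 2 for an arbitrary index bit.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper mirroring one pair of row parities from attempt 2.
 * Words whose index has 'bit' set are xored into *odd, the others into
 * *even; *par collects every word. */
static void split_parity(const uint32_t *w, size_t n, size_t bit,
                         uint32_t *even, uint32_t *odd, uint32_t *par)
{
    size_t i;

    *even = *odd = *par = 0;
    for (i = 0; i < n; i++) {
        *par ^= w[i];
        if (i & bit)
            *odd ^= w[i];
        else
            *even ^= w[i];
    }
}
```

Every word lands in exactly one of the two halves, so the two halves xored together always equal par, and either half can be recovered from par and the other one.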
|
||||
|
||||
Attempt 3
|
||||
=========
|
||||
|
||||
Odd replaced::
|
||||
|
||||
if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
|
||||
if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
|
||||
if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
|
||||
if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
|
||||
if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
|
||||
if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
|
||||
|
||||
with::
|
||||
|
||||
if (i & 0x01) rp5 ^= cur;
|
||||
if (i & 0x02) rp7 ^= cur;
|
||||
if (i & 0x04) rp9 ^= cur;
|
||||
if (i & 0x08) rp11 ^= cur;
|
||||
if (i & 0x10) rp13 ^= cur;
|
||||
if (i & 0x20) rp15 ^= cur;
|
||||
|
||||
and outside the loop added::
|
||||
|
||||
rp4 = par ^ rp5;
|
||||
rp6 = par ^ rp7;
|
||||
rp8 = par ^ rp9;
|
||||
rp10 = par ^ rp11;
|
||||
rp12 = par ^ rp13;
|
||||
rp14 = par ^ rp15;
|
||||
|
||||
And after that the code takes about 30% more time, although the number of
|
||||
statements is reduced. This is also reflected in the assembly code.
|
||||
|
||||
|
||||
Analysis 3
|
||||
==========
|
||||
|
||||
Very weird. Guess it has to do with caching or instruction parallelism
|
||||
or so. I also tried on an eeePC (Celeron, clocked at 900 MHz). Interesting
|
||||
observation was that this one is only 30% slower (according to time)
|
||||
executing the code than my 3 GHz D920 processor.
|
||||
|
||||
Well, it was expected not to be easy so maybe instead move to a
|
||||
different track: let's move back to the code from attempt 2 and do some
|
||||
loop unrolling. This will eliminate a few if statements. I'll try
|
||||
different amounts of unrolling to see what works best.
|
||||
|
||||
|
||||
Attempt 4
|
||||
=========
|
||||
|
||||
Unrolled the loop 1, 2, 3 and 4 times.
|
||||
For 4 the code starts with::
|
||||
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
cur = *bp++;
|
||||
par ^= cur;
|
||||
rp4 ^= cur;
|
||||
rp6 ^= cur;
|
||||
rp8 ^= cur;
|
||||
rp10 ^= cur;
|
||||
if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
|
||||
if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
|
||||
cur = *bp++;
|
||||
par ^= cur;
|
||||
rp5 ^= cur;
|
||||
rp6 ^= cur;
|
||||
...
|
||||
|
||||
|
||||
Analysis 4
|
||||
==========
|
||||
|
||||
Unrolling once gains about 15%
|
||||
|
||||
Unrolling twice keeps the gain at about 15%
|
||||
|
||||
Unrolling three times gives a gain of 30% compared to attempt 2.
|
||||
|
||||
Unrolling four times gives a marginal improvement compared to unrolling
|
||||
three times.
|
||||
|
||||
I decided to proceed with a four time unrolled loop anyway. It was my gut
|
||||
feeling that in the next steps I would obtain additional gain from it.
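To show why unrolling removes if statements, here is a reduced sketch that only tracks rp4 and rp5 and unrolls once. It is an illustration under simplifying assumptions, not the measured code; the function names rows_rolled() and rows_unrolled() are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Rolled loop: bit 0 of the index is tested for every word. */
static void rows_rolled(const uint32_t *w, size_t n,
                        uint32_t *rp4, uint32_t *rp5)
{
    size_t i;

    *rp4 = *rp5 = 0;
    for (i = 0; i < n; i++) {
        if (i & 0x01)
            *rp5 ^= w[i];
        else
            *rp4 ^= w[i];
    }
}

/* Unrolled once: two words per iteration, so bit 0 of the word index is
 * known at compile time and the test disappears. n must be even. */
static void rows_unrolled(const uint32_t *w, size_t n,
                          uint32_t *rp4, uint32_t *rp5)
{
    size_t i;

    *rp4 = *rp5 = 0;
    for (i = 0; i < n; i += 2) {
        *rp4 ^= w[i];       /* even index: bit 0 clear */
        *rp5 ^= w[i + 1];   /* odd index: bit 0 set */
    }
}
```

Each further doubling of the unroll factor fixes one more index bit, which is why the four times unrolled loop only needs the tests for rp12/13 and rp14/15.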
|
||||
|
||||
The next step was triggered by the fact that par contains the xor of all
|
||||
bytes and rp4 and rp5 each contain the xor of half of the bytes.
|
||||
So in effect par = rp4 ^ rp5. But as xor is commutative we can also say
|
||||
that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
|
||||
eliminate rp5 (or rp4, but I already foresaw another optimisation).
|
||||
The same holds for rp6/7, rp8/9, rp10/11, rp12/13 and rp14/15.
|
||||
|
||||
|
||||
Attempt 5
|
||||
=========
|
||||
|
||||
So effectively all odd-numbered rp assignments in the loop were removed.
|
||||
This included the else clause of the if statements.
|
||||
Of course after the loop we need to correct things by adding code like::
|
||||
|
||||
rp5 = par ^ rp4;
|
||||
|
||||
Also the initial assignments (rp5 = 0; etc) could be removed.
|
||||
Along the way I also removed the initialisation of rp0/1/2/3.
|
||||
|
||||
|
||||
Analysis 5
|
||||
==========
|
||||
|
||||
Measurements showed this was a good move. The run-time roughly halved
|
||||
compared with attempt 4 with 4 times unrolled, and we only require 1/3rd
|
||||
of the processor time compared to the current code in the Linux kernel.
|
||||
|
||||
However, still I thought there was more. I didn't like all the if
|
||||
statements. Why not keep a running parity and only keep the last if
|
||||
statement. Time for yet another version!
|
||||
|
||||
|
||||
Attempt 6
|
||||
=========
|
||||
|
||||
The code within the for loop was changed to::
|
||||
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
cur = *bp++; tmppar = cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
|
||||
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
|
||||
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp8 ^= cur;
|
||||
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur;
|
||||
|
||||
par ^= tmppar;
|
||||
if ((i & 0x1) == 0) rp12 ^= tmppar;
|
||||
if ((i & 0x2) == 0) rp14 ^= tmppar;
|
||||
}
|
||||
|
||||
As you can see tmppar is used to accumulate the parity within a for
|
||||
iteration. In the last 3 statements it is added to par and, if needed,
|
||||
to rp12 and rp14.
|
||||
|
||||
While making the changes I also found that I could exploit that tmppar
|
||||
contains the running parity for this iteration. So instead of having:
|
||||
rp4 ^= cur; rp6 ^= cur;
|
||||
I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on the next
|
||||
statement. A similar change was done for rp8 and rp10.
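The running parity idea can be checked on a reduced version of the loop. The sketch below unrolls only four times and tracks just rp4, rp6 and par; the function names rp_naive() and rp_running() are made up for this example, and it is meant as an illustration rather than the code that was measured.

```c
#include <stdint.h>
#include <stddef.h>

/* Reference: test the index bits for every single word. */
static void rp_naive(const uint32_t *w, size_t n,
                     uint32_t *rp4, uint32_t *rp6, uint32_t *par)
{
    size_t i;

    *rp4 = *rp6 = *par = 0;
    for (i = 0; i < n; i++) {
        *par ^= w[i];
        if (!(i & 0x01))
            *rp4 ^= w[i];
        if (!(i & 0x02))
            *rp6 ^= w[i];
    }
}

/* Unrolled with a running parity per iteration; n must be a multiple
 * of 4. After the second word tmppar holds word 0 ^ word 1, which is
 * exactly what rp6 needs, so one xor replaces two. */
static void rp_running(const uint32_t *w, size_t n,
                       uint32_t *rp4, uint32_t *rp6, uint32_t *par)
{
    uint32_t cur, tmppar;
    size_t i;

    *rp4 = *rp6 = *par = 0;
    for (i = 0; i < n; i += 4) {
        cur = w[i];     tmppar  = cur; *rp4 ^= cur;
        cur = w[i + 1]; tmppar ^= cur; *rp6 ^= tmppar;
        cur = w[i + 2]; tmppar ^= cur; *rp4 ^= cur;
        cur = w[i + 3]; tmppar ^= cur;
        *par ^= tmppar;
    }
}
```

The same reasoning gives rp8 ^= tmppar after the fourth word and rp10 ^= tmppar after the eighth word in the full loop.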
|
||||
|
||||
|
||||
Analysis 6
|
||||
==========
|
||||
|
||||
Measuring this code again showed a big gain. When executing the original
|
||||
Linux code 1 million times, this took about 1 second on my system
|
||||
(using time to measure the performance). After this iteration I was back
|
||||
to 0.075 sec. Actually I had to decide to start measuring over 10
|
||||
million iterations in order not to lose too much accuracy. This one
|
||||
definitely seemed to be the jackpot!
|
||||
|
||||
There is a little bit more room for improvement though. There are three
|
||||
places with statements::
|
||||
|
||||
rp4 ^= cur; rp6 ^= cur;
|
||||
|
||||
It seems more efficient to also maintain a variable rp4_6 in the for
|
||||
loop. This eliminates 3 statements per iteration. Of course after the loop we
|
||||
need to correct by adding::
|
||||
|
||||
rp4 ^= rp4_6;
|
||||
rp6 ^= rp4_6;
|
||||
|
||||
Furthermore there are 4 sequential assignments to rp8. This can be
|
||||
encoded slightly more efficiently by saving tmppar before those 4 lines
|
||||
and later do rp8 = rp8 ^ tmppar ^ notrp8;
|
||||
(where notrp8 is the value of tmppar before those 4 lines).
|
||||
Again a use of the commutative property of xor.
|
||||
Time for a new test!
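The rp8 shortcut described above rests on the fact that the xor of a run of words equals the xor of two snapshots of the running parity, one taken before the run and one after. A minimal sketch, with the hypothetical names xor_range() and xor_range_from_running(), and assuming from < to:

```c
#include <stdint.h>

/* Xor of w[from] .. w[to - 1], accumulated one word at a time. */
static uint32_t xor_range(const uint32_t *w, unsigned int from,
                          unsigned int to)
{
    uint32_t acc = 0;
    unsigned int i;

    for (i = from; i < to; i++)
        acc ^= w[i];
    return acc;
}

/* Same value from two snapshots of a running parity. The words before
 * 'from' are present in both snapshots and cancel out. Requires
 * from < to. */
static uint32_t xor_range_from_running(const uint32_t *w, unsigned int from,
                                       unsigned int to)
{
    uint32_t tmppar = 0, saved = 0;
    unsigned int i;

    for (i = 0; i < to; i++) {
        if (i == from)
            saved = tmppar;     /* plays the role of notrp8 */
        tmppar ^= w[i];
    }
    return tmppar ^ saved;
}
```

In the loop this trades four xors into rp8 for one assignment and two xors.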
|
||||
|
||||
|
||||
Attempt 7
|
||||
=========
|
||||
|
||||
The new code now looks like::
|
||||
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
cur = *bp++; tmppar = cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
|
||||
|
||||
cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
|
||||
|
||||
notrp8 = tmppar;
|
||||
cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur;
|
||||
rp8 = rp8 ^ tmppar ^ notrp8;
|
||||
|
||||
cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp6 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur; rp4 ^= cur;
|
||||
cur = *bp++; tmppar ^= cur;
|
||||
|
||||
par ^= tmppar;
|
||||
if ((i & 0x1) == 0) rp12 ^= tmppar;
|
||||
if ((i & 0x2) == 0) rp14 ^= tmppar;
|
||||
}
|
||||
rp4 ^= rp4_6;
|
||||
rp6 ^= rp4_6;
|
||||
|
||||
|
||||
Not a big change, but every penny counts :-)
|
||||
|
||||
|
||||
Analysis 7
|
||||
==========
|
||||
|
||||
Actually this made things worse. Not very much, but I don't want to move
|
||||
in the wrong direction. Maybe something to investigate later. Could
|
||||
have to do with caching again.
|
||||
|
||||
Guess that is what there is to win within the loop. Maybe unrolling one
|
||||
more time will help. I'll keep the optimisations from 7 for now.
|
||||
|
||||
|
||||
Attempt 8
|
||||
=========
|
||||
|
||||
Unrolled the loop one more time.
|
||||
|
||||
|
||||
Analysis 8
|
||||
==========
|
||||
|
||||
This makes things worse. Let's stick with attempt 6 and continue from there.
|
||||
Although it seems that the code within the loop cannot be optimised
|
||||
further there is still room to optimize the generation of the ecc codes.
|
||||
We can simply calculate the total parity. If this is 0 then rp4 = rp5
|
||||
etc. If the parity is 1, then rp4 = !rp5;
|
||||
|
||||
But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
|
||||
in the result byte and then do something like::
|
||||
|
||||
code[0] |= (code[0] << 1);
|
||||
|
||||
Let's test this.
|
||||
|
||||
|
||||
Attempt 9
|
||||
=========
|
||||
|
||||
Changed the code but again this slightly degrades performance. Tried all
|
||||
kinds of other things, like having dedicated parity arrays to avoid the
|
||||
shift after parity[rp7] << 7; No gain.
|
||||
Changed the lookup using the parity array to shift operators, e.g.
|
||||
replace parity[rp7] << 7 with::
|
||||
|
||||
rp7 ^= (rp7 << 4);
|
||||
rp7 ^= (rp7 << 2);
|
||||
rp7 ^= (rp7 << 1);
|
||||
rp7 &= 0x80;
|
||||
|
||||
No gain.
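For reference, the shift sequence above does compute the same value as the table lookup followed by the shift. The sketch below checks this against a plain bit-counting loop; parity_to_bit7() and parity_ref() are names invented for this example.

```c
/* Parity of the low 8 bits of v, delivered directly in bit 7. For a
 * byte-sized v this equals parity[v] << 7. */
static unsigned int parity_to_bit7(unsigned int v)
{
    v ^= (v << 4);      /* bit 7 now holds b7 ^ b3 */
    v ^= (v << 2);      /* bit 7 now holds b7 ^ b5 ^ b3 ^ b1 */
    v ^= (v << 1);      /* bit 7 now holds the xor of all 8 bits */
    return v & 0x80;
}

/* Reference: xor the bits of the low byte one at a time. */
static unsigned int parity_ref(unsigned int v)
{
    unsigned int p = 0;

    for (v &= 0xff; v; v >>= 1)
        p ^= v & 1;
    return p;
}
```

Left shifts only move lower bits upwards, so bits above bit 7 never influence the result and the final mask is enough.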
|
||||
|
||||
The only marginal change was inverting the parity bits, so we can remove
|
||||
the last three invert statements.
|
||||
|
||||
Ah well, pity this does not deliver more. Then again 10 million
|
||||
iterations using the Linux driver code takes between 13 and 13.5
|
||||
seconds, whereas my code now takes about 0.73 seconds for those 10
|
||||
million iterations. So basically I've improved the performance by a
|
||||
factor of 18 on my system. Not that bad. Of course on different hardware
|
||||
you will get different results. No warranties!
|
||||
|
||||
But of course there is no such thing as a free lunch. The code size almost
|
||||
tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
|
||||
|
||||
|
||||
Correcting errors
|
||||
=================
|
||||
|
||||
For correcting errors I again used the ST application note as a starter,
|
||||
but I also peeked at the existing code.
|
||||
|
||||
The algorithm itself is pretty straightforward. Just xor the given and
|
||||
the calculated ecc. If all bytes are 0 there is no problem. If 11 bits
|
||||
are 1 we have one correctable bit error. If exactly 1 bit is 1, we have an
|
||||
error in the given ecc code.
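The decision rule above can be sketched as follows. This is a simplified illustration and not the kernel function: the enum, popcount8() and ecc_classify() are hypothetical names, the rule looks only at the bit count described in the text, and treating every other count as uncorrectable is my assumption. Locating and flipping the faulty bit is left out.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum ecc_verdict {
    ECC_OK,                 /* stored and calculated ecc are identical */
    ECC_DATA_BIT_ERROR,     /* 11 bits differ: one correctable bit error */
    ECC_CODE_BIT_ERROR,     /* 1 bit differs: the stored ecc is damaged */
    ECC_UNCORRECTABLE       /* assumption: anything else */
};

/* Count the set bits in one byte. */
static int popcount8(uint8_t v)
{
    int n = 0;

    while (v) {
        n += v & 1;
        v >>= 1;
    }
    return n;
}

/* Xor the given and the calculated ecc and classify by bit count. */
static enum ecc_verdict ecc_classify(const uint8_t stored[3],
                                     const uint8_t calc[3])
{
    int bits = 0;
    int i;

    for (i = 0; i < 3; i++)
        bits += popcount8((uint8_t)(stored[i] ^ calc[i]));

    if (bits == 0)
        return ECC_OK;
    if (bits == 11)
        return ECC_DATA_BIT_ERROR;
    if (bits == 1)
        return ECC_CODE_BIT_ERROR;
    return ECC_UNCORRECTABLE;
}
```

The count of 11 follows from the pairing discussed earlier: a single flipped data bit changes exactly one parity of each even/odd pair, and a 256 byte block has 11 such pairs.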
|
||||
|
||||
It proved to be fastest to do some table lookups. Performance gain
|
||||
introduced by this is about a factor of 2 on my system when a repair had to
|
||||
be done, and 1% or so if no repair had to be done.
|
||||
|
||||
Code size increased from 330 bytes to 686 bytes for this function.
|
||||
(gcc 4.2, -O3)
|
||||
|
||||
|
||||
Conclusion
|
||||
==========
|
||||
|
||||
The gain when calculating the ecc is tremendous. On my development hardware
|
||||
a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
|
||||
embedded system with a MIPS core a factor of 7 was obtained.
|
||||
|
||||
On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
|
||||
5 (big endian mode, gcc 4.1.2, -O3)
|
||||
|
||||
For correction not much gain could be obtained (as bitflips are rare). Then
|
||||
again there are also far fewer cycles spent there.
|
||||
|
||||
It seems there is not much more gain possible in this, at least when
|
||||
programmed in C. Of course it might be possible to squeeze something more
|
||||
out of it with an assembler program, but due to pipeline behaviour etc
|
||||
this is very tricky (at least for Intel hardware).
|
||||
|
||||
Author: Frans Meulenbroeks
|
||||
|
||||
Copyright (C) 2008 Koninklijke Philips Electronics NV.
|
66
Documentation/driver-api/mtd/spi-nor.rst
Normal file
@@ -0,0 +1,66 @@
|
||||
=================
|
||||
SPI NOR framework
|
||||
=================
|
||||
|
||||
Part I - Why do we need this framework?
|
||||
---------------------------------------
|
||||
|
||||
SPI bus controllers (drivers/spi/) only deal with streams of bytes; the bus
|
||||
controller operates agnostic of the specific device attached. However, some
|
||||
controllers (such as Freescale's QuadSPI controller) cannot easily handle
|
||||
arbitrary streams of bytes, but rather are designed specifically for SPI NOR.
|
||||
|
||||
In particular, Freescale's QuadSPI controller must know the NOR commands to
|
||||
find the right LUT sequence. Unfortunately, the SPI subsystem has no notion of
|
||||
opcodes, addresses, or data payloads; a SPI controller simply knows to send or
|
||||
receive bytes (Tx and Rx). Therefore, we must define a new layering scheme under
|
||||
which the controller driver is aware of the opcodes, addressing, and other
|
||||
details of the SPI NOR protocol.
|
||||
|
||||
Part II - How does the framework work?
|
||||
--------------------------------------
|
||||
|
||||
This framework just adds a new layer between the MTD and the SPI bus driver.
|
||||
With this new layer, the SPI NOR controller driver does not depend on the
|
||||
m25p80 code anymore.
|
||||
|
||||
Before this framework, the layer is like::
|
||||
|
||||
MTD
|
||||
------------------------
|
||||
m25p80
|
||||
------------------------
|
||||
SPI bus driver
|
||||
------------------------
|
||||
SPI NOR chip
|
||||
|
||||
After this framework, the layer is like:
|
||||
MTD
|
||||
------------------------
|
||||
SPI NOR framework
|
||||
------------------------
|
||||
m25p80
|
||||
------------------------
|
||||
SPI bus driver
|
||||
------------------------
|
||||
SPI NOR chip
|
||||
|
||||
With the SPI NOR controller driver (Freescale QuadSPI), it looks like:
|
||||
MTD
|
||||
------------------------
|
||||
SPI NOR framework
|
||||
------------------------
|
||||
fsl-quadSPI
|
||||
------------------------
|
||||
SPI NOR chip
|
||||
|
||||
Part III - How can drivers use the framework?
|
||||
---------------------------------------------
|
||||
|
||||
The main API is spi_nor_scan(). Before calling it, a driver should
|
||||
initialize the necessary fields for spi_nor{}. Please see
|
||||
drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
|
||||
when you want to write a new driver for a SPI NOR controller.
|
||||
Another API is spi_nor_restore(), which is used to restore the status of the SPI
|
||||
flash chip, such as its addressing mode. Call it whenever you detach the driver from
|
||||
the device or reboot the system.
|
11
Documentation/driver-api/nfc/index.rst
Normal file
@@ -0,0 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================
|
||||
Near Field Communication
|
||||
========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
nfc-hci
|
||||
nfc-pn544
|
311
Documentation/driver-api/nfc/nfc-hci.rst
Normal file
@@ -0,0 +1,311 @@
|
||||
========================
|
||||
HCI backend for NFC Core
|
||||
========================
|
||||
|
||||
- Author: Eric Lapuyade, Samuel Ortiz
|
||||
- Contact: eric.lapuyade@intel.com, samuel.ortiz@intel.com
|
||||
|
||||
General
|
||||
-------
|
||||
|
||||
The HCI layer implements much of the ETSI TS 102 622 V10.2.0 specification. It
|
||||
enables easy writing of HCI-based NFC drivers. The HCI layer runs as an NFC Core
|
||||
backend, implementing an abstract nfc device and translating NFC Core API
|
||||
to HCI commands and events.
|
||||
|
||||
HCI
|
||||
---
|
||||
|
||||
HCI registers as an nfc device with NFC Core. Requests coming from userspace are
|
||||
routed through netlink sockets to NFC Core and then to HCI. From this point,
|
||||
they are translated into a sequence of HCI commands sent to the HCI layer in the
|
||||
host controller (the chip). Commands can be executed synchronously (the sending
|
||||
context blocks waiting for response) or asynchronously (the response is returned
|
||||
from HCI Rx context).
|
||||
HCI events can also be received from the host controller. They will be handled
|
||||
and a translation will be forwarded to NFC Core as needed. There are hooks to
|
||||
let the HCI driver handle proprietary events or override standard behavior.
|
||||
HCI uses 2 execution contexts:
|
||||
|
||||
- one for executing commands: nfc_hci_msg_tx_work(). Only one command
|
||||
can be executing at any given moment.
|
||||
- one for dispatching received events and commands: nfc_hci_msg_rx_work().
|
||||
|
||||
HCI Session initialization
|
||||
--------------------------
|
||||
|
||||
The Session initialization is an HCI standard which must unfortunately
|
||||
support proprietary gates. This is the reason why the driver will pass a list
|
||||
of proprietary gates that must be part of the session. HCI will ensure all
|
||||
those gates have pipes connected when the hci device is set up.
|
||||
In case the chip supports pre-opened gates and pseudo-static pipes, the driver
|
||||
can pass that information to HCI core.
|
||||
|
||||
HCI Gates and Pipes
|
||||
-------------------
|
||||
|
||||
A gate defines the 'port' where some service can be found. In order to access
|
||||
a service, one must create a pipe to that gate and open it. In this
|
||||
implementation, pipes are totally hidden. The public API only knows gates.
|
||||
This is consistent with the driver need to send commands to proprietary gates
|
||||
without knowing the pipe connected to it.
|
||||
|
||||
Driver interface
|
||||
----------------
|
||||
|
||||
A driver is generally written in two parts: the physical link management and
|
||||
the HCI management. This makes it easier to maintain a driver for a chip that
|
||||
can be connected using various phy (i2c, spi, ...).
|
||||
|
||||
HCI Management
|
||||
--------------
|
||||
|
||||
A driver would normally register itself with HCI and provide the following
|
||||
entry points::
|
||||
|
||||
struct nfc_hci_ops {
|
||||
int (*open)(struct nfc_hci_dev *hdev);
|
||||
void (*close)(struct nfc_hci_dev *hdev);
|
||||
int (*hci_ready) (struct nfc_hci_dev *hdev);
|
||||
int (*xmit) (struct nfc_hci_dev *hdev, struct sk_buff *skb);
|
||||
int (*start_poll) (struct nfc_hci_dev *hdev,
|
||||
u32 im_protocols, u32 tm_protocols);
|
||||
int (*dep_link_up)(struct nfc_hci_dev *hdev, struct nfc_target *target,
|
||||
u8 comm_mode, u8 *gb, size_t gb_len);
|
||||
int (*dep_link_down)(struct nfc_hci_dev *hdev);
|
||||
int (*target_from_gate) (struct nfc_hci_dev *hdev, u8 gate,
|
||||
struct nfc_target *target);
|
||||
int (*complete_target_discovered) (struct nfc_hci_dev *hdev, u8 gate,
|
||||
struct nfc_target *target);
|
||||
int (*im_transceive) (struct nfc_hci_dev *hdev,
|
||||
struct nfc_target *target, struct sk_buff *skb,
|
||||
data_exchange_cb_t cb, void *cb_context);
|
||||
int (*tm_send)(struct nfc_hci_dev *hdev, struct sk_buff *skb);
|
||||
int (*check_presence)(struct nfc_hci_dev *hdev,
|
||||
struct nfc_target *target);
|
||||
int (*event_received)(struct nfc_hci_dev *hdev, u8 gate, u8 event,
|
||||
struct sk_buff *skb);
|
||||
};
|
||||
|
||||
- open() and close() shall turn the hardware on and off.
|
||||
- hci_ready() is an optional entry point that is called right after the hci
|
||||
session has been set up. The driver can use it to do additional initialization
|
||||
that must be performed using HCI commands.
|
||||
- xmit() shall simply write a frame to the physical link.
|
||||
- start_poll() is an optional entry point that shall set the hardware in polling
|
||||
mode. This must be implemented only if the hardware uses proprietary gates or a
|
||||
mechanism slightly different from the HCI standard.
|
||||
- dep_link_up() is called after a p2p target has been detected, to finish
|
||||
the p2p connection setup with hardware parameters that need to be passed back
|
||||
to nfc core.
|
||||
- dep_link_down() is called to bring the p2p link down.
|
||||
- target_from_gate() is an optional entry point to return the nfc protocols
|
||||
corresponding to a proprietary gate.
|
||||
- complete_target_discovered() is an optional entry point to let the driver
|
||||
perform additional proprietary processing necessary to auto activate the
|
||||
discovered target.
|
||||
- im_transceive() must be implemented by the driver if proprietary HCI commands
|
||||
are required to send data to the tag. Some tag types will require custom
|
||||
commands, others can be written to using the standard HCI commands. The driver
|
||||
can check the tag type and either do proprietary processing, or return 1 to ask
|
||||
for standard processing. The data exchange command itself must be sent
|
||||
asynchronously.
|
||||
- tm_send() is called to send data in the case of a p2p connection.
|
||||
- check_presence() is an optional entry point that will be called regularly
|
||||
by the core to check that an activated tag is still in the field. If this is
|
||||
not implemented, the core will not be able to push tag_lost events to the user
|
||||
space.
|
||||
- event_received() is called to handle an event coming from the chip. The driver
|
||||
can handle the event or return 1 to let HCI attempt standard processing.
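The "return 1 to ask for standard processing" convention used by im_transceive() and event_received() can be illustrated with a self-contained mock. Everything below is hypothetical and heavily simplified: the mock_ types and functions are not the kernel's structures or its dispatch code, they only show the shape of the contract.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct mock_hci_dev;

struct mock_hci_ops {
    /* optional entry point: may be NULL */
    int (*event_received)(struct mock_hci_dev *hdev,
                          unsigned char gate, unsigned char event);
};

struct mock_hci_dev {
    const struct mock_hci_ops *ops;
    int standard_handled;   /* counts events left to the core */
};

/* Core side: give the driver first refusal, fall back when it returns 1. */
static int mock_hci_dispatch_event(struct mock_hci_dev *hdev,
                                   unsigned char gate, unsigned char event)
{
    int r = 1;

    if (hdev->ops && hdev->ops->event_received)
        r = hdev->ops->event_received(hdev, gate, event);
    if (r != 1)
        return r;   /* handled by the driver (0) or failed (negative) */

    hdev->standard_handled++;   /* standard processing would go here */
    return 0;
}

/* Driver side: consume one made-up proprietary event, decline the rest. */
#define MOCK_PROPRIETARY_EVT 0x40

static int mock_drv_event_received(struct mock_hci_dev *hdev,
                                   unsigned char gate, unsigned char event)
{
    (void)hdev;
    (void)gate;
    return event == MOCK_PROPRIETARY_EVT ? 0 : 1;
}
```

The point of the convention is that a driver only has to implement the cases that differ from the standard, and a missing entry point behaves like one that always returns 1.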
|
||||
|
||||
On the rx path, the driver is responsible for pushing incoming HCP frames to HCI
|
||||
using nfc_hci_recv_frame(). HCI will take care of re-aggregation and handling.
|
||||
This must be done from a context that can sleep.
|
||||
|
||||
PHY Management
|
||||
--------------
|
||||
|
||||
The physical link (i2c, ...) management is defined by the following structure::
|
||||
|
||||
struct nfc_phy_ops {
|
||||
int (*write)(void *dev_id, struct sk_buff *skb);
|
||||
int (*enable)(void *dev_id);
|
||||
void (*disable)(void *dev_id);
|
||||
};
|
||||
|
||||
enable():
|
||||
turn the phy on (power on), make it ready to transfer data
|
||||
disable():
|
||||
turn the phy off
|
||||
write():
|
||||
Send a data frame to the chip. Note that to enable higher
|
||||
layers such as an llc to store the frame for re-emission, this
|
||||
function must not alter the skb. It must also not return a positive
|
||||
result (return 0 for success, negative for failure).
|
||||
|
||||
Data coming from the chip shall be sent directly to nfc_hci_recv_frame().
|
||||
|
||||
LLC
|
||||
---
|
||||
|
||||
Communication between the CPU and the chip often requires some link layer
|
||||
protocol. Those are isolated as modules managed by the HCI layer. There are
|
||||
currently two modules: nop (raw transfer) and shdlc.
|
||||
A new llc must implement the following functions::
|
||||
|
||||
struct nfc_llc_ops {
|
||||
void *(*init) (struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv,
|
||||
rcv_to_hci_t rcv_to_hci, int tx_headroom,
|
||||
int tx_tailroom, int *rx_headroom, int *rx_tailroom,
|
||||
llc_failure_t llc_failure);
|
||||
void (*deinit) (struct nfc_llc *llc);
|
||||
int (*start) (struct nfc_llc *llc);
|
||||
int (*stop) (struct nfc_llc *llc);
|
||||
void (*rcv_from_drv) (struct nfc_llc *llc, struct sk_buff *skb);
|
||||
int (*xmit_from_hci) (struct nfc_llc *llc, struct sk_buff *skb);
|
||||
};
|
||||
|
||||
init():
|
||||
allocate and init your private storage
|
||||
deinit():
|
||||
cleanup
|
||||
start():
|
||||
establish the logical connection
|
||||
stop():
|
||||
terminate the logical connection
|
||||
rcv_from_drv():
|
||||
handle data coming from the chip, going to HCI
|
||||
xmit_from_hci():
|
||||
handle data sent by HCI, going to the chip
|
||||
|
||||
The llc must be registered with nfc before it can be used. Do that by
|
||||
calling::
|
||||
|
||||
nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
|
||||
|
||||
Again, note that the llc does not handle the physical link. It is thus very
|
||||
easy to mix any physical link with any llc for a given chip driver.
|
||||
|
||||
Included Drivers
|
||||
----------------
|
||||
|
||||
An HCI based driver for an NXP PN544, connected through I2C bus, and using
|
||||
shdlc is included.
|
||||
|
||||
Execution Contexts
|
||||
------------------
|
||||
|
||||
The execution contexts are the following:
|
||||
- IRQ handler (IRQH):
|
||||
fast, cannot sleep. Sends incoming frames to HCI where they are passed to
|
||||
the current llc. In case of shdlc, the frame is queued in shdlc rx queue.
|
||||
|
||||
- SHDLC State Machine worker (SMW)
|
||||
|
||||
Only when llc_shdlc is used: handles shdlc rx & tx queues.
|
||||
|
||||
Dispatches HCI cmd responses.
|
||||
|
||||
- HCI Tx Cmd worker (MSGTXWQ)
|
||||
|
||||
Serializes execution of HCI commands.
|
||||
|
||||
Completes execution in case of response timeout.
|
||||
|
||||
- HCI Rx worker (MSGRXWQ)
|
||||
|
||||
Dispatches incoming HCI commands or events.
|
||||
|
||||
- Syscall context from a userspace call (SYSCALL)
|
||||
|
||||
Any entrypoint in HCI called from NFC Core
|
||||
|
||||
Workflow executing an HCI command (using shdlc)
|
||||
-----------------------------------------------
|
||||
|
||||
Executing an HCI command can easily be performed synchronously using the
|
||||
following API::
|
||||
|
||||
int nfc_hci_send_cmd (struct nfc_hci_dev *hdev, u8 gate, u8 cmd,
|
||||
const u8 *param, size_t param_len, struct sk_buff **skb)
|
||||
|
||||
The API must be invoked from a context that can sleep. Most of the time, this
|
||||
will be the syscall context. skb will return the result that was received in
|
||||
the response.
|
||||
|
||||
Internally, execution is asynchronous. So all this API does is to enqueue the
|
||||
HCI command, set up a local wait queue on the stack, and wait_event() for completion.
|
||||
The wait is not interruptible because it is guaranteed that the command will
|
||||
complete after some short timeout anyway.
|
||||
|
||||
MSGTXWQ context will then be scheduled and invoke nfc_hci_msg_tx_work().
|
||||
This function will dequeue the next pending command and send its HCP fragments
|
||||
to the lower layer which happens to be shdlc. It will then start a timer to be
|
||||
able to complete the command with a timeout error if no response arrives.
|
||||
|
||||
SMW context gets scheduled and invokes nfc_shdlc_sm_work(). This function
|
||||
handles shdlc framing in and out. It uses the driver xmit to send frames and
|
||||
receives incoming frames in an skb queue filled from the driver IRQ handler.
|
||||
The payloads of SHDLC I(nformation) frames are HCP fragments. They are aggregated to
|
||||
form complete HCI frames, which can be a response, command, or event.
|
||||
|
||||
HCI Responses are dispatched immediately from this context to unblock
|
||||
waiting command execution. Response processing involves invoking the completion
|
||||
callback that was provided by nfc_hci_msg_tx_work() when it sent the command.
|
||||
The completion callback will then wake the syscall context.
|
||||
|
||||
It is also possible to execute the command asynchronously using this API::
|
||||
|
||||
static int nfc_hci_execute_cmd_async(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
|
||||
const u8 *param, size_t param_len,
|
||||
data_exchange_cb_t cb, void *cb_context)
|
||||
|
||||
The workflow is the same, except that the API call returns immediately, and
|
||||
the callback will be called with the result from the SMW context.
|
||||
|
||||
Workflow receiving an HCI event or command
|
||||
------------------------------------------
|
||||
|
||||
HCI commands or events are not dispatched from SMW context. Instead, they are
|
||||
queued to HCI rx_queue and will be dispatched from HCI rx worker
|
||||
context (MSGRXWQ). This is done this way to allow a cmd or event handler
|
||||
to also execute other commands (for example, handling the
|
||||
NFC_HCI_EVT_TARGET_DISCOVERED event from PN544 requires issuing an
|
||||
ANY_GET_PARAMETER to the reader A gate to get information on the target
|
||||
that was discovered).
|
||||
|
||||
Typically, such an event will be propagated to NFC Core from MSGRXWQ context.
|
||||
|
||||
Error management
|
||||
----------------
|
||||
|
||||
Errors that occur synchronously with the execution of an NFC Core request are
|
||||
simply returned as the execution result of the request. These are easy.
|
||||
|
||||
Errors that occur asynchronously (e.g. in a background protocol handling thread)
|
||||
must be reported such that upper layers don't stay ignorant that something
|
||||
went wrong below and know that expected events will probably never happen.
|
||||
Handling of these errors is done as follows:
|
||||
|
||||
- driver (pn544) fails to deliver an incoming frame: it stores the error such
|
||||
that any subsequent call to the driver will result in this error. Then it
|
||||
calls the standard nfc_shdlc_recv_frame() with a NULL argument to report the
|
||||
problem above. shdlc stores an EREMOTEIO sticky status, which will trigger
|
||||
SMW to report above in turn.
|
||||
|
||||
- SMW is basically a background thread to handle incoming and outgoing shdlc
|
||||
frames. This thread will also check the shdlc sticky status and report to HCI
|
||||
when it discovers it is not able to run anymore because of an unrecoverable
|
||||
error that happened within shdlc or below. If the problem occurs during shdlc
|
||||
connection, the error is reported through the connect completion.
|
||||
|
||||
- HCI: if an internal HCI error happens (frame is lost), or HCI is notified of an
|
||||
error from a lower layer, HCI will either complete the currently executing
|
||||
command with that error, or notify NFC Core directly if no command is
|
||||
executing.
|
||||
|
||||
- NFC Core: when NFC Core is notified of an error from below and polling is
|
||||
active, it will send a tag discovered event with an empty tag list to the user
|
||||
space to let it know that the poll operation will never be able to detect a
|
||||
tag. If polling is not active and the error was sticky, lower levels will
|
||||
return it at next invocation.
|
34
Documentation/driver-api/nfc/nfc-pn544.rst
Normal file
@@ -0,0 +1,34 @@
|
||||
============================================================================
|
||||
Kernel driver for the NXP Semiconductors PN544 Near Field Communication chip
|
||||
============================================================================
|
||||
|
||||
|
||||
General
|
||||
-------
|
||||
|
||||
The PN544 is an integrated transmission module for contactless
|
||||
communication. The driver goes under drivers/nfc/ and is compiled as a
|
||||
module named "pn544".
|
||||
|
||||
Host interfaces: I2C, SPI and HSU; this driver currently supports only I2C.
|
||||
|
||||
Protocols
|
||||
---------
|
||||
|
||||
In the normal (HCI) mode and in the firmware update mode read and
|
||||
write functions behave a bit differently because the message formats
|
||||
or the protocols are different.
|
||||
|
||||
In the normal (HCI) mode the protocol used is derived from the ETSI
|
||||
HCI specification. The firmware is updated using a specific protocol,
|
||||
which is different from HCI.
|
||||
|
||||
HCI messages consist of an eight-bit header and the message body. The
|
||||
header contains the message length. Maximum size for an HCI message is
|
||||
33. In HCI mode sent messages are tested for a correct
|
||||
checksum. Firmware update messages have the length in the second (MSB)
|
||||
and third (LSB) bytes of the message. The maximum FW message length is
|
||||
1024 bytes.
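
The two length encodings described above can be sketched in C as follows.
This is only an illustration and is not taken from the pn544 driver: the
helper names are hypothetical, and the sketch merely extracts the length
fields without interpreting what they count.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * HCI mode: the first byte is the eight-bit header holding the length.
 * Returns the length field, or -1 if the buffer is empty.
 * (Hypothetical helper, for illustration only.)
 */
int pn544_hci_len_field(const uint8_t *buf, size_t buflen)
{
	if (buf == NULL || buflen < 1)
		return -1;
	return buf[0];
}

/*
 * Firmware update mode: the length sits in the second (MSB) and third
 * (LSB) bytes. Returns -1 if the buffer is too short to hold them.
 * (Hypothetical helper, for illustration only.)
 */
int pn544_fw_len_field(const uint8_t *buf, size_t buflen)
{
	if (buf == NULL || buflen < 3)
		return -1;
	return (buf[1] << 8) | buf[2];
}
```

A caller would typically compare the returned value with the maximum sizes
quoted above before trusting the rest of the buffer.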
|
||||
|
||||
For the ETSI HCI specification see
|
||||
http://www.etsi.org/WebSite/Technologies/ProtocolSpecification.aspx
|
236
Documentation/driver-api/ntb.rst
Normal file
@@ -0,0 +1,236 @@
|
||||
===========
|
||||
NTB Drivers
|
||||
===========
|
||||
|
||||
NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
|
||||
the separate memory systems of two or more computers to the same PCI-Express
|
||||
fabric. Existing NTB hardware supports a common feature set: doorbell
|
||||
registers and memory translation windows, as well as non-common features like
|
||||
scratchpad and message registers. Scratchpad registers are read-and-writable
|
||||
registers that are accessible from either side of the device, so that peers can
|
||||
exchange a small amount of information at a fixed address. Message registers can
|
||||
be utilized for the same purpose. Additionally, they are provided with
|
||||
special status bits to make sure the information isn't rewritten by another
|
||||
peer. Doorbell registers provide a way for peers to send interrupt events.
|
||||
Memory windows allow translated read and write access to the peer memory.
|
||||
|
||||
NTB Core Driver (ntb)
|
||||
=====================
|
||||
|
||||
The NTB core driver defines an api wrapping the common feature set, and allows
|
||||
clients interested in NTB features to discover the NTB devices supported by
|
||||
hardware drivers. The term "client" is used here to mean an upper layer
|
||||
component making use of the NTB api. The term "driver," or "hardware driver,"
|
||||
is used here to mean a driver for a specific vendor and model of NTB hardware.
|
||||
|
||||
NTB Client Drivers
|
||||
==================
|
||||
|
||||
NTB client drivers should register with the NTB core driver. After
|
||||
registering, the client probe and remove functions will be called appropriately
|
||||
as ntb hardware, or hardware drivers, are inserted and removed. The
|
||||
registration uses the Linux Device framework, so it should feel familiar to
|
||||
anyone who has written a pci driver.
|
||||
|
||||
NTB Typical client driver implementation
|
||||
----------------------------------------
|
||||
|
||||
The primary purpose of NTB is to share some piece of memory between at least two
|
||||
systems. So the NTB device features like Scratchpad/Message registers are
|
||||
mainly used to perform the proper memory window initialization. Typically
|
||||
there are two types of memory window interfaces supported by the NTB API:
|
||||
inbound translation configured on the local ntb port and outbound translation
|
||||
configured by the peer, on the peer ntb port. The first type is
|
||||
depicted in the following figure::
|
||||
|
||||
  Inbound translation:

   Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
    ____________
   | dma-mapped |-ntb_mw_set_trans(addr)  |
   | memory     |        _v____________   |   ______________
   | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
   |------------|       |--------------|  |  |--------------|
|
||||
|
||||
So a typical scenario of the first type of memory window initialization looks like:
|
||||
1) allocate a memory region, 2) put translated address to NTB config,
|
||||
3) somehow notify a peer device of performed initialization, 4) peer device
|
||||
maps the corresponding outbound memory window so as to have access to the shared
|
||||
memory region.
|
||||
|
||||
The second type of interface, which implies the shared windows being
|
||||
initialized by a peer device, is depicted in the following figure::
|
||||
|
||||
  Outbound translation:

   Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
    ____________                      ______________
   | dma-mapped |                |   | MW base addr |<== memory-mapped IO
   | memory     |                |   |--------------|
   | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
   |------------|                |   |--------------|
|
||||
|
||||
A typical scenario of the second type interface initialization would be:
|
||||
1) allocate a memory region, 2) somehow deliver a translated address to a peer
|
||||
device, 3) peer puts the translated address to NTB config, 4) peer device maps
|
||||
outbound memory window so as to have access to the shared memory region.
|
||||
|
||||
As one can see the described scenarios can be combined in one portable
|
||||
algorithm.
|
||||
|
||||
Local device:
|
||||
1) Allocate memory for a shared window
|
||||
2) Initialize memory window by translated address of the allocated region
|
||||
(it may fail if local memory window initialization is unsupported)
|
||||
3) Send the translated address and memory window index to a peer device
|
||||
|
||||
Peer device:
|
||||
1) Initialize memory window with the retrieved address of the memory region
allocated by another device (it may fail if peer memory window
|
||||
initialization is unsupported)
|
||||
2) Map outbound memory window
|
||||
|
||||
In accordance with this scenario, the NTB Memory Window API can be used as
|
||||
follows:
|
||||
|
||||
Local device:
|
||||
1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
|
||||
be allocated for memory windows between local device and peer device
|
||||
of port with specified index.
|
||||
2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
|
||||
shared memory region alignment and size. Then memory can be properly
|
||||
allocated.
|
||||
3) Allocate physically contiguous memory region in compliance with
|
||||
restrictions retrieved in 2).
|
||||
4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
|
||||
the memory window with specified index for the defined peer device
|
||||
(it may fail if local translated address setting is not supported)
|
||||
5) Send translated base address (usually together with memory window
|
||||
number) to the peer device using, for instance, scratchpad or message
|
||||
registers.
|
||||
|
||||
Peer device:
|
||||
1) ntb_peer_mw_set_trans(pidx, midx) - try to set received from other
|
||||
device (related to pidx) translated address for specified memory
|
||||
window. It may fail if retrieved address, for instance, exceeds
|
||||
maximum possible address or isn't properly aligned.
|
||||
2) ntb_peer_mw_get_addr(widx) - retrieve MMIO address to map the memory
|
||||
window so as to have access to the shared memory.
|
||||
|
||||
It is also worth noting that ntb_mw_count(pidx) should return the
|
||||
same value as ntb_peer_mw_count() on the peer with port index pidx.
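
Steps 2) and 3) of the local device sequence, where the retrieved
restrictions are applied before memory is allocated, might look roughly like
the user-space sketch below. It does not call the NTB API: ``struct
mw_limits``, ``mw_pick_size()`` and ``mw_addr_ok()`` are hypothetical names,
and the three fields simply stand for the address alignment, size alignment
and maximum size restrictions mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the restrictions retrieved in step 2. */
struct mw_limits {
	uint64_t addr_align;	/* required alignment of the translated address */
	uint64_t size_align;	/* required granularity of the region size */
	uint64_t size_max;	/* largest region the window can translate */
};

/* Round the wanted size up to the size granularity; 0 means "does not fit". */
uint64_t mw_pick_size(const struct mw_limits *lim, uint64_t wanted)
{
	uint64_t size;

	if (lim->size_align == 0 || wanted == 0 || wanted > lim->size_max)
		return 0;
	size = (wanted + lim->size_align - 1) / lim->size_align * lim->size_align;
	return size <= lim->size_max ? size : 0;
}

/* Check that a candidate translated address honours the alignment. */
bool mw_addr_ok(const struct mw_limits *lim, uint64_t addr)
{
	return lim->addr_align != 0 && addr % lim->addr_align == 0;
}
```

Only a region that passes both checks is worth handing to the translation
setup in step 4), which may still fail for other reasons.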
|
||||
|
||||
NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
|
||||
------------------------------------------------------------------
|
||||
|
||||
The primary client for NTB is the Transport client, used in tandem with NTB
|
||||
Netdev. These drivers function together to create a logical link to the peer,
|
||||
across the ntb, to exchange packets of network data. The Transport client
|
||||
establishes a logical link to the peer, and creates queue pairs to exchange
|
||||
messages and data. The NTB Netdev then creates an ethernet device using a
|
||||
Transport queue pair. Network data is copied between socket buffers and the
|
||||
Transport queue pair buffer. The Transport client may be used for other things
|
||||
besides Netdev, however no other applications have yet been written.
|
||||
|
||||
NTB Ping Pong Test Client (ntb\_pingpong)
|
||||
-----------------------------------------
|
||||
|
||||
The Ping Pong test client serves as a demonstration to exercise the doorbell
|
||||
and scratchpad registers of NTB hardware, and as a simple example NTB client.
|
||||
Ping Pong enables the link when started, waits for the NTB link to come up, and
|
||||
then proceeds to read and write the doorbell and scratchpad registers of the NTB.
|
||||
The peers interrupt each other using a bit mask of doorbell bits, which is
|
||||
shifted by one in each round, to test the behavior of multiple doorbell bits
|
||||
and interrupt vectors. The Ping Pong driver also reads the first local
|
||||
scratchpad, and writes the value plus one to the first peer scratchpad, each
|
||||
round before writing the peer doorbell register.
|
||||
|
||||
Module Parameters:
|
||||
|
||||
* unsafe - Some hardware has known issues with scratchpad and doorbell
|
||||
registers. By default, Ping Pong will not attempt to exercise such
|
||||
hardware. You may override this behavior at your own risk by setting
|
||||
unsafe=1.
|
||||
* delay\_ms - Specify the delay between receiving a doorbell
|
||||
interrupt event and setting the peer doorbell register for the next
|
||||
round.
|
||||
* init\_db - Specify the doorbell bits to start a new series of rounds. A new
|
||||
series begins once all the doorbell bits have been shifted out of
|
||||
range.
|
||||
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
|
||||
then to observe debugging output on the console.
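
The per-round arithmetic described above can be modelled in a few lines of
C. This is a sketch under assumptions, not the ntb\_pingpong source: the
function names are hypothetical, the shift is assumed to go towards higher
bits, and ``valid`` is an assumed mask of the doorbell bits the hardware
implements.

```c
#include <stdint.h>

/*
 * Next doorbell bits to ring. 'valid' is the mask of implemented doorbell
 * bits and 'init_db' the bits that start a series (both are assumptions
 * of this sketch).
 */
uint64_t pp_next_db(uint64_t db, uint64_t valid, uint64_t init_db)
{
	uint64_t next = (db << 1) & valid;

	/* All bits shifted out of range: begin a new series. */
	return next ? next : (init_db & valid);
}

/* Value written to the first peer scratchpad each round. */
uint32_t pp_next_spad(uint32_t local_spad)
{
	return local_spad + 1;
}
```

With three valid doorbell bits and ``init_db`` set to 1, the rung bits go
0x1, 0x2, 0x4 and then start again at 0x1.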
|
||||
|
||||
NTB Tool Test Client (ntb\_tool)
|
||||
--------------------------------
|
||||
|
||||
The Tool test client serves primarily for debugging ntb hardware and drivers.
|
||||
The Tool provides access through debugfs for reading, setting, and clearing the
|
||||
NTB doorbell, and reading and writing scratchpads.
|
||||
|
||||
The Tool does not currently have any module parameters.
|
||||
|
||||
Debugfs Files:
|
||||
|
||||
* *debugfs*/ntb\_tool/*hw*/
|
||||
A directory in debugfs will be created for each
|
||||
NTB device probed by the tool. This directory is shortened to *hw*
|
||||
below.
|
||||
* *hw*/db
|
||||
This file is used to read, set, and clear the local doorbell. Not
|
||||
all operations may be supported by all hardware. To read the doorbell,
|
||||
read the file. To set the doorbell, write `s` followed by the bits to
|
||||
set (e.g. `echo 's 0x0101' > db`). To clear the doorbell, write `c`
|
||||
followed by the bits to clear.
|
||||
* *hw*/mask
|
||||
This file is used to read, set, and clear the local doorbell mask.
|
||||
See *db* for details.
|
||||
* *hw*/peer\_db
|
||||
This file is used to read, set, and clear the peer doorbell.
|
||||
See *db* for details.
|
||||
* *hw*/peer\_mask
|
||||
This file is used to read, set, and clear the peer doorbell
|
||||
mask. See *db* for details.
|
||||
* *hw*/spad
|
||||
This file is used to read and write local scratchpads. To read
|
||||
the values of all scratchpads, read the file. To write values, write a
|
||||
series of pairs of scratchpad number and value
|
||||
(e.g. `echo '4 0x123 7 0xabc' > spad`
|
||||
# to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
|
||||
* *hw*/peer\_spad
|
||||
This file is used to read and write peer scratchpads. See
|
||||
*spad* for details.
|
||||
|
||||
NTB Hardware Drivers
|
||||
====================
|
||||
|
||||
NTB hardware drivers should register devices with the NTB core driver. After
|
||||
registering, client probe and remove functions will be called.
|
||||
|
||||
NTB Intel Hardware Driver (ntb\_hw\_intel)
|
||||
------------------------------------------
|
||||
|
||||
The Intel hardware driver supports NTB on Xeon and Atom CPUs.
|
||||
|
||||
Module Parameters:
|
||||
|
||||
* b2b\_mw\_idx
|
||||
If the peer ntb is to be accessed via a memory window, then use
|
||||
this memory window to access the peer ntb. A value of zero or positive
|
||||
starts from the first mw idx, and a negative value starts from the last
|
||||
mw idx. Both sides MUST set the same value here! The default value is
|
||||
`-1`.
|
||||
* b2b\_mw\_share
|
||||
If the peer ntb is to be accessed via a memory window, and if
|
||||
the memory window is large enough, still allow the client to use the
|
||||
second half of the memory window for address translation to the peer.
|
||||
* xeon\_b2b\_usd\_bar2\_addr64
|
||||
If using B2B topology on Xeon hardware, use
|
||||
this 64-bit address on the bus between the NTB devices for the window
|
||||
at BAR2, on the upstream side of the link.
|
||||
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
|
||||
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
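
The sign convention of the b2b\_mw\_idx parameter described above (zero or
positive counts from the first memory window, negative counts from the last)
can be expressed as a small helper. The function name and the range check
are assumptions made for this illustration and are not taken from the
ntb\_hw\_intel source.

```c
/*
 * Resolve a b2b_mw_idx-style value against the number of memory windows.
 * Returns the absolute window index, or -1 if it is out of range.
 * (Hypothetical helper, for illustration only.)
 */
int resolve_mw_idx(int idx, int mw_count)
{
	if (idx < 0)
		idx += mw_count;	/* -1 is the last window, -2 the one before */
	if (idx < 0 || idx >= mw_count)
		return -1;
	return idx;
}
```

With three memory windows, the default value of `-1` therefore selects
window 2.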
|
285
Documentation/driver-api/nvdimm/btt.rst
Normal file
@@ -0,0 +1,285 @@
|
||||
=============================
|
||||
BTT - Block Translation Table
|
||||
=============================
|
||||
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
Persistent memory based storage is able to perform IO at byte (or more
|
||||
accurately, cache line) granularity. However, we often want to expose such
|
||||
storage as traditional block devices. The block drivers for persistent memory
|
||||
will do exactly this. However, they do not provide any atomicity guarantees.
|
||||
Traditional SSDs typically provide protection against torn sectors in hardware,
|
||||
using stored energy in capacitors to complete in-flight block writes, or perhaps
|
||||
in firmware. We don't have this luxury with persistent memory - if a write is in
|
||||
progress, and we experience a power failure, the block will contain a mix of old
|
||||
and new data. Applications may not be prepared to handle such a scenario.
|
||||
|
||||
The Block Translation Table (BTT) provides atomic sector update semantics for
|
||||
persistent memory devices, so that applications that rely on sector writes not
|
||||
being torn can continue to do so. The BTT manifests itself as a stacked block
|
||||
device, and reserves a portion of the underlying storage for its metadata. At
|
||||
its heart is an indirection table that re-maps all the blocks on the
|
||||
volume. It can be thought of as an extremely simple file system that only
|
||||
provides atomic sector updates.
|
||||
|
||||
|
||||
2. Static Layout
|
||||
================
|
||||
|
||||
The underlying storage on which a BTT can be laid out is not limited in any way.
|
||||
The BTT, however, splits the available space into chunks of up to 512 GiB,
|
||||
called "Arenas".
|
||||
|
||||
Each arena follows the same layout for its metadata, and all references in an
|
||||
arena are internal to it (with the exception of one field that points to the
|
||||
next arena). The following depicts the "On-disk" metadata layout::
|
||||
|
||||
|
||||
    Backing Store     +------->  Arena
  +---------------+   |   +------------------+
  |               |   |   | Arena info block |
  |    Arena 0    +---+   |       4K         |
  |     512G      |       +------------------+
  |               |       |                  |
  +---------------+       |                  |
  |               |       |                  |
  |    Arena 1    |       |   Data Blocks    |
  |     512G      |       |                  |
  |               |       |                  |
  +---------------+       |                  |
  |       .       |       |                  |
  |       .       |       |                  |
  |       .       |       |                  |
  |               |       |                  |
  |               |       |                  |
  +---------------+       +------------------+
                          |                  |
                          |     BTT Map      |
                          |                  |
                          |                  |
                          +------------------+
                          |                  |
                          |     BTT Flog     |
                          |                  |
                          +------------------+
                          | Info block copy  |
                          |       4K         |
                          +------------------+
|
||||
|
||||
|
||||
3. Theory of Operation
|
||||
======================
|
||||
|
||||
|
||||
a. The BTT Map
|
||||
--------------
|
||||
|
||||
The map is a simple lookup/indirection table that maps an LBA to an internal
|
||||
block. Each map entry is 32 bits. The two most significant bits are special
|
||||
flags, and the remaining form the internal block number.
|
||||
|
||||
======== =============================================================
Bit      Description
======== =============================================================
31 - 30  Error and Zero flags - Used in the following way::

           == ==  ====================================================
           31 30  Description
           == ==  ====================================================
           0  0   Initial state. Reads return zeroes; Premap = Postmap
           0  1   Zero state: Reads return zeroes
           1  0   Error state: Reads fail; Writes clear 'E' bit
           1  1   Normal Block – has valid postmap
           == ==  ====================================================

29 - 0   Mappings to internal 'postmap' blocks
======== =============================================================
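
As an illustration of the bit layout in the table above, a 32-bit map entry
could be decoded as shown below. The type and function names are
hypothetical and the entry is assumed to be already converted to host byte
order; this is a sketch of the table, not the kernel's implementation.

```c
#include <stdint.h>

#define MAP_ERR_BIT   (1u << 31)	/* bit 31: Error flag */
#define MAP_ZERO_BIT  (1u << 30)	/* bit 30: Zero flag */
#define MAP_LBA_MASK  0x3fffffffu	/* bits 29 - 0: postmap block */

enum map_state { MAP_INITIAL, MAP_ZERO, MAP_ERROR, MAP_NORMAL };

struct map_info {
	enum map_state state;
	uint32_t postmap;
};

/* Decode one map entry; 'premap' is the index the entry was read from. */
struct map_info map_decode(uint32_t entry, uint32_t premap)
{
	struct map_info info;
	int e = (entry & MAP_ERR_BIT) != 0;
	int z = (entry & MAP_ZERO_BIT) != 0;

	info.postmap = entry & MAP_LBA_MASK;
	if (e && z) {
		info.state = MAP_NORMAL;	/* 1 1: valid postmap */
	} else if (e) {
		info.state = MAP_ERROR;		/* 1 0: reads fail */
	} else if (z) {
		info.state = MAP_ZERO;		/* 0 1: reads return zeroes */
	} else {
		info.state = MAP_INITIAL;	/* 0 0: Premap = Postmap */
		info.postmap = premap;
	}
	return info;
}
```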
|
||||
|
||||
|
||||
Some of the terminology that will be subsequently used:
|
||||
|
||||
============ ================================================================
External LBA LBA as made visible to upper layers.
ABA          Arena Block Address - Block offset/number within an arena
Premap ABA   The block offset into an arena, which was decided upon by range
             checking the External LBA
Postmap ABA  The block number in the "Data Blocks" area obtained after
             indirection from the map
nfree        The number of free blocks that are maintained at any given time.
             This is the number of concurrent writes that can happen to the
             arena.
============ ================================================================
|
||||
|
||||
|
||||
For example, after adding a BTT, we surface a disk of 1024G. We get a read for
|
||||
the external LBA at 768G. This falls into the second arena, and of the 512G
|
||||
worth of blocks that this arena contributes, this block is at 256G. Thus, the
|
||||
premap ABA is 256G. We now refer to the map, and find out the mapping for block
|
||||
'X' (256G) points to block 'Y', say '64'. Thus the postmap ABA is 64.
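
The range check in this example amounts to a division and a remainder. The
sketch below reproduces it with hypothetical names; it works on offsets
rather than block numbers, as the example does, and ignores the space each
arena reserves for its metadata.

```c
#include <stdint.h>

#define GIB (1024ull * 1024 * 1024)

struct arena_addr {
	uint64_t arena;		/* zero-based arena index */
	uint64_t premap;	/* offset within that arena */
};

/* Split an external offset into an arena index and a premap offset. */
struct arena_addr lba_to_arena(uint64_t ext_off, uint64_t arena_size)
{
	struct arena_addr a = { ext_off / arena_size, ext_off % arena_size };

	return a;
}
```

For the numbers used above, ``lba_to_arena(768 * GIB, 512 * GIB)`` yields
arena index 1 (the second arena) and a premap offset of 256G.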
|
||||
|
||||
|
||||
b. The BTT Flog
|
||||
---------------
|
||||
|
||||
The BTT provides sector atomicity by making every write an "allocating write",
|
||||
i.e. every write goes to a "free" block. A running list of free blocks is
|
||||
maintained in the form of the BTT flog. 'Flog' is a combination of the words
|
||||
"free list" and "log". The flog contains 'nfree' entries, and an entry contains:
|
||||
|
||||
======== =====================================================================
lba      The premap ABA that is being written to
old_map  The old postmap ABA - after 'this' write completes, this will be a
         free block.
new_map  The new postmap ABA. The map will be updated to reflect this
         lba->postmap_aba mapping, but we log it here in case we have to
         recover.
seq      Sequence number to mark which of the 2 sections of this flog entry is
         valid/newest. It cycles between 01->10->11->01 (binary) under normal
         operation, with 00 indicating an uninitialized state.
lba'     alternate lba entry
old_map' alternate old postmap entry
new_map' alternate new postmap entry
seq'     alternate sequence number.
======== =====================================================================
|
||||
|
||||
Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
|
||||
padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
|
||||
done such that for any entry being written, it:
|
||||
a. overwrites the 'old' section in the entry based on sequence numbers
|
||||
b. writes the 'new' section such that the sequence number is written last.
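
A possible in-memory picture of one flog entry, together with the sequence
number rule, is sketched below. The struct and function names are
hypothetical and the fields are kept in host byte order for simplicity, so
this should be read as an illustration of the description above rather than
as the on-media definition.

```c
#include <stdint.h>

/* One section: four 32-bit fields. */
struct flog_section {
	uint32_t lba;
	uint32_t old_map;
	uint32_t new_map;
	uint32_t seq;
};

/* One flog entry: two sections (32 bytes), padded to 64 bytes. */
struct flog_entry {
	struct flog_section sec[2];
	uint8_t pad[64 - 2 * sizeof(struct flog_section)];
};

/* Sequence numbers cycle 01 -> 10 -> 11 -> 01; 00 means uninitialized. */
uint32_t flog_next_seq(uint32_t seq)
{
	return seq == 3 ? 1 : seq + 1;
}

/*
 * Index (0 or 1) of the newest section, or -1 if the pair of sequence
 * numbers is not one that the cycle above can produce.
 */
int flog_newest(const struct flog_entry *e)
{
	uint32_t a = e->sec[0].seq, b = e->sec[1].seq;

	if (a > 3 || b > 3 || a == b)
		return -1;
	if (a == 0)
		return 1;
	if (b == 0)
		return 0;
	return flog_next_seq(a) == b ? 1 : 0;
}
```

An update would then overwrite the section that ``flog_newest()`` does not
select, storing its ``seq`` field last.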
|
||||
|
||||
|
||||
c. The concept of lanes
|
||||
-----------------------
|
||||
|
||||
While 'nfree' describes the number of IOs an arena can process
|
||||
concurrently, 'nlanes' is the number of IOs the BTT device as a whole can
|
||||
process::
|
||||
|
||||
nlanes = min(nfree, num_cpus)
|
||||
|
||||
A lane number is obtained at the start of any IO, and is used for indexing into
|
||||
all the on-disk and in-memory data structures for the duration of the IO. If
|
||||
there are more CPUs than the max number of available lanes, then lanes are
|
||||
protected by spinlocks.
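
The lane arithmetic can be summarized in C as follows. The ``nlanes``
formula is the one given above; mapping a CPU to a lane by taking the CPU
number modulo ``nlanes`` is an assumption made for this sketch, and all
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* nlanes = min(nfree, num_cpus) */
uint32_t btt_nlanes(uint32_t nfree, uint32_t num_cpus)
{
	return nfree < num_cpus ? nfree : num_cpus;
}

/* Assumed policy: a CPU uses the lane given by its number modulo nlanes. */
uint32_t btt_lane_for_cpu(uint32_t cpu, uint32_t nlanes)
{
	return cpu % nlanes;
}

/* Lanes can be shared, and so need a lock, only when CPUs outnumber lanes. */
bool btt_lanes_need_lock(uint32_t nlanes, uint32_t num_cpus)
{
	return num_cpus > nlanes;
}
```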
|
||||
|
||||
|
||||
d. In-memory data structure: Read Tracking Table (RTT)
|
||||
------------------------------------------------------
|
||||
|
||||
Consider a case where we have two threads, one doing reads and the other,
|
||||
writes. We can hit a condition where the writer thread grabs a free block to do
|
||||
a new IO, but the (slow) reader thread is still reading from it. In other words,
|
||||
the reader consulted a map entry, and started reading the corresponding block. A
|
||||
writer started writing to the same external LBA, and finished the write updating
|
||||
the map for that external LBA to point to its new postmap ABA. At this point the
|
||||
internal, postmap block that the reader is (still) reading has been inserted
|
||||
into the list of free blocks. If another write comes in for the same LBA, it can
|
||||
grab this free block, and start writing to it, causing the reader to read
|
||||
incorrect data. To prevent this, we introduce the RTT.
|
||||
|
||||
The RTT is a simple, per arena table with 'nfree' entries. Every reader inserts
|
||||
into rtt[lane_number], the postmap ABA it is reading, and clears it after the
|
||||
read is complete. Every writer thread, after grabbing a free block, checks the
|
||||
RTT for its presence. If the postmap free block is in the RTT, it waits till the
|
||||
reader clears the RTT entry, and only then starts writing to it.
|
||||
|
||||
|
||||
e. In-memory data structure: map locks
|
||||
--------------------------------------
|
||||
|
||||
Consider a case where two writer threads are writing to the same LBA. There can
|
||||
be a race in the following sequence of steps::
|
||||
|
||||
free[lane] = map[premap_aba]
|
||||
map[premap_aba] = postmap_aba
|
||||
|
||||
Both threads can update their respective free[lane] with the same old, freed
|
||||
postmap_aba. This has made the layout inconsistent by losing a free entry, and
|
||||
at the same time, duplicating another free entry for two lanes.
|
||||
|
||||
To solve this, we could have a single map lock (per arena) that has to be taken
|
||||
before performing the above sequence, but we feel that could be too contentious.
|
||||
Instead we use an array of (nfree) map_locks that is indexed by
|
||||
(premap_aba modulo nfree).
|
||||
|
||||
|
||||
f. Reconstruction from the Flog
|
||||
-------------------------------
|
||||
|
||||
On startup, we analyze the BTT flog to create our list of free blocks. We walk
|
||||
through all the entries, and for each lane, of the set of two possible
|
||||
'sections', we always look at the most recent one only (based on the sequence
|
||||
number). The reconstruction rules/steps are simple:
|
||||
|
||||
- Read map[log_entry.lba].
|
||||
- If log_entry.new matches the map entry, then log_entry.old is free.
|
||||
- If log_entry.new does not match the map entry, then log_entry.new is free.
|
||||
(This case can only be caused by power-fails/unsafe shutdowns)
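
The reconstruction rule above fits in one small function. In this sketch
the map is modelled as a plain array of postmap block numbers (flag bits
already masked off), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The newest section of one flog entry, as selected by sequence number. */
struct log_entry {
	uint32_t lba;		/* premap ABA that was being written */
	uint32_t old_map;	/* postmap ABA before the write */
	uint32_t new_map;	/* postmap ABA the write went to */
};

/* Return the postmap block that is free according to one log entry. */
uint32_t flog_free_block(const uint32_t *map, const struct log_entry *le)
{
	if (map[le->lba] == le->new_map)
		return le->old_map;	/* the map update completed */
	return le->new_map;		/* interrupted before the map update */
}
```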
|
||||
|
||||
|
||||
g. Summarizing - Read and Write flows
|
||||
-------------------------------------
|
||||
|
||||
Read:
|
||||
|
||||
1. Convert external LBA to arena number + pre-map ABA
|
||||
2. Get a lane (and take lane_lock)
|
||||
3. Read map to get the entry for this pre-map ABA
|
||||
4. Enter post-map ABA into RTT[lane]
|
||||
5. If TRIM flag set in map, return zeroes, and end IO (go to step 8)
|
||||
6. If ERROR flag set in map, end IO with EIO (go to step 8)
|
||||
7. Read data from this block
|
||||
8. Remove post-map ABA entry from RTT[lane]
|
||||
9. Release lane (and lane_lock)
|
||||
|
||||
Write:
|
||||
|
||||
1. Convert external LBA to Arena number + pre-map ABA
|
||||
2. Get a lane (and take lane_lock)
|
||||
3. Use lane to index into in-memory free list and obtain a new block, next flog
|
||||
index, next sequence number
|
||||
4. Scan the RTT to check if free block is present, and spin/wait if it is.
|
||||
5. Write data to this free block
|
||||
6. Read map to get the existing post-map ABA entry for this pre-map ABA
|
||||
7. Write flog entry: [premap_aba / old postmap_aba / new postmap_aba / seq_num]
|
||||
8. Write new post-map ABA into map.
|
||||
9. Write old post-map entry into the free list
|
||||
10. Calculate next sequence number and write into the free list entry
|
||||
11. Release lane (and lane_lock)
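
To make the write flow concrete, here is a deliberately tiny, single-threaded
model of it. Everything in it is an assumption made for illustration: there
is one arena, one lane and one spare block, the flog entry has a single
section, and the RTT scan of step 4 is omitted because nothing runs
concurrently. It is not the kernel's implementation.

```c
#include <stdint.h>
#include <string.h>

#define NBLOCKS 4	/* external blocks (premap ABAs) */
#define NFREE   1	/* one lane, one spare block */
#define BLKSZ   8

struct toy_btt {
	uint32_t map[NBLOCKS];			/* premap -> postmap */
	uint32_t free_blk[NFREE];		/* per-lane free postmap block */
	uint8_t data[NBLOCKS + NFREE][BLKSZ];	/* the "Data Blocks" area */
	struct { uint32_t lba, old_map, new_map, seq; } flog[NFREE];
};

void toy_init(struct toy_btt *b)
{
	memset(b, 0, sizeof(*b));
	for (uint32_t i = 0; i < NBLOCKS; i++)
		b->map[i] = i;			/* identity mapping to start with */
	b->free_blk[0] = NBLOCKS;		/* the spare block */
}

void toy_write(struct toy_btt *b, uint32_t premap, const uint8_t *buf)
{
	uint32_t lane = 0;			/* step 2: only one lane here */
	uint32_t new_blk = b->free_blk[lane];	/* step 3 */
	uint32_t old_blk;

	memcpy(b->data[new_blk], buf, BLKSZ);	/* step 5 */
	old_blk = b->map[premap];		/* step 6 */
	b->flog[lane].lba = premap;		/* step 7, seq stored last */
	b->flog[lane].old_map = old_blk;
	b->flog[lane].new_map = new_blk;
	b->flog[lane].seq = b->flog[lane].seq == 3 ? 1 : b->flog[lane].seq + 1;
	b->map[premap] = new_blk;		/* step 8 */
	b->free_blk[lane] = old_blk;		/* step 9 */
}

void toy_read(const struct toy_btt *b, uint32_t premap, uint8_t *buf)
{
	memcpy(buf, b->data[b->map[premap]], BLKSZ);
}
```

Because the data lands in the free block before the map is switched over, a
reader sees either the complete old block or the complete new block, which
is the atomicity property the BTT is meant to provide.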
|
||||
|
||||
|
||||
4. Error Handling
|
||||
=================
|
||||
|
||||
An arena would be in an error state if any of the metadata is corrupted
|
||||
irrecoverably, either due to a bug or a media error. The following conditions
|
||||
indicate an error:
|
||||
|
||||
- Info block checksum does not match (and recovering from the copy also fails)
|
||||
- All internal available blocks are not uniquely and entirely addressed by the
|
||||
sum of mapped blocks and free blocks (from the BTT flog).
|
||||
- Rebuilding free list from the flog reveals missing/duplicate/impossible
|
||||
entries
|
||||
- A map entry is out of bounds
|
||||
|
||||
If any of these error conditions are encountered, the arena is put into a read
|
||||
only state using a flag in the info block.
|
||||
|
||||
|
||||
5. Usage
|
||||
========
|
||||
|
||||
The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
|
||||
(pmem, or blk mode). The easiest way to set up such a namespace is using the
|
||||
'ndctl' utility [1]:
|
||||
|
||||
For example, the ndctl command line to set up a BTT with a 4k sector size is::
|
||||
|
||||
ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
|
||||
|
||||
See ndctl create-namespace --help for more options.
|
||||
|
||||
[1]: https://github.com/pmem/ndctl
|
12
Documentation/driver-api/nvdimm/index.rst
Normal file
@@ -0,0 +1,12 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================================
|
||||
Non-Volatile Memory Device (NVDIMM)
|
||||
===================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
nvdimm
|
||||
btt
|
||||
security
|
887
Documentation/driver-api/nvdimm/nvdimm.rst
Normal file
@@ -0,0 +1,887 @@
|
||||
===============================
|
||||
LIBNVDIMM: Non-Volatile Devices
|
||||
===============================
|
||||
|
||||
libnvdimm - kernel / libndctl - userspace helper library
|
||||
|
||||
linux-nvdimm@lists.01.org
|
||||
|
||||
Version 13
|
||||
|
||||
.. contents:
|
||||
|
||||
Glossary
|
||||
Overview
|
||||
Supporting Documents
|
||||
Git Trees
|
||||
LIBNVDIMM PMEM and BLK
|
||||
Why BLK?
|
||||
PMEM vs BLK
|
||||
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
|
||||
Example NVDIMM Platform
|
||||
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
|
||||
LIBNDCTL: Context
|
||||
libndctl: instantiate a new library context example
|
||||
LIBNVDIMM/LIBNDCTL: Bus
|
||||
libnvdimm: control class device in /sys/class
|
||||
libnvdimm: bus
|
||||
libndctl: bus enumeration example
|
||||
LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
|
||||
libnvdimm: DIMM (NMEM)
|
||||
libndctl: DIMM enumeration example
|
||||
LIBNVDIMM/LIBNDCTL: Region
|
||||
libnvdimm: region
|
||||
libndctl: region enumeration example
|
||||
Why Not Encode the Region Type into the Region Name?
|
||||
How Do I Determine the Major Type of a Region?
|
||||
LIBNVDIMM/LIBNDCTL: Namespace
|
||||
libnvdimm: namespace
|
||||
libndctl: namespace enumeration example
|
||||
libndctl: namespace creation example
|
||||
Why the Term "namespace"?
|
||||
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
|
||||
libnvdimm: btt layout
|
||||
libndctl: btt creation example
|
||||
Summary LIBNDCTL Diagram
|
||||
|
||||
|
||||
Glossary
|
||||
========
|
||||
|
||||
PMEM:
|
||||
A system-physical-address range where writes are persistent. A
|
||||
block device composed of PMEM is capable of DAX. A PMEM address range
|
||||
may span an interleave of several DIMMs.
|
||||
|
||||
BLK:
|
||||
A set of one or more programmable memory mapped apertures provided
|
||||
by a DIMM to access its media. This indirection precludes the
|
||||
performance benefit of interleaving, but enables DIMM-bounded failure
|
||||
modes.
|
||||
|
||||
DPA:
|
||||
DIMM Physical Address, is a DIMM-relative offset. With one DIMM in
|
||||
the system there would be a 1:1 system-physical-address:DPA association.
|
||||
Once more DIMMs are added a memory controller interleave must be
|
||||
decoded to determine the DPA associated with a given
|
||||
system-physical-address. BLK capacity always has a 1:1 relationship
|
||||
with a single-DIMM's DPA range.
|
||||
|
||||
DAX:
|
||||
File system extensions to bypass the page cache and block layer to
|
||||
mmap persistent memory, from a PMEM block device, directly into a
|
||||
process address space.
|
||||
|
||||
DSM:
|
||||
Device Specific Method: ACPI method to control a specific
|
||||
device - in this case the firmware.
|
||||
|
||||
DCR:
|
||||
NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
|
||||
It defines a vendor-id, device-id, and interface format for a given DIMM.
|
||||
|
||||
BTT:
|
||||
Block Translation Table: Persistent memory is byte addressable.
|
||||
Existing software may have an expectation that the power-fail-atomicity
|
||||
of writes is at least one sector, 512 bytes. The BTT is an indirection
|
||||
table with atomic update semantics to front a PMEM/BLK block device
|
||||
driver and present arbitrary atomic sector sizes.
|
||||
|
||||
LABEL:
|
||||
Metadata stored on a DIMM device that partitions and identifies
|
||||
(persistently names) storage between PMEM and BLK. It also partitions
|
||||
BLK storage to host BTTs with different parameters per BLK-partition.
|
||||
Note that traditional partition tables, GPT/MBR, are layered on top of a
|
||||
BLK or PMEM device.
|
||||
|
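The DPA entry above says that a memory controller interleave must be decoded to find the DPA behind a given system-physical-address. A minimal sketch of such a decode follows, assuming a plain round-robin interleave with a fixed line size. The struct and function names are hypothetical, and real platforms describe their interleave in the NFIT and may use a different layout:

```c
#include <stdint.h>

/* Hypothetical description of a round-robin interleave set. */
struct interleave_set {
	uint64_t spa_base;	/* first system-physical-address of the set */
	uint64_t line_size;	/* bytes placed on one DIMM before moving on */
	unsigned int ways;	/* number of DIMMs in the set */
};

struct dimm_addr {
	unsigned int dimm;	/* index of the DIMM within the set */
	uint64_t dpa;		/* DIMM-relative offset of the access */
};

/* Decode a system-physical-address into a <DIMM, DPA> pair. */
static struct dimm_addr spa_to_dpa(const struct interleave_set *set,
				   uint64_t spa)
{
	uint64_t offset = spa - set->spa_base;
	uint64_t line = offset / set->line_size;
	struct dimm_addr addr;

	addr.dimm = (unsigned int)(line % set->ways);
	addr.dpa = (line / set->ways) * set->line_size
		 + offset % set->line_size;
	return addr;
}
```

With ``ways`` set to 1 the function degenerates to the 1:1 system-physical-address:DPA association described for a single-DIMM system.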
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
|
||||
PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
|
||||
and BLK mode access. These three modes of operation are described by
|
||||
the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6. While the LIBNVDIMM
|
||||
implementation is generic and supports pre-NFIT platforms, it was guided
|
||||
by the superset of capabilities needed to support this ACPI 6 definition
|
||||
for NVDIMM resources. The bulk of the kernel implementation is in place
|
||||
to handle the case where DPA accessible via PMEM is aliased with DPA
|
||||
accessible via BLK. When that occurs a LABEL is needed to reserve DPA
|
||||
for exclusive access via one mode at a time.
|
||||
|
||||
Supporting Documents
|
||||
--------------------
|
||||
|
||||
ACPI 6:
|
||||
http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
|
||||
NVDIMM Namespace:
|
||||
http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
|
||||
DSM Interface Example:
|
||||
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
|
||||
Driver Writer's Guide:
|
||||
http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
|
||||
|
||||
Git Trees
|
||||
---------
|
||||
|
||||
LIBNVDIMM:
|
||||
https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
|
||||
LIBNDCTL:
|
||||
https://github.com/pmem/ndctl.git
|
||||
PMEM:
|
||||
https://github.com/01org/prd
|
||||
|
||||
|
||||
LIBNVDIMM PMEM and BLK
|
||||
======================
|
||||
|
||||
Prior to the arrival of the NFIT, non-volatile memory was described to a
|
||||
system in various ad-hoc ways. Usually only the bare minimum was
|
||||
provided, namely, a single system-physical-address range where writes
|
||||
are expected to be durable after a system power loss. Now, the NFIT
|
||||
specification standardizes not only the description of PMEM, but also
|
||||
BLK and platform message-passing entry points for control and
|
||||
configuration.
|
||||
|
||||
For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
|
||||
device driver:
|
||||
|
||||
1. PMEM (nd_pmem.ko): Drives a system-physical-address range. This
|
||||
range is contiguous in system memory and may be interleaved (hardware
|
||||
memory controller striped) across multiple DIMMs. When interleaved the
|
||||
platform may optionally provide details of which DIMMs are participating
|
||||
in the interleave.
|
||||
|
||||
Note that while LIBNVDIMM describes system-physical-address ranges that may
|
||||
alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
|
||||
alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
|
||||
distinction. The different device-types are an implementation detail
|
||||
that userspace can exploit to implement policies like "only interface
|
||||
with address ranges from certain DIMMs". It is worth noting that when
|
||||
aliasing is present and a DIMM lacks a label, then no block device can
|
||||
be created by default as userspace needs to do at least one allocation
|
||||
of DPA to the PMEM range. In contrast ND_NAMESPACE_IO ranges, once
|
||||
registered, can be immediately attached to nd_pmem.
|
||||
|
||||
2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
|
||||
defined apertures. A set of apertures will access just one DIMM.
|
||||
Multiple windows (apertures) allow multiple concurrent accesses, much like
|
||||
tagged-command-queuing, and would likely be used by different threads or
|
||||
different CPUs.
|
||||
|
||||
The NFIT specification defines a standard format for a BLK-aperture, but
|
||||
the spec also allows for vendor specific layouts, and non-NFIT BLK
|
||||
implementations may have other designs for BLK I/O. For this reason
|
||||
"nd_blk" calls back into platform-specific code to perform the I/O.
|
||||
|
||||
One such implementation is defined in the "Driver Writer's Guide" and "DSM
|
||||
Interface Example".
|
||||
|
||||
|
||||
Why BLK?
|
||||
========
|
||||
|
||||
While PMEM provides direct byte-addressable CPU-load/store access to
|
||||
NVDIMM storage, it does not provide the best system RAS (recovery,
|
||||
availability, and serviceability) model. An access to a corrupted
|
||||
system-physical-address causes a CPU exception while an access
|
||||
to a corrupted address through a BLK-aperture causes that block window
|
||||
to raise an error status in a register. The latter is more aligned with
|
||||
the standard error model that host-bus-adapter attached disks present.
|
||||
|
||||
Also, if an administrator ever wants to replace memory it is easier to
|
||||
service a system at DIMM module boundaries. Compare this to PMEM where
|
||||
data could be interleaved in an opaque hardware specific manner across
|
||||
several DIMMs.
|
||||
|
||||
PMEM vs BLK
|
||||
-----------
|
||||
|
||||
BLK-apertures solve these RAS problems, but their presence is also the
|
||||
major contributing factor to the complexity of the ND subsystem. They
|
||||
complicate the implementation because PMEM and BLK alias in DPA space.
|
||||
Any given DIMM's DPA-range may contribute to one or more
|
||||
system-physical-address sets of interleaved DIMMs, *and* may also be
|
||||
accessed in its entirety through its BLK-aperture. Accessing a DPA
|
||||
through a system-physical-address while simultaneously accessing the
|
||||
same DPA through a BLK-aperture has undefined results. For this reason,
|
||||
DIMMs with this dual interface configuration include a DSM function to
|
||||
store/retrieve a LABEL. The LABEL effectively partitions the DPA-space
|
||||
into exclusive system-physical-address and BLK-aperture accessible
|
||||
regions. For simplicity a DIMM is allowed a PMEM "region" per each
|
||||
interleave set in which it is a member. The remaining DPA space can be
|
||||
carved into an arbitrary number of BLK devices with discontiguous
|
||||
extents.
|
||||
|
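The exclusivity rule above, where a LABEL partitions a DIMM's DPA-space so that no offset is reachable through both PMEM and BLK, can be sketched as an overlap check over a list of extents. The types and names below are hypothetical illustrations and are not the on-media LABEL format:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum dpa_mode { DPA_PMEM, DPA_BLK };	/* illustrative, not a kernel enum */

struct dpa_extent {
	enum dpa_mode mode;
	uint64_t start;		/* DPA offset into the DIMM */
	uint64_t len;		/* length in bytes */
};

/* Do the half-open ranges [start, start + len) intersect? */
static bool extents_overlap(const struct dpa_extent *a,
			    const struct dpa_extent *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}

/* True when every byte of DPA is claimed by at most one extent. */
static bool dpa_layout_is_exclusive(const struct dpa_extent *ext, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (extents_overlap(&ext[i], &ext[j]))
				return false;
	return true;
}
```

A layout that alternates PMEM and BLK extents back to back passes the check, while a BLK extent that starts inside a PMEM extent fails it.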
||||
BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
One of the few
|
||||
reasons to allow multiple BLK namespaces per REGION is so that each
|
||||
BLK-namespace can be configured with a BTT with unique atomic sector
|
||||
sizes. While a PMEM device can host a BTT the LABEL specification does
|
||||
not provide for a sector size to be specified for a PMEM namespace.
|
||||
|
||||
This is due to the expectation that the primary usage model for PMEM is
|
||||
via DAX, and the BTT is incompatible with DAX. However, for the cases
|
||||
where an application or filesystem still needs atomic sector update
|
||||
guarantees it can register a BTT on a PMEM device or partition. See
|
||||
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
|
||||
|
||||
|
||||
Example NVDIMM Platform
|
||||
=======================
|
||||
|
||||
For the remainder of this document the following diagram will be
|
||||
referenced for any example sysfs layouts::
|
||||
|
||||
|
||||
(a) (b) DIMM BLK-REGION
|
||||
+-------------------+--------+--------+--------+
|
||||
+------+ | pm0.0 | blk2.0 | pm1.0 | blk2.1 | 0 region2
|
||||
| imc0 +--+- - - region0- - - +--------+ +--------+
|
||||
+--+---+ | pm0.0 | blk3.0 | pm1.0 | blk3.1 | 1 region3
|
||||
| +-------------------+--------v v--------+
|
||||
+--+---+ | |
|
||||
| cpu0 | region1
|
||||
+--+---+ | |
|
||||
| +----------------------------^ ^--------+
|
||||
+--+---+ | blk4.0 | pm1.0 | blk4.0 | 2 region4
|
||||
| imc1 +--+----------------------------| +--------+
|
||||
+------+ | blk5.0 | pm1.0 | blk5.0 | 3 region5
|
||||
+----------------------------+--------+--------+
|
||||
|
||||
In this platform we have four DIMMs and two memory controllers in one
|
||||
socket. Each unique interface (BLK or PMEM) to DPA space is identified
|
||||
by a region device with a dynamically assigned id (REGION0 - REGION5).
|
||||
|
||||
1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
|
||||
single PMEM namespace is created in the REGION0-SPA-range that spans most
|
||||
of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
|
||||
interleaved system-physical-address range is reclaimed as BLK-aperture
|
||||
accessed space starting at DPA-offset (a) into each DIMM. In that
|
||||
reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
|
||||
REGION3 where "blk2.0" and "blk3.0" are just human readable names that
|
||||
could be set to any user-desired name in the LABEL.
|
||||
|
||||
2. In the last portion of DIMM0 and DIMM1 we have an interleaved
|
||||
system-physical-address range, REGION1, that spans those two DIMMs as
|
||||
well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
|
||||
named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
|
||||
each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
|
||||
"blk5.0".
|
||||
|
||||
3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
|
||||
interleaved system-physical-address range (i.e. the DPA address past
|
||||
offset (b)) are also included in the "blk4.0" and "blk5.0" namespaces.
|
||||
Note that this example shows that BLK-aperture namespaces don't need to
|
||||
be contiguous in DPA-space.
|
||||
|
||||
This bus is provided by the kernel under the device
|
||||
/sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
|
||||
the nfit_test.ko module is loaded. This tests not only LIBNVDIMM but the
|
||||
acpi_nfit.ko driver as well.
|
||||
|
||||
|
||||
LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
|
||||
========================================================
|
||||
|
||||
What follows is a description of the LIBNVDIMM sysfs layout and a
|
||||
corresponding object hierarchy diagram as viewed through the LIBNDCTL
|
||||
API. The example sysfs paths and diagrams are relative to the Example
|
||||
NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
|
||||
test.
|
||||
|
||||
LIBNDCTL: Context
|
||||
-----------------
|
||||
|
||||
Every API call in the LIBNDCTL library requires a context that holds the
|
||||
logging parameters and other library instance state. The library is
|
||||
based on the libabc template:
|
||||
|
||||
https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
|
||||
|
||||
LIBNDCTL: instantiate a new library context example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
::
|
||||
|
||||
struct ndctl_ctx *ctx;
|
||||
|
||||
if (ndctl_new(&ctx) == 0)
|
||||
return ctx;
|
||||
else
|
||||
return NULL;
|
||||
|
||||
LIBNVDIMM/LIBNDCTL: Bus
|
||||
-----------------------
|
||||
|
||||
A bus has a 1:1 relationship with an NFIT. The current expectation for
|
||||
ACPI based systems is that there is only ever one platform-global NFIT.
|
||||
That said, it is trivial to register multiple NFITs; the specification
|
||||
does not preclude it. The infrastructure supports multiple busses and
|
||||
we use this capability to test multiple NFIT configurations in the unit
|
||||
test.
|
||||
|
||||
LIBNVDIMM: control class device in /sys/class
|
||||
---------------------------------------------
|
||||
|
||||
This character device accepts DSM messages to be passed to a DIMM
|
||||
identified by its NFIT handle::
|
||||
|
||||
/sys/class/nd/ndctl0
|
||||
|-- dev
|
||||
|-- device -> ../../../ndbus0
|
||||
|-- subsystem -> ../../../../../../../class/nd
|
||||
|
||||
|
||||
|
||||
LIBNVDIMM: bus
|
||||
--------------
|
||||
|
||||
::
|
||||
|
||||
struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
|
||||
struct nvdimm_bus_descriptor *nfit_desc);
|
||||
|
||||
::
|
||||
|
||||
/sys/devices/platform/nfit_test.0/ndbus0
|
||||
|-- commands
|
||||
|-- nd
|
||||
|-- nfit
|
||||
|-- nmem0
|
||||
|-- nmem1
|
||||
|-- nmem2
|
||||
|-- nmem3
|
||||
|-- power
|
||||
|-- provider
|
||||
|-- region0
|
||||
|-- region1
|
||||
|-- region2
|
||||
|-- region3
|
||||
|-- region4
|
||||
|-- region5
|
||||
|-- uevent
|
||||
`-- wait_probe
|
||||
|
||||
LIBNDCTL: bus enumeration example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Find the bus handle that describes the bus from Example NVDIMM Platform::
|
||||
|
||||
static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
|
||||
const char *provider)
|
||||
{
|
||||
struct ndctl_bus *bus;
|
||||
|
||||
ndctl_bus_foreach(ctx, bus)
|
||||
if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
|
||||
return bus;
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
bus = get_bus_by_provider(ctx, "nfit_test.0");
|
||||
|
||||
|
||||
LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
|
||||
-------------------------------
|
||||
|
||||
The DIMM device provides a character device for sending commands to
|
||||
hardware, and it is a container for LABELs. If the DIMM is defined by
|
||||
NFIT then an optional 'nfit' attribute sub-directory is available to add
|
||||
NFIT-specifics.
|
||||
|
||||
Note that the kernel device name for "DIMMs" is "nmemX". The NFIT
|
||||
describes these devices via "Memory Device to System Physical Address
|
||||
Range Mapping Structure", and there is no requirement that they actually
|
||||
be physical DIMMs, so we use a more generic name.
|
||||
|
||||
LIBNVDIMM: DIMM (NMEM)
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
::
|
||||
|
||||
struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
|
||||
const struct attribute_group **groups, unsigned long flags,
|
||||
unsigned long *dsm_mask);
|
||||
|
||||
::
|
||||
|
||||
/sys/devices/platform/nfit_test.0/ndbus0
|
||||
|-- nmem0
|
||||
| |-- available_slots
|
||||
| |-- commands
|
||||
| |-- dev
|
||||
| |-- devtype
|
||||
| |-- driver -> ../../../../../bus/nd/drivers/nvdimm
|
||||
| |-- modalias
|
||||
| |-- nfit
|
||||
| | |-- device
|
||||
| | |-- format
|
||||
| | |-- handle
|
||||
| | |-- phys_id
|
||||
| | |-- rev_id
|
||||
| | |-- serial
|
||||
| | `-- vendor
|
||||
| |-- state
|
||||
| |-- subsystem -> ../../../../../bus/nd
|
||||
| `-- uevent
|
||||
|-- nmem1
|
||||
[..]
|
||||
|
||||
|
||||
LIBNDCTL: DIMM enumeration example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Note, in this example we are assuming NFIT-defined DIMMs which are
|
||||
identified by an "nfit_handle", a 32-bit value where:
|
||||
|
||||
- Bit 3:0 DIMM number within the memory channel
|
||||
- Bit 7:4 memory channel number
|
||||
- Bit 11:8 memory controller ID
|
||||
- Bit 15:12 socket ID (within scope of a Node controller if node
|
||||
controller is present)
|
||||
- Bit 27:16 Node Controller ID
|
||||
- Bit 31:28 Reserved
|
||||
|
||||
::
|
||||
|
||||
static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
|
||||
unsigned int handle)
|
||||
{
|
||||
struct ndctl_dimm *dimm;
|
||||
|
||||
ndctl_dimm_foreach(bus, dimm)
|
||||
if (ndctl_dimm_get_handle(dimm) == handle)
|
||||
return dimm;
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
#define DIMM_HANDLE(n, s, i, c, d) \
|
||||
(((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
|
||||
| ((c & 0xf) << 4) | (d & 0xf))
|
||||
|
||||
dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
|
||||
|
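The inverse of DIMM_HANDLE() can be sketched in the same way, splitting an "nfit_handle" back into the fields from the bit list above. The struct and function names are hypothetical and are not part of LIBNDCTL:

```c
#include <stdint.h>

/* Field layout taken from the nfit_handle bit list. */
struct nfit_handle_fields {
	unsigned int node;	/* bits 27:16, node controller ID */
	unsigned int socket;	/* bits 15:12, socket ID */
	unsigned int imc;	/* bits 11:8, memory controller ID */
	unsigned int channel;	/* bits 7:4, memory channel number */
	unsigned int dimm;	/* bits 3:0, DIMM number within the channel */
};

/* Split a 32-bit handle into its fields; bits 31:28 are ignored. */
static struct nfit_handle_fields decode_nfit_handle(uint32_t handle)
{
	struct nfit_handle_fields f;

	f.node = (handle >> 16) & 0xfff;
	f.socket = (handle >> 12) & 0xf;
	f.imc = (handle >> 8) & 0xf;
	f.channel = (handle >> 4) & 0xf;
	f.dimm = handle & 0xf;
	return f;
}
```

For instance, the value produced by DIMM_HANDLE(1, 2, 3, 4, 5) is 0x12345, which decodes back to node 1, socket 2, memory controller 3, channel 4, DIMM 5.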
||||
LIBNVDIMM/LIBNDCTL: Region
|
||||
--------------------------
|
||||
|
||||
A generic REGION device is registered for each PMEM range or BLK-aperture
|
||||
set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
|
||||
sets on the "nfit_test.0" bus. The primary role of regions is to be a
|
||||
container of "mappings". A mapping is a tuple of <DIMM,
|
||||
DPA-start-offset, length>.
|
||||
|
||||
LIBNVDIMM provides a built-in driver for these REGION devices. This driver
|
||||
is responsible for reconciling the aliased DPA mappings across all
|
||||
regions, parsing the LABEL, if present, and then emitting NAMESPACE
|
||||
devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
|
||||
nd_blk device driver to consume.
|
||||
|
||||
In addition to the generic attributes of "mapping"s, "interleave_ways"
|
||||
and "size" the REGION device also exports some convenience attributes.
|
||||
"nstype" indicates the integer type of namespace-device this region
|
||||
emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
|
||||
'add' event, "modalias" duplicates the MODALIAS variable stored by udev
|
||||
at the 'add' event, and finally, the optional "spa_index" is provided in
|
||||
the case where the region is defined by a SPA.
|
||||
|
||||
LIBNVDIMM: region::
|
||||
|
||||
struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
|
||||
struct nd_region_desc *ndr_desc);
|
||||
struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
|
||||
struct nd_region_desc *ndr_desc);
|
||||
|
||||
::
|
||||
|
||||
/sys/devices/platform/nfit_test.0/ndbus0
|
||||
|-- region0
|
||||
| |-- available_size
|
||||
| |-- btt0
|
||||
| |-- btt_seed
|
||||
| |-- devtype
|
||||
| |-- driver -> ../../../../../bus/nd/drivers/nd_region
|
||||
| |-- init_namespaces
|
||||
| |-- mapping0
|
||||
| |-- mapping1
|
||||
| |-- mappings
|
||||
| |-- modalias
|
||||
| |-- namespace0.0
|
||||
| |-- namespace_seed
|
||||
| |-- numa_node
|
||||
| |-- nfit
|
||||
| | `-- spa_index
|
||||
| |-- nstype
|
||||
| |-- set_cookie
|
||||
| |-- size
|
||||
| |-- subsystem -> ../../../../../bus/nd
|
||||
| `-- uevent
|
||||
|-- region1
|
||||
[..]
|
||||
|
||||
LIBNDCTL: region enumeration example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Sample region retrieval routines based on NFIT-unique data like
|
||||
"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
|
||||
BLK::
|
||||
|
||||
static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
|
||||
unsigned int spa_index)
|
||||
{
|
||||
struct ndctl_region *region;
|
||||
|
||||
ndctl_region_foreach(bus, region) {
|
||||
if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
|
||||
continue;
|
||||
if (ndctl_region_get_spa_index(region) == spa_index)
|
||||
return region;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
|
||||
unsigned int handle)
|
||||
{
|
||||
struct ndctl_region *region;
|
||||
|
||||
ndctl_region_foreach(bus, region) {
|
||||
struct ndctl_mapping *map;
|
||||
|
||||
if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
|
||||
continue;
|
||||
ndctl_mapping_foreach(region, map) {
|
||||
struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
|
||||
|
||||
if (ndctl_dimm_get_handle(dimm) == handle)
|
||||
return region;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
|
||||
Why Not Encode the Region Type into the Region Name?
|
||||
----------------------------------------------------
|
||||
|
||||
At first glance it seems that, since NFIT defines just PMEM and BLK interface
|
||||
types that we should simply name REGION devices with something derived
|
||||
from those type names. However, the ND subsystem explicitly keeps the
|
||||
REGION name generic and expects userspace to always consider the
|
||||
region-attributes for four reasons:
|
||||
|
||||
1. There are already more than two REGION and "namespace" types. For
|
||||
PMEM there are two subtypes. As mentioned previously we have PMEM where
|
||||
the constituent DIMM devices are known and anonymous PMEM. For BLK
|
||||
regions the NFIT specification already anticipates vendor specific
|
||||
implementations. The exact distinction of what a region contains is in
|
||||
the region-attributes not the region-name or the region-devtype.
|
||||
|
||||
2. A region with zero child-namespaces is a possible configuration. For
|
||||
example, the NFIT allows for a DCR to be published without a
|
||||
corresponding BLK-aperture. This equates to a DIMM that can only accept
|
||||
control/configuration messages, but no i/o through a descendant block
|
||||
device. Again, this "type" is advertised in the attributes ('mappings'
|
||||
== 0) and the name does not tell you much.
|
||||
|
||||
3. What if a third major interface type arises in the future? Outside
|
||||
of vendor specific implementations, it's not difficult to envision a
|
||||
third class of interface type beyond BLK and PMEM. With a generic name
|
||||
for the REGION level of the device-hierarchy old userspace
|
||||
implementations can still make sense of new kernel advertised
|
||||
region-types. Userspace can always rely on the generic region
|
||||
attributes like "mappings", "size", etc and the expected child devices
|
||||
named "namespace". This generic format of the device-model hierarchy
|
||||
allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
|
||||
future-proof.
|
||||
|
||||
4. There are more robust mechanisms for determining the major type of a
|
||||
region than a device name. See the next section, How Do I Determine the
|
||||
Major Type of a Region?
|
||||
|
||||
How Do I Determine the Major Type of a Region?
|
||||
----------------------------------------------
|
||||
|
||||
Outside of the blanket recommendation of "use libndctl", or simply
|
||||
looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
|
||||
"nstype" integer attribute, here are some other options.
|
||||
|
||||
1. module alias lookup
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The whole point of region/namespace device type differentiation is to
|
||||
decide which block-device driver will attach to a given LIBNVDIMM namespace.
|
||||
One can simply use the modalias to lookup the resulting module. It's
|
||||
important to note that this method is robust in the presence of a
|
||||
vendor-specific driver down the road. If a vendor-specific
|
||||
implementation wants to supplant the standard nd_blk driver it can with
|
||||
minimal impact to the rest of LIBNVDIMM.
|
||||
|
||||
In fact, a vendor may also want to have a vendor-specific region-driver
|
||||
(outside of nd_region). For example, if a vendor defined its own LABEL
|
||||
format it would need its own region driver to parse that LABEL and emit
|
||||
the resulting namespaces. The output from module resolution is more
|
||||
accurate than a region-name or region-devtype.
|
||||
|
||||
2. udev
|
||||
^^^^^^^
|
||||
|
||||
The kernel "devtype" is registered in the udev database::
|
||||
|
||||
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
|
||||
P: /devices/platform/nfit_test.0/ndbus0/region0
|
||||
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
|
||||
E: DEVTYPE=nd_pmem
|
||||
E: MODALIAS=nd:t2
|
||||
E: SUBSYSTEM=nd
|
||||
|
||||
# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
|
||||
P: /devices/platform/nfit_test.0/ndbus0/region4
|
||||
E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
|
||||
E: DEVTYPE=nd_blk
|
||||
E: MODALIAS=nd:t3
|
||||
E: SUBSYSTEM=nd
|
||||
|
||||
...and is available as a region attribute, but keep in mind that the
|
||||
"devtype" does not indicate sub-type variations and scripts should
|
||||
really examine the other attributes.
|
||||
|
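The MODALIAS values in the udev output above can also be parsed directly by a script or tool. A minimal sketch follows, assuming the "nd:t<N>" format shown there. The enum and function names are hypothetical, and only the two type numbers visible in that output (2 and 3) are mapped:

```c
#include <stdio.h>

/* Region kinds as seen in the udev output (nd:t2, nd:t3). */
enum region_kind { REGION_UNKNOWN, REGION_PMEM, REGION_BLK };

/* Map an "nd:t<N>" modalias string onto a region kind. */
static enum region_kind region_kind_from_modalias(const char *modalias)
{
	int type;

	if (sscanf(modalias, "nd:t%d", &type) != 1)
		return REGION_UNKNOWN;

	switch (type) {
	case 2:
		return REGION_PMEM;	/* reported with DEVTYPE=nd_pmem */
	case 3:
		return REGION_BLK;	/* reported with DEVTYPE=nd_blk */
	default:
		return REGION_UNKNOWN;	/* other device types are not handled */
	}
}
```

Unknown type numbers are deliberately not treated as errors, in keeping with the point that old userspace should tolerate region types advertised by newer kernels.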
||||
3. type specific attributes
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
As it currently stands a BLK-aperture region will never have a
|
||||
"nfit/spa_index" attribute, but neither will a non-NFIT PMEM region. A
|
||||
BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
|
||||
that does not allow I/O. A PMEM region with a "mappings" value of zero
|
||||
is a simple system-physical-address range.
|
||||
|
||||
|
||||
LIBNVDIMM/LIBNDCTL: Namespace
|
||||
-----------------------------
|
||||
|
||||
A REGION, after resolving DPA aliasing and LABEL specified boundaries,
|
||||
surfaces one or more "namespace" devices. The arrival of a "namespace"
|
||||
device currently triggers either the nd_blk or nd_pmem driver to load
|
||||
and register a disk/block device.
|
||||
|
||||
LIBNVDIMM: namespace
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Here is a sample layout from the three major types of NAMESPACE where
|
||||
namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
|
||||
attribute), namespace2.0 represents a BLK namespace (note it has a
|
||||
'sector_size' attribute), and namespace6.0 represents an anonymous
|
||||
PMEM namespace (note that it has no 'uuid' attribute because it does not support a
|
||||
LABEL)::
|
||||
|
||||
/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
|
||||
|-- alt_name
|
||||
|-- devtype
|
||||
|-- dpa_extents
|
||||
|-- force_raw
|
||||
|-- modalias
|
||||
|-- numa_node
|
||||
|-- resource
|
||||
|-- size
|
||||
|-- subsystem -> ../../../../../../bus/nd
|
||||
|-- type
|
||||
|-- uevent
|
||||
`-- uuid
|
||||
/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
|
||||
|-- alt_name
|
||||
|-- devtype
|
||||
|-- dpa_extents
|
||||
|-- force_raw
|
||||
|-- modalias
|
||||
|-- numa_node
|
||||
|-- sector_size
|
||||
|-- size
|
||||
|-- subsystem -> ../../../../../../bus/nd
|
||||
|-- type
|
||||
|-- uevent
|
||||
`-- uuid
|
||||
/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
|
||||
|-- block
|
||||
| `-- pmem0
|
||||
|-- devtype
|
||||
|-- driver -> ../../../../../../bus/nd/drivers/pmem
|
||||
|-- force_raw
|
||||
|-- modalias
|
||||
|-- numa_node
|
||||
|-- resource
|
||||
|-- size
|
||||
|-- subsystem -> ../../../../../../bus/nd
|
||||
|-- type
|
||||
`-- uevent
|
||||
|
||||
LIBNDCTL: namespace enumeration example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Namespaces are indexed relative to their parent region, example below.
|
||||
These indexes are mostly static from boot to boot, but the subsystem makes
|
||||
no guarantees in this regard. For a static namespace identifier use its
|
||||
'uuid' attribute.
|
||||
|
||||
::
|
||||
|
||||
static struct ndctl_namespace
|
||||
*get_namespace_by_id(struct ndctl_region *region, unsigned int id)
|
||||
{
|
||||
struct ndctl_namespace *ndns;
|
||||
|
||||
ndctl_namespace_foreach(region, ndns)
|
||||
if (ndctl_namespace_get_id(ndns) == id)
|
||||
return ndns;
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
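Because namespace device names follow the "namespace<region>.<id>" pattern seen in the sysfs listings, a name can be split back into its two indexes. The helper below is a hypothetical sketch and is not part of LIBNDCTL. Since the indexes are not guaranteed to be stable, it is only suitable for inspecting the current boot:

```c
#include <stdio.h>

/* Split "namespace<region>.<id>" into its indexes; returns 0 on success. */
static int parse_namespace_name(const char *name, unsigned int *region_id,
				unsigned int *ns_id)
{
	char trailing;

	/* A third conversion succeeding means trailing characters exist. */
	if (sscanf(name, "namespace%u.%u%c", region_id, ns_id, &trailing) != 2)
		return -1;
	return 0;
}
```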
||||
LIBNDCTL: namespace creation example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Idle namespaces are automatically created by the kernel if a given
|
||||
region has enough available capacity to create a new namespace.
|
||||
Namespace instantiation involves finding an idle namespace and
|
||||
configuring it. For the most part the setting of namespace attributes
|
||||
can occur in any order; the only constraint is that 'uuid' must be set
|
||||
before 'size'. This enables the kernel to track DPA allocations
|
||||
internally with a static identifier::
|
||||
|
||||
static int configure_namespace(struct ndctl_region *region,
|
||||
struct ndctl_namespace *ndns,
|
||||
struct namespace_parameters *parameters)
|
||||
{
|
||||
char devname[50];
|
||||
|
||||
snprintf(devname, sizeof(devname), "namespace%d.%d",
|
||||
ndctl_region_get_id(region), parameters->id);
|
||||
|
||||
ndctl_namespace_set_alt_name(ndns, devname);
|
||||
/* 'uuid' must be set prior to setting size! */
|
||||
ndctl_namespace_set_uuid(ndns, parameters->uuid);
|
||||
ndctl_namespace_set_size(ndns, parameters->size);
|
||||
/* unlike pmem namespaces, blk namespaces have a sector size */
|
||||
if (parameters->lbasize)
|
||||
ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
|
||||
return ndctl_namespace_enable(ndns);
|
||||
}
|
||||
|
||||
|
||||
Why the Term "namespace"?
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
1. Why not "volume" for instance? "volume" ran the risk of confusing
|
||||
ND (libnvdimm subsystem) with a volume manager like device-mapper.
|
||||
|
||||
2. The term originated to describe the sub-devices that can be created
|
||||
within an NVME controller (see the nvme specification:
|
||||
http://www.nvmexpress.org/specifications/), and NFIT namespaces are
|
||||
meant to parallel the capabilities and configurability of
|
||||
NVME-namespaces.
|
||||
|
||||
|
||||
LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
|
||||
-------------------------------------------------
|
||||
|
||||
A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
|
||||
block device driver that fronts either the whole block device or a
|
||||
partition of a block device emitted by either a PMEM or BLK NAMESPACE.
|
||||
|
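The idea behind a BTT, an indirection table whose update is the single commit point for a sector write, can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions and not the real BTT: the actual on-media layout is described in the design document linked above, and every name below is hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NR_LBAS 4

/* Toy model: NR_LBAS logical sectors backed by NR_LBAS + 1 blocks. */
struct toy_btt {
	uint8_t blocks[NR_LBAS + 1][SECTOR_SIZE];
	uint32_t map[NR_LBAS];	/* logical sector -> physical block */
	uint32_t free_block;	/* the one block not referenced by map[] */
};

static void toy_btt_init(struct toy_btt *btt)
{
	uint32_t i;

	memset(btt, 0, sizeof(*btt));
	for (i = 0; i < NR_LBAS; i++)
		btt->map[i] = i;
	btt->free_block = NR_LBAS;
}

/*
 * Write the new sector out of place, then publish it by updating one
 * map entry. Until the map entry changes, readers see the old sector.
 */
static void toy_btt_write(struct toy_btt *btt, uint32_t lba, const void *buf)
{
	uint32_t old = btt->map[lba];

	memcpy(btt->blocks[btt->free_block], buf, SECTOR_SIZE);
	btt->map[lba] = btt->free_block;	/* commit point */
	btt->free_block = old;			/* old block becomes free */
}

static const void *toy_btt_read(const struct toy_btt *btt, uint32_t lba)
{
	return btt->blocks[btt->map[lba]];
}
```

The model ignores persistence, concurrency, and recovery entirely; it only shows why a torn write to the data block never becomes visible through the map.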
||||
LIBNVDIMM: btt layout
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Every region will start out with at least one BTT device which is the
|
||||
seed device. To activate it set the "namespace", "uuid", and
|
||||
"sector_size" attributes and then bind the device to the nd_pmem or
|
||||
nd_blk driver depending on the region type::
|
||||
|
||||
/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
|
||||
|-- namespace
|
||||
|-- delete
|
||||
|-- devtype
|
||||
|-- modalias
|
||||
|-- numa_node
|
||||
|-- sector_size
|
||||
|-- subsystem -> ../../../../../bus/nd
|
||||
|-- uevent
|
||||
`-- uuid
|
||||
|
||||
LIBNDCTL: btt creation example
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Similar to namespaces an idle BTT device is automatically created per
|
||||
region. Each time this "seed" btt device is configured and enabled a new
|
||||
seed is created. Creating a BTT configuration involves two steps of
|
||||
finding an idle BTT and assigning it to consume a PMEM or BLK namespace::
|
||||
|
||||
static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
|
||||
{
|
||||
struct ndctl_btt *btt;
|
||||
|
||||
ndctl_btt_foreach(region, btt)
|
||||
if (!ndctl_btt_is_enabled(btt)
|
||||
&& !ndctl_btt_is_configured(btt))
|
||||
return btt;
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static int configure_btt(struct ndctl_region *region,
|
||||
struct btt_parameters *parameters)
|
||||
{
|
||||
struct ndctl_btt *btt = get_idle_btt(region);
|
||||
|
||||
ndctl_btt_set_uuid(btt, parameters->uuid);
|
||||
ndctl_btt_set_sector_size(btt, parameters->sector_size);
|
||||
ndctl_btt_set_namespace(btt, parameters->ndns);
|
||||
/* turn off raw mode device */
|
||||
ndctl_namespace_disable(parameters->ndns);
|
||||
/* turn on btt access */
|
||||
return ndctl_btt_enable(btt);
|
||||
}
|
||||
|
||||
Once instantiated a new inactive btt seed device will appear underneath
|
||||
the region.
|
||||
|
||||
Once a "namespace" is removed from a BTT that instance of the BTT device
|
||||
will be deleted or otherwise reset to default values. This deletion is
|
||||
only at the device model level. In order to destroy a BTT the "info
|
||||
block" needs to be destroyed. Note that to destroy a BTT the media
|
||||
needs to be written in raw mode. By default, the kernel will autodetect
|
||||
the presence of a BTT and disable raw mode. This autodetect behavior
|
||||
can be suppressed by enabling raw mode for the namespace via the
|
||||
ndctl_namespace_set_raw_mode() API.
|
||||
|
||||
|
||||
Summary LIBNDCTL Diagram
|
||||
------------------------
|
||||
|
||||
For the given example above, here is the view of the objects as seen by the
|
||||
LIBNDCTL API::
|
||||
|
||||
                +---+
                |CTX|    +---------+   +--------------+  +---------------+
                +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
                  |    | +---------+   +--------------+  +---------------+
    +-------+     |    | +---------+   +--------------+  +---------------+
    | DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
    +-------+ |   |    | +---------+   +--------------+  +---------------+
    | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
    +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
    | DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
    +-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
    | DIMM3 <-+        |               +--------------+  +----------------------+
    +-------+          | +---------+   +--------------+  +---------------+
                       +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
                       | +---------+ | +--------------+  +----------------------+
                       |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
                       |               +--------------+  +----------------------+
                       | +---------+   +--------------+  +---------------+
                       +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
                       | +---------+   +--------------+  +---------------+
                       | +---------+   +--------------+  +----------------------+
                       +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
                         +---------+   +--------------+  +---------------+------+
|
143
Documentation/driver-api/nvdimm/security.rst
Normal file
@@ -0,0 +1,143 @@
|
||||
===============
|
||||
NVDIMM Security
|
||||
===============
|
||||
|
||||
1. Introduction
|
||||
---------------
|
||||
|
||||
With the introduction of Intel Device Specific Methods (DSM) v1.8
|
||||
specification [1], security DSMs are introduced. The spec added the following
|
||||
security DSMs: "get security state", "set passphrase", "disable passphrase",
|
||||
"unlock unit", "freeze lock", "secure erase", and "overwrite". A security_ops
|
||||
data structure has been added to struct dimm in order to support the security
|
||||
operations and generic APIs are exposed to allow vendor neutral operations.
|
||||
|
||||
2. Sysfs Interface
|
||||
------------------
|
||||
The "security" sysfs attribute is provided in the nvdimm sysfs directory. For
|
||||
example:
|
||||
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
|
||||
|
||||
The "show" operation of that attribute will display the security state for
|
||||
that DIMM. The following states are available: disabled, unlocked, locked,
|
||||
frozen, and overwrite. If security is not supported, the sysfs attribute
|
||||
will not be visible.
|
||||
|
||||
The "store" operation takes several commands when the attribute is written to
|
||||
in order to support some of the security functionalities:
|
||||
update <old_keyid> <new_keyid> - enable or update passphrase.
|
||||
disable <keyid> - disable enabled security and remove key.
|
||||
freeze - freeze changing of security states.
|
||||
erase <keyid> - delete existing user encryption key.
|
||||
overwrite <keyid> - wipe the entire nvdimm.
|
||||
master_update <keyid> <new_keyid> - enable or update master passphrase.
|
||||
master_erase <keyid> - delete existing user encryption key.
|
||||
|
||||
3. Key Management
|
||||
-----------------
|
||||
|
||||
The key is associated to the payload by the DIMM id. For example:
|
||||
# cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/nfit/id
|
||||
8089-a2-1740-00000133
|
||||
The DIMM id would be provided along with the key payload (passphrase) to
|
||||
the kernel.
|
||||
|
||||
The security keys are managed on the basis of a single key per DIMM. The
|
||||
key "passphrase" is expected to be 32 bytes long. This is similar to the ATA
|
||||
security specification [2]. A key is initially acquired via the request_key()
|
||||
kernel API call during nvdimm unlock. It is up to the user to make sure that
|
||||
all the keys are in the kernel user keyring for unlock.
|
||||
|
||||
A nvdimm encrypted-key of format enc32 has the description format of:
|
||||
nvdimm:<bus-provider-specific-unique-id>
|
||||
|
||||
See file ``Documentation/security/keys/trusted-encrypted.rst`` for creating
|
||||
encrypted-keys of enc32 format. TPM usage with a master trusted key is
|
||||
preferred for sealing the encrypted-keys.
|
||||
|
||||
4. Unlocking
|
||||
------------
|
||||
When the DIMMs are being enumerated by the kernel, the kernel will attempt to
|
||||
retrieve the key from the kernel user keyring. This is the only time
|
||||
a locked DIMM can be unlocked. Once unlocked, the DIMM will remain unlocked
|
||||
until reboot. Typically an entity (i.e. shell script) will inject all the
|
||||
relevant encrypted-keys into the kernel user keyring during the initramfs phase.
|
||||
This provides the unlock function access to all the related keys that contain
|
||||
the passphrase for the respective nvdimms. It is also recommended that the
|
||||
keys are injected before libnvdimm is loaded by modprobe.
|
||||
|
||||
5. Update
|
||||
---------
|
||||
When doing an update, it is expected that the existing key is removed from
|
||||
the kernel user keyring and reinjected as a different (old) key. It's irrelevant
|
||||
what the key description is for the old key since we are only interested in the
|
||||
keyid when doing the update operation. It is also expected that the new key
|
||||
is injected with the description format described earlier in this
|
||||
document. The update command written to the sysfs attribute has
|
||||
the format:
|
||||
update <old keyid> <new keyid>
|
||||
|
||||
If there is no old keyid due to a security enabling, then a 0 should be
|
||||
passed in.
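
As a minimal sketch of the command format above, the following userspace C
helper builds the string that would be written to the "security" sysfs
attribute. The name `format_update_cmd` is hypothetical and is not part of the
kernel or of ndctl; key ids are assumed to be keyring serial numbers that fit
in an int.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format "update <old keyid> <new keyid>".
 * Pass old_keyid == 0 when security is being enabled for the first time,
 * as described above. Returns 0 on success, -1 if buf is too small.
 */
static int format_update_cmd(char *buf, size_t len, int old_keyid, int new_keyid)
{
	int n = snprintf(buf, len, "update %d %d", old_keyid, new_keyid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would then be written, by a suitably privileged process,
to the security attribute of the DIMM in question.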
|
||||
|
||||
6. Freeze
|
||||
---------
|
||||
The freeze operation does not require any keys. The security config can be
|
||||
frozen by a user with root privilege.
|
||||
|
||||
7. Disable
|
||||
----------
|
||||
The security disable command format is:
|
||||
disable <keyid>
|
||||
|
||||
A key with the current passphrase payload that is tied to the nvdimm should be
|
||||
in the kernel user keyring.
|
||||
|
||||
8. Secure Erase
|
||||
---------------
|
||||
The command format for doing a secure erase is:
|
||||
erase <keyid>
|
||||
|
||||
A key with the current passphrase payload that is tied to the nvdimm should be
|
||||
in the kernel user keyring.
|
||||
|
||||
9. Overwrite
|
||||
------------
|
||||
The command format for doing an overwrite is:
|
||||
overwrite <keyid>
|
||||
|
||||
Overwrite can be done without a key if security is not enabled. A key serial
|
||||
of 0 can be passed in to indicate no key.
|
||||
|
||||
The sysfs attribute "security" can be polled to wait on overwrite completion.
|
||||
Overwrite can last tens of minutes or more depending on nvdimm size.
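
A polling loop needs a way to interpret the text read from the "security"
attribute. The sketch below maps the state names listed in section 2 to an
enum; the helper names are hypothetical, and treating "any state other than
overwrite" as completion is an assumption of this example rather than a
documented guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Security states listed for the "security" sysfs attribute. */
enum nvdimm_sec_state {
	SEC_DISABLED,
	SEC_UNLOCKED,
	SEC_LOCKED,
	SEC_FROZEN,
	SEC_OVERWRITE,
	SEC_UNKNOWN,
};

/*
 * Hypothetical helper: map the text read from the attribute to a state.
 * Anything from the first newline onwards is ignored.
 */
static enum nvdimm_sec_state parse_sec_state(const char *text)
{
	/* Order must match enum nvdimm_sec_state. */
	static const char *const names[] = {
		"disabled", "unlocked", "locked", "frozen", "overwrite",
	};
	size_t len = strcspn(text, "\n");
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && strncmp(text, names[i], len) == 0)
			return (enum nvdimm_sec_state)i;

	return SEC_UNKNOWN;
}

/* Assumption: overwrite is still running while the state reads "overwrite". */
static int overwrite_in_progress(const char *text)
{
	return parse_sec_state(text) == SEC_OVERWRITE;
}
```

A caller would re-read the attribute at a modest interval and stop once
`overwrite_in_progress()` returns 0.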
|
||||
|
||||
An encrypted-key with the current user passphrase that is tied to the nvdimm
|
||||
should be injected and its keyid should be passed in via sysfs.
|
||||
|
||||
10. Master Update
|
||||
-----------------
|
||||
The command format for doing a master update is:
|
||||
master_update <old keyid> <new keyid>
|
||||
|
||||
The operating mechanism for master update is identical to update except the
|
||||
master passphrase key is passed to the kernel. The master passphrase key
|
||||
is just another encrypted-key.
|
||||
|
||||
This command is only available when security is disabled.
|
||||
|
||||
11. Master Erase
|
||||
----------------
|
||||
The command format for doing a master erase is:
|
||||
master_erase <current keyid>
|
||||
|
||||
This command has the same operating mechanism as erase except the master
|
||||
passphrase key is passed to the kernel. The master passphrase key is just
|
||||
another encrypted-key.
|
||||
|
||||
This command is only available when the master security is enabled, indicated
|
||||
by the extended security status.
|
||||
|
||||
[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
|
||||
|
||||
[2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
|
189
Documentation/driver-api/nvmem.rst
Normal file
@@ -0,0 +1,189 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
NVMEM Subsystem
|
||||
===============
|
||||
|
||||
Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
|
||||
|
||||
This document explains the NVMEM Framework along with the APIs provided,
|
||||
and how to use it.
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
|
||||
retrieve configuration of SOC or Device specific data from non volatile
|
||||
memories like eeprom, efuses and so on.
|
||||
|
||||
Before this framework existed, NVMEM drivers like eeprom were stored in
|
||||
drivers/misc, where they all had to duplicate pretty much the same code to
|
||||
register a sysfs file, allow in-kernel users to access the content of the
|
||||
devices they were driving, etc.
|
||||
|
||||
This was also a problem as far as other in-kernel users were involved, since
|
||||
the solutions used were pretty much different from one driver to another, there
|
||||
was a rather big abstraction leak.
|
||||
|
||||
This framework aims to solve these problems. It also introduces DT
|
||||
representation for consumer devices to go get the data they require (MAC
|
||||
Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
|
||||
framework is based on regmap, so that most of the abstraction available in
|
||||
regmap can be reused, across multiple types of buses.
|
||||
|
||||
NVMEM Providers
|
||||
+++++++++++++++
|
||||
|
||||
NVMEM provider refers to an entity that implements methods to initialize, read
|
||||
and write the non-volatile memory.
|
||||
|
||||
2. Registering/Unregistering the NVMEM provider
|
||||
===============================================
|
||||
|
||||
An NVMEM provider can register with the NVMEM core by supplying the relevant
nvmem configuration to nvmem_register(). On success, the core returns a valid
nvmem_device pointer.
|
||||
|
||||
nvmem_unregister(nvmem) is used to unregister a previously registered provider.
|
||||
|
||||
For example, a simple qfprom case::
|
||||
|
||||
    static struct nvmem_config econfig = {
        .name = "qfprom",
        .owner = THIS_MODULE,
    };

    static int qfprom_probe(struct platform_device *pdev)
    {
        ...
        econfig.dev = &pdev->dev;
        nvmem = nvmem_register(&econfig);
        ...
    }
|
||||
|
||||
It is mandatory that the NVMEM provider has a regmap associated with its
|
||||
struct device. Failure to do so causes nvmem_register() to return an error code.
|
||||
|
||||
Users of board files can define and register nvmem cells using the
|
||||
nvmem_cell_table struct::
|
||||
|
||||
    static struct nvmem_cell_info foo_nvmem_cells[] = {
        {
            .name = "macaddr",
            .offset = 0x7f00,
            .bytes = ETH_ALEN,
        }
    };

    static struct nvmem_cell_table foo_nvmem_cell_table = {
        .nvmem_name = "i2c-eeprom",
        .cells = foo_nvmem_cells,
        .ncells = ARRAY_SIZE(foo_nvmem_cells),
    };

    nvmem_add_cell_table(&foo_nvmem_cell_table);
|
||||
|
||||
Additionally it is possible to create nvmem cell lookup entries and register
|
||||
them with the nvmem framework from machine code as shown in the example below::
|
||||
|
||||
    static struct nvmem_cell_lookup foo_nvmem_lookup = {
        .nvmem_name = "i2c-eeprom",
        .cell_name = "macaddr",
        .dev_id = "foo_mac.0",
        .con_id = "mac-address",
    };

    nvmem_add_cell_lookups(&foo_nvmem_lookup, 1);
|
||||
|
||||
NVMEM Consumers
|
||||
+++++++++++++++
|
||||
|
||||
NVMEM consumers are the entities which make use of the NVMEM provider to
|
||||
read from and write to NVMEM.
|
||||
|
||||
3. NVMEM cell based consumer APIs
|
||||
=================================
|
||||
|
||||
NVMEM cells are the data entries/fields in the NVMEM.
|
||||
The NVMEM framework provides the following APIs to read/write NVMEM cells::
|
||||
|
||||
struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
|
||||
struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
|
||||
|
||||
void nvmem_cell_put(struct nvmem_cell *cell);
|
||||
void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
|
||||
|
||||
void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
|
||||
int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
|
||||
|
||||
`*nvmem_cell_get()` apis will get a reference to nvmem cell for a given id,
|
||||
and nvmem_cell_read/write() can then read or write to the cell.
|
||||
Once the usage of the cell is finished the consumer should call
|
||||
`*nvmem_cell_put()` to free all the memory allocated for the cell.
|
||||
|
||||
4. Direct NVMEM device based consumer APIs
|
||||
==========================================
|
||||
|
||||
In some instances it is necessary to directly read/write the NVMEM.
|
||||
To facilitate such consumers, the NVMEM framework provides the APIs below::
|
||||
|
||||
struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
|
||||
struct nvmem_device *devm_nvmem_device_get(struct device *dev,
|
||||
const char *name);
|
||||
void nvmem_device_put(struct nvmem_device *nvmem);
|
||||
int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
|
||||
size_t bytes, void *buf);
|
||||
int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
|
||||
size_t bytes, void *buf);
|
||||
int nvmem_device_cell_read(struct nvmem_device *nvmem,
|
||||
struct nvmem_cell_info *info, void *buf);
|
||||
int nvmem_device_cell_write(struct nvmem_device *nvmem,
|
||||
struct nvmem_cell_info *info, void *buf);
|
||||
|
||||
Before a consumer can read/write NVMEM directly, it should get hold
|
||||
of an nvmem_device from one of the `*nvmem_device_get()` APIs.
|
||||
|
||||
The difference between these apis and cell based apis is that these apis always
|
||||
take nvmem_device as parameter.
|
||||
|
||||
5. Releasing a reference to the NVMEM
|
||||
=====================================
|
||||
|
||||
When a consumer no longer needs the NVMEM, it has to release the reference
|
||||
to the NVMEM it has obtained using the APIs mentioned in the above section.
|
||||
The NVMEM framework provides the following APIs to release a reference to the NVMEM::
|
||||
|
||||
void nvmem_cell_put(struct nvmem_cell *cell);
|
||||
void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
|
||||
void nvmem_device_put(struct nvmem_device *nvmem);
|
||||
void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
|
||||
|
||||
All of these APIs release a reference to the NVMEM; devm_nvmem_cell_put and
devm_nvmem_device_put additionally destroy the devres associated with this
NVMEM.
|
||||
|
||||
Userspace
|
||||
+++++++++
|
||||
|
||||
6. Userspace binary interface
|
||||
==============================
|
||||
|
||||
Userspace can read/write the raw NVMEM file located at::
|
||||
|
||||
/sys/bus/nvmem/devices/*/nvmem
|
||||
|
||||
ex::
|
||||
|
||||
hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
|
||||
|
||||
0000000 0000 0000 0000 0000 0000 0000 0000 0000
|
||||
*
|
||||
00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
|
||||
0000000 0000 0000 0000 0000 0000 0000 0000 0000
|
||||
...
|
||||
*
|
||||
0001000
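
The same file can be read programmatically. The following userspace C sketch
reads a byte range at a given offset from an nvmem binary file; the function
name `nvmem_file_read` is hypothetical, and the device name in any path passed
to it depends on the system.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly len bytes at byte offset off from an
 * NVMEM binary file such as /sys/bus/nvmem/devices/<name>/nvmem.
 * Returns 0 on success, -1 on error or if fewer than len bytes are available.
 */
static int nvmem_file_read(const char *path, off_t off, void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	while (done < len) {
		ssize_t n = pread(fd, p + done, len - done, off + (off_t)done);

		if (n <= 0)
			break;	/* error or end of file */
		done += (size_t)n;
	}

	close(fd);
	return done == len ? 0 : -1;
}
```

Reading typically requires sufficient privileges, since the sysfs file may be
restricted to root.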
|
||||
|
||||
7. DeviceTree Binding
|
||||
=====================
|
||||
|
||||
See Documentation/devicetree/bindings/nvmem/nvmem.txt
|
1832
Documentation/driver-api/parport-lowlevel.rst
Normal file
File diff suppressed because it is too large
18
Documentation/driver-api/phy/index.rst
Normal file
@@ -0,0 +1,18 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================
|
||||
Generic PHY Framework
|
||||
=====================
|
||||
|
||||
.. toctree::
|
||||
|
||||
phy
|
||||
samsung-usb2
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
|
197
Documentation/driver-api/phy/phy.rst
Normal file
@@ -0,0 +1,197 @@
|
||||
=============
|
||||
PHY subsystem
|
||||
=============
|
||||
|
||||
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||
|
||||
This document explains the Generic PHY Framework along with the APIs provided,
|
||||
and how to use it.
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
*PHY* is the abbreviation for physical layer. It is used to connect a device
|
||||
to the physical medium e.g., the USB controller has a PHY to provide functions
|
||||
such as serialization, de-serialization, encoding, decoding and is responsible
|
||||
for obtaining the required data transmission rate. Note that some USB
|
||||
controllers have PHY functionality embedded into them and others use an external
|
||||
PHY. Other peripherals that use PHY include Wireless LAN, Ethernet,
|
||||
SATA etc.
|
||||
|
||||
The intention of creating this framework is to bring the PHY drivers spread
|
||||
all over the Linux kernel to drivers/phy to increase code re-use and for
|
||||
better code maintainability.
|
||||
|
||||
This framework will be of use only to devices that use external PHY (PHY
|
||||
functionality is not embedded within the controller).
|
||||
|
||||
Registering/Unregistering the PHY provider
|
||||
==========================================
|
||||
|
||||
PHY provider refers to an entity that implements one or more PHY instances.
|
||||
For the simple case where the PHY provider implements only a single instance of
|
||||
the PHY, the framework provides its own implementation of of_xlate in
|
||||
of_phy_simple_xlate. If the PHY provider implements multiple instances, it
|
||||
should provide its own implementation of of_xlate. of_xlate is used only for
|
||||
dt boot case.
|
||||
|
||||
::
|
||||
|
||||
    #define of_phy_provider_register(dev, xlate)    \
            __of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))

    #define devm_of_phy_provider_register(dev, xlate)       \
            __devm_of_phy_provider_register((dev), NULL, THIS_MODULE, \
                                            (xlate))
|
||||
|
||||
of_phy_provider_register and devm_of_phy_provider_register macros can be used to
|
||||
register the phy_provider; both take the device and of_xlate as
|
||||
arguments. For the dt boot case, all PHY providers should use one of the above
|
||||
2 macros to register the PHY provider.
|
||||
|
||||
Often the device tree nodes associated with a PHY provider will contain a set
|
||||
of children that each represent a single PHY. Some bindings may nest the child
|
||||
nodes within extra levels for context and extensibility, in which case the low
|
||||
level of_phy_provider_register_full() and devm_of_phy_provider_register_full()
|
||||
macros can be used to override the node containing the children.
|
||||
|
||||
::
|
||||
|
||||
    #define of_phy_provider_register_full(dev, children, xlate) \
            __of_phy_provider_register(dev, children, THIS_MODULE, xlate)

    #define devm_of_phy_provider_register_full(dev, children, xlate) \
            __devm_of_phy_provider_register(dev, children, \
                                            THIS_MODULE, xlate)

    void devm_of_phy_provider_unregister(struct device *dev,
            struct phy_provider *phy_provider);
    void of_phy_provider_unregister(struct phy_provider *phy_provider);
|
||||
|
||||
devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to
|
||||
unregister the PHY provider.
|
||||
|
||||
Creating the PHY
|
||||
================
|
||||
|
||||
The PHY driver should create the PHY in order for other peripheral controllers
|
||||
to make use of it. The PHY framework provides 2 APIs to create the PHY.
|
||||
|
||||
::
|
||||
|
||||
struct phy *phy_create(struct device *dev, struct device_node *node,
|
||||
const struct phy_ops *ops);
|
||||
struct phy *devm_phy_create(struct device *dev,
|
||||
struct device_node *node,
|
||||
const struct phy_ops *ops);
|
||||
|
||||
The PHY drivers can use one of the above 2 APIs to create the PHY by passing
|
||||
the device pointer and phy ops.
|
||||
phy_ops is a set of function pointers for performing PHY operations such as
|
||||
init, exit, power_on and power_off.
|
||||
|
||||
In order to dereference the private data (in phy_ops), the phy provider driver
|
||||
can use phy_set_drvdata() after creating the PHY and use phy_get_drvdata() in
|
||||
phy_ops to get back the private data.
|
||||
|
||||
Getting a reference to the PHY
==============================
|
||||
|
||||
Before the controller can make use of the PHY, it has to get a reference to
|
||||
it. This framework provides the following APIs to get a reference to the PHY.
|
||||
|
||||
::
|
||||
|
||||
struct phy *phy_get(struct device *dev, const char *string);
|
||||
struct phy *phy_optional_get(struct device *dev, const char *string);
|
||||
struct phy *devm_phy_get(struct device *dev, const char *string);
|
||||
struct phy *devm_phy_optional_get(struct device *dev,
|
||||
const char *string);
|
||||
struct phy *devm_of_phy_get_by_index(struct device *dev,
|
||||
struct device_node *np,
|
||||
int index);
|
||||
|
||||
phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can
|
||||
be used to get the PHY. In the case of dt boot, the string arguments
|
||||
should contain the phy name as given in the dt data and in the case of
|
||||
non-dt boot, it should contain the label of the PHY. The two
|
||||
devm_phy_get variants associate the device with the PHY using devres on
|
||||
successful PHY get. On driver detach, release function is invoked on
|
||||
the devres data and devres data is freed. phy_optional_get and
|
||||
devm_phy_optional_get should be used when the phy is optional. These
|
||||
two functions will never return -ENODEV, but instead return NULL when
|
||||
the phy cannot be found. Some generic drivers, such as ehci, may use multiple
|
||||
phys and for such drivers referencing phy(s) by name(s) does not make sense. In
|
||||
this case, devm_of_phy_get_by_index can be used to get a phy reference based on
|
||||
the index.
|
||||
|
||||
It should be noted that NULL is a valid phy reference. All phy
|
||||
consumer calls on the NULL phy become NOPs. That is the release calls,
|
||||
the phy_init() and phy_exit() calls, and phy_power_on() and
|
||||
phy_power_off() calls are all NOP when applied to a NULL phy. The NULL
|
||||
phy is useful in devices for handling optional phy devices.
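
The NULL-is-a-no-op convention can be illustrated with a small userspace
model. The types and functions below (`struct phy_model`,
`phy_model_power_on()` and so on) are stand-ins invented for this sketch and
are not the kernel's implementation; they only mirror the rule stated above.

```c
#include <stddef.h>

/* Userspace stand-in for struct phy; it only tracks a power use count. */
struct phy_model {
	int power_count;
};

/* Mirrors the documented rule: a NULL PHY is valid and the call is a no-op. */
static int phy_model_power_on(struct phy_model *phy)
{
	if (!phy)
		return 0;
	phy->power_count++;
	return 0;
}

static int phy_model_power_off(struct phy_model *phy)
{
	if (!phy)
		return 0;
	phy->power_count--;
	return 0;
}

/* A controller whose PHY is optional can call the helpers unconditionally. */
static int controller_model_start(struct phy_model *optional_phy)
{
	return phy_model_power_on(optional_phy);
}
```

The benefit of this design is that a consumer obtaining its PHY through one of
the optional getters needs no separate "is there a PHY?" branch around each
call.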
|
||||
|
||||
Releasing a reference to the PHY
|
||||
================================
|
||||
|
||||
When the controller no longer needs the PHY, it has to release the reference
|
||||
to the PHY it has obtained using the APIs mentioned in the above section. The
|
||||
PHY framework provides 2 APIs to release a reference to the PHY.
|
||||
|
||||
::
|
||||
|
||||
void phy_put(struct phy *phy);
|
||||
void devm_phy_put(struct device *dev, struct phy *phy);
|
||||
|
||||
Both these APIs are used to release a reference to the PHY and devm_phy_put
|
||||
destroys the devres associated with this PHY.
|
||||
|
||||
Destroying the PHY
|
||||
==================
|
||||
|
||||
When the driver that created the PHY is unloaded, it should destroy the PHY it
|
||||
created using one of the following 2 APIs::
|
||||
|
||||
void phy_destroy(struct phy *phy);
|
||||
void devm_phy_destroy(struct device *dev, struct phy *phy);
|
||||
|
||||
Both these APIs destroy the PHY and devm_phy_destroy destroys the devres
|
||||
associated with this PHY.
|
||||
|
||||
PM Runtime
|
||||
==========
|
||||
|
||||
This subsystem is pm runtime enabled. So while creating the PHY,
|
||||
pm_runtime_enable of the phy device created by this subsystem is called and
|
||||
while destroying the PHY, pm_runtime_disable is called. Note that the phy
|
||||
device created by this subsystem will be a child of the device that calls
|
||||
phy_create (PHY provider device).
|
||||
|
||||
So pm_runtime_get_sync of the phy_device created by this subsystem will invoke
|
||||
pm_runtime_get_sync of PHY provider device because of parent-child relationship.
|
||||
It should also be noted that phy_power_on and phy_power_off perform
|
||||
phy_pm_runtime_get_sync and phy_pm_runtime_put respectively.
|
||||
There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync,
|
||||
phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and
|
||||
phy_pm_runtime_forbid for performing PM operations.
|
||||
|
||||
PHY Mappings
|
||||
============
|
||||
|
||||
In order to get reference to a PHY without help from DeviceTree, the framework
|
||||
offers lookups which can be compared to clkdev that allow clk structures to be
|
||||
bound to devices. A lookup can be made during runtime when a handle to
|
||||
the struct phy already exists.
|
||||
|
||||
The framework offers the following API for registering and unregistering the
|
||||
lookups::
|
||||
|
||||
int phy_create_lookup(struct phy *phy, const char *con_id,
|
||||
const char *dev_id);
|
||||
void phy_remove_lookup(struct phy *phy, const char *con_id,
|
||||
const char *dev_id);
|
||||
|
||||
DeviceTree Binding
|
||||
==================
|
||||
|
||||
The documentation for PHY dt binding can be found @
|
||||
Documentation/devicetree/bindings/phy/phy-bindings.txt
|
137
Documentation/driver-api/phy/samsung-usb2.rst
Normal file
@@ -0,0 +1,137 @@
|
||||
====================================
|
||||
Samsung USB 2.0 PHY adaptation layer
|
||||
====================================
|
||||
|
||||
1. Description
|
||||
--------------
|
||||
|
||||
The architecture of the USB 2.0 PHY module in Samsung SoCs is similar
|
||||
among many SoCs. In spite of the similarities it proved difficult to
|
||||
create one driver that would fit all these PHY controllers. Often
|
||||
the differences were minor and were found in particular bits of the
|
||||
registers of the PHY. In some rare cases the order of register writes or
|
||||
the PHY powering up process had to be altered. This adaptation layer is
|
||||
a compromise between having separate drivers and having a single driver
|
||||
with added support for many special cases.
|
||||
|
||||
2. Files description
|
||||
--------------------
|
||||
|
||||
- phy-samsung-usb2.c
|
||||
This is the main file of the adaptation layer. This file contains
|
||||
the probe function and provides two callbacks to the Generic PHY
|
||||
Framework. These two callbacks are used to power on and power off the
|
||||
phy. They carry out the common work that has to be done on all versions
|
||||
of the PHY module. Depending on which SoC was chosen they execute SoC
|
||||
specific callbacks. The specific SoC version is selected by choosing
|
||||
the appropriate compatible string. In addition, this file contains
|
||||
struct of_device_id definitions for particular SoCs.
|
||||
|
||||
- phy-samsung-usb2.h
|
||||
This is the include file. It declares the structures used by this
|
||||
driver. In addition it should contain extern declarations for
|
||||
structures that describe particular SoCs.
|
||||
|
||||
3. Supporting SoCs
|
||||
------------------
|
||||
|
||||
To support a new SoC a new file should be added to the drivers/phy
|
||||
directory. Each SoC's configuration is stored in an instance of the
|
||||
struct samsung_usb2_phy_config::
|
||||
|
||||
    struct samsung_usb2_phy_config {
        const struct samsung_usb2_common_phy *phys;
        int (*rate_to_clk)(unsigned long, u32 *);
        unsigned int num_phys;
        bool has_mode_switch;
    };
|
||||
|
||||
The num_phys is the number of phys handled by the driver. `*phys` is an
|
||||
array that contains the configuration for each phy. The has_mode_switch
|
||||
property is a boolean flag that determines whether the SoC has USB host
|
||||
and device on a single pair of pins. If so, a special register has to
|
||||
be modified to change the internal routing of these pins between a USB
|
||||
device or host module.
|
||||
|
||||
For example, the configuration for Exynos 4210 is as follows::
|
||||
|
||||
    const struct samsung_usb2_phy_config exynos4210_usb2_phy_config = {
        .has_mode_switch = 0,
        .num_phys = EXYNOS4210_NUM_PHYS,
        .phys = exynos4210_phys,
        .rate_to_clk = exynos4210_rate_to_clk,
    };
|
||||
|
||||
- `int (*rate_to_clk)(unsigned long, u32 *)`
|
||||
|
||||
The rate_to_clk callback is to convert the rate of the clock
|
||||
used as the reference clock for the PHY module to the value
|
||||
that should be written in the hardware register.
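
A rate_to_clk callback typically boils down to a lookup from supported
reference clock rates to register field values. The sketch below shows the
general shape under stated assumptions: the function name, the set of
supported rates and the `EXAMPLE_PHYFSEL_*` encodings are hypothetical and do
not reflect any real SoC register layout, and `uint32_t` stands in for the
kernel's `u32` so that the example builds in userspace.

```c
#include <errno.h>
#include <stdint.h>

#define MHZ (1000UL * 1000UL)

/* Hypothetical register field encodings, for illustration only. */
#define EXAMPLE_PHYFSEL_12MHZ 0x2u
#define EXAMPLE_PHYFSEL_24MHZ 0x3u
#define EXAMPLE_PHYFSEL_48MHZ 0x0u

/*
 * Convert the reference clock rate (in Hz) to the value to program.
 * Returns 0 and stores the value in *reg for a supported rate; returns
 * -EINVAL and leaves *reg untouched otherwise.
 */
static int example_rate_to_clk(unsigned long rate, uint32_t *reg)
{
	switch (rate) {
	case 12 * MHZ:
		*reg = EXAMPLE_PHYFSEL_12MHZ;
		break;
	case 24 * MHZ:
		*reg = EXAMPLE_PHYFSEL_24MHZ;
		break;
	case 48 * MHZ:
		*reg = EXAMPLE_PHYFSEL_48MHZ;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

Rejecting unknown rates with an error, rather than picking a nearest match,
keeps a misconfigured reference clock from being silently programmed.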
|
||||
|
||||
The exynos4210_phys configuration array is as follows::
|
||||
|
||||
    static const struct samsung_usb2_common_phy exynos4210_phys[] = {
        {
            .label = "device",
            .id = EXYNOS4210_DEVICE,
            .power_on = exynos4210_power_on,
            .power_off = exynos4210_power_off,
        },
        {
            .label = "host",
            .id = EXYNOS4210_HOST,
            .power_on = exynos4210_power_on,
            .power_off = exynos4210_power_off,
        },
        {
            .label = "hsic0",
            .id = EXYNOS4210_HSIC0,
            .power_on = exynos4210_power_on,
            .power_off = exynos4210_power_off,
        },
        {
            .label = "hsic1",
            .id = EXYNOS4210_HSIC1,
            .power_on = exynos4210_power_on,
            .power_off = exynos4210_power_off,
        },
        {},
    };
|
||||
|
||||
- `int (*power_on)(struct samsung_usb2_phy_instance *);`
|
||||
`int (*power_off)(struct samsung_usb2_phy_instance *);`
|
||||
|
||||
These two callbacks are used to power on and power off the phy
|
||||
by modifying appropriate registers.
|
||||
|
||||
The final change to the driver is adding the appropriate compatible value to the
|
||||
phy-samsung-usb2.c file. In case of Exynos 4210 the following lines were
|
||||
added to the struct of_device_id samsung_usb2_phy_of_match[] array::
|
||||
|
||||
    #ifdef CONFIG_PHY_EXYNOS4210_USB2
        {
            .compatible = "samsung,exynos4210-usb2-phy",
            .data = &exynos4210_usb2_phy_config,
        },
    #endif
|
||||
|
||||
To add further flexibility to the driver, the Kconfig file makes it possible to
|
||||
include support for selected SoCs in the compiled driver. The Kconfig
|
||||
entry for Exynos 4210 is as follows::
|
||||
|
||||
    config PHY_EXYNOS4210_USB2
        bool "Support for Exynos 4210"
        depends on PHY_SAMSUNG_USB2
        depends on CPU_EXYNOS4210
        help
          Enable USB PHY support for Exynos 4210. This option requires that
          Samsung USB 2.0 PHY driver is enabled and means that support for this
          particular SoC is compiled in the driver. In case of Exynos 4210 four
          phys are available - device, host, HSIC0 and HSIC1.
|
||||
|
||||
The newly created file that supports the new SoC has to be also added to the
|
||||
Makefile. In case of Exynos 4210 the added line is as follows::
|
||||
|
||||
obj-$(CONFIG_PHY_EXYNOS4210_USB2) += phy-exynos4210-usb2.o
|
||||
|
||||
After completing these steps the support for the new SoC should be ready.
|
@@ -1,4 +1,4 @@
|
||||
:orphan:
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================
|
||||
PPS - Pulse Per Second
|
||||
|
106
Documentation/driver-api/pti_intel_mid.rst
Normal file
@@ -0,0 +1,106 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=============
|
||||
Intel MID PTI
|
||||
=============
|
||||
|
||||
The Intel MID PTI project is HW implemented in Intel Atom
|
||||
system-on-a-chip designs based on the Parallel Trace
|
||||
Interface for MIPI P1149.7 cJTAG standard. The kernel solution
|
||||
for this platform involves the following files::
|
||||
|
||||
./include/linux/pti.h
|
||||
./drivers/.../n_tracesink.h
|
||||
./drivers/.../n_tracerouter.c
|
||||
./drivers/.../n_tracesink.c
|
||||
./drivers/.../pti.c
|
||||
|
||||
pti.c is the driver that enables various debugging features
|
||||
popular on platforms from certain mobile manufacturers.
|
||||
n_tracerouter.c and n_tracesink.c allow extra system information to
|
||||
be collected and routed to the pti driver, such as trace
|
||||
debugging data from a modem. Although n_tracerouter
|
||||
and n_tracesink are a part of the complete PTI solution,
|
||||
these two line disciplines can work separately from
|
||||
pti.c and route any data stream from one /dev/tty node
|
||||
to another /dev/tty node via kernel-space. This provides
|
||||
a stable, reliable connection that will not break unless
|
||||
the user-space application shuts down (plus avoids
|
||||
kernel->user->kernel context switch overheads of routing
|
||||
data).
|
||||
|
||||
An example debugging usage for this driver system:
|
||||
|
||||
* Hook /dev/ttyPTI0 to syslogd. Opening this port will also start
|
||||
a console device to further capture debugging messages to PTI.
|
||||
* Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW.
|
||||
This is where n_tracerouter and n_tracesink are used.
|
||||
* Hook /dev/pti to a user-level debugging application for writing
|
||||
to PTI HW.
|
||||
* Use the ``mipi_`` kernel driver API in other device drivers for
|
||||
debugging to PTI by first requesting a PTI write address via
|
||||
mipi_request_masterchannel(1).
|
||||
|
||||
Below is example pseudo-code on how a 'privileged' application
|
||||
can hook up n_tracerouter and n_tracesink to any tty on
|
||||
a system. 'Privileged' means the application has enough
|
||||
privileges to successfully manipulate the ldisc drivers
|
||||
but is not just blindly executing as 'root'. Keep in mind
|
||||
the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter
|
||||
and n_tracesink line discipline drivers but is a generic
|
||||
operation for a program to use a line discipline driver
|
||||
on a tty port other than the default n_tty::
|
||||
|
||||
/////////// To hook up n_tracerouter and n_tracesink /////////
|
||||
|
||||
// Note that n_tracerouter depends on n_tracesink.
|
||||
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
|
||||
#define ONE_TTY "/dev/ttyOne"
|
||||
#define TWO_TTY "/dev/ttyTwo"
|
||||
|
||||
// needed global to hand onto ldisc connection
|
||||
static int g_fd_source = -1;
|
||||
static int g_fd_sink = -1;
|
||||
|
||||
// these two vars used to grab LDISC values from loaded ldisc drivers
|
||||
// in OS. Look at /proc/tty/ldiscs to get the right numbers from
|
||||
// the ldiscs loaded in the system.
|
||||
int source_ldisc_num = -1, sink_ldisc_num = -1;
|
||||
int retval;
|
||||
|
||||
g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W
|
||||
g_fd_sink = open(TWO_TTY, O_RDWR); // must be R/W
|
||||
|
||||
if ((g_fd_source < 0) || (g_fd_sink < 0)) {
|
||||
// doubt you'll want to use these exact error lines of code
|
||||
printf("Error on open(). errno: %d\n",errno);
|
||||
return errno;
|
||||
}
|
||||
|
||||
retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
|
||||
if (retval < 0) {
|
||||
printf("Error on ioctl(). errno: %d\n", errno);
|
||||
return errno;
|
||||
}
|
||||
|
||||
retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
|
||||
if (retval < 0) {
|
||||
printf("Error on ioctl(). errno: %d\n", errno);
|
||||
return errno;
|
||||
}
|
||||
|
||||
/////////// To disconnect n_tracerouter and n_tracesink ////////
|
||||
|
||||
// First make sure data through the ldiscs has stopped.
|
||||
|
||||
// Second, disconnect ldiscs. This provides a
|
||||
// little cleaner shutdown on tty stack.
|
||||
sink_ldisc_num = 0;
|
||||
source_ldisc_num = 0;
|
||||
ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);

// Third, the program closes the connections and cleans up:
close(g_fd_sink);
close(g_fd_source);
g_fd_sink = g_fd_source = -1;
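
The comments in the example above point to /proc/tty/ldiscs as the place to
find the line discipline numbers. The following is a minimal user-space sketch
of that lookup, assuming each line of the file has the form ``<name> <number>``.
The function names are illustrative only and are not part of any kernel or
libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/tty/ldiscs ("<name> <number>") and return the
 * line discipline number if the name matches, or -1 otherwise.
 */
int ldisc_num_from_line(const char *line, const char *name)
{
    char found[32];
    int num;

    assert(line != NULL && name != NULL);
    if (sscanf(line, "%31s %d", found, &num) != 2)
        return -1;
    return strcmp(found, name) == 0 ? num : -1;
}

/* Scan /proc/tty/ldiscs for the named line discipline; -1 if not found. */
int lookup_ldisc(const char *name)
{
    char line[64];
    int num = -1;
    FILE *fp = fopen("/proc/tty/ldiscs", "r");

    if (fp == NULL)
        return -1;
    while (num < 0 && fgets(line, sizeof(line), fp) != NULL)
        num = ldisc_num_from_line(line, name);
    fclose(fp);
    return num;
}
```

A caller could then set ``sink_ldisc_num = lookup_ldisc("n_tracesink")`` before
issuing the TIOCSETD ioctl, and treat a result of -1 as "line discipline not
loaded".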
|
@@ -1,4 +1,4 @@
|
||||
:orphan:
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================================
|
||||
PTP hardware clock infrastructure for Linux
|
||||
|
165
Documentation/driver-api/pwm.rst
Normal file
@@ -0,0 +1,165 @@
|
||||
======================================
|
||||
Pulse Width Modulation (PWM) interface
|
||||
======================================
|
||||
|
||||
This provides an overview of the Linux PWM interface.
|
||||
|
||||
PWMs are commonly used for controlling LEDs, fans or vibrators in
|
||||
cell phones. PWMs with a fixed purpose have no need to implement
|
||||
the Linux PWM API (although they could). However, PWMs are often
|
||||
found as discrete devices on SoCs which have no fixed purpose. It's
|
||||
up to the board designer to connect them to LEDs or fans. To provide
|
||||
this kind of flexibility the generic PWM API exists.
|
||||
|
||||
Identifying PWMs
|
||||
----------------
|
||||
|
||||
Users of the legacy PWM API use unique IDs to refer to PWM devices.
|
||||
|
||||
Instead of referring to a PWM device via its unique ID, board setup code
|
||||
should instead register a static mapping that can be used to match PWM
|
||||
consumers to providers, as given in the following example::
|
||||
|
||||
static struct pwm_lookup board_pwm_lookup[] = {
|
||||
PWM_LOOKUP("tegra-pwm", 0, "pwm-backlight", NULL,
|
||||
50000, PWM_POLARITY_NORMAL),
|
||||
};
|
||||
|
||||
static void __init board_init(void)
|
||||
{
|
||||
...
|
||||
pwm_add_table(board_pwm_lookup, ARRAY_SIZE(board_pwm_lookup));
|
||||
...
|
||||
}
|
||||
|
||||
Using PWMs
|
||||
----------
|
||||
|
||||
Legacy users can request a PWM device using pwm_request() and free it
|
||||
after usage with pwm_free().
|
||||
|
||||
New users should use the pwm_get() function and pass to it the consumer
|
||||
device or a consumer name. pwm_put() is used to free the PWM device. Managed
|
||||
variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
|
||||
|
||||
After being requested, a PWM has to be configured using::
|
||||
|
||||
int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
|
||||
|
||||
This API controls both the PWM period/duty_cycle config and the
|
||||
enable/disable state.
|
||||
|
||||
The pwm_config(), pwm_enable() and pwm_disable() functions are just wrappers
|
||||
around pwm_apply_state() and should not be used if the user wants to change
|
||||
several parameters at once. For example, if you see pwm_config() and
|
||||
pwm_{enable,disable}() calls in the same function, this probably means you
|
||||
should switch to pwm_apply_state().
|
||||
|
||||
The PWM user API also allows one to query the PWM state with pwm_get_state().
|
||||
|
||||
In addition to the PWM state, the PWM API also exposes PWM arguments, which
|
||||
are the reference PWM config one should use on this PWM.
|
||||
PWM arguments are usually platform-specific and allow the PWM user to only
care about the duty cycle relative to the full period (like, duty = 50% of the
|
||||
period). struct pwm_args contains 2 fields (period and polarity) and should
|
||||
be used to set the initial PWM config (usually done in the probe function
|
||||
of the PWM user). PWM arguments are retrieved with pwm_get_args().
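
As an illustration of working with a duty cycle relative to the reference
period, the arithmetic can be sketched in plain C. This is user-space
arithmetic only, not kernel code; the function name is hypothetical, and in a
driver the period would come from the arguments returned by pwm_get_args().

```c
#include <assert.h>
#include <stdint.h>

/* Duty cycle in ns for a given percentage (0..100) of the reference period. */
uint64_t duty_ns_from_percent(uint64_t period_ns, unsigned int percent)
{
    assert(percent <= 100);
    /* Multiply first so that small periods do not lose precision. */
    return period_ns * percent / 100;
}
```

With the 50000 ns period used in the lookup table example, 50% corresponds to
a duty cycle of 25000 ns.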
|
||||
|
||||
All consumers should really be reconfiguring the PWM upon resume as
|
||||
appropriate. This is the only way to ensure that everything is resumed in
|
||||
the proper order.
|
||||
|
||||
Using PWMs with the sysfs interface
|
||||
-----------------------------------
|
||||
|
||||
If CONFIG_SYSFS is enabled in your kernel configuration a simple sysfs
|
||||
interface is provided to use the PWMs from userspace. It is exposed at
|
||||
/sys/class/pwm/. Each probed PWM controller/chip will be exported as
|
||||
pwmchipN, where N is the base of the PWM chip. Inside the directory you
|
||||
will find:
|
||||
|
||||
npwm
|
||||
The number of PWM channels this chip supports (read-only).
|
||||
|
||||
export
|
||||
Exports a PWM channel for use with sysfs (write-only).
|
||||
|
||||
unexport
|
||||
Unexports a PWM channel from sysfs (write-only).
|
||||
|
||||
The PWM channels are numbered using a per-chip index from 0 to npwm-1.
|
||||
|
||||
When a PWM channel is exported a pwmX directory will be created in the
|
||||
pwmchipN directory it is associated with, where X is the number of the
|
||||
channel that was exported. The following properties will then be available:
|
||||
|
||||
period
|
||||
The total period of the PWM signal (read/write).
|
||||
Value is in nanoseconds and is the sum of the active and inactive
|
||||
time of the PWM.
|
||||
|
||||
duty_cycle
|
||||
The active time of the PWM signal (read/write).
|
||||
Value is in nanoseconds and must be less than the period.
|
||||
|
||||
polarity
|
||||
Changes the polarity of the PWM signal (read/write).
|
||||
Writes to this property only work if the PWM chip supports changing
|
||||
the polarity. The polarity can only be changed if the PWM is not
|
||||
enabled. Value is the string "normal" or "inversed".
|
||||
|
||||
enable
|
||||
Enable/disable the PWM signal (read/write).
|
||||
|
||||
- 0 - disabled
|
||||
- 1 - enabled
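
The sysfs layout described above can be driven from a small user-space
program. The sketch below only builds attribute paths and writes strings to
them; the helper names are hypothetical, and error handling is reduced to a
return code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/pwm/pwmchipN/pwmX/<attr>"; 0 on success, -1 if truncated. */
int pwm_sysfs_path(char *buf, size_t len, unsigned int chip,
                   unsigned int channel, const char *attr)
{
    int n;

    assert(buf != NULL && attr != NULL);
    n = snprintf(buf, len, "/sys/class/pwm/pwmchip%u/pwm%u/%s",
                 chip, channel, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string (number or keyword) to a sysfs attribute; 0 on success. */
int sysfs_write(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");
    size_t len = strlen(value);
    int ok;

    if (fp == NULL)
        return -1;
    ok = fwrite(value, 1, len, fp) == len;
    /* fclose() flushes the buffer, so a rejected value is reported here. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A typical sequence would be to write the channel number to the chip's
``export`` file, then write ``period``, ``duty_cycle`` and finally ``1`` to
``enable`` in the newly created pwmX directory.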
|
||||
|
||||
Implementing a PWM driver
|
||||
-------------------------
|
||||
|
||||
Currently there are two ways to implement PWM drivers. Traditionally
there has only been the barebone API, meaning that each driver has
|
||||
to implement the pwm_*() functions itself. This means that it's impossible
|
||||
to have multiple PWM drivers in the system. For this reason it's mandatory
|
||||
for new drivers to use the generic PWM framework.
|
||||
|
||||
A new PWM controller/chip can be added using pwmchip_add() and removed
|
||||
again with pwmchip_remove(). pwmchip_add() takes a filled in struct
|
||||
pwm_chip as argument which provides a description of the PWM chip, the
|
||||
number of PWM devices provided by the chip and the chip-specific
|
||||
implementation of the supported PWM operations to the framework.
|
||||
|
||||
When implementing polarity support in a PWM driver, make sure to respect the
|
||||
signal conventions in the PWM framework. By definition, normal polarity
|
||||
characterizes a signal that starts high for the duration of the duty cycle and
|
||||
goes low for the remainder of the period. Conversely, a signal with inversed
|
||||
polarity starts low for the duration of the duty cycle and goes high for the
|
||||
remainder of the period.
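
The polarity convention can be expressed as a small model of the output level
over time. This is a sketch for illustration only: the enum and function names
are hypothetical stand-ins and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; not the kernel's enum. */
enum polarity { POLARITY_NORMAL, POLARITY_INVERSED };

/* Signal level (true = high) at time t_ns after the start of the first period. */
bool pwm_level_at(uint64_t t_ns, uint64_t period_ns, uint64_t duty_ns,
                  enum polarity pol)
{
    bool in_duty;

    assert(period_ns > 0 && duty_ns <= period_ns);
    /* The duty-cycle part always comes first within each period. */
    in_duty = (t_ns % period_ns) < duty_ns;
    return pol == POLARITY_NORMAL ? in_duty : !in_duty;
}
```

For a 100 ns period with a 30 ns duty cycle, a normal-polarity signal is high
for the first 30 ns of each period, while an inversed-polarity signal is low
for those 30 ns and high for the remaining 70 ns.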
|
||||
|
||||
Drivers are encouraged to implement ->apply() instead of the legacy
|
||||
->enable(), ->disable() and ->config() methods. Doing that should provide
|
||||
atomicity in the PWM config workflow, which is required when the PWM controls
|
||||
a critical device (like a regulator).
|
||||
|
||||
The implementation of ->get_state() (a method used to retrieve initial PWM
|
||||
state) is also encouraged for the same reason: letting the PWM user know
|
||||
about the current PWM state would allow them to avoid glitches.
|
||||
|
||||
Drivers should not implement any power management. In other words,
|
||||
consumers should implement it as described in the "Using PWMs" section.
|
||||
|
||||
Locking
|
||||
-------
|
||||
|
||||
The PWM core list manipulations are protected by a mutex, so pwm_request()
|
||||
and pwm_free() may not be called from an atomic context. Currently the
|
||||
PWM core does not enforce any locking to pwm_enable(), pwm_disable() and
|
||||
pwm_config(), so the calling context is currently driver specific. This
|
||||
is an issue derived from the former barebone API and should be fixed soon.
|
||||
|
||||
Helpers
|
||||
-------
|
||||
|
||||
Currently a PWM can only be configured with period_ns and duty_ns. For several
|
||||
use cases freq_hz and duty_percent might be better. Instead of calculating
|
||||
this in your driver please consider adding appropriate helpers to the framework.
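
As a rough idea of what such a helper might compute, the conversion from a
frequency to a period can be sketched as follows. The function name is
hypothetical and this is not an existing framework helper; it only shows the
arithmetic, rounded to the nearest nanosecond.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Period in ns for a frequency in Hz, rounded to the nearest nanosecond. */
uint64_t period_ns_from_freq_hz(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return (NSEC_PER_SEC + freq_hz / 2) / freq_hz;
}
```

For example, 20 kHz maps to a period of 50000 ns, which matches the period
used in the PWM_LOOKUP example earlier in this document.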
|
@@ -1,107 +0,0 @@
|
||||
=======================
|
||||
RapidIO Subsystem Guide
|
||||
=======================
|
||||
|
||||
:Author: Matt Porter
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
RapidIO is a high speed switched fabric interconnect with features aimed
|
||||
at the embedded market. RapidIO provides support for memory-mapped I/O
|
||||
as well as message-based transactions over the switched fabric network.
|
||||
RapidIO has a standardized discovery mechanism not unlike the PCI bus
|
||||
standard that allows simple detection of devices in a network.
|
||||
|
||||
This documentation is provided for developers intending to support
|
||||
RapidIO on new architectures, write new drivers, or to understand the
|
||||
subsystem internals.
|
||||
|
||||
Known Bugs and Limitations
|
||||
==========================
|
||||
|
||||
Bugs
|
||||
----
|
||||
|
||||
None. ;)
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
1. Access/management of RapidIO memory regions is not supported
|
||||
|
||||
2. Multiple host enumeration is not supported
|
||||
|
||||
RapidIO driver interface
|
||||
========================
|
||||
|
||||
Drivers are provided a set of calls in order to interface with the
|
||||
subsystem to gather info on devices, request/map memory region
|
||||
resources, and manage mailboxes/doorbells.
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
||||
.. kernel-doc:: include/linux/rio_drv.h
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio-driver.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio.c
|
||||
:export:
|
||||
|
||||
Internals
|
||||
=========
|
||||
|
||||
This chapter contains the autogenerated documentation of the RapidIO
|
||||
subsystem.
|
||||
|
||||
Structures
|
||||
----------
|
||||
|
||||
.. kernel-doc:: include/linux/rio.h
|
||||
:internal:
|
||||
|
||||
Enumeration and Discovery
|
||||
-------------------------
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio-scan.c
|
||||
:internal:
|
||||
|
||||
Driver functionality
|
||||
--------------------
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio.c
|
||||
:internal:
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio-access.c
|
||||
:internal:
|
||||
|
||||
Device model support
|
||||
--------------------
|
||||
|
||||
.. kernel-doc:: drivers/rapidio/rio-driver.c
|
||||
:internal:
|
||||
|
||||
PPC32 support
|
||||
-------------
|
||||
|
||||
.. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c
|
||||
:internal:
|
||||
|
||||
Credits
|
||||
=======
|
||||
|
||||
The following people have contributed to the RapidIO subsystem directly
|
||||
or indirectly:
|
||||
|
||||
1. Matt Porter\ mporter@kernel.crashing.org
|
||||
|
||||
2. Randy Vinson\ rvinson@mvista.com
|
||||
|
||||
3. Dan Malek\ dan@embeddedalley.com
|
||||
|
||||
The following people have contributed to this document:
|
||||
|
||||
1. Matt Porter\ mporter@kernel.crashing.org
|
15
Documentation/driver-api/rapidio/index.rst
Normal file
@@ -0,0 +1,15 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================
|
||||
The Linux RapidIO Subsystem
|
||||
===========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
rapidio
|
||||
sysfs
|
||||
|
||||
tsi721
|
||||
mport_cdev
|
||||
rio_cm
|
110
Documentation/driver-api/rapidio/mport_cdev.rst
Normal file
@@ -0,0 +1,110 @@
|
||||
==================================================================
|
||||
RapidIO subsystem mport character device driver (rio_mport_cdev.c)
|
||||
==================================================================
|
||||
|
||||
1. Overview
|
||||
===========
|
||||
|
||||
This device driver is the result of collaboration within the RapidIO.org
|
||||
Software Task Group (STG) between Texas Instruments, Freescale,
|
||||
Prodrive Technologies, Nokia Networks, BAE and IDT. Additional input was
|
||||
received from other members of RapidIO.org. The objective was to create a
|
||||
character mode driver interface which exposes the capabilities of RapidIO
|
||||
devices directly to applications, in a manner that allows the numerous and
|
||||
varied RapidIO implementations to interoperate.
|
||||
|
||||
This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
|
||||
for user-space applications. Most RapidIO operations are supported through
|
||||
'ioctl' system calls.
|
||||
|
||||
When loaded, this device driver creates filesystem nodes named rio_mportX in the /dev
directory for each registered RapidIO mport device. 'X' in the node name corresponds
to the unique port ID assigned to each local mport device.
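
A user-space application would typically start by opening the node for the
mport it wants to use. The following is a minimal sketch of that step; the
helper names are hypothetical, and opening the node read/write is an
assumption rather than a documented requirement.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Build "/dev/rio_mportX" for the given mport ID; 0 on success, -1 if truncated. */
int rio_mport_path(char *buf, size_t len, unsigned int mport_id)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "/dev/rio_mport%u", mport_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Open the mport node read/write; returns a file descriptor or -1. */
int rio_mport_open(unsigned int mport_id)
{
    char path[32];

    if (rio_mport_path(path, sizeof(path), mport_id) != 0)
        return -1;
    return open(path, O_RDWR);
}
```

The returned file descriptor is what the ioctl commands listed in this
document would be issued on.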
|
||||
|
||||
Using the available set of ioctl commands, user-space applications can perform the
following RapidIO bus and subsystem operations:
|
||||
|
||||
- Reads and writes from/to configuration registers of mport devices
|
||||
(RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
|
||||
- Reads and writes from/to configuration registers of remote RapidIO devices.
|
||||
These operations are defined as RapidIO Maintenance reads/writes in the RIO spec.
|
||||
(RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
|
||||
- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
|
||||
- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
|
||||
- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
|
||||
- Query capabilities and RapidIO link configuration of mport devices
|
||||
(RIO_MPORT_GET_PROPERTIES)
|
||||
- Enable/Disable reporting of RapidIO doorbell events to user-space applications
|
||||
(RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
|
||||
- Enable/Disable reporting of RIO port-write events to user-space applications
|
||||
(RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
|
||||
- Query/Control type of events reported through this driver: doorbells,
|
||||
port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
|
||||
- Configure/Map mport's outbound requests window(s) for specific size,
|
||||
RapidIO destination ID, hopcount and request type
|
||||
(RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
|
||||
- Configure/Map mport's inbound requests window(s) for specific size,
|
||||
RapidIO base address and local memory base address
|
||||
(RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
|
||||
- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
|
||||
to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
|
||||
- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
|
||||
Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
|
||||
transfer modes.
|
||||
- Check/Wait for completion of asynchronous DMA data transfer
|
||||
(RIO_WAIT_FOR_ASYNC)
|
||||
- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
|
||||
This allows implementation of various RapidIO fabric enumeration algorithms
|
||||
as user-space applications while using remaining functionality provided by
|
||||
kernel RapidIO subsystem.
|
||||
|
||||
2. Hardware Compatibility
|
||||
=========================
|
||||
|
||||
This device driver uses standard interfaces defined by kernel RapidIO subsystem
|
||||
and therefore it can be used with any mport device driver registered by RapidIO
|
||||
subsystem with limitations set by available mport implementation.
|
||||
|
||||
At this moment the most common limitation is availability of RapidIO-specific
|
||||
DMA engine framework for specific mport device. Users should verify available
|
||||
functionality of their platform when planning to use this driver:
|
||||
|
||||
- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
|
||||
compatible with this driver.
|
||||
- Freescale SoCs 'fsl_rio' mport driver does not implement RapidIO-specific
DMA engine support, and therefore DMA data transfers through the mport_cdev driver
are not available.
|
||||
|
||||
3. Module parameters
|
||||
====================
|
||||
|
||||
- 'dma_timeout'
|
||||
- DMA transfer completion timeout (in msec, default value 3000).
|
||||
This parameter sets a maximum completion wait time for SYNC mode DMA
|
||||
transfer requests and for RIO_WAIT_FOR_ASYNC ioctl requests.
|
||||
|
||||
- 'dbg_level'
|
||||
- This parameter controls the amount of debug information
generated by this device driver. This parameter is formed by a set of
|
||||
bit masks that correspond to the specific functional blocks.
|
||||
For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'
|
||||
This parameter can be changed dynamically.
|
||||
Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
|
||||
|
||||
4. Known problems
|
||||
=================
|
||||
|
||||
None.
|
||||
|
||||
5. User-space Applications and API
|
||||
==================================
|
||||
|
||||
API library and applications that use this device driver are available from
|
||||
RapidIO.org.
|
||||
|
||||
6. TODO List
|
||||
============
|
||||
|
||||
- Add support for sending/receiving "raw" RapidIO messaging packets.
|
||||
- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
|
||||
is not available.
|
362
Documentation/driver-api/rapidio/rapidio.rst
Normal file
@@ -0,0 +1,362 @@
|
||||
============
|
||||
Introduction
|
||||
============
|
||||
|
||||
The RapidIO standard is a packet-based fabric interconnect standard designed for
|
||||
use in embedded systems. Development of the RapidIO standard is directed by the
|
||||
RapidIO Trade Association (RTA). The current version of the RapidIO specification
|
||||
is publicly available for download from the RTA web-site [1].
|
||||
|
||||
This document describes the basics of the Linux RapidIO subsystem and provides
|
||||
information on its major components.
|
||||
|
||||
1. Overview
===========
|
||||
|
||||
Because the RapidIO subsystem follows the Linux device model it is integrated
|
||||
into the kernel similarly to other buses by defining RapidIO-specific device and
|
||||
bus types and registering them within the device model.
|
||||
|
||||
The Linux RapidIO subsystem is architecture independent and therefore defines
|
||||
architecture-specific interfaces that provide support for common RapidIO
|
||||
subsystem operations.
|
||||
|
||||
2. Core Components
|
||||
==================
|
||||
|
||||
A typical RapidIO network is a combination of endpoints and switches.
|
||||
Each of these components is represented in the subsystem by an associated data
|
||||
structure. The core logical components of the RapidIO subsystem are defined
|
||||
in include/linux/rio.h file.
|
||||
|
||||
2.1 Master Port
|
||||
---------------
|
||||
|
||||
A master port (or mport) is a RapidIO interface controller that is local to the
|
||||
processor executing the Linux code. A master port generates and receives RapidIO
|
||||
packets (transactions). In the RapidIO subsystem each master port is represented
|
||||
by a rio_mport data structure. This structure contains master port specific
|
||||
resources such as mailboxes and doorbells. The rio_mport also includes a unique
|
||||
host device ID that is valid when a master port is configured as an enumerating
|
||||
host.
|
||||
|
||||
RapidIO master ports are serviced by subsystem specific mport device drivers
|
||||
that provide functionality defined for this subsystem. To provide a hardware
|
||||
independent interface for RapidIO subsystem operations, rio_mport structure
|
||||
includes rio_ops data structure which contains pointers to hardware specific
|
||||
implementations of RapidIO functions.
|
||||
|
||||
2.2 Device
|
||||
----------
|
||||
|
||||
A RapidIO device is any endpoint (other than mport) or switch in the network.
|
||||
All devices are presented in the RapidIO subsystem by corresponding rio_dev data
|
||||
structure. Devices form one global device list and per-network device lists
|
||||
(depending on number of available mports and networks).
|
||||
|
||||
2.3 Switch
|
||||
----------
|
||||
|
||||
A RapidIO switch is a special class of device that routes packets between its
|
||||
ports towards their final destination. The packet destination port within a
|
||||
switch is defined by an internal routing table. A switch is presented in the
|
||||
RapidIO subsystem by rio_dev data structure expanded by additional rio_switch
|
||||
data structure, which contains switch specific information such as copy of the
|
||||
routing table and pointers to switch specific functions.
|
||||
|
||||
The RapidIO subsystem defines the format and initialization method for subsystem
|
||||
specific switch drivers that are designed to provide hardware-specific
|
||||
implementation of common switch management routines.
|
||||
|
||||
2.4 Network
|
||||
-----------
|
||||
|
||||
A RapidIO network is a combination of interconnected endpoint and switch devices.
|
||||
Each RapidIO network known to the system is represented by corresponding rio_net
|
||||
data structure. This structure includes lists of all devices and local master
|
||||
ports that form the same network. It also contains a pointer to the default
|
||||
master port that is used to communicate with devices within the network.
|
||||
|
||||
2.5 Device Drivers
|
||||
------------------
|
||||
|
||||
RapidIO device-specific drivers follow Linux Kernel Driver Model and are
|
||||
intended to support specific RapidIO devices attached to the RapidIO network.
|
||||
|
||||
2.6 Subsystem Interfaces
|
||||
------------------------
|
||||
|
||||
RapidIO interconnect specification defines features that may be used to provide
|
||||
one or more common service layers for all participating RapidIO devices. These
|
||||
common services may act separately from device-specific drivers or be used by
|
||||
device-specific drivers. Example of such service provider is the RIONET driver
|
||||
which implements Ethernet-over-RapidIO interface. Because only one driver can be
|
||||
registered for a device, all common RapidIO services have to be registered as
|
||||
subsystem interfaces. This allows multiple common services to be attached to
|
||||
the same device without blocking attachment of a device-specific driver.
|
||||
|
||||
3. Subsystem Initialization
|
||||
===========================
|
||||
|
||||
In order to initialize the RapidIO subsystem, a platform must initialize and
|
||||
register at least one master port within the RapidIO network. To register an mport
within the subsystem, the controller driver's initialization code calls the function
rio_register_mport() for each available master port.
|
||||
|
||||
After all active master ports are registered with a RapidIO subsystem,
|
||||
an enumeration and/or discovery routine may be called automatically or
|
||||
by user-space command.
|
||||
|
||||
RapidIO subsystem can be configured to be built as a statically linked or
|
||||
modular component of the kernel (see details below).
|
||||
|
||||
4. Enumeration and Discovery
|
||||
============================
|
||||
|
||||
4.1 Overview
|
||||
------------
|
||||
|
||||
RapidIO subsystem configuration options allow users to build enumeration and
|
||||
discovery methods as statically linked components or loadable modules.
|
||||
An enumeration/discovery method implementation and available input parameters
|
||||
define how any given method can be attached to available RapidIO mports:
|
||||
simply to all available mports OR individually to the specified mport device.
|
||||
|
||||
Depending on selected enumeration/discovery build configuration, there are
|
||||
several methods to initiate an enumeration and/or discovery process:
|
||||
|
||||
(a) Statically linked enumeration and discovery process can be started
|
||||
automatically during kernel initialization time using corresponding module
|
||||
parameters. This was the original method used since introduction of RapidIO
|
||||
subsystem. Now this method relies on enumerator module parameter which is
|
||||
'rio-scan.scan' for existing basic enumeration/discovery method.
|
||||
When automatic start of enumeration/discovery is used a user has to ensure
|
||||
that all discovering endpoints are started before the enumerating endpoint
|
||||
and are waiting for enumeration to be completed.
|
||||
Configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines time that discovering
|
||||
endpoint waits for enumeration to be completed. If the specified timeout
|
||||
expires the discovery process is terminated without obtaining RapidIO network
|
||||
information. NOTE: a timed out discovery process may be restarted later using
|
||||
a user-space command as it is described below (if the given endpoint was
|
||||
enumerated successfully).
|
||||
|
||||
(b) Statically linked enumeration and discovery process can be started by
|
||||
a command from user space. This initiation method provides more flexibility
|
||||
for a system startup compared to the option (a) above. After all participating
|
||||
endpoints have been successfully booted, an enumeration process shall be
|
||||
started first by issuing a user-space command, after an enumeration is
|
||||
completed a discovery process can be started on all remaining endpoints.
|
||||
|
||||
(c) Modular enumeration and discovery process can be started by a command from
|
||||
user space. After an enumeration/discovery module is loaded, a network scan
|
||||
process can be started by issuing a user-space command.
|
||||
Similar to the option (b) above, an enumerator has to be started first.
|
||||
|
||||
(d) Modular enumeration and discovery process can be started by a module
|
||||
initialization routine. In this case an enumerating module shall be loaded
|
||||
first.
|
||||
|
||||
When a network scan process is started it calls an enumeration or discovery
|
||||
routine depending on the configured role of a master port: host or agent.
|
||||
|
||||
Enumeration is performed by a master port if it is configured as a host port by
|
||||
assigning a host destination ID greater than or equal to zero. The host
|
||||
destination ID can be assigned to a master port using various methods depending
|
||||
on RapidIO subsystem build configuration:
|
||||
|
||||
(a) For a statically linked RapidIO subsystem core use command line parameter
|
||||
"rapidio.hdid=" with a list of destination ID assignments in order of mport
|
||||
device registration. For example, in a system with two RapidIO controllers
|
||||
the command line parameter "rapidio.hdid=-1,7" will result in assignment of
|
||||
the host destination ID=7 to the second RapidIO controller, while the first
|
||||
one will be assigned destination ID=-1.
|
||||
|
||||
(b) If the RapidIO subsystem core is built as a loadable module, in addition
|
||||
to the method shown above, the host destination ID(s) can be specified using
|
||||
traditional methods of passing module parameter "hdid=" during its loading:
|
||||
|
||||
- from command line: "modprobe rapidio hdid=-1,7", or
|
||||
- from modprobe configuration file using configuration command "options",
|
||||
like in this example: "options rapidio hdid=-1,7". An example of modprobe
|
||||
configuration file is provided in the section below.
|
||||
|
||||
NOTES:
|
||||
(i) if the "hdid=" parameter is omitted, all available mports will be assigned
|
||||
destination ID = -1;
|
||||
|
||||
(ii) the "hdid=" parameter in systems with multiple mports can have
|
||||
destination ID assignments omitted from the end of list (default = -1).
|
||||
|
||||
If the host device ID for a specific master port is set to -1, the discovery
|
||||
process will be performed for it.
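
The semantics of the "hdid=" list described above can be modelled in a few
lines of user-space C. This is only a sketch of the documented behaviour
(entries assigned in registration order, missing trailing entries defaulting
to -1, and -1 selecting discovery); it is not the kernel's own parameter
parsing, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define HDID_DISCOVER (-1)

/*
 * Fill ids[0..nports-1] from a comma-separated list such as "-1,7".
 * Entries missing from the end of the list default to -1.
 * Returns 0 on success, -1 on a malformed list.
 */
int parse_hdid_list(const char *arg, int *ids, size_t nports)
{
    const char *p = arg;
    size_t i;

    assert(ids != NULL);
    for (i = 0; i < nports; i++)
        ids[i] = HDID_DISCOVER;
    if (p == NULL || *p == '\0')
        return 0;               /* parameter omitted: all ports discover */
    for (i = 0; i < nports; i++) {
        char *end;
        long v = strtol(p, &end, 10);

        if (end == p)
            return -1;          /* no number where a value was expected */
        ids[i] = (int)v;
        if (*end == '\0')
            return 0;           /* end of list reached */
        if (*end != ',')
            return -1;
        p = end + 1;
    }
    return 0;                   /* entries beyond nports are ignored */
}

/* A port enumerates when its host destination ID is >= 0, else it discovers. */
int mport_is_enumerator(int hdid)
{
    return hdid >= 0;
}
```

With the list "-1,7" and two controllers, the first port performs discovery
and the second one enumerates with host destination ID 7, as in the example
given for the command line parameter.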
|
||||
|
||||
The enumeration and discovery routines use RapidIO maintenance transactions
|
||||
to access the configuration space of devices.
|
||||
|
||||
NOTE: If RapidIO switch-specific device drivers are built as loadable modules
|
||||
they must be loaded before enumeration/discovery process starts.
|
||||
This requirement is caused by the fact that enumeration/discovery methods invoke
|
||||
vendor-specific callbacks at early stages.
|
||||
|
||||
4.2 Automatic Start of Enumeration and Discovery
|
||||
------------------------------------------------
|
||||
|
||||
Automatic enumeration/discovery start method is applicable only to built-in
|
||||
enumeration/discovery RapidIO configuration selection. To enable automatic
|
||||
enumeration/discovery start by existing basic enumerator method set use boot
|
||||
command line parameter "rio-scan.scan=1".
|
||||
|
||||
This configuration requires synchronized start of all RapidIO endpoints that
|
||||
form a network which will be enumerated/discovered. Discovering endpoints have
|
||||
to be started before an enumeration starts to ensure that all RapidIO
|
||||
controllers have been initialized and are ready to be discovered. Configuration
|
||||
parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which
|
||||
a discovering endpoint will wait for enumeration to be completed.
|
||||
|
||||
When automatic enumeration/discovery start is selected, basic method's
|
||||
initialization routine calls rio_init_mports() to perform enumeration or
|
||||
discovery for all known mport devices.
|
||||
|
||||
Depending on RapidIO network size and configuration this automatic
|
||||
enumeration/discovery start method may be difficult to use due to the
|
||||
requirement for synchronized start of all endpoints.
|
||||
|
||||
4.3 User-space Start of Enumeration and Discovery
|
||||
-------------------------------------------------
|
||||
|
||||
User-space start of enumeration and discovery can be used with built-in and
|
||||
modular build configurations. For user-space controlled start RapidIO subsystem
|
||||
creates the sysfs write-only attribute file '/sys/bus/rapidio/scan'. To initiate
|
||||
an enumeration or discovery process on specific mport device, a user needs to
|
||||
write mport_ID (not RapidIO destination ID) into that file. The mport_ID is a
|
||||
sequential number (0 ... RIO_MAX_MPORTS) assigned during mport device
|
||||
registration. For example, on a machine with a single RapidIO controller, mport_ID
|
||||
for that controller will always be 0.
|
||||
|
||||
To initiate RapidIO enumeration/discovery on all available mports a user may
|
||||
write '-1' (or RIO_MPORT_ANY) into the scan attribute file.
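
A minimal user-space sketch of this step is shown below. The helper name `rio_start_scan` is hypothetical, and the attribute path is passed as a parameter so the helper can be tried on a regular file; on a real system it would be '/sys/bus/rapidio/scan'.

```c
#include <stdio.h>

/*
 * Write an mport ID to the scan attribute file. Pass -1 to request a
 * scan on all available mports. Returns 0 on success, -1 on failure.
 */
static int rio_start_scan(const char *path, int mport_id)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", mport_id) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a failed write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling `rio_start_scan("/sys/bus/rapidio/scan", 0)` would request a scan on the first registered mport. Writing to the attribute normally requires root privileges.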
|
||||
|
||||
4.4 Basic Enumeration Method
|
||||
----------------------------
|
||||
|
||||
This is the original enumeration/discovery method, which has been available since
|
||||
the first release of the RapidIO subsystem code. The enumeration process is
|
||||
implemented according to the enumeration algorithm outlined in the RapidIO
|
||||
Interconnect Specification: Annex I [1].
|
||||
|
||||
This method can be configured as statically linked or loadable module.
|
||||
The method's single parameter "scan" allows triggering the enumeration/discovery
|
||||
process from the module initialization routine.
|
||||
|
||||
This enumeration/discovery method can be started only once and does not support
|
||||
unloading if it is built as a module.
|
||||
|
||||
The enumeration process traverses the network using a recursive depth-first
|
||||
algorithm. When a new device is found, the enumerator takes ownership of that
|
||||
device by writing into the Host Device ID Lock CSR. It does this to ensure that
|
||||
the enumerator has exclusive right to enumerate the device. If device ownership
|
||||
is successfully acquired, the enumerator allocates a new rio_dev structure and
|
||||
initializes it according to device capabilities.
|
||||
|
||||
If the device is an endpoint, a unique device ID is assigned to it and its value
|
||||
is written into the device's Base Device ID CSR.
|
||||
|
||||
If the device is a switch, the enumerator allocates an additional rio_switch
|
||||
structure to store switch specific information. Then the switch's vendor ID and
|
||||
device ID are queried against a table of known RapidIO switches. Each switch
|
||||
table entry contains a pointer to a switch-specific initialization routine that
|
||||
initializes pointers to the rest of switch specific operations, and performs
|
||||
hardware initialization if necessary. A RapidIO switch does not have a unique
|
||||
device ID; it relies on hopcount and routing for device ID of an attached
|
||||
endpoint if access to its configuration registers is required. If a switch (or
|
||||
chain of switches) does not have any endpoint (except enumerator) attached to
|
||||
it, a fake device ID will be assigned to configure a route to that switch.
|
||||
In the case of a chain of switches without endpoint, one fake device ID is used
|
||||
to configure a route through the entire chain and switches are differentiated by
|
||||
their hopcount value.
|
||||
|
||||
For both endpoints and switches the enumerator writes a unique component tag
|
||||
into device's Component Tag CSR. That unique value is used by the error
|
||||
management notification mechanism to identify a device that is reporting an
|
||||
error management event.
|
||||
|
||||
Enumeration beyond a switch is completed by iterating over each active egress
|
||||
port of that switch. For each active link, a route to a default device ID
|
||||
(0xFF for 8-bit systems and 0xFFFF for 16-bit systems) is temporarily written
|
||||
into the routing table. The algorithm recurs by calling itself with hopcount + 1
|
||||
and the default device ID in order to access the device on the active port.
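
The recursive depth-first walk described above can be illustrated with a toy in-memory model. This is a simplified sketch, not the kernel implementation: it performs no maintenance transactions and programs no routing tables, and all type and function names (`struct node`, `struct enum_state`, `enum_node`) are hypothetical.

```c
#include <stddef.h>

#define MAX_PORTS 4

struct node {
	int is_switch;
	int locked;		/* models the Host Device ID Lock CSR */
	int destid;		/* assigned to endpoints only, -1 for switches */
	int comptag;		/* unique component tag */
	int hopcount;		/* hop count at which the node was reached */
	struct node *port[MAX_PORTS];	/* links behind a switch */
};

struct enum_state {
	int next_destid;
	int next_comptag;
};

/* Depth-first walk: returns the number of devices newly enumerated. */
static int enum_node(struct node *n, int hopcount, struct enum_state *st)
{
	int found = 1;
	int i;

	if (!n || n->locked)	/* no link, or device already owned */
		return 0;

	n->locked = 1;		/* take ownership before touching it */
	n->hopcount = hopcount;
	n->comptag = st->next_comptag++;
	n->destid = n->is_switch ? -1 : st->next_destid++;

	if (n->is_switch)
		for (i = 0; i < MAX_PORTS; i++)
			found += enum_node(n->port[i], hopcount + 1, st);

	return found;
}
```

The lock check serves two purposes in this model: it mirrors the ownership step, and it stops the recursion when a switch port leads back to a device that was already visited.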
|
||||
|
||||
After the host has completed enumeration of the entire network it releases
|
||||
devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
|
||||
in the system, it sets the Discovered bit in the Port General Control CSR
|
||||
to indicate that enumeration is completed and agents are allowed to execute
|
||||
passive discovery of the network.
|
||||
|
||||
The discovery process is performed by agents and is similar to the enumeration
|
||||
process that is described above. However, the discovery process is performed
|
||||
without changes to the existing routing because agents only gather information
|
||||
about RapidIO network structure and are building an internal map of discovered
|
||||
devices. This way each Linux-based component of the RapidIO subsystem has
|
||||
a complete view of the network. The discovery process can be performed
|
||||
simultaneously by several agents. After initializing its RapidIO master port
|
||||
each agent waits for enumeration completion by the host for the configured wait
|
||||
time period. If this wait time period expires before enumeration is completed,
|
||||
an agent skips RapidIO discovery and continues with remaining kernel
|
||||
initialization.
|
||||
|
||||
4.5 Adding New Enumeration/Discovery Method
|
||||
-------------------------------------------
|
||||
|
||||
RapidIO subsystem code organization allows addition of new enumeration/discovery
|
||||
methods as new configuration options without significant impact to the core
|
||||
RapidIO code.
|
||||
|
||||
A new enumeration/discovery method has to be attached to one or more mport
|
||||
devices before an enumeration/discovery process can be started. Normally,
|
||||
method's module initialization routine calls rio_register_scan() to attach
|
||||
an enumerator to a specified mport device (or devices). The basic enumerator
|
||||
implementation demonstrates this process.
|
||||
|
||||
4.6 Using Loadable RapidIO Switch Drivers
|
||||
-----------------------------------------
|
||||
|
||||
In the case when RapidIO switch drivers are built as loadable modules a user
|
||||
must ensure that they are loaded before the enumeration/discovery starts.
|
||||
This process can be automated by specifying pre- or post- dependencies in the
|
||||
RapidIO-specific modprobe configuration file as shown in the example below.
|
||||
|
||||
File /etc/modprobe.d/rapidio.conf::
|
||||
|
||||
# Configure RapidIO subsystem modules
|
||||
|
||||
# Set enumerator host destination ID (overrides kernel command line option)
|
||||
options rapidio hdid=-1,2
|
||||
|
||||
# Load RapidIO switch drivers immediately after rapidio core module was loaded
|
||||
softdep rapidio post: idt_gen2 idtcps tsi57x
|
||||
|
||||
# OR :
|
||||
|
||||
# Load RapidIO switch drivers just before rio-scan enumerator module is loaded
|
||||
softdep rio-scan pre: idt_gen2 idtcps tsi57x
|
||||
|
||||
--------------------------
|
||||
|
||||
NOTE:
|
||||
In the example above, one of "softdep" commands must be removed or
|
||||
commented out to keep required module loading sequence.
|
||||
|
||||
5. References
|
||||
=============
|
||||
|
||||
[1] RapidIO Trade Association. RapidIO Interconnect Specifications.
|
||||
http://www.rapidio.org.
|
||||
|
||||
[2] RapidIO TA. Technology Comparisons.
|
||||
http://www.rapidio.org/education/technology_comparisons/
|
||||
|
||||
[3] RapidIO support for Linux.
|
||||
http://lwn.net/Articles/139118/
|
||||
|
||||
[4] Matt Porter. RapidIO for Linux. Ottawa Linux Symposium, 2005
|
||||
http://www.kernel.org/doc/ols/2005/ols2005v2-pages-43-56.pdf
|
135
Documentation/driver-api/rapidio/rio_cm.rst
Normal file
@@ -0,0 +1,135 @@
|
||||
==========================================================================
|
||||
RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
|
||||
==========================================================================
|
||||
|
||||
|
||||
1. Overview
|
||||
===========
|
||||
|
||||
This device driver is the result of collaboration within the RapidIO.org
|
||||
Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
|
||||
Nokia Networks, BAE and IDT. Additional input was received from other members
|
||||
of RapidIO.org.
|
||||
|
||||
The objective was to create a character mode driver interface which exposes
|
||||
messaging capabilities of RapidIO endpoint devices (mports) directly
|
||||
to applications, in a manner that allows the numerous and varied RapidIO
|
||||
implementations to interoperate.
|
||||
|
||||
This driver (RIO_CM) provides to user-space applications shared access to
|
||||
RapidIO mailbox messaging resources.
|
||||
|
||||
RapidIO specification (Part 2) defines that endpoint devices may have up to four
|
||||
messaging mailboxes in case of multi-packet message (up to 4KB) and
|
||||
up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
|
||||
to protocol definition limitations, a particular hardware implementation can
|
||||
have reduced number of messaging mailboxes. RapidIO aware applications must
|
||||
therefore share the messaging resources of a RapidIO endpoint.
|
||||
|
||||
The main purpose of this device driver is to provide RapidIO mailbox messaging
|
||||
capability to a large number of user-space processes by introducing socket-like
|
||||
operations using a single messaging mailbox. This allows applications to
|
||||
use the limited RapidIO messaging hardware resources efficiently.
|
||||
|
||||
Most of the device driver's operations are supported through 'ioctl' system calls.
|
||||
|
||||
When loaded this device driver creates a single file system node named rio_cm
|
||||
in /dev directory common for all registered RapidIO mport devices.
|
||||
|
||||
The following ioctl commands are available to user-space applications:
|
||||
|
||||
- RIO_CM_MPORT_GET_LIST:
|
||||
Returns to caller list of local mport devices that
|
||||
support messaging operations (number of entries up to RIO_MAX_MPORTS).
|
||||
Each list entry is combination of mport's index in the system and RapidIO
|
||||
destination ID assigned to the port.
|
||||
- RIO_CM_EP_GET_LIST_SIZE:
|
||||
Returns number of messaging capable remote endpoints
|
||||
in a RapidIO network associated with the specified mport device.
|
||||
- RIO_CM_EP_GET_LIST:
|
||||
Returns list of RapidIO destination IDs for messaging
|
||||
capable remote endpoints (peers) available in a RapidIO network associated
|
||||
with the specified mport device.
|
||||
- RIO_CM_CHAN_CREATE:
|
||||
Creates RapidIO message exchange channel data structure
|
||||
with channel ID assigned automatically or as requested by a caller.
|
||||
- RIO_CM_CHAN_BIND:
|
||||
Binds the specified channel data structure to the specified
|
||||
mport device.
|
||||
- RIO_CM_CHAN_LISTEN:
|
||||
Enables listening for connection requests on the specified
|
||||
channel.
|
||||
- RIO_CM_CHAN_ACCEPT:
|
||||
Accepts a connection request from peer on the specified
|
||||
channel. If wait timeout for this request is specified by a caller it is
|
||||
a blocking call. If the timeout is set to 0, this is a non-blocking call: the ioctl
|
||||
handler checks for a pending connection request and if one is not available
|
||||
exits with -EAGAIN error status immediately.
|
||||
- RIO_CM_CHAN_CONNECT:
|
||||
Sends a connection request to a remote peer/channel.
|
||||
- RIO_CM_CHAN_SEND:
|
||||
Sends a data message through the specified channel.
|
||||
The handler for this request assumes that message buffer specified by
|
||||
a caller includes the reserved space for a packet header required by
|
||||
this driver.
|
||||
- RIO_CM_CHAN_RECEIVE:
|
||||
Receives a data message through a connected channel.
|
||||
If the channel does not have an incoming message ready to return this ioctl
|
||||
handler will wait for new message until timeout specified by a caller
|
||||
expires. If timeout value is set to 0, ioctl handler uses a default value
|
||||
defined by MAX_SCHEDULE_TIMEOUT.
|
||||
- RIO_CM_CHAN_CLOSE:
|
||||
Closes a specified channel and frees associated buffers.
|
||||
If the specified channel is in the CONNECTED state, sends close notification
|
||||
to the remote peer.
|
||||
|
||||
The ioctl command codes and corresponding data structures intended for use by
|
||||
user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
|
||||
|
||||
2. Hardware Compatibility
|
||||
=========================
|
||||
|
||||
This device driver uses standard interfaces defined by kernel RapidIO subsystem
|
||||
and therefore it can be used with any mport device driver registered by RapidIO
|
||||
subsystem with limitations set by available mport HW implementation of messaging
|
||||
mailboxes.
|
||||
|
||||
3. Module parameters
|
||||
====================
|
||||
|
||||
- 'dbg_level'
|
||||
- This parameter controls the amount of debug information
|
||||
generated by this device driver. This parameter is formed by set of
|
||||
bit masks that correspond to the specific functional block.
|
||||
For mask definitions see 'drivers/rapidio/devices/rio_cm.c'
|
||||
This parameter can be changed dynamically.
|
||||
Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
|
||||
|
||||
- 'cmbox'
|
||||
- Number of RapidIO mailbox to use (default value is 1).
|
||||
This parameter sets the messaging mailbox number that will be used
|
||||
within entire RapidIO network. It can be used when default mailbox is
|
||||
used by other device drivers or is not supported by some nodes in the
|
||||
RapidIO network.
|
||||
|
||||
- 'chstart'
|
||||
- Start channel number for dynamic assignment. Default value - 256.
|
||||
Channel numbers below this value are excluded from dynamic
|
||||
allocation to avoid conflicts with software components that use
|
||||
reserved predefined channel numbers.
|
||||
|
||||
4. Known problems
|
||||
=================
|
||||
|
||||
None.
|
||||
|
||||
5. User-space Applications and API Library
|
||||
==========================================
|
||||
|
||||
Messaging API library and applications that use this device driver are available
|
||||
from RapidIO.org.
|
||||
|
||||
6. TODO List
|
||||
============
|
||||
|
||||
- Add support for system notification messages (reserved channel 0).
|
7
Documentation/driver-api/rapidio/sysfs.rst
Normal file
@@ -0,0 +1,7 @@
|
||||
=============
|
||||
Sysfs entries
|
||||
=============
|
||||
|
||||
The RapidIO sysfs files have moved to:
|
||||
Documentation/ABI/testing/sysfs-bus-rapidio and
|
||||
Documentation/ABI/testing/sysfs-class-rapidio
|
112
Documentation/driver-api/rapidio/tsi721.rst
Normal file
@@ -0,0 +1,112 @@
|
||||
=========================================================================
|
||||
RapidIO subsystem mport driver for IDT Tsi721 PCI Express-to-SRIO bridge.
|
||||
=========================================================================
|
||||
|
||||
1. Overview
|
||||
===========
|
||||
|
||||
This driver implements all currently defined RapidIO mport callback functions.
|
||||
It supports maintenance read and write operations, inbound and outbound RapidIO
|
||||
doorbells, inbound maintenance port-writes and RapidIO messaging.
|
||||
|
||||
To generate SRIO maintenance transactions this driver uses one of Tsi721 DMA
|
||||
channels. This mechanism provides access to larger range of hop counts and
|
||||
destination IDs without need for changes in outbound window translation.
|
||||
|
||||
RapidIO messaging support uses dedicated messaging channels for each mailbox.
|
||||
For inbound messages this driver uses destination ID matching to forward messages
|
||||
into the corresponding message queue. Messaging callbacks are implemented to be
|
||||
fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
|
||||
|
||||
1. Module parameters:
|
||||
|
||||
- 'dbg_level'
|
||||
- This parameter controls the amount of debug information
|
||||
generated by this device driver. This parameter is formed by set of
|
||||
bit masks that correspond to the specific
|
||||
functional block.
|
||||
For mask definitions see 'drivers/rapidio/devices/tsi721.h'
|
||||
This parameter can be changed dynamically.
|
||||
Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
|
||||
|
||||
- 'dma_desc_per_channel'
|
||||
- This parameter defines number of hardware buffer
|
||||
descriptors allocated for each registered Tsi721 DMA channel.
|
||||
Its default value is 128.
|
||||
|
||||
- 'dma_txqueue_sz'
|
||||
- DMA transactions queue size. Defines number of pending
|
||||
transaction requests that can be accepted by each DMA channel.
|
||||
Default value is 16.
|
||||
|
||||
- 'dma_sel'
|
||||
- DMA channel selection mask. Bitmask that defines which hardware
|
||||
DMA channels (0 ... 6) will be registered with DmaEngine core.
|
||||
If bit is set to 1, the corresponding DMA channel will be registered.
|
||||
DMA channels not selected by this mask will not be used by this device
|
||||
driver. Default value is 0x7f (use all channels).
|
||||
|
||||
- 'pcie_mrrs'
|
||||
- override value for PCIe Maximum Read Request Size (MRRS).
|
||||
This parameter makes it possible to override the MRRS value set during PCIe
|
||||
configuration process. Tsi721 supports read request sizes up to 4096B.
|
||||
Value for this parameter must be set as defined by PCIe specification:
|
||||
0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
|
||||
Default value is '-1' (= keep platform setting).
|
||||
|
||||
- 'mbox_sel'
|
||||
- RIO messaging MBOX selection mask. This is a bitmask that defines
|
||||
which messaging MBOXes are managed by this device driver. Mask bits 0 - 3
|
||||
correspond to MBOX0 - MBOX3. MBOX is under driver's control if the
|
||||
corresponding bit is set to '1'. Default value is 0x0f (= all).
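
The encodings used by 'pcie_mrrs', 'dma_sel' and 'mbox_sel' can be checked with a few lines of user-space C. This is a sketch of the documented value formats only; the helper names and the `DMA_SEL_BITS` constant are hypothetical and are not taken from the driver source.

```c
#define DMA_SEL_BITS 7	/* dma_sel covers hardware channels 0 ... 6 */

/* pcie_mrrs encoding: 0 = 128B ... 5 = 4096B. Returns -1 for any other
 * value, including the default -1 that keeps the platform setting. */
static int mrrs_bytes(int code)
{
	if (code < 0 || code > 5)
		return -1;
	return 128 << code;
}

/* Count the DMA channels that a dma_sel mask would register. */
static int dma_channels_selected(unsigned int dma_sel)
{
	int ch, n = 0;

	for (ch = 0; ch < DMA_SEL_BITS; ch++)
		if (dma_sel & (1u << ch))
			n++;
	return n;
}

/* An MBOX (0 ... 3) is managed by the driver if its mask bit is set. */
static int mbox_managed(unsigned int mbox_sel, int mbox)
{
	return mbox >= 0 && mbox <= 3 && (mbox_sel & (1u << mbox)) != 0;
}
```

For example, the default dma_sel of 0x7f selects all seven channels, while a mask of 0x05 would select only channels 0 and 2.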
|
||||
|
||||
2. Known problems
|
||||
=================
|
||||
|
||||
None.
|
||||
|
||||
3. DMA Engine Support
|
||||
=====================
|
||||
|
||||
Tsi721 mport driver supports DMA data transfers between local system memory and
|
||||
remote RapidIO devices. This functionality is implemented according to SLAVE
|
||||
mode API defined by common Linux kernel DMA Engine framework.
|
||||
|
||||
Depending on system requirements RapidIO DMA operations can be included/excluded
|
||||
by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven
|
||||
out of eight available BDMA channels to support DMA data transfers.
|
||||
One BDMA channel is reserved for generation of maintenance read/write requests.
|
||||
|
||||
If Tsi721 mport driver has been built with RAPIDIO_DMA_ENGINE support included,
|
||||
this driver will accept DMA-specific module parameter:
|
||||
|
||||
"dma_desc_per_channel"
|
||||
- defines number of hardware buffer descriptors used by
|
||||
each BDMA channel of Tsi721 (by default - 128).
|
||||
|
||||
4. Version History
|
||||
|
||||
===== ====================================================================
|
||||
1.1.0 DMA operations re-worked to support data scatter/gather lists larger
|
||||
than hardware buffer descriptors ring.
|
||||
1.0.0 Initial driver release.
|
||||
===== ====================================================================
|
||||
|
||||
5. License
|
||||
===========
|
||||
|
||||
Copyright(c) 2011 Integrated Device Technology, Inc. All rights reserved.
|
||||
|
||||
This program is free software; you can redistribute it and/or modify it
|
||||
under the terms of the GNU General Public License as published by the Free
|
||||
Software Foundation; either version 2 of the License, or (at your option)
|
||||
any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful, but WITHOUT
|
||||
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License along with
|
||||
this program; if not, write to the Free Software Foundation, Inc.,
|
||||
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
|
132
Documentation/driver-api/rfkill.rst
Normal file
@@ -0,0 +1,132 @@
|
||||
===============================
|
||||
rfkill - RF kill switch support
|
||||
===============================
|
||||
|
||||
|
||||
.. contents::
|
||||
:depth: 2
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The rfkill subsystem provides a generic interface for disabling any radio
|
||||
transmitter in the system. When a transmitter is blocked, it shall not
|
||||
radiate any power.
|
||||
|
||||
The subsystem also provides the ability to react to button presses and
|
||||
disable all transmitters of a certain type (or all). This is intended for
|
||||
situations where transmitters need to be turned off, for example on
|
||||
aircraft.
|
||||
|
||||
The rfkill subsystem has a concept of "hard" and "soft" block, which
|
||||
differ little in their meaning (block == transmitters off) but rather in
|
||||
whether they can be changed or not:
|
||||
|
||||
- hard block
|
||||
read-only radio block that cannot be overridden by software
|
||||
|
||||
- soft block
|
||||
writable radio block (need not be readable) that is set by
|
||||
the system software.
|
||||
|
||||
The rfkill subsystem has two parameters, rfkill.default_state and
|
||||
rfkill.master_switch_mode, which are documented in
|
||||
admin-guide/kernel-parameters.rst.
|
||||
|
||||
|
||||
Implementation details
|
||||
======================
|
||||
|
||||
The rfkill subsystem is composed of three main components:
|
||||
|
||||
* the rfkill core,
|
||||
* the deprecated rfkill-input module (an input layer handler, being
|
||||
replaced by userspace policy code) and
|
||||
* the rfkill drivers.
|
||||
|
||||
The rfkill core provides API for kernel drivers to register their radio
|
||||
transmitter with the kernel, methods for turning it on and off, and letting
|
||||
the system know about hardware-disabled states that may be implemented on
|
||||
the device.
|
||||
|
||||
The rfkill core code also notifies userspace of state changes, and provides
|
||||
ways for userspace to query the current states. See the "Userspace support"
|
||||
section below.
|
||||
|
||||
When the device is hard-blocked (either by a call to rfkill_set_hw_state()
|
||||
or from query_hw_block), set_block() will be invoked for additional software
|
||||
block, but drivers can ignore the method call since they can use the return
|
||||
value of the function rfkill_set_hw_state() to sync the software state
|
||||
instead of keeping track of calls to set_block(). In fact, drivers should
|
||||
use the return value of rfkill_set_hw_state() unless the hardware actually
|
||||
keeps track of soft and hard block separately.
|
||||
|
||||
|
||||
Kernel API
|
||||
==========
|
||||
|
||||
Drivers for radio transmitters normally implement an rfkill driver.
|
||||
|
||||
Platform drivers might implement input devices if the rfkill button is just
|
||||
that, a button. If that button influences the hardware then you need to
|
||||
implement an rfkill driver instead. This also applies if the platform provides
|
||||
a way to turn on/off the transmitter(s).
|
||||
|
||||
For some platforms, it is possible that the hardware state changes during
|
||||
suspend/hibernation, in which case it will be necessary to update the rfkill
|
||||
core with the current state at resume time.
|
||||
|
||||
To create an rfkill driver, driver's Kconfig needs to have::
|
||||
|
||||
depends on RFKILL || !RFKILL
|
||||
|
||||
to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
|
||||
case allows the driver to be built when rfkill is not configured, in which
|
||||
case all rfkill API can still be used but will be provided by static inlines
|
||||
which compile to almost nothing.
|
||||
|
||||
Calling rfkill_set_hw_state() when a state change happens is required from
|
||||
rfkill drivers that control devices that can be hard-blocked unless they also
|
||||
assign the poll_hw_block() callback (then the rfkill core will poll the
|
||||
device). Don't do this unless you cannot get the event in any other way.
|
||||
|
||||
rfkill provides per-switch LED triggers, which can be used to drive LEDs
|
||||
according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
|
||||
|
||||
|
||||
Userspace support
|
||||
=================
|
||||
|
||||
The recommended userspace interface to use is /dev/rfkill, which is a misc
|
||||
character device that allows userspace to obtain and set the state of rfkill
|
||||
devices and sets of devices. It also notifies userspace about device addition
|
||||
and removal. The API is a simple read/write API that is defined in
|
||||
linux/rfkill.h, with one ioctl that allows turning off the deprecated input
|
||||
handler in the kernel for the transition period.
|
||||
|
||||
Except for the one ioctl, communication with the kernel is done via read()
|
||||
and write() of instances of 'struct rfkill_event'. In this structure, the
|
||||
soft and hard block are properly separated (unlike sysfs, see below) and
|
||||
userspace is able to get a consistent snapshot of all rfkill devices in the
|
||||
system. Also, it is possible to switch all rfkill drivers (or all drivers of
|
||||
a specified type) into a state which also updates the default state for
|
||||
hotplugged devices.
|
||||
|
||||
After an application opens /dev/rfkill, it can read the current state of all
|
||||
devices. Changes can be obtained by either polling the descriptor for
|
||||
hotplug or state change events or by listening for uevents emitted by the
|
||||
rfkill core framework.
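
Reading one event from the device can be sketched as follows. The example assumes the 'struct rfkill_event' layout from the uapi header linux/rfkill.h (fields idx, type, op, soft, hard); the helper names `rfkill_read_event` and `rfkill_is_blocked` are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>
#include <linux/rfkill.h>

/*
 * Read one event from an already opened descriptor.
 * Returns 1 if a full event was read, 0 on end of data,
 * -1 on error or short read.
 */
static int rfkill_read_event(int fd, struct rfkill_event *ev)
{
	ssize_t n = read(fd, ev, sizeof(*ev));

	if (n == 0)
		return 0;
	return n == (ssize_t)sizeof(*ev) ? 1 : -1;
}

/* A transmitter may radiate only when neither block is active. */
static int rfkill_is_blocked(const struct rfkill_event *ev)
{
	return ev->soft || ev->hard;
}
```

An application would typically open /dev/rfkill with open(2), call the read helper in a loop to collect the initial RFKILL_OP_ADD events for all existing devices, and then poll the descriptor for later changes.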
|
||||
|
||||
Additionally, each rfkill device is registered in sysfs and emits uevents.
|
||||
|
||||
rfkill devices issue uevents (with an action of "change"), with the following
|
||||
environment variables set::
|
||||
|
||||
RFKILL_NAME
|
||||
RFKILL_STATE
|
||||
RFKILL_TYPE
|
||||
|
||||
The content of these variables corresponds to the "name", "state" and
|
||||
"type" sysfs files explained above.
|
||||
|
||||
For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
|
11
Documentation/driver-api/serial/cyclades_z.rst
Normal file
@@ -0,0 +1,11 @@
|
||||
================
|
||||
Cyclades-Z notes
|
||||
================
|
||||
|
||||
The Cyclades-Z must have firmware loaded onto the card before it will
|
||||
operate. This operation should be performed during system startup.
|
||||
|
||||
The firmware, loader program and the latest device driver code are
|
||||
available from Cyclades at
|
||||
|
||||
ftp://ftp.cyclades.com/pub/cyclades/cyclades-z/linux/
|
549
Documentation/driver-api/serial/driver.rst
Normal file
@@ -0,0 +1,549 @@
|
||||
====================
|
||||
Low Level Serial API
|
||||
====================
|
||||
|
||||
|
||||
This document is meant as a brief overview of some aspects of the new serial
|
||||
driver. It is not complete; any questions you have should be directed to
|
||||
<rmk@arm.linux.org.uk>
|
||||
|
||||
The reference implementation is contained within amba-pl011.c.
|
||||
|
||||
|
||||
|
||||
Low Level Serial Hardware Driver
|
||||
--------------------------------
|
||||
|
||||
The low level serial hardware driver is responsible for supplying port
|
||||
information (defined by uart_port) and a set of control methods (defined
|
||||
by uart_ops) to the core serial driver. The low level driver is also
|
||||
responsible for handling interrupts for the port, and providing any
|
||||
console support.
|
||||
|
||||
|
||||
Console Support
|
||||
---------------
|
||||
|
||||
The serial core provides a few helper functions. This includes identifying
|
||||
the correct port structure (via uart_get_console) and decoding command line
|
||||
arguments (uart_parse_options).
|
||||
|
||||
There is also a helper function (uart_console_write) which performs a
|
||||
character by character write, translating newlines to CRLF sequences.
|
||||
Driver writers are recommended to use this function rather than implementing
|
||||
their own version.
|
||||
|
||||
|
||||
Locking
|
||||
-------
|
||||
|
||||
It is the responsibility of the low level hardware driver to perform the
|
||||
necessary locking using port->lock. There are some exceptions (which
|
||||
are described in the uart_ops listing below.)
|
||||
|
||||
There are two locks. A per-port spinlock, and an overall semaphore.
|
||||
|
||||
From the core driver perspective, the port->lock locks the following
|
||||
data::
|
||||
|
||||
port->mctrl
|
||||
port->icount
|
||||
port->state->xmit.head (circ_buf->head)
|
||||
port->state->xmit.tail (circ_buf->tail)
|
||||
|
||||
The low level driver is free to use this lock to provide any additional
|
||||
locking.
|
||||
|
||||
The port_sem semaphore is used to protect against ports being added/
|
||||
removed or reconfigured at inappropriate times. Since v2.6.27, this
|
||||
semaphore has been the 'mutex' member of the tty_port struct, and
|
||||
commonly referred to as the port mutex.
|
||||
|
||||
|
||||
uart_ops
|
||||
--------
|
||||
|
||||
The uart_ops structure is the main interface between serial_core and the
|
||||
hardware specific driver. It contains all the methods to control the
|
||||
hardware.
|
||||
|
||||
tx_empty(port)
|
||||
This function tests whether the transmitter fifo and shifter
|
||||
for the port described by 'port' are empty. If they are empty,
|
||||
this function should return TIOCSER_TEMT, otherwise return 0.
|
||||
If the port does not support this operation, then it should
|
||||
return TIOCSER_TEMT.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
set_mctrl(port, mctrl)
|
||||
This function sets the modem control lines for port described
|
||||
by 'port' to the state described by mctrl. The relevant bits
|
||||
of mctrl are:
|
||||
|
||||
- TIOCM_RTS RTS signal.
|
||||
- TIOCM_DTR DTR signal.
|
||||
- TIOCM_OUT1 OUT1 signal.
|
||||
- TIOCM_OUT2 OUT2 signal.
|
||||
- TIOCM_LOOP Set the port into loopback mode.
|
||||
|
||||
If the appropriate bit is set, the signal should be driven
|
||||
active. If the bit is clear, the signal should be driven
|
||||
inactive.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
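
A set_mctrl implementation usually boils down to translating the mctrl bits into a hardware register value. The sketch below shows that translation in isolation. The `HYP_MCR_*` register layout is hypothetical (loosely modeled on 16550-style parts) and `mctrl_to_mcr` is an illustrative name; the TIOCM_* values are reproduced as Linux commonly defines them and should be taken from the kernel headers in real code.

```c
/* Modem control bits as commonly defined by Linux. */
#define TIOCM_DTR  0x002
#define TIOCM_RTS  0x004
#define TIOCM_OUT1 0x2000
#define TIOCM_OUT2 0x4000
#define TIOCM_LOOP 0x8000

/* Hypothetical modem control register layout. */
#define HYP_MCR_DTR  0x01
#define HYP_MCR_RTS  0x02
#define HYP_MCR_OUT1 0x04
#define HYP_MCR_OUT2 0x08
#define HYP_MCR_LOOP 0x10

/* A set bit drives the signal active, a clear bit drives it inactive. */
unsigned char mctrl_to_mcr(unsigned int mctrl)
{
    unsigned char mcr = 0;

    if (mctrl & TIOCM_RTS)
        mcr |= HYP_MCR_RTS;
    if (mctrl & TIOCM_DTR)
        mcr |= HYP_MCR_DTR;
    if (mctrl & TIOCM_OUT1)
        mcr |= HYP_MCR_OUT1;
    if (mctrl & TIOCM_OUT2)
        mcr |= HYP_MCR_OUT2;
    if (mctrl & TIOCM_LOOP)
        mcr |= HYP_MCR_LOOP;

    return mcr;
}
```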
|
||||
|
||||
get_mctrl(port)
|
||||
Returns the current state of modem control inputs. The state
|
||||
of the outputs should not be returned, since the core keeps
|
||||
track of their state. The state information should include:
|
||||
|
||||
- TIOCM_CAR state of DCD signal
|
||||
- TIOCM_CTS state of CTS signal
|
||||
- TIOCM_DSR state of DSR signal
|
||||
- TIOCM_RI state of RI signal
|
||||
|
||||
The bit is set if the signal is currently driven active. If
|
||||
the port does not support CTS, DCD or DSR, the driver should
|
||||
indicate that the signal is permanently active. If RI is
|
||||
not available, the signal should not be indicated as active.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
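
The rule for unsupported inputs in get_mctrl (CTS, DCD and DSR reported as permanently active, RI never reported) can be sketched as follows. The `HYP_MSR_*` bits, the `supported` mask and the function name are assumptions for this example, and the TIOCM_* values are given as Linux commonly defines them.

```c
/* Modem status bits as commonly defined by Linux. */
#define TIOCM_CTS 0x020
#define TIOCM_CAR 0x040
#define TIOCM_RI  0x080
#define TIOCM_DSR 0x100

/* Hypothetical modem status register layout. */
#define HYP_MSR_CTS 0x10
#define HYP_MSR_DSR 0x20
#define HYP_MSR_RI  0x40
#define HYP_MSR_DCD 0x80

/*
 * Translate a status register value into TIOCM_* input bits.
 * 'supported' is a mask of the TIOCM_* inputs the hardware really has.
 */
unsigned int msr_to_mctrl(unsigned char msr, unsigned int supported)
{
    unsigned int ret = 0;

    /* Missing CTS, DCD or DSR lines are reported as permanently active. */
    if (!(supported & TIOCM_CAR) || (msr & HYP_MSR_DCD))
        ret |= TIOCM_CAR;
    if (!(supported & TIOCM_CTS) || (msr & HYP_MSR_CTS))
        ret |= TIOCM_CTS;
    if (!(supported & TIOCM_DSR) || (msr & HYP_MSR_DSR))
        ret |= TIOCM_DSR;

    /* A missing RI line is never reported as active. */
    if ((supported & TIOCM_RI) && (msr & HYP_MSR_RI))
        ret |= TIOCM_RI;

    return ret;
}
```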
|
||||
|
||||
stop_tx(port)
|
||||
Stop transmitting characters. This might be due to the CTS
|
||||
line becoming inactive or the tty layer indicating we want
|
||||
to stop transmission due to an XOFF character.
|
||||
|
||||
The driver should stop transmitting characters as soon as
|
||||
possible.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
start_tx(port)
|
||||
Start transmitting characters.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
throttle(port)
|
||||
Notify the serial driver that input buffers for the line discipline are
|
||||
close to full, and it should somehow signal that no more characters
|
||||
should be sent to the serial port.
|
||||
This will be called only if hardware assisted flow control is enabled.
|
||||
|
||||
Locking: serialized with .unthrottle() and termios modification by the
|
||||
tty layer.
|
||||
|
||||
unthrottle(port)
|
||||
Notify the serial driver that characters can now be sent to the serial
|
||||
port without fear of overrunning the input buffers of the line
|
||||
disciplines.
|
||||
|
||||
This will be called only if hardware assisted flow control is enabled.
|
||||
|
||||
Locking: serialized with .throttle() and termios modification by the
|
||||
tty layer.
|
||||
|
||||
send_xchar(port,ch)
|
||||
Transmit a high priority character, even if the port is stopped.
|
||||
This is used to implement XON/XOFF flow control and tcflow(). If
|
||||
the serial driver does not implement this function, the tty core
|
||||
will append the character to the circular buffer and then call
|
||||
start_tx() / stop_tx() to flush the data out.
|
||||
|
||||
Do not transmit if ch == '\0' (__DISABLED_CHAR).
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
stop_rx(port)
|
||||
Stop receiving characters; the port is in the process of
|
||||
being closed.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
enable_ms(port)
|
||||
Enable the modem status interrupts.
|
||||
|
||||
This method may be called multiple times. Modem status
|
||||
interrupts should be disabled when the shutdown method is
|
||||
called.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
break_ctl(port,ctl)
|
||||
Control the transmission of a break signal. If ctl is
|
||||
nonzero, the break signal should be transmitted. The signal
|
||||
should be terminated when another call is made with a zero
|
||||
ctl.
|
||||
|
||||
Locking: caller holds tty_port->mutex
|
||||
|
||||
startup(port)
|
||||
Grab any interrupt resources and initialise any low level driver
|
||||
state. Enable the port for reception. It should not activate
|
||||
RTS nor DTR; this will be done via a separate call to set_mctrl.
|
||||
|
||||
This method will only be called when the port is initially opened.
|
||||
|
||||
Locking: port_sem taken.
|
||||
|
||||
Interrupts: globally disabled.
|
||||
|
||||
shutdown(port)
|
||||
Disable the port, disable any break condition that may be in
|
||||
effect, and free any interrupt resources. It should not disable
|
||||
RTS nor DTR; this will have already been done via a separate
|
||||
call to set_mctrl.
|
||||
|
||||
Drivers must not access port->state once this call has completed.
|
||||
|
||||
This method will only be called when there are no more users of
|
||||
this port.
|
||||
|
||||
Locking: port_sem taken.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
flush_buffer(port)
|
||||
Flush any write buffers, reset any DMA state and stop any
|
||||
ongoing DMA transfers.
|
||||
|
||||
This will be called whenever the port->state->xmit circular
|
||||
buffer is cleared.
|
||||
|
||||
Locking: port->lock taken.
|
||||
|
||||
Interrupts: locally disabled.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
set_termios(port,termios,oldtermios)
|
||||
Change the port parameters, including word length, parity, stop
|
||||
bits. Update read_status_mask and ignore_status_mask to indicate
|
||||
the types of events we are interested in receiving. Relevant
|
||||
termios->c_cflag bits are:
|
||||
|
||||
CSIZE
|
||||
- word size
|
||||
CSTOPB
|
||||
- 2 stop bits
|
||||
PARENB
|
||||
- parity enable
|
||||
PARODD
|
||||
- odd parity (when PARENB is in force)
|
||||
CREAD
|
||||
- enable reception of characters (if not set,
|
||||
still receive characters from the port, but
|
||||
throw them away).
|
||||
CRTSCTS
|
||||
- if set, enable CTS status change reporting
|
||||
CLOCAL
|
||||
- if not set, enable modem status change
|
||||
reporting.
|
||||
|
||||
Relevant termios->c_iflag bits are:
|
||||
|
||||
INPCK
|
||||
- enable frame and parity error events to be
|
||||
passed to the TTY layer.
|
||||
BRKINT / PARMRK
|
||||
- both of these enable break events to be
|
||||
passed to the TTY layer.
|
||||
|
||||
IGNPAR
|
||||
- ignore parity and framing errors
|
||||
IGNBRK
|
||||
- ignore break errors. If IGNPAR is also
|
||||
set, ignore overrun errors as well.
|
||||
|
||||
The interaction of the iflag bits is as follows (parity error
|
||||
given as an example):
|
||||
|
||||
=============== ======= ====== ========================================
Parity error    INPCK   IGNPAR Result
=============== ======= ====== ========================================
n/a             0       n/a    character received, marked as TTY_NORMAL
None            1       n/a    character received, marked as TTY_NORMAL
Yes             1       0      character received, marked as TTY_PARITY
Yes             1       1      character discarded
=============== ======= ====== ========================================
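
The table can also be read as a small decision function. The following sketch encodes exactly the four rows for the parity-error case; the enum and function names are hypothetical, and a real driver would normally express the same policy through read_status_mask and ignore_status_mask rather than through a helper like this.

```c
#include <stdbool.h>

/* Illustrative outcomes corresponding to the table rows. */
enum rx_result {
    RX_NORMAL,   /* character received, marked as TTY_NORMAL */
    RX_PARITY,   /* character received, marked as TTY_PARITY */
    RX_DISCARD   /* character discarded */
};

/* Decide what happens to a received character for the parity-error case. */
enum rx_result classify_parity(bool parity_error, bool inpck, bool ignpar)
{
    /* Without INPCK, or without an error, the character is normal. */
    if (!inpck || !parity_error)
        return RX_NORMAL;

    /* INPCK set and a parity error occurred: IGNPAR decides the outcome. */
    return ignpar ? RX_DISCARD : RX_PARITY;
}
```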
|
||||
|
||||
Other flags may be used (eg, xon/xoff characters) if your
|
||||
hardware supports hardware "soft" flow control.
|
||||
|
||||
Locking: caller holds tty_port->mutex
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
set_ldisc(port,termios)
|
||||
Notifier for discipline change. See Documentation/driver-api/serial/tty.rst.
|
||||
|
||||
Locking: caller holds tty_port->mutex
|
||||
|
||||
pm(port,state,oldstate)
|
||||
Perform any power management related activities on the specified
|
||||
port. State indicates the new state (defined by
|
||||
enum uart_pm_state), oldstate indicates the previous state.
|
||||
|
||||
This function should not be used to grab any resources.
|
||||
|
||||
This will be called when the port is initially opened and finally
|
||||
closed, except when the port is also the system console. This
|
||||
will occur even if CONFIG_PM is not set.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
type(port)
|
||||
Return a pointer to a string constant describing the specified
|
||||
port, or return NULL, in which case the string 'unknown' is
|
||||
substituted.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
release_port(port)
|
||||
Release any memory and IO region resources currently in use by
|
||||
the port.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
request_port(port)
|
||||
Request any memory and IO region resources required by the port.
|
||||
If any fail, no resources should be registered when this function
|
||||
returns, and it should return -EBUSY on failure.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
config_port(port,type)
|
||||
Perform any autoconfiguration steps required for the port. `type`
|
||||
contains a bit mask of the required configuration. UART_CONFIG_TYPE
|
||||
indicates that the port requires detection and identification.
|
||||
port->type should be set to the type found, or PORT_UNKNOWN if
|
||||
no port was detected.
|
||||
|
||||
UART_CONFIG_IRQ indicates autoconfiguration of the interrupt signal,
|
||||
which should be probed using standard kernel autoprobing techniques.
|
||||
This is not necessary on platforms where ports have interrupts
|
||||
internally hard wired (eg, system on a chip implementations).
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
verify_port(port,serinfo)
|
||||
Verify the new serial port information contained within serinfo is
|
||||
suitable for this port type.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
ioctl(port,cmd,arg)
|
||||
Perform any port specific IOCTLs. IOCTL commands must be defined
|
||||
using the standard numbering system found in <asm/ioctl.h>
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
poll_init(port)
|
||||
Called by kgdb to perform the minimal hardware initialization needed
|
||||
to support poll_put_char() and poll_get_char(). Unlike ->startup()
|
||||
this should not request interrupts.
|
||||
|
||||
Locking: tty_mutex and tty_port->mutex taken.
|
||||
|
||||
Interrupts: n/a.
|
||||
|
||||
poll_put_char(port,ch)
|
||||
Called by kgdb to write a single character directly to the serial
|
||||
port. It can and should block until there is space in the TX FIFO.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
This call must not sleep
|
||||
|
||||
poll_get_char(port)
|
||||
Called by kgdb to read a single character directly from the serial
|
||||
port. If data is available, it should be returned; otherwise
|
||||
the function should return NO_POLL_CHAR immediately.
|
||||
|
||||
Locking: none.
|
||||
|
||||
Interrupts: caller dependent.
|
||||
|
||||
This call must not sleep
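
The non-blocking contract of poll_get_char can be sketched against a simulated receiver. `struct hyp_uart` and `sketch_poll_get_char` are hypothetical and stand in for real register accesses; the numeric value of NO_POLL_CHAR is reproduced here as an assumption and should be taken from the kernel's serial core header in real code.

```c
/* Assumed value; real code must use the definition from the kernel headers. */
#define NO_POLL_CHAR 0x00ff0000

/* Hypothetical stand-in for the receive side of a UART. */
struct hyp_uart {
    int rx_ready;            /* non-zero when a character is waiting */
    unsigned char rx_data;   /* the waiting character */
};

/* Return the pending character, or NO_POLL_CHAR at once if there is none. */
int sketch_poll_get_char(struct hyp_uart *uart)
{
    if (!uart->rx_ready)
        return NO_POLL_CHAR;

    uart->rx_ready = 0;
    return uart->rx_data;
}
```

The sentinel lies outside the range of an 8-bit character, so the caller can always tell "no data" apart from any received byte.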
|
||||
|
||||
Other functions
|
||||
---------------
|
||||
|
||||
uart_update_timeout(port,cflag,baud)
|
||||
Update the FIFO drain timeout, port->timeout, according to the
|
||||
number of bits, parity, stop bits and baud rate.
|
||||
|
||||
Locking: caller is expected to take port->lock
|
||||
|
||||
Interrupts: n/a
|
||||
|
||||
uart_get_baud_rate(port,termios,old,min,max)
|
||||
Return the numeric baud rate for the specified termios, taking
|
||||
account of the special 38400 baud "kludge". The B0 baud rate
|
||||
is mapped to 9600 baud.
|
||||
|
||||
If the baud rate is not within min..max, then if old is non-NULL,
|
||||
the original baud rate will be tried. If that exceeds the
|
||||
min..max constraint, 9600 baud will be returned. termios will
|
||||
be updated to the baud rate in use.
|
||||
|
||||
Note: min..max must always allow 9600 baud to be selected.
|
||||
|
||||
Locking: caller dependent.
|
||||
|
||||
Interrupts: n/a
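
The fallback order described for uart_get_baud_rate can be modeled in a few lines. This is a simplified sketch, not the kernel function: it works on plain numbers instead of termios structures, ignores the 38400 baud "kludge", and `pick_baud` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Simplified model of the fallback order:
 * requested rate, then the old rate (if any), then 9600 baud.
 * A requested rate of 0 stands for B0 and is mapped to 9600.
 */
unsigned int pick_baud(unsigned int requested, const unsigned int *old,
                       unsigned int min, unsigned int max)
{
    if (requested == 0)
        requested = 9600;

    if (requested >= min && requested <= max)
        return requested;

    if (old != NULL && *old >= min && *old <= max)
        return *old;

    return 9600;
}
```

Because 9600 is the last resort, the min..max range passed in must always contain it, which is the constraint stated in the note above.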
|
||||
|
||||
uart_get_divisor(port,baud)
|
||||
Return the divisor (baud_base / baud) for the specified baud
|
||||
rate, appropriately rounded.
|
||||
|
||||
If 38400 baud and custom divisor is selected, return the
|
||||
custom divisor instead.
|
||||
|
||||
Locking: caller dependent.
|
||||
|
||||
Interrupts: n/a
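
"Appropriately rounded" can be illustrated with round-to-nearest integer division. The sketch below assumes that rounding mode and uses the hypothetical name `sketch_divisor`; it leaves out the custom divisor case, and the kernel's own computation may differ in detail.

```c
/* Round-to-nearest division of baud_base by baud (baud must be non-zero). */
unsigned int sketch_divisor(unsigned int baud_base, unsigned int baud)
{
    return (baud_base + baud / 2) / baud;
}
```

With a baud_base of 115200, a request for 9600 baud gives a divisor of 12, while 76800 baud (an exact quotient of 1.5) rounds up to 2.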
|
||||
|
||||
uart_match_port(port1,port2)
|
||||
This utility function can be used to determine whether two
|
||||
uart_port structures describe the same port.
|
||||
|
||||
Locking: n/a
|
||||
|
||||
Interrupts: n/a
|
||||
|
||||
uart_write_wakeup(port)
|
||||
A driver is expected to call this function when the number of
|
||||
characters in the transmit buffer has dropped below a threshold.
|
||||
|
||||
Locking: port->lock should be held.
|
||||
|
||||
Interrupts: n/a
|
||||
|
||||
uart_register_driver(drv)
|
||||
Register a uart driver with the core driver. We in turn register
|
||||
with the tty layer, and initialise the core driver per-port state.
|
||||
|
||||
drv->port should be NULL, and the per-port structures should be
|
||||
registered using uart_add_one_port after this call has succeeded.
|
||||
|
||||
Locking: none
|
||||
|
||||
Interrupts: enabled
|
||||
|
||||
uart_unregister_driver()
|
||||
Remove all references to a driver from the core driver. The low
|
||||
level driver must have removed all its ports via the
|
||||
uart_remove_one_port() if it registered them with uart_add_one_port().
|
||||
|
||||
Locking: none
|
||||
|
||||
Interrupts: enabled
|
||||
|
||||
**uart_suspend_port()**
|
||||
|
||||
**uart_resume_port()**
|
||||
|
||||
**uart_add_one_port()**
|
||||
|
||||
**uart_remove_one_port()**
|
||||
|
||||
Other notes
|
||||
-----------
|
||||
|
||||
It is intended some day to drop the 'unused' entries from uart_port, and
|
||||
allow low level drivers to register their own individual uart_port's with
|
||||
the core. This will allow drivers to use uart_port as a pointer to a
|
||||
structure containing both the uart_port entry with their own extensions,
|
||||
thus::
|
||||
|
||||
struct my_port {
|
||||
struct uart_port port;
|
||||
int my_stuff;
|
||||
};
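
With such an embedding, a driver can get from the uart_port pointer it is handed back to its own structure by subtracting the member offset. The sketch below shows the idea in user space: `struct uart_port` is reduced to a one-field stand-in, and `to_my_port` and `my_stuff_of` are hypothetical helpers (kernel code would normally use container_of for this).

```c
#include <stddef.h>

/* Minimal stand-in for the real structure, for illustration only. */
struct uart_port {
    unsigned int line;
};

struct my_port {
    struct uart_port port;
    int my_stuff;
};

/* Recover the enclosing my_port from a pointer to its embedded uart_port. */
#define to_my_port(p) \
    ((struct my_port *)((char *)(p) - offsetof(struct my_port, port)))

/* Example accessor a driver callback might use. */
int my_stuff_of(struct uart_port *port)
{
    return to_my_port(port)->my_stuff;
}
```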
|
||||
|
||||
Modem control lines via GPIO
|
||||
----------------------------
|
||||
|
||||
Some helpers are provided in order to set/get modem control lines via GPIO.
|
||||
|
||||
mctrl_gpio_init(port, idx):
|
||||
This will get the {cts,rts,...}-gpios from device tree if they are
|
||||
present and request them, set direction etc, and return an
|
||||
allocated structure. `devm_*` functions are used, so there's no need
|
||||
to call mctrl_gpio_free().
|
||||
As this sets up the irq handling make sure to not handle changes to the
|
||||
gpio input lines in your driver, too.
|
||||
|
||||
mctrl_gpio_free(dev, gpios):
|
||||
This will free the requested gpios in mctrl_gpio_init().
|
||||
As `devm_*` functions are used, there's generally no need to call
|
||||
this function.
|
||||
|
||||
mctrl_gpio_to_gpiod(gpios, gidx)
|
||||
This returns the gpio_desc structure associated to the modem line
|
||||
index.
|
||||
|
||||
mctrl_gpio_set(gpios, mctrl):
|
||||
This will set the gpios according to the mctrl state.
|
||||
|
||||
mctrl_gpio_get(gpios, mctrl):
|
||||
This will update mctrl with the gpios values.
|
||||
|
||||
mctrl_gpio_enable_ms(gpios):
|
||||
Enables irqs and handling of changes to the ms lines.
|
||||
|
||||
mctrl_gpio_disable_ms(gpios):
|
||||
Disables irqs and handling of changes to the ms lines.
|
32
Documentation/driver-api/serial/index.rst
Normal file
@@ -0,0 +1,32 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
Support for Serial devices
|
||||
==========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
|
||||
driver
|
||||
tty
|
||||
|
||||
Serial drivers
|
||||
==============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
cyclades_z
|
||||
moxa-smartio
|
||||
n_gsm
|
||||
rocket
|
||||
serial-iso7816
|
||||
serial-rs485
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
615
Documentation/driver-api/serial/moxa-smartio.rst
Normal file
@@ -0,0 +1,615 @@
|
||||
=============================================================
|
||||
MOXA Smartio/Industio Family Device Driver Installation Guide
|
||||
=============================================================
|
||||
|
||||
.. note::
|
||||
|
||||
   This file is outdated and needs to be updated for kernel 5.0 and later.
|
||||
|
||||
Copyright (C) 2008, Moxa Inc.
|
||||
|
||||
Date: 01/21/2008
|
||||
|
||||
.. Content
|
||||
|
||||
1. Introduction
|
||||
2. System Requirement
|
||||
3. Installation
|
||||
3.1 Hardware installation
|
||||
3.2 Driver files
|
||||
3.3 Device naming convention
|
||||
3.4 Module driver configuration
|
||||
3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x.
|
||||
3.6 Custom configuration
|
||||
3.7 Verify driver installation
|
||||
4. Utilities
|
||||
5. Setserial
|
||||
6. Troubleshooting
|
||||
|
||||
1. Introduction
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
The Smartio/Industio/UPCI family Linux driver supports the following multiport
|
||||
boards.
|
||||
|
||||
- 2 ports multiport board
|
||||
CP-102U, CP-102UL, CP-102UF
|
||||
CP-132U-I, CP-132UL,
|
||||
CP-132, CP-132I, CP132S, CP-132IS,
|
||||
CI-132, CI-132I, CI-132IS,
|
||||
(C102H, C102HI, C102HIS, C102P, CP-102, CP-102S)
|
||||
|
||||
- 4 ports multiport board
|
||||
CP-104EL,
|
||||
CP-104UL, CP-104JU,
|
||||
CP-134U, CP-134U-I,
|
||||
C104H/PCI, C104HS/PCI,
|
||||
CP-114, CP-114I, CP-114S, CP-114IS, CP-114UL,
|
||||
C104H, C104HS,
|
||||
CI-104J, CI-104JS,
|
||||
CI-134, CI-134I, CI-134IS,
|
||||
(C114HI, CT-114I, C104P),
|
||||
POS-104UL,
|
||||
CB-114,
|
||||
CB-134I
|
||||
|
||||
- 8 ports multiport board
|
||||
CP-118EL, CP-168EL,
|
||||
CP-118U, CP-168U,
|
||||
C168H/PCI,
|
||||
C168H, C168HS,
|
||||
(C168P),
|
||||
CB-108
|
||||
|
||||
This driver and installation procedure were developed on Linux kernel
2.4.x and 2.6.x. The driver supports the Intel x86 hardware platform. To
maintain compatibility, this version has also been tested with RedHat,
Mandrake, Fedora and S.u.S.E Linux. If a compatibility problem occurs,
please contact Moxa at support@moxa.com.tw.
|
||||
|
||||
In addition to the device driver, this version also provides the following
utilities:
|
||||
|
||||
- msdiag
|
||||
Diagnostic program for displaying installed Moxa
|
||||
Smartio/Industio boards.
|
||||
- msmon
|
||||
Monitor program to observe data count and line status signals.
|
||||
- msterm
  A simple terminal program which is useful in testing serial
  ports.
|
||||
- io-irq.exe
|
||||
Configuration program to setup ISA boards. Please note that
|
||||
this program can only be executed under DOS.
|
||||
|
||||
All the drivers and utilities in this version are published as source code
under the GNU General Public License. Please refer to the GNU General
Public License announcement in each source code file for more detail.

The latest driver is always available from Moxa's web site at http://www.moxa.com/.
|
||||
|
||||
This version of the driver can be installed as a loadable module (module
driver) or built into the kernel (static driver). Refer to the installation
procedure below that suits your setup. Before you install the driver,
please refer to the hardware installation procedure in the User's Manual.

We assume the user is familiar with the following documents.
|
||||
|
||||
- Serial-HOWTO
|
||||
- Kernel-HOWTO
|
||||
|
||||
2. System Requirement
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
- Hardware platform: Intel x86 machine
|
||||
- Kernel version: 2.4.x or 2.6.x
|
||||
- gcc version 2.72 or later
|
||||
- Maximum 4 boards can be installed in combination
|
||||
|
||||
3. Installation
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
3.1 Hardware installation
|
||||
=========================
|
||||
|
||||
There are two types of buses, ISA and PCI, for Smartio/Industio
|
||||
family multiport board.
|
||||
|
||||
ISA board
|
||||
---------
|
||||
|
||||
You'll have to configure the CAP address, I/O address, Interrupt Vector
as well as IRQ before installing this driver. Please refer to the hardware
installation procedure in the User's Manual before proceeding any further.
Please make sure that JP1 is open after the ISA board is set properly.
|
||||
|
||||
PCI/UPCI board
|
||||
--------------
|
||||
|
||||
You may need to adjust IRQ usage in the BIOS to avoid IRQ conflicts
with other ISA devices. Please refer to the hardware installation
procedure in the User's Manual in advance.
|
||||
|
||||
PCI IRQ Sharing
|
||||
---------------
|
||||
|
||||
Each port within the same multiport board shares the same IRQ. Up to
|
||||
4 Moxa Smartio/Industio PCI Family multiport boards can be installed
|
||||
together on one system and they can share the same IRQ.
|
||||
|
||||
|
||||
3.2 Driver files
|
||||
================
|
||||
|
||||
The driver file may be obtained from ftp, CD-ROM or floppy disk. The
first step, in any case, is to copy the driver file "mxser.tgz" into a
directory of your choice, e.g. /moxa. Then execute the commands below::
|
||||
|
||||
# cd /
|
||||
# mkdir moxa
|
||||
# cd /moxa
|
||||
# tar xvf /dev/fd0
|
||||
|
||||
or::
|
||||
|
||||
# cd /
|
||||
# mkdir moxa
|
||||
# cd /moxa
|
||||
# cp /mnt/cdrom/<driver directory>/mxser.tgz .
|
||||
# tar xvfz mxser.tgz
|
||||
|
||||
|
||||
3.3 Device naming convention
|
||||
============================
|
||||
|
||||
You may find all the driver and utilities files in /moxa/mxser.
The installation procedure that follows depends on how you'd like to
run the driver. If you prefer the module driver, please refer to 3.4.
If the static driver is required, please refer to 3.5.
|
||||
|
||||
Dialin and callout port
|
||||
-----------------------
|
||||
|
||||
This driver retains the traditional serial device properties. There are
two special file names for each serial port. One is the dial-in port,
which is named "ttyMxx". For the callout port, the naming convention
is "cumxx".
|
||||
|
||||
Device naming when more than 2 boards installed
|
||||
-----------------------------------------------
|
||||
|
||||
Naming convention for each Smartio/Industio multiport board is
|
||||
pre-defined as below.
|
||||
|
||||
============ =============== ==============
Board Num.   Dial-in Port    Callout port
============ =============== ==============
1st board    ttyM0 - ttyM7   cum0 - cum7
2nd board    ttyM8 - ttyM15  cum8 - cum15
3rd board    ttyM16 - ttyM23 cum16 - cum23
4th board    ttyM24 - ttyM31 cum24 - cum31
============ =============== ==============
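
The numbering in the table follows a fixed rule: each board owns a block of 8 consecutive port numbers. A minimal sketch of that rule is shown here; `moxa_dialin_range` is a hypothetical helper written for this guide and is not part of the driver or its utilities.

```c
#include <stddef.h>
#include <stdio.h>

#define PORTS_PER_BOARD 8   /* each board owns 8 consecutive numbers */
#define MAX_BOARDS      4   /* at most 4 boards per system */

/*
 * Format the dial-in device range of the given board (1-based) into buf.
 * Returns 0 on success, -1 if the board number is out of range.
 */
int moxa_dialin_range(int board, char *buf, size_t len)
{
    int first;

    if (board < 1 || board > MAX_BOARDS)
        return -1;

    first = (board - 1) * PORTS_PER_BOARD;
    snprintf(buf, len, "ttyM%d - ttyM%d", first, first + PORTS_PER_BOARD - 1);
    return 0;
}
```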
|
||||
|
||||
.. note::
|
||||
|
||||
   Under kernel 2.6 and later, the cum device is obsolete, so use the ttyM*
   device instead.
|
||||
|
||||
Board sequence
|
||||
--------------
|
||||
|
||||
This driver activates ISA boards according to the parameters set
in the driver. After all the specified ISA boards have been activated,
PCI boards found in the system are driven automatically.
The board number is therefore sorted by the CAP address of the ISA boards.
PCI boards come after the ISA boards in the sequence, and C168H/PCI
boards have higher priority than C104H/PCI boards.
|
||||
|
||||
3.4 Module driver configuration
|
||||
===============================
|
||||
|
||||
The module driver is the easiest way to install. If you prefer static driver
installation, please skip this section.
|
||||
|
||||
|
||||
------------- Prepare to use the MOXA driver --------------------
|
||||
|
||||
3.4.1 Create tty device with correct major number
|
||||
-------------------------------------------------
|
||||
|
||||
Before using the MOXA driver, your system must have the tty devices
created with the driver's major number. We offer one shell
script "msmknod" to simplify the procedure.
This step only needs to be executed once, but you will
need to repeat it when:

a. You change the driver's major number. Please refer to section "3.6".
b. The total number of installed MOXA boards changes, for example
   because you add or remove one MOXA board.
c. You want to change the tty name. This requires modifying the
   shell script "msmknod".
|
||||
|
||||
The procedure is::
|
||||
|
||||
# cd /moxa/mxser/driver
|
||||
# ./msmknod
|
||||
|
||||
This shell script will ask for the major numbers of the dial-in
device and the callout device in order to create the tty devices. You also
need to specify the total number of installed MOXA boards. The default major
numbers for the dial-in device and the callout device are 30 and 35. If
you need to change them to other numbers, please refer to section "3.6"
for a more detailed procedure.
Msmknod will delete any special files occupying the same device
names.
|
||||
|
||||
3.4.2 Build the MOXA driver and utilities
|
||||
-----------------------------------------
|
||||
|
||||
Before using the MOXA driver and utilities, you need to compile
all the source code. This step only needs to be executed once,
but you must re-compile if you modify the source
code. For example, if you change the driver's major number (see
section "3.6"), then you need to do this step again.

Find "Makefile" in /moxa/mxser, then run::
|
||||
|
||||
# make clean; make install
|
||||
|
||||
.. note::
|
||||
|
||||
For Red Hat 9, Red Hat Enterprise Linux AS3/ES3/WS3 & Fedora Core1:
|
||||
# make clean; make installsp1
|
||||
|
||||
For Red Hat Enterprise Linux AS4/ES4/WS4:
|
||||
# make clean; make installsp2
|
||||
|
||||
The driver files "mxser.o" and utilities will be properly compiled
|
||||
and copied to system directories respectively.
|
||||
|
||||
------------- Load MOXA driver--------------------
|
||||
|
||||
3.4.3 Load the MOXA driver
|
||||
--------------------------
|
||||
|
||||
::
|
||||
|
||||
# modprobe mxser <argument>
|
||||
|
||||
will activate the module driver. You may run "lsmod" to check
if "mxser" is activated. If the MOXA board is an ISA board, the
<argument> is needed. Please refer to section "3.4.5" for more
information.
|
||||
|
||||
------------- Load MOXA driver on boot --------------------
|
||||
|
||||
3.4.4 Load the mxser driver
|
||||
---------------------------
|
||||
|
||||
|
||||
As described above, you may manually execute
"modprobe mxser" to activate this driver and run
"rmmod mxser" to remove it.

However, it's better to have a boot time configuration to
eliminate manual operation. Boot time configuration can be
achieved with an rc file. We offer one "rc.mxser" file under
"moxa/mxser/driver" to simplify the procedure.

If you use an ISA board, please modify the "modprobe ..." command
to add the argument (see section "3.4.5"). After modifying
rc.mxser, please execute "/moxa/mxser/driver/rc.mxser"
manually to make sure the modification is correct. If any error
is encountered, modify the file again. Once the modification is
complete, follow the steps below.
|
||||
|
||||
Run the following commands to set up the rc files::
|
||||
|
||||
# cd /moxa/mxser/driver
|
||||
# cp ./rc.mxser /etc/rc.d
|
||||
# cd /etc/rc.d
|
||||
|
||||
Check whether "rc.serial" exists. If "rc.serial" doesn't exist,
create it with vi and run "chmod 755 rc.serial" to change the permission.

Add "/etc/rc.d/rc.mxser" as the last line.

Reboot and check with the "lsmod" command that "mxser" is loaded.
|
||||
|
||||
3.4.5 Specify CAP address
|
||||
--------------------------
|
||||
|
||||
If you'd like to drive Smartio/Industio ISA boards in the system,
you'll have to add a parameter to specify the CAP address of each
board while activating "mxser.o". The format of the parameter is
as follows::
|
||||
|
||||
    modprobe mxser ioaddr=0x???,0x???,0x???,0x???
                          |     |     |     |
                          |     |     |     +- 4th ISA board
                          |     |     +------- 3rd ISA board
                          |     +------------- 2nd ISA board
                          +------------------- 1st ISA board
|
||||
|
||||
3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x
|
||||
================================================================
|
||||
|
||||
Note:
|
||||
To use static driver, you must install the linux kernel
|
||||
source package.
|
||||
|
||||
3.5.1 Backup the built-in driver in the kernel
|
||||
----------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
# cd /usr/src/linux/drivers/char
|
||||
# mv mxser.c mxser.c.old
|
||||
|
||||
Red Hat 7.x users need to create a link:
|
||||
# cd /usr/src
|
||||
# ln -s linux-2.4 linux
|
||||
|
||||
3.5.2 Create link
|
||||
-----------------
|
||||
::
|
||||
|
||||
# cd /usr/src/linux/drivers/char
|
||||
# ln -s /moxa/mxser/driver/mxser.c mxser.c
|
||||
|
||||
3.5.3 Add CAP address list for ISA boards.
|
||||
------------------------------------------
|
||||
|
||||
PCI board users may skip this step.
|
||||
|
||||
In module mode, the CAP address of an ISA board is given as a
parameter. In static driver configuration, you'll have to
assign it within the driver's source code. If you will not
install any ISA boards, you may skip to the next step.
The instructions to modify the driver source code are as
follows.
|
||||
|
||||
a. run::
|
||||
|
||||
# cd /moxa/mxser/driver
|
||||
# vi mxser.c
|
||||
|
||||
b. Find the array mxserBoardCAP[] as below::
|
||||
|
||||
static int mxserBoardCAP[] = {0x00, 0x00, 0x00, 0x00};
|
||||
|
||||
c. Change the addresses within this array using vi. For
   example, to drive 2 ISA boards with CAP addresses
   0x280 and 0x180 as the 1st and 2nd board, change
   the source code as follows::
|
||||
|
||||
static int mxserBoardCAP[] = {0x280, 0x180, 0x00, 0x00};
|
||||
|
||||
3.5.4 Setup kernel configuration
|
||||
--------------------------------
|
||||
|
||||
Configure the kernel::
|
||||
|
||||
# cd /usr/src/linux
|
||||
# make menuconfig
|
||||
|
||||
You will go into a menu-driven system. Please select [Character
|
||||
devices][Non-standard serial port support], enable the [Moxa
|
||||
SmartIO support] driver with "[*]" for built-in (not "[M]"), then
|
||||
select [Exit] to exit this program.
|
||||
|
||||
3.5.5 Rebuild kernel
|
||||
--------------------
|
||||
|
||||
The following are for Linux kernel rebuilding, for your
|
||||
reference only.
|
||||
|
||||
For appropriate details, please refer to the Linux document:
|
||||
|
||||
a. Run the following commands::
|
||||
|
||||
cd /usr/src/linux
|
||||
make clean # take a few minutes
|
||||
make dep # take a few minutes
|
||||
make bzImage # take probably 10-20 minutes
|
||||
make install # copy boot image to correct position
|
||||
|
||||
b. Please make sure the boot kernel (vmlinuz) is in the
   correct position.
c. If you use the 'lilo' utility, check that the 'image' item in
   /etc/lilo.conf specifies the path of the new 'vmlinuz';
   otherwise you will load the wrong (or old) boot kernel image.
   After checking /etc/lilo.conf, please run "lilo".
|
||||
|
||||
Note that if the result of "make bzImage" is ERROR, then you have to
|
||||
go back to Linux configuration Setup. Type "make menuconfig" in
|
||||
directory /usr/src/linux.
|
||||
|
||||
|
||||
3.5.6 Make tty device and special file
|
||||
--------------------------------------
|
||||
|
||||
::
|
||||
# cd /moxa/mxser/driver
|
||||
# ./msmknod
|
||||
|
||||
3.5.7 Make utility
|
||||
------------------
|
||||
|
||||
::
|
||||
|
||||
# cd /moxa/mxser/utility
|
||||
# make clean; make install
|
||||
|
||||
3.5.8 Reboot
|
||||
------------
|
||||
|
||||
|
||||
|
||||
3.6 Custom configuration
|
||||
========================
|
||||
|
||||
Although this driver already provides a default configuration, you
can still change the device name and major number. The instructions for
changing these parameters are shown below.
|
||||
|
||||
a. Change Device name
|
||||
|
||||
If you'd like to use other device names instead of the default naming
convention, all you have to do is modify the shell script "msmknod".
First, open "msmknod" with vi. Locate each line that contains "ttyM"
or "cum" and change these to the device names you desire. "msmknod"
creates the device names you need the next time it is executed.
|
||||
|
||||
b. Change Major number
|
||||
|
||||
If major numbers 30 and 35 are already occupied, you have to select
2 free major numbers for this driver. Changing the major numbers takes
the steps below.
|
||||
|
||||
3.6.1 Find free major numbers
|
||||
-----------------------------
|
||||
|
||||
In /proc/devices, you may find all the major numbers occupied
|
||||
in the system. Please select 2 major numbers that are available.
|
||||
e.g. 40, 45.
|
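As a rough illustration of this check, the C sketch below scans the text of
/proc/devices for a character major number. The function name
``char_major_in_use`` and the sample device names in the usage note are
assumptions made for this example; they are not part of the Moxa driver, and
reading the file into memory is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `major` is listed in the
 * "Character devices:" section of the given /proc/devices text,
 * 0 otherwise. Block majors are a separate number space and are
 * ignored.
 */
static int char_major_in_use(const char *text, int major)
{
    int in_char_section = 0;
    const char *line = text;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int num;

        if (strncmp(line, "Character", 9) == 0)
            in_char_section = 1;
        else if (strncmp(line, "Block", 5) == 0)
            in_char_section = 0;
        else if (in_char_section && len > 0 &&
                 sscanf(line, "%d", &num) == 1 && num == major)
            return 1;

        /* Advance to the start of the next line (or the end of text). */
        line = end ? end + 1 : line + len;
    }
    return 0;
}
```

Given a buffer holding the contents of /proc/devices, a caller could test
candidates such as 40 and 45 and keep the first two for which the function
returns 0.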
||||
|
||||
3.6.2 Create special files
|
||||
--------------------------
|
||||
|
||||
Run /moxa/mxser/driver/msmknod to create special files with
|
||||
specified major numbers.
|
||||
|
||||
3.6.3 Modify driver with new major number
|
||||
-----------------------------------------
|
||||
|
||||
Run vi to open /moxa/mxser/driver/mxser.c. Locate the lines that
contain "MXSERMAJOR" and "MXSERCUMAJOR". Change their contents as below::
|
||||
|
||||
#define MXSERMAJOR 40
|
||||
#define MXSERCUMAJOR 45
|
||||
|
||||
3.6.4 Run "make clean; make install" in /moxa/mxser/driver.
|
||||
|
||||
3.7 Verify driver installation
|
||||
==============================
|
||||
|
||||
You may refer to /var/log/messages to check the latest status
|
||||
log reported by this driver whenever it's activated.
|
||||
|
||||
4. Utilities
|
||||
^^^^^^^^^^^^
|
||||
|
||||
This driver comes with 3 utilities: msdiag, msmon and
msterm. They are released in source code form and should
be compiled into executable files and copied into /usr/bin.
|
||||
|
||||
Before using these utilities, please load the driver (refer to 3.4 & 3.5) and
make sure you have run the "msmknod" utility.
|
||||
|
||||
msdiag - Diagnostic
|
||||
===================
|
||||
|
||||
This utility displays which Moxa Smartio/Industio
boards the driver has found in the system.
|
||||
|
||||
msmon - Port Monitoring
|
||||
=======================
|
||||
|
||||
This utility gives the user a quick view of the activity on all MOXA
ports. It shows each port's total received/transmitted (Rx/Tx)
character count since monitoring started.

Rx/Tx throughput per second is also reported on an interval basis (e.g.
the last 5 seconds) and on an average basis (since monitoring
started). Press <HOME> to reset all ports' counts and <+> or <->
(plus/minus) to change the display interval. Press <ENTER>
on the port under the cursor to view that port's communication
parameters, signal status, and input/output queue.
|
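The two throughput figures differ only in the window they divide by. The
sketch below shows that arithmetic under stated assumptions: the function
names and the idea of sampling a running character counter are hypothetical
and are not taken from the msmon source.

```c
/*
 * Hypothetical helpers illustrating interval versus average throughput.
 * `count_*` are running character counters sampled at different times.
 */

/* Characters per second over the most recent display interval. */
static double interval_rate(unsigned long count_now,
                            unsigned long count_prev,
                            double interval_seconds)
{
    if (interval_seconds <= 0.0)
        return 0.0;
    return (double)(count_now - count_prev) / interval_seconds;
}

/* Characters per second averaged since monitoring started. */
static double average_rate(unsigned long count_now,
                           unsigned long count_start,
                           double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;
    return (double)(count_now - count_start) / elapsed_seconds;
}
```

For a port that has received 1500 characters in 60 seconds, 500 of them in
the last 5 seconds, the interval figure is 100 characters per second while
the average figure is 25.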
||||
|
||||
msterm - Terminal Emulation
|
||||
===========================
|
||||
|
||||
This utility can send and receive data on any tty port, and on MOXA
ports in particular. It is quite useful for testing simple
applications, for example, sending AT commands to a modem connected to
the port, or for use as a login terminal. Note that this is only a
dumb terminal emulation that does not handle full-screen operation.
|
||||
|
||||
5. Setserial
|
||||
^^^^^^^^^^^^
|
||||
|
||||
The supported setserial parameters are listed below.
|
||||
|
||||
============== =========================================================
uart           set UART type(16450-->disable FIFO, 16550A-->enable FIFO)
close_delay    set the amount of time(in 1/100 of a second) that DTR
               should be kept low while being closed.
closing_wait   set the amount of time(in 1/100 of a second) that the
               serial port should wait for data to be drained while
               being closed, before the receiver is disabled.
spd_hi         Use 57.6kb when the application requests 38.4kb.
spd_vhi        Use 115.2kb when the application requests 38.4kb.
spd_shi        Use 230.4kb when the application requests 38.4kb.
spd_warp       Use 460.8kb when the application requests 38.4kb.
spd_normal     Use 38.4kb when the application requests 38.4kb.
spd_cust       Use the custom divisor to set the speed when the
               application requests 38.4kb.
divisor        This option sets the custom divisor.
baud_base      This option sets the base baud rate.
============== =========================================================
|
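The effect of the spd_* flags can be summarized as a mapping from flag to
the line speed actually used when an application asks for 38.4kb. The sketch
below is only an illustration: the enum and function names are hypothetical,
and the spd_cust case assumes the usual setserial convention that the speed
is baud_base divided by divisor.

```c
/* Hypothetical names for the setserial speed flags listed above. */
enum spd_flag {
    SPD_NORMAL,
    SPD_HI,
    SPD_VHI,
    SPD_SHI,
    SPD_WARP,
    SPD_CUST
};

/*
 * Return the speed (in bits per second) used when the application
 * requests 38400, or -1 if spd_cust is selected with an invalid divisor.
 */
static long effective_baud(enum spd_flag flag, long baud_base, long divisor)
{
    switch (flag) {
    case SPD_HI:
        return 57600;
    case SPD_VHI:
        return 115200;
    case SPD_SHI:
        return 230400;
    case SPD_WARP:
        return 460800;
    case SPD_CUST:
        /* Assumed convention: speed = baud_base / divisor. */
        return divisor > 0 ? baud_base / divisor : -1;
    case SPD_NORMAL:
    default:
        return 38400;
    }
}
```

With a base baud rate of 921600 and a divisor of 8, for instance, spd_cust
would yield 115200 under this assumption.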
||||
|
||||
6. Troubleshooting
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The boot time error messages and solutions are stated as clearly as
|
||||
possible. If all the possible solutions fail, please contact our technical
|
||||
support team to get more help.
|
||||
|
||||
|
||||
Error msg:
|
||||
More than 4 Moxa Smartio/Industio family boards found. Fifth board
|
||||
and after are ignored.
|
||||
|
||||
Solution:
|
||||
To avoid this problem, please unplug the fifth and any subsequent boards,
because the Moxa driver supports up to 4 boards.
|
||||
|
||||
Error msg:
|
||||
Request_irq fail, IRQ(?) may be conflict with another device.
|
||||
|
||||
Solution:
|
||||
Other PCI or ISA devices occupy the assigned IRQ. If you are not sure
which device causes the conflict, please check /proc/interrupts to find
a free IRQ and assign it to the Moxa board.
|
||||
|
||||
Error msg:
|
||||
Board #: C1xx Series(CAP=xxx) interrupt number invalid.
|
||||
|
||||
Solution:
|
||||
Each port within the same multiport board shares the same IRQ. Please set
one IRQ (not equal to zero) for each Moxa board.
|
||||
|
||||
Error msg:
|
||||
No interrupt vector be set for Moxa ISA board(CAP=xxx).
|
||||
|
||||
Solution:
|
||||
A Moxa ISA board needs an interrupt vector. Please refer to the
"Hardware Installation" chapter of the user's manual to set the interrupt vector.
|
||||
|
||||
Error msg:
|
||||
Couldn't install MOXA Smartio/Industio family driver!
|
||||
|
||||
Solution:
|
||||
Loading the Moxa driver failed; the major number may conflict with other
devices. Please refer to section 3.6 above to choose a free major number
for the Moxa driver.
|
||||
|
||||
Error msg:
|
||||
Couldn't install MOXA Smartio/Industio family callout driver!
|
||||
|
||||
Solution:
|
||||
Loading the Moxa callout driver failed; the callout device major number may
conflict with other devices. Please refer to section 3.6 above to
choose a free callout device major number for the Moxa driver.
|
103
Documentation/driver-api/serial/n_gsm.rst
Normal file
@@ -0,0 +1,103 @@
|
||||
==============================
|
||||
GSM 0710 tty multiplexor HOWTO
|
||||
==============================
|
||||
|
||||
This line discipline implements the GSM 07.10 multiplexing protocol
|
||||
detailed in the following 3GPP document:
|
||||
|
||||
http://www.3gpp.org/ftp/Specs/archive/07_series/07.10/0710-720.zip
|
||||
|
||||
This document gives some hints on how to use this driver with GPRS and 3G
|
||||
modems connected to a physical serial port.
|
||||
|
||||
How to use it
|
||||
-------------
|
||||
1. initialize the modem in 0710 mux mode (usually AT+CMUX= command) through
|
||||
its serial port. Depending on the modem used, you can pass more or less
|
||||
parameters to this command,
|
||||
2. switch the serial line to using the n_gsm line discipline by using
|
||||
TIOCSETD ioctl,
|
||||
3. configure the mux using GSMIOC_GETCONF / GSMIOC_SETCONF ioctl,
|
||||
|
||||
Major parts of the initialization program :
|
||||
(a good starting point is util-linux-ng/sys-utils/ldattach.c)::
|
||||
|
||||
#include <linux/gsmmux.h>
|
||||
#define N_GSM0710 21 /* GSM 0710 Mux */
|
||||
#define DEFAULT_SPEED B115200
|
||||
#define SERIAL_PORT "/dev/ttyS0"
|
||||
|
||||
int ldisc = N_GSM0710;
|
||||
struct gsm_config c;
|
||||
struct termios configuration;
|
||||
|
||||
/* open the serial port connected to the modem */
|
||||
int fd = open(SERIAL_PORT, O_RDWR | O_NOCTTY | O_NDELAY);
|
||||
|
||||
/* configure the serial port : speed, flow control ... */
|
||||
|
||||
/* send the AT commands to switch the modem to CMUX mode
|
||||
and check that it's successful (should return OK) */
|
||||
write(fd, "AT+CMUX=0\r", 10);
|
||||
|
||||
/* experience showed that some modems need some time before
|
||||
being able to answer to the first MUX packet so a delay
|
||||
may be needed here in some case */
|
||||
sleep(3);
|
||||
|
||||
/* use n_gsm line discipline */
|
||||
ioctl(fd, TIOCSETD, &ldisc);
|
||||
|
||||
/* get n_gsm configuration */
|
||||
ioctl(fd, GSMIOC_GETCONF, &c);
|
||||
/* we are initiator and need encoding 0 (basic) */
|
||||
c.initiator = 1;
|
||||
c.encapsulation = 0;
|
||||
/* our modem defaults to a maximum size of 127 bytes */
|
||||
c.mru = 127;
|
||||
c.mtu = 127;
|
||||
/* set the new configuration */
|
||||
ioctl(fd, GSMIOC_SETCONF, &c);
|
||||
|
||||
/* and wait for ever to keep the line discipline enabled */
|
||||
daemon(0,0);
|
||||
pause();
|
||||
|
||||
4. create the devices corresponding to the "virtual" serial ports (take care,
|
||||
each modem has its configuration and some DLC have dedicated functions,
|
||||
for example GPS), starting with minor 1 (DLC0 is reserved for the management
|
||||
of the mux)::
|
||||
|
||||
MAJOR=`cat /proc/devices |grep gsmtty | awk '{print $1}'`
|
||||
for i in `seq 1 4`; do
|
||||
mknod /dev/ttygsm$i c $MAJOR $i
|
||||
done
|
||||
|
||||
5. use these devices as plain serial ports.
|
||||
|
||||
for example, it's possible:
|
||||
|
||||
- to use gnokii to send / receive SMS on ttygsm1
- and to use ppp to establish a datalink on ttygsm2
|
||||
|
||||
6. first close all virtual ports before closing the physical port.
|
||||
|
||||
Note that after closing the physical port the modem is still in multiplexing
|
||||
mode. This may prevent a successful re-opening of the port later. To avoid
|
||||
this situation either reset the modem if your hardware allows that or send
|
||||
a disconnect command frame manually before initializing the multiplexing mode
|
||||
for the second time. The byte sequence for the disconnect command frame is::
|
||||
|
||||
0xf9, 0x03, 0xef, 0x03, 0xc3, 0x16, 0xf9.
|
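Sending this frame from a program amounts to writing the seven bytes above to
the physical serial port before the AT+CMUX command is issued again. The
following is a minimal sketch; the function name ``send_disconnect_frame``
is hypothetical, and the byte values are simply those listed above.

```c
#include <stddef.h>
#include <unistd.h>

/* Disconnect command frame, byte for byte as given in the text above. */
static const unsigned char gsm_disconnect_frame[] = {
    0xf9, 0x03, 0xef, 0x03, 0xc3, 0x16, 0xf9
};

/*
 * Hypothetical helper: write the whole frame to `fd`, retrying on short
 * writes. Returns 0 on success and -1 if write() fails or makes no
 * progress.
 */
static int send_disconnect_frame(int fd)
{
    size_t done = 0;

    while (done < sizeof(gsm_disconnect_frame)) {
        ssize_t n = write(fd, gsm_disconnect_frame + done,
                          sizeof(gsm_disconnect_frame) - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

The descriptor passed in would be the one opened on the physical port (for
example /dev/ttyS0 in the program above), with the line discipline not yet
switched to n_gsm.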
||||
|
||||
Additional Documentation
|
||||
------------------------
|
||||
More practical details on the protocol and how it's supported by industrial
|
||||
modems can be found in the following documents :
|
||||
|
||||
- http://www.telit.com/module/infopool/download.php?id=616
|
||||
- http://www.u-blox.com/images/downloads/Product_Docs/LEON-G100-G200-MuxImplementation_ApplicationNote_%28GSM%20G1-CS-10002%29.pdf
|
||||
- http://www.sierrawireless.com/Support/Downloads/AirPrime/WMP_Series/~/media/Support_Downloads/AirPrime/Application_notes/CMUX_Feature_Application_Note-Rev004.ashx
|
||||
- http://wm.sim.com/sim/News/photo/2010721161442.pdf
|
||||
|
||||
11-03-08 - Eric Bénard - <eric@eukrea.com>
|
185
Documentation/driver-api/serial/rocket.rst
Normal file
@@ -0,0 +1,185 @@
|
||||
================================================
|
||||
Comtrol(tm) RocketPort(R)/RocketModem(TM) Series
|
||||
================================================
|
||||
|
||||
Device Driver for the Linux Operating System
|
||||
============================================
|
||||
|
||||
Product overview
|
||||
----------------
|
||||
|
||||
This driver provides a loadable kernel driver for the Comtrol RocketPort
|
||||
and RocketModem PCI boards. These boards provide 2, 4, 8, 16, or 32
|
||||
high-speed serial ports or modems. This driver supports any combination
of up to four RocketPort or RocketModem boards in one machine simultaneously.
|
||||
This file assumes that you are using the RocketPort driver which is
|
||||
integrated into the kernel sources.
|
||||
|
||||
The driver can also be installed as an external module using the usual
|
||||
"make;make install" routine. This external module driver, obtainable
|
||||
from the Comtrol website listed below, is useful for updating the driver
|
||||
or installing it into kernels which do not have the driver configured
|
||||
into them. Installation instructions for the external module
|
||||
are in the included README and HW_INSTALL files.
|
||||
|
||||
RocketPort ISA and RocketModem II PCI boards currently are only supported by
|
||||
this driver in module form.
|
||||
|
||||
The RocketPort ISA board requires I/O ports to be configured by the DIP
|
||||
switches on the board. See the section "ISA Rocketport Boards" below for
|
||||
information on how to set the DIP switches.
|
||||
|
||||
You pass the I/O port to the driver using the following module parameters:
|
||||
|
||||
board1:
|
||||
I/O port for the first ISA board
|
||||
board2:
|
||||
I/O port for the second ISA board
|
||||
board3:
|
||||
I/O port for the third ISA board
|
||||
board4:
|
||||
I/O port for the fourth ISA board
|
||||
|
||||
There is a set of utilities and scripts provided with the external driver
|
||||
(downloadable from http://www.comtrol.com) that ease the configuration and
|
||||
setup of the ISA cards.
|
||||
|
||||
The RocketModem II PCI boards require firmware to be loaded into the card
|
||||
before it will function. The driver has only been tested as a module for this
|
||||
board.
|
||||
|
||||
Installation Procedures
|
||||
-----------------------
|
||||
|
||||
RocketPort/RocketModem PCI cards require no driver configuration; they are
|
||||
automatically detected and configured.
|
||||
|
||||
The RocketPort driver can be installed as a module (recommended) or built
|
||||
into the kernel. This is selected, as for other drivers, through the `make config`
|
||||
command from the root of the Linux source tree during the kernel build process.
|
||||
|
||||
The RocketPort/RocketModem serial ports installed by this driver are assigned
|
||||
device major number 46, and will be named /dev/ttyRx, where x is the port number
|
||||
starting at zero (ex. /dev/ttyR0, /dev/ttyR1, ...). If you have multiple cards
|
||||
installed in the system, the mapping of port names to serial ports is displayed
|
||||
in the system log at /var/log/messages.
|
||||
|
||||
If installed as a module, the module must be loaded. This can be done
|
||||
manually by entering "modprobe rocket". To have the module loaded automatically
|
||||
upon system boot, edit a `/etc/modprobe.d/*.conf` file and add the line
|
||||
"alias char-major-46 rocket".
|
||||
|
||||
In order to use the ports, their device names (nodes) must be created with mknod.
|
||||
This is only required once; the system retains the names once created. To
|
||||
create the RocketPort/RocketModem device names, use the command
|
||||
"mknod /dev/ttyRx c 46 x" where x is the port number starting at zero.
|
||||
|
||||
For example::
|
||||
|
||||
> mknod /dev/ttyR0 c 46 0
|
||||
> mknod /dev/ttyR1 c 46 1
|
||||
> mknod /dev/ttyR2 c 46 2
|
||||
|
||||
The Linux script MAKEDEV will create the first 16 ttyRx device names (nodes)
|
||||
for you::
|
||||
|
||||
>/dev/MAKEDEV ttyR
|
||||
|
||||
ISA Rocketport Boards
|
||||
---------------------
|
||||
|
||||
You must assign and configure the I/O addresses used by the ISA Rocketport
|
||||
card before installing and using it. This is done by setting a set of DIP
|
||||
switches on the Rocketport board.
|
||||
|
||||
|
||||
Setting the I/O address
|
||||
-----------------------
|
||||
|
||||
Before installing RocketPort(R) or RocketPort RA boards, you must find
|
||||
a range of I/O addresses for it to use. The first RocketPort card
|
||||
requires a 68-byte contiguous block of I/O addresses, starting at one
|
||||
of the following: 0x100, 0x140, 0x180, 0x200, 0x240, 0x280,
0x300, 0x340, 0x380. This I/O address must be reflected in the DIP
|
||||
switches of *all* of the Rocketport cards.
|
||||
|
||||
The second, third, and fourth RocketPort cards require a 64-byte
|
||||
contiguous block of I/O addresses, starting at one of the following
|
||||
I/O addresses: 0x100, 0x140, 0x180, 0x1C0, 0x200, 0x240, 0x280,
0x2C0, 0x300, 0x340, 0x380, 0x3C0. The I/O addresses used by the
second, third, and fourth Rocketport cards (if present) are set via
software control. The DIP switch settings for the I/O address must be
set to the value of the first Rocketport card.
|
||||
|
||||
In order to distinguish each of the cards from the others, each card
|
||||
must have a unique board ID set on the dip switches. The first
|
||||
Rocketport board must be set with the DIP switches corresponding to
|
||||
the first board, the second board must be set with the DIP switches
|
||||
corresponding to the second board, etc. IMPORTANT: The board ID is
|
||||
the only place where the DIP switch settings should differ between the
|
||||
various Rocketport boards in a system.
|
||||
|
||||
The I/O address range used by any of the RocketPort cards must not
|
||||
conflict with any other cards in the system, including other
|
||||
RocketPort cards. Below, you will find a list of commonly used I/O
|
||||
address ranges which may be in use by other devices in your system.
|
||||
On a Linux system, "cat /proc/ioports" will also be helpful in
|
||||
identifying what I/O addresses are being used by devices on your
|
||||
system.
|
||||
|
||||
Remember, the FIRST RocketPort uses 68 I/O addresses. So, if you set it
|
||||
for 0x100, it will occupy 0x100 to 0x143. This would mean that you
|
||||
CAN NOT set the second, third or fourth board for address 0x140 since
|
||||
the first 4 bytes of that range are used by the first board. You would
|
||||
need to set the second, third, or fourth board to one of the next available
|
||||
blocks such as 0x180.
|
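The conflict described above is an ordinary interval-overlap test. The sketch
below works it through, assuming the block sizes stated in this section (68
bytes for the first board, 64 for the others); the macro and function names
are hypothetical and do not come from the driver.

```c
/* Block sizes as stated in the text above. */
#define FIRST_BOARD_SPAN 68u
#define OTHER_BOARD_SPAN 64u

/* Return 1 if [a_start, a_start + a_len) and [b_start, b_start + b_len)
 * share at least one address, 0 otherwise. */
static int ranges_overlap(unsigned a_start, unsigned a_len,
                          unsigned b_start, unsigned b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Hypothetical helper: does a second, third or fourth board at
 * `other_base` collide with the first board at `first_base`? */
static int board_conflicts_with_first(unsigned first_base,
                                      unsigned other_base)
{
    return ranges_overlap(first_base, FIRST_BOARD_SPAN,
                          other_base, OTHER_BOARD_SPAN);
}
```

With the first board at 0x100 (occupying 0x100 to 0x143), a later board at
0x140 collides, while one at 0x180 does not.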
||||
|
||||
RocketPort and RocketPort RA SW1 Settings::
|
||||
|
||||
    +-------------------------------+
    | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
    +-------+-------+---------------+
    | Unused| Card  | I/O Port Block|
    +-------------------------------+

    DIP Switches                      DIP Switches
    7    8                            6    5
    ===================               ===================
    On   On   UNUSED, MUST BE ON.     On   On   First Card    <==== Default
                                      On   Off  Second Card
                                      Off  On   Third Card
                                      Off  Off  Fourth Card

    DIP Switches        I/O Address Range
    4    3    2    1    Used by the First Card
    =====================================
    On   Off  On   Off  100-143
    On   Off  Off  On   140-183
    On   Off  Off  Off  180-1C3    <==== Default
    Off  On   On   Off  200-243
    Off  On   Off  On   240-283
    Off  On   Off  Off  280-2C3
    Off  Off  On   Off  300-343
    Off  Off  Off  On   340-383
    Off  Off  Off  Off  380-3C3
|
||||
|
||||
Reporting Bugs
|
||||
--------------
|
||||
|
||||
For technical support, please provide the following
information: driver version, kernel release, distribution of
kernel, and type of board you are using. Error messages, log
printouts, and port configuration details are especially helpful.
|
||||
|
||||
USA:
|
||||
:Phone: (612) 494-4100
|
||||
:FAX: (612) 494-4199
|
||||
:email: support@comtrol.com
|
||||
|
||||
Comtrol Europe:
|
||||
:Phone: +44 (0) 1 869 323-220
|
||||
:FAX: +44 (0) 1 869 323-211
|
||||
:email: support@comtrol.co.uk
|
||||
|
||||
Web: http://www.comtrol.com
|
||||
FTP: ftp.comtrol.com
|
90
Documentation/driver-api/serial/serial-iso7816.rst
Normal file
@@ -0,0 +1,90 @@
|
||||
=============================
|
||||
ISO7816 Serial Communications
|
||||
=============================
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
ISO/IEC7816 is a series of standards specifying integrated circuit cards (ICC)
|
||||
also known as smart cards.
|
||||
|
||||
2. Hardware-related considerations
|
||||
==================================
|
||||
|
||||
Some CPUs/UARTs (e.g., Microchip AT91) contain a built-in mode capable of
|
||||
handling communication with a smart card.
|
||||
|
||||
For these microcontrollers, the Linux driver should be made capable of
|
||||
working in both modes, and proper ioctls (see later) should be made
|
||||
available at user-level to allow switching from one mode to the other, and
|
||||
vice versa.
|
||||
|
||||
3. Data Structures Already Available in the Kernel
|
||||
==================================================
|
||||
|
||||
The Linux kernel provides the serial_iso7816 structure (see [1]) to handle
|
||||
ISO7816 communications. This data structure is used to set and configure
|
||||
ISO7816 parameters in ioctls.
|
||||
|
||||
Any driver for devices capable of working both as RS232 and ISO7816 should
|
||||
implement the iso7816_config callback in the uart_port structure. The
|
||||
serial_core calls iso7816_config to do the device specific part in response
|
||||
to TIOCGISO7816 and TIOCSISO7816 ioctls (see below). The iso7816_config
|
||||
callback receives a pointer to struct serial_iso7816.
|
||||
|
||||
4. Usage from user-level
|
||||
========================
|
||||
|
||||
From user-level, the ISO7816 configuration can be read or set using the
ioctls above. For instance, to set ISO7816 you can use the following code::
|
||||
|
||||
#include <linux/serial.h>
|
||||
|
||||
/* Include definition for ISO7816 ioctls: TIOCSISO7816 and TIOCGISO7816 */
|
||||
#include <sys/ioctl.h>
|
||||
|
||||
/* Open your specific device (e.g., /dev/mydevice): */
|
||||
int fd = open ("/dev/mydevice", O_RDWR);
|
||||
if (fd < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
||||
struct serial_iso7816 iso7816conf;
|
||||
|
||||
/* Reserved fields have to be zeroed */
|
||||
memset(&iso7816conf, 0, sizeof(iso7816conf));
|
||||
|
||||
/* Enable ISO7816 mode: */
|
||||
iso7816conf.flags |= SER_ISO7816_ENABLED;
|
||||
|
||||
/* Select the protocol: */
|
||||
/* T=0 */
|
||||
iso7816conf.flags |= SER_ISO7816_T(0);
|
||||
/* or T=1 */
|
||||
iso7816conf.flags |= SER_ISO7816_T(1);
|
||||
|
||||
/* Set the guard time: */
|
||||
iso7816conf.tg = 2;
|
||||
|
||||
/* Set the clock frequency*/
|
||||
iso7816conf.clk = 3571200;
|
||||
|
||||
/* Set transmission factors: */
|
||||
iso7816conf.sc_fi = 372;
|
||||
iso7816conf.sc_di = 1;
|
||||
|
||||
if (ioctl(fd, TIOCSISO7816, &iso7816conf) < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
||||
/* Use read() and write() syscalls here... */
|
||||
|
||||
/* Close the device when finished: */
|
||||
if (close (fd) < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
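The values used in the example above are related: in ISO/IEC 7816-3 one
elementary time unit lasts Fi/Di clock cycles, so the resulting bit rate is
clk * Di / Fi. The small sketch below works that out; the function name
``iso7816_baud`` is hypothetical and is not a kernel interface.

```c
/*
 * Hypothetical helper: bit rate in bits per second for a card clock
 * `clk_hz` and transmission factors Fi (`fi`) and Di (`di`).
 * Returns 0 if `fi` is zero.
 */
static unsigned long iso7816_baud(unsigned long clk_hz,
                                  unsigned int fi,
                                  unsigned int di)
{
    if (fi == 0)
        return 0;
    /* Use a 64-bit intermediate so clk_hz * di cannot overflow. */
    return (unsigned long)((unsigned long long)clk_hz * di / fi);
}
```

For clk = 3571200, sc_fi = 372 and sc_di = 1, as set above, this gives
9600 bits per second.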
||||
5. References
|
||||
=============
|
||||
|
||||
[1] include/uapi/linux/serial.h
|
103
Documentation/driver-api/serial/serial-rs485.rst
Normal file
@@ -0,0 +1,103 @@
|
||||
===========================
|
||||
RS485 Serial Communications
|
||||
===========================
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
EIA-485, also known as TIA/EIA-485 or RS-485, is a standard defining the
|
||||
electrical characteristics of drivers and receivers for use in balanced
|
||||
digital multipoint systems.
|
||||
This standard is widely used for communications in industrial automation
|
||||
because it can be used effectively over long distances and in electrically
|
||||
noisy environments.
|
||||
|
||||
2. Hardware-related Considerations
|
||||
==================================
|
||||
|
||||
Some CPUs/UARTs (e.g., Atmel AT91 or 16C950 UART) contain a built-in
|
||||
half-duplex mode capable of automatically controlling line direction by
|
||||
toggling RTS or DTR signals. That can be used to control external
|
||||
half-duplex hardware like an RS485 transceiver or any RS232-connected
|
||||
half-duplex devices like some modems.
|
||||
|
||||
For these microcontrollers, the Linux driver should be made capable of
|
||||
working in both modes, and proper ioctls (see later) should be made
|
||||
available at user-level to allow switching from one mode to the other, and
|
||||
vice versa.
|
||||
|
||||
3. Data Structures Already Available in the Kernel
|
||||
==================================================
|
||||
|
||||
The Linux kernel provides the serial_rs485 structure (see [1]) to handle
|
||||
RS485 communications. This data structure is used to set and configure RS485
|
||||
parameters in the platform data and in ioctls.
|
||||
|
||||
The device tree can also provide RS485 boot time parameters (see [2]
|
||||
for bindings). The driver is in charge of filling this data structure from
|
||||
the values given by the device tree.
|
||||
|
||||
Any driver for devices capable of working both as RS232 and RS485 should
|
||||
implement the rs485_config callback in the uart_port structure. The
|
||||
serial_core calls rs485_config to do the device specific part in response
|
||||
to TIOCSRS485 and TIOCGRS485 ioctls (see below). The rs485_config callback
|
||||
receives a pointer to struct serial_rs485.
|
||||
|
||||
4. Usage from user-level
|
||||
========================
|
||||
|
||||
From user-level, the RS485 configuration can be read or set using the
ioctls above. For instance, to set RS485 you can use the following code::
|
||||
|
||||
#include <linux/serial.h>
|
||||
|
||||
/* Include definition for RS485 ioctls: TIOCGRS485 and TIOCSRS485 */
|
||||
#include <sys/ioctl.h>
|
||||
|
||||
/* Open your specific device (e.g., /dev/mydevice): */
|
||||
int fd = open ("/dev/mydevice", O_RDWR);
|
||||
if (fd < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
||||
/* Start from a zeroed structure so that no flag is left uninitialized: */
struct serial_rs485 rs485conf = {0};
|
||||
|
||||
/* Enable RS485 mode: */
|
||||
rs485conf.flags |= SER_RS485_ENABLED;
|
||||
|
||||
/* Set logical level for RTS pin equal to 1 when sending: */
|
||||
rs485conf.flags |= SER_RS485_RTS_ON_SEND;
|
||||
/* or, set logical level for RTS pin equal to 0 when sending: */
|
||||
rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND);
|
||||
|
||||
/* Set logical level for RTS pin equal to 1 after sending: */
|
||||
rs485conf.flags |= SER_RS485_RTS_AFTER_SEND;
|
||||
/* or, set logical level for RTS pin equal to 0 after sending: */
|
||||
rs485conf.flags &= ~(SER_RS485_RTS_AFTER_SEND);
|
||||
|
||||
/* Set rts delay before send, if needed: */
|
||||
rs485conf.delay_rts_before_send = ...;
|
||||
|
||||
/* Set rts delay after send, if needed: */
|
||||
rs485conf.delay_rts_after_send = ...;
|
||||
|
||||
/* Set this flag if you want to receive data even while sending data */
|
||||
rs485conf.flags |= SER_RS485_RX_DURING_TX;
|
||||
|
||||
if (ioctl (fd, TIOCSRS485, &rs485conf) < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
||||
/* Use read() and write() syscalls here... */
|
||||
|
||||
/* Close the device when finished: */
|
||||
if (close (fd) < 0) {
|
||||
/* Error handling. See errno. */
|
||||
}
|
||||
|
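The flag handling in the code above can be gathered into one helper that
always starts from a zeroed structure. This is only a sketch under stated
assumptions: the function ``rs485_fill`` and its parameters are hypothetical,
while struct serial_rs485 and the SER_RS485_* flags come from
<linux/serial.h>.

```c
#include <string.h>
#include <linux/serial.h>

/*
 * Hypothetical helper: build an RS485 configuration.
 * rts_high_on_send != 0 -> RTS is 1 while sending and 0 afterwards;
 * rts_high_on_send == 0 -> RTS is 0 while sending and 1 afterwards.
 * rx_during_tx != 0     -> keep receiving while sending.
 */
static void rs485_fill(struct serial_rs485 *conf,
                       int rts_high_on_send, int rx_during_tx)
{
    /* Zero everything first, including delays and padding. */
    memset(conf, 0, sizeof(*conf));

    conf->flags |= SER_RS485_ENABLED;
    if (rts_high_on_send)
        conf->flags |= SER_RS485_RTS_ON_SEND;
    else
        conf->flags |= SER_RS485_RTS_AFTER_SEND;
    if (rx_during_tx)
        conf->flags |= SER_RS485_RX_DURING_TX;
}
```

The filled structure would then be passed to ioctl() with TIOCSRS485 exactly
as in the example above.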
||||
5. References
|
||||
=============
|
||||
|
||||
[1] include/uapi/linux/serial.h
|
||||
|
||||
[2] Documentation/devicetree/bindings/serial/rs485.txt
|
328
Documentation/driver-api/serial/tty.rst
Normal file
@@ -0,0 +1,328 @@
|
||||
=================
|
||||
The Lockronomicon
|
||||
=================
|
||||
|
||||
Your guide to the ancient and twisted locking policies of the tty layer and
|
||||
the warped logic behind them. Beware all ye who read on.
|
||||
|
||||
|
||||
Line Discipline
|
||||
---------------
|
||||
|
||||
Line disciplines are registered with tty_register_ldisc() passing the
|
||||
discipline number and the ldisc structure. At the point of registration the
|
||||
discipline must be ready to use and it is possible it will get used before
|
||||
the call returns success. If the call returns an error then it won't get
|
||||
called. Do not re-use ldisc numbers as they are part of the userspace ABI
|
||||
and writing over an existing ldisc will cause demons to eat your computer.
|
||||
After the return the ldisc data has been copied so you may free your own
|
||||
copy of the structure. You must not re-register over the top of the line
|
||||
discipline even with the same data or your computer again will be eaten by
|
||||
demons.
|
||||
|
||||
In order to remove a line discipline call tty_unregister_ldisc().
|
||||
In ancient times this always worked. In modern times the function will
|
||||
return -EBUSY if the ldisc is currently in use. Since the ldisc referencing
|
||||
code manages the module counts this should not usually be a concern.
|
||||
|
||||
Heed this warning: the reference count field of the registered copies of the
|
||||
tty_ldisc structure in the ldisc table counts the number of lines using this
|
||||
discipline. The reference count of the tty_ldisc structure within a tty
|
||||
counts the number of active users of the ldisc at this instant. In effect it
|
||||
counts the number of threads of execution within an ldisc method (plus those
|
||||
about to enter and exit although this detail matters not).
|
||||
|
||||
Line Discipline Methods
|
||||
-----------------------
|
||||
|
||||
TTY side interfaces
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
======================= =======================================================
open()                  Called when the line discipline is attached to
                        the terminal. No other call into the line
                        discipline for this tty will occur until it
                        completes successfully. Should initialize any
                        state needed by the ldisc, and set receive_room
                        in the tty_struct to the maximum amount of data
                        the line discipline is willing to accept from the
                        driver with a single call to receive_buf().
                        Returning an error will prevent the ldisc from
                        being attached. Can sleep.

close()                 This is called on a terminal when the line
                        discipline is being unplugged. At the point of
                        execution no further users will enter the
                        ldisc code for this tty. Can sleep.

hangup()                Called when the tty line is hung up.
                        The line discipline should cease I/O to the tty.
                        No further calls into the ldisc code will occur.
                        The return value is ignored. Can sleep.

read()                  (optional) A process requests reading data from
                        the line. Multiple read calls may occur in parallel
                        and the ldisc must deal with serialization issues.
                        If not defined, the process will receive an EIO
                        error. May sleep.

write()                 (optional) A process requests writing data to the
                        line. Multiple write calls are serialized by the
                        tty layer for the ldisc. If not defined, the
                        process will receive an EIO error. May sleep.

flush_buffer()          (optional) May be called at any point between
                        open and close, and instructs the line discipline
                        to empty its input buffer.

set_termios()           (optional) Called on termios structure changes.
                        The caller passes the old termios data and the
                        current data is in the tty. Called under the
                        termios semaphore so allowed to sleep. Serialized
                        against itself only.

poll()                  (optional) Check the status for the poll/select
                        calls. Multiple poll calls may occur in parallel.
                        May sleep.

ioctl()                 (optional) Called when an ioctl is handed to the
                        tty layer that might be for the ldisc. Multiple
                        ioctl calls may occur in parallel. May sleep.

compat_ioctl()          (optional) Called when a 32 bit ioctl is handed
                        to the tty layer that might be for the ldisc.
                        Multiple ioctl calls may occur in parallel.
                        May sleep.
======================= =======================================================
|
||||
|
||||
Driver Side Interfaces
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
======================= =======================================================
|
||||
receive_buf() (optional) Called by the low-level driver to hand
|
||||
a buffer of received bytes to the ldisc for
|
||||
processing. The number of bytes is guaranteed not
|
||||
to exceed the current value of tty->receive_room.
|
||||
All bytes must be processed.
|
||||
|
||||
receive_buf2() (optional) Called by the low-level driver to hand
|
||||
a buffer of received bytes to the ldisc for
|
||||
processing. Returns the number of bytes processed.
|
||||
|
||||
If both receive_buf() and receive_buf2() are
|
||||
defined, receive_buf2() should be preferred.
|
||||
|
||||
write_wakeup() May be called at any point between open and close.
|
||||
The TTY_DO_WRITE_WAKEUP flag indicates if a call
|
||||
is needed but always races versus calls. Thus the
|
||||
ldisc must be careful about setting order and to
|
||||
handle unexpected calls. Must not sleep.
|
||||
|
||||
The driver is forbidden from calling this directly
|
||||
from the ->write call from the ldisc as the ldisc
|
||||
is permitted to call the driver write method from
|
||||
this function. In such a situation defer it.
|
||||
|
||||
dcd_change() Report to the tty line the current DCD pin status
|
||||
changes and the relative timestamp. The timestamp
|
||||
cannot be NULL.
|
||||
======================= =======================================================
|
||||
|
||||
|
||||
Driver Access
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
Line discipline methods can call the following methods of the underlying
|
||||
hardware driver through the function pointers within the tty->driver
|
||||
structure:
|
||||
|
||||
======================= =======================================================
|
||||
write() Write a block of characters to the tty device.
|
||||
Returns the number of characters accepted. The
|
||||
character buffer passed to this method is already
|
||||
in kernel space.
|
||||
|
||||
put_char() Queues a character for writing to the tty device.
|
||||
If there is no room in the queue, the character is
|
||||
ignored.
|
||||
|
||||
flush_chars() (Optional) If defined, must be called after
|
||||
queueing characters with put_char() in order to
|
||||
start transmission.
|
||||
|
||||
write_room() Returns the number of characters the tty driver
|
||||
will accept for queueing to be written.
|
||||
|
||||
ioctl() Invoke device specific ioctl.
|
||||
Expects data pointers to refer to userspace.
|
||||
Returns -ENOIOCTLCMD for unrecognized ioctl numbers.
|
||||
|
||||
set_termios() Notify the tty driver that the device's termios
|
||||
settings have changed. New settings are in
|
||||
tty->termios. Previous settings should be passed in
|
||||
the "old" argument.
|
||||
|
||||
The API is defined such that the driver should return
|
||||
the actual modes selected. This means that the
|
||||
driver function is responsible for modifying any
|
||||
bits in the request it cannot fulfill to indicate
|
||||
the actual modes being used. A device with no
|
||||
hardware capability for change (e.g. a USB dongle or
|
||||
virtual port) can provide NULL for this method.
|
||||
|
||||
throttle() Notify the tty driver that input buffers for the
|
||||
line discipline are close to full, and it should
|
||||
somehow signal that no more characters should be
|
||||
sent to the tty.
|
||||
|
||||
unthrottle() Notify the tty driver that characters can now be
|
||||
sent to the tty without fear of overrunning the
|
||||
input buffers of the line disciplines.
|
||||
|
||||
stop() Ask the tty driver to stop outputting characters
|
||||
to the tty device.
|
||||
|
||||
start() Ask the tty driver to resume sending characters
|
||||
to the tty device.
|
||||
|
||||
hangup() Ask the tty driver to hang up the tty device.
|
||||
|
||||
break_ctl() (Optional) Ask the tty driver to turn on or off
|
||||
BREAK status on the RS-232 port. If state is -1,
|
||||
then the BREAK status should be turned on; if
|
||||
state is 0, then BREAK should be turned off.
|
||||
If this routine is not implemented, use ioctls
|
||||
TIOCSBRK / TIOCCBRK instead.
|
||||
|
||||
wait_until_sent() Waits until the device has written out all of the
|
||||
characters in its transmitter FIFO.
|
||||
|
||||
send_xchar() Send a high-priority XON/XOFF character to the device.
|
||||
======================= =======================================================
|
||||
|
||||
|
||||
Flags
|
||||
^^^^^
|
||||
|
||||
Line discipline methods have access to tty->flags field containing the
|
||||
following interesting flags:
|
||||
|
||||
======================= =======================================================
|
||||
TTY_THROTTLED Driver input is throttled. The ldisc should call
|
||||
tty->driver->unthrottle() in order to resume
|
||||
reception when it is ready to process more data.
|
||||
|
||||
TTY_DO_WRITE_WAKEUP If set, causes the driver to call the ldisc's
|
||||
write_wakeup() method in order to resume
|
||||
transmission when it can accept more data
|
||||
to transmit.
|
||||
|
||||
TTY_IO_ERROR If set, causes all subsequent userspace read/write
|
||||
calls on the tty to fail, returning -EIO.
|
||||
|
||||
TTY_OTHER_CLOSED Device is a pty and the other side has closed.
|
||||
|
||||
TTY_NO_WRITE_SPLIT Prevent driver from splitting up writes into
|
||||
smaller chunks.
|
||||
======================= =======================================================
|
||||
|
||||
|
||||
Locking
|
||||
^^^^^^^
|
||||
|
||||
Callers to the line discipline functions from the tty layer are required to
|
||||
take line discipline locks. The same is true of calls from the driver side
|
||||
but not yet enforced.
|
||||
|
||||
Three calls are now provided::
|
||||
|
||||
ldisc = tty_ldisc_ref(tty);
|
||||
|
||||
takes a handle to the line discipline in the tty and returns it. If no ldisc
|
||||
is currently attached or the ldisc is being closed and re-opened at this
|
||||
point then NULL is returned. While this handle is held the ldisc will not
|
||||
change or go away::
|
||||
|
||||
tty_ldisc_deref(ldisc)
|
||||
|
||||
Returns the ldisc reference and allows the ldisc to be closed. Returning the
|
||||
reference takes away your right to call the ldisc functions until you take
|
||||
a new reference::
|
||||
|
||||
ldisc = tty_ldisc_ref_wait(tty);
|
||||
|
||||
Performs the same function as tty_ldisc_ref except that it will wait for an
|
||||
ldisc change to complete and then return a reference to the new ldisc.
|
||||
|
||||
While these functions are slightly slower than the old code they should have
|
||||
minimal impact as most receive logic uses the flip buffers and they only
|
||||
need to take a reference when they push bits up through the driver.
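The reference calls described above can be sketched as follows. This is only an illustration of the calling pattern, not kernel code: the structure definitions and the bodies of tty_ldisc_ref() and tty_ldisc_deref() are simplified userspace stand-ins (assumptions) so that the example compiles outside the kernel, and demo_push_to_ldisc(), demo_receive_buf() and demo_bytes_seen are hypothetical names.

```c
#include <stddef.h>

/* Userspace stand-ins (assumptions): only the fields used below are modelled. */
struct tty_struct;

struct tty_ldisc_ops {
	void (*receive_buf)(struct tty_struct *tty, const unsigned char *cp,
			    const char *fp, int count);
};

struct tty_ldisc {
	struct tty_ldisc_ops *ops;
	int users;			/* models the reference being held */
};

struct tty_struct {
	struct tty_ldisc *ldisc;	/* NULL models "no ldisc available" */
};

/* Stand-in: returns a handle, or NULL if no ldisc is currently usable. */
struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
{
	if (!tty->ldisc)
		return NULL;
	tty->ldisc->users++;
	return tty->ldisc;
}

/* Stand-in: gives the handle back; the ldisc may change again afterwards. */
void tty_ldisc_deref(struct tty_ldisc *ld)
{
	ld->users--;
}

/* Hypothetical ldisc method that only counts the bytes it is handed. */
int demo_bytes_seen;

void demo_receive_buf(struct tty_struct *tty, const unsigned char *cp,
		      const char *fp, int count)
{
	(void)tty; (void)cp; (void)fp;
	demo_bytes_seen += count;
}

/*
 * Hypothetical driver-side helper: take a reference, call into the ldisc
 * only while the reference is held, then return the reference.
 */
int demo_push_to_ldisc(struct tty_struct *tty, const unsigned char *buf,
		       int count)
{
	struct tty_ldisc *ld = tty_ldisc_ref(tty);

	if (!ld)
		return 0;	/* no ldisc attached, or a change is in progress */

	if (ld->ops->receive_buf)
		ld->ops->receive_buf(tty, buf, NULL, count);

	tty_ldisc_deref(ld);	/* no further ldisc calls after this point */
	return count;
}
```

The point of the pattern is that every call into the ldisc sits between a successful tty_ldisc_ref() and the matching tty_ldisc_deref(), and that a NULL return is treated as "nothing to call" rather than as a fatal error.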
|
||||
|
||||
A caution: The ldisc->open(), ldisc->close() and driver->set_ldisc
|
||||
functions are called with the ldisc unavailable. Thus tty_ldisc_ref will
|
||||
fail in this situation if used within these functions. Ldisc and driver
|
||||
code calling its own functions must be careful in this case.
|
||||
|
||||
|
||||
Driver Interface
|
||||
----------------
|
||||
|
||||
======================= =======================================================
|
||||
open() Called when a device is opened. May sleep
|
||||
|
||||
close() Called when a device is closed. At the point of
|
||||
return from this call the driver must make no
|
||||
further ldisc calls of any kind. May sleep
|
||||
|
||||
write() Called to write bytes to the device. May not
|
||||
sleep. May occur in parallel in special cases.
|
||||
Because this includes panic paths drivers generally
|
||||
shouldn't try and do clever locking here.
|
||||
|
||||
put_char() Stuff a single character onto the queue. The
|
||||
driver is guaranteed following up calls to
|
||||
flush_chars.
|
||||
|
||||
flush_chars() Ask the kernel to write put_char queue
|
||||
|
||||
write_room() Return the number of characters that can be stuffed
|
||||
into the port buffers without overflow (or less).
|
||||
The ldisc is responsible for being intelligent
|
||||
about multi-threading of write_room/write calls
|
||||
|
||||
ioctl() Called when an ioctl may be for the driver
|
||||
|
||||
set_termios() Called on termios change, serialized against
|
||||
itself by a semaphore. May sleep.
|
||||
|
||||
set_ldisc() Notifier for discipline change. At the point this
|
||||
is done the discipline is not yet usable. Can now
|
||||
sleep (I think)
|
||||
|
||||
throttle() Called by the ldisc to ask the driver to do flow
|
||||
control. Serialization including with unthrottle
|
||||
is the job of the ldisc layer.
|
||||
|
||||
unthrottle() Called by the ldisc to ask the driver to stop flow
|
||||
control.
|
||||
|
||||
stop() Ldisc notifier to the driver to stop output. As with
|
||||
throttle the serializations with start() are down
|
||||
to the ldisc layer.
|
||||
|
||||
start() Ldisc notifier to the driver to start output.
|
||||
|
||||
hangup() Ask the tty driver to cause a hangup initiated
|
||||
from the host side. [Can sleep ??]
|
||||
|
||||
break_ctl() Send RS232 break. Can sleep. Can get called in
|
||||
parallel, driver must serialize (for now), and
|
||||
with write calls.
|
||||
|
||||
wait_until_sent() Wait for characters to exit the hardware queue
|
||||
of the driver. Can sleep
|
||||
|
||||
send_xchar() Send XON/XOFF and if possible jump the queue with
|
||||
it in order to get fast flow control responses.
|
||||
Cannot sleep ??
|
||||
======================= =======================================================
|
49
Documentation/driver-api/sgi-ioc4.rst
Normal file
@@ -0,0 +1,49 @@
|
||||
====================================
|
||||
SGI IOC4 PCI (multi function) device
|
||||
====================================
|
||||
|
||||
The SGI IOC4 PCI device is a bit of a strange beast, so some notes on
|
||||
it are in order.
|
||||
|
||||
First, even though the IOC4 performs multiple functions, such as an
|
||||
IDE controller, a serial controller, a PS/2 keyboard/mouse controller,
|
||||
and an external interrupt mechanism, it's not implemented as a
|
||||
multifunction device. The consequence of this from a software
|
||||
standpoint is that all these functions share a single IRQ, and
|
||||
they can't all register to own the same PCI device ID. To make
|
||||
matters a bit worse, some of the register blocks (and even registers
|
||||
themselves) present in IOC4 are mixed-purpose between these several
|
||||
functions, meaning that there's no clear "owning" device driver.
|
||||
|
||||
The solution is to organize the IOC4 driver into several independent
|
||||
drivers, "ioc4", "sgiioc4", and "ioc4_serial". Note that there is no
|
||||
PS/2 controller driver as this functionality has never been wired up
|
||||
on a shipping IO card.
|
||||
|
||||
ioc4
|
||||
====
|
||||
This is the core (or shim) driver for IOC4. It is responsible for
|
||||
initializing the basic functionality of the chip, and allocating
|
||||
the PCI resources that are shared between the IOC4 functions.
|
||||
|
||||
This driver also provides registration functions that the other
|
||||
IOC4 drivers can call to make their presence known. Each driver
|
||||
needs to provide a probe and remove function, which are invoked
|
||||
by the core driver at appropriate times. The interface of these
|
||||
IOC4 function probe and remove operations isn't precisely the same
|
||||
as PCI device probe and remove operations, but is logically the
|
||||
same operation.
|
||||
|
||||
sgiioc4
|
||||
=======
|
||||
This is the IDE driver for IOC4. Its name isn't very descriptive
|
||||
simply for historical reasons (it used to be the only IOC4 driver
|
||||
component). There's not much to say about it other than it hooks
|
||||
up to the ioc4 driver via the appropriate registration, probe, and
|
||||
remove functions.
|
||||
|
||||
ioc4_serial
|
||||
===========
|
||||
This is the serial driver for IOC4. There's not much to say about it
|
||||
other than it hooks up to the ioc4 driver via the appropriate registration,
|
||||
probe, and remove functions.
|
74
Documentation/driver-api/sm501.rst
Normal file
@@ -0,0 +1,74 @@
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
============
|
||||
SM501 Driver
|
||||
============
|
||||
|
||||
:Copyright: |copy| 2006, 2007 Simtec Electronics
|
||||
|
||||
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
||||
which may provide numerous interfaces including USB host controller, USB gadget,
|
||||
asynchronous serial ports, audio functions, and a dual display video interface.
|
||||
The device may be connected by PCI or local bus with varying functions enabled.
|
||||
|
||||
Core
|
||||
----
|
||||
|
||||
The core driver in drivers/mfd provides common services for the
|
||||
drivers which manage the specific hardware blocks. These services
|
||||
include locking for common registers, clock control and resource
|
||||
management.
|
||||
|
||||
The core registers drivers for both PCI and generic bus based
|
||||
chips via the platform device and driver system.
|
||||
|
||||
On detection of a device, the core initialises the chip (which may
|
||||
be specified by the platform data) and then exports the selected
|
||||
peripheral set as platform devices for the specific drivers.
|
||||
|
||||
The core re-uses the platform device system as the platform device
|
||||
system provides enough features to support the drivers without the
|
||||
need to create a new bus-type and the associated code to go with it.
|
||||
|
||||
|
||||
Resources
|
||||
---------
|
||||
|
||||
Each peripheral has a view of the device which is implicitly narrowed to
|
||||
the specific set of resources that peripheral requires in order to
|
||||
function correctly.
|
||||
|
||||
The centralised memory allocation allows the driver to ensure that the
|
||||
maximum possible resource allocation can be made to the video subsystem
|
||||
as this is by far the most resource-sensitive of the on-chip functions.
|
||||
|
||||
The primary issue with memory allocation is that of moving the video
|
||||
buffers once a display mode is chosen. Indeed when a video mode change
|
||||
occurs the memory footprint of the video subsystem changes.
|
||||
|
||||
Since video memory is difficult to move without changing the display
|
||||
(unless sufficient contiguous memory can be provided for the old and new
|
||||
modes simultaneously) the video driver fully utilises the memory area
|
||||
given to it by aligning fb0 to the start of the area and fb1 to the end
|
||||
of it. Any memory left over in the middle is used for the acceleration
|
||||
functions, which are transient and thus their location is less critical
|
||||
as it can be moved.
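The placement rule described above (fb0 at the start of the area, fb1 at the end, the remainder in the middle for acceleration) can be written down as simple offset arithmetic. The sketch below is an illustration only: struct demo_vram_layout and demo_vram_place() are hypothetical names, not part of the SM501 driver, and alignment requirements are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical description of where each consumer lives inside the area. */
struct demo_vram_layout {
	size_t fb0_offset;	/* fb0 is aligned to the start of the area */
	size_t fb1_offset;	/* fb1 is aligned to the end of the area */
	size_t accel_offset;	/* left-over space between the two */
	size_t accel_size;
};

/*
 * Compute the layout for an area of area_size bytes.
 * Returns 0 on success, -1 if the two framebuffers do not fit.
 */
int demo_vram_place(size_t area_size, size_t fb0_size, size_t fb1_size,
		    struct demo_vram_layout *out)
{
	if (fb0_size > area_size || fb1_size > area_size - fb0_size)
		return -1;

	out->fb0_offset = 0;
	out->fb1_offset = area_size - fb1_size;
	out->accel_offset = fb0_size;
	out->accel_size = out->fb1_offset - fb0_size;
	return 0;
}
```

With this arrangement a mode change on one framebuffer grows or shrinks it towards the middle, so neither framebuffer has to be relocated; only the transient acceleration region changes size.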
|
||||
|
||||
|
||||
Configuration
|
||||
-------------
|
||||
|
||||
The platform device driver uses a set of platform data to pass
|
||||
configurations through to the core and the subsidiary drivers
|
||||
so that there can be support for more than one system carrying
|
||||
an SM501 built into a single kernel image.
|
||||
|
||||
The PCI driver assumes that the PCI card behaves as per the Silicon
|
||||
Motion reference design.
|
||||
|
||||
There is an erratum (AB-5) affecting the selection of the
|
||||
M1XCLK and M1CLK frequencies. These two clocks
|
||||
must be sourced from the same PLL, although they can then
|
||||
be divided down individually. If this is not set, then SM501 may
|
||||
lock and hang the whole system. The driver will refuse to
|
||||
attach if the PLL selection is different.
|
60
Documentation/driver-api/smsc_ece1099.rst
Normal file
@@ -0,0 +1,60 @@
|
||||
==================================================
|
||||
SMSC Keyboard Scan Expansion/GPIO Expansion device
|
||||
==================================================
|
||||
|
||||
What is smsc-ece1099?
|
||||
----------------------
|
||||
|
||||
The ECE1099 is a 40-Pin 3.3V Keyboard Scan Expansion
|
||||
or GPIO Expansion device. The device supports a keyboard
|
||||
scan matrix of 23x8. The device is connected to a Master
|
||||
via the SMSC BC-Link interface or via the SMBus.
|
||||
Keypad Scan Input (KSI) and Keypad Scan Output (KSO) signals
|
||||
are multiplexed with GPIOs.
|
||||
|
||||
Interrupt generation
|
||||
--------------------
|
||||
|
||||
Interrupts can be generated by an edge detection on a GPIO
|
||||
pin or an edge detection on one of the bus interface pins.
|
||||
Interrupts can also be detected on the keyboard scan interface.
|
||||
The bus interrupt pin (BC_INT# or SMBUS_INT#) is asserted if
|
||||
any bit in one of the Interrupt Status registers is 1 and
|
||||
the corresponding Interrupt Mask bit is also 1.
|
||||
|
||||
In order for software to determine which device is the source
|
||||
of an interrupt, it should first read the Group Interrupt Status Register
|
||||
to determine which Status register group is a source for the interrupt.
|
||||
Software should read both the Status register and the associated Mask register,
|
||||
then AND the two values together. Bits that are 1 in the result of the AND
|
||||
are active interrupts. Software clears an interrupt by writing a 1 to the
|
||||
corresponding bit in the Status register.
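The decode-and-clear sequence described above can be sketched as follows. The registers are simulated by a plain structure here; the 8-bit register width, the structure and all demo_* names are assumptions made for illustration and do not come from the ECE1099 driver.

```c
#include <stdint.h>

/* Simulated Status/Mask register pair of one interrupt group (assumption). */
struct demo_irq_group {
	uint8_t status;
	uint8_t mask;
};

/* Active interrupts are the bits set in both Status and Mask. */
uint8_t demo_active_irqs(const struct demo_irq_group *g)
{
	return g->status & g->mask;
}

/* Models a write to the Status register: writing 1 clears that bit. */
void demo_status_write(struct demo_irq_group *g, uint8_t value)
{
	g->status &= (uint8_t)~value;
}

/* Read, AND, then acknowledge exactly the interrupts that were active. */
uint8_t demo_service_group(struct demo_irq_group *g)
{
	uint8_t active = demo_active_irqs(g);

	demo_status_write(g, active);
	return active;
}
```

Only the bits that were both pending and unmasked are written back, so a masked status bit stays set and is still visible once its mask bit is enabled later.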
|
||||
|
||||
Communication Protocol
|
||||
----------------------
|
||||
|
||||
- SMBus Slave Interface
|
||||
The host processor communicates with the ECE1099 device
|
||||
through a series of read/write registers via the SMBus
|
||||
interface. SMBus is a serial communication protocol between
|
||||
a computer host and its peripheral devices. The SMBus data
|
||||
rate is 10 kHz minimum to 400 kHz maximum.
|
||||
|
||||
- Slave Bus Interface
|
||||
The ECE1099 device SMBus implementation is a subset of the
|
||||
SMBus interface to the host. The device is a slave-only SMBus device.
|
||||
The implementation in the device is a subset of SMBus since it
|
||||
only supports four protocols.
|
||||
|
||||
The Write Byte, Read Byte, Send Byte, and Receive Byte protocols are the
|
||||
only valid SMBus protocols for the device.
|
||||
|
||||
- BC-Link™ Interface
|
||||
The BC-Link is a proprietary bus that allows communication
|
||||
between a Master device and a Companion device. The Master
|
||||
device uses this serial bus to read and write registers
|
||||
located on the Companion device. The bus comprises three signals,
|
||||
BC_CLK, BC_DAT and BC_INT#. The Master device always provides the
|
||||
clock, BC_CLK, and the Companion device is the source for an
|
||||
independent asynchronous interrupt signal, BC_INT#. The ECE1099
|
||||
supports BC-Link speeds up to 24 MHz.
|
102
Documentation/driver-api/switchtec.rst
Normal file
@@ -0,0 +1,102 @@
|
||||
========================
|
||||
Linux Switchtec Support
|
||||
========================
|
||||
|
||||
Microsemi's "Switchtec" line of PCI switch devices is already
|
||||
supported by the kernel with standard PCI switch drivers. However, the
|
||||
Switchtec device advertises a special management endpoint which
|
||||
enables some additional functionality. This includes:
|
||||
|
||||
* Packet and Byte Counters
|
||||
* Firmware Upgrades
|
||||
* Event and Error logs
|
||||
* Querying port link status
|
||||
* Custom user firmware commands
|
||||
|
||||
The switchtec kernel module implements this functionality.
|
||||
|
||||
|
||||
Interface
|
||||
=========
|
||||
|
||||
The primary means of communicating with the Switchtec management firmware is
|
||||
through the Memory-mapped Remote Procedure Call (MRPC) interface.
|
||||
Commands are submitted to the interface with a 4-byte command
|
||||
identifier and up to 1KB of command specific data. The firmware will
|
||||
respond with a 4-byte return code and up to 1KB of command-specific
|
||||
data. The interface only processes a single command at a time.
|
||||
|
||||
|
||||
Userspace Interface
|
||||
===================
|
||||
|
||||
The MRPC interface will be exposed to userspace through a simple char
|
||||
device: /dev/switchtec#, one for each management endpoint in the system.
|
||||
|
||||
The char device has the following semantics:
|
||||
|
||||
* A write must consist of at least 4 bytes and no more than 1028 bytes.
|
||||
The first 4 bytes will be interpreted as the Command ID and the
|
||||
remainder will be used as the input data. A write will send the
|
||||
command to the firmware to begin processing.
|
||||
|
||||
* Each write must be followed by exactly one read. Any double write will
|
||||
produce an error and any read that doesn't follow a write will
|
||||
produce an error.
|
||||
|
||||
* A read will block until the firmware completes the command and return
|
||||
the 4-byte Command Return Value plus up to 1024 bytes of output
|
||||
data. (The length will be specified by the size parameter of the read
|
||||
call -- reading less than 4 bytes will produce an error.)
|
||||
|
||||
* The poll call will also be supported for userspace applications that
|
||||
need to do other things while waiting for the command to complete.
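The write format described above (a 4-byte Command ID followed by at most 1024 bytes of input data) can be illustrated with a small packing helper. This is a sketch under stated assumptions: demo_mrpc_pack() and DEMO_MRPC_MAX_DATA are hypothetical names, and the Command ID is copied in host byte order, which the text above does not specify.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MRPC_MAX_DATA 1024	/* "up to 1KB of command specific data" */

/*
 * Build the buffer for one write to the char device: Command ID first,
 * input data after it. Returns the number of bytes to write (4..1028),
 * or 0 if the data is too long or the output buffer is too small.
 */
size_t demo_mrpc_pack(uint32_t cmd_id, const void *data, size_t data_len,
		      uint8_t *out, size_t out_size)
{
	if (data_len > DEMO_MRPC_MAX_DATA ||
	    out_size < sizeof(cmd_id) + data_len)
		return 0;

	/* Byte order of the Command ID is an assumption (host order). */
	memcpy(out, &cmd_id, sizeof(cmd_id));
	if (data_len)
		memcpy(out + sizeof(cmd_id), data, data_len);

	return sizeof(cmd_id) + data_len;
}
```

The returned length would be passed to a single write() on the device node, and that write must then be followed by exactly one read() whose first 4 bytes hold the Command Return Value.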
|
||||
|
||||
The following IOCTLs are also supported by the device:
|
||||
|
||||
* SWITCHTEC_IOCTL_FLASH_INFO - Retrieve firmware length and number
|
||||
of partitions in the device.
|
||||
|
||||
* SWITCHTEC_IOCTL_FLASH_PART_INFO - Retrieve address and length for
|
||||
any specified partition in flash.
|
||||
|
||||
* SWITCHTEC_IOCTL_EVENT_SUMMARY - Read a structure of bitmaps
|
||||
indicating all uncleared events.
|
||||
|
||||
* SWITCHTEC_IOCTL_EVENT_CTL - Get the current count, clear and set flags
|
||||
for any event. This ioctl takes in a switchtec_ioctl_event_ctl struct
|
||||
with the event_id, index and flags set (index being the partition or PFF
|
||||
number for non-global events). It returns whether the event has
|
||||
occurred, the number of times and any event specific data. The flags
|
||||
can be used to clear the count or enable and disable actions to
|
||||
happen when the event occurs.
|
||||
By using the SWITCHTEC_IOCTL_EVENT_FLAG_EN_POLL flag,
|
||||
you can set an event to trigger a poll command to return with
|
||||
POLLPRI. In this way, userspace can wait for events to occur.
|
||||
|
||||
* SWITCHTEC_IOCTL_PFF_TO_PORT and SWITCHTEC_IOCTL_PORT_TO_PFF convert
|
||||
between PCI Function Framework number (used by the event system)
|
||||
and Switchtec Logic Port ID and Partition number (which is more
|
||||
user friendly).
|
||||
|
||||
|
||||
Non-Transparent Bridge (NTB) Driver
|
||||
===================================
|
||||
|
||||
An NTB hardware driver is provided for the Switchtec hardware in
|
||||
ntb_hw_switchtec. Currently, it only supports switches configured with
|
||||
exactly 2 NT partitions and zero or more non-NT partitions. It also requires
|
||||
the following configuration settings:
|
||||
|
||||
* Both NT partitions must be able to access each other's GAS spaces.
|
||||
Thus, the bits in the GAS Access Vector under Management Settings
|
||||
must be set to support this.
|
||||
* Kernel configuration MUST include support for NTB (CONFIG_NTB needs
|
||||
to be set)
|
||||
|
||||
NT EP BAR 2 will be dynamically configured as a Direct Window, and
|
||||
the configuration file does not need to configure it explicitly.
|
||||
|
||||
Please refer to Documentation/driver-api/ntb.rst in Linux source tree for an overall
|
||||
understanding of the Linux NTB stack. ntb_hw_switchtec works as an NTB
|
||||
Hardware Driver in this stack.
|
86
Documentation/driver-api/sync_file.rst
Normal file
@@ -0,0 +1,86 @@
|
||||
===================
|
||||
Sync File API Guide
|
||||
===================
|
||||
|
||||
:Author: Gustavo Padovan <gustavo at padovan dot org>
|
||||
|
||||
This document serves as a guide for device driver writers on what the
|
||||
sync_file API is, and how drivers can support it. Sync file is the carrier of
|
||||
the fences (struct dma_fence) that are needed to synchronize between drivers or
|
||||
across process boundaries.
|
||||
|
||||
The sync_file API is meant to be used to send and receive fence information
|
||||
to/from userspace. It enables userspace to do explicit fencing, where instead
|
||||
of attaching a fence to the buffer a producer driver (such as a GPU or V4L
|
||||
driver) sends the fence related to the buffer to userspace via a sync_file.
|
||||
|
||||
The sync_file then can be sent to the consumer (DRM driver for example), that
|
||||
will not use the buffer for anything before the fence(s) signals, i.e., the
|
||||
driver that issued the fence is not using/processing the buffer anymore, so it
|
||||
signals that the buffer is ready to use. And vice-versa for the consumer ->
|
||||
producer part of the cycle.
|
||||
|
||||
Sync files allow userspace awareness of buffer sharing synchronization between
|
||||
drivers.
|
||||
|
||||
Sync file was originally added in the Android kernel but current Linux Desktop
|
||||
can benefit a lot from it.
|
||||
|
||||
in-fences and out-fences
|
||||
------------------------
|
||||
|
||||
Sync files can go either to or from userspace. When a sync_file is sent from
|
||||
the driver to userspace we call the fences it contains 'out-fences'. They are
|
||||
related to a buffer that the driver is processing or is going to process, so
|
||||
the driver creates an out-fence to be able to notify, through
|
||||
dma_fence_signal(), when it has finished using (or processing) that buffer.
|
||||
Out-fences are fences that the driver creates.
|
||||
|
||||
On the other hand if the driver receives fence(s) through a sync_file from
|
||||
userspace we call these fence(s) 'in-fences'. Receiving in-fences means that
|
||||
we need to wait for the fence(s) to signal before using any buffer related to
|
||||
the in-fences.
|
||||
|
||||
Creating Sync Files
|
||||
-------------------
|
||||
|
||||
When a driver needs to send an out-fence to userspace it creates a sync_file.
|
||||
|
||||
Interface::
|
||||
|
||||
struct sync_file *sync_file_create(struct dma_fence *fence);
|
||||
|
||||
The caller passes the out-fence and gets back the sync_file. That is just the
|
||||
first step, next it needs to install an fd on sync_file->file. So it gets an
|
||||
fd::
|
||||
|
||||
fd = get_unused_fd_flags(O_CLOEXEC);
|
||||
|
||||
and installs it on sync_file->file::
|
||||
|
||||
fd_install(fd, sync_file->file);
|
||||
|
||||
The sync_file fd now can be sent to userspace.
|
||||
|
||||
If the creation process fails, or the sync_file needs to be released for any
|
||||
other reason fput(sync_file->file) should be used.
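Put together, the steps in this section, including the fput() error path, could look like the sketch below. It is not kernel code: the types and the bodies of sync_file_create(), get_unused_fd_flags(), fd_install() and fput() are userspace stand-ins (assumptions) so that the flow compiles on its own, and every demo_* name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

#ifndef O_CLOEXEC
#define O_CLOEXEC 02000000	/* fallback for strict ISO C builds (assumption) */
#endif

/* ---- Userspace stand-ins (assumptions), only to make the flow compile ---- */
struct file { int refs; };
struct dma_fence { int unused; };
struct sync_file { struct file *file; };

int demo_next_fd = 3;		/* set negative to simulate fd exhaustion */
int demo_fput_calls;
struct file *demo_installed;

struct sync_file *sync_file_create(struct dma_fence *fence)
{
	static struct file f;
	static struct sync_file sf;

	(void)fence;
	f.refs = 1;
	sf.file = &f;
	return &sf;
}

int get_unused_fd_flags(unsigned int flags)
{
	(void)flags;
	return demo_next_fd;
}

void fd_install(int fd, struct file *file)
{
	(void)fd;
	demo_installed = file;
}

void fput(struct file *file)
{
	file->refs--;
	demo_fput_calls++;
}

/* ---- The flow described in this section (hypothetical helper) ---- */
int demo_export_out_fence(struct dma_fence *fence)
{
	struct sync_file *sync_file;
	int fd;

	sync_file = sync_file_create(fence);
	if (!sync_file)
		return -ENOMEM;

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0) {
		fput(sync_file->file);	/* release the sync_file on failure */
		return fd;
	}

	fd_install(fd, sync_file->file);
	return fd;			/* this fd can now be sent to userspace */
}
```

Once fd_install() has run, the fd is owned by userspace, so the release with fput() belongs only on paths where the fd was never installed.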
|
||||
|
||||
Receiving Sync Files from Userspace
|
||||
-----------------------------------
|
||||
|
||||
When userspace needs to send an in-fence to the driver it passes the file descriptor
|
||||
of the Sync File to the kernel. The kernel can then retrieve the fences
|
||||
from it.
|
||||
|
||||
Interface::
|
||||
|
||||
struct dma_fence *sync_file_get_fence(int fd);
|
||||
|
||||
|
||||
The returned reference is owned by the caller and must be disposed of
|
||||
afterwards using dma_fence_put(). In case of error, NULL is returned instead.
|
||||
|
||||
References:
|
||||
|
||||
1. struct sync_file in include/linux/sync_file.h
|
||||
2. All interfaces mentioned above defined in include/linux/sync_file.h
|
414
Documentation/driver-api/vfio-mediated-device.rst
Normal file
@@ -0,0 +1,414 @@
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=====================
|
||||
VFIO Mediated devices
|
||||
=====================
|
||||
|
||||
:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
|
||||
:Author: Neo Jia <cjia@nvidia.com>
|
||||
:Author: Kirti Wankhede <kwankhede@nvidia.com>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License version 2 as
|
||||
published by the Free Software Foundation.
|
||||
|
||||
|
||||
Virtual Function I/O (VFIO) Mediated devices[1]
|
||||
===============================================
|
||||
|
||||
The number of use cases for virtualizing DMA devices that do not have built-in
|
||||
SR-IOV capability is increasing. Previously, to virtualize such devices,
|
||||
developers had to create their own management interfaces and APIs, and then
|
||||
integrate them with user space software. To simplify integration with user space
|
||||
software, we have identified common requirements and a unified management
|
||||
interface for such devices.
|
||||
|
||||
The VFIO driver framework provides unified APIs for direct device access. It is
|
||||
an IOMMU/device-agnostic framework for exposing direct device access to user
|
||||
space in a secure, IOMMU-protected environment. This framework is used for
|
||||
multiple devices, such as GPUs, network adapters, and compute accelerators. With
|
||||
direct device access, virtual machines or user space applications have direct
|
||||
access to the physical device. This framework is reused for mediated devices.
|
||||
|
||||
The mediated core driver provides a common interface for mediated device
|
||||
management that can be used by drivers of different devices. This module
|
||||
provides a generic interface to perform these operations:
|
||||
|
||||
* Create and destroy a mediated device
|
||||
* Add a mediated device to and remove it from a mediated bus driver
|
||||
* Add a mediated device to and remove it from an IOMMU group
|
||||
|
||||
The mediated core driver also provides an interface to register a bus driver.
|
||||
For example, the mediated VFIO mdev driver is designed for mediated devices and
|
||||
supports VFIO APIs. The mediated bus driver adds a mediated device to and
|
||||
removes it from a VFIO group.
|
||||
|
||||
The following high-level block diagram shows the main components and interfaces
|
||||
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
|
||||
devices as examples, as these devices are the first devices to use this module::
|
||||
|
||||
+---------------+
|
||||
| |
|
||||
| +-----------+ | mdev_register_driver() +--------------+
|
||||
| | | +<------------------------+ |
|
||||
| | mdev | | | |
|
||||
| | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user
|
||||
| | driver | | probe()/remove() | | APIs
|
||||
| | | | +--------------+
|
||||
| +-----------+ |
|
||||
| |
|
||||
| MDEV CORE |
|
||||
| MODULE |
|
||||
| mdev.ko |
|
||||
| +-----------+ | mdev_register_device() +--------------+
|
||||
| | | +<------------------------+ |
|
||||
| | | | | nvidia.ko |<-> physical
|
||||
| | | +------------------------>+ | device
|
||||
| | | | callbacks +--------------+
|
||||
| | Physical | |
|
||||
| | device | | mdev_register_device() +--------------+
|
||||
| | interface | |<------------------------+ |
|
||||
| | | | | i915.ko |<-> physical
|
||||
| | | +------------------------>+ | device
|
||||
| | | | callbacks +--------------+
|
||||
| | | |
|
||||
| | | | mdev_register_device() +--------------+
|
||||
| | | +<------------------------+ |
|
||||
| | | | | ccw_device.ko|<-> physical
|
||||
| | | +------------------------>+ | device
|
||||
| | | | callbacks +--------------+
|
||||
| +-----------+ |
|
||||
+---------------+
|
||||
|
||||
|
||||
Registration Interfaces
|
||||
=======================
|
||||
|
||||
The mediated core driver provides the following types of registration
|
||||
interfaces:
|
||||
|
||||
* Registration interface for a mediated bus driver
|
||||
* Physical device driver interface
|
||||
|
||||
Registration Interface for a Mediated Bus Driver
|
||||
------------------------------------------------
|
||||
|
||||
The registration interface for a mediated bus driver provides the following
|
||||
structure to represent a mediated device's driver::
|
||||
|
||||
/*
|
||||
* struct mdev_driver [2] - Mediated device's driver
|
||||
* @name: driver name
|
||||
* @probe: called when new device created
|
||||
* @remove: called when device removed
|
||||
* @driver: device driver structure
|
||||
*/
|
||||
struct mdev_driver {
|
||||
const char *name;
|
||||
int (*probe) (struct device *dev);
|
||||
void (*remove) (struct device *dev);
|
||||
struct device_driver driver;
|
||||
};
|
||||
|
||||
A mediated bus driver for mdev should use this structure in the function calls
|
||||
to register and unregister itself with the core driver:
|
||||
|
||||
* Register::
|
||||
|
||||
extern int mdev_register_driver(struct mdev_driver *drv,
|
||||
struct module *owner);
|
||||
|
||||
* Unregister::
|
||||
|
||||
extern void mdev_unregister_driver(struct mdev_driver *drv);
|
||||
|
||||
The mediated bus driver is responsible for adding mediated devices to the VFIO
|
||||
group when devices are bound to the driver and removing mediated devices from
|
||||
the VFIO group when devices are unbound from the driver.
|
||||
|
||||
|
||||
Physical Device Driver Interface
|
||||
--------------------------------
|
||||
|
||||
The physical device driver interface provides the mdev_parent_ops[3] structure
|
||||
to define the APIs to manage work in the mediated core driver that is related
|
||||
to the physical device.
|
||||
|
||||
The structures in the mdev_parent_ops structure are as follows:
|
||||
|
||||
* dev_attr_groups: attributes of the parent device
|
||||
* mdev_attr_groups: attributes of the mediated device
|
||||
* supported_config: attributes to define supported configurations
|
||||
|
||||
The functions in the mdev_parent_ops structure are as follows:
|
||||
|
||||
* create: allocate basic resources in a driver for a mediated device
|
||||
* remove: free resources in a driver when a mediated device is destroyed
|
||||
|
||||
(Note that mdev-core provides no implicit serialization of create/remove
|
||||
callbacks per mdev parent device, per mdev type, or any other categorization.
|
||||
Vendor drivers are expected to be fully asynchronous in this respect or
|
||||
provide their own internal resource protection.)
|
||||
|
||||
The callbacks in the mdev_parent_ops structure are as follows:
|
||||
|
||||
* open: open callback of mediated device
|
||||
* close: close callback of mediated device
|
||||
* ioctl: ioctl callback of mediated device
|
||||
* read: read emulation callback
|
||||
* write: write emulation callback
|
||||
* mmap: mmap emulation callback
|
||||
|
||||
A driver should use the mdev_parent_ops structure in the function call to
|
||||
register itself with the mdev core driver::
|
||||
|
||||
extern int mdev_register_device(struct device *dev,
|
||||
const struct mdev_parent_ops *ops);
|
||||
|
||||
However, the mdev_parent_ops structure is not required in the function call
|
||||
that a driver should use to unregister itself with the mdev core driver::
|
||||
|
||||
extern void mdev_unregister_device(struct device *dev);
|
||||
|
||||
|
||||
Mediated Device Management Interface Through sysfs
|
||||
==================================================
|
||||
|
||||
The management interface through sysfs enables user space software, such as
|
||||
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
|
||||
This management interface provides flexibility to the underlying physical
|
||||
device's driver to support features such as:
|
||||
|
||||
* Mediated device hot plug
|
||||
* Multiple mediated devices in a single virtual machine
|
||||
* Multiple mediated devices from different physical devices
|
||||
|
||||
Links in the mdev_bus Class Directory
|
||||
-------------------------------------
|
||||
The /sys/class/mdev_bus/ directory contains links to devices that are registered
|
||||
with the mdev core driver.
|
||||
|
||||
Directories and Files Under the sysfs for Each Physical Device
|
||||
--------------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]
|
||||
|
||||
* [mdev_supported_types]
|
||||
|
||||
The list of currently supported mediated device types and their details.
|
||||
|
||||
[<type-id>], device_api, and available_instances are mandatory attributes
|
||||
that should be provided by the vendor driver.
|
||||
|
||||
* [<type-id>]
|
||||
|
||||
The [<type-id>] name is created by adding the device driver string as a prefix
|
||||
to the string provided by the vendor driver. The format of this name is as
|
||||
follows::
|
||||
|
||||
sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
|
||||
|
||||
(or using mdev_parent_dev(mdev) to arrive at the parent device outside
|
||||
of the core mdev code)
|
||||
|
||||
* device_api
|
||||
|
||||
This attribute should show which device API is being created, for example,
|
||||
"vfio-pci" for a PCI device.
|
||||
|
||||
* available_instances
|
||||
|
||||
This attribute should show the number of devices of type <type-id> that can be
|
||||
created.
|
||||
|
||||
* [devices]
|
||||
|
||||
This directory contains links to the devices of type <type-id> that have been
|
||||
created.
|
||||
|
||||
* name
|
||||
|
||||
This attribute should show a human-readable name. This is an optional attribute.
|
||||
|
||||
* description
|
||||
|
||||
This attribute should show a brief description of the type and its features.
This is an optional attribute.
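The [<type-id>] naming rule described above can be tried out in user space. The following is a minimal sketch, not kernel code: the helper name mdev_format_type_id() is hypothetical, and it only mirrors the "%s-%s" format quoted in this section, with a bounds check added.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a "<driver>-<group>" type-id string using the same "%s-%s"
 * format that the documentation quotes. Hypothetical userspace helper,
 * not part of the mdev API.
 *
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int mdev_format_type_id(char *buf, size_t len,
                        const char *driver, const char *group)
{
    int n;

    assert(buf != NULL && driver != NULL && group != NULL);

    n = snprintf(buf, len, "%s-%s", driver, group);
    if (n < 0 || (size_t)n >= len)
        return -1;      /* encoding error or truncated output */

    return 0;
}
```

With the mtty sample driver, whose type directories are mtty-1 and mtty-2, calling the helper with "mtty" and "2" yields "mtty-2".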
|
||||
|
||||
Directories and Files Under the sysfs for Each mdev Device
|
||||
----------------------------------------------------------
|
||||
|
||||
::
|
||||
|
||||
  |- [parent phy device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]
|
||||
|
||||
* remove (write only)
|
||||
|
||||
Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can
|
||||
fail the remove() callback if that device is active and the vendor driver
|
||||
doesn't support hot unplug.
|
||||
|
||||
Example::
|
||||
|
||||
# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
|
||||
|
||||
Mediated Device Hot Plug
|
||||
------------------------
|
||||
|
||||
Mediated devices can be created and assigned at runtime. The procedure to hot
|
||||
plug a mediated device is the same as the procedure to hot plug a PCI device.
|
||||
|
||||
Translation APIs for Mediated Devices
|
||||
=====================================
|
||||
|
||||
The following APIs are provided for translating user pfn to host pfn in a VFIO
|
||||
driver::
|
||||
|
||||
extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
|
||||
int npage, int prot, unsigned long *phys_pfn);
|
||||
|
||||
extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
|
||||
int npage);
|
||||
|
||||
These functions call back into the back-end IOMMU module by using the pin_pages
|
||||
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
|
||||
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
|
||||
other IOMMU backend modules, such as the PPC64 sPAPR module, they need to provide
|
||||
these two callback functions.
|
||||
|
||||
Using the Sample Code
|
||||
=====================
|
||||
|
||||
mtty.c in the samples/vfio-mdev/ directory is a sample driver program to
|
||||
demonstrate how to use the mediated device framework.
|
||||
|
||||
The sample driver creates an mdev device that simulates a serial port over a PCI
|
||||
card.
|
||||
|
||||
1. Build and load the mtty.ko module.
|
||||
|
||||
This step creates a dummy device, /sys/devices/virtual/mtty/mtty/
|
||||
|
||||
Files in this device directory in sysfs are similar to the following::
|
||||
|
||||
     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent
|
||||
|
||||
2. Create a mediated device by using the dummy device that you created in the
|
||||
previous step::
|
||||
|
||||
# echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
|
||||
/sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
|
||||
|
||||
3. Add parameters to qemu-kvm::
|
||||
|
||||
-device vfio-pci,\
|
||||
sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
|
||||
|
||||
4. Boot the VM.
|
||||
|
||||
In the Linux guest VM, with no hardware on the host, the device appears
|
||||
as follows::
|
||||
|
||||
# lspci -s 00:05.0 -xxvv
|
||||
00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
|
||||
Subsystem: Device 4348:3253
|
||||
Physical Slot: 5
|
||||
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
|
||||
Stepping- SERR- FastB2B- DisINTx-
|
||||
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
|
||||
<TAbort- <MAbort- >SERR- <PERR- INTx-
|
||||
Interrupt: pin A routed to IRQ 10
|
||||
Region 0: I/O ports at c150 [size=8]
|
||||
Region 1: I/O ports at c158 [size=8]
|
||||
Kernel driver in use: serial
|
||||
00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
|
||||
10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
|
||||
20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
|
||||
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00
|
||||
|
||||
In the Linux guest VM, dmesg output for the device is as follows::
|
||||
|
||||
serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
|
||||
0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
|
||||
0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A
|
||||
|
||||
|
||||
5. In the Linux guest VM, check the serial ports::
|
||||
|
||||
# setserial -g /dev/ttyS*
|
||||
/dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
|
||||
/dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
|
||||
/dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10
|
||||
|
||||
6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
|
||||
/dev/ttyS2 with hardware flow control disabled.
|
||||
|
||||
7. Type data on the minicom terminal or send data to the terminal emulation
|
||||
program and read the data.
|
||||
|
||||
Data is looped back from the host's mtty driver.
|
||||
|
||||
8. Destroy the mediated device that you created::
|
||||
|
||||
# echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
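Steps 2 and 8 above write to sysfs attributes from a shell, and a management program can do the same from C. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the paths follow the layout shown in this document, and error reporting is reduced to 0 or -1.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<parent>/mdev_supported_types/<type-id>/create". */
int mdev_create_path(char *buf, size_t len,
                     const char *parent, const char *type_id)
{
    int n;

    assert(buf != NULL && parent != NULL && type_id != NULL);

    n = snprintf(buf, len, "%s/mdev_supported_types/%s/create",
                 parent, type_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build "/sys/bus/mdev/devices/<uuid>/remove". */
int mdev_remove_path(char *buf, size_t len, const char *uuid)
{
    int n;

    assert(buf != NULL && uuid != NULL);

    n = snprintf(buf, len, "/sys/bus/mdev/devices/%s/remove", uuid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a short string to a sysfs attribute, as "echo value > path" does.
 * Returns 0 if the whole value was written, -1 otherwise.
 */
int sysfs_write(const char *path, const char *value)
{
    size_t len;
    ssize_t written;
    int fd;

    assert(path != NULL && value != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    len = strlen(value);
    written = write(fd, value, len);
    close(fd);

    return (written == (ssize_t)len) ? 0 : -1;
}
```

To create the device from step 2, a caller would build the create path for parent /sys/devices/virtual/mtty/mtty and type mtty-2, then pass the UUID to sysfs_write(). To destroy it as in step 8, the caller would build the remove path for the same UUID and write "1".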
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
1. See Documentation/driver-api/vfio.rst for more information on VFIO.
|
||||
2. struct mdev_driver in include/linux/mdev.h
|
||||
3. struct mdev_parent_ops in include/linux/mdev.h
|
||||
4. struct vfio_iommu_driver_ops in include/linux/vfio.h
|
520
Documentation/driver-api/vfio.rst
Normal file
@@ -0,0 +1,520 @@
|
||||
==================================
|
||||
VFIO - "Virtual Function I/O" [1]_
|
||||
==================================
|
||||
|
||||
Many modern systems now provide DMA and interrupt remapping facilities
|
||||
to help ensure I/O devices behave within the boundaries they've been
|
||||
allotted. This includes x86 hardware with AMD-Vi and Intel VT-d,
|
||||
POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
|
||||
systems such as Freescale PAMU. The VFIO driver is an IOMMU/device
|
||||
agnostic framework for exposing direct device access to userspace, in
|
||||
a secure, IOMMU protected environment. In other words, this allows
|
||||
safe [2]_, non-privileged, userspace drivers.
|
||||
|
||||
Why do we want that? Virtual machines often make use of direct device
|
||||
access ("device assignment") when configured for the highest possible
|
||||
I/O performance. From a device and host perspective, this simply
|
||||
turns the VM into a userspace driver, with the benefits of
|
||||
significantly reduced latency, higher bandwidth, and direct use of
|
||||
bare-metal device drivers [3]_.
|
||||
|
||||
Some applications, particularly in the high performance computing
|
||||
field, also benefit from low-overhead, direct device access from
|
||||
userspace. Examples include network adapters (often non-TCP/IP based)
|
||||
and compute accelerators. Prior to VFIO, these drivers had to either
|
||||
go through the full development cycle to become a proper upstream
|
||||
driver, be maintained out of tree, or make use of the UIO framework,
|
||||
which has no notion of IOMMU protection, limited interrupt support,
|
||||
and requires root privileges to access things like PCI configuration
|
||||
space.
|
||||
|
||||
The VFIO driver framework intends to unify these, replacing the KVM
PCI-specific device assignment code and providing a more secure, more
featureful userspace driver environment than UIO.
|
||||
|
||||
Groups, Devices, and IOMMUs
|
||||
---------------------------
|
||||
|
||||
Devices are the main target of any I/O driver. Devices typically
|
||||
create a programming interface made up of I/O access, interrupts,
|
||||
and DMA. Without going into the details of each of these, DMA is
|
||||
by far the most critical aspect for maintaining a secure environment
|
||||
as allowing a device read-write access to system memory imposes the
|
||||
greatest risk to the overall system integrity.
|
||||
|
||||
To help mitigate this risk, many modern IOMMUs now incorporate
|
||||
isolation properties into what was, in many cases, an interface only
|
||||
meant for translation (i.e., solving the addressing problems of devices
|
||||
with limited address spaces). With this, devices can now be isolated
|
||||
from each other and from arbitrary memory access, thus allowing
|
||||
things like secure direct assignment of devices into virtual machines.
|
||||
|
||||
This isolation is not always at the granularity of a single device
|
||||
though. Even when an IOMMU is capable of this, properties of devices,
|
||||
interconnects, and IOMMU topologies can each reduce this isolation.
|
||||
For instance, an individual device may be part of a larger multi-
|
||||
function enclosure. While the IOMMU may be able to distinguish
|
||||
between devices within the enclosure, the enclosure may not require
|
||||
transactions between devices to reach the IOMMU. Examples of this
|
||||
could be anything from a multi-function PCI device with backdoors
|
||||
between functions to a non-PCI-ACS (Access Control Services) capable
|
||||
bridge allowing redirection without reaching the IOMMU. Topology
|
||||
can also play a factor in terms of hiding devices. A PCIe-to-PCI
|
||||
bridge masks the devices behind it, making transactions appear as if
|
||||
from the bridge itself. Obviously IOMMU design plays a major factor
|
||||
as well.
|
||||
|
||||
Therefore, while for the most part an IOMMU may have device level
|
||||
granularity, any system is susceptible to reduced granularity. The
|
||||
IOMMU API therefore supports a notion of IOMMU groups. A group is
|
||||
a set of devices which is isolatable from all other devices in the
|
||||
system. Groups are therefore the unit of ownership used by VFIO.
|
||||
|
||||
While the group is the minimum granularity that must be used to
|
||||
ensure secure user access, it's not necessarily the preferred
|
||||
granularity. In IOMMUs which make use of page tables, it may be
|
||||
possible to share a set of page tables between different groups,
|
||||
reducing the overhead both to the platform (reduced TLB thrashing,
|
||||
reduced duplicate page tables), and to the user (programming only
|
||||
a single set of translations). For this reason, VFIO makes use of
|
||||
a container class, which may hold one or more groups. A container
|
||||
is created by simply opening the /dev/vfio/vfio character device.
|
||||
|
||||
On its own, the container provides little functionality, with all
|
||||
but a couple version and extension query interfaces locked away.
|
||||
The user needs to add a group into the container for the next level
|
||||
of functionality. To do this, the user first needs to identify the
|
||||
group associated with the desired device. This can be done using
|
||||
the sysfs links described in the example below. By unbinding the
|
||||
device from the host driver and binding it to a VFIO driver, a new
|
||||
VFIO group will appear for the group as /dev/vfio/$GROUP, where
|
||||
$GROUP is the IOMMU group number of which the device is a member.
|
||||
If the IOMMU group contains multiple devices, each will need to
|
||||
be bound to a VFIO driver before operations on the VFIO group
|
||||
are allowed (it's also sufficient to only unbind the device from
|
||||
host drivers if a VFIO driver is unavailable; this will make the
|
||||
group available, but not that particular device). TBD - interface
|
||||
for disabling driver probing/locking a device.
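The group number used in /dev/vfio/$GROUP is the final path component of the device's iommu_group link in sysfs. The following is a minimal sketch of extracting it from a link target that has already been read, for example with readlink(2). The helper name is hypothetical, and the link format is assumed to match the one shown in the usage example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the IOMMU group number from an iommu_group link target such as
 * "../../../../kernel/iommu_groups/26".
 *
 * Returns the group number, or -1 if the last path component is not a
 * plain decimal number.
 */
long iommu_group_from_link(const char *target)
{
    const char *base;
    char *end;
    long group;

    assert(target != NULL);

    /* The group number is whatever follows the last '/'. */
    base = strrchr(target, '/');
    base = base ? base + 1 : target;

    /* Reject empty components, signs, and leading whitespace. */
    if (*base < '0' || *base > '9')
        return -1;

    errno = 0;
    group = strtol(base, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;

    return group;
}
```

The returned number can then be formatted into a /dev/vfio/<group> path before the group character device is opened.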
|
||||
|
||||
Once the group is ready, it may be added to the container by opening
|
||||
the VFIO group character device (/dev/vfio/$GROUP) and using the
|
||||
VFIO_GROUP_SET_CONTAINER ioctl, passing the file descriptor of the
|
||||
previously opened container file. If desired and if the IOMMU driver
|
||||
supports sharing the IOMMU context between groups, multiple groups may
|
||||
be set to the same container. If a group fails to set to a container
|
||||
with existing groups, a new empty container will need to be used
|
||||
instead.
|
||||
|
||||
With a group (or groups) attached to a container, the remaining
|
||||
ioctls become available, enabling access to the VFIO IOMMU interfaces.
|
||||
Additionally, it now becomes possible to get file descriptors for each
|
||||
device within a group using an ioctl on the VFIO group file descriptor.
|
||||
|
||||
The VFIO device API includes ioctls for describing the device, the I/O
|
||||
regions and their read/write/mmap offsets on the device descriptor, as
|
||||
well as mechanisms for describing and registering interrupt
|
||||
notifications.
|
||||
|
||||
VFIO Usage Example
|
||||
------------------
|
||||
|
||||
Assume the user wants to access PCI device 0000:06:0d.0::
|
||||
|
||||
$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
|
||||
../../../../kernel/iommu_groups/26
|
||||
|
||||
This device is therefore in IOMMU group 26. This device is on the
|
||||
PCI bus, therefore the user will make use of vfio-pci to manage the
|
||||
group::
|
||||
|
||||
# modprobe vfio-pci
|
||||
|
||||
Binding this device to the vfio-pci driver creates the VFIO group
|
||||
character devices for this group::
|
||||
|
||||
$ lspci -n -s 0000:06:0d.0
|
||||
06:0d.0 0401: 1102:0002 (rev 08)
|
||||
# echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
|
||||
# echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
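The value written to new_id is the vendor and device ID pair that lspci -n prints as 1102:0002, with the colon replaced by a space. The following is a minimal sketch of that conversion for a program that automates the binding. The helper name is hypothetical, and only the two-field form used in this example is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Convert a "vvvv:dddd" vendor:device pair (as printed by "lspci -n")
 * into the "vvvv dddd" form written to a driver's new_id file.
 *
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
int pci_new_id_string(char *buf, size_t len, const char *vendor_device)
{
    unsigned int vendor, device;
    char trailing;
    int n;

    assert(buf != NULL && vendor_device != NULL);

    /* Exactly two hex fields must match; any trailing character is an error. */
    if (sscanf(vendor_device, "%4x:%4x%c", &vendor, &device, &trailing) != 2)
        return -1;

    n = snprintf(buf, len, "%04x %04x", vendor, device);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the device in this example, the input "1102:0002" produces "1102 0002", which matches the string echoed into new_id above.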
|
||||
|
||||
Now we need to look at what other devices are in the group to free
|
||||
it for use by VFIO::
|
||||
|
||||
$ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices
|
||||
total 0
|
||||
lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 ->
|
||||
../../../../devices/pci0000:00/0000:00:1e.0
|
||||
lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 ->
|
||||
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
|
||||
lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 ->
|
||||
../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
|
||||
|
||||
This device is behind a PCIe-to-PCI bridge [4]_, therefore we also
|
||||
need to add device 0000:06:0d.1 to the group following the same
|
||||
procedure as above. Device 0000:00:1e.0 is a bridge that does
|
||||
not currently have a host driver, therefore it's not required to
|
||||
bind this device to the vfio-pci driver (vfio-pci does not currently
|
||||
support PCI bridges).
|
||||
|
||||
The final step is to provide the user with access to the group if
|
||||
unprivileged operation is desired (note that /dev/vfio/vfio provides
|
||||
no capabilities on its own and is therefore expected to be set to
|
||||
mode 0666 by the system)::
|
||||
|
||||
# chown user:user /dev/vfio/26
|
||||
|
||||
The user now has full access to all the devices and the iommu for this
|
||||
group and can access them as follows::
|
||||
|
||||
int container, group, device, i;
|
||||
struct vfio_group_status group_status =
|
||||
{ .argsz = sizeof(group_status) };
|
||||
struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
|
||||
struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
|
||||
struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
|
||||
|
||||
/* Create a new container */
|
||||
container = open("/dev/vfio/vfio", O_RDWR);
|
||||
|
||||
if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
|
||||
/* Unknown API version */
|
||||
|
||||
if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
|
||||
/* Doesn't support the IOMMU driver we want. */
|
||||
|
||||
/* Open the group */
|
||||
group = open("/dev/vfio/26", O_RDWR);
|
||||
|
||||
/* Test the group is viable and available */
|
||||
ioctl(group, VFIO_GROUP_GET_STATUS, &group_status);
|
||||
|
||||
if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE))
|
||||
/* Group is not viable (ie, not all devices bound for vfio) */
|
||||
|
||||
/* Add the group to the container */
|
||||
ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
|
||||
|
||||
/* Enable the IOMMU model we want */
|
||||
ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
|
||||
|
||||
/* Get additional IOMMU info */
|
||||
ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);
|
||||
|
||||
/* Allocate some space and setup a DMA mapping */
|
||||
dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
|
||||
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
|
||||
dma_map.size = 1024 * 1024;
|
||||
dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
|
||||
dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
|
||||
|
||||
ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
|
||||
|
||||
/* Get a file descriptor for the device */
|
||||
device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
|
||||
|
||||
/* Test and setup the device */
|
||||
ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
|
||||
|
||||
for (i = 0; i < device_info.num_regions; i++) {
|
||||
struct vfio_region_info reg = { .argsz = sizeof(reg) };
|
||||
|
||||
reg.index = i;
|
||||
|
||||
ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
|
||||
|
||||
/* Setup mappings... read/write offsets, mmaps
|
||||
* For PCI devices, config space is a region */
|
||||
}
|
||||
|
||||
for (i = 0; i < device_info.num_irqs; i++) {
|
||||
struct vfio_irq_info irq = { .argsz = sizeof(irq) };
|
||||
|
||||
irq.index = i;
|
||||
|
||||
ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
|
||||
|
||||
/* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */
|
||||
}
|
||||
|
||||
/* Gratuitous device reset and go... */
|
||||
ioctl(device, VFIO_DEVICE_RESET);
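The example above leaves its error branches as comments. The following is a minimal sketch of how the first steps (opening the container and checking the API version and the Type1 IOMMU extension) might look with those branches filled in. The function name and the choice of error codes are assumptions made for illustration, and the container path is a parameter so that the failure path can be exercised without VFIO hardware.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Open a VFIO container and verify that it speaks the expected API
 * version and supports the Type1 IOMMU model.
 *
 * Returns the container file descriptor on success, or a negative
 * errno-style value on failure (the specific codes are illustrative).
 */
int vfio_open_type1_container(const char *path)
{
    int container;

    assert(path != NULL);

    container = open(path, O_RDWR);
    if (container < 0)
        return -errno;

    /* Unknown API version */
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
        close(container);
        return -EINVAL;
    }

    /* Doesn't support the IOMMU driver we want */
    if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0) {
        close(container);
        return -ENODEV;
    }

    return container;
}
```

A caller would normally pass "/dev/vfio/vfio" and continue with the group steps only when the returned value is non-negative.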
|
||||
|
||||
VFIO User API
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
Please see include/linux/vfio.h for complete API documentation.
|
||||
|
||||
VFIO bus driver API
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
VFIO bus drivers, such as vfio-pci, make use of only a few interfaces
|
||||
into VFIO core. When devices are bound to and unbound from the driver,
|
||||
the driver should call vfio_add_group_dev() and vfio_del_group_dev()
|
||||
respectively::
|
||||
|
||||
extern int vfio_add_group_dev(struct device *dev,
|
||||
const struct vfio_device_ops *ops,
|
||||
void *device_data);
|
||||
|
||||
extern void *vfio_del_group_dev(struct device *dev);
|
||||
|
||||
vfio_add_group_dev() indicates to the core to begin tracking the
|
||||
iommu_group of the specified dev and register the dev as owned by
|
||||
a VFIO bus driver. The driver provides an ops structure for callbacks
|
||||
similar to a file operations structure::
|
||||
|
||||
struct vfio_device_ops {
|
||||
int (*open)(void *device_data);
|
||||
void (*release)(void *device_data);
|
||||
ssize_t (*read)(void *device_data, char __user *buf,
|
||||
size_t count, loff_t *ppos);
|
||||
ssize_t (*write)(void *device_data, const char __user *buf,
|
||||
size_t size, loff_t *ppos);
|
||||
long (*ioctl)(void *device_data, unsigned int cmd,
|
||||
unsigned long arg);
|
||||
int (*mmap)(void *device_data, struct vm_area_struct *vma);
|
||||
};
|
||||
|
||||
Each function is passed the device_data that was originally registered
|
||||
in the vfio_add_group_dev() call above. This allows the bus driver
|
||||
an easy place to store its opaque, private data. The open/release
|
||||
callbacks are issued when a new file descriptor is created for a
|
||||
device (via VFIO_GROUP_GET_DEVICE_FD). The ioctl interface provides
|
||||
a direct pass through for VFIO_DEVICE_* ioctls. The read/write/mmap
|
||||
interfaces implement the device region access defined by the device's
|
||||
own VFIO_DEVICE_GET_REGION_INFO ioctl.
|
||||
|
||||
|
||||
PPC64 sPAPR implementation note
|
||||
-------------------------------
|
||||
|
||||
This implementation has some specifics:
|
||||
|
||||
1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
|
||||
container is supported as an IOMMU table is allocated at the boot time,
|
||||
one table per IOMMU group which is a Partitionable Endpoint (PE)
|
||||
(PE is often a PCI domain but not always).
|
||||
|
||||
Newer systems (POWER8 with IODA2) have an improved hardware design that removes
this limitation and allows multiple IOMMU groups per VFIO container.
|
||||
|
||||
2) The hardware supports so-called DMA windows: the PCI address range within
which DMA transfers are allowed. Any attempt to access address space outside
the window leads to isolation of the whole PE.
|
||||
|
||||
3) PPC64 guests are paravirtualized but not fully emulated. There is an API
|
||||
to map/unmap pages for DMA, and it normally maps 1..32 pages per call and
|
||||
currently there is no way to reduce the number of calls. In order to make
|
||||
things faster, the map/unmap handling has been implemented in real mode
|
||||
which provides excellent performance but has limitations, such as the
inability to do locked-pages accounting in real time.
|
||||
|
||||
4) According to the sPAPR specification, a Partitionable Endpoint (PE) is an I/O
|
||||
subtree that can be treated as a unit for the purposes of partitioning and
|
||||
error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
|
||||
function of a multi-function IOA, or multiple IOAs (possibly including
|
||||
switch and bridge structures above the multiple IOAs). PPC64 guests detect
|
||||
PCI errors and recover from them via EEH RTAS services, which work on the
|
||||
basis of additional ioctl commands.
|
||||
|
||||
So 4 additional ioctls have been added:
|
||||
|
||||
VFIO_IOMMU_SPAPR_TCE_GET_INFO
|
||||
returns the size and the start of the DMA window on the PCI bus.
|
||||
|
||||
VFIO_IOMMU_ENABLE
|
||||
enables the container. The locked pages accounting
|
||||
is done at this point. This lets the user first learn what
|
||||
the DMA window is and adjust rlimit before doing any real job.
|
||||
|
||||
VFIO_IOMMU_DISABLE
|
||||
disables the container.
|
||||
|
||||
VFIO_EEH_PE_OP
|
||||
provides an API for EEH setup, error detection and recovery.
|
||||
|
||||
The code flow from the example above should be slightly changed::
|
||||
|
||||
struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 };
|
||||
|
||||
.....
|
||||
/* Add the group to the container */
|
||||
ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
|
||||
|
||||
/* Enable the IOMMU model we want */
|
||||
ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
|
||||
|
||||
/* Get additional sPAPR IOMMU info */
|
||||
struct vfio_iommu_spapr_tce_info spapr_iommu_info = { .argsz = sizeof(spapr_iommu_info) };
|
||||
ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
|
||||
|
||||
if (ioctl(container, VFIO_IOMMU_ENABLE))
|
||||
/* Cannot enable container, may be low rlimit */
|
||||
|
||||
/* Allocate some space and setup a DMA mapping */
|
||||
dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
|
||||
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
|
||||
|
||||
dma_map.size = 1024 * 1024;
|
||||
dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
|
||||
dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
|
||||
|
||||
/* Check here that .iova/.size are within the DMA window from spapr_iommu_info */
|
||||
ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
|
||||
|
||||
/* Get a file descriptor for the device */
|
||||
device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
|
||||
|
||||
....
|
||||
|
||||
/* Gratuitous device reset and go... */
|
||||
ioctl(device, VFIO_DEVICE_RESET);
|
||||
|
||||
/* Make sure EEH is supported */
|
||||
ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
|
||||
|
||||
/* Enable the EEH functionality on the device */
|
||||
pe_op.op = VFIO_EEH_PE_ENABLE;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/* It is suggested that you create an additional data struct to represent
|
||||
* PE, and put child devices belonging to same IOMMU group to the
|
||||
* PE instance for later reference.
|
||||
*/
|
||||
|
||||
/* Check the PE's state and make sure it's in functional state */
|
||||
pe_op.op = VFIO_EEH_PE_GET_STATE;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/* Save device state using pci_save_state().
|
||||
* EEH should be enabled on the specified device.
|
||||
*/
|
||||
|
||||
....
|
||||
|
||||
/* Inject EEH error, which is expected to be caused by 32-bit
|
||||
* config load.
|
||||
*/
|
||||
pe_op.op = VFIO_EEH_PE_INJECT_ERR;
|
||||
pe_op.err.type = EEH_ERR_TYPE_32;
|
||||
pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR;
|
||||
pe_op.err.addr = 0ul;
|
||||
pe_op.err.mask = 0ul;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
....
|
||||
|
||||
/* When 0xFF's are returned from reading PCI config space or IO BARs
* of the PCI device, check the PE's state to see if it has been
* frozen.
|
||||
*/
|
||||
pe_op.op = VFIO_EEH_PE_GET_STATE;
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/* Wait for pending PCI transactions to be completed and don't
|
||||
* produce any more PCI traffic from/to the affected PE until
|
||||
* recovery is finished.
|
||||
*/
|
||||
|
||||
/* Enable IO for the affected PE and collect logs. Usually, the
|
||||
* standard part of PCI config space, AER registers are dumped
|
||||
* as logs for further analysis.
|
||||
*/
|
||||
pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/*
|
||||
* Issue PE reset: hot or fundamental reset. Usually, hot reset
|
||||
* is enough. However, the firmware of some PCI adapters would
|
||||
* require fundamental reset.
|
||||
*/
|
||||
pe_op.op = VFIO_EEH_PE_RESET_HOT;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/* Configure the PCI bridges for the affected PE */
|
||||
pe_op.op = VFIO_EEH_PE_CONFIGURE;
|
||||
ioctl(container, VFIO_EEH_PE_OP, &pe_op);
|
||||
|
||||
/* Restore the state we saved at initialization time. pci_restore_state()
|
||||
* is good enough as an example.
|
||||
*/
|
||||
|
||||
/* Hopefully, the error has been recovered successfully. Now you can
* resume PCI traffic to/from the affected PE.
*/
|
||||
|
||||
....
|
||||
|
||||
5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
|
||||
VFIO_IOMMU_DISABLE and implements 2 new ioctls:
|
||||
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
|
||||
(which are unsupported in v1 IOMMU).
|
||||
|
||||
PPC64 paravirtualized guests generate a lot of map/unmap requests,
|
||||
and the handling of those includes pinning/unpinning pages and updating
|
||||
mm::locked_vm counter to make sure we do not exceed the rlimit.
|
||||
The v2 IOMMU splits accounting and pinning into separate operations:
|
||||
|
||||
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
|
||||
receive a user space address and size of the block to be pinned.
|
||||
Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is expected to
|
||||
be called with the exact address and size used for registering
|
||||
the memory block. The userspace is not expected to call these often.
|
||||
The ranges are stored in a linked list in a VFIO container.
|
||||
|
||||
- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
|
||||
IOMMU table and do not do pinning; instead these check that the userspace
|
||||
address is from a pre-registered range.
|
||||
|
||||
This separation helps in optimizing DMA for guests.
|
||||
|
||||
6) The sPAPR specification allows guests to have additional DMA window(s) on
a PCI bus with a variable page size. Two ioctls have been added to support
this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
The platform has to support the functionality or an error will be returned
to userspace. The existing hardware supports up to 2 DMA windows. One is
2GB long, uses 4K pages and is called the "default 32bit window". The other
can be as big as the entire RAM and may use a different page size; it is
optional, and guests create it at run time if the guest driver supports
64bit DMA.
|
||||
|
||||
VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
a number of TCE table levels (useful when the TCE table is going to be so
big that the kernel may not be able to allocate enough physically
contiguous memory for it). It creates a new window in the available slot
and returns the bus address where the new window starts. Due to a hardware
limitation, user space cannot choose the location of DMA windows.
|
||||
|
||||
VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
|
||||
and removes it.
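
As a rough user-space sketch of items 5) and 6), the helpers below wrap the
register, create and remove ioctls using the structures declared in
<linux/vfio.h>. The helper names are made up for illustration, the container
is assumed to be already set up with the v2 sPAPR TCE IOMMU type, and error
handling is reduced to passing the ioctl result back to the caller.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Pre-register a block of user memory (hypothetical helper name). */
int spapr_register_block(int container, void *vaddr, uint64_t size)
{
	struct vfio_iommu_spapr_register_memory reg;

	memset(&reg, 0, sizeof(reg));
	reg.argsz = sizeof(reg);
	reg.vaddr = (uint64_t)(uintptr_t)vaddr;
	reg.size = size;

	return ioctl(container, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);
}

/* Create an additional DMA window; the kernel picks its bus address. */
int spapr_create_window(int container, uint32_t page_shift,
			uint64_t window_size, uint32_t levels,
			uint64_t *start_addr)
{
	struct vfio_iommu_spapr_tce_create create;
	int ret;

	memset(&create, 0, sizeof(create));
	create.argsz = sizeof(create);
	create.page_shift = page_shift;
	create.window_size = window_size;
	create.levels = levels;

	ret = ioctl(container, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
	if (ret == 0)
		*start_addr = create.start_addr;	/* filled in by the kernel */

	return ret;
}

/* Remove a window, identified by the bus address it starts at. */
int spapr_remove_window(int container, uint64_t start_addr)
{
	struct vfio_iommu_spapr_tce_remove rm;

	memset(&rm, 0, sizeof(rm));
	rm.argsz = sizeof(rm);
	rm.start_addr = start_addr;

	return ioctl(container, VFIO_IOMMU_SPAPR_TCE_REMOVE, &rm);
}
```

The start address reported by spapr_create_window() is the value to pass to
spapr_remove_window() later, since user space cannot choose where a window
is placed.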
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
.. [1] VFIO was originally an acronym for "Virtual Function I/O" in its
|
||||
initial implementation by Tom Lyon while at Cisco. We've since
|
||||
outgrown the acronym, but it's catchy.
|
||||
|
||||
.. [2] "safe" also depends upon a device being "well behaved". It's
|
||||
possible for multi-function devices to have backdoors between
|
||||
functions and even for single function devices to have alternative
|
||||
access to things like PCI config space through MMIO registers. To
|
||||
guard against the former we can include additional precautions in the
|
||||
IOMMU driver to group multi-function PCI devices together
|
||||
(iommu=group_mf). The latter we can't prevent, but the IOMMU should
|
||||
still provide isolation. For PCI, SR-IOV Virtual Functions are the
|
||||
best indicator of "well behaved", as these are designed for
|
||||
virtualization usage models.
|
||||
|
||||
.. [3] As always there are trade-offs to virtual machine device
|
||||
assignment that are beyond the scope of VFIO. It's expected that
|
||||
future IOMMU technologies will reduce some, but maybe not all, of
|
||||
these trade-offs.
|
||||
|
||||
.. [4] In this case the device is below a PCI bridge, so transactions
|
||||
from either function of the device are indistinguishable to the iommu::
|
||||
|
||||
-[0000:00]-+-1e.0-[06]--+-0d.0
|
||||
\-0d.1
|
||||
|
||||
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
|
67
Documentation/driver-api/xilinx/eemi.rst
Normal file
@@ -0,0 +1,67 @@
|
||||
====================================
|
||||
Xilinx Zynq MPSoC EEMI Documentation
|
||||
====================================
|
||||
|
||||
Xilinx Zynq MPSoC Firmware Interface
|
||||
-------------------------------------
|
||||
The zynqmp-firmware node describes the interface to platform firmware.
|
||||
ZynqMP has an interface to communicate with secure firmware. The firmware
driver provides an interface to the firmware APIs. These interface APIs can
be used by any driver to communicate with the PMC (Platform Management
Controller).
|
||||
|
||||
Embedded Energy Management Interface (EEMI)
|
||||
----------------------------------------------
|
||||
The embedded energy management interface is used to allow software
|
||||
components running across different processing clusters on a chip or
|
||||
device to communicate with a power management controller (PMC) on a
|
||||
device to issue or respond to power management requests.
|
||||
|
||||
EEMI ops is a structure containing all EEMI APIs supported by Zynq MPSoC.
The zynqmp-firmware driver maintains all EEMI APIs in the zynqmp_eemi_ops
structure. Any driver that wants to communicate with the PMC using EEMI APIs
can call zynqmp_pm_get_eemi_ops().
|
||||
|
||||
Example of EEMI ops::
|
||||
|
||||
/* The zynqmp-firmware driver maintains all EEMI APIs */
|
||||
struct zynqmp_eemi_ops {
|
||||
int (*get_api_version)(u32 *version);
|
||||
int (*query_data)(struct zynqmp_pm_query_data qdata, u32 *out);
|
||||
};
|
||||
|
||||
static const struct zynqmp_eemi_ops eemi_ops = {
|
||||
.get_api_version = zynqmp_pm_get_api_version,
|
||||
.query_data = zynqmp_pm_query_data,
|
||||
};
|
||||
|
||||
Example of EEMI ops usage::
|
||||
|
||||
static const struct zynqmp_eemi_ops *eemi_ops;
|
||||
u32 ret_payload[PAYLOAD_ARG_CNT];
|
||||
int ret;
|
||||
|
||||
eemi_ops = zynqmp_pm_get_eemi_ops();
|
||||
if (IS_ERR(eemi_ops))
|
||||
return PTR_ERR(eemi_ops);
|
||||
|
||||
ret = eemi_ops->query_data(qdata, ret_payload);
|
||||
|
||||
IOCTL
|
||||
------
|
||||
The IOCTL API is for device control and configuration. It is not a system
IOCTL but an EEMI API. This API can be used by a master to control
any device-specific configuration. IOCTL definitions can be platform
specific. This API also manages shared device configuration.
|
||||
|
||||
The following IOCTL IDs are valid for device control:
|
||||
- IOCTL_SET_PLL_FRAC_MODE 8
|
||||
- IOCTL_GET_PLL_FRAC_MODE 9
|
||||
- IOCTL_SET_PLL_FRAC_DATA 10
|
||||
- IOCTL_GET_PLL_FRAC_DATA 11
|
||||
|
||||
Refer to the EEMI API guide [0] for IOCTL-specific parameters and other EEMI APIs.
|
||||
|
||||
References
|
||||
----------
|
||||
[0] Embedded Energy Management Interface (EEMI) API guide:
|
||||
https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
|
16
Documentation/driver-api/xilinx/index.rst
Normal file
@@ -0,0 +1,16 @@
|
||||
|
||||
===========
|
||||
Xilinx FPGA
|
||||
===========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
eemi
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
379
Documentation/driver-api/xillybus.rst
Normal file
@@ -0,0 +1,379 @@
|
||||
==========================================
|
||||
Xillybus driver for generic FPGA interface
|
||||
==========================================
|
||||
|
||||
:Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
|
||||
:Email: eli.billauer@gmail.com or as advertised on Xillybus' site.
|
||||
|
||||
.. Contents:
|
||||
|
||||
- Introduction
|
||||
-- Background
|
||||
-- Xillybus Overview
|
||||
|
||||
- Usage
|
||||
-- User interface
|
||||
-- Synchronization
|
||||
-- Seekable pipes
|
||||
|
||||
- Internals
|
||||
-- Source code organization
|
||||
-- Pipe attributes
|
||||
-- Host never reads from the FPGA
|
||||
-- Channels, pipes, and the message channel
|
||||
-- Data streaming
|
||||
-- Data granularity
|
||||
-- Probing
|
||||
-- Buffer allocation
|
||||
-- The "nonempty" message (supporting poll)
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
|
||||
can be programmed to become virtually anything that is usually found as a
|
||||
dedicated chipset: For instance, a display adapter, network interface card,
|
||||
or even a processor with its peripherals. FPGAs are the LEGO of hardware:
|
||||
Based upon certain building blocks, you make your own toys the way you like
|
||||
them. It's usually pointless to reimplement something that is already
|
||||
available on the market as a chipset, so FPGAs are mostly used when some
|
||||
special functionality is needed, and the production volume is relatively low
|
||||
(hence not justifying the development of an ASIC).
|
||||
|
||||
The challenge with FPGAs is that everything is implemented at a very low
|
||||
level, even lower than assembly language. In order to allow FPGA designers to
|
||||
focus on their specific project, and not reinvent the wheel over and over
|
||||
again, pre-designed building blocks, IP cores, are often used. These are the
|
||||
FPGA parallels of library functions. IP cores may implement certain
|
||||
mathematical functions, a functional unit (e.g. a USB interface), an entire
|
||||
processor (e.g. ARM) or anything that might come in handy. Think of them as a
|
||||
building block, with electrical wires dangling on the sides for connection to
|
||||
other blocks.
|
||||
|
||||
One of the daunting tasks in FPGA design is communicating with a full-blown
|
||||
operating system (actually, with the processor running it): Implementing the
|
||||
low-level bus protocol and the somewhat higher-level interface with the host
|
||||
(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
|
||||
function is a well-known one (e.g. a video adapter card, or a NIC), it can
|
||||
make sense to design the FPGA's interface logic specifically for the project.
|
||||
A special driver is then written to present the FPGA as a well-known interface
|
||||
to the kernel and/or user space. In that case, there is no reason to treat the
|
||||
FPGA differently than any device on the bus.
|
||||
|
||||
It's however common that the desired data communication doesn't fit any well-
|
||||
known peripheral function. Also, the effort of designing an elegant
|
||||
abstraction for the data exchange is often considered too big. In those cases,
|
||||
a quicker and possibly less elegant solution is sought: The driver is
|
||||
effectively written as a user space program, leaving the kernel space part
|
||||
with just elementary data transport. This still requires designing some
|
||||
interface logic for the FPGA, and writing a simple ad-hoc driver for the kernel.
|
||||
|
||||
Xillybus Overview
|
||||
-----------------
|
||||
|
||||
Xillybus is an IP core and a Linux driver. Together, they form a kit for
|
||||
elementary data transport between an FPGA and the host, providing pipe-like
|
||||
data streams with a straightforward user interface. It's intended as a low-
|
||||
effort solution for mixed FPGA-host projects, for which it makes sense to
|
||||
have the project-specific part of the driver running in a user-space program.
|
||||
|
||||
Since the communication requirements may vary significantly from one FPGA
|
||||
project to another (the number of data pipes needed in each direction and
|
||||
their attributes), there isn't one specific chunk of logic being the Xillybus
|
||||
IP core. Rather, the IP core is configured and built based upon a
|
||||
specification given by its end user.
|
||||
|
||||
Xillybus presents independent data streams, which resemble pipes or TCP/IP
|
||||
communication to the user. At the host side, a character device file is used
|
||||
just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
|
||||
the data. This is contrary to a common method of communicating through fixed-
|
||||
sized buffers (even though such buffers are used by Xillybus under the hood).
|
||||
There may be more than a hundred of these streams on a single IP core, but
|
||||
also no more than one, depending on the configuration.
|
||||
|
||||
In order to ease the deployment of the Xillybus IP core, it contains a simple
|
||||
data structure which completely defines the core's configuration. The Linux
|
||||
driver fetches this data structure during its initialization process, and sets
|
||||
up the DMA buffers and character devices accordingly. As a result, a single
|
||||
driver is used to work out of the box with any Xillybus IP core.
|
||||
|
||||
The data structure just mentioned should not be confused with PCI's
|
||||
configuration space or the Flattened Device Tree.
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
User interface
|
||||
--------------
|
||||
|
||||
On the host, all interface with Xillybus is done through /dev/xillybus_*
|
||||
device files, which are generated automatically as the driver loads. The
|
||||
names of these files depend on the IP core that is loaded in the FPGA (see
|
||||
Probing below). To communicate with the FPGA, open the device file that
|
||||
corresponds to the hardware FIFO you want to send data to or receive data from,
|
||||
and use plain write() or read() calls, just like with a regular pipe. In
|
||||
particular, it makes perfect sense to go::
|
||||
|
||||
$ cat mydata > /dev/xillybus_thisfifo
|
||||
|
||||
$ cat /dev/xillybus_thatfifo > hisdata
|
||||
|
||||
possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have
|
||||
the capability to send an EOF (but may not use it).
|
||||
|
||||
The driver and hardware are designed to behave sensibly as pipes, including:
|
||||
|
||||
* Supporting non-blocking I/O (by setting O_NONBLOCK on open() ).
|
||||
|
||||
* Supporting poll() and select().
|
||||
|
||||
* Being bandwidth efficient under load (using DMA) but also handling small
|
||||
pieces of data sent across (like TCP/IP) by autoflushing.
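
As an illustration of the poll() support, here is a minimal user-space sketch
that waits for data with a timeout and then reads whatever has arrived. The
helper name and the -2 return value for a timeout are choices made for this
example only; nothing in it is specific to Xillybus beyond the assumption
that the descriptor refers to a readable pipe-like device file.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for data on fd, then read at most len bytes.
 * Returns the number of bytes read, 0 on EOF, -1 on error and -2 if
 * nothing arrived before the timeout.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	/* Restart the wait if a signal interrupted it */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return -2;	/* timeout, no data yet */

	return read(fd, buf, len);
}
```

A caller would typically obtain the descriptor with something like
open("/dev/xillybus_thatfifo", O_RDONLY | O_NONBLOCK), reusing the device
file name from the shell example above.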
|
||||
|
||||
A device file can be read only, write only or bidirectional. Bidirectional
|
||||
device files are treated like two independent pipes (except for sharing a
|
||||
"channel" structure in the implementation code).
|
||||
|
||||
Synchronization
|
||||
---------------
|
||||
|
||||
Xillybus pipes are configured (on the IP core) to be either synchronous or
|
||||
asynchronous. For a synchronous pipe, write() returns successfully only after
|
||||
some data has been submitted and acknowledged by the FPGA. This slows down
|
||||
bulk data transfers, and is nearly impossible for use with streams that
|
||||
require data at a constant rate: There is no data transmitted to the FPGA
|
||||
between write() calls, in particular when the process loses the CPU.
|
||||
|
||||
When a pipe is configured as asynchronous, write() returns as soon as there is
enough room in the buffers to store any of the data.
|
||||
|
||||
For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
|
||||
as soon as the respective device file is opened, regardless of whether the data
|
||||
has been requested by a read() call. On synchronous pipes, only the amount
|
||||
of data requested by a read() call is transmitted.
|
||||
|
||||
In summary, for synchronous pipes, data between the host and FPGA is
|
||||
transmitted only to satisfy the read() or write() call currently handled
|
||||
by the driver, and those calls wait for the transmission to complete before
|
||||
returning.
|
||||
|
||||
Note that the synchronization attribute has nothing to do with the possibility
|
||||
that read() or write() completes fewer bytes than requested. There is a
|
||||
separate configuration flag ("allowpartial") that determines whether such a
|
||||
partial completion is allowed.
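
When partial completion is allowed, a user-space program that needs a whole
block to go out has to loop, exactly as with any UNIX pipe. A minimal sketch
follows; write_all() is a name chosen for this example, and a blocking
descriptor is assumed.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write exactly len bytes to a blocking descriptor, retrying after
 * partial completions. Returns 0 on success and -1 on error.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}

		/* Advance past the part that was accepted */
		p += n;
		len -= (size_t)n;
	}

	return 0;
}
```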
|
||||
|
||||
Seekable pipes
|
||||
--------------
|
||||
|
||||
A synchronous pipe can be configured to have the stream's position exposed
|
||||
to the user logic at the FPGA. Such a pipe is also seekable on the host API.
|
||||
With this feature, a memory or register interface can be attached on the
|
||||
FPGA side to the seekable stream. Reading or writing to a certain address in
|
||||
the attached memory is done by seeking to the desired address, and calling
|
||||
read() or write() as required.
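
The seek-then-access pattern can be sketched in user space as follows. The
helper names are hypothetical, and the sketch assumes an 8 bit wide seekable
pipe so that single-byte accesses at arbitrary addresses are meaningful.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one byte at the given address of a seekable stream. */
int mem_write_byte(int fd, off_t addr, uint8_t value)
{
	if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
		return -1;

	return write(fd, &value, 1) == 1 ? 0 : -1;
}

/* Read one byte from the given address of a seekable stream. */
int mem_read_byte(int fd, off_t addr, uint8_t *value)
{
	if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
		return -1;

	return read(fd, value, 1) == 1 ? 0 : -1;
}
```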
|
||||
|
||||
|
||||
Internals
|
||||
=========
|
||||
|
||||
Source code organization
|
||||
------------------------
|
||||
|
||||
The Xillybus driver consists of a core module, xillybus_core.c, and modules
|
||||
that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).
|
||||
|
||||
The bus specific modules are those probed when a suitable device is found by
|
||||
the kernel. Since the DMA mapping and synchronization functions, which are bus
|
||||
dependent by their nature, are used by the core module, a
|
||||
xilly_endpoint_hardware structure is passed to the core module on
|
||||
initialization. This structure is populated with pointers to wrapper functions
|
||||
which execute the DMA-related operations on the bus.
|
||||
|
||||
Pipe attributes
|
||||
---------------
|
||||
|
||||
Each pipe has a number of attributes which are set when the FPGA component
|
||||
(IP core) is built. They are fetched from the IDT (the data structure which
|
||||
defines the core's configuration, see Probing below) by xilly_setupchannels()
|
||||
in xillybus_core.c as follows:
|
||||
|
||||
* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
|
||||
host pipe (the FPGA "writes").
|
||||
|
||||
* channelnum: The pipe's identification number in communication between the
|
||||
host and FPGA.
|
||||
|
||||
* format: The underlying data width. See Data Granularity below.
|
||||
|
||||
* allowpartial: A non-zero value means that a read() or write() (whichever
|
||||
applies) may return with less than the requested number of bytes. The common
|
||||
choice is a non-zero value, to match standard UNIX behavior.
|
||||
|
||||
* synchronous: A non-zero value means that the pipe is synchronous. See
|
||||
Synchronization above.
|
||||
|
||||
* bufsize: Each DMA buffer's size. Always a power of two.
|
||||
|
||||
* bufnum: The number of buffers allocated for this pipe. Always a power of two.
|
||||
|
||||
* exclusive_open: A non-zero value forces exclusive opening of the associated
|
||||
device file. If the device file is bidirectional, and already opened only in
|
||||
one direction, the opposite direction may be opened once.
|
||||
|
||||
* seekable: A non-zero value indicates that the pipe is seekable. See
|
||||
Seekable pipes above.
|
||||
|
||||
* supports_nonempty: A non-zero value (which is typical) indicates that the
|
||||
hardware will send the messages that are necessary to support select() and
|
||||
poll() for this pipe.
|
||||
|
||||
Host never reads from the FPGA
|
||||
------------------------------
|
||||
|
||||
Even though PCI Express is hotpluggable in general, a typical motherboard
|
||||
doesn't expect a card to go away all of a sudden. But since the PCIe card
|
||||
is based upon reprogrammable logic, a sudden disappearance from the bus is
|
||||
quite likely as a result of an accidental reprogramming of the FPGA while the
|
||||
host is up. In practice, nothing happens immediately in such a situation. But
|
||||
if the host attempts to read from an address that is mapped to the PCI Express
|
||||
device, that leads to an immediate freeze of the system on some motherboards,
|
||||
even though the PCIe standard requires a graceful recovery.
|
||||
|
||||
In order to avoid these freezes, the Xillybus driver refrains completely from
|
||||
reading from the device's register space. All communication from the FPGA to
|
||||
the host is done through DMA. In particular, the Interrupt Service Routine
|
||||
doesn't follow the common practice of checking a status register when it's
|
||||
invoked. Rather, the FPGA prepares a small buffer which contains short
|
||||
messages, which inform the host what the interrupt was about.
|
||||
|
||||
This mechanism is used on non-PCIe buses as well for the sake of uniformity.
|
||||
|
||||
|
||||
Channels, pipes, and the message channel
|
||||
----------------------------------------
|
||||
|
||||
Each of the (possibly bidirectional) pipes presented to the user is allocated
|
||||
a data channel between the FPGA and the host. The distinction between channels
|
||||
and pipes is necessary only because of channel 0, which is used for interrupt-
|
||||
related messages from the FPGA, and has no pipe attached to it.
|
||||
|
||||
Data streaming
|
||||
--------------
|
||||
|
||||
Even though a non-segmented data stream is presented to the user at both
|
||||
sides, the implementation relies on a set of DMA buffers which is allocated
|
||||
for each channel. For the sake of illustration, let's take the FPGA to host
|
||||
direction: As data streams into the respective channel's interface in the
|
||||
FPGA, the Xillybus IP core writes it to one of the DMA buffers. When the
|
||||
buffer is full, the FPGA informs the host about that (appending a
XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
necessary). The host responds by making the data available for reading through
the character device. When all data has been read, the host writes to
the FPGA's buffer control register, allowing the buffer to be overwritten. Flow
|
||||
control mechanisms exist on both sides to prevent underflows and overflows.
|
||||
|
||||
This is not good enough for creating a TCP/IP-like stream: If the data flow
|
||||
stops momentarily before a DMA buffer is filled, the intuitive expectation is
|
||||
that the partial data in the buffer will arrive anyhow, despite the buffer not
|
||||
being completed. This is implemented by adding a field in the
|
||||
XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
|
||||
which buffer is submitted, but how much data it contains.
|
||||
|
||||
But the FPGA will submit a partially filled buffer only if directed to do so
|
||||
by the host. This situation occurs when the read() method has been blocking
|
||||
for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
|
||||
the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
|
||||
balances between bus bandwidth efficiency (preventing a lot of partially
|
||||
filled buffers being sent) and a latency held fairly low for tails of data.
|
||||
|
||||
A similar setting is used in the host to FPGA direction. The handling of
|
||||
partial DMA buffers is somewhat different, though. The user can tell the
|
||||
driver to submit all data it has in the buffers to the FPGA, by issuing a
|
||||
write() with the byte count set to zero. This is similar to a flush request,
|
||||
but it doesn't block. There is also an autoflushing mechanism, which triggers
|
||||
an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
|
||||
This allows the user to be oblivious about the underlying buffering mechanism
|
||||
and yet enjoy a stream-like interface.
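
From user space, the explicit flush request described above is just a write()
with a zero byte count. A minimal sketch, with a helper name made up for this
example:

```c
#include <unistd.h>

/*
 * Ask the driver to submit whatever it has buffered. This only issues
 * the zero-length write(); it does not wait for the data to arrive.
 * Returns 0 on success and -1 on error.
 */
int xilly_flush_request(int fd)
{
	return write(fd, NULL, 0) == 0 ? 0 : -1;
}
```

Programs that are happy with the autoflush latency never need to call this.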
|
||||
|
||||
Note that the issue of partial buffer flushing is irrelevant for pipes having
|
||||
the "synchronous" attribute nonzero, since synchronous pipes don't allow data
|
||||
to lie around in the DMA buffers between read() and write() anyhow.
|
||||
|
||||
Data granularity
|
||||
----------------
|
||||
|
||||
The data arrives or is sent at the FPGA as 8, 16 or 32 bit wide words, as
|
||||
configured by the "format" attribute. Whenever possible, the driver attempts
|
||||
to hide this when the pipe is accessed differently from its natural alignment.
|
||||
For example, reading single bytes from a pipe with 32 bit granularity works
|
||||
with no issues. Writing single bytes to pipes with 16 or 32 bit granularity
|
||||
will also work, but the driver can't send partially completed words to the
|
||||
FPGA, so the transmission of up to one word may be held until it's fully
|
||||
occupied with user data.
|
||||
|
||||
This somewhat complicates the handling of host to FPGA streams, because
|
||||
when a buffer is flushed, it may contain up to 3 bytes that don't form a word in
|
||||
the FPGA, and hence can't be sent. To prevent loss of data, these leftover
|
||||
bytes need to be moved to the next buffer. The parts in xillybus_core.c
|
||||
that mention "leftovers" in some way are related to this complication.
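
The arithmetic behind the leftovers is simple enough to sketch. The two
helpers below are illustrative only and are not taken from xillybus_core.c;
they assume the word size is given in bytes and is 1, 2 or 4.

```c
#include <stddef.h>

/*
 * Number of pending bytes that form complete words and can be sent.
 * bytes_per_word must be 1, 2 or 4 (8, 16 or 32 bit wide pipes).
 */
size_t whole_word_bytes(size_t pending, size_t bytes_per_word)
{
	return pending & ~(bytes_per_word - 1);
}

/* Number of trailing bytes that must be carried to the next buffer. */
size_t leftover_bytes(size_t pending, size_t bytes_per_word)
{
	return pending & (bytes_per_word - 1);
}
```

For example, with 11 pending bytes on a 32 bit pipe, 8 bytes can be sent and
3 bytes are left over.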
|
||||
|
||||
Probing
|
||||
-------
|
||||
|
||||
As mentioned earlier, the number of pipes that are created when the driver
|
||||
loads and their attributes depend on the Xillybus IP core in the FPGA. During
|
||||
the driver's initialization, a blob containing configuration info, the
|
||||
Interface Description Table (IDT), is sent from the FPGA to the host. The
|
||||
bootstrap process is done in three phases:
|
||||
|
||||
1. Acquire the length of the IDT, so a buffer can be allocated for it. This
|
||||
is done by sending a quiesce command to the device, since the acknowledge
|
||||
for this command contains the IDT's buffer length.
|
||||
|
||||
2. Acquire the IDT itself.
|
||||
|
||||
3. Create the interfaces according to the IDT.
|
||||
|
||||
Buffer allocation
|
||||
-----------------
|
||||
|
||||
In order to simplify the logic that prevents illegal boundary crossings of
|
||||
PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
|
||||
it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
|
||||
xilly_setupchannels() function allocates these buffers by requesting whole
pages from the kernel, and dividing them into DMA buffers as necessary. Since
|
||||
all buffers' sizes are powers of two, it's possible to pack any set of such
|
||||
buffers, with a maximal waste of one page of memory.
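
The placement rule can be expressed as a small predicate. This is a
stand-alone sketch for checking addresses, not code from the driver; the
function and macro names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY 4096u	/* the 4kB boundary the rule refers to */

/* Return true if a buffer at addr with the given size obeys the rule. */
bool placement_ok(uint64_t addr, uint64_t size)
{
	if (size == 0)
		return false;

	if (size < BOUNDARY)
		/* First and last byte must lie in the same 4kB block */
		return (addr / BOUNDARY) == ((addr + size - 1) / BOUNDARY);

	/* Buffers of 4kB or more must start on a 4kB boundary */
	return (addr % BOUNDARY) == 0;
}
```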
|
||||
|
||||
All buffers are allocated when the driver is loaded. This is necessary,
|
||||
since large contiguous physical memory segments are sometimes requested,
|
||||
which are more likely to be available when the system is freshly booted.
|
||||
|
||||
The allocation of buffer memory takes place in the same order they appear in
|
||||
the IDT. The driver relies on a rule that the pipes are sorted with decreasing
|
||||
buffer size in the IDT. If a requested buffer is larger than or equal to a page,
|
||||
the necessary number of pages is requested from the kernel, and these are
|
||||
used for this buffer. If the requested buffer is smaller than a page, one
|
||||
single page is requested from the kernel, and that page is partially used.
|
||||
Or, if there already is a partially used page at hand, the buffer is packed
|
||||
into that page. It can be shown that all pages requested from the kernel
|
||||
(except possibly for the last) are 100% utilized this way.
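
The packing argument can be checked with a small simulation. The sketch below
is a simplified user-space model, not the driver's allocator: it assumes a
4096 byte page, takes one size per buffer, and relies on the sizes being
powers of two sorted in decreasing order. All names are invented for the
example.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u	/* assumed page size for this model */

struct pack_result {
	size_t pages;	/* pages requested in total */
	size_t unused;	/* free bytes left in the last partial page */
};

/* Model the allocation order described above. */
struct pack_result pack_buffers(const size_t *sizes, size_t count)
{
	struct pack_result r = { 0, 0 };
	size_t left = 0;	/* free bytes in the current partial page */
	size_t i;

	for (i = 0; i < count; i++) {
		if (sizes[i] >= PAGE_BYTES) {
			/* Large buffer: takes whole pages of its own */
			r.pages += sizes[i] / PAGE_BYTES;
			continue;
		}
		if (left < sizes[i]) {
			/* No partially used page with room: request one */
			r.pages++;
			left = PAGE_BYTES;
		}
		left -= sizes[i];	/* pack into the partial page */
	}

	r.unused = left;
	return r;
}
```

Because every earlier buffer is a multiple of the current buffer's size, the
free space in a partial page is either zero or at least one buffer long, so
only the last page can end up with unused bytes.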
|
||||
|
||||
The "nonempty" message (supporting poll)
|
||||
----------------------------------------
|
||||
|
||||
In order to support the "poll" method (and hence select() ), there is a small
|
||||
catch regarding the FPGA to host direction: The FPGA may have filled a DMA
|
||||
buffer with some data, but not submitted that buffer. If the host waited for
|
||||
the buffer's submission by the FPGA, there would be a possibility that the
|
||||
FPGA side has sent data, but a select() call would still block, because the
|
||||
host has not received any notification about this. This is solved with
|
||||
XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
|
||||
completely empty to containing some data.
|
||||
|
||||
These messages are used only to support poll() and select(). The IP core can
|
||||
be configured not to send them, which slightly reduces bandwidth usage.
|
104
Documentation/driver-api/zorro.rst
Normal file
@@ -0,0 +1,104 @@
|
||||
========================================
|
||||
Writing Device Drivers for Zorro Devices
|
||||
========================================
|
||||
|
||||
:Author: Written by Geert Uytterhoeven <geert@linux-m68k.org>
|
||||
:Last revised: September 5, 2003
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The Zorro bus is the bus used in the Amiga family of computers. Thanks to
|
||||
AutoConfig(tm), it's 100% Plug-and-Play.
|
||||
|
||||
There are two types of Zorro buses, Zorro II and Zorro III:
|
||||
|
||||
- The Zorro II address space is 24-bit and lies within the first 16 MB of the
|
||||
Amiga's address map.
|
||||
|
||||
- Zorro III is a 32-bit extension of Zorro II, which is backwards compatible
|
||||
with Zorro II. The Zorro III address space lies outside the first 16 MB.
|
||||
|
||||
|
||||
Probing for Zorro Devices
|
||||
-------------------------
|
||||
|
||||
Zorro devices are found by calling ``zorro_find_device()``, which returns a
|
||||
pointer to the ``next`` Zorro device with the specified Zorro ID. A probe loop
|
||||
for the board with Zorro ID ``ZORRO_PROD_xxx`` looks like::
|
||||
|
||||
struct zorro_dev *z = NULL;
|
||||
|
||||
while ((z = zorro_find_device(ZORRO_PROD_xxx, z))) {
|
||||
if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
|
||||
"My explanation"))
|
||||
...
|
||||
}
|
||||
|
||||
``ZORRO_WILDCARD`` acts as a wildcard and finds any Zorro device. If your driver
|
||||
supports different types of boards, you can use a construct like::
|
||||
|
||||
struct zorro_dev *z = NULL;
|
||||
|
||||
while ((z = zorro_find_device(ZORRO_WILDCARD, z))) {
|
||||
if (z->id != ZORRO_PROD_xxx1 && z->id != ZORRO_PROD_xxx2 && ...)
|
||||
continue;
|
||||
if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
|
||||
"My explanation"))
|
||||
...
|
||||
}
|
||||
|
||||
|
||||
Zorro Resources
|
||||
---------------
|
||||
|
||||
Before you can access a Zorro device's registers, you have to make sure it's
|
||||
not yet in use. This is done using the I/O memory space resource management
|
||||
functions::
|
||||
|
||||
request_mem_region()
|
||||
release_mem_region()
|
||||
|
||||
Shortcuts to claim the whole device's address space are provided as well::
|
||||
|
||||
zorro_request_device
|
||||
zorro_release_device
|
||||
|
||||
|
||||
Accessing the Zorro Address Space
|
||||
---------------------------------
|
||||
|
||||
The address regions in the Zorro device resources are Zorro bus address
|
||||
regions. Due to the identity bus-physical address mapping on the Zorro bus,
|
||||
they are CPU physical addresses as well.
|
||||
|
||||
The treatment of these regions depends on the type of Zorro space:
|
||||
|
||||
- Zorro II address space is always mapped and does not have to be mapped
|
||||
explicitly using z_ioremap().
|
||||
|
||||
Conversion from bus/physical Zorro II addresses to kernel virtual addresses
|
||||
and vice versa is done using::
|
||||
|
||||
virt_addr = ZTWO_VADDR(bus_addr);
|
||||
bus_addr = ZTWO_PADDR(virt_addr);
|
||||
|
||||
- Zorro III address space must be mapped explicitly using z_ioremap() first
|
||||
before it can be accessed::
|
||||
|
||||
virt_addr = z_ioremap(bus_addr, size);
|
||||
...
|
||||
z_iounmap(virt_addr);
|
||||
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
#. linux/include/linux/zorro.h
|
||||
#. linux/include/uapi/linux/zorro.h
|
||||
#. linux/include/uapi/linux/zorro_ids.h
|
||||
#. linux/arch/m68k/include/asm/zorro.h
|
||||
#. linux/drivers/zorro
|
||||
#. /proc/bus/zorro
|
||||
|