Merge commit 'v2.6.30-rc1' into sched/urgent
Merge reason: update to latest upstream to queue up fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
22
CREDITS
@@ -495,6 +495,11 @@ S: Kopmansg 2
 S: 411 13 Goteborg
 S: Sweden
 
+N: Paul Bristow
+E: paul@paulbristow.net
+W: http://paulbristow.net/linux/idefloppy.html
+D: Maintainer of IDE/ATAPI floppy driver
+
 N: Dominik Brodowski
 E: linux@brodo.de
 W: http://www.brodo.de/
@@ -1407,8 +1412,8 @@ P: 1024D/77D4FC9B F5C5 1C20 1DFC DEC3 3107 54A4 2332 ADFC 77D4 FC9B
 D: National Language Support
 D: Linux Internationalization Project
 D: German Localization for Linux and GNU software
-S: Kriemhildring 12a
-S: 65795 Hattersheim am Main
+S: Auf der Fittel 18
+S: 53347 Alfter
 S: Germany
 
 N: Christoph Hellwig
@@ -2166,7 +2171,6 @@ D: Initial implementation of VC's, pty's and select()
 
 N: Pavel Machek
 E: pavel@ucw.cz
-E: pavel@suse.cz
 D: Softcursor for vga, hypertech cdrom support, vcsa bugfix, nbd
 D: sun4/330 port, capabilities for elf, speedup for rm on ext2, USB,
 D: work on suspend-to-ram/disk, killing duplicates from ioctl32
@@ -2643,6 +2647,10 @@ S: C/ Mieses 20, 9-B
 S: Valladolid 47009
 S: Spain
 
+N: Gadi Oxman
+E: gadio@netvision.net.il
+D: Original author and maintainer of IDE/ATAPI floppy/tape drivers
+
 N: Greg Page
 E: gpage@sovereign.org
 D: IPX development and support
@@ -3572,6 +3580,12 @@ N: Dirk Verworner
 D: Co-author of German book ``Linux-Kernel-Programmierung''
 D: Co-founder of Berlin Linux User Group
 
+N: Riku Voipio
+E: riku.voipio@iki.fi
+D: Author of PCA9532 LED and Fintek f75375s hwmon driver
+D: Some random ARM board patches
+S: Finland
+
 N: Patrick Volkerding
 E: volkerdi@ftp.cdrom.com
 D: Produced the Slackware distribution, updated the SVGAlib
@@ -3739,7 +3753,7 @@ S: 93149 Nittenau
 S: Germany
 
 N: Gertjan van Wingerde
-E: gwingerde@home.nl
+E: gwingerde@gmail.com
 D: Ralink rt2x00 WLAN driver
 D: Minix V2 file-system
 D: Misc fixes
@@ -86,6 +86,8 @@ cachetlb.txt
 - describes the cache/TLB flushing interfaces Linux uses.
 cdrom/
 - directory with information on the CD-ROM drivers that Linux has.
+cgroups/
+- cgroups features, including cpusets and memory controller.
 connector/
 - docs on the netlink based userspace<->kernel space communication mod.
 console/
@@ -98,8 +100,6 @@ cpu-load.txt
 - document describing how CPU load statistics are collected.
 cpuidle/
 - info on CPU_IDLE, CPU idle state management subsystem.
-cpusets.txt
-- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
 cputopology.txt
 - documentation on how CPU topology info is exported via sysfs.
 cris/
71
Documentation/ABI/testing/debugfs-kmemtrace
Normal file
@@ -0,0 +1,71 @@
+What: /sys/kernel/debug/kmemtrace/
+Date: July 2008
+Contact: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
+Description:
+
+In kmemtrace-enabled kernels, the following files are created:
+
+/sys/kernel/debug/kmemtrace/
+    cpu<n>          (0400)  Per-CPU tracing data, see below. (binary)
+    total_overruns  (0400)  Total number of bytes which were dropped from
+                            cpu<n> files because of full buffer condition,
+                            non-binary. (text)
+    abi_version     (0400)  Kernel's kmemtrace ABI version. (text)
+
+Each per-CPU file should be read according to the relay interface. That is,
+the reader should set affinity to that specific CPU and, as currently done by
+the userspace application (though there are other methods), use poll() with
+an infinite timeout before every read(). Otherwise, erroneous data may be
+read. The binary data has the following _core_ format:
+
+    Event ID (1 byte)         Unsigned integer, one of:
+        0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
+        1 - represents a freeing of previously allocated memory
+            (KMEMTRACE_EVENT_FREE)
+    Type ID (1 byte)          Unsigned integer, one of:
+        0 - this is a kmalloc() / kfree()
+        1 - this is a kmem_cache_alloc() / kmem_cache_free()
+        2 - this is a __get_free_pages() et al.
+    Event size (2 bytes)      Unsigned integer representing the
+                              size of this event. Used to extend
+                              kmemtrace. Discard the bytes you
+                              don't know about.
+    Sequence number (4 bytes) Signed integer used to reorder data
+                              logged on SMP machines. Wraparound
+                              must be taken into account, although
+                              it is unlikely.
+    Caller address (8 bytes)  Return address to the caller.
+    Pointer to mem (8 bytes)  Pointer to target memory area. Can be
+                              NULL, but not all such calls might be
+                              recorded.
+
+In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:
+
+    Requested bytes (8 bytes) Total number of requested bytes,
+                              unsigned, must not be zero.
+    Allocated bytes (8 bytes) Total number of actually allocated
+                              bytes, unsigned, must not be lower
+                              than requested bytes.
+    Requested flags (4 bytes) GFP flags supplied by the caller.
+    Target CPU (4 bytes)      Signed integer, valid for event id 1.
+                              If equal to -1, target CPU is the same
+                              as origin CPU, but the reverse might
+                              not be true.
+
+The data is made available in the same endianness the machine has.
+
+Other event ids and type ids may be defined and added. Other fields may be
+added by increasing event size, but see below for details.
+Every modification to the ABI, including new id definitions, are followed
+by bumping the ABI version by one.
+
+Adding new data to the packet (features) is done at the end of the mandatory
+data:
+    Feature size (2 byte)
+    Feature ID (1 byte)
+    Feature data (Feature size - 3 bytes)
+
+
+Users:
+    kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
+
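Reviewer note: the core record layout described in debugfs-kmemtrace above can be decoded with a few lines of Python. This is only a sketch under stated assumptions, not the kmemtrace-user implementation: the function name `parse_events` is hypothetical, the fields are assumed to be packed without padding, and "Event size" is assumed to count the whole record including the core header. The trace is assumed to be decoded on a machine with the same endianness as the traced host.

```python
import struct

# "=" means native byte order with standard sizes and no padding (assumption:
# the kernel writes the fields back to back, as the field list suggests).
CORE = struct.Struct("=BBHiQQ")   # event id, type id, event size, sequence, caller, pointer
ALLOC = struct.Struct("=QQIi")    # requested bytes, allocated bytes, GFP flags, target CPU

KMEMTRACE_EVENT_ALLOC = 0
KMEMTRACE_EVENT_FREE = 1


def parse_events(buf):
    """Yield one dict per event found in buf (bytes read from a cpu<n> file)."""
    offset = 0
    while offset + CORE.size <= len(buf):
        event_id, type_id, size, seq, caller, ptr = CORE.unpack_from(buf, offset)
        if size < CORE.size or offset + size > len(buf):
            break  # truncated or malformed record: stop rather than guess
        event = {"event_id": event_id, "type_id": type_id, "seq": seq,
                 "caller": caller, "ptr": ptr}
        if event_id == KMEMTRACE_EVENT_ALLOC and size >= CORE.size + ALLOC.size:
            req, alloc, gfp, cpu = ALLOC.unpack_from(buf, offset + CORE.size)
            event.update(bytes_req=req, bytes_alloc=alloc,
                         gfp_flags=gfp, target_cpu=cpu)
        yield event
        offset += size  # skips trailing bytes (features) this reader does not know
```

Advancing by the recorded event size rather than by the size of the known fields is what lets an older reader skip data added by a newer ABI version.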
61
Documentation/ABI/testing/ima_policy
Normal file
@@ -0,0 +1,61 @@
+What: security/ima/policy
+Date: May 2008
+Contact: Mimi Zohar <zohar@us.ibm.com>
+Description:
+    The Trusted Computing Group(TCG) runtime Integrity
+    Measurement Architecture(IMA) maintains a list of hash
+    values of executables and other sensitive system files
+    loaded into the run-time of this system. At runtime,
+    the policy can be constrained based on LSM specific data.
+    Policies are loaded into the securityfs file ima/policy
+    by opening the file, writing the rules one at a time and
+    then closing the file. The new policy takes effect after
+    the file ima/policy is closed.
+
+    rule format: action [condition ...]
+
+    action: measure | dont_measure
+    condition:= base | lsm
+        base: [[func=] [mask=] [fsmagic=] [uid=]]
+        lsm: [[subj_user=] [subj_role=] [subj_type=]
+             [obj_user=] [obj_role=] [obj_type=]]
+
+    base: func:= [BPRM_CHECK][FILE_MMAP][INODE_PERMISSION]
+        mask:= [MAY_READ] [MAY_WRITE] [MAY_APPEND] [MAY_EXEC]
+        fsmagic:= hex value
+        uid:= decimal value
+    lsm: are LSM specific
+
+    default policy:
+        # PROC_SUPER_MAGIC
+        dont_measure fsmagic=0x9fa0
+        # SYSFS_MAGIC
+        dont_measure fsmagic=0x62656572
+        # DEBUGFS_MAGIC
+        dont_measure fsmagic=0x64626720
+        # TMPFS_MAGIC
+        dont_measure fsmagic=0x01021994
+        # SECURITYFS_MAGIC
+        dont_measure fsmagic=0x73636673
+
+        measure func=BPRM_CHECK
+        measure func=FILE_MMAP mask=MAY_EXEC
+        measure func=INODE_PERM mask=MAY_READ uid=0
+
+    The default policy measures all executables in bprm_check,
+    all files mmapped executable in file_mmap, and all files
+    open for read by root in inode_permission.
+
+    Examples of LSM specific definitions:
+
+    SELinux:
+        # SELINUX_MAGIC
+        dont_measure fsmagic=0xF97CFF8C
+
+        dont_measure obj_type=var_log_t
+        dont_measure obj_type=auditd_log_t
+        measure subj_user=system_u func=INODE_PERM mask=MAY_READ
+        measure subj_role=system_r func=INODE_PERM mask=MAY_READ
+
+    Smack:
+        measure subj_user=_ func=INODE_PERM mask=MAY_READ
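Reviewer note: the `action [condition ...]` rule format in ima_policy above is simple enough to check in userspace before writing rules to the policy file. The sketch below is an illustration only; `parse_rule` is a hypothetical helper, not part of IMA, and treating `#` lines as comments to skip is an assumption based on how the default policy is listed (the kernel's own handling of such lines is not described here).

```python
ACTIONS = {"measure", "dont_measure"}
BASE_KEYS = {"func", "mask", "fsmagic", "uid"}
LSM_KEYS = {"subj_user", "subj_role", "subj_type",
            "obj_user", "obj_role", "obj_type"}


def parse_rule(line):
    """Split one policy line into (action, {key: value}).

    Returns None for blank lines and '#' comment lines; raises ValueError
    for anything that does not fit the documented rule format.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    action, *conditions = line.split()
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    parsed = {}
    for cond in conditions:
        key, sep, value = cond.partition("=")
        if not sep or not value or key not in BASE_KEYS | LSM_KEYS:
            raise ValueError(f"bad condition: {cond!r}")
        parsed[key] = value
    if "fsmagic" in parsed:
        int(parsed["fsmagic"], 16)  # documented as a hex value
    if "uid" in parsed:
        int(parsed["uid"], 10)      # documented as a decimal value
    return action, parsed
```

For example, `parse_rule("measure func=FILE_MMAP mask=MAY_EXEC")` returns the action together with a two-entry condition dictionary, while a misspelled key raises `ValueError` before anything reaches securityfs.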
@@ -1,3 +1,89 @@
+What: /sys/bus/pci/drivers/.../bind
+Date: December 2003
+Contact: linux-pci@vger.kernel.org
+Description:
+    Writing a device location to this file will cause
+    the driver to attempt to bind to the device found at
+    this location. This is useful for overriding default
+    bindings. The format for the location is: DDDD:BB:DD.F.
+    That is Domain:Bus:Device.Function and is the same as
+    found in /sys/bus/pci/devices/. For example:
+    # echo 0000:00:19.0 > /sys/bus/pci/drivers/foo/bind
+    (Note: kernels before 2.6.28 may require echo -n).
+
+What: /sys/bus/pci/drivers/.../unbind
+Date: December 2003
+Contact: linux-pci@vger.kernel.org
+Description:
+    Writing a device location to this file will cause the
+    driver to attempt to unbind from the device found at
+    this location. This may be useful when overriding default
+    bindings. The format for the location is: DDDD:BB:DD.F.
+    That is Domain:Bus:Device.Function and is the same as
+    found in /sys/bus/pci/devices/. For example:
+    # echo 0000:00:19.0 > /sys/bus/pci/drivers/foo/unbind
+    (Note: kernels before 2.6.28 may require echo -n).
+
+What: /sys/bus/pci/drivers/.../new_id
+Date: December 2003
+Contact: linux-pci@vger.kernel.org
+Description:
+    Writing a device ID to this file will attempt to
+    dynamically add a new device ID to a PCI device driver.
+    This may allow the driver to support more hardware than
+    was included in the driver's static device ID support
+    table at compile time. The format for the device ID is:
+    VVVV DDDD SVVV SDDD CCCC MMMM PPPP. That is Vendor ID,
+    Device ID, Subsystem Vendor ID, Subsystem Device ID,
+    Class, Class Mask, and Private Driver Data. The Vendor ID
+    and Device ID fields are required, the rest are optional.
+    Upon successfully adding an ID, the driver will probe
+    for the device and attempt to bind to it. For example:
+    # echo "8086 10f5" > /sys/bus/pci/drivers/foo/new_id
+
+What: /sys/bus/pci/drivers/.../remove_id
+Date: February 2009
+Contact: Chris Wright <chrisw@sous-sol.org>
+Description:
+    Writing a device ID to this file will remove an ID
+    that was dynamically added via the new_id sysfs entry.
+    The format for the device ID is:
+    VVVV DDDD SVVV SDDD CCCC MMMM. That is Vendor ID, Device
+    ID, Subsystem Vendor ID, Subsystem Device ID, Class,
+    and Class Mask. The Vendor ID and Device ID fields are
+    required, the rest are optional. After successfully
+    removing an ID, the driver will no longer support the
+    device. This is useful to ensure auto probing won't
+    match the driver to the device. For example:
+    # echo "8086 10f5" > /sys/bus/pci/drivers/foo/remove_id
+
+What: /sys/bus/pci/rescan
+Date: January 2009
+Contact: Linux PCI developers <linux-pci@vger.kernel.org>
+Description:
+    Writing a non-zero value to this attribute will
+    force a rescan of all PCI buses in the system, and
+    re-discover previously removed devices.
+    Depends on CONFIG_HOTPLUG.
+
+What: /sys/bus/pci/devices/.../remove
+Date: January 2009
+Contact: Linux PCI developers <linux-pci@vger.kernel.org>
+Description:
+    Writing a non-zero value to this attribute will
+    hot-remove the PCI device and any of its children.
+    Depends on CONFIG_HOTPLUG.
+
+What: /sys/bus/pci/devices/.../rescan
+Date: January 2009
+Contact: Linux PCI developers <linux-pci@vger.kernel.org>
+Description:
+    Writing a non-zero value to this attribute will
+    force a rescan of the device's parent bus and all
+    child buses, and re-discover devices removed earlier
+    from this part of the device tree.
+    Depends on CONFIG_HOTPLUG.
+
 What: /sys/bus/pci/devices/.../vpd
 Date: February 2008
 Contact: Ben Hutchings <bhutchings@solarflare.com>
@@ -9,3 +95,30 @@ Description:
 that some devices may have malformatted data. If the
 underlying VPD has a writable section then the
 corresponding section of this file will be writable.
+
+What: /sys/bus/pci/devices/.../virtfnN
+Date: March 2009
+Contact: Yu Zhao <yu.zhao@intel.com>
+Description:
+    This symbolic link appears when hardware supports the SR-IOV
+    capability and the Physical Function driver has enabled it.
+    The symbolic link points to the PCI device sysfs entry of the
+    Virtual Function whose index is N (0...MaxVFs-1).
+
+What: /sys/bus/pci/devices/.../dep_link
+Date: March 2009
+Contact: Yu Zhao <yu.zhao@intel.com>
+Description:
+    This symbolic link appears when hardware supports the SR-IOV
+    capability and the Physical Function driver has enabled it,
+    and this device has vendor specific dependencies with others.
+    The symbolic link points to the PCI device sysfs entry of
+    Physical Function this device depends on.
+
+What: /sys/bus/pci/devices/.../physfn
+Date: March 2009
+Contact: Yu Zhao <yu.zhao@intel.com>
+Description:
+    This symbolic link appears when a device is a Virtual Function.
+    The symbolic link points to the PCI device sysfs entry of the
+    Physical Function this device associates with.
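Reviewer note: the two string formats used by the PCI entries above (the `DDDD:BB:DD.F` location for bind/unbind and the space-separated hex ID for new_id/remove_id) can be validated and built as follows. This is a sketch; `parse_location` and `format_device_id` are hypothetical helper names, and the four-digit zero padding simply mirrors the `VVVV DDDD ...` pattern and the `8086 10f5` example in the text.

```python
import re

# Domain:Bus:Device.Function, all hexadecimal; a PCI function number is 0-7.
LOCATION_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


def parse_location(text):
    """Return (domain, bus, device, function) as ints for a DDDD:BB:DD.F string."""
    match = LOCATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a DDDD:BB:DD.F location: {text!r}")
    return tuple(int(part, 16) for part in match.groups())


def format_device_id(vendor, device, *optional):
    """Build the string written to new_id (or remove_id).

    vendor and device are required; up to five optional fields may follow
    in the documented order (subsystem vendor, subsystem device, class,
    class mask, private driver data).
    """
    if len(optional) > 5:
        raise ValueError("at most seven fields are documented")
    return " ".join(f"{field:04x}" for field in (vendor, device, *optional))
```

The resulting strings would then be written, as root, to the corresponding file under `/sys/bus/pci/drivers/foo/`, where `foo` is the placeholder driver name used in the examples above.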
@@ -4,8 +4,8 @@ KernelVersion: 2.6.26
 Contact: Liam Girdwood <lrg@slimlogic.co.uk>
 Description:
 Some regulator directories will contain a field called
-state. This reports the regulator enable status, for
-regulators which can report that value.
+state. This reports the regulator enable control, for
+regulators which can report that input value.
 
 This will be one of the following strings:
 
@@ -14,16 +14,54 @@ Description:
 'unknown'
 
 'enabled' means the regulator output is ON and is supplying
-power to the system.
+power to the system (assuming no error prevents it).
 
 'disabled' means the regulator output is OFF and is not
-supplying power to the system..
+supplying power to the system (unless some non-Linux
+control has enabled it).
 
 'unknown' means software cannot determine the state, or
 the reported state is invalid.
 
 NOTE: this field can be used in conjunction with microvolts
-and microamps to determine regulator output levels.
+or microamps to determine configured regulator output levels.
 
 
+What: /sys/class/regulator/.../status
+Description:
+Some regulator directories will contain a field called
+"status". This reports the current regulator status, for
+regulators which can report that output value.
+
+This will be one of the following strings:
+
+    off
+    on
+    error
+    fast
+    normal
+    idle
+    standby
+
+"off" means the regulator is not supplying power to the
+system.
+
+"on" means the regulator is supplying power to the system,
+and the regulator can't report a detailed operation mode.
+
+"error" indicates an out-of-regulation status such as being
+disabled due to thermal shutdown, or voltage being unstable
+because of problems with the input power supply.
+
+"fast", "normal", "idle", and "standby" are all detailed
+regulator operation modes (described elsewhere). They
+imply "on", but provide more detail.
+
+Note that regulator status is a function of many inputs,
+not limited to control inputs from Linux. For example,
+the actual load presented may trigger "error" status; or
+a regulator may be enabled by another user, even though
+Linux did not enable it.
+
+
 What: /sys/class/regulator/.../type
@@ -58,7 +96,7 @@ Description:
 Some regulator directories will contain a field called
 microvolts. This holds the regulator output voltage setting
 measured in microvolts (i.e. E-6 Volts), for regulators
-which can report that voltage.
+which can report the control input for voltage.
 
 NOTE: This value should not be used to determine the regulator
 output voltage level as this value is the same regardless of
@@ -73,7 +111,7 @@ Description:
 Some regulator directories will contain a field called
 microamps. This holds the regulator output current limit
 setting measured in microamps (i.e. E-6 Amps), for regulators
-which can report that current.
+which can report the control input for a current limit.
 
 NOTE: This value should not be used to determine the regulator
 output current level as this value is the same regardless of
@@ -87,7 +125,7 @@ Contact: Liam Girdwood <lrg@slimlogic.co.uk>
 Description:
 Some regulator directories will contain a field called
 opmode. This holds the current regulator operating mode,
-for regulators which can report it.
+for regulators which can report that control input value.
 
 The opmode value can be one of the following strings:
 
@@ -101,7 +139,8 @@ Description:
 
 NOTE: This value should not be used to determine the regulator
 output operating mode as this value is the same regardless of
-whether the regulator is enabled or disabled.
+whether the regulator is enabled or disabled. A "status"
+attribute may be available to determine the actual mode.
 
 
 What: /sys/class/regulator/.../min_microvolts
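Reviewer note: the regulator hunks above separate the "state" attribute (the enable control requested through Linux) from the new "status" attribute (what the regulator is actually doing). A rough sketch of how a userspace tool might combine the two strings follows. The function name `describe` and the returned dictionary keys are hypothetical, and treating "error" as "cannot tell" is an interpretation of the text, not documented behaviour.

```python
# Detailed operation modes; per the description they all imply "on".
DETAILED_MODES = {"fast", "normal", "idle", "standby"}


def describe(state, status):
    """Combine the 'state' control input with the 'status' output reading."""
    if status == "off":
        supplying = False
    elif status == "on" or status in DETAILED_MODES:
        supplying = True
    else:
        supplying = None  # "error" or an unrecognized string: cannot tell

    # 'enabled' / 'disabled' map to a request; 'unknown' maps to None.
    requested = {"enabled": True, "disabled": False}.get(state)

    # A mismatch is only meaningful when both sides are known.
    mismatch = (supplying is not None and requested is not None
                and supplying != requested)
    return {"requested_on": requested, "supplying": supplying,
            "mismatch": mismatch}
```

A result such as `describe("disabled", "normal")` flags a mismatch, which corresponds to the case mentioned in the text where a regulator is enabled by another user even though Linux did not enable it.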
@@ -1,6 +1,6 @@
 What: /sys/firmware/memmap/
 Date: June 2008
-Contact: Bernhard Walle <bwalle@suse.de>
+Contact: Bernhard Walle <bernhard.walle@gmx.de>
 Description:
 On all platforms, the firmware provides a memory map which the
 kernel reads. The resources from that memory map are registered
81
Documentation/ABI/testing/sysfs-fs-ext4
Normal file
@@ -0,0 +1,81 @@
+What: /sys/fs/ext4/<disk>/mb_stats
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    Controls whether the multiblock allocator should
+    collect statistics, which are shown during the unmount.
+    1 means to collect statistics, 0 means not to collect
+    statistics
+
+What: /sys/fs/ext4/<disk>/mb_group_prealloc
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    The multiblock allocator will round up allocation
+    requests to a multiple of this tuning parameter if the
+    stripe size is not set in the ext4 superblock
+
+What: /sys/fs/ext4/<disk>/mb_max_to_scan
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    The maximum number of extents the multiblock allocator
+    will search to find the best extent
+
+What: /sys/fs/ext4/<disk>/mb_min_to_scan
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    The minimum number of extents the multiblock allocator
+    will search to find the best extent
+
+What: /sys/fs/ext4/<disk>/mb_order2_req
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    Tuning parameter which controls the minimum size for
+    requests (as a power of 2) where the buddy cache is
+    used
+
+What: /sys/fs/ext4/<disk>/mb_stream_req
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    Files which have fewer blocks than this tunable
+    parameter will have their blocks allocated out of a
+    block group specific preallocation pool, so that small
+    files are packed closely together. Each large file
+    will have its blocks allocated out of its own unique
+    preallocation pool.
+
+What: /sys/fs/ext4/<disk>/inode_readahead
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    Tuning parameter which controls the maximum number of
+    inode table blocks that ext4's inode table readahead
+    algorithm will pre-read into the buffer cache
+
+What: /sys/fs/ext4/<disk>/delayed_allocation_blocks
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    This file is read-only and shows the number of blocks
+    that are dirty in the page cache, but which do not
+    have their location in the filesystem allocated yet.
+
+What: /sys/fs/ext4/<disk>/lifetime_write_kbytes
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    This file is read-only and shows the number of kilobytes
+    of data that have been written to this filesystem since it was
+    created.
+
+What: /sys/fs/ext4/<disk>/session_write_kbytes
+Date: March 2008
+Contact: "Theodore Ts'o" <tytso@mit.edu>
+Description:
+    This file is read-only and shows the number of
+    kilobytes of data that have been written to this
+    filesystem since it was mounted.
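Reviewer note: the three read-only ext4 attributes described above can be collected with a short helper. This is a sketch that assumes each file holds a single decimal integer followed by a newline, which the descriptions imply but do not state; `read_ext4_counters` and its `root` parameter are hypothetical names, the latter only there so the function can be pointed at a test directory.

```python
from pathlib import Path

# The read-only counters listed in sysfs-fs-ext4.
READ_ONLY_COUNTERS = (
    "delayed_allocation_blocks",
    "lifetime_write_kbytes",
    "session_write_kbytes",
)


def read_ext4_counters(disk, root="/sys/fs/ext4"):
    """Return {attribute: int} for the read-only counters of one filesystem.

    disk is the <disk> directory name under root, for example "sda1".
    """
    base = Path(root) / disk
    return {name: int((base / name).read_text().strip())
            for name in READ_ONLY_COUNTERS}
```

Comparing `session_write_kbytes` between two calls gives the amount written in between, as long as the filesystem was not remounted.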
@@ -609,3 +609,109 @@ size is the size (and should be a page-sized multiple).
|
|||||||
The return value will be either a pointer to the processor virtual
|
The return value will be either a pointer to the processor virtual
|
||||||
address of the memory, or an error (via PTR_ERR()) if any part of the
|
address of the memory, or an error (via PTR_ERR()) if any part of the
|
||||||
region is occupied.
|
region is occupied.
|
||||||
|
|
||||||
|
Part III - Debug drivers use of the DMA-API
|
||||||
|
-------------------------------------------
|
||||||
|
|
||||||
|
The DMA-API as described above as some constraints. DMA addresses must be
|
||||||
|
released with the corresponding function with the same size for example. With
|
||||||
|
the advent of hardware IOMMUs it becomes more and more important that drivers
|
||||||
|
do not violate those constraints. In the worst case such a violation can
|
||||||
|
result in data corruption up to destroyed filesystems.
|
||||||
|
|
||||||
|
To debug drivers and find bugs in the usage of the DMA-API checking code can
|
||||||
|
be compiled into the kernel which will tell the developer about those
|
||||||
|
violations. If your architecture supports it you can select the "Enable
|
||||||
|
debugging of DMA-API usage" option in your kernel configuration. Enabling this
|
||||||
|
option has a performance impact. Do not enable it in production kernels.
|
||||||
|
|
||||||
|
If you boot the resulting kernel will contain code which does some bookkeeping
|
||||||
|
about what DMA memory was allocated for which device. If this code detects an
|
||||||
|
error it prints a warning message with some details into your kernel log. An
|
||||||
|
example warning message may look like this:
|
||||||
|
|
||||||
|
------------[ cut here ]------------
|
||||||
|
WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448
|
||||||
|
check_unmap+0x203/0x490()
|
||||||
|
Hardware name:
|
||||||
|
forcedeth 0000:00:08.0: DMA-API: device driver frees DMA memory with wrong
|
||||||
|
function [device address=0x00000000640444be] [size=66 bytes] [mapped as
|
||||||
|
single] [unmapped as page]
|
||||||
|
Modules linked in: nfsd exportfs bridge stp llc r8169
|
||||||
|
Pid: 0, comm: swapper Tainted: G W 2.6.28-dmatest-09289-g8bb99c0 #1
|
||||||
|
Call Trace:
|
||||||
|
<IRQ> [<ffffffff80240b22>] warn_slowpath+0xf2/0x130
|
||||||
|
[<ffffffff80647b70>] _spin_unlock+0x10/0x30
|
||||||
|
[<ffffffff80537e75>] usb_hcd_link_urb_to_ep+0x75/0xc0
|
||||||
|
[<ffffffff80647c22>] _spin_unlock_irqrestore+0x12/0x40
|
||||||
|
[<ffffffff8055347f>] ohci_urb_enqueue+0x19f/0x7c0
|
||||||
|
[<ffffffff80252f96>] queue_work+0x56/0x60
|
||||||
|
[<ffffffff80237e10>] enqueue_task_fair+0x20/0x50
|
||||||
|
[<ffffffff80539279>] usb_hcd_submit_urb+0x379/0xbc0
|
||||||
|
[<ffffffff803b78c3>] cpumask_next_and+0x23/0x40
|
||||||
|
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
|
||||||
|
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
|
||||||
|
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
|
||||||
|
[<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
|
||||||
|
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
|
||||||
|
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
|
||||||
|
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
|
||||||
|
[<ffffffff8026ffe9>] handle_edge_irq+0xc9/0x150
|
||||||
|
[<ffffffff8020e3ab>] do_IRQ+0xcb/0x1c0
|
||||||
|
[<ffffffff8020c093>] ret_from_intr+0x0/0xa
|
||||||
|
<EOI> <4>---[ end trace f6435a98e2a38c0e ]---
|
||||||
|
|
||||||
|
The driver developer can find the driver and the device including a stacktrace
|
||||||
|
of the DMA-API call which caused this warning.
|
||||||
|
|
||||||
|
Per default only the first error will result in a warning message. All other
|
||||||
|
errors will only silently counted. This limitation exist to prevent the code
|
||||||
|
from flooding your kernel log. To support debugging a device driver this can
|
||||||
|
be disabled via debugfs. See the debugfs interface documentation below for
|
||||||
|
details.
|
||||||
|
|
||||||
|
The debugfs directory for the DMA-API debugging code is called dma-api/. In
|
||||||
|
this directory the following files can currently be found:
|
||||||
|
|
||||||
|
dma-api/all_errors This file contains a numeric value. If this
|
||||||
|
value is not equal to zero the debugging code
|
||||||
|
will print a warning for every error it finds
|
||||||
|
into the kernel log. Be carefull with this
|
||||||
|
option. It can easily flood your logs.
|
||||||
|
|
||||||
|
dma-api/disabled This read-only file contains the character 'Y'
|
||||||
|
if the debugging code is disabled. This can
|
||||||
|
happen when it runs out of memory or if it was
|
||||||
|
disabled at boot time.
|
||||||
|
|
||||||
|
dma-api/error_count This file is read-only and shows the total
|
||||||
|
number of errors found.
|
||||||
|
|
||||||
|
dma-api/num_errors The number in this file shows how many
|
||||||
|
warnings will be printed to the kernel log
|
||||||
|
before it stops. This number is initialized to
|
||||||
|
one at system boot and can be set by writing into
|
||||||
|
this file.
|
||||||
|
|
||||||
|
dma-api/min_free_entries
|
||||||
|
This read-only file can be read to get the
|
||||||
|
minimum number of free dma_debug_entries the
|
||||||
|
allocator has ever seen. If this value goes
|
||||||
|
down to zero the code will disable itself
|
||||||
|
because it is no longer reliable.
|
||||||
|
|
||||||
|
dma-api/num_free_entries
|
||||||
|
The current number of free dma_debug_entries
|
||||||
|
in the allocator.
|
||||||
|
|
||||||
|
If you have this code compiled into your kernel it will be enabled by default.
|
||||||
|
If you want to boot without the bookkeeping anyway you can provide
|
||||||
|
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
|
||||||
|
Note that you cannot enable it again at runtime. You have to reboot to do
|
||||||
|
so.
|
||||||
|
|
||||||
|
When the code disables itself at runtime this is most likely because it ran
|
||||||
|
out of dma_debug_entries. These entries are preallocated at boot. The number
|
||||||
|
of preallocated entries is defined per architecture. If it is too low for you,
|
||||||
|
boot with 'dma_debug_entries=<your_desired_number>' to override the
|
||||||
|
architectural default.
|
||||||
|
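The interplay of error_count, num_errors and all_errors described above can be sketched as a small userspace model. This is only an illustration of the documented policy, not the kernel's code: the struct and the function names report_error() and count_warnings() are hypothetical, and the fields merely mirror the debugfs files of the same name.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the warning policy; the fields mirror
 * the debugfs files dma-api/error_count, num_errors and all_errors. */
struct dma_debug_policy {
	unsigned int error_count;	/* total errors seen so far */
	unsigned int num_errors;	/* warnings still allowed */
	unsigned int all_errors;	/* non-zero: warn on every error */
};

/* Record one error; return true if a warning would be logged for it. */
static bool report_error(struct dma_debug_policy *p)
{
	bool warn = p->all_errors != 0 || p->num_errors > 0;

	p->error_count++;
	/* The warning budget is only consumed while all_errors is off. */
	if (!p->all_errors && p->num_errors > 0)
		p->num_errors--;
	return warn;
}

/* Feed n errors through the policy and count how many would be logged. */
static unsigned int count_warnings(struct dma_debug_policy *p, unsigned int n)
{
	unsigned int warned = 0;

	while (n--)
		if (report_error(p))
			warned++;
	return warned;
}
```

With the boot-time default (num_errors set to one, all_errors zero), three errors in this model yield one warning while error_count still reaches three.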
@@ -136,7 +136,7 @@ exactly why.
|
|||||||
The standard 32-bit addressing PCI device would do something like
|
The standard 32-bit addressing PCI device would do something like
|
||||||
this:
|
this:
|
||||||
|
|
||||||
if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
|
if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
|
||||||
printk(KERN_WARNING
|
printk(KERN_WARNING
|
||||||
"mydev: No suitable DMA available.\n");
|
"mydev: No suitable DMA available.\n");
|
||||||
goto ignore_this_device;
|
goto ignore_this_device;
|
||||||
@@ -155,9 +155,9 @@ all 64-bits when accessing streaming DMA:
|
|||||||
|
|
||||||
int using_dac;
|
int using_dac;
|
||||||
|
|
||||||
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
|
if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
|
||||||
using_dac = 1;
|
using_dac = 1;
|
||||||
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
|
} else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
|
||||||
using_dac = 0;
|
using_dac = 0;
|
||||||
} else {
|
} else {
|
||||||
printk(KERN_WARNING
|
printk(KERN_WARNING
|
||||||
@@ -170,14 +170,14 @@ the case would look like this:
|
|||||||
|
|
||||||
int using_dac, consistent_using_dac;
|
int using_dac, consistent_using_dac;
|
||||||
|
|
||||||
if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
|
if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
|
||||||
using_dac = 1;
|
using_dac = 1;
|
||||||
consistent_using_dac = 1;
|
consistent_using_dac = 1;
|
||||||
pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
|
pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
|
||||||
} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
|
} else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
|
||||||
using_dac = 0;
|
using_dac = 0;
|
||||||
consistent_using_dac = 0;
|
consistent_using_dac = 0;
|
||||||
pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK);
|
pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
|
||||||
} else {
|
} else {
|
||||||
printk(KERN_WARNING
|
printk(KERN_WARNING
|
||||||
"mydev: No suitable DMA available.\n");
|
"mydev: No suitable DMA available.\n");
|
||||||
@@ -192,7 +192,7 @@ check the return value from pci_set_consistent_dma_mask().
|
|||||||
Finally, if your device can only drive the low 24-bits of
|
Finally, if your device can only drive the low 24-bits of
|
||||||
address during PCI bus mastering you might do something like:
|
address during PCI bus mastering you might do something like:
|
||||||
|
|
||||||
if (pci_set_dma_mask(pdev, DMA_24BIT_MASK)) {
|
if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) {
|
||||||
printk(KERN_WARNING
|
printk(KERN_WARNING
|
||||||
"mydev: 24-bit DMA addressing not available.\n");
|
"mydev: 24-bit DMA addressing not available.\n");
|
||||||
goto ignore_this_device;
|
goto ignore_this_device;
|
||||||
@@ -213,7 +213,7 @@ most specific mask.
|
|||||||
|
|
||||||
Here is pseudo-code showing how this might be done:
|
Here is pseudo-code showing how this might be done:
|
||||||
|
|
||||||
#define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK
|
#define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32)
|
||||||
#define RECORD_ADDRESS_BITS 0x00ffffff
|
#define RECORD_ADDRESS_BITS 0x00ffffff
|
||||||
|
|
||||||
struct my_sound_card *card;
|
struct my_sound_card *card;
|
||||||
|
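The hunks above replace the fixed DMA_*BIT_MASK constants with DMA_BIT_MASK(n). The following userspace sketch shows what the masks evaluate to. The macro is written in the same shape as the kernel's definition as far as I recall it, so treat it as an illustration; addr_fits_mask() is a hypothetical helper that exists only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative definition: n == 64 is special-cased because shifting a
 * 64-bit value by 64 bits is undefined behaviour in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can a device restricted to `mask` address every
 * byte of a buffer of `len` bytes that starts at bus address `addr`? */
static bool addr_fits_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
	uint64_t last;

	if (len == 0)
		return false;
	last = addr + (len - 1);
	if (last < addr)		/* address range wrapped around */
		return false;
	return last <= mask;
}
```

Note that DMA_BIT_MASK(24) equals 0x00ffffff, the literal still used for RECORD_ADDRESS_BITS in the pseudo-code above.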
4
Documentation/DocBook/.gitignore
vendored
@@ -4,3 +4,7 @@
|
|||||||
*.html
|
*.html
|
||||||
*.9.gz
|
*.9.gz
|
||||||
*.9
|
*.9
|
||||||
|
*.aux
|
||||||
|
*.dvi
|
||||||
|
*.log
|
||||||
|
*.out
|
||||||
|
@@ -6,13 +6,14 @@
|
|||||||
# To add a new book the only step required is to add the book to the
|
# To add a new book the only step required is to add the book to the
|
||||||
# list of DOCBOOKS.
|
# list of DOCBOOKS.
|
||||||
|
|
||||||
DOCBOOKS := z8530book.xml mcabook.xml \
|
DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
|
||||||
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
||||||
procfs-guide.xml writing_usb_driver.xml networking.xml \
|
procfs-guide.xml writing_usb_driver.xml networking.xml \
|
||||||
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
||||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||||
mac80211.xml debugobjects.xml sh.xml regulator.xml
|
mac80211.xml debugobjects.xml sh.xml regulator.xml \
|
||||||
|
alsa-driver-api.xml writing-an-alsa-driver.xml
|
||||||
|
|
||||||
###
|
###
|
||||||
# The build process is as follows (targets):
|
# The build process is as follows (targets):
|
||||||
|
@@ -1,11 +1,11 @@
|
|||||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||||
<book>
|
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||||
<?dbhtml filename="index.html">
|
|
||||||
|
|
||||||
<!-- ****************************************************** -->
|
<!-- ****************************************************** -->
|
||||||
<!-- Header -->
|
<!-- Header -->
|
||||||
<!-- ****************************************************** -->
|
<!-- ****************************************************** -->
|
||||||
|
<book id="ALSA-Driver-API">
|
||||||
<bookinfo>
|
<bookinfo>
|
||||||
<title>The ALSA Driver API</title>
|
<title>The ALSA Driver API</title>
|
||||||
|
|
||||||
@@ -35,6 +35,8 @@
|
|||||||
|
|
||||||
</bookinfo>
|
</bookinfo>
|
||||||
|
|
||||||
|
<toc></toc>
|
||||||
|
|
||||||
<chapter><title>Management of Cards and Devices</title>
|
<chapter><title>Management of Cards and Devices</title>
|
||||||
<sect1><title>Card Management</title>
|
<sect1><title>Card Management</title>
|
||||||
!Esound/core/init.c
|
!Esound/core/init.c
|
||||||
@@ -71,6 +73,10 @@
|
|||||||
!Esound/pci/ac97/ac97_codec.c
|
!Esound/pci/ac97/ac97_codec.c
|
||||||
!Esound/pci/ac97/ac97_pcm.c
|
!Esound/pci/ac97/ac97_pcm.c
|
||||||
</sect1>
|
</sect1>
|
||||||
|
<sect1><title>Virtual Master Control API</title>
|
||||||
|
!Esound/core/vmaster.c
|
||||||
|
!Iinclude/sound/control.h
|
||||||
|
</sect1>
|
||||||
</chapter>
|
</chapter>
|
||||||
<chapter><title>MIDI API</title>
|
<chapter><title>MIDI API</title>
|
||||||
<sect1><title>Raw MIDI API</title>
|
<sect1><title>Raw MIDI API</title>
|
||||||
@@ -88,6 +94,9 @@
|
|||||||
<chapter><title>Miscellaneous Functions</title>
|
<chapter><title>Miscellaneous Functions</title>
|
||||||
<sect1><title>Hardware-Dependent Devices API</title>
|
<sect1><title>Hardware-Dependent Devices API</title>
|
||||||
!Esound/core/hwdep.c
|
!Esound/core/hwdep.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Jack Abstraction Layer API</title>
|
||||||
|
!Esound/core/jack.c
|
||||||
</sect1>
|
</sect1>
|
||||||
<sect1><title>ISA DMA Helpers</title>
|
<sect1><title>ISA DMA Helpers</title>
|
||||||
!Esound/core/isadma.c
|
!Esound/core/isadma.c
|
418
Documentation/DocBook/device-drivers.tmpl
Normal file
@@ -0,0 +1,418 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||||
|
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||||
|
|
||||||
|
<book id="LinuxDriversAPI">
|
||||||
|
<bookinfo>
|
||||||
|
<title>Linux Device Drivers</title>
|
||||||
|
|
||||||
|
<legalnotice>
|
||||||
|
<para>
|
||||||
|
This documentation is free software; you can redistribute
|
||||||
|
it and/or modify it under the terms of the GNU General Public
|
||||||
|
License as published by the Free Software Foundation; either
|
||||||
|
version 2 of the License, or (at your option) any later
|
||||||
|
version.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This program is distributed in the hope that it will be
|
||||||
|
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||||
|
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||||
|
See the GNU General Public License for more details.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
You should have received a copy of the GNU General Public
|
||||||
|
License along with this program; if not, write to the Free
|
||||||
|
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||||
|
MA 02111-1307 USA
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
For more details see the file COPYING in the source
|
||||||
|
distribution of Linux.
|
||||||
|
</para>
|
||||||
|
</legalnotice>
|
||||||
|
</bookinfo>
|
||||||
|
|
||||||
|
<toc></toc>
|
||||||
|
|
||||||
|
<chapter id="Basics">
|
||||||
|
<title>Driver Basics</title>
|
||||||
|
<sect1><title>Driver Entry and Exit points</title>
|
||||||
|
!Iinclude/linux/init.h
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1><title>Atomic and pointer manipulation</title>
|
||||||
|
!Iarch/x86/include/asm/atomic_32.h
|
||||||
|
!Iarch/x86/include/asm/unaligned.h
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1><title>Delaying, scheduling, and timer routines</title>
|
||||||
|
!Iinclude/linux/sched.h
|
||||||
|
!Ekernel/sched.c
|
||||||
|
!Ekernel/timer.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>High-resolution timers</title>
|
||||||
|
!Iinclude/linux/ktime.h
|
||||||
|
!Iinclude/linux/hrtimer.h
|
||||||
|
!Ekernel/hrtimer.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Workqueues and Kevents</title>
|
||||||
|
!Ekernel/workqueue.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Internal Functions</title>
|
||||||
|
!Ikernel/exit.c
|
||||||
|
!Ikernel/signal.c
|
||||||
|
!Iinclude/linux/kthread.h
|
||||||
|
!Ekernel/kthread.c
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1><title>Kernel objects manipulation</title>
|
||||||
|
<!--
|
||||||
|
X!Iinclude/linux/kobject.h
|
||||||
|
-->
|
||||||
|
!Elib/kobject.c
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1><title>Kernel utility functions</title>
|
||||||
|
!Iinclude/linux/kernel.h
|
||||||
|
!Ekernel/printk.c
|
||||||
|
!Ekernel/panic.c
|
||||||
|
!Ekernel/sys.c
|
||||||
|
!Ekernel/rcupdate.c
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1><title>Device Resource Management</title>
|
||||||
|
!Edrivers/base/devres.c
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="devdrivers">
|
||||||
|
<title>Device drivers infrastructure</title>
|
||||||
|
<sect1><title>Device Drivers Base</title>
|
||||||
|
<!--
|
||||||
|
X!Iinclude/linux/device.h
|
||||||
|
-->
|
||||||
|
!Edrivers/base/driver.c
|
||||||
|
!Edrivers/base/core.c
|
||||||
|
!Edrivers/base/class.c
|
||||||
|
!Edrivers/base/firmware_class.c
|
||||||
|
!Edrivers/base/transport_class.c
|
||||||
|
<!-- Cannot be included, because
|
||||||
|
attribute_container_add_class_device_adapter
|
||||||
|
and attribute_container_classdev_to_container
|
||||||
|
exceed allowed 44 characters maximum
|
||||||
|
X!Edrivers/base/attribute_container.c
|
||||||
|
-->
|
||||||
|
!Edrivers/base/sys.c
|
||||||
|
<!--
|
||||||
|
X!Edrivers/base/interface.c
|
||||||
|
-->
|
||||||
|
!Edrivers/base/platform.c
|
||||||
|
!Edrivers/base/bus.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Device Drivers Power Management</title>
|
||||||
|
!Edrivers/base/power/main.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Device Drivers ACPI Support</title>
|
||||||
|
<!-- Internal functions only
|
||||||
|
X!Edrivers/acpi/sleep/main.c
|
||||||
|
X!Edrivers/acpi/sleep/wakeup.c
|
||||||
|
X!Edrivers/acpi/motherboard.c
|
||||||
|
X!Edrivers/acpi/bus.c
|
||||||
|
-->
|
||||||
|
!Edrivers/acpi/scan.c
|
||||||
|
!Idrivers/acpi/scan.c
|
||||||
|
<!-- No correct structured comments
|
||||||
|
X!Edrivers/acpi/pci_bind.c
|
||||||
|
-->
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Device drivers PnP support</title>
|
||||||
|
!Idrivers/pnp/core.c
|
||||||
|
<!-- No correct structured comments
|
||||||
|
X!Edrivers/pnp/system.c
|
||||||
|
-->
|
||||||
|
!Edrivers/pnp/card.c
|
||||||
|
!Idrivers/pnp/driver.c
|
||||||
|
!Edrivers/pnp/manager.c
|
||||||
|
!Edrivers/pnp/support.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Userspace IO devices</title>
|
||||||
|
!Edrivers/uio/uio.c
|
||||||
|
!Iinclude/linux/uio_driver.h
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="parportdev">
|
||||||
|
<title>Parallel Port Devices</title>
|
||||||
|
!Iinclude/linux/parport.h
|
||||||
|
!Edrivers/parport/ieee1284.c
|
||||||
|
!Edrivers/parport/share.c
|
||||||
|
!Idrivers/parport/daisy.c
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="message_devices">
|
||||||
|
<title>Message-based devices</title>
|
||||||
|
<sect1><title>Fusion message devices</title>
|
||||||
|
!Edrivers/message/fusion/mptbase.c
|
||||||
|
!Idrivers/message/fusion/mptbase.c
|
||||||
|
!Edrivers/message/fusion/mptscsih.c
|
||||||
|
!Idrivers/message/fusion/mptscsih.c
|
||||||
|
!Idrivers/message/fusion/mptctl.c
|
||||||
|
!Idrivers/message/fusion/mptspi.c
|
||||||
|
!Idrivers/message/fusion/mptfc.c
|
||||||
|
!Idrivers/message/fusion/mptlan.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>I2O message devices</title>
|
||||||
|
!Iinclude/linux/i2o.h
|
||||||
|
!Idrivers/message/i2o/core.h
|
||||||
|
!Edrivers/message/i2o/iop.c
|
||||||
|
!Idrivers/message/i2o/iop.c
|
||||||
|
!Idrivers/message/i2o/config-osm.c
|
||||||
|
!Edrivers/message/i2o/exec-osm.c
|
||||||
|
!Idrivers/message/i2o/exec-osm.c
|
||||||
|
!Idrivers/message/i2o/bus-osm.c
|
||||||
|
!Edrivers/message/i2o/device.c
|
||||||
|
!Idrivers/message/i2o/device.c
|
||||||
|
!Idrivers/message/i2o/driver.c
|
||||||
|
!Idrivers/message/i2o/pci.c
|
||||||
|
!Idrivers/message/i2o/i2o_block.c
|
||||||
|
!Idrivers/message/i2o/i2o_scsi.c
|
||||||
|
!Idrivers/message/i2o/i2o_proc.c
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="snddev">
|
||||||
|
<title>Sound Devices</title>
|
||||||
|
!Iinclude/sound/core.h
|
||||||
|
!Esound/sound_core.c
|
||||||
|
!Iinclude/sound/pcm.h
|
||||||
|
!Esound/core/pcm.c
|
||||||
|
!Esound/core/device.c
|
||||||
|
!Esound/core/info.c
|
||||||
|
!Esound/core/rawmidi.c
|
||||||
|
!Esound/core/sound.c
|
||||||
|
!Esound/core/memory.c
|
||||||
|
!Esound/core/pcm_memory.c
|
||||||
|
!Esound/core/init.c
|
||||||
|
!Esound/core/isadma.c
|
||||||
|
!Esound/core/control.c
|
||||||
|
!Esound/core/pcm_lib.c
|
||||||
|
!Esound/core/hwdep.c
|
||||||
|
!Esound/core/pcm_native.c
|
||||||
|
!Esound/core/memalloc.c
|
||||||
|
<!-- FIXME: Removed for now since no structured comments in source
|
||||||
|
X!Isound/sound_firmware.c
|
||||||
|
-->
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="uart16x50">
|
||||||
|
<title>16x50 UART Driver</title>
|
||||||
|
!Iinclude/linux/serial_core.h
|
||||||
|
!Edrivers/serial/serial_core.c
|
||||||
|
!Edrivers/serial/8250.c
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="fbdev">
|
||||||
|
<title>Frame Buffer Library</title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The frame buffer drivers depend heavily on four data structures.
|
||||||
|
These structures are declared in include/linux/fb.h. They are
|
||||||
|
fb_info, fb_var_screeninfo, fb_fix_screeninfo and fb_monospecs.
|
||||||
|
The last three can be made available to and from userland.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
fb_info defines the current state of a particular video card.
|
||||||
|
Inside fb_info, there exists a fb_ops structure which is a
|
||||||
|
collection of needed functions to make fbdev and fbcon work.
|
||||||
|
fb_info is only visible to the kernel.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
fb_var_screeninfo is used to describe the features of a video card
|
||||||
|
that are user defined. With fb_var_screeninfo, things such as
|
||||||
|
depth and the resolution may be defined.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The next structure is fb_fix_screeninfo. This defines the
|
||||||
|
properties of a card that are created when a mode is set and can't
|
||||||
|
be changed otherwise. A good example of this is the start of the
|
||||||
|
frame buffer memory. This "locks" the address of the frame buffer
|
||||||
|
memory, so that it cannot be changed or moved.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The last structure is fb_monospecs. In the old API, there was
|
||||||
|
little importance for fb_monospecs. This allowed for forbidden things
|
||||||
|
such as setting a mode of 800x600 on a fixed-frequency monitor. With
|
||||||
|
the new API, fb_monospecs prevents such things, and if used
|
||||||
|
correctly, can prevent a monitor from being cooked. fb_monospecs
|
||||||
|
will not be useful until kernels 2.5.x.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<sect1><title>Frame Buffer Memory</title>
|
||||||
|
!Edrivers/video/fbmem.c
|
||||||
|
</sect1>
|
||||||
|
<!--
|
||||||
|
<sect1><title>Frame Buffer Console</title>
|
||||||
|
X!Edrivers/video/console/fbcon.c
|
||||||
|
</sect1>
|
||||||
|
-->
|
||||||
|
<sect1><title>Frame Buffer Colormap</title>
|
||||||
|
!Edrivers/video/fbcmap.c
|
||||||
|
</sect1>
|
||||||
|
<!-- FIXME:
|
||||||
|
drivers/video/fbgen.c has no docs, which stuffs up the sgml. Comment
|
||||||
|
out until somebody adds docs. KAO
|
||||||
|
<sect1><title>Frame Buffer Generic Functions</title>
|
||||||
|
X!Idrivers/video/fbgen.c
|
||||||
|
</sect1>
|
||||||
|
KAO -->
|
||||||
|
<sect1><title>Frame Buffer Video Mode Database</title>
|
||||||
|
!Idrivers/video/modedb.c
|
||||||
|
!Edrivers/video/modedb.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Frame Buffer Macintosh Video Mode Database</title>
|
||||||
|
!Edrivers/video/macmodes.c
|
||||||
|
</sect1>
|
||||||
|
<sect1><title>Frame Buffer Fonts</title>
|
||||||
|
<para>
|
||||||
|
Refer to the file drivers/video/console/fonts.c for more information.
|
||||||
|
</para>
|
||||||
|
<!-- FIXME: Removed for now since no structured comments in source
|
||||||
|
X!Idrivers/video/console/fonts.c
|
||||||
|
-->
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="input_subsystem">
|
||||||
|
<title>Input Subsystem</title>
|
||||||
|
!Iinclude/linux/input.h
|
||||||
|
!Edrivers/input/input.c
|
||||||
|
!Edrivers/input/ff-core.c
|
||||||
|
!Edrivers/input/ff-memless.c
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="spi">
|
||||||
|
<title>Serial Peripheral Interface (SPI)</title>
|
||||||
|
<para>
|
||||||
|
SPI is the "Serial Peripheral Interface", widely used with
|
||||||
|
embedded systems because it is a simple and efficient
|
||||||
|
interface: basically a multiplexed shift register.
|
||||||
|
Its three signal wires hold a clock (SCK, often in the range
|
||||||
|
of 1-20 MHz), a "Master Out, Slave In" (MOSI) data line, and
|
||||||
|
a "Master In, Slave Out" (MISO) data line.
|
||||||
|
SPI is a full duplex protocol; for each bit shifted out the
|
||||||
|
MOSI line (one per clock) another is shifted in on the MISO line.
|
||||||
|
Those bits are assembled into words of various sizes on the
|
||||||
|
way to and from system memory.
|
||||||
|
An additional chipselect line is usually active-low (nCS);
|
||||||
|
four signals are normally used for each peripheral, plus
|
||||||
|
sometimes an interrupt.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The SPI bus facilities listed here provide a generalized
|
||||||
|
interface to declare SPI busses and devices, manage them
|
||||||
|
according to the standard Linux driver model, and perform
|
||||||
|
input/output operations.
|
||||||
|
At this time, only "master" side interfaces are supported,
|
||||||
|
where Linux talks to SPI peripherals and does not implement
|
||||||
|
such a peripheral itself.
|
||||||
|
(Interfaces to support implementing SPI slaves would
|
||||||
|
necessarily look different.)
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The programming interface is structured around two kinds of driver,
|
||||||
|
and two kinds of device.
|
||||||
|
A "Controller Driver" abstracts the controller hardware, which may
|
||||||
|
be as simple as a set of GPIO pins or as complex as a pair of FIFOs
|
||||||
|
connected to dual DMA engines on the other side of the SPI shift
|
||||||
|
register (maximizing throughput). Such drivers bridge between
|
||||||
|
whatever bus they sit on (often the platform bus) and SPI, and
|
||||||
|
expose the SPI side of their device as a
|
||||||
|
<structname>struct spi_master</structname>.
|
||||||
|
SPI devices are children of that master, represented as a
|
||||||
|
<structname>struct spi_device</structname> and manufactured from
|
||||||
|
<structname>struct spi_board_info</structname> descriptors which
|
||||||
|
are usually provided by board-specific initialization code.
|
||||||
|
A <structname>struct spi_driver</structname> is called a
|
||||||
|
"Protocol Driver", and is bound to a spi_device using normal
|
||||||
|
driver model calls.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The I/O model is a set of queued messages. Protocol drivers
|
||||||
|
submit one or more <structname>struct spi_message</structname>
|
||||||
|
objects, which are processed and completed asynchronously.
|
||||||
|
(There are synchronous wrappers, however.) Messages are
|
||||||
|
built from one or more <structname>struct spi_transfer</structname>
|
||||||
|
objects, each of which wraps a full duplex SPI transfer.
|
||||||
|
A variety of protocol tweaking options are needed, because
|
||||||
|
different chips adopt very different policies for how they
|
||||||
|
use the bits transferred with SPI.
|
||||||
|
</para>
|
||||||
|
!Iinclude/linux/spi/spi.h
|
||||||
|
!Fdrivers/spi/spi.c spi_register_board_info
|
||||||
|
!Edrivers/spi/spi.c
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="i2c">
|
||||||
|
<title>I<superscript>2</superscript>C and SMBus Subsystem</title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
I<superscript>2</superscript>C (or without fancy typography, "I2C")
|
||||||
|
is an acronym for the "Inter-IC" bus, a simple bus protocol which is
|
||||||
|
widely used where low data rate communications suffice.
|
||||||
|
Since it's also a licensed trademark, some vendors use another
|
||||||
|
name (such as "Two-Wire Interface", TWI) for the same bus.
|
||||||
|
I2C only needs two signals (SCL for clock, SDA for data), conserving
|
||||||
|
board real estate and minimizing signal quality issues.
|
||||||
|
Most I2C devices use seven bit addresses, and bus speeds of up
|
||||||
|
to 400 kHz; there's a high speed extension (3.4 MHz) that has not yet
|
||||||
|
found wide use.
|
||||||
|
I2C is a multi-master bus; open drain signaling is used to
|
||||||
|
arbitrate between masters, as well as to handshake and to
|
||||||
|
synchronize clocks from slower clients.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The Linux I2C programming interfaces support only the master
|
||||||
|
side of bus interactions, not the slave side.
|
||||||
|
The programming interface is structured around two kinds of driver,
|
||||||
|
and two kinds of device.
|
||||||
|
An I2C "Adapter Driver" abstracts the controller hardware; it binds
|
||||||
|
to a physical device (perhaps a PCI device or platform_device) and
|
||||||
|
exposes a <structname>struct i2c_adapter</structname> representing
|
||||||
|
each I2C bus segment it manages.
|
||||||
|
On each I2C bus segment will be I2C devices represented by a
|
||||||
|
<structname>struct i2c_client</structname>. Those devices will
|
||||||
|
be bound to a <structname>struct i2c_driver</structname>,
|
||||||
|
which should follow the standard Linux driver model.
|
||||||
|
(At this writing, a legacy model is more widely used.)
|
||||||
|
There are functions to perform various I2C protocol operations; at
|
||||||
|
this writing all such functions are usable only from task context.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The System Management Bus (SMBus) is a sibling protocol. Most SMBus
|
||||||
|
systems are also I2C conformant. The electrical constraints are
|
||||||
|
tighter for SMBus, and it standardizes particular protocol messages
|
||||||
|
and idioms. Controllers that support I2C can also support most
|
||||||
|
SMBus operations, but SMBus controllers don't support all the protocol
|
||||||
|
options that an I2C controller will.
|
||||||
|
There are functions to perform various SMBus protocol operations,
|
||||||
|
either using I2C primitives or by issuing SMBus commands to
|
||||||
|
i2c_adapter devices which don't support those I2C operations.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
!Iinclude/linux/i2c.h
|
||||||
|
!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info
|
||||||
|
!Edrivers/i2c/i2c-core.c
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
</book>
|
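The SPI chapter in the template above describes the bus as a full duplex shift register: for each bit shifted out on MOSI, one is shifted in on MISO. A minimal userspace model of that exchange might look as follows. It assumes 8-bit words sent most significant bit first (the text allows other word sizes), and struct spi_slave_model and spi_exchange_byte() are hypothetical names that are not part of the kernel SPI API.

```c
#include <stdint.h>

/* Hypothetical model of an SPI peripheral as a single 8-bit shift register. */
struct spi_slave_model {
	uint8_t shift_reg;
};

/* Clock one byte through the peripheral, MSB first. On every clock the
 * master drives one bit on MOSI into the low end of the register while
 * the bit leaving the high end is sampled by the master on MISO. */
static uint8_t spi_exchange_byte(struct spi_slave_model *s, uint8_t tx)
{
	uint8_t rx = 0;
	int bit;

	for (bit = 7; bit >= 0; bit--) {
		uint8_t mosi = (tx >> bit) & 1u;
		uint8_t miso = (s->shift_reg >> 7) & 1u;

		s->shift_reg = (uint8_t)((s->shift_reg << 1) | mosi);
		rx = (uint8_t)((rx << 1) | miso);
	}
	return rx;
}
```

After eight clocks the master holds the byte the peripheral started with and the peripheral holds the byte the master sent, which is why every SPI transfer is inherently a simultaneous write and read.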
@@ -440,6 +440,7 @@ desc->chip->end();
|
|||||||
used in the generic IRQ layer.
|
used in the generic IRQ layer.
|
||||||
</para>
|
</para>
|
||||||
!Iinclude/linux/irq.h
|
!Iinclude/linux/irq.h
|
||||||
|
!Iinclude/linux/interrupt.h
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="pubfunctions">
|
<chapter id="pubfunctions">
|
||||||
|
@@ -38,58 +38,6 @@
|
|||||||
|
|
||||||
<toc></toc>
|
<toc></toc>
|
||||||
|
|
||||||
<chapter id="Basics">
|
|
||||||
<title>Driver Basics</title>
|
|
||||||
<sect1><title>Driver Entry and Exit points</title>
|
|
||||||
!Iinclude/linux/init.h
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1><title>Atomic and pointer manipulation</title>
|
|
||||||
!Iarch/x86/include/asm/atomic_32.h
|
|
||||||
!Iarch/x86/include/asm/unaligned.h
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1><title>Delaying, scheduling, and timer routines</title>
|
|
||||||
!Iinclude/linux/sched.h
|
|
||||||
!Ekernel/sched.c
|
|
||||||
!Ekernel/timer.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>High-resolution timers</title>
|
|
||||||
!Iinclude/linux/ktime.h
|
|
||||||
!Iinclude/linux/hrtimer.h
|
|
||||||
!Ekernel/hrtimer.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Workqueues and Kevents</title>
|
|
||||||
!Ekernel/workqueue.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Internal Functions</title>
|
|
||||||
!Ikernel/exit.c
|
|
||||||
!Ikernel/signal.c
|
|
||||||
!Iinclude/linux/kthread.h
|
|
||||||
!Ekernel/kthread.c
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1><title>Kernel objects manipulation</title>
|
|
||||||
<!--
|
|
||||||
X!Iinclude/linux/kobject.h
|
|
||||||
-->
|
|
||||||
!Elib/kobject.c
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1><title>Kernel utility functions</title>
|
|
||||||
!Iinclude/linux/kernel.h
|
|
||||||
!Ekernel/printk.c
|
|
||||||
!Ekernel/panic.c
|
|
||||||
!Ekernel/sys.c
|
|
||||||
!Ekernel/rcupdate.c
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1><title>Device Resource Management</title>
|
|
||||||
!Edrivers/base/devres.c
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="adt">
|
<chapter id="adt">
|
||||||
<title>Data Types</title>
|
<title>Data Types</title>
|
||||||
<sect1><title>Doubly Linked Lists</title>
|
<sect1><title>Doubly Linked Lists</title>
|
||||||
@@ -251,6 +199,7 @@ X!Edrivers/pci/hotplug.c
|
|||||||
-->
|
-->
|
||||||
!Edrivers/pci/probe.c
|
!Edrivers/pci/probe.c
|
||||||
!Edrivers/pci/rom.c
|
!Edrivers/pci/rom.c
|
||||||
|
!Edrivers/pci/iov.c
|
||||||
</sect1>
|
</sect1>
|
||||||
<sect1><title>PCI Hotplug Support Library</title>
|
<sect1><title>PCI Hotplug Support Library</title>
|
||||||
!Edrivers/pci/hotplug/pci_hotplug_core.c
|
!Edrivers/pci/hotplug/pci_hotplug_core.c
|
||||||
@@ -298,62 +247,6 @@ X!Earch/x86/kernel/mca_32.c
|
|||||||
!Ikernel/acct.c
|
!Ikernel/acct.c
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="devdrivers">
|
|
||||||
<title>Device drivers infrastructure</title>
|
|
||||||
<sect1><title>Device Drivers Base</title>
|
|
||||||
<!--
|
|
||||||
X!Iinclude/linux/device.h
|
|
||||||
-->
|
|
||||||
!Edrivers/base/driver.c
|
|
||||||
!Edrivers/base/core.c
|
|
||||||
!Edrivers/base/class.c
|
|
||||||
!Edrivers/base/firmware_class.c
|
|
||||||
!Edrivers/base/transport_class.c
|
|
||||||
<!-- Cannot be included, because
|
|
||||||
attribute_container_add_class_device_adapter
|
|
||||||
and attribute_container_classdev_to_container
|
|
||||||
exceed allowed 44 characters maximum
|
|
||||||
X!Edrivers/base/attribute_container.c
|
|
||||||
-->
|
|
||||||
!Edrivers/base/sys.c
|
|
||||||
<!--
|
|
||||||
X!Edrivers/base/interface.c
|
|
||||||
-->
|
|
||||||
!Edrivers/base/platform.c
|
|
||||||
!Edrivers/base/bus.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Device Drivers Power Management</title>
|
|
||||||
!Edrivers/base/power/main.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Device Drivers ACPI Support</title>
|
|
||||||
<!-- Internal functions only
|
|
||||||
X!Edrivers/acpi/sleep/main.c
|
|
||||||
X!Edrivers/acpi/sleep/wakeup.c
|
|
||||||
X!Edrivers/acpi/motherboard.c
|
|
||||||
X!Edrivers/acpi/bus.c
|
|
||||||
-->
|
|
||||||
!Edrivers/acpi/scan.c
|
|
||||||
!Idrivers/acpi/scan.c
|
|
||||||
<!-- No correct structured comments
|
|
||||||
X!Edrivers/acpi/pci_bind.c
|
|
||||||
-->
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Device drivers PnP support</title>
|
|
||||||
!Idrivers/pnp/core.c
|
|
||||||
<!-- No correct structured comments
|
|
||||||
X!Edrivers/pnp/system.c
|
|
||||||
-->
|
|
||||||
!Edrivers/pnp/card.c
|
|
||||||
!Idrivers/pnp/driver.c
|
|
||||||
!Edrivers/pnp/manager.c
|
|
||||||
!Edrivers/pnp/support.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Userspace IO devices</title>
|
|
||||||
!Edrivers/uio/uio.c
|
|
||||||
!Iinclude/linux/uio_driver.h
|
|
||||||
</sect1>
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="blkdev">
|
<chapter id="blkdev">
|
||||||
<title>Block Devices</title>
|
<title>Block Devices</title>
|
||||||
!Eblock/blk-core.c
|
!Eblock/blk-core.c
|
||||||
@@ -366,7 +259,7 @@ X!Edrivers/pnp/system.c
|
|||||||
!Eblock/blk-tag.c
|
!Eblock/blk-tag.c
|
||||||
!Iblock/blk-tag.c
|
!Iblock/blk-tag.c
|
||||||
!Eblock/blk-integrity.c
|
!Eblock/blk-integrity.c
|
||||||
!Iblock/blktrace.c
|
!Ikernel/trace/blktrace.c
|
||||||
!Iblock/genhd.c
|
!Iblock/genhd.c
|
||||||
!Eblock/genhd.c
|
!Eblock/genhd.c
|
||||||
</chapter>
|
</chapter>
|
||||||
@@ -381,275 +274,6 @@ X!Edrivers/pnp/system.c
|
|||||||
!Edrivers/char/misc.c
|
!Edrivers/char/misc.c
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="parportdev">
|
|
||||||
<title>Parallel Port Devices</title>
|
|
||||||
!Iinclude/linux/parport.h
|
|
||||||
!Edrivers/parport/ieee1284.c
|
|
||||||
!Edrivers/parport/share.c
|
|
||||||
!Idrivers/parport/daisy.c
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="message_devices">
|
|
||||||
<title>Message-based devices</title>
|
|
||||||
<sect1><title>Fusion message devices</title>
|
|
||||||
!Edrivers/message/fusion/mptbase.c
|
|
||||||
!Idrivers/message/fusion/mptbase.c
|
|
||||||
!Edrivers/message/fusion/mptscsih.c
|
|
||||||
!Idrivers/message/fusion/mptscsih.c
|
|
||||||
!Idrivers/message/fusion/mptctl.c
|
|
||||||
!Idrivers/message/fusion/mptspi.c
|
|
||||||
!Idrivers/message/fusion/mptfc.c
|
|
||||||
!Idrivers/message/fusion/mptlan.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>I2O message devices</title>
|
|
||||||
!Iinclude/linux/i2o.h
|
|
||||||
!Idrivers/message/i2o/core.h
|
|
||||||
!Edrivers/message/i2o/iop.c
|
|
||||||
!Idrivers/message/i2o/iop.c
|
|
||||||
!Idrivers/message/i2o/config-osm.c
|
|
||||||
!Edrivers/message/i2o/exec-osm.c
|
|
||||||
!Idrivers/message/i2o/exec-osm.c
|
|
||||||
!Idrivers/message/i2o/bus-osm.c
|
|
||||||
!Edrivers/message/i2o/device.c
|
|
||||||
!Idrivers/message/i2o/device.c
|
|
||||||
!Idrivers/message/i2o/driver.c
|
|
||||||
!Idrivers/message/i2o/pci.c
|
|
||||||
!Idrivers/message/i2o/i2o_block.c
|
|
||||||
!Idrivers/message/i2o/i2o_scsi.c
|
|
||||||
!Idrivers/message/i2o/i2o_proc.c
|
|
||||||
</sect1>
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="snddev">
|
|
||||||
<title>Sound Devices</title>
|
|
||||||
!Iinclude/sound/core.h
|
|
||||||
!Esound/sound_core.c
|
|
||||||
!Iinclude/sound/pcm.h
|
|
||||||
!Esound/core/pcm.c
|
|
||||||
!Esound/core/device.c
|
|
||||||
!Esound/core/info.c
|
|
||||||
!Esound/core/rawmidi.c
|
|
||||||
!Esound/core/sound.c
|
|
||||||
!Esound/core/memory.c
|
|
||||||
!Esound/core/pcm_memory.c
|
|
||||||
!Esound/core/init.c
|
|
||||||
!Esound/core/isadma.c
|
|
||||||
!Esound/core/control.c
|
|
||||||
!Esound/core/pcm_lib.c
|
|
||||||
!Esound/core/hwdep.c
|
|
||||||
!Esound/core/pcm_native.c
|
|
||||||
!Esound/core/memalloc.c
|
|
||||||
<!-- FIXME: Removed for now since no structured comments in source
|
|
||||||
X!Isound/sound_firmware.c
|
|
||||||
-->
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="uart16x50">
|
|
||||||
<title>16x50 UART Driver</title>
|
|
||||||
!Iinclude/linux/serial_core.h
|
|
||||||
!Edrivers/serial/serial_core.c
|
|
||||||
!Edrivers/serial/8250.c
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="fbdev">
|
|
||||||
<title>Frame Buffer Library</title>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
The frame buffer drivers depend heavily on four data structures.
|
|
||||||
These structures are declared in include/linux/fb.h. They are
|
|
||||||
fb_info, fb_var_screeninfo, fb_fix_screeninfo and fb_monospecs.
|
|
||||||
The last three can be made available to and from userland.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
fb_info defines the current state of a particular video card.
|
|
||||||
Inside fb_info, there exists a fb_ops structure which is a
|
|
||||||
collection of needed functions to make fbdev and fbcon work.
|
|
||||||
fb_info is only visible to the kernel.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
fb_var_screeninfo is used to describe the features of a video card
|
|
||||||
that are user defined. With fb_var_screeninfo, things such as
|
|
||||||
depth and the resolution may be defined.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
The next structure is fb_fix_screeninfo. This defines the
|
|
||||||
properties of a card that are created when a mode is set and can't
|
|
||||||
be changed otherwise. A good example of this is the start of the
|
|
||||||
frame buffer memory. This "locks" the address of the frame buffer
|
|
||||||
memory, so that it cannot be changed or moved.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
The last structure is fb_monospecs. In the old API, there was
|
|
||||||
little importance for fb_monospecs. This allowed for forbidden things
|
|
||||||
such as setting a mode of 800x600 on a fix frequency monitor. With
|
|
||||||
the new API, fb_monospecs prevents such things, and if used
|
|
||||||
correctly, can prevent a monitor from being cooked. fb_monospecs
|
|
||||||
will not be useful until kernels 2.5.x.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<sect1><title>Frame Buffer Memory</title>
|
|
||||||
!Edrivers/video/fbmem.c
|
|
||||||
</sect1>
|
|
||||||
<!--
|
|
||||||
<sect1><title>Frame Buffer Console</title>
|
|
||||||
X!Edrivers/video/console/fbcon.c
|
|
||||||
</sect1>
|
|
||||||
-->
|
|
||||||
<sect1><title>Frame Buffer Colormap</title>
|
|
||||||
!Edrivers/video/fbcmap.c
|
|
||||||
</sect1>
|
|
||||||
<!-- FIXME:
|
|
||||||
drivers/video/fbgen.c has no docs, which stuffs up the sgml. Comment
|
|
||||||
out until somebody adds docs. KAO
|
|
||||||
<sect1><title>Frame Buffer Generic Functions</title>
|
|
||||||
X!Idrivers/video/fbgen.c
|
|
||||||
</sect1>
|
|
||||||
KAO -->
|
|
||||||
<sect1><title>Frame Buffer Video Mode Database</title>
|
|
||||||
!Idrivers/video/modedb.c
|
|
||||||
!Edrivers/video/modedb.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Frame Buffer Macintosh Video Mode Database</title>
|
|
||||||
!Edrivers/video/macmodes.c
|
|
||||||
</sect1>
|
|
||||||
<sect1><title>Frame Buffer Fonts</title>
|
|
||||||
<para>
|
|
||||||
Refer to the file drivers/video/console/fonts.c for more information.
|
|
||||||
</para>
|
|
||||||
<!-- FIXME: Removed for now since no structured comments in source
|
|
||||||
X!Idrivers/video/console/fonts.c
|
|
||||||
-->
|
|
||||||
</sect1>
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="input_subsystem">
|
|
||||||
<title>Input Subsystem</title>
|
|
||||||
!Iinclude/linux/input.h
|
|
||||||
!Edrivers/input/input.c
|
|
||||||
!Edrivers/input/ff-core.c
|
|
||||||
!Edrivers/input/ff-memless.c
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="spi">
|
|
||||||
<title>Serial Peripheral Interface (SPI)</title>
|
|
||||||
<para>
|
|
||||||
SPI is the "Serial Peripheral Interface", widely used with
|
|
||||||
embedded systems because it is a simple and efficient
|
|
||||||
interface: basically a multiplexed shift register.
|
|
||||||
Its three signal wires hold a clock (SCK, often in the range
|
|
||||||
of 1-20 MHz), a "Master Out, Slave In" (MOSI) data line, and
|
|
||||||
a "Master In, Slave Out" (MISO) data line.
|
|
||||||
SPI is a full duplex protocol; for each bit shifted out the
|
|
||||||
MOSI line (one per clock) another is shifted in on the MISO line.
|
|
||||||
Those bits are assembled into words of various sizes on the
|
|
||||||
way to and from system memory.
|
|
||||||
An additional chipselect line is usually active-low (nCS);
|
|
||||||
four signals are normally used for each peripheral, plus
|
|
||||||
sometimes an interrupt.
|
|
||||||
</para>
|
|
||||||
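The full-duplex behaviour described in the paragraph above (one bit out on MOSI and one bit in on MISO per clock) can be pictured as two shift registers swapping contents. The following is only a user-space sketch: `struct spi_link_model` and `spi_model_exchange_byte()` are hypothetical names made up for this illustration and are not part of the kernel SPI API, and the MSB-first bit order and 8-bit word size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two shift registers on an SPI link. */
struct spi_link_model {
	uint8_t master_shift;	/* byte the master is about to send */
	uint8_t slave_shift;	/* byte the slave is about to send */
};

/*
 * Clock one 8-bit word, MSB first (assumed bit order).  On every clock
 * the master drives its top bit onto MOSI while the slave drives its
 * top bit onto MISO; each side then shifts left and takes the other
 * side's bit in at the bottom.
 */
void spi_model_exchange_byte(struct spi_link_model *link)
{
	assert(link != NULL);

	for (int clk = 0; clk < 8; clk++) {
		uint8_t mosi = (link->master_shift >> 7) & 1u;
		uint8_t miso = (link->slave_shift >> 7) & 1u;

		link->master_shift = (uint8_t)((link->master_shift << 1) | miso);
		link->slave_shift = (uint8_t)((link->slave_shift << 1) | mosi);
	}
}
```

After eight clocks the two registers have traded contents, which is why a "write-only" SPI transfer still receives (and usually discards) a byte for every byte sent.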
<para>
|
|
||||||
The SPI bus facilities listed here provide a generalized
|
|
||||||
interface to declare SPI busses and devices, manage them
|
|
||||||
according to the standard Linux driver model, and perform
|
|
||||||
input/output operations.
|
|
||||||
At this time, only "master" side interfaces are supported,
|
|
||||||
where Linux talks to SPI peripherals and does not implement
|
|
||||||
such a peripheral itself.
|
|
||||||
(Interfaces to support implementing SPI slaves would
|
|
||||||
necessarily look different.)
|
|
||||||
</para>
|
|
||||||
<para>
|
|
||||||
The programming interface is structured around two kinds of driver,
|
|
||||||
and two kinds of device.
|
|
||||||
A "Controller Driver" abstracts the controller hardware, which may
|
|
||||||
be as simple as a set of GPIO pins or as complex as a pair of FIFOs
|
|
||||||
connected to dual DMA engines on the other side of the SPI shift
|
|
||||||
register (maximizing throughput). Such drivers bridge between
|
|
||||||
whatever bus they sit on (often the platform bus) and SPI, and
|
|
||||||
expose the SPI side of their device as a
|
|
||||||
<structname>struct spi_master</structname>.
|
|
||||||
SPI devices are children of that master, represented as a
|
|
||||||
<structname>struct spi_device</structname> and manufactured from
|
|
||||||
<structname>struct spi_board_info</structname> descriptors which
|
|
||||||
are usually provided by board-specific initialization code.
|
|
||||||
A <structname>struct spi_driver</structname> is called a
|
|
||||||
"Protocol Driver", and is bound to a spi_device using normal
|
|
||||||
driver model calls.
|
|
||||||
</para>
|
|
||||||
<para>
|
|
||||||
The I/O model is a set of queued messages. Protocol drivers
|
|
||||||
submit one or more <structname>struct spi_message</structname>
|
|
||||||
objects, which are processed and completed asynchronously.
|
|
||||||
(There are synchronous wrappers, however.) Messages are
|
|
||||||
built from one or more <structname>struct spi_transfer</structname>
|
|
||||||
objects, each of which wraps a full duplex SPI transfer.
|
|
||||||
A variety of protocol tweaking options are needed, because
|
|
||||||
different chips adopt very different policies for how they
|
|
||||||
use the bits transferred with SPI.
|
|
||||||
</para>
|
|
||||||
!Iinclude/linux/spi/spi.h
|
|
||||||
!Fdrivers/spi/spi.c spi_register_board_info
|
|
||||||
!Edrivers/spi/spi.c
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="i2c">
|
|
||||||
<title>I<superscript>2</superscript>C and SMBus Subsystem</title>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
I<superscript>2</superscript>C (or without fancy typography, "I2C")
|
|
||||||
is an acronym for the "Inter-IC" bus, a simple bus protocol which is
|
|
||||||
widely used where low data rate communications suffice.
|
|
||||||
Since it's also a licensed trademark, some vendors use another
|
|
||||||
name (such as "Two-Wire Interface", TWI) for the same bus.
|
|
||||||
I2C only needs two signals (SCL for clock, SDA for data), conserving
|
|
||||||
board real estate and minimizing signal quality issues.
|
|
||||||
Most I2C devices use seven bit addresses, and bus speeds of up
|
|
||||||
to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet
|
|
||||||
found wide use.
|
|
||||||
I2C is a multi-master bus; open drain signaling is used to
|
|
||||||
arbitrate between masters, as well as to handshake and to
|
|
||||||
synchronize clocks from slower clients.
|
|
||||||
</para>
|
|
||||||
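As a small illustration of the seven bit addressing mentioned above: on the wire, the first byte after a START condition carries the 7-bit address in its upper bits and a read/write flag in bit 0 (1 for read). That layout comes from the I2C bus convention rather than from the text above, and `i2c_address_byte()` is a hypothetical helper written for this sketch, not a Linux I2C function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose the address byte that follows a START condition:
 * bits 7..1 hold the 7-bit device address, bit 0 is R/W (1 = read).
 */
uint8_t i2c_address_byte(uint8_t addr7, int is_read)
{
	assert(addr7 <= 0x7F);	/* only seven address bits are available */

	return (uint8_t)((addr7 << 1) | (is_read ? 1u : 0u));
}
```

For a device at address 0x50 this gives 0xA0 for a write and 0xA1 for a read, which is why some datasheets quote the "8-bit" form of an address.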
|
|
||||||
<para>
|
|
||||||
The Linux I2C programming interfaces support only the master
|
|
||||||
side of bus interactions, not the slave side.
|
|
||||||
The programming interface is structured around two kinds of driver,
|
|
||||||
and two kinds of device.
|
|
||||||
An I2C "Adapter Driver" abstracts the controller hardware; it binds
|
|
||||||
to a physical device (perhaps a PCI device or platform_device) and
|
|
||||||
exposes a <structname>struct i2c_adapter</structname> representing
|
|
||||||
each I2C bus segment it manages.
|
|
||||||
On each I2C bus segment will be I2C devices represented by a
|
|
||||||
<structname>struct i2c_client</structname>. Those devices will
|
|
||||||
be bound to a <structname>struct i2c_driver</structname>,
|
|
||||||
which should follow the standard Linux driver model.
|
|
||||||
(At this writing, a legacy model is more widely used.)
|
|
||||||
There are functions to perform various I2C protocol operations; at
|
|
||||||
this writing all such functions are usable only from task context.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
The System Management Bus (SMBus) is a sibling protocol. Most SMBus
|
|
||||||
systems are also I2C conformant. The electrical constraints are
|
|
||||||
tighter for SMBus, and it standardizes particular protocol messages
|
|
||||||
and idioms. Controllers that support I2C can also support most
|
|
||||||
SMBus operations, but SMBus controllers don't support all the protocol
|
|
||||||
options that an I2C controller will.
|
|
||||||
There are functions to perform various SMBus protocol operations,
|
|
||||||
either using I2C primitives or by issuing SMBus commands to
|
|
||||||
i2c_adapter devices which don't support those I2C operations.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
!Iinclude/linux/i2c.h
|
|
||||||
!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info
|
|
||||||
!Edrivers/i2c/i2c-core.c
|
|
||||||
</chapter>
|
|
||||||
|
|
||||||
<chapter id="clk">
|
<chapter id="clk">
|
||||||
<title>Clock Framework</title>
|
<title>Clock Framework</title>
|
||||||
|
|
||||||
|
@@ -17,8 +17,7 @@
|
|||||||
</authorgroup>
|
</authorgroup>
|
||||||
|
|
||||||
<copyright>
|
<copyright>
|
||||||
<year>2007</year>
|
<year>2007-2009</year>
|
||||||
<year>2008</year>
|
|
||||||
<holder>Johannes Berg</holder>
|
<holder>Johannes Berg</holder>
|
||||||
</copyright>
|
</copyright>
|
||||||
|
|
||||||
@@ -165,8 +164,8 @@ usage should require reading the full document.
|
|||||||
!Pinclude/net/mac80211.h Frame format
|
!Pinclude/net/mac80211.h Frame format
|
||||||
</sect1>
|
</sect1>
|
||||||
<sect1>
|
<sect1>
|
||||||
<title>Alignment issues</title>
|
<title>Packet alignment</title>
|
||||||
<para>TBD</para>
|
!Pnet/mac80211/rx.c Packet alignment
|
||||||
</sect1>
|
</sect1>
|
||||||
<sect1>
|
<sect1>
|
||||||
<title>Calling into mac80211 from interrupts</title>
|
<title>Calling into mac80211 from interrupts</title>
|
||||||
@@ -223,6 +222,17 @@ usage should require reading the full document.
|
|||||||
!Finclude/net/mac80211.h ieee80211_key_flags
|
!Finclude/net/mac80211.h ieee80211_key_flags
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="powersave">
|
||||||
|
<title>Powersave support</title>
|
||||||
|
!Pinclude/net/mac80211.h Powersave support
|
||||||
|
</chapter>
|
||||||
|
|
||||||
|
<chapter id="beacon-filter">
|
||||||
|
<title>Beacon filter support</title>
|
||||||
|
!Pinclude/net/mac80211.h Beacon filter support
|
||||||
|
!Finclude/net/mac80211.h ieee80211_beacon_loss
|
||||||
|
</chapter>
|
||||||
|
|
||||||
<chapter id="qos">
|
<chapter id="qos">
|
||||||
<title>Multiple queues and QoS support</title>
|
<title>Multiple queues and QoS support</title>
|
||||||
<para>TBD</para>
|
<para>TBD</para>
|
||||||
|
@@ -117,9 +117,6 @@ static int __init init_procfs_example(void)
|
|||||||
rv = -ENOMEM;
|
rv = -ENOMEM;
|
||||||
goto out;
|
goto out;
|
||||||
}
|
}
|
||||||
|
|
||||||
example_dir->owner = THIS_MODULE;
|
|
||||||
|
|
||||||
/* create jiffies using convenience function */
|
/* create jiffies using convenience function */
|
||||||
jiffies_file = create_proc_read_entry("jiffies",
|
jiffies_file = create_proc_read_entry("jiffies",
|
||||||
0444, example_dir,
|
0444, example_dir,
|
||||||
@@ -130,8 +127,6 @@ static int __init init_procfs_example(void)
|
|||||||
goto no_jiffies;
|
goto no_jiffies;
|
||||||
}
|
}
|
||||||
|
|
||||||
jiffies_file->owner = THIS_MODULE;
|
|
||||||
|
|
||||||
/* create foo and bar files using same callback
|
/* create foo and bar files using same callback
|
||||||
* functions
|
* functions
|
||||||
*/
|
*/
|
||||||
@@ -146,7 +141,6 @@ static int __init init_procfs_example(void)
|
|||||||
foo_file->data = &foo_data;
|
foo_file->data = &foo_data;
|
||||||
foo_file->read_proc = proc_read_foobar;
|
foo_file->read_proc = proc_read_foobar;
|
||||||
foo_file->write_proc = proc_write_foobar;
|
foo_file->write_proc = proc_write_foobar;
|
||||||
foo_file->owner = THIS_MODULE;
|
|
||||||
|
|
||||||
bar_file = create_proc_entry("bar", 0644, example_dir);
|
bar_file = create_proc_entry("bar", 0644, example_dir);
|
||||||
if(bar_file == NULL) {
|
if(bar_file == NULL) {
|
||||||
@@ -159,7 +153,6 @@ static int __init init_procfs_example(void)
|
|||||||
bar_file->data = &bar_data;
|
bar_file->data = &bar_data;
|
||||||
bar_file->read_proc = proc_read_foobar;
|
bar_file->read_proc = proc_read_foobar;
|
||||||
bar_file->write_proc = proc_write_foobar;
|
bar_file->write_proc = proc_write_foobar;
|
||||||
bar_file->owner = THIS_MODULE;
|
|
||||||
|
|
||||||
/* create symlink */
|
/* create symlink */
|
||||||
symlink = proc_symlink("jiffies_too", example_dir,
|
symlink = proc_symlink("jiffies_too", example_dir,
|
||||||
@@ -169,8 +162,6 @@ static int __init init_procfs_example(void)
|
|||||||
goto no_symlink;
|
goto no_symlink;
|
||||||
}
|
}
|
||||||
|
|
||||||
symlink->owner = THIS_MODULE;
|
|
||||||
|
|
||||||
/* everything OK */
|
/* everything OK */
|
||||||
printk(KERN_INFO "%s %s initialised\n",
|
printk(KERN_INFO "%s %s initialised\n",
|
||||||
MODULE_NAME, MODULE_VERS);
|
MODULE_NAME, MODULE_VERS);
|
||||||
|
@@ -41,6 +41,13 @@ GPL version 2.
|
|||||||
</abstract>
|
</abstract>
|
||||||
|
|
||||||
<revhistory>
|
<revhistory>
|
||||||
|
<revision>
|
||||||
|
<revnumber>0.8</revnumber>
|
||||||
|
<date>2008-12-24</date>
|
||||||
|
<authorinitials>hjk</authorinitials>
|
||||||
|
<revremark>Added name attributes in mem and portio sysfs directories.
|
||||||
|
</revremark>
|
||||||
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>0.7</revnumber>
|
<revnumber>0.7</revnumber>
|
||||||
<date>2008-12-23</date>
|
<date>2008-12-23</date>
|
||||||
@@ -303,10 +310,17 @@ interested in translating it, please email me
|
|||||||
appear if the size of the mapping is not 0.
|
appear if the size of the mapping is not 0.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
Each <filename>mapX/</filename> directory contains two read-only files
|
Each <filename>mapX/</filename> directory contains four read-only files
|
||||||
that show start address and size of the memory:
|
that show attributes of the memory:
|
||||||
</para>
|
</para>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<filename>name</filename>: A string identifier for this mapping. This
|
||||||
|
is optional, the string can be empty. Drivers can set this to make it
|
||||||
|
easier for userspace to find the correct mapping.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<filename>addr</filename>: The address of memory that can be mapped.
|
<filename>addr</filename>: The address of memory that can be mapped.
|
||||||
@@ -366,10 +380,17 @@ offset = N * getpagesize();
|
|||||||
<filename>/sys/class/uio/uioX/portio/</filename>.
|
<filename>/sys/class/uio/uioX/portio/</filename>.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
Each <filename>portX/</filename> directory contains three read-only
|
Each <filename>portX/</filename> directory contains four read-only
|
||||||
files that show start, size, and type of the port region:
|
files that show name, start, size, and type of the port region:
|
||||||
</para>
|
</para>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
<filename>name</filename>: A string identifier for this port region.
|
||||||
|
The string is optional and can be empty. Drivers can set it to make it
|
||||||
|
easier for userspace to find a certain port region.
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<filename>start</filename>: The first port of this region.
|
<filename>start</filename>: The first port of this region.
|
||||||
|
@@ -1,11 +1,11 @@
|
|||||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||||
<book>
|
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||||
<?dbhtml filename="index.html">
|
|
||||||
|
|
||||||
<!-- ****************************************************** -->
|
<!-- ****************************************************** -->
|
||||||
<!-- Header -->
|
<!-- Header -->
|
||||||
<!-- ****************************************************** -->
|
<!-- ****************************************************** -->
|
||||||
|
<book id="Writing-an-ALSA-Driver">
|
||||||
<bookinfo>
|
<bookinfo>
|
||||||
<title>Writing an ALSA Driver</title>
|
<title>Writing an ALSA Driver</title>
|
||||||
<author>
|
<author>
|
||||||
@@ -492,9 +492,9 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* (2) */
|
/* (2) */
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0);
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card);
|
||||||
if (card == NULL)
|
if (err < 0)
|
||||||
return -ENOMEM;
|
return err;
|
||||||
|
|
||||||
/* (3) */
|
/* (3) */
|
||||||
err = snd_mychip_create(card, pci, &chip);
|
err = snd_mychip_create(card, pci, &chip);
|
||||||
@@ -590,8 +590,9 @@
|
|||||||
<programlisting>
|
<programlisting>
|
||||||
<![CDATA[
|
<![CDATA[
|
||||||
struct snd_card *card;
|
struct snd_card *card;
|
||||||
|
int err;
|
||||||
....
|
....
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0);
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card);
|
||||||
]]>
|
]]>
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</informalexample>
|
</informalexample>
|
||||||
@@ -809,26 +810,28 @@
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
As mentioned above, to create a card instance, call
|
As mentioned above, to create a card instance, call
|
||||||
<function>snd_card_new()</function>.
|
<function>snd_card_create()</function>.
|
||||||
|
|
||||||
<informalexample>
|
<informalexample>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
<![CDATA[
|
<![CDATA[
|
||||||
struct snd_card *card;
|
struct snd_card *card;
|
||||||
card = snd_card_new(index, id, module, extra_size);
|
int err;
|
||||||
|
err = snd_card_create(index, id, module, extra_size, &card);
|
||||||
]]>
|
]]>
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</informalexample>
|
</informalexample>
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The function takes four arguments, the card-index number, the
|
The function takes five arguments, the card-index number, the
|
||||||
id string, the module pointer (usually
|
id string, the module pointer (usually
|
||||||
<constant>THIS_MODULE</constant>),
|
<constant>THIS_MODULE</constant>),
|
||||||
and the size of extra-data space. The last argument is used to
|
the size of extra-data space, and the pointer to return the
|
||||||
|
card instance. The extra_size argument is used to
|
||||||
allocate card->private_data for the
|
allocate card->private_data for the
|
||||||
chip-specific data. Note that these data
|
chip-specific data. Note that these data
|
||||||
are allocated by <function>snd_card_new()</function>.
|
are allocated by <function>snd_card_create()</function>.
|
||||||
</para>
|
</para>
|
||||||
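The change from a returned pointer to an error code plus an output pointer can be sketched in plain C. Everything below is a stand-in written for illustration: `struct demo_card`, `demo_card_create()`, `demo_card_free()` and the `DEMO_MAX_CARDS` limit are hypothetical and only mimic the calling convention described above; they are not the ALSA implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_CARDS 8	/* hypothetical limit, for the demo only */

/* Hypothetical stand-in for struct snd_card; not the ALSA definition. */
struct demo_card {
	int index;
	void *private_data;	/* extra_size bytes of zeroed data, or NULL */
};

/*
 * Same shape as the five-argument call above (minus id and module):
 * the instance comes back through *card_ret, and the return value is
 * 0 on success or a negative errno describing what went wrong.
 */
int demo_card_create(int index, size_t extra_size,
		     struct demo_card **card_ret)
{
	struct demo_card *card;

	assert(card_ret != NULL);
	*card_ret = NULL;

	if (index >= DEMO_MAX_CARDS)
		return -ENODEV;		/* invented check for the demo */

	/* one allocation covers the card and its chip-specific data */
	card = calloc(1, sizeof(*card) + extra_size);
	if (card == NULL)
		return -ENOMEM;

	card->index = index;
	if (extra_size > 0)
		card->private_data = (char *)card + sizeof(*card);

	*card_ret = card;
	return 0;
}

void demo_card_free(struct demo_card *card)
{
	free(card);
}
```

The benefit of this shape is that the caller can propagate the real error (`return err;`) instead of guessing `-ENOMEM` whenever a NULL pointer comes back.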
</section>
|
</section>
|
||||||
|
|
||||||
@@ -915,15 +918,16 @@
|
|||||||
</para>
|
</para>
|
||||||
|
|
||||||
<section id="card-management-chip-specific-snd-card-new">
|
<section id="card-management-chip-specific-snd-card-new">
|
||||||
<title>1. Allocating via <function>snd_card_new()</function>.</title>
|
<title>1. Allocating via <function>snd_card_create()</function>.</title>
|
||||||
<para>
|
<para>
|
||||||
As mentioned above, you can pass the extra-data-length
|
As mentioned above, you can pass the extra-data-length
|
||||||
to the 4th argument of <function>snd_card_new()</function>, i.e.
|
to the 4th argument of <function>snd_card_create()</function>, i.e.
|
||||||
|
|
||||||
<informalexample>
|
<informalexample>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
<![CDATA[
|
<![CDATA[
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(struct mychip));
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE,
|
||||||
|
sizeof(struct mychip), &card);
|
||||||
]]>
|
]]>
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</informalexample>
|
</informalexample>
|
||||||
@@ -952,8 +956,8 @@
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
After allocating a card instance via
|
After allocating a card instance via
|
||||||
<function>snd_card_new()</function> (with
|
<function>snd_card_create()</function> (with
|
||||||
<constant>NULL</constant> on the 4th arg), call
|
<constant>0</constant> on the 4th arg), call
|
||||||
<function>kzalloc()</function>.
|
<function>kzalloc()</function>.
|
||||||
|
|
||||||
<informalexample>
|
<informalexample>
|
||||||
@@ -961,7 +965,7 @@
|
|||||||
<![CDATA[
|
<![CDATA[
|
||||||
struct snd_card *card;
|
struct snd_card *card;
|
||||||
struct mychip *chip;
|
struct mychip *chip;
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL);
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card);
|
||||||
.....
|
.....
|
||||||
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
|
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
|
||||||
]]>
|
]]>
|
||||||
@@ -1133,8 +1137,8 @@
|
|||||||
if (err < 0)
|
if (err < 0)
|
||||||
return err;
|
return err;
|
||||||
/* check PCI availability (28bit DMA) */
|
/* check PCI availability (28bit DMA) */
|
||||||
if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 ||
|
if (pci_set_dma_mask(pci, DMA_BIT_MASK(28)) < 0 ||
|
||||||
pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) {
|
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(28)) < 0) {
|
||||||
printk(KERN_ERR "error to set 28bit mask DMA\n");
|
printk(KERN_ERR "error to set 28bit mask DMA\n");
|
||||||
pci_disable_device(pci);
|
pci_disable_device(pci);
|
||||||
return -ENXIO;
|
return -ENXIO;
|
||||||
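For readers unfamiliar with the replacement macro, `DMA_BIT_MASK(28)` is meant to yield the same value as the old `DMA_28BIT_MASK` constant: the low 28 bits set. The function below is a user-space sketch of that computation; `dma_bit_mask()` is a hypothetical name used here for illustration and is not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with the low 'bits' bits set, as a DMA mask would need. */
uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);

	/* shifting a 64-bit value by 64 is undefined in C: special-case it */
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}
```

With this definition a 28-bit mask is `0x0fffffff`, so the behaviour of the two `pci_set_*dma_mask()` calls in the hunk above should be unchanged by the rename.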
@@ -1248,8 +1252,8 @@
|
|||||||
err = pci_enable_device(pci);
|
err = pci_enable_device(pci);
|
||||||
if (err < 0)
|
if (err < 0)
|
||||||
return err;
|
return err;
|
||||||
if (pci_set_dma_mask(pci, DMA_28BIT_MASK) < 0 ||
|
if (pci_set_dma_mask(pci, DMA_BIT_MASK(28)) < 0 ||
|
||||||
pci_set_consistent_dma_mask(pci, DMA_28BIT_MASK) < 0) {
|
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(28)) < 0) {
|
||||||
printk(KERN_ERR "error to set 28bit mask DMA\n");
|
printk(KERN_ERR "error to set 28bit mask DMA\n");
|
||||||
pci_disable_device(pci);
|
pci_disable_device(pci);
|
||||||
return -ENXIO;
|
return -ENXIO;
|
||||||
@@ -5750,8 +5754,9 @@ struct _snd_pcm_runtime {
|
|||||||
....
|
....
|
||||||
struct snd_card *card;
|
struct snd_card *card;
|
||||||
struct mychip *chip;
|
struct mychip *chip;
|
||||||
|
int err;
|
||||||
....
|
....
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL);
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card);
|
||||||
....
|
....
|
||||||
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
|
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
|
||||||
....
|
....
|
||||||
@@ -5763,7 +5768,7 @@ struct _snd_pcm_runtime {
|
|||||||
</informalexample>
|
</informalexample>
|
||||||
|
|
||||||
When you created the chip data with
|
When you created the chip data with
|
||||||
<function>snd_card_new()</function>, it's anyway accessible
|
<function>snd_card_create()</function>, it's anyway accessible
|
||||||
via <structfield>private_data</structfield> field.
|
via <structfield>private_data</structfield> field.
|
||||||
|
|
||||||
<informalexample>
|
<informalexample>
|
||||||
@@ -5775,9 +5780,10 @@ struct _snd_pcm_runtime {
|
|||||||
....
|
....
|
||||||
struct snd_card *card;
|
struct snd_card *card;
|
||||||
struct mychip *chip;
|
struct mychip *chip;
|
||||||
|
int err;
|
||||||
....
|
....
|
||||||
card = snd_card_new(index[dev], id[dev], THIS_MODULE,
|
err = snd_card_create(index[dev], id[dev], THIS_MODULE,
|
||||||
sizeof(struct mychip));
|
sizeof(struct mychip), &card);
|
||||||
....
|
....
|
||||||
chip = card->private_data;
|
chip = card->private_data;
|
||||||
....
|
....
|
@@ -4,506 +4,356 @@
|
|||||||
Revised Feb 12, 2004 by Martine Silbermann
|
Revised Feb 12, 2004 by Martine Silbermann
|
||||||
email: Martine.Silbermann@hp.com
|
email: Martine.Silbermann@hp.com
|
||||||
Revised Jun 25, 2004 by Tom L Nguyen
|
Revised Jun 25, 2004 by Tom L Nguyen
|
||||||
|
Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
|
||||||
|
Copyright 2003, 2008 Intel Corporation
|
||||||
|
|
||||||
1. About this guide
|
1. About this guide
|
||||||
|
|
||||||
This guide describes the basics of Message Signaled Interrupts (MSI),
|
This guide describes the basics of Message Signaled Interrupts (MSIs),
|
||||||
the advantages of using MSI over traditional interrupt mechanisms,
|
the advantages of using MSI over traditional interrupt mechanisms, how
|
||||||
and how to enable your driver to use MSI or MSI-X. Also included is
|
to change your driver to use MSI or MSI-X and some basic diagnostics to
|
||||||
a Frequently Asked Questions (FAQ) section.
|
try if a device doesn't support MSIs.
|
||||||
|
|
||||||
1.1 Terminology
|
|
||||||
|
|
||||||
PCI devices can be single-function or multi-function. In either case,
|
2. What are MSIs?
|
||||||
when this text talks about enabling or disabling MSI on a "device
|
|
||||||
function," it is referring to one specific PCI device and function and
|
|
||||||
not to all functions on a PCI device (unless the PCI device has only
|
|
||||||
one function).
|
|
||||||
|
|
||||||
2. Copyright 2003 Intel Corporation
|
A Message Signaled Interrupt is a write from the device to a special
|
||||||
|
address which causes an interrupt to be received by the CPU.
|
||||||
|
|
||||||
3. What is MSI/MSI-X?
|
The MSI capability was first specified in PCI 2.2 and was later enhanced
|
||||||
|
in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X
|
||||||
|
capability was also introduced with PCI 3.0. It supports more interrupts
|
||||||
|
per device than MSI and allows interrupts to be independently configured.
|
||||||
|
|
||||||
Message Signaled Interrupt (MSI), as described in the PCI Local Bus
|
Devices may support both MSI and MSI-X, but only one can be enabled at
|
||||||
Specification Revision 2.3 or later, is an optional feature, and a
|
a time.
|
||||||
required feature for PCI Express devices. MSI enables a device function
|
|
||||||
to request service by sending an Inbound Memory Write on its PCI bus to
|
|
||||||
the FSB as a Message Signal Interrupt transaction. Because MSI is
|
|
||||||
generated in the form of a Memory Write, all transaction conditions,
|
|
||||||
such as a Retry, Master-Abort, Target-Abort or normal completion, are
|
|
||||||
supported.
|
|
||||||
|
|
||||||
A PCI device that supports MSI must also support pin IRQ assertion
|
|
||||||
interrupt mechanism to provide backward compatibility for systems that
|
|
||||||
do not support MSI. In systems which support MSI, the bus driver is
|
|
||||||
responsible for initializing the message address and message data of
|
|
||||||
the device function's MSI/MSI-X capability structure during device
|
|
||||||
initial configuration.
|
|
||||||
|
|
||||||
An MSI capable device function indicates MSI support by implementing
|
3. Why use MSIs?
|
||||||
the MSI/MSI-X capability structure in its PCI capability list. The
|
|
||||||
device function may implement both the MSI capability structure and
|
|
||||||
the MSI-X capability structure; however, the bus driver should not
|
|
||||||
enable both.
|
|
||||||
|
|
||||||
The MSI capability structure contains Message Control register,
|
There are three reasons why using MSIs can give an advantage over
|
||||||
Message Address register and Message Data register. These registers
|
traditional pin-based interrupts.
|
||||||
provide the bus driver control over MSI. The Message Control register
|
|
||||||
indicates the MSI capability supported by the device. The Message
|
|
||||||
Address register specifies the target address and the Message Data
|
|
||||||
register specifies the characteristics of the message. To request
|
|
||||||
service, the device function writes the content of the Message Data
|
|
||||||
register to the target address. The device and its software driver
|
|
||||||
are prohibited from writing to these registers.
|
|
||||||
|
|
||||||
The MSI-X capability structure is an optional extension to MSI. It
|
Pin-based PCI interrupts are often shared amongst several devices.
|
||||||
uses an independent and separate capability structure. There are
|
To support this, the kernel must call each interrupt handler associated
|
||||||
some key advantages to implementing the MSI-X capability structure
|
with an interrupt, which leads to reduced performance for the system as
|
||||||
over the MSI capability structure as described below.
|
a whole. MSIs are never shared, so this problem cannot arise.
|
||||||
|
|
||||||
- Support a larger maximum number of vectors per function.
|
When a device writes data to memory, then raises a pin-based interrupt,
|
||||||
|
it is possible that the interrupt may arrive before all the data has
|
||||||
|
arrived in memory (this becomes more likely with devices behind PCI-PCI
|
||||||
|
bridges). In order to ensure that all the data has arrived in memory,
|
||||||
|
the interrupt handler must read a register on the device which raised
|
||||||
|
the interrupt. PCI transaction ordering rules require that all the data
|
||||||
|
arrives in memory before the value can be returned from the register.
|
||||||
|
Using MSIs avoids this problem as the interrupt-generating write cannot
|
||||||
|
pass the data writes, so by the time the interrupt is raised, the driver
|
||||||
|
knows that all the data has arrived in memory.
|
||||||
|
|
||||||
- Provide the ability for system software to configure
|
PCI devices can only support a single pin-based interrupt per function.
|
||||||
each vector with an independent message address and message
|
Often drivers have to query the device to find out what event has
|
||||||
data, specified by a table that resides in Memory Space.
|
occurred, slowing down interrupt handling for the common case. With
|
||||||
|
MSIs, a device can support more interrupts, allowing each interrupt
|
||||||
|
to be specialised to a different purpose. One possible design gives
|
||||||
|
infrequent conditions (such as errors) their own interrupt which allows
|
||||||
|
the driver to handle the normal interrupt handling path more efficiently.
|
||||||
|
Other possible designs include giving one interrupt to each packet queue
|
||||||
|
in a network card or each port in a storage controller.
|
||||||
|
|
||||||
- MSI and MSI-X both support per-vector masking. Per-vector
|
|
||||||
masking is an optional extension of MSI but a required
|
|
||||||
feature for MSI-X. Per-vector masking provides the kernel the
|
|
||||||
ability to mask/unmask a single MSI while running its
|
|
||||||
interrupt service routine. If per-vector masking is
|
|
||||||
not supported, then the device driver should provide the
|
|
||||||
hardware/software synchronization to ensure that the device
|
|
||||||
generates MSI when the driver wants it to do so.
|
|
||||||
|
|
||||||
4. Why use MSI?
|
4. How to use MSIs
|
||||||
|
|
||||||
As a benefit to the simplification of board design, MSI allows board
|
PCI devices are initialised to use pin-based interrupts. The device
|
||||||
designers to remove out-of-band interrupt routing. MSI is another
|
driver has to set up the device to use MSI or MSI-X. Not all machines
|
||||||
step towards a legacy-free environment.
|
support MSIs correctly, and for those machines, the APIs described below
|
||||||
|
will simply fail and the device will continue to use pin-based interrupts.
|
||||||
|
|
||||||
Due to increasing pressure on chipset and processor packages to
|
4.1 Include kernel support for MSIs
|
||||||
reduce pin count, the need for interrupt pins is expected to
|
|
||||||
diminish over time. Devices, due to pin constraints, may implement
|
|
||||||
messages to increase performance.
|
|
||||||
|
|
||||||
PCI Express endpoints use INTx emulation (in-band messages) instead
|
To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
|
||||||
of IRQ pin assertion. Using INTx emulation requires interrupt
|
option enabled. This option is only available on some architectures,
|
||||||
sharing among devices connected to the same node (PCI bridge) while
|
and it may depend on some other options also being set. For example,
|
||||||
MSI is unique (non-shared) and does not require BIOS configuration
|
on x86, you must also enable X86_UP_APIC or SMP in order to see the
|
||||||
support. As a result, the PCI Express technology requires MSI
|
CONFIG_PCI_MSI option.
|
||||||
support for better interrupt performance.
|
|
||||||
|
|
||||||
Using MSI enables the device functions to support two or more
|
4.2 Using MSI
|
||||||
vectors, which can be configured to target different CPUs to
|
|
||||||
increase scalability.
|
|
||||||
|
|
||||||
5. Configuring a driver to use MSI/MSI-X
|
Most of the hard work is done for the driver in the PCI layer. It simply
|
||||||
|
has to request that the PCI layer set up the MSI capability for this
|
||||||
|
device.
|
||||||
|
|
||||||
By default, the kernel will not enable MSI/MSI-X on all devices that
|
4.2.1 pci_enable_msi
|
||||||
support this capability. The CONFIG_PCI_MSI kernel option
|
|
||||||
must be selected to enable MSI/MSI-X support.
|
|
||||||
|
|
||||||
5.1 Including MSI/MSI-X support into the kernel
|
|
||||||
|
|
||||||
To allow MSI/MSI-X capable device drivers to selectively enable
|
|
||||||
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
|
|
||||||
below), the VECTOR based scheme needs to be enabled by setting
|
|
||||||
CONFIG_PCI_MSI during kernel config.
|
|
||||||
|
|
||||||
Since the target of the inbound message is the local APIC, providing
|
|
||||||
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
|
|
||||||
|
|
||||||
5.2 Configuring for MSI support
|
|
||||||
|
|
||||||
Due to the non-contiguous fashion in vector assignment of the
|
|
||||||
existing Linux kernel, this version does not support multiple
|
|
||||||
messages regardless of whether a device function is capable of supporting
|
|
||||||
more than one vector. To enable MSI on a device function's MSI
|
|
||||||
capability structure requires a device driver to call the function
|
|
||||||
pci_enable_msi() explicitly.
|
|
||||||
|
|
||||||
5.2.1 API pci_enable_msi
|
|
||||||
|
|
||||||
int pci_enable_msi(struct pci_dev *dev)
|
int pci_enable_msi(struct pci_dev *dev)
|
||||||
|
|
||||||
With this new API, a device driver that wants to have MSI
|
A successful call will allocate ONE interrupt to the device, regardless
|
||||||
enabled on its device function must call this API to enable MSI.
|
of how many MSIs the device supports. The device will be switched from
|
||||||
A successful call will initialize the MSI capability structure
|
pin-based interrupt mode to MSI mode. The dev->irq number is changed
|
||||||
with ONE vector, regardless of whether a device function is
|
to a new number which represents the message signaled interrupt.
|
||||||
capable of supporting multiple messages. This vector replaces the
|
This function should be called before the driver calls request_irq()
|
||||||
pre-assigned dev->irq with a new MSI vector. To avoid a conflict
|
since enabling MSIs disables the pin-based IRQ and the driver will not
|
||||||
of the new assigned vector with existing pre-assigned vector requires
|
receive interrupts on the old interrupt.
|
||||||
a device driver to call this API before calling request_irq().
|
|
||||||
|
|
||||||
5.2.2 API pci_disable_msi
|
4.2.2 pci_enable_msi_block
|
||||||
|
|
||||||
|
int pci_enable_msi_block(struct pci_dev *dev, int count)
|
||||||
|
|
||||||
|
This variation on the above call allows a device driver to request multiple
|
||||||
|
MSIs. The MSI specification only allows interrupts to be allocated in
|
||||||
|
powers of two, up to a maximum of 2^5 (32).
|
||||||
|
|
||||||
|
If this function returns 0, it has succeeded in allocating at least as many
|
||||||
|
interrupts as the driver requested (it may have allocated more in order
|
||||||
|
to satisfy the power-of-two requirement). In this case, the function
|
||||||
|
enables MSI on this device and updates dev->irq to be the lowest of
|
||||||
|
the new interrupts assigned to it. The other interrupts assigned to
|
||||||
|
the device are in the range dev->irq to dev->irq + count - 1.
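The power-of-two rule described above can be illustrated in user space.
This is only a sketch, not kernel code: msi_block_size() and
MSI_MAX_VECTORS are hypothetical names introduced here, and the limit of
32 is taken from the text above.

```c
/* Upper limit stated in the text: 2^5 = 32 interrupts for plain MSI. */
#define MSI_MAX_VECTORS 32

/*
 * Hypothetical helper (not a kernel function): returns the power-of-two
 * block size needed to cover 'count' interrupts, or -1 if 'count' is
 * outside the range 1..MSI_MAX_VECTORS.
 */
static int msi_block_size(int count)
{
	int size = 1;

	if (count < 1 || count > MSI_MAX_VECTORS)
		return -1;
	while (size < count)
		size <<= 1;
	return size;
}
```

A driver asking for 3 interrupts would therefore be covered by a block
of 4, which is why a successful call may allocate more interrupts than
were requested.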
|
||||||
|
|
||||||
|
If this function returns a negative number, it indicates an error and
|
||||||
|
the driver should not attempt to request any more MSI interrupts for
|
||||||
|
this device. If this function returns a positive number, it will be
|
||||||
|
less than 'count' and indicate the number of interrupts that could have
|
||||||
|
been allocated. In neither case will the irq value have been
|
||||||
|
updated, nor will the device have been switched into MSI mode.
|
||||||
|
|
||||||
|
The device driver must decide what action to take if
|
||||||
|
pci_enable_msi_block() returns a value less than the number asked for.
|
||||||
|
Some devices can make use of fewer interrupts than the maximum they
|
||||||
|
request; in this case the driver should call pci_enable_msi_block()
|
||||||
|
again. Note that it is not guaranteed to succeed, even when the
|
||||||
|
'count' has been reduced to the value returned from a previous call to
|
||||||
|
pci_enable_msi_block(). This is because there are multiple constraints
|
||||||
|
on the number of vectors that can be allocated; pci_enable_msi_block()
|
||||||
|
will return as soon as it finds any constraint that doesn't allow the
|
||||||
|
call to succeed.
|
||||||
|
|
||||||
|
4.2.3 pci_disable_msi
|
||||||
|
|
||||||
void pci_disable_msi(struct pci_dev *dev)
|
void pci_disable_msi(struct pci_dev *dev)
|
||||||
|
|
||||||
This API should always be used to undo the effect of pci_enable_msi()
|
This function should be used to undo the effect of pci_enable_msi() or
|
||||||
when a device driver is unloading. This API restores dev->irq with
|
pci_enable_msi_block(). Calling it restores dev->irq to the pin-based
|
||||||
the pre-assigned IOAPIC vector and switches a device's interrupt
|
interrupt number and frees the previously allocated message signaled
|
||||||
mode to PCI pin-irq assertion/INTx emulation mode.
|
interrupt(s). The interrupt may subsequently be assigned to another
|
||||||
|
device, so drivers should not cache the value of dev->irq.
|
||||||
|
|
||||||
Note that a device driver should always call free_irq() on the MSI vector
|
A device driver must always call free_irq() on the interrupt(s)
|
||||||
that it has done request_irq() on before calling this API. Failure to do
|
for which it has called request_irq() before calling this function.
|
||||||
so results in a BUG_ON() and a device will be left with MSI enabled and
|
Failure to do so will result in a BUG_ON(); the device will be left with
|
||||||
leaks its vector.
|
MSI enabled and will leak its vector.
|
||||||
|
|
||||||
5.2.3 MSI mode vs. legacy mode diagram
|
4.3 Using MSI-X
|
||||||
|
|
||||||
The below diagram shows the events which switch the interrupt
|
The MSI-X capability is much more flexible than the MSI capability.
|
||||||
mode on the MSI-capable device function between MSI mode and
|
It supports up to 2048 interrupts, each of which can be controlled
|
||||||
PIN-IRQ assertion mode.
|
independently. To support this flexibility, drivers must use an array of
|
||||||
|
`struct msix_entry':
|
||||||
------------ pci_enable_msi ------------------------
|
|
||||||
| | <=============== | |
|
|
||||||
| MSI MODE | | PIN-IRQ ASSERTION MODE |
|
|
||||||
| | ===============> | |
|
|
||||||
------------ pci_disable_msi ------------------------
|
|
||||||
|
|
||||||
|
|
||||||
Figure 1. MSI Mode vs. Legacy Mode
|
|
||||||
|
|
||||||
In Figure 1, a device operates by default in legacy mode. Legacy
|
|
||||||
in this context means PCI pin-irq assertion or PCI-Express INTx
|
|
||||||
emulation. A successful MSI request (using pci_enable_msi()) switches
|
|
||||||
a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
|
|
||||||
stored in dev->irq will be saved by the PCI subsystem and a new
|
|
||||||
assigned MSI vector will replace dev->irq.
|
|
||||||
|
|
||||||
To return back to its default mode, a device driver should always call
|
|
||||||
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
|
|
||||||
device driver should always call free_irq() on the MSI vector it has
|
|
||||||
done request_irq() on before calling pci_disable_msi(). Failure to do
|
|
||||||
so results in a BUG_ON() and a device will be left with MSI enabled and
|
|
||||||
leaks its vector. Otherwise, the PCI subsystem restores a device's
|
|
||||||
dev->irq with a pre-assigned IOAPIC vector and marks the released
|
|
||||||
MSI vector as unused.
|
|
||||||
|
|
||||||
Once being marked as unused, there is no guarantee that the PCI
|
|
||||||
subsystem will reserve this MSI vector for a device. Depending on
|
|
||||||
the availability of current PCI vector resources and the number of
|
|
||||||
MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
|
|
||||||
|
|
||||||
For the case where the PCI subsystem re-assigns this MSI vector to
|
|
||||||
another driver, a request to switch back to MSI mode may result
|
|
||||||
in being assigned a different MSI vector or a failure if no more
|
|
||||||
vectors are available.
|
|
||||||
|
|
||||||
5.3 Configuring for MSI-X support
|
|
||||||
|
|
||||||
Due to the ability of the system software to configure each vector of
|
|
||||||
the MSI-X capability structure with an independent message address
|
|
||||||
and message data, the non-contiguous fashion in vector assignment of
|
|
||||||
the existing Linux kernel has no impact on supporting multiple
|
|
||||||
messages on an MSI-X capable device function. To enable MSI-X on
|
|
||||||
a device function's MSI-X capability structure requires its device
|
|
||||||
driver to call the function pci_enable_msix() explicitly.
|
|
||||||
|
|
||||||
The function pci_enable_msix(), once invoked, enables either
|
|
||||||
all or nothing, depending on the current availability of PCI vector
|
|
||||||
resources. If the PCI vector resources are available for the number
|
|
||||||
of vectors requested by a device driver, this function will configure
|
|
||||||
the MSI-X table of the MSI-X capability structure of a device with
|
|
||||||
requested messages. To emphasize this reason, for example, a device
|
|
||||||
may be capable of supporting a maximum of 32 vectors while its
|
|
||||||
software driver usually may request 4 vectors. It is recommended
|
|
||||||
that the device driver should call this function once during the
|
|
||||||
initialization phase of the device driver.
|
|
||||||
|
|
||||||
Unlike the function pci_enable_msi(), the function pci_enable_msix()
|
|
||||||
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
|
|
||||||
vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
|
|
||||||
into the field vector of each element contained in a second argument.
|
|
||||||
Note that the pre-assigned IOAPIC dev->irq is valid only if the device
|
|
||||||
operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
|
|
||||||
using dev->irq by the device driver to request for interrupt service
|
|
||||||
may result in unpredictable behavior.
|
|
||||||
|
|
||||||
For each MSI-X vector granted, a device driver is responsible for calling
|
|
||||||
other functions like request_irq(), enable_irq(), etc. to enable
|
|
||||||
this vector with its corresponding interrupt service handler. It is
|
|
||||||
a device driver's choice to assign all vectors with the same
|
|
||||||
interrupt service handler or each vector with a unique interrupt
|
|
||||||
service handler.
|
|
||||||
|
|
||||||
5.3.1 Handling MMIO address space of MSI-X Table
|
|
||||||
|
|
||||||
The PCI 3.0 specification has implementation notes that MMIO address
|
|
||||||
space for a device's MSI-X structure should be isolated so that the
|
|
||||||
software system can set different pages for controlling accesses to the
|
|
||||||
MSI-X structure. The implementation of MSI support requires the PCI
|
|
||||||
subsystem, not a device driver, to maintain full control of the MSI-X
|
|
||||||
table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
|
|
||||||
table/MSI-X PBA. A device driver should not access the MMIO address
|
|
||||||
space of the MSI-X table/MSI-X PBA.
|
|
||||||
|
|
||||||
5.3.2 API pci_enable_msix
|
|
||||||
|
|
||||||
int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
|
|
||||||
|
|
||||||
This API enables a device driver to request the PCI subsystem
|
|
||||||
to enable MSI-X messages on its hardware device. Depending on
|
|
||||||
the availability of PCI vectors resources, the PCI subsystem enables
|
|
||||||
either all or none of the requested vectors.
|
|
||||||
|
|
||||||
Argument 'dev' points to the device (pci_dev) structure.
|
|
||||||
|
|
||||||
Argument 'entries' is a pointer to an array of msix_entry structs.
|
|
||||||
The number of entries is indicated in argument 'nvec'.
|
|
||||||
struct msix_entry is defined in /driver/pci/msi.h:
|
|
||||||
|
|
||||||
struct msix_entry {
|
struct msix_entry {
|
||||||
u16 vector; /* kernel uses to write alloc vector */
|
u16 vector; /* kernel uses to write alloc vector */
|
||||||
u16 entry; /* driver uses to specify entry */
|
u16 entry; /* driver uses to specify entry */
|
||||||
};
|
};
|
||||||
|
|
||||||
A device driver is responsible for initializing the field 'entry' of
|
This allows for the device to use these interrupts in a sparse fashion;
|
||||||
each element with a unique entry supported by MSI-X table. Otherwise,
|
for example it could use interrupts 3 and 1027 and allocate only a
|
||||||
-EINVAL will be returned as a result. A successful return of zero
|
two-element array. The driver is expected to fill in the 'entry' value
|
||||||
indicates the PCI subsystem completed initializing each of the requested
|
in each element of the array to indicate which entries it wants the kernel
|
||||||
entries of the MSI-X table with message address and message data.
|
to assign interrupts for. It is invalid to fill in two entries with the
|
||||||
Last but not least, the PCI subsystem will write the 1:1
|
same number.
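The sparse use of entries described above might be set up as in the
following user-space sketch. The structure is a copy of the one shown
above with uint16_t standing in for u16; foo_fill_sparse_entries() and
msix_entries_unique() are hypothetical helpers, not part of the PCI
layer.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the structure shown above; uint16_t stands in for u16. */
struct msix_entry {
	uint16_t vector;	/* kernel uses to write alloc vector */
	uint16_t entry;		/* driver uses to specify entry */
};

/*
 * Hypothetical helper: returns 1 if no two elements name the same
 * MSI-X table entry, 0 otherwise.
 */
static int msix_entries_unique(const struct msix_entry *entries, size_t nvec)
{
	size_t i, j;

	for (i = 0; i < nvec; i++)
		for (j = i + 1; j < nvec; j++)
			if (entries[i].entry == entries[j].entry)
				return 0;
	return 1;
}

/* Hypothetical helper: ask for table entries 3 and 1027 only. */
static void foo_fill_sparse_entries(struct msix_entry entries[2])
{
	entries[0].vector = 0;
	entries[0].entry = 3;
	entries[1].vector = 0;
	entries[1].entry = 1027;
}
```

A driver could run a check like msix_entries_unique() over its array
before handing it to the kernel, since filling in two entries with the
same number is invalid.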
|
||||||
vector-to-entry mapping into the field 'vector' of each element. A
|
|
||||||
device driver is responsible for keeping track of allocated MSI-X
|
|
||||||
vectors in its internal data structure.
|
|
||||||
|
|
||||||
A return of zero indicates that the number of MSI-X vectors was
|
4.3.1 pci_enable_msix
|
||||||
successfully allocated. A return of greater than zero indicates
|
|
||||||
MSI-X vector shortage. Or a return of less than zero indicates
|
|
||||||
a failure. This failure may be a result of duplicate entries
|
|
||||||
specified in second argument, or a result of no available vector,
|
|
||||||
or a result of failing to initialize MSI-X table entries.
|
|
||||||
|
|
||||||
5.3.3 API pci_disable_msix
|
int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
|
||||||
|
|
||||||
|
Calling this function asks the PCI subsystem to allocate 'nvec' MSIs.
|
||||||
|
The 'entries' argument is a pointer to an array of msix_entry structs
|
||||||
|
which should be at least 'nvec' entries in size. On success, the
|
||||||
|
function will return 0 and the device will have been switched into
|
||||||
|
MSI-X interrupt mode. The 'vector' elements in each entry will have
|
||||||
|
been filled in with the interrupt number. The driver should then call
|
||||||
|
request_irq() for each 'vector' that it decides to use.
|
||||||
|
|
||||||
|
If this function returns a negative number, it indicates an error and
|
||||||
|
the driver should not attempt to allocate any more MSI-X interrupts for
|
||||||
|
this device. If it returns a positive number, it indicates the maximum
|
||||||
|
number of interrupt vectors that could have been allocated. See example
|
||||||
|
below.
|
||||||
|
|
||||||
|
This function, in contrast with pci_enable_msi(), does not adjust
|
||||||
|
dev->irq. The device will not generate interrupts for this interrupt
|
||||||
|
number once MSI-X is enabled. The device driver is responsible for
|
||||||
|
keeping track of the interrupts assigned to the MSI-X vectors so it can
|
||||||
|
free them again later.
|
||||||
|
|
||||||
|
Device drivers should normally call this function once per device
|
||||||
|
during the initialization phase.
|
||||||
|
|
||||||
|
It is ideal if drivers can cope with a variable number of MSI-X interrupts;
|
||||||
|
there are many reasons why the platform may not be able to provide the
|
||||||
|
exact number a driver asks for.
|
||||||
|
|
||||||
|
A request loop to achieve that might look like:
|
||||||
|
|
||||||
|
static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec)
{
	int rc;	/* result of the most recent pci_enable_msix() call */

	/* Retry with the count the kernel suggests until it drops too low. */
	while (nvec >= FOO_DRIVER_MINIMUM_NVEC) {
		rc = pci_enable_msix(adapter->pdev,
				     adapter->msix_entries, nvec);
		if (rc > 0)
			nvec = rc;	/* fewer vectors available; try again */
		else
			return rc;	/* 0 on success, negative on error */
	}

	return -ENOSPC;
}
|
||||||
|
|
||||||
|
4.3.2 pci_disable_msix
|
||||||
|
|
||||||
void pci_disable_msix(struct pci_dev *dev)
|
void pci_disable_msix(struct pci_dev *dev)
|
||||||
|
|
||||||
This API should always be used to undo the effect of pci_enable_msix()
|
This API should be used to undo the effect of pci_enable_msix(). It frees
|
||||||
when a device driver is unloading. Note that a device driver should
|
the previously allocated message signaled interrupts. The interrupts may
|
||||||
always call free_irq() on all MSI-X vectors it has done request_irq()
|
subsequently be assigned to another device, so drivers should not cache
|
||||||
on before calling this API. Failure to do so results in a BUG_ON() and
|
the value of the 'vector' elements over a call to pci_disable_msix().
|
||||||
a device will be left with MSI-X enabled and leaks its vectors.
|
|
||||||
|
|
||||||
5.3.4 MSI-X mode vs. legacy mode diagram
|
A device driver must always call free_irq() on the interrupt(s)
|
||||||
|
for which it has called request_irq() before calling this function.
|
||||||
|
Failure to do so will result in a BUG_ON(); the device will be left with
|
||||||
|
MSI-X enabled and will leak its vectors.
|
||||||
|
|
||||||
The below diagram shows the events which switch the interrupt
|
4.3.3 The MSI-X Table
|
||||||
mode on the MSI-X capable device function between MSI-X mode and
|
|
||||||
PIN-IRQ assertion mode (legacy).
|
|
||||||
|
|
||||||
------------ pci_enable_msix(,,n) ------------------------
|
The MSI-X capability specifies a BAR and offset within that BAR for the
|
||||||
| | <=============== | |
|
MSI-X Table. This address is mapped by the PCI subsystem, and should not
|
||||||
| MSI-X MODE | | PIN-IRQ ASSERTION MODE |
|
be accessed directly by the device driver. If the driver wishes to
|
||||||
| | ===============> | |
|
mask or unmask an interrupt, it should call disable_irq() / enable_irq().
|
||||||
------------ pci_disable_msix ------------------------
|
|
||||||
|
|
||||||
Figure 2. MSI-X Mode vs. Legacy Mode
|
4.4 Handling devices implementing both MSI and MSI-X capabilities
|
||||||
|
|
||||||
In Figure 2, a device operates by default in legacy mode. A
|
If a device implements both MSI and MSI-X capabilities, it can
|
||||||
successful MSI-X request (using pci_enable_msix()) switches a
|
run in either MSI mode or MSI-X mode but not both simultaneously.
|
||||||
device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
|
This is a requirement of the PCI spec, and it is enforced by the
|
||||||
stored in dev->irq will be saved by the PCI subsystem; however,
|
PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or
|
||||||
unlike MSI mode, the PCI subsystem will not replace dev->irq with
|
pci_enable_msix() when MSI is already enabled will result in an error.
|
||||||
assigned MSI-X vector because the PCI subsystem already writes the 1:1
|
If a device driver wishes to switch between MSI and MSI-X at runtime,
|
||||||
vector-to-entry mapping into the field 'vector' of each element
|
it must first quiesce the device, then switch it back to pin-interrupt
|
||||||
specified in second argument.
|
mode, before calling pci_enable_msi() or pci_enable_msix() and resuming
|
||||||
|
operation. This is not expected to be a common operation but may be
|
||||||
|
useful for debugging or testing during development.
|
||||||
|
|
||||||
To return back to its default mode, a device driver should always call
|
4.5 Considerations when using MSIs
|
||||||
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
|
|
||||||
a device driver should always call free_irq() on all MSI-X vectors it
|
|
||||||
has done request_irq() on before calling pci_disable_msix(). Failure
|
|
||||||
to do so results in a BUG_ON() and a device will be left with MSI-X
|
|
||||||
enabled and leaks its vectors. Otherwise, the PCI subsystem switches a
|
|
||||||
device function's interrupt mode from MSI-X mode to legacy mode and
|
|
||||||
marks all allocated MSI-X vectors as unused.
|
|
||||||
|
|
||||||
Once being marked as unused, there is no guarantee that the PCI
|
4.5.1 Choosing between MSI-X and MSI
|
||||||
subsystem will reserve these MSI-X vectors for a device. Depending on
|
|
||||||
the availability of current PCI vector resources and the number of
|
|
||||||
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
|
|
||||||
re-assigned.
|
|
||||||
|
|
||||||
For the case where the PCI subsystem re-assigned these MSI-X vectors
|
If your device supports both MSI-X and MSI capabilities, you should use
|
||||||
to other drivers, a request to switch back to MSI-X mode may result
|
the MSI-X facilities in preference to the MSI facilities. As mentioned
|
||||||
in being assigned another set of MSI-X vectors or a failure if no
|
above, MSI-X supports any number of interrupts between 1 and 2048.
|
||||||
more vectors are available.
|
In contrast, MSI is restricted to a maximum of 32 interrupts (and
|
||||||
|
must be a power of two). In addition, the MSI interrupt vectors must
|
||||||
|
be allocated consecutively, so the system may not be able to allocate
|
||||||
|
as many vectors for MSI as it could for MSI-X. On some platforms, MSI
|
||||||
|
interrupts must all be targeted at the same set of CPUs whereas MSI-X
|
||||||
|
interrupts can all be targeted at different CPUs.
|
||||||
|
|
||||||
5.4 Handling function implementing both MSI and MSI-X capabilities
|
4.5.2 Spinlocks
|
||||||
|
|
||||||
For the case where a function implements both MSI and MSI-X
|
Most device drivers have a per-device spinlock which is taken in the
|
||||||
capabilities, the PCI subsystem enables a device to run either in MSI
|
interrupt handler. With pin-based interrupts or a single MSI, it is not
|
||||||
mode or MSI-X mode but not both. A device driver determines whether it
|
necessary to disable interrupts (Linux guarantees the same interrupt will
|
||||||
wants MSI or MSI-X enabled on its hardware device. Once a device
|
not be re-entered). If a device uses multiple interrupts, the driver
|
||||||
driver requests for MSI, for example, it is prohibited from requesting
|
must disable interrupts while the lock is held. If the device sends
|
||||||
MSI-X; in other words, a device driver is not permitted to ping-pong
|
a different interrupt, the driver will deadlock trying to recursively
|
||||||
between MSI mode and MSI-X mode at run-time.
|
acquire the spinlock.
|
||||||
|
|
||||||
5.5 Hardware requirements for MSI/MSI-X support
|
There are two solutions. The first is to take the lock with
|
||||||
|
spin_lock_irqsave() or spin_lock_irq() (see
|
||||||
|
Documentation/DocBook/kernel-locking). The second is to specify
|
||||||
|
IRQF_DISABLED to request_irq() so that the kernel runs the entire
|
||||||
|
interrupt routine with interrupts disabled.
|
||||||
|
|
||||||
MSI/MSI-X support requires support from both system hardware and
|
If your MSI interrupt routine does not hold the lock for the whole time
|
||||||
individual hardware device functions.
|
it is running, the first solution may be best. The second solution is
|
||||||
|
normally preferred as it avoids making two transitions from interrupt
|
||||||
|
disabled to enabled and back again.
|
||||||
|
|
||||||
5.5.1 Required x86 hardware support
|
4.6 How to tell whether MSI/MSI-X is enabled on a device
|
||||||
|
|
||||||
Since the target of MSI address is the local APIC CPU, enabling
|
Using 'lspci -v' (as root) may show some devices with "MSI", "Message
|
||||||
MSI/MSI-X support in the Linux kernel is dependent on whether existing
|
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
|
||||||
system hardware supports local APIC. Users should verify that their
|
has an 'Enable' flag which will be followed with either "+" (enabled)
|
||||||
system supports local APIC operation by testing that it runs when
|
or "-" (disabled).
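A tool that post-processes 'lspci -v' output could read the flag as in
the sketch below. msi_enable_flag() is a hypothetical helper, and the
exact layout of the capability line is an assumption; the code relies
only on the word "Enable" being followed directly by "+" or "-" as
described above.

```c
#include <string.h>

/*
 * Hypothetical helper: given one capability line of 'lspci -v' output,
 * returns 1 for "Enable+", 0 for "Enable-", and -1 if the line carries
 * no Enable flag.
 */
static int msi_enable_flag(const char *line)
{
	const char *p = strstr(line, "Enable");

	if (!p)
		return -1;
	p += strlen("Enable");
	if (*p == '+')
		return 1;
	if (*p == '-')
		return 0;
	return -1;
}
```

The helper looks only at the first "Enable" on the line, so it should be
fed one capability line at a time.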
|
||||||
CONFIG_X86_LOCAL_APIC=y.
|
|
||||||
|
|
||||||
In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
|
|
||||||
however, in UP environment, users must manually set
|
|
||||||
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
|
|
||||||
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
|
|
||||||
MSI-capable device drivers to selectively enable MSI/MSI-X.
|
|
||||||
|
|
||||||
Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
|
5. MSI quirks
|
||||||
vector is allocated new during runtime and MSI/MSI-X support does not
|
|
||||||
depend on BIOS support. This key independency enables MSI/MSI-X
|
|
||||||
support on future IOxAPIC free platforms.
|
|
||||||
|
|
||||||
5.5.2 Device hardware support
|
Several PCI chipsets or devices are known not to support MSIs.
|
||||||
|
The PCI stack provides three ways to disable MSIs:
|
||||||
|
|
||||||
The hardware device function supports MSI by indicating the
|
1. globally
|
||||||
MSI/MSI-X capability structure on its PCI capability list. By
|
2. on all devices behind a specific bridge
|
||||||
default, this capability structure will not be initialized by
|
3. on a single device
|
||||||
the kernel to enable MSI during the system boot. In other words,
|
|
||||||
the device function is running on its default pin assertion mode.
|
|
||||||
Note that in many cases the hardware supporting MSI has bugs,
|
|
||||||
which may result in system hangs. The software driver of specific
|
|
||||||
MSI-capable hardware is responsible for deciding whether to call
|
|
||||||
pci_enable_msi or not. A return of zero indicates the kernel
|
|
||||||
successfully initialized the MSI/MSI-X capability structure of the
|
|
||||||
device function. The device function is now running on MSI/MSI-X mode.
|
|
||||||
|
|
||||||
5.6 How to tell whether MSI/MSI-X is enabled on device function
|
5.1. Disabling MSIs globally
|
||||||
|
|
||||||
At the driver level, a return of zero from the function call of
|
Some host chipsets simply don't support MSIs properly. If we're
|
||||||
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
|
lucky, the manufacturer knows this and has indicated it in the ACPI
|
||||||
its device function is initialized successfully and ready to run in
|
FADT table. In this case, Linux will automatically disable MSIs.
|
||||||
MSI/MSI-X mode.
|
Some boards don't include this information in the table and so we have
|
||||||
|
to detect them ourselves. The complete list of these is found near the
|
||||||
|
quirk_disable_all_msi() function in drivers/pci/quirks.c.
|
||||||
|
|
||||||
At the user level, users can use the command 'cat /proc/interrupts'
|
If you have a board which has problems with MSIs, you can pass pci=nomsi
|
||||||
to display the vectors allocated for devices and their interrupt
|
on the kernel command line to disable MSIs on all devices. It would be
|
||||||
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). Below shows MSI mode is
|
in your best interests to report the problem to linux-pci@vger.kernel.org
|
||||||
enabled on a SCSI Adaptec 39320D Ultra320 controller.
|
including a full 'lspci -v' so we can add the quirks to the kernel.
|
||||||
|
|
||||||
CPU0 CPU1
|
5.2. Disabling MSIs below a bridge
|
||||||
0: 324639 0 IO-APIC-edge timer
|
|
||||||
1: 1186 0 IO-APIC-edge i8042
|
|
||||||
2: 0 0 XT-PIC cascade
|
|
||||||
12: 2797 0 IO-APIC-edge i8042
|
|
||||||
14: 6543 0 IO-APIC-edge ide0
|
|
||||||
15: 1 0 IO-APIC-edge ide1
|
|
||||||
169: 0 0 IO-APIC-level uhci-hcd
|
|
||||||
185: 0 0 IO-APIC-level uhci-hcd
|
|
||||||
193: 138 10 PCI-MSI aic79xx
|
|
||||||
201: 30 0 PCI-MSI aic79xx
|
|
||||||
225: 30 0 IO-APIC-level aic7xxx
|
|
||||||
233: 30 0 IO-APIC-level aic7xxx
|
|
||||||
NMI: 0 0
|
|
||||||
LOC: 324553 325068
|
|
||||||
ERR: 0
|
|
||||||
MIS: 0
|
|
||||||
|
|
||||||
6. MSI quirks
|
Some PCI bridges are not able to route MSIs between busses properly.
|
||||||
|
In this case, MSIs must be disabled on all devices behind the bridge.
|
||||||
|
|
||||||
Several PCI chipsets or devices are known to not support MSI.
|
Some bridges allow you to enable MSIs by changing some bits in their
|
||||||
The PCI stack provides 3 possible levels of MSI disabling:
|
PCI configuration space (especially the Hypertransport chipsets such
|
||||||
* on a single device
|
as the nVidia nForce and Serverworks HT2000). As with host chipsets,
|
||||||
* on all devices behind a specific bridge
|
Linux mostly knows about them and automatically enables MSIs if it can.
|
||||||
* globally
|
If you have a bridge which Linux doesn't yet know about, you can enable
|
||||||
|
MSIs in configuration space using whatever method you know works, then
|
||||||
|
enable MSIs on that bridge by doing:
|
||||||
|
|
||||||
6.1. Disabling MSI on a single device
|
echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
|
||||||
|
|
||||||
Under some circumstances it might be required to disable MSI on a
|
where $bridge is the PCI address of the bridge you've enabled (eg
|
||||||
single device. This may be achieved by either not calling pci_enable_msi()
|
0000:00:0e.0).
|
||||||
or all, or setting the pci_dev->no_msi flag before (most of the time
|
|
||||||
in a quirk).
|
|
||||||
|
|
||||||
6.2. Disabling MSI below a bridge
|
To disable MSIs, echo 0 instead of 1. Changing this value should be
|
||||||
|
done with caution as it can break interrupt handling for all devices
|
||||||
|
below this bridge.
|
||||||
|
|
||||||
The vast majority of MSI quirks are required by PCI bridges not
|
Again, please notify linux-pci@vger.kernel.org of any bridges that need
|
||||||
being able to route MSI between busses. In this case, MSI have to be
|
special handling.
|
||||||
disabled on all devices behind this bridge. It is achieves by setting
|
|
||||||
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
|
|
||||||
subordinate bus. There is no need to set the same flag on bridges that
|
|
||||||
are below the broken bridge. When pci_enable_msi() is called to enable
|
|
||||||
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
|
|
||||||
flag in all parent busses of the device.
|
|
||||||
|
|
||||||
Some bridges actually support dynamic MSI support enabling/disabling
|
5.3. Disabling MSIs on a single device
|
||||||
by changing some bits in their PCI configuration space (especially
|
|
||||||
the Hypertransport chipsets such as the nVidia nForce and Serverworks
|
|
||||||
HT2000). It may then be required to update the NO_MSI flag on the
|
|
||||||
corresponding devices in the sysfs hierarchy. To enable MSI support
|
|
||||||
on device "0000:00:0e", do:
|
|
||||||
|
|
||||||
echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus
|
Some devices are known to have faulty MSI implementations. Usually this
|
||||||
|
is handled in the individual device driver but occasionally it's necessary
|
||||||
|
to handle this with a quirk. Some drivers have an option to disable use
|
||||||
|
of MSI. While this is a convenient workaround for the driver author,
|
||||||
|
it is not good practise, and should not be emulated.
|
||||||
|
|
||||||
To disable MSI support, echo 0 instead of 1. Note that it should be
|
5.4. Finding why MSIs are disabled on a device
|
||||||
used with caution since changing this value might break interrupts.
|
|
||||||
|
|
||||||
6.3. Disabling MSI globally
|
From the above three sections, you can see that there are many reasons
|
||||||
|
why MSIs may not be enabled for a given device. Your first step should
|
||||||
|
be to examine your dmesg carefully to determine whether MSIs are enabled
|
||||||
|
for your machine. You should also check your .config to be sure you
|
||||||
|
have enabled CONFIG_PCI_MSI.
|
||||||
|
|
||||||
Some extreme cases may require to disable MSI globally on the system.
|
Then, 'lspci -t' gives the list of bridges above a device. Reading
|
||||||
For now, the only known case is a Serverworks PCI-X chipsets (MSI are
|
/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
|
||||||
not supported on several busses that are not all connected to the
|
or disabled (0). If 0 is found in any of the msi_bus files belonging
|
||||||
chipset in the Linux PCI hierarchy). In the vast majority of other
|
to bridges between the PCI root and the device, MSIs are disabled.
|
||||||
cases, disabling only behind a specific bridge is enough.
|
|
||||||
|
|
||||||
For debugging purpose, the user may also pass pci=nomsi on the kernel
|
It is also worth checking the device driver to see whether it supports MSIs.
|
||||||
command-line to explicitly disable MSI globally. But, once the appro-
|
For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or
|
||||||
priate quirks are added to the kernel, this option should not be
|
pci_enable_msi_block().
|
||||||
required anymore.
|
|
||||||
|
|
||||||
6.4. Finding why MSI cannot be enabled on a device
|
|
||||||
|
|
||||||
Assuming that MSI are not enabled on a device, you should look at
|
|
||||||
dmesg to find messages that quirks may output when disabling MSI
|
|
||||||
on some devices, some bridges or even globally.
|
|
||||||
Then, lspci -t gives the list of bridges above a device. Reading
|
|
||||||
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
|
|
||||||
are enabled (1) or disabled (0). In 0 is found in a single bridge
|
|
||||||
msi_bus file above the device, MSI cannot be enabled.
|
|
||||||
|
|
||||||
7. FAQ
|
|
||||||
|
|
||||||
Q1. Are there any limitations on using the MSI?
|
|
||||||
|
|
||||||
A1. If the PCI device supports MSI and conforms to the
|
|
||||||
specification and the platform supports the APIC local bus,
|
|
||||||
then using MSI should work.
|
|
||||||
|
|
||||||
Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
|
|
||||||
AMD processors)? In P3 IPI's are transmitted on the APIC local
|
|
||||||
bus and in P4 and Xeon they are transmitted on the system
|
|
||||||
bus. Are there any implications with this?
|
|
||||||
|
|
||||||
A2. MSI support enables a PCI device sending an inbound
|
|
||||||
memory write (0xfeexxxxx as target address) on its PCI bus
|
|
||||||
directly to the FSB. Since the message address has a
|
|
||||||
redirection hint bit cleared, it should work.
|
|
||||||
|
|
||||||
Q3. The target address 0xfeexxxxx will be translated by the
|
|
||||||
Host Bridge into an interrupt message. Are there any
|
|
||||||
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
|
|
||||||
or VIA?
|
|
||||||
|
|
||||||
A3. If these chipsets support an inbound memory write with
|
|
||||||
target address set as 0xfeexxxxx, as conformed to PCI
|
|
||||||
specification 2.3 or latest, then it should work.
|
|
||||||
|
|
||||||
Q4. From the driver point of view, if the MSI is lost because
|
|
||||||
of errors occurring during inbound memory write, then it may
|
|
||||||
wait forever. Is there a mechanism for it to recover?
|
|
||||||
|
|
||||||
A4. Since the target of the transaction is an inbound memory
|
|
||||||
write, all transaction termination conditions (Retry,
|
|
||||||
Master-Abort, Target-Abort, or normal completion) are
|
|
||||||
supported. A device sending an MSI must abide by all the PCI
|
|
||||||
rules and conditions regarding that inbound memory write. So,
|
|
||||||
if a retry is signaled it must retry, etc... We believe that
|
|
||||||
the recommendation for Abort is also a retry (refer to PCI
|
|
||||||
specification 2.3 or latest).
|
|
||||||
|
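The msi_bus check described in section 5.4 of the new MSI-HOWTO text can be sketched in user-space C. This is only an illustration, not a kernel or pciutils tool: the function names are made up for the example, and it assumes the caller passes the resolved device directory (for instance the result of realpath(3) on /sys/bus/pci/devices/<address>) with no trailing slash, so that every parent directory is a candidate bridge.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first character of <dir>/msi_bus.
 * Returns '0' or '1' when the file holds a value, or -1 when the file
 * is missing, unreadable or empty.
 */
static int read_msi_bus(const char *dir)
{
	char path[4096];
	FILE *f;
	int c;

	snprintf(path, sizeof(path), "%s/msi_bus", dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	return (c == '0' || c == '1') ? c : -1;
}

/*
 * Hypothetical helper: walk from the parent of dev_dir up towards the
 * filesystem root and report whether any ancestor has msi_bus set to 0.
 * Returns 1 if an ancestor disables MSIs, 0 otherwise.
 */
int msi_blocked_by_bridge(const char *dev_dir)
{
	char path[4096];
	char *slash;

	assert(dev_dir != NULL);
	if (strlen(dev_dir) >= sizeof(path))
		return 0;
	strcpy(path, dev_dir);

	/* Strip one path component per iteration to reach each ancestor. */
	while ((slash = strrchr(path, '/')) != NULL && slash != path) {
		*slash = '\0';
		if (read_msi_bus(path) == '0')
			return 1;
	}
	return 0;
}
```

A result of 1 only says that some bridge above the device has MSIs turned off; dmesg and the driver source still need to be checked as the text describes.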
@@ -93,7 +93,7 @@ the PCI Express Port Bus driver from loading a service driver.
|
|||||||
|
|
||||||
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
int pcie_port_service_register(struct pcie_port_service_driver *new)
|
||||||
|
|
||||||
This API replaces the Linux Driver Model's pci_module_init API. A
|
This API replaces the Linux Driver Model's pci_register_driver API. A
|
||||||
service driver should always calls pcie_port_service_register at
|
service driver should always calls pcie_port_service_register at
|
||||||
module init. Note that after service driver being loaded, calls
|
module init. Note that after service driver being loaded, calls
|
||||||
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
|
||||||
|
99
Documentation/PCI/pci-iov-howto.txt
Normal file
@@ -0,0 +1,99 @@
|
|||||||
|
PCI Express I/O Virtualization Howto
|
||||||
|
Copyright (C) 2009 Intel Corporation
|
||||||
|
Yu Zhao <yu.zhao@intel.com>
|
||||||
|
|
||||||
|
|
||||||
|
1. Overview
|
||||||
|
|
||||||
|
1.1 What is SR-IOV
|
||||||
|
|
||||||
|
Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
|
||||||
|
capability which makes one physical device appear as multiple virtual
|
||||||
|
devices. The physical device is referred to as Physical Function (PF)
|
||||||
|
while the virtual devices are referred to as Virtual Functions (VF).
|
||||||
|
Allocation of the VF can be dynamically controlled by the PF via
|
||||||
|
registers encapsulated in the capability. By default, this feature is
|
||||||
|
not enabled and the PF behaves as a traditional PCIe device. Once it's
|
||||||
|
turned on, each VF's PCI configuration space can be accessed by its own
|
||||||
|
Bus, Device and Function Number (Routing ID). And each VF also has PCI
|
||||||
|
Memory Space, which is used to map its register set. The VF device driver
|
||||||
|
operates on the register set so it can be functional and appear as a
|
||||||
|
real existing PCI device.
|
||||||
|
|
||||||
|
2. User Guide
|
||||||
|
|
||||||
|
2.1 How can I enable SR-IOV capability
|
||||||
|
|
||||||
|
The device driver (PF driver) will control the enabling and disabling
|
||||||
|
of the capability via the API provided by the SR-IOV core. If the hardware
|
||||||
|
has SR-IOV capability, loading its PF driver would enable it and all
|
||||||
|
VFs associated with the PF.
|
||||||
|
|
||||||
|
2.2 How can I use the Virtual Functions
|
||||||
|
|
||||||
|
VFs are treated as hot-plugged PCI devices in the kernel, so they
|
||||||
|
should be able to work in the same way as real PCI devices. A VF
|
||||||
|
requires a device driver, just as a normal PCI device does.
|
||||||
|
|
||||||
|
3. Developer Guide
|
||||||
|
|
||||||
|
3.1 SR-IOV API
|
||||||
|
|
||||||
|
To enable SR-IOV capability:
|
||||||
|
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
|
||||||
|
'nr_virtfn' is the number of VFs to be enabled.
|
||||||
|
|
||||||
|
To disable SR-IOV capability:
|
||||||
|
void pci_disable_sriov(struct pci_dev *dev);
|
||||||
|
|
||||||
|
To notify SR-IOV core of Virtual Function Migration:
|
||||||
|
irqreturn_t pci_sriov_migration(struct pci_dev *dev);
|
||||||
|
|
||||||
|
3.2 Usage example
|
||||||
|
|
||||||
|
The following piece of code illustrates the usage of the SR-IOV API.
|
||||||
|
|
||||||
|
static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
||||||
|
{
|
||||||
|
pci_enable_sriov(dev, NR_VIRTFN);
|
||||||
|
|
||||||
|
...
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __devexit dev_remove(struct pci_dev *dev)
|
||||||
|
{
|
||||||
|
pci_disable_sriov(dev);
|
||||||
|
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
static int dev_suspend(struct pci_dev *dev, pm_message_t state)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int dev_resume(struct pci_dev *dev)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void dev_shutdown(struct pci_dev *dev)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct pci_driver dev_driver = {
|
||||||
|
.name = "SR-IOV Physical Function driver",
|
||||||
|
.id_table = dev_id_table,
|
||||||
|
.probe = dev_probe,
|
||||||
|
.remove = __devexit_p(dev_remove),
|
||||||
|
.suspend = dev_suspend,
|
||||||
|
.resume = dev_resume,
|
||||||
|
.shutdown = dev_shutdown,
|
||||||
|
};
|
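The usage example in pci-iov-howto.txt ignores the int returned by pci_enable_sriov(). The sketch below shows one way a PF driver might propagate that result from its probe routine. It is a user-space mock, not kernel code: struct pci_dev here is a stand-in with invented fields, the two SR-IOV functions are stubs that only copy the signatures given in section 3.1, and the 0-on-success, negative-errno-on-failure convention is assumed rather than taken from the text. Whether a failed enable should abort probe is a driver policy choice; this sketch treats it as fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct pci_dev (NOT the real layout). */
struct pci_dev {
	int vfs_enabled;	/* number of VFs the stub pretends to have enabled */
	int max_vfs;		/* hypothetical limit used to simulate a failure */
};

/* Stub with the signature from section 3.1; the real one is in the PCI core. */
static int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
{
	if (nr_virtfn <= 0 || nr_virtfn > dev->max_vfs)
		return -EINVAL;
	dev->vfs_enabled = nr_virtfn;
	return 0;
}

/* Stub with the signature from section 3.1. */
static void pci_disable_sriov(struct pci_dev *dev)
{
	dev->vfs_enabled = 0;
}

#define NR_VIRTFN 4	/* hypothetical VF count for this sketch */

/* Probe: enable the VFs first and hand any error back to the caller. */
int dev_probe(struct pci_dev *dev)
{
	int err;

	assert(dev != NULL);
	err = pci_enable_sriov(dev, NR_VIRTFN);
	if (err)
		return err;

	/* ... remaining PF setup would go here ... */
	return 0;
}

/* Remove: tear the VFs down before the PF goes away. */
void dev_remove(struct pci_dev *dev)
{
	assert(dev != NULL);
	pci_disable_sriov(dev);
}
```

In a real driver the probe routine also takes the const struct pci_device_id * argument shown in the original example; it is dropped here only to keep the mock short.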
@@ -298,3 +298,15 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
|
|
||||||
Note that, rcu_assign_pointer() and rcu_dereference() relate to
|
Note that, rcu_assign_pointer() and rcu_dereference() relate to
|
||||||
SRCU just as they do to other forms of RCU.
|
SRCU just as they do to other forms of RCU.
|
||||||
|
|
||||||
|
15. The whole point of call_rcu(), synchronize_rcu(), and friends
|
||||||
|
is to wait until all pre-existing readers have finished before
|
||||||
|
carrying out some otherwise-destructive operation. It is
|
||||||
|
therefore critically important to -first- remove any path
|
||||||
|
that readers can follow that could be affected by the
|
||||||
|
destructive operation, and -only- -then- invoke call_rcu(),
|
||||||
|
synchronize_rcu(), or friends.
|
||||||
|
|
||||||
|
Because these primitives only wait for pre-existing readers,
|
||||||
|
it is the caller's responsibility to guarantee safety to
|
||||||
|
any subsequent readers.
|
||||||
|
@@ -118,7 +118,7 @@ Following are the RCU equivalents for these two functions:
|
|||||||
list_for_each_entry(e, list, list) {
|
list_for_each_entry(e, list, list) {
|
||||||
if (!audit_compare_rule(rule, &e->rule)) {
|
if (!audit_compare_rule(rule, &e->rule)) {
|
||||||
list_del_rcu(&e->list);
|
list_del_rcu(&e->list);
|
||||||
call_rcu(&e->rcu, audit_free_rule, e);
|
call_rcu(&e->rcu, audit_free_rule);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -206,7 +206,7 @@ RCU ("read-copy update") its name. The RCU code is as follows:
|
|||||||
ne->rule.action = newaction;
|
ne->rule.action = newaction;
|
||||||
ne->rule.file_count = newfield_count;
|
ne->rule.file_count = newfield_count;
|
||||||
list_replace_rcu(e, ne);
|
list_replace_rcu(e, ne);
|
||||||
call_rcu(&e->rcu, audit_free_rule, e);
|
call_rcu(&e->rcu, audit_free_rule);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -283,7 +283,7 @@ flag under the spinlock as follows:
|
|||||||
list_del_rcu(&e->list);
|
list_del_rcu(&e->list);
|
||||||
e->deleted = 1;
|
e->deleted = 1;
|
||||||
spin_unlock(&e->lock);
|
spin_unlock(&e->lock);
|
||||||
call_rcu(&e->rcu, audit_free_rule, e);
|
call_rcu(&e->rcu, audit_free_rule);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@@ -81,7 +81,7 @@ o I hear that RCU needs work in order to support realtime kernels?
|
|||||||
This work is largely completed. Realtime-friendly RCU can be
|
This work is largely completed. Realtime-friendly RCU can be
|
||||||
enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter.
|
enabled via the CONFIG_PREEMPT_RCU kernel configuration parameter.
|
||||||
However, work is in progress for enabling priority boosting of
|
However, work is in progress for enabling priority boosting of
|
||||||
preempted RCU read-side critical sections.This is needed if you
|
preempted RCU read-side critical sections. This is needed if you
|
||||||
have CPU-bound realtime threads.
|
have CPU-bound realtime threads.
|
||||||
|
|
||||||
o Where can I find more information on RCU?
|
o Where can I find more information on RCU?
|
||||||
|
@@ -21,7 +21,7 @@ if (obj) {
|
|||||||
/*
|
/*
|
||||||
* Because a writer could delete object, and a writer could
|
* Because a writer could delete object, and a writer could
|
||||||
* reuse these object before the RCU grace period, we
|
* reuse these object before the RCU grace period, we
|
||||||
* must check key after geting the reference on object
|
* must check key after getting the reference on object
|
||||||
*/
|
*/
|
||||||
if (obj->key != key) { // not the object we expected
|
if (obj->key != key) { // not the object we expected
|
||||||
put_ref(obj);
|
put_ref(obj);
|
||||||
@@ -117,7 +117,7 @@ a race (some writer did a delete and/or a move of an object
|
|||||||
to another chain) checking the final 'nulls' value if
|
to another chain) checking the final 'nulls' value if
|
||||||
the lookup met the end of chain. If final 'nulls' value
|
the lookup met the end of chain. If final 'nulls' value
|
||||||
is not the slot number, then we must restart the lookup at
|
is not the slot number, then we must restart the lookup at
|
||||||
the begining. If the object was moved to same chain,
|
the beginning. If the object was moved to the same chain,
|
||||||
then the reader doesnt care : It might eventually
|
then the reader doesnt care : It might eventually
|
||||||
scan the list again without harm.
|
scan the list again without harm.
|
||||||
|
|
||||||
|
@@ -184,7 +184,8 @@ length. Single character labels using special characters, that being anything
|
|||||||
other than a letter or digit, are reserved for use by the Smack development
|
other than a letter or digit, are reserved for use by the Smack development
|
||||||
team. Smack labels are unstructured, case sensitive, and the only operation
|
team. Smack labels are unstructured, case sensitive, and the only operation
|
||||||
ever performed on them is comparison for equality. Smack labels cannot
|
ever performed on them is comparison for equality. Smack labels cannot
|
||||||
contain unprintable characters or the "/" (slash) character.
|
contain unprintable characters or the "/" (slash) character. Smack labels
|
||||||
|
cannot begin with a '-', which is reserved for special options.
|
||||||
|
|
||||||
There are some predefined labels:
|
There are some predefined labels:
|
||||||
|
|
||||||
@@ -192,6 +193,7 @@ There are some predefined labels:
|
|||||||
^ Pronounced "hat", a single circumflex character.
|
^ Pronounced "hat", a single circumflex character.
|
||||||
* Pronounced "star", a single asterisk character.
|
* Pronounced "star", a single asterisk character.
|
||||||
? Pronounced "huh", a single question mark character.
|
? Pronounced "huh", a single question mark character.
|
||||||
|
@ Pronounced "Internet", a single at sign character.
|
||||||
|
|
||||||
Every task on a Smack system is assigned a label. System tasks, such as
|
Every task on a Smack system is assigned a label. System tasks, such as
|
||||||
init(8) and systems daemons, are run with the floor ("_") label. User tasks
|
init(8) and systems daemons, are run with the floor ("_") label. User tasks
|
||||||
@@ -412,6 +414,36 @@ sockets.
|
|||||||
A privileged program may set this to match the label of another
|
A privileged program may set this to match the label of another
|
||||||
task with which it hopes to communicate.
|
task with which it hopes to communicate.
|
||||||
|
|
||||||
|
Smack Netlabel Exceptions
|
||||||
|
|
||||||
|
You will often find that your labeled application has to talk to the outside,
|
||||||
|
unlabeled world. To do this there's a special file /smack/netlabel where you can
|
||||||
|
add some exceptions in the form of :
|
||||||
|
@IP1 LABEL1 or
|
||||||
|
@IP2/MASK LABEL2
|
||||||
|
|
||||||
|
It means that your application will have unlabeled access to @IP1 if it has
|
||||||
|
write access on LABEL1, and access to the subnet @IP2/MASK if it has write
|
||||||
|
access on LABEL2.
|
||||||
|
|
||||||
|
Entries in the /smack/netlabel file are matched by longest mask first, like in
|
||||||
|
classless IPv4 routing.
|
||||||
|
|
||||||
|
A special label '@' and an option '-CIPSO' can be used there :
|
||||||
|
@ means Internet, any application with any label has access to it
|
||||||
|
-CIPSO means standard CIPSO networking
|
||||||
|
|
||||||
|
If you don't know what CIPSO is and don't plan to use it, you can just do :
|
||||||
|
echo 127.0.0.1 -CIPSO > /smack/netlabel
|
||||||
|
echo 0.0.0.0/0 @ > /smack/netlabel
|
||||||
|
|
||||||
|
If you use CIPSO on your 192.168.0.0/16 local network and also need unlabeled
|
||||||
|
Internet access, you can have :
|
||||||
|
echo 127.0.0.1 -CIPSO > /smack/netlabel
|
||||||
|
echo 192.168.0.0/16 -CIPSO > /smack/netlabel
|
||||||
|
echo 0.0.0.0/0 @ > /smack/netlabel
|
||||||
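The "longest mask first" rule for /smack/netlabel entries can be illustrated with a small user-space C sketch. It is not how Smack itself stores or searches its entries; the struct, macro and function names are invented for the example, and addresses are plain host-order integers. It does a linear scan and keeps the matching entry with the longest mask, which gives the same answer as the classless routing lookup the text refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the exception list: address, mask length and label. */
struct netlabel_entry {
	uint32_t addr;		/* IPv4 address in host byte order */
	unsigned int bits;	/* mask length, 0..32 */
	const char *label;	/* e.g. "@" or "-CIPSO" */
};

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

static uint32_t mask_from_bits(unsigned int bits)
{
	/* Shifting a 32-bit value by 32 is undefined in C, so treat /0 apart. */
	return bits == 0 ? 0 : ~(uint32_t)0 << (32 - bits);
}

/*
 * Return the label of the matching entry with the longest mask,
 * or NULL when no entry covers the address.
 */
const char *netlabel_lookup(const struct netlabel_entry *tab, size_t n,
			    uint32_t ip)
{
	const struct netlabel_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t mask;

		assert(tab[i].bits <= 32);
		mask = mask_from_bits(tab[i].bits);
		if ((ip & mask) != (tab[i].addr & mask))
			continue;
		if (!best || tab[i].bits > best->bits)
			best = &tab[i];
	}
	return best ? best->label : NULL;
}
```

With the three entries from the CIPSO example above (127.0.0.1, 192.168.0.0/16 and 0.0.0.0/0), an address inside 192.168.0.0/16 resolves to -CIPSO while any other non-loopback address falls through to the 0.0.0.0/0 entry and gets @.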
|
|
||||||
|
|
||||||
Writing Applications for Smack
|
Writing Applications for Smack
|
||||||
|
|
||||||
There are three sorts of applications that will run on a Smack system. How an
|
There are three sorts of applications that will run on a Smack system. How an
|
||||||
|
@@ -40,13 +40,13 @@ Resuming
|
|||||||
Machine Support
|
Machine Support
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
The machine specific functions must call the s3c2410_pm_init() function
|
The machine specific functions must call the s3c_pm_init() function
|
||||||
to say that its bootloader is capable of resuming. This can be as
|
to say that its bootloader is capable of resuming. This can be as
|
||||||
simple as adding the following to the machine's definition:
|
simple as adding the following to the machine's definition:
|
||||||
|
|
||||||
INITMACHINE(s3c2410_pm_init)
|
INITMACHINE(s3c_pm_init)
|
||||||
|
|
||||||
A board can do its own setup before calling s3c2410_pm_init, if it
|
A board can do its own setup before calling s3c_pm_init, if it
|
||||||
needs to setup anything else for power management support.
|
needs to setup anything else for power management support.
|
||||||
|
|
||||||
There is currently no support for over-riding the default method of
|
There is currently no support for over-riding the default method of
|
||||||
@@ -74,7 +74,7 @@ statuc void __init machine_init(void)
|
|||||||
|
|
||||||
enable_irq_wake(IRQ_EINT0);
|
enable_irq_wake(IRQ_EINT0);
|
||||||
|
|
||||||
s3c2410_pm_init();
|
s3c_pm_init();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@@ -29,7 +29,14 @@ ffff0000 ffff0fff CPU vector page.
|
|||||||
CPU supports vector relocation (control
|
CPU supports vector relocation (control
|
||||||
register V bit.)
|
register V bit.)
|
||||||
|
|
||||||
ffc00000 fffeffff DMA memory mapping region. Memory returned
|
fffe0000 fffeffff XScale cache flush area. This is used
|
||||||
|
in proc-xscale.S to flush the whole data
|
||||||
|
cache. Free for other usage on non-XScale.
|
||||||
|
|
||||||
|
fff00000 fffdffff Fixmap mapping region. Addresses provided
|
||||||
|
by fix_to_virt() will be located here.
|
||||||
|
|
||||||
|
ffc00000 ffefffff DMA memory mapping region. Memory returned
|
||||||
by the dma_alloc_xxx functions will be
|
by the dma_alloc_xxx functions will be
|
||||||
dynamically mapped here.
|
dynamically mapped here.
|
||||||
|
|
||||||
|
@@ -35,9 +35,3 @@ noop anticipatory deadline [cfq]
|
|||||||
# echo anticipatory > /sys/block/hda/queue/scheduler
|
# echo anticipatory > /sys/block/hda/queue/scheduler
|
||||||
# cat /sys/block/hda/queue/scheduler
|
# cat /sys/block/hda/queue/scheduler
|
||||||
noop [anticipatory] deadline cfq
|
noop [anticipatory] deadline cfq
|
||||||
|
|
||||||
Each io queue has a set of io scheduler tunables associated with it. These
|
|
||||||
tunables control how the io scheduler works. You can find these entries
|
|
||||||
in:
|
|
||||||
|
|
||||||
/sys/block/<device>/queue/iosched
|
|
||||||
|
@@ -8,6 +8,8 @@ cpqarray.txt
|
|||||||
- info on using Compaq's SMART2 Intelligent Disk Array Controllers.
|
- info on using Compaq's SMART2 Intelligent Disk Array Controllers.
|
||||||
floppy.txt
|
floppy.txt
|
||||||
- notes and driver options for the floppy disk driver.
|
- notes and driver options for the floppy disk driver.
|
||||||
|
mflash.txt
|
||||||
|
- info on mGine m(g)flash driver for linux.
|
||||||
nbd.txt
|
nbd.txt
|
||||||
- info on a TCP implementation of a network block device.
|
- info on a TCP implementation of a network block device.
|
||||||
paride.txt
|
paride.txt
|
||||||
|
84
Documentation/blockdev/mflash.txt
Normal file
@@ -0,0 +1,84 @@
|
|||||||
|
This document describes m[g]flash support in linux.
|
||||||
|
|
||||||
|
Contents
|
||||||
|
1. Overview
|
||||||
|
2. Reserved area configuration
|
||||||
|
3. Example of mflash platform driver registration
|
||||||
|
|
||||||
|
1. Overview
|
||||||
|
|
||||||
|
Mflash and gflash are embedded flash drives. The only difference is that mflash is
|
||||||
|
an MCP (Multi Chip Package) device. The two devices operate in exactly the same way,
|
||||||
|
so in the rest of this document "mflash" refers to both mflash and gflash.
|
||||||
|
|
||||||
|
Internally, mflash has NAND flash and other hardware logic and supports
|
||||||
|
2 different operation modes (ATA, IO). ATA mode doesn't need any new
|
||||||
|
driver and currently works well under the standard IDE subsystem. In effect it's
|
||||||
|
a one-chip SSD. IO mode is an ATA-like custom mode for hosts that don't have
|
||||||
|
an IDE interface.
|
||||||
|
|
||||||
|
The following are brief descriptions of IO mode.
|
||||||
|
A. IO mode is based on the ATA protocol and uses some custom commands (read confirm,
|
||||||
|
write confirm).
|
||||||
|
B. IO mode uses SRAM bus interface.
|
||||||
|
C. IO mode supports a 4kB boot area, so the host can boot from mflash.
|
||||||
|
|
||||||
|
2. Reserved area configuration
|
||||||
|
If the host boots from mflash, it usually needs a raw area for the boot loader image. All of
|
||||||
|
the mflash block device operations take the size of this reserved area as their start offset.
|
||||||
|
Note that the boot loader's reserved area size and the kernel configuration value
|
||||||
|
must be the same.
|
||||||
|
|
||||||
|
3. Example of mflash platform driver registration
|
||||||
|
Getting mflash working is very straightforward: adding the platform device definitions to the board
|
||||||
|
configuration file is all that is needed. Here is a pseudo example.
|
||||||
|
|
||||||
|
static struct mg_drv_data mflash_drv_data = {
|
||||||
|
/* Set to 1 if you want the driver to use polling */
|
||||||
|
.use_polling = 0,
|
||||||
|
/* device attribution */
|
||||||
|
.dev_attr = MG_BOOT_DEV
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct resource mg_mflash_rsc[] = {
|
||||||
|
/* Base address of mflash */
|
||||||
|
[0] = {
|
||||||
|
.start = 0x08000000,
|
||||||
|
.end = 0x08000000 + SZ_64K - 1,
|
||||||
|
.flags = IORESOURCE_MEM
|
||||||
|
},
|
||||||
|
/* mflash interrupt pin */
|
||||||
|
[1] = {
|
||||||
|
.start = IRQ_GPIO(84),
|
||||||
|
.end = IRQ_GPIO(84),
|
||||||
|
.flags = IORESOURCE_IRQ
|
||||||
|
},
|
||||||
|
/* mflash reset pin */
|
||||||
|
[2] = {
|
||||||
|
.start = 43,
|
||||||
|
.end = 43,
|
||||||
|
.name = MG_RST_PIN,
|
||||||
|
.flags = IORESOURCE_IO
|
||||||
|
},
|
||||||
|
/* mflash reset-out pin
|
||||||
|
* If you use mflash as a storage device (i.e. other than MG_BOOT_DEV),
|
||||||
|
* you should assign this */
|
||||||
|
[3] = {
|
||||||
|
.start = 51,
|
||||||
|
.end = 51,
|
||||||
|
.name = MG_RSTOUT_PIN,
|
||||||
|
.flags = IORESOURCE_IO
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct platform_device mflash_dev = {
|
||||||
|
.name = MG_DEV_NAME,
|
||||||
|
.id = -1,
|
||||||
|
.dev = {
|
||||||
|
.platform_data = &mflash_drv_data,
|
||||||
|
},
|
||||||
|
.num_resources = ARRAY_SIZE(mg_mflash_rsc),
|
||||||
|
.resource = mg_mflash_rsc
|
||||||
|
};
|
||||||
|
|
||||||
|
platform_device_register(&mflash_dev);
|
18
Documentation/cgroups/00-INDEX
Normal file
@@ -0,0 +1,18 @@
|
|||||||
|
00-INDEX
|
||||||
|
- this file
|
||||||
|
cgroups.txt
|
||||||
|
- Control Groups definition, implementation details, examples and API.
|
||||||
|
cpuacct.txt
|
||||||
|
- CPU Accounting Controller; account CPU usage for groups of tasks.
|
||||||
|
cpusets.txt
|
||||||
|
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
|
||||||
|
devices.txt
|
||||||
|
- Device Whitelist Controller; description, interface and security.
|
||||||
|
freezer-subsystem.txt
|
||||||
|
- checkpointing; rationale to not use signals, interface.
|
||||||
|
memcg_test.txt
|
||||||
|
- Memory Resource Controller; implementation details.
|
||||||
|
memory.txt
|
||||||
|
- Memory Resource Controller; design, accounting, interface, testing.
|
||||||
|
resource_counter.txt
|
||||||
|
- Resource Counter API.
|
@@ -56,7 +56,7 @@ hierarchy, and a set of subsystems; each subsystem has system-specific
|
|||||||
state attached to each cgroup in the hierarchy. Each hierarchy has
|
state attached to each cgroup in the hierarchy. Each hierarchy has
|
||||||
an instance of the cgroup virtual filesystem associated with it.
|
an instance of the cgroup virtual filesystem associated with it.
|
||||||
|
|
||||||
At any one time there may be multiple active hierachies of task
|
At any one time there may be multiple active hierarchies of task
|
||||||
cgroups. Each hierarchy is a partition of all tasks in the system.
|
cgroups. Each hierarchy is a partition of all tasks in the system.
|
||||||
|
|
||||||
User level code may create and destroy cgroups by name in an
|
User level code may create and destroy cgroups by name in an
|
||||||
@@ -124,10 +124,10 @@ following lines:
|
|||||||
/ \
|
/ \
|
||||||
Prof (15%) students (5%)
|
Prof (15%) students (5%)
|
||||||
|
|
||||||
Browsers like firefox/lynx go into the WWW network class, while (k)nfsd go
|
Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
|
||||||
into NFS network class.
|
into NFS network class.
|
||||||
|
|
||||||
At the same time firefox/lynx will share an appropriate CPU/Memory class
|
At the same time Firefox/Lynx will share an appropriate CPU/Memory class
|
||||||
depending on who launched it (prof/student).
|
depending on who launched it (prof/student).
|
||||||
|
|
||||||
With the ability to classify tasks differently for different resources
|
With the ability to classify tasks differently for different resources
|
||||||
@@ -252,10 +252,8 @@ cgroup file system directories.
|
|||||||
When a task is moved from one cgroup to another, it gets a new
|
When a task is moved from one cgroup to another, it gets a new
|
||||||
css_set pointer - if there's an already existing css_set with the
|
css_set pointer - if there's an already existing css_set with the
|
||||||
desired collection of cgroups then that group is reused, else a new
|
desired collection of cgroups then that group is reused, else a new
|
||||||
css_set is allocated. Note that the current implementation uses a
|
css_set is allocated. The appropriate existing css_set is located by
|
||||||
linear search to locate an appropriate existing css_set, so isn't
|
looking into a hash table.
|
||||||
very efficient. A future version will use a hash table for better
|
|
||||||
performance.
|
|
||||||
|
|
||||||
To allow access from a cgroup to the css_sets (and hence tasks)
|
To allow access from a cgroup to the css_sets (and hence tasks)
|
||||||
that comprise it, a set of cg_cgroup_link objects form a lattice;
|
that comprise it, a set of cg_cgroup_link objects form a lattice;
|
||||||
@@ -327,7 +325,7 @@ and then start a subshell 'sh' in that cgroup:
|
|||||||
Creating, modifying, using the cgroups can be done through the cgroup
|
Creating, modifying, using the cgroups can be done through the cgroup
|
||||||
virtual filesystem.
|
virtual filesystem.
|
||||||
|
|
||||||
To mount a cgroup hierarchy will all available subsystems, type:
|
To mount a cgroup hierarchy with all available subsystems, type:
|
||||||
# mount -t cgroup xxx /dev/cgroup
|
# mount -t cgroup xxx /dev/cgroup
|
||||||
|
|
||||||
The "xxx" is not interpreted by the cgroup code, but will appear in
|
The "xxx" is not interpreted by the cgroup code, but will appear in
|
||||||
@@ -335,12 +333,23 @@ The "xxx" is not interpreted by the cgroup code, but will appear in
|
|||||||
|
|
||||||
To mount a cgroup hierarchy with just the cpuset and numtasks
|
To mount a cgroup hierarchy with just the cpuset and numtasks
|
||||||
subsystems, type:
|
subsystems, type:
|
||||||
# mount -t cgroup -o cpuset,numtasks hier1 /dev/cgroup
|
# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
|
||||||
|
|
||||||
To change the set of subsystems bound to a mounted hierarchy, just
|
To change the set of subsystems bound to a mounted hierarchy, just
|
||||||
remount with different options:
|
remount with different options:
|
||||||
|
# mount -o remount,cpuset,ns hier1 /dev/cgroup
|
||||||
|
|
||||||
# mount -o remount,cpuset,ns /dev/cgroup
|
Now memory is removed from the hierarchy and ns is added.
|
||||||
|
|
||||||
|
Note this will add ns to the hierarchy but won't remove memory or
|
||||||
|
cpuset, because the new options are appended to the old ones:
|
||||||
|
# mount -o remount,ns /dev/cgroup
|
||||||
|
|
||||||
|
To specify a hierarchy's release_agent:
|
||||||
|
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
|
||||||
|
xxx /dev/cgroup
|
||||||
|
|
||||||
|
Note that specifying 'release_agent' more than once will return failure.
|
||||||
|
|
||||||
Note that changing the set of subsystems is currently only supported
|
Note that changing the set of subsystems is currently only supported
|
||||||
when the hierarchy consists of a single (root) cgroup. Supporting
|
when the hierarchy consists of a single (root) cgroup. Supporting
|
||||||
@@ -351,6 +360,11 @@ Then under /dev/cgroup you can find a tree that corresponds to the
|
|||||||
tree of the cgroups in the system. For instance, /dev/cgroup
|
tree of the cgroups in the system. For instance, /dev/cgroup
|
||||||
is the cgroup that holds the whole system.
|
is the cgroup that holds the whole system.
|
||||||
|
|
||||||
|
If you want to change the value of release_agent:
|
||||||
|
# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent
|
||||||
|
|
||||||
|
It can also be changed via remount.
|
||||||
|
|
||||||
If you want to create a new cgroup under /dev/cgroup:
|
If you want to create a new cgroup under /dev/cgroup:
|
||||||
# cd /dev/cgroup
|
# cd /dev/cgroup
|
||||||
# mkdir my_cgroup
|
# mkdir my_cgroup
|
||||||
@@ -478,11 +492,13 @@ cgroup->parent is still valid. (Note - can also be called for a
|
|||||||
newly-created cgroup if an error occurs after this subsystem's
|
newly-created cgroup if an error occurs after this subsystem's
|
||||||
create() method has been called for the new cgroup).
|
create() method has been called for the new cgroup).
|
||||||
|
|
||||||
void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
|
int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
|
||||||
|
|
||||||
Called before checking the reference count on each subsystem. This may
|
Called before checking the reference count on each subsystem. This may
|
||||||
be useful for subsystems which have some extra references even if
|
be useful for subsystems which have some extra references even if
|
||||||
there are not tasks in the cgroup.
|
there are not tasks in the cgroup. If pre_destroy() returns an error code,
|
||||||
|
rmdir() will fail with it. Because of this, pre_destroy() can be
|
||||||
|
called multiple times against a cgroup.
|
||||||
|
|
||||||
int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
|
int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
|
||||||
struct task_struct *task)
|
struct task_struct *task)
|
||||||
@@ -523,7 +539,7 @@ always handled well.
|
|||||||
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
|
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
|
||||||
(cgroup_mutex held by caller)
|
(cgroup_mutex held by caller)
|
||||||
|
|
||||||
Called at the end of cgroup_clone() to do any paramater
|
Called at the end of cgroup_clone() to do any parameter
|
||||||
initialization which might be required before a task could attach. For
|
initialization which might be required before a task could attach. For
|
||||||
example in cpusets, no task may attach before 'cpus' and 'mems' are set
|
example in cpusets, no task may attach before 'cpus' and 'mems' are set
|
||||||
up.
|
up.
|
||||||
|
@@ -131,7 +131,7 @@ Cpusets extends these two mechanisms as follows:
|
|||||||
- The hierarchy of cpusets can be mounted at /dev/cpuset, for
|
- The hierarchy of cpusets can be mounted at /dev/cpuset, for
|
||||||
browsing and manipulation from user space.
|
browsing and manipulation from user space.
|
||||||
- A cpuset may be marked exclusive, which ensures that no other
|
- A cpuset may be marked exclusive, which ensures that no other
|
||||||
cpuset (except direct ancestors and descendents) may contain
|
cpuset (except direct ancestors and descendants) may contain
|
||||||
any overlapping CPUs or Memory Nodes.
|
any overlapping CPUs or Memory Nodes.
|
||||||
- You can list all the tasks (by pid) attached to any cpuset.
|
- You can list all the tasks (by pid) attached to any cpuset.
|
||||||
|
|
||||||
@@ -142,7 +142,7 @@ into the rest of the kernel, none in performance critical paths:
|
|||||||
- in fork and exit, to attach and detach a task from its cpuset.
|
- in fork and exit, to attach and detach a task from its cpuset.
|
||||||
- in sched_setaffinity, to mask the requested CPUs by what's
|
- in sched_setaffinity, to mask the requested CPUs by what's
|
||||||
allowed in that tasks cpuset.
|
allowed in that tasks cpuset.
|
||||||
- in sched.c migrate_all_tasks(), to keep migrating tasks within
|
- in sched.c migrate_live_tasks(), to keep migrating tasks within
|
||||||
the CPUs allowed by their cpuset, if possible.
|
the CPUs allowed by their cpuset, if possible.
|
||||||
- in the mbind and set_mempolicy system calls, to mask the requested
|
- in the mbind and set_mempolicy system calls, to mask the requested
|
||||||
Memory Nodes by what's allowed in that tasks cpuset.
|
Memory Nodes by what's allowed in that tasks cpuset.
|
||||||
@@ -175,6 +175,10 @@ files describing that cpuset:
|
|||||||
- mem_exclusive flag: is memory placement exclusive?
|
- mem_exclusive flag: is memory placement exclusive?
|
||||||
- mem_hardwall flag: is memory allocation hardwalled
|
- mem_hardwall flag: is memory allocation hardwalled
|
||||||
- memory_pressure: measure of how much paging pressure in cpuset
|
- memory_pressure: measure of how much paging pressure in cpuset
|
||||||
|
- memory_spread_page flag: if set, spread page cache evenly on allowed nodes
|
||||||
|
- memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
|
||||||
|
- sched_load_balance flag: if set, load balance within CPUs on that cpuset
|
||||||
|
- sched_relax_domain_level: the searching range when migrating tasks
|
||||||
|
|
||||||
In addition, the root cpuset only has the following file:
|
In addition, the root cpuset only has the following file:
|
||||||
- memory_pressure_enabled flag: compute memory_pressure?
|
- memory_pressure_enabled flag: compute memory_pressure?
|
||||||
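The per-cpuset flag files listed in this hunk can be inspected from the shell. The following is only a minimal sketch, not part of the commit: the function name show_cpuset_flags is hypothetical, and it assumes the cpuset hierarchy is mounted so that the files carry the plain names used in the list (as with "mount -t cpuset", i.e. the noprefix option). Files that do not exist in the given directory are silently skipped.

```shell
# show_cpuset_flags DIR
# Print "name=value" for each of the flag files added to the list above,
# if that file is present and readable in the cpuset directory DIR.
# (Hypothetical helper; the file names are taken from the list above.)
show_cpuset_flags() {
    dir=$1
    for f in memory_spread_page memory_spread_slab \
             sched_load_balance sched_relax_domain_level; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}
```

For instance, "show_cpuset_flags /dev/cpuset" would report the flags of the top cpuset, assuming the hierarchy is mounted at /dev/cpuset as in the examples of this document.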
@@ -222,7 +226,7 @@ nodes with memory--using the cpuset_track_online_nodes() hook.
|
|||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
If a cpuset is cpu or mem exclusive, no other cpuset, other than
|
If a cpuset is cpu or mem exclusive, no other cpuset, other than
|
||||||
a direct ancestor or descendent, may share any of the same CPUs or
|
a direct ancestor or descendant, may share any of the same CPUs or
|
||||||
Memory Nodes.
|
Memory Nodes.
|
||||||
|
|
||||||
A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
|
A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
|
||||||
@@ -252,7 +256,7 @@ is causing.
|
|||||||
|
|
||||||
This is useful both on tightly managed systems running a wide mix of
|
This is useful both on tightly managed systems running a wide mix of
|
||||||
submitted jobs, which may choose to terminate or re-prioritize jobs that
|
submitted jobs, which may choose to terminate or re-prioritize jobs that
|
||||||
are trying to use more memory than allowed on the nodes assigned them,
|
are trying to use more memory than allowed on the nodes assigned to them,
|
||||||
and with tightly coupled, long running, massively parallel scientific
|
and with tightly coupled, long running, massively parallel scientific
|
||||||
computing jobs that will dramatically fail to meet required performance
|
computing jobs that will dramatically fail to meet required performance
|
||||||
goals if they start to use more memory than allowed to them.
|
goals if they start to use more memory than allowed to them.
|
||||||
@@ -423,7 +427,7 @@ child cpusets have this flag enabled.
|
|||||||
When doing this, you don't usually want to leave any unpinned tasks in
|
When doing this, you don't usually want to leave any unpinned tasks in
|
||||||
the top cpuset that might use non-trivial amounts of CPU, as such tasks
|
the top cpuset that might use non-trivial amounts of CPU, as such tasks
|
||||||
may be artificially constrained to some subset of CPUs, depending on
|
may be artificially constrained to some subset of CPUs, depending on
|
||||||
the particulars of this flag setting in descendent cpusets. Even if
|
the particulars of this flag setting in descendant cpusets. Even if
|
||||||
such a task could use spare CPU cycles in some other CPUs, the kernel
|
such a task could use spare CPU cycles in some other CPUs, the kernel
|
||||||
scheduler might not consider the possibility of load balancing that
|
scheduler might not consider the possibility of load balancing that
|
||||||
task to that underused CPU.
|
task to that underused CPU.
|
||||||
@@ -485,17 +489,22 @@ of CPUs allowed to a cpuset having 'sched_load_balance' enabled.
|
|||||||
The internal kernel cpuset to scheduler interface passes from the
|
The internal kernel cpuset to scheduler interface passes from the
|
||||||
cpuset code to the scheduler code a partition of the load balanced
|
cpuset code to the scheduler code a partition of the load balanced
|
||||||
CPUs in the system. This partition is a set of subsets (represented
|
CPUs in the system. This partition is a set of subsets (represented
|
||||||
as an array of cpumask_t) of CPUs, pairwise disjoint, that cover all
|
as an array of struct cpumask) of CPUs, pairwise disjoint, that cover
|
||||||
the CPUs that must be load balanced.
|
all the CPUs that must be load balanced.
|
||||||
|
|
||||||
Whenever the 'sched_load_balance' flag changes, or CPUs come or go
|
The cpuset code builds a new such partition and passes it to the
|
||||||
from a cpuset with this flag enabled, or a cpuset with this flag
|
scheduler sched domain setup code, to have the sched domains rebuilt
|
||||||
enabled is removed, the cpuset code builds a new such partition and
|
as necessary, whenever:
|
||||||
passes it to the scheduler sched domain setup code, to have the sched
|
- the 'sched_load_balance' flag of a cpuset with non-empty CPUs changes,
|
||||||
domains rebuilt as necessary.
|
- or CPUs come or go from a cpuset with this flag enabled,
|
||||||
|
- or 'sched_relax_domain_level' value of a cpuset with non-empty CPUs
|
||||||
|
and with this flag enabled changes,
|
||||||
|
- or a cpuset with non-empty CPUs and with this flag enabled is removed,
|
||||||
|
- or a cpu is offlined/onlined.
|
||||||
|
|
||||||
This partition exactly defines what sched domains the scheduler should
|
This partition exactly defines what sched domains the scheduler should
|
||||||
setup - one sched domain for each element (cpumask_t) in the partition.
|
setup - one sched domain for each element (struct cpumask) in the
|
||||||
|
partition.
|
||||||
|
|
||||||
The scheduler remembers the currently active sched domain partitions.
|
The scheduler remembers the currently active sched domain partitions.
|
||||||
When the scheduler routine partition_sched_domains() is invoked from
|
When the scheduler routine partition_sched_domains() is invoked from
|
||||||
@@ -522,9 +531,9 @@ be idle.
|
|||||||
|
|
||||||
Of course it takes some searching cost to find movable tasks and/or
|
Of course it takes some searching cost to find movable tasks and/or
|
||||||
idle CPUs, the scheduler might not search all CPUs in the domain
|
idle CPUs, the scheduler might not search all CPUs in the domain
|
||||||
everytime. In fact, in some architectures, the searching ranges on
|
every time. In fact, in some architectures, the searching ranges on
|
||||||
events are limited in the same socket or node where the CPU locates,
|
events are limited in the same socket or node where the CPU locates,
|
||||||
while the load balance on tick searchs all.
|
while the load balance on tick searches all.
|
||||||
|
|
||||||
For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
|
For example, assume CPU Z is relatively far from CPU X. Even if CPU Z
|
||||||
is idle while CPU X and the siblings are busy, scheduler can't migrate
|
is idle while CPU X and the siblings are busy, scheduler can't migrate
|
||||||
@@ -559,7 +568,7 @@ domain, the largest value among those is used. Be careful, if one
|
|||||||
requests 0 and others are -1 then 0 is used.
|
requests 0 and others are -1 then 0 is used.
|
||||||
|
|
||||||
Note that modifying this file will have both good and bad effects,
|
Note that modifying this file will have both good and bad effects,
|
||||||
and whether it is acceptable or not will be depend on your situation.
|
and whether it is acceptable or not depends on your situation.
|
||||||
Don't modify this file if you are not sure.
|
Don't modify this file if you are not sure.
|
||||||
|
|
||||||
If your situation is:
|
If your situation is:
|
||||||
@@ -592,7 +601,7 @@ its new cpuset, then the task will continue to use whatever subset
|
|||||||
of MPOL_BIND nodes are still allowed in the new cpuset. If the task
|
of MPOL_BIND nodes are still allowed in the new cpuset. If the task
|
||||||
was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
|
was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
|
||||||
in the new cpuset, then the task will be essentially treated as if it
|
in the new cpuset, then the task will be essentially treated as if it
|
||||||
was MPOL_BIND bound to the new cpuset (even though its numa placement,
|
was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
|
||||||
as queried by get_mempolicy(), doesn't change). If a task is moved
|
as queried by get_mempolicy(), doesn't change). If a task is moved
|
||||||
from one cpuset to another, then the kernel will adjust the tasks
|
from one cpuset to another, then the kernel will adjust the tasks
|
||||||
memory placement, as above, the next time that the kernel attempts
|
memory placement, as above, the next time that the kernel attempts
|
||||||
@@ -600,19 +609,15 @@ to allocate a page of memory for that task.
|
|||||||
|
|
||||||
If a cpuset has its 'cpus' modified, then each task in that cpuset
|
If a cpuset has its 'cpus' modified, then each task in that cpuset
|
||||||
will have its allowed CPU placement changed immediately. Similarly,
|
will have its allowed CPU placement changed immediately. Similarly,
|
||||||
if a tasks pid is written to a cpusets 'tasks' file, in either its
|
if a tasks pid is written to another cpusets 'tasks' file, then its
|
||||||
current cpuset or another cpuset, then its allowed CPU placement is
|
allowed CPU placement is changed immediately. If such a task had been
|
||||||
changed immediately. If such a task had been bound to some subset
|
bound to some subset of its cpuset using the sched_setaffinity() call,
|
||||||
of its cpuset using the sched_setaffinity() call, the task will be
|
the task will be allowed to run on any CPU allowed in its new cpuset,
|
||||||
allowed to run on any CPU allowed in its new cpuset, negating the
|
negating the effect of the prior sched_setaffinity() call.
|
||||||
affect of the prior sched_setaffinity() call.
|
|
||||||
|
|
||||||
In summary, the memory placement of a task whose cpuset is changed is
|
In summary, the memory placement of a task whose cpuset is changed is
|
||||||
updated by the kernel, on the next allocation of a page for that task,
|
updated by the kernel, on the next allocation of a page for that task,
|
||||||
but the processor placement is not updated, until that tasks pid is
|
and the processor placement is updated immediately.
|
||||||
rewritten to the 'tasks' file of its cpuset. This is done to avoid
|
|
||||||
impacting the scheduler code in the kernel with a check for changes
|
|
||||||
in a tasks processor placement.
|
|
||||||
|
|
||||||
Normally, once a page is allocated (given a physical page
|
Normally, once a page is allocated (given a physical page
|
||||||
of main memory) then that page stays on whatever node it
|
of main memory) then that page stays on whatever node it
|
||||||
@@ -681,10 +686,14 @@ and then start a subshell 'sh' in that cpuset:
|
|||||||
# The next line should display '/Charlie'
|
# The next line should display '/Charlie'
|
||||||
cat /proc/self/cpuset
|
cat /proc/self/cpuset
|
||||||
|
|
||||||
In the future, a C library interface to cpusets will likely be
|
There are ways to query or modify cpusets:
|
||||||
available. For now, the only way to query or modify cpusets is
|
- via the cpuset file system directly, using the various cd, mkdir, echo,
|
||||||
via the cpuset file system, using the various cd, mkdir, echo, cat,
|
cat, rmdir commands from the shell, or their equivalent from C.
|
||||||
rmdir commands from the shell, or their equivalent from C.
|
- via the C library libcpuset.
|
||||||
|
- via the C library libcgroup.
|
||||||
|
(http://sourceforge.net/proects/libcg/)
|
||||||
|
- via the python application cset.
|
||||||
|
(http://developer.novell.com/wiki/index.php/Cpuset)
|
||||||
|
|
||||||
The sched_setaffinity calls can also be done at the shell prompt using
|
The sched_setaffinity calls can also be done at the shell prompt using
|
||||||
SGI's runon or Robert Love's taskset. The mbind and set_mempolicy
|
SGI's runon or Robert Love's taskset. The mbind and set_mempolicy
|
||||||
@@ -756,7 +765,7 @@ mount -t cpuset X /dev/cpuset
|
|||||||
|
|
||||||
is equivalent to
|
is equivalent to
|
||||||
|
|
||||||
mount -t cgroup -ocpuset X /dev/cpuset
|
mount -t cgroup -ocpuset,noprefix X /dev/cpuset
|
||||||
echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
|
echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
|
||||||
|
|
||||||
2.2 Adding/removing cpus
|
2.2 Adding/removing cpus
|
||||||
|
@@ -42,7 +42,7 @@ suffice, but we can decide the best way to adequately restrict
|
|||||||
movement as people get some experience with this. We may just want
|
movement as people get some experience with this. We may just want
|
||||||
to require CAP_SYS_ADMIN, which at least is a separate bit from
|
to require CAP_SYS_ADMIN, which at least is a separate bit from
|
||||||
CAP_MKNOD. We may want to just refuse moving to a cgroup which
|
CAP_MKNOD. We may want to just refuse moving to a cgroup which
|
||||||
isn't a descendent of the current one. Or we may want to use
|
isn't a descendant of the current one. Or we may want to use
|
||||||
CAP_MAC_ADMIN, since we really are trying to lock down root.
|
CAP_MAC_ADMIN, since we really are trying to lock down root.
|
||||||
|
|
||||||
CAP_SYS_ADMIN is needed to modify the whitelist or move another
|
CAP_SYS_ADMIN is needed to modify the whitelist or move another
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
Memory Resource Controller(Memcg) Implementation Memo.
|
Memory Resource Controller(Memcg) Implementation Memo.
|
||||||
Last Updated: 2009/1/19
|
Last Updated: 2009/1/20
|
||||||
Base Kernel Version: based on 2.6.29-rc2.
|
Base Kernel Version: based on 2.6.29-rc2.
|
||||||
|
|
||||||
Because VM is getting complex (one of reasons is memcg...), memcg's behavior
|
Because VM is getting complex (one of reasons is memcg...), memcg's behavior
|
||||||
@@ -356,7 +356,25 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
|
|||||||
(Shell-B)
|
(Shell-B)
|
||||||
# move all tasks in /cgroup/test to /cgroup
|
# move all tasks in /cgroup/test to /cgroup
|
||||||
# /sbin/swapoff -a
|
# /sbin/swapoff -a
|
||||||
# rmdir /test/cgroup
|
# rmdir /cgroup/test
|
||||||
# kill malloc task.
|
# kill malloc task.
|
||||||
|
|
||||||
Of course, tmpfs v.s. swapoff test should be tested, too.
|
Of course, tmpfs v.s. swapoff test should be tested, too.
|
||||||
|
|
||||||
|
9.8 OOM-Killer
|
||||||
|
Out-of-memory caused by memcg's limit will kill tasks under
|
||||||
|
the memcg. When hierarchy is used, a task under hierarchy
|
||||||
|
will be killed by the kernel.
|
||||||
|
In this case, panic_on_oom shouldn't be invoked and tasks
|
||||||
|
in other groups shouldn't be killed.
|
||||||
|
|
||||||
|
It's not difficult to cause OOM under memcg as follows.
|
||||||
|
Case A) when you can swapoff
|
||||||
|
#swapoff -a
|
||||||
|
#echo 50M > /memory.limit_in_bytes
|
||||||
|
run 51M of malloc
|
||||||
|
|
||||||
|
Case B) when you use mem+swap limitation.
|
||||||
|
#echo 50M > memory.limit_in_bytes
|
||||||
|
#echo 50M > memory.memsw.limit_in_bytes
|
||||||
|
run 51M of malloc
|
||||||
|
@@ -302,7 +302,7 @@ will be charged as a new owner of it.
|
|||||||
unevictable - # of pages cannot be reclaimed.(mlocked etc)
|
unevictable - # of pages cannot be reclaimed.(mlocked etc)
|
||||||
|
|
||||||
Below is depend on CONFIG_DEBUG_VM.
|
Below is depend on CONFIG_DEBUG_VM.
|
||||||
inactive_ratio - VM inernal parameter. (see mm/page_alloc.c)
|
inactive_ratio - VM internal parameter. (see mm/page_alloc.c)
|
||||||
recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
|
recent_rotated_anon - VM internal parameter. (see mm/vmscan.c)
|
||||||
recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
|
recent_rotated_file - VM internal parameter. (see mm/vmscan.c)
|
||||||
recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
|
recent_scanned_anon - VM internal parameter. (see mm/vmscan.c)
|
||||||
|
@@ -137,7 +137,7 @@ static void cn_test_timer_func(unsigned long __data)
|
|||||||
|
|
||||||
memcpy(m + 1, data, m->len);
|
memcpy(m + 1, data, m->len);
|
||||||
|
|
||||||
cn_netlink_send(m, 0, gfp_any());
|
cn_netlink_send(m, 0, GFP_ATOMIC);
|
||||||
kfree(m);
|
kfree(m);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -160,10 +160,8 @@ static int cn_test_init(void)
|
|||||||
goto err_out;
|
goto err_out;
|
||||||
}
|
}
|
||||||
|
|
||||||
init_timer(&cn_test_timer);
|
setup_timer(&cn_test_timer, cn_test_timer_func, 0);
|
||||||
cn_test_timer.function = cn_test_timer_func;
|
|
||||||
cn_test_timer.expires = jiffies + HZ;
|
cn_test_timer.expires = jiffies + HZ;
|
||||||
cn_test_timer.data = 0;
|
|
||||||
add_timer(&cn_test_timer);
|
add_timer(&cn_test_timer);
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
|
@@ -117,10 +117,28 @@ accessible parameters:
|
|||||||
sampling_rate: measured in uS (10^-6 seconds), this is how often you
|
sampling_rate: measured in uS (10^-6 seconds), this is how often you
|
||||||
want the kernel to look at the CPU usage and to make decisions on
|
want the kernel to look at the CPU usage and to make decisions on
|
||||||
what to do about the frequency. Typically this is set to values of
|
what to do about the frequency. Typically this is set to values of
|
||||||
around '10000' or more.
|
around '10000' or more. Its default value is (cmp. with user-guide.txt):
|
||||||
|
transition_latency * 1000
|
||||||
|
The lowest value you can set is:
|
||||||
|
transition_latency * 100 or it may get restricted to a value where it
|
||||||
|
makes no sense for the kernel anymore to poll that often which depends
|
||||||
|
on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000).
|
||||||
|
Be aware that transition latency is in ns and sampling_rate is in us, so you
|
||||||
|
get the same sysfs value by default.
|
||||||
|
Sampling rate should always get adjusted considering the transition latency.
|
||||||
|
To set the sampling rate 750 times as high as the transition latency
|
||||||
|
in the bash (as said, 1000 is default), do:
|
||||||
|
echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
|
||||||
|
>ondemand/sampling_rate
|
||||||
|
|
||||||
show_sampling_rate_(min|max): the minimum and maximum sampling rates
|
show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT.
|
||||||
available that you may set 'sampling_rate' to.
|
You can use wider ranges now and the general
|
||||||
|
cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be
|
||||||
|
used to obtain exactly the same info:
|
||||||
|
show_sampling_rate_min = transition_latency * 500 / 1000
|
||||||
|
show_sampling_rate_max = transition_latency * 500000 / 1000
|
||||||
|
(divided by 1000 is to illustrate that sampling rate is in us and
|
||||||
|
transition latency is exported in ns).
|
||||||
|
|
||||||
up_threshold: defines what the average CPU usage between the samplings
|
up_threshold: defines what the average CPU usage between the samplings
|
||||||
of 'sampling_rate' needs to be for the kernel to make a decision on
|
of 'sampling_rate' needs to be for the kernel to make a decision on
|
||||||
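The unit conversion described in this hunk (transition latency in ns, sampling_rate in us) can be written as a small shell function. This is a minimal sketch and not part of the commit: the name sampling_rate_us is hypothetical, the factor defaults to 1000 as the text states for the default sampling rate, and no check is made for the CPUFREQ_ETERNAL case mentioned for cpuinfo_transition_latency.

```shell
# sampling_rate_us LATENCY_NS [FACTOR]
# Print a sampling rate in microseconds that is FACTOR times the
# transition latency, which is given in nanoseconds. FACTOR defaults
# to 1000, the default multiplier quoted in the text above.
# (Hypothetical helper; it performs only the integer arithmetic.)
sampling_rate_us() {
    latency_ns=$1
    factor=${2:-1000}
    echo $(( latency_ns * factor / 1000 ))
}
```

With a transition latency of 10000 ns, "sampling_rate_us 10000 750" prints 7500, and factors of 500 and 500000 give the values described for the deprecated show_sampling_rate_min and show_sampling_rate_max. From the directory holding cpuinfo_transition_latency, the result could be written with "sampling_rate_us $(cat cpuinfo_transition_latency) 750 > ondemand/sampling_rate".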
|
@@ -152,6 +152,18 @@ cpuinfo_min_freq : this file shows the minimum operating
|
|||||||
frequency the processor can run at(in kHz)
|
frequency the processor can run at(in kHz)
|
||||||
cpuinfo_max_freq : this file shows the maximum operating
|
cpuinfo_max_freq : this file shows the maximum operating
|
||||||
frequency the processor can run at(in kHz)
|
frequency the processor can run at(in kHz)
|
||||||
|
cpuinfo_transition_latency The time it takes on this CPU to
|
||||||
|
switch between two frequencies in nano
|
||||||
|
seconds. If unknown or known to be
|
||||||
|
that high that the driver does not
|
||||||
|
work with the ondemand governor, -1
|
||||||
|
(CPUFREQ_ETERNAL) will be returned.
|
||||||
|
Using this information can be useful
|
||||||
|
to choose an appropriate polling
|
||||||
|
frequency for a kernel governor or
|
||||||
|
userspace daemon. Make sure to not
|
||||||
|
switch the frequency too often
|
||||||
|
resulting in performance loss.
|
||||||
scaling_driver : this file shows what cpufreq driver is
|
scaling_driver : this file shows what cpufreq driver is
|
||||||
used to set the frequency on this CPU
|
used to set the frequency on this CPU
|
||||||
|
|
||||||
@@ -195,19 +207,3 @@ scaling_setspeed. By "echoing" a new frequency into this
|
|||||||
you can change the speed of the CPU,
|
you can change the speed of the CPU,
|
||||||
but only within the limits of
|
but only within the limits of
|
||||||
scaling_min_freq and scaling_max_freq.
|
scaling_min_freq and scaling_max_freq.
|
||||||
|
|
||||||
|
|
||||||
3.2 Deprecated Interfaces
|
|
||||||
-------------------------
|
|
||||||
|
|
||||||
Depending on your kernel configuration, you might find the following
|
|
||||||
cpufreq-related files:
|
|
||||||
/proc/cpufreq
|
|
||||||
/proc/sys/cpu/*/speed
|
|
||||||
/proc/sys/cpu/*/speed-min
|
|
||||||
/proc/sys/cpu/*/speed-max
|
|
||||||
|
|
||||||
These are files for deprecated interfaces to cpufreq, which offer far
|
|
||||||
less functionality. Because of this, these interfaces aren't described
|
|
||||||
here.
|
|
||||||
|
|
||||||
|
@@ -18,11 +18,11 @@ For an architecture to support this feature, it must define some of
|
|||||||
these macros in include/asm-XXX/topology.h:
|
these macros in include/asm-XXX/topology.h:
|
||||||
#define topology_physical_package_id(cpu)
|
#define topology_physical_package_id(cpu)
|
||||||
#define topology_core_id(cpu)
|
#define topology_core_id(cpu)
|
||||||
#define topology_thread_siblings(cpu)
|
#define topology_thread_cpumask(cpu)
|
||||||
#define topology_core_siblings(cpu)
|
#define topology_core_cpumask(cpu)
|
||||||
|
|
||||||
The type of **_id is int.
|
The type of **_id is int.
|
||||||
The type of siblings is cpumask_t.
|
The type of siblings is (const) struct cpumask *.
|
||||||
|
|
||||||
To be consistent on all architectures, include/linux/topology.h
|
To be consistent on all architectures, include/linux/topology.h
|
||||||
provides default definitions for any of the above macros that are
|
provides default definitions for any of the above macros that are
|
||||||
|
@@ -1,9 +1,9 @@
|
|||||||
|
|
||||||
LINUX ALLOCATED DEVICES (2.6+ version)
|
LINUX ALLOCATED DEVICES (2.6+ version)
|
||||||
|
|
||||||
Maintained by Torben Mathiasen <device@lanana.org>
|
Maintained by Alan Cox <device@lanana.org>
|
||||||
|
|
||||||
Last revised: 29 November 2006
|
Last revised: 6th April 2009
|
||||||
|
|
||||||
This list is the Linux Device List, the official registry of allocated
|
This list is the Linux Device List, the official registry of allocated
|
||||||
device numbers and /dev directory nodes for the Linux operating
|
device numbers and /dev directory nodes for the Linux operating
|
||||||
@@ -67,6 +67,11 @@ up to date. Due to the number of registrations I have to maintain it
|
|||||||
in "batch mode", so there is likely additional registrations that
|
in "batch mode", so there is likely additional registrations that
|
||||||
haven't been listed yet.
|
haven't been listed yet.
|
||||||
|
|
||||||
|
Fourth, remember that Linux now has extensive support for dynamic allocation
|
||||||
|
of device numbering and can use sysfs and udev to handle the naming needs.
|
||||||
|
There are still some exceptions in the serial and boot device area. Before
|
||||||
|
asking for a device number make sure you actually need one.
|
||||||
|
|
||||||
Finally, sometimes I have to play "namespace police." Please don't be
|
Finally, sometimes I have to play "namespace police." Please don't be
|
||||||
offended. I often get submissions for /dev names that would be bound
|
offended. I often get submissions for /dev names that would be bound
|
||||||
to cause conflicts down the road. I am trying to avoid getting in a
|
to cause conflicts down the road. I am trying to avoid getting in a
|
||||||
@@ -101,7 +106,7 @@ Your cooperation is appreciated.
|
|||||||
0 = /dev/ram0 First RAM disk
|
0 = /dev/ram0 First RAM disk
|
||||||
1 = /dev/ram1 Second RAM disk
|
1 = /dev/ram1 Second RAM disk
|
||||||
...
|
...
|
||||||
250 = /dev/initrd Initial RAM disk {2.6}
|
250 = /dev/initrd Initial RAM disk
|
||||||
|
|
||||||
Older kernels had /dev/ramdisk (1, 1) here.
|
Older kernels had /dev/ramdisk (1, 1) here.
|
||||||
/dev/initrd refers to a RAM disk which was preloaded
|
/dev/initrd refers to a RAM disk which was preloaded
|
||||||
@@ -340,7 +345,7 @@ Your cooperation is appreciated.
|
|||||||
14 = /dev/touchscreen/ucb1x00 UCB 1x00 touchscreen
|
14 = /dev/touchscreen/ucb1x00 UCB 1x00 touchscreen
|
||||||
15 = /dev/touchscreen/mk712 MK712 touchscreen
|
15 = /dev/touchscreen/mk712 MK712 touchscreen
|
||||||
128 = /dev/beep Fancy beep device
|
128 = /dev/beep Fancy beep device
|
||||||
129 = /dev/modreq Kernel module load request {2.6}
|
129 =
|
||||||
130 = /dev/watchdog Watchdog timer port
|
130 = /dev/watchdog Watchdog timer port
|
||||||
131 = /dev/temperature Machine internal temperature
|
131 = /dev/temperature Machine internal temperature
|
||||||
132 = /dev/hwtrap Hardware fault trap
|
132 = /dev/hwtrap Hardware fault trap
|
||||||
@@ -350,10 +355,10 @@ Your cooperation is appreciated.
|
|||||||
139 = /dev/openprom SPARC OpenBoot PROM
|
139 = /dev/openprom SPARC OpenBoot PROM
|
||||||
140 = /dev/relay8 Berkshire Products Octal relay card
|
140 = /dev/relay8 Berkshire Products Octal relay card
|
||||||
141 = /dev/relay16 Berkshire Products ISO-16 relay card
|
141 = /dev/relay16 Berkshire Products ISO-16 relay card
|
||||||
142 = /dev/msr x86 model-specific registers {2.6}
|
142 =
|
||||||
143 = /dev/pciconf PCI configuration space
|
143 = /dev/pciconf PCI configuration space
|
||||||
144 = /dev/nvram Non-volatile configuration RAM
|
144 = /dev/nvram Non-volatile configuration RAM
|
||||||
145 = /dev/hfmodem Soundcard shortwave modem control {2.6}
|
145 = /dev/hfmodem Soundcard shortwave modem control
|
||||||
146 = /dev/graphics Linux/SGI graphics device
|
146 = /dev/graphics Linux/SGI graphics device
|
||||||
147 = /dev/opengl Linux/SGI OpenGL pipe
|
147 = /dev/opengl Linux/SGI OpenGL pipe
|
||||||
148 = /dev/gfx Linux/SGI graphics effects device
|
148 = /dev/gfx Linux/SGI graphics effects device
|
||||||
@@ -435,6 +440,9 @@ Your cooperation is appreciated.
|
|||||||
228 = /dev/hpet HPET driver
|
228 = /dev/hpet HPET driver
|
||||||
229 = /dev/fuse Fuse (virtual filesystem in user-space)
|
229 = /dev/fuse Fuse (virtual filesystem in user-space)
|
||||||
230 = /dev/midishare MidiShare driver
|
230 = /dev/midishare MidiShare driver
|
||||||
|
231 = /dev/snapshot System memory snapshot device
|
||||||
|
232 = /dev/kvm Kernel-based virtual machine (hardware virtualization extensions)
|
||||||
|
233 = /dev/kmview View-OS A process with a view
|
||||||
240-254 Reserved for local use
|
240-254 Reserved for local use
|
||||||
255 Reserved for MISC_DYNAMIC_MINOR
|
255 Reserved for MISC_DYNAMIC_MINOR
|
||||||
|
|
||||||
@@ -466,10 +474,7 @@ Your cooperation is appreciated.
|
|||||||
The device names specified are proposed -- if there
|
The device names specified are proposed -- if there
|
||||||
are "standard" names for these devices, please let me know.
|
are "standard" names for these devices, please let me know.
|
||||||
|
|
||||||
12 block MSCDEX CD-ROM callback support {2.6}
|
12 block
|
||||||
0 = /dev/dos_cd0 First MSCDEX CD-ROM
|
|
||||||
1 = /dev/dos_cd1 Second MSCDEX CD-ROM
|
|
||||||
...
|
|
||||||
|
|
||||||
13 char Input core
|
13 char Input core
|
||||||
0 = /dev/input/js0 First joystick
|
0 = /dev/input/js0 First joystick
|
||||||
@@ -498,7 +503,7 @@ Your cooperation is appreciated.
|
|||||||
2 = /dev/midi00 First MIDI port
|
2 = /dev/midi00 First MIDI port
|
||||||
3 = /dev/dsp Digital audio
|
3 = /dev/dsp Digital audio
|
||||||
4 = /dev/audio Sun-compatible digital audio
|
4 = /dev/audio Sun-compatible digital audio
|
||||||
6 = /dev/sndstat Sound card status information {2.6}
|
6 =
|
||||||
7 = /dev/audioctl SPARC audio control device
|
7 = /dev/audioctl SPARC audio control device
|
||||||
8 = /dev/sequencer2 Sequencer -- alternate device
|
8 = /dev/sequencer2 Sequencer -- alternate device
|
||||||
16 = /dev/mixer1 Second soundcard mixer control
|
16 = /dev/mixer1 Second soundcard mixer control
|
||||||
@@ -510,14 +515,7 @@ Your cooperation is appreciated.
|
|||||||
34 = /dev/midi02 Third MIDI port
|
34 = /dev/midi02 Third MIDI port
|
||||||
50 = /dev/midi03 Fourth MIDI port
|
50 = /dev/midi03 Fourth MIDI port
|
||||||
|
|
||||||
14 block BIOS harddrive callback support {2.6}
|
14 block
|
||||||
0 = /dev/dos_hda First BIOS harddrive whole disk
|
|
||||||
64 = /dev/dos_hdb Second BIOS harddrive whole disk
|
|
||||||
128 = /dev/dos_hdc Third BIOS harddrive whole disk
|
|
||||||
192 = /dev/dos_hdd Fourth BIOS harddrive whole disk
|
|
||||||
|
|
||||||
Partitions are handled in the same way as IDE disks
|
|
||||||
(see major number 3).
|
|
||||||
|
|
||||||
15 char Joystick
|
15 char Joystick
|
||||||
0 = /dev/js0 First analog joystick
|
0 = /dev/js0 First analog joystick
|
||||||
@@ -535,14 +533,14 @@ Your cooperation is appreciated.
|
|||||||
16 block GoldStar CD-ROM
|
16 block GoldStar CD-ROM
|
||||||
0 = /dev/gscd GoldStar CD-ROM
|
0 = /dev/gscd GoldStar CD-ROM
|
||||||
|
|
||||||
17 char Chase serial card
|
17 char OBSOLETE (was Chase serial card)
|
||||||
0 = /dev/ttyH0 First Chase port
|
0 = /dev/ttyH0 First Chase port
|
||||||
1 = /dev/ttyH1 Second Chase port
|
1 = /dev/ttyH1 Second Chase port
|
||||||
...
|
...
|
||||||
17 block Optics Storage CD-ROM
|
17 block Optics Storage CD-ROM
|
||||||
0 = /dev/optcd Optics Storage CD-ROM
|
0 = /dev/optcd Optics Storage CD-ROM
|
||||||
|
|
||||||
18 char Chase serial card - alternate devices
|
18 char OBSOLETE (was Chase serial card - alternate devices)
|
||||||
0 = /dev/cuh0 Callout device for ttyH0
|
0 = /dev/cuh0 Callout device for ttyH0
|
||||||
1 = /dev/cuh1 Callout device for ttyH1
|
1 = /dev/cuh1 Callout device for ttyH1
|
||||||
...
|
...
|
||||||
@@ -644,8 +642,7 @@ Your cooperation is appreciated.
|
|||||||
2 = /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2
|
2 = /dev/sbpcd2 Panasonic CD-ROM controller 0 unit 2
|
||||||
3 = /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3
|
3 = /dev/sbpcd3 Panasonic CD-ROM controller 0 unit 3
|
||||||
|
|
||||||
26 char Quanta WinVision frame grabber {2.6}
|
26 char
|
||||||
0 = /dev/wvisfgrab Quanta WinVision frame grabber
|
|
||||||
|
|
||||||
26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM
|
26 block Second Matsushita (Panasonic/SoundBlaster) CD-ROM
|
||||||
0 = /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0
|
0 = /dev/sbpcd4 Panasonic CD-ROM controller 1 unit 0
|
||||||
@@ -872,7 +869,7 @@ Your cooperation is appreciated.
|
|||||||
and "user level packet I/O." This board is also
|
and "user level packet I/O." This board is also
|
||||||
accessible as a standard networking "eth" device.
|
accessible as a standard networking "eth" device.
|
||||||
|
|
||||||
38 block Reserved for Linux/AP+
|
38 block OBSOLETE (was Linux/AP+)
|
||||||
|
|
||||||
39 char ML-16P experimental I/O board
|
39 char ML-16P experimental I/O board
|
||||||
0 = /dev/ml16pa-a0 First card, first analog channel
|
0 = /dev/ml16pa-a0 First card, first analog channel
|
||||||
@@ -892,29 +889,16 @@ Your cooperation is appreciated.
|
|||||||
50 = /dev/ml16pb-c1 Second card, second counter/timer
|
50 = /dev/ml16pb-c1 Second card, second counter/timer
|
||||||
51 = /dev/ml16pb-c2 Second card, third counter/timer
|
51 = /dev/ml16pb-c2 Second card, third counter/timer
|
||||||
...
|
...
|
||||||
39 block Reserved for Linux/AP+
|
39 block
|
||||||
|
|
||||||
40 char Matrox Meteor frame grabber {2.6}
|
40 char
|
||||||
0 = /dev/mmetfgrab Matrox Meteor frame grabber
|
|
||||||
|
|
||||||
40 block Syquest EZ135 parallel port removable drive
|
40 block
|
||||||
0 = /dev/eza Parallel EZ135 drive, whole disk
|
|
||||||
|
|
||||||
This device is obsolete and will be removed in a
|
|
||||||
future version of Linux. It has been replaced with
|
|
||||||
the parallel port IDE disk driver at major number 45.
|
|
||||||
Partitions are handled in the same way as IDE disks
|
|
||||||
(see major number 3).
|
|
||||||
|
|
||||||
41 char Yet Another Micro Monitor
|
41 char Yet Another Micro Monitor
|
||||||
0 = /dev/yamm Yet Another Micro Monitor
|
0 = /dev/yamm Yet Another Micro Monitor
|
||||||
|
|
||||||
41 block MicroSolutions BackPack parallel port CD-ROM
|
41 block
|
||||||
0 = /dev/bpcd BackPack CD-ROM
|
|
||||||
|
|
||||||
This device is obsolete and will be removed in a
|
|
||||||
future version of Linux. It has been replaced with
|
|
||||||
the parallel port ATAPI CD-ROM driver at major number 46.
|
|
||||||
|
|
||||||
42 char Demo/sample use
|
42 char Demo/sample use
|
||||||
|
|
||||||
@@ -1681,13 +1665,7 @@ Your cooperation is appreciated.
|
|||||||
disks (see major number 3) except that the limit on
|
disks (see major number 3) except that the limit on
|
||||||
partitions is 15.
|
partitions is 15.
|
||||||
|
|
||||||
93 char IBM Smart Capture Card frame grabber {2.6}
|
93 char
|
||||||
0 = /dev/iscc0 First Smart Capture Card
|
|
||||||
1 = /dev/iscc1 Second Smart Capture Card
|
|
||||||
...
|
|
||||||
128 = /dev/isccctl0 First Smart Capture Card control
|
|
||||||
129 = /dev/isccctl1 Second Smart Capture Card control
|
|
||||||
...
|
|
||||||
|
|
||||||
93 block NAND Flash Translation Layer filesystem
|
93 block NAND Flash Translation Layer filesystem
|
||||||
0 = /dev/nftla First NFTL layer
|
0 = /dev/nftla First NFTL layer
|
||||||
@@ -1695,10 +1673,7 @@ Your cooperation is appreciated.
|
|||||||
...
|
...
|
||||||
240 = /dev/nftlp 16th NTFL layer
|
240 = /dev/nftlp 16th NTFL layer
|
||||||
|
|
||||||
94 char miroVIDEO DC10/30 capture/playback device {2.6}
|
94 char
|
||||||
0 = /dev/dcxx0 First capture card
|
|
||||||
1 = /dev/dcxx1 Second capture card
|
|
||||||
...
|
|
||||||
|
|
||||||
94 block IBM S/390 DASD block storage
|
94 block IBM S/390 DASD block storage
|
||||||
0 = /dev/dasda First DASD device, major
|
0 = /dev/dasda First DASD device, major
|
||||||
@@ -1791,11 +1766,7 @@ Your cooperation is appreciated.
|
|||||||
...
|
...
|
||||||
15 = /dev/amiraid/ar?p15 15th partition
|
15 = /dev/amiraid/ar?p15 15th partition
|
||||||
|
|
||||||
102 char Philips SAA5249 Teletext signal decoder {2.6}
|
102 char
|
||||||
0 = /dev/tlk0 First Teletext decoder
|
|
||||||
1 = /dev/tlk1 Second Teletext decoder
|
|
||||||
2 = /dev/tlk2 Third Teletext decoder
|
|
||||||
3 = /dev/tlk3 Fourth Teletext decoder
|
|
||||||
|
|
||||||
102 block Compressed block device
|
102 block Compressed block device
|
||||||
0 = /dev/cbd/a First compressed block device, whole device
|
0 = /dev/cbd/a First compressed block device, whole device
|
||||||
@@ -1916,10 +1887,7 @@ Your cooperation is appreciated.
|
|||||||
DAC960 (see major number 48) except that the limit on
|
DAC960 (see major number 48) except that the limit on
|
||||||
partitions is 15.
|
partitions is 15.
|
||||||
|
|
||||||
111 char Philips SAA7146-based audio/video card {2.6}
|
111 char
|
||||||
0 = /dev/av0 First A/V card
|
|
||||||
1 = /dev/av1 Second A/V card
|
|
||||||
...
|
|
||||||
|
|
||||||
111 block Compaq Next Generation Drive Array, eighth controller
|
111 block Compaq Next Generation Drive Array, eighth controller
|
||||||
0 = /dev/cciss/c7d0 First logical drive, whole disk
|
0 = /dev/cciss/c7d0 First logical drive, whole disk
|
||||||
@@ -2079,8 +2047,8 @@ Your cooperation is appreciated.
|
|||||||
...
|
...
|
||||||
|
|
||||||
119 char VMware virtual network control
|
119 char VMware virtual network control
|
||||||
0 = /dev/vmnet0 1st virtual network
|
0 = /dev/vnet0 1st virtual network
|
||||||
1 = /dev/vmnet1 2nd virtual network
|
1 = /dev/vnet1 2nd virtual network
|
||||||
...
|
...
|
||||||
|
|
||||||
120-127 char LOCAL/EXPERIMENTAL USE
|
120-127 char LOCAL/EXPERIMENTAL USE
|
||||||
@@ -2450,7 +2418,7 @@ Your cooperation is appreciated.
|
|||||||
2 = /dev/raw/raw2 Second raw I/O device
|
2 = /dev/raw/raw2 Second raw I/O device
|
||||||
...
|
...
|
||||||
|
|
||||||
163 char UNASSIGNED (was Radio Tech BIM-XXX-RS232 radio modem - see 51)
|
163 char
|
||||||
|
|
||||||
164 char Chase Research AT/PCI-Fast serial card
|
164 char Chase Research AT/PCI-Fast serial card
|
||||||
0 = /dev/ttyCH0 AT/PCI-Fast board 0, port 0
|
0 = /dev/ttyCH0 AT/PCI-Fast board 0, port 0
|
||||||
@@ -2542,6 +2510,12 @@ Your cooperation is appreciated.
|
|||||||
1 = /dev/clanvi1 Second cLAN adapter
|
1 = /dev/clanvi1 Second cLAN adapter
|
||||||
...
|
...
|
||||||
|
|
||||||
|
179 block MMC block devices
|
||||||
|
0 = /dev/mmcblk0 First SD/MMC card
|
||||||
|
1 = /dev/mmcblk0p1 First partition on first MMC card
|
||||||
|
8 = /dev/mmcblk1 Second SD/MMC card
|
||||||
|
...
|
||||||
|
|
||||||
179 char CCube DVXChip-based PCI products
|
179 char CCube DVXChip-based PCI products
|
||||||
0 = /dev/dvxirq0 First DVX device
|
0 = /dev/dvxirq0 First DVX device
|
||||||
1 = /dev/dvxirq1 Second DVX device
|
1 = /dev/dvxirq1 Second DVX device
|
||||||
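The 179 block entries added in the hunk above put the second card at minor 8, which suggests eight minors per card: the whole device plus up to seven partitions. A minimal sketch of that arithmetic follows, assuming the same stride holds for every card. The helper names `mmc_minor` and `mmc_name` are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 8 minors per card, inferred from the listing
 * (minor 0 = mmcblk0, minor 1 = mmcblk0p1, minor 8 = mmcblk1). */
enum { MMC_MINORS_PER_CARD = 8 };

/* Hypothetical helper: minor number for a card index and partition
 * (partition 0 means the whole device). */
static unsigned int mmc_minor(unsigned int card, unsigned int part)
{
	assert(part < MMC_MINORS_PER_CARD);
	return card * MMC_MINORS_PER_CARD + part;
}

/* Hypothetical helper: format the /dev name for a minor number.
 * Returns the number of characters snprintf would have written. */
static int mmc_name(unsigned int minor, char *buf, size_t len)
{
	unsigned int card = minor / MMC_MINORS_PER_CARD;
	unsigned int part = minor % MMC_MINORS_PER_CARD;

	if (part == 0)
		return snprintf(buf, len, "/dev/mmcblk%u", card);
	return snprintf(buf, len, "/dev/mmcblk%up%u", card, part);
}
```

Under this assumption, `mmc_minor(1, 0)` yields 8, matching the `/dev/mmcblk1` entry in the listing.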
@@ -2560,6 +2534,9 @@ Your cooperation is appreciated.
|
|||||||
96 = /dev/usb/hiddev0 1st USB HID device
|
96 = /dev/usb/hiddev0 1st USB HID device
|
||||||
...
|
...
|
||||||
111 = /dev/usb/hiddev15 16th USB HID device
|
111 = /dev/usb/hiddev15 16th USB HID device
|
||||||
|
112 = /dev/usb/auer0 1st auerswald ISDN device
|
||||||
|
...
|
||||||
|
127 = /dev/usb/auer15 16th auerswald ISDN device
|
||||||
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
||||||
...
|
...
|
||||||
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
||||||
@@ -2810,6 +2787,20 @@ Your cooperation is appreciated.
|
|||||||
...
|
...
|
||||||
190 = /dev/ttyUL3 Xilinx uartlite - port 3
|
190 = /dev/ttyUL3 Xilinx uartlite - port 3
|
||||||
191 = /dev/xvc0 Xen virtual console - port 0
|
191 = /dev/xvc0 Xen virtual console - port 0
|
||||||
|
192 = /dev/ttyPZ0 pmac_zilog - port 0
|
||||||
|
...
|
||||||
|
195 = /dev/ttyPZ3 pmac_zilog - port 3
|
||||||
|
196 = /dev/ttyTX0 TX39/49 serial port 0
|
||||||
|
...
|
||||||
|
204 = /dev/ttyTX7 TX39/49 serial port 7
|
||||||
|
205 = /dev/ttySC0 SC26xx serial port 0
|
||||||
|
206 = /dev/ttySC1 SC26xx serial port 1
|
||||||
|
207 = /dev/ttySC2 SC26xx serial port 2
|
||||||
|
208 = /dev/ttySC3 SC26xx serial port 3
|
||||||
|
209 = /dev/ttyMAX0 MAX3100 serial port 0
|
||||||
|
210 = /dev/ttyMAX1 MAX3100 serial port 1
|
||||||
|
211 = /dev/ttyMAX2 MAX3100 serial port 2
|
||||||
|
212 = /dev/ttyMAX3 MAX3100 serial port 3
|
||||||
|
|
||||||
205 char Low-density serial ports (alternate device)
|
205 char Low-density serial ports (alternate device)
|
||||||
0 = /dev/culu0 Callout device for ttyLU0
|
0 = /dev/culu0 Callout device for ttyLU0
|
||||||
@@ -3145,6 +3136,20 @@ Your cooperation is appreciated.
|
|||||||
1 = /dev/blockrom1 Second ROM card's translation layer interface
|
1 = /dev/blockrom1 Second ROM card's translation layer interface
|
||||||
...
|
...
|
||||||
|
|
||||||
|
259 block Block Extended Major
|
||||||
|
Used dynamically to hold additional partition minor
|
||||||
|
numbers and allow large numbers of partitions per device
|
||||||
|
|
||||||
|
259 char FPGA configuration interfaces
|
||||||
|
0 = /dev/icap0 First Xilinx internal configuration
|
||||||
|
1 = /dev/icap1 Second Xilinx internal configuration
|
||||||
|
|
||||||
|
260 char OSD (Object-based-device) SCSI Device
|
||||||
|
0 = /dev/osd0 First OSD Device
|
||||||
|
1 = /dev/osd1 Second OSD Device
|
||||||
|
...
|
||||||
|
255 = /dev/osd255 256th OSD Device
|
||||||
|
|
||||||
**** ADDITIONAL /dev DIRECTORY ENTRIES
|
**** ADDITIONAL /dev DIRECTORY ENTRIES
|
||||||
|
|
||||||
This section details additional entries that should or may exist in
|
This section details additional entries that should or may exist in
|
||||||
|
@@ -62,7 +62,6 @@ aic7*reg_print.c*
|
|||||||
aic7*seq.h*
|
aic7*seq.h*
|
||||||
aicasm
|
aicasm
|
||||||
aicdb.h*
|
aicdb.h*
|
||||||
asm
|
|
||||||
asm-offsets.h
|
asm-offsets.h
|
||||||
asm_offsets.h
|
asm_offsets.h
|
||||||
autoconf.h*
|
autoconf.h*
|
||||||
|
@@ -128,8 +128,10 @@ Attributes
|
|||||||
~~~~~~~~~~
|
~~~~~~~~~~
|
||||||
struct device_attribute {
|
struct device_attribute {
|
||||||
struct attribute attr;
|
struct attribute attr;
|
||||||
ssize_t (*show)(struct device * dev, char * buf, size_t count, loff_t off);
|
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||||
ssize_t (*store)(struct device * dev, const char * buf, size_t count, loff_t off);
|
char *buf);
|
||||||
|
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||||
|
const char *buf, size_t count);
|
||||||
};
|
};
|
||||||
|
|
||||||
Attributes of devices can be exported via drivers using a simple
|
Attributes of devices can be exported via drivers using a simple
|
||||||
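The hunk above changes the documented `show` and `store` callbacks so that both take the `struct device_attribute` pointer. A minimal user-space sketch of callbacks written against the new signatures follows. The `struct device`, `struct attribute`, `SKETCH_PAGE_SIZE` and `level` names are stand-ins invented for illustration; real drivers use the kernel's own definitions and helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>	/* ssize_t */

/* User-space stand-ins (assumptions), NOT the kernel definitions. */
struct attribute { const char *name; unsigned short mode; };
struct device { int level; };

/* Callback layout as shown on the new side of the hunk. */
struct device_attribute {
	struct attribute attr;
	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
			char *buf);
	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
			 const char *buf, size_t count);
};

/* Assumption: the caller hands show() a buffer of this size. */
#define SKETCH_PAGE_SIZE 4096

/* show: format the value into buf, return the number of bytes written. */
static ssize_t level_show(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	(void)attr;
	assert(dev != NULL);
	return snprintf(buf, SKETCH_PAGE_SIZE, "%d\n", dev->level);
}

/* store: parse a NUL-terminated decimal string, return bytes consumed
 * or -1 if no number could be parsed. */
static ssize_t level_store(struct device *dev, struct device_attribute *attr,
			   const char *buf, size_t count)
{
	char *end;
	long v;

	(void)attr;
	assert(dev != NULL);
	v = strtol(buf, &end, 10);
	if (end == buf)
		return -1;
	dev->level = (int)v;
	return (ssize_t)count;
}

static struct device_attribute dev_attr_level = {
	.attr = { .name = "level", .mode = 0644 },
	.show = level_show,
	.store = level_store,
};
```

Passing the attribute pointer lets one pair of callbacks serve several attributes, since the callback can tell which attribute is being accessed. This sketch ignores the pointer because it handles only one.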
|
@@ -1,205 +0,0 @@
|
|||||||
This README escorted the skystar2-driver rewriting procedure. It describes the
|
|
||||||
state of the new flexcop-driver set and some internals are written down here
|
|
||||||
too.
|
|
||||||
|
|
||||||
This document hopefully describes things about the flexcop and its
|
|
||||||
device-offsprings. Goal was to write an easy-to-write and easy-to-read set of
|
|
||||||
drivers based on the skystar2.c and other information.
|
|
||||||
|
|
||||||
Remark: flexcop-pci.c was a copy of skystar2.c, but every line has been
|
|
||||||
touched and rewritten.
|
|
||||||
|
|
||||||
History & News
|
|
||||||
==============
|
|
||||||
2005-04-01 - correct USB ISOC transfers (thanks to Vadim Catana)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
General coding processing
|
|
||||||
=========================
|
|
||||||
|
|
||||||
We should proceed as follows (as long as no one complains):
|
|
||||||
|
|
||||||
0) Think before start writing code!
|
|
||||||
|
|
||||||
1) rewriting the skystar2.c with the help of the flexcop register descriptions
|
|
||||||
and splitting up the files to a pci-bus-part and a flexcop-part.
|
|
||||||
The new driver will be called b2c2-flexcop-pci.ko/b2c2-flexcop-usb.ko for the
|
|
||||||
device-specific part and b2c2-flexcop.ko for the common flexcop-functions.
|
|
||||||
|
|
||||||
2) Search for errors in the leftover of flexcop-pci.c (compare with pluto2.c
|
|
||||||
and other pci drivers)
|
|
||||||
|
|
||||||
3) make some beautification (see 'Improvements when rewriting (refactoring) is
|
|
||||||
done')
|
|
||||||
|
|
||||||
4) Testing the new driver and maybe substitute the skystar2.c with it, to reach
|
|
||||||
a wider tester audience.
|
|
||||||
|
|
||||||
5) creating an usb-bus-part using the already written flexcop code for the pci
|
|
||||||
card.
|
|
||||||
|
|
||||||
Idea: create a kernel-object for the flexcop and export all important
|
|
||||||
functions. This option saves kernel-memory, but maybe a lot of functions have
|
|
||||||
to be exported to kernel namespace.
|
|
||||||
|
|
||||||
|
|
||||||
Current situation
|
|
||||||
=================
|
|
||||||
|
|
||||||
0) Done :)
|
|
||||||
1) Done (some minor issues left)
|
|
||||||
2) Done
|
|
||||||
3) Not ready yet, more information is necessary
|
|
||||||
4) next to be done (see the table below)
|
|
||||||
5) USB driver is working (yes, there are some minor issues)
|
|
||||||
|
|
||||||
What seems to be ready?
|
|
||||||
-----------------------
|
|
||||||
|
|
||||||
1) Rewriting
|
|
||||||
1a) i2c is cut off from the flexcop-pci.c and seems to work
|
|
||||||
1b) moved tuner and demod stuff from flexcop-pci.c to flexcop-tuner-fe.c
|
|
||||||
1c) moved lnb and diseqc stuff from flexcop-pci.c to flexcop-tuner-fe.c
|
|
||||||
1e) eeprom (reading MAC address)
|
|
||||||
1d) sram (no dynamic sll size detection (commented out) (using default as JJ told me))
|
|
||||||
1f) misc. register accesses for reading parameters (e.g. resetting, revision)
|
|
||||||
1g) pid/mac filter (flexcop-hw-filter.c)
|
|
||||||
1i) dvb-stuff initialization in flexcop.c (done)
|
|
||||||
1h) dma stuff (now just using the size-irq, instead of all-together, to be done)
|
|
||||||
1j) remove flexcop initialization from flexcop-pci.c completely (done)
|
|
||||||
1l) use a well working dma IRQ method (done, see 'Known bugs and problems and TODO')
|
|
||||||
1k) cleanup flexcop-files (remove unused EXPORT_SYMBOLs, make static from
|
|
||||||
non-static where possible, moved code to proper places)
|
|
||||||
|
|
||||||
2) Search for errors in the leftover of flexcop-pci.c (partially done)
|
|
||||||
5a) add MAC address reading
|
|
||||||
5c) feeding of ISOC data to the software demux (format of the isochronous data
|
|
||||||
and speed optimization, no real error) (thanks to Vadim Catana)
|
|
||||||
|
|
||||||
What to do in the near future?
|
|
||||||
--------------------------------------
|
|
||||||
(no special order here)
|
|
||||||
|
|
||||||
5) USB driver
|
|
||||||
5b) optimize isoc-transfer (submitting/killing isoc URBs when transfer is starting)
|
|
||||||
|
|
||||||
Testing changes
|
|
||||||
---------------
|
|
||||||
|
|
||||||
O = item is working
P = item is partially working
X = item is not working
N = item does not apply here
<empty field> = item need to be examined

       |                PCI                |                USB
item   | mt352 | nxt2002 | stv0299 | mt312 | mt352 | nxt2002 | stv0299 | mt312
-------+-------+---------+---------+-------+-------+---------+---------+-------
1a)    | O     |         |         |       | N     | N       | N       | N
1b)    | O     |         |         |       |       |         | O       |
1c)    | N     | N       |         |       | N     | N       | O       |
1d)    |                 O                 |                 O
1e)    |                 O                 |                 O
1f)    |                 P
1g)    |                 O
1h)    |                 P                 |
1i)    |                 O                 |                 N
1j)    |                 O                 |                 N
1l)    |                 O                 |                 N
2)     |                 O                 |                 N
5a)    |                 N                 |                 O
5b)*   |                 N                 |
5c)    |                 N                 |                 O
|
|
||||||
|
|
||||||
* - not done yet
|
|
||||||
|
|
||||||
Known bugs and problems and TODO
|
|
||||||
--------------------------------
|
|
||||||
|
|
||||||
1g/h/l) when pid filtering is enabled on the pci card
|
|
||||||
|
|
||||||
DMA usage currently:
|
|
||||||
The DMA is splitted in 2 equal-sized subbuffers. The Flexcop writes to first
|
|
||||||
address and triggers an IRQ when it's full and starts writing to the second
|
|
||||||
address. When the second address is full, the IRQ is triggered again, and
|
|
||||||
the flexcop writes to first address again, and so on.
|
|
||||||
The buffersize of each address is currently 640*188 bytes.
|
|
||||||
|
|
||||||
Problem is, when using hw-pid-filtering and doing some low-bandwidth
|
|
||||||
operation (like scanning) the buffers won't be filled enough to trigger
|
|
||||||
the IRQ. That's why:
|
|
||||||
|
|
||||||
When PID filtering is activated, the timer IRQ is used. Every 1.97 ms the IRQ
|
|
||||||
is triggered. Is the current write address of DMA1 different to the one
|
|
||||||
during the last IRQ, then the data is passed to the demuxer.
|
|
||||||
|
|
||||||
There is an additional DMA-IRQ-method: packet count IRQ. This isn't
|
|
||||||
implemented correctly yet.
|
|
||||||
|
|
||||||
The solution is to disable HW PID filtering, but I don't know how the DVB
|
|
||||||
API software demux behaves on slow systems with 45MBit/s TS.
|
|
||||||
|
|
||||||
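The timer-IRQ approach described in the removed README compares the current DMA write address with the one seen at the previous IRQ. A minimal user-space sketch of that comparison follows, assuming the two 640*188-byte sub-buffers form one contiguous ring and that the write position is available as a byte offset. `dma_poll_state` and `dma_new_bytes` are hypothetical names, not flexcop driver symbols.

```c
#include <assert.h>
#include <stddef.h>

/* From the README: each of the two sub-buffers is 640*188 bytes. */
#define SUBBUF_SIZE (640 * 188)
/* Assumption: both sub-buffers are laid out back to back as one ring. */
#define RING_SIZE   (2 * SUBBUF_SIZE)

struct dma_poll_state {
	size_t last_pos;	/* write offset seen at the previous poll */
};

/* Return how many bytes were written since the previous poll,
 * handling wrap-around, and remember the new position. */
static size_t dma_new_bytes(struct dma_poll_state *st, size_t cur_pos)
{
	size_t n;

	assert(cur_pos < RING_SIZE);
	n = (cur_pos + RING_SIZE - st->last_pos) % RING_SIZE;
	st->last_pos = cur_pos;
	return n;
}
```

A result of zero means the write address did not move, so nothing is passed to the demuxer. One limitation of this sketch is that a complete trip around the ring between two polls is indistinguishable from no progress.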
Solved bugs :)
|
|
||||||
--------------
|
|
||||||
1g) pid-filtering (somehow pid index 4 and 5 (EMM_PID and ECM_PID) aren't
|
|
||||||
working)
|
|
||||||
SOLUTION: also index 0 was affected, because net_translation is done for
|
|
||||||
these indexes by default
|
|
||||||
|
|
||||||
5b) isochronous transfer does only work in the first attempt (for the Sky2PC
|
|
||||||
USB, Air2PC is working) SOLUTION: the flexcop was going asleep and never really
|
|
||||||
woke up again (don't know if this need fixes, see
|
|
||||||
flexcop-fe-tuner.c:flexcop_sleep)
|
|
||||||
|
|
||||||
NEWS: when the driver is loaded and unloaded and loaded again (w/o doing
|
|
||||||
anything in the while the driver is loaded the first time), no transfers take
|
|
||||||
place anymore.
|
|
||||||
|
|
||||||
Improvements when rewriting (refactoring) is done
|
|
||||||
=================================================
|
|
||||||
|
|
||||||
- split sleeping of the flexcop (misc_204.ACPI3_sig = 1;) from lnb_control
|
|
||||||
(enable sleeping for other demods than dvb-s)
|
|
||||||
- add support for CableStar (stv0297 Microtune 203x/ALPS) (almost done, incompatibilities with the Nexus-CA)
|
|
||||||
|
|
||||||
Debugging
|
|
||||||
---------
|
|
||||||
- add verbose debugging to skystar2.c (dump the reg_dw_data) and compare it
|
|
||||||
with this flexcop, this is important, because i2c is now using the
|
|
||||||
flexcop_ibi_value union from flexcop-reg.h (do you have a better idea for
|
|
||||||
that, please tell us so).
|
|
||||||
|
|
||||||
Everything which is identical in the following table, can be put into a common
|
|
||||||
flexcop-module.
|
|
||||||
|
|
||||||
PCI USB
|
|
||||||
-------------------------------------------------------------------------------
|
|
||||||
Different:
|
|
||||||
Register access: accessing IO memory USB control message
|
|
||||||
I2C bus: I2C bus of the FC USB control message
|
|
||||||
Data transfer: DMA isochronous transfer
|
|
||||||
EEPROM transfer: through i2c bus not clear yet
|
|
||||||
|
|
||||||
Identical:
|
|
||||||
Streaming: accessing registers
|
|
||||||
PID Filtering: accessing registers
|
|
||||||
Sram destinations: accessing registers
|
|
||||||
Tuner/Demod: I2C bus
|
|
||||||
DVB-stuff: can be written for common use
|
|
||||||
|
|
||||||
Acknowledgements (just for the rewriting part)
|
|
||||||
================
|
|
||||||
|
|
||||||
Bjarne Steinsbo thought a lot in the first place of the pci part for this code
|
|
||||||
sharing idea.
|
|
||||||
|
|
||||||
Andreas Oberritter for providing a recent PCI initialization template
|
|
||||||
(pluto2.c).
|
|
||||||
|
|
||||||
Boleslaw Ciesielski for pointing out a problem with firmware loader.
|
|
||||||
|
|
||||||
Vadim Catana for correcting the USB transfer.
|
|
||||||
|
|
||||||
comments, critics and ideas to linux-dvb@linuxtv.org.
|
|
@@ -25,7 +25,7 @@ use IO::Handle;
|
|||||||
"tda10046lifeview", "av7110", "dec2000t", "dec2540t",
|
"tda10046lifeview", "av7110", "dec2000t", "dec2540t",
|
||||||
"dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
|
"dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
|
||||||
"or51211", "or51132_qam", "or51132_vsb", "bluebird",
|
"or51211", "or51132_qam", "or51132_vsb", "bluebird",
|
||||||
"opera1");
|
"opera1", "cx231xx", "cx18", "cx23885", "pvrusb2" );
|
||||||
|
|
||||||
# Check args
|
# Check args
|
||||||
syntax() if (scalar(@ARGV) != 1);
|
syntax() if (scalar(@ARGV) != 1);
|
||||||
@@ -37,8 +37,8 @@ for ($i=0; $i < scalar(@components); $i++) {
|
|||||||
$outfile = eval($cid);
|
$outfile = eval($cid);
|
||||||
die $@ if $@;
|
die $@ if $@;
|
||||||
print STDERR <<EOF;
|
print STDERR <<EOF;
|
||||||
Firmware $outfile extracted successfully.
|
Firmware(s) $outfile extracted successfully.
|
||||||
Now copy it to either /usr/lib/hotplug/firmware or /lib/firmware
|
Now copy it(they) to either /usr/lib/hotplug/firmware or /lib/firmware
|
||||||
(depending on configuration of firmware hotplug).
|
(depending on configuration of firmware hotplug).
|
||||||
EOF
|
EOF
|
||||||
exit(0);
|
exit(0);
|
||||||
@@ -345,6 +345,85 @@ sub or51211 {
|
|||||||
$fwfile;
|
$fwfile;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
sub cx231xx {
|
||||||
|
my $fwfile = "v4l-cx231xx-avcore-01.fw";
|
||||||
|
my $url = "http://linuxtv.org/downloads/firmware/$fwfile";
|
||||||
|
my $hash = "7d3bb956dc9df0eafded2b56ba57cc42";
|
||||||
|
|
||||||
|
checkstandard();
|
||||||
|
|
||||||
|
wgetfile($fwfile, $url);
|
||||||
|
verify($fwfile, $hash);
|
||||||
|
|
||||||
|
$fwfile;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub cx18 {
|
||||||
|
my $url = "http://linuxtv.org/downloads/firmware/";
|
||||||
|
|
||||||
|
my %files = (
|
||||||
|
'v4l-cx23418-apu.fw' => '588f081b562f5c653a3db1ad8f65939a',
|
||||||
|
'v4l-cx23418-cpu.fw' => 'b6c7ed64bc44b1a6e0840adaeac39d79',
|
||||||
|
'v4l-cx23418-dig.fw' => '95bc688d3e7599fd5800161e9971cc55',
|
||||||
|
);
|
||||||
|
|
||||||
|
checkstandard();
|
||||||
|
|
||||||
|
my $allfiles;
|
||||||
|
foreach my $fwfile (keys %files) {
|
||||||
|
wgetfile($fwfile, "$url/$fwfile");
|
||||||
|
verify($fwfile, $files{$fwfile});
|
||||||
|
$allfiles .= " $fwfile";
|
||||||
|
}
|
||||||
|
|
||||||
|
$allfiles =~ s/^\s//;
|
||||||
|
|
||||||
|
$allfiles;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub cx23885 {
|
||||||
|
my $url = "http://linuxtv.org/downloads/firmware/";
|
||||||
|
|
||||||
|
my %files = (
|
||||||
|
'v4l-cx23885-avcore-01.fw' => 'a9f8f5d901a7fb42f552e1ee6384f3bb',
|
||||||
|
'v4l-cx23885-enc.fw' => 'a9f8f5d901a7fb42f552e1ee6384f3bb',
|
||||||
|
);
|
||||||
|
|
||||||
|
checkstandard();
|
||||||
|
|
||||||
|
my $allfiles;
|
||||||
|
foreach my $fwfile (keys %files) {
|
||||||
|
wgetfile($fwfile, "$url/$fwfile");
|
||||||
|
verify($fwfile, $files{$fwfile});
|
||||||
|
$allfiles .= " $fwfile";
|
||||||
|
}
|
||||||
|
|
||||||
|
$allfiles =~ s/^\s//;
|
||||||
|
|
||||||
|
$allfiles;
|
||||||
|
}
|
||||||
|
|
||||||
|
sub pvrusb2 {
|
||||||
|
my $url = "http://linuxtv.org/downloads/firmware/";
|
||||||
|
|
||||||
|
my %files = (
|
||||||
|
'v4l-cx25840.fw' => 'dadb79e9904fc8af96e8111d9cb59320',
|
||||||
|
);
|
||||||
|
|
||||||
|
checkstandard();
|
||||||
|
|
||||||
|
my $allfiles;
|
||||||
|
foreach my $fwfile (keys %files) {
|
||||||
|
wgetfile($fwfile, "$url/$fwfile");
|
||||||
|
verify($fwfile, $files{$fwfile});
|
||||||
|
$allfiles .= " $fwfile";
|
||||||
|
}
|
||||||
|
|
||||||
|
$allfiles =~ s/^\s//;
|
||||||
|
|
||||||
|
$allfiles;
|
||||||
|
}
|
||||||
|
|
||||||
sub or51132_qam {
|
sub or51132_qam {
|
||||||
my $fwfile = "dvb-fe-or51132-qam.fw";
|
my $fwfile = "dvb-fe-or51132-qam.fw";
|
||||||
my $url = "http://linuxtv.org/downloads/firmware/$fwfile";
|
my $url = "http://linuxtv.org/downloads/firmware/$fwfile";
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
How to set up the Technisat devices
|
How to set up the Technisat/B2C2 Flexcop devices
|
||||||
===================================
|
================================================
|
||||||
|
|
||||||
1) Find out what device you have
|
1) Find out what device you have
|
||||||
================================
|
================================
|
||||||
@@ -16,54 +16,60 @@ DVB: registering frontend 0 (Conexant CX24123/CX24109)...
|
|||||||
|
|
||||||
If the Technisat is the only TV device in your box get rid of unnecessary modules and check this one:
|
If the Technisat is the only TV device in your box get rid of unnecessary modules and check this one:
|
||||||
"Multimedia devices" => "Customise analog and hybrid tuner modules to build"
|
"Multimedia devices" => "Customise analog and hybrid tuner modules to build"
|
||||||
In this directory uncheck every driver which is activated there.
|
In this directory uncheck every driver which is activated there (except "Simple tuner support" for case 9 only).
|
||||||
|
|
||||||
Then please activate:
|
Then please activate:
|
||||||
2a) Main module part:
|
2a) Main module part:
|
||||||
|
|
||||||
a.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters"
|
a.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters"
|
||||||
b.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Technisat/B2C2 Air/Sky/Cable2PC PCI" in case of a PCI card OR
|
b.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Technisat/B2C2 Air/Sky/Cable2PC PCI" in case of a PCI card
|
||||||
|
OR
|
||||||
c.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Technisat/B2C2 Air/Sky/Cable2PC USB" in case of an USB 1.1 adapter
|
c.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Technisat/B2C2 Air/Sky/Cable2PC USB" in case of an USB 1.1 adapter
|
||||||
d.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Enable debug for the B2C2 FlexCop drivers"
|
d.)"Multimedia devices" => "DVB/ATSC adapters" => "Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters" => "Enable debug for the B2C2 FlexCop drivers"
|
||||||
Notice: d.) is helpful for troubleshooting
|
Notice: d.) is helpful for troubleshooting
|
||||||
|
|
||||||
2b) Frontend module part:
|
2b) Frontend module part:
|
||||||
|
|
||||||
1.) Revision 2.3:
|
1.) SkyStar DVB-S Revision 2.3:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "Zarlink VP310/MT312/ZL10313 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "Zarlink VP310/MT312/ZL10313 based"
|
||||||
|
|
||||||
2.) Revision 2.6:
|
2.) SkyStar DVB-S Revision 2.6:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "ST STV0299 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "ST STV0299 based"
|
||||||
|
|
||||||
3.) Revision 2.7:
|
3.) SkyStar DVB-S Revision 2.7:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "Samsung S5H1420 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "Samsung S5H1420 based"
|
||||||
c.)"Multimedia devices" => "Customise DVB frontends" => "Integrant ITD1000 Zero IF tuner for DVB-S/DSS"
|
c.)"Multimedia devices" => "Customise DVB frontends" => "Integrant ITD1000 Zero IF tuner for DVB-S/DSS"
|
||||||
d.)"Multimedia devices" => "Customise DVB frontends" => "ISL6421 SEC controller"
|
d.)"Multimedia devices" => "Customise DVB frontends" => "ISL6421 SEC controller"
|
||||||
|
|
||||||
4.) Revision 2.8:
|
4.) SkyStar DVB-S Revision 2.8:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "Conexant CX24113/CX24128 tuner for DVB-S/DSS"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "Conexant CX24113/CX24128 tuner for DVB-S/DSS"
|
||||||
c.)"Multimedia devices" => "Customise DVB frontends" => "Conexant CX24123 based"
|
c.)"Multimedia devices" => "Customise DVB frontends" => "Conexant CX24123 based"
|
||||||
d.)"Multimedia devices" => "Customise DVB frontends" => "ISL6421 SEC controller"
|
d.)"Multimedia devices" => "Customise DVB frontends" => "ISL6421 SEC controller"
|
||||||
|
|
||||||
5.) DVB-T card:
|
5.) AirStar DVB-T card:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "Zarlink MT352 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "Zarlink MT352 based"
|
||||||
|
|
||||||
6.) DVB-C card:
|
6.) CableStar DVB-C card:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "ST STV0297 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "ST STV0297 based"
|
||||||
|
|
||||||
7.) ATSC card 1st generation:
|
7.) AirStar ATSC card 1st generation:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "Broadcom BCM3510"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "Broadcom BCM3510"
|
||||||
|
|
||||||
8.) ATSC card 2nd generation:
|
8.) AirStar ATSC card 2nd generation:
|
||||||
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
b.)"Multimedia devices" => "Customise DVB frontends" => "NxtWave Communications NXT2002/NXT2004 based"
|
b.)"Multimedia devices" => "Customise DVB frontends" => "NxtWave Communications NXT2002/NXT2004 based"
|
||||||
c.)"Multimedia devices" => "Customise DVB frontends" => "LG Electronics LGDT3302/LGDT3303 based"
|
c.)"Multimedia devices" => "Customise DVB frontends" => "Generic I2C PLL based tuners"
|
||||||
|
|
||||||
Author: Uwe Bugla <uwe.bugla@gmx.de> December 2008
|
9.) AirStar ATSC card 3rd generation:
|
||||||
|
a.)"Multimedia devices" => "Customise DVB frontends" => "Customise the frontend modules to build"
|
||||||
|
b.)"Multimedia devices" => "Customise DVB frontends" => "LG Electronics LGDT3302/LGDT3303 based"
|
||||||
|
c.)"Multimedia devices" => "Customise analog and hybrid tuner modules to build" => "Simple tuner support"
|
||||||
|
|
||||||
|
Author: Uwe Bugla <uwe.bugla@gmx.de> February 2009
|
||||||
|
240
Documentation/dynamic-debug-howto.txt
Normal file
@@ -0,0 +1,240 @@
|
|||||||
|
|
||||||
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
|
This document describes how to use the dynamic debug (ddebug) feature.
|
||||||
|
|
||||||
|
Dynamic debug is designed to allow you to dynamically enable/disable kernel
|
||||||
|
code to obtain additional kernel information. Currently, if
|
||||||
|
CONFIG_DYNAMIC_DEBUG is set, then all pr_debug()/dev_dbg() calls can be
|
||||||
|
dynamically enabled per-callsite.
|
||||||
|
|
||||||
|
Dynamic debug has even more useful features:
|
||||||
|
|
||||||
|
* Simple query language allows turning on and off debugging statements by
|
||||||
|
matching any combination of:
|
||||||
|
|
||||||
|
- source filename
|
||||||
|
- function name
|
||||||
|
- line number (including ranges of line numbers)
|
||||||
|
- module name
|
||||||
|
- format string
|
||||||
|
|
||||||
|
* Provides a debugfs control file: <debugfs>/dynamic_debug/control which can be
|
||||||
|
read to display the complete list of known debug statements, to help guide you.
|
||||||
|
|
||||||
|
Controlling dynamic debug Behaviour
|
||||||
|
===================================
|
||||||
|
|
||||||
|
The behaviour of pr_debug()/dev_dbg() calls is controlled by writing to a
|
||||||
|
control file in the 'debugfs' filesystem. Thus, you must first mount the debugfs
|
||||||
|
filesystem, in order to make use of this feature. Subsequently, we refer to the
|
||||||
|
control file as: <debugfs>/dynamic_debug/control. For example, if you want to
|
||||||
|
enable printing from source file 'svcsock.c', line 1603 you simply do:
|
||||||
|
|
||||||
|
nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
If you make a mistake with the syntax, the write will fail thus:
|
||||||
|
|
||||||
|
nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
-bash: echo: write error: Invalid argument
|
||||||
|
|
||||||
|
Viewing Dynamic Debug Behaviour
|
||||||
|
===============================
|
||||||
|
|
||||||
|
You can view the currently configured behaviour of all the debug statements
|
||||||
|
via:
|
||||||
|
|
||||||
|
nullarbor:~ # cat <debugfs>/dynamic_debug/control
|
||||||
|
# filename:lineno [module]function flags format
|
||||||
|
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup - "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
|
||||||
|
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init - "\011max_inline : %d\012"
|
||||||
|
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init - "\011sq_depth : %d\012"
|
||||||
|
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init - "\011max_requests : %d\012"
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
You can also apply standard Unix text manipulation filters to this
|
||||||
|
data, e.g.
|
||||||
|
|
||||||
|
nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
|
||||||
|
62
|
||||||
|
|
||||||
|
nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
|
||||||
|
42
|
||||||
|
|
||||||
|
Note in particular that the third column shows the enabled behaviour
|
||||||
|
flags for each debug statement callsite (see below for definitions of the
|
||||||
|
flags). The default value, no extra behaviour enabled, is "-". So
|
||||||
|
you can view all the debug statement callsites with any non-default flags:
|
||||||
|
|
||||||
|
nullarbor:~ # awk '$3 != "-"' <debugfs>/dynamic_debug/control
|
||||||
|
# filename:lineno [module]function flags format
|
||||||
|
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
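The control file layout shown above (filename:lineno, [module]function, flags, format) can also be split into fields programmatically. The following is a minimal sketch, not part of the kernel documentation; the function names and the regular expression are illustrative assumptions based only on the sample output in this section.

```python
import re

# Layout taken from the header line of the control file:
#   filename:lineno [module]function flags format
CONTROL_LINE = re.compile(
    r'^(?P<filename>\S+):(?P<lineno>\d+) '
    r'\[(?P<module>[^\]]+)\](?P<function>\S+) '
    r'(?P<flags>\S+) "(?P<format>.*)"$'
)


def parse_control_line(line):
    """Return the fields of one control-file line as a dict.

    Returns None for the '#' header line and for anything else that
    does not fit the layout above.
    """
    match = CONTROL_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["lineno"] = int(fields["lineno"])
    return fields


def enabled_callsites(lines):
    """Keep only callsites whose flags column differs from the default '-'."""
    parsed = (parse_control_line(line) for line in lines)
    return [site for site in parsed if site is not None and site["flags"] != "-"]
```

This mirrors the awk filter on the third column: enabled_callsites() returns the callsites whose flags field is not "-".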
|
||||||
|
|
||||||
|
|
||||||
|
Command Language Reference
|
||||||
|
==========================
|
||||||
|
|
||||||
|
At the lexical level, a command comprises a sequence of words separated
|
||||||
|
by whitespace characters. Note that newlines are treated as word
|
||||||
|
separators and do *not* end a command or allow multiple commands to
|
||||||
|
be done together. So these are all equivalent:
|
||||||
|
|
||||||
|
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
nullarbor:~ # echo -n ' file svcsock.c line 1603 +p ' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
nullarbor:~ # echo -e 'file svcsock.c\nline 1603 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
Commands are bounded by a write() system call. If you want to do
|
||||||
|
multiple commands you need to do a separate "echo" for each, like:
|
||||||
|
|
||||||
|
nullarbor:~ # echo 'file svcsock.c line 1603 +p' > <debugfs>/dynamic_debug/control ;\
|
||||||
|
> echo 'file svcsock.c line 1563 +p' > <debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
or even like:
|
||||||
|
|
||||||
|
nullarbor:~ # (
|
||||||
|
> echo 'file svcsock.c line 1603 +p' ;\
|
||||||
|
> echo 'file svcsock.c line 1563 +p' ;\
|
||||||
|
> ) > <debugfs>/dynamic_debug/control
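Because each command is bounded by one write() system call, a script that sends several commands has to issue one write per command. The sketch below illustrates that; it is not taken from the kernel sources. The default path assumes debugfs is mounted at /sys/kernel/debug, which the text does not state, and send_commands() is a hypothetical helper name.

```python
import errno
import os

# Assumption: debugfs is mounted at /sys/kernel/debug. The document only
# refers to the mount point as <debugfs>.
DEFAULT_CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def send_commands(commands, control_path=DEFAULT_CONTROL):
    """Send each command with its own write() call.

    Returns the list of commands that were rejected with EINVAL
    ("Invalid argument"), the error shown for a syntax mistake.
    Any other error is re-raised.
    """
    rejected = []
    for command in commands:
        fd = os.open(control_path, os.O_WRONLY)
        try:
            # One os.write() per command, so the commands never share
            # a single write() system call.
            os.write(fd, command.encode("ascii"))
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            rejected.append(command)
        finally:
            os.close(fd)
    return rejected
```

A call such as send_commands(['file svcsock.c line 1603 +p', 'file svcsock.c line 1563 +p']) would then correspond to the two separate echo commands above.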
|
||||||
|
|
||||||
|
At the syntactical level, a command comprises a sequence of match
|
||||||
|
specifications, followed by a flags change specification.
|
||||||
|
|
||||||
|
command ::= match-spec* flags-spec
|
||||||
|
|
||||||
|
The match-specs are used to choose a subset of the known debug statement
|
||||||
|
callsites to which to apply the flags-spec. Think of them as a query
|
||||||
|
with implicit ANDs between each pair. Note that an empty list of
|
||||||
|
match-specs is possible, but is not very useful because it will not
|
||||||
|
match any debug statement callsites.
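The implicit AND between match-specs can be pictured with a small model. This is only a sketch of the semantics as described in this document (including the statement that an empty list matches nothing), not the kernel implementation; the callsite dictionary keys and the pre-parsed (first, last) form of a line range are assumptions made for illustration.

```python
import os


def callsite_matches(callsite, match_specs):
    """Return True if the callsite satisfies every (keyword, value) pair.

    callsite is a dict with the hypothetical keys 'filename', 'lineno',
    'module', 'function' and 'format'. For the 'line' keyword the value
    is an inclusive (first, last) tuple.
    """
    if not match_specs:
        # Per the text: an empty list of match-specs matches no callsite.
        return False
    for keyword, value in match_specs:
        if keyword == "func":
            ok = callsite["function"] == value
        elif keyword == "file":
            # Either the full pathname or the basename may be given.
            filename = callsite["filename"]
            ok = value in (filename, os.path.basename(filename))
        elif keyword == "module":
            ok = callsite["module"] == value
        elif keyword == "format":
            # Substring search: the value need not match the whole format.
            ok = value in callsite["format"]
        elif keyword == "line":
            first, last = value
            ok = first <= callsite["lineno"] <= last
        else:
            raise ValueError("unknown keyword: %r" % keyword)
        if not ok:
            return False  # implicit AND: one failed spec rejects the callsite
    return True
```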
|
||||||
|
|
||||||
|
A match specification comprises a keyword, which controls the attribute
|
||||||
|
of the callsite to be compared, and a value to compare against. Possible
|
||||||
|
keywords are:
|
||||||
|
|
||||||
|
match-spec ::= 'func' string |
|
||||||
|
'file' string |
|
||||||
|
'module' string |
|
||||||
|
'format' string |
|
||||||
|
'line' line-range
|
||||||
|
|
||||||
|
line-range ::= lineno |
|
||||||
|
'-'lineno |
|
||||||
|
lineno'-' |
|
||||||
|
lineno'-'lineno
|
||||||
|
// Note: line-range cannot contain space, e.g.
|
||||||
|
// "1-30" is valid range but "1 - 30" is not.
|
||||||
|
|
||||||
|
lineno ::= unsigned-int
|
||||||
|
|
||||||
|
The meanings of each keyword are:
|
||||||
|
|
||||||
|
func
|
||||||
|
The given string is compared against the function name
|
||||||
|
of each callsite. Example:
|
||||||
|
|
||||||
|
func svc_tcp_accept
|
||||||
|
|
||||||
|
file
|
||||||
|
The given string is compared against either the full
|
||||||
|
pathname or the basename of the source file of each
|
||||||
|
callsite. Examples:
|
||||||
|
|
||||||
|
file svcsock.c
|
||||||
|
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
|
||||||
|
|
||||||
|
module
|
||||||
|
The given string is compared against the module name
|
||||||
|
of each callsite. The module name is the string as
|
||||||
|
seen in "lsmod", i.e. without the directory or the .ko
|
||||||
|
suffix and with '-' changed to '_'. Examples:
|
||||||
|
|
||||||
|
module sunrpc
|
||||||
|
module nfsd
|
||||||
|
|
||||||
|
format
|
||||||
|
The given string is searched for in the dynamic debug format
|
||||||
|
string. Note that the string does not need to match the
|
||||||
|
entire format, only some part. Whitespace and other
|
||||||
|
special characters can be escaped using C octal character
|
||||||
|
escape \ooo notation, e.g. the space character is \040.
|
||||||
|
Alternatively, the string can be enclosed in double quote
|
||||||
|
characters (") or single quote characters (').
|
||||||
|
Examples:
|
||||||
|
|
||||||
|
format svcrdma: // many of the NFS/RDMA server dprintks
|
||||||
|
format readahead // some dprintks in the readahead cache
|
||||||
|
format nfsd:\040SETATTR // one way to match a format with whitespace
|
||||||
|
format "nfsd: SETATTR" // a neater way to match a format with whitespace
|
||||||
|
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
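The lexical rules (whitespace-separated words, optional quoting, octal \ooo escapes) can be sketched as a small tokenizer. This is an illustration of the rules as written here, not the kernel's parser; in particular, decoding escapes only in unquoted words is an assumption, and split_words() is a hypothetical name.

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def split_words(command):
    """Split a command into words.

    Words are separated by any whitespace, including newlines. A word
    that starts with a single or double quote runs to the matching
    quote. In unquoted words, \\ooo octal escapes are decoded.
    """
    words = []
    i, n = 0, len(command)
    while i < n:
        if command[i].isspace():
            i += 1
            continue
        if command[i] in "\"'":
            quote = command[i]
            end = command.find(quote, i + 1)
            if end < 0:
                raise ValueError("unterminated quote")
            words.append(command[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and not command[i].isspace():
                i += 1
            word = command[start:i]
            # Replace each \ooo escape with the character it denotes.
            words.append(
                _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), word)
            )
    return words
```

Under these assumptions the three spellings of the "nfsd: SETATTR" match shown above all yield the same word list.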
|
||||||
|
|
||||||
|
line
|
||||||
|
The given line number or range of line numbers is compared
|
||||||
|
against the line number of each debug statement callsite. A single
|
||||||
|
line number matches the callsite line number exactly. A
|
||||||
|
range of line numbers matches any callsite between the first
|
||||||
|
and last line number inclusive. An empty first number means
|
||||||
|
the first line in the file, an empty second number means the
|
||||||
|
last line in the file. Examples:
|
||||||
|
|
||||||
|
line 1603 // exactly line 1603
|
||||||
|
line 1600-1605 // the six lines from line 1600 to line 1605
|
||||||
|
line -1605 // the 1605 lines from line 1 to line 1605
|
||||||
|
line 1600- // all lines from line 1600 to the end of the file
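The line-range grammar is small enough to sketch directly. The helper below is an illustration, not kernel code; parse_line_range() is a hypothetical name, and returning None for an open upper bound is a choice made for this example.

```python
import re


def parse_line_range(word):
    """Parse a line-range word into an inclusive (first, last) tuple.

    Accepted forms: 'N', 'N-M', '-M' and 'N-'. An empty first number
    means line 1; an empty second number is returned as None, meaning
    "to the end of the file". Spaces are not allowed inside the word.
    """
    match = re.fullmatch(r"([0-9]*)(-?)([0-9]*)", word)
    if match is None or not (match.group(1) or match.group(3)):
        raise ValueError("invalid line range: %r" % word)
    first_text, dash, last_text = match.groups()
    if not dash:
        # A single number matches exactly that line.
        lineno = int(first_text)
        return lineno, lineno
    first = int(first_text) if first_text else 1
    last = int(last_text) if last_text else None
    return first, last
```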
|
||||||
|
|
||||||
|
The flags specification comprises a change operation followed
|
||||||
|
by one or more flag characters. The change operation is one
|
||||||
|
of the characters:
|
||||||
|
|
||||||
|
-
|
||||||
|
remove the given flags
|
||||||
|
|
||||||
|
+
|
||||||
|
add the given flags
|
||||||
|
|
||||||
|
=
|
||||||
|
set the flags to the given flags
|
||||||
|
|
||||||
|
The flags are:
|
||||||
|
|
||||||
|
p
|
||||||
|
Causes a printk() message to be emitted to dmesg
|
||||||
|
|
||||||
|
Note the regexp ^[-+=][scp]+$ matches a flags specification.
|
||||||
|
Note also that there is no convenient syntax to remove all
|
||||||
|
the flags at once, you need to use "-psc".
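The three change operations behave like set operations on the flags of each matched callsite. The sketch below models that under the regexp quoted above; it is an illustration rather than the kernel's code, and it accepts the characters s and c only because that regexp does, even though p is the only flag described in this document.

```python
import re


def apply_flags_spec(current, spec):
    """Return the new flag set after applying a flags specification.

    current is an iterable of flag characters already set on a callsite;
    spec is a string such as '+p', '-p' or '=p'.
    """
    # Same pattern as the regexp quoted in the text: ^[-+=][scp]+$
    if re.fullmatch(r"[-+=][scp]+", spec) is None:
        raise ValueError("invalid flags specification: %r" % spec)
    operation, given = spec[0], set(spec[1:])
    if operation == "+":
        return set(current) | given   # add the given flags
    if operation == "-":
        return set(current) - given   # remove the given flags
    return given                      # '=': set the flags to the given flags
```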
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
// enable the message at line 1603 of file svcsock.c
|
||||||
|
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
// enable all the messages in file svcsock.c
|
||||||
|
nullarbor:~ # echo -n 'file svcsock.c +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
// enable all the messages in the NFS server module
|
||||||
|
nullarbor:~ # echo -n 'module nfsd +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
// enable all 12 messages in the function svc_process()
|
||||||
|
nullarbor:~ # echo -n 'func svc_process +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
// disable all 12 messages in the function svc_process()
|
||||||
|
nullarbor:~ # echo -n 'func svc_process -p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
||||||
|
|
||||||
|
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
|
||||||
|
nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
|
||||||
|
<debugfs>/dynamic_debug/control
|
@@ -11,8 +11,6 @@ aty128fb.txt
|
|||||||
- info on the ATI Rage128 frame buffer driver.
|
- info on the ATI Rage128 frame buffer driver.
|
||||||
cirrusfb.txt
|
cirrusfb.txt
|
||||||
- info on the driver for Cirrus Logic chipsets.
|
- info on the driver for Cirrus Logic chipsets.
|
||||||
cyblafb/
|
|
||||||
- directory with documentation files related to the cyblafb driver.
|
|
||||||
deferred_io.txt
|
deferred_io.txt
|
||||||
- an introduction to deferred IO.
|
- an introduction to deferred IO.
|
||||||
fbcon.txt
|
fbcon.txt
|
||||||
|
@@ -1,13 +0,0 @@
|
|||||||
Bugs
|
|
||||||
====
|
|
||||||
|
|
||||||
I currently don't know of any bug. Please do send reports to:
|
|
||||||
- linux-fbdev-devel@lists.sourceforge.net
|
|
||||||
- Knut_Petersen@t-online.de.
|
|
||||||
|
|
||||||
|
|
||||||
Untested features
|
|
||||||
=================
|
|
||||||
|
|
||||||
All LCD stuff is untested. If it worked in tridentfb, it should work in
|
|
||||||
cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
|
|
@@ -1,7 +0,0 @@
|
|||||||
Thanks to
|
|
||||||
=========
|
|
||||||
* Alan Hourihane, for writing the X trident driver
|
|
||||||
* Jani Monoses, for writing the tridentfb driver
|
|
||||||
* Antonino A. Daplas, for review of the first published
|
|
||||||
version of cyblafb and some code
|
|
||||||
* Jochen Hein, for testing and a helpful bug report
|
|
@@ -1,17 +0,0 @@
|
|||||||
Available Documentation
|
|
||||||
=======================
|
|
||||||
|
|
||||||
Apollo PLE 133 Chipset VT8601A North Bridge Datasheet, Rev. 1.82, October 22,
|
|
||||||
2001, available from VIA:
|
|
||||||
|
|
||||||
http://www.viavpsd.com/product/6/15/DS8601A182.pdf
|
|
||||||
|
|
||||||
The datasheet is incomplete, some registers that need to be programmed are not
|
|
||||||
explained at all and important bits are listed as "reserved". But you really
|
|
||||||
need the datasheet to understand the code. "p. xxx" comments refer to page
|
|
||||||
numbers of this document.
|
|
||||||
|
|
||||||
XFree/XOrg drivers are available and of good quality, looking at the code
|
|
||||||
there is a good idea if the datasheet does not provide enough information
|
|
||||||
or if the datasheet seems to be wrong.
|
|
||||||
|
|
@@ -1,154 +0,0 @@
|
|||||||
#
|
|
||||||
# Sample fb.modes file
|
|
||||||
#
|
|
||||||
# Provides an incomplete list of working modes for
|
|
||||||
# the cyberblade/i1 graphics core.
|
|
||||||
#
|
|
||||||
# The value 4294967256 is used instead of -40. Of course, -40 is not
|
|
||||||
# a really reasonable value, but chip design does not always follow
|
|
||||||
# logic. Believe me, it's ok, and it's the way the BIOS does it.
|
|
||||||
#
|
|
||||||
# fbset requires 4294967256 in fb.modes and -40 as an argument to
|
|
||||||
# the -t parameter. That's also not too reasonable, and it might change
|
|
||||||
# in the future or might even be different for your current version.
|
|
||||||
#
|
|
||||||
|
|
||||||
mode "640x480-50"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 47619 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-60"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 39682 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-70"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 34013 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-72"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 33068 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-75"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 31746 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-80"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 29761 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "640x480-85"
|
|
||||||
geometry 640 480 2048 4096 8
|
|
||||||
timings 28011 4294967256 24 17 0 216 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-50"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 30303 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-60"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 25252 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-70"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 21645 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-72"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 21043 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-75"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 20202 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-80"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 18939 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "800x600-85"
|
|
||||||
geometry 800 600 2048 4096 8
|
|
||||||
timings 17825 96 24 14 0 136 11
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-50"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 19054 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-60"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 15880 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-70"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 13610 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-72"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 13232 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-75"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 12703 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-80"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 11910 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1024x768-85"
|
|
||||||
geometry 1024 768 2048 4096 8
|
|
||||||
timings 11209 144 24 29 0 120 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-50"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 11114 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-60"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 9262 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-70"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 7939 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-72"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 7719 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-75"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 7410 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-80"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 6946 232 16 39 0 160 3
|
|
||||||
endmode
|
|
||||||
|
|
||||||
mode "1280x1024-85"
|
|
||||||
geometry 1280 1024 2048 4096 8
|
|
||||||
timings 6538 232 16 39 0 160 3
|
|
||||||
endmode
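The header comment of the fb.modes sample above says that 4294967256 is used instead of -40. The arithmetic can be checked with a short sketch: 4294967256 is -40 read as an unsigned 32-bit number, and with that reading the 640x480 and 800x600 entries work out to the refresh rates in their mode names. The field order (pixclock, left, right, upper, lower, hslen, vslen) follows the usual fb.modes "timings" layout and is stated here as an assumption; the function names are hypothetical.

```python
def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return value - (1 << 32) if value >= (1 << 31) else value


def refresh_hz(xres, yres, pixclock_ps, left, right, upper, lower, hslen, vslen):
    """Estimate the refresh rate of an fb.modes entry in Hz.

    pixclock_ps is the pixel clock period in picoseconds. The left
    margin is taken as signed, so 4294967256 counts as -40.
    """
    htotal = xres + to_signed32(left) + right + hslen
    vtotal = yres + upper + lower + vslen
    pixel_clock_hz = 1e12 / pixclock_ps
    return pixel_clock_hz / (htotal * vtotal)
```

For mode "640x480-60" this gives a horizontal total of 640 - 40 + 24 + 216 = 840 pixels and a vertical total of 480 + 17 + 0 + 3 = 500 lines.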
|
|
@@ -1,79 +0,0 @@
|
|||||||
Speed
|
|
||||||
=====
|
|
||||||
|
|
||||||
CyBlaFB is much faster than tridentfb and vesafb. Compare the performance data
|
|
||||||
for mode 1280x1024-[8,16,32]@61 Hz.
|
|
||||||
|
|
||||||
Test 1: Cat a file with 2000 lines of 0 characters.
|
|
||||||
Test 2: Cat a file with 2000 lines of 80 characters.
|
|
||||||
Test 3: Cat a file with 2000 lines of 160 characters.
|
|
||||||
|
|
||||||
All values show system time use in seconds, kernel 2.6.12 was used for
|
|
||||||
the measurements. 2.6.13 is a bit slower, 2.6.14 hopefully will include a
|
|
||||||
patch that speeds up kernel bitblitting a lot ( > 20%).
|
|
||||||
|
|
||||||
+-----------+-----------------------------------------------------+
|
|
||||||
| | not accelerated |
|
|
||||||
| TRIDENTFB +-----------------+-----------------+-----------------+
|
|
||||||
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
|
|
||||||
| | noypan | ypan | noypan | ypan | noypan | ypan |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Test 1 | 4.31 | 4.33 | 6.05 | 12.81 | ---- | ---- |
|
|
||||||
| Test 2 | 67.94 | 5.44 | 123.16 | 14.79 | ---- | ---- |
|
|
||||||
| Test 3 | 131.36 | 6.55 | 240.12 | 16.76 | ---- | ---- |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Comments | | | completely bro- |
|
|
||||||
| | | | ken, monitor |
|
|
||||||
| | | | switches off |
|
|
||||||
+-----------+-----------------+-----------------+-----------------+
|
|
||||||
|
|
||||||
|
|
||||||
+-----------+-----------------------------------------------------+
|
|
||||||
| | accelerated |
|
|
||||||
| TRIDENTFB +-----------------+-----------------+-----------------+
|
|
||||||
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
|
|
||||||
| | noypan | ypan | noypan | ypan | noypan | ypan |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Test 1 | ---- | ---- | 20.62 | 1.22 | ---- | ---- |
|
|
||||||
| Test 2 | ---- | ---- | 22.61 | 3.19 | ---- | ---- |
|
|
||||||
| Test 3 | ---- | ---- | 24.59 | 5.16 | ---- | ---- |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Comments | broken, writing | broken, ok only | completely bro- |
|
|
||||||
| | to wrong places | if bgcolor is | ken, monitor |
|
|
||||||
| | on screen + bug | black, bug in | switches off |
|
|
||||||
| | in fillrect() | fillrect() | |
|
|
||||||
+-----------+-----------------+-----------------+-----------------+
|
|
||||||
|
|
||||||
|
|
||||||
+-----------+-----------------------------------------------------+
|
|
||||||
| | not accelerated |
|
|
||||||
| VESAFB +-----------------+-----------------+-----------------+
|
|
||||||
| of 2.6.12 | 8 bpp | 16 bpp | 32 bpp |
|
|
||||||
| | noypan | ypan | noypan | ypan | noypan | ypan |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Test 1 | 4.26 | 3.76 | 5.99 | 7.23 | ---- | ---- |
|
|
||||||
| Test 2 | 65.65 | 4.89 | 120.88 | 9.08 | ---- | ---- |
|
|
||||||
| Test 3 | 126.91 | 5.94 | 235.77 | 11.03 | ---- | ---- |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Comments | vga=0x307 | vga=0x31a | vga=0x31b not |
|
|
||||||
| | fh=80kHz | fh=80kHz | supported by |
|
|
||||||
| | fv=75Hz | fv=75Hz | video BIOS and |
|
|
||||||
| | | | hardware |
|
|
||||||
+-----------+-----------------+-----------------+-----------------+
|
|
||||||
|
|
||||||
|
|
||||||
+-----------+-----------------------------------------------------+
|
|
||||||
| | accelerated |
|
|
||||||
| CYBLAFB +-----------------+-----------------+-----------------+
|
|
||||||
| | 8 bpp | 16 bpp | 32 bpp |
|
|
||||||
| | noypan | ypan | noypan | ypan | noypan | ypan |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Test 1 | 8.02 | 0.23 | 19.04 | 0.61 | 57.12 | 2.74 |
|
|
||||||
| Test 2 | 8.38 | 0.55 | 19.39 | 0.92 | 57.54 | 3.13 |
|
|
||||||
| Test 3 | 8.73 | 0.86 | 19.74 | 1.24 | 57.95 | 3.51 |
|
|
||||||
+-----------+--------+--------+--------+--------+--------+--------+
|
|
||||||
| Comments | | | |
|
|
||||||
| | | | |
|
|
||||||
| | | | |
|
|
||||||
| | | | |
|
|
||||||
+-----------+-----------------+-----------------+-----------------+
|
|
@@ -1,31 +0,0 @@
|
|||||||
TODO / Missing features
|
|
||||||
=======================
|
|
||||||
|
|
||||||
Verify LCD stuff "stretch" and "center" options are
|
|
||||||
completely untested ... this code needs to be
|
|
||||||
verified. As I don't have access to such
|
|
||||||
hardware, please contact me if you are
|
|
||||||
willing to run some tests.
|
|
||||||
|
|
||||||
Interlaced video modes The reason that interlaced
|
|
||||||
modes are disabled is that I do not know
|
|
||||||
the meaning of the vertical interlace
|
|
||||||
parameter. Also the datasheet mentions a
|
|
||||||
bit d8 of a horizontal interlace parameter,
|
|
||||||
but nowhere the lower 8 bits. Please help
|
|
||||||
if you can.
|
|
||||||
|
|
||||||
low-res double scan modes Who needs it?
|
|
||||||
|
|
||||||
accelerated color blitting Who needs it? The console driver does use color
|
|
||||||
blitting for nothing but drawing the penguin,
|
|
||||||
everything else is done using color expanding
|
|
||||||
blitting of 1bpp character bitmaps.
|
|
||||||
|
|
||||||
ioctls Who needs it?
|
|
||||||
|
|
||||||
TV-out Will be done later. Use "vga= " at boot time
|
|
||||||
to set a suitable video mode.
|
|
||||||
|
|
||||||
??? Feel free to contact me if you have any
|
|
||||||
feature requests
|
|
@@ -1,217 +0,0 @@
|
|||||||
CyBlaFB is a framebuffer driver for the Cyberblade/i1 graphics core integrated
|
|
||||||
into the VIA Apollo PLE133 (aka vt8601) north bridge. It is developed and
|
|
||||||
tested using a VIA EPIA 5000 board.
|
|
||||||
|
|
||||||
Cyblafb - compiled into the kernel or as a module?
|
|
||||||
==================================================
|
|
||||||
|
|
||||||
You might compile cyblafb either as a module or compile it permanently into the
|
|
||||||
kernel.
|
|
||||||
|
|
||||||
Unless you have a real reason to do so you should not compile both vesafb and
|
|
||||||
cyblafb permanently into the kernel. It's possible and it helps during the
|
|
||||||
development cycle, but it's useless and will at least block some otherwise
|
|
||||||
useful memory for ordinary users.
|
|
||||||
|
|
||||||
Selecting Modes
|
|
||||||
===============
|
|
||||||
|
|
||||||
Startup Mode
|
|
||||||
============
|
|
||||||
|
|
||||||
First of all, you might use the "vga=???" boot parameter as it is
|
|
||||||
documented in vesafb.txt and svga.txt. Cyblafb will detect the video
|
|
||||||
mode selected and will use the geometry and timings found by
|
|
||||||
inspecting the hardware registers.
|
|
||||||
|
|
||||||
video=cyblafb vga=0x317
|
|
||||||
|
|
||||||
Alternatively you might use a combination of the mode, ref and bpp
|
|
||||||
parameters. If you compiled the driver into the kernel, add something
|
|
||||||
like this to the kernel command line:
|
|
||||||
|
|
||||||
video=cyblafb:1280x1024,bpp=16,ref=50 ...
|
|
||||||
|
|
||||||
If you compiled the driver as a module, the same mode would be
|
|
||||||
selected by the following command:
|
|
||||||
|
|
||||||
modprobe cyblafb mode=1280x1024 bpp=16 ref=50 ...
|
|
||||||
|
|
||||||
None of the modes possible to select as startup modes are affected by
|
|
||||||
the problems described at the end of the next subsection.
|
|
||||||
|
|
||||||
For all startup modes cyblafb chooses a virtual x resolution of 2048,
|
|
||||||
the only exception is mode 1280x1024 in combination with 32 bpp. This
|
|
||||||
allows ywrap scrolling for all those modes if rotation is 0 or 2, and
|
|
||||||
also fast scrolling if rotation is 1 or 3. The default virtual y reso-
|
|
||||||
lution is 4096 for bpp == 8, 2048 for bpp==16 and 1024 for bpp == 32,
|
|
||||||
again with the only exception of 1280x1024 at 32 bpp.
|
|
||||||
|
|
||||||
Please do set your video memory size to 8 MB in the BIOS setup. Other
|
|
||||||
values will work, but performance is decreased for a lot of modes.
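The default virtual y resolutions quoted above are consistent with filling 8 MB of video memory at a virtual x resolution of 2048. The sketch below only checks that arithmetic; whether the driver actually derives the defaults this way is an assumption, and the 1280x1024 at 32 bpp exception is not modelled.

```python
VIDEO_MEMORY_BYTES = 8 * 1024 * 1024  # 8 MB, as recommended in the text
DEFAULT_VXRES = 2048                  # virtual x resolution of startup modes


def max_vyres(bpp, vxres=DEFAULT_VXRES, memory=VIDEO_MEMORY_BYTES):
    """Number of virtual lines that fit in video memory at a given depth."""
    bytes_per_line = vxres * bpp // 8
    return memory // bytes_per_line
```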
|
|
||||||
|
|
||||||
Mode changes using fbset
|
|
||||||
========================
|
|
||||||
|
|
||||||
You might use fbset to change the video mode, see "man fbset". Cyblafb
|
|
||||||
generally does assume that you know what you are doing. But it does
|
|
||||||
some checks, especially those that are needed to prevent you from
|
|
||||||
damaging your hardware.
|
|
||||||
|
|
||||||
- only 8, 16, 24 and 32 bpp video modes are accepted
|
|
||||||
- interlaced video modes are not accepted
|
|
||||||
- double scan video modes are not accepted
|
|
||||||
- if a flat panel is found, cyblafb does not allow you
|
|
||||||
to program a resolution higher than the physical
|
|
||||||
resolution of the flat panel monitor
|
|
||||||
- cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
|
|
||||||
and (currently) 24 bit modes use a doubled vclk internally,
|
|
||||||
the dotclock limit as seen by fbset is 115 MHz for those
|
|
||||||
modes and 230 MHz for 8 and 16 bpp modes.
|
|
||||||
- cyblafb will allow you to select very high resolutions as
|
|
||||||
long as the hardware can be programmed to these modes. The
|
|
||||||
documented limit 1600x1200 is not enforced, but don't expect
|
|
||||||
perfect signal quality.
|
|
||||||
|
|
||||||
Any request that violates the rules given above will be either changed
|
|
||||||
to something the hardware supports or an error value will be returned.
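The vclk rule in the list above can be sketched in C as follows. This is a minimal illustration of the stated limits only, not the driver's actual check. The helper names are hypothetical. The one fact added from outside the text is that fb_var_screeninfo expresses pixclock as a period in picoseconds, which is why the second helper rounds up.

```c
#include <assert.h>

/* Upper vclk bound stated in the text. */
#define VCLK_MAX_MHZ 230u

/* Hypothetical helper: highest dotclock, as fbset sees it, for a depth. */
static unsigned int dotclock_limit_mhz(unsigned int bpp)
{
	assert(bpp == 8 || bpp == 16 || bpp == 24 || bpp == 32);
	/* 24 and 32 bpp modes double vclk internally, halving the limit. */
	return (bpp == 24 || bpp == 32) ? VCLK_MAX_MHZ / 2 : VCLK_MAX_MHZ;
}

/*
 * Hypothetical helper: shortest pixclock period in picoseconds that
 * stays within the limit. Rounds up so the resulting frequency never
 * exceeds the bound.
 */
static unsigned int min_pixclock_ps(unsigned int bpp)
{
	unsigned int mhz = dotclock_limit_mhz(bpp);

	return (1000000u + mhz - 1) / mhz;
}
```

With these definitions the shortest acceptable pixclock is 4348 ps for 8 and 16 bpp modes and 8696 ps for 24 and 32 bpp modes.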
|
|
||||||
|
|
||||||
If you program a virtual y resolution higher than the hardware limit,
|
|
||||||
cyblafb will silently decrease that value to the highest possible
|
|
||||||
value. The same is true for a virtual x resolution that is not
|
|
||||||
supported by the hardware. Cyblafb tries to adapt vyres first because
|
|
||||||
vxres decides if ywrap scrolling is possible or not.
|
|
||||||
|
|
||||||
Attempts to disable acceleration are ignored; I believe that this is
|
|
||||||
safe.
|
|
||||||
|
|
||||||
Some video modes that should work do not work as expected. If you use
|
|
||||||
the standard fb.modes, fbset 640x480-60 will program that mode, but
|
|
||||||
you will see a vertical area, about two characters wide, in which the
|
|
||||||
characters are much darker than the other characters on the screen.
|
|
||||||
Cyblafb does allow that mode to be set, as it does not violate the
|
|
||||||
official specifications. It would need a lot of code to reliably sort
|
|
||||||
out all invalid modes, playing around with the margin values will
|
|
||||||
give a valid mode quickly. And if cyblafb would detect such an invalid
|
|
||||||
mode, should it silently alter the requested values or should it
|
|
||||||
report an error? Both options have some pros and cons. As stated
|
|
||||||
above, none of the startup modes are affected, and if you set
|
|
||||||
verbosity to 1 or higher, cyblafb will print the fbset command that
|
|
||||||
would be needed to program that mode using fbset.
|
|
||||||
|
|
||||||
|
|
||||||
Other Parameters
|
|
||||||
================
|
|
||||||
|
|
||||||
|
|
||||||
crt don't autodetect, assume monitor connected to
|
|
||||||
standard VGA connector
|
|
||||||
|
|
||||||
fp don't autodetect, assume flat panel display
|
|
||||||
connected to flat panel monitor interface
|
|
||||||
|
|
||||||
nativex inform driver about native x resolution of
|
|
||||||
flat panel monitor connected to special
|
|
||||||
interface (should be autodetected)
|
|
||||||
|
|
||||||
stretch stretch image to adapt low resolution modes to
|
|
||||||
higher resolutions of flat panel monitors
|
|
||||||
connected to special interface
|
|
||||||
|
|
||||||
center center image to adapt low resolution modes to
|
|
||||||
higher resolutions of flat panel monitors
|
|
||||||
connected to special interface
|
|
||||||
|
|
||||||
memsize use if autodetected memsize is wrong ...
|
|
||||||
should never be necessary
|
|
||||||
|
|
||||||
nopcirr disable PCI read retry
|
|
||||||
nopciwr disable PCI write retry
|
|
||||||
nopcirb disable PCI read bursts
|
|
||||||
nopciwb disable PCI write bursts
|
|
||||||
|
|
||||||
bpp bpp for specified modes
|
|
||||||
valid values: 8 || 16 || 24 || 32
|
|
||||||
|
|
||||||
ref refresh rate for specified mode
|
|
||||||
valid values: 50 <= ref <= 85
|
|
||||||
|
|
||||||
mode 640x480 or 800x600 or 1024x768 or 1280x1024
|
|
||||||
if not specified, the startup mode will be detected
|
|
||||||
and used, so you might also use the vga=??? parameter
|
|
||||||
described in vesafb.txt. If you do not specify a mode,
|
|
||||||
bpp and ref parameters are ignored.
|
|
||||||
|
|
||||||
verbosity 0 is the default, increase to at least 2 for every
|
|
||||||
bug report!
|
|
||||||
|
|
||||||
Development hints
|
|
||||||
=================
|
|
||||||
|
|
||||||
It's much faster to compile a module and to load the new version after
|
|
||||||
unloading the old module than to compile a new kernel and to reboot. So if you
|
|
||||||
try to work on cyblafb, it might be a good idea to use cyblafb as a module.
|
|
||||||
In real life, fast often means dangerous, and that's also the case here. If
|
|
||||||
you introduce a serious bug when cyblafb is compiled into the kernel, the
|
|
||||||
kernel will lock or oops with a high probability before the file system is
|
|
||||||
mounted, and the danger for your data is low. If you load your own broken version
|
|
||||||
of cyblafb on a running system, the danger for the integrity of the file
|
|
||||||
system is much higher as you might need a hard reset afterwards. Decide
|
|
||||||
yourself.
|
|
||||||
|
|
||||||
Module unloading, the vfb method
|
|
||||||
================================
|
|
||||||
|
|
||||||
If you want to unload/reload cyblafb using the virtual framebuffer, you need
|
|
||||||
to enable vfb support in the kernel first. After that, load the modules as
|
|
||||||
shown below:
|
|
||||||
|
|
||||||
modprobe vfb vfb_enable=1
|
|
||||||
modprobe fbcon
|
|
||||||
modprobe cyblafb
|
|
||||||
fbset -fb /dev/fb1 1280x1024-60 -vyres 2662
|
|
||||||
con2fb /dev/fb1 /dev/tty1
|
|
||||||
...
|
|
||||||
|
|
||||||
If you now made some changes to cyblafb and want to reload it, you might do it
|
|
||||||
as shown below:
|
|
||||||
|
|
||||||
con2fb /dev/fb0 /dev/tty1
|
|
||||||
...
|
|
||||||
rmmod cyblafb
|
|
||||||
modprobe cyblafb
|
|
||||||
con2fb /dev/fb1 /dev/tty1
|
|
||||||
...
|
|
||||||
|
|
||||||
Of course, you might choose another mode, and most certainly you also want to
|
|
||||||
map some other /dev/tty* to the real framebuffer device. You might also choose
|
|
||||||
to compile fbcon as a kernel module or place it permanently in the kernel.
|
|
||||||
|
|
||||||
I do not know of any way to unload fbcon, and fbcon will prevent the
|
|
||||||
framebuffer device loaded first from unloading. [If there is a way, then
|
|
||||||
please add a description here!]
|
|
||||||
|
|
||||||
Module unloading, the vesafb method
|
|
||||||
===================================
|
|
||||||
|
|
||||||
Configure the kernel:
|
|
||||||
|
|
||||||
<*> Support for frame buffer devices
|
|
||||||
[*] VESA VGA graphics support
|
|
||||||
<M> Cyberblade/i1 support
|
|
||||||
|
|
||||||
Add e.g. "video=vesafb:ypan vga=0x307" to the kernel parameters. The ypan
|
|
||||||
parameter is important, choose any vga parameter you like as long as it is
|
|
||||||
a graphics mode.
|
|
||||||
|
|
||||||
After booting, load cyblafb without any mode and bpp parameter and assign
|
|
||||||
cyblafb to individual ttys using con2fb, e.g.:
|
|
||||||
|
|
||||||
modprobe cyblafb
|
|
||||||
con2fb /dev/fb1 /dev/tty1
|
|
||||||
|
|
||||||
Unloading cyblafb works without problems after you assign vesafb to all
|
|
||||||
ttys again, e.g.:
|
|
||||||
|
|
||||||
con2fb /dev/fb0 /dev/tty1
|
|
||||||
rmmod cyblafb
|
|
@@ -1,29 +0,0 @@
|
|||||||
0.62
|
|
||||||
====
|
|
||||||
|
|
||||||
- the vesafb parameter has been removed as I decided to allow the
|
|
||||||
feature without any special parameter.
|
|
||||||
|
|
||||||
- Cyblafb does not use the vga style of panning any longer, now the
|
|
||||||
"right view" register in the graphics engine IO space is used. Without
|
|
||||||
that change it was impossible to use all available memory, and without
|
|
||||||
access to all available memory it is impossible to ywrap.
|
|
||||||
|
|
||||||
- The imageblit function now uses hardware acceleration for all font
|
|
||||||
widths. Hardware blitting across pixel column 2048 is broken in the
|
|
||||||
cyberblade/i1 graphics core, but we work around that hardware bug.
|
|
||||||
|
|
||||||
- modes with vxres != xres are supported now.
|
|
||||||
|
|
||||||
- ywrap scrolling is supported now and the default. This is a big
|
|
||||||
performance gain.
|
|
||||||
|
|
||||||
- default video modes use vyres > yres and vxres > xres to allow
|
|
||||||
almost optimal scrolling speed for normal and rotated screens
|
|
||||||
|
|
||||||
- some features mainly useful for debugging the upper layers of the
|
|
||||||
framebuffer system have been added, have a look at the code
|
|
||||||
|
|
||||||
- fixed: Oops after unloading cyblafb when reading /proc/io*
|
|
||||||
|
|
||||||
- we work around some bugs of the higher framebuffer layers.
|
|
@@ -1,85 +0,0 @@
|
|||||||
I tried the following framebuffer drivers:
|
|
||||||
|
|
||||||
- TRIDENTFB is full of bugs. Acceleration is broken for Blade3D
|
|
||||||
graphics cores like the cyberblade/i1. It claims to support a great
|
|
||||||
number of devices, but documentation for most of these devices is
|
|
||||||
unfortunately not available. There is _no_ reason to use tridentfb
|
|
||||||
for cyberblade/i1 + CRT users. VESAFB is faster, and the one
|
|
||||||
advantage, mode switching, is broken in tridentfb.
|
|
||||||
|
|
||||||
- VESAFB is used by many distributions as a standard. Vesafb does
|
|
||||||
not support mode switching. VESAFB is a bit faster than the working
|
|
||||||
configurations of TRIDENTFB, but it is still too slow, even if you
|
|
||||||
use ypan.
|
|
||||||
|
|
||||||
- EPIAFB (you'll find it on sourceforge) supports the Cyberblade/i1
|
|
||||||
graphics core, but it still has serious bugs and development seems
|
|
||||||
to have stopped. This is the one driver with TV-out support. If you
|
|
||||||
do need this feature, try epiafb.
|
|
||||||
|
|
||||||
None of these drivers was a real option for me.
|
|
||||||
|
|
||||||
I believe that it is unreasonable to change code that announces to support 20
|
|
||||||
devices if I only have more or less sufficient documentation for exactly one
|
|
||||||
of these. The risk of breaking device foo while fixing device bar is too high.
|
|
||||||
|
|
||||||
So I decided to start CyBlaFB as a stripped down tridentfb.
|
|
||||||
|
|
||||||
All code specific to other Trident chips has been removed. After that there
|
|
||||||
were a lot of cosmetic changes to increase the readability of the code. All
|
|
||||||
register names were changed to those mnemonics used in the datasheet. Function
|
|
||||||
and macro names were changed if they hindered easy understanding of the code.
|
|
||||||
|
|
||||||
After that I debugged the code and implemented some new features. I'll try to
|
|
||||||
give a little summary of the main changes:
|
|
||||||
|
|
||||||
- calculation of vertical and horizontal timings was fixed
|
|
||||||
|
|
||||||
- video signal quality has been improved dramatically
|
|
||||||
|
|
||||||
- acceleration:
|
|
||||||
|
|
||||||
- fillrect and copyarea were fixed and reenabled
|
|
||||||
|
|
||||||
- color expanding imageblit was newly implemented, color
|
|
||||||
imageblit (only used to draw the penguin) still uses the
|
|
||||||
generic code.
|
|
||||||
|
|
||||||
- init of the acceleration engine was improved and moved to a
|
|
||||||
place where it really works ...
|
|
||||||
|
|
||||||
- sync function has a timeout now and tries to reset and
|
|
||||||
reinit the accel engine if necessary
|
|
||||||
|
|
||||||
- fewer slow copyarea calls when doing ypan scrolling by using
|
|
||||||
undocumented bit d21 of screen start address stored in
|
|
||||||
CR2B[5]. BIOS does use it also, so this should be safe.
|
|
||||||
|
|
||||||
- cyblafb rejects any attempt to set modes that would cause vclk
|
|
||||||
values above reasonable 230 MHz. 32bit modes use a clock
|
|
||||||
multiplier of 2, so fbset does show the correct values for
|
|
||||||
pixclock but not for vclk in this case. The fbset limit is 115 MHz
|
|
||||||
for 32 bpp modes.
|
|
||||||
|
|
||||||
- cyblafb rejects modes known to be broken or unimplemented (all
|
|
||||||
interlaced modes, all doublescan modes for now)
|
|
||||||
|
|
||||||
- cyblafb now works independently of the video mode in effect at startup
|
|
||||||
time (tridentfb does not init all needed registers to reasonable
|
|
||||||
values)
|
|
||||||
|
|
||||||
- switching between video modes does work reliably now
|
|
||||||
|
|
||||||
- the first video mode now is the one selected on startup using the
|
|
||||||
vga=???? mechanism or any of
|
|
||||||
- 640x480, 800x600, 1024x768, 1280x1024
|
|
||||||
- 8, 16, 24 or 32 bpp
|
|
||||||
- refresh between 50 Hz and 85 Hz, 1 Hz steps (1280x1024-32
|
|
||||||
is limited to 63Hz)
|
|
||||||
|
|
||||||
- pci retry and pci burst mode are settable (try to disable if you
|
|
||||||
experience latency problems)
|
|
||||||
|
|
||||||
- built as a module cyblafb might be unloaded and reloaded using
|
|
||||||
the vfb module and con2vt or might be used together with vesafb
|
|
||||||
|
|
@@ -59,7 +59,8 @@ Accepted options:
|
|||||||
ypan Enable display panning using the VESA protected mode
|
ypan Enable display panning using the VESA protected mode
|
||||||
interface. The visible screen is just a window of the
|
interface. The visible screen is just a window of the
|
||||||
video memory, console scrolling is done by changing the
|
video memory, console scrolling is done by changing the
|
||||||
start of the window. Available on x86 only.
|
start of the window. This option is available on x86
|
||||||
|
only and is the default option on that architecture.
|
||||||
|
|
||||||
ywrap Same as ypan, but assumes your gfx board can wrap-around
|
ywrap Same as ypan, but assumes your gfx board can wrap-around
|
||||||
the video memory (i.e. starts reading from top if it
|
the video memory (i.e. starts reading from top if it
|
||||||
@@ -67,7 +68,7 @@ ywrap Same as ypan, but assumes your gfx board can wrap-around
|
|||||||
Available on x86 only.
|
Available on x86 only.
|
||||||
|
|
||||||
redraw Scroll by redrawing the affected part of the screen, this
|
redraw Scroll by redrawing the affected part of the screen, this
|
||||||
is the safe (and slow) default.
|
is the default on non-x86.
|
||||||
|
|
||||||
(If you're using uvesafb as a module, the above three options are
|
(If you're using uvesafb as a module, the above three options are
|
||||||
used as a parameter of the scroll option, e.g. scroll=ypan.)
|
used as a parameter of the scroll option, e.g. scroll=ypan.)
|
||||||
@@ -182,7 +183,7 @@ from the Video BIOS if you set pixclock to 0 in fb_var_screeninfo.
|
|||||||
|
|
||||||
--
|
--
|
||||||
Michal Januszewski <spock@gentoo.org>
|
Michal Januszewski <spock@gentoo.org>
|
||||||
Last updated: 2007-06-16
|
Last updated: 2009-03-30
|
||||||
|
|
||||||
Documentation of the uvesafb options is loosely based on vesafb.txt.
|
Documentation of the uvesafb options is loosely based on vesafb.txt.
|
||||||
|
|
||||||
|
@@ -6,20 +6,47 @@ be removed from this file.
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: old static regulatory information and ieee80211_regdom module parameter
|
What: The ieee80211_regdom module parameter
|
||||||
When: 2.6.29
|
When: March 2010 / desktop catchup
|
||||||
|
|
||||||
|
Why: This was inherited by the CONFIG_WIRELESS_OLD_REGULATORY code,
|
||||||
|
and currently serves as an option for users to define an
|
||||||
|
ISO / IEC 3166 alpha2 code for the country they are currently
|
||||||
|
present in. Although there are userspace API replacements for this
|
||||||
|
through nl80211 distributions haven't yet caught up with implementing
|
||||||
|
decent alternatives through standard GUIs. Although available as an
|
||||||
|
option through iw or wpa_supplicant it's just a matter of time before
|
||||||
|
distributions pick up good GUI options for this. The ideal solution
|
||||||
|
would actually consist of intelligent designs which would do this for
|
||||||
|
the user automatically even when travelling through different countries.
|
||||||
|
Until then we leave this module parameter as a compromise.
|
||||||
|
|
||||||
|
When userspace improves with reasonable widely-available alternatives for
|
||||||
|
this we will no longer need this module parameter. This entry hopes that
|
||||||
|
by the super-futuristically looking date of "March 2010" we will have
|
||||||
|
such replacements widely available.
|
||||||
|
|
||||||
|
Who: Luis R. Rodriguez <lrodriguez@atheros.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: CONFIG_WIRELESS_OLD_REGULATORY - old static regulatory information
|
||||||
|
When: March 2010 / desktop catchup
|
||||||
|
|
||||||
Why: The old regulatory infrastructure has been replaced with a new one
|
Why: The old regulatory infrastructure has been replaced with a new one
|
||||||
which does not require statically defined regulatory domains. We do
|
which does not require statically defined regulatory domains. We do
|
||||||
not want to keep static regulatory domains in the kernel due to the
|
not want to keep static regulatory domains in the kernel due to the
|
||||||
dynamic nature of regulatory law and localization. We kept around
|
dynamic nature of regulatory law and localization. We kept around
|
||||||
the old static definitions for the regulatory domains of:
|
the old static definitions for the regulatory domains of:
|
||||||
|
|
||||||
* US
|
* US
|
||||||
* JP
|
* JP
|
||||||
* EU
|
* EU
|
||||||
|
|
||||||
and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was
|
and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was
|
||||||
set. We also kept around the ieee80211_regdom module parameter in case
|
set. We will remove this option once the standard Linux desktop catches
|
||||||
some applications were relying on it. Changing regulatory domains
|
up with the new userspace APIs we have implemented.
|
||||||
can now be done instead by using nl80211, as is done with iw.
|
|
||||||
Who: Luis R. Rodriguez <lrodriguez@atheros.com>
|
Who: Luis R. Rodriguez <lrodriguez@atheros.com>
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
@@ -37,10 +64,10 @@ Who: Pavel Machek <pavel@suse.cz>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: Video4Linux API 1 ioctls and video_decoder.h from Video devices.
|
What: Video4Linux API 1 ioctls and from Video devices.
|
||||||
When: December 2008
|
When: July 2009
|
||||||
Files: include/linux/video_decoder.h include/linux/videodev.h
|
Files: include/linux/videodev.h
|
||||||
Check: include/linux/video_decoder.h include/linux/videodev.h
|
Check: include/linux/videodev.h
|
||||||
Why: V4L1 API was replaced by V4L2 API during migration from 2.4 to 2.6
|
Why: V4L1 API was replaced by V4L2 API during migration from 2.4 to 2.6
|
||||||
series. The old API has lots of drawbacks and doesn't provide enough
|
series. The old API has lots of drawbacks and doesn't provide enough
|
||||||
means to work with all video and audio standards. The newer API is
|
means to work with all video and audio standards. The newer API is
|
||||||
@@ -228,8 +255,20 @@ Who: Jan Engelhardt <jengelh@computergmbh.de>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib
|
||||||
|
When: February 2010
|
||||||
|
Why: All callers should use explicit gpio_request()/gpio_free().
|
||||||
|
The autorequest mechanism in gpiolib was provided mostly as a
|
||||||
|
migration aid for legacy GPIO interfaces (for SOC based GPIOs).
|
||||||
|
Those users have now largely migrated. Platforms implementing
|
||||||
|
the GPIO interfaces without using gpiolib will see no changes.
|
||||||
|
Who: David Brownell <dbrownell@users.sourceforge.net>
|
||||||
|
---------------------------
|
||||||
|
|
||||||
What: b43 support for firmware revision < 410
|
What: b43 support for firmware revision < 410
|
||||||
When: July 2008
|
When: The schedule was July 2008, but it was decided that we are going to keep the
|
||||||
|
code as long as there are no major maintenance headaches.
|
||||||
|
So it _could_ be removed _any_ time now, if it conflicts with something new.
|
||||||
Why: The support code for the old firmware hurts code readability/maintainability
|
Why: The support code for the old firmware hurts code readability/maintainability
|
||||||
and slightly hurts runtime performance. Bugfixes for the old firmware
|
and slightly hurts runtime performance. Bugfixes for the old firmware
|
||||||
are not provided by Broadcom anymore.
|
are not provided by Broadcom anymore.
|
||||||
@@ -244,13 +283,6 @@ Who: Glauber Costa <gcosta@redhat.com>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: remove HID compat support
|
|
||||||
When: 2.6.29
|
|
||||||
Why: needed only as a temporary solution until distros fix themselves up
|
|
||||||
Who: Jiri Slaby <jirislaby@gmail.com>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What: print_fn_descriptor_symbol()
|
What: print_fn_descriptor_symbol()
|
||||||
When: October 2009
|
When: October 2009
|
||||||
Why: The %pF vsprintf format provides the same functionality in a
|
Why: The %pF vsprintf format provides the same functionality in a
|
||||||
@@ -282,6 +314,18 @@ Who: Vlad Yasevich <vladislav.yasevich@hp.com>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
What: Ability for non root users to shm_get hugetlb pages based on mlock
|
||||||
|
resource limits
|
||||||
|
When: 2.6.31
|
||||||
|
Why: Non root users need to be part of /proc/sys/vm/hugetlb_shm_group or
|
||||||
|
have CAP_IPC_LOCK to be able to allocate shm segments backed by
|
||||||
|
huge pages. The mlock based rlimit check to allow shm hugetlb is
|
||||||
|
inconsistent with mmap based allocations. Hence it is being
|
||||||
|
deprecated.
|
||||||
|
Who: Ravikiran Thirumalai <kiran@scalex86.org>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
What: CONFIG_THERMAL_HWMON
|
What: CONFIG_THERMAL_HWMON
|
||||||
When: January 2009
|
When: January 2009
|
||||||
Why: This option was introduced just to allow older lm-sensors userspace
|
Why: This option was introduced just to allow older lm-sensors userspace
|
||||||
@@ -310,8 +354,10 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client()
|
What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
|
||||||
When: 2.6.29 (ideally) or 2.6.30 (more likely)
|
i2c_adapter->client_register(), i2c_adapter->client_unregister
|
||||||
|
When: 2.6.30
|
||||||
|
Check: i2c_attach_client i2c_detach_client
|
||||||
Why: Deprecated by the new (standard) device driver binding model. Use
|
Why: Deprecated by the new (standard) device driver binding model. Use
|
||||||
i2c_driver->probe() and ->remove() instead.
|
i2c_driver->probe() and ->remove() instead.
|
||||||
Who: Jean Delvare <khali@linux-fr.org>
|
Who: Jean Delvare <khali@linux-fr.org>
|
||||||
@@ -326,12 +372,59 @@ Who: Hans de Goede <hdegoede@redhat.com>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: SELinux "compat_net" functionality
|
What: sysfs ui for changing p4-clockmod parameters
|
||||||
When: 2.6.30 at the earliest
|
When: September 2009
|
||||||
Why: In 2.6.18 the Secmark concept was introduced to replace the "compat_net"
|
Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and
|
||||||
network access control functionality of SELinux. Secmark offers both
|
e088e4c9cdb618675874becb91b2fd581ee707e6.
|
||||||
better performance and greater flexibility than the "compat_net"
|
Removal is subject to fixing any remaining bugs in ACPI which may
|
||||||
mechanism. Now that the major Linux distributions have moved to
|
cause the thermal throttling not to happen at the right time.
|
||||||
Secmark, it is time to deprecate the older mechanism and start the
|
Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com>
|
||||||
process of removing the old code.
|
|
||||||
Who: Paul Moore <paul.moore@hp.com>
|
-----------------------------
|
||||||
|
|
||||||
|
What: __do_IRQ all in one fits nothing interrupt handler
|
||||||
|
When: 2.6.32
|
||||||
|
Why: __do_IRQ was kept for easy migration to the type flow handlers.
|
||||||
|
More than two years of migration time is enough.
|
||||||
|
Who: Thomas Gleixner <tglx@linutronix.de>
|
||||||
|
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
What: obsolete generic irq defines and typedefs
|
||||||
|
When: 2.6.30
|
||||||
|
Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t)
|
||||||
|
have been kept around for migration reasons. After more than two years
|
||||||
|
it's time to remove them finally
|
||||||
|
Who: Thomas Gleixner <tglx@linutronix.de>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: fakephp and associated sysfs files in /sys/bus/pci/slots/
|
||||||
|
When: 2011
|
||||||
|
Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to
|
||||||
|
represent a machine's physical PCI slots. The change in semantics
|
||||||
|
had userspace implications, as the hotplug core no longer allowed
|
||||||
|
drivers to create multiple sysfs files per physical slot (required
|
||||||
|
for multi-function devices, e.g.). fakephp was seen as a developer's
|
||||||
|
tool only, and its interface changed. Too late, we learned that
|
||||||
|
there were some users of the fakephp interface.
|
||||||
|
|
||||||
|
In 2.6.30, the original fakephp interface was restored. At the same
|
||||||
|
time, the PCI core gained the ability that fakephp provided, namely
|
||||||
|
function-level hot-remove and hot-add.
|
||||||
|
|
||||||
|
Since the PCI core now provides the same functionality, exposed in:
|
||||||
|
|
||||||
|
/sys/bus/pci/rescan
|
||||||
|
/sys/bus/pci/devices/.../remove
|
||||||
|
/sys/bus/pci/devices/.../rescan
|
||||||
|
|
||||||
|
there is no functional reason to maintain fakephp as well.
|
||||||
|
|
||||||
|
We will keep the existing module so that 'modprobe fakephp' will
|
||||||
|
present the old /sys/bus/pci/slots/... interface for compatibility,
|
||||||
|
but users are urged to migrate their applications to the API above.
|
||||||
|
|
||||||
|
After a reasonable transition period, we will remove the legacy
|
||||||
|
fakephp interface.
|
||||||
|
Who: Alex Chiang <achiang@hp.com>
|
||||||
|
@@ -68,6 +68,8 @@ ncpfs.txt
|
|||||||
- info on Novell Netware(tm) filesystem using NCP protocol.
|
- info on Novell Netware(tm) filesystem using NCP protocol.
|
||||||
nfsroot.txt
|
nfsroot.txt
|
||||||
- short guide on setting up a diskless box with NFS root filesystem.
|
- short guide on setting up a diskless box with NFS root filesystem.
|
||||||
|
nilfs2.txt
|
||||||
|
- info and mount options for the NILFS2 filesystem.
|
||||||
ntfs.txt
|
ntfs.txt
|
||||||
- info and mount options for the NTFS filesystem (Windows NT).
|
- info and mount options for the NTFS filesystem (Windows NT).
|
||||||
ocfs2.txt
|
ocfs2.txt
|
||||||
|
@@ -437,8 +437,11 @@ grab BKL for cases when we close a file that had been opened r/w, but that
|
|||||||
can and should be done using the internal locking with smaller critical areas).
|
can and should be done using the internal locking with smaller critical areas).
|
||||||
Current worst offender is ext2_get_block()...
|
Current worst offender is ext2_get_block()...
|
||||||
|
|
||||||
->fasync() is a mess. This area needs a big cleanup and that will probably
|
->fasync() is called without BKL protection, and is responsible for
|
||||||
affect locking.
|
maintaining the FASYNC bit in filp->f_flags. Most instances call
|
||||||
|
fasync_helper(), which does that maintenance, so it's not normally
|
||||||
|
something one needs to worry about. Return values > 0 will be mapped to
|
||||||
|
zero in the VFS layer.
|
||||||
|
|
||||||
->readdir() and ->ioctl() on directories must be changed. Ideally we would
|
->readdir() and ->ioctl() on directories must be changed. Ideally we would
|
||||||
move ->readdir() to inode_operations and use a separate method for directory
|
move ->readdir() to inode_operations and use a separate method for directory
|
||||||
@@ -502,7 +505,7 @@ prototypes:
|
|||||||
void (*open)(struct vm_area_struct*);
|
void (*open)(struct vm_area_struct*);
|
||||||
void (*close)(struct vm_area_struct*);
|
void (*close)(struct vm_area_struct*);
|
||||||
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
||||||
int (*page_mkwrite)(struct vm_area_struct *, struct page *);
|
int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
|
||||||
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
|
658
Documentation/filesystems/caching/backend-api.txt
Normal file
@@ -0,0 +1,658 @@
|
|||||||
|
==========================
|
||||||
|
FS-CACHE CACHE BACKEND API
|
||||||
|
==========================
|
||||||
|
|
||||||
|
The FS-Cache system provides an API by which actual caches can be supplied to
|
||||||
|
FS-Cache for it to then serve out to network filesystems and other interested
|
||||||
|
parties.
|
||||||
|
|
||||||
|
This API is declared in <linux/fscache-cache.h>.
|
||||||
|
|
||||||
|
|
||||||
|
====================================
|
||||||
|
INITIALISING AND REGISTERING A CACHE
|
||||||
|
====================================
|
||||||
|
|
||||||
|
To start off, a cache definition must be initialised and registered for each
|
||||||
|
cache the backend wants to make available. For instance, CacheFS does this in
|
||||||
|
the fill_super() operation on mounting.
|
||||||
|
|
||||||
|
The cache definition (struct fscache_cache) should be initialised by calling:
|
||||||
|
|
||||||
|
void fscache_init_cache(struct fscache_cache *cache,
|
||||||
|
struct fscache_cache_ops *ops,
|
||||||
|
const char *idfmt,
|
||||||
|
...);
|
||||||
|
|
||||||
|
Where:
|
||||||
|
|
||||||
|
(*) "cache" is a pointer to the cache definition;
|
||||||
|
|
||||||
|
(*) "ops" is a pointer to the table of operations that the backend supports on
|
||||||
|
this cache; and
|
||||||
|
|
||||||
|
(*) "idfmt" is a format and printf-style arguments for constructing a label
|
||||||
|
for the cache.
|
||||||
|
|
||||||
|
|
||||||
|
The cache should then be registered with FS-Cache by passing a pointer to the
|
||||||
|
previously initialised cache definition to:
|
||||||
|
|
||||||
|
int fscache_add_cache(struct fscache_cache *cache,
|
||||||
|
struct fscache_object *fsdef,
|
||||||
|
const char *tagname);
|
||||||
|
|
||||||
|
Two extra arguments should also be supplied:
|
||||||
|
|
||||||
|
(*) "fsdef" which should point to the object representation for the FS-Cache
|
||||||
|
master index in this cache. Netfs primary index entries will be created
|
||||||
|
here. FS-Cache keeps the caller's reference to the index object if
|
||||||
|
successful and will release it upon withdrawal of the cache.
|
||||||
|
|
||||||
|
(*) "tagname" which, if given, should be a text string naming this cache. If
|
||||||
|
this is NULL, the identifier will be used instead. For CacheFS, the
|
||||||
|
identifier is set to name the underlying block device and the tag can be
|
||||||
|
supplied by mount.
|
||||||
|
|
||||||
|
This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
|
||||||
|
is already in use. 0 will be returned on success.
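The registration sequence described above (initialise the definition, add the cache, then check for -ENOMEM or -EEXIST) might be used roughly as in the following sketch. It compiles in userspace only because the FS-Cache types and the two functions are replaced by small stand-ins written for this example. The stand-ins, example_ops and example_register() are all hypothetical and do not reflect the kernel's real structure layouts or behaviour beyond what the text states.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* --- Userspace stand-ins, NOT the kernel definitions ---------------- */
struct fscache_cache_ops { const char *name; };
struct fscache_object { int debug_id; };
struct fscache_cache {
	struct fscache_cache_ops *ops;
	char identifier[36];
};

static char registered_tag[36];	/* the mock remembers a single tag only */

static void fscache_init_cache(struct fscache_cache *cache,
			       struct fscache_cache_ops *ops,
			       const char *idfmt, ...)
{
	va_list va;

	memset(cache, 0, sizeof(*cache));
	cache->ops = ops;
	/* Build the cache label from the printf-style arguments. */
	va_start(va, idfmt);
	vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
	va_end(va);
}

static int fscache_add_cache(struct fscache_cache *cache,
			     struct fscache_object *fsdef,
			     const char *tagname)
{
	/* A NULL tag falls back to the identifier, as the text describes. */
	const char *tag = tagname ? tagname : cache->identifier;

	(void)fsdef;
	if (strcmp(registered_tag, tag) == 0)
		return -EEXIST;
	snprintf(registered_tag, sizeof(registered_tag), "%s", tag);
	return 0;
}
/* --- End of stand-ins ------------------------------------------------ */

static struct fscache_cache_ops example_ops = { .name = "example" };

/* Hypothetical backend routine showing the call order and error check. */
static int example_register(struct fscache_cache *cache,
			    struct fscache_object *fsdef,
			    const char *devname, const char *tag)
{
	int ret;

	assert(cache != NULL && devname != NULL);
	fscache_init_cache(cache, &example_ops, "%s", devname);
	ret = fscache_add_cache(cache, fsdef, tag);
	if (ret < 0)
		fprintf(stderr, "cache registration failed: %d\n", ret);
	return ret;
}
```

A real backend would call the two functions from its mount path, as CacheFS does in fill_super(), and would pass the object that represents the FS-Cache master index as fsdef.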
|
||||||
|
|
||||||
|
|
||||||
|
=====================
|
||||||
|
UNREGISTERING A CACHE
|
||||||
|
=====================
|
||||||
|
|
||||||
|
A cache can be withdrawn from the system by calling this function with a
|
||||||
|
pointer to the cache definition:
|
||||||
|
|
||||||
|
void fscache_withdraw_cache(struct fscache_cache *cache);
|
||||||
|
|
||||||
|
In CacheFS's case, this is called by put_super().
|
||||||
|
|
||||||
|
|
||||||
|
========
|
||||||
|
SECURITY
|
||||||
|
========
|
||||||
|
|
||||||
|
The cache methods are executed in one of two contexts:
|
||||||
|
|
||||||
|
(1) that of the userspace process that issued the netfs operation that caused
|
||||||
|
the cache method to be invoked, or
|
||||||
|
|
||||||
|
(2) that of one of the processes in the FS-Cache thread pool.
|
||||||
|
|
||||||
|
In either case, this may not be an appropriate context in which to access the
|
||||||
|
cache.
|
||||||
|
|
||||||
|
The calling process's fsuid, fsgid and SELinux security identities may need to
|
||||||
|
be masqueraded for the duration of the cache driver's access to the cache.
|
||||||
|
This is left to the cache to handle; FS-Cache makes no effort in this regard.
|
||||||
|
|
||||||
|
|
||||||
|
===================================
|
||||||
|
CONTROL AND STATISTICS PRESENTATION
|
||||||
|
===================================
|
||||||
|
|
||||||
|
The cache may present data to the outside world through FS-Cache's interfaces
|
||||||
|
in sysfs and procfs - the former for control and the latter for statistics.
|
||||||
|
|
||||||
|
A sysfs directory called /sys/fs/fscache/<cachetag>/ is created if CONFIG_SYSFS
|
||||||
|
is enabled. This is accessible through the kobject struct fscache_cache::kobj
|
||||||
|
and is for use by the cache as it sees fit.
|
||||||
|
|
||||||
|
|
||||||
|
========================
|
||||||
|
RELEVANT DATA STRUCTURES
|
||||||
|
========================
|
||||||
|
|
||||||
|
(*) Index/Data file FS-Cache representation cookie:
|
||||||
|
|
||||||
|
struct fscache_cookie {
|
||||||
|
struct fscache_object_def *def;
|
||||||
|
struct fscache_netfs *netfs;
|
||||||
|
void *netfs_data;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
The fields that might be of use to the backend describe the object
|
||||||
|
definition, the netfs definition and the netfs's data for this cookie.
|
||||||
|
The object definition contains functions supplied by the netfs for loading
|
||||||
|
and matching index entries; these are required to provide some of the
|
||||||
|
cache operations.
|
||||||
|
|
||||||
|
|
||||||
|
(*) In-cache object representation:
|
||||||
|
|
||||||
|
struct fscache_object {
|
||||||
|
int debug_id;
|
||||||
|
enum {
|
||||||
|
FSCACHE_OBJECT_RECYCLING,
|
||||||
|
...
|
||||||
|
} state;
|
||||||
|
spinlock_t lock;
|
||||||
|
struct fscache_cache *cache;
|
||||||
|
struct fscache_cookie *cookie;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
Structures of this type should be allocated by the cache backend and
|
||||||
|
passed to FS-Cache when requested by the appropriate cache operation. In
|
||||||
|
the case of CacheFS, they're embedded in CacheFS's internal object
|
||||||
|
structures.
|
||||||
|
|
||||||
|
The debug_id is a simple integer that can be used in debugging messages
|
||||||
|
that refer to a particular object. In such a case it should be printed
|
||||||
|
using "OBJ%x" to be consistent with FS-Cache.
|
||||||
|
|
||||||
|
Each object contains a pointer to the cookie that represents the object it
|
||||||
|
is backing. An object should be retired when put_object() is called if it is
|
||||||
|
in state FSCACHE_OBJECT_RECYCLING. The fscache_object struct should be
|
||||||
|
initialised by calling fscache_object_init(object).
|
||||||
|
|
||||||
|
|
||||||
|
(*) FS-Cache operation record:
|
||||||
|
|
||||||
|
struct fscache_operation {
|
||||||
|
atomic_t usage;
|
||||||
|
struct fscache_object *object;
|
||||||
|
unsigned long flags;
|
||||||
|
#define FSCACHE_OP_EXCLUSIVE
|
||||||
|
void (*processor)(struct fscache_operation *op);
|
||||||
|
void (*release)(struct fscache_operation *op);
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
FS-Cache has a pool of threads that it uses to give CPU time to the
|
||||||
|
various asynchronous operations that need to be done as part of driving
|
||||||
|
the cache. These are represented by the above structure. The processor
|
||||||
|
method is called to give the op CPU time, and the release method to get
|
||||||
|
rid of it when its usage count reaches 0.
|
||||||
|
|
||||||
|
An operation can be made exclusive upon an object by setting the
|
||||||
|
appropriate flag before enqueuing it with fscache_enqueue_operation(). If
|
||||||
|
an operation needs more processing time, it should be enqueued again.
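The lifecycle just described (processor called for each slice of CPU time, re-enqueue if more work remains, release called when the usage count reaches 0) might be modelled as below. This is a userspace sketch, not kernel code: model_operation, slices_left, released and the model_*() helpers are hypothetical, and a plain int stands in for atomic_t.

```c
#include <assert.h>

/* Hypothetical userspace model; callback names follow the struct above. */
struct model_operation {
	int usage;		/* stands in for atomic_t usage */
	int slices_left;	/* hypothetical: units of work remaining */
	int released;		/* hypothetical: set once release() has run */
	void (*processor)(struct model_operation *op);
	void (*release)(struct model_operation *op);
};

/* Drop one reference; the release method runs when the count hits 0. */
static void model_put_operation(struct model_operation *op)
{
	assert(op->usage > 0);
	if (--op->usage == 0)
		op->release(op);
}

/* Give the op one slice of CPU time; nonzero means "enqueue me again". */
static int model_run_once(struct model_operation *op)
{
	op->processor(op);
	return op->slices_left > 0;
}

/* Example processor: consumes one unit of work per invocation. */
static void model_processor(struct model_operation *op)
{
	if (op->slices_left > 0)
		op->slices_left--;
}

/* Example release: just records that it was called. */
static void model_release(struct model_operation *op)
{
	op->released = 1;
}
```

The point of the split is that the thread pool never needs to know how much work an operation has left; the operation itself decides whether to ask for another slice.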
|
||||||
|
|
||||||
|
|
||||||
|
(*) FS-Cache retrieval operation record:
|
||||||
|
|
||||||
|
struct fscache_retrieval {
|
||||||
|
struct fscache_operation op;
|
||||||
|
struct address_space *mapping;
|
||||||
|
struct list_head *to_do;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
A structure of this type is allocated by FS-Cache to record retrieval and
|
||||||
|
allocation requests made by the netfs. This struct is then passed to the
|
||||||
|
backend to do the operation. The backend may get extra refs to it by
|
||||||
|
calling fscache_get_retrieval() and refs may be discarded by calling
|
||||||
|
fscache_put_retrieval().
|
||||||
|
|
||||||
|
A retrieval operation can be used by the backend to do retrieval work. To
|
||||||
|
do this, the retrieval->op.processor method pointer should be set
|
||||||
|
appropriately by the backend and fscache_enqueue_retrieval() called to
|
||||||
|
submit it to the thread pool. CacheFiles, for example, uses this to queue
|
||||||
|
page examination when it detects PG_locked being cleared.
|
||||||
|
|
||||||
|
The to_do field is an empty list available for the cache backend to use as
|
||||||
|
it sees fit.
|
||||||
|
|
||||||
|
|
||||||
|
(*) FS-Cache storage operation record:
|
||||||
|
|
||||||
|
struct fscache_storage {
|
||||||
|
struct fscache_operation op;
|
||||||
|
pgoff_t store_limit;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
A structure of this type is allocated by FS-Cache to record outstanding
|
||||||
|
writes to be made. FS-Cache itself enqueues this operation and invokes
|
||||||
|
the write_page() method on the object at appropriate times to effect
|
||||||
|
storage.
|
||||||
|
|
||||||
|
|
||||||
|
================
|
||||||
|
CACHE OPERATIONS
|
||||||
|
================
|
||||||
|
|
||||||
|
The cache backend provides FS-Cache with a table of operations that can be
|
||||||
|
performed on the denizens of the cache. These are held in a structure of type:
|
||||||
|
|
||||||
|
struct fscache_cache_ops
|
||||||
|
|
||||||
|
(*) Name of cache provider [mandatory]:
|
||||||
|
|
||||||
|
const char *name
|
||||||
|
|
||||||
|
This isn't strictly an operation, but should be pointed at a string naming
|
||||||
|
the backend.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Allocate a new object [mandatory]:
|
||||||
|
|
||||||
|
struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
|
||||||
|
struct fscache_cookie *cookie)
|
||||||
|
|
||||||
|
This method is used to allocate a cache object representation to back a
|
||||||
|
cookie in a particular cache. fscache_object_init() should be called on
|
||||||
|
the object to initialise it prior to returning.
|
||||||
|
|
||||||
|
This function may also be used to parse the index key to be used for
|
||||||
|
multiple lookup calls to turn it into a more convenient form. FS-Cache
|
||||||
|
will call the lookup_complete() method to allow the cache to release the
|
||||||
|
form once lookup is complete or aborted.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Look up and create object [mandatory]:
|
||||||
|
|
||||||
|
void (*lookup_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This method is used to look up an object, given that the object is already
|
||||||
|
allocated and attached to the cookie. This should instantiate that object
|
||||||
|
in the cache if it can.
|
||||||
|
|
||||||
|
The method should call fscache_object_lookup_negative() as soon as
|
||||||
|
possible if it determines the object doesn't exist in the cache. If the
|
||||||
|
object is found to exist and the netfs indicates that it is valid then
|
||||||
|
fscache_obtained_object() should be called once the object is in a
|
||||||
|
position to have data stored in it. Similarly, fscache_obtained_object()
|
||||||
|
should also be called once a non-present object has been created.
|
||||||
|
|
||||||
|
If a lookup error occurs, fscache_object_lookup_error() should be called
|
||||||
|
to abort the lookup of that object.
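The notification rules for lookup_object() can be summarised as a small decision table. The sketch below is a userspace model only: the enum, the counters and model_lookup_object() are hypothetical, and each counter merely records where a real backend would call the FS-Cache function named in its comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a backend's own on-disk lookup. */
enum model_lookup_result {
	MODEL_FOUND_VALID,	/* object exists and the netfs accepts it */
	MODEL_NOT_PRESENT,	/* object absent, then created successfully */
	MODEL_LOOKUP_FAILED,	/* a lookup error (for example an I/O error) */
};

/* Counts the notifications a real backend would send to FS-Cache. */
struct model_lookup_log {
	int negative_calls;	/* fscache_object_lookup_negative() */
	int obtained_calls;	/* fscache_obtained_object() */
	int error_calls;	/* fscache_object_lookup_error() */
};

static void model_lookup_object(struct model_lookup_log *log,
				enum model_lookup_result res)
{
	assert(log != NULL);

	switch (res) {
	case MODEL_FOUND_VALID:
		log->obtained_calls++;	/* ready to have data stored in it */
		break;
	case MODEL_NOT_PRESENT:
		log->negative_calls++;	/* as soon as absence is known */
		log->obtained_calls++;	/* once the object has been created */
		break;
	case MODEL_LOOKUP_FAILED:
		log->error_calls++;	/* abort the lookup */
		break;
	}
}
```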
|
||||||
|
|
||||||
|
|
||||||
|
(*) Release lookup data [mandatory]:
|
||||||
|
|
||||||
|
void (*lookup_complete)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This method is called to ask the cache to release any resources it was
|
||||||
|
using to perform a lookup.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Increment object refcount [mandatory]:
|
||||||
|
|
||||||
|
struct fscache_object *(*grab_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This method is called to increment the reference count on an object. It
|
||||||
|
may fail (for instance if the cache is being withdrawn) by returning NULL.
|
||||||
|
It should return the object pointer if successful.
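A grab_object() implementation following this contract could look roughly like the userspace model below. The struct, the cache_withdrawn flag and the function name are hypothetical; a real backend would use its own object structure and atomic reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend's object structure. */
struct model_ref_object {
	int usage;		/* reference count (atomic in a real backend) */
	int cache_withdrawn;	/* hypothetical flag: cache is going away */
};

/*
 * Take a reference unless the cache is being withdrawn.  Returns the
 * object pointer on success and NULL on failure, as described above.
 */
static struct model_ref_object *
model_grab_object(struct model_ref_object *obj)
{
	assert(obj->usage > 0);	/* caller must already hold a reference */

	if (obj->cache_withdrawn)
		return NULL;

	obj->usage++;
	return obj;
}
```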
|
||||||
|
|
||||||
|
|
||||||
|
(*) Lock/Unlock object [mandatory]:
|
||||||
|
|
||||||
|
void (*lock_object)(struct fscache_object *object)
|
||||||
|
void (*unlock_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
These methods are used to exclusively lock an object. It must be possible
|
||||||
|
to schedule with the lock held, so a spinlock isn't sufficient.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Pin/Unpin object [optional]:
|
||||||
|
|
||||||
|
int (*pin_object)(struct fscache_object *object)
|
||||||
|
void (*unpin_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
These methods are used to pin an object into the cache. Once pinned an
|
||||||
|
object cannot be reclaimed to make space. Return -ENOSPC if there's not
|
||||||
|
enough space in the cache to permit this.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Update object [mandatory]:
|
||||||
|
|
||||||
|
int (*update_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This is called to update the index entry for the specified object. The
|
||||||
|
new information should be in object->cookie->netfs_data. This can be
|
||||||
|
obtained by calling object->cookie->def->get_aux()/get_attr().
|
||||||
|
|
||||||
|
|
||||||
|
(*) Discard object [mandatory]:
|
||||||
|
|
||||||
|
void (*drop_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This method is called to indicate that an object has been unbound from its
|
||||||
|
cookie, and that the cache should release the object's resources and
|
||||||
|
retire it if it's in state FSCACHE_OBJECT_RECYCLING.
|
||||||
|
|
||||||
|
This method should not attempt to release any references held by the
|
||||||
|
caller. The caller will invoke the put_object() method as appropriate.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Release object reference [mandatory]:
|
||||||
|
|
||||||
|
void (*put_object)(struct fscache_object *object)
|
||||||
|
|
||||||
|
This method is used to discard a reference to an object. The object may
|
||||||
|
be freed when all the references to it are released.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Synchronise a cache [mandatory]:
|
||||||
|
|
||||||
|
void (*sync)(struct fscache_cache *cache)
|
||||||
|
|
||||||
|
This is called to ask the backend to synchronise a cache with its backing
|
||||||
|
device.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Dissociate a cache [mandatory]:
|
||||||
|
|
||||||
|
void (*dissociate_pages)(struct fscache_cache *cache)
|
||||||
|
|
||||||
|
This is called to ask a cache to perform any page dissociations as part of
|
||||||
|
cache withdrawal.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Notification that the attributes on a netfs file changed [mandatory]:
|
||||||
|
|
||||||
|
int (*attr_changed)(struct fscache_object *object);
|
||||||
|
|
||||||
|
This is called to indicate to the cache that certain attributes on a netfs
|
||||||
|
file have changed (for example the maximum size a file may reach). The
|
||||||
|
cache can read these from the netfs by calling the cookie's get_attr()
|
||||||
|
method.
|
||||||
|
|
||||||
|
The cache may use the file size information to reserve space on the cache.
|
||||||
|
It should also call fscache_set_store_limit() to indicate to FS-Cache the
|
||||||
|
highest byte it's willing to store for an object.
|
||||||
|
|
||||||
|
This method may return a negative error code if an error occurred or the cache object cannot
|
||||||
|
be expanded. In such a case, the object will be withdrawn from service.
|
||||||
|
|
||||||
|
This operation is run asynchronously from FS-Cache's thread pool, and
|
||||||
|
storage and retrieval operations from the netfs are excluded during the
|
||||||
|
execution of this operation.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Reserve cache space for an object's data [optional]:
|
||||||
|
|
||||||
|
int (*reserve_space)(struct fscache_object *object, loff_t size);
|
||||||
|
|
||||||
|
This is called to request that cache space be reserved to hold the data
|
||||||
|
for an object and the metadata used to track it. Zero size should be
|
||||||
|
taken as a request to cancel a reservation.
|
||||||
|
|
||||||
|
This should return 0 if successful, -ENOSPC if there isn't enough space
|
||||||
|
available, or -ENOMEM or -EIO on other errors.
|
||||||
|
|
||||||
|
The reservation may exceed the current size of the object, thus permitting
|
||||||
|
future expansion. If the amount of space consumed by an object would
|
||||||
|
exceed the reservation, it's permitted to refuse requests to allocate
|
||||||
|
pages, but not required. An object may be pruned down to its reservation
|
||||||
|
size if larger than that already.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Request page be read from cache [mandatory]:
|
||||||
|
|
||||||
|
int (*read_or_alloc_page)(struct fscache_retrieval *op,
|
||||||
|
struct page *page,
|
||||||
|
gfp_t gfp)
|
||||||
|
|
||||||
|
This is called to attempt to read a netfs page from the cache, or to
|
||||||
|
reserve a backing block if not. FS-Cache will have done as much checking
|
||||||
|
as it can before calling, but most of the work belongs to the backend.
|
||||||
|
|
||||||
|
If there's no page in the cache, then -ENODATA should be returned if the
|
||||||
|
backend managed to reserve a backing block; -ENOBUFS or -ENOMEM if it
|
||||||
|
didn't.
|
||||||
|
|
||||||
|
If there is suitable data in the cache, then a read operation should be
|
||||||
|
queued and 0 returned. When the read finishes, fscache_end_io() should be
|
||||||
|
called.
|
||||||
|
|
||||||
|
fscache_mark_pages_cached() should be called for the page if any cache
|
||||||
|
metadata is retained. This will indicate to the netfs that the page needs
|
||||||
|
explicit uncaching. This operation takes a pagevec, thus allowing several
|
||||||
|
pages to be marked at once.
|
||||||
|
|
||||||
|
The retrieval record pointed to by op should be retained for each page
|
||||||
|
queued and released when I/O on the page has been formally ended.
|
||||||
|
fscache_get/put_retrieval() are available for this purpose.
|
||||||
|
|
||||||
|
The retrieval record may be used to get CPU time via the FS-Cache thread
|
||||||
|
pool. If this is desired, the op->op.processor should be set to point to
|
||||||
|
the appropriate processing routine, and fscache_enqueue_retrieval() should
|
||||||
|
be called at an appropriate point to request CPU time. For instance, the
|
||||||
|
retrieval routine could be enqueued upon the completion of a disk read.
|
||||||
|
The to_do field in the retrieval record is provided to aid in this.
|
||||||
|
|
||||||
|
If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS
|
||||||
|
returned if possible or fscache_end_io() called with a suitable error
|
||||||
|
code.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Request pages be read from cache [mandatory]:
|
||||||
|
|
||||||
|
int (*read_or_alloc_pages)(struct fscache_retrieval *op,
|
||||||
|
struct list_head *pages,
|
||||||
|
unsigned *nr_pages,
|
||||||
|
gfp_t gfp)
|
||||||
|
|
||||||
|
This is like the read_or_alloc_page() method, except it is handed a list
|
||||||
|
of pages instead of one page. Any pages on which a read operation is
|
||||||
|
started must be added to the page cache for the specified mapping and also
|
||||||
|
to the LRU. Such pages must also be removed from the pages list and
|
||||||
|
*nr_pages decremented per page.
|
||||||
|
|
||||||
|
If there was an error such as -ENOMEM, then that should be returned; else
|
||||||
|
if one or more pages couldn't be read or allocated, then -ENOBUFS should
|
||||||
|
be returned; else if one or more pages couldn't be read, then -ENODATA
|
||||||
|
should be returned. If all the pages are dispatched then 0 should be
|
||||||
|
returned.
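The precedence of return codes for read_or_alloc_pages() is easy to get wrong, so here is a minimal sketch of it. The model_tally struct and model_pages_result() are hypothetical; they assume the backend keeps a running tally while it walks the page list.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical tally kept by a backend while walking the page list. */
struct model_tally {
	int hard_error;		/* negative errno such as -ENOMEM, or 0 */
	unsigned nr_nobufs;	/* pages neither read nor allocated */
	unsigned nr_nodata;	/* pages allocated but with nothing to read */
};

/* Pick the return value in the order the text above prescribes. */
static int model_pages_result(const struct model_tally *t)
{
	assert(t->hard_error <= 0);

	if (t->hard_error)
		return t->hard_error;	/* e.g. -ENOMEM wins outright */
	if (t->nr_nobufs)
		return -ENOBUFS;	/* some page had no backing at all */
	if (t->nr_nodata)
		return -ENODATA;	/* some page must come from the server */
	return 0;			/* every page was dispatched */
}
```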
|
||||||
|
|
||||||
|
|
||||||
|
(*) Request page be allocated in the cache [mandatory]:
|
||||||
|
|
||||||
|
int (*allocate_page)(struct fscache_retrieval *op,
|
||||||
|
struct page *page,
|
||||||
|
gfp_t gfp)
|
||||||
|
|
||||||
|
This is like the read_or_alloc_page() method, except that it shouldn't
|
||||||
|
read from the cache, even if there's data there that could be retrieved.
|
||||||
|
It should, however, set up any internal metadata required such that
|
||||||
|
the write_page() method can write to the cache.
|
||||||
|
|
||||||
|
If there's no backing block available, then -ENOBUFS should be returned
|
||||||
|
(or -ENOMEM if there were other problems). If a block is successfully
|
||||||
|
allocated, then the netfs page should be marked and 0 returned.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Request pages be allocated in the cache [mandatory]:
|
||||||
|
|
||||||
|
int (*allocate_pages)(struct fscache_retrieval *op,
|
||||||
|
struct list_head *pages,
|
||||||
|
unsigned *nr_pages,
|
||||||
|
gfp_t gfp)
|
||||||
|
|
||||||
|
This is a multiple-page version of the allocate_page() method. pages and
|
||||||
|
nr_pages should be treated as for the read_or_alloc_pages() method.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Request page be written to cache [mandatory]:
|
||||||
|
|
||||||
|
int (*write_page)(struct fscache_storage *op,
|
||||||
|
struct page *page);
|
||||||
|
|
||||||
|
This is called to write from a page on which there was a previously
|
||||||
|
successful read_or_alloc_page() call or similar. FS-Cache filters out
|
||||||
|
pages that don't have mappings.
|
||||||
|
|
||||||
|
This method is called asynchronously from the FS-Cache thread pool. It is
|
||||||
|
not required to actually store anything, provided -ENODATA is then
|
||||||
|
returned to the next read of this page.
|
||||||
|
|
||||||
|
If an error occurred, then a negative error code should be returned,
|
||||||
|
otherwise zero should be returned. FS-Cache will take appropriate action
|
||||||
|
in response to an error, such as withdrawing this object.
|
||||||
|
|
||||||
|
If this method returns success then FS-Cache will inform the netfs
|
||||||
|
appropriately.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Discard retained per-page metadata [mandatory]:
|
||||||
|
|
||||||
|
void (*uncache_page)(struct fscache_object *object, struct page *page)
|
||||||
|
|
||||||
|
This is called when a netfs page is being evicted from the pagecache. The
|
||||||
|
cache backend should tear down any internal representation or tracking it
|
||||||
|
maintains for this page.
|
||||||
|
|
||||||
|
|
||||||
|
==================
|
||||||
|
FS-CACHE UTILITIES
|
||||||
|
==================
|
||||||
|
|
||||||
|
FS-Cache provides some utilities that a cache backend may make use of:
|
||||||
|
|
||||||
|
(*) Note occurrence of an I/O error in a cache:
|
||||||
|
|
||||||
|
void fscache_io_error(struct fscache_cache *cache)
|
||||||
|
|
||||||
|
This tells FS-Cache that an I/O error occurred in the cache. After this
|
||||||
|
has been called, only resource dissociation operations (object and page
|
||||||
|
release) will be passed from the netfs to the cache backend for the
|
||||||
|
specified cache.
|
||||||
|
|
||||||
|
This does not actually withdraw the cache. That must be done separately.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Invoke the retrieval I/O completion function:
|
||||||
|
|
||||||
|
void fscache_end_io(struct fscache_retrieval *op, struct page *page,
|
||||||
|
int error);
|
||||||
|
|
||||||
|
This is called to note the end of an attempt to retrieve a page. The
|
||||||
|
error value should be 0 if successful and an error otherwise.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Set highest store limit:
|
||||||
|
|
||||||
|
void fscache_set_store_limit(struct fscache_object *object,
|
||||||
|
loff_t i_size);
|
||||||
|
|
||||||
|
This sets the limit FS-Cache imposes on the highest byte it's willing to
|
||||||
|
try and store for a netfs. Any page over this limit is automatically
|
||||||
|
rejected by fscache_read_or_alloc_page() and co with -ENOBUFS.
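To illustrate how a byte limit turns into a per-page decision, the sketch below models the check in userspace. It rests on two assumptions that the text does not state: pages are 4 KiB, and the byte limit is rounded up to a whole number of pages so that a partial final page is still storable. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Number of pages needed to hold i_size bytes (rounded up). */
static unsigned long long model_store_limit(long long i_size)
{
	assert(i_size >= 0);
	return ((unsigned long long)i_size + MODEL_PAGE_SIZE - 1)
		>> MODEL_PAGE_SHIFT;
}

/* Reject any page whose index lies at or beyond the limit. */
static int model_check_page(unsigned long long index, long long i_size)
{
	return index >= model_store_limit(i_size) ? -ENOBUFS : 0;
}
```

With a limit of 4097 bytes, pages 0 and 1 are accepted and page 2 is rejected; with a limit of 0, every page is rejected.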
|
||||||
|
|
||||||
|
|
||||||
|
(*) Mark pages as being cached:
|
||||||
|
|
||||||
|
void fscache_mark_pages_cached(struct fscache_retrieval *op,
|
||||||
|
struct pagevec *pagevec);
|
||||||
|
|
||||||
|
This marks a set of pages as being cached. After this has been called,
|
||||||
|
the netfs must call fscache_uncache_page() to unmark the pages.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Perform coherency check on an object:
|
||||||
|
|
||||||
|
enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
|
||||||
|
const void *data,
|
||||||
|
uint16_t datalen);
|
||||||
|
|
||||||
|
This asks the netfs to perform a coherency check on an object that has
|
||||||
|
just been looked up. The cookie attached to the object will determine the
|
||||||
|
netfs to use. data and datalen should specify where the auxiliary data
|
||||||
|
retrieved from the cache can be found.
|
||||||
|
|
||||||
|
One of three values will be returned:
|
||||||
|
|
||||||
|
(*) FSCACHE_CHECKAUX_OKAY
|
||||||
|
|
||||||
|
The coherency data indicates the object is valid as is.
|
||||||
|
|
||||||
|
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||||
|
|
||||||
|
The coherency data needs updating, but otherwise the object is
|
||||||
|
valid.
|
||||||
|
|
||||||
|
(*) FSCACHE_CHECKAUX_OBSOLETE
|
||||||
|
|
||||||
|
The coherency data indicates that the object is obsolete and should
|
||||||
|
be discarded.
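A backend might dispatch on the three results along the following lines. This is a sketch only: the enums are local mirrors defined for illustration, and the action names describe one plausible reaction rather than what any particular backend does.

```c
#include <assert.h>

/* Local mirror of the three results listed above, for illustration. */
enum model_checkaux {
	MODEL_CHECKAUX_OKAY,
	MODEL_CHECKAUX_NEEDS_UPDATE,
	MODEL_CHECKAUX_OBSOLETE,
};

/* Hypothetical actions a backend could take after the check. */
enum model_action {
	MODEL_USE_AS_IS,		/* keep object and coherency data */
	MODEL_USE_AND_REWRITE_AUX,	/* keep object, refresh coherency data */
	MODEL_DISCARD,			/* throw the cached object away */
};

static enum model_action model_handle_checkaux(enum model_checkaux result)
{
	switch (result) {
	case MODEL_CHECKAUX_OKAY:
		return MODEL_USE_AS_IS;
	case MODEL_CHECKAUX_NEEDS_UPDATE:
		return MODEL_USE_AND_REWRITE_AUX;
	case MODEL_CHECKAUX_OBSOLETE:
		return MODEL_DISCARD;
	}

	/* Not reached for valid input; fail safe by discarding. */
	assert(!"unknown coherency result");
	return MODEL_DISCARD;
}
```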
|
||||||
|
|
||||||
|
|
||||||
|
(*) Initialise a freshly allocated object:
|
||||||
|
|
||||||
|
void fscache_object_init(struct fscache_object *object);
|
||||||
|
|
||||||
|
This initialises all the fields in an object representation.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Indicate the destruction of an object:
|
||||||
|
|
||||||
|
void fscache_object_destroyed(struct fscache_cache *cache);
|
||||||
|
|
||||||
|
This must be called to inform FS-Cache that an object that belonged to a
|
||||||
|
cache has been destroyed and deallocated. This will allow continuation
|
||||||
|
of the cache withdrawal process when it is stopped pending destruction of
|
||||||
|
all the objects.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Indicate negative lookup on an object:
|
||||||
|
|
||||||
|
void fscache_object_lookup_negative(struct fscache_object *object);
|
||||||
|
|
||||||
|
This is called to indicate to FS-Cache that a lookup process for an object
|
||||||
|
found a negative result.
|
||||||
|
|
||||||
|
This changes the state of an object to permit reads pending on lookup
|
||||||
|
completion to go off and start fetching data from the netfs server as it's
|
||||||
|
known at this point that there can't be any data in the cache.
|
||||||
|
|
||||||
|
This may be called multiple times on an object. Only the first call is
|
||||||
|
significant - all subsequent calls are ignored.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Indicate an object has been obtained:
|
||||||
|
|
||||||
|
void fscache_obtained_object(struct fscache_object *object);
|
||||||
|
|
||||||
|
This is called to indicate to FS-Cache that a lookup process for an object
|
||||||
|
produced a positive result, or that an object was created. This should
|
||||||
|
only be called once for any particular object.
|
||||||
|
|
||||||
|
This changes the state of an object to indicate:
|
||||||
|
|
||||||
|
(1) if no call to fscache_object_lookup_negative() has been made on
|
||||||
|
this object, that there may be data available, and that reads can
|
||||||
|
now go and look for it; and
|
||||||
|
|
||||||
|
(2) that writes may now proceed against this object.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Indicate that object lookup failed:
|
||||||
|
|
||||||
|
void fscache_object_lookup_error(struct fscache_object *object);
|
||||||
|
|
||||||
|
This marks an object as having encountered a fatal error (usually EIO)
|
||||||
|
and causes it to move into a state whereby it will be withdrawn as soon
|
||||||
|
as possible.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Get and release references on a retrieval record:
|
||||||
|
|
||||||
|
void fscache_get_retrieval(struct fscache_retrieval *op);
|
||||||
|
void fscache_put_retrieval(struct fscache_retrieval *op);
|
||||||
|
|
||||||
|
These two functions are used to retain a retrieval record whilst doing
|
||||||
|
asynchronous data retrieval and block allocation.
|
||||||
|
|
||||||
|
|
||||||
|
(*) Enqueue a retrieval record for processing:
|
||||||
|
|
||||||
|
void fscache_enqueue_retrieval(struct fscache_retrieval *op);
|
||||||
|
|
||||||
|
This enqueues a retrieval record for processing by the FS-Cache thread
|
||||||
|
pool. One of the threads in the pool will invoke the retrieval record's
|
||||||
|
op->op.processor callback function. This function may be called from
|
||||||
|
within the callback function.
|
||||||
|
|
||||||
|
|
||||||
|
(*) List of object state names:
|
||||||
|
|
||||||
|
const char *fscache_object_states[];
|
||||||
|
|
||||||
|
For debugging purposes, this may be used to turn the state that an object
|
||||||
|
is in into a text string for display purposes.
|
501
Documentation/filesystems/caching/cachefiles.txt
Normal file
@@ -0,0 +1,501 @@
|
|||||||
|
===============================================
|
||||||
|
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
||||||
|
===============================================
|
||||||
|
|
||||||
|
Contents:
|
||||||
|
|
||||||
|
(*) Overview.
|
||||||
|
|
||||||
|
(*) Requirements.
|
||||||
|
|
||||||
|
(*) Configuration.
|
||||||
|
|
||||||
|
(*) Starting the cache.
|
||||||
|
|
||||||
|
(*) Things to avoid.
|
||||||
|
|
||||||
|
(*) Cache culling.
|
||||||
|
|
||||||
|
(*) Cache structure.
|
||||||
|
|
||||||
|
(*) Security model and SELinux.
|
||||||
|
|
||||||
|
(*) A note on security.
|
||||||
|
|
||||||
|
(*) Statistical information.
|
||||||
|
|
||||||
|
(*) Debugging.
|
||||||
|
|
||||||
|
|
||||||
|
========
|
||||||
|
OVERVIEW
|
||||||
|
========
|
||||||
|
|
||||||
|
CacheFiles is a caching backend that uses a directory on an already mounted
|
||||||
|
filesystem of a local type (such as Ext3) as its cache.
|
||||||
|
|
||||||
|
CacheFiles uses a userspace daemon to do some of the cache management - such as
|
||||||
|
reaping stale nodes and culling. This is called cachefilesd and lives in
|
||||||
|
/sbin.
|
||||||
|
|
||||||
|
The filesystem and data integrity of the cache are only as good as those of the
|
||||||
|
filesystem providing the backing services. Note that CacheFiles does not
|
||||||
|
attempt to journal anything since the journalling interfaces of the various
|
||||||
|
filesystems are very specific in nature.
|
||||||
|
|
||||||
|
CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
|
||||||
|
to communicate with the daemon. Only one thing may have this open at once,
|
||||||
|
and whilst it is open, a cache is at least partially in existence. The daemon
|
||||||
|
opens this and sends commands down it to control the cache.
|
||||||
|
|
||||||
|
CacheFiles is currently limited to a single cache.
|
||||||
|
|
||||||
|
CacheFiles attempts to maintain at least a certain percentage of free space on
|
||||||
|
the filesystem, shrinking the cache by culling the objects it contains to make
|
||||||
|
space if necessary - see the "Cache Culling" section. This means it can be
|
||||||
|
placed on the same medium as a live set of data, and will expand to make use of
|
||||||
|
spare space and automatically contract when the set of data requires more
|
||||||
|
space.
|
||||||
|
|
||||||
|
|
||||||
|
============
|
||||||
|
REQUIREMENTS
|
||||||
|
============
|
||||||
|
|
||||||
|
The use of CacheFiles and its daemon requires the following features to be
|
||||||
|
available in the system and in the cache filesystem:
|
||||||
|
|
||||||
|
- dnotify.
|
||||||
|
|
||||||
|
- extended attributes (xattrs).
|
||||||
|
|
||||||
|
- openat() and friends.
|
||||||
|
|
||||||
|
- bmap() support on files in the filesystem (FIBMAP ioctl).
|
||||||
|
|
||||||
|
- The use of bmap() to detect a partial page at the end of the file.
|
||||||
|
|
||||||
|
It is strongly recommended that the "dir_index" option is enabled on Ext3
|
||||||
|
filesystems being used as a cache.
|
||||||
|
|
||||||
|
|
||||||
|
=============
|
||||||
|
CONFIGURATION
|
||||||
|
=============
|
||||||
|
|
||||||
|
The cache is configured by a script in /etc/cachefilesd.conf. These commands
|
||||||
|
set up the cache ready for use. The following script commands are available:
|
||||||
|
|
||||||
|
(*) brun <N>%
|
||||||
|
(*) bcull <N>%
|
||||||
|
(*) bstop <N>%
|
||||||
|
(*) frun <N>%
|
||||||
|
(*) fcull <N>%
|
||||||
|
(*) fstop <N>%
|
||||||
|
|
||||||
|
Configure the culling limits. Optional. See the section on culling.
|
||||||
|
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
|
||||||
|
|
||||||
|
The commands beginning with a 'b' are file space (block) limits, those
|
||||||
|
beginning with an 'f' are file count limits.
|
||||||
|
|
||||||
|
(*) dir <path>
|
||||||
|
|
||||||
|
Specify the directory containing the root of the cache. Mandatory.
|
||||||
|
|
||||||
|
(*) tag <name>
|
||||||
|
|
||||||
|
Specify a tag to FS-Cache to use in distinguishing multiple caches.
|
||||||
|
Optional. The default is "CacheFiles".
|
||||||
|
|
||||||
|
(*) debug <mask>
|
||||||
|
|
||||||
|
Specify a numeric bitmask to control debugging in the kernel module.
|
||||||
|
Optional. The default is zero (all off). The following values can be
|
||||||
|
OR'd into the mask to collect various information:
|
||||||
|
|
||||||
|
1 Turn on trace of function entry (_enter() macros)
|
||||||
|
2 Turn on trace of function exit (_leave() macros)
|
||||||
|
4 Turn on trace of internal debug points (_debug())
|
||||||
|
|
||||||
|
This mask can also be set through sysfs, eg:
|
||||||
|
|
||||||
|
echo 5 >/sys/module/cachefiles/parameters/debug
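The value 5 in the example is simply the OR of two of the bits listed above (function entry and internal debug points). A minimal sketch of composing the mask follows; the macro and function names are hypothetical, and only the bit values come from the table.

```c
#include <assert.h>

/* Bit values from the table above; the macro names are hypothetical. */
#define MODEL_DEBUG_ENTER	1u	/* trace function entry */
#define MODEL_DEBUG_LEAVE	2u	/* trace function exit */
#define MODEL_DEBUG_POINTS	4u	/* trace internal debug points */

/* Build the numeric mask from three on/off choices. */
static unsigned model_debug_mask(int enter, int leave, int points)
{
	unsigned mask = 0;

	if (enter)
		mask |= MODEL_DEBUG_ENTER;
	if (leave)
		mask |= MODEL_DEBUG_LEAVE;
	if (points)
		mask |= MODEL_DEBUG_POINTS;

	assert(mask <= 7u);	/* only three bits are defined */
	return mask;
}
```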
|
||||||
|
|
||||||
|
|
||||||
|
==================
|
||||||
|
STARTING THE CACHE
|
||||||
|
==================
|
||||||
|
|
||||||
|
The cache is started by running the daemon. The daemon opens the cache device,
|
||||||
|
configures the cache and tells it to begin caching. At that point the cache
|
||||||
|
binds to fscache and the cache becomes live.
|
||||||
|
|
||||||
|
The daemon is run as follows:
|
||||||
|
|
||||||
|
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
|
||||||
|
|
||||||
|
The flags are:
|
||||||
|
|
||||||
|
(*) -d
|
||||||
|
|
||||||
|
Increase the debugging level. This can be specified multiple times and
|
||||||
|
is cumulative with itself.
|
||||||
|
|
||||||
|
(*) -s
|
||||||
|
|
||||||
|
Send messages to stderr instead of syslog.
|
||||||
|
|
||||||
|
(*) -n
|
||||||
|
|
||||||
|
Don't daemonise and go into background.
|
||||||
|
|
||||||
|
(*) -f <configfile>
|
||||||
|
|
||||||
|
Use an alternative configuration file rather than the default one.
|
||||||
|
|
||||||
|
|
||||||
|
===============
|
||||||
|
THINGS TO AVOID
|
||||||
|
===============
|
||||||
|
|
||||||
|
Do not mount other things within the cache as this will cause problems. The
|
||||||
|
kernel module contains its own very cut-down path walking facility that ignores
|
||||||
|
mountpoints, but the daemon can't avoid them.
|
||||||
|
|
||||||
|
Do not create, rename or unlink files and directories in the cache whilst the
|
||||||
|
cache is active, as this may cause the state to become uncertain.
|
||||||
|
|
||||||
|
Renaming files in the cache might make objects appear to be other objects (the
|
||||||
|
filename is part of the lookup key).
|
||||||
|
|
||||||
|
Do not change or remove the extended attributes attached to cache files by the
|
||||||
|
cache as this will cause the cache state management to get confused.
|
||||||
|
|
||||||
|
Do not create files or directories in the cache, lest the cache get confused or
|
||||||
|
serve incorrect data.
|
||||||
|
|
||||||
|
Do not chmod files in the cache. The module creates things with minimal
|
||||||
|
permissions to prevent random users being able to access them directly.
|
||||||
|
|
||||||
|
|
||||||
|
=============
|
||||||
|
CACHE CULLING
|
||||||
|
=============
|
||||||
|
|
||||||
|
The cache may need culling occasionally to make space. This involves
|
||||||
|
discarding objects from the cache that have been used less recently than
|
||||||
|
anything else. Culling is based on the access time of data objects. Empty
|
||||||
|
directories are culled if not in use.
|
||||||
|
|
||||||
|
Cache culling is done on the basis of the percentage of blocks and the
|
||||||
|
percentage of files available in the underlying filesystem. There are six
|
||||||
|
"limits":
|
||||||
|
|
||||||
|
(*) brun
|
||||||
|
(*) frun
|
||||||
|
|
||||||
|
If the amount of free space and the number of available files in the cache
|
||||||
|
rises above both these limits, then culling is turned off.
|
||||||
|
|
||||||
|
(*) bcull
|
||||||
|
(*) fcull
|
||||||
|
|
||||||
|
If the amount of available space or the number of available files in the
|
||||||
|
cache falls below either of these limits, then culling is started.
|
||||||
|
|
||||||
|
(*) bstop
|
||||||
|
(*) fstop
|
||||||
|
|
||||||
|
If the amount of available space or the number of available files in the
|
||||||
|
cache falls below either of these limits, then no further allocation of
|
||||||
|
disk space or files is permitted until culling has raised things above
|
||||||
|
these limits again.
|
||||||
|
|
||||||
|
These must be configured thusly:
|
||||||
|
|
||||||
|
0 <= bstop < bcull < brun < 100
|
||||||
|
0 <= fstop < fcull < frun < 100
|
||||||
|
|
||||||
|
Note that these are percentages of available space and available files, and do
|
||||||
|
_not_ appear as 100 minus the percentage displayed by the "df" program.
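
The ordering constraint and the way the three pairs of limits interact can be
sketched as follows. This is a minimal model, not the daemon's or the module's
code: the type and function names are hypothetical, and it assumes that
allocation is permitted again as soon as neither value is below its stop limit.

```c
#include <assert.h>
#include <stdbool.h>

/* One set of limits, in percent of the backing filesystem (blocks or files). */
struct cull_limits {
	int run;	/* brun / frun */
	int cull;	/* bcull / fcull */
	int stop;	/* bstop / fstop */
};

enum {
	CACHE_CULLING  = 1 << 0,	/* culling is in progress */
	CACHE_NO_ALLOC = 1 << 1,	/* new allocations are refused */
};

/* Check 0 <= stop < cull < run < 100. */
static bool cull_limits_valid(const struct cull_limits *l)
{
	return 0 <= l->stop && l->stop < l->cull &&
	       l->cull < l->run && l->run < 100;
}

/*
 * Update the state flags from the available percentages of blocks (bavail)
 * and files (favail).  Between the cull and run limits the culling flag is
 * left as it was, which gives the hysteresis described above.
 */
static unsigned int cull_update(unsigned int state, int bavail, int favail,
				const struct cull_limits *b,
				const struct cull_limits *f)
{
	assert(cull_limits_valid(b) && cull_limits_valid(f));

	if (bavail < b->cull || favail < f->cull)
		state |= CACHE_CULLING;
	else if (bavail > b->run && favail > f->run)
		state &= ~(unsigned int)CACHE_CULLING;

	if (bavail < b->stop || favail < f->stop)
		state |= CACHE_NO_ALLOC;
	else
		state &= ~(unsigned int)CACHE_NO_ALLOC;

	return state;
}
```

With limits of run=10, cull=7 and stop=3 for both blocks and files, 5% of
blocks available starts culling, 8% keeps it running, and culling only stops
once both values exceed 10%.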
|
||||||
|
|
||||||
|
The userspace daemon scans the cache to build up a table of cullable objects.
|
||||||
|
These are then culled in least recently used order. A new scan of the cache is
|
||||||
|
started as soon as space is made in the table. Objects will be skipped if
|
||||||
|
their atimes have changed or if the kernel module says it is still using them.
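
One pass over such a table might look like the sketch below. The entry layout
and function names are hypothetical; the real daemon keeps its own table
format and asks the kernel module whether an object is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry recorded during a scan of the cache. */
struct cull_entry {
	const char *name;
	long scanned_atime;	/* atime seen when the table was built */
	long current_atime;	/* atime seen just before culling */
	bool in_use;		/* kernel module reports the object busy */
};

/* qsort() comparator: oldest scanned atime first. */
static int by_scanned_atime(const void *a, const void *b)
{
	const struct cull_entry *x = a, *y = b;

	return (x->scanned_atime > y->scanned_atime) -
	       (x->scanned_atime < y->scanned_atime);
}

/*
 * Sort the table oldest-first, then copy the names of up to max victims
 * into out[], skipping entries that were touched since the scan or that
 * are still in use.  Returns the number of victims chosen.
 */
static size_t cull_pass(struct cull_entry *tab, size_t n,
			const char **out, size_t max)
{
	size_t i, chosen = 0;

	assert(tab != NULL && out != NULL);
	qsort(tab, n, sizeof(*tab), by_scanned_atime);

	for (i = 0; i < n && chosen < max; i++) {
		if (tab[i].current_atime != tab[i].scanned_atime)
			continue;	/* used since the scan */
		if (tab[i].in_use)
			continue;	/* still held by the module */
		out[chosen++] = tab[i].name;
	}
	return chosen;
}
```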
|
||||||
|
|
||||||
|
|
||||||
|
===============
|
||||||
|
CACHE STRUCTURE
|
||||||
|
===============
|
||||||
|
|
||||||
|
The CacheFiles module will create two directories in the directory it was
|
||||||
|
given:
|
||||||
|
|
||||||
|
(*) cache/
|
||||||
|
|
||||||
|
(*) graveyard/
|
||||||
|
|
||||||
|
The active cache objects all reside in the first directory. The CacheFiles
|
||||||
|
kernel module moves any retired or culled objects that it can't simply unlink
|
||||||
|
to the graveyard from which the daemon will actually delete them.
|
||||||
|
|
||||||
|
The daemon uses dnotify to monitor the graveyard directory, and will delete
|
||||||
|
anything that appears therein.
|
||||||
|
|
||||||
|
|
||||||
|
The module represents index objects as directories with the filename "I..." or
|
||||||
|
"J...". Note that the "cache/" directory is itself a special index.
|
||||||
|
|
||||||
|
Data objects are represented as files if they have no children, or directories
|
||||||
|
if they do. Their filenames all begin "D..." or "E...". If represented as a
|
||||||
|
directory, data objects will have a file in the directory called "data" that
|
||||||
|
actually holds the data.
|
||||||
|
|
||||||
|
Special objects are similar to data objects, except their filenames begin
|
||||||
|
"S..." or "T...".
|
||||||
|
|
||||||
|
|
||||||
|
If an object has children, then it will be represented as a directory.
|
||||||
|
Immediately in the representative directory are a collection of directories
|
||||||
|
named for hash values of the child object keys with an '@' prepended. Into
|
||||||
|
this directory, if possible, will be placed the representations of the child
|
||||||
|
objects:
|
||||||
|
|
||||||
|
INDEX     INDEX      INDEX                             DATA FILES
========= ========== ================================= ================
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
|
||||||
|
|
||||||
|
|
||||||
|
If the key is so long that it exceeds NAME_MAX with the decorations added on to
|
||||||
|
it, then it will be cut into pieces, the first few of which will be used to
|
||||||
|
make a nest of directories, and the last one of which will be the objects
|
||||||
|
inside the last directory. The names of the intermediate directories will have
|
||||||
|
'+' prepended:
|
||||||
|
|
||||||
|
J1223/@23/+xy...z/+kl...m/Epqr
|
||||||
|
|
||||||
|
|
||||||
|
Note that keys are raw data, and not only may they exceed NAME_MAX in size,
|
||||||
|
they may also contain things like '/' and NUL characters, and so they may not
|
||||||
|
be suitable for turning directly into a filename.
|
||||||
|
|
||||||
|
To handle this, CacheFiles will use a suitably printable filename directly and
|
||||||
|
"base-64" encode ones that aren't directly suitable. The two versions of
|
||||||
|
object filenames indicate the encoding:
|
||||||
|
|
||||||
|
OBJECT TYPE     PRINTABLE       ENCODED
=============== =============== ===============
Index           "I..."          "J..."
Data            "D..."          "E..."
Special         "S..."          "T..."
|
||||||
|
|
||||||
|
Intermediate directories are always "@" or "+" as appropriate.
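
The naming scheme above can be summarised as a classification on the first
character of a name found in the cache. The enum and function below are
hypothetical helpers written for illustration, not part of CacheFiles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_name_kind {
	NAME_INDEX,
	NAME_DATA,
	NAME_SPECIAL,
	NAME_HASH_DIR,		/* '@' + hash of the child object keys */
	NAME_KEY_FRAGMENT,	/* '+' + a piece of an over-long key */
	NAME_UNKNOWN,
};

/*
 * Classify a directory entry by its first character.  *encoded is set
 * when the prefix says the key was "base-64" encoded rather than used
 * directly as a printable filename.
 */
static enum cache_name_kind classify_cache_name(const char *name, bool *encoded)
{
	assert(name != NULL && encoded != NULL);
	*encoded = false;

	switch (name[0]) {
	case 'J': *encoded = true; /* fall through */
	case 'I': return NAME_INDEX;
	case 'E': *encoded = true; /* fall through */
	case 'D': return NAME_DATA;
	case 'T': *encoded = true; /* fall through */
	case 'S': return NAME_SPECIAL;
	case '@': return NAME_HASH_DIR;
	case '+': return NAME_KEY_FRAGMENT;
	default:  return NAME_UNKNOWN;
	}
}
```

Applied to the earlier example path, "I03nfs" is a printable index, "@30" is a
hash directory and "Es0g000w..." is an encoded data object.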
|
||||||
|
|
||||||
|
|
||||||
|
Each object in the cache has an extended attribute label that holds the object
|
||||||
|
type ID (required to distinguish special objects) and the auxiliary data from
|
||||||
|
the netfs. The latter is used to detect stale objects in the cache and update
|
||||||
|
or retire them.
|
||||||
|
|
||||||
|
|
||||||
|
Note that CacheFiles will erase from the cache any file it doesn't recognise or
|
||||||
|
any file of an incorrect type (such as a FIFO file or a device file).
|
||||||
|
|
||||||
|
|
||||||
|
==========================
|
||||||
|
SECURITY MODEL AND SELINUX
|
||||||
|
==========================
|
||||||
|
|
||||||
|
CacheFiles is implemented to deal properly with the LSM security features of
|
||||||
|
the Linux kernel and the SELinux facility.
|
||||||
|
|
||||||
|
One of the problems that CacheFiles faces is that it is generally acting on
|
||||||
|
behalf of a process, and running in that process's context, and that includes a
|
||||||
|
security context that is not appropriate for accessing the cache - either
|
||||||
|
because the files in the cache are inaccessible to that process, or because if
|
||||||
|
the process creates a file in the cache, that file may be inaccessible to other
|
||||||
|
processes.
|
||||||
|
|
||||||
|
The way CacheFiles works is to temporarily change the security context (fsuid,
|
||||||
|
fsgid and actor security label) that the process acts as - without changing the
|
||||||
|
security context of the process when it is the target of an operation performed by
|
||||||
|
some other process (so signalling and suchlike still work correctly).
|
||||||
|
|
||||||
|
|
||||||
|
When the CacheFiles module is asked to bind to its cache, it:
|
||||||
|
|
||||||
|
(1) Finds the security label attached to the root cache directory and uses
|
||||||
|
that as the security label with which it will create files. By default,
|
||||||
|
this is:
|
||||||
|
|
||||||
|
cachefiles_var_t
|
||||||
|
|
||||||
|
(2) Finds the security label of the process which issued the bind request
|
||||||
|
(presumed to be the cachefilesd daemon), which by default will be:
|
||||||
|
|
||||||
|
cachefilesd_t
|
||||||
|
|
||||||
|
and asks LSM to supply a security ID as which it should act given the
|
||||||
|
daemon's label. By default, this will be:
|
||||||
|
|
||||||
|
cachefiles_kernel_t
|
||||||
|
|
||||||
|
SELinux transitions the daemon's security ID to the module's security ID
|
||||||
|
based on a rule of this form in the policy.
|
||||||
|
|
||||||
|
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
|
||||||
|
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
|
||||||
|
|
||||||
|
|
||||||
|
The module's security ID gives it permission to create, move and remove files
|
||||||
|
and directories in the cache, to find and access directories and files in the
|
||||||
|
cache, to set and access extended attributes on cache objects, and to read and
|
||||||
|
write files in the cache.
|
||||||
|
|
||||||
|
The daemon's security ID gives it only a very restricted set of permissions: it
|
||||||
|
may scan directories, stat files and erase files and directories. It may
|
||||||
|
not read or write files in the cache, and so it is precluded from accessing the
|
||||||
|
data cached therein; nor is it permitted to create new files in the cache.
|
||||||
|
|
||||||
|
|
||||||
|
There are policy source files available in:
|
||||||
|
|
||||||
|
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
|
||||||
|
|
||||||
|
and later versions. In that tarball, see the files:
|
||||||
|
|
||||||
|
cachefilesd.te
|
||||||
|
cachefilesd.fc
|
||||||
|
cachefilesd.if
|
||||||
|
|
||||||
|
They are built and installed directly by the RPM.
|
||||||
|
|
||||||
|
If a non-RPM based system is being used, then copy the above files to their own
|
||||||
|
directory and run:
|
||||||
|
|
||||||
|
make -f /usr/share/selinux/devel/Makefile
|
||||||
|
semodule -i cachefilesd.pp
|
||||||
|
|
||||||
|
You will need checkpolicy and selinux-policy-devel installed prior to the
|
||||||
|
build.
|
||||||
|
|
||||||
|
|
||||||
|
By default, the cache is located in /var/fscache, but if it is desirable that
|
||||||
|
it should be elsewhere, then either the above policy files must be altered, or
|
||||||
|
an auxiliary policy must be installed to label the alternate location of the
|
||||||
|
cache.
|
||||||
|
|
||||||
|
For instructions on how to add an auxiliary policy to enable the cache to be
|
||||||
|
located elsewhere when SELinux is in enforcing mode, please see:
|
||||||
|
|
||||||
|
/usr/share/doc/cachefilesd-*/move-cache.txt
|
||||||
|
|
||||||
|
when the cachefilesd RPM is installed; alternatively, the document can be found
|
||||||
|
in the sources.
|
||||||
|
|
||||||
|
|
||||||
|
==================
|
||||||
|
A NOTE ON SECURITY
|
||||||
|
==================
|
||||||
|
|
||||||
|
CacheFiles makes use of the split security in the task_struct. It allocates
|
||||||
|
its own task_security structure, and redirects current->act_as to point to it
|
||||||
|
when it acts on behalf of another process, in that process's context.
|
||||||
|
|
||||||
|
The reason it does this is that it calls vfs_mkdir() and suchlike rather than
|
||||||
|
bypassing security and calling inode ops directly. Therefore the VFS and LSM
|
||||||
|
may deny the CacheFiles access to the cache data because under some
|
||||||
|
circumstances the caching code is running in the security context of whatever
|
||||||
|
process issued the original syscall on the netfs.
|
||||||
|
|
||||||
|
Furthermore, should CacheFiles create a file or directory, the security
|
||||||
|
parameters with which that object is created (UID, GID, security label) would be
derived from the process that issued the system call, thus potentially
|
||||||
|
preventing other processes from accessing the cache - including CacheFiles's
|
||||||
|
cache management daemon (cachefilesd).
|
||||||
|
|
||||||
|
What is required is to temporarily override the security of the process that
|
||||||
|
issued the system call. We can't, however, just do an in-place change of the
|
||||||
|
security data as that affects the process as an object, not just as a subject.
|
||||||
|
This means it may lose signals or ptrace events for example, and affects what
|
||||||
|
the process looks like in /proc.
|
||||||
|
|
||||||
|
So CacheFiles makes use of a logical split in the security between the
|
||||||
|
objective security (task->sec) and the subjective security (task->act_as). The
|
||||||
|
objective security holds the intrinsic security properties of a process and is
|
||||||
|
never overridden. This is what appears in /proc, and is what is used when a
|
||||||
|
process is the target of an operation by some other process (SIGKILL for
|
||||||
|
example).
|
||||||
|
|
||||||
|
The subjective security holds the active security properties of a process, and
|
||||||
|
may be overridden. This is not seen externally, and is used when a process
|
||||||
|
acts upon another object, for example SIGKILLing another process or opening a
|
||||||
|
file.
|
||||||
|
|
||||||
|
LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
|
||||||
|
for CacheFiles to run in a context of a specific security label, or to create
|
||||||
|
files and directories with another security label.
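
The objective/subjective split can be pictured with a toy userspace model.
The structures and functions below are stand-ins invented for illustration;
they are not the kernel's task or security structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a set of security properties. */
struct toy_sec {
	unsigned int fsuid, fsgid;
	const char *label;
};

/* Toy stand-in for a task with split security. */
struct toy_task {
	const struct toy_sec *sec;	/* objective: how others see the task */
	const struct toy_sec *act_as;	/* subjective: how the task acts */
};

/* Start acting as "override"; returns the previous subjective context. */
static const struct toy_sec *toy_override(struct toy_task *t,
					  const struct toy_sec *override)
{
	const struct toy_sec *saved = t->act_as;

	assert(override != NULL);
	t->act_as = override;	/* t->sec is deliberately left alone */
	return saved;
}

/* Restore the subjective context saved by toy_override(). */
static void toy_revert(struct toy_task *t, const struct toy_sec *saved)
{
	t->act_as = saved;
}
```

While the override is in place, anything that inspects the objective pointer
(signal delivery, /proc in the real kernel) still sees the original context.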
|
||||||
|
|
||||||
|
|
||||||
|
=======================
|
||||||
|
STATISTICAL INFORMATION
|
||||||
|
=======================
|
||||||
|
|
||||||
|
If CacheFiles is compiled with the following option enabled:
|
||||||
|
|
||||||
|
CONFIG_CACHEFILES_HISTOGRAM=y
|
||||||
|
|
||||||
|
then it will gather certain statistics and display them through a proc file.
|
||||||
|
|
||||||
|
(*) /proc/fs/cachefiles/histogram
|
||||||
|
|
||||||
|
cat /proc/fs/cachefiles/histogram
|
||||||
|
JIFS  SECS  LOOKUPS   MKDIRS    CREATES
===== ===== ========= ========= =========
|
||||||
|
|
||||||
|
This shows, for each duration between 0 jiffies and HZ-1 jiffies, the number
of times that each of a variety of tasks took that long to run. The
|
||||||
|
columns are as follows:
|
||||||
|
|
||||||
|
COLUMN  TIME MEASUREMENT
======= =======================================================
LOOKUPS Length of time to perform a lookup on the backing fs
MKDIRS  Length of time to perform a mkdir on the backing fs
CREATES Length of time to perform a create on the backing fs
|
||||||
|
|
||||||
|
Each row shows the number of events that took a particular range of times.
|
||||||
|
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||||
|
jiffy range covered, and the SECS field the equivalent number of seconds.
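
The relationship between the two columns can be sketched as below, assuming
the SECS value is simply the jiffy count divided by HZ. The helper name is
hypothetical and the value of HZ depends on the kernel configuration.

```c
#include <assert.h>

/* Milliseconds represented by "jiffies" ticks at "hz" ticks per second. */
static unsigned long jiffies_to_msecs_approx(unsigned long jiffies,
					     unsigned long hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}
```

With HZ=250, for instance, row 1 covers events that took about 4ms and the
last row (HZ-1 = 249) covers events that took about 996ms.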
|
||||||
|
|
||||||
|
|
||||||
|
=========
|
||||||
|
DEBUGGING
|
||||||
|
=========
|
||||||
|
|
||||||
|
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
|
||||||
|
debugging enabled by adjusting the value in:
|
||||||
|
|
||||||
|
/sys/module/cachefiles/parameters/debug
|
||||||
|
|
||||||
|
This is a bitmask of debugging streams to enable:
|
||||||
|
|
||||||
|
BIT     VALUE   STREAM                          POINT
======= ======= =============================== =======================
0       1       General                         Function entry trace
1       2                                       Function exit trace
2       4                                       General
|
||||||
|
|
||||||
|
The appropriate set of values should be OR'd together and the result written to
|
||||||
|
the control file. For example:
|
||||||
|
|
||||||
|
echo $((1|4)) >/sys/module/cachefiles/parameters/debug

will turn on function entry tracing and general debugging.
|
333
Documentation/filesystems/caching/fscache.txt
Normal file
@@ -0,0 +1,333 @@
|
|||||||
|
==========================
|
||||||
|
General Filesystem Caching
|
||||||
|
==========================
|
||||||
|
|
||||||
|
========
|
||||||
|
OVERVIEW
|
||||||
|
========
|
||||||
|
|
||||||
|
This facility is a general purpose cache for network filesystems, though it
|
||||||
|
could be used for caching other things such as ISO9660 filesystems too.
|
||||||
|
|
||||||
|
FS-Cache mediates between cache backends (such as CacheFS) and network
|
||||||
|
filesystems:
|
||||||
|
|
||||||
|
+---------+
|         |                        +--------------+
|   NFS   |--+                     |              |
|         |  |                 +-->|   CacheFS    |
+---------+  |   +----------+  |   |  /dev/hda5   |
             |   |          |  |   +--------------+
+---------+  +-->|          |  |
|         |      |          |--+
|   AFS   |----->| FS-Cache |
|         |      |          |--+
+---------+  +-->|          |  |
             |   |          |  |   +--------------+
+---------+  |   +----------+  |   |              |
|         |  |                 +-->|  CacheFiles  |
|  ISOFS  |--+                     |  /var/cache  |
|         |                        +--------------+
+---------+
|
||||||
|
|
||||||
|
Or to look at it another way, FS-Cache is a module that provides a caching
|
||||||
|
facility to a network filesystem such that the cache is transparent to the
|
||||||
|
user:
|
||||||
|
|
||||||
|
+---------+
|         |
| Server  |
|         |
+---------+
     |                  NETWORK
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     |
     |           +----------+
     V           |          |
+---------+      |          |
|         |      |          |
|   NFS   |----->| FS-Cache |
|         |      |          |--+
+---------+      |          |  |   +--------------+   +--------------+
     |           |          |  |   |              |   |              |
     V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
+---------+                        |  /var/cache  |   |  /dev/sda6   |
|         |                        +--------------+   +--------------+
|   VFS   |                                ^                     ^
|         |                                |                     |
+---------+                                +--------------+      |
     |                  KERNEL SPACE                      |      |
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
     |                  USER SPACE                        |      |
     V                                                    |      |
+---------+                                           +--------------+
|         |                                           |              |
| Process |                                           | cachefilesd  |
|         |                                           |              |
+---------+                                           +--------------+
|
||||||
|
|
||||||
|
|
||||||
|
FS-Cache does not follow the idea of completely loading every netfs file
|
||||||
|
opened in its entirety into a cache before permitting it to be accessed and
|
||||||
|
then serving the pages out of that cache rather than the netfs inode because:
|
||||||
|
|
||||||
|
(1) It must be practical to operate without a cache.
|
||||||
|
|
||||||
|
(2) The size of any accessible file must not be limited to the size of the
|
||||||
|
cache.
|
||||||
|
|
||||||
|
(3) The combined size of all opened files (this includes mapped libraries)
|
||||||
|
must not be limited to the size of the cache.
|
||||||
|
|
||||||
|
(4) The user should not be forced to download an entire file just to do a
|
||||||
|
one-off access of a small portion of it (such as might be done with the
|
||||||
|
"file" program).
|
||||||
|
|
||||||
|
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
|
||||||
|
the netfs('s) using it.
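
As an illustration of page-granular access, the pages touched by a read of
"len" bytes at byte "offset" can be computed as below. This is a sketch only:
the names are hypothetical and the page size is assumed to be 4096 bytes,
whereas the kernel's PAGE_SIZE depends on the architecture.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed; PAGE_SIZE is arch-dependent */

struct page_range {
	unsigned long first;	/* index of the first page touched */
	unsigned long last;	/* index of the last page touched (inclusive) */
};

/* Page indices covered by the byte range [offset, offset + len). */
static struct page_range pages_for_access(unsigned long offset,
					  unsigned long len)
{
	struct page_range r;

	assert(len > 0);
	r.first = offset / SKETCH_PAGE_SIZE;
	r.last = (offset + len - 1) / SKETCH_PAGE_SIZE;
	return r;
}
```

A one-off read of a few bytes near the start of a large file therefore
involves only page 0, not the whole file.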
|
||||||
|
|
||||||
|
|
||||||
|
FS-Cache provides the following facilities:
|
||||||
|
|
||||||
|
(1) More than one cache can be used at once. Caches can be selected
|
||||||
|
explicitly by use of tags.
|
||||||
|
|
||||||
|
(2) Caches can be added / removed at any time.
|
||||||
|
|
||||||
|
(3) The netfs is provided with an interface that allows either party to
|
||||||
|
withdraw caching facilities from a file (required for (2)).
|
||||||
|
|
||||||
|
(4) The interface to the netfs returns as few errors as possible, preferring
|
||||||
|
rather to let the netfs remain oblivious.
|
||||||
|
|
||||||
|
(5) Cookies are used to represent indices, files and other objects to the
|
||||||
|
netfs. The simplest cookie is just a NULL pointer - indicating nothing
|
||||||
|
cached there.
|
||||||
|
|
||||||
|
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
|
||||||
|
desires, though it must be aware that the index search function is
|
||||||
|
recursive, stack space is limited, and indices can only be children of
|
||||||
|
indices.
|
||||||
|
|
||||||
|
(7) Data I/O is done direct to and from the netfs's pages. The netfs
|
||||||
|
indicates that page A is at index B of the data-file represented by cookie
|
||||||
|
C, and that it should be read or written. The cache backend may or may
|
||||||
|
not start I/O on that page, but if it does, a netfs callback will be
|
||||||
|
invoked to indicate completion. The I/O may be either synchronous or
|
||||||
|
asynchronous.
|
||||||
|
|
||||||
|
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
|
||||||
|
them as obsolete and the index hierarchy rooted at that point will get
|
||||||
|
recycled.
|
||||||
|
|
||||||
|
(9) The netfs provides a "match" function for index searches. In addition to
|
||||||
|
saying whether a match was made or not, this can also specify that an
|
||||||
|
entry should be updated or deleted.
|
||||||
|
|
||||||
|
(10) As much as possible is done asynchronously.
|
||||||
|
|
||||||
|
|
||||||
|
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
|
||||||
|
and pages are kept. Bits of this tree may actually reside in one or more
|
||||||
|
caches.
|
||||||
|
|
||||||
|
                                           FSDEF
                                             |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2
|
||||||
|
|
||||||
|
In the example above, you can see two netfs's being backed: NFS and AFS. These
|
||||||
|
have different index hierarchies:
|
||||||
|
|
||||||
|
(*) The NFS primary index contains per-server indices. Each server index is
|
||||||
|
indexed by NFS file handles to get data file objects. Each data file
object can have an array of pages, but may also have further child
|
||||||
|
objects, such as extended attributes and directory entries. Extended
|
||||||
|
attribute objects themselves have page-array contents.
|
||||||
|
|
||||||
|
(*) The AFS primary index contains per-cell indices. Each cell index contains
|
||||||
|
per-logical-volume indices. Each volume index contains up to three
|
||||||
|
indices for the read-write, read-only and backup mirrors of those volumes.
|
||||||
|
Each of these contains vnode data file objects, each of which contains an
|
||||||
|
array of pages.
|
||||||
|
|
||||||
|
The very top index is the FS-Cache master index in which individual netfs's
|
||||||
|
have entries.
|
||||||
|
|
||||||
|
Any index object may reside in more than one cache, provided it only has index
|
||||||
|
children. Any index with non-index object children will be assumed to only
|
||||||
|
reside in one cache.
|
||||||
|
|
||||||
|
|
||||||
|
The netfs API to FS-Cache can be found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/netfs-api.txt
|
||||||
|
|
||||||
|
The cache backend API to FS-Cache can be found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/backend-api.txt
|
||||||
|
|
||||||
|
A description of the internal representations and object state machine can be
|
||||||
|
found in:
|
||||||
|
|
||||||
|
Documentation/filesystems/caching/object.txt
|
||||||
|
|
||||||
|
|
||||||
|
=======================
|
||||||
|
STATISTICAL INFORMATION
|
||||||
|
=======================
|
||||||
|
|
||||||
|
If FS-Cache is compiled with the following options enabled:
|
||||||
|
|
||||||
|
CONFIG_FSCACHE_STATS=y
|
||||||
|
CONFIG_FSCACHE_HISTOGRAM=y
|
||||||
|
|
||||||
|
then it will gather certain statistics and display them through a number of
|
||||||
|
proc files.
|
||||||
|
|
||||||
|
(*) /proc/fs/fscache/stats
|
||||||
|
|
||||||
|
This shows counts of a number of events that can happen in FS-Cache:
|
||||||
|
|
||||||
|
CLASS   EVENT   MEANING
======= ======= =======================================================
Cookies idx=N   Number of index cookies allocated
        dat=N   Number of data storage cookies allocated
        spc=N   Number of special cookies allocated
Objects alc=N   Number of objects allocated
        nal=N   Number of object allocation failures
        avl=N   Number of objects that reached the available state
        ded=N   Number of objects that reached the dead state
ChkAux  non=N   Number of objects that didn't have a coherency check
        ok=N    Number of objects that passed a coherency check
        upd=N   Number of objects that needed a coherency data update
        obs=N   Number of objects that were declared obsolete
Pages   mrk=N   Number of pages marked as being cached
        unc=N   Number of uncache page requests seen
Acquire n=N     Number of acquire cookie requests seen
        nul=N   Number of acq reqs given a NULL parent
        noc=N   Number of acq reqs rejected due to no cache available
        ok=N    Number of acq reqs succeeded
        nbf=N   Number of acq reqs rejected due to error
        oom=N   Number of acq reqs failed on ENOMEM
Lookups n=N     Number of lookup calls made on cache backends
        neg=N   Number of negative lookups made
        pos=N   Number of positive lookups made
        crt=N   Number of objects created by lookup
Updates n=N     Number of update cookie requests seen
        nul=N   Number of upd reqs given a NULL parent
        run=N   Number of upd reqs granted CPU time
Relinqs n=N     Number of relinquish cookie requests seen
        nul=N   Number of rlq reqs given a NULL parent
        wcr=N   Number of rlq reqs waited on completion of creation
AttrChg n=N     Number of attribute changed requests seen
        ok=N    Number of attr changed requests queued
        nbf=N   Number of attr changed rejected -ENOBUFS
        oom=N   Number of attr changed failed -ENOMEM
        run=N   Number of attr changed ops given CPU time
Allocs  n=N     Number of allocation requests seen
        ok=N    Number of successful alloc reqs
        wt=N    Number of alloc reqs that waited on lookup completion
        nbf=N   Number of alloc reqs rejected -ENOBUFS
        ops=N   Number of alloc reqs submitted
        owt=N   Number of alloc reqs waited for CPU time
Retrvls n=N     Number of retrieval (read) requests seen
        ok=N    Number of successful retr reqs
        wt=N    Number of retr reqs that waited on lookup completion
        nod=N   Number of retr reqs returned -ENODATA
        nbf=N   Number of retr reqs rejected -ENOBUFS
        int=N   Number of retr reqs aborted -ERESTARTSYS
        oom=N   Number of retr reqs failed -ENOMEM
        ops=N   Number of retr reqs submitted
        owt=N   Number of retr reqs waited for CPU time
Stores  n=N     Number of storage (write) requests seen
        ok=N    Number of successful store reqs
        agn=N   Number of store reqs on a page already pending storage
        nbf=N   Number of store reqs rejected -ENOBUFS
        oom=N   Number of store reqs failed -ENOMEM
        ops=N   Number of store reqs submitted
        run=N   Number of store reqs granted CPU time
Ops     pend=N  Number of times async ops added to pending queues
        run=N   Number of times async ops given CPU time
        enq=N   Number of times async ops queued for processing
        dfr=N   Number of async ops queued for deferred release
        rel=N   Number of async ops released
        gc=N    Number of deferred-release async ops garbage collected
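
A monitoring tool could pull one counter out of a line of this file roughly
as follows. This is a sketch under the assumption that each line has the form
"Class: key=N key=N ..."; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of "key=N" on one stats line, or -1 if the key is
 * absent.  A match must start the line or follow a space, so that "n"
 * does not match inside "nul=" or "run=".
 */
static long stat_value(const char *line, const char *key)
{
	size_t klen = strlen(key);
	const char *p = line;

	assert(klen > 0);
	while ((p = strstr(p, key)) != NULL) {
		bool at_start = (p == line) || (p[-1] == ' ');

		if (at_start && p[klen] == '=')
			return strtol(p + klen + 1, NULL, 10);
		p += klen;	/* not a whole key; keep searching */
	}
	return -1;
}
```

For example, looking up "dat" on the line "Cookies: idx=2 dat=17 spc=0"
returns 17, and looking up a key that is not on the line returns -1.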
|
||||||
|
|
||||||
|
|
||||||
|
(*) /proc/fs/fscache/histogram
|
||||||
|
|
||||||
|
cat /proc/fs/fscache/histogram
|
||||||
|
JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
===== ===== ========= ========= ========= ========= =========
|
||||||
|
|
||||||
|
This shows, for each duration between 0 jiffies and HZ-1 jiffies, the number
of times that each of a variety of tasks took that long to run. The
|
||||||
|
columns are as follows:
|
||||||
|
|
||||||
|
COLUMN    TIME MEASUREMENT
========= =======================================================
OBJ INST  Length of time to instantiate an object
OP RUNS   Length of time a call to process an operation took
OBJ RUNS  Length of time a call to process an object event took
RETRV DLY Time between requesting a read and lookup completing
RETRIEVLS Time between beginning and end of a retrieval
|
||||||
|
|
||||||
|
Each row shows the number of events that took a particular range of times.
|
||||||
|
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||||
|
jiffy range covered, and the SECS field the equivalent number of seconds.
|
||||||
|
|
||||||
|
|
||||||
|
=========
|
||||||
|
DEBUGGING
|
||||||
|
=========
|
||||||
|
|
||||||
|
If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
|
||||||
|
debugging enabled by adjusting the value in:
|
||||||
|
|
||||||
|
/sys/module/fscache/parameters/debug
|
||||||
|
|
||||||
|
This is a bitmask of debugging streams to enable:
|
||||||
|
|
||||||
|
BIT     VALUE   STREAM                          POINT
======= ======= =============================== =======================
0       1       Cache management                Function entry trace
1       2                                       Function exit trace
2       4                                       General
3       8       Cookie management               Function entry trace
4       16                                      Function exit trace
5       32                                      General
6       64      Page handling                   Function entry trace
7       128                                     Function exit trace
8       256                                     General
9       512     Operation management            Function entry trace
10      1024                                    Function exit trace
11      2048                                    General
|
||||||
|
|
||||||
|
The appropriate set of values should be OR'd together and the result written to
|
||||||
|
the control file. For example:
|
||||||
|
|
||||||
|
echo $((1|8|64|512)) >/sys/module/fscache/parameters/debug
|
||||||
|
|
||||||
|
will turn on all function entry debugging.
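
The layout of the table above is regular: each stream owns three consecutive
bits, one per trace point. A minimal sketch of computing mask values from
that layout follows; the enum and function names are hypothetical and exist
only for this illustration.

```c
#include <assert.h>

/* Streams and points in the order they appear in the table. */
enum dbg_stream { DBG_CACHE, DBG_COOKIE, DBG_PAGE, DBG_OPERATION };
enum dbg_point  { DBG_ENTER, DBG_LEAVE, DBG_GENERAL };

/* Value to OR into the mask for one stream/point pair. */
static unsigned int dbg_value(enum dbg_stream s, enum dbg_point p)
{
	unsigned int bit = 3u * (unsigned int)s + (unsigned int)p;

	assert(bit < 12);	/* the table defines bits 0 to 11 */
	return 1u << bit;
}

/* Mask enabling one trace point on every stream. */
static unsigned int dbg_all_streams(enum dbg_point p)
{
	unsigned int mask = 0;
	int s;

	for (s = DBG_CACHE; s <= DBG_OPERATION; s++)
		mask |= dbg_value((enum dbg_stream)s, p);
	return mask;
}
```

Here dbg_all_streams(DBG_ENTER) gives 585, that is 1|8|64|512, which covers
the function entry trace of all four streams listed in the table.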
|
778
Documentation/filesystems/caching/netfs-api.txt
Normal file
@@ -0,0 +1,778 @@
|
|||||||
|
===============================
|
||||||
|
FS-CACHE NETWORK FILESYSTEM API
|
||||||
|
===============================
|
||||||
|
|
||||||
|
There's an API by which a network filesystem can make use of the FS-Cache
|
||||||
|
facilities. This is based around a number of principles:
|
||||||
|
|
||||||
|
(1) Caches can store a number of different object types. There are two main
|
||||||
|
object types: indices and files. The first is a special type used by
|
||||||
|
FS-Cache to make finding objects faster and to make retiring of groups of
|
||||||
|
objects easier.
|
||||||
|
|
||||||
|
(2) Every index, file or other object is represented by a cookie. This cookie
|
||||||
|
may or may not have anything associated with it, but the netfs doesn't
|
||||||
|
need to care.
|
||||||
|
|
||||||
|
(3) Barring the top-level index (one entry per cached netfs), the index
|
||||||
|
hierarchy for each netfs is structured according to the whim of the netfs.
|
||||||
|
|
||||||
|
This API is declared in <linux/fscache.h>.
|
||||||
|
|
||||||
|
This document contains the following sections:
|
||||||
|
|
||||||
|
(1) Network filesystem definition
|
||||||
|
(2) Index definition
|
||||||
|
(3) Object definition
|
||||||
|
(4) Network filesystem (un)registration
|
||||||
|
(5) Cache tag lookup
|
||||||
|
(6) Index registration
|
||||||
|
(7) Data file registration
|
||||||
|
(8) Miscellaneous object registration
|
||||||
|
(9) Setting the data file size
|
||||||
|
(10) Page alloc/read/write
|
||||||
|
(11) Page uncaching
|
||||||
|
(12) Index and data file update
|
||||||
|
(13) Miscellaneous cookie operations
|
||||||
|
(14) Cookie unregistration
|
||||||
|
(15) Index and data file invalidation
|
||||||
|
(16) FS-Cache specific page flags.
|
||||||
|
|
||||||
|
|
||||||
|
=============================
|
||||||
|
NETWORK FILESYSTEM DEFINITION
|
||||||
|
=============================
|
||||||
|
|
||||||
|
FS-Cache needs a description of the network filesystem. This is specified
|
||||||
|
using a record of the following structure:
|
||||||
|
|
||||||
|
struct fscache_netfs {
|
||||||
|
uint32_t version;
|
||||||
|
const char *name;
|
||||||
|
struct fscache_cookie *primary_index;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
The first two fields should be filled in before registration, and the third
|
||||||
|
will be filled in by the registration function; any other fields should just be
|
||||||
|
ignored and are for internal use only.
|
||||||
|
|
||||||
|
The fields are:
|
||||||
|
|
||||||
|
(1) The name of the netfs (used as the key in the toplevel index).
|
||||||
|
|
||||||
|
(2) The version of the netfs (if the name matches but the version doesn't, the
|
||||||
|
entire in-cache hierarchy for this netfs will be scrapped and begun
|
||||||
|
afresh).
|
||||||
|
|
||||||
|
(3) The cookie representing the primary index will be allocated according to
|
||||||
|
another parameter passed into the registration function.
|
||||||
|
|
||||||
|
For example, kAFS (linux/fs/afs/) uses the following definitions to describe
|
||||||
|
itself:
|
||||||
|
|
||||||
|
struct fscache_netfs afs_cache_netfs = {
|
||||||
|
.version = 0,
|
||||||
|
.name = "afs",
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
================
|
||||||
|
INDEX DEFINITION
|
||||||
|
================
|
||||||
|
|
||||||
|
Indices are used for two purposes:
|
||||||
|
|
||||||
|
(1) To aid the finding of a file based on a series of keys (such as AFS's
|
||||||
|
"cell", "volume ID", "vnode ID").
|
||||||
|
|
||||||
|
(2) To make it easier to discard a subset of all the files cached based around
|
||||||
|
a particular key - for instance to mirror the removal of an AFS volume.
|
||||||
|
|
||||||
|
However, since it's unlikely that any two netfs's are going to want to define
|
||||||
|
their index hierarchies in quite the same way, FS-Cache tries to impose as few
|
||||||
|
restraints as possible on how an index is structured and where it is placed in
|
||||||
|
the tree. The netfs can even mix indices and data files at the same level, but
|
||||||
|
it's not recommended.
|
||||||
|
|
||||||
|
Each index entry consists of a key of indeterminate length plus some auxiliary
|
||||||
|
data, also of indeterminate length.
|
||||||
|
|
||||||
|
There are some limits on indices:
|
||||||
|
|
||||||
|
(1) Any index containing non-index objects should be restricted to a single
|
||||||
|
cache. Any such objects created within an index will be created in the
|
||||||
|
first cache only. The cache in which an index is created can be
|
||||||
|
controlled by cache tags (see below).
|
||||||
|
|
||||||
|
(2) The entry data must be atomically journallable, so it is limited to about
|
||||||
|
400 bytes at present. At least 400 bytes will be available.
|
||||||
|
|
||||||
|
(3) The depth of the index tree should be judged with care as the search
|
||||||
|
function is recursive. Too many layers will run the kernel out of stack.
|
||||||
|
|
||||||
|
|
||||||
|
=================
|
||||||
|
OBJECT DEFINITION
|
||||||
|
=================
|
||||||
|
|
||||||
|
To define an object, a structure of the following type should be filled out:
|
||||||
|
|
||||||
|
struct fscache_cookie_def
|
||||||
|
{
|
||||||
|
uint8_t name[16];
|
||||||
|
uint8_t type;
|
||||||
|
|
||||||
|
struct fscache_cache_tag *(*select_cache)(
|
||||||
|
const void *parent_netfs_data,
|
||||||
|
const void *cookie_netfs_data);
|
||||||
|
|
||||||
|
uint16_t (*get_key)(const void *cookie_netfs_data,
|
||||||
|
void *buffer,
|
||||||
|
uint16_t bufmax);
|
||||||
|
|
||||||
|
void (*get_attr)(const void *cookie_netfs_data,
|
||||||
|
uint64_t *size);
|
||||||
|
|
||||||
|
uint16_t (*get_aux)(const void *cookie_netfs_data,
|
||||||
|
void *buffer,
|
||||||
|
uint16_t bufmax);
|
||||||
|
|
||||||
|
enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,
|
||||||
|
const void *data,
|
||||||
|
uint16_t datalen);
|
||||||
|
|
||||||
|
void (*get_context)(void *cookie_netfs_data, void *context);
|
||||||
|
|
||||||
|
void (*put_context)(void *cookie_netfs_data, void *context);
|
||||||
|
|
||||||
|
void (*mark_pages_cached)(void *cookie_netfs_data,
|
||||||
|
struct address_space *mapping,
|
||||||
|
struct pagevec *cached_pvec);
|
||||||
|
|
||||||
|
void (*now_uncached)(void *cookie_netfs_data);
|
||||||
|
};
|
||||||
|
|
||||||
|
This has the following fields:
|
||||||
|
|
||||||
|
(1) The type of the object [mandatory].
|
||||||
|
|
||||||
|
This is one of the following values:
|
||||||
|
|
||||||
|
(*) FSCACHE_COOKIE_TYPE_INDEX
|
||||||
|
|
||||||
|
This defines an index, which is a special FS-Cache type.
|
||||||
|
|
||||||
|
(*) FSCACHE_COOKIE_TYPE_DATAFILE
|
||||||
|
|
||||||
|
This defines an ordinary data file.
|
||||||
|
|
||||||
|
(*) Any other value between 2 and 255
|
||||||
|
|
||||||
|
This defines an extraordinary object such as an XATTR.
|
||||||
|
|
||||||
|
(2) The name of the object type (NUL terminated unless all 16 chars are used)
|
||||||
|
[optional].
|
||||||
|
|
||||||
|
(3) A function to select the cache in which to store an index [optional].
|
||||||
|
|
||||||
|
This function is invoked when an index needs to be instantiated in a cache
|
||||||
|
during the instantiation of a non-index object. Only the immediate index
|
||||||
|
parent for the non-index object will be queried. Any indices above that
|
||||||
|
in the hierarchy may be stored in multiple caches. This function does not
|
||||||
|
need to be supplied for any non-index object or any index that will only
|
||||||
|
have index children.
|
||||||
|
|
||||||
|
If this function is not supplied or if it returns NULL then the first
|
||||||
|
cache in the parent's list will be chosen, or failing that, the first
|
||||||
|
cache in the master list.
|
||||||
|
|
||||||
|
(4) A function to retrieve an object's key from the netfs [mandatory].
|
||||||
|
|
||||||
|
This function will be called with the netfs data that was passed to the
|
||||||
|
cookie acquisition function and the maximum length of key data that it may
|
||||||
|
provide. It should write the required key data into the given buffer and
|
||||||
|
return the quantity it wrote.
|
||||||
|
|
||||||
|
(5) A function to retrieve attribute data from the netfs [optional].
|
||||||
|
|
||||||
|
This function will be called with the netfs data that was passed to the
|
||||||
|
cookie acquisition function. It should return the size of the file if
|
||||||
|
this is a data file. The size may be used to govern how much cache must
|
||||||
|
be reserved for this file in the cache.
|
||||||
|
|
||||||
|
If the function is absent, a file size of 0 is assumed.
|
||||||
|
|
||||||
|
(6) A function to retrieve auxiliary data from the netfs [optional].
|
||||||
|
|
||||||
|
This function will be called with the netfs data that was passed to the
|
||||||
|
cookie acquisition function and the maximum length of auxiliary data that
|
||||||
|
it may provide. It should write the auxiliary data into the given buffer
|
||||||
|
and return the quantity it wrote.
|
||||||
|
|
||||||
|
If this function is absent, the auxiliary data length will be set to 0.
|
||||||
|
|
||||||
|
The length of the auxiliary data buffer may be dependent on the key
|
||||||
|
length. A netfs mustn't rely on being able to provide more than 400 bytes
|
||||||
|
for both.
|
||||||
|
|
||||||
|
(7) A function to check the auxiliary data [optional].
|
||||||
|
|
||||||
|
This function will be called to check that a match found in the cache for
|
||||||
|
this object is valid. For instance with AFS it could check the auxiliary
|
||||||
|
data against the data version number returned by the server to determine
|
||||||
|
whether the index entry in a cache is still valid.
|
||||||
|
|
||||||
|
If this function is absent, it will be assumed that matching objects in a
|
||||||
|
cache are always valid.
|
||||||
|
|
||||||
|
If present, the function should return one of the following values:
|
||||||
|
|
||||||
|
(*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
|
||||||
|
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
|
||||||
|
(*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
|
||||||
|
|
||||||
|
This function can also be used to extract data from the auxiliary data in
|
||||||
|
the cache and copy it into the netfs's structures.
|
||||||
|
|
||||||
|
(8) A pair of functions to manage contexts for the completion callback
|
||||||
|
[optional].
|
||||||
|
|
||||||
|
The cache read/write functions are passed a context which is then passed
|
||||||
|
to the I/O completion callback function. To ensure this context remains
|
||||||
|
valid until after the I/O completion is called, two functions may be
|
||||||
|
provided: one to get an extra reference on the context, and one to drop a
|
||||||
|
reference to it.
|
||||||
|
|
||||||
|
If the context is not used or is a type of object that won't go out of
|
||||||
|
scope, then these functions are not required. These functions are not
|
||||||
|
required for indices as indices may not contain data. These functions may
|
||||||
|
be called in interrupt context and so may not sleep.
|
||||||
|
|
||||||
|
(9) A function to mark a page as retaining cache metadata [optional].
|
||||||
|
|
||||||
|
This is called by the cache to indicate that it is retaining in-memory
|
||||||
|
information for this page and that the netfs should uncache the page when
|
||||||
|
it has finished. This does not indicate whether there's data on the disk
|
||||||
|
or not. Note that several pages at once may be presented for marking.
|
||||||
|
|
||||||
|
The PG_fscache bit is set on the pages before this function would be
|
||||||
|
called, so the function need not be provided if this is sufficient.
|
||||||
|
|
||||||
|
This function is not required for indices as they're not permitted data.
|
||||||
|
|
||||||
|
(10) A function to unmark all the pages retaining cache metadata [mandatory].
|
||||||
|
|
||||||
|
This is called by FS-Cache to indicate that a backing store is being
|
||||||
|
unbound from a cookie and that all the marks on the pages should be
|
||||||
|
cleared to prevent confusion. Note that the cache will have torn down all
|
||||||
|
its tracking information so that the pages don't need to be explicitly
|
||||||
|
uncached.
|
||||||
|
|
||||||
|
This function is not required for indices as they're not permitted data.
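As a rough illustration of the get_key() and check_aux() operations described
above, the following userspace sketch mirrors their argument lists. Everything
prefixed demo_ or DEMO_ is hypothetical: struct demo_vnode stands in for
whatever netfs data was passed to the cookie acquisition function, and the enum
is a local stand-in for the FSCACHE_CHECKAUX_* values. Returning 0 from the key
function when the key does not fit is an assumption, not documented behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical netfs-side record; a real netfs would use its own inode. */
struct demo_vnode {
	uint32_t vnode_id;	/* used as the index key */
	uint64_t data_version;	/* stored as auxiliary data */
};

/* Local stand-ins for the FSCACHE_CHECKAUX_* results listed above. */
enum demo_checkaux {
	DEMO_CHECKAUX_OKAY,
	DEMO_CHECKAUX_NEEDS_UPDATE,
	DEMO_CHECKAUX_OBSOLETE,
};

/* Same shape as get_key(): write the key and return the length written. */
uint16_t demo_get_key(const void *cookie_netfs_data, void *buffer,
		      uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (sizeof(vnode->vnode_id) > bufmax)
		return 0;	/* assumption: write nothing if it won't fit */
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return (uint16_t)sizeof(vnode->vnode_id);
}

/* Same shape as check_aux(): compare cached aux data with current state. */
enum demo_checkaux demo_check_aux(void *cookie_netfs_data, const void *data,
				  uint16_t datalen)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	if (datalen != sizeof(vnode->data_version))
		return DEMO_CHECKAUX_OBSOLETE;
	if (memcmp(data, &vnode->data_version,
		   sizeof(vnode->data_version)) != 0)
		return DEMO_CHECKAUX_OBSOLETE;
	return DEMO_CHECKAUX_OKAY;
}
```

Here a stale data version causes the cached entry to be reported as obsolete; a
netfs that can repair an entry in place might return the needs-update value
instead.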
|
||||||
|
|
||||||
|
|
||||||
|
===================================
|
||||||
|
NETWORK FILESYSTEM (UN)REGISTRATION
|
||||||
|
===================================
|
||||||
|
|
||||||
|
The first step is to declare the network filesystem to the cache. This also
|
||||||
|
involves specifying the layout of the primary index (for AFS, this would be the
|
||||||
|
"cell" level).
|
||||||
|
|
||||||
|
The registration function is:
|
||||||
|
|
||||||
|
int fscache_register_netfs(struct fscache_netfs *netfs);
|
||||||
|
|
||||||
|
It just takes a pointer to the netfs definition. It returns 0 or an error as
|
||||||
|
appropriate.
|
||||||
|
|
||||||
|
For kAFS, registration is done as follows:
|
||||||
|
|
||||||
|
ret = fscache_register_netfs(&afs_cache_netfs);
|
||||||
|
|
||||||
|
The last step is, of course, unregistration:
|
||||||
|
|
||||||
|
void fscache_unregister_netfs(struct fscache_netfs *netfs);
|
||||||
|
|
||||||
|
|
||||||
|
================
|
||||||
|
CACHE TAG LOOKUP
|
||||||
|
================
|
||||||
|
|
||||||
|
FS-Cache permits the use of more than one cache. To permit particular index
|
||||||
|
subtrees to be bound to particular caches, the second step is to look up cache
|
||||||
|
representation tags. This step is optional; it can be left entirely up to
|
||||||
|
FS-Cache as to which cache should be used. The problem with doing that is that
|
||||||
|
FS-Cache will always pick the first cache that was registered.
|
||||||
|
|
||||||
|
To get the representation for a named tag:
|
||||||
|
|
||||||
|
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
|
||||||
|
|
||||||
|
This takes a text string as the name and returns a representation of a tag. It
|
||||||
|
will never return an error. It may return a dummy tag, however, if it runs out
|
||||||
|
of memory; this will inhibit caching with this tag.
|
||||||
|
|
||||||
|
Any representation so obtained must be released by passing it to this function:
|
||||||
|
|
||||||
|
void fscache_release_cache_tag(struct fscache_cache_tag *tag);
|
||||||
|
|
||||||
|
The tag will be retrieved by FS-Cache when it calls the object definition
|
||||||
|
operation select_cache().
|
||||||
|
|
||||||
|
|
||||||
|
==================
|
||||||
|
INDEX REGISTRATION
|
||||||
|
==================
|
||||||
|
|
||||||
|
The third step is to inform FS-Cache about part of an index hierarchy that can
|
||||||
|
be used to locate files. This is done by requesting a cookie for each index in
|
||||||
|
the path to the file:
|
||||||
|
|
||||||
|
struct fscache_cookie *
|
||||||
|
fscache_acquire_cookie(struct fscache_cookie *parent,
|
||||||
|
const struct fscache_cookie_def *def,
|
||||||
|
void *netfs_data);
|
||||||
|
|
||||||
|
This function creates an index entry in the index represented by parent,
|
||||||
|
filling in the index entry by calling the operations pointed to by def.
|
||||||
|
|
||||||
|
Note that this function never returns an error - all errors are handled
|
||||||
|
internally. It may, however, return NULL to indicate no cookie. It is quite
|
||||||
|
acceptable to pass this token back to this function as the parent to another
|
||||||
|
acquisition (or even to the relinquish cookie, read page and write page
|
||||||
|
functions - see below).
|
||||||
|
|
||||||
|
Note also that no indices are actually created in a cache until a non-index
|
||||||
|
object needs to be created somewhere down the hierarchy. Furthermore, an index
|
||||||
|
may be created in several different caches independently at different times.
|
||||||
|
This is all handled transparently, and the netfs doesn't see any of it.
|
||||||
|
|
||||||
|
For example, with AFS, a cell would be added to the primary index. This index
|
||||||
|
entry would have a dependent inode containing a volume location index for the
|
||||||
|
volume mappings within this cell:
|
||||||
|
|
||||||
|
cell->cache =
|
||||||
|
fscache_acquire_cookie(afs_cache_netfs.primary_index,
|
||||||
|
&afs_cell_cache_index_def,
|
||||||
|
cell);
|
||||||
|
|
||||||
|
Then when a volume location was accessed, it would be entered into the cell's
|
||||||
|
index and an inode would be allocated that acts as a volume type and hash chain
|
||||||
|
combination:
|
||||||
|
|
||||||
|
vlocation->cache =
|
||||||
|
fscache_acquire_cookie(cell->cache,
|
||||||
|
&afs_vlocation_cache_index_def,
|
||||||
|
vlocation);
|
||||||
|
|
||||||
|
And then a particular flavour of volume (R/O for example) could be added to
|
||||||
|
that index, creating another index for vnodes (AFS inode equivalents):
|
||||||
|
|
||||||
|
volume->cache =
|
||||||
|
fscache_acquire_cookie(vlocation->cache,
|
||||||
|
&afs_volume_cache_index_def,
|
||||||
|
volume);
|
||||||
|
|
||||||
|
|
||||||
|
======================
|
||||||
|
DATA FILE REGISTRATION
|
||||||
|
======================
|
||||||
|
|
||||||
|
The fourth step is to request a data file be created in the cache. This is
|
||||||
|
identical to index cookie acquisition. The only difference is that the type in
|
||||||
|
the object definition should be something other than index type.
|
||||||
|
|
||||||
|
vnode->cache =
|
||||||
|
fscache_acquire_cookie(volume->cache,
|
||||||
|
&afs_vnode_cache_object_def,
|
||||||
|
vnode);
|
||||||
|
|
||||||
|
|
||||||
|
=================================
|
||||||
|
MISCELLANEOUS OBJECT REGISTRATION
|
||||||
|
=================================
|
||||||
|
|
||||||
|
An optional step is to request an object of miscellaneous type be created in
|
||||||
|
the cache. This is almost identical to index cookie acquisition. The only
|
||||||
|
difference is that the type in the object definition should be something other
|
||||||
|
than index type. Whilst the parent object could be an index, it's more likely
|
||||||
|
it would be some other type of object such as a data file.
|
||||||
|
|
||||||
|
xattr->cache =
|
||||||
|
fscache_acquire_cookie(vnode->cache,
|
||||||
|
&afs_xattr_cache_object_def,
|
||||||
|
xattr);
|
||||||
|
|
||||||
|
Miscellaneous objects might be used to store extended attributes or directory
|
||||||
|
entries for example.
|
||||||
|
|
||||||
|
|
||||||
|
==========================
|
||||||
|
SETTING THE DATA FILE SIZE
|
||||||
|
==========================
|
||||||
|
|
||||||
|
The fifth step is to set the physical attributes of the file, such as its size.
|
||||||
|
This doesn't automatically reserve any space in the cache, but permits the
|
||||||
|
cache to adjust its metadata for data tracking appropriately:
|
||||||
|
|
||||||
|
int fscache_attr_changed(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
|
The cache will return -ENOBUFS if there is no backing cache or if there is no
|
||||||
|
space to allocate any extra metadata required in the cache. The attributes
|
||||||
|
will be accessed with the get_attr() cookie definition operation.
|
||||||
|
|
||||||
|
Note that attempts to read or write data pages in the cache over this size may
|
||||||
|
be rebuffed with -ENOBUFS.
|
||||||
|
|
||||||
|
This operation schedules an attribute adjustment to happen asynchronously at
|
||||||
|
some point in the future, and as such, it may happen after the function returns
|
||||||
|
to the caller. The attribute adjustment excludes read and write operations.
|
||||||
|
|
||||||
|
|
||||||
|
=====================
|
||||||
|
PAGE READ/ALLOC/WRITE
|
||||||
|
=====================
|
||||||
|
|
||||||
|
And the sixth step is to store and retrieve pages in the cache. There are
|
||||||
|
three functions that are used to do this.
|
||||||
|
|
||||||
|
Note:
|
||||||
|
|
||||||
|
(1) A page should not be re-read or re-allocated without uncaching it first.
|
||||||
|
|
||||||
|
(2) A read or allocated page must be uncached when the netfs page is released
|
||||||
|
from the pagecache.
|
||||||
|
|
||||||
|
(3) A page should only be written to the cache if previously read or allocated.
|
||||||
|
|
||||||
|
This permits the cache to maintain its page tracking in proper order.
|
||||||
|
|
||||||
|
|
||||||
|
PAGE READ
|
||||||
|
---------
|
||||||
|
|
||||||
|
Firstly, the netfs should ask FS-Cache to examine the caches and read the
|
||||||
|
contents cached for a particular page of a particular file if present, or else
|
||||||
|
allocate space to store the contents if not:
|
||||||
|
|
||||||
|
typedef
|
||||||
|
void (*fscache_rw_complete_t)(struct page *page,
|
||||||
|
void *context,
|
||||||
|
int error);
|
||||||
|
|
||||||
|
int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
|
||||||
|
struct page *page,
|
||||||
|
fscache_rw_complete_t end_io_func,
|
||||||
|
void *context,
|
||||||
|
gfp_t gfp);
|
||||||
|
|
||||||
|
The cookie argument must specify a cookie for an object that isn't an index,
|
||||||
|
the page specified will have the data loaded into it (and is also used to
|
||||||
|
specify the page number), and the gfp argument is used to control how any
|
||||||
|
memory allocations made are satisfied.
|
||||||
|
|
||||||
|
If the cookie indicates the inode is not cached:
|
||||||
|
|
||||||
|
(1) The function will return -ENOBUFS.
|
||||||
|
|
||||||
|
Else if there's a copy of the page resident in the cache:
|
||||||
|
|
||||||
|
(1) The mark_pages_cached() cookie operation will be called on that page.
|
||||||
|
|
||||||
|
(2) The function will submit a request to read the data from the cache's
|
||||||
|
backing device directly into the page specified.
|
||||||
|
|
||||||
|
(3) The function will return 0.
|
||||||
|
|
||||||
|
(4) When the read is complete, end_io_func() will be invoked with:
|
||||||
|
|
||||||
|
(*) The netfs data supplied when the cookie was created.
|
||||||
|
|
||||||
|
(*) The page descriptor.
|
||||||
|
|
||||||
|
(*) The context argument passed to the above function. This will be
|
||||||
|
maintained with the get_context/put_context functions mentioned above.
|
||||||
|
|
||||||
|
(*) An argument that's 0 on success or negative for an error code.
|
||||||
|
|
||||||
|
If an error occurs, it should be assumed that the page contains no usable
|
||||||
|
data.
|
||||||
|
|
||||||
|
end_io_func() will be called in process context if the read results in
|
||||||
|
an error, but it might be called in interrupt context if the read is
|
||||||
|
successful.
|
||||||
|
|
||||||
|
Otherwise, if there's not a copy available in cache, but the cache may be able
|
||||||
|
to store the page:
|
||||||
|
|
||||||
|
(1) The mark_pages_cached() cookie operation will be called on that page.
|
||||||
|
|
||||||
|
(2) A block may be reserved in the cache and attached to the object at the
|
||||||
|
appropriate place.
|
||||||
|
|
||||||
|
(3) The function will return -ENODATA.
|
||||||
|
|
||||||
|
This function may also return -ENOMEM or -EINTR, in which case it won't have
|
||||||
|
read any data from the cache.
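The return values listed above could be sorted into cases along these lines.
This is a userspace sketch only: demo_classify_read_result() and the
DEMO_* names are hypothetical, and what a netfs does in each case is left to
the netfs.

```c
#include <errno.h>

/* Hypothetical labels for the outcomes described in the text. */
enum demo_read_result {
	DEMO_READ_DISPATCHED,	/* 0: read submitted, end_io_func() will run */
	DEMO_ALLOCATED_NO_DATA,	/* -ENODATA: page marked, nothing to read */
	DEMO_NOT_CACHED,	/* -ENOBUFS: the inode is not cached */
	DEMO_ERROR,		/* -ENOMEM, -EINTR or anything else */
};

/* Map a return value of the read-or-alloc call onto one of the cases. */
enum demo_read_result demo_classify_read_result(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_READ_DISPATCHED;
	case -ENODATA:
		return DEMO_ALLOCATED_NO_DATA;
	case -ENOBUFS:
		return DEMO_NOT_CACHED;
	default:
		return DEMO_ERROR;
	}
}
```

In the first two cases the page has been marked and must eventually be
uncached; in the last two no data has been read from the cache.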
|
||||||
|
|
||||||
|
|
||||||
|
PAGE ALLOCATE
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Alternatively, if there's not expected to be any data in the cache for a page
|
||||||
|
because the file has been extended, a block can simply be allocated instead:
|
||||||
|
|
||||||
|
int fscache_alloc_page(struct fscache_cookie *cookie,
|
||||||
|
struct page *page,
|
||||||
|
gfp_t gfp);
|
||||||
|
|
||||||
|
This is similar to the fscache_read_or_alloc_page() function, except that it
|
||||||
|
never reads from the cache. It will return 0 if a block has been allocated,
|
||||||
|
rather than -ENODATA as the other would. One or the other must be performed
|
||||||
|
before writing to the cache.
|
||||||
|
|
||||||
|
The mark_pages_cached() cookie operation will be called on the page if
|
||||||
|
successful.
|
||||||
|
|
||||||
|
|
||||||
|
PAGE WRITE
|
||||||
|
----------
|
||||||
|
|
||||||
|
Secondly, if the netfs changes the contents of the page (either due to an
|
||||||
|
initial download or if a user performs a write), then the page should be
|
||||||
|
written back to the cache:
|
||||||
|
|
||||||
|
int fscache_write_page(struct fscache_cookie *cookie,
|
||||||
|
struct page *page,
|
||||||
|
gfp_t gfp);
|
||||||
|
|
||||||
|
The cookie argument must specify a data file cookie, the page specified should
|
||||||
|
contain the data to be written (and is also used to specify the page number),
|
||||||
|
and the gfp argument is used to control how any memory allocations made are
|
||||||
|
satisfied.
|
||||||
|
|
||||||
|
The page must have first been read or allocated successfully and must not have
|
||||||
|
been uncached before writing is performed.
|
||||||
|
|
||||||
|
If the cookie indicates the inode is not cached then:
|
||||||
|
|
||||||
|
(1) The function will return -ENOBUFS.
|
||||||
|
|
||||||
|
Else if space can be allocated in the cache to hold this page:
|
||||||
|
|
||||||
|
(1) PG_fscache_write will be set on the page.
|
||||||
|
|
||||||
|
(2) The function will submit a request to write the data to the cache's backing
|
||||||
|
device directly from the page specified.
|
||||||
|
|
||||||
|
(3) The function will return 0.
|
||||||
|
|
||||||
|
(4) When the write is complete PG_fscache_write is cleared on the page and
|
||||||
|
anyone waiting for that bit will be woken up.
|
||||||
|
|
||||||
|
Else if there's no space available in the cache, -ENOBUFS will be returned. It
|
||||||
|
is also possible for the PG_fscache_write bit to be cleared when no write took
|
||||||
|
place if unforeseen circumstances arose (such as a disk error).
|
||||||
|
|
||||||
|
Writing takes place asynchronously.
|
||||||
|
|
||||||
|
|
||||||
|
MULTIPLE PAGE READ
|
||||||
|
------------------
|
||||||
|
|
||||||
|
A facility is provided to read several pages at once, as requested by the
|
||||||
|
readpages() address space operation:
|
||||||
|
|
||||||
|
int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
|
||||||
|
struct address_space *mapping,
|
||||||
|
struct list_head *pages,
|
||||||
|
int *nr_pages,
|
||||||
|
fscache_rw_complete_t end_io_func,
|
||||||
|
void *context,
|
||||||
|
gfp_t gfp);
|
||||||
|
|
||||||
|
This works in a similar way to fscache_read_or_alloc_page(), except:
|
||||||
|
|
||||||
|
(1) Any page it can retrieve data for is removed from pages and nr_pages and
|
||||||
|
dispatched for reading to the disk. Reads of adjacent pages on disk may
|
||||||
|
be merged for greater efficiency.
|
||||||
|
|
||||||
|
(2) The mark_pages_cached() cookie operation will be called on several pages
|
||||||
|
at once if they're being read or allocated.
|
||||||
|
|
||||||
|
(3) If there was a general error, then that error will be returned.
|
||||||
|
|
||||||
|
Else if some pages couldn't be allocated or read, then -ENOBUFS will be
|
||||||
|
returned.
|
||||||
|
|
||||||
|
Else if some pages couldn't be read but were allocated, then -ENODATA will
|
||||||
|
be returned.
|
||||||
|
|
||||||
|
Otherwise, if all pages had reads dispatched, then 0 will be returned, the
|
||||||
|
list will be empty and *nr_pages will be 0.
|
||||||
|
|
||||||
|
(4) end_io_func will be called once for each page being read as the reads
|
||||||
|
complete. It will be called in process context if error != 0, but it may
|
||||||
|
be called in interrupt context if there is no error.
|
||||||
|
|
||||||
|
Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
|
||||||
|
some of the pages being read and some being allocated. Those pages will have
|
||||||
|
been marked appropriately and will need uncaching.
|
||||||
|
|
||||||
|
|
||||||
|
==============
|
||||||
|
PAGE UNCACHING
|
||||||
|
==============
|
||||||
|
|
||||||
|
To uncache a page, this function should be called:
|
||||||
|
|
||||||
|
void fscache_uncache_page(struct fscache_cookie *cookie,
|
||||||
|
struct page *page);
|
||||||
|
|
||||||
|
This function permits the cache to release any in-memory representation it
|
||||||
|
might be holding for this netfs page. This function must be called once for
|
||||||
|
each page on which the read or write page functions above have been called to
|
||||||
|
make sure the cache's in-memory tracking information gets torn down.
|
||||||
|
|
||||||
|
Note that pages can't be explicitly deleted from a data file. The whole
|
||||||
|
data file must be retired (see the relinquish cookie function below).
|
||||||
|
|
||||||
|
Furthermore, note that this does not cancel the asynchronous read or write
|
||||||
|
operation started by the read/alloc and write functions, so the page
|
||||||
|
invalidation and release functions must use:
|
||||||
|
|
||||||
|
bool fscache_check_page_write(struct fscache_cookie *cookie,
|
||||||
|
struct page *page);
|
||||||
|
|
||||||
|
to see if a page is being written to the cache, and:
|
||||||
|
|
||||||
|
void fscache_wait_on_page_write(struct fscache_cookie *cookie,
|
||||||
|
struct page *page);
|
||||||
|
|
||||||
|
to wait for it to finish if it is.
|
||||||
|
|
||||||
|
|
||||||
|
==========================
|
||||||
|
INDEX AND DATA FILE UPDATE
|
||||||
|
==========================
|
||||||
|
|
||||||
|
To request an update of the index data for an index or other object, the
|
||||||
|
following function should be called:
|
||||||
|
|
||||||
|
void fscache_update_cookie(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
|
This function will refer back to the netfs_data pointer stored in the cookie by
|
||||||
|
the acquisition function to obtain the data to write into each revised index
|
||||||
|
entry. The update method in the parent index definition will be called to
|
||||||
|
transfer the data.
|
||||||
|
|
||||||
|
Note that partial updates may happen automatically at other times, such as when
|
||||||
|
data blocks are added to a data file object.
|
||||||
|
|
||||||
|
|
||||||
|
===============================
|
||||||
|
MISCELLANEOUS COOKIE OPERATIONS
|
||||||
|
===============================
|
||||||
|
|
||||||
|
There are a number of operations that can be used to control cookies:
|
||||||
|
|
||||||
|
(*) Cookie pinning:
|
||||||
|
|
||||||
|
int fscache_pin_cookie(struct fscache_cookie *cookie);
|
||||||
|
void fscache_unpin_cookie(struct fscache_cookie *cookie);
|
||||||
|
|
||||||
|
These operations permit data cookies to be pinned into the cache and to
|
||||||
|
have the pinning removed. They are not permitted on index cookies.
|
||||||
|
|
||||||
|
The pinning function will return 0 if successful, -ENOBUFS if the cookie
|
||||||
|
isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
|
||||||
|
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
|
||||||
|
-EIO if there's any other problem.
|
||||||
|
|
||||||
|
(*) Data space reservation:
|
||||||
|
|
||||||
|
int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
|
||||||
|
|
||||||
|
This permits a netfs to request cache space be reserved to store up to the
|
||||||
|
given amount of a file. It is permitted to ask for more than the current
|
||||||
|
size of the file to allow for future file expansion.
|
||||||
|
|
||||||
|
If size is given as zero then the reservation will be cancelled.
|
||||||
|
|
||||||
|
The function will return 0 if successful, -ENOBUFS if the cookie isn't
|
||||||
|
backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
|
||||||
|
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
|
||||||
|
-EIO if there's any other problem.
|
||||||
|
|
||||||
|
Note that this doesn't pin an object in a cache; it can still be culled to
|
||||||
|
make space if it's not in use.
|
||||||
|
|
||||||
|
|
||||||
|
=====================
|
||||||
|
COOKIE UNREGISTRATION
|
||||||
|
=====================
|
||||||
|
|
||||||
|
To get rid of a cookie, this function should be called.
|
||||||
|
|
||||||
|
void fscache_relinquish_cookie(struct fscache_cookie *cookie,
|
||||||
|
int retire);
|
||||||
|
|
||||||
|
If retire is non-zero, then the object will be marked for recycling, and all
|
||||||
|
copies of it will be removed from all active caches in which it is present.
|
||||||
|
Not only that but all child objects will also be retired.
|
||||||
|
|
||||||
|
If retire is zero, then the object may be available again when next the
|
||||||
|
acquisition function is called. Retirement here will overrule the pinning on a
|
||||||
|
cookie.
|
||||||
|
|
||||||
|
One very important note - relinquish must NOT be called for a cookie unless all
|
||||||
|
the cookies for "child" indices, objects and pages have been relinquished
|
||||||
|
first.
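The ordering rule above amounts to a post-order walk: children are released
before their parent. The sketch below shows that ordering in userspace C.
struct demo_cookie and demo_relinquish_tree() are hypothetical stand-ins, the
relinquished flag takes the place of the real relinquish call, and recursion is
used only for brevity.

```c
#include <stddef.h>

#define DEMO_MAX_CHILDREN 4

/* Hypothetical stand-in for a cookie and the cookies acquired beneath it. */
struct demo_cookie {
	const char *name;
	struct demo_cookie *children[DEMO_MAX_CHILDREN];
	size_t nr_children;
	int relinquished;
};

/*
 * Relinquish every child before its parent (post-order).  'order' receives
 * the names in the sequence they were released, starting at index 'count';
 * the updated count is returned.
 */
size_t demo_relinquish_tree(struct demo_cookie *cookie,
			    const char **order, size_t count)
{
	size_t i;

	for (i = 0; i < cookie->nr_children; i++)
		count = demo_relinquish_tree(cookie->children[i],
					     order, count);

	cookie->relinquished = 1;	/* stand-in for the relinquish call */
	order[count++] = cookie->name;
	return count;
}
```

For a cell, volume, vnode chain like the AFS examples earlier, the vnode entry
is released first and the cell entry last.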
|
||||||
|
|
||||||
|
|
||||||
|
================================
|
||||||
|
INDEX AND DATA FILE INVALIDATION
|
||||||
|
================================
|
||||||
|
|
||||||
|
There is no direct way to invalidate an index subtree or a data file. To do
|
||||||
|
this, the caller should relinquish and retire the cookie they have, and then
|
||||||
|
acquire a new one.
|
||||||
|
|
||||||
|
|
||||||
|
===========================
|
||||||
|
FS-CACHE SPECIFIC PAGE FLAG
|
||||||
|
===========================
|
||||||
|
|
||||||
|
FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
|
||||||
|
given the alternative name PG_fscache.
|
||||||
|
|
||||||
|
PG_fscache is used to indicate that the page is known by the cache, and that
|
||||||
|
the cache must be informed if the page is going to go away. It's an indication
|
||||||
|
to the netfs that the cache has an interest in this page, where an interest may
|
||||||
|
be a pointer to it, resources allocated or reserved for it, or I/O in progress
|
||||||
|
upon it.
|
||||||
|
|
||||||
|
The netfs can use this information in methods such as releasepage() to
|
||||||
|
determine whether it needs to uncache a page or update it.
|
||||||
|
|
||||||
|
Furthermore, if this bit is set, releasepage() and invalidatepage() operations
|
||||||
|
will be called on a page to get rid of it, even if PG_private is not set. This
|
||||||
|
allows caching to be attempted on a page before read_cache_pages() is called
|
||||||
|
after fscache_read_or_alloc_pages() as the former will try and release pages it
|
||||||
|
was given under certain circumstances.
|
||||||
|
|
||||||
|
This bit does not overlap with other page flags such as PG_private. This means that FS-Cache
|
||||||
|
can be used with a filesystem that uses the block buffering code.
|
||||||
|
|
||||||
|
There are a number of operations defined on this flag:
|
||||||
|
|
||||||
|
int PageFsCache(struct page *page);
|
||||||
|
void SetPageFsCache(struct page *page)
|
||||||
|
void ClearPageFsCache(struct page *page)
|
||||||
|
int TestSetPageFsCache(struct page *page)
|
||||||
|
int TestClearPageFsCache(struct page *page)
|
||||||
|
|
||||||
|
These functions are bit test, bit set, bit clear, bit test and set and bit
|
||||||
|
test and clear operations on PG_fscache.
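The semantics of the five operations can be sketched in user space as follows.
This is an illustration only: struct toy_page, the bit number TOY_PG_FSCACHE
and the toy_*() helpers are hypothetical, and unlike the kernel's page flag
operations these helpers are not atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit number and page struct, for illustration only. */
#define TOY_PG_FSCACHE 3

struct toy_page {
	unsigned long flags;
};

/* Bit test: returns 1 if the flag is set, 0 otherwise. */
int toy_page_fscache(const struct toy_page *page)
{
	assert(page != NULL);
	return (int)((page->flags >> TOY_PG_FSCACHE) & 1UL);
}

/* Bit set. */
void toy_set_page_fscache(struct toy_page *page)
{
	page->flags |= 1UL << TOY_PG_FSCACHE;
}

/* Bit clear. */
void toy_clear_page_fscache(struct toy_page *page)
{
	page->flags &= ~(1UL << TOY_PG_FSCACHE);
}

/* Bit test and set: sets the flag and returns its previous value. */
int toy_test_set_page_fscache(struct toy_page *page)
{
	int old = toy_page_fscache(page);

	toy_set_page_fscache(page);
	return old;
}

/* Bit test and clear: clears the flag and returns its previous value. */
int toy_test_clear_page_fscache(struct toy_page *page)
{
	int old = toy_page_fscache(page);

	toy_clear_page_fscache(page);
	return old;
}
```

The test-and-set and test-and-clear forms report the old value, so a caller
can tell whether it was the one that changed the flag.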
|
313
Documentation/filesystems/caching/object.txt
Normal file
@@ -0,0 +1,313 @@
|
|||||||
|
====================================================
|
||||||
|
IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
|
||||||
|
====================================================
|
||||||
|
|
||||||
|
By: David Howells <dhowells@redhat.com>
|
||||||
|
|
||||||
|
Contents:
|
||||||
|
|
||||||
|
(*) Representation
|
||||||
|
|
||||||
|
(*) Object management state machine.
|
||||||
|
|
||||||
|
- Provision of cpu time.
|
||||||
|
- Locking simplification.
|
||||||
|
|
||||||
|
(*) The set of states.
|
||||||
|
|
||||||
|
(*) The set of events.
|
||||||
|
|
||||||
|
|
||||||
|
==============
|
||||||
|
REPRESENTATION
|
||||||
|
==============
|
||||||
|
|
||||||
|
FS-Cache maintains an in-kernel representation of each object that a netfs is
|
||||||
|
currently interested in. Such objects are represented by the fscache_cookie
|
||||||
|
struct and are referred to as cookies.
|
||||||
|
|
||||||
|
FS-Cache also maintains a separate in-kernel representation of the objects that
|
||||||
|
a cache backend is currently actively caching. Such objects are represented by
|
||||||
|
the fscache_object struct. The cache backends allocate these upon request, and
|
||||||
|
are expected to embed them in their own representations. These are referred to
|
||||||
|
as objects.
|
||||||
|
|
||||||
|
There is a 1:N relationship between cookies and objects. A cookie may be
|
||||||
|
represented by multiple objects - an index may exist in more than one cache -
|
||||||
|
or even by no objects (it may not be cached).
|
||||||
|
|
||||||
|
Furthermore, both cookies and objects are hierarchical. The two hierarchies
|
||||||
|
correspond, but the cookies tree is a superset of the union of the object trees
|
||||||
|
of multiple caches:
|
||||||
|
|
||||||
|
NETFS INDEX TREE : CACHE 1 : CACHE 2
|
||||||
|
: :
|
||||||
|
: +-----------+ :
|
||||||
|
+----------->| IObject | :
|
||||||
|
+-----------+ | : +-----------+ :
|
||||||
|
| ICookie |-------+ : | :
|
||||||
|
+-----------+ | : | : +-----------+
|
||||||
|
| +------------------------------>| IObject |
|
||||||
|
| : | : +-----------+
|
||||||
|
| : V : |
|
||||||
|
| : +-----------+ : |
|
||||||
|
V +----------->| IObject | : |
|
||||||
|
+-----------+ | : +-----------+ : |
|
||||||
|
| ICookie |-------+ : | : V
|
||||||
|
+-----------+ | : | : +-----------+
|
||||||
|
| +------------------------------>| IObject |
|
||||||
|
+-----+-----+ : | : +-----------+
|
||||||
|
| | : | : |
|
||||||
|
V | : V : |
|
||||||
|
+-----------+ | : +-----------+ : |
|
||||||
|
| ICookie |------------------------->| IObject | : |
|
||||||
|
+-----------+ | : +-----------+ : |
|
||||||
|
| V : | : V
|
||||||
|
| +-----------+ : | : +-----------+
|
||||||
|
| | ICookie |-------------------------------->| IObject |
|
||||||
|
| +-----------+ : | : +-----------+
|
||||||
|
V | : V : |
|
||||||
|
+-----------+ | : +-----------+ : |
|
||||||
|
| DCookie |------------------------->| DObject | : |
|
||||||
|
+-----------+ | : +-----------+ : |
|
||||||
|
| : : |
|
||||||
|
+-------+-------+ : : |
|
||||||
|
| | : : |
|
||||||
|
V V : : V
|
||||||
|
+-----------+ +-----------+ : : +-----------+
|
||||||
|
| DCookie | | DCookie |------------------------>| DObject |
|
||||||
|
+-----------+ +-----------+ : : +-----------+
|
||||||
|
: :
|
||||||
|
|
||||||
|
In the above illustration, ICookie and IObject represent indices and DCookie
|
||||||
|
and DObject represent data storage objects. Indices may have representation in
|
||||||
|
multiple caches, but currently, non-index objects may not. Objects of any type
|
||||||
|
may also be entirely unrepresented.
|
||||||
|
|
||||||
|
As far as the netfs API goes, the netfs is only actually permitted to see
|
||||||
|
pointers to the cookies. The cookies themselves and any objects attached to
|
||||||
|
those cookies are hidden from it.
|
||||||
|
|
||||||
|
|
||||||
|
===============================
|
||||||
|
OBJECT MANAGEMENT STATE MACHINE
|
||||||
|
===============================
|
||||||
|
|
||||||
|
Within FS-Cache, each active object is managed by its own individual state
|
||||||
|
machine. The state for an object is kept in the fscache_object struct, in
|
||||||
|
object->state. A cookie may point to a set of objects that are in different
|
||||||
|
states.
|
||||||
|
|
||||||
|
Each state has an action associated with it that is invoked when the machine
|
||||||
|
wakes up in that state. There are four logical sets of states:
|
||||||
|
|
||||||
|
(1) Preparation: states that wait for the parent objects to become ready. The
|
||||||
|
representations are hierarchical, and it is expected that an object must
|
||||||
|
be created or accessed with respect to its parent object.
|
||||||
|
|
||||||
|
(2) Initialisation: states that perform lookups in the cache and validate
|
||||||
|
what's found and that create on disk any missing metadata.
|
||||||
|
|
||||||
|
(3) Normal running: states that allow netfs operations on objects to proceed
|
||||||
|
and that update the state of objects.
|
||||||
|
|
||||||
|
(4) Termination: states that detach objects from their netfs cookies, that
|
||||||
|
delete objects from disk, that handle disk and system errors and that free
|
||||||
|
up in-memory resources.
|
||||||
|
|
||||||
|
|
||||||
|
In most cases, transitioning between states is in response to signalled events.
|
||||||
|
When a state has finished processing, it will usually set the mask of events in
|
||||||
|
which it is interested (object->event_mask) and relinquish the worker thread.
|
||||||
|
Then when an event is raised (by calling fscache_raise_event()), if the event
|
||||||
|
is not masked, the object will be queued for processing (by calling
|
||||||
|
fscache_enqueue_object()).
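The interaction between raised events and the event mask can be modelled in
user space. The sketch below is an assumption-laden illustration, not the
kernel code: struct toy_object, the toy_event numbers and toy_raise_event()
are hypothetical, a bit set in event_mask is taken to mean "the current state
is interested in this event", and queuing is reduced to a counter. It also
assumes that an event which is already pending does not queue the object a
second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event numbers; the real ones are FSCACHE_OBJECT_EV_*. */
enum toy_event { TOY_EV_UPDATE, TOY_EV_CLEARED, TOY_EV_ERROR };

struct toy_object {
	unsigned long events;		/* events raised and not yet handled */
	unsigned long event_mask;	/* events the current state cares about */
	int times_queued;		/* stands in for queuing on a worker */
};

/* Returns 1 if raising the event caused the object to be queued. */
int toy_raise_event(struct toy_object *obj, enum toy_event ev)
{
	unsigned long bit = 1UL << ev;
	int was_pending;

	assert(obj != NULL);
	was_pending = (obj->events & bit) != 0;

	obj->events |= bit;	/* the event is always recorded */
	if (!was_pending && (obj->event_mask & bit)) {
		obj->times_queued++;
		return 1;
	}
	return 0;
}
```

An event outside the mask is remembered in obj->events but does not wake the
state machine, so a later state that widens its mask can still observe it.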
|
||||||
|
|
||||||
|
|
||||||
|
PROVISION OF CPU TIME
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
The work to be done by the various states is given CPU time by the threads of
|
||||||
|
the slow work facility (see Documentation/slow-work.txt). This is used in
|
||||||
|
preference to the workqueue facility because:
|
||||||
|
|
||||||
|
(1) Threads may be completely occupied for very long periods of time by a
|
||||||
|
particular work item. These state actions may be doing sequences of
|
||||||
|
synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
|
||||||
|
getxattr, truncate, unlink, rmdir, rename).
|
||||||
|
|
||||||
|
(2) Threads may do little actual work, but may rather spend a lot of time
|
||||||
|
sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded
|
||||||
|
workqueues don't necessarily have the right numbers of threads.
|
||||||
|
|
||||||
|
|
||||||
|
LOCKING SIMPLIFICATION
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
Because only one worker thread may be operating on any particular object's
|
||||||
|
state machine at once, this simplifies the locking, particularly with respect
|
||||||
|
to disconnecting the netfs's representation of a cache object (fscache_cookie)
|
||||||
|
from the cache backend's representation (fscache_object) - which may be
|
||||||
|
requested from either end.
|
||||||
|
|
||||||
|
|
||||||
|
=================
|
||||||
|
THE SET OF STATES
|
||||||
|
=================
|
||||||
|
|
||||||
|
The object state machine has a set of states that it can be in. There are
|
||||||
|
preparation states in which the object sets itself up and waits for its parent
|
||||||
|
object to transit to a state that allows access to its children:
|
||||||
|
|
||||||
|
(1) State FSCACHE_OBJECT_INIT.
|
||||||
|
|
||||||
|
Initialise the object and wait for the parent object to become active. In
|
||||||
|
the cache, it is expected that it will not be possible to look an object
|
||||||
|
up from the parent object, until that parent object itself has been looked
|
||||||
|
up.
|
||||||
|
|
||||||
|
There are initialisation states in which the object sets itself up and accesses
|
||||||
|
disk for the object metadata:
|
||||||
|
|
||||||
|
(2) State FSCACHE_OBJECT_LOOKING_UP.
|
||||||
|
|
||||||
|
Look up the object on disk, using the parent as a starting point.
|
||||||
|
FS-Cache expects the cache backend to probe the cache to see whether this
|
||||||
|
object is represented there, and if it is, to see if it's valid (coherency
|
||||||
|
management).
|
||||||
|
|
||||||
|
The cache should call fscache_object_lookup_negative() to indicate lookup
|
||||||
|
failure for whatever reason, and should call fscache_obtained_object() to
|
||||||
|
indicate success.
|
||||||
|
|
||||||
|
At the completion of lookup, FS-Cache will let the netfs go ahead with
|
||||||
|
read operations, no matter whether the file is yet cached. If not yet
|
||||||
|
cached, read operations will be immediately rejected with ENODATA until
|
||||||
|
the first known page is uncached - as up to that point there can be no data
|
||||||
|
to be read out of the cache for that file that isn't currently also held
|
||||||
|
in the pagecache.
|
||||||
|
|
||||||
|
(3) State FSCACHE_OBJECT_CREATING.
|
||||||
|
|
||||||
|
Create an object on disk, using the parent as a starting point. This
|
||||||
|
happens if the lookup failed to find the object, or if the object's
|
||||||
|
coherency data indicated what's on disk is out of date. In this state,
|
||||||
|
FS-Cache expects the cache to create the object.
|
||||||
|
|
||||||
|
The cache should call fscache_obtained_object() if creation completes
|
||||||
|
successfully, fscache_object_lookup_negative() otherwise.
|
||||||
|
|
||||||
|
At the completion of creation, FS-Cache will start processing write
|
||||||
|
operations the netfs has queued for an object. If creation failed, the
|
||||||
|
write ops will be transparently discarded, and nothing recorded in the
|
||||||
|
cache.
|
||||||
|
|
||||||
|
There are some normal running states in which the object spends its time
|
||||||
|
servicing netfs requests:
|
||||||
|
|
||||||
|
(4) State FSCACHE_OBJECT_AVAILABLE.
|
||||||
|
|
||||||
|
A transient state in which pending operations are started, child objects
|
||||||
|
are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
|
||||||
|
lookup data is freed.
|
||||||
|
|
||||||
|
(5) State FSCACHE_OBJECT_ACTIVE.
|
||||||
|
|
||||||
|
The normal running state. In this state, requests the netfs makes will be
|
||||||
|
passed on to the cache.
|
||||||
|
|
||||||
|
(6) State FSCACHE_OBJECT_UPDATING.
|
||||||
|
|
||||||
|
The state machine comes here to update the object in the cache from the
|
||||||
|
netfs's records. This involves updating the auxiliary data that is used
|
||||||
|
to maintain coherency.
|
||||||
|
|
||||||
|
And there are terminal states in which an object cleans itself up, deallocates
|
||||||
|
memory and potentially deletes stuff from disk:
|
||||||
|
|
||||||
|
(7) State FSCACHE_OBJECT_LC_DYING.
|
||||||
|
|
||||||
|
The object comes here if it is dying because of a lookup or creation
|
||||||
|
error. This would be due to a disk error or system error of some sort.
|
||||||
|
Temporary data is cleaned up, and the parent is released.
|
||||||
|
|
||||||
|
(8) State FSCACHE_OBJECT_DYING.
|
||||||
|
|
||||||
|
The object comes here if it is dying due to an error, because its parent
|
||||||
|
cookie has been relinquished by the netfs or because the cache is being
|
||||||
|
withdrawn.
|
||||||
|
|
||||||
|
Any child objects waiting on this one are given CPU time so that they too
|
||||||
|
can destroy themselves. This object waits for all its children to go away
|
||||||
|
before advancing to the next state.
|
||||||
|
|
||||||
|
(9) State FSCACHE_OBJECT_ABORT_INIT.
|
||||||
|
|
||||||
|
The object comes to this state if it was waiting on its parent in
|
||||||
|
FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself
|
||||||
|
so that the parent may proceed from the FSCACHE_OBJECT_DYING state.
|
||||||
|
|
||||||
|
(10) State FSCACHE_OBJECT_RELEASING.
|
||||||
|
(11) State FSCACHE_OBJECT_RECYCLING.
|
||||||
|
|
||||||
|
The object comes to one of these two states when dying once it is rid of
|
||||||
|
all its children, if it is dying because the netfs relinquished its
|
||||||
|
cookie. In the first state, the cached data is expected to persist, and
|
||||||
|
in the second it will be deleted.
|
||||||
|
|
||||||
|
(12) State FSCACHE_OBJECT_WITHDRAWING.
|
||||||
|
|
||||||
|
The object transits to this state if the cache decides it wants to
|
||||||
|
withdraw the object from service, perhaps to make space, but also due to
|
||||||
|
error or just because the whole cache is being withdrawn.
|
||||||
|
|
||||||
|
(13) State FSCACHE_OBJECT_DEAD.
|
||||||
|
|
||||||
|
The object transits to this state when the in-memory object record is
|
||||||
|
ready to be deleted. The object processor shouldn't ever see an object in
|
||||||
|
this state.
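The grouping of the thirteen states into the four logical sets described
earlier can be written down as a small lookup. This is a sketch under stated
assumptions: the TOY_OBJECT_* names and toy_phase_of() are hypothetical, and
the enumeration order is simply the order in which the states are listed in
this document, which need not match the kernel's own definition.

```c
#include <assert.h>

/* Toy mirror of the states listed above, in the same order. */
enum toy_state {
	TOY_OBJECT_INIT,
	TOY_OBJECT_LOOKING_UP,
	TOY_OBJECT_CREATING,
	TOY_OBJECT_AVAILABLE,
	TOY_OBJECT_ACTIVE,
	TOY_OBJECT_UPDATING,
	TOY_OBJECT_LC_DYING,
	TOY_OBJECT_DYING,
	TOY_OBJECT_ABORT_INIT,
	TOY_OBJECT_RELEASING,
	TOY_OBJECT_RECYCLING,
	TOY_OBJECT_WITHDRAWING,
	TOY_OBJECT_DEAD,
	TOY_OBJECT_NR_STATES
};

enum toy_phase {
	TOY_PREPARATION,
	TOY_INITIALISATION,
	TOY_NORMAL_RUNNING,
	TOY_TERMINATION
};

/* Map a state onto the logical set it belongs to. */
enum toy_phase toy_phase_of(enum toy_state state)
{
	assert((int)state >= 0 && (int)state < TOY_OBJECT_NR_STATES);

	if (state == TOY_OBJECT_INIT)
		return TOY_PREPARATION;		/* state (1) */
	if (state <= TOY_OBJECT_CREATING)
		return TOY_INITIALISATION;	/* states (2)-(3) */
	if (state <= TOY_OBJECT_UPDATING)
		return TOY_NORMAL_RUNNING;	/* states (4)-(6) */
	return TOY_TERMINATION;			/* states (7)-(13) */
}
```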
|
||||||
|
|
||||||
|
|
||||||
|
THE SET OF EVENTS
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
There are a number of events that can be raised to an object state machine:
|
||||||
|
|
||||||
|
(*) FSCACHE_OBJECT_EV_UPDATE
|
||||||
|
|
||||||
|
The netfs requested that an object be updated. The state machine will ask
|
||||||
|
the cache backend to update the object, and the cache backend will ask the
|
||||||
|
netfs for details of the change through its cookie definition ops.
|
||||||
|
|
||||||
|
(*) FSCACHE_OBJECT_EV_CLEARED
|
||||||
|
|
||||||
|
This is signalled in two circumstances:
|
||||||
|
|
||||||
|
(a) when an object's last child object is dropped and
|
||||||
|
|
||||||
|
(b) when the last operation outstanding on an object is completed.
|
||||||
|
|
||||||
|
This is used to proceed from the dying state.
|
||||||
|
|
||||||
|
(*) FSCACHE_OBJECT_EV_ERROR
|
||||||
|
|
||||||
|
This is signalled when an I/O error occurs during the processing of some
|
||||||
|
object.
|
||||||
|
|
||||||
|
(*) FSCACHE_OBJECT_EV_RELEASE
|
||||||
|
(*) FSCACHE_OBJECT_EV_RETIRE
|
||||||
|
|
||||||
|
These are signalled when the netfs relinquishes a cookie it was using.
|
||||||
|
The event selected depends on whether the netfs asks for the backing
|
||||||
|
object to be retired (deleted) or retained.
|
||||||
|
|
||||||
|
(*) FSCACHE_OBJECT_EV_WITHDRAW
|
||||||
|
|
||||||
|
This is signalled when the cache backend wants to withdraw an object.
|
||||||
|
This means that the object will have to be detached from the netfs's
|
||||||
|
cookie.
|
||||||
|
|
||||||
|
Because the withdrawal and releasing/retiring events are all handled by the object
|
||||||
|
state machine, it doesn't matter if there's a collision with both ends trying
|
||||||
|
to sever the connection at the same time. The state machine can just pick
|
||||||
|
which one it wants to honour, and that effects the other.
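How a relinquishment maps onto an event, and then onto a final state, can be
sketched as below. The names are hypothetical (toy_relinquish_event() and
toy_final_state() do not exist in FS-Cache); the mapping itself follows the
text: a retiring relinquishment leads to recycling, where the cached data is
deleted, and a non-retiring one leads to releasing, where it is expected to
persist.

```c
#include <assert.h>

/* Hypothetical stand-ins for the RELEASE/RETIRE events and their states. */
enum toy_event { TOY_EV_RELEASE, TOY_EV_RETIRE };
enum toy_state { TOY_RELEASING, TOY_RECYCLING };

/* Pick the event from the 'retire' argument given at relinquishment. */
enum toy_event toy_relinquish_event(int retire)
{
	return retire ? TOY_EV_RETIRE : TOY_EV_RELEASE;
}

/* Pick the state the dying object moves to once its children are gone. */
enum toy_state toy_final_state(enum toy_event ev)
{
	assert(ev == TOY_EV_RELEASE || ev == TOY_EV_RETIRE);
	return ev == TOY_EV_RETIRE ? TOY_RECYCLING : TOY_RELEASING;
}
```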
|
213
Documentation/filesystems/caching/operations.txt
Normal file
@@ -0,0 +1,213 @@
|
|||||||
|
================================
|
||||||
|
ASYNCHRONOUS OPERATIONS HANDLING
|
||||||
|
================================
|
||||||
|
|
||||||
|
By: David Howells <dhowells@redhat.com>
|
||||||
|
|
||||||
|
Contents:
|
||||||
|
|
||||||
|
(*) Overview.
|
||||||
|
|
||||||
|
(*) Operation record initialisation.
|
||||||
|
|
||||||
|
(*) Parameters.
|
||||||
|
|
||||||
|
(*) Procedure.
|
||||||
|
|
||||||
|
(*) Asynchronous callback.
|
||||||
|
|
||||||
|
|
||||||
|
========
|
||||||
|
OVERVIEW
|
||||||
|
========
|
||||||
|
|
||||||
|
FS-Cache has an asynchronous operations handling facility that it uses for its
|
||||||
|
data storage and retrieval routines. Its operations are represented by
|
||||||
|
fscache_operation structs, though these are usually embedded into some other
|
||||||
|
structure.
|
||||||
|
|
||||||
|
This facility is available to and expected to be used by the cache backends,
|
||||||
|
and FS-Cache will create operations and pass them off to the appropriate cache
|
||||||
|
backend for completion.
|
||||||
|
|
||||||
|
To make use of this facility, <linux/fscache-cache.h> should be #included.
|
||||||
|
|
||||||
|
|
||||||
|
===============================
|
||||||
|
OPERATION RECORD INITIALISATION
|
||||||
|
===============================
|
||||||
|
|
||||||
|
An operation is recorded in an fscache_operation struct:
|
||||||
|
|
||||||
|
struct fscache_operation {
|
||||||
|
union {
|
||||||
|
struct work_struct fast_work;
|
||||||
|
struct slow_work slow_work;
|
||||||
|
};
|
||||||
|
unsigned long flags;
|
||||||
|
fscache_operation_processor_t processor;
|
||||||
|
...
|
||||||
|
};
|
||||||
|
|
||||||
|
Someone wanting to issue an operation should allocate something with this
|
||||||
|
struct embedded in it. They should initialise it by calling:
|
||||||
|
|
||||||
|
void fscache_operation_init(struct fscache_operation *op,
|
||||||
|
fscache_operation_release_t release);
|
||||||
|
|
||||||
|
with the operation to be initialised and the release function to use.
|
||||||
|
|
||||||
|
The op->flags parameter should be set to indicate the CPU time provision and
|
||||||
|
the exclusivity (see the Parameters section).
|
||||||
|
|
||||||
|
The op->fast_work, op->slow_work and op->processor members should be set as
|
||||||
|
appropriate for the CPU time provision (see the Parameters section).
|
||||||
|
|
||||||
|
FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
|
||||||
|
operation and waited for afterwards.
|
||||||
|
|
||||||
|
|
||||||
|
==========
|
||||||
|
PARAMETERS
|
||||||
|
==========
|
||||||
|
|
||||||
|
There are a number of parameters that can be set in the operation record's flag
|
||||||
|
parameter. There are three options for the provision of CPU time in these
|
||||||
|
operations:
|
||||||
|
|
||||||
|
(1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD). A thread
|
||||||
|
may decide it wants to handle an operation itself without deferring it to
|
||||||
|
another thread.
|
||||||
|
|
||||||
|
This is, for example, used in read operations for calling readpages() on
|
||||||
|
the backing filesystem in CacheFiles. Although readpages() does an
|
||||||
|
asynchronous data fetch, the determination of whether pages exist is done
|
||||||
|
synchronously - and the netfs does not proceed until this has been
|
||||||
|
determined.
|
||||||
|
|
||||||
|
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
|
||||||
|
before submitting the operation, and the operating thread must wait for it
|
||||||
|
to be cleared before proceeding:
|
||||||
|
|
||||||
|
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
|
||||||
|
fscache_wait_bit, TASK_UNINTERRUPTIBLE);
|
||||||
|
|
||||||
|
|
||||||
|
(2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
|
||||||
|
will be given to keventd to process. Such an operation is not permitted
|
||||||
|
to sleep on I/O.
|
||||||
|
|
||||||
|
This is, for example, used by CacheFiles to copy data from a backing fs
|
||||||
|
page to a netfs page after the backing fs has read the page in.
|
||||||
|
|
||||||
|
If this option is used, op->fast_work and op->processor must be
|
||||||
|
initialised before submitting the operation:
|
||||||
|
|
||||||
|
INIT_WORK(&op->fast_work, do_some_work);
|
||||||
|
|
||||||
|
|
||||||
|
(3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
|
||||||
|
will be given to the slow work facility to process. Such an operation is
|
||||||
|
permitted to sleep on I/O.
|
||||||
|
|
||||||
|
This is, for example, used by FS-Cache to handle background writes of
|
||||||
|
pages that have just been fetched from a remote server.
|
||||||
|
|
||||||
|
If this option is used, op->slow_work and op->processor must be
|
||||||
|
initialised before submitting the operation:
|
||||||
|
|
||||||
|
fscache_operation_init_slow(op, processor)
|
||||||
|
|
||||||
|
|
||||||
|
Furthermore, operations may be one of two types:
|
||||||
|
|
||||||
|
(1) Exclusive (FSCACHE_OP_EXCLUSIVE). Operations of this type may not run in
|
||||||
|
conjunction with any other operation on the object being operated upon.
|
||||||
|
|
||||||
|
An example of this is the attribute change operation, in which the file
|
||||||
|
being written to may need truncation.
|
||||||
|
|
||||||
|
(2) Shareable. Operations of this type may be running simultaneously. It's
|
||||||
|
up to the operation implementation to prevent interference between other
|
||||||
|
operations running at the same time.
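The difference between the two types can be modelled with a pair of counters.
The following is a user-space sketch only: struct toy_object and the
toy_may_start(), toy_start() and toy_finish() helpers are hypothetical, and a
refusal to start stands in for the operation being deferred.

```c
#include <assert.h>
#include <stddef.h>

struct toy_object {
	int n_in_progress;	/* operations currently running on the object */
	int n_exclusive;	/* how many of those are exclusive (0 or 1) */
};

/* Returns 1 if an operation of the given type could start right now. */
int toy_may_start(const struct toy_object *obj, int exclusive)
{
	assert(obj != NULL);
	if (obj->n_exclusive > 0)
		return 0;	/* an exclusive op excludes everything else */
	if (exclusive && obj->n_in_progress > 0)
		return 0;	/* an exclusive op must run on its own */
	return 1;
}

/* Start an operation; returns 0 if it would have to be deferred. */
int toy_start(struct toy_object *obj, int exclusive)
{
	if (!toy_may_start(obj, exclusive))
		return 0;
	obj->n_in_progress++;
	if (exclusive)
		obj->n_exclusive++;
	return 1;
}

/* Record completion of an operation previously started with toy_start(). */
void toy_finish(struct toy_object *obj, int exclusive)
{
	assert(obj->n_in_progress > 0);
	obj->n_in_progress--;
	if (exclusive)
		obj->n_exclusive--;
}
```

Two shareable operations may overlap in this model, but an exclusive one has
to wait for both to finish and then blocks any newcomer until it completes.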
|
||||||
|
|
||||||
|
|
||||||
|
=========
|
||||||
|
PROCEDURE
|
||||||
|
=========
|
||||||
|
|
||||||
|
Operations are used through the following procedure:
|
||||||
|
|
||||||
|
(1) The submitting thread must allocate the operation and initialise it
|
||||||
|
itself. Normally this would be part of a more specific structure with the
|
||||||
|
generic op embedded within.
|
||||||
|
|
||||||
|
(2) The submitting thread must then submit the operation for processing using
|
||||||
|
one of the following two functions:
|
||||||
|
|
||||||
|
int fscache_submit_op(struct fscache_object *object,
|
||||||
|
struct fscache_operation *op);
|
||||||
|
|
||||||
|
int fscache_submit_exclusive_op(struct fscache_object *object,
|
||||||
|
struct fscache_operation *op);
|
||||||
|
|
||||||
|
The first function should be used to submit non-exclusive ops and the
|
||||||
|
second to submit exclusive ones. The caller must still set the
|
||||||
|
FSCACHE_OP_EXCLUSIVE flag.
|
||||||
|
|
||||||
|
If successful, both functions will assign the operation to the specified
|
||||||
|
object and return 0. -ENOBUFS will be returned if the object specified is
|
||||||
|
permanently unavailable.
|
||||||
|
|
||||||
|
The operation manager will defer operations on an object that is still
|
||||||
|
undergoing lookup or creation. The operation will also be deferred if an
|
||||||
|
operation of conflicting exclusivity is in progress on the object.
|
||||||
|
|
||||||
|
If the operation is asynchronous, the manager will retain a reference to
|
||||||
|
it, so the caller should put their reference to it by passing it to:
|
||||||
|
|
||||||
|
void fscache_put_operation(struct fscache_operation *op);
|
||||||
|
|
||||||
|
(3) If the submitting thread wants to do the work itself, and has marked the
|
||||||
|
operation with FSCACHE_OP_MYTHREAD, then it should monitor
|
||||||
|
FSCACHE_OP_WAITING as described above and check the state of the object if
|
||||||
|
necessary (the object might have died whilst the thread was waiting).
|
||||||
|
|
||||||
|
When it has finished doing its processing, it should call
|
||||||
|
fscache_put_operation() on it.
|
||||||
|
|
||||||
|
(4) The operation holds an effective lock upon the object, preventing other
|
||||||
|
exclusive ops conflicting until it is released. The operation can be
|
||||||
|
enqueued for further immediate asynchronous processing by adjusting the
|
||||||
|
CPU time provisioning option if necessary, eg:
|
||||||
|
|
||||||
|
op->flags &= ~FSCACHE_OP_TYPE;
|
||||||
|
op->flags |= FSCACHE_OP_FAST;
|
||||||
|
|
||||||
|
and calling:
|
||||||
|
|
||||||
|
void fscache_enqueue_operation(struct fscache_operation *op);
|
||||||
|
|
||||||
|
This can be used to allow other things to have use of the worker thread
|
||||||
|
pools.
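Changing the CPU time provisioning option in step (4) is a clear-then-set on a
multi-bit field. The sketch below shows the pattern in user space with
hypothetical values: the TOY_OP_* constants are invented for this example and
are not the kernel's FSCACHE_OP_* definitions. Note that the new type is ORed
in directly; it must not be complemented first, or every other bit in the
flags word would be set.

```c
#include <assert.h>

/* Hypothetical layout: low four bits select the type, bit 4 is a flag. */
#define TOY_OP_TYPE		0x000fUL
#define TOY_OP_SLOW		0x0001UL
#define TOY_OP_FAST		0x0002UL
#define TOY_OP_MYTHREAD		0x0003UL
#define TOY_OP_EXCLUSIVE	0x0010UL

/* Replace the type field of 'flags' with 'type', keeping all other bits. */
unsigned long toy_set_op_type(unsigned long flags, unsigned long type)
{
	assert((type & ~TOY_OP_TYPE) == 0);

	flags &= ~TOY_OP_TYPE;	/* clear the old type */
	flags |= type;		/* install the new one */
	return flags;
}
```

Bits outside the type field, such as the exclusivity marker in this model,
survive the change untouched.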
|
||||||
|
|
||||||
|
|
||||||
|
=====================
|
||||||
|
ASYNCHRONOUS CALLBACK
|
||||||
|
=====================
|
||||||
|
|
||||||
|
When used in asynchronous mode, the worker thread pool will invoke the
|
||||||
|
processor method with a pointer to the operation. This should then get at the
|
||||||
|
container struct by using container_of():
|
||||||
|
|
||||||
|
static void fscache_write_op(struct fscache_operation *_op)
|
||||||
|
{
|
||||||
|
struct fscache_storage *op =
|
||||||
|
container_of(_op, struct fscache_storage, op);
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
The caller holds a reference on the operation, and will invoke
|
||||||
|
fscache_put_operation() when the processor function returns. The processor
|
||||||
|
function is at liberty to call fscache_enqueue_operation() or to take extra
|
||||||
|
references.
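The container_of() step can be tried out in user space. This sketch defines
its own toy_container_of() macro with offsetof(), since the kernel macro is
not available outside the kernel; struct toy_operation, struct toy_storage
and toy_write_op() are hypothetical stand-ins for the structures in the
fragment above.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the generic operation record. */
struct toy_operation {
	unsigned long flags;
};

/* A more specific record with the generic op embedded within it. */
struct toy_storage {
	int pages_to_write;
	struct toy_operation op;
};

/* Processor: receives the generic op and recovers its container. */
int toy_write_op(struct toy_operation *_op)
{
	struct toy_storage *op;

	assert(_op != NULL);
	op = toy_container_of(_op, struct toy_storage, op);
	return op->pages_to_write;
}
```

The embedded member does not need to come first in the containing struct;
the macro subtracts whatever offset the member has.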
|
176
Documentation/filesystems/exofs.txt
Normal file
@@ -0,0 +1,176 @@
|
|||||||
|
===============================================================================
|
||||||
|
WHAT IS EXOFS?
|
||||||
|
===============================================================================
|
||||||
|
|
||||||
|
exofs is a file system that uses an OSD and exports the API of a normal Linux
|
||||||
|
file system. Users access exofs like any other local file system, and exofs
|
||||||
|
will in turn issue commands to the local OSD initiator.
|
||||||
|
|
||||||
|
OSD is a new T10 command set that views storage devices not as a large/flat
|
||||||
|
array of sectors but as a container of objects, each having a length, quota,
|
||||||
|
time attributes and more. Each object is addressed by a 64bit ID, and is
|
||||||
|
contained in a 64bit ID partition. Each object has associated attributes
|
||||||
|
attached to it, which are an integral part of the object and provide metadata about
|
||||||
|
the object. The standard defines some common obligatory attributes, but user
|
||||||
|
attributes can be added as needed.
|
||||||
|
|
||||||
|
===============================================================================
|
||||||
|
ENVIRONMENT
|
||||||
|
===============================================================================
|
||||||
|
|
||||||
|
To use this file system, you need to have an object store to run it on. You
|
||||||
|
may download a target from:
|
||||||
|
http://open-osd.org
|
||||||
|
|
||||||
|
See Documentation/scsi/osd.txt for how to set up a working OSD environment.
|
||||||
|
|
||||||
|
===============================================================================
|
||||||
|
USAGE
|
||||||
|
===============================================================================
|
||||||
|
|
||||||
|
1. Download and compile exofs and open-osd initiator:
|
||||||
|
You need an external Kernel source tree or kernel headers from your
|
||||||
|
distribution (anything based on 2.6.26 or later).
|
||||||
|
|
||||||
|
a. download open-osd including exofs source using:
|
||||||
|
[parent-directory]$ git clone git://git.open-osd.org/open-osd.git
|
||||||
|
|
||||||
|
b. Build the library module like this:
|
||||||
|
[parent-directory]$ make -C KSRC=$(KER_DIR) open-osd
|
||||||
|
|
||||||
|
This will build both the open-osd initiator as well as the exofs kernel
|
||||||
|
module. Use whatever parameters you compiled your Kernel with and
|
||||||
|
$(KER_DIR) above pointing to the Kernel you compile against. See the file
|
||||||
|
open-osd/top-level-Makefile for an example.
|
||||||
|
|
||||||
|
2. Get the OSD initiator and target set up properly, and login to the target.
|
||||||
|
See Documentation/scsi/osd.txt for further instructions. Also see ./do-osd
|
||||||
|
for an example script that does all these steps.
|
||||||
|
|
||||||
|
3. Insmod the exofs.ko module:
|
||||||
|
[exofs]$ insmod exofs.ko
|
||||||
|
|
||||||
|
4. Make sure the directory where you want to mount exists. If not, create it.
|
||||||
|
(For example, mkdir /mnt/exofs)
|
||||||
|
|
||||||
|
5. On the first run you will need to invoke the mkfs.exofs application.
|
||||||
|
|
||||||
|
As an example, this will create the file system on:
|
||||||
|
/dev/osd0 partition ID 65536
|
||||||
|
|
||||||
|
mkfs.exofs --pid=65536 --format /dev/osd0
|
||||||
|
|
||||||
|
The --format option is optional; if it is not specified, no OSD_FORMAT will be
|
||||||
|
performed and a clean file system will be created in the specified pid,
|
||||||
|
in the available space of the target. (Use --format=size_in_meg to limit
|
||||||
|
the total LUN space available)
|
||||||
|
|
||||||
|
If the pid already exists, it will be deleted and a new one will be created in its
|
||||||
|
place. Be careful.
|
||||||
|
|
||||||
|
An exofs lives inside a single OSD partition. You can create multiple exofs
|
||||||
|
filesystems on the same device using multiple pids.
|
||||||
|
|
||||||
|
(Run mkfs.exofs without any parameters for a usage help message.)
|
||||||
|
|
||||||
|
6. Mount the file system.
|
||||||
|
|
||||||
|
For example, to mount /dev/osd0, partition ID 0x10000 on /mnt/exofs:
|
||||||
|
|
||||||
|
mount -t exofs -o pid=65536 /dev/osd0 /mnt/exofs/
|
||||||
|
|
||||||
|
7. For reference (See do-exofs example script):
|
||||||
|
do-exofs start - an example of how to perform the above steps.
|
||||||
|
do-exofs stop - an example of how to unmount the file system.
|
||||||
|
do-exofs format - an example of how to format and mkfs a new exofs.
|
||||||
|
|
||||||
|
8. Extra compilation flags (uncomment in fs/exofs/Kbuild):
|
||||||
|
CONFIG_EXOFS_DEBUG - for debug messages and extra checks.
|
||||||
|
|
||||||
|
===============================================================================
|
||||||
|
exofs mount options
|
||||||
|
===============================================================================
|
||||||
|
Similar to any mount command:
|
||||||
|
mount -t exofs -o exofs_options /dev/osdX mount_exofs_directory
|
||||||
|
|
||||||
|
Where:
|
||||||
|
-t exofs: specifies the exofs file system
|
||||||
|
|
||||||
|
/dev/osdX: X is a decimal number. /dev/osdX was created after a successful
|
||||||
|
login into an OSD target.
|
||||||
|
|
||||||
|
mount_exofs_directory: The directory to mount the file system on
|
||||||
|
|
||||||
|
exofs specific options: Options are separated by commas (,)
|
||||||
|
pid=<integer> - The partition number to mount/create as
|
||||||
|
container of the filesystem.
|
||||||
|
This option is mandatory
|
||||||
|
to=<integer> - Timeout in ticks for a single command
|
||||||
|
default is (60 * HZ) [for debugging only]
|
||||||
|
|
||||||
|
===============================================================================
|
||||||
|
DESIGN
|
||||||
|
===============================================================================
|
||||||
|
|
||||||
|
* The file system control block (AKA on-disk superblock) resides in an object
|
||||||
|
with a special ID (defined in common.h).
|
||||||
|
Information included in the file system control block is used to fill the
|
||||||
|
in-memory superblock structure at mount time. This object is created before
|
||||||
|
the file system is used by mkexofs.c. It contains information such as:
|
||||||
|
- The file system's magic number
|
||||||
|
- The next inode number to be allocated
|
||||||
|
|
||||||
|
* Each file resides in its own object and contains the data (and it will be
|
||||||
|
possible to extend the file over multiple objects, though this has not been
|
||||||
|
implemented yet).
|
||||||
|
|
||||||
|
* A directory is treated as a file, and essentially contains a list of <file
|
||||||
|
name, inode #> pairs for files that are found in that directory. The object
|
||||||
|
IDs correspond to the files' inode numbers and will be allocated according to
|
||||||
|
a bitmap (stored in a separate object). Now they are allocated using a
|
||||||
|
counter.
|
||||||
|
|
||||||
|
* Each file's control block (AKA on-disk inode) is stored in its object's
|
||||||
|
attributes. This applies to both regular files and other types (directories,
|
||||||
|
device files, symlinks, etc.).
|
||||||
|
|
||||||
|
* Credentials are generated per object (inode and superblock) when it is
|
||||||
|
created in memory (read off disk or created). The credential works for all
|
||||||
|
operations and is used as long as the object remains in memory.
|
||||||
|
|
||||||
|
* Async OSD operations are used whenever possible, but the target may execute
|
||||||
|
them out of order. The operations that concern us are create, delete,
|
||||||
|
readpage, writepage, update_inode, and truncate. The following pairs of
|
||||||
|
operations should execute in the order written, and we need to prevent them
|
||||||
|
from executing in reverse order:
|
||||||
|
- The following are handled with the OBJ_CREATED and OBJ_2BCREATED
|
||||||
|
flags. OBJ_CREATED is set when we know the object exists on the OSD -
|
||||||
|
in create's callback function, and when we successfully do a read_inode.
|
||||||
|
OBJ_2BCREATED is set in the beginning of the create function, so we
|
||||||
|
know that we should wait.
|
||||||
|
- create/delete: delete should wait until the object is created
|
||||||
|
on the OSD.
|
||||||
|
- create/readpage: readpage should be able to return a page
|
||||||
|
full of zeroes in this case. If there was a write already
|
||||||
|
en-route (i.e. create, writepage, readpage) then the page
|
||||||
|
would be locked, and so it would really be the same as
|
||||||
|
create/writepage.
|
||||||
|
- create/writepage: if writepage is called for a sync write, it
|
||||||
|
should wait until the object is created on the OSD.
|
||||||
|
Otherwise, it should just return.
|
||||||
|
- create/truncate: truncate should wait until the object is
|
||||||
|
created on the OSD.
|
||||||
|
- create/update_inode: update_inode should wait until the
|
||||||
|
object is created on the OSD.
|
||||||
|
- Handled by VFS locks:
|
||||||
|
- readpage/delete: shouldn't happen because of page lock.
|
||||||
|
- writepage/delete: shouldn't happen because of page lock.
|
||||||
|
- readpage/writepage: shouldn't happen because of page lock.
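The create/delete ordering rule above can be illustrated with a toy model.
This is a sketch in Python, not the exofs kernel code: the class ObjState and
its method names are hypothetical, and a threading.Event stands in for the
OBJ_CREATED flag.

```python
import threading


class ObjState:
    """Toy model of the OBJ_2BCREATED / OBJ_CREATED flags (illustration only)."""

    def __init__(self):
        self.to_be_created = False         # models OBJ_2BCREATED
        self._created = threading.Event()  # models OBJ_CREATED

    def begin_create(self):
        # Set at the beginning of create, so later operations know to wait.
        self.to_be_created = True

    def create_done(self):
        # Set once the object is known to exist on the OSD.
        self._created.set()

    def wait_for_create(self, timeout=None):
        if not self.to_be_created and not self._created.is_set():
            raise RuntimeError("object was never scheduled for creation")
        return self._created.wait(timeout)


log = []
obj = ObjState()
obj.begin_create()


def delete():
    # delete must not reach the target before create has completed.
    obj.wait_for_create()
    log.append("delete")


worker = threading.Thread(target=delete)
worker.start()
log.append("create")  # logged before the event is set, so it always comes first
obj.create_done()
worker.join()
print(log)  # ['create', 'delete']
```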
|
||||||
|
|
||||||
|
===============================================================================
|
||||||
|
LICENSE/COPYRIGHT
|
||||||
|
===============================================================================
|
||||||
|
The exofs file system is based on ext2 v0.5b (distributed with the Linux kernel
|
||||||
|
version 2.6.10). All files include the original copyrights, and the license
|
||||||
|
is GPL version 2 (only version 2, as is true for the Linux kernel). The
|
||||||
|
Linux kernel can be downloaded from www.kernel.org.
|
@@ -373,10 +373,11 @@ Filesystem Resizing http://ext2resize.sourceforge.net/
|
|||||||
Compression (*) http://e2compr.sourceforge.net/
|
Compression (*) http://e2compr.sourceforge.net/
|
||||||
|
|
||||||
Implementations for:
|
Implementations for:
|
||||||
Windows 95/98/NT/2000 http://uranus.it.swin.edu.au/~jn/linux/Explore2fs.htm
|
Windows 95/98/NT/2000 http://www.chrysocome.net/explore2fs
|
||||||
Windows 95 (*) http://www.yipton.demon.co.uk/content.html#FSDEXT2
|
Windows 95 (*) http://www.yipton.net/content.html#FSDEXT2
|
||||||
DOS client (*) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
DOS client (*) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||||
OS/2 http://perso.wanadoo.fr/matthieu.willm/ext2-os2/
|
OS/2 (+) ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
|
||||||
RISC OS client ftp://ftp.barnet.ac.uk/pub/acorn/armlinux/iscafs/
|
RISC OS client http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
|
||||||
|
|
||||||
(*) no longer actively developed/supported (as of Apr 2001)
|
(*) no longer actively developed/supported (as of Apr 2001)
|
||||||
|
(+) no longer actively developed/supported (as of Mar 2009)
|
||||||
|
@@ -14,6 +14,11 @@ Options
|
|||||||
When mounting an ext3 filesystem, the following option are accepted:
|
When mounting an ext3 filesystem, the following option are accepted:
|
||||||
(*) == default
|
(*) == default
|
||||||
|
|
||||||
|
ro Mount filesystem read only. Note that ext3 will replay
|
||||||
|
the journal (and thus write to the partition) even when
|
||||||
|
mounted "read only". Mount options "ro,noload" can be
|
||||||
|
used to prevent writes to the filesystem.
|
||||||
|
|
||||||
journal=update Update the ext3 file system's journal to the current
|
journal=update Update the ext3 file system's journal to the current
|
||||||
format.
|
format.
|
||||||
|
|
||||||
@@ -27,7 +32,9 @@ journal_dev=devnum When the external journal device's major/minor numbers
|
|||||||
identified through its new major/minor numbers encoded
|
identified through its new major/minor numbers encoded
|
||||||
in devnum.
|
in devnum.
|
||||||
|
|
||||||
noload Don't load the journal on mounting.
|
noload Don't load the journal on mounting. Note that this forces
|
||||||
|
mount of inconsistent filesystem, which can lead to
|
||||||
|
various problems.
|
||||||
|
|
||||||
data=journal All data are committed into the journal prior to being
|
data=journal All data are committed into the journal prior to being
|
||||||
written into the main file system.
|
written into the main file system.
|
||||||
@@ -92,9 +99,12 @@ nocheck
|
|||||||
|
|
||||||
debug Extra debugging information is sent to syslog.
|
debug Extra debugging information is sent to syslog.
|
||||||
|
|
||||||
errors=remount-ro(*) Remount the filesystem read-only on an error.
|
errors=remount-ro Remount the filesystem read-only on an error.
|
||||||
errors=continue Keep going on a filesystem error.
|
errors=continue Keep going on a filesystem error.
|
||||||
errors=panic Panic and halt the machine if an error occurs.
|
errors=panic Panic and halt the machine if an error occurs.
|
||||||
|
(These mount options override the errors behavior
|
||||||
|
specified in the superblock, which can be
|
||||||
|
configured using tune2fs.)
|
||||||
|
|
||||||
data_err=ignore(*) Just print an error message if an error occurs
|
data_err=ignore(*) Just print an error message if an error occurs
|
||||||
in a file data buffer in ordered mode.
|
in a file data buffer in ordered mode.
|
||||||
@@ -198,5 +208,5 @@ kernel source: <file:fs/ext3/>
|
|||||||
programs: http://e2fsprogs.sourceforge.net/
|
programs: http://e2fsprogs.sourceforge.net/
|
||||||
http://ext2resize.sourceforge.net
|
http://ext2resize.sourceforge.net
|
||||||
|
|
||||||
useful links: http://www-106.ibm.com/developerworks/linux/library/l-fs7/
|
useful links: http://www.ibm.com/developerworks/library/l-fs7.html
|
||||||
http://www-106.ibm.com/developerworks/linux/library/l-fs8/
|
http://www.ibm.com/developerworks/library/l-fs8.html
|
||||||
|
@@ -85,7 +85,7 @@ Note: More extensive information for getting started with ext4 can be
|
|||||||
* extent format more robust in face of on-disk corruption due to magics,
|
* extent format more robust in face of on-disk corruption due to magics,
|
||||||
* internal redundancy in tree
|
* internal redundancy in tree
|
||||||
* improved file allocation (multi-block alloc)
|
* improved file allocation (multi-block alloc)
|
||||||
* fix 32000 subdirectory limit
|
* lift 32000 subdirectory limit imposed by i_links_count[1]
|
||||||
* nsec timestamps for mtime, atime, ctime, create time
|
* nsec timestamps for mtime, atime, ctime, create time
|
||||||
* inode version field on disk (NFSv4, Lustre)
|
* inode version field on disk (NFSv4, Lustre)
|
||||||
* reduced e2fsck time via uninit_bg feature
|
* reduced e2fsck time via uninit_bg feature
|
||||||
@@ -100,6 +100,9 @@ Note: More extensive information for getting started with ext4 can be
|
|||||||
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
|
* efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force
|
||||||
the ordering)
|
the ordering)
|
||||||
|
|
||||||
|
[1] Filesystems with a block size of 1k may see a limit imposed by the
|
||||||
|
directory hash tree having a maximum depth of two.
|
||||||
|
|
||||||
2.2 Candidate features for future inclusion
|
2.2 Candidate features for future inclusion
|
||||||
|
|
||||||
* Online defrag (patches available but not well tested)
|
* Online defrag (patches available but not well tested)
|
||||||
@@ -180,8 +183,8 @@ commit=nrsec (*) Ext4 can be told to sync all its data and metadata
|
|||||||
performance.
|
performance.
|
||||||
|
|
||||||
barrier=<0|1(*)> This enables/disables the use of write barriers in
|
barrier=<0|1(*)> This enables/disables the use of write barriers in
|
||||||
the jbd code. barrier=0 disables, barrier=1 enables.
|
barrier(*) the jbd code. barrier=0 disables, barrier=1 enables.
|
||||||
This also requires an IO stack which can support
|
nobarrier This also requires an IO stack which can support
|
||||||
barriers, and if jbd gets an error on a barrier
|
barriers, and if jbd gets an error on a barrier
|
||||||
write, it will disable again with a warning.
|
write, it will disable again with a warning.
|
||||||
Write barriers enforce proper on-disk ordering
|
Write barriers enforce proper on-disk ordering
|
||||||
@@ -189,6 +192,9 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in
|
|||||||
safe to use, at some performance penalty. If
|
safe to use, at some performance penalty. If
|
||||||
your disks are battery-backed in one way or another,
|
your disks are battery-backed in one way or another,
|
||||||
disabling barriers may safely improve performance.
|
disabling barriers may safely improve performance.
|
||||||
|
The mount options "barrier" and "nobarrier" can
|
||||||
|
also be used to enable or disable barriers, for
|
||||||
|
consistency with other ext4 mount options.
|
||||||
|
|
||||||
inode_readahead=n This tuning parameter controls the maximum
|
inode_readahead=n This tuning parameter controls the maximum
|
||||||
number of inode table blocks that ext4's inode
|
number of inode table blocks that ext4's inode
|
||||||
@@ -310,6 +316,24 @@ journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the
|
|||||||
a slightly higher priority than the default I/O
|
a slightly higher priority than the default I/O
|
||||||
priority.
|
priority.
|
||||||
|
|
||||||
|
auto_da_alloc(*) Many broken applications don't use fsync() when
|
||||||
|
noauto_da_alloc replacing existing files via patterns such as
|
||||||
|
fd = open("foo.new")/write(fd,..)/close(fd)/
|
||||||
|
rename("foo.new", "foo"), or worse yet,
|
||||||
|
fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
|
||||||
|
If auto_da_alloc is enabled, ext4 will detect
|
||||||
|
the replace-via-rename and replace-via-truncate
|
||||||
|
patterns and force that any delayed allocation
|
||||||
|
blocks are allocated such that at the next
|
||||||
|
journal commit, in the default data=ordered
|
||||||
|
mode, the data blocks of the new file are forced
|
||||||
|
to disk before the rename() operation is
|
||||||
|
committed. This provides roughly the same level
|
||||||
|
of guarantees as ext3, and avoids the
|
||||||
|
"zero-length" problem that can happen when a
|
||||||
|
system crashes before the delayed allocation
|
||||||
|
blocks are forced to disk.
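For contrast with the broken patterns described above, here is a sketch of a
replace-via-rename that does call fsync() before rename(). It is written in
Python using only the os module; the function name replace_file_durably and
the ".new" suffix are illustrative choices, not anything mandated by ext4.

```python
import os
import tempfile


def replace_file_durably(path, data):
    """Replace path with data: write a temp file, fsync it, then rename over path."""
    tmp = path + ".new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # force the new data to disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp, path)
    # Also sync the containing directory so the rename itself is persistent.
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "foo")
    replace_file_durably(target, b"old contents")
    replace_file_durably(target, b"new contents")
    with open(target, "rb") as f:
        result = f.read()
print(result)  # b'new contents'
```

An application written this way does not depend on auto_da_alloc to avoid
the zero-length file problem.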
|
||||||
|
|
||||||
Data Mode
|
Data Mode
|
||||||
=========
|
=========
|
||||||
There are 3 different data modes:
|
There are 3 different data modes:
|
||||||
|
159
Documentation/filesystems/knfsd-stats.txt
Normal file
@@ -0,0 +1,159 @@
|
|||||||
|
|
||||||
|
Kernel NFS Server Statistics
|
||||||
|
============================
|
||||||
|
|
||||||
|
This document describes the format and semantics of the statistics
|
||||||
|
which the kernel NFS server makes available to userspace. These
|
||||||
|
statistics are available in several text-format pseudo-files, each of
|
||||||
|
which is described separately below.
|
||||||
|
|
||||||
|
In most cases you don't need to know these formats, as the nfsstat(8)
|
||||||
|
program from the nfs-utils distribution provides a helpful command-line
|
||||||
|
interface for extracting and printing them.
|
||||||
|
|
||||||
|
All the files described here are formatted as a sequence of text lines,
|
||||||
|
separated by newline '\n' characters. Lines beginning with a hash
|
||||||
|
'#' character are comments intended for humans and should be ignored
|
||||||
|
by parsing routines. All other lines contain a sequence of fields
|
||||||
|
separated by whitespace.
|
||||||
|
|
||||||
|
/proc/fs/nfsd/pool_stats
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
This file is available in kernels from 2.6.30 onwards, if the
|
||||||
|
/proc/fs/nfsd filesystem is mounted (it almost always should be).
|
||||||
|
|
||||||
|
The first line is a comment which describes the fields present in
|
||||||
|
all the other lines. The other lines present the following data as
|
||||||
|
a sequence of unsigned decimal numeric fields. One line is shown
|
||||||
|
for each NFS thread pool.
|
||||||
|
|
||||||
|
All counters are 64 bits wide and wrap naturally. There is no way
|
||||||
|
to zero these counters; instead, applications should do their own
|
||||||
|
rate conversion.
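A minimal sketch of such a rate conversion in Python, assuming two samples
of the same counter taken interval_seconds apart and at most one wrap
between them. The function name counter_rate is hypothetical.

```python
COUNTER_MODULUS = 1 << 64  # counters are 64 bits wide and wrap naturally


def counter_rate(previous, current, interval_seconds):
    """Return events per second between two samples of a wrapping 64-bit counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modular subtraction gives the right delta even if the counter wrapped once.
    delta = (current - previous) % COUNTER_MODULUS
    return delta / interval_seconds


print(counter_rate(1000, 1600, 2.0))                  # 300.0
print(counter_rate(COUNTER_MODULUS - 10, 20, 10.0))   # 3.0
```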
|
||||||
|
|
||||||
|
pool
|
||||||
|
The id number of the NFS thread pool to which this line applies.
|
||||||
|
This number does not change.
|
||||||
|
|
||||||
|
Thread pool ids are a contiguous set of small integers starting
|
||||||
|
at zero. The maximum value depends on the thread pool mode, but
|
||||||
|
currently cannot be larger than the number of CPUs in the system.
|
||||||
|
Note that in the default case there will be a single thread pool
|
||||||
|
which contains all the nfsd threads and all the CPUs in the system,
|
||||||
|
and thus this file will have a single line with a pool id of "0".
|
||||||
|
|
||||||
|
packets-arrived
|
||||||
|
Counts how many NFS packets have arrived. More precisely, this
|
||||||
|
is the number of times that the network stack has notified the
|
||||||
|
sunrpc server layer that new data may be available on a transport
|
||||||
|
(e.g. an NFS or UDP socket or an NFS/RDMA endpoint).
|
||||||
|
|
||||||
|
Depending on the NFS workload patterns and various network stack
|
||||||
|
effects (such as Large Receive Offload) which can combine packets
|
||||||
|
on the wire, this may be either more or less than the number
|
||||||
|
of NFS calls received (which statistic is available elsewhere).
|
||||||
|
However this is a more accurate and less workload-dependent measure
|
||||||
|
of how much CPU load is being placed on the sunrpc server layer
|
||||||
|
due to NFS network traffic.
|
||||||
|
|
||||||
|
sockets-enqueued
|
||||||
|
Counts how many times an NFS transport is enqueued to wait for
|
||||||
|
an nfsd thread to service it, i.e. no nfsd thread was considered
|
||||||
|
available.
|
||||||
|
|
||||||
|
The circumstance this statistic tracks indicates that there was NFS
|
||||||
|
network-facing work to be done but it couldn't be done immediately,
|
||||||
|
thus introducing a small delay in servicing NFS calls. The ideal
|
||||||
|
rate of change for this counter is zero; significantly non-zero
|
||||||
|
values may indicate a performance limitation.
|
||||||
|
|
||||||
|
This can happen either because there are too few nfsd threads in the
|
||||||
|
thread pool for the NFS workload (the workload is thread-limited),
|
||||||
|
or because the NFS workload needs more CPU time than is available in
|
||||||
|
the thread pool (the workload is CPU-limited). In the former case,
|
||||||
|
configuring more nfsd threads will probably improve the performance
|
||||||
|
of the NFS workload. In the latter case, the sunrpc server layer is
|
||||||
|
already choosing not to wake idle nfsd threads because there are too
|
||||||
|
many nfsd threads which want to run but cannot, so configuring more
|
||||||
|
nfsd threads will make no difference whatsoever. The overloads-avoided
|
||||||
|
statistic (see below) can be used to distinguish these cases.
|
||||||
|
|
||||||
|
threads-woken
|
||||||
|
Counts how many times an idle nfsd thread is woken to try to
|
||||||
|
receive some data from an NFS transport.
|
||||||
|
|
||||||
|
This statistic tracks the circumstance where incoming
|
||||||
|
network-facing NFS work is being handled quickly, which is a good
|
||||||
|
thing. The ideal rate of change for this counter will be close
|
||||||
|
to but less than the rate of change of the packets-arrived counter.
|
||||||
|
|
||||||
|
overloads-avoided
|
||||||
|
Counts how many times the sunrpc server layer chose not to wake an
|
||||||
|
nfsd thread, despite the presence of idle nfsd threads, because
|
||||||
|
too many nfsd threads had been recently woken but could not get
|
||||||
|
enough CPU time to actually run.
|
||||||
|
|
||||||
|
This statistic counts a circumstance where the sunrpc layer
|
||||||
|
heuristically avoids overloading the CPU scheduler with too many
|
||||||
|
runnable nfsd threads. The ideal rate of change for this counter
|
||||||
|
is zero. Significant non-zero values indicate that the workload
|
||||||
|
is CPU limited. Usually this is associated with heavy CPU usage
|
||||||
|
on all the CPUs in the nfsd thread pool.
|
||||||
|
|
||||||
|
If a sustained large overloads-avoided rate is detected on a pool,
|
||||||
|
the top(1) utility should be used to check for the following
|
||||||
|
pattern of CPU usage on all the CPUs associated with the given
|
||||||
|
nfsd thread pool.
|
||||||
|
|
||||||
|
- %us ~= 0 (as you're *NOT* running applications on your NFS server)
|
||||||
|
|
||||||
|
- %wa ~= 0
|
||||||
|
|
||||||
|
- %id ~= 0
|
||||||
|
|
||||||
|
- %sy + %hi + %si ~= 100
|
||||||
|
|
||||||
|
If this pattern is seen, configuring more nfsd threads will *not*
|
||||||
|
improve the performance of the workload. If this pattern is not
|
||||||
|
seen, then something more subtle is wrong.
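The CPU usage pattern above can be expressed as a simple check. This is a
sketch in Python; the function name looks_cpu_limited and the 5 percentage
point tolerance used for "~=" are assumptions made for illustration, and the
inputs are the per-CPU percentages as reported by top(1).

```python
def looks_cpu_limited(us, sy, hi, si, wa, idle, tolerance=5.0):
    """Return True if the percentages match the CPU-limited pattern.

    The tolerance is an assumed interpretation of "~=" and may need tuning.
    """
    return (
        us <= tolerance
        and wa <= tolerance
        and idle <= tolerance
        and (sy + hi + si) >= 100.0 - tolerance
    )


print(looks_cpu_limited(us=0.5, sy=80.0, hi=5.0, si=14.0, wa=0.3, idle=0.2))   # True
print(looks_cpu_limited(us=40.0, sy=30.0, hi=1.0, si=4.0, wa=5.0, idle=20.0))  # False
```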
|
||||||
|
|
||||||
|
threads-timedout
|
||||||
|
Counts how many times an nfsd thread triggered an idle timeout,
|
||||||
|
i.e. was not woken to handle any incoming network packets for
|
||||||
|
some time.
|
||||||
|
|
||||||
|
This statistic counts a circumstance where there are more nfsd
|
||||||
|
threads configured than can be used by the NFS workload. This is
|
||||||
|
a clue that the number of nfsd threads can be reduced without
|
||||||
|
affecting performance. Unfortunately, it's only a clue and not
|
||||||
|
a strong indication, for a couple of reasons:
|
||||||
|
|
||||||
|
- Currently the rate at which the counter is incremented is quite
|
||||||
|
slow; the idle timeout is 60 minutes. Unless the NFS workload
|
||||||
|
remains constant for hours at a time, this counter is unlikely
|
||||||
|
to be providing information that is still useful.
|
||||||
|
|
||||||
|
- It is usually a wise policy to provide some slack,
|
||||||
|
i.e. configure a few more nfsds than are currently needed,
|
||||||
|
to allow for future spikes in load.
|
||||||
|
|
||||||
|
|
||||||
|
Note that incoming packets on NFS transports will be dealt with in
|
||||||
|
one of three ways. An nfsd thread can be woken (threads-woken counts
|
||||||
|
this case), or the transport can be enqueued for later attention
|
||||||
|
(sockets-enqueued counts this case), or the packet can be temporarily
|
||||||
|
deferred because the transport is currently being used by an nfsd
|
||||||
|
thread. This last case is not very interesting and is not explicitly
|
||||||
|
counted, but can be inferred from the other counters thus:
|
||||||
|
|
||||||
|
packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
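As an illustration of the file format and of the formula above, here is a
sketch of a parser in Python. It assumes the fields appear on each line in
the order they are described in this document; the SAMPLE text is made-up
data, and the names parse_pool_stats and packets_deferred are hypothetical.

```python
# Assumed field order: the order in which the fields are described above.
FIELDS = (
    "pool",
    "packets_arrived",
    "sockets_enqueued",
    "threads_woken",
    "overloads_avoided",
    "threads_timedout",
)


def parse_pool_stats(text):
    """Parse pool_stats text into a list of dicts, one per thread pool."""
    pools = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines are for humans and are ignored
        values = [int(field) for field in line.split()]
        if len(values) != len(FIELDS):
            raise ValueError("unexpected number of fields: %r" % line)
        pools.append(dict(zip(FIELDS, values)))
    return pools


def packets_deferred(stats):
    """Infer the uncounted 'deferred' case from the other counters."""
    return stats["packets_arrived"] - (
        stats["sockets_enqueued"] + stats["threads_woken"]
    )


# Made-up sample; real contents come from /proc/fs/nfsd/pool_stats.
SAMPLE = """\
# pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout
0 1000 50 900 0 3
"""

pools = parse_pool_stats(SAMPLE)
print(packets_deferred(pools[0]))  # 50
```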
|
||||||
|
|
||||||
|
|
||||||
|
More
|
||||||
|
----
|
||||||
|
Descriptions of the other statistics files should go here.
|
||||||
|
|
||||||
|
|
||||||
|
Greg Banks <gnb@sgi.com>
|
||||||
|
26 Mar 2009
|
161
Documentation/filesystems/nfs41-server.txt
Normal file
@@ -0,0 +1,161 @@
|
|||||||
|
NFSv4.1 Server Implementation
|
||||||
|
|
||||||
|
Server support for minorversion 1 can be controlled using the
|
||||||
|
/proc/fs/nfsd/versions control file. The string output returned
|
||||||
|
by reading this file will contain either "+4.1" or "-4.1"
|
||||||
|
correspondingly.
|
||||||
|
|
||||||
|
Currently, server support for minorversion 1 is disabled by default.
|
||||||
|
It can be enabled at run time by writing the string "+4.1" to
|
||||||
|
the /proc/fs/nfsd/versions control file. Note that to write this
|
||||||
|
control file, the nfsd service must be taken down. Use your user-mode
|
||||||
|
nfs-utils to set this up; see rpc.nfsd(8).
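A small sketch in Python of how the string read from the versions file could
be checked for minorversion 1 support. It assumes the file contents are a
whitespace-separated list of tokens such as "+4.1" or "-4.1"; the sample
strings and the function name minorversion1_enabled are illustrative.

```python
def minorversion1_enabled(versions_text):
    """Return True for "+4.1", False for "-4.1", None if neither token is present."""
    tokens = versions_text.split()
    if "+4.1" in tokens:
        return True
    if "-4.1" in tokens:
        return False
    return None


# Made-up samples; real contents come from /proc/fs/nfsd/versions.
print(minorversion1_enabled("-2 +3 +4 +4.1"))  # True
print(minorversion1_enabled("-2 +3 +4 -4.1"))  # False
```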
|
||||||
|
|
||||||
|
The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
|
||||||
|
on the latest NFSv4.1 Internet Draft:
|
||||||
|
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
|
||||||
|
|
||||||
|
From the many new features in NFSv4.1 the current implementation
|
||||||
|
focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
|
||||||
|
"exactly once" semantics and better control and throttling of the
|
||||||
|
resources allocated for each client.
|
||||||
|
|
||||||
|
Other NFSv4.1 features, Parallel NFS operations in particular,
|
||||||
|
are still under development out of tree.
|
||||||
|
See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design
|
||||||
|
for more information.
|
||||||
|
|
||||||
|
The table below, taken from the NFSv4.1 document, lists
|
||||||
|
the operations that are mandatory to implement (REQ), optional
|
||||||
|
(OPT), and NFSv4.0 operations that must not be implemented (MNI)
|
||||||
|
in minor version 1. The first column indicates the operations that
|
||||||
|
are not supported yet by the Linux server implementation.
|
||||||
|
|
||||||
|
The OPTIONAL features identified and their abbreviations are as follows:
|
||||||
|
pNFS Parallel NFS
|
||||||
|
FDELG File Delegations
|
||||||
|
DDELG Directory Delegations
|
||||||
|
|
||||||
|
The following abbreviations indicate the Linux server implementation status.
|
||||||
|
I Implemented NFSv4.1 operations.
|
||||||
|
NS Not Supported.
|
||||||
|
NS* unimplemented optional feature.
|
||||||
|
P pNFS features implemented out of tree.
|
||||||
|
PNS pNFS features that are not supported yet (out of tree).
|
||||||
|
|
||||||
|
Operations
|
||||||
|
|
||||||
|
    +----------------------+------------+--------------+----------------+
    | Operation            | REQ, REC,  | Feature      | Definition     |
    |                      | OPT, or    | (REQ, REC,   |                |
    |                      | MNI        | or OPT)      |                |
    +----------------------+------------+--------------+----------------+
    | ACCESS               | REQ        |              | Section 18.1   |
NS  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
NS  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |
    | CLOSE                | REQ        |              | Section 18.2   |
    | COMMIT               | REQ        |              | Section 18.3   |
    | CREATE               | REQ        |              | Section 18.4   |
I   | CREATE_SESSION       | REQ        |              | Section 18.36  |
NS* | DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   |
    | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |
    |                      |            | DDELG, pNFS  |                |
    |                      |            | (REQ)        |                |
NS  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
I   | DESTROY_SESSION      | REQ        |              | Section 18.37  |
I   | EXCHANGE_ID          | REQ        |              | Section 18.35  |
NS  | FREE_STATEID         | REQ        |              | Section 18.38  |
    | GETATTR              | REQ        |              | Section 18.7   |
P   | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  |
P   | GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  |
    | GETFH                | REQ        |              | Section 18.8   |
NS* | GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  |
P   | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  |
P   | LAYOUTGET            | OPT        | pNFS (REQ)   | Section 18.43  |
P   | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  |
    | LINK                 | OPT        |              | Section 18.9   |
    | LOCK                 | REQ        |              | Section 18.10  |
    | LOCKT                | REQ        |              | Section 18.11  |
    | LOCKU                | REQ        |              | Section 18.12  |
    | LOOKUP               | REQ        |              | Section 18.13  |
    | LOOKUPP              | REQ        |              | Section 18.14  |
    | NVERIFY              | REQ        |              | Section 18.15  |
    | OPEN                 | REQ        |              | Section 18.16  |
NS* | OPENATTR             | OPT        |              | Section 18.17  |
    | OPEN_CONFIRM         | MNI        |              | N/A            |
    | OPEN_DOWNGRADE       | REQ        |              | Section 18.18  |
    | PUTFH                | REQ        |              | Section 18.19  |
    | PUTPUBFH             | REQ        |              | Section 18.20  |
    | PUTROOTFH            | REQ        |              | Section 18.21  |
    | READ                 | REQ        |              | Section 18.22  |
    | READDIR              | REQ        |              | Section 18.23  |
    | READLINK             | OPT        |              | Section 18.24  |
NS  | RECLAIM_COMPLETE     | REQ        |              | Section 18.51  |
    | RELEASE_LOCKOWNER    | MNI        |              | N/A            |
    | REMOVE               | REQ        |              | Section 18.25  |
    | RENAME               | REQ        |              | Section 18.26  |
    | RENEW                | MNI        |              | N/A            |
    | RESTOREFH            | REQ        |              | Section 18.27  |
    | SAVEFH               | REQ        |              | Section 18.28  |
    | SECINFO              | REQ        |              | Section 18.29  |
NS  | SECINFO_NO_NAME      | REC        | pNFS files   | Section 18.45, |
    |                      |            | layout (REQ) | Section 13.12  |
I   | SEQUENCE             | REQ        |              | Section 18.46  |
    | SETATTR              | REQ        |              | Section 18.30  |
    | SETCLIENTID          | MNI        |              | N/A            |
    | SETCLIENTID_CONFIRM  | MNI        |              | N/A            |
NS  | SET_SSV              | REQ        |              | Section 18.47  |
NS  | TEST_STATEID         | REQ        |              | Section 18.48  |
    | VERIFY               | REQ        |              | Section 18.31  |
NS* | WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  |
    | WRITE                | REQ        |              | Section 18.32  |
    +----------------------+------------+--------------+----------------+
|
||||||
|
|
||||||
|
Callback Operations
|
||||||
|
|
||||||
|
    +-------------------------+-----------+-------------+---------------+
    | Operation               | REQ, REC, | Feature     | Definition    |
    |                         | OPT, or   | (REQ, REC,  |               |
    |                         | MNI       | or OPT)     |               |
    +-------------------------+-----------+-------------+---------------+
    | CB_GETATTR              | OPT       | FDELG (REQ) | Section 20.1  |
P   | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  |
NS* | CB_NOTIFY               | OPT       | DDELG (REQ) | Section 20.4  |
P   | CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 |
NS* | CB_NOTIFY_LOCK          | OPT       |             | Section 20.11 |
NS* | CB_PUSH_DELEG           | OPT       | FDELG (OPT) | Section 20.5  |
    | CB_RECALL               | OPT       | FDELG,      | Section 20.2  |
    |                         |           | DDELG, pNFS |               |
    |                         |           | (REQ)       |               |
NS* | CB_RECALL_ANY           | OPT       | FDELG,      | Section 20.6  |
    |                         |           | DDELG, pNFS |               |
    |                         |           | (REQ)       |               |
NS  | CB_RECALL_SLOT          | REQ       |             | Section 20.8  |
NS* | CB_RECALLABLE_OBJ_AVAIL | OPT       | DDELG, pNFS | Section 20.7  |
    |                         |           | (REQ)       |               |
I   | CB_SEQUENCE             | OPT       | FDELG,      | Section 20.9  |
    |                         |           | DDELG, pNFS |               |
    |                         |           | (REQ)       |               |
NS* | CB_WANTS_CANCELLED      | OPT       | FDELG,      | Section 20.10 |
    |                         |           | DDELG, pNFS |               |
    |                         |           | (REQ)       |               |
    +-------------------------+-----------+-------------+---------------+
|
||||||
|
|
||||||
|
Implementation notes:
|
||||||
|
|
||||||
|
EXCHANGE_ID:
|
||||||
|
* only SP4_NONE state protection supported
|
||||||
|
* implementation ids are ignored
|
||||||
|
|
||||||
|
CREATE_SESSION:
|
||||||
|
* backchannel attributes are ignored
|
||||||
|
* backchannel security parameters are ignored
|
||||||
|
|
||||||
|
SEQUENCE:
|
||||||
|
* no support for dynamic slot table renegotiation (optional)
|
||||||
|
|
||||||
|
nfsv4.1 COMPOUND rules:
|
||||||
|
The following cases aren't supported yet:
|
||||||
|
* Enforcing of NFS4ERR_NOT_ONLY_OP for: BIND_CONN_TO_SESSION, CREATE_SESSION,
|
||||||
|
DESTROY_CLIENTID, DESTROY_SESSION, EXCHANGE_ID.
|
||||||
|
* DESTROY_SESSION MUST be the final operation in the COMPOUND request.
|
||||||
|
|
200
Documentation/filesystems/nilfs2.txt
Normal file
@@ -0,0 +1,200 @@
|
|||||||
|
NILFS2
|
||||||
|
------
|
||||||
|
|
||||||
|
NILFS2 is a log-structured file system (LFS) supporting continuous
|
||||||
|
snapshotting. In addition to versioning capability of the entire file
|
||||||
|
system, users can even restore files mistakenly overwritten or
|
||||||
|
destroyed just a few seconds ago. Since NILFS2 can keep consistency
|
||||||
|
like conventional LFS, it achieves quick recovery after system
|
||||||
|
crashes.
|
||||||
|
|
||||||
|
NILFS2 creates a number of checkpoints every few seconds or per
|
||||||
|
synchronous write basis (unless there is no change). Users can select
|
||||||
|
significant versions among continuously created checkpoints, and can
|
||||||
|
change them into snapshots which will be preserved until they are
|
||||||
|
changed back to checkpoints.
|
||||||
|
|
||||||
|
There is no limit on the number of snapshots until the volume gets
|
||||||
|
full. Each snapshot is mountable as a read-only file system
|
||||||
|
concurrently with its writable mount, and this feature is convenient
|
||||||
|
for online backup.
|
||||||
|
|
||||||
|
The userland tools are included in the nilfs-utils package, which is
|
||||||
|
available from the following download page. At least "mkfs.nilfs2",
|
||||||
|
"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (so called
|
||||||
|
cleaner or garbage collector) are required. Details on the tools are
|
||||||
|
described in the man pages included in the package.
|
||||||
|
|
||||||
|
Project web page: http://www.nilfs.org/en/
|
||||||
|
Download page: http://www.nilfs.org/en/download.html
|
||||||
|
Git tree web page: http://www.nilfs.org/git/
|
||||||
|
NILFS mailing lists: http://www.nilfs.org/mailman/listinfo/users
|
||||||
|
|
||||||
|
Caveats
|
||||||
|
=======
|
||||||
|
|
||||||
|
Features which NILFS2 does not support yet:
|
||||||
|
|
||||||
|
- atime
|
||||||
|
- extended attributes
|
||||||
|
- POSIX ACLs
|
||||||
|
- quotas
|
||||||
|
- writable snapshots
|
||||||
|
- remote backup (CDP)
|
||||||
|
- data integrity
|
||||||
|
- defragmentation
|
||||||
|
|
||||||
|
Mount options
|
||||||
|
=============
|
||||||
|
|
||||||
|
NILFS2 supports the following mount options:
|
||||||
|
(*) == default
|
||||||
|
|
||||||
|
barrier=on(*) This enables/disables barriers. barrier=off disables
|
||||||
|
it, barrier=on enables it.
|
||||||
|
errors=continue(*) Keep going on a filesystem error.
|
||||||
|
errors=remount-ro Remount the filesystem read-only on an error.
|
||||||
|
errors=panic Panic and halt the machine if an error occurs.
|
||||||
|
cp=n Specify the checkpoint-number of the snapshot to be
|
||||||
|
mounted. Checkpoints and snapshots are listed by the lscp
|
||||||
|
user command. Only the checkpoints marked as snapshot
|
||||||
|
are mountable with this option. A snapshot is read-only,
|
||||||
|
so the read-only mount option must be specified with it.
|
||||||
|
order=relaxed(*) Apply relaxed order semantics that allows modified data
|
||||||
|
blocks to be written to disk without making a
|
||||||
|
checkpoint if no metadata update is in progress. This mode
|
||||||
|
is equivalent to the ordered data mode of the ext3
|
||||||
|
filesystem, except that updates on data blocks still
|
||||||
|
conserve atomicity. This will improve synchronous
|
||||||
|
write performance for overwriting.
|
||||||
|
order=strict Apply strict in-order semantics that preserves sequence
|
||||||
|
of all file operations including overwriting of data
|
||||||
|
blocks. That means, it is guaranteed that no
|
||||||
|
overtaking of events occurs in the recovered file
|
||||||
|
system after a crash.
|
||||||
|
|
||||||
|
NILFS2 usage
|
||||||
|
============
|
||||||
|
|
||||||
|
To use nilfs2 as a local file system, simply:
|
||||||
|
|
||||||
|
# mkfs -t nilfs2 /dev/block_device
|
||||||
|
# mount -t nilfs2 /dev/block_device /dir
|
||||||
|
|
||||||
|
This will also invoke the cleaner through the mount helper program
|
||||||
|
(mount.nilfs2).
|
||||||
|
|
||||||
|
Checkpoints and snapshots are managed by the following commands.
|
||||||
|
Their manpages are included in the nilfs-utils package above.
|
||||||
|
|
||||||
|
lscp list checkpoints or snapshots.
|
||||||
|
mkcp make a checkpoint or a snapshot.
|
||||||
|
chcp change an existing checkpoint to a snapshot or vice versa.
|
||||||
|
rmcp invalidate specified checkpoint(s).
|
||||||
|
|
||||||
|
To mount a snapshot,
|
||||||
|
|
||||||
|
# mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
|
||||||
|
|
||||||
|
where <cno> is the checkpoint number of the snapshot.
|
||||||
|
|
||||||
|
To unmount the NILFS2 mount point or snapshot, simply:
|
||||||
|
|
||||||
|
# umount /dir
|
||||||
|
|
||||||
|
Then, the cleaner daemon is automatically shut down by the umount
|
||||||
|
helper program (umount.nilfs2).
|
||||||
|
|
||||||
|
Disk format
|
||||||
|
===========
|
||||||
|
|
||||||
|
A nilfs2 volume is equally divided into a number of segments except
|
||||||
|
for the super block (SB) and segment #0. A segment is the container
|
||||||
|
of logs. Each log is composed of summary information blocks, payload
|
||||||
|
blocks, and an optional super root block (SR):
|
||||||
|
|
||||||
|
______________________________________________________
|
||||||
|
| |SB| | Segment | Segment | Segment | ... | Segment | |
|
||||||
|
|_|__|_|____0____|____1____|____2____|_____|____N____|_|
|
||||||
|
0 +1K +4K +8M +16M +24M +(8MB x N)
|
||||||
|
. . (Typical offsets for 4KB-block)
|
||||||
|
. .
|
||||||
|
.______________________.
|
||||||
|
| log | log |... | log |
|
||||||
|
|__1__|__2__|____|__m__|
|
||||||
|
. .
|
||||||
|
. .
|
||||||
|
. .
|
||||||
|
.______________________________.
|
||||||
|
| Summary | Payload blocks |SR|
|
||||||
|
|_blocks__|_________________|__|
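The typical offsets in the figure follow from simple arithmetic. The
following is a minimal sketch in C, assuming the geometry shown above
(4KB blocks, 8MB segments, and a shortened segment #0 that starts at +4K
after the area holding the super block). The macro and function names are
hypothetical and are not taken from the nilfs2 sources; a real volume
records its actual geometry in the super block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, taken from the "typical offsets" in the figure. */
#define EXAMPLE_BLOCK_SIZE    4096ULL               /* 4KB block */
#define EXAMPLE_SEGMENT_SIZE  (8ULL * 1024 * 1024)  /* 8MB segment */

/*
 * Return the byte offset at which segment 'segnum' starts.
 * Segment #0 is shorter than the others: it begins at +4K, after the
 * first block, which holds the super block at offset +1K.
 */
static uint64_t example_segment_start(uint64_t segnum)
{
	if (segnum == 0)
		return EXAMPLE_BLOCK_SIZE;
	return segnum * EXAMPLE_SEGMENT_SIZE;
}

/* Return the number of the segment that contains byte offset 'off'. */
static uint64_t example_segment_of(uint64_t off)
{
	assert(off >= EXAMPLE_BLOCK_SIZE);  /* below this lies the SB area */
	return off / EXAMPLE_SEGMENT_SIZE;
}
```

With these assumptions, example_segment_start(2) gives the +16M shown in
the figure, and any offset between +4K and +8M maps back to segment #0.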
|
||||||
|
|
||||||
|
The payload blocks are organized per file, and each file consists of
|
||||||
|
data blocks and B-tree node blocks:
|
||||||
|
|
||||||
|
|<--- File-A --->|<--- File-B --->|
|
||||||
|
_______________________________________________________________
|
||||||
|
| Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
|
||||||
|
_|_____________|_______________|_____________|_______________|_
|
||||||
|
|
||||||
|
|
||||||
|
Since only the modified blocks are written in the log, it may have
|
||||||
|
files without data blocks or B-tree node blocks.
|
||||||
|
|
||||||
|
The organization of the blocks is recorded in the summary information
|
||||||
|
blocks, which contains a header structure (nilfs_segment_summary), per
|
||||||
|
file structures (nilfs_finfo), and per block structures (nilfs_binfo):
|
||||||
|
|
||||||
|
_________________________________________________________________________
|
||||||
|
| Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
|
||||||
|
|_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
|
||||||
|
|
||||||
|
|
||||||
|
The logs include regular files, directory files, symbolic link files
|
||||||
|
and several meta data files. The meta data files are the files used
|
||||||
|
to maintain file system meta data. The current version of NILFS2 uses
|
||||||
|
the following meta data files:
|
||||||
|
|
||||||
|
1) Inode file (ifile) -- Stores on-disk inodes
|
||||||
|
2) Checkpoint file (cpfile) -- Stores checkpoints
|
||||||
|
3) Segment usage file (sufile) -- Stores allocation state of segments
|
||||||
|
4) Data address translation file -- Maps virtual block numbers to usual
|
||||||
|
(DAT) block numbers. This file serves to
|
||||||
|
make on-disk blocks relocatable.
|
||||||
|
|
||||||
|
The following figure shows a typical organization of the logs:
|
||||||
|
|
||||||
|
_________________________________________________________________________
|
||||||
|
| Summary | regular file | file | ... | ifile | cpfile | sufile | DAT |SR|
|
||||||
|
|_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|
|
||||||
|
|
||||||
|
|
||||||
|
To stride over segment boundaries, this sequence of files may be split
|
||||||
|
into multiple logs. The sequence of logs that should be treated as
|
||||||
|
logically one log, is delimited with flags marked in the segment
|
||||||
|
summary. The recovery code of nilfs2 looks at this boundary information
|
||||||
|
to ensure atomicity of updates.
|
||||||
|
|
||||||
|
The super root block is inserted for every checkpoint. It includes
|
||||||
|
three special inodes, inodes for the DAT, cpfile, and sufile. Inodes
|
||||||
|
of regular files, directories, symlinks and other special files, are
|
||||||
|
included in the ifile. The inode of ifile itself is included in the
|
||||||
|
corresponding checkpoint entry in the cpfile. Thus, the hierarchy
|
||||||
|
among NILFS2 files can be depicted as follows:
|
||||||
|
|
||||||
|
Super block (SB)
|
||||||
|
|
|
||||||
|
v
|
||||||
|
Super root block (the latest cno=xx)
|
||||||
|
|-- DAT
|
||||||
|
|-- sufile
|
||||||
|
`-- cpfile
|
||||||
|
|-- ifile (cno=c1)
|
||||||
|
|-- ifile (cno=c2) ---- file (ino=i1)
|
||||||
|
: : |-- file (ino=i2)
|
||||||
|
`-- ifile (cno=xx) |-- file (ino=i3)
|
||||||
|
: :
|
||||||
|
`-- file (ino=yy)
|
||||||
|
( regular file, directory, or symlink )
|
||||||
|
|
||||||
|
For details on the format of each file, please see include/linux/nilfs2_fs.h.
|
70
Documentation/filesystems/pohmelfs/design_notes.txt
Normal file
@@ -0,0 +1,70 @@
|
|||||||
|
POHMELFS: Parallel Optimized Host Message Exchange Layered File System.
|
||||||
|
|
||||||
|
Evgeniy Polyakov <zbr@ioremap.net>
|
||||||
|
|
||||||
|
Homepage: http://www.ioremap.net/projects/pohmelfs
|
||||||
|
|
||||||
|
POHMELFS first began as a network filesystem with coherent local data and
|
||||||
|
metadata caches but is now evolving into a parallel distributed filesystem.
|
||||||
|
|
||||||
|
Main features of this FS include:
|
||||||
|
* Locally coherent cache for data and metadata with (potentially) byte-range locks.
|
||||||
|
Since all Linux filesystems lock the whole inode during writing, the algorithm
|
||||||
|
is very simple and does not use byte-ranges, although they are sent in
|
||||||
|
locking messages.
|
||||||
|
* Completely async processing of all events except creation of hard and symbolic
|
||||||
|
links, and rename events.
|
||||||
|
Object creation and data reading and writing are processed asynchronously.
|
||||||
|
* Flexible object architecture optimized for network processing.
|
||||||
|
Ability to create long paths to objects and remove arbitrarily huge
|
||||||
|
directories with a single network command.
|
||||||
|
(like removing the whole kernel tree via a single network command).
|
||||||
|
* Very high performance.
|
||||||
|
* Fast and scalable multithreaded userspace server. Being in userspace it works
|
||||||
|
with any underlying filesystem and is still much faster than the async in-kernel NFS server.
|
||||||
|
* Client is able to switch between different servers (if one goes down, client
|
||||||
|
automatically reconnects to the second, and so on).
|
||||||
|
* Transactions support. Full failover for all operations.
|
||||||
|
Resending transactions to different servers on timeout or error.
|
||||||
|
* Read request (data read, directory listing, lookup requests) balancing between multiple servers.
|
||||||
|
* Write requests are replicated to multiple servers and completed only when all of them are acked.
|
||||||
|
* Ability to add and/or remove servers from the working set at run-time.
|
||||||
|
* Strong authentication and possible data encryption in network channel.
|
||||||
|
* Extended attributes support.
|
||||||
|
|
||||||
|
POHMELFS is based on transactions, which are potentially long-standing objects that live
|
||||||
|
in the client's memory. Each transaction contains all the information needed to process a given
|
||||||
|
command (or set of commands, which is frequently used during data writing: single transactions
|
||||||
|
can contain creation and data writing commands). Transactions are committed by all the servers
|
||||||
|
to which they are sent and, in case of failures, are eventually resent or dropped with an error.
|
||||||
|
For example, reading will return an error if no servers are available.
|
||||||
|
|
||||||
|
POHMELFS uses an asynchronous approach to data processing. Courtesy of transactions, it is
|
||||||
|
possible to detach replies from requests and, if the command requires data to be received, the
|
||||||
|
caller sleeps waiting for it. Thus, it is possible to issue multiple read commands to different
|
||||||
|
servers and async threads will pick up replies in parallel, find appropriate transactions in the
|
||||||
|
system and put the data where it belongs (like the page or inode cache).
|
||||||
|
|
||||||
|
The main feature of POHMELFS is its writeback data and metadata cache.
|
||||||
|
Only a few non-performance critical operations use the write-through cache and
|
||||||
|
are synchronous: hard and symbolic link creation, and object rename. Creation,
|
||||||
|
removal of objects and data writing are asynchronous and are sent to
|
||||||
|
the server during system writeback. Only one writer at a time is allowed for any
|
||||||
|
given inode, which is guarded by an appropriate locking protocol.
|
||||||
|
Because of this feature, POHMELFS is extremely fast at metadata intensive
|
||||||
|
workloads and can fully utilize the bandwidth to the servers when doing bulk
|
||||||
|
data transfers.
|
||||||
|
|
||||||
|
POHMELFS clients operate with a working set of servers and are capable of balancing read-only
|
||||||
|
operations (like lookups or directory listings) between them.
|
||||||
|
Administrators can add or remove servers from the set at run-time via special commands (described
|
||||||
|
in Documentation/filesystems/pohmelfs/info.txt). Writes are replicated to all servers.
|
||||||
|
|
||||||
|
POHMELFS is capable of full data channel encryption and/or strong crypto hashing.
|
||||||
|
One can select any kernel supported cipher, encryption mode, hash type and operation mode
|
||||||
|
(hmac or digest). It is also possible to use both or neither (default). Crypto configuration
|
||||||
|
is checked during mount time and, if the server does not support it, appropriate capabilities
|
||||||
|
will be disabled or mount will fail (if 'crypto_fail_unsupported' mount option is specified).
|
||||||
|
Crypto performance heavily depends on the number of crypto threads, which asynchronously perform
|
||||||
|
crypto operations and send the resulting data to server or submit it up the stack. This number
|
||||||
|
can be controlled via a mount option.
|
86
Documentation/filesystems/pohmelfs/info.txt
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
POHMELFS usage information.
|
||||||
|
|
||||||
|
Mount options:
|
||||||
|
idx=%u
|
||||||
|
Each mountpoint is associated with a special index via this option.
|
||||||
|
Administrator can add or remove servers from the given index, so all mounts,
|
||||||
|
which were attached to it, are updated.
|
||||||
|
Default is 0.
|
||||||
|
|
||||||
|
trans_scan_timeout=%u
|
||||||
|
This timeout, expressed in milliseconds, specifies time to scan transaction
|
||||||
|
trees looking for stale requests, which have to be resent, or if number of
|
||||||
|
retries exceeds the specified limit, dropped with an error.
|
||||||
|
Default is 5 seconds.
|
||||||
|
|
||||||
|
drop_scan_timeout=%u
|
||||||
|
Internal timeout, expressed in milliseconds, which specifies how frequently
|
||||||
|
inodes marked to be dropped are freed. It also specifies how frequently
|
||||||
|
the system checks that servers have to be added or removed from current working set.
|
||||||
|
Default is 1 second.
|
||||||
|
|
||||||
|
wait_on_page_timeout=%u
|
||||||
|
Number of milliseconds to wait for reply from remote server for data reading command.
|
||||||
|
If this timeout is exceeded, reading returns an error.
|
||||||
|
Default is 5 seconds.
|
||||||
|
|
||||||
|
trans_retries=%u
|
||||||
|
This is the number of times that a transaction will be resent to a server that did
|
||||||
|
not answer for the last @trans_scan_timeout milliseconds.
|
||||||
|
When the number of resends exceeds this limit, the transaction is completed with error.
|
||||||
|
Default is 5 resends.
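How trans_scan_timeout and trans_retries interact can be pictured as a
per-transaction decision taken on every scan. The sketch below is only an
illustration of the rule described above, not the POHMELFS
implementation; the type and function names are hypothetical, and
timestamps are assumed to be plain millisecond counters.

```c
#include <assert.h>

enum example_action {
	EXAMPLE_WAIT,    /* not stale yet: a reply may still arrive */
	EXAMPLE_RESEND,  /* stale: send the transaction again */
	EXAMPLE_DROP     /* resend limit reached: complete with an error */
};

struct example_trans {
	unsigned int sent_at_ms;  /* time of the last send or resend */
	unsigned int resends;     /* resends performed so far */
};

/*
 * Decide what a scan at time 'now_ms' should do with transaction 't'.
 * 'scan_timeout_ms' plays the role of trans_scan_timeout and
 * 'max_retries' the role of trans_retries.
 */
static enum example_action example_scan_one(struct example_trans *t,
					    unsigned int now_ms,
					    unsigned int scan_timeout_ms,
					    unsigned int max_retries)
{
	assert(t != 0);
	if (now_ms - t->sent_at_ms < scan_timeout_ms)
		return EXAMPLE_WAIT;
	if (t->resends >= max_retries)
		return EXAMPLE_DROP;  /* one more resend would exceed the limit */
	t->resends++;
	t->sent_at_ms = now_ms;
	return EXAMPLE_RESEND;
}
```

With the defaults (5 seconds, 5 resends), a transaction that never gets
an answer would be resent five times and then dropped on the next scan.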
|
||||||
|
|
||||||
|
crypto_thread_num=%u
|
||||||
|
Number of crypto processing threads. Threads are used both for RX and TX traffic.
|
||||||
|
Default is 2, or no threads if crypto operations are not supported.
|
||||||
|
|
||||||
|
trans_max_pages=%u
|
||||||
|
Maximum number of pages in a single transaction. This parameter also controls
|
||||||
|
the number of pages, allocated for crypto processing (each crypto thread has
|
||||||
|
pool of pages, the number of which is equal to 'trans_max_pages').
|
||||||
|
Default is 100 pages.
|
||||||
|
|
||||||
|
crypto_fail_unsupported
|
||||||
|
If specified, mount will fail if the server does not support requested crypto operations.
|
||||||
|
By default mount will disable non-matching crypto operations.
|
||||||
|
|
||||||
|
mcache_timeout=%u
|
||||||
|
Maximum number of milliseconds to wait for the mcache objects to be processed.
|
||||||
|
Mcache includes locks (given lock should be granted by server), attributes (they should be
|
||||||
|
fully received in the given timeframe).
|
||||||
|
Default is 5 seconds.
|
||||||
|
|
||||||
|
Usage examples.
|
||||||
|
|
||||||
|
Add (or remove if it already exists) server server1.net:1025 into the working set with index $idx
|
||||||
|
with appropriate hash algorithm and key file and cipher algorithm, mode and key file:
|
||||||
|
$cfg -a server1.net -p 1025 -i $idx -K $hash_key -k $cipher_key
|
||||||
|
|
||||||
|
Mount filesystem with given index $idx to /mnt mountpoint.
|
||||||
|
Client will connect to all servers specified in the working set via previous command:
|
||||||
|
mount -t pohmel -o idx=$idx q /mnt
|
||||||
|
|
||||||
|
One can add or remove servers from working set after mounting too.
|
||||||
|
|
||||||
|
|
||||||
|
Server installation.
|
||||||
|
|
||||||
|
Create a server which listens on port 1025 at address 0.0.0.0.
|
||||||
|
Working root directory (note, that server chroots there, so you have to have appropriate permissions)
|
||||||
|
is set to /mnt, server will negotiate hash/cipher with client, in case client requested it, there
|
||||||
|
are appropriate key files.
|
||||||
|
Number of working threads is set to 10.
|
||||||
|
|
||||||
|
# ./fserver -a 0.0.0.0 -p 1025 -r /mnt -w 10 -K hash_key -k cipher_key
|
||||||
|
|
||||||
|
-A 6 - listen on ipv6 address. Default: Disabled.
|
||||||
|
-r root - path to root directory. Default: /tmp.
|
||||||
|
-a addr - listen address. Default: 0.0.0.0.
|
||||||
|
-p port - listen port. Default: 1025.
|
||||||
|
-w workers - number of workers per connected client. Default: 1.
|
||||||
|
-K file - hash key size. Default: none.
|
||||||
|
-k file - cipher key size. Default: none.
|
||||||
|
-h - this help.
|
||||||
|
|
||||||
|
Number of worker threads specifies how many workers will be created for each client.
|
||||||
|
Bulk single-client transfers are usually better handled with a smaller number (like 1-3).
|
227
Documentation/filesystems/pohmelfs/network_protocol.txt
Normal file
@@ -0,0 +1,227 @@
|
|||||||
|
POHMELFS network protocol.
|
||||||
|
|
||||||
|
The basic structure used in network communication is the following command:
|
||||||
|
|
||||||
|
struct netfs_cmd
|
||||||
|
{
|
||||||
|
__u16 cmd; /* Command number */
|
||||||
|
__u16 csize; /* Attached crypto information size */
|
||||||
|
__u16 cpad; /* Attached padding size */
|
||||||
|
__u16 ext; /* External flags */
|
||||||
|
__u32 size; /* Size of the attached data */
|
||||||
|
__u32 trans; /* Transaction id */
|
||||||
|
__u64 id; /* Object ID to operate on. Used for feedback.*/
|
||||||
|
__u64 start; /* Start of the object. */
|
||||||
|
__u64 iv; /* IV sequence */
|
||||||
|
__u8 data[0];
|
||||||
|
};
|
||||||
|
|
||||||
|
Commands can be embedded into transaction command (which in turn has own command),
|
||||||
|
so one can extend protocol as needed without breaking backward compatibility as long
|
||||||
|
as old commands are supported. All string lengths include tail 0 byte.
|
||||||
|
|
||||||
|
All commands are transferred over the network in big-endian. CPU endianness is used at the end peers.
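The byte-order rule can be illustrated with a small userspace sketch
that writes the header fields out most significant byte first. It
assumes that the fields travel in declaration order without padding (40
bytes in total, which matches their natural alignment); the struct is a
stand-in using <stdint.h> types, and all example_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the fixed part of struct netfs_cmd (no data[]). */
struct example_netfs_cmd {
	uint16_t cmd;    /* Command number */
	uint16_t csize;  /* Attached crypto information size */
	uint16_t cpad;   /* Attached padding size */
	uint16_t ext;    /* External flags */
	uint32_t size;   /* Size of the attached data */
	uint32_t trans;  /* Transaction id */
	uint64_t id;     /* Object ID to operate on */
	uint64_t start;  /* Start of the object */
	uint64_t iv;     /* IV sequence */
};

/* 4 x 2 + 2 x 4 + 3 x 8 bytes; assumes the fields are sent unpadded. */
#define EXAMPLE_NETFS_CMD_WIRE_SIZE 40

/* Store the low 'nbytes' bytes of 'v' at 'p', most significant byte first. */
static uint8_t *example_put_be(uint8_t *p, uint64_t v, size_t nbytes)
{
	size_t i;

	for (i = 0; i < nbytes; i++)
		p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
	return p + nbytes;
}

/* Serialize the header into 'out' (EXAMPLE_NETFS_CMD_WIRE_SIZE bytes). */
static void example_netfs_cmd_to_wire(const struct example_netfs_cmd *c,
				      uint8_t *out)
{
	uint8_t *p = out;

	assert(c != NULL && out != NULL);
	p = example_put_be(p, c->cmd, 2);
	p = example_put_be(p, c->csize, 2);
	p = example_put_be(p, c->cpad, 2);
	p = example_put_be(p, c->ext, 2);
	p = example_put_be(p, c->size, 4);
	p = example_put_be(p, c->trans, 4);
	p = example_put_be(p, c->id, 8);
	p = example_put_be(p, c->start, 8);
	p = example_put_be(p, c->iv, 8);
	assert(p - out == EXAMPLE_NETFS_CMD_WIRE_SIZE);
}
```

Writing the bytes explicitly keeps the sketch independent of the host
byte order; the receiving peer would do the reverse to get back to CPU
endianness.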
|
||||||
|
|
||||||
|
@cmd - command number, which specifies command to be processed. Following
|
||||||
|
commands are used currently:
|
||||||
|
|
||||||
|
NETFS_READDIR = 1, /* Read directory for given inode number */
|
||||||
|
NETFS_READ_PAGE, /* Read data page from the server */
|
||||||
|
NETFS_WRITE_PAGE, /* Write data page to the server */
|
||||||
|
NETFS_CREATE, /* Create directory entry */
|
||||||
|
NETFS_REMOVE, /* Remove directory entry */
|
||||||
|
NETFS_LOOKUP, /* Lookup single object */
|
||||||
|
NETFS_LINK, /* Create a link */
|
||||||
|
NETFS_TRANS, /* Transaction */
|
||||||
|
NETFS_OPEN, /* Open intent */
|
||||||
|
NETFS_INODE_INFO, /* Metadata cache coherency synchronization message */
|
||||||
|
NETFS_PAGE_CACHE, /* Page cache invalidation message */
|
||||||
|
NETFS_READ_PAGES, /* Read multiple contiguous pages in one go */
|
||||||
|
NETFS_RENAME, /* Rename object */
|
||||||
|
NETFS_CAPABILITIES, /* Capabilities of the client, for example supported crypto */
|
||||||
|
NETFS_LOCK, /* Distributed lock message */
|
||||||
|
NETFS_XATTR_SET, /* Set extended attribute */
|
||||||
|
NETFS_XATTR_GET, /* Get extended attribute */
|
||||||
|
|
||||||
|
@ext - external flags. Used by different commands to specify some extra arguments
|
||||||
|
like partial size of the embedded objects or creation flags.
|
||||||
|
|
||||||
|
@size - size of the attached data. For NETFS_READ_PAGE and NETFS_READ_PAGES no data is attached,
|
||||||
|
but size of the requested data is incorporated here. It does not include size of the command
|
||||||
|
header (struct netfs_cmd) itself.
|
||||||
|
|
||||||
|
@id - id of the object this command operates on. Each command can use it for own purpose.
|
||||||
|
|
||||||
|
@start - start of the object this command operates on. Each command can use it for own purpose.
|
||||||
|
|
||||||
|
@csize, @cpad - size and padding size of the (attached if needed) crypto information.
|
||||||
|
|
||||||
|
Command specifications.
|
||||||
|
|
||||||
|
@NETFS_READDIR
|
||||||
|
This command is used to sync content of the remote dir to the client.
|
||||||
|
|
||||||
|
@ext - length of the path to object.
|
||||||
|
@size - the same.
|
||||||
|
@id - local inode number of the directory to read.
|
||||||
|
@start - zero.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_READ_PAGE
|
||||||
|
This command is used to read data from remote server.
|
||||||
|
Data size does not exceed local page cache size.
|
||||||
|
|
||||||
|
@id - inode number.
|
||||||
|
@start - first byte offset.
|
||||||
|
@size - number of bytes to read plus length of the path to object.
|
||||||
|
@ext - object path length.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_CREATE
|
||||||
|
Used to create object.
|
||||||
|
It does not require that all directories on top of the object were
|
||||||
|
already created, it will create them automatically. Each object has
|
||||||
|
associated @netfs_path_entry data structure, which contains creation
|
||||||
|
mode (permissions and type) and length of the name as well as the name itself.
|
||||||
|
|
||||||
|
@start - 0
|
||||||
|
@size - size of all the data structures needed to create a path
|
||||||
|
@id - local inode number
|
||||||
|
@ext - 0
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_REMOVE
|
||||||
|
Used to remove object.
|
||||||
|
|
||||||
|
@ext - length of the path to object.
|
||||||
|
@size - the same.
|
||||||
|
@id - local inode number.
|
||||||
|
@start - zero.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_LOOKUP
|
||||||
|
Lookup information about object on server.
|
||||||
|
|
||||||
|
@ext - length of the path to object.
|
||||||
|
@size - the same.
|
||||||
|
@id - local inode number of the directory to look object in.
|
||||||
|
@start - local inode number of the object to look at.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_LINK
|
||||||
|
Create a hard link or symlink.
|
||||||
|
Command is sent as "object_path|target_path".
|
||||||
|
|
||||||
|
@size - size of the above string.
|
||||||
|
@id - parent local inode number.
|
||||||
|
@start - 1 for symlink, 0 for hardlink.
|
||||||
|
@ext - size of the "object_path" above.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_TRANS
|
||||||
|
Transaction header.
|
||||||
|
|
||||||
|
@size - incorporates all embedded command sizes including their header sizes.
|
||||||
|
@start - transaction generation number - unique id used to find transaction.
|
||||||
|
@ext - transaction flags. Unused at the moment.
|
||||||
|
@id - 0.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_OPEN
|
||||||
|
Open intent for given transaction.
|
||||||
|
|
||||||
|
@id - local inode number.
|
||||||
|
@start - 0.
|
||||||
|
@size - path length to the object.
|
||||||
|
@ext - open flags (O_RDWR and so on).
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_INODE_INFO
|
||||||
|
Metadata update command.
|
||||||
|
It is sent to servers when attributes of the object are changed and received
|
||||||
|
when data or metadata were updated. It operates with the following structure:
|
||||||
|
|
||||||
|
struct netfs_inode_info
|
||||||
|
{
|
||||||
|
unsigned int mode;
|
||||||
|
unsigned int nlink;
|
||||||
|
unsigned int uid;
|
||||||
|
unsigned int gid;
|
||||||
|
unsigned int blocksize;
|
||||||
|
unsigned int padding;
|
||||||
|
__u64 ino;
|
||||||
|
__u64 blocks;
|
||||||
|
__u64 rdev;
|
||||||
|
__u64 size;
|
||||||
|
__u64 version;
|
||||||
|
};
|
||||||
|
|
||||||
|
It effectively mirrors stat(2) returned data.
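To make the correspondence with stat(2) concrete, here is a minimal
userspace sketch that fills a stand-in for the structure from a struct
stat. The example_* names are hypothetical, and setting @version to 0 is
an assumption made only because stat(2) has no matching field; this is
not the server's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Userspace stand-in for struct netfs_inode_info. */
struct example_inode_info {
	unsigned int mode;
	unsigned int nlink;
	unsigned int uid;
	unsigned int gid;
	unsigned int blocksize;
	unsigned int padding;
	uint64_t ino;
	uint64_t blocks;
	uint64_t rdev;
	uint64_t size;
	uint64_t version;
};

/* Copy the fields that have a direct counterpart in struct stat. */
static void example_fill_inode_info(struct example_inode_info *info,
				    const struct stat *st)
{
	assert(info != NULL && st != NULL);
	memset(info, 0, sizeof(*info));  /* also clears padding and version */
	info->mode = (unsigned int)st->st_mode;
	info->nlink = (unsigned int)st->st_nlink;
	info->uid = (unsigned int)st->st_uid;
	info->gid = (unsigned int)st->st_gid;
	info->blocksize = (unsigned int)st->st_blksize;
	info->ino = (uint64_t)st->st_ino;
	info->blocks = (uint64_t)st->st_blocks;
	info->rdev = (uint64_t)st->st_rdev;
	info->size = (uint64_t)st->st_size;
	/* version has no stat(2) counterpart; left at 0 in this sketch */
}
```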
|
||||||
|
|
||||||
|
|
||||||
|
@ext - path length to the object.
|
||||||
|
@size - the same plus size of the netfs_inode_info structure.
|
||||||
|
@id - local inode number.
|
||||||
|
@start - 0.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_PAGE_CACHE
|
||||||
|
Command is only received by clients. It contains information about
|
||||||
|
page to be marked as not up-to-date.
|
||||||
|
|
||||||
|
@id - client's inode number.
|
||||||
|
@start - last byte of the page to be invalidated. If it is not equal to
|
||||||
|
current inode size, it will be vmtruncated().
|
||||||
|
@size - 0
|
||||||
|
@ext - 0
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_READ_PAGES
|
||||||
|
Used to read multiple contiguous pages in one go.
|
||||||
|
|
||||||
|
@start - first byte of the contiguous region to read.
|
||||||
|
@size - consists of two fields: lower 8 bits are used to represent page cache shift
|
||||||
|
used by client, another 3 bytes are used to get number of pages.
|
||||||
|
@id - local inode number.
|
||||||
|
@ext - path length to the object.
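The packing of @size for NETFS_READ_PAGES can be sketched as follows.
This is an illustration of the bit layout described above (page cache
shift in the lower 8 bits, number of pages in the remaining 3 bytes);
the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build @size from a page count and the client's page cache shift. */
static uint32_t example_pack_read_pages_size(uint32_t nr_pages,
					     unsigned int page_shift)
{
	assert(nr_pages < (1u << 24));   /* must fit in 3 bytes */
	assert(page_shift < (1u << 8));  /* must fit in 8 bits */
	return (nr_pages << 8) | page_shift;
}

/* Extract the page cache shift (lower 8 bits). */
static unsigned int example_page_shift(uint32_t size)
{
	return size & 0xff;
}

/* Extract the number of pages (upper 3 bytes). */
static uint32_t example_nr_pages(uint32_t size)
{
	return size >> 8;
}

/* Total number of bytes covered by the request. */
static uint64_t example_read_bytes(uint32_t size)
{
	unsigned int shift = example_page_shift(size);

	assert(shift < 64);  /* keep the shift below well defined */
	return (uint64_t)example_nr_pages(size) << shift;
}
```

For a client with 4KB pages (shift 12) asking for 16 pages, @size would
be 0x100c and the request would cover 65536 bytes.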
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_RENAME
|
||||||
|
Used to rename object.
|
||||||
|
Attached data is formed into following string: "old_path|new_path".
|
||||||
|
|
||||||
|
@id - local inode number.
|
||||||
|
@start - parent inode number.
|
||||||
|
@size - length of the above string.
|
||||||
|
@ext - length of the old path part.
|
||||||
|
|
||||||
|
|
||||||
|
@NETFS_CAPABILITIES
|
||||||
|
Used to exchange crypto capabilities with server.
|
||||||
|
If crypto capabilities are not supported by server, then client will disable it
|
||||||
|
or fail (if 'crypto_fail_unsupported' mount option was specified).
|
||||||
|
|
||||||
|
@id - superblock index. Used to specify crypto information for group of servers.
|
||||||
|
@size - size of the attached capabilities structure.
|
||||||
|
@start - 0.
|
||||||
|
@ext - 0.
|
||||||
|
@csize - 0.
|
||||||
|
|
||||||
|
@NETFS_LOCK
|
||||||
|
Used to send lock request/release messages. Although it sends byte range request
|
||||||
|
and is capable of flushing pages based on that, it is not used, since all Linux
|
||||||
|
filesystems lock the whole inode.
|
||||||
|
|
||||||
|
@id - lock generation number.
|
||||||
|
@start - start of the locked range.
|
||||||
|
@size - size of the locked range.
|
||||||
|
@ext - lock type: read/write. Not used actually. The 15th bit is used to determine
|
||||||
|
if it is lock request (1) or release (0).
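A possible way to encode and decode the request/release flag in @ext is
sketched below. It assumes that the bit in question is bit 15 counting
from zero, i.e. the top bit of the 16-bit @ext field; the text does not
say how the bits are counted, so treat the mask as an assumption. The
example_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit 15 (counting from zero) of the 16-bit @ext field. */
#define EXAMPLE_LOCK_REQUEST_BIT (1u << 15)

/* Combine a lock type with the request (1) / release (0) flag. */
static uint16_t example_lock_ext(uint16_t type, int is_request)
{
	assert((type & EXAMPLE_LOCK_REQUEST_BIT) == 0);
	if (is_request)
		return (uint16_t)(type | EXAMPLE_LOCK_REQUEST_BIT);
	return type;
}

/* Return 1 for a lock request, 0 for a release. */
static int example_is_lock_request(uint16_t ext)
{
	return (ext & EXAMPLE_LOCK_REQUEST_BIT) != 0;
}

/* Return the lock type with the request flag masked out. */
static uint16_t example_lock_type(uint16_t ext)
{
	return (uint16_t)(ext & 0x7fff);
}
```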
|
||||||
|
|
||||||
|
@NETFS_XATTR_SET
|
||||||
|
@NETFS_XATTR_GET
|
||||||
|
Used to set/get extended attributes for given inode.
|
||||||
|
@id - attribute generation number or xattr setting type
|
||||||
|
@start - size of the attribute (request or attached)
|
||||||
|
@size - name length, path len and data size for given attribute
|
||||||
|
@ext - path length for given object
|
File diff suppressed because it is too large
@@ -22,7 +22,7 @@ Squashfs filesystem features versus Cramfs:
|
|||||||
|
|
||||||
Squashfs Cramfs
|
Squashfs Cramfs
|
||||||
|
|
||||||
Max filesystem size: 2^64 16 MiB
|
Max filesystem size: 2^64 256 MiB
|
||||||
Max file size: ~ 2 TiB 16 MiB
|
Max file size: ~ 2 TiB 16 MiB
|
||||||
Max files: unlimited unlimited
|
Max files: unlimited unlimited
|
||||||
Max directories: unlimited unlimited
|
Max directories: unlimited unlimited
|
||||||
|
@@ -9,8 +9,10 @@ that support it. For example, a given bus might look like this:
|
|||||||
| |-- class
|
| |-- class
|
||||||
| |-- config
|
| |-- config
|
||||||
| |-- device
|
| |-- device
|
||||||
|
| |-- enable
|
||||||
| |-- irq
|
| |-- irq
|
||||||
| |-- local_cpus
|
| |-- local_cpus
|
||||||
|
| |-- remove
|
||||||
| |-- resource
|
| |-- resource
|
||||||
| |-- resource0
|
| |-- resource0
|
||||||
| |-- resource1
|
| |-- resource1
|
||||||
@@ -32,8 +34,10 @@ files, each with their own function.
|
|||||||
class PCI class (ascii, ro)
|
class PCI class (ascii, ro)
|
||||||
config PCI config space (binary, rw)
|
config PCI config space (binary, rw)
|
||||||
device PCI device (ascii, ro)
|
device PCI device (ascii, ro)
|
||||||
|
enable Whether the device is enabled (ascii, rw)
|
||||||
irq IRQ number (ascii, ro)
|
irq IRQ number (ascii, ro)
|
||||||
local_cpus nearby CPU mask (cpumask, ro)
|
local_cpus nearby CPU mask (cpumask, ro)
|
||||||
|
remove remove device from kernel's list (ascii, wo)
|
||||||
resource PCI resource host addresses (ascii, ro)
|
resource PCI resource host addresses (ascii, ro)
|
||||||
resource0..N PCI resource N, if present (binary, mmap)
|
resource0..N PCI resource N, if present (binary, mmap)
|
||||||
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
|
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
|
||||||
@@ -44,6 +48,7 @@ files, each with their own function.
|
|||||||
|
|
||||||
ro - read only file
|
ro - read only file
|
||||||
rw - file is readable and writable
|
rw - file is readable and writable
|
||||||
|
wo - write only file
|
||||||
mmap - file is mmapable
|
mmap - file is mmapable
|
||||||
ascii - file contains ascii text
|
ascii - file contains ascii text
|
||||||
binary - file contains binary data
|
binary - file contains binary data
|
||||||
@@ -57,10 +62,26 @@ used to do actual device programming from userspace. Note that some platforms
|
|||||||
don't support mmapping of certain resources, so be sure to check the return
|
don't support mmapping of certain resources, so be sure to check the return
|
||||||
value from any attempted mmap.
|
value from any attempted mmap.
|
||||||
|
|
||||||
|
The 'enable' file provides a counter that indicates how many times the device
|
||||||
|
has been enabled. If the 'enable' file currently returns '4', and a '1' is
|
||||||
|
echoed into it, it will then return '5'. Echoing a '0' into it will decrease
|
||||||
|
the count. Even when it returns to 0, though, some of the initialisation
|
||||||
|
may not be reversed.
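From userspace, the 'enable' counter is just a small text file. The
helpers below are a minimal sketch of writing a value to such an
attribute and reading a decimal number back; they work on any path, and
the example_* names are hypothetical. Error handling is reduced to a
0/-1 return value.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to an attribute file; returns 0 on success. */
static int example_attr_write(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)  /* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Read a decimal number from an attribute file; returns 0 on success. */
static int example_attr_read_long(const char *path, long *out)
{
	FILE *f;
	int n;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%ld", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```

Calling example_attr_write() with "1" on a device's 'enable' file and
then example_attr_read_long() on the same file would show the counter
behaviour described above; this needs sufficient privileges, and the
device path depends on the system.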
|
||||||
|
|
||||||
The 'rom' file is special in that it provides read-only access to the device's
|
The 'rom' file is special in that it provides read-only access to the device's
|
||||||
ROM file, if available. It's disabled by default, however, so applications
|
ROM file, if available. It's disabled by default, however, so applications
|
||||||
should write the string "1" to the file to enable it before attempting a read
|
should write the string "1" to the file to enable it before attempting a read
|
||||||
call, and disable it following the access by writing "0" to the file.
|
call, and disable it following the access by writing "0" to the file. Note
|
||||||
|
that the device must be enabled for a rom read to return data successfully.
|
||||||
|
In the event a driver is not bound to the device, it can be enabled using the
|
||||||
|
'enable' file, documented above.
|
||||||
|
|
||||||
|
The 'remove' file is used to remove the PCI device, by writing a non-zero
|
||||||
|
integer to the file. This does not involve any kind of hot-plug functionality,
|
||||||
|
e.g. powering off the device. The device is removed from the kernel's list of
|
||||||
|
PCI devices, the sysfs directory for it is removed, and the device will be
|
||||||
|
removed from any drivers attached to it. Removal of PCI root buses is
|
||||||
|
disallowed.
|
||||||
|
|
||||||
Accessing legacy resources through sysfs
|
Accessing legacy resources through sysfs
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
@@ -2,8 +2,10 @@
|
|||||||
sysfs - _The_ filesystem for exporting kernel objects.
|
sysfs - _The_ filesystem for exporting kernel objects.
|
||||||
|
|
||||||
Patrick Mochel <mochel@osdl.org>
|
Patrick Mochel <mochel@osdl.org>
|
||||||
|
Mike Murphy <mamurph@cs.clemson.edu>
|
||||||
|
|
||||||
10 January 2003
|
Revised: 22 February 2009
|
||||||
|
Original: 10 January 2003
|
||||||
|
|
||||||
|
|
||||||
What it is:
|
What it is:
|
||||||
@@ -64,12 +66,13 @@ An attribute definition is simply:
|
|||||||
|
|
||||||
struct attribute {
|
struct attribute {
|
||||||
char * name;
|
char * name;
|
||||||
|
struct module *owner;
|
||||||
mode_t mode;
|
mode_t mode;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
int sysfs_create_file(struct kobject * kobj, struct attribute * attr);
|
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
|
||||||
void sysfs_remove_file(struct kobject * kobj, struct attribute * attr);
|
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
|
||||||
|
|
||||||
|
|
||||||
A bare attribute contains no means to read or write the value of the
|
A bare attribute contains no means to read or write the value of the
|
||||||
@@ -81,8 +84,10 @@ For example, the driver model defines struct device_attribute like:
|
|||||||
|
|
||||||
struct device_attribute {
|
struct device_attribute {
|
||||||
struct attribute attr;
|
struct attribute attr;
|
||||||
ssize_t (*show)(struct device * dev, char * buf);
|
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||||
ssize_t (*store)(struct device * dev, const char * buf);
|
char *buf);
|
||||||
|
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||||
|
const char *buf, size_t count);
|
||||||
};
|
};
|
||||||
|
|
||||||
int device_create_file(struct device *, struct device_attribute *);
|
int device_create_file(struct device *, struct device_attribute *);
|
||||||
@@ -91,11 +96,7 @@ void device_remove_file(struct device *, struct device_attribute *);
|
|||||||
It also defines this helper for defining device attributes:
|
It also defines this helper for defining device attributes:
|
||||||
|
|
||||||
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
#define DEVICE_ATTR(_name, _mode, _show, _store) \
|
||||||
struct device_attribute dev_attr_##_name = { \
|
struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
|
||||||
.attr = {.name = __stringify(_name) , .mode = _mode }, \
|
|
||||||
.show = _show, \
|
|
||||||
.store = _store, \
|
|
||||||
};
|
|
||||||
|
|
||||||
For example, declaring
|
For example, declaring
|
||||||
|
|
||||||
@@ -107,9 +108,9 @@ static struct device_attribute dev_attr_foo = {
|
|||||||
.attr = {
|
.attr = {
|
||||||
.name = "foo",
|
.name = "foo",
|
||||||
.mode = S_IWUSR | S_IRUGO,
|
.mode = S_IWUSR | S_IRUGO,
|
||||||
},
|
|
||||||
.show = show_foo,
|
.show = show_foo,
|
||||||
.store = store_foo,
|
.store = store_foo,
|
||||||
|
},
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|
||||||
@@ -161,10 +162,12 @@ To read or write attributes, show() or store() methods must be
|
|||||||
specified when declaring the attribute. The method types should be as
|
specified when declaring the attribute. The method types should be as
|
||||||
simple as those defined for device attributes:
|
simple as those defined for device attributes:
|
||||||
|
|
||||||
ssize_t (*show)(struct device * dev, char * buf);
|
ssize_t (*show)(struct device * dev, struct device_attribute * attr,
|
||||||
ssize_t (*store)(struct device * dev, const char * buf);
|
char * buf);
|
||||||
|
ssize_t (*store)(struct device * dev, struct device_attribute * attr,
|
||||||
|
const char * buf, size_t count);
|
||||||
|
|
||||||
IOW, they should take only an object and a buffer as parameters.
|
IOW, they should take only an object, an attribute, and a buffer as parameters.
|
||||||
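A minimal userspace sketch of a show()/store() pair, following the struct device_attribute definition (store() with a size_t count). The two structs are stand-ins for the kernel types, and MOCK_PAGE_SIZE and the foo field are assumptions made for illustration; real callbacks live in kernel code and use the kernel's own headers.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MOCK_PAGE_SIZE 4096	/* stand-in for the kernel's PAGE_SIZE */

/* Userspace stand-ins for the kernel types; only what the sketch needs. */
struct device { int foo; };
struct device_attribute { const char *name; };

/* show(): format the value into the page-sized buffer, return bytes used. */
static ssize_t show_foo(struct device *dev, struct device_attribute *attr,
			char *buf)
{
	(void)attr;
	return snprintf(buf, MOCK_PAGE_SIZE, "%d\n", dev->foo);
}

/* store(): parse the user's text, return count to signal it was consumed. */
static ssize_t store_foo(struct device *dev, struct device_attribute *attr,
			 const char *buf, size_t count)
{
	char *end;
	long val;

	(void)attr;
	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno != 0)
		return -EINVAL;
	dev->foo = (int)val;
	return (ssize_t)count;
}
```

Returning count from store() tells the caller the whole write was accepted; a negative errno value reports a parse failure without touching the stored value.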
|
|
||||||
|
|
||||||
sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
|
sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
|
||||||
@@ -300,13 +303,15 @@ Structure:
|
|||||||
|
|
||||||
struct device_attribute {
|
struct device_attribute {
|
||||||
struct attribute attr;
|
struct attribute attr;
|
||||||
ssize_t (*show)(struct device * dev, char * buf);
|
ssize_t (*show)(struct device *dev, struct device_attribute *attr,
|
||||||
ssize_t (*store)(struct device * dev, const char * buf);
|
char *buf);
|
||||||
|
ssize_t (*store)(struct device *dev, struct device_attribute *attr,
|
||||||
|
const char *buf, size_t count);
|
||||||
};
|
};
|
||||||
|
|
||||||
Declaring:
|
Declaring:
|
||||||
|
|
||||||
DEVICE_ATTR(_name, _str, _mode, _show, _store);
|
DEVICE_ATTR(_name, _mode, _show, _store);
|
||||||
|
|
||||||
Creation/Removal:
|
Creation/Removal:
|
||||||
|
|
||||||
@@ -342,7 +347,8 @@ Structure:
|
|||||||
struct driver_attribute {
|
struct driver_attribute {
|
||||||
struct attribute attr;
|
struct attribute attr;
|
||||||
ssize_t (*show)(struct device_driver *, char * buf);
|
ssize_t (*show)(struct device_driver *, char * buf);
|
||||||
ssize_t (*store)(struct device_driver *, const char * buf);
|
ssize_t (*store)(struct device_driver *, const char * buf,
|
||||||
|
size_t count);
|
||||||
};
|
};
|
||||||
|
|
||||||
Declaring:
|
Declaring:
|
||||||
|
@@ -24,6 +24,8 @@ The following mount options are supported:
|
|||||||
|
|
||||||
gid= Set the default group.
|
gid= Set the default group.
|
||||||
umask= Set the default umask.
|
umask= Set the default umask.
|
||||||
|
mode= Set the default file permissions.
|
||||||
|
dmode= Set the default directory permissions.
|
||||||
uid= Set the default user.
|
uid= Set the default user.
|
||||||
bs= Set the block size.
|
bs= Set the block size.
|
||||||
unhide Show otherwise hidden files.
|
unhide Show otherwise hidden files.
|
||||||
|
File diff suppressed because it is too large
@@ -123,7 +123,10 @@ platform-specific implementation issue.
|
|||||||
|
|
||||||
Using GPIOs
|
Using GPIOs
|
||||||
-----------
|
-----------
|
||||||
One of the first things to do with a GPIO, often in board setup code when
|
The first thing a system should do with a GPIO is allocate it, using
|
||||||
|
the gpio_request() call; see later.
|
||||||
|
|
||||||
|
One of the next things to do with a GPIO, often in board setup code when
|
||||||
setting up a platform_device using the GPIO, is mark its direction:
|
setting up a platform_device using the GPIO, is mark its direction:
|
||||||
|
|
||||||
/* set as input or output, returning 0 or negative errno */
|
/* set as input or output, returning 0 or negative errno */
|
||||||
@@ -141,8 +144,8 @@ This helps avoid signal glitching during system startup.
|
|||||||
|
|
||||||
For compatibility with legacy interfaces to GPIOs, setting the direction
|
For compatibility with legacy interfaces to GPIOs, setting the direction
|
||||||
of a GPIO implicitly requests that GPIO (see below) if it has not been
|
of a GPIO implicitly requests that GPIO (see below) if it has not been
|
||||||
requested already. That compatibility may be removed in the future;
|
requested already. That compatibility is being removed from the optional
|
||||||
explicitly requesting GPIOs is strongly preferred.
|
gpiolib framework.
|
||||||
|
|
||||||
Setting the direction can fail if the GPIO number is invalid, or when
|
Setting the direction can fail if the GPIO number is invalid, or when
|
||||||
that particular GPIO can't be used in that mode. It's generally a bad
|
that particular GPIO can't be used in that mode. It's generally a bad
|
||||||
@@ -195,7 +198,7 @@ This requires sleeping, which can't be done from inside IRQ handlers.
|
|||||||
|
|
||||||
Platforms that support this type of GPIO distinguish them from other GPIOs
|
Platforms that support this type of GPIO distinguish them from other GPIOs
|
||||||
by returning nonzero from this call (which requires a valid GPIO number,
|
by returning nonzero from this call (which requires a valid GPIO number,
|
||||||
either explicitly or implicitly requested):
|
which should have been previously allocated with gpio_request):
|
||||||
|
|
||||||
int gpio_cansleep(unsigned gpio);
|
int gpio_cansleep(unsigned gpio);
|
||||||
|
|
||||||
@@ -212,10 +215,9 @@ for GPIOs that can't be accessed from IRQ handlers, these calls act the
|
|||||||
same as the spinlock-safe calls.
|
same as the spinlock-safe calls.
|
||||||
|
|
||||||
|
|
||||||
Claiming and Releasing GPIOs (OPTIONAL)
|
Claiming and Releasing GPIOs
|
||||||
---------------------------------------
|
----------------------------
|
||||||
To help catch system configuration errors, two calls are defined.
|
To help catch system configuration errors, two calls are defined.
|
||||||
However, many platforms don't currently support this mechanism.
|
|
||||||
|
|
||||||
/* request GPIO, returning 0 or negative errno.
|
/* request GPIO, returning 0 or negative errno.
|
||||||
* non-null labels may be useful for diagnostics.
|
* non-null labels may be useful for diagnostics.
|
||||||
@@ -244,13 +246,6 @@ Some platforms may also use knowledge about what GPIOs are active for
|
|||||||
power management, such as by powering down unused chip sectors and, more
|
power management, such as by powering down unused chip sectors and, more
|
||||||
easily, gating off unused clocks.
|
easily, gating off unused clocks.
|
||||||
|
|
||||||
These two calls are optional because not all current Linux platforms
|
|
||||||
offer such functionality in their GPIO support; a valid implementation
|
|
||||||
could return success for all gpio_request() calls. Unlike the other calls,
|
|
||||||
the state they represent doesn't normally match anything from a hardware
|
|
||||||
register; it's just a software bitmap which clearly is not necessary for
|
|
||||||
correct operation of hardware or (bug free) drivers.
|
|
||||||
|
|
||||||
Note that requesting a GPIO does NOT cause it to be configured in any
|
Note that requesting a GPIO does NOT cause it to be configured in any
|
||||||
way; it just marks that GPIO as in use. Separate code must handle any
|
way; it just marks that GPIO as in use. Separate code must handle any
|
||||||
pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown).
|
pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown).
|
||||||
|
@@ -49,12 +49,9 @@ of up to +/- 0.5 degrees even when compared against precise temperature
|
|||||||
readings. Be sure to have a high vs. low temperature limit gap of at least
|
readings. Be sure to have a high vs. low temperature limit gap of at least
|
||||||
1.0 degree Celsius to avoid Tout "bouncing", though!
|
1.0 degree Celsius to avoid Tout "bouncing", though!
|
||||||
|
|
||||||
As for alarms, you can read the alarm status of the DS1621 via the 'alarms'
|
The alarm bits are set when the high or low limits are met or exceeded and
|
||||||
/sys file interface. The result consists mainly of bit 6 and 5 of the
|
are reset by the module as soon as the respective temperature ranges are
|
||||||
configuration register of the chip; bit 6 (0x40 or 64) is the high alarm
|
left.
|
||||||
bit and bit 5 (0x20 or 32) the low one. These bits are set when the high or
|
|
||||||
low limits are met or exceeded and are reset by the module as soon as the
|
|
||||||
respective temperature ranges are left.
|
|
||||||
|
|
||||||
The alarm registers are in no way suitable to find out about the actual
|
The alarm registers are in no way suitable to find out about the actual
|
||||||
status of Tout. They will only tell you about its history, whether or not
|
status of Tout. They will only tell you about its history, whether or not
|
||||||
@@ -64,45 +61,3 @@ with neither of the alarms set.
|
|||||||
|
|
||||||
Temperature conversion of the DS1621 takes up to 1000ms; internal access to
|
Temperature conversion of the DS1621 takes up to 1000ms; internal access to
|
||||||
non-volatile registers may last for 10ms or below.
|
non-volatile registers may last for 10ms or below.
|
||||||
|
|
||||||
High Accuracy Temperature Reading
|
|
||||||
---------------------------------
|
|
||||||
|
|
||||||
As said before, the temperature issued via the 9-bit i2c-bus data is
|
|
||||||
somewhat arbitrary. Internally, the temperature conversion is of a
|
|
||||||
different kind that is explained (not so...) well in the DS1621 data sheet.
|
|
||||||
To cut the long story short: Inside the DS1621 there are two oscillators,
|
|
||||||
both of them biassed by a temperature coefficient.
|
|
||||||
|
|
||||||
Higher resolution of the temperature reading can be achieved using the
|
|
||||||
internal projection, which means taking account of REG_COUNT and REG_SLOPE
|
|
||||||
(the driver manages them):
|
|
||||||
|
|
||||||
Taken from Dallas Semiconductors App Note 068: 'Increasing Temperature
|
|
||||||
Resolution on the DS1620' and App Note 105: 'High Resolution Temperature
|
|
||||||
Measurement with Dallas Direct-to-Digital Temperature Sensors'
|
|
||||||
|
|
||||||
- Read the 9-bit temperature and strip the LSB (Truncate the .5 degs)
|
|
||||||
- The resulting value is TEMP_READ.
|
|
||||||
- Then, read REG_COUNT.
|
|
||||||
- And then, REG_SLOPE.
|
|
||||||
|
|
||||||
TEMP = TEMP_READ - 0.25 + ((REG_SLOPE - REG_COUNT) / REG_SLOPE)
|
|
||||||
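A small worked sketch of the formula quoted above. The function name ds1621_hires_temp() is hypothetical; it takes TEMP_READ already truncated to whole degrees, plus the raw REG_COUNT and REG_SLOPE values, and assumes REG_SLOPE is non-zero.

```c
/* Apply TEMP = TEMP_READ - 0.25 + (REG_SLOPE - REG_COUNT) / REG_SLOPE.
 * temp_read: 9-bit reading with the half-degree LSB stripped, in whole degC.
 * count, slope: raw REG_COUNT and REG_SLOPE values (slope must be non-zero). */
static double ds1621_hires_temp(int temp_read, unsigned int count,
				unsigned int slope)
{
	return temp_read - 0.25 +
	       ((double)slope - (double)count) / (double)slope;
}
```

For example, TEMP_READ = 25 with REG_COUNT = 8 and REG_SLOPE = 16 gives 25 - 0.25 + 8/16 = 25.25 degrees Celsius.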
|
|
||||||
Note that this is what the DONE bit in the DS1621 configuration register is
|
|
||||||
good for: Internally, one temperature conversion takes up to 1000ms. Before
|
|
||||||
that conversion is complete you will not be able to read valid things out
|
|
||||||
of REG_COUNT and REG_SLOPE. The DONE bit, as you may have guessed by now,
|
|
||||||
tells you whether the conversion is complete ("done", in plain English) and
|
|
||||||
thus, whether the values you read are good or not.
|
|
||||||
|
|
||||||
The DS1621 has two modes of operation: "Continuous" conversion, which can
|
|
||||||
be understood as the default stand-alone mode where the chip gets the
|
|
||||||
temperature and controls external devices via its Tout pin or tells other
|
|
||||||
i2c's about it if they care. The other mode is called "1SHOT", that means
|
|
||||||
that it only figures out about the temperature when it is explicitly told
|
|
||||||
to do so; this can be seen as power saving mode.
|
|
||||||
|
|
||||||
Now if you want to read REG_COUNT and REG_SLOPE, you have to either stop
|
|
||||||
the continuous conversions until the contents of these registers are valid,
|
|
||||||
or, in 1SHOT mode, you have to have one conversion made.
|
|
||||||
|
36
Documentation/hwmon/g760a
Normal file
@@ -0,0 +1,36 @@
|
|||||||
|
Kernel driver g760a
|
||||||
|
===================
|
||||||
|
|
||||||
|
Supported chips:
|
||||||
|
* Global Mixed-mode Technology Inc. G760A
|
||||||
|
Prefix: 'g760a'
|
||||||
|
Datasheet: Publicly available at the GMT website
|
||||||
|
http://www.gmt.com.tw/datasheet/g760a.pdf
|
||||||
|
|
||||||
|
Author: Herbert Valerio Riedel <hvr@gnu.org>
|
||||||
|
|
||||||
|
Description
|
||||||
|
-----------
|
||||||
|
|
||||||
|
The GMT G760A Fan Speed PWM Controller is connected directly to a fan
|
||||||
|
and performs closed-loop control of the fan speed.
|
||||||
|
|
||||||
|
The fan speed is programmed by setting the period via 'pwm1' of two
|
||||||
|
consecutive speed pulses. The period is defined in terms of clock
|
||||||
|
cycle counts of an assumed 32kHz clock source.
|
||||||
|
|
||||||
|
Setting a period of 0 stops the fan; setting the period to 255 sets
|
||||||
|
the fan to maximum speed.
|
||||||
|
|
||||||
|
The measured fan rotation speed returned via 'fan1_input' is derived
|
||||||
|
from the measured speed pulse period by assuming again a 32kHz clock
|
||||||
|
source and a 2 pulse-per-revolution fan.
|
||||||
|
|
||||||
|
The 'alarms' file provides access to the two alarm bits provided by
|
||||||
|
the G760A chip's status register: Bit 0 is set when the actual fan
|
||||||
|
speed differs by more than 20% from the programmed fan speed;
|
||||||
|
bit 1 is set when fan speed is below 1920 RPM.
|
||||||
|
|
||||||
|
The g760a driver will not update its values more frequently than every
|
||||||
|
other second; reading them more often will do no harm, but will return
|
||||||
|
'old' values.
|
101
Documentation/hwmon/hpfall.c
Normal file
@@ -0,0 +1,101 @@
|
|||||||
|
/* Disk protection for HP machines.
|
||||||
|
*
|
||||||
|
* Copyright 2008 Eric Piel
|
||||||
|
* Copyright 2009 Pavel Machek <pavel@suse.cz>
|
||||||
|
*
|
||||||
|
* GPLv2.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <sys/stat.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <signal.h>
|
||||||
|
|
||||||
|
void write_int(char *path, int i)
|
||||||
|
{
|
||||||
|
char buf[1024];
|
||||||
|
int fd = open(path, O_RDWR);
|
||||||
|
if (fd < 0) {
|
||||||
|
perror("open");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
sprintf(buf, "%d", i);
|
||||||
|
if (write(fd, buf, strlen(buf)) != strlen(buf)) {
|
||||||
|
perror("write");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
close(fd);
|
||||||
|
}
|
||||||
|
|
||||||
|
void set_led(int on)
|
||||||
|
{
|
||||||
|
write_int("/sys/class/leds/hp::hddprotect/brightness", on);
|
||||||
|
}
|
||||||
|
|
||||||
|
void protect(int seconds)
|
||||||
|
{
|
||||||
|
write_int("/sys/block/sda/device/unload_heads", seconds*1000);
|
||||||
|
}
|
||||||
|
|
||||||
|
int on_ac(void)
|
||||||
|
{
|
||||||
|
return 1; /* stub: should read /sys/class/power_supply/AC0/online */
|
||||||
|
}
|
||||||
|
|
||||||
|
int lid_open(void)
|
||||||
|
{
|
||||||
|
return 1; /* stub: should read /proc/acpi/button/lid/LID/state */
|
||||||
|
}
|
||||||
|
|
||||||
|
void ignore_me(int signum)
|
||||||
|
{
|
||||||
|
protect(0);
|
||||||
|
set_led(0);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char* argv[])
|
||||||
|
{
|
||||||
|
int fd, ret;
|
||||||
|
|
||||||
|
fd = open("/dev/freefall", O_RDONLY);
|
||||||
|
if (fd < 0) {
|
||||||
|
perror("open");
|
||||||
|
return EXIT_FAILURE;
|
||||||
|
}
|
||||||
|
|
||||||
|
signal(SIGALRM, ignore_me);
|
||||||
|
|
||||||
|
for (;;) {
|
||||||
|
unsigned char count;
|
||||||
|
|
||||||
|
ret = read(fd, &count, sizeof(count));
|
||||||
|
alarm(0);
|
||||||
|
if ((ret == -1) && (errno == EINTR)) {
|
||||||
|
/* Alarm expired, time to unpark the heads */
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (ret != sizeof(count)) {
|
||||||
|
perror("read");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
protect(21);
|
||||||
|
set_led(1);
|
||||||
|
if (1 || on_ac() || lid_open()) {
|
||||||
|
alarm(2);
|
||||||
|
} else {
|
||||||
|
alarm(20);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
close(fd);
|
||||||
|
return EXIT_SUCCESS;
|
||||||
|
}
|
@@ -1,11 +1,11 @@
|
|||||||
Kernel driver lis3lv02d
|
Kernel driver lis3lv02d
|
||||||
==================
|
=======================
|
||||||
|
|
||||||
Supported chips:
|
Supported chips:
|
||||||
|
|
||||||
* STMicroelectronics LIS3LV02DL and LIS3LV02DQ
|
* STMicroelectronics LIS3LV02DL and LIS3LV02DQ
|
||||||
|
|
||||||
Author:
|
Authors:
|
||||||
Yan Burman <burman.yan@gmail.com>
|
Yan Burman <burman.yan@gmail.com>
|
||||||
Eric Piel <eric.piel@tremplin-utc.net>
|
Eric Piel <eric.piel@tremplin-utc.net>
|
||||||
|
|
||||||
@@ -15,7 +15,7 @@ Description
|
|||||||
|
|
||||||
This driver provides support for the accelerometer found in various HP
|
This driver provides support for the accelerometer found in various HP
|
||||||
laptops sporting the feature officially called "HP Mobile Data
|
laptops sporting the feature officially called "HP Mobile Data
|
||||||
Protection System 3D" or "HP 3D DriveGuard". It detect automatically
|
Protection System 3D" or "HP 3D DriveGuard". It detects automatically
|
||||||
laptops with this sensor. Known models (for now the HP 2133, nc6420,
|
laptops with this sensor. Known models (for now the HP 2133, nc6420,
|
||||||
nc2510, nc8510, nc84x0, nw9440 and nx9420) will have their axes
|
nc2510, nc8510, nc84x0, nw9440 and nx9420) will have their axes
|
||||||
automatically oriented in the standard way (e.g. you can directly play
|
automatically oriented in the standard way (e.g. you can directly play
|
||||||
@@ -33,6 +33,14 @@ rate - reports the sampling rate of the accelerometer device in HZ
|
|||||||
This driver also provides an absolute input class device, allowing
|
This driver also provides an absolute input class device, allowing
|
||||||
the laptop to act as a pinball machine-esque joystick.
|
the laptop to act as a pinball machine-esque joystick.
|
||||||
|
|
||||||
|
Another feature of the driver is a misc device called "freefall" that
|
||||||
|
acts similarly to /dev/rtc and reacts to free-fall interrupts received
|
||||||
|
from the device. It supports blocking operations, poll/select and
|
||||||
|
fasync operation modes. You must read 1 byte from the device. The
|
||||||
|
result is the number of free-fall interrupts since the last successful
|
||||||
|
read (or 255 if the number of interrupts would not fit).
|
||||||
|
|
||||||
|
|
||||||
Axes orientation
|
Axes orientation
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
@@ -40,7 +48,7 @@ For better compatibility between the various laptops. The values reported by
|
|||||||
the accelerometer are converted into a "standard" organisation of the axes
|
the accelerometer are converted into a "standard" organisation of the axes
|
||||||
(aka "can play neverball out of the box"):
|
(aka "can play neverball out of the box"):
|
||||||
* When the laptop is horizontal the position reported is about 0 for X and Y
|
* When the laptop is horizontal the position reported is about 0 for X and Y
|
||||||
and a positive value for Z
|
and a positive value for Z
|
||||||
* If the left side is elevated, X increases (becomes positive)
|
* If the left side is elevated, X increases (becomes positive)
|
||||||
* If the front side (where the touchpad is) is elevated, Y decreases
|
* If the front side (where the touchpad is) is elevated, Y decreases
|
||||||
(becomes negative)
|
(becomes negative)
|
||||||
@@ -51,3 +59,13 @@ email to the authors to add it to the database. When reporting a new
|
|||||||
laptop, please include the output of "dmidecode" plus the value of
|
laptop, please include the output of "dmidecode" plus the value of
|
||||||
/sys/devices/platform/lis3lv02d/position in these four cases.
|
/sys/devices/platform/lis3lv02d/position in these four cases.
|
||||||
|
|
||||||
|
Q&A
|
||||||
|
---
|
||||||
|
|
||||||
|
Q: How do I safely simulate freefall? I have an HP "portable
|
||||||
|
workstation" which weighs about 3.5kg and has a plastic case, so letting it
|
||||||
|
fall to the ground is out of the question...
|
||||||
|
|
||||||
|
A: The sensor is pretty sensitive, so your hands can do it. Lift it
|
||||||
|
into free space, follow the fall with your hands for about 10
|
||||||
|
centimeters. That should be enough to trigger the detection.
|
||||||
|
@@ -42,6 +42,11 @@ Supported chips:
|
|||||||
Addresses scanned: I2C 0x4e
|
Addresses scanned: I2C 0x4e
|
||||||
Datasheet: Publicly available at the Maxim website
|
Datasheet: Publicly available at the Maxim website
|
||||||
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3497
|
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3497
|
||||||
|
* Maxim MAX6648
|
||||||
|
Prefix: 'max6646'
|
||||||
|
Addresses scanned: I2C 0x4c
|
||||||
|
Datasheet: Publicly available at the Maxim website
|
||||||
|
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500
|
||||||
* Maxim MAX6649
|
* Maxim MAX6649
|
||||||
Prefix: 'max6646'
|
Prefix: 'max6646'
|
||||||
Addresses scanned: I2C 0x4c
|
Addresses scanned: I2C 0x4c
|
||||||
@@ -74,6 +79,11 @@ Supported chips:
|
|||||||
0x4c, 0x4d and 0x4e
|
0x4c, 0x4d and 0x4e
|
||||||
Datasheet: Publicly available at the Maxim website
|
Datasheet: Publicly available at the Maxim website
|
||||||
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3370
|
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3370
|
||||||
|
* Maxim MAX6692
|
||||||
|
Prefix: 'max6646'
|
||||||
|
Addresses scanned: I2C 0x4c
|
||||||
|
Datasheet: Publicly available at the Maxim website
|
||||||
|
http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500
|
||||||
|
|
||||||
|
|
||||||
Author: Jean Delvare <khali@linux-fr.org>
|
Author: Jean Delvare <khali@linux-fr.org>
|
||||||
|
50
Documentation/hwmon/ltc4215
Normal file
@@ -0,0 +1,50 @@
|
|||||||
|
Kernel driver ltc4215
|
||||||
|
=====================
|
||||||
|
|
||||||
|
Supported chips:
|
||||||
|
* Linear Technology LTC4215
|
||||||
|
Prefix: 'ltc4215'
|
||||||
|
Addresses scanned: 0x44
|
||||||
|
Datasheet:
|
||||||
|
http://www.linear.com/pc/downloadDocument.do?navId=H0,C1,C1003,C1006,C1163,P17572,D12697
|
||||||
|
|
||||||
|
Author: Ira W. Snyder <iws@ovro.caltech.edu>
|
||||||
|
|
||||||
|
|
||||||
|
Description
|
||||||
|
-----------
|
||||||
|
|
||||||
|
The LTC4215 controller allows a board to be safely inserted and removed
|
||||||
|
from a live backplane.
|
||||||
|
|
||||||
|
|
||||||
|
Usage Notes
|
||||||
|
-----------
|
||||||
|
|
||||||
|
This driver does not probe for LTC4215 devices, due to the fact that some
|
||||||
|
of the possible addresses are unfriendly to probing. You will need to use
|
||||||
|
the "force" parameter to tell the driver where to find the device.
|
||||||
|
|
||||||
|
Example: the following will load the driver for an LTC4215 at address 0x44
|
||||||
|
on I2C bus #0:
|
||||||
|
$ modprobe ltc4215 force=0,0x44
|
||||||
|
|
||||||
|
|
||||||
|
Sysfs entries
|
||||||
|
-------------
|
||||||
|
|
||||||
|
The LTC4215 has built-in limits for overvoltage, undervoltage, and
|
||||||
|
undercurrent warnings. This makes it very likely that the reference
|
||||||
|
circuit will be used.
|
||||||
|
|
||||||
|
in1_input input voltage
|
||||||
|
in2_input output voltage
|
||||||
|
|
||||||
|
in1_min_alarm input undervoltage alarm
|
||||||
|
in1_max_alarm input overvoltage alarm
|
||||||
|
|
||||||
|
curr1_input current
|
||||||
|
curr1_max_alarm overcurrent alarm
|
||||||
|
|
||||||
|
power1_input power usage
|
||||||
|
power1_alarm power bad alarm
|
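A minimal userspace sketch of reading one of the entries listed above. The helper name hwmon_read_long() is hypothetical; it only parses a single integer from a text file, which is how hwmon attributes such as in1_input are exposed. Under the usual hwmon sysfs conventions the in*_input value is in millivolts.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a hwmon-style sysfs file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int hwmon_read_long(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

The path to pass depends on where the device appears on a given system; something like /sys/class/hwmon/hwmon0/device/in1_input is only a placeholder.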
@@ -365,6 +365,7 @@ energy[1-*]_input Cumulative energy use
|
|||||||
Unit: microJoule
|
Unit: microJoule
|
||||||
RO
|
RO
|
||||||
|
|
||||||
|
|
||||||
**********
|
**********
|
||||||
* Alarms *
|
* Alarms *
|
||||||
**********
|
**********
|
||||||
@@ -453,6 +454,27 @@ beep_mask Bitmask for beep.
|
|||||||
RW
|
RW
|
||||||
|
|
||||||
|
|
||||||
|
***********************
|
||||||
|
* Intrusion detection *
|
||||||
|
***********************
|
||||||
|
|
||||||
|
intrusion[0-*]_alarm
|
||||||
|
Chassis intrusion detection
|
||||||
|
0: OK
|
||||||
|
1: intrusion detected
|
||||||
|
RW
|
||||||
|
Contrary to regular alarm flags which clear themselves
|
||||||
|
automatically when read, this one sticks until cleared by
|
||||||
|
the user. This is done by writing 0 to the file. Writing
|
||||||
|
other values is unsupported.
|
||||||
|
|
||||||
|
intrusion[0-*]_beep
|
||||||
|
Chassis intrusion beep
|
||||||
|
0: disable
|
||||||
|
1: enable
|
||||||
|
RW
|
||||||
|
|
||||||
|
|
||||||
sysfs attribute writes interpretation
|
sysfs attribute writes interpretation
|
||||||
-------------------------------------
|
-------------------------------------
|
||||||
|
|
||||||
|
@@ -2,30 +2,40 @@ Kernel driver w83627ehf
|
|||||||
=======================
|
=======================
|
||||||
|
|
||||||
Supported chips:
|
Supported chips:
|
||||||
* Winbond W83627EHF/EHG/DHG (ISA access ONLY)
|
* Winbond W83627EHF/EHG (ISA access ONLY)
|
||||||
Prefix: 'w83627ehf'
|
Prefix: 'w83627ehf'
|
||||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||||
Datasheet:
|
Datasheet:
|
||||||
http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83627EHF_%20W83627EHGb.pdf
|
http://www.nuvoton.com.tw/NR/rdonlyres/A6A258F0-F0C9-4F97-81C0-C4D29E7E943E/0/W83627EHF.pdf
|
||||||
DHG datasheet confidential.
|
* Winbond W83627DHG
|
||||||
|
Prefix: 'w83627dhg'
|
||||||
|
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||||
|
Datasheet:
|
||||||
|
http://www.nuvoton.com.tw/NR/rdonlyres/7885623D-A487-4CF9-A47F-30C5F73D6FE6/0/W83627DHG.pdf
|
||||||
|
* Winbond W83667HG
|
||||||
|
Prefix: 'w83667hg'
|
||||||
|
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||||
|
Datasheet: not available
|
||||||
|
|
||||||
Authors:
|
Authors:
|
||||||
Jean Delvare <khali@linux-fr.org>
|
Jean Delvare <khali@linux-fr.org>
|
||||||
Yuan Mu (Winbond)
|
Yuan Mu (Winbond)
|
||||||
Rudolf Marek <r.marek@assembler.cz>
|
Rudolf Marek <r.marek@assembler.cz>
|
||||||
David Hubbard <david.c.hubbard@gmail.com>
|
David Hubbard <david.c.hubbard@gmail.com>
|
||||||
|
Gong Jun <JGong@nuvoton.com>
|
||||||
|
|
||||||
Description
|
Description
|
||||||
-----------
|
-----------
|
||||||
|
|
||||||
This driver implements support for the Winbond W83627EHF, W83627EHG, and
|
This driver implements support for the Winbond W83627EHF, W83627EHG,
|
||||||
W83627DHG super I/O chips. We will refer to them collectively as Winbond chips.
|
W83627DHG and W83667HG super I/O chips. We will refer to them collectively
|
||||||
|
as Winbond chips.
|
||||||
|
|
||||||
The chips implement three temperature sensors, five fan rotation
|
The chips implement three temperature sensors, five fan rotation
|
||||||
speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
|
speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
|
||||||
VID (6 pins for the 627EHF/EHG, 8 pins for the 627DHG), alarms with beep
|
VID (6 pins for the 627EHF/EHG, 8 pins for the 627DHG and 667HG), alarms
|
||||||
warnings (control unimplemented), and some automatic fan regulation
|
with beep warnings (control unimplemented), and some automatic fan
|
||||||
strategies (plus manual fan control mode).
|
regulation strategies (plus manual fan control mode).
|
||||||
|
|
||||||
Temperatures are measured in degrees Celsius and measurement resolution is 1
|
Temperatures are measured in degrees Celsius and measurement resolution is 1
|
||||||
degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
|
degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
|
||||||
@@ -54,7 +64,8 @@ follows:
|
|||||||
temp1 -> pwm1
|
temp1 -> pwm1
|
||||||
temp2 -> pwm2
|
temp2 -> pwm2
|
||||||
temp3 -> pwm3
|
temp3 -> pwm3
|
||||||
prog -> pwm4 (the programmable setting is not supported by the driver)
|
prog -> pwm4 (not on 667HG; the programmable setting is not supported by
|
||||||
|
the driver)
|
||||||
|
|
||||||
/sys files
|
/sys files
|
||||||
----------
|
----------
|
||||||
|
@@ -7,10 +7,14 @@ Supported adapters:
|
|||||||
* nForce3 250Gb MCP 10de:00E4
|
* nForce3 250Gb MCP 10de:00E4
|
||||||
* nForce4 MCP 10de:0052
|
* nForce4 MCP 10de:0052
|
||||||
* nForce4 MCP-04 10de:0034
|
* nForce4 MCP-04 10de:0034
|
||||||
* nForce4 MCP51 10de:0264
|
* nForce MCP51 10de:0264
|
||||||
* nForce4 MCP55 10de:0368
|
* nForce MCP55 10de:0368
|
||||||
* nForce4 MCP61 10de:03EB
|
* nForce MCP61 10de:03EB
|
||||||
* nForce4 MCP65 10de:0446
|
* nForce MCP65 10de:0446
|
||||||
|
* nForce MCP67 10de:0542
|
||||||
|
* nForce MCP73 10de:07D8
|
||||||
|
* nForce MCP78S 10de:0752
|
||||||
|
* nForce MCP79 10de:0AA2
|
||||||
|
|
||||||
Datasheet: not publicly available, but seems to be similar to the
|
Datasheet: not publicly available, but seems to be similar to the
|
||||||
AMD-8111 SMBus 2.0 adapter.
|
AMD-8111 SMBus 2.0 adapter.
|
||||||
|
@@ -4,7 +4,7 @@ Supported adapters:
|
|||||||
* Intel 82371AB PIIX4 and PIIX4E
|
* Intel 82371AB PIIX4 and PIIX4E
|
||||||
* Intel 82443MX (440MX)
|
* Intel 82443MX (440MX)
|
||||||
Datasheet: Publicly available at the Intel website
|
Datasheet: Publicly available at the Intel website
|
||||||
* ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
|
* ServerWorks OSB4, CSB5, CSB6, HT-1000 and HT-1100 southbridges
|
||||||
Datasheet: Only available via NDA from ServerWorks
|
Datasheet: Only available via NDA from ServerWorks
|
||||||
* ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges
|
* ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges
|
||||||
Datasheet: Not publicly available
|
Datasheet: Not publicly available
|
||||||
|
167
Documentation/i2c/instantiating-devices
Normal file
@@ -0,0 +1,167 @@
|
|||||||
|
How to instantiate I2C devices
|
||||||
|
==============================
|
||||||
|
|
||||||
|
Unlike PCI or USB devices, I2C devices are not enumerated at the hardware
|
||||||
|
level. Instead, the software must know which devices are connected on each
|
||||||
|
I2C bus segment, and what address these devices are using. For this
|
||||||
|
reason, the kernel code must instantiate I2C devices explicitly. There are
|
||||||
|
several ways to achieve this, depending on the context and requirements.
|
||||||
|
|
||||||
|
|
||||||
|
Method 1: Declare the I2C devices by bus number
|
||||||
|
-----------------------------------------------
|
||||||
|
|
||||||
|
This method is appropriate when the I2C bus is a system bus as is the case
|
||||||
|
for many embedded systems. On such systems, each I2C bus has a number
|
||||||
|
which is known in advance. It is thus possible to pre-declare the I2C
|
||||||
|
devices which live on this bus. This is done with an array of struct
|
||||||
|
i2c_board_info which is registered by calling i2c_register_board_info().
|
||||||
|
|
||||||
|
Example (from omap2 h4):
|
||||||
|
|
||||||
|
static struct i2c_board_info __initdata h4_i2c_board_info[] = {
|
||||||
|
{
|
||||||
|
I2C_BOARD_INFO("isp1301_omap", 0x2d),
|
||||||
|
.irq = OMAP_GPIO_IRQ(125),
|
||||||
|
},
|
||||||
|
{ /* EEPROM on mainboard */
|
||||||
|
I2C_BOARD_INFO("24c01", 0x52),
|
||||||
|
.platform_data = &m24c01,
|
||||||
|
},
|
||||||
|
{ /* EEPROM on cpu card */
|
||||||
|
I2C_BOARD_INFO("24c01", 0x57),
|
||||||
|
.platform_data = &m24c01,
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
static void __init omap_h4_init(void)
|
||||||
|
{
|
||||||
|
(...)
|
||||||
|
i2c_register_board_info(1, h4_i2c_board_info,
|
||||||
|
ARRAY_SIZE(h4_i2c_board_info));
|
||||||
|
(...)
|
||||||
|
}
|
||||||
|
|
||||||
|
The above code declares 3 devices on I2C bus 1, including their respective
|
||||||
|
addresses and custom data needed by their drivers. When the I2C bus in
|
||||||
|
question is registered, the I2C devices will be instantiated automatically
|
||||||
|
by i2c-core.
|
||||||
|
|
||||||
|
The devices will be automatically unbound and destroyed when the I2C bus
|
||||||
|
they sit on goes away (if ever.)
|
||||||
|
|
||||||
|
|
||||||
|
Method 2: Instantiate the devices explicitly
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
This method is appropriate when a larger device uses an I2C bus for
|
||||||
|
internal communication. A typical case is TV adapters. These can have a
|
||||||
|
tuner, a video decoder, an audio decoder, etc. usually connected to the
|
||||||
|
main chip by the means of an I2C bus. You won't know the number of the I2C
|
||||||
|
bus in advance, so the method 1 described above can't be used. Instead,
|
||||||
|
you can instantiate your I2C devices explicitly. This is done by filling
|
||||||
|
a struct i2c_board_info and calling i2c_new_device().
|
||||||
|
|
||||||
|
Example (from the sfe4001 network driver):
|
||||||
|
|
||||||
|
static struct i2c_board_info sfe4001_hwmon_info = {
|
||||||
|
I2C_BOARD_INFO("max6647", 0x4e),
|
||||||
|
};
|
||||||
|
|
||||||
|
int sfe4001_init(struct efx_nic *efx)
|
||||||
|
{
|
||||||
|
(...)
|
||||||
|
efx->board_info.hwmon_client =
|
||||||
|
i2c_new_device(&efx->i2c_adap, &sfe4001_hwmon_info);
|
||||||
|
|
||||||
|
(...)
|
||||||
|
}
|
||||||
|
|
||||||
|
The above code instantiates 1 I2C device on the I2C bus which is on the
|
||||||
|
network adapter in question.
|
||||||
|
|
||||||
|
A variant of this is when you don't know for sure if an I2C device is
|
||||||
|
present or not (for example for an optional feature which is not present
|
||||||
|
on cheap variants of a board but you have no way to tell them apart), or
|
||||||
|
it may have different addresses from one board to the next (manufacturer
|
||||||
|
changing its design without notice). In this case, you can call
|
||||||
|
i2c_new_probed_device() instead of i2c_new_device().
|
||||||
|
|
||||||
|
Example (from the pnx4008 OHCI driver):
|
||||||
|
|
||||||
|
static const unsigned short normal_i2c[] = { 0x2c, 0x2d, I2C_CLIENT_END };
|
||||||
|
|
||||||
|
static int __devinit usb_hcd_pnx4008_probe(struct platform_device *pdev)
|
||||||
|
{
|
||||||
|
(...)
|
||||||
|
struct i2c_adapter *i2c_adap;
|
||||||
|
struct i2c_board_info i2c_info;
|
||||||
|
|
||||||
|
(...)
|
||||||
|
i2c_adap = i2c_get_adapter(2);
|
||||||
|
memset(&i2c_info, 0, sizeof(struct i2c_board_info));
|
||||||
|
strlcpy(i2c_info.name, "isp1301_pnx", I2C_NAME_SIZE);
|
||||||
|
isp1301_i2c_client = i2c_new_probed_device(i2c_adap, &i2c_info,
|
||||||
|
normal_i2c);
|
||||||
|
i2c_put_adapter(i2c_adap);
|
||||||
|
(...)
|
||||||
|
}
|
||||||
|
|
||||||
|
The above code instantiates up to 1 I2C device on the I2C bus which is on
|
||||||
|
the OHCI adapter in question. It first tries at address 0x2c, if nothing
|
||||||
|
is found there it tries address 0x2d, and if still nothing is found, it
|
||||||
|
simply gives up.
|
||||||
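The address scan described above can be modeled in plain user-space C. This is only a sketch of the "try each candidate in order, stop at the first hit" behavior, not the i2c-core implementation: ADDR_LIST_END, presence_fn, first_responding_addr() and the two simulated boards are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical end-of-list marker, standing in for I2C_CLIENT_END. */
#define ADDR_LIST_END 0xfffeU

/* Hypothetical callback type: returns nonzero if a chip answers at addr. */
typedef int (*presence_fn)(unsigned short addr);

/*
 * Walk addr_list in order and return the first address at which
 * is_present() reports a device, or -1 if no candidate responds.
 */
static int first_responding_addr(const unsigned short *addr_list,
                                 presence_fn is_present)
{
    size_t i;

    assert(addr_list != NULL && is_present != NULL);

    for (i = 0; addr_list[i] != ADDR_LIST_END; i++) {
        if (is_present(addr_list[i]))
            return addr_list[i];
    }
    return -1; /* nothing found at any candidate address: give up */
}

/* Simulated boards, for illustration only. */
static int board_with_chip_at_0x2d(unsigned short addr)
{
    return addr == 0x2d;
}

static int board_without_chip(unsigned short addr)
{
    (void)addr;
    return 0;
}
```

With the list { 0x2c, 0x2d, ADDR_LIST_END }, the first simulated board yields 0x2d and the second yields -1, which corresponds to the "simply gives up" case.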
|
|
||||||
|
The driver which instantiated the I2C device is responsible for destroying
|
||||||
|
it on cleanup. This is done by calling i2c_unregister_device() on the
|
||||||
|
pointer that was earlier returned by i2c_new_device() or
|
||||||
|
i2c_new_probed_device().
|
||||||
|
|
||||||
|
|
||||||
|
Method 3: Probe an I2C bus for certain devices
|
||||||
|
----------------------------------------------
|
||||||
|
|
||||||
|
Sometimes you do not have enough information about an I2C device, not even
|
||||||
|
to call i2c_new_probed_device(). The typical case is hardware monitoring
|
||||||
|
chips on PC mainboards. There are several dozen models, which can live
|
||||||
|
at 25 different addresses. Given the huge number of mainboards out there,
|
||||||
|
it is next to impossible to build an exhaustive list of the hardware
|
||||||
|
monitoring chips being used. Fortunately, most of these chips have
|
||||||
|
manufacturer and device ID registers, so they can be identified by
|
||||||
|
probing.
|
||||||
|
|
||||||
|
In that case, I2C devices are neither declared nor instantiated
|
||||||
|
explicitly. Instead, i2c-core will probe for such devices as soon as their
|
||||||
|
drivers are loaded, and if any is found, an I2C device will be
|
||||||
|
instantiated automatically. In order to prevent any misbehavior of this
|
||||||
|
mechanism, the following restrictions apply:
|
||||||
|
* The I2C device driver must implement the detect() method, which
|
||||||
|
identifies a supported device by reading from arbitrary registers.
|
||||||
|
* Only buses which are likely to have a supported device and agree to be
|
||||||
|
probed, will be probed. For example this avoids probing for hardware
|
||||||
|
monitoring chips on a TV adapter.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
See lm90_driver and lm90_detect() in drivers/hwmon/lm90.c
|
||||||
|
|
||||||
|
I2C devices instantiated as a result of such a successful probe will be
|
||||||
|
destroyed automatically when the driver which detected them is removed,
|
||||||
|
or when the underlying I2C bus is itself destroyed, whichever happens
|
||||||
|
first.
|
||||||
|
|
||||||
|
Those of you familiar with the i2c subsystem of 2.4 kernels and early 2.6
|
||||||
|
kernels will find out that this method 3 is essentially similar to what
|
||||||
|
was done there. Two significant differences are:
|
||||||
|
* Probing is only one way to instantiate I2C devices now, while it was the
|
||||||
|
only way back then. Where possible, methods 1 and 2 should be preferred.
|
||||||
|
Method 3 should only be used when there is no other way, as it can have
|
||||||
|
undesirable side effects.
|
||||||
|
* I2C buses must now explicitly say which I2C driver classes can probe
|
||||||
|
them (by the means of the class bitfield), while all I2C buses were
|
||||||
|
probed by default back then. The default is an empty class which means
|
||||||
|
that no probing happens. The purpose of the class bitfield is to limit
|
||||||
|
the aforementioned undesirable side effects.
|
||||||
|
|
||||||
|
Once again, method 3 should be avoided wherever possible. Explicit device
|
||||||
|
instantiation (methods 1 and 2) is much preferred for it is safer and
|
||||||
|
faster.
|
@@ -207,15 +207,26 @@ You simply have to define a detect callback which will attempt to
|
|||||||
identify supported devices (returning 0 for supported ones and -ENODEV
|
identify supported devices (returning 0 for supported ones and -ENODEV
|
||||||
for unsupported ones), a list of addresses to probe, and a device type
|
for unsupported ones), a list of addresses to probe, and a device type
|
||||||
(or class) so that only I2C buses which may have that type of device
|
(or class) so that only I2C buses which may have that type of device
|
||||||
connected (and not otherwise enumerated) will be probed. The i2c
|
connected (and not otherwise enumerated) will be probed. For example,
|
||||||
core will then call you back as needed and will instantiate a device
|
a driver for a hardware monitoring chip for which auto-detection is
|
||||||
for you for every successful detection.
|
needed would set its class to I2C_CLASS_HWMON, and only I2C adapters
|
||||||
|
with a class including I2C_CLASS_HWMON would be probed by this driver.
|
||||||
|
Note that the absence of matching classes does not prevent the use of
|
||||||
|
a device of that type on the given I2C adapter. All it prevents is
|
||||||
|
auto-detection; explicit instantiation of devices is still possible.
|
||||||
|
|
||||||
Note that this mechanism is purely optional and not suitable for all
|
Note that this mechanism is purely optional and not suitable for all
|
||||||
devices. You need some reliable way to identify the supported devices
|
devices. You need some reliable way to identify the supported devices
|
||||||
(typically using device-specific, dedicated identification registers),
|
(typically using device-specific, dedicated identification registers),
|
||||||
otherwise misdetections are likely to occur and things can get wrong
|
otherwise misdetections are likely to occur and things can get wrong
|
||||||
quickly.
|
quickly. Keep in mind that the I2C protocol doesn't include any
|
||||||
|
standard way to detect the presence of a chip at a given address, let
|
||||||
|
alone a standard way to identify devices. Even worse is the lack of
|
||||||
|
semantics associated to bus transfers, which means that the same
|
||||||
|
transfer can be seen as a read operation by a chip and as a write
|
||||||
|
operation by another chip. For these reasons, explicit device
|
||||||
|
instantiation should always be preferred to auto-detection where
|
||||||
|
possible.
|
||||||
|
|
||||||
|
|
||||||
Device Deletion
|
Device Deletion
|
||||||
|