Merge branch 'linus' into stackprotector

Conflicts:
	arch/x86/kernel/Makefile
	include/asm-x86/pda.h
12
.gitignore
vendored
@@ -3,6 +3,10 @@
 # subdirectories here. Add them in the ".gitignore" file
 # in that subdirectory instead.
 #
+# NOTE! Please use 'git-ls-files -i --exclude-standard'
+# command after changing this file, to see if there are
+# any tracked files which get ignored after the change.
+#
 # Normal rules
 #
 .*
@@ -18,19 +22,21 @@
 *.lst
 *.symtypes
 *.order
+*.elf
+*.bin
+*.gz
 
 #
 # Top-level generic files
 #
 tags
 TAGS
-vmlinux*
-!vmlinux.lds.S
-!vmlinux.lds.h
+vmlinux
 System.map
 Module.markers
 Module.symvers
 !.gitignore
+!.mailmap
 
 #
 # Generated include files
2
.mailmap
@@ -96,4 +96,6 @@ Tejun Heo <htejun@gmail.com>
 Thomas Graf <tgraf@suug.ch>
 Tony Luck <tony.luck@intel.com>
 Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
+Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
+Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
 Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
16
CREDITS
@@ -317,6 +317,14 @@ S: 2322 37th Ave SW
 S: Seattle, Washington 98126-2010
 S: USA
 
+N: Muli Ben-Yehuda
+E: mulix@mulix.org
+E: muli@il.ibm.com
+W: http://www.mulix.org
+D: trident OSS sound driver, x86-64 dma-ops and Calgary IOMMU,
+D: KVM and Xen bits and other misc. hackery.
+S: Haifa, Israel
+
 N: Johannes Berg
 E: johannes@sipsolutions.net
 W: http://johannes.sipsolutions.net/
@@ -2611,8 +2619,9 @@ S: Perth, Western Australia
 S: Australia
 
 N: Miguel Ojeda Sandonis
-E: maxextreme@gmail.com
-W: http://maxextreme.googlepages.com/
+E: miguel.ojeda.sandonis@gmail.com
+W: http://miguelojeda.es
+W: http://jair.lab.fi.uva.es/~migojed/
 D: Author of the ks0108, cfag12864b and cfag12864bfb auxiliary display drivers.
 D: Maintainer of the auxiliary display drivers tree (drivers/auxdisplay/*)
 S: C/ Mieses 20, 9-B
@@ -3343,8 +3352,7 @@ S: Spain
 N: Linus Torvalds
 E: torvalds@linux-foundation.org
 D: Original kernel hacker
-S: 12725 SW Millikan Way, Suite 400
-S: Beaverton, Oregon 97005
+S: Portland, Oregon 97005
 S: USA
 
 N: Marcelo Tosatti
@@ -89,8 +89,6 @@ cciss.txt
 - info, major/minor #'s for Compaq's SMART Array Controllers.
 cdrom/
 - directory with information on the CD-ROM drivers that Linux has.
-cli-sti-removal.txt
-- cli()/sti() removal guide.
 computone.txt
 - info on Computone Intelliport II/Plus Multiport Serial Driver.
 connector/
@@ -161,8 +159,6 @@ hayes-esp.txt
 - info on using the Hayes ESP serial driver.
 highuid.txt
 - notes on the change from 16 bit to 32 bit user/group IDs.
-hpet.txt
-- High Precision Event Timer Driver for Linux.
 timers/
 - info on the timer related topics
 hw_random.txt
@@ -253,8 +249,6 @@ mono.txt
 - how to execute Mono-based .NET binaries with the help of BINFMT_MISC.
 moxa-smartio
 - file with info on installing/using Moxa multiport serial driver.
-mtrr.txt
-- how to use PPro Memory Type Range Registers to increase performance.
 mutex-design.txt
 - info on the generic mutex subsystem.
 namespaces/
@@ -361,8 +355,6 @@ telephony/
 - directory with info on telephony (e.g. voice over IP) support.
 time_interpolators.txt
 - info on time interpolators.
-tipar.txt
-- information about Parallel link cable for Texas Instruments handhelds.
 tty.txt
 - guide to the locking policies of the tty layer.
 uml/
@@ -26,3 +26,37 @@ Description:
 I/O statistics of partition <part>. The format is the
 same as the above-written /sys/block/<disk>/stat
 format.
+
+
+What: /sys/block/<disk>/integrity/format
+Date: June 2008
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Metadata format for integrity capable block device.
+E.g. T10-DIF-TYPE1-CRC.
+
+
+What: /sys/block/<disk>/integrity/read_verify
+Date: June 2008
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Indicates whether the block layer should verify the
+integrity of read requests serviced by devices that
+support sending integrity metadata.
+
+
+What: /sys/block/<disk>/integrity/tag_size
+Date: June 2008
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Number of bytes of integrity tag space available per
+512 bytes of data.
+
+
+What: /sys/block/<disk>/integrity/write_generate
+Date: June 2008
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Indicates whether the block layer should automatically
+generate checksums for write requests bound for
+devices that support receiving integrity metadata.
35
Documentation/ABI/testing/sysfs-bus-css
Normal file
@@ -0,0 +1,35 @@
+What: /sys/bus/css/devices/.../type
+Date: March 2008
+Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
+linux-s390@vger.kernel.org
+Description: Contains the subchannel type, as reported by the hardware.
+This attribute is present for all subchannel types.
+
+What: /sys/bus/css/devices/.../modalias
+Date: March 2008
+Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
+linux-s390@vger.kernel.org
+Description: Contains the module alias as reported with uevents.
+It is of the format css:t<type> and present for all
+subchannel types.
+
+What: /sys/bus/css/drivers/io_subchannel/.../chpids
+Date: December 2002
+Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
+linux-s390@vger.kernel.org
+Description: Contains the ids of the channel paths used by this
+subchannel, as reported by the channel subsystem
+during subchannel recognition.
+Note: This is an I/O-subchannel specific attribute.
+Users: s390-tools, HAL
+
+What: /sys/bus/css/drivers/io_subchannel/.../pimpampom
+Date: December 2002
+Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
+linux-s390@vger.kernel.org
+Description: Contains the PIM/PAM/POM values, as reported by the
+channel subsystem when last queried by the common I/O
+layer (this implies that this attribute is not necessarily
+in sync with the values current in the channel subsystem).
+Note: This is an I/O-subchannel specific attribute.
+Users: s390-tools, HAL
328
Documentation/ABI/testing/sysfs-class-regulator
Normal file
@@ -0,0 +1,328 @@
+What: /sys/class/regulator/.../state
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+state. This holds the regulator output state.
+
+This will be one of the following strings:
+
+'enabled'
+'disabled'
+'unknown'
+
+'enabled' means the regulator output is ON and is supplying
+power to the system.
+
+'disabled' means the regulator output is OFF and is not
+supplying power to the system.
+
+'unknown' means software cannot determine the state.
+
+NOTE: this field can be used in conjunction with microvolts
+and microamps to determine regulator output levels.
+
+
+What: /sys/class/regulator/.../type
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+type. This holds the regulator type.
+
+This will be one of the following strings:
+
+'voltage'
+'current'
+'unknown'
+
+'voltage' means the regulator output voltage can be controlled
+by software.
+
+'current' means the regulator output current limit can be
+controlled by software.
+
+'unknown' means software cannot control either voltage or
+current limit.
+
+
+What: /sys/class/regulator/.../microvolts
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+microvolts. This holds the regulator output voltage setting
+measured in microvolts (i.e. E-6 Volts).
+
+NOTE: This value should not be used to determine the regulator
+output voltage level as this value is the same regardless of
+whether the regulator is enabled or disabled.
+
+
+What: /sys/class/regulator/.../microamps
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+microamps. This holds the regulator output current limit
+setting measured in microamps (i.e. E-6 Amps).
+
+NOTE: This value should not be used to determine the regulator
+output current level as this value is the same regardless of
+whether the regulator is enabled or disabled.
+
+
+What: /sys/class/regulator/.../opmode
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+opmode. This holds the regulator operating mode setting.
+
+The opmode value can be one of the following strings:
+
+'fast'
+'normal'
+'idle'
+'standby'
+'unknown'
+
+The modes are described in include/linux/regulator/regulator.h
+
+NOTE: This value should not be used to determine the regulator
+output operating mode as this value is the same regardless of
+whether the regulator is enabled or disabled.
+
+
+What: /sys/class/regulator/.../min_microvolts
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+min_microvolts. This holds the minimum safe working regulator
+output voltage setting for this domain measured in microvolts.
+
+NOTE: this will return the string 'constraint not defined' if
+the power domain has no min microvolts constraint defined by
+platform code.
+
+
+What: /sys/class/regulator/.../max_microvolts
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+max_microvolts. This holds the maximum safe working regulator
+output voltage setting for this domain measured in microvolts.
+
+NOTE: this will return the string 'constraint not defined' if
+the power domain has no max microvolts constraint defined by
+platform code.
+
+
+What: /sys/class/regulator/.../min_microamps
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+min_microamps. This holds the minimum safe working regulator
+output current limit setting for this domain measured in
+microamps.
+
+NOTE: this will return the string 'constraint not defined' if
+the power domain has no min microamps constraint defined by
+platform code.
+
+
+What: /sys/class/regulator/.../max_microamps
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+max_microamps. This holds the maximum safe working regulator
+output current limit setting for this domain measured in
+microamps.
+
+NOTE: this will return the string 'constraint not defined' if
+the power domain has no max microamps constraint defined by
+platform code.
+
+
+What: /sys/class/regulator/.../name
+Date: October 2008
+KernelVersion: 2.6.28
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+name. This holds a string identifying the regulator for
+display purposes.
+
+NOTE: this will be empty if no suitable name is provided
+by platform or regulator drivers.
+
+
+What: /sys/class/regulator/.../num_users
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+num_users. This holds the number of consumer devices that
+have called regulator_enable() on this regulator.
+
+
+What: /sys/class/regulator/.../requested_microamps
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+requested_microamps. This holds the total requested load
+current in microamps for this regulator from all its consumer
+devices.
+
+
+What: /sys/class/regulator/.../parent
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Some regulator directories will contain a link called parent.
+This points to the parent or supply regulator if one exists.
+
+What: /sys/class/regulator/.../suspend_mem_microvolts
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_mem_microvolts. This holds the regulator output
+voltage setting for this domain measured in microvolts when
+the system is suspended to memory.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to memory voltage defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_disk_microvolts
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_disk_microvolts. This holds the regulator output
+voltage setting for this domain measured in microvolts when
+the system is suspended to disk.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to disk voltage defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_standby_microvolts
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_standby_microvolts. This holds the regulator output
+voltage setting for this domain measured in microvolts when
+the system is suspended to standby.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to standby voltage defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_mem_mode
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_mem_mode. This holds the regulator operating mode
+setting for this domain when the system is suspended to
+memory.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to memory mode defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_disk_mode
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_disk_mode. This holds the regulator operating mode
+setting for this domain when the system is suspended to disk.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to disk mode defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_standby_mode
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_standby_mode. This holds the regulator operating mode
+setting for this domain when the system is suspended to
+standby.
+
+NOTE: this will return the string 'not defined' if
+the power domain has no suspend to standby mode defined by
+platform code.
+
+What: /sys/class/regulator/.../suspend_mem_state
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_mem_state. This holds the regulator operating state
+when suspended to memory.
+
+This will be one of the following strings:
+
+'enabled'
+'disabled'
+'not defined'
+
+What: /sys/class/regulator/.../suspend_disk_state
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_disk_state. This holds the regulator operating state
+when suspended to disk.
+
+This will be one of the following strings:
+
+'enabled'
+'disabled'
+'not defined'
+
+What: /sys/class/regulator/.../suspend_standby_state
+Date: May 2008
+KernelVersion: 2.6.26
+Contact: Liam Girdwood <lrg@slimlogic.co.uk>
+Description:
+Each regulator directory will contain a field called
+suspend_standby_state. This holds the regulator operating
+state when suspended to standby.
+
+This will be one of the following strings:
+
+'enabled'
+'disabled'
+'not defined'
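As a rough illustration of how a userspace tool might consume the regulator attributes documented above, the Python sketch below reads a regulator directory and combines `state` with `microvolts`, since the text notes that `microvolts` alone is the same whether the regulator is enabled or disabled. The helper names `read_attr` and `output_microvolts` are hypothetical; only the attribute names and the 'constraint not defined' / 'not defined' strings are taken from the documentation text.

```python
import os

# Strings the documentation says are returned when platform code
# has not defined a constraint or a suspend setting.
UNDEFINED = {"constraint not defined", "not defined"}


def read_attr(regulator_dir, name):
    """Return an attribute as int when numeric, None when undefined, else str."""
    with open(os.path.join(regulator_dir, name)) as f:
        value = f.read().strip()
    if value in UNDEFINED:
        return None
    try:
        return int(value)
    except ValueError:
        return value


def output_microvolts(regulator_dir):
    """Voltage setting in microvolts, or None unless state is 'enabled'."""
    if read_attr(regulator_dir, "state") != "enabled":
        return None
    return read_attr(regulator_dir, "microvolts")
```

A caller would pass one of the directories under /sys/class/regulator; the same `read_attr` helper also covers the suspend_* attributes, which report 'not defined' in the same way.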
20
Documentation/ABI/testing/sysfs-dev
Normal file
@@ -0,0 +1,20 @@
+What: /sys/dev
+Date: April 2008
+KernelVersion: 2.6.26
+Contact: Dan Williams <dan.j.williams@intel.com>
+Description: The /sys/dev tree provides a method to look up the sysfs
+path for a device using the information returned from
+stat(2). There are two directories, 'block' and 'char',
+beneath /sys/dev containing symbolic links with names of
+the form "<major>:<minor>". These links point to the
+corresponding sysfs path for the given device.
+
+Example:
+$ readlink /sys/dev/block/8:32
+../../block/sdc
+
+Entries in /sys/dev/char and /sys/dev/block will be
+dynamically created and destroyed as devices enter and
+leave the system.
+
+Users: mdadm <linux-raid@vger.kernel.org>
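The lookup described in the sysfs-dev entry can be sketched in Python as follows. This is a minimal illustration, not code from the commit: the function names `sysfs_dev_path` and `resolve_device` are hypothetical, and the sketch assumes a Linux system where `os.major`, `os.minor` and `os.makedev` are available.

```python
import os
import stat


def sysfs_dev_path(st_mode, st_rdev, sysfs_root="/sys"):
    """Build the /sys/dev/{block,char}/<major>:<minor> link name from stat(2) fields."""
    if stat.S_ISBLK(st_mode):
        kind = "block"
    elif stat.S_ISCHR(st_mode):
        kind = "char"
    else:
        raise ValueError("not a block or character device")
    return "%s/dev/%s/%d:%d" % (sysfs_root, kind, os.major(st_rdev), os.minor(st_rdev))


def resolve_device(node):
    """Follow the /sys/dev link for a device node such as /dev/sdc."""
    st = os.stat(node)
    return os.path.realpath(sysfs_dev_path(st.st_mode, st.st_rdev))
```

For a block device with major 8 and minor 32, `sysfs_dev_path` yields /sys/dev/block/8:32, the link read in the `readlink` example above.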
24
Documentation/ABI/testing/sysfs-devices-memory
Normal file
@@ -0,0 +1,24 @@
+What: /sys/devices/system/memory
+Date: June 2008
+Contact: Badari Pulavarty <pbadari@us.ibm.com>
+Description:
+The /sys/devices/system/memory contains a snapshot of the
+internal state of the kernel memory blocks. Files could be
+added or removed dynamically to represent hot-add/remove
+operations.
+
+Users: hotplug memory add/remove tools
+https://w3.opensource.ibm.com/projects/powerpc-utils/
+
+What: /sys/devices/system/memory/memoryX/removable
+Date: June 2008
+Contact: Badari Pulavarty <pbadari@us.ibm.com>
+Description:
+The file /sys/devices/system/memory/memoryX/removable
+indicates whether this memory block is removable or not.
+This is useful for a user-level agent to
+identify removable sections of the memory before attempting
+a potentially expensive hot-remove memory operation.
+
+Users: hotplug memory remove tools
+https://w3.opensource.ibm.com/projects/powerpc-utils/
@@ -29,46 +29,46 @@ Description:
 
 $ cd /sys/firmware/acpi/interrupts
 $ grep . *
-error:0
-ff_gbl_lock:0
-ff_pmtimer:0
-ff_pwr_btn:0
-ff_rt_clk:0
-ff_slp_btn:0
-gpe00:0
-gpe01:0
-gpe02:0
-gpe03:0
-gpe04:0
-gpe05:0
-gpe06:0
-gpe07:0
-gpe08:0
-gpe09:174
-gpe0A:0
-gpe0B:0
-gpe0C:0
-gpe0D:0
-gpe0E:0
-gpe0F:0
-gpe10:0
-gpe11:60
-gpe12:0
-gpe13:0
-gpe14:0
-gpe15:0
-gpe16:0
-gpe17:0
-gpe18:0
-gpe19:7
-gpe1A:0
-gpe1B:0
-gpe1C:0
-gpe1D:0
-gpe1E:0
-gpe1F:0
-gpe_all:241
-sci:241
+error: 0
+ff_gbl_lock: 0 enable
+ff_pmtimer: 0 invalid
+ff_pwr_btn: 0 enable
+ff_rt_clk: 2 disable
+ff_slp_btn: 0 invalid
+gpe00: 0 invalid
+gpe01: 0 enable
+gpe02: 108 enable
+gpe03: 0 invalid
+gpe04: 0 invalid
+gpe05: 0 invalid
+gpe06: 0 enable
+gpe07: 0 enable
+gpe08: 0 invalid
+gpe09: 0 invalid
+gpe0A: 0 invalid
+gpe0B: 0 invalid
+gpe0C: 0 invalid
+gpe0D: 0 invalid
+gpe0E: 0 invalid
+gpe0F: 0 invalid
+gpe10: 0 invalid
+gpe11: 0 invalid
+gpe12: 0 invalid
+gpe13: 0 invalid
+gpe14: 0 invalid
+gpe15: 0 invalid
+gpe16: 0 invalid
+gpe17: 1084 enable
+gpe18: 0 enable
+gpe19: 0 invalid
+gpe1A: 0 invalid
+gpe1B: 0 invalid
+gpe1C: 0 invalid
+gpe1D: 0 invalid
+gpe1E: 0 invalid
+gpe1F: 0 invalid
+gpe_all: 1192
+sci: 1194
 
 sci - The total number of times the ACPI SCI
 has claimed an interrupt.
@@ -89,6 +89,13 @@ Description:
 
 error - an interrupt that can't be accounted for above.
 
+invalid: it's either a wakeup GPE or a GPE/Fixed Event that
+doesn't have an event handler.
+
+disable: the GPE/Fixed Event is valid but disabled.
+
+enable: the GPE/Fixed Event is valid and enabled.
+
 Root has permission to clear any of these counters. Eg.
 # echo 0 > gpe11
 
@@ -97,3 +104,43 @@ Description:
 
 None of these counters has an effect on the function
 of the system, they are simply statistics.
+
+Besides this, user can also write specific strings to these files
+to enable/disable/clear ACPI interrupts in user space, which can be
+used to debug some ACPI interrupt storm issues.
+
+Note that only writing to VALID GPE/Fixed Event is allowed,
+i.e. user can only change the status of runtime GPE and
+Fixed Event with event handler installed.
+
+Let's take power button fixed event for example, please kill acpid
+and other user space applications so that the machine won't shutdown
+when pressing the power button.
+# cat ff_pwr_btn
+0
+# press the power button for 3 times;
+# cat ff_pwr_btn
+3
+# echo disable > ff_pwr_btn
+# cat ff_pwr_btn
+disable
+# press the power button for 3 times;
+# cat ff_pwr_btn
+disable
+# echo enable > ff_pwr_btn
+# cat ff_pwr_btn
+4
+/*
+* this is because the status bit is set even if the enable bit is cleared,
+* and it triggers an ACPI fixed event when the enable bit is set again
+*/
+# press the power button for 3 times;
+# cat ff_pwr_btn
+7
+# echo disable > ff_pwr_btn
+# press the power button for 3 times;
+# echo clear > ff_pwr_btn /* clear the status bit */
+# echo disable > ff_pwr_btn
+# cat ff_pwr_btn
+7
+
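When chasing an interrupt storm with these counters, it can help to rank the events by count. The Python sketch below is one possible way to do that, under an assumption worth stating plainly: the "count status" layout is inferred from the `grep . *` output in the hunk above, while the `cat` transcript shows a single field, so the parser accepts either. The function names `parse_counter` and `active_events` are hypothetical.

```python
def parse_counter(text):
    """Split a counter file's text into (count, status); either may be None."""
    count, status = None, None
    for token in text.split():
        if token.isdigit():
            count = int(token)
        else:
            status = token
    return count, status


def active_events(counters):
    """Return (name, count) pairs for valid events with a non-zero count, busiest first.

    `counters` maps a file name under /sys/firmware/acpi/interrupts to its contents.
    Totals such as gpe_all and sci carry no status and are skipped.
    """
    rows = []
    for name, text in counters.items():
        count, status = parse_counter(text)
        if count and status in ("enable", "disable"):
            rows.append((name, count))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Fed the values from the example output, this would rank gpe17 first, followed by gpe02 and ff_rt_clk.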
71
Documentation/ABI/testing/sysfs-firmware-memmap
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
What: /sys/firmware/memmap/
|
||||||
|
Date: June 2008
|
||||||
|
Contact: Bernhard Walle <bwalle@suse.de>
|
||||||
|
Description:
|
||||||
|
On all platforms, the firmware provides a memory map which the
|
||||||
|
		kernel reads. The resources from that memory map are registered
		in the kernel resource tree and exposed to userspace via
		/proc/iomem (together with other resources).

		However, on most architectures that firmware-provided memory
		map is modified afterwards by the kernel itself, either because
		the kernel merges that memory map with other information or
		just because the user overwrites that memory map via the
		command line.

		kexec needs the raw firmware-provided memory map to set up the
		parameter segment of the kernel that should be booted with
		kexec. Also, the raw memory map is useful for debugging. For
		that reason, /sys/firmware/memmap is an interface that provides
		the raw memory map to userspace.

		The structure is as follows: Under /sys/firmware/memmap there
		are subdirectories with the number of the entry as their name:

			/sys/firmware/memmap/0
			/sys/firmware/memmap/1
			/sys/firmware/memmap/2
			/sys/firmware/memmap/3
			...

		The maximum depends on the number of memory map entries provided
		by the firmware. The order is just the order that the firmware
		provides.

		Each directory contains three files:

		start	: The start address (as a hexadecimal number with the
			  '0x' prefix).
		end	: The end address, inclusive (regardless of whether the
			  firmware provides inclusive or exclusive ranges).
		type	: Type of the entry as a string. See below for a list of
			  valid types.

		So, for example:

			/sys/firmware/memmap/0/start
			/sys/firmware/memmap/0/end
			/sys/firmware/memmap/0/type
			/sys/firmware/memmap/1/start
			...

		Currently the following types exist:

		  - System RAM
		  - ACPI Tables
		  - ACPI Non-volatile Storage
		  - reserved

		The following shell snippet can be used to display that memory
		map in a human-readable format:

-------------------- 8< ----------------------------------------
#!/bin/bash
cd /sys/firmware/memmap
for dir in * ; do
	start=$(cat "$dir/start")
	end=$(cat "$dir/end")
	type=$(cat "$dir/type")
	# "end" is inclusive, so end + 1 is printed as the exclusive bound
	printf "%016x-%016x (%s)\n" "$start" $(( end + 1 )) "$type"
done
-------------------- >8 ----------------------------------------
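The same formatting step can be sketched in userspace C. This is only an illustration of the "end is inclusive" rule described above, not part of the commit: the function name memmap_format_entry is hypothetical, and the caller is assumed to have already read the contents of the start, end and type files into strings (with the trailing newline stripped from type).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format one /sys/firmware/memmap entry as
 * "start-end_exclusive (type)".
 *
 * start_txt and end_txt are the strings read from the "start" and "end"
 * files (hexadecimal, with a "0x" prefix). "end" is inclusive, so one is
 * added to obtain the exclusive upper bound, as in the shell snippet.
 *
 * Returns 0 on success, -1 if a value cannot be parsed or buf is too small.
 */
int memmap_format_entry(const char *start_txt, const char *end_txt,
                        const char *type, char *buf, size_t len)
{
    char *stop;
    uint64_t start, end;
    int n;

    /* Base 16 accepts an optional "0x" prefix. */
    start = strtoull(start_txt, &stop, 16);
    if (stop == start_txt)
        return -1;
    end = strtoull(end_txt, &stop, 16);
    if (stop == end_txt)
        return -1;

    n = snprintf(buf, len, "%016" PRIx64 "-%016" PRIx64 " (%s)",
                 start, end + 1, type);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would iterate over the numbered subdirectories, read the three files of each entry, and pass their contents to this helper.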
27
Documentation/ABI/testing/sysfs-firmware-sgi_uv
Normal file
@@ -0,0 +1,27 @@
What:		/sys/firmware/sgi_uv/
Date:		August 2008
Contact:	Russ Anderson <rja@sgi.com>
Description:
		The /sys/firmware/sgi_uv directory contains information
		about the SGI UV platform.

		Under that directory are a number of files:

			partition_id
			coherence_id

		The partition_id entry contains the partition id.
		SGI UV systems can be partitioned into multiple physical
		machines, with each partition running a unique copy
		of the operating system. Each partition will have a unique
		partition id. To display the partition id, use the command:

			cat /sys/firmware/sgi_uv/partition_id

		The coherence_id entry contains the coherence id.
		A partitioned SGI UV system can have one or more coherence
		domains. The coherence id indicates which coherence domain
		this partition is in. To display the coherence id, use the
		command:

			cat /sys/firmware/sgi_uv/coherence_id
26
Documentation/ABI/testing/sysfs-gpio
Normal file
@@ -0,0 +1,26 @@
What:		/sys/class/gpio/
Date:		July 2008
KernelVersion:	2.6.27
Contact:	David Brownell <dbrownell@users.sourceforge.net>
Description:

  As a Kconfig option, individual GPIO signals may be accessed from
  userspace. GPIOs are only made available to userspace by an explicit
  "export" operation. If a given GPIO is not claimed for use by
  kernel code, it may be exported by userspace (and unexported later).
  Kernel code may export it for complete or partial access.

  GPIOs are identified as they are inside the kernel, using integers in
  the range 0..INT_MAX. See Documentation/gpio.txt for more information.

    /sys/class/gpio
	/export ... asks the kernel to export a GPIO to userspace
	/unexport ... to return a GPIO to the kernel
	/gpioN ... for each exported GPIO #N
	    /value ... always readable, writes fail for input GPIOs
	    /direction ... r/w as: in, out (default low); write: high, low
	/gpiochipN ... for each gpiochip; #N is its first GPIO
	    /base ... (r/o) same as N
	    /label ... (r/o) descriptive, not necessarily unique
	    /ngpio ... (r/o) number of GPIOs; numbered N to N + (ngpio - 1)

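As an illustration of the layout above, the following userspace C sketch exports a GPIO and configures it as an output that starts high. It is not taken from the commit: the helper names (gpio_attr_path, sysfs_write, gpio_export_high) are hypothetical, and it assumes a kernel built with the GPIO sysfs option, a GPIO number that kernel code has not claimed, and sufficient privileges to write under /sys/class/gpio.

```c
#include <stdio.h>

#define GPIO_SYSFS_ROOT "/sys/class/gpio"

/* Build "/sys/class/gpio/gpioN/<attr>"; returns 0, or -1 if buf is too small. */
int gpio_attr_path(char *buf, size_t len, unsigned int gpio, const char *attr)
{
    int n = snprintf(buf, len, GPIO_SYSFS_ROOT "/gpio%u/%s", gpio, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs file; returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    /* The write may only reach the kernel (and fail) when the file is closed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Ask the kernel to export GPIO #gpio, then write "high" to its direction
 * file, which configures it as an output with an initial value of 1.
 */
int gpio_export_high(unsigned int gpio)
{
    char num[16], path[64];

    snprintf(num, sizeof(num), "%u", gpio);
    if (sysfs_write(GPIO_SYSFS_ROOT "/export", num) != 0)
        return -1;
    if (gpio_attr_path(path, sizeof(path), gpio, "direction") != 0)
        return -1;
    return sysfs_write(path, "high");
}
```

Writing the same number to /sys/class/gpio/unexport would return the GPIO to the kernel afterwards.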
6
Documentation/ABI/testing/sysfs-kernel-mm
Normal file
@@ -0,0 +1,6 @@
What:		/sys/kernel/mm
Date:		July 2008
Contact:	Nishanth Aravamudan <nacc@us.ibm.com>, VM maintainers
Description:
		/sys/kernel/mm/ should contain any and all VM
		related information in /sys/kernel/.
15
Documentation/ABI/testing/sysfs-kernel-mm-hugepages
Normal file
@@ -0,0 +1,15 @@
What:		/sys/kernel/mm/hugepages/
Date:		June 2008
Contact:	Nishanth Aravamudan <nacc@us.ibm.com>, hugetlb maintainers
Description:
		/sys/kernel/mm/hugepages/ contains a number of subdirectories
		of the form hugepages-<size>kB, where <size> is the page size
		of the hugepages supported by the kernel/CPU combination.

		Under these directories are a number of files:
			nr_hugepages
			nr_overcommit_hugepages
			free_hugepages
			surplus_hugepages
			resv_hugepages
		See Documentation/vm/hugetlbpage.txt for details.
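The hugepages-<size>kB naming scheme can be decoded in userspace with a few lines of C. The sketch below is an illustration only; the function name hugepages_dir_size_kb is hypothetical and it assumes the directory name has already been obtained, for example from readdir().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract <size> from a directory name of the form
 * "hugepages-<size>kB".
 *
 * Returns 0 and stores the page size in kB on success; returns -1 and
 * leaves *size_kb untouched if the name does not match the pattern.
 */
int hugepages_dir_size_kb(const char *name, unsigned long *size_kb)
{
    static const char prefix[] = "hugepages-";
    const char *digits;
    char *end;
    unsigned long val;

    if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    /* The size must start with a digit (no sign, no whitespace). */
    digits = name + sizeof(prefix) - 1;
    if (!isdigit((unsigned char)*digits))
        return -1;

    /* The digits must be followed by exactly "kB". */
    val = strtoul(digits, &end, 10);
    if (strcmp(end, "kB") != 0)
        return -1;

    *size_kb = val;
    return 0;
}
```

With the size known, a path such as /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages can be assembled for the files listed above.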
@@ -474,25 +474,29 @@ make a good program).
|
|||||||
So, you can either get rid of GNU emacs, or change it to use saner
|
So, you can either get rid of GNU emacs, or change it to use saner
|
||||||
values. To do the latter, you can stick the following in your .emacs file:
|
values. To do the latter, you can stick the following in your .emacs file:
|
||||||
|
|
||||||
(defun linux-c-mode ()
|
(defun c-lineup-arglist-tabs-only (ignored)
|
||||||
"C mode with adjusted defaults for use with the Linux kernel."
|
"Line up argument lists by tabs, not spaces"
|
||||||
(interactive)
|
(let* ((anchor (c-langelem-pos c-syntactic-element))
|
||||||
(c-mode)
|
(column (c-langelem-2nd-pos c-syntactic-element))
|
||||||
(c-set-style "K&R")
|
(offset (- (1+ column) anchor))
|
||||||
(setq tab-width 8)
|
(steps (floor offset c-basic-offset)))
|
||||||
(setq indent-tabs-mode t)
|
(* (max steps 1)
|
||||||
(setq c-basic-offset 8))
|
c-basic-offset)))
|
||||||
|
|
||||||
This will define the M-x linux-c-mode command. When hacking on a
|
(add-hook 'c-mode-hook
|
||||||
module, if you put the string -*- linux-c -*- somewhere on the first
|
(lambda ()
|
||||||
two lines, this mode will be automatically invoked. Also, you may want
|
(let ((filename (buffer-file-name)))
|
||||||
to add
|
;; Enable kernel mode for the appropriate files
|
||||||
|
(when (and filename
|
||||||
|
(string-match "~/src/linux-trees" filename))
|
||||||
|
(setq indent-tabs-mode t)
|
||||||
|
(c-set-style "linux")
|
||||||
|
(c-set-offset 'arglist-cont-nonempty
|
||||||
|
'(c-lineup-gcc-asm-reg
|
||||||
|
c-lineup-arglist-tabs-only))))))
|
||||||
|
|
||||||
(setq auto-mode-alist (cons '("/usr/src/linux.*/.*\\.[ch]$" . linux-c-mode)
|
This will make emacs go better with the kernel coding style for C
|
||||||
auto-mode-alist))
|
files below ~/src/linux-trees.
|
||||||
|
|
||||||
to your .emacs file if you want to have linux-c-mode switched on
|
|
||||||
automagically when you edit source files under /usr/src/linux.
|
|
||||||
|
|
||||||
But even if you fail in getting emacs to do sane formatting, not
|
But even if you fail in getting emacs to do sane formatting, not
|
||||||
everything is lost: use "indent".
|
everything is lost: use "indent".
|
||||||
|
@@ -298,10 +298,10 @@ recommended that you never use these unless you really know what the
|
|||||||
cache width is.
|
cache width is.
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_mapping_error(dma_addr_t dma_addr)
|
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
|
||||||
|
|
||||||
int
|
int
|
||||||
pci_dma_mapping_error(dma_addr_t dma_addr)
|
pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr)
|
||||||
|
|
||||||
In some circumstances dma_map_single and dma_map_page will fail to create
|
In some circumstances dma_map_single and dma_map_page will fail to create
|
||||||
a mapping. A driver can check for these errors by testing the returned
|
a mapping. A driver can check for these errors by testing the returned
|
||||||
@@ -337,7 +337,7 @@ With scatterlists, you use the resulting mapping like this:
|
|||||||
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
||||||
struct scatterlist *sg;
|
struct scatterlist *sg;
|
||||||
|
|
||||||
for (i = 0, sg = sglist; i < count; i++, sg++) {
|
for_each_sg(sglist, sg, count, i) {
|
||||||
hw_address[i] = sg_dma_address(sg);
|
hw_address[i] = sg_dma_address(sg);
|
||||||
hw_len[i] = sg_dma_len(sg);
|
hw_len[i] = sg_dma_len(sg);
|
||||||
}
|
}
|
||||||
|
@@ -22,3 +22,12 @@ ready and available in memory. The DMA of the "completion indication"
|
|||||||
could race with data DMA. Mapping the memory used for completion
|
could race with data DMA. Mapping the memory used for completion
|
||||||
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
|
indications with DMA_ATTR_WRITE_BARRIER would prevent the race.
|
||||||
|
|
||||||
|
DMA_ATTR_WEAK_ORDERING
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
DMA_ATTR_WEAK_ORDERING specifies that reads and writes to the mapping
|
||||||
|
may be weakly ordered, that is that reads and writes may pass each other.
|
||||||
|
|
||||||
|
Since it is optional for platforms to implement DMA_ATTR_WEAK_ORDERING,
|
||||||
|
those that do not will simply ignore the attribute and exhibit default
|
||||||
|
behavior.
|
||||||
|
@@ -740,7 +740,7 @@ failure can be determined by:
|
|||||||
dma_addr_t dma_handle;
|
dma_addr_t dma_handle;
|
||||||
|
|
||||||
dma_handle = pci_map_single(pdev, addr, size, direction);
|
dma_handle = pci_map_single(pdev, addr, size, direction);
|
||||||
if (pci_dma_mapping_error(dma_handle)) {
|
if (pci_dma_mapping_error(pdev, dma_handle)) {
|
||||||
/*
|
/*
|
||||||
* reduce current DMA mapping usage,
|
* reduce current DMA mapping usage,
|
||||||
* delay and try again later or
|
* delay and try again later or
|
||||||
|
@@ -12,7 +12,7 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
|
|||||||
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
|
||||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||||
mac80211.xml debugobjects.xml
|
mac80211.xml debugobjects.xml sh.xml
|
||||||
|
|
||||||
###
|
###
|
||||||
# The build process is as follows (targets):
|
# The build process is as follows (targets):
|
||||||
@@ -102,6 +102,13 @@ C-procfs-example = procfs_example.xml
|
|||||||
C-procfs-example2 = $(addprefix $(obj)/,$(C-procfs-example))
|
C-procfs-example2 = $(addprefix $(obj)/,$(C-procfs-example))
|
||||||
$(obj)/procfs-guide.xml: $(C-procfs-example2)
|
$(obj)/procfs-guide.xml: $(C-procfs-example2)
|
||||||
|
|
||||||
|
# List of programs to build
|
||||||
|
##oops, this is a kernel module::hostprogs-y := procfs_example
|
||||||
|
obj-m += procfs_example.o
|
||||||
|
|
||||||
|
# Tell kbuild to always build the programs
|
||||||
|
always := $(hostprogs-y)
|
||||||
|
|
||||||
notfoundtemplate = echo "*** You have to install docbook-utils or xmlto ***"; \
|
notfoundtemplate = echo "*** You have to install docbook-utils or xmlto ***"; \
|
||||||
exit 1
|
exit 1
|
||||||
db2xtemplate = db2TYPE -o $(dir $@) $<
|
db2xtemplate = db2TYPE -o $(dir $@) $<
|
||||||
|
@@ -524,6 +524,44 @@ These utilities include endpoint autoconfiguration.
|
|||||||
<!-- !Edrivers/usb/gadget/epautoconf.c -->
|
<!-- !Edrivers/usb/gadget/epautoconf.c -->
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="composite"><title>Composite Device Framework</title>
|
||||||
|
|
||||||
|
<para>The core API is sufficient for writing drivers for composite
|
||||||
|
USB devices (with more than one function in a given configuration),
|
||||||
|
and also multi-configuration devices (also more than one function,
|
||||||
|
but not necessarily sharing a given configuration).
|
||||||
|
There is however an optional framework which makes it easier to
|
||||||
|
reuse and combine functions.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>Devices using this framework provide a <emphasis>struct
|
||||||
|
usb_composite_driver</emphasis>, which in turn provides one or
|
||||||
|
more <emphasis>struct usb_configuration</emphasis> instances.
|
||||||
|
Each such configuration includes at least one
|
||||||
|
<emphasis>struct usb_function</emphasis>, which packages a user
|
||||||
|
visible role such as "network link" or "mass storage device".
|
||||||
|
Management functions may also exist, such as "Device Firmware
|
||||||
|
Upgrade".
|
||||||
|
</para>
|
||||||
|
|
||||||
|
!Iinclude/linux/usb/composite.h
|
||||||
|
!Edrivers/usb/gadget/composite.c
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="functions"><title>Composite Device Functions</title>
|
||||||
|
|
||||||
|
<para>At this writing, a few of the current gadget drivers have
|
||||||
|
been converted to this framework.
|
||||||
|
Near-term plans include converting all of them, except for "gadgetfs".
|
||||||
|
</para>
|
||||||
|
|
||||||
|
!Edrivers/usb/gadget/f_acm.c
|
||||||
|
!Edrivers/usb/gadget/f_serial.c
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
|
<chapter id="controllers"><title>Peripheral Controller Drivers</title>
|
||||||
|
@@ -283,6 +283,7 @@ X!Earch/x86/kernel/mca_32.c
|
|||||||
<chapter id="security">
|
<chapter id="security">
|
||||||
<title>Security Framework</title>
|
<title>Security Framework</title>
|
||||||
!Isecurity/security.c
|
!Isecurity/security.c
|
||||||
|
!Esecurity/inode.c
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="audit">
|
<chapter id="audit">
|
||||||
@@ -364,6 +365,10 @@ X!Edrivers/pnp/system.c
|
|||||||
!Eblock/blk-barrier.c
|
!Eblock/blk-barrier.c
|
||||||
!Eblock/blk-tag.c
|
!Eblock/blk-tag.c
|
||||||
!Iblock/blk-tag.c
|
!Iblock/blk-tag.c
|
||||||
|
!Eblock/blk-integrity.c
|
||||||
|
!Iblock/blktrace.c
|
||||||
|
!Iblock/genhd.c
|
||||||
|
!Eblock/genhd.c
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="chrdev">
|
<chapter id="chrdev">
|
||||||
|
@@ -219,10 +219,10 @@
|
|||||||
</para>
|
</para>
|
||||||
|
|
||||||
<sect1 id="lock-intro">
|
<sect1 id="lock-intro">
|
||||||
<title>Three Main Types of Kernel Locks: Spinlocks, Mutexes and Semaphores</title>
|
<title>Two Main Types of Kernel Locks: Spinlocks and Mutexes</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
There are three main types of kernel locks. The fundamental type
|
There are two main types of kernel locks. The fundamental type
|
||||||
is the spinlock
|
is the spinlock
|
||||||
(<filename class="headerfile">include/asm/spinlock.h</filename>),
|
(<filename class="headerfile">include/asm/spinlock.h</filename>),
|
||||||
which is a very simple single-holder lock: if you can't get the
|
which is a very simple single-holder lock: if you can't get the
|
||||||
@@ -239,14 +239,6 @@
|
|||||||
can't sleep (see <xref linkend="sleeping-things"/>), and so have to
|
can't sleep (see <xref linkend="sleeping-things"/>), and so have to
|
||||||
use a spinlock instead.
|
use a spinlock instead.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
|
||||||
The third type is a semaphore
|
|
||||||
(<filename class="headerfile">include/linux/semaphore.h</filename>): it
|
|
||||||
can have more than one holder at any time (the number decided at
|
|
||||||
initialization time), although it is most commonly used as a
|
|
||||||
single-holder lock (a mutex). If you can't get a semaphore, your
|
|
||||||
task will be suspended and later on woken up - just like for mutexes.
|
|
||||||
</para>
|
|
||||||
<para>
|
<para>
|
||||||
Neither type of lock is recursive: see
|
Neither type of lock is recursive: see
|
||||||
<xref linkend="deadlock"/>.
|
<xref linkend="deadlock"/>.
|
||||||
@@ -278,7 +270,7 @@
|
|||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Semaphores still exist, because they are required for
|
Mutexes still exist, because they are required for
|
||||||
synchronization between <firstterm linkend="gloss-usercontext">user
|
synchronization between <firstterm linkend="gloss-usercontext">user
|
||||||
contexts</firstterm>, as we will see below.
|
contexts</firstterm>, as we will see below.
|
||||||
</para>
|
</para>
|
||||||
@@ -289,18 +281,17 @@
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
If you have a data structure which is only ever accessed from
|
If you have a data structure which is only ever accessed from
|
||||||
user context, then you can use a simple semaphore
|
user context, then you can use a simple mutex
|
||||||
(<filename>linux/linux/semaphore.h</filename>) to protect it. This
|
(<filename>include/linux/mutex.h</filename>) to protect it. This
|
||||||
is the most trivial case: you initialize the semaphore to the number
|
is the most trivial case: you initialize the mutex. Then you can
|
||||||
of resources available (usually 1), and call
|
call <function>mutex_lock_interruptible()</function> to grab the mutex,
|
||||||
<function>down_interruptible()</function> to grab the semaphore, and
|
and <function>mutex_unlock()</function> to release it. There is also a
|
||||||
<function>up()</function> to release it. There is also a
|
<function>mutex_lock()</function>, which should be avoided, because it
|
||||||
<function>down()</function>, which should be avoided, because it
|
|
||||||
will not return if a signal is received.
|
will not return if a signal is received.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Example: <filename>linux/net/core/netfilter.c</filename> allows
|
Example: <filename>net/netfilter/nf_sockopt.c</filename> allows
|
||||||
registration of new <function>setsockopt()</function> and
|
registration of new <function>setsockopt()</function> and
|
||||||
<function>getsockopt()</function> calls, with
|
<function>getsockopt()</function> calls, with
|
||||||
<function>nf_register_sockopt()</function>. Registration and
|
<function>nf_register_sockopt()</function>. Registration and
|
||||||
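The mutex usage described in this hunk (initialize the mutex, take it with mutex_lock_interruptible(), release it with mutex_unlock()) might look roughly like the fragment below. It is a kernel-context sketch rather than a standalone program: it only builds inside a kernel tree, and the names example_lock, example_count and example_increment are hypothetical and do not appear in the commit.

```c
#include <linux/mutex.h>

/* Hypothetical lock and the data it protects (user context only). */
static DEFINE_MUTEX(example_lock);
static int example_count;

/*
 * Increment the counter from user context. The caller may sleep while
 * waiting for the mutex; if a signal arrives first,
 * mutex_lock_interruptible() returns a negative error code, which is
 * passed back to the caller without touching the data.
 */
static int example_increment(void)
{
	int ret;

	ret = mutex_lock_interruptible(&example_lock);
	if (ret)
		return ret;

	example_count++;

	mutex_unlock(&example_lock);
	return 0;
}
```

DEFINE_MUTEX() covers the "initialize the mutex" step for a statically allocated lock; a mutex embedded in a dynamically allocated structure would be set up with mutex_init() instead.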
@@ -515,7 +506,7 @@
|
|||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
If you are in a process context (any syscall) and want to
|
If you are in a process context (any syscall) and want to
|
||||||
lock other process out, use a semaphore. You can take a semaphore
|
lock other process out, use a mutex. You can take a mutex
|
||||||
and sleep (<function>copy_from_user*(</function> or
|
and sleep (<function>copy_from_user*(</function> or
|
||||||
<function>kmalloc(x,GFP_KERNEL)</function>).
|
<function>kmalloc(x,GFP_KERNEL)</function>).
|
||||||
</para>
|
</para>
|
||||||
@@ -662,7 +653,7 @@
|
|||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>SLBH</entry>
|
<entry>SLBH</entry>
|
||||||
<entry>DI</entry>
|
<entry>MLI</entry>
|
||||||
<entry>None</entry>
|
<entry>None</entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
@@ -692,8 +683,8 @@
|
|||||||
<entry>spin_lock_bh</entry>
|
<entry>spin_lock_bh</entry>
|
||||||
</row>
|
</row>
|
||||||
<row>
|
<row>
|
||||||
<entry>DI</entry>
|
<entry>MLI</entry>
|
||||||
<entry>down_interruptible</entry>
|
<entry>mutex_lock_interruptible</entry>
|
||||||
</row>
|
</row>
|
||||||
|
|
||||||
</tbody>
|
</tbody>
|
||||||
@@ -1310,7 +1301,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>.
|
|||||||
<para>
|
<para>
|
||||||
There is a coding bug where a piece of code tries to grab a
|
There is a coding bug where a piece of code tries to grab a
|
||||||
spinlock twice: it will spin forever, waiting for the lock to
|
spinlock twice: it will spin forever, waiting for the lock to
|
||||||
be released (spinlocks, rwlocks and semaphores are not
|
be released (spinlocks, rwlocks and mutexes are not
|
||||||
recursive in Linux). This is trivial to diagnose: not a
|
recursive in Linux). This is trivial to diagnose: not a
|
||||||
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
|
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
|
||||||
problem.
|
problem.
|
||||||
@@ -1335,7 +1326,7 @@ as Alan Cox says, <quote>Lock data, not code</quote>.
|
|||||||
|
|
||||||
<para>
|
<para>
|
||||||
This complete lockup is easy to diagnose: on SMP boxes the
|
This complete lockup is easy to diagnose: on SMP boxes the
|
||||||
watchdog timer or compiling with <symbol>DEBUG_SPINLOCKS</symbol> set
|
watchdog timer or compiling with <symbol>DEBUG_SPINLOCK</symbol> set
|
||||||
(<filename>include/linux/spinlock.h</filename>) will show this up
|
(<filename>include/linux/spinlock.h</filename>) will show this up
|
||||||
immediately when it happens.
|
immediately when it happens.
|
||||||
</para>
|
</para>
|
||||||
@@ -1558,7 +1549,7 @@ the amount of locking which needs to be done.
|
|||||||
<title>Read/Write Lock Variants</title>
|
<title>Read/Write Lock Variants</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Both spinlocks and semaphores have read/write variants:
|
Both spinlocks and mutexes have read/write variants:
|
||||||
<type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
|
<type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
|
||||||
These divide users into two classes: the readers and the writers. If
|
These divide users into two classes: the readers and the writers. If
|
||||||
you are only reading the data, you can get a read lock, but to write to
|
you are only reading the data, you can get a read lock, but to write to
|
||||||
@@ -1681,7 +1672,7 @@ the amount of locking which needs to be done.
|
|||||||
#include <linux/slab.h>
|
#include <linux/slab.h>
|
||||||
#include <linux/string.h>
|
#include <linux/string.h>
|
||||||
+#include <linux/rcupdate.h>
|
+#include <linux/rcupdate.h>
|
||||||
#include <linux/semaphore.h>
|
#include <linux/mutex.h>
|
||||||
#include <asm/errno.h>
|
#include <asm/errno.h>
|
||||||
|
|
||||||
struct object
|
struct object
|
||||||
@@ -1913,7 +1904,7 @@ machines due to caching.
|
|||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<function> put_user()</function>
|
<function>put_user()</function>
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
@@ -1927,13 +1918,13 @@ machines due to caching.
|
|||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>
|
<para>
|
||||||
<function>down_interruptible()</function> and
|
<function>mutex_lock_interruptible()</function> and
|
||||||
<function>down()</function>
|
<function>mutex_lock()</function>
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
There is a <function>down_trylock()</function> which can be
|
There is a <function>mutex_trylock()</function> which can be
|
||||||
used inside interrupt context, as it will not sleep.
|
used inside interrupt context, as it will not sleep.
|
||||||
<function>up()</function> will also never sleep.
|
<function>mutex_unlock()</function> will also never sleep.
|
||||||
</para>
|
</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
@@ -2023,7 +2014,7 @@ machines due to caching.
|
|||||||
<para>
|
<para>
|
||||||
Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is
|
Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is
|
||||||
unset, processes in user context inside the kernel would not
|
unset, processes in user context inside the kernel would not
|
||||||
preempt each other (ie. you had that CPU until you have it up,
|
preempt each other (ie. you had that CPU until you gave it up,
|
||||||
except for interrupts). With the addition of
|
except for interrupts). With the addition of
|
||||||
<symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when
|
<symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when
|
||||||
in user context, higher priority tasks can "cut in": spinlocks
|
in user context, higher priority tasks can "cut in": spinlocks
|
||||||
|
@@ -98,6 +98,24 @@
|
|||||||
"Kernel debugging" select "KGDB: kernel debugging with remote gdb".
|
"Kernel debugging" select "KGDB: kernel debugging with remote gdb".
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
|
It is advised, but not required, that you turn on the
|
||||||
|
CONFIG_FRAME_POINTER kernel option. This option inserts code
|
||||||
|
into the compiled executable which saves the frame information in
|
||||||
|
registers or on the stack at different points which will allow a
|
||||||
|
debugger such as gdb to more accurately construct stack back traces
|
||||||
|
while debugging the kernel.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
If the architecture that you are using supports the kernel option
|
||||||
|
CONFIG_DEBUG_RODATA, you should consider turning it off. This
|
||||||
|
option will prevent the use of software breakpoints because it
|
||||||
|
marks certain regions of the kernel's memory space as read-only.
|
||||||
|
If kgdb supports it for the architecture you are using, you can
|
||||||
|
use hardware breakpoints if you desire to run with the
|
||||||
|
CONFIG_DEBUG_RODATA option turned on, else you need to turn off
|
||||||
|
this option.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
Next you should choose one or more I/O drivers to interconnect debugging
|
Next you should choose one or more I/O drivers to interconnect debugging
|
||||||
host and debugged target. Early boot debugging requires a KGDB
|
host and debugged target. Early boot debugging requires a KGDB
|
||||||
I/O driver that supports early debugging and the driver must be
|
I/O driver that supports early debugging and the driver must be
|
||||||
|
@@ -145,7 +145,6 @@ usage should require reading the full document.
|
|||||||
this though and the recommendation to allow only a single
|
this though and the recommendation to allow only a single
|
||||||
interface in STA mode at first!
|
interface in STA mode at first!
|
||||||
</para>
|
</para>
|
||||||
!Finclude/net/mac80211.h ieee80211_if_types
|
|
||||||
!Finclude/net/mac80211.h ieee80211_if_init_conf
|
!Finclude/net/mac80211.h ieee80211_if_init_conf
|
||||||
!Finclude/net/mac80211.h ieee80211_if_conf
|
!Finclude/net/mac80211.h ieee80211_if_conf
|
||||||
</chapter>
|
</chapter>
|
||||||
@@ -177,8 +176,7 @@ usage should require reading the full document.
|
|||||||
<title>functions/definitions</title>
|
<title>functions/definitions</title>
|
||||||
!Finclude/net/mac80211.h ieee80211_rx_status
|
!Finclude/net/mac80211.h ieee80211_rx_status
|
||||||
!Finclude/net/mac80211.h mac80211_rx_flags
|
!Finclude/net/mac80211.h mac80211_rx_flags
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_control
|
!Finclude/net/mac80211.h ieee80211_tx_info
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_status_flags
|
|
||||||
!Finclude/net/mac80211.h ieee80211_rx
|
!Finclude/net/mac80211.h ieee80211_rx
|
||||||
!Finclude/net/mac80211.h ieee80211_rx_irqsafe
|
!Finclude/net/mac80211.h ieee80211_rx_irqsafe
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_status
|
!Finclude/net/mac80211.h ieee80211_tx_status
|
||||||
@@ -189,12 +187,11 @@ usage should require reading the full document.
|
|||||||
!Finclude/net/mac80211.h ieee80211_ctstoself_duration
|
!Finclude/net/mac80211.h ieee80211_ctstoself_duration
|
||||||
!Finclude/net/mac80211.h ieee80211_generic_frame_duration
|
!Finclude/net/mac80211.h ieee80211_generic_frame_duration
|
||||||
!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb
|
!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb
|
||||||
!Finclude/net/mac80211.h ieee80211_get_hdrlen
|
!Finclude/net/mac80211.h ieee80211_hdrlen
|
||||||
!Finclude/net/mac80211.h ieee80211_wake_queue
|
!Finclude/net/mac80211.h ieee80211_wake_queue
|
||||||
!Finclude/net/mac80211.h ieee80211_stop_queue
|
!Finclude/net/mac80211.h ieee80211_stop_queue
|
||||||
!Finclude/net/mac80211.h ieee80211_start_queues
|
|
||||||
!Finclude/net/mac80211.h ieee80211_stop_queues
|
|
||||||
!Finclude/net/mac80211.h ieee80211_wake_queues
|
!Finclude/net/mac80211.h ieee80211_wake_queues
|
||||||
|
!Finclude/net/mac80211.h ieee80211_stop_queues
|
||||||
</sect1>
|
</sect1>
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
@@ -230,8 +227,7 @@ usage should require reading the full document.
|
|||||||
<title>Multiple queues and QoS support</title>
|
<title>Multiple queues and QoS support</title>
|
||||||
<para>TBD</para>
|
<para>TBD</para>
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_queue_params
|
!Finclude/net/mac80211.h ieee80211_tx_queue_params
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_queue_stats_data
|
!Finclude/net/mac80211.h ieee80211_tx_queue_stats
|
||||||
!Finclude/net/mac80211.h ieee80211_tx_queue
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="AP">
|
<chapter id="AP">
|
||||||
|
@@ -29,12 +29,12 @@
|
|||||||
|
|
||||||
<revhistory>
|
<revhistory>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>1.0 </revnumber>
|
<revnumber>1.0</revnumber>
|
||||||
<date>May 30, 2001</date>
|
<date>May 30, 2001</date>
|
||||||
<revremark>Initial revision posted to linux-kernel</revremark>
|
<revremark>Initial revision posted to linux-kernel</revremark>
|
||||||
</revision>
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>1.1 </revnumber>
|
<revnumber>1.1</revnumber>
|
||||||
<date>June 3, 2001</date>
|
<date>June 3, 2001</date>
|
||||||
<revremark>Revised after comments from linux-kernel</revremark>
|
<revremark>Revised after comments from linux-kernel</revremark>
|
||||||
</revision>
|
</revision>
|
||||||
|
@@ -189,8 +189,6 @@ static int __init init_procfs_example(void)
|
|||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
no_symlink:
|
no_symlink:
|
||||||
remove_proc_entry("tty", example_dir);
|
|
||||||
no_tty:
|
|
||||||
remove_proc_entry("bar", example_dir);
|
remove_proc_entry("bar", example_dir);
|
||||||
no_bar:
|
no_bar:
|
||||||
remove_proc_entry("foo", example_dir);
|
remove_proc_entry("foo", example_dir);
|
||||||
@@ -206,7 +204,6 @@ out:
|
|||||||
static void __exit cleanup_procfs_example(void)
|
static void __exit cleanup_procfs_example(void)
|
||||||
{
|
{
|
||||||
remove_proc_entry("jiffies_too", example_dir);
|
remove_proc_entry("jiffies_too", example_dir);
|
||||||
remove_proc_entry("tty", example_dir);
|
|
||||||
remove_proc_entry("bar", example_dir);
|
remove_proc_entry("bar", example_dir);
|
||||||
remove_proc_entry("foo", example_dir);
|
remove_proc_entry("foo", example_dir);
|
||||||
remove_proc_entry("jiffies", example_dir);
|
remove_proc_entry("jiffies", example_dir);
|
||||||
@@ -222,3 +219,4 @@ module_exit(cleanup_procfs_example);
|
|||||||
|
|
||||||
MODULE_AUTHOR("Erik Mouw");
|
MODULE_AUTHOR("Erik Mouw");
|
||||||
MODULE_DESCRIPTION("procfs examples");
|
MODULE_DESCRIPTION("procfs examples");
|
||||||
|
MODULE_LICENSE("GPL");
|
||||||
|
@@ -100,7 +100,7 @@
|
|||||||
the hardware structures represented here, please consult the Principles
|
the hardware structures represented here, please consult the Principles
|
||||||
of Operation.
|
of Operation.
|
||||||
</para>
|
</para>
|
||||||
!Iinclude/asm-s390/cio.h
|
!Iarch/s390/include/asm/cio.h
|
||||||
</sect1>
|
</sect1>
|
||||||
<sect1 id="ccwdev">
|
<sect1 id="ccwdev">
|
||||||
<title>ccw devices</title>
|
<title>ccw devices</title>
|
||||||
@@ -114,7 +114,7 @@
|
|||||||
ccw device structure. Device drivers must not bypass those functions
|
ccw device structure. Device drivers must not bypass those functions
|
||||||
or strange side effects may happen.
|
or strange side effects may happen.
|
||||||
</para>
|
</para>
|
||||||
!Iinclude/asm-s390/ccwdev.h
|
!Iarch/s390/include/asm/ccwdev.h
|
||||||
!Edrivers/s390/cio/device.c
|
!Edrivers/s390/cio/device.c
|
||||||
!Edrivers/s390/cio/device_ops.c
|
!Edrivers/s390/cio/device_ops.c
|
||||||
</sect1>
|
</sect1>
|
||||||
@@ -125,7 +125,7 @@
|
|||||||
measurement data which is made available by the channel subsystem
|
measurement data which is made available by the channel subsystem
|
||||||
for each channel attached device.
|
for each channel attached device.
|
||||||
</para>
|
</para>
|
||||||
!Iinclude/asm-s390/cmb.h
|
!Iarch/s390/include/asm/cmb.h
|
||||||
!Edrivers/s390/cio/cmf.c
|
!Edrivers/s390/cio/cmf.c
|
||||||
</sect1>
|
</sect1>
|
||||||
</chapter>
|
</chapter>
|
||||||
@@ -142,7 +142,7 @@
|
|||||||
</para>
|
</para>
|
||||||
<sect1 id="ccwgroupdevices">
|
<sect1 id="ccwgroupdevices">
|
||||||
<title>ccw group devices</title>
|
<title>ccw group devices</title>
|
||||||
!Iinclude/asm-s390/ccwgroup.h
|
!Iarch/s390/include/asm/ccwgroup.h
|
||||||
!Edrivers/s390/cio/ccwgroup.c
|
!Edrivers/s390/cio/ccwgroup.c
|
||||||
</sect1>
|
</sect1>
|
||||||
</chapter>
|
</chapter>
|
||||||
|
105
Documentation/DocBook/sh.tmpl
Normal file
@@ -0,0 +1,105 @@
|
|||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||||
|
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||||
|
|
||||||
|
<book id="sh-drivers">
|
||||||
|
<bookinfo>
|
||||||
|
<title>SuperH Interfaces Guide</title>
|
||||||
|
|
||||||
|
<authorgroup>
|
||||||
|
<author>
|
||||||
|
<firstname>Paul</firstname>
|
||||||
|
<surname>Mundt</surname>
|
||||||
|
<affiliation>
|
||||||
|
<address>
|
||||||
|
<email>lethal@linux-sh.org</email>
|
||||||
|
</address>
|
||||||
|
</affiliation>
|
||||||
|
</author>
|
||||||
|
</authorgroup>
|
||||||
|
|
||||||
|
<copyright>
|
||||||
|
<year>2008</year>
|
||||||
|
<holder>Paul Mundt</holder>
|
||||||
|
</copyright>
|
||||||
|
<copyright>
|
||||||
|
<year>2008</year>
|
||||||
|
<holder>Renesas Technology Corp.</holder>
|
||||||
|
</copyright>
|
||||||
|
|
||||||
|
<legalnotice>
|
||||||
|
<para>
|
||||||
|
This documentation is free software; you can redistribute
|
||||||
|
it and/or modify it under the terms of the GNU General Public
|
||||||
|
License version 2 as published by the Free Software Foundation.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This program is distributed in the hope that it will be
|
||||||
|
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||||
|
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||||
|
See the GNU General Public License for more details.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
You should have received a copy of the GNU General Public
|
||||||
|
License along with this program; if not, write to the Free
|
||||||
|
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||||
|
MA 02111-1307 USA
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
For more details see the file COPYING in the source
|
||||||
|
distribution of Linux.
|
||||||
|
</para>
|
||||||
|
</legalnotice>
|
||||||
|
</bookinfo>
|
||||||
|
|
||||||
|
<toc></toc>
|
||||||
|
|
||||||
|
<chapter id="mm">
|
||||||
|
<title>Memory Management</title>
|
||||||
|
<sect1 id="sh4">
|
||||||
|
<title>SH-4</title>
|
||||||
|
<sect2 id="sq">
|
||||||
|
<title>Store Queue API</title>
|
||||||
|
!Earch/sh/kernel/cpu/sh4/sq.c
|
||||||
|
</sect2>
|
||||||
|
</sect1>
|
||||||
|
<sect1 id="sh5">
|
||||||
|
<title>SH-5</title>
|
||||||
|
<sect2 id="tlb">
|
||||||
|
<title>TLB Interfaces</title>
|
||||||
|
!Iarch/sh/mm/tlb-sh5.c
|
||||||
|
!Iarch/sh/include/asm/tlb_64.h
|
||||||
|
</sect2>
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
<chapter id="clk">
|
||||||
|
<title>Clock Framework Extensions</title>
|
||||||
|
!Iarch/sh/include/asm/clock.h
|
||||||
|
</chapter>
|
||||||
|
<chapter id="mach">
|
||||||
|
<title>Machine Specific Interfaces</title>
|
||||||
|
<sect1 id="dreamcast">
|
||||||
|
<title>mach-dreamcast</title>
|
||||||
|
!Iarch/sh/boards/mach-dreamcast/rtc.c
|
||||||
|
</sect1>
|
||||||
|
<sect1 id="x3proto">
|
||||||
|
<title>mach-x3proto</title>
|
||||||
|
!Earch/sh/boards/mach-x3proto/ilsel.c
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
<chapter id="busses">
|
||||||
|
<title>Busses</title>
|
||||||
|
<sect1 id="superhyway">
|
||||||
|
<title>SuperHyway</title>
|
||||||
|
!Edrivers/sh/superhyway/superhyway.c
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="maple">
|
||||||
|
<title>Maple</title>
|
||||||
|
!Edrivers/sh/maple/maple.c
|
||||||
|
</sect1>
|
||||||
|
</chapter>
|
||||||
|
</book>
|
@@ -21,6 +21,18 @@
|
|||||||
</affiliation>
|
</affiliation>
|
||||||
</author>
|
</author>
|
||||||
|
|
||||||
|
<copyright>
|
||||||
|
<year>2006-2008</year>
|
||||||
|
<holder>Hans-Jürgen Koch.</holder>
|
||||||
|
</copyright>
|
||||||
|
|
||||||
|
<legalnotice>
|
||||||
|
<para>
|
||||||
|
This documentation is Free Software licensed under the terms of the
|
||||||
|
GPL version 2.
|
||||||
|
</para>
|
||||||
|
</legalnotice>
|
||||||
|
|
||||||
<pubdate>2006-12-11</pubdate>
|
<pubdate>2006-12-11</pubdate>
|
||||||
|
|
||||||
<abstract>
|
<abstract>
|
||||||
@@ -29,6 +41,12 @@
|
|||||||
</abstract>
|
</abstract>
|
||||||
|
|
||||||
<revhistory>
|
<revhistory>
|
||||||
|
<revision>
|
||||||
|
<revnumber>0.5</revnumber>
|
||||||
|
<date>2008-05-22</date>
|
||||||
|
<authorinitials>hjk</authorinitials>
|
||||||
|
<revremark>Added description of write() function.</revremark>
|
||||||
|
</revision>
|
||||||
<revision>
|
<revision>
|
||||||
<revnumber>0.4</revnumber>
|
<revnumber>0.4</revnumber>
|
||||||
<date>2007-11-26</date>
|
<date>2007-11-26</date>
|
||||||
@@ -57,20 +75,9 @@
|
|||||||
</bookinfo>
|
</bookinfo>
|
||||||
|
|
||||||
<chapter id="aboutthisdoc">
|
<chapter id="aboutthisdoc">
|
||||||
<?dbhtml filename="about.html"?>
|
<?dbhtml filename="aboutthis.html"?>
|
||||||
<title>About this document</title>
|
<title>About this document</title>
|
||||||
|
|
||||||
<sect1 id="copyright">
|
|
||||||
<?dbhtml filename="copyright.html"?>
|
|
||||||
<title>Copyright and License</title>
|
|
||||||
<para>
|
|
||||||
Copyright (c) 2006 by Hans-Jürgen Koch.</para>
|
|
||||||
<para>
|
|
||||||
This documentation is Free Software licensed under the terms of the
|
|
||||||
GPL version 2.
|
|
||||||
</para>
|
|
||||||
</sect1>
|
|
||||||
|
|
||||||
<sect1 id="translations">
|
<sect1 id="translations">
|
||||||
<?dbhtml filename="translations.html"?>
|
<?dbhtml filename="translations.html"?>
|
||||||
<title>Translations</title>
|
<title>Translations</title>
|
||||||
@@ -189,6 +196,30 @@ interested in translating it, please email me
|
|||||||
represents the total interrupt count. You can use this number
|
represents the total interrupt count. You can use this number
|
||||||
to figure out if you missed some interrupts.
|
to figure out if you missed some interrupts.
|
||||||
</para>
|
</para>
|
||||||
|
<para>
|
||||||
|
For some hardware that has more than one interrupt source internally,
|
||||||
|
but not separate IRQ mask and status registers, there might be
|
||||||
|
situations where userspace cannot determine what the interrupt source
|
||||||
|
was if the kernel handler disables them by writing to the chip's IRQ
|
||||||
|
register. In such a case, the kernel has to disable the IRQ completely
|
||||||
|
to leave the chip's register untouched. Now the userspace part can
|
||||||
|
determine the cause of the interrupt, but it cannot re-enable
|
||||||
|
interrupts. Another corner case is chips where re-enabling interrupts
|
||||||
|
is a read-modify-write operation to a combined IRQ status/acknowledge
|
||||||
|
register. This would be racy if a new interrupt occurred
|
||||||
|
simultaneously.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
To address these problems, UIO also implements a write() function. It
|
||||||
|
is normally not used and can be ignored for hardware that has only a
|
||||||
|
single interrupt source or has separate IRQ mask and status registers.
|
||||||
|
If you need it, however, a write to <filename>/dev/uioX</filename>
|
||||||
|
will call the <function>irqcontrol()</function> function implemented
|
||||||
|
by the driver. You have to write a 32-bit value that is usually either
|
||||||
|
0 or 1 to disable or enable interrupts. If a driver does not implement
|
||||||
|
<function>irqcontrol()</function>, <function>write()</function> will
|
||||||
|
return with <varname>-ENOSYS</varname>.
|
||||||
|
</para>
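The write() interface described above can be sketched from the userspace side as follows. This is a minimal illustration, not code from the UIO documentation: the helper name uio_irq_control is hypothetical, and the only behaviour assumed is what the text states, namely that a 32-bit value of 0 or 1 is written to the device file.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a 32-bit control word to an already-open
 * UIO device file. Per the text above, 1 asks the driver to enable
 * interrupts and 0 asks it to disable them.
 * Returns 0 on success, -1 with errno set on failure.
 */
int uio_irq_control(int fd, int32_t irq_on)
{
    ssize_t n = write(fd, &irq_on, sizeof(irq_on));

    if (n < 0)
        return -1;              /* errno was set by write() */
    if (n != (ssize_t)sizeof(irq_on)) {
        errno = EIO;            /* short write: treat as an I/O error */
        return -1;
    }
    return 0;
}
```

A caller would open the device (for example /dev/uio0, where the device number is an assumption), wait for an interrupt with read(), service the hardware, and then call uio_irq_control(fd, 1) to re-enable interrupts. A caller would presumably see the -ENOSYS case described above as a failed write with errno set to ENOSYS.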
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
To handle interrupts properly, your custom kernel module can
|
To handle interrupts properly, your custom kernel module can
|
||||||
@@ -362,6 +393,14 @@ device is actually used.
|
|||||||
<function>open()</function>, you will probably also want a custom
|
<function>open()</function>, you will probably also want a custom
|
||||||
<function>release()</function> function.
|
<function>release()</function> function.
|
||||||
</para></listitem>
|
</para></listitem>
|
||||||
|
|
||||||
|
<listitem><para>
|
||||||
|
<varname>int (*irqcontrol)(struct uio_info *info, s32 irq_on)
|
||||||
|
</varname>: Optional. If you need to be able to enable or disable
|
||||||
|
interrupts from userspace by writing to <filename>/dev/uioX</filename>,
|
||||||
|
you can implement this function. The parameter <varname>irq_on</varname>
|
||||||
|
will be 0 to disable interrupts and 1 to enable them.
|
||||||
|
</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
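The shape of an irqcontrol() callback, as listed above, can be sketched as follows. This compiles in userspace only because struct uio_info and the enable/disable helpers are replaced by stand-ins; demo_priv, demo_enable_irq, demo_disable_irq and demo_irqcontrol are all hypothetical names. A real driver would use the kernel's own struct uio_info and IRQ functions, and would need locking around the state flag.

```c
#include <stdint.h>

typedef int32_t s32;

/* Userspace stand-in for the kernel's struct uio_info (two fields only). */
struct uio_info {
    long irq;
    void *priv;
};

/* Hypothetical per-device state kept by the driver. */
struct demo_priv {
    int irq_enabled;
};

/*
 * Stand-ins for enabling and disabling the IRQ line. They only record
 * state here; a real driver would call the kernel's IRQ functions.
 */
static void demo_enable_irq(struct demo_priv *p)  { p->irq_enabled = 1; }
static void demo_disable_irq(struct demo_priv *p) { p->irq_enabled = 0; }

/* Matches the signature given above: irq_on is 0 to disable, 1 to enable. */
int demo_irqcontrol(struct uio_info *info, s32 irq_on)
{
    struct demo_priv *p = info->priv;

    if (irq_on) {
        if (!p->irq_enabled)
            demo_enable_irq(p);     /* avoid enabling twice */
    } else {
        if (p->irq_enabled)
            demo_disable_irq(p);    /* avoid disabling twice */
    }
    return 0;
}
```

Tracking the enabled state means repeated writes of the same value from userspace do not enable or disable the line more than once.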
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
|
@@ -1648,7 +1648,7 @@ static struct video_buffer capture_fb;
|
|||||||
|
|
||||||
<chapter id="pubfunctions">
|
<chapter id="pubfunctions">
|
||||||
<title>Public Functions Provided</title>
|
<title>Public Functions Provided</title>
|
||||||
!Edrivers/media/video/videodev.c
|
!Edrivers/media/video/v4l2-dev.c
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
</book>
|
</book>
|
||||||
|
@@ -69,12 +69,6 @@
|
|||||||
device to be used as both a tty interface and as a synchronous
|
device to be used as both a tty interface and as a synchronous
|
||||||
controller is a project for Linux post the 2.4 release
|
controller is a project for Linux post the 2.4 release
|
||||||
</para>
|
</para>
|
||||||
<para>
|
|
||||||
The support code handles most common card configurations and
|
|
||||||
supports running both Cisco HDLC and Synchronous PPP. With extra
|
|
||||||
glue the frame relay and X.25 protocols can also be used with this
|
|
||||||
driver.
|
|
||||||
</para>
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
<chapter id="Driver_Modes">
|
<chapter id="Driver_Modes">
|
||||||
@@ -179,35 +173,27 @@
|
|||||||
<para>
|
<para>
|
||||||
If you wish to use the network interface facilities of the driver,
|
If you wish to use the network interface facilities of the driver,
|
||||||
then you need to attach a network device to each channel that is
|
then you need to attach a network device to each channel that is
|
||||||
present and in use. In addition to use the SyncPPP and Cisco HDLC
|
present and in use. In addition, to use the generic HDLC
|
||||||
you need to follow some additional plumbing rules. They may seem
|
you need to follow some additional plumbing rules. They may seem
|
||||||
complex but a look at the example hostess_sv11 driver should
|
complex but a look at the example hostess_sv11 driver should
|
||||||
reassure you.
|
reassure you.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
The network device used for each channel should be pointed to by
|
The network device used for each channel should be pointed to by
|
||||||
the netdevice field of each channel. The dev-> priv field of the
|
the netdevice field of each channel. The hdlc->priv field of the
|
||||||
network device points to your private data - you will need to be
|
network device points to your private data - you will need to be
|
||||||
able to find your ppp device from this. In addition to use the
|
able to find your private data from this.
|
||||||
sync ppp layer the private data must start with a void * pointer
|
|
||||||
to the syncppp structures.
|
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
The way most drivers approach this particular problem is to
|
The way most drivers approach this particular problem is to
|
||||||
create a structure holding the Z8530 device definition and
|
create a structure holding the Z8530 device definition and
|
||||||
put that and the syncppp pointer into the private field of
|
put that into the private field of the network device. The
|
||||||
the network device. The network device fields of the channels
|
network device fields of the channels then point back to the
|
||||||
then point back to the network devices. The ppp_device can also
|
network devices.
|
||||||
be put in the private structure conveniently.
|
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
If you wish to use the synchronous ppp then you need to attach
|
If you wish to use the generic HDLC then you need to register
|
||||||
the syncppp layer to the network device. You should do this before
|
the HDLC device.
|
||||||
you register the network device. The
|
|
||||||
<function>sppp_attach</function> requires that the first void *
|
|
||||||
pointer in your private data is pointing to an empty struct
|
|
||||||
ppp_device. The function fills in the initial data for the
|
|
||||||
ppp/hdlc layer.
|
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
Before you register your network device you will also need to
|
Before you register your network device you will also need to
|
||||||
@@ -314,10 +300,10 @@
|
|||||||
buffer in sk_buff format and queues it for transmission. The
|
buffer in sk_buff format and queues it for transmission. The
|
||||||
caller must provide the entire packet with the exception of the
|
caller must provide the entire packet with the exception of the
|
||||||
bitstuffing and CRC. This is normally done by the caller via
|
bitstuffing and CRC. This is normally done by the caller via
|
||||||
the syncppp interface layer. It returns 0 if the buffer has been
|
the generic HDLC interface layer. It returns 0 if the buffer has been
|
||||||
queued and non zero values for queue full. If the function accepts
|
queued and non zero values for queue full. If the function accepts
|
||||||
the buffer it becomes property of the Z8530 layer and the caller
|
the buffer it becomes property of the Z8530 layer and the caller
|
||||||
should not free it.
|
should not free it.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
The function <function>z8530_get_stats</function> returns a pointer
|
The function <function>z8530_get_stats</function> returns a pointer
|
||||||
|
@@ -77,7 +77,8 @@ documentation files are also added which explain how to use the feature.
|
|||||||
When a kernel change causes the interface that the kernel exposes to
|
When a kernel change causes the interface that the kernel exposes to
|
||||||
userspace to change, it is recommended that you send the information or
|
userspace to change, it is recommended that you send the information or
|
||||||
a patch to the manual pages explaining the change to the manual pages
|
a patch to the manual pages explaining the change to the manual pages
|
||||||
maintainer at mtk.manpages@gmail.com.
|
maintainer at mtk.manpages@gmail.com, and CC the list
|
||||||
|
linux-api@vger.kernel.org.
|
||||||
|
|
||||||
Here is a list of files that are in the kernel source tree that are
|
Here is a list of files that are in the kernel source tree that are
|
||||||
required reading:
|
required reading:
|
||||||
@@ -358,7 +359,7 @@ Here is a list of some of the different kernel trees available:
|
|||||||
- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
|
- pcmcia, Dominik Brodowski <linux@dominikbrodowski.net>
|
||||||
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
|
git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git
|
||||||
|
|
||||||
- SCSI, James Bottomley <James.Bottomley@SteelEye.com>
|
- SCSI, James Bottomley <James.Bottomley@hansenpartnership.com>
|
||||||
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
|
git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
|
||||||
|
|
||||||
- x86, Ingo Molnar <mingo@elte.hu>
|
- x86, Ingo Molnar <mingo@elte.hu>
|
||||||
@@ -377,7 +378,7 @@ Bug Reporting
|
|||||||
bugzilla.kernel.org is where the Linux kernel developers track kernel
|
bugzilla.kernel.org is where the Linux kernel developers track kernel
|
||||||
bugs. Users are encouraged to report all bugs that they find in this
|
bugs. Users are encouraged to report all bugs that they find in this
|
||||||
tool. For details on how to use the kernel bugzilla, please see:
|
tool. For details on how to use the kernel bugzilla, please see:
|
||||||
http://test.kernel.org/bugzilla/faq.html
|
http://bugzilla.kernel.org/page.cgi?id=faq.html
|
||||||
|
|
||||||
The file REPORTING-BUGS in the main kernel source directory has a good
|
The file REPORTING-BUGS in the main kernel source directory has a good
|
||||||
template for how to report a possible kernel bug, and details what kind
|
template for how to report a possible kernel bug, and details what kind
|
||||||
|
@@ -1,17 +1,26 @@
|
|||||||
|
ChangeLog:
|
||||||
|
Started by Ingo Molnar <mingo@redhat.com>
|
||||||
|
Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||||
|
|
||||||
SMP IRQ affinity, started by Ingo Molnar <mingo@redhat.com>
|
SMP IRQ affinity
|
||||||
|
|
||||||
|
|
||||||
/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
|
/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
|
||||||
for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
|
for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
|
||||||
to turn off all CPUs, and if an IRQ controller does not support IRQ
|
to turn off all CPUs, and if an IRQ controller does not support IRQ
|
||||||
affinity then the value will not change from the default 0xffffffff.
|
affinity then the value will not change from the default 0xffffffff.
|
||||||
|
|
||||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
/proc/irq/default_smp_affinity specifies the default affinity mask that applies
|
||||||
the IRQ to CPU4-7 (this is an 8-CPU SMP box):
|
to all non-active IRQs. Once an IRQ is allocated/activated, its affinity bitmask
|
||||||
|
will be set to the default mask. It can then be changed as described above.
|
||||||
|
The default mask is 0xffffffff.
|
||||||
|
|
||||||
|
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||||
|
it to CPU4-7 (this is an 8-CPU SMP box):
|
||||||
|
|
||||||
|
[root@moon 44]# cd /proc/irq/44
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
ffffffff
|
ffffffff
|
||||||
|
|
||||||
[root@moon 44]# echo 0f > smp_affinity
|
[root@moon 44]# echo 0f > smp_affinity
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
0000000f
|
0000000f
|
||||||
@@ -21,17 +30,27 @@ PING hell (195.4.7.3): 56 data bytes
|
|||||||
--- hell ping statistics ---
|
--- hell ping statistics ---
|
||||||
6029 packets transmitted, 6027 packets received, 0% packet loss
|
6029 packets transmitted, 6027 packets received, 0% packet loss
|
||||||
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
round-trip min/avg/max = 0.1/0.1/0.4 ms
|
||||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||||
44: 0 1785 1785 1783 1783 1
|
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||||
1 0 IO-APIC-level eth1
|
44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
|
||||||
|
|
||||||
|
As can be seen from the line above, IRQ44 was delivered only to the first four
|
||||||
|
processors (0-3).
|
||||||
|
Now let's restrict that IRQ to CPU(4-7).
|
||||||
|
|
||||||
[root@moon 44]# echo f0 > smp_affinity
|
[root@moon 44]# echo f0 > smp_affinity
|
||||||
|
[root@moon 44]# cat smp_affinity
|
||||||
|
000000f0
|
||||||
[root@moon 44]# ping -f h
|
[root@moon 44]# ping -f h
|
||||||
PING hell (195.4.7.3): 56 data bytes
|
PING hell (195.4.7.3): 56 data bytes
|
||||||
..
|
..
|
||||||
--- hell ping statistics ---
|
--- hell ping statistics ---
|
||||||
2779 packets transmitted, 2777 packets received, 0% packet loss
|
2779 packets transmitted, 2777 packets received, 0% packet loss
|
||||||
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
round-trip min/avg/max = 0.1/0.5/585.4 ms
|
||||||
[root@moon 44]# cat /proc/interrupts | grep 44:
|
[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
|
||||||
44: 1068 1785 1785 1784 1784 1069 1070 1069 IO-APIC-level eth1
|
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
|
||||||
[root@moon 44]#
|
44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
|
||||||
|
|
||||||
|
This time around IRQ44 was delivered only to the last four processors.
|
||||||
|
i.e., the counters for CPU0-3 did not change.
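The hex values written to smp_affinity in the example above (0f for CPU0-3, f0 for CPU4-7) are bitmasks with one bit per CPU. A minimal sketch of how such a mask can be computed, assuming at most 32 CPUs and using the hypothetical helper names cpu_range_mask and format_affinity:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build a mask with bits first..last set (inclusive), where bit N stands
 * for CPU N. Assumes first <= last and last < 32.
 */
uint32_t cpu_range_mask(unsigned int first, unsigned int last)
{
    unsigned int width = last - first + 1;
    /* Shifting a 32-bit value by 32 is undefined, so handle it apart. */
    uint32_t bits = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);

    return bits << first;
}

/* Format a mask as eight hex digits, the form shown by "cat smp_affinity". */
void format_affinity(uint32_t mask, char out[9])
{
    snprintf(out, 9, "%08x", (unsigned int)mask);
}
```

With these helpers, cpu_range_mask(0, 3) yields 0x0f and cpu_range_mask(4, 7) yields 0xf0, matching the two values echoed into smp_affinity above.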
|
||||||
|
|
||||||
|
@@ -48,7 +48,7 @@ IOVA generation is pretty generic. We used the same technique as vmalloc()
|
|||||||
but these are not global address spaces, but separate for each domain.
|
but these are not global address spaces, but separate for each domain.
|
||||||
Different DMA engines may support different number of domains.
|
Different DMA engines may support different number of domains.
|
||||||
|
|
||||||
We also allocate gaurd pages with each mapping, so we can attempt to catch
|
We also allocate guard pages with each mapping, so we can attempt to catch
|
||||||
any overflow that might happen.
|
any overflow that might happen.
|
||||||
|
|
||||||
|
|
||||||
@@ -112,4 +112,4 @@ TBD
|
|||||||
|
|
||||||
- For compatibility testing, could use unity map domain for all devices, just
|
- For compatibility testing, could use unity map domain for all devices, just
|
||||||
provide a 1-1 for all useful memory under a single domain for all devices.
|
provide a 1-1 for all useful memory under a single domain for all devices.
|
||||||
- API for paravirt ops for abstracting functionlity for VMM folks.
|
- API for paravirt ops for abstracting functionality for VMM folks.
|
||||||
|
3
Documentation/Makefile
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
|
||||||
|
filesystems/configfs/ ia64/ networking/ \
|
||||||
|
pcmcia/ spi/ video4linux/ vm/ watchdog/src/
|
@@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed
|
|||||||
not to return until all ongoing NMI handlers exit. It is therefore safe
|
not to return until all ongoing NMI handlers exit. It is therefore safe
|
||||||
to free up the handler's data as soon as synchronize_sched() returns.
|
to free up the handler's data as soon as synchronize_sched() returns.
|
||||||
|
|
||||||
|
Important note: for this to work, the architecture in question must
|
||||||
|
invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
|
||||||
|
|
||||||
|
|
||||||
Answer to Quick Quiz
|
Answer to Quick Quiz
|
||||||
|
|
||||||
|
@@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly
|
|||||||
structured data, such as the matrices used in scientific programs, and
|
structured data, such as the matrices used in scientific programs, and
|
||||||
is thus inapplicable to most data structures in operating-system kernels.
|
is thus inapplicable to most data structures in operating-system kernels.
|
||||||
|
|
||||||
|
In 1992, Henry (now Alexia) Massalin completed a dissertation advising
|
||||||
|
parallel programmers to defer processing when feasible to simplify
|
||||||
|
synchronization. RCU makes extremely heavy use of this advice.
|
||||||
|
|
||||||
In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
|
In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
|
||||||
simplest deferred-free technique: simply waiting a fixed amount of time
|
simplest deferred-free technique: simply waiting a fixed amount of time
|
||||||
before freeing blocks awaiting deferred free. Jacobson did not describe
|
before freeing blocks awaiting deferred free. Jacobson did not describe
|
||||||
@@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c],
|
|||||||
Robert Olsson described an RCU-protected trie-hash combination
|
Robert Olsson described an RCU-protected trie-hash combination
|
||||||
[RobertOlsson2006a].
|
[RobertOlsson2006a].
|
||||||
|
|
||||||
|
2007 saw the journal version of the award-winning RCU paper from 2006
|
||||||
|
[ThomasEHart2007a], as well as a paper demonstrating use of Promela
|
||||||
|
and Spin to mechanically verify an optimization to Oleg Nesterov's
|
||||||
|
QRCU [PaulEMcKenney2007QRCUspin], a design document describing
|
||||||
|
preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
|
||||||
|
LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
|
||||||
|
PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
|
||||||
|
|
||||||
Bibtex Entries
|
Bibtex Entries
|
||||||
|
|
||||||
@@ -202,6 +213,20 @@ Bibtex Entries
|
|||||||
,Year="1991"
|
,Year="1991"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@phdthesis{HMassalinPhD
|
||||||
|
,author="H. Massalin"
|
||||||
|
,title="Synthesis: An Efficient Implementation of Fundamental Operating
|
||||||
|
System Services"
|
||||||
|
,school="Columbia University"
|
||||||
|
,address="New York, NY"
|
||||||
|
,year="1992"
|
||||||
|
,annotation="
|
||||||
|
Mondo optimizing compiler.
|
||||||
|
Wait-free stuff.
|
||||||
|
Good advice: defer work to avoid synchronization.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
@unpublished{Jacobson93
|
@unpublished{Jacobson93
|
||||||
,author="Van Jacobson"
|
,author="Van Jacobson"
|
||||||
,title="Avoid Read-Side Locking Via Delayed Free"
|
,title="Avoid Read-Side Locking Via Delayed Free"
|
||||||
@@ -635,3 +660,86 @@ Revised:
|
|||||||
"
|
"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2007PreemptibleRCU
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="The design of preemptible read-copy-update"
|
||||||
|
,month="October"
|
||||||
|
,day="8"
|
||||||
|
,year="2007"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/253651/}
|
||||||
|
[Viewed October 25, 2007]"
|
||||||
|
,annotation="
|
||||||
|
LWN article describing the design of preemptible RCU.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
########################################################################
|
||||||
|
#
|
||||||
|
# "What is RCU?" LWN series.
|
||||||
|
#
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally
|
||||||
|
,Author="Paul E. McKenney and Jonathan Walpole"
|
||||||
|
,Title="What is {RCU}, Fundamentally?"
|
||||||
|
,month="December"
|
||||||
|
,day="17"
|
||||||
|
,year="2007"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/262464/}
|
||||||
|
[Viewed December 27, 2007]"
|
||||||
|
,annotation="
|
||||||
|
Lays out the three basic components of RCU: (1) publish-subscribe,
|
||||||
|
(2) wait for pre-existing readers to complete, and (3) maintain
|
||||||
|
multiple versions.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2008WhatIsRCUUsage
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="What is {RCU}? Part 2: Usage"
|
||||||
|
,month="January"
|
||||||
|
,day="4"
|
||||||
|
,year="2008"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/263130/}
|
||||||
|
[Viewed January 4, 2008]"
|
||||||
|
,annotation="
|
||||||
|
Lays out six uses of RCU:
|
||||||
|
1. RCU is a Reader-Writer Lock Replacement
|
||||||
|
2. RCU is a Restricted Reference-Counting Mechanism
|
||||||
|
3. RCU is a Bulk Reference-Counting Mechanism
|
||||||
|
4. RCU is a Poor Man's Garbage Collector
|
||||||
|
5. RCU is a Way of Providing Existence Guarantees
|
||||||
|
6. RCU is a Way of Waiting for Things to Finish
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@unpublished{PaulEMcKenney2008WhatIsRCUAPI
|
||||||
|
,Author="Paul E. McKenney"
|
||||||
|
,Title="{RCU} part 3: the {RCU} {API}"
|
||||||
|
,month="January"
|
||||||
|
,day="17"
|
||||||
|
,year="2008"
|
||||||
|
,note="Available:
|
||||||
|
\url{http://lwn.net/Articles/264090/}
|
||||||
|
[Viewed January 10, 2008]"
|
||||||
|
,annotation="
|
||||||
|
Gives an overview of the Linux-kernel RCU API and a brief annotated RCU
|
||||||
|
bibliography.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{DinakarGuniguntala2008IBMSysJ
|
||||||
|
,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
|
||||||
|
,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
|
||||||
|
,Year="2008"
|
||||||
|
,Month="April"
|
||||||
|
,journal="IBM Systems Journal"
|
||||||
|
,volume="47"
|
||||||
|
,number="2"
|
||||||
|
,pages="@@-@@"
|
||||||
|
,annotation="
|
||||||
|
RCU, realtime RCU, sleepable RCU, performance.
|
||||||
|
"
|
||||||
|
}
|
||||||
|
@@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
detailed performance measurements show that RCU is nonetheless
|
detailed performance measurements show that RCU is nonetheless
|
||||||
the right tool for the job.
|
the right tool for the job.
|
||||||
|
|
||||||
The other exception would be where performance is not an issue,
|
Another exception is where performance is not an issue, and RCU
|
||||||
and RCU provides a simpler implementation. An example of this
|
provides a simpler implementation. An example of this situation
|
||||||
situation is the dynamic NMI code in the Linux 2.6 kernel,
|
is the dynamic NMI code in the Linux 2.6 kernel, at least on
|
||||||
at least on architectures where NMIs are rare.
|
architectures where NMIs are rare.
|
||||||
|
|
||||||
|
Yet another exception is where the low real-time latency of RCU's
|
||||||
|
read-side primitives is critically important.
|
||||||
|
|
||||||
1. Does the update code have proper mutual exclusion?
|
1. Does the update code have proper mutual exclusion?
|
||||||
|
|
||||||
@@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
|
|
||||||
2. Do the RCU read-side critical sections make proper use of
|
2. Do the RCU read-side critical sections make proper use of
|
||||||
rcu_read_lock() and friends? These primitives are needed
|
rcu_read_lock() and friends? These primitives are needed
|
||||||
to suppress preemption (or bottom halves, in the case of
|
to prevent grace periods from ending prematurely, which
|
||||||
rcu_read_lock_bh()) in the read-side critical sections,
|
could result in data being unceremoniously freed out from
|
||||||
and are also an excellent aid to readability.
|
under your read-side code, which can greatly increase the
|
||||||
|
actuarial risk of your kernel.
|
||||||
|
|
||||||
As a rough rule of thumb, any dereference of an RCU-protected
|
As a rough rule of thumb, any dereference of an RCU-protected
|
||||||
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
|
||||||
@@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
be running while updates are in progress. There are a number
|
be running while updates are in progress. There are a number
|
||||||
of ways to handle this concurrency, depending on the situation:
|
of ways to handle this concurrency, depending on the situation:
|
||||||
|
|
||||||
a. Make updates appear atomic to readers. For example,
|
a. Use the RCU variants of the list and hlist update
|
||||||
|
primitives to add, remove, and replace elements on an
|
||||||
|
RCU-protected list. Alternatively, use the RCU-protected
|
||||||
|
trees that have been added to the Linux kernel.
|
||||||
|
|
||||||
|
This is almost always the best approach.
|
||||||
|
|
||||||
|
b. Proceed as in (a) above, but also maintain per-element
|
||||||
|
locks (that are acquired by both readers and writers)
|
||||||
|
that guard per-element state. Of course, fields that
|
||||||
|
the readers refrain from accessing can be guarded by the
|
||||||
|
update-side lock.
|
||||||
|
|
||||||
|
This works quite well, also.
|
||||||
|
|
||||||
|
c. Make updates appear atomic to readers. For example,
|
||||||
pointer updates to properly aligned fields will appear
|
pointer updates to properly aligned fields will appear
|
||||||
atomic, as will individual atomic primitives. Operations
|
atomic, as will individual atomic primitives. Operations
|
||||||
performed under a lock and sequences of multiple atomic
|
performed under a lock and sequences of multiple atomic
|
||||||
primitives will -not- appear to be atomic.
|
primitives will -not- appear to be atomic.
|
||||||
|
|
||||||
This is almost always the best approach.
|
This can work, but is starting to get a bit tricky.
|
||||||
|
|
||||||
b. Carefully order the updates and the reads so that
|
d. Carefully order the updates and the reads so that
|
||||||
readers see valid data at all phases of the update.
|
readers see valid data at all phases of the update.
|
||||||
This is often more difficult than it sounds, especially
|
This is often more difficult than it sounds, especially
|
||||||
given modern CPUs' tendency to reorder memory references.
|
given modern CPUs' tendency to reorder memory references.
|
||||||
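Option (c) in the checklist hunk above relies on a single aligned pointer store appearing atomic to readers. The following is a minimal userspace sketch of that idea, assuming C11 atomics as a stand-in for the kernel's rcu_assign_pointer() and rcu_dereference(). The names struct config, publish_config() and read_config_sum() are hypothetical, and safe reclamation of the old structure (the job of a grace period) is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical structure that readers reach through one shared pointer. */
struct config {
	int a;
	int b;
};

/* Shared pointer followed by readers; NULL until the first publication. */
static _Atomic(struct config *) cur_config;

/*
 * Updater: the caller must fully initialize *newc first.  The release
 * half of the exchange orders that initialization before the pointer
 * becomes visible, which is the role rcu_assign_pointer() plays in the
 * kernel.  Returns the previous pointer; freeing it safely would need
 * a grace period, which this sketch does not implement.
 */
static struct config *publish_config(struct config *newc)
{
	return atomic_exchange_explicit(&cur_config, newc,
					memory_order_acq_rel);
}

/*
 * Reader: a single acquire load fetches the pointer, so the reader
 * works on either the old or the new structure as a whole, never on a
 * mixture of the two.  Returns -1 if nothing has been published yet.
 */
static int read_config_sum(void)
{
	struct config *c = atomic_load_explicit(&cur_config,
						memory_order_acquire);

	return c ? c->a + c->b : -1;
}
```

Readers never see a half-updated structure because the updater builds a complete new copy and switches one pointer, rather than modifying fields in place.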
@@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
when publicizing a pointer to a structure that can
|
when publicizing a pointer to a structure that can
|
||||||
be traversed by an RCU read-side critical section.
|
be traversed by an RCU read-side critical section.
|
||||||
|
|
||||||
5. If call_rcu(), or a related primitive such as call_rcu_bh(),
|
5. If call_rcu(), or a related primitive such as call_rcu_bh() or
|
||||||
is used, the callback function must be written to be called
|
call_rcu_sched(), is used, the callback function must be
|
||||||
from softirq context. In particular, it cannot block.
|
written to be called from softirq context. In particular,
|
||||||
|
it cannot block.
|
||||||
|
|
||||||
6. Since synchronize_rcu() can block, it cannot be called from
|
6. Since synchronize_rcu() can block, it cannot be called from
|
||||||
any sort of irq context.
|
any sort of irq context. Ditto for synchronize_sched() and
|
||||||
|
synchronize_srcu().
|
||||||
|
|
||||||
7. If the updater uses call_rcu(), then the corresponding readers
|
7. If the updater uses call_rcu(), then the corresponding readers
|
||||||
must use rcu_read_lock() and rcu_read_unlock(). If the updater
|
must use rcu_read_lock() and rcu_read_unlock(). If the updater
|
||||||
uses call_rcu_bh(), then the corresponding readers must use
|
uses call_rcu_bh(), then the corresponding readers must use
|
||||||
rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up
|
rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
|
||||||
will result in confusion and broken kernels.
|
uses call_rcu_sched(), then the corresponding readers must
|
||||||
|
disable preemption. Mixing things up will result in confusion
|
||||||
|
and broken kernels.
|
||||||
|
|
||||||
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
|
One exception to this rule: rcu_read_lock() and rcu_read_unlock()
|
||||||
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
|
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
|
||||||
@@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
such cases is a must, of course! And the jury is still out on
|
such cases is a must, of course! And the jury is still out on
|
||||||
whether the increased speed is worth it.
|
whether the increased speed is worth it.
|
||||||
|
|
||||||
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
|
8. Although synchronize_rcu() is slower than is call_rcu(), it
|
||||||
it usually results in simpler code. So, unless update
|
usually results in simpler code. So, unless update performance
|
||||||
performance is critically important or the updaters cannot block,
|
is critically important or the updaters cannot block,
|
||||||
synchronize_rcu() should be used in preference to call_rcu().
|
synchronize_rcu() should be used in preference to call_rcu().
|
||||||
|
|
||||||
An especially important property of the synchronize_rcu()
|
An especially important property of the synchronize_rcu()
|
||||||
@@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
number of updates per grace period.
|
number of updates per grace period.
|
||||||
|
|
||||||
9. All RCU list-traversal primitives, which include
|
9. All RCU list-traversal primitives, which include
|
||||||
list_for_each_rcu(), list_for_each_entry_rcu(),
|
rcu_dereference(), list_for_each_entry_rcu(),
|
||||||
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
|
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
|
||||||
must be within an RCU read-side critical section. RCU
|
must be either within an RCU read-side critical section or
|
||||||
|
must be protected by appropriate update-side locks. RCU
|
||||||
read-side critical sections are delimited by rcu_read_lock()
|
read-side critical sections are delimited by rcu_read_lock()
|
||||||
and rcu_read_unlock(), or by similar primitives such as
|
and rcu_read_unlock(), or by similar primitives such as
|
||||||
rcu_read_lock_bh() and rcu_read_unlock_bh().
|
rcu_read_lock_bh() and rcu_read_unlock_bh().
|
||||||
|
|
||||||
Use of the _rcu() list-traversal primitives outside of an
|
The reason that it is permissible to use RCU list-traversal
|
||||||
RCU read-side critical section causes no harm other than
|
primitives when the update-side lock is held is that doing so
|
||||||
a slight performance degradation on Alpha CPUs. It can
|
can be quite helpful in reducing code bloat when common code is
|
||||||
also be quite helpful in reducing code bloat when common
|
shared between readers and updaters.
|
||||||
code is shared between readers and updaters.
|
|
||||||
|
|
||||||
10. Conversely, if you are in an RCU read-side critical section,
|
10. Conversely, if you are in an RCU read-side critical section,
|
||||||
you -must- use the "_rcu()" variants of the list macros.
|
and you don't hold the appropriate update-side lock, you -must-
|
||||||
Failing to do so will break Alpha and confuse people reading
|
use the "_rcu()" variants of the list macros. Failing to do so
|
||||||
your code.
|
will break Alpha and confuse people reading your code.
|
||||||
|
|
||||||
11. Note that synchronize_rcu() -only- guarantees to wait until
|
11. Note that synchronize_rcu() -only- guarantees to wait until
|
||||||
all currently executing rcu_read_lock()-protected RCU read-side
|
all currently executing rcu_read_lock()-protected RCU read-side
|
||||||
@@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome!
|
|||||||
must use whatever locking or other synchronization is required
|
must use whatever locking or other synchronization is required
|
||||||
to safely access and/or modify that data structure.
|
to safely access and/or modify that data structure.
|
||||||
|
|
||||||
|
RCU callbacks are -usually- executed on the same CPU that executed
|
||||||
|
the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(),
|
||||||
|
but are by -no- means guaranteed to be. For example, if a given
|
||||||
|
CPU goes offline while having an RCU callback pending, then that
|
||||||
|
RCU callback will execute on some surviving CPU. (If this was
|
||||||
|
not the case, a self-spawning RCU callback would prevent the
|
||||||
|
victim CPU from ever going offline.)
|
||||||
|
|
||||||
14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
|
14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
|
||||||
may only be invoked from process context. Unlike other forms of
|
may only be invoked from process context. Unlike other forms of
|
||||||
RCU, it -is- permissible to block in an SRCU read-side critical
|
RCU, it -is- permissible to block in an SRCU read-side critical
|
||||||
|
@@ -29,9 +29,9 @@ release_referenced() delete()
|
|||||||
}
|
}
|
||||||
|
|
||||||
If this list/array is made lock free using RCU as in changing the
|
If this list/array is made lock free using RCU as in changing the
|
||||||
write_lock() in add() and delete() to spin_lock and changing read_lock
|
write_lock() in add() and delete() to spin_lock() and changing read_lock()
|
||||||
in search_and_reference to rcu_read_lock(), the atomic_get in
|
in search_and_reference() to rcu_read_lock(), the atomic_inc() in
|
||||||
search_and_reference could potentially hold reference to an element which
|
search_and_reference() could potentially hold reference to an element which
|
||||||
has already been deleted from the list/array. Use atomic_inc_not_zero()
|
has already been deleted from the list/array. Use atomic_inc_not_zero()
|
||||||
in this scenario as follows:
|
in this scenario as follows:
|
||||||
|
|
||||||
@@ -40,20 +40,20 @@ add() search_and_reference()
|
|||||||
{ {
|
{ {
|
||||||
alloc_object rcu_read_lock();
|
alloc_object rcu_read_lock();
|
||||||
... search_for_element
|
... search_for_element
|
||||||
atomic_set(&el->rc, 1); if (atomic_inc_not_zero(&el->rc)) {
|
atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
|
||||||
write_lock(&list_lock); rcu_read_unlock();
|
spin_lock(&list_lock); rcu_read_unlock();
|
||||||
return FAIL;
|
return FAIL;
|
||||||
add_element }
|
add_element }
|
||||||
... ...
|
... ...
|
||||||
write_unlock(&list_lock); rcu_read_unlock();
|
spin_unlock(&list_lock); rcu_read_unlock();
|
||||||
} }
|
} }
|
||||||
3. 4.
|
3. 4.
|
||||||
release_referenced() delete()
|
release_referenced() delete()
|
||||||
{ {
|
{ {
|
||||||
... write_lock(&list_lock);
|
... spin_lock(&list_lock);
|
||||||
if (atomic_dec_and_test(&el->rc)) ...
|
if (atomic_dec_and_test(&el->rc)) ...
|
||||||
call_rcu(&el->head, el_free); delete_element
|
call_rcu(&el->head, el_free); delete_element
|
||||||
... write_unlock(&list_lock);
|
... spin_unlock(&list_lock);
|
||||||
} ...
|
} ...
|
||||||
if (atomic_dec_and_test(&el->rc))
|
if (atomic_dec_and_test(&el->rc))
|
||||||
call_rcu(&el->head, el_free);
|
call_rcu(&el->head, el_free);
|
||||||
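The hunk above switches search_and_reference() to atomic_inc_not_zero() because a lock-free reader can find an element whose count has already dropped to zero. Below is a rough userspace sketch of that contract, assuming C11 atomics in place of the kernel's atomic_t. The names struct el, el_get() and el_put() are hypothetical, and this is not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical element carrying a reference count, as in the text above. */
struct el {
	atomic_int rc;
};

/*
 * Take a reference only if the count is still nonzero, mirroring the
 * contract of atomic_inc_not_zero().  Returns false when the element
 * has already lost its last reference and must not be used.
 */
static bool el_get(struct el *e)
{
	int old = atomic_load(&e->rc);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&e->rc, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop a reference.  Returns true when this was the last one, which is
 * the point where the code above hands the element to call_rcu().
 */
static bool el_put(struct el *e)
{
	return atomic_fetch_sub(&e->rc, 1) == 1;
}
```

A plain increment would raise a zero count back to one and hand out a reference to an element that is already queued for freeing. The compare-and-swap loop refuses to do that.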
|
@@ -10,23 +10,30 @@ status messages via printk(), which can be examined via the dmesg
|
|||||||
command (perhaps grepping for "torture"). The test is started
|
command (perhaps grepping for "torture"). The test is started
|
||||||
when the module is loaded, and stops when the module is unloaded.
|
when the module is loaded, and stops when the module is unloaded.
|
||||||
|
|
||||||
However, actually setting this config option to "y" results in the system
|
CONFIG_RCU_TORTURE_TEST_RUNNABLE
|
||||||
running the test immediately upon boot, and ending only when the system
|
|
||||||
is taken down. Normally, one will instead want to build the system
|
It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
|
||||||
with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control
|
result in the tests being loaded into the base kernel. In this case,
|
||||||
the test, perhaps using a script similar to the one shown at the end of
|
the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
|
||||||
this document. Note that you will need CONFIG_MODULE_UNLOAD in order
|
whether the RCU torture tests are to be started immediately during
|
||||||
to be able to end the test.
|
boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
|
||||||
|
to enable them. This /proc file can be used to repeatedly pause and
|
||||||
|
restart the tests, regardless of the initial state specified by the
|
||||||
|
CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
|
||||||
|
|
||||||
|
You will normally -not- want to start the RCU torture tests during boot
|
||||||
|
(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
|
||||||
|
this can sometimes be useful in finding boot-time bugs.
|
||||||
|
|
||||||
|
|
||||||
MODULE PARAMETERS
|
MODULE PARAMETERS
|
||||||
|
|
||||||
This module has the following parameters:
|
This module has the following parameters:
|
||||||
|
|
||||||
nreaders This is the number of RCU reading threads supported.
|
irqreaders Says to invoke RCU readers from irq level. This is currently
|
||||||
The default is twice the number of CPUs. Why twice?
|
done via timers. Defaults to "1" for variants of RCU that
|
||||||
To properly exercise RCU implementations with preemptible
|
permit this. (Or, more accurately, variants of RCU that do
|
||||||
read-side critical sections.
|
-not- permit this know to ignore this variable.)
|
||||||
|
|
||||||
nfakewriters This is the number of RCU fake writer threads to run. Fake
|
nfakewriters This is the number of RCU fake writer threads to run. Fake
|
||||||
writer threads repeatedly use the synchronous "wait for
|
writer threads repeatedly use the synchronous "wait for
|
||||||
@@ -37,6 +44,16 @@ nfakewriters This is the number of RCU fake writer threads to run. Fake
|
|||||||
to trigger special cases caused by multiple writers, such as
|
to trigger special cases caused by multiple writers, such as
|
||||||
the synchronize_srcu() early return optimization.
|
the synchronize_srcu() early return optimization.
|
||||||
|
|
||||||
|
nreaders This is the number of RCU reading threads supported.
|
||||||
|
The default is twice the number of CPUs. Why twice?
|
||||||
|
To properly exercise RCU implementations with preemptible
|
||||||
|
read-side critical sections.
|
||||||
|
|
||||||
|
shuffle_interval
|
||||||
|
The number of seconds to keep the test threads affinitied
|
||||||
|
to a particular subset of the CPUs, defaults to 3 seconds.
|
||||||
|
Used in conjunction with test_no_idle_hz.
|
||||||
|
|
||||||
stat_interval The number of seconds between output of torture
|
stat_interval The number of seconds between output of torture
|
||||||
statistics (via printk()). Regardless of the interval,
|
statistics (via printk()). Regardless of the interval,
|
||||||
statistics are printed when the module is unloaded.
|
statistics are printed when the module is unloaded.
|
||||||
@@ -44,10 +61,11 @@ stat_interval The number of seconds between output of torture
|
|||||||
be printed -only- when the module is unloaded, and this
|
be printed -only- when the module is unloaded, and this
|
||||||
is the default.
|
is the default.
|
||||||
|
|
||||||
shuffle_interval
|
stutter The length of time to run the test before pausing for this
|
||||||
The number of seconds to keep the test threads affinitied
|
same period of time. Defaults to "stutter=5", so as
|
||||||
to a particular subset of the CPUs, defaults to 5 seconds.
|
to run and pause for (roughly) five-second intervals.
|
||||||
Used in conjunction with test_no_idle_hz.
|
Specifying "stutter=0" causes the test to run continuously
|
||||||
|
without pausing, which is the old default behavior.
|
||||||
|
|
||||||
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
||||||
a kernel that disables the scheduling-clock interrupt to
|
a kernel that disables the scheduling-clock interrupt to
|
||||||
|
@@ -1,3 +1,11 @@
|
|||||||
|
Please note that the "What is RCU?" LWN series is an excellent place
|
||||||
|
to start learning about RCU:
|
||||||
|
|
||||||
|
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
|
||||||
|
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
|
||||||
|
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
|
||||||
|
|
||||||
|
|
||||||
What is RCU?
|
What is RCU?
|
||||||
|
|
||||||
RCU is a synchronization mechanism that was added to the Linux kernel
|
RCU is a synchronization mechanism that was added to the Linux kernel
|
||||||
@@ -772,26 +780,16 @@ Linux-kernel source code, but it helps to have a full list of the
|
|||||||
APIs, since there does not appear to be a way to categorize them
|
APIs, since there does not appear to be a way to categorize them
|
||||||
in docbook. Here is the list, by category.
|
in docbook. Here is the list, by category.
|
||||||
|
|
||||||
Markers for RCU read-side critical sections:
|
|
||||||
|
|
||||||
rcu_read_lock
|
|
||||||
rcu_read_unlock
|
|
||||||
rcu_read_lock_bh
|
|
||||||
rcu_read_unlock_bh
|
|
||||||
srcu_read_lock
|
|
||||||
srcu_read_unlock
|
|
||||||
|
|
||||||
RCU pointer/list traversal:
|
RCU pointer/list traversal:
|
||||||
|
|
||||||
rcu_dereference
|
rcu_dereference
|
||||||
list_for_each_rcu (to be deprecated in favor of
|
|
||||||
list_for_each_entry_rcu)
|
|
||||||
list_for_each_entry_rcu
|
list_for_each_entry_rcu
|
||||||
list_for_each_continue_rcu (to be deprecated in favor of new
|
|
||||||
list_for_each_entry_continue_rcu)
|
|
||||||
hlist_for_each_entry_rcu
|
hlist_for_each_entry_rcu
|
||||||
|
|
||||||
RCU pointer update:
|
list_for_each_continue_rcu (to be deprecated in favor of new
|
||||||
|
list_for_each_entry_continue_rcu)
|
||||||
|
|
||||||
|
RCU pointer/list update:
|
||||||
|
|
||||||
rcu_assign_pointer
|
rcu_assign_pointer
|
||||||
list_add_rcu
|
list_add_rcu
|
||||||
@@ -799,16 +797,36 @@ RCU pointer update:
|
|||||||
list_del_rcu
|
list_del_rcu
|
||||||
list_replace_rcu
|
list_replace_rcu
|
||||||
hlist_del_rcu
|
hlist_del_rcu
|
||||||
|
hlist_add_after_rcu
|
||||||
|
hlist_add_before_rcu
|
||||||
hlist_add_head_rcu
|
hlist_add_head_rcu
|
||||||
|
hlist_replace_rcu
|
||||||
|
list_splice_init_rcu()
|
||||||
|
|
||||||
RCU grace period:
|
RCU: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
rcu_read_lock synchronize_net rcu_barrier
|
||||||
|
rcu_read_unlock synchronize_rcu
|
||||||
|
call_rcu
|
||||||
|
|
||||||
|
|
||||||
|
bh: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
|
||||||
|
rcu_read_unlock_bh
|
||||||
|
|
||||||
|
|
||||||
|
sched: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
[preempt_disable] synchronize_sched rcu_barrier_sched
|
||||||
|
[and friends] call_rcu_sched
|
||||||
|
|
||||||
|
|
||||||
|
SRCU: Critical sections Grace period Barrier
|
||||||
|
|
||||||
|
srcu_read_lock synchronize_srcu N/A
|
||||||
|
srcu_read_unlock
|
||||||
|
|
||||||
synchronize_net
|
|
||||||
synchronize_sched
|
|
||||||
synchronize_rcu
|
|
||||||
synchronize_srcu
|
|
||||||
call_rcu
|
|
||||||
call_rcu_bh
|
|
||||||
|
|
||||||
See the comment headers in the source code (or the docbook generated
|
See the comment headers in the source code (or the docbook generated
|
||||||
from them) for more information.
|
from them) for more information.
|
||||||
|
27
Documentation/SELinux.txt
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
If you want to use SELinux, chances are you will want
|
||||||
|
to use the distro-provided policies, or install the
|
||||||
|
latest reference policy release from
|
||||||
|
http://oss.tresys.com/projects/refpolicy
|
||||||
|
|
||||||
|
However, if you want to install a dummy policy for
|
||||||
|
testing, you can do so using 'mdp' provided under
|
||||||
|
scripts/selinux. Note that this requires the selinux
|
||||||
|
userspace to be installed - in particular you will
|
||||||
|
need checkpolicy to compile a kernel, and setfiles and
|
||||||
|
fixfiles to label the filesystem.
|
||||||
|
|
||||||
|
1. Compile the kernel with selinux enabled.
|
||||||
|
2. Type 'make' to compile mdp.
|
||||||
|
3. Make sure that you are not running with
|
||||||
|
SELinux enabled and a real policy. If
|
||||||
|
you are, reboot with selinux disabled
|
||||||
|
before continuing.
|
||||||
|
4. Run install_policy.sh:
|
||||||
|
cd scripts/selinux
|
||||||
|
sh install_policy.sh
|
||||||
|
|
||||||
|
Step 4 will create a new dummy policy valid for your
|
||||||
|
kernel, with a single selinux user, role, and type.
|
||||||
|
It will compile the policy, will set your SELINUXTYPE to
|
||||||
|
dummy in /etc/selinux/config, install the compiled policy
|
||||||
|
as 'dummy', and relabel your filesystem.
|
@@ -67,6 +67,8 @@ kernel patches.
|
|||||||
|
|
||||||
19: All new userspace interfaces are documented in Documentation/ABI/.
|
19: All new userspace interfaces are documented in Documentation/ABI/.
|
||||||
See Documentation/ABI/README for more information.
|
See Documentation/ABI/README for more information.
|
||||||
|
Patches that change userspace interfaces should be CCed to
|
||||||
|
linux-api@vger.kernel.org.
|
||||||
|
|
||||||
20: Check that it all passes `make headers_check'.
|
20: Check that it all passes `make headers_check'.
|
||||||
|
|
||||||
|
@@ -528,7 +528,33 @@ See more details on the proper patch format in the following
|
|||||||
references.
|
references.
|
||||||
|
|
||||||
|
|
||||||
|
16) Sending "git pull" requests (from Linus emails)
|
||||||
|
|
||||||
|
Please write the git repo address and branch name alone on the same line
|
||||||
|
so that I can't even by mistake pull from the wrong branch, and so
|
||||||
|
that a triple-click just selects the whole thing.
|
||||||
|
|
||||||
|
So the proper format is something along the lines of:
|
||||||
|
|
||||||
|
"Please pull from
|
||||||
|
|
||||||
|
git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
|
||||||
|
|
||||||
|
to get these changes:"
|
||||||
|
|
||||||
|
so that I don't have to hunt-and-peck for the address and inevitably
|
||||||
|
get it wrong (actually, I've only gotten it wrong a few times, and
|
||||||
|
checking against the diffstat tells me when I get it wrong, but I'm
|
||||||
|
just a lot more comfortable when I don't have to "look for" the right
|
||||||
|
thing to pull, and double-check that I have the right branch-name).
|
||||||
|
|
||||||
|
|
||||||
|
Please use "git diff -M --stat --summary" to generate the diffstat:
|
||||||
|
the -M enables rename detection, and the summary enables a summary of
|
||||||
|
new/deleted or renamed files.
|
||||||
|
|
||||||
|
With rename detection, the statistics are rather different [...]
|
||||||
|
because git will notice that a fair number of the changes are renames.
|
||||||
|
|
||||||
-----------------------------------
|
-----------------------------------
|
||||||
SECTION 2 - HINTS, TIPS, AND TRICKS
|
SECTION 2 - HINTS, TIPS, AND TRICKS
|
||||||
|
10
Documentation/accounting/Makefile
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
# kbuild trick to avoid linker error. Can be omitted if a module is built.
|
||||||
|
obj- := dummy.o
|
||||||
|
|
||||||
|
# List of programs to build
|
||||||
|
hostprogs-y := getdelays
|
||||||
|
|
||||||
|
# Tell kbuild to always build the programs
|
||||||
|
always := $(hostprogs-y)
|
||||||
|
|
||||||
|
HOSTCFLAGS_getdelays.o += -I$(objtree)/usr/include
|
@@ -11,6 +11,7 @@ the delays experienced by a task while
|
|||||||
a) waiting for a CPU (while being runnable)
|
a) waiting for a CPU (while being runnable)
|
||||||
b) completion of synchronous block I/O initiated by the task
|
b) completion of synchronous block I/O initiated by the task
|
||||||
c) swapping in pages
|
c) swapping in pages
|
||||||
|
d) memory reclaim
|
||||||
|
|
||||||
and makes these statistics available to userspace through
|
and makes these statistics available to userspace through
|
||||||
the taskstats interface.
|
the taskstats interface.
|
||||||
@@ -41,7 +42,7 @@ this structure. See
|
|||||||
include/linux/taskstats.h
|
include/linux/taskstats.h
|
||||||
for a description of the fields pertaining to delay accounting.
|
for a description of the fields pertaining to delay accounting.
|
||||||
It will generally be in the form of counters returning the cumulative
|
It will generally be in the form of counters returning the cumulative
|
||||||
delay seen for cpu, sync block I/O, swapin etc.
|
delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
|
||||||
|
|
||||||
Taking the difference of two successive readings of a given
|
Taking the difference of two successive readings of a given
|
||||||
counter (say cpu_delay_total) for a task will give the delay
|
counter (say cpu_delay_total) for a task will give the delay
|
||||||
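The counters described above are cumulative, so a monitoring tool works with the difference between two readings. Here is a small sketch of that arithmetic, assuming a hypothetical struct delay_sample that holds copies of two taskstats fields. The helper names are illustrative only and are not part of getdelays.c.

```c
#include <stdint.h>

/* Hypothetical snapshot of two cumulative taskstats counters. */
struct delay_sample {
	uint64_t cpu_count;       /* number of delay events so far */
	uint64_t cpu_delay_total; /* cumulative delay so far */
};

/*
 * Delay accumulated between two successive readings.  Unsigned
 * subtraction stays correct across a single counter wrap-around.
 */
static uint64_t delay_in_interval(const struct delay_sample *prev,
				  const struct delay_sample *cur)
{
	return cur->cpu_delay_total - prev->cpu_delay_total;
}

/* Average delay per event in the interval, or 0 if no event occurred. */
static uint64_t avg_delay_in_interval(const struct delay_sample *prev,
				      const struct delay_sample *cur)
{
	uint64_t events = cur->cpu_count - prev->cpu_count;

	return events ? delay_in_interval(prev, cur) / events : 0;
}
```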
@@ -94,7 +95,9 @@ CPU count real total virtual total delay total
|
|||||||
7876 92005750 100000000 24001500
|
7876 92005750 100000000 24001500
|
||||||
IO count delay total
|
IO count delay total
|
||||||
0 0
|
0 0
|
||||||
MEM count delay total
|
SWAP count delay total
|
||||||
|
0 0
|
||||||
|
RECLAIM count delay total
|
||||||
0 0
|
0 0
|
||||||
|
|
||||||
Get delays seen in executing a given simple command
|
Get delays seen in executing a given simple command
|
||||||
@@ -108,5 +111,7 @@ CPU count real total virtual total delay total
|
|||||||
6 4000250 4000000 0
|
6 4000250 4000000 0
|
||||||
IO count delay total
|
IO count delay total
|
||||||
0 0
|
0 0
|
||||||
MEM count delay total
|
SWAP count delay total
|
||||||
|
0 0
|
||||||
|
RECLAIM count delay total
|
||||||
0 0
|
0 0
|
||||||
|
@@ -196,14 +196,24 @@ void print_delayacct(struct taskstats *t)
|
|||||||
" %15llu%15llu%15llu%15llu\n"
|
" %15llu%15llu%15llu%15llu\n"
|
||||||
"IO %15s%15s\n"
|
"IO %15s%15s\n"
|
||||||
" %15llu%15llu\n"
|
" %15llu%15llu\n"
|
||||||
"MEM %15s%15s\n"
|
"SWAP %15s%15s\n"
|
||||||
|
" %15llu%15llu\n"
|
||||||
|
"RECLAIM %12s%15s\n"
|
||||||
" %15llu%15llu\n",
|
" %15llu%15llu\n",
|
||||||
"count", "real total", "virtual total", "delay total",
|
"count", "real total", "virtual total", "delay total",
|
||||||
t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
|
(unsigned long long)t->cpu_count,
|
||||||
t->cpu_delay_total,
|
(unsigned long long)t->cpu_run_real_total,
|
||||||
|
(unsigned long long)t->cpu_run_virtual_total,
|
||||||
|
(unsigned long long)t->cpu_delay_total,
|
||||||
"count", "delay total",
|
"count", "delay total",
|
||||||
t->blkio_count, t->blkio_delay_total,
|
(unsigned long long)t->blkio_count,
|
||||||
"count", "delay total", t->swapin_count, t->swapin_delay_total);
|
(unsigned long long)t->blkio_delay_total,
|
||||||
|
"count", "delay total",
|
||||||
|
(unsigned long long)t->swapin_count,
|
||||||
|
(unsigned long long)t->swapin_delay_total,
|
||||||
|
"count", "delay total",
|
||||||
|
(unsigned long long)t->freepages_count,
|
||||||
|
(unsigned long long)t->freepages_delay_total);
|
||||||
}
|
}
|
||||||
|
|
||||||
void task_context_switch_counts(struct taskstats *t)
|
void task_context_switch_counts(struct taskstats *t)
|
||||||
@@ -211,14 +221,17 @@ void task_context_switch_counts(struct taskstats *t)
|
|||||||
printf("\n\nTask %15s%15s\n"
|
printf("\n\nTask %15s%15s\n"
|
||||||
" %15llu%15llu\n",
|
" %15llu%15llu\n",
|
||||||
"voluntary", "nonvoluntary",
|
"voluntary", "nonvoluntary",
|
||||||
t->nvcsw, t->nivcsw);
|
(unsigned long long)t->nvcsw, (unsigned long long)t->nivcsw);
|
||||||
}
|
}
|
||||||
|
|
||||||
void print_cgroupstats(struct cgroupstats *c)
|
void print_cgroupstats(struct cgroupstats *c)
|
||||||
{
|
{
|
||||||
printf("sleeping %llu, blocked %llu, running %llu, stopped %llu, "
|
printf("sleeping %llu, blocked %llu, running %llu, stopped %llu, "
|
||||||
"uninterruptible %llu\n", c->nr_sleeping, c->nr_io_wait,
|
"uninterruptible %llu\n", (unsigned long long)c->nr_sleeping,
|
||||||
c->nr_running, c->nr_stopped, c->nr_uninterruptible);
|
(unsigned long long)c->nr_io_wait,
|
||||||
|
(unsigned long long)c->nr_running,
|
||||||
|
(unsigned long long)c->nr_stopped,
|
||||||
|
(unsigned long long)c->nr_uninterruptible);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
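The casts added in the hunks above matter because a 64-bit counter type is not `unsigned long long` on every platform, so passing it unconverted to `%llu` is a format mismatch. The following sketch shows the pattern in isolation, using uint64_t as a stand-in for __u64. The helper format_counter() is hypothetical and does not appear in getdelays.c.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format one "name value" pair into buf.  The explicit cast makes the
 * argument match %llu regardless of how the platform defines its
 * 64-bit type.  Returns what snprintf() returns: the length the full
 * output needs, excluding the terminating NUL.
 */
static int format_counter(char *buf, size_t len, const char *name,
			  uint64_t value)
{
	return snprintf(buf, len, "%-8s%15llu", name,
			(unsigned long long)value);
}
```

Another option in portable userspace code is the PRIu64 macro from <inttypes.h>, which selects the matching conversion specifier instead of converting the argument.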
|
@@ -6,7 +6,7 @@ This document contains an explanation of the struct taskstats fields.
|
|||||||
There are three different groups of fields in the struct taskstats:
|
There are three different groups of fields in the struct taskstats:
|
||||||
|
|
||||||
1) Common and basic accounting fields
|
1) Common and basic accounting fields
|
||||||
If CONFIG_TASKSTATS is set, the taskstats inteface is enabled and
|
If CONFIG_TASKSTATS is set, the taskstats interface is enabled and
|
||||||
the common fields and basic accounting fields are collected for
|
the common fields and basic accounting fields are collected for
|
||||||
delivery at do_exit() of a task.
|
delivery at do_exit() of a task.
|
||||||
2) Delay accounting fields
|
2) Delay accounting fields
|
||||||
@@ -24,6 +24,10 @@ There are three different groups of fields in the struct taskstats:
|
|||||||
|
|
||||||
4) Per-task and per-thread context switch count statistics
|
4) Per-task and per-thread context switch count statistics
|
||||||
|
|
||||||
|
5) Time accounting for SMT machines
|
||||||
|
|
||||||
|
6) Extended delay accounting fields for memory reclaim
|
||||||
|
|
||||||
Future extension should add fields to the end of the taskstats struct, and
|
Future extension should add fields to the end of the taskstats struct, and
|
||||||
should not change the relative position of each field within the struct.
|
should not change the relative position of each field within the struct.
|
||||||
|
|
||||||
@@ -164,4 +168,13 @@ struct taskstats {
|
|||||||
__u64 nvcsw; /* Context voluntary switch counter */
|
__u64 nvcsw; /* Context voluntary switch counter */
|
||||||
__u64 nivcsw; /* Context involuntary switch counter */
|
__u64 nivcsw; /* Context involuntary switch counter */
|
||||||
|
|
||||||
|
5) Time accounting for SMT machines
|
||||||
|
__u64 ac_utimescaled; /* utime scaled on frequency etc */
|
||||||
|
__u64 ac_stimescaled; /* stime scaled on frequency etc */
|
||||||
|
__u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
|
||||||
|
|
||||||
|
6) Extended delay accounting fields for memory reclaim
|
||||||
|
/* Delay waiting for memory reclaim */
|
||||||
|
__u64 freepages_count;
|
||||||
|
__u64 freepages_delay_total;
|
||||||
}
|
}
|
||||||
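The rule that new fields go only at the end of struct taskstats is what lets an older consumer keep reading a newer structure. This can be sketched with two heavily cut-down layouts. The names struct stats_v1, struct stats_v2 and old_reader_nivcsw() are hypothetical, and the real structure has many more fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical older layout known to an existing consumer. */
struct stats_v1 {
	uint64_t nvcsw;
	uint64_t nivcsw;
};

/* Hypothetical newer layout: new fields are appended, never inserted. */
struct stats_v2 {
	uint64_t nvcsw;
	uint64_t nivcsw;
	uint64_t freepages_count;
	uint64_t freepages_delay_total;
};

/* Appending keeps every pre-existing field at its original offset. */
_Static_assert(offsetof(struct stats_v2, nvcsw) ==
	       offsetof(struct stats_v1, nvcsw), "nvcsw moved");
_Static_assert(offsetof(struct stats_v2, nivcsw) ==
	       offsetof(struct stats_v1, nivcsw), "nivcsw moved");

/*
 * An old consumer that only knows the v1 layout can still read the
 * leading part of a buffer produced with the v2 layout.
 */
static uint64_t old_reader_nivcsw(const void *buf)
{
	struct stats_v1 v;

	memcpy(&v, buf, sizeof(v));
	return v.nivcsw;
}
```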
|
@@ -32,7 +32,7 @@ Linux currently supports the following features on the IXP4xx chips:
|
|||||||
- Flash access (MTD/JFFS)
|
- Flash access (MTD/JFFS)
|
||||||
- I2C through GPIO on IXP42x
|
- I2C through GPIO on IXP42x
|
||||||
- GPIO for input/output/interrupts
|
- GPIO for input/output/interrupts
|
||||||
See include/asm-arm/arch-ixp4xx/platform.h for access functions.
|
See arch/arm/mach-ixp4xx/include/mach/platform.h for access functions.
|
||||||
- Timers (watchdog, OS)
|
- Timers (watchdog, OS)
|
||||||
|
|
||||||
The following components of the chips are not supported by Linux and
|
The following components of the chips are not supported by Linux and
|
||||||
|
@@ -138,14 +138,8 @@ So, what's changed?
|
|||||||
|
|
||||||
Set active the IRQ edge(s)/level. This replaces the
|
Set active the IRQ edge(s)/level. This replaces the
|
||||||
SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
|
SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
|
||||||
function. Type should be one of the following:
|
function. Type should be one of IRQ_TYPE_xxx defined in
|
||||||
|
<linux/irq.h>
|
||||||
#define IRQT_NOEDGE (0)
|
|
||||||
#define IRQT_RISING (__IRQT_RISEDGE)
|
|
||||||
#define IRQT_FALLING (__IRQT_FALEDGE)
|
|
||||||
#define IRQT_BOTHEDGE (__IRQT_RISEDGE|__IRQT_FALEDGE)
|
|
||||||
#define IRQT_LOW (__IRQT_LOWLVL)
|
|
||||||
#define IRQT_HIGH (__IRQT_HIGHLVL)
|
|
||||||
|
|
||||||
3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
|
3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
|
||||||
|
|
||||||
@@ -164,7 +158,7 @@ So, what's changed?
|
|||||||
be re-checked for pending events. (see the Neponset IRQ handler for
|
be re-checked for pending events. (see the Neponset IRQ handler for
|
||||||
details).
|
details).
|
||||||
|
|
||||||
7. fixup_irq() is gone, as is include/asm-arm/arch-*/irq.h
|
7. fixup_irq() is gone, as is arch/arm/mach-*/include/mach/irq.h
|
||||||
|
|
||||||
Please note that this will not solve all problems - some of them are
|
Please note that this will not solve all problems - some of them are
|
||||||
hardware based. Mixing level-based and edge-based IRQs on the same
|
hardware based. Mixing level-based and edge-based IRQs on the same
|
||||||
|
@@ -79,7 +79,7 @@ Machine/Platform support
|
|||||||
To this end, we now have arch/arm/mach-$(MACHINE) directories which are
|
To this end, we now have arch/arm/mach-$(MACHINE) directories which are
|
||||||
designed to house the non-driver files for a particular machine (eg, PCI,
|
designed to house the non-driver files for a particular machine (eg, PCI,
|
||||||
memory management, architecture definitions etc). For all future
|
memory management, architecture definitions etc). For all future
|
||||||
machines, there should be a corresponding include/asm-arm/arch-$(MACHINE)
|
machines, there should be a corresponding arch/arm/mach-$(MACHINE)/include/mach
|
||||||
directory.
|
directory.
|
||||||
|
|
||||||
|
|
||||||
@@ -176,7 +176,7 @@ Kernel entry (head.S)
|
|||||||
class typically based around one or more system on a chip devices, and
|
class typically based around one or more system on a chip devices, and
|
||||||
acts as a natural container around the actual implementations. These
|
acts as a natural container around the actual implementations. These
|
||||||
classes are given directories - arch/arm/mach-<class> and
|
classes are given directories - arch/arm/mach-<class> and
|
||||||
include/asm-arm/arch-<class> - which contain the source files to
|
arch/arm/mach-<class>/include/mach - which contain the source files to
|
||||||
support the machine class. These directories also contain any machine
|
support the machine class. These directories also contain any machine
|
||||||
specific supporting code.
|
specific supporting code.
|
||||||
|
|
||||||
|
@@ -13,16 +13,31 @@ Introduction
|
|||||||
data-sheet/users manual to find out the complete list.
|
data-sheet/users manual to find out the complete list.
|
||||||
|
|
||||||
|
|
||||||
|
GPIOLIB
|
||||||
|
-------
|
||||||
|
|
||||||
|
With the advent of GPIOLIB in drivers/gpio, support for some
|
||||||
|
of the GPIO functions such as reading and writing a pin will
|
||||||
|
be removed in favour of this common access method.
|
||||||
|
|
||||||
|
Once all the extant drivers have been converted, the functions
|
||||||
|
listed below will be removed (they may be marked as __deprecated
|
||||||
|
in the near future).
|
||||||
|
|
||||||
|
- s3c2410_gpio_getpin
|
||||||
|
- s3c2410_gpio_setpin
|
||||||
|
|
||||||
|
|
||||||
Headers
|
Headers
|
||||||
-------
|
-------
|
||||||
|
|
||||||
See include/asm-arm/arch-s3c2410/regs-gpio.h for the list
|
See arch/arm/mach-s3c2410/include/mach/regs-gpio.h for the list
|
||||||
of GPIO pins, and the configuration values for them. This
|
of GPIO pins, and the configuration values for them. This
|
||||||
is included by using #include <asm/arch/regs-gpio.h>
|
is included by using #include <mach/regs-gpio.h>
|
||||||
|
|
||||||
The GPIO management functions are defined in the hardware
|
The GPIO management functions are defined in the hardware
|
||||||
header include/asm-arm/arch-s3c2410/hardware.h which can be
|
header arch/arm/mach-s3c2410/include/mach/hardware.h which can be
|
||||||
included by #include <asm/arch/hardware.h>
|
included by #include <mach/hardware.h>
|
||||||
|
|
||||||
A useful amount of documentation can be found in the hardware
|
A useful amount of documentation can be found in the hardware
|
||||||
header on how the GPIO functions (and others) work.
|
header on how the GPIO functions (and others) work.
|
||||||
|
@@ -8,9 +8,10 @@ Introduction
|
|||||||
|
|
||||||
The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
|
The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
|
||||||
by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
|
by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
|
||||||
S3C2412, S3C2413, S3C2440 and S3C2442 devices are supported.
|
S3C2412, S3C2413, S3C2440, S3C2442 and S3C2443 devices are supported.
|
||||||
|
|
||||||
|
Support for the S3C2400 and S3C24A0 series is in progress.
|
||||||
|
|
||||||
Support for the S3C2400 series is in progress.
|
|
||||||
|
|
||||||
Configuration
|
Configuration
|
||||||
-------------
|
-------------
|
||||||
@@ -36,7 +37,23 @@ Layout
|
|||||||
in arch/arm/mach-s3c2410 and S3C2440 in arch/arm/mach-s3c2440
|
in arch/arm/mach-s3c2410 and S3C2440 in arch/arm/mach-s3c2440
|
||||||
|
|
||||||
Register, kernel and platform data definitions are held in the
|
Register, kernel and platform data definitions are held in the
|
||||||
include/asm-arm/arch-s3c2410 directory.
|
arch/arm/mach-s3c2410/include/mach directory.
|
||||||
|
|
||||||
|
arch/arm/plat-s3c24xx:
|
||||||
|
|
||||||
|
Files in here are either common to all the s3c24xx family,
|
||||||
|
or are common to only some of them with names to indicate this
|
||||||
|
status. The files that are not common to all are generally named
|
||||||
|
with the initial cpu they support in the series to ensure a short
|
||||||
|
name without any possibility of confusion with newer devices.
|
||||||
|
|
||||||
|
As an example, initially s3c244x would cover s3c2440 and s3c2442, but
|
||||||
|
with the s3c2443 which does not share many of the same drivers in
|
||||||
|
this directory, the name becomes invalid. We stick to s3c2440-<x>
|
||||||
|
to indicate a driver that is s3c2440 and s3c2442 compatible.
|
||||||
|
|
||||||
|
This does mean that to find the status of any given SoC, a number
|
||||||
|
of directories may need to be searched.
|
||||||
|
|
||||||
|
|
||||||
Machines
|
Machines
|
||||||
@@ -159,6 +176,17 @@ NAND
|
|||||||
For more information see Documentation/arm/Samsung-S3C24XX/NAND.txt
|
For more information see Documentation/arm/Samsung-S3C24XX/NAND.txt
|
||||||
|
|
||||||
|
|
||||||
|
SD/MMC
|
||||||
|
------
|
||||||
|
|
||||||
|
The SD/MMC hardware pre S3C2443 is supported in the current
|
||||||
|
kernel; the driver is drivers/mmc/host/s3cmci.c and supports
|
||||||
|
1 and 4 bit SD or MMC cards.
|
||||||
|
|
||||||
|
The SDIO behaviour of this driver has not been fully tested. There is no
|
||||||
|
current support for hardware SDIO interrupts.
|
||||||
|
|
||||||
|
|
||||||
Serial
|
Serial
|
||||||
------
|
------
|
||||||
|
|
||||||
@@ -178,6 +206,9 @@ GPIO
|
|||||||
The core contains support for manipulating the GPIO, see the
|
The core contains support for manipulating the GPIO, see the
|
||||||
documentation in GPIO.txt in the same directory as this file.
|
documentation in GPIO.txt in the same directory as this file.
|
||||||
|
|
||||||
|
Newer kernels carry GPIOLIB, and support is being moved towards
|
||||||
|
this with some of the older support in line to be removed.
|
||||||
|
|
||||||
|
|
||||||
Clock Management
|
Clock Management
|
||||||
----------------
|
----------------
|
||||||
|
@@ -49,7 +49,7 @@ Board Support
|
|||||||
Platform Data
|
Platform Data
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
See linux/include/asm-arm/arch-s3c2410/usb-control.h for the
|
See arch/arm/mach-s3c2410/include/mach/usb-control.h for the
|
||||||
descriptions of the platform device data. An implementation
|
descriptions of the platform device data. An implementation
|
||||||
can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c .
|
can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c .
|
||||||
|
|
||||||
|
10
Documentation/auxdisplay/Makefile
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
# kbuild trick to avoid linker error. Can be omitted if a module is built.
|
||||||
|
obj- := dummy.o
|
||||||
|
|
||||||
|
# List of programs to build
|
||||||
|
hostprogs-y := cfag12864b-example
|
||||||
|
|
||||||
|
# Tell kbuild to always build the programs
|
||||||
|
always := $(hostprogs-y)
|
||||||
|
|
||||||
|
HOSTCFLAGS_cfag12864b-example.o += -I$(objtree)/usr/include
|
@@ -3,7 +3,7 @@
|
|||||||
===================================
|
===================================
|
||||||
|
|
||||||
License: GPLv2
|
License: GPLv2
|
||||||
Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
Author & Maintainer: Miguel Ojeda Sandonis
|
||||||
Date: 2006-10-27
|
Date: 2006-10-27
|
||||||
|
|
||||||
|
|
||||||
@@ -22,7 +22,7 @@ Date: 2006-10-27
|
|||||||
1. DRIVER INFORMATION
|
1. DRIVER INFORMATION
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
This driver support one cfag12864b display at time.
|
This driver supports a cfag12864b LCD.
|
||||||
|
|
||||||
|
|
||||||
---------------------
|
---------------------
|
||||||
|
@@ -4,7 +4,7 @@
|
|||||||
* Description: cfag12864b LCD userspace example program
|
* Description: cfag12864b LCD userspace example program
|
||||||
* License: GPLv2
|
* License: GPLv2
|
||||||
*
|
*
|
||||||
* Author: Copyright (C) Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
* Author: Copyright (C) Miguel Ojeda Sandonis
|
||||||
* Date: 2006-10-31
|
* Date: 2006-10-31
|
||||||
*
|
*
|
||||||
* This program is free software; you can redistribute it and/or modify
|
* This program is free software; you can redistribute it and/or modify
|
||||||
|
@@ -3,7 +3,7 @@
|
|||||||
==========================================
|
==========================================
|
||||||
|
|
||||||
License: GPLv2
|
License: GPLv2
|
||||||
Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com>
|
Author & Maintainer: Miguel Ojeda Sandonis
|
||||||
Date: 2006-10-27
|
Date: 2006-10-27
|
||||||
|
|
||||||
|
|
||||||
@@ -21,7 +21,7 @@ Date: 2006-10-27
|
|||||||
1. DRIVER INFORMATION
|
1. DRIVER INFORMATION
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
This driver support the ks0108 LCD controller.
|
This driver supports the ks0108 LCD controller.
|
||||||
|
|
||||||
|
|
||||||
---------------------
|
---------------------
|
||||||
|
@@ -1,155 +0,0 @@
|
|||||||
A Simple Guide to Configure KGDB
|
|
||||||
|
|
||||||
Sonic Zhang <sonic.zhang@analog.com>
|
|
||||||
Aug. 24th 2006
|
|
||||||
|
|
||||||
|
|
||||||
This KGDB patch enables the kernel developer to do source level debugging on
|
|
||||||
the kernel for the Blackfin architecture. The debugging works over either the
|
|
||||||
ethernet interface or one of the uarts. Both software breakpoints and
|
|
||||||
hardware breakpoints are supported in this version.
|
|
||||||
http://docs.blackfin.uclinux.org/doku.php?id=kgdb
|
|
||||||
|
|
||||||
|
|
||||||
2 known issues:
|
|
||||||
1. This bug:
|
|
||||||
http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145
|
|
||||||
The GDB client for Blackfin uClinux causes incorrect values of local
|
|
||||||
variables to be displayed when the user breaks the running of kernel in GDB.
|
|
||||||
2. Because of a hardware bug in Blackfin 533 v1.0.3:
|
|
||||||
05000067 - Watchpoints (Hardware Breakpoints) are not supported
|
|
||||||
Hardware breakpoints cannot be set properly.
|
|
||||||
|
|
||||||
|
|
||||||
Debug over Ethernet:
|
|
||||||
|
|
||||||
1. Compile and install the cross platform version of gdb for blackfin, which
|
|
||||||
can be found at $(BINROOT)/bfin-elf-gdb.
|
|
||||||
|
|
||||||
2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
|
|
||||||
"Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
|
|
||||||
With this selected, option "Full Symbolic/Source Debugging support" and
|
|
||||||
"Compile the kernel with frame pointers" are also selected.
|
|
||||||
|
|
||||||
3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to
|
|
||||||
the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking".
|
|
||||||
|
|
||||||
4. Connect minicom to the serial port and boot the kernel image.
|
|
||||||
|
|
||||||
5. Configure the IP "/> ifconfig eth0 target-IP"
|
|
||||||
|
|
||||||
6. Start GDB client "bfin-elf-gdb vmlinux".
|
|
||||||
|
|
||||||
7. Connect to the target "(gdb) target remote udp:target-IP:6443".
|
|
||||||
|
|
||||||
8. Set software breakpoint "(gdb) break sys_open".
|
|
||||||
|
|
||||||
9. Continue "(gdb) c".
|
|
||||||
|
|
||||||
10. Run ls in the target console "/> ls".
|
|
||||||
|
|
||||||
11. Breakpoint hits. "Breakpoint 1: sys_open(..."
|
|
||||||
|
|
||||||
12. Display local variables and function parameters.
|
|
||||||
(*) This operation gives wrong results, see known issue 1.
|
|
||||||
|
|
||||||
13. Single stepping "(gdb) si".
|
|
||||||
|
|
||||||
14. Remove breakpoint 1. "(gdb) del 1"
|
|
||||||
|
|
||||||
15. Set hardware breakpoint "(gdb) hbreak sys_open".
|
|
||||||
|
|
||||||
16. Continue "(gdb) c".
|
|
||||||
|
|
||||||
17. Run ls in the target console "/> ls".
|
|
||||||
|
|
||||||
18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...".
|
|
||||||
(*) This hardware breakpoint will not be hit, see known issue 2.
|
|
||||||
|
|
||||||
19. Continue "(gdb) c".
|
|
||||||
|
|
||||||
20. Interrupt the target in GDB "Ctrl+C".
|
|
||||||
|
|
||||||
21. Detach from the target "(gdb) detach".
|
|
||||||
|
|
||||||
22. Exit GDB "(gdb) quit".
|
|
||||||
|
|
||||||
|
|
||||||
Debug over the UART:
|
|
||||||
|
|
||||||
1. Compile and install the cross platform version of gdb for blackfin, which
|
|
||||||
can be found at $(BINROOT)/bfin-elf-gdb.
|
|
||||||
|
|
||||||
2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
|
|
||||||
"Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
|
|
||||||
With this selected, option "Full Symbolic/Source Debugging support" and
|
|
||||||
"Compile the kernel with frame pointers" are also selected.
|
|
||||||
|
|
||||||
3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be
|
|
||||||
a different one from the console. Don't forget to change the mode of
|
|
||||||
blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART.
|
|
||||||
|
|
||||||
4. If you want to connect to kgdb when the kernel boots, enable
|
|
||||||
"KGDB: Wait for gdb connection early"
|
|
||||||
|
|
||||||
5. Compile kernel.
|
|
||||||
|
|
||||||
6. Connect minicom to the serial port of the console and boot the kernel image.
|
|
||||||
|
|
||||||
7. Start GDB client "bfin-elf-gdb vmlinux".
|
|
||||||
|
|
||||||
8. Set the baud rate in GDB "(gdb) set remotebaud 57600".
|
|
||||||
|
|
||||||
9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1".
|
|
||||||
|
|
||||||
10. Set software breakpoint "(gdb) break sys_open".
|
|
||||||
|
|
||||||
11. Continue "(gdb) c".
|
|
||||||
|
|
||||||
12. Run ls in the target console "/> ls".
|
|
||||||
|
|
||||||
13. A breakpoint is hit. "Breakpoint 1: sys_open(..."
|
|
||||||
|
|
||||||
14. All other operations are the same as that in KGDB over Ethernet.
|
|
||||||
|
|
||||||
|
|
||||||
Debug over the same UART as console:
|
|
||||||
|
|
||||||
1. Compile and install the cross platform version of gdb for blackfin, which
|
|
||||||
can be found at $(BINROOT)/bfin-elf-gdb.
|
|
||||||
|
|
||||||
2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under
|
|
||||||
"Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb".
|
|
||||||
With this selected, option "Full Symbolic/Source Debugging support" and
|
|
||||||
"Compile the kernel with frame pointers" are also selected.
|
|
||||||
|
|
||||||
3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console.
|
|
||||||
Don't forget to change the mode of blackfin serial driver to PIO.
|
|
||||||
Otherwise kgdb works incorrectly on UART.
|
|
||||||
|
|
||||||
4. If you want to connect to kgdb when the kernel boots, enable
|
|
||||||
"KGDB: Wait for gdb connection early"
|
|
||||||
|
|
||||||
5. Connect minicom to the serial port and boot the kernel image.
|
|
||||||
|
|
||||||
6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A.
|
|
||||||
|
|
||||||
7. Start GDB client "bfin-elf-gdb vmlinux".
|
|
||||||
|
|
||||||
8. Set the baud rate in GDB "(gdb) set remotebaud 57600".
|
|
||||||
|
|
||||||
9. Connect to the target "(gdb) target remote /dev/ttyS0".
|
|
||||||
|
|
||||||
10. Set software breakpoint "(gdb) break sys_open".
|
|
||||||
|
|
||||||
11. Continue "(gdb) c". Then enter Ctrl+C twice to stop GDB connection.
|
|
||||||
|
|
||||||
12. Run ls in the target console "/> ls". Dummy string can be seen on the console.
|
|
||||||
|
|
||||||
13. Then connect the gdb to target again. "(gdb) target remote /dev/ttyS0".
|
|
||||||
Now you will find a breakpoint is hit. "Breakpoint 1: sys_open(..."
|
|
||||||
|
|
||||||
14. All other operations are the same as that in KGDB over Ethernet. The only
|
|
||||||
difference is that after continue command in GDB, please stop GDB
|
|
||||||
connection by 2 "Ctrl+C"s and connect again after breakpoints are hit or
|
|
||||||
Ctrl+A is entered.
|
|
327
Documentation/block/data-integrity.txt
Normal file
@@ -0,0 +1,327 @@
|
|||||||
|
----------------------------------------------------------------------
|
||||||
|
1. INTRODUCTION
|
||||||
|
|
||||||
|
Modern filesystems feature checksumming of data and metadata to
|
||||||
|
protect against data corruption. However, the detection of the
|
||||||
|
corruption is done at read time which could potentially be months
|
||||||
|
after the data was written. At that point the original data that the
|
||||||
|
application tried to write is most likely lost.
|
||||||
|
|
||||||
|
The solution is to ensure that the disk is actually storing what the
|
||||||
|
application meant it to. Recent additions to both the SCSI family
|
||||||
|
protocols (SBC Data Integrity Field, SCC protection proposal) as well
|
||||||
|
as SATA/T13 (External Path Protection) try to remedy this by adding
|
||||||
|
support for appending integrity metadata to an I/O. The integrity
|
||||||
|
metadata (or protection information in SCSI terminology) includes a
|
||||||
|
checksum for each sector as well as an incrementing counter that
|
||||||
|
ensures the individual sectors are written in the right order. And
|
||||||
|
for some protection schemes also that the I/O is written to the right
|
||||||
|
place on disk.
|
||||||
|
|
||||||
|
Current storage controllers and devices implement various protective
|
||||||
|
measures, for instance checksumming and scrubbing. But these
|
||||||
|
technologies are working in their own isolated domains or at best
|
||||||
|
between adjacent nodes in the I/O path. The interesting thing about
|
||||||
|
DIF and the other integrity extensions is that the protection format
|
||||||
|
is well defined and every node in the I/O path can verify the
|
||||||
|
integrity of the I/O and reject it if corruption is detected. This
|
||||||
|
allows not only corruption prevention but also isolation of the point
|
||||||
|
of failure.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
2. THE DATA INTEGRITY EXTENSIONS
|
||||||
|
|
||||||
|
As written, the protocol extensions only protect the path between
|
||||||
|
controller and storage device. However, many controllers actually
|
||||||
|
allow the operating system to interact with the integrity metadata
|
||||||
|
(IMD). We have been working with several FC/SAS HBA vendors to enable
|
||||||
|
the protection information to be transferred to and from their
|
||||||
|
controllers.
|
||||||
|
|
||||||
|
The SCSI Data Integrity Field works by appending 8 bytes of protection
|
||||||
|
information to each sector. The data + integrity metadata is stored
|
||||||
|
in 520 byte sectors on disk. Data + IMD are interleaved when
|
||||||
|
transferred between the controller and target. The T13 proposal is
|
||||||
|
similar.
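As a rough illustration of the sizes quoted above, the 8 bytes of
protection information can be pictured as a small per-sector tuple.
The sketch below is userspace C, not kernel code. The struct and
helper names are hypothetical, and the split into guard, application
and reference tags is an assumption based on the usual T10 DIF layout
rather than something this document spells out. Byte order on the
wire is ignored.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 8 bytes of protection information
 * appended to each sector (field split assumed from T10 DIF). */
struct dif_tuple {
    uint16_t guard_tag;   /* checksum of the sector data */
    uint16_t app_tag;     /* opaque application tag */
    uint32_t ref_tag;     /* incrementing counter / target sector */
};

static_assert(sizeof(struct dif_tuple) == 8, "tuple must be 8 bytes");

/* Size of one sector on disk once the tuple has been appended. */
size_t protected_sector_size(size_t data_bytes)
{
    return data_bytes + sizeof(struct dif_tuple);
}
```

protected_sector_size(512) gives the 520 byte sectors mentioned
above, and protected_sector_size(4096) gives 4104.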
|
||||||
|
|
||||||
|
Because it is highly inconvenient for operating systems to deal with
|
||||||
|
520 (and 4104) byte sectors, we approached several HBA vendors and
|
||||||
|
encouraged them to allow separation of the data and integrity metadata
|
||||||
|
scatter-gather lists.
|
||||||
|
|
||||||
|
The controller will interleave the buffers on write and split them on
|
||||||
|
read. This means that Linux can DMA the data buffers to and from
|
||||||
|
host memory without changes to the page cache.
|
||||||
|
|
||||||
|
Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
|
||||||
|
is somewhat heavy to compute in software. Benchmarks found that
|
||||||
|
calculating this checksum had a significant impact on system
|
||||||
|
performance for a number of workloads. Some controllers allow a
|
||||||
|
lighter-weight checksum to be used when interfacing with the operating
|
||||||
|
system. Emulex, for instance, supports the TCP/IP checksum instead.
|
||||||
|
The IP checksum received from the OS is converted to the 16-bit CRC
|
||||||
|
when writing and vice versa. This allows the integrity metadata to be
|
||||||
|
generated by Linux or the application at very low cost (comparable to
|
||||||
|
software RAID5).
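To make the cost difference concrete, here is a minimal userspace
sketch of the two checksums. It assumes the 16-bit CRC uses the T10
DIF polynomial 0x8BB7 with a zero initial value, and that the
lighter-weight checksum is the ones' complement sum used by IP
(RFC 1071). Neither detail is stated in this document, and the
function names are made up for illustration. The bit-at-a-time CRC is
written for clarity, not speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, MSB first, polynomial 0x8BB7, initial value 0
 * (assumed to match the T10 DIF guard tag CRC). */
uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Ones' complement sum of big-endian 16-bit words, then inverted,
 * as used for IP headers. An odd trailing byte is padded with zero.
 * The 32-bit accumulator is ample for sector-sized buffers. */
uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                 /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

In this form the CRC needs eight shift/xor steps per byte, while the
IP checksum needs one addition per 16-bit word, which is the kind of
gap the paragraph above refers to.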
|
||||||
|
|
||||||
|
The IP checksum is weaker than the CRC in terms of detecting bit
|
||||||
|
errors. However, the strength is really in the separation of the data
|
||||||
|
buffers and the integrity metadata. These two distinct buffers must
|
||||||
|
match up for an I/O to complete.
|
||||||
|
|
||||||
|
The separation of the data and integrity metadata buffers as well as
|
||||||
|
the choice in checksums is referred to as the Data Integrity
|
||||||
|
Extensions. As these extensions are outside the scope of the protocol
|
||||||
|
bodies (T10, T13), Oracle and its partners are trying to standardize
|
||||||
|
them within the Storage Networking Industry Association.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
3. KERNEL CHANGES
|
||||||
|
|
||||||
|
The data integrity framework in Linux enables protection information
|
||||||
|
to be pinned to I/Os and sent to/received from controllers that
|
||||||
|
support it.
|
||||||
|
|
||||||
|
The advantage to the integrity extensions in SCSI and SATA is that
|
||||||
|
they enable us to protect the entire path from application to storage
|
||||||
|
device. However, at the same time this is also the biggest
|
||||||
|
disadvantage. It means that the protection information must be in a
|
||||||
|
format that can be understood by the disk.
|
||||||
|
|
||||||
|
Generally Linux/POSIX applications are agnostic to the intricacies of
|
||||||
|
the storage devices they are accessing. The virtual filesystem switch
|
||||||
|
and the block layer make things like hardware sector size and
|
||||||
|
transport protocols completely transparent to the application.
|
||||||
|
|
||||||
|
However, this level of detail is required when preparing the
|
||||||
|
protection information to send to a disk. Consequently, the very
|
||||||
|
concept of an end-to-end protection scheme is a layering violation.
|
||||||
|
It is completely unreasonable for an application to be aware whether
|
||||||
|
it is accessing a SCSI or SATA disk.
|
||||||
|
|
||||||
|
The data integrity support implemented in Linux attempts to hide this
|
||||||
|
from the application. As far as the application (and to some extent
|
||||||
|
the kernel) is concerned, the integrity metadata is opaque information
|
||||||
|
that's attached to the I/O.
|
||||||
|
|
||||||
|
The current implementation allows the block layer to automatically
|
||||||
|
generate the protection information for any I/O. Eventually the
|
||||||
|
intent is to move the integrity metadata calculation to userspace for
|
||||||
|
user data. Metadata and other I/O that originates within the kernel
|
||||||
|
will still use the automatic generation interface.
|
||||||
|
|
||||||
|
Some storage devices allow each hardware sector to be tagged with a
|
||||||
|
16-bit value. The owner of this tag space is the owner of the block
|
||||||
|
device. I.e. the filesystem in most cases. The filesystem can use
|
||||||
|
this extra space to tag sectors as they see fit. Because the tag
|
||||||
|
space is limited, the block interface allows tagging bigger chunks by
|
||||||
|
way of interleaving. This way, 8*16 bits of information can be
|
||||||
|
attached to a typical 4KB filesystem block.
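The arithmetic, and one possible way to interleave a tag buffer
across sectors, can be sketched in userspace C as follows. This
assumes 512 byte hardware sectors and 2 bytes of tag space per
sector. The helper names and the array-of-slots representation are
hypothetical and do not correspond to kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_BYTES_PER_SECTOR 2u   /* assumed: one 16-bit tag per sector */

/* Bytes of tag space available for an I/O of io_bytes. */
size_t tag_capacity(size_t io_bytes, size_t sector_size)
{
    assert(sector_size != 0);
    return (io_bytes / sector_size) * TAG_BYTES_PER_SECTOR;
}

/* Spread an opaque tag buffer over the per-sector tag slots;
 * slots[i] holds the two tag bytes of sector i.
 * Returns 0 on success, -1 if the buffer does not fit. */
int tag_scatter(uint8_t slots[][TAG_BYTES_PER_SECTOR], size_t nr_sectors,
                const uint8_t *tag, size_t len)
{
    if (len > nr_sectors * TAG_BYTES_PER_SECTOR)
        return -1;
    for (size_t i = 0; i < len; i++)
        slots[i / TAG_BYTES_PER_SECTOR][i % TAG_BYTES_PER_SECTOR] = tag[i];
    return 0;
}

/* Collect the tag bytes back out of the per-sector slots. */
int tag_gather(uint8_t slots[][TAG_BYTES_PER_SECTOR], size_t nr_sectors,
               uint8_t *tag, size_t len)
{
    if (len > nr_sectors * TAG_BYTES_PER_SECTOR)
        return -1;
    for (size_t i = 0; i < len; i++)
        tag[i] = slots[i / TAG_BYTES_PER_SECTOR][i % TAG_BYTES_PER_SECTOR];
    return 0;
}
```

With these assumptions tag_capacity(4096, 512) is 16 bytes, i.e. the
8*16 bits quoted above for a 4KB filesystem block.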
|
||||||
|
|
||||||
|
This also means that applications such as fsck and mkfs will need
|
||||||
|
access to manipulate the tags from user space. A passthrough
|
||||||
|
interface for this is being worked on.
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
4. BLOCK LAYER IMPLEMENTATION DETAILS
|
||||||
|
|
||||||
|
4.1 BIO
|
||||||
|
|
||||||
|
The data integrity patches add a new field to struct bio when
|
||||||
|
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
|
||||||
|
to a struct bip which contains the bio integrity payload. Essentially
|
||||||
|
a bip is a trimmed down struct bio which holds a bio_vec containing
|
||||||
|
the integrity metadata and the required housekeeping information (bvec
|
||||||
|
pool, vector count, etc.)
|
||||||
|
|
||||||
|
A kernel subsystem can enable data integrity protection on a bio by
|
||||||
|
calling bio_integrity_alloc(bio). This will allocate and attach the
|
||||||
|
bip to the bio.
|
||||||
|
|
||||||
|
Individual pages containing integrity metadata can subsequently be
|
||||||
|
attached using bio_integrity_add_page().
|
||||||
|
|
||||||
|
bio_free() will automatically free the bip.
|
||||||
|
|
||||||
|
|
||||||
|
4.2 BLOCK DEVICE
|
||||||
|
|
||||||
|
Because the format of the protection data is tied to the physical
|
||||||
|
disk, each block device has been extended with a block integrity
|
||||||
|
profile (struct blk_integrity). This optional profile is registered
|
||||||
|
with the block layer using blk_integrity_register().
|
||||||
|
|
||||||
|
The profile contains callback functions for generating and verifying
|
||||||
|
the protection data, as well as getting and setting application tags.
|
||||||
|
The profile also contains a few constants to aid in completing,
|
||||||
|
merging and splitting the integrity metadata.
|
||||||
|
|
||||||
|
Layered block devices will need to pick a profile that's appropriate
|
||||||
|
for all subdevices. blk_integrity_compare() can help with that. DM
|
||||||
|
and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
|
||||||
|
will require extra work due to the application tag.
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
5.0 BLOCK LAYER INTEGRITY API
|
||||||
|
|
||||||
|
5.1 NORMAL FILESYSTEM
|
||||||
|
|
||||||
|
The normal filesystem is unaware that the underlying block device
|
||||||
|
is capable of sending/receiving integrity metadata. The IMD will
|
||||||
|
be automatically generated by the block layer at submit_bio() time
|
||||||
|
in case of a WRITE. A READ request will cause the I/O integrity
|
||||||
|
to be verified upon completion.
|
||||||
|
|
||||||
|
IMD generation and verification can be toggled using the
|
||||||
|
|
||||||
|
/sys/block/<bdev>/integrity/write_generate
|
||||||
|
|
||||||
|
and
|
||||||
|
|
||||||
|
/sys/block/<bdev>/integrity/read_verify
|
||||||
|
|
||||||
|
flags.
|
||||||
|
|
||||||
|
|
||||||
|
5.2 INTEGRITY-AWARE FILESYSTEM
|
||||||
|
|
||||||
|
A filesystem that is integrity-aware can prepare I/Os with IMD
|
||||||
|
attached. It can also use the application tag space if this is
|
||||||
|
supported by the block device.
|
||||||
|
|
||||||
|
|
||||||
|
int bdev_integrity_enabled(block_device, int rw);
|
||||||
|
|
||||||
|
bdev_integrity_enabled() will return 1 if the block device
|
||||||
|
supports integrity metadata transfer for the data direction
|
||||||
|
specified in 'rw'.
|
||||||
|
|
||||||
|
bdev_integrity_enabled() honors the write_generate and
|
||||||
|
read_verify flags in sysfs and will respond accordingly.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_prep(bio);
|
||||||
|
|
||||||
|
To generate IMD for WRITE and to set up buffers for READ, the
|
||||||
|
filesystem must call bio_integrity_prep(bio).
|
||||||
|
|
||||||
|
Prior to calling this function, the bio data direction and start
|
||||||
|
sector must be set, and the bio should have all data pages
|
||||||
|
added. It is up to the caller to ensure that the bio does not
|
||||||
|
change while I/O is in progress.
|
||||||
|
|
||||||
|
bio_integrity_prep() should only be called if
|
||||||
|
bio_integrity_enabled() returned 1.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_tag_size(bio);
|
||||||
|
|
||||||
|
If the filesystem wants to use the application tag space it will
|
||||||
|
first have to find out how much storage space is available.
|
||||||
|
Because tag space is generally limited (usually 2 bytes per
|
||||||
|
sector regardless of sector size), the integrity framework
|
||||||
|
supports interleaving the information between the sectors in an
|
||||||
|
I/O.
|
||||||
|
|
||||||
|
Filesystems can call bio_integrity_tag_size(bio) to find out how
|
||||||
|
many bytes of storage are available for that particular bio.
|
||||||
|
|
||||||
|
Another option is bdev_get_tag_size(block_device) which will
|
||||||
|
return the number of available bytes per hardware sector.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_set_tag(bio, void *tag_buf, len);
|
||||||
|
|
||||||
|
After a successful return from bio_integrity_prep(),
|
||||||
|
bio_integrity_set_tag() can be used to attach an opaque tag
|
||||||
|
buffer to a bio. Obviously this only makes sense if the I/O is
|
||||||
|
a WRITE.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_get_tag(bio, void *tag_buf, len);
|
||||||
|
|
||||||
|
Similarly, at READ I/O completion time the filesystem can
|
||||||
|
retrieve the tag buffer using bio_integrity_get_tag().
|
||||||
|
|
||||||
|
|
||||||
|
5.3 PASSING EXISTING INTEGRITY METADATA
|
||||||
|
|
||||||
|
Filesystems that either generate their own integrity metadata or
|
||||||
|
are capable of transferring IMD from user space can use the
|
||||||
|
following calls:
|
||||||
|
|
||||||
|
|
||||||
|
struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
|
||||||
|
|
||||||
|
Allocates the bio integrity payload and hangs it off of the bio.
|
||||||
|
nr_pages indicates how many pages of protection data need to be
|
||||||
|
stored in the integrity bio_vec list (similar to bio_alloc()).
|
||||||
|
|
||||||
|
The integrity payload will be freed at bio_free() time.
|
||||||
|
|
||||||
|
|
||||||
|
int bio_integrity_add_page(bio, page, len, offset);
|
||||||
|
|
||||||
|
Attaches a page containing integrity metadata to an existing
|
||||||
|
bio. The bio must have an existing bip,
|
||||||
|
i.e. bio_integrity_alloc() must have been called. For a WRITE,
|
||||||
|
the integrity metadata in the pages must be in a format
|
||||||
|
understood by the target device with the notable exception that
|
||||||
|
the sector numbers will be remapped as the request traverses the
|
||||||
|
I/O stack. This implies that the pages added using this call
|
||||||
|
will be modified during I/O! The first reference tag in the
|
||||||
|
integrity metadata must have a value of bip->bip_sector.
|
||||||
|
|
||||||
|
Pages can be added using bio_integrity_add_page() as long as
|
||||||
|
there is room in the bip bio_vec array (nr_pages).
|
||||||
|
|
||||||
|
Upon completion of a READ operation, the attached pages will
|
||||||
|
contain the integrity metadata received from the storage device.
|
||||||
|
It is up to the receiver to process them and verify data
|
||||||
|
integrity upon completion.
|
||||||
|
|
||||||
|
|
||||||
|
5.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
|
||||||
|
METADATA
|
||||||
|
|
||||||
|
To enable integrity exchange on a block device the gendisk must be
|
||||||
|
registered as capable:
|
||||||
|
|
||||||
|
int blk_integrity_register(gendisk, blk_integrity);
|
||||||
|
|
||||||
|
The blk_integrity struct is a template and should contain the
|
||||||
|
following:
|
||||||
|
|
||||||
|
static struct blk_integrity my_profile = {
|
||||||
|
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
|
||||||
|
.generate_fn = my_generate_fn,
|
||||||
|
.verify_fn = my_verify_fn,
|
||||||
|
.get_tag_fn = my_get_tag_fn,
|
||||||
|
.set_tag_fn = my_set_tag_fn,
|
||||||
|
.tuple_size = sizeof(struct my_tuple_size),
|
||||||
|
.tag_size = <tag bytes per hw sector>,
|
||||||
|
};
|
||||||
|
|
||||||
|
'name' is a text string which will be visible in sysfs. This is
|
||||||
|
part of the userland API so choose it carefully and never change
|
||||||
|
it. The format is standards body-type-variant.
|
||||||
|
E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
|
||||||
|
|
||||||
|
'generate_fn' generates appropriate integrity metadata (for WRITE).
|
||||||
|
|
||||||
|
'verify_fn' verifies that the data buffer matches the integrity
|
||||||
|
metadata.
|
||||||
|
|
||||||
|
'tuple_size' must be set to match the size of the integrity
|
||||||
|
metadata per sector. I.e. 8 for DIF and EPP.
|
||||||
|
|
||||||
|
'tag_size' must be set to identify how many bytes of tag space
|
||||||
|
are available per hardware sector. For DIF this is either 2 or
|
||||||
|
0 depending on the value of the Control Mode Page ATO bit.
|
||||||
|
|
||||||
|
See 6.2 for a description of get_tag_fn and set_tag_fn.
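
As a rough illustration of the tuple_size and reference tag rules above, the following userspace sketch lays out an 8-byte tuple and seeds the reference tags from a starting sector. It is not kernel code. The names my_tuple, my_fill_ref_tags and my_integrity_bytes are hypothetical, and the field layout (16-bit guard tag, 16-bit application tag, 32-bit reference tag) is an assumption based on the usual DIF arrangement. A real generate_fn would also compute the guard checksum and take care of byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte integrity tuple, one per hardware sector. */
struct my_tuple {
	uint16_t guard_tag;	/* checksum of the sector data (not computed here) */
	uint16_t app_tag;	/* opaque tag space */
	uint32_t ref_tag;	/* low 32 bits of the sector number */
};

/* tuple_size in the profile would be sizeof(struct my_tuple), i.e. 8. */
_Static_assert(sizeof(struct my_tuple) == 8, "tuple must be 8 bytes");

/*
 * Fill in the reference tags for nr_sectors tuples. The first tuple gets
 * first_sector (standing in for bip->bip_sector) and each following
 * tuple gets the next sector number.
 */
void my_fill_ref_tags(struct my_tuple *t, size_t nr_sectors,
		      uint64_t first_sector)
{
	size_t i;

	assert(t != NULL || nr_sectors == 0);
	for (i = 0; i < nr_sectors; i++)
		t[i].ref_tag = (uint32_t)(first_sector + i);
}

/* Bytes of protection data needed for nr_sectors hardware sectors. */
size_t my_integrity_bytes(size_t nr_sectors)
{
	return nr_sectors * sizeof(struct my_tuple);
}
```

With 8-byte tuples, 16 hardware sectors need 128 bytes of protection data, which is the kind of figure a caller would use when sizing the pages passed to bio_integrity_add_page().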
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
|
@@ -30,12 +30,18 @@ write_expire (in ms)
|
|||||||
Similar to read_expire mentioned above, but for writes.
|
Similar to read_expire mentioned above, but for writes.
|
||||||
|
|
||||||
|
|
||||||
fifo_batch
|
fifo_batch (number of requests)
|
||||||
----------
|
----------
|
||||||
|
|
||||||
When a read request expires its deadline, we must move some requests from
|
Requests are grouped into ``batches'' of a particular data direction (read or
|
||||||
the sorted io scheduler list to the block device dispatch queue. fifo_batch
|
write) which are serviced in increasing sector order. To limit extra seeking,
|
||||||
controls how many requests we move.
|
deadline expiries are only checked between batches. fifo_batch controls the
|
||||||
|
maximum number of requests per batch.
|
||||||
|
|
||||||
|
This parameter tunes the balance between per-request latency and aggregate
|
||||||
|
throughput. When low latency is the primary concern, smaller is better (where
|
||||||
|
a value of 1 yields first-come first-served behaviour). Increasing fifo_batch
|
||||||
|
generally improves throughput, at the cost of latency variation.
|
||||||
|
|
||||||
|
|
||||||
writes_starved (number of dispatches)
|
writes_starved (number of dispatches)
|
||||||
|
67
Documentation/bt8xxgpio.txt
Normal file
@@ -0,0 +1,67 @@
|
|||||||
|
===============================================================
|
||||||
|
== BT8XXGPIO driver ==
|
||||||
|
== ==
|
||||||
|
== A driver for a selfmade cheap BT8xx based PCI GPIO-card ==
|
||||||
|
== ==
|
||||||
|
== For advanced documentation, see ==
|
||||||
|
== http://www.bu3sch.de/btgpio.php ==
|
||||||
|
===============================================================
|
||||||
|
|
||||||
|
|
||||||
|
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
||||||
|
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
||||||
|
Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily
|
||||||
|
find them used for low prices on the net.
|
||||||
|
|
||||||
|
The bt8xx chip does have 24 digital GPIO ports.
|
||||||
|
These ports are accessible via 24 pins on the SMD chip package.
|
||||||
|
|
||||||
|
|
||||||
|
==============================================
|
||||||
|
== How to physically access the GPIO pins ==
|
||||||
|
==============================================
|
||||||
|
|
||||||
|
There are several ways to access these pins. One might unsolder the whole chip
|
||||||
|
and put it on a custom PCI board, or one might only unsolder each individual
|
||||||
|
GPIO pin and solder that to some tiny wire. As the chip package really is tiny
|
||||||
|
there are some advanced soldering skills needed in any case.
|
||||||
|
|
||||||
|
The physical pinouts are drawn in the following ASCII art.
|
||||||
|
The GPIO pins are marked with G00-G23
|
||||||
|
|
||||||
|
G G G G G G G G G G G G G G G G G G
|
||||||
|
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
||||||
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
|
||||||
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
--| ^ ^ |--
|
||||||
|
--| pin 86 pin 67 |--
|
||||||
|
--| |--
|
||||||
|
--| pin 61 > |-- G18
|
||||||
|
--| |-- G19
|
||||||
|
--| |-- G20
|
||||||
|
--| |-- G21
|
||||||
|
--| |-- G22
|
||||||
|
--| pin 56 > |-- G23
|
||||||
|
--| |--
|
||||||
|
--| Brooktree 878/879 |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| |--
|
||||||
|
--| O |--
|
||||||
|
--| |--
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
||||||
|
^
|
||||||
|
This is pin 1
|
||||||
|
|
@@ -112,27 +112,18 @@ Hot plug support for SCSI tape drives
|
|||||||
|
|
||||||
Hot plugging of SCSI tape drives is supported, with some caveats.
|
Hot plugging of SCSI tape drives is supported, with some caveats.
|
||||||
The cciss driver must be informed that changes to the SCSI bus
|
The cciss driver must be informed that changes to the SCSI bus
|
||||||
have been made, in addition to and prior to informing the SCSI
|
have been made. This may be done via the /proc filesystem.
|
||||||
mid layer. This may be done via the /proc filesystem. For example:
|
For example:
|
||||||
|
|
||||||
echo "rescan" > /proc/scsi/cciss0/1
|
echo "rescan" > /proc/scsi/cciss0/1
|
||||||
|
|
||||||
This causes the adapter to query the adapter about changes to the
|
This causes the driver to query the adapter about changes to the
|
||||||
physical SCSI buses and/or fibre channel arbitrated loop and the
|
physical SCSI buses and/or fibre channel arbitrated loop and the
|
||||||
driver to make note of any new or removed sequential access devices
|
driver to make note of any new or removed sequential access devices
|
||||||
or medium changers. The driver will output messages indicating what
|
or medium changers. The driver will output messages indicating what
|
||||||
devices have been added or removed and the controller, bus, target and
|
devices have been added or removed and the controller, bus, target and
|
||||||
lun used to address the device. Once this is done, the SCSI mid layer
|
lun used to address the device. It then notifies the SCSI mid layer
|
||||||
can be informed of changes to the virtual SCSI bus which the driver
|
of these changes.
|
||||||
presents to it in the usual way. For example:
|
|
||||||
|
|
||||||
echo scsi add-single-device 3 2 1 0 > /proc/scsi/scsi
|
|
||||||
|
|
||||||
to add a device on controller 3, bus 2, target 1, lun 0. Note that
|
|
||||||
the driver makes an effort to preserve the devices positions
|
|
||||||
in the virtual SCSI bus, so if you are only moving tape drives
|
|
||||||
around on the same adapter and not adding or removing tape drives
|
|
||||||
from the adapter, informing the SCSI mid layer may not be necessary.
|
|
||||||
|
|
||||||
Note that the naming convention of the /proc filesystem entries
|
Note that the naming convention of the /proc filesystem entries
|
||||||
contains a number in addition to the driver name. (E.g. "cciss0"
|
contains a number in addition to the driver name. (E.g. "cciss0"
|
||||||
|
@@ -145,8 +145,7 @@ useful for reading photocds.
|
|||||||
|
|
||||||
To play an audio CD, you should first unmount and remove any data
|
To play an audio CD, you should first unmount and remove any data
|
||||||
CDROM. Any of the CDROM player programs should then work (workman,
|
CDROM. Any of the CDROM player programs should then work (workman,
|
||||||
workbone, cdplayer, etc.). Lacking anything else, you could use the
|
workbone, cdplayer, etc.).
|
||||||
cdtester program in Documentation/cdrom/sbpcd.
|
|
||||||
|
|
||||||
On a few drives, you can read digital audio directly using a program
|
On a few drives, you can read digital audio directly using a program
|
||||||
such as cdda2wav. The only types of drive which I've heard support
|
such as cdda2wav. The only types of drive which I've heard support
|
||||||
|
@@ -390,6 +390,10 @@ If you have several tasks to attach, you have to do it one after another:
|
|||||||
...
|
...
|
||||||
# /bin/echo PIDn > tasks
|
# /bin/echo PIDn > tasks
|
||||||
|
|
||||||
|
You can attach the current shell task by echoing 0:
|
||||||
|
|
||||||
|
# echo 0 > tasks
|
||||||
|
|
||||||
3. Kernel API
|
3. Kernel API
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
@@ -1,133 +0,0 @@
|
|||||||
|
|
||||||
#### cli()/sti() removal guide, started by Ingo Molnar <mingo@redhat.com>
|
|
||||||
|
|
||||||
|
|
||||||
as of 2.5.28, five popular macros have been removed on SMP, and
|
|
||||||
are being phased out on UP:
|
|
||||||
|
|
||||||
cli(), sti(), save_flags(flags), save_flags_cli(flags), restore_flags(flags)
|
|
||||||
|
|
||||||
until now it was possible to protect driver code against interrupt
|
|
||||||
handlers via a cli(), but from now on other, more lightweight methods
|
|
||||||
have to be used for synchronization, such as spinlocks or semaphores.
|
|
||||||
|
|
||||||
for example, driver code that used to do something like:
|
|
||||||
|
|
||||||
struct driver_data;
|
|
||||||
|
|
||||||
irq_handler (...)
|
|
||||||
{
|
|
||||||
....
|
|
||||||
driver_data.finish = 1;
|
|
||||||
driver_data.new_work = 0;
|
|
||||||
....
|
|
||||||
}
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
ioctl_func (...)
|
|
||||||
{
|
|
||||||
...
|
|
||||||
cli();
|
|
||||||
...
|
|
||||||
driver_data.finish = 0;
|
|
||||||
driver_data.new_work = 2;
|
|
||||||
...
|
|
||||||
sti();
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
was SMP-correct because the cli() function ensured that no
|
|
||||||
interrupt handler (amongst them the above irq_handler()) function
|
|
||||||
would execute while the cli()-ed section is executing.
|
|
||||||
|
|
||||||
but from now on a more direct method of locking has to be used:
|
|
||||||
|
|
||||||
DEFINE_SPINLOCK(driver_lock);
|
|
||||||
struct driver_data;
|
|
||||||
|
|
||||||
irq_handler (...)
|
|
||||||
{
|
|
||||||
unsigned long flags;
|
|
||||||
....
|
|
||||||
spin_lock_irqsave(&driver_lock, flags);
|
|
||||||
....
|
|
||||||
driver_data.finish = 1;
|
|
||||||
driver_data.new_work = 0;
|
|
||||||
....
|
|
||||||
spin_unlock_irqrestore(&driver_lock, flags);
|
|
||||||
....
|
|
||||||
}
|
|
||||||
|
|
||||||
...
|
|
||||||
|
|
||||||
ioctl_func (...)
|
|
||||||
{
|
|
||||||
...
|
|
||||||
spin_lock_irq(&driver_lock);
|
|
||||||
...
|
|
||||||
driver_data.finish = 0;
|
|
||||||
driver_data.new_work = 2;
|
|
||||||
...
|
|
||||||
spin_unlock_irq(&driver_lock);
|
|
||||||
...
|
|
||||||
}
|
|
||||||
|
|
||||||
the above code has a number of advantages:
|
|
||||||
|
|
||||||
- the locking relation is easier to understand - actual lock usage
|
|
||||||
pinpoints the critical sections. cli() usage is too opaque.
|
|
||||||
Easier to understand means it's easier to debug.
|
|
||||||
|
|
||||||
- it's faster, because spinlocks are faster to acquire than the
|
|
||||||
potentially heavily-used IRQ lock. Furthermore, your driver does
|
|
||||||
not have to wait eg. for a big heavy SCSI interrupt to finish,
|
|
||||||
because the driver_lock spinlock is only used by your driver.
|
|
||||||
cli() on the other hand was used by many drivers, and extended
|
|
||||||
the critical section to the whole IRQ handler function - creating
|
|
||||||
serious lock contention.
|
|
||||||
|
|
||||||
|
|
||||||
to make the transition easier, we've still kept the cli(), sti(),
|
|
||||||
save_flags(), save_flags_cli() and restore_flags() macros defined
|
|
||||||
on UP systems - but their usage will be phased out until 2.6 is
|
|
||||||
released.
|
|
||||||
|
|
||||||
drivers that want to disable local interrupts (interrupts on the
|
|
||||||
current CPU), can use the following five macros:
|
|
||||||
|
|
||||||
local_irq_disable(), local_irq_enable(), local_save_flags(flags),
|
|
||||||
local_irq_save(flags), local_irq_restore(flags)
|
|
||||||
|
|
||||||
but beware, their meaning and semantics are much simpler, far from
|
|
||||||
that of the old cli(), sti(), save_flags(flags) and restore_flags(flags)
|
|
||||||
SMP meaning:
|
|
||||||
|
|
||||||
local_irq_disable() => turn local IRQs off
|
|
||||||
|
|
||||||
local_irq_enable() => turn local IRQs on
|
|
||||||
|
|
||||||
local_save_flags(flags) => save the current IRQ state into flags. The
|
|
||||||
state can be on or off. (on some
|
|
||||||
architectures there's even more bits in it.)
|
|
||||||
|
|
||||||
local_irq_save(flags) => save the current IRQ state into flags and
|
|
||||||
disable interrupts.
|
|
||||||
|
|
||||||
local_irq_restore(flags) => restore the IRQ state from flags.
|
|
||||||
|
|
||||||
(local_irq_save can save both irqs on and irqs off state, and
|
|
||||||
local_irq_restore can restore into both irqs on and irqs off state.)
|
|
||||||
|
|
||||||
another related change is that synchronize_irq() now takes a parameter:
|
|
||||||
synchronize_irq(irq). This change too has the purpose of making SMP
|
|
||||||
synchronization more lightweight - this way you can wait for your own
|
|
||||||
interrupt handler to finish, no need to wait for other IRQ sources.
|
|
||||||
|
|
||||||
|
|
||||||
why were these changes done? The main reason was the architectural burden
|
|
||||||
of maintaining the cli()/sti() interface - it became a real problem. The
|
|
||||||
new interrupt system is much more streamlined, easier to understand, debug,
|
|
||||||
and it's also a bit faster - the same happened to it that will happen to
|
|
||||||
cli()/sti() using drivers once they convert to spinlocks :-)
|
|
||||||
|
|
11
Documentation/connector/Makefile
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
ifneq ($(CONFIG_CONNECTOR),)
|
||||||
|
obj-m += cn_test.o
|
||||||
|
endif
|
||||||
|
|
||||||
|
# List of programs to build
|
||||||
|
hostprogs-y := ucon
|
||||||
|
|
||||||
|
# Tell kbuild to always build the programs
|
||||||
|
always := $(hostprogs-y)
|
||||||
|
|
||||||
|
HOSTCFLAGS_ucon.o += -I$(objtree)/usr/include
|
@@ -13,7 +13,7 @@ either an integer or * for all. Access is a composition of r
|
|||||||
The root device cgroup starts with rwm to 'all'. A child device
|
The root device cgroup starts with rwm to 'all'. A child device
|
||||||
cgroup gets a copy of the parent. Administrators can then remove
|
cgroup gets a copy of the parent. Administrators can then remove
|
||||||
devices from the whitelist or add new entries. A child cgroup can
|
devices from the whitelist or add new entries. A child cgroup can
|
||||||
never receive a device access which is denied its parent. However
|
never receive a device access which is denied by its parent. However
|
||||||
when a device access is removed from a parent it will not also be
|
when a device access is removed from a parent it will not also be
|
||||||
removed from the child(ren).
|
removed from the child(ren).
|
||||||
|
|
||||||
@@ -29,7 +29,11 @@ allows cgroup 1 to read and mknod the device usually known as
|
|||||||
|
|
||||||
echo a > /cgroups/1/devices.deny
|
echo a > /cgroups/1/devices.deny
|
||||||
|
|
||||||
will remove the default 'a *:* mrw' entry.
|
will remove the default 'a *:* rwm' entry. Doing
|
||||||
|
|
||||||
|
echo a > /cgroups/1/devices.allow
|
||||||
|
|
||||||
|
will add the 'a *:* rwm' entry to the whitelist.
|
||||||
|
|
||||||
3. Security
|
3. Security
|
||||||
|
|
||||||
|
@@ -242,8 +242,7 @@ rmdir() if there are no tasks.
|
|||||||
1. Add support for accounting huge pages (as a separate controller)
|
1. Add support for accounting huge pages (as a separate controller)
|
||||||
2. Make per-cgroup scanner reclaim not-shared pages first
|
2. Make per-cgroup scanner reclaim not-shared pages first
|
||||||
3. Teach controller to account for shared-pages
|
3. Teach controller to account for shared-pages
|
||||||
4. Start reclamation when the limit is lowered
|
4. Start reclamation in the background when the limit is
|
||||||
5. Start reclamation in the background when the limit is
|
|
||||||
not yet hit but the usage is getting closer
|
not yet hit but the usage is getting closer
|
||||||
|
|
||||||
Summary
|
Summary
|
||||||
|
@@ -122,7 +122,7 @@ around '10000' or more.
|
|||||||
show_sampling_rate_(min|max): the minimum and maximum sampling rates
|
show_sampling_rate_(min|max): the minimum and maximum sampling rates
|
||||||
available that you may set 'sampling_rate' to.
|
available that you may set 'sampling_rate' to.
|
||||||
|
|
||||||
up_threshold: defines what the average CPU usaged between the samplings
|
up_threshold: defines what the average CPU usage between the samplings
|
||||||
of 'sampling_rate' needs to be for the kernel to make a decision on
|
of 'sampling_rate' needs to be for the kernel to make a decision on
|
||||||
whether it should increase the frequency. For example when it is set
|
whether it should increase the frequency. For example when it is set
|
||||||
to its default value of '80' it means that between the checking
|
to its default value of '80' it means that between the checking
|
||||||
|
@@ -35,11 +35,9 @@ Mailing List
|
|||||||
------------
|
------------
|
||||||
There is a CPU frequency changing CVS commit and general list where
|
There is a CPU frequency changing CVS commit and general list where
|
||||||
you can report bugs, problems or submit patches. To post a message,
|
you can report bugs, problems or submit patches. To post a message,
|
||||||
send an email to cpufreq@lists.linux.org.uk, to subscribe go to
|
send an email to cpufreq@vger.kernel.org, to subscribe go to
|
||||||
http://lists.linux.org.uk/mailman/listinfo/cpufreq. Previous post to the
|
http://vger.kernel.org/vger-lists.html#cpufreq and follow the
|
||||||
mailing list are available to subscribers at
|
instructions there.
|
||||||
http://lists.linux.org.uk/mailman/private/cpufreq/.
|
|
||||||
|
|
||||||
|
|
||||||
Links
|
Links
|
||||||
-----
|
-----
|
||||||
@@ -50,7 +48,7 @@ how to access the CVS repository:
|
|||||||
* http://cvs.arm.linux.org.uk/
|
* http://cvs.arm.linux.org.uk/
|
||||||
|
|
||||||
the CPUFreq Mailing list:
|
the CPUFreq Mailing list:
|
||||||
* http://lists.linux.org.uk/mailman/listinfo/cpufreq
|
* http://vger.kernel.org/vger-lists.html#cpufreq
|
||||||
|
|
||||||
Clock and voltage scaling for the SA-1100:
|
Clock and voltage scaling for the SA-1100:
|
||||||
* http://www.lartmaker.nl/projects/scaling
|
* http://www.lartmaker.nl/projects/scaling
|
||||||
|
@@ -59,15 +59,10 @@ apicid values in those tables for disabled apics. In the event BIOS doesn't
|
|||||||
mark such hot-pluggable cpus as disabled entries, one could use this
|
mark such hot-pluggable cpus as disabled entries, one could use this
|
||||||
parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map.
|
parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map.
|
||||||
|
|
||||||
s390 uses the number of cpus it detects at IPL time to also the number of bits
|
|
||||||
in cpu_possible_map. If it is desired to add additional cpus at a later time
|
|
||||||
the number should be specified using this option or the possible_cpus option.
|
|
||||||
|
|
||||||
possible_cpus=n [s390 only] use this to set hotpluggable cpus.
|
possible_cpus=n [s390 only] use this to set hotpluggable cpus.
|
||||||
This option sets possible_cpus bits in
|
This option sets possible_cpus bits in
|
||||||
cpu_possible_map. Thus keeping the numbers of bits set
|
cpu_possible_map. Thus keeping the numbers of bits set
|
||||||
constant even if the machine gets rebooted.
|
constant even if the machine gets rebooted.
|
||||||
This option overrides additional_cpus.
|
|
||||||
|
|
||||||
CPU maps and such
|
CPU maps and such
|
||||||
-----------------
|
-----------------
|
||||||
|
@@ -154,13 +154,15 @@ browsing and modifying the cpusets presently known to the kernel. No
|
|||||||
new system calls are added for cpusets - all support for querying and
|
new system calls are added for cpusets - all support for querying and
|
||||||
modifying cpusets is via this cpuset file system.
|
modifying cpusets is via this cpuset file system.
|
||||||
|
|
||||||
The /proc/<pid>/status file for each task has two added lines,
|
The /proc/<pid>/status file for each task has four added lines,
|
||||||
displaying the task's cpus_allowed (on which CPUs it may be scheduled)
|
displaying the task's cpus_allowed (on which CPUs it may be scheduled)
|
||||||
and mems_allowed (on which Memory Nodes it may obtain memory),
|
and mems_allowed (on which Memory Nodes it may obtain memory),
|
||||||
in the format seen in the following example:
|
in the two formats seen in the following example:
|
||||||
|
|
||||||
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
|
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
|
||||||
|
Cpus_allowed_list: 0-127
|
||||||
Mems_allowed: ffffffff,ffffffff
|
Mems_allowed: ffffffff,ffffffff
|
||||||
|
Mems_allowed_list: 0-63
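
The relation between the two formats in the example can be sketched in userspace C. This is only an illustration, assuming the mask is written as comma-separated 32-bit hex groups with the most significant group first; mask_to_list and MAX_BITS are hypothetical names, not part of any kernel interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BITS 1024	/* arbitrary limit for this sketch */

/*
 * Convert a mask such as "ffffffff,ffffffff" into a range list such as
 * "0-63". Returns 0 on success, -1 on malformed input or if out is too
 * small to hold the result.
 */
int mask_to_list(const char *mask, char *out, size_t outlen)
{
	unsigned char bits[MAX_BITS] = {0};
	size_t ngroups = 1, nbits, used = 0, i, k;
	const char *p;

	assert(mask != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	out[0] = '\0';

	/* One group per comma, plus one. */
	for (p = mask; *p; p++)
		if (*p == ',')
			ngroups++;
	nbits = ngroups * 32;
	if (nbits > MAX_BITS)
		return -1;

	/* Group k (from the left) covers bits starting at (ngroups-1-k)*32. */
	p = mask;
	for (k = 0; k < ngroups; k++) {
		char *end;
		unsigned long word = strtoul(p, &end, 16);
		size_t base = (ngroups - 1 - k) * 32;

		if (end == p || word > 0xffffffffUL ||
		    (*end != ',' && *end != '\0'))
			return -1;
		for (i = 0; i < 32; i++)
			bits[base + i] = (word >> i) & 1;
		p = (*end == ',') ? end + 1 : end;
	}

	/* Emit each run of set bits as "a" or "a-b", comma separated. */
	for (i = 0; i < nbits; i++) {
		size_t start = i;
		int n;

		if (!bits[i])
			continue;
		while (i + 1 < nbits && bits[i + 1])
			i++;
		if (start == i)
			n = snprintf(out + used, outlen - used, "%s%zu",
				     used ? "," : "", start);
		else
			n = snprintf(out + used, outlen - used, "%s%zu-%zu",
				     used ? "," : "", start, i);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

Fed the Cpus_allowed value from the example, the helper produces the corresponding Cpus_allowed_list value; the same holds for the Mems_allowed pair.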
|
||||||
|
|
||||||
Each cpuset is represented by a directory in the cgroup file system
|
Each cpuset is represented by a directory in the cgroup file system
|
||||||
containing (on top of the standard cgroup files) the following
|
containing (on top of the standard cgroup files) the following
|
||||||
@@ -544,6 +546,9 @@ otherwise initial value -1 that indicates the cpuset has no request.
|
|||||||
( 4 : search nodes in a chunk of node [on NUMA system] )
|
( 4 : search nodes in a chunk of node [on NUMA system] )
|
||||||
( 5 : search system wide [on NUMA system] )
|
( 5 : search system wide [on NUMA system] )
|
||||||
|
|
||||||
|
The system default is architecture dependent. The system default
|
||||||
|
can be changed using the relax_domain_level= boot parameter.
|
||||||
|
|
||||||
This file is per-cpuset and affects the sched domain where the cpuset
|
This file is per-cpuset and affects the sched domain where the cpuset
|
||||||
belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
|
belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
|
||||||
is disabled, then 'sched_relax_domain_level' has no effect since
|
is disabled, then 'sched_relax_domain_level' has no effect since
|
||||||
@@ -630,14 +635,16 @@ prior 'mems' setting, will not be moved.
|
|||||||
|
|
||||||
There is an exception to the above. If hotplug functionality is used
|
There is an exception to the above. If hotplug functionality is used
|
||||||
to remove all the CPUs that are currently assigned to a cpuset,
|
to remove all the CPUs that are currently assigned to a cpuset,
|
||||||
then the kernel will automatically update the cpus_allowed of all
|
then all the tasks in that cpuset will be moved to the nearest ancestor
|
||||||
tasks attached to CPUs in that cpuset to allow all CPUs. When memory
|
with non-empty cpus. But the moving of some (or all) tasks might fail if
|
||||||
hotplug functionality for removing Memory Nodes is available, a
|
cpuset is bound with another cgroup subsystem which has some restrictions
|
||||||
similar exception is expected to apply there as well. In general,
|
on task attaching. In this failing case, those tasks will stay
|
||||||
the kernel prefers to violate cpuset placement, over starving a task
|
in the original cpuset, and the kernel will automatically update
|
||||||
that has had all its allowed CPUs or Memory Nodes taken offline. User
|
their cpus_allowed to allow all online CPUs. When memory hotplug
|
||||||
code should reconfigure cpusets to only refer to online CPUs and Memory
|
functionality for removing Memory Nodes is available, a similar exception
|
||||||
Nodes when using hotplug to add or remove such resources.
|
is expected to apply there as well. In general, the kernel prefers to
|
||||||
|
violate cpuset placement, over starving a task that has had all
|
||||||
|
its allowed CPUs or Memory Nodes taken offline.
|
||||||
|
|
||||||
There is a second exception to the above. GFP_ATOMIC requests are
|
There is a second exception to the above. GFP_ATOMIC requests are
|
||||||
kernel internal allocations that must be satisfied, immediately.
|
kernel internal allocations that must be satisfied, immediately.
|
||||||
|
@@ -14,9 +14,8 @@ represent the thread siblings to cpu X in the same physical package;
|
|||||||
To implement it in an architecture-neutral way, a new source file,
|
To implement it in an architecture-neutral way, a new source file,
|
||||||
drivers/base/topology.c, is to export the 4 attributes.
|
drivers/base/topology.c, is to export the 4 attributes.
|
||||||
|
|
||||||
If one architecture wants to support this feature, it just needs to
|
For an architecture to support this feature, it must define some of
|
||||||
implement 4 defines, typically in file include/asm-XXX/topology.h.
|
these macros in include/asm-XXX/topology.h:
|
||||||
The 4 defines are:
|
|
||||||
#define topology_physical_package_id(cpu)
|
#define topology_physical_package_id(cpu)
|
||||||
#define topology_core_id(cpu)
|
#define topology_core_id(cpu)
|
||||||
#define topology_thread_siblings(cpu)
|
#define topology_thread_siblings(cpu)
|
||||||
@@ -25,17 +24,10 @@ The 4 defines are:
|
|||||||
The type of **_id is int.
|
The type of **_id is int.
|
||||||
The type of siblings is cpumask_t.
|
The type of siblings is cpumask_t.
|
||||||
|
|
||||||
To be consistent on all architectures, the 4 attributes should have
|
To be consistent on all architectures, include/linux/topology.h
|
||||||
default values if their values are unavailable. Below is the rule.
|
provides default definitions for any of the above macros that are
|
||||||
1) physical_package_id: If cpu has no physical package id, -1 is the
|
not defined by include/asm-XXX/topology.h:
|
||||||
default value.
|
1) physical_package_id: -1
|
||||||
2) core_id: If cpu doesn't support multi-core, its core id is 0.
|
2) core_id: 0
|
||||||
3) thread_siblings: Just include itself, if the cpu doesn't support
|
3) thread_siblings: just the given CPU
|
||||||
HT/multi-thread.
|
4) core_siblings: just the given CPU
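
The fallback scheme described in the new text can be sketched with plain preprocessor guards. The following is a userspace approximation, not the contents of include/linux/topology.h: my_cpumask_t and my_cpumask_of_cpu() are hypothetical stand-ins for cpumask_t, and the topology_core_id() override plays the role of a definition supplied by an architecture header.

```c
#include <assert.h>

/* Stand-in for cpumask_t: one bit per CPU, up to the width of unsigned long. */
typedef unsigned long my_cpumask_t;
#define my_cpumask_of_cpu(cpu)	(1UL << (cpu))

/* Hypothetical architecture override: two threads share each core. */
#define topology_core_id(cpu)	((cpu) / 2)

/* Generic fallbacks, applied only to macros that are still undefined. */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
#endif
#ifndef topology_core_id
#define topology_core_id(cpu)			((void)(cpu), 0)
#endif
#ifndef topology_thread_siblings
#define topology_thread_siblings(cpu)		my_cpumask_of_cpu(cpu)
#endif
#ifndef topology_core_siblings
#define topology_core_siblings(cpu)		my_cpumask_of_cpu(cpu)
#endif

/* Quick self-check of the values the fallbacks produce for one CPU. */
void my_check_defaults(int cpu)
{
	my_cpumask_t self = my_cpumask_of_cpu(cpu);

	assert(topology_physical_package_id(cpu) == -1);
	assert(topology_thread_siblings(cpu) == self);
	assert(topology_core_siblings(cpu) == self);
}
```

Because each fallback is wrapped in #ifndef, the override for topology_core_id() wins while the other three macros pick up the generic values, so an architecture only has to define the macros it can actually answer.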
|
||||||
4) core_siblings: Just include itself, if the cpu doesn't support
|
|
||||||
multi-core and HT/Multi-thread.
|
|
||||||
|
|
||||||
So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
|
|
||||||
|
|
||||||
If an attribute isn't defined on an architecture, it won't be exported.
|
|
||||||
|
|
||||||
|
@@ -2560,9 +2560,6 @@ Your cooperation is appreciated.
|
|||||||
96 = /dev/usb/hiddev0 1st USB HID device
|
96 = /dev/usb/hiddev0 1st USB HID device
|
||||||
...
|
...
|
||||||
111 = /dev/usb/hiddev15 16th USB HID device
|
111 = /dev/usb/hiddev15 16th USB HID device
|
||||||
112 = /dev/usb/auer0 1st auerswald ISDN device
|
|
||||||
...
|
|
||||||
127 = /dev/usb/auer15 16th auerswald ISDN device
|
|
||||||
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
||||||
...
|
...
|
||||||
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
||||||
|
@@ -5,6 +5,8 @@
|
|||||||
*.css
|
*.css
|
||||||
*.dvi
|
*.dvi
|
||||||
*.eps
|
*.eps
|
||||||
|
*.fw.gen.S
|
||||||
|
*.fw
|
||||||
*.gif
|
*.gif
|
||||||
*.grep
|
*.grep
|
||||||
*.grp
|
*.grp
|
||||||
|
@@ -222,74 +222,9 @@ both csrow2 and csrow3 are populated, this indicates a dual ranked
|
|||||||
set of DIMMs for channels 0 and 1.
|
set of DIMMs for channels 0 and 1.
|
||||||
|
|
||||||
|
|
||||||
Within each of the 'mc','mcX' and 'csrowX' directories are several
|
Within each of the 'mcX' and 'csrowX' directories are several
|
||||||
EDAC control and attribute files.
|
EDAC control and attribute files.
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
|
||||||
DIRECTORY 'mc'
|
|
||||||
|
|
||||||
In directory 'mc' are EDAC system overall control and attribute files:
|
|
||||||
|
|
||||||
|
|
||||||
Panic on UE control file:
|
|
||||||
|
|
||||||
'edac_mc_panic_on_ue'
|
|
||||||
|
|
||||||
An uncorrectable error will cause a machine panic. This is usually
|
|
||||||
desirable. It is a bad idea to continue when an uncorrectable error
|
|
||||||
occurs - it is indeterminate what was uncorrected and the operating
|
|
||||||
system context might be so mangled that continuing will lead to further
|
|
||||||
corruption. If the kernel has MCE configured, then EDAC will never
|
|
||||||
notice the UE.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_panic_on_ue
|
|
||||||
|
|
||||||
|
|
||||||
Log UE control file:
|
|
||||||
|
|
||||||
'edac_mc_log_ue'
|
|
||||||
|
|
||||||
Generate kernel messages describing uncorrectable errors. These errors
|
|
||||||
are reported through the system message log system. UE statistics
|
|
||||||
will be accumulated even when UE logging is disabled.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: log_ue=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ue
|
|
||||||
|
|
||||||
|
|
||||||
Log CE control file:
|
|
||||||
|
|
||||||
'edac_mc_log_ce'
|
|
||||||
|
|
||||||
Generate kernel messages describing correctable errors. These
|
|
||||||
errors are reported through the system message log system.
|
|
||||||
CE statistics will be accumulated even when CE logging is disabled.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: log_ce=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1" >/sys/devices/system/edac/mc/edac_mc_log_ce
|
|
||||||
|
|
||||||
|
|
||||||
Polling period control file:
|
|
||||||
|
|
||||||
'edac_mc_poll_msec'
|
|
||||||
|
|
||||||
The time period, in milliseconds, for polling for error information.
|
|
||||||
Too small a value wastes resources. Too large a value might delay
|
|
||||||
necessary handling of errors and might lose valuable information for
|
|
||||||
locating the error. 1000 milliseconds (once each second) is the current
|
|
||||||
default. Systems which require all the bandwidth they can get, may
|
|
||||||
increase this.
|
|
||||||
|
|
||||||
LOAD TIME: module/kernel parameter: poll_msec=[0|1]
|
|
||||||
|
|
||||||
RUN TIME: echo "1000" >/sys/devices/system/edac/mc/edac_mc_poll_msec
|
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
'mcX' DIRECTORIES
|
'mcX' DIRECTORIES
|
||||||
|
|
||||||
@@ -392,7 +327,7 @@ Sdram memory scrubbing rate:
|
|||||||
'sdram_scrub_rate'
|
'sdram_scrub_rate'
|
||||||
|
|
||||||
Read/Write attribute file that controls memory scrubbing. The scrubbing
|
Read/Write attribute file that controls memory scrubbing. The scrubbing
|
||||||
rate is set by writing a minimum bandwith in bytes/sec to the attribute
|
rate is set by writing a minimum bandwidth in bytes/sec to the attribute
|
||||||
file. The rate will be translated to an internal value that gives at
|
file. The rate will be translated to an internal value that gives at
|
||||||
least the specified rate.
|
least the specified rate.
|
||||||
|
|
||||||
@@ -537,7 +472,6 @@ Channel 1 DIMM Label control file:
|
|||||||
motherboard specific and determination of this information
|
motherboard specific and determination of this information
|
||||||
must occur in userland at this time.
|
must occur in userland at this time.
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
SYSTEM LOGGING
|
SYSTEM LOGGING
|
||||||
|
|
||||||
@@ -570,7 +504,6 @@ error type, a notice of "no info" and then an optional,
|
|||||||
driver-specific error message.
|
driver-specific error message.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
============================================================================
|
============================================================================
|
||||||
PCI Bus Parity Detection
|
PCI Bus Parity Detection
|
||||||
|
|
||||||
@@ -604,6 +537,74 @@ Enable/Disable PCI Parity checking control file:
|
|||||||
echo "0" >/sys/devices/system/edac/pci/check_pci_parity
|
echo "0" >/sys/devices/system/edac/pci/check_pci_parity
|
||||||
|
|
||||||
|
|
||||||
|
Parity Count:
|
||||||
|
|
||||||
|
'pci_parity_count'
|
||||||
|
|
||||||
|
This attribute file will display the number of parity errors that
|
||||||
|
have been detected.
|
||||||
|
|
||||||
|
|
||||||
|
============================================================================
|
||||||
|
MODULE PARAMETERS
|
||||||
|
|
||||||
|
Panic on UE control file:
|
||||||
|
|
||||||
|
'edac_mc_panic_on_ue'
|
||||||
|
|
||||||
|
An uncorrectable error will cause a machine panic. This is usually
|
||||||
|
desirable. It is a bad idea to continue when an uncorrectable error
|
||||||
|
occurs - it is indeterminate what was uncorrected and the operating
|
||||||
|
system context might be so mangled that continuing will lead to further
|
||||||
|
corruption. If the kernel has MCE configured, then EDAC will never
|
||||||
|
notice the UE.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_panic_on_ue=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue
|
||||||
|
|
||||||
|
|
||||||
|
Log UE control file:
|
||||||
|
|
||||||
|
'edac_mc_log_ue'
|
||||||
|
|
||||||
|
Generate kernel messages describing uncorrectable errors. These errors
|
||||||
|
are reported through the system message log system. UE statistics
|
||||||
|
will be accumulated even when UE logging is disabled.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_log_ue=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ue
|
||||||
|
|
||||||
|
|
||||||
|
Log CE control file:
|
||||||
|
|
||||||
|
'edac_mc_log_ce'
|
||||||
|
|
||||||
|
Generate kernel messages describing correctable errors. These
|
||||||
|
errors are reported through the system message log system.
|
||||||
|
CE statistics will be accumulated even when CE logging is disabled.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_log_ce=[0|1]
|
||||||
|
|
||||||
|
RUN TIME: echo "1" > /sys/module/edac_core/parameters/edac_mc_log_ce
|
||||||
|
|
||||||
|
|
||||||
|
Polling period control file:
|
||||||
|
|
||||||
|
'edac_mc_poll_msec'
|
||||||
|
|
||||||
|
The time period, in milliseconds, for polling for error information.
|
||||||
|
Too small a value wastes resources. Too large a value might delay
|
||||||
|
necessary handling of errors and might lose valuable information for
|
||||||
|
locating the error. 1000 milliseconds (once each second) is the current
|
||||||
|
default. Systems which require all the bandwidth they can get, may
|
||||||
|
increase this.
|
||||||
|
|
||||||
|
LOAD TIME: module/kernel parameter: edac_mc_poll_msec=[msec]
|
||||||
|
|
||||||
|
RUN TIME: echo "1000" > /sys/module/edac_core/parameters/edac_mc_poll_msec
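The run-time files above hold a single integer, so a userspace tool can
read the current setting back with a few lines of C. The following is
only a sketch: read_param_long() is a hypothetical helper, not part of
EDAC or any library, and it assumes the file contains one decimal value.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single decimal integer from a
 * sysfs-style parameter file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_param_long(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_param_long() with the path
/sys/module/edac_core/parameters/edac_mc_poll_msec would then return
the polling period in milliseconds, provided the module is loaded.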
|
||||||
|
|
||||||
|
|
||||||
Panic on PCI PARITY Error:
|
Panic on PCI PARITY Error:
|
||||||
|
|
||||||
@@ -614,21 +615,13 @@ Panic on PCI PARITY Error:
|
|||||||
error has been detected.
|
error has been detected.
|
||||||
|
|
||||||
|
|
||||||
module/kernel parameter: panic_on_pci_parity=[0|1]
|
module/kernel parameter: edac_panic_on_pci_pe=[0|1]
|
||||||
|
|
||||||
Enable:
|
Enable:
|
||||||
echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity
|
echo "1" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe
|
||||||
|
|
||||||
Disable:
|
Disable:
|
||||||
echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity
|
echo "0" > /sys/module/edac_core/parameters/edac_panic_on_pci_pe
|
||||||
|
|
||||||
|
|
||||||
Parity Count:
|
|
||||||
|
|
||||||
'pci_parity_count'
|
|
||||||
|
|
||||||
This attribute file will display the number of parity errors that
|
|
||||||
have been detected.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
131
Documentation/fb/sh7760fb.txt
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
SH7760/SH7763 integrated LCDC Framebuffer driver
|
||||||
|
================================================
|
||||||
|
|
||||||
|
0. Overview
|
||||||
|
-----------
|
||||||
|
The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
|
||||||
|
supports (in theory) resolutions ranging from 1x1 to 1024x1024,
|
||||||
|
with color depths ranging from 1 to 16 bits, on STN, DSTN and TFT Panels.
|
||||||
|
|
||||||
|
Caveats:
|
||||||
|
* Framebuffer memory must be a large chunk allocated at the top
|
||||||
|
of Area3 (HW requirement). Because of this requirement you should NOT
|
||||||
|
make the driver a module since at runtime it may become impossible to
|
||||||
|
get a large enough contiguous chunk of memory.
|
||||||
|
|
||||||
|
* The driver does not support changing resolution while loaded
|
||||||
|
(displays aren't hotpluggable anyway)
|
||||||
|
|
||||||
|
* Heavy flickering may be observed
|
||||||
|
a) if you're using 15/16bit color modes at >= 640x480 px resolutions,
|
||||||
|
b) during PCMCIA (or any other slow bus) activity.
|
||||||
|
|
||||||
|
* Rotation works only 90 degrees clockwise, and only if horizontal
|
||||||
|
resolution is <= 320 pixels.
|
||||||
|
|
||||||
|
files: drivers/video/sh7760fb.c
|
||||||
|
include/asm-sh/sh7760fb.h
|
||||||
|
Documentation/fb/sh7760fb.txt
|
||||||
|
|
||||||
|
1. Platform setup
|
||||||
|
-----------------
|
||||||
|
SH7760:
|
||||||
|
Video data is fetched via the DMABRG DMA engine, so you have to
|
||||||
|
configure the SH DMAC for DMABRG mode (write 0x94808080 to the
|
||||||
|
DMARSRA register somewhere at boot).
|
||||||
|
|
||||||
|
PFC registers PCCR and PCDR must be set to peripheral mode.
|
||||||
|
(write zeros to both).
|
||||||
|
|
||||||
|
The driver does NOT do the above for you since board setup is, well, the job
|
||||||
|
of the board setup code.
|
||||||
|
|
||||||
|
2. Panel definitions
|
||||||
|
--------------------
|
||||||
|
The LCDC must explicitly be told about the type of LCD panel
|
||||||
|
attached. Data must be wrapped in a "struct sh7760fb_platdata" and
|
||||||
|
passed to the driver as platform_data.
|
||||||
|
|
||||||
|
Suggest you take a closer look at the SH7760 Manual, Section 30.
|
||||||
|
(http://documentation.renesas.com/eng/products/mpumcu/e602291_sh7760.pdf)
|
||||||
|
|
||||||
|
The following code illustrates what needs to be done to
|
||||||
|
get the framebuffer working on a 640x480 TFT:
|
||||||
|
|
||||||
|
====================== cut here ======================================
|
||||||
|
|
||||||
|
#include <linux/fb.h>
|
||||||
|
#include <asm/sh7760fb.h>
|
||||||
|
|
||||||
|
/*
|
||||||
|
* NEC NL6448BC26-01 640x480 TFT
|
||||||
|
* dotclock 25175 kHz
|
||||||
|
* Xres 640 Yres 480
|
||||||
|
* Htotal 800 Vtotal 525
|
||||||
|
* HsynStart 656 VsynStart 490
|
||||||
|
* HsynLen 30 VsynLen 2
|
||||||
|
*
|
||||||
|
* The linux framebuffer layer does not use the syncstart/synclen
|
||||||
|
* values but right/left/upper/lower margin values. The comments
|
||||||
|
* for the x_margin explain how to calculate those from given
|
||||||
|
* panel sync timings.
|
||||||
|
*/
|
||||||
|
static struct fb_videomode nl6448bc26 = {
|
||||||
|
.name = "NL6448BC26",
|
||||||
|
.refresh = 60,
|
||||||
|
.xres = 640,
|
||||||
|
.yres = 480,
|
||||||
|
.pixclock = 39683, /* in picoseconds! */
|
||||||
|
.hsync_len = 30,
|
||||||
|
.vsync_len = 2,
|
||||||
|
.left_margin = 114, /* HTOT - (HSYNLEN + HSYNSTART) */
|
||||||
|
.right_margin = 16, /* HSYNSTART - XRES */
|
||||||
|
.upper_margin = 33, /* VTOT - (VSYNLEN + VSYNSTART) */
|
||||||
|
.lower_margin = 10, /* VSYNSTART - YRES */
|
||||||
|
.sync = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
|
||||||
|
.vmode = FB_VMODE_NONINTERLACED,
|
||||||
|
.flag = 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct sh7760fb_platdata sh7760fb_nl6448 = {
|
||||||
|
.def_mode = &nl6448bc26,
|
||||||
|
.ldmtr = LDMTR_TFT_COLOR_16, /* 16bit TFT panel */
|
||||||
|
.lddfr = LDDFR_8BPP, /* we want 8bit output */
|
||||||
|
.ldpmmr = 0x0070,
|
||||||
|
.ldpspr = 0x0500,
|
||||||
|
.ldaclnr = 0,
|
||||||
|
.ldickr = LDICKR_CLKSRC(LCDC_CLKSRC_EXTERNAL) |
|
||||||
|
LDICKR_CLKDIV(1),
|
||||||
|
.rotate = 0,
|
||||||
|
.novsync = 1,
|
||||||
|
.blank = NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
/* SH7760:
|
||||||
|
* 0xFE300800: 256 * 4byte xRGB palette ram
|
||||||
|
* 0xFE300C00: 42 bytes ctrl registers
|
||||||
|
*/
|
||||||
|
static struct resource sh7760_lcdc_res[] = {
|
||||||
|
[0] = {
|
||||||
|
.start = 0xFE300800,
|
||||||
|
.end = 0xFE300CFF,
|
||||||
|
.flags = IORESOURCE_MEM,
|
||||||
|
},
|
||||||
|
[1] = {
|
||||||
|
.start = 65,
|
||||||
|
.end = 65,
|
||||||
|
.flags = IORESOURCE_IRQ,
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct platform_device sh7760_lcdc_dev = {
|
||||||
|
.dev = {
|
||||||
|
.platform_data = &sh7760fb_nl6448,
|
||||||
|
},
|
||||||
|
.name = "sh7760-lcdc",
|
||||||
|
.id = -1,
|
||||||
|
.resource = sh7760_lcdc_res,
|
||||||
|
.num_resources = ARRAY_SIZE(sh7760_lcdc_res),
|
||||||
|
};
|
||||||
|
|
||||||
|
====================== cut here ======================================
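The margin comments in the listing above can be turned into a small
helper that derives the framebuffer margins from datasheet-style sync
timings. This is a userspace sketch under stated assumptions: struct
panel_timing, struct fb_margins and timing_to_margins() are
hypothetical names made up for illustration and are not kernel types.

```c
#include <stdint.h>

/* Hypothetical container for datasheet-style panel timings. */
struct panel_timing {
	uint32_t dotclock_khz;
	uint32_t xres, yres;
	uint32_t htotal, vtotal;
	uint32_t hsync_start, vsync_start;
	uint32_t hsync_len, vsync_len;
};

/* Hypothetical result type: the values struct fb_videomode expects. */
struct fb_margins {
	uint32_t left, right, upper, lower;
	uint32_t pixclock_ps;	/* pixel period in picoseconds */
};

static struct fb_margins timing_to_margins(const struct panel_timing *t)
{
	struct fb_margins m;

	m.left  = t->htotal - (t->hsync_len + t->hsync_start);
	m.right = t->hsync_start - t->xres;
	m.upper = t->vtotal - (t->vsync_len + t->vsync_start);
	m.lower = t->vsync_start - t->yres;
	/* 1 kHz corresponds to a period of 10^9 ps */
	m.pixclock_ps = (uint32_t)(1000000000ULL / t->dotclock_khz);
	return m;
}
```

With the timings from the comment (Htotal 800, Vtotal 525, HsynStart
656, VsynStart 490, sync lengths 30 and 2) this yields margins of 114,
16, 33 and 10, matching the listing. For a 25175 kHz dot clock the
integer division gives 39721 ps; the 39683 used in the listing
corresponds to a dot clock of roughly 25.2 MHz.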
|
@@ -3,11 +3,25 @@ Tridentfb is a framebuffer driver for some Trident chip based cards.
|
|||||||
The following list of chips is thought to be supported although not all are
|
The following list of chips is thought to be supported although not all are
|
||||||
tested:
|
tested:
|
||||||
|
|
||||||
those from the Image series with Cyber in their names - accelerated
|
those from the TGUI series 9440/96XX and with Cyber in their names
|
||||||
those with Blade in their names (Blade3D,CyberBlade...) - accelerated
|
those from the Image series and with Cyber in their names
|
||||||
the newer CyberBladeXP family - nonaccelerated
|
those with Blade in their names (Blade3D,CyberBlade...)
|
||||||
|
the newer CyberBladeXP family
|
||||||
|
|
||||||
Only PCI/AGP based cards are supported, none of the older Tridents.
|
All families are accelerated. Only PCI/AGP based cards are supported,
|
||||||
|
none of the older Tridents.
|
||||||
|
The driver supports 8, 16 and 32 bits per pixel depths.
|
||||||
|
The TGUI family requires the line length to be a power of 2 if acceleration
|
||||||
|
is enabled. This means that the range of possible resolutions and bpp is
|
||||||
|
limited compared to the range if acceleration is disabled (see list
|
||||||
|
of parameters below).
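As a rough illustration of the power-of-2 restriction, the check below
tests whether a given mode would qualify. It is a sketch only: it
assumes the line length is counted in bytes and equals xres * bpp / 8
with no padding, which the text above does not spell out, and the
function names are hypothetical rather than taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed line length in bytes for a packed-pixel mode (bpp 8, 16 or 32). */
static uint32_t line_length_bytes(uint32_t xres, uint32_t bpp)
{
	return xres * (bpp / 8);
}

/* True if v has exactly one bit set. */
static bool is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical check: may this mode be used with acceleration on TGUI? */
static bool tgui_accel_mode_ok(uint32_t xres, uint32_t bpp)
{
	return is_power_of_two(line_length_bytes(xres, bpp));
}
```

Under these assumptions 1024x768 at 8 bpp (1024 bytes per line) would
pass, while 800x600 at 16 bpp (1600 bytes per line) would not.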
|
||||||
|
|
||||||
|
Known bugs:
|
||||||
|
1. The driver randomly locks up on 3DImage975 chip with acceleration
|
||||||
|
enabled. The same happens in X11 (Xorg).
|
||||||
|
2. The ramdac speeds require some more fine tuning. It is possible to
|
||||||
|
switch to a resolution which the chip does not support at some depths for
|
||||||
|
older chips.
|
||||||
|
|
||||||
How to use it?
|
How to use it?
|
||||||
==============
|
==============
|
||||||
@@ -17,12 +31,11 @@ video=tridentfb
|
|||||||
|
|
||||||
The parameters for tridentfb are concatenated with a ':' as in this example.
|
The parameters for tridentfb are concatenated with a ':' as in this example.
|
||||||
|
|
||||||
video=tridentfb:800x600,bpp=16,noaccel
|
video=tridentfb:800x600-16@75,noaccel
|
||||||
|
|
||||||
The second level parameters that tridentfb understands are:
|
The second level parameters that tridentfb understands are:
|
||||||
|
|
||||||
noaccel - turns off acceleration (when it doesn't work for your card)
|
noaccel - turns off acceleration (when it doesn't work for your card)
|
||||||
accel - force text acceleration (for boards which by default are noacceled)
|
|
||||||
|
|
||||||
fp - use flat panel related stuff
|
fp - use flat panel related stuff
|
||||||
crt - assume monitor is present instead of fp
|
crt - assume monitor is present instead of fp
|
||||||
@@ -31,21 +44,24 @@ center - for flat panels and resolutions smaller than native size center the
|
|||||||
image, otherwise use
|
image, otherwise use
|
||||||
stretch
|
stretch
|
||||||
|
|
||||||
memsize - integer value in Kb, use if your card's memory size is misdetected.
|
memsize - integer value in KB, use if your card's memory size is misdetected.
|
||||||
look at the driver output to see what it says when initializing.
|
look at the driver output to see what it says when initializing.
|
||||||
memdiff - integer value in Kb,should be nonzero if your card reports
|
|
||||||
more memory than it actually has.For instance mine is 192K less than
|
memdiff - integer value in KB, should be nonzero if your card reports
|
||||||
|
more memory than it actually has. For instance mine is 192K less than
|
||||||
detection says in all three BIOS selectable situations 2M, 4M, 8M.
|
detection says in all three BIOS selectable situations 2M, 4M, 8M.
|
||||||
Only use if your video memory is taken from main memory hence of
|
Only use if your video memory is taken from main memory hence of
|
||||||
configurable size.Otherwise use memsize.
|
configurable size. Otherwise use memsize.
|
||||||
If in some modes which barely fit the memory you see garbage at the bottom
|
If in some modes which barely fit the memory you see garbage
|
||||||
this might help by not letting change to that mode anymore.
|
at the bottom this might help by not letting change to that mode
|
||||||
|
anymore.
|
||||||
|
|
||||||
nativex - the width in pixels of the flat panel. If you know it (usually 1024
|
nativex - the width in pixels of the flat panel. If you know it (usually 1024
|
||||||
800 or 1280) and it is not what the driver seems to detect use it.
|
800 or 1280) and it is not what the driver seems to detect use it.
|
||||||
|
|
||||||
bpp - bits per pixel (8,16 or 32)
|
bpp - bits per pixel (8,16 or 32)
|
||||||
mode - a mode name like 800x600 (as described in Documentation/fb/modedb.txt)
|
mode - a mode name like 800x600-8@75 as described in
|
||||||
|
Documentation/fb/modedb.txt
|
||||||
|
|
||||||
Using insane values for the above parameters will probably result in driver
|
Using insane values for the above parameters will probably result in driver
|
||||||
misbehaviour so take care (for instance memsize=12345678 or memdiff=23784 or
|
misbehaviour so take care (for instance memsize=12345678 or memdiff=23784 or
|
||||||
|
@@ -6,6 +6,24 @@ be removed from this file.
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
What: old static regulatory information and ieee80211_regdom module parameter
|
||||||
|
When: 2.6.29
|
||||||
|
Why: The old regulatory infrastructure has been replaced with a new one
|
||||||
|
which does not require statically defined regulatory domains. We do
|
||||||
|
not want to keep static regulatory domains in the kernel due to the
|
||||||
|
dynamic nature of regulatory law and localization. We kept around
|
||||||
|
the old static definitions for the regulatory domains of:
|
||||||
|
* US
|
||||||
|
* JP
|
||||||
|
* EU
|
||||||
|
and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was
|
||||||
|
set. We also kept around the ieee80211_regdom module parameter in case
|
||||||
|
some applications were relying on it. Changing regulatory domains
|
||||||
|
can now be done instead by using nl80211, as is done with iw.
|
||||||
|
Who: Luis R. Rodriguez <lrodriguez@atheros.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
What: dev->power.power_state
|
What: dev->power.power_state
|
||||||
When: July 2007
|
When: July 2007
|
||||||
Why: Broken design for runtime control over driver power states, confusing
|
Why: Broken design for runtime control over driver power states, confusing
|
||||||
@@ -19,15 +37,6 @@ Who: Pavel Machek <pavel@suse.cz>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: old NCR53C9x driver
|
|
||||||
When: October 2007
|
|
||||||
Why: Replaced by the much better esp_scsi driver. Actual low-level
|
|
||||||
driver can be ported over almost trivially.
|
|
||||||
Who: David Miller <davem@davemloft.net>
|
|
||||||
Christoph Hellwig <hch@lst.de>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What: Video4Linux API 1 ioctls and video_decoder.h from Video devices.
|
What: Video4Linux API 1 ioctls and video_decoder.h from Video devices.
|
||||||
When: December 2008
|
When: December 2008
|
||||||
Files: include/linux/video_decoder.h include/linux/videodev.h
|
Files: include/linux/video_decoder.h include/linux/videodev.h
|
||||||
@@ -47,6 +56,30 @@ Who: Mauro Carvalho Chehab <mchehab@infradead.org>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
What: old tuner-3036 i2c driver
|
||||||
|
When: 2.6.28
|
||||||
|
Why: This driver is for VERY old i2c-over-parallel port teletext receiver
|
||||||
|
boxes. Rather than spending effort on converting this driver to V4L2,
|
||||||
|
and since it is extremely unlikely that anyone still uses one of these
|
||||||
|
devices, it was decided to drop it.
|
||||||
|
Who: Hans Verkuil <hverkuil@xs4all.nl>
|
||||||
|
Mauro Carvalho Chehab <mchehab@infradead.org>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: V4L2 dpc7146 driver
|
||||||
|
When: 2.6.28
|
||||||
|
Why: Old driver for the dpc7146 demonstration board that is no longer
|
||||||
|
relevant. The last time this was tested on actual hardware was
|
||||||
|
probably around 2002. Since this is a driver for a demonstration
|
||||||
|
board the decision was made to remove it rather than spending a
|
||||||
|
lot of effort continually updating this driver to stay in sync
|
||||||
|
with the latest internal V4L2 or I2C API.
|
||||||
|
Who: Hans Verkuil <hverkuil@xs4all.nl>
|
||||||
|
Mauro Carvalho Chehab <mchehab@infradead.org>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
|
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
|
||||||
When: November 2005
|
When: November 2005
|
||||||
Files: drivers/pcmcia/: pcmcia_ioctl.c
|
Files: drivers/pcmcia/: pcmcia_ioctl.c
|
||||||
@@ -138,24 +171,6 @@ Who: Kay Sievers <kay.sievers@suse.de>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: find_task_by_pid
|
|
||||||
When: 2.6.26
|
|
||||||
Why: With pid namespaces, calling this funciton will return the
|
|
||||||
wrong task when called from inside a namespace.
|
|
||||||
|
|
||||||
The best way to save a task pid and find a task by this
|
|
||||||
pid later, is to find this task's struct pid pointer (or get
|
|
||||||
it directly from the task) and call pid_task() later.
|
|
||||||
|
|
||||||
If someone really needs to get a task by its pid_t, then
|
|
||||||
he most likely needs the find_task_by_vpid() to get the
|
|
||||||
task from the same namespace as the current task is in, but
|
|
||||||
this may be not so in general.
|
|
||||||
|
|
||||||
Who: Pavel Emelyanov <xemul@openvz.org>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What: ACPI procfs interface
|
What: ACPI procfs interface
|
||||||
When: July 2008
|
When: July 2008
|
||||||
Why: ACPI sysfs conversion should be finished by January 2008.
|
Why: ACPI sysfs conversion should be finished by January 2008.
|
||||||
@@ -199,19 +214,6 @@ Who: Tejun Heo <htejun@gmail.com>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: The arch/ppc and include/asm-ppc directories
|
|
||||||
When: Jun 2008
|
|
||||||
Why: The arch/powerpc tree is the merged architecture for ppc32 and ppc64
|
|
||||||
platforms. Currently there are efforts underway to port the remaining
|
|
||||||
arch/ppc platforms to the merged tree. New submissions to the arch/ppc
|
|
||||||
tree have been frozen with the 2.6.22 kernel release and that tree will
|
|
||||||
remain in bug-fix only mode until its scheduled removal. Platforms
|
|
||||||
that are not ported by June 2008 will be removed due to the lack of an
|
|
||||||
interested maintainer.
|
|
||||||
Who: linuxppc-dev@ozlabs.org
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What: i386/x86_64 bzImage symlinks
|
What: i386/x86_64 bzImage symlinks
|
||||||
When: April 2010
|
When: April 2010
|
||||||
|
|
||||||
@@ -222,13 +224,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: i2c-i810, i2c-prosavage and i2c-savage4
|
|
||||||
When: May 2008
|
|
||||||
Why: These drivers are superseded by i810fb, intelfb and savagefb.
|
|
||||||
Who: Jean Delvare <khali@linux-fr.org>
|
|
||||||
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
What (Why):
|
What (Why):
|
||||||
- include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
|
- include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
|
||||||
(superseded by xt_TOS/xt_tos target & match)
|
(superseded by xt_TOS/xt_tos target & match)
|
||||||
@@ -255,6 +250,9 @@ What (Why):
|
|||||||
- xt_mark match revision 0
|
- xt_mark match revision 0
|
||||||
(superseded by xt_mark match revision 1)
|
(superseded by xt_mark match revision 1)
|
||||||
|
|
||||||
|
- xt_recent: the old ipt_recent proc dir
|
||||||
|
(superseded by /proc/net/xt_recent)
|
||||||
|
|
||||||
When: January 2009 or Linux 2.7.0, whichever comes first
|
When: January 2009 or Linux 2.7.0, whichever comes first
|
||||||
Why: Superseded by newer revisions or modules
|
Why: Superseded by newer revisions or modules
|
||||||
Who: Jan Engelhardt <jengelh@computergmbh.de>
|
Who: Jan Engelhardt <jengelh@computergmbh.de>
|
||||||
@@ -289,11 +287,10 @@ Who: Glauber Costa <gcosta@redhat.com>
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: old style serial driver for ColdFire (CONFIG_SERIAL_COLDFIRE)
|
What: remove HID compat support
|
||||||
When: 2.6.28
|
When: 2.6.29
|
||||||
Why: This driver still uses the old interface and has been replaced
|
Why: needed only as a temporary solution until distros fix themselves up
|
||||||
by CONFIG_SERIAL_MCF.
|
Who: Jiri Slaby <jirislaby@gmail.com>
|
||||||
Who: Sebastian Siewior <sebastian@breakpoint.cc>
|
|
||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
@@ -307,8 +304,49 @@ Who: ocfs2-devel@oss.oracle.com
|
|||||||
|
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
What: asm/semaphore.h
|
What: SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD,
|
||||||
When: 2.6.26
|
SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD
|
||||||
Why: Implementation became generic; users should now include
|
When: June 2009
|
||||||
linux/semaphore.h instead.
|
Why: A newer version of the options was introduced in 2005 that
|
||||||
Who: Matthew Wilcox <willy@linux.intel.com>
|
removes the limitations of the old API. The sctp library has been
|
||||||
|
converted to use these new options at the same time. Any user
|
||||||
|
space app that directly uses the old options should convert to using
|
||||||
|
the new options.
|
||||||
|
Who: Vlad Yasevich <vladislav.yasevich@hp.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: CONFIG_THERMAL_HWMON
|
||||||
|
When: January 2009
|
||||||
|
Why: This option was introduced just to allow older lm-sensors userspace
|
||||||
|
to keep working over the upgrade to 2.6.26. At the scheduled time of
|
||||||
|
removal fixed lm-sensors (2.x or 3.x) should be readily available.
|
||||||
|
Who: Rene Herman <rene.herman@gmail.com>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
|
||||||
|
(in net/core/net-sysfs.c)
|
||||||
|
When: After the only user (hal) has seen a release with the patches
|
||||||
|
for enough time, probably some time in 2010.
|
||||||
|
Why: Over 1K .text/.data size reduction, data is available in other
|
||||||
|
ways (ioctls)
|
||||||
|
Who: Johannes Berg <johannes@sipsolutions.net>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: CONFIG_NF_CT_ACCT
|
||||||
|
When: 2.6.29
|
||||||
|
Why: Accounting can now be enabled/disabled without kernel recompilation.
|
||||||
|
Currently used only to set a default value for a feature that is also
|
||||||
|
controlled by a kernel/module/sysfs/sysctl parameter.
|
||||||
|
Who: Krzysztof Piotr Oledzki <ole@ans.pl>
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
What: ide-scsi (BLK_DEV_IDESCSI)
|
||||||
|
When: 2.6.29
|
||||||
|
Why: The 2.6 kernel supports direct writing to ide CD drives, which
|
||||||
|
eliminates the need for ide-scsi. The new method is more
|
||||||
|
efficient in every way.
|
||||||
|
Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
|
||||||
|
@@ -144,8 +144,8 @@ prototypes:
|
|||||||
void (*kill_sb) (struct super_block *);
|
void (*kill_sb) (struct super_block *);
|
||||||
locking rules:
|
locking rules:
|
||||||
may block BKL
|
may block BKL
|
||||||
get_sb yes yes
|
get_sb yes no
|
||||||
kill_sb yes yes
|
kill_sb yes no
|
||||||
|
|
||||||
->get_sb() returns error or 0 with locked superblock attached to the vfsmount
|
->get_sb() returns error or 0 with locked superblock attached to the vfsmount
|
||||||
(exclusive on ->s_umount).
|
(exclusive on ->s_umount).
|
||||||
@@ -409,12 +409,12 @@ ioctl: yes (see below)
|
|||||||
unlocked_ioctl: no (see below)
|
unlocked_ioctl: no (see below)
|
||||||
compat_ioctl: no
|
compat_ioctl: no
|
||||||
mmap: no
|
mmap: no
|
||||||
open: maybe (see below)
|
open: no
|
||||||
flush: no
|
flush: no
|
||||||
release: no
|
release: no
|
||||||
fsync: no (see below)
|
fsync: no (see below)
|
||||||
aio_fsync: no
|
aio_fsync: no
|
||||||
fasync: yes (see below)
|
fasync: no
|
||||||
lock: yes
|
lock: yes
|
||||||
readv: no
|
readv: no
|
||||||
writev: no
|
writev: no
|
||||||
@@ -431,13 +431,6 @@ For many filesystems, it is probably safe to acquire the inode
|
|||||||
semaphore. Note some filesystems (i.e. remote ones) provide no
|
semaphore. Note some filesystems (i.e. remote ones) provide no
|
||||||
protection for i_size so you will need to use the BKL.
|
protection for i_size so you will need to use the BKL.
|
||||||
|
|
||||||
->open() locking is in-transit: big lock partially moved into the methods.
|
|
||||||
The only exception is ->open() in the instances of file_operations that never
|
|
||||||
end up in ->i_fop/->proc_fops, i.e. ones that belong to character devices
|
|
||||||
(chrdev_open() takes lock before replacing ->f_op and calling the secondary
|
|
||||||
method. As soon as we fix the handling of module reference counters all
|
|
||||||
instances of ->open() will be called without the BKL.
|
|
||||||
|
|
||||||
Note: ext2_release() was *the* source of contention on fs-intensive
|
Note: ext2_release() was *the* source of contention on fs-intensive
|
||||||
loads and dropping BKL on ->release() helps to get rid of that (we still
|
loads and dropping BKL on ->release() helps to get rid of that (we still
|
||||||
grab BKL for cases when we close a file that had been opened r/w, but that
|
grab BKL for cases when we close a file that had been opened r/w, but that
|
||||||
@@ -510,6 +503,7 @@ prototypes:
|
|||||||
void (*close)(struct vm_area_struct*);
|
void (*close)(struct vm_area_struct*);
|
||||||
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
int (*fault)(struct vm_area_struct*, struct vm_fault *);
|
||||||
int (*page_mkwrite)(struct vm_area_struct *, struct page *);
|
int (*page_mkwrite)(struct vm_area_struct *, struct page *);
|
||||||
|
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
BKL mmap_sem PageLocked(page)
|
BKL mmap_sem PageLocked(page)
|
||||||
@@ -517,6 +511,7 @@ open: no yes
|
|||||||
close: no yes
|
close: no yes
|
||||||
fault: no yes
|
fault: no yes
|
||||||
page_mkwrite: no yes no
|
page_mkwrite: no yes no
|
||||||
|
access: no yes
|
||||||
|
|
||||||
->page_mkwrite() is called when a previously read-only page is
|
->page_mkwrite() is called when a previously read-only page is
|
||||||
about to become writeable. The file system is responsible for
|
about to become writeable. The file system is responsible for
|
||||||
@@ -525,6 +520,11 @@ taking to lock out truncate, the page range should be verified to be
|
|||||||
within i_size. The page mapping should also be checked that it is not
|
within i_size. The page mapping should also be checked that it is not
|
||||||
NULL.
|
NULL.
|
||||||
|
|
||||||
|
->access() is called when get_user_pages() fails in
|
||||||
|
access_process_vm(), typically used to debug a process through
|
||||||
|
/proc/pid/mem or ptrace. This function is needed only for
|
||||||
|
VM_IO | VM_PFNMAP VMAs.
|
||||||
|
|
||||||
================================================================================
|
================================================================================
|
||||||
Dubious stuff
|
Dubious stuff
|
||||||
|
|
||||||
|
@@ -26,11 +26,11 @@ You can simplify mounting by just typing:
|
|||||||
|
|
||||||
this will allocate the first available loopback device (and load loop.o
|
this will allocate the first available loopback device (and load loop.o
|
||||||
kernel module if necessary) automatically. If the loopback driver is not
|
kernel module if necessary) automatically. If the loopback driver is not
|
||||||
loaded automatically, make sure that your kernel is compiled with kmod
|
loaded automatically, make sure that you have compiled the module and
|
||||||
support (CONFIG_KMOD) enabled. Beware that umount will not
|
that modprobe is functioning. Beware that umount will not deallocate
|
||||||
deallocate /dev/loopN device if /etc/mtab file on your system is a
|
/dev/loopN device if /etc/mtab file on your system is a symbolic link to
|
||||||
symbolic link to /proc/mounts. You will need to do it manually using
|
/proc/mounts. You will need to do it manually using "-d" switch of
|
||||||
"-d" switch of losetup(8). Read losetup(8) manpage for more info.
|
losetup(8). Read losetup(8) manpage for more info.
|
||||||
|
|
||||||
To create the BFS image under UnixWare you need to find out first which
|
To create the BFS image under UnixWare you need to find out first which
|
||||||
slice contains it. The command prtvtoc(1M) is your friend:
|
slice contains it. The command prtvtoc(1M) is your friend:
|
||||||
|
3
Documentation/filesystems/configfs/Makefile
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
ifneq ($(CONFIG_CONFIGFS_FS),)
|
||||||
|
obj-m += configfs_example_explicit.o configfs_example_macros.o
|
||||||
|
endif
|
@@ -311,9 +311,20 @@ the subsystem must be ready for it.
 [An Example]
 
 The best example of these basic concepts is the simple_children
-subsystem/group and the simple_child item in configfs_example.c It
-shows a trivial object displaying and storing an attribute, and a simple
-group creating and destroying these children.
+subsystem/group and the simple_child item in configfs_example_explicit.c
+and configfs_example_macros.c. It shows a trivial object displaying and
+storing an attribute, and a simple group creating and destroying these
+children.
+
+The only difference between configfs_example_explicit.c and
+configfs_example_macros.c is how the attributes of the childless item
+are defined. The childless item has extended attributes, each with
+their own show()/store() operation. This follows a convention commonly
+used in sysfs. configfs_example_explicit.c creates these attributes
+by explicitly defining the structures involved. Conversely
+configfs_example_macros.c uses some convenience macros from configfs.h
+to define the attributes. These macros are similar to their sysfs
+counterparts.
 
 [Hierarchy Navigation and the Subsystem Mutex]
 
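The "explicit" style described in the hunk above (an attribute wrapper carrying its own show()/store() pointers, recovered from the embedded attribute with container_of()) can be sketched in user space. This is only an illustration under stated assumptions: `struct attribute`, `struct item`, `struct item_attribute` and every function name below are hypothetical stand-ins, not the configfs API, and `strtoul()` replaces the kernel's `simple_strtoul()`. The macro-based variant is not sketched here.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; these are NOT the real configfs structures. */
struct attribute { const char *name; };
struct item { int storeme; };

/* Wrapper that pairs a generic attribute with per-attribute operations. */
struct item_attribute {
	struct attribute attr;
	long (*show)(struct item *, char *);
	long (*store)(struct item *, const char *, size_t);
};

static long storeme_show(struct item *it, char *page)
{
	return sprintf(page, "%d\n", it->storeme);
}

static long storeme_store(struct item *it, const char *page, size_t count)
{
	char *end;
	unsigned long tmp = strtoul(page, &end, 10);

	/* Accept only digits, optionally followed by a newline. */
	if (*end && *end != '\n')
		return -EINVAL;
	if (tmp > INT_MAX)
		return -ERANGE;

	it->storeme = (int)tmp;
	return (long)count;
}

static long description_show(struct item *it, char *page)
{
	(void)it;
	return sprintf(page, "read-only\n");
}

static struct item_attribute attr_storeme = {
	.attr = { .name = "storeme" },
	.show = storeme_show,
	.store = storeme_store,
};

/* No .store: writes fall through to the dispatcher's default error. */
static struct item_attribute attr_description = {
	.attr = { .name = "description" },
	.show = description_show,
};

/* Generic entry points: map the embedded attribute back to its wrapper. */
static long item_attr_show(struct item *it, struct attribute *attr, char *page)
{
	struct item_attribute *ia =
		container_of(attr, struct item_attribute, attr);

	return ia->show ? ia->show(it, page) : 0;
}

static long item_attr_store(struct item *it, struct attribute *attr,
			    const char *page, size_t count)
{
	struct item_attribute *ia =
		container_of(attr, struct item_attribute, attr);

	return ia->store ? ia->store(it, page, count) : -EINVAL;
}
```

The two generic functions never need to know which attribute they were handed; adding an attribute means adding one more `struct item_attribute` instance, which is the bookkeeping the convenience macros are said to reduce.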
||||||
|
@@ -1,485 +0,0 @@
|
|||||||
/*
|
|
||||||
* vim: noexpandtab ts=8 sts=0 sw=8:
|
|
||||||
*
|
|
||||||
* configfs_example.c - This file is a demonstration module containing
|
|
||||||
* a number of configfs subsystems.
|
|
||||||
*
|
|
||||||
* This program is free software; you can redistribute it and/or
|
|
||||||
* modify it under the terms of the GNU General Public
|
|
||||||
* License as published by the Free Software Foundation; either
|
|
||||||
* version 2 of the License, or (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
|
||||||
* General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public
|
|
||||||
* License along with this program; if not, write to the
|
|
||||||
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
|
||||||
* Boston, MA 021110-1307, USA.
|
|
||||||
*
|
|
||||||
* Based on sysfs:
|
|
||||||
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
|
|
||||||
*
|
|
||||||
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#include <linux/init.h>
|
|
||||||
#include <linux/module.h>
|
|
||||||
#include <linux/slab.h>
|
|
||||||
|
|
||||||
#include <linux/configfs.h>
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* 01-childless
|
|
||||||
*
|
|
||||||
* This first example is a childless subsystem. It cannot create
|
|
||||||
* any config_items. It just has attributes.
|
|
||||||
*
|
|
||||||
* Note that we are enclosing the configfs_subsystem inside a container.
|
|
||||||
* This is not necessary if a subsystem has no attributes directly
|
|
||||||
* on the subsystem. See the next example, 02-simple-children, for
|
|
||||||
* such a subsystem.
|
|
||||||
*/
|
|
||||||
|
|
||||||
struct childless {
|
|
||||||
struct configfs_subsystem subsys;
|
|
||||||
int showme;
|
|
||||||
int storeme;
|
|
||||||
};
|
|
||||||
|
|
||||||
struct childless_attribute {
|
|
||||||
struct configfs_attribute attr;
|
|
||||||
ssize_t (*show)(struct childless *, char *);
|
|
||||||
ssize_t (*store)(struct childless *, const char *, size_t);
|
|
||||||
};
|
|
||||||
|
|
||||||
static inline struct childless *to_childless(struct config_item *item)
|
|
||||||
{
|
|
||||||
return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t childless_showme_read(struct childless *childless,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
ssize_t pos;
|
|
||||||
|
|
||||||
pos = sprintf(page, "%d\n", childless->showme);
|
|
||||||
childless->showme++;
|
|
||||||
|
|
||||||
return pos;
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t childless_storeme_read(struct childless *childless,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
return sprintf(page, "%d\n", childless->storeme);
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t childless_storeme_write(struct childless *childless,
|
|
||||||
const char *page,
|
|
||||||
size_t count)
|
|
||||||
{
|
|
||||||
unsigned long tmp;
|
|
||||||
char *p = (char *) page;
|
|
||||||
|
|
||||||
tmp = simple_strtoul(p, &p, 10);
|
|
||||||
if (!p || (*p && (*p != '\n')))
|
|
||||||
return -EINVAL;
|
|
||||||
|
|
||||||
if (tmp > INT_MAX)
|
|
||||||
return -ERANGE;
|
|
||||||
|
|
||||||
childless->storeme = tmp;
|
|
||||||
|
|
||||||
return count;
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t childless_description_read(struct childless *childless,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
return sprintf(page,
|
|
||||||
"[01-childless]\n"
|
|
||||||
"\n"
|
|
||||||
"The childless subsystem is the simplest possible subsystem in\n"
|
|
||||||
"configfs. It does not support the creation of child config_items.\n"
|
|
||||||
"It only has a few attributes. In fact, it isn't much different\n"
|
|
||||||
"than a directory in /proc.\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct childless_attribute childless_attr_showme = {
|
|
||||||
.attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO },
|
|
||||||
.show = childless_showme_read,
|
|
||||||
};
|
|
||||||
static struct childless_attribute childless_attr_storeme = {
|
|
||||||
.attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR },
|
|
||||||
.show = childless_storeme_read,
|
|
||||||
.store = childless_storeme_write,
|
|
||||||
};
|
|
||||||
static struct childless_attribute childless_attr_description = {
|
|
||||||
.attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO },
|
|
||||||
.show = childless_description_read,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_attribute *childless_attrs[] = {
|
|
||||||
&childless_attr_showme.attr,
|
|
||||||
&childless_attr_storeme.attr,
|
|
||||||
&childless_attr_description.attr,
|
|
||||||
NULL,
|
|
||||||
};
|
|
||||||
|
|
||||||
static ssize_t childless_attr_show(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
struct childless *childless = to_childless(item);
|
|
||||||
struct childless_attribute *childless_attr =
|
|
||||||
container_of(attr, struct childless_attribute, attr);
|
|
||||||
ssize_t ret = 0;
|
|
||||||
|
|
||||||
if (childless_attr->show)
|
|
||||||
ret = childless_attr->show(childless, page);
|
|
||||||
return ret;
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t childless_attr_store(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
const char *page, size_t count)
|
|
||||||
{
|
|
||||||
struct childless *childless = to_childless(item);
|
|
||||||
struct childless_attribute *childless_attr =
|
|
||||||
container_of(attr, struct childless_attribute, attr);
|
|
||||||
ssize_t ret = -EINVAL;
|
|
||||||
|
|
||||||
if (childless_attr->store)
|
|
||||||
ret = childless_attr->store(childless, page, count);
|
|
||||||
return ret;
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_item_operations childless_item_ops = {
|
|
||||||
.show_attribute = childless_attr_show,
|
|
||||||
.store_attribute = childless_attr_store,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct config_item_type childless_type = {
|
|
||||||
.ct_item_ops = &childless_item_ops,
|
|
||||||
.ct_attrs = childless_attrs,
|
|
||||||
.ct_owner = THIS_MODULE,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct childless childless_subsys = {
|
|
||||||
.subsys = {
|
|
||||||
.su_group = {
|
|
||||||
.cg_item = {
|
|
||||||
.ci_namebuf = "01-childless",
|
|
||||||
.ci_type = &childless_type,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
|
|
||||||
/* ----------------------------------------------------------------- */
|
|
||||||
|
|
||||||
/*
|
|
||||||
* 02-simple-children
|
|
||||||
*
|
|
||||||
* This example merely has a simple one-attribute child. Note that
|
|
||||||
* there is no extra attribute structure, as the child's attribute is
|
|
||||||
* known from the get-go. Also, there is no container for the
|
|
||||||
* subsystem, as it has no attributes of its own.
|
|
||||||
*/
|
|
||||||
|
|
||||||
struct simple_child {
|
|
||||||
struct config_item item;
|
|
||||||
int storeme;
|
|
||||||
};
|
|
||||||
|
|
||||||
static inline struct simple_child *to_simple_child(struct config_item *item)
|
|
||||||
{
|
|
||||||
return item ? container_of(item, struct simple_child, item) : NULL;
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_attribute simple_child_attr_storeme = {
|
|
||||||
.ca_owner = THIS_MODULE,
|
|
||||||
.ca_name = "storeme",
|
|
||||||
.ca_mode = S_IRUGO | S_IWUSR,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_attribute *simple_child_attrs[] = {
|
|
||||||
&simple_child_attr_storeme,
|
|
||||||
NULL,
|
|
||||||
};
|
|
||||||
|
|
||||||
static ssize_t simple_child_attr_show(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
ssize_t count;
|
|
||||||
struct simple_child *simple_child = to_simple_child(item);
|
|
||||||
|
|
||||||
count = sprintf(page, "%d\n", simple_child->storeme);
|
|
||||||
|
|
||||||
return count;
|
|
||||||
}
|
|
||||||
|
|
||||||
static ssize_t simple_child_attr_store(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
const char *page, size_t count)
|
|
||||||
{
|
|
||||||
struct simple_child *simple_child = to_simple_child(item);
|
|
||||||
unsigned long tmp;
|
|
||||||
char *p = (char *) page;
|
|
||||||
|
|
||||||
tmp = simple_strtoul(p, &p, 10);
|
|
||||||
if (!p || (*p && (*p != '\n')))
|
|
||||||
return -EINVAL;
|
|
||||||
|
|
||||||
if (tmp > INT_MAX)
|
|
||||||
return -ERANGE;
|
|
||||||
|
|
||||||
simple_child->storeme = tmp;
|
|
||||||
|
|
||||||
return count;
|
|
||||||
}
|
|
||||||
|
|
||||||
static void simple_child_release(struct config_item *item)
|
|
||||||
{
|
|
||||||
kfree(to_simple_child(item));
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_item_operations simple_child_item_ops = {
|
|
||||||
.release = simple_child_release,
|
|
||||||
.show_attribute = simple_child_attr_show,
|
|
||||||
.store_attribute = simple_child_attr_store,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct config_item_type simple_child_type = {
|
|
||||||
.ct_item_ops = &simple_child_item_ops,
|
|
||||||
.ct_attrs = simple_child_attrs,
|
|
||||||
.ct_owner = THIS_MODULE,
|
|
||||||
};
|
|
||||||
|
|
||||||
|
|
||||||
struct simple_children {
|
|
||||||
struct config_group group;
|
|
||||||
};
|
|
||||||
|
|
||||||
static inline struct simple_children *to_simple_children(struct config_item *item)
|
|
||||||
{
|
|
||||||
return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
|
|
||||||
{
|
|
||||||
struct simple_child *simple_child;
|
|
||||||
|
|
||||||
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
|
||||||
if (!simple_child)
|
|
||||||
return NULL;
|
|
||||||
|
|
||||||
|
|
||||||
config_item_init_type_name(&simple_child->item, name,
|
|
||||||
&simple_child_type);
|
|
||||||
|
|
||||||
simple_child->storeme = 0;
|
|
||||||
|
|
||||||
return &simple_child->item;
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_attribute simple_children_attr_description = {
|
|
||||||
.ca_owner = THIS_MODULE,
|
|
||||||
.ca_name = "description",
|
|
||||||
.ca_mode = S_IRUGO,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_attribute *simple_children_attrs[] = {
|
|
||||||
&simple_children_attr_description,
|
|
||||||
NULL,
|
|
||||||
};
|
|
||||||
|
|
||||||
static ssize_t simple_children_attr_show(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
return sprintf(page,
|
|
||||||
"[02-simple-children]\n"
|
|
||||||
"\n"
|
|
||||||
"This subsystem allows the creation of child config_items. These\n"
|
|
||||||
"items have only one attribute that is readable and writeable.\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
static void simple_children_release(struct config_item *item)
|
|
||||||
{
|
|
||||||
kfree(to_simple_children(item));
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_item_operations simple_children_item_ops = {
|
|
||||||
.release = simple_children_release,
|
|
||||||
.show_attribute = simple_children_attr_show,
|
|
||||||
};
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Note that, since no extra work is required on ->drop_item(),
|
|
||||||
* no ->drop_item() is provided.
|
|
||||||
*/
|
|
||||||
static struct configfs_group_operations simple_children_group_ops = {
|
|
||||||
.make_item = simple_children_make_item,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct config_item_type simple_children_type = {
|
|
||||||
.ct_item_ops = &simple_children_item_ops,
|
|
||||||
.ct_group_ops = &simple_children_group_ops,
|
|
||||||
.ct_attrs = simple_children_attrs,
|
|
||||||
.ct_owner = THIS_MODULE,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_subsystem simple_children_subsys = {
|
|
||||||
.su_group = {
|
|
||||||
.cg_item = {
|
|
||||||
.ci_namebuf = "02-simple-children",
|
|
||||||
.ci_type = &simple_children_type,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
|
|
||||||
/* ----------------------------------------------------------------- */
|
|
||||||
|
|
||||||
/*
|
|
||||||
* 03-group-children
|
|
||||||
*
|
|
||||||
* This example reuses the simple_children group from above. However,
|
|
||||||
* the simple_children group is not the subsystem itself, it is a
|
|
||||||
* child of the subsystem. Creation of a group in the subsystem creates
|
|
||||||
* a new simple_children group. That group can then have simple_child
|
|
||||||
* children of its own.
|
|
||||||
*/
|
|
||||||
|
|
||||||
static struct config_group *group_children_make_group(struct config_group *group, const char *name)
|
|
||||||
{
|
|
||||||
struct simple_children *simple_children;
|
|
||||||
|
|
||||||
simple_children = kzalloc(sizeof(struct simple_children),
|
|
||||||
GFP_KERNEL);
|
|
||||||
if (!simple_children)
|
|
||||||
return NULL;
|
|
||||||
|
|
||||||
|
|
||||||
config_group_init_type_name(&simple_children->group, name,
|
|
||||||
&simple_children_type);
|
|
||||||
|
|
||||||
return &simple_children->group;
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_attribute group_children_attr_description = {
|
|
||||||
.ca_owner = THIS_MODULE,
|
|
||||||
.ca_name = "description",
|
|
||||||
.ca_mode = S_IRUGO,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_attribute *group_children_attrs[] = {
|
|
||||||
&group_children_attr_description,
|
|
||||||
NULL,
|
|
||||||
};
|
|
||||||
|
|
||||||
static ssize_t group_children_attr_show(struct config_item *item,
|
|
||||||
struct configfs_attribute *attr,
|
|
||||||
char *page)
|
|
||||||
{
|
|
||||||
return sprintf(page,
|
|
||||||
"[03-group-children]\n"
|
|
||||||
"\n"
|
|
||||||
"This subsystem allows the creation of child config_groups. These\n"
|
|
||||||
"groups are like the subsystem simple-children.\n");
|
|
||||||
}
|
|
||||||
|
|
||||||
static struct configfs_item_operations group_children_item_ops = {
|
|
||||||
.show_attribute = group_children_attr_show,
|
|
||||||
};
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Note that, since no extra work is required on ->drop_item(),
|
|
||||||
* no ->drop_item() is provided.
|
|
||||||
*/
|
|
||||||
static struct configfs_group_operations group_children_group_ops = {
|
|
||||||
.make_group = group_children_make_group,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct config_item_type group_children_type = {
|
|
||||||
.ct_item_ops = &group_children_item_ops,
|
|
||||||
.ct_group_ops = &group_children_group_ops,
|
|
||||||
.ct_attrs = group_children_attrs,
|
|
||||||
.ct_owner = THIS_MODULE,
|
|
||||||
};
|
|
||||||
|
|
||||||
static struct configfs_subsystem group_children_subsys = {
|
|
||||||
.su_group = {
|
|
||||||
.cg_item = {
|
|
||||||
.ci_namebuf = "03-group-children",
|
|
||||||
.ci_type = &group_children_type,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
};
|
|
||||||
|
|
||||||
/* ----------------------------------------------------------------- */
|
|
||||||
|
|
||||||
/*
|
|
||||||
* We're now done with our subsystem definitions.
|
|
||||||
* For convenience in this module, here's a list of them all. It
|
|
||||||
* allows the init function to easily register them. Most modules
|
|
||||||
* will only have one subsystem, and will only call register_subsystem
|
|
||||||
* on it directly.
|
|
||||||
*/
|
|
||||||
static struct configfs_subsystem *example_subsys[] = {
|
|
||||||
&childless_subsys.subsys,
|
|
||||||
&simple_children_subsys,
|
|
||||||
&group_children_subsys,
|
|
||||||
NULL,
|
|
||||||
};
|
|
||||||
|
|
||||||
static int __init configfs_example_init(void)
|
|
||||||
{
|
|
||||||
int ret;
|
|
||||||
int i;
|
|
||||||
struct configfs_subsystem *subsys;
|
|
||||||
|
|
||||||
for (i = 0; example_subsys[i]; i++) {
|
|
||||||
subsys = example_subsys[i];
|
|
||||||
|
|
||||||
config_group_init(&subsys->su_group);
|
|
||||||
mutex_init(&subsys->su_mutex);
|
|
||||||
ret = configfs_register_subsystem(subsys);
|
|
||||||
if (ret) {
|
|
||||||
printk(KERN_ERR "Error %d while registering subsystem %s\n",
|
|
||||||
ret,
|
|
||||||
subsys->su_group.cg_item.ci_namebuf);
|
|
||||||
goto out_unregister;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
return 0;
|
|
||||||
|
|
||||||
out_unregister:
|
|
||||||
for (; i >= 0; i--) {
|
|
||||||
configfs_unregister_subsystem(example_subsys[i]);
|
|
||||||
}
|
|
||||||
|
|
||||||
return ret;
|
|
||||||
}
|
|
||||||
|
|
||||||
static void __exit configfs_example_exit(void)
|
|
||||||
{
|
|
||||||
int i;
|
|
||||||
|
|
||||||
for (i = 0; example_subsys[i]; i++) {
|
|
||||||
configfs_unregister_subsystem(example_subsys[i]);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
module_init(configfs_example_init);
|
|
||||||
module_exit(configfs_example_exit);
|
|
||||||
MODULE_LICENSE("GPL");
|
|
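One detail of configfs_example_init() in the removed file above is worth a note: when registration fails at index i, the out_unregister loop starts at that same index, so it also calls unregister on the subsystem that never registered. Whether that is harmless depends on configfs internals not shown in this diff. A more defensive shape, sketched in user space with hypothetical stand-ins (`struct subsys`, `register_subsys()`, `unregister_subsys()` and the `bad_unregisters` counter are all invented for illustration), begins the unwind one slot below the failure:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a subsystem; NOT the configfs structure. */
struct subsys {
	const char *name;
	int fail;		/* set to simulate a registration failure */
	int registered;
};

/* Counts unregister calls made on entries that were never registered. */
static int bad_unregisters;

static int register_subsys(struct subsys *s)
{
	if (s->fail)
		return -1;
	s->registered = 1;
	return 0;
}

static void unregister_subsys(struct subsys *s)
{
	if (!s->registered)
		bad_unregisters++;
	s->registered = 0;
}

/* Register every entry of a NULL-terminated list, or none of them. */
static int register_all(struct subsys **list)
{
	int ret = 0;
	int i;

	for (i = 0; list[i]; i++) {
		ret = register_subsys(list[i]);
		if (ret) {
			fprintf(stderr, "Error %d while registering %s\n",
				ret, list[i]->name);
			goto out_unregister;
		}
	}

	return 0;

out_unregister:
	/* list[i] itself failed to register, so start at list[i - 1]. */
	for (i--; i >= 0; i--)
		unregister_subsys(list[i]);

	return ret;
}
```

With three entries where the second fails, only the first is unregistered and the third is never touched, so the list ends up in the state it started in.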
485
Documentation/filesystems/configfs/configfs_example_explicit.c
Normal file
@@ -0,0 +1,485 @@
|
|||||||
|
/*
|
||||||
|
* vim: noexpandtab ts=8 sts=0 sw=8:
|
||||||
|
*
|
||||||
|
* configfs_example_explicit.c - This file is a demonstration module
|
||||||
|
* containing a number of configfs subsystems. It explicitly defines
|
||||||
|
* each structure without using the helper macros defined in
|
||||||
|
* configfs.h.
|
||||||
|
*
|
||||||
|
* This program is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public
|
||||||
|
* License along with this program; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
||||||
|
* Boston, MA 021110-1307, USA.
|
||||||
|
*
|
||||||
|
* Based on sysfs:
|
||||||
|
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
|
||||||
|
*
|
||||||
|
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <linux/init.h>
|
||||||
|
#include <linux/module.h>
|
||||||
|
#include <linux/slab.h>
|
||||||
|
|
||||||
|
#include <linux/configfs.h>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 01-childless
|
||||||
|
*
|
||||||
|
* This first example is a childless subsystem. It cannot create
|
||||||
|
* any config_items. It just has attributes.
|
||||||
|
*
|
||||||
|
* Note that we are enclosing the configfs_subsystem inside a container.
|
||||||
|
* This is not necessary if a subsystem has no attributes directly
|
||||||
|
* on the subsystem. See the next example, 02-simple-children, for
|
||||||
|
* such a subsystem.
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct childless {
|
||||||
|
struct configfs_subsystem subsys;
|
||||||
|
int showme;
|
||||||
|
int storeme;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct childless_attribute {
|
||||||
|
struct configfs_attribute attr;
|
||||||
|
ssize_t (*show)(struct childless *, char *);
|
||||||
|
ssize_t (*store)(struct childless *, const char *, size_t);
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct childless *to_childless(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_showme_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
ssize_t pos;
|
||||||
|
|
||||||
|
pos = sprintf(page, "%d\n", childless->showme);
|
||||||
|
childless->showme++;
|
||||||
|
|
||||||
|
return pos;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_storeme_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page, "%d\n", childless->storeme);
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_storeme_write(struct childless *childless,
|
||||||
|
const char *page,
|
||||||
|
size_t count)
|
||||||
|
{
|
||||||
|
unsigned long tmp;
|
||||||
|
char *p = (char *) page;
|
||||||
|
|
||||||
|
tmp = simple_strtoul(p, &p, 10);
|
||||||
|
if (!p || (*p && (*p != '\n')))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (tmp > INT_MAX)
|
||||||
|
return -ERANGE;
|
||||||
|
|
||||||
|
childless->storeme = tmp;
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_description_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[01-childless]\n"
|
||||||
|
"\n"
|
||||||
|
"The childless subsystem is the simplest possible subsystem in\n"
|
||||||
|
"configfs. It does not support the creation of child config_items.\n"
|
||||||
|
"It only has a few attributes. In fact, it isn't much different\n"
|
||||||
|
"than a directory in /proc.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct childless_attribute childless_attr_showme = {
|
||||||
|
.attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO },
|
||||||
|
.show = childless_showme_read,
|
||||||
|
};
|
||||||
|
static struct childless_attribute childless_attr_storeme = {
|
||||||
|
.attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR },
|
||||||
|
.show = childless_storeme_read,
|
||||||
|
.store = childless_storeme_write,
|
||||||
|
};
|
||||||
|
static struct childless_attribute childless_attr_description = {
|
||||||
|
.attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO },
|
||||||
|
.show = childless_description_read,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *childless_attrs[] = {
|
||||||
|
&childless_attr_showme.attr,
|
||||||
|
&childless_attr_storeme.attr,
|
||||||
|
&childless_attr_description.attr,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t childless_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
struct childless *childless = to_childless(item);
|
||||||
|
struct childless_attribute *childless_attr =
|
||||||
|
container_of(attr, struct childless_attribute, attr);
|
||||||
|
ssize_t ret = 0;
|
||||||
|
|
||||||
|
if (childless_attr->show)
|
||||||
|
ret = childless_attr->show(childless, page);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_attr_store(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
const char *page, size_t count)
|
||||||
|
{
|
||||||
|
struct childless *childless = to_childless(item);
|
||||||
|
struct childless_attribute *childless_attr =
|
||||||
|
container_of(attr, struct childless_attribute, attr);
|
||||||
|
ssize_t ret = -EINVAL;
|
||||||
|
|
||||||
|
if (childless_attr->store)
|
||||||
|
ret = childless_attr->store(childless, page, count);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations childless_item_ops = {
|
||||||
|
.show_attribute = childless_attr_show,
|
||||||
|
.store_attribute = childless_attr_store,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type childless_type = {
|
||||||
|
.ct_item_ops = &childless_item_ops,
|
||||||
|
.ct_attrs = childless_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct childless childless_subsys = {
|
||||||
|
.subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "01-childless",
|
||||||
|
.ci_type = &childless_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 02-simple-children
|
||||||
|
*
|
||||||
|
* This example merely has a simple one-attribute child. Note that
|
||||||
|
* there is no extra attribute structure, as the child's attribute is
|
||||||
|
* known from the get-go. Also, there is no container for the
|
||||||
|
* subsystem, as it has no attributes of its own.
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct simple_child {
|
||||||
|
struct config_item item;
|
||||||
|
int storeme;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct simple_child *to_simple_child(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(item, struct simple_child, item) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute simple_child_attr_storeme = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "storeme",
|
||||||
|
.ca_mode = S_IRUGO | S_IWUSR,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *simple_child_attrs[] = {
|
||||||
|
&simple_child_attr_storeme,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t simple_child_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
ssize_t count;
|
||||||
|
struct simple_child *simple_child = to_simple_child(item);
|
||||||
|
|
||||||
|
count = sprintf(page, "%d\n", simple_child->storeme);
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t simple_child_attr_store(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
const char *page, size_t count)
|
||||||
|
{
|
||||||
|
struct simple_child *simple_child = to_simple_child(item);
|
||||||
|
unsigned long tmp;
|
||||||
|
char *p = (char *) page;
|
||||||
|
|
||||||
|
tmp = simple_strtoul(p, &p, 10);
|
||||||
|
if (!p || (*p && (*p != '\n')))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (tmp > INT_MAX)
|
||||||
|
return -ERANGE;
|
||||||
|
|
||||||
|
simple_child->storeme = tmp;
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void simple_child_release(struct config_item *item)
|
||||||
|
{
|
||||||
|
kfree(to_simple_child(item));
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations simple_child_item_ops = {
|
||||||
|
.release = simple_child_release,
|
||||||
|
.show_attribute = simple_child_attr_show,
|
||||||
|
.store_attribute = simple_child_attr_store,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type simple_child_type = {
|
||||||
|
.ct_item_ops = &simple_child_item_ops,
|
||||||
|
.ct_attrs = simple_child_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
struct simple_children {
|
||||||
|
struct config_group group;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct simple_children *to_simple_children(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
|
||||||
|
{
|
||||||
|
struct simple_child *simple_child;
|
||||||
|
|
||||||
|
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
||||||
|
if (!simple_child)
|
||||||
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
config_item_init_type_name(&simple_child->item, name,
|
||||||
|
&simple_child_type);
|
||||||
|
|
||||||
|
simple_child->storeme = 0;
|
||||||
|
|
||||||
|
return &simple_child->item;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute simple_children_attr_description = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "description",
|
||||||
|
.ca_mode = S_IRUGO,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *simple_children_attrs[] = {
|
||||||
|
&simple_children_attr_description,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t simple_children_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[02-simple-children]\n"
|
||||||
|
"\n"
|
||||||
|
"This subsystem allows the creation of child config_items. These\n"
|
||||||
|
"items have only one attribute that is readable and writeable.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void simple_children_release(struct config_item *item)
|
||||||
|
{
|
||||||
|
kfree(to_simple_children(item));
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations simple_children_item_ops = {
|
||||||
|
.release = simple_children_release,
|
||||||
|
.show_attribute = simple_children_attr_show,
|
||||||
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Note that, since no extra work is required on ->drop_item(),
|
||||||
|
* no ->drop_item() is provided.
|
||||||
|
*/
|
||||||
|
static struct configfs_group_operations simple_children_group_ops = {
|
||||||
|
.make_item = simple_children_make_item,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type simple_children_type = {
|
||||||
|
.ct_item_ops = &simple_children_item_ops,
|
||||||
|
.ct_group_ops = &simple_children_group_ops,
|
||||||
|
.ct_attrs = simple_children_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_subsystem simple_children_subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "02-simple-children",
|
||||||
|
.ci_type = &simple_children_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 03-group-children
|
||||||
|
*
|
||||||
|
* This example reuses the simple_children group from above. However,
|
||||||
|
* the simple_children group is not the subsystem itself, it is a
|
||||||
|
* child of the subsystem. Creation of a group in the subsystem creates
|
||||||
|
* a new simple_children group. That group can then have simple_child
|
||||||
|
* children of its own.
|
||||||
|
*/
|
||||||
|
|
||||||
|
static struct config_group *group_children_make_group(struct config_group *group, const char *name)
|
||||||
|
{
|
||||||
|
struct simple_children *simple_children;
|
||||||
|
|
||||||
|
simple_children = kzalloc(sizeof(struct simple_children),
|
||||||
|
GFP_KERNEL);
|
||||||
|
if (!simple_children)
|
||||||
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
config_group_init_type_name(&simple_children->group, name,
|
||||||
|
&simple_children_type);
|
||||||
|
|
||||||
|
return &simple_children->group;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute group_children_attr_description = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "description",
|
||||||
|
.ca_mode = S_IRUGO,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *group_children_attrs[] = {
|
||||||
|
&group_children_attr_description,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t group_children_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[03-group-children]\n"
|
||||||
|
"\n"
|
||||||
|
"This subsystem allows the creation of child config_groups. These\n"
|
||||||
|
"groups are like the subsystem simple-children.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations group_children_item_ops = {
|
||||||
|
.show_attribute = group_children_attr_show,
|
||||||
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Note that, since no extra work is required on ->drop_item(),
|
||||||
|
* no ->drop_item() is provided.
|
||||||
|
*/
|
||||||
|
static struct configfs_group_operations group_children_group_ops = {
|
||||||
|
.make_group = group_children_make_group,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type group_children_type = {
|
||||||
|
.ct_item_ops = &group_children_item_ops,
|
||||||
|
.ct_group_ops = &group_children_group_ops,
|
||||||
|
.ct_attrs = group_children_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_subsystem group_children_subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "03-group-children",
|
||||||
|
.ci_type = &group_children_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* We're now done with our subsystem definitions.
|
||||||
|
* For convenience in this module, here's a list of them all. It
|
||||||
|
* allows the init function to easily register them. Most modules
|
||||||
|
* will only have one subsystem, and will only call register_subsystem
|
||||||
|
* on it directly.
|
||||||
|
*/
|
||||||
|
static struct configfs_subsystem *example_subsys[] = {
|
||||||
|
&childless_subsys.subsys,
|
||||||
|
&simple_children_subsys,
|
||||||
|
&group_children_subsys,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static int __init configfs_example_init(void)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
int i;
|
||||||
|
struct configfs_subsystem *subsys;
|
||||||
|
|
||||||
|
for (i = 0; example_subsys[i]; i++) {
|
||||||
|
subsys = example_subsys[i];
|
||||||
|
|
||||||
|
config_group_init(&subsys->su_group);
|
||||||
|
mutex_init(&subsys->su_mutex);
|
||||||
|
ret = configfs_register_subsystem(subsys);
|
||||||
|
if (ret) {
|
||||||
|
printk(KERN_ERR "Error %d while registering subsystem %s\n",
|
||||||
|
ret,
|
||||||
|
subsys->su_group.cg_item.ci_namebuf);
|
||||||
|
goto out_unregister;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
out_unregister:
|
||||||
|
for (i--; i >= 0; i--) {
|
||||||
|
configfs_unregister_subsystem(example_subsys[i]);
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __exit configfs_example_exit(void)
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; example_subsys[i]; i++) {
|
||||||
|
configfs_unregister_subsystem(example_subsys[i]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module_init(configfs_example_init);
|
||||||
|
module_exit(configfs_example_exit);
|
||||||
|
MODULE_LICENSE("GPL");
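The init function above registers each subsystem in turn and unwinds on failure. The following is a minimal userspace sketch of that register-then-unwind pattern, not the module's own code. `register_one()`, `unregister_one()`, `fail_at`, and `NR_SUBSYS` are hypothetical stand-ins for `configfs_register_subsystem()`, `configfs_unregister_subsystem()`, and the `example_subsys[]` array. The sketch starts the unwind at `i - 1`, on the assumption that an entry whose registration failed must not be unregistered.

```c
#include <assert.h>
#include <errno.h>

#define NR_SUBSYS 3

/* Hypothetical stand-ins for the configfs calls; they only track state. */
static int registered[NR_SUBSYS];
static int fail_at = -1;	/* index whose registration fails, or -1 */

static int register_one(int idx)
{
	if (idx == fail_at)
		return -ENOMEM;
	registered[idx] = 1;
	return 0;
}

static void unregister_one(int idx)
{
	/* Unregistering an entry that was never registered is a bug. */
	assert(registered[idx]);
	registered[idx] = 0;
}

static int register_all(void)
{
	int ret = 0;
	int i;

	for (i = 0; i < NR_SUBSYS; i++) {
		ret = register_one(i);
		if (ret)
			goto out_unregister;
	}

	return 0;

out_unregister:
	/* Entry i failed, so unwind only entries 0..i-1. */
	for (i--; i >= 0; i--)
		unregister_one(i);

	return ret;
}
```

With `fail_at = 2`, `register_all()` returns `-ENOMEM` and leaves no entry registered; with `fail_at = -1` all three entries end up registered.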
|
448
Documentation/filesystems/configfs/configfs_example_macros.c
Normal file
@@ -0,0 +1,448 @@
|
|||||||
|
/*
|
||||||
|
* vim: noexpandtab ts=8 sts=0 sw=8:
|
||||||
|
*
|
||||||
|
* configfs_example_macros.c - This file is a demonstration module
|
||||||
|
* containing a number of configfs subsystems. It uses the helper
|
||||||
|
* macros defined by configfs.h
|
||||||
|
*
|
||||||
|
* This program is free software; you can redistribute it and/or
|
||||||
|
* modify it under the terms of the GNU General Public
|
||||||
|
* License as published by the Free Software Foundation; either
|
||||||
|
* version 2 of the License, or (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||||
|
* General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public
|
||||||
|
* License along with this program; if not, write to the
|
||||||
|
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
||||||
|
* Boston, MA 02111-1307, USA.
|
||||||
|
*
|
||||||
|
* Based on sysfs:
|
||||||
|
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
|
||||||
|
*
|
||||||
|
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include <linux/init.h>
|
||||||
|
#include <linux/module.h>
|
||||||
|
#include <linux/slab.h>
|
||||||
|
|
||||||
|
#include <linux/configfs.h>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 01-childless
|
||||||
|
*
|
||||||
|
* This first example is a childless subsystem. It cannot create
|
||||||
|
* any config_items. It just has attributes.
|
||||||
|
*
|
||||||
|
* Note that we are enclosing the configfs_subsystem inside a container.
|
||||||
|
* This is not necessary if a subsystem has no attributes directly
|
||||||
|
* on the subsystem. See the next example, 02-simple-children, for
|
||||||
|
* such a subsystem.
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct childless {
|
||||||
|
struct configfs_subsystem subsys;
|
||||||
|
int showme;
|
||||||
|
int storeme;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct childless *to_childless(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
CONFIGFS_ATTR_STRUCT(childless);
|
||||||
|
#define CHILDLESS_ATTR(_name, _mode, _show, _store) \
|
||||||
|
struct childless_attribute childless_attr_##_name = __CONFIGFS_ATTR(_name, _mode, _show, _store)
|
||||||
|
#define CHILDLESS_ATTR_RO(_name, _show) \
|
||||||
|
struct childless_attribute childless_attr_##_name = __CONFIGFS_ATTR_RO(_name, _show);
|
||||||
|
|
||||||
|
static ssize_t childless_showme_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
ssize_t pos;
|
||||||
|
|
||||||
|
pos = sprintf(page, "%d\n", childless->showme);
|
||||||
|
childless->showme++;
|
||||||
|
|
||||||
|
return pos;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_storeme_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page, "%d\n", childless->storeme);
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_storeme_write(struct childless *childless,
|
||||||
|
const char *page,
|
||||||
|
size_t count)
|
||||||
|
{
|
||||||
|
unsigned long tmp;
|
||||||
|
char *p = (char *) page;
|
||||||
|
|
||||||
|
tmp = simple_strtoul(p, &p, 10);
|
||||||
|
if (!p || (*p && (*p != '\n')))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (tmp > INT_MAX)
|
||||||
|
return -ERANGE;
|
||||||
|
|
||||||
|
childless->storeme = tmp;
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t childless_description_read(struct childless *childless,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[01-childless]\n"
|
||||||
|
"\n"
|
||||||
|
"The childless subsystem is the simplest possible subsystem in\n"
|
||||||
|
"configfs. It does not support the creation of child config_items.\n"
|
||||||
|
"It only has a few attributes. In fact, it isn't much different\n"
|
||||||
|
"than a directory in /proc.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
CHILDLESS_ATTR_RO(showme, childless_showme_read);
|
||||||
|
CHILDLESS_ATTR(storeme, S_IRUGO | S_IWUSR, childless_storeme_read,
|
||||||
|
childless_storeme_write);
|
||||||
|
CHILDLESS_ATTR_RO(description, childless_description_read);
|
||||||
|
|
||||||
|
static struct configfs_attribute *childless_attrs[] = {
|
||||||
|
&childless_attr_showme.attr,
|
||||||
|
&childless_attr_storeme.attr,
|
||||||
|
&childless_attr_description.attr,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
CONFIGFS_ATTR_OPS(childless);
|
||||||
|
static struct configfs_item_operations childless_item_ops = {
|
||||||
|
.show_attribute = childless_attr_show,
|
||||||
|
.store_attribute = childless_attr_store,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type childless_type = {
|
||||||
|
.ct_item_ops = &childless_item_ops,
|
||||||
|
.ct_attrs = childless_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct childless childless_subsys = {
|
||||||
|
.subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "01-childless",
|
||||||
|
.ci_type = &childless_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 02-simple-children
|
||||||
|
*
|
||||||
|
* This example merely has a simple one-attribute child. Note that
|
||||||
|
* there is no extra attribute structure, as the child's attribute is
|
||||||
|
* known from the get-go. Also, there is no container for the
|
||||||
|
* subsystem, as it has no attributes of its own.
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct simple_child {
|
||||||
|
struct config_item item;
|
||||||
|
int storeme;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct simple_child *to_simple_child(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(item, struct simple_child, item) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute simple_child_attr_storeme = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "storeme",
|
||||||
|
.ca_mode = S_IRUGO | S_IWUSR,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *simple_child_attrs[] = {
|
||||||
|
&simple_child_attr_storeme,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t simple_child_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
ssize_t count;
|
||||||
|
struct simple_child *simple_child = to_simple_child(item);
|
||||||
|
|
||||||
|
count = sprintf(page, "%d\n", simple_child->storeme);
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static ssize_t simple_child_attr_store(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
const char *page, size_t count)
|
||||||
|
{
|
||||||
|
struct simple_child *simple_child = to_simple_child(item);
|
||||||
|
unsigned long tmp;
|
||||||
|
char *p = (char *) page;
|
||||||
|
|
||||||
|
tmp = simple_strtoul(p, &p, 10);
|
||||||
|
if (!p || (*p && (*p != '\n')))
|
||||||
|
return -EINVAL;
|
||||||
|
|
||||||
|
if (tmp > INT_MAX)
|
||||||
|
return -ERANGE;
|
||||||
|
|
||||||
|
simple_child->storeme = tmp;
|
||||||
|
|
||||||
|
return count;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void simple_child_release(struct config_item *item)
|
||||||
|
{
|
||||||
|
kfree(to_simple_child(item));
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations simple_child_item_ops = {
|
||||||
|
.release = simple_child_release,
|
||||||
|
.show_attribute = simple_child_attr_show,
|
||||||
|
.store_attribute = simple_child_attr_store,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type simple_child_type = {
|
||||||
|
.ct_item_ops = &simple_child_item_ops,
|
||||||
|
.ct_attrs = simple_child_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
struct simple_children {
|
||||||
|
struct config_group group;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline struct simple_children *to_simple_children(struct config_item *item)
|
||||||
|
{
|
||||||
|
return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
|
||||||
|
{
|
||||||
|
struct simple_child *simple_child;
|
||||||
|
|
||||||
|
simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
|
||||||
|
if (!simple_child)
|
||||||
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
config_item_init_type_name(&simple_child->item, name,
|
||||||
|
&simple_child_type);
|
||||||
|
|
||||||
|
simple_child->storeme = 0;
|
||||||
|
|
||||||
|
return &simple_child->item;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute simple_children_attr_description = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "description",
|
||||||
|
.ca_mode = S_IRUGO,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *simple_children_attrs[] = {
|
||||||
|
&simple_children_attr_description,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t simple_children_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[02-simple-children]\n"
|
||||||
|
"\n"
|
||||||
|
"This subsystem allows the creation of child config_items. These\n"
|
||||||
|
"items have only one attribute that is readable and writeable.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static void simple_children_release(struct config_item *item)
|
||||||
|
{
|
||||||
|
kfree(to_simple_children(item));
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations simple_children_item_ops = {
|
||||||
|
.release = simple_children_release,
|
||||||
|
.show_attribute = simple_children_attr_show,
|
||||||
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Note that, since no extra work is required on ->drop_item(),
|
||||||
|
* no ->drop_item() is provided.
|
||||||
|
*/
|
||||||
|
static struct configfs_group_operations simple_children_group_ops = {
|
||||||
|
.make_item = simple_children_make_item,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type simple_children_type = {
|
||||||
|
.ct_item_ops = &simple_children_item_ops,
|
||||||
|
.ct_group_ops = &simple_children_group_ops,
|
||||||
|
.ct_attrs = simple_children_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_subsystem simple_children_subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "02-simple-children",
|
||||||
|
.ci_type = &simple_children_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* 03-group-children
|
||||||
|
*
|
||||||
|
* This example reuses the simple_children group from above. However,
|
||||||
|
* the simple_children group is not the subsystem itself, it is a
|
||||||
|
* child of the subsystem. Creation of a group in the subsystem creates
|
||||||
|
* a new simple_children group. That group can then have simple_child
|
||||||
|
* children of its own.
|
||||||
|
*/
|
||||||
|
|
||||||
|
static struct config_group *group_children_make_group(struct config_group *group, const char *name)
|
||||||
|
{
|
||||||
|
struct simple_children *simple_children;
|
||||||
|
|
||||||
|
simple_children = kzalloc(sizeof(struct simple_children),
|
||||||
|
GFP_KERNEL);
|
||||||
|
if (!simple_children)
|
||||||
|
return ERR_PTR(-ENOMEM);
|
||||||
|
|
||||||
|
config_group_init_type_name(&simple_children->group, name,
|
||||||
|
&simple_children_type);
|
||||||
|
|
||||||
|
return &simple_children->group;
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_attribute group_children_attr_description = {
|
||||||
|
.ca_owner = THIS_MODULE,
|
||||||
|
.ca_name = "description",
|
||||||
|
.ca_mode = S_IRUGO,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_attribute *group_children_attrs[] = {
|
||||||
|
&group_children_attr_description,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static ssize_t group_children_attr_show(struct config_item *item,
|
||||||
|
struct configfs_attribute *attr,
|
||||||
|
char *page)
|
||||||
|
{
|
||||||
|
return sprintf(page,
|
||||||
|
"[03-group-children]\n"
|
||||||
|
"\n"
|
||||||
|
"This subsystem allows the creation of child config_groups. These\n"
|
||||||
|
"groups are like the subsystem simple-children.\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
static struct configfs_item_operations group_children_item_ops = {
|
||||||
|
.show_attribute = group_children_attr_show,
|
||||||
|
};
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Note that, since no extra work is required on ->drop_item(),
|
||||||
|
* no ->drop_item() is provided.
|
||||||
|
*/
|
||||||
|
static struct configfs_group_operations group_children_group_ops = {
|
||||||
|
.make_group = group_children_make_group,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct config_item_type group_children_type = {
|
||||||
|
.ct_item_ops = &group_children_item_ops,
|
||||||
|
.ct_group_ops = &group_children_group_ops,
|
||||||
|
.ct_attrs = group_children_attrs,
|
||||||
|
.ct_owner = THIS_MODULE,
|
||||||
|
};
|
||||||
|
|
||||||
|
static struct configfs_subsystem group_children_subsys = {
|
||||||
|
.su_group = {
|
||||||
|
.cg_item = {
|
||||||
|
.ci_namebuf = "03-group-children",
|
||||||
|
.ci_type = &group_children_type,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
};
|
||||||
|
|
||||||
|
/* ----------------------------------------------------------------- */
|
||||||
|
|
||||||
|
/*
|
||||||
|
* We're now done with our subsystem definitions.
|
||||||
|
* For convenience in this module, here's a list of them all. It
|
||||||
|
* allows the init function to easily register them. Most modules
|
||||||
|
* will only have one subsystem, and will only call register_subsystem
|
||||||
|
* on it directly.
|
||||||
|
*/
|
||||||
|
static struct configfs_subsystem *example_subsys[] = {
|
||||||
|
&childless_subsys.subsys,
|
||||||
|
&simple_children_subsys,
|
||||||
|
&group_children_subsys,
|
||||||
|
NULL,
|
||||||
|
};
|
||||||
|
|
||||||
|
static int __init configfs_example_init(void)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
int i;
|
||||||
|
struct configfs_subsystem *subsys;
|
||||||
|
|
||||||
|
for (i = 0; example_subsys[i]; i++) {
|
||||||
|
subsys = example_subsys[i];
|
||||||
|
|
||||||
|
config_group_init(&subsys->su_group);
|
||||||
|
mutex_init(&subsys->su_mutex);
|
||||||
|
ret = configfs_register_subsystem(subsys);
|
||||||
|
if (ret) {
|
||||||
|
printk(KERN_ERR "Error %d while registering subsystem %s\n",
|
||||||
|
ret,
|
||||||
|
subsys->su_group.cg_item.ci_namebuf);
|
||||||
|
goto out_unregister;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
out_unregister:
|
||||||
|
for (i--; i >= 0; i--) {
|
||||||
|
configfs_unregister_subsystem(example_subsys[i]);
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void __exit configfs_example_exit(void)
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
|
||||||
|
for (i = 0; example_subsys[i]; i++) {
|
||||||
|
configfs_unregister_subsystem(example_subsys[i]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module_init(configfs_example_init);
|
||||||
|
module_exit(configfs_example_exit);
|
||||||
|
MODULE_LICENSE("GPL");
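The `storeme` write handlers above parse a decimal value, reject trailing garbage, and refuse values that do not fit in an `int`. The following is a rough userspace sketch of the same checks. `parse_storeme()` is a hypothetical name, and `strtoul()` stands in for the kernel's `simple_strtoul()`, which is not available outside the kernel. The two may differ in details such as leading whitespace or sign handling, so treat this as an illustration of the validation order only.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of the storeme write handlers.
 * Returns 0 and stores the value in *out on success, or a negative
 * errno value on failure (in which case *out is left untouched).
 */
static int parse_storeme(const char *page, int *out)
{
	unsigned long tmp;
	char *end;

	tmp = strtoul(page, &end, 10);

	/* Only an optional newline may follow the digits. */
	if (*end && *end != '\n')
		return -EINVAL;

	/* The attribute is stored in an int. */
	if (tmp > INT_MAX)
		return -ERANGE;

	*out = (int)tmp;
	return 0;
}
```

Under these assumptions, `"42\n"` is accepted, `"42abc"` fails with `-EINVAL`, and a value too large for an `int` fails with `-ERANGE`.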
|
@@ -13,72 +13,99 @@ Mailing list: linux-ext4@vger.kernel.org
|
|||||||
1. Quick usage instructions:
|
1. Quick usage instructions:
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
- Grab updated e2fsprogs from
|
- Compile and install the latest version of e2fsprogs (as of this
|
||||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
|
writing version 1.41) from:
|
||||||
This is a patchset on top of e2fsprogs-1.39, which can be found at
|
|
||||||
|
http://sourceforge.net/project/showfiles.php?group_id=2406
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
|
ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
|
||||||
|
|
||||||
- It's still mke2fs -j /dev/hda1
|
or grab the latest git repository from:
|
||||||
|
|
||||||
- mount /dev/hda1 /wherever -t ext4dev
|
git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
|
||||||
|
|
||||||
- To enable extents,
|
- Note that it is highly important to install the mke2fs.conf file
|
||||||
|
that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If
|
||||||
|
you have edited the /etc/mke2fs.conf file installed on your system,
|
||||||
|
you will need to merge your changes with the version from e2fsprogs
|
||||||
|
1.41.x.
|
||||||
|
|
||||||
mount /dev/hda1 /wherever -t ext4dev -o extents
|
- Create a new filesystem using the ext4 filesystem type:
|
||||||
|
|
||||||
- The filesystem is compatible with the ext3 driver until you add a file
|
# mke2fs -t ext4 /dev/hda1
|
||||||
which has extents (ie: `mount -o extents', then create a file).
|
|
||||||
|
|
||||||
NOTE: The "extents" mount flag is temporary. It will soon go away and
|
Or configure an existing ext3 filesystem to support extents and set
|
||||||
extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
|
the test_fs flag to indicate that it's ok for an in-development
|
||||||
|
filesystem to touch this filesystem:
|
||||||
|
|
||||||
|
# tune2fs -O extents -E test_fs /dev/hda1
|
||||||
|
|
||||||
|
If the filesystem was created with 128 byte inodes, it can be
|
||||||
|
converted to use 256 byte for greater efficiency via:
|
||||||
|
|
||||||
|
# tune2fs -I 256 /dev/hda1
|
||||||
|
|
||||||
|
(Note: we currently do not have tools to convert an ext4
|
||||||
|
filesystem back to ext3; so please do not try this on production
|
||||||
|
filesystems.)
|
||||||
|
|
||||||
|
- Mounting:
|
||||||
|
|
||||||
|
# mount -t ext4 /dev/hda1 /wherever
|
||||||
|
|
||||||
- When comparing performance with other filesystems, remember that
|
- When comparing performance with other filesystems, remember that
|
||||||
ext3/4 by default offers higher data integrity guarantees than most. So
|
ext3/4 by default offers higher data integrity guarantees than most.
|
||||||
when comparing with a metadata-only journalling filesystem, use `mount -o
|
So when comparing with a metadata-only journalling filesystem, such
|
||||||
data=writeback'. And you might as well use `mount -o nobh' too along
|
as ext3, use `mount -o data=writeback'. And you might as well use
|
||||||
with it. Making the journal larger than the mke2fs default often helps
|
`mount -o nobh' too along with it. Making the journal larger than
|
||||||
performance with metadata-intensive workloads.
|
the mke2fs default often helps performance with metadata-intensive
|
||||||
|
workloads.
|
||||||
|
|
||||||
2. Features
|
2. Features
|
||||||
===========
|
===========
|
||||||
|
|
||||||
2.1 Currently available
|
2.1 Currently available
|
||||||
|
|
||||||
* ability to use filesystems > 16TB
|
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
|
||||||
* extent format reduces metadata overhead (RAM, IO for access, transactions)
|
* extent format reduces metadata overhead (RAM, IO for access, transactions)
|
||||||
* extent format more robust in face of on-disk corruption due to magics,
|
* extent format more robust in face of on-disk corruption due to magics,
|
||||||
* internal redundancy in tree
|
* internal redundancy in tree
|
||||||
|
* improved file allocation (multi-block alloc)
|
||||||
2.1 Previously available, soon to be enabled by default by "mkefs.ext4":
|
* fix 32000 subdirectory limit
|
||||||
|
* nsec timestamps for mtime, atime, ctime, create time
|
||||||
* dir_index and resize inode will be on by default
|
* inode version field on disk (NFSv4, Lustre)
|
||||||
* large inodes will be used by default for fast EAs, nsec timestamps, etc
|
* reduced e2fsck time via uninit_bg feature
|
||||||
|
* journal checksumming for robustness, performance
|
||||||
|
* persistent file preallocation (e.g for streaming media, databases)
|
||||||
|
* ability to pack bitmaps and inode tables into larger virtual groups via the
|
||||||
|
flex_bg feature
|
||||||
|
* large file support
|
||||||
|
* Inode allocation using large virtual block groups via flex_bg
|
||||||
|
* delayed allocation
|
||||||
|
* large block (up to pagesize) support
|
||||||
|
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
|
||||||
|
the ordering)
|
||||||
|
|
||||||
2.2 Candidate features for future inclusion
|
2.2 Candidate features for future inclusion
|
||||||
|
|
||||||
There are several under discussion, whether they all make it in is
|
* Online defrag (patches available but not well tested)
|
||||||
partly a function of how much time everyone has to work on them:
|
* reduced mke2fs time via lazy itable initialization in conjunction with
|
||||||
|
the uninit_bg feature (capability to do this is available in e2fsprogs
|
||||||
|
but a kernel thread to do lazy zeroing of unused inode table blocks
|
||||||
|
after filesystem is first mounted is required for safety)
|
||||||
|
|
||||||
* improved file allocation (multi-block alloc, delayed alloc; basically done)
|
There are several others under discussion, whether they all make it in is
|
||||||
* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
|
partly a function of how much time everyone has to work on them. Features like
|
||||||
* nsec timestamps for mtime, atime, ctime, create time (patch exists,
|
metadata checksumming have been discussed and planned for a bit but no patches
|
||||||
needs some e2fsck work)
|
exist yet so I'm not sure they're in the near-term roadmap.
|
||||||
* inode version field on disk (NFSv4, Lustre; prototype exists)
|
|
||||||
* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
|
|
||||||
* journal checksumming for robustness, performance (prototype exists)
|
|
||||||
* persistent file preallocation (e.g for streaming media, databases)
|
|
||||||
|
|
||||||
Features like metadata checksumming have been discussed and planned for
|
The big performance win will come with mballoc, delalloc and flex_bg
|
||||||
a bit but no patches exist yet so I'm not sure they're in the near-term
|
grouping of bitmaps and inode tables. Some test results available here:
|
||||||
roadmap.
|
|
||||||
|
|
||||||
The big performance win will come with mballoc and delalloc. CFS has
|
- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
|
||||||
been using mballoc for a few years already with Lustre, and IBM + Bull
|
- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
|
||||||
did a lot of benchmarking on it. The reason it isn't in the first set of
|
|
||||||
patches is partly a manageability issue, and partly because it doesn't
|
|
||||||
directly affect the on-disk format (outside of much better allocation)
|
|
||||||
so it isn't critical to get into the first round of changes. I believe
|
|
||||||
Alex is working on a new set of patches right now.
|
|
||||||
|
|
||||||
3. Options
|
3. Options
|
||||||
==========
|
==========
|
||||||
@@ -150,6 +177,11 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in
|
|||||||
your disks are battery-backed in one way or another,
|
your disks are battery-backed in one way or another,
|
||||||
disabling barriers may safely improve performance.
|
disabling barriers may safely improve performance.
|
||||||
|
|
||||||
|
inode_readahead=n This tuning parameter controls the maximum
|
||||||
|
number of inode table blocks that ext4's inode
|
||||||
|
table readahead algorithm will pre-read into
|
||||||
|
the buffer cache. The default value is 32 blocks.
|
||||||
|
|
||||||
orlov (*) This enables the new Orlov block allocator. It is
|
orlov (*) This enables the new Orlov block allocator. It is
|
||||||
enabled by default.
|
enabled by default.
|
||||||
|
|
||||||
@@ -191,6 +223,11 @@ errors=remount-ro(*) Remount the filesystem read-only on an error.
|
|||||||
errors=continue Keep going on a filesystem error.
|
errors=continue Keep going on a filesystem error.
|
||||||
errors=panic Panic and halt the machine if an error occurs.
|
errors=panic Panic and halt the machine if an error occurs.
|
||||||
|
|
||||||
|
data_err=ignore(*) Just print an error message if an error occurs
|
||||||
|
in a file data buffer in ordered mode.
|
||||||
|
data_err=abort Abort the journal if an error occurs in a file
|
||||||
|
data buffer in ordered mode.
|
||||||
|
|
||||||
grpid Give objects the same group ID as their creator.
|
grpid Give objects the same group ID as their creator.
|
||||||
bsdgroups
|
bsdgroups
|
||||||
|
|
||||||
@@ -222,9 +259,12 @@ stripe=n Number of filesystem blocks that mballoc will try
|
|||||||
to use for allocation size and alignment. For RAID5/6
|
to use for allocation size and alignment. For RAID5/6
|
||||||
systems this should be the number of data
|
systems this should be the number of data
|
||||||
disks * RAID chunk size in file system blocks.
|
disks * RAID chunk size in file system blocks.
|
||||||
|
delalloc (*) Deferring block allocation until write-out time.
|
||||||
|
nodelalloc Disable delayed allocation. Blocks are allocated
|
||||||
|
when data is copied from user to page cache.
|
||||||
|
|
||||||
Data Mode
|
Data Mode
|
||||||
---------
|
=========
|
||||||
There are 3 different data modes:
|
There are 3 different data modes:
|
||||||
|
|
||||||
* writeback mode
|
* writeback mode
|
||||||
@@ -236,10 +276,10 @@ typically provide the best ext4 performance.
|
|||||||
|
|
||||||
* ordered mode
|
* ordered mode
|
||||||
In data=ordered mode, ext4 only officially journals metadata, but it logically
|
In data=ordered mode, ext4 only officially journals metadata, but it logically
|
||||||
groups metadata and data blocks into a single unit called a transaction. When
|
groups metadata information related to data changes with the data blocks into a
|
||||||
it's time to write the new metadata out to disk, the associated data blocks
|
single unit called a transaction. When it's time to write the new metadata
|
||||||
are written first. In general, this mode performs slightly slower than
|
out to disk, the associated data blocks are written first. In general,
|
||||||
writeback but significantly faster than journal mode.
|
this mode performs slightly slower than writeback but significantly faster than journal mode.
|
||||||
|
|
||||||
* journal mode
|
* journal mode
|
||||||
data=journal mode provides full data and metadata journaling. All new data is
|
data=journal mode provides full data and metadata journaling. All new data is
|
||||||
@@ -247,7 +287,8 @@ written to the journal first, and then to its final location.
|
|||||||
In the event of a crash, the journal can be replayed, bringing both data and
|
In the event of a crash, the journal can be replayed, bringing both data and
|
||||||
metadata into a consistent state. This mode is the slowest except when data
|
metadata into a consistent state. This mode is the slowest except when data
|
||||||
needs to be read from and written to disk at the same time where it
|
needs to be read from and written to disk at the same time where it
|
||||||
outperforms all other modes.
|
outperforms all other modes. Currently ext4 does not have delayed
|
||||||
|
allocation support if this data journalling mode is selected.
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
@@ -256,7 +297,8 @@ kernel source: <file:fs/ext4/>
|
|||||||
<file:fs/jbd2/>
|
<file:fs/jbd2/>
|
||||||
|
|
||||||
programs: http://e2fsprogs.sourceforge.net/
|
programs: http://e2fsprogs.sourceforge.net/
|
||||||
http://ext2resize.sourceforge.net
|
|
||||||
|
|
||||||
useful links: http://fedoraproject.org/wiki/ext3-devel
|
useful links: http://fedoraproject.org/wiki/ext3-devel
|
||||||
http://www.bullopensource.org/ext4/
|
http://www.bullopensource.org/ext4/
|
||||||
|
http://ext4.wiki.kernel.org/index.php/Main_Page
|
||||||
|
http://fedoraproject.org/wiki/Features/Ext4
|
||||||
|
228
Documentation/filesystems/fiemap.txt
Normal file
@@ -0,0 +1,228 @@
|
|||||||
|
============
|
||||||
|
Fiemap Ioctl
|
||||||
|
============
|
||||||
|
|
||||||
|
The fiemap ioctl is an efficient method for userspace to get file
|
||||||
|
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
|
||||||
|
returns a list of extents.
|
||||||
|
|
||||||
|
|
||||||
|
Request Basics
|
||||||
|
--------------
|
||||||
|
|
||||||
|
A fiemap request is encoded within struct fiemap:
|
||||||
|
|
||||||
|
struct fiemap {
|
||||||
|
__u64 fm_start; /* logical offset (inclusive) at
|
||||||
|
* which to start mapping (in) */
|
||||||
|
__u64 fm_length; /* logical length of mapping which
|
||||||
|
* userspace cares about (in) */
|
||||||
|
__u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */
|
||||||
|
__u32 fm_mapped_extents; /* number of extents that were
|
||||||
|
* mapped (out) */
|
||||||
|
__u32 fm_extent_count; /* size of fm_extents array (in) */
|
||||||
|
__u32 fm_reserved;
|
||||||
|
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
fm_start and fm_length specify the logical range within the file
|
||||||
|
which the process would like mappings for. Extents returned mirror
|
||||||
|
those on disk - that is, the logical offset of the 1st returned extent
|
||||||
|
may start before fm_start, and the range covered by the last returned
|
||||||
|
extent may end after fm_length. All offsets and lengths are in bytes.
|
||||||
|
|
||||||
|
Certain flags to modify the way in which mappings are looked up can be
|
||||||
|
set in fm_flags. If the kernel doesn't understand some particular
|
||||||
|
flags, it will return EBADR and the contents of fm_flags will contain
|
||||||
|
the set of flags which caused the error. If the kernel is compatible
|
||||||
|
with all flags passed, the contents of fm_flags will be unmodified.
|
||||||
|
It is up to userspace to determine whether rejection of a particular
|
||||||
|
flag is fatal to its operation. This scheme is intended to allow the
|
||||||
|
fiemap interface to grow in the future but without losing
|
||||||
|
compatibility with old software.
|
||||||
|
|
||||||
|
fm_extent_count specifies the number of elements in the fm_extents[] array
|
||||||
|
that can be used to return extents. If fm_extent_count is zero, then the
|
||||||
|
fm_extents[] array is ignored (no extents will be returned), and the
|
||||||
|
fm_mapped_extents count will hold the number of extents needed in
|
||||||
|
fm_extents[] to hold the file's current mapping. Note that there is
|
||||||
|
nothing to prevent the file from changing between calls to FIEMAP.
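The request layout and the count-then-fetch use of fm_extent_count
described above can be sketched from userspace roughly as follows. This
is only an illustration under stated assumptions: the ioctl request name
FS_IOC_FIEMAP (from <linux/fs.h>) and the FIEMAP_MAX_OFFSET constant
(from <linux/fiemap.h>) are not named in this text, and the helper names
fiemap_request_new() and fiemap_map_whole_file() are hypothetical.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP (assumed request name) */
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent */

/*
 * Hypothetical helper: allocate a zeroed request with room for
 * extent_count entries in the trailing fm_extents[] array.
 */
struct fiemap *fiemap_request_new(__u64 start, __u64 length,
				  __u32 extent_count)
{
	size_t size = sizeof(struct fiemap) +
		      (size_t)extent_count * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, size);

	if (!fm)
		return NULL;
	fm->fm_start = start;
	fm->fm_length = length;
	fm->fm_extent_count = extent_count;
	return fm;
}

/*
 * Hypothetical helper: map a whole file with two calls. The first call
 * passes fm_extent_count == 0 so that only fm_mapped_extents is filled
 * in; the second call supplies an array of that size. Returns NULL on
 * failure; the caller frees the result.
 */
struct fiemap *fiemap_map_whole_file(int fd)
{
	struct fiemap *probe, *fm;
	__u32 needed;

	probe = fiemap_request_new(0, FIEMAP_MAX_OFFSET, 0);
	if (!probe)
		return NULL;
	if (ioctl(fd, FS_IOC_FIEMAP, probe) < 0) {
		free(probe);
		return NULL;
	}
	needed = probe->fm_mapped_extents;
	free(probe);

	fm = fiemap_request_new(0, FIEMAP_MAX_OFFSET, needed);
	if (!fm)
		return NULL;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return NULL;
	}
	return fm;
}
```

Because the file may change between the two calls, a caller of such a
helper should not assume the second result is complete; checking for
FIEMAP_EXTENT_LAST on the final returned extent is the safer test.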
|
||||||
|
|
||||||
|
The following flags can be set in fm_flags:
|
||||||
|
|
||||||
|
* FIEMAP_FLAG_SYNC
|
||||||
|
If this flag is set, the kernel will sync the file before mapping extents.
|
||||||
|
|
||||||
|
* FIEMAP_FLAG_XATTR
|
||||||
|
If this flag is set, the extents returned will describe the inode's
|
||||||
|
extended attribute lookup tree, instead of its data tree.
|
||||||
|
|
||||||
|
|
||||||
|
Extent Mapping
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Extent information is returned within the embedded fm_extents array
|
||||||
|
which userspace must allocate along with the fiemap structure. The
|
||||||
|
number of elements in the fm_extents[] array should be passed via
|
||||||
|
fm_extent_count. The number of extents mapped by kernel will be
|
||||||
|
returned via fm_mapped_extents. If the number of fiemap_extents
|
||||||
|
allocated is less than would be required to map the requested range,
|
||||||
|
the maximum number of extents that can be mapped in the fm_extents[]
|
||||||
|
array will be returned and fm_mapped_extents will be equal to
|
||||||
|
fm_extent_count. In that case, the last extent in the array will not
|
||||||
|
complete the requested range and will not have the FIEMAP_EXTENT_LAST
|
||||||
|
flag set (see the next section on extent flags).
|
||||||
|
|
||||||
|
Each extent is described by a single fiemap_extent structure as
|
||||||
|
returned in fm_extents.
|
||||||
|
|
||||||
|
struct fiemap_extent {
|
||||||
|
__u64 fe_logical; /* logical offset in bytes for the start of
|
||||||
|
* the extent */
|
||||||
|
__u64 fe_physical; /* physical offset in bytes for the start
|
||||||
|
* of the extent */
|
||||||
|
__u64 fe_length; /* length in bytes for the extent */
|
||||||
|
__u64 fe_reserved64[2];
|
||||||
|
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
|
||||||
|
__u32 fe_reserved[3];
|
||||||
|
};
|
||||||
|
|
||||||
|
All offsets and lengths are in bytes and mirror those on disk. It is valid
|
||||||
|
for an extent's logical offset to start before the request or its logical
|
||||||
|
length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
|
||||||
|
returned, fe_logical, fe_physical, and fe_length will be aligned to the
|
||||||
|
block size of the file system. With the exception of extents flagged as
|
||||||
|
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
|
||||||
|
|
||||||
|
The fe_flags field contains flags which describe the extent returned.
|
||||||
|
A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
|
||||||
|
the file so that the process making fiemap calls can determine when no
|
||||||
|
more extents are available, without having to call the ioctl again.
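As a sketch of how a caller might use FIEMAP_EXTENT_LAST when walking a
large file in several batches, the function below decides where the next
request should start. The name fiemap_next_start() is hypothetical, and
resuming at fe_logical + fe_length of the last returned extent is an
assumption of this example rather than something this text prescribes.

```c
#include <stdbool.h>
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_EXTENT_LAST */

/*
 * Hypothetical helper: inspect a completed request and report whether
 * another batch is needed. On true, *next_start holds the logical byte
 * offset at which the following request should begin.
 */
bool fiemap_next_start(const struct fiemap *fm, __u64 *next_start)
{
	const struct fiemap_extent *last;

	/* Nothing was mapped, so there is nothing further to fetch. */
	if (fm->fm_mapped_extents == 0)
		return false;

	last = &fm->fm_extents[fm->fm_mapped_extents - 1];

	/* The final extent of the file has been seen: stop here. */
	if (last->fe_flags & FIEMAP_EXTENT_LAST)
		return false;

	/* Assumed policy: continue right after the last mapped extent. */
	*next_start = last->fe_logical + last->fe_length;
	return true;
}
```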
|
||||||
|
|
||||||
|
Some flags are intentionally vague and will always be set in the
|
||||||
|
presence of other more specific flags. This way a program looking for
|
||||||
|
a general property does not have to know all existing and future flags
|
||||||
|
which imply that property.
|
||||||
|
|
||||||
|
For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
|
||||||
|
are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
|
||||||
|
for inline or tail-packed data can key on the specific flag. Software
|
||||||
|
which simply cares not to try operating on non-aligned extents
|
||||||
|
however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
|
||||||
|
worry about all present and future flags which might imply unaligned
|
||||||
|
data. Note that the opposite is not true - it would be valid for
|
||||||
|
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_LAST
|
||||||
|
This is the last extent in the file. A mapping attempt past this
|
||||||
|
extent will return nothing.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_UNKNOWN
|
||||||
|
The location of this extent is currently unknown. This may indicate
|
||||||
|
the data is stored on an inaccessible volume or that no storage has
|
||||||
|
been allocated for the file yet.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_DELALLOC
|
||||||
|
- This will also set FIEMAP_EXTENT_UNKNOWN.
|
||||||
|
Delayed allocation - while there is data for this extent, its
|
||||||
|
physical location has not been allocated yet.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_ENCODED
|
||||||
|
This extent does not consist of plain filesystem blocks but is
|
||||||
|
encoded (e.g. encrypted or compressed). Reading the data in this
|
||||||
|
extent via I/O to the block device will have undefined results.
|
||||||
|
|
||||||
|
Note that it is *always* undefined to try to update the data
|
||||||
|
in-place by writing to the indicated location without the
|
||||||
|
assistance of the filesystem, or to access the data using the
|
||||||
|
information returned by the FIEMAP interface while the filesystem
|
||||||
|
is mounted. In other words, user applications may only read the
|
||||||
|
extent data via I/O to the block device while the filesystem is
|
||||||
|
unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
|
||||||
|
clear; user applications must not try reading or writing to the
|
||||||
|
filesystem via the block device under any other circumstances.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_DATA_ENCRYPTED
|
||||||
|
- This will also set FIEMAP_EXTENT_ENCODED
|
||||||
|
The data in this extent has been encrypted by the file system.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
|
Extent offsets and length are not guaranteed to be block aligned.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_DATA_INLINE
|
||||||
|
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
|
Data is located within a meta data block.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_DATA_TAIL
|
||||||
|
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||||
|
Data is packed into a block with data from other files.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_UNWRITTEN
|
||||||
|
Unwritten extent - the extent is allocated but its data has not been
|
||||||
|
initialized. This indicates the extent's data will be all zero if read
|
||||||
|
through the filesystem but the contents are undefined if read directly from
|
||||||
|
the device.
|
||||||
|
|
||||||
|
* FIEMAP_EXTENT_MERGED
|
||||||
|
This will be set when a file does not support extents, i.e., it uses a block
|
||||||
|
based addressing scheme. Since returning an extent for each block back to
|
||||||
|
userspace would be highly inefficient, the kernel will try to merge most
|
||||||
|
adjacent blocks into 'extents'.
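The "general flag" convention described above means a program can test a
small set of broad flags instead of every specific one. A minimal sketch,
assuming the flag constants are provided by <linux/fiemap.h>; the
function name extent_is_plain_aligned() and the choice of which general
flags to reject are hypothetical:

```c
#include <stdbool.h>
#include <linux/fiemap.h>  /* FIEMAP_EXTENT_* flag constants */

/*
 * Hypothetical helper: true if the extent has a known location, is not
 * encoded, and is block aligned. Only the general flags are tested;
 * specific flags such as FIEMAP_EXTENT_DELALLOC, DATA_ENCRYPTED,
 * DATA_INLINE and DATA_TAIL are covered because each of them also sets
 * one of the general flags checked here.
 */
bool extent_is_plain_aligned(const struct fiemap_extent *fe)
{
	const __u32 general = FIEMAP_EXTENT_UNKNOWN |
			      FIEMAP_EXTENT_ENCODED |
			      FIEMAP_EXTENT_NOT_ALIGNED;

	return (fe->fe_flags & general) == 0;
}
```

A check like this says nothing about whether direct device access is
safe; the restrictions given under FIEMAP_EXTENT_ENCODED still apply.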
|
||||||
|
|
||||||
|
|
||||||
|
VFS -> File System Implementation
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
File systems wishing to support fiemap must implement a ->fiemap callback on
|
||||||
|
their inode_operations structure. The fs ->fiemap call is responsible for
|
||||||
|
defining its set of supported fiemap flags, and calling a helper function on
|
||||||
|
each discovered extent:
|
||||||
|
|
||||||
|
struct inode_operations {
|
||||||
|
...
|
||||||
|
|
||||||
|
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
|
||||||
|
u64 len);
|
||||||
|
|
||||||
|
->fiemap is passed struct fiemap_extent_info which describes the
|
||||||
|
fiemap request:
|
||||||
|
|
||||||
|
struct fiemap_extent_info {
|
||||||
|
unsigned int fi_flags; /* Flags as passed from user */
|
||||||
|
unsigned int fi_extents_mapped; /* Number of mapped extents */
|
||||||
|
unsigned int fi_extents_max; /* Size of fiemap_extent array */
|
||||||
|
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
|
||||||
|
};
|
||||||
|
|
||||||
|
It is intended that the file system should not need to access any of this
|
||||||
|
structure directly.
|
||||||
|
|
||||||
|
|
||||||
|
Flag checking should be done at the beginning of the ->fiemap callback via the
|
||||||
|
fiemap_check_flags() helper:
|
||||||
|
|
||||||
|
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||||
|
|
||||||
|
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
||||||
|
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
||||||
|
fiemap_check_flags finds invalid user flags, it will place the bad values in
|
||||||
|
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from
|
||||||
|
fiemap_check_flags(), it should immediately exit, returning that error back to
|
||||||
|
ioctl_fiemap().
|
||||||
|
|
||||||
|
|
||||||
|
For each extent in the request range, the file system should call
|
||||||
|
the helper function, fiemap_fill_next_extent():
|
||||||
|
|
||||||
|
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
|
||||||
|
u64 phys, u64 len, u32 flags, u32 dev);
|
||||||
|
|
||||||
|
fiemap_fill_next_extent() will use the passed values to populate the
|
||||||
|
next free extent in the fm_extents array. 'General' extent flags will
|
||||||
|
automatically be set from specific flags on behalf of the calling file
|
||||||
|
system so that the userspace API is not broken.
|
||||||
|
|
||||||
|
fiemap_fill_next_extent() returns 0 on success, and 1 when the
|
||||||
|
user-supplied fm_extents array is full. If an error is encountered
|
||||||
|
while copying the extent to user memory, -EFAULT will be returned.
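The three return values above suggest a simple loop shape for a
->fiemap implementation. The following is a userspace mock, not kernel
code: every mock_* name is hypothetical and only imitates the documented
return convention (0 on success, 1 when the array is full, -EFAULT on a
copy failure) so that the control flow can be shown and run in isolation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the kernel structures (hypothetical). */
struct mock_extent {
	uint64_t logical, phys, len;
	uint32_t flags;
};

struct mock_info {
	unsigned int mapped;       /* extents stored so far */
	unsigned int max;          /* capacity of dest[] */
	struct mock_extent *dest;  /* stands in for the user array */
};

/* Imitates the documented return convention of the real helper. */
int mock_fill_next_extent(struct mock_info *info, uint64_t logical,
			  uint64_t phys, uint64_t len, uint32_t flags)
{
	if (info->mapped >= info->max)
		return 1;              /* array already full */
	if (info->dest == NULL)
		return -EFAULT;        /* stands in for a failed copy */
	info->dest[info->mapped++] =
		(struct mock_extent){ logical, phys, len, flags };
	return info->mapped == info->max ? 1 : 0;
}

/* Loop shape: stop on error, stop quietly once the array is full. */
int mock_fs_fiemap(struct mock_info *info,
		   const struct mock_extent *extents, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		int ret = mock_fill_next_extent(info, extents[i].logical,
						extents[i].phys,
						extents[i].len,
						extents[i].flags);
		if (ret < 0)
			return ret;    /* propagate the error */
		if (ret == 1)
			break;         /* full: not an error */
	}
	return 0;
}
```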
|
114
Documentation/filesystems/gfs2-glocks.txt
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
Glock internal locking rules
|
||||||
|
------------------------------
|
||||||
|
|
||||||
|
This documents the basic principles of the glock state machine
|
||||||
|
internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
|
||||||
|
has two main (internal) locks:
|
||||||
|
|
||||||
|
1. A spinlock (gl_spin) which protects the internal state such
|
||||||
|
as gl_state, gl_target and the list of holders (gl_holders)
|
||||||
|
2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
|
||||||
|
threads from making calls to the DLM, etc. at the same time. If a
|
||||||
|
thread takes this lock, it must then call run_queue (usually via the
|
||||||
|
workqueue) when it releases it in order to ensure any pending tasks
|
||||||
|
are completed.
|
||||||
|
|
||||||
|
The gl_holders list contains all the queued lock requests (not
|
||||||
|
just the holders) associated with the glock. If there are any
|
||||||
|
held locks, then they will be contiguous entries at the head
|
||||||
|
of the list. Locks are granted in strictly the order that they
|
||||||
|
are queued, except for those marked LM_FLAG_PRIORITY which are
|
||||||
|
used only during recovery, and even then only for journal locks.
|
||||||
|
|
||||||
|
There are three lock states that users of the glock layer can request,
|
||||||
|
namely shared (SH), deferred (DF) and exclusive (EX). Those translate
|
||||||
|
to the following DLM lock modes:
|
||||||
|
|
||||||
|
Glock mode | DLM lock mode
|
||||||
|
------------------------------
|
||||||
|
UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
|
||||||
|
SH | PR (Protected read)
|
||||||
|
DF | CW (Concurrent write)
|
||||||
|
EX | EX (Exclusive)
|
||||||
|
|
||||||
|
Thus DF is basically a shared mode which is incompatible with the "normal"
|
||||||
|
shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
|
||||||
|
operations. The glocks are basically a lock plus some routines which deal
|
||||||
|
with cache management. The following rules apply for the cache:
|
||||||
|
|
||||||
|
Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
|
||||||
|
--------------------------------------------------------------------------
|
||||||
|
UN | No | No | No | No
|
||||||
|
SH | Yes | Yes | No | No
|
||||||
|
DF | No | Yes | No | No
|
||||||
|
EX | Yes | Yes | Yes | Yes
|
||||||
|
|
||||||
|
These rules are implemented using the various glock operations which
|
||||||
|
are defined for each type of glock. Not all types of glocks use
|
||||||
|
all the modes. Only inode glocks use the DF mode for example.
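The two tables above can be read as a single lookup keyed by glock mode.
The sketch below only restates them as data; the enum, structure and
function names are hypothetical and do not correspond to identifiers in
fs/gfs2.

```c
#include <stdbool.h>

/* Hypothetical names for the four glock modes listed above. */
enum glock_mode { GM_UN, GM_SH, GM_DF, GM_EX };

struct glock_rule {
	const char *dlm_mode;  /* DLM lock mode from the first table */
	bool cache_data;
	bool cache_metadata;
	bool dirty_data;
	bool dirty_metadata;
};

/* One row per mode, copied from the tables above. */
static const struct glock_rule glock_rules[] = {
	[GM_UN] = { "IV/NL", false, false, false, false },
	[GM_SH] = { "PR",    true,  true,  false, false },
	[GM_DF] = { "CW",    false, true,  false, false },
	[GM_EX] = { "EX",    true,  true,  true,  true  },
};

bool glock_may_cache_data(enum glock_mode m)
{
	return glock_rules[m].cache_data;
}

bool glock_may_cache_metadata(enum glock_mode m)
{
	return glock_rules[m].cache_metadata;
}

/* Dirty data and dirty metadata are only permitted together in EX. */
bool glock_may_dirty(enum glock_mode m)
{
	return glock_rules[m].dirty_data && glock_rules[m].dirty_metadata;
}
```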
|
||||||
|
|
||||||
|
Table of glock operations and per type constants:
|
||||||
|
|
||||||
|
Field | Purpose
|
||||||
|
----------------------------------------------------------------------------
|
||||||
|
go_xmote_th | Called before remote state change (e.g. to sync dirty data)
|
||||||
|
go_xmote_bh | Called after remote state change (e.g. to refill cache)
|
||||||
|
go_inval | Called if remote state change requires invalidating the cache
|
||||||
|
go_demote_ok | Returns boolean value of whether it's ok to demote a glock
|
||||||
|
| (e.g. checks timeout, and that there is no cached data)
|
||||||
|
go_lock | Called for the first local holder of a lock
|
||||||
|
go_unlock | Called on the final local unlock of a lock
|
||||||
|
go_dump | Called to print content of object for debugfs file, or on
|
||||||
|
| error to dump glock to the log.
|
||||||
|
go_type | The type of the glock, LM_TYPE_.....
|
||||||
|
go_min_hold_time | The minimum hold time
|
||||||
|
|
||||||
|
The minimum hold time for each lock is the time after a remote lock
|
||||||
|
grant for which we ignore remote demote requests. This is in order to
|
||||||
|
prevent a situation where locks are being bounced around the cluster
|
||||||
|
from node to node with none of the nodes making any progress. This
|
||||||
|
tends to show up most with shared mmaped files which are being written
|
||||||
|
to by multiple nodes. By delaying the demotion in response to a
|
||||||
|
remote callback, that gives the userspace program time to make
|
||||||
|
some progress before the pages are unmapped.
|
||||||
|
|
||||||
|
There is a plan to try and remove the go_lock and go_unlock callbacks
|
||||||
|
if possible, in order to try and speed up the fast path through the locking.
|
||||||
|
Also, eventually we hope to make the glock "EX" mode locally shared
|
||||||
|
such that any local locking will be done with the i_mutex as required
|
||||||
|
rather than via the glock.
|
||||||
|
|
||||||
|
Locking rules for glock operations:
|
||||||
|
|
||||||
|
Operation | GLF_LOCK bit lock held | gl_spin spinlock held
|
||||||
|
-----------------------------------------------------------------
|
||||||
|
go_xmote_th | Yes | No
|
||||||
|
go_xmote_bh | Yes | No
|
||||||
|
go_inval | Yes | No
|
||||||
|
go_demote_ok | Sometimes | Yes
|
||||||
|
go_lock | Yes | No
|
||||||
|
go_unlock | Yes | No
|
||||||
|
go_dump | Sometimes | Yes
|
||||||
|
|
||||||
|
N.B. Operations must not drop either the bit lock or the spinlock
|
||||||
|
if it's held on entry. go_dump and go_demote_ok must never block.
|
||||||
|
Note that go_dump will only be called if the glock's state
|
||||||
|
indicates that it is caching uptodate data.
|
||||||
|
|
||||||
|
Glock locking order within GFS2:
|
||||||
|
|
||||||
|
1. i_mutex (if required)
|
||||||
|
2. Rename glock (for rename only)
|
||||||
|
3. Inode glock(s)
|
||||||
|
(Parents before children, inodes at "same level" with same parent in
|
||||||
|
lock number order)
|
||||||
|
4. Rgrp glock(s) (for (de)allocation operations)
|
||||||
|
5. Transaction glock (via gfs2_trans_begin) for non-read operations
|
||||||
|
6. Page lock (always last, very important!)
|
||||||
|
|
||||||
|
There are two glocks per inode. One deals with access to the inode
|
||||||
|
itself (locking order as above), and the other, known as the iopen
|
||||||
|
glock is used in conjunction with the i_nlink field in the inode to
|
||||||
|
determine the lifetime of the inode in question. Locking of inodes
|
||||||
|
is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
|
||||||
|
|
@@ -5,7 +5,7 @@
|
|||||||
################################################################################
|
################################################################################
|
||||||
|
|
||||||
Author: NetApp and Open Grid Computing
|
Author: NetApp and Open Grid Computing
|
||||||
Date: April 15, 2008
|
Date: May 29, 2008
|
||||||
|
|
||||||
Table of Contents
|
Table of Contents
|
||||||
~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~
|
||||||
@@ -60,16 +60,18 @@ Installation
|
|||||||
The procedures described in this document have been tested with
|
The procedures described in this document have been tested with
|
||||||
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
|
||||||
|
|
||||||
- Install nfs-utils-1.1.1 or greater on the client
|
- Install nfs-utils-1.1.2 or greater on the client
|
||||||
|
|
||||||
An NFS/RDMA mount point can only be obtained by using the mount.nfs
|
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
|
||||||
command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs
|
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
|
||||||
you are using, type:
|
version with support for NFS/RDMA mounts, but for various reasons we
|
||||||
|
recommend using nfs-utils-1.1.2 or greater). To see which version of
|
||||||
|
mount.nfs you are using, type:
|
||||||
|
|
||||||
> /sbin/mount.nfs -V
|
$ /sbin/mount.nfs -V
|
||||||
|
|
||||||
If the version is less than 1.1.1 or the command does not exist,
|
If the version is less than 1.1.2 or the command does not exist,
|
||||||
then you will need to install the latest version of nfs-utils.
|
you should install the latest version of nfs-utils.
|
||||||
|
|
||||||
Download the latest package from:
|
Download the latest package from:
|
||||||
|
|
||||||
@@ -77,22 +79,33 @@ Installation
|
|||||||
|
|
||||||
Uncompress the package and follow the installation instructions.
|
Uncompress the package and follow the installation instructions.
|
||||||
|
|
||||||
If you will not be using GSS and NFSv4, the installation process
|
If you will not need the idmapper and gssd executables (you do not need
|
||||||
can be simplified by disabling these features when running configure:
|
these to create an NFS/RDMA enabled mount command), the installation
|
||||||
|
process can be simplified by disabling these features when running
|
||||||
|
configure:
|
||||||
|
|
||||||
> ./configure --disable-gss --disable-nfsv4
|
$ ./configure --disable-gss --disable-nfsv4
|
||||||
|
|
||||||
For more information on this see the package's README and INSTALL files.
|
To build nfs-utils you will need the tcp_wrappers package installed. For
|
||||||
|
more information on this see the package's README and INSTALL files.
|
||||||
|
|
||||||
After building the nfs-utils package, there will be a mount.nfs binary in
|
After building the nfs-utils package, there will be a mount.nfs binary in
|
||||||
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
|
||||||
or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4.
|
or v4 mounts. To initiate a v4 mount, the binary must be called
|
||||||
The standard technique is to create a symlink called mount.nfs4 to mount.nfs.
|
mount.nfs4. The standard technique is to create a symlink called
|
||||||
|
mount.nfs4 to mount.nfs.
|
||||||
|
|
||||||
NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed
|
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
|
||||||
|
|
||||||
|
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||||
|
|
||||||
|
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||||
|
by the system mount command.
|
||||||
|
|
||||||
|
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||||
on the NFS client machine. You do not need this specific version of
|
on the NFS client machine. You do not need this specific version of
|
||||||
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
nfs-utils on the server. Furthermore, only the mount.nfs command from
|
||||||
nfs-utils-1.1.1 is needed on the client.
|
nfs-utils-1.1.2 is needed on the client.
|
||||||
|
|
||||||
- Install a Linux kernel with NFS/RDMA
|
- Install a Linux kernel with NFS/RDMA
|
||||||
|
|
||||||
@@ -156,8 +169,8 @@ Check RDMA and NFS Setup
|
|||||||
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
|
||||||
card:
|
card:
|
||||||
|
|
||||||
> modprobe ib_mthca
|
$ modprobe ib_mthca
|
||||||
> modprobe ib_ipoib
|
$ modprobe ib_ipoib
|
||||||
|
|
||||||
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
|
||||||
running on the network. If your IB switch has an embedded SM, you can
|
running on the network. If your IB switch has an embedded SM, you can
|
||||||
@@ -166,7 +179,7 @@ Check RDMA and NFS Setup
|
|||||||
|
|
||||||
If an SM is running on your network, you should see the following:
|
If an SM is running on your network, you should see the following:
|
||||||
|
|
||||||
> cat /sys/class/infiniband/driverX/ports/1/state
|
$ cat /sys/class/infiniband/driverX/ports/1/state
|
||||||
4: ACTIVE
|
4: ACTIVE
|
||||||
|
|
||||||
where driverX is mthca0, ipath5, ehca3, etc.
|
where driverX is mthca0, ipath5, ehca3, etc.
|
||||||
@@ -174,10 +187,10 @@ Check RDMA and NFS Setup
|
|||||||
To further test the InfiniBand software stack, use IPoIB (this
|
To further test the InfiniBand software stack, use IPoIB (this
|
||||||
assumes you have two IB hosts named host1 and host2):
|
assumes you have two IB hosts named host1 and host2):
|
||||||
|
|
||||||
host1> ifconfig ib0 a.b.c.x
|
host1$ ifconfig ib0 a.b.c.x
|
||||||
host2> ifconfig ib0 a.b.c.y
|
host2$ ifconfig ib0 a.b.c.y
|
||||||
host1> ping a.b.c.y
|
host1$ ping a.b.c.y
|
||||||
host2> ping a.b.c.x
|
host2$ ping a.b.c.x
|
||||||
|
|
||||||
For other device types, follow the appropriate procedures.
|
For other device types, follow the appropriate procedures.
|
||||||
|
|
||||||
@@ -202,11 +215,11 @@ NFS/RDMA Setup
|
|||||||
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
|
||||||
|
|
||||||
The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the
|
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
|
||||||
client's iWARP address(es) for an RNIC.
|
HCA or the client's iWARP address(es) for an RNIC.
|
||||||
|
|
||||||
NOTE: The "insecure" option must be used because the NFS/RDMA client does not
|
NOTE: The "insecure" option must be used because the NFS/RDMA client does
|
||||||
use a reserved port.
|
not use a reserved port.
|
||||||
|
|
||||||
Each time a machine boots:
|
Each time a machine boots:
|
||||||
|
|
||||||
@@ -214,43 +227,45 @@ NFS/RDMA Setup
|
|||||||
|
|
||||||
For InfiniBand using a Mellanox adapter:
|
For InfiniBand using a Mellanox adapter:
|
||||||
|
|
||||||
> modprobe ib_mthca
|
$ modprobe ib_mthca
|
||||||
> modprobe ib_ipoib
|
$ modprobe ib_ipoib
|
||||||
> ifconfig ib0 a.b.c.d
|
$ ifconfig ib0 a.b.c.d
|
||||||
|
|
||||||
NOTE: use unique addresses for the client and server
|
NOTE: use unique addresses for the client and server
|
||||||
|
|
||||||
- Start the NFS server
|
- Start the NFS server
|
||||||
|
|
||||||
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
load the RDMA transport module:
|
kernel config), load the RDMA transport module:
|
||||||
|
|
||||||
> modprobe svcrdma
|
$ modprobe svcrdma
|
||||||
|
|
||||||
Regardless of how the server was built (module or built-in), start the server:
|
Regardless of how the server was built (module or built-in), start the
|
||||||
|
server:
|
||||||
|
|
||||||
> /etc/init.d/nfs start
|
$ /etc/init.d/nfs start
|
||||||
|
|
||||||
or
|
or
|
||||||
|
|
||||||
> service nfs start
|
$ service nfs start
|
||||||
|
|
||||||
Instruct the server to listen on the RDMA transport:
|
Instruct the server to listen on the RDMA transport:
|
||||||
|
|
||||||
> echo rdma 2050 > /proc/fs/nfsd/portlist
|
$ echo rdma 2050 > /proc/fs/nfsd/portlist
|
||||||
|
|
||||||
- On the client system
|
- On the client system
|
||||||
|
|
||||||
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config),
|
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
|
||||||
load the RDMA client module:
|
kernel config), load the RDMA client module:
|
||||||
|
|
||||||
> modprobe xprtrdma.ko
|
$ modprobe xprtrdma.ko
|
||||||
|
|
||||||
Regardless of how the client was built (module or built-in), issue the mount.nfs command:
|
Regardless of how the client was built (module or built-in), use this
|
||||||
|
command to mount the NFS/RDMA server:
|
||||||
|
|
||||||
> /path/to/your/mount.nfs <IPoIB-server-name-or-address>:/<export> /mnt -i -o rdma,port=2050
|
$ mount -o rdma,port=2050 <IPoIB-server-name-or-address>:/<export> /mnt
|
||||||
|
|
||||||
To verify that the mount is using RDMA, run "cat /proc/mounts" and check the
|
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
|
||||||
"proto" field for the given mount.
|
the "proto" field for the given mount.
|
||||||
|
|
||||||
Congratulations! You're using NFS/RDMA!
|
Congratulations! You're using NFS/RDMA!
|
||||||
|
@@ -40,7 +40,7 @@ Web site
|
|||||||
========
|
========
|
||||||
|
|
||||||
There is plenty of additional information on the linux-ntfs web site
|
There is plenty of additional information on the linux-ntfs web site
|
||||||
at http://linux-ntfs.sourceforge.net/
|
at http://www.linux-ntfs.org/
|
||||||
|
|
||||||
The web site has a lot of additional information, such as a comprehensive
|
The web site has a lot of additional information, such as a comprehensive
|
||||||
FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
|
FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
|
||||||
@@ -272,7 +272,7 @@ And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
|
|||||||
For Win2k and later dynamic disks, you can for example use the ldminfo utility
|
For Win2k and later dynamic disks, you can for example use the ldminfo utility
|
||||||
which is part of the Linux LDM tools (the latest version at the time of
|
which is part of the Linux LDM tools (the latest version at the time of
|
||||||
writing is linux-ldm-0.0.8.tar.bz2). You can download it from:
|
writing is linux-ldm-0.0.8.tar.bz2). You can download it from:
|
||||||
http://linux-ntfs.sourceforge.net/downloads.html
|
http://www.linux-ntfs.org/
|
||||||
Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
|
Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
|
||||||
into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You
|
into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You
|
||||||
will find the precompiled (i386) ldminfo utility there. NOTE: You will not be
|
will find the precompiled (i386) ldminfo utility there. NOTE: You will not be
|
||||||
|
@@ -76,3 +76,9 @@ localalloc=8(*) Allows custom localalloc size in MB. If the value is too
|
|||||||
large, the fs will silently revert it to the default.
|
large, the fs will silently revert it to the default.
|
||||||
Localalloc is not enabled for local mounts.
|
Localalloc is not enabled for local mounts.
|
||||||
localflocks This disables cluster aware flock.
|
localflocks This disables cluster aware flock.
|
||||||
|
inode64 Indicates that Ocfs2 is allowed to create inodes at
|
||||||
|
any location in the filesystem, including those which
|
||||||
|
will result in inode numbers occupying more than 32
|
||||||
|
bits of significance.
|
||||||
|
user_xattr (*) Enables Extended User Attributes.
|
||||||
|
nouser_xattr Disables Extended User Attributes.
|
||||||
|
106
Documentation/filesystems/omfs.txt
Normal file
@@ -0,0 +1,106 @@
|
|||||||
|
Optimized MPEG Filesystem (OMFS)
|
||||||
|
|
||||||
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
|
OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
|
||||||
|
and Rio Karma MP3 player. The filesystem is extent-based, utilizing
|
||||||
|
block sizes from 2k to 8k, with hash-based directories. This
|
||||||
|
filesystem driver may be used to read and write disks from these
|
||||||
|
devices.
|
||||||
|
|
||||||
|
Note, it is not recommended that this FS be used in place of a general
|
||||||
|
filesystem for your own streaming media device. Native Linux filesystems
|
||||||
|
will likely perform better.
|
||||||
|
|
||||||
|
More information is available at:
|
||||||
|
|
||||||
|
http://linux-karma.sf.net/
|
||||||
|
|
||||||
|
Various utilities, including mkomfs and omfsck, are included with
|
||||||
|
omfsprogs, available at:
|
||||||
|
|
||||||
|
http://bobcopeland.com/karma/
|
||||||
|
|
||||||
|
Instructions are included in its README.
|
||||||
|
|
||||||
|
Options
|
||||||
|
=======
|
||||||
|
|
||||||
|
OMFS supports the following mount-time options:
|
||||||
|
|
||||||
|
uid=n - make all files owned by specified user
|
||||||
|
gid=n - make all files owned by specified group
|
||||||
|
umask=xxx - set permission umask to xxx
|
||||||
|
fmask=xxx - set umask to xxx for files
|
||||||
|
dmask=xxx - set umask to xxx for directories
|
||||||
|
|
||||||
|
Disk format
|
||||||
|
===========
|
||||||
|
|
||||||
|
OMFS discriminates between "sysblocks" and normal data blocks. The sysblock
|
||||||
|
group consists of super block information, file metadata, directory structures,
|
||||||
|
and extents. Each sysblock has a header containing CRCs of the entire
|
||||||
|
sysblock, and may be mirrored in successive blocks on the disk. A sysblock may
|
||||||
|
have a smaller size than a data block, but since they are both addressed by the
|
||||||
|
same 64-bit block number, any remaining space in the smaller sysblock is
|
||||||
|
unused.
|
||||||
|
|
||||||
|
Sysblock header information:
|
||||||
|
|
||||||
|
struct omfs_header {
|
||||||
|
__be64 h_self; /* FS block where this is located */
|
||||||
|
__be32 h_body_size; /* size of useful data after header */
|
||||||
|
__be16 h_crc; /* crc-ccitt of body_size bytes */
|
||||||
|
char h_fill1[2];
|
||||||
|
u8 h_version; /* version, always 1 */
|
||||||
|
char h_type; /* OMFS_INODE_X */
|
||||||
|
u8 h_magic; /* OMFS_IMAGIC */
|
||||||
|
u8 h_check_xor; /* XOR of header bytes before this */
|
||||||
|
__be32 h_fill2;
|
||||||
|
};
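As a rough illustration of the h_check_xor field, the sketch below XORs every header byte that precedes it. It relies only on the field comment above. The struct name omfs_header_raw and both helper functions are hypothetical, and the big-endian fields are mirrored as byte arrays so that no padding or byte swapping is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-for-byte mirror of the header layout shown above (hypothetical name).
 * Multi-byte big-endian fields are kept as raw byte arrays. */
struct omfs_header_raw {
	uint8_t h_self[8];
	uint8_t h_body_size[4];
	uint8_t h_crc[2];
	uint8_t h_fill1[2];
	uint8_t h_version;
	uint8_t h_type;
	uint8_t h_magic;
	uint8_t h_check_xor;
	uint8_t h_fill2[4];
};

/* XOR together every header byte located before h_check_xor. */
static uint8_t omfs_header_xor(const struct omfs_header_raw *h)
{
	const uint8_t *p = (const uint8_t *)h;
	size_t n = offsetof(struct omfs_header_raw, h_check_xor);
	uint8_t x = 0;
	size_t i;

	for (i = 0; i < n; i++)
		x ^= p[i];
	return x;
}

/* Return nonzero when the stored check byte matches the computed one. */
static int omfs_header_xor_ok(const struct omfs_header_raw *h)
{
	return omfs_header_xor(h) == h->h_check_xor;
}
```

A reader of a raw sysblock could call omfs_header_xor_ok() before trusting the other header fields. The CRC in h_crc is a separate check and is not covered here.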
|
||||||
|
|
||||||
|
Files and directories are both represented by omfs_inode:
|
||||||
|
|
||||||
|
struct omfs_inode {
|
||||||
|
struct omfs_header i_head; /* header */
|
||||||
|
__be64 i_parent; /* parent containing this inode */
|
||||||
|
__be64 i_sibling; /* next inode in hash bucket */
|
||||||
|
__be64 i_ctime; /* ctime, in milliseconds */
|
||||||
|
char i_fill1[35];
|
||||||
|
char i_type; /* OMFS_[DIR,FILE] */
|
||||||
|
__be32 i_fill2;
|
||||||
|
char i_fill3[64];
|
||||||
|
char i_name[OMFS_NAMELEN]; /* filename */
|
||||||
|
__be64 i_size; /* size of file, in bytes */
|
||||||
|
};
|
||||||
|
|
||||||
|
Directories in OMFS are implemented as a large hash table. Filenames are
|
||||||
|
hashed then prepended into the bucket list beginning at OMFS_DIR_START.
|
||||||
|
Lookup requires hashing the filename, then seeking across i_sibling pointers
|
||||||
|
until a match is found on i_name. Empty buckets are represented by block
|
||||||
|
pointers with all-1s (~0).
|
||||||
|
|
||||||
|
A file is an omfs_inode structure followed by an extent table beginning at
|
||||||
|
OMFS_EXTENT_START:
|
||||||
|
|
||||||
|
struct omfs_extent_entry {
|
||||||
|
__be64 e_cluster; /* start location of a set of blocks */
|
||||||
|
__be64 e_blocks; /* number of blocks after e_cluster */
|
||||||
|
};
|
||||||
|
|
||||||
|
struct omfs_extent {
|
||||||
|
__be64 e_next; /* next extent table location */
|
||||||
|
__be32 e_extent_count; /* total # extents in this table */
|
||||||
|
__be32 e_fill;
|
||||||
|
struct omfs_extent_entry e_entry; /* start of extent entries */
|
||||||
|
};
|
||||||
|
|
||||||
|
Each extent holds the block offset followed by the number of blocks allocated to
|
||||||
|
the extent. The final extent in each table is a terminator with e_cluster
|
||||||
|
being ~0 and e_blocks being ones'-complement of the total number of blocks
|
||||||
|
in the table.
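The terminator rule can be checked with a few lines of C. This is only a sketch: struct extent_entry and extent_table_terminated() are hypothetical names, the entries are assumed to have already been converted from big-endian to host order, and "total number of blocks" is read as the sum of e_blocks over the entries before the terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of one extent entry (hypothetical helper type). */
struct extent_entry {
	uint64_t cluster;	/* e_cluster */
	uint64_t blocks;	/* e_blocks */
};

/* Return nonzero if e[count - 1] is a valid terminator for e[0..count-2]:
 * cluster must be all ones and blocks must be the ones' complement of
 * the summed block counts of the preceding entries. */
static int extent_table_terminated(const struct extent_entry *e, size_t count)
{
	uint64_t total = 0;
	size_t i;

	if (count == 0)
		return 0;
	for (i = 0; i + 1 < count; i++)
		total += e[i].blocks;
	return e[count - 1].cluster == ~(uint64_t)0 &&
	       e[count - 1].blocks == ~total;
}
```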
|
||||||
|
|
||||||
|
If this table overflows, a continuation inode is written and pointed to by
|
||||||
|
e_next. These have a header but lack the rest of the inode structure.
|
||||||
|
|
@@ -296,6 +296,7 @@ Table 1-4: Kernel info in /proc
|
|||||||
uptime System uptime
|
uptime System uptime
|
||||||
version Kernel version
|
version Kernel version
|
||||||
video bttv info of video resources (2.4)
|
video bttv info of video resources (2.4)
|
||||||
|
vmallocinfo Show vmalloced areas
|
||||||
..............................................................................
|
..............................................................................
|
||||||
|
|
||||||
You can, for example, check which interrupts are currently in use and what
|
You can, for example, check which interrupts are currently in use and what
|
||||||
@@ -380,28 +381,35 @@ i386 and x86_64 platforms support the new IRQ vector displays.
|
|||||||
Of some interest is the introduction of the /proc/irq directory to 2.4.
|
Of some interest is the introduction of the /proc/irq directory to 2.4.
|
||||||
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
|
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
|
||||||
IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
|
IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
|
||||||
irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask
|
irq subdir are one subdir for each IRQ, and two files: default_smp_affinity and
|
||||||
|
prof_cpu_mask.
|
||||||
|
|
||||||
For example
|
For example
|
||||||
> ls /proc/irq/
|
> ls /proc/irq/
|
||||||
0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
|
0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
|
||||||
1 11 13 15 17 19 3 5 7 9
|
1 11 13 15 17 19 3 5 7 9 default_smp_affinity
|
||||||
> ls /proc/irq/0/
|
> ls /proc/irq/0/
|
||||||
smp_affinity
|
smp_affinity
|
||||||
|
|
||||||
The contents of the prof_cpu_mask file and each smp_affinity file for each IRQ
|
smp_affinity is a bitmask, in which you can specify which CPUs can handle the
|
||||||
is the same by default:
|
IRQ. You can set it by doing:
|
||||||
|
|
||||||
> cat /proc/irq/0/smp_affinity
|
> echo 1 > /proc/irq/10/smp_affinity
|
||||||
|
|
||||||
|
This means that only the first CPU will handle the IRQ, but you can also echo
|
||||||
|
5 which means that only the first and third CPU can handle the IRQ.
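To make the bitmask arithmetic concrete: bit N of the mask corresponds to CPU N, so a value of 5 (binary 101) selects CPU 0 and CPU 2, while CPU 0 plus CPU 3 would be 9. The helpers below are a hypothetical sketch for building such a mask and printing it in the hexadecimal form the file uses. They assume at most 32 CPUs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Set bit N for every CPU number N listed in cpus[]; numbers >= 32 are
 * ignored in this simplified 32-bit sketch. */
static uint32_t cpu_mask(const unsigned int *cpus, size_t n)
{
	uint32_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i] < 32)
			mask |= (uint32_t)1 << cpus[i];
	return mask;
}

/* Format the mask as lower-case hex without a prefix. */
static int format_mask(char *buf, size_t len, uint32_t mask)
{
	return snprintf(buf, len, "%x", (unsigned int)mask);
}
```

The resulting string is what would be echoed into /proc/irq/<n>/smp_affinity.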
|
||||||
|
|
||||||
|
The content of each smp_affinity file is the same by default:
|
||||||
|
|
||||||
|
> cat /proc/irq/0/smp_affinity
|
||||||
ffffffff
|
ffffffff
|
||||||
|
|
||||||
It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can
|
The default_smp_affinity mask applies to all non-active IRQs, which are the
|
||||||
set it by doing:
|
IRQs which have not yet been allocated/activated, and hence which lack a
|
||||||
|
/proc/irq/[0-9]* directory.
|
||||||
|
|
||||||
> echo 1 > /proc/irq/prof_cpu_mask
|
prof_cpu_mask specifies which CPUs are to be profiled by the system wide
|
||||||
|
profiler. The default value is ffffffff (all CPUs).
|
||||||
This means that only the first CPU will handle the IRQ, but you can also echo 5
|
|
||||||
which means that only the first and fourth CPU can handle the IRQ.
|
|
||||||
|
|
||||||
The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
|
The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
|
||||||
between all the CPUs which are allowed to handle it. As usual the kernel has
|
between all the CPUs which are allowed to handle it. As usual the kernel has
|
||||||
@@ -550,6 +558,49 @@ VmallocTotal: total size of vmalloc memory area
|
|||||||
VmallocUsed: amount of vmalloc area which is used
|
VmallocUsed: amount of vmalloc area which is used
|
||||||
VmallocChunk: largest contiguous block of vmalloc area which is free
|
VmallocChunk: largest contiguous block of vmalloc area which is free
|
||||||
|
|
||||||
|
..............................................................................
|
||||||
|
|
||||||
|
vmallocinfo:
|
||||||
|
|
||||||
|
Provides information about vmalloc()ed/vmap()ed areas. One line per area,
|
||||||
|
containing the virtual address range of the area, size in bytes,
|
||||||
|
caller information of the creator, and optional information depending
|
||||||
|
on the kind of area:
|
||||||
|
|
||||||
|
pages=nr number of pages
|
||||||
|
phys=addr if a physical address was specified
|
||||||
|
ioremap I/O mapping (ioremap() and friends)
|
||||||
|
vmalloc vmalloc() area
|
||||||
|
vmap vmap()ed pages
|
||||||
|
user VM_USERMAP area
|
||||||
|
vpages buffer for page pointers was vmalloced (huge area)
|
||||||
|
N<node>=nr (Only on NUMA kernels)
|
||||||
|
Number of pages allocated on memory node <node>
|
||||||
|
|
||||||
|
> cat /proc/vmallocinfo
|
||||||
|
0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
|
||||||
|
/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
|
||||||
|
0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
|
||||||
|
/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
|
||||||
|
0xffffc20000302000-0xffffc20000304000 8192 acpi_tb_verify_table+0x21/0x4f...
|
||||||
|
phys=7fee8000 ioremap
|
||||||
|
0xffffc20000304000-0xffffc20000307000 12288 acpi_tb_verify_table+0x21/0x4f...
|
||||||
|
phys=7fee7000 ioremap
|
||||||
|
0xffffc2000031d000-0xffffc2000031f000 8192 init_vdso_vars+0x112/0x210
|
||||||
|
0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e ...
|
||||||
|
/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
|
||||||
|
0xffffc2000033a000-0xffffc2000033d000 12288 sys_swapon+0x640/0xac0 ...
|
||||||
|
pages=2 vmalloc N1=2
|
||||||
|
0xffffc20000347000-0xffffc2000034c000 20480 xt_alloc_table_info+0xfe ...
|
||||||
|
/0x130 [x_tables] pages=4 vmalloc N0=4
|
||||||
|
0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=14 vmalloc N2=14
|
||||||
|
0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=4 vmalloc N1=4
|
||||||
|
0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=2 vmalloc N1=2
|
||||||
|
0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ...
|
||||||
|
pages=10 vmalloc N0=10
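The fixed leading fields of each line (address range and size) are easy to pick apart in userspace. The following is a minimal sketch, not a complete parser: struct vm_area_line and parse_vmallocinfo_line() are hypothetical names, and the caller and optional fields after the size are ignored.

```c
#include <stdio.h>

/* The three leading fields of one /proc/vmallocinfo line. */
struct vm_area_line {
	unsigned long long start;
	unsigned long long end;
	unsigned long long size;	/* in bytes */
};

/* Parse "0xSTART-0xEND SIZE ..."; return 0 on success, -1 on error.
 * The %llx conversion accepts the leading "0x" prefix. */
static int parse_vmallocinfo_line(const char *line, struct vm_area_line *out)
{
	if (sscanf(line, "%llx-%llx %llu",
		   &out->start, &out->end, &out->size) != 3)
		return -1;
	return 0;
}
```

In the sample output above, end minus start equals the reported size on every line. Assuming 4096-byte pages, the size is also one page larger than the pages= count implies (for example 2101248 bytes versus pages=512), which suggests the range includes one extra page, though that is an inference from the sample only.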
|
||||||
|
|
||||||
1.3 IDE devices in /proc/ide
|
1.3 IDE devices in /proc/ide
|
||||||
----------------------------
|
----------------------------
|
||||||
@@ -872,45 +923,44 @@ CPUs.
|
|||||||
The "procs_blocked" line gives the number of processes currently blocked,
|
The "procs_blocked" line gives the number of processes currently blocked,
|
||||||
waiting for I/O to complete.
|
waiting for I/O to complete.
|
||||||
|
|
||||||
|
|
||||||
1.9 Ext4 file system parameters
|
1.9 Ext4 file system parameters
|
||||||
------------------------------
|
------------------------------
|
||||||
Ext4 file system have one directory per partition under /proc/fs/ext4/
|
|
||||||
# ls /proc/fs/ext4/hdc/
|
|
||||||
group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req
|
|
||||||
stats stream_req
|
|
||||||
|
|
||||||
mb_groups:
|
Information about mounted ext4 file systems can be found in
|
||||||
This file gives the details of mutiblock allocator buddy cache of free blocks
|
/proc/fs/ext4. Each mounted filesystem will have a directory in
|
||||||
|
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
|
||||||
|
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
|
||||||
|
in Table 1-10, below.
|
||||||
|
|
||||||
mb_history:
|
Table 1-10: Files in /proc/fs/ext4/<devname>
|
||||||
Multiblock allocation history.
|
..............................................................................
|
||||||
|
File Content
|
||||||
|
mb_groups details of multiblock allocator buddy cache of free blocks
|
||||||
|
mb_history multiblock allocation history
|
||||||
|
stats controls whether the multiblock allocator should start
|
||||||
|
collecting statistics, which are shown during the unmount
|
||||||
|
group_prealloc the multiblock allocator will round up allocation
|
||||||
|
requests to a multiple of this tuning parameter if the
|
||||||
|
stripe size is not set in the ext4 superblock
|
||||||
|
max_to_scan The maximum number of extents the multiblock allocator
|
||||||
|
will search to find the best extent
|
||||||
|
min_to_scan The minimum number of extents the multiblock allocator
|
||||||
|
will search to find the best extent
|
||||||
|
order2_req Tuning parameter which controls the minimum size for
|
||||||
|
requests (as a power of 2) where the buddy cache is
|
||||||
|
used
|
||||||
|
stream_req Files which have fewer blocks than this tunable
|
||||||
|
parameter will have their blocks allocated out of a
|
||||||
|
block group specific preallocation pool, so that small
|
||||||
|
files are packed closely together. Each large file
|
||||||
|
will have its blocks allocated out of its own unique
|
||||||
|
preallocation pool.
|
||||||
|
inode_readahead Tuning parameter which controls the maximum number of
|
||||||
|
inode table blocks that ext4's inode table readahead
|
||||||
|
algorithm will pre-read into the buffer cache
|
||||||
|
..............................................................................
|
||||||
|
|
||||||
stats:
|
|
||||||
This file indicate whether the multiblock allocator should start collecting
|
|
||||||
statistics. The statistics are shown during unmount
|
|
||||||
|
|
||||||
group_prealloc:
|
|
||||||
The multiblock allocator normalize the block allocation request to
|
|
||||||
group_prealloc filesystem blocks if we don't have strip value set.
|
|
||||||
The stripe value can be specified at mount time or during mke2fs.
|
|
||||||
|
|
||||||
max_to_scan:
|
|
||||||
How long multiblock allocator can look for a best extent (in found extents)
|
|
||||||
|
|
||||||
min_to_scan:
|
|
||||||
How long multiblock allocator must look for a best extent
|
|
||||||
|
|
||||||
order2_req:
|
|
||||||
Multiblock allocator use 2^N search using buddies only for requests greater
|
|
||||||
than or equal to order2_req. The request size is specfied in file system
|
|
||||||
blocks. A value of 2 indicate only if the requests are greater than or equal
|
|
||||||
to 4 blocks.
|
|
||||||
|
|
||||||
stream_req:
|
|
||||||
Files smaller than stream_req are served by the stream allocator, whose
|
|
||||||
purpose is to pack requests as close each to other as possible to
|
|
||||||
produce smooth I/O traffic. Avalue of 16 indicate that file smaller than 16
|
|
||||||
filesystem block size will use group based preallocation.
|
|
||||||
|
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
Summary
|
Summary
|
||||||
@@ -1281,12 +1331,24 @@ determine whether or not they are still functioning properly.
|
|||||||
Because the NMI watchdog shares registers with oprofile, by disabling the NMI
|
Because the NMI watchdog shares registers with oprofile, by disabling the NMI
|
||||||
watchdog, oprofile may have more registers to utilize.
|
watchdog, oprofile may have more registers to utilize.
|
||||||
|
|
||||||
maps_protect
|
msgmni
|
||||||
------------
|
------
|
||||||
|
|
||||||
Enables/Disables the protection of the per-process proc entries "maps" and
|
Maximum number of message queue ids on the system.
|
||||||
"smaps". When enabled, the contents of these files are visible only to
|
This value scales to the amount of lowmem. It is automatically recomputed
|
||||||
readers that are allowed to ptrace() the given process.
|
upon memory add/remove or ipc namespace creation/removal.
|
||||||
|
When a value is written into this file, msgmni's value becomes fixed, i.e. it
|
||||||
|
is not recomputed anymore when one of the above events occurs.
|
||||||
|
Use auto_msgmni to change this behavior.
|
||||||
|
|
||||||
|
auto_msgmni
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Enables/Disables automatic recomputing of msgmni upon memory add/remove or
|
||||||
|
upon ipc namespace creation/removal (see the msgmni description above).
|
||||||
|
Echoing "1" into this file enables msgmni automatic recomputing.
|
||||||
|
Echoing "0" turns it off.
|
||||||
|
auto_msgmni default value is 1.
|
||||||
|
|
||||||
|
|
||||||
2.4 /proc/sys/vm - The virtual memory subsystem
|
2.4 /proc/sys/vm - The virtual memory subsystem
|
||||||
@@ -1423,7 +1485,7 @@ used because pages_free(1355) is smaller than watermark + protection[2]
|
|||||||
normal page requirement. If requirement is DMA zone(index=0), protection[0]
|
normal page requirement. If requirement is DMA zone(index=0), protection[0]
|
||||||
(=0) is used.
|
(=0) is used.
|
||||||
|
|
||||||
zone[i]'s protection[j] is calculated by following exprssion.
|
zone[i]'s protection[j] is calculated by following expression.
|
||||||
|
|
||||||
(i < j):
|
(i < j):
|
||||||
zone[i]->protection[j]
|
zone[i]->protection[j]
|
||||||
@@ -2343,6 +2405,8 @@ The following 4 memory types are supported:
|
|||||||
- (bit 1) anonymous shared memory
|
- (bit 1) anonymous shared memory
|
||||||
- (bit 2) file-backed private memory
|
- (bit 2) file-backed private memory
|
||||||
- (bit 3) file-backed shared memory
|
- (bit 3) file-backed shared memory
|
||||||
|
- (bit 4) ELF header pages in file-backed private memory areas (it is
|
||||||
|
effective only if bit 2 is cleared)
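The bit list can be expressed as flag constants. This is an illustrative sketch: the enum and function names are hypothetical, and bit 0 (anonymous private memory) is taken from the part of the list that lies outside this hunk.

```c
/* Hypothetical names for the coredump_filter bits. */
enum coredump_filter_bit {
	CD_ANON_PRIVATE = 1 << 0,	/* assumed from the full list */
	CD_ANON_SHARED  = 1 << 1,
	CD_FILE_PRIVATE = 1 << 2,
	CD_FILE_SHARED  = 1 << 3,
	CD_ELF_HEADERS  = 1 << 4,
};

/* Bit 4 only has an effect while bit 2 is cleared; when bit 2 is set,
 * file-backed private memory is dumped in full anyway. */
static int elf_headers_effective(unsigned int mask)
{
	return (mask & CD_ELF_HEADERS) && !(mask & CD_FILE_PRIVATE);
}
```

For example, combining both anonymous types with the ELF header bit gives 0x13, which is the value that would be written to the per-process coredump_filter file.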
|
||||||
|
|
||||||
Note that MMIO pages such as frame buffer are never dumped and vDSO pages
|
Note that MMIO pages such as frame buffer are never dumped and vDSO pages
|
||||||
are always dumped regardless of the bitmask status.
|
are always dumped regardless of the bitmask status.
|
||||||
|
@@ -3,14 +3,14 @@ Quota subsystem
|
|||||||
===============
|
===============
|
||||||
|
|
||||||
Quota subsystem allows system administrator to set limits on used space and
|
Quota subsystem allows system administrator to set limits on used space and
|
||||||
number of used inodes (inode is a filesystem structure which is associated
|
number of used inodes (inode is a filesystem structure which is associated with
|
||||||
with each file or directory) for users and/or groups. For both used space and
|
each file or directory) for users and/or groups. For both used space and number
|
||||||
number of used inodes there are actually two limits. The first one is called
|
of used inodes there are actually two limits. The first one is called softlimit
|
||||||
softlimit and the second one hardlimit. An user can never exceed a hardlimit
|
and the second one hardlimit. A user can never exceed a hardlimit for any
|
||||||
for any resource. User is allowed to exceed softlimit but only for limited
|
resource (unless he has CAP_SYS_RESOURCE capability). User is allowed to exceed
|
||||||
period of time. This period is called "grace period" or "grace time". When
|
softlimit but only for limited period of time. This period is called "grace
|
||||||
grace time is over, user is not able to allocate more space/inodes until he
|
period" or "grace time". When grace time is over, user is not able to allocate
|
||||||
frees enough of them to get below softlimit.
|
more space/inodes until he frees enough of them to get below softlimit.
|
||||||
|
|
||||||
Quota limits (and amount of grace time) are set independently for each
|
Quota limits (and amount of grace time) are set independently for each
|
||||||
filesystem.
|
filesystem.
|
||||||
@@ -53,6 +53,12 @@ in parentheses):
|
|||||||
QUOTA_NL_BSOFTLONGWARN - space (block) softlimit is exceeded
|
QUOTA_NL_BSOFTLONGWARN - space (block) softlimit is exceeded
|
||||||
longer than given grace period.
|
longer than given grace period.
|
||||||
QUOTA_NL_BSOFTWARN - space (block) softlimit
|
QUOTA_NL_BSOFTWARN - space (block) softlimit
|
||||||
|
- four warnings are also defined for the event when user stops
|
||||||
|
exceeding some limit:
|
||||||
|
QUOTA_NL_IHARDBELOW - inode hardlimit
|
||||||
|
QUOTA_NL_ISOFTBELOW - inode softlimit
|
||||||
|
QUOTA_NL_BHARDBELOW - space (block) hardlimit
|
||||||
|
QUOTA_NL_BSOFTBELOW - space (block) softlimit
|
||||||
QUOTA_NL_A_DEV_MAJOR (u32)
|
QUOTA_NL_A_DEV_MAJOR (u32)
|
||||||
- major number of a device with the affected filesystem
|
- major number of a device with the affected filesystem
|
||||||
QUOTA_NL_A_DEV_MINOR (u32)
|
QUOTA_NL_A_DEV_MINOR (u32)
|
||||||
|
@@ -294,6 +294,16 @@ user-defined data with a channel, and is immediately available
|
|||||||
(including in create_buf_file()) via chan->private_data or
|
(including in create_buf_file()) via chan->private_data or
|
||||||
buf->chan->private_data.
|
buf->chan->private_data.
|
||||||
|
|
||||||
|
Buffer-only channels
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
These channels have no files associated and can be created with
|
||||||
|
relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
|
||||||
|
as when doing early tracing in the kernel, before the VFS is up. In these
|
||||||
|
cases, one may open a buffer-only channel and then call
|
||||||
|
relay_late_setup_files() when the kernel is ready to handle files,
|
||||||
|
to expose the buffered data to userspace.
|
||||||
|
|
||||||
Channel 'modes'
|
Channel 'modes'
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
|
@@ -248,6 +248,7 @@ The top level sysfs directory looks like:
|
|||||||
block/
|
block/
|
||||||
bus/
|
bus/
|
||||||
class/
|
class/
|
||||||
|
dev/
|
||||||
devices/
|
devices/
|
||||||
firmware/
|
firmware/
|
||||||
net/
|
net/
|
||||||
@@ -274,6 +275,11 @@ fs/ contains a directory for some filesystems. Currently each
|
|||||||
filesystem wanting to export attributes must create its own hierarchy
|
filesystem wanting to export attributes must create its own hierarchy
|
||||||
below fs/ (see ./fuse.txt for an example).
|
below fs/ (see ./fuse.txt for an example).
|
||||||
|
|
||||||
|
dev/ contains two directories char/ and block/. Inside these two
|
||||||
|
directories there are symlinks named <major>:<minor>. These symlinks
|
||||||
|
point to the sysfs directory for the given device. /sys/dev provides a
|
||||||
|
quick way to look up the sysfs interface for a device from the result of
|
||||||
|
a stat(2) operation.
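A userspace lookup along these lines might be sketched as follows. The function names are hypothetical, the code assumes a Linux/glibc system where major() and minor() come from <sys/sysmacros.h>, and it only builds the path of the symlink without resolving it.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format "/sys/dev/{block,char}/<major>:<minor>" into buf. */
static int sys_dev_path(char *buf, size_t len, int is_block,
			unsigned int maj, unsigned int min)
{
	return snprintf(buf, len, "/sys/dev/%s/%u:%u",
			is_block ? "block" : "char", maj, min);
}

/* Build the /sys/dev path for a device node such as /dev/sda.
 * Returns 0 on success, -1 if stat() fails or the file is not a
 * block or character device. */
static int sys_dev_path_for_node(const char *node, char *buf, size_t len)
{
	struct stat st;

	if (stat(node, &st) != 0)
		return -1;
	if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
		return -1;
	sys_dev_path(buf, len, S_ISBLK(st.st_mode),
		     (unsigned int)major(st.st_rdev),
		     (unsigned int)minor(st.st_rdev));
	return 0;
}
```

The resulting path can then be passed to readlink(2) or opened directly to reach the device's sysfs directory.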
|
||||||
|
|
||||||
More information on driver-model specific features can be found in
|
More information on driver-model specific features can be found in
|
||||||
Documentation/driver-model/.
|
Documentation/driver-model/.
|
||||||
|
164
Documentation/filesystems/ubifs.txt
Normal file
@@ -0,0 +1,164 @@
|
|||||||
|
Introduction
|
||||||
|
=============
|
||||||
|
|
||||||
|
UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
|
||||||
|
Block Images". UBIFS is a flash file system, which means it is designed
|
||||||
|
to work with flash devices. It is important to understand that UBIFS
|
||||||
|
is completely different to any traditional file-system in Linux, like
|
||||||
|
Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
|
||||||
|
which work with MTD devices, not block devices. The other Linux
|
||||||
|
file-system of this class is JFFS2.
|
||||||
|
|
||||||
|
To make it more clear, here is a small comparison of MTD devices and
|
||||||
|
block devices.
|
||||||
|
|
||||||
|
1 MTD devices represent flash devices and they consist of eraseblocks of
|
||||||
|
rather large size, typically about 128KiB. Block devices consist of
|
||||||
|
small blocks, typically 512 bytes.
|
||||||
|
2 MTD devices support 3 main operations - read from some offset within an
|
||||||
|
eraseblock, write to some offset within an eraseblock, and erase a whole
|
||||||
|
eraseblock. Block devices support 2 main operations - read a whole
|
||||||
|
block and write a whole block.
|
||||||
|
3 The whole eraseblock has to be erased before it becomes possible to
|
||||||
|
re-write its contents. Blocks may be just re-written.
|
||||||
|
4 Eraseblocks become worn out after some number of erase cycles -
|
||||||
|
typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
|
||||||
|
NAND flashes. Blocks do not have the wear-out property.
|
||||||
|
5 Eraseblocks may become bad (only on NAND flashes) and software should
|
||||||
|
deal with this. Blocks on hard drives typically do not become bad,
|
||||||
|
because hardware has mechanisms to substitute bad blocks, at least in
|
||||||
|
modern LBA disks.
|
||||||
|
|
||||||
|
It should be quite obvious why UBIFS is very different to traditional
|
||||||
|
file-systems.
|
||||||
|
|
||||||
|
UBIFS works on top of UBI. UBI is a separate software layer which may be
|
||||||
|
found in drivers/mtd/ubi. UBI is basically a volume management and
|
||||||
|
wear-leveling layer. It provides so-called UBI volumes, which are a higher
|
||||||
|
level abstraction than an MTD device. The programming model of UBI devices
|
||||||
|
is very similar to MTD devices - they still consist of large eraseblocks,
|
||||||
|
they have read/write/erase operations, but UBI devices are devoid of
|
||||||
|
limitations like wear and bad blocks (items 4 and 5 in the above list).
|
||||||
|
|
||||||
|
In a sense, UBIFS is the next generation of the JFFS2 file-system, but it is
|
||||||
|
very different from and incompatible with JFFS2. The following are the main
|
||||||
|
differences.
|
||||||
|
|
||||||
|
* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
|
||||||
|
top of UBI volumes.
|
||||||
|
* JFFS2 does not have on-media index and has to build it while mounting,
|
||||||
|
which requires full media scan. UBIFS maintains the FS indexing
|
||||||
|
information on the flash media and does not require full media scan,
|
||||||
|
so it mounts many times faster than JFFS2.
|
||||||
|
* JFFS2 is a write-through file-system, while UBIFS supports write-back,
|
||||||
|
which makes UBIFS much faster on writes.
|
||||||
|
|
||||||
|
Similarly to JFFS2, UBIFS supports on-the-fly compression which makes
|
||||||
|
it possible to fit quite a lot of data to the flash.
|
||||||
|
|
||||||
|
Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
|
||||||
|
It does not need a tool like fsck.ext2. UBIFS automatically replays its
|
||||||
|
journal and recovers from crashes, ensuring that the on-flash data
|
||||||
|
structures are consistent.
|
||||||
|
|
||||||
|
UBIFS scales logarithmically (most of the data structures it uses are
|
||||||
|
trees), so the mount time and memory consumption do not linearly depend
|
||||||
|
on the flash size, like in case of JFFS2. This is because UBIFS
|
||||||
|
maintains the FS index on the flash media. However, UBIFS depends on
|
||||||
|
UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
|
||||||
|
Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
|
||||||
|
|
||||||
|
The authors of UBIFS believe that it is possible to develop UBI2 which
|
||||||
|
would scale logarithmically as well. UBI2 would support the same API as UBI,
|
||||||
|
but it would be binary incompatible with UBI. So UBIFS would not need to be
|
||||||
|
changed to use UBI2.
|
||||||
|
|
||||||
|
|
||||||
|
Mount options
|
||||||
|
=============
|
||||||
|
|
||||||
|
(*) == default.
|
||||||
|
|
||||||
|
norm_unmount (*) commit on unmount; the journal is committed
|
||||||
|
when the file-system is unmounted so that the
|
||||||
|
next mount does not have to replay the journal
|
||||||
|
and it becomes very fast;
|
||||||
|
fast_unmount do not commit on unmount; this option makes
|
||||||
|
unmount faster, but the next mount slower
|
||||||
|
because of the need to replay the journal.
|
||||||
|
|
||||||
|
|
||||||
|
Quick usage instructions
|
||||||
|
========================
|
||||||
|
|
||||||
|
The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
|
||||||
|
where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
|
||||||
|
UBI volume name.
|
||||||
|
|
||||||
|
Mount volume 0 on UBI device 0 to /mnt/ubifs:
|
||||||
|
$ mount -t ubifs ubi0_0 /mnt/ubifs
|
||||||
|
|
||||||
|
Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
|
||||||
|
name):
|
||||||
|
$ mount -t ubifs ubi0:rootfs /mnt/ubifs
|
||||||
|
|
||||||
|
The following is an example of the kernel boot arguments to attach mtd0
|
||||||
|
to UBI and mount volume "rootfs":
|
||||||
|
ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
|
||||||
|
|
||||||
|
|
||||||
|
Module Parameters for Debugging
|
||||||
|
===============================
|
||||||
|
|
||||||
|
When UBIFS has been compiled with debugging enabled, there are 3 module
|
||||||
|
parameters that are available to control aspects of testing and debugging.
|
||||||
|
The parameters are unsigned integers where each bit controls an option.
|
||||||
|
The parameters are:
|
||||||
|
|
||||||
|
debug_msgs Selects which debug messages to display, as follows:
|
||||||
|
|
||||||
|
Message Type Flag value
|
||||||
|
|
||||||
|
General messages 1
|
||||||
|
Journal messages 2
|
||||||
|
Mount messages 4
|
||||||
|
Commit messages 8
|
||||||
|
LEB search messages 16
|
||||||
|
Budgeting messages 32
|
||||||
|
Garbage collection messages 64
|
||||||
|
Tree Node Cache (TNC) messages 128
|
||||||
|
LEB properties (lprops) messages 256
|
||||||
|
Input/output messages 512
|
||||||
|
Log messages 1024
|
||||||
|
Scan messages 2048
|
||||||
|
Recovery messages 4096
|
||||||
|
|
||||||
|
debug_chks Selects extra checks that UBIFS can do while running:
|
||||||
|
|
||||||
|
Check Flag value
|
||||||
|
|
||||||
|
General checks 1
|
||||||
|
Check Tree Node Cache (TNC) 2
|
||||||
|
Check indexing tree size 4
|
||||||
|
Check orphan area 8
|
||||||
|
Check old indexing tree 16
|
||||||
|
Check LEB properties (lprops) 32
|
||||||
|
Check leaf nodes and inodes 64
|
||||||
|
|
||||||
|
debug_tsts Selects a mode of testing, as follows:
|
||||||
|
|
||||||
|
Test mode Flag value
|
||||||
|
|
||||||
|
Force in-the-gaps method 2
|
||||||
|
Failure mode for recovery testing 4
|
||||||
|
|
||||||
|
For example, set debug_msgs to 5 to display General messages and Mount
|
||||||
|
messages.
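Since each parameter is a plain bitmask, the value to pass is the bitwise OR of the wanted flag values. The sketch below shows this for debug_msgs. The enum and helper names are hypothetical and are not claimed to match the identifiers used in the UBIFS sources; only the numeric values come from the table above.

```c
#include <stddef.h>

/* Flag values from the debug_msgs table (names are illustrative only). */
enum ubifs_msg_flag {
	UBIFS_MSG_GENERAL  = 1,
	UBIFS_MSG_JOURNAL  = 2,
	UBIFS_MSG_MOUNT    = 4,
	UBIFS_MSG_COMMIT   = 8,
	UBIFS_MSG_LEB_FIND = 16,
	UBIFS_MSG_BUDGET   = 32,
	UBIFS_MSG_GC       = 64,
	UBIFS_MSG_TNC      = 128,
	UBIFS_MSG_LPROPS   = 256,
	UBIFS_MSG_IO       = 512,
	UBIFS_MSG_LOG      = 1024,
	UBIFS_MSG_SCAN     = 2048,
	UBIFS_MSG_RECOVERY = 4096,
};

/* OR together the requested flags to obtain the parameter value. */
static unsigned int combine_flags(const unsigned int *flags, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= flags[i];
	return mask;
}

/* Return nonzero if the given flag is set in a parameter value. */
static int flag_enabled(unsigned int mask, unsigned int flag)
{
	return (mask & flag) != 0;
}
```

General plus Mount messages gives 1 | 4 = 5, matching the example in the text.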
|
||||||
|
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
UBIFS documentation and FAQ/HOWTO at the MTD web site:
|
||||||
|
http://www.linux-mtd.infradead.org/doc/ubifs.html
|
||||||
|
http://www.linux-mtd.infradead.org/faq/ubifs.html
|
Some files were not shown because too many files have changed in this diff