docs: admin-guide: add a series of orphaned documents

There are lots of documents that belong to the admin-guide but are in random places (most under the Documentation root dir). Move them to the admin guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
124
Documentation/admin-guide/btmrvl.rst
Normal file
@@ -0,0 +1,124 @@
=============
btmrvl driver
=============

All commands are used via the debugfs interface.

Set/get driver configurations
=============================

Path:   /debug/btmrvl/config/

gpiogap=[n], hscfgcmd
    These commands are used to configure the host sleep parameters::

        bit 7:0  -- Gap
        bit 15:8 -- GPIO

    where GPIO is the pin number of the GPIO used to wake up the host.
    It can be any valid GPIO pin number (e.g. 0-7) or 0xff (the SDIO
    interface wakeup will be used instead).

    where Gap is the gap in milliseconds between the wakeup signal and
    the wakeup event, or 0xff for the special host sleep setting.

    Usage::

        # Use SDIO interface to wake up the host and set GAP to 0x80:
        echo 0xff80 > /debug/btmrvl/config/gpiogap
        echo 1 > /debug/btmrvl/config/hscfgcmd

        # Use GPIO pin #3 to wake up the host and set GAP to 0xff:
        echo 0x03ff > /debug/btmrvl/config/gpiogap
        echo 1 > /debug/btmrvl/config/hscfgcmd
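
As a rough illustration of the encoding, the value written to gpiogap can
be assembled from its two bytes. This is only a sketch, not part of the
driver: the helper name ``gpiogap_value`` is made up for this example, and
the byte layout (GPIO in the upper byte, gap in the lower byte) is inferred
from the usage examples above::

    # Hypothetical helper: pack a GPIO pin number ($1) and a gap ($2)
    # into the 16-bit value written to gpiogap.
    gpiogap_value() {
        printf '0x%02x%02x\n' "$(( $1 & 0xff ))" "$(( $2 & 0xff ))"
    }

    # Possible use (needs root and a mounted debugfs):
    #   gpiogap_value 3 0xff > /debug/btmrvl/config/gpiogap
    #   echo 1 > /debug/btmrvl/config/hscfgcmd

Building the value from named parts avoids swapping the two bytes by hand.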

psmode=[n], pscmd
    These commands are used to enable/disable auto sleep mode,

    where the option is::

        1 -- Enable auto sleep mode
        0 -- Disable auto sleep mode

    Usage::

        # Enable auto sleep mode
        echo 1 > /debug/btmrvl/config/psmode
        echo 1 > /debug/btmrvl/config/pscmd

        # Disable auto sleep mode
        echo 0 > /debug/btmrvl/config/psmode
        echo 1 > /debug/btmrvl/config/pscmd


hsmode=[n], hscmd
    These commands are used to enable host sleep or wake up firmware,

    where the option is::

        1 -- Enable host sleep
        0 -- Wake up firmware

    Usage::

        # Enable host sleep
        echo 1 > /debug/btmrvl/config/hsmode
        echo 1 > /debug/btmrvl/config/hscmd

        # Wake up firmware
        echo 0 > /debug/btmrvl/config/hsmode
        echo 1 > /debug/btmrvl/config/hscmd


Get driver status
=================

Path:   /debug/btmrvl/status/

Usage::

    cat /debug/btmrvl/status/<args>

where the args are:

curpsmode
    This command displays the current auto sleep status.

psstate
    This command displays the power save state.

hsstate
    This command displays the host sleep state.

txdnldrdy
    This command displays the value of the Tx download ready flag.
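
To read all four status files in one go, a small loop is enough. The
following is a sketch under the assumption that the files live under the
path given above; the function name ``btmrvl_status`` and its optional
directory argument are invented for this example::

    # Hypothetical helper: print every readable status file as
    # "name: value". $1 optionally overrides the status directory.
    btmrvl_status() {
        dir=${1:-/debug/btmrvl/status}
        for name in curpsmode psstate hsstate txdnldrdy; do
            if [ -r "$dir/$name" ]; then
                printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
            fi
        done
    }

Files that do not exist or cannot be read are silently skipped.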

Issuing a raw hci command
=========================

Use hcitool to issue a raw HCI command; refer to the hcitool manual.

Usage::

    hcitool cmd <ogf> <ocf> [Parameters]

Interface Control Command::

    hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00    --Enable All interface
    hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01    --Enable Wlan interface
    hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02    --Enable BT interface
    hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00    --Disable All interface
    hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01    --Disable Wlan interface
    hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02    --Disable BT interface

SD8688 firmware
===============

Images:

- /lib/firmware/sd8688_helper.bin
- /lib/firmware/sd8688.bin


The images can be downloaded from:

git.infradead.org/users/dwmw2/linux-firmware.git/libertas/
9
Documentation/admin-guide/clearing-warn-once.rst
Normal file
@@ -0,0 +1,9 @@
Clearing WARN_ONCE
------------------

WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.

echo 1 > /sys/kernel/debug/clear_warn_once

clears the state and allows the warnings to print once again.
This can be useful after test suite runs to reproduce problems.
114
Documentation/admin-guide/cpu-load.rst
Normal file
@@ -0,0 +1,114 @@
========
CPU load
========

Linux exports various bits of information via ``/proc/stat`` and
``/proc/uptime`` that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example::

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              10.01    0.00    2.92    5.44    0.00   81.63

    ...

Here the system thinks that over the default sampling period the
system spent 10.01% of the time doing work in user space, 2.92% in the
kernel, and was overall 81.63% of the time idle.

In most cases the ``/proc/stat`` information reflects reality quite
closely; however, due to the nature of how and when the kernel collects
this data, sometimes it cannot be trusted at all.

So how is this information collected? Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state. The problem with this is that the system could have
switched between various states multiple times between two timer
interrupts, yet the counter is incremented only for the last state.
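
To see what a tool derives from these counters, the aggregate ``cpu`` line
of ``/proc/stat`` can be sampled twice and the difference turned into a
percentage. The following is only a sketch: the function name
``cpu_busy_pct`` is made up, it sums the fields from user through steal,
and it counts both idle and iowait as "not busy", which is a choice of this
example rather than anything the kernel mandates::

    # Hypothetical helper: $1 and $2 are two "cpu ..." lines taken from
    # /proc/stat some time apart; prints the busy share in percent.
    cpu_busy_pct() {
        printf '%s\n%s\n' "$1" "$2" | awk '
            {
                total = 0
                # fields 2..9: user nice system idle iowait irq softirq steal
                for (i = 2; i <= NF && i <= 9; i++) total += $i
                idle = $5 + $6
                if (NR == 1) { t0 = total; i0 = idle }
                else         { t1 = total; i1 = idle }
            }
            END {
                dt = t1 - t0
                if (dt <= 0) { print "0.00"; exit }
                printf "%.2f\n", 100 * (dt - (i1 - i0)) / dt
            }'
    }

    # Possible use:
    #   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
    #   cpu_busy_pct "$a" "$b"

Because the inputs are the tick-sampled counters described above, the
result inherits the same inaccuracy.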


Example
-------

If we imagine the system with one task that periodically burns cycles
in the following manner::

     time line between two timer interrupts
    |--------------------------------------|
     ^                                    ^
     |_ something begins working          |
                                          |_ something goes to sleep
                                         (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
``/proc/stat`` (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside ``/proc/stat``::


    /* gcc -o hog smallhog.c */
    #include <time.h>
    #include <limits.h>
    #include <signal.h>
    #include <sys/time.h>
    #define HIST 10

    static volatile sig_atomic_t stop;

    /* SIGALRM handler: tell hog() to stop spinning */
    static void sighandler (int signr)
    {
        (void) signr;
        stop = 1;
    }

    /* Spin until interrupted or niters runs out; return what is left */
    static unsigned long hog (unsigned long niters)
    {
        stop = 0;
        while (!stop && --niters);
        return niters;
    }

    int main (void)
    {
        int i;
        struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                                .it_value = { .tv_sec = 0, .tv_usec = 1 } };
        sigset_t set;
        unsigned long v[HIST];
        double tmp = 0.0;
        unsigned long n;
        signal (SIGALRM, &sighandler);
        setitimer (ITIMER_REAL, &it, NULL);

        /* Calibrate: count the iterations that fit between two signals */
        hog (ULONG_MAX);
        for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
        for (i = 0; i < HIST; ++i) tmp += v[i];
        tmp /= HIST;
        n = tmp - (tmp / 3.0);

        sigemptyset (&set);
        sigaddset (&set, SIGALRM);

        /* Burn about two thirds of a period, then wait for the signal */
        for (;;) {
            hog (n);
            sigwait (&set, &i);
        }
        return 0;
    }


References
----------

- http://lkml.org/lkml/2007/2/12/6
- Documentation/filesystems/proc.txt (1.8)


Thanks
------

Con Kolivas, Pavel Machek
177
Documentation/admin-guide/cputopology.rst
Normal file
@@ -0,0 +1,177 @@
===========================================
How CPU topology info is exported via sysfs
===========================================

Export CPU topology info via sysfs. Items (attributes) are similar
to /proc/cpuinfo output of some architectures. They reside in
/sys/devices/system/cpu/cpuX/topology/:

physical_package_id:

    physical package id of cpuX. Typically corresponds to a physical
    socket number, but the actual value is architecture and platform
    dependent.

die_id:

    the CPU die ID of cpuX. Typically it is the hardware platform's
    identifier (rather than the kernel's). The actual value is
    architecture and platform dependent.

core_id:

    the CPU core ID of cpuX. Typically it is the hardware platform's
    identifier (rather than the kernel's). The actual value is
    architecture and platform dependent.

book_id:

    the book ID of cpuX. Typically it is the hardware platform's
    identifier (rather than the kernel's). The actual value is
    architecture and platform dependent.

drawer_id:

    the drawer ID of cpuX. Typically it is the hardware platform's
    identifier (rather than the kernel's). The actual value is
    architecture and platform dependent.

core_cpus:

    internal kernel map of CPUs within the same core.
    (deprecated name: "thread_siblings")

core_cpus_list:

    human-readable list of CPUs within the same core.
    (deprecated name: "thread_siblings_list")

package_cpus:

    internal kernel map of the CPUs sharing the same physical_package_id.
    (deprecated name: "core_siblings")

package_cpus_list:

    human-readable list of CPUs sharing the same physical_package_id.
    (deprecated name: "core_siblings_list")

die_cpus:

    internal kernel map of CPUs within the same die.

die_cpus_list:

    human-readable list of CPUs within the same die.

book_siblings:

    internal kernel map of cpuX's hardware threads within the same
    book_id.

book_siblings_list:

    human-readable list of cpuX's hardware threads within the same
    book_id.

drawer_siblings:

    internal kernel map of cpuX's hardware threads within the same
    drawer_id.

drawer_siblings_list:

    human-readable list of cpuX's hardware threads within the same
    drawer_id.
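
As an illustration of how these attributes can be consumed from user space,
the sketch below prints the package and core ID of every CPU that exposes a
topology directory. The function name ``cpu_topology_table`` and its
optional base-directory argument are invented for this example; only the
sysfs paths come from the text above::

    # Hypothetical helper: one line per CPU with its package and core ID.
    # $1 optionally overrides the sysfs CPU directory.
    cpu_topology_table() {
        base=${1:-/sys/devices/system/cpu}
        for d in "$base"/cpu[0-9]*; do
            [ -r "$d/topology/core_id" ] || continue
            printf '%s package=%s core=%s\n' "${d##*/}" \
                "$(cat "$d/topology/physical_package_id")" \
                "$(cat "$d/topology/core_id")"
        done
    }

CPUs without a readable ``topology/core_id`` file are skipped.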

The architecture-neutral drivers/base/topology.c exports these attributes.
However, the book and drawer related sysfs files will only be created if
CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.

CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
where they reflect the cpu and cache hierarchy.

For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h::

    #define topology_physical_package_id(cpu)
    #define topology_die_id(cpu)
    #define topology_core_id(cpu)
    #define topology_book_id(cpu)
    #define topology_drawer_id(cpu)
    #define topology_sibling_cpumask(cpu)
    #define topology_core_cpumask(cpu)
    #define topology_die_cpumask(cpu)
    #define topology_book_cpumask(cpu)
    #define topology_drawer_cpumask(cpu)

The type of ``**_id macros`` is int.
The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter
correspond with appropriate ``**_siblings`` sysfs attributes (except for
topology_sibling_cpumask() which corresponds with thread_siblings).

To be consistent on all architectures, include/linux/topology.h
provides default definitions for any of the above macros that are
not defined by include/asm-XXX/topology.h:

1) topology_physical_package_id: -1
2) topology_die_id: -1
3) topology_core_id: 0
4) topology_sibling_cpumask: just the given CPU
5) topology_core_cpumask: just the given CPU
6) topology_die_cpumask: just the given CPU

For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
default definitions for topology_book_id() and topology_book_cpumask().
For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
no default definitions for topology_drawer_id() and topology_drawer_cpumask().

Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
source for the output is in brackets ("[]").

    =========== ==========================================================
    kernel_max: the maximum CPU index allowed by the kernel configuration.
                [NR_CPUS-1]

    offline:    CPUs that are not online because they have been
                HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
                of CPUs allowed by the kernel configuration (kernel_max
                above). [~cpu_online_mask + cpus >= NR_CPUS]

    online:     CPUs that are online and being scheduled [cpu_online_mask]

    possible:   CPUs that have been allocated resources and can be
                brought online if they are present. [cpu_possible_mask]

    present:    CPUs that have been identified as being present in the
                system. [cpu_present_mask]
    =========== ==========================================================

The format for the above output is compatible with cpulist_parse()
[see <linux/cpumask.h>]. Some examples follow.

In this example, there are 64 CPUs in the system but cpus 32-63 exceed
the kernel max which is limited to 0..31 by the NR_CPUS config option
being 32. Note also that CPUs 2 and 4-31 are not online but could be
brought online as they are both present and possible::

     kernel_max: 31
        offline: 2,4-31,32-63
         online: 0-1,3
       possible: 0-31
        present: 0-31

In this example, the NR_CPUS config option is 128, but the kernel was
started with possible_cpus=144. There are 4 CPUs in the system and cpu2
was manually taken offline (and is the only CPU that can be brought
online.)::

     kernel_max: 127
        offline: 2,4-127,128-143
         online: 0-1,3
       possible: 0-127
        present: 0-3

See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
as well as more information on the various cpumasks.
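
Scripts that consume files such as ``online`` or ``offline`` usually need
the individual CPU numbers rather than the range notation. The sketch below
expands such a list; the function name ``expand_cpulist`` is made up for
this example, and it assumes well-formed input of the kind shown above::

    # Hypothetical helper: expand a list such as "0-1,3" into "0 1 3".
    expand_cpulist() {
        printf '%s\n' "$1" | awk -F, '{
            out = ""
            for (i = 1; i <= NF; i++) {
                n = split($i, r, "-")
                if (n == 0) continue
                lo = r[1] + 0
                hi = (n > 1) ? r[2] + 0 : lo
                for (c = lo; c <= hi; c++)
                    out = out (out == "" ? "" : " ") c
            }
            print out
        }'
    }

    # Possible use:
    #   expand_cpulist "$(cat /sys/devices/system/cpu/online)"

With the ``online`` value from the examples above (``0-1,3``) this prints
``0 1 3``.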
@@ -13,7 +13,7 @@ the range specified.
 
 The I/O statistics counters for each step-sized area of a region are
 in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
-Documentation/iostats.txt). But two extra counters (12 and 13) are
+Documentation/admin-guide/iostats.rst). But two extra counters (12 and 13) are
 provided: total time spent reading and writing. When the histogram
 argument is used, the 14th parameter is reported that represents the
 histogram of latencies. All these counters may be accessed by sending
@@ -151,7 +151,7 @@ Messages
 The first 11 counters have the same meaning as
 `/sys/block/*/stat or /proc/diskstats`.
 
-Please refer to Documentation/iostats.txt for details.
+Please refer to Documentation/admin-guide/iostats.rst for details.
 
 1. the number of reads completed
 2. the number of reads merged
100
Documentation/admin-guide/efi-stub.rst
Normal file
@@ -0,0 +1,100 @@
=================
The EFI Boot Stub
=================

On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
as a PE/COFF image, thereby convincing EFI firmware loaders to load
it as an EFI executable. The code that modifies the bzImage header,
along with the EFI-specific entry point that the firmware loader
jumps to, are collectively known as the "EFI boot stub", and live in
arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
respectively. For ARM the EFI stub is implemented in
arch/arm/boot/compressed/efi-header.S and
arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
between architectures is in drivers/firmware/efi/libstub.

For arm64, there is no compressed kernel support, so the Image itself
masquerades as a PE/COFF image and the EFI stub is linked into the
kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
and drivers/firmware/efi/libstub/arm64-stub.c.

By using the EFI boot stub it's possible to boot a Linux kernel
without the use of a conventional EFI boot loader, such as grub or
elilo. Since the EFI boot stub performs the jobs of a boot loader, in
a certain sense it *IS* the boot loader.

The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.


How to install bzImage.efi
--------------------------

The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
System Partition (ESP) and renamed with the extension ".efi". Without
the extension the EFI firmware loader will refuse to execute it. It's
not possible to execute bzImage.efi from the usual Linux file systems
because EFI firmware doesn't have support for them. For ARM the
arch/arm/boot/zImage should be copied to the system partition, and it
may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image
should be copied but not necessarily renamed.


Passing kernel parameters from the EFI shell
--------------------------------------------

Arguments to the kernel can be passed after bzImage.efi, e.g.::

    fs0:> bzImage.efi console=ttyS0 root=/dev/sda4


The "initrd=" option
--------------------

Like most boot loaders, the EFI stub allows the user to specify
multiple initrd files using the "initrd=" option. This is the only EFI
stub-specific command line parameter; everything else is passed to the
kernel when it boots.

The path to the initrd file must be an absolute path from the
beginning of the ESP; relative path names do not work. Also, the path
is an EFI-style path and directory elements must be separated with
backslashes (\). For example, given the following directory layout::

    fs0:>
        Kernels\
                bzImage.efi
                initrd-large.img

        Ramdisks\
                initrd-small.img
                initrd-medium.img

to boot with the initrd-large.img file if the current working
directory is fs0:\Kernels, the following command must be used::

    fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img

Notice how bzImage.efi can be specified with a relative path. That's
because the image we're executing is interpreted by the EFI shell,
which understands relative paths, whereas the rest of the command line
is passed to bzImage.efi.
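
When the command line is prepared from a running Linux system, the initrd
path is usually known in its forward-slash form relative to the ESP mount
point. The sketch below converts such a path to the absolute, backslash
separated form the stub expects. The function name ``to_efi_path`` is made
up for this example, and the input is assumed to be relative to the root of
the ESP::

    # Hypothetical helper: turn Kernels/initrd-large.img (or the same with
    # a leading slash) into \Kernels\initrd-large.img.
    to_efi_path() {
        printf '%s\n' "$1" | sed -e 's|^/*|/|' -e 's|/|\\|g'
    }

    # Possible use:
    #   echo "initrd=$(to_efi_path Kernels/initrd-large.img)"

The first sed expression forces exactly one leading separator, so the
result is always an absolute path as required above.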


The "dtb=" option
-----------------

For the ARM and arm64 architectures, a device tree must be provided to
the kernel. Normally firmware shall supply the device tree via the
EFI CONFIGURATION TABLE. However, the "dtb=" command line option can
be used to override the firmware supplied device tree, or to supply
one when firmware is unable to.

Please note: Firmware adds runtime configuration information to the
device tree before booting the kernel. If dtb= is used to override
the device tree, then any runtime data provided by firmware will be
lost. The dtb= option should only be used either as a debug tool, or
as a last resort when a device tree is not provided in the EFI
CONFIGURATION TABLE.

"dtb=" is processed in the same manner as the "initrd=" option that is
described above.
80
Documentation/admin-guide/highuid.rst
Normal file
@@ -0,0 +1,80 @@
===================================================
Notes on the change from 16-bit UIDs to 32-bit UIDs
===================================================

:Author: Chris Wing <wingc@umich.edu>
:Last updated: January 11, 2000

- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
  when communicating between user and kernel space in an ioctl or data
  structure.

- kernel code should use uid_t and gid_t in kernel-private structures and
  code.

What's left to be done for 32-bit UIDs on all Linux architectures:

- Disk quotas have an interesting limitation that is not related to the
  maximum UID/GID. They are limited by the maximum file size on the
  underlying filesystem, because quota records are written at offsets
  corresponding to the UID in question.
  Further investigation is needed to see if the quota system can cope
  properly with huge UIDs. If it can deal with 64-bit file offsets on all
  architectures, this should not be a problem.

- Decide whether or not to keep backwards compatibility with the system
  accounting file, or if we should break it as the comments suggest
  (currently, the old 16-bit UID and GID are still written to disk, and
  part of the former pad space is used to store separate 32-bit UID and
  GID)

- Need to validate that OS emulation calls the 16-bit UID
  compatibility syscalls, if the OS being emulated used 16-bit UIDs, or
  uses the 32-bit UID system calls properly otherwise.

  This affects at least:

    - iBCS on Intel

    - sparc32 emulation on sparc64
      (need to support whatever new 32-bit UID system calls are added to
      sparc32)

- Validate that all filesystems behave properly.

  At present, 32-bit UIDs _should_ work for:

    - ext2
    - ufs
    - isofs
    - nfs
    - coda
    - udf

  Ioctl() fixups have been made for:

    - ncpfs
    - smbfs

  Filesystems with simple fixups to prevent 16-bit UID wraparound:

    - minix
    - sysv
    - qnx4

  Other filesystems have not been checked yet.

- The ncpfs and smbfs filesystems cannot presently use 32-bit UIDs in
  all ioctl()s. Some new ioctl()s have been added with 32-bit UIDs, but
  more are needed. (as well as new user<->kernel data structures)

- The ELF core dump format only supports 16-bit UIDs on arm, i386, m68k,
  sh, and sparc32. Fixing this is probably not that important, but would
  require adding a new ELF section.

- The ioctl()s used to control the in-kernel NFS server only support
  16-bit UIDs on arm, i386, m68k, sh, and sparc32.

- make sure that the UID mapping feature of AX25 networking works properly
  (it should be safe because it's always used a 32-bit integer to
  communicate between user and kernel)
@@ -241,7 +241,7 @@ Guest mitigation mechanisms
 For further information about confining guests to a single or to a group
 of cores consult the cpusets documentation:
 
-https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.rst
+https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
 
 .. _interrupt_isolation:
 
105
Documentation/admin-guide/hw_random.rst
Normal file
@@ -0,0 +1,105 @@
==========================================================
Linux support for random number generator in i8xx chipsets
==========================================================

Introduction
============

The hw_random framework is software that makes use of a
special hardware feature on your CPU or motherboard,
a Random Number Generator (RNG). The software has two parts:
a core providing the /dev/hwrng character device and its
sysfs support, plus a hardware-specific driver that plugs
into that core.

To make the most effective use of these mechanisms, you
should download the support software as well. Download the
latest version of the "rng-tools" package from the
hw_random driver's official Web site:

    http://sourceforge.net/projects/gkernel/

Those tools use /dev/hwrng to fill the kernel entropy pool,
which is used internally and exported by the /dev/urandom and
/dev/random special files.

Theory of operation
===================

CHARACTER DEVICE. Using the standard open()
and read() system calls, you can read random data from
the hardware RNG device. This data is NOT CHECKED by any
fitness tests, and could potentially be bogus (if the
hardware is faulty or has been tampered with). Data is only
output if the hardware "has-data" flag is set, but nevertheless
a security-conscious person would run fitness tests on the
data before assuming it is truly random.

The rng-tools package uses such tests in "rngd", and lets you
run them by hand with a "rngtest" utility.

/dev/hwrng is char device major 10, minor 183.

CLASS DEVICE. There is a /sys/class/misc/hw_random node with
two unique attributes, "rng_available" and "rng_current". The
"rng_available" attribute lists the hardware-specific drivers
available, while "rng_current" lists the one which is currently
connected to /dev/hwrng. If your system has more than one
RNG available, you may change the one used by writing a name from
the list in "rng_available" into "rng_current".
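
The switch described above can be wrapped in a small function that refuses
names the kernel does not list. This is a sketch only: the function name
``rng_select`` and its optional directory argument are invented for this
example, and it assumes the names in "rng_available" are separated by
spaces on a single line::

    # Hypothetical helper: make driver $1 the current RNG, but only if it
    # appears in rng_available. $2 optionally overrides the sysfs node.
    rng_select() {
        dir=${2:-/sys/class/misc/hw_random}
        case " $(cat "$dir/rng_available") " in
            *" $1 "*)
                echo "$1" > "$dir/rng_current"
                ;;
            *)
                echo "rng_select: '$1' is not in rng_available" >&2
                return 1
                ;;
        esac
    }

Checking the name first gives a clear error message instead of relying on
the result of the write itself.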

==========================================================================


Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
    - Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
    - Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>


About the Intel RNG hardware, from the firmware hub datasheet
=============================================================

The Firmware Hub integrates a Random Number Generator (RNG)
using thermal noise generated from inherently random quantum
mechanical properties of silicon. When not generating new random
bits the RNG circuitry will enter a low power state. Intel will
provide a binary software driver to give third party software
access to our RNG for use as a security feature. At this time,
the RNG is only to be used with a system in an OS-present state.

Intel RNG Driver notes
======================

FIXME: support poll(2)

.. note::

    request_mem_region was removed, for three reasons:

    1) Only one RNG is supported by this driver;
    2) The location used by the RNG is a fixed location in
       MMIO-addressable memory;
    3) users with properly working BIOS e820 handling will always
       have the region in which the RNG is located reserved, so
       request_mem_region calls always fail for proper setups.
       However, for people who use mem=XX, BIOS e820 information is
       **not** in /proc/iomem, and request_mem_region(RNG_ADDR) can
       succeed.

Driver details
==============

Based on:
    Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
    May 1999 Order Number: 290658-002 R

Intel 82802 Firmware Hub:
    Random Number Generator
    Programmer's Reference Manual
    December 1999 Order Number: 298029-001 R

Intel 82802 Firmware HUB Random Number Generator Driver
    Copyright (c) 2000 Matt Sottek <msottek@quiknet.com>

Special thanks to Matt Sottek. I did the "guts", he
did the "brains" and all the testing.
@@ -85,8 +85,25 @@ configure specific aspects of kernel behavior to your liking.
   perf-security
   acpi/index
   aoe/index
   btmrvl
   clearing-warn-once
   cpu-load
   cputopology
   device-mapper/index
   efi-stub
   highuid
   hw_random
   iostats
   kernel-per-CPU-kthreads
   laptops/index
   lcd-panel-cgram
   ldm
   lockup-watchdogs
   numastat
   pnp
   rtc
   svga
   video-output

.. only::  subproject and html

197
Documentation/admin-guide/iostats.rst
Normal file
@@ -0,0 +1,197 @@
|
||||
=====================
|
||||
I/O statistics fields
|
||||
=====================
|
||||
|
||||
Since 2.4.20 (and some versions before, with patches), and 2.5.45,
|
||||
more extensive disk statistics have been introduced to help measure disk
|
||||
activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
|
||||
the work for you, but in case you are interested in creating your own
|
||||
tools, the fields are explained here.
|
||||
|
||||
In 2.4 now, the information is found as additional fields in
|
||||
``/proc/partitions``. In 2.6 and upper, the same information is found in two
|
||||
places: one is in the file ``/proc/diskstats``, and the other is within
|
||||
the sysfs file system, which must be mounted in order to obtain
|
||||
the information. Throughout this document we'll assume that sysfs
|
||||
is mounted on ``/sys``, although of course it may be mounted anywhere.
|
||||
Both ``/proc/diskstats`` and sysfs use the same source for the information
|
||||
and so should not differ.
|
||||
|
||||
Here are examples of these different formats::
|
||||
|
||||
2.4:
|
||||
3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||
3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
|
||||
|
||||
2.6+ sysfs:
|
||||
446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||
35486 38030 38030 38030
|
||||
|
||||
2.6+ diskstats:
|
||||
3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||
3 1 hda1 35486 38030 38030 38030
|
||||
|
||||
4.18+ diskstats:
|
||||
3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
|
||||
|
||||
On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
|
||||
a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
|
||||
|
||||
The advantage of one over the other is that the sysfs choice works well
|
||||
if you are watching a known, small set of disks. ``/proc/diskstats`` may
|
||||
be a better choice if you are watching a large number of disks because
|
||||
you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
|
||||
each snapshot of your disk statistics.
|
||||
|
||||
In 2.4, the statistics fields are those after the device name. In
|
||||
the above example, the first field of statistics would be 446216.
|
||||
By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
|
||||
find just the eleven fields, beginning with 446216. If you look at
|
||||
``/proc/diskstats``, the eleven fields will be preceded by the major and
|
||||
minor device numbers, and device name. Each of these formats provides
|
||||
eleven fields of statistics (fifteen in 4.18+, which adds the discard
fields 12 to 15), each meaning exactly the same things.
|
||||
All fields except field 9 are cumulative since boot. Field 9 should
|
||||
go to zero as I/Os complete; all others only increase (unless they
|
||||
overflow and wrap). Yes, these are (32-bit or 64-bit) unsigned long
|
||||
(native word size) numbers, and on a very busy or long-lived system they
|
||||
may wrap. Applications should be prepared to deal with that; unless
|
||||
your observations are measured in large numbers of minutes or hours,
|
||||
they should not wrap twice before you notice them.
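As a rough illustration of the field layout and the wrap handling described
above, the following shell sketch extracts one statistics field for a device
from a diskstats-format file and computes the difference between two samples.
The function names ``disk_stat`` and ``counter_delta`` are invented for this
example, and the delta helper assumes a 32-bit counter that has wrapped at
most once between the two samples.

```shell
# Print statistics field $2 (numbered as in this document, starting at 1)
# for device $1. $3 is a diskstats-format file, default /proc/diskstats.
# The three leading columns (major, minor, name) are skipped.
disk_stat() {
    awk -v dev="$1" -v n="$2" '$3 == dev { print $(n + 3) }' \
        "${3:-/proc/diskstats}"
}

# Difference between two samples of a cumulative counter.
# $1 = earlier sample, $2 = later sample.
# Assumption: if the later sample is smaller, a 32-bit counter wrapped once.
counter_delta() {
    if [ "$2" -ge "$1" ]; then
        echo $(( $2 - $1 ))
    else
        echo $(( $2 - $1 + 4294967296 ))
    fi
}
```

For instance, ``disk_stat hda 1`` prints the number of reads completed on hda;
sampling it twice and passing both values to ``counter_delta`` gives the reads
completed in between.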
|
||||
|
||||
Each set of stats only applies to the indicated device; if you want
|
||||
system-wide stats you'll have to find all the devices and sum them all up.
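One possible way to do that summing from sysfs is sketched below. The
function name ``sum_block_stat`` is made up for this example, and it naively
counts every entry under ``/sys/block``; on systems with stacked devices
(device-mapper or md on top of real disks) that would count the same I/O more
than once, so treat it as a starting point only.

```shell
# Sum statistics field $1 over every whole-disk "stat" file.
# $2 is the sysfs block directory, default /sys/block.
sum_block_stat() {
    field=$1
    blockdir=${2:-/sys/block}
    cat "$blockdir"/*/stat 2>/dev/null |
        awk -v n="$field" '{ total += $n } END { printf "%.0f\n", total }'
}
```

``sum_block_stat 1`` would then print the total number of reads completed
across all block devices listed in sysfs.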
|
||||
|
||||
Field 1 -- # of reads completed
|
||||
This is the total number of reads completed successfully.
|
||||
|
||||
Field 2 -- # of reads merged, field 6 -- # of writes merged
|
||||
Reads and writes which are adjacent to each other may be merged for
|
||||
efficiency. Thus two 4K reads may become one 8K read before it is
|
||||
ultimately handed to the disk, and so it will be counted (and queued)
|
||||
as only one I/O. This field lets you know how often this was done.
|
||||
|
||||
Field 3 -- # of sectors read
|
||||
This is the total number of sectors read successfully.
|
||||
|
||||
Field 4 -- # of milliseconds spent reading
|
||||
This is the total number of milliseconds spent by all reads (as
|
||||
measured from __make_request() to end_that_request_last()).
|
||||
|
||||
Field 5 -- # of writes completed
|
||||
This is the total number of writes completed successfully.
|
||||
|
||||
Field 6 -- # of writes merged
|
||||
See the description of field 2.
|
||||
|
||||
Field 7 -- # of sectors written
|
||||
This is the total number of sectors written successfully.
|
||||
|
||||
Field 8 -- # of milliseconds spent writing
|
||||
This is the total number of milliseconds spent by all writes (as
|
||||
measured from __make_request() to end_that_request_last()).
|
||||
|
||||
Field 9 -- # of I/Os currently in progress
|
||||
The only field that should go to zero. Incremented as requests are
|
||||
given to appropriate struct request_queue and decremented as they finish.
|
||||
|
||||
Field 10 -- # of milliseconds spent doing I/Os
|
||||
This field increases so long as field 9 is nonzero.
|
||||
|
||||
Since 5.0 this field counts jiffies when at least one request was
|
||||
started or completed. If a request runs for more than 2 jiffies, then some
|
||||
I/O time will not be accounted unless there are other requests.
|
||||
|
||||
Field 11 -- weighted # of milliseconds spent doing I/Os
|
||||
This field is incremented at each I/O start, I/O completion, I/O
|
||||
merge, or read of these stats by the number of I/Os in progress
|
||||
(field 9) times the number of milliseconds spent doing I/O since the
|
||||
last update of this field. This can provide an easy measure of both
|
||||
I/O completion time and the backlog that may be accumulating.
|
||||
|
||||
Field 12 -- # of discards completed
|
||||
This is the total number of discards completed successfully.
|
||||
|
||||
Field 13 -- # of discards merged
|
||||
See the description of field 2.
|
||||
|
||||
Field 14 -- # of sectors discarded
|
||||
This is the total number of sectors discarded successfully.
|
||||
|
||||
Field 15 -- # of milliseconds spent discarding
|
||||
This is the total number of milliseconds spent by all discards (as
|
||||
measured from __make_request() to end_that_request_last()).
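The field definitions above combine into a few commonly used rates. The
sketch below follows directly from those definitions: field 10 advances only
while I/O is in flight, so its change over an interval gives the busy
fraction; field 11 is weighted by the number of I/Os in progress, so its
change over the same interval gives the average number in flight; and the
change in field 4 divided by the change in field 1 gives the mean time per
read. The function names are invented for this example, and every argument is
the difference between two samples of the named field.

```shell
# Percentage of the interval during which the device had I/O in flight.
# $1 = change in field 10 (ms), $2 = sampling interval (ms)
io_util_pct() {
    awk -v busy="$1" -v ival="$2" 'BEGIN { printf "%.1f\n", 100 * busy / ival }'
}

# Average number of I/Os in progress during the interval.
# $1 = change in field 11 (weighted ms), $2 = sampling interval (ms)
avg_in_flight() {
    awk -v w="$1" -v ival="$2" 'BEGIN { printf "%.2f\n", w / ival }'
}

# Mean milliseconds per completed read during the interval.
# $1 = change in field 4 (ms), $2 = change in field 1 (reads)
avg_read_ms() {
    awk -v ms="$1" -v n="$2" \
        'BEGIN { if (n > 0) printf "%.2f\n", ms / n; else print "n/a" }'
}
```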
|
||||
|
||||
To avoid introducing performance bottlenecks, no locks are held while
|
||||
modifying these counters. This implies that minor inaccuracies may be
|
||||
introduced when changes collide, so (for instance) adding up all the
|
||||
read I/Os issued per partition should equal those made to the disks ...
|
||||
but due to the lack of locking it may only be very close.
|
||||
|
||||
In 2.6+, there are counters for each CPU, which make the lack of locking
|
||||
almost a non-issue. When the statistics are read, the per-CPU counters
|
||||
are summed (possibly overflowing the unsigned long variable they are
|
||||
summed to) and the result given to the user. There is no convenient
|
||||
user interface for accessing the per-CPU counters themselves.
|
||||
|
||||
Disks vs Partitions
|
||||
-------------------
|
||||
|
||||
There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
|
||||
As a result, some statistic information disappeared. The translation from
|
||||
a disk address relative to a partition to the disk address relative to
|
||||
the host disk happens much earlier. All merges and timings now happen
|
||||
at the disk level rather than at both the disk and partition level as
|
||||
in 2.4. Consequently, you'll see a different statistics output on 2.6+ for
|
||||
partitions from that for disks. There are only *four* fields available
|
||||
for partitions on 2.6+ machines. This is reflected in the examples above.
|
||||
|
||||
Field 1 -- # of reads issued
|
||||
This is the total number of reads issued to this partition.
|
||||
|
||||
Field 2 -- # of sectors read
|
||||
This is the total number of sectors requested to be read from this
|
||||
partition.
|
||||
|
||||
Field 3 -- # of writes issued
|
||||
This is the total number of writes issued to this partition.
|
||||
|
||||
Field 4 -- # of sectors written
|
||||
This is the total number of sectors requested to be written to
|
||||
this partition.
|
||||
|
||||
Note that since the address is translated to a disk-relative one, and no
|
||||
record of the partition-relative address is kept, the subsequent success
|
||||
or failure of the read cannot be attributed to the partition. In other
|
||||
words, reads are counted slightly before the time of queuing for partitions,
and at completion for whole disks. This is
|
||||
a subtle distinction that is probably uninteresting for most cases.
|
||||
|
||||
More significant is the error induced by counting the numbers of
|
||||
reads/writes before merges for partitions and after for disks. Since a
|
||||
typical workload usually contains a lot of successive and adjacent requests,
|
||||
the number of reads/writes issued can be several times higher than the
|
||||
number of reads/writes completed.
|
||||
|
||||
Since 2.6.25, the full statistics set is again available for partitions, and
|
||||
disk and partition statistics are consistent again. Since we still don't
|
||||
keep record of the partition-relative address, an operation is attributed to
|
||||
the partition which contains the first sector of the request after the
|
||||
eventual merges. As requests can be merged across partitions, this could lead
|
||||
to some (probably insignificant) inaccuracy.
|
||||
|
||||
Additional notes
|
||||
----------------
|
||||
|
||||
In 2.6+, sysfs is not mounted by default. If your distribution of
|
||||
Linux hasn't added it already, here's the line you'll want to add to
|
||||
your ``/etc/fstab``::
|
||||
|
||||
none /sys sysfs defaults 0 0
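Because sysfs may be mounted somewhere other than ``/sys``, a script can look
up the mount point instead of hard-coding it. The helper below is only a
sketch; ``sysfs_mountpoint`` is a name invented here, and it assumes the usual
``/proc/mounts`` column order (device, mount point, filesystem type).

```shell
# Print the first mount point whose filesystem type is sysfs.
# $1 is a mounts-format file, default /proc/mounts.
sysfs_mountpoint() {
    awk '$3 == "sysfs" { print $2; exit }' "${1:-/proc/mounts}"
}
```

A caller could then read ``"$(sysfs_mountpoint)/block/hda/stat"`` rather than
assuming ``/sys``.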
|
||||
|
||||
|
||||
In 2.6+, all disk statistics were removed from ``/proc/stat``. In 2.4, they
|
||||
appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
|
||||
``/proc/stat`` take a very different format from those in ``/proc/partitions``
|
||||
(see proc(5), if your system has it.)
|
||||
|
||||
-- ricklind@us.ibm.com
|
@@ -5066,7 +5066,7 @@
|
||||
|
||||
vga= [BOOT,X86-32] Select a particular video mode
|
||||
See Documentation/x86/boot.rst and
|
||||
Documentation/svga.txt.
|
||||
Documentation/admin-guide/svga.rst.
|
||||
Use vga=ask for menu.
|
||||
This is actually a boot loader parameter; the value is
|
||||
passed to the kernel using a special protocol.
|
||||
|
356
Documentation/admin-guide/kernel-per-CPU-kthreads.rst
Normal file
@@ -0,0 +1,356 @@
|
||||
==========================================
|
||||
Reducing OS jitter due to per-cpu kthreads
|
||||
==========================================
|
||||
|
||||
This document lists per-CPU kthreads in the Linux kernel and presents
|
||||
options to control their OS jitter. Note that non-per-CPU kthreads are
|
||||
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
|
||||
them to a "housekeeping" CPU dedicated to such work.
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
|
||||
|
||||
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
|
||||
|
||||
- man taskset: Using the taskset command to bind tasks to sets
|
||||
of CPUs.
|
||||
|
||||
- man sched_setaffinity: Using the sched_setaffinity() system
|
||||
call to bind tasks to sets of CPUs.
|
||||
|
||||
- /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
|
||||
writing "0" to offline and "1" to online.
|
||||
|
||||
- In order to locate kernel-generated OS jitter on CPU N::

        cd /sys/kernel/debug/tracing
        echo 1 > max_graph_depth # Increase the "1" for more detail
        echo function_graph > current_tracer
        # run workload
        cat per_cpu/cpuN/trace
|
||||
|
||||
kthreads
|
||||
========
|
||||
|
||||
Name:
|
||||
ehca_comp/%u
|
||||
|
||||
Purpose:
|
||||
Periodically process Infiniband-related work.
|
||||
|
||||
To reduce its OS jitter, do any of the following:
|
||||
|
||||
1. Don't use eHCA Infiniband hardware, instead choosing hardware
|
||||
that does not require per-CPU kthreads. This will prevent these
|
||||
kthreads from being created in the first place. (This will
|
||||
work for most people, as this hardware, though important, is
|
||||
relatively old and is produced in relatively low unit volumes.)
|
||||
2. Do all eHCA-Infiniband-related work on other CPUs, including
|
||||
interrupts.
|
||||
3. Rework the eHCA driver so that its per-CPU kthreads are
|
||||
provisioned only on selected CPUs.
|
||||
|
||||
|
||||
Name:
|
||||
irq/%d-%s
|
||||
|
||||
Purpose:
|
||||
Handle threaded interrupts.
|
||||
|
||||
To reduce its OS jitter, do the following:
|
||||
|
||||
1. Use irq affinity to force the irq threads to execute on
|
||||
some other CPU.
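IRQ affinity is set by writing a hexadecimal CPU mask to
``/proc/irq/<irq>/smp_affinity``. The helper below is a small sketch that
builds such a mask from a list of CPU numbers; ``cpus_to_mask`` is a name
invented for this example, and it only handles CPU numbers below 63 because
it relies on shell integer arithmetic.

```shell
# Print the hexadecimal affinity mask with one bit set per CPU number given.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

As root, something like ``cpus_to_mask 0 1 > /proc/irq/42/smp_affinity`` would
then steer IRQ 42 (a hypothetical number) to CPUs 0 and 1, provided that
interrupt can be moved at all.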
|
||||
|
||||
Name:
|
||||
kcmtpd_ctr_%d
|
||||
|
||||
Purpose:
|
||||
Handle Bluetooth work.
|
||||
|
||||
To reduce its OS jitter, do one of the following:
|
||||
|
||||
1. Don't use Bluetooth, in which case these kthreads won't be
|
||||
created in the first place.
|
||||
2. Use irq affinity to force Bluetooth-related interrupts to
|
||||
occur on some other CPU and furthermore initiate all
|
||||
Bluetooth activity on some other CPU.
|
||||
|
||||
Name:
|
||||
ksoftirqd/%u
|
||||
|
||||
Purpose:
|
||||
Execute softirq handlers when threaded or when under heavy load.
|
||||
|
||||
To reduce its OS jitter, each softirq vector must be handled
|
||||
separately as follows:
|
||||
|
||||
TIMER_SOFTIRQ
|
||||
-------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. To the extent possible, keep the CPU out of the kernel when it
|
||||
is non-idle, for example, by avoiding system calls and by forcing
|
||||
both kernel threads and interrupts to execute elsewhere.
|
||||
2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
|
||||
the CPU offline, then bring it back online. This forces
|
||||
recurring timers to migrate elsewhere. If you are concerned
|
||||
with multiple CPUs, force them all offline before bringing the
|
||||
first one back online. Once you have onlined the CPUs in question,
|
||||
do not offline any other CPUs, because doing so could force the
|
||||
timer back onto one of the CPUs in question.
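The offline-then-online sequence in step 2 can be scripted roughly as follows.
This is only a sketch: ``cycle_cpus`` and the ``CPU_SYSFS`` override are names
invented for this example, it must run as root, and not every CPU can
necessarily be taken offline on every system.

```shell
# Take every listed CPU offline first, then bring them all back online,
# so that no timer can migrate onto a CPU that is still waiting its turn.
cycle_cpus() {
    base=${CPU_SYSFS:-/sys/devices/system/cpu}
    for n in "$@"; do
        echo 0 > "$base/cpu$n/online" || return 1
        echo "cpu$n: offline"
    done
    for n in "$@"; do
        echo 1 > "$base/cpu$n/online" || return 1
        echo "cpu$n: online"
    done
}
```

For example, ``cycle_cpus 2 3`` would cycle CPUs 2 and 3 after boot completes.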
|
||||
|
||||
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
|
||||
---------------------------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. Force networking interrupts onto other CPUs.
|
||||
2. Initiate any network I/O on other CPUs.
|
||||
3. Once your application has started, prevent CPU-hotplug operations
|
||||
from being initiated from tasks that might run on the CPU to
|
||||
be de-jittered. (It is OK to force this CPU offline and then
|
||||
bring it back online before you start your application.)
|
||||
|
||||
BLOCK_SOFTIRQ
|
||||
-------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. Force block-device interrupts onto some other CPU.
|
||||
2. Initiate any block I/O on other CPUs.
|
||||
3. Once your application has started, prevent CPU-hotplug operations
|
||||
from being initiated from tasks that might run on the CPU to
|
||||
be de-jittered. (It is OK to force this CPU offline and then
|
||||
bring it back online before you start your application.)
|
||||
|
||||
IRQ_POLL_SOFTIRQ
|
||||
----------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. Force block-device interrupts onto some other CPU.
|
||||
2. Initiate any block I/O and block-I/O polling on other CPUs.
|
||||
3. Once your application has started, prevent CPU-hotplug operations
|
||||
from being initiated from tasks that might run on the CPU to
|
||||
be de-jittered. (It is OK to force this CPU offline and then
|
||||
bring it back online before you start your application.)
|
||||
|
||||
TASKLET_SOFTIRQ
|
||||
---------------
|
||||
|
||||
Do one or more of the following:
|
||||
|
||||
1. Avoid use of drivers that use tasklets. (Such drivers will contain
|
||||
calls to things like tasklet_schedule().)
|
||||
2. Convert all drivers that you must use from tasklets to workqueues.
|
||||
3. Force interrupts for drivers using tasklets onto other CPUs,
|
||||
and also do I/O involving these drivers on other CPUs.
|
||||
|
||||
SCHED_SOFTIRQ
|
||||
-------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
|
||||
for example, ensure that at most one runnable kthread is present
|
||||
on that CPU. If a thread that expects to run on the de-jittered
|
||||
CPU awakens, the scheduler will send an IPI that can result in
|
||||
a subsequent SCHED_SOFTIRQ.
|
||||
2. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
|
||||
is marked as an adaptive-ticks CPU using the "nohz_full="
|
||||
boot parameter. This reduces the number of scheduler-clock
|
||||
interrupts that the de-jittered CPU receives, minimizing its
|
||||
chances of being selected to do the load balancing work that
|
||||
runs in SCHED_SOFTIRQ context.
|
||||
3. To the extent possible, keep the CPU out of the kernel when it
|
||||
is non-idle, for example, by avoiding system calls and by
|
||||
forcing both kernel threads and interrupts to execute elsewhere.
|
||||
This further reduces the number of scheduler-clock interrupts
|
||||
received by the de-jittered CPU.
|
||||
|
||||
HRTIMER_SOFTIRQ
|
||||
---------------
|
||||
|
||||
Do all of the following:
|
||||
|
||||
1. To the extent possible, keep the CPU out of the kernel when it
|
||||
is non-idle. For example, avoid system calls and force both
|
||||
kernel threads and interrupts to execute elsewhere.
|
||||
2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
|
||||
CPU offline, then bring it back online. This forces recurring
|
||||
timers to migrate elsewhere. If you are concerned with multiple
|
||||
CPUs, force them all offline before bringing the first one
|
||||
back online. Once you have onlined the CPUs in question, do not
|
||||
offline any other CPUs, because doing so could force the timer
|
||||
back onto one of the CPUs in question.
|
||||
|
||||
RCU_SOFTIRQ
|
||||
-----------
|
||||
|
||||
Do at least one of the following:
|
||||
|
||||
1. Offload callbacks and keep the CPU in either dyntick-idle or
|
||||
adaptive-ticks state by doing all of the following:
|
||||
|
||||
a. Build with CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
|
||||
de-jittered is marked as an adaptive-ticks CPU using the
|
||||
"nohz_full=" boot parameter. Bind the rcuo kthreads to
|
||||
housekeeping CPUs, which can tolerate OS jitter.
|
||||
b. To the extent possible, keep the CPU out of the kernel
|
||||
when it is non-idle, for example, by avoiding system
|
||||
calls and by forcing both kernel threads and interrupts
|
||||
to execute elsewhere.
|
||||
|
||||
2. Enable RCU to do its processing remotely via dyntick-idle by
|
||||
doing all of the following:
|
||||
|
||||
a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
|
||||
b. Ensure that the CPU goes idle frequently, allowing other
|
||||
CPUs to detect that it has passed through an RCU quiescent
|
||||
state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
|
||||
userspace execution also allows other CPUs to detect that
|
||||
the CPU in question has passed through a quiescent state.
|
||||
c. To the extent possible, keep the CPU out of the kernel
|
||||
when it is non-idle, for example, by avoiding system
|
||||
calls and by forcing both kernel threads and interrupts
|
||||
to execute elsewhere.
|
||||
|
||||
Name:
|
||||
kworker/%u:%d%s (cpu, id, priority)
|
||||
|
||||
Purpose:
|
||||
Execute workqueue requests.
|
||||
|
||||
To reduce its OS jitter, do any of the following:
|
||||
|
||||
1. Run your workload at a real-time priority, which will allow
|
||||
preempting the kworker daemons.
|
||||
2. A given workqueue can be made visible in the sysfs filesystem
|
||||
by passing the WQ_SYSFS flag to that workqueue's alloc_workqueue().
|
||||
Such a workqueue can be confined to a given subset of the
|
||||
CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
|
||||
files. The set of WQ_SYSFS workqueues can be displayed using
"ls /sys/devices/virtual/workqueue". That said, the workqueues
|
||||
maintainer would like to caution people against indiscriminately
|
||||
sprinkling WQ_SYSFS across all the workqueues. The reason for
|
||||
caution is that it is easy to add WQ_SYSFS, but because sysfs is
|
||||
part of the formal user/kernel API, it can be nearly impossible
|
||||
to remove it, even if its addition was a mistake.
|
||||
3. Do any of the following needed to avoid jitter that your
|
||||
application cannot tolerate:
|
||||
|
||||
a. Build your kernel with CONFIG_SLUB=y rather than
|
||||
CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
|
||||
use of each CPU's workqueues to run its cache_reap()
|
||||
function.
|
||||
b. Avoid using oprofile, thus avoiding OS jitter from
|
||||
wq_sync_buffer().
|
||||
c. Limit your CPU frequency so that a CPU-frequency
|
||||
governor is not required, possibly enlisting the aid of
|
||||
special heatsinks or other cooling technologies. If done
|
||||
correctly, and if your CPU architecture permits, you should
|
||||
be able to build your kernel with CONFIG_CPU_FREQ=n to
|
||||
avoid the CPU-frequency governor periodically running
|
||||
on each CPU, including cs_dbs_timer() and od_dbs_timer().
|
||||
|
||||
WARNING: Please check your CPU specifications to
|
||||
make sure that this is safe on your particular system.
|
||||
d. As of v3.18, Christoph Lameter's on-demand vmstat workers
|
||||
commit prevents OS jitter due to vmstat_update() on
|
||||
CONFIG_SMP=y systems. Before v3.18, it is not possible
|
||||
to entirely get rid of the OS jitter, but you can
|
||||
decrease its frequency by writing a large value to
|
||||
/proc/sys/vm/stat_interval. The default value is HZ,
|
||||
for an interval of one second. Of course, larger values
|
||||
will make your virtual-memory statistics update more
|
||||
slowly. Of course, you can also run your workload at
|
||||
a real-time priority, thus preempting vmstat_update(),
|
||||
but if your workload is CPU-bound, this is a bad idea.
|
||||
However, there is an RFC patch from Christoph Lameter
|
||||
(based on an earlier one from Gilad Ben-Yossef) that
|
||||
reduces or even eliminates vmstat overhead for some
|
||||
workloads at https://lkml.org/lkml/2013/9/4/379.
|
||||
e. Boot with "elevator=noop" to avoid workqueue use by
|
||||
the block layer.
|
||||
f. If running on high-end powerpc servers, build with
|
||||
CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
|
||||
daemon from running on each CPU every second or so.
|
||||
(This will require editing Kconfig files and will defeat
|
||||
this platform's RAS functionality.) This avoids jitter
|
||||
due to the rtas_event_scan() function.
|
||||
WARNING: Please check your CPU specifications to
|
||||
make sure that this is safe on your particular system.
|
||||
g. If running on Cell Processor, build your kernel with
|
||||
CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
|
||||
spu_gov_work().
|
||||
WARNING: Please check your CPU specifications to
|
||||
make sure that this is safe on your particular system.
|
||||
h. If running on PowerMAC, build your kernel with
|
||||
CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
|
||||
avoiding OS jitter from rackmeter_do_timer().
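For option 2 above, confining the WQ_SYSFS workqueues to housekeeping CPUs
might look like the sketch below. The function name ``confine_workqueues`` is
invented for this example; it writes the same hexadecimal CPU mask to every
writable ``cpumask`` file and silently skips workqueues that do not expose
one.

```shell
# Write CPU mask $1 to the cpumask file of every WQ_SYSFS workqueue.
# $2 is the workqueue sysfs directory, default /sys/devices/virtual/workqueue.
confine_workqueues() {
    mask=$1
    wqdir=${2:-/sys/devices/virtual/workqueue}
    for f in "$wqdir"/*/cpumask; do
        if [ -w "$f" ]; then
            echo "$mask" > "$f"
        fi
    done
}
```

Run as root, ``confine_workqueues 3`` would restrict those workqueues to CPUs
0 and 1.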
|
||||
|
||||
Name:
|
||||
rcuc/%u
|
||||
|
||||
Purpose:
|
||||
Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
|
||||
|
||||
To reduce its OS jitter, do at least one of the following:
|
||||
|
||||
1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
|
||||
kthreads from being created in the first place, and also obviates
|
||||
the need for RCU priority boosting. This approach is feasible
|
||||
for workloads that do not require high degrees of responsiveness.
|
||||
2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
|
||||
kthreads from being created in the first place. This approach
|
||||
is feasible only if your workload never requires RCU priority
|
||||
boosting, for example, if you ensure frequent idle time on all
|
||||
CPUs that might execute within the kernel.
|
||||
3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
|
||||
boot parameter offloading RCU callbacks from all CPUs susceptible
|
||||
to OS jitter. This approach prevents the rcuc/%u kthreads from
|
||||
having any work to do, so that they are never awakened.
|
||||
4. Ensure that the CPU never enters the kernel, and, in particular,
|
||||
avoid initiating any CPU hotplug operations on this CPU. This is
|
||||
another way of preventing any callbacks from being queued on the
|
||||
CPU, again preventing the rcuc/%u kthreads from having any work
|
||||
to do.
|
||||
|
||||
Name:
|
||||
rcuop/%d and rcuos/%d
|
||||
|
||||
Purpose:
|
||||
Offload RCU callbacks from the corresponding CPU.
|
||||
|
||||
To reduce its OS jitter, do at least one of the following:
|
||||
|
||||
1. Use affinity, cgroups, or other mechanism to force these kthreads
|
||||
to execute on some other CPU.
|
||||
2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
|
||||
kthreads from being created in the first place. However, please
|
||||
note that this will not eliminate OS jitter, but will instead
|
||||
shift it to RCU_SOFTIRQ.
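Option 1 can be approximated with ``ps`` and ``taskset``, as in the sketch
below. The function names are invented for this example, the name pattern
covers only the ``rcuop/`` and ``rcuos/`` kthreads listed above, and the
binding step needs root.

```shell
# Read "PID COMMAND" lines on stdin and print the PIDs of rcuop/rcuos kthreads.
rcuo_pids() {
    awk '$2 ~ /^rcuo[ps]\// { print $1 }'
}

# Bind every rcuop/rcuos kthread to the CPU list given in $1 (for example "0").
bind_rcuo() {
    ps -e -o pid= -o comm= | rcuo_pids | while read -r pid; do
        taskset -cp "$1" "$pid"
    done
}
```

``bind_rcuo 0`` would move the offload kthreads to CPU 0, acting as the
housekeeping CPU in this example.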
|
||||
|
||||
Name:
|
||||
watchdog/%u
|
||||
|
||||
Purpose:
|
||||
Detect software lockups on each CPU.
|
||||
|
||||
To reduce its OS jitter, do at least one of the following:
|
||||
|
||||
1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
|
||||
kthreads from being created in the first place.
|
||||
2. Boot with "nosoftlockup=0", which will also prevent these kthreads
|
||||
from being created. Other related watchdog and softlockup boot
|
||||
parameters may be found in Documentation/admin-guide/kernel-parameters.rst
|
||||
and Documentation/watchdog/watchdog-parameters.rst.
|
||||
3. Echo a zero to /proc/sys/kernel/watchdog to disable the
|
||||
watchdog timer.
|
||||
4. Echo a large number to /proc/sys/kernel/watchdog_thresh in
|
||||
order to reduce the frequency of OS jitter due to the watchdog
|
||||
timer down to a level that is acceptable for your workload.
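Options 3 and 4 are plain writes to procfs and can be wrapped in a small
helper. This is a sketch under assumptions: ``relax_watchdog`` is a name
invented here, the second argument exists only so that the directory can be
overridden, and the writes need root.

```shell
# $1 = new watchdog threshold in seconds, or "off" to disable the watchdog.
# $2 = sysctl directory, default /proc/sys/kernel.
relax_watchdog() {
    sysdir=${2:-/proc/sys/kernel}
    if [ "$1" = off ]; then
        echo 0 > "$sysdir/watchdog"
    else
        echo "$1" > "$sysdir/watchdog_thresh"
    fi
}
```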
|
27
Documentation/admin-guide/lcd-panel-cgram.rst
Normal file
@@ -0,0 +1,27 @@
|
||||
======================================
|
||||
Parallel port LCD/Keypad Panel support
|
||||
======================================
|
||||
|
||||
Some LCDs allow you to define up to 8 characters, mapped to ASCII
|
||||
characters 0 to 7. The escape code to define a new character is
|
||||
'\e[LG' followed by one digit from 0 to 7, representing the character
|
||||
number, and up to 8 couples of hex digits terminated by a semi-colon
|
||||
(';'). Each couple of digits represents a line, with 1-bits for each
|
||||
illuminated pixel with LSB on the right. Lines are numbered from the
|
||||
top of the character to the bottom. On a 5x7 matrix, only the 5 lower
|
||||
bits of the 7 first bytes are used for each character. If the string
|
||||
is incomplete, only complete lines will be redefined. Here are some
|
||||
examples::
|
||||
|
||||
    printf "\e[LG0010101050D1F0C04;"  => 0 = [enter]
    printf "\e[LG1040E1F0000000000;"  => 1 = [up]
    printf "\e[LG2000000001F0E0400;"  => 2 = [down]
    printf "\e[LG3040E1F001F0E0400;"  => 3 = [up-down]
    printf "\e[LG40002060E1E0E0602;"  => 4 = [left]
    printf "\e[LG500080C0E0F0E0C08;"  => 5 = [right]
    printf "\e[LG60016051516141400;"  => 6 = "IP"

    printf "\e[LG00103071F1F070301;"  => big speaker
    printf "\e[LG00002061E1E060200;"  => small speaker
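The hex pairs in the examples above can be derived mechanically from a pixel
drawing. The sketch below turns rows written with ``#`` for lit pixels and
``.`` for dark ones into the hex string, then wraps it in the escape code. The
function names are invented for this example, and each row is assumed to be
at most 8 characters wide with the least significant bit on the right, as
described above.

```shell
# Convert pixel rows ('#' = lit, anything else = dark) into hex pairs.
glyph_hex() {
    hex=
    for row in "$@"; do
        val=0
        while [ -n "$row" ]; do
            rest=${row#?}             # row without its first character
            pixel=${row%"$rest"}      # the first character itself
            val=$(( val * 2 ))
            if [ "$pixel" = "#" ]; then
                val=$(( val + 1 ))
            fi
            row=$rest
        done
        hex=$hex$(printf '%02X' "$val")
    done
    printf '%s\n' "$hex"
}

# Emit the escape code that defines character $1 (0 to 7) from the rows given.
define_glyph() {
    slot=$1
    shift
    printf '\033[LG%s%s;' "$slot" "$(glyph_hex "$@")"
}
```

For example, ``glyph_hex '....#' '...##' '..###' '#####' '#####' '..###' '...##' '....#'``
reproduces the ``0103071F1F070301`` payload of the big speaker; the rows need
quoting because a leading ``#`` would otherwise start a shell comment. The
output of ``define_glyph`` would be redirected to the LCD device node.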
|
||||
|
||||
Willy
|
121
Documentation/admin-guide/ldm.rst
Normal file
@@ -0,0 +1,121 @@
|
||||
==========================================
|
||||
LDM - Logical Disk Manager (Dynamic Disks)
|
||||
==========================================
|
||||
|
||||
:Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
|
||||
:Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista.
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
Windows 2000, XP, and Vista use a new partitioning scheme. It is a complete
|
||||
replacement for the MSDOS style partitions. It stores its information in a
|
||||
1MiB journalled database at the end of the physical disk. The size of
|
||||
partitions is limited only by disk space. The maximum number of partitions is
|
||||
nearly 2000.
|
||||
|
||||
Any partitions created under the LDM are called "Dynamic Disks". There are no
|
||||
longer any primary or extended partitions. Normal MSDOS style partitions are
|
||||
now known as Basic Disks.
|
||||
|
||||
If you wish to use Spanned, Striped, Mirrored or RAID 5 Volumes, you must use
|
||||
Dynamic Disks. The journalling allows Windows to make changes to these
|
||||
partitions and filesystems without the need to reboot.
|
||||
|
||||
Once the LDM driver has divided up the disk, you can use the MD driver to
|
||||
assemble any multi-partition volumes, e.g. Stripes, RAID5.
|
||||
|
||||
To prevent legacy applications from repartitioning the disk, the LDM creates a
|
||||
dummy MSDOS partition table containing one disk-sized partition. This is what is
|
||||
supported with the Linux LDM driver.
|
||||
|
||||
A newer approach that has been implemented with Vista is to put LDM on top of a
|
||||
GPT label disk. This is not supported by the Linux LDM driver yet.
|
||||
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
Below we have a 50MiB disk, divided into seven partitions.
|
||||
|
||||
.. note::
|
||||
|
||||
The missing 1MiB at the end of the disk is where the LDM database is
|
||||
stored.
|
||||
|
||||
+--------+--------------------------+--------------------------+
| Device | Offset                   | Size                     |
|        +----------+---------+-----+----------+---------+-----+
|        | Bytes    | Sectors | MiB | Bytes    | Sectors | MiB |
+========+==========+=========+=====+==========+=========+=====+
| hda    |        0 |       0 |   0 | 52428800 |  102400 |  50 |
+--------+----------+---------+-----+----------+---------+-----+
| hda1   | 51380224 |  100352 |  49 |  1048576 |    2048 |   1 |
+--------+----------+---------+-----+----------+---------+-----+
| hda2   |    16384 |      32 |   0 |  6979584 |   13632 |   6 |
+--------+----------+---------+-----+----------+---------+-----+
| hda3   |  6995968 |   13664 |   6 | 10485760 |   20480 |  10 |
+--------+----------+---------+-----+----------+---------+-----+
| hda4   | 17481728 |   34144 |  16 |  4194304 |    8192 |   4 |
+--------+----------+---------+-----+----------+---------+-----+
| hda5   | 21676032 |   42336 |  20 |  5242880 |   10240 |   5 |
+--------+----------+---------+-----+----------+---------+-----+
| hda6   | 26918912 |   52576 |  25 | 10485760 |   20480 |  10 |
+--------+----------+---------+-----+----------+---------+-----+
| hda7   | 37404672 |   73056 |  35 | 13959168 |   27264 |  13 |
+--------+----------+---------+-----+----------+---------+-----+
|
||||
|
||||
The LDM Database may not store the partitions in the order that they appear on
|
||||
disk, but the driver will sort them.
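The three columns of each group in the table are the same quantity in
different units, and the partitions are laid out back to back. The sketch
below shows the arithmetic, assuming the 512-byte sectors that the table's
numbers imply and MiB values rounded down; the function names are invented
for this example.

```shell
# Convert a sector count to bytes, assuming 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Convert bytes to whole MiB, rounding down as the table does.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Sector at which the next partition starts: $1 = offset, $2 = size (sectors).
next_offset() {
    echo $(( $1 + $2 ))
}
```

For hda2, ``next_offset 32 13632`` gives 13664, which is where hda3 begins, and
``sectors_to_bytes 13664`` gives hda3's byte offset of 6995968.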
|
||||
|
||||
When Linux boots, you will see something like::
|
||||
|
||||
hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
|
||||
hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
|
||||
|
||||
|
||||
Compiling LDM Support
|
||||
---------------------
|
||||
|
||||
To enable LDM, choose the following two options:
|
||||
|
||||
- "Advanced partition selection" CONFIG_PARTITION_ADVANCED
|
||||
- "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
|
||||
|
||||
If you believe the driver isn't working as it should, you can enable the extra
|
||||
debugging code. This will produce a LOT of output. The option is:
|
||||
|
||||
- "Windows LDM extra logging" CONFIG_LDM_DEBUG
|
||||
|
||||
N.B. The partition code cannot be compiled as a module.
|
||||
|
||||
As with all the partition code, if the driver doesn't see signs of its type of
|
||||
partition, it will pass control to another driver, so there is no harm in
|
||||
enabling it.
|
||||
|
||||
If you have Dynamic Disks but don't enable the driver, then all you will see
|
||||
is a dummy MSDOS partition filling the whole disk. You won't be able to mount
|
||||
any of the volumes on the disk.
|
||||
|
||||
|
||||
Booting
|
||||
-------
|
||||
|
||||
If you enable LDM support, then lilo is capable of booting from any of the
|
||||
discovered partitions. However, grub does not understand the LDM partitioning
|
||||
and cannot boot from a Dynamic Disk.
|
||||
|
||||
|
||||
More Documentation
|
||||
------------------
|
||||
|
||||
There is an Overview of the LDM together with complete Technical Documentation.
|
||||
It is available for download.
|
||||
|
||||
http://www.linux-ntfs.org/
|
||||
|
||||
If you have any LDM questions that aren't answered in the documentation, email
|
||||
me.
|
||||
|
||||
Cheers,
|
||||
FlatCap - Richard Russon
|
||||
ldm@flatcap.org
|
||||
|
83
Documentation/admin-guide/lockup-watchdogs.rst
Normal file
@@ -0,0 +1,83 @@
|
||||
===============================================================
|
||||
Softlockup detector and hardlockup detector (aka nmi_watchdog)
|
||||
===============================================================
|
||||
|
||||
The Linux kernel can act as a watchdog to detect both soft and hard
|
||||
lockups.
|
||||
|
||||
A 'softlockup' is defined as a bug that causes the kernel to loop in
|
||||
kernel mode for more than 20 seconds (see "Implementation" below for
|
||||
details), without giving other tasks a chance to run. The current
|
||||
stack trace is displayed upon detection and, by default, the system
|
||||
will stay locked up. Alternatively, the kernel can be configured to
|
||||
panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
|
||||
"softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for
|
||||
details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
|
||||
provided for this.
|
||||
|
||||
A 'hardlockup' is defined as a bug that causes the CPU to loop in
|
||||
kernel mode for more than 10 seconds (see "Implementation" below for
|
||||
details), without letting other interrupts have a chance to run.
|
||||
Similarly to the softlockup case, the current stack trace is displayed
|
||||
upon detection and the system will stay locked up unless the default
|
||||
behavior is changed, which can be done through a sysctl,
|
||||
'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
|
||||
and a kernel parameter, "nmi_watchdog"
|
||||
(see "Documentation/admin-guide/kernel-parameters.rst" for details).
|
||||
|
||||
The panic option can be used in combination with panic_timeout (this
|
||||
timeout is set through the confusingly named "kernel.panic" sysctl),
|
||||
to cause the system to reboot automatically after a specified amount
|
||||
of time.
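
The combination described above can be sketched as a small shell helper that prints the relevant sysctl settings. The function name ``lockup_panic_conf`` and the 30 second default are assumptions made for illustration; the sysctl names are the ones mentioned in this document.

```shell
# Print sysctl settings that turn detected lockups into a panic,
# followed by an automatic reboot after $1 seconds (default: 30).
# lockup_panic_conf is a hypothetical helper name, not a kernel tool.
lockup_panic_conf() {
    timeout=${1:-30}
    printf 'kernel.softlockup_panic = 1\n'
    printf 'kernel.hardlockup_panic = 1\n'
    printf 'kernel.panic = %d\n' "$timeout"
}
```

The output could be redirected into a file under /etc/sysctl.d/ on distributions that read that directory, or each line could be passed to ``sysctl -w`` by hand.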
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
The soft and hard lockup detectors are built on top of the hrtimer and
|
||||
perf subsystems, respectively. A direct consequence of this is that,
|
||||
in principle, they should work in any architecture where these
|
||||
subsystems are present.
|
||||
|
||||
A periodic hrtimer runs to generate interrupts and kick the watchdog
|
||||
task. An NMI perf event is generated every "watchdog_thresh"
|
||||
(compile-time initialized to 10 and configurable through sysctl of the
|
||||
same name) seconds to check for hardlockups. If any CPU in the system
|
||||
does not receive any hrtimer interrupt during that time the
|
||||
'hardlockup detector' (the handler for the NMI perf event) will
|
||||
generate a kernel warning or call panic, depending on the
|
||||
configuration.
|
||||
|
||||
The watchdog task is a high priority kernel thread that updates a
|
||||
timestamp every time it is scheduled. If that timestamp is not updated
|
||||
for 2*watchdog_thresh seconds (the softlockup threshold) the
|
||||
'softlockup detector' (coded inside the hrtimer callback function)
|
||||
will dump useful debug information to the system log, after which it
|
||||
will call panic if it was instructed to do so or resume execution of
|
||||
other kernel code.
|
||||
|
||||
The period of the hrtimer is 2*watchdog_thresh/5, which means it has
|
||||
two or three chances to generate an interrupt before the hardlockup
|
||||
detector kicks in.
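
To make the arithmetic above concrete, the following sketch derives the three timings from a given watchdog_thresh value. The helper name ``watchdog_timings`` is hypothetical, and the shell's integer division truncates, so the printed hrtimer period is rounded down to whole seconds.

```shell
# Derive the detector timings (in seconds) from watchdog_thresh.
# watchdog_timings is a hypothetical helper name used for illustration.
watchdog_timings() {
    thresh=${1:-10}     # 10 is the compile-time default mentioned above
    printf 'hardlockup_threshold=%d\n' "$thresh"
    printf 'softlockup_threshold=%d\n' $((2 * thresh))
    # Integer division: fractional seconds are truncated.
    printf 'hrtimer_period=%d\n' $((2 * thresh / 5))
}
```

With the default of 10, the hrtimer fires every 4 seconds, so two or three interrupts fit into each 10 second hardlockup window. On a running system the current value can be read with ``cat /proc/sys/kernel/watchdog_thresh`` and passed as the argument.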
|
||||
|
||||
As explained above, a kernel knob is provided that allows
|
||||
administrators to configure the period of the hrtimer and the perf
|
||||
event. The right value for a particular environment is a trade-off
|
||||
between fast response to lockups and detection overhead.
|
||||
|
||||
By default, the watchdog runs on all online cores. However, on a
|
||||
kernel configured with NO_HZ_FULL, by default the watchdog runs only
|
||||
on the housekeeping cores, not the cores specified in the "nohz_full"
|
||||
boot argument. If we allowed the watchdog to run by default on
|
||||
the "nohz_full" cores, we would have to run timer ticks to activate
|
||||
the scheduler, which would prevent the "nohz_full" functionality
|
||||
from protecting the user code on those cores from the kernel.
|
||||
Of course, disabling it by default on the nohz_full cores means that
|
||||
when those cores do enter the kernel, by default we will not be
|
||||
able to detect if they lock up. However, allowing the watchdog
|
||||
to continue to run on the housekeeping (non-tickless) cores means
|
||||
that we will continue to detect lockups properly on those cores.
|
||||
|
||||
In either case, the set of cores excluded from running the watchdog
|
||||
may be adjusted via the kernel.watchdog_cpumask sysctl. For
|
||||
nohz_full cores, this may be useful for debugging a case where the
|
||||
kernel seems to be hanging on the nohz_full cores.
|
25
Documentation/admin-guide/mm/cma_debugfs.rst
Normal file
@@ -0,0 +1,25 @@
|
||||
=====================
|
||||
CMA Debugfs Interface
|
||||
=====================
|
||||
|
||||
The CMA debugfs interface is useful to retrieve basic information out of the
|
||||
different CMA areas and to test allocation/release in each of the areas.
|
||||
|
||||
Each CMA zone is represented by a directory under <debugfs>/cma/, indexed by the
|
||||
kernel's CMA index. So the first CMA zone would be:
|
||||
|
||||
<debugfs>/cma/cma-0
|
||||
|
||||
The structure of the files created under that directory is as follows:
|
||||
|
||||
- [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
|
||||
- [RO] count: Amount of memory in the CMA area.
|
||||
- [RO] order_per_bit: Order of pages represented by one bit.
|
||||
- [RO] bitmap: The bitmap of page states in the zone.
|
||||
- [WO] alloc: Allocate N pages from that CMA area. For example::
|
||||
|
||||
echo 5 > <debugfs>/cma/cma-2/alloc
|
||||
|
||||
would try to allocate 5 pages from the cma-2 area.
|
||||
|
||||
- [WO] free: Free N pages from that CMA area, similar to the above.
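
The alloc and free files can be wrapped in a small helper, sketched below. The function name ``cma_request`` and its argument order are assumptions; the directory layout follows the ``<debugfs>/cma/cma-N`` naming described above.

```shell
# Ask a CMA area to allocate or free N pages through debugfs.
# $1: debugfs mount point, $2: CMA index, $3: "alloc" or "free", $4: pages
# cma_request is a hypothetical helper name used for illustration.
cma_request() {
    case "$3" in
        alloc|free) ;;
        *) echo "usage: cma_request <debugfs> <index> alloc|free <pages>" >&2
           return 1 ;;
    esac
    # Writing the page count to the file triggers the operation.
    echo "$4" > "$1/cma/cma-$2/$3"
}
```

For example, ``cma_request /sys/kernel/debug 2 alloc 5`` run as root would try to allocate 5 pages from cma-2, assuming debugfs is mounted at /sys/kernel/debug.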
|
@@ -26,6 +26,7 @@ the Linux memory management.
|
||||
:maxdepth: 1
|
||||
|
||||
concepts
|
||||
cma_debugfs
|
||||
hugetlbpage
|
||||
idle_page_tracking
|
||||
ksm
|
||||
|
30
Documentation/admin-guide/numastat.rst
Normal file
@@ -0,0 +1,30 @@
|
||||
===============================
|
||||
Numa policy hit/miss statistics
|
||||
===============================
|
||||
|
||||
/sys/devices/system/node/node*/numastat
|
||||
|
||||
All units are pages. Hugepages have separate counters.
|
||||
|
||||
=============== ============================================================
|
||||
numa_hit A process wanted to allocate memory from this node,
|
||||
and succeeded.
|
||||
|
||||
numa_miss A process wanted to allocate memory from another node,
|
||||
but ended up with memory from this node.
|
||||
|
||||
numa_foreign A process wanted to allocate on this node,
|
||||
but ended up with memory from another one.
|
||||
|
||||
local_node A process ran on this node and got memory from it.
|
||||
|
||||
other_node A process ran on this node and got memory from another node.
|
||||
|
||||
interleave_hit Interleaving wanted to allocate from this node
|
||||
and succeeded.
|
||||
=============== ============================================================
|
||||
|
||||
For easier reading you can use the numastat utility from the numactl package
|
||||
(http://oss.sgi.com/projects/libnuma/). Note that it only works
|
||||
well right now on machines with a small number of CPUs.
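
Where the numastat utility is not available, the counters can be summarized with a few lines of shell. The sketch below assumes that each line of the numastat file holds a counter name followed by its value; the helper name ``numa_hit_percent`` is hypothetical.

```shell
# Print the percentage of allocations intended for a node that were
# actually satisfied from it: numa_hit / (numa_hit + numa_foreign).
# $1: path to a node's numastat file
numa_hit_percent() {
    awk '$1 == "numa_hit"     { hit = $2 }
         $1 == "numa_foreign" { foreign = $2 }
         END {
             total = hit + foreign
             if (total > 0) printf "%.1f\n", 100 * hit / total
             else print "n/a"
         }' "$1"
}
```

A typical call would be ``numa_hit_percent /sys/devices/system/node/node0/numastat``. A low percentage suggests that the node frequently could not supply the memory that was requested from it.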
|
||||
|
292
Documentation/admin-guide/pnp.rst
Normal file
292
Documentation/admin-guide/pnp.rst
Normal file
@@ -0,0 +1,292 @@
|
||||
=================================
|
||||
Linux Plug and Play Documentation
|
||||
=================================
|
||||
|
||||
:Author: Adam Belay <ambx1@neo.rr.com>
|
||||
:Last updated: Oct. 16, 2002
|
||||
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
Plug and Play provides a means of detecting and setting resources for legacy or
|
||||
otherwise unconfigurable devices. The Linux Plug and Play Layer provides these
|
||||
services to compatible drivers.
|
||||
|
||||
|
||||
The User Interface
|
||||
------------------
|
||||
|
||||
The Linux Plug and Play user interface provides a means to activate PnP devices
|
||||
for legacy and user level drivers that do not support Linux Plug and Play. The
|
||||
user interface is integrated into sysfs.
|
||||
|
||||
In addition to the standard sysfs file the following are created in each
|
||||
device's directory:
|
||||
- id - displays a list of supported EISA IDs
|
||||
- options - displays possible resource configurations
|
||||
- resources - displays currently allocated resources and allows resource changes
|
||||
|
||||
activating a device
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
::
|
||||
|
||||
# echo "auto" > resources
|
||||
|
||||
This invokes the automatic resource configuration system to activate the device.
|
||||
|
||||
manually activating a device
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
::
|
||||
|
||||
# echo "manual <depnum> <mode>" > resources
|
||||
|
||||
<depnum> - the configuration number
|
||||
<mode> - static or dynamic
|
||||
static = for next boot
|
||||
dynamic = now
|
||||
|
||||
disabling a device
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
::
|
||||
|
||||
# echo "disable" > resources
|
||||
|
||||
|
||||
EXAMPLE:
|
||||
|
||||
Suppose you need to activate the floppy disk controller.
|
||||
|
||||
1. change to the proper directory, in my case it is
|
||||
/driver/bus/pnp/devices/00:0f::
|
||||
|
||||
# cd /driver/bus/pnp/devices/00:0f
|
||||
# cat name
|
||||
PC standard floppy disk controller
|
||||
|
||||
2. check if the device is already active::
|
||||
|
||||
# cat resources
|
||||
DISABLED
|
||||
|
||||
- Notice the string "DISABLED". This means the device is not active.
|
||||
|
||||
3. check the device's possible configurations (optional)::
|
||||
|
||||
# cat options
|
||||
Dependent: 01 - Priority acceptable
|
||||
port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
|
||||
port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding
|
||||
irq 6
|
||||
dma 2 8-bit compatible
|
||||
Dependent: 02 - Priority acceptable
|
||||
port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding
|
||||
port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding
|
||||
irq 6
|
||||
dma 2 8-bit compatible
|
||||
|
||||
4. now activate the device::
|
||||
|
||||
# echo "auto" > resources
|
||||
|
||||
5. finally check if the device is active::
|
||||
|
||||
# cat resources
|
||||
io 0x3f0-0x3f5
|
||||
io 0x3f7-0x3f7
|
||||
irq 6
|
||||
dma 2
|
||||
|
||||
There is also a series of kernel parameters::
|
||||
|
||||
pnp_reserve_irq=irq1[,irq2] ....
|
||||
pnp_reserve_dma=dma1[,dma2] ....
|
||||
pnp_reserve_io=io1,size1[,io2,size2] ....
|
||||
pnp_reserve_mem=mem1,size1[,mem2,size2] ....
|
||||
|
||||
|
||||
|
||||
The Unified Plug and Play Layer
|
||||
-------------------------------
|
||||
|
||||
All Plug and Play drivers, protocols, and services meet at a central location
|
||||
called the Plug and Play Layer. This layer is responsible for the exchange of
|
||||
information between PnP drivers and PnP protocols. Thus it automatically
|
||||
forwards commands to the proper protocol. This makes writing PnP drivers
|
||||
significantly easier.
|
||||
|
||||
The following functions are available from the Plug and Play Layer:
|
||||
|
||||
pnp_get_protocol
|
||||
increments the number of uses by one
|
||||
|
||||
pnp_put_protocol
|
||||
decrements the number of uses by one
|
||||
|
||||
pnp_register_protocol
|
||||
use this to register a new PnP protocol
|
||||
|
||||
pnp_unregister_protocol
|
||||
use this function to remove a PnP protocol from the Plug and Play Layer
|
||||
|
||||
pnp_register_driver
|
||||
adds a PnP driver to the Plug and Play Layer
|
||||
|
||||
this includes driver model integration
|
||||
returns zero for success or a negative error number for failure; count
|
||||
calls to the .add() method if you need to know how many devices bind to
|
||||
the driver
|
||||
|
||||
pnp_unregister_driver
|
||||
removes a PnP driver from the Plug and Play Layer
|
||||
|
||||
|
||||
|
||||
Plug and Play Protocols
|
||||
-----------------------
|
||||
|
||||
This section contains information for PnP protocol developers.
|
||||
|
||||
The following Protocols are currently available in the computing world:
|
||||
|
||||
- PNPBIOS:
|
||||
used for system devices such as serial and parallel ports.
|
||||
- ISAPNP:
|
||||
provides PnP support for the ISA bus
|
||||
- ACPI:
|
||||
among its many uses, ACPI provides information about system level
|
||||
devices.
|
||||
|
||||
It is meant to replace the PNPBIOS. It is not currently supported by Linux
|
||||
Plug and Play but it is planned to be in the near future.
|
||||
|
||||
|
||||
Requirements for a Linux PnP protocol:
|
||||
1. the protocol must use EISA IDs
|
||||
2. the protocol must inform the PnP Layer of a device's current configuration
|
||||
|
||||
- the ability to set resources is optional but preferred.
|
||||
|
||||
The following are PnP protocol related functions:
|
||||
|
||||
pnp_add_device
|
||||
use this function to add a PnP device to the PnP layer
|
||||
|
||||
only call this function when all wanted values are set in the pnp_dev
|
||||
structure
|
||||
|
||||
pnp_init_device
|
||||
call this to initialize the PnP structure
|
||||
|
||||
pnp_remove_device
|
||||
call this to remove a device from the Plug and Play Layer.
|
||||
it will fail if the device is still in use.
|
||||
automatically frees the memory used by the device and related structures
|
||||
|
||||
pnp_add_id
|
||||
adds an EISA ID to the list of supported IDs for the specified device
|
||||
|
||||
For more information consult the source of a protocol such as
|
||||
/drivers/pnp/pnpbios/core.c.
|
||||
|
||||
|
||||
|
||||
Linux Plug and Play Drivers
|
||||
---------------------------
|
||||
|
||||
This section contains information for Linux PnP driver developers.
|
||||
|
||||
The New Way
|
||||
^^^^^^^^^^^
|
||||
|
||||
1. first make a list of supported EISA IDs
|
||||
|
||||
ex::
|
||||
|
||||
static const struct pnp_id pnp_dev_table[] = {
|
||||
/* Standard LPT Printer Port */
|
||||
{.id = "PNP0400", .driver_data = 0},
|
||||
/* ECP Printer Port */
|
||||
{.id = "PNP0401", .driver_data = 0},
|
||||
{.id = ""}
|
||||
};
|
||||
|
||||
Please note that the character 'X' can be used as a wild card in the function
|
||||
portion (last four characters).
|
||||
|
||||
ex::
|
||||
|
||||
/* Unknown PnP modems */
|
||||
{ "PNPCXXX", UNKNOWN_DEV },
|
||||
|
||||
Supported PnP card IDs can optionally be defined.
|
||||
ex::
|
||||
|
||||
static const struct pnp_id pnp_card_table[] = {
|
||||
{ "ANYDEVS", 0 },
|
||||
{ "", 0 }
|
||||
};
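
The 'X' wild card rule from step 1 can be illustrated outside the kernel with a short shell sketch. This is not the kernel's matching code: the helper name ``pnp_id_matches`` is hypothetical, the comparison is made case-insensitive for the whole ID as a simplification, and a wild card position accepts any character rather than only hexadecimal digits.

```shell
# Return success if a 7-character EISA ID matches a table entry in which
# 'X' in the function portion (last four characters) matches anything.
# $1: table entry (e.g. PNPCXXX), $2: device ID (e.g. PNPC123)
pnp_id_matches() {
    pattern=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
    id=$(printf '%s' "$2" | tr 'a-z' 'A-Z')
    vendor=$(printf '%s' "$pattern" | cut -c1-3)
    # Turn each 'X' of the function portion into the glob wild card '?'.
    func=$(printf '%s' "$pattern" | cut -c4-7 | tr 'X' '?')
    case "$id" in
        "$vendor"$func) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this sketch, ``pnp_id_matches PNPCXXX PNPC123`` succeeds, while ``pnp_id_matches PNPCXXX PNP0400`` fails because the first function character differs.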
|
||||
|
||||
2. Optionally define probe and remove functions. It may make sense not to
|
||||
define these functions if the driver already has a reliable method of detecting
|
||||
the resources, such as the parport_pc driver.
|
||||
|
||||
ex::
|
||||
|
||||
static int
|
||||
serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
|
||||
struct pnp_id *dev_id)
|
||||
{
|
||||
. . .
|
||||
|
||||
ex::
|
||||
|
||||
static void serial_pnp_remove(struct pnp_dev * dev)
|
||||
{
|
||||
. . .
|
||||
|
||||
consult /drivers/serial/8250_pnp.c for more information.
|
||||
|
||||
3. create a driver structure
|
||||
|
||||
ex::
|
||||
|
||||
static struct pnp_driver serial_pnp_driver = {
|
||||
.name = "serial",
|
||||
.card_id_table = pnp_card_table,
|
||||
.id_table = pnp_dev_table,
|
||||
.probe = serial_pnp_probe,
|
||||
.remove = serial_pnp_remove,
|
||||
};
|
||||
|
||||
* name and id_table cannot be NULL.
|
||||
|
||||
4. register the driver
|
||||
|
||||
ex::
|
||||
|
||||
static int __init serial8250_pnp_init(void)
|
||||
{
|
||||
return pnp_register_driver(&serial_pnp_driver);
|
||||
}
|
||||
|
||||
The Old Way
|
||||
^^^^^^^^^^^
|
||||
|
||||
A series of compatibility functions have been created to make it easy to convert
|
||||
ISAPNP drivers. They should serve as a temporary solution only.
|
||||
|
||||
They are as follows::
|
||||
|
||||
struct pnp_card *pnp_find_card(unsigned short vendor,
|
||||
unsigned short device,
|
||||
struct pnp_card *from)
|
||||
|
||||
struct pnp_dev *pnp_find_dev(struct pnp_card *card,
|
||||
unsigned short vendor,
|
||||
unsigned short function,
|
||||
struct pnp_dev *from)
|
||||
|
140
Documentation/admin-guide/rtc.rst
Normal file
@@ -0,0 +1,140 @@
|
||||
=======================================
|
||||
Real Time Clock (RTC) Drivers for Linux
|
||||
=======================================
|
||||
|
||||
When Linux developers talk about a "Real Time Clock", they usually mean
|
||||
something that tracks wall clock time and is battery backed so that it
|
||||
works even with system power off. Such clocks will normally not track
|
||||
the local time zone or daylight savings time -- unless they dual boot
|
||||
with MS-Windows -- but will instead be set to Coordinated Universal Time
|
||||
(UTC, formerly "Greenwich Mean Time").
|
||||
|
||||
The newest non-PC hardware tends to just count seconds, like the time(2)
|
||||
system call reports, but RTCs also very commonly represent time using
|
||||
the Gregorian calendar and 24 hour time, as reported by gmtime(3).
|
||||
|
||||
Linux has two largely-compatible userspace RTC API families you may
|
||||
need to know about:
|
||||
|
||||
* /dev/rtc ... is the RTC provided by PC compatible systems,
|
||||
so it's not very portable to non-x86 systems.
|
||||
|
||||
* /dev/rtc0, /dev/rtc1 ... are part of a framework that's
|
||||
supported by a wide variety of RTC chips on all systems.
|
||||
|
||||
Programmers need to understand that the PC/AT functionality is not
|
||||
always available, and some systems can do much more. That is, the
|
||||
RTCs use the same API to make requests in both RTC frameworks (using
|
||||
different filenames of course), but the hardware may not offer the
|
||||
same functionality. For example, not every RTC is hooked up to an
|
||||
IRQ, so they can't all issue alarms; and where standard PC RTCs can
|
||||
only issue an alarm up to 24 hours in the future, other hardware may
|
||||
be able to schedule one any time in the upcoming century.
|
||||
|
||||
|
||||
Old PC/AT-Compatible driver: /dev/rtc
|
||||
--------------------------------------
|
||||
|
||||
All PCs (even Alpha machines) have a Real Time Clock built into them.
|
||||
Usually they are built into the chipset of the computer, but some may
|
||||
actually have a Motorola MC146818 (or clone) on the board. This is the
|
||||
clock that keeps the date and time while your computer is turned off.
|
||||
|
||||
ACPI has standardized that MC146818 functionality, and extended it in
|
||||
a few ways (enabling longer alarm periods, and wake-from-hibernate).
|
||||
That functionality is NOT exposed in the old driver.
|
||||
|
||||
However it can also be used to generate signals from a slow 2Hz to a
|
||||
relatively fast 8192Hz, in increments of powers of two. These signals
|
||||
are reported by interrupt number 8. (Oh! So *that* is what IRQ 8 is
|
||||
for...) It can also function as a 24hr alarm, raising IRQ 8 when the
|
||||
alarm goes off. The alarm can also be programmed to only check any
|
||||
subset of the three programmable values, meaning that it could be set to
|
||||
ring on the 30th second of the 30th minute of every hour, for example.
|
||||
The clock can also be set to generate an interrupt upon every clock
|
||||
update, thus generating a 1Hz signal.
|
||||
|
||||
The interrupts are reported via /dev/rtc (major 10, minor 135, read only
|
||||
character device) in the form of an unsigned long. The low byte contains
|
||||
the type of interrupt (update-done, alarm-rang, or periodic) that was
|
||||
raised, and the remaining bytes contain the number of interrupts since
|
||||
the last read. Status information is reported through the pseudo-file
|
||||
/proc/driver/rtc if the /proc filesystem was enabled. The driver has
|
||||
built in locking so that only one process is allowed to have the /dev/rtc
|
||||
interface open at a time.
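
The layout of the value read from /dev/rtc can be sketched as follows. The flag bits used here (0x10 update, 0x20 alarm, 0x40 periodic) are the RTC_UF, RTC_AF and RTC_PF values from linux/rtc.h; the helper name ``rtc_decode`` is hypothetical, and the value is assumed to have already been read and converted to a number.

```shell
# Split a value read from /dev/rtc into the interrupt type flags
# (low byte) and the number of interrupts since the last read
# (remaining bytes). $1: the value, decimal or 0x-prefixed hex.
rtc_decode() {
    value=$(($1))
    flags=$((value & 0xff))
    count=$((value >> 8))
    printf 'count=%d update=%d alarm=%d periodic=%d\n' \
        "$count" \
        $(( (flags & 0x10) != 0 )) \
        $(( (flags & 0x20) != 0 )) \
        $(( (flags & 0x40) != 0 ))
}
```

A count greater than 1 is the "pileup" case: more than one interrupt arrived between two reads.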
|
||||
|
||||
A user process can monitor these interrupts by doing a read(2) or a
|
||||
select(2) on /dev/rtc -- either will block/stop the user process until
|
||||
the next interrupt is received. This is useful for things like
|
||||
reasonably high frequency data acquisition where one doesn't want to
|
||||
burn up 100% CPU by polling gettimeofday etc. etc.
|
||||
|
||||
At high frequencies, or under high loads, the user process should check
|
||||
the number of interrupts received since the last read to determine if
|
||||
there has been any interrupt "pileup" so to speak. Just for reference, a
|
||||
typical 486-33 running a tight read loop on /dev/rtc will start to suffer
|
||||
occasional interrupt pileup (i.e. > 1 IRQ event since last read) for
|
||||
frequencies above 1024Hz. So you really should check the high bytes
|
||||
of the value you read, especially at frequencies above that of the
|
||||
normal timer interrupt, which is 100Hz.
|
||||
|
||||
Programming and/or enabling interrupt frequencies greater than 64Hz is
|
||||
only allowed by root. This is perhaps a bit conservative, but we don't want
|
||||
an evil user generating lots of IRQs on a slow 386sx-16, where it might have
|
||||
a negative impact on performance. This 64Hz limit can be changed by writing
|
||||
a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
|
||||
interrupt handler is only a few lines of code to minimize any possibility
|
||||
of this effect.
|
||||
|
||||
Also, if the kernel time is synchronized with an external source, the
|
||||
kernel will write the time back to the CMOS clock every 11 minutes. In
|
||||
the process of doing this, the kernel briefly turns off RTC periodic
|
||||
interrupts, so be aware of this if you are doing serious work. If you
|
||||
don't synchronize the kernel time with an external source (via ntp or
|
||||
whatever) then the kernel will keep its hands off the RTC, allowing you
|
||||
exclusive access to the device for your applications.
|
||||
|
||||
The alarm and/or interrupt frequency are programmed into the RTC via
|
||||
various ioctl(2) calls as listed in ./include/linux/rtc.h
|
||||
Rather than write 50 pages describing the ioctl() and so on, it is
|
||||
perhaps more useful to include a small test program that demonstrates
|
||||
how to use them, and demonstrates the features of the driver. This is
|
||||
probably a lot more useful to people interested in writing applications
|
||||
that will be using this driver. See tools/testing/selftests/rtc/rtctest.c.
|
||||
|
||||
(The original /dev/rtc driver was written by Paul Gortmaker.)
|
||||
|
||||
|
||||
New portable "RTC Class" drivers: /dev/rtcN
|
||||
--------------------------------------------
|
||||
|
||||
Because Linux supports many non-ACPI and non-PC platforms, some of which
|
||||
have more than one RTC style clock, it needed a more portable solution
|
||||
than expecting a single battery-backed MC146818 clone on every system.
|
||||
Accordingly, a new "RTC Class" framework has been defined. It offers
|
||||
three different userspace interfaces:
|
||||
|
||||
* /dev/rtcN ... much the same as the older /dev/rtc interface
|
||||
|
||||
* /sys/class/rtc/rtcN ... sysfs attributes support readonly
|
||||
access to some RTC attributes.
|
||||
|
||||
* /proc/driver/rtc ... the system clock RTC may expose itself
|
||||
using a procfs interface. If there is no RTC for the system clock,
|
||||
rtc0 is used by default. More information is (currently) shown
|
||||
here than through sysfs.
|
||||
|
||||
The RTC Class framework supports a wide variety of RTCs, ranging from those
|
||||
integrated into embeddable system-on-chip (SOC) processors to discrete chips
|
||||
using I2C, SPI, or some other bus to communicate with the host CPU. There's
|
||||
even support for PC-style RTCs ... including the features exposed on newer PCs
|
||||
through ACPI.
|
||||
|
||||
The new framework also removes the "one RTC per system" restriction. For
|
||||
example, maybe the low-power battery-backed RTC is a discrete I2C chip, but
|
||||
a high functionality RTC is integrated into the SOC. That system might read
|
||||
the system clock from the discrete RTC, but use the integrated one for all
|
||||
other tasks, because of its greater functionality.
|
||||
|
||||
Check out tools/testing/selftests/rtc/rtctest.c for an example usage of the
|
||||
ioctl interface.
|
249
Documentation/admin-guide/svga.rst
Normal file
@@ -0,0 +1,249 @@
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
=================================
|
||||
Video Mode Selection Support 2.13
|
||||
=================================
|
||||
|
||||
:Copyright: |copy| 1995--1999 Martin Mares, <mj@ucw.cz>
|
||||
|
||||
Intro
|
||||
~~~~~
|
||||
|
||||
This small document describes the "Video Mode Selection" feature which
|
||||
allows the use of various special video modes supported by the video BIOS. Due
|
||||
to usage of the BIOS, the selection is limited to boot time (before the
|
||||
kernel decompression starts) and works only on 80X86 machines.
|
||||
|
||||
.. note::
|
||||
|
||||
Short intro for the impatient: Just use vga=ask for the first time,
|
||||
enter ``scan`` on the video mode prompt, pick the mode you want to use,
|
||||
remember its mode ID (the four-digit hexadecimal number) and then
|
||||
set the vga parameter to this number (converted to decimal first).
|
||||
|
||||
The video mode to be used is selected by a kernel parameter which can be
|
||||
specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
|
||||
option of LILO (or some other boot loader you use) or by the "vidmode" utility
|
||||
(present in standard Linux utility packages). You can use the following values
|
||||
of this parameter::
|
||||
|
||||
NORMAL_VGA - Standard 80x25 mode available on all display adapters.
|
||||
|
||||
EXTENDED_VGA - Standard 8-pixel font mode: 80x43 on EGA, 80x50 on VGA.
|
||||
|
||||
ASK_VGA - Display a video mode menu upon startup (see below).
|
||||
|
||||
0..35 - Menu item number (when you have used the menu to view the list of
|
||||
modes available on your adapter, you can specify the menu item you want
|
||||
to use). 0..9 correspond to "0".."9", 10..35 to "a".."z". Warning: the
|
||||
mode list displayed may vary as the kernel version changes, because the
|
||||
modes are listed in a "first detected -- first displayed" manner. It's
|
||||
better to use absolute mode numbers instead.
|
||||
|
||||
0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
|
||||
for exact meaning of the ID). Warning: rdev and LILO don't support
|
||||
hexadecimal numbers -- you have to convert it to decimal manually.
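
The manual conversion mentioned above can be left to the shell. In this minimal sketch the helper name ``vga_mode_decimal`` is hypothetical; it relies only on the standard printf utility accepting 0x-prefixed numbers.

```shell
# Convert a hexadecimal video mode ID to the decimal number needed by
# tools that do not accept hexadecimal input.
# $1: mode ID with a 0x prefix, e.g. 0x0f01
vga_mode_decimal() {
    printf '%d\n' "$1"
}
```

For instance, the mode ID 0x0f01 becomes 3841, which would then be passed as ``vga=3841``.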
|
||||
|
||||
Menu
|
||||
~~~~
|
||||
|
||||
The ASK_VGA mode causes the kernel to offer a video mode menu upon
|
||||
bootup. It displays a "Press <RETURN> to see video modes available, <SPACE>
|
||||
to continue or wait 30 secs" message. If you press <RETURN>, you enter the
|
||||
menu; if you press <SPACE> or wait 30 seconds, the kernel will boot up in
|
||||
the standard 80x25 mode.
|
||||
|
||||
The menu looks like::
|
||||
|
||||
Video adapter: <name-of-detected-video-adapter>
|
||||
Mode: COLSxROWS:
|
||||
0 0F00 80x25
|
||||
1 0F01 80x50
|
||||
2 0F02 80x43
|
||||
3 0F03 80x28
|
||||
....
|
||||
Enter mode number or ``scan``: <flashing-cursor-here>
|
||||
|
||||
<name-of-detected-video-adapter> tells which video adapter Linux detected
|
||||
-- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA
|
||||
with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection
|
||||
of chipsets is turned off by default as it's inherently unreliable due to
|
||||
absolutely insane PC design.
|
||||
|
||||
"0 0F00 80x25" means that the first menu item (the menu items are numbered
|
||||
from "0" to "9" and from "a" to "z") is a 80x25 mode with ID=0x0f00 (see the
|
||||
next section for a description of mode IDs).
|
||||
|
||||
<flashing-cursor-here> encourages you to enter the item number or mode ID
|
||||
you wish to set and press <RETURN>. If the computer complains about
|
||||
"Unknown mode ID", it is trying to tell you that it isn't possible to set such
|
||||
a mode. It's also possible to press only <RETURN> which leaves the current mode.
|
||||
|
||||
The mode list usually contains a few basic modes and some VESA modes. In
|
||||
case your chipset has been detected, some chipset-specific modes are shown as
|
||||
well (some of these might be missing or unusable on your machine as different
|
||||
BIOSes are often shipped with the same card and the mode numbers depend purely
|
||||
on the VGA BIOS).
|
||||
|
||||
The modes displayed on the menu are partially sorted: The list starts with
|
||||
the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and
|
||||
80x43), local modes (if the local modes feature is enabled), VESA modes and
|
||||
finally SVGA modes for the auto-detected adapter.
|
||||
|
||||
If you are not happy with the mode list offered (e.g., if you think your card
|
||||
is able to do more), you can enter "scan" instead of item number / mode ID. The
|
||||
program will try to ask the BIOS for all possible video mode numbers and test
|
||||
what happens then. The screen will probably flash wildly for some time and
|
||||
strange noises will be heard from inside the monitor and so on and then, really
|
||||
all consistent video modes supported by your BIOS will appear (plus maybe some
|
||||
``ghost modes``). If you are afraid this could damage your monitor, don't use
|
||||
this function.
|
||||
|
||||
After scanning, the mode ordering is a bit different: the auto-detected SVGA
|
||||
modes are not listed at all and the modes revealed by ``scan`` are shown before
|
||||
all VESA modes.
|
||||
|
||||
Mode IDs
|
||||
~~~~~~~~
|
||||
|
||||
Because of the complexity of all the video stuff, the video mode IDs
|
||||
used here are also a bit complex. A video mode ID is a 16-bit number usually
|
||||
expressed in a hexadecimal notation (starting with "0x"). You can set a mode
|
||||
by entering its ID directly if you know it, even if it isn't shown on the menu.
|
||||
|
||||
The ID numbers can be divided into the following regions::
|
||||
|
||||
0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use
    outside the menu as this can change from boot to boot (especially if you
    have used the ``scan`` feature).

0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number
    (as presented to INT 10, function 00) increased by 0x0100.

0x0200 to 0x08ff - VESA BIOS modes. The ID is a VESA mode ID increased by
    0x0100. All VESA modes should be autodetected and shown on the menu.

0x0900 to 0x09ff - Video7 special modes. Set by calling INT 0x10, AX=0x6f05.
    (Usually 940=80x43, 941=132x25, 942=132x44, 943=80x60, 944=100x60,
    945=132x28 for the standard Video7 BIOS)

0x0f00 to 0x0fff - special modes (they are set by various tricks -- usually
    by modifying one of the standard modes). Currently available:

    ====== =============================================================
    0x0f00 standard 80x25, don't reset mode if already set (=FFFF)
    0x0f01 standard with 8-point font: 80x43 on EGA, 80x50 on VGA
    0x0f02 VGA 80x43 (VGA switched to 350 scanlines with a 8-point font)
    0x0f03 VGA 80x28 (standard VGA scans, but 14-point font)
    0x0f04 leave current video mode
    0x0f05 VGA 80x30 (480 scans, 16-point font)
    0x0f06 VGA 80x34 (480 scans, 14-point font)
    0x0f07 VGA 80x60 (480 scans, 8-point font)
    0x0f08 Graphics hack (see the VIDEO_GFX_HACK paragraph below)
    ====== =============================================================

0x1000 to 0x7fff - modes specified by resolution. The code has a "0xRRCC"
    form where RR is a number of rows and CC is a number of columns.
    E.g., 0x1950 corresponds to a 80x25 mode, 0x2b84 to 132x43 etc.
    This is the only fully portable way to refer to a non-standard mode,
    but it relies on the mode being found and displayed on the menu
    (remember that mode scanning is not done automatically).

0xff00 to 0xffff - aliases for backward compatibility:

    ====== =============================================================
    0xffff equivalent to 0x0f00 (standard 80x25)
    0xfffe equivalent to 0x0f01 (EGA 80x43 or VGA 80x50)
    ====== =============================================================

If you add 0x8000 to the mode ID, the program will try to recalculate
vertical display timing according to mode parameters, which can be used to
eliminate some annoying bugs of certain VGA BIOSes (usually those used for
cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the
end of the display.
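
As a small worked example of the arithmetic, the flag can be added with a bitwise OR. This is only a sketch: ``RECALC_FLAG`` and ``with_recalc`` are hypothetical names, and the helper deliberately rejects IDs that already have bit 15 set (such as the 0xffxx aliases), since the text does not say how those combine with the flag.

```python
RECALC_FLAG = 0x8000  # value taken from the paragraph above; the name is made up


def with_recalc(mode_id):
    """Return mode_id with the vertical-timing recalculation flag added."""
    if not 0 <= mode_id < RECALC_FLAG:
        raise ValueError("expected a mode ID below 0x8000")
    return mode_id | RECALC_FLAG


print(hex(with_recalc(0x0F01)))  # 0x8f01
```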

Options
~~~~~~~

Build options for arch/x86/boot/* are selected by the kernel kconfig
utility and the kernel .config file.

VIDEO_GFX_HACK - includes a special hack for setting graphics modes
to be used later by special drivers.
It allows setting _any_ BIOS mode, including graphics ones, and forcing a
specific text screen resolution instead of reading it from BIOS variables.
Don't use it unless you think you know what you're doing. To activate this
setup, use mode number 0x0f08 (see the Mode IDs section above).

Still doesn't work?
~~~~~~~~~~~~~~~~~~~

When the mode detection doesn't work (e.g., the mode list is incorrect or
the machine hangs instead of displaying the menu), try to switch off some of
the configuration options listed under "Options". If that fails, you can still
use your kernel with the video mode set directly via the kernel parameter.

In either case, please send me a bug report containing what _exactly_
happens and how the configuration switches affect the behaviour of the bug.

If you start Linux from M$-DOS, you might also use some DOS tools for
video mode setting. In this case, you must specify the 0x0f04 mode ("leave
current settings") to Linux, because if you don't and you use any non-standard
mode, Linux will switch to 80x25 automatically.

If you set some extended mode and there are one or more extra lines at the
bottom of the display containing already scrolled-out text, your VGA BIOS
contains the most common video BIOS bug, called "incorrect vertical display
end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately,
this must be done manually -- no autodetection mechanisms are available.

History
~~~~~~~

=============== ================================================================
1.0 (??-Nov-95) First version supporting all adapters supported by the old
                setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels
                and then removed due to instability on some machines.
2.0 (28-Jan-96) Rewritten from scratch. Cirrus Logic 64XX support added, almost
                everything is configurable, the VESA support should be much more
                stable, explicit mode numbering allowed, "scan" implemented etc.
2.1 (30-Jan-96) VESA modes moved to 0x200-0x3ff. Mode selection by resolution
                supported. Few bugs fixed. VESA modes are listed prior to
                modes supplied by SVGA autodetection as they are more reliable.
                CLGD autodetect works better. Doesn't depend on 80x25 being
                active when started. Scanning fixed. 80x43 (any VGA) added.
                Code cleaned up.
2.2 (01-Feb-96) EGA 80x43 fixed. VESA extended to 0x200-0x4ff (non-standard 02XX
                VESA modes work now). Display end bug workaround supported.
                Special modes renumbered to allow adding of the "recalculate"
                flag, 0xffff and 0xfffe became aliases instead of real IDs.
                Screen contents retained during mode changes.
2.3 (15-Mar-96) Changed to work with 1.3.74 kernel.
2.4 (18-Mar-96) Added patches by Hans Lermen fixing a memory overwrite problem
                with some boot loaders. Memory management rewritten to reflect
                these changes. Unfortunately, screen contents retaining works
                only with some loaders now.
                Added a Tseng 132x60 mode.
2.5 (19-Mar-96) Fixed a VESA mode scanning bug introduced in 2.4.
2.6 (25-Mar-96) Some VESA BIOS errors not reported -- it fixes error reports on
                several cards with broken VESA code (e.g., ATI VGA).
2.7 (09-Apr-96) - Accepted all VESA modes in range 0x100 to 0x7ff, because some
                  cards use very strange mode numbers.
                - Added Realtek VGA modes (thanks to Gonzalo Tornaria).
                - Hardware testing order slightly changed, tests based on ROM
                  contents done as first.
                - Added support for special Video7 mode switching functions
                  (thanks to Tom Vander Aa).
                - Added 480-scanline modes (especially useful for notebooks,
                  original version written by hhanemaa@cs.ruu.nl, patched by
                  Jeff Chua, rewritten by me).
                - Screen store/restore fixed.
2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA.
                - Better recognition of text modes during mode scan.
2.9 (12-May-96) - Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!)
2.10(11-Nov-96) - The whole thing made optional.
                - Added the CONFIG_VIDEO_400_HACK switch.
                - Added the CONFIG_VIDEO_GFX_HACK switch.
                - Code cleanup.
2.11(03-May-97) - Yet another cleanup, now including also the documentation.
                - Direct testing of SVGA adapters turned off by default, ``scan``
                  offered explicitly on the prompt line.
                - Removed the doc section describing adding of new probing
                  functions as I try to get rid of _all_ hardware probing here.
2.12(25-May-98) Added support for VESA frame buffer graphics.
2.13(14-May-99) Minor documentation fixes.
=============== ================================================================

@@ -327,7 +327,7 @@ when a hard lockup is detected.
0 - don't panic on hard lockup
1 - panic on hard lockup

-See Documentation/lockup-watchdogs.txt for more information. This can
+See Documentation/admin-guide/lockup-watchdogs.rst for more information. This can
also be set using the nmi_watchdog kernel parameter.

34
Documentation/admin-guide/video-output.rst
Normal file
@@ -0,0 +1,34 @@
Video Output Switcher Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2006 luming.yu@intel.com

The output sysfs class driver provides an abstract video output layer that
can be used to hook platform-specific methods to enable/disable a video output
device through a common sysfs interface. For example, on my IBM ThinkPad T42
laptop, the ACPI video driver registered its output devices and the read/write
method for 'state' with the output sysfs class. The user interface under sysfs is::

    linux:/sys/class/video_output # tree .
    .
    |-- CRT0
    |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
    |   |-- state
    |   |-- subsystem -> ../../../class/video_output
    |   `-- uevent
    |-- DVI0
    |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
    |   |-- state
    |   |-- subsystem -> ../../../class/video_output
    |   `-- uevent
    |-- LCD0
    |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
    |   |-- state
    |   |-- subsystem -> ../../../class/video_output
    |   `-- uevent
    `-- TV0
        |-- device -> ../../../devices/pci0000:00/0000:00:01.0
        |-- state
        |-- subsystem -> ../../../class/video_output
        `-- uevent
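
The layout above suggests that each output can be inspected by reading its ``state`` attribute. The following is a minimal sketch of that idea, not part of the driver: the function name ``read_output_states`` is hypothetical, the values are returned as raw text because their meaning is not described here, and an empty result is returned on systems where the class directory does not exist.

```python
from pathlib import Path


def read_output_states(class_dir="/sys/class/video_output"):
    """Map each output device name (e.g. CRT0) to the text in its 'state' file."""
    states = {}
    root = Path(class_dir)
    if not root.is_dir():
        return states  # class not present on this system
    for entry in sorted(root.iterdir()):
        state_file = entry / "state"
        try:
            states[entry.name] = state_file.read_text().strip()
        except OSError:
            continue  # no readable 'state' attribute
    return states


if __name__ == "__main__":
    for name, state in read_output_states().items():
        print(f"{name}: {state}")
```

Taking the directory as a parameter keeps the sketch testable against a scratch directory instead of the real sysfs tree.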