12345678910111213141516171819202122232425262728293031323334353637 |
- What: /sys/devices/system/machinecheck/machinecheckX/tolerant
- Contact: Borislav Petkov <[email protected]>
- Date: Dec, 2021
- Description:
- Unused and obsolete after the advent of recoverable machine
- checks (see last sentence below) and those are present since
- 2010 (Nehalem).
- Original description:
- The entries appear for each CPU, but they are truly shared
- between all CPUs.
- Tolerance level. When a machine check exception occurs for a
- non corrected machine check the kernel can take different
- actions.
- Since machine check exceptions can happen any time it is
- sometimes risky for the kernel to kill a process because it
- defies normal kernel locking rules. The tolerance level
- configures how hard the kernel tries to recover even at some
- risk of deadlock. Higher tolerant values trade potentially
- better uptime with the risk of a crash or even corruption
- (for tolerant >= 3).
- == ===========================================================
- 0 always panic on uncorrected errors, log corrected errors
- 1 panic or SIGBUS on uncorrected errors, log corrected errors
- 2 SIGBUS or log uncorrected errors, log corrected errors
- 3 never panic or SIGBUS, log all errors (for testing only)
- == ===========================================================
- Default: 1
- Note this only makes a difference if the CPU allows recovery
- from a machine check exception. Current x86 CPUs generally
- do not.
|