drm/amdgpu: Improve RAS documentation (v2)

Clarify some areas, clean up formatting, add section for
unrecoverable error handling.

v2: fix grammatical errors

Reviewed-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Alex Deucher
2019-10-30 14:40:09 -04:00
parent 7158ca8476
commit ef177d11d6
2 changed files with 68 additions and 7 deletions

View File

@@ -82,12 +82,21 @@ AMDGPU XGMI Support
AMDGPU RAS Support
==================
The AMDGPU RAS interfaces are exposed via sysfs (for informational queries) and
debugfs (for error injection).
RAS debugfs/sysfs Control and Error Injection Interfaces
--------------------------------------------------------
.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
:doc: AMDGPU RAS debugfs control interface
RAS Reboot Behavior for Unrecoverable Errors
--------------------------------------------------------
.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
:doc: AMDGPU RAS Reboot Behavior for Unrecoverable Errors
RAS Error Count sysfs Interface
-------------------------------
@@ -109,6 +118,32 @@ RAS VRAM Bad Pages sysfs Interface
.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
:internal:
Sample Code
-----------
Sample code for testing error injection can be found here:
https://cgit.freedesktop.org/mesa/drm/tree/tests/amdgpu/ras_tests.c
This is part of the libdrm amdgpu unit tests which cover several areas of the GPU.
There are four sets of tests:
RAS Basic Test
The test verifies the RAS feature enabled status and makes sure the necessary sysfs and debugfs files
are present.
RAS Query Test
This test checks the RAS availability and enablement status for each supported IP block as well as
the error counts.
RAS Inject Test
This test injects errors for each IP.
RAS Disable Test
This test tests disabling of RAS features for each IP block.
GPU Power/Thermal Controls and Monitoring
=========================================