On gen7 and gen7.5 devices, there could be leftover data residuals in
EU/L3 from the retiring context. This patch introduces workaround to clear
that residual contexts, by submitting a batch buffer with dedicated HW
context to the GPU with ring allocation for each context switching.
This security mitigation changes does not triggers any performance
regression. Performance is on par with current drm-tips.
v2: Add igt generated header file for CB kernel assembled with Mesa tool
and addressed use of Kernel macro for ptr_align comment.
v3: Resolve Sparse warnings with newly generated, and imported CB
kernel.
v4: Include new igt generated CB kernel for gen7 and gen7.5. Also
add code formatting and compiler warnings changes (Chris Wilson)
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>
Cc: Bloomfield Jon <jon.bloomfield@intel.com>
Cc: Dutt Sudeep <sudeep.dutt@intel.com>
Acked-by: Chris Wilson <chris@chris-wilso.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306000957.2836150-2-chris@chris-wilson.co.uk
This patch adds framework to submit an arbitrary batchbuffer on each
context switch to clear residual state for render engine on Gen7/7.5
devices.
The idea of always emitting the context and vm setup around each request
is primary to make reset recovery easy, and not require rewriting the
ringbuffer. As each request would set up its own context, leaving it to
the HW to notice and elide no-op context switches, we could restart the
ring at any point, and reorder the requests freely.
However, to avoid emitting clear_residuals() between consecutive requests
in the ringbuffer of the same context, we do want to track the current
context in the ring. In doing so, we need to be careful to only record a
context switch when we are sure the next request will be emitted.
This security mitigation change does not trigger any performance
regression. Performance is on par with current mainline/drm-tip.
v2: Update vm_alias params to point to correct address space "vm" due to
changes made in the patch "f21613797bae98773"
v3-v4: none
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Balestrieri Francesco <francesco.balestrieri@intel.com>
Cc: Bloomfield Jon <jon.bloomfield@intel.com>
Cc: Dutt Sudeep <sudeep.dutt@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200306000957.2836150-1-chris@chris-wilson.co.uk
Without this, we get a couple of warnings when CONFIG_PM
is disabled:
drivers/gpu/drm/arm/display/komeda/komeda_drv.c:156:12: error: 'komeda_rt_pm_resume' defined but not used [-Werror=unused-function]
static int komeda_rt_pm_resume(struct device *dev)
^~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/arm/display/komeda/komeda_drv.c:149:12: error: 'komeda_rt_pm_suspend' defined but not used [-Werror=unused-function]
static int komeda_rt_pm_suspend(struct device *dev)
^~~~~~~~~~~~~~~~~~~~
Fixes: efb4650885 ("drm/komeda: Add runtime_pm support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: james qian wang (Arm Technology China) <james.qian.wang@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107215327.1579195-1-arnd@arndb.de
The emulated vbt doesn't tell its size correctly. According to the
intel_vbt_defs.h, vbt_header.vbt_size should the size of VBT (VBT Header,
BDB Header and data blocks), and bdb_header.bdb_size should be the size
of BDB (BDB Header and data blocks).
This patch fixes the issue and lets vbt provided by GVT-g pass the guest
i915's sanity test.
v2: refine the commit message. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200305131600.29640-1-tina.zhang@intel.com
This interface is for dGPU Navi1x. Linux dc-pplib interface depends
on window driver dc implementation.
For Navi1x, clock settings of dcn watermarks are fixed. the settings
should be passed to smu during boot up and resume from s3.
boot up: dc calculate dcn watermark clock settings within dc_create,
dcn20_resource_construct, then call pplib functions below to pass
the settings to smu:
smu_set_watermarks_for_clock_ranges
smu_set_watermarks_table
navi10_set_watermarks_table
smu_write_watermarks_table
For Renoir, clock settings of dcn watermark are also fixed values.
dc has implemented different flow for window driver:
dc_hardware_init / dc_set_power_state
dcn10_init_hw
notify_wm_ranges
set_wm_ranges
For Linux
smu_set_watermarks_for_clock_ranges
renoir_set_watermarks_table
smu_write_watermarks_table
dc_hardware_init -> amdgpu_dm_init
dc_set_power_state --> dm_resume
therefore, linux dc-pplib interface of navi10/12/14 is different
from that of Renoir.
v2: add missing unlock in error case
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fix will handle some MP1 FW issue like as mclk dpm table in renoir has a reverse
dpm clock layout and a zero frequency dpm level as following case.
cat pp_dpm_mclk
0: 1200Mhz
1: 1200Mhz
2: 800Mhz
3: 0Mhz
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
Swath sizes are being calculated incorrectly. The horizontal swath size
should be the product of block height, viewport width, and bytes per
element, but the calculation uses viewport height instead of width. The
vertical swath size is similarly incorrectly calculated. The effect of
this is that we report the wrong DCC caps.
[How]
Use viewport width in the horizontal swath size calculation and viewport
height in the vertical swath size calculation.
Signed-off-by: Josip Pavic <Josip.Pavic@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently we're comparing the watermarks between the old and new states
before we've fully computed the new watermarks. In particular
skl_build_pipe_wm() will not account for the amount of ddb space we'll
have. That information is only available during skl_compute_ddb()
which will proceed to zero out any watermark level exceeding the
ddb allocation. If we're short on ddb space this will end up
adding the plane to the state due erronously determining that the
watermarks have changed. Fix the problem by deferring
skl_wm_add_affected_planes() until we have the final watermarks
computed.
Noticed this when trying enable transition watermarks on glk.
We now computed the trans_wm as 28, but we only had 14 blocks
of ddb, and thus skl_compute_ddb() ended up disabling the cursor
trans_wm every time. Thus we ended up adding the cursor to every
commit that didn't actually affect the cursor at all.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200228203552.30273-2-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
The hardware never sees the uv_wm values (apart from
uv_wm.min_ddb_alloc affecting the ddb allocation). Thus there
is no point in comparing uv_wm to determine if we need to
reprogram the watermark registers. So let's check only the
rgb/y watermark in skl_plane_wm_equals(). But let's leave
a comment behind so that the next person reading this doesn't
get as confused as I did when I added this check.
If the ddb allocation ends up changing due to uv_wm
skl_ddb_add_affected_planes() takes care of adding the plane
to the state.
TODO: we should perhaps just eliminate uv_wm from the state
and simply track the min_ddb_alloc for uv instead.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200228203552.30273-1-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The wait_for macro's for Broadcom V3D driver used msleep, which is
inappropriate due to its inaccuracy at low values (minimum wait time
is about 30ms on the Raspberry Pi). This sleep was triggering in
v3d_clean_caches(), causing us to only be able to dispatch ~33 compute
jobs per second.
This patch replaces the macro with the one from the Intel i915 version
which uses usleep_range to provide more accurate waits.
v2: Split from the vc4 patch so that we can confidently apply to
stable (by anholt)
Signed-off-by: James Hughes <james.hughes@raspberrypi.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20200217153145.13780-1-james.hughes@raspberrypi.com
Link: https://github.com/raspberrypi/linux/issues/3460
Fixes: 57692c94dc ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
The wait_for macro's for Broadcom VC4 driver used msleep, which is
inappropriate due to its inaccuracy at low values (minimum wait time
is about 30ms on the Raspberry Pi). This sleep was triggering in
v3d_clean_caches(), causing us to only be able to dispatch ~33 compute
jobs per second.
This patch replaces the macro with the one from the Intel i915 version
which uses usleep_range to provide more accurate waits.
v2: Split from the v3d patch in case this tickles modesetting bugs (by
anholt)
Signed-off-by: James Hughes <james.hughes@raspberrypi.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20200217153145.13780-1-james.hughes@raspberrypi.com
Clang warns:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:147:31: warning:
address of 'pipe_ctx->plane_res' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (!pipe_ctx || !&pipe_ctx->plane_res || !&pipe_ctx->stream_res)
~ ~~~~~~~~~~^~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_psr.c:147:56: warning:
address of 'pipe_ctx->stream_res' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (!pipe_ctx || !&pipe_ctx->plane_res || !&pipe_ctx->stream_res)
~ ~~~~~~~~~~^~~~~~~~~~
2 warnings generated.
As long as pipe_ctx is not NULL, the address of members in this struct
cannot be NULL, which means these checks will always evaluate to false.
Fixes: 4c1a1335df ("drm/amd/display: Driverside changes to support PSR in DMCUB")
Link: https://github.com/ClangBuiltLinux/linux/issues/915
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This interface is for dGPU Navi1x. Linux dc-pplib interface depends
on window driver dc implementation.
For Navi1x, clock settings of dcn watermarks are fixed. the settings
should be passed to smu during boot up and resume from s3.
boot up: dc calculate dcn watermark clock settings within dc_create,
dcn20_resource_construct, then call pplib functions below to pass
the settings to smu:
smu_set_watermarks_for_clock_ranges
smu_set_watermarks_table
navi10_set_watermarks_table
smu_write_watermarks_table
For Renoir, clock settings of dcn watermark are also fixed values.
dc has implemented different flow for window driver:
dc_hardware_init / dc_set_power_state
dcn10_init_hw
notify_wm_ranges
set_wm_ranges
For Linux
smu_set_watermarks_for_clock_ranges
renoir_set_watermarks_table
smu_write_watermarks_table
dc_hardware_init -> amdgpu_dm_init
dc_set_power_state --> dm_resume
therefore, linux dc-pplib interface of navi10/12/14 is different
from that of Renoir.
v2: add missing unlock in error case
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The TDR will be randomly failed due to compute ring
test failure. If the compute ring wptr & 0x7ff(ring_buf_mask)
is 0x100 then after map mqd the compute ring rptr will be
synced with 0x100. And the ring test packet size is also 0x100.
Then after invocation of amdgpu_ring_commit, the cp will not
really handle the packet on the ring buffer because rptr is equal to wptr.
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Starts USBC PD FW download and reads back the latest FW version.
v2:
Move sysfs file creation to late init
Add locking around PSP calls to avoid concurrent access to PSP's C2P registers
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Navi12 VK CTS subtest timestamp.calibrated.dev_domain_test failed
because mmRLC_CAPTURE_GPU_CLOCK_COUNT register cannot be
written in VF due to security policy.
Solution: use a VF-accessible timestamp register pair
mmGOLDEN_TSC_COUNT_LOWER/UPPER for SRIOV case.
v2: according to Deucher Alexander's advice, switch to
mmGOLDEN_TSC_COUNT_LOWER/UPPER for both bare metal and SRIOV.
Signed-off-by: jianzh <Jiange.Zhao@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When hit COMBINATIONAL_BYPASS the mclk will be bypass and can export
fclk frequency to user usage.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fix will handle some MP1 FW issue like as mclk dpm table in renoir has a reverse
dpm clock layout and a zero frequency dpm level as following case.
cat pp_dpm_mclk
0: 1200Mhz
1: 1200Mhz
2: 800Mhz
3: 0Mhz
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With new L1 policy, some regs are blocked at guest and they are
programed at host side. So skip programing the regs under sriov.
the regs are:
GCMC_VM_FB_LOCATION_TOP
GCMC_VM_FB_LOCATION_BASE
MMMC_VM_FB_LOCATION_TOP
MMMC_VM_FB_LOCATION_BASE
GCMC_VM_SYSTEM_APERTURE_HIGH_ADDR
GCMC_VM_SYSTEM_APERTURE_LOW_ADDR
MMMC_VM_SYSTEM_APERTURE_HIGH_ADDR
MMMC_VM_SYSTEM_APERTURE_LOW_ADDR
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI
GCMC_VM_AGP_TOP
GCMC_VM_AGP_BOT
GCMC_VM_AGP_BASE
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>