what:
some os preemption path is messed up with world switch preemption
fix:
cleanup those logics so os preemption not mixed with world switch
this patch is a general fix for issues comes from SRIOV MCBP, but
there is still UMD side issues not resovlved yet, so this patch
cannot fix all world switch bug.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1)
we shouldn't load PSP kdb and sys/sos for VF, they are
supposed to be handled by hypervisor
2)
ih reroute doesn't work on VF thus we should avoid calling
it, besides VF should not use those PSP register sets for PF
3)
shouldn't load SMU ucode under SRIOV, otherwise PSP would report
error
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the existence of the workqueue itself to determine when to
enable HDCP features rather than sprinkling asic checks all over
the code. Also add a check for the existence of the hdcp
workqueue in the irq handling on the off chance we get and HPD
RX interrupt with the CP bit set. This avoids a crash if
the driver doesn't support HDCP for a particular asic.
Fixes: 96a3b32e67 ("drm/amd/display: only enable HDCP for DCN+")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206519
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the firmware versions of arcturus are different from other gfx9
ASICs. And the warning("CP firmware version too old, please update!")
caused by this check can be eliminated.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When NBIO's RAS error happens, before trigging GPU reset, it's needed
to record error counter information, which can correct the error counter
value missed issue when reading from debugfs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once sync flood interrupt is triggered by RAS error, before
actual GPU recovery job, it's necessary to log on and print
non-zero error counter, this will help user knows where the
RAS error source is from quickly.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clang warns:
../drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:980:36:
warning: address of 'sink->edid_caps.panel_patch.skip_scdc_overwrite'
will always evaluate to 'true' [-Wpointer-bool-conversion]
if (&sink->edid_caps.panel_patch.skip_scdc_overwrite)
~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
1 warning generated.
This is probably not what was intended so remove the address of
operator, which matches how skip_scdc_overwrite is handled in the rest
of the driver.
While we're here, drop an extra newline after this if block.
Fixes: a760fc1bff ("drm/amd/display: add monitor patch to disable SCDC read/write")
Link: https://github.com/ClangBuiltLinux/linux/issues/879
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lyude needs some patches in 5.6-rc2 and we didn't bring drm-misc-next
forward yet, so it looks like a good occasion.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Since 'smu_feature_is_enabled(smu, SMU_FEATURE_BACO_BIT)' will always return
false considering the 'smu_system_features_control(smu, false)' disabled
all SMU features.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return a negative error code to the user.
Fixes: 030d5b97a5 ("drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we readback all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise we readback all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's 25 Mhz (refclk / 4). This fixes the interpretation
of the rlc clock counter.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In certain situations the message could be reported dozens-to-hundreds of
times, based on how often the function is called.
E.g. If MCLK DPM, any calls to get/set MCLK will result in a failure
message, potentially flooding dmesg. Ratelimit the warnings to avoid
this flood.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
HDCP DTM needs to be aware of the upto date display topology
information in order to validate hardware consistency.
[how]
update HDCP DTM on update_stream_config call.
Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
According to the specs when bksv or ksv list fails SRM check,
HDCP TX should abort hdcp immediately.
However with the current code HDCP will be reattampt upto 4 times.
[how]
Add the logic that stop HDCP retry if bksv or ksv list
is revoked.
Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
PSP added a new return code for revoked receivers (SRM). We need to
handle that so we don't retry hdcp
This is already being handled on windows
[How]
Add the enums to psp interface header and handle them.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Seems to work reliably on VI+ except for a few so enable runpm barring
those where baco for runtime power management is not supported.
[rajneesh] Picked https://patchwork.freedesktop.org/patch/335402/ to
enable runtime pm with baco for kfd. Also fixed a checkpatch warning and
added extra checks for VEGA20 and ARCTURUS.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.
However, in an ideal case, when gpu is runtime suspended the kfd driver
should be able to:
- auto resume amdgpu driver whenever a client requests compute service
- prevent runtime suspend for amdgpu while kfd is in use
This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During system suspend the kfd driver aquires a lock that prohibits
further kfd actions unless the gpu is resumed. This adds some info which
can be useful while debugging.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Programming DPPCLK to the same value currently set may cause
underflow while playing video in certain conditions.
[HOW]
Only program DPPCLK if clock is not the same as the
previous value programmed.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In DCN hardware sequencer we do actually call ATOM_INIT correctly per
pipe. The workaround is not necessary for command table offloading.
[How]
Drop the workaround since it's not needed.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix warning during switching to dpg pause mode for
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
GDS clear workaround will cause gfx failure in suspend/resume case.
[ 98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[ 98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110
As this workaround is specific to the HW bug of GDS's ECC error
existing in cold boot up, so bypass this workaround in suspend/
resume case after booting up.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
hwc->conf was designated specifically for AMD APU IOMMU purposes. This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere. Also hwc->conf and hwc->config are
different members of an anonymous union so hwc->conf aliases as
hw->last_tag.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>