This is amdkfd pull for 4.18. The major new features are:
- Add support for GFXv9 dGPUs (VEGA)
- Add support for userptr memory mapping
In addition, there are a couple of small fixes and improvements, such as:
- Fix lock handling
- Fix rollback packet in kernel kfd_queue
- Optimize kfd signal handling
- Fix CP hang in APU
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180514070126.GA1827@odedg-x270
Allocating up to 32 physically contiguous pages can easily fail (and has
failed for me), and isn't necessary anyway.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Fix a hang on CZ boards with EDC enabled
- Fix hangs related to DP MST handling
- Fix a deadlock in irq handling in DC
* 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/display: Check dc_sink every time in MST hotplug
drm/amd/display: Update MST edid property every time
drm/amd/display: Don't read EDID in atomic_check
drm/amd/display: Disallow enabling CRTC without primary plane with FB
drm/amd/display: Fix deadlock when flushing irq
drm/amdgpu: set COMPUTE_PGM_RSRC1 for SGPR/VGPR clearing shaders
Extended fix to: "Don't read EDID in atomic_check"
Fix issue of missing dc_sink in .mode_valid in hot plug routine.
Need to check dc_sink everytime in .get_modes hook after checking
edid, since edid is not getting removed in hot unplug but dc_sink
doesn't.
Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Extended fix to: "Don't read EDID in atomic_check"
Fix display property not observed in GUI display after hot plug.
Call drm_mode_connector_update_edid_property every time in
.get_modes hook, due to the fact that edid property is getting
removed from usermode ioctl DRM_IOCTL_MODE_GETCONNECTOR each time
in hot unplug.
Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
We shouldn't attempt to read EDID in atomic_check. We really shouldn't
even be modifying the connector object, or any other non-state object,
but this is a start at least.
Moving EDID cleanup to dm_dp_mst_connector_destroy from
dm_dp_destroy_mst_connector to ensure the EDID is still available for
headless mode.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The below commit
"drm/atomic: Try to preserve the crtc enabled state in drm_atomic_remove_fb, v2"
introduces a slight behavioral change to rmfb. Instead of disabling a crtc
when the primary plane is disabled, it now preserves it.
Since DC is currently not equipped to handle this we need to fail such
a commit, otherwise we might see a corrupted screen.
This is based on Shirish's previous approach but avoids adding all
planes to the new atomic state which leads to a full update in DC for
any commit, and is not what we intend.
Theoretically DM should be able to deal with states with fully populated planes,
even for simple updates, such as cursor updates. This should still be
addressed in the future.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise, the SQ may skip some of the register writes, or shader waves may
be allocated where we don't expect them, so that as a result we don't actually
reset all of the register SRAMs. This can lead to spurious ECC errors later on
if a shader uses an uninitialized register.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
When CONFIG_MMU_NOTIFIER is not enabled, struct mmu_notifier has an
incomplete type definition, which causes build errors.
../drivers/gpu/drm/amd/amdkfd/kfd_priv.h:607:22: error: field 'mmu_notifier' has incomplete type
../include/linux/kernel.h:979:32: error: dereferencing pointer to incomplete type
../include/linux/kernel.h:980:18: error: dereferencing pointer to incomplete type
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:434:2: error: implicit declaration of function 'mmu_notifier_unregister_no_release' [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:435:2: error: implicit declaration of function 'mmu_notifier_call_srcu' [-Werror=implicit-function-declaration]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:438:21: error: variable 'kfd_process_mmu_notifier_ops' has initializer but incomplete type
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: error: unknown field 'release' specified in initializer
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: warning: excess elements in struct initializer [enabled by default]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:439:2: warning: (near initialization for 'kfd_process_mmu_notifier_ops') [enabled by default]
../drivers/gpu/drm/amd/amdkfd/kfd_process.c:534:2: error: implicit declaration of function 'mmu_notifier_register' [-Werror=implicit-function-declaration]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Currently if a user requests clock counters for a node without a GPU
resource we will always return EINVAL.
Instead if no GPU resource is attached, fill the gpu_clock_counter
argument with zeroes so that we may proceed and return valid CPU
counters.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Passing NULL pointer to PTR_ERR will result in return value of 0
indicating success which is clearly not what it is intended here.
This patch returns -EINVAL instead.
v2: change ret code to -ENODEV
Fixes: 5ec7e02854 ("drm/amdkfd: Add ioctls for GPUVM memory management")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Pull drm fixes from Dave Airlie:
"Exynos, i915, vc4, amdgpu fixes.
i915:
- an oops fix
- two race fixes
- some gvt fixes
amdgpu:
- dark screen fix
- clk/voltage fix
- vega12 smu fix
vc4:
- memory leak fix
exynos just drops some code"
* tag 'drm-fixes-for-v4.17-rc2' of git://people.freedesktop.org/~airlied/linux: (23 commits)
drm/amd/powerplay: header file interface to SMU update
drm/amd/pp: Fix bug voltage can't be OD separately on VI
drm/amd/display: Don't program bypass on linear regamma LUT
drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state
drm/i915/audio: Fix audio detection issue on GLK
drm/i915: Call i915_perf_fini() on init_hw error unwind
drm/i915/bios: filter out invalid DDC pins from VBT child devices
drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6
drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value
drm/exynos: exynos_drm_fb -> drm_framebuffer
drm/exynos: Move dma_addr out of exynos_drm_fb
drm/exynos: Move GEM BOs to drm_framebuffer
drm: Fix HDCP downstream dev count read
drm/vc4: Fix memory leak during BO teardown
drm/i915/execlists: Clear user-active flag on preemption completion
drm/i915/gvt: Add drm_format_mod update
drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization
drm/i915/gvt: Delete redundant error message in fb_decode.c
drm/i915/gvt: Cancel dma map when resetting ggtt entries
drm/i915/gvt: Missed to cancel dma map for ggtt entries
...
- Fix a dark screen issue in DC
- Fix clk/voltage dependency tracking for wattman
- Update SMU interface for vega12
* 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: header file interface to SMU update
drm/amd/pp: Fix bug voltage can't be OD separately on VI
drm/amd/display: Don't program bypass on linear regamma LUT
Remove Exynos specific framebuffer structure and
relevant functions.
- it removes exynos_drm_fb structure which is a wrapper of
drm_framebuffer and unnecessary two exynos specific callback
functions, exynos_drm_destory() and exynos_drm_fb_create_handle()
because we can reuse existing drm common callback ones instead.
* tag 'exynos-drm-fixes-for-v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos: exynos_drm_fb -> drm_framebuffer
drm/exynos: Move dma_addr out of exynos_drm_fb
drm/exynos: Move GEM BOs to drm_framebuffer
drm/amdkfd: Deallocate SDMA queues correctly
drm/amdkfd: Fix scratch memory with HWS enabled
Even though this is required for degamma since DCE HW only supports a
couple predefined LUTs we can just program the LUT directly for regamma.
This fixes dark screens which occurs when we program regamma to bypass
while degamma is using srgb LUT.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only accept interrupts from KFD VMIDs. Just checking for a PASID may
not be enough because amdgpu started using PASIDs to map VM faults
to processes.
Warn if an IRQ doesn't have a valid PASID (indicating a firmware bug).
Suggested-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Suggested-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
HWS may hang in the middle of destroy queue, remove the queue from the
process queue list so it won't be freed again in the future
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The initialization is not necessary. amd-kfd-staging and ROCm
releases have worked without it for two years.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
It turns out that idr_for_each_entry is really slow compared to just
iterating over the slots. Based on measurements the difference is
estimated to be about a factor 64. That means using idr_for_each_entry
is only worth it with very few allocated events.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The problem happens on Raven and Carrizo. The context save handler
should not clear the high bits of PC_HI before extracting the bits
of IB_STS.
The bug is not relevant to VEGA10 until we enable demand paging.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Since the assembly code is inside "#if 0", it is ineffective. Despite that,
during debugging, we need to change the assembly code, extract it into
a separate file and compile the new file into hex values using sp3.
That process also requires us to remove "#if 0" and modify lines starting
with "#", so that sp3 can successfully compile the new file.
With this change, all the above chore is no longer needed, and
cwsr_trap_handler_gfx*.asm can be directly used by sp3 to generate its
hex values.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
MTYPE_NC_NV (0) marks scalar/vector L1 cache lines as non-volatile.
Cache lines loaded through these apertures are intended to be
invalidated before (and sometimes during) a dispatch. The non-volatile
qualifier prevents these cache lines from being distinguished from
those loaded through the private aperture.
Use MTYPE_NC (1) instead on both Gfx7 and Gfx8. This allows the
compiler to use the BUFFER_WBINVL1_VOL instruction and is a precursor
to automatic per-dispatch scalar/vector L1 volatile invalidation.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Synchronization between context-saving wavefronts is achieved by
sending a SAVEWAVE message to the SPI and then spin-waiting for a
response. These spin-waiting wavefronts may inhibit the progress
of other wavefronts in the context save handler, leading to the
synchronization condition never being achieved.
Before spin-waiting reduce the priority of each wavefront to
guarantee foward progress in the others.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
args->n_devices is a u32 that comes from the user. The multiplication
could overflow on 32 bit systems possibly leading to privilege
escalation.
Fixes: 5ec7e02854 ("drm/amdkfd: Add ioctls for GPUVM memory management")
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Pull drm fixes from Dave Airlie:
"One omap, and one alsa pm fix (we merged the breaking patch via drm
tree).
Otherwise it's two bunches of amdgpu fixes, removing an unneeded file,
some DC fixes, HDMI audio regression fix, and some vega12 fixes"
* tag 'drm-fixes-for-v4.17-rc1' of git://people.freedesktop.org/~airlied/linux: (27 commits)
Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"
Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"
drm/amd/display: Fix regamma not affecting full-intensity color values
drm/amd/display: Fix FBC text console corruption
drm/amd/display: Only register backlight device if embedded panel connected
drm/amd/display: fix brightness level after resume from suspend
drm/amd/display: HDMI has no sound after Panel power off/on
drm/amdgpu: add MP1 and THM hw ip base reg offset
drm/amdgpu: fix null pointer panic with direct fw loading on gpu reset
drm/radeon: add PX quirk for Asus K73TK
drm/omap: fix crash if there's no video PLL
drm/amdgpu: Fix memory leaks at amdgpu_init() error path
drm/amdgpu: Fix PCIe lane width calculation
drm/radeon: Fix PCIe lane width calculation
drm/amdgpu/si: implement get/set pcie_lanes asic callback
drm/amdgpu: Add support for SRBM selection v3
Revert "drm/amdgpu: Don't change preferred domian when fallback GTT v5"
drm/amd/powerply: fix power reading on Fiji
drm/amd/powerplay: Enable ACG SS feature
drm/amdgpu/sdma: fix mask in emit_pipeline_sync
...
Hardware understands the regamma LUT as a piecewise linear function,
with points spaced exponentially along the range. We previously
programmed the LUT for range [2^-10, 2^0). This causes (normalized)
color values of 1 (=2^0) to miss the programmed LUT, and fall onto the
end region.
For DCE, the end region is extrapolated using a single (base, slope)
pair, using the max y-value from the last point in the curve as base.
This presents a problem, since this value affects all three color
channels. Scaling down the intensity of say - the blue regamma curve -
will not affect it's end region. This is especially noticiable when
using RedShift. It scales down the blue and green channels, but leaves
full-intensity colors unshifted.
Therefore, extend the range to cover [2^-10, 2^1) by programming another
hardware segment, containing only one point. That way, we won't be
hitting the end region.
Note that things are a bit different for DCN, since the end region can
be set per-channel.
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>