To quote Felix: "For testing KV with current user mode stack, please use
amdgpu. I don't expect this to work with radeon and I'm not planning to
spend any effort on making radeon work with a current user mode stack."
Only compile tested, but should be straight forward.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In systems under heavy load the IH work may experience significant
scheduling delays.
Under load + system workqueue:
Max Latency: 7.023695 ms
Avg Latency: 0.263994 ms
Under load + high priority workqueue:
Max Latency: 1.162568 ms
Avg Latency: 0.163213 ms
Further work is required to measure the impact of per-cpu settings on IH
performance.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
We don't need to wait for all work to complete in the IH exit function.
We only need to make sure the interrupt_work has finished executing to
guarantee that ih_kfifo is no longer in use.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Replace our implementation of a lockless ring buffer with the standard
linux kernel kfifo.
We shouldn't maintain our own version of a standard data structure.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This allows increasing the KFD_SIGNAL_EVENT_LIMIT in kfd_ioctl.h
without breaking processes built with older kfd_ioctl.h versions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This speeds up signal lookup when the IH ring entry includes a
valid context ID or partial context ID. Only if the context ID is
found to be invalid, fall back to an exhaustive search of all
signaled events.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signal slots are identical to event IDs.
Replace the used_slot_bitmap and events hash table with an IDR to
allocate and lookup event IDs and signal slots more efficiently.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The first event page is always big enough to handle all events.
Handling of multiple events pages is not supported by user mode, and
not necessary.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Use standard wait queues for waiting and waking up waiting threads
instead of inventing our own. We still have our own wait loop
because the HSA event semantics require the ability to have one
thread waiting on multiple wait queues (events) at the same time.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This always identical with the index of the event_waiter in the array.
No need to store it in the waiter record.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When an event with pending waiters is destroyed, those waiters may
end up sleeping forever unless they are notified and woken up.
Implement the notification by clearing the waiter->event pointer,
which becomes invalid anyway, when the event is freed, and waking
up the waiting tasks.
Waiters on an event that's destroyed return failure.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cleaned up the code while resolving some potential bugs and
inconsistencies in the process.
Clean-ups:
* Remove enum kfd_event_wait_result, which duplicates
KFD_IOC_EVENT_RESULT definitions
* alloc_event_waiters can be called without holding p->event_mutex
* Return an error code from copy_signaled_event_data instead of bool
* Clean up error handling code paths to minimize duplication in
kfd_wait_on_events
Fixes:
* Consistently return an error code from kfd_wait_on_events and set
wait_result to KFD_IOC_WAIT_RESULT_FAIL in all failure cases.
* Always call free_waiters while holding p->event_mutex
* copy_signaled_event_data might sleep. Don't call it while the task state
is TASK_INTERRUPTIBLE.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The kfd_process doesn't own a reference to the mm_struct, so it can
disappear without warning even while the kfd_process still exists.
Therefore, avoid dereferencing the kfd_process.mm pointer and make
it opaque. Use get_task_mm to get a temporary reference to the mm
when it's needed.
v2: removed unnecessary WARN_ON
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The high part calculation of luma and chroma address' was
missing in dm_plane_helper_prepare_fb().
This fix brings uniformity in the address' at atomic_check
and atomic_commit for both RGB & YUV planes.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently the high part of the address structure is not
populated in case of luma and chroma.
This patch adds this calculation.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allocate memory for the second pipe allocate_mem_input() needs to
be done prior to program pipe front end. It shows sensitive to
Fiji. Failure to do so will cause error in allocate memory
allocate_mem_input() on the second connected display.
Signed-off-by: Jerry Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Just a few fixes for 4.15.
* 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/amdgpu: Remove workaround for suspend/resume in uvd7
drm/amdgpu: don't flush the TLB before initializing GART
drm/amdgpu: minor cleanup for amdgpu_ttm_bind
drm/amdgpu/psp: prevent page fault by checking write_frame address(v4)
drm/amd/powerplay: retrieve the real-time coreClock values
drm/amd/powerplay: fix performance drop on Vega10
drm/amd/powerplay: add one smc message for Vega10
drm/amd/powerplay: fix amd_powerplay_reset()
amdgpu: add padding to the fence to handle ioctl.
drm/amdgpu:fix wb_clear
drm/amdgpu:fix vf_error_put
drm/amdgpu/sriov:now must reinit psp
drm/amdgpu: merge bios post checking functions
- Prevent a possible buffer overflow when updating the ring buffer by
bounds checking the command frame against the available space in the
ring buffer.
v2: update the ring_buffer_end address
v3: update the commit log
v4: squash in print fix (Michel)
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Currently, the coreClock value for min/max performance level on raven
is hard-coded. Use the real-time value retrieved by GetGfxMinFreqLimit
and GetGfxMaxFreqLimit PPSMC messages
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We accidentally inverted an if statement and turned amd_powerplay_reset()
into a no-op.
Fixes: ae97988fc8 ("drm/amd/powerplay: tidy up ret checks in amd_powerplay.c (v3)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Programming sequence to frontend and backend has been switched.
In such case, program_scaler() is getting called when programming
frontend, and should be removed from backend programming routine.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For SoC's having software designed cursor plane,
should be treated differently than hardware cursor planes.
The DRM core initializes cursor plane by default with
legacy_cursor_update set.
Hence legacy_cursor_update can be use effectively
to handle software cursor planes' update and atomicity
functionalities.
This patch uses this variable to decide in the atomic_check
to whether add a requested plane to the list of affected planes or
not, hence fixing the issue of co-existence of MPO, i.e,
setting of available hardware planes like underlay and
updation of cursor planes as well.
Without this patch when underlay is set from user space,
only blank screen with backlight is visible.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>