drm/scheduler: rework job destruction

We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.

v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5:
Move sched->hw_rq_count to drm_sched_start to account for counter
decrement in drm_sched_stop even when we don't call resubmit jobs
if guily job did signal.
v6: remove unused variable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
This commit is contained in:
Christian König
2019-04-18 11:00:21 -04:00
committed by Alex Deucher
parent b3198c38f0
commit 5918045c4e
8 changed files with 102 additions and 85 deletions

View File

@@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
* @sched: the scheduler instance on which this job is scheduled.
* @s_fence: contains the fences for the scheduling of job.
* @finish_cb: the callback for the finished fence.
* @finish_work: schedules the function @drm_sched_job_finish once the job has
* finished to remove the job from the
* @drm_gpu_scheduler.ring_mirror_list.
* @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
* @id: a unique id assigned to each job scheduled on the scheduler.
* @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -188,7 +185,6 @@ struct drm_sched_job {
struct drm_gpu_scheduler *sched;
struct drm_sched_fence *s_fence;
struct dma_fence_cb finish_cb;
struct work_struct finish_work;
struct list_head node;
uint64_t id;
atomic_t karma;
@@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
void *owner);
void drm_sched_job_cleanup(struct drm_sched_job *job);
void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
void drm_sched_stop(struct drm_gpu_scheduler *sched);
void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
void drm_sched_increase_karma(struct drm_sched_job *bad);