Commit Graph

64105 Commits

Author SHA1 Message Date
Douglas Anderson
cf33de1799 drm/bridge: ti-sn65dsi86: Config number of DP lanes Mo' Betta
The driver used to say that the value to program into bridge register
0x93 was dp_lanes - 1.  Looking at the datasheet for the bridge, this
is wrong.  The data sheet says:
* 1 = 1 lane
* 2 = 2 lanes
* 3 = 4 lanes

A more proper way to express this encoding is min(dp_lanes, 3).

At the moment this change has zero effect because we've hardcoded the
number of DP lanes to 4.  ...and (4 - 1) == min(4, 3).  How fortunate!
...but soon we'll stop hardcoding the number of lanes.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218143416.v3.4.If3e2d0493e7b6e8b510ea90d8724ff760379b3ba@changeid
2020-02-13 10:21:09 +01:00
Douglas Anderson
fa8a66c687 drm/bridge: ti-sn65dsi86: Don't use MIPI variables for DP link
The ti-sn65dsi86 is a bridge from MIPI to DP and thus has two links:
the MIPI link and the DP link.  The two links do not need to have the
same format or number of lanes.  Stop using MIPI variables when
talking about the DP link.

This has zero functional change because:
* currently we are hardcoding the MIPI link as unpacked RGB888 which
  requires 24 bits and currently we are not changing the DP link rate
  from the bridge's default of 8 bits per pixel.
* currently we are hardcoding both the MIPI and DP as being 4 lanes.

This is all in prep for fixing some of the above.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218143416.v3.3.Ia6e05f4961adb0d4a0d32ba769dd7781ee8db431@changeid
2020-02-13 10:21:08 +01:00
Douglas Anderson
2f8fcc7794 drm/bridge: ti-sn65dsi86: zero is never greater than an unsigned int
When we iterate over ti_sn_bridge_dp_rate_lut, there's no reason to
start at index 0 which always contains the value 0.  0 is not a valid
link rate.

This change should have no real effect but is a small cleanup.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218143416.v3.2.Id445d0057bedcb0a190009e0706e9254c2fd48eb@changeid
2020-02-13 10:21:08 +01:00
Douglas Anderson
ca1b885cbe drm/bridge: ti-sn65dsi86: Split the setting of the dp and dsi rates
These two things were in one function.  Split into two.  This looks
like it's duplicating some code, but don't worry.  This is is just in
preparation for future changes.

This is intended to have zero functional change and will just make
future patches easier to understand.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218143416.v3.1.Icb765d5799e9651e5249c0c27627ba33a9e411cf@changeid
2020-02-13 10:21:07 +01:00
Mika Kuoppala
3873fd1a43 drm/i915: Use engine wa list for Wa_1607090982
This is in mcr range of register, thus we can only verify
it through mmio. Use engine wa list with mcr range verification
skip.

Fixes: 0db1a5f870 ("drm/i915: Implement Wa_1607090982")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200212165707.11143-1-mika.kuoppala@linux.intel.com
2020-02-13 07:25:28 +00:00
Dan Carpenter
434cbcb1bd drm/amdgpu: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return a negative error code to the user.

Fixes: 030d5b97a5 ("drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:41 -05:00
Alex Deucher
72b4c01d66 drm/amdgpu/gfx10: disable gfxoff when reading rlc clock
Otherwise we readback all ones.  Fixes rlc counter
readback while gfxoff is active.

Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:41 -05:00
Alex Deucher
e5f134958d drm/amdgpu/gfx9: disable gfxoff when reading rlc clock
Otherwise we readback all ones.  Fixes rlc counter
readback while gfxoff is active.

Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:40 -05:00
Alex Deucher
b90c4d667c drm/amdgpu/soc15: fix xclk for raven
It's 25 Mhz (refclk / 4).  This fixes the interpretation
of the rlc clock counter.

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:40 -05:00
Alex Deucher
228a10d4e1 drm/amdgpu/display move get_num_odm_splits() into dc_resource.c
It's used by more than just DCN2.0.  Fixes missing symbol when
amdgpu is built without DCN support.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:39 -05:00
Alex Deucher
cf2156e240 drm/amdgpu/display: extend DCN guards
to cover dcn2.x related headers.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:39 -05:00
Alex Deucher
09034ae43f drm/amdgpu/display: extend DCN guard in dal_bios_parser_init_cmd_tbl_helper2
To cover DCN 2.x.

Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Tested-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:39 -05:00
Kent Russell
00151afc6f drm/powerplay: Ratelimit PP_ASSERT warnings
In certain situations the message could be reported dozens-to-hundreds of
times, based on how often the function is called.
E.g. If MCLK DPM, any calls to get/set MCLK will result in a failure
message, potentially flooding dmesg. Ratelimit the warnings to avoid
this flood.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:31 -05:00
Evan Quan
79275af61e drm/amd/powerplay: always refetch the enabled features status on dpm enablement
Otherwise, the cached dpm features status may be inconsistent under some
case(e.g. baco reset of Navi asic).

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:04:18 -05:00
Bhawanpreet Lakha
c786530b21 drm/amd/display: fix dtm unloading
there was a type in the terminate command.

We should be calling psp_dtm_unload() instead of psp_hdcp_unload()

Fixes: 143f230533 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:03:22 -05:00
Bhawanpreet Lakha
4a9a4e3a7c drm/amd/display: Fix message for encryption
-msg_in is not needed for enabling encryption.
-Use hdcp2_set_encryption instead of hdcp1_enable_encryption for hdcp2.2

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:03:06 -05:00
Bhawanpreet Lakha
b215010fd3 drm/amd/display: fix backwards byte order in rx_caps.
We were using incorrect byte order after we started using the drm_defines
So fix it.

Fixes: 02837a91ae ("drm/amd/display: add and use defines from drm_hdcp.h")
Signed-off-by: JinZe.Xu <JinZe.Xu@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:01:54 -05:00
Wenjing Liu
9124ee78e3 drm/amd/display: update HDCP DTM immediately after hardware programming
[why]
HDCP DTM needs to be aware of the upto date display topology
information in order to validate hardware consistency.

[how]
update HDCP DTM on update_stream_config call.

Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:01:46 -05:00
Wenjing Liu
3744ee2c29 drm/amd/display: no hdcp retry if bksv or ksv list is revoked
[why]
According to the specs when bksv or ksv list fails SRM check,
HDCP TX should abort hdcp immediately.
However with the current code HDCP will be reattampt upto 4 times.

[how]
Add the logic that stop HDCP retry if bksv or ksv list
is revoked.

Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:01:38 -05:00
Bhawanpreet Lakha
c17f7220f5 drm/amd/display: Handle revoked receivers
[Why]
PSP added a new return code for revoked receivers (SRM). We need to
handle that so we don't retry hdcp

This is already being handled on windows

[How]
Add the enums to psp interface header and handle them.

Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:01:30 -05:00
Alex Deucher
4fdda2e66d drm/amdgpu/runpm: enable runpm on baco capable VI+ asics
Seems to work reliably on VI+ except for a few so enable runpm barring
those where baco for runtime power management is not supported.

[rajneesh] Picked https://patchwork.freedesktop.org/patch/335402/ to
enable runtime pm with baco for kfd. Also fixed a checkpatch warning and
added extra checks for VEGA20 and ARCTURUS.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:01:03 -05:00
Rajneesh Bhardwaj
9593f4d6a6 drm/amdkfd: refactor runtime pm for baco
So far the kfd driver implemented same routines for runtime and system
wide suspend and resume (s2idle or mem). During system wide suspend the
kfd aquires an atomic lock that prevents any more user processes to
create queues and interact with kfd driver and amd gpu. This mechanism
created problem when amdgpu device is runtime suspended with BACO
enabled. Any application that relies on kfd driver fails to load because
the driver reports a locked kfd device since gpu is runtime suspended.

However, in an ideal case, when gpu is runtime  suspended the kfd driver
should be able to:

 - auto resume amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu  while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:54 -05:00
Rajneesh Bhardwaj
3c1224c02e drm/amdkfd: show warning when kfd is locked
During system suspend the kfd driver aquires a lock that prohibits
further kfd actions unless the gpu is resumed. This adds some info which
can be useful while debugging.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:47 -05:00
Rajneesh Bhardwaj
70bedd68e7 drm/amdgpu: Fix missing error check in suspend
amdgpu_device_suspend might return an error code since it can be called
from both runtime and system suspend flows. Add the missing return code
in case of a failure.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-02-12 16:00:25 -05:00
Chris Wilson
c616d2387a drm/i915/gt: Expand bad CS completion event debug
Show the ring/request/context state if we see what we believe is an
early CS completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211230944.1203098-1-chris@chris-wilson.co.uk
2020-02-12 20:34:36 +00:00
Boris Brezillon
dde2bb2da0 drm/panfrost: perfcnt: Reserve/use the AS attached to the perfcnt MMU context
We need to use the AS attached to the opened FD when dumping counters.

Reported-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Fixes: 7282f7645d ("drm/panfrost: Implement per FD address spaces")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200206141327.446127-1-boris.brezillon@collabora.com
2020-02-12 14:27:29 -06:00
YueHaibing
fe154a2422 drm/panfrost: Remove set but not used variable 'bo'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/panfrost/panfrost_job.c: In function 'panfrost_job_cleanup':
drivers/gpu/drm/panfrost/panfrost_job.c:278:31: warning:
 variable 'bo' set but not used [-Wunused-but-set-variable]

commit bdefca2d8d ("drm/panfrost: Add the panfrost_gem_mapping concept")
involved this unused variable.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Alyssas Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200203152724.42611-1-yuehaibing@huawei.com
2020-02-12 13:49:55 -06:00
Stephan Gerhold
5c320b6ce7 drm/modes: Allow DRM_MODE_ROTATE_0 when applying video mode parameters
At the moment, only DRM_MODE_ROTATE_180 is allowed when we try to apply
the rotation from the video mode parameters. It is also useful to allow
DRM_MODE_ROTATE_0 in case there is only a reflect option in the video mode
parameter (e.g. video=540x960,reflect_x).

DRM_MODE_ROTATE_0 means "no rotation" and should therefore not require
any special handling, so we can just add it to the if condition.

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117153429.54700-3-stephan@gerhold.net
2020-02-12 18:32:58 +01:00
Stephan Gerhold
e6980a7271 drm/modes: Make sure to parse valid rotation value from cmdline
A rotation value should have exactly one rotation angle.
At the moment there is no validation for this when parsing video=
parameters from the command line. This causes problems later on
when we try to combine the command line rotation with the panel
orientation.

To make sure that we generate a valid rotation value:
  - Set DRM_MODE_ROTATE_0 by default (if no rotate= option is set)
  - Validate that there is exactly one rotation angle set
    (i.e. specifying the rotate= option multiple times is invalid)

Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117153429.54700-2-stephan@gerhold.net
2020-02-12 18:32:54 +01:00
Chris Wilson
2aaaa5ee1c drm/i915: Mark the removal of the i915_request from the sched.link
Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).

Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk
(cherry picked from commit b4a9a149f9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 17:04:33 +02:00
Chris Wilson
a2f90f4ff3 drm/i915/execlists: Reclaim the hanging virtual request
If we encounter a hang on a virtual engine, as we process the hang the
request may already have been moved back to the virtual engine (we are
processing the hang on the physical engine). We need to reclaim the
request from the virtual engine so that the locking is consistent and
local to the real engine on which we will hold the request for error
state capturing.

v2: Pull the reclamation into execlists_hold() and assert that cannot be
called from outside of the reset (i.e. with the tasklet disabled).
v3: Added selftest
v4: Drop the reference owned by the virtual engine

Fixes: ad18ba7b5e ("drm/i915/execlists: Offline error capture")
Testcase: igt/gem_exec_balancer/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk
(cherry picked from commit 989df3a7bd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 17:03:53 +02:00
Chris Wilson
317e0395cc drm/i915/execlists: Take a reference while capturing the guilty request
Thanks to preempt-to-busy, we leave the request on the HW as we submit
the preemption request. This means that the request may complete at any
moment as we process HW events, and in particular the request may be
retired as we are planning to capture it for a preemption timeout.

Be more careful while obtaining the request to capture after a
preemption timeout, and check to see if it completed before we were able
to put it on the on-hold list. If we do see it did complete just before
we capture the request, proclaim the preemption-timeout a false positive
and pardon the reset as we should hit an arbitration point momentarily
and so be able to process the preemption.

Note that even after we move the request to be on hold it may be retired
(as the reset to stop the HW comes after), so we do require to hold our
own reference as we work on the request for capture (and all of the
peeking at state within the request needs to be carefully protected).

Fixes: c3f1ed90e6 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk
(cherry picked from commit 4ba5c086a1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 17:02:53 +02:00
Chris Wilson
ad18ba7b5e drm/i915/execlists: Offline error capture
Currently, we skip error capture upon forced preemption. We apply forced
preemption when there is a higher priority request that should be
running but is being blocked, and we skip inline error capture so that
the preemption request is not further delayed by a user controlled
capture -- extending the denial of service.

However, preemption reset is also used for heartbeats and regular GPU
hangs. By skipping the error capture, we remove the ability to debug GPU
hangs.

In order to capture the error without delaying the preemption request
further, we can do an out-of-line capture by removing the guilty request
from the execution queue and scheduling a worker to dump that request.
When removing a request, we need to remove the entire context and all
descendants from the execution queue, so that they do not jump past.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/738
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk
(cherry picked from commit 748317386a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 16:55:58 +02:00
Chris Wilson
c3f1ed90e6 drm/i915/gt: Allow temporary suspension of inflight requests
In order to support out-of-line error capture, we need to remove the
active request from HW and put it to one side while a worker compresses
and stores all the details associated with that request. (As that
compression may take an arbitrary user-controlled amount of time, we
want to let the engine continue running on other workloads while the
hanging request is dumped.) Not only do we need to remove the active
request, but we also have to remove its context and all requests that
were dependent on it (both in flight, queued and future submission).

Finally once the capture is complete, we need to be able to resubmit the
request and its dependents and allow them to execute.

v2: Replace stack recursion with a simple list.
v3: Check all the parents, not just the first, when searching for a
stuck ancestor!

References: https://gitlab.freedesktop.org/drm/intel/issues/738
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk
(cherry picked from commit 32ff621fd7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 16:55:58 +02:00
Chris Wilson
9e2750fc80 drm/i915: Keep track of request among the scheduling lists
If we keep track of when the i915_request.sched.link is on the HW
runlist, or in the priority queue we can simplify our interactions with
the request (such as during rescheduling). This also simplifies the next
patch where we introduce a new in-between list, for requests that are
ready but neither on the run list or in the queue.

v2: Update i915_sched_node.link explanation for current usage where it
is a link on both the queue and on the runlists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
(cherry picked from commit 672c368f93)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 16:55:58 +02:00
Jani Nikula
cc3251d8ef Merge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into drm-intel-next-fixes
gvt-fixes-2020-02-12

- fix possible high-order allocation fail for late load (Igor)
- fix one missed lock for ppgtt mm LRU list (Igor)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
From: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200212065912.GB4997@zhen-hp.sh.intel.com
2020-02-12 16:50:04 +02:00
Maarten Lankhorst
74c12ee02a Merge v5.6-rc1 into drm-misc-fixes
We're based on v5.6, need v5.6-rc1 at least. :)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2020-02-12 14:08:59 +01:00
Christian König
f704ff7c3d drm/ttm: individualize resv objects before calling release_notify
This allows release_notify to add and remove fences from deleted objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Link: https://patchwork.freedesktop.org/patch/352750/
2020-02-12 13:03:56 +01:00
Christian König
519c2de003 drm/ttm: replace dma_resv object on deleted BOs v3
When non-imported BOs are resurrected for delayed delete we replace
the dma_resv object to allow for easy reclaiming of the resources.

v2: move that to ttm_bo_individualize_resv
v3: add a comment to explain what's going on

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/352927/
2020-02-12 13:03:33 +01:00
Christian König
1ec39923ef drm/ttm: rework BO delayed delete. v2
This patch reworks the whole delayed deletion of BOs which aren't idle.

Instead of having two counters for the BO structure we resurrect the BO
when we find that a deleted BO is not idle yet.

This has many advantages, especially that we don't need to
increment/decrement the BOs reference counter any more when it
moves on the LRUs.

v2: remove duplicate ttm_tt_destroy, fix holde lock for LRU move

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Link: https://patchwork.freedesktop.org/patch/352912/
2020-02-12 13:03:27 +01:00
Chris Wilson
2933803bdc drm/i915/gem: Tighten checks and acquiring the mmap object
Make sure we hold the rcu lock as we acquire the rcu protected reference
of the object when looking it up from the associated mmap vma.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083
Fixes: cc662126b4 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130143931.1906301-1-chris@chris-wilson.co.uk
(cherry picked from commit 280d14a69d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 13:24:45 +02:00
José Roberto de Souza
52144db130 drm/i915: Fix preallocated barrier list append
Only the first and the last nodes were being added to
ref->preallocated_barriers.

Renaming variables to make it more easy to read.

Fixes: 8413502238 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b822)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 13:24:45 +02:00
Chris Wilson
5b92415e64 drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex
Similar to commit ac0e331a62 ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.

<2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
<4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
<4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
<4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
<4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
<4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
<4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
<4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
<4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
<4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
<4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
<4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
<4> [198.669872] Call Trace:
<4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
<4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
<4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
<4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
<4> [198.670082]  ? __lock_acquire+0x460/0x15d0
<4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
<4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
<4> [198.670188]  drm_ioctl+0x2e1/0x390
<4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]

Fixes: 8413502238 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a62 ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
(cherry picked from commit e5429340bf)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 13:24:45 +02:00
Chris Wilson
7c34bb0398 drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release
As we use a mutex to serialise the first acquire (as it may be a lengthy
operation), but only an atomic decrement for the release, we have to
be careful in case a second thread races and completes both
acquire/release as the first finishes its acquire.

Thread A			Thread B
i915_active_acquire		i915_active_acquire
  atomic_read() == 0		  atomic_read() == 0
  mutex_lock()			  mutex_lock()
				  atomic_read() == 0
				    ref->active();
				  atomic_inc()
				  mutex_unlock()
  atomic_read() == 1
				i915_active_release
				  atomic_dec_and_test() -> 0
				    ref->retire()
  atomic_inc() -> 1
  mutex_unlock()

So thread A has acquired the ref->active_count but since the ref was
still active at the time, it did not initialise it. By switching the
check inside the mutex to an atomic increment only if already active, we
close the race.

Fixes: c9ad602fea ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
(cherry picked from commit ac0e331a62)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 13:24:44 +02:00
Chris Wilson
9556e5c7c4 drm/i915: Stub out i915_gpu_coredump_put
i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0c4 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e36505d0c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12 13:24:34 +02:00
Chris Wilson
c8b56cd014 drm/i915/selftests: Avoid choosing zero for phys_sz
Make sure we avoid ending up with a phys_sz of 0, or for phys_sz to be
larger than the actual size.

Closes: https://patchwork.freedesktop.org/series/73320/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200212085432.1250807-1-chris@chris-wilson.co.uk
2020-02-12 10:14:06 +00:00
Chris Wilson
37305ede63 drm/i915/selftests: Sabotague the RING_HEAD
Apply vast quantities of poison and not tell anyone to see if we fall
for the trap of using a stale RING_HEAD.

References: 42827350f7 ("drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-2-chris@chris-wilson.co.uk
2020-02-12 10:07:13 +00:00
Chris Wilson
89dd019a8a drm/i915: Poison rings after use
On retiring the request, we should not re-use these elements in the ring
(at least not until we fill the ringbuffer and knowingly reuse the space).
Leave behind some poison to (hopefully) trap ourselves if we make a
mistake.

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200211205615.1190127-1-chris@chris-wilson.co.uk
2020-02-12 10:07:13 +00:00
Gerd Hoffmann
b1df3a2b24 drm/virtio: add drm_driver.release callback.
Split virtio_gpu_deinit(), move the drm shutdown and release to
virtio_gpu_release().  Drop vqs_ready variable, instead use
drm_dev_{enter,exit,unplug} to avoid touching hardware after
device removal.  Tidy up here and there.

v4: add changelog.
v3: use drm_dev_*().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20200211135805.24436-1-kraxel@redhat.com
2020-02-12 10:24:08 +01:00
Gerd Hoffmann
81e7301d7d drm/cirrus: add drm_driver.release callback.
Move final cleanups from cirrus_pci_remove() to the new callback.
Add drm_atomic_helper_shutdown() call to cirrus_pci_remove().

Use drm_dev_{enter,exit,unplug} to avoid touching hardware after
device removal.

v4: add changelog.
v3: use drm_dev*.
v2: stop touching hardware after pci_remove().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20200211135522.23654-1-kraxel@redhat.com
2020-02-12 10:24:08 +01:00