RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to upclock or downclock the GPU
frequency. Our responsiveness is split into 3 regimes: a high and a low
plateau, with the intent to keep the GPU clocked high to cover
occasional stalls under high load, and clocked low despite occasional
glitches under steady low load, plus the region in between. However, we
run into situations like Kodi where we want to stay at low power (video
decoding is done efficiently inside the fixed-function HW and doesn't
need high clocks even for high bitrate streams), but just occasionally
the pipeline is more complex than a video decode and we need a smidgen
of extra GPU power to present on time. In the high-power regime, we
sample at sub-frame intervals with a bias towards upclocking, and
conversely at low power we sample over a few frames' worth to provide
what we consider to be the right level of responsiveness for each.
At low power, we more or less expect to be kicked out to high power at
the start of a busy sequence by waitboosting.
Prior to commit e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active
request"), whenever we missed the frame or stalled, we would immediately
go full throttle and upclock the GPU to max. But in commit e9af4ea2b9, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.
To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses its vblank. This makes us more
likely to provide a small power increase (one or two bins) as required,
rather than going all the way to maximum and then having to work back
down again. (We still keep the waitboosting mechanism around just in
case a dramatic change in system load requires urgent upclocking, faster
than we can provide in a few evaluation intervals.)
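To make the mechanism concrete, here is a minimal sketch of the idea
behind rps_set_interactive() (illustrative only; the counter name and
the interval values are placeholders, not the driver's actual code):
queuing a pageflip takes an interactive reference that shortens the
evaluation interval, and completing the flip drops the reference.

#include <stdbool.h>

/* Minimal sketch, assuming a simple reference count of pending
 * pageflips; the intervals below are placeholders. */
struct rps_model {
        unsigned int interactive_count; /* pending pageflip references */
        unsigned int interval_us;       /* current evaluation interval */
};

static void rps_set_interactive(struct rps_model *rps, bool interactive)
{
        if (interactive)
                rps->interactive_count++;
        else
                rps->interactive_count--;

        /* Sample over sub-frame intervals while any flip is queued,
         * otherwise over a few frames' worth. */
        rps->interval_us = rps->interactive_count ? 1000 : 10000;
}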
v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion over what to do if a new power mode is wanted after pinning to
a different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake; it will be set when we
wake the GT, and programming it while off warns about use outside of rpm.
v4: Fix deferred application of interactive mode
v5: s/state/interactive/
v6: Group the mutex with its principal in a substruct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180731132629.3381-1-chris@chris-wilson.co.uk
(cherry picked from commit 60548c554b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
sink_crc was originally introduced following the VESA spec in order to
validate PSR. However, we found so many issues around sink_crc that
instead of helping PSR development it only brought another layer of
trouble to the table.
So, sink_crc has been a black hole for us in terms of time, effort and hope.
The first of the problems is that the HW statement is clear: "Do not
attempt to use aux communication with PSR enabled". So the main premise
behind sink_crc is already compromised.
For a while we had hope that the aux-mutex could work around this problem
on SKL+ platforms, but that mutex is not reliable, not tested, and
according to HW engineers we shouldn't use it.
Also, neither the source nor the sink designed and implemented sink_crc
to be used the way we are trying to use it here.
The sink side of things is also apparently not prepared for this case:
each panel that we tried seemed to have a different behavior with the
same code and the same source.
So, for all the time we lost trying to duct-tape all these different
issues, I believe it is now time to move PSR to a more reliable
validation. Maybe not the perfect one we dreamed of for this sink_crc,
but at least more reliable.
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180705192528.30515-1-rodrigo.vivi@intel.com
Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.
In the following patches, we will use the timeout to curtail two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.
In particular, with a broken GPU we expect it to fail during the initial
GPU setup, where we do a couple of context switches to record the
defaults. This is a task that takes a few milliseconds even on the
slowest of devices, but we may have to wait 60s for hangcheck to give in
and declare the machine inoperable. This is a case where any GPU hang is
unacceptable, from both a timeliness and a practical standpoint.
The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.
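As a rough illustration of the shape of such a wait (a sketch only; the
helper name and polling interval are made up, not the i915 interface),
the caller supplies the upper bound and we give up early instead of
relying on hangcheck:

#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Sketch of a bounded wait for idle: poll until idle or until the
 * caller-provided timeout expires, then report the device as broken. */
static int wait_for_idle_timeout(bool (*is_idle)(void *), void *ctx,
                                 unsigned int timeout_ms)
{
        unsigned int waited_ms = 0;

        while (!is_idle(ctx)) {
                if (waited_ms >= timeout_ms)
                        return -ETIMEDOUT; /* give up long before hangcheck */
                usleep(1000); /* poll every millisecond */
                waited_ms++;
        }

        return 0;
}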
v2: Include the timeout parameter in the trace.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
For each platform, we have a few registers that are rewritten with
different values -- they are not part of a sequence, just different parts
of a masked register set at different times (e.g. platform and gen
workarounds). Consolidate these into a single register write to keep the
table compact, which is important since we are running out of room in
the current fixed-size buffer.
While adjusting the construction of the wa table, make it non-fatal so
that the driver still loads, keeping the warning and extra details for
inspection.
Inspecting the changes for a Kabylake system,
Before:
Address val mask read
0x07014 0x20002000 0x00002000 0x00002100
0x0E194 0x01000100 0x00000100 0x00000114
0x0E4F0 0x81008100 0x00008100 0xFFFF8120
0x0E184 0x00200020 0x00000020 0x00000022
0x0E194 0x00140014 0x00000014 0x00000114
0x07004 0x00420042 0x00000042 0x000029C2
0x0E188 0x00080000 0x00000008 0x00008030
0x07300 0x80208020 0x00008020 0x00008830
0x07300 0x00100010 0x00000010 0x00008830
0x0E184 0x00020002 0x00000002 0x00000022
0x0E180 0x20002000 0x00002000 0x00002000
0x02580 0x00010000 0x00000001 0x00000004
0x02580 0x00060004 0x00000006 0x00000004
0x07014 0x01000100 0x00000100 0x00002100
0x0E100 0x00100010 0x00000010 0x00008050
After:
Address val mask read
0x02580 0x00070004 0x00000007 0x00000004
0x07004 0x00420042 0x00000042 0x000029C2
0x07014 0x21002100 0x00002100 0x00002100
0x07300 0x80308030 0x00008030 0x00008830
0x0E100 0x00100010 0x00000010 0x00008050
0x0E180 0x20002000 0x00002000 0x00002000
0x0E184 0x00220022 0x00000022 0x00000022
0x0E188 0x00080000 0x00000008 0x00008030
0x0E194 0x01140114 0x00000114 0x00000114
0x0E4F0 0x81008100 0x00008100 0xFFFF8120
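For these masked registers, the upper 16 bits of the written value
select which bits are updated and the lower 16 bits carry the values, so
two entries for the same address fold together by OR-ing both fields. A
minimal sketch of that folding (illustrative only, not the driver's
actual table code):

#include <stdint.h>
#include <stdio.h>

struct wa_reg {
        uint32_t addr, val, mask;
};

/* Fold another masked write into an existing entry for the same
 * register by OR-ing the value and the mask of bits we care about. */
static void wa_fold(struct wa_reg *wa, uint32_t val, uint32_t mask)
{
        wa->val |= val;
        wa->mask |= mask;
}

int main(void)
{
        struct wa_reg wa = { 0x07014, 0x20002000, 0x00002000 };

        wa_fold(&wa, 0x01000100, 0x00000100);

        /* Prints 0x07014 0x21002100 0x00002100, matching the entry in
         * the "After" table above. */
        printf("0x%05X 0x%08X 0x%08X\n", wa.addr, wa.val, wa.mask);

        return 0;
}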
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615120207.13952-1-chris@chris-wilson.co.uk
Hangcheck is our back up in case the GPU or the driver gets stuck. It
detects when the GPU is not making any progress and issues a GPU reset.
However, if the driver is failing to make any progress, we can get
ourselves into a situation where we continually try resetting the GPU to
no avail. Employ a second timeout such that if we continue to see the
same seqno (the stalled engine has made no progress at all) over the
course of several hangchecks, declare the driver wedged and attempt to
start afresh.
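The idea can be sketched as a small watchdog (illustrative only; the
threshold and names are placeholders, not the actual hangcheck code): if
the stalled engine reports the same seqno for several consecutive
hangcheck samples, stop retrying GPU resets and wedge the driver instead.

#include <stdbool.h>
#include <stdint.h>

#define WEDGE_AFTER_SAMPLES 4 /* placeholder threshold */

struct engine_watchdog {
        uint32_t last_seqno;
        unsigned int stalled_samples;
};

/* Returns true when the driver should be declared wedged. */
static bool hangcheck_sample(struct engine_watchdog *wd, uint32_t seqno)
{
        if (seqno == wd->last_seqno) {
                if (++wd->stalled_samples >= WEDGE_AFTER_SAMPLES)
                        return true; /* no progress at all; start afresh */
        } else {
                wd->last_seqno = seqno;
                wd->stalled_samples = 0;
        }

        return false; /* keep issuing GPU resets */
}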
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180602104853.17140-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Immediate enabling was actually not an issue from the HW perspective on
core platforms that have HW tracking. HW will wait a few identical idle
frames before transitioning to actual PSR active anyway.
Now that we have removed VLV/CHV from the picture completely, we can
safely remove any delays.
Note that this patch also removes the delayed activation on HSW and BDW
introduced by commit d0ac896a477d ("drm/i915: Delay first PSR
activation."). That was introduced to fix a blank screen on VLV/CHV and
also masked some frozen screens on other core platforms, probably the
same ones that we are now properly hunting and fixing.
v2: (DK) Remove unnecessary WARN_ONs and make some other VLV || CHV
checks more readable.
v3: Do it regardless of the timer rework.
v4: (DK/CI): Add VLV || CHV check on cancel work at psr_disable.
v5: Kill remaining items and fully rework activation functions.
v6: Rebase on top of VLV/CHV clean-up and keep the reactivation
on a regular non-delayed work to avoid extra delays on exit
calls and allow us to add a few more safety checks before
real activation.
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180613192600.3955-1-rodrigo.vivi@intel.com
PSR hardware and hence the driver code for VLV and CHV deviates a lot from
their DDI counterparts. While the feature has been disabled for a long time
now, retaining support for these platforms is a maintenance burden. There
have been multiple refactoring commits to just keep the existing code for
these platforms in line with the rest. There are known issues that need to
be fixed to enable PSR on these platforms, and there is no PSR capable
platform in CI to ensure the code does not break again if we get around to
fixing the existing issues. On account of all these reasons, let's nuke
this code for now and bring it back if a need arises in the future.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180511230059.19387-1-dhinakaran.pandiyan@intel.com
This commit introduces the definitions for the ICL clocks and adds the
basic functions to the shared DPLL framework. It adds code for the
Enable and Disable sequences for some PLLs, but it does not have the
code to compute the actual PLL values; that is marked with TODO
comments and should be introduced in separate commits.
Special thanks to James Ausmus for investigating and fixing a bug with
the placement of icl_unmap_plls_to_ports() function.
v2:
- Rebase around dpll_lock changes.
v3:
- The spec now says what the timeouts should be.
- Touch DPCLKA_CFGCR0_ICL at the appropriate time so we don't freeze
the machine.
- Checkpatch found a white space problem.
- Small adjustments before upstreaming.
v4:
- Move the ICL checks out of the *map_plls_to_ports() functions
(James)
- Add extra encoder check (James)
- Call icl_unmap_plls_to_ports() later (James)
v5:
- Rebase after the pll struct changes.
v6:
- Properly base the unmap function on encoders_post_disable()
with regard to checks and iterators.
- Address checkpatch comment on "min = max = x()".
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: James Ausmus <james.ausmus@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180427231436.9353-1-paulo.r.zanoni@intel.com
In commit 9b6586ae9f ("drm/i915: Keep a global seqno per-engine"), we
moved from a global inflight counter to per-engine counters in the
hope that they would be easy to run concurrently in the future. However,
with the advent of the desire to move requests between engines, we do
need a global counter to preserve the semantics that no engine wraps in
the middle of a submit. (Although this semantic is now only required for
gen7 semaphore support, which only supports greater-than comparisons!)
v2: Keep a global counter of all requests ever submitted and force the
reset when it wraps.
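A minimal sketch of that counter (illustrative only; the names are
placeholders, not the driver's actual code): reserve seqnos from one
global counter and force an idle-and-reset before the 32-bit value
wraps, so no engine ever observes a seqno going backwards mid-submit.

#include <stdbool.h>
#include <stdint.h>

static uint32_t global_submitted; /* all requests ever submitted */

/* Returns false when the next allocation would wrap; the caller must
 * then idle the GPU and reset the seqno before retrying. */
static bool reserve_seqnos(unsigned int count)
{
        uint32_t next = global_submitted + count;

        if (next < global_submitted)
                return false; /* would wrap in the middle of a submit */

        global_submitted = next;
        return true;
}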
References: 9b6586ae9f ("drm/i915: Keep a global seqno per-engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
Interrupts other than the one for AUX errors are required only for debug,
so unmask them via debugfs when the user requests debug.
The user can make such a request with:
echo 1 > <DEBUG_FS>/dri/0/i915_edp_psr_debug
For simplicity, there are no locks to serialize PSR debug enabling
between irq_postinstall() and debugfs. As irq_postinstall() is called
only during module initialization/resume and IGT subtests aren't
expected to modify PSR debug at those times, we should be safe.
v2: Unroll loops (Ville)
Avoid resetting error mask bits.
v3: Unmask interrupts in postinstall() if debug was still enabled.
Avoid RMW (Ville)
v4: Avoid extra IMR write introduced in the previous version.(Jose)
Style changes, renames (Jose).
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Jose Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180405013717.24254-1-dhinakaran.pandiyan@intel.com
Add a selftest to ensure that we restore the whitelisted registers; we
rewrite the registers every time they might be scrubbed, e.g. on module
load, reset and resume. For the other volatile workaround registers, we
export their presence via debugfs and check them in igt/gem_workarounds.
However, we don't export the whitelist, and rather than do so, let's
test them directly in the kernel.
The test we use is to read the registers back from the CS (this helps us
be sure that the registers will be valid for MI_LRI etc). In order to
generate the expected list, we split intel_whitelist_workarounds_emit
into two phases, the first to build the list and the second to apply it.
Inside the test, we only build the list and then check that list against
the hw.
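As a rough sketch of the test's shape (illustrative only; the structure
and limit are placeholders): phase one fills in the expected list, and
the selftest compares it against the registers read back through the CS
after each load, reset or resume.

#include <stdbool.h>
#include <stdint.h>

#define MAX_WHITELIST 12 /* placeholder limit */

struct whitelist {
        uint32_t reg[MAX_WHITELIST];
        unsigned int count;
};

/* Compare the expected whitelist (built in phase one) against the
 * values read back through the CS. */
static bool whitelist_check(const struct whitelist *expected,
                            const uint32_t *hw_reg, unsigned int hw_count)
{
        unsigned int i;

        if (hw_count != expected->count)
                return false;

        for (i = 0; i < expected->count; i++) {
                if (hw_reg[i] != expected->reg[i])
                        return false;
        }

        return true;
}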
v2: Filter out pre-gen8 as they do not have RING_NONPRIV.
v3: Drop unused engine parameter, no plans to use it now or in the future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180414122754.569-1-chris@chris-wilson.co.uk
Not all callers want the GPU error to be handled in the same way, so
expose a control parameter. In the first instance, some callers do not
want the heavyweight error capture, so add a bit to request that the
state be captured and saved.
v2: Pass msg down to i915_reset/i915_reset_engine so that we include the
reason for the reset in the dev_notice(), superseding the earlier option
to not print that notice.
v3: Stash the reason inside the i915->gpu_error to handover to the direct
reset from the blocking waiter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180320100449.1360-2-chris@chris-wilson.co.uk
Log level and relay creation are really two separate concepts. Since GuC
is writing data into its own buffer, and we even provide a way for
userspace to read directly from it using the i915_guc_log_dump debugfs,
there's no real reason to tie the log level to relay creation.
Let's create a separate debugfs, giving userspace a way to create a
relay on demand, when it wants to read a continuous log rather than a
snapshot.
v2: Don't touch guc_log_level on relay creation error, adjust locking
after rebase, s/dev_priv/i915, pass guc to file->private_data (Sagar)
Use struct_mutex rather than runtime.lock for set_log_level
v3: Tidy ordering of definitions (Sagar)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180319095348.9716-5-michal.winiarski@intel.com
While the end goal is to make this information available to userspace
through a new ioctl, there is no reason we can't display it in a human
readable fashion through debugfs.
slice0: 3 subslice(s) (0x7):
subslice0: 8 EUs (0xff)
subslice1: 8 EUs (0xff)
subslice2: 8 EUs (0xff)
subslice3: 0 EUs (0x0)
slice1: 3 subslice(s) (0x7):
subslice0: 8 EUs (0xff)
subslice1: 8 EUs (0xff)
subslice2: 8 EUs (0xff)
subslice3: 0 EUs (0x0)
slice2: 3 subslice(s) (0x7):
subslice0: 8 EUs (0xff)
subslice1: 8 EUs (0xff)
subslice2: 8 EUs (0xff)
subslice3: 0 EUs (0x0)
v2: Reformat debugfs printing (Tvrtko)
Use the new EU mask helper (Tvrtko)
v3: Move printing code to intel_device_info.c to be shared with error
state (Michal)
v4: Bump u8 to u16 when using sseu_get_eus() (Lionel)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-4-lionel.g.landwerlin@intel.com
Up to now, the subslice mask was assumed to be uniform across slices.
But starting with Cannonlake, slices can be asymmetric (for example,
slice0 can have a different number of subslices than slice1+). This
change stores a subslice mask per slice rather than having a single mask
that applies to all slices.
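A minimal sketch of the new layout (illustrative only; the struct and
limit are placeholders, not the driver's sseu_dev_info): one subslice
mask per slice, with the total recomputed on demand instead of stored.

#include <stdint.h>

#define MAX_SLICES 3 /* placeholder limit */

struct sseu_info {
        uint8_t slice_mask;
        uint8_t subslice_mask[MAX_SLICES]; /* one mask per slice */
};

/* Recompute the subslice total instead of caching it. */
static unsigned int sseu_subslice_total(const struct sseu_info *sseu)
{
        unsigned int total = 0, s;

        for (s = 0; s < MAX_SLICES; s++) {
                if (sseu->slice_mask & (1u << s))
                        total += __builtin_popcount(sseu->subslice_mask[s]);
        }

        return total;
}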
v2: Rework how we store total numbers in sseu_dev_info (Tvrtko)
Fix CHV eu masks, was reading disabled as enabled (Tvrtko)
Readability changes (Tvrtko)
Add EU index helper (Tvrtko)
v3: Turn ALIGN(v, 8) / 8 into DIV_ROUND_UP(v, BITS_PER_BYTE) (Tvrtko)
Reuse sseu_eu_idx() for setting eu_mask on CHV (Tvrtko)
Reformat debug prints for subslices (Tvrtko)
v4: Change eu_mask helper into sseu_set_eus() (Tvrtko)
v5: With Haswell reporting masks & counts, bump sseu_*_eus() functions
to use u16 (Lionel)
v6: Fix sseu_get_eus() for > 8 EUs per subslice (Lionel)
v7: Change debugfs enables for number of subslices per slice, will
need a small igt/pm_sseu change (Lionel)
Drop subslice_total field from sseu_dev_info, rely on
sseu_subslice_total() to recompute the value instead (Lionel)
v8: Remove unused function compute_subslice_total() (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-2-lionel.g.landwerlin@intel.com