Because OCD.
Now seriously, commit 1aa920ea0e ("drm/i915: add register macro
definition style guide") has finally established a coding standard to
be followed by the rest of the file, and I've been trying to request
everybody to adhere to that since then. The problem is that when
someone adds a new line to a register that has the wrong style, these
people generally propagate the wrong style and I have to keep asking
them to drive-by fix the whole register, which is not something I like
to do and also creates extra work for them. Or I can ignore the
propagation of the wrong coding style and feel anxious about it. On
top of that, we now have our CI happily reminding us about these
problems, which makes everything worse.
So IMHO the best way to proceed is to fix the spacing issues in the
file once and for all. Contributors will stop propagating the bad
style when adding new bits to registers that already have bad style,
we will stop asking them to redo their patches and the CI emails will
become more relevant by having less semi-false errors.
Yes, there will be some pain involved for backporters, but at least
spacing issues like that are easy to spot and fix in the patch files.
This patch was generated by:
../../../../scripts/checkpatch.pl -f --strict --types SPACING \
--fix-inplace i915_reg.h
I manually checked the output and everything seems sane.
v2: Single conflict around the addition of DP_TP_CTL_LINK_TRAIN_PAT4.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180618180943.894-1-paulo.r.zanoni@intel.com
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.
v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.
Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
We can avoid the mmio read of the CSB pointers after reset based on the
knowledge that the HW always start writing at entry 0 in the CSB buffer.
We need to reset our CSB head tracking after GPU reset (and on
sanitization after resume) so that we are expecting to read from entry
0, hence we reset our head tracking back to the entry before (the last
entry in the ring).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615093137.14270-2-chris@chris-wilson.co.uk
If client is smart or lucky enough to create a new context
after each hang, our context banning mechanism will never
catch up, and as a result of that it will be saved from
client banning. This can result in a never ending streak of
gpu hangs caused by bad or malicious client, preventing
access from other legit gpu clients.
Fix this by always incrementing per client ban score if
it hangs in short successions regardless of context ban
scoring. The exception are non bannable contexts. They remain
detached from client ban scoring mechanism.
v2: xchg timestamp, tidyup (Chris)
v3: comment, bannable & banned together (Chris)
Fixes: b083a0870c ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615104429.31477-1-mika.kuoppala@linux.intel.com
For each platform, we have a few registers that are rewritten with
different values -- they are not part of a sequence, just different parts
of a masked register set at different times (e.g. platform and gen
workarounds). Consolidate these into a single register write to keep the
table compact, important since we are running of room in the current
fixed sized buffer.
While adjusting the construction of the wa table, make it non fatal so
that the driver still loads but keeping the warning and extra details
for inspection.
Inspecting the changes for a Kabylake system,
Before:
Address val mask read
0x07014 0x20002000 0x00002000 0x00002100
0x0E194 0x01000100 0x00000100 0x00000114
0x0E4F0 0x81008100 0x00008100 0xFFFF8120
0x0E184 0x00200020 0x00000020 0x00000022
0x0E194 0x00140014 0x00000014 0x00000114
0x07004 0x00420042 0x00000042 0x000029C2
0x0E188 0x00080000 0x00000008 0x00008030
0x07300 0x80208020 0x00008020 0x00008830
0x07300 0x00100010 0x00000010 0x00008830
0x0E184 0x00020002 0x00000002 0x00000022
0x0E180 0x20002000 0x00002000 0x00002000
0x02580 0x00010000 0x00000001 0x00000004
0x02580 0x00060004 0x00000006 0x00000004
0x07014 0x01000100 0x00000100 0x00002100
0x0E100 0x00100010 0x00000010 0x00008050
After:
Address val mask read
0x02580 0x00070004 0x00000007 0x00000004
0x07004 0x00420042 0x00000042 0x000029C2
0x07014 0x21002100 0x00002100 0x00002100
0x07300 0x80308030 0x00008030 0x00008830
0x0E100 0x00100010 0x00000010 0x00008050
0x0E180 0x20002000 0x00002000 0x00002000
0x0E184 0x00220022 0x00000022 0x00000022
0x0E188 0x00080000 0x00000008 0x00008030
0x0E194 0x01140114 0x00000114 0x00000114
0x0E4F0 0x81008100 0x00008100 0xFFFF8120
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615120207.13952-1-chris@chris-wilson.co.uk
Hangcheck is our back up in case the GPU or the driver gets stuck. It
detects when the GPU is not making any progress and issues a GPU reset.
However, if the driver is failing to make any progress, we can get
ourselves into a situation where we continually try resetting the GPU to
no avail. Employ a second timeout such that if we continue to see the
same seqno (the stalled engine has made no progress at all) over the
course of several hangchecks, declare the driver wedged and attempt to
start afresh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180602104853.17140-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
In the unlikely case where we have failed to keep submitting to the GPU,
we end up with the ELSP queue empty but a pending queue of requests.
Here, we skip the per-engine reset as there is no guilty request, but in
doing so we also skip the engine restart leaving ourselves with a
permanently hung engine. A quick way to recover is by moving the tasklet
kick to execlists_reset_finish() (from init_hw). We still emit the error
on hanging, so the error is not lost but we should be able to recover.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604073441.6737-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
With an old (4.7.3 on 32bit) gcc, it emits a warning for
In file included from drivers/gpu/drm/i915/i915_request.c:1425:0:
drivers/gpu/drm/i915/selftests/i915_request.c: In function ‘live_nop_request’:
drivers/gpu/drm/i915/selftests/i915_request.c:380:21: error: ‘request’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
Silence it by just setting it to NULL on initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614124923.18071-1-chris@chris-wilson.co.uk
While Bspec doesn't list a specific sequence for turning off the DP port
on g4x we are getting an underrun if the port is disabled in the
.disable() hook. Looks like the pipe stops when the port stops, and by
that time the plane disable may not have completed yet. Also the plane(s)
seem to end up in some wonky state when this happens as they also signal
another underrun immediately after we turn them back on during the next
enable sequence.
We could add a vblank wait in .disable() to avoid wedging the planes,
but I assume we're still tripping up the pipe in some way. So it seems
better to me to just follow the ILK+ sequence and turn off the DP port
in .post_disable() instead. This sequence doesn't seem to suffer from
this problem. Could be it was always the intended sequence for DP and
the gen4 bspec was just never updated to include it.
Originally we used the bad sequence even on ilk+, but I changed that
in commit 08aff3fe26 ("drm/i915: Move DP port disable to post_disable
for pch platforms") as it was causing issues on those platforms as well.
I left out g4x then only because I didn't have the hardware to test it.
Now that I do it's fairly clear that the ilk+ sequence is also the
right choice for g4x.
v2: Fix whitespace fail (Jani)
Mention the ilk+ commit (Jani)
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180613160553.11664-2-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
On i965/g4x IIR is edge triggered. So in order for IIR to notice that
there is still a pending interrupt we have to force and edge in ISR.
For the ISR/IIR pipe event bits we can do that by temporarily
clearing all the PIPESTAT enable bits when we ack the status bits.
This will force the ISR pipe event bit low, and it can then go back
high when we restore the PIPESTAT enable bits.
This avoids the following race:
1. stat = read(PIPESTAT)
2. an enabled PIPESTAT status bit goes high
3. write(PIPESTAT, enable|stat);
4. write(IIR, PIPE_EVENT)
The end result is IIR==0 and ISR!=0. This can lead to nasty
vblank wait/flip_done timeouts if another interrupt source
doesn't trick us into looking at the PIPESTAT status bits despite
the IIR PIPE_EVENT bit being low.
Before i965 IIR was level triggered so this problem can't actually
happen there. And curiously VLV/CHV went back to the level triggered
scheme as well. But for simplicity we'll use the same i965/g4x
compatible code for all platforms.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106033
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105225
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106030
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611200258.27121-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>