This workaround should prevent a bug that can be hit on a context
restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall
(0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of
NOOP(0x0) in the indirect context batch buffer, to ensure the engine is
idle prior to programming 3DSTATE_SAMPLE_PATTERN.
It's also not clear whether we should add those extra dwords because of
the workaround itself, or if that's just padding for the WA BB (and next
commands could come right after the PIPE_CONTROL). We keep them for now.
References: HSD#1939868
v2: More descriptive changelog and comments.
v3: Explain that PIPE_CONTROL is actually 6 dwords, and that we advance
10 more dwords because of that.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205233330.14973-1-rafael.antognolli@intel.com
This patch clears a single bit. The bit is 0 by default but expected
not to be set. Explicitly clearing the bit in this patch is intended
to indicate some thinking has occurred, and that we want this bit
cleared and we are not just excepting the default value.
We also stop setting GFX_RUN_LIST_ENABLE, which is correct since that
bit is gone.
v2 (from Paulo): fix indentation.
v3: Changed GEN check to >= 11. Corrected author name.
v4 (from Paulo): improve commit message (Daniele).
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180130134918.32283-9-paulo.r.zanoni@intel.com
Currently on every submission, we recalculate the ELSP register offset
for the engine, after chasing the pointers to find the iomem base. Since
this is fixed for the lifetime of the driver, record the offset in the
execlists struct.
In practice the difference is negligible, it just happens to remove 27
bytes of eyesore pointer dancing from next to the hottest instruction
(which is itself due to stalling for a cache miss) in perf profiles of
the execlists_submission_tasklet().
v2: Trim off one more elsp local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk
Sagar noticed the check can be consolidated between the engine stats
implementation and the PMU.
My first choice was a static inline helper but that got into include
ordering mess quickly fast so I went with a macro instead. At some point
we should perhaps looking into taking out the non-ringubffer bits from
intel_ringbuffer.h into a new intel_engine.h or something.
v2: Use engine->flags. (Chris Wilson)
v3: Rebase and mark GuC as not yet supported. (Chris Wilson)
v4: Move flag setting to intel_engines_reset_default_submission.
(Chris Wilson)
v5: Move flag setting to logical_ring_setup.
v6: intel_engines_reset_default_submission is the wrong place to set the
flag - it needs to be in execlists_set_default_submission. (Sagar)
v7: Flag setting in logical_ring_setup is not required. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6)
Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com
Track total time requests have been executing on the hardware.
We add new kernel API to allow software tracking of time GPU
engines are spending executing requests.
Both per-engine and global API is added with the latter also
being exported for use by external users.
v2:
* Squashed with the internal API.
* Dropped static key.
* Made per-engine.
* Store time in monotonic ktime.
v3: Moved stats clearing to disable.
v4:
* Comments.
* Don't export the API just yet.
v5: Whitespace cleanup.
v6:
* Rename ref to active.
* Drop engine aggregate stats for now.
* Account initial busy period after enabling stats.
v7:
* Rebase.
v8:
* Move context in notification after the notifier. (Chris Wilson)
v9:
In cases where stats tracking is getting disabled while there is
an active context on an engine, add up the current value to the
total. This also implies we don't clear the total when tracking
is disabled any longer. There is no real need to do so because
we define the stats as relative while enabled, meaning
comparison between two samples while tracking is enabled is the
valid usage. However, when busy stats will later be plugged into
the perf PMU API, it is beneficial to not reset the total, since
the PMU core likes to do some counter disable/enable cycles on
startup, and while doing so during a single long context
executing on an engine we would lose some accuracy and so make
unit testing more difficult than needs to be.
v10:
* Fix accounting for preemption.
v11:
* Rebase for i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com
The hardware needs some time to process the information received in the
ExecList Submission Port, and expects us to not write anything more until
it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or
PREEMPTED CSB event.
If we do not follow this, the driver could write new data into the ELSP
before HW had finishing fetching the previous one, putting us in
'undefined behaviour' space.
This seems to be the problem causing the spurious PREEMPTED & COMPLETE
events after a COMPLETE like the one below:
[] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3.
[] vcs0: Execlist CSB[0]: 0x00000018 _ 0x00000007
[] vcs0: Execlist CSB[1]: 0x00000001 _ 0x00000000
[] vcs0: Execlist CSB[2]: 0x00000018 _ 0x00000007 <<< COMPLETE
[] vcs0: Execlist CSB[3]: 0x00000012 _ 0x00000007 <<< PREEMPTED & COMPLETE
[] vcs0: Execlist CSB[4]: 0x00008002 _ 0x00000006
[] vcs0: Execlist CSB[5]: 0x00000014 _ 0x00000006
The ELSP writes that lead to this CSB sequence show that the HW hadn't
started executing the previous execlist (the one with only ctx 0x6) by the
time the new one was submitted; this is a bit more clear in the data
show in the EXECLIST_STATUS register at the time of the ELSP write.
[] vcs0: ELSP[0] = 0x0_0 [execlist1] - status_reg = 0x0_302
[] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
[] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
[] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
Note that having to wait for this ack does not disable lite-restores,
although it may reduce their numbers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
At the start of building a request, we would wait for roughly enough
space to fit the average request (to reduce the likelihood of having to
wait and abort partway through request construction). To achieve we
would try to begin a 0-length command packet, this just adds extra
confusion so make the wait-for-space explicit, as in the next patch we
want to move it from the backend to the i915_gem_request_alloc() so it
can ensure that the wait-for-space is the first operation in building a
new request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk
Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.
v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).
Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
In the next few patches, we will want to both copy out of the context
image and write a valid image into a new context. To be completely safe,
we should then couple in our domain tracking to ensure that we don't
have any issues with stale data remaining in unwanted cachelines.
Historically, we omitted the .write=true from the call to set-gtt-domain
in i915_switch_context() in order to avoid a stall between every request
as we would want to wait for the previous context write from the gpu.
Since then, we limit the set-gtt-domain to only occur when we first bind
the vma, so once in use we will never stall, and we are sure to flush
the context following a load from swap.
Equally we never applied the lessons learnt from ringbuffer submission
to execlists; so time to apply the flush of the lrc after load as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk
Trying to enable printk debugging for GEM is fraught with the issue of
spam; interactions with HW are very frequent and often boring. However,
one instance where they are not so boring is just before a BUG; here
ftrace provides a facility to dump its ringbuffer on an oops. So for CI
let's enable trace_printk() to capture the last exchanges with HW as a
death rattle.
For example,
[ 79.234110] ------------[ cut here ]------------
[ 79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907!
[ 79.234145] invalid opcode: 0000 [#1] SMP
[ 79.234153] Dumping ftrace buffer:
[ 79.234158] ---------------------------------
...
[ 79.314044] gem_conc-1059 1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145
[ 79.314089] gem_conc-1059 1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005
[ 79.314133] gem_conc-1059 1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145
[ 79.314177] gem_conc-1062 2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.1, seqno=146
[ 79.314220] gem_conc-1062 2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]: ctx=8.2, seqno=147
[ 79.314265] gem_conc-1059 1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008
[ 79.314309] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147
[ 79.314353] gem_conc-1059 1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008
[ 79.314396] gem_conc-1059 1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147
[ 79.314402] ---------------------------------
v2: Tweak the formatting to be more consistent between in/out.
v3: do {} while (0) stub macro protection
Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk
Pretty similar to what we have on execlists.
We're reusing most of the GEM code, however, due to GuC quirks we need a
couple of extra bits.
Preemption is implemented as GuC action, and actions can be pretty slow.
Because of that, we're using a mutex to serialize them. Since we're
requesting preemption from the tasklet, the task of creating a workitem
and wrapping it in GuC action is delegated to a worker.
To distinguish that preemption has finished, we're using additional
piece of HWSP, and since we're not getting context switch interrupts,
we're also adding a user interrupt.
The fact that our special preempt context has completed unfortunately
doesn't mean that we're ready to submit new work. We also need to wait
for GuC to finish its own processing.
v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
no need for ordered workqueue, put on a reviewer hat when looking at my own
patches (Chris)
Move struct work around in intel_guc, move user interruput outside of
conditional (Michał)
Keep ring around rather than chase though intel_context
v3: Extract WA for flushing ggtt writes to a helper (Chris)
Keep work_struct in intel_guc rather than engine (Michał)
Use ordered workqueue for inject_preempt worker to avoid GuC quirks.
v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
Drop stray newlines, use container_of for intel_guc in worker,
check for presence of workqueue when flushing it, rather than
enable_guc_submission modparam, reorder preempt postprocessing (Chris)
v5: Make wq NULL after destroying it
v6: Swap struct guc_preempt_work members (Michał)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
Originally we set the priority to max upon inserting the request into
the execlists queue (and removing it from the scheduler lists). We could
then use the prio==INT_MAX as a shortcut within execlists_schedule() to
detect the end of the dependency chain. Since commit 1f181225f8
("drm/i915/execlists: Keep request->priority for its lifetime") this is
no longer true as we use the request completion as an indicator the
schedule dependency chain is complete instead. (This allows us to then
reschedule requests even when its context is in flight.) However, this
makes the GEM_BUG_ON() inside execlists_schedule() racy as we may change
the rq->prio at the same time. As the assertion is useful, let's keep
the assertion and remove the micro-optimisation.
Fixes: 1f181225f8 ("drm/i915/execlists: Keep request->priority for its lifetime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171024115501.21033-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Back in commit a4b2b01523 ("drm/i915: Don't mark an execlists
context-switch when idle") we noticed the presence of late
context-switch interrupts. We were able to filter those out by looking
at whether the ELSP remained active, but in commit beecec9017
("drm/i915/execlists: Preemption!") that became problematic as we now
anticipate receiving a context-switch event for preemption while ELSP
may be empty. To restore the spurious interrupt suppression, add a
counter for the expected number of pending context-switches and skip if
we do not need to handle this interrupt to make forward progress.
v2: Don't forget to switch on for preempt.
v3: Reduce the counter to a on/off boolean tracker. Declare the HW as
active when we first submit, and idle after the final completion event
(with which we confirm the HW says it is idle), and track each source
of activity separately. With a finite number of sources, it should aide
us in debugging which gets stuck.
Fixes: beecec9017 ("drm/i915/execlists: Preemption!")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
In the next patch, we want to reduce the lock coverage within the
shrinker, and one of the dangerous walks we have is over obj->vma_list.
We are only walking the obj->vma_list in order to check whether it has
been permanently pinned by HW access, typically via use on the scanout.
But we have a couple of other long term pins, the context objects for
which we currently have to check the individual vma pin_count. If we
instead mark these using obj->pin_display, we can forgo the dangerous
and sometimes slow list iteration.
v2: Rearrange code to try and avoid confusion from false associations
due to arrangement of whitespace along with rebasing on obj->pin_global.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk
When we write to ELSP, it triggers a context preemption at the earliest
arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other
operations and the explicit MI_ARB_CHECK). If this is to the same
context, it triggers a LITE_RESTORE where the RING_TAIL is merely
updated (used currently to chain requests from the same context
together, avoiding bubbles). However, if it is to a different context, a
full context-switch is performed and it will start to execute the new
context saving the image of the old for later execution.
Previously we avoided preemption by only submitting a new context when
the old was idle. But now we wish embrace it, and if the new request has
a higher priority than the currently executing request, we write to the
ELSP regardless, thus triggering preemption, but we tell the GPU to
switch to our special preemption context (not the target). In the
context-switch interrupt handler, we know that the previous contexts
have finished execution and so can unwind all the incomplete requests
and compute the new highest priority request to execute.
It would be feasible to avoid the switch-to-idle intermediate by
programming the ELSP with the target context. The difficulty is in
tracking which request that should be whilst maintaining the dependency
change, the error comes in with coalesced requests. As we only track the
most recent request and its priority, we may run into the issue of being
tricked in preempting a high priority request that was followed by a
low priority request from the same context (e.g. for PI); worse still
that earlier request may be our own dependency and the order then broken
by preemption. By injecting the switch-to-idle and then recomputing the
priority queue, we avoid the issue with tracking in-flight coalesced
requests. Having tried the preempt-to-busy approach, and failed to find
a way around the coalesced priority issue, Michal's original proposal to
inject an idle context (based on handling GuC preemption) succeeds.
The current heuristic for deciding when to preempt are only if the new
request is of higher priority, and has the privileged priority of
greater than 0. Note that the scheduler remains unfair!
v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU.
Since, the feature is now conditional and not always available when we
have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a
capability mask).
v3: Stylistic tweaks.
v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of
the preempt vs preempting debate.
Suggested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-8-chris@chris-wilson.co.uk