We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.
By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.
v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.
The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.
v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
Commit 39bf4de89f ("drm/i915: Add -Wall -Wextra to our build, set
warnings to full") enabled extra warnings for i915 to spot possible
bugs in new code, and then disabled a subset of these warnings to keep
the current code building without warnings (with gcc). Enabling the
extra warnings also enabled some additional clang-only warnings, as a
result building i915 with clang currently is extremely noisy. For now
also disable the clang warnings sign-compare, sometimes-uninitialized,
unneeded-internal-declaration and initializer-overrides. If desired
they can be re-enabled after the code has been fixed.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180501182440.70121-1-mka@chromium.org
Using plain jiffies in error state output makes the output
time differences relative to the current system time. This
is wrong as it makes output time differences dependent
of when the error state is printed rather than when it is
captured.
Store capture jiffies into error state and use it
when outputting the state to fix time differences output.
v2: use engine timestamp as epoch, output formatting (Chris)
v3: pass epoch to print_engine/request (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430075259.4476-1-mika.kuoppala@linux.intel.com
Due to the latency of the tasklet running from ksoftirqd, by the time we
process the execlist dequeue may be a long time behind the GPU. If the
request was completed when we ran reschedule, we will not have tweaked
its priority, but if it is still listed as being in-flight for dequeue
we will use it as a reference for the rest of the queue, including
requests from its own context which will now be at higher priority. This
can cause us to issue a preempt-to-idle request, even though the request
we want to preempt is already complete.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180501122131.19435-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This is an important part of the DDI initalization as well as
for changing the voltage during DisplayPort link training.
The Voltage swing seqeuence is similar to Cannonlake.
However it has different register definitions and hence
it makes sense to create a separate vswing sequence and
program functions for ICL to leave room for more changes
in case the Bspec changes later and deviates from CNL sequence.
v2:
Use ~TAP3_DISABLE for enbaling that bit (Jani Nikula)
v3:
* Use dw4_scaling column for PORT_TX_DW4 values (Rodrigo)
v4:
* Call it combo_vswing, use switch statement (Paulo)
v5 (from Paulo):
* Fix a typo.
* s/rate < 600000/rate <= 600000/.
* Don't remove blank lines that should be there.
v6:
* Rebased by Rodrigo on top of Cannonlake changes
where non vswing sequences are not aligned with iboost
anymore.
v7: Another rebase after an upstream rework.
v8 (from Paulo):
* Adjust the code to the upstream output type changes.
* Squash the patch that moved some functions up.
* Merge both get_combo_buf_trans functions in order to simplify the
code.
* Change the changelog format.
v9 (from Paulo):
* Use RTERM_SELECT instead of SCALING_MODE_SEL.
* Adjust the output type handling according to how the other platforms
do it now.
v10 (from Paulo):
* Fix comment left out from v9 changes (Rodrigo).
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180328215803.13835-8-paulo.r.zanoni@intel.com
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.
As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.
In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!
v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).
v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
In commit 9b6586ae9f ("drm/i915: Keep a global seqno per-engine"), we
moved from a global inflight counter to per-engine counters in the
hope that will be easy to run concurrently in future. However, with the
advent of the desire to move requests between engines, we do need a
global counter to preserve the semantics that no engine wraps in the
middle of a submit. (Although this semantic is now only required for gen7
semaphore support, which only supports greater-then comparisons!)
v2: Keep a global counter of all requests ever submitted and force the
reset when it wraps.
References: 9b6586ae9f ("drm/i915: Keep a global seqno per-engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
Previously, we just reset the ring register in the context image such
that we could skip over the broken batch and emit the closing
breadcrumb. However, on resume the context image and GPU state would be
reloaded, which may have been left in an inconsistent state by the
reset. The presumption was that at worst it would just cause another
reset and skip again until it recovered, however it seems just as likely
to cause an unrecoverable hang. Instead of risking loading an incomplete
context image, restore it back to the default state.
v2: Fix up off-by-one from including the ppHSWP in with the register
state.
v3: Use a ring local to compact a few lines.
v4: Beware setting the ring local before checking for a NULL request.
References: https://bugs.freedesktop.org/show_bug.cgi?id=105304
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20180428111532.15819-1-chris@chris-wilson.co.uk
drm-misc-next for v4.18:
UAPI Changes:
- Add support for a generic plane alpha property to sun4i, rcar-du and atmel-hclcdc. (Maxime)
Core Changes:
- Stop looking at legacy plane->fb and crtc members in atomic drivers. (Ville)
- mode_valid return type fixes. (Luc)
- Handle zpos normalization in the core. (Peter)
Driver Changes:
- Implement CTM, plane alpha and generic async cursor support in vc4. (Stefan)
- Various fixes for HPD and aux chan in drm_bridge/analogix_dp. (Lin, Zain, Douglas)
- Add support for MIPI DSI to sun4i. (Maxime)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Thu 26 Apr 2018 08:21:01 PM AEST
# gpg: using RSA key FE558C72A67013C3
# gpg: Can't check signature: public key not found
Link: https://patchwork.freedesktop.org/patch/msgid/b33da7eb-efc9-ae6f-6f69-b7acd6df6797@mblankhorst.nl
ICL has two slices of DBuf, each slice of size 1024 blocks.
We should not always enable slice-2. It should be enabled only if
display total required BW is > 12GBps OR more than 1 pipes are enabled.
Changes since V1:
- typecast total_data_rate to u64 before multiplication to solve any
possible overflow (Rodrigo)
- fix where skl_wm_get_hw_state was memsetting ddb, resulting
enabled_slices to become zero
- Fix the logic of calculating ddb_size
Changes since V2:
- If no-crtc is part of commit required_slices will have value "0",
don't try to disable DBuf slice.
Changes since V3:
- Create a generic helper to enable/disable slice
- don't return early if total_data_rate is 0, it may be cursor only
commit, or atomic modeset without any plane.
Changes since V4:
- Solve checkpatch warnings
- use kernel types u8/u64 instead of uint8_t/uint64_t
Changes since V5:
- Rebase
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426142517.16643-3-mahesh1.kumar@intel.com
As the Geminilake firmware is now merged to linux-firmware.git
use MODUE_FIRMWARE to load the firmware.
This removes the error message in the dmesg log:
i915 0000:00:02.0: Direct firmware load for
i915/glk_dmc_ver1_04.bin failed with error -2
i915 0000:00:02.0: Failed to load DMC firmware
i915/glk_dmc_ver1_04.bin. Disabling runtime power management.
i915 0000:00:02.0: DMC firmware homepage:
https://01.org/linuxgraphics/downloads/firmware
and now shows that the firmware has correctly loaded:
[drm] Finished loading DMC firmware i915/glk_dmc_ver1_04.bin (v1.4)
Signed-off-by: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180411044213.383-1-ianwmorrison@gmail.com
Any write in any display register was causing HW to exit PSR,
masking it to allow more power savings. Writes to pipe related
registers will still cause HW to exit PSR.
This is already masked for PSR2.
It also do not break the Display WA #0884, writes to CURSURFLIVE
are still causing hardware to exit PSR. This was tested in CNL machine
by triggering a write to CURSURFLIVE when a debugfs was read by user.
Bspec: 7721 and 8042
v4: Checked that it do not breaks WA #0884 and added this information
to the commit message.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180425212334.21109-1-jose.souza@intel.com
Even though we weren't injecting guilty requests to be reset, we could
still fall over the issue of resetting the same request too fast -- where
the GPU refuses to start again. (Although it is interesting to note that
reloading the driver is sufficient, suggesting that we could recover if
we delayed the setup after reset?) Continue to paper over the problem by
adding a small delay by waiting for the engine to idle between tests,
and ensure that the engines are idle before starting the idle tests.
v2: Replace single instance of 50 with a magic macro.
References: 028666793a ("drm/i915/selftests: Avoid repeatedly harming the same innocent context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180411120346.27618-1-chris@chris-wilson.co.uk
We call intel_dp_compute_rate() in intel_dp_compute_config() only to be
able to debug log the link_bw and rate_select parameters; we don't use
the parameters here for anything else. We call intel_dp_compute_rate()
again during link training where we actually need and use the
parameters.
Move the debug logging of link_bw and rate_select to
intel_dp_link_training_clock_recovery(), and clean up the extra
intel_dp_compute_rate() call and extra clutter from the already
overcrowded intel_dp_compute_config().
v2: Rewrote commit message (Rodrigo, Manasi)
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c5cf6a179e2d244eceb6bb80a792765d9efbee4f.1524730974.git.jani.nikula@intel.com
We can convert engine stats from a spinlock to seqlock to ensure interrupt
processing is never even a tiny bit delayed by parallel readers.
There is a smidgen bit more cost on the write lock side, and an extremely
unlikely chance that readers will have to retry a few times in face of
heavy interrupt load. But it should be extremely unlikely given how
lightweight read side section is compared to the interrupt processing
side, and also compared to the rest of the code paths which can lead into
it. Furthermore, writer is the ones doing the real, latency sensitive
work, while readers are only informative.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426074716.7352-1-tvrtko.ursulin@linux.intel.com