Rather than wait to see whether more space becomes available in the GuC
submission workqueue, we can just return -EAGAIN and let the caller try
again in a little while. This gets rid of an uninterruptable sleep in
the polling code :)
We'll also add a counter to the GuC client statistics, to see how often
we find the WQ full.
v2:
Flag the likely() code path (Tvtrko Ursulin).
v4:
Add/update comments about failure counters (Tvtrko Ursulin).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The knowledge of how to derive the relevant client from the request
should be localised within i915_guc_submission.c; the LRC code shouldn't
have to know about the internal details of the GuC submission process.
And all the information the GuC code needs should be encapsulated in (or
reachable from) the request.
v2:
GEM_BUG_ON() for bad GuC client (Tvrtko Ursulin).
Add/update kerneldoc explaining check_space/submit protocol
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Split the function of "enable_guc_submission" into two separate
options. The new one ("enable_guc_loading") controls only the
*fetching and loading* of the GuC firmware image. The existing
one is redefined to control only the *use* of the GuC for batch
submission once the firmware is loaded.
In addition, the degree of control has been refined from a simple
bool to an integer key, allowing several options:
-1 (default) whatever the platform default is
0 DISABLE don't load/use the GuC
1 BEST EFFORT try to load/use the GuC, fallback if not available
2 REQUIRE must load/use the GuC, else leave the GPU wedged
The new platform default (as coded here) will be to attempt to
load the GuC iff the device has a GuC that requires firmware,
but not yet to use it for submission. A later patch will change
to enable it if appropriate.
v4:
Changed some error-message levels, mostly ERROR->INFO, per
review comments by Tvrtko Ursulin.
v5:
Dropped one more error message, disabled GuC submission on
hypothetical firmware-free devices [Tvrtko Ursulin].
v6:
Logging tidy by Tvrtko Ursulin:
* Do not log falling back to execlists when wedging the GPU.
* Do not log fw load errors when load was disabled by user.
* Pass down some error code from fw load for log message to
make more sense.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Fiedorowicz, Lukasz <lukasz.fiedorowicz@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> (v6)
For now, anything with a GuC requires uCode loading, and then supports
command submission once loaded. But these are logically distinct from
simply "having a GuC", so we need a separate macro for the latter. Then,
various tests should use this new macro rather than HAS_GUC_UCODE() or
testing enable_guc_submission.
v4:
Added a couple more uses of the new macro.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The GuC initialisation code could do other things apart from loading
firmware, so here we rename the three primary entry points to remove any
specific reference to "ucode" (no functional changes, just renaming).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
When we resume the watermark register may contain some BIOS leftovers,
or just the hardware reset values. We should ignore those as the
pipes will be off anyway, and so frobbing around with intermediate
watermarks doesn't make much sense.
In fact I think we should just throw the skip_intermediate_wm flag
out, and instead properly sanitize the "active" watermarks to match
the current plane and pipe states. The actual wm state readout might
also need a bit of work. But for now, let's continue with the
skip_intermediate_wm to keep the fix more minimal.
Fixes this sort of errors on resume
[drm:ilk_validate_pipe_wm] LP0 watermark invalid
[drm:intel_crtc_atomic_check] No valid intermediate pipe watermarks are possible
[drm:intel_display_resume [i915]] *ERROR* Restoring old state failed with -22
and a boatload of subsequent modeset BAT fails on my ILK.
v2:
- Rebase; the SKL atomic WM patches that just landed changed the WM
structure fields in intel_crtc_state slightly. (Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: ed4a6a7ca8 ("drm/i915: Add two-stage ILK-style watermark programming (v11)")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463159442-20478-1-git-send-email-matthew.d.roper@intel.com
(cherry picked from commit e3d5457c7c)
[Jani: rebase on drm-next while cherry-picking]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
In BXT DSI there is no regs programmed with few horizontal timings
in Pixels but txbyteclkhs.. So retrieval process adds some
ROUND_UP ERRORS in the process of PIXELS<==>txbyteclkhs.
Actually here for the given adjusted_mode, we are calculating the
value programmed to the port and then back to the horizontal timing
param in pixels. This is the expected value at the end of get_config,
including roundup errors. And if that is same as retrieved value
from port, then retrieved (HW state) adjusted_mode's horizontal
timings are corrected to match with SW state to nullify the errors.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461053894-5058-2-git-send-email-ramalingam.c@intel.com
(cherry picked from commit 042ab0c3c4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.
The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and straight up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.
I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.
Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.
v2: Reorder init_hw vs. enable_hw functions (Chris)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d35 ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit ac840ae535)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Try to detect the max TMDS clock limit for the DP++ adaptor (if any)
and take it into account when checking the port clock.
Note that as with the sink (HDMI vs. DVI) TMDS clock limit we'll ignore
the adaptor TMDS clock limit in the modeset path, in case users are
already "overclocking" their TMDS links. One subtle change here is that
we'll have to respect the adaptor TMDS clock limit when we decide whether
to do 12bpc or 8bpc, otherwise we might end up picking 12bpc and
accidentally driving the TMDS link out of spec even when the user chose
a mode that fits wihting the limits at 8bpc. This means you can't
"overclock" your DP++ dongle at 12bpc anymore, but you can continue to
do so at 8bpc.
Note that for simplicity we'll use the I2C access method for all dual
mode adaptors including type 2. Otherwise we'd have to start mixing
DP AUX and HDMI together. In the future we may need to do that if we
come across any board designs that don't hook up the DDC pins to the
DP++ connectors. Such boards would obviously only work with type 2
dual mode adaptors, and not type 1.
v2: Store adaptor type under indel_hdmi->dp_dual_mode
Deal with DRM_DP_DUAL_MODE_UNKNOWN
Pass adaptor type to drm_dp_dual_mode_max_tmds_clock(),
and use it for type1 adaptors as well
Cc: stable@vger.kernel.org
Reported-by: Tore Anderson <tore@fud.no>
Fixes: 7a0baa6234 ("Revert "drm/i915: Disable 12bpc hdmi for now"")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462216105-20881-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Shashank Sharma <shashank.sharma@intel.com>
(cherry picked from commit b1ba124d8e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.
For example, the 32-bit version of "__copy_to_user_inatomic()" is mostly
the special cases for the constant size, and it's actually never
relevant. Every user except for one aren't actually using a constant
size anyway, and the one user that uses it is better off just using
__put_user() instead.
So get rid of the unnecessary complexity.
[ The same cleanup should likely happen to __copy_from_user_inatomic()
as well, but that one has a lot more users that I need to take a look
at first ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull staging and IIO driver updates from Greg KH:
"Here's the big staging and iio driver update for 4.7-rc1.
I think we almost broke even with this release, only adding a few more
lines than we removed, which isn't bad overall given that there's a
bunch of new iio drivers added.
The Lustre developers seem to have woken up from their sleep and have
been doing a great job in cleaning up the code and pruning unused or
old cruft, the filesystem is almost readable :)
Other than that, just a lot of basic coding style cleanups in the
churn. All have been in linux-next for a while with no reported
issues"
* tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (938 commits)
Staging: emxx_udc: emxx_udc: fixed coding style issue
staging/gdm724x: fix "alignment should match open parenthesis" issues
staging/gdm724x: Fix avoid CamelCase
staging: unisys: rename misleading var ii with frag
staging: unisys: visorhba: switch success handling to error handling
staging: unisys: visorhba: main path needs to flow down the left margin
staging: unisys: visorinput: handle_locking_key() simplifications
staging: unisys: visorhba: fail gracefully for thread creation failures
staging: unisys: visornic: comment restructuring and removing bad diction
staging: unisys: fix format string %Lx to %llx for u64
staging: unisys: remove unused struct members
staging: unisys: visorchannel: correct variable misspelling
staging: unisys: visorhba: replace functionlike macro with function
staging: dgnc: Need to check for NULL of ch
staging: dgnc: remove redundant condition check
staging: dgnc: fix 'line over 80 characters'
staging: dgnc: clean up the dgnc_get_modem_info()
staging: lustre: lnet: enable configuration per NI interface
staging: lustre: o2iblnd: properly set ibr_why
staging: lustre: o2iblnd: remove last of kiblnd_tunables_fini
...
Avoiding the out-of-line call to sg_next() reduces the kernel execution
overhead by 10% in some workloads (for example the Unreal Engine 4 demo
Atlantis on 2GiB GTTs) which are dominated by the cost of inserting PTEs
due to texture thrashing. We can demonstrate this in a microbenchmark
that forces us to rebind the object on every execbuf, where we can
measure a 25% improvement, in the time required to execute an execbuf
requiring a texture to be rebound, for inlining the sg_next() for large
texture sizes.
Benchmark: igt/benchmarks/gem_exec_fault
Benchmark: igt/benchmarks/gem_exec_trace/Atlantis
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-5-git-send-email-chris@chris-wilson.co.uk
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.
Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)
One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.
Various uses of for_each_sg_page() are then converted to the new macros.
v2: Force inlining of the sg_iter constructor and make the union
anonymous.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
The recently-added i915_gem_object_pin_map() can be further optimised
for "small" objects. To facilitate this, and simplify the error paths
before adding the new code, this patch pulls out the "mapping" part of
the operation (involving local allocations which must be undone before
return) into its own subfunction.
The next patch will then insert the new optimisation into the middle of
the now-separated subfunction.
This reorganisation will probably not affect the generated code, as the
compiler will most likely inline it anyway, but it makes the logical
structure a bit clearer and easier to modify.
v2:
Restructure loop-over-pages & error check [Chris Wilson]
v3:
Add page count to debug messages [Chris Wilson]
Convert WARN_ON() to GEM_BUG_ON()
v4:
Drop the DEBUG messages [Tvrtko Ursulin]
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-2-git-send-email-chris@chris-wilson.co.uk
This reverts
commit dfaf37baa0
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Mon Dec 7 14:45:20 2015 -0800
drm/i915: Fix idle_frames counter.
and
commit 97173eaf5f
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Tue Jul 7 16:28:55 2015 -0700
drm/i915: PSR: Increase idle_frames
and implements
commit d44b4dcbd1
Author: Rodrigo Vivi <rodrigo.vivi@intel.com>
Date: Fri Nov 14 08:52:31 2014 -0800
drm/i915: HSW/BDW PSR Set idle_frames = VBT + 1
without the hack to use 2 idle frames when VBT says 1. We keep the + 1
just for safety, although I haven't really figured out why that one
exists.
It's nonsense. idle_frames = number of frames where the screen is
entirely idle before we think about entering PSR.
idle_patter = part of link training, and we probably totally butchered
link training because we told the hw to entirely skip it. No wonder
PSR occasionally just fell over.
I suspect the reason we've increased idle frames is that it makes PSR
entry slightly less likely, and more likely to happen in a quite
system, which probably increased the changes the panel came back up
without link training. The proper fix is to implement link training
for PSR.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sonika Jindal <sonika.jindal@intel.com>
Cc: Durgadoss R <durgadoss.r@intel.com>
Cc: "Pandiyan, Dhinakaran" <dhinakaran.pandiyan@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463590036-17824-3-git-send-email-daniel.vetter@ffwll.ch
With intel_pipe_update begin/end we ensure that the mmio updates
don't run during vblank interrupt, using the hw counter we can
be sure that when current vblank count != vblank count at the time
of pipe_update_end the mmio update is complete.
This allows us to use mmio updates on all platforms, using the
update_plane call.
With Chris Wilson's patch to skip waiting for vblanks for
legacy_cursor_update this potentially leaves a small race
condition, in which update_plane can be called with a freed
crtc_state. Because of this commit acf4e84d61
("drm/i915: Avoid stalling on pending flips for legacy cursor updates")
is temporarily reverted.
Changes since v1:
- Split out the flip_work rename.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463490484-19540-9-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>