We track the LRU access for eviction and bump the last access for the
user GGTT on set-to-gtt. When we do so we need to not only bump the
primary GGTT VMA but all partials as well. Similarly we want to
bump the last access tracking for when unpinning an object from the
scanout so that they do not get promptly evicted and hopefully remain
available for reuse on the next frame.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).
v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
With the introduction of the reloc page cache, we are just one step away
from refactoring the relocation write functions into one. Not only does
it tidy the code (slightly), but it greatly simplifies the control logic
much to gcc's satisfaction.
v2: Add selftests to document the relationship between the clflush
flags, the KMAP bit and packing into the page-aligned pointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk
There is an improbable, but not impossible, case that if we leave the
pages unpin as we operate on the object, then somebody via the shrinker
may steal the lock (which lock? right now, it is struct_mutex, THE lock)
and change the cache domains after we have already inspected them.
(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-11-chris@chris-wilson.co.uk
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
When doing relocations, we have to obtain a mapping to the page
containing the target address. This is either a kmap or iomap depending
on GPU and its cache coherency. Neighbouring relocation entries are
typically within the same page and so we can cache our kmapping between
them and avoid those pesky TLB flushes.
Note that there is some sleight-of-hand in how the slow relocate works
as the reloc_entry_cache implies pagefaults disabled (as we are inside a
kmap_atomic section). However, the slow relocate code is meant to be the
fallback from the atomic fast path failing. Fortunately it works as we
already have performed the copy_from_user for the relocation array (no
more pagefaults there) and the kmap_atomic cache is enabled after we
have waited upon an active buffer (so no more sleeping in atomic).
Magic!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk
If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.
Since this is triggerable by the user (along pwrite) we have to remove
the WARN_ON(fence->pin_count).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk
For virtual display feature, by hardcoding 0 for the vblank counter and
-EINVAL for the scanout position return value, we signal to the core DRM code that there are
no hardware counters we can use for these.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit d25bcfb8c2.
I somehow missed that it only compiles on arm64 and broke the driver
rather badly.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Build the legacy semaphore initialisation array using the engine
hardware ids instead of driver internal ones. This makes the
static array size dependent only on the number of gen6 semaphore
engines.
Also makes the per-engine semaphore wait and signal tables
hardware id indexed saving some more space.
v2: Refactor I915_GEN6_NUM_ENGINES to GEN6_SEMAPHORE_LAST. (Chris Wilson)
v3: More polish. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471363461-9973-1-git-send-email-tvrtko.ursulin@linux.intel.com
Put the engine hardware id in the common header so they are
not only associated with the GuC since they are needed for
the legacy semaphores implementation.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We seem to have a bit a mess in how to describe the bus formats, with
a multitude of competing ways. Might be best to consolidate it all and
use MEDIA_BUS_FMT_ also for the hdmi color formats and high color
modes.
Also move all the display_info related functions into drm_connector.c
(there's only one) to group it all together. I did decided against
also moving the edid related display info functions, they seem to fit
better in drm_edid.c. Instead sprinkle a few cross references around.
While at that reduce the kerneldoc for static functions, there's not
point in documenting internals with that much detail really.
v2: Fix typo and move misplaced hunk (Sean).
Cc: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471034937-651-19-git-send-email-daniel.vetter@ffwll.ch