Currently i915_reset.c mixes calls to intel_uncore, pci and our old
style I915_READ mmio interfaces. Cast aside the old implicit macros,
and harmonise on using uncore throughout.
add/remove: 1/1 grow/shrink: 0/4 up/down: 65/-207 (-142)
Function old new delta
rmw_register - 65 +65
gen8_reset_engines 945 942 -3
g4x_do_reset 407 376 -31
intel_gpu_reset 545 509 -36
clear_register 63 - -63
i915_clear_error_registers 461 387 -74
A little bit of pointer dancing elimination works wonders.
v2: Roll up the helpers into intel_uncore for general use
With the helpers gcc was a little more eager to inline:
add/remove: 0/1 grow/shrink: 1/3 up/down: 99/-133 (-34)
Function old new delta
i915_clear_error_registers 461 560 +99
gen8_reset_engines 945 942 -3
g4x_do_reset 407 376 -31
intel_gpu_reset 545 509 -36
clear_register 63 - -63
Total: Before=1544400, After=1544366, chg -0.00%
Win some, lose some, gcc is gcc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405202419.3093-1-chris@chris-wilson.co.uk
The PDP registers are an oddity inside the set of context saved
registers in that they take the engine as a parameter to the macro and
not the mmio_base as the others do. Make it accept the engine->mmio_base
for consistency in programming the context registers.
add/remove: 0/0 grow/shrink: 2/1 up/down: 3/-32 (-29)
Function old new delta
emit_ppgtt_update 324 326 +2
capture 5102 5103 +1
execlists_init_reg_state.isra 1128 1096 -32
And similar savings later!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190405123831.9724-1-chris@chris-wilson.co.uk
When we introduced preemption, we chose to keep it disabled for gen8 as
supporting preemption inside GPGPU user batches required various w/a in
userspace. Since then, the desire to preempt long queues of requests
between batches (e.g. within busywaiting semaphores) has grown. So allow
arbitration within the busywaits and between requests, but disable
arbitration within user batches so that we can preempt between requests
and not risk breaking GPGPU.
However, since this preemption is much coarser and doesn't interfere
with userspace, we decline to include it amongst the scheduler
capabilities. (This is also required for us to skip over the preemption
selftests that expect to be able to preempt user batches.)
Michal suggested that we could perhaps allow preemption inside gen8
userspace batches if we can satisfy ourselves that the default
preemption settings are viable with existing userspace (principally
OpenCL which already should carry any known workaround). We could then
merge the two code paths back into one, even dropping the artifical
has-preemption device feature flag.
Testcase: igt/gem_exec_scheduler/semaphore-user
References: beecec9017 ("drm/i915/execlists: Preemption!")
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-chris@chris-wilson.co.uk
During request construction, we take the timeline->mutex to ensure
exclusive access to the ringbuffer (for command emission) and the
timeline itself (for command ordering). The timeline->mutex should not
be dropped by callers until we release it in i915_request_add().
lockdep provides a pin/unpin lock facility to detect accidental unlocks
inside critical sections, so put it to use for request construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190403082132.327-1-chris@chris-wilson.co.uk
The below commits added dummy files to test that certain headers are
self-contained, i.e. compilable as standalone units:
39e2f501c1 ("drm/i915: Split struct intel_context definition to its own header")
3a891a6267 ("drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h")
8b74594aa4 ("drm/i915: Split out i915_priolist_types into its own header")
The idea is fine, but the implementation is a bit tedious and
inflexible, and does not really scale well.
Implement the same in make using autogenerated dummy sources to include
the headers.
v2 by Chris:
- Use patsubst
- Add .gitignore
- Add clean-files for generated dummy sources
v3 by Jani:
- Fix make clean
- Add the tests to i915-y instead of extra-y
v4 by Jani:
- quiet_cmd whitespace fix
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190403135236.8398-1-jani.nikula@intel.com
drivers/gpu/drm/i915/gvt/display.c:457: warning: Function parameter or member 'connected' not described in 'intel_vgpu_emulate_hotplug'
drivers/gpu/drm/i915/gvt/display.c:457: warning: Excess function parameter 'conncted' description in 'intel_vgpu_emulate_hotplug'
Fixes: 1ca20f33df ("drm/i915/gvt: add hotplug emulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
stride isn't in unit of pixel, it is bytes, so calculation of
plane size doesn't need to multiple bpp.
Fixes: e546e281d3 ("drm/i915/gvt: Dmabuf support for GVT-g")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
On ivb+ we can select between the regular 10bit LUT mode with
1024 entries, and the split mode where the LUT is split into
seprate degamma and gamma halves (each with 512 entries). Currently
we expose the split gamma size of 512 as the GAMMA/DEGAMMA_LUT_SIZE.
When using only degamma or gamma (not both) we are wasting half of
the hardware LUT entries. Let's flip that around so that we expose
the full 1024 entries and just throw away half of the user provided
entries when using the split gamma mode.
Cc: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401200231.2333-8-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Just so we don't leave gen2/3 out in the cold let's advertize the
legacy LUT via the GAMMA_LUT/GAMMA_LUT_SIZE props. Without the
GAMMA_LUT prop we can't actually load a LUT using the atomic ioctl
(in preparation for the day of 100% atomic driver).
Supposedly some gen2/3 platforms have an interpolated 10bit gamma mode
as well. It's slightly funkier than the i965+ mode since you have to
specify the slope for the interpolation by hand. But when I tried it
I couldn't get it to work, the hardware just insisted on using the
8bit more regardless of the state of the relevant PIPECONF bit.
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401200231.2333-7-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Using the split gamma mode when we don't have to has the annoying
requirement of loading a linear LUT to the unused half. Instead
let's make life simpler by switching to the 10bit gamma mode
and duplicating each entry.
This also allows us to load the software gamma LUT into the
hardware degamma LUT, thus removing some of the buggy
configurations we currently allow (YCbCr/limited range RGB
+ gamma LUT). We do still have other configurations that are
also buggy, but those will need more complicated fixes
or they just need to be rejected. Sadly GLK doesn't have
this flexibility anymore and the degamma and gamma LUTs
are very different so no help there.
v2: Apply a mask when checking gamma_mode on icl since it
contains more bits than just the gamma mode
v3: Rebase due to EXT_GC_MAX/EXT2_GC_MAX changes
v4: s/advertize/advertise/ (Uma)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401200231.2333-3-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
If we have only a single active pipe and the cdclk change only requires
the cd2x divider to be updated bxt+ can do the update with forcing a full
modeset on the pipe. Try to hook that up.
v2:
- Wait for vblank after an optimized CDCLK change.
- Avoid optimization if the pipe needs a modeset (or was disabled).
- Split CDCLK change to a pre/post plane update step.
v3:
- Use correct version of CDCLK state as old state. (Ville)
- Remove unused intel_cdclk_can_skip_modeset()
v4:
- For consistency call intel_set_cdclk_post_plane_update() only during
modesets (and not fastsets).
v5:
- Remove the logic to update the CD2X divider on-the-fly on ICL, since
only a divider of 1 is supported there. Clint also noticed that the
pipe select bits in CDCLK_CTL are oddly defined on ICL, it's not clear
yet whether that's only an error in the specification.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Abhay Kumar <abhay.kumar@intel.com>
Tested-by: Abhay Kumar <abhay.kumar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327101321.3095-1-imre.deak@intel.com
CDCLK has to be at least twice the BLCK regardless of audio. Audio
driver has to probe using this hook and increase the clock even in
absence of any display.
v2: Use atomic refcount for get_power, put_power so that we can
call each once(Abhay).
v3: Reset power well 2 to avoid any transaction on iDisp link
during cdclk change(Abhay).
v4: Remove Power well 2 reset workaround(Ville).
v5: Remove unwanted Power well 2 register defined in v4(Abhay).
v6:
- Use a dedicated flag instead of state->modeset for min CDCLK changes
- Make get/put audio power domain symmetric
- Rebased on top of intel_wakeref tracking changes.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Abhay Kumar <abhay.kumar@intel.com>
Tested-by: Abhay Kumar <abhay.kumar@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190320135439.12201-1-imre.deak@intel.com
We have two *_CLASS_DEVICE kernel config options (LCD_CLASS_DEVICE
and BACKLIGHT_LCD_DEVICE) that do the same job.
The patch removes useless BACKLIGHT_LCD_SUPPORT option
and converts LCD_CLASS_DEVICE into a menu.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Ideally we only need one semaphore per ring to accommodate waiting on
multiple engines in parallel. However, since we do not know which fences
we will finally be waiting on, we emit a semaphore for every fence. It
turns out to be quite easy to trick ourselves into exhausting our
ringbuffer causing an error, just by feeding in a batch that depends on
several thousand contexts.
Since we never can be waiting on more than one semaphore in parallel
(other than perhaps the desire to busywait on multiple engines), just
pick the first fence for our semaphore. If we pick the wrong fence to
busywait on, we just miss an opportunity to reduce latency.
An adaption might be to use sched.flags as either a semaphore counter,
or to track the first busywait on each engine, converting it back to a
single use bit prior to closing the request.
v2: Track first semaphore used per-engine (this caters for our basic
igt that semaphores are working).
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_exec_fence/long-history
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If the user passes in a pointer to a GGTT mmaping of the same buffer
being written to, we can hit a deadlock in acquiring the shmemfs page
(once as the write destination and then as the read source).
[<0>] io_schedule+0xd/0x30
[<0>] __lock_page+0x105/0x1b0
[<0>] find_lock_entry+0x55/0x90
[<0>] shmem_getpage_gfp+0xbb/0x800
[<0>] shmem_read_mapping_page_gfp+0x2d/0x50
[<0>] shmem_get_pages+0x158/0x5d0 [i915]
[<0>] ____i915_gem_object_get_pages+0x17/0x90 [i915]
[<0>] __i915_gem_object_get_pages+0x57/0x70 [i915]
[<0>] i915_gem_fault+0x1b4/0x5c0 [i915]
[<0>] __do_fault+0x2d/0x80
[<0>] __handle_mm_fault+0xad4/0xfb0
[<0>] handle_mm_fault+0xe6/0x1f0
[<0>] __do_page_fault+0x18f/0x3f0
[<0>] page_fault+0x1b/0x20
[<0>] copy_user_enhanced_fast_string+0x7/0x10
[<0>] _copy_from_user+0x37/0x60
[<0>] shmem_pwrite+0xf0/0x160 [i915]
[<0>] i915_gem_pwrite_ioctl+0x14e/0x520 [i915]
[<0>] drm_ioctl_kernel+0x81/0xd0
[<0>] drm_ioctl+0x1a7/0x310
[<0>] do_vfs_ioctl+0x88/0x5d0
[<0>] ksys_ioctl+0x35/0x70
[<0>] __x64_sys_ioctl+0x11/0x20
[<0>] do_syscall_64+0x39/0xe0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
We can reduce (but not eliminate!) the chance of this happening by
faulting the user_data before we take the page lock in
pagecache_write_begin(). One way to eliminate the potential recursion
here is by disabling pagefaults for the copy, and handling the fallback
to use an alternative method -- so convert to use kmap_atomic (which
should disable preemption and pagefaulting for the copy) and report
ENODEV instead of EFAULT so that our caller tries again with a different
copy mechanism -- we already check that the page should have been
faultable so a false negative should be rare.
Testcase: igt/gem_pwrite/self
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401133909.31203-1-chris@chris-wilson.co.uk
Concept of a sub-platform already exist in our code (like ULX and ULT
platform variants and similar),implemented via the macros which check a
list of device ids to determine a match.
With this patch we consolidate device ids checking into a single function
called during early driver load.
A few low bits in the platform mask are reserved for sub-platform
identification and defined as a per-platform namespace.
At the same time it future proofs the platform_mask handling by preparing
the code for easy extending, and tidies the very verbose WARN strings
generated when IS_PLATFORM macros are embedded into a WARN type
statements.
v2: Fixed IS_SUBPLATFORM. Updated commit msg.
v3: Chris was right, there is an ordering problem.
v4:
* Catch-up with new sub-platforms.
* Rebase for RUNTIME_INFO.
* Drop subplatform mask union tricks and convert platform_mask to an
array for extensibility.
v5:
* Fix subplatform check.
* Protect against forgetting to expand subplatform bits.
* Remove platform enum tallying.
* Add subplatform to error state. (Chris)
* Drop macros and just use static inlines.
* Remove redundant IRONLAKE_M. (Ville)
v6:
* Split out Ironlake change.
* Optimize subplatform check.
* Use __always_inline. (Lucas)
* Add platform_mask comment. (Paulo)
* Pass stored runtime info in error capture. (Chris)
v7:
* Rebased for new AML ULX device id.
* Bump platform mask array size for EHL.
* Stop mentioning device ids in intel_device_subplatform_init by using
the trick of splitting macros i915_pciids.h. (Jani)
* AML seems to be either a subplatform of KBL or CFL so express it like
that.
v8:
* Use one device id table per subplatform. (Jani)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327142328.31780-1-tvrtko.ursulin@linux.intel.com
According to Intel GFX PRM on 01.org, plane surface address can be updated
synchronously or asynchronously. Synchronous flip will hold plane surface
address update to start of next vsync, which is current implementation.
Asynchronous flip will update the address as soon as possible. Without
async flip, some 3D application could not reach better performance and
the maximum performance is no higher than vsync frequency.
The patch enables the async flip on plane surface address mmio update,
and increment flip count correctly.
With async flip enabled, some 3D applications have significant performance
improvement. i.e. 3DMark Ice Storm has a 300%~400% increment on score.
v2:
Use bit operation definition for flip mode. (zhenyu)
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
According to Intel GFX PRM on 01.org, the MI_DISPLAY_FLIP command can
either request display plane flip synchronously or asynchronously.
In synchronous flip, flip will be hold until next vsync, which
is not implemented yet in GVT. In asynchronous flip, flip will happen
immediately, which is current implementation.
The patch enables the sync flip on handling MI_DISPLAY_FLIP,
and increment flip count correctly by only increment on primary plane.
v2:
Use bit operation definition for flip mode. (zhenyu)
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>