Currently on every submission, we recalculate the ELSP register offset
for the engine, after chasing the pointers to find the iomem base. Since
this is fixed for the lifetime of the driver, record the offset in the
execlists struct.
In practice the difference is negligible, it just happens to remove 27
bytes of eyesore pointer dancing from next to the hottest instruction
(which is itself due to stalling for a cache miss) in perf profiles of
the execlists_submission_tasklet().
v2: Trim off one more elsp local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk
In quite a few places, we have a list iteration over the vma on an
object that only want to inspect GGTT vma. By construction, these are
placed at the start of the list, so we have copied that knowledge into
many callsites. Pull that knowledge back to i915_vma.h and provide a
for_each_ggtt_vma() to tidy up the code.
v2: Add a backreference from vma_create() to remind ourselves why we put
ggtt vma at the head of the obj->vma_list (and ppgtt vma at the tail).
v3: Fixup s/vma/V/
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207211407.31549-1-chris@chris-wilson.co.uk
[airlied: fix conflict in intel_dsi.c]
drm-intel-next-2017-12-01:
- Init clock gate fix (Ville)
- Execlists event handling corrections (Chris, Michel)
- Improvements on GPU Cache invalidation and context switch (Chris)
- More perf OA changes (Lionel)
- More selftests improvements and fixes (Chris, Matthew)
- Clean-up on modules parameters (Chris)
- Clean-up around old ringbuffer submission and hw semaphore on old platforms (Chris)
- More Cannonlake stabilization effort (David, James)
- Display planes clean-up and improvements (Ville)
- New PMU interface for perf queries... (Tvrtko)
- ... and other subsequent PMU changes and fixes (Tvrtko, Chris)
- Remove success dmesg noise from rotation (Chris)
- New DMC for Kabylake (Anusha)
- Fixes around atomic commits (Daniel)
- GuC updates and fixes (Sagar, Michal, Chris)
- Couple gmbus/i2c fixes (Ville)
- Use exponential backoff for all our wait_for() (Chris)
- Fixes for i915/fbdev (Chris)
- Backlight fixes (Arnd)
- Updates on shrinker (Chris)
- Make Hotplug enable more robuts (Chris)
- Disable huge pages (TPH) on lack of a needed workaround (Joonas)
- New GuC images for SKL, KBL, BXT (Sagar)
- Add HW Workaround for Geminilake performance (Valtteri)
- Fixes for PPS timings (Imre)
- More IPS fixes (Maarten)
- Many fixes for Display Port on gen2-gen4 (Ville)
- Retry GPU reset making the recover from hang more robust (Chris)
* tag 'drm-intel-next-2017-12-01' of git://anongit.freedesktop.org/drm/drm-intel: (101 commits)
drm/i915: Update DRIVER_DATE to 20171201
drm/i915/cnl: Mask previous DDI - PLL mapping
drm/i915: Remove unsafe i915.enable_rc6
drm/i915: Sleep and retry a GPU reset if at first we don't succeed
drm/i915: Interlaced DP output doesn't work on VLV/CHV
drm/i915: Pass crtc state to intel_pipe_{enable,disable}()
drm/i915: Wait for pipe to start on i830 as well
drm/i915: Fix vblank timestamp/frame counter jumps on gen2
drm/i915: Fix deadlock in i830_disable_pipe()
drm/i915: Fix has_audio readout for DDI A
drm/i915: Don't add the "force audio" property to DP connectors that don't support audio
drm/i915: Disable DP audio for g4x
drm/i915/selftests: Wake the device before executing requests on the GPU
drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
drm/i915: Tidy up signed/unsigned comparison
drm/i915: Enable IPS with only sprite plane visible too, v4.
drm/i915: Make ips_enabled a property depending on whether IPS is enabled, v3.
drm/i915: Avoid PPS HW/SW state mismatch due to rounding
drm/i915: Skip switch-to-kernel-context on suspend when wedged
drm/i915/glk: Apply WaProgramL3SqcReg1DefaultForPerf for GLK too
...
regression fix for vc4 + rpm stable fix for analogix bridge
* tag 'drm-misc-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc:
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
- Fix for fd.o bug #103997 CNL eDP + HDMI causing a machine hard hang (James)
- Fix to allow suspending with a wedged GPU to hopefully unwedge it (Chris)
- Fix for Gen2 vblank timestap/frame counter jumps (Ville)
- Revert of a W/A for enabling FBC on CNL/GLK for certain images
and sizes (Rodrigo)
- Lockdep fix for i915 userptr code (Chris)
gvt-fixes-2017-12-06
- Fix invalid hw reg read value for vGPU (Xiong)
- Fix qemu warning on PCI ROM bar missing (Changbin)
- Workaround preemption regression (Zhenyu)
* tag 'drm-intel-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-intel:
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
drm/i915/gvt: set max priority for gvt context
drm/i915/gvt: Don't mark vgpu context as inactive when preempted
drm/i915/gvt: Limit read hw reg to active vgpu
drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
drm/i915/gvt: Emulate PCI expansion ROM base address register
drm/i915/cnl: Mask previous DDI - PLL mapping
drm/i915: Fix vblank timestamp/frame counter jumps on gen2
drm/i915: Skip switch-to-kernel-context on suspend when wedged
UAPI Changes:
- Add "panel orientation" property to DRM to indicate orientation of the
panel vs the device's casing (Hans de Goede)
Core Changes:
- misc doc and bug fixes
Driver Changes:
- sun4i: Many improvements to the DE driver like multi-plane support and
YUV formats (Jernej Skrabec)
* tag 'drm-misc-next-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc: (50 commits)
drm/sun4i: Fix uninitialized variables in vi layer
drm/fb-helper: Fix potential NULL pointer dereference
gpu: drm: stm: Adopt SPDX identifiers
gpu: drm: sti: Adopt SPDX identifiers
drm/fsl-dcu: Use drm_mode_config_helper_suspend/resume()
drm/sun4i: Wire in DE2 YUV support
drm/sun4i: Expand DE2 scaler lib with YUV support
drm/sun4i: Add DE2 definitions for YUV formats
drm/sun4i: Add DE2 CSC library
drm/sun4i: Add CCSC property to DE2 configuration
drm/sun4i: Add support for HW scaling to DE2
drm/sun4i: Add scaler configuration to DE2 mixers
drm/sun4i: Add support for DE2 VI planes
drm/sun4i: Reorganize UI layer code in DE2
drm/sun4i: Add support for all HW supported DE2 RGB formats
drm/sun4i: Add multi plane support to DE2 driver
drm/sun4i: Move interlace related code in DE2
drm/sun4i: Move channel size related code in DE2
drm/sun4i: Move line width setting in DE2
drm/sun4i: Use values calculated by atomic check
...
This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a straight
forward rename with no code changes.
One notable exception is the function to_drm_sched_fence(), which is no
longer a inline header function to avoid the need to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
get_modes() callback might be called asynchronously from the DRM core and
it is not synchronized with bridge_enable(), which sets proper runtime PM
state of the main DP device. Fix this by calling pm_runtime_get_sync()
before calling drm_get_edid(), which in turn calls drm_dp_i2c_xfer() and
analogix_dp_transfer() to ensure that main DP device is runtime active
when doing any access to its registers.
This fixes the following kernel issue on Samsung Exynos5250 Snow board:
Unhandled fault: imprecise external abort (0x406) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 406 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 62 Comm: kworker/0:2 Not tainted 4.13.0-rc2-00364-g4a97a3da420b #3357
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events output_poll_execute
task: edc14800 task.stack: edcb2000
PC is at analogix_dp_transfer+0x15c/0x2fc
LR is at analogix_dp_transfer+0x134/0x2fc
pc : [<c0468538>] lr : [<c0468510>] psr: 60000013
sp : edcb3be8 ip : 0000002a fp : 00000001
r10: 00000000 r9 : edcb3cd8 r8 : edcb3c40
r7 : 00000000 r6 : edd3b380 r5 : edd3b010 r4 : 00000064
r3 : 00000000 r2 : f0ad3000 r1 : edcb3c40 r0 : edd3b010
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 4000406a DAC: 00000051
Process kworker/0:2 (pid: 62, stack limit = 0xedcb2210)
Stack: (0xedcb3be8 to 0xedcb4000)
[<c0468538>] (analogix_dp_transfer) from [<c0424ba4>] (drm_dp_i2c_do_msg+0x8c/0x2b4)
[<c0424ba4>] (drm_dp_i2c_do_msg) from [<c0424e64>] (drm_dp_i2c_xfer+0x98/0x214)
[<c0424e64>] (drm_dp_i2c_xfer) from [<c057b2d8>] (__i2c_transfer+0x140/0x29c)
[<c057b2d8>] (__i2c_transfer) from [<c057b4a4>] (i2c_transfer+0x70/0xe4)
[<c057b4a4>] (i2c_transfer) from [<c0441de4>] (drm_do_probe_ddc_edid+0xb4/0x114)
[<c0441de4>] (drm_do_probe_ddc_edid) from [<c0441e5c>] (drm_probe_ddc+0x18/0x28)
[<c0441e5c>] (drm_probe_ddc) from [<c0445728>] (drm_get_edid+0x124/0x2d4)
[<c0445728>] (drm_get_edid) from [<c0465ea0>] (analogix_dp_get_modes+0x90/0x114)
[<c0465ea0>] (analogix_dp_get_modes) from [<c0425e8c>] (drm_helper_probe_single_connector_modes+0x198/0x68c)
[<c0425e8c>] (drm_helper_probe_single_connector_modes) from [<c04325d4>] (drm_setup_crtcs+0x1b4/0xd18)
[<c04325d4>] (drm_setup_crtcs) from [<c04344a8>] (drm_fb_helper_hotplug_event+0x94/0xd0)
[<c04344a8>] (drm_fb_helper_hotplug_event) from [<c0425a50>] (drm_kms_helper_hotplug_event+0x24/0x28)
[<c0425a50>] (drm_kms_helper_hotplug_event) from [<c04263ec>] (output_poll_execute+0x6c/0x174)
[<c04263ec>] (output_poll_execute) from [<c0136f18>] (process_one_work+0x188/0x3fc)
[<c0136f18>] (process_one_work) from [<c01371f4>] (worker_thread+0x30/0x4b8)
[<c01371f4>] (worker_thread) from [<c013daf8>] (kthread+0x128/0x164)
[<c013daf8>] (kthread) from [<c0108510>] (ret_from_fork+0x14/0x24)
Code: 0a000002 ea000009 e2544001 0a00004a (e59537c8)
---[ end trace cddc7919c79f7878 ]---
Reported-by: Misha Komarovskiy <zombah@gmail.com>
CC: stable@vger.kernel.org # v4.10+
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121074936.22520-1-m.szyprowski@samsung.com
With CONFIG_REFCOUNT_FULL enabled, refcount_inc() complains when it's
passed a refcount object that has its counter set to 0. In this driver,
this is a valid use case since we want to increment ->usecnt only when
the BO object starts to be used by real HW components and this is
definitely not the case when the BO is created.
Fix the problem by using refcount_inc_not_zero() instead of
refcount_inc() and fallback to refcount_set(1) when
refcount_inc_not_zero() returns false. Note that this 2-steps operation
is not racy here because the whole section is protected by a mutex
which guarantees that the counter does not change between the
refcount_inc_not_zero() and refcount_set() calls.
Fixes: b9f19259b8 ("drm/vc4: Add the DRM_IOCTL_VC4_GEM_MADVISE ioctl")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122203928.28135-1-boris.brezillon@free-electrons.com
Removed exynos_drm_get_dma_device funtion declaration on top
of exynos_drm_drv.c file.
We can remove this declaration by moving the implementation
of this function upwards.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Removed two descriptions to 'da_start' and 'da_space_size'
from exynos_drm_private structure.
These members don't exist anymore.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
When no IOMMU is available, all GEM buffers allocated by Exynos DRM driver
are contiguous, because of the underlying dma_alloc_attrs() function
provides only such buffers. In such case it makes no sense to keep
BO_NONCONTIG flag for the allocated GEM buffers. This allows to avoid
failures for buffer contiguity checks in the subsequent operations on GEM
objects.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
CC: stable@vger.kernel.org # v4.4+
When IOMMU support was enabled, dma-buf import in Exynos DRM was broken
since commit f43c35966a ("drm/exynos: use real device for DMA-mapping
operations") due to using wrong struct device in drm_gem_prime_import()
function. This patch fixes following kernel BUG caused by incorrect buffer
mapping to DMA address space:
exynos-sysmmu 14650000.sysmmu: 14450000.mixer: PAGE FAULT occurred at 0xb2e00000
------------[ cut here ]------------
kernel BUG at drivers/iommu/exynos-iommu.c:449!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4-next-20171016-00033-g990d723669fd #3165
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
task: c0e0b7c0 task.stack: c0e00000
PC is at exynos_sysmmu_irq+0x1d0/0x24c
LR is at exynos_sysmmu_irq+0x154/0x24c
------------[ cut here ]------------
Reported-by: Marian Mihailescu <mihailescu2m@gmail.com>
Fixes: f43c35966a ("drm/exynos: use real device for DMA-mapping operations")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Just the connector_iter corner-case regression fix.
* tag 'drm-misc-fixes-2017-12-06' of git://anongit.freedesktop.org/drm/drm-misc:
drm: safely free connectors from connector_iter
First feature request for 4.16. Highlights:
- RV and Vega header cleanups
- TTM operation context support
- 48 bit GPUVM fixes for Vega/RV
- More smatch fixes
- ECC support for vega10
- Resizeable BAR support
- Multi-display sync support in DC
- SR-IOV fixes
- Various scheduler improvements
- GPU reset fixes and vram lost tracking
- Clean up DC/powerplay interfaces
- DCN display fixes
- Various DC fixes
* 'drm-next-4.16' of git://people.freedesktop.org/~agd5f/linux: (291 commits)
drm/radeon: Use drm_fb_helper_lastclose() and _poll_changed()
drm/amdgpu: Use drm_fb_helper_lastclose() and _poll_changed()
drm/amd/display: Use drm_fb_helper_poll_changed()
drm/ttm: swap consecutive allocated pooled pages v4
drm/amdgpu: fix amdgpu_sync_resv v2
drm/ttm: swap consecutive allocated cached pages v3
drm/amd/amdgpu: set gtt size according to system memory size only
drm/amdgpu: Get rid of dep_sync as a seperate object.
drm/amdgpu: allow specifying vm_block_size for multi level PDs v2
drm/amdgpu: move validation of the VM size into the VM code
drm/amdgpu: allow non pot VM size values
drm/amdgpu: choose number of VM levels based on VM size
drm/amdgpu: unify VM size handling of Vega10 with older generation
drm/amdgpu: fix amdgpu_vm_num_entries
drm/amdgpu: fix VM PD addr shift
drm/amdgpu: correct vce4.0 fw config for SRIOV (V2)
drm/amd/display: Don't call dm_log_to_buffer directly in dc_conn_log
drm/amd/display: Add dm_logger_append_va API
drm/ttm: Use a static string instead of an array of char *
drm/amd/display: remove usage of legacy_cursor_update
...
When we detect consecutive allocation of pages swap them to avoid
accidentally freeing them as huge page.
v2: use swap
v3: check if it's really the first allocated page
v4: don't touch the loop variable
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Roger He <Hongbo.He@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes a bug introduced by AMDGPU_GEM_CREATE_EXPLICIT_SYNC. We still need
to wait for pipelined moves in the shared fences list.
v2: fix typo
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When we detect consecutive allocation of pages swap them to avoid
accidentally freeing them as huge page.
v2: use swap
v3: check if it's really the first allocated page
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Roger He <Hongbo.He@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch allows specifying the vm_block_size even when multi level
page directories are active.
v2: fix signed/unsigned compare warning
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dm_log_to_buffer logs unconditionally, so calling it directly resulted
in the main message being logged even when the event type isn't enabled
in the event mask.
To fix this, use the new dm_logger_append_va API.
Fixes spurious messages like
[drm] {1920x1200, 2080x1235@154000Khz}
in dmesg when a mode is set.
v2:
* Use new dm_logger_append_va API, fixes incorrect va_list usage in v1
* Just use and decrease entry.buf_offset to get rid of the trailing
newline
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Same as dm_logger_append, except it takes a va_list instead of a
variable number of arguments. dm_logger_append is now a minimal wrapper
around dm_logger_append_va.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the object a bit smaller by using a simple string instead
of a format string and array of char *.
$ size drivers/gpu/drm/ttm/ttm_page_alloc_dma.o*
text data bss dec hex filename
8820 216 4136 13172 3374 drivers/gpu/drm/ttm/ttm_page_alloc_dma.o.defconfig.new
8910 216 4136 13262 33ce drivers/gpu/drm/ttm/ttm_page_alloc_dma.o.defconfig.old
25383 5044 4384 34811 87fb drivers/gpu/drm/ttm/ttm_page_alloc_dma.o.allyesconfig.new
25797 5428 4384 35609 8b19 drivers/gpu/drm/ttm/ttm_page_alloc_dma.o.allyesconfig.old
Miscellanea:
o The h array had more entries than were emitted, all are now removed
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently the atomic check code uses legacy_cursor_update
to differnetiate if the cursor plane is being requested by
the user, which is not required as we shall be updating
plane only if modeset is requested/required.
Have tested cursor plane and underlay get updated seamlessly,
without any lag or frame drops.
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>