Commit 1413b2bc07 ("drm/i915: Trim NEWCLIENT boosting") had the
intended consequence of not allowing a sequence of work that merely
crossed into a new engine the privilege to be promoted to NEWCLIENT
status. It also had the unintended consequence of actually making
NEWCLIENT effective on heavily oversubscribed transcode machines and
impacting upon their throughput.
If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
those clients, using the NEWCLIENT boost that will be scheduled as
rcsA x 30, (rcsB, vcs) x 30
where as before it would have been
(rcsA, rcsB, vcs) x 30
That is with NEWCLIENT only boosting the first request of each client,
we would execute all rcsA requests prior to running on the vcs engines;
acruing a lot of dead time as compared to the previous case where the
vcs engine would be started in parallel to processing the second client.
The previous patch has the effect of delaying submission until it is
required by a third party (either the user with an explicit wait, or by
another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
which has the effect of removing its preemptive grant and reducing it to
the same level as any other user interaction -- that it will not be
promoted above the interengine dependencies, and so preventing NEWCLIENTS
from starving other engines. This a large nerf to the rrul properties of
the current NEWCLIENT, but it still does give prioritised submission to
new requests from light workloads.
References: b16c765122 ("drm/i915: Priority boost for new clients")
Fixes: 1413b2bc07 ("drm/i915: Trim NEWCLIENT boosting") # customer impact
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
The handling of the no-preemption priority level imposes the restriction
that we need to maintain the implied ordering even though preemption is
disabled. Otherwise we may end up with an AB-BA deadlock across multiple
engine due to a real preemption event reordering the no-preemption
WAITs. To resolve this issue we currently promote all requests to WAIT
on unsubmission, however this interferes with the timeslicing
requirement that we do not apply any implicit promotion that will defeat
the round-robin timeslice list. (If we automatically promote the active
request it will go back to the head of the queue and not the tail!)
So we need implicit promotion to prevent reordering around semaphores
where we are not allowed to preempt, and we must avoid implicit
promotion on unsubmission. So instead of at unsubmit, if we apply that
implicit promotion on adding the dependency, we avoid the semaphore
deadlock and we also reduce the gains made by the promotion for user
space waiting. Furthermore, by keeping the earlier dependencies at a
higher level, we reduce the search space for timeslicing without
altering runtime scheduling too badly (no dependencies at all will be
assigned a higher priority for rrul).
v2: Limit the bump to external edges (as originally intended) i.e.
between contexts and out to the user.
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
In commit b7404c7ecb ("drm/i915: Bump ready tasks ahead of
busywaits"), I tried cutting a corner in order to not install a signal
for each of our dependencies, and only listened to requests on which we
were intending to busywait. The compromise that was made was that
instead of then being able to promote the request with a full
NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
had cleared the semaphore chain, we settled for only using the NEWCLIENT
boost. With an over saturated system with multiple NEWCLIENTS in flight
at any time, this was found to be an inadequate promotion and left us
with a much poorer scheduling order than prior to using semaphores.
The outcome of this patch, is that all requests have NOSEMAPHORE
priority when they have no dependencies and are ready to run and not
busywait, restoring the pre-semaphore ordering on saturated systems.
We can demonstrate the effect of poor scheduling order by oversaturating
the system using gem_wsim on a system with multiple vcs engines
(i.e running the same workloads across more clients than required for
peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):
x v5.1 (normalized)
+ tip
* fix
+------------------------------------------------------------------------+
| x |
| x |
| x |
| x |
| %x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %%x |
| %#x |
| %#x |
| %#x |
| %#x |
| %#x |
| + %#xx |
| + %#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%#xx |
| + %%##x |
| +++ %%##x |
| +++ %%##x |
| +++ %%##x |
| ++++ %%##x |
| ++++ %%##x |
| ++++ %%##xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ %###xx |
| ++++ + %#O#xx |
| ++++ + %#O#xx |
| ++++++ + %#O#xx |
| ++++++++++ %OOOxxx|
| ++++++++++ + %#OOO#xx|
| + ++++++++++++ ++ +++++ + ++ @@OOOO#xx|
| |A_| |
||__________M_______A____________________| |
| |A_| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139
+ 120 0.873021 1.00037 0.884134 0.90148752 0.039190862
Difference at 99.5% confidence
-0.098667 +/- 0.0110762
-9.86517% +/- 1.10745%
(Student's t, pooled s = 0.0277657)
% 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024
Difference at 99.5% confidence
-0.003157 +/- 0.000908245
-0.315651% +/- 0.0908105%
(Student's t, pooled s = 0.00227678)
Fixes: b7404c7ecb ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
Add a drm_mode_set_crtcinfo() call in our CRTC's mode_fixup callback
to ensure that any adjustments to the mode made by connectors etc are
properly accounted for by the CRTC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Move the plane address and pitch calculations to atomic_check rather
than the update function, so we don't have to probe the interlace
setting for the CRTC while updating the plane.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Interlaced support has been missing from the overlay frame, which is
sub-optimal. Add support for this missing feature.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When the CRTC is programmed for interlace, we have to halve the Y
parameters for the plane. Rather than doing this in the update
function (which would need the calculation repeated for the old
state as well as the new state), arrange to do the calculation in
atomic_check and save it in our private plane state structure.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Add accessors for getting the register values for the plane from the
plane state. This will allow us to generate the values when validating
the plane rather than when programming, which allows us to fix the
interlace handling without adding lots of additional handling in the
update functions.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Use the __drm_atomic_helper_plane_reset() helper in the overlay reset
code to ensure that generic features are correctly reset in future.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
We support interlace, but this was broken when we could no longer get
a ref on the vblank interrupt. Arrange to get the ref on the vblank
interrupt after we've re-enabled vblank, and put it before we disable
the vblank.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The original bochs and vbox implementations of pin and unpin functions
automatically reserved BOs during validation. This functionality got lost
while converting the code to a generic implementation. This may result
in validating unlocked TTM BOs.
Adding the reserve and unreserve operations to GEM VRAM's pin and unpin
functions fixes the bochs and vbox drivers. Additionally the patch changes
the mgag200, ast and hibmc drivers to not reserve BOs by themselves.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: kernel test robot <lkp@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20190516162746.11636-3-tzimmermann@suse.de
Fixes: a3232987fd ("drm/bochs: Convert bochs driver to |struct drm_gem_vram_object|")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Pull more clk framework updates from Stephen Boyd:
"One more patch to remove io.h from clk-provider.h.
We used to need this include when we had clk_readl() and clk_writel(),
but those are gone now so this patch pushes the dependency out to the
users of clk-provider.h"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Remove io.h from clk-provider.h
We were setting the wrong flags to enable PTI errors, so we were
seeing reads to invalid PTEs show up as write errors. Also, we
weren't turning on the interrupts. The AXI IDs we were dumping
included the outstanding write number and so they looked basically
random. And the VIO_ADDR decoding was based on the MMU VA_WIDTH for
the first platform I worked on and was wrong on others. In short,
this was a thorough mess from early HW enabling.
Tested on V3D 4.1 and 4.2 with intentional L2T, CLE, PTB, and TLB
faults.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419001014.23579-4-eric@anholt.net
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Pull drm fixes from Dave Airlie:
"A bunch of fixes for the merge window closure, doesn't seem to be
anything too major or serious in there.
It does add TU117 turing modesetting to nouveau but it's just an
enable for preexisting code.
amdgpu:
- gpu reset at load crash fix
- ATPX hotplug fix for when dGPU is off
- SR-IOV fixes
radeon:
- r5xx pll fixes
i915:
- GVT (MCHBAR, buffer alignment, misc warnings fixes)
- Fixes for newly enabled semaphore code
- Geminilake disable framebuffer compression
- HSW edp fast modeset fix
- IRQ vs RCU race fix
nouveau:
- Turing modesetting fixes
- TU117 support
msm:
- SDM845 bringup fixes
panfrost:
- static checker fixes
pl111:
- spinlock init fix.
bridge:
- refresh rate register fix for adv7511"
* tag 'drm-next-2019-05-16' of git://anongit.freedesktop.org/drm/drm: (36 commits)
drm/msm: Upgrade gxpd checks to IS_ERR_OR_NULL
drm/msm/dpu: Remove duplicate header
drm/pl111: Initialize clock spinlock early
drm/msm: correct attempted NULL pointer dereference in debugfs
drm/msm: remove resv fields from msm_gem_object struct
drm/nouveau: fix duplication of nv50_head_atom struct
drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration
drm/nouveau/core: initial support for boards with TU117 chipset
drm/nouveau/core: allow detected chipset to be overridden
drm/nouveau/kms/gf119-gp10x: push HeadSetControlOutputResource() mthd when encoders change
drm/nouveau/kms/nv50-: fix bug preventing non-vsync'd page flips
drm/nouveau/kms/gv100-: fix spurious window immediate interlocks
drm/bridge: adv7511: Fix low refresh rate selection
drm/panfrost: Add missing _fini() calls in panfrost_device_fini()
drm/panfrost: Only put sync_out if non-NULL
drm/i915: Seal races between async GPU cancellation, retirement and signaling
drm/i915: Fix fastset vs. pfit on/off on HSW EDP transcoder
drm/i915/fbc: disable framebuffer compression on GeminiLake
drm/amdgpu/psp: move psp version specific function pointers to early_init
drm/radeon: prefer lower reference dividers
...
Loop N1 instruction delay for burst mode devices are computed
based on horizontal sync and porch timing values.
The current driver is using u16 type for computing this hsync_porch
value, which would failed to fit within the u16 type for large sync
and porch timings devices. This would result in hsync_porch overflow
and eventually computed wrong instruction delay value.
Example, timings, where it produces the overflow
{
.hdisplay = 1080,
.hsync_start = 1080 + 408,
.hsync_end = 1080 + 408 + 4,
.htotal = 1080 + 408 + 4 + 38,
}
It reproduces the desired delay value 65487 but the correct working
value should be 7.
So, Fix it by computing hsync_porch value separately with u32 type.
Fixes: 1c1a7aa366 ("drm/sun4i: dsi: Add burst support")
Signed-off-by: Jagan Teki <jagan@amarulasolutions.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190512184128.13720-2-jagan@amarulasolutions.com
Current code initializes HDMI PHY clock driver before reset line is
deasserted and clocks enabled. Because of that, initial readout of
clock divider is incorrect (0 instead of 2). This causes any clock
rate with divider 1 (register value 0) to be set incorrectly.
Fix this by moving initialization of HDMI PHY clock driver after reset
line is deasserted and clocks enabled.
Cc: stable@vger.kernel.org # 4.17+
Fixes: 4f86e81748 ("drm/sun4i: Add support for H3 HDMI PHY variant")
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190514204337.11068-2-jernej.skrabec@siol.net
The data structure |struct drm_vram_mm| and its helpers replace hibmc's
TTM-based memory manager. It's the same implementation; except for the
type names.
v5:
* set .llseek via DRM_VRAM_MM_FILE_OPERATIONS
v4:
* don't select DRM_TTM or DRM_VRAM_MM_HELPER
v3:
* use drm_gem_vram_mm_funcs
* convert driver to drm_device-based instance
v2:
* implement hibmc_mmap() with drm_vram_mm_mmap()
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-21-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The data structure |struct drm_vram_mm| and its helpers replace vboxvideo's
TTM-based memory manager. It's the same implementation; except for the type
names.
v4:
* don't select DRM_TTM or DRM_VRAM_MM_HELPER
v3:
* use drm_gem_vram_mm_funcs
* convert driver to drm_device-based instance
v2:
* implement vbox_mmap() with drm_vram_mm_mmap()
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-19-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch replaces |struct vbox_bo| and its helpers with the generic
implementation of |struct drm_gem_vram_object|. The only change in
semantics is that &ttm_bo_driver.verify_access() now does the actual
verification.
v4:
* select config option DRM_VRAM_HELPER
v3:
* remove forward declaration of struct vbox_gem_object
v2:
nothing
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-18-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The data structure |struct drm_vram_mm| and its helpers replace mgag200's
TTM-based memory manager. It's the same implementation; except for the type
names.
v4:
* don't select DRM_TTM or DRM_VRAM_MM_HELPER
v3:
* use drm_gem_vram_mm_funcs
* convert driver to drm_device-based instance
v2:
* implement mgag200_mmap() with drm_vram_mm_mmap()
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-16-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The data structure |struct drm_vram_mm| and its helpers replace bochs'
TTM-based memory manager. It's the same implementation; except for the
type names.
v5:
* set .llseek via DRM_VRAM_MM_FILE_OPERATIONS
v4:
* don't select DRM_TTM or DRM_VRAM_MM_HELPER
v3:
* use drm_gem_vram_mm_funcs
* convert driver to drm_device-based instance
v2:
* implement bochs_mmap() with drm_vram_mm_mmap()
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-14-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The data structure |struct drm_vram_mm| and its helpers replace ast's
TTM-based memory manager. It's the same implementation; except for the
type names.
v4:
* don't select DRM_TTM or DRM_VRAM_MM_HELPER
v3:
* use drm_gem_vram_mm_funcs
* convert driver to drm_device-based instance
v2:
* implement ast_mmap() with drm_vram_mm_mmap()
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: http://patchwork.freedesktop.org/patch/msgid/20190508082630.15116-11-tzimmermann@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>