Prefer to defer activating our GEM shrinker until we have a few
megabytes to free; or we have accumulated sufficient mempressure by
deferring the reclaim to force a shrink. The intent is that because our
objects may typically be large, we are too effective at shrinking and
are not rewarded for freeing more pages than the batch. It will also
defer the initial shrinking to hopefully put it at a lower priority than
say the buffer cache (although it will balance out over a number of
reclaims, with GEM being more bursty).
v2: Give it a feedback system to try and tune the batch size towards
an effective size for the available objects.
v3: Start keeping track of shrinker stats in debugfs
v4: Protect against finding no shrinkable objects (div-by-zero)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-7-chris@chris-wilson.co.uk
Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit
bed50aea61 ("drm/i915/shrinker: Flush active on objects before
counting")).
v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016114037.5556-1-chris@chris-wilson.co.uk
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.
Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.
v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
This was the competing idea long ago, but it was only with the rewrite
of the idr as an radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
When reading the i915_energy_uJ debugfs file, it tries to fetch
MSR_RAPL_POWER_UNIT, which might not be available, like in a vm
environment, causing the exception shown below.
We can easily prevent it by doing a rdmsrl_safe read instead, which will
handle the exception, allowing us to abort the debugfs file read.
This was caught by the new igt@debugfs_test@read_all_entries testcase in
the CI.
unchecked MSR access error: RDMSR from 0x606 at rIP:0xffffffffa0078f66
(i915_energy_uJ+0x36/0xb0 [i915])
Call Trace:
seq_read+0xdc/0x3a0
full_proxy_read+0x4f/0x70
__vfs_read+0x23/0x120
? putname+0x4f/0x60
? rcu_read_lock_sched_held+0x75/0x80
? entry_SYSCALL_64_fastpath+0x5/0xb1
vfs_read+0xa0/0x150
SyS_read+0x44/0xb0
entry_SYSCALL_64_fastpath+0x1c/0xb1
RIP: 0033:0x7f1f5e9f4500
RSP: 002b:00007ffc77e65cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: ffffffff8146e003 RCX: 00007f1f5e9f4500
RDX: 0000000000000200 RSI: 00007ffc77e65d10 RDI: 0000000000000006
RBP: ffffc900007abf88 R08: 0000000001eaff20 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000006 R14: 0000000000000005 R15: 0000000001eb94db
? __this_cpu_preempt_check+0x13/0x20
v2:
- Drop unsigned long long cast and improve calculation (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101901
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/87o9s7zrx3.fsf@dilma.collabora.co.uk
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull more drm updates from Dave Airlie:
"i915, amd and some core fixes + mediatek color support.
Some fixes tree came in since the main pull request for rc1, primarily
i915 and drm-misc and one amd fix. The drm core vblank regression fix
is probably the most important thing.
I've also added the mediatek feature pull, it wasn't that big and
didn't look like it would have any impact outside of mediatek, in fact
it looks to just be a single feature, and some cleanups"
* tag 'drm-fixes-for-v4.13-rc1' of git://people.freedesktop.org/~airlied/linux: (31 commits)
drm/i915: Make DP-MST connector info work
drm/i915/gvt: Use fence error from GVT request for workload status
drm/i915/gvt: remove scheduler_mutex in per-engine workload_thread
drm/i915/gvt: Revert "drm/i915/gvt: Fix possible recursive locking issue"
drm/i915/gvt: Audit the command buffer address
drm/i915/gvt: Fix a memory leak in intel_gvt_init_gtt()
drm/rockchip: fix NULL check on devm_kzalloc() return value
drm/i915/fbdev: Check for existence of ifbdev->vma before operations
drm/radeon: Fix eDP for single-display iMac10,1 (v2)
drm/i915: Hold RPM wakelock while initializing OA buffer
drm/i915/cnl: Fix the CURSOR_COEFF_MASK used in DDI Vswing Programming
drm/i915/cfl: Fix Workarounds.
drm/i915: Avoid undefined behaviour of "u32 >> 32"
drm/i915: reintroduce VLV/CHV PFI programming power domain workaround
drm/i915: Fix an error checking test
drm/i915: Disable MSI for all pre-gen5
drm/atomic: Add missing drm_atomic_state_clear to atomic_remove_fb
drm: vblank: Fix vblank timestamp update
drm/i915/gvt: Make function dpy_reg_mmio_readx safe
drm/mediatek: separate color module to fixup error memory reallocation
...
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm, I think I've got one later
driver pull for mediatek SoC driver, I'm undecided on if it needs to
go to you yet.
Otherwise summary below:
Core drm:
- Atomic add driver private objects
- Deprecate preclose hook in modern drivers
- MST bandwidth tracking
- Use kvmalloc in more places
- Add mode_valid hook for crtc/encoder/bridge
- Reduce sync_file construction time
- Documentation updates
- New DRM synchronisation object support
New drivers:
- pl111 - pl111 CLCD display controller
Panel:
- Innolux P079ZCA panel driver
- Add NL12880B20-05, NL192108AC18-02D, P320HVN03 panels
- panel-samsung-s6e3ha2: Add s6e3hf2 panel support
i915:
- SKL+ watermark fixes
- G4x/G33 reset improvements
- DP AUX backlight improvements
- Buffer based GuC/host communication
- New getparam for (sub)slice infomation
- Cannonlake and Coffeelake initial patches
- Execbuf optimisations
radeon/amdgpu:
- Lots of Vega10 bug fixes
- Preliminary raven support
- KIQ support for compute rings
- MEC queue management rework
- DCE6 Audio support
- SR-IOV improvements
- Better radeon/amdgpu selection support
nouveau:
- HDMI stereoscopic support
- Display code rework for >= GM20x GPUs
msm:
- GEM rework for fine-grained locking
- Per-process pagetable work
- HDMI fixes for Snapdragon 820.
vc4:
- Remove 256MB CMA limit from vc4
- Add out-fence support
- Add support for cygnus
- Get/set tiling ioctls support
- Add T-format tiling support for scanout
zte:
- add VGA support.
etnaviv:
- Thermal throttle support for newer GPUs
- Restore userspace buffer cache performance
- dma-buf sync fix
stm:
- add stm32f429 display support
exynos:
- Rework vblank handling
- Fixup sw-trigger code
sun4i:
- V3s display engine support
- HDMI support for older SoCs
- Preliminary work on dual-pipeline SoCs.
rcar-du:
- VSP work
imx-drm:
- Remove counter load enable from PRE
- Double read/write reduction flag support
tegra:
- Documentation for the host1x and drm driver.
- Lots of staging ioctl fixes due to grate project work.
omapdrm:
- dma-buf fence support
- TILER rotation fixes"
* tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux: (1270 commits)
drm: Remove unused drm_file parameter to drm_syncobj_replace_fence()
drm/amd/powerplay: fix bug fail to remove sysfs when rmmod amdgpu.
amdgpu: Set cik/si_support to 1 by default if radeon isn't built
drm/amdgpu/gfx9: fix driver reload with KIQ
drm/amdgpu/gfx8: fix driver reload with KIQ
drm/amdgpu: Don't call amd_powerplay_destroy() if we don't have powerplay
drm/ttm: Fix use-after-free in ttm_bo_clean_mm
drm/amd/amdgpu: move get memory type function from early init to sw init
drm/amdgpu/cgs: always set reference clock in mode_info
drm/amdgpu: fix vblank_time when displays are off
drm/amd/powerplay: power value format change for Vega10
drm/amdgpu/gfx9: support the amdgpu.disable_cu option
drm/amd/powerplay: change PPSMC_MSG_GetCurrPkgPwr for Vega10
drm/amdgpu: Make amdgpu_cs_parser_init static (v2)
drm/amdgpu/cs: fix a typo in a comment
drm/amdgpu: Fix the exported always on CU bitmap
drm/amdgpu/gfx9: gfx_v9_0_enable_gfx_static_mg_power_gating() can be static
drm/amdgpu/psp: upper_32_bits/lower_32_bits for address setup
drm/amd/powerplay/cz: print message if smc message fails
drm/amdgpu: fix typo in amdgpu_debugfs_test_ib_init
...
Once a client has requested a waitboost, we keep that waitboost active
until all clients are no longer waiting. This is because we don't
distinguish which waiter deserves the boost. However, with the advent of
fence signaling, the signaler threads appear as waiters to the RPS
interrupt handler. So instead of using a single boolean to track when to
keep the waitboost active, use a counter of all outstanding waitboosted
requests.
At this point, I have removed all vestiges of the rate limiting on
clients. Whilst this means that compositors should remain more fluid,
it also means that boosts are more prevalent. See commit b29c19b645
("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion
on the pros and cons of both approaches.
A drawback of this implementation is that it requires constant request
submission to keep the waitboost trimmed (as it is now cancelled when the
request is completed). This will be fine for a busy system, but near
idle the boosts may be kept for longer than desired (effectively tens of
vblanks worstcase) and there is a reliance on rc6 instead.
v2: Remove defunct rps.client_lock
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk