We want vblank counts and timestamps of flip completion as sent
in pageflip completion events to be consistent with the vblank
count and timestamp of the vblank of flip completion, like in non
VRR mode.
In VRR mode, drm_update_vblank_count() - and thereby vblank
count and timestamp updates - must be delayed until after the
end of front-porch of each vblank, as it is only safe to
calculate vblank timestamps outside of the front-porch, when
we actually know when the vblank will end or has ended.
The function drm_update_vblank_count() which updates timestamps
and counts gets called by drm_crtc_accurate_vblank_count() or by
drm_crtc_handle_vblank().
Therefore we must make sure that pageflip events for a completed
flip are only sent out after drm_crtc_accurate_vblank_count() or
drm_crtc_handle_vblank() is executed, after end of front-porch
for the vblank of flip completion.
Two cases:
a) Pageflip irq handler executes inside front-porch:
In this case we must defer sending pageflip events until
drm_crtc_handle_vblank() executes after end of front-porch,
and thereby calculates proper vblank count and timestamp.
Iow. the pflip irq handler must just arm a pageflip event
to be sent out by drm_crtc_handle_vblank() later on.
b) Pageflip irq handler executes after end of front-porch, e.g.,
after flip completion in back-porch or due to a massively
delayed handler invocation into the active scanout of the new
frame. In this case we can call drm_crtc_accurate_vblank_count()
to safely force calculation of a proper vblank count and
timestamp, and must send the pageflip completion event
ourselves from the pageflip irq handler.
This is the same behaviour as needed for standard fixed refresh
rate mode.
To decide from within pageflip handler if we are in case a) or b),
we check the current scanout position against the boundary of
front-porch. In non-VRR mode we just do what we did in the past.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In VRR mode, proper vblank/pageflip timestamps can only be computed
after the display scanout position has left front-porch. Therefore
delay calls to drm_crtc_handle_vblank(), and thereby calls to
drm_update_vblank_count() and pageflip event delivery, to after the
end of front-porch when in VRR mode.
We add a new vupdate irq, which triggers at the end of the vupdate
interval, ie. at the end of vblank, and calls the core vblank handler
function. The new irq handler is not executed in standard non-VRR
mode, so vblank handling for fixed refresh rate mode is identical
to the past implementation.
v2: Implement feedback by Nicholas and Paul Menzel.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For throttling to work correctly, we always need a baseline vblank
count last_flip_vblank that increments at start of front-porch.
This is the case for drm_crtc_vblank_count() in non-VRR mode, where
the vblank irq fires at start of front-porch and triggers DRM core
vblank handling, but it is no longer the case in VRR mode, where
core vblank handling is done later, after end of front-porch.
Therefore drm_crtc_vblank_count() is no longer useful for this.
We also can't use drm_crtc_accurate_vblank_count(), as that would
screw up vblank timestamps in VRR mode when called in front-porch.
To solve this, use the cooked hardware vblank counter returned by
amdgpu_get_vblank_counter_kms() instead, as that one is cooked to
always increment at start of front-porch, independent of when
vblank related irq's fire.
This patch allows vblank irq handling to happen anywhere within
vblank of even after it, without a negative impact on flip
throttling, so followup patches can shift the vblank core
handling trigger point wherever they need it.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During VRR mode we can not allow vblank irq dis-/enable
transitions, as an enable after a disable can happen at
an arbitrary time during the video refresh cycle, e.g.,
with a high likelyhood inside vblank front-porch. An
enable during front-porch would cause vblank timestamp
updates/calculations which are completely bogus, given
the code can't know when the vblank will end as long
as we are in front-porch with no page flip completed.
Hold a permanent vblank reference on the crtc while
in active VRR mode to prevent a vblank disable, and
drop the reference again when switching back to fixed
refresh rate non-VRR mode.
v2: Make sure transition is also handled if vrr is
disabled and stream gets disabled in the same
atomic commit by moving the call to the transition
function outside of plane commit.
Suggested by Nicholas.
v3: Trivial rebase onto previous patch.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need the VRR active/inactive state info earlier in
the commit sequence, so VRR related setup functions like
amdgpu_dm_handle_vrr_transition() know the final VRR state
when they need to do their hw setup work.
Split update_freesync_state_on_stream() into an early part,
that can run at the beginning of commit tail before the
vrr transition handling, and a late part that must run after
vrr transition handling inside the commit planes code for
enabled crtc's.
Suggested by Nicholas Kazlauskas.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Vega20 stores a CUSTOM profile on the GPU, but it may not be valid. Add
a bool to vega20_hwmgr to determine whether or not a valid CUSTOM
profile has been set, and use that to check when a user requests
switching to the CUSTOM profile without passing in any arguments. Then
if the CUSTOM profile has been set already, we can switch to it without
providing the parameters again
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't return an error if the CUSTOM profile is selected, just apply it
with the values saved to the GPU. But ensure that we zero out the
copy stored in adev to ensure that a valid profile has been submitted at
some point first
v2: Fix comment that wasn't updated from previous patch
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow changing to the CUSTOM profile without requiring the
parameters being passed in each time. Store the values in
the smu7_profiling table since it's defined here anyways
v2: Add check that CUSTOM was previously set
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to HW engineer, they prefer the TMR address be "naturally aligned", e.g. the start address
must be an integer divide of TME size.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the issue about TDR-2 will have "fallback timer expired on ring sdma1".
It is because the wrong number of irq types setting.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We have two *_CLASS_DEVICE kernel config options (LCD_CLASS_DEVICE
and BACKLIGHT_LCD_DEVICE) that do the same job.
The patch removes useless BACKLIGHT_LCD_SUPPORT option
and converts LCD_CLASS_DEVICE into a menu.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
amdgpu:
- Switch to HMM for userptr (reverted until HMM fixes land)
- New experimental SMU 11 replacement for powerplay for vega20 (not enabled by default)
- Initial RAS support for vega20
- BACO support for vega12
- BACO fixes for vega20
- Rework IH handling for page fault and retry interrupts
- Cleanly split CPU and GPU paths for GPUVM updates
- Powerplay fixes
- XGMI fixes
- Rework how DC interacts with atomic for planes
- Clean up and simplify DC/Powerplay interfaces
- Misc cleanups and bug fixes
amdkfd:
- Switch to HMM for userptr (reverted until HMM fixes land)
- Add initial RAS support
- MQD fixes
ttm:
- Unify DRM_FILE_PAGE_OFFSET handling
- Account for kernel allocations in kernel zone only
- Misc cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190402170820.22197-1-alexander.deucher@amd.com
The rlc reset function is not necessary during gfx9 initialization/resume phase.
And this function would even cause rlc fw loading failed on some gfx9 ASIC.
Remove this function safely with verification well on Vega/Raven platform.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ideally we only need one semaphore per ring to accommodate waiting on
multiple engines in parallel. However, since we do not know which fences
we will finally be waiting on, we emit a semaphore for every fence. It
turns out to be quite easy to trick ourselves into exhausting our
ringbuffer causing an error, just by feeding in a batch that depends on
several thousand contexts.
Since we never can be waiting on more than one semaphore in parallel
(other than perhaps the desire to busywait on multiple engines), just
pick the first fence for our semaphore. If we pick the wrong fence to
busywait on, we just miss an opportunity to reduce latency.
An adaption might be to use sched.flags as either a semaphore counter,
or to track the first busywait on each engine, converting it back to a
single use bit prior to closing the request.
v2: Track first semaphore used per-engine (this caters for our basic
igt that semaphores are working).
Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_exec_fence/long-history
Fixes: e886196469 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes
warning: cast from pointer to integer of different size
[-Wpointer-to-int-cast]
on 32 bit platforms.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If the user passes in a pointer to a GGTT mmaping of the same buffer
being written to, we can hit a deadlock in acquiring the shmemfs page
(once as the write destination and then as the read source).
[<0>] io_schedule+0xd/0x30
[<0>] __lock_page+0x105/0x1b0
[<0>] find_lock_entry+0x55/0x90
[<0>] shmem_getpage_gfp+0xbb/0x800
[<0>] shmem_read_mapping_page_gfp+0x2d/0x50
[<0>] shmem_get_pages+0x158/0x5d0 [i915]
[<0>] ____i915_gem_object_get_pages+0x17/0x90 [i915]
[<0>] __i915_gem_object_get_pages+0x57/0x70 [i915]
[<0>] i915_gem_fault+0x1b4/0x5c0 [i915]
[<0>] __do_fault+0x2d/0x80
[<0>] __handle_mm_fault+0xad4/0xfb0
[<0>] handle_mm_fault+0xe6/0x1f0
[<0>] __do_page_fault+0x18f/0x3f0
[<0>] page_fault+0x1b/0x20
[<0>] copy_user_enhanced_fast_string+0x7/0x10
[<0>] _copy_from_user+0x37/0x60
[<0>] shmem_pwrite+0xf0/0x160 [i915]
[<0>] i915_gem_pwrite_ioctl+0x14e/0x520 [i915]
[<0>] drm_ioctl_kernel+0x81/0xd0
[<0>] drm_ioctl+0x1a7/0x310
[<0>] do_vfs_ioctl+0x88/0x5d0
[<0>] ksys_ioctl+0x35/0x70
[<0>] __x64_sys_ioctl+0x11/0x20
[<0>] do_syscall_64+0x39/0xe0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
We can reduce (but not eliminate!) the chance of this happening by
faulting the user_data before we take the page lock in
pagecache_write_begin(). One way to eliminate the potential recursion
here is by disabling pagefaults for the copy, and handling the fallback
to use an alternative method -- so convert to use kmap_atomic (which
should disable preemption and pagefaulting for the copy) and report
ENODEV instead of EFAULT so that our caller tries again with a different
copy mechanism -- we already check that the page should have been
faultable so a false negative should be rare.
Testcase: igt/gem_pwrite/self
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401133909.31203-1-chris@chris-wilson.co.uk
We don't want to overwrite "ret", it already holds the correct error
code. The "regmap" variable might be a valid pointer as this point.
Fixes: 8f83f26891 ("drm/mediatek: Add HDMI support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>
We don't call this function if CONFIG_DEBUG_FS is not defined, but we
should not be compiling it either, as the declaration of the debugfs
core functions is not included.
Reported by the kbuild test robot.
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
- Mali 4xx GPUs have two kinds of processors GP and PP. GP is for
OpenGL vertex shader processing and PP is for fragment shader
processing. Each processor has its own MMU so prcessors work in
virtual address space.
- There's only one GP but multiple PP (max 4 for mali 400 and 8
for mali 450) in the same mali 4xx GPU. All PPs are grouped
togather to handle a single fragment shader task divided by
FB output tiled pixels. Mali 400 user space driver is
responsible for assign target tiled pixels to each PP, but mali
450 has a HW module called DLBU to dynamically balance each
PP's load.
- User space driver allocate buffer object and map into GPU
virtual address space, upload command stream and draw data with
CPU mmap of the buffer object, then submit task to GP/PP with
a register frame indicating where is the command stream and misc
settings.
- There's no command stream validation/relocation due to each user
process has its own GPU virtual address space. GP/PP's MMU switch
virtual address space before running two tasks from different
user process. Error or evil user space code just get MMU fault
or GP/PP error IRQ, then the HW/SW will be recovered.
- Use GEM+shmem for MM. Currently just alloc and pin memory when
gem object creation. GPU vm map of the buffer is also done in
the alloc stage in kernel space. We may delay the memory
allocation and real GPU vm map to command submission stage in the
furture as improvement.
- Use drm_sched for GPU task schedule. Each OpenGL context should
have a lima context object in the kernel to distinguish tasks
from different user. drm_sched gets task from each lima context
in a fair way.
mesa driver can be found here before upstreamed:
https://gitlab.freedesktop.org/lima/mesa
v8:
- add comments for in_sync
- fix ctx free miss mutex unlock
v7:
- remove lima_fence_ops with default value
- move fence slab create to device probe
- check pad ioctl args to be zero
- add comments for user/kernel interface
v6:
- fix comments by checkpatch.pl
v5:
- export gp/pp version to userspace
- rebase on drm-misc-next
v4:
- use get param interface to get info
- separate context create/free ioctl
- remove unused max sched task param
- update copyright time
- use xarray instead of idr
- stop using drmP.h
v3:
- fix comments from kbuild robot
- restrict supported arch to tested ones
v2:
- fix syscall argument check
- fix job finish fence leak since kernel 5.0
- use drm syncobj to replace native fence
- move buffer object GPU va map into kernel
- reserve syscall argument space for future info
- remove kernel gem modifier
- switch TTM back to GEM+shmem MM
- use time based io poll
- use whole register name
- adopt gem reservation obj integration
- use drm_timeout_abs_to_jiffies
Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Simon Shields <simon@lineageos.org>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Herring <robh@kerrnel.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/291200/