If we don't store local data into global variables
it isn't necessary to lock anything.
v2: rebased on new SA interface
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We can now protected the semaphore ram by a
fence, so free it immediately.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Directly use the suballocator to get small chunks of memory.
It's equally fast and doesn't crash when we encounter a GPU reset.
v2: rebased on new SA interface.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
A startover with a new idea for a multiple ring allocator.
Should perform as well as a normal ring allocator as long
as only one ring does somthing, but falls back to a more
complex algorithm if more complex things start to happen.
We store the last allocated bo in last, we always try to allocate
after the last allocated bo. Principle is that in a linear GPU ring
progression was is after last is the oldest bo we allocated and thus
the first one that should no longer be in use by the GPU.
If it's not the case we skip over the bo after last to the closest
done bo if such one exist. If none exist and we are not asked to
block we report failure to allocate.
If we are asked to block we wait on all the oldest fence of all
rings. We just wait for any of those fence to complete.
v2: We need to be able to let hole point to the list_head, otherwise
try free will never free the first allocation of the list. Also
stop calling radeon_fence_signalled more than necessary.
v3: Don't free allocations without considering them as a hole,
otherwise we might lose holes. Also return ENOMEM instead of ENOENT
when running out of fences to wait for. Limit the number of holes
we try for each ring to 3.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use one wait queue for all rings. When one ring progress, other
likely does to and we are not expecting to have a lot of waiter
anyway.
Also add a fence_wait_any that will wait until the first fence
in the fence array (one fence per ring) is signaled. This allow
to wait on all rings.
v2: some minor cleanups and improvements.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Define the interface without modifying the allocation
algorithm in any way.
v2: rebase on top of fence new uint64 patch
v3: add ring to debugfs output
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of offset + size keep start and end offset directly.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Make the suballocator self containing to locking.
v2: split the bugfix into a seperate patch.
v3: remove some unreleated changes.
Sig-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some callers illegal called fence_wait_next/empty
while holding the ring emission mutex. So don't
relock the mutex in that cases, and move the actual
locking into the fence code.
v2: Don't try to unlock the mutex if it isn't locked.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Using 64bits fence sequence we can directly compare sequence
number to know if a fence is signaled or not. Thus the fence
list became useless, so does the fence lock that mainly
protected the fence list.
Things like ring.ready are no longer behind a lock, this should
be ok as ring.ready is initialized once and will only change
when facing lockup. Worst case is that we return an -EBUSY just
after a successfull GPU reset, or we go into wait state instead
of returning -EBUSY (thus delaying reporting -EBUSY to fence
wait caller).
v2: Remove left over comment, force using writeback on cayman and
newer, thus not having to suffer from possibly scratch reg
exhaustion
v3: Rebase on top of change to uint64 fence patch
v4: Change DCE5 test to force write back on cayman and newer but
also any APU such as PALM or SUMO family
v5: Rebase on top of new uint64 fence patch
v6: Just break if seq doesn't change any more. Use radeon_fence
prefix for all function names. Even if it's now highly optimized,
try avoiding polling to often.
v7: We should never poll the last_seq from the hardware without
waking the sleeping threads, otherwise we might lose events.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This convert fence to use uint64_t sequence number intention is
to use the fact that uin64_t is big enough that we don't need to
care about wrap around.
Tested with and without writeback using 0xFFFFF000 as initial
fence sequence and thus allowing to test the wrap around from
32bits to 64bits.
v2: Add comment about possible race btw CPU & GPU, add comment
stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET
Read fence sequenc in reverse order of GPU write them so we
mitigate the race btw CPU and GPU.
v3: Drop the need for ring to emit the 64bits fence, and just have
each ring emit the lower 32bits of the fence sequence. We
handle the wrap over 32bits in fence_process.
v4: Just a small optimization: Don't reread the last_seq value
if loop restarts, since we already know its value anyway.
Also start at zero not one for seq value and use pre instead
of post increment in emmit, otherwise wait_empty will deadlock.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
A single global mutex for ring submissions seems sufficient.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to sync with the GFX ring as ttm might have schedule bo move
on it and new command scheduled for other ring need to wait for bo
data to be in place.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.
The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:
$ git diff 14415745b2..1fa611065
is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas
$git diff --minimal 14415745b2..1fa611065
is exactly what we want.
Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use family rather than DCE check for clarity, also always use
wb on APUs, there will never be AGP variants.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of all this humpy pumpy with recursive
mutex (which also fixes only halve of the problem)
move the actual gpu reset out of the fence code,
return -EDEADLK and then reset the gpu in the
calling ioctl function.
v2: Split removal of radeon_mutex into separate patch.
Return -EAGAIN if reset is successful.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rings need to lock in order, otherwise
the ring subsystem can deadlock.
v2: fix error handling and number of locked doublewords.
v3: stop creating unneeded semaphores.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Aligning offset can make it bigger than tmp->offset
leading to an overrun bug in the following subtraction.
v2: Against initial suspicions this can't happen in mainline,
so no need to push it into stable.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Daniel Vetter writes:
A new drm-intel-next pull. Highlights:
- More gmbus patches from Daniel Kurtz, I think gmbus is now ready, all
known issues fixed.
- Fencing cleanup and pipelined fencing removal from Chris.
- rc6 residency interface from Ben, useful for powertop.
- Cleanups and code reorg around the ringbuffer code (Ben&me).
- Use hw semaphores in the pageflip code from Ben.
- More vlv stuff from Jesse, unfortunately his vlv cpu is doa, so less
merged than I've hoped for - we still have the unused function warning :(
- More hsw patches from Eugeni, again, not yet enabled fully.
- intel_pm.c refactoring from Eugeni.
- Ironlake sprite support from Chris.
- And various smaller improvements/fixes all over the place.
Note that this pull request also contains a backmerge of -rc3 to sort out
a few things in -next. I've also had to frob the shortlog a bit to exclude
anything that -rc3 brings in with this pull.
Regression wise we have a few strange bugs going on, but for all of them
closer inspection revealed that they've been pre-existing, just now
slightly more likely to be hit. And for most of them we have a patch
already. Otherwise QA has not reported any regressions, and I'm also not
aware of anything bad happening in 3.4.
* tag 'drm-intel-next-2012-04-23' of git://people.freedesktop.org/~danvet/drm-intel: (420 commits)
drm/i915: rc6 residency (fix the fix)
drm/i915/tv: fix open-coded ARRAY_SIZE.
drm/i915: invalidate render cache on gen2
drm/i915: Silence the change of LVDS sync polarity
drm/i915: add generic power management initialization
drm/i915: move clock gating functionality into intel_pm module
drm/i915: move emon functionality into intel_pm module
drm/i915: move drps, rps and rc6-related functions to intel_pm
drm/i915: fix line breaks in intel_pm
drm/i915: move watermarks settings into intel_pm module
drm/i915: move fbc-related functionality into intel_pm module
drm/i915: Refactor get_fence() to use the common fence writing routine
drm/i915: Refactor fence clearing to use the common fence writing routine
drm/i915: Refactor put_fence() to use the common fence writing routine
drm/i915: Prepare to consolidate fence writing
drm/i915: Remove the unsightly "optimisation" from flush_fence()
drm/i915: Simplify fence finding
drm/i915: Discard the unused obj->last_fenced_ring
drm/i915: Remove unused ring->setup_seqno
drm/i915: Remove fence pipelining
...
R6xx has routable blocks, but there's nothing wrong in assignment based
on dig_encoder. We didn't really need that algorithm.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Tested-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
release_firmware() does its own tests for NULL pointers so there's no
need to explicitly test before calling it.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>