In preparation for enabling -Wimplicit-fallthrough, mark the switch cases
where we expect to fall through.
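For illustration, this is the shape of the annotation being added (a minimal
sketch; the enum and helpers are made up, only the comment placement mirrors
what the patch does -- the comment before the unguarded case label is what
-Wimplicit-fallthrough recognises):

static void do_setup(void) { }
static void do_run(void) { }

enum mode { MODE_SETUP, MODE_RUN };

static void handle(enum mode m)
{
        switch (m) {
        case MODE_SETUP:
                do_setup();
                /* fall through */
        case MODE_RUN:
                do_run();
                break;
        }
}
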
Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
Commit cd7e61b9 ("drm/i915/gvt: init mmio by lri command in vgpu inhibit context")
initializes the registers saved/restored in a context with their vreg
values through lri commands in the ring buffer. This relies on the vreg
being updated on every guest access. A case was found where the Linux
guest uses an lri command in the inhibit context to update such a
register. This patch adds the vreg update for that case.
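As a rough sketch of the idea in plain C (the vreg array, masked_vreg_write()
and the payload layout below are illustrative stand-ins, not gvt functions;
the actual patch reuses the driver's masked mmio write path, per v3):

#include <stdint.h>

#define VREG_DWORDS 1024
static uint32_t vreg[VREG_DWORDS];      /* stand-in for the vgpu vreg file */

/* Stand-in for the driver's masked mmio write into vreg. */
static void masked_vreg_write(uint32_t offset, uint32_t value)
{
        vreg[(offset / 4) % VREG_DWORDS] = value;
}

/*
 * While scanning an inhibit-context ring buffer, mirror each
 * MI_LOAD_REGISTER_IMM (offset, value) pair into vreg so the values
 * later restored from vreg match what the guest actually loaded.
 */
static void mirror_inhibit_ctx_lri(const uint32_t *payload, int nregs)
{
        for (int i = 0; i < nregs; i++)
                masked_vreg_write(payload[2 * i], payload[2 * i + 1]);
}
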
v2: move mmio_attribute functions to gvt.h (Zhenyu)
v3: use mask_mmio_write in vreg update
v4: refine codes and add more comments (Zhenyu)
Fixes: cd7e61b9 ("drm/i915/gvt: init mmio by lri command in vgpu inhibit context")
Signed-off-by: Hang Yuan <hang.yuan@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently, the wc-stash used for providing flushed WC pages ready for
constructing the page directories is assumed to be protected by the
struct_mutex. However, we want to remove this global lock and so must
install a replacement global lock for accessing the global wc-stash (the
per-vm stash continues to be guarded by the vm).
We need to push ahead on this patch due to an oversight in hastily
removing the struct_mutex guard around the igt_ppgtt_alloc selftest. No
matter, it will prove very useful (i.e. will be required) in the near
future.
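By way of illustration only (plain C, with a pthread mutex standing in for
the new dedicated lock and void * standing in for struct page; none of these
names come from the driver):

#include <pthread.h>
#include <stddef.h>

#define WC_STASH_SIZE 32

static void *wc_stash[WC_STASH_SIZE];   /* global stash of flushed WC pages */
static int wc_stash_count;
static pthread_mutex_t wc_stash_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a flushed WC page from the global stash; guarded by its own lock
 * rather than the device-wide struct_mutex. */
static void *wc_stash_pop(void)
{
        void *page = NULL;

        pthread_mutex_lock(&wc_stash_lock);
        if (wc_stash_count)
                page = wc_stash[--wc_stash_count];
        pthread_mutex_unlock(&wc_stash_lock);

        return page;
}

/* Return an unused WC page to the global stash, if there is room. */
static int wc_stash_push(void *page)
{
        int stashed = 0;

        pthread_mutex_lock(&wc_stash_lock);
        if (wc_stash_count < WC_STASH_SIZE) {
                wc_stash[wc_stash_count++] = page;
                stashed = 1;
        }
        pthread_mutex_unlock(&wc_stash_lock);

        return stashed;
}
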
v2: Restore the onstack stash so that we can drop the vm->mutex in
future across the allocation.
v3: Restore the lost pagevec_init of the onstack allocation, and repaint
function names.
v4: Reorder init so that we don't try to use i915_address_space before
it is initialised.
Fixes: 1f6f00238a ("drm/i915/selftests: Drop struct_mutex around lowlevel pggtt allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180704185518.4193-1-chris@chris-wilson.co.uk
Adjust the EIR clearing to cope with the edge triggered IIR
on i965/g4x. To guarantee an edge in the ISR master error bit
we temporarily mask everything in EMR. As some of the EIR bits
can't even be directly cleared we also borrow a trick from
i915_clear_error_registers() and permanently mask any bit that
remains high. No real thought given to how we might unmask them
again once the cause for the error has been cleared. I suppose
on pre-g4x GPU reset will reinitialize EMR from scratch.
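The sequence above, as a plain-C sketch (reg_read()/reg_write() and the
register tokens are illustrative stand-ins for the mmio accessors, not i915
code):

#include <stdint.h>

extern uint32_t reg_read(uint32_t reg);          /* stand-in accessors */
extern void reg_write(uint32_t reg, uint32_t val);
enum { EMR, EIR };                               /* placeholder tokens */

static void error_irq_ack(void)
{
        uint32_t emr, eir, eir_stuck;

        /* Temporarily mask everything so that clearing EIR later yields
         * an edge in the IIR/ISR master error bit. */
        emr = reg_read(EMR);
        reg_write(EMR, 0xffffffff);

        /* Clear whatever can be cleared directly. */
        eir = reg_read(EIR);
        reg_write(EIR, eir);

        /* Any bit that remains high cannot be cleared: keep it masked
         * permanently when restoring EMR. */
        eir_stuck = reg_read(EIR);
        reg_write(EMR, emr | eir_stuck);
}
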
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611200258.27121-3-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Just like with PIPESTAT, the edge triggered IIR on i965/g4x
also causes problems for hotplug interrupts. To make sure
we don't get the IIR port interrupt bit stuck low with the
ISR bit high we must force an edge in ISR. Unfortunately
we can't borrow the PIPESTAT trick and toggle the enable
bits in PORT_HOTPLUG_EN as that act itself generates hotplug
interrupts. Instead we just have to loop until we've cleared
PORT_HOTPLUG_STAT, or we just give up and WARN.
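As a plain-C sketch of the loop (the accessors, the register token and the
retry bound are illustrative):

#include <stdint.h>
#include <stdio.h>

extern uint32_t reg_read(uint32_t reg);          /* stand-in accessors */
extern void reg_write(uint32_t reg, uint32_t val);
enum { PORT_HOTPLUG_STAT };                      /* placeholder token */

static void ack_port_hotplug(void)
{
        int retries = 10;                        /* arbitrary bound */
        uint32_t status;

        /* Writing the status back clears it, but a new hotplug event may
         * land in between; loop until it reads back zero so the ISR bit
         * sees an edge, or give up and warn. */
        do {
                status = reg_read(PORT_HOTPLUG_STAT);
                if (!status)
                        return;
                reg_write(PORT_HOTPLUG_STAT, status);
        } while (--retries);

        fprintf(stderr, "PORT_HOTPLUG_STAT did not clear\n"); /* ~WARN */
}
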
v2: Don't frob with PORT_HOTPLUG_EN
Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614175625.1615-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Two requests have come in for a backmerge,
and I've got some pull reqs on rc2, so this
just makes sense.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The PIPEDSL freezes on PSR entry and if PSR hasn't fully exited, then
the pipe_update_start call schedules itself out to check back later.
On ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t drm/i915 but
lags w.r.t core kernel code, hot plugging an external display triggers
tons of "potential atomic update errors" in the dmesg, on *pipe A*. A
closer analysis reveals that we try to read the scanline 3 times and
eventually timeout, b/c PSR hasn't exited fully leading to a PIPEDSL
stuck @ 1599. This issue is not seen on upstream kernels, b/c for *some*
reason we loop inside intel_pipe_update start for ~2+ msec which in this
case is more than enough to exit PSR fully, hence an *unstuck* PIPEDSL
counter, hence no error. On the other hand, the ChromeOS kernel spends
~1.1 msec looping inside intel_pipe_update_start and hence errors out
b/c the source is still in PSR.
Regardless, we should wait for PSR exit (if PSR is disabled, we incur
a ~1-2 usec penalty) before reading the PIPEDSL, b/c if we haven't
fully exited PSR, then checking for vblank evasion isn't actually
applicable.
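In outline, the ordering this patch establishes is simply the following (the
function names are descriptive placeholders, not the exact driver API):

/* Placeholder helpers; only the ordering matters for this sketch. */
extern void enable_vblank_irq(void);
extern int  can_psr(void);
extern void wait_for_psr_exit(void);
extern void wait_for_vblank_evasion(void);

static void pipe_update_start(void)
{
        enable_vblank_irq();

        /* While the source is in PSR the PIPEDSL is frozen, so any
         * scanline read below would be meaningless; wait (bounded) for
         * the PSR exit first.  If PSR is unsupported or disabled this
         * returns almost immediately (~1-2 usec). */
        if (can_psr())
                wait_for_psr_exit();

        wait_for_vblank_evasion();      /* scanline / vblank evasion checks */
}
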
v4: Comment explaining psr_wait after enabling VBL interrupts (DK)
v5: CAN_PSR() to handle platforms that don't support PSR.
v6: Handle local_irq_disable on early return (Chris)
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Tarun Vyas <tarun.vyas@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627200250.1515-2-tarun.vyas@intel.com
This is a lockless version of the existing psr_wait_for_idle().
We want to wait for PSR to idle out inside intel_pipe_update_start.
At the time of a pipe update, we should never race with any psr
enable or disable code, which is a part of crtc enable/disable.
The follow up patch will use this lockless wait inside pipe_update_
start to wait for PSR to idle out before checking for vblank evasion.
We need to keep the wait in pipe_update_start as short as possible.
So, we can live and flourish w/o taking any psr locks at all.
Even if psr is never enabled, psr2_enabled will be false and this
function will wait for PSR1 to idle out, which should just return
immediately, so a very short (~1-2 usec) wait for cases where PSR
is disabled.
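A user-space flavoured sketch of such a bounded, lock-free poll (psr_is_idle()
and the timeout value are placeholders; the real code polls the PSR status
register with a timeout sized for sub-60 Hz refresh rates, per v7):

#include <stdbool.h>
#include <time.h>

extern bool psr_is_idle(void);          /* placeholder status check */

/* Poll without taking any lock; returns true if PSR idled out within
 * the timeout, false otherwise. */
static bool wait_for_psr_idle(long timeout_ms)
{
        struct timespec start, now;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (;;) {
                if (psr_is_idle())
                        return true;

                clock_gettime(CLOCK_MONOTONIC, &now);
                if ((now.tv_sec - start.tv_sec) * 1000 +
                    (now.tv_nsec - start.tv_nsec) / 1000000 >= timeout_ms)
                        return false;
        }
}
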
v2: Add comment to explain the 25msec timeout (DK)
v3: Rename psr_wait_for_idle to __psr_wait_for_idle_locked to avoid
naming conflicts and propagate err (if any) to the caller (Chris)
v5: Form a series with the next patch
v7: Better explain the need for lockless wait and increase the max
timeout to handle refresh rates < 60 Hz (Daniel Vetter)
v8: Rebase
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Tarun Vyas <tarun.vyas@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627200250.1515-1-tarun.vyas@intel.com
Change the GVT display transcoder DDI mode from DP_SST to DVI to
address the call trace below during guest boot, which is caused by a
zero initial dotclock value in DP_SST mode. Emulating the transcoder
in DVI mode also aligns with native behaviour for a DP connection.
[drm:drm_calc_timestamping_constants]
ERROR crtc 41: Can't calculate constants, dotclock = 0!
WARNING: at drivers/gpu/drm/drm_vblank.c:620
drm_calc_vbltimestamp_from_scanoutpos
Call Trace:
? drm_calc_timestamping_constants+0x144/0x150 [drm]
drm_get_last_vbltimestamp+0x54/0x90 [drm]
drm_reset_vblank_timestamp+0x59/0xd0 [drm]
drm_crtc_vblank_on+0x7b/0xd0 [drm]
intel_modeset_setup_hw_state+0xb67/0xfd0 [i915]
? gen2_read32+0x110/0x110 [i915]
? drm_modeset_lock+0x30/0xa0 [drm]
intel_modeset_init+0x794/0x19d0 [i915]
? intel_setup_gmbus+0x232/0x2e0 [i915]
i915_driver_load+0xb4a/0xf40 [i915]
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When the guest writes ggtt entries, it could write 8 bytes at a time if
gtt_entry_size is 8. But qemu may split the 8 bytes into 2 consecutive
4-byte writes.
If each 4-byte partial write triggered a host ggtt write, it is quite
possible that a wrong combination would be written to the host ggtt,
e.g. the upper 4 bytes still holding the old value while the lower 4
bytes hold the new value; this incorrect 8-byte combination would then
be written to the ggtt, causing bugs.
To handle this condition, we just record the first 4-byte write, then wait
until the second 4-byte write comes and write the combined 64-bit data to
the host ggtt table.
To save memory and to spot a partial write as early as possible, we
don't keep this information for every ggtt index. Instead, we just record
the last ggtt write position, and assume the two 4-byte writes come in
consecutively for each vgpu.
This assumption holds given the nature of a ggtt entry, which stores a
memory address. When gtt_entry_size is 8, the guest physical memory
address is 64 bits wide, so any sane guest driver should write 8 bytes
at a time, and the 2 consecutive 4-byte writes at the same ggtt index
should both be trapped by gvt.
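The bookkeeping amounts to remembering one pending (index, data) pair per
vgpu; a self-contained sketch of the combining logic (the names and the
host_ggtt_write64() callback are made up, not gvt code):

#include <stdint.h>
#include <stdbool.h>

extern void host_ggtt_write64(uint32_t index, uint64_t entry);  /* stand-in */

struct partial_write {
        bool     valid;
        uint32_t index;         /* ggtt index of the pending half */
        uint64_t data;          /* entry under construction */
};

/* Called for each trapped 4-byte guest write; 'lo' says whether it is
 * the lower or upper half of the 8-byte entry at 'index'. */
static void ggtt_partial_write(struct partial_write *pw,
                               uint32_t index, bool lo, uint32_t val)
{
        if (pw->valid && pw->index == index) {
                /* Second half of the same entry: combine and flush. */
                if (lo)
                        pw->data = (pw->data & 0xffffffff00000000ull) | val;
                else
                        pw->data = (pw->data & 0xffffffffull) |
                                   ((uint64_t)val << 32);
                host_ggtt_write64(index, pw->data);
                pw->valid = false;
                return;
        }

        /* New partial write: record it and wait for its pair.  (The v2
         * error handling for an abandoned partial write -- remapping to
         * the scratch page -- is not shown here.) */
        pw->valid = true;
        pw->index = index;
        pw->data  = lo ? val : (uint64_t)val << 32;
}
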
v2:
when an incomplete ggtt entry write is detected, e.g.
1. the guest writes only 4 bytes at a ggtt offset and never writes the
remaining 4 bytes.
2. the guest writes 4 bytes of a ggtt offset, then writes at other ggtt
offsets, then comes back to write the remaining 4 bytes of the first
ggtt offset.
add error handling logic to remap the host entry to the scratch page, and
mark the guest virtual ggtt entry as not present. (zhenyu wang)
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Pull block fixes from Jens Axboe:
"Small set of fixes for this series. Mostly just minor fixes, the only
oddball in here is the sg change.
The sg change came out of the stall fix for NVMe, where we added a
mempool and limited us to a single page allocation. CONFIG_DEBUG_SG
sort-of ruins that, since we'd need to account for that. That's
actually a generic problem, since lots of drivers need to allocate SG
lists. So this just removes support for CONFIG_DEBUG_SG, which I added
back in 2007 and to my knowledge it was never useful.
Anyway, outside of that, this pull contains:
- clone of request with special payload fix (Bart)
- drbd discard handling fix (Bart)
- SATA blk-mq stall fix (me)
- chunk size fix (Keith)
- double free nvme rdma fix (Sagi)"
* tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
sg: remove ->sg_magic member
drbd: Fix drbd_request_prepare() discard handling
blk-mq: don't queue more if we get a busy return
block: Fix cloning of requests with a special payload
nvme-rdma: fix possible double free of controller async event buffer
block: Fix transfer when chunk sectors exceeds max
This was introduced more than a decade ago when sg chaining was
added, but we never really caught anything with it. The scatterlist
entry size can be critical, since drivers allocate it, so remove
the magic member. Recently it's been triggering allocation stalls
and failures in NVMe.
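For context, the member was a debug-only canary of roughly this shape
(simplified, not the exact kernel definition), stamped at init and checked
in the accessors, and it is this member and check that go away:

#include <assert.h>

#define SG_MAGIC 0x87654321UL

/* Simplified illustration of the pattern being removed. */
struct scatterlist_dbg {
        unsigned long sg_magic;         /* debug-only canary, now dropped */
        unsigned long page_link;
        unsigned int  offset;
        unsigned int  length;
};

static void sg_check(const struct scatterlist_dbg *sg)
{
        assert(sg->sg_magic == SG_MAGIC);       /* BUG_ON() in the kernel */
}
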
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Back in commit 27af5eea54 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.
We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell and only about 2x worse on average than the
gem_syslatency on an unladen system.
In this iteration of the tasklet-vs-direct submission debate, we seek a
compromise whereby we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low-latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation of our CS irq-storms.
Comparing the impact on the maximum latency observed (that is the time
stolen from an RT process) over a 120s interval, repeated several times
(using gem_syslatency, similar to RT's cyclictest) while the system is
fully laden with i915 nops, we see that direct submission can actually
improve the worst case.
Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
x Always using tasklets (a couple of >1000us outliers removed)
+ Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
| + |
| + |
| + |
| + + |
| + + + |
| + + + + x x x |
| +++ + + + x x x x x x |
| +++ + ++ + + *x x x x x x |
| +++ + ++ + * *x x * x x x |
| + +++ + ++ * * +*xxx * x x xx |
| * +++ + ++++* *x+**xx+ * x x xxxx x |
| **x++++*++**+*x*x****x+ * +x xx xxxx x x |
|x* ******+***************++*+***xxxxxx* xx*x xxx + x+|
| |__________MA___________| |
| |______M__A________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 118 91 186 124 125.28814 16.279137
+ 120 92 187 109 112.00833 13.458617
Difference at 95.0% confidence
-13.2798 +/- 3.79219
-10.5994% +/- 3.02677%
(Student's t, pooled s = 14.9237)
However the mean latency is adversely affected:
Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
x Always using tasklets
+ Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
| xxxxxx + ++ |
| xxxxxx + ++ |
| xxxxxx + +++ ++ |
| xxxxxxx +++++ ++ |
| xxxxxxx +++++ ++ |
| xxxxxxx +++++ +++ |
| xxxxxxx + ++++++++++ |
| xxxxxxxx ++ ++++++++++ |
| xxxxxxxx ++ ++++++++++ |
| xxxxxxxxxx +++++++++++++++ |
| xxxxxxxxxxx x +++++++++++++++ |
|x xxxxxxxxxxxxx x + + ++++++++++++++++++ +|
| |__A__| |
| |____A___| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 120 3.506 3.727 3.631 3.6321417 0.02773109
+ 120 3.834 4.149 4.039 4.0375167 0.041221676
Difference at 95.0% confidence
0.405375 +/- 0.00888913
11.1608% +/- 0.244735%
(Student's t, pooled s = 0.03513)
However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we only need to speed that
up to benefit not just system latency but our own throughput.
v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests
v5: Be aware that with mixing direct tasklet evaluation and deferred
tasklets, we may end up idling before running the deferred tasklet.
v6: Remove the redundant likely() from tasklet_is_enabled(), restrict the
annotation to reset_in_progress().
v7: Take the full timeline.lock when enabling perf_pmu stats as the
tasklet is no longer a valid guard. A consequence is that the stats are
now only valid for engines also using the timeline.lock to process
state.
Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk
Now that we use the CSB stored in the CPU friendly HWSP, we do not need
to track interrupts for when the mmio CSB registers are valid and can
just check where we read up to last from the cached HWSP. This means we
can forgo the atomic bit tracking from interrupt, and in the next patch
it means we can check the CSB at any time.
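Conceptually the consumer then reduces to comparing a cached read index
against the write index kept in the HWSP, along these lines (the field names
and helper are illustrative, not the driver's structures):

#include <stdint.h>

#define CSB_ENTRIES 6           /* execlists CSB ring size */

extern void handle_csb_event(uint64_t event);   /* placeholder */

struct csb {
        const uint64_t *buf;    /* CSB entries in the HWSP (CPU coherent) */
        const uint32_t *write;  /* write index, also kept in the HWSP */
        uint32_t head;          /* our cached read position */
};

static void process_csb(struct csb *csb)
{
        uint32_t tail = *csb->write;    /* no mmio, no interrupt tracking */

        while (csb->head != tail) {
                csb->head = (csb->head + 1) % CSB_ENTRIES;
                handle_csb_event(csb->buf[csb->head]);
        }
}
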
v2: Change the splitting inside reset_prepare, we only want to lose
testing the interrupt in this patch, the next patch requires the change
in locking
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-8-chris@chris-wilson.co.uk
As we now never read back our current head position from the CSB
pointers register, and the HW itself doesn't use it to prevent
overwriting unread CSB entries, we do not need to keep updating the
register. As it turns out this register is not listed as being shadowed,
and so requires forcewake -- but we haven't been taking forcewake around
it so the writes have probably been regularly dropped. Fortuitously, we
only read the value after a reset where it did not matter, and zero was
the right answer (well, close enough).
Mika pointed out that this was how we used to do it (accidentally!)
before he fixed it in commit cc53699b25 ("drm/i915: Use masked write
for Context Status Buffer Pointer").
References: cc53699b25 ("drm/i915: Use masked write for Context Status Buffer Pointer")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-7-chris@chris-wilson.co.uk
On HW reset, the HW clears the write pointer (to 0). But since it also
writes its first CSB entry to slot 0, we need to reset the write pointer
back to the element before (so the first entry we read is 0).
This is required for the next patch, where we trust the CSB completely!
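In other words, after reset both cached pointers are parked on the last slot
so that the first advance lands on entry 0 (a sketch with illustrative names;
the stored reset value is what v3 refers to):

#include <stdint.h>

#define CSB_ENTRIES 6

struct csb {
        uint32_t head;          /* cached read position */
        uint32_t *write;        /* write index location (mmio or HWSP) */
};

static void reset_csb_pointers(struct csb *csb)
{
        /* HW clears its write pointer to 0 and then writes its first
         * event into slot 0, so park both pointers on the last slot;
         * the first advance then reads slot 0. */
        csb->head = CSB_ENTRIES - 1;
        *csb->write = CSB_ENTRIES - 1;
}
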
v2: Use _MASKED_FIELD
v3: Store the reset value, so that we differentiate between mmio/hwsp
transparently and without pretense.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-6-chris@chris-wilson.co.uk
Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access just at a different location.
We can reduce the CSB access to a set of read/write/buffer pointers and
treat the various paths identically and not worry about forcewake.
(Forcewake is a nightmare for worst-case latency, and we want to process
this all with irqsoff -- no latency allowed!)
v2: Comments, comments, comments. Well, 2 bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-5-chris@chris-wilson.co.uk
In the next patch, we will process the CSB events directly from the
submission path, rather than only after a CS interrupt. Hence, we will
no longer have the need for a loop until the has-interrupt bit is clear,
and in the meantime can remove that small optimisation.
v2: Tvrtko pointed out it was safer to unconditionally kick the tasklet
after each irq, when assuming that the tasklet is called for each irq.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-4-chris@chris-wilson.co.uk
In the following patch, we will process the CSB events under the
timeline.lock and not serialised by the tasklet. This also means that we
will need to protect access to common variables such as
execlists->csb_head with the timeline.lock during reset.
v2: Move sync_irq to avoid deadlocks when taking timeline.lock from
our interrupt handler.
v3: Kill off the synchronize_hardirq as it raises more questions than
it answers; now that we use the timeline.lock entirely for CSB
serialisation between the irq and elsewhere, we don't need to be so
heavy-handed with flushing
v4: Treat request cancellation (wedging after failed reset) similarly
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-3-chris@chris-wilson.co.uk
In the next patch, we will begin processing the CSB from inside the
submission path (underneath an irqsoff section, and even from inside
interrupt handlers). This means that updating the execlists->port[] will
no longer be serialised by the tasklet but needs to be locked by the
engine->timeline.lock instead. Pull dequeue and submit under the same
lock for protection. (An alternate future plan is to keep the in/out
arrays separate for concurrent processing and reduced lock coverage.)
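Schematically (illustrative names only, with a pthread mutex standing in for
engine->timeline.lock):

#include <pthread.h>

extern int  dequeue_next_request(void **rq);    /* placeholder */
extern void write_to_elsp(void *rq);            /* placeholder */

static pthread_mutex_t timeline_lock = PTHREAD_MUTEX_INITIALIZER;

/* Picking the next requests and writing them to the ports now sit under
 * the same lock, since the CSB processing that also touches the ports
 * may run from interrupt context rather than only from the tasklet. */
static void dequeue_and_submit(void)
{
        void *rq;

        pthread_mutex_lock(&timeline_lock);
        while (dequeue_next_request(&rq))
                write_to_elsp(rq);
        pthread_mutex_unlock(&timeline_lock);
}
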
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-2-chris@chris-wilson.co.uk
At the moment, gem_exec_gttfill fails with a sporadic EBUSY due to us
wanting to unbind a pinned batch. Let's dump who first bound that vma to
see if that helps us identify who still unexpectedly has it pinned.
v2: We cannot allocate inside the printer (as it may be on an fs-reclaim
path), so hope for the best and build the string on the stack
v3: A stack depth of 16 routinely overflows a 512-character string;
limit it to 12 to avoid unsightly truncation.
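The stack-built string is just repeated snprintf into a fixed buffer, roughly
as below (the buffer size and depth mirror the numbers above; the frame array
and print helper are placeholders, not the driver's API):

#include <stdio.h>
#include <stddef.h>

#define PIN_DEPTH 12            /* 16 frames overflowed the 512 bytes */

/* Build the "who first bound this vma" string on the stack: no
 * allocations, as the printer may be reached from an fs-reclaim path. */
static void print_first_binder(void *const frames[], int nr)
{
        char buf[512] = "";
        size_t len = 0;

        if (nr > PIN_DEPTH)
                nr = PIN_DEPTH;

        for (int i = 0; i < nr; i++) {
                int ret = snprintf(buf + len, sizeof(buf) - len,
                                   " %p", frames[i]);
                if (ret < 0 || (size_t)ret >= sizeof(buf) - len)
                        break;  /* stop cleanly on truncation */
                len += ret;
        }

        printf("vma first bound by:%s\n", buf);
}
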
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628132206.8329-1-chris@chris-wilson.co.uk