drm-misc-next for 5.1:
UAPI Changes:
Cross-subsystem Changes:
- Turn dma-buf fence sequence numbers into 64 bit numbers
Core Changes:
- Move to a common helper for the DP MST hotplug for radeon, i915 and
amdgpu
- i2c improvements for drm_dp_mst
- Removal of drm_syncobj_cb
- Introduction of an helper to create and attach the TV margin properties
Driver Changes:
- Improve cache flushes for v3d
- Reflection support for vc4
- HDMI overscan support for vc4
- Add implicit fencing support for rockchip and sun4i
- Switch to generic fbdev emulation for virtio
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: applied amdgpu merge fixup]
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190107180333.amklwycudbsub3s5@flea
Add support for PMIC MIPI sequences using the new
intel_soc_pmic_exec_mipi_pmic_seq_element function.
This fixes the DSI LCD panel not lighting up when not initialized by the
GOP (because an external monitor was connected) on GPD win and GPD pocket
devices.
Specifically the LCD panel seems to need GPIO pin 9 on the PMIC to be
driven high, which is done through a PMIC MIPI sequence. Before this commit
if the sequence was not executed by the GOP the pin would stay low causing
the LCD panel to not work. Having the MIPI sequences properly control this
GPIO should also help save some power when the panel is off.
Changes in v2, v3:
-Only changes to other patches in this patch-set
Changes in v4:
-Move decoding of the raw 15 bytes PMIC MIPI sequence element into
i2c-address, register-address, value and mask into the mipi_exec_pmic()
function instead of passing the raw data to
intel_soc_pmic_exec_mipi_pmic_seq_element()
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190107111556.4510-5-hdegoede@redhat.com
In commit 6bb2a2af8b ("drm/i915/gvt: Fix crash after request->hw_context change"),
forgot to handle workload scan path in ELSP handler case which was to
optimize scanning earlier instead of in gvt submission thread, so request
alloc and add was splitting then which is against right process.
This trys to do a partial revert of that commit which still has workload
request alloc helper and make sure shadow state population is handled after
request alloc for target state buffer.
v3: Fix missed workload status setting in request alloc error path
v2: Fix dispatch workload err path that should add request after alloc anyway.
Fixes: 6bb2a2af8b ("drm/i915/gvt: Fix crash after request->hw_context change")
Cc: Bin Yang <bin.yang@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Bin Yang <bin.yang@intel.com>
Reviewed-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
If we haven't shipped and enabled firmware for a particular platform,
there is nothing the user can do about it. Don't scare the user with an
unactionable, unidentifiable warning!
<6> [310.769452] i915 0000:00:02.0: GuC: No firmware known for this platform!
<4> [310.769458] [drm] HuC: No firmware known for this platform!
Unify both GuC/HuC messages to include the device for which we lack the
firmware, and provide the platform name as an aide-memoire.
v2: Move and refine the message to common site of intel_uc_fw_fetch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190108150246.1471-1-chris@chris-wilson.co.uk
Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
in the shrinker while performing direct-reclaim. The trade-off being
(much) lower latency for non-i915 clients at an increased risk of being
unable to obtain a page from direct-reclaim without hitting the
oom-notifier. The proviso being that we still keep trying to hard
obtain the lock for kswapd so that we can reap under heavy memory
pressure.
v2: Taint all mutexes taken within the shrinker with the struct_mutex
subclass as an early warning system, and drop I915_SHRINK_ACTIVE from
vmap to reduce the number of dangerous paths. We also have to drop
I915_SHRINK_ACTIVE from oom-notifier to be able to make the same claim
that ACTIVE is only used from outside context, which fits in with a
longer strategy of avoiding stalls due to scanning active during
shrinking.
The danger in using the subclass struct_mutex is that we declare
ourselves more knowledgable than lockdep and deprive ourselves of
automatic coverage. Instead, we require ourselves to mark up any mutex
taken inside the shrinker in order to detect lock-inversion, and if we
miss any we are doomed to a deadlock at the worst possible moment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190107115509.12523-1-chris@chris-wilson.co.uk
Generally catch up with 5.0-rc1, and specifically get the changes:
96d4f267e4 ("Remove 'type' argument from access_ok() function")
0b2c8f8b6b ("i915: fix missing user_access_end() in page fault exception case")
594cc251fd ("make 'user_access_begin()' do 'access_ok()'")
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Include the total size of closed vma when reporting the per_ctx_stats of
debugfs/i915_gem_objects.
Whilst adjusting the context tracking, note that we can simply use our
list of contexts in i915->contexts rather than circumlocute via
dev->filelist and the per-file context idr, with the result that we can
show objects allocated to different vm (i.e. contexts within a file).
We change the output to show every context of each client, with its own
unique set of objects (for full-ppgtt machines, i.e. gen7+, for older
hardware all objects are in the global gtt and so can not be associated
with a single context). That should result in no loss of information,
and for gen7+, no duplication of active objects.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190107115509.12523-2-chris@chris-wilson.co.uk
Haswell also requires the RING_IMR flush for its unique vebox setup to
avoid losing interrupts, as per 476af9c260 ("drm/i915/gen6: Flush
RING_IMR changes before changing the global GT IMR"):
On Baytail, notably, we can still detect missed interrupt syndrome
(where we never spot a completed request). In this case, it can be
alleviated by always keeping the interrupt unmasked, implying that the
interrupt is being lost in the window after modifying the IMR. (This is
the reason we still have the posting reads on enable_irq, if we remove
them we miss interrupts!) Having narrowed the issue down to the IMR,
rather than keeping it always enabled, applying the usual posting
read/flush of the RING_IMR before unmasking the GT IMR also seems to
prevent the missed interrupt. So be it.
References: 476af9c260 ("drm/i915/gen6: Flush RING_IMR changes before changing the global GT IMR")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190105115647.4970-1-chris@chris-wilson.co.uk
Pull drm fixes from Dave Airlie:
"Happy New Year, just decloaking from leave to get some stuff from the
last week in before rc1:
core:
- two regression fixes for damage blob and atomic
i915 gvt:
- Some missed GVT fixes from the original pull
amdgpu:
- new PCI IDs
- SR-IOV fixes
- DC fixes
- Vega20 fixes"
* tag 'drm-next-2019-01-05' of git://anongit.freedesktop.org/drm/drm: (53 commits)
drm: Put damage blob when destroy plane state
drm: fix null pointer dereference on null state pointer
drm/amdgpu: Add new VegaM pci id
drm/ttm: Use drm_debug_printer for all ttm_bo_mem_space_debug output
drm/amdgpu: add Vega20 PSP ASD firmware loading
drm/amd/display: Fix MST dp_blank REG_WAIT timeout
drm/amd/display: validate extended dongle caps
drm/amd/display: Use div_u64 for flip timestamp ns to ms
drm/amdgpu/uvd:Change uvd ring name convention
drm/amd/powerplay: add Vega20 LCLK DPM level setting support
drm/amdgpu: print process info when job timeout
drm/amdgpu/nbio7.4: add hw bug workaround for vega20
drm/amdgpu/nbio6.1: add hw bug workaround for vega10/12
drm/amd/display: Optimize passive update planes.
drm/amd/display: verify lane status before exiting verify link cap
drm/amd/display: Fix bug with not updating VSP infoframe
drm/amd/display: Add retry to read ddc_clock pin
drm/amd/display: Don't skip link training for empty dongle
drm/amd/display: Wait edp HPD to high in detect_sink
drm/amd/display: fix surface update sequence
...
Our attempt to account for bit17 swizzling of pread/pwrite onto tiled
objects was flawed due to the simple fact that we do not always know the
swizzling for a particular page (due to the swizzling varying based on
location in certain unbalanced configurations). Furthermore, the
pread/pwrite paths are now unbalanced in that we are required to use the
GTT as in some cases we do not have direct CPU access to the backing
physical pages (thus some paths trying to account for the swizzle, but
others neglecting, chaos ensues).
There are no known users who do use pread/pwrite into a tiled object
(you need to manually detile anyway, so why now just use mmap and avoid
the copy?) and no user bug reports to indicate that it is being used in
the wild. As no one is hitting the buggy path, we can just remove the
buggy code.
v2: Just use the fault allowing kmap() + normal copy_(to|from)_user
v3: Avoid int overflow in computing 'length' from 'remain' (Tvrtko)
References: fe115628d5 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190105120758.9237-1-chris@chris-wilson.co.uk
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.
But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar. Which makes it very hard to verify that the access has
actually been range-checked.
If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin(). But
nothing really forces the range check.
By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses. We have way too long a history of people
trying to avoid them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When commit fddcd00a49 ("drm/i915: Force the slow path after a
user-write error") unified the error handling for various user access
problems, it didn't do the user_access_end() that is needed for the
unsafe_put_user() case.
It's not a huge deal: a missed user_access_end() will only mean that
SMAP protection isn't active afterwards, and for the error case we'll be
returning to user mode soon enough anyway. But it's wrong, and adding
the proper user_access_end() is trivial enough (and doing it for the
other error cases where it isn't needed doesn't hurt).
I noticed it while doing the same prep-work for changing
user_access_begin() that precipitated the access_ok() changes in commit
96d4f267e4 ("Remove 'type' argument from access_ok() function").
Fixes: fddcd00a49 ("drm/i915: Force the slow path after a user-write error")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@kernel.org # v4.20
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we first introduced the reset to sanitize the GPU on taking over
from the BIOS and before returning control to third parties (the BIOS!),
we restricted it to only systems utilizing HW contexts as we were
uncertain of how stable our reset mechanism truly was. We now have
reasonable coverage across all machines that expose a GPU reset method,
and so we should be safe to sanitize the GPU state everywhere.
v2: We _have_ to skip the reset if it would clobber the display.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190103112104.19561-1-chris@chris-wilson.co.uk
On Baytail, notably, we can still detect missed interrupt syndrome
(where we never spot a completed request). In this case, it can be
alleviated by always keeping the interrupt unmasked, implying that the
interrupt is being lost in the window after modifying the IMR. (This is
the reason we still have the posting reads on enable_irq, if we remove
them we miss interrupts!) Having narrowed the issue down to the IMR,
rather than keeping it always enabled, applying the usual posting
read/flush of the RING_IMR before unmasking the GT IMR also seems to
prevent the missed interrupt. So be it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190102163524.19353-1-chris@chris-wilson.co.uk
This trys to make 'kvmgt' module as self loadable instead of loading
by i915/gvt device model. So hypervisor specific module could be
stand-alone, e.g only after loading hypervisor specific module, GVT
feature could be enabled via specific hypervisor interface, e.g VFIO/mdev.
So this trys to use hypervisor module register/unregister interface
for that. Hypervisor module needs to take care of module reference
itself when working for hypervisor interface, e.g for VFIO/mdev,
hypervisor module would reference counting mdev when open and release.
This makes 'kvmgt' module really split from GVT device model. User
needs to load 'kvmgt' to enable VFIO/mdev interface.
v6:
- remove unused variable
v5:
- put module reference in register error path
v4:
- fix checkpatch warning
v3:
- Fix module reference handling for device open and release. Unused
mdev devices would be cleaned up in device unregister when module unload.
v2:
- Fix kvmgt order after i915 for built-in case
Cc: "Yuan, Hang" <hang.yuan@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: "He, Min" <min.he@intel.com>
Reviewed-by: Yuan, Hang <hang.yuan@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Pull IOMMU updates from Joerg Roedel:
- Page table code for AMD IOMMU now supports large pages where smaller
page-sizes were mapped before. VFIO had to work around that in the
past and I included a patch to remove it (acked by Alex Williamson)
- Patches to unmodularize a couple of IOMMU drivers that would never
work as modules anyway.
- Work to unify the the iommu-related pointers in 'struct device' into
one pointer. This work is not finished yet, but will probably be in
the next cycle.
- NUMA aware allocation in iommu-dma code
- Support for r8a774a1 and r8a774c0 in the Renesas IOMMU driver
- Scalable mode support for the Intel VT-d driver
- PM runtime improvements for the ARM-SMMU driver
- Support for the QCOM-SMMUv2 IOMMU hardware from Qualcom
- Various smaller fixes and improvements
* tag 'iommu-updates-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (78 commits)
iommu: Check for iommu_ops == NULL in iommu_probe_device()
ACPI/IORT: Don't call iommu_ops->add_device directly
iommu/of: Don't call iommu_ops->add_device directly
iommu: Consolitate ->add/remove_device() calls
iommu/sysfs: Rename iommu_release_device()
dmaengine: sh: rcar-dmac: Use device_iommu_mapped()
xhci: Use device_iommu_mapped()
powerpc/iommu: Use device_iommu_mapped()
ACPI/IORT: Use device_iommu_mapped()
iommu/of: Use device_iommu_mapped()
driver core: Introduce device_iommu_mapped() function
iommu/tegra: Use helper functions to access dev->iommu_fwspec
iommu/qcom: Use helper functions to access dev->iommu_fwspec
iommu/of: Use helper functions to access dev->iommu_fwspec
iommu/mediatek: Use helper functions to access dev->iommu_fwspec
iommu/ipmmu-vmsa: Use helper functions to access dev->iommu_fwspec
iommu/dma: Use helper functions to access dev->iommu_fwspec
iommu/arm-smmu: Use helper functions to access dev->iommu_fwspec
ACPI/IORT: Use helper functions to access dev->iommu_fwspec
iommu: Introduce wrappers around dev->iommu_fwspec
...
The irq_seqno_barrier is a tradeoff between doing work on every request
(on the GPU) and doing work after every interrupt (on the CPU). We
presume we have many more requests than interrupts! However, for
Ironlake, the workaround is a pretty hideous usleep() and so even though
it was found we need to repeat the MI_STORE_DWORD_IMM 8 times, or about
1us of GPU time, doing so is preferrable than requiring a sleep of
125-250us on the CPU where we desire to respond immediately (ideally from
within the interrupt handler)!
The additional MI_STORE_DWORD_IMM also have the side-effect of flushing
MI operations from userspace which are not caught by MI_FLUSH!
Testcase: igt/gem_sync
Testcase: igt/gem_exec_whisper
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-5-chris@chris-wilson.co.uk
The irq_seqno_barrier is a tradeoff between doing work on every request
(on the GPU) and doing work after every interrupt (on the CPU). We
presume we have many more requests than interrupts! However, the current
w/a for Ivybridge is an implicit delay that currently fails sporadically
and consistently if we move the w/a into the irq handler itself. This
makes the CPU barrier untenable for upcoming interrupt handler changes
and so we need to replace it with a delay on the GPU before we send the
MI_USER_INTERRUPT. As it turns out that delay is 32x MI_STORE_DWORD_IMM,
or about 0.6us per request! Quite nasty, but the lesser of two evils
looking to the future.
Testcase: igt/gem_sync
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-4-chris@chris-wilson.co.uk
Having transitioned to using PIPECONTROL to combine the flush with the
breadcrumb write using their post-sync functions, assume that this will
resolve the serialisation with the subsequent MI_USER_INTERRUPT. That is
when inspecting the breadcrumb after an interrupt we can rely on the write
being posted (i.e. the HWSP will be coherent).
Testing using gem_sync shows that the PIPECONTROL + CS stall does
serialise the command streamer sufficient that the breadcrumb lands
before the MI_USER_INTERRUPT. The same is not true for MI_FLUSH_DW.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-2-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/intel_display.c:10708: warning: Function parameter or member 'cur' not described in 'intel_wm_need_update'
drivers/gpu/drm/i915/intel_display.c:10708: warning: Function parameter or member 'new' not described in 'intel_wm_need_update'
drivers/gpu/drm/i915/intel_display.c:10708: warning: Excess function parameter 'plane' description in 'intel_wm_need_update'
drivers/gpu/drm/i915/intel_display.c:10708: warning: Excess function parameter 'state' description in 'intel_wm_need_update'
References: cd1d3ee90e ("drm/i915: Use intel_ types more consistently for watermark code (v2)")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181231143505.2523-1-chris@chris-wilson.co.uk
Pull Kconfig updates from Masahiro Yamada:
- support -y option for merge_config.sh to avoid downgrading =y to =m
- remove S_OTHER symbol type, and touch include/config/*.h files correctly
- fix file name and line number in lexer warnings
- fix memory leak when EOF is encountered in quotation
- resolve all shift/reduce conflicts of the parser
- warn no new line at end of file
- make 'source' statement more strict to take only string literal
- rewrite the lexer and remove the keyword lookup table
- convert to SPDX License Identifier
- compile C files independently instead of including them from zconf.y
- fix various warnings of gconfig
- misc cleanups
* tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning
kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings
kconfig: add static qualifiers to fix gconf warnings
kconfig: split the lexer out of zconf.y
kconfig: split some C files out of zconf.y
kconfig: convert to SPDX License Identifier
kconfig: remove keyword lookup table entirely
kconfig: update current_pos in the second lexer
kconfig: switch to ASSIGN_VAL state in the second lexer
kconfig: stop associating kconf_id with yylval
kconfig: refactor end token rules
kconfig: stop supporting '.' and '/' in unquoted words
treewide: surround Kconfig file paths with double quotes
microblaze: surround string default in Kconfig with double quotes
kconfig: use T_WORD instead of T_VARIABLE for variables
kconfig: use specific tokens instead of T_ASSIGN for assignments
kconfig: refactor scanning and parsing "option" properties
kconfig: use distinct tokens for type and default properties
kconfig: remove redundant token defines
kconfig: rename depends_list to comment_option_list
...
Patch series "mmu notifier contextual informations", v2.
This patchset adds contextual information, why an invalidation is
happening, to mmu notifier callback. This is necessary for user of mmu
notifier that wish to maintains their own data structure without having to
add new fields to struct vm_area_struct (vma).
For instance device can have they own page table that mirror the process
address space. When a vma is unmap (munmap() syscall) the device driver
can free the device page table for the range.
Today we do not have any information on why a mmu notifier call back is
happening and thus device driver have to assume that it is always an
munmap(). This is inefficient at it means that it needs to re-allocate
device page table on next page fault and rebuild the whole device driver
data structure for the range.
Other use case beside munmap() also exist, for instance it is pointless
for device driver to invalidate the device page table when the
invalidation is for the soft dirtyness tracking. Or device driver can
optimize away mprotect() that change the page table permission access for
the range.
This patchset enables all this optimizations for device drivers. I do not
include any of those in this series but another patchset I am posting will
leverage this.
The patchset is pretty simple from a code point of view. The first two
patches consolidate all mmu notifier arguments into a struct so that it is
easier to add/change arguments. The last patch adds the contextual
information (munmap, protection, soft dirty, clear, ...).
This patch (of 3):
To avoid having to change many callback definition everytime we want to
add a parameter use a structure to group all parameters for the
mmu_notifier invalidate_range_start/end callback. No functional changes
with this patch.
[akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband]
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for removing the manual EMIT_FLUSH prior to emitting the
breadcrumb implement the flush inline with writing the breadcrumb for
ringbuffer emission.
With a combined flush+breadcrumb, we can use a single operation to both
flush and after the flush is complete (post-sync) write the breadcrumb.
This gives us a strongly ordered operation that should be sufficient to
serialise the write before we emit the interrupt; and therefore we may
take the opportunity to remove the irq_seqno_barrier w/a for gen6+.
Although using the PIPECONTROL to write the breadcrumb is slower than
MI_STORE_DWORD_IMM, by combining the operations into one and removing the
extra flush (next patch) it is faster
For gen2-5, we simply combine the MI_FLUSH into the breadcrumb emission,
though maybe we could find a solution here to the seqno-vs-interrupt
issue on Ironlake by mixing up the flush? The answer is no, adding an
MI_FLUSH before the interrupt is insufficient.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228153114.4948-2-chris@chris-wilson.co.uk
The writing is on the wall for the existence of a single execution queue
along each engine, and as a consequence we will not be able to track
dependencies along the HW queue itself, i.e. we will not be able to use
HW semaphores on gen7 as they use a global set of registers (and unlike
gen8+ we can not effectively target memory to keep per-context seqno and
dependencies).
On the positive side, when we implement request reordering for gen7 we
also can not presume a simple execution queue and would also require
removing the current semaphore generation code. So this bring us another
step closer to request reordering for ringbuffer submission!
The negative side is that using interrupts to drive inter-engine
synchronisation is much slower (4us -> 15us to do a nop on each of the 3
engines on ivb). This is much better than it was at the time of introducing
the HW semaphores and equally important userspace weaned itself off
intermixing dependent BLT/RENDER operations (the prime culprit was glyph
rendering in UXA). So while we regress the microbenchmarks, it should not
impact the user.
References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk