Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.
Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...
Doing a "get_user_pages()" on a copy-on-write page for reading can be
ambiguous: the page can be COW'ed at any time afterwards, and the
direction of a COW event isn't defined.
Yes, whoever writes to it will generally do the COW, but if the thread
that did the get_user_pages() unmapped the page before the write (and
that could happen due to memory pressure in addition to any outright
action), the writer could also just take over the old page instead.
End result: the get_user_pages() call might result in a page pointer
that is no longer associated with the original VM, and is associated
with - and controlled by - another VM having taken it over instead.
So when doing a get_user_pages() on a COW mapping, the only really safe
thing to do would be to break the COW when getting the page, even when
only getting it for reading.
At the same time, some users simply don't even care.
For example, the perf code wants to look up the page not because it
cares about the page, but because the code simply wants to look up the
physical address of the access for informational purposes, and doesn't
really care about races when a page might be unmapped and remapped
elsewhere.
This adds logic to force a COW event by setting FOLL_WRITE on any
copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
pointer as a result.
The current semantics end up being:
- __get_user_pages_fast(): no change. If you don't ask for a write,
you won't break COW. You'd better know what you're doing.
- get_user_pages_fast(): the fast-case "look it up in the page tables
without anything getting mmap_sem" now refuses to follow a read-only
page, since it might need COW breaking. Which happens in the slow
path - the fast path doesn't know if the memory might be COW or not.
- get_user_pages() (including the slow-path fallback for gup_fast()):
for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
very similar semantics to FOLL_FORCE.
If it turns out that we want finer granularity (ie "only break COW when
it might actually matter" - things like the zero page are special and
don't need to be broken) we might need to push these semantics deeper
into the lookup fault path. So if people care enough, it's possible
that we might end up adding a new internal FOLL_BREAK_COW flag to go
with the internal FOLL_COW flag we already have for tracking "I had a
COW".
Alternatively, if it turns out that different callers might want to
explicitly control the forced COW break behavior, we might even want to
make such a flag visible to the users of get_user_pages() instead of
using the above default semantics.
But for now, this is mostly commentary on the issue (this commit message
being a lot bigger than the patch, and that patch in turn is almost all
comments), with that minimal "enable COW breaking early" logic using the
existing FOLL_WRITE behavior.
[ It might be worth noting that we've always had this ambiguity, and it
could arguably be seen as a user-space issue.
You only get private COW mappings that could break either way in
situations where user space is doing cooperative things (ie fork()
before an execve() etc), but it _is_ surprising and very subtle, and
fork() is supposed to give you independent address spaces.
So let's treat this as a kernel issue and make the semantics of
get_user_pages() easier to understand. Note that obviously a true
shared mapping will still get a page that can change under us, so this
does _not_ mean that get_user_pages() somehow returns any "stable"
page ]
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill Shutemov <kirill@shutemov.name>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While the current locking/serialization of the global state
suffices for protecting the obj->state access and the actual
hardware reprogramming, we do have a problem with accessing
the old/new states during nonblocking commits.
The state computation and swap will be protected by the crtc
locks, but the commit_tails can finish out of order, thus also
causing the atomic states to be cleaned up out of order. This
would mean the commit that started first but finished last has
had its new state freed as the no-longer-needed old state by the
other commit.
To fix this let's just refcount the states. obj->state amounts
to one reference, and the intel_atomic_state holds extra references
to both its new and old global obj states.
Fixes: 0ef1905ecf ("drm/i915: Introduce better global state handling")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527200245.13184-1-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
(cherry picked from commit f8c86ffa28)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Pull uaccess/readdir updates from Al Viro:
"Finishing the conversion of readdir.c to unsafe_... API.
This includes the uaccess_{read,write}_begin series by Christophe
Leroy"
* 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
readdir.c: get rid of the last __put_user(), drop now-useless access_ok()
readdir.c: get compat_filldir() more or less in sync with filldir()
switch readdir(2) to unsafe_copy_dirent_name()
drm/i915/gem: Replace user_access_begin by user_write_access_begin
uaccess: Selectively open read or write user access
uaccess: Add user_read_access_begin/end and user_write_access_begin/end
Ever noticed that our interrupt handlers are where we spend most of our
time on a busy system? In part this is unavoidable as each interrupt
requires to poll and reset several registers, but we can try and do so as
efficiently as possible.
Function old new delta
ilk_irq_handler 2317 2156 -161
v2: Restore the irqreturn_t ret
Function old new delta
ilk_irq_handler.cold 63 72 +9
ilk_irq_handler 2221 2080 -141
A slight improvement in the baseline overnight as well!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200601140355.20243-1-chris@chris-wilson.co.uk
While the current locking/serialization of the global state
suffices for protecting the obj->state access and the actual
hardware reprogramming, we do have a problem with accessing
the old/new states during nonblocking commits.
The state computation and swap will be protected by the crtc
locks, but the commit_tails can finish out of order, thus also
causing the atomic states to be cleaned up out of order. This
would mean the commit that started first but finished last has
had its new state freed as the no-longer-needed old state by the
other commit.
To fix this let's just refcount the states. obj->state amounts
to one reference, and the intel_atomic_state holds extra references
to both its new and old global obj states.
Fixes: 0ef1905ecf ("drm/i915: Introduce better global state handling")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527200245.13184-1-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Our forcewake utilisation is split into categories: automatic and
manual. Around bare register reads, we look up the right forcewake
domain and automatically acquire and release [upon a timer] the
forcewake domain. For other access, where we know we require the
forcewake across a group of register reads, we manually acquire the
forcewake domain and release it at the end. Again, this currently arms
the domain timer for a later release.
However, looking at some energy utilisation profiles, we have tried to
avoid using forcewake [and rely on the natural wake up to post register
updates] due to that even keep the fw active for a brief period
contributes to a significant power draw [i.e. when the gpu is sleeping
with rc6 at high clocks]. But as it turns out, not posting the writes
immediately also has unintended consequences, such as not reducing the
clocks and so conserving power while busy.
As a compromise, let us only arm the domain timer for automatic
forcewake usage around bare register access, but immediately release the
forcewake when manually acquired by intel_uncore_forcewake_get/_put.
The corollary to this is that we may instead have to take forcewake more
often, and so incur a latency penalty in doing so. For Sandybridge this
was significant, and even on the latest machines, taking forcewake at
interrupt frequency is a huge impact. [So we don't do that anymore!
Hopefully, this will spare us from still needing the mitigation of the
timer for steady state execution.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200601072446.19548-13-chris@chris-wilson.co.uk
With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of a unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it back to our submission queue, and cleared the active bit.
We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.
As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
(cherry picked from commit b55230e5e8)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user that waits upon the virtual engine, but along the physical engine
in use, will then see that it is due to be submitted to the same engine
and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!
v2: Beware restriction of signal->execution_mask prior to submission.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
(cherry picked from commit 511b6d9aed)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently the plane property doesn't have support for YCBCR_BT2020,
which enables the corresponding color conversion mode on plane CSC.
Enabling the plane property for the planes for GLK & ICL+ platforms.
Also as per spec, update the Plane Color CSC from YUV601_TO_RGB709
to YUV601_TO_RGB601.
V2: Enabling support for YCBCT_BT2020 for HDR planes on
platforms GLK & ICL
V3: Refined the condition check to handle GLK & ICL+ HDR planes
Also added BT2020 handling in glk_plane_color_ctl.
V4: Combine If-else into single If
V5: Drop the checking for HDR planes and enable YCBCR_BT2020
for platforms GLK & ICL+.
V6: As per Spec, update PLANE_COLOR_CSC_MODE_YUV601_TO_RGB709
to PLANE_COLOR_CSC_MODE_YUV601_TO_RGB601 as per Ville's
feedback.
V7: Rebased
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200601073544.11291-1-kishore.kadiyala@intel.com
With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of a unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it back to our submission queue, and cleared the active bit.
We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.
As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Testcase: igt/gem_exec_balancer/bonded-dual
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
Replace the use of mode->private_flags with a truly private bitmaks
in our own crtc state. We also need a copy in the crtc itself so the
vblank code can get at it. We already have scanline_offset in there
for a similar reason, as well as the vblank->hwmode which is assigned
via drm_calc_timestamping_constants(). Fortunately we now have a
nice place for doing the crtc_state->crtc copy in
intel_crtc_update_active_timings() which gets called both for
modesets and init/resume readout.
The one slightly iffy spot is the INHERITED flag which we want to
preserve until userspace/fb_helper does the first proper commit after
actually calling .detecti() on the connectors. Otherwise we don't have
the full sink capabilities (audio,infoframes,etc.) when .compute_config()
gets called and thus we will fail to enable those features when the
first userspace commit happens. The only internal commit we do prior to
that should be from intel_initial_commit() and there we can simply
preserve the INHERITED flag from the readout.
v2: Deal with INHERITED in sanitize_watermarks() as well
CC: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429103904.11727-1-ville.syrjala@linux.intel.com
We should be able to skip restoring LOCAL (user) binds within the GGTT on
resume and let them be restored upon demand. However, our consistency
checks demand that the bind flags match the node state, and we cannot
simply clear the flags, we need to evict as well. For now, make sure we
restore the bind flags exactly upon resume.
Fixes: 0109a16ef3 ("drm/i915/gt: Clear LOCAL_BIND from shared GGTT on resume")
Fixes: bf0840cdb3 ("drm/i915/gt: Stop cross-polluting PIN_GLOBAL with PIN_USER with no-ppgtt")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200528150452.7880-1-chris@chris-wilson.co.uk
We have a I915_REQUEST_NOPREEMPT flag that we set when we must prevent
the HW from preempting during the course of this request. We need to
honour this flag and protect the HW even if we have a heartbeat request,
or other maximum priority barrier, pending. As such, restrict the
timeslicing check to avoid preempting into the topmost priority band,
leaving the unpreemptable requests in blissful peace running
uninterrupted on the HW.
v2: Set the I915_PRIORITY_BARRIER to be less than
I915_PRIORITY_UNPREEMPTABLE so that we never submit a request
(heartbeat or barrier) that can legitimately preempt the current
non-premptable request.
Fixes: 2a98f4e65b ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527162418.24755-1-chris@chris-wilson.co.uk
gcc-9 gets confused by the code flow in check_dirty_whitelist:
drivers/gpu/drm/i915/gt/selftest_workarounds.c: In function 'check_dirty_whitelist':
drivers/gpu/drm/i915/gt/selftest_workarounds.c:492:17: error: 'rsvd' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I could not figure out a good way to do this in a way that gcc
understands better, so initialize the variable to zero, as last
resort.
Fixes: aee20aaed8 ("drm/i915: Implement read-only support in whitelist selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527140526.1458215-2-arnd@arndb.de
Conditional spinlocks make it hard for gcc and for lockdep to
follow the code flow. This one causes a warning with at least
gcc-9 and higher:
In file included from include/linux/irq.h:14,
from drivers/gpu/drm/i915/i915_pmu.c:7:
drivers/gpu/drm/i915/i915_pmu.c: In function 'i915_sample':
include/linux/spinlock.h:289:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]
289 | _raw_spin_unlock_irqrestore(lock, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_pmu.c:288:17: note: 'flags' was declared here
288 | unsigned long flags;
| ^~~~~
Split out the part between the locks into a separate function
for readability and to let the compiler figure out what the
logic actually is.
Fixes: d79e1bd676 ("drm/i915/pmu: Only use exclusive mmio access for gen7")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200527140526.1458215-1-arnd@arndb.de
We only restore GLOBAL binds upon resume as we expect these to be pinned
for use by HW, whereas the LOCAL binds can be recreated on demand once
userspace is resumed. For the LOCAL bind to be recreated in the global
GTT (for old systems without ppgtt), we need to clear its presence flag
on deciding not to restore the mapping upon resume.
Fixes: bf0840cdb3 ("drm/i915/gt: Stop cross-polluting PIN_GLOBAL with PIN_USER with no-ppgtt")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526150739.26147-1-chris@chris-wilson.co.uk
When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user that waits upon the virtual engine, but along the physical engine
in use, will then see that it is due to be submitted to the same engine
and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!
v2: Beware restriction of signal->execution_mask prior to submission.
Fixes: 6d06779e86 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
Reduce the irq_work llist for attaching the callbacks to the signal for
both smaller structs (two fewer pointers!) and simpler [debug] code:
Function old new delta
irq_execute_cb 35 34 -1
__igt_breadcrumbs_smoketest 1684 1682 -2
i915_request_retire 2003 1996 -7
__i915_request_create 1047 1040 -7
__notify_execute_cb 135 126 -9
__i915_request_ctor 188 178 -10
__await_execution.part.constprop 451 440 -11
igt_wait_request 924 714 -210
One minor artifact is that the order of cb exection is reversed. No
current use cases are affected by that change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200526112051.10229-1-chris@chris-wilson.co.uk