To work around a DMC/Punit issue on ICL where the driver's
ICL_PORT_COMP_DW8/IREFGEN PHY setting is lost when entering/exiting DC6
state, make sure to reinit the PHY whenever disabling DC states.
Similarly the driver's PHY/DBUF/CDCLK settings should have been preserved
across DC5/6 transitions, so check this on all platforms.
This gets rid of the following WARN during suspend:
Combo PHY A HW state changed unexpectedly
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816095523.15800-1-imre.deak@intel.com
Factor the main copy page to vram routine out into a helper that acts
on a single page and which doesn't require the nouveau_dmem_migrate
structure for argument passing. As an added benefit the new version
only allocates the dma address array once and reuses it for each
subsequent chunk of work.
Link: https://lore.kernel.org/r/20190814075928.23766-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Factor the main copy page to ram routine out into a helper that acts on
a single page and which doesn't require the nouveau_dmem_fault
structure for argument passing. Also remove the loop over multiple
pages as we only handle one at the moment, although the structure of
the main worker function makes it relatively easy to add multi page
support back if needed in the future. But at least for now this avoid
the needed to dynamically allocate memory for the dma addresses in
what is essentially the page fault path.
Link: https://lore.kernel.org/r/20190814075928.23766-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There isn't any good reason to pass callbacks to migrate_vma. Instead
we can just export the three steps done by this function to drivers and
let them sequence the operation without callbacks. This removes a lot
of boilerplate code as-is, and will allow the drivers to drastically
improve code flow and error handling further on.
Link: https://lore.kernel.org/r/20190814075928.23766-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When using mmu_notifer_unregister_no_release() the caller must ensure
there is a SRCU synchronize before the mn memory is freed, otherwise use
after free races are possible, for instance:
CPU0 CPU1
invalidate_range_start
hlist_for_each_entry_rcu(..)
mmu_notifier_unregister_no_release(&p->mn)
kfree(mn)
if (mn->ops->invalidate_range_end)
The error unwind in amdkfd misses the SRCU synchronization.
amdkfd keeps the kfd_process around until the mm is released, so split the
flow to fully initialize the kfd_process and register it for find_process,
and with the notifier. Past this point the kfd_process does not need to be
cleaned up as it is fully ready.
The final failable step does a vm_mmap() and does not seem to impact the
kfd_process global state. Since it also cannot be undone (and already has
problems with undo if it internally fails), it has to be last.
This way we don't have to try to unwind the mmu_notifier_register() and
avoid the problem with the SRCU.
Along the way this also fixes various other error unwind bugs in the flow.
Fixes: 45102048f7 ("amdkfd: Add process queue manager module")
Link: https://lore.kernel.org/r/20190806231548.25242-10-jgg@ziepe.ca
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
radeon is using a device global hash table to track what mmu_notifiers
have been registered on struct mm. This is better served with the new
get/put scheme instead.
radeon has a bug where it was not blocking notifier release() until all
the BO's had been invalidated. This could result in a use after free of
pages the BOs. This is tied into a second bug where radeon left the
notifiers running endlessly even once the interval tree became
empty. This could result in a use after free with module unload.
Both are fixed by changing the lifetime model, the BOs exist in the
interval tree with their natural lifetimes independent of the mm_struct
lifetime using the get/put scheme. The release runs synchronously and just
does invalidate_start across the entire interval tree to create the
required DMA fence.
Additions to the interval tree after release are already impossible as
only current->mm is used during the add.
Link: https://lore.kernel.org/r/20190806231548.25242-9-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.
mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.
It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.
The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.
Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The 'memory-region' property of the komeda display driver DT binding
allows the use of a 'reserved-memory' node for buffer allocations. Add
the requisite of_reserved_mem_device_{init,release} calls to actually
make use of the memory if present.
Changes since v1:
- Move handling inside komeda_parse_dt
Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com>
Signed-off-by: Ayan Kumar Halder <ayan.halder@arm.com>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Link:- https://patchwork.kernel.org/patch/11076413/
Use the new cec_notifier_conn_(un)register() functions to
(un)register the notifier for the HDMI connector, and fill in
the cec_connector_info.
Changes since v6:
- move cec_notifier_conn_unregister to a bridge detach
function,
- add a mutex protecting a CEC notifier.
Changes since v4:
- typo fix
Changes since v2:
- removed unnecessary NULL check before a call to
cec_notifier_conn_unregister,
- use cec_notifier_phys_addr_invalidate to invalidate physical
address.
Changes since v1:
Add memory barrier to make sure that the notifier
becomes visible to the irq thread once it is fully
constructed.
Signed-off-by: Dariusz Marcinkiewicz <darekm@google.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190814104520.6001-9-darekm@google.com
This improves Sphinx output in two ways:
- It avoids an unmatched single-quote ('), about which Sphinx complained:
Documentation/gpu/drm-internals.rst:298:
WARNING: Could not lex literal_block as "c". Highlighting skipped.
An alternative approach would be to replace "can't" with a word that
doesn't have a single-quote.
- It lets Sphinx format the comments in italics and grey, making the
code slightly easier to read.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Daniel Vetter <daniel@ffwll.ch> [via irc]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808163629.14280-1-j.neuschaefer@gmx.net
If there is no regulator defined for the GPU then still control the
frequency using the supplied clock.
Some boards have clock control but no (direct) control of the regulator.
For example the HiKey960 uses a mailbox protocol to a MCU to control
frequencies and doesn't directly control the voltage. This patch allows
frequency control of the GPU on this system.
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816093107.30518-1-steven.price@arm.com
Up until now, a single shared GPU address space was used. This is not
ideal as there's no protection between processes and doesn't work for
supporting the same GPU/CPU VA feature. Most importantly, this will
hopefully mitigate Alyssa's fear of WebGL, whatever that is.
Most of the changes here are moving struct drm_mm and struct
panfrost_mmu objects from the per device struct to the per FD struct.
The critical function is panfrost_mmu_as_get() which handles allocating
and switching the h/w address spaces.
There's 3 states an AS can be in: free, allocated, and in use. When a
job runs, it requests an address space and then marks it not in use when
job is complete(but stays assigned). The first time thru, we find a free
AS in the alloc_mask and assign the AS to the FD. Then the next time
thru, we most likely already have our AS and we just mark it in use with
a ref count. We need a ref count because we have multiple job slots. If
the job/FD doesn't have an AS assigned and there are no free ones, then
we pick an allocated one not in use from our LRU list and switch the AS
from the old FD to the new one.
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813150115.30338-1-robh@kernel.org
Our pin mapping tables for ICP and MCC currently only list the standard
GPIO pins used for various output ports. Even through ICP's standard
pin usage only utilizes pins 1, 2, and 9-12, and MCC's standard pin
usage only uses pins 1, 2, and 9, these platforms do still have GPIO
registers to address pins in the range 1-3 and 9-14. OEM's may remap
GPIO usage in non-standard ways (and provide the actual mapping via VBT
settings), so we shouldn't exclude pins on these platforms just because
they aren't part of the standard mappings.
TGP's standard pin tables contains all the possible pins, so let's
rename them to "icp" and use them for all PCH >= PCH_ICP. This will
prevent intel_gmbus_is_valid_pin from rejecting non-standard pin usage
that an OEM specifies via the VBT.
Note that this will cause pin 9 to be labeled as "tc1" instead of "dpc"
in debug messages on platforms with the MCC PCH, but that may actually
help avoid confusion since the text strings will now be the same on all
gen11+ platforms instead of being different on just EHL.
v2: Drop now-unused MCC_DDC_BUS_DDI_* names.
v3: We want to compare against INTEL_PCH_TYPE, not INTEL_PCH_ID.
Bspec: 8417
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817005041.20651-1-matthew.d.roper@intel.com
The first pixel of the next tile is only sampled by the hardware if the
fractional input position corresponding to the last written output pixel
is not an integer position.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Burst aligned input and output width can be calculated once per column,
instead of repeatedly for each tile in the column. The same goes for
input and output height per row. Also don't round up the same values
repeatedly.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
If we managed to create tiles sized 0x0 because of a bug in the seam
calculation, return with an error message instead of letting the driver
run into a division by zero later. Also check for tile sizes that are
larger than supported by the hardware.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This patch effectively reverts commit 912bbf7e9c ("gpu: ipu-v3:
image-convert: Fix image downsize coefficients") and replaces it with a
different solution based on the preceding patches.
The previous fix tried to solve the problem of intermediate tile size
between IC downsizing and main processing sections not being limited to
1024 pixels by downsizing the input image to a smaller intermediate size
in the downsizing box filter. This causes unnecessary blurring,
especially for scaling factors close to 1.
Now that the seam position calculation makes sure that the 1024 pixel
intermediate tile size limit is not exceeded, calculate the number of
tiles from the maximum of intermediate size and output size and avoid
unnecessary downsizing.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Limit the input seam position to an interval that guarantees the tile
size does not exceed 1024 pixels after the IC downsizing section and
that space is left for the next tile.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This fixes a failure to determine any seam if the output size is
exactly 1024 multiplied by the number of tiles in a given direction.
In that case an empty interval out_start == out_end is being passed
to find_best_seam, which looks for a seam out_start <= x < out_end.
Also reduce the interval for all but the left column / top row, to
avoid returning position 0 as best fit.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Support is already implemented for the corresponding DRM formats,
just hook up the remaining V4L2 pixel formats.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marco Felsch <m.felsch@pengutronix.de>
Each iteration of for_each_child_of_node puts the previous
node, but in the case of a goto from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
goto in two places.
Issue found with Coccinelle.
Fixes: 119f517362 (drm/mediatek: Add DRM Driver for Mediatek SoC MT8173)
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: CK Hu <ck.hu@mediatek.com>