[ Upstream commit cabbdea1f1861098991768d7bbf5a49ed1608213 ]
Pointer mqd_mem_obj can be deallocated in kfd_gtt_sa_allocate().
The function then returns non-zero value, which causes the second deallocation.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: d1f8f0d17d ("drm/amdkfd: Move non-sdma mqd allocation out of init_mqd")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c6b6e18c890f30965b0589b0a57645e1dbccfde ]
dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated
each time the queue gets evicted.
Fixes: b8020b0304 ("drm/amdkfd: Enable over-subscription with >1 GWS queue")
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]
As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid this confusion.
Signed-off-by: QintaoShen <unSimple1993@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS
v2: squash in warning fix (Alex)
Signed-off-by: Tushar Patel <tushar.patel@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]
In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case. When reset been triggered again, driver should avoid to do uninitialization again.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6f4b590aae217da16cfa44039a2abcfb209137ab ]
When IOMMU disabled in sbios and kfd in iommuv2 path,
IOMMU resume failure blocks system resume. Don't allow kfd to
use iommu v2 when iommu is disabled.
Reported-by: youling <youling257@gmail.com>
Tested-by: youling <youling257@gmail.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1ec06c2dee679e9f089e78ed20cb74ee90155f61 ]
On systems with multiple SH per SE compute_static_thread_mgmt_se#
is split into independent masks, one for each SH, in the upper and
lower 16 bits. We need to detect this and apply cu masking to each
SH. The cu mask bits are assigned first to each SE, then to
alternate SHs, then finally to higher CU id. This ensures that
the maximum number of SPIs are engaged as early as possible while
balancing CU assignment to each SH.
v2: Use max SH/SE rather than max SH in cu_per_sh.
v3: Fix comment blocks, ensure se_mask is initially zero filled,
and correctly assign se.sh.cu positions to unset bits in cu_mask.
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcdb4d904b4bd3078fe8d4d24b1658560d6078ef ]
3 cases of kobj leak, which causes memory leak:
kobj_type must have release() method to free memory from release
callback. Don't need NULL default_attrs to init kobj.
sysfs files created under kobj_status should be removed with kobj_status
as parent kobject.
Remove queue sysfs files when releasing queue from process MMU notifier
release callback.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a7b2451d31cfa2e8aeccf3b35612ce33f02371fc ]
Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a
circular lock. destroy_queue_nocpsch_locked is called under a DQM lock,
which is taken in MMU notifiers, potentially in FS reclaim context.
Taking another lock, which is BO reservation lock from free_mqd, while
causing an FS reclaim inside the DQM lock creates a problematic circular
lock dependency. Therefore move free_mqd out of
destroy_queue_nocpsch_locked and call it after unlocking DQM.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63f6e01237257e7226efc5087f3f0b525d320f54 ]
get_wave_state acquires the mmap_lock on copy_to_user but so do
mmu_notifiers. mmu_notifiers allows dqm locking so do get_wave_state
outside the dqm_lock to prevent circular locking.
v2: squash in unused variable removal.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1e87068570a2cc4db5f95a881686add71729e769 ]
Using 'imply AMD_IOMMU_V2' does not guarantee that the driver can link
against the exported functions. If the GPU driver is built-in but the
IOMMU driver is a loadable module, the kfd_iommu.c file is indeed
built but does not work:
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_bind_process_to_device':
kfd_iommu.c:(.text+0x516): undefined reference to `amd_iommu_bind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_unbind_process':
kfd_iommu.c:(.text+0x691): undefined reference to `amd_iommu_unbind_pasid'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_suspend':
kfd_iommu.c:(.text+0x966): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x97f): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0x9a4): undefined reference to `amd_iommu_free_device'
x86_64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_iommu.o: in function `kfd_iommu_resume':
kfd_iommu.c:(.text+0xa9a): undefined reference to `amd_iommu_init_device'
x86_64-linux-ld: kfd_iommu.c:(.text+0xadc): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xaff): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xc72): undefined reference to `amd_iommu_bind_pasid'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe08): undefined reference to `amd_iommu_set_invalidate_ctx_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe26): undefined reference to `amd_iommu_set_invalid_ppr_cb'
x86_64-linux-ld: kfd_iommu.c:(.text+0xe42): undefined reference to `amd_iommu_free_device'
Use IS_REACHABLE to only build IOMMU-V2 support if the amd_iommu symbols
are reachable by the amdkfd driver. Output a warning if they are not,
because that may not be what the user was expecting.
Fixes: 64d1c3a43a ("drm/amdkfd: Centralize IOMMUv2 code and make it conditional")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream.
Amdgpu driver uses 4-byte data type as DQM fence memory,
and transmits GPU address of fence memory to microcode
through query status PM4 message. However, query status
PM4 message definition and microcode processing are all
processed according to 8 bytes. Fence memory only allocates
4 bytes of memory, but microcode does write 8 bytes of memory,
so there is a memory corruption.
Changes since v1:
* Change dqm->fence_addr as a u64 pointer to fix this issue,
also fix up query_status and amdkfd_fence_wait_timeout function
uses 64 bit fence value to make them consistent.
Signed-off-by: Qu Huang <jinsdb@126.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fb8b1fc4dd1035a264c81d15d41f05884cc8058 upstream.
memalloc_nofs_save/restore are no longer sufficient to prevent recursive
lock warnings when holding locks that can be taken in MMU notifiers. Use
memalloc_noreclaim_save/restore instead.
Fixes: f920e413ff9c ("mm: track mmu notifiers in fs_reclaim_acquire/release")
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.10.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8b335bff643f3b39935c7377dbcd361c5b605d98 ]
KASAN reported a slab-out-of-bounds read of size 1 in
kdf_create_vcrat_image_cpu().
This occurs when, for example, when on an x86_64 with a single NUMA node
because kfd_fill_iolink_info_for_cpu() is a no-op, but afterwards the
sub_type_hdr->length, which is out-of-bounds, is read and multiplied by
entries. Fortunately, entries is 0 in this case so the overall
crat_table->length is still correct.
Check if there were any entries before de-referencing sub_type_hdr which
may be pointing to out-of-bounds memory.
Fixes: b7b6c38529 ("drm/amdkfd: Calculate CPU VCRAT size dynamically (v2)")
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4cb773c702be5519442c8375a6476d08fe2cb46 ]
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping which can prevent the memory leak.
In kfd_create_crat_image_acpi(), crat_table is copied to pcrat_image,
and in kfd_create_vcrat_image_cpu(), the acpi_table is only used to
get the OEM information, so those two table mappings need to be released
after using it.
Fixes: 174de876d6 ("drm/amdkfd: Group up CRAT related functions")
Fixes: 520b8fb755 ("drm/amdkfd: Add topology support for CPUs")
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Release dmabuf reference before returning from kfd_ioctl_import_dmabuf.
amdgpu_amdkfd_gpuvm_import_dmabuf takes a reference to the underlying
GEM BO and doesn't keep the reference to the dmabuf wrapper.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull more drm fixes from Dave Airlie:
"This should be the last round of things for rc1, a bunch of i915
fixes, some amdgpu, more font OOB fixes and one ttm fix just found
reading code:
fbcon/fonts:
- Two patches to prevent OOB access
ttm:
- fix for evicition value range check
amdgpu:
- Sienna Cichlid fixes
- MST manager resource leak fix
- GPU reset fix
amdkfd:
- Luxmark fix for Navi1x
i915:
- Tweak initial DPCD backlight.enabled value (Sean)
- Initialize reserved MOCS indices (Ayaz)
- Mark initial fb obj as WT on eLLC machines to avoid rcu lockup (Ville)
- Support parsing of oversize batches (Chris)
- Delay execlists processing for TGL (Chris)
- Use the active reference on the vma during error capture (Chris)
- Widen CSB pointer (Chris)
- Wait for CSB entries on TGL (Chris)
- Fix unwind for scratch page allocation (Chris)
- Exclude low patches of stolen memory (Chris)
- Force VT'd workarounds when running as a guest OS (Chris)
- Drop runtime-pm assert from vpgu io accessors (Chris)"
* tag 'drm-next-2020-10-23' of git://anongit.freedesktop.org/drm/drm: (31 commits)
drm/amdgpu: correct the cu and rb info for sienna cichlid
drm/amd/pm: remove the average clock value in sysfs
drm/amd/pm: fix pp_dpm_fclk
Revert drm/amdgpu: disable sienna chichlid UMC RAS
drm/amd/pm: fix pcie information for sienna cichlid
drm/amdkfd: Use same SQ prefetch setting as amdgpu
drm/amd/swsmu: correct wrong feature bit mapping
drm/amd/psp: Fix sysfs: cannot create duplicate filename
drm/amd/display: Avoid MST manager resource leak.
drm/amd/display: Revert "drm/amd/display: Fix a list corruption"
drm/amdgpu: update golden setting for sienna_cichlid
drm/amd/swsmu: add missing feature map for sienna_cichlid
drm/amdgpu: correct the gpu reset handling for job != NULL case
drm/amdgpu: add rlc iram and dram firmware support
drm/amdgpu: add function to program pbb mode for sienna cichlid
drm/i915: Drop runtime-pm assert from vgpu io accessors
drm/i915: Force VT'd workarounds when running as a guest OS
drm/i915: Exclude low pages (128KiB) of stolen from use
drm/i915/gt: Onion unwind for scratch page allocation failure
drm/ttm: fix eviction valuable range check.
...
0 causes instruction fetch stall at cache line boundary under some
conditions on Navi10. A non-zero prefetch is the preferred default
in any case.
Fixes soft hang in Luxmark.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull drm fixes from Dave Airlie:
"Some fixes queued up already for i915 and amdgpu, I've also included
the fix for the clang warning you've seen.
i915:
- set all unused color plane offsets to ~0xfff again (Ville)
- fix TGL DKL PHY DP vswing handling (Ville)
amdgpu:
- DCN clang warning fix
- eDP fix
- BACO fix
- kernel documentation fixes
- SMU7 mclk fix
- VCN1 hw bug workaround
amdkfd:
- kvfree vs kfree fix"
* tag 'drm-next-2020-10-19' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix incorrect dsc force enable logic
drm/amdkfd: Use kvfree in destroy_crat_image
drm/amdgpu: vcn and jpeg ring synchronization
drm/amd/pm: increase mclk switch threshold to 200 us
docs: amdgpu: fix a warning when building the documentation
drm/amd/display: kernel-doc: document force_timing_sync
drm/amdgpu/swsmu: init the baco mutex in early_init
drm/amd/display: Fix module load hangs when connected to an eDP
drm/i915: Set all unused color plane offsets to ~0xfff again
drm/i915: Fix TGL DKL PHY DP vswing handling
Pull drm updates from Dave Airlie:
"Not a major amount of change, the i915 trees got split into display
and gt trees to better facilitate higher level review, and there's a
major refactoring of i915 GEM locking to use more core kernel concepts
(like ww-mutexes). msm gets per-process pagetables, older AMD SI cards
get DC support, nouveau got a bump in displayport support with common
code extraction from i915.
Outside of drm this contains a couple of patches for hexint
moduleparams which you've acked, and a virtio common code tree that
you should also get via it's regular path.
New driver:
- Cadence MHDP8546 DisplayPort bridge driver
core:
- cross-driver scatterlist cleanups
- devm_drm conversions
- remove drm_dev_init
- devm_drm_dev_alloc conversion
ttm:
- lots of refactoring and cleanups
bridges:
- chained bridge support in more drivers
panel:
- misc new panels
scheduler:
- cleanup priority levels
displayport:
- refactor i915 code into helpers for nouveau
i915:
- split into display and GT trees
- WW locking refactoring in GEM
- execbuf2 extension mechanism
- syncobj timeline support
- GEN 12 HOBL display powersaving
- Rocket Lake display additions
- Disable FBC on Tigerlake
- Tigerlake Type-C + DP improvements
- Hotplug interrupt refactoring
amdgpu:
- Sienna Cichlid updates
- Navy Flounder updates
- DCE6 (SI) support for DC
- Plane rotation enabled
- TMZ state info ioctl
- PCIe DPC recovery support
- DC interrupt handling refactor
- OLED panel fixes
amdkfd:
- add SMI events for thermal throttling
- SMI interface events ioctl update
- process eviction counters
radeon:
- move to dma_ for allocations
- expose sclk via sysfs
msm:
- DSI support for sm8150/sm8250
- per-process GPU pagetable support
- Displayport support
mediatek:
- move HDMI phy driver to PHY
- convert mtk-dpi to bridge API
- disable mt2701 tmds
tegra:
- bridge support
exynos:
- misc cleanups
vc4:
- dual display cleanups
ast:
- cleanups
gma500:
- conversion to GPIOd API
hisilicon:
- misc reworks
ingenic:
- clock handling and format improvements
mcde:
- DSI support
mgag200:
- desktop g200 support
mxsfb:
- i.MX7 + i.MX8M
- alpha plane support
panfrost:
- devfreq support
- amlogic SoC support
ps8640:
- EDID from eDP retrieval
tidss:
- AM65xx YUV workaround
virtio:
- virtio-gpu exported resources
rcar-du:
- R8A7742, R8A774E1 and R8A77961 support
- YUV planar format fixes
- non-visible plane handling
- VSP device reference count fix
- Kconfig fix to avoid displaying disabled options in .config"
* tag 'drm-next-2020-10-15' of git://anongit.freedesktop.org/drm/drm: (1494 commits)
drm/ingenic: Fix bad revert
drm/amdgpu: Fix invalid number of character '{' in amdgpu_acpi_init
drm/amdgpu: Remove warning for virtual_display
drm/amdgpu: kfd_initialized can be static
drm/amd/pm: setup APU dpm clock table in SMU HW initialization
drm/amdgpu: prevent spurious warning
drm/amdgpu/swsmu: fix ARC build errors
drm/amd/display: Fix OPTC_DATA_FORMAT programming
drm/amd/display: Don't allow pstate if no support in blank
drm/panfrost: increase readl_relaxed_poll_timeout values
MAINTAINERS: Update entry for st7703 driver after the rename
Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"
drm/amd/display: HDMI remote sink need mode validation for Linux
drm/amd/display: Change to correct unit on audio rate
drm/amd/display: Avoid set zero in the requested clk
drm/amdgpu: align frag_end to covered address space
drm/amdgpu: fix NULL pointer dereference for Renoir
drm/vmwgfx: fix regression in thp code due to ttm init refactor.
drm/amdgpu/swsmu: add interrupt work handler for smu11 parts
drm/amdgpu/swsmu: add interrupt work function
...
Pull x86 PASID updates from Borislav Petkov:
"Initial support for sharing virtual addresses between the CPU and
devices which doesn't need pinning of pages for DMA anymore.
Add support for the command submission to devices using new x86
instructions like ENQCMD{,S} and MOVDIR64B. In addition, add support
for process address space identifiers (PASIDs) which are referenced by
those command submission instructions along with the handling of the
PASID state on context switch as another extended state.
Work by Fenghua Yu, Ashok Raj, Yu-cheng Yu and Dave Jiang"
* tag 'x86_pasid_for_5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction
x86/asm: Carve out a generic movdir64b() helper for general usage
x86/mmu: Allocate/free a PASID
x86/cpufeatures: Mark ENQCMD as disabled when configured out
mm: Add a pasid member to struct mm_struct
x86/msr-index: Define an IA32_PASID MSR
x86/fpu/xstate: Add supervisor PASID state for ENQCMD
x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)
iommu/vt-d: Change flags type to unsigned int in binding mm
drm, iommu: Change type of pasid to u32
compute units that are in use.
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.
[How]
Surface files in Sysfs that allow user to determine the number of compute
units that are in use for a given process. One Sysfs file is used per
device.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdkfd is dumping a stack during initialization.
kfd_procfs_add_sysfs_stats is being called twice. This removes one of
them.
Fixes: 4327bed2ff ("drm/amdkfd: Add process eviction counters to sysfs")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move doorbell allocation for a process into kfd device and
allocate doorbell space in each PDD during process creation.
Currently, KFD manages its own doorbell space but for some
devices, amdgpu would allocate the complete doorbell
space instead of leaving a chunk of doorbell space for KFD to
manage. In a system with mix of such devices, KFD would need
to request process doorbell space based on the type of device,
either from amdgpu or from its own doorbell space.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of guessing at a sufficient size for the CPU VCRAT, base the
size on the number of online NUMA nodes.
v2: fix warning
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remember KFD module initializaton status in a global variable. Skip KFD
device probing when the module was not initialized. Other amdgpu_amdkfd
calls are then protected by the adev->kfd.dev check.
Also print a clear error message when KFD disables itself. Amdgpu
continues its initialization even when KFD failed.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add per-process eviction counters to sysfs to keep track of
how many eviction events have happened for each process.
v2: rename the stats dir, and track all evictions per process, per device.
v3: Simplify the stats kobject handling and cleanup.
v4: more code cleanup
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Extending the module parameter debug_evictions to also print a stack
trace when the eviction code path is called.
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the resume stage of GPU recovery, start_cpsch will call pm_init
which set pm->allocated as false, cause the next pm_release_ib has
no chance to release ib memory.
Add pm_release_ib in stop_cpsch which will be called in the suspend
stage of GPU recovery.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If KFD_SUPPORT_IOMMU_V2 is not set, gcc warns:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:121:37: warning: ‘raven_device_info’ defined but not used [-Wunused-const-variable=]
static const struct kfd_device_info raven_device_info = {
^~~~~~~~~~~~~~~~~
As Huang Rui suggested, Raven already has the fallback path,
so it should be out of IOMMU v2 flag.
Suggested-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the resume stage of GPU recovery, start_cpsch will call pm_init
which set pm->allocated as false, cause the next pm_release_ib has
no chance to release ib memory.
Add pm_release_ib in stop_cpsch which will be called in the suspend
stage of GPU recovery.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add support for reporting GPU reset events through SMI. KFD
would report both pre and post GPU reset events.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The ctx->features are new RAS implementation which
is only available for Vega20 and onwards, it is not
available for vega10, vega10 should follow legacy
ECC implementation.
Changed from V1:
wrap function to initialize kfd node properties
Changed from V2:
remove wrap function and SDMA SRAM ECC check
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>