what:
the MQD's save and restore of KCQ (kernel compute queue)
cost lots of clocks during world switch which impacts a lot
to multi-VF performance
how:
introduce a paramter to control the number of KCQ to avoid
performance drop if there is no kernel compute queue needed
notes:
this paramter only affects gfx 8/9/10
v2:
refine namings
v3:
choose queues for each ring to that try best to cross pipes evenly.
v4:
fix indentation
some cleanupsin the gfx_compute_queue_acquire()
v5:
further fix on indentations
more cleanupsin gfx_compute_queue_acquire()
TODO:
in the future we will let hypervisor driver to set this paramter
automatically thus no need for user to configure it through
modprobe in virtual machine
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During driver's probe, when it hits bad gpu tag in eeprom i2c
init calling(the tag was set when reported bad page reaches
bad page threshold in last driver's working loop), there are
some strategys to deal with the cases:
1. when the module parameter amdgpu_bad_page_threshold = 0,
that means page retirement feature is disabled, so just resetting
the eeprom is fine.
2. When amdgpu_bad_page_threshold is not 0, and moreover, user
sets one bigger valid data in order to make current boot up
succeeds, correct eeprom header tag and do not break booting.
3. For other cases, driver's probe will be broken.
v2: Just update eeprom header tag instead of resetting the whole
table header when user sets one bigger threshold data.
v3: Use dev_info/dev_err to print PCI device information, which
helps in mGPU case.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When amdgpu_bad_page_threshold = 0, bad page reservation stuffs
are skipped in either UMC ECC irq or page retirement calling of
sync flood isr.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bad page information should not be exposed by sysfs when
bad page retirement is disabled, so decouple it from ras
sysfs group creating, and add one guard before creating.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add one definition for the RAS module's FS name. It's used
in both debugfs and sysfs cases.
v2: Use static variable instead of macro definition.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
RAS flags needs to be cleaned as well when user requires
one clean eeprom.
v2: RAS flags shall be restored after eeprom reset succeeds.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When GPU executes recovery and retriving bad GPU tag
from external eerpom device, the recovery will be broken
and error message is printed as well for user's awareness.
v2: Refine warning message in threshold reaching case, and
fix spelling typo.
v3: Fix explicit calling of bad gpu.
v4: Rename function names.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once the bad page saved to eeprom reaches the configured
threshold, ras recovery will be issued to notify user.
v2: Fix spelling typo.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once the ras recovery is issued from eeprom write itself,
bad page reservation should be ignored, otherwise, recursive
calling of writting to eeprom would happen.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When retrieving bad gpu tag from eeprom, GPU init should
fail as the GPU needs to be retired for further check.
v2: Fix spelling typo, correct the condition to detect
bad gpu tag and refine error message.
v3: Refine function argument name.
v4: Fix missing check of returning value of i2c
initialization error case.
v5: Use dev_err to print PCI information in dmesg instead
of DRM_ERROR.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bad page threshold value should be valid in the range between
-1 and max records length of eeprom. It could determine when
saved bad pages exceed threshold value, and proceed corresponding
actions.
v2: When using the default typical value, it should be min
value between typical value and eeprom max records length.
v3: drop the case of setting bad_page_cnt_threshold to be
0xFFFFFFFF, as it confuses user.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
bad_page_threshold could be configured to enable/disable the
associated bad page retirement feature in RAS.
When it's -1, ras will use typical bad page failure value to
handle bad page retirement.
When it's 0, disable bad page retirement, and no bad page
will be recorded and saved.
For other valid value, driver will use this manual value
as the threshold value of totoal bad pages.
v2: correct documentation of this parameter.
v3: remove confused statement in documentation.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Event bitmask is a 64-bit mask with only 1 bit set. Sending this
event bitmask in KFD SMI event message is both wasteful of memory
and potentially limiting to only 64 events. Instead send event
index in SMI event message.
Please note this change does not break the ABI for the two event
types defined so far. The new index is identical to the mask used
before.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm_dp_mst_allocate_vcpi() invokes
drm_dp_mst_topology_get_port_validated(), which increases the refcount
of the "port".
These reference counting issues take place in two exception handling
paths separately. Either when “slots” is less than 0 or when
drm_dp_init_vcpi() returns a negative value, the function forgets to
reduce the refcnt increased drm_dp_mst_topology_get_port_validated(),
which results in a refcount leak.
Fix these issues by pulling up the error handling when "slots" is less
than 0, and calling drm_dp_mst_topology_put_port() before termination
when drm_dp_init_vcpi() returns a negative value.
Fixes: 1e797f556c ("drm/dp: Split drm_dp_mst_allocate_vcpi")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200719154545.GA41231@xin-virtual-machine
Pull ARM SoC driver updates from Arnd Bergmann:
"A couple of subsystems have their own subsystem maintainers but choose
to have the code merged through the soc tree as upstream, as the code
tends to be used across multiple SoCs or has SoC specific drivers
itself:
- memory controllers:
Krzysztof Kozlowski takes ownership of the drivers/memory subsystem
and its drivers, starting out with a set of cleanup patches.
A larger driver for the Tegra memory controller that was
accidentally missed for v5.8 is now added.
- reset controllers:
Only minor updates to drivers/reset this time
- firmware:
The "turris mox" firmware driver gains support for signed firmware
blobs The tegra firmware driver gets extended to export some debug
information Various updates to i.MX firmware drivers, mostly
cosmetic
- ARM SCMI/SCPI:
A new mechanism for platform notifications is added, among a number
of minor changes.
- optee:
Probing of the TEE bus is rewritten to better support detection of
devices that depend on the tee-supplicant user space. A new
firmware based trusted platform module (fTPM) driver is added based
on OP-TEE
- SoC attributes:
A new driver is added to provide a generic soc_device for
identifying a machine through the SMCCC ARCH_SOC_ID firmware
interface rather than by probing SoC family specific registers.
The series also contains some cleanups to the common soc_device
code.
There are also a number of updates to SoC specific drivers, the main
ones are:
- Mediatek cmdq driver gains a few in-kernel interfaces
- Minor updates to Qualcomm RPMh, socinfo, rpm drivers, mostly adding
support for additional SoC variants
- The Qualcomm GENI core code gains interconnect path voting and
performance level support, and integrating this into a number of
device drivers.
- A new driver for Samsung Exynos5800 voltage coupler for
- Renesas RZ/G2H (R8A774E1) SoC support gets added to a couple of SoC
specific device drivers
- Updates to the TI K3 Ring Accelerator driver"
* tag 'arm-drivers-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (164 commits)
soc: qcom: geni: Fix unused label warning
soc: qcom: smd-rpm: Fix kerneldoc
memory: jz4780_nemc: Only request IO memory the driver will use
soc: qcom: pdr: Reorder the PD state indication ack
MAINTAINERS: Add Git repository for memory controller drivers
memory: brcmstb_dpfe: Fix language typo
memory: samsung: exynos5422-dmc: Correct white space issues
memory: samsung: exynos-srom: Correct alignment
memory: pl172: Enclose macro argument usage in parenthesis
memory: of: Correct kerneldoc
memory: omap-gpmc: Fix language typo
memory: omap-gpmc: Correct white space issues
memory: omap-gpmc: Use 'unsigned int' for consistency
memory: omap-gpmc: Enclose macro argument usage in parenthesis
memory: omap-gpmc: Correct kerneldoc
memory: mvebu-devbus: Align with open parenthesis
memory: mvebu-devbus: Add missing braces to all arms of if statement
memory: bt1-l2-ctl: Add blank lines after declarations
soc: TI knav_qmss: make symbol 'knav_acc_range_ops' static
firmware: ti_sci: Replace HTTP links with HTTPS ones
...
Now that the corresponding feature bit has been renamed,
rename the quirk too - it's about special ways to
do DMA, not necessarily about the IOMMU.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch adds support for G200 desktop cards. We can reuse the whole
memory and modesetting code. A few PCI and DAC register values have to
be updated accordingly.
The most significant change is in the PLL setup. The driver parses the
device's BIOS to retrieve clock limits and reference clocks. With no BIOS
found, safe defaults are being used.
v2:
* copy BIOS ROM to system memory and access with regular
load/store; resolves potential HW limitations
* fix some stray whitespaces
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Co-developed-by: Egbert Eich <eich@suse.com>
Signed-off-by: Egbert Eich <eich@suse.com>
Co-developed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730102844.10995-9-tzimmermann@suse.de
So far, PCI option registers were initialized as part of modesetting,
which is late in the process. As these registers control fundamental
operation, they should be set early.
The patch moves the PCI option handling into device register setup,
before even the device MMIO memory is being mapped. No functional
changes made.
Moving the PCI code next to the device-register setup also allows to
remove the has_sdram field from struct mga_device. The state is now
local to the init helper.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730102844.10995-4-tzimmermann@suse.de
The mgag200 driver maps registers into the address space. Move the
code into a separate helper function. No functional changes.
One small difference is in the handling of SDRAM/SGRAM. MGA devices
can come with either SDRAM or SGRAM. So far, the driver checked for
SDRAM, which is the common case. The patch moves this code into a
separate helper and checks for SGRAM, which is the special case. The
test itself is the same as before.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730102844.10995-3-tzimmermann@suse.de
SHMEM pages use write-combine caching by default, but can also use the
platform's default page caching. Doing so may improve the performance
of I/O on the framebuffer.
Mgag200's hardware does not access framebuffer pages directly (i.e.,
via DMA), so enabling caching does not have an effect on consistency
of the framebuffer memory or the displayed data.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730102844.10995-2-tzimmermann@suse.de
The ast driver's load and unload functions are left-overs from when
struct drm_driver.load/unload was still in use. The PCI probe helper
allocated the DRM device and ran load to initialize it.
This patch replaces this code with device create and destroy. The
main difference is that the device's create function allocates the
DRM device and ast structures in the same place. This will be required
for switching ast to managed allocations.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20200730135206.30239-4-tzimmermann@suse.de