Added GPIO is "Power Alert". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.
This GPIO can be found on Tesla and sometimes on Fermi GPUs.
Untested, wrote according to documentation.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Added GPIO is "Thermal and External Power Detect". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.
This GPIO can be found in Rankine and Curie and rarely on Tesla GPUs
VBIOS.
Untested, wrote according to documentation.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Currently, nouveau doesn't check if GPU is missing power. This
patch makes nouveau fail when this happens on latest GPUs.
It checks GPIO function 121 (External Power Emergency), which
should detect power problems on GPU initialization.
This can be disabled with nouveau.config=NvPowerChecks=1
Tested on TU104, GP106 and GF100.
v3:
* Add config override for disabling power checks
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
One gpio was in wrong place, moved it for better readability.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There's already a condition in place which attempts to detect this, but
since we've begun to require a PMU subdev even on boards where we don't
load a custom FW, it's become inaccurate.
This will prevent unnecessarily running a periodic fan update thread on
GP100 and newer, where we don't yet override the default PMU FW.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Has a nice side-effect that we only update HW for this when it changes now,
rather than every time we do a page flip.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
zpos normalisation uses plane id to determine ordering for duplicate zpos
values, and we likely want to keep primary plane on the bottom here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We have some of this open-coded already, use the helper to prevent problems
when adding (for example) support for the alpha property.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For GF119:GV100, we can enable DEGAMMA/CTM/GAMMA. For earlier GPUs, as
there is no CTM, having both degamma and gamma is a bit pointless. Later
GPUs currently lack an implementation.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The overlay logic can only do colorkey-based selection, not
alpha-blending.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pascal was particularly incorrect, as the register changed to be more in the
same format as the MMU fault buffers are.
Shouldn't have impacted much more than confusing MMU fault log messages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Would like to be able to reuse gf100_fifo_intr_fault() for (some of) the
later chipsets too, as it's identical.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signal that the reset sequence has completed.
This opcode signals that the software reset sequence has completed.
Ordinarily, no actual operations are performed by the opcode.
However it allows for possible software work arounds by devinit
engines in software agents other than the VBIOS, such as the resman,
FCODE, and EFI driver.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signal that the reset sequence has begun.
This opcode signals that the software reset sequence has begun.
Ordinarily, no actual operations are performed by the opcode.
However it allows for possible software work arounds by devinit
engines in software agents other than the VBIOS, such as the resman,
FCODE, and EFI driver.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Absence of a TMDS Info Table is common on Optimus setups where the NVIDIA
gpu is not connected directly to any outputs.
Reporting an error in this scenario is too harsh. Accordingly, change the
error message to an info message.
By default the error message also causes a boot flicker for these sytems.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Older hardware seems to want 0..1024 values, while new hardware takes
0..1 values. We set the gain to 1024 for the earlier display classes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On Turing, an input LUT is required to transform inputs in fixed-point
formats to FP16 for the internal display pipe. We provide an identity
mapping whenever a window is enabled for this reason.
HW has error checks to ensure when the input is already FP16, that the
input LUT is also disabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace the following form:
sizeof(*kind) + sizeof(*kind->data) * mmu->kind_nr;
with:
struct_size(kind, data, mmu->kind_nr)
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Drop include of the deprecated drmP.h from all nouveau heder files.
This allows us to remove drmP.h from all .c files without any
side-effects in a follow-up commit.
Build tested using allyeyconfig and allmodconfig
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The DRM_UDELAY is a simple wrapper for udealy() and to be consistent
call udelay() direct like in may other places.
This avoids the need to pull in drm_os_linux.h when we later
drop drmP.h uses in nouveau.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fix sparse warning:
drivers/gpu/drm/nouveau/nvkm/subdev/secboot/acr_r352.c:1092:1:
warning: symbol 'acr_r352_ls_gpccs_func' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
While I had thought I had fixed this issue in:
commit 342406e4fb ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
accidentally had missed one important detail that was mentioned on the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into it's dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler then it looks. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
To avoid the dpm frequence range get failed when DPM enabled and it
will be enabled later once handle well the feature bit map struct.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>