Pull Hyper-V updates from Wei Liu:
- a series from Boqun Feng to support page size larger than 4K
- a few miscellaneous clean-ups
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv: clocksource: Add notrace attribute to read_hv_sched_clock_*() functions
x86/hyperv: Remove aliases with X64 in their name
PCI: hv: Document missing hv_pci_protocol_negotiation() parameter
scsi: storvsc: Support PAGE_SIZE larger than 4K
Driver: hv: util: Use VMBUS_RING_SIZE() for ringbuffer sizes
HID: hyperv: Use VMBUS_RING_SIZE() for ringbuffer sizes
Input: hyperv-keyboard: Use VMBUS_RING_SIZE() for ringbuffer sizes
hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication
hv: hyperv.h: Introduce some hvpfn helper functions
Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header
Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs()
Drivers: hv: vmbus: Introduce types of GPADL
Drivers: hv: vmbus: Move __vmbus_open()
Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
drivers: hv: remove cast from hyperv_die_event
Currently, dw_pcie_msi_init() allocates and maps page for msi, then
program the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI. The Root Complex
may lose power during suspend-to-RAM, so when we resume, we want to
redo the latter but not the former. If designware based driver (for
example, pcie-tegra194.c) calls dw_pcie_msi_init() in resume path, the
msi page will be leaked.
As pointed out by Rob and Ard, there's no need to allocate a page for
the MSI address, we could use an address in the driver data.
To avoid map the MSI msg again during resume, we move the map MSI msg
from dw_pcie_msi_init() to dw_pcie_host_init().
Suggested-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201009155505.5a580ef5@xhacker.debian
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Pull x86 irq updates from Thomas Gleixner:
"Surgery of the MSI interrupt handling to prepare the support of
upcoming devices which require non-PCI based MSI handling:
- Cleanup historical leftovers all over the place
- Rework the code to utilize more core functionality
- Wrap XEN PCI/MSI interrupts into an irqdomain to make irqdomain
assignment to PCI devices possible.
- Assign irqdomains to PCI devices at initialization time which
allows to utilize the full functionality of hierarchical
irqdomains.
- Remove arch_.*_msi_irq() functions from X86 and utilize the
irqdomain which is assigned to the device for interrupt management.
- Make the arch_.*_msi_irq() support conditional on a config switch
and let the last few users select it"
* tag 'x86-irq-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS
x86/apic/msi: Unbreak DMAR and HPET MSI
iommu/amd: Remove domain search for PCI/MSI
iommu/vt-d: Remove domain search for PCI/MSI[X]
x86/irq: Make most MSI ops XEN private
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable
x86/pci: Set default irq domain in pcibios_add_device()
iommm/amd: Store irq domain in struct device
iommm/vt-d: Store irq domain in struct device
x86/xen: Wrap XEN MSI management into irqdomain
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Consolidate XEN-MSI init
x86/xen: Rework MSI teardown
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
PCI/MSI: Provide pci_dev_has_special_msi_domain() helper
PCI_vmd_Mark_VMD_irqdomain_with_DOMAIN_BUS_VMD_MSI
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
x86/irq: Initialize PCI/MSI domain at PCI init time
x86/pci: Reducde #ifdeffery in PCI init code
...
Fix sparse build warning:
drivers/pci/controller/pcie-iproc-platform.c:102:33: warning: Using plain integer as NULL pointer
The map_irq member of the struct iproc_pcie takes a function pointer
serving as a callback to map interrupts, therefore we should pass a NULL
pointer to it rather than a integer in the iproc_pcie_pltfm_probe()
function.
Related:
commit b64aa11eb2 ("PCI: Set bridge map_irq and swizzle_irq to
default functions")
Link: https://lore.kernel.org/r/20200922194932.465925-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Recent laptops with dual AMD GPUs fail to suspend the discrete GPU, thus
causing lockups on system sleep and high power consumption at runtime.
The discrete GPU would normally be suspended to D3cold by turning off
ACPI _PR3 Power Resources of the Root Port above the GPU.
However on affected systems, the Root Port is hotplug-capable and
pci_bridge_d3_possible() only allows hotplug ports to go to D3 if they
belong to a Thunderbolt device or if the Root Port possesses a
"HotPlugSupportInD3" ACPI property. Neither is the case on affected
laptops. The reason for whitelisting only specific, known to work
hotplug ports for D3 is that there have been reports of SkyLake Xeon-SP
systems raising Hardware Error NMIs upon suspending their hotplug ports:
https://lore.kernel.org/linux-pci/20170503180426.GA4058@otc-nc-03/
But if a hotplug port is power manageable by ACPI (as can be detected
through presence of Power Resources and corresponding _PS0 and _PS3
methods) then it ought to be safe to suspend it to D3. To this end,
amend acpi_pci_bridge_d3() to whitelist such ports for D3.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
Reported-and-tested-by: Arthur Borsboom <arthurborsboom@gmail.com>
Reported-and-tested-by: matoro <matoro@airmail.cc>
Reported-by: Aaron Zakhrov <aaron.zakhrov@gmail.com>
Reported-by: Michal Rostecki <mrostecki@suse.com>
Reported-by: Shai Coleman <git@shaicoleman.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of dma-debug.h is not required by anything outside of kernel/dma.
Move the four declarations needed by dma-mappin.h or dma-ops providers
into dma-mapping.h and dma-map-ops.h, and move the remainder of the
file to kernel/dma/debug.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Split out all the bits that are purely for dma_map_ops implementations
and related code into a new <linux/dma-map-ops.h> header so that they
don't get pulled into all the drivers. That also means the architecture
specific <asm/dma-mapping.h> is not pulled in by <linux/dma-mapping.h>
any more, which leads to a missing includes that were pulled in by the
x86 or arm versions in a few not overly portable drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Old ATF automatically power on pcie phy and does not provide SMC call for
phy power on functionality which leads to aardvark initialization failure:
[ 0.330134] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware
[ 0.338846] phy phy-d0018300.phy.1: phy poweron failed --> -95
[ 0.344753] advk-pcie d0070000.pcie: Failed to initialize PHY (-95)
[ 0.351160] advk-pcie: probe of d0070000.pcie failed with error -95
This patch fixes above failure by ignoring 'not supported' error in
aardvark driver. In this case it is expected that phy is already power on.
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Link: https://lore.kernel.org/r/20200902144344.16684-3-pali@kernel.org
Fixes: 366697018c ("PCI: aardvark: Add PHY support")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: <stable@vger.kernel.org> # 5.8+: ea17a0f153: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
The value assigned to msi_val after the inner loop finishes its run is
never used for anything, and it is also immediately overridden in the
line that follows with the return value from the xgene_msi_int_read()
function.
Since the value of msi_val following the inner loop completion is never
used in any meaningful way the assignment can be removed.
Addresses-Coverity-ID: 1437183 ("Unused value")
Link: https://lore.kernel.org/r/20200922030257.459898-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
The Raspberry Pi (RPI) is currently the only chip using this driver
(pcie-brcmstb.c). There, only one memory controller is used, without an
extension region, and the SCB0 viewport size is set to the size of the
first and only dma-range region. Other BrcmSTB SOCs have more complicated
memory configurations that require setting additional viewport sizes.
BrcmSTB PCIe controllers are intimately connected to the memory
controller(s) on the SOC. The SOC may have one to three memory
controllers; they are indicated by the term SCBi. Each controller has a
base region and an optional extension region. In physical memory, the base
and extension regions of a controller are not adjacent, but in PCIe-space
they are.
There is a "viewport" for each memory controller that allows DMA from
endpoint devices. Each viewport's size must be set to a power of two, and
that size must be equal to or larger than the amount of memory each
controller supports which is the sum of base region and its optional
extension. Further, the 1-3 viewports are also adjacent in PCIe-space.
Unfortunately the viewport sizes cannot be ascertained from the
"dma-ranges" property so they have their own property, "brcm,scb-sizes".
This is because dma-range information does not indicate what memory
controller it is associated. For example, consider the following case
where the size of one dma-range is 2GB and the second dma-range is 1GB:
/* Case 1: SCB0 size set to 4GB */
dma-range0: 2GB (from memc0-base)
dma-range1: 1GB (from memc0-extension)
/* Case 2: SCB0 size set to 2GB, SCB1 size set to 1GB */
dma-range0: 2GB (from memc0-base)
dma-range1: 1GB (from memc0-extension)
By just looking at the dma-ranges information, one cannot tell which
situation applies. That is why an additional property is needed. Its
length indicates the number of memory controllers being used and each value
indicates the viewport size.
Note that the RPI DT does not have a "brcm,scb-sizes" property value,
as it is assumed that it only requires one memory controller and no
extension. So the optional use of "brcm,scb-sizes" will be backwards
compatible.
One last layer of complexity exists: all of the viewports sizes must be
added and rounded up to a power of two to determine what the "BAR" size is.
Further, an offset must be given that indicates the base PCIe address of
this "BAR". The use of the term BAR is typically associated with endpoint
devices, and the term is used here because the PCIe HW may be used as an RC
or an EP. In the former case, all of the system memory appears in a single
"BAR" region in PCIe memory. As it turns out, BrcmSTB PCIe HW is rarely
used in the EP role and its system of mapping memory is an artifact that
requires multiple dma-ranges regions.
Link: https://lore.kernel.org/r/20200911175232.19016-8-james.quinlan@broadcom.com
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Some STB chips have a special purpose reset controller named RESCAL (reset
calibration). The PCIe HW can now control RESCAL to start and stop its
operation. On probe(), the RESCAL is deasserted and the driver goes
through the sequence of setting registers and reading status in order to
start the internal PHY that is required for the PCIe.
Link: https://lore.kernel.org/r/20200911175232.19016-7-james.quinlan@broadcom.com
Signed-off-by: Jim Quinlan <jquinlan@broadcom.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
pci_restore_msi_state() directly writes the MSI/MSI-X related registers
via MMIO. On a physical machine, this works perfectly; for a Linux VM
running on a hypervisor, which typically enables IOMMU interrupt remapping,
the hypervisor usually should trap and emulate the MMIO accesses in order
to re-create the necessary interrupt remapping table entries in the IOMMU,
otherwise the interrupts can not work in the VM after hibernation.
Hyper-V is different from other hypervisors in that it does not trap and
emulate the MMIO accesses, and instead it uses a para-virtualized method,
which requires the VM to call hv_compose_msi_msg() to notify the hypervisor
of the info that would be passed to the hypervisor in the case of the
trap-and-emulate method. This is not an issue to a lot of PCI device
drivers, which destroy and re-create the interrupts across hibernation, so
hv_compose_msi_msg() is called automatically. However, some PCI device
drivers (e.g. the in-tree GPU driver nouveau and the out-of-tree Nvidia
proprietary GPU driver) do not destroy and re-create MSI/MSI-X interrupts
across hibernation, so hv_pci_resume() has to call hv_compose_msi_msg(),
otherwise the PCI device drivers can no longer receive interrupts after
the VM resumes from hibernation.
Hyper-V is also different in that chip->irq_unmask() may fail in a
Linux VM running on Hyper-V (on a physical machine, chip->irq_unmask()
can not fail because unmasking an MSI/MSI-X register just means an MMIO
write): during hibernation, when a CPU is offlined, the kernel tries
to move the interrupt to the remaining CPUs that haven't been offlined
yet. In this case, hv_irq_unmask() -> hv_do_hypercall() always fails
because the vmbus channel has been closed: here the early "return" in
hv_irq_unmask() means the pci_msi_unmask_irq() is not called, i.e. the
desc->masked remains "true", so later after hibernation, the MSI interrupt
always remains masked, which is incorrect. Refer to cpu_disable_common()
-> fixup_irqs() -> irq_migrate_all_off_this_cpu() -> migrate_one_irq():
static bool migrate_one_irq(struct irq_desc *desc)
{
...
if (maskchip && chip->irq_mask)
chip->irq_mask(d);
...
err = irq_do_set_affinity(d, affinity, false);
...
if (maskchip && chip->irq_unmask)
chip->irq_unmask(d);
Fix the issue by calling pci_msi_unmask_irq() unconditionally in
hv_irq_unmask(). Also suppress the error message for hibernation because
the hypercall failure during hibernation does not matter (at this time
all the devices have been frozen). Note: the correct affinity info is
still updated into the irqdata data structure in migrate_one_irq() ->
irq_do_set_affinity() -> hv_set_affinity(), so later when the VM
resumes, hv_pci_restore_msi_state() is able to correctly restore
the interrupt with the correct affinity.
Link: https://lore.kernel.org/r/20201002085158.9168-1-decui@microsoft.com
Fixes: ac82fc8327 ("PCI: hv: Add hibernation support")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>
Add Kconfig options for changing the default pcie_bus_config, i.e., the
strategy for configuration MPS and MRRS, in the same manner as the
CONFIG_PCIEASPM_XXXX choice. The pci_bus_config setting may still be
overridden by kernel command-line parameters, e.g.,
"pci=pcie_bus_tune_off".
[bhelgaas: depend on EXPERT, tweak help texts]
Link: https://lore.kernel.org/r/20200928194651.5393-2-james.quinlan@broadcom.com
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 7e24bc347e.
7e24bc347e was based on PCIe r5.0, sec 5.9, which claims we need a 200 ms
delay when transitioning to or from D2. However, sec 5.3.1.3 states the
delay as 200 μs (microseconds), as does the table in PCIe r4.0, sec 5.9.1.
This looks like a typo in the r5.0 spec, so revert back to a 200 μs delay
instead of a 200 ms delay.
Fixes: 7e24bc347e ("PCI/PM: Apply D2 delay as milliseconds, not microseconds")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
476e7faefc ("PCI PM: Do not wait for buses in B2 or B3 during resume")
removed the last use of PCI_PM_BUS_WAIT. Remove the definition as well.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mika writes:
thunderbolt: Changes for v5.10 merge window
This includes following Thunderbolt/USB4 changes for v5.10 merge window:
* A couple of optimizations around Tiger Lake force power logic and
NHI (Native Host Interface) LC (Link Controller) mailbox command
processing
* Power management improvements for Software Connection Manager
* Debugfs support
* Allow KUnit tests to be enabled also when Thunderbolt driver is
configured as module.
* Few minor cleanups and fixes
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (37 commits)
thunderbolt: Capitalize comment on top of QUIRK_FORCE_POWER_LINK_CONTROLLER
thunderbolt: Correct tb_check_quirks() kernel-doc
thunderbolt: Log correct zeroX entries in decode_error()
thunderbolt: Handle ERR_LOCK notification
thunderbolt: Use "if USB4" instead of "depends on" in Kconfig
thunderbolt: Allow KUnit tests to be built also when CONFIG_USB4=m
thunderbolt: Only stop control channel when entering freeze
thunderbolt: debugfs: Fix uninitialized return in counters_write()
thunderbolt: Add debugfs interface
thunderbolt: No need to warn in TB_CFG_ERROR_INVALID_CONFIG_SPACE
thunderbolt: Introduce tb_switch_is_tiger_lake()
thunderbolt: Introduce tb_switch_is_ice_lake()
thunderbolt: Check for Intel vendor ID when identifying controller
thunderbolt: Introduce tb_port_is_nhi()
thunderbolt: Introduce tb_switch_next_cap()
thunderbolt: Introduce tb_port_next_cap()
thunderbolt: Move struct tb_cap_any to tb_regs.h
thunderbolt: Add runtime PM for Software CM
thunderbolt: Create device links from ACPI description
ACPI: Export acpi_get_first_physical_node() to modules
...
PCI devices support two variants of the D3 power state: D3hot (main power
present) D3cold (main power removed). Previously struct pci_dev contained:
unsigned int d3_delay; /* D3->D0 transition time in ms */
unsigned int d3cold_delay; /* D3cold->D0 transition time in ms */
"d3_delay" refers specifically to the D3hot state. Rename it to
"d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for
Specifying Device Readiness Durations" in the PCI Firmware spec r3.2,
sec 4.6.9.
There is no change to the functionality.
Link: https://lore.kernel.org/r/20200730210848.1578826-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The "struct dev_pm_ops pcibios_pm_ops", declared in include/linux/pci.h and
defined in drivers/pci/pci-driver.c, provided arch-specific hooks when a
PCI device was doing a hibernate transition.
394216275c ("s390: remove broken hibernate / power management support")
removed the last use of pcibios_pm_ops, so remove it completely.
[bhelgaas: drop unused "error"]
Link: https://lore.kernel.org/r/20200730194416.1029509-1-vaibhavgupta40@gmail.com
Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
PCI host bridge driver can be probed before the gpiochip it requires,
so, of_get_named_gpio() can return -EPROBE_DEFER. Current code lets the
kirin_pcie_probe() directly return -ENODEV, which results in the PCI
host controller driver probe failure; with this error code the PCI host
controller driver will not be probed again when the gpiochip driver is
loaded.
Fix the above issue by letting kirin_pcie_probe() return -EPROBE_DEFER in
such a case.
Link: https://lore.kernel.org/r/20200918123800.19983-1-huobean@gmail.com
Signed-off-by: Bean Huo <beanhuo@micron.com>
[lorenzo.pieralisi@arm.com: commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Add missing documentation for the parameter "version" and "num_version"
of the hv_pci_protocol_negotiation() function and resolve build time
kernel-doc warnings:
drivers/pci/controller/pci-hyperv.c:2535: warning: Function parameter
or member 'version' not described in 'hv_pci_protocol_negotiation'
drivers/pci/controller/pci-hyperv.c:2535: warning: Function parameter
or member 'num_version' not described in 'hv_pci_protocol_negotiation'
No change to functionality intended.
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20200925234753.1767227-1-kw@linux.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
For VFs, the Memory Space Enable bit in the Command Register is
hard-wired to 0.
Add a new bit to signify devices where the Command Register Memory
Space Enable bit does not control the device's response to MMIO
accesses.
Fixes: abafbc551f ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
pci_dev_reset_slot_function() refuses to reset a hotplug slot if it is
shared by multiple pci_devs. That's the case if and only if the slot is
occupied by a multifunction device.
Simplify the function to check the device's multifunction flag instead
of iterating over the devices on the bus. (Iterating over the devices
requires holding pci_bus_sem, which the function erroneously does not
acquire.)
Link: https://lore.kernel.org/r/c6aab5af096f7b1b3db57f6335cebba8f0fcca89.1595330431.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>