Commit Graph

67736 Commits

Author SHA1 Message Date
Bjorn Helgaas
c5048a73b4 Merge branch 'pci/trivial'
- Fix typos and whitespace errors (Bjorn Helgaas, Krzysztof Wilczynski)

  - Remove unnecessary "return" statements (Krzysztof Wilczynski)

  - Correct of_irq_parse_pci() function documentation (Lubomir Rintel)

* pci/trivial:
  PCI: Remove unnecessary returns
  PCI: OF: Correct of_irq_parse_pci() documentation
  PCI: Fix typos and whitespace errors
2019-09-23 16:10:31 -05:00
Bjorn Helgaas
8b38b5f2cf Merge branch 'remotes/lorenzo/pci/mediatek'
- Add mediatek support for MT7629 (Jianjun Wang)

* remotes/lorenzo/pci/mediatek:
  PCI: mediatek: Add controller support for MT7629
  dt-bindings: PCI: Add support for MT7629
2019-09-23 16:10:24 -05:00
Bjorn Helgaas
af47f25f33 Merge branch 'remotes/lorenzo/pci/al'
- Add driver for Amazon Annapurna Labs PCIe controller (Jonathan Chocron)

  - Disable MSI-X since Annapurna Labs advertises it, but it's broken
    (Jonathan Chocron)

  - Disable VPD since Annapurna Labs advertises it, but it's broken
    (Jonathan Chocron)

  - Add ACS quirk since Annapurna Labs doesn't support ACS but does provide
    some equivalent protections (Ali Saidi)

* remotes/lorenzo/pci/al:
  PCI: dwc: Add validation that PCIe core is set to correct mode
  PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver
  dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding
  PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port
  PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port
  PCI: Add ACS quirk for Amazon Annapurna Labs root ports
  PCI: Add Amazon's Annapurna Labs vendor ID

# Conflicts:
#	drivers/pci/quirks.c
2019-09-23 16:10:17 -05:00
Bjorn Helgaas
0ca0ef1042 Merge branch 'pci/resource'
- Convert pci_resource_to_user() to a weak function to remove
    HAVE_ARCH_PCI_RESOURCE_TO_USER #defines (Denis Efremov)

  - Use PCI_SRIOV_NUM_BARS for idiomatic loop structure (Denis Efremov)

  - Fix Resizable BAR size suspend/restore for 1MB BARs (Sumit Saxena)

  - Correct "pci=resource_alignment" example in documentation (Alexey
    Kardashevskiy)

* pci/resource:
  PCI: Correct pci=resource_alignment parameter example
  PCI: Restore Resizable BAR size bits correctly for 1MB BARs
  PCI: Use PCI_SRIOV_NUM_BARS in loops instead of PCI_IOV_RESOURCE_END
  PCI: Convert pci_resource_to_user() to a weak function

# Conflicts:
#	drivers/pci/pci.c
2019-09-23 16:10:15 -05:00
Bjorn Helgaas
63fa8437cb Merge branch 'pci/p2pdma'
- Move P2PCMA PCI bus offset from generic dev_pagemap to
    pci_p2pdma_pagemap (Logan Gunthorpe)

  - Add provider's pci_dev to pci_p2pdma_pagemap (Logan Gunthorpe)

  - Apply host bridge whitelist for ACS (Logan Gunthorpe)

  - Whitelist some Intel host bridges for P2PDMA (Logan Gunthorpe)

  - Add attrs to pci_p2pdma_map_sg() to match dma_map_sg() (Logan
    Gunthorpe)

  - Add pci_p2pdma_unmap_sg() (Logan Gunthorpe)

  - Store P2PDMA mapping method in xarray (Logan Gunthorpe)

  - Map requests that traverse a host bridge (Logan Gunthorpe)

  - Allow IOMMU for host bridge whitelist (Logan Gunthorpe)

* pci/p2pdma:
  PCI/P2PDMA: Update pci_p2pdma_distance_many() documentation
  PCI/P2PDMA: Allow IOMMU for host bridge whitelist
  PCI/P2PDMA: dma_map() requests that traverse the host bridge
  PCI/P2PDMA: Store mapping method in an xarray
  PCI/P2PDMA: Factor out __pci_p2pdma_map_sg()
  PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()
  PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()
  PCI/P2PDMA: Whitelist some Intel host bridges
  PCI/P2PDMA: Factor out host_bridge_whitelist()
  PCI/P2PDMA: Apply host bridge whitelist for ACS
  PCI/P2PDMA: Factor out __upstream_bridge_distance()
  PCI/P2PDMA: Add constants for map type results to upstream_bridge_distance()
  PCI/P2PDMA: Add provider's pci_dev to pci_p2pdma_pagemap struct
  PCI/P2PDMA: Introduce private pagemap structure
2019-09-23 16:10:12 -05:00
Bjorn Helgaas
6ce54f0219 Merge branch 'pci/misc'
- Use devm_add_action_or_reset() helper (Fuqian Huang)

  - Mark expected switch fall-through (Gustavo A. R. Silva)

  - Convert sysfs device attributes from __ATTR() to DEVICE_ATTR() (Kelsey
    Skunberg)

  - Convert sysfs file permissions from S_IRUSR etc to octal (Kelsey
    Skunberg)

  - Move SR-IOV sysfs functions to iov.c (Kelsey Skunberg)

  - Add pci_info_ratelimited() to ratelimit PCI messages separately
    (Krzysztof Wilczynski)

  - Fix "'static' not at beginning of declaration" warnings (Krzysztof
    Wilczynski)

  - Clean up resource_alignment parameter to not require static buffer
    (Logan Gunthorpe)

  - Add ACS quirk for iProc PAXB (Abhinav Ratna)

  - Add pci_irq_vector() and other stubs for !CONFIG_PCI (Herbert Xu)

* pci/misc:
  PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
  PCI: Add ACS quirk for iProc PAXB
  PCI: Force trailing new line to resource_alignment_param in sysfs
  PCI: Move pci_[get|set]_resource_alignment_param() into their callers
  PCI: Clean up resource_alignment parameter to not require static buffer
  PCI: Use static const struct, not const static struct
  PCI: Add pci_info_ratelimited() to ratelimit PCI separately
  PCI/IOV: Remove group write permission from sriov_numvfs, sriov_drivers_autoprobe
  PCI/IOV: Move sysfs SR-IOV functions to iov.c
  PCI: sysfs: Change permissions from symbolic to octal
  PCI: sysfs: Change DEVICE_ATTR() to DEVICE_ATTR_WO()
  PCI: sysfs: Define device attributes with DEVICE_ATTR*()
  PCI: Mark expected switch fall-through
  PCI: Use devm_add_action_or_reset()
2019-09-23 16:10:10 -05:00
Bjorn Helgaas
a10a1f60c7 Merge branch 'pci/enumeration'
- Consolidate _HPP & _HPX code in pci-acpi.h and remove unnecessary
    struct hotplug_program_ops (Krzysztof Wilczynski)

  - Fixup PCIe device types to remove the need for dev->has_secondary_link
    (Mika Westerberg)

* pci/enumeration:
  PCI: Get rid of dev->has_secondary_link flag
  PCI: Make pcie_downstream_port() available outside of access.c
  PCI/ACPI: Remove unnecessary struct hotplug_program_ops
  PCI/ACPI: Move _HPP & _HPX functions to pci-acpi.c
  PCI/ACPI: Rename _HPX structs from hpp_* to hpx_*
2019-09-23 16:10:08 -05:00
Bjorn Helgaas
77dc51fd55 Merge branch 'pci/encapsulate'
- Move many symbols from public linux/pci.h to subsystem-private
    drivers/pci/pci.h (Kelsey Skunberg)

  - Unexport pci_bus_get() and pci_bus_sem since they're not needed by
    modules (Kelsey Skunberg)

  - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg)

* pci/encapsulate:
  PCI: Make pci_set_of_node(), etc private
  PCI: Make pci_enable_ptm() private
  PCI: Make pcie_set_ecrc_checking(), pcie_ecrc_get_policy() private
  PCI: Make pci_ats_init() private
  PCI: Make pcie_update_link_speed() private
  PCI: Make pci_bus_get(), pci_bus_put() private
  PCI: Make pci_hotplug_io_size, mem_size, and bus_size private
  PCI: Make pci_save_vc_state(), pci_restore_vc_state(), etc private
  PCI: Make pci_get_host_bridge_device(), pci_put_host_bridge_device() private
  PCI: Make pci_check_pme_status(), pci_pme_wakeup_bus() private
  PCI: Make PCI_PM_* delay times private
  PCI: Unexport pci_bus_sem
  PCI: Unexport pci_bus_get() and pci_bus_put()
  PCI: Remove pci_block_cfg_access() et al (unused)
2019-09-23 16:10:07 -05:00
Herbert Xu
0d8006ddbe PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI
Add stub functions pci_alloc_irq_vectors_affinity() and pci_irq_vector()
when CONFIG_PCI is off so drivers can use them without resorting to ifdefs.

Also move the PCI_IRQ_* macros outside of the ifdefs so they are always
available.

Fixes: 625f269a5a ("crypto: inside-secure - add support for PCI based FPGA development board")
Link: https://lore.kernel.org/r/20190904122600.GA28660@gondor.apana.org.au
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-09-21 09:51:55 -05:00
Jonathan Chocron
4a36a60c34 PCI: Add Amazon's Annapurna Labs vendor ID
Add Amazon's Annapurna Labs vendor ID to pci_ids.h.

Signed-off-by: Jonathan Chocron <jonnyc@amazon.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2019-09-16 14:10:00 +01:00
Mika Westerberg
ca78410403 PCI: Get rid of dev->has_secondary_link flag
In some systems, the Device/Port Type in the PCI Express Capabilities
register incorrectly identifies upstream ports as downstream ports.

d0751b98df ("PCI: Add dev->has_secondary_link to track downstream PCIe
links") addressed this by adding pci_dev.has_secondary_link, which is set
for downstream ports.  But this is confusing because pci_pcie_type()
sometimes gives the wrong answer, and it's not obvious that we should use
pci_dev.has_secondary_link instead.

Reduce the confusion by correcting the type of the port itself so that
pci_pcie_type() returns the actual type regardless of what the Device/Port
Type register claims it is.  Update the users to call pci_pcie_type() and
pcie_downstream_port() accordingly, and remove pci_dev.has_secondary_link
completely.

Link: https://lore.kernel.org/linux-pci/20190703133953.GK128603@google.com/
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20190822085553.62697-2-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-09-07 07:45:31 -05:00
Krzysztof Wilczynski
7f1c62c443 PCI: Add pci_info_ratelimited() to ratelimit PCI separately
Do not use printk_ratelimit() in drivers/pci/pci.c as it shares the rate
limiting state with all other callers to the printk_ratelimit().

Add pci_info_ratelimited() (similar to pci_notice_ratelimited() added in
the commit a88a7b3eb0 ("vfio: Use dev_printk() when possible")) and use
it instead of printk_ratelimit() + pci_info().

Link: https://lore.kernel.org/r/20190825224616.8021-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-09-05 13:26:46 -05:00
Bjorn Helgaas
d1bbf38aaf PCI: Fix typos and whitespace errors
Fix typos in drivers/pci.  Comment and whitespace changes only.

Link: https://lore.kernel.org/r/20190819115306.27338-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>	# armada8k
2019-08-30 14:00:12 -05:00
Krzysztof Wilczynski
8c3aac6e1b PCI/ACPI: Move _HPP & _HPX functions to pci-acpi.c
Move program_hpx_type0(), program_hpx_type1(), etc., and enums
hpx_type3_dev_type, hpx_type3_fn_type and hpx_type3_cfg_loc to
drivers/pci/pci-acpi.c as these functions and enums are ACPI-specific.

Move structs hpx_type0, hpx_type1, hpx_type2 and hpx_type3 to
drivers/pci/pci.h as these are shared between drivers/pci/pci-acpi.c and
drivers/pci/probe.c.

Link: https://lore.kernel.org/r/20190827094951.10613-3-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-08-28 15:25:53 -05:00
Krzysztof Wilczynski
e2797ad31f PCI/ACPI: Rename _HPX structs from hpp_* to hpx_*
The names of the hpp_type0, hpp_type1 and hpp_type2 structs suggest that
they're related to _HPP, when in fact they're related to _HPX.

The struct hpp_type0 denotes an _HPX Type 0 setting record that supersedes
the _HPP setting record, and it has been used interchangeably for _HPP as
per the ACPI specification (see version 6.3, section 6.2.9.1) which states
that it should be applied to PCI, PCI-X and PCI Express devices, with
settings being ignored if they are not applicable.

Rename them to hpx_type0, hpx_type1 and hpx_type2 to reflect their relation
to _HPX rather than _HPP.

Link: https://lore.kernel.org/r/20190827094951.10613-2-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-08-28 15:09:45 -05:00
Krzysztof Wilczynski
7ce2e76a04 PCI: Move ASPM declarations to linux/pci.h
Move ASPM definitions and function prototypes from include/linux/pci-aspm.h
to include/linux/pci.h so users only need to include <linux/pci.h>:

  PCIE_LINK_STATE_L0S
  PCIE_LINK_STATE_L1
  PCIE_LINK_STATE_CLKPM
  pci_disable_link_state()
  pci_disable_link_state_locked()
  pcie_no_aspm()

No functional changes intended.

Link: https://lore.kernel.org/r/20190827095620.11213-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-08-28 08:28:39 -05:00
Logan Gunthorpe
7f73eac3a7 PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()
Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg().

This is a prep patch to introduce correct mappings for p2pdma transactions
that go through the root complex.

Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16 08:41:26 -05:00
Logan Gunthorpe
2b9f4bb2a4 PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()
This is to match the dma_map_sg() API which this function will have to call
in an future patch.

Add a pci_p2pdma_map_sg_attrs() function and helper to call it with no
attributes just like the dma_map_sg() function.

Link: https://lore.kernel.org/r/20190730163545.4915-9-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-9-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16 08:41:15 -05:00
Logan Gunthorpe
a6e6fe6549 PCI/P2PDMA: Introduce private pagemap structure
Move the PCI bus offset from the generic dev_pagemap structure to a
specific pci_p2pdma_pagemap structure.

This structure will grow in subsequent patches.

Link: https://lore.kernel.org/r/20190730163545.4915-2-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16 08:38:52 -05:00
Denis Efremov
b8074aa246 PCI: Convert pci_resource_to_user() to a weak function
Convert pci_resource_to_user() to a weak function so the existing
architecture-specific implementations will automatically override the
generic one.  This allows us to remove HAVE_ARCH_PCI_RESOURCE_TO_USER
definitions and avoid the conditional compilation for this single function.

Link: https://lore.kernel.org/r/20190729101401.28068-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190729101401.28068-2-efremov@linux.com
Link: https://lore.kernel.org/r/20190729101401.28068-3-efremov@linux.com
Link: https://lore.kernel.org/r/20190729101401.28068-4-efremov@linux.com
Link: https://lore.kernel.org/r/20190729101401.28068-5-efremov@linux.com
Link: https://lore.kernel.org/r/20190729101401.28068-6-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
[bhelgaas: squash into one commit]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Paul Burton <paul.burton@mips.com>	# MIPS
2019-08-08 15:12:07 -05:00
Jianjun Wang
0cccd42e61 PCI: mediatek: Add controller support for MT7629
MT7629 is an ARM platform SoC which has the same PCIe IP as MT7622.

The HW default value of its PCI host controller Device ID is invalid,
fix it to match the hardware implementation.

Signed-off-by: Jianjun Wang <jianjun.wang@mediatek.com>
[lorenzo.pieralisi@arm.com: commit log/minor spelling update]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
2019-08-07 11:34:01 +01:00
Kelsey Skunberg
621f7e354f PCI: Make pci_set_of_node(), etc private
These interfaces:

  void pci_set_of_node(struct pci_dev *dev);
  void pci_release_of_node(struct pci_dev *dev);
  void pci_set_bus_of_node(struct pci_bus *bus);
  void pci_release_bus_of_node(struct pci_bus *bus);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-12-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:56 -05:00
Kelsey Skunberg
ac6c26da29 PCI: Make pci_enable_ptm() private
This interface:

  int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);

is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel.  Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-11-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:55 -05:00
Kelsey Skunberg
72bde9ced3 PCI: Make pcie_set_ecrc_checking(), pcie_ecrc_get_policy() private
These interfaces:

  void pcie_set_ecrc_checking(struct pci_dev *dev);
  void pcie_ecrc_get_policy(char *str);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-10-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:54 -05:00
Kelsey Skunberg
b92b512a43 PCI: Make pci_ats_init() private
This interface:

  void pci_ats_init(struct pci_dev *dev);

is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel.  Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-9-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:53 -05:00
Kelsey Skunberg
5da78d9578 PCI: Make pcie_update_link_speed() private
This interface:

  void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);

is only used in drivers/pci/ and does not need to be seen by the rest of
the kernel.  Move it to drivers/pci/pci.h so it's private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-8-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:52 -05:00
Kelsey Skunberg
ecd29c1a38 PCI: Make pci_bus_get(), pci_bus_put() private
These interfaces:

  struct pci_bus *pci_bus_get(struct pci_bus *bus);
  void pci_bus_put(struct pci_bus *bus);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-7-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:52 -05:00
Kelsey Skunberg
003d3b2c5f PCI: Make pci_hotplug_io_size, mem_size, and bus_size private
These symbols:

  extern unsigned long pci_hotplug_io_size;
  extern unsigned long pci_hotplug_mem_size;
  extern unsigned long pci_hotplug_bus_size;

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-6-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:52 -05:00
Kelsey Skunberg
440589dd10 PCI: Make pci_save_vc_state(), pci_restore_vc_state(), etc private
These Virtual Channel interfaces:

  int pci_save_vc_state(struct pci_dev *dev);
  void pci_restore_vc_state(struct pci_dev *dev);
  void pci_allocate_vc_save_buffers(struct pci_dev *dev);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-5-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:51 -05:00
Kelsey Skunberg
975e1ac173 PCI: Make pci_get_host_bridge_device(), pci_put_host_bridge_device() private
These interfaces:

  struct device *pci_get_host_bridge_device(struct pci_dev *dev);
  void pci_put_host_bridge_device(struct device *dev);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-4-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:49 -05:00
Kelsey Skunberg
669696ebbc PCI: Make pci_check_pme_status(), pci_pme_wakeup_bus() private
These interfaces:

  bool pci_check_pme_status(struct pci_dev *dev);
  void pci_pme_wakeup_bus(struct pci_bus *bus);

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-3-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:49 -05:00
Kelsey Skunberg
c776dd5019 PCI: Make PCI_PM_* delay times private
These delay time definitions:

  #define PCI_PM_D2_DELAY         200
  #define PCI_PM_D3_WAIT          10
  #define PCI_PM_D3COLD_WAIT      100
  #define PCI_PM_BUS_WAIT         50

are only used in drivers/pci/ and do not need to be seen by the rest of the
kernel.  Move them to drivers/pci/pci.h so they're private to the PCI
subsystem.

Link: https://lore.kernel.org/r/20190724233848.73327-2-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-07-30 08:45:49 -05:00
Kelsey Skunberg
4f77e0956b PCI: Remove pci_block_cfg_access() et al (unused)
Remove the following unused functions from include/linux/pci.h:

  pci_block_cfg_access()
  pci_block_cfg_access_in_atomic()
  pci_unblock_cfg_access()

These were added by fb51ccbf21 ("PCI: Rework config space blocking
services"), though no callers were added.

Link: https://lore.kernel.org/r/20190715203416.37547-1-skunberg.kelsey@gmail.com
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
2019-07-23 18:32:11 -05:00
Linus Torvalds
bec5545ede Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
 "New feature to add support for NTB virtual MSI interrupts, the ability
  to test and use this feature in the NTB transport layer.

  Also, bug fixes for the AMD and Switchtec drivers, as well as some
  general patches"

* tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
  NTB: Describe the ntb_msi_test client in the documentation.
  NTB: Add MSI interrupt support to ntb_transport
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Introduce NTB MSI Test Client
  NTB: Introduce MSI library
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Introduce helper functions to calculate logical port number
  PCI/switchtec: Add module parameter to request more interrupts
  PCI/MSI: Support allocating virtual MSI interrupts
  ntb_hw_switchtec: Fix setup MW with failure bug
  ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
  ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
  NTB: correct ntb_dev_ops and ntb_dev comment typos
  NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
  ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
  NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
  NTB: ntb_hw_amd: set peer limit register
  NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
  NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
  ...
2019-07-21 09:46:59 -07:00
Linus Torvalds
ac60602a6d Merge tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
 "Fix various regressions:

   - force unencrypted dma-coherent buffers if encryption bit can't fit
     into the dma coherent mask (Tom Lendacky)

   - avoid limiting request size if swiotlb is not used (me)

   - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang
     Duan)"

* tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device
  dma-direct: only limit the mapping size if swiotlb could be used
  dma-mapping: add a dma_addressing_limited helper
  dma-direct: Force unencrypted DMA under SME for certain DMA masks
2019-07-20 12:09:52 -07:00
Linus Torvalds
e6023adc5c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:

 - A collection of objtool fixes which address recent fallout partially
   exposed by newer toolchains, clang, BPF and general code changes.

 - Force USER_DS for user stack traces

[ Note: the "objtool fixes" are not all to objtool itself, but for
  kernel code that triggers objtool warnings.

  Things like missing function size annotations, or code that confuses
  the unwinder etc.   - Linus]

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  objtool: Support conditional retpolines
  objtool: Convert insn type to enum
  objtool: Fix seg fault on bad switch table entry
  objtool: Support repeated uses of the same C jump table
  objtool: Refactor jump table code
  objtool: Refactor sibling call detection logic
  objtool: Do frame pointer check before dead end check
  objtool: Change dead_end_function() to return boolean
  objtool: Warn on zero-length functions
  objtool: Refactor function alias logic
  objtool: Track original function across branches
  objtool: Add mcsafe_handle_tail() to the uaccess safe list
  bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
  x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
  x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
  x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
  x86/head/64: Annotate start_cpu0() as non-callable
  x86/entry: Fix thunk function ELF sizes
  x86/kvm: Don't call kvm_spurious_fault() from .fixup
  x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
  ...
2019-07-20 10:45:15 -07:00
Linus Torvalds
07ab9d5bc5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
 "Mostly bugfixes, but also:

   - s390 support for KVM selftests

   - LAPIC timer offloading to housekeeping CPUs

   - Extend an s390 optimization for overcommitted hosts to all
     architectures

   - Debugging cleanups and improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: x86: Add fixed counters to PMU filter
  KVM: nVMX: do not use dangling shadow VMCS after guest reset
  KVM: VMX: dump VMCS on failed entry
  KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
  KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
  KVM: Boost vCPUs that are delivering interrupts
  KVM: selftests: Remove superfluous define from vmx.c
  KVM: SVM: Fix detection of AMD Errata 1096
  KVM: LAPIC: Inject timer interrupt via posted interrupt
  KVM: LAPIC: Make lapic timer unpinned
  KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
  KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
  kvm: x86: ioapic and apic debug macros cleanup
  kvm: x86: some tsc debug cleanup
  kvm: vmx: fix coccinelle warnings
  x86: kvm: avoid constant-conversion warning
  x86: kvm: avoid -Wsometimes-uninitized warning
  KVM: x86: expose AVX512_BF16 feature to guest
  KVM: selftests: enable pgste option for the linker on s390
  KVM: selftests: Move kvm_create_max_vcpus test to generic code
  ...
2019-07-20 10:20:27 -07:00
Linus Torvalds
18253e034d Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache and mountpoint updates from Al Viro:
 "Saner handling of refcounts to mountpoints.

  Transfer the counting reference from struct mount ->mnt_mountpoint
  over to struct mountpoint ->m_dentry. That allows us to get rid of the
  convoluted games with ordering of mount shutdowns.

  The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
  mixed-filesystem shrink lists, which we'll also need for the Slab
  Movable Objects patchset"

* 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the remnants of releasing the mountpoint away from fs_pin
  get rid of detach_mnt()
  make struct mountpoint bear the dentry reference to mountpoint, not struct mount
  Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
  fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
  __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
  nfs: dget_parent() never returns NULL
  ceph: don't open-code the check for dead lockref
2019-07-20 09:15:51 -07:00
Wanpeng Li
d73eb57b80 KVM: Boost vCPUs that are delivering interrupts
Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during
vcpu wakeup and interrupt delivery), we want to also boost not just
lock holders but also vCPUs that are delivering interrupts. Most
smp_call_function_many calls are synchronous, so the IPI target vCPUs
are also good yield candidates.  This patch introduces vcpu->ready to
boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
(VT-d PI handles voluntary preemption separately, in pi_pre_block).

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

            vanilla     boosting    improved
1VM          21443       23520         9%
2VM           2800        8000       180%
3VM           1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)

            vanilla     boosting   improved
              1570         4000      155%

w/ boosting + w/ pv sched yield(vanilla)

            vanilla     boosting   improved
              1844         5157      179%

w/o boosting, perf top in VM:

 72.33%  [kernel]       [k] smp_call_function_many
  4.22%  [kernel]       [k] call_function_i
  3.71%  [kernel]       [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]       [k] smp_call_function_many
  6.31%  [kernel]       [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]       [k] call_function_interrupt

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:45 +02:00
Wanpeng Li
0c5f81dad4 KVM: LAPIC: Inject timer interrupt via posted interrupt
Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers firing on the same pCPUs where the
vCPUs reside.  There is no hardware virtual timer on Intel for guest
like ARM, so both programming timer in guest and the emulated timer fires
incur vmexits.  This patch tries to avoid vmexit when the emulated timer
fires, at least in dedicated instance scenario when nohz_full is enabled.

In that case, the emulated timers can be offload to the nearest busy
housekeeping cpus since APICv has been found for several years in server
processors. The guest timer interrupt can then be injected via posted interrupts,
which are delivered by the housekeeping cpu once the emulated timer fires.

The host should tuned so that vCPUs are placed on isolated physical
processors, and with several pCPUs surplus for busy housekeeping.
If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode,
~3% redis performance benefit can be observed on Skylake server, and the
number of external interrupt vmexits drops substantially.  Without patch

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time   Avg time
EXTERNAL_INTERRUPT    42916    49.43%   39.30%   0.47us   106.09us   0.71us ( +-   1.09% )

While with patch:

            VM-EXIT  Samples  Samples%  Time%   Min Time  Max Time         Avg time
EXTERNAL_INTERRUPT    6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20 09:00:40 +02:00
Linus Torvalds
8362fd64f0 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC-related driver updates from Olof Johansson:
 "Various driver updates for platforms and a couple of the small driver
  subsystems we merge through our tree:

   - A driver for SCU (system control) on NXP i.MX8QXP

   - Qualcomm Always-on Subsystem messaging driver (AOSS QMP)

   - Qualcomm PM support for MSM8998

   - Support for a newer version of DRAM PHY driver for Broadcom (DPFE)

   - Reset controller support for Bitmain BM1880

   - TI SCI (System Control Interface) support for CPU control on AM654
     processors

   - More TI sysc refactoring and rework"

* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (84 commits)
  reset: remove redundant null check on pointer dev
  soc: rockchip: work around clang warning
  dt-bindings: reset: imx7: Fix the spelling of 'indices'
  soc: imx: Add i.MX8MN SoC driver support
  soc: aspeed: lpc-ctrl: Fix probe error handling
  soc: qcom: geni: Add support for ACPI
  firmware: ti_sci: Fix gcc unused-but-set-variable warning
  firmware: ti_sci: Use the correct style for SPDX License Identifier
  soc: imx8: Use existing of_root directly
  soc: imx8: Fix potential kernel dump in error path
  firmware/psci: psci_checker: Park kthreads before stopping them
  memory: move jedec_ddr.h from include/memory to drivers/memory/
  memory: move jedec_ddr_data.c from lib/ to drivers/memory/
  MAINTAINERS: Remove myself as qcom maintainer
  soc: aspeed: lpc-ctrl: make parameter optional
  soc: qcom: apr: Don't use reg for domain id
  soc: qcom: fix QCOM_AOSS_QMP dependency and build errors
  memory: tegra: Fix -Wunused-const-variable
  firmware: tegra: Early resume BPMP
  soc/tegra: Select pinctrl for Tegra194
  ...
2019-07-19 17:13:56 -07:00
Linus Torvalds
24e44913aa Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC platform updates from Olof Johansson:
 "SoC platform changes. Main theme this merge window:

   - The Netx platform (Netx 100/500) platform is removed by Linus
     Walleij-- the SoC doesn't have active maintainers with hardware,
     and in discussions with the vendor the agreement was that it's OK
     to remove.

   - Russell King has a series of patches that cleans up and refactors
     SA1101 and RiscPC support"

* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
  ARM: stm32: use "depends on" instead of "if" after prompt
  ARM: sa1100: convert to common clock framework
  ARM: exynos: Cleanup cppcheck shifting warning
  ARM: pxa/lubbock: remove lubbock_set_misc_wr() from global view
  ARM: exynos: Only build MCPM support if used
  arm: add missing include platform-data/atmel.h
  ARM: davinci: Use GPIO lookup table for DA850 LEDs
  ARM: OMAP2: drop explicit assembler architecture
  ARM: use arch_extension directive instead of arch argument
  ARM: imx: Switch imx7d to imx-cpufreq-dt for speed-grading
  ARM: bcm: Enable PINCTRL for ARCH_BRCMSTB
  ARM: bcm: Enable ARCH_HAS_RESET_CONTROLLER for ARCH_BRCMSTB
  ARM: riscpc: enable chained scatterlist support
  ARM: riscpc: reduce IRQ handling code
  ARM: riscpc: move RiscPC assembly files from arch/arm/lib to mach-rpc
  ARM: riscpc: parse video information from tagged list
  ARM: riscpc: add ecard quirk for Atomwide 3port serial card
  MAINTAINERS: mvebu: Add git entry
  soc: ti: pm33xx: Add a print while entering RTC only mode with DDR in self-refresh
  ARM: OMAP2+: Make some variables static
  ...
2019-07-19 17:05:08 -07:00
Linus Torvalds
26473f8370 Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap split/cleanup from Darrick Wong:
 "As promised, here's the second part of the iomap merge for 5.3, in
  which we break up iomap.c into smaller files grouped by functional
  area so that it'll be easier in the long run to maintain cohesiveness
  of code units and to review incoming patches. There are no functional
  changes and fs/iomap.c split cleanly.

  Summary:

   - Regroup the fs/iomap.c code by major functional area so that we can
     start development for 5.4 from a more stable base"

* tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move internal declarations into fs/iomap/
  iomap: move the main iteration code into a separate file
  iomap: move the buffered IO code into a separate file
  iomap: move the direct IO code into a separate file
  iomap: move the SEEK_HOLE code into a separate file
  iomap: move the file mapping reporting code into a separate file
  iomap: move the swapfile code into a separate file
  iomap: start moving code to fs/iomap/
2019-07-19 11:38:12 -07:00
Linus Torvalds
933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Linus Torvalds
5f4fc6d440 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix AF_XDP cq entry leak, from Ilya Maximets.

 2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.

 3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.

 4) Fix handling of neigh timers wrt. entries added by userspace, from
    Lorenzo Bianconi.

 5) Various cases of missing of_node_put(), from Nishka Dasgupta.

 6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.

 7) Various RDS layer fixes, from Gerd Rausch.

 8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
    Cong Wang.

 9) Fix FIB source validation checks over loopback, also from Cong Wang.

10) Use promisc for unsupported number of filters, from Justin Chen.

11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  tcp: fix tcp_set_congestion_control() use from bpf hook
  ag71xx: fix return value check in ag71xx_probe()
  ag71xx: fix error return code in ag71xx_probe()
  usb: qmi_wwan: add D-Link DWM-222 A2 device ID
  bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
  net: dsa: sja1105: Fix missing unlock on error in sk_buff()
  gve: replace kfree with kvfree
  selftests/bpf: fix test_xdp_noinline on s390
  selftests/bpf: fix "valid read map access into a read-only array 1" on s390
  net/mlx5: Replace kfree with kvfree
  MAINTAINERS: update netsec driver
  ipv6: Unlink sibling route in case of failure
  liquidio: Replace vmalloc + memset with vzalloc
  udp: Fix typo in net/ipv4/udp.c
  net: bcmgenet: use promisc for unsupported filters
  ipv6: rt6_check should return NULL if 'from' is NULL
  tipc: initialize 'validated' field of received packets
  selftests: add a test case for rp_filter
  fib: relax source validation check for loopback packets
  mlxsw: spectrum: Do not process learned records with a dummy FID
  ...
2019-07-19 10:06:06 -07:00
Linus Torvalds
249be8511b Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "The rest of MM and a kernel-wide procfs cleanup.

  Summary of the more significant patches:

   - Patch series "mm/memory_hotplug: Factor out memory block
     devicehandling", v3. David Hildenbrand.

     Some spring-cleaning of the memory hotplug code, notably in
     drivers/base/memory.c

   - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang
     Shi.

     Fix /proc/pid/smaps output for THP pages used in shmem.

   - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit.

     Bugfix and speedup for kernel/resource.c

   - Patch series "mm: Further memory block device cleanups", David
     Hildenbrand.

     More spring-cleaning of the memory hotplug code.

   - Patch series "mm: Sub-section memory hotplug support". Dan
     Williams.

     Generalise the memory hotplug code so that pmem can use it more
     completely. Then remove the hacks from the libnvdimm code which
     were there to work around the memory-hotplug code's constraints.

   - "proc/sysctl: add shared variables for range check", Matteo Croce.

     We have about 250 instances of

          int zero;
          ...
                  .extra1 = &zero,

     in the tree. This is a tree-wide sweep to make all those private
     "zero"s and "one"s use global variables.

     Alas, it isn't practical to make those two global integers const"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits)
  proc/sysctl: add shared variables for range check
  mm: migrate: remove unused mode argument
  mm/sparsemem: cleanup 'section number' data types
  libnvdimm/pfn: stop padding pmem namespaces to section alignment
  libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
  mm/devm_memremap_pages: enable sub-section remap
  mm: document ZONE_DEVICE memory-model implications
  mm/sparsemem: support sub-section hotplug
  mm/sparsemem: prepare for sub-section ranges
  mm: kill is_dev_zone() helper
  mm/hotplug: kill is_dev_zone() usage in __remove_pages()
  mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap()
  mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal
  mm/sparsemem: add helpers track active portions of a section at boot
  mm/sparsemem: introduce a SECTION_IS_EARLY flag
  mm/sparsemem: introduce struct mem_section_usage
  drivers/base/memory.c: get rid of find_memory_block_hinted()
  mm/memory_hotplug: move and simplify walk_memory_blocks()
  mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
  mm: make register_mem_sect_under_node() static
  ...
2019-07-19 09:45:58 -07:00
Matteo Croce
eec4844fae proc/sysctl: add shared variables for range check
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
  Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
  Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Keith Busch
371096949f mm: migrate: remove unused mode argument
migrate_page_move_mapping() doesn't use the mode argument.  Remove it
and update callers accordingly.

Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Dan Williams
a3619190d6 libnvdimm/pfn: stop padding pmem namespaces to section alignment
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time.  The kernel will still honor padding established by older kernels.

Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Dan Williams
ba72b4c8cf mm/sparsemem: support sub-section hotplug
The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity.

For example the following backtrace is emitted when attempting
arch_add_memory() with physical address ranges that intersect 'System
RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

    # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
    100000000-1ffffffff : System RAM
    200000000-303ffffff : Persistent Memory (legacy)
    304000000-43fffffff : System RAM
    440000000-23ffffffff : Persistent Memory
    2400000000-43bfffffff : Persistent Memory
      2400000000-43bfffffff : namespace2.0

    WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
    [..]
    RIP: 0010:add_pages+0x5c/0x60
    [..]
    Call Trace:
     devm_memremap_pages+0x460/0x6e0
     pmem_attach_disk+0x29e/0x680 [nd_pmem]
     ? nd_dax_probe+0xfc/0x120 [libnvdimm]
     nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as
some platform produce PMEM vs PMEM collisions within a given section.
The libnvdimm workaround for that case revealed that the libnvdimm
section-alignment-padding implementation has been broken for a long
while.

A fix for that long-standing breakage introduces as many problems as it
solves as it would require a backward-incompatible change to the
namespace metadata interpretation.  Instead of that dubious route [1],
address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how
sparse_add_section() reports subsection collisions, it was also obviated
by recent changes to perform the request_region() for 'System RAM'
before arch_add_memory() in the add_memory() sequence.

[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com

[osalvador@suse.de: fix deactivate_section for early sections]
  Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00