This adds support for PF control over the VF vlan configuration.
I.e., `ip link ... vf <x> vlan <vid>' should now be supported.
1. <vid> != 0 => VF receives [unknowingly] only traffic tagged by
<vid> and tags all outgoing traffic sent by VF with <vid>.
2. <vid> == 0 ==> Remove the pvid configuration, reverting to previous.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a PCI callback for `sriov_configure' and a new PCI device id for
the VF [+ Some minor changes to accomodate differences between PF and VF
at the qede].
Following this, VF creation should be possible and the entire subset of
existing PF functionality that's allow to VFs should be supported.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the VF infrastructure is supposed to offer backward/forward
compatibility, the various types associated with VF<->PF communication
should be aligned across all various platforms that support IOV
on our family of adapters.
This adds a couple of currently missing values, specifically aligning
the enum for the various TLVs possible in the communication between them.
It then adds the PF implementation for some of those missing VF requests.
This support isn't really necessary for the Linux VF as those VFs aren't
requiring it [at least today], but are required by VFs running on other
OSes. LRO is an example of one such configuration.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up to this point, VF and PF communication always originates from VF.
As a result, VF cannot be notified of any async changes, and specifically
cannot be informed of the current link state.
This introduces the bulletin board, the mechanism through which the PF
is going to communicate async notifications back to the VF. basically,
it's a well-defined structure agreed by both PF and VF which the VF would
continuously poll and into which the PF would DMA messages when needed.
[Bulletin board is actually allocated and communicated in previous patches
but never before used]
Based on the bulletin infrastructure, the VF can query its link status
and receive said async carrier changes.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds sufficient changes to allow VFs l2-configuration flows to work.
While the fastpath of the VF and the PF are meant to be exactly the same,
the configuration of the VF is done by the PF.
This diverges all VF-related configuration flows that originate from a VF,
making them pass through the VF->PF channel and adding sufficient logic
on the PF side to support them.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While previous patches have already added the necessary logic to probe
VFs as well as enabling them in the HW, this patch adds the ability to
support VF FLR & SRIOV disable.
It then wraps both flows together into the first IOV callback to be
provided to the protocol driver - `configure'. This would later to be used
to enable and disable SRIOV in the adapter.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the qed VFs for the first time -
The vfs are limited functions, with a very different PCI bar structure
[when compared with PFs] to better impose the related security demands
associated with them.
This patch includes the logic neccesary to allow VFs to successfully probe
[without actually adding the ability to enable iov].
This includes diverging all the flows that would occur as part of the pci
probe of the driver, preventing VF from accessing registers/memories it
can't and instead utilize the VF->PF channel to query the PF for needed
information.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Communication between VF and PF is based on a dedicated HW channel;
VF will prepare a messge, and by signaling the HW the PF would get a
notification of that message existance. The PF would then copy the
message, process it and DMA an answer back to the VF as a response.
The messages themselves are TLV-based - allowing easier backward/forward
compatibility.
This patch adds the infrastructure of the channel on the PF side -
starting with the arrival of the notification and ending with DMAing
the response back to the VF.
It also adds a dummy-response as reference, as it only lays the
groundwork of the communication; it doesn't really add support of any
actual messages.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for a new Kconfig option for qed* driver which would allow
[eventually] the support in VFs.
This patch adds the necessary logic in the PF to learn about the possible
VFs it will have to support [Based on PCI configuration space and HW],
and prepare a database with an entry per-VF as infrastructure for future
interaction with said VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Add workaround to detect bad opaque in rx completion.
2-part workaround for this hardware bug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add detection and recovery code when the hardware returned opaque value
does not match the expected consumer index. Once the issue is detected,
we skip the processing of all RX and LRO/GRO packets. These completion
entries are discarded without sending the SKB to the stack and without
producing new buffers. The function will be reset from a workqueue.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a rare hardware bug that can cause a bad opaque value in the RX
or TPA completion. When this happens, the hardware may have used the
same buffer twice for 2 rx packets. In addition, the driver will also
crash later using the bad opaque as the index into the ring.
The rx opaque value is predictable and is always monotonically increasing.
The workaround is to keep track of the expected next opaque value and
compare it with the one returned by hardware during RX and TPA start
completions. If they miscompare, we will not process any more RX and
TPA completions and exit NAPI. We will then schedule a workqueue to
reset the function.
This patch adds the logic to keep track of the next rx consumer index.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If qlcnic_fw_cmd_get_minidump_temp() fails then "fw_dump->tmpl_hdr" is
NULL or possibly freed. It can lead to an oops later.
Fixes: d01a6d3c8a ('qlcnic: Add support to enable capability to extend minidump for iSCSI')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More amdgpu fixes for 4.7. Highlights:
- enable async pageflips
- UVD fixes for polaris
- lots of GPUVM fixes
- whitespace and code cleanups
- misc bug fixes
* 'drm-next-4.7' of git://people.freedesktop.org/~agd5f/linux: (32 commits)
drm/amd/powerplay: rewrite pp_sw_init to make code readable
drm/amdgpu/dce11: fix audio offset for asics with >7 audio pins
drm/amdgpu: fix and cleanup user fence handling v2
drm/amdgpu: move VM fields into job
drm/amdgpu: move the context from the IBs into the job
drm/amdgpu: move context switch handling into common code v2
drm/amdgpu: move preamble IB handling into common code
drm/amdgpu/gfx7: fix pipeline sync
amdgpu/uvd: separate context buffer from DPB
drm/amdgpu: use fence_context to judge ctx switch v2
drm/amd/amdgpu: Added more named DRM info messages for debugging
drm/amd/amdgpu: Add name field to amd_ip_funcs (v2)
drm/amdgpu: Support DRM_MODE_PAGE_FLIP_ASYNC (v2)
drm/amdgpu/dce11: don't share PLLs on Polaris
drm/amdgpu: Drop unused parameter for *get_sleep_divider_id_from_clock
drm/amdgpu: Simplify calculation in *get_sleep_divider_id_from_clock
drm/amdgpu: Use max macro in *get_sleep_divider_id_from_clock
drm/amd/powerplay: Use defined constants for minium engine clock
drm/amdgpu: add missing licenses on a couple of files
drm/amdgpu: fetch cu_info once at init
...
drm/tegra: Changes for v4.7-rc1
Two small changes, one getting rid of the bogus gamma table size and
another removing Terje from the MAINTAINERS file since he no longer does
any work on host1x or display.
* tag 'drm/tegra/for-4.7-rc1' of git://anongit.freedesktop.org/tegra/linux:
MAINTAINERS: Remove Terje Bergström as Tegra DRM maintainer
drm/tegra: Don't set a gamma table size
Summary:
- expose HDMI-PHY clock to other drivers.
. this patch was included in below patch series but I missed.
http://www.spinics.net/lists/dri-devel/msg103097.html
- some fixups about DECON5433 driver
. this patch corrects vblank handling and fixes up trigger
configuration.
- use generic functions - gem_prime_mmap and dma_buf_mmap.
- use DMA-Mapping API instead of specific one.
- some code cleanups and fixeups.
* 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm/exynos/decon5433: fix trigger configuration
drm/exynos/dsi: use of_graph_get_endpoint_by_regs helper
drm/exynos/dpi: use of_graph_get_endpoint_by_regs helper
drm/exynos: Nuke dummy fb->dirty callback
drm/exynos: use directly DMA mapping APIs on g2d
drm/exynos/hdmi: Don't print error on deferral due to regulators
drm/exynos: fix imported dma-buf to be mapped
drm/exynos: support gem_prime_mmap
drm/exynos: fimd: harden fimd_calc_clkdiv()
drm/exynos: fix cancel page flip code
drm/exynos/decon5433: do not use unnecessary software trigger
drm/exynos/decon5433: handle vblank in vblank interrupt
drm/exynos/hdmi: expose HDMI-PHY clock as pipeline clock
Two some radeon display fixes.
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix PLL sharing on DCE6.1 (v2)
drm/radeon: fix DP link training issue with second 4K monitor
Misc intel fixes, reverting MST audio which was causing oops for now.
* tag 'drm-intel-fixes-2016-05-11' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Bail out of pipe config compute loop on LPT
Revert "drm/i915: start adding dp mst audio"
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915/lvds: separate border enable readout from panel fitter
drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
Samuel Ortiz says:
====================
NFC 4.7 pull request
This is the first NFC pull request for 4.7. With this one we
mainly have:
- Support for NXP's pn532 NFC chipset. The pn532 is based on the same
microcontroller as the pn533, but it talks to the host through i2c
instead of USB. By separating the pn533 driver into core and PHY
parts, we can not add the i2c layer and support the pn532 chipset.
- Support for NCI's loopback mode. This is a testing mode where each
packet received by the NFCC is sent back to the DH, allowing the
host to test that the controller can receive and send data.
- A few ACPI related fixes for the STMicro drivers, in order to match
the device tree naming scheme.
- A bunch of cleanups for the st-nci and the st21nfca STMicro drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
Mellanox 100G mlx5 CQE compression
Introducing ConnectX-4 CQE (Completion Queue Entry) compression feature
for mlx5 etherent driver.
CQE Compressing reduces PCI overhead by coalescing and compressing multiple CQEs into a
single merged CQE. Successful compressing improves message rate especially for small packet
traffic.
CQE Compressing in details:
Instead of writing full CQEs to memory, multiple almost identical CQEs are merged and compressed.
Information that is shared between the CQEs is written once, regardless of the number of
compressed CQEs. In addition, only the unique information (small amount of bytes compared to
full CQE size) is written per CQE.
CQE Compression Block:
This block contains multiple compressed CQEs. CQE Compression Block contains a single copy
of CQEs properties which are shared between all the compressed CQEs (called Title, see below)
and multiple mini CQEs (CQEs in compressed form).
Title:
The Title holds information which is shared between all the compressed CQEs in the CQE Compression
Block. In each Compression Block there is only a single Title regardless of the number
of compressed CQEs.
Mini CQE:
A CQE in compressed form that holds some data needed to extract a single full CQE, for example
8 Bytes instead of 64 Bytes.
The shared information between all compressed CQEs, which belong to the same CQE Compression
Block called Title, is written once, and only the unique information in each compressed
CQE, for example 8 bytes, is written per compressed CQE, called mini CQE.
Since CQE Compression can add overhead to the software (CPU),
it will be only enabled on "weak/slow" PCI slots, where it can actually help.
Applied on top: c047c3b1af ('netfilter: conntrack: remove uninitialized shadow variable')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We turn the feature ON, only for servers with PCI BW < MAX LINK BW, as it
helps reducing PCI pressure on weak PCI slots, but it adds some software
overhead.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the MPWQE/Striding RQ default configuration dynamic and not
statically set at compile time. Now at driver load we set
stride size and num strides dynamically.
By default we use same values as before, but when CQE compression
is enabled, we set larger stride size to benefit from CQE
compression for larger packets.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CQE compression feature is meant to save PCIe bandwidth by
compressing few CQEs into smaller amount of bytes on PCIe.
CQE compression can be selectively enabled per CQ. By default
is disabled for now and will be enabled later on.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
More enabler patches for DSA probing
The complete set of patches for the reworked DSA probing is too big to
post as once. These subset contains some enablers which are easy to
review.
Eventually, the Marvell driver will instantiate its own internal MDIO
bus, rather than have the framework do it, thus allows devices on the
bus to be listed in the device tree. Initialize the main mutex as soon
as it is created, to avoid lifetime issues with the mdio bus.
A previous patch renamed all the DSA probe functions to make room for
a true device probe. However the recent merging of all the Marvell
switch drivers resulted in mv88e6xxx going back to the old probe
name. Rename it again, so we can have a driver probe function.
Add minimum support for the Marvell switch driver to probe as an MDIO
device, as well as an DSA driver. Later patches will then register
this device with the new DSA core framework.
Move the GPIO reset code out of the DSA code. Different drivers may
need different reset mechanisms, e.g. via a reset controller for
memory mapped devices. Don't clutter up the core with this. Let each
driver implement what it needs.
master_dev is no longer needed in the switch drivers, since they have
access to a device pointer from the probe function. Remove it.
Let the switch parse the eeprom length from its one device tree
node. This is required with the new binding when the central DSA
platform device no longer exists.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A switch can export an attached EEPROM using the standard ethtool API.
However the switch itself cannot determine the size of the EEPROM, and
multiple sizes are allowed. Thus a device tree property is supported
to indicate the length of the EEPROM. Parse this property during
device probe, and implement a callback function to retrieve it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa_switch structure contains a dsa_chip_data member called pd.
However in the rest of the code, pd is used for dsa_platform_data.
This is confusing. Rename it cd, which is already often used in dsa.c
and slave.c for this data type.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch drivers only use the master_dev member for dev_info()
messages. Now that the device is passed to the old style probe, and
new style drivers are probed as true linux drivers, this is no longer
needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resetting the switch is something the driver does, not the framework.
So move the parsing of this property into the driver.
There are no in kernel users of this property, so moving it does not
break anything. There is however a board which will make use of this
property making its way into the kernel.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow Marvell switches to be mdio devices. Currently the driver just
allocate the private structure and detects what device is on the
bus. Later patches will make them register with the DSA framework.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
All other DSA drivers use _drv_ in there DSA probe function name, thus
allowing for a true linux driver probe function to use the
conventional name. Make mv88e6xxx fit this pattern.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
By initialising immediately it, we don't run the danger of using it
before it is initialised.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switch models have a STU (per VLAN port state database). Add a new
capability flag to switches info, instead of checking their family.
Also if the 6165 family has an STU, it must have a VTU, so add the
MV88E6XXX_FLAG_VTU to its family flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both VTU and STU operations use the same routine to access their
(common) data registers, with a different offset.
Add VTU and STU specific read and write functions to the data registers
to abstract the required offset.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net: vrf: Fixup PKTINFO to return enslaved device index
Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
move the packet intercept from an rx handler that is invoked by
__netif_receive_skb_core to the ipv4 and ipv6 receive processing.
IPv6 already saves the skb_iif to the control buffer in ipv6_rcv. Since
the skb->dev has not been switched the cb has the enslaved device. Make
the same happen for IPv4 by adding the skb_iif to inet_skb_parm and set
it in ipv4 code after clearing the skb control buffer similar to IPv6.
From there the pktinfo can just pull it from cb with the PKTINFO_SKB_CB
cast.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
the skb control buffer similar to IPv6. From there the pktinfo can just
pull it from cb with the PKTINFO_SKB_CB cast.
The previous patch moving the skb->dev change to L3 means nothing else
is needed for IPv6; it just works.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.
This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.
dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Protocol for 4in6 tunnel is IPPROTO_IPIP. This was wrongly changed by
the last cleanup.
CC: Tom Herbert <tom@herbertland.com>
Fixes: 0d3c703a9d ("ipv6: Cleanup IPv6 tunnel receive path")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add config option PCI_ECAM and file drivers/pci/ecam.c to provide generic
functions for accessing memory-mapped PCI config space.
The API is defined in drivers/pci/ecam.h and is written to replace the API
in drivers/pci/host/pci-host-common.h. The file defines a new 'struct
pci_config_window' to hold the information related to a PCI config area and
its mapping. This structure is expected to be used as sysdata for
controllers that have ECAM based mapping.
Helper functions are provided to setup the mapping, free the mapping and to
implement the map_bus method in 'struct pci_ops'
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The comments and the core_busy variable name in
get_target_pstate_use_performance() are totally confusing,
so modify them to reflect what's going on.
The results of the computations should be the same as before.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that get_avg_pstate() can use sample.core_avg_perf instead of
carrying the same division again, so make it do that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The core_pct_busy field of struct sample actually contains the
average performace during the last sampling period (in percent)
and not the utilization of the core as suggested by its name
which is confusing.
For this reason, change the name of that field to core_avg_perf
and rename the function that computes its value accordingly.
Also notice that storing this value as percentage requires a costly
integer multiplication to be carried out in a hot path, so instead
store it as an "extended fixed point" value with more fraction bits
and update the code using it accordingly (it is better to change the
name of the field along with its meaning in one go than to make those
two changes separately, as that would likely lead to more
confusion).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, in intel_pstate_clear_update_util_hook(), after
clearing the utilization update hook, we leverage
synchronize_sched() to deal with synchronization, which
is a little bit time-costly because synchronize_sched()
has to wait for all the CPUs to go through a grace period.
Actually, the synchronize_sched() is not necessary if the utilization
update hook has not been set for the given CPU yet, so make the driver
check if that's the case and avoid the synchronize_sched() call then.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371
Tested-by: Tian Ye <yex.tian@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw : Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPU_FREQ_GOV_SCHEDUTIL gained a dependency on SMP, so now we
get a warning if it gets selected by CPU_FREQ_DEFAULT_GOV_SCHEDUTIL
without SMP:
warning: (CPU_FREQ_DEFAULT_GOV_SCHEDUTIL) selects CPU_FREQ_GOV_SCHEDUTIL which has unmet direct dependencies (CPU_FREQ && SMP)
This adds another dependency to avoid the problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: bf7cdff194 (cpufreq: schedutil: Make it depend on CONFIG_SMP)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If we don't support a mechanism for bypassing IRQs, don't register as
a consumer. This eliminates meaningless dev_info()s when the connect
fails between producer and consumer, such as on AMD systems where
kvm_x86_ops->update_pi_irte is not implemented
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A NULL token is meaningless and can only lead to unintended problems.
Error on registration with a NULL token, ignore de-registrations with
a NULL token.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
also the upper limit for vCPU ids. This is okay for all archs except PowerPC
which can have higher ids, depending on the cpu/core/thread topology. In the
worst case (single threaded guest, host with 8 threads per core), it limits
the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
This patch separates the vCPU numbering from the total number of vCPUs, with
the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
plus one.
The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
before passing them to KVM_CREATE_VCPU.
This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
Other archs continue to return KVM_MAX_VCPUS instead.
Suggested-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit c896939f7c ("KVM: use heuristic for fast VCPU lookup by id") added
a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
This patch simply reverses the logic so that we only try fast path if the
vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
affected by the change.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>